An overview of the CareEvolution Identity API
Background
The CareEvolution Identity API facilitates patient matching for leading national-scale health plans and provider networks. With the Identity API, you can easily integrate world-class patient matching into your solution, enabling privacy-preserving patient matching across a wide range of data sources and use cases.
- Fully automated real-time patient matching
- Modern, cloud native architecture
- Easy to integrate REST API
- 180M+
- 99.99%
Identity API
CareEvolution’s Identity API users
Identity management
Identity management, or patient record linkage, is a necessary step in enabling the exchange of clinical information between the healthcare information systems of disparate hospitals, payers, and clinics. This linkage must meet functional criteria for sensitivity and specificity, and it must be implemented in a secure, privacy-preserving manner. To meet this need, CareEvolution has developed a patient identity management API within its Orchestrate Platform.
Built on top of industry-leading record linking techniques, CareEvolution’s Identity API provides a set of advanced features that improve security and link specificity. It has been designed from the ground up to meet the needs of payers, providers, and academic research institutions at regional or nationwide scale and addresses the unique functional, security, and privacy issues encountered at this scale.
Securing identity management – the privacy mandate
Master patient indexes – weak links in the interoperability security chain
Most healthcare data interoperability implementations use a master patient index (MPI) backed by a centralized store of accessible demographic information. However, this approach creates a significant security risk for any organization. The data aggregation required to build such a store amplifies three critical risk factors that make improper disclosure of sensitive information more likely:
- Data aggregation increases the value of the centralized store, creating a lucrative target for potential attackers.
- It increases the number of entities that need access to the central store, which in turn increases the number of avenues attackers can exploit.
- A centralized store of sensitive data can become a valuable resource that is susceptible to political pressure for access from legal entities claiming a need to know. A concerted effort by the government to obtain data from large cloud computing and social media providers is a compelling example of this third risk factor. [1]
Given these security risks, securing centralized demographic stores should be a high priority for any organization that operates a health data interoperability solution.
Blinded record linkage – the solution
The CareEvolution Identity API provides a robust solution for securing the demographic information that is essential for record linkage. The platform achieves secure, performant record linkage across distributed systems by using a blinded directory for centralized demographic data. A set of techniques cryptographically hashes (i.e., one-way hashes) the aggregated data so that patient demographic data stored in the centralized index cannot be recovered.
The CareEvolution Identity API uses only FIPS-compliant hash functions (HMAC-SHA256) with a unique client-specific key to ensure privacy and guard against brute-force and dictionary attacks.
Hashing the centralized index has two direct consequences:
- World-class security: From any plaintext string (e.g., “Smith”), a one-way hashing algorithm can quickly produce a long sequence of numbers (a “hash”) that represents the string “Smith”. Reversing the algorithm to arrive at “Smith” from that hash, however, is computationally infeasible, hence the term “one-way” hash. Because an attacker could still hash a list of likely values and compare the results, the client-specific key is what keeps such guesses from succeeding.
- Record linking challenge: Because hashes of similar strings, such as “Smith” and “Smit”, are completely different, hashing makes traditional approximate record linking techniques impossible. As a result, most contemporary providers of MPI or identity management solutions have avoided the formidable technical challenges posed by a cryptographically hashed central directory. That may have been acceptable when such solutions were deployed behind an institution’s security firewall, but extending a non-hashed centralized repository of demographic information across a region, let alone the country, poses an unprecedented and unwarranted privacy risk.
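Both consequences, along with the role of the client-specific key, can be illustrated with Python’s standard `hmac` and `hashlib` modules. In this minimal sketch the key value and the short surname list standing in for an attacker’s dictionary are hypothetical, and values are assumed to be standardized to upper case before hashing. An unkeyed SHA-256 digest of a common surname falls to a simple dictionary attack, the same guesses do not recover an HMAC-SHA256 token made with the secret key, and a one-letter difference (“SMITH” versus “SMIT”) produces an unrelated token.

```python
import hashlib
import hmac
from typing import Callable, Optional

# Hypothetical client-specific secret; in practice it would be random and kept secret.
CLIENT_KEY = b"example-client-key"

# Hypothetical attacker dictionary of common surnames.
COMMON_SURNAMES = ["JOHNSON", "SMITH", "WILLIAMS", "BROWN", "JONES"]


def plain_sha256(value: str) -> str:
    """Unkeyed SHA-256: anyone can recompute it for a guessed value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def blind(value: str, key: bytes = CLIENT_KEY) -> str:
    """Keyed HMAC-SHA256 token: recomputing it requires the secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target: str, hash_fn: Callable[[str], str]) -> Optional[str]:
    """Hash every guess with hash_fn and return the guess whose digest equals target."""
    for guess in COMMON_SURNAMES:
        if hash_fn(guess) == target:
            return guess
    return None


if __name__ == "__main__":
    # An unkeyed digest of a common name is recovered by hashing guesses.
    print(dictionary_attack(plain_sha256("SMITH"), plain_sha256))  # SMITH
    # Without CLIENT_KEY, the attacker can only try unkeyed hashes or a wrong key.
    print(dictionary_attack(blind("SMITH"), plain_sha256))  # None
    print(dictionary_attack(blind("SMITH"), lambda g: blind(g, b"wrong-key")))  # None
    # Identical inputs still compare equal; a one-letter change gives an unrelated token.
    print(blind("SMITH") == blind("SMITH"))  # True
    print(blind("SMITH") == blind("SMIT"))  # False
```

Exact equality survives blinding, which is what makes hashed lookups possible; similarity does not, which is the record linking challenge described above.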
Beyond security
Beyond the critical security component, a production master patient index implementation such as CareEvolution’s Identity API must meet several other important requirements:
- No mistakes: The false positive link rate must be near zero. The worst failure for a record linking system is linking the wrong patients, which results in moving or displaying data on the wrong patient record. The CareEvolution record linking algorithms are tuned to near-100% specificity to prevent these false positives.
- Automated: The platform should leverage record linking techniques that can perform the vast majority of the linking activity hands-off. Where manual review is appropriate, CareEvolution provides functionality allowing administrators to quickly review and accept links.
- Real-time: Record linkage on a nightly or weekly cadence is impractical for low-latency applications and workflows. The system should perform in near real-time so that clinically relevant information from remote institutions can be incorporated into existing workflows without additional latency.
CareEvolution record linking fundamentals
The basic components of the CareEvolution record linking system include data standardization and two kinds of linking strategy: deterministic and probabilistic.
Standardization
Demographic information is “cleansed” so that comparing this information will yield meaningful results. Casing, white space, special characters, nicknames, and fake or invalid values must be handled uniformly for each submitted record.
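Standardization can be sketched with Python’s standard `re` and `unicodedata` modules. The nickname table, the placeholder values, and the exact cleanup rules below are hypothetical examples of the categories named above (casing, white space, special characters, nicknames, and fake or invalid values), not CareEvolution’s actual rules.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical nickname table; a production system would use a large curated list.
NICKNAMES = {"BILL": "WILLIAM", "BOB": "ROBERT", "LIZ": "ELIZABETH", "PEGGY": "MARGARET"}

# Hypothetical placeholder values that are treated as missing, never compared.
FAKE_NAMES = {"UNKNOWN", "TEST", "BABY", "NONE"}
FAKE_SSNS = {"000000000", "111111111", "123456789", "999999999"}


def standardize_name(raw: str, is_given_name: bool = False) -> Optional[str]:
    # Strip accents: decompose characters (é -> e + combining accent), then drop the accents.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Uniform casing; treat hyphens as word breaks and drop other special characters.
    text = text.upper().replace("-", " ")
    text = re.sub(r"[^A-Z ]", "", text)
    # Collapse runs of white space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    if not text or text in FAKE_NAMES:
        return None
    # Map nicknames to a canonical form, for given names only.
    return NICKNAMES.get(text, text) if is_given_name else text


def standardize_ssn(raw: str) -> Optional[str]:
    digits = re.sub(r"\D", "", raw)
    # Reject wrong lengths and well-known placeholder numbers.
    if len(digits) != 9 or digits in FAKE_SSNS:
        return None
    return digits


if __name__ == "__main__":
    print(standardize_name("  bill ", is_given_name=True))  # WILLIAM
    print(standardize_name("O'Brien"))  # OBRIEN
    print(standardize_name("Unknown"))  # None
    print(standardize_ssn("123-45-6789"))  # None (placeholder)
```

Returning `None` for placeholders means a fake value is treated as missing rather than counted as agreement between two records that both carry it.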
Linking strategies
After a record has been standardized and transformed, record pairs are compared to determine their similarity. A range of established techniques assist in this effort; the two primary categories are deterministic linking and probabilistic linking.
- Deterministic record linking uses a series of simple rules (e.g., the SSN, DOB, and last name match exactly) to determine linkage. Deterministic rules can be designed for high specificity, but they ignore any identifier the rule does not mention. For example, records with exactly matching SSN, DOB, and last name would satisfy a deterministic record linking strategy. However, suppose the first name and gender varied greatly. These differences cast doubt on the validity of the link: one record might have a mistyped SSN, or a husband and wife might both have been entered in the system with the same SSN for insurance reasons.
- Probabilistic record linking, on the other hand, assigns a statistical weight to agreement and to disagreement on each identifier. Two records link if the weighted agreements outweigh the weighted disagreements.
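The contrast between the two strategies can be sketched in Python using classic Fellegi-Sunter weighting, one common way to implement probabilistic linkage; the text does not specify CareEvolution’s exact model. The m/u probabilities, the threshold of 22, and the sample records are hypothetical. The husband-and-wife case described above passes the deterministic rule but falls below the probabilistic threshold because the first name and gender disagree.

```python
import math
from typing import Dict, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical m/u probabilities per field:
#   m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBABILITIES: Dict[str, Tuple[float, float]] = {
    "ssn": (0.95, 0.0001),
    "dob": (0.97, 0.003),
    "last_name": (0.90, 0.01),
    "first_name": (0.95, 0.005),
    "gender": (0.98, 0.5),
}
LINK_THRESHOLD = 22.0  # hypothetical cut-off on the summed log2 weights


def deterministic_link(a: Record, b: Record) -> bool:
    """Rule from the text: SSN, DOB, and last name present and exactly equal."""
    return all(a.get(f) is not None and a.get(f) == b.get(f)
               for f in ("ssn", "dob", "last_name"))


def probabilistic_score(a: Record, b: Record) -> float:
    """Add log2(m/u) for each agreeing field and log2((1-m)/(1-u)) for each disagreeing one."""
    score = 0.0
    for field, (m, u) in FIELD_PROBABILITIES.items():
        x, y = a.get(field), b.get(field)
        if x is None or y is None:
            continue  # a missing value is neither evidence for nor against a link
        if x == y:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def probabilistic_link(a: Record, b: Record) -> bool:
    return probabilistic_score(a, b) >= LINK_THRESHOLD


if __name__ == "__main__":
    husband = {"ssn": "524718093", "dob": "1970-03-14", "last_name": "GARCIA",
               "first_name": "LUIS", "gender": "M"}
    wife = {"ssn": "524718093", "dob": "1970-03-14", "last_name": "GARCIA",
            "first_name": "ANA", "gender": "F"}
    print(deterministic_link(husband, wife))  # True
    print(round(probabilistic_score(husband, wife), 1))  # 19.1
    print(probabilistic_link(husband, wife))  # False
```

The disagreement weights are what let the probabilistic strategy veto a link that a three-field deterministic rule would accept.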
CareEvolution’s Identity API record linking uses multiple linking strategies, all of which help determine a record pair’s final link status.
CareEvolution’s advanced record linking techniques
CareEvolution’s Identity API record linking system builds on the solid foundation of mainstream record linking systems with advanced features, including privacy preserving record linkage (PPRL), blindfolded approximate matching, and support for advanced human review.
Privacy preserving record linkage
In traditional master patient index (MPI) models, plaintext demographic information is stored centrally to facilitate record linking. To preserve privacy, CareEvolution implements a blindfolded record linking system that cryptographically hashes record identifiers, obscuring the information so that comparisons can still be made but the original plaintext cannot be recovered. This provides the best of both worlds: data can be freely shared for the purpose of record linking, but the one-way hash prevents that same data from being read.
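A blinded directory can be sketched as a central index that maps HMAC-SHA256 tokens to record IDs and compares records only by token equality. In this minimal sketch, the key assumed to be shared by the submitting sites, the `field:value` message format, and the `BlindedDirectory` class are hypothetical illustrations, not the Identity API’s interface. The directory never sees plaintext, yet it can still report how many blinded identifiers a query shares with each stored record.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import DefaultDict, Dict, Set

SHARED_KEY = b"example-linkage-key"  # hypothetical; held by submitting sites, not the directory


def blind_field(field: str, value: str, key: bytes = SHARED_KEY) -> str:
    # Prefix the field name so the same string in different fields yields different tokens.
    message = f"{field}:{value}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def blind_record(record: Dict[str, str]) -> Dict[str, str]:
    """Done at the source system: only tokens leave the site."""
    return {field: blind_field(field, value) for field, value in record.items()}


class BlindedDirectory:
    """Central index that stores only HMAC tokens, never plaintext demographics."""

    def __init__(self) -> None:
        self._index: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, record_id: str, tokens: Dict[str, str]) -> None:
        for token in tokens.values():
            self._index[token].add(record_id)

    def candidates(self, tokens: Dict[str, str]) -> Dict[str, int]:
        """Count, per stored record, how many of the query's tokens it shares."""
        counts: DefaultDict[str, int] = defaultdict(int)
        for token in tokens.values():
            for record_id in self._index.get(token, ()):
                counts[record_id] += 1
        return dict(counts)


if __name__ == "__main__":
    directory = BlindedDirectory()
    directory.add("hospA-001", blind_record({"last_name": "SMITH", "first_name": "JOHN", "dob": "1980-01-02"}))
    directory.add("hospA-002", blind_record({"last_name": "JONES", "first_name": "MARY", "dob": "1975-05-06"}))
    query = blind_record({"last_name": "SMITH", "first_name": "JON", "dob": "1980-01-02"})
    print(directory.candidates(query))  # {'hospA-001': 2}
```

Exact token matches only catch identical standardized values; the bigram technique described in the next section extends this idea to approximate matches.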
Blindfolded approximate matching
Because the hashes of similar identifiers bear no correlation with each other, the unencrypted identifiers must be preprocessed to allow approximate matching on identifier hashes. Standardized demographic information is therefore transformed before blinding. Approximate matching in this scheme uses a technique called bigramming (a bigram is a pair of adjacent characters, such as “SM” in “SMITH”). [2] Bigramming breaks the source string into many derived strings, each given a similarity score that indicates how similar it is to the source. Two bigrammed strings can then be compared by checking whether they share a derived string; if they do, the two derived similarity scores are combined into an overall “dice score.” Generating derived strings by bigramming and then hashing those strings enables approximate, blinded identifier matching.
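The bigram approach can be sketched in Python, modelled on the published blindfolded bigram method of Churches and Christen; CareEvolution’s variant may differ. Here each derived string is a subset of the source’s bigrams, its similarity score is the fraction of the source’s bigrams it contains, and derived strings are blinded with HMAC-SHA256 under a hypothetical shared key. When two strings share a derived string with k bigrams, the scores are s_a = k/|A| and s_b = k/|B|, and the dice score 2k/(|A| + |B|) equals their harmonic mean 2·s_a·s_b/(s_a + s_b). The blinded comparison therefore reproduces the plaintext dice coefficient.

```python
import hashlib
import hmac
from itertools import combinations
from typing import Dict, List

KEY = b"example-linkage-key"  # hypothetical shared secret


def bigrams(value: str) -> List[str]:
    """Distinct pairs of adjacent characters, sorted so subsets have a canonical order."""
    return sorted({value[i:i + 2] for i in range(len(value) - 1)})


def derived_strings(value: str, key: bytes = KEY) -> Dict[str, float]:
    """Map HMAC(subset of bigrams) -> similarity score (fraction of the source's bigrams).

    Every non-empty subset is enumerated, giving 2**n - 1 entries for n bigrams.
    """
    grams = bigrams(value)
    derived: Dict[str, float] = {}
    for size in range(1, len(grams) + 1):
        for subset in combinations(grams, size):
            token = hmac.new(key, " ".join(subset).encode("utf-8"), hashlib.sha256).hexdigest()
            derived[token] = size / len(grams)
    return derived


def blinded_dice(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Best dice score over shared derived strings (harmonic mean of the two scores)."""
    best = 0.0
    for token in a.keys() & b.keys():
        s_a, s_b = a[token], b[token]
        best = max(best, 2 * s_a * s_b / (s_a + s_b))
    return best


def plaintext_dice(x: str, y: str) -> float:
    """Reference dice coefficient on unblinded bigram sets, for comparison."""
    gx, gy = set(bigrams(x)), set(bigrams(y))
    if not gx and not gy:
        return 0.0
    return 2 * len(gx & gy) / (len(gx) + len(gy))


if __name__ == "__main__":
    smith, smit, jones = derived_strings("SMITH"), derived_strings("SMIT"), derived_strings("JONES")
    print(len(smith))  # 15 derived strings from the 4 bigrams IT, MI, SM, TH
    print(round(blinded_dice(smith, smit), 3), round(plaintext_dice("SMITH", "SMIT"), 3))  # 0.857 0.857
    print(blinded_dice(smith, jones))  # 0.0
```

Because every subset is enumerated, the number of derived strings doubles with each additional bigram, so a practical system would cap the length or subset sizes it blinds.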
Human review
Even with advanced record linking techniques, the ultra-high specificity required by the platform means that some true matches will be left unlinked. The Identity API lets systems specify links that have been confirmed through manual review on the source system, allowing clinical data to flow between those records. This interaction between the Identity API and system users enables the CareEvolution Orchestrate Platform to achieve high specificity without sacrificing sensitivity.
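One way to picture how manually confirmed links extend automatic ones is a union-find (disjoint-set) structure over record IDs, where each set stands for one person. This is a hypothetical model for illustration only: the `LinkClusters` class and the record IDs are invented, and the text does not describe how the Identity API represents or stores links.

```python
from typing import Dict, List, Set


class LinkClusters:
    """Union-find over record IDs; each cluster represents one person."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, record_id: str) -> str:
        self._parent.setdefault(record_id, record_id)
        root = record_id
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[record_id] != root:
            next_id = self._parent[record_id]
            self._parent[record_id] = root
            record_id = next_id
        return root

    def link(self, a: str, b: str) -> None:
        """Record that a and b belong to the same person (automatic or manual link)."""
        self._parent[self._find(a)] = self._find(b)

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def clusters(self) -> List[Set[str]]:
        groups: Dict[str, Set[str]] = {}
        for record_id in list(self._parent):
            groups.setdefault(self._find(record_id), set()).add(record_id)
        return list(groups.values())


if __name__ == "__main__":
    people = LinkClusters()
    people.link("hospA-17", "clinicB-42")  # found automatically by the matcher
    people.link("clinicB-42", "payerC-9")  # confirmed by a reviewer at the source system
    print(people.same_person("hospA-17", "payerC-9"))  # True
```

In this model a single confirmed link joins two whole clusters, so data can flow between every record on either side of it.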
Summary
The need for very high specificity as well as appropriately high sensitivity in patient record linkage is a challenge for the healthcare community today, especially given the privacy risk posed by centralized demographic information in data interoperability solutions. By leveraging state-of-the-art record linking techniques, the Identity API addresses these issues: it provides secure linking with a near-zero false positive rate while maintaining high sensitivity.