An overview of the CareEvolution Interoperability Platform’s Blinded Master Patient Index solution
The CareEvolution Blinded Master Patient Index (BMPI) facilitates patient matching for leading national-scale health plans and provider networks. Using the BMPI API, you can easily integrate world-class, privacy-preserving patient matching into your solution across a spectrum of data sources and use cases.
- Fully automated real-time patient matching
- Modern, cloud-native architecture
- Easy-to-integrate REST API
CareEvolution’s Blinded Master Patient Index users
Identity management, or patient record linkage, is an important and necessary step in enabling the exchange of clinical information between the healthcare information systems of disparate hospitals, payers, and clinics. This linkage must meet functional criteria for sensitivity and specificity, and must be implemented in a secure and privacy-preserving manner. To meet this need, CareEvolution has developed a Blinded Master Patient Index (BMPI) within its Interoperability Platform.
Built on top of industry-leading record linking techniques, CareEvolution’s BMPI provides a set of advanced features that improve security and link specificity. It has been designed from the ground up to meet the needs of payers, providers, and academic research institutions at regional or nationwide scale and addresses the unique functional, security, and privacy issues encountered at this scale.
Securing identity management – the privacy mandate
Master patient index – a weak link in the interoperability security chain
The majority of healthcare data interoperability implementations use an MPI backed by a centralized store of accessible demographic information. This approach, however, creates a significant security risk for any organization. The data aggregation required for the centralized store accentuates three critical risk factors that increase the likelihood that sensitive information will be improperly disclosed:
- Data aggregation increases the value of the centralized store, creating a lucrative target for potential attackers.
- It increases the number of entities that need access to the central store, which in turn increases the number of avenues that attackers can compromise.
- A centralized store of sensitive data can become a valuable resource that may be susceptible to political pressure for legalized access by interests claiming a need to know. A concerted effort by the government to obtain data from large Internet search engines is a compelling example of this third risk factor. 
Given these security risks, securing centralized demographic stores should be a high priority for any company employing a health interoperability implementation.
Blinded record linkage – the solution
The CareEvolution BMPI provides a robust solution for securing the demographic information that is essential for record linkage. The platform achieves secure, performant record linkage in a distributed system by using a blinded directory for centralized demographic data. A set of techniques applies a one-way cryptographic hash to the aggregated data so that patient demographic data stored in the centralized index cannot be recovered.
The CareEvolution BMPI uses only FIPS-compliant keyed hash functions (HMAC/SHA256) with a unique client-specific key to ensure privacy and guard against brute-force or dictionary attacks.
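As a rough illustration of keyed hashing, the following sketch blinds a single demographic value with HMAC-SHA256 from the Python standard library. The key value and the `blind` helper are hypothetical names chosen for this example; they are not part of the BMPI API, and a real deployment would load the key from a key-management system.

```python
import hashlib
import hmac

# Hypothetical client-specific secret, hard-coded only for illustration.
CLIENT_KEY = b"example-client-key"


def blind(value: str, key: bytes = CLIENT_KEY) -> str:
    """Return the hex HMAC-SHA256 digest of a demographic value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


smith = blind("Smith")
print(smith == blind("Smith"))            # True: same input and key give the same token
print(smith == blind("Smit"))             # False: a similar input gives an unrelated token
print(smith == blind("Smith", b"other"))  # False: a different key gives an unrelated token
```

Because the digest depends on the secret key, an attacker who obtains the hashed index cannot simply hash a dictionary of common names and compare the results without also obtaining the key.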
There are two direct results of hashing the centralized index:
- World-class security: From any plaintext string (e.g., “Smith”), a one-way hashing algorithm can quickly produce a long sequence of numbers (a “hash”) that represents the string “Smith”. Reversing the algorithm to recover “Smith” from this hash, however, is computationally infeasible, hence the term “one-way” hash.
- Record linking challenge: Since hashes of similar strings, such as “Smith” and “Smit”, yield dramatically different number sequences, the very process of hashing renders traditional approximate record linking techniques impossible. As a result, most contemporary providers of MPI or identity management solutions have avoided the formidable technical challenges posed by a crypto-hashed central directory. While this may have been acceptable when such solutions were intended to be implemented behind the security firewalls within an institution, extending a non-hashed centralized repository of demographic information across a region, let alone the country, poses an unprecedented and unwarranted privacy risk.
In addition to the critical security component, a production MPI implementation such as CareEvolution’s BMPI must meet several other important requirements:
- No mistakes: There must be a near-zero false positive link rate. An unacceptable failure for a record linking system is the incorrect linkage of patients, resulting in moving or displaying data on the wrong patient record. The CareEvolution record linking algorithms are tuned to near 100% specificity to prevent these false positives.
- Automated: The platform should leverage record linking techniques that can perform the vast majority of the linking activity hands-off. Where manual review is appropriate, CareEvolution provides functionality allowing administrators to quickly review and accept links.
- Real-time: Record linkage on a nightly or weekly cadence is impractical for low-latency applications and workflows. The system should perform in near real-time so that clinically relevant information from remote institutions can be incorporated into existing workflows without additional latency.
CareEvolution record linking fundamentals
The basic components in the design of the CareEvolution record linking system include data standardization and both deterministic and probabilistic linking strategies.
Demographic information is “cleansed” so that comparing this information will yield meaningful results. Casing, white space, special characters, nicknames, and fake or invalid values must be handled uniformly for each submitted record.
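A minimal sketch of this kind of cleansing is shown below. The function names, the nickname table, and the list of placeholder SSNs are all assumptions made for illustration; they do not describe CareEvolution's actual standardization rules, and a production system would rely on much larger curated lists.

```python
import re
import unicodedata
from typing import Optional

# Illustrative lookup tables only (hypothetical values).
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}
INVALID_SSNS = {"000000000", "123456789", "999999999"}


def standardize_name(raw: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace, map nicknames."""
    text = unicodedata.normalize("NFKD", raw)
    # Drop combining marks so that "José" and "Jose" compare equal.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep only ASCII letters and whitespace.
    text = re.sub(r"[^a-z\s]", "", text.lower())
    text = " ".join(text.split())
    return NICKNAMES.get(text, text)


def standardize_ssn(raw: str) -> Optional[str]:
    """Return the nine digits of an SSN, or None for malformed or placeholder values."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9 or digits in INVALID_SSNS:
        return None
    return digits


print(standardize_name("  O'Brien "))  # obrien
print(standardize_name("BILL"))        # william
print(standardize_ssn("123-45-6789"))  # None
```

Treating fake values as missing rather than as ordinary strings keeps two unrelated records from agreeing on a shared placeholder.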
After a record has been standardized and transformed, record pairs are compared to determine their similarity. A range of established techniques assists in this effort. The two primary categories are deterministic linking and probabilistic linking.
- Deterministic record linking employs a series of simple rules (e.g., the SSN, DOB, and last name match exactly) to determine linkage. While deterministic rules can be designed for high specificity, they still leave room for improvement. For example, records with exactly matching SSN, DOB, and last name would satisfy a deterministic record linking strategy. However, suppose the first name and gender varied greatly. These differences cast doubt on the validity of the link; it could be that one record has a mistyped SSN, or perhaps a husband and wife were both entered in the system with the same SSN for insurance reasons.
- Probabilistic record linking, on the other hand, assigns each identifier a statistical weight for agreement and for disagreement. Two records link if the sum of the weighted agreements outweighs the sum of the weighted disagreements.
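The two strategies can be contrasted with a short sketch. The deterministic rule mirrors the SSN, DOB, and last name example above. The probabilistic scorer uses a common textbook formulation (log-likelihood weights in the style of Fellegi and Sunter), which is an assumption here rather than CareEvolution's documented algorithm, and the per-field probabilities are invented for illustration.

```python
import math

KEY_FIELDS = ("ssn", "dob", "last_name")

# Hypothetical per-field probabilities:
#   m = P(field agrees | same patient), u = P(field agrees | different patients)
FIELD_PROBS = {
    "ssn": (0.95, 0.0001),
    "dob": (0.97, 0.003),
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "gender": (0.98, 0.5),
}


def deterministic_match(a: dict, b: dict) -> bool:
    """Link only when SSN, DOB, and last name are all present and agree exactly."""
    return all(a.get(f) is not None and a.get(f) == b.get(f) for f in KEY_FIELDS)


def probabilistic_score(a: dict, b: dict) -> float:
    """Sum positive weights for agreeing fields and negative weights for disagreeing ones."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


husband = {"ssn": "536221234", "dob": "1970-01-01", "last_name": "smith",
           "first_name": "john", "gender": "m"}
wife = dict(husband, first_name="mary", gender="f")

print(deterministic_match(husband, wife))  # True: the simple rule is satisfied
print(probabilistic_score(husband, husband) > probabilistic_score(husband, wife))  # True
```

The deterministic rule cannot see the conflicting first name and gender, whereas the probabilistic score drops when those fields disagree, which is the kind of signal a high-specificity system can use to withhold an automatic link.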
The CareEvolution record linking system uses multiple linking strategies, all of which contribute to determining a record pair’s final link status.
CareEvolution advanced record linking techniques
CareEvolution’s BMPI record linking system builds on the solid foundation of mainstream record linking systems with advanced features, including privacy-preserving record linkage, blindfolded approximate matching, and support for advanced human review.
Privacy-preserving record linkage
In traditional MPI models, plaintext demographic information is centrally located to facilitate record linking. To preserve privacy, CareEvolution implements a blindfolded record linking system that cryptographically hashes record identifiers, obfuscating the information in such a way that comparisons can still be made but the original cleartext is irrecoverable. This provides the best of both worlds: data can be freely shared for the purpose of record linking, but that same data cannot be read because of the one-way nature of the hash.
Blindfolded approximate matching
Because the hashes of similar identifiers bear no correlation to each other, the plaintext identifiers must be preprocessed to allow for approximate matches between identifier hashes. Standardized demographic information is therefore transformed before blinding. Approximate matching in this scheme is accomplished using a technique called bigramming. Bigramming breaks up the source string into many derived strings. Each derived string is given a similarity score that indicates how similar it is to the source. Two strings that have been bigrammed can then be compared by determining whether they share a derived string. If so, the two derived similarity scores can be used to compute an overall “Dice score.” Generating derived strings with a bigramming technique and then hashing those strings enables approximate, blinded identifier matching.
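The sketch below shows the general idea with a deliberately simplified variant: each two-character bigram is hashed individually with HMAC-SHA256, and the Dice coefficient is computed over the two sets of hashed bigrams. This is not the derived-string scheme with per-string similarity scores described above, nor CareEvolution's implementation; the key and function names are hypothetical, and hashing single bigrams on its own would likely need additional protection against frequency analysis in practice.

```python
import hashlib
import hmac

KEY = b"example-client-key"  # hypothetical client-specific secret


def bigrams(text: str) -> set:
    """Return the set of adjacent two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def blinded_bigrams(text: str, key: bytes = KEY) -> set:
    """Hash every bigram so that only opaque tokens leave the source system."""
    return {
        hmac.new(key, gram.encode("utf-8"), hashlib.sha256).hexdigest()
        for gram in bigrams(text)
    }


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


smith = blinded_bigrams("smith")  # tokens for sm, mi, it, th
smit = blinded_bigrams("smit")    # tokens for sm, mi, it
jones = blinded_bigrams("jones")  # tokens for jo, on, ne, es

print(round(dice(smith, smit), 3))  # 0.857
print(dice(smith, jones))           # 0.0
```

The comparison operates only on hashed tokens, so the party computing the score never sees “smith” or “smit”, yet the shared bigrams still reveal that the two values are close.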
Support for advanced human review
Automated record linking requires a tradeoff between sensitivity and specificity. Even with advanced record linking techniques, the ultra-high specificity required by the platform means that some true links will remain in a “possible” state. CareEvolution has therefore implemented functionality that allows administrators to review these cases and upgrade possible links to definite ones, allowing clinical data to flow between the records. This interaction between the BMPI and system users enables the CareEvolution Interoperability Platform to achieve high specificity without sacrificing sensitivity.
The need for very high specificity together with appropriately high sensitivity in patient record linkage is a challenge in the healthcare community today, especially considering the privacy risk posed by centralized demographic information in data interoperability solutions. By leveraging state-of-the-art record linking techniques, the CareEvolution Blinded Master Patient Index addresses these issues. It provides secure linking with a near-zero false positive rate, while human review helps recover the true links that automated matching leaves in a possible state.