EHR data in clinical research (part 2): more data, less effort

In EHR data in clinical research (part 1), our first post in this series, we provided an overview of the value researchers can gain from the use of electronic health record (EHR) data in clinical studies. Researchers have long dreamed of a way to assess health interventions and outcomes that is easier than expensive, in-clinic randomized controlled trials. However, availability of datasets from EHRs to support real-world observational research has been limited—until now.

Availability of EHR data for research

The past 30 years have seen a transformation in the way clinical data is captured at the point of care. As late as 2008, EHR—also referred to as electronic medical record (EMR)—adoption by US hospitals was below 10%; today, it is over 96%. Further, office-based physicians’ adoption of EMRs is at least 80%. This major shift was prompted by federal legislation such as the HITECH Act, which established the EHR Incentive Program (known as Meaningful Use and later renamed to Promoting Interoperability). Adoption of EHRs means that a great deal of clinical care is now captured digitally in two major forms:

Structured data: encounters, diagnoses, procedures, medications, vital signs, and laboratory results
Free text: signs and symptoms, physical exams, pathology and radiology reports, and clinical assessments and plans

Despite the wealth of data being captured at the point of care, exchange of data across health care organizations, patients’ access to their own data, and availability of data to researchers have all been elusive. Many systems lock away data due to lack of standards (or variability in implementation of “standards”), business and competitive considerations (locking in providers to walled proprietary systems), fear of HIPAA (sometimes based on over-interpretation of the actual regulation), or simply the lack of funding or ability to prioritize.

This data lockdown is rapidly unraveling with the widespread adoption of standards based on Consolidated Clinical Document Architecture (C-CDA) and Fast Healthcare Interoperability Resources (FHIR), and with the rules prohibiting information blocking put in place by the 21st Century Cures Act of 2016. In particular, the Promoting Interoperability program’s patient access requirements and the availability of patient-mediated exchange are changing the paradigm for both access and consent. In addition, under the Interoperability and Patient Access rule, CMS-regulated payors are now required to provide patients access to their own claims and clinical data in FHIR format. Furthermore, the next evolution of health information exchanges (HIEs), called Qualified Health Information Networks (QHINs), and rules for establishing trust through the Trusted Exchange Framework and Common Agreement (TEFCA) promise to finally create ‘data liquidity’ in healthcare, offering new opportunities for access to observational health data for research.

Getting FHIR’ed up
Until recently, variably adopted or implemented standards have been a major impediment to access and aggregation of health data. A proliferation of standards and acronyms, from multiple versions of HL7 messages to CCD and CDA, plus X12 for claims data, left the exchange landscape fragmented. Successful integrations around integrated delivery networks, state-based efforts, proprietary sharing networks, and functionally focused data flows, such as those for lab results or prior authorization, were siloed from each other. Patient portals have also been widely adopted, but are not typically designed to enable movement of data from one provider to another.

Over the past few years, the C-CDA and FHIR standards have made significant inroads. C-CDA is a standard for sharing patient records between systems. It is flexible and primarily suited to displaying an individual record, since that flexibility also means the data is not structured enough for use in applications or analytics. FHIR is a standard that prescribes how data is structured to enable easier movement between different systems, as well as aggregation of data across multiple systems. Compared to prior efforts, FHIR is more prescriptive and less abstract, but with specific mechanisms for extension and customization that enable a wider set of data to be represented.
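To make the idea of prescribed structure concrete, here is a minimal sketch of what a single lab result might look like as a FHIR Observation resource and how an analyst could read it. The resource content is made up for illustration (the patient reference, date, and value are assumptions, not data from any study), and the helper name summarize_observation is hypothetical.

```python
import json

# Illustrative FHIR R4 Observation for an HbA1c lab result.
# All values here are invented for the example.
observation_json = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {"system": "http://loinc.org", "code": "4548-4",
       "display": "Hemoglobin A1c/Hemoglobin.total in Blood"}
    ]
  },
  "subject": {"reference": "Patient/example"},
  "effectiveDateTime": "2023-03-14",
  "valueQuantity": {"value": 6.1, "unit": "%",
                    "system": "http://unitsofmeasure.org", "code": "%"}
}
"""


def summarize_observation(resource: dict) -> dict:
    """Pull out the fields an analyst typically needs from one Observation."""
    if resource.get("resourceType") != "Observation":
        raise ValueError("not an Observation resource")
    coding = resource["code"]["coding"][0]        # first coding only, for brevity
    quantity = resource.get("valueQuantity", {})  # absent for non-numeric results
    return {
        "patient": resource["subject"]["reference"],
        "code_system": coding["system"],
        "code": coding["code"],
        "date": resource.get("effectiveDateTime"),
        "value": quantity.get("value"),
        "unit": quantity.get("unit"),
    }


summary = summarize_observation(json.loads(observation_json))
print(summary)
```

Because every conforming system places the code, date, and value in the same named fields, the same small function can in principle be applied to results from many different sources, which is much harder with a display-oriented document.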

Both C-CDA and FHIR now have the weight of federal rules behind them: patients must be able to access their own data in any platform or application of their choosing without undue effort or expense. In fact, the rules mandate that providers and payors offer patients/members access to their data, with significant penalties for non-compliance. Tools to convert between C-CDA and FHIR formats are also now available. The emerging consensus around FHIR and the new rules requiring access have opened the door to patient-mediated exchange, and have set the stage for new networks of organization-to-organization exchange and tools for aggregating and enhancing health data for a wide variety of applications, including research.

Sourcing EHR data from providers and HIEs for research
Conventional methods for incorporating EHR data in research require expensive contracting with payors and providers. These organization-to-organization connections function well if all of a study’s participants get the majority of their healthcare within a single health system. However, participants are more likely to receive care from multiple health systems or across providers connected to different HIEs. For approximately 1,200 health systems in the US today, there are currently over 15,000 active FHIR endpoints. There are also nearly 90 HIEs, many with overlapping geographies and participants. It is impossible to know, let alone connect with and access data from, all providers where study participants receive care or testing. In addition, connections with each provider currently require a separate HIPAA agreement and, subsequently, separate HIPAA consent from each participant. Even once obtained, data from multiple providers or HIEs will still likely vary greatly in format, requiring significant, specialized resources to prepare and aggregate the data before it can be used in analysis.

For certain localized studies, organization-to-organization connections could provide enough coverage to yield new health insights, but for most studies this approach is impractical. Several organizations provide access across multiple providers and exchanges, and some companies aggregate that access. Furthermore, in the next several years, QHINs should be coming online to provide more comprehensive coverage. QHINs will include both a record locator service and access across many networks and providers such that, in theory, a single query will be able to retrieve most available data for a patient. However, in the short term a research use case is not on the agenda for QHINs.

While there are limitations to organization-to-organization health information exchange, many research organizations are already making significant use of the data that is accessible. The access mechanisms currently available (e.g., C-CDA) and emerging (e.g., FHIR and SMART) will form the basis for the QHINs, so current investments will not be wasted.

Participant-mediated EHR sharing for research
There is an additional pathway for access to EHR data for research, which will only be enhanced further as QHINs come online: patient-mediated exchange. With the C-CDA and FHIR standards and information-blocking rules, individual patients can now access data from providers and payors and control access to that data by others, including researchers. A study participant, for example, can connect an app to each of their healthcare providers, retrieve their data, and consent to provide that data in some form to researchers. Note that in this way, researchers do not need to identify or negotiate with specific providers for access, and once the data is in a patient’s control, HIPAA organizational business associate agreements (BAAs) are not required. Instead, participant consent for data access for research is reviewed by Institutional Review Boards, and data access can be enabled by the patient directly as another task within the study protocol.
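As a rough sketch of what "connect an app to each provider and retrieve data" can look like at the protocol level, the snippet below builds one FHIR search request per connected endpoint. The endpoint URLs, patient IDs, and tokens are hypothetical placeholders, and the sketch assumes the participant has already authorized the app (for example through a SMART on FHIR flow) so that an access token exists. It only constructs the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_lab_search(base_url: str, patient_id: str, access_token: str) -> Request:
    """Build (but do not send) a FHIR search for a patient's laboratory Observations."""
    query = urlencode({"patient": patient_id, "category": "laboratory"})
    url = f"{base_url.rstrip('/')}/Observation?{query}"
    return Request(url, headers={
        "Authorization": f"Bearer {access_token}",  # token from the participant's authorization
        "Accept": "application/fhir+json",
    })


# Hypothetical endpoints a participant has connected; these are not real servers.
connected_endpoints = {
    "https://fhir.hospital-a.example/r4": ("pat-123", "token-a"),
    "https://fhir.clinic-b.example/r4/": ("98765", "token-b"),
}

requests_to_send = [
    build_lab_search(base, patient_id, token)
    for base, (patient_id, token) in connected_endpoints.items()
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

The point of the sketch is that the same request shape works against every endpoint the participant connects, so the study team does not need a custom integration per provider.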

All FHIR endpoints which enable patient-mediated access mechanisms also enable participants to provide access to EHR data for research, based on the fact that once a patient has accessed their data they are in control of it. As the QHINs come online, patients should be able to make a single connection to access their data across all available endpoints. Access to this data for research will still be IRB controlled, but because patients are accessing their data directly under their HIPAA Right of Access and choosing to share it with a research study directly under their control, individual agreements are not required with each data provider. This is analogous to a participant requesting their paper medical record from their doctor and then sharing that document with a researcher; in both cases the participant/patient is in control of their own data, only now in electronic form.

There are at least two additional types of data sources which researchers can harness through participant-mediated sharing: claims and device data. The CMS Interoperability and Patient Access rule applies to patient access to claims data from government-sponsored health plans. This rule, plus the Blue Button 2.0 program for access to Medicare claims, means that many study participants can also access their claims data and share it with researchers. Claims data complements clinical data from EHRs by adding significant breadth, as most interactions with the healthcare system result in a claim. Meanwhile, data from wearables or other patient devices such as Fitbit, Apple Watch, glucometers, spirometers, and many others can also be shared with researchers and aggregated with clinical and claims data to provide significantly more depth. Since these data flows follow the same access strategy as patient-mediated EHR data, together they provide a compelling approach to accessing broad and deep data.
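One way to picture aggregating clinical, claims, and device data is as a single per-participant timeline. The sketch below assumes each source has already been reduced to simple dated events; the Event shape, the build_timeline helper, and the sample records are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    participant_id: str
    when: date
    source: str   # "ehr", "claims", or "device"
    detail: str


def build_timeline(*sources: list[Event]) -> dict[str, list[Event]]:
    """Merge events from any number of sources into one date-ordered list per participant."""
    timeline: dict[str, list[Event]] = {}
    for records in sources:
        for event in records:
            timeline.setdefault(event.participant_id, []).append(event)
    for events in timeline.values():
        events.sort(key=lambda e: e.when)
    return timeline


# Invented sample data for two participants.
ehr = [Event("p1", date(2023, 3, 14), "ehr", "HbA1c 6.1 %")]
claims = [
    Event("p1", date(2023, 3, 10), "claims", "office visit claim"),
    Event("p2", date(2023, 1, 5), "claims", "pharmacy claim"),
]
device = [Event("p1", date(2023, 3, 12), "device", "daily step count 8200")]

timeline = build_timeline(ehr, claims, device)
for event in timeline["p1"]:
    print(event.when, event.source, event.detail)
```

In this toy example the claim supplies breadth (a visit that the connected EHR source did not record) and the device reading supplies depth between clinical encounters.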

In action: real-world experience with real-world data

Real-world observational data, particularly through study participant-mediated access, is now increasingly available as a valuable resource for clinical research. CareEvolution has been at the forefront of health information exchange for over two decades and is now working with research partners to leverage the wealth of data available from EMRs for clinical research. Recent work with Scripps Research Digital Trials Center is highlighted in a Viewpoint published in the latest JMIR Medical Informatics and touched on here.

The Digital Engagement & Tracking for Early Control & Treatment (DETECT) digital research platform leverages patient-mediated EHR access for a real-world clinical trial to study COVID-19 testing, interventions, and outcomes. Even without a requirement for EMR connection, approximately 10% of participants have already connected at least one EMR data source for access by investigators.

The DETECT-AHEAD sub-study, which does require participants to connect at least one EMR data source, has provided some initial findings which highlight both the potential and the challenge of participant-mediated EMR data access for research. A key opportunity for real-world research is to expand representation in clinical trials. DETECT-AHEAD has shown some enhanced ability to recruit a diverse participant cohort, with a 0.68 male-to-female ratio and 30% racial minorities. However, while the digital divide has lessened, with more than 77% of the total US population owning a smartphone, great disparities in access still exist. More than 28% of the US rural population lacks broadband access, and only 42% of people over 65 years old use a smartphone. Preliminary results from DETECT-AHEAD also show room for improvement, with significantly lower recruitment for seniors, those whose highest education is below grade 12, and those with annual household income below $10,000.

Even with EMR data access and diverse recruitment, there are still significant challenges in realizing value from EMR data for research. Privacy and security concerns require strong processes and protections. Participants should have the ability to identify and potentially edit errors. The quality of EMR data is highly variable, and the data requires a great deal of pre-processing before it is ready to support analysis, including integrating across disparate data sources, dealing with missing data, and standardizing coding. CareEvolution has long supported providers and payors in dealing with such issues to support clinical workflows and is now enabling our research partners to benefit from this experience and our comprehensive suite of tools.
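The three pre-processing steps named above (integrating sources, handling missing data, and standardizing coding) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of CareEvolution's tooling: the record layout, the site names, the local codes, and the LOCAL_TO_LOINC map are all hypothetical.

```python
# Hypothetical map from (source, local code) to a standard LOINC code.
# A real study would rely on a curated terminology resource instead.
LOCAL_TO_LOINC = {
    ("site_a", "A1C"): "4548-4",
    ("site_b", "HGBA1C"): "4548-4",
}


def standardize(records):
    """Map local codes to LOINC, skip records with no value, and drop duplicates."""
    seen = set()
    cleaned = []
    skipped = {"unmapped_code": 0, "missing_value": 0, "duplicate": 0}
    for rec in records:
        loinc = LOCAL_TO_LOINC.get((rec["source"], rec["local_code"]))
        if loinc is None:
            skipped["unmapped_code"] += 1
            continue
        if rec.get("value") is None:
            skipped["missing_value"] += 1
            continue
        # The same result often arrives from more than one connected source.
        key = (rec["participant_id"], loinc, rec["date"], rec["value"])
        if key in seen:
            skipped["duplicate"] += 1
            continue
        seen.add(key)
        cleaned.append({
            "participant_id": rec["participant_id"],
            "loinc": loinc,
            "date": rec["date"],
            "value": rec["value"],
        })
    return cleaned, skipped


# Invented sample records from two sources.
raw = [
    {"participant_id": "p1", "source": "site_a", "local_code": "A1C",
     "date": "2023-03-14", "value": 6.1},
    {"participant_id": "p1", "source": "site_b", "local_code": "HGBA1C",
     "date": "2023-03-14", "value": 6.1},
    {"participant_id": "p2", "source": "site_b", "local_code": "HGBA1C",
     "date": "2023-02-01", "value": None},
    {"participant_id": "p2", "source": "site_a", "local_code": "GLU",
     "date": "2023-02-01", "value": 98},
]

cleaned, skipped = standardize(raw)
print(cleaned)
print(skipped)
```

Counting what was skipped, and why, matters as much as the cleaned output: it tells the study team whether gaps come from unmapped codes, missing values, or overlap between sources.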

Furthermore, not all EHR data is relevant to a particular research question. To ensure sufficient coverage of relevant data points, researchers should have a sense of data availability, perhaps by conducting initial pilot studies, and set reasonable expectations. Targeting specific data points could potentially improve data availability. For example, a study could ask participants to connect to particular types of providers (e.g., any provider where you have gotten an HbA1c lab, related to your cardiac condition).
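A pilot study's data availability can be summarized with a simple coverage measure: for each data element the research question needs, what fraction of participants have at least one record? The sketch below assumes each participant's connected data has already been reduced to a set of standard codes; the coverage function and the pilot data are hypothetical.

```python
def coverage(participants: dict[str, set[str]], required_codes: set[str]) -> dict[str, float]:
    """Fraction of participants with at least one record for each required code."""
    total = len(participants)
    if total == 0:
        return {code: 0.0 for code in sorted(required_codes)}
    return {
        code: sum(code in codes for codes in participants.values()) / total
        for code in sorted(required_codes)
    }


# Invented pilot data: codes observed per participant.
# "4548-4" is the LOINC code for HbA1c; "2345-7" is a serum/plasma glucose code.
pilot = {
    "p1": {"4548-4", "2345-7"},
    "p2": {"4548-4"},
    "p3": set(),          # connected a source, but no relevant labs found
    "p4": {"4548-4"},
}

report = coverage(pilot, {"4548-4", "2345-7"})
print(report)
```

A low fraction for a required element is a signal to adjust expectations or to prompt participants to connect the kind of provider likely to hold that data.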

Now is the time to act

As we look ahead to the future of health information exchange for research, there are several important points to consider:

While the emerging solutions may not be perfect, they will continue to improve on the scope and coverage of data available for research today.
Access mechanisms available today, including C-CDA and FHIR used in HIE and patient-mediated exchange, will form the basis for the future QHINs or other networks, so investments today will not be wasted.
There is a great deal of EHR data already available today, so researchers should learn to incorporate these data in their data pipelines and analytics now.

The next post in this EHR data in clinical research series outlines the challenges of using EHR data for research and ways to overcome them.