EHR data in clinical research (part 1): more data, less effort

What if researchers could access high fidelity data faster, without putting the burden on participants for data capture?

Traditional data collection for research is expensive, time-consuming, and often places the burden on participants to travel to a clinic or respond to a questionnaire. Electronic health records (EHRs) are a largely untapped and robust data resource with high potential value for clinical research. EHR data can provide a broad, deep stream of historical and ongoing data without burdening participants or researchers with collection.

This post is the first in a series that will cover what value EHR data can provide, what data is available and how, and how to overcome challenges in extracting insights from EHR data.

Access high quality data

EHR data is collected in a variety of health care settings through encounters and data sharing across health networks. EHRs capture numerous data types including:

Diagnoses, signs, and symptoms

Labs, measurements, and reports

Health service utilization and procedures

Medications, including both prescriptions and pharmacy fills, and immunizations

Allergies and adverse events

Family, social, and behavioral history

Traditional research—in which data collection is clinic-based, often using participant surveys to gather information—must always balance the depth and scope of data with the cost and burden of collection. Furthermore, by definition, data from these methods are not collected in the normal course of a patient’s encounters with the health care system, and thereby may introduce gaps, bias, or artifacts. In contrast, EHRs can provide in-depth, real-world observational data without additional burden on researchers or participants for collection. Having access to these data for individual patients within a particular clinical context opens up a wealth of information for clinical research.

EHR systems can vary widely in how they capture and store data. However, the basic data points tend to be consistent across care settings and encounter types, and the transmission of data across systems is becoming increasingly standardized. A large portion of relevant EHR data is now consistently captured in structured fields which can be leveraged for research analytics.

Furthermore, recent advances in technical standards and regulations are enabling study participants to give researchers access to this high-quality data. A later post in this series will cover these recent advances in data accessibility and how to make use of them.

[Image: Screenshot of the Rewards tab in the PROGRESS mobile app]

In action:
The PRediction Of Glycemic RESponse Study (PROGRESS), sponsored by the Scripps Research Digital Trials Center and Tempus Labs, has enrolled over 400 participants who have connected at least one EHR to the MyDataHelps™ platform. The study seeks to understand the relationship between glucose levels and nutrition, activity, genomic data, the microbiome, and other biomarkers. To this end, researchers are accessing EHR data to supplement questionnaires and a variety of device data, all collected remotely by participants without any clinic or site visits. EHR data will provide, for example:

notation of conditions such as anemia which can impact interpretation of lab findings,
prescriptions and pharmacy fills for insulin to support analysis of diabetic control and medication adherence, and
laboratory data such as HbA1c to compare to at-home glycemic measurements.

Read the PROGRESS case study

Engage participants

Participant engagement is a critical factor that frequently limits study duration and completeness of datasets. In addition, engagement may impact the effectiveness of an intervention under study. EHR data collection can support participant engagement in at least two ways. First, EHR data can be provided back to participants in a curated and intuitive layout, offering direct value for participants. Receiving value from a study can make a participant more likely to continue with the study and complete longitudinal tasks.

Second, when a study includes behavioral interventions, EHR data can be used to provide information to participants or to adjust timing and wording of nudges. In this case, EHR data, placed in appropriate context, can become part of the intervention. A cardiology study, for example, may compare blood pressure readings captured from a home-based device to in-office values retrieved from an EHR and nudge the participant to consider discussing how ‘white-coat syndrome’ or other context might be contributing to differences.
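As a rough illustration of the cardiology example above, the comparison behind such a nudge could be sketched as follows. This is a minimal sketch, not how any particular study implements it: the function name `should_nudge`, the 10 mmHg gap, and the sample readings are all hypothetical, and a real study would set its own clinically reviewed criteria.

```python
from statistics import mean

# Hypothetical threshold (assumption): the gap, in mmHg, between average
# in-office and average home systolic readings that would trigger a nudge.
WHITE_COAT_GAP_MMHG = 10


def should_nudge(home_systolic, office_systolic, gap=WHITE_COAT_GAP_MMHG):
    """Return True when in-office readings average at least `gap` mmHg
    higher than home-device readings."""
    if not home_systolic or not office_systolic:
        return False  # not enough data from one of the sources to compare
    return mean(office_systolic) - mean(home_systolic) >= gap


# Illustrative values only, not data from any study.
home = [122, 118, 125]   # systolic readings from a home-based device
office = [141, 138]      # in-office systolic values retrieved from an EHR
nudge = should_nudge(home, office)
```

Keeping the threshold as a parameter lets a study team tune when a nudge fires without changing the comparison itself.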

[Image: All of Us screen]

In action:
The All of Us Research Program, sponsored by the National Institutes of Health, has recruited a large number of participants from across the country, over 40,000 of whom have consented to provide EHR data and have connected to at least one EHR system in addition to answering survey questions, providing samples, and wearing biometric devices.

A participant app developed by CareEvolution aggregates, cleans, and de-identifies EHR data for researchers, while also providing participants with a health dashboard summarizing data collected from EHRs and devices. This feature provides value directly back to participants and can serve as motivation to continue to engage with the research program.

Read the All of Us case study

Reduce participant burden and validate participant-provided data

Data collection is often the most expensive component of clinical research and lack of adherence to data collection protocols is one of the biggest causes of participant drop-off. EHR data can provide a way to access data without burdening participants, supplementing questionnaires and device- or site-collected data. Once consent is provided and connections are established, historical and ongoing data can be retrieved automatically.

There are limitations to the number of survey questions that can reasonably be asked of participants. Survey completion is inversely related to survey length and frequency, and historical data is often lost to recall bias. Utilizing EHR data can increase capture of both the number and frequency of data points, increase the depth of data available by providing historical data, and facilitate the monitoring of longitudinal health outcomes—all without increasing participant burden.

In addition to supplementing participant-provided data, EHRs can also complement these data. By collecting the same data points through both mechanisms, EHR data can be used to corroborate data collected from participant questionnaires, enabling a study to compare and validate findings. EHRs can also provide “real-world” data obtained in the context of routine patient care, whereas data collected specifically in trial settings may produce findings which are less generalizable.
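To make the corroboration idea concrete, here is a minimal sketch of comparing one data point collected through both mechanisms. The participant IDs, the dictionaries, and the helper names `agreement_rate` and `discordant` are hypothetical and exist only for illustration; real EHR data would first need to be mapped to the same definition as the survey question.

```python
def agreement_rate(self_reported, ehr):
    """Fraction of participants present in both sources whose values match.

    Returns None when no participant appears in both sources.
    """
    shared = self_reported.keys() & ehr.keys()
    if not shared:
        return None
    matches = sum(1 for pid in shared if self_reported[pid] == ehr[pid])
    return matches / len(shared)


def discordant(self_reported, ehr):
    """Sorted IDs of participants whose two sources disagree."""
    shared = self_reported.keys() & ehr.keys()
    return sorted(pid for pid in shared if self_reported[pid] != ehr[pid])


# Illustrative values only: True means the condition was reported or recorded.
survey = {"p1": True, "p2": False, "p3": True, "p4": False}
records = {"p1": True, "p2": True, "p3": True, "p5": False}

rate = agreement_rate(survey, records)   # compares p1, p2, p3 only
to_review = discordant(survey, records)  # participants worth a closer look
```

Participants found in only one source are left out of the comparison rather than counted as disagreements, since a missing record is not the same as a conflicting one.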

[Image: Phone showing DETECT screen]

In action:
The Digital Engagement & Tracking for Early Control & Treatment (DETECT) study, sponsored by the Scripps Research Digital Trials Center, has enrolled over 4,000 participants who have connected at least one EHR to the MyDataHelps™ platform. The study seeks to understand the relationship between a variety of biometrics and COVID-19 infection, in order to improve early detection of infection and illness. Researchers are using EHR data to access COVID-19 lab results and vaccination status.

Data collection is still underway, but preliminary analysis shows differences between participant-reported COVID-19 diagnoses and lab results and the corresponding values obtained from EHR data. These differences may be due to timing, incomplete data, or other factors, and may well become insignificant as the size of the dataset increases. If they persist, they may provide additional insights into the reliability of and relationships between participant-collected and EHR data.

Read about the DETECT study

Ultimately, EHR data can be a valuable supplement or complement to participant- and site-collected data for research. Post #2 in this series will address availability of EHR data as well as ways to maximize data access and diverse representation. Post #3 will discuss ways to address the data quality and interpretation challenges of putting EHR data to effective use in research.