Patient data curation and enrichment solutions struggle to handle diverse data formats, and most fail on uncoded data, missing opportunities to code unstructured and free-text content. Payors, providers, health tech vendors and life sciences companies face increasing pressure to use analytics-ready data for operational efficiency, product enhancements and patient care, so we compiled six steps to guide the evaluation of healthcare data curation and enrichment tools:
- Setup time: Assess how long it takes to set up the tool for the evaluation. Some tools take weeks or months, while others, such as API-based technology, can be quickly integrated into your existing pipeline. Lengthy setup times are a red flag for outdated technology.
- Ingestion success rate: Assess how well the tool ingests real-world data (that is, ‘dirty’ or non-doctored data) ‘out of the box’. Most tools require manual massaging, template creation, scripting and hard-coded fixes to ingest even the slightest format variation. Data loss can also occur at this stage.
- Deduping: Review the deduping methodology to be sure that its assumptions about what constitutes duplicate data are consistent with your requirements. Find out whether it is too aggressive or too conservative for your data.
- Data preservation and enrichment: Stress test the enrichment capabilities, including how well the tool identifies and codes poorly coded and free-text data. After data has been processed, check for missed enrichment opportunities. Ensure that no data is lost as it traverses the curation and enrichment pipeline. Many tools toss out data that deviates even slightly from specs.
- Manual manipulation: After the data has been processed, watch out for manual manipulation of the data by vendor staff. Many software tools are incapable of conversion and enrichment without human intervention at numerous points in the pipeline.
- Outbound data format: Assess the usability of the outbound file formats and whether each format is proprietary, with a steep learning curve, or a contemporary standard (e.g. FHIR). Consider whether these formats fit into your existing architecture and what it could cost to integrate them into platforms that need unified, enriched, deduplicated data.
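
Several of these checks can be turned into numbers so that tools are compared on the same footing. The sketch below is one hedged way to do that in Python; it is not part of any vendor's product. The `EvaluationCounts` fields, the metric names and the sample figures are all hypothetical, and the counts are assumed to be gathered by you during the trial (for example, by counting records before and after processing).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EvaluationCounts:
    """Counts collected during a tool trial. All field names are hypothetical."""
    files_submitted: int     # raw source files handed to the tool
    files_ingested: int      # files accepted without manual fixes
    records_in: int          # records present in the submitted data
    records_out: int         # records present in the outbound data
    duplicates_removed: int  # records the tool reported as duplicates
    uncoded_in: int          # inbound records with missing or free-text codes
    newly_coded_out: int     # of those, records that left with a standard code


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(c: EvaluationCounts) -> dict[str, float]:
    """Turn raw trial counts into comparable rates."""
    # Records that disappeared without being reported as duplicates.
    unexplained_loss = c.records_in - c.records_out - c.duplicates_removed
    return {
        "ingestion_success_rate": ratio(c.files_ingested, c.files_submitted),
        "dedup_rate": ratio(c.duplicates_removed, c.records_in),
        "unexplained_loss_rate": ratio(unexplained_loss, c.records_in),
        "enrichment_rate": ratio(c.newly_coded_out, c.uncoded_in),
    }


# Illustrative figures only; they do not describe any real tool.
example = EvaluationCounts(
    files_submitted=40,
    files_ingested=34,
    records_in=10_000,
    records_out=9_200,
    duplicates_removed=700,
    uncoded_in=2_500,
    newly_coded_out=1_500,
)

for name, value in summarize(example).items():
    print(f"{name}: {value:.1%}")
```

A nonzero unexplained loss rate suggests that records were dropped somewhere in the pipeline. A dedup rate that looks unusually high or low is a prompt to review the tool's matching assumptions against your own requirements.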
Following these steps can help anyone integrating patient data between internal systems avoid manual effort for each source and format being ingested, and get the most from existing data sources and architecture.
