Strategic Health IT Advanced Research Projects (SHARP) Research Focus Area: 4. Secondary Use of EHR Data Increasing efficiency of patient care through electronic healthcare records
Mission To enable the use of EHR data for secondary purposes, such as clinical research and public health, by leveraging health informatics to generate new knowledge, improve care, and address population needs. SHARPn is committed to open-source resources that can scale industrially to address barriers to the broad-based, facile, and ethical use of EHR data for secondary purposes. SHARPn will collaborate to create, evaluate, and refine informatics artifacts that advance the capacity to efficiently leverage EHR data to improve care, generate new knowledge, and address population needs. Secondary use of EHR data means using data collected in clinical care to learn about health and disease, improve care delivery, and help clinicians provide better care to individuals and patient populations. Electronic data capture means we can use informatics tools to generate new knowledge about health and healthcare.
Mission To support the community of EHR data consumers by developing open-source tools, services, and scalable software. SHARPn will make these artifacts available to the community of secondary EHR data users as open-source tools, services, and scalable software. The SHARPn Consortium of academic, non-profit, and private sector researchers and engineers has been funded by the Office of the National Coordinator for HIT to research and develop products that help the healthcare community responsibly access the wealth of data being collected in the course of care.
Goals Research that will enable aggregation of different types of healthcare data collected from a variety of sources. Services to identify diseases, risk factors, eligibility for clinical studies, or adverse events in clinical and population-based settings. Tools to understand and improve data quality. Research that will generate a framework of open-source services that can be dynamically configured to transform EHR data into standards-conforming, comparable information suitable for large-scale analyses, inferencing, and integration of disparate health data. Services for phenotype recognition (disease, risk factor, eligibility, or adverse event) in medical centers and population-based settings. Analysis of data quality and repair strategies. The goals of SHARPn are to research how to use healthcare data from multiple electronic sources, such as clinical notes and labs; to develop services that allow clinicians and researchers to identify patient cohorts within their communities; and to build tools that judge data quality and so help determine the strength of the conclusions drawn.
[Figure labels: Patient-Specific Health Information; Aggregate Outcomes Data; Research Results; Knowledge and Insights] Overall, the software, or “middleware”, developed by SHARPn will allow researchers and clinicians to pull together very different types of healthcare data and ask questions about disease, prevention, outcomes, and care delivery. Answering these questions requires a robust, high-quality dataset. This type of dataset is what the SHARPn middleware will help generate.
Selected SHARPn Projects Project 1 – Clinical Data Normalization Services and Pipelines Project 2 – Natural Language Processing Project 3 – High Throughput Phenotyping Project 4 – Scaling Capacity
Clinical Data Normalization Project 1 – Clinical Data Normalization Clinical data comes in many different forms, even for the same piece of information. For example, age could be reported as 40 years for an adult, 18 months for a toddler, or 3 days for an infant.

Un-normalized    Normalized (days)    Normalized (months)
40 years         14,610               480
18 months        548                  18
3 days           3                    0.1

(Values assume an average year of 365.25 days and are rounded.) Without normalization, data can’t be used as a single dataset. Data normalization is at the heart of secondary use of clinical data. If the data is not comparable between sources, it can’t be aggregated into large datasets and used reliably to answer research questions or survey populations from multiple health organizations.
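The age example can be sketched in a few lines of Python. This is only an illustration of the idea, not the SHARPn or HOSS normalization code: the function names and the conversion factor (an average year of 365.25 days) are assumptions chosen for the example.

```python
# Assumed conversion factors; a real pipeline would follow its own standard.
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # 30.4375

DAYS_PER_UNIT = {
    "day": 1.0,
    "month": DAYS_PER_MONTH,
    "year": DAYS_PER_YEAR,
}


def age_in_days(text: str) -> float:
    """Convert an age such as '18 months' to days."""
    value, unit = text.split()
    # Accept both singular and plural unit names ('year', 'years').
    unit = unit.lower().rstrip("s")
    return float(value) * DAYS_PER_UNIT[unit]


def age_in_months(text: str) -> float:
    """Convert an age such as '40 years' to months."""
    return age_in_days(text) / DAYS_PER_MONTH


if __name__ == "__main__":
    for raw in ("40 years", "18 months", "3 days"):
        print(raw, round(age_in_days(raw)), round(age_in_months(raw), 1))
```

Once every source reports age in one unit, records from different organizations can be compared and aggregated directly.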
Regenstrief and the Health Open Source Software Pipeline Project 1 – Clinical Data Normalization The HOSS Pipeline can receive clinical data in multiple formats and transform it into a common model for secondary use. SHARPn is partnering with Regenstrief to implement the HOSS Pipeline on a UIMA platform. The Health Open Source Software Pipeline (HOSS) is a collaboration among Regenstrief, Misys, and Mirth Corp. to develop tools for health information exchange (HIE). Regenstrief has determined which data elements are most important to normalize to make data useful, and how best to normalize them.
Why Natural Language Processing? Project 2 – Natural Language Processing …because a lot of clinical data is captured in free-text notes. Extracting structured information facilitates searching, comparing, and summarizing, which makes it easier to enable research, improve standards of care, and evaluate outcomes. To perform research, to improve standards of care, and to evaluate treatment outcomes easily (and ideally in an automated fashion), access to the content of these documents is required. The knowledge contained in unstructured textual documents (e.g., pathology reports, clinical notes) is critical to achieving all of these goals. For instance, clinical research usually requires the identification of cohorts that follow precisely defined patient- and disease-related inclusion and exclusion parameters. Natural Language Processing (NLP) systems can extract structured information from these notes so that it can be searched (for example, for a diagnosis), compared (perhaps to find co-morbidities common to a certain diagnosis), and summarized.
Clinical Text Analysis and Knowledge Extraction System Project 2 – Natural Language Processing cTAKES is open-source software for natural language processing. One application is a medication annotator that can extract medication information from free text, such as frequency, dosage, strength, form, route, duration, and drug change status. https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/OHNLP_Documentation_and_Downloads By processing a clinical note, the annotator can extract what medication is being taken, when, how, for how long, and how often. It can also determine if the medication is no longer being taken.
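To give a feel for the kind of structured output a medication annotator produces, here is a toy Python sketch. It is not cTAKES and does not use the cTAKES or UIMA APIs: the regular expressions, the attribute names, and the function `extract_medication_attributes` are all hypothetical and cover only a handful of phrasings.

```python
import re

# Hypothetical patterns for illustration only; a real annotator such as
# cTAKES relies on dictionaries and trained components, not these regexes.
PATTERNS = {
    "strength": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml|units)\b", re.I),
    "form": re.compile(r"\b(?:tablets?|capsules?|solution|patch)\b", re.I),
    "route": re.compile(r"\b(?:by mouth|orally|po|iv|topically)\b", re.I),
    "frequency": re.compile(
        r"\b(?:once|twice|three times)\s(?:a day|daily)\b|\b(?:bid|tid|qid|daily)\b",
        re.I,
    ),
    "duration": re.compile(r"\bfor \d+ (?:days?|weeks?|months?)\b", re.I),
    "change_status": re.compile(
        r"\b(?:discontinued|stopped|started|increased|decreased)\b", re.I
    ),
}


def extract_medication_attributes(sentence: str) -> dict:
    """Return the first match for each attribute found in the sentence."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            found[name] = match.group(0)
    return found


if __name__ == "__main__":
    note = "Metformin 500 mg tablet by mouth twice daily for 30 days."
    print(extract_medication_attributes(note))
    print(extract_medication_attributes("Lisinopril was discontinued."))
```

The point of the sketch is the shape of the result: free text goes in, and named fields that can be searched, compared, and summarized come out.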
High-Throughput Phenotyping Project 3 – High Throughput Phenotyping Phenotyping is identifying a set of characteristics about a patient, such as a diagnosis, demographics, or a set of lab results. A well-defined phenotype will produce a group of patients who might be eligible for a clinical study or a program to support high-risk patients. A phenotype is an observable characteristic, such as brown hair, height, blood type, or cholesterol level. Identifying a cohort of patients with a similar phenotype is the first step in asking a question about what their health outcomes were or in targeting a specific therapy.
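In code, a phenotype definition can be thought of as a rule applied to each patient record, with the matching patients forming the cohort. The Python sketch below is a hypothetical illustration: the `Patient` fields, the lab name `hba1c_percent`, and the thresholds are invented for the example and are not an eMERGE or SHARPn algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    age_years: int
    diagnoses: set = field(default_factory=set)
    labs: dict = field(default_factory=dict)  # lab name -> most recent value


def matches_phenotype(patient: Patient) -> bool:
    """Hypothetical phenotype: adult with type 2 diabetes and elevated HbA1c."""
    return (
        "type 2 diabetes" in patient.diagnoses          # a diagnosis
        and patient.age_years >= 18                     # demographics
        and patient.labs.get("hba1c_percent", 0.0) >= 8.0  # a lab result
    )


def find_cohort(patients: list) -> list:
    """Return the IDs of all patients who match the phenotype."""
    return [p.patient_id for p in patients if matches_phenotype(p)]


if __name__ == "__main__":
    patients = [
        Patient("A", 54, {"type 2 diabetes"}, {"hba1c_percent": 9.1}),
        Patient("B", 61, {"type 2 diabetes"}, {"hba1c_percent": 6.4}),
        Patient("C", 12, {"asthma"}, {}),
    ]
    print(find_cohort(patients))
```

Each clause in the rule corresponds to one of the characteristic types named above: a diagnosis, demographics, and a lab result.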
High-Throughput Phenotyping Project 3 – High Throughput Phenotyping The SHARPn HTP project will allow clinicians and investigators to identify patients from their EHR data. The project is developing phenotyping processes, algorithms for specific diseases, and tools to incorporate data from multiple sites. It provides EHR-derived phenotyping processes, algorithms, and technical tools (widgets) to prepare for evaluating transportability across multiple sites. The HT in HTP stands for high throughput. In the past, phenotyping patients meant a laborious paper chart review. Now, using the tools developed by SHARPn and its collaborators, the majority of this task can be completed much more efficiently using electronic data.
eMERGE Phenotype Library Project 3 – High Throughput Phenotyping With the eMERGE Consortium, algorithms have been developed to identify the following phenotypes from EHR data: Alzheimer’s; Dementia; Diabetic Retinopathy; Height; Hypothyroidism; Lipids; Low HDL; Peripheral Arterial Disease; QRS duration; Red Blood Cell Indices; Resistant Hypertension; Type II Diabetes Phenotype; Type II Diabetes Pseudocode; White Blood Cell Indices. eMERGE focuses on using EHR data to phenotype individuals for genome-wide association studies. The same phenotyping algorithms can be used for a variety of purposes. SHARPn and eMERGE have collaborated to expand the suite of phenotypes for which algorithms are available.
Project 4 – Scaling Capacity I’ll take “Informatics Acronyms” for $1000, Alex. What do SHARPn and Jeopardy! have in common? The answer is… “UIMA” Right again, WATSON! (AP Photo/Seth Wenig)
UIMA: Unstructured Information Management Architecture Project 4 – Scaling Capacity UIMA lets Watson play Jeopardy! and make sense of EHR data.
SHARPn Leverages "Deep Question Answering" of UIMA Project 4 – Scaling Capacity “Deep Question Answering” technology is the ability of a computer to understand questions posed in natural human language. IBM expects the science behind Watson to help extend the power of advanced analytics to make sense of vast quantities of structured and unstructured data.
Project 4 – Scaling Capacity UIMA in Healthcare Deep QA technology could provide information to clinicians to help diagnose and treat patients. UIMA software is available from Apache under the Apache License, a robust, open license. UIMA annotators are already available for healthcare purposes, such as cTAKES, the NLP tool previously described. SHARPn encourages the HIT community to use UIMA through its Apache license to develop and deliver secondary use functionality in HIT products.
SHARPn in a real-world healthcare setting The SE Minnesota Beacon needs to identify high-risk diabetes patients in its population. The Beacon Community consists of multiple sites, with multiple EHR systems. How will it look at data across the entire population? The Beacon Communities are looking for ways to use their HIT infrastructure to provide better, more efficient care. The Southeastern Minnesota Beacon will be using the SHARPn middleware to identify high-risk diabetes patients in its population to target resources where they will have the greatest impact.
SHARPn Pilot at the SEMN Beacon The SHARPn pilot will utilize… Natural language processing High-Throughput Phenotyping using a diabetes algorithm, and Deep Question Answering on the UIMA platform …to identify high-risk patients in the Beacon population and effectively target resources.
The SHARPn Community: Agilex Technologies, Inc.; Centerphase Solutions, Inc.; Clinical Data Interchange Standards Consortium (CDISC); Deloitte; Group Health Research Institute; Harvard Children’s Hospital Boston; IBM T.J. Watson Research Center; Intermountain Healthcare; Mayo Clinic; Massachusetts Institute of Technology; Minnesota Health Information Exchange (MN HIE); University at Albany - SUNY; University of Colorado; University of Pittsburgh; University of Utah
To watch the Jeopardy! Challenge... NOVA’s “Smartest Machine On Earth” Premiered February 9, 2011 on PBS. Jeopardy! matches aired February 14, 15, and 16. Learn more about WATSON at: http://www-943.ibm.com/innovation/us/watson/ PBS’ NOVA produced a new film on artificial intelligence with unique access to IBM's Watson and the machine's bid to compete on Jeopardy!