Clinical research and the electronic medical record: Interdisciplinary research agendas. Michael G. Kahn, MD, PhD. Biomedical Informatics Core Director, Colorado Clinical and Translational Sciences Institute (CCTSI); Professor, Department of Pediatrics, University of Colorado; Director, Clinical Informatics, The Children’s Hospital, Denver
A Lifecycle View of Clinical Research [Figure: lifecycle diagram with the labels: New Research Questions; Study Design & Approval; Study Setup; Recruitment & Enrollment; Study Execution; Submission & Reporting; Evidence-based Review; Clinical Practice; Public Information; T1; T2; Biomedical Research; Translational Research; Investigator Initiated; Industry Sponsored; Commercialization; Basic Research Data; Pilot Studies; Clinical Trial Data; Required Data Sharing; Outcomes Reporting; Outcomes Research; EMR Data; Evidence-based Patient Care and Policy]
The Promise of the Electronic Medical Record Merging prospective clinical research & evidence-based clinical care –A “front-end” focus Improving care one patient at a time (decision support) Merging clinical care and clinical research data collection Clinically rich database for retrospective clinical research –A “back-end” focus Making discoveries across populations of patients Improving care at the population / policy level
Grand Vision: Any clinical investigator can “belly up to the bar” for research-quality data
The Tale of A Trivial Data Request The original data request: “For an upcoming grant application, how many patients were seen recently with neurofibromatosis-1 (NF-1) and scoliosis?” “Recently seen” = an encounter of any type since 1/1/2008 NF-1: ICD-9 code starts with “237.7” Scoliosis: ICD-9 code starts with “737.3” Result: N=15
The Tale of A Simple Data Query Drilling down: –This query required both diagnoses to be coded on the same encounter (event). [Figure: timeline from 1/1/ to today, N(Pt), with a single encounter coded Dx1 = NF-1 and Dx2 = Scoliosis]
The Tale of A Simple Data Query Second query: –NF-1 and Scoliosis diagnoses can be coded on different encounters, both within the time window –N = 28 [Figure: timeline from 1/1/ to today, N(Pt), with two encounters, one coded Dx1 = NF-1 and one coded Dx2 = Scoliosis]
The Tale of A Simple Data Query Investigator still did not like the answer: –NF-1 is a life-long genetic illness –Scoliosis develops as a complication –Therefore: the NF-1 diagnosis can occur at any time; only scoliosis needs to be “recently seen” –N = 47 [Figure: timeline, N(Pt), with the Dx1 = NF-1 encounter at any time and the Dx2 = Scoliosis encounter between 1/1/ and today]
One Question Three temporal structures Three different answers N = 15 N = 28 N = 47
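The three readings of the NF-1 and scoliosis request can be written as three small set queries. The sketch below is illustrative only: the encounter rows, patient IDs, and function names are hypothetical, and the toy counts are not the N = 15 / 28 / 47 reported above.

```python
from datetime import date

# Hypothetical toy encounters: (patient_id, encounter_id, encounter_date, ICD-9 codes).
ENCOUNTERS = [
    ("p1", "e1", date(2008, 3, 1), ["237.71", "737.30"]),  # both codes, same encounter
    ("p2", "e2", date(2008, 5, 1), ["237.70"]),            # NF-1 inside the window
    ("p2", "e3", date(2008, 9, 1), ["737.30"]),            # scoliosis inside, other encounter
    ("p3", "e4", date(2005, 2, 1), ["237.71"]),            # NF-1 before the window
    ("p3", "e5", date(2008, 6, 1), ["737.34"]),            # scoliosis inside the window
    ("p4", "e6", date(2006, 1, 1), ["237.71", "737.30"]),  # both codes, before the window
]
WINDOW_START = date(2008, 1, 1)  # "recently seen" = any encounter since 1/1/2008


def has_prefix(codes, prefix):
    """True if any ICD-9 code on the encounter starts with the prefix."""
    return any(code.startswith(prefix) for code in codes)


def patients(prefix, since=None):
    """Patients with a matching code, optionally only on or after `since`."""
    return {pid for pid, _, when, codes in ENCOUNTERS
            if has_prefix(codes, prefix) and (since is None or when >= since)}


def same_encounter():
    """Query 1: both diagnoses coded on one encounter inside the window."""
    return {pid for pid, _, when, codes in ENCOUNTERS
            if when >= WINDOW_START
            and has_prefix(codes, "237.7") and has_prefix(codes, "737.3")}


def both_in_window():
    """Query 2: both diagnoses inside the window, on any encounters."""
    return patients("237.7", WINDOW_START) & patients("737.3", WINDOW_START)


def nf1_ever_scoliosis_recent():
    """Query 3: NF-1 at any time, scoliosis inside the window."""
    return patients("237.7") & patients("737.3", WINDOW_START)


print(len(same_encounter()), len(both_in_window()), len(nf1_ever_scoliosis_recent()))
```

Each query relaxes one temporal constraint of the previous one, so the result sets are nested and the counts can only grow.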
Tale of a research query Use of C-Reactive Protein as a marker of clinical infection in the NICU First Temporal Structure: No Abx 2+ days CRP test 2 days Abx Start Days(Antibiotics) Abx Stop
Tale of a research query This is not right! No Abx 2+ days CRP test 2 days Abx Start Days(Antibiotics) Abx Stop Abx Stop could occur during the 2-day window for the CRP test, as long as Abx Stop occurred after the CRP test
Tale of a research query Does this capture the desired relationship? Want to allow for Abx Stop to occur within the 2-day CRP window but only if after CRP test. But do not want to require Abx Stop in the 2-day window No Abx 2+ days CRP test Abx Stop 2 days Abx Start Days(Antibiotics)
Tale of a research query What if I do want to constrain Abx Stop to the 2-day window? What does that look like? Is the difference visually obvious? No Abx 2+ days CRP test Abx Stop 2 days Abx Start Days(Antibiotics) No Abx 2+ days CRP test Abx Stop 2 days Abx Start Days(Antibiotics)
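The two Abx Stop variants look nearly identical as diagrams but differ by a single predicate. The sketch below makes that explicit under stated assumptions: times are measured in days from Abx Start, the 2-day CRP window is taken to begin at Abx Start, and the "No Abx 2+ days" precondition is left out. The class and function names are hypothetical.

```python
from dataclasses import dataclass

WINDOW_DAYS = 2.0  # assumed: the CRP window spans the 2 days after Abx Start


@dataclass
class Course:
    """Hypothetical antibiotic course; times are days relative to Abx Start (day 0)."""
    crp_test: float
    abx_stop: float


def crp_in_window(course):
    """The CRP test falls inside the 2-day window."""
    return 0.0 <= course.crp_test <= WINDOW_DAYS


def stop_after_crp(course):
    """Abx Stop may be inside or outside the window, but only after the CRP test."""
    return crp_in_window(course) and course.abx_stop > course.crp_test


def stop_after_crp_within_window(course):
    """Stricter variant: Abx Stop follows the CRP test and is inside the window."""
    return stop_after_crp(course) and course.abx_stop <= WINDOW_DAYS


long_course = Course(crp_test=1.0, abx_stop=5.0)
short_course = Course(crp_test=0.5, abx_stop=1.5)
print(stop_after_crp(long_course), stop_after_crp_within_window(long_course))
print(stop_after_crp(short_course), stop_after_crp_within_window(short_course))
```

A course that stops on day 5 satisfies the first structure but not the second, which is exactly the difference that is hard to see in the diagrams.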
Different temporal structures - Different answers Different Clinical Meanings/Interpretations
Representing Meaningful Temporal Relationships Three weeks prior to admission, a bright red patch appeared under the patient's eye. The patient developed a maculopapular rash that spread to her hands and then her knees the following day. On admission, she began having fever to 40 °C, which resolved by HD #2. She was discharged on HD #8.
Original Assertions 6 Fever resolved Red Patch appeared Hospital Admission 3 4 Rash over Hands Rash over Knees 5 Fevers 7 Hospital Discharge Explosive number of derived temporal concepts (full transitive closure) Not all of them are useful. But which ones?
Full Temporal Closure 6 Fever resolved Red Patch appeared Hospital Admission 3 4 Rash over Hands Rash over Knees 5 Fevers 7 Hospital Discharge Explosive number of derived temporal concepts (full transitive closure) Not all of them are useful. But which ones?
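How quickly derived relations accumulate can be seen by closing a handful of "before" assertions under transitivity. This is a minimal sketch: the event names and the five asserted pairs are one hypothetical encoding of the narrative above, not the node numbering used in the figure.

```python
# Hypothetical "before" assertions read off the narrative (earlier, later).
ASSERTED = {
    ("red_patch", "rash_hands"),
    ("rash_hands", "rash_knees"),
    ("red_patch", "admission"),
    ("admission", "fever_resolved"),
    ("fever_resolved", "discharge"),
}


def transitive_closure(edges):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present, until stable."""
    closure = set(edges)
    while True:
        new = {(a, d)
               for (a, b) in closure
               for (c, d) in closure
               if b == c} - closure
        if not new:
            return closure
        closure |= new


closure = transitive_closure(ASSERTED)
derived = closure - ASSERTED
print(len(ASSERTED), "asserted,", len(derived), "derived")
for pair in sorted(derived):
    print(pair)
```

For a fully ordered chain of n events the closure holds n(n-1)/2 pairs, so the derived relations outnumber the asserted ones as soon as the record grows, and nothing in the closure itself says which pairs are clinically useful.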
Surgical cut time Abx start time Abx stop time Abx redose time Abx d/c time Time Milestones Associated with Surgical Antibiotics Prophylaxis Eight (of 10) clinically-meaningful time intervals Which ones are clinically relevant? Which ones have recommendations? Which ones can we extract?
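The "of 10" is the number of unordered pairs among the five milestones. A short sketch that enumerates them is given below; it does not attempt to say which eight are clinically meaningful, since the slide does not list them.

```python
from itertools import combinations

# The five time milestones named above.
MILESTONES = [
    "Surgical cut time",
    "Abx start time",
    "Abx stop time",
    "Abx redose time",
    "Abx d/c time",
]

# Every unordered pair of milestones defines one candidate time interval.
intervals = list(combinations(MILESTONES, 2))

print(len(intervals))
for first, second in intervals:
    print(f"{first} <-> {second}")
```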
Supporting Ad-Hoc Queries: Who is the User? Clinically knowledgeable but data-naive clinicians Goal: To ensure underlying temporal assumptions are explicit What type of user-interface visual paradigm would support this type of interactive query? –What metadata support is needed for clinically meaningful derived temporal concepts?
PatternFinder (Lam: University of Maryland) From: Lam. Searching Electronic Health Records for Temporal Patterns. A Case Study with Azyxxi,
Key Querying Features Relational operators –“relative increase greater than X” –“relative increase greater than X%” –“relative decrease greater than X” –“relative decrease greater than X%” –“less than value in event X” –“equal to value in event X” –“not equal to value in event X” –“within X prior to (relative)” –“within X following (relative)” –“after X (relative)” –“before X (relative)” –“is equal to (relative)” From: Lam. Searching Electronic Health Records for Temporal Patterns. A Case Study with Azyxxi,
Patients with increasing dosages of Remeron followed by a heart attack within 180 days From: Wang, Plaisant, Shneiderman. Workshop: Interactive Exploration of Electronic Health Records, PatternFinder Interface
Patients with increasing dosages of Remeron followed by a heart attack within 180 days
SELECT P.*
FROM Person P, Event E1, Event E2, Event E3, Event E4
WHERE P.PID = E1.PID AND P.PID = E2.PID AND P.PID = E3.PID AND P.PID = E4.PID
  -- E1, E2, E3: three successive Remeron prescriptions
  AND E1.type = 'Medication' AND E1.class = 'Anti Depressant' AND E1.name = 'Remeron'
  AND E2.type = 'Medication' AND E2.class = 'Anti Depressant' AND E2.name = 'Remeron'
  AND E3.type = 'Medication' AND E3.class = 'Anti Depressant' AND E3.name = 'Remeron'
  -- dosages increase over time
  AND E2.value > E1.value AND E3.value >= E2.value
  AND E2.date > E1.date AND E3.date >= E2.date
  -- E4: emergency visit for a heart attack
  AND E4.type = 'Visit' AND E4.class = 'Hospital' AND E4.name = 'Emergency'
  AND E4.value = 'Heart Attack'
  -- no earlier than the last prescription and at most 180 days after it
  AND E4.date >= E3.date AND (E4.date - E3.date) <= 180
From: Wang, Plaisant, Shneiderman. Workshop: Interactive Exploration of Electronic Health Records,
Result Set Visualization: Ball and Chain
LifeLines2: Align-Rank-Filter From: Wang, Plaisant, Shneiderman. Workshop: Interactive Exploration of Electronic Health Records,
Health Services Research Temporal Templates? Look-back Window End of Observation Date Observation Window Index Event Date Accrual Window Maximum Follow-up Date Patient-specific Dates Time Study-specific Dates From A. Forster. The Ottawa Hospital-Data Request Form 2009
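One way to read the template is as a small set of study-specific parameters from which each patient's windows are derived once the index event date is known. The sketch below follows that reading. The field names, the window definitions (look-back ends at the index date, observation starts at the index date and is capped by the maximum follow-up date), and the example dates are all assumptions, not taken from the Ottawa Hospital form.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StudyTemplate:
    """Study-specific dates and window lengths (hypothetical field names)."""
    accrual_start: date
    accrual_end: date
    max_followup: date
    lookback_days: int
    observation_days: int

    def eligible(self, index_date):
        """A patient accrues only if the index event falls in the accrual window."""
        return self.accrual_start <= index_date <= self.accrual_end

    def patient_windows(self, index_date):
        """Derive patient-specific windows from the index event date."""
        lookback_start = index_date - timedelta(days=self.lookback_days)
        end_of_observation = min(
            index_date + timedelta(days=self.observation_days),
            self.max_followup,  # the study-wide cutoff truncates late accruals
        )
        return {
            "lookback": (lookback_start, index_date),
            "observation": (index_date, end_of_observation),
        }


template = StudyTemplate(
    accrual_start=date(2008, 1, 1),
    accrual_end=date(2008, 12, 31),
    max_followup=date(2009, 6, 30),
    lookback_days=30,
    observation_days=365,
)
windows = template.patient_windows(date(2008, 10, 1))
print(template.eligible(date(2008, 10, 1)), windows)
```

Making the window definitions explicit parameters is the point: two studies that differ only in `lookback_days` or in how follow-up is truncated will return different cohorts from the same data.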
Data quality – Dirty Laundry Suppose the previous issues were solved and investigators can easily construct complex temporal and atemporal queries... what is the quality of the results that come back?
Let’s assume the query interface issue is solved! Would this result be worrisome?
It’s tough being 6 years old…….
Should we be worried? No –Large numbers will swamp out effect of anomalous data or use trimmed data –Simulation techniques are insensitive to small errors Yes –Public reporting could highlight data anomalies –Genomic associations look for small signals (small differences in risks) amongst populations
Research Challenge Can we create a dynamic measure of data quality that is provided with the results of all queries? Query Results, quality measure
What would be the elements of QM? Book cover images from Amazon.com
Measuring Data Quality Observed versus expected distributions Outliers Missing values Performance on data validity checks –Single attribute analysis –Double- / triple- / higher level attributes correlations –Physical / logical domain impossibilities
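A few of these measures can be computed alongside any query and returned with its results. The sketch below is a minimal illustration for a single numeric attribute, covering missing values and a domain-range check only. The function names, the row format, and the 0 to 120 age range are assumptions.

```python
def quality_measure(rows, field, low, high):
    """Summarize missing and out-of-range values for one numeric field.

    A value of None (or an absent key) counts as missing.
    """
    values = [row.get(field) for row in rows]
    n = len(values)
    if n == 0:
        return {"n": 0, "missing": 0.0, "out_of_range": 0.0}
    missing = sum(v is None for v in values)
    out_of_range = sum(v is not None and not (low <= v <= high) for v in values)
    return {"n": n, "missing": missing / n, "out_of_range": out_of_range / n}


def query_with_quality(rows, predicate, field, low, high):
    """Run a query and return its results together with a quality measure."""
    results = [row for row in rows if predicate(row)]
    return results, quality_measure(results, field, low, high)


# Hypothetical patient rows; one missing age and one physically impossible age.
ROWS = [{"age": 6}, {"age": None}, {"age": 140}, {"age": 10}]
results, qm = query_with_quality(ROWS, lambda row: True, "age", 0, 120)
print(len(results), qm)
```

Observed-versus-expected distributions and multi-attribute correlations would need reference data and are not covered by this sketch.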
Defining data quality: The “Fit for Use” Model Borrowed from industrial quality frameworks –Juran (1951): “Fitness for Use” design, conformance, availability, safety, and field use Multiple adaptations by information science community –Not all adaptations are clearly specified –Not all adaptations are consistent –Not linked to measurement/assessment methods
How to measure data quality? Need to link conceptual framework with methods Maydanchik: Five classes of data quality rules –Attribute domain: validate individual values –Relational integrity: accurate relationships between tables, records and fields across multiple tables –Historical: time-varying data –State-dependent: changes follow expected transitions –Dependency: follow real-world behaviors Maydanchik, A. (2007). Data quality assessment. Bradley Beach, NJ, Technics Publications.
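As one concrete instance, a state-dependent rule can be checked by comparing each recorded status change against a table of permitted transitions. The sketch below is illustrative only: the encounter statuses and the transition table are hypothetical and are not drawn from Maydanchik's text.

```python
# Hypothetical encounter statuses and the transitions considered valid.
ALLOWED = {
    "scheduled": {"admitted", "cancelled"},
    "admitted": {"discharged"},
    "discharged": set(),
    "cancelled": set(),
}


def invalid_transitions(history):
    """Return the (from, to) pairs in a status history that ALLOWED does not permit."""
    return [(prev, nxt)
            for prev, nxt in zip(history, history[1:])
            if nxt not in ALLOWED.get(prev, set())]


print(invalid_transitions(["scheduled", "admitted", "discharged"]))
print(invalid_transitions(["scheduled", "discharged", "admitted"]))
```

The fraction of records with at least one invalid transition is one candidate input to a per-query quality measure.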
Data Quality Assessment METHODS Five classes of data quality rules 30 assessment methods –Attribute domain rules (5 methods) –Relational integrity: (4 methods) –Historical: (9 methods) –State-dependent: (7 methods) –Dependency: (5 methods) Time and change assessments dominate!!
Dimension 1: Attribute domain constraints
Dimension 2: Relational integrity rules
Dimension 3: Historical rules
Dimension 4: State-dependent rules
Dimension 5: Attribute dependency rules
Implementing the Framework in SAFTINet One of three AHRQ Distributed Research Network grants –SCANNER (UCSD) –SPAN (KPCO) Focused on safety net healthcare providers Includes financial/clinical data integration with Medicaid payments Using Ohio State / TRIAD grid technologies
SAFTINet: Distributed research network Grid Portal
Related DQ Work: Visualizing Data Quality