Data Quality (a.k.a. “Data Heterogeneity”) Kent Bailey, Susan Rea Welch, Lacey Hart, Kevin Bruce, Susan Fenton.

Objectives
- Assess data variability within and across institutions
- Assess the impact of this variability on secondary use of EMR data
- Generate specifications for widgets:
  - “Warning label” for suspect data categories
  - Data quality audits with logs
  - Batch data correction / removal

Current Research: Effects of Variation on Diabetes Phenotyping Algorithm
- Purpose: Compare data relevant to the eMERGE Type 2 DM phenotyping algorithm between Intermountain and Mayo
- Methods:
  1. Identify adult subjects with evidence in any semantic category of the algorithm:
     - ICD-9-CM codes for diabetes mellitus
     - Abnormal glucose or HbA1c
     - Antihyperglycemic medications
     - Capillary glucose (glucometer) procedures

Methods
  2. Collect relevant data on these subjects:
     - ICD-9-CM codes
     - Procedure codes
     - Demographic data
     - Smoking status
     - Body mass index
     - Specialty of provider
     - Geographic info
     - Frequency of health care encounters
  3. Describe variation between institutions
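The inclusion rule in Methods step 1 (evidence in any one semantic category) can be sketched as follows. This is a minimal illustration, not the eMERGE algorithm itself: the `Subject` fields, the function names, and the adult age cutoff of 18 are all assumptions, since the slides name the categories but give no data model or cutoff.

```python
from dataclasses import dataclass

ADULT_AGE = 18  # assumption: the slides say "adult" without stating a cutoff


@dataclass
class Subject:
    """Hypothetical record with one evidence flag per semantic category."""
    subject_id: str
    age: int
    has_dm_icd9: bool = False            # ICD-9-CM code for diabetes mellitus
    has_abnormal_lab: bool = False       # abnormal glucose or HbA1c
    on_antihyperglycemic: bool = False   # antihyperglycemic medication
    has_capillary_glucose: bool = False  # capillary glucose (glucometer) procedure


def evidence_categories(s: Subject) -> list[str]:
    """Return the names of the semantic categories in which the subject has evidence."""
    flags = {
        "icd9": s.has_dm_icd9,
        "abnormal_lab": s.has_abnormal_lab,
        "medication": s.on_antihyperglycemic,
        "capillary_glucose": s.has_capillary_glucose,
    }
    return [name for name, present in flags.items() if present]


def is_candidate(s: Subject) -> bool:
    """Adult with evidence in at least one semantic category."""
    return s.age >= ADULT_AGE and bool(evidence_categories(s))


if __name__ == "__main__":
    # Invented example subjects, for illustration only.
    cohort = [
        Subject("s1", 64, has_dm_icd9=True),
        Subject("s2", 45),
        Subject("s3", 15, has_abnormal_lab=True),
    ]
    print([s.subject_id for s in cohort if is_candidate(s)])
```

Recording which categories fired, rather than only a yes/no answer, keeps the per-category evidence available for the between-institution comparison in step 3.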

Analysis
- Compare (between institutions) frequencies of data elements
  - ICD-9 codes: overall and specific codes
- Compare lab values: number and values
- Compare medications
- Control for:
  - Provider specialty
  - Geographic variables
  - Demographic variables
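As a rough sketch of the first comparison (ICD-9 code frequencies between institutions), the snippet below computes, for each code, the fraction of a site's subjects who carry it and then lines the two sites up. The function names and the `(subject_id, code)` record layout are assumptions made for illustration, and the sample records are invented. The sketch does not control for provider specialty, geography, or demographics.

```python
def code_frequencies(records):
    """records: iterable of (subject_id, icd9_code) pairs for one site.

    Returns {code: fraction of that site's subjects with the code}.
    """
    subjects = set()
    subjects_by_code = {}
    for subject_id, code in records:
        subjects.add(subject_id)
        subjects_by_code.setdefault(code, set()).add(subject_id)
    if not subjects:
        return {}
    return {code: len(ids) / len(subjects) for code, ids in subjects_by_code.items()}


def compare_sites(site_a, site_b):
    """Return {code: (freq_a, freq_b, freq_a - freq_b)} over all codes seen at either site."""
    freq_a = code_frequencies(site_a)
    freq_b = code_frequencies(site_b)
    result = {}
    for code in sorted(set(freq_a) | set(freq_b)):
        a = freq_a.get(code, 0.0)
        b = freq_b.get(code, 0.0)
        result[code] = (a, b, a - b)
    return result


if __name__ == "__main__":
    # Invented toy records, not study data.
    site_a = [("a1", "250.00"), ("a2", "250.00"), ("a2", "250.50")]
    site_b = [("b1", "250.00"), ("b2", "250.01")]
    for code, (a, b, diff) in compare_sites(site_a, site_b).items():
        print(f"{code}: site A {a:.2f}, site B {b:.2f}, difference {diff:+.2f}")
```

Counting subjects rather than code occurrences keeps sites with different encounter frequencies comparable; stratifying the same computation by specialty or region would be one way to approach the "control for" items.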

Interpretation
- Assess impact of data heterogeneity on phenotyping at different institutions
- Recommendations for:
  - High-throughput phenotyping
  - High-throughput screening for clinical trials
- Generalization to other phenotypes
- Hypothesis generation

Preliminary Mayo Results
- Mayo data (ICD codes, abnormal labs, or capillary glucose; limited to Olmsted and surrounding counties)
  - 13,754 subjects
    - 89% Caucasian
    - 2.5% African-American
    - 2.0% Asian
    - 6.5% Native American, Pacific Islander, other, unknown, or refused
  - Mean current age 64, range 20 to 104
  - Sex: 53% male, 47% female

Preliminary Mayo Results (N = 13,754)
- Smoking (n = 11,626)
  - Current 66%, past 16%, never 13%, unknown 6%
- BMI (limited to < 60) (n = 6,338)
  - Mean [value missing] +/- 7.2
  - Median 31.6, quartiles (27.5, 36.6)

Preliminary Results: ICD9 codes

Complication         Code     Count
None                 250.0    6743
Ketoacidosis         250.1    1
Hyperosmolality      250.2    2
Renal                250.4    398
Ophthalmic           250.5    1385
Neuro                250.6    586
Peripheral circ.     250.7    25
“Other specified”    250.8    312
Unspecified          250.9    336

Preliminary Results: ICD9 codes
- 250.X0: Type 2 or unspecified; controlled, or not specified as uncontrolled
- 250.X1: Type 1; controlled, or not specified as uncontrolled
- 250.X2: Type 2 or unspecified, uncontrolled
- 250.X3: Type 1, uncontrolled
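The fourth- and fifth-digit scheme on these two slides can be decoded mechanically. The sketch below is illustrative only: the function name `decode_250` and the returned field names are hypothetical, and the complication table covers only the fourth digits listed on the slide.

```python
import re

# Fourth digit: complication category, as listed on the slide.
COMPLICATIONS = {
    "0": "none",
    "1": "ketoacidosis",
    "2": "hyperosmolality",
    "4": "renal",
    "5": "ophthalmic",
    "6": "neuro",
    "7": "peripheral circulatory",
    "8": "other specified",
    "9": "unspecified",
}

# Fifth digit: (diabetes type, stated as uncontrolled?).
FIFTH_DIGIT = {
    "0": ("type 2 or unspecified", False),
    "1": ("type 1", False),
    "2": ("type 2 or unspecified", True),
    "3": ("type 1", True),
}


def decode_250(code: str) -> dict:
    """Decode a five-digit ICD-9-CM 250.xy diabetes code into its parts."""
    match = re.fullmatch(r"250\.(\d)(\d)", code.strip())
    if match is None:
        raise ValueError(f"not a five-digit 250.xy code: {code!r}")
    fourth, fifth = match.groups()
    if fifth not in FIFTH_DIGIT:
        raise ValueError(f"unknown fifth digit in {code!r}")
    dm_type, uncontrolled = FIFTH_DIGIT[fifth]
    return {
        # Fourth digits not shown on the slide are reported as such.
        "complication": COMPLICATIONS.get(fourth, "not listed on the slide"),
        "dm_type": dm_type,
        "stated_uncontrolled": uncontrolled,
    }


if __name__ == "__main__":
    print(decode_250("250.52"))
```

Separating the two digits this way makes it straightforward to tabulate Type 1 versus Type 2/unspecified codes per subject.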

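Type 2/U vs. Type 1 DM codes
Mayo data: n = 13,707
Cross-tabulation of Type 1 DM codes against Type 2/U DM codes; cell values in their original order (two counts missing):
- [count missing] (46%)
- 6631 (48%)
- [count missing] (4%)
- 254 (2%)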

Intermountain peek (sic)
Cross-tabulation of Type 1 ICD9 codes against Type 2/U ICD9 codes; cell values as they appear (garbled): , ,0836,629
- Disclaimer: don’t assume data are ready to compare between sites at this point

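Back to Mayo
Summary Sample Lab data
Columns: Test name, N, Min, 1%, Med., 99%, Max
- Glucose (P): N = 40,[digits missing]; other values missing
- Glucose POCT: N = 211,[digits missing]; other values missing
- Hemoglobin A1c, B: N = 35,[digits missing]; Min [missing] %, 1% 5.1 %, Med. 6.9 %, 99% 12.1 %, Max 16.7 %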

Future Directions
- Carry out inter-institution comparison
- Study effects of geography, race, etc.
- Implement chart review (on a random sample) for a “gold standard” definition of Type 2 DM
- Use lab values / meds to define a continuous phenotype (DM-ness)
- Extrapolation / generalization to other diseases / phenotypes

Data Quality (a.k.a. “Data Heterogeneity”) Susan Rea Welch

Conclusions: PhD Research Cohort Amplification
- Knowledge Discovery from Databases (KDD)
- Associative Classification methods
- Classification rules for diabetes and asthma
  - Comparably accurate
  - Concise
  - Consistent with domain knowledge
- Contributed new knowledge
  - Attributes for cohort identification
  - Unanticipated comorbidity associations

Consistency and Novelty: Diabetes
- Elevated quantitative lab glucose assays
  - Frequency 19%, Likelihood 87%
  - Less predictive than glucose by glucometer or urine microalbumin
- Abnormal HbA1c test
  - Predictive power equivalent to that of an HbA1c test order
- Antihyperglycemic medications
  - Variable predictive strength: metformin, insulin, insulin release stimulators, insulin response enhancers

Consistency and Novelty: Asthma
- Medications were most predictive
  - High likelihood: salmeterol, leukotriene receptor antagonists
  - Albuterol / glucocorticoid combined
- Pulmonary procedures (CPT hierarchy)
- Female gender
- Abnormal CBC
- Unexpected comorbidity associations
  - Suggests discovery of shared pathways

Associative Classification – What?
- Pattern discovery in a transaction database
- Independent of domain expertise
- Deductive, global associations in data
- Induce a general & accurate classifier
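In associative classification, a rule has the form "set of attributes implies class" and is usually scored by its support and confidence over a transaction database. The sketch below computes both for a single rule. It assumes, without confirmation from the slides, that the "Frequency" and "Likelihood" figures quoted for diabetes correspond to these two measures. The function name, item names, and toy transactions are invented for illustration.

```python
def rule_stats(transactions, antecedent, label):
    """Score the rule `antecedent -> label`.

    transactions: list of (set_of_items, class_label) pairs.
    Returns (support, confidence):
      support    = fraction of all transactions containing the antecedent with that label
      confidence = fraction of transactions containing the antecedent that have that label
    """
    antecedent = frozenset(antecedent)
    total = len(transactions)
    # Class labels of every transaction whose item set contains the antecedent.
    matched_labels = [cls for items, cls in transactions if antecedent <= items]
    hits = sum(1 for cls in matched_labels if cls == label)
    support = hits / total if total else 0.0
    confidence = hits / len(matched_labels) if matched_labels else 0.0
    return support, confidence


if __name__ == "__main__":
    # Invented toy transactions, not study data.
    transactions = [
        ({"abnormal_glucose", "metformin"}, "dm"),
        ({"abnormal_glucose"}, "dm"),
        ({"abnormal_glucose"}, "no_dm"),
        ({"albuterol"}, "no_dm"),
    ]
    support, confidence = rule_stats(transactions, {"abnormal_glucose"}, "dm")
    print(f"support={support:.2f} confidence={confidence:.2f}")
```

A full associative classifier would mine all rules above chosen support and confidence thresholds and then rank or prune them; this sketch covers only the scoring step.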

Associative Classification – Why?
- Attribute selection requires no domain expertise
- Not affected by missing data
- Proven accuracy
- Understandable rules
- Independent rules

Core Candidate Attributes
- Diagnosis codes
- Provider specialty
- Lab observations
- Procedure codes
- ‘Abnormal’ lab observations
- Imaging procedures
- Medication list
- Age groups
- Female gender

SHARPn Y2 Research Aims
- Are associations reliable across EHRs?
- Can the algorithms’ sensitivity / specificity be improved?
  - AC attribute selection + other classifiers