Data Quality / Data Heterogeneity: An evolving mission. Kent Bailey, Susan Welch, Jeff Tarlowe

What is “Data Quality”?
- Reliability? Accuracy?
- Reproducibility?
- Information content?
- Presence in all patients? Missingness?
- Lack of bias?
- Suitability to the question?
- All of the above?

What is “Data Quality”?
- Individual datum level
  - Obvious errors
  - Non-obvious errors
  - Not “errors” but a poor reflection of a characteristic of an individual
- Batch level
  - Bias, suitability to the question, information content (e.g. “signal-to-noise ratio”)

Projects under Data Quality
- Project 1: Data Heterogeneity Study
- Project 2: Difficult data elements
  - Body Mass Index
  - Smoking
- Project 3: Comparison of Computer Algorithm vs. Manual Review for Treatment Cohort selection (a.k.a. the “John Henry” study)
- Project 4: Measuring information in quantitative data

Project 1: Data Heterogeneity
- Purpose: Compare EHR data between institutions in terms of characteristics (not “quality”)
- Institutions: Mayo and Intermountain
- Methods: extract data relative to Type 2 Diabetes from the EHR at each institution: diagnoses, labs, meds
- Analysis:
  - Descriptive (compare frequencies and distributions)
  - Tweak selection parameters, and study effects
  - Study within-institution heterogeneity / bias
  - Study differences in institutional source datasets
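
The descriptive step (comparing code frequencies between institutions) could look roughly like the sketch below. It is only an illustration: the function names and the sample code lists are made up, and the slides do not say how the actual comparison was implemented.

```python
from collections import Counter


def code_proportions(codes):
    """Return the share of records carrying each diagnosis code."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def compare_frequencies(codes_a, codes_b):
    """Map each code to (share at site A, share at site B, A minus B)."""
    prop_a = code_proportions(codes_a)
    prop_b = code_proportions(codes_b)
    table = {}
    for code in sorted(set(prop_a) | set(prop_b)):
        a = prop_a.get(code, 0.0)  # a code absent at a site counts as 0
        b = prop_b.get(code, 0.0)
        table[code] = (a, b, a - b)
    return table


# Made-up ICD-9 diabetes codes for two hypothetical sites, not real extracts.
site_a = ["250.00", "250.00", "250.02", "250.00"]
site_b = ["250.00", "250.02", "250.02", "250.40"]
table = compare_frequencies(site_a, site_b)
for code, (a, b, diff) in table.items():
    print(f"{code}: site A {a:.2f}, site B {b:.2f}, difference {diff:+.2f}")
```

Large differences in a code's share between sites are a prompt to look at coding practice or data source, not evidence of an error at either site.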

Project 1: Data Heterogeneity
- Current Status/Milestones
  - IRB, data sharing approval
  - Initial DM2 datasets exist at each institution
  - De-identification (homemade)
  - Initial exchange of de-identified data!
  - Analysis proceeding
    - Comparative analysis of ICD-9 codes
    - Comparison of data sources, missingness

Project 1: Data Heterogeneity
- Next Steps/Future directions
  - Compare and contrast Mayo and Intermountain data
  - Compare and elucidate idiosyncrasies of data sources
  - Draw generalizations on heterogeneities
  - Assess impact of these heterogeneities on secondary use
  - White paper?

Project 2: Difficult Data Elements
- Purpose: characterize quality aspects of difficult data elements (BMI, smoking, …) and develop mitigation or warnings
- Method:
  - Extract data (height/weight/BMI, smoking) within the Data Heterogeneity study at Intermountain and Mayo
  - Detect errors/missingness
  - Develop mitigations if possible
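
As an illustration of the "detect errors/missingness" step for BMI, a minimal sketch follows. It uses the standard formula BMI = weight (kg) / height (m)^2. The function name, the plausibility range (10 to 100) and the agreement tolerance (1.0) are assumptions chosen for the example, not values from the project.

```python
def check_bmi(height_cm, weight_kg, recorded_bmi=None,
              low=10.0, high=100.0, tol=1.0):
    """Recompute BMI from height and weight and return (computed, flags).

    low/high bound the plausible range and tol is the allowed gap between
    recorded and recomputed BMI; all three are illustrative assumptions.
    """
    flags = []
    computed = None
    if height_cm is None or weight_kg is None:
        flags.append("missing height or weight")
    elif height_cm <= 0 or weight_kg <= 0:
        flags.append("non-positive height or weight")
    else:
        computed = weight_kg / (height_cm / 100.0) ** 2
        if not low <= computed <= high:
            flags.append("implausible computed BMI")
    if recorded_bmi is None:
        flags.append("missing recorded BMI")
    elif computed is not None and abs(recorded_bmi - computed) > tol:
        flags.append("recorded BMI disagrees with height/weight")
    return computed, flags


# A consistent record, then the same record with height and weight swapped.
ok_bmi, ok_flags = check_bmi(170, 70, recorded_bmi=24.2)
bad_bmi, bad_flags = check_bmi(70, 170, recorded_bmi=24.2)
print(ok_flags)
print(bad_flags)
```

Flags like these could feed the "mitigation or warnings" the project describes, for example by preferring a recomputed BMI when the recorded value disagrees with plausible height and weight.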

P2: Difficult Data Elements
- Current status:
  - Data related to BMI have been shared and are being analyzed
- Next steps/Future directions
  - Comparative analysis of BMI data, data quality / absence issues
  - Smoking status derived from a cTAKES-based algorithm is about to be checked against chart review
  - Develop widgets? White paper?

Project 3: Algorithm/human review (“John Henry”) study
- Purpose: Demonstrate and quantify the cost-benefit of developing and implementing a computer algorithm to derive a cohort with high-risk type 2 diabetes, compared with manual review to derive such a cohort. Analyze discordances between the two methods.
- Method: After phased preliminary comparative studies of 20 and 50 potential cases, with refinement of the algorithm, analyze 200 cases by both approaches. Analyze the discordances, and also the cumulative costs associated with both methods. Extrapolate to other target sample sizes.
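
One way to picture the cost extrapolation and the discordance analysis is the sketch below. It assumes a simple linear cost model (a one-time development cost for the algorithm plus a per-case cost for either method); the slides do not specify the cost model, and every number and name here is hypothetical.

```python
import math


def total_cost(n_cases, fixed_cost, cost_per_case):
    """Cumulative cost under an assumed linear model."""
    return fixed_cost + n_cases * cost_per_case


def break_even_cases(algo_fixed, algo_per_case, manual_per_case):
    """Smallest cohort size at which the algorithm costs no more than
    manual review, or None if the algorithm never catches up."""
    saving_per_case = manual_per_case - algo_per_case
    if saving_per_case <= 0:
        return None
    return math.ceil(algo_fixed / saving_per_case)


def discordant_cases(algo_labels, manual_labels):
    """Indices of cases where the two methods disagree."""
    return [i for i, (a, m) in enumerate(zip(algo_labels, manual_labels))
            if a != m]


# Hypothetical costs in arbitrary units: 5000 to build the algorithm,
# 0.5 per case to run it, 25.5 per case for manual chart review.
n_star = break_even_cases(5000, 0.5, 25.5)
print(n_star, total_cost(n_star, 5000, 0.5), total_cost(n_star, 0, 25.5))

# Hypothetical cohort decisions (1 = include, 0 = exclude) for four charts.
print(discordant_cases([1, 0, 1, 1], [1, 1, 1, 0]))
```

Varying the three cost inputs corresponds to analyzing costs "using various assumptions"; the discordant indices identify the charts that need a closer look.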

P3: John Henry
- Current Status/Milestones
  - First phases complete
  - Final contest (200 charts) imminent
- Next steps/Future directions
  - Analyze costs using various assumptions
  - Report results
  - Generalize to other settings?

Project 4: Measuring information in quantitative data
- Purpose: develop methods to quantify the signal-to-noise ratio in quantitative data, which can be used to inform choices among, or weights applied to, different potential variables related to the same underlying phenotype
- Methods: application of ANOVA to estimate between-subject and within-subject components of variance, and other methods for estimating signal and noise components. Examples: random capillary glucose and HbA1c
- Current status: gleam in the eye, preliminary proof of concept
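
The ANOVA idea can be sketched with the textbook one-way random-effects estimators for a balanced design (every subject has the same number of repeated measurements). Treating between-subject variance as "signal" and within-subject variance as "noise" is an interpretation of the slide, and the function names and data below are invented for illustration.

```python
from statistics import mean


def variance_components(measurements):
    """Estimate (between-subject, within-subject) variance.

    measurements is a list of per-subject lists of repeated values,
    all of the same length n >= 2 (balanced design).
    """
    k = len(measurements)
    n = len(measurements[0]) if k else 0
    if k < 2 or n < 2 or any(len(row) != n for row in measurements):
        raise ValueError("need a balanced design: >= 2 subjects, >= 2 repeats")
    subject_means = [mean(row) for row in measurements]
    grand_mean = mean(subject_means)
    # Mean squares from the one-way ANOVA table.
    ms_between = n * sum((m - grand_mean) ** 2 for m in subject_means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(measurements, subject_means)
                    for x in row) / (k * (n - 1))
    var_within = ms_within
    # The moment estimator can go negative; truncate at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, var_within


def signal_to_noise(var_between, var_within):
    """Ratio of between-subject to within-subject variance."""
    return float("inf") if var_within == 0 else var_between / var_within


# Invented repeated measurements for three subjects (two values each).
data = [[5, 7], [9, 11], [1, 3]]
vb, vw = variance_components(data)
print(vb, vw, signal_to_noise(vb, vw), vb / (vb + vw))
```

The last printed value is the intraclass correlation, the share of total variance attributable to stable differences between subjects. A variable with a higher ratio would, under this reading, carry more information about the underlying phenotype per measurement.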

Questions/suggestions