Architecture-Specific Considerations and Methods for Data Quality Assessment in Collaborative Clinical Data Research Networks Chunhua Weng, PhD Associate.

1 Architecture-Specific Considerations and Methods for Data Quality Assessment in Collaborative Clinical Data Research Networks
Chunhua Weng, PhD, Associate Professor of Biomedical Informatics, Columbia University, New York City
May 27, 2015

2 Disclosures
No competing interests to disclose

3 EHR data are subject to quality problems
“With the advent of the information era in medicine, we are pouring out a torrent of medical record misinformation. Medical records, which have long been faulty, contain more distorted, deleted, and misleading information than ever before.” Burnum (1989) The misinformation era: the fall of the medical record.

6 What data quality problems should we try to prevent when creating clinical data research networks for CER?

7 Three CDRN Architecture Models
Centralized Query of Everything
Federated Query of Patient Counts
Federated Query of Research Results

8 CDRN Model 1
[Diagram: local warehouses CDW-1, CDW-2, ..., CDW-n each pass through an ETL step that produces CDM-based de-identified data; these data sets are indexed into a central CDW holding CDM-based de-identified data.]

9 What are data quality concerns for model 1?
When and Where | Data Quality Concerns
At local level before ETL | Sampling bias [1], correctness, currency [2], completeness [3], concordance, plausibility
At local level during ETL | Information loss, transformation/coding error
At the central level during indexing | Inconsistency or redundancy across sites
At the central level after indexing | Currency, data provenance, research suitability

References
1. Rusanov A*, Weiskopf NG*, Wang S, Weng C. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Medical Informatics and Decision Making. 2014 Jun 11;14(1):51.
2. Weiskopf NG, Weng C. Methods and Dimensions of EHR Data Quality Assessment: Enabling Reuse for Clinical Research. J Am Med Inform Assoc. 2013 Jan 1;20(1):144-51.
3. Weiskopf NG, Hripcsak G, Sushmita S, Weng C. Defining and measuring completeness for electronic health records for secondary use. J Biomed Inform. 2013 Oct;46(5):830-6.
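The "information loss" concern during ETL can be made concrete with a mapping-coverage check. The following is a minimal sketch, not a method from the slides: the code names (`LOCAL_GLU`, `CDM_GLUCOSE`, and so on) and the function `mapping_report` are hypothetical and only illustrate counting source rows that have no target code in the common data model.

```python
def mapping_report(source_codes, code_map):
    """Return (coverage, unmapped): the share of source rows whose code has a
    target mapping, and the sorted distinct codes that would be lost in ETL."""
    if not source_codes:
        return 1.0, []
    mapped = sum(1 for c in source_codes if c in code_map)
    unmapped = sorted({c for c in source_codes if c not in code_map})
    return mapped / len(source_codes), unmapped


# Hypothetical local-to-CDM code map and source rows (illustrative names only).
local_to_cdm = {"LOCAL_GLU": "CDM_GLUCOSE", "LOCAL_HBA1C": "CDM_HBA1C"}
source_rows = ["LOCAL_GLU", "LOCAL_GLU", "LOCAL_HBA1C", "LOCAL_XYZ"]

coverage, unmapped = mapping_report(source_rows, local_to_cdm)
print(f"coverage={coverage:.2f}, unmapped={unmapped}")
```

A site could run a report like this before loading, so that unmapped codes are reviewed rather than silently dropped.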

10 Example Data Quality Issues at Local Level
Bias: Biases in lab ordering for specific population subgroups
Bias: Biases towards specific population subgroups: do data represent the overall population?
Bias: Bias in data measurement
Completeness: Missing required data elements or variables for CER of particular diseases
Completeness: Missing data due to data fragmentation
Correctness: Accuracy of ICD-9 diagnosis
Correctness: Incorrect coding/use of CDM or terminology
Currency: Outdated gender information for transgender patients
Plausibility: Discharge date is 25 days earlier than admission date
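Two of the local-level issues above, plausibility and completeness, lend themselves to simple rule-based checks. The sketch below is illustrative only: the record layout, the field names (`admit`, `discharge`, `dx_code`), and the `REQUIRED` list are assumptions, not part of any CDM named in the slides.

```python
from datetime import date

# Hypothetical encounter records; field names are illustrative assumptions.
encounters = [
    {"id": "e1", "admit": date(2015, 3, 1), "discharge": date(2015, 3, 5), "dx_code": "250.00"},
    {"id": "e2", "admit": date(2015, 4, 26), "discharge": date(2015, 4, 1), "dx_code": "401.9"},
    {"id": "e3", "admit": date(2015, 5, 2), "discharge": None, "dx_code": None},
]

REQUIRED = ("admit", "discharge", "dx_code")  # assumed required elements


def implausible_dates(rows):
    """Return ids of encounters whose discharge date precedes the admission date."""
    return [
        r["id"]
        for r in rows
        if r["admit"] is not None
        and r["discharge"] is not None
        and r["discharge"] < r["admit"]
    ]


def completeness(rows, fields=REQUIRED):
    """Return, per field, the fraction of rows with a non-missing value."""
    if not rows:
        return {f: 0.0 for f in fields}
    return {f: sum(r.get(f) is not None for r in rows) / len(rows) for f in fields}


print(implausible_dates(encounters))
print(completeness(encounters))
```

Here encounter `e2` is flagged because its discharge date falls 25 days before admission, mirroring the plausibility example on the slide.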

11 Example Data Quality Issues at Central Level
Concordance: The same patient has different values from data submitted by different sources
Redundancy: The same patient is created multiple times in the central database and treated as multiple patients
Currency: Lack of timely sync between local and central data
Data provenance: Unable to trace back to data sources, e.g., did you put in discharge diagnosis or admission diagnosis?
Not granular: Lack of granularity of coding
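The redundancy and concordance issues above could be screened for once submissions from several sites sit in one place. This is a rough sketch under a strong assumption: that some record-linkage key (called `link_key` here, a hypothetical name) already identifies the same patient across sites. The rows and field names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows submitted to a central database by different sites.
# "link_key" stands in for whatever linkage key a network uses (an assumption).
submissions = [
    {"site": "CDW-1", "link_key": "p-001", "birth_year": 1950, "sex": "F"},
    {"site": "CDW-2", "link_key": "p-001", "birth_year": 1950, "sex": "M"},
    {"site": "CDW-2", "link_key": "p-002", "birth_year": 1971, "sex": "M"},
]


def group_by_patient(rows):
    """Group submitted rows by linkage key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["link_key"]].append(r)
    return groups


def redundant_patients(rows):
    """Linkage keys that appear more than once (candidate duplicate patients)."""
    return sorted(k for k, g in group_by_patient(rows).items() if len(g) > 1)


def discordant_fields(rows, fields=("birth_year", "sex")):
    """Map linkage key -> fields whose values differ across submissions."""
    out = {}
    for k, g in group_by_patient(rows).items():
        bad = [f for f in fields if len({r[f] for r in g}) > 1]
        if bad:
            out[k] = bad
    return out


print(redundant_patients(submissions))
print(discordant_fields(submissions))
```

Keeping the `site` field on every row also gives a minimal form of provenance: a discordant value can be traced back to the source that submitted it.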

12 Defining and Measuring Data Completeness

13 Defining and Measuring Data Completeness

14 CDRN Model 2: SHRINE Source of Information:

15 CDRN Model 2: i2b2-SHRINE
[Diagram: local warehouses CDW-1, CDW-2, CDW-3, ..., CDW-n each pass through an ETL step into CDM-based data held at the site; a shared query is sent to the sites and aggregated counts are returned.]
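The "shared query, aggregated counts" pattern in this model can be sketched in a few lines. This is a toy illustration and not the i2b2 or SHRINE API: the functions `local_count` and `federated_count`, the site data, and the query predicate are all hypothetical. The point is only that each site evaluates the query locally and returns a number, never patient-level rows.

```python
def local_count(site_rows, predicate):
    """Run the shared query at one site and return only the matching count."""
    return sum(1 for r in site_rows if predicate(r))


def federated_count(sites, predicate):
    """Send the same query to every site; collect per-site counts and their sum."""
    per_site = {name: local_count(rows, predicate) for name, rows in sites.items()}
    return per_site, sum(per_site.values())


# Hypothetical CDM-based data held locally at each site.
sites = {
    "CDW-1": [{"age": 67, "dx": "diabetes"}, {"age": 45, "dx": "asthma"}],
    "CDW-2": [{"age": 71, "dx": "diabetes"}],
    "CDW-3": [],
}


def shared_query(r):
    """Hypothetical shared query: diabetes patients aged 65 or older."""
    return r["dx"] == "diabetes" and r["age"] >= 65


per_site, total = federated_count(sites, shared_query)
print(per_site, total)
```

Because only counts leave each site, a patient seen at two sites may be counted twice in the sum, which is one reason redundancy across sites remains a data quality concern in a federated design.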

16 What are data quality concerns for model 2?
When and Where | Data Quality Concerns
At local level before ETL | Sampling bias [1], correctness, completeness [3], concordance, plausibility, currency [2]
At local level during ETL | Information loss, transformation/coding error
At the central level during indexing | Inconsistency or redundancy across sites
At the central level | Currency, data provenance, research suitability

References
1. Rusanov A*, Weiskopf NG*, Wang S, Weng C. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Medical Informatics and Decision Making. 2014 Jun 11;14(1):51.
2. Weiskopf NG, Weng C. Methods and Dimensions of EHR Data Quality Assessment: Enabling Reuse for Clinical Research. J Am Med Inform Assoc. 2013 Jan 1;20(1):144-51.
3. Weiskopf NG, Hripcsak G, Sushmita S, Weng C. Defining and measuring completeness for electronic health records for secondary use. J Biomed Inform. 2013 Oct;46(5):830-6.

17 CDRN Model 3: OHDSI

18 CDRN Model 3: OHDSI
[Diagram: local warehouses CDW-1, CDW-2, CDW-3, ..., CDW-n each pass through an ETL step into CDM-based data held at the site; a shared protocol is sent to the sites and aggregated evidence is returned.]

19 What are data quality concerns for model 3?
When and Where | Data Quality Concerns
At local level before ETL | Sampling bias [1], correctness, completeness [3], concordance, plausibility, currency [2]
At local level during ETL | Information loss, transformation/coding error
At the central level during indexing | Inconsistency or redundancy across sites
At the central level | Currency, data provenance, research suitability, metadata transparency and completeness

References
1. Rusanov A*, Weiskopf NG*, Wang S, Weng C. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Medical Informatics and Decision Making. 2014 Jun 11;14(1):51.
2. Weiskopf NG, Weng C. Methods and Dimensions of EHR Data Quality Assessment: Enabling Reuse for Clinical Research. J Am Med Inform Assoc. 2013 Jan 1;20(1):144-51.
3. Weiskopf NG, Hripcsak G, Sushmita S, Weng C. Defining and measuring completeness for electronic health records for secondary use. J Biomed Inform. 2013 Oct;46(5):830-6.
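When sites return aggregated results rather than data, "metadata transparency and completeness" becomes something the coordinating center can check directly. The sketch below is an assumption-laden illustration, not an OHDSI tool or API: the metadata fields in `REQUIRED_METADATA`, the result layout, and both function names are hypothetical.

```python
# Hypothetical metadata each site is assumed to return with its aggregate result.
REQUIRED_METADATA = ("site", "cdm_version", "data_cutoff", "protocol_id")


def missing_metadata(result, required=REQUIRED_METADATA):
    """List the required metadata fields that are absent or empty in one result."""
    return [k for k in required if result.get(k) in (None, "")]


def pool_counts(results):
    """Pool patient and event counts over results with complete metadata only."""
    usable = [r for r in results if not missing_metadata(r)]
    return {
        "sites": [r["site"] for r in usable],
        "n_patients": sum(r["n_patients"] for r in usable),
        "n_events": sum(r["n_events"] for r in usable),
    }


# Hypothetical site-level results returned for one shared protocol.
results = [
    {"site": "CDW-1", "cdm_version": "5", "data_cutoff": "2015-03-31",
     "protocol_id": "P1", "n_patients": 1200, "n_events": 60},
    {"site": "CDW-2", "cdm_version": "5", "data_cutoff": "",
     "protocol_id": "P1", "n_patients": 800, "n_events": 30},
]

print(missing_metadata(results[1]))
print(pool_counts(results))
```

Excluding incomplete results is only one possible policy; a network might instead return them to the site for correction before pooling.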

20 Tool-Assisted Data Quality Assessment

21 Take home
Different CDRN architectures entail different data quality assessment requirements
Federated approach involving autonomous sites may minimize data query checking complexities
Tools such as Achilles from OHDSI can be used for data quality assessment
We need a rapid learning system for DQA
