
1 Opening slide 3:00 PM, EST

2 Clinical Data Quality Issues and Clinical Quality Measure Reliability
Welcome to the OSEHRA Innovation Webinar: Clinical Data Quality Issues and Clinical Quality Measure Reliability. Nicole G. Weiskopf, PhD, Assistant Professor, Dept. of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University

3 Clinical Data Quality Issues and Clinical Quality Measure Reliability
Nicole G. Weiskopf, PhD, Assistant Professor, DMICE, OHSU. OSEHRA Webinar: 11/14/17. Contact:

4 Significance
What are the benefits of EHR data reuse compared to traditional quality measurement and improvement?
Eliminate common barriers (increased efficiency, decreased effort)
Representative patient population (improved generalizability and external validity)
Tremendous breadth and depth of data (variety and temporality)

5 Significance
What happens if we can't trust the data? Garbage in, garbage out: decreased validity.

6 What’s the point of evaluating something if we can’t trust the results?
It might be more efficient, but it’s still a waste of resources.

7 Defining EHR DQ: What is data quality?
What does it mean for EHR data to be of good quality versus bad quality?

8 “Data are of high quality if they are fit for their intended uses in operations, decision making, and planning. Data are fit for use if they are free of defects and possess desired features.” This quotation is often attributed to JM Juran, but it is actually from Redman’s data quality field guide, which is derived from Juran’s work. Redman, T (2001) Data quality: the field guide.

9 What data quality means to data consumers
Wang & Strong (1996), Beyond accuracy: What data quality means to data consumers, group data quality into four categories:
Intrinsic: believability, accuracy, objectivity, reputation
Contextual: value-added, relevancy, timeliness, completeness, appropriate amount
Representational: interpretability, ease of understanding, representational consistency, concise representation
Accessibility: accessibility, access security
This framework is used in management, information science, information technology, etc. Which of these resonate with you? Which don't?

10 Defining EHR DQ
Different requirements for different applications, and different requirements for different projects even when the general application is the same. In other words, how you intend to use the data determines whether the data are of sufficient quality, even for the same dataset.

11 Data can be observed or unobserved…
[Figure: the longitudinal patient state is observed, in part, by the clinician.] Weiskopf et al. (2013) Defining and measuring completeness of EHRs for secondary use

12 …and recorded or unrecorded
[Figure: the clinician's observations of the longitudinal patient state are recorded, in part, in the EHR.] Weiskopf et al. (2013) Defining and measuring completeness of EHRs for secondary use

13 Make Observations Record Observations

14 Make Observations / Record Observations
[Figure: medication lists differ between what is observed and what is recorded (e.g., Multi-vitamin 1x; Metoprolol succinate ER at 25mg versus 50mg; Lisinopril 25mg at 1x versus 2x), illustrating discrepancies introduced at the recording step.]

15 “Traditional” Data
[Diagram: database, interface, query, results.]

16 Outside documentation
[Diagram of healthcare data flow: "live" clinical systems (EHR, CPOE, labs, billing, PHR, outside documentation) feed databases, data warehouses, and datamarts; queries against these return datasets, several steps removed from the source.]

17 [Diagram: Healthcare, HIT, Dataset.]

18 When people talk about EHR data quality, what are they actually talking about?

19 Lit Review Methods
Data collection: literature review
Inclusion criteria: original research using data quality assessment (DQA) methods; data derived from an EHR; peer-reviewed
PubMed search (DQ terms & EHR terms): 230 results reviewed; inclusion criteria applied: 44 results; ancestor search: +51 results; final pool: 95 relevant articles
Weiskopf & Weng (2012) Methods and dimensions of EHR data quality assessment

20 Dimensions of Data Quality
[Word cloud of terms used in the literature: completeness, concordance, correctness, currency, plausibility, accessibility, agreement, accuracy, timeliness, consistency, corrections made, recency, validity, availability, DQP, errors, missingness, quality, presence, variation, rate of recording.]
Weiskopf & Weng (2012) Methods and dimensions of EHR data quality assessment

21 Dimensions of Data Quality
Completeness: Is a truth about a patient present in the EHR?
Correctness: Is an element that is present in the EHR true?
Currency: Is an element in the EHR a relevant representation of the patient state at a given point in time?
Concordance: Is there agreement between or within elements in the EHR, or between elements in the EHR and another data source?
Plausibility: Does an element in the EHR make sense in light of other knowledge about what that element is measuring?
These concepts were abstracted from the literature, which is why there is overlap; they are not my own categories. Through my literature review, I identified five dimensions of data quality for which assessment methods have actually been described. You'll see parallels between these dimensions and some of the frameworks I just showed you. These are not the only dimensions mentioned in the literature, but the others (e.g., standards-based, granularity) are not assessed in any systematic way. Correctness is the same as accuracy in the other frameworks; I used "correctness" instead because the informatics literature often uses accuracy to refer to both correctness and completeness (i.e., errors of commission and omission), so this seemed less confusing. Weiskopf & Weng (2012) Methods and dimensions of EHR data quality assessment

22 Intense overreliance on gold standards (45 articles). Concordance and plausibility are probably proxies for other dimensions; they're methodological approaches, not categories themselves. Almost no one assesses currency (4 articles), even though temporality is vitally important in EHRs. Weiskopf & Weng (2012) Methods and dimensions of EHR data quality assessment

23 [Diagram: a dataset evaluated along four dimensions: correctness, completeness, currency, and granularity.]

24 Correctness: An element that is present in the EHR is true.
[Figure: values recorded over time (145, 140, 140, 25, 120, 115); the implausible value of 25 illustrates an incorrect element.]

25 Completeness: A truth about a patient is present in the EHR.
[Figure: values recorded over time (145, 140, 140, 120, 115); a value that was true of the patient is absent from the record.]

26 Currency: An element in the EHR is a relevant representation of the patient state at a given point in time.
[Figure: values recorded over time (140, 120, 115); an old value may no longer represent the current patient state.]

27 Granularity: An element in the EHR contains the appropriate amount of information.
[Figure: over time, the record captures only a dichotomous HTN / no HTN label rather than the underlying values.]

28 A few thoughts on the impact of correctness
[Figure: theoretical distribution of systolic blood pressure.]

29 A few thoughts on the impact of correctness
Need to consider the difference between random and systematic error. What impact can random error have? It introduces noise. What impact can systematic error have? It can introduce spurious signal, or bias. (It may be difficult to determine the magnitude, but directionality can be informative.) In my experience, once data are aggregated, incorrect data, which are usually random, aren't much of an issue. But you really don't want to miss systematic error.
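To make this concrete, here is a minimal sketch (with invented numbers, not data from any study) of how zero-mean random error washes out under aggregation while a systematic miscalibration shifts the aggregate:

```python
import random

random.seed(42)

# Hypothetical "true" systolic blood pressures for 10,000 patients
true_values = [random.gauss(120, 15) for _ in range(10_000)]

# Random error: zero-mean noise added to each recorded value
random_error = [v + random.gauss(0, 10) for v in true_values]

# Systematic error: every reading biased upward by a consistent 8 mmHg
systematic_error = [v + 8 for v in true_values]

def mean(xs):
    return sum(xs) / len(xs)

print(f"true mean:             {mean(true_values):.1f}")
print(f"with random error:     {mean(random_error):.1f}")      # stays near the true mean
print(f"with systematic error: {mean(systematic_error):.1f}")  # shifted by the bias
```

The random-error mean lands close to the true mean because the noise cancels out in aggregate, while the systematic-error mean is shifted by exactly the bias.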

30 But what about completeness?
Health status indicator: American Society of Anesthesiologists (ASA) Physical Status, 1 (healthy) to 6 (brain dead); limited to groups 1 to 4
Data quantity: counts of days with medication actions; counts of days with laboratory results
Population: 5000 patients, Apr, Sep 2012, no anesthesia within the past year
ASA Physical Status classes:
1 A normal healthy patient
2 A patient with mild systemic disease
3 A patient with severe systemic disease
4 A patient with severe systemic disease that is a constant threat to life
5 A moribund patient who is not expected to survive without the operation
6 A declared brain-dead patient whose organs are being removed for donor purposes
Originally thought to use the Charlson comorbidity index, but rapidly realized that a lot of patients wouldn't have enough data to calculate it!
Weiskopf et al. (2013) Sick patients have more data: the non-random completeness of EHRs
Rusanov et al. (2014) Hidden in plain sight: bias towards sick patients

31 Methods: Analysis
Primary test: Kruskal-Wallis one-way analysis of variance
Post-hoc analyses: Wilcoxon rank-sum tests with a Bonferroni correction
These methods were chosen for two reasons: very unequal sample sizes and non-normal distributions.
Weiskopf et al. (2013) Sick patients have more data: the non-random completeness of EHRs Rusanov et al. (2014) Hidden in plain sight: bias towards sick patients
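The analysis pipeline can be sketched with SciPy; the counts below are made up for illustration and do not come from the study:

```python
from itertools import combinations
from scipy.stats import kruskal, ranksums

# Made-up counts of days with data, grouped by ASA class (illustrative only)
groups = {
    "ASA 1": [2, 3, 1, 4, 2, 3, 2, 1, 3, 2],
    "ASA 2": [5, 4, 6, 3, 7, 5, 4, 6, 5, 8],
    "ASA 3": [9, 12, 8, 15, 10, 11, 13, 9, 14, 10],
    "ASA 4": [20, 18, 25, 22, 17, 30, 19, 24, 21, 28],
}

# Primary test: Kruskal-Wallis one-way analysis of variance on ranks
h_stat, p_value = kruskal(*groups.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.2g}")

# Post-hoc: pairwise Wilcoxon rank-sum tests with a Bonferroni correction
pairs = list(combinations(groups, 2))
for a, b in pairs:
    stat, p = ranksums(groups[a], groups[b])
    p_adjusted = min(p * len(pairs), 1.0)  # Bonferroni: scale by number of comparisons
    print(f"{a} vs {b}: adjusted p = {p_adjusted:.2g}")
```

Neither test assumes normality or equal sample sizes, which is why they fit skewed, unbalanced data-quantity counts like these.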

32 Counts of available data points differ across ASA scores
Talk about why this matters. “Here’s a thought experiment for you…” Weiskopf et al. (2013) Sick patients have more data: the non-random completeness of EHRs Rusanov et al. (2014) Hidden in plain sight: bias towards sick patients

33 Distribution of ASA scores changes with threshold for sufficient data
Weiskopf et al. (2013) Sick patients have more data: the non-random completeness of EHRs Rusanov et al. (2014) Hidden in plain sight: bias towards sick patients

34 MANY thoughts on completeness
There are three types of missingness, as defined by Rubin:
MCAR (missing completely at random): the pattern of missingness is not related to any other data
MAR (missing at random): the pattern of missingness is related to data that are present
MNAR (missing not at random): the pattern of missingness is related to the values of the data that are missing
Rubin (1976) Inference and missing data

35 Question: When EHR-derived data are missing (incomplete), are they missing completely at random, missing at random, or missing not at random? Answer: Yes. Serious answer: in real life, it's very rare to see "pure" missingness of any type. Data may be missing at random, but it's also likely that the underlying mechanism driving missingness reflects both observed and unobserved (missing) data. In the blood pressure example, the presence of data within the past month is determined largely by frequency of healthcare utilization, which is determined in part by underlying health, which is in turn determined in part by age.

36 How does all this apply to clinical quality measures?

37 Potential benefits of EHR-based CQMs
Resource savings
Consistency in calculation logic
Instantaneous (or close to) feedback
Temporal trends instead of point-in-time assessment of quality of care
But we need to consider the role of data quality in all of this.

38 [Image-only slide.]

39 [Image-only slide.]

40 [Image-only slide.]

41 ETL processes that take highly complex EHR data and transform them into flat files also transform the underlying data quality problems: issues of structure, representation, availability, and accessibility become simple presence or absence of data. This is why EHR-focused models of data quality are generally simpler than, for example, Wang and Strong's. (If you talk to clinicians, who deal with the upstream data, you're likely to hear a lot about issues relating to data overload, unstructured text, fragmentation, etc.)

42 This is especially true for clinical quality measures: to perform the logical calculations that determine measure denominator and numerator adherence, all clinical concepts are reduced to dichotomous variables. E.g., all patients with ischemic vascular disease (IVD) should be on aspirin or another antiplatelet:
Performance = (# of patients on antiplatelet) / (# of patients with IVD)
Antiplatelet issues: documented in unstructured text; documented at a prior visit; documented by an external provider
IVD Dx issues: Which diagnoses qualify as IVD? What if it's not on the problem list, but the lab values indicate IVD?

43 Implications of Completeness for CQMs
How do we assess this? Chart review is possible for limited validation efforts, but not sustainable; the same is true for direct provider or patient contact. How do we improve this? Naïve approach: more structured documentation.

44 Objective
To assess and improve the reliability and actionability of automated EHR-based CQMs, it is necessary to: (1) understand and predict the impact of EHR data quality on CQM reliability; (2) provide stakeholders with data quality assessment methods that are predictive of CQM reliability; and (3) improve CQM reliability by developing methods to improve EHR data quality.

45 Questions?

46 Thank You for Attending the OSEHRA Innovation Webinar!
December 2017: Winter break. Please join us again in 2018!

47 So what can you do about poor data quality?
In an ideal world, we would improve the source data. But when you only have access to the downstream data products, you need to detect data quality problems, assess their potential impact, and proceed accordingly, which may mean correcting or controlling for poor data quality, or changing the questions you're asking.

48 How can we detect data quality problems?
The easiest and most accurate way to check data quality is to compare the data in question to a reference (gold) standard, but it is very rare to have reference-standard data for an EHR-derived dataset. What else could we compare our data to?
Our requirements (remember fitness for use)
Our expectations
External knowledge
The data themselves
Other datasets
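Some of these comparisons (to expectations and to external knowledge) can be written as simple rules. A sketch, with invented records and an illustrative plausibility range:

```python
# Invented records: one implausible value and one internal disagreement
records = [
    {"patient_id": "A", "systolic_bp": 145, "sex": "F", "dx": "HTN"},
    {"patient_id": "B", "systolic_bp": 25,  "sex": "M", "dx": "HTN"},
    {"patient_id": "C", "systolic_bp": 120, "sex": "M", "dx": "pregnancy"},
]

problems = []
for r in records:
    # Plausibility: compare values to external knowledge (a clinically possible range)
    if not 50 <= r["systolic_bp"] <= 250:
        problems.append((r["patient_id"], "implausible systolic_bp"))
    # Concordance: check agreement between elements within the same record
    if r["dx"] == "pregnancy" and r["sex"] == "M":
        problems.append((r["patient_id"], "sex/diagnosis disagreement"))

for pid, issue in problems:
    print(f"patient {pid}: {issue}")
```

Checks like these cannot prove the data are correct, but they flag records that fail our expectations and merit investigation.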

49 How can we detect data quality problems?
[Figure: the variables in a dataset, against which quality checks are applied.]

50 Take advantage of longitudinal data when possible
Especially useful for checking ETL processes.

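One simple longitudinal check is to count records per period and flag sudden drops, which often indicate an ETL failure rather than a real change in care. A sketch with fabricated dates:

```python
from collections import Counter
from datetime import date

# Fabricated lab-result dates; an ETL bug has dropped most of 2015
result_dates = (
    [date(2013, m, 1) for m in range(1, 13)] * 10
    + [date(2014, m, 1) for m in range(1, 13)] * 11
    + [date(2015, 1, 1)] * 3
    + [date(2016, m, 1) for m in range(1, 13)] * 12
)

counts = Counter(d.year for d in result_dates)
typical = sorted(counts.values())[len(counts) // 2]  # median yearly count

for year in sorted(counts):
    flag = "  <-- suspicious drop" if counts[year] < 0.5 * typical else ""
    print(f"{year}: {counts[year]:>4}{flag}")
```

The threshold (half the median yearly count) is arbitrary; the point is that a year-over-year view makes an ETL gap visible that a single aggregate count would hide.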

52 Compare to external knowledge
Compare rates of diagnosis A in your dataset to rates of diagnosis A according to national registries, claims data, etc. Is there a difference? If so, can it be explained by differences in the population? Or is there reason to suspect a data quality problem?

53 Data quality is a large problem area that is still mostly unsolved
Ultimately we need to improve the source data, but until then:
Understand the provenance of your data, especially in terms of system complexities and potential failure points
Don't think of data quality as an issue of right versus wrong values: the problem is generally more subjective (fitness for use)
Data that are "bad" at random generally aren't an issue, but systematic data quality problems can drastically alter your results
When you uncover potential data quality problems, be thoughtful in your attempts to compensate

