module 11 Module I: Terminology— Data Quality Indicators (DQIs) Melinda Ronca-Battista ITEP Catherine Brown U.S. EPA
module 12 DQIs Defined DQIs are quantitative (objective numbers) and qualitative (subjective words) – Precision – Bias – Representativeness – Comparability – Completeness – Sensitivity
module 13 DQIs Defined (cont.) Quantitative DQIs – Precision, bias, and sensitivity Qualitative DQIs – Representativeness, comparability, and completeness
module 14 The Hierarchy of Quality Terms – DQOs (Data Quality Objectives): qualitative and quantitative study objectives – Attributes: descriptive aspects of data – DQIs: indicators (numbers) for the attributes – MQOs (Measurement Quality Objectives): acceptance criteria for the attributes measured by project DQIs
module 15 Precision Random errors or fluctuations in the measurement system (unavoidable wiggle) Estimated by agreement among repeated measurements of the same property under similar conditions, or under the same conditions with identical instruments
module 16 Precision
module 17 Coefficient of Variation (COV) is another statistic to represent imprecision COV = coefficient of variation = (s / mean) × 100%, where s = sample standard deviation, or STDEV in Excel For collocated measurements: RPD = relative percent difference = (|A – B| / average of A and B) × 100%
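A minimal sketch of the COV statistic, assuming the usual definition (sample standard deviation divided by the mean, expressed as a percentage). The function name and the readings are hypothetical and are not taken from the slides.

```python
import statistics

def coefficient_of_variation(values):
    """Return COV in percent: sample standard deviation / mean * 100."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; COV is undefined")
    # statistics.stdev is the sample (n - 1) standard deviation, like STDEV in Excel
    return statistics.stdev(values) / mean * 100.0

readings = [10.0, 12.0, 11.0, 13.0]  # hypothetical repeated measurements
cov = coefficient_of_variation(readings)
print(f"COV = {cov:.1f}%")
```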
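module 18 Collocated Methods Excel formula for the RPD column: =IF(D2="yes",ABS((A2-B2)/C2)*100,"") Table columns: A | B | Avg | Both >3? | RPD RPD is calculated only for rows where "Both >3?" is Yes (RPD values shown include 7.0% and 16.4%); rows marked No are left blank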
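The spreadsheet formula above can be sketched in Python as follows. This assumes column C is the average of A and B and that the "Both >3?" column means both collocated values exceed 3 (units as reported by the samplers). The function name, the `threshold` parameter, and the sample pairs are hypothetical.

```python
def rpd(a, b, threshold=3.0):
    """Relative percent difference of a collocated pair.

    Returns None when either value is not above the threshold,
    mirroring the blank cell produced by the Excel IF().
    """
    if not (a > threshold and b > threshold):
        return None
    avg = (a + b) / 2.0
    return abs(a - b) / avg * 100.0

pairs = [(10.0, 12.0), (2.5, 4.0), (8.0, 8.0)]  # hypothetical (A, B) values
results = [rpd(a, b) for a, b in pairs]
for (a, b), r in zip(pairs, results):
    print(a, b, "" if r is None else f"{r:.1f}%")
```

Returning `None` rather than 0 keeps excluded pairs out of any later average, the same way a blank cell is skipped by Excel's AVERAGE.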
module 19 Collocated Precision Begins with RPD (or COV) Plot values over time— is A always higher than B? If not, variability is good estimate of precision error
module 110 Bias
module 111 Bias Bias = how far from “truth” you are, in terms of a percentage Bias = (your result – audit result) / audit result × 100% You have bias if, over time, you are always high, or always low (or always…)
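As an illustration, percent bias for a series of audits, plus a check for whether the results are always high or always low, might be computed as below. The function names and the audit values are hypothetical; this is a sketch of the arithmetic, not EPA's official calculation.

```python
def percent_bias(result, audit):
    """Signed percent difference between your result and the audit result."""
    if audit == 0:
        raise ValueError("audit result is zero; percent bias is undefined")
    return (result - audit) / audit * 100.0

def consistent_direction(biases):
    """True when every bias has the same sign (always high or always low)."""
    if not biases:
        return False
    return all(b > 0 for b in biases) or all(b < 0 for b in biases)

# hypothetical (your result, audit result) pairs collected over time
checks = [(16.8, 16.0), (15.9, 15.0), (17.3, 16.7)]
biases = [percent_bias(r, a) for r, a in checks]
always_same_sign = consistent_direction(biases)
print([round(b, 1) for b in biases], always_same_sign)
```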
module 112 Principal Causes of Bias Incomplete data (e.g., if all data only from end of week, less traffic, etc.) Analytical –Calibration error –Sample contamination –Interferences (dandruff) Sampling –Site operator always does same thing “wrong,” (e.g., upside down filter, changing a/c during audit) –Data retrieval error, so that negative values are reset to zero (causing positive bias) or instrument misread (esp. for manual QC checks’ screen reading)
module 113 Estimating Bias Difference between measurement result and “reality” Can only be identified with external estimate of “reality” Maybe second flow rate standard best you can do Ideally, completely independent audits with another person and instrument (required for NAAQS determination)
module 114 Manual PM Bias determined via PEP audits PEP considered “truth” Bias = consistent difference between audit results and field sampler results Can construct confidence intervals If always within limits for results of individual checks, must be within limits for average of differences over that time period
module 115 Bias for Automated Methods
module 116 Automated Methods Calculation made from QC results over time QC estimates used to fold both precision and bias into calculations; difficult to separate
module 117 Bias Hidden as Variability [Figure: dot plots of data sets A and B] Is data set A or B a better representation of population?
module 118 Bias Hidden as Variability (cont.) Both data sets have similar variability. Data set B is a biased representation of the population of interest [Figure: dot plots of data sets A and B, labeled mean = 38.5]
module 119 Accuracy = Total Error Composed of both precision and bias Measure of long-term agreement of measurements to truth –Can only be measured over time—for any one measurement, random precision errors might be high or low –Over time, precision errors will average out, bias obvious EPA policy: Use bias and precision, rather than accuracy, as separate measures
module 120 Influence of Bias and Imprecision on Overall Accuracy [Figure with four panels: Imprecise and biased; Imprecise and unbiased; Precise and biased; Precise and unbiased]
module 121 Precision and Bias Summary Track diff/mean for collocated Track diff/known when the known value is “truth” Track individual results over time (positive and negative) Systematic positive or negative results show bias Variability shows imprecision Use simple statistics EPA’s statistics are in P&B DASC 2007.xls
module 122 Representativeness
module 123 Choice of Sampling Unit - What does a sample represent? 1 filter with 24 hours of material A year One month
module 124 Representativeness Representativeness: measure of degree to which data suitably represent environmental condition e.g., 1 in 3 day results representative of air concentration to be found over how long a time period? How large an area?
module 125 Comparability Qualitative confidence that two or more data sets may be compared Data gathered with FRMs comparable Strict network design (distance from roads, etc.) ensures comparability Using the same SOPs from one person to the next and from one year to the next helps ensure YOUR data set is comparable to a data set from another person or another year
module 126 Completeness Amount of valid data gathered, as a percentage of the number of valid measurements planned to meet DQOs
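A small sketch of the completeness percentage defined above. The counts are hypothetical (for example, a sampling schedule that planned 122 measurements and produced 110 valid ones), and the function name is illustrative.

```python
def completeness(valid_count, planned_count):
    """Percent of planned measurements that yielded valid data."""
    if planned_count <= 0:
        raise ValueError("planned_count must be positive")
    return valid_count / planned_count * 100.0

pct = completeness(valid_count=110, planned_count=122)  # hypothetical counts
print(f"Completeness = {pct:.1f}%")
```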
module 127 Sensitivity Discerning the Signal in the Noise [Figure: Response plotted against Concentration]
module 128 Sensitivity A. Capability to discriminate between different actual concentrations (or flow rates, etc.), or B. Capability of measuring a constituent at low levels –Practical Quantitation Level (PQL) describes ability to quantify a constituent with known certainty e.g., PQL of 0.05 µg/L for mercury represents level where a precision of +/- 15% can be obtained
module 129 For trace gas instruments, definitions are critical – LDL (twice background noise): 40 CFR §53.23(c) – MDL (lowest concentration that can be distinguished from zero with 99% confidence): 40 CFR Part 136, App. B – Zero drift (max diff over 12 hours): 40 CFR §53.23(e)(i) – Span drift (% change over 24 hrs of the same concentration): 40 CFR §53.23(e)(ii) See MDL for gaseous.doc
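To make one of these definitions concrete, here is a simplified sketch of an MDL calculation. It assumes the replicate-based form used in 40 CFR Part 136, App. B, where the MDL is the one-sided 99% Student's t value for n – 1 degrees of freedom multiplied by the sample standard deviation of n replicate low-level measurements. The regulation carries additional requirements that are not modeled here, and the function name and replicate values are hypothetical.

```python
import statistics

# One-sided 99% Student's t values, keyed by degrees of freedom (n - 1).
T_99_ONE_SIDED = {6: 3.143, 7: 2.998, 8: 2.896, 9: 2.821, 10: 2.764}

def method_detection_limit(replicates):
    """Simplified MDL: t(n-1, 0.99) * sample standard deviation."""
    df = len(replicates) - 1
    if df not in T_99_ONE_SIDED:
        raise ValueError("this sketch only tabulates t for 7 to 11 replicates")
    return T_99_ONE_SIDED[df] * statistics.stdev(replicates)

# hypothetical replicate measurements of a low-level standard
replicates = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0, 2.0]
mdl = method_detection_limit(replicates)
print(f"MDL = {mdl:.3f}")
```

Writing the calculation down explicitly, whether in code or in a spreadsheet, makes it easier to compare against what a lab actually does.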
module 130 Mistakes are Common 1993 study by Wisconsin DNR found 23 of 56 labs incorrectly calculated MDL 1998 survey found 26% of submitted results incorrect
module 131 Module 1 Summary Precision error = random error (“wiggle”) Bias error = systematic up or down (“jump”) Plot individual results over time Detection limits defined differently; specify calculations for lab, assess what lab routinely does by asking them for their method