Quality Assurance & Quality Control University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux University of South Carolina Don.

Slides:



Advertisements
Similar presentations
I OWA S TATE U NIVERSITY Department of Animal Science Using Basic Graphical and Statistical Procedures (Chapter in the 8 Little SAS Book) Animal Science.
Advertisements

Quartiles  Divide data sets into fourths or four equal parts. Smallest data value Q1Q2Q3 Largest data value 25% of data 25% of data 25% of data 25% of.
Nursing Home Falls Control Chart Problem Statement: Assume that following data were obtained about number of falls in a Nursing Home facility. Produce.
CHAPTER 2 Building Empirical Model. Basic Statistical Concepts Consider this situation: The tension bond strength of portland cement mortar is an important.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
LECTURE 3 Introduction to Linear Regression and Correlation Analysis
Descriptive Statistics In SAS Exploring Your Data.
Lecture 19: Tues., Nov. 11th R-squared (8.6.1) Review
8-1 Quality Improvement and Statistics Definitions of Quality Quality means fitness for use - quality of design - quality of conformance Quality is.
Quantitative Business Analysis for Decision Making Simple Linear Regression.
Statistics 800: Quantitative Business Analysis for Decision Making Measures of Locations and Variability.
Total Quality Management BUS 3 – 142 Statistics for Variables Week of Mar 14, 2011.
15 Statistical Quality Control CHAPTER OUTLINE
Introduction to Regression Analysis, Chapter 13,
1 Simple Linear Regression 1. review of least squares procedure 2. inference for least squares lines.
Control charts : Also known as Shewhart charts or process-behaviour charts, in statistical process control are tools used to determine whether or not.
Statistical Process Control
Slide 1 SOLVING THE HOMEWORK PROBLEMS Simple linear regression is an appropriate model of the relationship between two quantitative variables provided.
Chapter 13 Quality Control and Improvement COMPLETE BUSINESS STATISTICSby AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN 7th edition. Prepared by Lloyd Jaisingh,
Quality Assurance.
Quality Assessment 2 Quality Control.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 12-1 Chapter 12 Simple Linear Regression Statistics for Managers Using.
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 12 Describing Data.
1 Statistical Analysis - Graphical Techniques Dr. Jerrell T. Stracener, SAE Fellow Leadership in Engineering EMIS 7370/5370 STAT 5340 : PROBABILITY AND.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
Choosing and using statistics to test ecological hypotheses
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
© Copyright McGraw-Hill CHAPTER 3 Data Description.
Quality Assurance & Quality Control Kristin Vanderbilt, Ph.D. Sevilleta LTER.
Module 6. Data Management Plans  Definitions ◦ Quality assurance ◦ Quality control ◦ Data contamination ◦ Error Types ◦ Error Handling  QA/QC best practices.
1 1 Slide Slides Prepared by JOHN S. LOUCKS St. Edward’s University © 2002 South-Western/Thomson Learning.
Biostatistics Class 1 1/25/2000 Introduction Descriptive Statistics.
Ecoinformatics Workshop Summary SEEK, LTER Network Main Office University of New Mexico Aluquerque, NM.
Managing and Curating Data Chapter 8. Introduction Data organization Data management Data curation Raw data is required to repeat a scientific study Any.
DEVA Data Management Workshop Devil’s Hole Pupfish Project Quality Assurance.
Why do we need to do it? What are the basic tools?
1 1 Slide Simple Linear Regression Estimation and Residuals Chapter 14 BA 303 – Spring 2011.
Chapter 3, Part B Descriptive Statistics: Numerical Measures n Measures of Distribution Shape, Relative Location, and Detecting Outliers n Exploratory.
1 1 Slide © 2003 South-Western/Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Quality Assurance & Quality Control
Quality Control  Statistical Process Control (SPC)
1 Project Quality Management QA and QC Tools & Techniques Lec#10 Ghazala Amin.
Chapter 1 Introduction to Statistics. Section 1.1 Fundamental Statistical Concepts.
10 March 2016Materi ke-3 Lecture 3 Statistical Process Control Using Control Charts.
1 Statistical Analysis - Graphical Techniques Dr. Jerrell T. Stracener, SAE Fellow Leadership in Engineering EMIS 7370/5370 STAT 5340 : PROBABILITY AND.
Describing Data Week 1 The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring What: What was measured? Where:
Week 2 Normal Distributions, Scatter Plots, Regression and Random.
Predicting Energy Consumption in Buildings using Multiple Linear Regression Introduction Linear regression is used to model energy consumption in buildings.
Water quality data - providing information for management
Exploratory Data Analysis
Chapter 13 Simple Linear Regression
STATISTICS AND PROBABILITY IN CIVIL ENGINEERING
COMPLETE BUSINESS STATISTICS
Inference for Least Squares Lines
PCB 3043L - General Ecology Data Analysis.
Description of Data (Summary and Variability measures)
Statistical Methods For Engineers
6-1 Introduction To Empirical Models
Chapter 14 – Correlation and Simple Regression
Joanna Romaniuk Quanticate, Warsaw, Poland
Representation of Data
When You See (This), You Think (That)
Quality Control Lecture 3
Linear Regression and Correlation
(-4)*(-7)= Agenda Bell Ringer Bell Ringer
Basic Training for Statistical Process Control
Descriptive Statistics Civil and Environmental Engineering Dept.
Presentation transcript:

Quality Assurance & Quality Control University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux University of South Carolina Don Edwards

Instruction Materials Primary References Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell Science. Edwards (2000) Ch. 4 Brunt (2000) Ch. 2 Michener (2000) Ch. 7 Dux, J.P Handbook of Quality Assurance for the Analytical Chemistry Laboratory. Van Nostrand Reinhold Company Mullins, E Introduction to Control Charts in the Analytical Laboratory: Tutorial Review. Analyst 119: 369 – 375.

Outline: Define QA/QC QC procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

QA/QC “mechanisms [that] are designed to prevent the introduction of errors into a data set, a process known as data contamination” Brunt 2000

Errors (2 types) Commission: Incorrect or inaccurate data in a dataset Can be easy to find Malfunctioning instrumentation Sensor drift Low batteries Damage Animal mischief Data entry errors Omission Difficult or impossible to find Inadequate documentation of data values, sampling methods, anomalies in field, human errors

Quality Control “mechanisms that are applied in advance, with a priori knowledge to ‘control’ data quality during the data acquisition process” Brunt 2000

Quality Assurance “mechanisms [that] can be applied after the data have been collected, entered in a computer and analyzed to identify errors of omission and commission” graphics statistics Brunt 2000

QA/QC Activities Defining and enforcing standards for formats, codes, measurement units and metadata. Checking for unusual or unreasonable patterns in data. Checking for comparability of values between data sets. Brunt 2000

Outline: Define QA/QC QC procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Flowering Plant Phenology – Data Entry Form Design Four sites, each with 3 transects Each species will have phenological class recorded

PlantLife Stage ______________ _______________ What’s wrong with this data sheet? Data Collection Form Development:

PlantLife Stage ardiP/G V B FL FR M S D NP arpuP/G V B FL FR M S D NP atcaP/G V B FL FR M S D NP bamuP/G V B FL FR M S D NP zigrP/G V B FL FR M S D NP P/G V B FL FR M S D NP PHENOLOGY DATA SHEET Collectors:_________________________________ Date:___________________ Time:_________ Location: black butte, deep well, five points, goat draw Transect: Notes: _________________________________________ P/G = perennating or germinatingM = dispersing V = vegetatingS = senescing B = buddingD = dead FL = floweringNP = not present FR = fruiting

Outline: Define QA/QC QC procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Other methods for preventing data contamination Double-keying of data by independent data entry technicians followed by computer verification for agreement Randomly check 10% of data to calculate frequency of errors Use text-to-speech program to read data back Filters for “illegal” data Computer/database programs Legal range of values Sanity checks Edwards 2000

Table of Possible Values and Ranges Raw Data File Illegal Data Filter Report of Probable Errors Flow of Information in Filtering “Illegal” Data Edwards 2000

Illegal-data filter in SAS Data Checkum; Set all; Message=repeat(“ ”,39); If Y1 1 then do; message=“Y1 is not on the interval [0,1]”; output; end; If Y3>Y4 then do; message=“Y3 is larger than Y4”; output; end; …….(add as many checks as desired) If message NE repeat(“ ”, 39); Keep ID message; Proc Print Data=Checkum;

Outline: Define QA/QC QC procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Identifying Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER

Identification of Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER

QA/QC in the Lab: Using Control Charts

Laboratory quality control using statistical process control charts: Determine whether analytical system is “in control” by examining Mean Variability (range)

All control charts have three basic components: a centerline, usually the mathematical average of all the samples plotted. upper and lower statistical control limits that define the constraints of common cause variations. performance data plotted over time.

Time Mean LCL UCL X-Bar Control Chart

Constructing an X-Bar control chart Each point represents a check standard run with each group of 20 samples, for example UCL = mean + 3*standard deviation

Things to look for in a control chart: The point of making control charts is to look at variation, seeking patterns or statistically unusual values. Look for: 1 data point falling outside the control limits 6 or more points in a row steadily increasing or decreasing 8 or more points in a row on one side of the centerline 14 or more points alternating up and down

Linear trend

Outline: Define QA/QC QC procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Outliers An outlier is “an unusually extreme value for a variable, given the statistical model in use” The goal of QA is NOT to eliminate outliers! Rather, we wish to detect unusually extreme values. Edwards 2000

Outlier Detection “the detection of outliers is an intermediate step in the elimination of [data] contamination” Attempt to determine if contamination is responsible and, if so, flag the contaminated value. If not, formally analyse with and without “outlier(s)” and see if results differ. Edwards 2000

Methods for Detecting Outliers Graphics Scatter plots Box plots Histograms Normal probability plots Formal statistical methods Grubbs’ test Edwards 2000

X-Y scatter plots of gopher tortoise morphometrics Michener 2000

Example of exploratory data analysis using SAS Insight® Michener 2000

Box Plot Interpretation Upper quartile Median Lower quartile Upper adjacent value Pts. > upper adjacent value Lower adjacent value Pts. < lower adjacent value Inter- quartile range

Box Plot Interpretation Inter- quartile range IQR = Q(75%) – Q(25%) Upper adjacent value = largest observation < (Q(75%) + (1.5 X IQR)) Lower adjacent value = smallest observation > (Q(25%) - (1.5 X IQR))

Box Plots Depicting Statistical Distribution of Soil Temperature

Statistical tests for outliers assume that the data are normally distributed…. CHECK THIS ASSUMPTION!

Normal density and Cumulative Distribution Functions Edwards 2000

Normal Plot of 30 Observations from a Normal Distribution Edwards 2000

Normal Plots from Non-normally Distributed Data

Grubbs’ Text is a Statistical Test to determine if a point is an outlier.

Grubbs’ test for outlier detection: T n = (Y n – Y bar )/S where Y n is the possible outlier, Y bar is the mean of the sample, and S is the standard deviation of the sample Contamination exists if T n is greater than T.01[n]

Example of Grubbs’ test for outliers: rainfall in acre-feet from seeded clouds (Simpson et al. 1975) T 26 = > 3.029; Contaminated Edwards 2000 But Grubbs’ test is sensitive to non-normality……

Edwards 2000 Checking Assumptions on Rainfall Data Contaminated Normally distributed

Simple Linear Regression: check for model-based…… Outliers Influential (leverage) points

Influential points in simple linear regression: A “leverage” point is a point with an unusual regressor value that has more weight in determining regression coefficients than the other data values. Edmonds 2000

Edwards 2000 Influential Data Points in a Simple Linear Regression

Edwards 2000 Influential Data Points in a Simple Linear Regression

Edwards 2000 Influential Data Points in a Simple Linear Regression

Edwards 2000 Influential Data Points in a Simple Linear Regression

Edwards 2000 Brain weight vs. body weight, 63 species of terrestrial mammals Leverage pts. “Outliers”

Edwards 2000 Logged brain weight vs. logged body weight “Outliers”

Outliers in simple linear regression Observation 62

Outliers: identify using studentized residuals Contamination may exist if |r i | > t  /2, n-3  = 0.01

Simple linear regression: Outlier identification n = 86 t  /2,83 = 1.98

Simple linear regression: detecting leverage points h i = (1/n) + (x i – x) 2 /(n-1)S x 2 A point is a leverage point if h i > 4/n, where n is the number of points used to fit the regression

Regression with leverage point: Soil nitrate vs. soil moisture

Regression without leverage point Observation 46

Output from SAS: Leverage points n = 336 h i cutoff value: 4/336=0.012

Process from raw data to verified data to validated data (reviewed by a qualified scientist) is called data maturity, and implies increasing confidence in the quality of the data through time

Data Quality (Confidence in the Data) Data Maturity Raw Data Verified Data Validated Data

Questions?