Quality Assurance & Quality Control

Slides:



Advertisements
Similar presentations
Statistical basics Marian Scott Dept of Statistics, University of Glasgow August 2010.
Advertisements

Summary Statistics/Simple Graphs in SAS/EXCEL/JMP.
Basics of Biostatistics for Health Research Session 2 – February 14 th, 2013 Dr. Scott Patten, Professor of Epidemiology Department of Community Health.
Design of Experiments Lecture I
I OWA S TATE U NIVERSITY Department of Animal Science Using Basic Graphical and Statistical Procedures (Chapter in the 8 Little SAS Book) Animal Science.
Quartiles  Divide data sets into fourths or four equal parts. Smallest data value Q1Q2Q3 Largest data value 25% of data 25% of data 25% of data 25% of.
CHAPTER 2 Building Empirical Model. Basic Statistical Concepts Consider this situation: The tension bond strength of portland cement mortar is an important.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Introduction to Control Charts.
LECTURE 3 Introduction to Linear Regression and Correlation Analysis
Descriptive Statistics In SAS Exploring Your Data.
Regression Diagnostics Using Residual Plots in SAS to Determine the Appropriateness of the Model.
8-1 Quality Improvement and Statistics Definitions of Quality Quality means fitness for use - quality of design - quality of conformance Quality is.
Lecture 20 Simple linear regression (18.6, 18.9)
Quantitative Business Analysis for Decision Making Simple Linear Regression.
Statistics 800: Quantitative Business Analysis for Decision Making Measures of Locations and Variability.
Total Quality Management BUS 3 – 142 Statistics for Variables Week of Mar 14, 2011.
Control charts : Also known as Shewhart charts or process-behaviour charts, in statistical process control are tools used to determine whether or not.
Slide 1 SOLVING THE HOMEWORK PROBLEMS Simple linear regression is an appropriate model of the relationship between two quantitative variables provided.
Chapter 13 Quality Control and Improvement COMPLETE BUSINESS STATISTICSby AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN 7th edition. Prepared by Lloyd Jaisingh,
Quality Assessment 2 Quality Control.
How to Analyze Data? Aravinda Guntupalli. SPSS windows process Data window Variable view window Output window Chart editor window.
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 12 Describing Data.
1 Statistical Analysis - Graphical Techniques Dr. Jerrell T. Stracener, SAE Fellow Leadership in Engineering EMIS 7370/5370 STAT 5340 : PROBABILITY AND.
Census A survey to collect data on the entire population.   Data The facts and figures collected, analyzed, and summarized for presentation and.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
Choosing and using statistics to test ecological hypotheses
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 1 Slide Simple Linear Regression Part A n Simple Linear Regression Model n Least Squares Method n Coefficient of Determination n Model Assumptions n.
© Copyright McGraw-Hill CHAPTER 3 Data Description.
Quality Assurance & Quality Control Kristin Vanderbilt, Ph.D. Sevilleta LTER.
Module 6. Data Management Plans  Definitions ◦ Quality assurance ◦ Quality control ◦ Data contamination ◦ Error Types ◦ Error Handling  QA/QC best practices.
1 1 Slide Slides Prepared by JOHN S. LOUCKS St. Edward’s University © 2002 South-Western/Thomson Learning.
Biostatistics Class 1 1/25/2000 Introduction Descriptive Statistics.
AP Stat Review Descriptive Statistics Grab Bag Probability
PCB 3043L - General Ecology Data Analysis. OUTLINE Organizing an ecological study Basic sampling terminology Statistical analysis of data –Why use statistics?
Ecoinformatics Workshop Summary SEEK, LTER Network Main Office University of New Mexico Aluquerque, NM.
Managing and Curating Data Chapter 8. Introduction Data organization Data management Data curation Raw data is required to repeat a scientific study Any.
Quality Assurance & Quality Control University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux University of South Carolina Don.
DEVA Data Management Workshop Devil’s Hole Pupfish Project Quality Assurance.
1 1 Slide Simple Linear Regression Estimation and Residuals Chapter 14 BA 303 – Spring 2011.
Chapter 3, Part B Descriptive Statistics: Numerical Measures n Measures of Distribution Shape, Relative Location, and Detecting Outliers n Exploratory.
1 1 Slide © 2003 South-Western/Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
PCB 3043L - General Ecology Data Analysis.
Quality Control  Statistical Process Control (SPC)
1 Project Quality Management QA and QC Tools & Techniques Lec#10 Ghazala Amin.
Chapter 1 Introduction to Statistics. Section 1.1 Fundamental Statistical Concepts.
10 March 2016Materi ke-3 Lecture 3 Statistical Process Control Using Control Charts.
1 Statistical Analysis - Graphical Techniques Dr. Jerrell T. Stracener, SAE Fellow Leadership in Engineering EMIS 7370/5370 STAT 5340 : PROBABILITY AND.
Chapter 6: Descriptive Statistics. Learning Objectives Describe statistical measures used in descriptive statistics Compute measures of central tendency.
AP Statistics Review Day 1 Chapters 1-4. AP Exam Exploring Data accounts for 20%-30% of the material covered on the AP Exam. “Exploratory analysis of.
Describing Data Week 1 The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring What: What was measured? Where:
Week 2 Normal Distributions, Scatter Plots, Regression and Random.
Predicting Energy Consumption in Buildings using Multiple Linear Regression Introduction Linear regression is used to model energy consumption in buildings.
Water quality data - providing information for management
Exploratory Data Analysis
BAE 6520 Applied Environmental Statistics
Inference for Least Squares Lines
PCB 3043L - General Ecology Data Analysis.
Description of Data (Summary and Variability measures)
Bar graphs are used to compare things between different groups
Statistical Methods For Engineers
Numerical Measures: Skewness and Location
Representation of Data
Basic Practice of Statistics - 3rd Edition
Basic Practice of Statistics - 3rd Edition
Quality Control Lecture 3
Essentials of Statistics for Business and Economics (8e)
Basic Training for Statistical Process Control
Presentation transcript:

Quality Assurance & Quality Control

References Primary References Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell Science. Edwards (2000) Ch. 4 Brunt (2000) Ch. 2 Michener (2000) Ch. 7 Dux, J.P. 1986. Handbook of Quality Assurance for the Analytical Chemistry Laboratory. Van Nostrand Reinhold Company Mullins, E. 1994. Introduction to Control Charts in the Analytical Laboratory: Tutorial Review. Analyst 119: 369 – 375. Grubbs, Frank (February 1969), Procedures for Detecting Outlying Observations in Samples, Technometrics, Vol. 11, No. 1, pp. 1-21.

Outline: Define QA/QC QC procedures QA procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

QA/QC “mechanisms [that] are designed to prevent the introduction of errors into a data set, a process known as data contamination” Brunt 2000

Errors (2 types) Commission: Incorrect or inaccurate data in a dataset Can be easy to find Malfunctioning instrumentation Sensor drift Low batteries Damage Animal mischief Data entry errors Omission Difficult or impossible to find Inadequate documentation of data values, sampling methods, anomalies in field, human errors

Quality Control “mechanisms that are applied in advance, with a priori knowledge to ‘control’ data quality during the data acquisition process” Brunt 2000

Quality Assurance “mechanisms [that] can be applied after the data have been collected, entered in a computer and analyzed to identify errors of omission and commission” graphics statistics Brunt 2000

QA/QC Activities Defining and enforcing standards for formats, codes, measurement units and metadata. Checking for unusual or unreasonable patterns in data. Checking for comparability of values between data sets. Brunt 2000 QA/QC encompasses a suite of activities.

Outline: Define QA/QC QC procedures QA procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Flowering Plant Phenology – Data Entry Form Design Four sites, each with 3 transects Each species will have phenological class recorded

What’s wrong with this data sheet? Data Collection Form Development: What’s wrong with this data sheet? Plant Life Stage ______________ _______________

Plant Life Stage PHENOLOGY DATA SHEET Collectors:_________________________________ Date:___________________ Time:_________ Location: black butte, deep well, five points, goat draw Transect: 1 2 3 Notes: _________________________________________ Plant Life Stage ardi P/G V B FL FR M S D NP arpu P/G V B FL FR M S D NP atca P/G V B FL FR M S D NP bamu P/G V B FL FR M S D NP zigr P/G V B FL FR M S D NP P/G V B FL FR M S D NP P/G = perennating or germinating M = dispersing V = vegetating S = senescing B = budding D = dead FL = flowering NP = not present FR = fruiting

Collectors Troy Maddux Date: 16 May 1991 Time: 13:12 PHENOLOGY DATA ENTRY Collectors Troy Maddux Date: 16 May 1991 Time: 13:12 Location: Deep Well Transect: 1 Notes: Cloudy day, 3 gopher burrows on transect ardi P/G V B FL FR M S D NP Y N arpu P/G V B FL FR M S D NP Y N asbr P/G V B FL FR M S D NP Y N deob P/G V B FL FR M S D NP Y N

Outline: Define QA/QC QC procedures QA procedures Designing data sheets Data entry using validation rules, filters and lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Validation Rules: Control the values that a user can enter into a field

Validation Rule Examples: >= 10 Between 0 and 100 Between #1/1/70# and Date()

Validation rules in MS Access: Enter in Table Design View >=0 and <= 100

Look-up Fields Display a list of values from which entry can be selected

Look-up Tables in MS Access: Enter in Table Design View

Macros Validate data based on conditional statements

You want to make sure that a value for vegetation cover is entered in every record. To do this, create a macro called “NoData” that will examine the contents of the cover field whenever the field is exited.

This is what the “NoData” Macro looks like. IsNull([Forms]![Vegetation_data]![Cover]) This is what the “NoData” Macro looks like.

Attach the macro to the “On Exit Event” in the Text Box: Cover Properties Window

When the user tabs out of the “cover” field without entering data, this message box flashes to the screen.

Other methods for preventing data contamination Double-keying of data by independent data entry technicians followed by computer verification for agreement Use text-to-speech program to read data back Filters for “illegal” data Computer/database programs Legal range of values Sanity checks Edwards 2000

Flow of Information in Filtering “Illegal” Data Raw Data File Illegal Data Filter Table of Possible Values and Ranges Report of Probable Errors Edwards 2000

Data set ‘tree’ A filter written in SAS Error report Data Checkum; Set tree; Message=repeat(“ ”,39); If cover<0 or cover>100 then do; message=“cover is not in the interval [0,100]”; output; end; If dbh_1998>dbh_1999 then do; message=“dbh_1998 is larger than dbh_1999”; output; end; If message NE repeat(“ ”, 39); Keep ID message; Proc Print Data=Checkum; A filter written in SAS Error report Obs ID message 1 a dbh_1998 is larger than dbh_1999 2 b cover is not in the interval [0,100] 3 c dbh_1998 is larger than dbh_1999 4 d cover is not in the interval [0,100]

Outline: Define QA/QC QC procedures QA procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Identifying Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER

Identification of Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER

QA/QC in the Lab: Using Control Charts

Laboratory quality control using statistical process control charts: Determine whether analytical system is “in control” by examining Mean Variability (range)

All control charts have three basic components: a centerline, usually the mathematical average of all the samples plotted. upper and lower statistical control limits that define the constraints of common cause variations. performance data plotted over time.

X-Bar Control Chart UCL Mean LCL Time

Constructing an X-Bar control chart Each point represents a check standard run with each group of 20 samples, for example UCL = mean + 3*standard deviation

Things to look for in a control chart: The point of making control charts is to look at variation, seeking patterns or statistically unusual values. Look for: 1 data point falling outside the control limits 6 or more points in a row steadily increasing or decreasing 8 or more points in a row on one side of the centerline 14 or more points alternating up and down

Linear trend

Outline: Define QA/QC QC procedures QA procedures Designing data sheets Data entry using validation rules, filters, lookup tables QA procedures Graphics and Statistics Outlier detection Samples Simple linear regression

Outliers An outlier is “an unusually extreme value for a variable, given the statistical model in use” The goal of QA is NOT to eliminate outliers! Rather, we wish to detect unusually extreme values. Edwards 2000

Outlier Detection “the detection of outliers is an intermediate step in the elimination of [data] contamination” Attempt to determine if contamination is responsible and, if so, flag the contaminated value. If not, formally analyse with and without “outlier(s)” and see if results differ.

Methods for Detecting Outliers Graphics Scatter plots Box plots Histograms Normal probability plots Formal statistical methods Grubbs’ test Edwards 2000

X-Y scatter plots of gopher tortoise morphometrics Michener 2000

Example of exploratory data analysis using SAS Insight® Michener 2000

Box Plot Interpretation Pts. > upper adjacent value Upper adjacent value Upper quartile Inter-quartile range Median Lower quartile Lower adjacent value Pts. < lower adjacent value

Box Plot Interpretation IQR = Q(75%) – Q(25%) Upper adjacent value = largest observation < (Q(75%) + (1.5 X IQR)) Lower adjacent value = smallest observation > (Q(25%) - (1.5 X IQR)) Extreme outlier: > 3 X IQR beyond upper or lower adjacent values Inter-quartile range

Box Plots Depicting Statistical Distribution of Soil Temperature

Statistical tests for outliers assume that the data are normally distributed…. CHECK THIS ASSUMPTION!

Normal density and Cumulative Distribution Functions Edwards 2000

Normal Plot of 30 Observations from a Normal Distribution Edwards 2000

Normal Plots from Non-normally Distributed Data Edwards 2000

Grubbs’ test for outlier detection in a univariate data set: Tn = (Yn – Ybar)/S where Yn is the possible outlier, Ybar is the mean of the sample, and S is the standard deviation of the sample Contamination exists if Tn is greater than T.01[n]

But Grubbs’ test is sensitive to non-normality…… Example of Grubbs’ test for outliers: rainfall in acre-feet from seeded clouds (Simpson et al. 1975) 4.1 7.7 17.5 31.4 32.7 40.6 92.4 115.3 118.3 119.0 129.6 198.6 200.7 242.5 255.0 274.7 274.7 302.8 334.1 430.0 489.1 703.4 978.0 1656.0 1697.8 2745.6 T26 = 3.539 > 3.029; Contaminated Edwards 2000 But Grubbs’ test is sensitive to non-normality……

Checking Assumptions on Rainfall Data Contaminated Normally distributed Edwards 2000

Simple Linear Regression: check for model-based…… Outliers Influential (leverage) points

Influential points in simple linear regression: A “leverage” point is a point with an unusual regressor value that has more weight in determining regression coefficients than the other data values. An “outlier” is an observation with a response value that does not fit the X-Y pattern found in the rest of the data. Edmonds 2000

Influential Data Points in a Simple Linear Regression Edwards 2000

Influential Data Points in a Simple Linear Regression Edwards 2000

Influential Data Points in a Simple Linear Regression Edwards 2000

Influential Data Points in a Simple Linear Regression Edwards 2000

Brain weight vs. body weight, 63 species of terrestrial mammals Leverage pts. “Outliers” Edwards 2000

Logged brain weight vs. logged body weight “Outliers” Edwards 2000

Outliers in simple linear regression Observation 62

Outliers: identify using studentized residuals Contamination may exist if |ri| > t /2, n-3  = 0.01

Simple linear regression: Outlier identification

Simple linear regression: detecting leverage points hi = (1/n) + (xi – x)2/(n-1)Sx2 A point is a leverage point if hi > 4/n, where n is the number of points used to fit the regression

Regression with leverage point: Soil nitrate vs. soil moisture

Regression without leverage point Observation 46

Output from SAS: Leverage points hi cutoff value: 4/336=0.012

Questions?