How Many Samples do I Need? Part 2

Presentation transcript:

How Many Samples do I Need? Part 2. DQO Training Course, Day 1, Module 5. Presenter: Sebastian Tindall. 60 minutes (15 minute 2nd Afternoon Break).

Summary
- Use the classical statistical sampling approach: MASSIVE DATA required; very likely to fail to get representative data in most cases.
- Use other statistical sampling approaches (Bayesian, geostatistics, kriging): MASSIVE DATA required.
- Use the M-Cubed approach: based on massive FAM.
- Use the multi-increment sampling approach: can use classical statistics; cheaper; faster; defensible; restricted to surfaces (soils, sediments, etc.).

Use the classical statistical sampling approach: very likely to fail to get representative data in most cases, except if [µ – AL] ≥ 3σ; then you almost never fail (run simulations; a sketch follows).
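The "run simulations" suggestion is easy to sketch. Here is a minimal version with hypothetical numbers, assuming roughly normal data: when the true mean sits 3σ below the action level, even a modest number of samples essentially never produces a false exceedance.

```python
import numpy as np

# Hypothetical values: action level 100, true mean 70, sigma 10, so |mu - AL| = 3*sigma
rng = np.random.default_rng(0)
AL, mu, sigma, n = 100.0, 70.0, 10.0, 8

# Simulate 100,000 sampling events of n samples each and compare means to the action level
xbar = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)
print((xbar >= AL).mean())  # essentially 0.0: the site almost never falsely "fails"
```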

Uncertainty is Additive! Remember that the uncertainty is additive across all steps in sampling and analysis: Analytical + Sampling & Sub-sampling + Natural heterogeneity of the site = Total Uncertainty. (A small numeric sketch follows.)
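One refinement worth stating: for independent error sources it is the variances that add, so the total standard deviation is the root sum of squares of the components. A minimal sketch with illustrative numbers (all values hypothetical):

```python
import math

# Hypothetical standard deviations for each component of total uncertainty
s_analytical    = 5.0   # lab/instrument error
s_sampling      = 20.0  # sampling and sub-sampling error
s_heterogeneity = 60.0  # natural heterogeneity of the site

# Independent components add in variance (root-sum-of-squares for SDs)
s_total = math.sqrt(s_analytical**2 + s_sampling**2 + s_heterogeneity**2)
print(round(s_total, 1))  # 63.4: total uncertainty is dominated by heterogeneity
```

Note how shrinking the analytical error to zero would barely move the total: the heterogeneity term dominates.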

What is the one phenomenon that causes ALL sampling error? HETEROGENEITY

The SYSTEM functions as if it believes that: Prescriptive Analytical Methods = Data Quality (data uncertainty automatically managed), and Data Quality = Decision Quality (decision uncertainty automatically managed).

Take-Home Message: Non-Representative Sample + Perfect Analytical Chemistry = "BAD" DATA.

Representativeness: a diamond ring vs. costume jewelry. Can an analyst tell the difference? Yes.

Representativeness: a representative soil sample vs. a non-representative soil sample. Can an analyst tell the difference? No.

Sample vs. Analytical Certainty: sampling accounts for roughly 95% of TOTAL ERROR. [Figure: seven sample locations spaced 12 inches apart, each with paired onsite and lab results, e.g., 331 onsite / 286 lab; 1,280 onsite / 1,220 lab; 39,800 onsite / 41,400 lab; 27,800 onsite / 42,800 lab. Results at nearby locations differ by orders of magnitude, while onsite and lab results for the same sample largely agree.]
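The onsite/lab pairing below is reconstructed from the garbled slide text, so treat the numbers as illustrative; the decomposition itself is a standard paired-replicate variance split, and it reproduces the slide's roughly 95% figure.

```python
import numpy as np

# Onsite vs. lab results at nearby locations, as best reconstructed from the slide
pairs = np.array([[331, 286], [500, 416], [164, 136], [1280, 1220],
                  [39800, 41400], [24000, 27700], [27800, 42800]], dtype=float)

between = pairs.mean(axis=1).var(ddof=1)            # location-to-location (sampling) variance
within  = np.mean(np.diff(pairs, axis=1) ** 2) / 2  # onsite-vs-lab (analytical) variance
print(between / (between + within))                 # ~0.95: sampling dominates total error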

Dilemma! None of the equations for the number of samples, the average, or the standard deviation includes a term for size (area or volume). Some guidance suggests 1 sample per 20 cubic yards, but this is indefensible. You must decide on the scale of the decision unit or exposure unit that represents the population of interest, and you must sample within the scale of that decision unit.

Typical Sampling Design: EPA, "Methods for Evaluating the Attainment of Soil Cleanup Standards, Vol. 1" (1989), Equation 6.6. Two recurring problems: the estimate of σ is usually far off or simply unknown, and a normal distribution is often (wrongly) assumed. A sketch of this style of formula follows.
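For concreteness, here is a sketch of the normal-theory sample-size formula of the kind Equation 6.6 embodies; the exact EPA notation may differ, and the function name and numbers are illustrative. The point the slide makes shows up immediately: n scales with σ², so a bad guess at σ corrupts n badly.

```python
import math
from scipy.stats import norm

def n_samples(sigma, mu1, action_level, alpha=0.05, beta=0.20):
    # Normal-theory sample size for testing a site mean against an action level
    z = norm.ppf(1 - alpha) + norm.ppf(1 - beta)
    return math.ceil((z * sigma / (action_level - mu1)) ** 2)

# n grows with sigma squared: doubling a guessed sigma roughly quadruples n
print(n_samples(sigma=10, mu1=90, action_level=100))  # 7
print(n_samples(sigma=20, mu1=90, action_level=100))  # 25
```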

Typical Sampling Design (cont.) 1. It will usually fail to truly capture the heterogeneity of the population(s). 2. It results in large uncertainty which is seldom identified, quantified, or even acknowledged.

Uncertainty. For a normal distribution the mode, median, and mean coincide (Mo = Md = Mn); for a lognormal distribution they differ (Mo ≠ Md ≠ Mn). With a skewed (e.g., lognormal) population, the sample mean falls below the true mean a high percentage of the time when n is small. (Mo = mode, Md = median, Mn = mean.) A simulation sketch follows.
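A quick simulation sketch of that last point, assuming a lognormal population with illustrative parameters: with small n, the sample mean underestimates the true mean well over half the time.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.exp(0.5)  # true mean of a lognormal(0, 1) population
n = 5                  # a small sample

# 100,000 simulated small samples; check how often x-bar falls below the true mean
xbars = rng.lognormal(0.0, 1.0, size=(100_000, n)).mean(axis=1)
print((xbars < mu_true).mean())  # well above 0.5: x-bar usually underestimates the mean
```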

Classical Statistics Burdens. Required for each COPC within each decision unit:
- a reasonably accurate estimate of the PDF (histogram);
- a reasonably accurate estimate of the SD;
- correct selection of the appropriate statistical sampling method (equation);
- correct selection of the appropriate statistical method (equation) for calculating a UCL.
All this is almost never possible and almost never done.

Classical Statistics Burdens. Which UCL to use?
- Student's t UCL
- Approximate Gamma UCL
- Adjusted Gamma UCL
- H-UCL (Land's Method)
- Chebyshev (MVUE) UCL
- CLT UCL
- Adj-CLT UCL (adjusted for skewness)
- Mod-t UCL (adjusted for skewness)
- Jackknife UCL
- Standard Bootstrap UCL
- Bootstrap-t UCL
- Hall's Bootstrap UCL
- Percentile Bootstrap UCL
- BCA Bootstrap UCL
(List of UCLs taken from ProUCL.) A minimal sketch of the first method follows.
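To make the first entry concrete, here is a minimal sketch of the one-sided 95% Student's t UCL, on hypothetical data. Each method in the list rests on different distributional assumptions, which is exactly the burden being described.

```python
import numpy as np
from scipy import stats

x = np.array([3.1, 4.7, 2.8, 5.2, 3.9])  # hypothetical concentration data
n = len(x)

# One-sided 95% Student's t UCL on the mean (valid only if the data are roughly normal)
ucl = x.mean() + stats.t.ppf(0.95, n - 1) * x.std(ddof=1) / np.sqrt(n)
print(round(ucl, 2))
```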

Classical Statistics Burdens. Problem: which UCL to use? Say you want to calculate a UCL on the average rainfall in your area for the purpose of building a dike to protect your town. You obtain data for 5 out of 100 years, enter those 5 data points into ProUCL, and use the 95% UCL it calculates. You build the dike. The next year the river overflows the dike and kills all the townsfolk. What happened? Answer: GIGO; your 5 data points did not include data for heavy rainfall years. (ProUCL uses bootstrap techniques on small data sets. But remember, statistics cannot create information where there is none.) A sketch of this failure mode follows.
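A sketch of the rainfall story with made-up data: a percentile-bootstrap UCL on the mean, computed from five randomly chosen years, says nothing about the extreme years the five-point sample never saw.

```python
import numpy as np

rng = np.random.default_rng(1)
rainfall = rng.gamma(2.0, 15.0, size=100)               # hypothetical 100-year record (skewed)
five_years = rng.choice(rainfall, size=5, replace=False)

# Percentile-bootstrap 95% UCL on the mean, from only five data points
boot_means = rng.choice(five_years, size=(10_000, 5), replace=True).mean(axis=1)
print(np.percentile(boot_means, 95), rainfall.max())    # the UCL is usually far below the worst year
```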

Why Decisions Are Suspect
- Failure to define the population accurately
- Failure to collect representative samples from the population of interest
- Failure to obtain representative data from the population of interest
- Failure to accurately determine the frequency distribution of the COPCs
- Failure to accurately determine the standard deviation of the COPCs
- Failure to select the appropriate statistical method for determining an adequate number of samples
- Failure to use the appropriate UCL in making the decision

Definitions of Representativeness
- A sample collected in such a manner that the sampling error is less than a specified amount.
- A sample of a universe or whole that can be expected to exhibit the average properties of the universe or whole (40 CFR 260.10).
- A sample that answers a question about a population with a specified confidence.
(Sampling for Environmental Activities, Envirostat, 2003)

Definitions of Representativeness. Representativeness expresses the degree to which sample data accurately and precisely represent a characteristic of a population, parameter variations at a sampling point, or an environmental condition. Representativeness is a qualitative parameter which is most concerned with the proper design of the sampling program. The representativeness criterion is best satisfied by making certain that sampling locations are selected properly and a sufficient number of samples are collected. Representativeness is addressed by describing sampling techniques and the rationale used to select sampling locations. (DQOs for Remedial Response Activities: Development Process, US EPA, 1987)

Definitions of Representativeness. A sample is representative when it is taken by a selection method that is both accurate and reproducible. Thus representativeness is characterized by the absence of bias and an acceptable variance. As far as the author is aware, this is the only possible objective and scientific definition of representativeness. (Sampling for Analytical Purposes, Pierre Gy, J. Wiley & Sons, 1998, p. 30)

Definitions of Representativeness. A correct sampling method is always structurally accurate. In addition, its variance is minimal, so its representativeness is maximal. Non-correct sampling is always structurally biased. It may be accurate over short periods, but these cannot be forecast and so are unusable. This makes the tests of accuracy recommended by certain standards (the so-called bias tests) not only useless but also dangerous, as they offer a false sense of security. As well as having a negligible bias, representativeness requires reproducibility, i.e. a minimum variance, which itself depends on the quantitative properties of the sample (e.g. the mass and the number of increments). (Sampling for Analytical Purposes, Pierre Gy, J. Wiley & Sons, 1998, p. 31)

Definitions of Representativeness. A sample is representative when the mean square of the Sampling Error (SE), r²(SE), is not larger than a certain standard of representativeness regarded as acceptable. Representativeness is the sum of the square of the mean of SE, m²(SE), and the variance of SE, s²(SE): r²(SE) = m²(SE) + s²(SE) ≤ r₀²(SE). (Preparation of Soil Sampling Protocols: Sampling Techniques and Strategies, EPA/600/R-92/128, July 1992)

Typical Values of Bias. Primary sample (non-probabilistic): up to 1000%. Secondary sample (probabilistic but incorrect): up to 50% (and probably much more). Analysis: 0.1-1.0%. Thus it is pointless and illusory to report an analytical result to three or four supposedly significant decimal places if the sample analyzed is insufficiently representative, and even more pointless if it is biased. (Sampling for Analytical Purposes, Pierre Gy, J. Wiley & Sons, 1998, p. 32)

Concepts. A set of units is homogeneous when all its units are strictly identical to each other. Homogeneity is an abstract mathematical concept that does not exist in the real, material world. A set is heterogeneous when its units are not all identical to each other. Heterogeneity is the only state in which a set of material units or groups of units can be observed in practice. Heterogeneity is the sole source of all sampling errors; homogeneity is the inaccessible condition of zero heterogeneity. (Sampling for Analytical Purposes, Pierre Gy, J. Wiley & Sons, 1998, pp. 24-25)

Quantity of Data Matters. Why? "Warning: The Statistician General has determined that drawing conclusions from insufficient data may be hazardous to your decisions." Of course, there is no Statistician General, but the point is made.

Sample Size Rules of Thumb. "Samples of less than 10 are usually too small to rely on sample estimates even in 'nice' parametric cases." "In many practical contexts, the number 30 is used as a 'minimum' sample size." M.R. Chernick, Bootstrap Methods: A Practitioner's Guide, 1999, pp. 150-151. In the next few slides we will look at some rules of thumb regarding sample size. Michael Chernick is a biostatistician who has authored a comprehensive treatment of the bootstrap method of statistical resampling; his book provides a bootstrap bibliography with more than 1,600 references. The bootstrap is one of the most powerful of the modern computer-intensive statistical methods, yet even computer-intensive methods need data to work on. If methods that replace one calculation with thousands still need data, then every statistical method needs it. Thus, quantity of data matters. (Chernick, M.R. 1999. Bootstrap Methods: A Practitioner's Guide. John Wiley & Sons, New York, pp. 150-151.)

Sample Size Rules of Thumb. In order to choose a specific classical statistical method (equation), information regarding the distribution of the contaminant within the decision unit is usually required. Such information allows one to select a method and calculate the number of samples needed to meet the specified error tolerances, provided a reasonably accurate estimate of the variance is known. However, certain assumptions must be stated and TESTED in order to show the selected method was appropriate. These tests are performed using data generated from the sampling event, i.e., AFTER sampling has occurred. Herein lies the requirement for roughly 30-50 or more samples: it is usually not possible to make definite statements (e.g., about the frequency distribution) with small sample sizes. If the tests fail, then the sampling results are in jeopardy and the data may be invalidated, which could lead to another round of sampling. (A small simulation sketch of this testing problem follows.)
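The point about post-hoc tests can be sketched as follows: with n around 10, a normality test frequently fails to reject even deliberately non-normal (lognormal) data, so nothing definite can be said about the distribution. Simulation parameters are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, trials, rejections = 10, 2_000, 0

for _ in range(trials):
    x = rng.lognormal(0.0, 1.0, size=n)  # genuinely non-normal data
    stat, p = stats.shapiro(x)           # Shapiro-Wilk normality test
    if p < 0.05:
        rejections += 1

print(rejections / trials)  # well below 1.0: small samples often "pass" normality tests
```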

Sample Size Rules of Thumb (continued). The sampling data are presented graphically (usually as a histogram) in order to assess the distribution of the contaminant. Based on that distribution, a method to calculate a UCL follows. It is inappropriate to calculate a 95% UCL using the method based on a normal distribution if Data Quality Assessment cannot show that the contaminant is distributed normally.

Sample Size Rules of Thumb. "Although it is always dangerous to set 'rules of thumb' for sample sizes, I would suggest that in most cases it would be wise to take n ≥ 50." M.R. Chernick, Bootstrap Methods: A Practitioner's Guide, 1999, p. 151. Dr. Chernick also states why small sample sizes are a problem for bootstrap methods: "A main concern in small samples is that with only a few values to select from, the bootstrap sample will underrepresent the true variability as observations are frequently repeated and the bootstrap samples themselves repeat" (Chernick 1999, p. 150). This problem of a small sample not representing the population is by no means limited to bootstrap methods; it pervades all statistical methods. In essence, it is the problem of being forced to extrapolate when we really want to interpolate.

Sample Size Rules of Thumb. Pierre Gy comes from a different perspective in statistics, having been involved for decades in developing a theory of sampling motivated by his experiences with mining and industry. He says, "For practical purposes it will be assumed here that a 'too small number' is less than 30, and a 'large number' is at least 50." Gy, P. 1998. Sampling for Analytical Purposes. John Wiley & Sons, New York, p. 70.

Sample Size Rules of Thumb. Dr. Richard Gilbert is the author of a widely referenced standard text in the field of environmental statistics. He states, "In practice, there appears to be no simple rule for determining how large n should be…. If the distribution is highly skewed, an n of 50 or more may be required." Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, p. 140.

Quantity of Data Matters. Why? We now look at a somewhat subtle advantage of larger sample sizes. At the time of the publication below, Dr. Lacayo was a Senior Statistician in EPA's Office of Policy, Planning, and Evaluation. He states, "If the sample size is 'large' then most traditional estimators will yield the same conclusions and simple estimators suffice." In other words, take enough samples and simple methods can be used to make defensible decisions. Since people unacquainted with statistical methods are usually involved in cleanup decisions, the advantage of simple methods, which can also be simply explained, should not be overlooked. Ultimately, when we have field methods that can essentially look at each population unit with reasonable accuracy, very simple statistical methods will suffice. Lacayo, H. 1994. Environmental Statistics: Handbook of Statistics, Volume 12. Edited by G.P. Patil and C.R. Rao. Elsevier Science B.V., Amsterdam, p. 891.

"Lacking distribution information, it is impossible to devise an optimal sampling strategy." Jenkins et al. 1996. "Assessment of Sampling Error Associated with Collection and Analysis of Soil Samples at Explosives-Contaminated Sites." U.S. Army Corps of Engineers, Cold Regions Research & Engineering Laboratory, p. 1. http://www.crrel.usace.army.mil/techpub/CRREL_Reports/reports/SR96_15.pdf

Begin with the End in Mind. To answer "How many samples do I need?", work backwards from the end: DATA → spatial distribution of contaminant concentrations in the population → population frequency distribution → correct equation for n (statistical method, with its parameters α, β, σ, Δ) → alternative sample designs → optimal sampling design.

Q: Where do you obtain the contaminant distribution information needed to select the correct sampling design and ensure representativeness? A: From sampling data. Q: How much sampling data do you need? A: It depends upon the consequences of making the wrong decision.

Sample Representativeness. Are we honestly addressing heterogeneity (sampling uncertainty)? Now we are finally able to address this issue, defensibly and affordably: use cheaper analytical technologies that allow you to increase sample density, and use real-time measurements at the site of the sample to support real-time decision-making, IF we are willing to honestly balance analytical uncertainty against overall data uncertainty. Thanks to technologies that are relatively new to the environmental field, we can begin to address the problem of sample representativeness. One aspect of these technologies that allows management of sampling uncertainty is the ability to run many more samples, because per-test costs are lower. Another aspect is that many, although not all, of these technologies can be run in the field. This saves sample preservation, transportation, and storage costs. Most importantly, real-time testing results support real-time decision-making, which offers a whole host of benefits that I do not have time to go into in this talk. Certain field analytical technologies, such as field-portable GC/MS, can be operated as definitively as any lab-based GC/MS. But many field technologies are truly based on screening analytical methods, such as immunoassays, cell receptor assays, or colorimetric kits. And immediately, that is where language begins to cause problems with acceptance, because I have used the word "screening." We first have to overcome many regulatory and perceptual obstacles that limit acceptance of the new technologies; many of these obstacles are built into the language that we use. What I want to do is make you aware of the conceptual traps in our terminology, so we can start using language that avoids ambiguity and leaves no room for misconceptions.

Managing Uncertainty: Systematic Planning + Dynamic Work Plan + Real-Time Measurement Technologies.

Managing Uncertainty. Systematic planning:
- Identify decision goals with a tolerable overall uncertainty
- Identify the major uncertainties (those that cause decision error)
- Identify a strategy to manage each major uncertainty
Use the Field Analytical Method (FAM) and a Dynamic Work Plan (DWP) to effectively manage sampling uncertainty (i.e., ensure sample representativeness).

End of Module 5. Thank you. Questions? We will now take a 15 minute break. Please be back in 15 minutes.