Module 2: Statistical Approaches to Assess the Current Performance of a Group Michael Crow (Independent Consultant to U.S. EPA) For the Workshop: Using.

Presentation transcript:

Module 2: Statistical Approaches to Assess the Current Performance of a Group Michael Crow (Independent Consultant to U.S. EPA) For the Workshop: Using Essential Statistics for Effective Program Management Presented at: 2008 Symposium on Innovating for Sustainable Results Chapel Hill, North Carolina January 7, 2008

Overview This module prepares users to design and analyze the results of single random samples, assessing the performance of groups of facilities at a particular point in time. The module covers: Key statistical concepts Statistical points to remember ERP statistical resources Tour of ERP spreadsheet tools Hands-on exercises: Single-sample analysis

Key Concepts: One-Sample Analysis “Observed” proportions Margin of error/confidence interval Standard deviation Confidence level To help illustrate these concepts, we will use an imaginary ERP for gas stations being conducted by the state of “Massasota”

Imaginary Gas Station ERP in Massasota Massasota is a state where regulators believe all facilities are below average The state is conducting a multimedia ERP for gas stations with underground storage tanks Massasota has 5,000 such facilities The state conducts 90 baseline inspections to estimate sector performance at the start of ERP Inspectors use a detailed checklist Over 200 individual potential answers Focus on the answers for 20 Environmental Business Practice Indicators (EBPIs)

How Massasota Describes Inspector Findings Inspectors grade each facility on whether it is achieving its EBPIs Massasota calculates “observed proportions” of facilities achieving EBPIs E.g., 50% of gas stations in the sample are in full compliance with leak detection requirements Measured by the “leak detection” EBPI that rolls up all leak detection-related responses Called “observed” proportions because the results were actually measured, not inferred through statistics

Illustration of an Observed Proportion 50% of gas stations in Massasota’s sample are in compliance with leak detection requirements

Margin of Error and Confidence Interval The observed proportion of 50% is accurate if we are only talking about the sample For the population as a whole, there’s error associated with the result We can’t be sure these sampled facilities are exactly representative of the population The margin of error and confidence interval express the uncertainty associated with that observed proportion

Massasota’s Margin of Error/Confidence Interval, Leak Detection EBPI Let’s say Massasota’s margin of error for the leak detection EBPI is +/-10 percentage points. Therefore, we believe the percentage of all gas stations in full compliance with leak detection requirements is 50%, +/-10 percentage points. Alternatively, we can say that we believe the percentage is between 40% and 60%. The range of 40% to 60% is called the confidence interval; i.e., the interval that we are fairly confident contains the true population proportion. The width of the confidence interval is twice the margin of error, which is why the margin of error is sometimes called “the half-width of the confidence interval”
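As a rough illustration of where a margin of error like Massasota's could come from, the sketch below applies the textbook normal-approximation formula for a proportion to the 90-inspection sample. The formula choice, the z-value of 1.96 (95% confidence), and the function names are assumptions for illustration; the slides do not state which method the ERP tools use.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat, n, z=1.96):
    """Return (low, high) bounds, clipped to the valid 0-1 range."""
    moe = margin_of_error(p_hat, n, z)
    return max(0.0, p_hat - moe), min(1.0, p_hat + moe)

# Massasota's example: 50% observed compliance in a sample of 90 facilities.
low, high = confidence_interval(0.50, 90)
print(f"Confidence interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the result is close to the +/-10 percentage points used in the slides.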

Illustration of Margin Of Error/Confidence Interval 50% of gas stations in Massasota’s ERP population, +/- approximately 10 percentage points, are in full compliance with leak detection requirements

Choosing the Right Confidence Interval for You As discussed later in this module, you can influence the size of your confidence interval. For instance, you can increase the number of your inspections to get a smaller confidence interval. Questions to think about: Confidence interval may seem to be a wide range, but is it tight enough to make decisions? Would your actions be different if the true population proportion was 40% versus 60%? Now that you understand confidence intervals, time for a joke...

The Flexible Confidence Interval Confidence intervals can be established for many kinds of measures and levels of analysis. E.g., Indicator proportions (already discussed) Indicator means (covered in next two slides) Group compliance score E.g., average facility achieved 78% of relevant, compliance- related EBPIs, +/-12 percentage points Outcome measure E.g., reduction of 2,000 tons of VOC emissions (+/-250 tons), as a result of improved compliance with vapor recovery requirements Certification accuracy E.g., an estimated 68%-76% of certification responses agree with inspector findings

Confidence Intervals for Means For simplicity, our training focuses on proportions, which are used for yes/no questions E.g., is the facility in compliance with all leak detection requirements? Sample proportion = Y / (Y + N), where Y and N are the counts of “yes” and “no” answers Population result: 50% compliance, +/-10 percentage points (or, 40%-60%) Confidence intervals can also be developed for means (averages) of quantities E.g., what is the facility’s average monthly generation of hazardous waste? Mean = total of all monthly averages / number of facilities in sample Population result: 158 pounds per month, +/-26 pounds (or, 132-184 pounds per month)

Standard Deviation (for Means) Confidence interval for mean requires: Mean (average) of all sample observations Standard deviation of all sample observations Standard deviation is a measure of variability among observations Tightly packed around the mean? Or widely distributed? Easily calculated in Microsoft Excel or stat packages
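The ingredients listed above (sample mean and sample standard deviation) can be combined into a confidence interval for a mean. The following is a minimal sketch using Python's standard library; the waste figures are hypothetical, and the use of a normal critical value is an assumption that is reasonable for larger samples (small samples would normally use a t-value instead).

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, margin_of_error) using a normal critical value."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * sd / math.sqrt(n)

# Hypothetical monthly hazardous waste generation (pounds) for ten facilities.
monthly_waste_lbs = [120, 150, 180, 200, 140, 160, 170, 130, 190, 140]
mean, moe = mean_confidence_interval(monthly_waste_lbs)
print(f"Mean: {mean:.0f} lbs/month, +/-{moe:.0f} lbs")
```

The more widely the observations are spread around the mean, the larger the standard deviation and the wider the resulting interval.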

Confidence Level Confidence you have that the confidence interval includes the true population performance E.g., that the percentage of all Massasota’s gas stations in full compliance with leak detection requirements is actually between 40% and 60% You choose the level you want: 90% or 95% or 99% Each increase leads to much larger sample sizes 95% most common in ERP to date 99% uncommon in this field

Confidence Level In our prior example, we might say… “We have 95% confidence that the percentage of Massasota gas stations in compliance with leak detection requirements is between 40% and 60%.” OR “We are 90% confident that the percentage of Massasota gas stations in compliance with leak detection requirements is between 41.5% and 58.5%.”

Confidence Level (Continued) 90% means the interval for 9 out of 10 samples, on average, will include the true answer Statisticians consider this a low, but acceptable, confidence level Much better than no understanding of uncertainty Confidence interval wrong 10% of the time Might be best for smaller samples 95% means the interval for 19 out of 20 samples, on average, will include the true answer Confidence interval wrong 5% of the time Half the error rate of the 90% level, but interval is wider Most ERPs have used 95% confidence level Error rates may seem large, but only alternative is census
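To see how the choice of confidence level changes the interval for the same sample, the sketch below computes the margin of error at 90%, 95%, and 99% for a 50% observed proportion from 90 inspections. It assumes the normal-approximation formula for a proportion; the function names are illustrative and the ERP tools may calculate this differently.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(p_hat, n, confidence):
    """Normal-approximation margin of error for a proportion."""
    return z_value(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)

for level in (0.90, 0.95, 0.99):
    moe = margin_of_error(0.50, 90, level)
    print(f"{level:.0%} confidence: +/-{moe * 100:.1f} percentage points")
```

The same data give a narrower interval at 90% than at 95%, which is the trade-off the slide describes: lower confidence buys a tighter range.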

Illustrating the Meaning of a 90% Confidence Level Imagine Massasota undertook 10 separate random samples to estimate baseline performance of its gas stations. Each yellow bar represents the confidence interval associated with the leak detection EBPI for that sample. The line in the middle of the yellow bar indicates the midpoint of the confidence interval. The black line running up from the bottom of the page shows the true population proportion. In Massasota, let’s imagine that the true percentage of all gas stations achieving full compliance with leak detection requirements is 42%. You can see that the confidence interval for one of Massasota’s 10 samples does not contain 42% (figure callout: “Confidence interval for this sample does not contain population proportion”). With a 90% confidence level, there’s a 10% chance that Massasota would make decisions based upon an incorrect confidence interval like this one.
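The repeated-sampling idea in this illustration can be simulated. The sketch below draws many random samples of 90 from an imaginary population of 5,000 facilities in which 42% comply, builds a 90% interval for each, and counts how often the interval contains 42%. The interval formula (normal approximation with z = 1.645), the number of trials, and the random seed are assumptions for illustration only.

```python
import math
import random

def coverage_rate(true_p=0.42, population_size=5000, n=90, z=1.645,
                  trials=2000, seed=1):
    """Share of simulated samples whose interval contains the true proportion."""
    rng = random.Random(seed)
    compliant = round(true_p * population_size)
    # 1 = facility in compliance, 0 = not in compliance
    population = [1] * compliant + [0] * (population_size - compliant)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # sampling without replacement
        p_hat = sum(sample) / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the true proportion: {rate:.1%}")
```

The share should land near 90%, not exactly on it, because the approximation is imperfect and the simulation itself is subject to sampling variation.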

Statistical Points to Remember Statistics has surprising economies of scale Higher confidence requires more inspections Smaller “effective” sample sizes can decrease the confidence associated with results

Surprising Economies of Scale with Statistics For larger and larger population sizes, the incremental number of inspections needed is proportionally fewer E.g., the sample size required for a margin of error of approximately +/-12 percentage points is: 50, for 200 facilities; 64, for 2,000 facilities; and 66, for 20,000 facilities
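One way to check the economies of scale described here is to compute the margin of error for each population/sample pair on the slide. The sketch assumes the normal approximation with a finite population correction, a 95% confidence level, and a worst-case proportion of 50%; the slides do not say which formula produced their figures, so treat this as a plausibility check only.

```python
import math

def margin_of_error_fpc(n, population_size, p=0.5, z=1.96):
    """Margin of error for a proportion with a finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error * fpc

# Population size and sample size pairs taken from the slide.
for population_size, n in [(200, 50), (2000, 64), (20000, 66)]:
    moe = margin_of_error_fpc(n, population_size)
    print(f"{n} of {population_size}: +/-{moe * 100:.1f} percentage points")
```

Under these assumptions all three pairs give a margin of error of roughly 12 percentage points, even though the population grows a hundredfold while the sample grows by only 16 facilities.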

Greater Certainty Requires More Inspections A substantial increase in sample size is required—at all population sizes—to reduce margin of error from +/-10 to +/-5 percentage points (at a 95% confidence level). In our example, Massasota would need to increase its sample size from 90 facilities to about 350 facilities. Margin of error can also be reduced by reducing the confidence level.
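The jump in sample size can be sketched with a common planning formula: start from the infinite-population sample size and shrink it with a finite population correction. The formula, the worst-case proportion of 50%, and the function name are assumptions for illustration; the Sample Planner may use different or more conservative calculations.

```python
import math

def required_sample_size(population_size, margin, p=0.5, z=1.96):
    """Inspections needed for a target margin of error (as a fraction, e.g. 0.05)."""
    n_infinite = z ** 2 * p * (1 - p) / margin ** 2
    n_adjusted = n_infinite / (1 + (n_infinite - 1) / population_size)
    return math.ceil(n_adjusted)

# Massasota has 5,000 gas stations.
for margin in (0.10, 0.05):
    n = required_sample_size(5000, margin)
    print(f"+/-{margin * 100:.0f} points at 95% confidence: about {n} inspections")
```

Halving the margin of error roughly quadruples the required sample, which is consistent with the move from about 90 to about 350 facilities described above.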

Smaller “Effective” Sample Sizes Can Reduce Certainty Sometimes, EBPIs won’t apply to all facilities in the population So, you can’t expect them to apply to all facilities in your sample In those cases, your “effective” sample size (which includes only the facilities for which the EBPI is relevant) is the basis for calculating confidence intervals, which will most likely be wider than anticipated

Example: Smaller “Effective” Sample Sizes Can Reduce Certainty Imagine that Massasota found 50% of facilities in compliance with an EBPI related to underground injection control requirements However, those requirements only applied to 20 facilities of the 90 inspected Massasota’s margin of error for this EBPI is +/-20 percentage points (95% confidence level) Larger than the +/-10 percentage points for the leak detection EBPI discussed earlier (also at a 95% confidence level)
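The effect of a smaller effective sample can be sketched by dropping the "not applicable" responses before computing the proportion and its margin of error. The response codes ("Y", "N", "NA"), the function name, and the normal-approximation formula are assumptions for illustration, not the method used in the ERP tools.

```python
import math

def effective_sample_result(responses, z=1.96):
    """Return (effective_n, proportion, margin_of_error), ignoring 'NA' answers."""
    yes = responses.count("Y")
    no = responses.count("N")
    effective_n = yes + no  # only facilities where the EBPI applies
    if effective_n == 0:
        raise ValueError("EBPI did not apply to any sampled facility")
    p_hat = yes / effective_n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / effective_n)
    return effective_n, p_hat, moe

# 90 inspections, but the EBPI applies to only 20 facilities (10 comply).
responses = ["Y"] * 10 + ["N"] * 10 + ["NA"] * 70
effective_n, p_hat, moe = effective_sample_result(responses)
print(f"n = {effective_n}, {p_hat:.0%} compliance, +/-{moe * 100:.0f} percentage points")
```

With this approximation the margin of error comes out a little over 20 percentage points, roughly double the margin for an EBPI that applies to all 90 facilities; the exact figure depends on the method used.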

ERP Statistical Resources EPA and states have created a number of valuable statistical reference documents and tools Reference documents, such as: Generic Guide to Statistical Aspects of Implementing an Environmental Results Program Specific statistical methodologies for multiple states, utilizing varying approaches Implementation tools, such as: Massachusetts DEP’s Omni Analyzer, an automated data storage and analysis system EPA’s ERP statistical spreadsheet tools, the Sample Planner and Results Analyzer (used in this training) Most are available at

ERP Statistical Tools Intent Learn while playing with the numbers Answer real questions people ask in ERP User-friendly for novice Ubiquitous platform: no purchase required Conservative assumptions Spreadsheets can be readily retrofitted and automated for a particular state

ERP Stat Tools: Questions Addressed by the Sample Planner How many inspections do I need to do? How confident will I be in data from X inspections?

ERP Stat Tools: Questions Addressed by the Results Analyzer What’s the confidence interval around my result? Did performance improve over time? How much? How are facilities in my state performing relative to another state?

ERP Statistical Tools Tour Let’s take a tour of the one-sample pages…

For more information… Michael Crow Work: Mobile: Fax: