Sampling and Resampling. Risk Analysis for Water Resources Planning and Management. Institute for Water Resources, May 2007.

Learning Objectives: At the end of this session participants will be able to list sampling techniques, estimate a desired sample size, and describe a resample and a bootstrap procedure.

Why Sample? Expense. Time. Measuring the whole population may be impossible or impractical. A sample can yield better data.

We Want Representativeness. [Figure: samples of men (M) and women (W); one labeled Representative, two labeled Not Representative.]

Impediments to Representativeness: Bias: systematic differences between sample and population; can be eliminated by good sample design. Sampling error: unavoidable differences due to chance in the selection of a sample; this is “dumb luck”; it cannot be eliminated or avoided, so we have to address it.

Sampling Techniques: simple random sample; stratified sample; cluster sample; sequential sample; non-probability sample/convenience sample.

Think of Random Like This: 1) Identify each element of the population. 2) Place a unique number from 1 to N on it. 3) Randomly select a number between 1 and N. 4) Find the item with that number and measure it. 5) Replace the number and repeat. 6) If the same number comes up again, ignore it, replace it, and choose again.
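The selection steps above can be sketched in Python. This is a minimal illustration, assuming the population is already labeled 1 to N; the function name `simple_random_sample` and the values N = 501 and n = 43 in the usage line are chosen only for illustration.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct labels from 1..N, ignoring any repeated draw."""
    if n > N:
        raise ValueError("cannot draw more distinct items than the population holds")
    rng = random.Random(seed)
    chosen = []
    seen = set()
    while len(chosen) < n:
        k = rng.randint(1, N)   # randomly select a number between 1 and N (inclusive)
        if k in seen:           # same number came up again: ignore it and choose again
            continue
        seen.add(k)
        chosen.append(k)        # this is the item to find and measure
    return chosen

# Illustrative call: 43 items from a population of 501.
sample = simple_random_sample(N=501, n=43, seed=1)
```

The standard library call `random.sample(range(1, N + 1), n)` produces the same kind of sample in one step; the loop is spelled out here only to mirror the slide.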

How do you take a sample? Objectives of the survey: What question(s) are you trying to answer? What information do you need? Then: identify the target population; obtain a sample frame; choose the sample design; choose the method of measurement; prepare the measurement instrument.

How do you take a sample? (continued) Select and train field workers; pretest; organize field work; organize data management; analyze the data.

How big should a sample be? Trade-offs determine this: precision (size of the interval estimate), accuracy (capturing the value), and sample size (what’s it going to cost?). What is important to your decision process? You pick any two and the third is determined for you.

Sample Size for Mean: \( n = (z s / E)^2 \), where n is the size of the sample; E is the allowable error (precision); z is the z-value (accuracy, i.e. level of confidence); s is the sample SD (from a pilot survey or a guesstimate).

Example: mean house value. N = 501, E = $3,000, z = 1.96 (95%), s = $10,000. \( n = [(1.96 \times 10000)/3000]^2 \approx 43 \).
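The house-value arithmetic can be checked with a few lines of Python. This is a sketch only; the function name `sample_size_mean` is hypothetical, and rounding up to a whole unit is an assumption about how the result would be used.

```python
import math

def sample_size_mean(z, s, E):
    """Unrounded sample size for estimating a mean: (z * s / E) squared."""
    return (z * s / E) ** 2

# Values from the example: z = 1.96, s = $10,000, E = $3,000.
n_raw = sample_size_mean(z=1.96, s=10_000, E=3_000)
n = math.ceil(n_raw)  # round up so the allowable error is not exceeded
```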

Sample Size for Proportion: \( n = p(1-p)(z/E)^2 \), where p is the proportion with the characteristic, 1-p is the proportion without the characteristic, and z and E are as before.

Example: proportion of homes with a basement. N = 501, p = .5, 1-p = .5, z = 1.96, E = .03. \( n = .5 \times .5 \times (1.96/.03)^2 \approx 1067 \).
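A short Python check of the basement example follows. It uses E = 0.03, the value that appears in the slide's arithmetic; the function name `sample_size_proportion` is hypothetical.

```python
def sample_size_proportion(p, z, E):
    """Unrounded sample size for estimating a proportion: p(1-p)(z/E)^2."""
    return p * (1 - p) * (z / E) ** 2

# Values from the example's arithmetic: p = 0.5, z = 1.96, E = 0.03.
n_raw = sample_size_proportion(p=0.5, z=1.96, E=0.03)
n = round(n_raw)  # the slide reports the nearest whole number
```

Because the required size grows with 1/E², loosening the allowable error to 0.05 would cut the requirement to roughly 384.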

What happens when the population has fewer members than the calculated sample size requires? Step One: Calculate the sample size as before. Step Two: Calculate the new sample size, \( n = \dfrac{n_0}{1 + n_0/N} \), where \( n_0 \) is the sample size calculated in step one.

What Happens if n > N? First, calculate the sample size as before. Second, calculate the new sample size using \( n_{new} = n_{old}/[1 + (n_{old}/N)] \). Here \( n_{new} = 1067/[1 + (1067/501)] \approx 340.9 \), so about 341.
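The adjustment for a small population can be sketched the same way. The function name `finite_population_size` is hypothetical; the formula is the one stated on the slide.

```python
def finite_population_size(n0, N):
    """Adjust an initial sample size n0 for a population of size N."""
    return n0 / (1 + n0 / N)

# Values from the example: n0 = 1067 from the proportion formula, N = 501 homes.
n_new = finite_population_size(n0=1067, N=501)
```

The adjusted value always falls below N, so the question of sampling more units than exist goes away.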

How n is Chosen in Practice: Arbitrarily select a sample size. Take as large a sample as you can get for the budget. Pick a percentage for your sample. Better: identify the sample size required to obtain the precision and accuracy desired!

With Good Samples… we have classical statistical techniques that enable us to make inferences about the populations from which the samples were drawn: confidence intervals and hypothesis testing.

Resampling: Statistics is changing. Computers make computational methods that were once inconceivable possible: the bootstrap, permutation tests, and other resampling methods.

Advantages of Resampling: Fewer assumptions: normality and large n are not required. Greater accuracy: can be better than classical methods in some cases. Generality: the approach is much the same across different statistics. Promotes understanding: not so theoretical.

Bootstrapping Procedure: 1) Resample. 2) Calculate the bootstrap distribution. 3) Use the bootstrap distribution.

Bootstrap Idea: The original sample represents the population. Take resamples by sampling with replacement from the original random sample. They “represent” many samples from the population. The bootstrap distribution of a statistic represents its sampling distribution.

Concept: 594 structure values ($1,000s). You want the population mean. A glance says the data are not normal. Mean = 155.4, SD = 20.6.

Original & Resample

Calculate Bootstrap Distribution: Calculate the statistic for each resample and make a distribution of the results.

Resampling Distribution: Took 500 samples of n = 594 with replacement from the original sample. Calculated the 500 means of these 500 samples. Plotted the resampling distribution of means (nearly normal). Mean: close to the original sample mean. SD = 0.8.

Bootstrap a Statistic: Draw hundreds of resamples with replacement from the original sample. Inspect the bootstrap distribution of the resampled statistics. The bootstrap distribution approximates the sampling distribution: it approximates the shape and spread, but centers on the original statistic, not the parameter. It does not replace or add to the data.
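A minimal Python sketch of bootstrapping a mean is given here. The data are hypothetical, generated from a skewed distribution to stand in for the 594 structure values (the original values are not reproduced in the slides), and the name `bootstrap_means` is illustrative.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=500, seed=0):
    """Means of resamples drawn with replacement, each the size of the original sample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical right-skewed data standing in for the 594 structure values.
gen = random.Random(42)
data = [gen.lognormvariate(5.0, 0.13) for _ in range(594)]

boot = bootstrap_means(data)          # 500 resample means
boot_mean = statistics.fmean(boot)    # centers on the original sample mean
boot_se = statistics.stdev(boot)      # estimates the standard error of the mean
```

With a sample this large, `boot_se` should land close to the classical estimate s/√n, which is one quick sanity check on the resampling code.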

Use Bootstrap Distribution: Study the characteristics of the resampling distribution for insight.

Bootstrap Mean & Confidence Intervals: Sample mean: 155.4. Resamples mean: 155.9. Bias: +0.5. Standard error: (value not shown). Percentiles: four listed, the last with value 157.6 (the others not shown). Confidence interval, 95% (t): 155.9 ± 1.6.
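The quantities in this summary (bias, standard error, percentile interval) can be computed from any list of bootstrap statistics. The sketch below is an illustration under stated assumptions: `summarize_bootstrap` is a hypothetical helper, the percentile endpoints use a simple nearest-rank rule rather than any particular package's interpolation, and the input is toy data chosen so the result is easy to check by hand.

```python
import math
import statistics

def summarize_bootstrap(sample_stat, boot_stats, level=0.95):
    """Bias, standard error, and percentile interval from bootstrap statistics."""
    b = sorted(boot_stats)
    m = len(b)
    alpha = (1 - level) / 2
    lower = b[math.floor(alpha * (m - 1))]        # nearest-rank lower endpoint
    upper = b[math.ceil((1 - alpha) * (m - 1))]   # nearest-rank upper endpoint
    boot_mean = statistics.fmean(b)
    return {
        "boot_mean": boot_mean,
        "bias": boot_mean - sample_stat,          # resample mean minus original statistic
        "se": statistics.stdev(b),                # spread of the bootstrap distribution
        "percentile_interval": (lower, upper),
    }

# Toy bootstrap distribution (the values 0, 1, ..., 100), not real resample means.
boot = [float(v) for v in range(101)]
summary = summarize_bootstrap(sample_stat=49.5, boot_stats=boot)
```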

Why Bootstrapping Works: It seems to create data out of nothing? Resamples are not used as if they were real data. Resample means are used to estimate how the sample mean for a sample of size 594 varies because of random sampling. The data are used twice: once to estimate the population mean (original sample) and once to estimate the variation in the sample mean (resamples).

Applies to Other Statistics: 25% trimmed mean (middle 50%); difference between means; ratio of means; median; correlation coefficient; most anything.
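Because the resampling step does not depend on the statistic, one generic routine can cover several items in this list. The following is a sketch with hypothetical names (`bootstrap`, `trimmed_mean_25`) and made-up skewed data; it is not taken from the presentation.

```python
import random
import statistics

def trimmed_mean_25(values):
    """Mean of the middle 50% of the values (25% trimmed from each end)."""
    v = sorted(values)
    k = len(v) // 4
    return statistics.fmean(v[k:len(v) - k])

def bootstrap(data, statistic, n_resamples=1000, seed=0):
    """Apply any statistic to resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical skewed data, for illustration only.
gen = random.Random(7)
data = [gen.expovariate(1 / 50) for _ in range(200)]

boot_median = bootstrap(data, statistics.median)
boot_trimmed = bootstrap(data, trimmed_mean_25)
```

Two-sample statistics such as a difference or ratio of means follow the same pattern, with each group resampled separately.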

Take Away Points: Sampling is a cost-effective way to gather data. Resampling offers analysts a powerful numerical technique for statistical analysis. Resampling is relatively simple with resampling software.

Accuracy: Bootstrap based on a large sample (n > 100): the shape and spread do not depend much on the original sample, and they do show the shape and spread of the sampling distribution. Bootstrap based on small samples: almost all variation for a statistic comes from the original sample, so reduce the variation with a larger sample size; the bootstrap does not overcome the weakness of small samples as a basis for inference. Some methods (BCa, tilting) are better than standard methods.

Beyond the Basics: Bootstrap bias-corrected accelerated (BCa): adjusts the percentile endpoints for the 95% CI, e.g., 4.1 to 98.6 instead of 2.5 to 97.5. Bootstrap tilting: adjusts the process of randomly forming resamples; more efficient than BCa. Use one of these more accurate methods if your software offers it.

Permutation Tests: Imagine an experiment with 23 subjects assigned randomly to control and 25 to treatment (n = 48). Choose 25 of the 48 at random and call this the treatment group (the others go to control). This is an SRS without replacement: a permutation resample. Repeat hundreds of times, calculating the statistic of interest each time. The result is the permutation distribution, used for two-sample problems. We can see whether the observed difference is so large that it would rarely occur if the treatment did not matter!
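The permutation procedure can be sketched in Python as follows. The outcome values are hypothetical (the slides give only the group sizes, 25 and 23), the statistic is assumed to be the difference in means, and the test is one-sided; `permutation_test` is an illustrative name.

```python
import random
import statistics

def permutation_test(treatment, control, n_resamples=2000, seed=0):
    """One-sided permutation test for a difference in means (treatment minus control)."""
    rng = random.Random(seed)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel: the first n_t values play the treatment group
        diff = statistics.fmean(pooled[:n_t]) - statistics.fmean(pooled[n_t:])
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the estimated p-value is never zero.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical outcomes for 25 treated and 23 control subjects.
gen = random.Random(11)
treatment = [gen.gauss(55, 10) for _ in range(25)]
control = [gen.gauss(45, 10) for _ in range(23)]
observed, p_value = permutation_test(treatment, control)
```

A small `p_value` means a difference as large as the observed one rarely arises when group labels are assigned at random, which is the slide's criterion for concluding that the treatment matters.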