Published by Derick Powers; modified over 8 years ago
1
Sampling and Resampling
Risk Analysis for Water Resources Planning and Management
Institute for Water Resources
May 2007
2
Learning Objectives
At the end of this session participants will be able to:
- List sampling techniques
- Estimate desired sample size
- Describe a resample and a bootstrap procedure
3
Why Sample?
- Expense
- Time
- Impossible/impractical
- Better data from a sample
4
We Want Representativeness
[Figure: samples of men (M) and women (W), labeled Representative, Not Representative, Not Representative]
5
Impediments to Representativeness
- Bias: systematic differences between sample and population; can be eliminated by good sample design
- Sample error: unavoidable differences due to chance in the selection of a sample. This is "dumb luck"; it cannot be eliminated or avoided, so we have to address it
6
Sampling Techniques
- Simple random sample
- Stratified sample
- Cluster sample
- Sequential sample
- Non-probability sample/convenience sample
7
Think of Random Like This
1. Identify each element of the population
2. Place a unique number from 1 to N on it
3. Randomly select a number between 1 and N
4. Find the item with that number and measure it
5. Replace the number and repeat
6. If the same number comes up again, ignore it, replace it, and choose again
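The numbered-ticket procedure above can be sketched in Python. This is a minimal illustration, not part of the original material: the function name `simple_random_sample` and the 501-house frame are hypothetical. In practice, Python's `random.sample` draws a simple random sample without replacement directly.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements using the numbered-ticket procedure.

    Each element gets a number from 1 to N; numbers are drawn at random,
    and a number that has already come up is ignored and drawn again.
    """
    N = len(population)
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    rng = random.Random(seed)
    chosen = []   # ticket numbers in the order they were selected
    seen = set()
    while len(chosen) < n:
        ticket = rng.randint(1, N)   # randint includes both 1 and N
        if ticket in seen:           # repeat: ignore it and choose again
            continue
        seen.add(ticket)
        chosen.append(ticket)
    # Ticket k labels population[k - 1]
    return [population[k - 1] for k in chosen]

houses = [f"house-{i}" for i in range(1, 502)]   # hypothetical frame, N = 501
sample = simple_random_sample(houses, 43, seed=1)
print(len(sample), len(set(sample)))  # 43 43
```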
8
How do you take a sample?
- Objectives of survey
  - What question(s) are you trying to answer?
  - What information do you need?
- Identify target population
- Obtain sample frame
- Sample design
- Method of measurement
- Measurement instrument
9
How do you take a sample? (continued)
- Select and train field workers
- Pretest
- Organize field work
- Organize data management
- Data analysis
10
How big should a sample be?
Trade-offs determine this:
- Precision (size of interval estimate)
- Accuracy (capturing the value)
- Sample size (what's it going to cost?)
What is important to your decision process? Pick any two, and the third is determined for you.
11
Sample Size for Mean
$n = \left(\frac{z\,s}{E}\right)^2$
- n is size of sample
- E is allowable error (precision)
- z is z-value (accuracy, i.e., level of confidence)
- s is sample SD, from a pilot survey or a guesstimate
12
Example: Mean house value
N = 501, E = $3,000, z = 1.96 (95%), s = $10,000
$n = \left(\frac{1.96 \times 10{,}000}{3{,}000}\right)^2 \approx 42.7$, rounded up to 43
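The arithmetic for the mean can be checked with a few lines of Python. This is a sketch; `sample_size_mean` is a hypothetical helper that returns the unrounded value of $n = (z s / E)^2$, and `NormalDist().inv_cdf(0.975)` is used to show where the 1.96 for 95% confidence comes from.

```python
import math
from statistics import NormalDist

def sample_size_mean(z, s, E):
    """Unrounded n = (z * s / E) ** 2."""
    return (z * s / E) ** 2

# Two-sided 95% confidence: the 97.5th percentile of the standard normal (~1.96)
z95 = NormalDist().inv_cdf(0.975)

# Mean house value example: s = $10,000, E = $3,000
n = sample_size_mean(z=1.96, s=10_000, E=3_000)
print(round(n, 2), math.ceil(n))  # 42.68 43
print(math.ceil(sample_size_mean(z95, 10_000, 3_000)))  # 43
```

Rounding up with `math.ceil` guarantees the allowable error E is not exceeded.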
13
Sample Size for Proportion
$n = p(1-p)\left(\frac{z}{E}\right)^2$
- p is proportion with characteristic
- 1 − p is proportion without characteristic
- z and E as before
14
Example: Proportion of homes with basement
N = 501, p = .5, 1 − p = .5, z = 1.96, E = .03
$n = 0.5 \times 0.5 \times \left(\frac{1.96}{0.03}\right)^2 \approx 1067$
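A minimal Python check of the proportion formula, using the basement example's inputs. `sample_size_proportion` is a hypothetical helper that returns the unrounded $n = p(1-p)(z/E)^2$. Choosing p = 0.5 is the conservative choice, because p(1 − p) is largest there.

```python
import math

def sample_size_proportion(p, z, E):
    """Unrounded n = p * (1 - p) * (z / E) ** 2."""
    return p * (1 - p) * (z / E) ** 2

# Basement example: p = 0.5, z = 1.96 (95%), E = 0.03
n = sample_size_proportion(p=0.5, z=1.96, E=0.03)
print(round(n, 1), math.ceil(n))  # 1067.1 1068
```

The unrounded value is about 1067.1. Rounding to the nearest whole number gives 1067. Rounding up gives 1068, which is the usual convention because it never falls short of the target error.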
15
What happens when the population has fewer members than the calculated sample size requires?
Step One: Calculate the sample size as before.
Step Two: Calculate the new sample size:
$n = \frac{n_0}{1 + n_0/N}$
where $n_0$ is the sample size calculated in step one.
16
What Happens if n > N?
First, calculate the sample size as before. Second, calculate the new sample size using:
$n_{new} = \frac{n_{old}}{1 + n_{old}/N}$
$n_{new} = \frac{1067}{1 + 1067/501} \approx 340.9$, or 341 when rounded up
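The adjustment can be sketched in Python. `finite_population_adjust` is a hypothetical name for $n_{new} = n_{old}/(1 + n_{old}/N)$. Because the result equals $n_{old} N / (N + n_{old})$, it is always smaller than both $n_{old}$ and N.

```python
import math

def finite_population_adjust(n_old, N):
    """Finite population correction: n_new = n_old / (1 + n_old / N)."""
    return n_old / (1 + n_old / N)

# Basement example: n_old = 1067 from the proportion formula, N = 501 homes
n_new = finite_population_adjust(1067, 501)
print(round(n_new, 1), math.ceil(n_new))  # 340.9 341
```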
17
How n Is Chosen in Practice
- Arbitrarily select a sample size
- As large a sample as you can get for a budget
- Pick a percentage for your sample
- Identify the sample size required to obtain the precision and accuracy desired!
18
With Good Samples…
We have classical statistical techniques that enable us to make inferences about the populations from which the samples were drawn:
- Confidence intervals
- Hypothesis testing
19
Resampling
Statistics is changing: computers make possible computational methods that were once inconceivable.
- Bootstrap
- Permutation tests
- Other resampling methods
20
Advantages of Resampling
- Fewer assumptions: normality and large n are not required
- Greater accuracy: can be better than classical methods in some cases
- Generality: the approach is similar across many problems
- Promotes understanding: less theoretical
21
Bootstrapping Procedure
1. Resample
2. Calculate bootstrap distribution
3. Use bootstrap distribution
22
Bootstrap Idea
- Original sample represents population
- Take resamples by sampling with replacement from the original random sample
- They "represent" many samples from the population
- Bootstrap distribution of the statistic represents the sampling distribution
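A single resample, the basic building block of the bootstrap, takes one line with Python's `random.choices`. This sketch uses a hypothetical five-value sample, not the structure data. A resample has the same size as the original, contains only original values, and usually repeats some values while leaving others out.

```python
import random

def resample(sample, rng):
    """One bootstrap resample: len(sample) draws WITH replacement from sample."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(0)
original = [148.2, 151.7, 155.0, 160.3, 171.9]   # hypothetical values, $1,000s
boot = resample(original, rng)
print(boot)   # same size as original; some values may repeat, others drop out
```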
23
Concept
- 594 structure values ($1,000s)
- You want the population mean
- A glance says the data are not normal
- Mean = 155.4, SD = 20.6
24
Original & Resample
25
Calculate Bootstrap Distribution
Calculate the statistic for each resample and make a distribution of them.
26
Resampling Distribution
- Took 500 samples of n = 594 with replacement from the original sample
- Calculated the means of these 500 samples
- Plotted the resampling distribution of means (nearly normal)
- Mean = 155.9 (close to the sample mean), SD = 0.8
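The 500-resample experiment can be sketched in Python. The 594 structure values are not reproduced here, so the sketch draws a hypothetical right-skewed stand-in with `random.gammavariate`. Its printed numbers will not match the slide. The SD of the 500 bootstrap means estimates the standard error of the sample mean.

```python
import random
import statistics

def bootstrap_means(sample, B=500, seed=None):
    """Means of B resamples, each the same size as sample, drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(B)]

# Hypothetical stand-in for the 594 structure values ($1,000s): a gamma draw
# shifted by 100 gives a right-skewed, non-normal sample
data_rng = random.Random(42)
values = [100 + data_rng.gammavariate(4, 14) for _ in range(594)]

boot = bootstrap_means(values, B=500, seed=7)
print(f"sample mean     {statistics.fmean(values):.1f}")
print(f"bootstrap mean  {statistics.fmean(boot):.1f}")
print(f"bootstrap SD    {statistics.stdev(boot):.2f}")  # estimates SE of the mean
```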
27
Bootstrap a Statistic
- Draw hundreds of resamples with replacement from the original sample
- Inspect the bootstrap distribution of the resampled statistics
- The bootstrap distribution approximates the sampling distribution: it approximates its shape and spread, but it centers on the original statistic, not the parameter
- It does not replace or add to the data
28
Use Bootstrap Distribution
Study the characteristics of the resampling distribution for insight.
29
Bootstrap Mean & Confidence Intervals
Sample mean                    155.4
Resamples mean                 155.9
Bias                           +0.5
Standard error                 0.8
2.5 percentile                 154.3
5 percentile                   154.6
95 percentile                  157.3
97.5 percentile                157.6
95% confidence interval (t)    155.9 ± 1.6
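The quantities in the summary table above can be computed from any list of bootstrap statistics. This is a sketch with hypothetical names and a hypothetical 50-value sample. The Python standard library has no t distribution, so `t_star` defaults to 1.96, an assumption that is close to the t critical value for large samples. The interval is centered on the original statistic, the usual form of the bootstrap t interval.

```python
import random
import statistics

def bootstrap_summary(original_stat, boot_stats, t_star=1.96):
    """Bias, standard error, selected percentiles, and a t-style interval."""
    se = statistics.stdev(boot_stats)
    # Inclusive quantiles with n=40 give 39 cut points: 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(boot_stats, n=40, method="inclusive")
    return {
        "bias": statistics.fmean(boot_stats) - original_stat,
        "standard_error": se,
        "p2.5": cuts[0],
        "p5": cuts[1],
        "p95": cuts[37],
        "p97.5": cuts[38],
        "t_interval": (original_stat - t_star * se, original_stat + t_star * se),
    }

# Hypothetical sample and 1,000 bootstrap means of it
rng = random.Random(3)
sample = [rng.gauss(155, 20) for _ in range(50)]
boot = [statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(1000)]
for key, value in bootstrap_summary(statistics.fmean(sample), boot).items():
    print(key, value)
```

The 2.5 and 97.5 percentiles form the simple percentile interval. The table's bias row is the resample mean minus the sample mean.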
30
Why Bootstrapping Works
- Does it seem to create data out of nothing? No: the resamples are not used as if they were real data
- The resample means are used to estimate how the mean of a sample of size 594 varies because of random sampling
- The data are used twice:
  - once to estimate the population mean (original sample)
  - once to estimate the variation in the sample mean (resamples)
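For the sample mean, classical theory also gives a formula for the quantity the resamples estimate, its standard error. Applying it to the structure data (s = 20.6, n = 594) shows that the bootstrap SD of 0.8 agrees with it:

```latex
\widehat{SE}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{20.6}{\sqrt{594}} \approx \frac{20.6}{24.37} \approx 0.85
```

The bootstrap reaches the same answer without the formula, which is why it also works for statistics that have no simple standard-error formula.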
31
Applies to Other Statistics
- 25% trimmed mean (middle 50%)
- Difference between means
- Ratio of means
- Median
- Correlation coefficient
- Most anything
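The same resampling loop works for any statistic. You only swap the function applied to each resample. This sketch uses hypothetical helper names and a hypothetical log-normal sample. It bootstraps the standard error of the mean, the median, and a 25% trimmed mean, which drops 25% of the values from each end and averages the middle 50%.

```python
import random
import statistics

def trimmed_mean(xs, prop=0.25):
    """Mean of the values left after dropping `prop` of the data from each end."""
    xs = sorted(xs)
    k = int(len(xs) * prop)
    return statistics.fmean(xs[k:len(xs) - k])

def bootstrap_se(sample, stat, B=1000, seed=None):
    """Bootstrap standard error of any statistic stat(list) -> float."""
    rng = random.Random(seed)
    reps = [stat(rng.choices(sample, k=len(sample))) for _ in range(B)]
    return statistics.stdev(reps)

rng = random.Random(8)
data = [rng.lognormvariate(5, 0.3) for _ in range(80)]   # hypothetical skewed sample

for name, stat in [("mean", statistics.fmean),
                   ("median", statistics.median),
                   ("25% trimmed mean", trimmed_mean)]:
    print(f"{name:18s} {stat(data):7.1f}   bootstrap SE {bootstrap_se(data, stat, seed=3):.2f}")
```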
32
Take Away Points
- Sampling is a cost-effective way to gather data
- Resampling offers analysts a powerful numerical technique for statistical analysis
- Resampling is relatively simple with resampling software
33
Accuracy
- Bootstrap based on a large sample (n > 100): the shape and spread of the bootstrap distribution do not depend much on the particular original sample, and it does show the shape and spread of the sampling distribution
- Bootstrap based on a small sample: almost all of the variation in the bootstrap distribution of a statistic comes from the choice of original sample, and that variation grows as the sample size shrinks
- Bootstrapping does not overcome the weakness of small samples as a basis for inference
- Some methods (BCa, tilting) are better than the standard methods
34
Beyond the Basics
- Bootstrap bias-corrected accelerated (BCa): adjusts the percentile endpoints of the 95% CI, e.g., the 4.1 and 98.6 percentiles instead of the 2.5 and 97.5 percentiles
- Bootstrap tilting: adjusts the process of randomly forming resamples; more efficient than BCa
- Use one of these more accurate methods if your software offers it
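To show how BCa "adjusts the percentile endpoints", here is a standard-library sketch of the textbook BCa recipe. It computes a bias correction z0 from the fraction of bootstrap statistics below the original statistic and an acceleration a from jackknife (leave-one-out) replicates, then maps the nominal 2.5% and 97.5% levels to adjusted levels. It is a minimal illustration, not any particular package's implementation. The function names and the exponential sample are hypothetical. For real work, a library routine such as SciPy's `scipy.stats.bootstrap` with `method="BCa"` is preferable.

```python
import random
import statistics
from statistics import NormalDist

def percentile(sorted_xs, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of an already-sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def bca_interval(sample, stat, level=0.95, B=2000, seed=None):
    """BCa bootstrap interval; returns (low, high, adjusted percentile levels)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = stat(sample)
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    nd = NormalDist()

    # Bias correction z0 from the fraction of bootstrap statistics below theta_hat
    frac_below = sum(b < theta_hat for b in boot) / B
    frac_below = min(max(frac_below, 1 / (B + 1)), B / (B + 1))  # keep inv_cdf finite
    z0 = nd.inv_cdf(frac_below)

    # Acceleration a from jackknife (leave-one-out) replicates
    jack = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    jbar = statistics.fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    # Map the nominal tail levels to adjusted percentile levels
    alpha = (1 - level) / 2
    adjusted = [nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
                for z in (nd.inv_cdf(alpha), nd.inv_cdf(1 - alpha))]
    return percentile(boot, adjusted[0]), percentile(boot, adjusted[1]), adjusted

data_rng = random.Random(11)
skewed = [data_rng.expovariate(1 / 50) for _ in range(60)]   # hypothetical data
low, high, levels = bca_interval(skewed, statistics.fmean, seed=5)
print(f"BCa 95% interval: {low:.1f} to {high:.1f}")
print(f"percentiles used: {100 * levels[0]:.1f} to {100 * levels[1]:.1f}")
```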
35
Permutation Tests
- Imagine an experiment with 23 subjects assigned randomly to control and 25 to treatment (n = 48)
- Choose 25 of the 48 at random and call this the treatment group (the others are control)
- This is an SRS without replacement: a permutation resample
- Repeat hundreds of times, calculating the statistic of interest each time
- Permutation distribution: for two-sample problems
- We can see whether the observed difference is so large that it would rarely occur if the treatment did not matter!
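The permutation procedure above can be sketched in Python for the difference in means. The responses for the 23 control and 25 treatment subjects are hypothetical. The test is one-sided (is the treatment mean larger?). The p-value uses the common (count + 1)/(R + 1) convention, which treats the observed split as one of the possible arrangements.

```python
import random
import statistics

def permutation_test(treatment, control, R=1000, seed=None):
    """One-sided permutation test of mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(treatment) + list(control)
    k = len(treatment)
    count = 0
    for _ in range(R):
        # SRS without replacement: choose k of the pooled subjects as "treatment"
        idx = set(rng.sample(range(len(pooled)), k))
        t = [x for i, x in enumerate(pooled) if i in idx]
        c = [x for i, x in enumerate(pooled) if i not in idx]
        if statistics.fmean(t) - statistics.fmean(c) >= observed:
            count += 1
    return observed, (count + 1) / (R + 1)

rng = random.Random(21)
control = [rng.gauss(50, 10) for _ in range(23)]     # hypothetical responses
treatment = [rng.gauss(56, 10) for _ in range(25)]
diff, p = permutation_test(treatment, control, R=1000, seed=4)
print(f"observed difference {diff:.2f}, one-sided p ~ {p:.3f}")
```

A small p means that few random reassignments produce a difference as large as the one observed.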