1
Confidence Intervals: Bootstrap Distribution
STAT 250, Dr. Kari Lock Morgan
Sections 3.3, 3.4
Bootstrap distribution (3.3)
95% CI using standard error (3.3)
Percentile method (3.4)
2
Confidence Intervals
[Diagram: Population → many samples → calculate the statistic for each sample → sampling distribution]
Standard Error (SE): standard deviation of the sampling distribution
Margin of Error (ME): for a 95% CI, ME = 2×SE
Confidence interval: statistic ± ME
3
Reality
One small problem… WE ONLY HAVE ONE SAMPLE!
How do we know how much sample statistics vary if we only have one sample? BOOTSTRAP!
4
Your best guess for the population is lots of copies of the sample!
Sample repeatedly from this “population”
5
Sampling with Replacement
It’s impossible to sample repeatedly from the population… But we can sample repeatedly from the sample! To get statistics that vary, sample with replacement (each unit can be selected more than once)
6
Suppose we have a random sample of 6 people:
Patti
7
A simulated “population” to sample from
[Figure: Original Sample (including Patti) → simulated “population”]
8
Bootstrap Sample
Sample with replacement from the original sample, using the same sample size. Remember: sample size matters!
[Figure: Original Sample → Bootstrap Sample]
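Drawing one bootstrap sample can be sketched in a few lines of Python. This is only an illustration: the function name `bootstrap_sample` is made up here, and apart from Patti the names in the sample are invented, since the slides show the other five people only as pictures.

```python
import random

def bootstrap_sample(sample, rng):
    """Draw len(sample) units from sample, with replacement."""
    return rng.choices(sample, k=len(sample))

# Only "Patti" appears in the slides; the other five names are made up.
original = ["Patti", "Ben", "Carla", "Dev", "Elena", "Farid"]
rng = random.Random(250)  # fixed seed so the sketch is repeatable

resample = bootstrap_sample(original, rng)
print(resample)  # same size as the original; names may repeat or be missing
```

Because the draw is with replacement, some people can show up more than once and others not at all, which is what lets the statistic vary from one bootstrap sample to the next.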
9
Reese’s Pieces How would you take a bootstrap sample from your sample of Reese’s Pieces?
10
Bootstrap Sample
Your original sample has data values 18, 19, 19, 20, 21. Is the following a possible bootstrap sample?
18, 19, 20, 21, 22
(a) Yes (b) No
11
Bootstrap Sample
Your original sample has data values 18, 19, 19, 20, 21. Is the following a possible bootstrap sample?
18, 19, 20, 21
(a) Yes (b) No
12
Bootstrap Sample
Your original sample has data values 18, 19, 19, 20, 21. Is the following a possible bootstrap sample?
18, 18, 19, 20, 21
(a) Yes (b) No
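The three questions above come down to two checks: a bootstrap sample must have the same size as the original sample, and it can only contain values that appear in the original. A small sketch of those checks, where the helper name `is_possible_bootstrap_sample` is invented for this example:

```python
def is_possible_bootstrap_sample(original, candidate):
    """True if candidate could arise by resampling original with replacement."""
    same_size = len(candidate) == len(original)
    only_original_values = all(value in original for value in candidate)
    return same_size and only_original_values

original = [18, 19, 19, 20, 21]
print(is_possible_bootstrap_sample(original, [18, 19, 20, 21, 22]))  # False: 22 is not in the original
print(is_possible_bootstrap_sample(original, [18, 19, 20, 21]))      # False: only 4 values
print(is_possible_bootstrap_sample(original, [18, 18, 19, 20, 21]))  # True: repeats are allowed
```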
13
Bootstrap Distribution
[Diagram: Original Sample (sample statistic) → many bootstrap samples → a bootstrap statistic from each → Bootstrap Distribution]
14
Mercury Levels in Fish
What is the average mercury level of fish (largemouth bass) in Florida lakes?
Sample of size n = 53, with x̄ = 0.527 ppm. (The FDA action level in the USA is 1 ppm; in Canada the limit is 0.5 ppm.)
Key Question: How much can statistics vary from sample to sample?
Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4),
15
Mercury Levels in Fish
16
Bootstrap Sample
You have a sample of size n = 53. You sample with replacement 1000 times to get 1000 bootstrap samples. What is the sample size of each bootstrap sample?
(a) 53 (b) 1000
17
Bootstrap Distribution
You have a sample of size n = 53. You sample with replacement 1000 times to get 1000 bootstrap samples. How many bootstrap statistics will you have?
(a) 53 (b) 1000
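The bookkeeping in these two questions can be made concrete with a short sketch. The 53 values below are invented stand-ins, not the actual mercury measurements, and the statistic (the mean) is chosen to match the mercury example.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical stand-in for the 53 mercury measurements (ppm); not the real data.
sample = [round(rng.uniform(0.05, 1.3), 2) for _ in range(53)]

boot_means = []
for _ in range(1000):
    resample = rng.choices(sample, k=len(sample))  # one bootstrap sample, size 53
    boot_means.append(statistics.mean(resample))   # one bootstrap statistic

print(len(resample), len(boot_means))  # 53 1000
```

Each bootstrap sample has the original sample size, and each bootstrap sample contributes exactly one statistic to the bootstrap distribution.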
18
Why “bootstrap”?
“Pull yourself up by your bootstraps”
Lift yourself in the air simply by pulling up on the laces of your boots
Metaphor for accomplishing an “impossible” task without any outside help
19
Sampling Distribution
[Figure: Population]
BUT, in practice we don’t see the “tree” or all of the “seeds”; we only have ONE seed
20
Bootstrap Distribution
What can we do with just one seed?
[Figure labels: x̄, Bootstrap “Population”]
Estimate the distribution and variability (SE) of x̄’s from the bootstraps
21
Golden Rule of Bootstrapping
Bootstrap statistics are to the original sample statistic as the original sample statistic is to the population parameter
22
Center
The sampling distribution is centered around the population parameter.
The bootstrap distribution is centered around the ______
(a) population parameter (b) sample statistic (c) bootstrap statistic (d) bootstrap parameter
Luckily, we don’t care about the center… we care about the variability!
23
Standard Error
The variability of the bootstrap statistics is similar to the variability of the sample statistics.
The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!
24
Confidence Intervals
[Diagram: Sample → many bootstrap samples → calculate the statistic for each bootstrap sample → bootstrap distribution]
Standard Error (SE): standard deviation of the bootstrap distribution
Margin of Error (ME): for a 95% CI, ME = 2×SE
Confidence interval: statistic ± ME
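The whole pipeline on this slide (resample, compute the statistic, take the standard deviation, then statistic ± 2×SE) can be sketched as follows. The data are invented, and `bootstrap_standard_error` is a name made up for this example rather than anything from StatKey or the textbook.

```python
import random
import statistics

def bootstrap_standard_error(sample, statistic, n_boot, rng):
    """Standard deviation of n_boot bootstrap statistics."""
    boot_stats = [
        statistic(rng.choices(sample, k=len(sample)))
        for _ in range(n_boot)
    ]
    return statistics.stdev(boot_stats)

rng = random.Random(33)
# Made-up data, NOT the Florida mercury measurements.
sample = [rng.uniform(0.05, 1.3) for _ in range(53)]

sample_mean = statistics.mean(sample)
se = bootstrap_standard_error(sample, statistics.mean, 1000, rng)
margin_of_error = 2 * se  # the slides' 95% rule: ME = 2 x SE
interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(f"mean = {sample_mean:.3f}, SE = {se:.3f}, "
      f"95% CI = ({interval[0]:.3f}, {interval[1]:.3f})")
```

Note that the interval is centered on the original sample statistic, not on the center of the bootstrap distribution; the bootstrap is used only to estimate the variability.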
25
Mercury Levels in Fish
What is the average mercury level of fish (largemouth bass) in Florida lakes?
Sample of size n = 53, with x̄ = 0.527 ppm. (The FDA action level in the USA is 1 ppm; in Canada the limit is 0.5 ppm.)
Give a confidence interval for the true average.
Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4),
26
Mercury Levels in Fish
SE = 0.047
0.527 ± 2 × 0.047 = (0.433, 0.621)
We are 95% confident that the average mercury level in fish in Florida lakes is between 0.433 and 0.621 ppm.
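The arithmetic behind this interval is statistic ± 2×SE with the slide's own numbers (x̄ = 0.527 ppm, SE = 0.047). A minimal check, with variable names chosen only for this sketch:

```python
statistic = 0.527  # sample mean mercury level (ppm), from the slide
se = 0.047         # bootstrap standard error, from the slide

margin_of_error = 2 * se
lower = statistic - margin_of_error
upper = statistic + margin_of_error
print(f"({lower:.3f}, {upper:.3f})")  # (0.433, 0.621)
```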
27
Same process for every parameter!
Estimate the standard error and/or a confidence interval for...
proportion (𝑝)
difference in means (µ1 − µ2)
difference in proportions (𝑝1 − 𝑝2)
standard deviation (𝜎)
correlation (𝜌)
...
Generate samples with replacement
Calculate sample statistic
Repeat...
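One way to see that the process really is the same for every parameter is to pass the statistic in as a function. The sketch below is an illustration under assumptions: the data are invented, and `bootstrap_statistics` and `proportion_above_100` are hypothetical helper names.

```python
import random
import statistics

def bootstrap_statistics(sample, statistic, n_boot, rng):
    """Resample with replacement n_boot times; return the statistic of each resample."""
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]

def proportion_above_100(values):
    """Hypothetical statistic: share of values greater than 100."""
    return sum(v > 100 for v in values) / len(values)

rng = random.Random(34)
sample = [rng.gauss(100, 15) for _ in range(40)]  # invented data

for name, statistic in [
    ("mean", statistics.mean),
    ("standard deviation", statistics.stdev),
    ("proportion above 100", proportion_above_100),
]:
    boot_stats = bootstrap_statistics(sample, statistic, 1000, rng)
    print(f"{name}: statistic = {statistic(sample):.3f}, "
          f"SE = {statistics.stdev(boot_stats):.3f}")
```

For two-group statistics such as a difference in means, each group would be resampled separately, and for a correlation the (x, y) pairs would be resampled together; those variants are not shown here.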
28
Hitchhiker Snails
A type of small snail is very widespread in Japan, and colonies of the snails that are genetically very similar have been found very far apart. How could the snails travel such long distances?
Biologist Shinichiro Wada fed 174 live snails to birds, and found that 26 were excreted live out the other end. (The snails are apparently able to seal their shells shut to keep the digestive fluids from getting in.)
What proportion of these snails ingested by birds survive?
Yong, E. “The Scatological Hitchhiker Snail,” Discover, October 2011, 13.
29
Hitchhiker Snails
Give a 95% confidence interval for the proportion of snails ingested by birds that survive.
(a) (0.1, 0.2) (b) (0.05, 0.25) (c) (0.12, 0.18) (d) (0.07, 0.18)
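Because the snail data are just 26 survivors out of 174, the original sample can be rebuilt exactly as 26 ones and 148 zeros and then bootstrapped. The sketch below uses the statistic ± 2×SE rule from earlier slides; the seed and the choice of 5000 bootstrap samples are arbitrary.

```python
import random
import statistics

# Data from the slide: 26 of 174 snails survived (1 = survived, 0 = did not).
snails = [1] * 26 + [0] * 148
p_hat = sum(snails) / len(snails)

rng = random.Random(174)
boot_props = []
for _ in range(5000):
    resample = rng.choices(snails, k=len(snails))
    boot_props.append(sum(resample) / len(resample))

se = statistics.stdev(boot_props)  # bootstrap standard error of the proportion
lower, upper = p_hat - 2 * se, p_hat + 2 * se
print(f"p-hat = {p_hat:.3f}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Running it and comparing the printed interval with the four answer choices is a quick way to check your answer.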
30
Body Temperature
What is the average body temperature of humans?
Shoemaker, What's Normal: Temperature, Gender and Heartrate, Journal of Statistics Education, Vol. 4, No. 2 (1996)
31
Other Levels of Confidence
What if we want to be more than 95% confident? How might you produce a 99% confidence interval for the average body temperature?
32
Percentile Method
For a P% confidence interval, keep the middle P% of bootstrap statistics.
For a 99% confidence interval, keep the middle 99%, leaving 0.5% in each tail.
The 99% confidence interval would be (0.5th percentile, 99.5th percentile), where the percentiles refer to the bootstrap distribution.
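The percentile method can be sketched by sorting the bootstrap statistics and reading off the two cut points. This is an illustration only: the temperatures are invented (not the Shoemaker data), the helper names are made up, and the nearest-rank percentile used here is one simple convention among several, so software such as StatKey may give slightly different endpoints.

```python
import random
import statistics

def percentile(sorted_values, pct):
    """Value at the given percentile, using a simple nearest-rank convention."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]

def percentile_interval(boot_stats, confidence):
    """Middle `confidence` percent of the bootstrap statistics."""
    tail = (100 - confidence) / 2  # e.g. 0.5 in each tail for 99%
    ordered = sorted(boot_stats)
    return percentile(ordered, tail), percentile(ordered, 100 - tail)

rng = random.Random(99)
# Invented body temperatures (deg F), NOT the Shoemaker data.
sample = [rng.gauss(98.3, 0.7) for _ in range(50)]
boot_means = [
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(5000)
]

low, high = percentile_interval(boot_means, 99)  # (0.5th, 99.5th percentile)
print(f"99% CI: ({low:.2f}, {high:.2f})")
```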
33
Bootstrap Distribution
For a P% confidence interval:
34
Body Temperature
www.lock5stat.com/statkey
We are 99% confident that the average body temperature is between 98.00 and 98.58 °F.
35
Level of Confidence Which is wider, a 90% confidence interval or a 95% confidence interval? (a) 90% CI (b) 95% CI
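One way to explore this question is to compute percentile intervals at several confidence levels from the same bootstrap distribution and compare their widths. The data below are invented, and `middle_width` is a hypothetical helper using a nearest-rank percentile convention.

```python
import random

rng = random.Random(90)
sample = [rng.gauss(50, 10) for _ in range(30)]  # invented data

# Sorted bootstrap means from 2000 bootstrap samples.
boot_means = sorted(
    sum(resample) / len(resample)
    for resample in (rng.choices(sample, k=len(sample)) for _ in range(2000))
)

def middle_width(ordered, confidence):
    """Width of the interval holding the middle `confidence` percent."""
    tail = (100 - confidence) / 2
    low = ordered[round(tail / 100 * (len(ordered) - 1))]
    high = ordered[round((100 - tail) / 100 * (len(ordered) - 1))]
    return high - low

widths = {level: middle_width(boot_means, level) for level in (90, 95, 99)}
for level, width in widths.items():
    print(f"{level}% interval width: {width:.2f}")
```

Keeping a larger middle share of the bootstrap statistics means moving the cut points further out into the tails.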
36
Mercury and pH in Lakes
For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake?
r = −0.575
Give a 90% CI for 𝜌.
Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993)
37
Mercury and pH in Lakes
www.lock5stat.com/statkey
We are 90% confident that the true correlation between average mercury level and pH of Florida lakes is between ___ and ___.
38
Bootstrap CI
Option 1: Estimate the standard error of the statistic by computing the standard deviation of the bootstrap distribution, and then generate a 95% confidence interval by statistic ± 2×SE
Option 2: Generate a P% confidence interval as the range for the middle P% of bootstrap statistics
39
Mercury Levels in Fish
SE = 0.047
0.527 ± 2 × 0.047 = (0.433, 0.621)
Middle 95% of bootstrap statistics
[Figure: bootstrap distribution]
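The two options can be compared side by side on one bootstrap distribution. The sketch below uses invented, roughly symmetric data rather than the mercury measurements, so the numbers will not match the slide; the point is only that the two intervals come out close when the bootstrap distribution is bell-shaped.

```python
import random
import statistics

rng = random.Random(39)
sample = [rng.gauss(0.5, 0.35) for _ in range(53)]  # invented, roughly symmetric data
stat = statistics.mean(sample)

boot_means = sorted(
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(5000)
)

# Option 1: statistic +/- 2 x SE
se = statistics.stdev(boot_means)
se_interval = (stat - 2 * se, stat + 2 * se)

# Option 2: middle 95% (2.5th to 97.5th percentile, nearest-rank convention)
last = len(boot_means) - 1
pct_interval = (boot_means[round(0.025 * last)], boot_means[round(0.975 * last)])

print(f"SE method:         ({se_interval[0]:.3f}, {se_interval[1]:.3f})")
print(f"Percentile method: ({pct_interval[0]:.3f}, {pct_interval[1]:.3f})")
```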
40
Bootstrap Cautions
These methods for creating a confidence interval only work if the bootstrap distribution is smooth and symmetric.
ALWAYS look at a plot of the bootstrap distribution!
If the bootstrap distribution is skewed or looks “spiky” with gaps, you will need to go beyond intro stat to create a confidence interval.
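A “spiky” bootstrap distribution typically arises when the statistic can take only a few values. As a rough illustration (not a substitute for looking at the plot), the sketch below bootstraps both the mean and the median of a small invented sample and counts how many distinct values each statistic takes.

```python
import random
import statistics

rng = random.Random(40)
# Invented sample of 15 values; with an odd size, each resample's median
# is always one of the original 15 values.
sample = [rng.gauss(20, 5) for _ in range(15)]

boot_means, boot_medians = [], []
for _ in range(2000):
    resample = rng.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))
    boot_medians.append(statistics.median(resample))

distinct_means = len(set(boot_means))
distinct_medians = len(set(boot_medians))
print(f"distinct bootstrap means:   {distinct_means}")
print(f"distinct bootstrap medians: {distinct_medians}")  # at most 15: spikes with gaps
```

The bootstrap medians pile up on a handful of values, so a dotplot of them would show isolated spikes, whereas the bootstrap means fill in a smooth shape.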
41
Bootstrap Cautions
42
Bootstrap Cautions
43
Number of Bootstrap Samples
When using bootstrapping, you may get a slightly different confidence interval each time. This is fine!
The more bootstrap samples you use, the more precise your answer will be.
For the purposes of this class, 1000 bootstrap samples is fine. In real life, you probably want to take 10,000 or even 100,000 bootstrap samples.
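The run-to-run variation described here can be seen by repeating the whole bootstrap several times and checking how much the estimated SE moves. The sketch uses invented data and arbitrary choices (100 versus 5000 bootstrap samples, 10 repetitions each); `bootstrap_se` is a name made up for this example.

```python
import random
import statistics

def bootstrap_se(sample, n_boot, rng):
    """SE of the mean estimated from n_boot bootstrap samples."""
    n = len(sample)
    means = [sum(rng.choices(sample, k=n)) / n for _ in range(n_boot)]
    return statistics.stdev(means)

rng = random.Random(43)
sample = [rng.uniform(0.05, 1.3) for _ in range(53)]  # invented data

spread = {}
for n_boot in (100, 5000):
    # Repeat the whole bootstrap 10 times to see how much the SE estimate moves.
    estimates = [bootstrap_se(sample, n_boot, rng) for _ in range(10)]
    spread[n_boot] = max(estimates) - min(estimates)
    print(f"{n_boot} bootstrap samples: SE estimates range over {spread[n_boot]:.4f}")
```

More bootstrap samples do not make the interval narrower; they only make the SE and interval endpoints more repeatable from run to run.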
44
Summary
The standard error of a statistic is the standard deviation of the sample statistic, which can be estimated from a bootstrap distribution.
Confidence intervals can be created using the standard error or the percentiles of a bootstrap distribution.
Increasing the number of bootstrap samples will not change the SE or interval (except for random fluctuation).
Confidence intervals can be created this way for any parameter, as long as the bootstrap distribution is approximately symmetric and continuous.
45
To Do
Read Sections 3.3, 3.4
Do HW 3.3, 3.4 (due Friday, 2/7)