1
Stat 301 – Day 36 Bootstrapping (4.5)
2
Last Time – CI for Odds Ratio. Often the parameter of interest is the population odds ratio, especially with case-control studies. It turns out that the sample log odds ratio is often well modeled by a normal distribution with a known standard deviation formula. So we can estimate the population log odds ratio using sample log odds ratio $\pm\, z^* \sqrt{1/a + 1/b + 1/c + 1/d}$, where a, b, c, d are the four cell counts of the two-way table. Exponentiate to get endpoints for the population odds ratio.
3
PP 5.1.3 (p. 426). Sample odds ratio: 2.587. $\ln(2.587) \pm 1.645\sqrt{1/65 + 1/464 + 1/30 + 1/554} = 0.950 \pm 1.645(0.230) = (0.572, 1.328)$, so $(e^{0.572}, e^{1.328}) = (1.77, 3.77)$. I am 90% confident that the odds of having an accident for New Zealand drivers who had less than 5 hours of sleep are between 1.77 and 3.77 times the odds of having a crash for New Zealand drivers who had more than 5 hours of sleep.
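The arithmetic on this slide can be checked with a short script. This is only a sketch: the function name `odds_ratio_ci` and the assignment of the four counts to cells a, b, c, d are assumptions made for illustration. The slide gives the counts 65, 464, 30, 554 but does not say which cell of the two-way table each one occupies.

```python
import math

def odds_ratio_ci(a, b, c, d, z):
    """Normal-based CI for an odds ratio from four cell counts.

    Hypothetical helper: the cell labels a, b, c, d are assumed so that
    the sample odds ratio is (a * d) / (b * c).
    """
    log_or = math.log((a * d) / (b * c))      # sample log odds ratio
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)     # SE of the log odds ratio
    lo, hi = log_or - z * se, log_or + z * se # interval on the log scale
    return math.exp(lo), math.exp(hi)         # back-transform the endpoints

# Counts from the slide; z = 1.645 for 90% confidence.
lower, upper = odds_ratio_ci(65, 464, 30, 554, z=1.645)
print(round(lower, 2), round(upper, 2))
```

With these counts the sample odds ratio is about 2.587, and the endpoints agree with the slide's (1.77, 3.77) to two decimals.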
4
What we have done so far… Analyzing one sample compared to a claim about the population parameter. Categorical: one-sample z-procedures if the sample size is large, or binomial. Comparing two proportions (independent samples or randomized experiment). Categorical: two-sample z-procedures if the sample sizes are large, or Fisher’s Exact Test (experiment). With quantitative data, no “small sample” alternative.
5
Bootstrapping: a relatively new approach that allows us to estimate aspects of the sampling distribution of a statistic in cases where the Central Limit Theorem does not apply (small samples, statistics other than the mean). Previously we considered the randomization distribution. One random sample? Two independent samples? Confidence interval.
6
Investigation 4.5.1 (p. 365) If I take a random sample of 10 words from the population, what do I know about the sampling distribution of the sample mean?
7
Investigation 4.5.1. A bootstrap sample resamples the data from the existing sample, drawing n observations, but with replacement. GettysburgSample.mtw: random sample of 10 words from the population. Answer through part (h).
8
Bootstrap Distribution. Sampling from the existing sample (with replacement; that is, as if each observed value were repeated infinitely many times; demo). Centers around the sample statistic. Estimates the standard deviation of the sampling distribution: how does the statistic vary around the parameter? If the sampling distribution is symmetric, the CI would then be statistic ± t* SE(statistic).
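A minimal sketch of building a bootstrap distribution for the sample mean is shown here. The ten word lengths are made-up stand-ins, not the contents of GettysburgSample.mtw, and the names `bootstrap_means`, `word_lengths`, and the seed are hypothetical choices for illustration.

```python
import random
import statistics

def bootstrap_means(sample, reps, rng):
    """Return `reps` bootstrap sample means.

    Each bootstrap sample draws len(sample) values from `sample`
    with replacement (random.choices samples with replacement).
    """
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(reps)]

# Hypothetical data: lengths of 10 sampled words (NOT the course data file).
word_lengths = [3, 5, 2, 9, 4, 6, 3, 7, 4, 5]

rng = random.Random(301)  # fixed seed (arbitrary) so the run is repeatable
boot = bootstrap_means(word_lengths, reps=1000, rng=rng)

boot_center = statistics.mean(boot)  # should sit near the sample mean
boot_se = statistics.stdev(boot)     # estimates SD of the sampling distribution

print("sample mean:     ", statistics.mean(word_lengths))
print("bootstrap center:", round(boot_center, 3))
print("bootstrap SE:    ", round(boot_se, 3))
```

The standard deviation of the bootstrap means plays the role of SE(statistic) in the symmetric-case interval.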
9
And without symmetry? Key result: how bootstrap samples vary around the statistic mimics how statistics vary around the parameter, e.g., the expected sampling error.
10
Without symmetry: look at the middle 95% of values (between the 2.5th percentile and the 97.5th percentile). The statistic should not be any more than (4.8 - 3.6) below the parameter. The statistic should not be any more than (6.1 - 4.8) above the parameter. Parameter < statistic + (4.8 - 3.6). Parameter > statistic - (6.1 - 4.8) (max overestimate).
11
“Percentile” confidence interval: (sample mean - (6.1 - 4.8), sample mean + (4.8 - 3.6)). Upper bound: estimate + (estimate - 2.5th percentile). Lower bound: estimate - (97.5th percentile - estimate).
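The two bound formulas can be written as a small function and checked against the slide's numbers (estimate 4.8, 2.5th percentile 3.6, 97.5th percentile 6.1). The function name `reflected_interval` is a hypothetical label chosen for this sketch, not a term from the course materials.

```python
def reflected_interval(estimate, pct_low, pct_high):
    """Interval from the slide's two bound formulas.

    pct_low and pct_high are the 2.5th and 97.5th percentiles of the
    bootstrap distribution; the distances from the estimate to each
    percentile are swapped to the opposite side of the estimate.
    """
    lower = estimate - (pct_high - estimate)  # subtract the max overestimate
    upper = estimate + (estimate - pct_low)   # add the max underestimate
    return lower, upper

# Values read off the slide.
lower, upper = reflected_interval(4.8, pct_low=3.6, pct_high=6.1)
print(round(lower, 2), round(upper, 2))
```

With these values the interval is approximately (3.5, 6.0), which is not the same as simply reporting the percentiles (3.6, 6.1); the two differ whenever the bootstrap distribution is not symmetric about the estimate.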
12
Investigation 4.5.2 (p. 371)
13
Investigation 4.5.3 (p. 372) (a)-(c)
14
For Tuesday: continuing HW 8. All but last problem?