Confidence Intervals: Bootstrap Distribution

Slides:



Advertisements
Similar presentations
Panel at 2013 Joint Mathematics Meetings
Advertisements

What Can We Do When Conditions Arent Met? Robin H. Lock, Burry Professor of Statistics St. Lawrence University BAPS at 2012 JSM San Diego, August 2012.
Confidence Intervals: Bootstrap Distribution
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 10/23/12 Sections , Single Proportion, p Distribution (6.1)
Hypothesis Testing: Intervals and Tests
Early Inference: Using Bootstraps to Introduce Confidence Intervals Robin H. Lock, Burry Professor of Statistics Patti Frazer Lock, Cummings Professor.
Sampling: Final and Initial Sample Size Determination
Chapter 10: Estimating with Confidence
Hypothesis Testing I 2/8/12 More on bootstrapping Random chance
Statistics: Unlocking the Power of Data Lock 5 Inference Using Formulas STAT 101 Dr. Kari Lock Morgan Chapter 6 t-distribution Formulas for standard errors.
Chapter 18 Sampling Distribution Models
Section 3.4 Bootstrap Confidence Intervals using Percentiles.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 7.3 Estimating a Population mean µ (σ known) Objective Find the confidence.
Chapter 19: Confidence Intervals for Proportions
t scores and confidence intervals using the t distribution
Connecting Simulation- Based Inference with Traditional Methods Kari Lock Morgan, Penn State Robin Lock, St. Lawrence University Patti Frazer Lock, St.
StatKey: Online Tools for Bootstrap Intervals and Randomization Tests Kari Lock Morgan Department of Statistical Science Duke University Joint work with.
How confident are we that our sample means make sense? Confidence intervals.
Dr. Kari Lock Morgan Department of Statistics Penn State University Teaching the Common Core: Making Inferences and Justifying Conclusions ASA Webinar.
Starting Inference with Bootstraps and Randomizations Robin H. Lock, Burry Professor of Statistics St. Lawrence University Stat Chat Macalester College,
Bootstrapping: Let Your Data Be Your Guide Robin H. Lock Burry Professor of Statistics St. Lawrence University MAA Seaway Section Meeting Hamilton College,
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
Review of normal distribution. Exercise Solution.
Estimation Goal: Use sample data to make predictions regarding unknown population parameters Point Estimate - Single value that is best guess of true parameter.
4.1 Introducing Hypothesis Tests 4.2 Measuring significance with P-values Visit the Maths Study Centre 11am-5pm This presentation.
Chapter 7 Confidence Intervals and Sample Sizes
More Randomization Distributions, Connections
Confidence Intervals: Bootstrap Distribution
Section 5.2 Confidence Intervals and P-values using Normal Distributions.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal.
Normal Distribution Chapter 5 Normal distribution
Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal
Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4)
Sullivan – Fundamentals of Statistics – 2 nd Edition – Chapter 9 Section 1 – Slide 1 of 39 Chapter 9 Section 1 The Logic in Constructing Confidence Intervals.
Ch 8 Estimating with Confidence. Today’s Objectives ✓ I can interpret a confidence level. ✓ I can interpret a confidence interval in context. ✓ I can.
What Can We Do When Conditions Aren’t Met? Robin H. Lock, Burry Professor of Statistics St. Lawrence University BAPS at 2011 JSM Miami Beach, August 2011.
Estimation: Sampling Distribution
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/18/12 Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap.
Ch 8 Estimating with Confidence. Today’s Objectives ✓ I can interpret a confidence level. ✓ I can interpret a confidence interval in context. ✓ I can.
Rule of sample proportions IF:1.There is a population proportion of interest 2.We have a random sample from the population 3.The sample is large enough.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 101 Dr. Kari Lock Morgan 10/18/12 Chapter 5 Normal distribution Central limit theorem.
Confidence Interval Estimation for a Population Proportion Lecture 31 Section 9.4 Wed, Nov 17, 2004.
Module 14: Confidence Intervals This module explores the development and interpretation of confidence intervals, with a focus on confidence intervals.
Statistics: Unlocking the Power of Data Lock 5 Bootstrap Intervals Dr. Kari Lock Morgan PSU /12/14.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Estimation: Confidence Intervals SECTION 3.2 Confidence Intervals (3.2)
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
Sampling distributions rule of thumb…. Some important points about sample distributions… If we obtain a sample that meets the rules of thumb, then…
Chapter 8: Confidence Intervals based on a Single Sample
Review Normal Distributions –Draw a picture. –Convert to standard normal (if necessary) –Use the binomial tables to look up the value. –In the case of.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Statistics: Unlocking the Power of Data Lock 5 Section 3.1 Sampling Distributions.
Constructing Bootstrap Confidence Intervals
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution (5.1) Central limit theorem.
Statistics: Unlocking the Power of Data Lock 5 Section 6.4 Distribution of a Sample Mean.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
1 Probability and Statistics Confidence Intervals.
Synthesis and Review 2/20/12 Hypothesis Tests: the big picture Randomization distributions Connecting intervals and tests Review of major topics Open Q+A.
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example: In a recent poll, 70% of 1501 randomly selected adults said they believed.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Estimation: Confidence Intervals SECTION 3.2 Confidence Intervals (3.2)
Bootstraps and Scrambles: Letting a Dataset Speak for Itself Robin H. Lock Patti Frazer Lock ‘75 Burry Professor of Statistics Cummings Professor of MathematicsSt.
Notes on Bootstrapping Jeff Witmer 10 February 2016.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Synthesis and Review for Exam 1.
Making computing skills part of learning introductory stats
Inference for Proportions
Confidence Intervals: Sampling Distribution
Bootstrap Confidence Intervals using Percentiles
Estimation Goal: Use sample data to make predictions regarding unknown population parameters Point Estimate - Single value that is best guess of true parameter.
Sampling Distribution Models
Presentation transcript:

Confidence Intervals: Bootstrap Distribution STAT 250 Dr. Kari Lock Morgan Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap distribution (3.3) 95% CI using standard error (3.3) Percentile method (3.4)

Confidence Intervals . . . statistic ± ME Population Sample Population Sample Sample . . . Margin of Error (ME) (95% CI: ME = 2×SE) Sample Sample Sample Sampling Distribution Calculate statistic for each sample Standard Error (SE): standard deviation of sampling distribution

BOOTSTRAP! Reality One small problem… … WE ONLY HAVE ONE SAMPLE!!!! How do we know how much sample statistics vary, if we only have one sample?!? BOOTSTRAP!

Your best guess for the population is lots of copies of the sample! Sample repeatedly from this “population”

Sampling with Replacement It’s impossible to sample repeatedly from the population… But we can sample repeatedly from the sample! To get statistics that vary, sample with replacement (each unit can be selected more than once)

Suppose we have a random sample of 6 people: Patti

A simulated “population” to sample from Original Sample Patti A simulated “population” to sample from

Remember: sample size matters! Bootstrap Sample: Sample with replacement from the original sample, using the same sample size. Remember: sample size matters! Patti Original Sample Bootstrap Sample

Reese’s Pieces How would you take a bootstrap sample from your sample of Reese’s Pieces?

Bootstrap Sample Your original sample has data values 18, 19, 19, 20, 21 Is the following a possible bootstrap sample? 18, 19, 20, 21, 22 Yes No

Bootstrap Sample Your original sample has data values 18, 19, 19, 20, 21 Is the following a possible bootstrap sample? 18, 19, 20, 21 Yes No

Bootstrap Sample Your original sample has data values 18, 19, 19, 20, 21 Is the following a possible bootstrap sample? 18, 18, 19, 20, 21 Yes No

Bootstrap Distribution BootstrapSample Bootstrap Statistic BootstrapSample Bootstrap Statistic Original Sample Bootstrap Distribution . . Sample Statistic BootstrapSample Bootstrap Statistic

Mercury Levels in Fish What is the average mercury level of fish (large mouth bass) in Florida lakes? Sample of size n = 53, with 𝑥 =0.527 ppm. (FDA action level in the USA is 1 ppm, in Canada the limit is 0.5 ppm) Key Question: How much can statistics vary from sample to sample? FDA action level: 1ppm Canada: 0.5ppm Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4), 466-471.

Mercury Levels in Fish

Bootstrap Sample You have a sample of size n = 53. You sample with replacement 1000 times to get 1000 bootstrap samples. What is the sample size of each bootstrap sample? 53 1000

Bootstrap Distribution You have a sample of size n = 53. You sample with replacement 1000 times to get 1000 bootstrap samples. How many bootstrap statistics will you have? 53 1000

“Pull yourself up by your bootstraps” Why “bootstrap”? “Pull yourself up by your bootstraps” Lift yourself in the air simply by pulling up on the laces of your boots Metaphor for accomplishing an “impossible” task without any outside help

Sampling Distribution Population BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed µ

Bootstrap Distribution What can we do with just one seed? Bootstrap “Population” Estimate the distribution and variability (SE) of 𝑥 ’s from the bootstraps 𝑥 µ

Golden Rule of Bootstrapping Bootstrap statistics are to the original sample statistic as the original sample statistic is to the population parameter

Center The sampling distribution is centered around the population parameter The bootstrap distribution is centered around the Luckily, we don’t care about the center… we care about the variability! population parameter sample statistic bootstrap statistic bootstrap parameter

Standard Error The variability of the bootstrap statistics is similar to the variability of the sample statistics The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!

Confidence Intervals . . . statistic ± ME Sample Confidence Interval Bootstrap Sample Sample Bootstrap Sample Bootstrap Sample . . . Margin of Error (ME) (95% CI: ME = 2×SE) Bootstrap Sample Bootstrap Sample Bootstrap Distribution Calculate statistic for each bootstrap sample Standard Error (SE): standard deviation of bootstrap distribution

Mercury Levels in Fish What is the average mercury level of fish (large mouth bass) in Florida lakes? Sample of size n = 53, with 𝑥 =0.527 ppm. (FDA action level in the USA is 1 ppm, in Canada the limit is 0.5 ppm) Give a confidence interval for the true average. FDA action level: 1ppm Canada: 0.5ppm Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4), 466-471.

Mercury Levels in Fish 0.527  2  0.047 (0.433, 0.621) SE = 0.047 0.527  2  0.047 (0.433, 0.621) We are 95% confident that average mercury level in fish in Florida lakes is between 0.433 and 0.621 ppm.

Same process for every parameter! Estimate the standard error and/or a confidence interval for... proportion (𝑝) difference in means (µ1 −µ2 ) difference in proportions (𝑝1 −𝑝2 ) standard deviation (𝜎) correlation (𝜌) ... Generate samples with replacement Calculate sample statistic Repeat...

Hitchhiker Snails A type of small snail is very widespread in Japan, and colonies of the snails that are genetically very similar have been found very far apart. How could the snails travel such long distances? Biologist Shinichiro Wada fed 174 live snails to birds, and found that 26 were excreted live out the other end. (The snails are apparently able to seal their shells shut to keep the digestive fluids from getting in). What proportion of these snails ingested by birds survive? Yong, E. “The Scatological Hitchhiker Snail,” Discover, October 2011, 13.

Hitchhiker Snails (0.1, 0.2) (0.05, 0.25) (0.12, 0.18) (0.07, 0.18) Give a 95% confidence interval for the proportion of snails ingested by birds that survive. (0.1, 0.2) (0.05, 0.25) (0.12, 0.18) (0.07, 0.18)

Body Temperature What is the average body temperature of humans? www.lock5stat.com/statkey Shoemaker, What's Normal: Temperature, Gender and Heartrate, Journal of Statistics Education, Vol. 4, No. 2 (1996)

Other Levels of Confidence What if we want to be more than 95% confident? How might you produce a 99% confidence interval for the average body temperature?

(0.5th percentile, 99.5th percentile) Percentile Method For a P% confidence interval, keep the middle P% of bootstrap statistics For a 99% confidence interval, keep the middle 99%, leaving 0.5% in each tail. The 99% confidence interval would be (0.5th percentile, 99.5th percentile) where the percentiles refer to the bootstrap distribution.

Bootstrap Distribution For a P% confidence interval:

Body Temperature www.lock5stat.com/statkey We are 99% sure that the average body temperature is between 98.00 and 98.58

Level of Confidence Which is wider, a 90% confidence interval or a 95% confidence interval? (a) 90% CI (b) 95% CI

Mercury and pH in Lakes For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake? 𝑟=−0.575 Give a 90% CI for  Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993)

Mercury and pH in Lakes www.lock5stat.com/statkey We are 90% confident that the true correlation between average mercury level and pH of Florida lakes is between -0.702 and -0.433.

Bootstrap CI Option 1: Estimate the standard error of the statistic by computing the standard deviation of the bootstrap distribution, and then generate a 95% confidence interval by Option 2: Generate a P% confidence interval as the range for the middle P% of bootstrap statistics

Middle 95% of bootstrap statistics Mercury Levels in Fish SE = 0.047 0.527  2  0.047 (0.433, 0.621) Middle 95% of bootstrap statistics

Bootstrap Cautions These methods for creating a confidence interval only work if the bootstrap distribution is smooth and symmetric ALWAYS look at a plot of the bootstrap distribution! If the bootstrap distribution is skewed or looks “spiky” with gaps, you will need to go beyond intro stat to create a confidence interval

Bootstrap Cautions

Bootstrap Cautions

Number of Bootstrap Samples When using bootstrapping, you may get a slightly different confidence interval each time. This is fine! The more bootstrap samples you use, the more precise your answer will be. For the purposes of this class, 1000 bootstrap samples is fine. In real life, you probably want to take 10,000 or even 100,000 bootstrap samples

Summary The standard error of a statistic is the standard deviation of the sample statistic, which can be estimated from a bootstrap distribution Confidence intervals can be created using the standard error or the percentiles of a bootstrap distribution Increasing the number of bootstrap samples will not change the SE or interval (except for random fluctuation) Confidence intervals can be created this way for any parameter, as long as the bootstrap distribution is approximately symmetric and continuous

To Do Read Sections 3.3, 3.4 Do HW 3.3, 3.4 (due Friday, 2/7)