Statistics: Unlocking the Power of Data Lock 5 Bootstrap Intervals Dr. Kari Lock Morgan PSU 016 11/12/14.

Presentation transcript:

p-value: How extreme would your observed statistic be under the null hypothesis? Many of you calculated your p-value without ever using your observed statistic.

Sleep versus Caffeine: Students were given words to memorize, then randomly assigned to take either a 90-minute nap or a caffeine pill. 2½ hours later, they were tested on their recall ability. Explanatory variable: sleep or caffeine. Response variable: number of words recalled. Is sleep better than caffeine for memory? Source: Mednick, Cai, Kanady, and Drummond (2008). “Comparing the benefits of caffeine, naps and placebo on verbal, motor and perceptual memory,” Behavioural Brain Research, 193.

IMPORTANT POINTS: Sample statistics vary from sample to sample (they will not match the parameter exactly). KEY QUESTION: For a given sample statistic, what are plausible values for the population parameter? How much uncertainty surrounds the sample statistic? KEY ANSWER: It depends on how much the statistic varies from sample to sample!

Reese’s Pieces: What proportion of Reese’s Pieces are orange? Take a random sample of 10 Reese’s Pieces. What is your sample proportion? Come to the board to make a class dotplot. You just made a sampling distribution!

Sampling Distribution: A sampling distribution is the distribution of sample statistics computed for different samples of the same size from the same population. A sampling distribution shows us how the sample statistic varies from sample to sample.
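
The class dotplot can be imitated in code. The following is a minimal sketch, not part of the original slides: it assumes a hypothetical true proportion of orange pieces (`TRUE_P = 0.5`, chosen only for illustration) and simulates many students each drawing 10 pieces. The names `TRUE_P`, `SAMPLE_SIZE`, `NUM_SAMPLES`, and `sample_proportion` are all made up for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_P = 0.5        # hypothetical population proportion of orange pieces
SAMPLE_SIZE = 10    # each student samples 10 pieces, as on the slide
NUM_SAMPLES = 1000  # number of simulated samples (an arbitrary choice)


def sample_proportion(p, n):
    """Draw n pieces and return the proportion that are orange."""
    orange = sum(1 for _ in range(n) if random.random() < p)
    return orange / n


# One sample proportion per simulated sample: together they form
# a simulated sampling distribution.
sampling_distribution = [
    sample_proportion(TRUE_P, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)
]

print("mean of sample proportions:", statistics.mean(sampling_distribution))
print("smallest and largest:",
      min(sampling_distribution), max(sampling_distribution))
```

Each entry of `sampling_distribution` plays the role of one dot on the class dotplot.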

Lots of simulations! We need many more simulations!

Reese’s Pieces

Standard Error: The variability of the sample statistic (how much it varies from sample to sample) is so important it gets its own name. The standard error of a statistic, SE, is the standard deviation of the sample statistic.

Reese’s Pieces: STANDARD ERROR

95% Confidence Interval: If the sampling distribution is relatively symmetric and bell-shaped, a 95% confidence interval can be estimated using statistic ± 2 × SE.

Reese’s Pieces

Confidence Intervals: Population → many samples → calculate the statistic for each sample → sampling distribution → standard error (SE): standard deviation of the sampling distribution → statistic ± 2 × SE.

Summary: To create a plausible range of values for a parameter:
o Take many random samples from the population, and compute the sample statistic for each sample
o Compute the standard error as the standard deviation of all these statistics
o Use statistic ± 2 × SE
One small problem…
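
The three steps of this summary can be sketched in code. This is an illustration under stated assumptions, not the slides' own material: the population below is hypothetical (20,000 simulated values, roughly bell-shaped), the statistic is taken to be the mean, and names such as `population` and `sample_means` are invented for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 20,000 values, roughly bell-shaped around 50.
population = [random.gauss(50, 10) for _ in range(20_000)]

SAMPLE_SIZE = 25
NUM_SAMPLES = 2000

# Step 1: take many random samples and compute the statistic
# (here the mean) for each one.
sample_means = [
    statistics.mean(random.sample(population, SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Step 2: the standard error is the standard deviation of those statistics.
standard_error = statistics.stdev(sample_means)

# Step 3: statistic ± 2 × SE, applied to one particular sample.
one_sample = random.sample(population, SAMPLE_SIZE)
statistic = statistics.mean(one_sample)
interval = (statistic - 2 * standard_error, statistic + 2 * standard_error)

print("standard error:", round(standard_error, 3))
print("statistic:", round(statistic, 3))
print("interval:", tuple(round(x, 3) for x in interval))
```

Step 1 is only possible here because the whole population is available to the program, which is exactly what is missing in practice.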

Reality: … WE ONLY HAVE ONE SAMPLE!!!! How do we know how much sample statistics vary, if we only have one sample?!? BOOTSTRAP!

“Population”: Imagine the “population” is many, many copies of the original sample. (What do you have to assume?)

Suppose we have a random sample of 6 people:

Original Sample → a simulated “population” to sample from

Sampling with Replacement: To simulate a sampling distribution, we can just take repeated random samples from this “population” made up of many copies of the sample. In practice, we can’t actually make infinite copies of the sample, but we can get the same effect by sampling with replacement from the sample we have (each unit can be selected more than once).

Bootstrap Sample: Sample with replacement from the original sample, using the same sample size. Original Sample → Bootstrap Sample

Reese’s Pieces: Take a bootstrap sample from your sample of Reese’s Pieces.

Bootstrap: A bootstrap sample is a random sample taken with replacement from the original sample, of the same size as the original sample. A bootstrap statistic is the statistic computed on a bootstrap sample. A bootstrap distribution is the distribution of many bootstrap statistics.
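
These three definitions map directly onto a few lines of code. The sketch below is illustrative only: the six values in `original_sample` are hypothetical (the slides' own sample of 6 people is not reproduced here), the statistic is assumed to be the mean, and `bootstrap_sample` is a helper name invented for this example.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical original sample of 6 values (not data from the slides).
original_sample = [3, 7, 8, 12, 15, 21]


def bootstrap_sample(sample):
    """Sample with replacement, using the same size as the original sample."""
    return random.choices(sample, k=len(sample))


# A bootstrap sample: some values may repeat, others may be left out.
one_bootstrap_sample = bootstrap_sample(original_sample)

# A bootstrap statistic: the statistic computed on a bootstrap sample.
one_bootstrap_statistic = statistics.mean(one_bootstrap_sample)

# A bootstrap distribution: many bootstrap statistics.
bootstrap_distribution = [
    statistics.mean(bootstrap_sample(original_sample)) for _ in range(1000)
]

print("bootstrap sample:", one_bootstrap_sample)
print("bootstrap statistic:", one_bootstrap_statistic)
print("number of bootstrap statistics:", len(bootstrap_distribution))
```

`random.choices` draws with replacement, which is what lets a single unit appear more than once in a bootstrap sample.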

Original Sample (sample statistic) → Bootstrap Samples → Bootstrap Statistics → Bootstrap Distribution

Lots of simulations! We need many more simulations!

Bootstrap Distribution: STANDARD ERROR. Look familiar???

Why “bootstrap”? “Pull yourself up by your bootstraps”: lift yourself in the air simply by pulling up on the laces of your boots. A metaphor for accomplishing an “impossible” task without any outside help.

Sampling Distribution: Population, µ. BUT, in practice we don’t see the “tree” or all of the “seeds”; we only have ONE seed.

Bootstrap Distribution: Bootstrap “Population”, µ. What can we do with just one seed? Grow a NEW tree!

Standard Error: The variability of the bootstrap statistics is similar to the variability of the sample statistics. The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!

Confidence Intervals: Sample → many bootstrap samples → calculate the statistic for each bootstrap sample → bootstrap distribution → standard error (SE): standard deviation of the bootstrap distribution → statistic ± 2 × SE.

The Magic of Bootstrapping: We can use bootstrapping to assess the uncertainty surrounding ANY sample statistic! If we have sample data, we can use bootstrapping to create a 95% confidence interval for any parameter! (well, almost…)

Used Mustangs: What’s the average price of a used Mustang car? Select a random sample of n = 25 Mustangs from a website (autotrader.com) and record the price (in $1,000’s) for each car.

Sample of Mustangs: Our best estimate for the average price of used Mustangs is $15,980, but how accurate is that estimate? BOOTSTRAP!

Original Sample: 1. Take a bootstrap sample. 2. Calculate the mean price of the bootstrap sample. 3. Repeat many times!
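
The three steps can be sketched as a short loop. This is an illustration, not the StatKey computation and not the slides' data: the 25 prices in `prices` are hypothetical stand-ins (in $1,000s) for the Mustang sample, and the number of repetitions is an arbitrary choice.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical prices in $1,000s for 25 used cars. These are NOT the
# actual Mustang data from the slides, only stand-in values.
prices = [
    9.5, 12.0, 21.0, 7.0, 33.0, 15.5, 11.9, 18.0, 24.5, 5.0,
    13.0, 27.9, 10.5, 16.0, 8.9, 19.5, 22.0, 14.0, 6.5, 30.0,
    12.9, 17.5, 9.0, 25.0, 11.0,
]

NUM_BOOTSTRAP_SAMPLES = 5000  # "repeat many times" (arbitrary choice)

bootstrap_means = []
for _ in range(NUM_BOOTSTRAP_SAMPLES):
    # 1. Bootstrap sample: same size as the original, with replacement.
    resample = random.choices(prices, k=len(prices))
    # 2. Calculate the mean price of the bootstrap sample.
    bootstrap_means.append(statistics.mean(resample))
# 3. The loop repeats steps 1 and 2 many times.

sample_mean = statistics.mean(prices)
standard_error = statistics.stdev(bootstrap_means)
lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error

print("sample mean:", round(sample_mean, 2))
print("bootstrap SE:", round(standard_error, 2))
print("95% interval (statistic ± 2 × SE):", round(lower, 2), round(upper, 2))
```

Only the one original sample is used; no further draws from the population are needed.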

Used Mustangs: Use StatKey to generate your own 95% confidence interval for the price of used Mustangs on autotrader.com.

Used Mustangs: Standard Error

Used Mustangs

Other Levels of Confidence: For a P% confidence interval:

Mercury in Fish: What is the average mercury level of fish (largemouth bass) in Florida lakes? Sample of size n = 53, with ppm. Give a confidence interval for the true average. Key Question: How much can statistics vary from sample to sample? Source: Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4),

Bootstrap Confidence Interval: Distribution of Bootstrap Statistics. SE = ; statistic ± 2 × SE: (0.433, 0.621). Middle 95% of bootstrap statistics.

Bootstrap CI: Option 1: Estimate the standard error of the statistic by computing the standard deviation of the bootstrap distribution, and then generate a 95% confidence interval by statistic ± 2 × SE. Option 2: Generate a P% confidence interval as the range for the middle P% of bootstrap statistics.
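
The two options can be compared side by side on the same bootstrap distribution. The sketch below rests on assumptions that are not in the slides: the 20 measurements in `sample` are hypothetical, the statistic is the mean, and `percentile_interval` is a helper written for this example that uses a simple sorted-index rule (one of several reasonable ways to pick percentiles).

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

# Hypothetical measurements (not the mercury data from the slides).
sample = [
    0.15, 0.34, 0.48, 0.52, 0.27, 0.71, 0.44, 0.63, 0.19, 0.87,
    0.56, 0.40, 0.33, 0.95, 0.25, 0.59, 0.68, 0.50, 0.38, 0.77,
]

bootstrap_means = [
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(5000)
]

sample_mean = statistics.mean(sample)

# Option 1: statistic ± 2 × SE, with SE taken from the bootstrap distribution.
standard_error = statistics.stdev(bootstrap_means)
se_interval = (sample_mean - 2 * standard_error,
               sample_mean + 2 * standard_error)


def percentile_interval(boot_stats, confidence):
    """Return the range covering the middle `confidence` share of boot_stats."""
    ordered = sorted(boot_stats)
    tail = (1 - confidence) / 2
    low_index = round(tail * len(ordered))
    high_index = round((1 - tail) * len(ordered)) - 1
    return ordered[low_index], ordered[high_index]


# Option 2: the middle P% of bootstrap statistics (here P = 95).
pct_interval = percentile_interval(bootstrap_means, 0.95)

print("Option 1 (± 2 × SE):", tuple(round(x, 3) for x in se_interval))
print("Option 2 (percentiles):", tuple(round(x, 3) for x in pct_interval))
```

Changing the second argument of `percentile_interval` (for example to `0.90` or `0.99`) gives other confidence levels under Option 2.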

Mercury and pH in Lakes: For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake? Give a 90% CI for ρ. Source: Lange, Royals, and Connor, Transactions of the American Fisheries Society (1993).

Mercury and pH in Lakes: We are 90% confident that the true correlation between average mercury level and pH of Florida lakes is between and
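
A bootstrap interval for a correlation works the same way, as long as whole cases (here, lakes) are resampled so that each pH stays paired with its mercury level. The following sketch uses hypothetical `(pH, mercury)` pairs, not the Florida lakes data, and a hand-written `correlation` helper; it reports a 90% interval from the middle 90% of bootstrap correlations.

```python
import math
import random

random.seed(5)  # fixed seed so the sketch is reproducible

# Hypothetical (pH, mercury in ppm) pairs. NOT the Florida lakes data.
lakes = [
    (4.4, 1.10), (5.1, 1.30), (5.8, 0.70), (6.1, 0.90), (6.4, 0.50),
    (6.6, 0.65), (6.9, 0.40), (7.1, 0.55), (7.3, 0.25), (7.6, 0.45),
    (7.9, 0.20), (8.2, 0.30), (8.5, 0.10), (8.8, 0.35), (9.1, 0.15),
]


def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


observed_r = correlation(lakes)

# Resample whole lakes with replacement so the pairs stay intact.
bootstrap_rs = sorted(
    correlation(random.choices(lakes, k=len(lakes))) for _ in range(5000)
)

# Middle 90% of the bootstrap correlations: cut 5% from each tail.
lower = bootstrap_rs[round(0.05 * len(bootstrap_rs))]
upper = bootstrap_rs[round(0.95 * len(bootstrap_rs)) - 1]

print("observed correlation:", round(observed_r, 3))
print("90% percentile interval:", round(lower, 3), round(upper, 3))
```

Resampling the two columns separately would break the pairing and push the bootstrap correlations toward zero, which is why the pairs are kept together.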

Bootstrap Cautions: These methods for creating a confidence interval work whenever the bootstrap distribution is smooth and symmetric. ALWAYS look at a plot of the bootstrap distribution! If the bootstrap distribution is skewed or looks “spiky” with gaps, you will need something more advanced.
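
One way to see what a “spiky” bootstrap distribution looks like is to bootstrap the median of a small sample. This sketch is an illustration chosen for this purpose, not an example taken from the slides: the seven values in `sample` are hypothetical, and `bootstrap_stats` is a helper name invented here. With an odd sample size, every bootstrap median is one of the original values, so the distribution has only a handful of spikes, while the bootstrap means spread over many values.

```python
import random
import statistics
from collections import Counter

random.seed(6)  # fixed seed so the sketch is reproducible

# Hypothetical small, skewed sample (not data from the slides).
sample = [2, 3, 5, 8, 13, 21, 34]


def bootstrap_stats(data, stat, reps):
    """Compute `stat` on `reps` bootstrap samples drawn from `data`."""
    return [stat(random.choices(data, k=len(data))) for _ in range(reps)]


medians = bootstrap_stats(sample, statistics.median, 2000)
means = bootstrap_stats(sample, statistics.mean, 2000)

print("distinct bootstrap medians:", len(set(medians)))
print("distinct bootstrap means:", len(set(means)))

# Crude text plot of the bootstrap medians: one '#' per 20 occurrences.
for value, count in sorted(Counter(medians).items()):
    print(f"{value:>3} {'#' * (count // 20)}")
```

A plot like this is the kind of check the slide asks for before trusting either interval method.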

Bootstrap Cautions

Bootstrap Cautions

Summary: The standard error of a statistic is the standard deviation of the sample statistic, which can be estimated from a bootstrap distribution. Confidence intervals can be created using the standard error or the percentiles of a bootstrap distribution. Confidence intervals can be created this way for any parameter, as long as the bootstrap distribution is approximately symmetric and continuous.