Bootstrapping: Let Your Data Be Your Guide Robin H. Lock Burry Professor of Statistics St. Lawrence University MAA Seaway Section Meeting Hamilton College,

Presentation transcript:

Bootstrapping: Let Your Data Be Your Guide Robin H. Lock Burry Professor of Statistics St. Lawrence University MAA Seaway Section Meeting Hamilton College, April 2012

Questions to Address What is bootstrapping? How/why does it work? Can it be made accessible to intro statistics students? Can it be used as the way to introduce students to key ideas of statistical inference?

The Lock 5 Team Robin: SUNY Oneonta, St. Lawrence. Dennis: St. Lawrence, Iowa State. Eric: Hamilton, UNC-Chapel Hill. Kari: Williams, Harvard, Duke. Patti: Colgate, St. Lawrence.

Quick Review: Confidence Interval for a Mean Estimate ± Margin of Error Estimate ± (Table)*(Standard Error) What’s the “right” table? How do we estimate the standard error?

Common Difficulties Example: Suppose n=15 and the underlying population is skewed with outliers? → t-distribution doesn’t apply. Example: Find a confidence interval for the standard deviation in a population. What is the distribution? What is the standard error for s?

Traditional Approach: Sampling Distributions Take LOTS of samples (size n) from the population and compute the statistic of interest for each sample. Recognize the form of the distribution. Estimate the standard error of the statistic. BUT, in practice, is it feasible to take lots of samples from the population? What can we do if we ONLY have one sample?
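A minimal sketch of the "take LOTS of samples" idea, using the earlier difficult case (n=15, skewed population, statistic s). The population here is simulated and purely hypothetical; in practice we never have it, which is the point of the slide.

```python
import random
from statistics import stdev

rng = random.Random(92)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population (an assumption, NOT data from the talk)
population = [rng.expovariate(1 / 20) for _ in range(100_000)]

n = 15
# Take LOTS of samples of size n and compute the statistic (here s) for each
sample_sds = [stdev(rng.sample(population, n)) for _ in range(2000)]

# The spread of those statistics is the standard error of s
se_of_s = stdev(sample_sds)
print(f"SE of s from 2000 samples of size {n}: {se_of_s:.2f}")
```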

Alternate Approach: Bootstrapping “Let your data be your guide.” Brad Efron – Stanford University

“Bootstrap” Samples Key idea: Sample with replacement from the original sample using the same n. Assumes the “population” is many, many copies of the original sample.

Suppose we have a random sample of 6 people:

Original Sample A simulated “population” to sample from

Bootstrap Sample: Sample with replacement from the original sample, using the same sample size. [Figure: Original Sample | Bootstrap Sample]
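A small sketch of drawing one bootstrap sample, assuming the six people are simply labeled A through F (the labels and the helper name bootstrap_sample are illustrative, not from the slides).

```python
import random

def bootstrap_sample(sample, rng):
    """Draw len(sample) values from sample, with replacement."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(2012)  # fixed seed so the sketch is repeatable
original = ["A", "B", "C", "D", "E", "F"]  # hypothetical labels for 6 people
resample = bootstrap_sample(original, rng)

print("Original: ", original)
print("Bootstrap:", resample)  # same size; people may repeat or be left out
```

Sampling with replacement at the same n is equivalent to sampling from a "population" made of many, many copies of the original sample.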

Example: Atlanta Commutes Data: The American Housing Survey (AHS) collected data from Atlanta in What’s the mean commute time for workers in metropolitan Atlanta?

Sample of n=500 Atlanta Commutes Where is the “true” mean (µ)?

[Diagram: Original Sample (Sample Statistic) → Bootstrap Samples → Bootstrap Statistics → Bootstrap Distribution]

We need technology! StatKey

Three Distributions One to Many Samples StatKey

How can we get a confidence interval from a bootstrap distribution? Method #1: Use the standard deviation of the bootstrap statistics as a “yardstick”

Using the Bootstrap Distribution to Get a Confidence Interval – Version #1 The standard deviation of the bootstrap statistics estimates the standard error of the sample statistic. Quick interval estimate: Statistic ± 2 SE. For the mean Atlanta commute time:
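Version #1 can be sketched as follows. The commute times below are simulated stand-ins (an assumption; the AHS Atlanta data are not reproduced here), and bootstrap_statistics is an illustrative helper name.

```python
import random
from statistics import fmean, stdev

def bootstrap_statistics(sample, stat, reps, rng):
    """Return `reps` bootstrap statistics, each from a same-size resample."""
    return [stat(rng.choices(sample, k=len(sample))) for _ in range(reps)]

rng = random.Random(116)
# Hypothetical skewed "commute times" in minutes (NOT the AHS Atlanta data)
commutes = [rng.expovariate(1 / 30) for _ in range(500)]

boot_means = bootstrap_statistics(commutes, fmean, 1000, rng)
se = stdev(boot_means)   # SD of the bootstrap statistics estimates the SE
xbar = fmean(commutes)
ci_se = (xbar - 2 * se, xbar + 2 * se)  # statistic +/- 2 SE

print(f"mean = {xbar:.2f}, bootstrap SE = {se:.2f}")
print(f"approx. 95% CI: ({ci_se[0]:.2f}, {ci_se[1]:.2f})")
```

For a mean, the bootstrap SE should land close to the familiar s/√n, which is what makes this version good preparation for traditional formulas.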

Using the Bootstrap Distribution to Get a Confidence Interval – Version #2 Keep 95% in middle; chop 2.5% in each tail. For a 95% CI, find the 2.5%-tile and 97.5%-tile in the bootstrap distribution. 95% CI = (27.35, 30.96)

90% CI for Mean Atlanta Commute Keep 90% in middle; chop 5% in each tail. For a 90% CI, find the 5%-tile and 95%-tile in the bootstrap distribution. 90% CI = (27.64, 30.65)
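A sketch of the percentile method (Version #2) at both confidence levels. The data are simulated stand-ins, so the intervals will not match the slide values; percentile_interval is an illustrative helper that chops the same number of bootstrap statistics from each tail.

```python
import random
from statistics import fmean

def percentile_interval(boot_stats, level=0.95):
    """Keep the middle `level` of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    chop = round((1 - level) / 2 * len(ordered))  # count chopped per tail
    return ordered[chop], ordered[len(ordered) - chop - 1]

rng = random.Random(120)
# Hypothetical skewed "commute times" in minutes (NOT the AHS Atlanta data)
commutes = [rng.expovariate(1 / 30) for _ in range(500)]
boot_means = [fmean(rng.choices(commutes, k=len(commutes)))
              for _ in range(1000)]

ci95 = percentile_interval(boot_means, 0.95)
ci90 = percentile_interval(boot_means, 0.90)
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"90% CI: ({ci90[0]:.2f}, {ci90[1]:.2f})")  # narrower than the 95% CI
```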

Bootstrap Confidence Intervals Version 1 (Statistic ± 2 SE): Great preparation for moving to traditional methods. Version 2 (Percentiles): Great at building understanding of confidence intervals.

Sampling Distribution Population µ BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed

Bootstrap Distribution Bootstrap “Population” What can we do with just one seed? Grow a NEW tree! µ

Golden Rule of Bootstraps The bootstrap statistics are to the original statistic as the original statistic is to the population parameter.

What about Other Parameters? Generate samples with replacement. Calculate sample statistic. Repeat...

Example: Difference in Mean Hours of Exercise per Week, by Gender

Example: Standard Deviation of Mustang Prices
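The same generate/calculate/repeat loop works for other statistics by swapping the function. A sketch for a standard deviation and for a difference in means, with made-up data (the values and the helper names are assumptions, not the talk's datasets); for two groups, this sketch resamples each group separately at its own size.

```python
import random
from statistics import fmean, stdev

def bootstrap_distribution(sample, stat, reps, rng):
    """One-sample case: resample, compute the statistic, repeat."""
    return [stat(rng.choices(sample, k=len(sample))) for _ in range(reps)]

def bootstrap_diff_means(group_a, group_b, reps, rng):
    """Two-sample case: resample each group separately at its own size."""
    diffs = []
    for _ in range(reps):
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        diffs.append(fmean(a) - fmean(b))
    return diffs

rng = random.Random(134)
# Hypothetical data (NOT the exercise or Mustang data from the talk)
prices = [rng.uniform(3, 45) for _ in range(25)]          # $1000s
hours_a = [max(0.0, rng.gauss(12, 8)) for _ in range(20)]  # hours/week
hours_b = [max(0.0, rng.gauss(9, 7)) for _ in range(30)]

boot_sds = bootstrap_distribution(prices, stdev, 1000, rng)
boot_diffs = bootstrap_diff_means(hours_a, hours_b, 1000, rng)
print(f"SE of s: {stdev(boot_sds):.2f}")
print(f"SE of difference in means: {stdev(boot_diffs):.2f}")
```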

Example: Find a 95% confidence interval for the correlation between size of bill and tips at a restaurant. Data: n=157 bills at First Crush Bistro (Potsdam, NY) r=0.915

Bootstrap correlations 95% (percentile) interval for correlation is (0.860, 0.956) BUT, this is not symmetric…
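For a correlation, the unit being resampled is the (bill, tip) pair, so each bootstrap sample keeps a bill with its own tip. A sketch with simulated pairs (an assumption; the First Crush Bistro data are not reproduced, and pearson_r is a plain hand-written helper).

```python
import random
from statistics import fmean

def pearson_r(xs, ys):
    """Sample correlation coefficient of paired values."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(138)
# Hypothetical (bill, tip) pairs in dollars (NOT the restaurant data)
bills = [rng.uniform(10, 80) for _ in range(157)]
pairs = [(b, 0.16 * b + rng.gauss(0, 2)) for b in bills]
r_obs = pearson_r(*zip(*pairs))

boot_rs = []
for _ in range(1000):
    resample = rng.choices(pairs, k=len(pairs))  # resample whole pairs
    boot_rs.append(pearson_r(*zip(*resample)))

ordered = sorted(boot_rs)
lo, hi = ordered[25], ordered[974]  # middle 95% of 1000 values
print(f"r = {r_obs:.3f}, 95% percentile CI: ({lo:.3f}, {hi:.3f})")
print(f"below r: {r_obs - lo:.3f}, above r: {hi - r_obs:.3f}")  # asymmetry
```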

Method #3: Reverse Percentiles Golden rule of bootstraps: Bootstrap statistics are to the original statistic as the original statistic is to the population parameter

Bootstrap CI for Correlation Ex: NFL uniform “malevolence” vs. Penalty yards r = StatKey

Method #3: Reverse Percentiles “Reverse” Percentile Interval: Lower: statistic – (upper percentile – statistic). Upper: statistic + (statistic – lower percentile). Golden rule of bootstraps: Bootstrap statistics are to the original statistic as the original statistic is to the population parameter.
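A sketch of the reverse percentile idea under the golden rule: the distance from the original statistic up to the upper percentile becomes the distance down to the lower bound, and vice versa. The data are simulated and the helper names are illustrative.

```python
import random
from statistics import stdev

def percentile_interval(boot_stats, level=0.95):
    """Keep the middle `level` of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    chop = round((1 - level) / 2 * len(ordered))
    return ordered[chop], ordered[len(ordered) - chop - 1]

def reverse_percentile_interval(statistic, boot_stats, level=0.95):
    """Reflect the percentile endpoints around the original statistic."""
    q_lo, q_hi = percentile_interval(boot_stats, level)
    lower = statistic - (q_hi - statistic)
    upper = statistic + (statistic - q_lo)
    return lower, upper

rng = random.Random(144)
sample = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical skewed data
s = stdev(sample)
boot = [stdev(rng.choices(sample, k=len(sample))) for _ in range(1000)]

print("percentile:", percentile_interval(boot))
print("reverse:   ", reverse_percentile_interval(s, boot))
```

When the bootstrap distribution is symmetric about the original statistic, the two intervals agree; they differ only when it is skewed.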

Even Fancier Adjustments... Bias-Corrected Accelerated (BCa): Adjusts percentiles to account for bias and skewness in the bootstrap distribution Other methods: ABC intervals (Approximate Bootstrap Confidence) Bootstrap tilting These are generally implemented in statistical software (e.g. R)
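For readers curious what BCa does, here is a rough standard-library sketch of the usual textbook recipe: a bias correction z0 from the share of bootstrap statistics below the original statistic, and an acceleration a from jackknife (leave-one-out) values. It is an illustration under those assumptions, not a substitute for a tested software implementation; the data and function name are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev

def bca_interval(sample, stat, rng, reps=2000, level=0.95):
    """Rough BCa interval: shift the percentiles for bias and skewness."""
    n = len(sample)
    theta_hat = stat(sample)
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    norm = NormalDist()

    # Bias correction: how far the original statistic is from the
    # bootstrap median (proportion clamped away from 0 and 1)
    below = sum(b < theta_hat for b in boot) / reps
    z0 = norm.inv_cdf(min(max(below, 1 / reps), 1 - 1 / reps))

    # Acceleration from leave-one-out (jackknife) statistics
    jack = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    jbar = fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(z):
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(p):
        return boot[min(reps - 1, max(0, int(p * reps)))]

    z = norm.inv_cdf((1 - level) / 2)  # negative, e.g. about -1.96
    return pick(adjusted(z)), pick(adjusted(-z))

rng = random.Random(146)
sample = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical skewed data
bca_lo, bca_hi = bca_interval(sample, stdev, rng)
print(f"s = {stdev(sample):.3f}, BCa 95% CI: ({bca_lo:.3f}, {bca_hi:.3f})")
```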

Bootstrap CI’s are NOT Foolproof Example: Find a bootstrap distribution for the median price of Mustangs, based on a sample of 25 cars at online sites. Always plot your bootstraps!
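A sketch of why the median deserves a plot first. With an odd sample size, every bootstrap median is one of the original data values, so the bootstrap distribution is a handful of spikes rather than a smooth curve. The prices are made up (an assumption; not the talk's Mustang sample).

```python
import random
from collections import Counter
from statistics import median

rng = random.Random(148)
# Hypothetical prices in $1000s for 25 used cars (NOT the Mustang data)
prices = [round(rng.uniform(3, 45), 1) for _ in range(25)]

boot_medians = [median(rng.choices(prices, k=len(prices)))
                for _ in range(1000)]
counts = Counter(boot_medians)

print(len(counts), "distinct values among", len(boot_medians), "medians")
# Crude text "dotplot": one # per 10 bootstrap medians at each value
for value, count in sorted(counts.items()):
    print(f"{value:6.1f} {'#' * (count // 10)}")
```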

What About Resampling Methods in Hypothesis Tests?

“Randomization” Samples Key idea: Generate samples that are (a) based on the original sample AND (b) consistent with some null hypothesis.

Example: Mean Body Temperature Data: A sample of n=50 body temperatures. Is the average body temperature really 98.6 °F? H0: μ = 98.6 vs. Ha: μ ≠ 98.6. Data from Allen Shoemaker, 1996 JSE data set article.

Randomization Samples How to simulate samples of body temperatures to be consistent with H0: μ = 98.6? StatKey Demo

Randomization Distribution p-value ≈ 2 × (1/1000) = 0.002
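One way to build such a randomization distribution, sketched under an assumption about the mechanism: shift every value by the same constant so the sample mean equals the null value 98.6, then resample with replacement from the shifted data. The temperatures are simulated (NOT Shoemaker's data) and the function names are illustrative.

```python
import random
from statistics import fmean

def randomization_means(sample, null_mean, reps, rng):
    """Shift the sample to have mean `null_mean`, then resample from it."""
    shift = null_mean - fmean(sample)
    shifted = [x + shift for x in sample]
    return [fmean(rng.choices(shifted, k=len(shifted))) for _ in range(reps)]

def two_tailed_p(rand_stats, observed, null_value):
    """Double the proportion at least as extreme in the observed tail."""
    if observed <= null_value:
        tail = sum(s <= observed for s in rand_stats)
    else:
        tail = sum(s >= observed for s in rand_stats)
    return min(1.0, 2 * tail / len(rand_stats))

rng = random.Random(158)
# Hypothetical body temperatures in degrees F (NOT the JSE data set)
temps = [round(rng.gauss(98.26, 0.77), 1) for _ in range(50)]
observed = fmean(temps)

rand_means = randomization_means(temps, 98.6, 1000, rng)
p_value = two_tailed_p(rand_means, observed, 98.6)
print(f"sample mean = {observed:.2f}, p-value = {p_value:.3f}")
```

If exactly one of 1000 randomization means is as extreme as the observed mean, the function returns 2 × (1/1000) = 0.002, matching the arithmetic on the slide.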

Connecting CI’s and Tests Randomization body temp means when μ=98.6 Bootstrap body temp means from the original sample Fathom Demo

Fathom Demo: Test & CI Sample mean is in the “rejection region” Null mean is outside the confidence interval

“... despite broad acceptance and rapid growth in enrollments, the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.” -- Professor George Cobb, 2007

Materials for Teaching Bootstrap/Randomization Methods?