Give your data the boot: What is bootstrapping? and Why does it matter? Patti Frazer Lock and Robin H. Lock St. Lawrence University MAA Seaway Section.


Give your data the boot: What is bootstrapping? and Why does it matter? Patti Frazer Lock and Robin H. Lock St. Lawrence University MAA Seaway Section Meeting Plattsburgh, October 2010

Bootstrap confidence intervals and randomization hypothesis tests provide an alternate way to DO and to TEACH statistical inference.

Why bootstrap intervals and randomization tests?

Top Five Reasons for using simulation-based inference

5. Maintain student interest by foreshadowing inference from day 1 and getting to the key ideas of inference very early in the course.
When do current texts first discuss intervals and tests?

Confidence Interval    Significance Test
pg. 359                pg. 373
pg. 329                pg. 400
pg. 486                pg. 511
pg. 319                pg. 365

4. Develop students’ intuitive understanding of the key ideas of statistical inference.
Current model in intro stats: descriptive stats, then sampling and design, then probability distributions, then statistical inference formulas.
The underlying concepts behind intervals and tests are hard. Is this the best way to build understanding?

3. Help students understand the global picture for intervals and tests, rather than memorize a list of formulas. We’d like students to see the general pattern rather than a string of (what can appear to them to be) unrelated formulas.

2. Flexibility!!!
- Few underlying assumptions
- Works for any parameter
- Same methods apply to many situations

1. It’s the way of the past and the future. "Actually, the statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method." -- Sir R. A. Fisher, 1936

… and the future.
“... despite broad acceptance and rapid growth in enrollments, the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.” -- Professor George Cobb, 2007

Top Five Reasons to use simulation-based inference:
5. Maintain interest by getting to inference early.
4. Develop understanding of the key ideas.
3. Help students understand the global picture.
2. Flexibility.
1. It’s the way of the past and the future.

What is a bootstrap? and How does it give an interval?

Example: Atlanta Commutes
Data: The American Housing Survey (AHS) collected data from Atlanta in …
What’s the mean commute time for workers in metropolitan Atlanta?

Sample of n = 500 Atlanta Commutes
Where is the “true” μ?

“Bootstrap” Samples
Key idea: Sample with replacement from the original sample using the same n. This treats the “population” as many, many copies of the original sample.

Creating a Bootstrap Distribution (using Fathom)
1. Start with the sample in a collection.
2. Define the statistic of interest (as a measure).
3. Create a sample with replacement (same n).
4. Collect the measures for lots of samples.
5. Analyze the distribution of collected measures.
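The five steps above can be sketched outside Fathom as well. The Python below is a minimal illustration, not the presenters' implementation: the function name `bootstrap_distribution`, the seed, and the small `commutes` list are all assumptions made for the example (the values are made up and are not the AHS data).

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_boot=1000, seed=0):
    """Return n_boot bootstrap statistics for `sample`.

    Each statistic is computed on a resample drawn with replacement
    from the original sample, using the same n (steps 3 and 4).
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]


# Step 1: the sample. Hypothetical commute times in minutes (made-up values).
commutes = [10, 15, 20, 20, 25, 30, 30, 35, 45, 60, 75, 90]

# Step 2: the statistic of interest is the mean.
boot_means = bootstrap_distribution(commutes, statistics.mean)

# Step 5: summarize the distribution of the collected statistics.
print("original mean:", statistics.mean(commutes))
print("bootstrap means range from", min(boot_means), "to", max(boot_means))
```

Passing the statistic as a function keeps the resampling loop unchanged when the parameter of interest changes.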

Bootstrap Distribution of 1000 Atlanta Commute Means

Using the Bootstrap Distribution to Get a Confidence Interval – Version #1
The standard deviation of the bootstrap statistics estimates the standard error (SE) of the sample statistic.
Quick interval estimate: original statistic ± 2·SE
For the mean Atlanta commute time: …
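As a rough sketch of Version #1, the code below estimates the standard error as the standard deviation of the bootstrap statistics and then forms an interval, assuming the usual rough-95% rule of statistic ± 2·SE. The function name `se_interval` and the sample values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def se_interval(sample, statistic, n_boot=1000, seed=0):
    """Return (low, high, se) using statistic +/- 2*SE.

    SE is estimated by the standard deviation of n_boot bootstrap
    statistics. The multiplier 2 is the assumed rough-95% rule.
    """
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]
    se = statistics.stdev(boot_stats)
    center = statistic(sample)
    return center - 2 * se, center + 2 * se, se


# Hypothetical commute times in minutes (made-up values, not the AHS data).
commutes = [10, 15, 20, 20, 25, 30, 30, 35, 45, 60, 75, 90]
low, high, se = se_interval(commutes, statistics.mean)
print(f"SE is about {se:.2f}; interval is ({low:.2f}, {high:.2f})")
```

This version leans on the bootstrap distribution being roughly symmetric and bell-shaped, which is worth checking with a plot before trusting the interval.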

Using the Bootstrap Distribution to Get a Confidence Interval – Version #2
Keep 95% in the middle; chop 2.5% in each tail.
For a 95% CI, find the 2.5%-tile and 97.5%-tile in the bootstrap distribution.
95% CI = (27.33, 31.00)
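The "chop the tails" idea can be sketched by sorting the bootstrap statistics and dropping the same number of values from each end. This is a minimal illustration under assumptions: `percentile_interval` is a hypothetical helper, the data are made up, and the index rule used here is only one of several common percentile conventions.

```python
import random
import statistics


def percentile_interval(boot_stats, level=0.95):
    """Keep the middle `level` share of the bootstrap statistics.

    Chops round((1 - level) / 2 * N) values from each tail of the
    sorted list and returns the smallest and largest values kept.
    """
    ordered = sorted(boot_stats)
    chop = round((1 - level) / 2 * len(ordered))
    return ordered[chop], ordered[len(ordered) - chop - 1]


# Hypothetical commute times in minutes (made-up values, not the AHS data).
commutes = [10, 15, 20, 20, 25, 30, 30, 35, 45, 60, 75, 90]
rng = random.Random(0)
boot_means = [
    statistics.mean(rng.choices(commutes, k=len(commutes))) for _ in range(1000)
]

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI:", percentile_interval(boot_means, level))
```

With 1000 bootstrap statistics, the 95% interval drops 25 values from each tail. Changing `level` changes only how much is chopped, so higher confidence always gives an interval at least as wide.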

90% CI for Mean Atlanta Commute
Keep 90% in the middle; chop 5% in each tail.
For a 90% CI, find the 5%-tile and 95%-tile in the bootstrap distribution.
90% CI = (27.52, 30.68)

99% CI for Mean Atlanta Commute
Keep 99% in the middle; chop 0.5% in each tail.
For a 99% CI, find the 0.5%-tile and 99.5%-tile in the bootstrap distribution.
99% CI = (27.02, 31.82)

Other Parameters? Find a 95% confidence interval for the standard deviation, σ, of Atlanta commute times. Original sample: s=20.72
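To illustrate that the same procedure carries over to another parameter, the sketch below swaps the mean for the sample standard deviation and keeps everything else the same. The data are made-up values, not the Atlanta sample, and the seed and the fixed cut points (25 values chopped from each tail of 1000) are assumptions made for the example.

```python
import random
import statistics

# Hypothetical commute times in minutes (made-up values, not the AHS data).
commutes = [10, 15, 20, 20, 25, 30, 30, 35, 45, 60, 75, 90]

rng = random.Random(1)
n = len(commutes)

# Same resampling as for the mean; only the statistic changes.
boot_sds = sorted(
    statistics.stdev(rng.choices(commutes, k=n)) for _ in range(1000)
)

# Middle 95%: chop 2.5% (25 of the 1000 values) from each tail.
ci = (boot_sds[25], boot_sds[974])
print("s =", round(statistics.stdev(commutes), 2))
print("95% CI for sigma:", tuple(round(v, 2) for v in ci))
```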

Other Parameters? Find a 98% confidence interval for the correlation between time and distance of Atlanta commutes. Original sample: r = 0.807. 98% CI = (0.71, 0.87)
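For a correlation, the cases to resample are (distance, time) pairs, so each commute's two values stay together. The sketch below shows one way to do this. The helper `pearson_r`, the seed, and the `pairs` data are hypothetical and made up for illustration. Resamples in which either variable is constant are skipped, because r is undefined for them.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (distance in miles, time in minutes) pairs; made-up values.
pairs = [(2, 10), (5, 15), (6, 20), (8, 20), (10, 25), (12, 35),
         (15, 30), (18, 40), (20, 45), (25, 60), (30, 50), (40, 75)]

rng = random.Random(2)
boot_r = []
while len(boot_r) < 1000:
    # Resample whole pairs with replacement, same n.
    xs, ys = zip(*rng.choices(pairs, k=len(pairs)))
    if len(set(xs)) > 1 and len(set(ys)) > 1:  # r needs variation in both
        boot_r.append(pearson_r(xs, ys))

boot_r.sort()
# Middle 98%: chop 1% (10 of the 1000 values) from each tail.
ci = (boot_r[10], boot_r[989])
print("98% CI for the correlation:", tuple(round(v, 3) for v in ci))
```

Resampling distances and times separately would break the pairing and push the bootstrap correlations toward zero, which is why the pairs are drawn as units.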

Questions? For more info: Patti Frazer Lock and Robin H. Lock