Introducing Inference with Bootstrap and Randomization Procedures
Dennis Lock
Statistics Education Meeting, October 30, 2012


Statistics: Unlocking The Power of Data
An introductory statistics book written with my family:
– Robin H. Lock (St. Lawrence)
– Patti F. Lock (St. Lawrence)
– Kari Lock Morgan (Harvard/Duke)
– Eric F. Lock (UNC/Duke)
Introduces inference through simulation techniques.
Release date: one week from today!

Simulation Techniques
– Randomization hypothesis tests (sometimes called permutation tests)
– Bootstrap confidence intervals

Traditional Methods
Hypothesis Test:
1. Determine the Null and Alternative Hypothesis
2. Use a formula to calculate a test statistic
3. Compare to “some” distribution, assuming the Null Hypothesis is true
4. Use a Normal table or computer software to find a p-value

Traditional Methods
Plugging numbers into formulas and relying on theory from mathematical statistics does little for conceptual understanding.
With a variety of formulae for each situation, students get mired in the details, losing the big picture.
– This is especially apparent with p-values!

Simulation Approach
Hypothesis Test:
1. Determine the Null and Alternative Hypothesis
2. Simulate randomization samples, assuming the Null Hypothesis is true
3. Calculate the statistic of interest for each simulated randomization
4. Find the proportion of simulated statistics as extreme or more extreme than the observed statistic

Simulation Approach: Example
Treating cocaine addiction [1]
– 48 cocaine addicts seeking treatment
– 24 assigned randomly to each of two treatments: Desipramine and Lithium
– Two possible outcomes: Relapse or No Relapse
– Typical difference in proportions
[1] Gawin, F., et al., “Desipramine Facilitation of Initial Cocaine Abstinence,” Archives of General Psychiatry, 1989; 46(2): 117–121.

Simulation Approach: Example
              Relapse   No Relapse
Desipramine      10         14
Lithium          18          6

Simulation Approach: Example
2. Simulate randomization samples, assuming the Null Hypothesis is true
Key Idea: We wish to generate samples that are:
a) Consistent with the Null Hypothesis,
b) Based on the sample data, and
c) Consistent with the way the data was collected
– If the null hypothesis is true, then the treatment has no effect on the response. So we take our 28 relapse and 20 non-relapse outcomes and randomly assign them to the two treatment groups.
– Important point: This matches how the original data was collected!
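A minimal sketch of one such random reallocation, written in Python purely for illustration (the talk itself points to StatKey; the function name `one_randomization` and the seed are assumptions, not part of the original slides). It pools the 28 relapse and 20 non-relapse outcomes, shuffles them, and deals 24 to each treatment group:

```python
import random

# Pooled outcomes from the observed table: 28 relapses, 20 non-relapses.
outcomes = ["relapse"] * 28 + ["no relapse"] * 20

def one_randomization(outcomes, group_size, rng):
    """Shuffle the pooled outcomes and deal them into two treatment groups.

    Hypothetical helper for illustration: under the null hypothesis the
    treatment label has no effect, so any reallocation is equally likely.
    """
    shuffled = outcomes[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)
    return shuffled[:group_size], shuffled[group_size:]

rng = random.Random(0)  # arbitrary seed, only so the sketch is repeatable
desipramine, lithium = one_randomization(outcomes, 24, rng)

# Statistic of interest: difference in relapse proportions.
diff = desipramine.count("relapse") / 24 - lithium.count("relapse") / 24
print(desipramine.count("relapse"), lithium.count("relapse"), round(diff, 3))
```

Each run with a different seed gives another simulated table of the kind shown on the next slide.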

Simulation Approach: Example
              Relapse   No Relapse
Desipramine      15          9
Lithium          14         10

Simulation Approach: Example
4. Find the proportion of simulated statistics as extreme or more extreme than the observed statistic
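The whole procedure for this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's implementation: the function name, the number of repetitions, the seed, and the choice of a one-sided test (counting differences at least as far below zero as the observed one) are all ours.

```python
import random

def randomization_p_value(relapse_a, n_a, relapse_b, n_b, reps=5000, seed=1):
    """One-sided randomization p-value for a difference in proportions (A - B).

    Assumption for this sketch: "as extreme or more extreme" means a simulated
    difference less than or equal to the observed difference.
    """
    rng = random.Random(seed)
    observed = relapse_a / n_a - relapse_b / n_b
    total_relapse = relapse_a + relapse_b
    # 1 = relapse, 0 = no relapse, pooled across both groups.
    pool = [1] * total_relapse + [0] * (n_a + n_b - total_relapse)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pool)                      # random reallocation under the null
        sim_a = sum(pool[:n_a])                # relapses dealt to group A
        sim_b = total_relapse - sim_a          # the rest go to group B
        simulated = sim_a / n_a - sim_b / n_b
        if simulated <= observed + 1e-12:      # small tolerance for float ties
            extreme += 1
    return observed, extreme / reps

# Observed data: Desipramine 10 of 24 relapsed, Lithium 18 of 24 relapsed.
observed, p = randomization_p_value(10, 24, 18, 24)
print(f"observed difference = {observed:.3f}, estimated p-value = {p:.4f}")
```

The estimated p-value varies slightly with the seed and the number of repetitions, which is itself a useful point to show students.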

Randomization Approach
– Intrinsically connected to concepts
– Same procedure applies to all statistics
– No conditions to check

Simulation and Traditional
– Simulation methods are good for motivating conceptual understanding of inference
– However, familiarity with traditional methods (t-test) is still expected after intro stat
– Use simulation methods to introduce inference, and then teach the traditional methods as “short-cut formulas”

Reworked Stat 101
– Bootstrap confidence intervals
– Normal distributions
– Data production (samples/experiments)
– Descriptive Statistics – one and two samples
– Sampling distributions (mean/proportion)
– Confidence intervals (means/proportions)
– Hypothesis tests (means/proportions)
– Randomization-based hypothesis tests

Inference Introduced
When do you get to inference?
– Traditional: towards the end of the course
  Still haven’t gotten to inference in 104, just finished writing the second exam
  Agresti and Franklin p-value introduced? Page 404!
– Simulation: Early!
  Students don’t need to know probability or the normal distribution before inference
  Chapter 3: Confidence Intervals!
  Lock5 p-value introduced? Page 236!

Not a new idea!
"Actually, the statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method."
– Sir R. A. Fisher on permutation methods

Why don’t we teach this way?
We couldn’t!
– Only recently have we had the computing power to make this process realistic.
– Change is slow…

Why don’t we teach this way?
The vast majority of introductory statistics students are going into fields other than statistics.
– Traditional methods are how members of those fields do statistics, so students are expected to know them!
– Unfortunately, this results in teaching statistics so that students can merely perform these tests: as long as they can compute a t-test, we succeeded!

Technological Advances
"Automate calculation and graphics as much as possible."
– David S. Moore, 1992
Our text follows this idea:
– Formulas are given for completeness, but very briefly
– Focuses on interpretation, not calculation
– Saves time!

Discussion of Sampling Distribution
“They get the answer right but do not understand.”
Following sampling distributions with bootstrap confidence intervals can help in this situation.
– Bootstrap distribution looks very similar to a sampling distribution!

Bootstrap Distribution
We assume the sample is representative of the population, so we can approximate the population as many copies of the original sample.
– We take samples of size n from this mock population.
– This is done by:
1. Sampling n observations with replacement from the original sample.
2. Computing the statistic of interest (the bootstrap statistic).
3. Repeating many times; the distribution of these statistics is a bootstrap distribution.
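The three steps above can be sketched in Python as follows. This is a sketch for illustration, not the presenter's code: the function name `bootstrap_distribution`, the repetition count, the seed, and the ten data values are hypothetical.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, reps=2000, seed=2):
    """Return a list of bootstrap statistics for the given sample."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(reps):
        # Step 1: draw n observations WITH replacement from the original sample.
        resample = rng.choices(sample, k=n)
        # Step 2: compute the statistic of interest on the resample.
        boot_stats.append(statistic(resample))
    # Step 3: the collection of these statistics is the bootstrap distribution.
    return boot_stats

# Hypothetical sample, made up for this sketch only.
sample = [23, 30, 18, 25, 41, 27, 33, 22, 29, 36]
boot_means = bootstrap_distribution(sample, statistics.mean)
print(len(boot_means), round(statistics.mean(boot_means), 2))
```

Because `statistic` is passed in as a function, the same sketch works for a median, a proportion, or any other statistic, which echoes the earlier point that the same procedure applies to all statistics.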

Using the Bootstrap Distribution
Teaching uses:
– Simply observing the distribution (symmetric and bell shaped, etc.)
– Using it to find a standard error for the statistic.
  Empirical rule interval
  These look like intervals they will see later
– Percentiles!
  Constructing confidence intervals with percentiles
  These confidence intervals are very intuitive, rather than looking up values from a table!
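Both kinds of interval can be read directly off a bootstrap distribution. The Python sketch below shows a rough 95% interval of each kind for a mean; the helper name, seed, data values, and the use of "estimate ± 2 standard errors" as the empirical rule interval are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_means(sample, reps=2000, seed=3):
    """Bootstrap distribution of the sample mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(reps)]

# Hypothetical sample, made up for this sketch only.
sample = [23, 30, 18, 25, 41, 27, 33, 22, 29, 36]
boot = sorted(bootstrap_means(sample))
estimate = statistics.mean(sample)

# Standard error: the standard deviation of the bootstrap statistics.
se = statistics.stdev(boot)

# Empirical rule interval: estimate plus or minus 2 standard errors.
se_interval = (estimate - 2 * se, estimate + 2 * se)

# Percentile interval: cut off the lowest and highest 2.5% of bootstrap statistics.
lower_index = len(boot) * 25 // 1000        # 2.5th percentile position
upper_index = len(boot) * 975 // 1000 - 1   # 97.5th percentile position
percentile_interval = (boot[lower_index], boot[upper_index])

print("SE interval:        ", tuple(round(x, 2) for x in se_interval))
print("Percentile interval:", tuple(round(x, 2) for x in percentile_interval))
```

For a symmetric, bell-shaped bootstrap distribution the two intervals should be close to each other.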

Using the Bootstrap Distribution
Important note: We stick to using the bootstrap only on symmetric, bell-shaped distributions.
Bootstrap CIs can be used on other distributions, but this is beyond the scope of an intro stat course:
– Bias-corrected and accelerated intervals
– “Reverse” percentile intervals
– Many others

“... the consensus curriculum is still an unwitting prisoner of history. What we teach is largely the technical machinery of numerical approximations based on the normal distribution and its many subsidiary cogs. This machinery was once necessary, because the conceptually simpler alternative based on permutations was computationally beyond our reach. Before computers statisticians had no choice. These days we have no excuse. Randomization-based inference makes a direct connection between data production and the logic of inference that deserves to be at the core of every introductory course.” – Professor George W. Cobb, from: “The Introductory Statistics Course: A Ptolemaic Curriculum”, George Cobb Paper

How extreme are these changes?
Not very!
– The students come away with the same information they have now… plus, hopefully, much more understanding!
– Simulation methods make up only 6 sections out of about 50!

Technology Applets
Having technology available to perform bootstrap and randomization procedures is a necessity!
– This is possible in all of the major stat packages, and becoming easier in most of them (although still not ideal).
– Enter StatKey!

StatKey!
StatKey is a series of applets designed for the book, but freely available to the public.
– I’ve actually been using StatKey this semester to help explain sampling distributions in class.

USCOTS 2011
United States Conference on Teaching Statistics
Theme: “The next BIG thing” in statistics education
– All attendees were polled, winner… Using randomization methods in introductory statistics!