Bootstraps and Scrambles: Letting Data Speak for Themselves Robin H. Lock Burry Professor of Statistics St. Lawrence University Science.

Presentation transcript:

Bootstraps and Scrambles: Letting Data Speak for Themselves Robin H. Lock Burry Professor of Statistics St. Lawrence University Science Today SUNY Oswego, March 31, 2010

Bootstrap CIs & Randomization Tests (1) What are they? (2) Why are they being used more? (3) Can these methods be used to introduce students to key ideas of statistical inference?

Example #1: Perch Weights. Suppose that we have collected a sample of 56 perch from a lake in Finland. Estimate and find 95% confidence bounds for the mean weight of perch in the lake. From the sample: n = 56, $\bar{x} = 382.2$ g, s = 347.6 g

Classical CI for a Mean (μ): “Assume” the population is normal; then a 95% CI for μ is $\bar{x} \pm t^{*} \, s/\sqrt{n}$. For the perch sample: (289.1, 475.3)
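
As a worked check of the perch interval, here is the arithmetic under the usual one-sample t interval with 55 degrees of freedom. The critical value t* ≈ 2.004 is an assumption taken from a t table; it is not stated in the slide text.

```latex
\bar{x} \pm t^{*}\frac{s}{\sqrt{n}}
  = 382.2 \pm 2.004 \cdot \frac{347.6}{\sqrt{56}}
  \approx 382.2 \pm 93.1
  = (289.1,\ 475.3)
```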

Possible Pitfalls: What if the underlying population is NOT normal? What if the sample size is small? What if you have a different sample statistic? What if the Central Limit Theorem doesn’t apply? (or you’ve never heard of it!)

Bootstrap Basic idea: Simulate the sampling distribution of any statistic (like the mean) by repeatedly sampling from the original data. Bootstrap distribution of perch means: Sample 56 values (with replacement) from the original sample. Compute the mean for each bootstrap sample. Repeat MANY times.
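
The resampling steps can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the talk: the function name `bootstrap_means` and the `weights` list are hypothetical, and the weights are made-up values, not the actual perch data.

```python
import random
import statistics

def bootstrap_means(sample, reps=1000, seed=None):
    """Return `reps` means, each computed from a resample of
    len(sample) values drawn WITH replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(reps):
        resample = rng.choices(sample, k=n)  # sampling with replacement
        means.append(statistics.mean(resample))
    return means

# Hypothetical weights in grams (NOT the real perch sample).
weights = [5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0,
           110.0, 115.0, 125.0, 130.0, 120.0, 300.0, 700.0, 900.0, 1000.0, 1100.0]

boot = bootstrap_means(weights, reps=1000, seed=1)
print(len(boot), round(statistics.mean(boot), 1))
```

The list `boot` plays the role of the bootstrap distribution: its spread approximates how much the sample mean would vary from sample to sample.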

Original Sample (56 fish)

Bootstrap “population” Sample and compute means from this “population”

Bootstrap Distribution of 1000 Perch Means

CI from Bootstrap Distribution Method #1: Use the bootstrap std. dev. For 1000 bootstrap perch means: $s_{\text{boot}} = 45.8$
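
A worked version of Method #1 for the perch sample might look as follows. The multiplier 2 (roughly two standard deviations for 95% coverage) is an assumption, since the transcript does not preserve the formula from the slide.

```latex
\bar{x} \pm 2\, s_{\text{boot}} = 382.2 \pm 2(45.8) = 382.2 \pm 91.6 = (290.6,\ 473.8)
```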

CI from Bootstrap Distribution Method #2: Use bootstrap quantiles. Cut off 2.5% in each tail of the bootstrap distribution; the middle 95% gives a 95% CI for μ
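
A quantile-based interval can be sketched as below. The helper `percentile_interval` and the `weights` data are hypothetical, and the cutoffs use simple order statistics; statistical packages may interpolate between values, so their endpoints can differ slightly.

```python
import random
import statistics

def percentile_interval(boot_stats, level=0.95):
    """Return the central `level` share of the bootstrap statistics,
    using simple order-statistic cutoffs (no interpolation)."""
    ordered = sorted(boot_stats)
    b = len(ordered)
    tail = (1 - level) / 2
    lo_index = round(tail * b)            # index 25 when b = 1000
    hi_index = round((1 - tail) * b) - 1  # index 974 when b = 1000
    return ordered[lo_index], ordered[hi_index]

# Hypothetical weights in grams (NOT the real perch sample).
weights = [5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0,
           110.0, 115.0, 125.0, 130.0, 120.0, 300.0, 700.0, 900.0, 1000.0, 1100.0]

rng = random.Random(7)
boot_means = [statistics.mean(rng.choices(weights, k=len(weights)))
              for _ in range(1000)]

low, high = percentile_interval(boot_means)
print(round(low, 1), round(high, 1))
```

Unlike Method #1, this interval need not be symmetric about the sample mean, which is useful when the bootstrap distribution is skewed.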

Example #2: Friendly Observers. Experiment: Subjects were tested for performance on a video game. Conditions: Group A: an observer shares the prize; Group B: a neutral observer. Response (categorical): Beat / Fail to Beat a score threshold. Hypothesis: Players with an interested observer (Group A) will tend to perform less ably. Butler & Baumeister (1998)

A Statistical Experiment: Start with 24 subjects. Divide at random into two groups (Group A: Share, Group B: Neutral). Record the data (Beat or No Beat).

Friendly Observer Results. Group A (share prize): 3 Beat Threshold, 9 Failed to Beat Threshold. Group B (prize alone): 8 Beat Threshold, 4 Failed to Beat Threshold. Is this difference “statistically significant”?

Friendly Observer - Simulation
1. Start with a pack of 24 cards: 11 Black (Beat) and 13 Red (Fail to Beat).
2. Shuffle the cards and deal 12 at random to form Group A.
3. Count the number of Black (Beat) cards in Group A.
4. Repeat many times to see how often a random assignment gives a count as small as the experimental count (3) to Group A.
Automate this.
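
One way to automate the card shuffling is sketched below in Python. The function name `simulate_group_a_beats` is hypothetical, and this is an illustration of the procedure, not the Fathom or applet code used in the talk.

```python
import random

def simulate_group_a_beats(reps=10000, seed=None):
    """Shuffle 11 'Beat' and 13 'Fail' cards, deal 12 to Group A,
    and record how many 'Beat' cards Group A receives each time."""
    rng = random.Random(seed)
    deck = ["Beat"] * 11 + ["Fail"] * 13
    counts = []
    for _ in range(reps):
        rng.shuffle(deck)
        group_a = deck[:12]  # the first 12 cards form Group A
        counts.append(group_a.count("Beat"))
    return counts

counts = simulate_group_a_beats(reps=10000, seed=2010)

# Share of random assignments with 3 or fewer Beats in Group A.
p_value = sum(c <= 3 for c in counts) / len(counts)
print(p_value)
```

The printed proportion estimates how often chance alone would give Group A a count as small as the one observed.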

Friendly Observer – Fathom Computer Simulation: 48/1000 simulated assignments gave Group A a Beat count of 3 or fewer

Automate: Friendly Observers Applet Allan Rossman & Beth Chance

Observer’s Applet

Fisher’s Exact Test: P(Group A Beat ≤ 3)
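
The exact probability can be computed directly from the hypergeometric distribution, which is what the card model implies (24 cards, 11 Black, 12 dealt to Group A). The function name `prob_at_most` is hypothetical; the sketch uses only the counts given in the simulation slide.

```python
from math import comb

def prob_at_most(k_max, total=24, beat=11, group_size=12):
    """P(Group A receives at most k_max 'Beat' cards) when group_size
    cards are dealt at random from `total` cards containing `beat` Beats."""
    denom = comb(total, group_size)
    ways = sum(comb(beat, k) * comb(total - beat, group_size - k)
               for k in range(k_max + 1))
    return ways / denom

p = prob_at_most(3)
print(round(p, 4))  # 0.0498
```

This exact value sits close to the simulated proportion of 48/1000, as it should when the simulation is run enough times.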

Example #3: Lake Ontario Trout. X = fish age (yrs.), Y = % dry mass of eggs, n = 21 fish. Is there a significant negative association between age and % dry mass of eggs? r = -0.45. $H_0: \rho = 0$ vs. $H_a: \rho < 0$

Randomization Test for Correlation: Randomize the PctDM values so that each is equally likely to be assigned to any of the ages (consistent with ρ = 0). Compute the correlation for the randomized sample. Repeat MANY times. See how often the randomization correlations are as extreme as (at or below) the originally observed r = -0.45.
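
A scramble test for a negative correlation could be sketched as follows. The names `pearson_r` and `randomization_p_lower` are hypothetical, and the `ages` and `pct_dm` values are invented for illustration; they are not the Lake Ontario trout data.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def randomization_p_lower(xs, ys, reps=1000, seed=None):
    """Scramble ys against xs and return (observed r, share of
    scrambled correlations at or below the observed r)."""
    rng = random.Random(seed)
    r_obs = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if pearson_r(xs, shuffled) <= r_obs:
            hits += 1
    return r_obs, hits / reps

# Hypothetical data (NOT the trout sample).
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pct_dm = [40, 38, 39, 35, 36, 33, 31, 32, 29, 27]

r_obs, p_value = randomization_p_lower(ages, pct_dm, reps=2000, seed=3)
print(round(r_obs, 3), p_value)
```

Counting only the lower tail matches the one-sided alternative ρ < 0; a two-sided test would compare absolute values instead.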

Randomization Distribution of Sample Correlations when $H_0: \rho = 0$. 26/1000 randomization correlations fall at or below the observed r = -0.45

Confidence Interval for Correlation? Construct a bootstrap distribution of correlations for samples of n=20 fish drawn with replacement from the original sample.
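
For a correlation, the bootstrap resamples whole fish, that is (age, % dry mass) pairs, so that each pair stays together. A minimal sketch under that assumption is shown below; the function names and the data are hypothetical, and the data are not the trout sample.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlations(xs, ys, reps=1000, seed=None):
    """Resample (x, y) pairs with replacement and return `reps` correlations."""
    rng = random.Random(seed)
    n = len(xs)
    rs = []
    while len(rs) < reps:
        idx = rng.choices(range(n), k=n)  # indices keep each pair intact
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Skip degenerate resamples where r is undefined (no spread).
        if len(set(bx)) > 1 and len(set(by)) > 1:
            rs.append(pearson_r(bx, by))
    return rs

# Hypothetical data (NOT the trout sample).
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pct_dm = [40, 38, 39, 35, 36, 33, 31, 32, 29, 27]

boot_r = sorted(bootstrap_correlations(ages, pct_dm, reps=1000, seed=5))
low, high = boot_r[25], boot_r[974]  # simple 2.5% and 97.5% cutoffs
print(round(low, 2), round(high, 2))
```

Because correlations are bounded by -1 and 1, the bootstrap distribution is often skewed, which is one reason a quantile-based interval is a natural choice here.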

Bootstrap Distribution of Sample Correlations: r = -0.74, r = -0.08

Bootstrap/Randomization Methods Require few (often no) assumptions/conditions on the underlying population distribution. Avoid needing a theoretical derivation of sampling distribution. Can be applied readily to lots of different statistics. Are more intuitively aligned with the logic of statistical inference.

Can these methods really be used to introduce students to the core ideas of statistical inference? Coming in 2012… Statistics: Unlocking the Power of Data by Lock, Lock, Lock, Lock and Lock