What Can We Do When Conditions Aren’t Met? Robin H. Lock, Burry Professor of Statistics St. Lawrence University BAPS at 2014 JSM Boston, August 2014.



Why Do We Have “Conditions”?

CI for a Mean: x̄ ± t*·s/√n. To use t*, the sample should be from a normal distribution (especially if n is small). But what if it’s a small sample that is clearly skewed, has outliers, …?

Example #1: Mean Mustang Price. Start with a random sample of 25 prices (in $1,000s) from the web. Task: Find a 95% confidence interval for the mean Mustang price. Problem: n < 30 and the data look right-skewed. Is a t-distribution appropriate?

Example #2: Std. Dev. of Mustang Prices. Given the sample of 25 Mustang prices … Task: Find a 90% CI for the standard deviation of Mustang prices. Problems: What’s the standard error (SE) for s? What’s the appropriate reference distribution?

Bootstrapping. Basic idea: Use simulated samples, based only on the original sample data, to approximate the sampling distribution and standard error of the statistic. “Let your data be your guide.” (Brad Efron, Stanford University) Benefits: estimate the SE without using a known “formula”; remove conditions on the underlying distribution; also provides a way to introduce the key ideas!

Common Core H.S. Standards. Statistics: Making Inferences & Justifying Conclusions
HSS-IC.A.1 Understand statistics as a process for making inferences about population parameters based on a random sample from that population.
HSS-IC.A.2 Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation.
HSS-IC.B.3 Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.
HSS-IC.B.4 Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.
HSS-IC.B.5 Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.

Bootstrapping. To create a bootstrap distribution: (1) Assume the “population” is many, many copies of the original sample. (2) Simulate many “new” samples from the population by sampling with replacement from the original sample. (3) Compute the sample statistic for each bootstrap sample. “Let your data be your guide.” (Brad Efron, Stanford University)
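The three steps above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the talk's own tooling (the talk uses StatKey). The function name `bootstrap_distribution`, the seed, and the `prices` values are assumptions made up for the example; they are not the Mustang data.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, n_boot=5000, seed=0):
    """Return n_boot bootstrap statistics.

    Each one is `statistic` computed on a resample of the same size as
    `sample`, drawn WITH replacement from `sample`.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Hypothetical prices in $1,000s (illustrative only, NOT the talk's data)
prices = [3.0, 5.5, 7.0, 8.2, 9.9, 11.0, 12.5, 14.0,
          16.5, 18.0, 21.0, 25.0, 33.0, 45.0]

boot_means = bootstrap_distribution(prices, statistics.mean)
print(len(boot_means), "bootstrap means computed")
```

Sampling with replacement from the original sample is equivalent to sampling from the “many, many copies” population in step (1), which is why the copies never need to be built explicitly.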

Original Sample (n=6) Finding a Bootstrap Sample A simulated “population” to sample from Bootstrap Sample (sample with replacement from the original sample)

Diagram: Original Sample (Sample Statistic) → many Bootstrap Samples → a Bootstrap Statistic from each → Bootstrap Distribution.

Example #1: Mean Mustang Price. Start with a random sample of 25 prices (in $1,000s) from the web. Goal: Find an interval that is likely to contain the mean price for all Mustangs for sale on the web. Key concept: How much can we expect the sample means to vary just by random chance?

Original Sample Bootstrap Sample Repeat 1,000’s of times!

We need technology! StatKey: freely available web apps with no login required; runs in (almost) any browser (incl. smartphones/tablets); Google Chrome app available (no internet needed); standalone or supplement to existing technology.

Bootstrap Distribution for Mustang Price Means (StatKey demo: Three Distributions; One to Many Samples)

How do we get a CI from the bootstrap distribution? Method #1: Standard Error. Find the standard error (SE) as the standard deviation of the bootstrap statistics. Find an interval with Statistic ± 2·SE.

Standard Error

How do we get a CI from the bootstrap distribution? Method #1: Standard Error. Find the standard error (SE) as the standard deviation of the bootstrap statistics. Find an interval with Statistic ± 2·SE. Method #2: Percentile Interval. For a 95% interval, find the endpoints that cut off 2.5% of the bootstrap means from each tail, leaving 95% in the middle.
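Both interval methods can be sketched as follows. The helper names (`bootstrap_stats`, `se_interval`, `percentile_interval`), the seed, and the data are assumptions for illustration. The percentile endpoints use a simple order-statistic rule, which is one of several reasonable conventions and may differ slightly from what StatKey reports.

```python
import random
import statistics

def bootstrap_stats(sample, statistic, n_boot=5000, seed=1):
    """Bootstrap statistics from resamples drawn with replacement."""
    rng = random.Random(seed)
    return [statistic(rng.choices(sample, k=len(sample)))
            for _ in range(n_boot)]

def se_interval(sample, statistic, boot_stats):
    """Method #1: statistic +/- 2*SE, with SE = std. dev. of bootstrap stats."""
    se = statistics.stdev(boot_stats)
    center = statistic(sample)
    return center - 2 * se, center + 2 * se

def percentile_interval(boot_stats, level=0.95):
    """Method #2: chop (1 - level)/2 of the bootstrap stats from each tail."""
    ordered = sorted(boot_stats)
    n = len(ordered)
    tail = (1 - level) / 2
    lower = ordered[round(tail * n)]
    upper = ordered[round((1 - tail) * n) - 1]
    return lower, upper

# Hypothetical prices in $1,000s (illustrative only, NOT the talk's data)
prices = [3.0, 5.5, 7.0, 8.2, 9.9, 11.0, 12.5, 14.0,
          16.5, 18.0, 21.0, 25.0, 33.0, 45.0]

boot_means = bootstrap_stats(prices, statistics.mean)
se_ci = se_interval(prices, statistics.mean, boot_means)
pct_ci = percentile_interval(boot_means, level=0.95)
print("SE method:        ", se_ci)
print("Percentile method:", pct_ci)
```

With skewed data the two intervals need not agree exactly: the SE interval is symmetric about the sample statistic by construction, while the percentile interval follows the shape of the bootstrap distribution.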

95% Confidence Interval: Keep 95% in the middle; chop 2.5% in each tail. We are 95% sure that the mean price for Mustangs is between $11,762 and $20,386.

Bootstrap Confidence Intervals. Version 1 (Statistic ± 2 SE): Great preparation for moving to traditional methods. Version 2 (Percentiles): Great at building understanding of confidence level. Same process works for different parameters! Either method requires few prerequisites.

Example #2: Std. Dev. of Mustang Prices. Find a 90% confidence interval for the standard deviation of the prices of all Mustangs for sale at this website. (Summary table: n, mean, and std. dev. of Price, in $1,000s.) What changes? Record the sample standard deviation for each of the bootstrap samples.
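A sketch of “what changes” in code: only the recorded statistic (standard deviation instead of mean) and the confidence level (90% instead of 95%). The data, seed, and variable names are again hypothetical, and the endpoints use a simple order-statistic rule.

```python
import random
import statistics

# Hypothetical prices in $1,000s (illustrative only, NOT the talk's data)
prices = [3.0, 5.5, 7.0, 8.2, 9.9, 11.0, 12.5, 14.0,
          16.5, 18.0, 21.0, 25.0, 33.0, 45.0]

rng = random.Random(2)
n_boot = 5000

# Record the sample standard deviation of each bootstrap sample.
boot_sds = sorted(
    statistics.stdev(rng.choices(prices, k=len(prices)))
    for _ in range(n_boot)
)

# 90% percentile interval: chop 5% of the bootstrap SDs from each tail.
lower = boot_sds[round(0.05 * n_boot)]
upper = boot_sds[round(0.95 * n_boot) - 1]
print(f"90% CI for the std. dev.: ({lower:.2f}, {upper:.2f})")
```

No formula for the standard error of s, and no reference distribution, is needed here, which is the point of the example.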

90% CI for Std. Dev. of Mustang Prices We are 90% sure that the standard deviation of all Mustang prices at this website is between 7.61 and (thousand dollars).

What About Technology? Other possible options: Fathom, R, Minitab (macros), JMP, StatCrunch, others? R examples: with the boot package, xbar=function(x,i) mean(x[i]) followed by x=boot(Time,xbar,1000); with the mosaic package, x=do(1000)*sd(sample(Price,25,replace=TRUE))

Why does the bootstrap work?

Sampling Distribution: Population (µ). BUT, in practice we don’t see the “tree” or all of the “seeds”; we only have ONE seed.

Bootstrap Distribution: Bootstrap “Population” (µ). What can we do with just one seed? Grow a NEW tree!

Golden Rule of Bootstraps The bootstrap statistics are to the original statistic as the original statistic is to the population parameter.

What About Hypothesis Tests?

Randomization Approach: Create a randomization distribution by simulating many samples from the original data, assuming H0 is true, and calculating the sample statistic for each new sample. Estimate the p-value directly as the proportion of these randomization statistics that are as extreme as (or more extreme than) the original sample statistic.

Example #3: Beer & Mosquitoes. Volunteers [1] were randomly assigned to drink either a liter of beer or a liter of water. Mosquitoes were caught in nets as they approached each volunteer and counted. (Summary table: n and mean for the Beer and Water groups.) Does this provide convincing evidence that mosquitoes tend to be more attracted to beer drinkers, or could this difference be due just to random chance? [1] Lefèvre, T., et al., “Beer Consumption Increases Human Attractiveness to Malaria Mosquitoes,” PLoS ONE, 2010; 5(3): e9546.

Example #3: Beer & Mosquitoes. µ = mean number of attracted mosquitoes. H0: μ_B = μ_W vs. Ha: μ_B > μ_W (competing claims about the population means). Is this a “significant” difference? How do we measure “significance”? ...

KEY IDEA. P-value: The proportion of samples, when H0 is true, that would give results as extreme as (or more extreme than) the original sample. Say what????

Physical Simulation

Randomization Approach (plot: Number of Mosquitoes for the Beer and Water groups, original sample). To simulate samples under H0 (no difference): re-randomize the values into Beer & Water groups.


Randomization Approach (plot: Number of Mosquitoes for the Beer and Water groups). Repeat this process 1,000s of times to see how “unusual” the original difference is. (StatKey)
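The re-randomization step can be sketched as a shuffle of the pooled values followed by a split into two groups of the original sizes. The function name `randomization_pvalue`, the seed, and the mosquito counts below are hypothetical and chosen only to make the example run; they are not the study's data.

```python
import random
import statistics

def randomization_pvalue(group_a, group_b, n_rand=5000, seed=3):
    """One-tailed randomization test for Ha: mean(A) > mean(B).

    Under H0 (no difference) the group labels are arbitrary, so the
    pooled values are reshuffled into groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_rand):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if diff >= observed:
            as_extreme += 1
    return observed, as_extreme / n_rand

# Hypothetical mosquito counts (illustrative only, NOT the study's data)
beer = [26, 19, 23, 28, 24, 30, 22, 25]
water = [18, 21, 16, 20, 23, 17, 19, 22]

observed_diff, p_value = randomization_pvalue(beer, water)
print("observed difference:", observed_diff)
print("estimated p-value:  ", p_value)
```

Shuffling mirrors the physical simulation with cards: the response values stay fixed, and only their assignment to Beer or Water changes.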

p-value = proportion of samples, when H0 is true, that are as extreme as (or more extreme than) the original sample.

Example #4: Mean Body Temperature. Data: A sample of n = 50 body temperatures. Is the average body temperature really 98.6 °F? H0: μ = 98.6 vs. Ha: μ ≠ 98.6. Data from Allen Shoemaker, 1996 JSE data set article.

Key idea: For a randomization distribution we need to generate samples that are (a) consistent with the null hypothesis and (b) based on the sample data. How can we simulate samples of body temps that are consistent with H0: μ = 98.6? (StatKey)
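One common way to satisfy both (a) and (b) for a single mean is to shift every sample value by the same amount so the shifted sample has mean 98.6, then resample with replacement from the shifted values. The transcript does not spell out the method used, so treat this as an assumed approach; the function name, seed, and temperatures below are hypothetical.

```python
import random
import statistics

def shifted_randomization_pvalue(sample, null_mean, n_rand=5000, seed=4):
    """Two-tailed randomization test for H0: mu = null_mean.

    Shifting keeps the sample's shape and spread (based on the data)
    while forcing its mean to equal null_mean (consistent with H0).
    """
    rng = random.Random(seed)
    observed = statistics.mean(sample)
    shift = null_mean - observed
    shifted = [x + shift for x in sample]
    extreme = abs(observed - null_mean)
    count = 0
    for _ in range(n_rand):
        m = statistics.mean(rng.choices(shifted, k=len(sample)))
        if abs(m - null_mean) >= extreme:
            count += 1
    return count / n_rand

# Hypothetical body temperatures in °F (illustrative only, NOT the JSE data)
temps = [98.2, 97.9, 98.4, 98.6, 98.0, 98.3, 97.8, 98.5, 98.1, 98.7,
         98.2, 98.0, 98.4, 97.6, 98.3, 98.9, 98.1, 98.2, 97.9, 98.5]

p_value = shifted_randomization_pvalue(temps, null_mean=98.6)
print("estimated two-tail p-value:", p_value)
```

This sketch counts both tails directly. Doubling the count in one tail, as the slides do, is an alternative convention that gives similar results when the randomization distribution is roughly symmetric.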

Randomization Distribution: Looks pretty unusual… two-tail p-value ≈ 4/5000 × 2 = 0.0016

Bootstrap vs. Randomization Distributions
Bootstrap distribution: our best guess at the distribution of sample statistics; centered around the observed sample statistic; simulate samples by resampling from the original sample.
Randomization distribution: our best guess at the distribution of sample statistics, if H0 were true; centered around the null hypothesized value; simulate samples assuming H0 is true.
Key difference: a randomization distribution assumes H0 is true, while a bootstrap distribution does not.

Body Temperature - Bootstrap

Body Temperature-Randomization What’s the difference between these two distributions?

Body Temperature: Bootstrap Distribution vs. Randomization Distribution. H0: μ = 98.6 vs. Ha: μ ≠ 98.6

Body Temperature: Bootstrap Distribution vs. Randomization Distribution. H0: μ = 98.4 vs. Ha: μ ≠ 98.4

Materials for Teaching Bootstrap/Randomization Methods?