Limits to Statistical Theory: Bootstrap analysis. ESM 206, 11 April 2006.

Assumption of the t-test
The sample mean, once standardized by its estimated standard error, is a t-distributed random variable
– Guaranteed if the observations are normally distributed random variables or the sample size is very large
– In practice, OK if the observations are not too skewed and the sample size is reasonably large
This assumption also applies when using the standard formula for the 95% CI of the mean
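A minimal simulation sketch of this claim (not from the original slides; the sample size, mean, and SD below are arbitrary), checking that standardized sample means from normal data track a t(n-1) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sim = 10, 10_000
mu, sigma = 5.0, 2.0  # arbitrary "true" population parameters

# Standardize each simulated sample mean with its own sample SD
t_stats = np.empty(n_sim)
for i in range(n_sim):
    x = rng.normal(mu, sigma, size=n)
    t_stats[i] = (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(n))

# Compare simulated quantiles with the theoretical t(n-1) quantiles
print(np.quantile(t_stats, [0.025, 0.5, 0.975]))
print(stats.t.ppf([0.025, 0.5, 0.975], df=n - 1))
```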

Resampling for a confidence interval of the mean
IN AN IDEAL WORLD
– Take a sample
– Calculate the sample mean
– Take a new sample
– Calculate the new mean
– Repeat many times
– Look at the distribution of sample means
– The 95% CI ranges from the 2.5th percentile to the 97.5th percentile
IN THE REAL WORLD
– Find some way to simulate taking a sample
– Calculate the sample mean
– Repeat many times
– Look at the distribution of sample means
– The 95% CI ranges from the 2.5th percentile to the 97.5th percentile
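The "ideal world" column can be illustrated with a short sketch (hypothetical; it builds a simulated population precisely so that repeated sampling is actually possible):

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in population; in reality we never get to see this
population = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

n, n_rep = 30, 999
means = np.array([rng.choice(population, size=n, replace=False).mean()
                  for _ in range(n_rep)])

# Percentile interval for the sample mean under repeated sampling
lo, hi = np.quantile(means, [0.025, 0.975])
print(f"95% interval of sample means: [{lo:.3f}, {hi:.3f}]")
```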

Bootstrap resampling
PARAMETRIC BOOTSTRAP
– Assume the data are random variables from a particular distribution, e.g., log-normal
– Use the data to estimate the parameters of the distribution, e.g., mean and variance
– Use a random number generator to create a sample of the same size as the original, and calculate its sample mean
– Allows us to ask: what if the data were a random sample from the specified distribution with the specified parameters?
NONPARAMETRIC BOOTSTRAP
– Assume the underlying distribution from which the data come is unknown
– The best estimate of this distribution is the data themselves: the empirical distribution function
– Create a new dataset by sampling with replacement from the data, the same size as the original, and calculate its sample mean
WHICH IS BETTER?
– If the underlying distribution is correctly chosen, the parametric bootstrap has more precision
– If the underlying distribution is incorrectly chosen, the parametric bootstrap has more bias
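A sketch of both schemes, assuming made-up log-normal data in place of the TcCB sample used in the course:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical concentration data (stand-in for the real sample)
y = rng.lognormal(mean=0.2, sigma=1.0, size=40)
n_boot = 999

# Parametric bootstrap: fit a log-normal to the data, then simulate from it
mu_hat, sd_hat = np.log(y).mean(), np.log(y).std(ddof=1)
par_means = np.array([rng.lognormal(mu_hat, sd_hat, size=y.size).mean()
                      for _ in range(n_boot)])

# Nonparametric bootstrap: resample the data themselves with replacement
npar_means = np.array([rng.choice(y, size=y.size, replace=True).mean()
                       for _ in range(n_boot)])

print("parametric 95% CI:   ", np.quantile(par_means, [0.025, 0.975]))
print("nonparametric 95% CI:", np.quantile(npar_means, [0.025, 0.975]))
```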

TcCB in the cleanup site
Parametric bootstrap
– If Y is log-normal, it is specified in terms of the mean and standard deviation of X = log(Y)
– Mean =
– SD =
– Use "Monte Carlo Simulation" to generate 999 replicate simulated datasets from the log-normal distribution
– Calculate the mean of each replicate and sort the means
– The 25th value is the lower end of the 95% CI
– The 975th value is the upper end of the 95% CI
95% CI: [-0.678, 8.458]
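The sorting-and-indexing step can be written directly; the sketch below assumes a placeholder vector of 999 bootstrap means rather than the actual TcCB replicates:

```python
import numpy as np

# boot_means: 999 bootstrap replicate means from any bootstrap scheme
# (placeholder values here; in practice use the replicates you generated)
rng = np.random.default_rng(4)
boot_means = rng.normal(1.5, 0.4, size=999)

boot_means.sort()
lower = boot_means[24]    # 25th ordered value (index 24 with 0-based indexing)
upper = boot_means[974]   # 975th ordered value
print(f"95% percentile CI: [{lower:.3f}, {upper:.3f}]")
```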

Parametric bootstrap: results
95% CI: [0.917, 2.293]

Normal QQ plot
– Sort the data
– Index the values (i = 1, 2, …, n)
– Calculate q = i / (n + 1); this is the quantile
– Plot the quantiles against the data values; this is the empirical cumulative distribution function (CDF)
– Construct the CDF of the standard normal using the same quantiles
– Compare the distributions at the same quantiles
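A hand-rolled version of this recipe (placeholder data; scipy's stats.probplot would do the same job) might look like:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
y = rng.lognormal(mean=0.2, sigma=1.0, size=40)  # placeholder data

x = np.sort(y)                       # sort the data
i = np.arange(1, x.size + 1)         # index the values
q = i / (x.size + 1)                 # plotting-position quantiles
normal_q = stats.norm.ppf(q)         # standard-normal quantiles at the same q

plt.scatter(normal_q, x)             # QQ plot: ordered data vs normal quantiles
plt.xlabel("Standard normal quantiles")
plt.ylabel("Ordered data")
plt.title("Normal QQ plot")
plt.show()
```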

Nonparametric bootstrap: results
95% CI: [0.851, 9.248]

Bootstrap and hypothesis tests
One-sample t-test
– Calculate the bootstrap CI of the mean
– Does it overlap the test value?
Paired t-test
– Calculate the differences: D_i = x_i - y_i
– Find the bootstrap CI of the mean difference
– Does it overlap zero?
Two-sample t-test
– We want to create simulated data where H0 is true (same mean) but allow the variance and shape of the distribution to differ between populations
– Easiest with the nonparametric bootstrap: subtract the mean from each sample, so both samples have mean zero
– Resample these residuals, creating simulated group A from the residuals of group A and simulated group B from the residuals of group B
– Generate the distribution of t values
– P is the fraction of simulated t's that exceed the t calculated from the data
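A sketch of the two-sample version, with made-up samples standing in for the cleanup and reference data; it uses |t| for a two-sided P, whereas the slide describes the one-sided count:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Placeholder samples standing in for the cleanup and reference groups
a = rng.lognormal(0.4, 1.0, size=40)
b = rng.lognormal(0.0, 0.5, size=40)

t_obs, _ = stats.ttest_ind(a, b, equal_var=False)

# Center each group so H0 (equal means) is true, but keep each group's
# variance and shape
a0, b0 = a - a.mean(), b - b.mean()

n_boot = 999
t_boot = np.empty(n_boot)
for i in range(n_boot):
    a_star = rng.choice(a0, size=a0.size, replace=True)
    b_star = rng.choice(b0, size=b0.size, replace=True)
    t_boot[i], _ = stats.ttest_ind(a_star, b_star, equal_var=False)

# Two-sided P: fraction of bootstrap t values at least as extreme as observed
p = np.mean(np.abs(t_boot) >= abs(t_obs))
print(f"t = {t_obs:.2f}, bootstrap P = {p:.3f}")
```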

TcCB example
H0: cleanup mean = reference mean
t = 1.45
Bootstrapped 't' values do not follow a t distribution!
P = 0.02