
Introduction to the Bootstrap (and Permutation Tests)
Tim Hesterberg, Ph.D.
Association of General Clinical Research Center Statisticians
August 2004, Toronto
Copyright Tim Hesterberg

Outline of Talk
Why Resample?
Introduction to Bootstrapping
More examples, sampling methods
Two-sample Bootstrap
Two-sample Permutation Test
Other statistics
Other permutation tests

Why Resample?
Fewer assumptions: normality, equal variances
Greater accuracy (in practice)
Generality: same basic procedure for a wide range of statistics and sampling methods
Promote understanding: concrete analogies to theoretical concepts

Good Books
T. Hesterberg et al., Bootstrap Methods and Permutation Tests (W. H. Freeman, 2003)
B. Efron and R. Tibshirani, An Introduction to the Bootstrap (Chapman & Hall, 1993)
A. C. Davison and D. V. Hinkley, Bootstrap Methods and Their Application (Cambridge University Press, 1997)

Example – Verizon

                      Number of Observations    Average Repair Time
ILEC (Verizon)
CLEC (other carrier)

Is the difference statistically significant?

Example Data

Start Simple
We'll start simple – the single-sample mean
Later:
–other statistics
–two samples
–permutation tests

Bootstrap Procedure
Repeat 1000 times:
–Draw a sample of size n with replacement from the original data (a “bootstrap sample”, or “resample”)
–Calculate the sample mean for the resample
The 1000 bootstrap sample means comprise the bootstrap distribution.
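In Python, the procedure above can be sketched as follows (the talk itself used S-PLUS; the repair times here are hypothetical stand-ins for the ILEC data):

```python
import random
import statistics

def bootstrap_means(data, n_boot=1000, seed=36):
    """Draw n_boot resamples of size n with replacement; return their means."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]

# Hypothetical repair times standing in for the ILEC sample.
times = [1.2, 0.4, 3.1, 2.2, 0.9, 1.7, 5.0, 0.3, 2.8, 1.1]
boot_dist = bootstrap_means(times)

# Bootstrap SE = standard deviation of the bootstrap distribution.
boot_se = statistics.stdev(boot_dist)
```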

Bootstrap Distn for ILEC mean

Bootstrap Standard Error
Bootstrap standard error (SE) = standard deviation of the bootstrap distribution

> ILEC.boot.mean
Call:
bootstrap(data = ILEC, statistic = mean, seed = 36)

Number of Replications: 1000

Summary Statistics:
     Observed  Mean  Bias  SE
mean

Bootstrap Distn for CLEC mean

Take Another Look
Take another look at the previous two figures. Is the amount of non-normality/asymmetry there a cause for concern?
Note – we're looking at a sampling distribution, not the underlying distribution. This is after the CLT effect!

Idea Behind Bootstrapping
Plug-in principle:
–Underlying distribution is unknown
–Substitute your best guess

Ideal world

Bootstrap world

Fundamental Bootstrap Principle
Plug-in principle:
–Underlying distribution is unknown
–Substitute your best guess
Fundamental bootstrap principle:
–This substitution works
–Not always
–Bootstrap distribution is centered at the statistic, not the parameter

Secondary Principle
Implement the Fundamental Principle by Monte Carlo sampling. This is just an implementation detail!
–Exact: n^n samples
–Monte Carlo: typically 1000 samples, i.e. 1000 realizations from the theoretical bootstrap distribution; more for higher accuracy (e.g. 500,000)

Not Creating Data from Nothing
Some are uncomfortable with the bootstrap because they think it is creating data out of nothing. (The name doesn't help!)
Not creating data. No better parameter estimates. (Exception – bagging, boosting.)
We use the original data to estimate the SE or other aspects of the sampling distribution –
using sampling, rather than a formula.

Formulaic and Bootstrap SE

What to Substitute?
Plug-in principle:
–Underlying distribution is unknown
–Substitute your best guess
What to substitute?
–Empirical distribution – ordinary bootstrap
–Smoothed distribution – smoothed bootstrap
–Parametric distribution – parametric bootstrap
–A distribution satisfying assumptions, e.g. the null hypothesis

Another Example: Kyphosis
Variables: Kyphosis (present or absent), Age of child, Number of vertebrae in operation, Start of range of vertebrae
Logistic regression

Kyphosis – Logistic Regression

             Value  Std. Error  t value
(Intercept)
Age
Start
Number

Null Deviance: on 80 df
Residual Deviance: on 77 df

Kyphosis vs. Start

Kyphosis Example
Pseudo-code:
Repeat 1000 times {
    Draw sample with replacement from original rows
    Fit logistic regression
    Save coefficients
}
Use the bootstrap distribution
Live demo (kyphosis.ssc)
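A runnable Python version of this pseudo-code is sketched below. The data are simulated stand-ins for the kyphosis data set (81 rows, with the outcome more likely at low Start values, as in the talk), and a minimal one-predictor gradient-descent fit stands in for the S-PLUS logistic regression call:

```python
import math
import random

def logistic_slope(xs, ys, lr=0.05, steps=500):
    """Slope from a minimal one-predictor logistic regression, fit by
    gradient descent.  The predictor is centered for numerical stability;
    centering does not change the slope."""
    xm = sum(xs) / len(xs)
    xc = [x - xm for x in xs]
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xc, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b1

rng = random.Random(1)
# Hypothetical stand-in data: kyphosis more likely at low Start values.
start = [rng.randint(1, 18) for _ in range(81)]
kyph = [1 if rng.random() < 1 / (1 + math.exp(0.4 * (s - 6))) else 0 for s in start]

slopes = []
for _ in range(50):                                # 50 resamples keeps the demo quick
    idx = [rng.randrange(81) for _ in range(81)]   # resample whole rows
    slopes.append(logistic_slope([start[i] for i in idx], [kyph[i] for i in idx]))
```

The collected slopes form the bootstrap distribution of the Start coefficient.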

Bootstrap SE and Bias
Bootstrap SE (standard error) = standard deviation of the bootstrap distribution
Bootstrap bias = mean of the bootstrap distribution – original statistic

t Confidence Interval
Statistic ± t* SE(bootstrap)
A reasonable interval if the bootstrap distribution is approximately normal, with little bias. Compare to bootstrap percentiles.
Return to the Kyphosis example.
Caution: in the literature, “bootstrap t” means something else.

Percentiles to Check the Bootstrap t
If the bootstrap distribution is approximately normal and unbiased, then bootstrap t intervals and the corresponding percentiles should be similar. Compare these:
if similar, use either; else use a more accurate interval.
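The comparison can be sketched in Python on a hypothetical skewed sample; the t* constant below is the approximate 97.5% point of the t distribution with 39 degrees of freedom:

```python
import random
import statistics

rng = random.Random(12)
data = [rng.expovariate(1.0) for _ in range(40)]   # hypothetical skewed sample
n = len(data)

obs = statistics.fmean(data)
dist = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(1000))
se = statistics.stdev(dist)

t_star = 2.023                                  # approx. 97.5% point, t with 39 df
t_interval = (obs - t_star * se, obs + t_star * se)
pct_interval = (dist[24], dist[974])            # 2.5% and 97.5% bootstrap percentiles
```

If the two intervals roughly agree, the distribution is close enough to normal that either is reasonable; if not, a more accurate interval (BCa, tilting) is called for.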

More Accurate Intervals
BCa, tilting, others (the real bootstrap-t)
Percentile and “bootstrap-t”:
–first-order correct
–consistent, coverage error O(1/sqrt(n))
BCa and tilting:
–second-order correct
–coverage error O(1/n)

Different Sampling Procedures
Two-sample applications
Other sampling situations

Two-Sample Bootstrap Procedure
Given independent SRSs from two populations, repeat 1000 times:
–Draw a sample of size m with replacement from sample 1
–Draw a sample of size n from sample 2, independently
–Compute a statistic that compares the two groups, e.g. the difference in means
The 1000 bootstrap statistics comprise the bootstrap distribution.
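A Python sketch of the two-sample procedure, with hypothetical samples standing in for the ILEC and CLEC repair times:

```python
import random
import statistics

def two_sample_boot(x, y, n_boot=1000, seed=3):
    """Bootstrap the difference in means of two independent samples:
    resample each group separately, with replacement."""
    rng = random.Random(seed)
    m, n = len(x), len(y)
    return [statistics.fmean(rng.choices(x, k=m)) - statistics.fmean(rng.choices(y, k=n))
            for _ in range(n_boot)]

# Hypothetical repair-time-like samples.
ilec = [1.0, 0.5, 2.0, 1.5, 0.8, 1.2, 0.6, 3.0, 1.1, 0.9]
clec = [2.5, 4.0, 1.0, 6.0, 3.5, 0.5, 5.0, 2.0]
dist = two_sample_boot(ilec, clec)
```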

Example – Relative Risk

Blood Pressure    Cardiovascular Disease
High              55/3338 =
Low               21/2676 =

Estimated Relative Risk = 2.12

…bootstrap Relative Risk

Example: Verizon

…difference in means

…difference in trimmed means

…comparison

Difference in means:
     Observed  Mean  Bias  SE
mean

Difference in 25% trimmed means:
      Observed  Mean  Bias  SE
Param

Other Sampling Situations
Stratified sampling
–Resample within strata
Small samples or strata
–Correct for narrowness bias
Finite population
–Create a finite population, resample without replacement
Regression

Bootstrap SE Too Small
The usual SE for the mean is s/sqrt(n), where s^2 = sum((x_i - xbar)^2) / (n-1).
The bootstrap corresponds to using a divisor of n instead of n-1.
This bias factor applies to each sample, each stratum.
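This narrowness can be checked numerically: for the sample mean, the theoretical bootstrap SE is exactly the divisor-n standard deviation divided by sqrt(n). A Python check on simulated data:

```python
import math
import random
import statistics

rng = random.Random(5)
data = [rng.gauss(0, 1) for _ in range(15)]
n = len(data)

sd_n = statistics.pstdev(data)    # divisor n ("population"-style SD of the sample)
sd_n1 = statistics.stdev(data)    # divisor n - 1 (usual sample SD)
theory = sd_n / math.sqrt(n)      # theoretical bootstrap SE of the mean

# Monte Carlo bootstrap SE with many resamples approximates `theory`,
# which is smaller than the usual sd_n1 / sqrt(n).
boot_se = statistics.stdev(
    [statistics.fmean(rng.choices(data, k=n)) for _ in range(20000)]
)
```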

Remedies for Small SE
Multiply SE by sqrt(n/(n-1))
–Equal strata sizes only; no effect on CIs
Sample with reduced size, n-1
Bootknife sampling
–Omit a random observation
–Sample size n from the remaining n-1
Smoothed bootstrap
–Choose the smoothing parameter to match the variance
–Continuous data only

Smoothed Bootstrap
Kernel density estimate = nonparametric bootstrap + random noise

Finite Population
Sample of size n from a population of size N.
If N is a multiple of n:
–repeat each observation N/n times,
–bootstrap sample without replacement
If N is not a multiple of n:
–repeat each observation roughly the same number of times, rounding N/n up for some observations and down for others
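A Python sketch for the simple case where N is a multiple of n:

```python
import random
import statistics

def finite_population_boot(sample, N, stat, n_boot=1000, seed=9):
    """Finite-population bootstrap (sketch, for N a multiple of n):
    repeat each observation N/n times to build a stand-in population,
    then resample size n WITHOUT replacement."""
    n = len(sample)
    assert N % n == 0, "this sketch handles only N a multiple of n"
    population = list(sample) * (N // n)   # each observation repeated N/n times
    rng = random.Random(seed)
    return [stat(rng.sample(population, n)) for _ in range(n_boot)]

dist = finite_population_boot([3, 1, 4, 1, 5, 9, 2, 6], N=40, stat=statistics.fmean)
```

Sampling without replacement from the replicated population builds the finite-population correction into the bootstrap distribution.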

Resampling for Regression
Resample observations (random effects)
–Problem with factors: random amount of information
Resample residuals (fixed effects)
–Fit the model
–Resample residuals, with replacement
–Add to fitted values
–Problems with heteroskedasticity, lack of fit
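Resampling residuals can be sketched in Python for simple least squares (the data below are hypothetical, roughly y = 1 + x):

```python
import random
import statistics

def ols(xs, ys):
    """Least-squares intercept and slope."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b

def residual_boot_slopes(xs, ys, n_boot=1000, seed=11):
    """Fixed-x bootstrap: fit, resample residuals with replacement,
    add them back to the fitted values, and refit."""
    a, b = ols(xs, ys)
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_boot):
        ystar = [f + e for f, e in zip(fitted, rng.choices(resid, k=len(resid)))]
        slopes.append(ols(xs, ystar)[1])
    return slopes

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9, 8.2, 8.8]   # hypothetical, roughly y = 1 + x
slopes = residual_boot_slopes(xs, ys)
```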

Basic Rule for Sampling
Sample in a way consistent with how the data were produced, including any additional information:
–Continuous distribution (if it matters, e.g. for medians)
–Null hypothesis

Resampling for Hypothesis Tests
Sample in a manner consistent with H_0
P-value = P_0(random value exceeds observed value)

Permutation Test for Two Samples
H_0: no real difference between groups; observations could come from one group as well as the other.
Resample: randomly choose n_1 observations for group 1, the rest for group 2.
Equivalent to permuting all n, then taking the first n_1 for group 1.
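A Python sketch of the two-sample permutation test, using hypothetical data and a one-sided "less" alternative as in the Verizon example:

```python
import random
import statistics

def perm_test_diff_means(x, y, n_perm=2000, seed=99):
    """One-sided permutation test (alternative: mean(x) < mean(y)).
    Returns the permutation p-value."""
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    n1 = len(x)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)   # permute all n; first n1 go to group 1
        diff = statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:])
        if diff <= observed:
            count += 1
    return (count + 1) / (n_perm + 1)   # +1 counts the observed arrangement

x = [1, 2, 2, 3, 3, 4]   # hypothetical "group 1"
y = [5, 6, 6, 7, 8, 9]   # hypothetical "group 2"
p = perm_test_diff_means(x, y)
```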

Verizon permutation test

Test Results
Pooled-variance t-test: t = , df = 1685, p-value =
Non-pooled-variance t-test: t = , df = , p-value =

> permVerizon3
Call:
permutationTestMeans(data = Verizon$Time, treatment = Verizon$Group,
    B = , alternative = "less", seed = 99)

Number of Replications:

Summary Statistics:
     Observed  Mean  SE  alternative  p.value
Var                      less

Permutation vs Pooled Bootstrap
Pooled bootstrap test:
–Pool all n observations
–Choose n_1 with replacement for group 1
–Choose n_2 with replacement for group 2
The permutation test is preferred:
–Conditions on the observed data
–Same number of outliers as the observed data

Assumptions
Permutation test:
–Same distribution for the two populations when H_0 is true
–Population variances must be the same; sample variances may differ
–Does not require normality
–Does not require that the data be a random sample from a larger population

Other Statistics
The procedure works for a variety of statistics:
–difference in means
–t-statistic
–difference in trimmed means
Work directly with the statistic of interest
–Same p-value for the difference in means and the pooled-variance t-statistic

Difference in Trimmed Means
P-value =

General Permutation Tests
Compute the statistic for the data.
Resample in a way consistent with H_0 and the study design.
Construct the permutation distribution.
P-value = percentage of resampled statistics that exceed the original statistic.

Perm Test for Matched Pairs or Stratified Sampling
Permute within each pair
Permute within each stratum
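For matched pairs, permuting the labels within a pair simply flips the sign of that pair's difference, so the test can be sketched as follows (the paired differences are hypothetical):

```python
import random
import statistics

def paired_perm_test(diffs, n_perm=2000, seed=7):
    """Two-sided permutation test for matched pairs.  Permuting labels
    within a pair flips the sign of that pair's difference."""
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical before-minus-after differences for 10 matched pairs.
diffs = [0.8, 1.1, 0.3, 1.4, 0.9, 0.2, 1.0, 0.7, 1.2, 0.5]
p = paired_perm_test(diffs)
```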

Example: Puromycin
The data are from a biochemical experiment in which the initial velocity of a reaction was measured for different concentrations of the substrate. Data are from two runs: one on cells treated with the drug Puromycin, the other on untreated cells.
Variables: concentration, velocity, treatment

Puromycin data

Permutation Test for Puromycin
Statistic: ratio of smooths, at each original concentration
Stratify by original concentration
Permute only the treatment variable

permutationTest(data = Puromycin, statistic = f,
    alternative = "less", combine = T, seed = 42,
    group = Puromycin$conc, resampleColumns = "state")

Puromycin – Permutation Graphs

Puromycin – P-values

Summary Statistics:
   Observed  Mean  SE  alternative  p-value
                       less
                       less
                       less
                       less
                       less
                       less

Combined p-value: 0.02, 0.06, 0.11, 0.22, 0.56,

Permutation test curves

Permutation Test of Relationship
To test H_0: X and Y are independent:
Permute either X or Y (permuting both is just extra work)
The test statistic may be a correlation, regression slope, chi-square statistic (Fisher's exact test), …

Perm Test in Regression
Simple regression: permute X or Y
Multiple regression:
–Permute Y to test H_0: no X contributes
–To test the incremental contribution of X_1: cannot permute X_1 – that loses the joint relationship among the Xs
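Permuting Y in simple regression can be sketched as follows, with a hypothetical near-linear data set and |slope| as the test statistic:

```python
import random
import statistics

def slope(xs, ys):
    """Least-squares slope."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

def perm_test_slope(xs, ys, n_perm=2000, seed=13):
    """Permutation test of H_0: X and Y independent, using |slope|."""
    observed = abs(slope(xs, ys))
    ys = list(ys)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(ys)   # permute Y, leave X fixed
        if abs(slope(xs, ys)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.2, 1.9, 3.4, 3.9, 5.1, 6.2, 6.8, 8.1, 9.3, 9.8]   # hypothetical
p = perm_test_slope(xs, ys)
```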

Example: Kyphosis
Variables: Kyphosis (present or absent), Age of child, Number of vertebrae in operation, Start of range of vertebrae
Logistic regression

Kyphosis – Logistic Regression

             Value  Std. Error  t value
(Intercept)
Age
Start
Number

Null Deviance: on 80 df
Residual Deviance: on 77 df

Kyphosis vs. Start

Kyphosis Permutation Test
Permute Kyphosis (the response variable), leaving the other variables fixed. The test statistic is the residual deviance.

Summary Statistics:
       Observed  Mean  SE  alternative  p-value
Param                      less         0.001

Kyphosis Permutation Distribution

When Perm Testing Fails
Permutation testing is not universal:
–Cannot test H_0: = 0
–Cannot test H_0: = 1
Use confidence intervals
Bootstrap tilting:
–Find the maximum-likelihood weighted distribution that satisfies H_0, then use the weighted bootstrap

If Time Permits
Bias – portfolio optimization example, in section3.ppt
More about confidence intervals, from section5.ppt

Summary
Basic bootstrap idea:
–Substitute the best estimate for the population(s)
–For testing, match the null hypothesis
–Sample consistently with how the data were produced
–Inspect the bootstrap distribution – normal?
–Compare t and percentile intervals, BCa & tilting

Summary
Testing:
–Sample consistent with H_0
–Permutation test to compare groups, test relationships
–No permutation tests in some situations; use a bootstrap confidence interval or test

Resources
S+Resample

Supplement for pages
This document is a supplement to the presentation at the AGS. It includes some material that was shown in a live demo using S-PLUS, corresponding to pages of the original presentation.

Another Example: Kyphosis
Variables: Kyphosis (present or absent), Age of child, Number of vertebrae in operation, Start of range of vertebrae
Logistic regression

Kyphosis – Logistic Regression

             Value  Std. Error  t value
(Intercept)
Age
Start
Number

Null Deviance: on 80 df
Residual Deviance: on 77 df

Kyphosis Example
Pseudo-code:
Repeat 1000 times {
    Draw sample with replacement from original rows
    Fit logistic regression
    Save coefficients
}
Use the bootstrap distribution
Live demo (kyphosis.ssc)

Kyphosis vs. Start

Graphical bootstrap of predictions

Bootstrap Coefficients

Bootstrap Scatterplots

t Confidence Interval
Statistic ± t* SE(bootstrap)
A reasonable interval if the bootstrap distribution is approximately normal, with little bias. Compare to bootstrap percentiles.
Return to the Kyphosis example.
Caution: in the literature, “bootstrap t” means something else.

Are t-limits reasonable here?

Are t-limits reasonable here?

Are t-limits Reasonable Here?
Remember, the previous two plots show the bootstrap distribution, an estimate of the sampling distribution, after the Central Limit Theorem has had its chance to work.

Percentiles to Check the Bootstrap t
If the bootstrap distribution is approximately normal and unbiased, then bootstrap t intervals and the corresponding percentiles should be similar. Compare these:
if similar, use either; else use a more accurate interval.

Compare t and Percentile CIs

> signif(limits.t(boot.kyphosis), 2)
            2.5%  5%  95%  97.5%
(Intercept)
Age
Start
Number

> signif(limits.percentile(boot.kyphosis), 2)
            2.5%  5%  95%  97.5%
(Intercept)
Age
Start
Number

Compare Asymmetry of CIs

> signif(limits.t(boot.kyphosis) - boot.kyphosis$observed, 2)
            2.5%  5%  95%  97.5%
(Intercept)
Age
Start
Number

> signif(limits.percentile(boot.kyphosis) - boot.kyphosis$observed, 2)
            2.5%  5%  95%  97.5%
(Intercept)
Age
Start
Number