1 Stat 6601 Presentation: Bootstrapping Linear Models (V & R 6.6)
Presented by: Xiao Li (Winnie), Wenlai Wang, Ke Xu
Nov. 17, 2004
2 Preview of the Presentation
- Introduction to Bootstrap
- Data and Modeling
- Methods on Bootstrapping LM
- Results
- Issues and Discussion
- Summary
3 What is Bootstrapping?
- Invented by Bradley Efron, and further developed by Efron and Tibshirani
- A method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample
- A method to determine the trustworthiness of a statistic (a generalization of the standard deviation)
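As a minimal sketch of the resampling idea (an illustration added here, not from the original slides; the toy sample, the choice of statistic, and the 1000 replicates are arbitrary):

# Minimal sketch of the bootstrap idea in base R (illustrative only):
# estimate the standard error of the sample median by resampling
# with replacement from the observed data.
set.seed(1)                       # for reproducibility
x <- rnorm(30, mean = 5, sd = 2)  # a toy sample; any observed data works
meds <- replicate(1000, median(sample(x, replace = TRUE)))
sd(meds)                          # bootstrap estimate of the SE of the median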
4 Why use Bootstrapping?
Start with 2 questions:
- What estimator should be used?
- Having chosen an estimator, how accurate is it?
For a linear model with normal random errors having constant variance: least squares.
For generalized non-normal errors and non-constant variance: ???
5 The Mammals Data
A data frame with average brain and body weights for 62 species of land mammals.
- "body": body weight in kg
- "brain": brain weight in g
- "name": common name of species
6 Data and Model
Linear Regression Model:
    y_j = β0 + β1 x_j + ε_j,  j = 1, ..., n,
where ε_j is considered random,
y = log(brain weight) and x = log(body weight).
7 Summary of Original Fit
Residuals:
     Min       1Q   Median       3Q      Max
-1.71550 -0.49228 -0.06162  0.43597  1.94829

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.13479    0.09604   22.23   <2e-16 ***
log(body)    0.75169    0.02846   26.41   <2e-16 ***

Residual standard error: 0.6943 on 60 degrees of freedom
Multiple R-squared: 0.9208,  Adjusted R-squared: 0.9195
F-statistic: 697.4 on 1 and 60 DF,  p-value: < 2.2e-16
8 R Code for Original Modeling
# Original model: plot raw and log-transformed data, then fit by least squares
library(MASS)   # for the mammals data
library(boot)   # for boot() and boot.ci() used later
op <- par(mfrow = c(1, 2))
data(mammals)
plot(mammals$body, mammals$brain, main = 'Original Data',
     xlab = 'body weight', ylab = 'brain weight', col = 'brown')  # plot of data
plot(log(mammals$body), log(mammals$brain), main = 'Log-Transformed Data',
     xlab = 'log body weight', ylab = 'log brain weight',
     col = 'brown')  # plot of log-transformed data
mammal <- data.frame(log(mammals$body), log(mammals$brain))
dimnames(mammal) <- list(1:62, c("body", "brain"))
attach(mammal)
log.fit <- lm(brain ~ body, data = mammal)
summary(log.fit)
9 Two Methods
Case-based resampling: randomly sample pairs (x_j, y_j) with replacement
- No assumption about variance homogeneity
- Design fixes the information content of a sample
Model-based resampling: resample the residuals
- Assumes the model is correct, with homoscedastic errors
- Resampling model has the same "design" as the data
10 Case-Based Resample Algorithm
For r = 1, ..., R:
1. Sample i*_1, ..., i*_n randomly with replacement from {1, 2, ..., n}.
2. For j = 1, ..., n, set x*_j = x_{i*_j} and y*_j = y_{i*_j}.
3. Fit least squares regression to (x*_1, y*_1), ..., (x*_n, y*_n), giving estimates b0*, b1*, and s*^2.
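A hand-rolled sketch of this loop in base R (illustrative only; the slides use boot() instead, and the object names here are made up). It assumes the log-transformed 'mammal' data frame built on the earlier code slide:

# Hand-rolled case-based resampling (illustrative sketch)
set.seed(2)                                   # for reproducibility
R <- 999
n <- nrow(mammal)
case.coefs <- matrix(NA, nrow = R, ncol = 2)  # one row per replicate
for (r in 1:R) {
  idx <- sample(n, replace = TRUE)            # step 1: resample case indices
  case.coefs[r, ] <- coef(lm(brain ~ body, data = mammal[idx, ]))  # steps 2-3
}
apply(case.coefs, 2, sd)  # bootstrap standard errors for intercept and slope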
11 Model-Based Resample Algorithm
For r = 1, ..., R:
1. For j = 1, ..., n,
   a) set x*_j = x_j;
   b) randomly sample ε*_j from the residuals e_1, ..., e_n of the original fit; then
   c) set y*_j = b0 + b1 x*_j + ε*_j, where b0, b1 are the original least squares estimates.
2. Fit least squares regression to (x*_1, y*_1), ..., (x*_n, y*_n), giving estimates b0*, b1*, and s*^2.
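The same loop written by hand (an illustrative sketch with made-up object names; the slides use boot() instead). It rebuilds y* from the original fitted values plus resampled residuals, as in steps (a)-(c) above:

# Hand-rolled model-based resampling (illustrative sketch)
set.seed(3)
orig.fit <- lm(brain ~ body, data = mammal)
e <- resid(orig.fit)                          # residuals of the original fit
R <- 999
model.coefs <- matrix(NA, nrow = R, ncol = 2)
for (r in 1:R) {
  d <- mammal
  d$brain <- fitted(orig.fit) + sample(e, replace = TRUE)  # y* = fit + e*
  model.coefs[r, ] <- coef(lm(brain ~ body, data = d))     # step 2
}
apply(model.coefs, 2, sd)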
12 Case-Based Bootstrap
ORDINARY NONPARAMETRIC BOOTSTRAP

Bootstrap Statistics:
      original          bias    std. error
t1*   2.134789  -0.0022155790  0.08708311
t2*   0.751686   0.0001295280  0.02277497

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Intervals:
Level   Normal              Percentile          BCa
95%     ( 1.966,  2.308 )   ( 1.963,  2.310 )   ( 1.974,  2.318 )
95%     ( 0.7069, 0.7962 )  ( 0.7082, 0.7954 )  ( 0.7080, 0.7953 )
Calculations and Intervals on Original Scale
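The "bias" and "std. error" columns above can be reproduced by hand from the replicates that boot() stores in the returned object; a quick check, assuming the mam.case.boot object created on the R code slide below:

# $t holds the R x 2 matrix of replicates, $t0 the original estimates
colMeans(mam.case.boot$t) - mam.case.boot$t0  # bias = mean(t*) - t0
apply(mam.case.boot$t, 2, sd)                 # bootstrap std. error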
13 Case-Based Bootstrap
[Figure: bootstrap distribution plots for the intercept and slope]
14 Case-Based Bootstrap
[Figure: standardized jackknife-after-bootstrap plots for the intercept and slope]
15 R Code for Case-Based Resampling
# Case-based resampling: refit the model to resampled rows of the data
fit.case <- function(data) coef(lm(log(data$brain) ~ log(data$body)))
mam.case <- function(data, i) fit.case(data[i, ])
mam.case.boot <- boot(mammals, mam.case, R = 999)
mam.case.boot
boot.ci(mam.case.boot, type = c("norm", "perc", "bca"))             # intercept
boot.ci(mam.case.boot, index = 2, type = c("norm", "perc", "bca"))  # slope
plot(mam.case.boot)
plot(mam.case.boot, index = 2)
jack.after.boot(mam.case.boot)
jack.after.boot(mam.case.boot, index = 2)
16 Model-Based Bootstrap
ORDINARY NONPARAMETRIC BOOTSTRAP

Bootstrap Statistics:
      original          bias    std. error
t1*   2.134789   0.0049756072  0.09424796
t2*   0.751686  -0.0006573983  0.02719809

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Intervals:
Level   Normal              Percentile          BCa
95%     ( 1.945,  2.315 )   ( 1.948,  2.322 )   ( 1.941,  2.316 )
95%     ( 0.6990, 0.8057 )  ( 0.6982, 0.8062 )  ( 0.6987, 0.8077 )
Calculations and Intervals on Original Scale
17 Model-Based Bootstrap
[Figure: bootstrap distribution plots for the intercept and slope]
18 Model-Based Bootstrap
[Figure: standardized jackknife-after-bootstrap plots for the intercept and slope]
19 R Code for Model-Based Resampling
# Model-based resampling (resample residuals): rebuild y* from the fitted
# values plus resampled residuals, then refit
fit.res <- lm(brain ~ body, data = mammal)
mam.res.data <- data.frame(mammal, res = resid(fit.res), fitted = fitted(fit.res))
mam.res <- function(data, i) {
  d <- data
  d$brain <- d$fitted + d$res[i]   # y* = fitted + resampled residual
  coef(update(fit.res, data = d))
}
fit.res.boot <- boot(mam.res.data, mam.res, R = 999)
fit.res.boot
boot.ci(fit.res.boot, type = c("norm", "perc", "bca"))             # intercept
boot.ci(fit.res.boot, index = 2, type = c("norm", "perc", "bca"))  # slope
plot(fit.res.boot)
plot(fit.res.boot, index = 2)
jack.after.boot(fit.res.boot)
jack.after.boot(fit.res.boot, index = 2)
20 Comparisons and Discussion

                   Original Model   Case-Based (Fixed)   Model-Based (Random)
Intercept (t1*)    2.13479          2.134789             2.134789
  Std. Error       0.09604          0.08708311           0.09424796
Slope (t2*)        0.75169          0.751686             0.751686
  Std. Error       0.02846          0.02277497           0.02719809
21 Case-Based vs. Model-Based
- Model-based resampling enforces the assumption that the errors are identically distributed by resampling the residuals from a common distribution.
- If the model is not specified correctly (e.g., unmodeled nonlinearity, non-constant error variance, or outliers), these attributes do not carry over to the bootstrap samples.
- The effect of outliers is clear in the case-based plots, but not in the model-based ones.
22 When Might Bootstrapping Fail?
- Incomplete data: the methods above assume that missing data are not problematic, e.g., if multiple imputation has been used beforehand.
- Dependent data: resampling cases independently imposes mutual independence on the Y_j, so the joint distribution of dependent data is not reproduced.
- Outliers and influential cases: remove or correct obvious outliers, and avoid letting the simulations depend on particular observations.
23 Review & More Resampling
Resampling techniques are powerful tools for:
- estimating SDs from small samples
- statistics whose SDs cannot easily be determined analytically
Bootstrapping involves:
- taking 'new' random samples with replacement from the original data
- calculating the bootstrap SD and statistical tests from the values of the statistic across the bootstrap samples
More resampling techniques:
- Jackknife resampling
- Cross-validation
24 SUMMARY
- Introduction to Bootstrap
- Data and Modeling
- Methods on Bootstrapping LM
- Results and Comparisons
- Issues and Discussion
25 References
- Anderson, B. "Resampling and Regression." McMaster University. http://socserv.mcmaster.ca/anderson
- Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their Application, pp. 256-273. Cambridge University Press.
- Efron, B. and Gong, G. (February 1983). "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation." The American Statistician.
- Holmes, S. "Introduction to the Bootstrap." Stanford University. http://www-stat.stanford.edu/~susan/courses/s208/
- Venables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics with S, 4th ed., pp. 163-165. Springer.
27 Extra Stuff…
Jackknife resampling takes new samples of the data by omitting each case individually and recalculating the statistic each time.
- Resamples the data by leaving a single observation out at a time
- The number of jackknife samples used equals the number of cases in the original sample
- Works well for robust estimators of location, but not for the SD
Cross-validation randomly splits the sample into two groups and compares the model results from one sample to the results from the other.
- The first subset is used to estimate a statistical model (screening/training sample)
- The findings are then tested on the second subset (confirmatory/test sample)
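A minimal sketch of both ideas in base R (illustrative only, not from the original slides; the choice of statistic and the 50/50 split are arbitrary, and the object names are made up):

# Jackknife sketch: leave one case out at a time and recompute the statistic
x <- mammal$brain
n <- length(x)
theta.i <- sapply(1:n, function(i) mean(x[-i]))   # n leave-one-out estimates
jack.se <- sqrt((n - 1) / n * sum((theta.i - mean(theta.i))^2))
jack.se                                           # jackknife SE of the mean

# Cross-validation sketch: one random split into training and test halves
set.seed(4)
train.idx <- sample(n, size = n %/% 2)
train.fit <- lm(brain ~ body, data = mammal[train.idx, ])
test.pred <- predict(train.fit, newdata = mammal[-train.idx, ])
mean((mammal$brain[-train.idx] - test.pred)^2)    # test-set prediction error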