1 Stat 6601 Presentation: Bootstrapping Linear Models (V & R 6.6)
Presented by: Xiao Li (Winnie), Wenlai Wang, Ke Xu
Nov. 17, 2004
2 Preview of the Presentation
- Introduction to Bootstrap
- Data and Modeling
- Methods on Bootstrapping LM
- Results
- Issues and Discussion
- Summary
3 What is Bootstrapping?
- Invented by Bradley Efron, and further developed by Efron and Tibshirani
- A method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample
- A method to determine the trustworthiness of a statistic (a generalization of the standard deviation)
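As a minimal sketch of the resampling idea (an illustration added here, not from the original slides; the toy sample, the choice of statistic, and the 1000 replicates are arbitrary):

# Minimal sketch of the bootstrap idea in base R (illustrative only):
# estimate the standard error of the sample median by resampling
# with replacement from the observed data.
set.seed(1)                       # for reproducibility
x <- rnorm(30, mean = 5, sd = 2)  # a toy sample; any observed data works
meds <- replicate(1000, median(sample(x, replace = TRUE)))
sd(meds)                          # bootstrap estimate of the SE of the median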
4 Why use Bootstrapping?
Start with 2 questions:
- What estimator should be used?
- Having chosen an estimator, how accurate is it?
For a linear model with normal random errors having constant variance: least squares.
For generalized non-normal errors and non-constant variance: ???
5 The Mammals Data
A data frame with average brain and body weights for 62 species of land mammals.
- "body": body weight in kg
- "brain": brain weight in g
- "name": common name of species
6 Data and Model
Linear Regression Model:
    y_j = β0 + β1 x_j + ε_j,  j = 1, ..., n,
where ε_j is considered random,
y = log(brain weight) and x = log(body weight).
7 Summary of Original Fit
Residuals:
     Min       1Q   Median       3Q      Max
-1.71550 -0.49228 -0.06162  0.43597  1.94829

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.13479    0.09604   22.23   <2e-16 ***
log(body)    0.75169    0.02846   26.41   <2e-16 ***

Residual standard error: 0.6943 on 60 degrees of freedom
Multiple R-squared: 0.9208,  Adjusted R-squared: 0.9195
F-statistic: 697.4 on 1 and 60 DF,  p-value: < 2.2e-16
8 R Code for Original Modeling
# Original model: plot raw and log-transformed data, then fit by least squares
library(MASS)   # for the mammals data
library(boot)   # for boot() and boot.ci() used later
op <- par(mfrow = c(1, 2))
data(mammals)
plot(mammals$body, mammals$brain, main = 'Original Data',
     xlab = 'body weight', ylab = 'brain weight', col = 'brown')  # plot of data
plot(log(mammals$body), log(mammals$brain), main = 'Log-Transformed Data',
     xlab = 'log body weight', ylab = 'log brain weight',
     col = 'brown')  # plot of log-transformed data
mammal <- data.frame(log(mammals$body), log(mammals$brain))
dimnames(mammal) <- list(1:62, c("body", "brain"))
attach(mammal)
log.fit <- lm(brain ~ body, data = mammal)
summary(log.fit)
9 Two Methods
Case-based resampling: randomly sample pairs (x_j, y_j) with replacement
- No assumption about variance homogeneity
- Design fixes the information content of a sample
Model-based resampling: resample the residuals
- Assumes the model is correct, with homoscedastic errors
- Resampling model has the same "design" as the data
10 Case-Based Resample Algorithm
For r = 1, ..., R:
1. Sample i*_1, ..., i*_n randomly with replacement from {1, 2, ..., n}.
2. For j = 1, ..., n, set x*_j = x_{i*_j} and y*_j = y_{i*_j}.
3. Fit least squares regression to (x*_1, y*_1), ..., (x*_n, y*_n), giving estimates b0*, b1*, and s*^2.
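A hand-rolled sketch of this loop in base R (illustrative only; the slides use boot() instead, and the object names here are made up). It assumes the log-transformed 'mammal' data frame built on the earlier code slide:

# Hand-rolled case-based resampling (illustrative sketch)
set.seed(2)                                   # for reproducibility
R <- 999
n <- nrow(mammal)
case.coefs <- matrix(NA, nrow = R, ncol = 2)  # one row per replicate
for (r in 1:R) {
  idx <- sample(n, replace = TRUE)            # step 1: resample case indices
  case.coefs[r, ] <- coef(lm(brain ~ body, data = mammal[idx, ]))  # steps 2-3
}
apply(case.coefs, 2, sd)  # bootstrap standard errors for intercept and slope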
11 Model-Based Resample Algorithm
For r = 1, ..., R:
1. For j = 1, ..., n,
   a) set x*_j = x_j;
   b) randomly sample ε*_j from the residuals e_1, ..., e_n of the original fit; then
   c) set y*_j = b0 + b1 x*_j + ε*_j, where b0, b1 are the original least squares estimates.
2. Fit least squares regression to (x*_1, y*_1), ..., (x*_n, y*_n), giving estimates b0*, b1*, and s*^2.
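The same loop written by hand (an illustrative sketch with made-up object names; the slides use boot() instead). It rebuilds y* from the original fitted values plus resampled residuals, as in steps (a)-(c) above:

# Hand-rolled model-based resampling (illustrative sketch)
set.seed(3)
orig.fit <- lm(brain ~ body, data = mammal)
e <- resid(orig.fit)                          # residuals of the original fit
R <- 999
model.coefs <- matrix(NA, nrow = R, ncol = 2)
for (r in 1:R) {
  d <- mammal
  d$brain <- fitted(orig.fit) + sample(e, replace = TRUE)  # y* = fit + e*
  model.coefs[r, ] <- coef(lm(brain ~ body, data = d))     # step 2
}
apply(model.coefs, 2, sd)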
12 Case-Based Bootstrap
ORDINARY NONPARAMETRIC BOOTSTRAP

Bootstrap Statistics:
      original          bias    std. error
t1*   2.134789  -0.0022155790  0.08708311
t2*   0.751686   0.0001295280  0.02277497

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Intervals:
Level   Normal              Percentile          BCa
95%     ( 1.966,  2.308 )   ( 1.963,  2.310 )   ( 1.974,  2.318 )
95%     ( 0.7069, 0.7962 )  ( 0.7082, 0.7954 )  ( 0.7080, 0.7953 )
Calculations and Intervals on Original Scale
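The "bias" and "std. error" columns above can be reproduced by hand from the replicates that boot() stores in the returned object; a quick check, assuming the mam.case.boot object created on the R code slide below:

# $t holds the R x 2 matrix of replicates, $t0 the original estimates
colMeans(mam.case.boot$t) - mam.case.boot$t0  # bias = mean(t*) - t0
apply(mam.case.boot$t, 2, sd)                 # bootstrap std. error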
13 Case-Based Bootstrap
[Figure: bootstrap distribution plots for the intercept and slope]
14 Case-Based Bootstrap
[Figure: standardized jackknife-after-bootstrap plots for the intercept and slope]
15 R Code for Case-Based Resampling
# Case-based resampling: refit the model to resampled rows of the data
fit.case <- function(data) coef(lm(log(data$brain) ~ log(data$body)))
mam.case <- function(data, i) fit.case(data[i, ])
mam.case.boot <- boot(mammals, mam.case, R = 999)
mam.case.boot
boot.ci(mam.case.boot, type = c("norm", "perc", "bca"))             # intercept
boot.ci(mam.case.boot, index = 2, type = c("norm", "perc", "bca"))  # slope
plot(mam.case.boot)
plot(mam.case.boot, index = 2)
jack.after.boot(mam.case.boot)
jack.after.boot(mam.case.boot, index = 2)
16 Model-Based Bootstrap
ORDINARY NONPARAMETRIC BOOTSTRAP

Bootstrap Statistics:
      original          bias    std. error
t1*   2.134789   0.0049756072  0.09424796
t2*   0.751686  -0.0006573983  0.02719809

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Intervals:
Level   Normal              Percentile          BCa
95%     ( 1.945,  2.315 )   ( 1.948,  2.322 )   ( 1.941,  2.316 )
95%     ( 0.6990, 0.8057 )  ( 0.6982, 0.8062 )  ( 0.6987, 0.8077 )
Calculations and Intervals on Original Scale
17 Model-Based Bootstrap
[Figure: bootstrap distribution plots for the intercept and slope]
18 Model-Based Bootstrap
[Figure: standardized jackknife-after-bootstrap plots for the intercept and slope]
19 R Code for Model-Based Resampling
# Model-based resampling (resample residuals): rebuild y* from the fitted
# values plus resampled residuals, then refit
fit.res <- lm(brain ~ body, data = mammal)
mam.res.data <- data.frame(mammal, res = resid(fit.res), fitted = fitted(fit.res))
mam.res <- function(data, i) {
  d <- data
  d$brain <- d$fitted + d$res[i]   # y* = fitted + resampled residual
  coef(update(fit.res, data = d))
}
fit.res.boot <- boot(mam.res.data, mam.res, R = 999)
fit.res.boot
boot.ci(fit.res.boot, type = c("norm", "perc", "bca"))             # intercept
boot.ci(fit.res.boot, index = 2, type = c("norm", "perc", "bca"))  # slope
plot(fit.res.boot)
plot(fit.res.boot, index = 2)
jack.after.boot(fit.res.boot)
jack.after.boot(fit.res.boot, index = 2)
20 Comparisons and Discussion

                   Original Model   Case-Based (Fixed)   Model-Based (Random)
Intercept (t1*)    2.13479          2.134789             2.134789
  Std. Error       0.09604          0.08708311           0.09424796
Slope (t2*)        0.75169          0.751686             0.751686
  Std. Error       0.02846          0.02277497           0.02719809
21 Case-Based vs. Model-Based
- Model-based resampling enforces the assumption that the errors are identically distributed by resampling the residuals from a common distribution.
- If the model is not specified correctly (e.g., unmodeled nonlinearity, non-constant error variance, or outliers), these attributes do not carry over to the bootstrap samples.
- The effect of outliers is clear in the case-based plots, but not in the model-based ones.
22 When Might Bootstrapping Fail?
- Incomplete data: the methods above assume that missing data are not problematic, e.g., if multiple imputation has been used beforehand.
- Dependent data: resampling cases independently imposes mutual independence on the Y_j, so the joint distribution of dependent data is not reproduced.
- Outliers and influential cases: remove or correct obvious outliers, and avoid letting the simulations depend on particular observations.
23 Review & More Resampling
Resampling techniques are powerful tools for:
- estimating SDs from small samples
- statistics whose SDs cannot easily be determined analytically
Bootstrapping involves:
- taking 'new' random samples with replacement from the original data
- calculating the bootstrap SD and statistical tests from the values of the statistic across the bootstrap samples
More resampling techniques:
- Jackknife resampling
- Cross-validation
24 SUMMARY
- Introduction to Bootstrap
- Data and Modeling
- Methods on Bootstrapping LM
- Results and Comparisons
- Issues and Discussion
25 References
- Anderson, B. "Resampling and Regression." McMaster University. http://socserv.mcmaster.ca/anderson
- Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their Application, pp. 256-273. Cambridge University Press.
- Efron, B. and Gong, G. (February 1983). "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation." The American Statistician.
- Holmes, S. "Introduction to the Bootstrap." Stanford University. http://www-stat.stanford.edu/~susan/courses/s208/
- Venables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics with S, 4th ed., pp. 163-165. Springer.
27 Extra Stuff…
Jackknife resampling takes new samples of the data by omitting each case individually and recalculating the statistic each time.
- Resamples the data by leaving a single observation out at a time
- The number of jackknife samples used equals the number of cases in the original sample
- Works well for robust estimators of location, but not for the SD
Cross-validation randomly splits the sample into two groups and compares the model results from one sample to the results from the other.
- The first subset is used to estimate a statistical model (screening/training sample)
- The findings are then tested on the second subset (confirmatory/test sample)
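A minimal sketch of both ideas in base R (illustrative only, not from the original slides; the choice of statistic and the 50/50 split are arbitrary, and the object names are made up):

# Jackknife sketch: leave one case out at a time and recompute the statistic
x <- mammal$brain
n <- length(x)
theta.i <- sapply(1:n, function(i) mean(x[-i]))   # n leave-one-out estimates
jack.se <- sqrt((n - 1) / n * sum((theta.i - mean(theta.i))^2))
jack.se                                           # jackknife SE of the mean

# Cross-validation sketch: one random split into training and test halves
set.seed(4)
train.idx <- sample(n, size = n %/% 2)
train.fit <- lm(brain ~ body, data = mammal[train.idx, ])
test.pred <- predict(train.fit, newdata = mammal[-train.idx, ])
mean((mammal$brain[-train.idx] - test.pred)^2)    # test-set prediction error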