Published by Cordelia Griffin. Modified over 9 years ago.
1
Statistical Data Analysis 2011/2012 M. de Gunst Lecture 4
2
Statistical Data Analysis: Introduction
Topics:
- Summarizing data
- Exploring distributions
- Bootstrap (continued)
- Robust methods
- Nonparametric tests
- Analysis of categorical data
- Multiple linear regression
3
Today's topics: Bootstrap (Chapter 4: 4.3, 4.4)
4. Bootstrap
4.1. Simulation (read yourself) (last week)
4.2. Bootstrap estimators for distribution (last week)
4.3. Bootstrap confidence intervals
4.4. Bootstrap tests
4
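Bootstrap: recap (1)
Situation: $x_1, \dots, x_n$ are realizations of $X_1, \dots, X_n$, independent, with unknown distribution $P$.
Use the bootstrap to estimate the distribution of an estimator or test statistic $T_n = T_n(X_1, \dots, X_n)$.
Which steps?
Step 1. Estimate $P$ by $\hat{P}$, and hence the distribution of $T_n$ under $P$ by its distribution under $\hat{P}$. (First error)
Step 2. Estimate the distribution of $T_n$ under $\hat{P}$ by simulation, i.e. by the empirical distribution of the bootstrap values $T_{n,1}^*, \dots, T_{n,B}^*$. (Second error)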
5
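Bootstrap: recap (2)
Step 1: Determine the theoretical bootstrap estimator. (First error)
i) Estimate $P$ by $\hat{P}$: either the empirical distribution of $X_1, \dots, X_n$, or a parametric distribution with the parameter estimated. $\hat{P}$ is stochastic: an estimator.
ii) Estimate the distribution of $T_n$ under $P$ by its distribution under $\hat{P}$. This is stochastic: the bootstrap estimator.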
6
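Bootstrap: recap (3)
Step 2: From estimator to estimate: the data $x_1, \dots, x_n$ are fixed, so $\hat{P}$ is fixed. (Second error)
i) If the distribution of $T_n$ under $\hat{P}$ has an explicit expression, then done.
ii) If not, then estimate the estimate: use the bootstrap (sampling) scheme to estimate it by the empirical distribution of $T_{n,1}^*, \dots, T_{n,B}^*$, where $T_{n,i}^* = T_n(X_{1,i}^*, \dots, X_{n,i}^*)$ and $X_{1,i}^*, \dots, X_{n,i}^*$ are drawn independently from $\hat{P}$.
The empirical distribution of $T_{n,1}^*, \dots, T_{n,B}^*$ is stochastic: an estimator. The empirical distribution of simulated realizations of $T_{n,1}^*, \dots, T_{n,B}^*$ is the estimate.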
7
Bootstrap: recap (4)
Obtain the empirical distribution of simulated realizations of $T_{n,1}^*, \dots, T_{n,B}^*$ with the bootstrap (sampling) scheme.
With the $B$ bootstrap values, get an impression of (characteristics of) the unknown distribution of $T_n$:
- draw a histogram
- compute the sample variance
- compute the sample sd
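The sampling scheme above can be sketched in R as follows. This is only an illustration under stated assumptions: the data `x` are simulated here (hypothetical, not from the lecture), the estimator is taken to be the sample median, and $\hat{P}$ is the empirical distribution, so that drawing from $\hat{P}$ means resampling `x` with replacement.

```r
# Hypothetical data; in practice x holds the observed sample x_1, ..., x_n.
set.seed(1)
x <- rexp(50, rate = 1)

# Estimator T_n; the median is only an illustrative choice.
estimator <- function(obs) median(obs)

B <- 1000
# Empirical bootstrap: draw n values from x with replacement, B times,
# and recompute the estimator on every bootstrap sample.
tstar <- replicate(B, estimator(sample(x, replace = TRUE)))

hist(tstar, prob = TRUE)  # impression of the distribution of T_n
var(tstar)                # sample variance of the B bootstrap values
sd(tstar)                 # sample sd of the B bootstrap values
```

For a parametric bootstrap, the call to `sample()` would be replaced by a draw from the fitted parametric distribution.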
8
4.3. Bootstrap confidence intervals (1)
$T_n$: estimator of unknown parameter $\theta$.
Seen: accuracy of estimator $T_n$ via the variance of the estimator's distribution.
Now: accuracy of estimator $T_n$ via a confidence interval.
A $(1 - 2\alpha) \times 100\%$ confidence interval for $\theta$ is an interval around $T_n$ that contains the `true' $\theta$ with probability at least $1 - 2\alpha$.
If the interval is $[T_n - b_1, T_n + b_2]$, how to determine $b_1$ and $b_2$? (blackboard)
9
Bootstrap confidence intervals (2)
A $(1 - 2\alpha) \times 100\%$ confidence interval for $\theta$ is an interval around $T_n$ that contains the `true' $\theta$ with probability at least $1 - 2\alpha$.
If the interval is $[T_n - b_1, T_n + b_2]$, then $b_1$ and $b_2$ are determined by
$[T_n - b_1, T_n + b_2] = [T_n - G^{-1}(1 - \alpha), T_n - G^{-1}(\alpha)]$,
with $G$ the distribution of $T_n - \theta$.
So $b_1 = G^{-1}(1 - \alpha)$ and $-b_2 = G^{-1}(\alpha)$ are quantiles of the unknown distribution $G$.
How to estimate the quantiles $b_1$ and $-b_2$?
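The blackboard step can be written out as below. The notation is an assumption made for this sketch: $G$ denotes the distribution function of $T_n - \theta$, taken to be continuous and strictly increasing so that its quantiles are unique.

```latex
\begin{align*}
P(T_n - b_1 \le \theta \le T_n + b_2)
  &= P(-b_2 \le T_n - \theta \le b_1) \\
  &= G(b_1) - G(-b_2) \\
  &= (1 - \alpha) - \alpha = 1 - 2\alpha
  \quad \text{if } b_1 = G^{-1}(1 - \alpha), \; -b_2 = G^{-1}(\alpha).
\end{align*}
```

With this choice the interval becomes $[T_n - G^{-1}(1 - \alpha), T_n - G^{-1}(\alpha)]$: the upper quantile of $G$ determines the lower endpoint and the lower quantile determines the upper endpoint.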
10
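Bootstrap confidence intervals (3)
Interval is $[T_n - b_1, T_n + b_2] = [T_n - G^{-1}(1 - \alpha), T_n - G^{-1}(\alpha)]$.
How to estimate the quantiles $b_1$ and $-b_2$ of the unknown distribution $G$ of $T_n - \theta$?
Estimate $G$ with $\tilde{G}$, the distribution of $T_n^* - T_n$: use bootstrap.
Gives estimate of the confidence interval:
$[T_n - \tilde{G}^{-1}(1 - \alpha), T_n - \tilde{G}^{-1}(\alpha)]$ (4.1)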
11
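Bootstrap confidence intervals (4)
Estimate of the confidence interval: $[T_n - \tilde{G}^{-1}(1 - \alpha), T_n - \tilde{G}^{-1}(\alpha)]$ (4.1)
In practice, determine in steps:
1. Estimate the unknown distribution $G$ of $T_n - \theta$ with $\tilde{G}$: use bootstrap. Same as before? No: for $T_n - \theta$ we need the bootstrap values $Z_{n,i}^* = T_{n,i}^* - T_n$, $i = 1, \dots, B$.
2. Estimate the quantiles of $\tilde{G}$ by the empirical quantiles of the bootstrap values $Z_{n,1}^*, \dots, Z_{n,B}^*$.
3. Bootstrap confidence interval:
$[T_n - Z^*_{([(1 - \alpha)B])}, T_n - Z^*_{([\alpha B])}]$ (4.2)
where $Z^*_{([\alpha B])}$ denotes the sample $\alpha$-quantile of the bootstrap values.
(You have to know this formula!!)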
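The three steps can be sketched in R as follows, under stated assumptions: the data `x` are simulated (hypothetical), $T_n$ is taken to be the sample mean, and the empirical bootstrap is used. Note that `quantile()` interpolates between order statistics by default, so its output can differ slightly from a single order statistic of the bootstrap values.

```r
set.seed(2)
x <- rexp(50, rate = 1)   # hypothetical data
alpha <- 0.05             # gives a (1 - 2 * alpha) * 100% = 90% interval
B <- 1000

tn <- mean(x)             # T_n, here the sample mean (illustrative choice)

# Step 1: bootstrap values Z* = T_n* - T_n
tstar <- replicate(B, mean(sample(x, replace = TRUE)))
zstar <- tstar - tn

# Step 2: empirical quantiles of the bootstrap values
q_upper <- unname(quantile(zstar, 1 - alpha))
q_lower <- unname(quantile(zstar, alpha))

# Step 3: the upper quantile is subtracted for the lower endpoint,
# the lower quantile for the upper endpoint
ci <- c(lower = tn - q_upper, upper = tn - q_lower)
ci
```

The crossing of quantiles in step 3 is the point most easily got wrong: the $(1 - \alpha)$-quantile of the $Z^*$ values gives the lower endpoint.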
12
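Bootstrap confidence intervals (5)
We just discussed:
Estimate of confidence interval: $[T_n - \tilde{G}^{-1}(1 - \alpha), T_n - \tilde{G}^{-1}(\alpha)]$ (4.1)
Corresponding bootstrap confidence interval: $[T_n - Z^*_{([(1 - \alpha)B])}, T_n - Z^*_{([\alpha B])}]$ (4.2)
Here $\tilde{G}$ is the bootstrap estimate of the distribution of $T_n - \theta$, and $Z^*_{([\alpha B])}$ is the sample $\alpha$-quantile of the bootstrap values $Z_n^* = T_n^* - T_n$.
This is the original bootstrap confidence interval, also called the reflection method. We will use this one!!
Other method: percentile method.
Estimate of confidence interval: $[T_n + \tilde{G}^{-1}(\alpha), T_n + \tilde{G}^{-1}(1 - \alpha)]$
Corresponding bootstrap confidence interval: $[T_n + Z^*_{([\alpha B])}, T_n + Z^*_{([(1 - \alpha)B])}]$
Only suitable if the distribution of $T_n - \theta$ is symmetric around 0. (Asymptotically the two methods give the same result.)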
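To see how the two methods differ in practice, a small R comparison may help. It is a sketch under assumptions: simulated, skewed data `x` (hypothetical), the sample variance as $T_n$, and the empirical bootstrap. With a skewed bootstrap distribution the two intervals are mirror images of each other around $T_n$.

```r
set.seed(3)
x <- rexp(40, rate = 1)   # hypothetical, skewed data
alpha <- 0.05
B <- 2000

tn <- var(x)              # T_n: sample variance (illustrative choice)
tstar <- replicate(B, var(sample(x, replace = TRUE)))
zstar <- tstar - tn       # bootstrap values Z* = T_n* - T_n

q_lo <- unname(quantile(zstar, alpha))
q_hi <- unname(quantile(zstar, 1 - alpha))

reflection <- c(lower = tn - q_hi, upper = tn - q_lo)  # reflection method
percentile <- c(lower = tn + q_lo, upper = tn + q_hi)  # percentile method

rbind(reflection, percentile)
```

The percentile interval coincides with the `alpha` and `1 - alpha` sample quantiles of `tstar` itself, which explains its name.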
13
Bootstrap confidence intervals (5)
How to obtain the (sample) $\alpha$-quantile $Z^*_{([\alpha B])}$?
R: if zstar contains the bootstrap values $Z_{n,1}^*, \dots, Z_{n,B}^*$:
> quantile(zstar, α)
Note: $T_n^*$ is always the same function of $X_1^*, \dots, X_n^*$ as $T_n$ is of $X_1, \dots, X_n$.
For two samples $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ the method is the same.
Example: if $T_{n,m} = \bar{X}_n - \bar{Y}_m$, then $T_{n,m}^* = \bar{X}_n^* - \bar{Y}_m^*$ and $Z_n^* = \bar{X}_n^* - \bar{Y}_m^* - (\bar{X}_n - \bar{Y}_m)$ (cf. Example 4.4 in Reader).
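A two-sample version might look as follows in R. This is a sketch, not the Reader's Example 4.4: both samples are simulated (hypothetical), and each sample is resampled separately with replacement, keeping its own size.

```r
set.seed(4)
x <- rnorm(30, mean = 5, sd = 2)   # hypothetical first sample, n = 30
y <- rnorm(40, mean = 4, sd = 2)   # hypothetical second sample, m = 40
alpha <- 0.05
B <- 1000

tnm <- mean(x) - mean(y)           # T_{n,m}: difference of sample means

# T*: the same function applied to the two bootstrap samples
tstar <- replicate(B, mean(sample(x, replace = TRUE)) -
                      mean(sample(y, replace = TRUE)))
zstar <- tstar - tnm               # Z* = T* - T_{n,m}

ci <- c(lower = tnm - unname(quantile(zstar, 1 - alpha)),
        upper = tnm - unname(quantile(zstar, alpha)))
ci
```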
14
4.4. Bootstrap Tests (1)
Remember last week's slide:
15
From lecture 3: Kolmogorov-Smirnov test (5)
Example
Data: y
$H_0$: $F$ is normal ← composite null hypothesis
$H_1$: $F$ is not normal
Test statistic: $D_{adj}$
R:
> ks.test(y,pnorm)
D = 0.6922, p-value = 6.661e-16
Incorrect: this is a test for $H_0$: $F = N(0,1)$ against $H_1$: $F \neq N(0,1)$.
> ks.test(y,pnorm,mean=mean(y),sd=sd(y))
D = 0.1081, p-value = 0.5655
Correct? Incorrect: this is a test for $H_0$: $F = N(3.62158, (3.04335)^2)$ against $H_1$: $F \neq N(3.62158, (3.04335)^2)$, using the mean and sd of y:
> mean(y)
[1] 3.62158
> sd(y)
[1] 3.043356
We have not used $D_{adj}$!! The p-value should be 0.126 (next week).
16
Bootstrap Tests (2)
Solve this with a bootstrap test! General idea on blackboard.
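One common way to set up such a bootstrap test is sketched below in R. It is not necessarily the blackboard version. Assumptions: the data `y` are simulated (hypothetical, not the lecture's `y`), the helper name `ks_adj` is made up for this sketch, and the null distribution of the statistic is approximated by simulating from the fitted normal distribution and re-estimating mean and sd in every simulated sample.

```r
set.seed(5)
y <- rnorm(40, mean = 3.6, sd = 3)   # hypothetical data

# KS statistic with mean and sd estimated from the same sample
# (ks_adj is a hypothetical helper name).
ks_adj <- function(v) {
  unname(ks.test(v, "pnorm", mean = mean(v), sd = sd(v))$statistic)
}

d_obs <- ks_adj(y)                   # observed value of the statistic
n <- length(y)
B <- 1000

# Simulate under H0: draw from the fitted normal distribution and
# recompute the statistic, with parameters re-estimated each time.
dstar <- replicate(B, ks_adj(rnorm(n, mean = mean(y), sd = sd(y))))

# Bootstrap p-value: fraction of simulated values at least as large
# as the observed one.
p_boot <- mean(dstar >= d_obs)
p_boot
```

Re-estimating the parameters inside `ks_adj` for every simulated sample is the essential step: it makes the simulated statistics comparable to the observed one, which the plain `ks.test` p-value with plugged-in estimates does not account for.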
17
Bootstrap Tests (3)
Example
18
Bootstrap Tests (4)
Example: dprec
> hist(dprec, prob=TRUE)
> qqnorm(dprec)
19
Bootstrap Tests (5): Example
20
Bootstrap Tests (6): Example
21
Bootstrap Tests (7): Example
22
Bootstrap Tests (8): Example
23
Bootstrap Tests (9): Example
24
Recap
Bootstrap
4.3. Bootstrap confidence intervals
4.4. Bootstrap tests
25
Bootstrap
The end