Techniques for the Computing-Capable Statistician


1 Techniques for the Computing-Capable Statistician
BOOTSTRAPS: Techniques for the Computing-Capable Statistician
8/2/2019 Think hard about statistical properties of estimators.

2 THE BOOK
An Introduction to the Bootstrap, by Bradley Efron and Robert J. Tibshirani. Chapman & Hall/CRC, Monographs on Statistics and Applied Probability 57.

3 WHEN PROBABILITY THEORY WORKS...
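...it works very well.
sums of iid random variables ~ Normal (approximately, for large n, by the central limit theorem)
min of iid random variables ~ Exponential (exact when the variables are themselves iid Exponential)
sums of squared standard Normals ~ χ²
ratios of χ² variables ~ F (independent χ² variables, each divided by its degrees of freedom)
See handout on transforms, etc.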

4 STATISTICS
Estimate a quantity: θ-hat estimates θ
Predict the variability of the estimate
  this involves predicting the distribution (form, parameters) of the estimator θ-hat
X-bar and s²-hat have known distributions (for Normal samples)
  X-bar ~ Normal
  s²-hat ~ χ² (after scaling: (n − 1) s²-hat / σ² has a χ² distribution with n − 1 degrees of freedom)

5 OTHER STATISTICS
Median (an example of)
  quartile estimates
  order statistics
Ratios, transforms, non-polynomial functions
None of these has a convenient known sampling distribution in general
How can you assess the variability of such an estimator?

6 PRELIMINARY DEFINITION AND NOTATION
Given samples X1, X2, ..., Xn
X(i) is the i-th smallest sample and is called the i-th order statistic
Xα = X(i) such that αn ≈ i is called the α-th percentile (quantile)
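
The notation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `order_statistic` and `alpha_quantile` and the sample values are assumptions, and the index i is taken as αn rounded to the nearest integer and clamped to 1..n.

```python
def order_statistic(samples, i):
    """Return X(i), the i-th smallest sample (1-indexed, as on the slide)."""
    return sorted(samples)[i - 1]


def alpha_quantile(samples, alpha):
    """Return X_alpha = X(i), where i is alpha * n rounded and clamped to 1..n.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    n = len(samples)
    i = min(max(round(alpha * n), 1), n)
    return order_statistic(samples, i)


# Made-up sample values, for illustration only.
data = [13.9, 10.0, 11.2, 12.8, 11.0, 11.6, 12.9, 13.0]
print(order_statistic(data, 1))   # smallest sample: 10.0
print(alpha_quantile(data, 0.5))  # X(4) of 8 samples: 11.6
```

With n = 1000, `alpha_quantile(samples, 0.025)` picks X(25) and `alpha_quantile(samples, 0.975)` picks X(975).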

7 EXAMPLE
n = 1000 samples, in sorted order:
X(1) = 10
X(2) = 11
X(3) = 11.2
X(4) = 11.6
...
X(25) = X0.025
...
X(975) = X0.975
...
X(997) = 12.8
X(998) = 12.9
X(999) = 13.0
X(1000) = 13.9

8 EMPIRICAL CONFIDENCE INTERVAL
Empirical confidence interval for X is (X0.025, X0.975) = (X(25), X(975))
  this is an interval for X itself, not for the mean or median, etc.
Can use all 1000 samples to estimate the median: M = X(500) = 11.9
  on its own, this point estimate has NO predictive value
How accurate is this estimate?

9 MORE VERNACULAR
Call F the underlying distribution of the phenomenon being studied
  F(x) = P(X <= x)
Call F-hat the empirical (observed example) distribution of F
  F-hat = {X1, X2, ..., Xn}, weighted 1/n each
BOOTSTRAPPING: use F-hat as a sampling surrogate for F
  don't oversell the resulting reliability of estimates
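
As a rough sketch of what "F-hat weighted 1/n each" means in practice, the following Python builds the empirical distribution function and draws from it. The names `make_f_hat` and `draw_from_f_hat` and the observed values are assumptions introduced for illustration only.

```python
import bisect
import random


def make_f_hat(samples):
    """Return F-hat as a function: F-hat(x) = (number of samples <= x) / n."""
    xs = sorted(samples)
    n = len(xs)

    def f_hat(x):
        # bisect_right gives the count of sorted samples that are <= x.
        return bisect.bisect_right(xs, x) / n

    return f_hat


def draw_from_f_hat(samples, rng):
    """Draw one value from F-hat: each observed sample has probability 1/n."""
    return rng.choice(samples)


# Made-up observations, for illustration only.
observed = [10.0, 11.0, 11.2, 11.6]
f_hat = make_f_hat(observed)
rng = random.Random(0)  # seeded so the run is repeatable

print(f_hat(11.0))                     # 2 of 4 samples are <= 11.0, so 0.5
print(draw_from_f_hat(observed, rng))  # one of the observed values
```

Sampling from F-hat can only ever return values that were actually observed, which is one reason not to oversell the results.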

10 SMOOTHED F-HAT
[Figure: smoothed F-hat; image not available in this transcript]

11 BOOTSTRAPPING
Given samples F-hat = {X1, X2, ..., Xn}
b-th bootstrap sample x*(b): sample n times from X1, X2, ..., Xn with replacement
Let m*(b) be the median of the b-th set of samples
m*(1), m*(2), ..., m*(B) is a sample of medians
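
The resampling procedure described on this slide might look like the following in Python. It is a minimal sketch using only the standard library; the function name `bootstrap_medians`, the seed, and the base sample values are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_medians(samples, B, rng):
    """Return [m*(1), ..., m*(B)]: the medians of B bootstrap samples.

    Each bootstrap sample draws n values from `samples` with replacement.
    """
    n = len(samples)
    medians = []
    for _ in range(B):
        resample = rng.choices(samples, k=n)  # n draws with replacement
        medians.append(statistics.median(resample))
    return medians


# Made-up base sample, for illustration only.
base = [10.0, 11.0, 11.2, 11.6, 11.9, 12.8, 12.9, 13.0, 13.9]
rng = random.Random(0)  # seeded so the run is repeatable
medians = bootstrap_medians(base, B=500, rng=rng)
print(len(medians), min(medians), max(medians))
```

Every bootstrap median lies between the smallest and largest base sample, since the resamples contain only observed values.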

12 THE BASE SAMPLE FORMS THE POPULATION FOR THE BOOTSTRAP SAMPLE
REAL WORLD: F → X1, X2, ..., Xn → Mbase (usual estimate)
BOOTSTRAP WORLD: empirical F (F-hat) → X*1, X*2, ..., X*n, ... → bootstrap estimate of Mbase's distribution

13 KEY EXCEPTION
Are m*(1), m*(2), ..., m*(B) independent samples of the median?
Independent with respect to F-hat, but not with respect to F

14 Mbase has nonparametric confidence interval (m*0.025, m*0.975)
Standard error of Mbase is estimated as the standard deviation of m*(1), m*(2), ..., m*(B)
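
A hedged sketch of both quantities on this slide, in Python: the percentile interval is read off the sorted bootstrap medians, and the standard error is their sample standard deviation. The function names, the seed, and the simulated base sample are assumptions. Note that `statistics.stdev` centers on the mean of the bootstrap medians, which is one common convention and may differ slightly from centering on Mbase.

```python
import random
import statistics


def percentile_interval(replicates, alpha=0.05):
    """Return (m*_{alpha/2}, m*_{1-alpha/2}) from the sorted bootstrap replicates."""
    ordered = sorted(replicates)
    B = len(ordered)
    lo = max(round((alpha / 2) * B), 1)                # 1-indexed order statistic
    hi = min(max(round((1 - alpha / 2) * B), 1), B)
    return ordered[lo - 1], ordered[hi - 1]


def bootstrap_standard_error(replicates):
    """Estimate the standard error as the standard deviation of the replicates."""
    return statistics.stdev(replicates)


rng = random.Random(1)  # seeded so the run is repeatable
# Simulated base sample (an assumption; the slides do not supply data).
base = [rng.gauss(100, 15) for _ in range(100)]
m_base = statistics.median(base)

# B = 500 bootstrap medians, each from n draws with replacement.
medians = [statistics.median(rng.choices(base, k=len(base))) for _ in range(500)]

low, high = percentile_interval(medians)
se = bootstrap_standard_error(medians)
print(m_base, (low, high), se)
```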

15 PRACTICAL APPLICATION
Bootstrap samples are treated as independent
B ~ 500
Practical for ANY sample statistic
Spreadsheet Bootstrap.xls estimates the Median and IQR (X0.75 − X0.25) for IQ scores
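
To illustrate "practical for ANY sample statistic", here is a small Python sketch in which the statistic is passed in as a function. It is not a reproduction of Bootstrap.xls: the `bootstrap` and `iqr` helpers, the quantile indexing rule, and the simulated scores are all assumptions made for this example.

```python
import random
import statistics


def iqr(samples):
    """Interquartile range X0.75 - X0.25, using order statistics X(round(alpha * n))."""
    xs = sorted(samples)
    n = len(xs)
    q1 = xs[max(round(0.25 * n), 1) - 1]
    q3 = xs[max(round(0.75 * n), 1) - 1]
    return q3 - q1


def bootstrap(samples, statistic, B, rng):
    """Apply `statistic` to B bootstrap samples (n draws with replacement each)."""
    n = len(samples)
    return [statistic(rng.choices(samples, k=n)) for _ in range(B)]


rng = random.Random(2)  # seeded so the run is repeatable
# Simulated scores (an assumption; not the data behind the spreadsheet).
scores = [rng.gauss(100, 15) for _ in range(200)]

for name, stat in [("median", statistics.median), ("IQR", iqr)]:
    replicates = bootstrap(scores, stat, B=500, rng=rng)
    # Point estimate from the base sample, then its bootstrap standard error.
    print(name, stat(scores), statistics.stdev(replicates))
```

The same `bootstrap` function works unchanged for a ratio, a transform, or any other function of the sample.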

16 IS BOOTSTRAPPING CHEATING?
Example: 100 real datapoints, 200 bootstrap samples
Statistic M calculated for each bootstrap sample
Standard error of Mbase, by the standard (non-bootstrap) formula, is $\sqrt{\sum_{i=1}^{200} (M^*_i - M_{base})^2 / 199}$

17 IS BOOTSTRAPPING CHEATING?
If we had 100 x 200 = 20,000 independent samples
  one large pool to estimate Mbase
  Standard error of M is ~ $\sqrt{\sum_{i=1}^{20{,}000} (M_i - M_{base})^2 / 19{,}999}$
As the number of bootstrap samples increases, the standard error estimate stabilizes
As the number of independent samples increases, the standard error estimate converges to 0!
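
The contrast on this slide can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the author's calculation: the data are simulated from a standard Normal, the statistic is the median, and the standard error of the large pool's median is itself estimated by bootstrapping the pool (with a smaller B to keep the run short).

```python
import random
import statistics


def bootstrap_se_of_median(samples, B, rng):
    """Standard deviation of B bootstrap medians of `samples`."""
    n = len(samples)
    medians = [statistics.median(rng.choices(samples, k=n)) for _ in range(B)]
    return statistics.stdev(medians)


rng = random.Random(3)  # seeded so the run is repeatable
# Simulated data (an assumption): one pool of 20,000 independent samples.
pool = [rng.gauss(0, 1) for _ in range(20_000)]
base = pool[:100]  # the 100 "real" datapoints

# More bootstrap samples from the SAME 100 points: the estimate stabilizes.
se_b200 = bootstrap_se_of_median(base, 200, rng)
se_b2000 = bootstrap_se_of_median(base, 2000, rng)

# More independent data: the standard error itself shrinks.
se_pool = bootstrap_se_of_median(pool, 50, rng)

print("n=100,    B=200 :", se_b200)
print("n=100,    B=2000:", se_b2000)
print("n=20,000, B=50  :", se_pool)
```

Increasing B refines the estimate of a fixed quantity; only new independent data reduces that quantity. This is why bootstrapping is not "cheating": it does not manufacture information.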

18 SUMMARY
Bootstrapping allows us to estimate the variability of sample statistics where the statistic's probability distribution is unknown.

