Bootstrap
Chingchun Huang (黃敬群), Vision Lab, NCTU, 2008


Slide 2: Introduction
A data-based simulation method for statistical inference:
– finding estimators of the parameter of interest,
– confidence intervals for the parameter of interest.

Slide 3: An example
Two statistics defined for a random variable:
– Average: the sample mean.
– Standard error: the standard deviation of the sample means.
To calculate the two statistics, carry out the measurement “many” times. Two observations follow: the standard error decreases as N increases, so the sample mean becomes more reliable as N increases.
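These two statistics can be sketched in a few lines of Python. The Gaussian measurements, their parameters, and the sample sizes below are illustrative assumptions, not values from the slides:

```python
import math
import random
import statistics

random.seed(0)

def mean_and_se(sample):
    """Sample mean and its estimated standard error s / sqrt(N)."""
    n = len(sample)
    return statistics.mean(sample), statistics.stdev(sample) / math.sqrt(n)

# Two sets of repeated measurements of the same quantity (true mean 5.0).
small = [random.gauss(5.0, 2.0) for _ in range(25)]
large = [random.gauss(5.0, 2.0) for _ in range(2500)]

m_small, se_small = mean_and_se(small)
m_large, se_large = mean_and_se(large)
# The standard error shrinks as N grows, so m_large is the more reliable estimate.
```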

Slide 4: Central limit theorem
Averages taken from any distribution (your experimental data) will have a normal distribution. The error of such a statistic will decrease slowly as the number of observations increases.

Slide 5: Normal distribution
[Figures: averages drawn from a normal distribution, and averages of a second distribution; labels lost in transcription.]

Slide 6: Uniform distribution
[Figure: averages of a uniform distribution.]

Slide 7: Consequences of the central limit theorem
Should we believe a measurement of “Average”? Nobody tells you how big the sample has to be. And what about objects other than the “Average”? The bootstrap is the technique to the rescue.

Slide 8: Basic idea of the bootstrap
Originally, from some list of data, one computes an object (e.g. a statistic). Create an artificial list by randomly drawing elements from that list; some elements will be picked more than once (nonparametric mode; a parametric mode is covered later). Compute a new object. Repeat many times and look at the distribution of these objects.
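The basic idea above can be sketched in a few lines of Python. The exponential data and B = 1000 are illustrative assumptions, not from the slides:

```python
import random
import statistics

random.seed(1)
data = [random.expovariate(1.0) for _ in range(50)]  # the original "list of data"

def bootstrap_distribution(data, statistic, B=1000):
    """Draw B artificial lists (with replacement) and compute the object on each."""
    n = len(data)
    return [statistic([random.choice(data) for _ in range(n)]) for _ in range(B)]

boot_means = bootstrap_distribution(data, statistics.mean)
# The spread of the replicated objects estimates the statistic's variability.
boot_se = statistics.stdev(boot_means)
```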

Slide 9: A simple example
Data are available comparing grades before and after leaving graduate school. There is some linear correlation between the grades: ρ = 0.776. But how reliable is this result (ρ = 0.776)?
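A sketch of how the reliability of such a correlation can be probed with the bootstrap. The paired “grades” below are synthetic stand-ins, not the slide's data set; the essential point is that entire pairs are resampled, never the two columns separately:

```python
import math
import random
import statistics

random.seed(2)
# Synthetic paired "grades" (illustrative only; not the slide's actual data).
before = [random.gauss(70, 10) for _ in range(15)]
after = [0.8 * b + random.gauss(15, 5) for b in before]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

pairs = list(zip(before, after))
rho_hat = corr(before, after)

boot_rhos = []
for _ in range(1000):
    resample = [random.choice(pairs) for _ in pairs]  # resample pairs, not columns
    xs, ys = zip(*resample)
    boot_rhos.append(corr(xs, ys))

se_rho = statistics.stdev(boot_rhos)  # how much rho_hat would vary across samples
```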

Slides 10-13: A simple example (continued)
[Figures only; content lost in transcription.]

Slide 14: Confidence intervals
Consider a situation similar to the one before. The parameter of interest is θ (e.g. the mean), and θ̂ is an estimator of θ based on the sample. We are interested in finding a confidence interval for the parameter.

Slide 15: The percentile algorithm
– Input the level α for the confidence interval.
– Generate B bootstrap samples.
– Compute θ̂*(b) for b = 1, …, B.
– Arrange the θ̂*(b) values in increasing order.
– Compute the (α/2) and (1 − α/2) percentiles of the ordered values; the C.I. is given by that pair of percentiles.
[Table: percentiles of θ̂* at 5%, 10%, 16%, 50%, 84%, 90%, 95%; values lost in transcription.]
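The percentile algorithm might be sketched as follows. The Gaussian data, α = 0.05, and B = 2000 are illustrative choices:

```python
import random
import statistics

random.seed(3)
data = [random.gauss(10.0, 3.0) for _ in range(40)]

def percentile_ci(data, statistic, alpha=0.05, B=2000):
    """Percentile bootstrap CI: order the B replicates and read off quantiles."""
    n = len(data)
    reps = sorted(statistic([random.choice(data) for _ in range(n)])
                  for _ in range(B))
    lo = reps[int(B * alpha / 2)]            # (alpha/2) percentile
    hi = reps[int(B * (1 - alpha / 2)) - 1]  # (1 - alpha/2) percentile
    return lo, hi

lo, hi = percentile_ci(data, statistics.mean)
```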

Slide 16: How many bootstraps?
No clear answer to this. Rule of thumb: try it 100 times, then 1000 times, and see whether your answers have changed by much.
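One way to apply this rule of thumb in code (an illustrative Python sketch; the data and seed are arbitrary):

```python
import random
import statistics

random.seed(4)
data = [random.gauss(0.0, 1.0) for _ in range(30)]

def boot_se(data, B):
    """Bootstrap standard error of the mean from B resamples."""
    n = len(data)
    means = [statistics.mean([random.choice(data) for _ in range(n)])
             for _ in range(B)]
    return statistics.stdev(means)

se_100 = boot_se(data, 100)    # try it 100 times...
se_1000 = boot_se(data, 1000)  # ...then 1000 times
relative_change = abs(se_1000 - se_100) / se_1000
# If relative_change is small, B = 100 was already adequate here.
```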

Slide 17: How many bootstraps?
[Table: bootstrap standard error as a function of B; values lost in transcription.]

Slide 18: Convergence
This histogram shows the distribution of the correlation coefficient over the bootstrap samples, here for B = 200 and B = 500.

Slide 19: Convergence (continued)
B = 1000 and B = 2000.

Slide 20: Convergence (continued)
B = 3000 and B = 4000. Now it can be seen that the sampling distributions of the correlation coefficient are more or less identical.

Slide 21: Convergence (continued)
The graph shows the similarity between the bootstrap distribution and direct enumeration of random samples from the empirical distribution.

Slide 22: Is it reliable?
Observations: good agreement for normal (Gaussian) distributions; skewed distributions tend to be more problematic, particularly in the tails. A tip: for now, nobody is going to shoot you down for using it.

Slide 23: Schematic representation of the bootstrap procedure
[Figure.]

Slide 24: Bootstrap
The bootstrap can be used either nonparametrically or parametrically. In nonparametric mode, it avoids restrictive and sometimes dangerous parametric assumptions about the form of the underlying population. In parametric mode, it can provide more accurate estimates of errors than traditional methods.

Slide 25: Parametric bootstrap
[Diagram: the “real world”, where a probability model P yields the observed sample x and the statistic of interest, versus the “bootstrap world”, where the estimated model yields bootstrap samples and bootstrap replications of the statistic.]

Slide 26: Bootstrap
The technique was extended, modified and refined to handle a wide variety of problems, including:
– (1) confidence intervals and hypothesis tests,
– (2) linear and nonlinear regression,
– (3) time series analysis and other problems.

Slide 27: Example: one-dimensional smoothing
Fit a cubic spline to N = 50 training data points.

Slide 28: The bootstrap and maximum likelihood method: least squares
Write the fitted curve as μ(x) = Σ_j β_j h_j(x), where the h_j(x) are the cubic-spline basis functions. The least-squares estimate is β̂ = (HᵀH)⁻¹Hᵀy, where H is the matrix with entries H_ij = h_j(x_i).

Slide 29: The bootstrap and maximum likelihood method: nonparametric bootstrap
Repeat B = 200 times: draw a dataset of N = 50 with replacement from the training data z_i = (x_i, y_i), and fit a cubic spline. Construct a 95% pointwise confidence interval: at each x_i, compute the mean and find the 2.5% and 97.5% percentiles.
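A sketch of this nonparametric bootstrap band in Python with NumPy. A cubic polynomial fitted by numpy.polyfit stands in for the cubic spline, and the sine-plus-noise training data are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic training data of size N = 50 (stand-in for the slide's example).
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 50)

B = 200
grid = np.linspace(0.0, 1.0, 20)
curves = np.empty((B, grid.size))
for b in range(B):
    idx = rng.integers(0, x.size, x.size)   # draw the pairs z_i with replacement
    coef = np.polyfit(x[idx], y[idx], 3)    # cubic polynomial as spline stand-in
    curves[b] = np.polyval(coef, grid)

# 95% pointwise band: 2.5% and 97.5% percentiles at each grid point.
lower = np.percentile(curves, 2.5, axis=0)
upper = np.percentile(curves, 97.5, axis=0)
```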

Slide 30: The bootstrap and maximum likelihood method: parametric bootstrap
We assume that the model errors are Gaussian. Fit a cubic spline to the training data z_i = (x_i, y_i) and estimate σ̂ from the residuals. Repeat B = 200 times: simulate new responses y_i* = μ̂(x_i) + ε_i* with ε_i* ~ N(0, σ̂²), giving z_i* = (x_i, y_i*), and fit a cubic spline on z_i*. Construct a 95% pointwise confidence interval: at each x_i, compute the mean and find the 2.5% and 97.5% percentiles.
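The parametric variant, under the same stand-in assumptions as before (a cubic polynomial instead of a spline, synthetic data): instead of resampling the observed pairs, new responses are simulated from the fitted model with Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 50)

# Fit once and estimate the Gaussian error scale from the residuals.
coef_hat = np.polyfit(x, y, 3)               # cubic polynomial as spline stand-in
resid = y - np.polyval(coef_hat, x)
sigma_hat = resid.std(ddof=4)                # 4 fitted coefficients

B = 200
grid = np.linspace(0.0, 1.0, 20)
curves = np.empty((B, grid.size))
for b in range(B):
    # Simulate new responses y* at the original x's, then refit.
    y_star = np.polyval(coef_hat, x) + rng.normal(0.0, sigma_hat, x.size)
    curves[b] = np.polyval(np.polyfit(x, y_star, 3), grid)

lower = np.percentile(curves, 2.5, axis=0)
upper = np.percentile(curves, 97.5, axis=0)
```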

Slide 31: The bootstrap and maximum likelihood method: conclusion
Least squares = parametric bootstrap as B → ∞ (only because of the Gaussian errors).

Slide 32: Some notes
The bootstrap is:
– A computer-based method for assigning measures of accuracy to statistical estimates.
– Very simple in its basic idea, which goes back at least two centuries.
– Not a way of reducing the error! It only tries to estimate it.
– Dependent only on the bootstrap samples, not on the underlying distribution.

Slide 33: A general data set-up
We have dealt with the standard error and the confidence interval, under the assumption that the distribution is either unknown or very complicated. The situation can be more general, as in regression, sometimes using maximum-likelihood estimation.

Slide 34: Conclusion
The bootstrap allows the data analyst to assess the statistical accuracy of complicated procedures by exploiting the power of the computer. The use of the bootstrap either relieves the analyst from having to do complex mathematical derivations, or provides an answer where no analytical answer can be obtained.

Slide 35: Addendum: the jackknife
The jackknife is a special kind of bootstrap: each jackknife subsample has all but one of the original elements of the list. For example, if the original list has 10 elements, then there are 10 jackknife subsamples.
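A minimal jackknife sketch (the ten data values below are made up). For the sample mean, the jackknife standard error reduces exactly to the classical s/√n:

```python
import statistics

data = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.9, 2.2, 3.0]  # 10 elements

# One jackknife subsample per element: the list with that element left out.
jack_samples = [data[:i] + data[i + 1:] for i in range(len(data))]
jack_means = [statistics.mean(s) for s in jack_samples]

# Jackknife standard error: sqrt((n-1)/n * sum of squared deviations).
n = len(data)
center = statistics.mean(jack_means)
jack_se = ((n - 1) / n * sum((m - center) ** 2 for m in jack_means)) ** 0.5
```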

Slide 36: Introduction (continued)
Definition of Efron's nonparametric bootstrap: given a sample of n independent, identically distributed (i.i.d.) observations X1, X2, …, Xn from a distribution F, and a parameter θ of F with a real-valued estimator θ̂(X1, X2, …, Xn), the bootstrap estimates the accuracy of the estimator by replacing F with Fn, the empirical distribution, which places probability mass 1/n at each observation Xi.

Slide 37: Introduction (continued)
Let X1*, X2*, …, Xn* be a bootstrap sample, that is, a sample of size n taken with replacement from Fn. The bootstrap estimates the variance of θ̂(X1, X2, …, Xn) by computing or approximating the variance of θ̂* = θ̂(X1*, X2*, …, Xn*).
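Sampling from Fn is exactly sampling the observed values with replacement, so the variance estimate can be sketched as below. The Gaussian data and the choice of the median as the estimator are illustrative:

```python
import random
import statistics

random.seed(7)
X = [random.gauss(0.0, 1.0) for _ in range(60)]  # i.i.d. observations from F

def theta(sample):
    """Real-valued estimator of interest; the median is an illustrative choice."""
    return statistics.median(sample)

# Sampling from F_n (mass 1/n at each X_i) = sampling X with replacement.
B = 1000
theta_stars = [theta([random.choice(X) for _ in X]) for _ in range(B)]
var_boot = statistics.variance(theta_stars)  # bootstrap variance of the estimator
```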

Slide 38: Introduction (continued)
The bootstrap is similar to earlier techniques which are also called resampling methods:
– (1) jackknife,
– (2) cross-validation,
– (3) delta method,
– (4) permutation methods, and
– (5) subsampling.

Slide 39: Bootstrap remedies
In the past decade, for many of the problems where the bootstrap is inconsistent, researchers have found remedies: modified bootstrap solutions that are consistent. For both problems described thus far, a simple procedure called the m-out-of-n bootstrap has been shown to lead to consistent estimates.

Slide 40: The m-out-of-n bootstrap
This idea was proposed by Bickel and Ren (1996) for handling doubly censored data. Instead of sampling n times with replacement from a sample of size n, they suggest doing so only m times, where m is much less than n. To get the consistency results, both m and n need to get large, but at different rates: we need m = o(n), that is, m/n → 0 as m and n both → ∞. This method leads to consistent bootstrap estimates in many cases where the ordinary bootstrap has problems, particularly (1) the mean with infinite variance and (2) extreme value distributions. Don't know why.
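The m-out-of-n resampling step can be sketched as follows. The Pareto data, the choice m = √n, and B = 500 are illustrative assumptions; Pareto(1.5) has infinite variance, the setting where the ordinary bootstrap of the mean is known to fail:

```python
import random
import statistics

random.seed(8)
n = 2000
# Heavy-tailed Pareto(1.5) sample: infinite variance, where the plain
# n-out-of-n bootstrap of the mean is known to be inconsistent.
X = [random.paretovariate(1.5) for _ in range(n)]

def m_out_of_n(data, m, B=500):
    """Each replicate resamples only m points, with m much smaller than n."""
    return [statistics.mean([random.choice(data) for _ in range(m)])
            for _ in range(B)]

m = int(n ** 0.5)          # one common choice satisfying m/n -> 0
reps = m_out_of_n(X, m)
```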

Slide 41: Examples where the bootstrap fails
Athreya (1987) shows that the bootstrap estimate of the sample mean is inconsistent when the population distribution has infinite variance. Angus (1993) provides similar inconsistency results for the maximum and minimum of a sequence of independent, identically distributed observations.