Resampling techniques


Resampling techniques
- Why resampling?
- Jackknife
- Cross-validation
- Bootstrap
- Examples of application of the bootstrap

Why resampling?
One of the purposes of statistics is to estimate parameters and their reliability. Since estimators are functions of the sample points, they are random variables. If we could find the distribution of such a random variable (the sampling distribution of the statistic), we could assess the reliability of the estimator. Unfortunately, apart from the simplest cases, sampling distributions are not easy to derive. Several analytical techniques exist to approximate them, including Edgeworth series, Laplace approximations and saddle-point approximations; these give analytical forms for the approximate distributions. With the advent of computers, computationally intensive methods have emerged, and in many cases they work satisfactorily.
Examples of the simplest cases, where the sampling distribution is known, include:
- The sample mean: when the sample comes from a normally distributed population, the sample mean is normally distributed with mean equal to the population mean and variance equal to the population variance divided by the sample size, provided the population variance is known. If the population variance is not known, the variance of the sample mean is estimated by the sample variance divided by n.
- The sample variance: (n-1)s^2/sigma^2 has a chi-squared distribution with n-1 degrees of freedom, again provided the population distribution is normal and the sample points are independent.
- The t statistic: sqrt(n) times the difference between the sample mean and the population mean, divided by the sample standard deviation, has a t distribution with n-1 degrees of freedom, again under normality and independence.
- For two independent samples from normal populations, the ratio of the two sample variances (each scaled by its population variance) has an F distribution.

Resampling techniques
Three popular computer-intensive resampling techniques are:
- Jackknife. A useful tool for bias removal. It may work well for medium and large samples.
- Cross-validation. A very useful technique for model selection. It helps to choose the "best" model among those under consideration.
- Bootstrap. Perhaps the most important resampling technique. It can reduce bias and estimate the variance of an estimator. Moreover, it can give the distribution of the statistic under consideration, which can be used for a wide variety of purposes such as interval estimation and hypothesis testing.

Jackknife
The jackknife is used for bias removal. As we know, the mean-square error of an estimator equals the square of its bias plus its variance. If the bias is much larger than the variance then, under some circumstances, the jackknife can be used.
Description of the jackknife: assume we have a sample of size n. We estimate the statistic of interest using all the data, giving t_n. Then, removing one point at a time, we estimate t_{n-1,i}, where the subscripts indicate the size of the reduced sample and the index of the removed point. The new (jackknife) estimator is
  t_jack = n*t_n - (n-1)*t_bar,   where t_bar = (1/n) * sum_i t_{n-1,i}.
If the order of the bias of t_n is O(n^-1), then after the jackknife the order of the bias becomes O(n^-2). The variance is estimated using
  var_jack = ((n-1)/n) * sum_i (t_{n-1,i} - t_bar)^2.
This procedure can be applied iteratively, i.e. the jackknife can be applied again to the new estimator. The first application of the jackknife can reduce bias without changing the variance of the estimator, but second and higher-order applications can, in general, increase the variance.
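The procedure above can be expressed as a short, generic routine. The following is a minimal R sketch (illustrative only, not the course code); the helper name jackknife() and its interface are assumptions.

# Minimal jackknife sketch: leave-one-out estimates, bias-corrected estimator
# and jackknife variance for an arbitrary statistic 'stat'.
jackknife <- function(x, stat = mean) {
  n <- length(x)
  t_n <- stat(x)                                        # estimate from the full sample
  t_loo <- sapply(seq_len(n), function(i) stat(x[-i]))  # leave-one-out estimates t_{n-1,i}
  t_bar <- mean(t_loo)
  list(estimate = n * t_n - (n - 1) * t_bar,            # jackknife (bias-corrected) estimator
       variance = (n - 1) / n * sum((t_loo - t_bar)^2)) # jackknife variance estimate
}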

Jackknife: An example
Let us take a data set of size 12 and perform the jackknife for the mean value.

Full sample and its mean:
 0) 368 390 379 260 404 318 352 359 216 222 283 332   323.5833

Jackknife (leave-one-out) samples and their means:
 1) 390 379 260 404 318 352 359 216 222 283 332   319.5455
 2) 368 379 260 404 318 352 359 216 222 283 332   317.5455
 3) 368 390 260 404 318 352 359 216 222 283 332   318.5455
 4) 368 390 379 404 318 352 359 216 222 283 332   329.3636
 5) 368 390 379 260 318 352 359 216 222 283 332   316.2727
 6) 368 390 379 260 404 352 359 216 222 283 332   324.0909
 7) 368 390 379 260 404 318 359 216 222 283 332   321.0000
 8) 368 390 379 260 404 318 352 216 222 283 332   320.3636
 9) 368 390 379 260 404 318 352 359 222 283 332   333.3636
10) 368 390 379 260 404 318 352 359 216 283 332   332.8182
11) 368 390 379 260 404 318 352 359 216 222 332   327.2727
12) 368 390 379 260 404 318 352 359 216 222 283   322.8182

t_jack = 12*323.5833 - 11*mean(t_{11,i}) = 323.5833. For the mean the jackknife changes nothing: the sample mean is already an unbiased estimator.
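With the hypothetical jackknife() helper sketched above, the example can be reproduced in a few lines:

x <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)
jackknife(x, mean)$estimate   # 323.5833, identical to the sample mean
jackknife(x, mean)$variance   # jackknife estimate of the variance of the mean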

Cross-validation
Cross-validation is a resampling technique used to combat overfitting. Consider a least-squares setting. Assume we have a sample of size n, y = (y1, y2, ..., yn), and we want to estimate the parameters θ = (θ1, θ2, ..., θm). Assume further that the mean value of the observations is a function of these parameters (we may not know the form of this function), and postulate that the function has the form g. Then we can find values of the parameters by least squares, minimising
  h(θ) = sum over i = 1..n of (y_i - g(x_i; θ))^2,
where the x_i are the rows of a fixed (design) matrix X. After minimising h we have values of the parameters and therefore a complete definition of the function. The form of g defines the model we want to use, and we may consider several candidate forms. Obviously, with more parameters the fit will be "better". The question is what would happen with new observations. Suppose we have new observations (y_{n+1}, ..., y_{n+l}). Can our function predict them? Which function predicts best? Using the estimated parameter values we can compute the prediction error
  PE = sum over i = n+1..n+l of (y_i - g(x_i; θ̂))^2.
The function g that gives the smallest PE has the higher predictive power. A model that gives a smaller h but a larger PE is an overfitted model.
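A minimal R sketch of this idea (simulated data; all names are illustrative): a high-degree polynomial attains a smaller in-sample sum of squares h than a moderate one, but typically a larger prediction error PE on new observations.

# Compare in-sample fit (h) with prediction error on new data (PE).
set.seed(1)
x_tr  <- seq(0, 1, length.out = 20)
y_tr  <- sin(2 * pi * x_tr) + rnorm(20, sd = 0.3)        # training sample
x_new <- runif(50)
y_new <- sin(2 * pi * x_new) + rnorm(50, sd = 0.3)       # new observations

fit3  <- lm(y_tr ~ poly(x_tr, 3))                        # moderate model
fit12 <- lm(y_tr ~ poly(x_tr, 12))                       # overfitted model

h  <- function(fit) sum(residuals(fit)^2)                # in-sample sum of squares
PE <- function(fit) sum((y_new - predict(fit, data.frame(x_tr = x_new)))^2)

c(h(fit3),  h(fit12))    # h is smaller for the degree-12 fit
c(PE(fit3), PE(fit12))   # PE is typically larger for the degree-12 fit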

Cross-validation: Cont.
If we only have one sample of observations, can we still use it to choose among the candidate models? Cross-validation attempts to reduce overfitting and thus helps model selection.
Description of cross-validation (a code sketch follows below):
1. Take the sample of size n and divide it into K roughly equal-sized parts.
2. For the k-th part, estimate the parameters using the other K-1 parts, excluding the k-th part.
3. Calculate the prediction error on the k-th part.
4. Repeat this for all k = 1, 2, ..., K and combine all prediction errors to obtain the cross-validation prediction error.
If K = n we have the leave-one-out cross-validation technique. Let us denote the estimate at the k-th step by θ̂^(-k) (a vector of parameters), the k-th subset of the sample by A_k, and the number of points in this subset by N_k. Then the prediction error per observation is
  CV = (1/K) * sum over k = 1..K of (1/N_k) * sum over i in A_k of (y_i - g(x_i; θ̂^(-k)))^2.
We then choose the function that gives the smallest prediction error, and we can expect that this function will also give the smallest prediction error for future observations. This technique is widely used in modern statistical analysis, and it is not restricted to least squares: instead of least squares we could use other techniques such as maximum likelihood or Bayesian estimation.
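A minimal K-fold cross-validation sketch in R (illustrative; the data are simulated and the function names are assumptions). It selects a polynomial degree by minimising the cross-validation prediction error per observation.

# K-fold cross-validation for choosing the polynomial degree.
cv_error <- function(x, y, degree, K = 5) {
  n <- length(y)
  folds <- sample(rep(seq_len(K), length.out = n))       # random assignment to K folds
  se <- numeric(n)
  for (k in seq_len(K)) {
    test <- which(folds == k)
    fit  <- lm(y ~ poly(x, degree), data = data.frame(x = x, y = y)[-test, ])
    se[test] <- (y[test] - predict(fit, data.frame(x = x[test])))^2
  }
  mean(se)                                               # prediction error per observation
}

set.seed(2)
x <- runif(100); y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
sapply(1:8, function(d) cv_error(x, y, d))               # choose the degree with the smallest value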

Bootstrap
The bootstrap is one of the computationally expensive techniques. Its simplicity, combined with increasing computational power, has made it a method of choice in many applications. In a very simple form it works as follows. We have a sample of size n and want to estimate some parameter θ; the estimator applied to the full sample gives t. To each sample point we assign a probability (usually 1/n, i.e. all sample points are equally likely). Then from this sample we draw, with replacement, another random sample of size n and estimate θ again. This procedure is repeated B times. Let t_j* denote the estimate of the parameter at the j-th resampling stage. The bootstrap estimator of θ and its variance are calculated as
  t* = (1/B) * sum over j = 1..B of t_j*,
  var_boot = (1/(B-1)) * sum over j = 1..B of (t_j* - t*)^2.
This is a very simple application of bootstrap resampling. For parameter estimation the number of bootstrap samples is usually chosen to be around 200; when the distribution itself is desired, the recommended number is around 1000-2000.
Let us analyse how the bootstrap works in one simple case. Consider a random variable X with sample (outcome) space x = (x1, ..., xM), where each point x_j has probability f_j, i.e. f = (f1, ..., fM) represents the distribution of the population. A sample of size n then yields a vector of relative frequencies f̂ = (f̂1, ..., f̂M), where f̂_j is the proportion of sample points equal to x_j.
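A minimal non-parametric bootstrap sketch in R (illustrative; the helper name bootstrap() is an assumption), implementing the estimator and variance formulas above.

# Resample with equal probabilities 1/n, recompute the statistic B times, summarise.
bootstrap <- function(x, stat = mean, B = 200) {
  t_star <- replicate(B, stat(sample(x, length(x), replace = TRUE)))
  list(estimate   = mean(t_star),   # bootstrap estimate t*
       variance   = var(t_star),    # (1/(B-1)) * sum (t_j* - t*)^2
       replicates = t_star)         # kept for building the bootstrap distribution
}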

Bootstrap: Cont.
The distribution of the sample counts n*f̂ conditional on f is multinomial. The multinomial distribution is the extension of the binomial distribution:
  P(n*f̂ = (n1, ..., nM) | f) = n! / (n1! ... nM!) * f1^n1 * ... * fM^nM.
The limiting distribution of sqrt(n)*(f̂ - f) is multivariate normal. If we resample from the given sample, we consider the conditional distribution of the resample frequencies f̂* given f̂, which is also multinomial, and the limiting distribution of sqrt(n)*(f̂* - f̂) given f̂ is the same as that of sqrt(n)*(f̂ - f). Since these two distributions converge to the same limit, well-behaved functions of them also have the same limiting distributions. Thus, if we use the bootstrap to derive the distribution of a sample statistic, we can expect it to converge, in the limit, to the sampling distribution of that statistic; i.e. the following two quantities have the same limiting distributions:
  sqrt(n)*(t(f̂*) - t(f̂))   and   sqrt(n)*(t(f̂) - t(f)).

Bootstrap: Cont.
If we could enumerate all possible resamples from our sample, we could build the "ideal" bootstrap distribution. In practice, even with modern computers, this is impossible except for very small samples, so Monte Carlo simulation is used instead. It usually works as follows (see the sketch below):
1. Draw a random sample of size n, with replacement, from the given sample of size n.
2. Estimate the parameter to obtain t_j*.
3. Repeat steps 1) and 2) B times and build the frequency and cumulative distributions of t*.
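Using the hypothetical bootstrap() helper sketched earlier (and x from the jackknife example), the Monte Carlo bootstrap distribution can be built and inspected as follows:

b <- bootstrap(x, mean, B = 1000)
hist(b$replicates, breaks = 30,
     main = "Bootstrap distribution of the mean")   # frequency distribution of t*
plot(ecdf(b$replicates))                            # cumulative distribution of t*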

Bootstrap: Cont.
The resampling described so far makes no assumption about the population distribution, so this is the non-parametric bootstrap. If we have some idea about the population distribution, we can use it in resampling: instead of drawing directly from our sample, we draw from an estimated population distribution. For example, if we know that the population distribution is normal, we can estimate its parameters from the sample (the sample mean and variance), approximate the population distribution by this fitted distribution, and draw new samples from it. As might be expected, if the assumption about the population distribution is correct, the parametric bootstrap will perform better; if it is not correct, the non-parametric bootstrap will outperform its parametric counterpart.
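A parametric bootstrap sketch in R (illustrative; it assumes a normal population and the helper name is hypothetical): the normal parameters are fitted from the sample and new samples are drawn from the fitted distribution rather than from the data.

parametric_bootstrap <- function(x, stat = mean, B = 200) {
  m <- mean(x); s <- sd(x)                               # fitted normal parameters
  t_star <- replicate(B, stat(rnorm(length(x), mean = m, sd = s)))
  list(estimate = mean(t_star), variance = var(t_star), replicates = t_star)
}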

Balanced bootstrap
One variation of bootstrap resampling is the balanced bootstrap. Here, resampling is arranged so that the total number of occurrences of each sample point, over all bootstrap samples, is the same: if we draw B bootstrap samples, each x_i appears exactly B times in total. Of course, within any single sample some observations will be present several times and others will be missing, but over all samples every point occurs equally often. This can be achieved as follows (a code sketch follows below):
1. Let n be the number of sample points. Write out the indices 1 to n repeated B times, giving a vector of length nB.
2. Find a random permutation of the numbers 1 to nB; call it N.
3. Take the first n entries of N, read off the corresponding positions of the repeated index vector, and use the resulting indices to form the first bootstrap sample; estimate the parameter of interest. Then take the next n entries of N (positions n+1 to 2n) and repeat the estimation, and so on.
4. After B such estimations, compute the bootstrap estimators, distributions, etc.
(Equivalently, one can simply permute the repeated index vector itself and split it into B consecutive blocks of length n.)
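A minimal balanced bootstrap sketch in R (illustrative; it uses the equivalent formulation of permuting the repeated index vector and splitting it into B blocks).

balanced_bootstrap <- function(x, stat = mean, B = 200) {
  n <- length(x)
  idx <- sample(rep(seq_len(n), times = B))              # each index occurs exactly B times
  blocks <- matrix(idx, nrow = n, ncol = B)              # one column per bootstrap sample
  t_star <- apply(blocks, 2, function(i) stat(x[i]))
  list(estimate = mean(t_star), variance = var(t_star), replicates = t_star)
}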

Balanced bootstrap: Example.
Assume we have 3 sample points and want B = 3 bootstrap samples. Our observations are (x1, x2, x3). First we repeat the indices 1 to 3 three times: 1 2 3 1 2 3 1 2 3. Then we take a random permutation of the numbers 1 to 3x3 = 9, e.g. 4 3 9 5 6 1 2 8 7, and read off the repeated index vector at those positions.
- First we take the observations x1, x3, x3 and estimate the parameter.
- Then we take x2, x3, x1 and estimate the parameter.
- Then we take x2, x2, x1 and estimate the parameter.
As can be seen, each observation is used exactly 3 times in total. This scheme is meant to improve the results of bootstrap resampling.

Bootstrap: Example.
Let us take the example we used for the jackknife. We generate 10000 (simple) bootstrap samples and estimate the mean for each of them. The resulting bootstrap distribution of the estimated parameter can now be used for various purposes (variance estimation, interval estimation, hypothesis testing and so on). For comparison, the original slide figure also showed the normal distribution with mean equal to the sample mean and variance equal to the sample variance divided by the number of observations (black line); the normal approximation appears to be sufficiently good. The sketch below reproduces this comparison.
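An illustrative R sketch for the jackknife example data: a histogram of 10000 bootstrap means with the normal approximation N(sample mean, sample variance / n) overlaid.

x <- c(368, 390, 379, 260, 404, 318, 352, 359, 216, 222, 283, 332)
t_star <- replicate(10000, mean(sample(x, length(x), replace = TRUE)))
m <- mean(x); se <- sd(x) / sqrt(length(x))
hist(t_star, breaks = 40, freq = FALSE,
     main = "Bootstrap distribution of the mean")
curve(dnorm(x, mean = m, sd = se), add = TRUE, lwd = 2)  # normal approximation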

References
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, 7, 1-26.
Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap.
Chernick, M.R. (1999). Bootstrap Methods: A Practitioner's Guide.
Berthold, M. and Hand, D.J. (2003). Intelligent Data Analysis.
Kendall's Advanced Theory of Statistics, Vols. 1 and 2.

Exercise 2
Differences between means and bootstrap confidence intervals. Two species of trees (A and B) were planted randomly, with 10 plots per species. The average height in each plot was measured after 6 years. Analyse the differences in means.
A: 3.2 2.7 3.0 2.7 1.7 3.3 2.7 2.6 2.9 3.3
B: 2.8 2.7 2.0 3.0 2.1 4.0 1.5 2.2 2.7 2.5
Test the hypothesis H0: the means are equal, against H1: the means are not equal. Use var.test for equality of variances and t.test for equality of means. Use bootstrap distributions to define confidence intervals. Functions from the course website: boot_mean, boot_with_function and several others. Write a report. (A possible starting point in base R is sketched below.)
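A possible starting point in base R (illustrative; the course functions boot_mean and boot_with_function are not reproduced here):

A <- c(3.2, 2.7, 3.0, 2.7, 1.7, 3.3, 2.7, 2.6, 2.9, 3.3)
B <- c(2.8, 2.7, 2.0, 3.0, 2.1, 4.0, 1.5, 2.2, 2.7, 2.5)
var.test(A, B)                    # test equality of variances
t.test(A, B, var.equal = TRUE)    # test equality of means (adjust var.equal as appropriate)
# Bootstrap distribution of the difference in means and a percentile confidence interval:
d_star <- replicate(2000, mean(sample(A, replace = TRUE)) - mean(sample(B, replace = TRUE)))
quantile(d_star, c(0.025, 0.975))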