Ch13 Empirical Methods
Objective of this chapter: To introduce several empirical methods that are increasingly being used in statistical computations as an alternative to, or as an improvement on, classical statistical methods.

CHAPTER CONTENTS
13.1 Introduction
13.2 The Jackknife Method
13.3 An Introduction to Bootstrap Methods
13.4 The Expectation Maximization Algorithm
13.5 Introduction to Markov Chain Monte Carlo
13.6 Chapter Summary
13.7 Computer Examples
Projects for Chapter
13.1 Introduction
13.2 The Jackknife Method

The “jackknife” method:
- is also called the “Quenouille-Tukey jackknife” method;
- was invented by Maurice Quenouille in 1956;
- is used for testing hypotheses and finding confidence intervals where traditional methods are not applicable or not well suited;
- can also be used with multivariate data;
- is very useful when outliers are present in the data or the dispersion of the distribution is wide;
- is used to estimate the variability of a statistic from the variability of that statistic between subsamples;
- avoids the parametric assumptions that we used in obtaining the sampling distribution of the statistic to calculate the standard error;
- can be considered a nonparametric estimate of the parameter;
- was introduced for bias reduction (thus improving a given estimator) and is a useful method for variance estimation.
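As a rough illustration of the leave-one-out idea listed above, the following Python sketch computes the leave-one-out replicates of a statistic, the bias-corrected jackknife estimate based on pseudo-values, its standard error, and a t-based confidence interval. This is a minimal sketch following the standard formulation; the function and variable names are illustrative and not taken from the text.

```python
import numpy as np
from scipy import stats

def jackknife(data, statistic=np.mean, conf_level=0.95):
    """Leave-one-out jackknife estimate, standard error, and t-based CI for a statistic."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta_hat = statistic(data)                        # estimate from the full sample
    # Leave-one-out replicates: the statistic computed with the i-th observation removed.
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    # Pseudo-values and the (bias-corrected) jackknife point estimate.
    pseudo = n * theta_hat - (n - 1) * loo
    theta_jack = pseudo.mean()
    se_jack = pseudo.std(ddof=1) / np.sqrt(n)          # jackknife standard error
    t_crit = stats.t.ppf(1 - (1 - conf_level) / 2, df=n - 1)
    ci = (theta_jack - t_crit * se_jack, theta_jack + t_crit * se_jack)
    return theta_jack, se_jack, ci

# Applied to the six observations of the example below (not reproduced here),
# parts (a) and (b) correspond to the returned estimate and confidence interval:
# est, se, ci = jackknife([x1, x2, x3, x4, x5, x6])
```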
EXAMPLE
A random sample of n = 6 from a given population resulted in the following data:
(a) Find a jackknife point estimate of the population mean μ.
(b) Construct a 95% jackknife confidence interval for the population mean μ.
13.3 An Introduction to Bootstrap Methods
The bootstrap method
- When to use? When the statistical distribution is unknown or the assumptions of normality are not satisfied.
- For what? For estimating sampling distributions.
- What it does: provides a simple method for obtaining an approximate sampling distribution of the statistic, conditional on the observed data.
- Features: easier to calculate; simulation based; based on one sample.
- What we are creating is not what happened, but rather what could have happened in the past from what did happen.
Empirical (or sample) cumulative distribution function
F: the population probability distribution; F is unknown. Let X1, . . . , Xn be a random sample from the probability distribution F with μ = E(Xi) and σ² = V(Xi).
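For reference, the empirical (or sample) CDF named in the slide title is the standard step function that places mass 1/n at each observation:

$$\hat{F}_n(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x), \qquad I(X_i \le x) = \begin{cases} 1, & X_i \le x,\\ 0, & \text{otherwise.} \end{cases}$$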
Notation: an asterisk (*) denotes a bootstrap quantity, for example a bootstrap sample X1*, . . . , Xn* or a bootstrap estimate computed from it.
Subsampling in the bootstrap algorithm
Is it necessary that the repeated subsamples have size n (the size of the original sample) and that the subsampling be done with replacement? It is not necessary, but doing so leads to the best results.
EXAMPLE
The following data represent the total ozone levels measured in Dobson units at randomly selected locations on Earth on a particular day:
269 246 388 354 266 303 295 259 274 249 271 254
Generate N = 6 bootstrap samples of size 12 each and find the bootstrap mean and standard deviation (standard error).
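A minimal Python sketch of this example, assuming simple random resampling with replacement; the random seed is an illustrative choice, and N = 6 is kept only to match the statement of the example (in practice far more resamples would be used, as discussed later).

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seed chosen only for reproducibility

ozone = np.array([269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254])
N = 6                                  # number of bootstrap samples requested
n = len(ozone)                         # each resample has the original size, n = 12

# Draw N bootstrap samples (with replacement) and record each sample mean.
boot_means = np.array([rng.choice(ozone, size=n, replace=True).mean() for _ in range(N)])

boot_mean = boot_means.mean()          # bootstrap estimate of the mean
boot_se = boot_means.std(ddof=1)       # bootstrap standard error of the mean

print(f"bootstrap mean = {boot_mean:.2f}, bootstrap SE = {boot_se:.2f}")
```

The numerical values depend on which resamples happen to be drawn, so they will not exactly match any particular worked solution.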
Let θ̂ = θ̂(X1, . . . , Xn) be a sample statistic that estimates the parameter θ of an unknown distribution F using some procedure. We wish to estimate the standard error of θ̂ using the bootstrap procedure, which is summarized next.
The accuracy of the bootstrap approximation depends on
- the accuracy of F̂ as an estimate of F, and
- how large a bootstrap sample is used to estimate the standard error of θ̂.
13.3.1 BOOTSTRAP CONFIDENCE INTERVALS
PROCEDURE TO FIND BOOTSTRAP CONFIDENCE INTERVAL FOR THE MEAN
1. Draw N (sub)samples (N will be in the hundreds and, if the software allows, in the thousands) from the original sample, with replacement.
2. For each of the (sub)samples, find the (sub)sample mean.
3. Arrange these (sub)sample means in order of magnitude.
4. To obtain, say, a 95% confidence interval, find the middle 95% of the sample means. For this, find the means at the 2.5th and 97.5th percentiles. The 2.5th percentile will be at position (0.025)(N + 1), and the 97.5th percentile will be at position (0.975)(N + 1). If either of these numbers is not an integer, round to the nearest integer. The values at these positions are the lower and upper limits of the 95% bootstrap interval for the true mean.
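A minimal Python sketch of this percentile procedure; the statistic is passed as a parameter so the same function also covers the median interval described later. The function name, resample count, and seed are illustrative choices, not from the text.

```python
import numpy as np

def bootstrap_percentile_ci(data, statistic=np.mean, N=1000, conf_level=0.95, seed=None):
    """Percentile bootstrap confidence interval following the four steps above."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Steps 1-2: draw N resamples with replacement and compute the statistic for each.
    boot_stats = np.array([statistic(rng.choice(data, size=n, replace=True)) for _ in range(N)])
    # Step 3: arrange the resampled statistics in order of magnitude.
    boot_stats.sort()
    # Step 4: pick off the values at positions (alpha/2)(N + 1) and (1 - alpha/2)(N + 1),
    # rounding to the nearest integer and converting 1-based positions to 0-based indices.
    alpha = 1 - conf_level
    lo = max(int(round((alpha / 2) * (N + 1))), 1)
    hi = min(int(round((1 - alpha / 2) * (N + 1))), N)
    return boot_stats[lo - 1], boot_stats[hi - 1]

# Example usage with the ozone data above:
# ci = bootstrap_percentile_ci(ozone, statistic=np.mean, N=1000, seed=1)
```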
EXAMPLE
For the data given in the example above, obtain a 95% bootstrap confidence interval for μ.

Because the bootstrap methods are more in tune with nonparametric methods, it sometimes makes sense to obtain a confidence interval about the median rather than the mean. With a slight modification of the procedure that we have described for the bootstrap confidence interval for the mean, we can obtain the bootstrap confidence interval for the median.

Comparing with the classical confidence interval we obtained in Example 5.5.9, which is (257.81, ), the bootstrap confidence interval has smaller length, and thus less variability. In addition, we saw there that the normality assumption necessary for that confidence interval was suspect; in the bootstrap method we did not make any distributional assumptions.

                               Classical confidence interval   Bootstrap confidence interval
  Variability                  Higher                          Lower
  Distributional assumptions   Yes                             No
PROCEDURE TO FIND BOOTSTRAP CONFIDENCE INTERVAL FOR THE MEDIAN
1. Draw N samples (N will be in the hundreds and, if the software allows, in the thousands) from the original sample, with replacement.
2. For each of the samples, find the sample median.
3. Arrange these sample medians in order of magnitude.
4. To obtain, say, a 95% confidence interval, find the middle 95% of the sample medians. For this, find the medians at the 2.5th and 97.5th percentiles. The 2.5th percentile will be at position (0.025)(N + 1), and the 97.5th percentile will be at position (0.975)(N + 1). If either of these numbers is not an integer, round to the nearest integer. The values at these positions are the lower and upper limits of the 95% bootstrap interval for the median.
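This procedure differs from the one for the mean only in the statistic computed in step 2; in the sketch given after the mean procedure above, it corresponds to calling the illustrative bootstrap_percentile_ci function with statistic=np.median instead of np.mean.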
In practice, how many bootstrap samples should be taken?
The answer depends on two things: 1) how much the result matters, and 2) what type of computing power is available. In general, it is better to start with 1000 subsamples.
13.4 The Expectation Maximization Algorithm
13.5 Introduction to Markov Chain Monte Carlo
A brief introduction to Markov chain Monte Carlo (MCMC) methods:
- MCMC is a computational simulation method.
- MCMC is enormously useful for realistic statistical modeling.
- MCMC uses two standard algorithms: the Metropolis algorithm and the Gibbs sampler.
- MCMC is used to generate random variables having certain target distributions with pdf π(x).
- MCMC is especially useful when the functional form of π(x) is not known.
- Basic idea: find a Markov chain with a stationary distribution that is the same as the desired probability distribution π(x).
- The probability that the chain is in state x is approximately the probability that the discrete random variable equals x.
The objective of MCMC techniques is to generate random variables having certain distributions, called target distributions, with pdf π(x). The simulation of standard distributions is readily available in many statistical software packages, such as Minitab. In cases where the functional form of π(x) is not known, MCMC techniques become very useful. The basic idea of MCMC methods is to find a Markov chain with a stationary distribution that is the same as the desired probability distribution π(x); this is the target distribution. Run the Markov chain for a long time (say, K iterations) and observe in which state the chain is after these K iterations. The probability that the chain is in state x will be approximately the same as the probability that the discrete random variable equals x. In Bayesian analysis, whether we are
METROPOLIS ALGORITHM
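As a rough illustration of the basic idea described above (propose a candidate state and accept it with a probability chosen so that π(x) is the stationary distribution of the chain), here is a minimal Python sketch of a random-walk Metropolis sampler. The target density, proposal scale, iteration count, and function names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def metropolis(log_target, x0, n_iter=10_000, proposal_scale=1.0, seed=None):
    """Random-walk Metropolis sampler for a target density known up to a constant.

    log_target: function returning log pi(x) up to an additive constant.
    """
    rng = np.random.default_rng(seed)
    samples = np.empty(n_iter)
    x = x0
    log_p = log_target(x)
    for i in range(n_iter):
        # Propose a symmetric random-walk move around the current state.
        x_new = x + proposal_scale * rng.standard_normal()
        log_p_new = log_target(x_new)
        # Accept with probability min(1, pi(x_new) / pi(x)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
        samples[i] = x
    return samples

# Illustrative target: a standard normal density, log pi(x) = -x^2 / 2 up to a constant.
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_iter=5000, seed=1)
# After discarding an initial burn-in, `draws` approximates samples from pi(x).
```

The Metropolis-Hastings algorithm of Section 13.5.2 generalizes this to asymmetric proposals by correcting the acceptance probability with the ratio of proposal densities, and the Gibbs sampler instead updates one coordinate at a time by drawing from its full conditional distribution.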
13.5.2 THE METROPOLIS-HASTINGS ALGORITHM
GIBBS ALGORITHM
MCMC ISSUES
13.6 Chapter Summary
13.7 Computer Examples
Projects for Chapter 13