Application of the Bootstrap: Estimating a Population Mean

Movie Average Shot Lengths
Sources:
Barry Sands’ Average Shot Length Movie Database
L. Chihara and T. Hesterberg (2011). Mathematical Statistics with Resampling and R. Wiley, Hoboken, NJ.

Data Description
Average Shot Length (ASL, in seconds) for a population of 11001 films (Barry Sands’ movie database).
The population is very highly right-skewed.
Min = 1.330, LQ = 4.510, Median = 6.400, UQ = 8.910, Max = 1000
μ = 7.739, σ = 12.765
Coefficient of variation: CV = 100(σ/μ) = 100(12.765/7.739) = 164.94%
Goal: estimate μ from a small sample when the shape of the small-sample sampling distribution of the sample mean is unknown.

Introduction to the Bootstrap
The bootstrap uses a sample from a population to estimate the sampling distribution of a statistic or estimator.
It treats the sample as an “estimate” of the population of measurements: the sample empirical cumulative distribution function (ECDF) serves as the estimate of the population CDF.

Figure: Population and Sample Empirical CDFs (sample size: n = 25)
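The empirical CDF that the bootstrap treats as its stand-in for the population CDF can be sketched in a few lines. This is a minimal illustration, assuming only the Python standard library; the helper name `make_ecdf` and the five demo values are hypothetical and are not taken from the slides.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return F_hat, where F_hat(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right returns how many of the ordered values are <= x
        return bisect_right(ordered, x) / n

    return ecdf

# Illustrative values only (not the slides' data)
demo = [2.0, 3.5, 3.5, 7.0, 12.0]
F_hat = make_ecdf(demo)
print(F_hat(1.0), F_hat(3.5), F_hat(12.0))  # 0.0 0.6 1.0
```

Drawing from the sample with replacement is equivalent to sampling from this step function, which is why the ECDF is the natural "plug-in" estimate of the population distribution.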

Applying the Bootstrap
1. Obtain a random sample of size n from the population.
2. Determine the estimator(s) of interest.
3. Compute the estimate(s) based on the original sample.
4. Determine B, the number of bootstrap samples to be taken.
5. Obtain B random samples of size n, drawn with replacement from the original sample.
6. Compute the estimate for each bootstrap sample.
7. The bootstrap distribution is the collection of the B estimates.
8. The bootstrap standard error is the standard deviation of the B estimates.
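The procedure above can be sketched as a short, generic routine. This is an illustration under stated assumptions rather than the authors' code: the function name `bootstrap_distribution`, the seed, and the eight sample values are all made up, and only the Python standard library is used.

```python
import random
import statistics

def bootstrap_distribution(sample, estimator, B, rng):
    """Return B estimates, each computed on a resample drawn with replacement."""
    n = len(sample)
    estimates = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)  # size n, with replacement
        estimates.append(estimator(resample))
    return estimates

rng = random.Random(2011)                          # arbitrary seed, for repeatability
sample = [4.1, 5.3, 2.8, 9.6, 7.2, 3.9, 6.4, 5.0]  # made-up data
boot = bootstrap_distribution(sample, statistics.mean, B=2000, rng=rng)

# Bootstrap standard error: standard deviation of the B estimates
boot_se = statistics.stdev(boot)
print("original estimate:", round(statistics.mean(sample), 3))
print("bootstrap SE:", round(boot_se, 3))
```

Passing the estimator as a function keeps the resampling loop unchanged when the statistic of interest is a median, standard deviation, or coefficient of variation instead of a mean.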

Properties of the Bootstrap Sampling Distribution
Center: The center of the bootstrap sampling distribution is the estimate based on the full sample, not the population parameter it is estimating.
Spread: The spread is representative of the spread of the estimator’s sampling distribution.
Bias: The bootstrap bias estimate is the difference between the center of the bootstrap sampling distribution and the original-sample estimate. It estimates the true bias of the estimator, that is, the difference between the center of the estimator’s sampling distribution and the parameter it is used to estimate.
Skewness: Skewness in the bootstrap sampling distribution is representative of the skewness of the estimator’s sampling distribution.

Example – Movie Average Shot Lengths (ASL)
Interested in approximating the sampling distribution of the sample mean.
Population value: μ = 7.739
(Pseudo-)random sample of n = 25 films’ ASLs:
4.40 14.98 7.80 9.50 9.50
6.70 7.50 9.20 3.70 8.04
4.47 9.40 8.40 8.88 5.50
16.30 6.70 3.65 4.27 11.60
9.30 3.40 2.90 12.00 16.60
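As a quick check on the sample, the point estimates can be computed directly from the 25 values listed on the slide. The variable names below are illustrative, and the classical standard error s/√n is included only for later comparison with the bootstrap standard error; it is not stated on this slide.

```python
import math
import statistics

# The n = 25 ASL values from the slide
asl = [4.40, 14.98, 7.80, 9.50, 9.50, 6.70, 7.50, 9.20, 3.70, 8.04,
       4.47, 9.40, 8.40, 8.88, 5.50, 16.30, 6.70, 3.65, 4.27, 11.60,
       9.30, 3.40, 2.90, 12.00, 16.60]

n = len(asl)
xbar = statistics.mean(asl)       # sample mean
s = statistics.stdev(asl)         # sample standard deviation (n - 1 divisor)
se_classical = s / math.sqrt(n)   # formula-based standard error of the mean
cv = 100 * s / xbar               # sample coefficient of variation, in percent

print("n =", n)
print("mean =", round(xbar, 4))
print("sd =", round(s, 4))
print("classical SE =", round(se_classical, 4))
print("CV (%) =", round(cv, 2))
```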

Bootstrap Samples
Taking B = 10000 bootstrap samples from the original sample. R summary() output for the original sample and for the bootstrap distributions of the mean, SD, and CV:

Object         Min.   1st Qu.  Median   Mean    3rd Qu.  Max.
ASL.sample1    2.900  4.470    8.040    8.188   9.500    16.600
ASL.mean       5.560  7.666    8.182    8.190   8.687    11.100
ASL.sd         1.916  3.423    3.800    3.772   4.137    5.494
ASL.CV         26.19  42.28    46.19    46.16   50.13    67.20

Bootstrap Standard Error and Sampling Distribution
In terms of the sampling distribution of the sample mean:
Mean of bootstrap sample means: 8.1899 (close to the original sample mean, 8.1876; not so close to the population mean, 7.7394).
Bootstrap estimate of bias: 8.1899 - 8.1876 = 0.0023.
Bootstrap standard error (standard deviation of the 10000 bootstrap sample means): 0.7620.
Bias/BSE = 0.0023/0.7620 = 0.0030 (0.30%).
Bootstrap 95% percentile interval, given by the (0.025, 0.975) quantiles of the bootstrap distribution of the sample mean: (6.7444, 9.7113), which does include the population mean (7.739).
Note: The interval has the form (8.1876 - 1.4432, 8.1876 + 1.5237), reflecting an asymmetric bootstrap sampling distribution.
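The quantities on this slide can be reproduced approximately with the sketch below, which resamples the 25 ASL values using only the Python standard library. The seed is arbitrary, so the results will differ slightly from the slide's R run; the `inclusive` quantile method is chosen on the assumption that it matches R's default quantile definition.

```python
import random
import statistics

asl = [4.40, 14.98, 7.80, 9.50, 9.50, 6.70, 7.50, 9.20, 3.70, 8.04,
       4.47, 9.40, 8.40, 8.88, 5.50, 16.30, 6.70, 3.65, 4.27, 11.60,
       9.30, 3.40, 2.90, 12.00, 16.60]

rng = random.Random(1)   # arbitrary seed
B = 10000
n = len(asl)

# Bootstrap distribution of the sample mean
boot_means = [statistics.fmean(rng.choices(asl, k=n)) for _ in range(B)]

xbar = statistics.fmean(asl)
bias = statistics.fmean(boot_means) - xbar   # bootstrap estimate of bias
boot_se = statistics.stdev(boot_means)       # bootstrap standard error

# With n=40 the cut points are the 2.5%, 5%, ..., 97.5% quantiles
cuts = statistics.quantiles(boot_means, n=40, method="inclusive")
lower, upper = cuts[0], cuts[-1]             # 95% percentile interval

print("bias:", round(bias, 4))
print("bootstrap SE:", round(boot_se, 4))
print("95% percentile interval:", (round(lower, 4), round(upper, 4)))
print("distance below / above the sample mean:",
      round(xbar - lower, 4), round(upper - xbar, 4))
```

The last line prints the two half-widths separately, which makes any asymmetry of the interval around the sample mean visible.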

Bootstrap t Confidence Interval for μ

ASL Example
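The transcript does not preserve the formulas for the bootstrap t interval, so the sketch below uses one common textbook formulation, stated here as an assumption: for each bootstrap sample compute T* = (x̄* - x̄)/(s*/√n), take the 0.025 and 0.975 quantiles of the T* values, and form the interval (x̄ - Q_0.975·s/√n, x̄ - Q_0.025·s/√n). The function name and seed are hypothetical.

```python
import math
import random
import statistics

asl = [4.40, 14.98, 7.80, 9.50, 9.50, 6.70, 7.50, 9.20, 3.70, 8.04,
       4.47, 9.40, 8.40, 8.88, 5.50, 16.30, 6.70, 3.65, 4.27, 11.60,
       9.30, 3.40, 2.90, 12.00, 16.60]

def bootstrap_t_interval_95(sample, B, rng):
    """95% bootstrap t interval for the mean (assumed textbook formulation)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_stars = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)
        s_star = statistics.stdev(resample)
        if s_star == 0:
            continue  # degenerate resample (all values equal); skip it
        t_stars.append((statistics.fmean(resample) - xbar)
                       / (s_star / math.sqrt(n)))
    cuts = statistics.quantiles(t_stars, n=40, method="inclusive")
    q_lo, q_hi = cuts[0], cuts[-1]  # 0.025 and 0.975 quantiles of T*
    # The quantiles swap sides: the upper T* quantile sets the lower limit
    return xbar - q_hi * se, xbar - q_lo * se

rng = random.Random(3)   # arbitrary seed
lower, upper = bootstrap_t_interval_95(asl, B=5000, rng=rng)
print("95% bootstrap t interval:", (round(lower, 3), round(upper, 3)))
```

Because each T* divides by the resample's own standard deviation, resamples with a small s* can produce extreme T* values, which is one reason this interval can become much wider than the percentile interval when the population is heavily skewed.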

Comparison of 3 Methods – 95% CI for μ
Repeat the methods described previously on each of M = 1000 random samples of n = 25 from the original population, with B = 1000 bootstrap samples per random sample, and obtain the empirical coverage rate and average interval width for each method.

Method                                            Coverage probability   Average width (seconds)
1. t-interval (based on normality assumption)     .869                   5.05
2. Bootstrap percentile interval                  .849                   4.40
3. Bootstrap t confidence interval                .903                   22.23
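A coverage study of this kind can be sketched as follows. The film database itself is not available here, so the sketch draws from a hypothetical right-skewed lognormal population with made-up parameters; its coverage rates and widths will therefore not match the slide's figures. M and B are reduced to keep the run short, the t critical value is hard-coded for n = 25, and the bootstrap t formulation is the common textbook one, assumed rather than taken from the slides.

```python
import math
import random
import statistics

T_CRIT_24 = 2.0639  # 0.975 quantile of the t distribution with 24 df (n = 25 only)

def mean_sd(xs):
    """Sample mean and standard deviation (n - 1 divisor)."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def q025_q975(values):
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]

def three_intervals(sample, B, rng):
    """95% intervals for the mean from the three methods being compared."""
    n = len(sample)
    xbar, s = mean_sd(sample)
    se = s / math.sqrt(n)
    boot_means, t_stars = [], []
    for _ in range(B):
        m_star, s_star = mean_sd(rng.choices(sample, k=n))
        boot_means.append(m_star)
        if s_star > 0:  # skip degenerate resamples
            t_stars.append((m_star - xbar) / (s_star / math.sqrt(n)))
    t_lo, t_hi = q025_q975(t_stars)
    return {
        "t": (xbar - T_CRIT_24 * se, xbar + T_CRIT_24 * se),
        "percentile": q025_q975(boot_means),
        "bootstrap_t": (xbar - t_hi * se, xbar - t_lo * se),
    }

def coverage(M, B, rng, n=25):
    mu_log, sigma_log = 1.8, 0.7  # hypothetical lognormal population parameters
    true_mean = math.exp(mu_log + sigma_log ** 2 / 2)
    hits = {"t": 0, "percentile": 0, "bootstrap_t": 0}
    widths = {"t": 0.0, "percentile": 0.0, "bootstrap_t": 0.0}
    for _ in range(M):
        sample = [rng.lognormvariate(mu_log, sigma_log) for _ in range(n)]
        for name, (lo, hi) in three_intervals(sample, B, rng).items():
            hits[name] += lo <= true_mean <= hi
            widths[name] += hi - lo
    return {name: (hits[name] / M, widths[name] / M) for name in hits}

results = coverage(M=200, B=200, rng=random.Random(7))
for name, (cov, width) in results.items():
    print(f"{name:12s} coverage={cov:.3f} average width={width:.2f}")
```

Raising M and B toward the slide's values of 1000 each reduces the simulation noise in the estimated coverage rates at the cost of a longer run.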