STAT 572: Bootstrap Project
Group Members: Cindy Bothwell, Erik Barry Erhardt, Nina Greenberg, Casey Richardson, Zachary Taylor


Histograms of Complex Population Distribution

Histograms of Population Sampling Distribution of the Median and Estimated Bootstrap Sampling Distributions

What Is a Bootstrap?
A method of resampling: creating many samples from a single sample.
Generally, resampling is done with replacement.
Used to develop a sampling distribution of statistics such as the mean, median, proportion, and others.
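
The resampling idea above can be sketched in a few lines of Python. This is only an illustration, not the group's Matlab code: the function name `bootstrap_statistics`, the data values, and the choice of B = 1000 are assumptions made for the example.

```python
import random
import statistics

def bootstrap_statistics(sample, stat=statistics.median, B=1000, seed=0):
    """Return B values of `stat`, each computed on one resample.

    Every resample has the same size as the original sample and is
    drawn from it with replacement.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(B)]

# Made-up data, purely for illustration.
sample = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]

# Approximate sampling distribution of the median, and its variance.
boot_medians = bootstrap_statistics(sample)
boot_variance = statistics.variance(boot_medians)
```

Passing `stat=statistics.mean` (or any other function of a list) gives the bootstrap distribution of a different statistic.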

The Bootstrap and Complex Surveys
Number of bootstrap samples
– n = sample size, N = population size
– Possible resamples: n^n (example: n = 200 gives 200^200 ≈ 1.6 × 10^460)
Too many possibilities (N!/[n!(N-n)!]); limit to B resamples, a large number (example: B = 1000). This is the Monte Carlo approximation.
Determine sampling distribution with parameters.
Calculate variance in the normal way.
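
As a quick check of the size of n^n for n = 200, the count can be computed exactly with Python's arbitrary-precision integers. The variable names below are illustrative only.

```python
import math

n = 200
exact = n ** n                         # exact integer count of ordered resamples
exponent = len(str(exact)) - 1         # power of ten in scientific notation
mantissa = int(str(exact)[:3]) / 100   # leading three digits, scaled to one unit
log10_count = n * math.log10(n)        # same magnitude via logarithms

print(f"{n}^{n} is about {mantissa:.1f} x 10^{exponent}")
```

With B = 1000 resamples, the Monte Carlo approximation visits only a vanishing fraction of these possibilities, which is why the results carry some simulation variability.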

Advantages and Disadvantages
Advantages:
– Avoids the costs of taking new samples (estimates a sampling distribution when only one sample is available)
– Checking parametric assumptions
– Used when parametric assumptions cannot be made or are very complicated
– Estimation of variance in quantiles
Disadvantages:
– Relies on a representative sample
– Variability due to finite replications (Monte Carlo)

Computations
With more computing power available, the bootstrap is possible for a large number of resamples.
Possible programs:
– Matlab
– Minitab
– SAS
– Excel
– S-Plus
– SPSS
– Fathom

Bootstrap Using the SURVEY Program
The main parameter of interest is the median price that all households in Lockhart City are willing to pay for cable.
The price that a household is willing to pay for cable is positively correlated with average district house value.
Districts in Lockhart City are divided into strata based on average house value.
Estimate the variance and create a 95% CI.

Lockhart City Strata Characteristics:
Take a stratified random sample of size 200 using proportional allocation.
Using the stratified random sample, implement the general bootstrap procedure, BWO, and mirror-match.

Variations of the Bootstrap in Strata
General Bootstrap
– Mimic the original sampling method
BWO: Bootstrap Without Replacement
– Grow the sample to the size of the population
Mirror-Match
– Repeated miniature resamples
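
A minimal sketch of the general bootstrap for a stratified sample, in Python rather than the group's Matlab. It assumes proportional allocation, so the pooled resample can be summarized without weights. The function name, the dictionary layout, and the data are hypothetical.

```python
import random
import statistics

def stratified_bootstrap_medians(strata, B=1000, seed=0):
    """Bootstrap the pooled median of a stratified sample.

    strata: dict mapping a stratum label to the list of sampled values.
    Each replicate resamples n_L values with replacement inside every
    stratum, mimicking the original stratified design, then pools them.
    Pooling without weights assumes proportional allocation.
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(B):
        pooled = []
        for values in strata.values():
            pooled.extend(rng.choices(values, k=len(values)))
        medians.append(statistics.median(pooled))
    return medians

# Made-up cable prices for two strata, purely for illustration.
example_strata = {"low": [20, 25, 22, 30], "high": [45, 50, 48, 55]}
general_medians = stratified_bootstrap_medians(example_strata, B=500)
```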

BWO: Bootstrap Without Replacement
Grow the sample to the size of the population.
For each stratum L, create a pseudo-population by replicating the sample k_L times.
Resample n'_L units from the pseudo-population of each stratum without replacement to obtain a single bootstrap sample for stratum L.
Repeat a large number of times.
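
The BWO steps can be sketched as follows. The transcript does not preserve the definitions of k_L and n'_L, so this sketch makes two loudly simplifying assumptions: N_L is an exact multiple of n_L (so k_L = N_L / n_L is an integer), and the resample size n'_L equals n_L. The extended procedure with non-integer values is not covered. All names and data are hypothetical.

```python
import random
import statistics

def bwo_replicate(sample_L, N_L, rng):
    """One without-replacement bootstrap sample for a single stratum.

    ASSUMPTIONS (not from the slides): k_L = N_L / n_L is an integer
    and the resample size n'_L is taken equal to n_L.
    """
    n_L = len(sample_L)
    if N_L % n_L != 0:
        raise ValueError("this sketch needs N_L to be a multiple of n_L")
    k_L = N_L // n_L
    pseudo_population = list(sample_L) * k_L    # grow sample to size N_L
    return rng.sample(pseudo_population, n_L)   # draw without replacement

def bwo_medians(strata, B=1000, seed=0):
    """strata: dict mapping a label to (sampled values, stratum size N_L)."""
    rng = random.Random(seed)
    medians = []
    for _ in range(B):
        pooled = []
        for sample_L, N_L in strata.values():
            pooled.extend(bwo_replicate(sample_L, N_L, rng))
        medians.append(statistics.median(pooled))
    return medians

# Made-up strata: four sampled values each, stratum size 40.
bwo_example = {"low": ([20, 25, 22, 30], 40), "high": ([45, 50, 48, 55], 40)}
bwo_result = bwo_medians(bwo_example, B=500)
```

Building a pseudo-population of size N_L in every replicate is part of why this approach costs more computing time than the general bootstrap.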

BWO: Variable Definitions

Disadvantages of Extended BWO
N_L must be known.
n'_L and k_L are often non-integers.
Must bracket between integers if n'_L and k_L are non-integer.
Computing time.

Mirror-Match
Repeated miniature resamples.
Resample size is determined to match the proportion of the original sample size to the population size (n_L/N_L).
Using the resample size n'_L, we resample n'_L units (SRSWOR) from each stratum L.
Repeat the previous step k_L times with replacement to obtain a single bootstrap sample for stratum L.
Repeat a large number of times.
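
A rough Python sketch of one mirror-match replicate for a single stratum. The slide's variable definitions are not preserved in the transcript, so the formulas here are assumptions: n'_L is chosen so that n'_L / n_L is close to n_L / N_L, and k_L is taken as n_L / n'_L, both rounded to integers instead of bracketed between integers. The function name and data are hypothetical.

```python
import random

def mirror_match_replicate(sample_L, N_L, rng):
    """One mirror-match bootstrap sample for a single stratum.

    ASSUMPTIONS (not from the slides): n'_L = round(n_L**2 / N_L) and
    k_L = round(n_L / n'_L), each forced to be at least 1.
    """
    n_L = len(sample_L)
    n_prime = max(1, round(n_L * n_L / N_L))   # miniature resample size
    k_L = max(1, round(n_L / n_prime))         # number of miniature resamples
    replicate = []
    for _ in range(k_L):
        # Each miniature resample is drawn without replacement (SRSWOR);
        # the k_L draws are independent, so units can repeat across them.
        replicate.extend(rng.sample(list(sample_L), n_prime))
    return replicate

# Made-up stratum: n_L = 20 sampled values from a stratum of size N_L = 100,
# giving n'_L = 4 and k_L = 5 under the assumptions above.
mm_sample = list(range(20))
mm_replicate = mirror_match_replicate(mm_sample, 100, random.Random(0))
```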

Mirror-Match: Variable Definitions

Mirror-Match: Disadvantages
N_L must be known.
k_L is often non-integer.
Must bracket between integers when k_L is non-integer.
Computing time.

Estimation of the Population Sampling Distributions
100,000 independent stratified random samples.
Medians computed and plotted to form empirical sampling distributions.
Variables: house value, cable price, and TV hours.

Estimation of the Population Sampling Distributions

Simulations
Matlab code: General, BWO, and Mirror-match.
Two independent stratified random samples from Lockhart City.
Comparison of the sample bootstrap sampling distributions with the population sampling distributions.
95% confidence intervals were determined from the bootstrap 2.5th and 97.5th percentiles.
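
The percentile interval described above can be sketched in Python as follows. The linear-interpolation rule between order statistics is an assumption; the slides do not say how the group's Matlab code computed percentiles, and other conventions give slightly different endpoints.

```python
import math

def percentile(values, p):
    """p-th percentile (0 to 100) using linear interpolation between
    order statistics. The interpolation rule is an assumption."""
    ordered = sorted(values)
    rank = p / 100 * (len(ordered) - 1)
    low, high = math.floor(rank), math.ceil(rank)
    if low == high:
        return ordered[low]
    weight = rank - low
    return ordered[low] * (1 - weight) + ordered[high] * weight

def percentile_ci(boot_stats, lower=2.5, upper=97.5):
    """Percentile bootstrap confidence interval (95% by default)."""
    return percentile(boot_stats, lower), percentile(boot_stats, upper)

# Stand-in for a list of bootstrap medians, purely for illustration.
ci_low, ci_high = percentile_ci(list(range(101)))
```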

Sampling Distributions 1

Sampling Distributions 2

Confidence Intervals

The Empirical versus the Bootstrap Sampling Distributions
Bootstrap sampling distributions are expected to mimic actual sampling distributions.
Bootstrap sampling is sensitive to individual samples.
The shape of bootstrap sampling distributions may vary, but the statistic of interest and its variance are considered accurate.

Comparison of Bootstrap Methods

Empirical Coverages
The empirical coverages were close to the expected 95%.
They differed very little between the different bootstrap procedures.
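
Empirical coverage is the fraction of samples whose confidence interval contains the true population value. The Python sketch below shows how such a figure can be estimated. It is a simplified stand-in, not the study's procedure: it uses simple random samples from a made-up population instead of stratified samples from Lockhart City, a nearest-rank percentile rule, and small replication counts chosen only to keep it fast.

```python
import random
import statistics

def percentile_interval(values, lower=0.025, upper=0.975):
    """Nearest-rank percentile interval (an assumed convention)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

def empirical_coverage(population, n=30, n_samples=200, B=200, seed=0):
    """Share of bootstrap percentile intervals that contain the
    true population median."""
    rng = random.Random(seed)
    true_median = statistics.median(population)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(population, n)   # one simple random sample
        boot = [statistics.median(rng.choices(sample, k=n)) for _ in range(B)]
        low, high = percentile_interval(boot)
        hits += low <= true_median <= high
    return hits / n_samples

# Made-up skewed population, purely for illustration.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(5000)]
coverage = empirical_coverage(population)
```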

Empirical Coverages
Empirical coverages are dependent on the type of confidence interval that was originally selected.
Our confidence intervals were calculated from the 2.5th and 97.5th percentiles of each bootstrap distribution.
There are many different types of bootstrap confidence intervals.
The one we selected, although intuitive in design, is generally considered biased (Bedrick 2006).

Computer Processing Times
Computer processing times varied greatly.
Mean processing time per sample in seconds.

Computer Processing Times
BWO took 381 times as long as general bootstrapping procedures.
Mirror-match took 293 times as long as general bootstrapping procedures.
For our study, the BWO and mirror-match conferred no advantage over general bootstrapping with regard to statistical estimates.
However, their vastly greater processing times are a great disadvantage.

CONCLUSIONS: General Bootstrap versus BWO and Mirror-Match
BWO and Mirror-match procedures are designed to mimic complex sampling designs.
We only analyzed stratified samples of 200 from a fictitious city.
BWO and Mirror-match methods may be advantageous in other complex sampling scenarios.