Techniques for the Computing-Capable Statistician

BOOTSTRAPS
8/2/2019
Think hard about statistical properties of estimators.

THE BOOK
An Introduction to the Bootstrap, by Bradley Efron and Robert J. Tibshirani. Chapman & Hall/CRC Monographs on Statistics and Applied Probability 57.

WHEN PROBABILITY THEORY WORKS...
...it works very well:
sums of iid random variables ~ Normal (approximately, for large n, by the central limit theorem)
min of iid Exponential random variables ~ Exponential
sums of squared independent standard Normals ~ χ²
ratios of independent χ² variables, each divided by its degrees of freedom ~ F
See handout on transforms, etc.
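Two of these facts can be checked by simulation. The sketch below uses only Python's standard library. The rate, sample sizes, and seed are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, for reproducibility only

def min_of_exponentials(n, rate, reps):
    """Simulate the minimum of n iid Exponential(rate) draws, reps times."""
    return [min(random.expovariate(rate) for _ in range(n)) for _ in range(reps)]

def sum_of_squared_normals(k, reps):
    """Simulate the sum of k squared independent standard Normal draws, reps times."""
    return [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(reps)]

mins = min_of_exponentials(n=5, rate=2.0, reps=20000)
# Theory: the minimum is Exponential(n * rate), with mean 1 / (n * rate) = 0.1
print(round(statistics.mean(mins), 3))

chisq = sum_of_squared_normals(k=4, reps=20000)
# Theory: chi-square with k degrees of freedom has mean k = 4 and variance 2k = 8
print(round(statistics.mean(chisq), 2), round(statistics.variance(chisq), 2))
```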

STATISTICS
Estimate a quantity: θ̂ estimates θ.
Predict the variability of the estimate: this involves predicting the distribution (form and parameters) of the estimator θ̂.
X̄ and σ̂² have known distributions when the data are Normal: X̄ ~ Normal, and (n − 1)σ̂²/σ² ~ χ² with n − 1 degrees of freedom.

OTHER STATISTICS
The median (an example of a quartile estimate and of an order statistic); ratios, transforms, non-polynomial functions.
In general, none of these has a convenient known sampling distribution.
How can you assess the variability of such an estimator?

PRELIMINARY DEFINITION AND NOTATION
Given samples X1, X2, ..., Xn, X(i) is the i-th smallest sample and is called the i-th order statistic.
X_α = X(i), with i chosen so that αn ≈ i, is called the α-th p-tile (the α-quantile).
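Both definitions fit in a few lines of Python. This is a minimal sketch that assumes 1-indexed order statistics, as on the slide. The helper names `order_statistic` and `p_tile` are hypothetical. Rounding αn to the nearest integer is only one of several quantile conventions in use.

```python
def order_statistic(samples, i):
    """Return X(i), the i-th smallest sample (1-indexed, as on the slide)."""
    return sorted(samples)[i - 1]

def p_tile(samples, alpha):
    """Return X_alpha = X(i), where i is alpha * n rounded to an integer in 1..n.

    This rounding rule is one simple convention; statistical packages
    use several slightly different ones.
    """
    n = len(samples)
    i = min(max(round(alpha * n), 1), n)
    return order_statistic(samples, i)

data = [11.6, 10, 13.9, 11.2, 11]  # a few values echoing the slide's example
print(order_statistic(data, 1), order_statistic(data, 4), p_tile(data, 0.8))  # → 10 11.6 11.6
```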

EXAMPLE (n = 1000 samples)
X(1) = 10, X(2) = 11, X(3) = 11.2, X(4) = 11.6, ...
X_0.025 = X(25)
...
X_0.975 = X(975)
...
X(997) = 12.8, X(998) = 12.9, X(999) = 13.0, X(1000) = 13.9

EMPIRICAL CONFIDENCE INTERVAL
An empirical 95% interval for X is (X_0.025, X_0.975) = (X(25), X(975)). It is an interval for X itself, not for the mean, the median, or any other statistic.
We can use all 1000 samples to estimate the median: M = X(500) = 11.9.
However, neither this interval nor the single value M tells us anything about the precision of M. How accurate is this estimate?
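The sketch below computes the empirical interval and the median for 1000 observations. The data are hypothetical Normal(12, 0.8) draws, not the slide's data set. `empirical_interval` is an illustrative helper that uses the same i ≈ αn rule as above.

```python
import random

def empirical_interval(samples, lower=0.025, upper=0.975):
    """Empirical interval (X_lower, X_upper) for X itself, from order statistics."""
    xs = sorted(samples)
    n = len(xs)
    # X(i) with i = alpha * n, converted to a 0-based list index
    return xs[round(lower * n) - 1], xs[round(upper * n) - 1]

random.seed(0)
# Hypothetical data: 1000 draws from Normal(12, 0.8); NOT the slide's data set
data = [random.gauss(12.0, 0.8) for _ in range(1000)]

lo, hi = empirical_interval(data)   # (X(25), X(975))
median = sorted(data)[500 - 1]      # M = X(500)
print(f"interval for X: ({lo:.2f}, {hi:.2f}); median M = {median:.2f}")
```

By construction, the interval brackets about 95% of the individual observations. It says nothing about how precisely M estimates the population median.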

MORE VERNACULAR
Call F the underlying distribution of the phenomenon being studied: F(x) = P(X ≤ x).
Call F̂ (F-hat) the empirical distribution, estimated from the observed samples: F̂ = {X1, X2, ..., Xn}, each weighted 1/n.
BOOTSTRAPPING: use F̂ as a sampling surrogate for F.
Don't oversell the reliability of the resulting estimates.
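F̂ can be written as a function. This minimal sketch assumes the usual right-continuous empirical CDF, and `ecdf` is a hypothetical helper name.

```python
from bisect import bisect_right

def ecdf(samples):
    """Return F_hat, the empirical CDF: F_hat(x) = (number of samples <= x) / n.

    Equivalently, F_hat places probability 1/n on each observed sample.
    """
    xs = sorted(samples)
    n = len(xs)

    def f_hat(x):
        # bisect_right counts how many sorted samples are <= x
        return bisect_right(xs, x) / n

    return f_hat

f_hat = ecdf([10, 11, 11.2, 11.6, 13.9])
print(f_hat(9), f_hat(11), f_hat(11.5), f_hat(13.9))  # → 0.0 0.4 0.6 1.0
```

Drawing one value from F̂ is the same as picking one of the n observations uniformly at random, which is what `random.choice` does.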

SMOOTHED F-HAT

BOOTSTRAPPING
Given samples F̂ = {X1, X2, ..., Xn}:
the b-th bootstrap sample x*(b) is obtained by sampling n times from X1, X2, ..., Xn with replacement;
let m*(b) be the median of the b-th bootstrap sample.
Then m*(1), m*(2), ..., m*(B) is a sample of medians.
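The resampling loop takes only a few lines of code. `bootstrap_medians` is a hypothetical helper, and `random.Random.choices(..., k=n)` performs the n draws with replacement. The nine data values in the usage lines are made up.

```python
import random
import statistics

def bootstrap_medians(samples, B, rng):
    """Return [m*(1), ..., m*(B)], the medians of B bootstrap samples.

    Each bootstrap sample x*(b) takes n draws from the original samples
    with replacement, i.e. n iid draws from F-hat.
    """
    n = len(samples)
    return [statistics.median(rng.choices(samples, k=n)) for _ in range(B)]

# Hypothetical base sample of n = 9 values
data = [9.8, 10.4, 11.0, 11.2, 11.6, 11.9, 12.3, 12.8, 13.0]
medians = bootstrap_medians(data, B=500, rng=random.Random(42))
print(min(medians), statistics.median(medians), max(medians))
```

For an even n, `statistics.median` averages the two middle values, which differs slightly from picking a single order statistic such as X(500).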

THE BASE SAMPLE FORMS THE POPULATION FOR THE BOOTSTRAP SAMPLE
REAL WORLD: F → X1, X2, ..., Xn → Mbase, the usual estimate
BOOTSTRAP WORLD: empirical F̂ → X*1, X*2, ..., X*n → ... → bootstrap estimate of Mbase's distribution

KEY EXCEPTION
Are m*(1), m*(2), ..., m*(B) independent samples of the median?
With respect to F̂, yes: each one is computed from n fresh draws from F̂. With respect to F, no: all of them are built from the same base sample X1, ..., Xn.

Mbase has the nonparametric (percentile) confidence interval (m*_0.025, m*_0.975).
The standard error of Mbase is estimated as the standard deviation of m*(1), ..., m*(B).
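The percentile interval and the bootstrap standard error can be computed together. This sketch assumes hypothetical data (101 exponential waiting times with mean 1). `percentile_interval_and_se` is an illustrative name, and the i ≈ αB rounding rule is one choice among several.

```python
import random
import statistics

def percentile_interval_and_se(samples, stat, B=500, alpha=0.05, seed=None):
    """Bootstrap stat; return (M_base, (lower, upper), standard error).

    lower and upper are the alpha/2 and 1 - alpha/2 p-tiles of the B bootstrap
    replicates m*(1), ..., m*(B); the standard error is their sample standard deviation.
    """
    rng = random.Random(seed)
    n = len(samples)
    reps = sorted(stat(rng.choices(samples, k=n)) for _ in range(B))
    # 1-indexed order-statistic positions i ~ alpha * B, clamped to 1..B
    i_lo = min(max(round(alpha / 2 * B), 1), B)
    i_hi = min(max(round((1 - alpha / 2) * B), 1), B)
    return stat(samples), (reps[i_lo - 1], reps[i_hi - 1]), statistics.stdev(reps)

# Hypothetical data: 101 exponential waiting times with mean 1 (not from the slides)
gen = random.Random(5)
data = [gen.expovariate(1.0) for _ in range(101)]
m_base, (lo, hi), se = percentile_interval_and_se(data, statistics.median, seed=6)
print(f"M_base = {m_base:.3f}, 95% interval = ({lo:.3f}, {hi:.3f}), SE = {se:.3f}")
```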

PRACTICAL APPLICATION
Bootstrap samples are treated as independent; B ≈ 500.
Practical for essentially ANY sample statistic (known failures include extreme order statistics such as the sample maximum).
The spreadsheet Bootstrap.xls estimates the median and the IQR (X_0.75 − X_0.25) for IQ scores.
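The spreadsheet itself is not reproduced here. As a rough Python stand-in, the sketch below bootstraps the median and the IQR of hypothetical IQ-like scores. The scores are Normal draws with mean 100 and SD 15, not the Bootstrap.xls data. The helper names are illustrative, and the quartiles follow `statistics.quantiles`' default method.

```python
import random
import statistics

def iqr(xs):
    """IQR = X_0.75 - X_0.25, using statistics.quantiles' default ('exclusive') method."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1

def bootstrap_se(samples, stat, B=500, seed=None):
    """Bootstrap standard error of stat, treating the B resamples as independent."""
    rng = random.Random(seed)
    n = len(samples)
    reps = [stat(rng.choices(samples, k=n)) for _ in range(B)]
    return statistics.stdev(reps)

# Hypothetical IQ-like scores (Normal with mean 100, SD 15); not the Bootstrap.xls data
gen = random.Random(7)
scores = [gen.gauss(100, 15) for _ in range(50)]

for name, stat in [("median", statistics.median), ("IQR", iqr)]:
    se = bootstrap_se(scores, stat, B=500, seed=1)
    print(f"{name}: estimate {stat(scores):.1f}, bootstrap SE {se:.1f}")
```

The same `bootstrap_se` function works unchanged for any statistic passed in as `stat`, which is the practical appeal noted on the slide.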

IS BOOTSTRAPPING CHEATING?
Example: 100 real data points and 200 bootstrap samples; the statistic M is calculated for each bootstrap sample.
The standard error of Mbase (the estimate from the original, non-bootstrap data) is estimated as SE(Mbase) ≈ √[ Σ (M*i − Mbase)² / 199 ], where the sum runs over the 200 bootstrap values M*1, ..., M*200.
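The formula translates directly into code. This sketch assumes 100 hypothetical Normal(50, 10) data points and uses the mean as M so the result is easy to sanity-check: it should come out near 10/√100 = 1. `bootstrap_se_about_base` is an illustrative name. It centers the deviations on Mbase, as the slide does, whereas many texts center on the average of the M*i instead.

```python
import math
import random
import statistics

def bootstrap_se_about_base(samples, stat, B=200, seed=None):
    """Estimate SE(M_base) as sqrt(sum_i (M*_i - M_base)^2 / (B - 1)).

    Deviations are taken about M_base, as on the slide; the more common
    textbook version centres them on the mean of the M*_i instead.
    """
    rng = random.Random(seed)
    n = len(samples)
    m_base = stat(samples)
    reps = [stat(rng.choices(samples, k=n)) for _ in range(B)]
    return math.sqrt(sum((m - m_base) ** 2 for m in reps) / (B - 1))

gen = random.Random(11)
data = [gen.gauss(50.0, 10.0) for _ in range(100)]  # hypothetical 100 real data points
se = bootstrap_se_about_base(data, statistics.mean, B=200, seed=12)
print(round(se, 3))
```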

IS BOOTSTRAPPING CHEATING?
If we had 100 × 200 = 20,000 independent samples, we would pool them into one large sample to estimate M, and the standard error of that estimate would be far smaller than with 100 points.
As the number of bootstrap samples increases, the bootstrap standard error estimate stabilizes (near the standard error for a sample of size 100); it does not shrink.
As the number of independent samples increases, the standard error itself converges to 0!
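A small simulation shows the contrast between the two claims on this slide. This sketch assumes hypothetical standard Normal data and uses the median as the statistic; the helper names are illustrative. For Normal data, the standard error of the median is roughly √(π/2)·σ/√n ≈ 1.2533/√n, or about 0.125 for n = 100.

```python
import random
import statistics

def bootstrap_se(samples, stat, B, rng):
    """Standard deviation of stat over B bootstrap resamples of samples."""
    n = len(samples)
    return statistics.stdev([stat(rng.choices(samples, k=n)) for _ in range(B)])

def sampling_se(stat, n, reps, rng):
    """Standard deviation of stat over reps truly independent Normal(0, 1) samples of size n."""
    return statistics.stdev(
        [stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    )

rng = random.Random(2019)
base = [rng.gauss(0.0, 1.0) for _ in range(100)]  # hypothetical "real" data, n = 100

# More bootstrap resamples: the estimate settles down near the SE for n = 100.
boot = {B: bootstrap_se(base, statistics.median, B, rng) for B in (50, 500, 5000)}
# More independent data: the SE of the median itself shrinks (roughly like 1/sqrt(n)).
indep = {n: sampling_se(statistics.median, n, 200, rng) for n in (100, 400, 1600)}

for B, se in boot.items():
    print(f"B = {B:5d}: bootstrap SE of median = {se:.3f}")
for n, se in indep.items():
    print(f"n = {n:5d}: SE of median across independent samples = {se:.3f}")
```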

SUMMARY
Bootstrapping allows us to estimate the variability of sample statistics whose sampling distribution is unknown or intractable.