Bootstraps and Jackknives Hal Whitehead BIOL4062/5062.


Confidence in estimators
Why use bootstraps or jackknives?
The jackknife
The parametric bootstrap
The non-parametric bootstrap (“The bootstrap”)

Estimation without confidence (standard error, confidence interval) has little value

Confidence in estimates: Traditional approach
DATA + Biological model => Estimator (Statistic)
Estimator + Statistical model => Confidence in estimator?

Confidence in estimates: Traditional approach
e.g. What is the sex ratio of a vole population?
Trap: 12 males, 15 females
Estimated ratio (proportion male): 12/(12+15) = 0.444
Using the binomial distribution: SE = √[0.444 × (1 − 0.444)/(12+15)] = 0.096
So: sex ratio is estimated to be 0.444 (SE 0.096)
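
The arithmetic for the vole example can be checked in a few lines of Python. This is a minimal sketch; the variable names are illustrative and not taken from the slides.

```python
import math

males, females = 12, 15                   # trap counts from the slide
n = males + females
p_hat = males / n                         # estimated sex ratio (proportion male)
se = math.sqrt(p_hat * (1 - p_hat) / n)   # binomial standard error

print(f"ratio = {p_hat:.3f}, SE = {se:.3f}")   # ratio = 0.444, SE = 0.096
```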

e.g. Asymmetry of size among nestlings in nests of 6
Measure: difference between the size of a nestling and that of its most similar neighbour
{ } => [ ] = 0.58
But what confidence have we in this?

Confidence in estimator: Mean distance between animals
In a small population: what is the expected distance between any two animals?
Estimate is: mean of distances between all pairs of animals
What is confidence in this estimate?
–no easy formula (lack of independence)

Use Bootstraps and Jackknives when:
No clear biological model
Deriving statistical model
–very difficult, impossible, or tedious
Statistical model too complicated to be useful
Model may not be quite valid
Accurate measure of precision under statistical model only possible with large n

The Jackknife
Data D = {X_1, X_2, X_3, ..., X_n} => statistic s
Jackknife replicates miss out units (or groups of units) in turn:
–J1 = X_2, X_3, ..., X_n => statistic s_{-1} (missing unit 1)
–J2 = X_1, X_3, ..., X_n => statistic s_{-2} (missing unit 2)
–etc.
Convert into pseudovalues:
–φ_1 = n⋅s − (n−1)⋅s_{-1}
–φ_2 = n⋅s − (n−1)⋅s_{-2}
–etc.

The Jackknife
The Jackknifed Estimate of s is then:
–s_J = mean(φ_1, ..., φ_n)
–SE(s_J) = SD(φ_1, ..., φ_n)/√n
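
The pseudovalue recipe can be sketched in Python as follows. This is an illustration under assumptions, not code from the course: the function name `jackknife` and the sample `x` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def jackknife(data, statistic):
    """Return (jackknifed estimate, SE, pseudovalues) for statistic(data)."""
    data = np.asarray(data)
    n = len(data)
    s = statistic(data)                          # statistic on the full data
    # s_{-i}: statistic recomputed with unit i left out
    s_minus = np.array([statistic(np.delete(data, i, axis=0)) for i in range(n)])
    pseudo = n * s - (n - 1) * s_minus           # pseudovalues phi_i
    s_jack = pseudo.mean()                       # jackknifed estimate
    se = pseudo.std(ddof=1) / np.sqrt(n)         # SD(phi) / sqrt(n)
    return s_jack, se, pseudo

# Hypothetical sample, for illustration only
x = np.array([2.1, 3.4, 1.9, 5.6, 4.2, 3.8])
est, se, phi = jackknife(x, np.mean)
```

When the statistic is the mean, each pseudovalue reduces to the original observation, so the jackknife SE equals the familiar SD/√n. That makes the mean a convenient sanity check before applying the function to a less tractable statistic.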

The Jackknife
Jackknifed Estimate reduces bias (it removes the leading term, of order 1/n)
Jackknife SE “rough and ready”
–usually “conservative” (overestimates SE)
Jackknife on blocks of units, if data not independent
Assumes normality for confidence intervals

Correlation between gill weight and body weight in 12 crabs
[Table: Gill (mg), Body (g), r_{-i} and φ_i for each crab]
Jackknife r = mean(φ_i)
SE = SD(φ_i)/√12
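
For paired data such as these, the unit left out in each jackknife replicate is a whole crab (both measurements), not a single number. A minimal sketch, assuming NumPy; the 12 pairs below are made up for illustration and are not the values from the slide.

```python
import numpy as np

# Hypothetical (gill mg, body g) pairs, one row per crab
crabs = np.array([
    [150, 14.0], [180, 15.5], [95, 11.0], [50, 3.0],
    [380, 22.5], [235, 15.0], [105, 6.5], [315, 16.0],
    [85, 4.5], [215, 15.5], [325, 17.5], [200, 12.0],
])

def corr(rows):
    """Pearson correlation between the two columns."""
    return np.corrcoef(rows[:, 0], rows[:, 1])[0, 1]

n = len(crabs)
r = corr(crabs)                                   # full-sample correlation
# r_{-i}: correlation with crab i (an entire row) removed
r_minus = np.array([corr(np.delete(crabs, i, axis=0)) for i in range(n)])
phi = n * r - (n - 1) * r_minus                   # pseudovalues
r_jack = phi.mean()                               # jackknifed r
se = phi.std(ddof=1) / np.sqrt(n)                 # SD(phi) / sqrt(12)
```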

Bootstraps

Parametric Bootstrap
Assume Data produced by Model with some Parameters unknown, which need to be estimated:
–Model => Data => Parameter estimates (s)
The Bootstrap process:
–Model + Parameter estimates (s) => Random data => Bootstrap replicate estimates (s*)
Distribution of the bootstrap replicate estimates (the s* values) gives the distribution, confidence intervals and standard errors of s (plus an indicator of bias)
Usually use ,000 bootstrap replicates

Parametric Bootstrap: an example
Mark-Recapture Estimate
Mark 25 animals
Recapture 46, of which 12 marked
What is the population size?
“Petersen” estimate is 25 × 46/12 = 95.8
What is the confidence in this estimate, and its expected bias?

Parametric Bootstrap: an example
Mark-Recapture Estimate
Mark 25 animals; recapture 46, 12 marked
“Petersen” estimate is 25 × 46/12 = 95.8
What is the confidence, expected bias?
Parametric Bootstrap Replicates:
–96 animals, mark 25, recapture 46
–How many marked?
–From simulation: number marked in each replicate (m_s)
–Calculate population estimates (n_s = 25 × 46/m_s)

Parametric Bootstrap: an example
Mark-Recapture Estimate
“Petersen” estimate is 25 × 46/12 = 95.8
Bootstrap population estimates n_s (assuming n = 96)
Expected bias:
–mean(n_s) − 96 = 99.7 − 96 = 3.7
Estimated standard error:
–SD(n_s) = 20.4
So the bias-corrected population estimate is 95.8 − 3.7 = 92.1 (SE 20.4)
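
The mark-recapture replicates can be simulated as below. This is a sketch under stated assumptions: the slides do not name the sampling model, so recapture is treated here as drawing 46 animals without replacement from 96 (a hypergeometric draw). The seed and the 10,000 replicates are arbitrary choices, so the results will be close to, but not identical with, the figures on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable

marked, recaptured, observed_marked = 25, 46, 12
n_hat = marked * recaptured / observed_marked    # Petersen estimate, about 95.8
N = round(n_hat)                                 # assumed true population size: 96

B = 10_000                                       # number of bootstrap replicates (assumption)
# m_s: marked animals among 46 drawn without replacement from N (assumed model)
m_s = rng.hypergeometric(ngood=marked, nbad=N - marked, nsample=recaptured, size=B)
m_s = m_s[m_s > 0]                               # guard against division by zero
n_s = marked * recaptured / m_s                  # replicate Petersen estimates

bias = n_s.mean() - N                            # expected bias
se = n_s.std(ddof=1)                             # estimated standard error
corrected = n_hat - bias                         # bias-corrected estimate
print(f"bias = {bias:.1f}, SE = {se:.1f}, corrected = {corrected:.1f}")
```

The bias is positive because the estimate depends on 1/m_s: replicates with few marked recaptures inflate the estimate more than replicates with many marked recaptures deflate it.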

Non-Parametric Bootstrap (A.K.A. “The Bootstrap”)
Data D = X_1, X_2, X_3, ..., X_n => statistic s
Bootstrap replicates:
–D*1 = X*_1, X*_2, X*_3, ..., X*_n => statistic s*1
–D*2 = X*_1, X*_2, X*_3, ..., X*_n => statistic s*2
–...
X*_1, X*_2, X*_3, ..., X*_n are randomly selected, with replacement, from X_1, X_2, X_3, ..., X_n
Distribution, confidence interval and SE of s are estimated from the distribution, confidence interval and standard error of the s* values
Usually use ,000 bootstrap replicates
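
Resampling with replacement can be sketched as follows. The function name `bootstrap`, its default of 1000 replicates and the sample `x` are assumptions made for illustration; NumPy is assumed to be available.

```python
import numpy as np

def bootstrap(data, statistic, n_boot=1000, seed=0):
    """Return statistic recomputed on n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # n unit indices, drawn with replacement
        reps[b] = statistic(data[idx])     # s* for this replicate
    return reps

# Hypothetical sample, for illustration only
x = np.array([3.1, 4.7, 2.2, 8.9, 5.0, 6.3, 4.1, 7.5])
reps = bootstrap(x, np.median)
se = reps.std(ddof=1)                      # bootstrap standard error of the median
bias = reps.mean() - np.median(x)          # indicator of bias
```

Resampling indices rather than values means the same function also works when each unit is a row of several measurements.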

Non-Parametric Bootstrap: an example: Median Gill Weight in Crabs
Gill weights (in mg): Median = 195 mg
[Table: median of the real data and of each bootstrap replicate]

Non-Parametric Bootstrap: an example: Median Gill Weight in Crabs
Gill weights (in mg): Median = 195 mg
Mean of the bootstrap medians (1000 samples) = 188 mg
95% c.i. = [b(25), b(975)] mg, where b(k) is the k-th smallest of the 1000 bootstrap medians
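
The percentile interval [b(25), b(975)] comes from sorting the 1000 bootstrap medians and reading off the 25th and 975th values. A minimal sketch, assuming NumPy; the 12 gill weights are made up for illustration and are not the data from the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical gill weights (mg), for illustration only
weights = np.array([52, 88, 110, 120, 150, 175, 205, 225, 240, 310, 330, 390])

B = 1000
medians = np.sort([
    np.median(rng.choice(weights, size=len(weights), replace=True))
    for _ in range(B)
])

# b(25) and b(975) in the slide's 1-based notation are indices 24 and 974 here
lower, upper = medians[24], medians[974]
boot_mean = medians.mean()                 # mean of the bootstrap medians
print(f"median = {np.median(weights)}, 95% c.i. = [{lower}, {upper}]")
```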

Bootstraps in Molecular Genetics
Calculate tree based on genetic data
–(e.g. 20 species and 300 loci)
For each bootstrap replicate:
–Resample loci with replacement
–(20 species with 300 loci, some repeats)
–Calculate tree
Look at agreement between original and bootstrap trees
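
The key point is that the resampled units are loci (columns), while every species (row) is kept in each replicate. The sketch below illustrates only that resampling step. Real tree building is replaced by a deliberately simple stand-in, the closest pair of species, and the genetic data are simulated; both are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_species, n_loci = 20, 300
# Simulated data: one row per species, one column per locus (states coded 0-3)
genes = rng.integers(0, 4, size=(n_species, n_loci))
# Make species 1 a close relative of species 0: identical except at 30 loci
genes[1] = genes[0]
genes[1, :30] = rng.integers(0, 4, size=30)

def closest_pair(data):
    """Stand-in for tree building: the pair of species differing at the fewest loci."""
    diff = (data[:, None, :] != data[None, :, :]).mean(axis=2)   # pairwise distances
    np.fill_diagonal(diff, np.inf)                               # ignore self-comparisons
    i, j = np.unravel_index(np.argmin(diff), diff.shape)
    return (int(min(i, j)), int(max(i, j)))

original = closest_pair(genes)             # grouping found in the original data
B = 200
agree = 0
for _ in range(B):
    cols = rng.integers(0, n_loci, size=n_loci)   # resample loci with replacement
    agree += closest_pair(genes[:, cols]) == original
support = agree / B                        # fraction of replicates that agree
```

With a real phylogenetics package, `closest_pair` would be replaced by a tree-building routine and the agreement would be tallied per clade.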

Bootstrapped spanning tree Glazko & Nei Mol. Biol. Evol. 2003

Bootstraps
–“Better” estimate of confidence
–Variable n
–Self-comparisons a problem (e.g. mean of associations)
–Gives SEs, confidence intervals and profile of confidence
Jackknives
–“Worse” estimate of confidence: usually conservative (underestimates precision)
–Fixed n
–Self-comparisons not a problem
–Reduces bias
–Only directly gives SE; confidence intervals need assumption of normality

Bootstraps and Jackknives
Give estimates of confidence (and bias) when:
–distributions unknown, approximate, or intractable
Parametric bootstrap
–very useful if model known
–needs programming
Non-parametric bootstrap
–widely applicable (except self-referencing situations)
–few assumptions
Jackknife
–approximate
–only standard error given directly
–useful when bootstrap not applicable