Today: Quiz 11: review. Last quiz! Wednesday: guest lecture, Multivariate Analysis. Friday: last lecture, review; bring questions. Dec 8, 9 am: FINAL EXAM, EN 2007.


Resampling

Resampling Introduction We have relied on idealized models of the origins of our data (ε ~ N) to make inferences. But these models can be inadequate. Resampling techniques allow us to base the analysis of a study solely on the design of that study, rather than on a poorly fitting model.

Resampling Why resampling? Fewer assumptions – e.g., resampling methods do not require that distributions be Normal or that sample sizes be large. Generality: resampling methods are remarkably similar for a wide range of statistics and do not require new formulas for every statistic. Promote understanding: bootstrap procedures build intuition by providing concrete analogies to theoretical concepts.

Resampling Collection of procedures to make statistical inferences without relying on parametric assumptions: - bias - variance, measures of error - parameter estimation - hypothesis testing

Resampling Procedures Permutation (randomization) Bootstrap Jackknife Monte Carlo techniques

Resampling With replacement Without replacement

Resampling permutation

If the number of samples in each group = 10 and n groups = 2, the number of permutations = C(20,10) = 184,756. If n groups = 3, the number of permutations > billions (30!/(10!)³ ≈ 5.6 × 10¹²).
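These counts can be verified directly with the standard library (a quick sketch; the group sizes are the ones above):

```python
from math import comb, factorial

# Two groups of 10: ways to split 20 observations into the two groups
two_groups = comb(20, 10)
print(two_groups)  # 184756

# Three groups of 10: multinomial coefficient 30! / (10! * 10! * 10!)
three_groups = factorial(30) // factorial(10) ** 3
print(three_groups)  # > 5 trillion
```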

Resampling Bootstrap "to pull oneself up by one's bootstraps" – The Surprising Adventures of Baron Munchausen (1781) by Rudolf Erich Raspe. Bradley Efron, 1979.

Resampling Bootstrap Hypothesis testing, parameter estimation, assigning measures of accuracy to sample estimates (e.g., SE, CI). Useful when: - formulas for parameter estimates are based on assumptions that are not met - computational formulas are only valid for large samples - computational formulas do not exist

Resampling Bootstrap Assume that sample is representative of population Approximate the distribution of the population by repeatedly resampling (with replacement) from the sample

Resampling Bootstrap

Non-parametric bootstrap: resample observations from the original sample. Parametric bootstrap: fit a particular model to the data and then use this model to produce bootstrap samples.
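The distinction can be sketched in a few lines of Python; the data and the normal model here are hypothetical, purely for illustration:

```python
import random
import statistics

random.seed(0)
data = [9.1, 10.4, 8.7, 11.2, 10.0, 9.5, 10.8, 9.9]  # hypothetical sample

# Non-parametric: resample the observations themselves, with replacement
nonparam_resample = random.choices(data, k=len(data))

# Parametric: fit a model (here a normal distribution is assumed),
# then draw a bootstrap sample from the fitted model
mu = statistics.mean(data)
sigma = statistics.stdev(data)
param_resample = [random.gauss(mu, sigma) for _ in data]

print(len(nonparam_resample), len(param_resample))  # 8 8
```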

Confidence intervals Non Parametric Bootstrap Large Capelin Small Capelin Other Prey

Confidence intervals Non Parametric Bootstrap 74 lc → %N_lc = 48.3%; 76 sc → %N_sc = 49.6%; 3 ot → %N_ot = 1.9%. What about uncertainty around the point estimates? Bootstrap!

Confidence intervals Non Parametric Bootstrap Bootstrap: 153 balls, each with a tag: 76 sc, 74 lc, 3 ot
- Draw 153 random samples (with replacement) and record the tag
- Calculate %N_lc*1, %N_sc*1, %N_ot*1
- Repeat nboot = 50,000 times: (%N_lc*1, %N_sc*1, %N_ot*1), (%N_lc*2, %N_sc*2, %N_ot*2), …, (%N_lc*nboot, %N_sc*nboot, %N_ot*nboot)
- Sort the %N_i*b → UCL = 48,750th %N_i*b (0.975); LCL = 1,250th %N_i*b (0.025)
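The ball-drawing scheme above can be sketched as follows (a hypothetical implementation; the number of resamples is kept small here for speed, and the exact interval endpoints will vary with the random seed):

```python
import random

random.seed(42)

# 153 tagged balls: 74 large capelin, 76 small capelin, 3 other prey
sample = ["lc"] * 74 + ["sc"] * 76 + ["ot"] * 3

nboot = 5000  # number of bootstrap resamples (kept small for speed)
boot_props = []
for _ in range(nboot):
    resample = random.choices(sample, k=len(sample))  # draw with replacement
    boot_props.append(resample.count("lc") / len(sample))

# Percentile confidence interval for %N_lc
boot_props.sort()
lcl = boot_props[int(0.025 * nboot)]
ucl = boot_props[int(0.975 * nboot)]
print(round(lcl, 3), round(ucl, 3))
```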

Confidence intervals Parametric Bootstrap

Params: β, α Confidence intervals Parametric Bootstrap

Params: β*_1, α*_1 Confidence intervals Parametric Bootstrap

Params: β*_2, α*_2 Confidence intervals Parametric Bootstrap

Params: β*_nboot, α*_nboot

Params: β*_1, β*_2, …, β*_nboot. Construct a confidence interval for β: 1. Percentile method 2. Bias-corrected percentile confidence limits 3. Accelerated bias-corrected percentile limits 4. Bootstrap-t 5, 6, …: other methods. Confidence intervals Parametric Bootstrap
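The whole loop from the last few slides, using the percentile method (option 1), might look like this in Python. The data, the straight-line model, and the `fit_line` helper are hypothetical, for illustration only:

```python
import random

random.seed(1)

# Hypothetical data from a straight-line model y = alpha + beta * x + error
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 2.9, 4.1, 4.8, 5.2, 6.7, 7.1, 8.2, 8.8, 10.1]
n = len(x)

def fit_line(xs, ys):
    """Least-squares estimates of (alpha, beta)."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxx = sum((a - xbar) ** 2 for a in xs)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    beta = sxy / sxx
    return ybar - beta * xbar, beta

alpha_hat, beta_hat = fit_line(x, y)

# Residual standard deviation of the fitted model
resid = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
s = (sum(r * r for r in resid) / (n - 2)) ** 0.5

# Parametric bootstrap: simulate new responses from the fitted model,
# refit, and collect the beta* replicates
nboot = 2000
betas = []
for _ in range(nboot):
    y_star = [alpha_hat + beta_hat * xi + random.gauss(0, s) for xi in x]
    betas.append(fit_line(x, y_star)[1])

# Percentile method: take quantiles of the sorted beta* values
betas.sort()
lcl = betas[int(0.025 * nboot)]
ucl = betas[int(0.975 * nboot)]
print(round(lcl, 3), round(beta_hat, 3), round(ucl, 3))
```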

Bootstrap CI – Percentile Method

Bootstrap Caveats: independence; incomplete data; outliers; cases where small perturbations to the data-generating process produce big swings in the sampling distribution.

Resampling Jackknife Quenouille 1956; Tukey 1958. Estimate bias and variance of a statistic. Concept: leave one observation out and recompute the statistic.
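A minimal leave-one-out sketch (hypothetical data). For the sample mean, the jackknife standard error reproduces the familiar s/√n exactly, which makes a handy sanity check:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)

# Leave one observation out and recompute the statistic (here: the mean)
loo_means = [statistics.mean(data[:i] + data[i + 1:]) for i in range(n)]
loo_bar = statistics.mean(loo_means)

# Jackknife standard error: sqrt((n-1)/n * sum((theta_i - theta_bar)^2))
se_jack = ((n - 1) / n * sum((m - loo_bar) ** 2 for m in loo_means)) ** 0.5

# Sanity check: for the mean this equals the textbook s / sqrt(n)
se_usual = statistics.stdev(data) / n ** 0.5
print(round(se_jack, 4), round(se_usual, 4))  # the two agree
```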

Cross-validation Jackknife Assess the performance of the model: how accurately will the model predict a new observation?
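A leave-one-out cross-validation sketch, assuming a deliberately trivial "model" that predicts each held-out point by the mean of the remaining observations (data hypothetical):

```python
# Hypothetical data; the "model" predicts a held-out observation
# by the mean of the remaining observations
data = [3.1, 2.8, 3.6, 4.0, 2.9, 3.3, 3.8, 3.5]
n = len(data)

squared_errors = []
for i in range(n):
    train = data[:i] + data[i + 1:]       # leave observation i out
    prediction = sum(train) / len(train)  # "fit" on the rest
    squared_errors.append((data[i] - prediction) ** 2)

cv_mse = sum(squared_errors) / n  # cross-validated prediction error
print(round(cv_mse, 4))
```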

Cross-validation Jackknife

Jackknife Bootstrap Differences Both estimate the variability of a statistic between subsamples. The jackknife provides an estimate of the variance of an estimator. The bootstrap first estimates the distribution of the estimator; from this distribution, we can estimate the variance. Using the same data set: bootstrap results will always be (slightly) different; jackknife results will always be identical.
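The last point (same data set, identical jackknife results but slightly different bootstrap results) can be checked directly; the data here are hypothetical and the median serves as the statistic:

```python
import random
import statistics

data = [12.1, 9.8, 14.2, 10.5, 11.7, 13.3, 9.1, 12.8]  # hypothetical

def jackknife_medians(xs):
    """Leave-one-out medians: the same subsamples every time."""
    return [statistics.median(xs[:i] + xs[i + 1:]) for i in range(len(xs))]

def bootstrap_se(xs, seed, nboot=1000):
    """SD of medians over random resamples: depends on the seed."""
    rng = random.Random(seed)
    meds = [statistics.median(rng.choices(xs, k=len(xs))) for _ in range(nboot)]
    return statistics.stdev(meds)

# Jackknife: rerunning gives identical results
assert jackknife_medians(data) == jackknife_medians(data)

# Bootstrap: rerunning (here, with different seeds) gives slightly
# different results
sd1 = bootstrap_se(data, seed=1)
sd2 = bootstrap_se(data, seed=2)
print(round(sd1, 3), round(sd2, 3))  # typically close but not identical
```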

Resampling Monte Carlo Stanisław Ulam John von Neumann Mid 1940s “The first thoughts and attempts I made to practice [the Monte Carlo Method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays.”

Resampling Monte Carlo Monte Carlo methods (plural): there is not just one, and there is no clear consensus on how they should be defined. What they have in common: repeated sampling from populations with known characteristics, i.e., we assume a distribution and create random samples that follow that distribution, then compare our estimated statistic to the distribution of outcomes.
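In the spirit of Ulam's solitaire question, here is a toy Monte Carlo with a known answer (the "game" is invented for illustration): shuffle a deck, count how often the top card is an ace, and compare the estimate to the exact probability 4/52.

```python
import random

random.seed(7)

# Toy "game": shuffle a 52-card deck; a play "succeeds" if the top card
# is an ace. Exact success probability: 4/52.
ntrials = 100_000
deck = list(range(52))  # cards 0..51; call 0-3 the aces
wins = 0
for _ in range(ntrials):
    random.shuffle(deck)
    if deck[0] < 4:
        wins += 1

estimate = wins / ntrials
print(round(estimate, 4), round(4 / 52, 4))  # simulated vs exact
```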

Monte Carlo Goal: assess the robustness of constant escapement and constant harvest rate policies with respect to management error for Pacific salmon fisheries

Monte Carlo Spawner-return dynamic model: constant harvest rate; constant escapement; stochastic production variation; management error. Outputs: average catch, CV of catch.

Discussion Was the bootstrap example I showed parametric or non-parametric? Could you think of an example of the other case? So, what's the difference between a bootstrap and a Monte Carlo?

74 large capelin, 76 small capelin, 3 other prey. Bootstrap samples: 153 balls, each with a tag: 76 sc, 74 lc, 3 ot. Draw 153 random samples (with replacement) and record the tag. Repeat nboot times.

Discussion In principle both the parametric and the non-parametric bootstrap are special cases of Monte Carlo simulations used for a very specific purpose: to estimate some characteristics of the sampling distribution. The idea behind the bootstrap is that the sample is an estimate of the population, so an estimate of the sampling distribution can be obtained by drawing many samples (with replacement) from the observed sample and computing the statistic in each new sample.

Monte Carlo simulations are more general: basically, the term refers to repeatedly creating random data in some way, doing something to that random data, and collecting some results. This strategy could be used to estimate some quantity, as in the bootstrap, but also to theoretically investigate some general characteristic of an estimator which is hard to derive analytically.

In practice it would be pretty safe to presume that whenever someone speaks of a Monte Carlo simulation they are talking about a theoretical investigation, e.g. creating random data with no empirical content whatsoever to investigate whether an estimator can recover known characteristics of this random 'data', while the (parametric) bootstrap refers to an empirical estimation. The fact that the parametric bootstrap implies a model should not worry you: any empirical estimate is based on a model.

Hope this helps, Maarten (Maarten L. Buis)