Neuroinformatics 1.1: the bootstrap


Neuroinformatics 1.1: the bootstrap Kenneth D. Harris UCL, 1/9/18

Types of data analysis

Exploratory analysis
- Graphical, interactive
- Aimed at formulating hypotheses
- No rules – whatever helps you find a hypothesis

Confirmatory analysis
- For testing hypotheses once they have been formulated
- Several frameworks for testing hypotheses
- Rules need to be followed

Confidence interval

Probability distribution characterized by parameter θ: p(x; θ).
Classical statistics: x is random, but θ is not. θ has a true value, which we don't know.
We don't want to make incorrect statements more than 5% of the time.
Confidence interval: from data x, compute an interval [θ_l(x), θ_u(x)] such that θ_l(x) < θ < θ_u(x) with 95% probability (whatever the actual value of θ).

How to compute a confidence interval

Most often:
- Assume that p(x; θ) belongs to a known distribution family (e.g. Gaussian, Poisson)
- Look up the formula for the confidence interval in a textbook, or use standard software

Assumptions:
- Your assumed distribution is appropriate
- (Often) the sample is sufficiently large

The bootstrap

An alternative way to compute confidence intervals that does not require assuming a form for p(x; θ).

"… I found myself stunned, and in a hole nine fathoms under the grass, when I recovered, hardly knowing how to get out again. Looking down, I observed that I had on a pair of boots with exceptionally sturdy straps. Grasping them firmly, I pulled with all my might. Soon I had hoist myself to the top and stepped out on terra firma without further ado." – Singular Travels, Campaigns and Adventures of Baron Munchausen, ed. J. Carswell, 1948

Use the bootstrap with caution

It looks simple, but:
- There are many subtly different variants of the bootstrap
- Different variants work in different situations
- Often they give false-positive errors (without warning)

Like Baron Munchausen's way of getting out of a hole, the bootstrap is not guaranteed to work in all circumstances.

Bootstrap resampling

Original sample: x_1, x_2, …, x_n.
Resample with replacement: choose n random integers i_1, i_2, …, i_n between 1 and n, and create the resampled data set x_{i_1}, x_{i_2}, …, x_{i_n}.
For example: (x_1, x_2, x_3, x_4, x_5) → (x_2, x_2, x_4, x_4, x_5)
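As a minimal sketch (plain Python, no particular library; the function name is illustrative), resampling with replacement looks like:

```python
import random

def resample(x, rng=random):
    """Draw one bootstrap resample: n items picked from x with replacement."""
    n = len(x)
    return [x[rng.randrange(n)] for _ in range(n)]

sample = [1, 2, 3, 4, 5]
boot = resample(sample)
# boot has the same length as the original, but individual items
# may appear several times while others are left out entirely.
```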

Simplest method: the "percentile bootstrap"

- Given an estimator θ̂ of parameter θ (e.g. sample mean, sample variance)
- Make B bootstrap resamples (at least several thousand)
- Compute the confidence interval as the 2.5th and 97.5th percentiles of the distribution of θ̂ computed from these resamples.
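The steps above can be sketched as follows, for an arbitrary estimator (function and parameter names are illustrative, not from the slides):

```python
import random
import statistics

def percentile_bootstrap_ci(x, estimator, B=5000, alpha=0.05, rng=None):
    """Percentile bootstrap: the (alpha/2) and (1 - alpha/2) percentiles
    of the estimator evaluated on B resamples of x."""
    rng = rng or random.Random()
    n = len(x)
    stats = sorted(
        estimator([x[rng.randrange(n)] for _ in range(n)]) for _ in range(B)
    )
    lo = stats[int((alpha / 2) * B)]
    hi = stats[int((1 - alpha / 2) * B) - 1]
    return lo, hi

rng = random.Random(1)
data = [rng.gauss(10, 2) for _ in range(50)]
lo, hi = percentile_bootstrap_ci(data, statistics.mean, B=2000, rng=rng)
```

For a well-behaved statistic such as the sample mean, the interval brackets the sample estimate and shrinks as n grows.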

An example

… of why you have to be careful.
We observe a set of angles θ_i. Are they drawn from a uniform distribution?
Naïve application of the bootstrap to compute a confidence interval for vector strength gives an incorrect result with 100% probability.

Circular mean

Treat angles as points on the unit circle: z_i = e^{iθ_i}.
The mean of these, z̄ = R e^{iθ̄}, gives you:
- Circular mean θ̄
- Vector strength R

If all angles are the same: θ̄ is this angle, and R is 1.
If the angles are completely uniform: R is 0, and θ̄ is meaningless.
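A short sketch of these definitions (function name is illustrative): average the unit vectors e^{iθ} and read off the phase and modulus.

```python
import cmath
import math

def circular_stats(angles):
    """Mean resultant vector of the unit vectors e^{i*theta}.
    Returns (circular mean, vector strength R)."""
    z = sum(cmath.exp(1j * t) for t in angles) / len(angles)
    return cmath.phase(z), abs(z)

# All angles equal: the circular mean is that angle, and R = 1.
mean_eq, R_eq = circular_stats([0.5, 0.5, 0.5])

# Angles spread evenly around the circle: the vectors cancel, so R = 0.
mean_u, R_u = circular_stats([0, math.pi / 2, math.pi, 3 * math.pi / 2])
```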

Bootstrap resamples of vector strength

[Figure: angles e^{iθ} on the unit circle, their circular mean, the bootstrap resamples, and the resulting 95% confidence interval.]

The actual vector strength was zero.
There is a 0% chance that zero will fall within the bootstrap confidence interval.
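The failure can be reproduced numerically. In this hypothetical sketch, the angles are drawn from a truly uniform distribution, so the true vector strength is 0; yet every resampled R is strictly positive (exact cancellation of randomly resampled unit vectors essentially never happens), so the percentile interval excludes the true value every time.

```python
import cmath
import math
import random

def vector_strength(angles):
    """Modulus of the mean of the unit vectors e^{i*theta}."""
    return abs(sum(cmath.exp(1j * t) for t in angles)) / len(angles)

rng = random.Random(0)
n = 100
angles = [rng.uniform(0, 2 * math.pi) for _ in range(n)]  # uniform: true R = 0

# Naive percentile bootstrap of the vector strength.
B = 2000
Rs = sorted(
    vector_strength([angles[rng.randrange(n)] for _ in range(n)]) for _ in range(B)
)
ci = (Rs[int(0.025 * B)], Rs[int(0.975 * B) - 1])

# The lower end of the interval is above 0, so the true value R = 0
# lies outside the confidence interval: a guaranteed false positive.
```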

Why did it go wrong?

- Vector strength is a biased statistic: its expected value exceeds the true value.
- The bias gets worse the smaller the sample size.
- Bootstrapping makes the effective sample size even smaller (each resample repeats some points and omits others).
- There are variants of the bootstrap that make this kind of mistake less often, but you need to know exactly when to use which version.

Bootstrap vs. permutation test

Permutation test: does the observed statistic fall inside the 95% interval of the null distribution?
Bootstrap: does the null value fall inside the 95% interval of the bootstrap distribution?
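For contrast with the bootstrap, here is a sketch of a two-sample permutation test (the function name and the difference-of-means statistic are illustrative choices): the null distribution is built by shuffling the group labels, and we ask where the observed statistic falls in it.

```python
import random
import statistics

def permutation_pvalue(x, y, n_perm=5000, rng=None):
    """Two-sided permutation test for a difference of means:
    compare the observed difference against the label-shuffled null."""
    rng = rng or random.Random()
    observed = statistics.mean(x) - statistics.mean(y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(x)]) - statistics.mean(pooled[len(x):])
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [rng.gauss(3, 1) for _ in range(30)]
p = permutation_pvalue(x, y, n_perm=1000, rng=rng)
# With clearly separated groups, p is near the smallest attainable value.
```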

When to use the bootstrap

- When you can't use a traditional method (e.g. a permutation test)
- When you actually understand the conditions for a particular bootstrap variant to give valid results
- When you can prove these conditions hold in your circumstance

When NOT to use the bootstrap When you tried a traditional test, but it gave you p>0.05