Neuroinformatics 1: review of statistics
Kenneth D. Harris, UCL, 28/1/15

Types of data analysis
Exploratory analysis: graphical and interactive, aimed at formulating hypotheses. There are no rules; use whatever helps you find a hypothesis.
Confirmatory analysis: for testing hypotheses once they have been formulated. There are several frameworks for testing hypotheses, and their rules need to be followed. In principle, you should collect a new data set for confirmatory analysis. (For drug trials, this really matters. For basic research, people usually don’t bother.)

Exploratory analysis
In low dimensions: histograms, scatterplots, bar charts.
In high dimensions: scatterplot matrices, dimensionality reduction (PCA etc.), cluster analysis.
Exploratory analysis does NOT confirm a hypothesis. It CAN go into a paper, and should, provided you also do confirmatory analysis.
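As a rough illustration of the dimensionality-reduction step, here is a minimal plain-Python sketch that finds the first principal component by power iteration on the sample covariance matrix. The function name `first_principal_component` and the use of power iteration (rather than the full eigendecomposition or SVD a library would normally use) are illustrative assumptions.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, variance) of the first principal component.

    rows: list of observations, each a list of d numbers.
    Uses power iteration on the sample covariance matrix (n - 1 denominator).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix cov[j][k]
    cov = [[sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(d)]
           for j in range(d)]
    # Start from the normalized all-ones vector; this only fails if it is
    # exactly orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v = variance of the data along v
    variance = sum(v[j] * cov[j][k] * v[k] for j in range(d) for k in range(d))
    return v, variance


# Points lying exactly on the line y = 2x
data = [[t, 2.0 * t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
direction, var = first_principal_component(data)
print(direction, var)  # direction ~ (1, 2)/sqrt(5), variance ~ 12.5
```

Projecting each centered observation onto `direction` gives its score on the first component; plotting those scores is one simple way to look at high-dimensional data.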

Interactive data exploration with GGobi

Confirmatory analysis
We will discuss three types of confirmatory analysis: classical hypothesis tests (p-values), model selection with cross-validation, and Bayesian inference.
Most analyses have a natural “summary plot” to go with them: for a correlation, a scatter plot; for an ANOVA, a bar chart. Ideally, the summary plot makes the outcome of the hypothesis test obvious.

The “illustrative example”
Show a single example of the phenomenon you are measuring. Pick it carefully, because readers will take it far too literally.

Building from illustrative example to summary plot
(Example figure from Xue, Atallah, Scanziani, Nature 2014.)

Classical hypothesis testing
Null hypothesis: what you are trying to disprove.
Test statistic: a number you compute from the data.
Null distribution: the distribution of the test statistic if the null hypothesis is true.
p-value: the probability of getting a test statistic at least as extreme as the one you saw, if the null hypothesis is true.
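To make the four terms concrete, here is a minimal sketch built on a hypothetical coin-flipping experiment (not from the lecture): the null hypothesis is that the coin is fair, the test statistic is the number of heads, the null distribution is Binomial(n, 0.5), and the p-value is the one-sided probability of at least as many heads as observed. The function name is illustrative.

```python
from math import comb


def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null)."""
    # Null distribution: probability of each possible number of heads
    null_pmf = [comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
                for k in range(flips + 1)]
    # p-value: null probability of outcomes at least as extreme as the observed one
    return sum(null_pmf[heads:])


print(binomial_p_value(9, 10))  # 11/1024 = 0.0107421875
```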

T-test
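The slide does not say which variant of the t-test is meant; as an assumption, here is a sketch of the classic two-sample Student's t-test with a pooled variance estimate, in plain Python. The two-sided p-value comes from the t distribution with n1 + n2 - 2 degrees of freedom, computed here by integrating its density numerically with Simpson's rule; in practice a library routine such as SciPy's `scipy.stats.ttest_ind` would normally be used instead. Function names are illustrative.

```python
import math
from statistics import mean, variance


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_p_value(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(a + i * h, df)
    central = total * h / 3
    return max(0.0, 1.0 - 2.0 * central)


def student_t_test(x, y):
    """Two-sample t-test assuming equal variances; returns (t, df, p)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, two_sided_t_p_value(t, df)


print(student_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # t ~ -3.674, df = 4
```

The pooled-variance form assumes both groups share one variance; when that is doubtful, Welch's version (separate variances) is the usual alternative.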

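What a p-value is NOT
“I have spent a lot of time with reading figure 4 but I am still not convinced how conclusive the effect is. While I totally buy that the probability of the two variables having zero correlations is P=0.008 …” (Anonymous reviewer, Nature magazine)
The reviewer’s error: P=0.008 is the probability of seeing a correlation at least this strong if the true correlation were zero, not the probability that the true correlation is zero.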

What a hypothesis test is NOT
Failure to disprove a null hypothesis tells you nothing at all. It does not tell you the null hypothesis is true.
Hypothesis tests should not falsely reject the null hypothesis very often (1 time in 20 at the conventional threshold). They never falsely confirm the null hypothesis, because they never confirm the null hypothesis.
There is nothing magic about the number 0.05; it is a convention.
(Figure: hippocampal pyramidal cells, Hirase et al., PNAS 2001.)
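A quick simulation makes the “1 time in 20” concrete. This sketch assumes a one-sample z-test on Gaussian data with a known standard deviation of 1 (an illustrative choice, not from the lecture): when the null hypothesis (mean 0) is true, the fraction of simulated experiments with p < 0.05 should come out close to 0.05.

```python
import math
import random


def z_test_p_value(sample):
    """Two-sided p-value for mean == 0, assuming a known standard deviation of 1."""
    z = sum(sample) / math.sqrt(len(sample))  # sample mean divided by 1/sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_experiments=20000, n=20, alpha=0.05, seed=0):
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # The null hypothesis is true: the data really have mean 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_experiments


print(false_positive_rate())  # close to 0.05
```

The remaining 95% of experiments fail to reject, but that failure is not evidence that the mean is exactly 0; a small true effect would produce much the same outcome.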

Assumptions made by hypothesis tests
Many tests have specific assumptions, e.g. a large sample or a Gaussian distribution. Check these on a case-by-case basis; this matters most when your p-value is marginal.
Nearly all tests make one additional, major assumption: that the samples are independent and identically distributed (IID). Think carefully about whether this holds.

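Example: correlation of correlations
The IID assumption is violated here (even when the diagonal elements are excluded), because each matrix entry shares its variables with many other entries. As a result, Pearson and Spearman correlation tests give false positive results much more often than 1 time in 20 (39.4% and 26.2% respectively, for the chosen parameters). Exercise: simulate this.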
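One way to attempt the exercise, under illustrative assumptions that are not the lecture's (so the rate will not match the slide's numbers): in each of two independent data sets, N variables share a common latent signal with randomly drawn loadings, which gives every correlation matrix row-by-row structure. The off-diagonal entries of the two matrices are then compared with a Pearson test that wrongly treats the N(N-1)/2 entries as IID, using the Fisher z approximation for the p-value. Because the data sets are independent, the null hypothesis is true, yet the false-positive rate should come out well above 5%. All names and parameter values are hypothetical.

```python
import math
import random


def unit_center(xs):
    """Subtract the mean and scale to unit length, so dot products are correlations."""
    m = sum(xs) / len(xs)
    d = [x - m for x in xs]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    return sum(x * y for x, y in zip(unit_center(a), unit_center(b)))


def off_diagonal_correlations(n_vars, n_samples, rng):
    """Simulate one data set; return its correlation-matrix entries with i < j."""
    latent = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    rows = []
    for _ in range(n_vars):
        weight = rng.uniform(0.0, 2.0)  # random loading on the shared signal
        rows.append(unit_center([weight * s + rng.gauss(0.0, 1.0) for s in latent]))
    return [sum(x * y for x, y in zip(rows[i], rows[j]))
            for i in range(n_vars) for j in range(i + 1, n_vars)]


def naive_p_value(r, m):
    """Two-sided p-value for a correlation r over m points, treating them as IID."""
    z = math.atanh(r) * math.sqrt(m - 3)  # Fisher z approximation
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_reps=200, n_vars=15, n_samples=100, alpha=0.05, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Two independent data sets, so the null hypothesis is true
        c1 = off_diagonal_correlations(n_vars, n_samples, rng)
        c2 = off_diagonal_correlations(n_vars, n_samples, rng)
        if naive_p_value(pearson(c1, c2), len(c1)) < alpha:
            hits += 1
    return hits / n_reps


print(false_positive_rate())  # fraction of false positives from the naive test
```

Intuitively, the effective sample size is closer to the number of variables than to the number of matrix entries, which is why the naive test is overconfident.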

Permutation test
Fisher, “Mathematics of a Lady Tasting Tea”, 1935; Lehmann & Stein, Ann. Math. Stat. 1949; Hoeffding, Ann. Math. Stat. 1952.

Example: correlation of correlations
Solution: a shuffling method that randomly permutes the variables (rows and columns together) of the second matrix (p=0.126 in this example). The test statistic can be the Pearson correlation, or whatever you like. We will see many ways to shuffle spike trains etc. later.
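A sketch of the shuffling solution, assuming the two correlation matrices are given as square nested lists with the same variable order. The variables of the second matrix are randomly relabelled (rows and columns permuted together), the statistic is recomputed for each shuffle, and the p-value is the fraction of shuffles at least as extreme as the observed value. Counting the observed arrangement as one of the permutations, giving (count + 1)/(n_perm + 1), is a common convention rather than something stated on the slide; all function names are illustrative.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    return num / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))


def off_diagonal(mat, order=None):
    """Upper-triangle entries (i < j), optionally after relabelling the variables."""
    n = len(mat)
    if order is None:
        order = list(range(n))
    return [mat[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def permutation_test(mat_a, mat_b, n_perm=999, seed=0):
    """Return (observed correlation, two-sided permutation p-value)."""
    rng = random.Random(seed)
    x = off_diagonal(mat_a)
    observed = pearson(x, off_diagonal(mat_b))
    order = list(range(len(mat_b)))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute the variables of the second matrix
        if abs(pearson(x, off_diagonal(mat_b, order))) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)


def random_symmetric(n, rng):
    """Hypothetical test input: symmetric matrix with unit diagonal."""
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = rng.uniform(-1.0, 1.0)
    return m


mat = random_symmetric(10, random.Random(42))
print(permutation_test(mat, mat, n_perm=199))  # identical matrices: small p
```

Because whole variables are relabelled, each shuffle keeps the dependence between entries that share a variable, which is exactly what the naive IID test ignores.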

Model selection with cross-validation

Example: curve fitting by least squares
Which model fits better: a straight line or a curve? The curve appears to win!

Test both models on a new validation set
Now the curve does worse.
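The line-versus-curve comparison can be sketched in plain Python under assumptions that are not from the slides: the true relationship is linear with Gaussian noise, the “curve” is a degree-6 polynomial, and both models are fitted by least squares through the normal equations. Because the polynomial contains the line as a special case, its training error can never be larger; on fresh validation data it will typically be worse.

```python
import random


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, n))) / a[r][r]
    return coef


def sse(coef, xs, ys):
    """Sum of squared errors of a polynomial on (xs, ys)."""
    return sum((y - sum(c * x ** j for j, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))


rng = random.Random(3)


def make_data(n):
    # Assumed ground truth: y = 0.5 x + Gaussian noise, with x in [-1, 1]
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, [0.5 * x + rng.gauss(0.0, 0.3) for x in xs]


train_x, train_y = make_data(12)
valid_x, valid_y = make_data(200)
line, curve = polyfit(train_x, train_y, 1), polyfit(train_x, train_y, 6)
print("training SSE:  ", sse(line, train_x, train_y), sse(curve, train_x, train_y))
print("validation SSE:", sse(line, valid_x, valid_y), sse(curve, valid_x, valid_y))
```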

Cross-validation
Repeatedly divide the data into training and test sets. Fit both models each time, and measure the fit on the test set. See which one wins. If the curve fits better than the line, infer that the relationship is not actually linear.
A formal theory of inference using cross-validation has not yet been developed (as far as I know).
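Here is a minimal k-fold version of this procedure in plain Python, again assuming (not from the slides) a degree-6 polynomial as the “curve” and least-squares fitting via the normal equations. Each data point lands in exactly one test fold, so the returned numbers are mean squared test errors per model; the function names and parameters are illustrative.

```python
import random


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, n))) / a[r][r]
    return coef


def predict(coef, x):
    return sum(c * x ** j for j, c in enumerate(coef))


def cross_validate(xs, ys, degrees, k=5, seed=0):
    """Mean squared test error of each polynomial degree under k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint test sets
    errors = {d: 0.0 for d in degrees}
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        for d in degrees:
            coef = polyfit([xs[i] for i in train], [ys[i] for i in train], d)
            errors[d] += sum((ys[i] - predict(coef, xs[i])) ** 2 for i in test)
    return {d: e / len(xs) for d, e in errors.items()}


rng = random.Random(5)
xs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
ys = [0.5 * x + rng.gauss(0.0, 0.3) for x in xs]  # assumed linear ground truth
print(cross_validate(xs, ys, degrees=[1, 6]))  # lower test error "wins"
```

If the degree-6 error were lower than the degree-1 error, that would be the evidence described above for a non-linear relationship.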

Bayesian Inference

Advantages: a well-developed philosophical and theoretical framework; optimal inference when models are correct; some statisticians really, really like it; allows one to accept as well as reject hypotheses.
Disadvantages: the math can be intractable, requiring long computational approximations; requires defining prior probabilities, and sometimes you have no idea what they should be; incorrect inferences if models are wrong; unfamiliar to many experimental scientists and reviewers.
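To illustrate how Bayesian inference can favour a null hypothesis, and why a prior has to be specified, here is a minimal sketch of a Bayes factor for a hypothetical coin-flipping experiment. H0 says the coin is fair; H1 says its bias is unknown, and the uniform prior on that bias is an assumption (a different prior gives a different answer). Under the uniform prior, the marginal probability of any particular number of heads in n flips is 1/(n + 1).

```python
from math import comb


def bayes_factor_fair_coin(heads, flips):
    """BF01 = P(data | H0: p = 0.5) / P(data | H1: p ~ Uniform(0, 1))."""
    evidence_h0 = comb(flips, heads) * 0.5 ** flips
    # Under H1, integrating comb(n, k) p^k (1 - p)^(n - k) over p in [0, 1]
    # gives 1 / (n + 1), so dividing by it means multiplying by (n + 1).
    return evidence_h0 * (flips + 1)


print(bayes_factor_fair_coin(5, 10))  # 2.70703125: data favour the fair coin
print(bayes_factor_fair_coin(9, 10))  # 0.107421875: data favour a biased coin
```

A Bayes factor above 1 is positive evidence for H0, which is the sense in which Bayesian inference can accept, not merely fail to reject, a hypothesis.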