Practical statistics for Neuroscience miniprojects Steven Kiddle Slides & data :



We are unlikely to finish all the slides Keep them, they may be helpful for your miniproject

Lecture outline Taught component – How to present statistics – Hypothesis testing – Normal distribution Practical component – Plots – Statistical tests – Multiple testing corrections

Taught component

Why are statistics important? Help to make science more repeatable and objective Help you to interpret your results Help you to assess the level of evidence you have supporting a hypothesis A vital skill for a scientific career!

How to report statistics Always report: (1) Statistical software you used (2) Statistical tests you used (3) Significance level you used (4) Sample size I checked these in 17 randomly chosen neuroscience project posters

How not to report statistics! I found that: (1) 16/17 didn’t report the statistical software used (2) 11/17 didn’t report the statistical tests used (3) 9/17 didn’t report the significance level used (4) 2/17 didn’t report the sample size!

Commonly used analysis methods Plotting: -Box plots -Line plots Hypothesis testing – T-test – ANOVA – Chi-squared

Hypothesis testing Two types of hypothesis – Null hypothesis (H 0 ) Usually that there are no differences between groups or that two variables are unrelated – Example : (H 0 ) Smoking and lung cancer are unrelated – Alternative hypothesis (H 1 ) There are differences between groups, or that two variables are related – Example : (H 1 ) Smoking and lung cancer are associated

Significance levels You reject the null hypothesis in favour of the alternative hypothesis if the probability of obtaining data at least as extreme as yours under the null hypothesis (the ‘p-value’) is below a pre-specified significance level α – Typically α = 0.05 You should state the significance threshold you use in your report

Multiple hypothesis testing I Suppose you have a significance threshold of α = 0.05 Suppose that you measure 100 variables that are NOT related to a disease You perform 100 hypothesis tests to compare your variables to disease state H 0 : Variable is not affected by disease state H 1 : Variable is affected by disease state For how many variables do you expect to reject the null hypothesis (H 0 ) even though it is true?

Multiple hypothesis testing II α = 0.05 means that if the null hypothesis (H 0 ) is true, we would expect to reject it 5% of the time So if H 0 is true and we did 100 tests, we would expect to reject H 0 5 times by chance alone That is bad: these findings will not replicate How do we stop it? Multiple testing corrections
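The expected count of 5 is simple arithmetic (0.05 × 100), and a quick simulation gives the same answer. The following is a minimal Python sketch, assuming that p-values are uniformly distributed between 0 and 1 when H 0 is true; the seed, the number of repeats and all names are illustrative and not part of the original slides.

```python
import random

ALPHA = 0.05
N_TESTS = 100

# Expected number of false positives when every null hypothesis is true.
expected_false_positives = ALPHA * N_TESTS


def count_false_positives(n_tests, alpha, rng):
    """Draw one p-value per test under H0 (uniform on [0, 1]) and count rejections."""
    return sum(rng.random() < alpha for _ in range(n_tests))


rng = random.Random(1)  # fixed seed so the run is repeatable
n_repeats = 2000

# Repeat the 100-test experiment many times and average the number of rejections.
mean_rejections = sum(
    count_false_positives(N_TESTS, ALPHA, rng) for _ in range(n_repeats)
) / n_repeats

print(expected_false_positives)
print(round(mean_rejections, 2))
```

The simulated average should sit close to the expected value, which is why uncorrected testing of many unrelated variables reliably produces a handful of 'significant' results.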

Bonferroni correction – If we want α = 0.05, instead use α = 0.05/n where n is the number of tests you perform – So for 100 tests, we would use α = 0.0005, and would have at most a 5% chance of any test falsely rejecting the null hypothesis Benjamini-Hochberg correction – Popular alternative
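Both corrections are short enough to write out by hand. The sketch below is one possible Python version, not the procedure SPSS uses internally; the p-values are made up for illustration. Bonferroni divides the significance level by the number of tests, while Benjamini-Hochberg converts each p-value into an adjusted value (often called a q-value) that is compared with α directly.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five tests (illustrative only).
p_values = [0.001, 0.012, 0.02, 0.03, 0.20]
alpha = 0.05

threshold = bonferroni_threshold(alpha, len(p_values))
bonferroni_hits = [p <= threshold for p in p_values]

q_values = benjamini_hochberg(p_values)
bh_hits = [q <= alpha for q in q_values]

print(threshold)
print(bonferroni_hits)
print([round(q, 4) for q in q_values])
print(bh_hits)
```

On these numbers Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg keeps four of the five. That reflects the different guarantees: Bonferroni controls the chance of any false positive, whereas Benjamini-Hochberg controls the expected proportion of false positives among the reported findings.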

Normal distribution

Tests that rely on assumptions of normality T-tests ANOVA / linear models

How to check if your data is normally distributed Histograms Statistical tests Can apply to data But better to apply to residuals of the models – For t-test, that means looking at the groups separately – For ANOVA, that means extracting residuals from the model

What do you do if your data is not normally distributed? If sample size is really small – Nothing you can do – use test anyway If data is skewed – Transform data (e.g. log? square root?) Use non-parametric tests – Mann-Whitney U instead of T-test – Spearman’s Rank Correlation Last resort - Remove outliers? – Systematically and preferably only if you know what causes them
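To see what a transformation does to skewed data, one option is to compare the skewness before and after taking logs. This is a small Python sketch with made-up, strictly positive values; the skewness function is the simple moment-based estimate, chosen here for illustration rather than taken from the slides.

```python
from math import log
from statistics import mean


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (must be > 0 for a log transform).
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 55]
logged = [log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(round(skew_raw, 2))
print(round(skew_log, 2))
```

The log-transformed values are noticeably less skewed than the raw ones. If a transformation is used, the report should say so, because the test then applies to the transformed scale.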

How to present plots Label both axes – Large enough to read Show units If using stars (*) for significance levels, explain what *, **, *** means Lunnon et al., (2012) Journal of Alzheimer’s Disease

How to present statistics I Say what statistical software was used, e.g. – SPSS, STATA, R, MATLAB, etc Say what the sample size is Say what statistical test is being performed – T-test, ANOVA, chi-squared, etc Say what significance level you are using for the study – Think, is it appropriate given my sample size and number of hypotheses being tested?

How to present statistics II Report p-value – And/or multiple testing corrected p-value E.g. Q-values for Benjamini-Hochberg Report coefficient ( β ), and ideally its standard error for each reported statistic – This can be more informative than a p-value, especially for small datasets

How to present statistics III A more complete guide, tailored to SPSS and specific tests is given at:

Be cautious in your interpretations Correlation does not equal causation! Can you hypothesise a mechanism by which causation could occur?

Why does correlation not equal causation? It looks like the variables are correlated when they are not – How does this happen? By chance, especially when multiple testing is performed but not corrected for Variables are truly correlated but there is either: – Reverse causation – Confounding by other variables

Confounding

Statistical software Excel – Point and click, quite limited SPSS – Point and click, a little limited STATA – Command line R, MATLAB, etc – Command line, very useful, steep learning curve

R introduction

Practical component - SPSS Data is faked to show large differences, real data will not be so clear cut

Outline Data Tests for normality Plots T-test ANOVA Chi-squared Non-parametric tests

Data Create folder in ‘My Documents’ Download data and save in your new folder: Slides & data : Open zip folder Double click on ‘neuroscience_example.sav’ to open SPSS

Introduction to the data 5 variables – a, b, c, d, e 2 are binary – a, b 3 are continuous – c, d, e

Normality checks I Need to check data is normally distributed when we want to apply – T-test – ANOVA – Linear regression

Normality checks II Let’s see if the variable ‘d’ is normally distributed

Normality checks III The histogram shows that the data has two peaks The normality test rejects the null hypothesis that the data is normally distributed

Normality checks IV Once we take variable ‘a’ into account, by looking at each group of ‘a’ separately, we find that ‘d’ is normally distributed within each group
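Looking at the groups separately is equivalent to checking the residuals left after subtracting each group's mean. The Python sketch below uses made-up values for ‘a’ and ‘d’ (not the contents of the course data file) to show how a two-peaked variable can have well-behaved residuals once the grouping is taken into account.

```python
from statistics import mean, pstdev

# Hypothetical data: 'a' is the binary group label, 'd' the continuous variable.
a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
d = [1.9, 2.1, 2.0, 2.3, 1.7, 5.8, 6.1, 6.0, 6.3, 5.8]

# Mean of 'd' within each group of 'a'.
group_means = {
    g: mean([x for x, grp in zip(d, a) if grp == g]) for g in sorted(set(a))
}

# Residual = observed value minus the mean of its own group.
residuals = [x - group_means[grp] for x, grp in zip(d, a)]

print(group_means)
print([round(r, 2) for r in residuals])
print(round(pstdev(d), 2), round(pstdev(residuals), 2))
```

The raw values cluster around two separate centres, whereas the residuals are centred on zero with a much smaller spread. It is these residuals that would be passed to a histogram or a normality test; in Python, `scipy.stats.shapiro` is one such test if SciPy is available.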

Plots Histograms (shown in normality check) – Show distribution of a continuous variable Boxplots – Show the distribution of a continuous variable between groups Line plot/scatter plot – Shows the relationship between two continuous variables

Generating a boxplot I

Generating a boxplot II

Generating a boxplot III

Generating a boxplot IV Double click on plot to label axis

Labelling a plot I Double click to change label


Labelling a plot II

Saving plots Rename document and save in your folder You can now open the document and extract the plot as an image

Boxplot exercises Make a few more boxplots comparing binary variables to continuous variables Try adding labels Try saving Try to interpret the boxplot – Do you see differences between the groups

Generating a line plot I

Generating a line plot II

T-test Compares a binary variable (yes/no) to a continuous variable Example null hypothesis – Mean height is the same across males and females Example alternative hypothesis – Mean height is different between males and females

Performing a t-test I

Performing a t-test II Descriptive statistics

Performing a t-test II Levene test null hypothesis: the variances (and standard deviations) of the two groups are the same. At the significance level 0.05 we can reject the null hypothesis. Therefore we should use the second row (‘Equal variances not assumed’).

Performing a t-test II T-test null hypothesis: the means of the two groups are the same. At the significance level 0.05 we can reject the null hypothesis (p-value is less than 0.001). I.e. the data support the conclusion that the mean of variable ‘d’ differs between the two groups.
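The ‘Equal variances not assumed’ row corresponds to what is usually called Welch's t-test. As a rough guide to what that row contains, here is a minimal Python sketch that computes the mean difference, its standard error, the t statistic and the Welch-Satterthwaite degrees of freedom. The two groups are made-up numbers, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t statistic without assuming equal variances.

    Returns (mean difference, standard error, t statistic, degrees of freedom).
    """
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)  # sample variances (n - 1 denominator)
    se = sqrt(vx / nx + vy / ny)
    diff = mean(x) - mean(y)
    t = diff / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx / nx + vy / ny) ** 2 / (
        (vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1)
    )
    return diff, se, t, df


# Hypothetical measurements for the two groups (illustrative only).
group0 = [2.0, 2.4, 1.8, 2.2, 2.1]
group1 = [3.9, 4.6, 3.1, 5.2, 4.2, 3.6]

diff, se, t_stat, df = welch_t(group0, group1)
print(round(diff, 2), round(se, 2), round(t_stat, 2), round(df, 1))
```

The mean difference and standard error are the quantities to quote alongside the p-value. The p-value itself comes from a t distribution with `df` degrees of freedom; if SciPy is installed, `scipy.stats.ttest_ind(group0, group1, equal_var=False)` returns it directly.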

How to report findings (option A) A two-sample t-test assuming unequal variances performed in SPSS showed differences in variable ‘d’ between groups 0 (N = 19) and 1 (N = 31) of variable ‘a’ at the 5% significance level (mean difference = 1.6, standard error = 0.89, p-value < 0.001).

How to report findings (option B) Materials and methods: – Statistical analysis Statistical analysis was performed in SPSS 20. Group differences were analysed using a two-sample t-test assuming unequal variances. A significance level of 5% was applied to all hypothesis tests. Results: – Variable ‘d’ was found to differ between groups 0 (N = 19) and 1 (N = 31) of variable ‘a’ (mean difference = 1.6, standard error = 0.89, p-value < 0.001).

How that might look Materials and methods: – Statistical analysis Statistical analysis was performed in SPSS 20. Group differences were analysed using a two-sample t-test assuming unequal variances. A significance level of 5% was applied to all hypothesis tests. Results: – Change in blood glucose levels differed between males (N = 19) and females (N = 31) (mean difference = 1.6 ng/ml, standard error = 0.89, p-value < 0.001).

How to present statistics A more complete guide, tailored to SPSS and specific tests is given at:

One-way ANOVA Extension of t-test idea Compares a binary variable (yes/no) to several variables, continuous or nominal (including binary) Example null hypothesis – Mean height is the same across males and females, regardless of age Example alternative hypotheses – Mean height is different between males and females – Mean height differs across ages
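In its standard form, a one-way ANOVA compares the mean of one continuous variable across two or more groups, and its F statistic is the ratio of between-group to within-group variability. The following Python sketch computes F by hand for three made-up groups; the function name and data are illustrative assumptions, not SPSS output.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements in three groups (illustrative only).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

f_stat, df_between, df_within = one_way_anova_f(groups)
print(round(f_stat, 2), df_between, df_within)
```

A large F means the group means differ by more than the within-group scatter would suggest. With only two groups, F equals the square of the equal-variance t statistic, which is the sense in which ANOVA extends the t-test.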

Performing ANOVA I

Performing ANOVA II Which variables differ between variable ‘a’ group 0 and 1? How would you perform a Bonferroni multiple testing correction? How would you report these findings? (clue: )

Chi-squared test Compares multiple nominal variables Example null hypothesis – Lung cancer and smoking are unrelated Example alternative hypotheses – Smokers are more likely to have lung cancer
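For two binary variables the test works on a 2×2 table of counts, comparing each observed count with the count expected if the variables were unrelated. Below is a minimal Python sketch of the Pearson chi-squared statistic for such a table, without continuity correction. The counts are invented for illustration, and the p-value formula used here is valid only for one degree of freedom (the 2×2 case).

```python
from math import erfc, sqrt


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail probability of a chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = smoker / non-smoker, columns = cancer / no cancer.
table = [[30, 10], [15, 45]]

chi2, p_value = chi_squared_2x2(table)
print(round(chi2, 2), p_value < 0.001)
```

Larger tables need the general chi-squared distribution with (rows − 1) × (columns − 1) degrees of freedom; `scipy.stats.chi2_contingency` handles that case if SciPy is available.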

Performing chi-squared I

Performing chi-squared II How would you report these findings? (clue: )

Non-parametric tests If a continuous variable is not normally distributed, using parametric tests may give you misleading results – T-test and ANOVA are parametric tests Solution, use non-parametric tests – Such as Mann-Whitney U and Spearman’s Rank Correlation

Mann-Whitney U A non-parametric equivalent of a t-test
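The U statistic depends only on how the values in the two groups are ordered, not on their actual sizes, which is why the test does not need normality. A minimal Python sketch of the statistic, using made-up data and a hypothetical function name:

```python
def mann_whitney_u(x, y):
    """Return (U for x, U for y).

    U for x counts the (x_i, y_j) pairs in which x_i is larger than y_j;
    tied pairs count as one half.
    """
    u_x = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical measurements for two groups (illustrative only).
group0 = [1.2, 2.5, 3.1]
group1 = [4.0, 5.7, 6.3]

u0, u1 = mann_whitney_u(group0, group1)
print(u0, u1)
```

Here every value in the second group exceeds every value in the first, so U is 0 for one group and 9 (3 × 3 pairs) for the other, the most extreme result possible for these sample sizes. Turning U into a p-value needs its null distribution; `scipy.stats.mannwhitneyu` provides that if SciPy is installed.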

Performing Mann-Whitney U I

Performing Mann-Whitney U II How would you report these findings? (clue: )

Spearman’s Rank Correlation A non-parametric equivalent of Pearson correlation between two continuous variables; it can also be used between a binary variable and a continuous variable in place of a one-way ANOVA
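Spearman's coefficient is the ordinary (Pearson) correlation computed on the ranks of the data rather than on the raw values. The Python sketch below spells that out, assigning tied values the average of their ranks; the function names and example values are illustrative, and the sketch assumes neither variable is constant.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data with a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

rho = spearman_rho(x, y)
print(round(rho, 3), round(pearson(x, y), 3))
```

Because y always increases with x, the rank correlation is at its maximum even though the relationship is curved, while the Pearson correlation of the raw values is lower.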

Performing Spearman’s Rank Correlation I

Performing Spearman’s Rank Correlation II How would you report these findings? (clue: )