Talking to Biologists, by a biologist


My background
- BSc Botany
- MSc Environmental Science / Ecology
- PhD Evolutionary Biology / Ecological Genetics
- Lecturer
  - Victoria University of Wellington
  - University of New Brunswick, Canada
- Taught:
  - Quantitative Genetics
  - Population Genetics
  - Evolutionary Ecology
  - Hypothesis Testing in Biology (aka Bio-statistics)
  - Graduate level Statistics for Biologists
- Plant and Food Research, Data Science Group

My view: Statistics is philosophy of science
- The key is: what is the question?
- How do we ask the question in a way that:
  - Contrasts the right things
  - Makes sure we are only contrasting those things
- How do we collect our data in a way that:
  - Minimises bias
  - Reduces variation
- The better we biologists ask questions, the better our science is.

First-year undergraduate biology courses teach
- the t-test
- the Chi-square test
But most don’t teach why we choose one test or another, and the two are taught as a dichotomy:
- Does it involve Drosophila? – Chi-square test
- Does it measure finger lengths? – t-test

But most biological data are not normal, or are counts of Drosophila.
So why do we teach the normal distribution as the default?
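One way to see why counts sit awkwardly under a normal default is to simulate them. The sketch below is illustrative only: it assumes counts follow a Poisson distribution (real Drosophila counts may well be more variable than that), and the function names and the "flies per vial" numbers are hypothetical. It uses only the Python standard library.

```python
import math
import random
import statistics


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def summarise_counts(mean, n, seed):
    """Simulate n counts and return their sample mean and sample variance."""
    rng = random.Random(seed)
    counts = [poisson_sample(mean, rng) for _ in range(n)]
    return statistics.mean(counts), statistics.variance(counts)


# Hypothetical example: flies per vial at a low and a high density.
for true_mean in (2, 20):
    m, v = summarise_counts(true_mean, n=5000, seed=1)
    print(f"true mean {true_mean}: sample mean {m:.2f}, sample variance {v:.2f}")
```

Under this assumption the variance tracks the mean, so two groups with different mean counts also differ in variance, which is exactly the kind of thing an equal-variance normal model does not expect.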

Hypothesis testing in Biology (3rd year Bio-Statistics course)
- A graduate student has a problem: “What I really want to know is …”
- Students design experiments in groups
  - Plan to minimize bias
  - Plan to reduce variability
  - Samples must be independent
- Best design judged by the graduate student
- Students analyse data and report back to the grad student
- Most designs are more complicated than simple ANOVA/regression
  - Covariates
  - Repeated measures
  - Lack of independence of samples

Biology has changed to a data-intensive science
- Most PhD students spend >1 year analysing data
- Datasets are large and complex, with complicated structures and correlations between observations and/or variables
- Understanding matrix algebra & calculus (at least the principles) is key to many biological disciplines
- The generalized linear (mixed) model approach is taught in most grad courses

How to move Biologists to Conscious (In)competence?

How to know when you need help?
Zuur: Assumptions of linear models
- (i) normality
- (ii) homogeneity
- (iii) fixed X (X represents explanatory variables)
- (iv) independence
- (v) a correct model specification
Tried to find a dataset that didn’t violate those assumptions. Failed.
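As a rough illustration of checking one of these assumptions, homogeneity, the sketch below fits a straight line and compares the residual variance in the upper half of X with that in the lower half. This is a crude numeric stand-in for looking at a residual plot, not a formal test and not a method taken from Zuur; the simulated data, function names and noise levels are all hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_spread_ratio(x, y):
    """Ratio of residual variance in the upper half of x to the lower half."""
    intercept, slope = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order observations by x
    residuals = [yi - (intercept + slope * xi) for xi, yi in pairs]
    half = len(residuals) // 2
    return statistics.variance(residuals[half:]) / statistics.variance(residuals[:half])


rng = random.Random(42)
x = [rng.uniform(1, 10) for _ in range(400)]
# Hypothetical data: noise standard deviation grows with x (heterogeneity).
y_hetero = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
# Same trend, constant noise standard deviation (homogeneity holds).
y_homo = [2 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]

print(f"spread ratio, growing noise:  {residual_spread_ratio(x, y_hetero):.2f}")
print(f"spread ratio, constant noise: {residual_spread_ratio(x, y_homo):.2f}")
```

A ratio far from 1 is a hint that the spread changes along X; a ratio near 1 does not prove the assumption holds, and the other four assumptions need their own checks.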

When linear modelling just won’t do
Zuur et al:
- Is your response variable heterogeneous?
- Does your data have repeated measurements?
- Is it nested (hierarchical)?
- Is it sampled at multiple locations or sampled repeatedly over time?

Statistics is philosophy of science (in my view)

Biologists want a road-map, but it doesn’t have to be straightforward

My key concepts for every Biologist
- Design experiments to minimize bias and sampling error
- Observational studies are not experiments (the observer does not randomly assign treatments to subjects)
- Biological interactions are the interesting parts of biology
- “Things” in biology are often confounded
- Samples are rarely independent (what is independent in Biology, anyway?)
- Time and space are important
- Zero does not always mean zero
- Unmeasured things can have important effects
- Biological importance, not statistical significance, is the key parameter
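The last point can be made concrete with a little arithmetic. The sketch below is a simplified illustration, assuming a two-sample z-test with a known, common standard deviation; the leaf-length numbers and the function name are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means with a known common sd.

    Returns the standardised effect size (difference / sd) and the p-value.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    effect_size = (mean_a - mean_b) / sd
    return effect_size, p_value


# Hypothetical numbers: leaf length in mm, common sd of 10 mm.
# A 0.1 mm difference with a huge sample ...
small_effect, small_p = two_sample_z(50.1, 50.0, sd=10, n_per_group=1_000_000)
# ... versus a 5 mm difference with a small sample.
large_effect, large_p = two_sample_z(55.0, 50.0, sd=10, n_per_group=10)

print(f"0.1 mm difference, n = 1,000,000: effect size {small_effect:.2f}, p = {small_p:.2g}")
print(f"5 mm difference,   n = 10:        effect size {large_effect:.2f}, p = {large_p:.2g}")
```

With these made-up numbers the tiny difference comes out as highly significant while the much larger one does not pass the usual 0.05 threshold, so the p-value alone says little about whether a difference matters biologically.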

“All models are wrong. Some are useful” - George Box
- Key concepts should provide an “oh-oh”
- Communication should emphasise checking for model suitability
- Emphasis on better models, not right ones

Thank you
Linley.Jesson@plantandfood.co.zn