Statistics review Basic concepts: Variability measures Distributions

Presentation transcript:

Statistics review Basic concepts: Variability measures Distributions Hypotheses Types of error Common analyses T-tests One-way ANOVA Two-way ANOVA Regression

Variance Ecological rule # 1: Everything varies …but how much does it vary?

Variance $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean

Variance What is the variance of 4, 3, 3, 2? Answer: 2/3
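The worked example above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, variance

data = [4, 3, 3, 2]

# Sample mean (x-bar)
x_bar = mean(data)

# Sum of squared differences from the mean (the numerator)
sum_sq = sum((x - x_bar) ** 2 for x in data)

# Sample variance: divide by n - 1, not n
s2 = sum_sq / (len(data) - 1)

print(x_bar, sum_sq, s2)

# statistics.variance uses the same n - 1 denominator
print(variance(data))
```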

Variance variants 1. Standard deviation (s, or SD) = square root of the variance. Advantage: same units as the data

Variance variants 2. Standard error (S.E.) = $s / \sqrt{n}$. Advantage: indicates reliability of the estimated mean
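Both variants follow directly from the variance. The sketch below reuses the 4, 3, 3, 2 sample from the variance slide; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

data = [4, 3, 3, 2]

# Standard deviation: square root of the sample variance (n - 1 denominator)
s = stdev(data)

# Standard error of the mean: s divided by the square root of n
se = s / math.sqrt(len(data))

# Report in the "mean (± S.E.)" style
print(f"{mean(data):.1f} (± {se:.2f}) (mean ± S.E.)")
```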

How to report We observed 29.7 (± 5.3) grizzly bears per month (mean ± S.E.). A mean (± SD) of 29.7 (± 7.4) grizzly bears were seen per month. Error bars: + 1 SE or SD above the mean, - 1 SE or SD below it

Distributions Normal: quantitative data. Poisson: count (frequency) data

Normal distribution About 68% of data within 1 SD of mean

Poisson distribution Mostly, nothing happens (lots of zeros). [Figure: Poisson distribution with the mean marked]

Poisson distribution Frequency data Lots of zero (or minimum value) data Variance increases with the mean

What do you do with Poisson data? 1. Correct for correlation between mean and variance by log-transforming y (but log (0) is undefined!!) 2. Use non-parametric statistics (but low power) 3. Use a “generalized linear model” specifying a Poisson distribution
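The two Poisson properties mentioned in these slides (many zeros at a low mean, and variance rising with the mean) can be illustrated from the Poisson probability formula itself. This is a minimal sketch; the function names and the chosen means are illustrative, not from the slides.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing count k when the Poisson mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_mean_var(lam, k_max=100):
    """Mean and variance computed by summing over counts 0..k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, var

for lam in (0.5, 2.0, 5.0):
    mean, var = poisson_mean_var(lam)
    # P(0) is large when the mean is small; the variance tracks the mean
    print(f"mean={mean:.3f} variance={var:.3f} P(0)={poisson_pmf(0, lam):.3f}")
```

For a Poisson distribution the variance equals the mean, which is why the spread of the data grows as counts get larger.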

Hypotheses Null (Ho): no effect of our experimental treatment, “status quo” Alternative (Ha): there is an effect

Whose null hypothesis? Conditions very strict for rejecting Ho, whereas accepting Ho is easy (just a matter of not finding grounds to reject it). A criminal trial? Environmental protection? Industrial viability? Exotic plant species? WTO?

Hypotheses Null (Ho) and alternative (Ha): always mutually exclusive So if Ha is treatment>control…

Types of error
             Reject Ho       Accept Ho
Ho true      Type 1 error    (correct)
Ho false     (correct)       Type 2 error

Types of error Usually ensure only 5% chance of type 1 error (i.e. alpha = 0.05). Ability to avoid type 2 error: called power


The t-test Asks: do two samples come from different populations? [Figure: DATA for samples A and B, with the YES and NO (Ho) outcomes]

The t-test Depends on whether the difference between samples is much greater than difference within sample. [Figure: samples A and B with between >> within]

The t-test Depends on whether the difference between samples is much greater than difference within sample. [Figure: samples A and B with between < within]

The t-test T-statistic = (difference between means) / (standard error within each sample): $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$

The t-test How many degrees of freedom? $(n_1 - 1) + (n_2 - 1)$, for $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$
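The t-statistic and its degrees of freedom can be sketched as below. The slides do not define the pooled variance explicitly, so this assumes the usual df-weighted average of the two sample variances; the function name and the sample values are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance, plus its df."""
    n1, n2 = len(a), len(b)
    df = (n1 - 1) + (n2 - 1)
    # Assumed pooled variance: sample variances weighted by their df
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(sp2 / n1 + sp2 / n2)
    t = (mean(a) - mean(b)) / se_diff
    return t, df

# Hypothetical samples, for illustration only
t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(t, df)
```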

T-tables
v (df)     0.10     0.05     0.025
1          3.078    6.314    12.706
2          1.886    2.920    4.303
3          1.638    2.353    3.182
4          1.533    2.132    2.776
infinity   1.282    1.645    1.960
Two samples, each n=3, with t-statistic of 2.50: significantly different? No! With (3-1) + (3-1) = 4 df, the critical value in the 0.025 column is 2.776, and 2.50 < 2.776.
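The table lookup can be written out as a short sketch. It assumes a two-tailed test at alpha = 0.05, so the 0.025 column is the one consulted; the dictionary and function names are illustrative.

```python
# Critical t values copied from the slide's table: {df: {tail area: t}}
T_TABLE = {
    1: {0.10: 3.078, 0.05: 6.314, 0.025: 12.706},
    2: {0.10: 1.886, 0.05: 2.920, 0.025: 4.303},
    3: {0.10: 1.638, 0.05: 2.353, 0.025: 3.182},
    4: {0.10: 1.533, 0.05: 2.132, 0.025: 2.776},
}

def is_significant(t_stat, n1, n2, tail_area=0.025):
    """Compare |t| against the tabulated critical value for this df."""
    df = (n1 - 1) + (n2 - 1)
    return abs(t_stat) > T_TABLE[df][tail_area]

# Two samples, each n = 3, t = 2.50: df = 4, critical value 2.776
print(is_significant(2.50, 3, 3))
```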

If you have two samples with similar n and S.E., why do you know instantly that they are not significantly different if their error bars overlap? Because the difference in means < 2 x S.E., i.e. t-statistic < 2, and, for any df, t must be > 1.96 (the 0.025 column at infinite df in the t-table) to be significant! Careful! Doesn’t work the other way around!!

One-way ANOVA General form of the t-test, can have more than 2 samples. Ho: All samples the same… Ha: At least one sample different

One-way ANOVA General form of the t-test, can have more than 2 samples. [Figure: DATA for samples A, B and C, and their arrangement under Ho and under Ha]

One-way ANOVA Just like t-test, compares differences between samples to differences within samples. T-test statistic (t) = (difference between means) / (standard error within sample). ANOVA statistic (F) = (MS between groups) / (MS within group)

Mean squares: MS = (sum of squares) / df. Analogous to variance

Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$. The numerator is the sum of squared differences

ANOVA tables
                              df       SS     MS                  F          p
Treatment (between groups)    df (X)   SSX    MSX = SSX/df (X)    MSX/MSE    Look up!
Error (within groups)         df (E)   SSE    MSE = SSE/df (E)
Total                         df (T)   SST

Do three species of palms differ in growth rate? We have 5 observations per species. Complete the table!
                              df        SS     MS    F    p
Treatment (between groups)              69
Error (within groups)         k(n-1)
Total                                   104

Hint: For the total df, remember that we calculate total SS as if there are no groups…

At alpha = 0.05, $F_{2,12}$ = 3.89
                              df    SS     MS      F       p
Treatment (between groups)    2     69     34.5    11.8    ?
Error (within groups)         12    35     2.92
Total                         14    104
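The completed table can be reproduced from the three facts given in the exercise (3 species, 5 observations each, treatment SS = 69, total SS = 104). This is a minimal sketch with illustrative variable names; the critical value 3.89 is the one quoted on the slide.

```python
k, n = 3, 5                      # groups (species) and observations per group
ss_treatment, ss_total = 69.0, 104.0

# Degrees of freedom
df_treatment = k - 1             # 2
df_error = k * (n - 1)           # 12
df_total = k * n - 1             # 14, as if there were no groups

# Error SS is what is left of the total after the treatment SS
ss_error = ss_total - ss_treatment

# Mean squares: sum of squares divided by df
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error

# F ratio, compared against the critical F(2, 12) at alpha = 0.05
f_ratio = ms_treatment / ms_error
F_CRIT = 3.89
significant = f_ratio > F_CRIT

print(df_treatment, df_error, df_total)
print(ss_error, ms_treatment, round(ms_error, 2), round(f_ratio, 1), significant)
```

Because the F ratio exceeds the critical value, the p value in the table is below 0.05.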

Two-way ANOVA Just like one-way ANOVA, except subdivides the treatment SS into: 1. Treatment 1 2. Treatment 2 3. Interaction 1&2

Two-way ANOVA Suppose we wanted to know if moss grows thicker on north or south side of trees, and we look at 10 aspen and 10 fir trees:
Aspect (2 levels, so 1 df)
Tree species (2 levels, so 1 df)
Aspect x species interaction (1 df x 1 df = 1 df)
Error? k(n-1) = 4 (10-1) = 36

                        df    SS             MS             F
Aspect                  1     SS(Aspect)     MS(Aspect)     MS(As)/MSE
Species                 1     SS(Species)    MS(Species)    MS(Sp)/MSE
Aspect x Species        1     SS(Int)        MS(Int)        MS(Int)/MSE
Error (within groups)   36    SSE            MSE
Total                   39    SST
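The degrees of freedom in the moss example can be checked with a short sketch: 2 aspects x 2 species gives 4 groups, with 10 trees in each. Variable names are illustrative.

```python
aspect_levels, species_levels = 2, 2
n_per_group = 10                              # trees per aspect x species cell

df_aspect = aspect_levels - 1                 # 1
df_species = species_levels - 1               # 1
df_interaction = df_aspect * df_species       # 1 df x 1 df = 1 df

k = aspect_levels * species_levels            # 4 groups
df_error = k * (n_per_group - 1)              # 4 (10 - 1) = 36
df_total = k * n_per_group - 1                # 39

# The component df must add up to the total df
print(df_aspect + df_species + df_interaction + df_error == df_total)
```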

Interactions Combination of treatments gives non-additive effect. [Figure: North vs South, with lines for Alder and Fir]

Interactions Combination of treatments gives non-additive effect. Anything not parallel! [Figure: two North vs South panels with non-parallel lines]

Careful! If you log-transformed your variables, the absence of interaction is a multiplicative effect: log (a) + log (b) = log (ab). [Figure: y and log (y) plotted for North vs South]
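A small numeric sketch shows why this matters. The cell means below are hypothetical (a baseline of 2, tripled by one treatment and doubled by the other); they are not data from the slides.

```python
import math

# Hypothetical cell means built from purely multiplicative effects
base, aspect_factor, species_factor = 2.0, 3.0, 2.0
cells = {
    ("south", "alder"): base,
    ("north", "alder"): base * aspect_factor,
    ("south", "fir"): base * species_factor,
    ("north", "fir"): base * aspect_factor * species_factor,
}

def interaction_contrast(values):
    """Difference between the North-South effect in fir and in alder."""
    fir_effect = values[("north", "fir")] - values[("south", "fir")]
    alder_effect = values[("north", "alder")] - values[("south", "alder")]
    return fir_effect - alder_effect

raw = interaction_contrast(cells)
logged = interaction_contrast({key: math.log(v) for key, v in cells.items()})

# Raw scale: lines are not parallel. Log scale: lines are parallel.
print(raw, logged)
```

On the raw scale the contrast is non-zero (an apparent interaction), while on the log scale it vanishes, because the effects multiply rather than add.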

Regression Problem: to draw a straight line through the points that best explains the variance


Regression Test with F, just like ANOVA: $F = \frac{\text{variance explained by x-variable} / df}{\text{variance still unexplained} / df}$. Variance explained: change in squared line lengths. Variance unexplained: squared residual line lengths

Regression In regression, each x-variable will normally have 1 df

Regression Essentially a cost: benefit analysis. Is the benefit in variance explained worth the cost in using up degrees of freedom?

Regression example Total variance for 32 data points is 300 units. An x-variable is then regressed against the data, accounting for 150 units of variance. What is the $R^2$? What is the F ratio?

Regression example $R^2$ = 150/300 = 0.5. The variance still unexplained is 300 - 150 = 150, so $F_{1,30} = \frac{150/1}{150/30} = 30$. Why is df error = 30? (32 data points give 31 total df, and the x-variable uses 1 of them.)
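The regression example can be worked through in code, using the F definition from the regression slides (variance explained per df over variance still unexplained per df). This is a minimal sketch that takes the unexplained variance as the total minus the explained part; variable names are illustrative.

```python
n_points = 32
total_var = 300.0       # total variance, in the slide's units
explained_var = 150.0   # accounted for by the x-variable

# R squared: the proportion of the total that is explained
r_squared = explained_var / total_var

# Degrees of freedom: 1 for the x-variable, the rest for error
df_regression = 1
df_error = (n_points - 1) - df_regression    # 31 - 1 = 30

# Whatever is not explained remains as error
unexplained_var = total_var - explained_var

f_ratio = (explained_var / df_regression) / (unexplained_var / df_error)
print(r_squared, df_error, f_ratio)
```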

ANCOVA In regression, x-variables can be continuous or categorical. E.g. to convert a treatment (size) with two levels (small, large) into a regression variable, we could code small = 0, large = 1. Y = size*B + constant, where the constant is the mean value for small and B is the difference between large and small. Test significance of “size” in regression
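The 0/1 coding can be checked with an ordinary least-squares fit written out by hand. The measurements below are hypothetical values chosen for illustration; the function name is also illustrative.

```python
from statistics import mean

def least_squares(x, y):
    """Slope (B) and constant of the best-fit line y = B*x + constant."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical responses for the two size classes
small = [2.0, 4.0, 6.0]
large = [7.0, 9.0, 11.0]

# Code the categorical treatment as small = 0, large = 1
size = [0] * len(small) + [1] * len(large)
y = small + large

B, constant = least_squares(size, y)

# constant is the mean for small; B is the large minus small difference
print(constant, mean(small))
print(B, mean(large) - mean(small))
```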

ANCOVA In an Analysis of Covariance, we look at the effect of a treatment (categorical) while accounting for a covariate (continuous). [Figure: axes Growth rate (g/day) and Plant height (cm); groups Fertilized N+P and Fertilized N]

ANCOVA
1. Fit full model (categorical treatment, covariate, interaction)
2. Test for interaction (if significant, stop!)
3. Test for differences in intercept
[Figure: axes Growth rate (g/day) and Plant height (cm); groups Fertilized N+P and Fertilized N; no interaction, intercepts differ]