ANalysis Of VAriance (ANOVA)
Comparing > 2 means
Frequently applied to experimental data
Why not do multiple t-tests?
If you want to test H0: μ1 = μ2 = μ3, why not test each pair separately?
  μ1 = μ2
  μ1 = μ3
  μ2 = μ3
For each test there is a 95% probability of correctly failing to reject (accept?) the null when the null is really true.
Probability of correctly failing to reject all 3 = 0.95^3 = 0.86 (treating the three tests as independent)

Probability of incorrectly rejecting at least one of the (true) null hypotheses = 1 - 0.86 = 0.14
As you increase the number of means compared, the probability of incorrectly rejecting a true null (type I error) increases towards one.
Side note: it is possible to correct (lower) α if you need to do multiple tests (Bonferroni correction) - unusual
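
The arithmetic on this slide can be sketched in a few lines of Python. This is an illustration only, not part of the original deck: the function names are hypothetical, and the tests are treated as independent, which is the simplification behind the 0.14 figure.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false rejection across n_tests
    tests, assuming the tests are independent (an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after a Bonferroni correction."""
    return alpha / n_tests


# Comparing k means pairwise needs k*(k-1)/2 t-tests.
for k in (3, 4, 5, 10):
    n_pairs = k * (k - 1) // 2
    print(k, n_pairs, round(familywise_error(0.05, n_pairs), 3))
```

With three means (three pairwise tests) the function reproduces the 1 - 0.95^3 calculation; the loop shows how quickly the error rate climbs as more means are compared.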

ANOVA: calculate ratios of different portions of the variance of the total dataset to determine if group means differ significantly from each other
Calculate the ‘F’ ratio, named after R.A. Fisher
1) Visualize data sets
2) Partition variance (SS & df)
3) Calculate F (tomorrow)

Pictures first
3 fertilizers applied to 10 plots each (N=30), yield measured
How much variability comes from fertilizers, how much from other factors?
[Figure: yield (tonnes) against plot number for Fert 1, Fert 2, and Fert 3, with the overall mean marked]

Thought Questions
What factors other than fertilizer (uncontrolled) may contribute to the variance in crop yield?
How do you minimize the contribution of uncontrolled factors to variance when designing an experiment or survey study?
If one wants to measure the effect of a factor in nature (most of ecology/geology), how can or should you minimize background variability between experimental units?

Fertilizer (in this case) is termed the independent, predictor, or explanatory variable
  Can have any number of levels; we have 3
  Can have more than one independent variable; we have 1, so this is a one-way ANOVA
Crop yield (in this case) is termed the dependent or response variable
  Can have more than one response variable: multivariate analysis (e.g., MANOVA). Class taught by J. Harrell

Pictures first
- calculate deviation of each point from the mean
- some ‘+’ and some ‘-’
- sum to zero (remember definition of mean)
[Figure: yield (tonnes) against plot number for Fert 1, Fert 2, and Fert 3, with the overall mean marked]

Square all values
Sum the squared values

Calculate ~mean SS, a.k.a. variance:
$$ s^2 = \frac{SS}{n-1} = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}{n-1} $$
** Why (n-1)?? Because all deviations must sum to zero; therefore, if you calculate n-1 deviations, you know what the final one must be. You do not actually have n independent pieces of information about the variance.
SS is not useful for comparing between groups: it is always big when n is big. Using the mean SS (variance) allows you to compare among groups.
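
The steps above (deviations, squares, sum, divide by n-1) can be sketched in Python. The yield values are made up for illustration and are not data from the fertilizer experiment; the function names are hypothetical.

```python
import statistics


def deviations(values):
    """Deviation of each value from the sample mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def sum_of_squares(values):
    """SS: sum of the squared deviations from the mean."""
    return sum(d * d for d in deviations(values))


def sample_variance(values):
    """Mean SS: the sum of squares divided by n - 1."""
    return sum_of_squares(values) / (len(values) - 1)


yields = [4.0, 5.0, 6.0, 5.5, 4.5]  # hypothetical yields (tonnes)

print(sum(deviations(yields)))      # deviations sum to zero
print(sum_of_squares(yields))
print(sample_variance(yields))
# Cross-check against the standard library's n - 1 variance.
print(statistics.variance(yields))
```

The last two printed values agree because `statistics.variance` also divides by n - 1.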

Partitioning Variability
Back to the question: How much variability in crop yield comes from fertilizers (what you manipulated), how much from other factors (that you cannot control)?
Calculate the mean for each group, i.e., plots with fert1, fert2, and fert3 (3 group means)
But first imagine a data set where...

- Imagine a case where the group (treatment) means differ a lot, with little variation within a group
- Group means explain most of the variability
[Figure: yield (tonnes) against plot number for Fert 1, Fert 2, and Fert 3, with the overall mean and group means marked]

Now imagine a case where the group (treatment) means are not distinct, with much variation within a group
- Group means explain little of the variability
- 3 fertilizers did not affect yield differently
[Figure: yield (tonnes) against plot number for Fert 1, Fert 2, and Fert 3, with the overall mean and group means marked]

H0: mean yield fert1 = mean yield fert2 = mean yield fert3
Or: fertilizer type has no effect on crop yield
- calculating 3 measures of variability, start by partitioning SS

Total SS = sum of squares of deviations of data around the grand (overall) mean (measure of total variability)
Within group SS (Error SS) = sum of squares of deviations of data around the separate group means (measure of variability among units given same treatment)
Among groups SS = sum of squares of deviations of group means around the grand mean (measure of variability among units given different treatments)
Unfortunate word usage

Total SS = sum of squares of deviations of data around the grand (overall) mean (measure of total variability)
$$ \text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X} \right)^2 $$
$k$ = number of experimental groups
$n_i$ = number of data in group $i$
$X_{ij}$ = datum $j$ in experimental group $i$
$\bar{X}_i$ = mean of group $i$
$\bar{X}$ = grand mean
Deviation of each datum from the grand mean, squared, then summed across all k groups

Within group SS = sum of squares of deviations of data around the separate group means (measure of variability among units given same treatment)
$$ \text{Within group SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_i \right)^2 $$
$k$ = number of experimental groups
$n_i$ = number of data in group $i$
$X_{ij}$ = datum $j$ in experimental group $i$
$\bar{X}_i$ = mean of group $i$
$\bar{X}$ = grand mean
Deviation of each datum from its group mean, squared, then summed across all k groups

Among groups SS = sum of squares of deviations of group means around the grand mean (measure of variability among units given different treatments)
$$ \text{Among groups SS} = \sum_{i=1}^{k} n_i \left( \bar{X}_i - \bar{X} \right)^2 $$
$k$ = number of experimental groups
$n_i$ = number of data in group $i$
$X_{ij}$ = datum $j$ in experimental group $i$
$\bar{X}_i$ = mean of group $i$
$\bar{X}$ = grand mean
Deviation of each group mean from the grand mean, squared, weighted by the group size $n_i$, then summed across all k groups

Partitioning df
Total df = total number of experimental units - 1
  In fertilizer experiment, N - 1 = 30 - 1 = 29
Within group df (Error df) = units in each group - 1, summed for all groups
  In fertilizer experiment, (10-1)*3; 9*3 = 27
Among groups df = number of group means - 1
  In fertilizer experiment, 3 - 1 = 2
Unfortunate word usage
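
A small Python sketch of the df bookkeeping, using only the group sizes from the fertilizer experiment (3 groups of 10 plots). The function name is hypothetical.

```python
def degrees_of_freedom(group_sizes):
    """Return (total df, within group df, among groups df)."""
    n_total = sum(group_sizes)
    k = len(group_sizes)
    total_df = n_total - 1                        # all units minus 1
    within_df = sum(n - 1 for n in group_sizes)   # (n_i - 1), summed over groups
    among_df = k - 1                              # group means minus 1
    return total_df, within_df, among_df


total_df, within_df, among_df = degrees_of_freedom([10, 10, 10])
print(total_df, within_df, among_df)
print(total_df == within_df + among_df)
```

Writing the within group df as a sum over groups means the same function also handles unequal group sizes.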

SS and df sum
Total SS = within groups SS + among groups SS
Total df = within groups df + among groups df
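
The SS identity can be checked numerically. The sketch below computes the three sums of squares as defined on the preceding slides; the data are a small made-up example (3 groups of 3 values), not the fertilizer data, and the function name is hypothetical.

```python
def partition_ss(groups):
    """Return (total SS, within group SS, among groups SS)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Each datum around the grand mean.
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)
    # Each datum around its own group mean.
    within_ss = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    # Each group mean around the grand mean, weighted by group size.
    among_ss = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    return total_ss, within_ss, among_ss


# Hypothetical yields for three treatments.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
total_ss, within_ss, among_ss = partition_ss(groups)
print(total_ss, within_ss, among_ss)
print(abs(total_ss - (within_ss + among_ss)) < 1e-9)
```

Here the group means (5, 7, 9) are far apart relative to the spread inside each group, so most of the total SS ends up in the among groups term.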

Mean squares
Combine information on SS and df
Total mean squares = total SS / total df: total variance of data set
Within group mean squares (Error MS) = within SS / within df: variance (per df) among units given same treatment
Among groups mean squares = among SS / among df: variance (per df) among units given different treatments
Unfortunate word usage

Tomorrow: the big ‘F’ example calculations

$$ F = \frac{\text{Among groups mean squares}}{\text{Within group mean squares}} $$
Back to the question: Does fitting the treatment mean explain a significant amount of variance?
In our example: if fertilizer doesn’t influence yield, then variation between plots with the same fertilizer will be about the same as variation between plots given different fertilizers, so F will be close to 1
Compare calculated F to critical value from table (B4)
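
A minimal Python sketch of the F calculation, from raw groups to mean squares to the ratio. The function name and the data are hypothetical; this is the fixed effects calculation, with the within group mean squares as the denominator.

```python
def one_way_anova_f(groups):
    """Return (among MS, within MS, F) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    among_ss = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    within_ss = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    among_df = len(groups) - 1                 # k - 1
    within_df = len(all_values) - len(groups)  # sum of (n_i - 1)

    among_ms = among_ss / among_df
    within_ms = within_ss / within_df
    return among_ms, within_ms, among_ms / within_ms


# Hypothetical yields for three treatments.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
among_ms, within_ms, f_value = one_way_anova_f(groups)
print(among_ms, within_ms, f_value)
```

For these made-up data the among groups mean squares is 12 and the within group mean squares is 1, giving F = 12 on (2, 6) df.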

If calculated F is as big as or bigger than the critical value, then reject H0
But remember: H0: μ1 = μ2 = μ3
Rejecting H0 says only that the means are not all equal. Need a separate test (multiple comparison test) to tell which means differ from which
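
The decision rule can be written as a one-line comparison. In this sketch the critical value is supplied by the reader from an F table; the 5.14 used below is the tabled value for α = 0.05 with (2, 6) df and belongs to a made-up example with F = 12, not to the fertilizer experiment. The function name is hypothetical.

```python
def reject_null(f_calculated, f_critical):
    """Reject H0 when the calculated F is as big as or bigger than
    the critical value."""
    return f_calculated >= f_critical


# Hypothetical example: F = 12 on (2, 6) df, critical value from a table.
f_calculated = 12.0
f_critical = 5.14
if reject_null(f_calculated, f_critical):
    print("Reject H0: the means are not all equal")
else:
    print("Fail to reject H0")
```

A rejection here says nothing about which pairs of means differ; that still needs a multiple comparison test.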

See handout
Remember: the shape of the t-distribution approaches the normal curve as sample size gets very large
But the F distribution is different:
  always positive, with positive skew
  shape differs with df
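
The shape of the F distribution can be explored by simulation. The sketch below is an illustration under stated assumptions: it builds F values as the ratio of two independent chi-square variables, each divided by its df (the standard construction), and the df of (2, 27), the number of draws, and the seed are arbitrary choices. The function name is hypothetical.

```python
import random
import statistics


def simulate_f(df_num, df_den, n_draws, seed=0):
    """Draw n_draws values from an F distribution with the given df."""
    rng = random.Random(seed)

    def chi_square(df):
        # Sum of df squared standard normal draws.
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    return [(chi_square(df_num) / df_num) / (chi_square(df_den) / df_den)
            for _ in range(n_draws)]


draws = simulate_f(2, 27, 5000)
print(min(draws) > 0)                                      # always positive
print(statistics.mean(draws) > statistics.median(draws))   # positive skew
```

A mean above the median is one simple sign of a long right tail. Rerunning with different df shows how the shape changes.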

Two types of ANOVA: fixed and random effects models
Calculation of F as:
$$ F = \frac{\text{Among groups mean squares}}{\text{Within group mean squares}} $$
assumes that the levels of the independent variable have been specifically chosen (fixed effects), as opposed to being randomly selected from a larger population of possible levels (random effects)

Examples
Fixed: Test for differences in growth rates of three cultivars of roses. You want to decide which of the three to plant.
Random: Randomly select three cultivars of roses from a seed catalogue in order to test whether, in general, rose cultivars differ in growth rate.
Fixed: Test for differences in numbers of fast food meals consumed each month by students at UT, BG, and Ohio State in order to determine which campus has healthier eating habits.
Random: Randomly select 3 college campuses and test whether the number of fast food meals per month differs among college campuses in general.

In random effects ANOVA the denominator is not necessarily the within groups mean squares
Proper denominator depends on the nature of the question
*** Be aware that the default output from most stats packages (e.g., Excel, SAS) is the fixed effects model

Assumptions of ANOVA
Assumes that the variances of the k samples are similar (homogeneity of variance, or homoscedasticity)
  robust to violations of this assumption, especially when all n_i are equal
Assumes that the underlying populations are normally distributed
  also robust to violations of this assumption
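
One informal way to look at the first assumption is to compare the sample variances of the groups. The Python sketch below is only a quick look, not a formal test of homogeneity; the data and the function name are hypothetical, and no cutoff for the ratio is implied.

```python
import statistics


def group_variances(groups):
    """Sample variance (n - 1 denominator) of each group."""
    return [statistics.variance(g) for g in groups]


# Hypothetical yields for three treatments.
groups = [[4.0, 5.0, 6.0], [5.0, 7.0, 9.0], [8.0, 9.0, 10.0]]
variances = group_variances(groups)
ratio = max(variances) / min(variances)
print(variances)
print(ratio)   # largest variance relative to the smallest
```

Plotting the data by group, as on the earlier "Pictures first" slides, is a natural companion to this kind of check.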

Model Formulae
Expression of the questions being asked
Does fertilizer affect yield?
yield = fertilizer (word equation)
  yield: response var
  fertilizer: explanatory var
Right side can get more complicated

General Linear Models
Linear models relating response and explanatory variables, encompassing ANOVA (& related tests), which have categorical explanatory variables, and regression (& related tests), which have continuous explanatory variables
In SAS, proc glm executes ANOVA, regression, and other similar linear models
Other procedures can also be used; glm is most general

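/* Read the comma-delimited data: plot number, fertilizer level, yield */
data start;
  infile 'C:\Documents and Settings\cmayer3\My Documents\teaching\Biostatistics\Lectures\ANOVA demo.csv' dlm=',' DSD;
  input plot fertilizer yield;
options ls=80;
proc print;
data one;
  set start;
/* One-way ANOVA: fertilizer is the categorical (class) explanatory variable */
proc glm;
  class fertilizer;
  model yield=fertilizer;
run;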

The SAS System   5   12:53 Thursday, September 22, 2005
The GLM Procedure
Class Level Information
Class        Levels   Values
fertilizer
Number of Observations Read   30
Number of Observations Used   30

The SAS System   6   12:53 Thursday, September 22, 2005
The GLM Procedure
Dependent Variable: yield
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total

R-Square   Coeff Var   Root MSE   yield Mean

Source       DF   Type I SS     Mean Square   F Value   Pr > F
fertilizer

Source       DF   Type III SS   Mean Square   F Value   Pr > F
fertilizer