ANOVA: Analysis of Variance Xuhua Xia
Review of t-test
Parametric
– Paired-sample t-test: t.test(x1, x2, paired=TRUE)
– Unpaired two-sample t-test assuming equal variances: t.test(x1, x2, var.equal=TRUE)
– Unpaired two-sample t-test when the two variances are not equal: t.test(x1, x2) (always do a nonparametric test as well and use the results of the more sensitive test)
– Consequence of violating the assumption
Nonparametric
– Mann-Whitney-Wilcoxon test (ensure that x is a factor): wilcox.test(y~x, data=myDat, paired=TRUE) or paired=FALSE
– Alternative: rank the variables and perform a regular t-test
Test equality of variances
– var.test(x1, x2)
– p <- 2*pf(Var_small/Var_large, DF_small, DF_large)
Equivalent methods in EXCEL
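The calls listed above can be tried on simulated data. The following is only a sketch: the vectors x1 and x2, the sample size of 20 and the data frame myDat are assumptions made for illustration, not data from the course. Equal sample sizes are used so that the manual formula 2*pf(Var_small/Var_large, DF_small, DF_large) gives the same p-value as var.test.

```r
# Hypothetical data: two samples of equal size (an assumption for illustration)
set.seed(42)
x1 <- rnorm(20, mean = 10, sd = 2)
x2 <- rnorm(20, mean = 12, sd = 2)

# Parametric tests
t_paired <- t.test(x1, x2, paired = TRUE)     # paired-sample t-test
t_equal  <- t.test(x1, x2, var.equal = TRUE)  # pooled-variance t-test
t_welch  <- t.test(x1, x2)                    # Welch t-test (unequal variances)

# Test equality of variances, by function and by hand
v  <- var.test(x1, x2)
vs <- sort(c(var(x1), var(x2)))               # vs[1] = smaller, vs[2] = larger
p_manual <- 2 * pf(vs[1] / vs[2], length(x1) - 1, length(x2) - 1)

# Nonparametric test in long format; x must be a factor
myDat <- data.frame(y = c(x1, x2),
                    x = factor(rep(c("g1", "g2"), each = 20)))
w <- wilcox.test(y ~ x, data = myDat)

# Alternative: rank the response and run a regular t-test on the ranks
t_rank <- t.test(rank(y) ~ x, data = myDat, var.equal = TRUE)

print(c(var.test = v$p.value, manual = p_manual))
```

With unequal sample sizes the two variance-test p-values need not agree exactly, so var.test is the safer choice in practice.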
Review of Standard Error (SE)
ANOVA was mainly developed by Ronald A. Fisher
– Head of the Statistics Division at the Rothamsted Experimental Station in Hertfordshire.
– One of the three founders of theoretical population genetics.
– Developer of statistical methods, especially likelihood methods.
– Published The Genetical Theory of Natural Selection in 1930, in which he proposed the fundamental theorem of natural selection.
– The F statistic was named after him.
“To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.” Ronald A. Fisher
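One-way ANOVA Model
$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ vs. $x_{ij} = \mu + \varepsilon_{ij}$
Is the effect $\alpha_i$ zero?
This is the same model as for the t-test, except that the subscript i takes the values 1 and 2 in the t-test, but 1, 2, ..., n (the number of groups) in one-way ANOVA.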
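Because the two-group case of this model is the equal-variance t-test, the two analyses should agree. The sketch below checks this on simulated data; the data frame d and its group labels A and B are hypothetical.

```r
# Hypothetical two-group data (an assumption for illustration)
set.seed(7)
d <- data.frame(y = c(rnorm(15, mean = 5), rnorm(15, mean = 6)),
                g = factor(rep(c("A", "B"), each = 15)))

# Equal-variance t-test and one-way ANOVA on the same data
tt  <- t.test(y ~ g, data = d, var.equal = TRUE)
tab <- summary(aov(y ~ g, data = d))[[1]]

t_squared <- unname(tt$statistic)^2   # squared t statistic
F_value   <- tab[["F value"]][1]      # F statistic for the group effect
p_anova   <- tab[["Pr(>F)"]][1]

# With two groups, F equals t^2 and the two p-values coincide
print(c(t_squared = t_squared, F = F_value))
print(c(p_ttest = tt$p.value, p_anova = p_anova))
```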
ANOVA Rationale
The essence of ANOVA is to partition the total variation into its components. Suppose we have three groups (e.g., a control plus two treatments), each with $N_1 = N_2 = N_3 = 200$ test animals. Given the null hypothesis that the three groups do not differ from each other, i.e., they all represent random samples from the same underlying population, we can estimate the population variance in three ways:
– From all 600 animals: $Var = SS_{Total}/DF_{Total}$
– From individual groups: $SS_1/DF_1$, $SS_2/DF_2$, $SS_3/DF_3$, pooled as $Var_{withinGroup} = (SS_1+SS_2+SS_3)/(DF_1+DF_2+DF_3)$
– From the three group means $M_1$, $M_2$, $M_3$ and the grand mean $M$: $SE = \sqrt{[(M_1-M)^2 + (M_2-M)^2 + (M_3-M)^2]/2}$ and $Var_{betweenGroup} = SE^2 \times 200 = [N_1(M_1-M)^2 + N_2(M_2-M)^2 + N_3(M_3-M)^2]/2$
Under the null hypothesis, $Var_{withinGroup}$ and $Var_{betweenGroup}$ estimate the same population variance, so their ratio should be close to 1. ANOVA is therefore an F-test of the two variances. In ANOVA terminology, $Var_{withinGroup}$ is MS Error and $Var_{betweenGroup}$ is MS Model.
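The three variance estimates can be computed directly and compared with the output of aov. This is a minimal sketch on simulated data drawn under the null hypothesis; the population mean of 50, the standard deviation of 10 and the group labels are assumptions chosen for illustration.

```r
# Hypothetical data: three groups of 200 from the same population
set.seed(1)
n <- 200
x <- rnorm(3 * n, mean = 50, sd = 10)
g <- factor(rep(c("Control", "Treat1", "Treat2"), each = n))

# 1) From all 600 animals
var_total <- var(x)

# 2) From individual groups, pooled: (SS1 + SS2 + SS3) / (DF1 + DF2 + DF3)
grp_ss     <- tapply(x, g, function(v) sum((v - mean(v))^2))
var_within <- sum(grp_ss) / (3 * (n - 1))

# 3) From the three group means: SE^2 * 200
grp_means   <- tapply(x, g, mean)
se2         <- sum((grp_means - mean(x))^2) / 2
var_between <- se2 * n

# F-test of the two variances
F_manual <- var_between / var_within
p_manual <- pf(F_manual, 2, 3 * (n - 1), lower.tail = FALSE)

# The same quantities from aov: MS Model and MS Error
tab <- summary(aov(x ~ g))[[1]]
print(c(var_total = var_total, var_within = var_within, var_between = var_between))
print(tab)
```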
One-way experimental design
Weight gain by type of food:
Low-fat food   Medium-fat food   High-fat food
0              4                 8
2              6                 10
Numerical Illustration of One-Way ANOVA
Assignment: Repeat the ANOVA computation after first replacing the value 10 in the High-fat food group with the two values 9 and 20. Submit this slide with all updated values.
Name:
ID:
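The computation for the original weight-gain data (0, 2 for Low-fat; 4, 6 for Medium-fat; 8, 10 for High-fat) can be sketched step by step as follows. The variable names are chosen here for illustration and are not part of the course material.

```r
# Weight-gain data from the one-way design
gain <- c(0, 2, 4, 6, 8, 10)
food <- factor(rep(c("Low", "Medium", "High"), each = 2),
               levels = c("Low", "Medium", "High"))

grand <- mean(gain)                    # grand mean
means <- tapply(gain, food, mean)      # group means
n     <- tapply(gain, food, length)    # group sizes

# Partition the total sum of squares
ss_total <- sum((gain - grand)^2)            # 70
ss_model <- sum(n * (means - grand)^2)       # between groups: 64
ss_error <- sum((gain - ave(gain, food))^2)  # within groups: 6

# Degrees of freedom, mean squares, F and p
df_model <- nlevels(food) - 1
df_error <- length(gain) - nlevels(food)
ms_model <- ss_model / df_model
ms_error <- ss_error / df_error
F_value  <- ms_model / ms_error              # 16
p_value  <- pf(F_value, df_model, df_error, lower.tail = FALSE)

print(c(SS_model = ss_model, SS_error = ss_error, SS_total = ss_total,
        F = F_value, p = p_value))
```

For the assignment, only the gain and food vectors need to change; the remaining lines apply unchanged to unequal group sizes.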
ANOVA Table
Dependent variable: Weight Gain
Source   DF   SS     MS     F      p
Model    2    64.0   32.0   16.0   0.0251
Error    3    6.0    2.0
Total    5    70.0
The null hypothesis H0: X1 = X2 = X3 (equal group means) is rejected. The three kinds of food differ significantly in their effect on weight gain of rabbits. In particular, Medium-fat and High-fat foods are significantly better than Low-fat food. However, Medium-fat and High-fat foods do not differ in their effect on rabbit weight gain.
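Statements about which pairs of foods differ need a pairwise comparison in addition to the overall F-test. One option, sketched here as an assumption rather than as the method used for the slide, is Tukey's HSD applied to the aov fit of the same six weight-gain values.

```r
# Weight-gain data from the one-way design
rabbit <- data.frame(
  Gain = c(0, 2, 4, 6, 8, 10),
  Food = factor(rep(c("Low", "Medium", "High"), each = 2),
                levels = c("Low", "Medium", "High"))
)

# Overall test: the ANOVA table
fit <- aov(Gain ~ Food, data = rabbit)
tab <- summary(fit)[[1]]
print(tab)

# Pairwise differences between foods with adjusted p-values
tk <- TukeyHSD(fit)
print(tk)
```

The rows of tk$Food give the difference in means, its confidence limits and the adjusted p-value for each pair of foods.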
ANOVA and t-test
Parametric:
– aov(DV~IV1+IV2+…)
– aov(DV~IV1+IV2+IV1:IV2) or aov(DV~IV1*IV2)
– Contrast ANOVA and t-test by using Mercury2Gr_A.txt and Mercury2Gr_B.txt (the same data in two different formats, one for t.test and one for aov), and likewise DarwinPlantBreeding_A.txt and DarwinPlantBreeding_B.txt (ensure that the variable Species is a factor)
Nonparametric:
– One-way ANOVA: kruskal.test(DV~IV)
– Randomized block design: friedman.test(y~A|B), with A the treatment factor and B the block factor
Others:
– summary(fit)
– print(model.tables(fit,"means"),digits=3)
– boxplot(DV~IV)
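The formula conventions above can be checked on a small made-up data set. In this sketch the data frame d, the treatment labels T1 to T3, the block labels B1 to B4 and the response values are all hypothetical. Note that the formula method of friedman.test expects the form y ~ treatment | block.

```r
# Hypothetical randomized block data: 3 treatments in each of 4 blocks
d <- data.frame(
  y = c(10, 12, 15,  11, 14, 16,  9, 13, 17,  12, 15, 18),
  A = factor(rep(c("T1", "T2", "T3"), times = 4)),   # treatment
  B = factor(rep(paste0("B", 1:4), each = 3))        # block
)

# IV1*IV2 is shorthand for IV1 + IV2 + IV1:IV2
terms_star <- attr(terms(y ~ A * B), "term.labels")
terms_long <- attr(terms(y ~ A + B + A:B), "term.labels")
print(terms_star)

# Parametric analysis with block and treatment as main effects
fit <- aov(y ~ B + A, data = d)
print(summary(fit))
print(model.tables(fit, "means"), digits = 3)

# Nonparametric counterparts
kw <- kruskal.test(y ~ A, data = d)       # one-way, ignores blocks
fr <- friedman.test(y ~ A | B, data = d)  # randomized block design
print(kw)
print(fr)
```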
Randomized complete blocks
Which of the six strains of clover has the highest protein content? The experimenter divided his field into 5 relatively homogeneous blocks, each with 6 plots, and randomly assigned his 6 strains to the 6 plots within each block. After harvesting, he determined the nitrogen content for each strain in each plot.
Randomized complete blocks
[Data table: blocks B1 to B5 (rows) by strains 3dok1, 3dok13, 3dok4, 3dok5, 3dok7 and compos (columns)]
Recode the data into three columns (variables): Yield, Variety and Block, and save it to a text file such as RandCompleteBlock.txt for data analysis in R, e.g.,
Yield   Variety   Block
…       3dok1     B1
…       3dok1     B2
……
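R functions
# Read the recoded data; Variety and Block must be factors
md <- read.table("RandCompleteBlock.txt", header=TRUE, stringsAsFactors=TRUE)
# Two-way ANOVA without interaction, with Block as the blocking factor
fit <- aov(Yield~Block+Variety, data=md)
summary(fit)
anova(fit)
# Pairwise comparisons among blocks and among varieties
TukeyHSD(fit)
[TukeyHSD output: tables $Block and $Variety, each with columns diff, lwr, upr and p adj and one row per pair of levels]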
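Without the data file, the same analysis can be rehearsed on simulated values. In the sketch below only the strain and block labels come from the slides; the Yield values are randomly generated and purely hypothetical, so the results say nothing about the real clover experiment.

```r
# Build a 5-block x 6-variety layout in the recoded long format
set.seed(11)
varieties <- c("3dok1", "3dok13", "3dok4", "3dok5", "3dok7", "compos")
blocks    <- paste0("B", 1:5)
md <- expand.grid(Variety = varieties, Block = blocks)   # both become factors

# Hypothetical response values (NOT the real measurements)
md$Yield <- round(rnorm(nrow(md), mean = 30, sd = 3), 1)

# Randomized complete block ANOVA: block and variety as main effects
fit <- aov(Yield ~ Block + Variety, data = md)
tab <- summary(fit)[[1]]
print(tab)

# Pairwise comparisons among varieties only
tk <- TukeyHSD(fit, which = "Variety")
print(tk)
```

With 6 varieties there are 15 pairwise comparisons, and the error term has (5 - 1) x (6 - 1) = 20 degrees of freedom.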
Example
A researcher needs to assess the effect of 3 drugs on reducing appetite. Appetite reduction is measured by the inter-meal interval (in minutes). The half-life of the drugs is about 3 days. Seven human subjects differ in age, gender, appetite, degree of obesity and potentially in many other ways. If the researcher randomly allocated these seven subjects to three groups, then some groups might contain younger subjects or more males than others, etc., so that any group differences would be confounded by potentially many other factors.
He decided to use a randomized complete block design and administer the drugs on Monday in three consecutive weeks. For each subject, he randomized the three drugs over the three Mondays (randomization table), took an index of appetite, and obtained the data table.
Using test subjects as blocks is also called repeated measures ANOVA or within-subject ANOVA.
Assignment A: analyze the data and report the effect size and the result of the significance test (in short, what you want to include in a manuscript).
Randomization table:
Subject   Week 1   Week 2   Week 3
1         Drug2    Drug1    Drug3
2         Drug1    Drug3    Drug2
3         Drug2    Drug3    Drug1
4         Drug3    Drug1    Drug2
5         Drug1    Drug2    Drug3
6         Drug3    Drug2    Drug1
7         Drug2    Drug1    Drug3
Data table columns: Subject, Drug 1, Drug 2, Drug 3
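One possible workflow for this kind of design is sketched below: recode the wide table (one row per subject) into long format, then fit an ANOVA with Subject as the block. The inter-meal intervals are simulated and hypothetical, not the values from the slide, and the column names Drug1 to Drug3 and Interval are assumptions.

```r
# Hypothetical wide-format data: one row per subject, one column per drug
set.seed(3)
wide <- data.frame(Subject = 1:7,
                   Drug1 = round(rnorm(7, mean = 180, sd = 20)),
                   Drug2 = round(rnorm(7, mean = 200, sd = 20)),
                   Drug3 = round(rnorm(7, mean = 220, sd = 20)))

# Recode into long format: Subject, Drug, Interval
long <- data.frame(
  Subject  = factor(rep(wide$Subject, times = 3)),
  Drug     = factor(rep(c("Drug1", "Drug2", "Drug3"), each = 7)),
  Interval = c(wide$Drug1, wide$Drug2, wide$Drug3)
)

# Subjects as blocks, drugs as the treatment of interest
fit <- aov(Interval ~ Subject + Drug, data = long)
tab <- summary(fit)[[1]]
print(tab)

# Simple effect-size summaries: mean interval per drug and pairwise differences
drug_means <- tapply(long$Interval, long$Drug, mean)
print(drug_means)
print(TukeyHSD(fit, which = "Drug"))

# Nonparametric alternative for the same design
fr <- friedman.test(Interval ~ Drug | Subject, data = long)
print(fr)
```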