Analysis of Variance (ANOVA)

SPH 247 Statistical Analysis of Laboratory Data, March 31, 2015

ANOVA: Fixed and Random Effects
We will review the analysis of variance (ANOVA) and then move to random and fixed effects models.
Nested models are used to look at levels of variability (days within subjects, replicate measurements within days).
Crossed models are often used when there are both fixed and random effects.

The Basic Idea
The analysis of variance is a way of testing whether observed differences between groups are too large to be explained by chance variation.
One-way ANOVA is used when there are k ≥ 2 groups for one factor, and no other quantitative variable or classification factor.

Data = Grand Mean + Column Deviations from Grand Mean + Cell Deviations from Column Mean
Are the column deviations from the grand mean too big to be accounted for by the cell deviations from the column means?

Data
     A   B   C
     9  10  12
     7   8  14

Column Means
     A   B   C
     8   9  13

Deviations from Column Means
     A   B   C
     1   1  -1
    -1  -1   1
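The decomposition in this small example can be checked numerically. The following is a minimal sketch in base R, assuming two observations per group (A: 9, 7; B: 10, 8; C: 12, 14), which is the assignment consistent with the column means 8, 9, 13. The variable names are illustrative only.

```r
# Toy data: two observations in each of three groups.
# Group membership is an assumption inferred from the column means (8, 9, 13).
y     <- c(9, 7, 10, 8, 12, 14)
group <- factor(rep(c("A", "B", "C"), each = 2))

grand_mean   <- mean(y)                   # grand mean
col_mean_obs <- ave(y, group)             # column mean, repeated for each observation
col_dev_obs  <- col_mean_obs - grand_mean # column deviations from the grand mean
cell_dev     <- y - col_mean_obs          # cell deviations from the column mean

# Data = grand mean + column deviation + cell deviation
stopifnot(isTRUE(all.equal(grand_mean + col_dev_obs + cell_dev, y)))

# Sums of squares, degrees of freedom, and the F statistic
ss_between <- sum(col_dev_obs^2)
ss_within  <- sum(cell_dev^2)
df_between <- nlevels(group) - 1
df_within  <- length(y) - nlevels(group)
f_stat     <- (ss_between / df_between) / (ss_within / df_within)
f_stat     # 7 for these values
```

The F statistic is the ratio of the mean square for columns to the mean square for cells, which is how the question "are the column deviations too big?" is made quantitative.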

red.cell.folate             package:ISwR             R Documentation

Red cell folate data

Description:
    The 'folate' data frame has 22 rows and 2 columns. It contains data on red cell folate levels in patients receiving three different methods of ventilation during anesthesia.

Format:
    This data frame contains the following columns:
    folate: a numeric vector. Folate concentration (μg/l).
    ventilation: a factor with levels:
        'N2O+O2,24h': 50% nitrous oxide and 50% oxygen, continuously for 24 hours;
        'N2O+O2,op': 50% nitrous oxide and 50% oxygen, only during operation;
        'O2,24h': no nitrous oxide, but 35-50% oxygen for 24 hours.

    > library(ISwR)
    > data(red.cell.folate)
    > help(red.cell.folate)
    > summary(red.cell.folate)
         folate          ventilation
     Min.   :206.0   N2O+O2,24h:8
     1st Qu.:249.5   N2O+O2,op :9
     Median :274.0   O2,24h    :5
     Mean   :283.2
     3rd Qu.:305.5
     Max.   :392.0
    > attach(red.cell.folate)
    > plot(folate ~ ventilation)

    > folate.lm <- lm(folate ~ ventilation)
    > summary(folate.lm)

    Call:
    lm(formula = folate ~ ventilation)

    Residuals:
        Min      1Q  Median      3Q     Max
    -73.625 -35.361  -4.444  35.625  75.375

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            316.62      16.16  19.588 4.65e-14 ***
    ventilationN2O+O2,op   -60.18      22.22  -2.709   0.0139 *
    ventilationO2,24h      -38.62      26.06  -1.482   0.1548
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

    Residual standard error: 45.72 on 19 degrees of freedom
    Multiple R-Squared: 0.2809,     Adjusted R-squared: 0.2052
    F-statistic: 3.711 on 2 and 19 DF,  p-value: 0.04359

    > anova(folate.lm)
    Analysis of Variance Table

    Response: folate
                Df Sum Sq Mean Sq F value  Pr(>F)
    ventilation  2  15516    7758  3.7113 0.04359 *
    Residuals   19  39716    2090
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

    > TukeyHSD(aov(folate~ventilation))
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = folate ~ ventilation)

    $ventilation
                              diff        lwr      upr     p adj
    N2O+O2,op-N2O+O2,24h -60.18056 -116.61904 -3.74207 0.0354792
    O2,24h-N2O+O2,24h    -38.62500 -104.84037 27.59037 0.3214767
    O2,24h-N2O+O2,op      21.55556  -43.22951 86.34062 0.6802018
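To see how the entries of the analysis of variance table for the folate data relate to one another, the F statistic, its p-value, R-squared, and the residual standard error can be recomputed from the printed sums of squares. This is a minimal sketch in base R; the variable names are illustrative, and because the inputs are the rounded values from the table the results agree with the R output only up to rounding.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_ventilation <- 15516; df_ventilation <- 2
ss_residual    <- 39716; df_residual    <- 19

# Mean squares are sums of squares divided by their degrees of freedom
ms_ventilation <- ss_ventilation / df_ventilation
ms_residual    <- ss_residual / df_residual

# F statistic and its upper-tail p-value
f_stat  <- ms_ventilation / ms_residual
p_value <- pf(f_stat, df_ventilation, df_residual, lower.tail = FALSE)

# Quantities reported by summary(lm): R-squared and residual standard error
r_squared <- ss_ventilation / (ss_ventilation + ss_residual)
resid_se  <- sqrt(ms_residual)

round(c(F = f_stat, p = p_value, R2 = r_squared, se = resid_se), 4)
```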

Two- and Multi-way ANOVA
If there is more than one factor, the sum of squares can be decomposed according to each factor, and possibly according to interactions.
One can also have factors and quantitative variables in the same model (cf. analysis of covariance).
All have similar interpretations.

Heart rates after enalaprilat (ACE inhibitor)

Description:
    36 rows and 3 columns. Data for nine patients with congestive heart failure before and shortly after administration of enalaprilat, in a balanced two-way layout.

Format:
    hr: a numeric vector. Heart rate in beats per minute.
    subj: a factor with levels '1' to '9'.
    time: a factor with levels '0' (before), '30', '60', and '120' (minutes after administration).

    > data(heart.rate)
    > attach(heart.rate)
    > heart.rate
        hr subj time
    1   96    1    0
    2  110    2    0
    3   89    3    0
    4   95    4    0
    5  128    5    0
    6  100    6    0
    7   72    7    0
    8   79    8    0
    9  100    9    0
    10  92    1   30
    ......
    18 106    9   30
    19  86    1   60
    ......
    27 104    9   60
    28  92    1  120
    ......
    36 102    9  120

    > plot(hr~subj)
    > plot(hr~time)
    > hr.lm <- lm(hr~subj+time)
    > anova(hr.lm)
    Analysis of Variance Table

    Response: hr
              Df Sum Sq Mean Sq F value    Pr(>F)
    subj       8 8966.6  1120.8 90.6391 4.863e-16 ***
    time       3  151.0    50.3  4.0696   0.01802 *
    Residuals 24  296.8    12.4
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
    > sres <- resid(lm(hr~subj))
    > plot(sres~time)

Note that when the design is orthogonal, the ANOVA results don’t depend on the order of terms.
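The remark about orthogonal designs can be illustrated with simulated data. The sketch below builds a hypothetical balanced layout with the same shape as the heart rate data (9 subjects by 4 times); the response values, effect sizes, and seed are invented for illustration and are not the ISwR data. It compares the sequential sums of squares from `anova()` under both orderings, then repeats the comparison after deleting a few rows so the design is no longer balanced.

```r
set.seed(1)

# Hypothetical balanced two-way layout: every subject measured at every time
d <- expand.grid(subj = factor(1:9), time = factor(c(0, 30, 60, 120)))
subj_effect <- rnorm(9, sd = 15)   # invented subject-to-subject differences
time_effect <- c(0, -2, -4, -5)    # invented time effects
d$y <- 90 + subj_effect[d$subj] + time_effect[d$time] + rnorm(nrow(d), sd = 3)

# Balanced (orthogonal) design: the order of terms does not matter
a1 <- anova(lm(y ~ subj + time, data = d))
a2 <- anova(lm(y ~ time + subj, data = d))
balanced_same <- isTRUE(all.equal(a1["time", "Sum Sq"], a2["time", "Sum Sq"])) &&
                 isTRUE(all.equal(a1["subj", "Sum Sq"], a2["subj", "Sum Sq"]))

# Remove three cells so the design is unbalanced: sequential SS now depend on order
d_unbal <- d[-c(1, 2, 12), ]
b1 <- anova(lm(y ~ subj + time, data = d_unbal))
b2 <- anova(lm(y ~ time + subj, data = d_unbal))
unbalanced_diff <- abs(b1["time", "Sum Sq"] - b2["time", "Sum Sq"])

balanced_same     # TRUE
unbalanced_diff   # nonzero
```

Passing `data =` to `lm()` instead of calling `attach()` keeps the example free of side effects on the search path.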

Fixed and Random Effects
A fixed effect is a factor that can be duplicated (dosage of a drug).
A random effect is one that cannot be duplicated:
    Patient/subject
    Repeated measurement
There can be important differences in the analysis of data with random effects.
The error term is always a random effect.

Estradiol data from Rosner
5 subjects from the Nurses’ Health Study.
One blood sample each.
Each sample assayed twice for estradiol (and three other hormones).
The within-subject variability is strictly technical/assay.
Variability within a person over time will be much greater.

    > anova(lm(Estradiol ~ Subject,data=endocrin))
    Analysis of Variance Table

    Response: Estradiol
              Df Sum Sq Mean Sq F value   Pr(>F)
    Subject    4 593.31 148.329  24.546 0.001747 **
    Residuals  5  30.21   6.043
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Replication error variance is 6.043, so the standard deviation of replicates is 2.46 pg/mL.
This compares to average levels across subjects ranging from 8.05 to 18.80.
Estimated variance across subjects is (148.329 − 6.043)/2 = 71.143.
Standard deviation across subjects is 8.43 pg/mL.
If we average the replicates, we get five values with variance 148.329/2 = 74.16 and standard deviation 8.61, slightly larger than 8.43 because it includes a component of the replication variance.

Fasting Blood Glucose
Part of a larger study that also examined glucose tolerance during pregnancy.
Here we have 53 subjects with 6 tests each at intervals of at least a year.
The response is glucose as mg/100mL.

    > anova(lm(FG ~ Subject,data=fg2))
    Analysis of Variance Table

    Response: FG
               Df Sum Sq Mean Sq F value    Pr(>F)
    Subject    52  10936 210.310  2.9235 9.717e-09 ***
    Residuals 265  19064  71.938
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Estimated within-Subject variance is 71.938, so the standard deviation is 8.48 mg/100mL.
Estimated between-Subject variance is (210.310 − 71.938)/6 = 23.062, sd = 4.80 mg/100mL.
The variance of the 53 means is 35.05, which is larger because it includes a component of the within-subject variance.
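The variance component arithmetic used for the estradiol and fasting glucose examples follows one pattern for a balanced one-way layout: the within-subject variance is the residual mean square, and the between-subject variance is the difference of the two mean squares divided by the number of measurements per subject. The sketch below wraps this in a helper; `var_components` is a name invented here for illustration, not a function from any package, and the inputs are the mean squares printed in the two tables.

```r
# Method-of-moments (ANOVA) variance components for a balanced one-way
# random-effects layout with n measurements per subject.
# `var_components` is a hypothetical helper, not part of any package.
var_components <- function(ms_between, ms_within, n) {
  between <- (ms_between - ms_within) / n
  c(within_var  = ms_within,
    between_var = between,
    within_sd   = sqrt(ms_within),
    between_sd  = sqrt(between))
}

# Estradiol: 5 subjects, 2 replicate assays each
estradiol <- var_components(ms_between = 148.329, ms_within = 6.043, n = 2)

# Fasting glucose: 53 subjects, 6 tests each
glucose <- var_components(ms_between = 210.310, ms_within = 71.938, n = 6)

# Variance of the glucose subject means: between-subject variance plus
# one sixth of the within-subject variance, which equals MS(Subject) / 6
glucose_var_of_means <- glucose[["between_var"]] + glucose[["within_var"]] / 6

round(estradiol, 3)
round(glucose, 3)
round(glucose_var_of_means, 2)   # 35.05
```

The last line reproduces the 35.05 quoted for the 53 means and shows where the extra variance comes from.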

Nested Random Effects Models
Cooperative trial with 6 laboratories, one analyte (7 in the full data set), 3 batches per lab (a month apart), and 2 replicates per batch.
Estimate the variance components due to labs, batches, and replicates.
Test for significance if possible.
Effects are lab, batch-in-lab, and error.

    > library(MASS)
    > data(coop)
    > names(coop)
    [1] "Lab"  "Spc"  "Bat"  "Conc"
    > summary(coop)
     Lab      Spc      Bat          Conc
     L1:42   S1:36   B1:84   Min.   :0.1100
     L2:42   S2:36   B2:84   1st Qu.:0.4675
     L3:42   S3:36   B3:84   Median :1.0600
     L4:42   S4:36           Mean   :1.9215
     L5:42   S5:36           3rd Qu.:1.7000
     L6:42   S6:36           Max.   :9.9000
             S7:36
    > coop2 <- coop[coop$Spc=="S1",]
    > summary(coop2)
     Lab     Spc      Bat          Conc
     L1:6   S1:36   B1:12   Min.   :0.2900
     L2:6   S2: 0   B2:12   1st Qu.:0.3575
     L3:6   S3: 0   B3:12   Median :0.4000
     L4:6   S4: 0           Mean   :0.5081
     L5:6   S5: 0           3rd Qu.:0.4600
     L6:6   S6: 0           Max.   :1.3000
            S7: 0

Analysis using lm or aov

    > anova(lm(Conc ~ Lab + Lab:Bat,data=coop2))
    Analysis of Variance Table

    Response: Conc
              Df  Sum Sq Mean Sq F value    Pr(>F)
    Lab        5 1.89021 0.37804 60.0333 1.354e-10 ***
    Lab:Bat   12 0.20440 0.01703  2.7049   0.02768 *
    Residuals 18 0.11335 0.00630

The test for batch-in-lab is correct, but the test for lab is not: the denominator should be the Lab:Bat mean square, so F(5,12) = 0.37804/0.01703 = 22.198 and p ≈ 1.1e-5, still significant.

    Effect     Variance   SD
    Residual   0.00630    0.0794
    Batch      0.00537    0.0733
    Lab        0.06017    0.2453
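The corrected test for lab and the variance components for the nested cooperative-trial model can be recomputed from the mean squares in the table. This is a minimal sketch in base R under the balanced design described earlier (3 batches per lab, 2 replicates per batch); the variable names are illustrative, and the inputs are the rounded mean squares, so results match the printed values only up to rounding.

```r
# Mean squares and degrees of freedom from the nested ANOVA table
ms_lab   <- 0.37804; df_lab   <- 5
ms_batch <- 0.01703; df_batch <- 12   # Lab:Bat, i.e. batch within lab
ms_resid <- 0.00630; df_resid <- 18

n_rep   <- 2   # replicates per batch
n_batch <- 3   # batches per lab

# Batch-in-lab is tested against the residual mean square
f_batch <- ms_batch / ms_resid
p_batch <- pf(f_batch, df_batch, df_resid, lower.tail = FALSE)

# Lab is tested against the batch-in-lab mean square, not the residual
f_lab <- ms_lab / ms_batch
p_lab <- pf(f_lab, df_lab, df_batch, lower.tail = FALSE)

# Variance components from the expected mean squares
var_resid <- ms_resid
var_batch <- (ms_batch - ms_resid) / n_rep
var_lab   <- (ms_lab - ms_batch) / (n_rep * n_batch)

data.frame(Effect   = c("Residual", "Batch", "Lab"),
           Variance = c(var_resid, var_batch, var_lab),
           SD       = sqrt(c(var_resid, var_batch, var_lab)))
```

The divisor for each component is the number of observations that share one level of that effect: 2 per batch, and 2 × 3 = 6 per lab.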

Analysis using lmer
R package lme4.
One term per effect, in this case nested.
In this case, no fixed effects.

    > lmer(Conc ~ 1+(1|Lab)+(1|Bat:Lab),data=coop2)

    > library(lme4)
    > lmer(Conc ~ 1+(1|Lab)+(1|Bat:Lab),data=coop2)
    Linear mixed model fit by REML ['lmerMod']
    Formula: Conc ~ 1 + (1 | Lab) + (1 | Bat:Lab)
       Data: coop2
    REML criterion at convergence: -42.0432
    Random effects:
     Groups   Name        Std.Dev.
     Bat:Lab  (Intercept) 0.07327
     Lab      (Intercept) 0.24529
     Residual             0.07936
    Number of obs: 36, groups:  Bat:Lab, 18; Lab, 6
    Fixed Effects:
    (Intercept)
         0.5081

Hypothesis Tests
When data are balanced, one can compute expected mean squares, and can often compute a valid F test.
In more complex cases, or when data are unbalanced, this is more difficult.
One requirement for certain hypothesis tests to be valid is that the null hypothesis value is not on the edge of the possible values.
For H0: α = 0, α could be either positive or negative.
For H0: σ² = 0, negative variances are not possible, so the null value lies on the edge.
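A small simulation may make the edge-of-parameter-space point concrete. The sketch below is an illustration under invented settings (6 groups, 3 replicates, standard normal noise, 1000 repetitions, fixed seed), not an analysis from the lecture. It generates balanced data with a true between-group variance of zero and records how often the ANOVA estimate (MS between − MS within)/n comes out negative.

```r
set.seed(42)

k <- 6   # number of groups (hypothetical)
n <- 3   # replicates per group (hypothetical)

# One simulated data set with NO group effect: the true between-group variance is 0
sim_between_var <- function() {
  y <- matrix(rnorm(k * n), nrow = k)     # rows are groups
  ms_between <- n * var(rowMeans(y))      # mean square between groups
  ms_within  <- mean(apply(y, 1, var))    # pooled within-group mean square
  (ms_between - ms_within) / n            # ANOVA estimate of between-group variance
}

est <- replicate(1000, sim_between_var())
prop_negative <- mean(est < 0)
prop_negative   # a substantial fraction of the estimates are negative
```

Because a variance cannot be negative, such estimates have to be truncated at zero, and the sampling distribution of the estimate under H0 is not symmetric around the null value the way it is for H0: α = 0.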

    Effect     Variance   SD
    ------------------------------------
    Residual   0.00630    0.0794
    Batch      0.00537    0.0733
    Lab        0.06017    0.2453

The variance among replicates a month apart (0.00630 + 0.00537 = 0.01167) is about twice that of those on the same day (0.00630), and the standard deviations are 0.1080 and 0.0794.
Relative to the average concentration, these are CVs of about 21% and 16% respectively.
The variance among values from different labs is about 0.00630 + 0.00537 + 0.06017 = 0.07184, with a standard deviation of 0.2680 and a CV of about 52%.
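The standard deviations and CVs quoted for the three levels of comparison can be reproduced by summing the relevant variance components and dividing by the mean concentration. This is a minimal sketch in base R; it assumes the CVs are taken relative to the overall mean of 0.5081 reported for the S1 subset, and the variable names are illustrative.

```r
# Variance components from the table
var_resid <- 0.00630
var_batch <- 0.00537
var_lab   <- 0.06017

# Assumed reference level for the CVs: mean concentration of the S1 subset
mean_conc <- 0.5081

# Standard deviations for the three kinds of comparison
sd_same_day     <- sqrt(var_resid)                        # same lab, same batch
sd_month_apart  <- sqrt(var_resid + var_batch)            # same lab, different batch
sd_between_labs <- sqrt(var_resid + var_batch + var_lab)  # different labs

# Coefficients of variation, in percent
cv <- 100 * c(same_day     = sd_same_day,
              month_apart  = sd_month_apart,
              between_labs = sd_between_labs) / mean_conc
round(cv, 1)
```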

More complex models
When data are balanced and the expected mean squares can be computed, they give a valid basis for testing and estimation.
Programs like lme and lmer in R and Proc Mixed in SAS can handle complex models.
But this is most likely a time when you may need to consult an expert.