
Analysis of Covariance

ANOVA is a class of statistics developed to evaluate controlled experiments. Experimental control, random selection of subjects, and random assignment of subjects to subgroups are devices to control, or hold constant, all of the other (UNMEASURED) influences on the dependent variable (Y_ij) so that the effects of the independent variable (X_ij) can be assessed. Without experimental control, random selection, and random assignment, other (non-random) differences besides the treatment variable enter the picture. Remember: inferential statistics only assess the likelihood that chance could have affected the sample results; they do not take non-random factors into account.

For example, without randomly selecting students and compelling them to take PPD 404, then randomly assigning them to an instructor, plus controlling their lives for an entire semester (e.g., forbidding them to work), differences that are not random creep in. To some extent, this problem of uncontrolled, non-random differences can be compensated for by introducing covariates as statistical controls. Covariates are continuous variables that hold constant non-random differences. For example, by asking students how many hours per week they were working, we could add this variable to our ANOVA model. Let's look briefly at the analysis of covariance with one of the classic examples in the statistical literature.

The data are from an experiment involving the use of two drugs for treating leprosy. Drug A and Drug B were experimental drugs; Drug C was a placebo. Subjects were children in the Philippines suffering from leprosy. Thirty children who were taken to a clinic were given Drug A, B, or C (the treatments) in order of their arrival, so each subgroup consisted of 10 children. The outcome measure, Y_ij, was a microscopic count of leprosy bacilli in samples taken from six body sites on each child at the end of the experiment. Data are in the following table.

———————————————————————————————————————————————————————
         Group A              Group B              Group C
   Y  Y²  X  X²  XY     Y  Y²  X  X²  XY     Y  Y²  X  X²  XY
   (the individual scores for the ten children in each
    group appear here in the original table)
———————————————————————————————————————————————————————
 n1 = 10              n2 = 10              n3 = 10
 ΣY = 237    ΣX = 322    N = 30
 Ȳ = 7.90    X̄ = 10.73
———————————————————————————————————————————————————————

First, let's perform a one-way analysis of variance on these data. As a shortcut for calculating a sum of squares, we will use the following algorithm:

SS = ΣY² − (ΣY)²/N

This is read: the sum of squares equals the sum of the squared Y-values, minus the square of the summed Y-values divided by N. This shortcut was developed to speed up calculations in the days before the widespread use of computer software. Three variations of it will be used.
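The shortcut can be checked with a few lines of Python; the six scores below are invented purely for illustration.

```python
# Check the computational shortcut  SS = ΣY² - (ΣY)²/N  against the
# definitional form  SS = Σ(Y - Ȳ)².  The data are made up for illustration.
ys = [4, 7, 2, 9, 5, 6]
n = len(ys)

ss_shortcut = sum(y * y for y in ys) - sum(ys) ** 2 / n

mean = sum(ys) / n
ss_definition = sum((y - mean) ** 2 for y in ys)

print(ss_shortcut, ss_definition)  # → 29.5 29.5
```

The two forms agree exactly, apart from floating-point rounding; the shortcut simply avoids computing the mean and the individual deviations.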

First, let's find the total sum of squares. This is the sum of all the squared Y-values, less the square of the summed Y-values divided by N:

SS_Total(Y) = 3161 − [(237)²/30]
SS_Total(Y) = 3161 − (56,169/30)
SS_Total(Y) = 3161 − 1872.3
SS_Total(Y) = 1288.7 ≈ 1289

Next, the between sum of squares can be obtained by applying the shortcut equation to the three group totals:

SS_Between(Y) = [(53)²/10 + (61)²/10 + (123)²/10] − [(237)²/30]
SS_Between(Y) = 2165.9 − 1872.3
SS_Between(Y) = 293.6 ≈ 294

Finally, the sum of squares within:

SS_Within(Y) = SS_Total(Y) − SS_Between(Y)
SS_Within(Y) = 1289 − 294
SS_Within(Y) = 995

Degrees of freedom are as before: N − 1 for total, J − 1 for between, and N − J for within.

These results can be assembled in the usual ANOVA summary table.

——————————————————————————————————————————————
Source              SS    df    Mean Square        F
——————————————————————————————————————————————
Between Groups     294     2         147.00    3.989
Within Groups      995    27          36.85
Total             1289    29
——————————————————————————————————————————————
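The whole table can be rebuilt from the summary quantities already given in the text (group totals of 53, 61, and 123; ΣY² = 3161; ten children per group). A sketch in Python:

```python
# Rebuild the one-way ANOVA table from the summary statistics in the
# text: group totals of Y, the overall ΣY², and n = 10 per group.
group_totals = {"A": 53, "B": 61, "C": 123}
sum_y_sq = 3161
n_per_group = 10

N = n_per_group * len(group_totals)          # 30 children in all
grand_total = sum(group_totals.values())     # ΣY = 237
correction = grand_total ** 2 / N            # (ΣY)²/N = 1872.3

ss_total = sum_y_sq - correction
ss_between = sum(t ** 2 / n_per_group for t in group_totals.values()) - correction
ss_within = ss_total - ss_between

df_between = len(group_totals) - 1           # J - 1 = 2
df_within = N - len(group_totals)            # N - J = 27
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(round(ss_between), round(ss_within), round(f_ratio, 2))
```

Carrying full precision gives F ≈ 3.98; the slightly larger 3.989 in the table comes from rounding the sums of squares to 294 and 995 before dividing.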

Because we do not want to "jump to conclusions" with the experimental drugs, an alpha level of 0.05 is too lenient. Let's set alpha at 0.01. This means that we have only one chance in 100 of wrongly rejecting the null hypothesis (of wrongly ruling out chance as the explanation for differences in the effectiveness of the drugs). With alpha = 0.01, the critical value of F with 2 and 27 degrees of freedom is 5.49 (Appendix 3, p. 545). Since F is only 3.989, we CANNOT reject the null hypothesis. There is no evidence that either Drug A or Drug B differs from the placebo, Drug C, nor is there evidence that Drugs A and B differ from one another.
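If an F table is not at hand, the tabled critical value can be checked in Python (assuming scipy is installed):

```python
# Check the tabled critical value F(.99; 2, 27) ≈ 5.49 with scipy
# rather than a printed appendix.
from scipy.stats import f

critical = f.ppf(0.99, dfn=2, dfd=27)   # upper 1% point of F(2, 27)
observed = 3.989

print(round(critical, 2))               # ≈ 5.49
print(observed < critical)              # True → fail to reject H0
```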

The researchers wanted to be sure that the children in each of the three groups were equally ill at the beginning of the experiment. Perhaps one of the drugs was effective, but, because the children who received it were more sick than those in the other groups, its effects were masked. A measure of illness at the START of the experiment was added to the statistical analysis as a control variable—as a covariate. This covariate was the count of bacilli at the same six body sites, but these counts were taken BEFORE any drugs were given. These data are in the table under columns headed X.

The general linear model that includes the influence of this covariate is written:

Y_ij = μ + α_j X_1ij + β X_2ij + ε_ij

where β is a linear coefficient expressing the influence of the covariate, X_2ij, on the dependent variable, Y_ij. If the covariate has no influence, β = 0.0. Therefore the β X_2ij products would all be 0.0, and this term would drop out, leaving

Y_ij = μ + α_j X_1ij + ε_ij
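To see the model in action, here is a minimal sketch that fits a constant, a treatment effect, and a covariate slope by ordinary least squares. The six observations are invented purely to show the mechanics and have nothing to do with the leprosy data.

```python
# A minimal sketch of the covariance model as an ordinary least-squares
# problem: an intercept, a dummy-coded treatment, and a covariate.
# All data here are invented for illustration.
import numpy as np

y = np.array([6.0, 5.0, 7.0, 12.0, 11.0, 13.0])    # outcome
g = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])       # treatment dummy
x = np.array([9.0, 8.0, 10.0, 13.0, 12.0, 14.0])   # covariate

# Columns: constant, treatment effect, covariate slope.
X = np.column_stack([np.ones_like(y), g, x])
(mu, treat, beta), *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(mu, 3), round(treat, 3), round(beta, 3))
```

These toy data were built so the fit is exact: the constant is −3, the treatment effect is 2, and the covariate slope is 1.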

To adjust for the presence of the covariate, we need to calculate sums of squares and degrees of freedom for the covariate, X_2, AS WELL AS for the covariance between X_2 and Y. We construct the covariance sums of squares from the cross-products, XY (the final column in each of the three table panels). Total, between, and within sums of squares for the covariate, X_2, are straightforward. For the total sum of squares for X:

SS_Total(X) = 4122 − [(322)²/30]
SS_Total(X) = 4122 − (103,684/30)
SS_Total(X) = 4122 − 3456.1
SS_Total(X) = 665.9 ≈ 666

For the between sum of squares:

SS_Between(X) = [(93)²/10 + (100)²/10 + (129)²/10] − [(322)²/30]
SS_Between(X) = 3529 − 3456
SS_Between(X) = 73

And for the within sum of squares:

SS_Within(X) = SS_Total(X) − SS_Between(X)
SS_Within(X) = 666 − 73
SS_Within(X) = 593

For the cross-products, we use the same approach; e.g., for the total sum of cross-products:

SS_Total(XY) = 3277 − [(322)(237)/30]
SS_Total(XY) = 3277 − (76,314/30)
SS_Total(XY) = 3277 − 2544
SS_Total(XY) = 733

For the cross-product between sum of squares:

SS_Between(XY) = [(53)(93)/10 + (61)(100)/10 + (123)(129)/10] − [(322)(237)/30]
SS_Between(XY) = 2689.6 − 2543.8
SS_Between(XY) = 145.8 ≈ 146

And for the cross-product within sum of squares:

SS_Within(XY) = SS_Total(XY) − SS_Between(XY)
SS_Within(XY) = 733 − 146
SS_Within(XY) = 587
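All of these covariate and cross-product quantities can be recomputed from the group totals and overall sums already given in the text:

```python
# Recompute the covariate (X) and cross-product (XY) sums of squares
# from the totals reported in the text.
n, N = 10, 30
x_totals = {"A": 93, "B": 100, "C": 129}     # ΣX per group
y_totals = {"A": 53, "B": 61, "C": 123}      # ΣY per group
sum_x_sq, sum_xy = 4122, 3277                # ΣX², ΣXY over all subjects

sum_x = sum(x_totals.values())               # 322
sum_y = sum(y_totals.values())               # 237

ss_total_x = sum_x_sq - sum_x ** 2 / N
ss_between_x = sum(t ** 2 / n for t in x_totals.values()) - sum_x ** 2 / N
ss_within_x = ss_total_x - ss_between_x

ss_total_xy = sum_xy - sum_x * sum_y / N
ss_between_xy = (sum(x_totals[g] * y_totals[g] for g in x_totals) / n
                 - sum_x * sum_y / N)
ss_within_xy = ss_total_xy - ss_between_xy

print(round(ss_within_x), round(ss_within_xy))   # 593 587
```

Note that the same shortcut formula does all the work; for cross-products, each squared term is simply replaced by a product of an X term and a Y term.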

Adjustments to the simple ANOVA results for the presence of the covariate should also look familiar. We need to adjust the within sum of squares, the between sum of squares, the within degrees of freedom, and the between degrees of freedom. Total sum of squares and total degrees of freedom are unchanged because (a) we are still trying to account for total variance in the dependent variable, Y ij, and (b) we have the same number of subjects, 30.

The within sum of squares adjustment is:

SS_Within(Adj) = SS_Within(Y) − [(SS_Within(XY))² / SS_Within(X)]
SS_Within(Adj) = 995 − [(587)²/593]
SS_Within(Adj) = 995 − (344,569/593)
SS_Within(Adj) = 995 − 581
SS_Within(Adj) = 414
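A quick check of this arithmetic, using the rounded values carried through the text:

```python
# Apply the covariance adjustment to the within sum of squares.
ss_within_y = 995
ss_within_x = 593
ss_within_xy = 587

ss_within_adj = ss_within_y - ss_within_xy ** 2 / ss_within_x
print(round(ss_within_adj))   # 414
```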

The adjustment for the between sum of squares is:

SS_Between(Adj) = SS_Total(Y) − SS_Within(Adj)
SS_Between(Adj) = 1289 − 414
SS_Between(Adj) = 875

We lose one within degree of freedom for the presence of the covariate. The adjustment is

df_Within(Adj) = N − J − K

where K is the number of covariates. Here, df_Within(Adj) = 30 − 3 − 1 = 26.

Because of the IDENTITY among degrees of freedom, the adjustment for the between degrees of freedom is simply

df_Between(Adj) = df_Total − df_Within(Adj)
df_Between(Adj) = 29 − 26 = 3

The analysis of covariance results are contained in the following table.

——————————————————————————————————————————————
Source              SS    df    Mean Square        F
——————————————————————————————————————————————
Between Groups     875     3         291.67    18.32
Within Groups      414    26          15.92
Total             1289    29
——————————————————————————————————————————————

The presence of the covariate in the general linear model makes quite a difference. The within sum of squares has been reduced from 995 to 414 with the loss of only one degree of freedom. The between sum of squares—reflecting the differences among drugs—has nearly tripled, from 294 to 875, with a gain of only one degree of freedom. As a result, the F-ratio is now 18.32.

With alpha at 0.01 and 3 and 26 degrees of freedom, the critical value is now 4.64 (Appendix 3, p. 545). Since 18.32 exceeds 4.64, we REJECT the null hypothesis that none of the three drugs differs from any other:

H0: μ1 = μ2 = μ3

We conclude that the effect of at least ONE of the drugs differed from that of the others when we control for the seriousness of illness at the start of the experiment. To determine which drug(s) differ, we need to perform a comparison test such as the Scheffé test.
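The adjusted F-ratio and the critical-value comparison can be verified in Python (scipy assumed available):

```python
# Form the adjusted F-ratio and compare it with the 0.01 critical value
# for 3 and 26 degrees of freedom.
from scipy.stats import f

ss_between_adj, df_between = 875, 3
ss_within_adj, df_within = 414, 26

f_ratio = (ss_between_adj / df_between) / (ss_within_adj / df_within)
critical = f.ppf(0.99, dfn=df_between, dfd=df_within)

print(round(f_ratio, 2), round(critical, 2))   # ≈ 18.32 and 4.64
print(f_ratio > critical)                      # True → reject H0
```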

First, we need to "adjust" the subgroup means for the effect of the covariate. To do this, we need to calculate the value of the within-groups regression coefficient, β. With this value we can calculate adjusted subgroup means. The algorithm is:

β = SS_Within(XY) / SS_Within(X)

From the sums of squares computed above,

β = 587/593 = 0.99

Each subgroup mean is then adjusted by subtracting β times the deviation of that subgroup's covariate mean from the grand mean of X:

Ȳ_adj = Ȳ_j − β(X̄_j − X̄)

The adjustments for the influence of the covariate are:

Ȳ_adj(A) = 5.30 − [0.99(9.30 − 10.73)] = 6.72
Ȳ_adj(B) = 6.10 − [0.99(10.00 − 10.73)] = 6.82
Ȳ_adj(C) = 12.30 − [0.99(12.90 − 10.73)] = 10.15

We can test the significance of the difference between pairs of these subgroup means using the post hoc comparison method described earlier. Visual inspection of these adjusted means shows that children receiving Drug A and Drug B had fewer leprosy bacilli at the end of the experiment than did those children receiving Drug C, the placebo, controlling for pre-treatment illness.
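These adjusted means can be verified from the group totals given earlier (ΣY of 53, 61, and 123, and ΣX of 93, 100, and 129, each over ten children); carrying full precision in the slope shifts the second decimal slightly relative to the hand-rounded figures.

```python
# Verify the adjusted subgroup means: each observed mean is shifted by
# b times the gap between its covariate mean and the grand mean of X.
b = 587 / 593                    # within-groups slope, ≈ 0.99
grand_x = 322 / 30               # grand mean of X, ≈ 10.73

means = {"A": (5.3, 9.3), "B": (6.1, 10.0), "C": (12.3, 12.9)}  # (Ybar, Xbar)
adjusted = {g: yb - b * (xb - grand_x) for g, (yb, xb) in means.items()}

for g in "ABC":
    print(g, round(adjusted[g], 2))
```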

Using SAS for Analysis of Variance and Covariance

LIBNAME perm 'a:\';
LIBNAME library 'a:\';
OPTIONS NODATE NONUMBER PS=66;

PROC GLM DATA=perm.drugtest;
  CLASS drug;
  MODEL posttest = drug;
  TITLE1 'One-Way Analysis of Variance Example';
  TITLE2;
  TITLE3 'PPD 404';
RUN;

PROC GLM DATA=perm.drugtest;
  CLASS drug;
  MODEL posttest = drug pretest;
  TITLE1 'Analysis of Covariance Example';
  TITLE2;
  TITLE3 'PPD 404';
RUN;

One-Way Analysis of Variance Example
PPD 404

General Linear Models Procedure
Class Level Information

Class    Levels    Values
DRUG          3    A B C

Number of observations in data set = 30

Dependent Variable: POSTTEST
                               Sum of        Mean
Source              DF        Squares      Square    F Value    Pr > F
Model                2            294      147.00       3.99
Error               27            995       36.85
Corrected Total     29           1289

R-Square    C.V.    Root MSE    Y Mean

Source    DF    Type I SS      Mean Square    F Value    Pr > F
DRUG       2          294           147.00       3.99

Source    DF    Type III SS    Mean Square    F Value    Pr > F
DRUG       2          294           147.00       3.99

Analysis of Covariance Example
PPD 404

General Linear Models Procedure
Class Level Information

Class    Levels    Values
DRUG          3    A B C

Number of observations in data set = 30

Dependent Variable: POSTTEST
                               Sum of        Mean
Source              DF        Squares      Square    F Value    Pr > F
Model                3            875      291.67      18.32
Error               26            414       15.92
Corrected Total     29           1289

R-Square    C.V.    Root MSE    Y Mean

Source     DF    Type I SS      Mean Square    F Value    Pr > F
DRUG        2          294
PRETEST     1          581

Source     DF    Type III SS    Mean Square    F Value    Pr > F
DRUG        2
PRETEST     1