Published by Damian Fox. Modified over 9 years ago.
1
Chapter 12: Analysis of Variance
2
Chapter Goals:
Test a hypothesis about several means.
Consider the analysis of variance technique (ANOVA).
Restrict the discussion to single-factor ANOVA.
3
12.1: Introduction to the Analysis of Variance Technique
Compare several means simultaneously.
The analysis of variance technique allows us to test the null hypothesis that all means are equal against the alternative hypothesis that at least one mean value is different, with a specified value of $\alpha$ (the level of significance).
4
Example: A study was conducted to determine whether the drying time for a certain paint is affected by the type of applicator used. The data in the table below represent the drying time (in minutes) for 3 different applicators when the paint was applied to standard wallboard. Is there any evidence to suggest the type of applicator has a significant effect on the paint drying time at the 0.05 level?
Note:
1. Each type of applicator is a level of the test factor.
2. The data values from repeated samplings are called replicates.
5
Sample Results:
6
Note:
1. The drying time is measured by the mean value. $\bar{x}_i$ is the mean drying time for level i, i = 1, 2, 3.
2. There is a certain amount of variation among the means.
3. Some variation can be expected, even if all three population means are equal.
4. Consider the question: “Is the variation among the sample means due to chance, or is it due to the effect of applicator on drying time?”
5. You might consider a dotplot of the data to see whether the graph suggests a difference among the levels.
7
Solution:
1. The Set-up:
a. Population parameter of concern: The mean at each level of the test factor. Here, the mean drying time for each applicator.
b. The null and the alternative hypothesis:
$H_0: \mu_1 = \mu_2 = \mu_3$ (The mean drying time is the same for each applicator.)
$H_a: \mu_i \neq \mu_j$ for some $i \neq j$ (Not all drying time means are equal.)
8
2. The Test Criteria:
a. Assumptions: The data were randomly collected and all observations are independent. The effects due to chance and untested factors are assumed to be normally distributed.
b. Test statistic: F test statistic (see below).
c. Level of significance: $\alpha = 0.05$
3. The Sample Evidence:
a. Sample information: Data listed in the given table.
b. Calculate the value of the test statistic: The F statistic is a ratio of two variances. Separate the variance in the entire data set into two parts.
9
Partition the Total Sum of Squares:
Consider the numerator of the fraction used to define the sample variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
The numerator of this fraction is called the sum of squares, or total sum of squares.
Notation:
11
Calculations:
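The slide's worked numbers did not survive in this transcript. As a rough sketch of the partition, the Python below splits the total sum of squares into a between-level part, SS(factor), and a within-level part, SS(error), using the definitional deviations-from-the-mean form. The data and the function name `partition_sums_of_squares` are hypothetical and are not the drying-time table from the example.

```python
# Hypothetical observations for three factor levels (not the slide's data).
samples = [
    [4, 6, 8],
    [7, 9, 11],
    [10, 12, 14, 16],
]


def partition_sums_of_squares(samples):
    """Return (SS(total), SS(factor), SS(error)) for a list of samples."""
    all_values = [x for level in samples for x in level]
    grand_mean = sum(all_values) / len(all_values)

    # Total variation: every observation against the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    ss_factor = 0.0
    ss_error = 0.0
    for level in samples:
        level_mean = sum(level) / len(level)
        # Between-level variation: level mean against the grand mean.
        ss_factor += len(level) * (level_mean - grand_mean) ** 2
        # Within-level variation: observations against their own level mean.
        ss_error += sum((x - level_mean) ** 2 for x in level)
    return ss_total, ss_factor, ss_error


ss_total, ss_factor, ss_error = partition_sums_of_squares(samples)
print(ss_total, ss_factor, ss_error)
```

For this made-up data the two parts add up to the total, which is the check SS(factor) + SS(error) = SS(total).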
12
An ANOVA table is often used to record the sums of squares and to organize the rest of the calculations. Format for the ANOVA Table:
13
Degrees of freedom, df, associated with each of the three sources of variation:
1. df(factor): one less than the number of levels (columns), c, for which the factor is tested.
df(factor) = $c - 1$
2. df(total): one less than the total number of observations, n.
df(total) = $n - 1$, where $n = k_1 + k_2 + k_3 + \cdots$
3. df(error): sum of the degrees of freedom for all levels tested. Each column has $k_i - 1$ degrees of freedom.
df(error) = $(k_1 - 1) + (k_2 - 1) + (k_3 - 1) + \cdots = n - c$
14
Calculations:
df(factor) = df(applicator) = $c - 1 = 3 - 1 = 2$
df(total) = $n - 1 = 19 - 1 = 18$
df(error) = $n - c = 19 - 3 = 16$
Note: The sums of squares and the degrees of freedom must check.
SS(factor) + SS(error) = SS(total)
df(factor) + df(error) = df(total)
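The degrees-of-freedom bookkeeping can be sketched in a few lines of Python. The transcript gives only n = 19 and c = 3, so the per-level replicate counts 6, 6, 7 used here are an assumption chosen to add up to 19, and `degrees_of_freedom` is a hypothetical helper name.

```python
def degrees_of_freedom(replicates_per_level):
    """Return (df_factor, df_error, df_total) from the replicate counts k_i."""
    c = len(replicates_per_level)   # number of levels (columns)
    n = sum(replicates_per_level)   # total number of observations
    df_factor = c - 1
    df_total = n - 1
    # Each column contributes k_i - 1 degrees of freedom.
    df_error = sum(k - 1 for k in replicates_per_level)
    # The degrees of freedom must check.
    assert df_factor + df_error == df_total
    return df_factor, df_error, df_total


# Assumed split of the 19 observations across the 3 applicators.
print(degrees_of_freedom([6, 6, 7]))  # (2, 16, 18)
```

Any other split of 19 observations over 3 levels gives the same three values, since df(error) always reduces to n - c.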
15
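Mean Square: The mean square for the factor being tested and for the error is obtained by dividing the sum-of-squares value by the corresponding number of degrees of freedom.
$MS(\text{factor}) = \frac{SS(\text{factor})}{df(\text{factor})}$, $MS(\text{error}) = \frac{SS(\text{error})}{df(\text{error})}$
Calculations: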
16
The Complete ANOVA Table:
The Test Statistic: $F^* = \frac{MS(\text{factor})}{MS(\text{error})}$
Numerator degrees of freedom = df(factor)
Denominator degrees of freedom = df(error)
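The completed table itself is missing from this transcript, so the following is only a sketch of how its rows fit together: each mean square is a sum of squares divided by its degrees of freedom, and F* is the ratio of the two mean squares. The degrees of freedom (2 and 16) come from the example, but the sums of squares (60 and 80) are invented for illustration, and `anova_table` is a hypothetical helper.

```python
def anova_table(ss_factor, ss_error, df_factor, df_error):
    """Build the rows of a single-factor ANOVA table and the F* statistic."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    rows = [
        ("Factor", df_factor, ss_factor, ms_factor),
        ("Error", df_error, ss_error, ms_error),
        ("Total", df_factor + df_error, ss_factor + ss_error, None),
    ]
    return rows, ms_factor / ms_error


# df values from the example; the SS values are made up.
rows, f_star = anova_table(ss_factor=60.0, ss_error=80.0, df_factor=2, df_error=16)

print(f"{'Source':<8}{'df':>4}{'SS':>8}{'MS':>8}")
for source, df, ss, ms in rows:
    ms_text = "" if ms is None else f"{ms:.2f}"
    print(f"{source:<8}{df:>4}{ss:>8.2f}{ms_text:>8}")
print("F* =", f_star)  # F* = 6.0
```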
17
4. The Probability Distribution (Classical Approach):
a. Critical value: F(2, 16, 0.05) = 3.63
b. F* is in the critical region.
4. The Probability Distribution (p-Value Approach):
a. The p-value: Table 9: 0.025 < P < 0.05; By computer: P = 0.033
b. The p-value is smaller than the level of significance, $\alpha$.
5. The Results:
a. Decision: Reject $H_0$.
b. Conclusion: There is evidence to suggest the three population means are not all the same. The type of applicator has a significant effect on the paint drying time.
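The critical value quoted on this slide can be checked without a table. When the numerator has exactly 2 degrees of freedom, the right-tail probability of the F distribution has the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$, where d is the denominator degrees of freedom. The sketch below relies on that special case only; it does not apply to other numerator degrees of freedom, and the function names are hypothetical.

```python
def f_tail_numdf2(f_value, df_error):
    """P(F > f_value) for an F distribution with 2 numerator df."""
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)


def f_critical_numdf2(alpha, df_error):
    """Value with right-tail area alpha, again for 2 numerator df only."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)


alpha = 0.05
critical = f_critical_numdf2(alpha, df_error=16)
print(round(critical, 2))  # 3.63

# Decision by the p-value approach, using the p-value reported on the slide.
p_value = 0.033
print("Reject H0" if p_value < alpha else "Fail to reject H0")  # Reject H0
```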
18
12.2: The Logic Behind ANOVA
Many experiments are conducted to determine the effect that different levels of some test factor have on a response variable.
Single-factor ANOVA: obtain independent random samples at each of several levels of the factor being tested.
Draw a conclusion concerning the effect that the levels of the test factor have on the response variable.
19
The Logic of the Analysis of Variance Technique:
1. In order to compare the means of the levels of the test factor, a measure of the variation between the levels (columns), the MS(factor), is compared to a measure of the variation within the levels, MS(error).
2. If the MS(factor) is significantly larger than the MS(error), then the means for each of the factor levels are not all the same. This implies the factor being tested has a significant effect on the response variable.
3. If the MS(factor) is not significantly larger than the MS(error), we cannot reject the null hypothesis that all means are equal.
20
Example: Do the box-and-whisker plots below show sufficient evidence to indicate a difference in the three population means?
21
Solution:
1. The box-and-whisker plots show the relationship among the three samples.
2. The plots suggest the three sample means are different from each other.
3. This suggests the population means are different.
4. There is relatively little within-sample variation, but a relatively large amount of between-sample variation.
22
Example: Do the box-and-whisker plots below show sufficient evidence to indicate a difference in the four population means?
23
Solution:
1. The box-and-whisker plots show the relationship among the four samples.
2. The plots suggest the four sample means are not different from each other.
3. There is relatively little between-sample variation, but a relatively large amount of within-sample variation. The data values within each sample cover a relatively wide range of values.
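The contrast between the two box-plot examples can be reproduced numerically. In the sketch below, two made-up data sets share the same sample means (10, 15, 20) but differ in how widely each sample is spread. The helper `f_statistic` is hypothetical, and the numbers are not taken from the plots, which are missing from this transcript.

```python
def f_statistic(samples):
    """F* = MS(factor) / MS(error) for a list of samples, one per level."""
    values = [x for s in samples for x in s]
    n, c = len(values), len(samples)
    grand_mean = sum(values) / n
    means = [sum(s) / len(s) for s in samples]
    ss_factor = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_error = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ss_factor / (c - 1)) / (ss_error / (n - c))


# Same sample means in both cases; only the within-sample spread differs.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
wide = [[0, 10, 20], [5, 15, 25], [10, 20, 30]]

print(f_statistic(tight))  # 75.0
print(f_statistic(wide))   # 0.75
```

With little within-sample variation the same differences among the means produce a very large F*, while with wide within-sample variation F* falls below 1 and gives no reason to reject the null hypothesis.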
24
Assumptions:
1. Goal: to investigate the effect of various levels of a factor on a response variable.
a. We would like to know which level is most advantageous.
b. We probably want to reject $H_0$ in favor of $H_a$.
c. A follow-up study might determine the “best” level of the factor.
2. a. The effects due to chance and due to untested factors are normally distributed.
b. The variance is constant throughout the experiment.
3. a. All observations are independent.
b. The data are gathered (or tests are conducted) in a randomized order to ensure independence.
25
12.3: Applications of Single-Factor ANOVA
Consider the notation used in ANOVA.
Each observation has two subscripts: the first indicates the column number (test factor level); the second identifies the replicate (row) number.
The column totals: $C_i$
The grand total (sum of all x’s): T
26
Notation used in ANOVA:
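The notation table is missing from this transcript. As a sketch of how the column totals $C_i$ and the grand total T are typically used, the Python below computes them for hypothetical data and then applies the standard computational (shortcut) formulas for the sums of squares. The formulas are the usual textbook ones, not recovered from the slide, and the variable names are illustrative.

```python
# Hypothetical data: x[c][k] is replicate k at factor level (column) c.
x = [
    [4, 6, 8],
    [7, 9, 11],
    [10, 12, 14, 16],
]

column_totals = [sum(column) for column in x]      # C_i
replicate_counts = [len(column) for column in x]   # k_i
grand_total = sum(column_totals)                   # T
n = sum(replicate_counts)                          # total observations
sum_x_squared = sum(v ** 2 for column in x for v in column)

correction = grand_total ** 2 / n                  # T^2 / n
between = sum(c_i ** 2 / k_i for c_i, k_i in zip(column_totals, replicate_counts))

ss_total = sum_x_squared - correction
ss_factor = between - correction
ss_error = sum_x_squared - between

print(column_totals, grand_total)  # [18, 27, 52] 97
print(ss_total, ss_factor, ss_error)
```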
27
Mathematical Model for Single-Factor ANOVA:
$x_{c,k} = \mu + F_c + \epsilon_{k(c)}$
1. $\mu$: mean value for all the data without respect to the test factor.
2. $F_c$: effect of factor (level) c on the response variable.
3. $\epsilon_{k(c)}$: experimental error that occurs among the k replicates in each of the c columns.
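One way to read the model is as a recipe for generating data: each observation is the overall mean, plus the effect of its factor level, plus a random error. The sketch below simulates that recipe with normally distributed errors. The parameter values, the seed, and the function name `simulate_single_factor` are all assumptions made for illustration.

```python
import random


def simulate_single_factor(mu, effects, k, sigma, rng):
    """Generate x[c][j] = mu + F_c + error, with k replicates per level."""
    return [
        [mu + f_c + rng.gauss(0.0, sigma) for _ in range(k)]
        for f_c in effects
    ]


rng = random.Random(12)  # fixed seed so the sketch is repeatable
data = simulate_single_factor(
    mu=10.0, effects=[-2.0, 0.0, 2.0], k=5, sigma=1.0, rng=rng
)
for level, column in enumerate(data, start=1):
    print(level, [round(v, 2) for v in column])
```

Setting `sigma` to 0 removes the error term, so every replicate in a column equals the overall mean plus that level's effect.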
28
Example: A study was conducted to determine the effectiveness of various drugs on post-operative pain. The purpose of the experiment was to decide if there is any difference in length of pain relief due to drug. Eighty patients with similar operations were selected at random and split into four groups. Each patient was given one of four drugs and checked regularly. The length of pain relief (in hours) was recorded for each patient. At the 0.05 level of significance, is there any evidence to reject the claim that the four drugs are equally effective?
Note:
1. The data are omitted here.
2. The ANOVA table is given in a later slide.
29
Solution:
1. The Set-up:
a. Population parameter of interest: The mean time of pain relief for each level of the factor (drug).
b. The null and alternative hypothesis:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: the means are not all equal.
2. The Hypothesis Test Criteria:
a. Assumptions: The patients were randomly assigned to a drug and their times are independent of each other. The effects due to chance and untested factors are assumed to be normally distributed.
b. Test statistic: F* with df(numerator) = df(factor) = 3 and df(denominator) = df(error) = $80 - 4 = 76$
c. Level of significance: $\alpha = 0.05$
30
3. The Sample Evidence:
a. Sample information: The ANOVA table:
b. Calculate the value of the test statistic: F* = 7.95
31
4. The Probability Distribution (Classical Approach):
a. Critical value: F(3, 76, 0.05) $\approx$ 2.72
b. F* is in the critical region.
4. The Probability Distribution (p-Value Approach):
a. The p-value: P = P(F* > 7.95, with $df_n = 3$, $df_d = 76$) < 0.01. By computer: P $\approx$ 0.0001
b. The p-value is smaller than the level of significance, $\alpha$.
5. The Results:
a. Decision: Reject $H_0$.
b. Conclusion: There is evidence to suggest that not all drugs have the same effect on length of pain relief.
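The "by computer" figures can be approximated with the standard library alone. The sketch below uses the fact that if F has an F distribution with $d_1$ and $d_2$ degrees of freedom, then $d_2 / (d_2 + d_1 F)$ has a Beta($d_2/2$, $d_1/2$) distribution, and integrates that beta density with Simpson's rule. It is an illustrative approximation under the assumption that the numerator df is at least 2, not the method used to produce the slide's values, and a statistics library would normally be used instead. The function names are hypothetical.

```python
import math


def f_tail(f_value, df_num, df_den, steps=2000):
    """Approximate P(F > f_value); assumes df_num >= 2 and f_value > 0."""
    a, b = df_den / 2, df_num / 2
    upper = df_den / (df_den + df_num * f_value)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def beta_pdf(t):
        if t <= 0.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    # Simpson's rule on [0, upper]; steps must be even.
    h = upper / steps
    total = beta_pdf(0.0) + beta_pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h)
    return total * h / 3


def f_critical(alpha, df_num, df_den, low=0.5, high=20.0):
    """Bisection for the value whose right-tail area is alpha."""
    for _ in range(50):
        mid = (low + high) / 2
        if f_tail(mid, df_num, df_den) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2


p_value = f_tail(7.95, 3, 76)
critical = f_critical(0.05, 3, 76)
print(f"p-value = {p_value:.5f}, critical value = {critical:.2f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

The approximations should land close to the slide's critical value of about 2.72 and p-value of about 0.0001, and F* = 7.95 lies well inside the critical region.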