Published by Malcolm Jones. Modified over 8 years ago.
1
Chapter 15 Analysis of Variance
2
The article “Could Mean Platelet Volume be a Predictive Marker for Acute Myocardial Infarction?” (Medical Science Monitor, 2005) described an experiment in which four groups of patients seeking treatment for chest pain were compared with respect to mean platelet volume (MPV). The purpose of the study was to determine whether mean MPV differed among the four groups, in particular for the heart attack group. If so, MPV could be used as an indicator of heart attack risk.

When two or more populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor. In this experiment, the factor is the clinical diagnosis. The four groups were (1) noncardiac chest pain, (2) stable angina pectoris, (3) unstable angina pectoris, and (4) myocardial infarction (heart attack).

The researchers need to compare the means of the four groups to determine whether $\mu_1 = \mu_2 = \mu_3 = \mu_4$ or whether at least one of the means differs from the rest. To compare the means, they must use a procedure called single-factor analysis of variance, or ANOVA.
3
[Figure: Graph A and Graph B, each showing three samples with the means of Sample 1, Sample 2, and Sample 3 marked.]

Whether the null hypothesis (of equal means) should be rejected depends on how substantially the samples from the different populations or treatments differ from one another. Consider the following example.

In Graph A, the three samples have very different means and very little variability within each sample. This would lead us to doubt the claim that $\mu_1 = \mu_2 = \mu_3$.

In Graph B, the three samples have the same means as in Graph A. However, because of the large amount of variability within each sample and the overlap between the samples, it is plausible that the samples come from populations with equal means.

The phrase “analysis of variance” comes from the idea of analyzing variability in the data to see how much can be attributed to differences in the $\mu$'s and how much is due to variability within the individual populations.
4
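ANOVA Notation
$k$ = the number of populations or treatments being compared
The total number of observations: $N = n_1 + n_2 + \dots + n_k$, where $n_i$ is the size of sample $i$
Grand total: $T$ = the sum of all $N$ observations $= n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k$, where $\bar{x}_i$ is the mean of sample $i$
Grand mean: $\bar{\bar{x}} = T / N$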
5
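ANOVA Notation Continued...
A measure of differences among the sample means is the treatment sum of squares, denoted by SSTr and given by
$$\text{SSTr} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$
A measure of variation within the k samples, called the error sum of squares and denoted by SSE, is
$$\text{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2$$
where $s_i^2$ is the variance of sample $i$.
Each sum of squares has an associated df:
treatment df = $k - 1$
error df = $N - k$
The number of error degrees of freedom comes from adding the number of degrees of freedom associated with each of the sample variances: $(n_1 - 1) + (n_2 - 1) + \dots + (n_k - 1) = N - k$.
A mean square is a sum of squares divided by its df: $\text{MSTr} = \text{SSTr}/(k - 1)$ and $\text{MSE} = \text{SSE}/(N - k)$.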
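As a rough illustration of the notation on this slide, the sketch below computes SSTr, SSE, and their degrees of freedom from raw samples. The function name `sums_of_squares` and the three small samples are hypothetical and chosen only so the arithmetic is easy to follow by hand; SSE is computed as the sum of squared deviations within each sample, which equals $(n_i - 1)s_i^2$ summed over the samples.

```python
def sums_of_squares(samples):
    """Return (SSTr, SSE, treatment df, error df) for a list of samples."""
    k = len(samples)                                   # number of treatments
    n_total = sum(len(s) for s in samples)             # N
    grand_mean = sum(sum(s) for s in samples) / n_total  # T / N

    sstr = 0.0
    sse = 0.0
    for s in samples:
        mean = sum(s) / len(s)
        # Between-sample variation: n_i * (sample mean - grand mean)^2
        sstr += len(s) * (mean - grand_mean) ** 2
        # Within-sample variation: equals (n_i - 1) * s_i^2
        sse += sum((x - mean) ** 2 for x in s)
    return sstr, sse, k - 1, n_total - k


# Hypothetical data, not taken from the study discussed in the slides.
samples = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 10.0]]
sstr, sse, df_treatment, df_error = sums_of_squares(samples)
mstr = sstr / df_treatment   # mean square for treatments
mse = sse / df_error         # mean square for error
print(sstr, sse, mstr, mse)
```

For these made-up samples the sample means are 5, 8, and 7, so the between-sample and within-sample pieces can be checked by hand against the definitions.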
6
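The Single Factor ANOVA F Test
Null hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
Alternative hypothesis: $H_a$: at least two of the $\mu$'s are different
Test statistic: $F = \dfrac{\text{MSTr}}{\text{MSE}}$ with $df_1 = k - 1$ and $df_2 = N - k$
P-value: the area under the appropriate F curve to the right of the calculated F value
When $H_0$ is true, MSTr and MSE both estimate the common population variance, so F tends to be close to 1. When $H_0$ is false, MSTr tends to be larger than MSE, so F tends to exceed 1.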
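The P-value described on this slide is an upper-tail area under an F curve. In practice it would come from statistical software or a table; the sketch below is only one way to approximate it with the Python standard library. It relies on the identity $P(F > f) = I_x(df_2/2,\ df_1/2)$ with $x = df_2/(df_2 + df_1 f)$, where $I_x$ is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_upper_tail` and the `steps` parameter are assumptions made for this example.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=20000):
    """Approximate P(F > f_value) for an F(df1, df2) distribution.

    Assumes f_value > 0, df2 >= 2, and an even number of steps.
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)
    # log of the complete beta function B(a, b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    # Composite Simpson's rule on [0, x]
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (total * h / 3.0) / math.exp(log_beta)


# F = 5.87 with df1 = 3 and df2 = 136 are the values reported for the
# heart attack study in this presentation.
p_value = f_upper_tail(5.87, 3, 136)
print(p_value < 0.001)
```

With the study's values the approximation falls below .001, in line with the P-value reported in the presentation.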
7
The Single Factor ANOVA F Test Continued...
Assumptions:
1. Each of the k population or treatment response distributions is normal.
2. The k normal distributions have identical standard deviations ($\sigma_1 = \sigma_2 = \dots = \sigma_k$).
3. The observations in the sample from any particular one of the k populations or treatments are independent of one another.
4. When comparing population means, the k random samples are selected independently of each other. When comparing treatment means, treatments are assigned at random to subjects or objects.
Checking normality: If the sample sizes are large, individual boxplots or normal probability plots for each sample can be used to check for normality. If the sample sizes are small, a combined normal probability plot should be used: first find the deviations from the respective sample mean in each sample, then combine the deviations to create the normal probability plot.
Checking equal standard deviations: While there is a formal procedure to check for equal standard deviations, its use is not recommended because of its sensitivity to any departure from normality. The ANOVA F test can safely be used if the largest sample standard deviation is no more than twice the smallest sample standard deviation.
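The two informal checks described on this slide can be sketched in a few lines. The helper names `sd_rule_ok` and `combined_deviations` are hypothetical; the first applies the "largest no more than twice the smallest" rule to a list of sample standard deviations, and the second pools the deviations from each sample's own mean, which is the input a combined normal probability plot would use (the plotting itself is left out).

```python
def sd_rule_ok(sample_sds):
    """True if the largest sample SD is at most twice the smallest."""
    return max(sample_sds) <= 2 * min(sample_sds)


def combined_deviations(samples):
    """Pool each observation's deviation from its own sample mean."""
    deviations = []
    for s in samples:
        mean = sum(s) / len(s)
        deviations.extend(x - mean for x in s)
    return deviations


# Sample standard deviations reported for the four groups in the
# heart attack study discussed in this presentation.
print(sd_rule_ok([0.69, 0.74, 0.91, 1.07]))

# Hypothetical raw data, used only to show the pooled deviations.
print(combined_deviations([[1.0, 3.0], [10.0, 14.0]]))
```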
8
Heart Attack Risk Continued...
State the hypotheses.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least two of the $\mu$'s are different

Here are the summary statistics for the four groups:

Group  Description               Sample size  Sample mean  Sample standard deviation
1      Noncardiac chest pain     35           10.89        0.69
2      Stable angina pectoris    35           11.25        0.74
3      Unstable angina pectoris  35           11.37        0.91
4      Myocardial infarction     35           11.75        1.07

Verify assumptions.
The four boxplots are approximately symmetrical with no outliers, so the assumption of normality is plausible.
To verify the equality of the standard deviations, notice that the largest sample standard deviation (group 4, 1.07) is less than twice the smallest sample standard deviation (group 1, 0.69).
The subjects were randomly selected from groups of individuals who had been diagnosed with the four conditions.
9
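Heart Attack Risk Continued...
Calculate the sum of squares terms, using the summary statistics for the four groups (each sample has size 35):
$$\bar{\bar{x}} = \frac{35(10.89) + 35(11.25) + 35(11.37) + 35(11.75)}{140} = 11.315$$
$$\text{SSTr} = 35(10.89 - 11.315)^2 + 35(11.25 - 11.315)^2 + 35(11.37 - 11.315)^2 + 35(11.75 - 11.315)^2 \approx 13.199$$
$$\text{SSE} = 34(0.69)^2 + 34(0.74)^2 + 34(0.91)^2 + 34(1.07)^2 \approx 101.888$$
Calculate the F test statistic.
$$F = \frac{\text{MSTr}}{\text{MSE}} = \frac{13.199/3}{101.888/136} = \frac{4.400}{0.749} = 5.87$$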
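The two calculations named on this slide can be reproduced directly from the summary statistics, without the raw data. The sketch below uses the sample sizes, means, and standard deviations reported for the four groups; the variable names are chosen for this example only.

```python
# Summary statistics for the four groups, as reported in the presentation.
sizes = [35, 35, 35, 35]
means = [10.89, 11.25, 11.37, 11.75]
sds = [0.69, 0.74, 0.91, 1.07]

k = len(sizes)
n_total = sum(sizes)                                    # N = 140
grand_total = sum(n * m for n, m in zip(sizes, means))  # T
grand_mean = grand_total / n_total                      # T / N

# Treatment sum of squares: n_i * (sample mean - grand mean)^2, summed
sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
# Error sum of squares: (n_i - 1) * s_i^2, summed
sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))

mstr = sstr / (k - 1)        # treatment df = k - 1 = 3
mse = sse / (n_total - k)    # error df = N - k = 136
f_stat = mstr / mse

print(round(sstr, 3), round(sse, 3), round(f_stat, 2))
```

The results agree, up to rounding, with the treatment and error rows of the ANOVA table given for this data set.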
10
Heart Attack Risk Continued...
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least two of the $\mu$'s are different
Test statistic: $F = 5.87$ with $df_1 = 3$ and $df_2 = 136$
P-value < .001
$\alpha = .05$
Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence that mean MPV is not the same for all four patient populations.
11
Summarizing an ANOVA
ANOVA calculations are often summarized in a tabular format called an ANOVA table. To understand such a table, we need one more sum of squares term.
Total sum of squares, denoted by SSTo, is given by
$$\text{SSTo} = \sum_{\text{all } N \text{ obs.}} (x - \bar{\bar{x}})^2$$
with df = $N - 1$.
The relationship between the three sums of squares is
$$\text{SSTo} = \text{SSTr} + \text{SSE}$$
This is the fundamental identity for single-factor ANOVA.
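The fundamental identity can be checked numerically. The sketch below computes the three sums of squares separately for a small, hypothetical data set (the numbers are made up for illustration) and confirms that the total equals the treatment part plus the error part.

```python
# Hypothetical raw data for three treatments (not from the study).
samples = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 10.0]]

all_obs = [x for s in samples for x in s]
grand_mean = sum(all_obs) / len(all_obs)

# SSTo: squared deviations of every observation from the grand mean
ssto = sum((x - grand_mean) ** 2 for x in all_obs)

# SSTr: squared deviations of sample means from the grand mean, weighted by n_i
sstr = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)

# SSE: squared deviations of observations from their own sample mean
sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)

print(ssto, sstr + sse)
```

The same identity holds for the heart attack data in this presentation, where the reported values satisfy 13.199 + 101.888 = 115.087.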
12
The General Format for a Single-Factor ANOVA Table

Source of Variation  df     Sum of Squares  Mean Square          F             P-value
Treatment            k - 1  SSTr            MSTr = SSTr/(k - 1)  F = MSTr/MSE  (see note)
Error                N - k  SSE             MSE = SSE/(N - k)
Total                N - 1  SSTo

Note: When the analysis is done by statistical software, the P-value appears in the last column of the Treatment row.
13
Heart Attack Risk Continued...
This is the ANOVA table for this data set.

Source     df   SS       MS     F     P-value
Treatment  3    13.199   4.400  5.87  0.000
Error      136  101.888  0.749
Total      139  115.087

Now we know that at least two of the means are different, but which two? To answer the question in this study, we need to know whether the mean MPV for the heart attack group is the mean that is different.
14
Tukey-Kramer (T-K) Multiple Comparison Procedure
What do we do now that we know that at least two of the population or treatment means are different? How can we tell which of the means are different?
We need to use a multiple comparison procedure, which is a method of identifying differences between the $\mu$'s.
The T-K procedure is based on calculating confidence intervals for the difference between each possible pair of $\mu$'s. If an interval contains the value zero, then there is no significant difference between the two means involved. If, however, the interval does NOT contain the value zero, then the two means are significantly different.
15
Tukey-Kramer (T-K) Multiple Comparison Procedure
T-K intervals are based on probability distributions called Studentized range distributions.
When there are k populations or treatments being compared, the number of confidence intervals necessary is $\dfrac{k(k-1)}{2}$.
For $\mu_i - \mu_j$:
$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\text{MSE}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
where q is the relevant Studentized range critical value.
If the sample sizes are all the same (each equal to n), we can use
$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\text{MSE}}{n}}$$
The two means are judged to differ significantly if the interval does not contain 0.
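A minimal sketch of the two quantities on this slide: the number of pairwise intervals and the half-width (the "plus or minus" part) of a T-K interval. The function names are hypothetical. The critical value q must be looked up in a Studentized range table; the value 3.68 used below is an assumed table value for 95% confidence with k = 4 and error df = 120, and 0.749 is the MSE reported for the heart attack data.

```python
import math


def number_of_intervals(k):
    """Number of pairwise comparisons among k means: k(k - 1)/2."""
    return k * (k - 1) // 2


def tk_half_width(q, mse, n_i, n_j):
    """Half-width of the T-K interval for mu_i - mu_j."""
    return q * math.sqrt((mse / 2.0) * (1.0 / n_i + 1.0 / n_j))


q = 3.68      # assumed Studentized range critical value (from a table)
mse = 0.749   # MSE from the ANOVA table in this presentation

print(number_of_intervals(4))
# With equal sample sizes the general formula reduces to q * sqrt(MSE / n).
print(tk_half_width(q, mse, 35, 35), q * math.sqrt(mse / 35))
```

Writing the half-width in its general form keeps the same function usable when the sample sizes differ, which is the case the Kramer extension is meant to cover.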
16
Heart Attack Risk Revisited...
How many confidence intervals will we need to compute?
Number of confidence intervals to compute: $\dfrac{4(3)}{2} = 6$
Sample sizes are the same in each treatment (n = 35), so for $\mu_1 - \mu_2$, using the sample means 10.89 and 11.25 and MSE = 0.749:
$$(10.89 - 11.25) \pm 3.68\sqrt{\frac{0.749}{35}} = -0.36 \pm 0.538 = (-0.898,\ 0.178)$$
Here 3.68 is the critical value for 95% confidence when k = 4 and df = 120 (the closest df in the table to 136).
This interval contains 0, so there is not a significant difference in mean MPV between patients with noncardiac chest pain and patients with stable angina.
17
Heart Attack Risk Revisited...
The remaining confidence intervals are calculated in the same manner. They are:

For                95% Confidence Interval
$\mu_1 - \mu_2$    (-0.898, 0.178)
$\mu_1 - \mu_3$    (-1.018, 0.058)
$\mu_1 - \mu_4$    (-1.398, -0.322)
$\mu_2 - \mu_3$    (-0.658, 0.418)
$\mu_2 - \mu_4$    (-1.038, 0.038)
$\mu_3 - \mu_4$    (-0.918, 0.158)

The only interval that does not contain 0 is the one for the difference in mean MPV between patients with noncardiac chest pain and patients with heart attacks ($\mu_1 - \mu_4$).
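The six intervals can be reproduced with a short loop. The sketch below uses the sample means, the common sample size, and the MSE reported in this presentation; the critical value q = 3.68 is an assumed Studentized range table value (95% confidence, k = 4, error df = 120), chosen because it is consistent with the interval widths shown on the slide.

```python
import math
from itertools import combinations

# Sample means by group number, as reported in the presentation.
means = {1: 10.89, 2: 11.25, 3: 11.37, 4: 11.75}
n = 35        # common sample size
mse = 0.749   # from the ANOVA table
q = 3.68      # assumed Studentized range critical value

# Equal sample sizes, so the half-width is the same for every pair.
half_width = q * math.sqrt(mse / n)

intervals = {}
for i, j in combinations(sorted(means), 2):
    diff = means[i] - means[j]
    intervals[(i, j)] = (diff - half_width, diff + half_width)

# A pair differs significantly when its interval excludes 0.
significant = [pair for pair, (lo, hi) in intervals.items()
               if not (lo <= 0.0 <= hi)]

for pair, (lo, hi) in intervals.items():
    print(pair, round(lo, 3), round(hi, 3))
print(significant)
```

Under these inputs only the pair of groups 1 and 4 is flagged, which matches the conclusion stated on the slide.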
18
Summarizing the Results of the Tukey-Kramer Procedure
1. List the sample means in increasing order, identifying the corresponding population just above each $\bar{x}$.

Population    3            2            1            4            5
Sample mean   $\bar{x}_3$  $\bar{x}_2$  $\bar{x}_1$  $\bar{x}_4$  $\bar{x}_5$

2. Use the T-K intervals to determine the group of means that do not differ significantly from the first in the list. Draw a horizontal line extending from the smallest mean to the last mean in the group identified. For example, if the sample means for populations 3, 2, and 1 are not significantly different, then draw a line under them:

Population    3            2            1            4            5
Sample mean   $\bar{x}_3$  $\bar{x}_2$  $\bar{x}_1$  $\bar{x}_4$  $\bar{x}_5$
              -------------------------------------
19
Summarizing the Results of the Tukey-Kramer Procedure
3. Use the T-K intervals to determine the group of means that are not significantly different from the second smallest in the list. If this entire group of means is not already underscored, draw a horizontal line extending from the second smallest mean to the last mean in the new group. For example, if the sample mean for population 2 is not significantly different from those for populations 1 and 4, but is different from that for population 5, then draw a line under 2, 1, and 4:

Population    3            2            1            4            5
Sample mean   $\bar{x}_3$  $\bar{x}_2$  $\bar{x}_1$  $\bar{x}_4$  $\bar{x}_5$
              -------------------------------------
                           -------------------------------------

4. Continue considering the means in the order listed, adding new lines as needed.
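The underscoring steps can be sketched as a small routine. The function name `underscore_groups` is hypothetical, and the routine makes a simplifying assumption: for each mean it scans to the right and stops at the first mean that differs significantly, so every line covers a contiguous run of the ordered list. The example input uses the heart attack data from this presentation, where populations 1 and 4 are the only pair whose T-K interval excludes 0.

```python
def underscore_groups(ordered_labels, significant_pairs):
    """Return the runs of labels that would each get one underscore line.

    ordered_labels: population labels sorted by increasing sample mean.
    significant_pairs: pairs of labels whose T-K interval excludes 0.
    """
    sig = {frozenset(p) for p in significant_pairs}
    lines = []
    last_end = -1
    for start, label in enumerate(ordered_labels):
        end = start
        # Extend the run while the next mean is not significantly different.
        while (end + 1 < len(ordered_labels)
               and frozenset((label, ordered_labels[end + 1])) not in sig):
            end += 1
        # Draw a new line only if it reaches past every earlier line.
        if end > start and end > last_end:
            lines.append(ordered_labels[start:end + 1])
            last_end = end
    return lines


# Populations ordered by sample mean: 10.89, 11.25, 11.37, 11.75.
groups = underscore_groups([1, 2, 3, 4], [(1, 4)])
print(groups)
```

Each returned list corresponds to one horizontal line; means that share a line are not judged significantly different from one another.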
20
Heart Attack Risk Revisited...
Let’s summarize these T-K intervals.

For                95% Confidence Interval
$\mu_1 - \mu_2$    (-0.898, 0.178)
$\mu_1 - \mu_3$    (-1.018, 0.058)
$\mu_1 - \mu_4$    (-1.398, -0.322)
$\mu_2 - \mu_3$    (-0.658, 0.418)
$\mu_2 - \mu_4$    (-1.038, 0.038)
$\mu_3 - \mu_4$    (-0.918, 0.158)

Population:   1       2       3       4
Sample mean:  10.89   11.25   11.37   11.75
              ---------------------
                      ---------------------

Only the interval for $\mu_1 - \mu_4$ excludes 0, so one line runs under populations 1, 2, and 3 and a second line runs under populations 2, 3, and 4.

Should mean MPV be used as a predictor of heart attacks?
Based on these data, we have evidence that the mean MPV is not the same for the noncardiac chest pain group and the heart attack group. But since the difference in means is small compared to the variability among the individuals in each group, it would still be difficult to distinguish the two groups based on an individual MPV value. And we don’t have evidence that the mean for the heart attack group differs from the means for the two angina groups. So, MPV is probably not useful as a predictor of heart attack.