Class 23: Thursday, Dec. 2nd
Today: One-way analysis of variance, multiple comparisons.
Next week: Two-way analysis of variance.
I will e-mail the final homework, Homework 9, to you this weekend.
All of the final project ideas look good. I have e-mailed some of you my comments already and will e-mail the rest of you my comments by tomorrow.
Schedule:
–Thurs., Dec. 9th – Final class
–Mon., Dec. 13th (5 pm) – Preliminary results from final project due
–Tues., Dec. 14th (5 pm) – Homework 9 due
–Tues., Dec. 21st (Noon) – Final project due
Individual vs. Familywise Error Rate
When several tests are considered simultaneously, they constitute a family of tests.
Individual Type I error rate: Probability for a single test that the null hypothesis will be rejected, assuming that the null hypothesis is true.
Familywise Type I error rate: Probability for a family of tests that at least one null hypothesis will be rejected, assuming that all of the null hypotheses are true.
When we consider a family of tests, we want to make the familywise error rate small, say 0.05, to protect against falsely rejecting a null hypothesis.
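To see how quickly the familywise rate grows when only the individual rate is controlled, here is a minimal Python sketch. It assumes the k tests are independent, which the slide does not state; the function name is made up for illustration.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false rejection among k tests, each run at
    individual level alpha, when every null hypothesis is true.

    ASSUMPTION (not from the slide): the k tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** k


# Individual level 0.05 with 1, 6 and 50 tests in the family
for k in (1, 6, 50):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under this independence assumption, six tests at individual level 0.05 give a familywise rate of roughly 0.26, and fifty tests give a rate above 0.9.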
Why Control the Familywise Error Rate?
Five children in a particular school got leukemia last year. Is that a coincidence, or does the clustering of cases suggest the presence of an environmental toxin that caused the disease?
Individual Type I error rate: Calculate the probability that five children at this particular school would all get leukemia this particular year. If this is small, say smaller than 0.05, become alarmed.
Familywise Type I error rate: Calculate the probability that five children in any school would develop the same severe disease in the same year. If this is small, say smaller than 0.05, become alarmed.
If we control only the individual Type I error rate, then we will locate many disease “clusters” that are not caused by an environmental toxin but are just coincidences.
Bonferroni Method
General method for doing multiple comparisons for any family of k tests.
Denote the familywise Type I error rate we want by p*, say p* = 0.05.
Compute the p-value for each individual test. Reject the null hypothesis for the ith test if its p-value is less than p*/k.
This guarantees that the familywise Type I error rate is at most p*.
Why Bonferroni works: If we do k tests and all null hypotheses are true, then using Bonferroni with p* = 0.05, we have probability 0.05/k of making a Type I error on each test, so we expect to make k*(0.05/k) = 0.05 errors in total. The probability of making at least one error can be no larger than the expected number of errors, so it is at most 0.05.
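The Bonferroni rule is simple enough to write out directly. The sketch below is illustrative only: the function name and the four p-values are hypothetical, not taken from any data set in these notes.

```python
def bonferroni_reject(p_values, p_star=0.05):
    """Return one True/False per test: True means reject that null hypothesis.

    A test is rejected only if its p-value is below p_star / k,
    where k is the number of tests in the family.
    """
    k = len(p_values)
    if k == 0:
        return []
    threshold = p_star / k
    return [p < threshold for p in p_values]


# Hypothetical p-values for a family of four tests (threshold = 0.05/4)
example_p_values = [0.001, 0.020, 0.030, 0.400]
print(bonferroni_reject(example_p_values))
```

With these made-up values only the first test is rejected. The second and third would have been rejected at the individual 0.05 level.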
Multiplicity
A news report says, “A 15-year study of more than 45,000 Swedish soldiers revealed that heavy users of marijuana were six times more likely than nonusers to develop schizophrenia.”
Were the investigators only looking for a difference in schizophrenia between heavy and non-heavy users of marijuana?
Key question: What is their family of tests?
If they were actually looking for a difference among 100 outcomes (e.g., blood pressure, lung cancer), Bonferroni should be used to control the familywise Type I error rate, i.e., only consider a difference significant if the p-value is less than 0.05/100 = 0.0005.
The best way to deal with the multiple comparisons problem is to design a study to search specifically for a pattern that was suggested by an exploratory data analysis. Then there is only one comparison.
Bonferroni Method on Milgram’s Data
If we want to test whether each of the four groups has a mean different from the mean of all four groups, we have four tests.
Bonferroni method: Check whether the p-value of each test is < 0.05/4 = 0.0125.
There is strong evidence that the remote group has a mean higher than the mean of the four groups and that the touch-proximity group has a mean lower than the mean of the four groups.
Multiple Comparison Simulation
In multiplecomp.JMP, 50 groups are compared, with a sample size of ten for each group. The observations for each group are simulated from a standard normal distribution. Thus, in fact, all 50 groups have the same mean.
Bonferroni approach to deciding which groups have means different from the average: Reject the null hypothesis that a group’s mean is the average mean of all groups only if the p-value for the t-test is less than 0.05/50 = 0.001.
Multiple Comparison Simulation
Iteration: 1 2 3 4 5
# of groups with p-value < 0.05:
# of groups with p-value < .0025:
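A simulation in the same spirit can be sketched in Python. This is not a reproduction of multiplecomp.JMP. As a simplifying assumption, each group is tested with a z-test against the known true mean of 0 (known standard deviation 1) instead of a t-test against the average of all groups, so the counts will not match JMP's. The function name and seed are arbitrary.

```python
import random
from statistics import NormalDist, mean


def simulate_once(n_groups=50, n_per_group=10, alpha=0.05, rng=None):
    """Simulate groups from a standard normal and count false rejections.

    ASSUMPTION: z-test of H0: mean = 0 with known SD 1, a simplification
    of the t-test against the overall average used on the slide.
    Returns (count with p < alpha, count with p < alpha / n_groups).
    """
    rng = rng or random.Random()
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_groups):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = mean(sample) * n_per_group ** 0.5  # sample mean / (1 / sqrt(n))
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    n_individual = sum(p < alpha for p in p_values)
    n_bonferroni = sum(p < alpha / n_groups for p in p_values)
    return n_individual, n_bonferroni


rng = random.Random(23)  # arbitrary seed so the run is repeatable
for iteration in range(1, 6):
    print(iteration, *simulate_once(rng=rng))
```

Every null hypothesis is true here, so any rejection is a Type I error. The Bonferroni count can never exceed the individual-level count.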
Pairwise Comparisons
We are interested not just in which groups have means that are different from the average mean, but also in pairwise comparisons between the groups.
For a pairwise comparison between group i and group j, we want to test the null hypothesis that group i and group j have the same mean versus the alternative that group i and group j have different means, i.e., H_0: μ_i = μ_j vs. H_A: μ_i ≠ μ_j.
Pairwise Comparisons Cont.
For Milgram’s obedience data, there are six pairwise comparisons: (1) Proximity vs. Remote; (2) Proximity vs. Touch-Proximity; (3) Proximity vs. Voice-Feedback; (4) Remote vs. Touch-Proximity; (5) Remote vs. Voice-Feedback; (6) Touch-Proximity vs. Voice-Feedback.
This is a multiple comparisons situation with a family of six tests. We want to control the familywise error rate at 0.05 rather than the individual Type I error rate.
We could use Bonferroni to do this, but there is a method called Tukey’s HSD (stands for “Honest Significant Differences”) that is specially designed to control the familywise Type I error rate for pairwise comparisons in ANOVA.
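The count of six follows from choosing 2 groups out of 4. A short Python sketch enumerates the pairs. The group labels come from the slide, and the last line shows the per-test cutoff Bonferroni would use if it were applied to this family instead of Tukey's HSD.

```python
from itertools import combinations

groups = ["Proximity", "Remote", "Touch-Proximity", "Voice-Feedback"]

# Every unordered pair of distinct groups: k * (k - 1) / 2 pairs for k groups
pairs = list(combinations(groups, 2))
for number, (first, second) in enumerate(pairs, start=1):
    print(f"({number}) {first} vs. {second}")

# Per-test cutoff if Bonferroni were applied to the family of pairwise tests
print(len(pairs), 0.05 / len(pairs))
```

The number of pairs grows quickly with the number of groups. For example, the five videotapes in the handicap example later in these notes give ten pairwise comparisons.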
Comparisons between groups that are shown in red are those for which the null hypothesis that the group means are the same is rejected using the Tukey HSD procedure, which controls the familywise Type I error rate at 0.05. A confidence interval for the difference in group means that adjusts for multiple comparisons is shown in the third and fourth lines.
More on Tukey’s HSD
Using Tukey’s HSD, the pairs for which there is strong evidence of a difference in means, adjusting for multiple comparisons, are: remote is higher than proximity, remote is higher than touch-proximity, and voice-feedback is higher than touch-proximity.
For confidence intervals for differences in the means of each pair of groups, if we use the usual confidence intervals, there is a good chance that at least one of the intervals will not contain the true difference in means between the groups.
When making a family of confidence intervals, we want confidence intervals that have a 95% chance of all intervals in the family containing their true values. The confidence intervals produced by the Tukey HSD procedure have this property.
95% confidence interval for difference in mean of remote group vs. mean of proximity group using Tukey’s HSD: (26.40, ).
95% confidence interval for difference in mean of remote group vs. mean of proximity group assuming that this is the only confidence interval being formed (family of one confidence interval): (42.34, ).
Tukey’s HSD confidence interval is wider because, for all of the CIs in a family to contain their true values simultaneously with 95% confidence, each CI must be wider than it would be if only one CI were being formed.
Tukey HSD in JMP
Use Analyze, Fit Model to do the analysis of variance by making the X variable the categorical variable denoting the group.
After Fit Model, click the red triangle next to the group variable (Condition in the Milgram study) and click LS Means Differences Tukey HSD.
Clicking LS Means Differences Student’s t gives CIs that do not adjust for multiple comparisons.
Assumptions in One-Way ANOVA
Assumptions needed for validity of one-way analysis of variance p-values and CIs:
–Linearity: automatically satisfied.
–Constant variance: Spread within each group is the same.
–Normality: Distribution within each group is normal.
–Independence: Sample consists of independent observations.
Rule of Thumb for Checking Constant Variance
Constant variance: Look at the standard deviations of the different groups by using Fit Y by X and clicking Means and Std Dev.
Check whether (highest group standard deviation/lowest group standard deviation)^2 is greater than 3. If it is greater than 3, then constant variance is not reasonable and a transformation should be considered. If it is less than 3, then constant variance is reasonable.
(Highest group standard deviation/lowest group standard deviation)^2 =( /63.640)^2=4.29. Thus, constant variance is not reasonable for Milgram’s data.
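The rule of thumb is a one-line calculation once the group standard deviations are known. In the sketch below the function names are made up for illustration. The first example uses the largest and smallest standard deviations quoted for the handicap example later in these notes (1.79 and 1.48), and the second uses hypothetical values.

```python
def sd_ratio_squared(group_sds):
    """(highest group SD / lowest group SD)^2 for a list of group SDs."""
    return (max(group_sds) / min(group_sds)) ** 2


def constant_variance_reasonable(group_sds, cutoff=3.0):
    """Rule of thumb: constant variance is reasonable if the ratio is not above the cutoff."""
    return sd_ratio_squared(group_sds) <= cutoff


# Largest and smallest SDs from the handicap example in these notes
print(round(sd_ratio_squared([1.79, 1.48]), 2), constant_variance_reasonable([1.79, 1.48]))

# Hypothetical SDs for three groups, chosen so that the rule fails
print(sd_ratio_squared([10.0, 12.0, 25.0]), constant_variance_reasonable([10.0, 12.0, 25.0]))
```

The slide does not say what to do when the ratio is exactly 3. Treating that case as reasonable is a choice made here.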
Transformations to Correct for Nonconstant Variance
If the standard deviation is highest for groups with high means, try transforming Y to log Y or √Y.
If the standard deviation is highest for groups with low means, try transforming Y to Y^2.
In Milgram’s data, the SD is particularly low for the group with the highest mean. Try transforming to Y^2.
To make the transformation, right click in a new column, click New Column, then right click again in the created column, click Formula, and enter the appropriate formula for the transformation.
Transformation of Milgram’s Data to Squared Voltage Level
Constant variance assumption is reasonable for voltage squared. Analysis of variance tests are approximately valid for the voltage squared data, so reanalyze the data using voltage squared.
Check of constant variance for transformed data: (Highest group standard deviation/lowest group standard deviation)^2 =
Analysis Using Voltage Squared
Strong evidence that the group mean voltage squared levels are not all the same.
Strong evidence that remote has a higher mean voltage squared level than proximity and touch-proximity, and that voice-feedback has a higher mean voltage squared level than touch-proximity, taking into account the multiple comparisons.
Rule of Thumb for Checking Normality in ANOVA
The normality assumption for ANOVA is that the distribution in each group is normal. This can be checked by looking at the boxplot, histogram and normal quantile plot for each group.
If there are more than 30 observations in each group, then the normality assumption is not important; ANOVA p-values and CIs will still be approximately valid even for nonnormal data.
If there are fewer than 30 observations per group, then we can check normality by clicking Analyze, Distribution and then putting the Y variable in the Y, Columns box and the categorical variable denoting the group in the By box. We can then create normal quantile plots for each group and check that, for each group, the points in the normal quantile plot are within the confidence bands.
If there is nonnormality, we can try a transformation such as log Y and see whether the transformed data are approximately normally distributed in each group.
One-Way Analysis of Variance: Steps in Analysis
1. Check assumptions (constant variance, normality, independence). If constant variance is violated, try transformations.
2. Use the effect test (commonly called the F-test) to test whether all group means are the same.
3. If the effect test finds that at least two group means differ, use Tukey’s HSD procedure to investigate which groups are different, taking into account the fact that multiple comparisons are being done.
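For step 2, the F statistic behind the effect test can be computed by hand as the between-group mean square divided by the within-group mean square. The sketch below uses only the Python standard library and hypothetical data. It stops at the F statistic, because turning it into a p-value needs the F distribution with (k - 1, n - k) degrees of freedom, which JMP or a statistics package supplies.

```python
from statistics import mean


def one_way_f_statistic(groups):
    """F statistic for one-way ANOVA: between-group MS / within-group MS.

    groups is a list of lists, one list of observations per group.
    """
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical data: three groups of three observations each
hypothetical_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_f_statistic(hypothetical_groups))
```

A large F statistic means the group means vary more than the within-group spread would explain, which is evidence against the hypothesis that all group means are the same.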
Example: Discrimination against the Handicapped
Study of how physical handicaps affect people’s perception of employment qualifications.
Researchers prepared five videotaped job interviews, using the same two male actors for each. The tapes differed only in that the applicant appeared with a different handicap in each: (i) wheelchair; (ii) on crutches; (iii) hearing impaired; (iv) one leg amputated; (v) no handicap.
Each tape was shown to 14 students from a U.S. university. Students rated the qualifications of the candidate on a 0 to 10 point scale based on the tape.
Questions of interest: Do subjects systematically evaluate qualifications differently according to the candidate’s handicap? If so, which handicaps produce different evaluations?
Checking Assumptions
Constant variance is reasonable: (Largest standard deviation/smallest standard deviation)^2 = (1.79/1.48)^2 = 1.46.
There are fewer than 30 observations per group, so we need to check normality. A check of the normal quantile plot for each group indicates that normality is OK.
Do all videotapes have the same mean?
Evidence that there is some difference in the means of the videotapes.
Test of H_0: Mean of all five videotapes is the same vs. H_A: At least two of the videotapes have different means has p-value
How do the videotapes compare?
The only conclusion we can make about how the videotapes compare, taking account of the fact that we are making multiple comparisons, is that Crutches has a higher mean than Hearing.