Published by Melvin Merritt. Modified over 6 years ago.
1
ANOVA: Multiple Comparisons & Analysis of Variance
2
One population, two populations, ...
Previously... Inference (confidence intervals, hypothesis testing) for the mean of one group/one population. Inference (confidence intervals, hypothesis testing) to compare the means of two groups/two populations. To review... briefly look at a few of those one- and two-mean inference procedures/situations.
3
Ho: μ = 1; Ha: μ > 1, where μ = mean heat conductivity transmitted per square meter of surface per degree Celsius difference on the two sides of the glass. Is there evidence that the conductivity of this type of glass is greater than 1? Carry out an appropriate test.
4
Does logging significantly change the mean number of species in a plot after 8 years? Give appropriate statistical evidence to support your conclusion. Assume both populations are Normally distributed. We want to test Ho: μU = μL (or μU – μL = 0) against Ha: μU ≠ μL (or μU – μL ≠ 0), where μU & μL are the mean number of species in unlogged and logged plots, respectively.
5
Is there good evidence that red wine drinkers’ mean polyphenol levels were different from white wine drinkers’ mean polyphenol levels? Assume both populations are approximately Normal. We want to test Ho: μR = μW (or μR – μW = 0) against Ha: μR ≠ μW (or μR – μW ≠ 0), where μR & μW are the mean percent change in polyphenols for men who drink red and white wine, respectively.
6
Nothing magical about the numbers one or two...
Sometimes there is a need to compare three, four, five, or more groups with each other. ANOVA (Analysis of Variance) is a method for doing that; it tests whether there is an association between a categorical variable that identifies different groups and a numerical variable. The phrase “Analysis of Variance” can be misleading; the procedure really compares means.
7
Go to Math 140 data... Copy and paste GPA & Favorite Social Media data into StatCrunch. “Clean up” data. Create side-by-side box plots (Graph, Box Plot, select ‘overall college GPA’ data, then group by ‘favorite social media’ data; check boxes ‘use fences,’ ‘draw boxes horizontally,’ & ‘markers: mean;’ Compute).
8
Go to Math 140 data... Is there a difference in mean GPA for Twitter users vs. Snapchat users vs. Instagram users vs. Facebook users vs. other? Is there a significant difference? OR is the difference in means just due to sampling variability? Compare means; compare spreads.
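To "compare means; compare spreads" without a box plot, one option is to compute each group's mean and standard deviation. A minimal sketch, assuming the data is a list of (favorite social media, GPA) pairs; the rows below are made up for illustration and are NOT the real Math 140 data:

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up (favorite social media, GPA) rows, NOT the real Math 140 data.
rows = [
    ("Twitter", 3.0), ("Twitter", 3.4),
    ("Snapchat", 2.8), ("Snapchat", 3.2), ("Snapchat", 3.6),
    ("Instagram", 2.5), ("Instagram", 3.5),
]

# Collect the GPA values for each category.
by_group = defaultdict(list)
for media, gpa in rows:
    by_group[media].append(gpa)

# Center (mean) and spread (sample standard deviation) for every group.
summary = {
    media: {"n": len(gpas), "mean": mean(gpas), "sd": stdev(gpas)}
    for media, gpas in by_group.items()
}

for media, stats in summary.items():
    print(f"{media:10s} n={stats['n']}  mean={stats['mean']:.2f}  sd={stats['sd']:.2f}")
```

Group means that sit close together relative to the spreads within the groups hint that any difference may just be sampling variability.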
9
More Math 140 data... Let’s look at ‘age in years’ & ‘transportation used to get to COC’ data. Is there a difference in mean ages among the seven different categories? Is the difference just due to sampling variability, or is there truly a difference in mean ages among the seven different types of transportation? Compare means; compare spreads.
10
ANOVA is for... Comparing 3 or 4 or 5 or more groups to each other
If we just have 2 groups to compare to each other, like comparing mean GPAs by gender, we can use a 2-sample t-test. Compare male mean GPA to female mean GPA; that’s what 2-sample t-tests are meant to do. ANOVA is for comparing multiple groups, like mean GPA among Twitter users vs. Facebook users vs. Instagram users vs. etc.
11
We could...
Ho: Twitter User GPA = Facebook User GPA; Ha: Twitter User GPA ≠ Facebook User GPA
Ho: Twitter User GPA = Instagram User GPA; Ha: Twitter User GPA ≠ Instagram User GPA
Ho: Twitter User GPA = Other User GPA; Ha: Twitter User GPA ≠ Other User GPA
Ho: Twitter User GPA = Snapchat User GPA; Ha: Twitter User GPA ≠ Snapchat User GPA
Ho: Snapchat User GPA = Facebook User GPA; Ha: Snapchat User GPA ≠ Facebook User GPA
Ho: Snapchat User GPA = Instagram User GPA; Ha: Snapchat User GPA ≠ Instagram User GPA
Ho: Snapchat User GPA = Other User GPA; Ha: Snapchat User GPA ≠ Other User GPA
Ho: Other User GPA = Facebook User GPA; Ha: Other User GPA ≠ Facebook User GPA
And two more I can’t fit on here... Instagram/Other & Instagram/Facebook... Ten different hypothesis tests... This is called multiple comparisons... Comparing multiple pairs of means.
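To see where "ten different hypothesis tests" comes from, we can list every pair that can be formed from the five groups. A minimal sketch using only Python's standard library; the group labels are taken from the slides:

```python
from itertools import combinations

# The five favorite-social-media groups named in the slides.
groups = ["Twitter", "Snapchat", "Instagram", "Facebook", "Other"]

# Every unordered pair of groups is one two-sample comparison.
pairs = list(combinations(groups, 2))

for a, b in pairs:
    print(f"Ho: {a} User GPA = {b} User GPA   Ha: {a} User GPA != {b} User GPA")

print(len(pairs), "pairwise tests")
```

With k groups there are k(k – 1)/2 pairs, so the number of tests grows quickly as groups are added.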
12
Remember α... Rejection zone (when conducting a hypothesis test); significance level; usually 5% (0.05). α is also the probability of committing a type I error (rejecting the null hypothesis when it really is true). The basic problem with multiple comparisons is that even though the probability of something going wrong (making an incorrect decision; committing an error) on one occasion (comparing 2 things only) is small (5%), if we keep repeating the experiment, eventually something will go wrong.
13
Big chances to make big mistakes...
Essentially, by doing multiple tests, we are creating more opportunities to mistakenly reject the null hypothesis. The more tests we do, the greater the probability that we will mistakenly reject the null hypothesis at least once. For our ten hypothesis tests, each with α = 0.05, the overall significance level (the probability that we conclude that at least one mean is different from another, when the truth is that all means are equal) is about 40% if we treat the tests as independent: 1 – 0.95^10 ≈ 0.40. Yikes!
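Where does "about 40%" come from? A minimal sketch, assuming the tests are independent (a simplification, since pairwise tests that share a group are not truly independent) and that every null hypothesis is true. The function name is just for illustration:

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false rejection across num_tests tests,
    assuming independent tests and that every null hypothesis is true."""
    # P(no mistake on one test) = 1 - alpha, so P(no mistake on all) = (1 - alpha)^m.
    return 1 - (1 - alpha) ** num_tests


# One test, three tests, and the ten pairwise tests from the slides.
for m in (1, 3, 10):
    print(m, "tests:", round(familywise_error_rate(0.05, m), 3))
```

For ten tests at α = 0.05 this gives roughly 0.40, which matches the figure on the slide.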
14
So, ANOVA to the rescue... ANOVA tests whether a categorical variable is associated with a numerical variable. This is the same as testing whether the mean value of a numerical variable is different in different groups. ANOVA looks at the variation within each group and between all groups, then creates a ratio comparing these numbers, called the F-statistic: F = (variation between groups) / (variation within groups).
15
ANOVA looks at variation within & between
Look at variation within each group Look at variation between all groups
18
Like all other procedures...
We have conditions that must be checked and met: (1) Random Sample & Independent Measurements; (2) Independent Groups; (3) Same Variance.
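One quick way to look at the "same variance" condition is to compare the sample standard deviations of the groups. The sketch below assumes a common rule of thumb (the largest SD should be no more than about twice the smallest); that cutoff is an assumption, not something stated in these slides, and the data is made up:

```python
from statistics import stdev


def sd_ratio(groups):
    """Largest sample SD divided by smallest sample SD across the groups."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)


# Made-up values for two hypothetical groups, for illustration only.
example = {"Group A": [1, 2, 3], "Group B": [2, 4, 6]}

ratio = sd_ratio(example)
# Assumed rule of thumb: a ratio of about 2 or less is usually treated as acceptable.
print("SD ratio:", ratio, "-> looks OK" if ratio <= 2 else "-> investigate further")
```

The random-sample and independence conditions cannot be checked by a calculation like this; they depend on how the data was collected.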
19
Let’s do an example of ANOVA... With our GPA & favorite social media data...
Test the hypothesis that COC students with different favorite social media have different mean GPAs (i.e., do students have higher (or lower) GPAs depending on their favorite social media?). Assume all conditions have been checked and met. Ho: μTwitter = μSnapchat = μOther = μInstagram = μFacebook. Ha: At least one population mean is different, where μ(favorite social media) is the true, unknown population mean GPA of all COC students whose favorite social media is the one indicated. StatCrunch: Stat, ANOVA, One Way, values in a single column, response: overall GPA, factors: social media, Compute.
20
Let’s do an example of ANOVA... With our GPA & favorite social media data...
Ho: μTwitter = μSnapchat = μOther = μInstagram = μFacebook. Ha: At least one population mean is different. Fail to reject Ho. With a p-value of almost 0.80 and an alpha level of 5%, we do not have sufficient evidence to conclude that at least one population mean is different (i.e., we do not have enough evidence to conclude that COC students with different favorite social media have different mean GPAs).
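The decision step is the same comparison we use in every hypothesis test: reject Ho only when the p-value is below α. A minimal sketch; the function name is hypothetical, and 0.80 stands in for the "almost 0.80" p-value reported above:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a given p-value and alpha."""
    return "Reject Ho" if p_value < alpha else "Fail to reject Ho"


# The GPA / favorite social media example: p-value of almost 0.80, alpha of 5%.
print(decide(0.80, alpha=0.05))
```

"Fail to reject" is not the same as proving the means are equal; it only says the data does not give enough evidence of a difference.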
21
Study hours by major... Three independent random samples of full-time college students were asked how many hours per week they studied outside of class. Their responses and their majors are shown in the Excel spreadsheet found on my website (data sets). Test the hypothesis that the mean number of hours studying varies by major. Assume all conditions have been checked and met. Ho: μMath = μSocial Science = μEnglish. Ha: At least one population mean is different, where μ(major) is the true, unknown population mean study time for all full-time college students within the given major. Reading the ANOVA table: SS: Sum of Squares (amount of variation). Total: sum of treatment (explained; the variation between groups) and error (unexplained; the variation within groups). MS: SS / df. F-Stat: ratio of MS between to MS within.
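The table entries (SS, df, MS, F-Stat) can be built by hand, which shows what StatCrunch is doing behind the scenes. A minimal sketch in plain Python; the study-hour numbers are made up for illustration and are NOT the values from the spreadsheet, and the function name is hypothetical:

```python
def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a dict of name -> values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    # Treatment (between): how far each group mean is from the grand mean.
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Error (within): how far each value is from its own group mean.
    ss_within = sum(
        (x - sum(values) / len(values)) ** 2
        for values in groups.values()
        for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between  # MS = SS / df
    ms_within = ss_within / df_within
    return {
        "SS between": ss_between,
        "SS within": ss_within,
        "SS total": ss_between + ss_within,
        "df between": df_between,
        "df within": df_within,
        "MS between": ms_between,
        "MS within": ms_within,
        "F": ms_between / ms_within,  # F-Stat = MS between / MS within
    }


# Made-up weekly study hours, NOT the real data set.
hours = {"Math": [4, 5, 6], "Social Science": [7, 8, 9], "English": [1, 2, 3]}

table = one_way_anova(hours)
for name, value in table.items():
    print(f"{name:11s} {value}")
```

A large F means the group means are spread out much more than the scatter within the groups would explain. Turning F into a p-value requires the F distribution, which StatCrunch handles for us.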
22
Study hours by major... Ho: μMath = μSocial Science = μEnglish. Ha: At least one population mean is different. Cut/paste data into StatCrunch. Stat, ANOVA, One Way, select columns: Math, Social Science, English; Compute.
23
Study hours by major... Ho: μMath = μSocial Science = μEnglish. Ha: At least one population mean is different. Reject Ho. With a p-value of almost 0 and an alpha level of 5%, we have sufficient evidence to conclude that at least one population mean is different (i.e., we have enough evidence to conclude that the mean number of hours studying varies by major for all full-time college students).
24
Your turn to choose some data...
With a partner, go to Math 140 data. Choose one numeric set of data and one categorical set of data that has more than 2 categories (so you wouldn’t choose gender, or a yes/no set of data; the categorical set must have at least 3 options in it). Choose the two sets of data that you believe may have a relationship. Example: I thought that favorite social media might be related to GPA, i.e., Twitter is your favorite? I think you will have a high GPA. Instagram is your favorite? I think you will have a lower GPA.
25
Your turn to choose some data...
Go through your data and ‘clean it up,’ as it might be ‘messy’; justify any/all ‘cleaning’ you do. State your null and alternative hypotheses; define parameters. Assume all conditions have been checked and met. Run the ANOVA procedure. Provide a complete interpretation. Questions? Refer to the examples worked in these notes.