McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly
13-2 Chapter 13 Learning Objectives (LOs) LO 13.1: Provide a conceptual overview of ANOVA. LO 13.2: Conduct and evaluate hypothesis tests based on one-way ANOVA. LO 13.3: Use Tukey’s HSD method in order to determine which means differ. LO 13.4: Conduct and evaluate hypothesis tests based on two-way ANOVA with no interaction. LO 13.5: Conduct and evaluate hypothesis tests based on two-way ANOVA with interaction.
13-3 Chapter Case - How Much Does Using Public Transportation Save? Environmental and economic concerns have led to an upswing in the use of public transportation. Research analyst Sean Cox looked at study results from a Boston Globe article that claimed commuters there topped the nation in cost savings from public transportation. He wants to know if the average savings significantly differ among these cities.
13-4 How Much Does Using Public Transportation Save?
One-Way ANOVA Analysis of Variance (ANOVA) is used to determine if there are differences among three or more populations means (µ). One-way ANOVA compares population means based on one categorical variable (City in the chapter case) called treatment. We utilize a completely randomized design, comparing sample means computed for each treatment to test whether the population means differ. LO 13.1 Provide a conceptual overview of ANOVA.
13-6 ANOVA Assumptions The assumptions are extensions of those we used when comparing just two populations: 1. The populations are normally distributed. 2. The population standard deviations are unknown but assumed equal. 3. Samples are selected independently from each population. Here we compare a total of c (more than 2) populations, rather than just two. LO 13.1
Steps of HT 1. State hypotheses (H 0 and H A ) 2. Find test statistic (F in this case) 3. Find critical or p-value (We get them from the Excel output) 4. Conclusion and interpretation
13-8 The Hypothesis Test Step 1: The competing hypotheses for the one-way ANOVA: H 0 : µ 1 = µ 2 = µ 3 = µ 4 (different treatments have no different effect on population means) H A : Not all population means are equal (different treatments have different effect on population means) We have only this pair of hypotheses. LO 13.2 Conduct and evaluate hypothesis tests based on one-way ANOVA.
13-9 The ANOVA Concept The competing hypotheses are displayed graphically below: The left graph depicts the null hypothesis, where all sample means are drawn from the same distribution. On the right, the distributions, and population means, differ. LO 13.2 Sample means are close enough to one another population means are same.
13-10 The ANOVA Concept LO 13.2 The rationale is 1.If sample means are close enough to one another, we guess that population means are same. 2.Then how close the sample means should be? 3.They should be close enough so that the variance of the sample means is small enough. 4.How small the variance should be? It is determined by F critical value in the F distribution. 5.So we convert the variance of sample means into F test statistic. 6.F value the ratio of once variance to another variance. 7.So we find another variance and divide the variance of sample means by another variance. 8.This is the F test statistic.
13-11 The ANOVA Concept LO 13.2 The rationale is 9. The F test statistic should be small enough (smaller than the critical value) for us to conclude that population means are same. If F test statistic < F critical value, then we do not reject H 0. If F test statistic > F critical value, then we reject H Small enough F test statistic also means p –value (P(F >F test statistic) is greater than level of significance (α) If p-value > α, then we do not reject H 0. If p-value < α, then we reject H 0.
13-12 Chapter Case Steps 2 and 3: Generate Excel table This ANOVA table shows F test statistic and critical value, but we will use the p-value, which is 7.96/ It is practically 0. Please see Public Transportation in Excel Demonstration in D2L to see how to generate this table. The data is posted in Excel Data so you can practice it. ANOVA Source of VariationSSdfMSFP-valueF crit Between Groups E Within Groups Total
13-13 Chapter Case Step 4: F test statistic ( ) > F critical value (3.0984), so we reject H 0. Or p-value (0) < α (0.05). So we reject H 0. It means that there is significance difference among the average cost savings in 4 cities.
Multiple Comparison Methods When the one-way ANOVA finds significant differences between the population means (as in the chapter case), it is natural to ask which means differ. In this section we show two techniques for performing this follow-up analysis: Fisher’s Least Difference Method (Not very efficient) Tukey’s Honestly Significant Differences Method LO 13.3 Use confidence intervals and Tukey’s HSD method in order to determine which means differ.
13-15 The Tukey HSD Procedure Tukey’s method generates confidence intervals (CI) of the form: If CI does not include 0, there is difference between 2 population means (positive or negative) The q-statistic comes from the studentized range distribution with c and (n T – c) degrees of freedom. LO 13.3
13-16 The Tukey HSD Procedure Is there difference between Boston and Chicago? Boston sample mean = Chicago sample mean = With α = 0.05, c = 4, n T – c = 24 – 4 = 20, q = 3.96 MSE = 7209 n Boston = 5 n Chicago = 5 LO 13.3
13-17 The Tukey HSD Procedure confidence interval (CI) for the difference between Boston and Chicago = ( ) ± /2 (1/5+1/5) = 1892 ± = ( , ) CI does not include 0, so there is difference between Boston and Chicago (Boston > Chicago). You can repeat this procedure for all pairs of cities. LO 13.3
13-18 Excel Output of the Chapter Case Anova: Single Factor SUMMARY GroupsCountSumAverageVariance Boston NY SF Chicago ANOVA Source of VariationSSdfMSFP-valueF crit Between Groups E Within Groups Total MSE c-1 n T - c
13-19 Finding q
13-20 Homework 1. Problem 8 on page 394. To answer a, Download “Detergent” data from s:\jslee\QM3620\Data and run Excel to generate the ANOVA table. 2. Problem 14 on page 400. Answers are on pp in the appendix.
Two-Way ANOVA with No Interaction - Example 13.4 She interviews four workers in three fields and determines their salaries. This is one-way ANOVA Data are not reliable
13-22 Two-Way ANOVA Example If the education level of the 12 workers is considered, a different story emerges. The data is posted as “Two_Factor_Income” in Excel Data. It is clear that education also impacts wage. Se we collect more reliable data this way Educational ServicesFinancial ServicesMedical Services High School Bachelor’s Master’s Ph.D
13-23 The Randomized Block Design This type of two-way ANOVA is called a randomized block design. The term “block” refers to a matched set of observations across the treatments. In the salary example, the treatments are the three fields of employment. The blocks are the education levels. Until we account for them, we cannot capture the employment field effects.
13-24 ANOVA for the Salary Example LO 13.4 H 0 : µ 1 = µ 2 = µ 3 (Average salaries in 3 fields are same) H 1 : Not all the average salaries are same
13-25 ANOVA for the Salary Example LO 13.4 ANOVA Source of VariationSSdfMSFP-valueF crit Rows E Columns Error Total See “Two Factor Income” In Excel Demonstration in D2L. You can also find the date in Excel Data.
13-26 Interpreting the Results Steps 2, 3 & 4 F test statistic of column (56.58) > f critical value (4.76). Also p-value of column is about 0.03 (< 0.05). So, we reject H 0. After controlling for educational level, the average salaries differ. Salary also differs by educational level, as indicated by the very small p-value. LO 13.4
13-27 ANOVA for the Salary Example LO 13.4 Hypothesis are always: H 0 : Same means (There is no effect of the factor) H A : Different means (There is effect of the factor)
13-28 Homework Problem 30 on p Do parts a, b and c. The data, “House” is posted on S:\jslee\QM3620\Data. You should download this file and run Excel to generate ANOVA table. Answers are on p. 679 in the Appendix.
: Two-Way ANOVA with Interaction Now we will look at data categorized by two factors, but with two or more values observed in each “cell.” We want to know three effects: Effect of column factor Effect of row factor Interaction effect with three p-values (You can use F test statistics and F critical values, but using p-values is simpler) LO 13.5 Conduct and evaluate hypothesis tests based on two-way ANOVA with interaction.
13-30 What is Interaction? Interaction means that the effect of one factor depends on the level of the other factor. For example, perhaps education impacts salaries in the financial sector, but not in professional sports. The two categories, employment sector and education, interact differently depending on the sector. LO 13.5
13-31 Another Salary Example Each interior cell entry represents the average salary of 3 employees who fit the categories. By choosing Data > Data Analysis > ANOVA: Two Factor With Replication, we are able to obtain Excel output. LO 13.5
13-32 Two-way ANOVA with Interaction LO 13.5 Data and demonstration are in Excel Data and Excel Demonstration. The are called “Two_Factor_Income_with_Interaction.”
13-33 Interpreting the Output We could do all the steps of HT in a simple way. If p-value is smaller than level of significance (α), then we conclude the relevant factor has effect. LO 13.5
13-34 Interpreting the Output P-value for row factor (Education level) = 0 < α =0.5, reject H 0, meaning there is effect. P-value for column factor (Field) = 0 < α =0.5, reject H 0, meaning there is effect. P-value for Interaction (Education level * Field) = < α =0.5, reject H 0, meaning there is effect. LO 13.5
13-35 Homework Problem 38 on p Data called “Job Satisfaction” is posted on s:\jslee\qm3620\Data. Answers are on p. 679.