Keller: Stats for Mgmt & Econ, 7th Ed Analysis of Variance

Presentation transcript:

Chapter 15: Analysis of Variance

Analysis of Variance… Analysis of variance is a technique that allows us to compare two or more populations of interval data. It is an extremely powerful and widely used procedure; it determines whether differences exist between population means; and it works by analyzing sample variance.

One-Way Analysis of Variance… Independent samples are drawn from k populations. Note: these populations are referred to as treatments. It is not a requirement that n1 = n2 = … = nk.

One-Way Analysis of Variance… New Terminology: x is the response variable, and its values are responses. xij refers to the ith observation in the jth sample, e.g. x35 is the third observation of the fifth sample. The grand mean, x̿, is the mean of all n = n1 + n2 + … + nk observations: x̿ = (1/n) Σ Σ xij, summing over all observations in all samples.

One-Way Analysis of Variance… More New Terminology: The unit that we measure is the experimental unit. The population classification criterion is called a factor. Each population is a factor level.

Example 15.1… An apple juice company has a new product featuring more convenience, similar or better quality, and a lower price when compared with existing juice products. Which factor should an advertising campaign focus on? Before going national, test markets are set up in three cities, each with its own campaign, and the data are recorded… Do differences in sales exist between the test markets?

Example 15.1… Terminology: weekly sales is the response variable; the actual sales figures are the responses in this example. xij refers to the ith observation in the jth sample, e.g. x42 is the fourth week’s sales in city #2: 717 pkgs., and x20,3 (comma added for clarity) is the last week of sales for city #3: 532 pkgs.

Example 15.1… Terminology: The unit that we measure is the experimental unit: the weeks in the three cities when we recorded sales. The population classification criterion is called a factor: the advertising strategy is the factor we’re interested in, and it is the only factor under consideration (hence the term “one-way” analysis of variance). Each population is a factor level: in this example there are three factor levels: convenience, quality, and price.

Example 15.1… IDENTIFY The null hypothesis in this case is H0: μ1 = μ2 = μ3, i.e. there are no differences between population means. Our alternative hypothesis becomes H1: at least two means differ. OK. Now we need some test statistics…

Test Statistics… Since the equality of the population means is of interest to us, a statistic that measures the proximity of the sample means to each other would also be of interest. Such a statistic exists, and is called the between-treatments variation. It is denoted SST, short for “sum of squares for treatments”. It is calculated as SST = Σ nj(x̄j − x̿)², summing across the k treatments, where x̿ is the grand mean. A large SST indicates large variation between sample means, which supports H1.

Test Statistics… SST gave us the between-treatments variation. A second statistic, SSE (sum of squares for error), measures the within-treatments variation. SSE is given by SSE = Σ Σ (xij − x̄j)², or equivalently SSE = Σ (nj − 1)sj². In the second formulation, it is easier to see that SSE provides a measure of the amount of variation we can expect from the random variable we’ve observed.
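The formulas above appeared as images in the original deck; here is a minimal Python sketch of the same computations, using a small hypothetical data set (three treatments with unequal sample sizes, which one-way ANOVA permits):

```python
import numpy as np

# Hypothetical data: one array of responses per treatment (sample sizes may differ).
samples = [np.array([12.0, 15.0, 14.0]),
           np.array([18.0, 20.0, 19.0, 21.0]),
           np.array([16.0, 14.0, 15.0])]

n = sum(len(s) for s in samples)             # total number of observations
grand_mean = np.concatenate(samples).mean()  # the grand mean

# Between-treatments variation: SST = sum over j of n_j * (xbar_j - grand_mean)^2
sst = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)

# Within-treatments variation: SSE = sum over j of (n_j - 1) * s_j^2
sse = sum((len(s) - 1) * s.var(ddof=1) for s in samples)
```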

Example 15.1… COMPUTE Since SST measures the proximity of the sample means to each other, if it were the case that x̄1 = x̄2 = x̄3, then SST = 0 and our null hypothesis, H0: μ1 = μ2 = μ3, would be supported. More generally, a “small” value of SST supports the null hypothesis. The question is, how small is “small enough”?

Example 15.1… COMPUTE The following sample statistics and grand mean were computed (values shown on the slide). Hence, the between-treatments variation, the sum of squares for treatments, is SST = 57,512.23. Is this “large enough” to indicate that the population means differ?

Example 15.1… COMPUTE We calculate the sample variances s1², s2², s3² (values shown on the slide) and, from these, calculate the within-treatments variation (sum of squares for error) as SSE = Σ (nj − 1)sj². We still need a couple more quantities in order to relate SST and SSE together in a meaningful way…

Mean Squares… The mean square for treatments is MST = SST/(k–1). The mean square for error is MSE = SSE/(n–k). And the test statistic F = MST/MSE is F-distributed with k–1 and n–k degrees of freedom. Aha! We must be close…

Example 15.1… COMPUTE We can calculate the mean square for treatments and the mean square for error, giving us our F-statistic of F = 3.23. Does F = 3.23 fall into a rejection region or not? How does it compare to a critical value of F? Note these required conditions: 1. The populations tested are normally distributed. 2. The variances of all the populations tested are equal.
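Continuing the sketch above, the mean squares and F-statistic follow directly; SciPy’s f_oneway gives the same F as a cross-check (the data are still hypothetical):

```python
from scipy import stats

k = len(samples)
mst = sst / (k - 1)   # mean square for treatments
mse = sse / (n - k)   # mean square for error
F = mst / mse         # F-distributed with k-1 and n-k degrees of freedom
p_value = stats.f.sf(F, k - 1, n - k)

# Cross-check with SciPy's built-in one-way ANOVA
F_check, p_check = stats.f_oneway(*samples)
```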

Example 15.1… INTERPRET Since the purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis, if SST is large, F will be large. Hence our rejection region is F > Fα,k–1,n–k. Our value for FCritical is F.05,2,57 ≈ 3.15.
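The critical value can be read off the F distribution directly; a one-line sketch, assuming Example 15.1’s dimensions (k = 3 treatments, n = 60 weekly observations, so 2 and 57 degrees of freedom):

```python
from scipy import stats

alpha = 0.05
F_critical = stats.f.ppf(1 - alpha, 2, 57)  # about 3.16; the slide's table value rounds to 3.15
```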

Example 15.1… INTERPRET Since F = 3.23 is greater than FCritical = 3.15, we reject the null hypothesis (H0: μ1 = μ2 = μ3) in favor of the alternative hypothesis (H1: at least two population means differ). That is, there is enough evidence to infer that mean weekly sales differ between the three cities. Stated another way: we are quite confident that the strategy used to advertise the product will produce different sales figures.

Summary of Techniques (so far)…

ANOVA Table… The results of analysis of variance are usually reported in an ANOVA table…

Source of Variation | d.f. | Sum of Squares | Mean Square | F Statistic
Treatments | k–1 | SST | MST = SST/(k–1) | F = MST/MSE
Error | n–k | SSE | MSE = SSE/(n–k) |
Total | n–1 | SS(Total) | |

Example 15.1… COMPUTE Using Excel (Tools, Data Analysis…, Anova: Single Factor), we produce the output shown on the slide. In it, “Between Groups” corresponds to SST and “Within Groups” to SSE; compare the p-value to 0.05…

Identifying Factors… Factors that identify the one-way analysis of variance: the problem objective (compare two or more populations), the data type (interval), and the experimental design (independent samples).

Analysis of Variance Experimental Designs Experimental design is one of the factors that determines which technique we use. In the previous example we compared three populations on the basis of one factor – advertising strategy. One-way analysis of variance is only one of many different experimental designs of the analysis of variance.

Analysis of Variance Experimental Designs A multifactor experiment is one where there are two or more factors that define the treatments. For example, if instead of just varying the advertising strategy for our new apple juice product we also varied the advertising medium (e.g. television or newspaper), then we have a two-factor analysis of variance situation. The first factor, advertising strategy, still has three levels (convenience, quality, and price) while the second factor, advertising medium, has two levels (TV or print).

Independent Samples and Blocks Similar to the ‘matched pairs experiment’, a randomized block design experiment reduces the variation within the samples, making it easier to detect differences between populations. The term block refers to a matched group of observations from each population. We can also perform a blocked experiment by using the same subject for each treatment in a “repeated measures” experiment.

Independent Samples and Blocks The randomized block experiment is also called the two-way analysis of variance, not to be confused with the two-factor analysis of variance. To illustrate where we’re headed, we’ll do the randomized block design first.

Randomized Block Analysis of Variance The purpose of designing a randomized block experiment is to reduce the within-treatments variation to more easily detect differences between the treatment means. In this design, we partition the total variation into three sources of variation: SS(Total) = SST + SSB + SSE where SSB, the sum of squares for blocks, measures the variation between the blocks.

Randomized Blocks… In addition to k treatments, we introduce notation for b blocks in our experimental design: x̄[T]j denotes the mean of the observations of the jth treatment, and x̄[B]i denotes the mean of the observations of the ith block.

Sum of Squares: Randomized Block… Squaring the ‘distance’ from the grand mean leads to the following set of formulae: SST = b Σ (x̄[T]j − x̿)², summing over the k treatments (its test statistic is F = MST/MSE); SSB = k Σ (x̄[B]i − x̿)², summing over the b blocks (its test statistic is F = MSB/MSE); and SSE = Σ Σ (xij − x̄[T]j − x̄[B]i + x̿)².

ANOVA Table… We can summarize this new information in an analysis of variance (ANOVA) table for the randomized block analysis of variance as follows…

Source of Variation | d.f. | Sum of Squares | Mean Square | F Statistic
Treatments | k–1 | SST | MST = SST/(k–1) | F = MST/MSE
Blocks | b–1 | SSB | MSB = SSB/(b–1) | F = MSB/MSE
Error | n–k–b+1 | SSE | MSE = SSE/(n–k–b+1) |
Total | n–1 | SS(Total) | |

Test Statistics & Rejection Regions… For treatments, the test statistic is F = MST/MSE with rejection region F > Fα,k–1,n–k–b+1; for blocks, it is F = MSB/MSE with rejection region F > Fα,b–1,n–k–b+1.

Example 15.2… IDENTIFY Are there differences in the effectiveness of four new cholesterol drugs? 25 groups of men were matched according to age & weight, and the results were recorded. The hypotheses to test in this case are H0: μ1 = μ2 = μ3 = μ4 versus H1: at least two means differ.

Example 15.2… IDENTIFY Each of the four drugs can be considered a treatment. Each group can be treated as a block, because the men are matched by age and weight. By setting up the experiment this way, we eliminate the variability in cholesterol reduction related to different combinations of age and weight. This helps detect differences in the mean cholesterol reduction attributed to the different drugs.

Example 15.2… The Data (table shown on the slide: one row per block of men, one column per drug treatment). There are b = 25 blocks and k = 4 treatments in this example.

Example 15.2… COMPUTE We obtain the output from Excel via Tools > Data Analysis > Anova: Two-Factor Without Replication (a.k.a. randomized block). In the output, “Rows” corresponds to the blocks (b–1 degrees of freedom, MSB) and “Columns” corresponds to the treatments (k–1 degrees of freedom, MST).
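As an alternative to Excel, the same randomized block analysis can be run with statsmodels. This is a sketch under stated assumptions: the data below are hypothetical (only 3 blocks rather than the example’s 25), and the column names are invented for illustration:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format data: one observation per (block, drug) pair.
df = pd.DataFrame({
    "drug":  ["A", "B", "C", "D"] * 3,
    "block": ["g1"] * 4 + ["g2"] * 4 + ["g3"] * 4,
    "reduction": [6.8, 8.7, 7.1, 9.0,
                  5.9, 7.9, 6.5, 8.2,
                  7.4, 9.1, 7.8, 9.6],
})

# Randomized block ANOVA: treatment and block as categorical factors, no interaction term.
model = ols("reduction ~ C(drug) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # rows: C(drug), C(block), Residual
```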

Example 15.2… INTERPRET The F-statistic to determine whether differences exist between the four drugs (treatments; columns) is 4.12, which is greater than its critical value, F.05,3,72 ≈ 2.73. Its p-value is .0094. Thus we reject H0 in favor of the research hypothesis: at least two means differ (i.e. there are differences between the treatments).

Example 15.2… INTERPRET The other F-statistic, 10.11 (p-value ≈ 0; also greater than its critical value, F.05,24,72 ≈ 1.67), indicates that there are differences between the groups of men (blocks; rows); that is, age & weight have an impact, but our experimental design accounts for that.

Identifying Factors… Factors that identify the randomized block analysis of variance: the problem objective (compare two or more populations), the data type (interval), and the experimental design (blocked samples).

Two-Factor Analysis of Variance… The original set-up for Example 15.1 examined one factor, namely the effect of the marketing strategy on sales: emphasis on convenience, emphasis on quality, or emphasis on price. Suppose we introduce a second factor, the effect of the selected medium on sales: advertise on television, or advertise in newspapers. To which factor(s), or the interaction of factors, can we attribute any differences in mean sales of apple juice?

More Terminology… A complete factorial experiment is an experiment in which the data for all possible combinations of the levels of the factors are gathered. This is also known as a two-way classification. The two factors are usually labeled A & B, with the number of levels of each factor denoted by a & b respectively. The number of observations for each combination is called a replicate, and is denoted by r. For our purposes, the number of replicates will be the same for each treatment, that is, the design is balanced.

Example 15.3… The Data (table shown on the slide). Factor A is the marketing strategy; factor B is the advertising medium. There are a = 3 levels of factor A and b = 2 levels of factor B, yielding 3 × 2 = 6 treatment combinations; each combination has r = 10 observations…

Possible Outcomes… Fig. 15.5 illustrates the case where there are differences between the levels of A, but no difference between the levels of B and no interaction between A & B. (Line chart of mean response vs. the three levels of factor A; the lines for levels 1 and 2 of factor B coincide.)

Possible Outcomes… Fig. 15.6 illustrates the case where there are differences between the levels of B, but no differences between the levels of A and no interaction between A & B. (Line chart: the lines for levels 1 and 2 of factor B are flat and parallel, at different heights.)

Possible Outcomes… Fig. 15.4 illustrates the case where there are differences between the levels of A and differences between the levels of B, but no interaction between A & B, i.e. the factors affect sales independently. (Line chart: the lines for levels 1 and 2 of factor B are parallel but not flat.)

Possible Outcomes… This figure shows the levels of A & B interacting. (Line chart: the lines for levels 1 and 2 of factor B are not parallel; they cross or diverge across the levels of factor A.)

ANOVA Table… Table 15.8

Source of Variation | d.f. | Sum of Squares | Mean Square | F Statistic
Factor A | a–1 | SS(A) | MS(A) = SS(A)/(a–1) | F = MS(A)/MSE
Factor B | b–1 | SS(B) | MS(B) = SS(B)/(b–1) | F = MS(B)/MSE
Interaction | (a–1)(b–1) | SS(AB) | MS(AB) = SS(AB)/[(a–1)(b–1)] | F = MS(AB)/MSE
Error | n–ab | SSE | MSE = SSE/(n–ab) |
Total | n–1 | SS(Total) | |

Two-Factor ANOVA… Test for differences between the levels of Factor A… H0: the means of the a levels of Factor A are equal; H1: at least two means differ. Test statistic: F = MS(A)/MSE. Example 15.3: Are there differences in mean sales caused by different marketing strategies? H0: μconvenience = μquality = μprice.

Two-Factor ANOVA… Test for differences between the levels of Factor B… H0: the means of the b levels of Factor B are equal; H1: at least two means differ. Test statistic: F = MS(B)/MSE. Example 15.3: Are there differences in mean sales caused by different advertising media? H0: μtelevision = μnewspaper.

Two-Factor ANOVA… Test for interaction between Factors A and B… H0: Factors A and B do not interact to affect the mean responses; H1: Factors A and B do interact to affect the mean responses. Test statistic: F = MS(AB)/MSE. Example 15.3: Are there differences in mean sales caused by interaction between marketing strategy and advertising medium?

Example 15.3… COMPUTE Using the data, we use Tools > Data Analysis… > Anova: Two-Factor With Replication and get the output shown on the slide. Its sources correspond to Factor B (media), Factor A (marketing strategy), the interaction of A & B, and error.
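The Python analogue, again as a sketch with invented data (the factor levels mirror Example 15.3, but the sales figures are randomly generated, not the textbook’s):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format data: a = 3 strategies, b = 2 media, r = 10 replicates each.
rng = np.random.default_rng(1)
rows = [(s, m, rng.normal(600, 30))
        for s in ["convenience", "quality", "price"]
        for m in ["television", "newspaper"]
        for _ in range(10)]
df = pd.DataFrame(rows, columns=["strategy", "medium", "sales"])

# Two-factor ANOVA with interaction: A, B, and A x B terms.
model = ols("sales ~ C(strategy) * C(medium)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # rows: strategy, medium, interaction, Residual
```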

Example 15.3… INTERPRET There is evidence at the 5% significance level to infer that differences in weekly sales exist between the different marketing strategies (Factor A).

Example 15.3… INTERPRET There is insufficient evidence at the 5% significance level to infer that differences in weekly sales exist between television and newspaper advertising (Factor B).

Example 15.3… INTERPRET There is not enough evidence to conclude that there is an interaction between marketing strategy and advertising medium that affects mean weekly sales (interaction of Factor A & Factor B).

See for yourself… Plot the data as a line chart to see the effects of Factors A & B: one line for television, one for newspaper.
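A short matplotlib sketch of that chart, reusing the hypothetical df from the statsmodels example above:

```python
import matplotlib.pyplot as plt

# One line per medium: mean sales at each marketing strategy.
means = df.groupby(["medium", "strategy"])["sales"].mean().unstack()
for medium, row in means.iterrows():
    plt.plot(row.index, row.values, marker="o", label=medium)
plt.xlabel("Marketing strategy (factor A)")
plt.ylabel("Mean weekly sales")
plt.legend(title="Medium (factor B)")
plt.show()  # roughly parallel lines suggest no interaction
```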

See for yourself… There are differences between the levels of factor A, no difference between the levels of factor B, and no interaction is apparent.

See for yourself… INTERPRET These results indicate that emphasizing quality produces the highest sales and that television and newspapers are equally effective.

Identifying Factors… Factors that identify the independent-samples two-factor analysis of variance: the problem objective (compare two or more populations), the data type (interval), and the experimental design (independent samples with two factors).

Summary of ANOVA… one-way analysis of variance, two-way analysis of variance (a.k.a. randomized blocks), and two-factor analysis of variance.

Multiple Comparisons… When we conclude from the one-way analysis of variance that at least two treatment means differ (i.e. we reject the null hypothesis H0: μ1 = μ2 = … = μk), we often need to know which treatment means are responsible for these differences. We will examine three statistical inference procedures that allow us to determine which population means differ: • Fisher’s least significant difference (LSD) method, • the Bonferroni adjustment, and • Tukey’s multiple comparison method.

Multiple Comparisons… Two means are considered different if the difference between the corresponding sample means is larger than a critical number. The general case for this is: if |x̄i − x̄j| > NCritical, then we conclude that μi and μj differ. The larger sample mean is then believed to be associated with a larger population mean.

Fisher’s Least Significant Difference… What is this critical number, NCritical? One measure is the least significant difference, given by LSD = tα/2 √[MSE(1/ni + 1/nj)], where tα/2 has ν = n − k degrees of freedom. LSD will be the same for all pairs of means if all k sample sizes are equal. If some sample sizes differ, LSD must be calculated for each combination.
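A small sketch of the LSD calculation in Python (the helper name fisher_lsd is ours, not a library API):

```python
import numpy as np
from scipy import stats

def fisher_lsd(mse, df_error, n_i, n_j, alpha=0.05):
    """Least significant difference for comparing treatments i and j."""
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)  # two-tailed critical t
    return t_crit * np.sqrt(mse * (1.0 / n_i + 1.0 / n_j))

# Declare means i and j different when |xbar_i - xbar_j| > fisher_lsd(mse, n - k, n_i, n_j).
```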

Back to Example 15.1… With k = 3 treatments (marketing strategy based on convenience, quality, or price), we will perform three comparisons based on the sample means: |x̄1 − x̄2|, |x̄1 − x̄3|, and |x̄2 − x̄3|. We compare each of these to the least significant difference calculated at 5% significance (value shown on the slide).

Example 15.1 • Fisher’s LSD We can compute the differences of means manually or use Excel: Tools > Data Analysis Plus > Multiple Comparisons. From the output we conclude that only the means for convenience and quality differ.

Bonferroni Adjustment to LSD Method… Fisher’s method may result in an increased probability of committing a Type I error. We can adjust Fisher’s LSD calculation by using the Bonferroni adjustment: where we used a significance level α (say .05) previously, we now use the adjusted value α/C, where C = k(k−1)/2 is the number of pairwise comparisons.

Example 15.1 • Bonferroni’s Adjustment Since we have k = 3 treatments, C = k(k–1)/2 = 3(2)/2 = 3; hence we set our new alpha value to α/C = .05/3 = .0167. Thus, instead of using t.05/2 in our LSD calculation, we use t.0167/2. Use Excel’s TINV() function to calculate t.0167/2 = 2.467.
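The same adjusted critical value in Python: scipy’s t.ppf plays the role of Excel’s TINV here (TINV takes the two-tailed probability, hence the 1 − α/2 conversion below). The error degrees of freedom, 57, assume Example 15.1’s n = 60 and k = 3:

```python
from scipy import stats

k = 3
C = k * (k - 1) // 2        # number of pairwise comparisons
alpha_adj = 0.05 / C        # Bonferroni-adjusted alpha, about .0167

df_error = 57               # n - k, assuming n = 60 observations
t_crit = stats.t.ppf(1 - alpha_adj / 2, df_error)  # about 2.47, matching TINV(.0167, 57)
```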

Example 15.1 • Bonferroni’s Adjustment Again, we can use the Multiple Comparisons tool to do the number crunching for us; it yields a similar result as before…

Tukey’s Multiple Comparison Method… As before, we are looking for a critical number to compare the differences of the sample means against. In this case it is ω (a lower-case omega, not a “w”): ω = qα(k, ν) √(MSE/ng), where qα(k, ν) is the critical value of the Studentized range with ν = n − k degrees of freedom (Table 7, Appendix B) and ng is the harmonic mean of the sample sizes.
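A sketch of ω and the equivalent all-pairs test, continuing the one-way sketch above (SciPy’s studentized_range requires SciPy 1.7+; pairwise_tukeyhsd is statsmodels’ built-in Tukey HSD; the data remain hypothetical):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

k = len(samples)
nu = n - k                                    # error degrees of freedom
n_g = k / sum(1.0 / len(s) for s in samples)  # harmonic mean of the sample sizes

q_crit = stats.studentized_range.ppf(0.95, k, nu)  # q_alpha(k, nu)
omega = q_crit * np.sqrt(mse / n_g)  # declare means different if |xbar_i - xbar_j| > omega

# Or run all pairwise comparisons at once with statsmodels:
values = np.concatenate(samples)
groups = np.concatenate([[j] * len(s) for j, s in enumerate(samples)])
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```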

Example 15.1 • Tukey’s Method… Again, we can select a pair of means, manually calculate the difference between the larger and smaller mean, compare this difference to our calculated value of ω, and so on for each pair of sample means. Or use Excel’s Multiple Comparisons tool, which yields a similar result as before: compare each difference to ω.

Which method to use? In Example 15.1, all three multiple comparison methods yielded the same results. This will not always be the case! Generally speaking: if you have identified two or three pairwise comparisons that you wish to make before conducting the analysis of variance, use the Bonferroni method. If you plan to compare all possible combinations, use Tukey’s comparison method.