1 Analysis of Variance Chapter 14

2 Introduction Analysis of variance helps compare two or more populations of quantitative data. Specifically, we are interested in the relationships among the population means (whether or not they are equal). The procedure works by analyzing the sample variances.

One-Way Analysis of Variance The analysis of variance is a procedure that tests whether differences exist among two or more population means. To do this, the technique analyzes the sample variances.

4 One-Way Analysis of Variance: Example 1 –An apple juice manufacturer is planning to develop a new product – a liquid concentrate. –The marketing manager has to decide how to market the new product. –Three strategies are considered: Emphasize convenience of using the product. Emphasize the quality of the product. Emphasize the product's low price.

5 One-Way Analysis of Variance: Example 1 – continued –An experiment was conducted as follows: In three cities an advertising campaign was launched. In each city only one of the three characteristics (convenience, quality, and price) was emphasized. The weekly sales were recorded for twenty weeks following the beginning of the campaigns.

6 Weekly sales – see file Xm1.xls

7 One-Way Analysis of Variance: Solution –The data are quantitative. –Our problem objective is to compare sales in three cities. –We hypothesize on the relationships among the three mean weekly sales:

8 Defining the Hypotheses – Solution H0: μ1 = μ2 = μ3 H1: At least two means differ To build the statistic needed to test the hypotheses, use the following notation:

9 Notation Independent samples are drawn from k populations (treatments). The sample drawn from population j (j = 1, 2, …, k) has size nj, observations x1j, x2j, …, xnj,j, and sample mean x̄j. X is the "response variable", and its values are called "responses".

10 Terminology In the context of this problem… Response variable – weekly sales. Responses – actual sale values. Experimental unit – weeks in the three cities when we record sales figures. Factor – the criterion by which we classify the populations (the treatments); in this problem the factor is the marketing strategy. Factor levels – the population (treatment) names; in this problem the factor levels are the three marketing strategies.

11 The rationale of the test statistic Two types of variability are employed when testing for the equality of the population means.

12 Graphical demonstration: Employing two types of variability

Graphical demonstration – continued (two panels, Treatments 1–3): A small variability within the samples makes it easier to draw a conclusion about the population means. The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.

14 The rationale behind the test statistic – I If the null hypothesis is true, we would expect all the sample means to be close to one another (and, as a result, close to the grand mean). If the alternative hypothesis is true, at least some of the sample means would reside away from one another. Thus, we measure variability among the sample means.

15 Variability among sample means The variability among the sample means is measured as the sum of squared distances between each mean and the grand mean. This sum is called the Sum of Squares for Treatments – SST. In our example, treatments are represented by the different advertising strategies.

16 Sum of squares for treatments (SST) There are k treatments; sample j has size nj and mean x̄j, and x̄ denotes the grand mean: SST = Σj nj(x̄j – x̄)² Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, a large SST indicates large variation among the sample means, which supports H1.
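As a concrete check, the SST formula can be computed directly from the group sizes and sample means. A minimal Python sketch; the three means (577.55, 653.0, 608.65) are the weekly-sales means used in this example.

```python
# Sum of squares for treatments: SST = sum over j of n_j * (xbar_j - grand mean)^2.

def sst(sizes, means):
    n = sum(sizes)
    # Grand mean: weighted average of the sample means.
    grand = sum(nj * mj for nj, mj in zip(sizes, means)) / n
    return sum(nj * (mj - grand) ** 2 for nj, mj in zip(sizes, means))

# Sample means of weekly sales in the three cities (from the example).
print(round(sst([20, 20, 20], [577.55, 653.0, 608.65]), 2))  # → 57512.23
```

When all sample means are equal, every distance to the grand mean is zero and SST = 0, matching the intuition behind the note above.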

17 Sum of squares for treatments (SST) – Solution continued The grand mean is calculated by x̄ = (n1x̄1 + n2x̄2 + n3x̄3)/n = 613.07 Calculate SST: SST = 20(577.55 – 613.07)² + 20(653.0 – 613.07)² + 20(608.65 – 613.07)² = 57,512.23

18 Sum of squares for treatments (SST) Is SST = 57,512.23 large enough to favor H1? See next.

19 The rationale behind the test statistic – II Large variability within the samples weakens the "ability" of the sample means to represent their corresponding population means. Therefore, even though sample means may markedly differ from one another, a large SST must be judged relative to the "within samples variability".

20 Within samples variability The variability within samples is measured by adding all the squared distances between observations and their sample means. This sum is called the Sum of Squares for Error – SSE. In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all three cities).

21 Sum of squares for errors (SSE) – Solution continued Calculate SSE: SSE = (n1 – 1)s1² + (n2 – 1)s2² + (n3 – 1)s3² = (20 – 1)s1² + (20 – 1)s2² + (20 – 1)s3² = 506,967.88 (the sample variances are computed from the data file).
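SSE can be computed either from the sample variances, as above, or directly as the sum of squared deviations of each observation from its own sample mean. A minimal Python sketch with toy data (not the actual sales figures, which live in the data file):

```python
from statistics import fmean

def sse(groups):
    # Sum, over all groups, of the squared distances between each
    # observation and its own group (sample) mean.
    total = 0.0
    for g in groups:
        m = fmean(g)
        total += sum((x - m) ** 2 for x in g)
    return total

# Toy data: two small samples.
print(sse([[1.0, 2.0, 3.0], [4.0, 6.0]]))  # → 4.0
```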

22 Sum of squares for errors (SSE) Is SST = 57,512.23 large enough relative to SSE = 506,967.88 to argue that the means ARE different? Note: If SST is small relative to SSE, we can't infer that treatments are the cause of different average performance.

23 The mean sums of squares To perform the test we need to calculate the mean sums of squares as follows: Mean Square for Treatments: MST = SST/(k – 1) Mean Square for Error: MSE = SSE/(n – k)

24 Calculation of the test statistic F = MST/MSE with the following degrees of freedom: v1 = k – 1 and v2 = n – k We assume: 1. The populations tested are normally distributed. 2. The variances of all the populations tested are equal. For honors class: testing equal variances; testing normality.
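Putting the pieces together, the F statistic follows mechanically from SST, SSE, k, and n. A Python sketch using this example's figures (SSE as printed on the earlier slide):

```python
def f_statistic(sst, sse, k, n):
    mst = sst / (k - 1)   # mean square for treatments, df = k - 1
    mse = sse / (n - k)   # mean square for error, df = n - k
    return mst / mse

# k = 3 strategies, n = 60 weekly observations in total.
F = f_statistic(57512.23, 506967.88, k=3, n=60)
print(round(F, 2))  # → 3.23
```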

25 The F test rejection region And finally the hypothesis test: H0: μ1 = μ2 = … = μk Ha: At least two means differ Test statistic: F = MST/MSE R.R.: F > Fα,k–1,n–k

26 The F test H0: μ1 = μ2 = μ3 H1: At least two means differ Test statistic: F = MST/MSE = 3.23 The rejection region is F > F.05,2,57 ≈ 3.15. Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favor of H1 and argue that at least one of the mean sales is different from the others.

27 The F test p-value Use Excel to find the p-value: =FDIST(3.23,2,57) = .0467 p-value = P(F > 3.23) = .0467
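Without Excel, the same tail probability can be approximated by numerically integrating the F density (the normalizing constant uses math.gamma). This is a hedged sketch, not Excel's algorithm: Simpson's rule on a truncated range, which is accurate enough here. For F = 3.23 with (2, 57) degrees of freedom it gives ≈ .047, matching the slide's FDIST result to two significant figures.

```python
import math

def f_sf(x, d1, d2, steps=20000, upper=1000.0):
    # P(F > x) for an F(d1, d2) random variable.
    beta = math.gamma(d1 / 2) * math.gamma(d2 / 2) / math.gamma((d1 + d2) / 2)
    c = (d1 / d2) ** (d1 / 2) / beta
    def pdf(t):
        return c * t ** (d1 / 2 - 1) * (1 + d1 * t / d2) ** (-(d1 + d2) / 2)
    # Simpson's rule on [x, upper]; the tail beyond `upper` is negligible here.
    h = (upper - x) / steps
    s = pdf(x) + pdf(upper)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(x + i * h)
    return s * h / 3

print(round(f_sf(3.23, 2, 57), 4))
```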

28 Excel single factor printout (see file Xm1.xls) SS(Total) = SST + SSE

Multiple Comparisons If the single-factor ANOVA leads us to conclude that at least two means differ, we often want to know which ones. Two means are considered different if the difference between the corresponding sample means is larger than a critical number. The larger sample mean is then believed to be associated with a larger population mean.

30 Fisher's Least Significant Difference Fisher's Least Significant Difference (LSD) method is one procedure designed to determine which mean differences are significant. The hypotheses are: H0: |μi – μj| = 0 Ha: |μi – μj| ≠ 0 The statistic: t = (x̄i – x̄j) / [MSE(1/ni + 1/nj)]^.5

Fisher's Least Significant Difference This method builds on the equal-variance t-test of the difference between two means; the test statistic is improved by using MSE rather than sp². We can conclude that μi and μj differ (at the α significance level) if |x̄i – x̄j| > LSD, where LSD = tα/2,n–k [MSE(1/ni + 1/nj)]^.5
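A sketch of the LSD computation. The t value 2.002 appears later in these slides; MSE = 8,894.45 is an assumed figure chosen to be consistent with the slides' LSD of 59.72, not a value printed here.

```python
import math

def fisher_lsd(t_crit, mse, ni, nj):
    # LSD = t_{alpha/2, n-k} * sqrt(MSE * (1/ni + 1/nj))
    return t_crit * math.sqrt(mse * (1 / ni + 1 / nj))

# t.025,57 = 2.002 (from the slides); MSE is an assumed value for illustration.
print(round(fisher_lsd(2.002, 8894.45, 20, 20), 1))  # → 59.7
```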

Experimentwise type I error rate (αE) (the effective type I error) Fisher's method may result in an increased probability of committing a type I error. The probability of committing at least one type I error in a series of C hypothesis tests, each at the α level of significance, increases with C. This probability is called the experimentwise type I error rate (αE). It is calculated by αE = 1 – (1 – α)^C where C is the number of pairwise comparisons (C = k(k – 1)/2, k is the number of treatments). The Bonferroni adjustment determines the required type I error probability per pairwise comparison (α) to secure a pre-determined overall αE.

The Bonferroni Adjustment The procedure: –Compute the number of pairwise comparisons C = k(k – 1)/2, where k is the number of populations/treatments. –Set α = αE/C, where the value of αE is predetermined. –We can conclude that μi and μj differ (at the α = αE/C significance level) if |x̄i – x̄j| > tα/2,n–k [MSE(1/ni + 1/nj)]^.5
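The bookkeeping in the first two steps is simple enough to verify directly; a minimal Python sketch:

```python
def bonferroni(alpha_e, k):
    # C pairwise comparisons among k treatments; per-comparison level alpha_E / C.
    c = k * (k - 1) // 2
    return c, alpha_e / c

c, alpha = bonferroni(0.05, 3)
print(c, round(alpha, 4))  # → 3 0.0167
```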

The Fisher and Bonferroni methods Example 1 – continued –Rank the effectiveness of the marketing strategies (based on mean weekly sales). –Use Fisher's method and the Bonferroni adjustment method. Solution (Fisher's method) –The sample mean sales were 577.55, 653.0, 608.65. –Then, with LSD = 59.72, the pairwise differences are |x̄1 – x̄2| = 75.45, |x̄1 – x̄3| = 31.1, |x̄2 – x̄3| = 44.35. The significant difference is between μ1 and μ2.

The Fisher and Bonferroni methods Solution (the Bonferroni adjustment) –We calculate C = k(k – 1)/2 = 3(2)/2 = 3. –We set α = .05/3 = .0167, thus t.0167/2,60–3 = 2.467 (Excel). Again, the significant difference is between μ1 and μ2.

The Tukey Multiple Comparisons The test procedure: –Find a critical number ω as follows: ω = qα(k, ν)·(MSE/ng)^.5 where k = the number of samples, ν = degrees of freedom = n – k, ng = number of observations per sample (recall, all the sample sizes are the same), α = significance level, and qα(k, ν) = a critical value obtained from the studentized range table. If the sample sizes are not extremely different, we can use the above procedure with ng calculated as the harmonic mean of the sample sizes.


The Tukey Multiple Comparisons The test procedure: –Select a pair of means and calculate the difference between the larger and the smaller mean. –If x̄max – x̄min > ω, there is sufficient evidence to conclude that μmax > μmin. –Repeat this procedure for each pair of samples, and rank the means if possible.

The Tukey Multiple Comparisons Example 1 – continued. We had three populations (three marketing strategies), k = 3. Sample sizes were equal: n1 = n2 = n3 = 20, ν = n – k = 60 – 3 = 57, and MSE is taken from the ANOVA table. Take q.05(3,60) from the studentized range table. Pairwise differences of the mean sales (City 1, City 2, City 3): City 1 vs. City 2: |577.55 – 653.0| = 75.45 City 1 vs. City 3: |577.55 – 608.65| = 31.1 City 2 vs. City 3: |653.0 – 608.65| = 44.35
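The whole Tukey comparison can be scripted. In this sketch, q = 3.40 is the tabulated q.05(3,60) value, and MSE = 8,894.45 is an assumed figure for illustration (it is not printed on these slides); the city means are those reported above.

```python
import math
from itertools import combinations

def tukey_omega(q_crit, mse, ng):
    # omega = q_alpha(k, nu) * sqrt(MSE / n_g)
    return q_crit * math.sqrt(mse / ng)

means = {"City 1": 577.55, "City 2": 653.0, "City 3": 608.65}
omega = tukey_omega(3.40, 8894.45, 20)   # ≈ 71.7
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    # A pair differs significantly when its mean difference exceeds omega.
    print(a, "vs", b, round(diff, 2), "significant" if diff > omega else "not significant")
```

With these inputs only the City 1 vs. City 2 difference (75.45) exceeds ω, so Tukey's method flags just that pair.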

Excel – Tukey and Fisher LSD methods (file Xm15-1.xls) For Fisher's LSD, use α = .05. For the Bonferroni adjustments, type α = .05/3 = .0167.

Randomized Blocks Design The purpose of designing a randomized block experiment is to reduce the within-treatments variation, thus increasing the relative amount of among-treatment variation. This helps in detecting differences among the treatment means more easily.

43 Randomized Blocks [Diagram: observations arranged in blocks across Treatments 1–4, each block grouping similar experimental units.]

44 Partitioning the total variability The total sum of squares is partitioned into three sources of variation: –Treatments –Blocks –Within samples (Error) SS(Total) = SST + SSB + SSE where SST = sum of squares for treatments, SSB = sum of squares for blocks, and SSE = sum of squares for error. Recall: for the independent samples design we have SS(Total) = SST + SSE.
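The partition can be checked numerically on a tiny invented data set (b blocks × k treatments; not the cholesterol data). This sketch computes SST and SSB from the treatment and block means and recovers SSE as the remainder:

```python
from statistics import fmean

def rb_partition(table):
    # table[i][j] = response for block i under treatment j (complete layout).
    b, k = len(table), len(table[0])
    grand = fmean(x for row in table for x in row)
    treat_means = [fmean(table[i][j] for i in range(b)) for j in range(k)]
    block_means = [fmean(row) for row in table]
    sst = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return sst, ssb, ss_total - sst - ssb   # SS(Total) = SST + SSB + SSE

# Toy data: 3 blocks, 2 treatments.
print(rb_partition([[10.0, 12.0], [11.0, 14.0], [9.0, 13.0]]))  # → (13.5, 3.0, 1.0)
```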

45 The mean sums of squares To perform hypothesis tests for treatments and blocks we need: Mean square for treatments: MST = SST/(k – 1) Mean square for blocks: MSB = SSB/(b – 1) Mean square for error: MSE = SSE/[(k – 1)(b – 1)]

46 The test statistics for the randomized block design ANOVA Test statistic for treatments: F = MST/MSE Test statistic for blocks: F = MSB/MSE

47 The F test rejection regions Testing the mean responses for treatments: F > Fα,k–1,(k–1)(b–1) Testing the mean responses for blocks: F > Fα,b–1,(k–1)(b–1)

48 Randomized Blocks ANOVA – Example 2 –Are there differences in the effectiveness of cholesterol reduction drugs? –To answer this question the following experiment was organized: 25 groups of men with high cholesterol were matched by age and weight. Each group consisted of 4 men. Each person in a group received a different drug. The cholesterol level reduction in two months was recorded. –Can we infer from the data in Xm2.xls that there are differences in mean cholesterol reduction among the four drugs?

49 Randomized Blocks ANOVA – Example: Solution –Each drug can be considered a treatment. –The four records in each group can be treated as a block, because they are matched by age and weight. –This procedure eliminates the variability in cholesterol reduction related to different combinations of age and weight, and helps detect differences in the mean cholesterol reduction attributed to the different drugs.

50 Randomized Blocks ANOVA – Example [ANOVA table: Treatments with k – 1 degrees of freedom and F = MST/MSE; Blocks with b – 1 degrees of freedom and F = MSB/MSE.] Conclusion: At the 5% significance level there is sufficient evidence to infer that the mean cholesterol reductions gained by at least two drugs are different.

51 14.2 Multiple Comparisons – Testing the differences, Example continued Calculating LSD: MSE is taken from the ANOVA printout; n1 = n2 = n3 = 20. t.05/2,60–3 = TINV(.05,57) = 2.002 LSD = (2.002)[MSE(1/20 + 1/20)]^.5 = 59.72 The rejection region: a difference in sample means is significant if it is > 59.72 (not significant if < 59.72).