Economics 173 Business Statistics Lectures 9 & 10 Summer, 2001 Professor J. Petry.

Presentation transcript:


Analysis of Variance Chapter 14

14.1 Introduction
Analysis of variance helps compare two or more populations of quantitative data. Specifically, we are interested in the relationships among the population means (are they equal or not). The procedure works by analyzing the sample variance.

The analysis of variance is a procedure that tests whether differences exist among two or more population means. To do this, the technique analyzes the variance of the data. How can analyzing variance help us understand the relationship between population means? If the variance between samples is large relative to the variance within samples, then the population means are likely to be unequal (i.e. reject $H_0$).

Single-Factor (One-Way) Analysis of Variance: Independent Samples

Example 14.1
–An apple juice manufacturer is planning to develop a new product, a liquid concentrate.
–The marketing manager has to decide how to market the new product.
–Three strategies are considered:
  Emphasize convenience of using the product.
  Emphasize the quality of the product.
  Emphasize the product’s low price.

Example continued
–An experiment was conducted as follows:
  In three cities an advertising campaign was launched.
  In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
  The weekly sales were recorded for twenty weeks following the beginning of the campaigns.

Example continued
–Data (see file XM ): weekly sales
Solution
–The data are quantitative.
–Our problem objective is to compare sales in three cities.
–We hypothesize on the relationships among the three mean weekly sales:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ
To perform the analysis of variance we need to build an “F” statistic. To follow the process more easily we use the following notation:

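Independent samples are drawn from k populations. Each population is called a “treatment”.

Sample         1              2              ...   k
Observations   $x_{11}$       $x_{12}$       ...   $x_{1k}$
               $x_{21}$       $x_{22}$       ...   $x_{2k}$
               ...            ...                  ...
               $x_{n_1,1}$    $x_{n_2,2}$    ...   $x_{n_k,k}$
Sample size    $n_1$          $n_2$          ...   $n_k$
Sample mean    $\bar{x}_1$    $\bar{x}_2$    ...   $\bar{x}_k$

Here $x_{11}$ is the first observation of the first sample and $x_{22}$ is the second observation of the second sample. X is the “response variable”. The variable’s values are called “responses”. The average of the sample means is the “grand mean”, $\bar{\bar{x}}$.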

The Test Statistic
The test stems from the following rationale:
–If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result close to the grand mean).
–If the alternative hypothesis is true, at least some of the sample means would be different from one another.

Figure: two plots of Treatments 1, 2 and 3 with the same sample means, one with small and one with large within-sample variability.
A small variability within the samples makes it easier to draw a conclusion about the population means. When the sample means are the same as before but the within-sample variability (the noise, or “error”) is larger, it is harder to draw a conclusion about the population means. If the variance between the sample means is large relative to the within-sample variance, we conclude that the population means are unequal.

The variability among the sample means is measured as the sum of squared distances between each mean and the grand mean, weighted by sample size. This sum is called the Sum of Squares for Treatments (SST):
$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$
In our example treatments are represented by the different advertising strategies.

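SST - Continued
In the formula for SST there are k treatments, $n_j$ is the size of sample j, $\bar{x}_j$ is the average of sample j, and $\bar{\bar{x}}$ is the grand mean. The grand mean is calculated by
$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$$
Note: When the averages are close to one another, their distance from the grand average is small, leading to a small SST. Thus, a large SST indicates large variation among sample averages.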

The variability within samples is measured by adding all the squared distances between observations and their sample mean. This sum is called the Sum of Squares for Error (SSE):
$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all three cities).

Calculation of SST: $SST = 20(\bar{x}_1 - \bar{\bar{x}})^2 + 20(\bar{x}_2 - \bar{\bar{x}})^2 + 20(\bar{x}_3 - \bar{\bar{x}})^2$
Calculation of SSE: $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$, where each $n_j - 1 = 20 - 1$
To perform the test we need to calculate the mean sums of squares as follows:

Calculation of MST (Mean Square for Treatments): $MST = SST/(k-1)$
Calculation of MSE (Mean Square for Error): $MSE = SSE/(n-k)$
with the following degrees of freedom: $\nu_1 = k - 1$ and $\nu_2 = n - k$.
We require that:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.

$H_0: \mu_1 = \mu_2 = \dots = \mu_k$
$H_1$: At least two means differ
Test statistic: $F = MST/MSE$
Rejection region: $F > F_{\alpha, k-1, n-k}$
Specifically, in our advertisement problem the hypothesis test looks like this:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ
Test statistic: $F = MST/MSE = 3.23$
Since 3.23 > 3.15, there is sufficient evidence to reject $H_0$ in favor of $H_1$, and argue that at least one of the mean sales is different from the others.
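The steps above (SST, SSE, the mean squares, and the F ratio) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_anova` and the small `sales` data set are hypothetical and are not the lecture's data file, so the resulting F value is not the 3.23 reported above.

```python
from statistics import mean, variance

def one_way_anova(samples):
    """Return SST, SSE, MST, MSE and F for a list of independent samples."""
    k = len(samples)                               # number of treatments
    n = sum(len(s) for s in samples)               # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n
    # Between-treatments variation: n_j * (sample mean - grand mean)^2
    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatments variation: (n_j - 1) * sample variance
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    mst = sst / (k - 1)                            # v1 = k - 1
    mse = sse / (n - k)                            # v2 = n - k
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse, "F": mst / mse}

# Hypothetical weekly sales for three strategies (illustration only).
sales = [
    [10, 12, 11, 13],
    [15, 14, 16, 15],
    [11, 10, 12, 11],
]
result = one_way_anova(sales)
print(result)
```

The computed F is then compared with the tabulated critical value $F_{\alpha, k-1, n-k}$; the sketch deliberately leaves that lookup to a table, as the slides do.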

SS(Total) = SST + SSE

Checking the required conditions
The F test requires that the populations are normally distributed with equal variance. From the Excel printout we compare the sample variances: 10774, 7238, … It seems the variances are equal (see section 14.8 for Bartlett’s test of equality of variances). To check normality, observe the histogram of each sample.

All the distributions seem to be normal.

14.3 Analysis of Variance Models
Several elements may distinguish between one experimental design and others.
–The number of factors.
  Each characteristic investigated is called a factor.
  Each factor has several levels.

Figure: one-way ANOVA (the response recorded for Treatments 1, 2 and 3 of a single factor) compared with two-way ANOVA (the response recorded for each combination of the levels of Factor A and Factor B).

–Independent samples or blocks. Groups of matched observations are formed into blocks, in order to remove the effects of “noise” variability. By doing so we improve the chances of detecting the variability of interest.

–Fixed and random effects models.
  If all levels of a factor included in our analysis are pre-determined, we have a fixed-effect ANOVA.
  –The conclusion of a fixed-effect ANOVA applies only to the levels studied.
  If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA.
  –The conclusion of the random-effect ANOVA applies to all the levels (not only those studied).
  The calculation of the test statistic for fixed and random effects may differ for some ANOVA models.

14.4 Single-Factor Analysis of Variance: Randomized Blocks
The purpose of designing a randomized block experiment is to reduce the within-treatments variation, thus increasing the relative among-treatments variation. This helps in detecting differences among the treatment means more easily.

Figure: Treatments 1 to 4 observed within each of Blocks 1, 2 and 3.
Block all the observations with some commonality across treatments.

Blocked samples from k populations (treatments)

The total sum of squares is partitioned into three sources of variation:
–Treatments
–Blocks
–Within samples (Error)
SS(Total) = SST + SSB + SSE
where SST is the sum of squares for treatments, SSB is the sum of squares for blocks, and SSE is the sum of squares for error.
Recall that for independent samples: SS(Total) = SST + SSE

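To perform hypothesis tests for treatments and blocks we need the following, with k treatments, b blocks and n observations in total:
Mean square for treatments: $MST = SST/(k-1)$
Mean square for blocks: $MSB = SSB/(b-1)$
Mean square for error: $MSE = SSE/(n-k-b+1)$
Test statistic for treatments: $F = MST/MSE$
Test statistic for blocks: $F = MSB/MSE$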
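As a rough sketch of the randomized block partition SS(Total) = SST + SSB + SSE, the Python below computes the three sums of squares and both F ratios for a complete block layout (one observation per block and treatment). The function name `randomized_block_anova` and the 3-by-3 table are hypothetical; they are not the radio-listening data of the example.

```python
def randomized_block_anova(table):
    """table[i][j] is the response for block i under treatment j."""
    b = len(table)                # number of blocks
    k = len(table[0])             # number of treatments
    n = b * k                     # total number of observations
    grand = sum(sum(row) for row in table) / n
    treat_means = [sum(row[j] for row in table) / b for j in range(k)]
    block_means = [sum(row) / k for row in table]
    sst = b * sum((m - grand) ** 2 for m in treat_means)   # treatments
    ssb = k * sum((m - grand) ** 2 for m in block_means)   # blocks
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    sse = ss_total - sst - ssb                             # error
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / (n - k - b + 1)
    return {"SST": sst, "SSB": ssb, "SSE": sse,
            "F_treatments": mst / mse, "F_blocks": msb / mse}

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
table = [
    [10, 12, 14],
    [20, 23, 23],
    [30, 31, 35],
]
block_result = randomized_block_anova(table)
print(block_result)
```

In this made-up table most of the total variation sits in the blocks, which is the situation blocking is designed for: removing SSB from the error term leaves a small MSE and makes treatment differences easier to detect.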

Example 14.2
–A radio station manager wants to know if the amount of time his listeners spend listening to a radio per day is about the same every day of the week.
–200 teenagers were asked to record how long they spend listening to a radio each day of the week.
Solution
–The problem objective is to compare seven populations.
–The data are quantitative.

–Each day of the week can be considered a treatment. –Each 7 data points (per person) can be blocked, because they belong to the same person. –This procedure eliminates the variability in the “radio-times” among teenagers, and helps detect differences of the mean times teenagers listen to the radio among the days of the week.

Checking the required conditions
Observing the histograms for the seven days (Sunday, Monday, and so on), we can assume that all the distributions are approximately normal. The population variances seem to be equal (see the sample variances).

Source of variation   d.f.    F
Treatments            k-1     MST / MSE
Blocks                b-1     MSB / MSE

Conclusion: At the 5% significance level there is sufficient evidence to reject the null hypothesis, and infer that mean “radio time” is different on at least one day of the week.

14.5 Two Factor Analysis of Variance: Independent Samples
Example 14.3
–Suppose in example 14.1, two factors are to be examined:
  The effects of the marketing approach on sales.
  –Emphasis on convenience
  –Emphasis on quality
  –Emphasis on price
  The effects of the selected media on sales.
  –Advertise on TV
  –Advertise in newspapers

Example continued
–The combinations of levels, one from each factor, define the treatments:
  1. Emphasize convenience + advertise on TV.
  2. Emphasize convenience + advertise in newspaper.
  and so on, for six treatments in all.
–The hypotheses tested are:
  $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$
  $H_1$: At least two means differ.
–We assume that the only existing levels are those studied. Thus, this is a fixed-effects factorial experiment.

We can design the experiment as follows: each of six cities (City 1 to City 6) receives one combination of marketing emphasis (convenience, quality or price) and advertising medium (TV or newspaper). This is a one-way ANOVA experimental design. The p-value = .045. We conclude that there is strong evidence that differences exist in the mean weekly sales.

Are these differences caused by differences in the marketing approach? Are these differences caused by differences in the medium used for advertising? Are there combinations of these two factors that interact to affect the weekly sales? A new experimental design is needed to answer these questions.

Factor A: Marketing strategy. Factor B: Advertising media.

              Convenience     Quality         Price
TV            City 1 sales    City 3 sales    City 5 sales
Newspapers    City 2 sales    City 4 sales    City 6 sales

Are there differences in the mean sales caused by different marketing strategies? Test whether mean sales of “Convenience”, “Quality”, and “Price” significantly differ from one another.

Factor A: Marketing strategy (Convenience, Quality, Price). Factor B: Advertising media (TV, Newspapers).
Are there differences in the mean sales caused by different advertising media? Test whether mean sales of “TV” and “Newspapers” significantly differ from one another. Use SS(B).

Factor A: Marketing strategy (Convenience, Quality, Price). Factor B: Advertising media (TV, Newspapers).
Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium? Test whether mean sales of certain cells are different from the level expected.

Graphical description of the possible relationships between factors A and B.

Figure: four panels plotting mean response against the levels of factor A (1, 2, 3), with one line for level 1 of factor B and one for level 2 of factor B.
–Difference among the levels of factor A; no difference among the levels of factor B (the lines for levels 1 and 2 of factor B coincide).
–Difference among the levels of factor A, and difference among the levels of factor B; no interaction.
–No difference among the levels of factor A; difference among the levels of factor B.
–Interaction.

Sums of squares

F tests for the Two-way ANOVA
Test for the difference among the levels of the main factors A and B:
  $F = MS(A)/MSE$, rejection region: $F > F_{\alpha, a-1, n-ab}$
  $F = MS(B)/MSE$, rejection region: $F > F_{\alpha, b-1, n-ab}$
Test for interaction between factors A and B:
  $F = MS(AB)/MSE$, rejection region: $F > F_{\alpha, (a-1)(b-1), n-ab}$
where $MS(A) = SS(A)/(a-1)$, $MS(B) = SS(B)/(b-1)$, $MS(AB) = SS(AB)/[(a-1)(b-1)]$ and $MSE = SSE/(n-ab)$.
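The three F ratios can be illustrated with a short Python sketch for a balanced design (the same number of replicates in every cell). The sums of squares are written in their usual balanced-design form, which is an assumption here because the slide's own formulas did not survive in the transcript; the function name `two_way_anova` and the data are hypothetical.

```python
def two_way_anova(cells):
    """cells[i][j] holds the replicates for level i of factor A and level j of factor B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    n = a * b * r
    grand = sum(x for row in cells for cell in row for x in cell) / n
    a_means = [sum(x for cell in cells[i] for x in cell) / (b * r) for i in range(a)]
    b_means = [sum(x for i in range(a) for x in cells[i][j]) / (a * r) for j in range(b)]
    cell_means = [[sum(cell) / r for cell in row] for row in cells]
    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    # Interaction: what the cell means show beyond the two main effects.
    ss_ab = r * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    sse = sum((x - cell_means[i][j]) ** 2
              for i in range(a) for j in range(b) for x in cells[i][j])
    mse = sse / (n - a * b)
    return {"F_A": ss_a / (a - 1) / mse,
            "F_B": ss_b / (b - 1) / mse,
            "F_AB": ss_ab / ((a - 1) * (b - 1)) / mse}

# Hypothetical sales: 3 marketing strategies (A) x 2 media (B), 2 replicates per cell.
cells = [
    [[10, 12], [8, 10]],    # convenience: TV, newspapers
    [[14, 16], [12, 14]],   # quality
    [[11, 13], [9, 11]],    # price
]
f_ratios = two_way_anova(cells)
print(f_ratios)
```

The made-up cell means are exactly additive, so the interaction F ratio comes out as zero while both main-effect ratios are positive; each ratio would still be compared with its own tabulated critical value.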

Example continued
–Test of the difference in mean sales among the three marketing approaches
  $H_0: \mu_{conv.} = \mu_{quality} = \mu_{price}$
  $H_1$: At least two mean sales are different
  F = MS(Marketing strategy)/MSE = 5.33 (see the computer printout).
  $F_{critical} = F_{\alpha, a-1, n-ab} = F_{.05, 3-1, 60-(3)(2)}$ = about 3.15
–At the 5% significance level there is evidence to infer that differences in weekly sales exist among the marketing strategies.

Example continued
–Test of the difference in mean sales between the two advertising media
  $H_0: \mu_{TV} = \mu_{Newspaper}$
  $H_1$: The two mean sales differ
  F = MS(Media)/MSE = 1.42 (see the computer printout).
  $F_{critical} = F_{\alpha, b-1, n-ab} = F_{.05, 2-1, 60-(3)(2)}$ = about 4.00
–At the 5% significance level there is insufficient evidence to infer that differences in weekly sales exist between the two advertising media.

Example continued
–Test for interaction between factors A and B
  $H_0: \mu_{TV*conv.} = \mu_{TV*quality} = \dots = \mu_{newsp.*price}$
  $H_1$: At least two means differ
  F = MS(Marketing*Media)/MSE = .09 (see the computer printout).
  $F_{critical} = F_{\alpha, (a-1)(b-1), n-ab} = F_{.05, (3-1)(2-1), 60-(3)(2)}$ = about 3.15
–At the 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales.

The two - way ANOVA Excel solution Factor A = Marketing strategies Factor B = Advertising media

14.7 Multiple Comparisons
When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different, and in what rank order. Three statistical inference procedures geared at doing this are presented:
–Fisher’s least significant difference (LSD) method
–Bonferroni adjustment
–Tukey’s multiple comparison method

Two means are considered different if the difference between the corresponding sample means is larger than a critical number. Then, the larger sample mean is believed to be associated with a larger population mean. Conditions common to all the methods here: –The ANOVA model is the independent-sample, single factor –The conditions required to perform the ANOVA are satisfied. –The experiment is fixed-effect

Fisher’s Least Significant Difference (LSD) Method
This method builds on the equal-variance t-test of the difference between two means. The test statistic is improved by using MSE rather than $s_p^2$. We can conclude that $\mu_i$ and $\mu_j$ differ (at the $\alpha$ significance level) if $|\bar{x}_i - \bar{x}_j| > LSD$, where
$$LSD = t_{\alpha/2} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}, \quad \text{d.f.} = n - k$$

The Bonferroni Adjustment
Fisher’s method may result in an increased probability of committing a type I error. The Bonferroni adjustment determines the required type I error probability per pairwise comparison ($\alpha$), to secure a pre-determined overall type I error probability ($\alpha_E$).

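The procedure:
–Compute the number of pairwise comparisons, C = k(k-1)/2, where k is the number of populations.
–Set $\alpha = \alpha_E / C$, where $\alpha_E$ is the probability of making at least one type I error (called the experimentwise type I error).
–We can conclude that $\mu_i$ and $\mu_j$ differ (at the $\alpha_E / C$ significance level) if
$$|\bar{x}_i - \bar{x}_j| > t_{\alpha_E/(2C)} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}, \quad \text{d.f.} = n - k$$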

Example continued
–Rank the effectiveness of the marketing strategies (based on mean weekly sales).
–Use Fisher’s method and the Bonferroni adjustment method.
Solution (Fisher’s method)
–The sample mean sales were …, 653.0, …
–Then,

Solution (the Bonferroni adjustment)
–We calculate C = k(k-1)/2 to be 3(2)/2 = 3.
–We set $\alpha = .05/3 = .0167$, thus the critical value is $t_{.00833,\,60-3}$ (from Excel).
Again, the significant difference is between $\mu_1$ and $\mu_2$.
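A small Python sketch may help compare the two thresholds. It assumes the critical t value is looked up separately (from a table or Excel) and passed in; the helper names `lsd` and `bonferroni_alpha`, and the values of `t_crit` and `mse`, are hypothetical placeholders rather than the figures from the example.

```python
from math import sqrt

def lsd(t_crit, mse, n_i, n_j):
    """Least significant difference for one pair of sample means."""
    return t_crit * sqrt(mse * (1 / n_i + 1 / n_j))

def bonferroni_alpha(alpha_e, k):
    """Number of pairwise comparisons C and the per-comparison alpha."""
    c = k * (k - 1) // 2
    return c, alpha_e / c

# Three populations with an experimentwise error rate of .05, as in the example.
c, alpha = bonferroni_alpha(0.05, 3)
print(c, round(alpha, 4), round(alpha / 2, 5))   # 3 comparisons, alpha = .0167, tail = .00833

# Hypothetical inputs: t_crit would be t_{alpha/2} with n - k degrees of freedom.
threshold = lsd(t_crit=2.0, mse=100.0, n_i=20, n_j=20)
print(round(threshold, 3))
```

Because the Bonferroni adjustment uses a smaller tail area, its critical t value and therefore its threshold are larger than Fisher's, so fewer pairs are declared different.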

The Tukey Multiple Comparisons
The test procedure:
–Find a critical number $\omega$ as follows:
$$\omega = q_\alpha(k, \nu) \sqrt{\frac{MSE}{n_g}}$$
where
  k = the number of samples
  $\nu$ = degrees of freedom = n - k
  $n_g$ = number of observations per sample (recall, all the sample sizes are the same)
  $\alpha$ = significance level
  $q_\alpha(k, \nu)$ = a critical value (the studentized range) obtained from a table

–Select a pair of means. Calculate the difference between the larger and the smaller mean.
–If $\bar{x}_{max} - \bar{x}_{min} > \omega$, there is sufficient evidence to conclude that $\mu_{max} > \mu_{min}$.
–Repeat this procedure for each pair of samples. Rank the means if possible.
If the sample sizes are different, use the above procedure provided the sizes are similar. For $n_g$ use the harmonic mean of the sample sizes.

Example continued
–We had three populations (three marketing strategies): k = 3. Sample sizes were equal: $n_1 = n_2 = n_3 = 20$, $\nu = n - k = 60 - 3 = 57$, MSE = … Take $q_{.05}(3, 60)$ from the table.
Differences between the sample means of Sales - City 1, Sales - City 2 and Sales - City 3:
  City 1 vs. City 2: …
  City 1 vs. City 3: 31.1
  City 2 vs. City 3: …
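The Tukey procedure can be sketched as follows. The studentized range value is assumed to come from a table and is passed in as `q_crit`; the function names, the sample means, and the MSE below are hypothetical and are not the values of the example. The harmonic mean is used for $n_g$, which reduces to the common sample size when all sizes are equal.

```python
from itertools import combinations
from math import sqrt
from statistics import harmonic_mean

def tukey_omega(q_crit, mse, sample_sizes):
    """Critical number omega = q * sqrt(MSE / n_g)."""
    n_g = harmonic_mean(sample_sizes)
    return q_crit * sqrt(mse / n_g)

def differing_pairs(means, omega):
    """Return the pairs of labels whose sample means differ by more than omega."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > omega]

# Hypothetical inputs: q_crit stands in for a tabulated q_alpha(k, nu).
omega = tukey_omega(q_crit=3.4, mse=80.0, sample_sizes=[20, 20, 20])
means = {"A": 50.0, "B": 62.0, "C": 53.0}
pairs = differing_pairs(means, omega)
print(round(omega, 2), pairs)
```

With these made-up numbers only the pairs involving B exceed the critical number, so B would be ranked above A and C while A and C could not be separated.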

14.8 Bartlett’s Test
This procedure is conducted when testing
$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
$H_1$: At least two variances differ
The test statistic is
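The slide's formula is missing from this transcript, so the sketch below uses the standard textbook form of Bartlett's statistic, which may differ in notation from the lecture: $B = \left[(n-k)\ln MSE - \sum_j (n_j - 1)\ln s_j^2\right] / C$ with $C = 1 + \frac{1}{3(k-1)}\left(\sum_j \frac{1}{n_j - 1} - \frac{1}{n-k}\right)$, compared with a chi-squared critical value with k-1 degrees of freedom. The function name and the data are hypothetical.

```python
from math import log
from statistics import variance

def bartlett_statistic(samples):
    """Bartlett's statistic for equality of k population variances."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    variances = [variance(s) for s in samples]
    # Pooled variance; this equals MSE in the one-way ANOVA.
    pooled = sum((len(s) - 1) * v for s, v in zip(samples, variances)) / (n - k)
    numerator = (n - k) * log(pooled) - sum(
        (len(s) - 1) * log(v) for s, v in zip(samples, variances))
    correction = 1 + (sum(1 / (len(s) - 1) for s in samples) - 1 / (n - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical samples: equal sample variances give a statistic of zero.
b_equal = bartlett_statistic([[1, 2, 3], [4, 5, 6], [10, 11, 12]])
# One sample much more spread out than the others gives a positive statistic.
b_unequal = bartlett_statistic([[1, 2, 3], [0, 5, 10], [10, 11, 12]])
print(b_equal, round(b_unequal, 3))
```

A large value of the statistic relative to the chi-squared critical value is evidence against equal variances, in which case the equal-variance condition of the F test is in doubt.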