Analysis of Variance and Covariance


1 Analysis of Variance and Covariance

2 Chapter Outline
1) Overview
2) Relationship Among Techniques
3) One-Way Analysis of Variance
4) Statistics Associated with One-Way Analysis of Variance
5) Conducting One-Way Analysis of Variance
   i. Identification of Dependent and Independent Variables
   ii. Decomposition of the Total Variation
   iii. Measurement of Effects
   iv. Significance Testing
   v. Interpretation of Results

3 Chapter Outline
6) Illustrative Applications of One-Way Analysis of Variance
7) Assumptions in Analysis of Variance
8) N-Way Analysis of Variance
9) Analysis of Covariance
10) Issues in Interpretation
   i. Interactions
   ii. Relative Importance of Factors
   iii. Multiple Comparisons
11) Multivariate Analysis of Variance

4 Relationship Among Techniques
Analysis of variance (ANOVA) is used as a test of means for two or more populations. The null hypothesis, typically, is that all means are equal. Analysis of variance must have a dependent variable that is metric (measured using an interval or ratio scale). There must also be one or more independent variables that are all categorical (nonmetric). Categorical independent variables are also called factors.

5 Relationship Among Techniques
A particular combination of factor levels, or categories, is called a treatment. One-way analysis of variance involves only one categorical variable, or a single factor. Here a treatment is the same as a factor level. If two or more factors are involved, the analysis is termed n-way analysis of variance. If the set of independent variables consists of both categorical and metric variables, the technique is called analysis of covariance (ANCOVA). The metric independent variables are referred to as covariates.

6 Relationship Among the t Test, Analysis of Variance, Analysis of Covariance, and Regression
Fig. 16.1 classifies the techniques for one or more metric dependent variables by the nature of the independent variables:
- One binary independent variable: t test
- Categorical, one factor: one-way analysis of variance
- Categorical, more than one factor: n-way analysis of variance
- Categorical and interval: analysis of covariance
- Interval: regression

7 One-Way Analysis of Variance
Marketing researchers are often interested in examining the differences in the mean values of the dependent variable for several categories of a single independent variable or factor. For example: Do the various segments differ in terms of their volume of product consumption? Do the brand evaluations of groups exposed to different commercials vary? What is the effect of consumers' familiarity with the store (measured as high, medium, and low) on preference for the store?

8 Statistics Associated with One-Way Analysis of Variance
F statistic. The null hypothesis that the category means are equal is tested by an F statistic. The F statistic is based on the ratio of the between-group variance to the within-group variance. These variances are computed from the corresponding sums of squares.

9 Statistics Associated with One-Way Analysis of Variance
SSbetween. Also denoted as SSx, this is the variation in Y related to the variation in the means of the categories of X. This is the variation in Y accounted for by X.
SSwithin. Also referred to as SSerror, this is the variation in Y due to the variation within each of the categories of X. This variation is not accounted for by X.
SSy. This is the total variation in Y.

10 Conducting One-Way ANOVA
Fig. 16.2 summarizes the steps:
1. Identify the Dependent and Independent Variables
2. Decompose the Total Variation
3. Measure the Effects
4. Test the Significance
5. Interpret the Results

11 Conducting One-Way ANOVA: Decomposing the Total Variation
The total variation in Y may be decomposed as $SS_y = SS_x + SS_{error}$, where

$SS_y = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$
$SS_x = \sum_{j=1}^{c} n (\bar{Y}_j - \bar{Y})^2$
$SS_{error} = \sum_{j=1}^{c} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2$

$Y_i$ = individual observation
$\bar{Y}_j$ = mean for category j
$\bar{Y}$ = mean over the whole sample, or grand mean
$Y_{ij}$ = i-th observation in the j-th category
c = number of categories, n = observations per category, N = total sample size
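To make the decomposition concrete, here is a minimal numpy sketch. The sales figures for the three promotion levels are made up for illustration, since the actual Table 16.2 data are not reproduced in this transcript:

```python
import numpy as np

# Made-up sales figures for three promotion levels (not Table 16.2's data)
groups = [
    np.array([10, 9, 10, 8, 9]),   # high promotion
    np.array([6, 8, 7, 9, 6]),     # medium promotion
    np.array([5, 6, 6, 4, 5]),     # low promotion
]

y_all = np.concatenate(groups)
grand_mean = y_all.mean()

# SSy: total variation of every observation around the grand mean
ss_y = ((y_all - grand_mean) ** 2).sum()
# SSx (between): variation of the category means around the grand mean
ss_x = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SSerror (within): variation of observations around their own category mean
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

print(ss_y, ss_x, ss_error)
assert np.isclose(ss_y, ss_x + ss_error)  # SSy = SSx + SSerror
```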

12 Conducting One-Way ANOVA : Decomposition of the Total Variation
Table 16.1 lays out the decomposition. The columns are the categories of the independent variable X (X1, X2, X3, ..., Xc) plus the total sample; within each column are the observations Y1, Y2, ..., Yn and the category mean. Variation of the observations around their own category means is the within-category variation (SSwithin); variation of the category means around the grand mean is the between-category variation (SSbetween); together they constitute the total variation (SSy).

13 Conducting One-Way ANOVA: Measure Effects and Test Significance
In one-way analysis of variance, we test the null hypothesis that the category means are equal in the population:

$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$

The null hypothesis may be tested by the F statistic, which is the ratio of the between-category mean square to the within-category (error) mean square:

$F = \frac{SS_x / (c - 1)}{SS_{error} / (N - c)}$

This statistic follows the F distribution with (c - 1) and (N - c) degrees of freedom.
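As an illustration of the test, the following sketch computes F from the sums of squares by hand and checks it against scipy.stats.f_oneway, again on made-up data:

```python
import numpy as np
from scipy import stats

# Same made-up promotion-level data as in the decomposition sketch above
high   = np.array([10, 9, 10, 8, 9])
medium = np.array([6, 8, 7, 9, 6])
low    = np.array([5, 6, 6, 4, 5])

groups = [high, medium, low]
y_all = np.concatenate(groups)
c, n_total = len(groups), len(y_all)

grand_mean = y_all.mean()
ss_x = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = [SSx / (c - 1)] / [SSerror / (N - c)]
f_manual = (ss_x / (c - 1)) / (ss_error / (n_total - c))
p_manual = stats.f.sf(f_manual, c - 1, n_total - c)

# scipy's built-in one-way ANOVA should agree
f_scipy, p_scipy = stats.f_oneway(high, medium, low)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```

A large F means the between-group variation dominates the within-group variation, so the null hypothesis of equal means becomes implausible.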

14 Conducting One-Way ANOVA: Interpret the Results
If the null hypothesis of equal category means is not rejected, then the independent variable does not have a significant effect on the dependent variable. On the other hand, if the null hypothesis is rejected, then the effect of the independent variable is significant. A comparison of the category mean values will indicate the nature of the effect of the independent variable.

15 Illustrative Applications of One-Way ANOVA
We illustrate the concepts discussed in this chapter using the data presented in Table 16.2, in which a department store chain is attempting to determine the effect of in-store promotion (X) on sales (Y). The null hypothesis is that the category means are equal: H0: µ1 = µ2 = µ3.

16 Effect of Promotion and Clientele on Sales
Table 16.2 (data not reproduced in this transcript).

17 One-Way ANOVA: Effect of In-store Promotion on Store Sales
Table 16.4 (numeric values not reproduced in this transcript) has two parts. The cell means give the count and mean for each level of promotion: high (1), medium (2), low (3), and the total. The ANOVA table then gives the sum of squares, df, mean square, F ratio, and F probability for each source of variation: between groups (promotion), within groups (error), and the total.

18 Assumptions in Analysis of Variance
The error term is normally distributed, with a mean of zero.
The error term has a constant variance.
The error is not related to any of the categories of X.
The error terms are uncorrelated.

19 N-Way Analysis of Variance
In marketing research, one is often concerned with the effect of more than one factor simultaneously. For example: How do advertising levels (high, medium, and low) interact with price levels (high, medium, and low) to influence a brand's sales? Do educational levels (less than high school, high school graduate, some college, and college graduate) and age (less than 35, 35-55, more than 55) affect consumption of a brand? What is the effect of consumers' familiarity with a department store (high, medium, and low) and store image (positive, neutral, and negative) on preference for the store?

20 N-Way Analysis of Variance
Consider two factors, X1 and X2, having c1 and c2 categories, respectively. The significance of the overall effect is tested by an F test. If the overall effect is significant, the next step is to examine the significance of the interaction effect, also using an F test. The significance of the main effect of each factor may be tested using an F test as well (see the sketch below).
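The sketch below runs this sequence of F tests with statsmodels; the sales, promotion, and coupon columns and their values are illustrative assumptions mirroring the design of Table 16.5, not the textbook's data:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative 3x2 (promotion x coupon) data; values are made up
df = pd.DataFrame({
    "sales":     [10, 9, 8, 8, 7, 9, 6, 5, 7, 6, 4, 5, 5, 6, 3, 4, 2, 3],
    "promotion": ["high"] * 6 + ["medium"] * 6 + ["low"] * 6,
    "coupon":    (["yes"] * 3 + ["no"] * 3) * 3,
})

# Main effects plus the two-way interaction
model = smf.ols("sales ~ C(promotion) * C(coupon)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F test for each effect
print(model.fvalue, model.f_pvalue)     # overall (model) F test
```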

21 Two-way Analysis of Variance
Table 16.5 (numeric values not reproduced in this transcript) gives, for each source of variation, the sum of squares, df, mean square, F, and significance of F. The sources are the main effects (promotion, coupon, and both combined), the two-way interaction, the model, the residual (error), and the total.

22 Two-way Analysis of Variance
Table 16.5, cont. (numeric values not reproduced). The cell means section gives the count and mean for each promotion-by-coupon cell: high/yes, high/no, medium/yes, medium/no, low/yes, low/no, and the total. The factor level means section gives the count and mean for each level of promotion (high, medium, low) and of coupon (yes, no), along with the grand mean.

23 Analysis of Covariance
When examining the differences in the mean values of the dependent variable, it is often necessary to take into account the influence of uncontrolled independent variables. For example: In determining how different groups exposed to different commercials evaluate a brand, it may be necessary to control for prior knowledge. In determining how different price levels will affect a household's cereal consumption, it may be essential to take household size into account. Suppose that we wanted to determine the effect of in-store promotion and couponing on sales while controlling for the effect of clientele. The results are shown in Table 16.6 (a statsmodels sketch follows below).
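A minimal ANCOVA sketch under those assumptions: the metric covariate simply enters the statsmodels formula alongside the categorical factors. The data and column names are illustrative, not those of Table 16.6:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative data: promotion and coupon are categorical factors,
# clientele is a metric covariate (values are made up)
df = pd.DataFrame({
    "sales":     [10, 9, 8, 7, 6, 5, 7, 6, 5, 4, 3, 2],
    "promotion": ["high", "high", "medium", "medium", "low", "low"] * 2,
    "coupon":    ["yes"] * 6 + ["no"] * 6,
    "clientele": [9, 8, 7, 6, 5, 4, 8, 7, 6, 5, 4, 3],
})

# Listing the covariate first mirrors the layout of the ANCOVA table
model = smf.ols("sales ~ clientele + C(promotion) * C(coupon)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```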

24 Analysis of Covariance
Table 16.6 (numeric values not reproduced in this transcript) gives, for each source of variation, the sum of squares, df, mean square, F, and significance of F. The sources are the covariate (clientele), the main effects (promotion, coupon, and both combined), the two-way interaction (promotion * coupon), the model, the residual (error), and the total. The table also reports the raw coefficient for the clientele covariate.

25 Issues in Interpretation
Important issues involved in the interpretation of ANOVA results include interactions, relative importance of factors, and multiple comparisons. Interactions The different interactions that can arise when conducting ANOVA on two or more factors are shown in Figure 16.3. Relative Importance of Factors It is important to determine the relative importance of each factor in explaining the variation in the dependent variable.

26 A Classification of Interaction Effects
Fig. 16.3 classifies the possible interaction effects: no interaction (Case 1) versus interaction, with interaction subdivided into ordinal (Case 2) and disordinal, and disordinal further subdivided into noncrossover (Case 3) and crossover (Case 4).

27 Patterns of Interaction
Fig. 16.4 plots Y against the levels of one factor for each level of the other, for four cases: Case 1, no interaction; Case 2, ordinal interaction; Case 3, disordinal interaction (noncrossover); Case 4, disordinal interaction (crossover). The plots themselves are not reproduced in this transcript.

28 Multivariate Analysis of Variance
Multivariate analysis of variance (MANOVA) is similar to analysis of variance (ANOVA), except that instead of one metric dependent variable, we have two or more. In MANOVA, the null hypothesis is that the vectors of means on multiple dependent variables are equal across groups. Multivariate analysis of variance is appropriate when there are two or more dependent variables that are correlated.
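A brief sketch of MANOVA using statsmodels' MANOVA class, on made-up data with three groups and two correlated dependent variables:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Made-up data: three groups measured on two correlated dependent variables
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5 + ["c"] * 5,
    "y1":    [4, 5, 6, 5, 4, 7, 8, 7, 9, 8, 2, 3, 2, 4, 3],
    "y2":    [3, 5, 4, 4, 3, 6, 7, 7, 8, 6, 1, 2, 2, 3, 1],
})

# H0: the vector of (y1, y2) means is the same in all three groups
mv = MANOVA.from_formula("y1 + y2 ~ group", data=df)
print(mv.mv_test())  # Wilks' lambda, Pillai's trace, Hotelling-Lawley, Roy
```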

29 Regression Analysis

30 Chapter Outline
1) Correlations
2) Bivariate Regression
3) Statistics Associated with Bivariate Regression
4) Conducting Bivariate Regression Analysis
   i. Scatter Diagram
   ii. Bivariate Regression Model
   iii. Estimation of Parameters
   iv. Standardized Regression Coefficient
   v. Significance Testing

31 Chapter Outline
   vi. Strength and Significance of Association
   vii. Assumptions
5) Multiple Regression
6) Statistics Associated with Multiple Regression
7) Conducting Multiple Regression
   i. Partial Regression Coefficients
   ii. Strength of Association
   iii. Significance Testing
8) Multicollinearity
9) Relative Importance of Predictors

32 Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y. It is an index used to determine whether a linear or straight-line relationship exists between X and Y. r varies between -1.0 and +1.0. The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.
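As an illustration, the sketch below computes r and its p-value with scipy.stats.pearsonr and verifies the equivalent t test by hand. The attitude/duration pairs are illustrative stand-ins, since Table 17.1's values are not reproduced in this transcript:

```python
import numpy as np
from scipy import stats

# Illustrative attitude/duration pairs (not Table 17.1's actual values)
duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

r, p_value = stats.pearsonr(duration, attitude)

# Equivalent manual test: t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2
n = len(duration)
t = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)

print(r, p_value)
print(t, p_manual)
```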

33 Explaining Attitude Toward the City of Residence
Table 17.1: Explaining Attitude Toward the City of Residence (data not reproduced in this transcript).

34 Product Moment Correlation
When it is computed for a population rather than a sample, the product moment correlation is denoted by ρ, the Greek letter rho. The coefficient r is an estimator of ρ. The statistical significance of the relationship between two variables measured by using r can be conveniently tested. The hypotheses are:

$H_0: \rho = 0$
$H_1: \rho \neq 0$

35 Significance of Correlation
The test statistic has a t distribution with n - 2 degrees of freedom:

$t = r \sqrt{\frac{n-2}{1-r^2}}$

The r between 'Attitude toward the city' and 'Duration of residence' is approximately 0.936 (the exact value is dropped in this transcript, but it is the square root of the R² of reported with the bivariate regression). The value of the t statistic is 8.414, the figure also reported with the bivariate regression results. From the t table (Table 4 in the Statistical Appendix), this exceeds the critical value of t for a two-tailed test at α = Hence, the null hypothesis of no relationship between X and Y is rejected.

36 Regression Analysis
Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
1. Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
2. Determine how much of the variation in the dependent variable can be explained by the independent variables: the strength of the relationship.
3. Predict the values of the dependent variable.

37 Statistics Associated with Bivariate Regression Analysis
Regression model. $Y_i = \beta_0 + \beta_1 X_i + e_i$, where Y = dependent variable, X = independent variable, β0 = intercept of the line, β1 = slope of the line, and ei is the error term for the i-th observation.
Coefficient of determination, r². Measures the strength of association; it varies between 0 and 1 and signifies the proportion of the variation in Y accounted for by the variation in X.
Estimated or predicted value. $\hat{Y}_i = a + b x_i$, where $\hat{Y}_i$ is the predicted value of Yi, and a and b are estimators of β0 and β1.

38 Statistics Associated with Bivariate Regression Analysis
Regression coefficient. The estimated parameter b is usually referred to as the non-standardized regression coefficient.
Standard error of estimate. This statistic is the standard deviation of the actual Y values from the predicted $\hat{Y}$ values.
Standard error. The standard deviation of b, SEb, is called the standard error.

39 Statistics Associated with Bivariate Regression Analysis
Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, $\sum_j e_j^2$, which is a measure of total error.
t statistic. A t statistic can be used to test the null hypothesis that no linear relationship exists between X and Y.

40 Idea Behind Estimating the Regression Equation
A scatter diagram, or scattergram, is a plot of the values of two variables. The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure. In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum_j e_j^2$.

41 Conducting Bivariate Regression Analysis
1. Plot the Scatter Diagram
2. Formulate the General Model
3. Estimate the Parameters
4. Estimate the Regression Coefficients
5. Test for Significance
6. Determine the Strength and Significance of Association
(A statsmodels sketch of these steps follows below.)
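Here is a minimal statsmodels walkthrough of these steps on illustrative attitude/duration pairs (again, Table 17.1's values are not reproduced here); it also confirms the variance decomposition discussed on the following slides:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative attitude/duration pairs (not Table 17.1's actual values)
duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

X = sm.add_constant(duration)        # prepend the intercept column
model = sm.OLS(attitude, X).fit()    # least-squares parameter estimates

print(model.params)    # a (intercept) and b (slope)
print(model.bse)       # standard errors, including SEb
print(model.tvalues)   # t = b / SEb for the significance test
print(model.rsquared)  # r^2 = SSreg / SSy

# Confirm the decomposition SSy = SSreg + SSres by hand
y_hat = model.fittedvalues
ss_y = ((attitude - attitude.mean()) ** 2).sum()
ss_reg = ((y_hat - attitude.mean()) ** 2).sum()
ss_res = ((attitude - y_hat) ** 2).sum()
assert np.isclose(ss_y, ss_reg + ss_res)
```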

42 Plot of Attitude with Duration
Fig. 17.3 is a scatter plot of attitude (vertical axis) against duration of residence (horizontal axis); the plot itself is not reproduced in this transcript.

43 Which Straight Line Is Best?
Fig. 17.4 overlays two candidate lines (Line 1 and Line 2) on the scatter plot to ask which straight line fits best; the plot itself is not reproduced in this transcript.

44 Decomposing the Total Variation
Fig. 17.6 illustrates, for points X1 through X5, how the total variation SSy splits into the explained variation (SSreg, captured by the regression line) and the residual variation (SSres); the plot itself is not reproduced in this transcript.

45 Decomposing the Total Variation
The total variation, SSy, may be decomposed into the variation accounted for by the regression line, SSreg, and the error or residual variation, SSerror or SSres, as follows:

$SS_y = SS_{reg} + SS_{res}$

where

$SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
$SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
$SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

46 Strength and Significance of Association
The strength of association is:

$r^2 = \frac{SS_{reg}}{SS_y}$

It answers the question: "What percentage of the total variation in Y is explained by X?"

47 Test for Significance
The statistical significance of the linear relationship between X and Y may be tested by examining the hypotheses:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

A t statistic can be used, where t = b/SEb. SEb denotes the standard deviation of b and is called the standard error.

48 Illustration of Bivariate Regression
The regression of attitude on duration of residence, using the data shown in Table 17.1, yielded the intercept a and slope b reported in Table 17.2 (the numeric values are dropped in this transcript). The estimated equation is: Attitude ($\hat{Y}$) = a + b (Duration of residence). Dividing b by its standard error gives t = b/SEb = 8.414. The p-value corresponding to the calculated t is smaller than α = 0.05, so the null hypothesis is rejected.

49 Bivariate Regression
Table 17.2 (numeric values not reproduced in this transcript) reports the multiple R, R², adjusted R², and standard error; an analysis-of-variance section giving df, sum of squares, and mean square for the regression and the residual, together with F and the significance of F; and, for the variables in the equation (Duration and the constant), b, SEb, beta (ß), T, and the significance of T.

50 Strength and Significance of Association
The predicted values ($\hat{Y}$) can be calculated from the estimated equation, Attitude ($\hat{Y}$) = a + b (Duration of residence). For the first observation in Table 17.1, with a duration of residence of 10, this value is a + b × 10. Obtaining the predicted value for each observation gives SSreg = 105.95, so R² = 105.95/SSy = 0.8762, which implies SSy ≈ 120.92 and SSres ≈ 14.97.

51 Strength and Significance of Association
Another, equivalent test for examining the significance of the linear relationship between X and Y (significance of b) is the test for the significance of the coefficient of determination. The hypotheses in this case are:

$H_0: R^2_{pop} = 0$
$H_1: R^2_{pop} > 0$

52 Strength and Significance of Association
The appropriate test statistic is the F statistic, which has an F distribution (with 1 and n - 2 degrees of freedom in the bivariate case). The p-value corresponding to the F statistic is 0.0000. Therefore, the relationship is significant at the α = 0.05 level, corroborating the result of the t test.

53 Assumptions
The error term is normally distributed.
The mean of the error term is 0.
The variance of the error term is constant. This variance does not depend on the values assumed by X.
The error terms are uncorrelated. In other words, the observations have been drawn independently.

54 Multiple Regression
The general form of the multiple regression model is as follows:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + e$

which is estimated by the following equation:

$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_k X_k$

As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients. (A statsmodels sketch follows below.)
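A short statsmodels sketch of such a model with two predictors, anticipating the attitude example on the next slides; the data are illustrative, not Table 17.1's actual values:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data: attitude explained by duration and importance of weather
df = pd.DataFrame({
    "attitude":   [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2],
    "duration":   [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2],
    "importance": [3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5],
})

model = smf.ols("attitude ~ duration + importance", data=df).fit()
print(model.params)                  # a, b1, b2 (partial regression coefficients)
print(model.rsquared, model.rsquared_adj)
print(model.fvalue, model.f_pvalue)  # overall F test: H0 that all betas are 0
print(model.tvalues)                 # individual t tests for each coefficient
```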

55 Statistics Associated with Multiple Regression
Coefficient of multiple determination. The strength of association is measured by R², the coefficient of multiple determination.
Adjusted R². R² adjusted for the number of independent variables and the sample size.
F test. The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, R²pop, is zero. The test statistic has an F distribution.

56 The Multiple Regression Equation
For the data in Table 17.1, suppose we want to explain 'Attitude Toward the City' by 'Duration of Residence' and 'Importance of Weather'. From Table 17.3, the estimated regression equation is

$\hat{Y} = a + b_1 X_1 + b_2 X_2$, or Attitude = a + b1 (Duration) + b2 (Importance)

with coefficient values as reported in Table 17.3 (not reproduced in this transcript).

57 Multiple Regression
Table 17.3 (numeric values not reproduced in this transcript) reports the multiple R, R², adjusted R², and standard error; an analysis-of-variance section giving df, sum of squares, and mean square for the regression and the residual, together with F and the significance of F; and, for the variables in the equation (Importance, Duration, and the constant), b, SEb, beta (ß), T, and the significance of T.

58 Strength of Association
The strength of association is measured by R², as in the bivariate case. R² adjusted for the number of independent variables and the sample size is called the adjusted R².

59 Conducting Multiple Regression Analysis: Significance Testing
$H_0: R^2_{pop} = 0$. This is equivalent to the following null hypothesis:

$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$

The overall test (for all the βi's collectively) can be conducted by using an F statistic, which has an F distribution. Testing for the significance of the individual βi's can be done in a manner similar to that in the bivariate case, by using t tests.

60 Multicollinearity
Multicollinearity arises when intercorrelations among the predictors are very high. Multicollinearity can result in several problems:
The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
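One common diagnostic, not shown on these slides, is the variance inflation factor (VIF), which measures how much a coefficient's variance is inflated by the other predictors. A sketch with statsmodels, using a deliberately collinear made-up predictor:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up predictors; near_copy is almost identical to duration,
# so VIF should flag the collinearity
rng = np.random.default_rng(0)
duration = rng.uniform(1, 20, size=50)
near_copy = duration + rng.normal(scale=0.5, size=50)
importance = rng.uniform(1, 11, size=50)

X = sm.add_constant(pd.DataFrame({
    "duration": duration,
    "near_copy": near_copy,
    "importance": importance,
}))

# Rule of thumb: a VIF well above 10 signals problematic multicollinearity
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
```

Dropping or combining near-duplicate predictors (here, near_copy) is the usual remedy.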

61 Relative Importance of Predictors
Statistical significance. If the partial regression coefficient of a variable is not significant, that variable is judged to be unimportant.
Measures based on standardized coefficients or beta weights. The most commonly used measures are the absolute values of the beta weights, |Bi|, or the squared values, Bi². (A sketch follows below.)
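Beta weights can be obtained by fitting the regression on z-scored variables, since standardizing Y and the X's makes the slopes scale-free. A minimal sketch on made-up data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data; fitting on z-scored variables yields the standardized
# coefficients (beta weights) directly
df = pd.DataFrame({
    "attitude":   [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2],
    "duration":   [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2],
    "importance": [3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5],
})
z = (df - df.mean()) / df.std()  # z-score every column

betas = smf.ols("attitude ~ duration + importance", data=z).fit().params
print(betas.drop("Intercept").abs().sort_values(ascending=False))  # |beta| ranking
```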

