Analysis of Variance and Covariance

Chapter Outline
1) Overview
2) Relationship Among Techniques
3) One-Way Analysis of Variance
4) Statistics Associated with One-Way Analysis of Variance
5) Conducting One-Way Analysis of Variance
   i. Identification of Dependent and Independent Variables
   ii. Decomposition of the Total Variation
   iii. Measurement of Effects
   iv. Significance Testing
   v. Interpretation of Results

Chapter Outline
6) Illustrative Applications of One-Way Analysis of Variance
7) Assumptions in Analysis of Variance
8) N-Way Analysis of Variance
9) Analysis of Covariance
10) Issues in Interpretation
   i. Interactions
   ii. Relative Importance of Factors
   iii. Multiple Comparisons
11) Multivariate Analysis of Variance

Relationship Among Techniques Analysis of variance (ANOVA) is used as a test of means for two or more populations. The null hypothesis, typically, is that all means are equal. Analysis of variance must have a dependent variable that is metric (measured using an interval or ratio scale). There must also be one or more independent variables that are all categorical (nonmetric). Categorical independent variables are also called factors.

Relationship Among Techniques
A particular combination of factor levels, or categories, is called a treatment. One-way analysis of variance involves only one categorical variable, or a single factor. Here a treatment is the same as a factor level. If two or more factors are involved, the analysis is termed n-way analysis of variance. If the set of independent variables consists of both categorical and metric variables, the technique is called analysis of covariance (ANCOVA). The metric independent variables are referred to as covariates.

Figure 16.1: Relationship Among t Test, Analysis of Variance, Analysis of Covariance, and Regression

One-Way Analysis of Variance
Marketing researchers are often interested in examining the differences in the mean values of the dependent variable for several categories of a single independent variable or factor. For example:
Do the various segments differ in terms of their volume of product consumption?
Do the brand evaluations of groups exposed to different commercials vary?
What is the effect of consumers' familiarity with the store (measured as high, medium, and low) on preference for the store?

Statistics Associated with One-Way Analysis of Variance
eta squared (η²). The strength of the effect of X (independent variable or factor) on Y (dependent variable) is measured by η². The value of η² varies between 0 and 1.
F statistic. The null hypothesis that the category means are equal is tested by an F statistic, which is based on the ratio of the variance between groups to the variance within groups. Each variance is captured by a mean square: the corresponding sum of squares divided by its degrees of freedom.

Statistics Associated with One-Way Analysis of Variance
SS_between. Also denoted as SS_x, this is the variation in Y related to the variation in the means of the categories of X. It is the variation in Y accounted for by X.
SS_within. Also referred to as SS_error, this is the variation in Y due to the variation within each of the categories of X. This variation is not accounted for by X.
SS_y. The total variation in Y.

Conducting One-Way ANOVA (Fig. 16.2)
1. Identify the dependent and independent variables
2. Decompose the total variation
3. Measure the effects
4. Test the significance
5. Interpret the results

Conducting One-Way ANOVA: Decomposing the Total Variation
The total variation in Y may be decomposed as SS_y = SS_x + SS_error, where

SS_y = \sum_{i=1}^{N} (Y_i - \bar{Y})^2
SS_x = \sum_{j=1}^{c} n (\bar{Y}_j - \bar{Y})^2
SS_error = \sum_{j=1}^{c} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2

Y_i = individual observation
\bar{Y}_j = mean for category j
\bar{Y} = mean over the whole sample, or grand mean
Y_{ij} = ith observation in the jth category
(c categories, n observations per category, N observations in total)

Conducting One-Way ANOVA: Decomposition of the Total Variation (Table 16.1)
The independent variable X has categories X_1, X_2, X_3, ..., X_c, with the total sample as the final column. Each category contains observations Y_1 through Y_n (Y_1 through Y_N in the total column), with category means \bar{Y}_1, \bar{Y}_2, ..., \bar{Y}_c and grand mean \bar{Y}.
Within-category variation = SS_within; between-category variation = SS_between; total variation = SS_y.

Conducting One-Way ANOVA: Measure Effects and Test Significance
In one-way analysis of variance, we test the null hypothesis that the category means are equal in the population:

H_0: \mu_1 = \mu_2 = \mu_3 = ... = \mu_c

The null hypothesis may be tested by the F statistic:

F = [SS_x / (c - 1)] / [SS_error / (N - c)]

This statistic follows the F distribution with (c - 1) and (N - c) degrees of freedom (df). The effect of X on Y is measured by η² = SS_x / SS_y.
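To make the decomposition concrete, here is a minimal Python sketch that computes SS_x, SS_error, η², and F for three hypothetical promotion levels. The sales figures are invented for illustration (they are not the textbook's Table 16.2 data), and scipy's f_oneway appears only as a cross-check.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical sales for three levels of in-store promotion
# (illustrative numbers, not the textbook's data).
groups = {
    "high":   np.array([10.0, 9.0, 10.0, 8.0, 9.0]),
    "medium": np.array([8.0, 8.0, 7.0, 9.0, 6.0]),
    "low":    np.array([5.0, 7.0, 6.0, 4.0, 5.0]),
}

y_all = np.concatenate(list(groups.values()))
grand_mean = y_all.mean()
N, c = y_all.size, len(groups)

# Decompose the total variation: SS_y = SS_x + SS_error.
ss_y = ((y_all - grand_mean) ** 2).sum()
ss_x = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

eta_sq = ss_x / ss_y                              # strength of the effect of X on Y
f_stat = (ss_x / (c - 1)) / (ss_error / (N - c))  # F with (c - 1, N - c) df

print(f"SS_y={ss_y:.2f}  SS_x={ss_x:.2f}  SS_error={ss_error:.2f}")
print(f"eta^2={eta_sq:.3f}  F={f_stat:.2f} on ({c - 1}, {N - c}) df")
print(f_oneway(*groups.values()))                 # cross-check: same F, plus a p-value
```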

Conducting One-Way ANOVA: Interpret the Results If the null hypothesis of equal category means is not rejected, then the independent variable does not have a significant effect on the dependent variable. On the other hand, if the null hypothesis is rejected, then the effect of the independent variable is significant. A comparison of the category mean values will indicate the nature of the effect of the independent variable.

Illustrative Applications of One-Way ANOVA
We illustrate the concepts discussed in this chapter using the data presented in Table 16.2. The department store chain is attempting to determine the effect of in-store promotion (X) on sales (Y). The null hypothesis is that the category means are equal: H_0: \mu_1 = \mu_2 = \mu_3.

Table 16.2: Effect of Promotion and Clientele on Sales (data values not preserved in the transcript)

Table 16.4: One-Way ANOVA — Effect of In-Store Promotion on Store Sales
Cell means: level of promotion (high = 1, medium = 2, low = 3), with count and mean for each level and for the total.
ANOVA table columns: source of variation (between groups [promotion], within groups [error], total), sum of squares, df, mean square, F ratio, F probability. (Numeric values not preserved in the transcript.)

Assumptions in Analysis of Variance
1. The error term is normally distributed, with a mean of zero.
2. The error term has a constant variance.
3. The error is not related to any of the categories of X.
4. The error terms are uncorrelated. If the error terms are correlated, the F ratio can be distorted.

N-Way Analysis of Variance
In marketing research, one is often concerned with the effect of more than one factor simultaneously. For example:
How do advertising levels (high, medium, and low) interact with price levels (high, medium, and low) to influence a brand's sales?
Do educational levels (less than high school, high school graduate, some college, and college graduate) and age (less than 35, 35-55, more than 55) affect consumption of a brand?
What is the effect of consumers' familiarity with a department store (high, medium, and low) and store image (positive, neutral, and negative) on preference for the store?

N-Way Analysis of Variance
Consider two factors, X_1 and X_2, having c_1 and c_2 categories, respectively. The total variation splits as:

SS_y = SS_{x1} + SS_{x2} + SS_{x1x2} + SS_error

where SS_{x1x2} is the sum of squares due to the interaction of X_1 and X_2. The significance of the overall effect is tested by an F test:

F = [(SS_{x1} + SS_{x2} + SS_{x1x2}) / df_n] / [SS_error / df_d]

where
df_n = degrees of freedom for the numerator = c_1 c_2 - 1
df_d = degrees of freedom for the denominator = N - c_1 c_2

N-Way Analysis of Variance
If the overall effect is significant, the next step is to examine the significance of the interaction effect. Under the null hypothesis of no interaction, the appropriate F test is:

F = [SS_{x1x2} / df_n] / [SS_error / df_d]

where
df_n = (c_1 - 1)(c_2 - 1)
df_d = N - c_1 c_2

N-Way Analysis of Variance
The significance of the main effect of each factor may be tested as follows, for X_1:

F = [SS_{x1} / df_n] / [SS_error / df_d]

where
df_n = c_1 - 1
df_d = N - c_1 c_2
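The overall, interaction, and main-effect F tests above can be reproduced with statsmodels' formula interface. This is a sketch under assumptions: the factor names (promotion, coupon), cell counts, and effect sizes are all invented, not the chapter's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical balanced design: 3 promotion levels x 2 coupon levels,
# 5 observations per cell (all values invented for illustration).
rng = np.random.default_rng(0)
promotion = np.repeat(["high", "medium", "low"], 10)
coupon = np.tile(np.repeat(["yes", "no"], 5), 3)
base = {"high": 9.0, "medium": 7.0, "low": 5.0}
sales = np.array([base[p] + (1.5 if cp == "yes" else 0.0)
                  for p, cp in zip(promotion, coupon)]) + rng.normal(0, 1, 30)
df = pd.DataFrame({"sales": sales, "promotion": promotion, "coupon": coupon})

# 'C(...)' marks a categorical factor; '*' expands to both main effects
# plus the promotion x coupon interaction.
model = smf.ols("sales ~ C(promotion) * C(coupon)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # sum of squares, df, F, p for each term
```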

Table 16.5: Two-Way Analysis of Variance
Columns: source of variation, sum of squares, df, mean square, F, significance of F.
Rows: main effects (promotion, coupon, combined), two-way interaction, model, residual (error), total. (Numeric values not preserved in the transcript.)

Table 16.5 (cont.): Cell Means
Rows: promotion (high, medium, low) crossed with coupon (yes, no), each with count and mean; total count = 30.
Factor level means: promotion (high, medium, low) and coupon (yes, no), each with count and mean, plus the grand mean. (Numeric values not preserved in the transcript.)

Analysis of Covariance
When examining the differences in the mean values of the dependent variable, it is often necessary to take into account the influence of uncontrolled independent variables. For example:
In determining how different groups exposed to different commercials evaluate a brand, it may be necessary to control for prior knowledge.
In determining how different price levels will affect a household's cereal consumption, it may be essential to take household size into account.
Suppose that we wanted to determine the effect of in-store promotion and couponing on sales while controlling for the effect of clientele. The results are shown in Table 16.6.
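A hedged sketch of the ANCOVA just described: promotion and coupon enter as categorical factors and clientele as a metric covariate. All variable names and values are hypothetical, not the Table 16.6 data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 30
df = pd.DataFrame({
    "promotion": np.repeat(["high", "medium", "low"], n // 3),
    "coupon": np.tile(["yes", "no"], n // 2),
    "clientele": rng.normal(5, 1, n),  # affluence rating: the metric covariate
})
df["sales"] = (df["promotion"].map({"high": 9.0, "medium": 7.0, "low": 5.0})
               + (df["coupon"] == "yes") * 1.5
               + 0.4 * df["clientele"]
               + rng.normal(0, 1, n))

# ANCOVA: the covariate enters the model alongside the factors, so the
# factor effects are assessed after removing the covariate's influence.
fit = smf.ols("sales ~ clientele + C(promotion) + C(coupon)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```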

Table 16.6: Analysis of Covariance
Columns: source of variation, sum of squares, df, mean square, F, significance of F.
Rows: covariate (clientele), main effects (promotion, coupon, combined), two-way interaction (promotion × coupon), model, residual (error), total; plus the raw coefficient for the clientele covariate. (Numeric values not preserved in the transcript.)

Issues in Interpretation
Important issues involved in the interpretation of ANOVA results include interactions, relative importance of factors, and multiple comparisons.
Interactions: The different interactions that can arise when conducting ANOVA on two or more factors are shown in Figure 16.3.
Relative importance of factors: It is important to determine the relative importance of each factor in explaining the variation in the dependent variable.

A Classification of Interaction Effects (Fig. 16.3)
Possible interaction effects: no interaction (Case 1) or interaction.
Interaction may be ordinal (Case 2) or disordinal.
Disordinal interaction may be noncrossover (Case 3) or crossover (Case 4).

Patterns of Interaction (Fig. 16.4): four panels plotting Y against the levels of X_1 for two levels of X_2 (X_21, X_22). Case 1: no interaction; Case 2: ordinal interaction; Case 3: disordinal interaction, noncrossover; Case 4: disordinal interaction, crossover.

Issues in Interpretation: Multiple Comparisons
If the null hypothesis of equal means is rejected, we can only conclude that not all of the group means are equal. We may wish to examine differences among specific means by specifying appropriate contrasts, or comparisons used to determine which of the means are statistically different.
A priori contrasts are determined before conducting the analysis, based on the researcher's theoretical framework.

Issues in Interpretation: Multiple Comparisons A posteriori contrasts are made after the analysis. These are generally multiple comparison tests. They enable the researcher to construct generalized confidence intervals that can be used to make pairwise comparisons of all treatment means.
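One common a posteriori procedure is Tukey's HSD, which constructs exactly these simultaneous confidence intervals for all pairwise mean differences. A minimal sketch with invented sales data grouped by promotion level, using statsmodels' pairwise_tukeyhsd:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical sales grouped by promotion level (illustrative data).
sales = np.array([10, 9, 10, 8, 9,   8, 8, 7, 9, 6,   5, 7, 6, 4, 5], dtype=float)
level = ["high"] * 5 + ["medium"] * 5 + ["low"] * 5

# Tukey's HSD: simultaneous confidence intervals for every pairwise
# difference of treatment means, with a familywise alpha of 0.05.
print(pairwise_tukeyhsd(endog=sales, groups=level, alpha=0.05))
```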

Multivariate Analysis of Variance Multivariate analysis of variance (MANOVA) is similar to analysis of variance (ANOVA), except that instead of one metric dependent variable, we have two or more. In MANOVA, the null hypothesis is that the vectors of means on multiple dependent variables are equal across groups. Multivariate analysis of variance is appropriate when there are two or more dependent variables that are correlated.
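A minimal MANOVA sketch using statsmodels; the two correlated dependent variables (quality, value) and the group labels are simulated for illustration. mv_test() reports the usual multivariate criteria (Wilks' lambda, Pillai's trace, and so on).

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Two correlated dependent variables across three groups; all values
# are simulated for illustration.
rng = np.random.default_rng(2)
group = np.repeat(["a", "b", "c"], 10)
quality = np.repeat([0.0, 0.5, 1.0], 10) + rng.normal(0, 1, 30)
value = 0.6 * quality + rng.normal(0, 1, 30)
df = pd.DataFrame({"group": group, "quality": quality, "value": value})

# Null hypothesis: the mean vector (quality, value) is equal across groups.
print(MANOVA.from_formula("quality + value ~ group", data=df).mv_test())
```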

Regression Analysis

Chapter Outline
1) Correlations
2) Bivariate Regression
3) Statistics Associated with Bivariate Regression
4) Conducting Bivariate Regression Analysis
   i. Scatter Diagram
   ii. Bivariate Regression Model
   iii. Estimation of Parameters
   iv. Standardized Regression Coefficient
   v. Significance Testing

Chapter Outline
   vi. Strength and Significance of Association
   vii. Assumptions
5) Multiple Regression
6) Statistics Associated with Multiple Regression
7) Conducting Multiple Regression
   i. Partial Regression Coefficients
   ii. Strength of Association
   iii. Significance Testing
8) Multicollinearity
9) Relative Importance of Predictors

Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y. It is an index used to determine whether a linear or straight-line relationship exists between X and Y.

r = Cov(X, Y) / (S_X S_Y)

r varies between -1.0 and +1.0. The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.

Table 17.1: Explaining Attitude Toward the City of Residence (attitude toward the city, duration of residence, and importance attached to weather for 12 respondents; values not preserved in the transcript)

Product Moment Correlation
When it is computed for a population rather than a sample, the product moment correlation is denoted by \rho, the Greek letter rho. The coefficient r is an estimator of \rho.
The statistical significance of the relationship between two variables measured by using r can be conveniently tested. The hypotheses are:

H_0: \rho = 0
H_1: \rho \neq 0

Significance of Correlation
The test statistic is

t = r \sqrt{(n - 2) / (1 - r^2)}

which has a t distribution with n - 2 degrees of freedom.
The r between 'Attitude toward the city' and 'Duration of residence' is 0.9361. The value of the t statistic is 8.414, with df = 12 - 2 = 10. From the t table (Table 4 in the Statistical Appendix), the critical value of t for a two-tailed test at \alpha = 0.05 is 2.228. Hence, the null hypothesis of no relationship between X and Y is rejected.
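The same test in Python, as a sketch: scipy's pearsonr returns r with its two-tailed p-value, and the t statistic is recomputed from the formula above. The 12 observations are illustrative stand-ins for Table 17.1, not the table itself.

```python
import numpy as np
from scipy import stats

# Twelve hypothetical observations standing in for Table 17.1.
duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2], dtype=float)
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2], dtype=float)

r, p = stats.pearsonr(duration, attitude)  # r and its two-tailed p-value
n = duration.size
t = r * np.sqrt((n - 2) / (1 - r ** 2))    # t statistic with n - 2 df
t_crit = stats.t.ppf(0.975, df=n - 2)      # two-tailed critical value, alpha = 0.05

print(f"r={r:.4f}  t={t:.3f}  critical t={t_crit:.3f}  p={p:.4f}")
```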

Partial Correlation
A partial correlation coefficient measures the association between two variables after controlling for the effects of one or more additional variables.
Partial correlations have an order associated with them. The order indicates how many variables are being adjusted or controlled.

Partial Correlation The coefficient r xy.z is a first-order partial correlation coefficient, as it controls for the effect of one additional variable, Z. A second-order partial correlation coefficient controls for the effects of two variables, a third-order for the effects of three variables, and so on.
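A first-order partial correlation can be computed directly from the three pairwise correlations. A minimal sketch (the formula is the standard one; the data are simulated so that x and y correlate mainly through z):

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z: the association of x and y
    after removing the linear effect of z from both."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = np.random.default_rng(3)
z = rng.normal(size=100)
x = z + rng.normal(scale=0.5, size=100)
y = z + rng.normal(scale=0.5, size=100)

# x and y correlate strongly through z; controlling for z removes most of it.
print(np.corrcoef(x, y)[0, 1], partial_corr(x, y, z))
```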

Regression Analysis
Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables.
Predict the values of the dependent variable.
Control for other independent variables when evaluating the contributions of a specific variable.
Regression analysis is concerned with the nature and degree of association between variables and does not imply or assume any causality.

Statistics Associated with Bivariate Regression Analysis
Regression model: Y_i = \beta_0 + \beta_1 X_i + e_i, where Y is the dependent variable, X the independent variable, \beta_0 the intercept of the line, \beta_1 the slope of the line, and e_i the error term for the ith observation.
Coefficient of determination, r². Measures the strength of association. It varies between 0 and 1 and signifies the proportion of the variation in Y accounted for by the variation in X.
Estimated or predicted value: \hat{Y}_i = a + b X_i, where \hat{Y}_i is the predicted value of Y_i, and a and b are estimators of \beta_0 and \beta_1, respectively.

Statistics Associated with Bivariate Regression Analysis
Regression coefficient. The estimated parameter b is usually referred to as the non-standardized regression coefficient.
Standard error of estimate. This statistic is the standard deviation of the actual Y values from the predicted \hat{Y} values.
Standard error. The standard deviation of b, SE_b, is called the standard error.

Statistics Associated with Bivariate Regression Analysis
Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized.
Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, \sum e_j^2, which is a measure of total error.
t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between X and Y.

Idea Behind Estimating the Regression Equation
A scatter diagram, or scattergram, is a plot of the values of two variables.
The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure: in fitting the line, it minimizes the sum of squared errors, \sum e_j^2.
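A minimal least-squares sketch on invented data: the slope and intercept come from the usual closed-form solution, which is exactly the choice that minimizes the sum of squared errors.

```python
import numpy as np

# Invented bivariate data for the sketch.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

# Closed-form least-squares estimates: b = Cov(X, Y) / Var(X), a = Ybar - b*Xbar.
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
e = y - (a + b * x)  # residuals

print(f"y-hat = {a:.3f} + {b:.3f} x   sum of squared errors = {(e ** 2).sum():.3f}")
```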

Conducting Bivariate Regression Analysis
1. Plot the scatter diagram
2. Formulate the general model
3. Estimate the parameters
4. Estimate standardized regression coefficients
5. Test for significance
6. Determine the strength and significance of association

Figure: Plot of Attitude with Duration of Residence (scatter diagram; duration of residence on the horizontal axis, attitude on the vertical axis)

Figure: Which Straight Line Is Best? (four candidate lines, numbered 1 through 4, drawn through the same scattergram)

Figure: Decomposing the Total Variation — for each observation, the vertical distances between the Y values, the fitted line, and \bar{Y} illustrate the total variation SS_y, the explained variation SS_reg, and the residual variation SS_res.

Decomposing the Total Variation
The total variation, SS_y, may be decomposed into the variation accounted for by the regression line, SS_reg, and the error or residual variation, SS_error or SS_res, as follows:

SS_y = SS_reg + SS_res

where
SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
SS_reg = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
SS_res = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

Strength and Significance of Association
The strength of association answers the question, "What percentage of the total variation in Y is explained by X?":

R^2 = SS_reg / SS_y

Test for Significance
The statistical significance of the linear relationship between X and Y may be tested by examining the hypotheses:

H_0: \beta_1 = 0
H_1: \beta_1 \neq 0

A t statistic with n - 2 degrees of freedom can be used:

t = b / SE_b

where SE_b denotes the standard deviation of b and is called the standard error.

Standardized Regression Coefficient
Standardization is the process by which the raw data are transformed into new variables having a mean of 0 and a variance of 1. When the data are standardized, the intercept assumes a value of 0.
The term beta coefficient or beta weight is used to denote the standardized regression coefficient, B_yx. There is a simple relationship between the standardized and non-standardized regression coefficients:

B_yx = b_yx (S_x / S_y)
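A quick numerical check of this relationship on invented data; in the bivariate case the beta weight also equals the correlation r.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta = b * x.std(ddof=1) / y.std(ddof=1)  # B_yx = b_yx (S_x / S_y)

# In bivariate regression the beta weight equals the correlation r.
print(beta, np.corrcoef(x, y)[0, 1])
```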

Illustration of Bivariate Regression
The regression of attitude on duration of residence, using the data shown in Table 17.1, yielded the results shown in Table 17.2: a = 4.2846, b = 0.5897. The estimated equation is:

\hat{Y} (Attitude) = 4.2846 + 0.5897 (Duration of residence)

The standard error, or standard deviation, of b is 0.07008, and t = 0.5897 / 0.07008 = 8.414, with n - 2 = 10 df.
From Table 4 in the Statistical Appendix, we see that the critical value of t with 10 df and \alpha = 0.05 is 2.228 for a two-tailed test. Since the calculated value of t is larger than the critical value, the null hypothesis is rejected.

Table 17.2: Bivariate Regression
Summary: multiple R, R², adjusted R², standard error of estimate.
Analysis of variance: regression and residual rows with df, sum of squares, and mean square; overall F and its significance.
Variables in the equation: b, SE_b, beta (\beta), t, and significance of t for Duration and the constant. (Numeric values not preserved in the transcript.)

Strength and Significance of Association
The predicted values (\hat{Y}) can be calculated from \hat{Y} (Attitude) = 4.2846 + 0.5897 (Duration of residence). For the first observation in Table 17.1, this value is 4.2846 + 0.5897 × 10 = 10.1816. A predicted value can be obtained in the same way for each observation. Using these,

SS_reg = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = 105.95
SS_res = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = 14.96
R^2 = 105.95 / (105.95 + 14.96) ≈ 0.8762

Strength and Significance of Association
Another, equivalent test for examining the significance of the linear relationship between X and Y (the significance of b) is the test for the significance of the coefficient of determination. The hypotheses in this case are:

H_0: R^2_pop = 0
H_1: R^2_pop > 0

Strength and Significance of Association
The appropriate test statistic is the F statistic:

F = SS_reg / [SS_res / (n - 2)]

which has an F distribution with 1 and n - 2 degrees of freedom. The value of the F statistic here is:

F = 105.95 / (14.96 / 10) ≈ 70.80

with 1 and 10 degrees of freedom. The calculated F statistic exceeds the critical value of 4.96 determined from Table 5 in the Statistical Appendix. Therefore, the relationship is significant at \alpha = 0.05, corroborating the results of the t test.
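A sketch tying these pieces together on invented data: scipy's linregress supplies the fit, and SS_reg, SS_res, R², F, and t are recomputed from their definitions. Note that F = t² in the bivariate case, which is why the two tests agree.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

fit = stats.linregress(x, y)              # slope, intercept, r, p, stderr
y_hat = fit.intercept + fit.slope * x
n = x.size

ss_reg = ((y_hat - y.mean()) ** 2).sum()  # explained variation
ss_res = ((y - y_hat) ** 2).sum()         # residual variation
r_sq = ss_reg / (ss_reg + ss_res)         # coefficient of determination
f_stat = ss_reg / (ss_res / (n - 2))      # F with 1 and n - 2 df
t_stat = fit.slope / fit.stderr           # t with n - 2 df

print(f"R^2={r_sq:.4f}  F={f_stat:.2f}  t={t_stat:.2f}  t^2={t_stat ** 2:.2f}")
```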

Assumptions
1. The error term is normally distributed. For each fixed value of X, the distribution of Y is normal.
2. The means of all these normal distributions of Y, given X, lie on a straight line with slope b.
3. The mean of the error term is 0.
4. The variance of the error term is constant. This variance does not depend on the values assumed by X.
5. The error terms are uncorrelated. In other words, the observations have been drawn independently.

Multiple Regression
The general form of the multiple regression model is:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + ... + \beta_k X_k + e

which is estimated by the following equation:

\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + ... + b_k X_k

As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients.

Statistics Associated with Multiple Regression
Coefficient of multiple determination. The strength of association is measured by R².
Adjusted R². R², the coefficient of multiple determination, adjusted for the number of independent variables and the sample size.
F test. The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, R²_pop, is zero. The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.

Statistics Associated with Multiple Regression
Partial regression coefficient. The partial regression coefficient, b_1, denotes the change in the predicted value, \hat{Y}, per unit change in X_1 when the other independent variables, X_2 through X_k, are held constant.
Suppose one were to remove the effect of X_2 from X_1. This could be done by running a regression of X_1 on X_2: estimate the equation \hat{X}_1 = a + b X_2 and calculate the residual X_r = X_1 - \hat{X}_1. The partial regression coefficient, b_1, is then equal to the bivariate regression coefficient, b_r, obtained from the equation \hat{Y} = a + b_r X_r. A sketch of this equivalence appears below.
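A sketch of the partialling-out equivalence on simulated data (all names and coefficients are invented): the slope on X_1 in the full regression matches the slope from regressing Y on the residual X_r.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)         # x1 is correlated with x2
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

# Full multiple regression: the coefficient on x1 is the partial
# regression coefficient b1.
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Partialling out: regress x1 on x2, keep the residual x_r, then
# regress y on x_r alone.
x_r = x1 - sm.OLS(x1, sm.add_constant(x2)).fit().fittedvalues
partial = sm.OLS(y, sm.add_constant(x_r)).fit()

print(full.params[1], partial.params[1])   # the two slopes agree
```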

The Multiple Regression Equation
For the data in Table 17.1, suppose we want to explain 'Attitude Toward the City' by 'Duration of Residence' and 'Importance Attached to Weather'. From Table 17.3, the estimated regression equation is:

\hat{Y} = 0.33732 + 0.48108 X_1 + 0.28865 X_2

or

Attitude = 0.33732 + 0.48108 (Duration) + 0.28865 (Importance)

Table 17.3: Multiple Regression
Summary: multiple R, R², adjusted R², standard error of estimate.
Analysis of variance: regression and residual rows with df, sum of squares, and mean square; overall F and its significance.
Variables in the equation: b, SE_b, beta (\beta), t, and significance of t for Importance, Duration, and the constant. (Numeric values not preserved in the transcript.)

Strength of Association
The strength of association is measured by R², as in the bivariate case:

R^2 = SS_reg / SS_y

R² is adjusted for the number of independent variables and the sample size by using the following formula:

Adjusted R^2 = R^2 - k(1 - R^2) / (n - k - 1)

Conducting Multiple Regression Analysis: Significance Testing
H_0: R^2_pop = 0, which is equivalent to the following null hypothesis:

H_0: \beta_1 = \beta_2 = \beta_3 = ... = \beta_k = 0

The overall test can be conducted by using an F statistic:

F = [SS_reg / k] / [SS_res / (n - k - 1)]

which has an F distribution with k and (n - k - 1) degrees of freedom.

Conducting Multiple Regression Analysis: Significance Testing
Testing for the significance of the \beta_i's can be done in a manner similar to that in the bivariate case, using t tests:

t = b / SE_b

which has a t distribution with n - k - 1 degrees of freedom.

Conducting Multiple Regression Analysis: Examination of Residuals
A residual is the difference between the observed value of Y_i and the value predicted by the regression equation, \hat{Y}_i.
Scattergrams of the residuals, in which the residuals are plotted against the predicted values \hat{Y}_i, time, or predictor variables, provide useful insights in examining the appropriateness of the underlying assumptions and regression model fit.
The assumption of a normally distributed error term can be examined by constructing a histogram of the residuals.
The assumption of constant variance of the error term can be examined by plotting the residuals against the predicted values of the dependent variable, \hat{Y}_i.

Conducting Multiple Regression Analysis: Examination of Residuals
A plot of residuals against time, or the sequence of observations, will throw some light on the assumption that the error terms are uncorrelated.
Plotting the residuals against the independent variables provides evidence of the appropriateness or inappropriateness of using a linear model. Again, the plot should result in a random pattern.
If an examination of the residuals indicates that the assumptions underlying linear regression are not met, the researcher can transform the variables in an attempt to satisfy the assumptions.

Multicollinearity Multicollinearity arises when intercorrelations among the predictors are very high. Multicollinearity can result in several problems, including: – The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high. – It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
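One common diagnostic for this condition, not covered in the slides above, is the variance inflation factor (VIF). A sketch using statsmodels on simulated, nearly collinear predictors; the predictor names are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # nearly collinear with x1
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))

# VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing predictor i
# on the other predictors; large values flag multicollinearity.
for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))
```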

Relative Importance of Predictors
Statistical significance. If the partial regression coefficient of a variable is not significant, that variable is judged to be unimportant.
Square of the partial correlation coefficient. This measure, R^2_{y x_i . x_j x_k}, is the coefficient of determination between the dependent variable and the independent variable, controlling for the effects of the other independent variables.
Measures based on standardized coefficients or beta weights. The most commonly used measures are the absolute values of the beta weights, |B_i|, or the squared values, B_i^2.

Cross-Validation
The available data are split into two parts, the estimation sample and the validation sample.
The regression model is estimated using the data from the estimation sample only.
The estimated model is applied to the data in the validation sample to predict the values of the dependent variable, \hat{Y}_i, for the observations in the validation sample.
The observed values, Y_i, and the predicted values, \hat{Y}_i, in the validation sample are correlated to determine the simple r². This measure, r², is compared to R² for the total sample and to R² for the estimation sample to assess the degree of shrinkage.
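A minimal sketch of this split-sample procedure on simulated data: fit by least squares on the estimation sample, predict in the validation sample, and compare the validation r² with the estimation-sample R² to gauge shrinkage.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)

# Split into an estimation sample and a validation sample.
idx = rng.permutation(n)
est, val = idx[:70], idx[70:]

# Fit by ordinary least squares on the estimation sample only.
b, *_ = np.linalg.lstsq(X[est], y[est], rcond=None)

r2_est = np.corrcoef(y[est], X[est] @ b)[0, 1] ** 2  # estimation-sample R^2
r2_val = np.corrcoef(y[val], X[val] @ b)[0, 1] ** 2  # validation-sample r^2

print(f"estimation R^2 = {r2_est:.3f}   validation r^2 = {r2_val:.3f}")
```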