Section 12.4 Multiple Regression Equations
HAWKES LEARNING SYSTEMS math courseware specialists
Copyright © 2008 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved.
Regression, Inference, and Model Building: 12.4 Multiple Regression Equations
Definitions: Multiple Regression Model – a linear regression model using two or more independent variables to predict a dependent variable.
ŷ = b0 + b1x1 + b2x2 + … + bkxk
where x1, x2, ..., xk = the k independent variables in the model, b1, b2, ..., bk = the corresponding coefficients of the independent variables, and b0 = the y-intercept.
To analyze multiple regression equations we will use an ANOVA table, as shown in the next example.
Analyze the data:
Columns: Child’s Age (in Years), Teacher’s Experience (in Years), Parents’ Education, Child’s Reading Level
6 5 13.1 1.3 7 10 14.5 2.2 8 16.1 3.7 9 3 12.5 4.1 2 11.8 4.9 11 4 11.1 5.2 12 1 10.2 13 12.6 7.1 14 12.1 8.5 15 14.3 9.7
Solution (continued): Enter the data into Microsoft Excel. Next, choose DATA ANALYSIS from the DATA menu. Choose REGRESSION from the options listed. Enter the necessary information as shown below.
Solution (continued): The results will be as follows:
[Excel regression output. Callouts on the output: b0, b1, b2, b3, Coefficient of Correlation, Coefficient of Determination, P-value]
Solution (continued): First, the Regression Statistics are calculated.
“Multiple R” is the correlation coefficient, so R ≈ 0.997.
“R Square” is the multiple coefficient of determination, R², and is analogous to the coefficient of determination r². Here R² ≈ 0.993, signifying that the multiple regression model fits the data very well.
“Adjusted R Square” is the adjusted coefficient of determination: the value of the multiple coefficient of determination adjusted for the number of independent variables and the sample size.
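The adjustment described for “Adjusted R Square” is usually computed as 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below applies that standard formula to the rounded R² from this example. The function name is hypothetical, and n = 10 observations and k = 3 independent variables are assumptions read off the example, so the result may differ from Excel’s value in the last digits.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjust R^2 for sample size n and number of independent variables k."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# R^2 is the rounded value quoted in the example.
# n = 10 and k = 3 are assumptions (ten children, three predictors).
adj = adjusted_r_squared(0.993, n=10, k=3)
print(round(adj, 4))
```

Because the adjustment penalizes extra independent variables, the adjusted value is never larger than R² itself.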
Solution (continued): The next block of data is the ANOVA table, which further analyzes how well the regression model fits the sample data. When using the ANOVA table to analyze the statistical significance of the linear relationship between the variables in a multiple regression, we test the claim that at least one of the independent variables’ coefficients is not equal to 0.
H0: b1 = b2 = … = bk = 0
Ha: At least one coefficient does not equal 0
If we reject H0, there is a significant linear relationship between these variables, and the regression model can be used for predictions.
Solution (continued): If we fail to reject H0, the sample data do not give sufficient evidence that any coefficient of the independent variables differs from zero. This implies that there is not a significant linear relationship between these variables, and therefore the regression model should not be used for predictions.
To test the null hypothesis, consider the p-value given under “Significance F”: p ≈ 0.00000062. Since p < α, we reject the null hypothesis and conclude that there is sufficient evidence at the 0.05 level of significance to support the claim that there is a significant linear relationship between the variables, so this multiple regression model can be used for predictions.
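The decision rule for the overall test can be written as a single comparison. This is a minimal sketch with a hypothetical function name; it takes the “Significance F” value from the Excel output as given rather than computing it.

```python
def overall_f_test_decision(significance_f: float, alpha: float = 0.05) -> str:
    """Compare the ANOVA p-value (Excel's "Significance F") with alpha."""
    if significance_f < alpha:
        return "reject H0"
    return "fail to reject H0"


# p-value quoted in the example: 0.00000062, i.e. 6.2e-07 in scientific notation
decision = overall_f_test_decision(6.2e-07)
print(decision)
```

Writing the value as 6.2e-07 makes it harder to miscount the leading zeros when comparing it with 0.05.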
Solution (continued): The last block of information gives the coefficients of the multiple regression equation and the confidence intervals for the coefficients of the independent variables and the y-intercept.
The first row (marked with pink) is the value and confidence interval for the y-intercept, b0 = –6.983.
The second row (marked with blue) is the value and confidence interval for the first independent variable, child’s age, b1 = 0.898.
The third row (marked with orange) is the value and confidence interval for the second independent variable, teacher’s experience, b2 = –0.033.
The fourth row (marked with green) is the value and confidence interval for the third independent variable, parents’ education, b3 = 0.232.
Solution (continued): Putting the values together, we can construct the multiple regression model for predicting a child’s reading level.
ŷ = –6.983 + 0.898x1 – 0.033x2 + 0.232x3
Now we can use the regression model to predict the reading level of a child who is 10 years old with a teacher who has 8 years of experience and parents with an average of 17.2 years of education.
ŷ = –6.983 + 0.898(10) – 0.033(8) + 0.232(17.2) ≈ 5.723
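The prediction above can be checked with a few lines of code. This is a minimal sketch: the function and variable names are hypothetical, and the coefficients are the rounded values quoted in the example.

```python
def predict(intercept, coefficients, values):
    """Evaluate y-hat = b0 + b1*x1 + ... + bk*xk for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


b0 = -6.983                      # y-intercept
b = [0.898, -0.033, 0.232]       # child's age, teacher's experience, parents' education
child = [10, 8, 17.2]            # the child described in the example

y_hat = predict(b0, b, child)
print(round(y_hat, 3))           # 5.723
```

Keeping the coefficients in the same order as the independent variables is the only bookkeeping needed; `zip` pairs each coefficient with its value.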
Solution (continued): Finally, we will consider the individual independent variables more closely. Look at the p-values for the independent variables. Each p-value tests the null hypothesis that the coefficient of a particular independent variable equals 0.
H0: bi = 0
Ha: bi ≠ 0
If p < α, we reject the null hypothesis.
A small p-value indicates that there is sufficient evidence to support the claim that the coefficient is not 0, and therefore that this particular variable has a statistically significant effect upon the dependent variable. Since the p-value for teacher’s experience, 0.535, is greater than 0.05, this variable may not be useful, and we could recalculate the multiple regression model without it.
H0: b1 = b2 = … = bk = 0
Ha: At least one coefficient does not equal 0
Significance F is the P-value. If P-value < α, the multiple regression model is statistically significant. Here P-value = 0.000003 < 0.05, so reject H0. The linear relationship between the X’s and Y is statistically significant.
Copy Data from Hawkes to Excel and Paste in cell A1.
H0: bi = 0
Ha: bi ≠ 0
If P-value < 0.05, the independent variable is statistically significant.
P-value (x1) = 0.443 > 0.05: NOT SIGNIFICANT, ELIMINATE
P-value (x2) = 0.00559 < 0.05: SIGNIFICANT, KEEP IT
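The keep-or-eliminate rule for individual variables can be sketched as a simple filter over the coefficient p-values. The function name is hypothetical, and the p-values are the two quoted in this example. After eliminating a variable, the regression would normally be rerun, since the remaining p-values change.

```python
from typing import Dict, List, Tuple


def screen_variables(p_values: Dict[str, float],
                     alpha: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split variables into (keep, eliminate) by comparing each p-value with alpha."""
    keep = [name for name, p in p_values.items() if p < alpha]
    eliminate = [name for name, p in p_values.items() if p >= alpha]
    return keep, eliminate


# p-values quoted in the example
keep, eliminate = screen_variables({"x1": 0.443, "x2": 0.00559})
print("keep:", keep)
print("eliminate:", eliminate)
```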
Copy Data from Hawkes to Excel and Paste in cell A1.
H0: bi = 0 Ha: bi ≠ 0
P-value = 0.0005079
Copy Data from Hawkes to Excel and Paste in cell A1.
H0: b1 = b2 = … = bk = 0 Ha: At least one coefficient does not equal 0
Coefficients can be used for prediction.
H0: bi = 0 Ha: bi ≠ 0: Reject, Keep Variable
P-value = 0.53983
Copy Data from Hawkes to Excel and Paste in cell A1.
H0: b1 = b2 = … = bk = 0 Ha: At least one coefficient does not equal 0
Coefficients have NO use for prediction.
H0: bi = 0 Ha: bi ≠ 0: Don’t Reject, Don’t Reject, Don’t Reject