Note 14 of 5E Statistics with Economics and Business Applications Chapter 12 Multiple Regression Analysis A brief exposition.


1 Note 14 of 5E Statistics with Economics and Business Applications Chapter 12 Multiple Regression Analysis A brief exposition

2 Introduction
We can use the same basic ideas as in simple linear regression to analyze relationships between a dependent variable and several independent variables.
Multiple regression is an extension of simple linear regression for investigating how a response $y$ is affected by several independent variables, $x_1, x_2, x_3, \ldots, x_k$.
Our objectives are to
– find relationships between $y$ and $x_1, x_2, x_3, \ldots, x_k$
– predict $y$ using $x_1, x_2, x_3, \ldots, x_k$

3 Example
Fatness ($y$) may depend on
– $x_1$ = age
– $x_2$ = sex
– $x_3$ = body type
Monthly sales ($y$) of a retail store may depend on
– $x_1$ = advertising expenditure
– $x_2$ = time of year
– $x_3$ = state of the economy
– $x_4$ = size of inventory

4 Some Questions
– Which of the independent variables are useful and which are not?
– How can we create a prediction equation that lets us predict $y$ from knowledge of $x_1, x_2, x_3$, etc.?
– How strong is the relationship between $y$ and the independent variables?
– How good is this prediction?

5 The General Linear Model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
– $y$ is the dependent variable.
– $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters.
– $x_1, x_2, \ldots, x_k$ are independent predictor variables.
– The deterministic part of the model, $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$, describes the average value of $y$ for any fixed values of $x_1, x_2, \ldots, x_k$.
– The observation $y$ deviates from the deterministic model by an amount $\varepsilon$; $\varepsilon$ is the random error.
– We assume the random errors are independent normal random variables with mean zero and constant variance $\sigma^2$.
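
The split between the deterministic part and the random error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter values, the error standard deviation, and the helper names `expected_response` and `simulate_response` are all hypothetical and do not come from the slides.

```python
import random

def expected_response(betas, x):
    """Deterministic part: E(y) = beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    return betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))

def simulate_response(betas, x, sigma, rng):
    """One observation y = E(y) + error, with error ~ Normal(0, sigma^2)."""
    return expected_response(betas, x) + rng.gauss(0.0, sigma)

# Hypothetical values chosen only for illustration.
betas = [10.0, 2.0, -1.5]   # beta_0, beta_1, beta_2
x = (4.0, 3.0)              # fixed values of x_1, x_2
sigma = 2.0                 # standard deviation of the random error

rng = random.Random(0)      # seeded so the run is repeatable
draws = [simulate_response(betas, x, sigma, rng) for _ in range(20000)]
average = sum(draws) / len(draws)

print("E(y) from the deterministic part:", expected_response(betas, x))
print("average of simulated observations:", round(average, 2))
```

Individual observations scatter around E(y), while their average over many draws settles close to it, which is what "describes the average value of y" means here.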

6 The Method of Least Squares
Data: $n$ observations on the response $y$ and the independent variables $x_1, x_2, x_3, \ldots, x_k$.
The best-fitting prediction equation is $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$.
We choose our estimates $b_0, b_1, \ldots, b_k$ to minimize $\mathrm{SSE} = \sum (y - \hat{y})^2$.
The computation is usually done by a computer.
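
To make "done by a computer" concrete, here is a minimal sketch of a least squares fit in plain Python. It solves the normal equations by Gauss-Jordan elimination, which is one standard way to find the estimates that minimize the sum of squared deviations between observed and predicted responses; statistical packages typically use more numerically stable methods. The function names and the toy data are hypothetical, not from the slides.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 = intercept
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(b, row):
    """Fitted value b0 + b1*x1 + ... + bk*xk for one observation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))

def sse(b, rows, y):
    """Sum of squared deviations between observed and predicted y."""
    return sum((yi - predict(b, row)) ** 2 for row, yi in zip(rows, y))

# Hypothetical toy data: six observations on two predictors.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [8.1, 5.9, 16.2, 13.8, 24.1, 21.9]

b = fit_least_squares(rows, y)
print("estimates:", [round(v, 3) for v in b])
print("SSE at the estimates:", round(sse(b, rows, y), 4))
```

Nudging any one estimate away from the fitted value and recomputing `sse` gives a larger sum, which is the defining property of the least squares solution.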

7 Steps in Regression Analysis
When you perform multiple regression analysis, use a step-by-step approach:
1. Fit the model to the data: estimate the parameters.
2. Use the analysis of variance F test and $R^2$ to determine how well the model fits the data.
3. Check the t tests for the partial regression coefficients to see which ones contribute significant information in the presence of the others.
4. Use diagnostic plots to check for violations of the regression assumptions.
5. Proceed to estimate or predict the quantity of interest.

8 Example
A data set contains the selling price $y$ (in thousands of dollars), the amount of living area $x_1$ (in hundreds of square feet), and the number of floors $x_2$, bedrooms $x_3$, and bathrooms $x_4$ for $n = 15$ randomly selected residences currently on the market.
Property      y   x1   x2   x3   x4
1          69.0    6    1    2    1
2         118.5   10    1    2    2
3         116.5   10    1    3    2
…             …    …    …    …    …
15        209.9   21    2    4    3

9 Minitab Output
Regression Analysis: ListPrice versus SqFeet, NumFlrs, Bdrms, Baths
The regression equation is
ListPrice = 18.8 + 6.27 SqFeet - 16.2 NumFlrs - 2.67 Bdrms + 30.3 Baths
Predictor      Coef  SE Coef      T      P
Constant     18.763    9.207   2.04  0.069
SqFeet       6.2698   0.7252   8.65  0.000
NumFlrs     -16.203    6.212  -2.61  0.026
Bdrms        -2.673    4.494  -0.59  0.565
Baths        30.271    6.849   4.42  0.001
Callouts: the "ListPrice = …" line is the regression equation; the Coef column holds the estimated regression coefficients.
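
As a rough illustration of how the fitted equation is used, the sketch below plugs one residence into it. The coefficients are copied from the Coef column of the Minitab output; the residence is Property 3 as read from the example data (y = 116.5, x1 = 10, x2 = 1, x3 = 3, x4 = 2). The function name `predict_list_price` is a hypothetical helper, not part of Minitab.

```python
# Coefficients copied from the Coef column of the Minitab output.
COEFFICIENTS = {
    "Constant": 18.763,
    "SqFeet": 6.2698,
    "NumFlrs": -16.203,
    "Bdrms": -2.673,
    "Baths": 30.271,
}

def predict_list_price(sq_feet, num_flrs, bdrms, baths):
    """Predicted list price in thousands of dollars (hypothetical helper)."""
    c = COEFFICIENTS
    return (c["Constant"]
            + c["SqFeet"] * sq_feet      # living area, hundreds of square feet
            + c["NumFlrs"] * num_flrs
            + c["Bdrms"] * bdrms
            + c["Baths"] * baths)

# Property 3 from the example data: observed price 116.5.
observed = 116.5
predicted = predict_list_price(sq_feet=10, num_flrs=1, bdrms=3, baths=2)
residual = observed - predicted

print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")
# prints: predicted = 117.8, residual = -1.3
```

The residual is the part of the observed price that the fitted equation does not account for; least squares makes the sum of the squared residuals over all 15 residences as small as possible.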

10 Minitab Output
S = 6.849   R-Sq = 97.1%   R-Sq(adj) = 96.0%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       4  15913.0  3978.3  84.80  0.000
Residual Error  10    469.1    46.9
Total           14  16382.2
Source   DF   Seq SS
SqFeet    1  14829.3
NumFlrs   1      0.9
Bdrms     1    166.4
Baths     1    916.5
Sequential sums of squares: the conditional contribution of each independent variable to SSR, given the variables already entered into the model.

11 Minitab Output
Is the overall model useful in predicting list price? How much of the overall variation in the response is explained by the regression model?
S = 6.849   R-Sq = 97.1%   R-Sq(adj) = 96.0%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       4  15913.0  3978.3  84.80  0.000
Residual Error  10    469.1    46.9
Total           14  16382.2
Source   DF   Seq SS
SqFeet    1  14829.3
NumFlrs   1      0.9
Bdrms     1    166.4
Baths     1    916.5
F = MSR/MSE = 84.80 with p-value = .000 is highly significant. The model is very useful in predicting the list price of homes.
$R^2 = .971$ indicates that 97.1% of the overall variation is explained by the regression model.
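
The summary statistics can be recomputed from the sums of squares in the analysis of variance table. The sketch below uses the standard formulas (MSR = SSR/k, MSE = SSE/(n − k − 1), adjusted R² = 1 − MSE/(Total SS/(n − 1))); the variable names are illustrative. Because the table entries are already rounded, the results agree with Minitab only up to rounding in the last digit.

```python
import math

n, k = 15, 4                  # observations and predictors
ss_regression = 15913.0       # SSR from the ANOVA table
ss_error = 469.1              # SSE from the ANOVA table
ss_total = 16382.2            # Total SS from the ANOVA table
seq_ss = [14829.3, 0.9, 166.4, 916.5]   # sequential sums of squares

msr = ss_regression / k                  # mean square for regression
mse = ss_error / (n - k - 1)             # mean square for error
f_stat = msr / mse                       # overall F statistic
s = math.sqrt(mse)                       # estimate of sigma
r_sq = ss_regression / ss_total          # proportion of variation explained
r_sq_adj = 1 - mse / (ss_total / (n - 1))

print(f"F = {f_stat:.2f}")
print(f"S = {s:.3f}")
print(f"R-Sq = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
# The sequential sums of squares add up to SSR, apart from rounding.
print(f"sum of Seq SS = {sum(seq_ss):.1f}")
```

The adjusted value is slightly smaller than R² because it penalizes the model for the four degrees of freedom spent on predictors.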

12 Minitab Output
In the presence of the other three independent variables, is the number of bedrooms significant in predicting the list price of homes? Test using $\alpha = .05$.
Regression Analysis: ListPrice versus SqFeet, NumFlrs, Bdrms, Baths
The regression equation is
ListPrice = 18.8 + 6.27 SqFeet - 16.2 NumFlrs - 2.67 Bdrms + 30.3 Baths
Predictor      Coef  SE Coef      T      P
Constant     18.763    9.207   2.04  0.069
SqFeet       6.2698   0.7252   8.65  0.000
NumFlrs     -16.203    6.212  -2.61  0.026
Bdrms        -2.673    4.494  -0.59  0.565
Baths        30.271    6.849   4.42  0.001
To test $H_0: \beta_3 = 0$, the test statistic is $t = -0.59$ with p-value = .565. The p-value is larger than .05, so $H_0$ is not rejected. We cannot conclude that the number of bedrooms is a valuable predictor in the presence of the other variables. Perhaps the model could be refit without $x_3$.
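
Each T value in the output is the coefficient divided by its standard error. The sketch below reproduces that arithmetic and, as an assumption for illustration, replaces the p-value comparison with the equivalent critical-value rule: reject when |t| exceeds 2.228, the two-tailed 5% point of the t distribution with n − k − 1 = 10 degrees of freedom, taken from a standard t table.

```python
# (coefficient, standard error) pairs copied from the Minitab output.
ESTIMATES = {
    "Constant": (18.763, 9.207),
    "SqFeet": (6.2698, 0.7252),
    "NumFlrs": (-16.203, 6.212),
    "Bdrms": (-2.673, 4.494),
    "Baths": (30.271, 6.849),
}

# Two-tailed critical value for alpha = .05 with 10 degrees of freedom,
# from a standard t table (df = n - k - 1 = 15 - 4 - 1).
T_CRITICAL = 2.228

t_stats = {name: coef / se for name, (coef, se) in ESTIMATES.items()}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRITICAL]

for name, t in t_stats.items():
    verdict = "reject H0" if name in significant else "do not reject H0"
    print(f"{name:<9} t = {t:6.2f}  {verdict}")
```

This gives the same decisions as comparing each p-value with .05: SqFeet, NumFlrs, and Baths are significant, while Bdrms (and the constant) are not.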

13 Historical Note
Where does the name “regression” come from? In 1884, Francis Galton set up a stand at the International Health Exhibition in London, where he measured the heights of families attending. Seeking laws of inheritance, he found that sons’ heights tended to regress toward the mean height of the population, compared to their fathers’ heights: tall fathers tended to have somewhat shorter sons, and short fathers somewhat taller sons. Galton developed regression analysis to study this effect, now known as “regression toward the mean”, which he described in an 1886 paper as “regression towards mediocrity”.
