STA 282 – Regression Analysis Multiple Linear 3-Dec-18 Jack Hill PhD
What Is Multiple Regression Analysis?
Multiple Regression Analysis is a statistical tool that helps you:
  Determine the relationship (correlation) between a dependent variable and two or more independent variables
  Determine the Regression Equations (Deterministic and Probabilistic) for predicting values not in the dataset
Regression Analysis is found in Excel 2007 and Excel 2010: Data Tab > Data Analysis > Regression
What Is Multiple Regression Analysis?
The difference between Simple and Multiple Regression is the number of Independent Variables in your data set.
Simple Regression: a single Dependent (Y) variable and a single Independent (X) variable. A single column of data is highlighted for each variable.
Multiple Regression: a single Dependent (Y) variable and two or more Independent (X) variables. A single column of data is highlighted for the Dependent (Y) variable, and two or more columns of data are highlighted for the Independent (X) variables.
Underlying Assumption
Linear Regression Analysis assumes that the relationship is a “straight line” and can be modeled using the generalized linear equation:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_n x_{in} + \varepsilon_i$
where $y_i$ is the dependent variable, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_n$ are the slopes, $x_{i1}, \dots, x_{in}$ are the independent variables, and $\varepsilon_i$ is the error term.
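To make the equation above concrete, here is a minimal Python sketch of its deterministic part (everything except the error term). The function name `linear_predict` and the numbers passed to it are hypothetical, chosen only to show the arithmetic; they do not come from the home-sales data.

```python
def linear_predict(intercept, slopes, xs):
    """Return intercept + sum(slope_j * x_j): the model without the error term."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one x value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical values, for illustration only: 1 + 2*4 + 3*5
example = linear_predict(1.0, [2.0, 3.0], [4.0, 5.0])
print(example)  # 24.0
```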
An Example Using Excel’s Multiple Regression Analysis
Using prior home sales to predict selling price
Multiple Regression Analysis
You are a realtor, and you hypothesize a relationship between the Dependent Variable (Sale Price) and three Independent Variables (House Size, # Bedrooms, and Lot Size) that could predict the Selling Price of homes.
Stating the Null Hypothesis: “There is no significant relationship (correlation) between the Dependent and Independent variables.”
Because there are two or more Independent Variables, you would use Excel’s Multiple Regression Analysis:
  To test the significance of the relationship among variables
  To determine the Regression Equations
Multiple Regression Analysis
State the Null Hypothesis: “There is no significant relationship (correlation) between the Dependent and Independent variables.”
To test the Null Hypothesis, you decide to analyze 10 of your recent home sales:
1. Enter measurements:
  Dependent Variable (Y): Sale Price
  Independent Variables (X): Lot Size, House Size, and # Bedrooms
Multiple Regression Analysis
2. Select Data Tab
3. Select Data Analysis
Multiple Regression Analysis
4. Scroll Down and Select Regression
5. Click OK
Multiple Regression Analysis
Regression Parameter Input Window
6. Highlight Dependent Variable (Y), Including the Column Label, as the Input Y Range
Multiple Regression Analysis
Regression Parameter Input Window
7. Highlight Independent Variables (X), Including the Column Labels, as the Input X Range
Multiple Regression Analysis
Regression Parameter Input Window
8. Check Labels Box
9. Click Output Range and Select an Output Cell
10. Click OK
Regression Output Table
Interpreting the Regression Statistics
Multiple Regression Analysis
The Multiple R (Correlation Coefficient) of 0.945 indicates a “strong” correlation between the Dependent and Independent Variables.
The Adjusted R Square (Determination Coefficient, adjusted for the number of Independent Variables) of 0.839 indicates that about 83.9% of the variation in the Dependent Variable is explainable by the Regression Equation.
Conversely, about 16.1% of the variation is due to other factors, such as random errors.
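As a rough check on how these statistics relate, the sketch below recomputes R Square and Adjusted R Square from the Multiple R on the slide, using the standard adjustment formula 1 - (1 - R²)(n - 1)/(n - k - 1). It assumes n = 10 sales and k = 3 Independent Variables, as in the example. Because Multiple R is only given to three decimals, the result agrees with 0.839 only to within rounding.

```python
multiple_r = 0.945  # Multiple R as reported on the slide (rounded)
n = 10              # number of home sales analyzed
k = 3               # number of independent variables

# R Square is the square of Multiple R
r_square = multiple_r ** 2

# Adjusted R Square penalizes R Square for the number of predictors
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"R Square          = {r_square:.4f}")
print(f"Adjusted R Square = {adjusted_r_square:.4f}")
```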
Multiple Regression Analysis
Because Significance F (0.0026) < 0.05, REJECT the Null Hypothesis and conclude there is a significant relationship (correlation) between the Dependent and Independent Variables.
F-Critical: =FINV(0.05,3,6), where 3 is the number of Independent Variables and 6 = 10 - 3 - 1 is the residual degrees of freedom
Because F-value (16.60) > F-critical (4.76), REJECT the Null Hypothesis and conclude there is a significant relationship (correlation) between the Dependent and Independent Variables.
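The F-test decision can be sketched in Python as follows. This is an approximation under stated assumptions: it rebuilds the F-value from the rounded Multiple R (0.945) with the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)), so it lands near the reported 16.60 but not exactly on it. The critical value 4.76 is copied from the slide, not recomputed.

```python
r_square = 0.945 ** 2  # from the rounded Multiple R on the slide
n, k = 10, 3           # 10 home sales, 3 independent variables

df_regression = k          # numerator degrees of freedom
df_residual = n - k - 1    # denominator degrees of freedom

# F-value rebuilt from R Square; close to the reported 16.60
f_value = (r_square / df_regression) / ((1 - r_square) / df_residual)

f_critical = 4.76  # taken from the slide: =FINV(0.05,3,6)
reject_null = f_value > f_critical

print(f"df = ({df_regression}, {df_residual}), F = {f_value:.2f}")
print("Reject the Null Hypothesis" if reject_null else "Fail to reject the Null Hypothesis")
```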
Predicting the Selling Price
Regression Equations
Multiple Regression Analysis
Because the F-value > F-Critical, we can REJECT the Null Hypothesis and conclude there is a significant relationship among variables. The other statistics support a high degree of correlation.
So, what does this mean?
Our original goal was to use prior home sales to predict a selling price for a 1,900 sqft home with an 18,000 sqft Lot and 3 Bedrooms.
We can use the Regression Equations to predict a suggested selling price.
Multiple Regression Analysis
The Deterministic Regression Equation, using the Coefficients from the Regression Output Table:
Sale Price = β0 + β1*Lot Size + β2*House Size + β3*Bedrooms
Sale Price = 14,196.5 + 20.022*Lot Size + 4.401*House Size + 2,893.44*Bedrooms
Note: NO Error Term
Multiple Regression Analysis
The Standard Error is the amount of error in our predictions:
ε = ± Std Error/2
ε = ± 13,149
The Probabilistic Regression Equation, using the Coefficients from the Regression Output Table:
Sale Price = β0 + β1*Lot Size + β2*House Size + β3*Bedrooms + ε
Sale Price = 14,196.5 + 20.022*Lot Size + 4.401*House Size + 2,893.44*Bedrooms ± 13,149
Multiple Regression Analysis
The Regression Equations help predict a Sale Price:
Sale Price = 14,196.5 + 20.022*Lot Size + 4.401*House Size + 2,893.44*Bedrooms
Then, for a 1,900 sqft House on an 18,000 sqft Lot with 3 Bedrooms:
The Deterministic Equation predicts a Selling Price without Error:
Sale Price = 14,196.5 + 20.022*18,000 + 4.401*1,900 + 2,893.44*3
Sale Price = $391,832
The Probabilistic Equation predicts a Selling Price with Error:
Sale Price = $391,832 ± $13,149
When this house sells, all other factors being equal, the Sale Price should be between $378,683 and $404,981.
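The prediction arithmetic can be reproduced with a short Python sketch. It uses the coefficients exactly as printed on the slides, which are rounded. With those rounded values the sum comes to roughly $391,635, a little under $200 below the $391,832 shown. One plausible explanation, though an assumption here, is that the slide's figure was computed in Excel from the unrounded coefficients. The variable names are illustrative.

```python
# Coefficients as printed on the slides (rounded)
intercept = 14_196.5
coef_lot_size = 20.022
coef_house_size = 4.401
coef_bedrooms = 2_893.44
error = 13_149  # the ± term used in the Probabilistic Equation

# The house we want to price
lot_size = 18_000   # sqft
house_size = 1_900  # sqft
bedrooms = 3

# Deterministic Equation: no error term
predicted = (intercept
             + coef_lot_size * lot_size
             + coef_house_size * house_size
             + coef_bedrooms * bedrooms)

# Probabilistic Equation: prediction plus or minus the error term
low, high = predicted - error, predicted + error

print(f"Predicted Sale Price: ${predicted:,.2f}")
print(f"Range: ${low:,.2f} to ${high:,.2f}")
```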