Samantha Bellah
Adv. Stats Final Project
Real Estate Forecasting Regression Model
Market: Highland Park Neighborhood
Data Sources: Zillow.com, E:\PuebloRESales2014Q1Q2.xlsx
Sample Data Set
Regression Analysis (Model Runs, Variable Selection)

SUMMARY OUTPUT
Regression Statistics: Multiple R …, R Square …, Adjusted R Square …, Standard Error …, Observations 20
ANOVA
            df   SS         MS       F   Significance F
Regression   8   1.76E…     …        …   …
Residual    11   2.39E…     …E+08
Total       19   4.16E+09
Coefficient table (Coefficients, Standard Error, t Stat, P-value, Lower 95%) with rows: Intercept, Total SqFt, Garage sq ft, Number floors, Detached, Attached, Year built, Bedrooms, Lot SqFt

SUMMARY OUTPUT
Regression Statistics: Multiple R …, R Square …, Adjusted R Square …, Standard Error …, Observations 20
ANOVA
            df   SS         MS       F   Significance F
Regression   3   1.55E…     …        …   …
Residual    16   2.61E…     …E+08
Total       19   4.16E+09
Coefficient table (Coefficients, Standard Error, t Stat, P-value, Lower 95%) with rows: Intercept, Total SqFt, Number floors, Lot SqFt
Presentation/Description of Final Model
The second regression output on slide 3, with an R Square of .37, represents my final model. By R Square it is not a great fit: only 37% of the variation in selling price is explained by the three variables (total square feet, number of floors, and lot square feet). Because the Significance F is fairly low (.05), the overall model is borderline statistically significant, but there are probably other predictors that would be more reliable. The P-values show that the weakest predictor in my model is lot square feet, with a P-value of .14, which is too high to be statistically significant at the usual .05 level. However, when I took it out of the model, the Significance F rose to .06 and the R Square dropped to .28, so the overall model seemed a better fit with lot square feet included.
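Because R Square can never go up when a variable is removed, a fairer way to compare the two models is Adjusted R Square, which penalizes each extra predictor. The short Python sketch below recomputes Adjusted R Square and the overall F statistic from the figures quoted above. It assumes the reduced model has two predictors (total square feet and number of floors) and uses the R Square values as rounded in the text, so the results are approximate; the function names are only illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R Square for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic implied by R Square (k and n - k - 1 degrees of freedom)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N_OBS = 20  # Observations reported in the summary output

# (R Square, number of predictors); R Square values are rounded as in the text.
models = {
    "with lot sq ft": (0.37, 3),
    "without lot sq ft": (0.28, 2),  # assumption: two predictors remain
}

for name, (r2, k) in models.items():
    adj = adjusted_r2(r2, N_OBS, k)
    f_stat = f_statistic(r2, N_OBS, k)
    print(f"{name}: adjusted R2 = {adj:.3f}, F = {f_stat:.2f}")
```

With these rounded inputs, the three-variable model comes out at an Adjusted R Square of roughly .25 against roughly .20 for the two-variable model, which agrees with the decision to keep lot square feet in the final model.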
Residual Analysis
There don’t seem to be any outliers that clearly stand out. The formula used to find the predicted selling price was as follows:
Ŷ = … *(total sq ft) … *(number floors) + 3.7*(lot sq ft)
Model Application

Address               Bedrooms  Total SqFt  Selling Price  # floors  Garage    Garage sq ft  Bath  Lot SqFt  House Age
1) 2936 Azalea St     3         1020        $88,000        1         none      …             …     6534      …
2) 3909 Sheffield Ln  4         1544        $120,000       3         detached  …             …     6011      …

Ŷ = … *(total sq ft) … *(number floors) + 3.7*(lot sq ft)
1) … *(1020) … *(1) + 3.7*(6534) = $…
2) … *(1544) … *(3) + 3.7*(6011) = $54,185

My model's prediction for House 1 was about $2,000 off the actual selling price, while the prediction for House 2 ($54,185) was less than half of the actual selling price ($120,000). I knew from the regression analysis that my model was not a great predictor of selling price, and I tried different combinations of the variables I had to build a better one, but the best model I could find with the data I originally collected never gave great results. If I were to do the project again, I would probably try adding a different variable, such as distance from schools. The best model I could make from my data wasn’t great because important factors such as the number of bedrooms and bathrooms weren’t included in the final model when predicting the selling price.
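To put a number on how far off the House 2 prediction was, the residual (actual minus predicted) and the percent error can be worked out directly from the two figures on this slide. This is a minimal Python sketch; the function name `prediction_error` is only illustrative, and House 1 is left out because its predicted value did not survive on the slide.

```python
def prediction_error(actual, predicted):
    """Return (residual, percent error), where residual = actual - predicted."""
    residual = actual - predicted
    return residual, 100 * residual / actual


# House 2 (3909 Sheffield Ln): selling price and model prediction from the slide.
residual, pct = prediction_error(actual=120_000, predicted=54_185)
print(f"Residual: ${residual:,.0f} ({pct:.1f}% of the selling price)")
```

For House 2 the model underpredicts by $65,815, or about 55% of the actual selling price.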