Published by Jason Lester. Modified over 8 years ago.
1
Samantha Bellah
Adv. Stats Final Project
Real Estate Forecasting Regression Model
Market: Highland Park Neighborhood
Data Sources: Zillow.com, E:\PuebloRESales2014Q1Q2.xlsx
2
Sample Data Set
3
Regression Analysis (Model Runs, Variable Selection)

SUMMARY OUTPUT (first run, 8 variables)

Regression Statistics
Multiple R           0.651399
R Square             0.42432
Adjusted R Square    0.005644
Standard Error       14752.05
Observations         20

ANOVA
              df    SS          MS          F           Significance F
Regression    8     1.76E+09    2.21E+08    1.013481    0.478104
Residual      11    2.39E+09    2.18E+08
Total         19    4.16E+09

                 Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept        1898368         2309130           0.822114    0.428473    -3183993
Total SqFt       49.30321        24.89031          1.980819    0.073167    -5.47999
Garage sq ft     36.41012        74.53567          0.488493    0.634794    -127.642
Number floors    -16845          17836.57          -0.94441    0.365244    -56103
Detached         -24669.9        51212.19          -0.48172    0.639446    -137387
Attached         -12285          24717.6           -0.49701    0.628964    -66688
Year built       -957.546        1196.24           -0.80046    0.440388    -3590.45
Bedrooms         2296.094        9728.698          0.236012    0.81776     -19116.6
Lot SqFt         3.047115        3.116469          0.977746    0.349214    -3.81219

SUMMARY OUTPUT (second run, 3 variables)

Regression Statistics
Multiple R           0.610366
R Square             0.372546
Adjusted R Square    0.254899
Standard Error       12769.95
Observations         20

ANOVA
              df    SS          MS          F          Significance F
Regression    3     1.55E+09    5.16E+08    3.16663    0.053216
Residual      16    2.61E+09    1.63E+08
Total         19    4.16E+09

                 Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept        38110.05        25772.98          1.478682    0.158642    -16526.2
Total SqFt       50.42877        18.28529          2.757887    0.014005    11.66568
Number floors    -27994.5        10230.5           -2.73638    0.014638    -49682.2
Lot SqFt         3.690097        2.348773          1.571074    0.135729    -1.28908
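Each t Stat in the regression output is the coefficient divided by its standard error. The short sketch below recomputes that ratio for the second (3-variable) run as a sanity check; the `final_model` dictionary and the `t_stat` helper are illustrative names, not part of the original spreadsheet.

```python
# Coefficient and standard error for each term of the 3-variable run,
# copied from the regression output.
final_model = {
    "Intercept": (38110.05, 25772.98),
    "Total SqFt": (50.42877, 18.28529),
    "Number floors": (-27994.5, 10230.5),
    "Lot SqFt": (3.690097, 2.348773),
}


def t_stat(coefficient, standard_error):
    """t statistic for a single regression coefficient."""
    return coefficient / standard_error


for name, (coef, se) in final_model.items():
    print(f"{name}: t = {t_stat(coef, se):.4f}")
```

The larger the absolute t Stat, the smaller the P-value, which is why Total SqFt and Number floors come out as the strongest predictors in this run.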
4
Presentation/Description of Final Model
The second regression output on slide 3, with an R Square of 0.37, is my final model. It is not a great fit: according to R Square, only 37% of the variation in Selling Price is explained by the variables (total square feet, number of floors, and lot square feet). Since the Significance F is fairly low (0.053, just above the usual 0.05 cutoff), the overall results are marginally statistically significant, but there are probably other predictors that are more reliable. The P-values show that the weakest predictor in my model is lot square feet, with a P-value of 0.14, which is too high to be statistically significant. But when I took it out of the model, the Significance F rose to 0.06 and the R Square dropped to 0.28, so the overall model seemed a better fit with lot square feet included.
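As a rough check on the fit statistics discussed here, Adjusted R Square and the overall F statistic can be recomputed from R Square, the number of observations (20), and the number of predictors. The sketch below uses the standard textbook formulas; the function names and the `models` dictionary are illustrative assumptions, not something taken from the original project.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r_square, n_obs, n_predictors):
    """Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r_square / n_predictors) / (
        (1 - r_square) / (n_obs - n_predictors - 1)
    )


# (R Square, number of predictors) for the two runs in the regression output.
models = {
    "first run, 8 variables": (0.42432, 8),
    "second run, 3 variables": (0.372546, 3),
}

N_OBS = 20  # houses in the sample

for name, (r2, k) in models.items():
    adj = adjusted_r_square(r2, N_OBS, k)
    f = f_statistic(r2, N_OBS, k)
    print(f"{name}: adjusted R^2 = {adj:.4f}, F = {f:.3f}")
```

Because R Square can only go down when a variable is removed, Adjusted R Square is the fairer comparison between the two runs: it penalizes the 8-variable run for its extra predictors, which is why that run scores lower on it despite its higher R Square.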
5
Residual Analysis
There don’t seem to be any outliers that clearly stand out. The formula used to find the predicted selling price was as follows:
Ŷ = 38110.05 + 50.4*(total sq ft) - 27994.5*(number floors) + 3.7*(lot sq ft)
6
Model Application

Address               Bedrooms  Total SqFt  Selling Price  # floors  Garage    Garage sq ft  Bath  Lot SqFt  House Age
1) 2936 Azalea St     3         1020        $88,000        1         none      0             2     6534      54
2) 3909 Sheffield Ln  4         1544        $120,000       3         detached  288           2     6011      43

Ŷ = 38110.05 + 50.4*(total sq ft) - 27994.5*(number floors) + 3.7*(lot sq ft)
1) 38110.05 + 50.4*(1020) - 27994.5*(1) + 3.7*(6534) = $85,699
2) 38110.05 + 50.4*(1544) - 27994.5*(3) + 3.7*(6011) = $54,185

My model was about $2,300 off for House 1 ($85,699 predicted vs. $88,000 actual), while the prediction for House 2 ($54,185) was less than half of the actual selling price ($120,000). I knew my model was not a great predictor of selling price based on the results from the regression analysis, and I tried to use the different variables I had to run a better model, but the best model I could find with the data I had originally collected never gave me great results. If I were to do the project again I would probably try running the model with a different variable, such as distance from the schools. The best model I could make based on my data obviously wasn’t great, because important factors such as number of bedrooms and bathrooms weren’t taken into consideration by the final model when determining the selling price.
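The two hand calculations can be reproduced with a few lines of Python. This is a minimal sketch that plugs the rounded coefficients from the slide's formula into a helper; `predict_price` and the `houses` list are hypothetical names introduced only for illustration.

```python
# Rounded coefficients of the final model, as written in the slide's formula.
INTERCEPT = 38110.05
COEF_TOTAL_SQFT = 50.4
COEF_FLOORS = -27994.5
COEF_LOT_SQFT = 3.7


def predict_price(total_sqft, floors, lot_sqft):
    """Predicted selling price from the 3-variable regression equation."""
    return (
        INTERCEPT
        + COEF_TOTAL_SQFT * total_sqft
        + COEF_FLOORS * floors
        + COEF_LOT_SQFT * lot_sqft
    )


# (address, total sq ft, number of floors, lot sq ft, actual selling price)
houses = [
    ("2936 Azalea St", 1020, 1, 6534, 88000),
    ("3909 Sheffield Ln", 1544, 3, 6011, 120000),
]

for address, sqft, floors, lot, actual in houses:
    predicted = predict_price(sqft, floors, lot)
    error = predicted - actual  # negative means the model under-predicts
    print(
        f"{address}: predicted ${predicted:,.0f}, "
        f"actual ${actual:,}, error {error / actual:+.1%}"
    )
```

Expressing the error as a share of the actual price makes the gap between the two houses easier to compare than the raw dollar difference. The three-floor house is hit three times by the large negative Number floors coefficient, which accounts for most of its under-prediction.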