Predict House Sales Price Prof: Meiliu Lu Team: Sindhura Kilaru, Mrunal Makwana
Problem Buying a house is a unique, one-of-a-kind experience for each buyer. The main objective is to predict the best price at which a client can sell their house.
Dataset The dataset describes 79 different client needs (features), such as lot area, pool area, utilities, and neighborhood. Satisfying all of those needs and giving each client their choice is tedious.
Data Preprocessing The dataset has a large number of parameters and many irrelevant (null) values. We kept only the numeric values, used 10-fold cross-validation, and normalized the data using the maximum, minimum, and scale. Even so, finding the best features among these is a challenge.
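The preprocessing steps above can be sketched in a few lines. This is a minimal illustration in plain Python, not the team's actual code: the helper names `min_max_scale` and `k_fold_indices` and the toy `lot_area` values are hypothetical, and the sketch assumes that "maximum, minimum and scale" refers to min-max rescaling into [0, 1].

```python
import random

def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using its minimum and maximum."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]

def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

# Hypothetical toy column; the row with a missing (null) value is dropped first.
lot_area = [8450, 9600, None, 11250, 9550, 14260]
clean = [v for v in lot_area if v is not None]

scaled = min_max_scale(clean)
# k=3 here only because the toy column is tiny; the slides use 10 folds.
folds = k_fold_indices(len(clean), k=3)
```

Each fold serves once as the held-out test set while the remaining folds are used for training.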
Feature Selection Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. We used the Boruta package from R. It finds relevant features by comparing the importance of the original attributes with the importance achievable at random, estimated using permuted copies of those attributes.
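The idea of comparing each attribute with permuted copies can be sketched as follows. This is a simplified illustration, not the Boruta algorithm itself: Boruta uses random forest importance and a statistical test, whereas this sketch swaps in absolute Pearson correlation as the importance score and a simple majority vote. The names `pearson_abs` and `shadow_select` and the toy data are hypothetical.

```python
import random

def pearson_abs(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def shadow_select(features, target, rounds=20, seed=0):
    """Keep features that beat the best permuted (shadow) copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(rounds):
        # Importance achievable at random: best score among shuffled copies.
        shadow_best = 0.0
        for values in features.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_best = max(shadow_best, pearson_abs(shuffled, target))
        for name, values in features.items():
            if pearson_abs(values, target) > shadow_best:
                hits[name] += 1
    return [name for name, h in hits.items() if h > rounds / 2]

# Hypothetical toy data: price depends on area, while "noise" is unrelated.
rng = random.Random(1)
area = [rng.uniform(500, 4000) for _ in range(200)]
noise = [rng.uniform(0, 1) for _ in range(200)]
price = [100 * a + rng.gauss(0, 5000) for a in area]

selected = shadow_select({"area": area, "noise": noise}, price)
```

A feature that cannot outperform shuffled versions of the data is unlikely to carry real signal, which is the intuition behind the shadow-copy comparison.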
Machine Learning Techniques To find the best results, we chose to implement the following techniques: neural network, decision tree, and multiple linear regression.
Multiple Linear Regression Multiple linear regression models the relationship between two or more explanatory variables and a response variable by fitting a linear equation to the observed data. A linear relationship is assumed between the dependent variable and the independent variables.
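As a rough illustration of fitting a linear equation to observed data, the sketch below solves the least-squares normal equations in plain Python. It is not the professor's sample code referenced by the slides; the functions `solve_linear_system` and `fit_linear` and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical toy data generated from price = 50 + 3*area + 10*rooms.
rows = [(10, 1), (20, 2), (30, 1), (40, 3), (50, 2)]
prices = [50 + 3 * a + 10 * r for a, r in rows]

coefs = fit_linear(rows, prices)  # [intercept, area coefficient, rooms coefficient]
```

Because the toy prices are exactly linear in the two predictors, the fitted coefficients recover the generating equation up to floating-point error.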
Decision Tree We built a decision tree on the relevant values. A decision tree depicts rules for dividing data into groups. The first rule splits the entire data set into some number of pieces; further rules may then be applied to each piece, with different rules for different pieces, forming a second generation of pieces.
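To make the idea of a splitting rule concrete, the sketch below finds the first rule for a regression tree on one feature: the threshold that minimizes the total squared error of the two resulting pieces. It is an illustrative assumption about how such a rule is chosen, not the R code used for the slides; `sse`, `best_split`, and the toy data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, error) for the rule 'x <= threshold' with the lowest total SSE."""
    best = (None, float("inf"))
    candidates = sorted(set(xs))
    # Try the midpoint between each pair of adjacent distinct feature values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sse(left) + sse(right)
        if error < best[1]:
            best = (threshold, error)
    return best

# Hypothetical toy data: small houses sell for about 100, large ones for about 300.
area = [800, 900, 1000, 2000, 2100, 2200]
price = [100, 110, 105, 300, 310, 305]

threshold, error = best_split(area, price)
```

Applying the same search again to the left and right pieces would produce the second generation of pieces described above.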
Neural Network Results with neural networks depend on the number of hidden layers chosen. Because the number of inputs is large, feeding the network only the relevant, numeric values was the best option. A neural network with 15 hidden layers and 40 inputs took 2-3 minutes in our experiment.
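For readers unfamiliar with how such a network is trained, here is a deliberately tiny sketch: one input, one hidden layer of tanh units, a linear output, and full-batch gradient descent. It is far smaller than the network described above and does not use the R neuralnet package; the function `train_tiny_network`, its hyperparameters, and the toy data are all hypothetical.

```python
import math
import random

def train_tiny_network(xs, ys, hidden=3, lr=0.05, epochs=500, seed=0):
    """Train a 1-input, one-hidden-layer network with full-batch gradient descent.

    Returns the mean squared error before and after training.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    n = len(xs)

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / n

    before = mse()
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h, out = forward(x)
            d_out = 2 * (out - y) / n  # gradient of the MSE w.r.t. the output
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_h = d_out * w2[j] * (1 - h[j] ** 2)  # back through tanh
                gw1[j] += d_h * x
                gb1[j] += d_h
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2
    return before, mse()

# Hypothetical toy data, already scaled to [0, 1].
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.0, 0.2, 0.5, 0.8, 1.0]

error_before, error_after = train_tiny_network(xs, ys)
```

Training cost grows with the number of inputs and hidden units, which is one reason to restrict the inputs to the relevant numeric features.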
Neural Network With only relevant values for train and test data. For train: steps: 74, error: 3.74238, time: 0.21 secs
Neural Network (continued) For test: steps: 21, error: 0.478, time: 0.01 secs
Neural Network (continued) steps: 253, error: 0.53267, time: 2.91 secs
Reference
Dataset: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Boruta Package: http://www.cybaea.net/journal/2010/11/15/Feature-selection-All-relevant-selection-with-the-Boruta-package/
Neural Network: https://www.r-bloggers.com/fitting-a-neural-network-in-r-neuralnet-package/
Multiple Linear Regression: Professor’s sample code
Decision Tree: http://rstatistics.net/decision-trees-with-r/