A linear approach to predicting house prices
Jeno Yamma
Key takeaways
- Why exploring the data is so important
- How to approach a new dataset
- What distributions can tell us about the data
- Using linear regression as opposed to a more complicated model
- Understanding the evaluation metric: root mean squared logarithmic error (RMSLE)
- Reducing bias and variance
- Evaluating the model's predictions
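For reference, the usual definition of RMSLE, with p_i the predicted price, a_i the actual price, n the number of houses and log the natural logarithm (the +1 keeps zero values defined):

```latex
\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(p_i + 1) - \log(a_i + 1)\right)^2}
```

Because log(p_i + 1) - log(a_i + 1) = log((p_i + 1)/(a_i + 1)), the metric scores relative rather than absolute error: missing a $300,000 house by $50,000 costs more than missing a $3,000,000 house by the same amount.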
Why data exploration is important
Business goal: come up with an optimal business solution to the problem.
Data exploration allows us to:
- check whether the data given is appropriate (integrity, missing values, outliers, etc.)
- brainstorm ideas
- see how the data behaves
- communicate the data to stakeholders to help explain your prediction
Different tools you can use: R, Python, Tableau, Excel!
Initial approach
Sit and wait for it to load…
Did your tool read in the data correctly?
Do a simple data type check after reading in the data. Fix any wrong types first: correct types make exploration easier and lead to better model performance.
Should we change anything?
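A minimal pandas sketch of the type check, assuming a CSV with hypothetical column names (Suburb, Rooms, Type, Price, Date, Bathroom, Landsize, YearBuilt, Postcode) and a day/month/year date format. It shows three typical problems: dates read as text, a postcode read as a number, and a year column turned into floats because one value is missing.

```python
import io

import pandas as pd

# Hypothetical two-row extract; column names and values are illustrative assumptions.
RAW = """Suburb,Rooms,Type,Price,Date,Bathroom,Landsize,YearBuilt,Postcode
Abbotsford,2,h,1480000,3/12/2016,1,202,,3067
Abbotsford,3,h,1035000,4/02/2016,2,156,1900,3067
"""

df = pd.read_csv(io.StringIO(RAW))
print(df.dtypes)  # Date is object (text), Postcode is int64, YearBuilt is float64 (NaN forces float)

# Correct the types before exploring or modelling.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")  # text -> datetime (day first, assumed)
df["Postcode"] = df["Postcode"].astype("string")            # a label, not a quantity to average
df["Type"] = df["Type"].astype("category")                  # small fixed set of property types
print(df.dtypes)
```

Leaving Postcode numeric would let a linear model treat 3067 as "larger" than 3000, which means nothing.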
Statistical summaries are important
They give us an idea of the mean and spread of each variable.
What is this, and this?
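In pandas, a sketch of such a summary is a single `describe()` call. The toy columns below are hypothetical. Comparing `mean` with `50%` (the median) and looking at `max` is a quick way to spot skew and extreme values.

```python
import pandas as pd

# Toy data (illustrative assumption), including one unusual 8-bathroom house.
df = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 8],
    "Bathroom": [1, 2, 8, 1, 2],
    "Price": [1_480_000, 1_035_000, 1_600_000, 850_000, 2_100_000],
})

# count, mean, std, min, 25%, 50% (median), 75%, max for each numeric column
summary = df.describe()
print(summary)

# A mean well above the median suggests a right-skewed variable.
print(summary.loc["mean"] - summary.loc["50%"])
```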
Missing values
Keep or remove?
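A sketch of the keep-or-remove decision in pandas, on hypothetical columns. It first measures the share of missing values per column, then applies three common options: drop rows missing the target, impute a feature with its median, and drop a column that is mostly empty. The 40% threshold is an arbitrary assumption, not a rule.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative assumption).
df = pd.DataFrame({
    "Price": [1_480_000, 1_035_000, np.nan, 850_000],
    "Landsize": [202, 156, 134, np.nan],
    "YearBuilt": [np.nan, 1900, np.nan, 1928],
})

# Share of missing values per column, worst first.
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)

# Option 1: remove rows where the target is missing (nothing to learn from).
df = df.dropna(subset=["Price"]).copy()

# Option 2: keep the row and impute the feature, here with the median.
df["Landsize"] = df["Landsize"].fillna(df["Landsize"].median())

# Option 3: drop features that are mostly missing (threshold assumed), never the target.
too_sparse = missing_share[missing_share > 0.4].index.difference(["Price"])
df = df.drop(columns=too_sparse)
print(df)
```

Median imputation is used because it is less affected by outliers than the mean; which option is right depends on why the values are missing.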
Checking variable relationships: scatterplot
Do you know how to read this?
Easier to read
Can you find the strongest relationship?
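One numeric way to rank the relationships is the Pearson correlation of each feature with the price. Pearson correlation only measures linear association, which is what a linear regression can use. The columns and values below are hypothetical toy data.

```python
import pandas as pd

# Toy data (illustrative assumption): Price is exactly linear in Rooms here.
df = pd.DataFrame({
    "Rooms": [2, 3, 4, 5, 6],
    "Landsize": [500, 200, 400, 300, 350],
    "Distance": [12.0, 3.5, 8.0, 2.0, 9.5],
    "Price": [800_000, 1_100_000, 1_400_000, 1_700_000, 2_000_000],
})

# Pairwise Pearson correlations between numeric columns.
corr = df.corr(numeric_only=True)

# Rank features by the strength (absolute value) of their correlation with Price.
ranked = corr["Price"].drop("Price").abs().sort_values(ascending=False)
print(ranked)
```

Absolute values are used so that a strong negative relationship ranks as high as a strong positive one.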
Checking the spread of the data using boxplots
Or distributions
A house with 8 bathrooms…
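Boxplots usually mark points beyond 1.5 × IQR (interquartile range) from the quartiles as outliers; this is matplotlib's default whisker length. A sketch of the same rule applied directly to a hypothetical Bathroom column:

```python
import pandas as pd

# Toy data (illustrative assumption) with one 8-bathroom house.
bathrooms = pd.Series([1, 2, 1, 2, 3, 2, 1, 8], name="Bathroom")

# First and third quartiles, and the interquartile range between them.
q1, q3 = bathrooms.quantile([0.25, 0.75])
iqr = q3 - q1

# Whisker limits of a standard boxplot.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = bathrooms[(bathrooms < lower) | (bathrooms > upper)]
print(outliers)
```

Being flagged does not make a point an error. It only tells you which rows deserve a closer look.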
Always be curious about your data
8 bathrooms, only 4 rooms, built in 1928… damn.
8 rooms, only a few bathrooms, and a land size close to the average…
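Curiosity can be turned into simple consistency checks that compare columns against each other. A sketch with hypothetical columns; the rules and thresholds (more bathrooms than rooms; 6+ rooms with at most 1 bathroom) are illustrative assumptions you would tune after looking at the data.

```python
import pandas as pd

# Toy data (illustrative assumption) mirroring the two houses above.
df = pd.DataFrame({
    "Rooms": [4, 3, 8, 2],
    "Bathroom": [8, 2, 1, 1],
    "YearBuilt": [1928, 1995, 1970, 2005],
    "Landsize": [600, 450, 520, 300],
})

# Hypothetical sanity rules.
more_baths_than_rooms = df["Bathroom"] > df["Rooms"]
many_rooms_few_baths = (df["Rooms"] >= 6) & (df["Bathroom"] <= 1)

# Rows worth checking by hand (possible data-entry errors or unusual properties).
suspicious = df[more_baths_than_rooms | many_rooms_few_baths]
print(suspicious)
```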
Exploring the categorical variables
Sales Date
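A sketch for exploring a sales date: parse it, bucket by month, and look at how many sales happen and the median price in each month. The column names, day/month/year format and values are hypothetical.

```python
import pandas as pd

# Toy data (illustrative assumption).
df = pd.DataFrame({
    "Date": ["3/12/2016", "10/12/2016", "4/02/2016", "17/09/2017", "24/09/2017"],
    "Price": [1_480_000, 1_100_000, 1_035_000, 1_600_000, 1_200_000],
})

# Parse day/month/year strings, then bucket each sale into its calendar month.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df["YearMonth"] = df["Date"].dt.to_period("M")

# Number of sales and median price per month.
monthly = df.groupby("YearMonth")["Price"].agg(["count", "median"])
print(monthly)
```

The median is used rather than the mean so that a single very expensive sale does not dominate a month with few sales.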
Property Type
Seller
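A sketch for property type and seller, assuming hypothetical columns `Type` and `Seller` with made-up labels. It compares median price per category, groups rare sellers into "Other" (the minimum of 2 sales is an assumption), and one-hot encodes the type so a linear regression can use it.

```python
import pandas as pd

# Toy data (illustrative assumption); labels are hypothetical.
df = pd.DataFrame({
    "Type": ["h", "u", "h", "t", "u", "h"],
    "Seller": ["A", "B", "A", "C", "B", "D"],
    "Price": [1_500_000, 600_000, 1_300_000, 900_000, 700_000, 1_700_000],
})

# Number of sales and median price per property type, most expensive first.
by_type = (
    df.groupby("Type")["Price"]
    .agg(["count", "median"])
    .sort_values("median", ascending=False)
)
print(by_type)

# Sellers with fewer than 2 sales are grouped into "Other" (threshold assumed).
counts = df["Seller"].value_counts()
rare = counts[counts < 2].index
df["SellerGrouped"] = df["Seller"].where(~df["Seller"].isin(rare), "Other")

# One-hot encode Type; drop_first avoids a redundant column in a linear model.
type_dummies = pd.get_dummies(df["Type"], prefix="Type", drop_first=True)
print(type_dummies.columns.tolist())
```

Grouping rare categories keeps the number of dummy columns small, which helps reduce variance in the regression.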
Do some more exploration and try to understand your data. For simplicity, use Tableau or Excel.
Let’s do some Machine Learning… (just a fancy way of saying let’s do some linear regression)
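A minimal sketch of the idea using only NumPy, on synthetic data (the features, coefficients and noise level are assumptions, not results from any real dataset). It fits ordinary least squares on log(price + 1), so that training minimises the same kind of error RMSLE measures, and compares RMSLE on training rows against held-out rows. scikit-learn's `LinearRegression` would perform the same least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): log-price rises with rooms and land size, plus noise.
n = 200
rooms = rng.integers(1, 7, size=n)
landsize = rng.uniform(100, 1000, size=n)
log_price = 12.5 + 0.15 * rooms + 0.0006 * landsize + rng.normal(0, 0.1, size=n)
price = np.expm1(log_price)

# Design matrix with an intercept column; target on the log1p scale.
X = np.column_stack([np.ones(n), rooms, landsize])
y = np.log1p(price)

# Hold out 50 of the 200 rows to evaluate on data the model has not seen.
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]

# Ordinary least squares fit on the training rows.
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)


def rmsle(actual, predicted):
    """Root mean squared logarithmic error."""
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))


# Predictions are transformed back to the price scale before scoring.
pred_train = np.expm1(X[train] @ coef)
pred_test = np.expm1(X[test] @ coef)
print("coefficients:", coef)
print("train RMSLE:", rmsle(price[train], pred_train))
print("test RMSLE:", rmsle(price[test], pred_test))
```

A training RMSLE far below the test RMSLE would point to variance (overfitting); both being high would point to bias (a model too simple for the data).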