Analysis for Predicting the Selling Price of Apartments
Pratik Nikte


Abstract

In our analysis we used the Ames, Iowa housing dataset from the Kaggle website to analyze and predict house sale prices. We applied machine learning techniques such as Linear Regression, Random Forest, and XGBoost.

Introduction

The problem we are trying to solve with this dataset is to build models that predict house sale prices. The dataset also shows how certain variables have a direct impact on the sale price. We used built-in machine learning libraries in the R programming language throughout this work.

Project Plan

Figure 1: Project Plan

Feature Selection

- 46 categorical variables
- 33 continuous variables
- 79 variables in total

Parameters Impact on Sale Price

Figure 4: Multi-Correlation Matrix

Exploratory Data Analysis

- Fixed NAs by replacing them with the value 0.
- Data formatting: converted categorical variables to numeric.
- Transformed the sale price to log(SalePrice + 1), since our result analysis is based on the RMSE of the log error; this also reduces the effect of high-end outliers.

Figure 2: Right-Skewed Distribution (sale price before the log transform)
Figure 3: Normalized Distribution (sale price after the log transform)

Model Selection & Comparison

Linear Regression: assumes that the relationship between the dependent variable (sale price) and the independent variables (features) is linear.

Random Forest: performs its own feature selection and builds multiple decision trees whose predictions are combined to estimate the sale price.

XGBoost: fits trees to the training data to predict the target variable; with every iteration, XGBoost reduces the error rate relative to the previous trees.

Parameters chosen for the XGBoost model:

param <- list(colsample_bytree = 0.7, subsample = 0.7, booster = "gbtree",
              max_depth = 10, eta = 0.02, eval_metric = "rmse",
              objective = "reg:linear")

Results

XGBoost had the lowest RMSE score and was more accurate than the other models. We evaluated the XGBoost model by performing cross-validation, then applied it to our test dataset to predict the sale price.
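
For readers who want to retrace the pipeline, the R sketch below reconstructs the workflow described above. It is not the authors' original code: the input file name (train.csv), dropping the Id column, the 80/20 hold-out split, and the cross-validation settings (5 folds, up to 2,000 rounds with early stopping) are assumptions made for illustration; only the XGBoost parameter list is taken from the poster.

# Sketch of the workflow on the poster (assumptions noted above; not the original code).
library(randomForest)
library(xgboost)

raw <- read.csv("train.csv", stringsAsFactors = TRUE)  # Kaggle Ames training file (assumed name)
raw$Id <- NULL                                         # drop the row identifier (assumption)

# Data formatting: categorical -> numeric (integer codes for each factor level).
is_fac <- sapply(raw, is.factor)
raw[is_fac] <- lapply(raw[is_fac], as.integer)

# Fix NAs -> 0.
raw[is.na(raw)] <- 0

# Transform the target to log(SalePrice + 1) to reduce the effect of high-end outliers;
# RMSE on this scale corresponds to the log-error metric used in the analysis.
raw$SalePrice <- log(raw$SalePrice + 1)

# Hold-out split for model comparison (80/20 is an assumption).
set.seed(1)
idx   <- sample(nrow(raw), floor(0.8 * nrow(raw)))
train <- raw[idx, ]
test  <- raw[-idx, ]

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

# Linear Regression
lm_fit  <- lm(SalePrice ~ ., data = train)
lm_rmse <- rmse(test$SalePrice, predict(lm_fit, test))

# Random Forest
rf_fit  <- randomForest(SalePrice ~ ., data = train, ntree = 500)
rf_rmse <- rmse(test$SalePrice, predict(rf_fit, test))

# XGBoost, using the parameters listed on the poster
param <- list(colsample_bytree = 0.7, subsample = 0.7, booster = "gbtree",
              max_depth = 10, eta = 0.02, eval_metric = "rmse",
              objective = "reg:linear")

x_cols <- setdiff(names(train), "SalePrice")
dtrain <- xgb.DMatrix(as.matrix(train[, x_cols]), label = train$SalePrice)
dtest  <- xgb.DMatrix(as.matrix(test[, x_cols]))

# Cross-validation to choose the number of boosting rounds.
cv <- xgb.cv(params = param, data = dtrain, nrounds = 2000, nfold = 5,
             early_stopping_rounds = 50, verbose = 0)

xgb_fit  <- xgb.train(params = param, data = dtrain, nrounds = cv$best_iteration)
xgb_rmse <- rmse(test$SalePrice, predict(xgb_fit, dtest))

# Compare log-scale RMSE across the three models.
data.frame(model = c("Linear Regression", "Random Forest", "XGBoost"),
           rmse  = c(lm_rmse, rf_rmse, xgb_rmse))

Because the target is already on the log scale, the hold-out RMSE values in this comparison play the same role as the RMSE log-error scores the poster uses to conclude that XGBoost is the most accurate of the three models.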