Predicting the Market Value of the Property Using JMP® Pro 11


Girish Shirodkar (1) and Gaurav Pathak (1)
(1) Management Information Systems, Oklahoma State University, Stillwater, OK 74078

Introduction

Individuals and corporations often pay more, or receive less, in a property deal because they have limited knowledge of the factors that determine a property's market value. It is therefore important for any buyer or seller to understand how particular characteristics of a property drive its ultimate market value. Very few concrete studies have examined which factors decide the value of a property. This study, based on New York City property valuation data, attempts to identify the factors that govern the actual market value of a property. Using this data together with socio-demographic data derived from zip codes, JMP® Pro 11 is used to predict the market value of property in New York City.

Methods

Data Preparation

The New York City property valuation dataset consists of 11,329 observations and 39 variables. It is a multivariate dataset with missing values. The target variable is continuous and represents the market value of a property. The following steps were taken during data preparation and exploration:

- Identification of multicollinear variables using multivariate methods
- Missing value imputation using appropriate methods
- Outlier analysis using the Mahalanobis distance statistic

Fig. 1: Scatterplot matrix and ellipsoid 3D plot showing correlation amongst variables
Fig. 2: K-means clustering

Segment descriptions

Segment 1: Properties in NYC where the number of floors is between 3 and 7.
Segment 2: Properties in NYC with an actual total value between $12 million and $22 million and 10 to 25 floors.
Segment 3: Properties whose monthly maintenance is $20,000 or more and whose land cost is greater than $350,000.
Segment 4: Properties whose actual total value is greater than $360,000 and whose building front dimension is greater than 105.41 meters.
Segment 5: Properties whose dimensions are greater than 92 m x 118 m and whose land cost is greater than $2,800,000.
Segment 6: Properties with between 0 and 22 floors and a land cost greater than $850,000.

Clustering and Segment Profiling

To segment the properties of New York City into distinct groups, several clustering methods were applied to the imputed and transformed data. The resulting segments were then profiled to understand the characteristics of each one. After the segments were created, predictive models such as linear regression and decision trees were applied and compared for predicting the market value of a property in New York City. The following steps were taken during this phase:

- Creation and evaluation of clusters using hierarchical and k-means clustering methods (see Fig. 2)
- ANOVA testing to compare the means of variables across segments (see Fig. 3)
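The clustering and ANOVA steps above were carried out in JMP® Pro 11; the poster does not show the underlying computations. As a rough illustration only, the sketch below implements k-means (Lloyd's algorithm) and a one-way ANOVA F statistic in plain Python. The toy data, the function names, and the farthest-point initialization are assumptions made for this example and are not taken from the study or from JMP.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start.

    The initialization is an assumption of this sketch; JMP may use a
    different scheme.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


# Hypothetical, already-transformed features for six properties.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)

# Compare the mean of the first feature across the two segments.
groups = [[p[0] for p, lab in zip(points, labels) if lab == c] for c in range(2)]
f_stat = anova_f(groups)
```

A large F statistic indicates that the segment means differ by more than the within-segment spread would explain, which is the kind of evidence segment profiling relies on.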

Street Map Functionality

To get a real sense of how property values are distributed across New York City, we used the newly introduced street map functionality of JMP® Pro 11. Fig. 4 makes clear that commercial and rented properties are spread evenly across the length and breadth of Manhattan, while residential properties are concentrated near areas such as SoHo, the Flatiron Building, Canal Street and Lower Manhattan. Fig. 4 shows the distribution of property tax classes across all zip codes; the width of each color indicates the dominant property class by the full value of properties. Fig. 5 suggests that the most expensive residential properties are located in the Chelsea, Greenwich and SoHo localities of Manhattan. The most expensive commercial and leased properties are located in Lower Manhattan (Financial District), Times Square and the Lower East Side.

Fig. 3: Segment profiling for FULLVAL and AVTOT2

Predictive Modeling

The data was cleaned and prepared by adding demographic variables, computing new variables and transforming skewed variables. Forward, backward and stepwise linear regression models, a decision tree and a neural network were fitted, and the competing models were compared with each other. Based on the R-squared criterion, the forward regression model, with an R-squared value of 0.7508, outperformed the other models. Along with property characteristics such as extended land cost, the front and depth measurements of the building in which the property is located, and land costs, demographic variables such as the major industry prevailing in the area of the property emerged as important factors that ultimately drive the market value of the property.
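The poster does not describe how JMP® Pro 11 was configured for forward selection. The following is a minimal plain-Python sketch of the general idea, under stated assumptions: predictors are added one at a time, each time choosing the one that raises R-squared the most, and selection stops when the gain falls below a threshold. The stopping rule (min_gain), the predictor names and the toy data are all hypothetical; JMP's stepwise platform uses its own entry criteria.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R-squared of an ordinary least squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Greedy forward selection; min_gain is an assumed stopping rule."""
    chosen, best = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {
            name: r_squared([predictors[n] for n in chosen + [name]], y)
            for name in remaining
        }
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break  # no remaining predictor improves the fit enough
        chosen.append(name)
        best = scores[name]
        remaining.remove(name)
    return chosen, best


# Hypothetical toy data: value depends on "land" and "front" but not on "noise".
predictors = {
    "land": [1, 2, 3, 4, 5, 6],
    "front": [2, 1, 4, 3, 6, 5],
    "noise": [1, 0, 0, 1, 0, 1],
}
value = [2 * l + 3 * f for l, f in zip(predictors["land"], predictors["front"])]
chosen, best = forward_select(predictors, value)
```

Because R-squared never decreases when a predictor is added, some stopping rule is needed; otherwise forward selection would always end with every variable in the model.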
Conclusion and Discussion

- Variable transformation improves the performance of clustering algorithms compared with using skewed variables.
- The linear regression model outperformed other models such as decision trees and the neural network, and was therefore selected as the final model.
- Along with property characteristics such as extended land cost, the front and depth measurements of the building in which the property is located, and land costs, demographic variables such as the major industry prevailing in the area of the property emerged as important factors that influence property value.
- Properties whose actual total value is greater than $360,000 and whose building front dimension is greater than 105.41 meters fetch the highest market values.

References

https://nycopendata.socrata.com/Housing-Development/Property-Valuation-and-Assessment-Data/rgy2-tti8
http://www.tax.ny.gov/research/property/assess/manuals/vol6/ref/prclas.htm

Acknowledgements

We thank Prof. Dr. Goutam Chakraborty, founder of the SAS and OSU Business Analytics Program at Oklahoma State University, for his continued support and guidance.

Fig. 4: Linear regression model results and parameter estimates