DATA MODELING & PREPARATION Biz Pro 9th Study Group
DM Task Formulation, Data Collection, Data Cleaning, Data Exploration, Data Modeling, Define Purpose, Model Selection, Performance Evaluation, Model Deployment
Initial Phase: 90% Efforts
Middle Phase: 90% Professions
Final Phase: 90% Domains
Data Preparation and Exploration: Data Preparation, Data Visualization, Dimension Reduction
Prediction: Linear Regression, K-Nearest Neighbors, Neural Networks
Classification: K-Nearest Neighbors, Decision Trees, Logistic Regression, Neural Networks
Unsupervised: Cluster Analysis
Performance Evaluation: Cross Validation, Performance Measures
Time Series Forecasting: Regression Methods, Smoothing Methods, Linear Processes, Non-linear Processes
Model Deployment: Scoring New Data, Domain Expertise Analysis
SPSS is a Windows-based program used to perform data entry and analysis and to create tables and graphs. After SPSS 18.0, the company was acquired by IBM (2009).
Menu Bar & Icons
Data View / Variable View
The Data View works just like a spreadsheet in Excel.
The Variable View lets you name your variables, identify missing values, assign variable and value labels, etc.
File >> Open >> Data
Most of the time, the given variables are not coded the way we need.
Example: Gender = "Male" and "Female", but when modeling we want "Is_male" = 1 or 0.
Example: Score = 0–100 (raw score), but when analyzing we want "grades" = A, B, C, D, X.
Example: Binning a continuous variable (discretization).
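The recoding examples can be sketched outside SPSS as well. The following is a minimal illustration in plain Python, not the SPSS procedure itself. The sample records, the field names, and the grade cut-offs (90/80/70/60) are assumptions made for the example; the slides do not specify them.

```python
# Hypothetical records; field names and grade cut-offs are assumptions.
records = [
    {"gender": "Male", "score": 92},
    {"gender": "Female", "score": 75},
    {"gender": "Female", "score": 48},
]


def to_grade(score):
    """Map a raw 0-100 score to a letter grade using assumed cut-offs."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "X"


for row in records:
    # Recode the text category into a 0/1 indicator.
    row["is_male"] = 1 if row["gender"] == "Male" else 0
    # Discretize the raw score into a grade.
    row["grade"] = to_grade(row["score"])

for row in records:
    print(row)
```

Writing the results into new fields, rather than overwriting `gender` and `score`, keeps the raw values available for checking the recode.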
Transform >> Recode into Same Variables
Transform >> Visual Binning / Optimal Binning
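As a rough illustration of what binning does, the sketch below computes equal-width cut points and assigns each value to a bin. It is a simple unsupervised scheme chosen for the example and does not reproduce SPSS's Optimal Binning, which chooses cut points with respect to a guide variable. The `ages` values and both helper functions are hypothetical.

```python
from bisect import bisect_right

# Hypothetical continuous variable.
ages = [18, 22, 25, 31, 40, 47, 52, 65]


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points of equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def assign_bin(value, edges):
    """Return the 0-based bin index; a value equal to a cut point goes to the upper bin."""
    return bisect_right(edges, value)


edges = equal_width_edges(ages, 4)
bins = [assign_bin(a, edges) for a in ages]

print(edges)
print(bins)
```

Equal-width bins are easy to explain but can end up with very uneven counts when the data are skewed; cut points based on quantiles are a common alternative.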
Transform >> Create Dummy Variables (needs Python Essentials), or Recode into Same / Different Variables (takes some time…)
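Dummy coding turns one categorical variable into a set of 0/1 indicator variables, one per category. The sketch below shows the idea in plain Python; the `rows` data, the `city` field, and the `add_dummies` helper are assumptions for illustration, not part of SPSS.

```python
# Hypothetical data set with one categorical variable.
rows = [
    {"id": 1, "city": "Taipei"},
    {"id": 2, "city": "Tainan"},
    {"id": 3, "city": "Taipei"},
]


def add_dummies(rows, column):
    """Add one 0/1 indicator field per distinct category found in `column`."""
    categories = sorted({row[column] for row in rows})
    for row in rows:
        for cat in categories:
            row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
    return categories


categories = add_dummies(rows, "city")

for row in rows:
    print(row)
```

For regression-type models, one of the indicator variables is usually left out as the reference category, because the full set is perfectly collinear with the intercept.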
Data >> Select Cases (works only with numeric variables)
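Selecting cases amounts to keeping only the rows that satisfy a condition on numeric variables. A minimal sketch in plain Python follows; the `cases` data and the condition (male and score of at least 60) are made up for the example.

```python
# Hypothetical cases; gender is already recoded to the numeric is_male.
cases = [
    {"id": 1, "is_male": 1, "score": 92},
    {"id": 2, "is_male": 0, "score": 75},
    {"id": 3, "is_male": 1, "score": 48},
]

# Keep only the cases that satisfy the condition.
selected = [row for row in cases if row["is_male"] == 1 and row["score"] >= 60]

print(selected)
```

The condition uses the numeric `is_male` rather than the original text category, which is why recoding typically comes before case selection.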
Data >> Merge Files >> Add Variables (the key variable needs the same name in both files)
Merged files according to start_station
Merged files according to end_station
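Adding variables from a second file is a key-based join. The sketch below assumes, purely for illustration, a trip table with `start_station` and `end_station` columns and a station lookup table with a `district` field; the slides do not say what the merged files contain. It performs the merge twice, once per key, and leaves unmatched keys as `None`.

```python
# Hypothetical lookup file: one record per station.
stations = {
    "S01": {"district": "North"},
    "S02": {"district": "South"},
}

# Hypothetical main file: one record per trip.
trips = [
    {"trip_id": 1, "start_station": "S01", "end_station": "S02"},
    {"trip_id": 2, "start_station": "S02", "end_station": "S02"},
    {"trip_id": 3, "start_station": "S03", "end_station": "S01"},
]


def merge_lookup(rows, lookup, key, prefix, fields):
    """Left join: copy lookup fields onto each row, matched on row[key]."""
    merged = []
    for row in rows:
        match = lookup.get(row[key], {})
        # Prefix the new fields so the two merges do not overwrite each other.
        extra = {f"{prefix}_{name}": match.get(name) for name in fields}
        merged.append({**row, **extra})
    return merged


by_start = merge_lookup(trips, stations, "start_station", "start", ["district"])
by_both = merge_lookup(by_start, stations, "end_station", "end", ["district"])

for row in by_both:
    print(row)
```

Every trip is kept even when its station is missing from the lookup (here `S03`), so unmatched keys show up as `None` instead of silently dropping rows.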
END! Thanks!