DATA MODELING & PREPARATION, Biz Pro 9th Study Group

DM Task Formulation (Define Purpose) > Data Collection > Data Cleaning > Data Exploration > Data Modeling (Model Selection, Performance Evaluation) > Model Deployment
Initial Phase: 90% effort
Middle Phase: 90% professional expertise
Final Phase: 90% domain knowledge

Data Preparation and Exploration: Data Preparation, Data Visualization, Dimension Reduction
Prediction: Linear Regression, K-Nearest Neighbors, Neural Networks
Classification: K-Nearest Neighbors, Decision Trees, Logistic Regression, Neural Networks
Unsupervised: Cluster Analysis
Performance Evaluation: Cross Validation, Performance Measures
Time Series Forecasting: Regression Methods, Smoothing Methods, Linear Processes, Non-linear Processes
Model Deployment: Scoring New Data, Domain Expertise Analysis

SPSS is a Windows-based program used to enter and analyze data and to create tables and graphs. After SPSS 18.0, the company was acquired by IBM (2009).

Menu Bar & Icons
Data View / Variable View

Just like a spreadsheet in Excel

The Variable View lets you name variables, define missing values, and assign variable and value labels, among other properties.
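The idea behind user-defined missing values can be sketched in plain Python: certain codes are stored in the data but excluded from analysis. This is a minimal sketch, assuming a hypothetical `age` variable where 98 and 99 mean "don't know" and "no answer"; `AGE_META`, the codes, and `valid_values` are illustrative assumptions, not SPSS's API. `None` stands in for a system-missing value.

```python
from statistics import mean

# Hypothetical variable metadata, loosely mirroring what the Variable View stores:
# a name, a label, user-defined missing codes, and value labels.
AGE_META = {
    "name": "age",
    "label": "Respondent age in years",
    "missing": {98, 99},  # assumed codes: 98 = don't know, 99 = no answer
    "value_labels": {98: "Don't know", 99: "No answer"},
}


def valid_values(values, meta):
    """Drop None (our stand-in for system-missing) and user-missing codes."""
    return [v for v in values if v is not None and v not in meta["missing"]]


ages = [23, 35, 99, 41, None, 98, 30]
clean = valid_values(ages, AGE_META)
print(clean)        # [23, 35, 41, 30]
print(mean(clean))  # 32.25
```

Without the missing-value definition, 98 and 99 would be averaged in as real ages and inflate the mean.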

File >> Open >> Data

Most of the time, the given variables are not coded the way we need them.
Example: Gender = “Male” or “Female”, but when modeling we want “Is_male” = 1 or 0.
Example: Score = 0–100 (raw score), but when analyzing we want “grades” = A, B, C, D, X.
Example: Binning a continuous variable (discretization).
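The score-to-grade example can be sketched as a lookup against sorted cutoffs. The boundaries below (90/80/70/60, with X for anything under 60) are assumptions, since the slide names the grades but not the cutoffs; `score_to_grade` is a hypothetical helper.

```python
from bisect import bisect_right

# Assumed lower bounds for D, C, B, A (not given in the slides).
CUTOFFS = [60, 70, 80, 90]
GRADES = ["X", "D", "C", "B", "A"]  # "X" = below 60 (assumed to mean fail)


def score_to_grade(score):
    """Recode a raw 0-100 score into a letter grade; each cutoff is inclusive."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return GRADES[bisect_right(CUTOFFS, score)]


raw = [95, 82, 70, 59, 100, 0]
print([score_to_grade(s) for s in raw])  # ['A', 'B', 'C', 'X', 'A', 'X']
```

`bisect_right` counts how many cutoffs are at or below the score, so a score of exactly 70 lands in C rather than D.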

Transform >> Recode into Same Variables
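Recoding into the same variable overwrites the original values using a list of old-to-new pairs. A minimal plain-Python sketch, assuming hypothetical rows with a `gender` column; `recode_in_place` and `GENDER_MAP` are illustrative names, and this sketch leaves any value not in the mapping unchanged.

```python
# Hypothetical rows; in SPSS these would be cases in the Data View.
records = [
    {"id": 1, "gender": "Male"},
    {"id": 2, "gender": "Female"},
    {"id": 3, "gender": "Male"},
]

# Old value -> new value pairs, like those entered under "Old and New Values".
GENDER_MAP = {"Male": 1, "Female": 0}


def recode_in_place(rows, column, mapping):
    """Overwrite `column` in every row; values not in `mapping` are kept as-is."""
    for row in rows:
        row[column] = mapping.get(row[column], row[column])
    return rows


recode_in_place(records, "gender", GENDER_MAP)
print([r["gender"] for r in records])  # [1, 0, 1]
```

Because the original column is overwritten, the text values are lost; recoding into a different variable (a new column such as `is_male`) would keep both.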

Transform >> Visual Binning / Optimal Binning
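Visual Binning can generate cutpoints at equal-width intervals. The sketch below reproduces only that equal-width idea in plain Python; the function name `equal_width_bins`, the sample values, and the boundary convention (a value exactly on a cutpoint goes to the lower bin) are assumptions. Optimal Binning, which picks cutpoints with respect to a guide variable, is a different (supervised) method and is not reproduced here.

```python
from bisect import bisect_left


def equal_width_bins(values, n_bins):
    """Split the observed range into n_bins equal-width intervals.

    Returns the interior cutpoints and a bin number (1..n_bins) per value.
    A value equal to a cutpoint goes into the lower bin.
    """
    lo, hi = min(values), max(values)
    if n_bins < 1 or hi == lo:
        raise ValueError("need n_bins >= 1 and at least two distinct values")
    width = (hi - lo) / n_bins
    cutpoints = [lo + width * i for i in range(1, n_bins)]
    bins = [bisect_left(cutpoints, v) + 1 for v in values]
    return cutpoints, bins


values = [2, 10, 15, 22, 30, 41, 50]
cutpoints, bins = equal_width_bins(values, 4)
print(cutpoints)  # [14.0, 26.0, 38.0]
print(bins)       # [1, 1, 2, 2, 3, 4, 4]
```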

Transform >> Create Dummy Variables (requires Python Essentials)
or Recode into Same / Different Variables (spend some time…)
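Creating dummy variables turns one categorical variable into a set of 0/1 indicators. The plain-Python sketch below shows the same idea; the `create_dummies` helper, its `drop_first` option, and the `region` data are hypothetical illustrations, not the options of the SPSS extension.

```python
def create_dummies(values, prefix, drop_first=False):
    """Return one 0/1 indicator column per category, in sorted category order.

    With drop_first=True the first category is omitted and acts as the
    reference level, which avoids redundant columns in regression models.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}


region = ["North", "South", "East", "South"]
dummies = create_dummies(region, "region", drop_first=True)
print(dummies)
# {'region_North': [1, 0, 0, 0], 'region_South': [0, 1, 0, 1]}
```

For the two-category Gender case, `create_dummies(gender, "gender", drop_first=True)` leaves a single `gender_Male` column, which plays the role of `Is_male`.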

Data >> Select Cases (only works with numeric variables)
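Select Cases with an "If condition is satisfied" rule keeps only the cases that meet a condition. A minimal plain-Python sketch of that idea follows; the `select_cases` helper and the bike-trip rows (including the `duration_min` column) are hypothetical. It returns a new list, which behaves more like deleting unselected cases than like filtering them out while keeping them in the file.

```python
# Hypothetical bike-trip rows (illustrative data only).
trips = [
    {"trip_id": 1, "duration_min": 12, "start_station": 3},
    {"trip_id": 2, "duration_min": 95, "start_station": 7},
    {"trip_id": 3, "duration_min": 30, "start_station": 3},
]


def select_cases(rows, condition):
    """Keep only the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]


# Numeric condition, in line with the slide's note about numeric variables.
short_trips = select_cases(trips, lambda r: r["duration_min"] <= 60)
print([r["trip_id"] for r in short_trips])  # [1, 3]
```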

Data >> Merge Files >> Add Variables (both files need the same key variable name)

Merged files according to start_station
Merged files according to end_station
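A keyed Add Variables merge can be sketched as a left join on a shared key name. The sketch assumes a hypothetical station lookup file and trip file (all names and values are illustrative). It renames the station key to `start_station` and then to `end_station` so that each merge has a matching variable name, which is one reading of the two merges above. `add_variables` is a hypothetical helper, not an SPSS function.

```python
# Hypothetical lookup file: one row per station.
stations = [
    {"station_id": 3, "name": "Station A"},
    {"station_id": 7, "name": "Station B"},
]

trips = [
    {"trip_id": 1, "start_station": 3, "end_station": 7},
    {"trip_id": 2, "start_station": 7, "end_station": 3},
]


def add_variables(cases, lookup, key, rename):
    """Left-join lookup columns onto each case by `key`.

    `key` must have the same name in both files. `rename` maps lookup column
    names to new names so that two merges do not overwrite each other.
    """
    index = {row[key]: row for row in lookup}
    merged = []
    for case in cases:
        match = index.get(case[key], {})
        new = dict(case)
        for old_name, new_name in rename.items():
            new[new_name] = match.get(old_name)  # None when no key matches
        merged.append(new)
    return merged


# Rename the station key once per merge so it matches the trip file's key name.
by_start = [{"start_station": s["station_id"], "name": s["name"]} for s in stations]
by_end = [{"end_station": s["station_id"], "name": s["name"]} for s in stations]

result = add_variables(trips, by_start, "start_station", {"name": "start_name"})
result = add_variables(result, by_end, "end_station", {"name": "end_name"})
print(result[0])
# {'trip_id': 1, 'start_station': 3, 'end_station': 7, 'start_name': 'Station A', 'end_name': 'Station B'}
```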

END! Thanks!