Decision Tree (Rule Induction)

Presentation transcript:

Decision Tree (Rule Induction): Analysis of customer behavior and service modeling

Poll: Which data mining technique..?

Classification Process (10 records in total)
Step 1: Model construction with 6 training records
Training Data → Classification Algorithm → Classifier (Model)
Example of a learned rule: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’

Step 2: Test the model with the remaining 4 records, then use it in prediction
Testing Data → Classifier → accuracy estimate
Unseen Data (Jeff, Professor, 4) → Classifier → Tenured?
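A minimal sketch of this two-step process in Python with scikit-learn (the tools used later in this deck are See5/C5.0, SPSS Clementine, and SAS Enterprise Miner; scikit-learn stands in only for illustration). The six rank/years/tenured training records below are invented for the example and chosen to be consistent with the rule shown above.

```python
# Hypothetical illustration of the two-step classification process.
# The training records are invented for this sketch.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Step 1: model construction from labelled training records
train = pd.DataFrame({
    "rank":    ["professor", "assistant", "professor", "associate", "assistant", "associate"],
    "years":   [7, 3, 2, 8, 7, 4],
    "tenured": ["yes", "no", "yes", "yes", "yes", "no"],
})
X_train = pd.get_dummies(train[["rank", "years"]])   # one-hot encode 'rank', keep 'years' numeric
y_train = train["tenured"]
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: apply the classifier to unseen data, e.g. (Jeff, Professor, 4)
unseen = pd.DataFrame({"rank": ["professor"], "years": [4]})
X_unseen = pd.get_dummies(unseen).reindex(columns=X_train.columns, fill_value=0)
print("Tenured?", model.predict(X_unseen)[0])
```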

Who buys a notebook computer? The training dataset is given below; it follows the example from Quinlan’s ID3.

Tree Output: A Decision Tree for buys_computer

age?
  <=30:   student?  (no → no, yes → yes)
  31..40: yes
  >40:    credit_rating?  (excellent → yes, fair → no)
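Read as nested tests, the tree above corresponds to the short hand-written function below (a sketch only; the attribute names follow the slide and the leaf labels match the rules listed on the next slide).

```python
# Sketch: the decision tree above written out as nested if/else tests.
def buys_computer(age: int, student: str, credit_rating: str) -> str:
    if age <= 30:
        return "yes" if student == "yes" else "no"
    elif age <= 40:                      # the 31..40 branch
        return "yes"
    else:                                # the >40 branch
        return "yes" if credit_rating == "excellent" else "no"

print(buys_computer(28, "yes", "fair"))   # young student -> yes
print(buys_computer(45, "no", "fair"))    # older, fair credit -> no
```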

Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand

Example (one rule per leaf of the tree above; a sketch of automating this step follows this slide):
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31..40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
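One way to automate this extraction is to walk a fitted tree and conjoin the tests met on each root-to-leaf path. The helper below is a hypothetical sketch against scikit-learn's tree internals, fitted on the Iris data for brevity; See5/C5.0 and the other tools in this deck produce rulesets directly.

```python
# Sketch: print one IF-THEN rule per root-to-leaf path of a fitted tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

def print_rules(model, feature_names, class_names, node=0, conditions=None):
    t = model.tree_
    conditions = conditions or []
    if t.children_left[node] == -1:                      # leaf node: emit one rule
        predicted = class_names[t.value[node][0].argmax()]
        print("IF " + " AND ".join(conditions or ["TRUE"]) + " THEN class = " + str(predicted))
        return
    name = feature_names[t.feature[node]]
    threshold = t.threshold[node]
    print_rules(model, feature_names, class_names,
                t.children_left[node],  conditions + [f"{name} <= {threshold:.2f}"])
    print_rules(model, feature_names, class_names,
                t.children_right[node], conditions + [f"{name} > {threshold:.2f}"])

print_rules(clf, iris.feature_names, iris.target_names)
```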

An Example of ‘Car Buyers’ – Who buys Lexton?
[Table of 14 customer records with columns: No, Job, M/F, Area, Age, Y/N]
* (a, b, c) means a: total # of records, b: ‘N’ counts, c: ‘Y’ counts

Lab on Decision Tree (1)
Tools: SPSS Clementine, SAS Enterprise Miner, See5/C5.0
Download the See5/C5.0 2.02 evaluation version from http://www.rulequest.com

Lab on Decision Tree (2): From the initial screen, choose File – Locate Data.

Lab on Decision Tree (3): Select housing.data from the Samples folder and click Open.

Lab on Decision Tree (4): This dataset is about predicting house prices in the Boston area. It has 350 cases and 13 input variables.

Lab on Decision Tree (5)
Input variables:
- crime rate
- proportion large lots: residential space
- proportion industrial: ratio of commercial area
- CHAS: dummy variable
- nitric oxides ppm: pollution rate in ppm
- av rooms per dwelling: number of rooms per dwelling
- proportion pre-1940
- distance to employment centers: distance to the city center
- accessibility to radial highways: accessibility to highways
- property tax rate per $10,000
- pupil-teacher ratio: ratio of pupils to teachers
- B: racial statistics
- percentage low income earners: ratio of low-income people
Decision variable: house price in the Top 20% vs. the Bottom 80%
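The binary decision variable can be derived from a raw house-price column with a quantile cut. Below is a minimal sketch; the prices are synthetic stand-ins (350 of them, matching the slide's case count), and the column name "price" is an assumption rather than something taken from the See5 sample file.

```python
# Sketch: derive the Top 20% / Bottom 80% decision variable from a raw
# house-price column. The prices here are synthetic; with the real
# housing.data you would load the file first and apply the same cut.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.normal(loc=22.5, scale=9.0, size=350).round(1)})

threshold = df["price"].quantile(0.80)          # 80th percentile of price
df["price_class"] = np.where(df["price"] >= threshold, "Top 20%", "Bottom 80%")
print(df["price_class"].value_counts())
```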

Lab on Decision Tree (6): To run the analysis, click the Construct Classifier button, or choose Construct Classifier from the File menu.

Lab on Decision Tree (7): Check the Global pruning option, then click OK.

Lab on Decision Tree (8): The resulting decision tree, its evaluation on the training data, and its evaluation on the test data.
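The same training-versus-test comparison can be sketched generically in Python; this is not the See5 output, and the synthetic dataset below merely stands in for housing.data.

```python
# Sketch: compare a tree's accuracy on the data it was trained on with
# its accuracy on held-out test data (the usual way to spot overfitting).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=13, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Accuracy on training data:", clf.score(X_train, y_train))
print("Accuracy on test data:    ", clf.score(X_test, y_test))
```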

Lab on Decision Tree (9): Interpreting the picture: the variable tested at the top of the tree, av rooms per dwelling, is the most important variable in deciding house price.
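This reading can also be checked programmatically; the sketch below (again generic, not the See5 screen, and fitted on a stand-in dataset) reports the attribute chosen for the root split and the impurity-based importances of a fitted tree.

```python
# Sketch: report the root-split attribute and rank variables by
# impurity-based importance for a fitted decision tree.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()                    # stand-in dataset
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

root_feature = data.feature_names[clf.tree_.feature[0]]
print("Root split (most informative test):", root_feature)

ranked = sorted(zip(clf.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```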

Lab on Decision Tree (11): The rules are hard to read from the decision tree diagram. To view them as a ruleset, close the current screen and click Construct Classifier again, or choose Construct Classifier from the File menu.

Lab on Decision Tree (12): Check the Rulesets option, then click OK.

Lab on Decision Tree (13)