Decision Trees in R Arko Barman With additions and modifications by Ch. Eick COSC 4335 Data Mining.

Slides:



Advertisements
Similar presentations
Data Mining Lecture 9.
Advertisements

Data Mining Techniques: Classification. Classification What is Classification? –Classifying tuples in a database –In training set E each tuple consists.
IT 433 Data Warehousing and Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Part I Introduction to Data Mining by Tan,
Bab /44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree.
Classification: Definition Given a collection of records (training set ) –Each record contains a set of attributes, one of the attributes is the class.
Statistics 202: Statistical Aspects of Data Mining
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Classification: Definition l Given a collection of records (training set) l Find a model.
1 Data Mining Classification Techniques: Decision Trees (BUSINESS INTELLIGENCE) Slides prepared by Elizabeth Anglo, DISCS ADMU.
Decision Tree.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach,
Classification Kuliah 4 4/29/2015. Classification: Definition  Given a collection of records (training set )  Each record contains a set of attributes,
Data Mining Classification This lecture node is modified based on Lecture Notes for Chapter 4/5 of Introduction to Data Mining by Tan, Steinbach, Kumar,
Data Mining Classification: Naïve Bayes Classifier
Classification: Basic Concepts and Decision Trees.
Lecture Notes for Chapter 4 Introduction to Data Mining
Classification: Decision Trees, and Naïve Bayes etc. March 17, 2010 Adapted from Chapters 4 and 5 of the book Introduction to Data Mining by Tan, Steinbach,
CSci 8980: Data Mining (Fall 2002)
1 BUS 297D: Data Mining Professor David Mease Lecture 5 Agenda: 1) Go over midterm exam solutions 2) Assign HW #3 (Due Thurs 10/1) 3) Lecture over Chapter.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Lecture 5 (Classification with Decision Trees)
Example of a Decision Tree categorical continuous class Splitting Attributes Refund Yes No NO MarSt Single, Divorced Married TaxInc NO < 80K > 80K.
Arko Barman Slightly edited by Ch. Eick COSC 6335 Data Mining
Data Mining: Classification
1 Statistics 202: Statistical Aspects of Data Mining Professor David Mease Tuesday, Thursday 9:00-10:15 AM Terman 156 Lecture 11 = Finish ch. 4 and start.
DATA MINING LECTURE 9 Classification Basic Concepts Decision Trees.
1 Data Mining Lecture 3: Decision Trees. 2 Classification: Definition l Given a collection of records (training set ) –Each record contains a set of attributes,
Chapter 4 Classification. 2 Classification: Definition Given a collection of records (training set ) –Each record contains a set of attributes, one of.
Classification: Basic Concepts, Decision Trees, and Model Evaluation
Classification. 2 Classification: Definition  Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes.
Classification Basic Concepts, Decision Trees, and Model Evaluation
Lecture 7. Outline 1. Overview of Classification and Decision Tree 2. Algorithm to build Decision Tree 3. Formula to measure information 4. Weka, data.
Modul 6: Classification. 2 Classification: Definition  Given a collection of records (training set ) Each record contains a set of attributes, one of.
Review - Decision Trees
Comparing Univariate and Multivariate Decision Trees Olcay Taner Yıldız Ethem Alpaydın Department of Computer Engineering Bogazici University
SOCIAL NETWORKS ANALYSIS SEMINAR INTRODUCTORY LECTURE #2 Danny Hendler and Yehonatan Cohen Advanced Topics in on-line Social Networks Analysis.
Decision Tree Learning Debapriyo Majumdar Data Mining – Fall 2014 Indian Statistical Institute Kolkata August 25, 2014.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation COSC 4368.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach,
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach,
Classification: Basic Concepts, Decision Trees. Classification: Definition l Given a collection of records (training set ) –Each record contains a set.
Decision Trees Example of a Decision Tree categorical continuous class Refund MarSt TaxInc YES NO YesNo Married Single, Divorced < 80K> 80K Splitting.
Classification and Prediction
CIS671-Knowledge Discovery and Data Mining Vasileios Megalooikonomou Dept. of Computer and Information Sciences Temple University AI reminders (based on.
Lecture Notes for Chapter 4 Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4.
1 Illustration of the Classification Task: Learning Algorithm Model.
Big Data Analysis and Mining Qinpei Zhao 赵钦佩 2015 Fall Decision Tree.
Data Mining By Farzana Forhad CS 157B. Agenda Decision Tree and ID3 Rough Set Theory Clustering.
Classification: Basic Concepts, Decision Trees. Classification Learning: Definition l Given a collection of records (training set) –Each record contains.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining By Tan, Steinbach,
Decision Tree Lab. Load in iris data: Display iris data as a sanity.
Diagonal is sum of variances In general, these will be larger when “within” class variance is larger (a bad thing) Sw(iris[,1:4],iris[,5]) Sepal.Length.
Introduction to Data Mining Clustering & Classification Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
Data Mining CH6 Implementation: Real machine learning schemes(2) Reporter: H.C. Tsai.
Presentation prepared by Yehonatan Cohen and Danny Hendler Some of the slides based on the online book “Social media mining” Danny Hendler Advanced Topics.
Illustrating Classification Task
Lecture Notes for Chapter 4 Introduction to Data Mining
Data Mining Decision Tree Induction
Teori Keputusan (Decision Theory)
Data Mining Classification: Basic Concepts and Techniques
Classification Basic Concepts, Decision Trees, and Model Evaluation
Machine Learning” Notes 2
Weka Free and Open Source ML Suite Ian Witten & Eibe Frank
Basic Concepts and Decision Trees
آبان 96. آبان 96 Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan,
Statistical Learning Dong Liu Dept. EEIS, USTC.
Arko Barman COSC 6335 Data Mining
COSC 4368 Intro Supervised Learning Organization
COP5577: Principles of Data Mining Fall 2008 Lecture 4 Dr
Presentation transcript:

Decision Trees in R Arko Barman With additions and modifications by Ch. Eick COSC 4335 Data Mining

Example of a Decision Tree categorical continuous class Refund MarSt TaxInc YES NO YesNo Married Single, Divorced < 80K> 80K Splitting Attributes Training Data Model: Decision Tree

Another Example of Decision Tree categorical continuous class MarSt Refund TaxInc YES NO Yes No Married Single, Divorced < 80K> 80K There could be more than one tree that fits the same data!

Decision Tree Classification Task Decision Tree

Apply Model to Test Data Refund MarSt TaxInc YES NO YesNo Married Single, Divorced < 80K> 80K Test Data Start from the root of tree.

Apply Model to Test Data Refund MarSt TaxInc YES NO YesNo Married Single, Divorced < 80K> 80K Test Data

Decision Trees Used for classifying data by partitioning attribute space Tries to find axis-parallel decision boundaries for specified optimality criteria Leaf nodes contain class labels, representing classification decisions Keeps splitting nodes based on split criterion, such as GINI index, information gain or entropy Pruning necessary to avoid overfitting

Decision Trees in R mydata<-data.frame(iris) attach(mydata) library(rpart) model<-rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=mydata, method="class") plot(model) text(model,use.n=TRUE,all=TRUE,cex=0.8)

Decision Trees in R library(tree) model1<-tree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=mydata, method="class", split="gini") plot(model1) text(model1,all=TRUE,cex=0.6)

Decision Trees in R library(party) model2<-ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=mydata) plot(model2)

Controlling number of nodes library(tree) mydata<-data.frame(iris) attach(mydata) model1<-tree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=mydata, method="class", control = tree.control(nobs = 150, mincut = 10)) plot(model1) text(model1,all=TRUE,cex=0.6) predict(model1,iris) Note how the number of nodes is reduced by increasing the minimum number of observations in a child node! This is just an example. You can come up with better or more efficient methods!

Controlling number of nodes model2<-ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = mydata, controls = ctree_control(maxdepth=2)) plot(model2) Note that setting the maximum depth to 2 has reduced the number of nodes! This is just an example. You can come up with better or more efficient methods!

Scaling and Z-Scoring Datasets patched/library/base/html/scale.html patched/library/base/html/scale.html s<-scale(iris[1:4]) t<-scale(s, center=c(5,5,5,5), scale=FALSE) #subtracts the mean-vector and additionally (5,5,5,5) and does not dived by the standard deviation. ote+authentication ote+authentication