Decision Trees in R
Arko Barman, slightly edited by Ch. Eick, COSC 6335 Data Mining

Decision Trees
- Used for classifying data by partitioning the attribute space
- Try to find axis-parallel decision boundaries that optimize a specified splitting criterion
- Leaf nodes contain class labels, representing classification decisions
- Nodes are split recursively based on a splitting criterion such as the Gini index, information gain, or entropy (see the sketch below)
- Pruning is necessary to avoid overfitting
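A minimal sketch of the two impurity measures named above, written as plain R functions; the helper names gini_index and entropy_impurity are illustrative and not part of any package:

# Gini index of a vector of class labels: 1 - sum(p_k^2)
gini_index <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Entropy of a vector of class labels: -sum(p_k * log2(p_k))
entropy_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]          # avoid log2(0)
  -sum(p * log2(p))
}

# Example: impurity of the full iris data vs. one side of a candidate split
gini_index(iris$Species)                            # about 0.667 for the balanced 3-class data
gini_index(iris$Species[iris$Petal.Length < 2.45])  # 0 -- this split isolates setosa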

Decision Trees in R

mydata <- data.frame(iris)   # the built-in iris data set
attach(mydata)
library(rpart)               # CART-style recursive partitioning
model <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = mydata, method = "class")
plot(model)                                        # draw the tree structure
text(model, use.n = TRUE, all = TRUE, cex = 0.8)   # label nodes with splits and class counts
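To check how well the fitted tree classifies the training data, one possible follow-up (not on the original slide) is to predict class labels and tabulate them against the true species:

predicted <- predict(model, mydata, type = "class")    # class predictions from the rpart tree
table(Predicted = predicted, Actual = mydata$Species)  # confusion matrix on the training data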

Decision Trees in R

library(tree)                # an alternative tree-growing package
model1 <- tree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = mydata, method = "class", split = "gini")  # split on the Gini index
plot(model1)
text(model1, all = TRUE, cex = 0.6)
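Since the first slide notes that pruning is needed to avoid overfitting, one possible extension (not on the original slide) is cross-validated pruning with cv.tree and prune.misclass from the same tree package; the choice best = 3 below is only illustrative:

cv_result <- cv.tree(model1, FUN = prune.misclass)  # cross-validated misclassification vs. tree size
plot(cv_result)                                     # pick the size with the lowest error
pruned <- prune.misclass(model1, best = 3)          # keep roughly 3 terminal nodes
plot(pruned)
text(pruned, all = TRUE, cex = 0.6)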

Decision Trees in R

library(party)               # conditional inference trees
model2 <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = mydata)
plot(model2)
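As with the rpart model, one way to inspect the fit (an addition, not shown on the slides) is to compare the ctree predictions with the true labels:

pred2 <- predict(model2)                           # fitted class labels from the ctree model
table(Predicted = pred2, Actual = mydata$Species)  # training-set confusion matrix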

Controlling the number of nodes
This is just an example. You can come up with better or more efficient methods!

library(tree)
model1 <- tree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = mydata, method = "class",
               control = tree.control(nobs = 150, mincut = 10))  # each child node must contain at least 10 observations
plot(model1)
text(model1, all = TRUE, cex = 0.6)

Note how the number of nodes is reduced by increasing the minimum number of observations in a child node!
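tree.control also exposes minsize (the smallest node size that will still be split) and mindev (the minimum within-node deviance, relative to the root, required for a split); the values below are illustrative, not recommendations:

model1b <- tree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = mydata,
                control = tree.control(nobs = 150, minsize = 20, mindev = 0.05))
plot(model1b)
text(model1b, all = TRUE, cex = 0.6)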

Controlling the number of nodes
This is just an example. You can come up with better or more efficient methods!

model2 <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = mydata,
                controls = ctree_control(maxdepth = 2))  # grow the tree at most 2 levels deep
plot(model2)

Note that setting the maximum depth to 2 has reduced the number of nodes!
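ctree_control offers further knobs as well, e.g. mincriterion (the threshold the split's test criterion must exceed) and minbucket (the minimum number of observations in a terminal node); the settings below are only an illustration:

model2b <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                 data = mydata,
                 controls = ctree_control(mincriterion = 0.99, minbucket = 20))
plot(model2b)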