Basics of Decision Trees
 A flowchart-like hierarchical tree structure
 – Often restricted to a binary structure
 Root: represents the entire dataset
 A node without children is called a leaf node; otherwise it is called an internal node
 – Internal nodes: denote a test on an attribute
 – Branches (splits): represent the outcomes of the test
   - Univariate split (based on a single attribute)
   - Multivariate split (based on a combination of attributes)
 – Leaf nodes: represent class labels or class distributions
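
The node types described above can be captured in a small data structure. The following is a minimal illustrative Python sketch (not from the original slides), assuming binary, univariate splits; the class and field names are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Optional, Any

@dataclass
class TreeNode:
    """A node in a binary decision tree with univariate splits."""
    # Internal-node fields: which attribute is tested and the value it is compared against
    split_attribute: Optional[str] = None
    split_value: Any = None
    left: Optional["TreeNode"] = None   # branch taken when the test is satisfied
    right: Optional["TreeNode"] = None  # branch taken when it is not
    # Leaf-node field: the predicted class label (None for internal nodes)
    prediction: Optional[str] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None
```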

Example (training dataset)
 Three predictor attributes: salary, age, employment
 Class label attribute: group

Record ID   Salary   Age   Employment   Group
1           30K      30    Self         C
2           40K      35    Industry     C
3           70K      50    Academia     C
4           60K      45    Self         B
5           70K      30    Academia     B
6           60K      35    Industry     A
7           60K      35    Self         A
8           70K      30    Self         A
9           40K      45    Industry     C
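
For the illustrative code sketches that follow, the same training set can be written down directly as Python data. This encoding and the variable name are assumptions, not part of the slides.

```python
# The training set from the table above; salary is given in thousands (K)
TRAINING_DATA = [
    {"id": 1, "salary": 30, "age": 30, "employment": "Self",     "group": "C"},
    {"id": 2, "salary": 40, "age": 35, "employment": "Industry", "group": "C"},
    {"id": 3, "salary": 70, "age": 50, "employment": "Academia", "group": "C"},
    {"id": 4, "salary": 60, "age": 45, "employment": "Self",     "group": "B"},
    {"id": 5, "salary": 70, "age": 30, "employment": "Academia", "group": "B"},
    {"id": 6, "salary": 60, "age": 35, "employment": "Industry", "group": "A"},
    {"id": 7, "salary": 60, "age": 35, "employment": "Self",     "group": "A"},
    {"id": 8, "salary": 70, "age": 30, "employment": "Self",     "group": "A"},
    {"id": 9, "salary": 40, "age": 45, "employment": "Industry", "group": "C"},
]
```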

Example (univariate split)
 Each internal node of a decision tree is labeled with a predictor attribute, called the splitting attribute; each leaf node is labeled with a class label.
 Each edge originating from an internal node is labeled with a splitting predicate that involves only that node's splitting attribute.
 The combined information about splitting attributes and splitting predicates at a node is called the split criterion.

[Figure: an example decision tree for the training set above; the root splits on Salary (<= 50K / > 50K), with further splits on Age (<= 40 / > 40) and Employment (Academia, Industry / Self), and leaves labeled Group A, Group B, and Group C.]

Two Phases
 Most decision tree generation consists of two phases:
 – Tree construction
 – Tree pruning

Building Decision Trees
 Basic algorithm (a greedy algorithm)
 – The tree is constructed in a top-down, recursive, divide-and-conquer manner
 – At the start, all the training examples are at the root
 – Attributes are categorical (if continuous-valued, they are discretized in advance)
 – Examples are partitioned recursively based on selected attributes
 – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Tree-Building Algorithm
 We can build the whole tree by calling BuildTree(dataset TrainingData, split-selection-method CL)

Top-Down Decision Tree Induction Schema (Binary Splits)
Input: dataset S, split-selection-method CL
Output: decision tree for S

BuildTree(dataset S, split-selection-method CL)
(1) If all points in S are in the same class, then return;
(2) Use CL to evaluate splits for each attribute;
(3) Use the best split found to partition S into S1 and S2;
(4) BuildTree(S1, CL);
(5) BuildTree(S2, CL)
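
A runnable Python sketch of this schema is shown below, assuming the TreeNode class and TRAINING_DATA from the earlier sketches. The split-selection method CL is passed in as a function; all names are illustrative assumptions, not from the slides.

```python
from collections import Counter

def build_tree(rows, choose_split, class_attr="group"):
    """Top-down, recursive, binary decision tree induction (the schema above)."""
    labels = [r[class_attr] for r in rows]
    # (1) If all points are in the same class, return a leaf
    if len(set(labels)) == 1:
        return TreeNode(prediction=labels[0])
    # (2)-(3) Let the split-selection method pick the best (attribute, value) split
    split = choose_split(rows)
    if split is None:  # no useful split found: fall back to a majority-class leaf
        return TreeNode(prediction=Counter(labels).most_common(1)[0][0])
    attribute, value = split
    if isinstance(value, (int, float)):
        test = lambda r: r[attribute] <= value     # numeric split
    else:
        test = lambda r: r[attribute] == value     # categorical split
    s1 = [r for r in rows if test(r)]
    s2 = [r for r in rows if not test(r)]
    if not s1 or not s2:  # degenerate split: stop here
        return TreeNode(prediction=Counter(labels).most_common(1)[0][0])
    # (4)-(5) Recurse on both partitions
    return TreeNode(split_attribute=attribute, split_value=value,
                    left=build_tree(s1, choose_split, class_attr),
                    right=build_tree(s2, choose_split, class_attr))

def predict(node, row):
    """Route a record from the root down to a leaf and return its class label."""
    while not node.is_leaf():
        if isinstance(node.split_value, (int, float)):
            go_left = row[node.split_attribute] <= node.split_value
        else:
            go_left = row[node.split_attribute] == node.split_value
        node = node.left if go_left else node.right
    return node.prediction
```

A choose_split function could, for example, enumerate candidate (attribute, value) pairs and return the one that maximizes information gain or minimizes the gini index, as sketched in the next slides.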

Split Selection
 Information gain / gain ratio (ID3/C4.5)
 – All attributes are assumed to be categorical
 – Can be modified for continuous-valued attributes
 Gini index (IBM IntelligentMiner)
 – All attributes are assumed to be continuous-valued
 – Assume there exist several possible split values for each attribute
 – May need other tools, such as clustering, to get the possible split values
 – Can be modified for categorical attributes

Information Gain (ID3)
 T – the training set; S – any set of cases
 freq(C_j, S) – the number of cases in S that belong to class C_j
 |S| – the number of cases in set S
 The information of set S is defined as
   info(S) = - Σ_j ( freq(C_j, S) / |S| ) × log2( freq(C_j, S) / |S| )
 Now consider a similar measurement after T has been partitioned in accordance with the n outcomes of a test X. The expected information requirement can be found as the weighted sum over the subsets:
   info_X(T) = Σ_{i=1..n} ( |T_i| / |T| ) × info(T_i)
 The quantity
   gain(X) = info(T) - info_X(T)
 measures the information that is gained by partitioning T in accordance with the test X. The gain criterion then selects a test that maximizes this information gain.
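
As an illustration (not from the slides), these formulas translate directly into Python; the helpers below assume the row-of-dicts encoding of the training data used earlier.

```python
import math
from collections import Counter

def info(rows, class_attr="group"):
    """info(S): entropy of the class distribution in S."""
    counts = Counter(r[class_attr] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_x(partitions, class_attr="group"):
    """info_X(T): expected information after splitting T into the given subsets."""
    total = sum(len(p) for p in partitions)
    return sum((len(p) / total) * info(p, class_attr) for p in partitions if p)

def gain(rows, partitions, class_attr="group"):
    """gain(X) = info(T) - info_X(T)."""
    return info(rows, class_attr) - info_x(partitions, class_attr)
```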

Gain Ratio (C4.5)
 The gain criterion has a serious deficiency: it has a strong bias in favor of tests with many outcomes.
 To compensate, we define
   split info(X) = - Σ_{i=1..n} ( |T_i| / |T| ) × log2( |T_i| / |T| )
 This represents the potential information generated by dividing T into n subsets, whereas the information gain measures the information relevant to classification that arises from the same division.
 Then
   gain ratio(X) = gain(X) / split info(X)
 expresses the proportion of information generated by the split that is useful, i.e., that appears helpful for classification.
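
Continuing the illustrative helpers from the previous sketch (names again are assumptions):

```python
import math

def split_info(partitions):
    """split info(X): potential information generated by the split itself."""
    total = sum(len(p) for p in partitions)
    return -sum((len(p) / total) * math.log2(len(p) / total)
                for p in partitions if p)

def gain_ratio(rows, partitions, class_attr="group"):
    """gain ratio(X) = gain(X) / split info(X)."""
    si = split_info(partitions)
    return gain(rows, partitions, class_attr) / si if si > 0 else 0.0
```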

Gini Index (IBM IntelligentMiner)
 If a data set T contains examples from n classes, the gini index gini(T) is defined as
   gini(T) = 1 - Σ_{j=1..n} p_j^2
 where p_j is the relative frequency of class j in T.
 If the data set T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data is defined as
   gini_split(T) = (N_1 / N) × gini(T_1) + (N_2 / N) × gini(T_2),   where N = N_1 + N_2
 The attribute that provides the smallest gini_split(T) is chosen to split the node (this requires enumerating all possible splitting points for each attribute).
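
An illustrative Python version of these two formulas, using the same row-of-dicts encoding:

```python
from collections import Counter

def gini(rows, class_attr="group"):
    """gini(T) = 1 - sum of squared relative class frequencies."""
    counts = Counter(r[class_attr] for r in rows)
    total = len(rows)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def gini_split(t1, t2, class_attr="group"):
    """Weighted gini index of a binary split of T into T1 and T2."""
    n = len(t1) + len(t2)
    return (len(t1) / n) * gini(t1, class_attr) + (len(t2) / n) * gini(t2, class_attr)
```

A choose_split function for the earlier build_tree sketch could enumerate candidate (attribute, value) pairs and pick the one minimizing gini_split, or maximizing gain or gain_ratio.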

Pruning Decision Trees
 Why prune?
 – Overfitting
   - Too many branches, some of which may reflect anomalies due to noise or outliers
   - The result is poor accuracy on unseen samples
 Two approaches to pruning
 – Prepruning: halt tree construction early; do not split a node if this would cause the goodness measure to fall below a threshold
   - It is difficult to choose an appropriate threshold
 – Postpruning: remove branches from a "fully grown" tree, producing a sequence of progressively pruned trees (sketched below)
   - Use a set of data different from the training data to decide which is the "best pruned tree"
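
As one concrete, illustrative example of postpruning, reduced-error pruning replaces an internal node with a leaf whenever doing so does not hurt accuracy on a held-out validation set. The sketch below assumes the TreeNode and predict helpers from the earlier sketches and is not taken from the slides; labeling the replacement leaf from the validation subset is a simplification of the usual method.

```python
from collections import Counter

def accuracy(node, rows, class_attr="group"):
    return sum(predict(node, r) == r[class_attr] for r in rows) / len(rows)

def majority_label(rows, class_attr="group"):
    return Counter(r[class_attr] for r in rows).most_common(1)[0][0]

def reduced_error_prune(node, validation_rows, class_attr="group"):
    """Bottom-up: collapse a subtree into a leaf if validation accuracy does not drop."""
    if node.is_leaf() or not validation_rows:
        return node
    # Split the validation data the same way this node splits the training data
    if isinstance(node.split_value, (int, float)):
        test = lambda r: r[node.split_attribute] <= node.split_value
    else:
        test = lambda r: r[node.split_attribute] == node.split_value
    left_rows = [r for r in validation_rows if test(r)]
    right_rows = [r for r in validation_rows if not test(r)]
    node.left = reduced_error_prune(node.left, left_rows, class_attr)
    node.right = reduced_error_prune(node.right, right_rows, class_attr)
    # Try replacing this node with a single leaf (simplification: label from validation data)
    before = accuracy(node, validation_rows, class_attr)
    candidate = TreeNode(prediction=majority_label(validation_rows, class_attr))
    if accuracy(candidate, validation_rows, class_attr) >= before:
        return candidate
    return node
```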

Well-known Pruning Methods
 Reduced Error Pruning
 Pessimistic Error Pruning
 Minimum Error Pruning
 Critical Value Pruning
 Cost-Complexity Pruning
 Error-Based Pruning

Extracting Rules from Decision Trees
 Represent the knowledge in the form of IF-THEN rules
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction
 The leaf node holds the class prediction
 Rules are easier for humans to understand
 Example
 – IF salary = ">50K" AND age = ">40" THEN group = "C"
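
A small illustrative routine for enumerating the root-to-leaf paths of the TreeNode sketch as IF-THEN rule strings (names and formatting are assumptions):

```python
def extract_rules(node, conditions=None):
    """Return one IF-THEN rule string per root-to-leaf path."""
    conditions = conditions or []
    if node.is_leaf():
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f'IF {antecedent} THEN group = "{node.prediction}"']
    if isinstance(node.split_value, (int, float)):
        left_cond = f"{node.split_attribute} <= {node.split_value}"
        right_cond = f"{node.split_attribute} > {node.split_value}"
    else:
        left_cond = f'{node.split_attribute} = "{node.split_value}"'
        right_cond = f'{node.split_attribute} != "{node.split_value}"'
    return (extract_rules(node.left, conditions + [left_cond]) +
            extract_rules(node.right, conditions + [right_cond]))
```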

Why Decision Tree Induction
 Compared to a neural network or a Bayesian classifier, a decision tree is easily interpreted and comprehended by humans
 While training neural networks can take large amounts of time and thousands of iterations, inducing decision trees is efficient and thus suitable for large training sets
 Decision tree generation algorithms do not require additional information besides that already contained in the training data
 Decision trees display good classification accuracy compared to other techniques
 Decision tree induction can use SQL queries for accessing databases

Tree Quality Measures
 Accuracy
 Complexity
 – Tree size
 – Number of leaf nodes
 Computational speed

Scalability
 Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed
 SLIQ (EDBT'96, Mehta et al.)
 – Builds an index for each attribute; only the class list and the current attribute list reside in memory
 SPRINT (VLDB'96, J. Shafer et al.)
 – Constructs an attribute-list data structure
 PUBLIC (VLDB'98, Rastogi & Shim)
 – Integrates tree splitting and tree pruning: stops growing the tree earlier
 RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti)
 – Separates the scalability aspects from the criteria that determine the quality of the tree
 – Builds an AVC-list (attribute, value, class label); see the sketch below
 BOAT
 CLOUDS
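
RainForest's AVC-sets are essentially per-attribute contingency tables of (attribute value, class label) counts, which is all that split-selection measures such as gini or information gain need. The following is only an illustrative sketch of that idea over the row-of-dicts data, not the paper's actual data structures.

```python
from collections import Counter, defaultdict

def build_avc_sets(rows, predictor_attrs, class_attr="group"):
    """AVC-sets per attribute: avc[attribute][value][class_label] -> count."""
    avc = {a: defaultdict(Counter) for a in predictor_attrs}
    for r in rows:                      # one scan over the node's partition suffices
        for a in predictor_attrs:
            avc[a][r[a]][r[class_attr]] += 1
    return avc

# Example with the training set defined earlier:
# avc = build_avc_sets(TRAINING_DATA, ["salary", "age", "employment"])
```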

Other Issues
 Allow for continuous-valued attributes
 – Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals (see the sketch after this list)
 Handle missing attribute values
 – Assign the most common value of the attribute
 – Assign a probability to each of the possible values
 Attribute (feature) construction
 – Create new attributes based on existing ones that are sparsely represented
 – This reduces fragmentation, repetition, and replication
 Incremental tree induction
 Integration of data warehousing techniques
 Different data access methods
 Bias in split selection
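
For the first point, one common way to discretize a continuous attribute on the fly is to sort its distinct values and treat the midpoints between consecutive values as candidate binary split thresholds. An illustrative sketch, assuming the gini_split helper from the earlier sketch:

```python
def candidate_thresholds(rows, attribute):
    """Midpoints between consecutive distinct values of a continuous attribute."""
    values = sorted({r[attribute] for r in rows})
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def best_threshold(rows, attribute, class_attr="group"):
    """Pick the threshold with the smallest weighted gini index (illustrative)."""
    best = None
    for t in candidate_thresholds(rows, attribute):
        left = [r for r in rows if r[attribute] <= t]
        right = [r for r in rows if r[attribute] > t]
        score = gini_split(left, right, class_attr)
        if best is None or score < best[0]:
            best = (score, t)
    return best  # (gini_split value, threshold), or None if the attribute has one value
```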

Decision Tree Induction Using P-trees
 Basic ideas
 – Calculate information gain, gain ratio, or gini index using the count information recorded in P-trees
 – P-tree generation replaces sub-sample set creation
 – Use P-trees to determine whether all the samples are in the same class
 – All of this without an additional database scan

Using P-trees to Calculate Information Gain / Gain Ratio
 C – the class label attribute
 P_S – the P-tree of set S
 freq(C_j, S) = rc{P_S ∧ P_C(V_Cj)}
 |S| = rc{P_S}
 |T_i| = rc{P_T ∧ P(V_Xi)}
 |T| = rc{P_T}
 So every formula for information gain and gain ratio can be calculated directly using P-trees.
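
The identities above rely on the fact that the root count rc of a P-tree gives the number of 1-bits in the underlying bit vector, and that P-trees can be ANDed. Purely as an analogy (not an actual P-tree implementation), the counting pattern can be illustrated with Python sets of record positions standing in for P-trees, set intersection for the AND, and set size for rc:

```python
def bitmap(rows, predicate):
    """Stand-in for a P-tree: the set of record positions where the predicate holds."""
    return {i for i, r in enumerate(rows) if predicate(r)}

def rc(bits):
    """Stand-in for the P-tree root count rc{...}: the number of 1-bits."""
    return len(bits)

# Mirroring freq(C_j, S) = rc{P_S AND P_C(V_Cj)} on the example training set:
p_s = bitmap(TRAINING_DATA, lambda r: True)                 # "P-tree" of the whole set S
p_c = bitmap(TRAINING_DATA, lambda r: r["group"] == "C")    # "P-tree" of class value C
freq_c_in_s = rc(p_s & p_c)                                 # = 4 for the example data
```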

P-Classifier versus ID3
[Figure: classification cost with respect to the dataset size]