1 Data Mining dr Iwona Schab Decision Trees

2 Method of classification. A recursive procedure which progressively divides a set of n units into groups according to a division rule. Designed for supervised prediction problems, i.e. a set of input variables is used to predict the value of a target variable. The primary goal is prediction: the fitted tree model is used to predict the target variable for new cases (i.e. to score new cases/data). Result: a final partition of the observations and the Boolean rules needed to score new data.
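A minimal sketch of this fit-then-score workflow, assuming scikit-learn's DecisionTreeClassifier and purely synthetic data (none of the values below come from the lecture):

```python
# Fit a tree on a training set, then score new cases (illustrative sketch).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Training set: two numeric inputs and a binary target.
X_train = rng.normal(size=(1000, 2))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

# Fit the tree: a recursive partition of the input space.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Score new cases: each observation is dropped down the tree to a leaf,
# and the leaf supplies the predicted class and class probabilities.
X_new = rng.normal(size=(5, 2))
print(tree.predict(X_new))         # predicted class per new case
print(tree.predict_proba(X_new))   # class-membership probabilities per leaf
```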

3 Decision Tree. A predictive model represented in a tree-like structure: a root node, splits based on the values of the inputs, internal nodes, and terminal nodes (the leaves).

4 Decision tree. A nonparametric method. Allows nonlinear relationships to be modelled. Conceptually sound and easy to interpret. Robust against outliers. Detects and takes into account potential interactions between input variables. Additional uses: categorisation of continuous variables, grouping of nominal values.

5 Decision Trees. Types: classification trees (categorical response variable), where the leaves give the predicted class and the probability of class membership; regression trees (continuous response variable), where the leaves give the predicted value of the target. Example applications: handwriting recognition, medical research, financial and capital markets.
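A short sketch contrasting the two types, assuming scikit-learn and illustrative synthetic data:

```python
# Classification tree vs regression tree (illustrative sketch, not from the lecture).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 1))

# Classification tree: categorical target, leaves return class probabilities.
y_class = (X[:, 0] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=2).fit(X, y_class)
print(clf.predict_proba([[0.3]]))   # probability of class membership in the leaf

# Regression tree: continuous target, leaves return the mean target value.
y_reg = np.sin(3 * X[:, 0]) + rng.normal(scale=0.1, size=500)
reg = DecisionTreeRegressor(max_depth=2).fit(X, y_reg)
print(reg.predict([[0.3]]))         # predicted value = leaf mean (a step function)
```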

6 Decision Tree. The path to each leaf can be expressed as a Boolean rule: if … then …. The 'regions' of the input space are determined by the split values: intersections of subspaces, each defined by a single splitting variable. A regression tree model is therefore a multivariate step function. Leaves represent the predicted target: all cases in a particular leaf are given the same predicted target. Splits: binary, or multiway (inputs partitioned into disjoint ranges).
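The if … then … rules behind each leaf can be listed explicitly; a sketch assuming scikit-learn's export_text helper and made-up data:

```python
# Extracting the if ... then ... rule for each leaf (illustrative sketch).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
y = ((X[:, 0] > 0.2) & (X[:, 1] < 0.5)).astype(int)

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Each root-to-leaf path prints as a nested if/then rule over the split values;
# the corresponding input regions are axis-parallel rectangles.
print(export_text(tree, feature_names=["x1", "x2"]))
```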

7 Analytical decisions: the recursive partitioning rule / splitting criterion; the pruning criterion / stopping criterion; the assignment of the predicted target variable.

8 Recursive partitioning rule. The method used to fit the tree: a top-down, greedy algorithm. It starts at the root node; splits involving each single input are examined (disjoint subsets of nominal inputs, disjoint ranges of ordinal / interval inputs). The splitting criterion measures the reduction in variability of the target distribution in the child nodes and is used to choose the split. The chosen split determines the partitioning of the observations. The partition is then repeated in each child node as if it were the root node of a new tree; the process continues deeper in the tree and is repeated recursively until it is stopped by the stopping rule.
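A minimal sketch of this greedy top-down procedure for a binary target, using the basic (misclassification) impurity and binary splits; the function names, the impurity choice, and the stopping thresholds are illustrative assumptions, not the lecture's exact algorithm:

```python
# Greedy top-down recursive partitioning: a minimal illustrative sketch.
import numpy as np

def impurity(y):
    """Basic (misclassification) impurity: share of the minority class in the node."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return min(p, 1 - p)

def best_split(X, y):
    """Examine candidate binary splits on every input; keep the largest impurity drop."""
    n, d = X.shape
    parent = impurity(y)
    best = None
    for j in range(d):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            drop = parent - (left.mean() * impurity(y[left]) +
                             (~left).mean() * impurity(y[~left]))
            if best is None or drop > best[0]:
                best = (drop, j, t)
    return best  # (impurity drop, input index, threshold)

def grow(X, y, depth=0, max_depth=3, min_size=10):
    """Recursively partition each child node as if it were a new root, until a stopping rule fires."""
    split = best_split(X, y)
    if depth >= max_depth or len(y) < min_size or split is None or split[0] <= 0:
        return {"leaf": True, "prediction": int(round(np.mean(y)))}  # majority class
    _, j, t = split
    left = X[:, j] <= t
    return {"leaf": False, "input": j, "threshold": t,
            "left": grow(X[left], y[left], depth + 1, max_depth, min_size),
            "right": grow(X[~left], y[~left], depth + 1, max_depth, min_size)}
```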

9 Splits on (at least) ordinal input

10 Splits on nominal input

11 Binary splits

12 Partitioning rule – possible variations. Incorporating some type of look-ahead or backup: such variants often produce inferior trees and have not been shown to be an improvement (Murthy and Salzberg, 1995). Oblique splits: splits on linear combinations of inputs (as opposed to the standard coordinate-axis splits, i.e. boundaries parallel to the input coordinates).

13 Recursive partitioning algorithm

14 Stopping criterion. Governs the depth and complexity of the tree; the right balance between depth and complexity is needed. When the tree is too complex: perfect discrimination in the training sample, lost stability, lost ability to generalise the discovered patterns and relations, overfitting to the training sample, and difficulties with the interpretation of the predictive rules. There is a trade-off between adjustment to the training sample and the ability to generalise.
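A sketch of this trade-off, assuming scikit-learn and synthetic noisy data (the printed figures are illustrative, not from the lecture):

```python
# Depth/complexity trade-off: training fit vs generalisation (illustrative sketch).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5))
y = ((X[:, 0] + rng.normal(scale=1.0, size=2000)) > 0).astype(int)  # noisy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for depth in (None, 3):  # None: grow until pure; 3: restricted depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
# The unrestricted tree discriminates (almost) perfectly on the training sample
# but typically generalises worse to the test sample than the shallower tree.
```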

15 Splitting criterion. Impurity reduction; chi-square test. An exhaustive tree algorithm considers all possible partitions, of all inputs, at every node, which leads to a combinatorial explosion.

16 Splitting criterion. Minimise impurity within the child nodes / maximise the differences between the newly split child nodes, i.e. choose the split into child nodes which maximises the drop in impurity resulting from the partition of the parent node and maximises the difference between the nodes. Measures of impurity: basic ratio, Gini impurity index, entropy. Measures of difference: based on relative frequencies (classification tree) or on the target variance (regression tree).
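The three impurity measures named above, written out for a binary node; a small sketch with an illustrative class proportion:

```python
# Impurity measures for a binary node (illustrative sketch).
import numpy as np

def basic_ratio(p):
    """Basic impurity: share of the minority class, min{p, 1 - p}."""
    return min(p, 1 - p)

def gini(p):
    """Gini impurity index: 1 - sum of squared class proportions."""
    return 1 - (p**2 + (1 - p)**2)

def entropy(p):
    """Entropy (in bits): -sum p_i * log2 p_i, with 0 * log 0 treated as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# A node with 80% 'good' and 20% 'bad' cases:
p_bad = 0.2
print(basic_ratio(p_bad), gini(p_bad), entropy(p_bad))  # 0.2, 0.32, ~0.722
```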

17 Binary decision trees. A nonparametric model, so no assumptions regarding the distribution are needed. Classifies observations into pre-defined groups; the target variable is predicted for the whole leaf. Supervised segmentation: in the basic case, a recursive partition into two separate categories in order to maximise the similarity of observations within a leaf and maximise the differences between leaves. Tree model = rules of segmentation. No prior selection of input variables is required.

18 Trees vs hierarchical segmentation. Hierarchical segmentation: a descriptive approach, unsupervised classification, segmentation based on all variables, each partitioning based on all variables at a time (driven by a distance measure). Trees: a predictive approach, supervised classification, segmentation based on the target variable, each partitioning (usually) based on one variable at a time.

19 Requirements. A large data sample. In the case of classification trees: a sufficient number of cases falling into each class of the target (suggested: at least 500 cases per class).

20 Stopping criterion. The node reaches a pre-defined size (e.g. 10 or fewer cases). The algorithm has run for the predefined number of generations. The split results in a (too) small drop in impurity. Expected losses in the testing sample. Stability of results in the testing sample. Probabilistic assumptions regarding the variables (e.g. the CHAID algorithm).
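Several of these rules map directly onto pre-pruning parameters in common implementations; a sketch assuming scikit-learn (the numeric values are illustrative):

```python
# Pre-pruning parameters as stopping rules (illustrative values).
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    max_depth=5,                 # stop after a predefined number of generations (tree depth)
    min_samples_leaf=10,         # do not create nodes below a pre-defined size
    min_impurity_decrease=0.01,  # reject splits with too small a drop in impurity
)
# Rules based on expected losses or on stability are typically checked afterwards
# on a separate testing sample rather than set as fitting parameters.
```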

21 Target assignment to the leaf

22 Disadvantages. Lack of stability (often); stability is assessed on the basis of a testing sample, without formal statistical inference. In the case of a classification tree, the target value is calculated in a separate step with a "simplistic" method (dominating-frequency assignment). The target value is calculated at the leaf level, not at the level of the individual observation.
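A tiny sketch of what dominating-frequency assignment means in practice (the leaf contents below are hypothetical):

```python
# Dominating-frequency assignment at the leaf level (hypothetical leaf contents):
# every case scored into the leaf receives the class that is most frequent among
# the leaf's training cases, regardless of its individual attribute values.
from collections import Counter

leaf_training_targets = ["good"] * 37 + ["bad"] * 13
predicted_class = Counter(leaf_training_targets).most_common(1)[0][0]
print(predicted_class)  # 'good' -> assigned to every case in this leaf
```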

Splitting example: drop of impurity ΔI. Basic impurity index: I(v) = min{p1(v), p2(v)}, the share of the minority class in node v. Average impurity of the child nodes: p(l)·I(l) + p(r)·I(r). Drop of impurity: ΔI = I(v) - [p(l)·I(l) + p(r)·I(r)].

Splitting example: alternative splitting criteria. Gini impurity index: Gini(v) = 1 - Σ p_i(v)^2. Entropy: E(v) = -Σ p_i(v)·log2 p_i(v). Pearson's chi-square test for relative frequencies compares the class distributions of the candidate child nodes.

Splitting example: the data.

Age      #G      #B      Odds (of being good)
Young    800     200     4 : 1
Medium   500     100     5 : 1
Older    300     100     3 : 1
Total    1600    400     4 : 1

How to split the (in this case ordinal) variable "age"? (young + older) vs. medium, or (young + medium) vs. older?

Splitting example, candidate split 1: Young + Older = r versus Medium = l. I(v) = min{400/2000; 1600/2000} = 0.2; p(r) = 1400/2000 = 0.7; p(l) = 600/2000 = 0.3; I(r) = 300/1400; I(l) = 100/600. Average child impurity: 0.7·(300/1400) + 0.3·(100/600) = 0.15 + 0.05 = 0.2, so ΔI = 0.2 - 0.2 = 0.

Splitting example, candidate split 2: Young + Medium = r versus Older = l. I(v) = min{400/2000; 1600/2000} = 0.2; p(r) = 1600/2000 = 0.8; p(l) = 400/2000 = 0.2; I(r) = 300/1600; I(l) = 100/400. Average child impurity: 0.8·(300/1600) + 0.2·(100/400) = 0.15 + 0.05 = 0.2, so ΔI = 0 here as well: the basic impurity index does not distinguish between the two candidate splits.

Splitting example, candidate split 1 (continued): Young + Older = r versus Medium = l; p(r) = 1400/2000 = 0.7; p(l) = 600/2000 = 0.3.

Splitting example, candidate split 2 (continued): Young + Medium = r versus Older = l; p(r) = 1600/2000 = 0.8; p(l) = 400/2000 = 0.2.
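A small computation of the whole example, a sketch that evaluates the impurity drop of both candidate splits under the basic ratio, Gini, and entropy criteria (only the node counts come from the slides; the Gini and entropy figures are computed here):

```python
# Impurity drop for the two candidate splits of 'age' (counts from the example above).
import numpy as np

def basic(p_bad):
    return min(p_bad, 1 - p_bad)

def gini(p_bad):
    return 1 - (p_bad**2 + (1 - p_bad)**2)

def entropy(p_bad):
    ps = np.array([p_bad, 1 - p_bad])
    ps = ps[ps > 0]
    return float(-(ps * np.log2(ps)).sum())

def impurity_drop(measure, parent, children):
    """parent = (n_bad, n_total); children = list of (n_bad, n_total)."""
    n = parent[1]
    avg_child = sum((nt / n) * measure(nb / nt) for nb, nt in children)
    return measure(parent[0] / parent[1]) - avg_child

parent = (400, 2000)
split1 = [(300, 1400), (100, 600)]   # (young + older) vs. medium
split2 = [(300, 1600), (100, 400)]   # (young + medium) vs. older

for name, m in [("basic", basic), ("gini", gini), ("entropy", entropy)]:
    print(name,
          round(impurity_drop(m, parent, split1), 4),
          round(impurity_drop(m, parent, split2), 4))
# The basic ratio gives a drop of 0 for both splits; Gini and entropy give small
# positive drops and slightly prefer split 2, which is why they are used in practice.
```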