CSE5230/DMS/2003/7 Data Mining - CSE5230 Decision Trees

Lecture Outline
- Why use Decision Trees?
- What is a Decision Tree?
- Examples
- Use as a data mining technique
- Popular Models
  - CART
  - CHAID
  - ID3 & C4.5

Why use Decision Trees? - 1
- Whereas neural networks compute a mathematical function of their inputs to generate their outputs, decision trees use logical rules, e.g.:
  IF Petal-length > 2.6 AND Petal-width ≤ 1.65 AND Petal-length > 5 AND Sepal-length > 6.05 THEN the flower is Iris virginica
- NB. This is not the only rule for this species. What is the other?

[Figure adapted from [SGI2001]: a decision tree for the iris data]
  Petal-length ≤ 2.6: Iris setosa
  Petal-length > 2.6:
    Petal-width > 1.65: Iris virginica
    Petal-width ≤ 1.65:
      Petal-length ≤ 5: Iris versicolor
      Petal-length > 5:
        Sepal-length > 6.05: Iris virginica
        Sepal-length ≤ 6.05: Iris versicolor

Why use Decision Trees? - 2
- For some applications, accuracy of classification or prediction is sufficient, e.g.:
  - A direct mail firm needing to find a model for identifying customers who will respond to mail
  - Predicting the stock market using past data
- In other applications it is better (sometimes essential) that the decision be explained, e.g.:
  - Rejection of a credit application
  - Medical diagnosis
- Humans generally require explanations for most decisions

Why use Decision Trees? - 3
- Example: when a bank rejects a credit card application, it is better to explain to the customer that it was due to the fact that:
  - He/she is not a permanent resident of Australia, AND
  - He/she has been residing in Australia for < 6 months, AND
  - He/she does not have a permanent job.
- This is better than saying: "We are very sorry, but our neural network thinks that you are not a credit-worthy customer." (in which case the customer might become angry and move to another bank)

What is a Decision Tree?
- Built from the root node (top) to the leaf nodes (bottom)
- A record first enters the root node
- A test is applied to determine to which child node it should go next. A variety of algorithms for choosing the initial test exists; the aim is to discriminate best between the target classes
- The process is repeated until a record arrives at a leaf node
- The path from the root to a leaf node provides an expression of a rule

[Figure: the iris tree from the previous slide, with its parts labelled: root node, test, child node, path, leaf nodes]
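
To make this concrete, the labelled iris tree is just a set of nested logical tests. A minimal Python sketch, assuming the thresholds from the figure; the function name and argument order are illustrative, not from any library:

    def classify_iris(petal_length, petal_width, sepal_length):
        """Walk the iris tree from the figure, returning the leaf's class."""
        if petal_length <= 2.6:
            return "Iris setosa"
        if petal_width > 1.65:
            return "Iris virginica"    # the "other" rule for this species
        if petal_length <= 5:
            return "Iris versicolor"
        # Petal-length > 5: one final test on sepal length
        return "Iris virginica" if sepal_length > 6.05 else "Iris versicolor"

    print(classify_iris(6.0, 1.5, 6.3))  # follows the rule in the text -> Iris virginica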

Building a Decision Tree - 1
- Algorithms for building decision trees (DTs) begin by trying to find the test which does the "best job" of splitting the data into the desired classes
- The desired classes have to be identified at the start
- Example: we need to describe the profiles of customers of a telephone company who "churn" (do not renew their contracts). The DT building algorithm examines the customer database to find the best splitting criterion among candidate variables:
  - Phone technology
  - Age of customer
  - Time has been a customer
  - Gender
- The DT algorithm may discover that the "Phone technology" variable is best for separating churners from non-churners

Building a Decision Tree - 2
- The process is repeated to discover the best splitting criterion for the records assigned to each node
- Once built, the effectiveness of a decision tree can be measured by applying it to a collection of previously unseen records and observing the percentage of correctly classified records

[Figure: the first levels of the churn tree: "Phone technology" splits into "new" and "old"; the "old" branch is a leaf of churners, and the "new" branch is split on "Time has been a customer" at 2.3 years]
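
The "percentage of correctly classified records" measure is simple to compute. A self-contained sketch, where classify stands in for any fitted tree's decision function and the toy records are purely illustrative:

    def accuracy(records, labels, classify):
        """Percentage of holdout records whose predicted class matches the label."""
        correct = sum(classify(r) == label for r, label in zip(records, labels))
        return 100.0 * correct / len(records)

    # Toy holdout set; the lambda mimics the "old phones churn" rule above.
    holdout = [({"phone": "old"}, "churn"), ({"phone": "new"}, "no churn")]
    records, labels = zip(*holdout)
    print(accuracy(records, labels,
                   lambda r: "churn" if r["phone"] == "old" else "no churn"))  # 100.0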

Example - 1
Requirement: classify customers who churn, i.e. do not renew their phone contracts (adapted from [BeS1997]).

[Figure: the full churn tree]
  Phone Technology (50 churners, 50 non-churners)
    new: Time has been a Customer (30 churners, 50 non-churners)
      ≤ 2.3 years: Age (25 churners, 10 non-churners)
        ≤ 35: 20 churners, 0 non-churners
        > 35: 5 churners, 10 non-churners
      > 2.3 years: 5 churners, 40 non-churners
    old: 20 churners, 0 non-churners

Example - 2
- The number of records in a given parent node equals the sum of the records contained in its child nodes
- It is quite easy to understand how the model is being built (unlike NNs)
- It is easy to use the model, say, for a targeted marketing campaign aimed at customers likely to churn
- Provides intuitive ideas about the customer base, e.g.: "Customers who have been with the company for a couple of years and have new phones are pretty loyal"

Use as a data mining technique - 1
- Exploration
  - Analyzing the predictors and splitting criteria selected by the algorithm may provide interesting insights which can be acted upon
  - e.g. if the following rule was identified:
    IF time a customer < 1.1 years AND sales channel = telesales THEN chance of churn is 65%
  - It might be worthwhile conducting a study on the way the telesales operators are making their calls

Use as a data mining technique - 2
- Exploration (continued)
  - Gleaning information from rules that fail
  - e.g. from the phone example we obtained the rule:
    IF Phone technology = new AND Time has been a customer ≤ 2.3 years AND Age > 35 THEN there are only 15 customers (15% of total)
  - Can this rule be useful?
    - Perhaps we can attempt to build up this small market segment. If this is possible then we have an edge over competitors, since we have a head start in this knowledge
    - We can remove these customers from our direct marketing campaign, since there are so few of them

Use as a data mining technique - 3
- Exploration (continued)
  - Again from the phone company example, we noticed that there was no combination of rules to reliably discriminate between churners and non-churners for the small market segment mentioned on the previous slide (5 churners, 10 non-churners)
  - Do we consider this an occasion where it was not possible to achieve our objective?
  - From this failure we have learnt that age is not all that important for this category of churners (unlike those under 35). Perhaps we were asking the wrong questions all along - this warrants further analysis

Use as a data mining technique - 4
- Data Pre-processing
  - Decision trees are very robust at handling different predictor types (numeric/categorical) and run quickly, so they can be good for a first pass over the data in a data mining operation
  - This will create a subset of the possibly useful predictors, which can then be fed into another model, say a neural network
- Prediction
  - Once the decision tree is built, it can then be used as a prediction tool on a new set of data
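
As an illustration of this first-pass idea, the sketch below uses scikit-learn (a modern library, not a tool covered in this lecture) on randomly generated toy data: fit a shallow tree, then keep only the predictors the tree actually used for a downstream model. All names and data here are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    X = np.random.rand(200, 10)               # 200 records, 10 candidate predictors
    y = (X[:, 0] + X[:, 3] > 1).astype(int)   # toy target for illustration

    # A quick, shallow tree as the "first pass" over the data
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

    # Predictors the tree actually split on; feed only these to the next model
    useful = np.where(tree.feature_importances_ > 0)[0]
    print("Predictors worth passing to the next model:", useful)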

Popular Decision Tree Models: CART
- CART: Classification And Regression Trees, developed in 1984 by a team of researchers (Leo Breiman et al.) from Stanford University and the University of California, Berkeley
- Used in the DM software Darwin - from Thinking Machines Corporation (recently bought by Oracle)
- Often uses an entropy measure to determine the split point (Shannon's information theory):
    measure of disorder (MOD) = -Σ_i p_i * log2(p_i)
  where p_i is the probability of prediction value i occurring in a particular node of the tree
- Other measures used include Gini and twoing
- CART produces a binary tree

CART - 2
- Consider the "Churn" problem from slide 7.9
- At the first node there are 100 customers to split: 50 who churn and 50 who don't
- The MOD of this node is: MOD = -0.5*log2(0.5) - 0.5*log2(0.5) = 1.00
- The algorithm will try each predictor. For each predictor, it will calculate the MOD of the split produced by several values to identify the optimum
- Splitting on "Phone technology" produces two nodes, one with 30 churners and 50 non-churners, the other with 20 churners and 0 non-churners
- The first of these has MOD = -3/8*log2(3/8) - 5/8*log2(5/8) ≈ 0.95, and the second has a MOD of 0
- CART will select the predictor producing nodes with the lowest (record-weighted) MOD as the split point
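
These figures can be checked with a few lines of Python. mod() is an illustrative helper for the formula above, not a library function:

    from math import log2

    def mod(counts):
        """Measure of disorder: -sum(p * log2(p)) over the class counts in a node."""
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c > 0)

    print(mod([50, 50]))   # root node: 1.0
    print(mod([30, 50]))   # "new" node: about 0.95
    print(mod([20, 0]))    # "old" node: 0.0

    # CART compares candidate splits by the record-weighted MOD of the
    # nodes they produce and keeps the split with the lowest value.
    weighted = (80 * mod([30, 50]) + 20 * mod([20, 0])) / 100
    print(weighted)        # about 0.76, down from 1.0 before the split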

Node splitting

An ideally good split (each node is pure):

  Node 1           Node 2
  Name   Churned?  Name   Churned?
  Jim    Yes       Bob    No
  Sally  Yes       Betty  No
  Steve  Yes       Sue    No
  Joe    Yes       Alex   No

An ideally bad split (each node is an even mix):

  Node 1           Node 2
  Name   Churned?  Name   Churned?
  Jim    Yes       Bob    No
  Sally  Yes       Betty  No
  Steve  No        Sue    Yes
  Joe    No        Alex   Yes

Popular Decision Tree Models: CHAID
- CHAID: Chi-squared Automatic Interaction Detector, developed by J. A. Hartigan in 1975
- Widely used, since it is distributed as part of the popular statistical packages SAS and SPSS
- Differs from CART in the way it identifies the split points: instead of the information measure, it uses the chi-squared test (a statistical test of independence between categorical variables) to identify the split points
- All predictors must be categorical, or put into categorical form by binning
- The accuracy of the two methods, CHAID and CART, has been found to be similar
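
A sketch of the CHAID-style scoring of one candidate split, using SciPy's chi-squared test of independence on the class-by-branch contingency table. The counts reuse the "Phone technology" split from the running example; CHAID's full procedure involves more than this single test, so this only illustrates the core idea:

    from scipy.stats import chi2_contingency

    #                churn  no churn
    observed = [[30,     50],     # Phone technology = new
                [20,      0]]     # Phone technology = old
    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(chi2, p_value)  # a small p-value marks a strong split candidate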

Popular Decision Tree Models: ID3 & C4.5
- ID3: Iterative Dichotomiser, developed by the Australian researcher Ross Quinlan in 1979
- Used in the data mining software Clementine of Integral Solutions Ltd. (taken over by SPSS)
- ID3 picks predictors and their splitting values on the basis of the information gain they provide
- Gain is the difference between the amount of information needed to make a correct prediction before the split and the amount needed after it
- If the amount of information required is much lower after the split is made, the split is said to have decreased the disorder of the original data
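
Information gain can be written directly in terms of the entropy (MOD) measure from the CART slide. A minimal self-contained sketch with illustrative names; the figures reuse the "Phone technology" split:

    from math import log2

    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c > 0)

    def information_gain(parent, children):
        """Entropy before the split minus the weighted entropy after it."""
        n = sum(parent)
        after = sum(sum(ch) / n * entropy(ch) for ch in children)
        return entropy(parent) - after

    # Phone-technology split: root (50, 50) into (30, 50) and (20, 0)
    print(information_gain([50, 50], [[30, 50], [20, 0]]))  # about 0.24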

ID3 & C4.5 - 2

[Figure: two candidate splits, A and B, compared by their information gain]

ID3 & C4.5 - 3
- Split A will be selected
- C4.5 introduces a number of extensions to ID3:
  - Handles unknown field values in the training set
  - Tree pruning method
  - Automated rule generation

Strengths and Weaknesses
- Strengths of decision trees:
  - Able to generate understandable rules
  - Classify with very little computation
  - Handle both continuous and categorical data
  - Provide a clear indication of which variables are most important for prediction or classification
- Weaknesses:
  - Not appropriate for estimation or prediction of continuous values (income, interest rates, etc.)
  - Problematic with time series data (much pre-processing required)
  - Can be computationally expensive to train

References
[SGI2001] Silicon Graphics Inc., MLC++ Utilities Manual, 2001. http://www.sgi.com/tech/mlc/utils.html
[BeL1997] M. J. A. Berry and G. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons, 1997.
[BeS1997] A. Berson and S. J. Smith, Data Warehousing, Data Mining and OLAP, McGraw Hill, 1997.