Chapter 6 Decision Trees

An Example The classic "weather" tree (P = play, N = don't play): test Outlook first. Outlook = sunny: test Humidity (high = N, normal = P). Outlook = overcast: P. Outlook = rain: test Windy (true = N, false = P).
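
A tree like this can be sketched in Python as nested dictionaries (a hypothetical encoding, not from the slides: internal nodes are `{attribute: {value: subtree}}`, leaves are class labels):

```python
# Hypothetical encoding of the slide's weather tree:
# internal nodes are {attribute: {value: subtree}}, leaves are "P"/"N".
tree = {
    "outlook": {
        "sunny": {"humidity": {"high": "N", "normal": "P"}},
        "overcast": "P",
        "rain": {"windy": {"true": "N", "false": "P"}},
    }
}

def classify(node, instance):
    """Walk from the root to a leaf, following the instance's attribute values."""
    while isinstance(node, dict):
        attribute = next(iter(node))  # the attribute tested at this node
        node = node[attribute][instance[attribute]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "high"}))  # N
print(classify(tree, {"outlook": "overcast"}))                   # P
```

Each call follows exactly one root-to-leaf path, which is the "unique path = rule" idea made concrete.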

Another Example - Grades If Percent >= 90%: Grade = A. Else if 89% >= Percent >= 80%: Grade = B. Else if 79% >= Percent >= 70%: Grade = C. Etc...
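
The grades tree is just a chain of threshold tests, so it maps directly onto an if/elif ladder (a minimal sketch; the slide leaves the branches below 70% unspecified, so the final return value here is a placeholder):

```python
def grade(percent):
    """Follow the slide's grade tree: each internal node is one threshold test."""
    if percent >= 90:
        return "A"
    elif percent >= 80:
        return "B"
    elif percent >= 70:
        return "C"
    else:
        return "below C"  # the slide's "Etc..." branch, left unspecified

print(grade(95))  # A
print(grade(84))  # B
```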

Yet Another Example [Figure: decision tree for the contact-lens data.] English rules (for example): If tear production rate = reduced then recommendation = none. If age = young and astigmatism = no and tear production rate = normal then recommendation = soft. If age = pre-presbyopic and astigmatism = no and tear production rate = normal then recommendation = soft. If age = presbyopic and spectacle prescription = myope and astigmatism = no then recommendation = none. If spectacle prescription = hypermetrope and astigmatism = no and tear production rate = normal then recommendation = soft. If spectacle prescription = myope and astigmatism = yes and tear production rate = normal then recommendation = hard.

Decision Tree Template Drawn top-to-bottom or left-to-right. Top (or left-most) node = root node. Descendant node(s) = child node(s). Bottom (or right-most) node(s) = leaf node(s). Each unique path from the root to a leaf = a rule.
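
The "unique path from root to leaf = rule" idea can be sketched directly (a hypothetical nested-dict encoding of a small tree, not from the slides):

```python
# Internal nodes are {attribute: {value: subtree}}; leaves are class labels.
tree = {"outlook": {"sunny": {"humidity": {"high": "N", "normal": "P"}},
                    "overcast": "P"}}

def rules(node, path=()):
    """Yield one (conditions, label) rule per unique root-to-leaf path."""
    if not isinstance(node, dict):
        yield (path, node)
        return
    attribute = next(iter(node))
    for value, child in node[attribute].items():
        yield from rules(child, path + ((attribute, value),))

for conditions, label in rules(tree):
    print(" and ".join(f"{a} = {v}" for a, v in conditions), "->", label)
```

This prints one IF/THEN rule per leaf, e.g. "outlook = overcast -> P", which is exactly how a tree is translated into a rule set.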

Introduction Decision trees are powerful and popular for classification and prediction. They represent rules, and the rules can be expressed in English: IF Age <= 43 AND Sex = Male AND Credit Card Insurance = No THEN Life Insurance Promotion = No. Rules can also be expressed in SQL for querying. Decision trees are useful for exploring data, to gain insight into the relationships between a large number of candidate input variables and a target (output) variable. You use mental decision trees often! Game: "I'm thinking of..." "Is it ...?"
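
The slide's English rule can be written as a predicate (an illustrative sketch only; the attribute names and values follow the slide, and `None` here just means this single rule says nothing about other cases):

```python
def life_insurance_promotion(age, sex, credit_card_insurance):
    """The slide's IF/THEN rule as a predicate (illustrative only)."""
    if age <= 43 and sex == "Male" and credit_card_insurance == "No":
        return "No"
    return None  # not covered by this one rule

print(life_insurance_promotion(40, "Male", "No"))   # No
print(life_insurance_promotion(50, "Male", "No"))   # None
```

A full tree is a set of such rules, one per leaf, that together cover every case.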

Decision Tree – What is it? A structure that divides a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules. A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous groups with respect to a particular target variable.

Decision Tree Types Binary trees: only two choices at each split; can be non-uniform (uneven) in depth. N-way trees (ternary and beyond): three or more choices at one or more of their splits (3-way, 4-way, etc.).

Scoring It is often useful to show the proportion of the data in each of the desired classes at each node (see Fig. 6.2).

Decision Tree Splits (Growth) The best split at the root or at a child node is the one that does the best job of separating the data into groups where a single class predominates in each group. Example: US population data, with categorical input variables/attributes such as zip code, gender, and age. Split the data according to the "best split" rule above.
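
Growing a tree starts by partitioning the records on each candidate attribute, then scoring the resulting groups. The partitioning step can be sketched like this (the records are made-up toy data in the spirit of the slide's US-population example):

```python
from collections import defaultdict

def split_by(records, attribute):
    """Partition records into child groups, one group per value of the attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    return dict(groups)

# Toy records (made-up values) with the slide's candidate attributes.
people = [
    {"zip": "10001", "gender": "F", "age": 30},
    {"zip": "10001", "gender": "M", "age": 45},
    {"zip": "94105", "gender": "F", "age": 30},
]
print(sorted(split_by(people, "gender")))  # ['F', 'M']
```

A split-selection algorithm would call `split_by` once per candidate attribute and keep the split whose groups are purest.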

Example: Good & Poor Splits [Figure: a good split separates the data into groups where one class predominates; a poor split leaves the classes mixed.]

Split Criteria The best split is the one that does the best job of separating the data into groups where a single class predominates in each group. The measure used to evaluate a potential split is purity: the best split is the one that increases the purity of the subsets by the greatest amount. A good split also creates nodes of similar size, or at least does not create very small nodes.

Tests for Choosing Best Split Purity (diversity) measures: Gini (population diversity), entropy (information gain), information gain ratio, chi-square test. We will only explore Gini in class.

Gini (Population Diversity) The Gini measure of a node is the sum of the squares of the proportions of the classes. Root node: 0.5^2 + 0.5^2 = 0.5 (even balance). Leaf node: 0.1^2 + 0.9^2 = 0.82 (close to pure).
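
The slide's arithmetic can be checked directly (a minimal sketch; note this is the sum-of-squares purity form used on the slide, where 1.0 is a pure node, not the 1 - sum(p^2) impurity form used in some other texts):

```python
def gini(proportions):
    """Sum of squared class proportions: 1.0 = pure node, 1/k = evenly mixed over k classes."""
    return sum(p * p for p in proportions)

print(gini([0.5, 0.5]))           # 0.5  (evenly balanced root node)
print(round(gini([0.1, 0.9]), 2)) # 0.82 (close to pure leaf)
```

A split would then be scored by the purity of its child nodes, typically weighted by how many records land in each child.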

Pruning Decision trees can often be simplified or pruned. Approaches include CART, C5, and stability-based pruning. We will not cover these in detail.

Decision Tree Advantages Easy to understand. Map nicely to a set of business rules. Applicable to real problems. Make no prior assumptions about the data. Able to process both numerical and categorical data.

Decision Tree Disadvantages Output attribute must be categorical. Limited to one output attribute. Decision tree algorithms are unstable. Trees created from numeric datasets can be complex.

Alternative Representations Box diagram, tree ring diagram, decision table. (Supplementary material.)

End of Chapter 6