Presentation transcript:

From Heather’s blog: http://www.prettystrongmedicine.com/p/about.html

Decision Trees

Real-world applications of DTs. See here for a list: http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/survey/node32.html Includes: Agriculture, Astronomy, Biomedical Engineering, Control Systems, Financial analysis, Manufacturing and Production, Medicine, Molecular biology, Object recognition, Pharmacology, Physics, Plant diseases, Power systems, Remote Sensing, Software development, Text processing.

[Build slides annotating the example data table: field names, field values, class values.]

Why decision trees?
- Popular, since they are interpretable … and correspond to human reasoning/thinking about decision-making
- Can perform quite well in accuracy when compared with other approaches … and there are good algorithms to learn decision trees from data

Figure 1. Binary Strategy as a tree model. Mohammed MA, Rudge G, Wood G, Smith G, et al. (2012) Which Is More Useful in Predicting Hospital Mortality - Dichotomised Blood Test Results or Actual Test Values? A Retrospective Study in Two Hospitals. PLoS ONE 7(10): e46860. doi:10.1371/journal.pone.0046860 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0046860

We will learn the ‘classic’ algorithm to learn a DT from categorical data:

We will learn the ‘classic’ algorithm to learn a DT from categorical data: ID3

Suppose we want a tree that helps us predict someone’s politics, given their gender, age, and wealth. [Slide shows a small training table with fields gender, age, wealth and class politics; values include male/female, young/middle-aged/old, rich/poor, and class values Right-wing/Left-wing.]

Choose a start node (field) at random.

Choose a start node (field) at random. [Tree so far: a single unlabelled root node, shown as ?]

Choose a start node (field) at random. [Tree so far: the root node is labelled Age.]

Add branches for each value of this field. [Tree so far: root Age with branches young, mid, old.]

Check to see what has filtered down. [Tree so far: Age with branches young: 1 L, 2 R; old: 1 L, 1 R; mid: 0 L, 1 R.]

Where possible, assign a class value. [Tree so far: the mid branch, with 0 L, 1 R, is labelled Right-Wing.]

Otherwise, we need to add further nodes. [Tree so far: the young and old branches still end in ?; mid is Right-Wing.]

Repeat this process every time we need a new node.

Starting with the first new node – choose a field at random. [Tree so far: the first unexpanded branch now leads to a new node, wealth; the other still ends in ?; mid remains Right-Wing.]

Check the classes of the data at this node… [Tree so far: the wealth node has branches rich and poor, with class counts of 1 L, 0 R on one branch and 1 L, 1 R on the other.]

And so on … [Tree so far: the pure wealth branch is given a class label; the branch that is still mixed, with 1 L, 1 R, needs yet another node.]

But we can do better than randomly chosen fields!

This is the tree we get if the first choice is ‘gender’.

This is the tree we get if the first choice is ‘gender’. [Tree: gender, with branches male → Right-Wing and female → Left-Wing.]

Algorithms for building decision trees (of this type)
Initialise: tree T contains one ‘unexpanded’ node
Repeat until there are no unexpanded nodes:
- remove an unexpanded node U from T
- expand U by choosing a field
- add the resulting nodes to T
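As a rough Python sketch of this loop (the Node class, the rows-as-dicts format, and the choose_field hook are my own assumptions, not from the slides):

```python
# Minimal sketch of the generic tree-building loop above.
# Rows are assumed to be dicts with one key per field plus a 'class' key.

class Node:
    def __init__(self, rows):
        self.rows = rows        # training rows that reach this node
        self.field = None       # field this node tests, once expanded
        self.children = {}      # field value -> child Node
        self.label = None       # class label, if this becomes a leaf

def build_tree(rows, fields, choose_field):
    root = Node(rows)
    unexpanded = [(root, fields)]
    while unexpanded:                               # repeat until no unexpanded nodes
        node, remaining = unexpanded.pop()          # remove an unexpanded node U from T
        classes = [r['class'] for r in node.rows]
        if len(set(classes)) == 1 or not remaining: # pure node, or no fields left to split on
            node.label = max(set(classes), key=classes.count)
            continue
        node.field = choose_field(node.rows, remaining)       # expand U by choosing a field
        for value in {r[node.field] for r in node.rows}:      # add the resulting nodes to T
            child = Node([r for r in node.rows if r[node.field] == value])
            node.children[value] = child
            unexpanded.append((child, [f for f in remaining if f != node.field]))
    return root
```

With a random choose_field this reproduces the random version above; the ID3 choice is sketched later.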

Algorithms for building decision trees (of this type) – expanding a node. [Figure: a single unexpanded node, shown as ?]

Algorithms for building decision trees (of this type) – the essential step. [Figure: the unexpanded node becomes a Field node with branches Value = X, Value = Y, Value = Z, each leading to a new unexpanded node ?]

So, which field? [Figure: the same expansion step, with the choice of Field still to be made.]

Three choices: gender, age, or wealth.

Suppose we choose age (table now sorted by age values). Two of the values have a mixture of classes.

Suppose we choose wealth (table now sorted by wealth values). One of the values has a mixture of classes – this choice is a bit less mixed up than age?

Suppose we choose gender (table now sorted by gender values). The classes are not mixed up at all within the values.

So, at each step where we choose a node to expand, we make the choice where the relationship between the field values and the class values is least mixed up

Measuring ‘mixed-up’ness: Shannon’s entropy measure. Suppose you have a bag of N discrete things, and there are T different types of thing. Where pt is the proportion of things in the bag that are of type t, the entropy of the bag is: entropy = − ( p1 log(p1) + p2 log(p2) + … + pT log(pT) )

Examples (lower entropy = less mixed up; the logs here are base 10):
This mixture: { left left left right right } has entropy: − ( 0.6 log(0.6) + 0.4 log(0.4) ) = 0.292
This mixture: { A A A A A A A A B C } has entropy: − ( 0.8 log(0.8) + 0.1 log(0.1) + 0.1 log(0.1) ) = 0.278
This mixture: { same same same same same same } has entropy: − ( 1.0 log(1.0) ) = 0
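A small Python sketch of this measure, using base-10 logs to match the numbers above (the function name and the list-of-labels input format are my own choices):

```python
import math

def entropy(bag):
    """Shannon entropy of a list of labels, with base-10 logs as in the slides."""
    n = len(bag)
    proportions = [bag.count(t) / n for t in set(bag)]
    return -sum(p * math.log10(p) for p in proportions) or 0.0  # 'or 0.0' avoids -0.0 for pure bags

print(round(entropy(['left'] * 3 + ['right'] * 2), 3))  # 0.292
print(round(entropy(['A'] * 8 + ['B', 'C']), 3))         # 0.278
print(entropy(['same'] * 6))                             # 0.0
```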

ID3 chooses fields based on entropy. Each value of a field has an entropy value – how mixed up the classes are for that value choice. [Figure: Field1, Field2, Field3, …, each with its values val1, val2, val3, …]

ID3 chooses fields based on entropy. Each value has an entropy value – how mixed up the classes are for that value choice. And each value also has a proportion – how much of the data at this node has this value. [Figure: each value val is now weighted by its proportion p, shown as val × p.]

ID3 chooses fields based on entropy. So ID3 works out H(D|Field) for each field, which is the entropies of the values weighted by the proportions: H(D|Field) = sum over values v of p(v) × entropy of the classes where Field = v. [Figure: Field1 → H(D|Field1), Field2 → H(D|Field2), Field3 → H(D|Field3), …]

ID3 chooses fields based on entropy. So ID3 works out H(D|Field) for each field, which is the entropies of the values weighted by the proportions. The field with the lowest H(D|Field) is chosen – this maximises ‘Information Gain’, which is H(D) − H(D|Field).
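A hedged Python sketch of this selection step, reusing the entropy function sketched above (the row format – dicts with a 'class' key – is an assumption, not from the slides):

```python
def conditional_entropy(rows, field):
    """H(D|field): the entropy of each value's class mixture, weighted by that value's proportion."""
    n = len(rows)
    total = 0.0
    for value in {r[field] for r in rows}:
        classes = [r['class'] for r in rows if r[field] == value]
        total += (len(classes) / n) * entropy(classes)
    return total

def id3_choose_field(rows, fields):
    """ID3's choice: the field with the lowest H(D|field), i.e. the highest information gain."""
    return min(fields, key=lambda f: conditional_entropy(rows, f))
```

Passing id3_choose_field as the choose_field hook in the earlier build_tree sketch turns the random tree-builder into the core of ID3.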

Back here: gender, age, or wealth?

Suppose we choose age (table now sorted by age values). H(D|age) = proportion-weighted entropy = 0.3333 × − ( 0.5 log(0.5) + 0.5 log(0.5) ) + 0.1666 × − ( 1 log(1) ) + 0.5 × − ( 0.33 log(0.33) + 0.66 log(0.66) )

Suppose we choose wealth (table now sorted by wealth values). H(D|wealth) = 0.3333 × − ( 0.5 log(0.5) + 0.5 log(0.5) ) + 0.6666 × − ( 1 log(1) )

Suppose we choose gender (table now sorted by gender values). H(D|gender) = 0.3333 × − ( 1 log(1) ) + 0.6666 × − ( 1 log(1) ) = 0. This is the one we would choose ...

Alternatives to Information Gain – all, somehow or other, give a measure of mixed-upness and have been used in building DTs:
- Chi Square
- Gain Ratio, Symmetric Gain Ratio
- Gini index
- Modified Gini index
- Symmetric Gini index
- J-Measure
- Minimum Description Length
- Relevance
- RELIEF
- Weight of Evidence
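As one example from this list, the Gini index swaps entropy for 1 − ( p1² + p2² + … + pT² ) but can be used in exactly the same proportion-weighted way; a minimal sketch (function name mine):

```python
def gini(bag):
    """Gini index of a list of class labels: 1 minus the sum of squared class proportions."""
    n = len(bag)
    return 1.0 - sum((bag.count(t) / n) ** 2 for t in set(bag))

print(round(gini(['left'] * 3 + ['right'] * 2), 3))  # 0.48 - a mixed bag
print(round(gini(['same'] * 6), 3))                  # 0.0  - a pure bag
```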

Decision Trees – further reading is easy to find (e.g. via Google). Interesting topics in this context are:
- Pruning: close a branch down before you hit 0 entropy (why?)
- Discretization and regression: trees that deal with real-valued fields
- Decision Forests: what do you think these are?