Decision Trees 10-601 Recitation 1/17/08 Mary McGlohon

Announcements: HW 1 is out (decision trees and basic probability), due Mon, Jan 28 at the start of class. Matlab: a high-level language specialized for matrices, with built-in plotting and lots of math libraries, available on campus lab machines. Interest in a tutorial? Smiley Award plug.

AttendClass? Represent this tree as a logical expression. [Decision tree: split on Raining; Raining=False → Yes; Raining=True → split on Is10601; Is10601=True → Yes; Is10601=False → split on Material; Material=Old → No; Material=New → split on Before10; Before10=False → Yes; Before10=True → No.] AttendClass = Yes if: (Raining = False) OR (Is10601 = True) OR (Material = New AND Before10 = False)
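
As a quick illustration, this expression can be written directly as a predicate. A minimal Python sketch (the boolean encoding of the attributes is my own choice, not from the slide):

```python
def attend_class(raining: bool, is_10601: bool, new_material: bool, before_10: bool) -> bool:
    """AttendClass = Yes iff (not Raining) or Is10601 or (new Material and not Before10)."""
    return (not raining) or is_10601 or (new_material and not before_10)

print(attend_class(raining=True, is_10601=False, new_material=True, before_10=False))  # True
```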

Split decisions. There are other trees that are logically equivalent. How do we know which one to use? It depends on what is important to us.

Information Gain. Classically we rely on "information gain," which uses the principle that we want to use the fewest bits, on average, to get our idea across. Suppose I want to send a weather forecast with 4 possible outcomes: Rain, Sun, Snow, and Tornado. Four equally likely outcomes = 2 bits. In Pittsburgh there's Rain 90% of the time, Snow 5%, Sun 4.9%, and Tornado 0.01%. So if you assign Rain to a 1-bit message, you rarely send more than 1 bit.
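
As a rough sanity check of the bits argument, here is a short Python sketch that computes the average number of bits (the entropy) for the Pittsburgh distribution, using the probabilities as given on the slide:

```python
import math

# Average bits needed for the Pittsburgh forecast, using the slide's
# probabilities (as transcribed they sum to 0.9991, so treat the result
# as approximate). A uniform 4-outcome code needs 2 bits per message.
probs = {"Rain": 0.90, "Snow": 0.05, "Sun": 0.049, "Tornado": 0.0001}
avg_bits = -sum(p * math.log2(p) for p in probs.values())
print(f"about {avg_bits:.2f} bits on average, versus 2 bits for a uniform code")
```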

Entropy. For a set S with a fraction p+ of positive and p- of negative examples, H(S) = -p+ log2(p+) - p- log2(p-).

[Training data table: Rain, Is10601, Before10, Material, Attend.] Set S has 6 positive and 2 negative examples. H(S) = -.75 log2(.75) - .25 log2(.25) ≈ 0.8113
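
A minimal Python sketch of the same computation, assuming a two-class set described only by its fraction of positive examples:

```python
import math

def binary_entropy(p_pos: float) -> float:
    """H(S) for a two-class set with a fraction p_pos of positive examples."""
    if p_pos in (0.0, 1.0):   # p * log2(p) -> 0 as p -> 0
        return 0.0
    return -p_pos * math.log2(p_pos) - (1 - p_pos) * math.log2(1 - p_pos)

print(binary_entropy(6 / 8))  # 6 positive, 2 negative examples -> ~0.8113
```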

Conditional Entropy “The average number of bits it would take to encode a message Y, given knowledge of X”

Conditional Entropy. [Same training data table.] H(Attend | Rain) = H(Attend | Rain=T)*P(Rain=T) + H(Attend | Rain=F)*P(Rain=F) = 1*0.5 + 0*0.5 = 0.5. (The Rain=T subset has entropy 1; the Rain=F subset has entropy 0.)
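
A tiny numeric sketch of the weighted average above, plugging in the branch entropies and the 50/50 split from the slide:

```python
# H(Attend | Rain) as the probability-weighted average of branch entropies:
# the Rain=True subset has entropy 1, the Rain=False subset has entropy 0,
# and each subset holds half of the examples.
h_true, p_true = 1.0, 0.5
h_false, p_false = 0.0, 0.5
h_attend_given_rain = h_true * p_true + h_false * p_false
print(h_attend_given_rain)  # 0.5
```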

Information Gain. "How much conditioning on attribute A increases our knowledge of (decreases the entropy of) S." IG(S,A) = H(S) - H(S|A)

Information Gain. [Same training data table.] IG(Attend, Rain) = H(Attend) - H(Attend | Rain) = 0.8113 - 0.5 = 0.3113
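
A short Python sketch that reproduces these numbers; the 8-example dataset below is hypothetical, constructed only to be consistent with the counts on the slides (6 positive / 2 negative overall, a 2+/2- Rain=T half and a 4+/0- Rain=F half):

```python
import math

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(attr_values, labels):
    """IG = H(labels) - H(labels | attr): entropy drop from splitting on the attribute."""
    n = len(labels)
    h_cond = sum((attr_values.count(v) / n) *
                 entropy([y for x, y in zip(attr_values, labels) if x == v])
                 for v in set(attr_values))
    return entropy(labels) - h_cond

# Hypothetical data consistent with the slides' numbers.
attend = ["+", "+", "-", "-", "+", "+", "+", "+"]
rain   = ["T", "T", "T", "T", "F", "F", "F", "F"]
print(information_gain(rain, attend))  # ~0.3113
```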

What about this? [A bigger tree that splits on Raining, then Material, then Before10, and only then Is10601, shown alongside the smaller Raining/Is10601 tree.] For some dataset, could we ever build this DT? What if you were taking 20 classes, and it rains 90% of the time? If most information is gained from Material or Before10, we won't often need to traverse all the way down to Is10601. So even a bigger tree (node-wise) may be "simpler" for some sets of data.

Node-based pruning. Until further pruning is harmful: for each node n in the trained tree T, let Tn' be T without n (and its descendants), assigning the removed node's position the "best choice" (majority) label under that traversal, and record the error of Tn' on the validation set. Then let T = Tk', where Tk' is the pruned tree with the best performance on the validation set.
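
A minimal sketch of this node-based (reduced-error) pruning loop in Python; the dict-based tree representation and the helper functions are my own illustration, not from the recitation:

```python
from copy import deepcopy

# A node is either a class label (leaf) or a dict
#   {"attr": name, "branches": {value: subtree, ...}, "majority": label},
# where "majority" is the majority class of training examples reaching the node.

def predict(tree, example):
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attr"]]]
    return tree

def accuracy(tree, examples, labels):
    return sum(predict(tree, x) == y for x, y in zip(examples, labels)) / len(labels)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of branch values) to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, subtree in tree["branches"].items():
            yield from internal_nodes(subtree, path + (value,))

def prune_at(tree, path):
    """Copy the tree with the node at `path` replaced by its majority-class leaf."""
    pruned = deepcopy(tree)
    if not path:
        return pruned["majority"]
    parent = pruned
    for value in path[:-1]:
        parent = parent["branches"][value]
    parent["branches"][path[-1]] = parent["branches"][path[-1]]["majority"]
    return pruned

def reduced_error_prune(tree, val_x, val_y):
    """Greedily remove the node whose removal helps validation accuracy the most."""
    best_acc = accuracy(tree, val_x, val_y)
    while isinstance(tree, dict):
        candidates = [(accuracy(prune_at(tree, p), val_x, val_y), p)
                      for p in internal_nodes(tree)]
        acc, path = max(candidates, key=lambda c: c[0])
        if acc < best_acc:            # stop once further pruning is harmful
            break
        tree, best_acc = prune_at(tree, path), acc
    return tree
```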

Node-based pruning. For each node, record the performance on the validation set of the tree without that node. Suppose our initial tree has 0.7 accuracy on the validation set. Let's test one node...

Suppose that most examples where Material=New and Before10=True are "Yes": the pruned tree has a "Yes" leaf at the removed node. Now, test this tree! Suppose we get an accuracy of 0.73 on this pruned tree. Then repeat the test procedure by removing a different node from the original tree...

Node-based pruning. Try the tree with a different node pruned... Now, test this tree and record its accuracy. Once we have tested all possible prunings, replace our tree T with the pruning that has the best performance on the validation set. Repeat the entire pruning-selection procedure on the new T, replacing T each time with the best-performing pruned tree, until we no longer gain anything by pruning.

Rule-based pruning. 1. Convert the tree to rules, one for each leaf:
IF Material=Old AND Raining=False THEN Attend=Yes
IF Material=Old AND Raining=True AND Is601=True THEN Attend=Yes
...

Rule-based pruning. 2. Prune each rule. For instance, to prune the rule IF Material=Old AND Raining=F THEN Attend=T, test candidate rules with a precondition removed on the validation set, and compare their performance to the original rule's:
IF Material=Old THEN Attend=T
IF Raining=F THEN Attend=T

Rule-based pruning. Suppose we got the following accuracies for each rule:
IF Material=Old AND Raining=F THEN Attend=T
IF Material=Old THEN Attend=T
IF Raining=F THEN Attend=T
Then we would keep the best one and drop the others.

Rule-based pruning. Repeat for the next rule, comparing the original rule with each version that has one precondition removed:
IF Material=Old AND Raining=T AND Is601=T THEN Attend=T
IF Material=Old AND Raining=T THEN Attend=T
IF Material=Old AND Is601=T THEN Attend=T
IF Raining=T AND Is601=T THEN Attend=T

If a shorter rule works better, we may also choose to prune further on this step before moving on to the next leaf:
IF Material=Old AND Raining=T THEN Attend=T
IF Material=Old THEN Attend=T
IF Raining=T THEN Attend=T -- 0.2
Well, maybe not this time!

Rule-based pruning. Once we have done the same pruning procedure for each rule in the tree, order the kept rules by their accuracy, and do all subsequent classification with that priority:
IF Material=Old AND Raining=T THEN Attend=T
IF Raining=F THEN Attend=T
(and so on for the other pruned rules)...
(Note that you may wind up with a differently-structured DT than before, as discussed in class.)
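
A minimal sketch of the rule post-pruning idea in Python, assuming each rule is a (conditions-dict, label) pair and that validation examples are attribute dicts; the function names here are my own:

```python
def rule_accuracy(conditions, label, val_x, val_y):
    """Accuracy of `IF conditions THEN label` on the validation examples it covers."""
    covered = [y for x, y in zip(val_x, val_y)
               if all(x.get(a) == v for a, v in conditions.items())]
    return sum(y == label for y in covered) / len(covered) if covered else 0.0

def prune_rule(conditions, label, val_x, val_y):
    """Greedily drop preconditions while validation accuracy does not get worse."""
    conditions = dict(conditions)
    best = rule_accuracy(conditions, label, val_x, val_y)
    improved = True
    while improved and conditions:
        improved = False
        for attr in list(conditions):
            trial = {a: v for a, v in conditions.items() if a != attr}
            acc = rule_accuracy(trial, label, val_x, val_y)
            if acc >= best:            # shorter rule is at least as good: keep it
                conditions, best, improved = trial, acc, True
                break
    return conditions, best

def post_prune_rules(rules, val_x, val_y):
    """Prune every (conditions, label) rule, then order the kept rules by accuracy."""
    pruned = []
    for conditions, label in rules:
        kept, acc = prune_rule(conditions, label, val_x, val_y)
        pruned.append((acc, kept, label))
    pruned.sort(key=lambda t: t[0], reverse=True)
    return [(kept, label) for acc, kept, label in pruned]
```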

Adding randomness. What if you didn't know whether you had new material? For instance, you want to classify an example with Rain=T and Is10601=F, but Material unknown: where do you go in the tree? You could look at the training set and see that when Rain=T and Is10601=F, a fraction p of the examples had new material. Then flip a p-biased coin and descend the appropriate branch. But that might not be the best idea. Why not?

Adding randomness. Also, you may have missing data in the training set. There are also methods to deal with this using probability: "Well, 60% of the time when Rain=T and Is10601=F, there's new material (among the examples where we do know the material). So we'll just randomly select 60% of the rainy, non-601 examples where we don't know the material to be new material."
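
A small sketch of that proportional fill-in for the training set, assuming examples are dicts with "Rain", "Is10601", and "Material" keys (my own encoding, with None marking a missing value):

```python
import random

def impute_material(examples, rng=random.Random(0)):
    """Fill missing Material values in proportion to the observed New/Old split
    among similar examples (here: Rain=T and Is10601=F), as described above."""
    similar = [e for e in examples
               if e["Rain"] == "T" and e["Is10601"] == "F" and e["Material"] is not None]
    p_new = sum(e["Material"] == "New" for e in similar) / len(similar)
    for e in examples:
        if e["Rain"] == "T" and e["Is10601"] == "F" and e["Material"] is None:
            e["Material"] = "New" if rng.random() < p_new else "Old"
    return examples
```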

Adventures in Probability. That approach tends to work well. Still, we may run into the following trouble: what if there aren't very many training examples where Rain=True and Is10601=False? Wouldn't we still want to use examples where Rain=False to estimate the missing value? Well, it "depends". Stay tuned for lecture next week!