CS 590M Fall 2001: Security Issues in Data Mining, Lecture 4: ID3

ID3 (precursor to C4.5)
–"Canonical" decision tree learning algorithm by Ross Quinlan
–Has since evolved into C5.0, an established product from Rulequest
–Basic idea: recursively build the decision tree "top-down"
 –At each node, choose the "best" attribute for the decision from those remaining
 –What does "best" mean?

Algorithm ID3(Training set S, Attributes R, Class C)
–If all records in S have the same value for C, return a leaf node with that value
–If R is empty, return a leaf node with the most common value of C in S
–Otherwise, create a non-leaf node:
 –Let D be the attribute in R with the largest Gain(D,S)
 –For each value d ∈ D:
  –Let S_d be the records in S where attribute D has value d
  –Create an edge labeled d to the subtree returned by ID3(S_d, R-{D}, C)
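A compact Python sketch of this recursion (the names records, target, and the nested-dict tree encoding are illustrative choices, not part of the slides); the entropy and gain helpers anticipate the definition on the next slide:

import math
from collections import Counter

def entropy(records, target):
    """Entropy of the class label `target` over `records` (a list of dicts)."""
    counts = Counter(r[target] for r in records)
    total = len(records)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(records, attr, target):
    """Expected reduction in entropy from splitting `records` on `attr`."""
    total = len(records)
    remainder = 0.0
    for value in set(r[attr] for r in records):
        subset = [r for r in records if r[attr] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(records, target) - remainder

def id3(records, attributes, target):
    """Return a nested-dict decision tree; leaves are class values."""
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:            # all records agree on the class
        return labels[0]
    if not attributes:                   # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(records, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in records):
        subset = [r for r in records if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree

Applied to the tennis table that follows (encoded as a list of dicts keyed by attribute name), this reproduces the tree shown on the "ID3 result" slide.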

What is Gain(D,S)?
–Information theory: the expected reduction in entropy from sorting S on attribute D
–entropy(S) = -(p_1*log(p_1) + ... + p_n*log(p_n)), where p_i is the probability of class i in S
–Gain(D,S) = entropy(S) - Σ_{d ∈ D} (|S_d|/|S|) * entropy(S_d)
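As a quick numeric check of the entropy formula (not on the slides): a set with 9 positive and 5 negative examples, the [9+,5-] split used on the next slides, has entropy of about 0.940.

import math

def entropy_from_counts(counts):
    """Entropy in bits from a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy_from_counts([9, 5]))   # ~0.940, the E=0.940 used on later slides
print(entropy_from_counts([7, 7]))   # 1.0: a 50/50 split is maximally uncertain
print(entropy_from_counts([14, 0]))  # 0.0: a pure set has no uncertainty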

Example: Tennis
Day  Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Weak    Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Strong  Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No

What should be first? The full set is S = [9+,5-], with E = 0.940.
–Humidity: High → [3+,4-] (E = 0.985), Normal → [6+,1-] (E = 0.592)
 Gain(S,Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
–Wind: Weak → [6+,2-] (E = 0.811), Strong → [3+,3-] (E = 1.0)
 Gain(S,Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048

(continued)
–Outlook: Sunny → [2+,3-] (E = 0.971), Overcast → [4+,0-] (E = 0.0), Rain → [3+,2-] (E = 0.971)
 Gain(S,Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
–Outlook has the largest gain, so it is chosen as the root attribute.
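A short self-contained script that reproduces these root-level gains from the tennis table (the entropy and gain helpers are repeated from the earlier sketch so the snippet runs on its own):

import math
from collections import Counter

ATTRS = ["Outlook", "Temp", "Humidity", "Wind", "PlayTennis"]
ROWS = [
    ("Sunny","Hot","High","Weak","No"),       ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),   ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),    ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Weak","Yes"),("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),   ("Rain","Mild","Normal","Strong","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
DATA = [dict(zip(ATTRS, row)) for row in ROWS]

def entropy(records, target="PlayTennis"):
    counts = Counter(r[target] for r in records)
    n = len(records)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain(records, attr, target="PlayTennis"):
    n = len(records)
    remainder = 0.0
    for v in set(r[attr] for r in records):
        subset = [r for r in records if r[attr] == v]
        remainder += (len(subset) / n) * entropy(subset, target)
    return entropy(records, target) - remainder

for a in ["Outlook", "Temp", "Humidity", "Wind"]:
    print(a, round(gain(DATA, a), 3))
# Prints roughly: Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048
# (the slide's 0.151 for Humidity differs only because it rounds intermediate entropies)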

And recurse... With Outlook at the root, [D1,D2,…,D14] = [9+,5-] splits into:
–Sunny: S_sunny = [D1,D2,D8,D9,D11] = [2+,3-], not yet pure, so recurse
–Overcast: [D3,D7,D12,D13] = [4+,0-], a pure Yes leaf
–Rain: [D4,D5,D6,D10,D14] = [3+,2-], not yet pure, so recurse
For the Sunny branch (E = 0.970):
–Gain(S_sunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
–Gain(S_sunny, Temp.) = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
–Gain(S_sunny, Wind) = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
Humidity has the largest gain, so it is chosen for the Sunny branch.

ID3 result
–Outlook = Sunny: test Humidity
 –Humidity = High: No [D1,D2,D8]
 –Humidity = Normal: Yes [D9,D11]
–Outlook = Overcast: Yes [D3,D7,D12,D13]
–Outlook = Rain: test Wind
 –Wind = Strong: No [D6,D14]
 –Wind = Weak: Yes [D4,D5,D10]

Inductive Bias in ID3
–The hypothesis space H is the power set of the instance space X, so no hypothesis is ruled out. Unbiased?
–Not quite: ID3 prefers short trees, and trees that place high-information-gain attributes near the root
–This bias is a preference for some hypotheses rather than a restriction of the hypothesis space H
–Occam's razor: prefer the shortest (simplest) hypothesis that fits the data

Occam's Razor: why prefer short hypotheses?
–Arguments in favor:
 –There are fewer short hypotheses than long hypotheses
 –A short hypothesis that fits the data is therefore unlikely to be a coincidence
 –A long hypothesis that fits the data might be a coincidence
–Arguments against:
 –There are many ways to define small sets of hypotheses, e.g. all trees with a prime number of nodes that use attributes beginning with "Z"
 –So what is special about small sets defined by the size of the hypothesis?

Overfitting
–Consider the error of a hypothesis h over:
 –the training data: error_train(h)
 –the entire distribution D of the data: error_D(h)
–A hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that error_train(h) < error_train(h') and error_D(h) > error_D(h')

Avoid Overfitting How can we avoid overfitting?
–Stop growing the tree when a data split is not statistically significant
–Grow the full tree, then post-prune
–Minimum description length (MDL): minimize size(tree) + size(misclassifications(tree))

Reduced-Error Pruning
–Split the data into a training set and a validation set
–Do until further pruning is harmful:
 1. Evaluate the impact on validation-set accuracy of pruning each candidate node (replacing it, and the subtree below it, with a leaf)
 2. Greedily remove the node whose removal most improves validation-set accuracy
–Produces the smallest version of the most accurate subtree
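A rough sketch of this loop in Python, assuming the nested-dict tree format from the earlier id3 sketch; classify, majority_class, and the path bookkeeping are illustrative helpers, not something defined on the slides:

import copy
from collections import Counter

def majority_class(records, target):
    return Counter(r[target] for r in records).most_common(1)[0][0]

def classify(tree, record):
    """Walk a nested-dict tree (as in the id3 sketch) down to a leaf label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(record[attr])
        if tree is None:                 # unseen attribute value: no prediction
            return None
    return tree

def accuracy(tree, records, target):
    return sum(classify(tree, r) == r[target] for r in records) / len(records)

def internal_paths(tree, path=()):
    """Yield the (attribute, value) path from the root to every internal node."""
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, subtree in tree[attr].items():
            yield from internal_paths(subtree, path + ((attr, value),))

def prune_at(tree, path, train, target):
    """Return a copy of the tree with the node at `path` replaced by a majority leaf."""
    records = [r for r in train if all(r[a] == v for a, v in path)]
    leaf = majority_class(records or train, target)
    if not path:
        return leaf
    pruned = copy.deepcopy(tree)
    node = pruned
    for attr, value in path[:-1]:
        node = node[attr][value]
    last_attr, last_value = path[-1]
    node[last_attr][last_value] = leaf
    return pruned

def reduced_error_prune(tree, train, validation, target):
    """Repeatedly prune the single node that helps validation accuracy most;
    stop as soon as every possible pruning step hurts it."""
    while True:
        base = accuracy(tree, validation, target)
        candidates = [prune_at(tree, p, train, target) for p in internal_paths(tree)]
        if not candidates:
            return tree
        best = max(candidates, key=lambda t: accuracy(t, validation, target))
        if accuracy(best, validation, target) < base:
            return tree
        tree = best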

Rule Post-Pruning
1. Convert the tree to an equivalent set of rules
2. Prune each rule independently of the others
3. Sort the final rules into the desired order for use
–This is the method used in C4.5

Converting a Tree to Rules Each path from the root of the learned tree to a leaf becomes one rule:
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
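One straightforward way to represent such a rule set in code (a sketch; the tuple/dict encoding and the classify_with_rules helper are just one possible choice, not anything prescribed by the slides):

# Each rule is (preconditions, conclusion); preconditions is a dict of attribute tests.
RULES = [
    ({"Outlook": "Sunny", "Humidity": "High"},   "No"),   # R1
    ({"Outlook": "Sunny", "Humidity": "Normal"}, "Yes"),  # R2
    ({"Outlook": "Overcast"},                    "Yes"),  # R3
    ({"Outlook": "Rain", "Wind": "Strong"},      "No"),   # R4
    ({"Outlook": "Rain", "Wind": "Weak"},        "Yes"),  # R5
]

def classify_with_rules(rules, example, default="Yes"):
    """Return the conclusion of the first rule whose preconditions all hold."""
    for preconditions, conclusion in rules:
        if all(example.get(attr) == value for attr, value in preconditions.items()):
            return conclusion
    return default  # no rule fires (possible once pruning has removed preconditions)

print(classify_with_rules(RULES, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))  # No

In this representation, pruning a rule amounts to deleting individual preconditions when doing so does not hurt estimated accuracy, which is why rules can be pruned independently of one another.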

Issues
–Attributes with many values
 –Maximizing gain always favors these
 –But they are not very interesting (classify on a timestamp?)
 –A common fix: gain ratio instead of plain gain
–What to do with continuous attributes?
–Attributes with costs (e.g., medical tests)
–Unknown values
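The slide only names gain ratio; as a reminder of the standard definition (a sketch, reusing the counting style of the earlier snippets), it divides the information gain by the "split information" of the attribute, which penalizes splits into many small subsets:

import math

def split_information(subset_sizes):
    """Split information: entropy of the partition induced by the attribute itself."""
    total = sum(subset_sizes)
    return -sum((s / total) * math.log2(s / total) for s in subset_sizes if s > 0)

def gain_ratio(gain, subset_sizes):
    """Gain ratio: information gain divided by split information."""
    return gain / split_information(subset_sizes)

# Outlook splits the 14 tennis examples into subsets of size 5, 4, and 5:
print(round(split_information([5, 4, 5]), 3))  # ~1.577
# A 14-valued attribute that isolates every example has much larger split information:
print(round(split_information([1] * 14), 3))   # ~3.807 (= log2(14))
print(round(gain_ratio(0.247, [5, 4, 5]), 3))  # Outlook's gain ratio, ~0.157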