
Chapter 3: Decision Tree Learning

Decision Tree Learning
- Introduction
- Decision Tree Representation
- Appropriate Problems for Decision Tree Learning
- Basic Algorithm
- Hypothesis Space Search in Decision Tree Learning
- Inductive Bias in Decision Tree Learning
- Issues in Decision Tree Learning
- Summary

Introduction
- A method for approximating discrete-valued target functions
- The learned tree is easily converted into if-then rules
- Representative systems: ID3, ASSISTANT, C4.5
- Preference bias toward smaller trees
- Searches a completely expressive hypothesis space

Decision Tree Representation
- An instance is classified by sorting it from the root down to some leaf node
- Each node tests an attribute of the instance
- Each branch corresponds to one of that attribute's values
- A learned tree represents a disjunction of conjunctions of constraints on the attribute values of instances
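For concreteness, the nested-dictionary representation used in the sketches later in this chapter encodes exactly this structure; the attributes and values below are just an illustrative weather-style example:

```python
# Each internal node tests one attribute; each branch carries one of its values;
# each leaf is a class label. The tree as a whole is the disjunction of the
# conjunctions of tests along its root-to-leaf paths.
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}
```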

Appropriate Problems for Decision Tree Learning
- Instances are represented by attribute-value pairs
- The target function has discrete output values
- Disjunctive descriptions may be required
- The training data may contain errors
- The training data may contain missing attribute values

Basic Algorithm
- A top-down, greedy search through the space of all possible decision trees
- The attribute that best classifies the training examples is placed at the root; the process is repeated for each branch
- The choice of attribute is guided by entropy and information gain
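The whole procedure fits in a short, self-contained Python sketch. Examples are assumed to be dicts mapping attribute names to values (e.g. {'Humidity': 'High', 'Wind': 'Weak', 'PlayTennis': 'No'}), and the function names (`id3`, `information_gain`) are illustrative, not taken from the slides:

```python
import math
from collections import Counter

def entropy(examples, target):
    # Entropy(S) = -sum_i p_i * log2(p_i) over the classes of the target attribute
    n = len(examples)
    counts = Counter(ex[target] for ex in examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(examples, target, attribute):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
    n = len(examples)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    # Grow the tree top-down, greedily choosing the highest-gain attribute.
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:              # all examples share one class -> leaf
        return labels[0]
    if not attributes:                     # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, target, a))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, rest)
    return tree
```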

Entropy
- The minimum number of bits of information needed to encode the classification of an arbitrary member of S
- Entropy = 0 if all members belong to the same class
- Entropy = 1 if |positive examples| = |negative examples|

Entropy(S) = -p+ log2 p+ - p- log2 p-, where p+ and p- are the proportions of positive and negative examples in S (in general, Entropy(S) = -Σ_i p_i log2 p_i over the c classes).
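A quick numeric check of this definition, using the class counts that appear on the next two slides (the helper name is illustrative):

```python
import math

def entropy_from_counts(pos, neg):
    # -p+ log2 p+ - p- log2 p-, with an empty class contributing 0
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

print(f"{entropy_from_counts(9, 5):.3f}")   # 0.940  (the full sample S = [9+, 5-])
print(f"{entropy_from_counts(3, 4):.3f}")   # 0.985  (Humidity = High)
print(f"{entropy_from_counts(6, 1):.3f}")   # 0.592  (Humidity = Normal)
```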

Information Gain
- The expected reduction in entropy caused by partitioning the examples according to attribute A
- Equivalently, the amount of entropy removed by knowing the value of attribute A
- Gain(S, A) = Entropy(S) - Σ_{v in Values(A)} (|S_v|/|S|) Entropy(S_v)

Which Attribute is the Best Classifier? (1)
Splitting S = [9+, 5-] (E = 0.940) on Humidity:
- High: [3+, 4-], E = 0.985
- Normal: [6+, 1-], E = 0.592

Which Attribute is the Best Classifier? (2)
Splitting S = [9+, 5-] (E = 0.940) on Wind:
- Weak: [6+, 2-], E = 0.811
- Strong: [3+, 3-], E = 1.000
Classifying the examples by Humidity provides more information gain than classifying them by Wind.
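Plugging the numbers from these two slides into the gain formula confirms the comparison (a quick check, not part of the original slides):

```python
# Gain(S, A) = Entropy(S) - sum over values of |S_v|/|S| * Entropy(S_v)
gain_humidity = 0.940 - (7/14) * 0.985 - (7/14) * 0.592
gain_wind     = 0.940 - (8/14) * 0.811 - (6/14) * 1.000
print(gain_humidity)   # ~0.151
print(gain_wind)       # ~0.048 -> Humidity is the better classifier at this node
```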

Hypothesis Space Search in Decision Tree Learning (1)
- Searches for one hypothesis that fits the training examples
- ID3's hypothesis space: the set of all possible decision trees
- Simple-to-complex, hill-climbing search
- Information gain guides the hill climbing

Hypothesis Space Search in Decision Tree Learning (2)
- The hypothesis space is complete: it contains every finite discrete-valued function
- Maintains only a single current hypothesis
- No backtracking
- Uses all of the training examples at each step of the search, so decisions are statistically based

Inductive Bias (1) - the Case of ID3
- Which decision tree should be chosen among all the trees consistent with the examples?
- Shorter trees are preferred over larger trees
- Trees that place high-information-gain attributes close to the root are preferred

Inductive Bias (2)

Inductive Bias (3)
- Occam's razor: prefer the simplest hypothesis that fits the data
- Major difficulty: the size of a hypothesis depends on the learner's internal representation, so "simplest" is not well defined across representations

Issues in Decision Tree Learning
- How deeply to grow the decision tree
- Handling continuous attributes
- Choosing an appropriate attribute selection measure
- Handling missing attribute values

Avoiding Overfitting the Data (1)
- Should the tree be grown until it classifies the training examples perfectly? This is harmful when (1) the data contains noise, or (2) the number of training examples is small.
- Overfitting: a hypothesis h overfits the training data if there is an alternative hypothesis h' such that h has a smaller error than h' on the training examples, but h has a larger error than h' over the entire distribution of instances.

Avoiding Overfitting the Data (2)
Approaches:
1. Split the examples into a training set and a validation set.
2. Use all of the data for training, but apply a statistical test to estimate whether pruning a particular node is likely to improve performance beyond the training set.
3. Use an explicit measure of the complexity of encoding the training examples and the decision tree (see chapter 6).
Approach 1, the training and validation set approach: the validation set is used to measure the effect of pruning the hypothesis.

Reduced Error Pruning
- A node is removed if the tree obtained by pruning it performs no worse than the original tree on the validation set.
- Leaf nodes added because of coincidental regularities in the training set are likely to be pruned, because the same coincidences are unlikely to appear in the validation set.
- The data is divided into a training set, a test set, and a validation set.
- Drawback: problematic when the amount of available data is small.
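A minimal sketch of reduced-error pruning, assuming a small Node class in which every node records the majority class of the training examples that reached it during tree construction; all of the names here are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    attribute: str = None                          # attribute tested here (None for a leaf)
    children: dict = field(default_factory=dict)   # attribute value -> child Node
    label: str = None                              # class predicted if this is a leaf
    majority: str = None                           # majority class of training examples reaching this node

def classify(node, example):
    while node.attribute is not None:
        child = node.children.get(example[node.attribute])
        if child is None:                          # unseen attribute value: fall back to majority class
            return node.majority
        node = child
    return node.label

def accuracy(root, examples, target):
    return sum(classify(root, ex) == ex[target] for ex in examples) / len(examples)

def reduced_error_prune(node, root, validation, target):
    # Prune bottom-up: after pruning the children, try replacing this node by a
    # leaf predicting its majority class, and keep the change only if accuracy
    # on the validation set does not get worse.
    if node.attribute is None:
        return
    for child in node.children.values():
        reduced_error_prune(child, root, validation, target)
    before = accuracy(root, validation, target)
    saved = (node.attribute, node.children, node.label)
    node.attribute, node.children, node.label = None, {}, node.majority
    if accuracy(root, validation, target) < before:    # pruning hurt: undo it
        node.attribute, node.children, node.label = saved
```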

Rule Post-Pruning (1)
1. Build the decision tree (overfitting is allowed).
2. Convert the tree into an equivalent set of rules, one per path from the root to a leaf.
3. Prune each rule by removing any preconditions whose removal improves its estimated accuracy.
4. Sort the pruned rules by estimated accuracy, and apply them in that order when classifying subsequent instances.
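Step 2 is mechanical for the nested-dict trees built by the earlier ID3 sketch; a minimal sketch (the function name and rule representation are illustrative):

```python
def tree_to_rules(tree, preconditions=()):
    # Turn every root-to-leaf path into an if-then rule:
    # a (list of (attribute, value) tests, predicted class) pair.
    if not isinstance(tree, dict):                 # reached a leaf -> emit one rule
        return [(list(preconditions), tree)]
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules.extend(tree_to_rules(subtree, preconditions + ((attribute, value),)))
    return rules
```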

Rule Post-Pruning (2)
Why convert the decision tree to rules before pruning?
- The distinct contexts in which a decision node is used can be pruned independently, since each path becomes its own rule.
- There is no longer any distinction between attribute tests near the root and those near the leaves.

Incorporating Continuous-Valued Attributes
Choose the threshold that maximizes information gain (see the sketch below):
- Sort the examples by the value of the continuous attribute.
- Identify adjacent examples whose target classification differs.
- Take the midpoint of each such pair as a candidate threshold.
- Among the candidates, select the one that maximizes information gain.
Example: a sorted list of Temperature values whose PlayTennis labels change from No to Yes and back to No, giving two candidate thresholds.
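A minimal sketch of the candidate-threshold step (the numeric values below are illustrative, not the textbook's table). Each candidate threshold t then defines a boolean attribute such as Temperature > t, whose information gain is evaluated exactly as for any discrete attribute:

```python
def candidate_thresholds(values, labels):
    # Sort by attribute value and take the midpoint between each adjacent pair
    # of examples whose class label differs; these are the candidate thresholds.
    pairs = sorted(zip(values, labels))
    return [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][1] != pairs[i + 1][1]]

# Illustrative values only:
temperature = [52, 58, 63, 70, 78, 85]
play_tennis = ["No", "No", "Yes", "Yes", "No", "No"]
print(candidate_thresholds(temperature, play_tennis))   # [60.5, 74.0]
```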

Alternative Measures for Selecting Attributes (1)
- The information gain measure favors attributes with many values.
- Extreme example: an attribute such as Date (e.g. March ...) has a nearly unique value for every example.
- Such an attribute classifies the training data perfectly with respect to the target attribute,
- but it is a very poor predictor on unseen instances.

Alternative Measures for Selecting Attributes (2)
- Split information: SplitInformation(S, A) = -Σ_{i=1..c} (|S_i|/|S|) log2(|S_i|/|S|), the entropy of S with respect to the values of attribute A.
- If n examples are split into n one-example subsets by n distinct values, SplitInformation = log2 n.
- If two values split the examples into two equal parts, SplitInformation = 1.

Alternative Measures for Selecting Attributes (3)
- Gain ratio: GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A), which penalizes attributes with many uniformly distributed values.
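A minimal sketch of both measures, reusing the `information_gain` helper from the earlier ID3 sketch (all names are illustrative). One common heuristic is to apply the ratio only to attributes whose gain is at least average, which also avoids dividing by a SplitInformation close to zero:

```python
import math
from collections import Counter

def split_information(examples, attribute):
    # SplitInformation(S, A): the entropy of S with respect to the values of A itself
    n = len(examples)
    counts = Counter(ex[attribute] for ex in examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_ratio(examples, target, attribute):
    # GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A);
    # information_gain is the helper defined in the ID3 sketch above
    return information_gain(examples, target, attribute) / split_information(examples, attribute)
```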

Handling Training Examples with Missing Attribute Values
- Option 1: assign the missing value the attribute value that is most common among the examples at node n that have the same classification C(x).
- Option 2: assign a probability to each possible value of attribute A, estimated from the observed frequencies of A's values among the examples at node n, and distribute the example fractionally down the corresponding branches.
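A minimal sketch of the first option, assuming missing values are stored as None and that at least one example at the node has a known value and the same classification (names are illustrative):

```python
from collections import Counter

def fill_missing(examples, attribute, target):
    # Replace a missing value of `attribute` with the value that is most common
    # among the examples at this node that share the same target classification.
    filled = []
    for ex in examples:
        if ex[attribute] is None:
            same_class = [e[attribute] for e in examples
                          if e[target] == ex[target] and e[attribute] is not None]
            ex = {**ex, attribute: Counter(same_class).most_common(1)[0][0]}
        filled.append(ex)
    return filled
```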

Handling Attributes with Differing Costs
- Prefer attributes that are cheap to measure by weighting the gain by the attribute's cost, e.g. Gain^2(S, A) / Cost(A), or (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, where w controls the importance of cost.

Summary
- The ID3 family grows the tree downward from the root, greedily searching for the next best attribute at each step.
- The hypothesis space is complete.
- There is a preference for smaller trees.
- Overfitting is avoided by post-pruning.