Machine Learning Inductive Learning and Decision Trees

Presentation transcript:

Machine Learning Inductive Learning and Decision Trees CSE 473

Learning
Learning is essential for unknown environments, i.e., when the designer lacks omniscience
Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down
Learning modifies the agent's decision mechanisms to improve performance
© D. Weld and D. Fox

Feedback in Learning
Supervised learning: correct answers for each example
Unsupervised learning: correct answers not given
Reinforcement learning: occasional rewards
© D. Weld and D. Fox

Reinforcement Learning
Same setting as an MDP, but the model is unknown
The agent has to learn how the world works (transition model, reward structure)
The goal is to learn an optimal policy by interacting with the environment
© D. Weld and D. Fox

Inductive learning
Simplest form: learn a function from examples
f is the target function
An example is a pair (x, f(x))
Problem: find a hypothesis h such that h ≈ f, given a training set of examples
(This is a highly simplified model of real learning: it ignores prior knowledge and assumes examples are given)
© D. Weld and D. Fox

Inductive learning method Construct/adjust h to agree with f on training set (h is consistent if it agrees with f on all examples) E.g., curve fitting: © D. Weld and D. Fox

Inductive learning method Construct/adjust h to agree with f on training set (h is consistent if it agrees with f on all examples) E.g., curve fitting: Ockham’s razor: prefer the simplest hypothesis consistent with data © D. Weld and D. Fox

Decision Trees
Input: a description of an object or a situation through a set of attributes.
Output: a decision, i.e., the predicted output value for the input.
Both input and output can be discrete or continuous.
Discrete-valued functions lead to classification problems; learning a continuous function is called regression.
© D. Weld and D. Fox

Experience: "Good day for tennis"

Day  Outlook  Temp  Humid  Wind  PlayTennis?
d1   s        h     h      w     n
d2   s        h     h      s     n
d3   o        h     h      w     y
d4   r        m     h      w     y
d5   r        c     n      w     y
d6   r        c     n      s     y
d7   o        c     n      s     y
d8   s        m     h      w     n
d9   s        c     n      w     y
d10  r        m     n      w     y
d11  s        m     n      s     y
d12  o        m     h      s     y
d13  o        h     n      w     y
d14  r        m     h      s     n

(Outlook: s = sunny, o = overcast, r = rain; Temp: h = hot, m = mild, c = cool; Humid: h = high, n = normal; Wind: w = weak, s = strong.)
© D. Weld and D. Fox

Decision Tree Representation
Good day for tennis?
Leaves = classification
Arcs = choice of value for parent attribute

Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes

A decision tree is equivalent to logic in disjunctive normal form:
G-Day ⇔ (Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
© D. Weld and D. Fox
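
As a concrete (hypothetical) illustration, the tree above can be written as a nested Python dictionary and walked with a small classifier. The attribute and value names come from the slide; everything else is just one possible encoding, not part of the lecture.

```python
# One possible encoding of the tennis tree above: internal nodes are
# {attribute: {value: subtree}}, leaves are class labels.
tennis_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, example):
    """Follow the branch matching the example's attribute value until a leaf is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

print(classify(tennis_tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```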

DT Learning as Search
Nodes:        decision trees
Operators:    tree refinement (sprouting the tree)
Initial node: the smallest tree possible, a single leaf
Heuristic:    information gain
Goal:         the best tree possible (???)
© D. Weld and D. Fox

What is the Simplest Tree?

Day  Outlook  Temp  Humid  Wind  Play?
d1   s        h     h      w     n
d2   s        h     h      s     n
d3   o        h     h      w     y
d4   r        m     h      w     y
d5   r        c     n      w     y
d6   r        c     n      s     y
d7   o        c     n      s     y
d8   s        m     h      w     n
d9   s        c     n      w     y
d10  r        m     n      w     y
d11  s        m     n      s     y
d12  o        m     h      s     y
d13  o        h     n      w     y
d14  r        m     h      s     n

How good is it? [10+, 4-] means: correct on 10 examples, incorrect on 4 examples.
© D. Weld and D. Fox

Which attribute should we use to split?
(Figure: the single-leaf tree "Yes" and its candidate successors, one split for each of Humid, Wind, Outlook, and Temp.)
© D. Weld and D. Fox

To be decided:
How to choose the best attribute? Information gain; entropy (disorder).
When to stop growing the tree?
© D. Weld and D. Fox

Disorder is bad; homogeneity is good.
(Figure: three example splits of increasing homogeneity, labeled "No", "Better", and "Good".)
© D. Weld and D. Fox

(Figure: entropy, from 0 to 1.0, plotted against P, the proportion of positive examples, from .00 to 1.00; it peaks at 1.0 when P = .50.)
© D. Weld and D. Fox

Entropy (disorder) is bad; homogeneity is good.
Let S be a set of examples.
Entropy(S) = -P log2(P) - N log2(N)
where P is the proportion of positive examples, N is the proportion of negative examples, and 0 log 0 = 0.
Example: S has 10 positive and 4 negative examples.
Entropy([10+, 4-]) = -(10/14) log2(10/14) - (4/14) log2(4/14) = 0.863
© D. Weld and D. Fox
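
A minimal Python sketch of this calculation (the function name is mine, not from the slides):

```python
import math

def entropy(pos, neg):
    """Entropy of a set with `pos` positive and `neg` negative examples, with 0 log 0 = 0."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(10, 4), 3))  # 0.863, matching Entropy([10+, 4-]) above
```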

Information Gain
Measure of the expected reduction in entropy resulting from splitting along an attribute A:
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
where Entropy(S) = -P log2(P) - N log2(N).
© D. Weld and D. Fox
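
A hedged sketch of the same formula in Python, for attribute-value examples stored as dictionaries; the helper names are illustrative, not from the lecture.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels (0 log 0 taken as 0)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A): entropy of S minus the size-weighted entropy of each subset S_v."""
    n = len(labels)
    remainder = 0.0
    for v in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Tiny made-up example: four cases split on "Wind"
examples = [{"Wind": "w"}, {"Wind": "w"}, {"Wind": "s"}, {"Wind": "s"}]
print(round(information_gain(examples, ["y", "y", "y", "n"], "Wind"), 3))  # 0.311
```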

Gain of Splitting on Wind

Day  Wind  Tennis?
d1   weak  n
d2   s     n
d3   weak  yes
d4   weak  yes
d5   weak  yes
d6   s     yes
d7   s     yes
d8   weak  n
d9   weak  yes
d10  weak  yes
d11  s     yes
d12  s     yes
d13  weak  yes
d14  s     n

Values(Wind) = {weak, strong}
S = [10+, 4-],  S_weak = [6+, 2-],  S_s = [3+, 3-]
Gain(S, Wind) = Entropy(S) - Σ_{v ∈ {weak, s}} (|S_v| / |S|) Entropy(S_v)
             = Entropy(S) - (8/14) Entropy(S_weak) - (6/14) Entropy(S_s)
             = 0.863 - (8/14)(0.811) - (6/14)(1.00) = -0.029
© D. Weld and D. Fox

Evaluating Attributes
Gain(S, Humid)   = 0.375
Gain(S, Wind)    = -0.029
Gain(S, Temp)    = 0.029
Gain(S, Outlook) = 0.401
(The exact values are actually somewhat different from these, but ignore this detail.)
© D. Weld and D. Fox

Resulting Tree…. Good day for tennis?
Outlook?
  Sunny    -> No  [2+, 3-]
  Overcast -> Yes [4+]
  Rain     -> No  [2+, 3-]
© D. Weld and D. Fox

Recurse! (Outlook = Sunny branch)

Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     n      weak  yes
d11  m     n      s     yes
© D. Weld and D. Fox

One Step Later…
Outlook?
  Sunny    -> Humidity?
                Normal -> Yes [2+]
                High   -> No  [3-]
  Overcast -> Yes [4+]
  Rain     -> No  [2+, 3-]
© D. Weld and D. Fox

Decision Tree Algorithm

BuildTree(TrainingData)
  Split(TrainingData)

Split(D)
  If all points in D are of the same class, then return
  For each attribute A
    Evaluate splits on attribute A
  Use the best split to partition D into D1, D2
  Split(D1)
  Split(D2)
© D. Weld and D. Fox
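
A rough, self-contained Python sketch of this recursion, choosing the split attribute by information gain as in ID3 and doing a multiway split on the attribute's values rather than the binary D1/D2 split of the pseudocode; all names are mine, not from the lecture.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels (0 log 0 taken as 0)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, labels, attributes):
    """Recursive tree growing: stop on a pure node; otherwise split on the
    attribute with the highest information gain and recurse on each subset."""
    if len(set(labels)) == 1:                        # all points are of the same class
        return labels[0]
    if not attributes:                               # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(attribute):
        n = len(labels)
        remainder = 0.0
        for v in set(ex[attribute] for ex in examples):
            subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
            remainder += len(subset) / n * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)                 # attribute with highest information gain
    tree = {best: {}}
    for v in set(ex[best] for ex in examples):
        sub_ex = [ex for ex in examples if ex[best] == v]
        sub_lab = [lab for ex, lab in zip(examples, labels) if ex[best] == v]
        tree[best][v] = build_tree(sub_ex, sub_lab, [a for a in attributes if a != best])
    return tree

# e.g. build_tree(tennis_examples, tennis_labels, ["Outlook", "Temp", "Humid", "Wind"])
# (hypothetical variables holding the table above) yields a tree like the one shown earlier.
```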

Overfitting
(Figure: accuracy, between 0.6 and 0.9, plotted against the number of nodes in the decision tree, on the training data and on the test data.)
© D. Weld and D. Fox

Overfitting 2 Figure from w.w.cohen © D. Weld and D. Fox

Overfitting…
A decision tree DT is overfit when there exists another tree DT' such that DT has smaller error on the training examples but larger error on the test examples than DT'.
Causes of overfitting:
Noisy data, or
Training set is too small
© D. Weld and D. Fox
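
A small illustrative experiment (using scikit-learn, which is not part of the lecture; the dataset and parameters are arbitrary) that typically reproduces the picture two slides back: training accuracy keeps climbing as the tree grows, while test accuracy stops improving or drops.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data, so a fully grown tree can memorize the training set.
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 2, 4, 8, 16, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth,
          round(tree.score(X_train, y_train), 2),   # training accuracy
          round(tree.score(X_test, y_test), 2))     # test accuracy
```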

Effect of Reduced-Error Pruning
(Figure annotation: "cut tree back to here", marking the point to which the overfit tree is pruned.)
© D. Weld and D. Fox
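
For concreteness, here is a simplified, bottom-up sketch of reduced-error pruning over the nested-dict trees used earlier; the usual procedure is iterative and greedy over all nodes, and every name here is illustrative. The idea: replace a subtree with its majority-class leaf whenever that does not hurt accuracy on a held-out validation set.

```python
from collections import Counter

def classify(tree, example):
    """Same leaf/branch convention as the earlier sketches."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

def prune(tree, val_examples, val_labels):
    """Bottom-up reduced-error pruning against a validation set.
    Assumes every validation example's attribute values appear in the tree."""
    if not isinstance(tree, dict):
        return tree
    attribute, branches = next(iter(tree.items()))
    for value in list(branches):
        keep = [(ex, lab) for ex, lab in zip(val_examples, val_labels)
                if ex[attribute] == value]
        branches[value] = prune(branches[value],
                                [ex for ex, _ in keep], [lab for _, lab in keep])
    if not val_labels:                       # no validation data reaches this node: keep it
        return tree
    leaf = Counter(val_labels).most_common(1)[0][0]
    def correct(t):
        return sum(classify(t, ex) == lab for ex, lab in zip(val_examples, val_labels))
    return leaf if correct(leaf) >= correct(tree) else tree
```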

Other Decision Tree Features
Can handle continuous data
  Input: use a threshold to split
  Output: estimate a linear function at each leaf
Can handle missing values
  Use an expectation taken from the other samples
© D. Weld and D. Fox
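
A brief sketch of the first idea, splitting a continuous input on a threshold chosen by information gain. Taking candidate thresholds at midpoints between adjacent sorted values is one common choice, and all names are illustrative rather than from the lecture.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try a threshold halfway between each pair of adjacent sorted values and
    return the one whose binary split (<= t vs. > t) gives the highest gain."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best_t, best_gain = None, -1.0
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# e.g. made-up temperature readings with a play/no-play label
print(best_threshold([64, 65, 68, 70, 71, 72, 75, 80, 81, 85],
                     ["y", "n", "y", "y", "n", "y", "y", "n", "y", "n"]))
```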