CSE573 Autumn 1997, 03/09/98
Machine Learning

Administrative
–Last topic: Decision Tree Learning
–Reading: 5.1, 5.4

Last time
–finished NLP sample system’s “semantics”
–hasty conclusion

This time
–machine learning in general
–decision-tree learning in particular

Machine Learning in General

Most general statement of the problem(s)
–a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E

[Figure: a learning agent's World Model (Predictor) and Problem Solver (Control Rules) interacting with the External World, annotated with three kinds of learning: concept learning, reinforcement learning, and explanation-based learning.]

Components of a Learning System

The “function” to be learned
–a logical expression describing a concept (concept- or inductive learning): ∀x. chair(x) ≡ has-back(x) ∧ can-support(x) ∧ …
–a “policy” mapping from percepts to behaviors or actions (reinforcement learning)
–search control rules or a ranking function (explanation-based learning)

The inputs (training data)

Feedback
–whether or not chair(OBJ-25) holds
–the state description resulting from executing an action
–reward/penalty for executing a plan
–results of a problem-solving episode (quality of solution, time to solution)

Restriction on the form of the function (bias)

Learning as Function Induction

Given a set of tuples (the training set)
–(x_11, x_12, …, x_1n, y_1)
–(x_21, x_22, …, x_2n, y_2)
–…

Produce a function f(x_1, …, x_n) = y
–a compact function
–that works on unobserved examples, not just the training set

Without some restriction on f’s functional form, this will be impossible
–numeric functions: linear, polynomial
–symbolic functions: disjunctive normal form

There is an inevitable tradeoff between the complexity of the function learned and the computational complexity of learning it.

Decision-Tree Learning

The function learned is a set of disjunctive-normal-form formulas
–(a_1 = v_11) ∧ (a_2 = v_12) ∧ … ∧ (a_n = v_1n) => y = y_1k
–(a_1 = v_21) ∧ (a_2 = v_22) ∧ … ∧ (a_n = v_2n) => y = y_2k
(often the v_ij and y_ij values are either TRUE or FALSE)

The tree representation has an attribute at each node and one child for each value that attribute can take on.

[Figure: a small example tree rooted at attribute A with branches A=a_1, A=a_2, A=a_3; internal nodes test further attributes (B, C, …) and the leaves carry T/F classifications.]
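To make the DNF correspondence concrete, here is a small sketch (not from the slides; the tree, attribute names, and the helper paths_to_rules are invented for illustration). It stores a tree as nested Python dictionaries and prints one conjunctive rule per root-to-leaf path.

```python
# Illustrative sketch only: a decision tree as nested dicts, and the
# conjunctive rule (one DNF disjunct) defined by each root-to-leaf path.
# The tree, attribute names, and values below are invented.

tree = {
    "attribute": "A",
    "children": {
        "a1": {"attribute": "B", "children": {"T": "yes", "F": "no"}},
        "a2": "no",
        "a3": "yes",
    },
}

def paths_to_rules(node, conditions=()):
    """Yield (conditions, label) pairs, one per leaf of the tree."""
    if not isinstance(node, dict):                 # leaf: a class label
        yield conditions, node
        return
    for value, child in node["children"].items():
        yield from paths_to_rules(
            child, conditions + ((node["attribute"], value),))

for conds, label in paths_to_rules(tree):
    lhs = " ^ ".join(f"({a} = {v})" for a, v in conds)
    print(f"{lhs} => y = {label}")
# prints, e.g.:  (A = a1) ^ (B = T) => y = yes
```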

Example

(The slide shows the training-set table used in the rest of the lecture: fourteen days D1–D14 described by the attributes Outlook, Temperature, Humidity, and Wind, each labeled with whether to play.)
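The table itself did not survive the transcript. The day IDs, attribute values, and the [9+, 5-] count on the later slides all match the standard PlayTennis data from Mitchell (1997), reproduced below as a working assumption so the later calculations can be checked in code.

```python
# Assumed reconstruction of the slide's table (Mitchell 1997, PlayTennis).
# Each row: (Outlook, Temperature, Humidity, Wind, PlayTennis?)
EXAMPLES = {
    "D1":  ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    "D2":  ("Sunny",    "Hot",  "High",   "Strong", "No"),
    "D3":  ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    "D4":  ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    "D5":  ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    "D6":  ("Rain",     "Cool", "Normal", "Strong", "No"),
    "D7":  ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    "D8":  ("Sunny",    "Mild", "High",   "Weak",   "No"),
    "D9":  ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    "D10": ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    "D11": ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    "D12": ("Overcast", "Mild", "High",   "Strong", "Yes"),
    "D13": ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    "D14": ("Rain",     "Mild", "High",   "Strong", "No"),
}
# 9 positive ("Yes") and 5 negative ("No") examples, matching the
# [9+, 5-] counts quoted on the later slides.
```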

Basic Algorithm

Recall, a node in the tree represents a conjunction of attribute values. We will try to build “the shortest possible” tree that classifies all the training examples correctly. In the algorithm we also store the list of attributes we have not used so far for classification.

Initialization:
–tree ← {}
–attributes ← {all attributes}
–examples ← {all training examples}

Recursion:
–Choose a new attribute A with possible values {a_i}
–For each a_i, add a subtree formed by recursively building the tree with
  the current node as root,
  all attributes except A,
  all examples where A = a_i

Basic Algorithm (cont.)

Termination (working on a single node):
–If all examples have the same classification, then this combination of attribute values is sufficient to classify all (training) examples. Return the unanimous classification.
–If examples is empty, then there are no examples with this combination of attribute values. Associate some “guess” with this combination.
–If attributes is empty, then the training data is not sufficient to discriminate. Return some “guess” based on the remaining examples.
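A minimal Python sketch of this procedure (not from the slides; build_tree, majority_label, and the choose_attribute parameter are my own names). The attribute-selection rule is passed in as a parameter because the information-gain criterion is introduced only later, and majority_label stands in for the “guess”.

```python
from collections import Counter

def majority_label(examples):
    """The 'guess': most common classification among the given examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def build_tree(examples, attributes, choose_attribute, default=None):
    """examples: list of (attr_value_dict, label); attributes: list of names.
    choose_attribute(examples, attributes) -> the attribute to split on."""
    if not examples:                       # no examples with this combination
        return default                     # associate some guess with it
    labels = {label for _, label in examples}
    if len(labels) == 1:                   # unanimous classification
        return labels.pop()
    if not attributes:                     # data cannot discriminate further
        return majority_label(examples)
    A = choose_attribute(examples, attributes)
    rest = [a for a in attributes if a != A]
    tree = {"attribute": A, "children": {}}
    for v in {ex[A] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[A] == v]
        tree["children"][v] = build_tree(subset, rest, choose_attribute,
                                         default=majority_label(examples))
    return tree
```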

Some Sample Attribute Choices

[Figure: four candidate splits of the training set D1, D2, ..., D14]
–DAY: one branch per day D1, D2, ..., D14 (each branch holds a single example)
–ALL: a single branch containing all of D1, ..., D14 (no discrimination)
–HUMIDITY: high → D1, D2, D3, D4, D8, D12, D14; normal → D5, D6, D7, D9, D10, D11, D13
–OUTLOOK: sunny → D1, D2, D8, D9, D11; overcast → D3, D7, D12, D13; rain → D4, D5, D6, D10, D14
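A small helper, sketched under the same assumptions as the EXAMPLES table above (the partition name and ATTR_INDEX layout are mine), reproduces these candidate splits:

```python
from collections import defaultdict

# Assumes the EXAMPLES dict from the earlier sketch; the column layout is mine.
ATTR_INDEX = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def partition(examples, attribute):
    """Group day IDs by their value for the chosen attribute."""
    groups = defaultdict(list)
    for day, row in examples.items():
        groups[row[ATTR_INDEX[attribute]]].append(day)
    return dict(groups)

# partition(EXAMPLES, "Humidity")
#   -> {"High":   ["D1", "D2", "D3", "D4", "D8", "D12", "D14"],
#       "Normal": ["D5", "D6", "D7", "D9", "D10", "D11", "D13"]}
```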

Remaining questions

Special cases
–When attributes are left but no examples (bad coverage in the training set)
–When examples are left but no attributes (bad set of predictive attributes)

How to choose the next attribute to discriminate on?

How to choose the next attribute

What is our goal in building the tree in the first place?
–Maximize accuracy over the entire data set
–Minimize the expected number of tests to classify an example
(In both cases this can argue for building the shortest tree.)

We can’t really do the first looking only at the training set: we can only build a tree accurate for our subset and assume the characteristics of the full data set are the same.

To minimize the expected number of tests
–the best test would be one where each branch has all positive or all negative instances
–the worst test would be one where the proportion of positive to negative instances is the same in every branch: knowledge of A would provide no information about the example’s ultimate classification

The Entropy (Disorder) of a Collection

Suppose S is a collection containing positive and negative examples of the target concept:
–Entropy(S) ≡ -(p+ log2 p+ + p- log2 p-)
–where p+ is the fraction of examples that are positive and p- is the fraction that are negative

Good features
–minimum of 0 where p+ = 0 or p- = 0
–maximum (of 1) where p+ = p- = 0.5

Interpretation: the minimum number of bits required to encode the classification of an arbitrary member of S. We want to reduce the entropy in the collection as quickly as possible.
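The definition above as a small Python sketch (the entropy name and the (pos, neg) count interface are my own), checked against the [9+, 5-] collection quoted on the later slides:

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                          # treat 0 * log2(0) as 0
            p = count / total
            result -= p * log2(p)
    return result

print(f"{entropy(9, 5):.3f}")              # 0.940, as quoted on the later slides
```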

Entropy and Information Gain

The best attribute is one that maximizes the expected decrease in entropy
–if entropy decreases to 0, the tree need not be expanded further
–if entropy does not decrease at all, the attribute was useless

Gain is defined to be
–Gain(S, A) = Entropy(S) - Σ_{v ∈ values(A)} p_{A=v} Entropy(S_{A=v})
–where p_{A=v} is the proportion of S where A=v, and
–S_{A=v} is the collection taken by selecting those elements of S where A=v
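A matching sketch of the gain formula (again with invented function names, working from per-branch positive/negative counts rather than the raw examples):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(parent, branches):
    """Gain(S, A): `parent` is (pos, neg) for S; `branches` is one
    (pos, neg) pair per value v of A, i.e. the counts for each S_{A=v}."""
    total = sum(parent)
    return entropy(*parent) - sum(
        (p + n) / total * entropy(p, n) for p, n in branches)
```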

Example

The full training set S: [9+, 5-], E = 0.940

Choosing the First Attribute

Splitting on Humidity: S: [9+, 5-], E = 0.940
–High: [3+, 4-], E = 0.985
–Normal: [6+, 1-], E = 0.592

Splitting on Wind: S: [9+, 5-], E = 0.940
–Weak: [6+, 2-], E = 0.811
–Strong: [3+, 3-], E = 1.00

Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151
Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.00) = 0.048
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
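For reference, these figures can be recomputed with the entropy/gain sketch above (assumed to be in scope); the counts for Outlook and Temperature come from the reconstructed table. Two results differ in the last digit because the slide truncates rather than rounds.

```python
# Recomputing the quoted gains from per-branch (pos, neg) counts, using the
# entropy/gain sketch shown earlier.
print(round(gain((9, 5), [(3, 4), (6, 1)]), 3))           # Humidity    -> 0.152 (slide: .151)
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))           # Wind        -> 0.048
print(round(gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 3))   # Outlook     -> 0.247 (slide: .246)
print(round(gain((9, 5), [(2, 2), (4, 2), (3, 1)]), 3))   # Temperature -> 0.029
```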

After the First Iteration

The root tests Outlook, splitting S = {D1, D2, ..., D14}:
–Sunny: D1, D2, D8, D9, D11, [2+, 3-], E = 0.970 (still to be expanded: ?)
–Overcast: D3, D7, D12, D13, [4+, 0-] (leaf: Yes)
–Rain: D4, D5, D6, D10, D14, [3+, 2-] (still to be expanded: ?)

Choosing the attribute for the Sunny branch:
–Gain(S_sunny, Humidity) = 0.970
–Gain(S_sunny, Temp) = 0.570
–Gain(S_sunny, Wind) = 0.019

Final Tree

–Outlook = Sunny: test Humidity (High → No, Normal → Yes)
–Outlook = Overcast: Yes
–Outlook = Rain: test Wind (Strong → No, Weak → Yes)
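The same tree in the nested-dict form used in the earlier sketches, with a small classifier over it (FINAL_TREE and classify are my own names):

```python
FINAL_TREE = {
    "attribute": "Outlook",
    "children": {
        "Sunny":    {"attribute": "Humidity",
                     "children": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"attribute": "Wind",
                     "children": {"Strong": "No", "Weak": "Yes"}},
    },
}

def classify(tree, instance):
    """Walk the tree, testing one attribute per internal node."""
    while isinstance(tree, dict):
        tree = tree["children"][instance[tree["attribute"]]]
    return tree

# classify(FINAL_TREE, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"})
#   -> "Yes"
```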

Remaining questions

When a node has attributes remaining but no examples?
–Use a constant default value, or
–Base the guess on the majority of all examples consistent with the attribute values chosen so far, except for the last attribute

When a node has examples remaining but no attributes?
–Base the guess on the majority of the examples that remain at the node.

Some Additional Technical Problems
–Noise in the data
–Overfitting
–Missing values