Machine Learning Reading: Chapter 18

2 Classification Learning
- Input: a set of attributes and values
- Output: a discrete-valued function
- Learning a continuous-valued function is called regression
- Binary or Boolean classification: the category is either true or false

3 Learning Decision Trees
- Each node tests the value of an input attribute
- Branches from the node correspond to possible values of the attribute
- Leaf nodes supply the values to be returned if that leaf is reached
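The node, branch and leaf structure described on this slide can be sketched as a small data structure. This is a minimal illustration, not the course's implementation: the names `Leaf`, `Node`, `branch_of` and `classify`, and the one-node toy tree, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union

@dataclass
class Leaf:
    value: str  # value returned when this leaf is reached

@dataclass
class Node:
    attribute: str                     # input attribute tested at this node
    branch_of: Callable[[float], str]  # maps an attribute value to a branch label
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

def classify(tree, example):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(tree, Node):
        branch = tree.branch_of(example[tree.attribute])
        tree = tree.children[branch]
    return tree.value

# Illustrative one-node tree; the attribute name, threshold and labels are made up.
toy_tree = Node(
    attribute="petal_length",
    branch_of=lambda v: "short" if v < 2.5 else "long",
    children={"short": Leaf("class A"), "long": Leaf("class B")},
)
print(classify(toy_tree, {"petal_length": 1.4}))  # class A
```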

4 Example: Iris Plant Database
- Which of 3 classes is a given Iris plant?
  - Iris Setosa
  - Iris Versicolour
  - Iris Virginica
- Attributes
  - Sepal length in cm
  - Sepal width in cm
  - Petal length in cm
  - Petal width in cm

5 Summary Statistics
- Columns: Min, Max, Mean, SD, Class Correlation
- Rows: sepal length, sepal width, petal length (class correlation high), petal width (class correlation high)
- Rules to learn
  - If sepal length > 6 and sepal width > 3.8 and petal length < 2.5 and petal width < 1.5, then class = Iris Setosa
  - If sepal length > 5 and sepal width > 3 and petal length > 5.5 and petal width > 2, then class = Iris Versicolour
  - If sepal length 3 and petal length ≥ 2.5 and ≤ 5.5 and petal width ≥ 1.5 and ≤ 2, then class = Iris Virginica

6 Data
- Columns: S-length, S-width, P-length, P-width, Class
- Class of examples 1 to 9, in order: Versicolour, Setosa, Virginica, Virginica, Versicolour, Setosa, Setosa, Virginica, Versicolour

7 Data
- Columns: S-length, S-width, P-length, Class
- Class of examples 1 to 9, in order: Versicolour, Setosa, Virginica, Virginica, Versicolour, Setosa, Setosa, Virginica, Versicolour

8 Constructing the Decision Tree
- Goal: find the smallest decision tree consistent with the examples
- Find the attribute that best splits the examples
- Form a tree with root = best attribute
- For each value v_i (or range) of the best attribute:
  - Select those examples with best = v_i
  - Construct subtree_i by recursively calling the decision tree algorithm with the subset of examples and all attributes except best
  - Add a branch to the tree with label = v_i and subtree = subtree_i
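The recursive procedure above can be sketched in a few lines. This is only a sketch under stated assumptions: "best" is taken to mean the attribute with the lowest remaining entropy (equivalently, the highest information gain), attributes are already discretized, a leaf returns the majority class when no attributes are left, and the function names and the four-example toy data set are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(examples):
    """Entropy in bits of the class labels in a list of (features, label) pairs."""
    counts = Counter(label for _, label in examples)
    return sum(-(c / len(examples)) * log2(c / len(examples)) for c in counts.values())

def best_attribute(examples, attributes):
    """Assumed criterion: the attribute whose split leaves the least entropy."""
    def remainder(a):
        groups = {}
        for features, label in examples:
            groups.setdefault(features[a], []).append((features, label))
        return sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return min(attributes, key=remainder)

def learn_tree(examples, attributes):
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: (majority) class
    best = best_attribute(examples, attributes)
    tree = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for v in sorted({features[best] for features, _ in examples}):
        subset = [(f, l) for f, l in examples if f[best] == v]
        tree["branches"][v] = learn_tree(subset, remaining)  # recurse on the subset
    return tree

# Hypothetical discretized examples, not the data from the slides.
toy = [
    ({"s_length": "<5.5", "p_length": "mid"}, "Virginica"),
    ({"s_length": ">=5.5", "p_length": "long"}, "Versicolour"),
    ({"s_length": ">=5.5", "p_length": "short"}, "Setosa"),
    ({"s_length": "<5.5", "p_length": "mid"}, "Virginica"),
]
print(learn_tree(toy, ["s_length", "p_length"]))
```

On this toy data the sketch picks `p_length` as the root, because that split separates the three classes completely.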

9 Construct example decision tree

10 Tree and Rules Learned
- S-length
  - < 5.5: examples 3, 4, 8
  - ≥ 5.5: P-length
    - ≥ 5.6: examples 1, 5, 9
    - ≤ 2.4: examples 2, 6, 7
- If S-length < 5.5, then Virginica
- If S-length ≥ 5.5 and P-length ≥ 5.6, then Versicolour
- If S-length ≥ 5.5 and P-length ≤ 2.4, then Setosa
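Read as code, the three learned rules amount to a short chain of tests. The sketch below assumes the comparison symbols lost in the transcript are "greater than or equal to"; the function name `classify_iris` is made up, and the function returns `None` for petal lengths between 2.4 and 5.6, which the learned tree does not cover.

```python
def classify_iris(s_length, p_length):
    """Apply the learned rules; returns None where the tree gives no rule."""
    if s_length < 5.5:
        return "Virginica"
    if p_length >= 5.6:   # assumed reading of the garbled comparison
        return "Versicolour"
    if p_length <= 2.4:
        return "Setosa"
    return None  # 2.4 < P-length < 5.6 is not covered by the learned tree

# Measurements below are invented inputs, not rows from the slides.
print(classify_iris(5.0, 1.5))  # Virginica
print(classify_iris(6.3, 6.0))  # Versicolour
print(classify_iris(5.8, 1.2))  # Setosa
```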

11 Comparison of Target and Learned
- Learned
  - If S-length < 5.5, then Virginica
  - If S-length ≥ 5.5 and P-length ≥ 5.6, then Versicolour
  - If S-length ≥ 5.5 and P-length ≤ 2.4, then Setosa
- Target
  - If sepal length > 6 and sepal width > 3.8 and petal length < 2.5 and petal width < 1.5, then class = Iris Setosa
  - If sepal length > 5 and sepal width > 3 and petal length > 5.5 and petal width > 2, then class = Iris Versicolour
  - If sepal length 3 and petal length ≥ 2.5 and ≤ 5.5 and petal width ≥ 1.5 and ≤ 2, then class = Iris Virginica

12 Text Classification
- Is text_i a finance news article?
- Positive / Negative

13 20 attributes (word: count)
- Investors: 2, Dow: 2, Jones: 2, Industrial: 1, Average: 3
- Percent: 5, Gain: 6, Trading: 8, Broader: 5, stock: 5
- Indicators: 6, Standard: 2, Rolling: 1, Nasdaq: 3, Early: 10
- Rest: 12, More: 13, first: 11, Same: 12, The: 30

14 20 attributes
- Men’s Basketball Championship UConn Huskies Georgia Tech Women Playing Crown Titles Games Rebounds All-America early rolling Celebrates Rest More First The same

15 Example
example  stock  rolling  the  class
1        0      3        40   other
2        6      8        35   finance
3        7      7        25   other
4        5      7        14   other
5        8      2        20   finance
6        9      4        25   finance
7        5      6        20   finance
8        0      2        35   other
9                             finance
10                            other

16 Constructing the Decision Tree
- Goal: find the smallest decision tree consistent with the examples
- Find the attribute that best splits the examples
- Form a tree with root = best attribute
- For each value v_i (or range) of the best attribute:
  - Select those examples with best = v_i
  - Construct subtree_i by recursively calling the decision tree algorithm with the subset of examples and all attributes except best
  - Add a branch to the tree with label = v_i and subtree = subtree_i

17 Choosing the Best Attribute: Binary Classification
- Want a formal measure that returns a maximum value when the attribute makes a perfect split and a minimum when it makes no distinction
- Information theory (Shannon and Weaver 49)
  - $H(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)$
  - $H(1/2, 1/2) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$ bit
  - $H(1/100, 99/100) = -0.01\log_2 0.01 - 0.99\log_2 0.99 = 0.08$ bits
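The entropy measure on this slide is easy to check numerically. The helper below is a small sketch (the name `entropy` and the convention that zero-probability terms contribute nothing are choices made here); it reproduces the fair-coin value and the value for a 1%/99% distribution.

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits; terms with probability 0 contribute 0."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))              # 1.0
print(round(entropy([0.01, 0.99]), 2))  # 0.08
```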

18 Information based on attributes = Remainder(A)
- p = n = 5 (10 examples), so H(1/2, 1/2) = 1 bit
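The formula for Remainder(A) did not survive in the transcript of this slide. A common textbook definition, given here as an assumption rather than as the slide's exact wording, weights the entropy of each subset by its share of the examples. Here attribute A splits the examples into v subsets, and subset i holds p_i positive and n_i negative examples out of p positive and n negative overall.

```latex
\[
\mathrm{Remainder}(A) \;=\; \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\,
  H\!\left(\frac{p_i}{p_i + n_i},\; \frac{n_i}{p_i + n_i}\right)
\]
```

With this definition, a split into pure subsets gives a remainder of 0, and a split that leaves every subset as mixed as the whole set gives a remainder equal to the original entropy.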

19 Information Gain
- Information gain (from attribute test) = difference between the original information requirement and the new requirement
- $Gain(A) = H\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - Remainder(A)$

20 Example
example  stock  rolling  the  class
1        0      3        40   other
2        6      8        35   finance
3        7      7        25   other
4        5      7        14   other
5        8      2        20   finance
6        9      4        25   finance
7        5      6        20   finance
8        0      2        35   other
9                             finance
10                            other

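21 Splits on stock and rolling
- stock
  - < 5: examples 1, 8, 9, 10
  - 5-10: examples 2, 3, 4, 5, 6, 7
- rolling
  - < 5: examples 1, 5, 6, 8
  - 5-10: examples 2, 3, 4, 7
  - ≥ 10: examples 9, 10
- $Gain(stock) = 1 - \left[\frac{4}{10} H\left(\frac{1}{4}, \frac{3}{4}\right) + \frac{6}{10} H\left(\frac{4}{6}, \frac{2}{6}\right)\right] = 1 - [0.4(0.811) + 0.6(0.918)] = 1 - 0.875 = 0.125$
- $Gain(rolling) = 1 - \left[\frac{4}{10} H\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{4}{10} H\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{2}{10} H\left(\frac{1}{2}, \frac{1}{2}\right)\right] = 0$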
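The two gains can be recomputed directly from the partitions and class labels listed in the example. The sketch below is an illustration only: it takes each subset's entropy from the class proportions inside that subset, and the branch labels (in particular `">=10"`) and function names are assumptions made for the example.

```python
from collections import Counter
from math import log2

# Class of each example (1..10), as listed in the example table.
labels = {1: "other", 2: "finance", 3: "other", 4: "other", 5: "finance",
          6: "finance", 7: "finance", 8: "other", 9: "finance", 10: "other"}

# Examples falling into each branch of each attribute test.
# The ">=10" branch label is an assumed reading of a garbled symbol.
splits = {
    "stock": {"<5": [1, 8, 9, 10], "5-10": [2, 3, 4, 5, 6, 7]},
    "rolling": {"<5": [1, 5, 6, 8], "5-10": [2, 3, 4, 7], ">=10": [9, 10]},
}

def entropy(examples):
    """Entropy in bits of the class labels of the given example ids."""
    counts = Counter(labels[e] for e in examples)
    return sum(-(c / len(examples)) * log2(c / len(examples)) for c in counts.values())

def gain(attribute):
    """Original entropy minus the weighted entropy remaining after the split."""
    everything = list(labels)
    remainder = sum(len(subset) / len(everything) * entropy(subset)
                    for subset in splits[attribute].values())
    return entropy(everything) - remainder

for name in splits:
    print(name, round(gain(name), 3))
```

Under these assumptions the split on stock has a small positive gain, while every rolling branch is an even mix of the two classes, so its gain is zero up to rounding.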

22 Other cases
- What if the class is discrete valued, not binary?
- What if an attribute has many values (e.g., 1 per instance)?

23 Training vs. Testing
- A learning algorithm is good if it uses its learned hypothesis to make accurate predictions on unseen data
- Collect a large set of examples (with classifications)
- Divide into two disjoint sets: the training set and the test set
- Apply the learning algorithm to the training set, generating hypothesis h
- Measure the percentage of examples in the test set that are correctly classified by h
- Repeat for different sizes of training sets and different randomly selected training sets of each size
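The evaluation loop described here can be sketched as follows. Everything named in the code is an assumption for illustration: `majority_learner` is a deliberately trivial stand-in for a real learning algorithm, and the toy data set of 70 "other" and 30 "finance" examples is invented.

```python
import random
from collections import Counter

def split(examples, train_size, rng):
    """Randomly divide examples into disjoint training and test sets."""
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

def accuracy(h, test_set):
    """Fraction of test examples that hypothesis h classifies correctly."""
    return sum(h(x) == y for x, y in test_set) / len(test_set)

def majority_learner(training_set):
    """Stand-in learner: always predicts the most common training class."""
    most_common = Counter(y for _, y in training_set).most_common(1)[0][0]
    return lambda x: most_common

def learning_curve(examples, learner, sizes, trials=20, seed=0):
    """Average test accuracy for each training-set size over random splits."""
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(trials):
            train, test = split(examples, size, rng)
            scores.append(accuracy(learner(train), test))
        curve[size] = sum(scores) / trials
    return curve

# Toy data (assumed): the attribute is a dummy integer, only the class matters.
examples = [(i, "other") for i in range(70)] + [(i, "finance") for i in range(30)]
print(learning_curve(examples, majority_learner, sizes=[10, 50, 90]))
```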

25 Division into 3 sets
- Inadvertent peeking
  - Parameters that must be learned (e.g., how to split values)
  - Generate different hypotheses for different parameter values on training data
  - Choose values that perform best on testing data
- Why do we need to do this for selecting best attributes?
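One way to avoid peeking at the test set while choosing parameters is to hold out a third set. The sketch below assumes a random 60/20/20 division; the function name `three_way_split`, the set names and the fractions are illustrative choices, not values from the slides.

```python
import random

def three_way_split(examples, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    training = shuffled[n_test + n_val:]
    return training, validation, test

training, validation, test = three_way_split(range(100))
print(len(training), len(validation), len(test))  # 60 20 20
```

Parameters would then be chosen by their performance on the validation set, and the test set would be used only once, for the final measurement.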

26 Overfitting
- Learning algorithms may use irrelevant attributes to make decisions
  - For news: day published and newspaper
- Decision tree pruning
  - Prune away attributes with low information gain
  - Use statistical significance to test whether gain is meaningful

27 K-fold Cross Validation
- To reduce overfitting
- Run k experiments
  - Use a different 1/k of data for testing each time
  - Average the results
- 5-fold, 10-fold, leave-one-out
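The k experiments can be sketched with a small index generator. This is a minimal sketch: the function names are made up, the examples are assumed to be shuffled beforehand, and `learner` is any function that takes a training list of (x, y) pairs and returns a hypothesis.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds over n examples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_validate(examples, learner, k):
    """Average test accuracy over k folds; each example is tested exactly once."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        h = learner([examples[i] for i in train_idx])
        correct = sum(h(examples[i][0]) == examples[i][1] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

for train, test in k_fold_indices(10, 5):
    print(test)
```

Leave-one-out corresponds to calling `k_fold_indices(n, n)`, so that every test fold holds a single example.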

28 Ensemble Learning
- Learn from a collection of hypotheses
- Majority voting
- Enlarges the hypothesis space
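Majority voting over a collection of hypotheses takes only a few lines. In this sketch the three threshold hypotheses are invented for illustration, and ties are broken by whichever answer was seen first.

```python
from collections import Counter

def majority_vote(hypotheses, x):
    """Return the prediction made by the largest number of hypotheses."""
    votes = Counter(h(x) for h in hypotheses)
    return votes.most_common(1)[0][0]

# Three hypothetical threshold classifiers on a single number.
hypotheses = [lambda x: x > 1, lambda x: x > 2, lambda x: x > 3]
print(majority_vote(hypotheses, 2.5))  # True  (two of three vote True)
print(majority_vote(hypotheses, 1.5))  # False (two of three vote False)
```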

29 Boosting
- Uses a weighted training set
  - Each example has an associated weight w_j ≥ 0
  - Higher weighted examples have higher importance
- Initially, w_j = 1 for all examples
- Next round: increase weights of misclassified examples, decrease other weights
- From the new weighted set, generate hypothesis h_2
- Continue until M hypotheses are generated
- Final ensemble hypothesis = weighted-majority combination of all M hypotheses
  - Weight each hypothesis according to how well it did on training data

30 AdaBoost
- If the input learning algorithm L is a weak learning algorithm
  - L always returns a hypothesis with weighted error on the training set slightly better than random
- Returns a hypothesis that classifies the training data perfectly for large enough M
- Boosts the accuracy of the original learning algorithm on training data
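A compact sketch of the boosting loop may help make these properties concrete. Several details are assumptions rather than anything stated on the slides: the weak learner is a one-dimensional threshold stump, classes are coded as +1 and -1, weights start uniform and are renormalized each round, and each hypothesis gets the usual AdaBoost weight 0.5 * ln((1 - error) / error). The ten-point data set is a toy example.

```python
from math import exp, log

def stump_predict(x, threshold, sign):
    """Predict `sign` for x >= threshold and -sign otherwise."""
    return sign if x >= threshold else -sign

def best_stump(xs, ys, w):
    """Weak learner: the stump with the lowest weighted training error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_predict(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n  # uniform initial weights (assumed normalization)
    ensemble = []
    for _ in range(rounds):
        err, t, sign = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * log((1 - err) / err)     # weight of this hypothesis
        ensemble.append((alpha, t, sign))
        # Misclassified examples gain weight, correctly classified ones lose it.
        w = [wi * exp(-alpha * y * stump_predict(x, t, sign))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted-majority combination of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(x, t, sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single stump classifies all ten points correctly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy set a single stump misclassifies three points, while the three-round ensemble reproduces every training label, which illustrates the claim about fitting the training data for large enough M.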

31 Issues
- Representation
  - How to map from a representation in the domain to a representation used for learning?
- Training data
  - How can training data be acquired?
- Amount of training data
  - How well does the algorithm do as we vary the amount of data?
- Which attributes influence learning most?
- Does the learning algorithm provide insight into the generalizations made?