Machine Learning Chapter 10. Learning Sets of Rules Tom M. Mitchell

2 Learning Disjunctive Sets of Rules
Method 1: Learn decision tree, convert to rules (sketched below)
Method 2: Sequential covering algorithm:
1. Learn one rule with high accuracy, any coverage
2. Remove positive examples covered by this rule
3. Repeat
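To make Method 1 concrete, here is a minimal Python sketch: each root-to-leaf path of the tree becomes one if-then rule. The nested-dict tree representation and the toy attribute names are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: convert a decision tree into a set of if-then rules (Method 1).
# The nested-dict tree format below is an assumed, illustrative representation.
def tree_to_rules(tree, preconditions=()):
    """Return one (preconditions, class) rule per leaf of the tree."""
    if not isinstance(tree, dict):                 # leaf node: a class label
        return [(list(preconditions), tree)]
    rules = []
    for (attribute, value), subtree in tree.items():
        rules += tree_to_rules(subtree, preconditions + ((attribute, value),))
    return rules

# Toy tree: test Outlook first, then Humidity on the Sunny branch.
toy_tree = {
    ("Outlook", "Sunny"): {("Humidity", "High"): "No", ("Humidity", "Normal"): "Yes"},
    ("Outlook", "Overcast"): "Yes",
}
for conds, label in tree_to_rules(toy_tree):
    print("IF", " AND ".join(f"{a}={v}" for a, v in conds), "THEN", label)
```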

3 Sequential Covering Algorithm
SEQUENTIAL-COVERING(Target_attribute, Attributes, Examples, Threshold)
  Learned_rules ← {}
  Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  while PERFORMANCE(Rule, Examples) > Threshold, do
    - Learned_rules ← Learned_rules + Rule
    - Examples ← Examples − {examples correctly classified by Rule}
    - Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  Learned_rules ← sort Learned_rules according to PERFORMANCE over Examples
  return Learned_rules
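A minimal Python transcription of the loop above; LEARN-ONE-RULE, PERFORMANCE, and the rule-correctness test are passed in as functions, since the slide leaves their representation open.

```python
# Minimal sketch of SEQUENTIAL-COVERING. The three function arguments stand in
# for LEARN-ONE-RULE, PERFORMANCE, and "Rule correctly classifies example".
def sequential_covering(target_attribute, attributes, examples, threshold,
                        learn_one_rule, performance, correctly_classifies):
    learned_rules = []
    rule = learn_one_rule(target_attribute, attributes, examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # remove the examples this rule already classifies correctly
        examples = [e for e in examples if not correctly_classifies(rule, e)]
        rule = learn_one_rule(target_attribute, attributes, examples)
    # sort the learned rules by PERFORMANCE before returning them
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules
```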

4 Learn-One-Rule

5 Learn-One-Rule (Cont.)
Pos ← positive Examples
Neg ← negative Examples
while Pos, do
  Learn a NewRule
  - NewRule ← most general rule possible
  - NewRuleNeg ← Neg
  - while NewRuleNeg, do
    Add a new literal to specialize NewRule
    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax over L in Candidate_literals of Performance(SpecializeRule(NewRule, L))
    3. add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
  - Learned_rules ← Learned_rules + NewRule
  - Pos ← Pos − {members of Pos covered by NewRule}
Return Learned_rules
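A sketch of the inner specialization loop above (learning a single rule by greedy general-to-specific search, i.e., beam width 1). The representation choices are assumptions, not fixed by the slide: examples are dicts of attribute values, the positive class label is "Yes", a rule is a list of (attribute, value) preconditions, and the Performance function is supplied by the caller.

```python
# Greedy specialization of one rule, starting from the most general rule (no
# preconditions) and adding the best literal until no negatives are covered.
def learn_one_rule(examples, attributes, target, performance):
    rule = []                                          # most general rule possible
    covered_neg = [e for e in examples if e[target] != "Yes"]  # assumed positive label "Yes"
    while covered_neg:
        # candidate literals: every attribute=value pair seen in the data, not yet used
        candidates = {(a, e[a]) for e in examples for a in attributes} - set(rule)
        if not candidates:
            break
        best = max(candidates,
                   key=lambda lit: performance(rule + [lit], examples, target))
        rule.append(best)
        covered_neg = [e for e in covered_neg
                       if all(e[a] == v for a, v in rule)]
    return rule
```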

6 Subtleties: Learn One Rule
1. May use beam search
2. Easily generalizes to multi-valued target functions
3. Choose evaluation function to guide search:
  - Entropy (i.e., information gain)
  - Sample accuracy: n_c / n, where n_c = correct rule predictions and n = all predictions
  - m-estimate: (n_c + m·p) / (n + m), where p is the prior probability of the predicted class and m is the weight given to that prior (equivalent sample size)
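The two accuracy statistics above, written out as small Python helpers (the worked numbers are only an illustration):

```python
# Relative frequency (sample accuracy) and m-estimate of rule accuracy.
# n_c = examples the rule predicts correctly, n = examples the rule covers,
# p = prior probability of the predicted class, m = equivalent sample size.
def sample_accuracy(n_c, n):
    return n_c / n

def m_estimate(n_c, n, p, m):
    return (n_c + m * p) / (n + m)

# Example: a rule covering 10 examples, 8 of them correctly, prior 0.5, m = 2
print(sample_accuracy(8, 10))     # 0.8
print(m_estimate(8, 10, 0.5, 2))  # (8 + 1) / 12 = 0.75
```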

7 Variants of Rule Learning Programs
- Sequential or simultaneous covering of data?
- General → specific, or specific → general?
- Generate-and-test, or example-driven?
- Whether and how to post-prune?
- What statistical evaluation function?

8 Learning First Order Rules
Why do that?
- Can learn sets of rules such as
  Ancestor(x, y) ← Parent(x, y)
  Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
- General-purpose programming language PROLOG: programs are sets of such rules

9 First Order Rule for Classifying Web Pages [Slattery, 1997]
course(A) ← has-word(A, instructor),
            ¬ has-word(A, good),
            link-from(A, B),
            has-word(B, assign),
            ¬ link-from(B, C)
Train: 31/31, Test: 31/34

10

11 Specializing Rules in FOIL

12 Information Gain in FOIL
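For reference, the standard definition of FOIL's gain measure for adding a candidate literal L to a rule R is Foil_Gain(L, R) = t · ( log2(p1 / (p1 + n1)) − log2(p0 / (p0 + n0)) ), where p0 and n0 count the positive and negative bindings of R, p1 and n1 those of the specialized rule R + L, and t is the number of positive bindings of R still covered after adding L. A direct Python transcription:

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """Standard FOIL information gain for adding a literal L to a rule R.
    p0, n0: positive/negative bindings of R
    p1, n1: positive/negative bindings of the specialized rule R + L
    t:      positive bindings of R still covered after adding L"""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```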

13 Induction as Inverted Deduction
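In this formulation (stated here in its standard form), learning is posed as finding a hypothesis h such that each training classification follows deductively from h together with the background knowledge B and the instance description: for every ⟨xi, f(xi)⟩ in the data D, (B ∧ h ∧ xi) ⊢ f(xi).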

14 Induction as Inverted Deduction (Cont.)

15 Induction as Inverted Deduction (Cont.)
"Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; ... it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction." (Jevons, 1874)

16 Induction as Inverted Deduction (Cont.)

17 Induction as Inverted Deduction (Cont.)

18 Induction as Inverted Deduction (Cont.)

19 Deduction: Resolution Rule
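A minimal Python sketch of the propositional resolution operator, with clauses encoded as sets of literal strings and negation written with a leading "~" (both encoding choices are assumptions made for the example):

```python
def resolve(c1, c2, lit):
    """Propositional resolution: from C1 containing lit and C2 containing its
    negation, derive the resolvent (C1 - {lit}) | (C2 - {~lit})."""
    neg = lit[1:] if lit.startswith("~") else "~" + lit
    assert lit in c1 and neg in c2, "clauses must contain complementary literals"
    return (c1 - {lit}) | (c2 - {neg})

# Example: C1 = {A, ~B}, C2 = {B, C}, resolving on B gives the resolvent {A, C}
print(resolve({"A", "~B"}, {"B", "C"}, "~B"))
```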

20 Inverting Resolution

21 Inverted Resolution (Propositional)
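Continuing the same clause encoding, one way to run the resolution operator backwards: given the resolvent C and one parent clause C1 containing literal L, take C2 = (C − (C1 − {L})) ∪ {¬L}. The choice of C2 is not unique in general; this is the minimal construction.

```python
def inverse_resolve(c, c1, lit):
    """One propositional inverse resolution step: construct a second parent
    C2 = (C - (C1 - {lit})) | {~lit} from resolvent C and parent C1."""
    neg = lit[1:] if lit.startswith("~") else "~" + lit
    return (c - (c1 - {lit})) | {neg}

# Recover the other parent from the earlier example: C = {A, C}, C1 = {A, ~B}
print(inverse_resolve({"A", "C"}, {"A", "~B"}, "~B"))   # {'B', 'C'} (set order may vary)
```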

22 First Order Resolution

23 Inverting First Order Resolution

24 Cigol

25 Progol