
First-Order Rule Learning

Sequential Covering (I)
Learning consists of iteratively learning rules that cover as-yet-uncovered training instances. Assume the existence of a Learn_one_Rule function:
  Input: a set of training instances
  Output: a single high-accuracy (not necessarily high-coverage) rule

Sequential Covering (II)
Algorithm Sequential_Covering(Instances)
  Learned_rules ← ∅
  Rule ← Learn_one_Rule(Instances)
  While Quality(Rule, Instances) > Threshold Do
    Learned_rules ← Learned_rules + Rule
    Instances ← Instances − {instances correctly classified by Rule}
    Rule ← Learn_one_Rule(Instances)
  Sort Learned_rules by Quality over Instances   # Quality is a user-defined rule quality evaluation function
  Return Learned_rules
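
A minimal Python sketch of this loop, assuming hypothetical helpers learn_one_rule, quality, and correctly_classified that the slides leave unspecified:

```python
def sequential_covering(instances, learn_one_rule, quality, correctly_classified,
                        threshold):
    """Greedy sequential covering (a sketch): learn one high-accuracy rule,
    remove the instances it classifies correctly, and repeat while the
    learned rule's quality stays above the threshold."""
    all_instances = list(instances)   # keep the full set for the final sort
    remaining = list(instances)
    learned_rules = []
    rule = learn_one_rule(remaining)
    while remaining and quality(rule, remaining) > threshold:
        learned_rules.append(rule)
        # Drop the instances the new rule already handles correctly.
        remaining = [x for x in remaining if not correctly_classified(rule, x)]
        if remaining:
            rule = learn_one_rule(remaining)
    # Rank the rules by quality over the full training set, best first.
    learned_rules.sort(key=lambda r: quality(r, all_instances), reverse=True)
    return learned_rules
```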

CN2 (I)
Algorithm Learn_one_Rule_CN2(Instances, k)
  Best_hypo ← ∅
  Candidate_hypo ← {Best_hypo}
  While Candidate_hypo ≠ ∅ Do
    All_constraints ← {(a = v): a is an attribute and v is a value of a found in Instances}
    New_candidate_hypo ←
      For each h ∈ Candidate_hypo
        For each c ∈ All_constraints, specialize h by adding c
    Remove from New_candidate_hypo any hypotheses that are duplicates, inconsistent, or not maximally specific
    For all h ∈ New_candidate_hypo
      If Quality_CN2(h, Instances) > Quality_CN2(Best_hypo, Instances)
        Best_hypo ← h
    Candidate_hypo ← the k best members of New_candidate_hypo as per Quality_CN2
  Return a rule of the form “IF Best_hypo THEN Pred”
    # Pred = most frequent value of the target attribute among the instances that match Best_hypo
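
A compact Python sketch of this beam search, under the illustrative assumption that each training instance is a dict of attribute values with its class stored under a 'target' key, and that a hypothesis is a frozenset of (attribute, value) constraints. The quality function is passed in; the Quality_CN2 sketch given after the next slide can be used for it.

```python
from collections import Counter

def learn_one_rule_cn2(instances, k, quality, target="target"):
    """Sketch of Learn_one_Rule_CN2: general-to-specific beam search keeping
    the k best candidate hypotheses at each step. A hypothesis is a frozenset
    of (attribute, value) constraints; quality(h, instances) scores it."""
    all_constraints = {(a, v) for inst in instances
                       for a, v in inst.items() if a != target}
    best_hypo = frozenset()                 # the empty, most general hypothesis
    candidates = [best_hypo]
    while candidates:
        # Specialize each candidate by adding one constraint it lacks, then drop
        # inconsistent hypotheses (two values for the same attribute).
        specialized = {h | {c} for h in candidates for c in all_constraints
                       if c not in h}
        specialized = {h for h in specialized
                       if len({a for a, _ in h}) == len(h)}
        for h in specialized:
            if quality(h, instances) > quality(best_hypo, instances):
                best_hypo = h
        candidates = sorted(specialized, key=lambda h: quality(h, instances),
                            reverse=True)[:k]
    # The rule predicts the most frequent target value among matching instances.
    matched = [i for i in instances if all(i.get(a) == v for a, v in best_hypo)]
    prediction = Counter(i[target]
                         for i in (matched or instances)).most_common(1)[0][0]
    return best_hypo, prediction
```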

CN2 (II)
Algorithm Quality_CN2(h, Instances)
  h_instances ← {i ∈ Instances: i matches h}
  Return −Entropy(h_instances)
where Entropy is computed with respect to the target attribute.
Note that CN2 performs a general-to-specific beam search, keeping not the single best candidate at each step, but a list of the k best candidates.
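
Under the same dict-based instance representation, a possible rendering of Quality_CN2 (helper names are illustrative); it can be passed as the quality argument of the learn_one_rule_cn2 sketch above:

```python
import math
from collections import Counter

def matches(h, instance):
    """True iff the instance satisfies every (attribute, value) constraint in h."""
    return all(instance.get(a) == v for a, v in h)

def quality_cn2(h, instances, target="target"):
    """Quality_CN2 = -Entropy of the target-attribute distribution over the
    instances matched by h (higher, i.e. closer to 0, is better)."""
    matched = [i for i in instances if matches(h, i)]
    if not matched:
        return float("-inf")      # an empty cover carries no information
    counts = Counter(i[target] for i in matched)
    n = len(matched)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return -entropy
```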

Illustrative Training Set
(The training table itself appeared as an image on the slide and did not survive extraction. From the CN2 example below, it is a credit-risk data set with instances numbered up to 14, attributes Income Level, Credit History, Debt Level, and Collateral, and target risk classes HIGH, MODERATE, and LOW.)

CN2 Example (I)
First pass: full instance set
2-best1 (the two best hypotheses after one specialization step; counts are HIGH-MODERATE-LOW): « Income Level = Low » (4-0-0), « Income Level = High » (0-1-5)
Can’t do better than (4-0-0)
Best_hypo: « Income Level = Low »
First rule: IF Income Level = Low THEN HIGH

CN2 Example (II)
Second pass: Instances 2-3, 5-6, 8-10, 12-14
2-best1: « Income Level = High » (0-1-5), « Credit History = Good » (0-1-3)
Best_hypo: « Income Level = High »
2-best2: « Income Level = High AND Credit History = Good » (0-0-3), « Income Level = High AND Collateral = None » (0-0-3)
Best_hypo: « Income Level = High AND Credit History = Good »
Can’t do better than (0-0-3)
Second rule: IF Income Level = High AND Credit History = Good THEN LOW

CN2 Example (III)
Third pass: Instances 2-3, 5-6, 8, 12, 14
2-best1: « Credit History = Good » (0-1-0), « Debt Level = High » (2-1-0)
Best_hypo: « Credit History = Good »
Can’t do better than (0-1-0)
Third rule: IF Credit History = Good THEN MODERATE

CN2 Example (IV)
Fourth pass: Instances 2-3, 5-6, 8, 14
2-best1: « Debt Level = High » (2-0-0), « Income Level = Medium » (2-1-0)
Best_hypo: « Debt Level = High »
Can’t do better than (2-0-0)
Fourth rule: IF Debt Level = High THEN HIGH

CN2 Example (V)
Fifth pass: Instances 3, 5-6, 8
2-best1: « Credit History = Bad » (0-1-0), « Income Level = Medium » (0-1-0)
Best_hypo: « Credit History = Bad »
Can’t do better than (0-1-0)
Fifth rule: IF Credit History = Bad THEN MODERATE

CN2 Example (VI)
Sixth pass: Instances 3, 5-6
2-best1: « Income Level = High » (0-0-2), « Collateral = Adequate » (0-0-1)
Best_hypo: « Income Level = High »
Can’t do better than (0-0-2)
Sixth rule: IF Income Level = High THEN LOW

CN2 Example (VII)
Seventh pass: Instance 3
2-best1: « Credit History = Unknown » (0-1-0), « Debt Level = Low » (0-1-0)
Best_hypo: « Credit History = Unknown »
Can’t do better than (0-1-0)
Seventh rule: IF Credit History = Unknown THEN MODERATE

CN2 Example (VIII)
Quality: −Σᵢ pᵢ log(pᵢ), the entropy of each rule’s HIGH-MODERATE-LOW counts (lower entropy means a better rank)
Rule 1: (4-0-0), Rank 1
Rule 2: (0-0-3), Rank 2
Rule 3: (1-1-3), Rank 5
Rule 4: (4-1-2), Rank 6
Rule 5: (3-1-0), Rank 4
Rule 6: (0-1-5), Rank 3
Rule 7: (2-1-2), Rank 7
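
The ranking can be reproduced by computing the entropy of each rule's count triple; a short illustrative script (the counts come straight from the slide):

```python
import math

def entropy(counts):
    """Entropy of a class-count vector; Quality_CN2 is its negation,
    so lower entropy means a better-ranked rule."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

rules = {1: (4, 0, 0), 2: (0, 0, 3), 3: (1, 1, 3), 4: (4, 1, 2),
         5: (3, 1, 0), 6: (0, 1, 5), 7: (2, 1, 2)}
for rank, (rule, counts) in enumerate(
        sorted(rules.items(), key=lambda kv: entropy(kv[1])), start=1):
    print(f"Rank {rank}: Rule {rule}  counts={counts}  entropy={entropy(counts):.3f}")
```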

CN2 Example (IX)
Final rule set, in rank order:
IF Income Level = Low THEN HIGH
IF Income Level = High AND Credit History = Good THEN LOW
IF Income Level = High THEN LOW
IF Credit History = Bad THEN MODERATE
IF Credit History = Good THEN MODERATE
IF Debt Level = High THEN HIGH
IF Credit History = Unknown THEN MODERATE

Limitations of AVL (attribute-value languages) (I)
Consider the MONK1 problem:
6 attributes:
  A1: 1, 2, 3
  A2: 1, 2, 3
  A3: 1, 2
  A4: 1, 2, 3
  A5: 1, 2, 3, 4
  A6: 1, 2
2 classes: 0, 1
Target concept: If (A1 = A2 or A5 = 1) then Class 1
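
For concreteness, the target concept can be written down and the instance space enumerated directly (an illustrative aside, not part of the slides):

```python
from itertools import product

def monk1_class(a1, a2, a3, a4, a5, a6):
    """MONK-1 target concept: class 1 iff A1 = A2 or A5 = 1."""
    return 1 if (a1 == a2 or a5 == 1) else 0

# Enumerate the full instance space: 3 * 3 * 2 * 3 * 4 * 2 = 432 instances.
space = product(range(1, 4), range(1, 4), range(1, 3),
                range(1, 4), range(1, 5), range(1, 3))
print(sum(monk1_class(*x) for x in space), "of 432 instances are in class 1")
```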

Limitations of AVL (II) Can you build a decision tree for this concept?

Limitations of AVL (III)
Can you build a rule set for this concept?
If A1=1 and A2=1 then Class=1
If A1=2 and A2=2 then Class=1
If A1=3 and A2=3 then Class=1
If A5=1 then Class=1
Otherwise Class=0

First-order Language
Supports first-order concepts, so relations between attributes are accounted for in a natural way.
For simplicity, restrict to Horn clauses.
A clause is any disjunction of literals whose variables are universally quantified.
Horn clauses contain a single non-negated literal:
  H ∨ ¬L1 ∨ … ∨ ¬Ln, equivalently written H ← L1 ∧ … ∧ Ln
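
One possible, purely illustrative Python representation of such a Horn clause keeps the single head literal and the list of body literals:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Literal:
    predicate: str            # e.g. "Father"
    args: tuple               # variable names, e.g. ("y", "z")
    negated: bool = False

@dataclass
class HornClause:
    head: Literal                               # the single non-negated literal
    body: list = field(default_factory=list)    # conjunction of preconditions

# GrandDaughter(x, y) <- Father(y, z), Father(z, x), Female(y)
rule = HornClause(
    head=Literal("GrandDaughter", ("x", "y")),
    body=[Literal("Father", ("y", "z")),
          Literal("Father", ("z", "x")),
          Literal("Female", ("y",))])
```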

FOIL (I)
Algorithm FOIL(Target_predicate, Predicates, Examples)
  Pos ← those Examples for which Target_predicate is true
  Neg ← those Examples for which Target_predicate is false
  Learned_rules ← ∅
  While Pos ≠ ∅ Do
    New_rule ← the rule that predicts Target_predicate with no precondition
    New_rule_neg ← Neg
    While New_rule_neg ≠ ∅ Do
      Candidate_literals ← GenCandidateLit(New_rule, Predicates)
      Best_literal ← argmax over L ∈ Candidate_literals of FoilGain(L, New_rule)
      Add Best_literal to New_rule’s preconditions
      New_rule_neg ← subset of New_rule_neg that satisfies New_rule’s preconditions
    Learned_rules ← Learned_rules + New_rule
    Pos ← Pos − {members of Pos covered by New_rule}
  Return Learned_rules
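
A Python sketch of FOIL's two nested loops. The helpers gen_candidate_literals, foil_gain, and covers, the (head, body) rule representation, and the (instance, label) example pairs are all assumptions made here for illustration, not part of the slides:

```python
def foil(target_predicate, predicates, examples,
         gen_candidate_literals, foil_gain, covers):
    """Sketch of FOIL: the outer loop adds rules until all positives are
    covered; the inner loop specializes one rule until it covers no negatives.
    Like the pseudocode, it assumes a consistent rule can always be found."""
    pos = [x for x, label in examples if label]
    neg = [x for x, label in examples if not label]
    learned_rules = []
    while pos:
        head, body = target_predicate, []      # most general rule: empty body
        new_rule_neg = list(neg)
        while new_rule_neg:
            candidates = gen_candidate_literals((head, body), predicates)
            # Greedily pick the literal with the highest FOIL gain.
            best = max(candidates, key=lambda L: foil_gain(L, (head, body)))
            body = body + [best]
            new_rule_neg = [x for x in new_rule_neg if covers((head, body), x)]
        learned_rules.append((head, body))
        pos = [x for x in pos if not covers((head, body), x)]
    return learned_rules
```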

FOIL (II)
Algorithm GenCandidateLit(Rule, Predicates)
  Let Rule be P(x1, …, xk) ← L1, …, Ln
  Return all literals of the form:
    Q(v1, …, vr), where Q is any predicate in Predicates and the vi’s are either new variables or variables already present in Rule, with the constraint that at least one of the vi’s must already exist as a variable in Rule
    Equal(xj, xk), where xj and xk are variables already present in Rule
    The negation of all of the above forms of literals

FOIL (III)
Algorithm FoilGain(L, Rule)
  Return t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
where
  p0 is the number of positive bindings of Rule
  n0 is the number of negative bindings of Rule
  p1 is the number of positive bindings of Rule+L
  n1 is the number of negative bindings of Rule+L
  t is the number of positive bindings of Rule that are still covered after adding L to Rule
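
A direct Python rendering of this gain, with the binding counts passed in explicitly; the call at the end uses the counts from the worked example a few slides below:

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """FoilGain = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where the p/n
    values count positive/negative bindings before and after adding literal L,
    and t counts positive bindings of Rule still covered after adding L."""
    if p1 == 0:
        return 0.0        # adding L leaves no positive bindings: no gain
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Counts for the 4th specialization Father(y, z) in the GrandDaughter example.
print(foil_gain(p0=1, n0=15, p1=1, n1=11, t=1))   # ~0.415
```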

Illustration (I)
Consider the data:
  GrandDaughter(Victor, Sharon)
  Father(Sharon, Bob)
  Father(Tom, Bob)
  Female(Sharon)
  Father(Bob, Victor)
Target concept: GrandDaughter(x, y)
Closed-world assumption

Illustration (II)
Training set:
Positive examples: GrandDaughter(Victor, Sharon)
Negative examples:
  GrandDaughter(Victor, Victor), GrandDaughter(Victor, Bob), GrandDaughter(Victor, Tom),
  GrandDaughter(Sharon, Victor), GrandDaughter(Sharon, Sharon), GrandDaughter(Sharon, Bob), GrandDaughter(Sharon, Tom),
  GrandDaughter(Bob, Victor), GrandDaughter(Bob, Sharon), GrandDaughter(Bob, Bob), GrandDaughter(Bob, Tom),
  GrandDaughter(Tom, Victor), GrandDaughter(Tom, Sharon), GrandDaughter(Tom, Bob), GrandDaughter(Tom, Tom)
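
Under the closed-world assumption, the negatives are simply all other ordered pairs of the four people; a quick illustrative check:

```python
from itertools import product

people = ["Victor", "Sharon", "Bob", "Tom"]
positives = {("Victor", "Sharon")}
# Closed-world assumption: every unasserted GrandDaughter(x, y) fact is negative.
negatives = [pair for pair in product(people, repeat=2) if pair not in positives]
print(len(negatives))   # 15 negative examples, as listed above
```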

Illustration (III)
Most general rule: GrandDaughter(x, y) ←
Specializations (z is a new variable):
  Father(x, y), Father(x, z), Father(y, x), Father(y, z), Father(z, y), Father(z, x)
  Female(x), Female(y)
  Equal(x, y)
  Negations of each of the above

Illustration (IV)
Consider the 1st specialization: GrandDaughter(x, y) ← Father(x, y)
16 possible bindings: x/Victor, y/Victor; x/Victor, y/Sharon; …; x/Tom, y/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 0
  n1 = 16
  t = 0
So FoilGain(1st specialization) = 0

Illustration (V)
Consider the 4th specialization: GrandDaughter(x, y) ← Father(y, z)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 11: (x/Victor, y/Bob, z/Victor), (x/Victor, y/Tom, z/Bob), (x/Sharon, y/Bob, z/Victor), (x/Sharon, y/Tom, z/Bob), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob), (x/Tom, y/Sharon, z/Bob), (x/Tom, y/Bob, z/Victor), (x/Sharon, y/Sharon, z/Bob), (x/Bob, y/Bob, z/Victor), (x/Tom, y/Tom, z/Bob)
  t = 1
So FoilGain(4th specialization) = 0.415
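
Plugging these counts into the FoilGain formula confirms the value (a quick check):

```python
import math
# 4th specialization: t = 1, p1 = 1, n1 = 11, p0 = 1, n0 = 15
gain = 1 * (math.log2(1 / (1 + 11)) - math.log2(1 / (1 + 15)))
print(round(gain, 3))   # 0.415
```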

Illustration (VI)
Assume the 4th specialization is indeed selected.
Partial rule: GrandDaughter(x, y) ← Father(y, z)
Still covers 11 negative examples.
New set of candidate literals:
  All of the previous ones
  Female(z), Equal(x, z), Equal(y, z), Father(z, w), Father(w, z)
  Negations of each of the above

Illustration (VII)
Consider the specialization GrandDaughter(x, y) ← Father(y, z), Equal(x, z)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 0
  n1 = 3: (x/Victor, y/Bob, z/Victor), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob)
  t = 0
So FoilGain(specialization) = 0

Illustration (VIII)
Consider the specialization GrandDaughter(x, y) ← Father(y, z), Father(z, x)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 1: (x/Victor, y/Tom, z/Bob)
  t = 1
So FoilGain(specialization) = 2.585
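
The same check for this specialization:

```python
import math
# Specialization Father(y, z), Father(z, x): t = 1, p1 = 1, n1 = 1, p0 = 1, n0 = 11
gain = 1 * (math.log2(1 / (1 + 1)) - math.log2(1 / (1 + 11)))
print(round(gain, 3))   # 2.585
```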

Illustration (IX)
Assume that specialization is indeed selected.
Partial rule: GrandDaughter(x, y) ← Father(y, z), Father(z, x)
Still covers 1 negative example.
No new set of candidate literals; use all of the previous ones.

Illustration (X)
Consider the specialization GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 1
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 0
  t = 1
So FoilGain(specialization) = 1

Illustration (XI)
No negative examples are covered and all positive examples are covered.
So we get the final, correct rule:
GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)