November 10, 1999. Machine Learning: Lecture 9: Rule Learning / Inductive Logic Programming.

Presentation transcript:

November 10, 1999: Machine Learning: Lecture 9: Rule Learning / Inductive Logic Programming

November 10, 1999: Learning Rules
§ One of the most expressive and human-readable representations for learned hypotheses is a set of production rules (if-then rules).
§ Rules can be derived from other representations (e.g., decision trees) or they can be learned directly. Here, we concentrate on the direct method.
§ An important property of direct rule-learning algorithms is that they can learn sets of first-order rules, which have much more representational power than the propositional rules that can be derived from decision trees. Learning first-order rules can also be seen as automatically inferring Prolog programs from examples.

November 10, 1999: Propositional versus First-Order Logic
§ Propositional logic does not include variables and thus cannot express general relations among the values of the attributes.
§ Example 1: In propositional logic, you can write:
  IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True.
  This rule applies only to one specific family!
§ Example 2: In first-order logic, you can write:
  IF Father(y, x) ∧ Female(y) THEN Daughter(x, y).
  This rule (which cannot be written in propositional logic) applies to any family!

November 10, 1999: Learning Propositional versus First-Order Rules
§ Both approaches to learning are useful, as they address different types of learning problems.
§ Like decision trees, feedforward neural networks, and IBL systems, propositional rule-learning systems are suited to problems in which no substantial relationship between the values of the different attributes needs to be represented.
§ In first-order learning problems, the hypotheses that must be represented involve relational assertions that can be conveniently expressed using first-order representations such as Horn clauses (H ← L1 ∧ … ∧ Ln).

November 10, 1999: Learning Propositional Rules: Sequential Covering Algorithms

Sequential-Covering(Target_attribute, Attributes, Examples, Threshold)
§ Learned_rules ← { }
§ Rule ← Learn-one-rule(Target_attribute, Attributes, Examples)
§ While Performance(Rule, Examples) > Threshold, do
  - Learned_rules ← Learned_rules + Rule
  - Examples ← Examples − {examples correctly classified by Rule}
  - Rule ← Learn-one-rule(Target_attribute, Attributes, Examples)
§ Learned_rules ← sort Learned_rules according to Performance over Examples
§ Return Learned_rules
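A minimal Python sketch of this loop (my own illustration, not part of the lecture): a rule is a (preconditions, predicted class) pair where the preconditions are attribute=value tests, Performance is taken to be the relative frequency of the predicted class among covered examples, and Learn-one-rule is passed in (for instance, the greedy search sketched after the next slide). The termination guard is an addition of mine, not in the slide's pseudocode.

def covers(conditions, example):
    """A rule's preconditions: a dict of attribute=value tests; an empty dict matches everything."""
    return all(example.get(attr) == val for attr, val in conditions.items())

def performance(rule, examples, target):
    """Relative frequency: fraction of the covered examples that have the rule's predicted class."""
    conditions, prediction = rule
    covered = [ex for ex in examples if covers(conditions, ex)]
    return sum(ex[target] == prediction for ex in covered) / len(covered) if covered else 0.0

def sequential_covering(target, attributes, examples, threshold, learn_one_rule):
    """Learn one rule, remove the examples it classifies correctly, and repeat
    while the newly learned rule still performs better than the threshold."""
    all_examples = list(examples)              # keep the full set for the final sort
    learned_rules = []
    rule = learn_one_rule(target, attributes, examples)
    while performance(rule, examples, target) > threshold:
        learned_rules.append(rule)
        conditions, prediction = rule
        remaining = [ex for ex in examples
                     if not (covers(conditions, ex) and ex[target] == prediction)]
        if len(remaining) == len(examples) or not remaining:
            break                              # guard (not in the slide): stop if nothing was removed
        examples = remaining
        rule = learn_one_rule(target, attributes, examples)
    # Sort so that the most accurate rules are considered first at classification time.
    learned_rules.sort(key=lambda r: performance(r, all_examples, target), reverse=True)
    return learned_rules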

November 10, 1999: Learning Propositional Rules: Sequential Covering Algorithms
§ The algorithm is called a sequential covering algorithm because it sequentially learns a set of rules that together cover the whole set of positive examples.
§ It has the advantage of reducing the problem of learning a disjunctive set of rules to a sequence of simpler problems, each requiring that a single conjunctive rule be learned.
§ The final set of rules is sorted so that the most accurate rules are considered first at classification time.
§ However, because it does not backtrack, this algorithm is not guaranteed to find the smallest or best set of rules; Learn-one-rule must therefore be very effective!

November 10, 1999: Learning Propositional Rules: Learn-one-rule

General-to-Specific Search:
§ Consider the most general rule (hypothesis), which matches every instance in the training set.
§ Repeat
  - Add the attribute test that most improves rule performance measured over the training set.
§ Until the hypothesis reaches an acceptable level of performance.

General-to-Specific Beam Search (CN2):
§ Rather than considering a single candidate at each search step, keep track of the k best candidates.
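A sketch of the greedy general-to-specific search, reusing the covers and performance helpers from the previous sketch; the stopping test and the restriction to one test per attribute are illustrative choices of mine. The CN2 beam-search variant would keep the k best condition sets at each step instead of a single one.

from collections import Counter

def learn_one_rule(target, attributes, examples):
    """Greedy general-to-specific search: start from the most general rule
    (no preconditions) and keep adding the single attribute=value test that
    most improves performance over the training examples."""
    assert examples, "learn_one_rule needs at least one training example"
    conditions = {}                                  # most general rule: covers everything
    while True:
        covered = [ex for ex in examples if covers(conditions, ex)]
        prediction = Counter(ex[target] for ex in covered).most_common(1)[0][0]
        current = performance((conditions, prediction), examples, target)
        best_gain, best_test = 0.0, None
        for attr in attributes:
            if attr in conditions:                   # each attribute is tested at most once
                continue
            for value in {ex[attr] for ex in examples}:
                candidate = {**conditions, attr: value}
                cov = [ex for ex in covered if covers(candidate, ex)]
                if not cov:
                    continue
                pred = Counter(ex[target] for ex in cov).most_common(1)[0][0]
                gain = performance((candidate, pred), examples, target) - current
                if gain > best_gain:
                    best_gain, best_test = gain, (attr, value)
        if best_test is None:                        # no test improves the rule: stop
            return conditions, prediction
        conditions[best_test[0]] = best_test[1]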

November 10, 1999: Comments and Variations regarding the Basic Rule Learning Algorithms
§ Sequential versus simultaneous covering: sequential covering algorithms (e.g., CN2) make a larger number of independent choices than simultaneous covering ones (e.g., ID3).
§ Direction of the search: CN2 uses a general-to-specific search strategy. Other systems (e.g., GOLEM) use a specific-to-general search strategy. General-to-specific search has the advantage of having a single maximally general hypothesis from which to start.
§ Generate-then-test versus example-driven: CN2 is a generate-then-test method. Other methods (e.g., AQ, CIGOL) are example-driven. Generate-then-test systems are more robust to noise.

November 10, 1999: Comments and Variations regarding the Basic Rule Learning Algorithms, Cont'd
§ Post-pruning: preconditions can be removed from a rule whenever this leads to improved performance over a set of pruning examples distinct from the training set.
§ Performance measure: different evaluation functions can be used, for example relative frequency (AQ), the m-estimate of accuracy (certain versions of CN2), and entropy (the original CN2).
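The three evaluation functions named above can be written down concretely. In the sketch below, n_correct and n_covered are counts for a single rule over the examples it covers; the particular prior and m values in the usage lines are only illustrative.

import math

def relative_frequency(n_correct, n_covered):
    """AQ-style estimate: n_c / n over the examples the rule covers."""
    return n_correct / n_covered

def m_estimate(n_correct, n_covered, prior, m):
    """m-estimate of accuracy (used by some CN2 versions): shrinks the
    relative frequency toward the class prior when the rule covers few examples."""
    return (n_correct + m * prior) / (n_covered + m)

def entropy(class_counts):
    """Entropy of the class distribution over the covered examples; the
    original CN2 prefers rules whose covered examples are pure (low entropy)."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Example: a rule covering 8 examples, 6 of the predicted class,
# with an overall prior of 0.5 for that class and m = 2.
print(relative_frequency(6, 8))           # 0.75
print(m_estimate(6, 8, prior=0.5, m=2))   # 0.7
print(entropy([6, 2]))                    # ≈ 0.811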

November 10, 1999: Learning Sets of First-Order Rules: FOIL (Quinlan, 1990)

FOIL is similar to the propositional rule-learning approach except for the following:
  - FOIL accommodates first-order rules and thus needs to accommodate variables in the rule preconditions.
  - FOIL uses a special performance measure (FOIL-Gain) which takes the different variable bindings into account.
  - FOIL seeks only rules that predict when the target literal is True (instead of predicting when it is True or when it is False).
  - FOIL performs a simple hill-climbing search rather than a beam search.
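FOIL-Gain can be stated compactly. Following Mitchell (Ch. 10): if rule R has p0 positive and n0 negative variable bindings, adding literal L leaves p1 positive and n1 negative bindings, and t of R's positive bindings are still covered, the gain is t * (log2(p1/(p1+n1)) − log2(p0/(p0+n0))). A small sketch (the example numbers are made up):

import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL's gain for adding literal L to rule R:
    t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))),
    i.e. the reduction in bits needed to encode the positive bindings that are kept."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: rule R has 10 positive and 10 negative bindings; after adding L
# it keeps 8 positive bindings (t = 8) and only 2 negative ones.
print(foil_gain(p0=10, n0=10, p1=8, n1=2, t=8))   # ≈ 8 * (log2(0.8) - log2(0.5)) ≈ 5.42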

November 10, 1999: Induction as Inverted Deduction
§ Let D be a set of training examples, each of the form <x_i, f(x_i)>. Then learning is the problem of discovering a hypothesis h such that the classification f(x_i) of each training instance x_i follows deductively from the hypothesis h, the description of x_i, and any other background knowledge B known to the system; that is, we want to find h such that (B ∧ h ∧ x_i) ⊢ f(x_i).
§ Example:
  x_i: Male(Bob), Female(Sharon), Father(Sharon, Bob)
  f(x_i): Child(Bob, Sharon)
  B: Parent(u, v) ← Father(u, v)
§ Two hypotheses that satisfy the constraint:
  h1: Child(u, v) ← Father(v, u)
  h2: Child(u, v) ← Parent(v, u)
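To make the constraint concrete, here is a toy forward-chaining check (my own illustration, not an ILP system) that h1 together with B and the description of x_i does entail f(x_i). The encoding is assumed for the example only: facts are tuples, rules are (head, body) pairs, and lower-case symbols are variables.

def substitute(atom, theta):
    """Apply a variable binding to an atom, e.g. ('Child', 'u', 'v') -> ('Child', 'Bob', 'Sharon')."""
    pred, *args = atom
    return (pred, *[theta.get(a, a) for a in args])

def match(pattern, fact, theta):
    """Extend binding theta so that the pattern equals the ground fact, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if p[0].islower():                     # lower-case symbol = variable
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:                           # constants must agree exactly
            return None
    return theta

def all_bindings(body, facts, theta):
    """Yield every binding of the body literals against the known facts."""
    if not body:
        yield theta
        return
    for fact in facts:
        t = match(substitute(body[0], theta), fact, theta)
        if t is not None:
            yield from all_bindings(body[1:], facts, t)

def forward_chain(facts, rules):
    """Apply the definite-clause rules until no new ground facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for theta in list(all_bindings(body, facts, {})):
                derived = substitute(head, theta)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts

# The slide's example: does (B ∧ h1 ∧ x_i) entail f(x_i) = Child(Bob, Sharon)?
x_i = {('Male', 'Bob'), ('Female', 'Sharon'), ('Father', 'Sharon', 'Bob')}
B   = [(('Parent', 'u', 'v'), [('Father', 'u', 'v')])]
h1  = [(('Child', 'u', 'v'), [('Father', 'v', 'u')])]
print(('Child', 'Bob', 'Sharon') in forward_chain(x_i, B + h1))   # True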

November 10, 1999: Induction as Inverted Deduction: Attractive Features
§ The formulation subsumes the common definition of learning as finding some general concept that matches a given set of examples (with no background knowledge B available).
§ Incorporating the notion of background information allows for a richer definition of when a hypothesis may be said to fit the data; it allows the introduction of domain-specific information.
§ Background information can help guide the search for h, rather than merely searching the space of syntactically legal hypotheses.

November 10, 1999: Induction as Inverted Deduction: Difficulties
§ The formal definition of learning does not naturally accommodate noisy data.
§ The language of first-order logic is so expressive that the search through the space of hypotheses is intractable in the general case. Recent work has sought restricted forms of first-order expressions to improve search tractability.
§ Despite the intuition that background knowledge should help constrain the search for a hypothesis, in most ILP systems the complexity of the hypothesis-space search increases as background knowledge increases. (Chapters 11 and 12 of Mitchell, however, discuss how background knowledge can decrease this complexity.)

November 10, 1999: An Example: CIGOL

Resolution Rule (deductive): from the parent clauses C1 and C2, derive the resolvent C.
  C1: PassExam ∨ ¬KnowMaterial
  C2: KnowMaterial ∨ ¬Study
  C:  PassExam ∨ ¬Study

Inverse Resolution Rule (inductive): from the resolvent C and one parent C1, reconstruct the other parent C2.
  C1: PassExam ∨ ¬KnowMaterial
  C:  PassExam ∨ ¬Study
  C2: KnowMaterial ∨ ¬Study

This idea can also be applied to first-order resolution!
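A small propositional sketch of the inverse resolution operator (an illustration, not CIGOL itself): given the resolvent C and one parent C1, it builds a candidate C2 as (C − (C1 − {L})) ∪ {¬L} for a literal L of C1 and checks that resolving C1 with the candidate gives back C. Clauses are sets of string literals, with '~' marking negation.

def resolve(c1, c2):
    """Propositional resolution: find a literal L in c1 whose negation is in c2
    and return (c1 - {L}) | (c2 - {~L})."""
    for lit in c1:
        neg = lit[1:] if lit.startswith('~') else '~' + lit
        if neg in c2:
            return (c1 - {lit}) | (c2 - {neg})
    return None

def inverse_resolve(c, c1):
    """One inverse resolution step: reconstruct the second parent c2 by picking
    the literal L of c1 that was resolved away: c2 = (c - (c1 - {L})) | {~L}."""
    for lit in c1:
        neg = lit[1:] if lit.startswith('~') else '~' + lit
        candidate = (c - (c1 - {lit})) | {neg}
        if resolve(c1, candidate) == c:        # keep only candidates that really invert the step
            return candidate
    return None

# The slide's example: C1 = PassExam ∨ ¬KnowMaterial, C = PassExam ∨ ¬Study.
c1 = {'PassExam', '~KnowMaterial'}
c  = {'PassExam', '~Study'}
print(inverse_resolve(c, c1))   # {'KnowMaterial', '~Study'}, i.e. KnowMaterial ∨ ¬Study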

November 10, 1999: First-Order, Multi-Step Inverse Resolution Example

Given the training example GrandChild(Bob, Shannon) and the background facts Father(Shannon, Tom) and Father(Tom, Bob), two inverse resolution steps run as follows:
  GrandChild(Bob, Shannon), inverse-resolved with Father(Shannon, Tom), yields
    GrandChild(Bob, x) ∨ ¬Father(x, Tom)
  which, inverse-resolved with Father(Tom, Bob), yields
    GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y)
  i.e., the rule GrandChild(y, x) ← Father(x, z) ∧ Father(z, y).