Inductive Logic Programming
  Includes slides by Luis Tari
  CS774 L16 ILP

Logic Programming
  Consider the following example of a logic program:
    parent_of(charles,george).
    parent_of(george,diana).
    parent_of(bob,harry).
    parent_of(harry,elizabeth).
    grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).
  Query:
    grandparent_of(X,Y)?
  Answers:
    grandparent_of(charles,diana).
    grandparent_of(bob,elizabeth).
  From the program, we can ask queries about grandparents.

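To make the query concrete, here is a minimal Python sketch of how the grandparent rule can be evaluated as a join over the parent_of facts. It is not how a Prolog engine resolves queries; the set-of-pairs representation and the function name are assumptions made for illustration.

```python
# Facts from the slide, stored as (parent, child) pairs.
parent_of = {
    ("charles", "george"),
    ("george", "diana"),
    ("bob", "harry"),
    ("harry", "elizabeth"),
}

def grandparent_of(parents):
    """Derive every (X, Y) such that parent_of(X, Z) and parent_of(Z, Y) hold."""
    return {(x, y) for (x, z1) in parents for (z2, y) in parents if z1 == z2}

# All answers to the query grandparent_of(X,Y)?
answers = sorted(grandparent_of(parent_of))
print(answers)
```

The shared variable Z in the rule body becomes the equality test z1 == z2 that joins the two parent_of facts.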
(Machine) Learning
  "The process by which relatively permanent changes occur in behavioral potential as a result of experience." (Anderson)
  "Learning is constructing or modifying representations of what is being experienced." (Michalski)
  "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell)

Forms of Reasoning
  Deduction: from causes to effects (prediction)
    fact a, rule a => b: INFER b (*first-order logic*)
  Abduction: from effects to possible causes (explanation)
    rule a => b, observe b: AN EXPLANATION a
  Induction: from correlated observations to rules (learning)
    observe correlation between $a_1, b_1, \dots, a_n, b_n$: LEARN a -> b

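The three forms of reasoning can be contrasted with a small propositional sketch in Python. The propositions "rain" and "wet_grass" and all function names are hypothetical and do not come from the slides. The induction function is a deliberately naive check that b held whenever a held.

```python
def deduce(facts, rules):
    """Deduction: forward-chain over (premise, conclusion) rules until nothing new appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def abduce(observation, rules):
    """Abduction: collect premises that would explain the observation under some rule."""
    return {premise for premise, conclusion in rules if conclusion == observation}

def induce(observations):
    """Induction: propose a -> b if a occurred and b held every time a held."""
    b_values_when_a = [b for (a, b) in observations if a]
    return bool(b_values_when_a) and all(b_values_when_a)

rules = [("rain", "wet_grass")]                       # hypothetical rule a => b
deduced = deduce({"rain"}, rules)                     # fact a, rule a => b
explanations = abduce("wet_grass", rules)             # observe b, rule a => b
learned = induce([(True, True), (True, True), (False, False)])  # (a_i, b_i) pairs
```

Only deduction is sound here. Abduction returns candidate explanations, and induction proposes a rule that later observations may refute.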
What is ILP?
  Inductive Logic Programming (ILP): automated learning of logic rules from examples and background knowledge.
  E.g., learn the rule for grandparents, given background knowledge of parents and examples of grandparents.
  ILP can be used for classification and prediction.

Why ILP? - multiple relations
  Genealogy example:
  Given known relations...
    father(Old,Young) and mother(Old,Young)
    male(Somebody) and female(Somebody)
  ...learn new relations
    parent(X,Y) :- father(X,Y).
    parent(X,Y) :- mother(X,Y).
    brother(X,Y) :- male(X), father(Z,X), father(Z,Y).
  Most ML techniques (e.g., decision trees, neural networks, ...) cannot use more than one relation.

ILP - formal definitions
  Given
    a logic program $B$ representing background knowledge
    a set of positive examples $E^{+}$
    a set of negative examples $E^{-}$
  Find a hypothesis $H$ such that:
    1. $B \cup H \models e$ for every $e \in E^{+}$.
    2. $B \cup H \not\models f$ for every $f \in E^{-}$.
    3. $B \cup H$ is consistent.
  Assume that $B \not\models e$ for some $e \in E^{+}$.

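As a rough illustration, the conditions can be checked for the grandparent hypothesis in Python. Entailment is approximated by computing the facts derivable from the rule. The negative examples are hypothetical, since the slides list none, and the tuple encoding and names are assumptions.

```python
# Facts are tuples of the form (predicate, arg1, arg2).
B = {
    ("parent_of", "charles", "george"),
    ("parent_of", "george", "diana"),
    ("parent_of", "bob", "harry"),
    ("parent_of", "harry", "elizabeth"),
}
E_pos = {
    ("grandparent_of", "charles", "diana"),
    ("grandparent_of", "bob", "elizabeth"),
}
# Hypothetical negative examples; the slides do not provide any.
E_neg = {
    ("grandparent_of", "charles", "george"),
    ("grandparent_of", "diana", "charles"),
}

def consequences(facts):
    """Facts derivable from B plus H, where H is the grandparent rule."""
    parents = {(x, y) for (p, x, y) in facts if p == "parent_of"}
    derived = {
        ("grandparent_of", x, y)
        for (x, z1) in parents
        for (z2, y) in parents
        if z1 == z2
    }
    return facts | derived

model = consequences(B)
covers_all_positives = E_pos <= model          # every positive example is derivable
covers_no_negatives = model.isdisjoint(E_neg)  # no negative example is derivable
needs_hypothesis = not E_pos <= B              # B alone misses some positive example
```

The consistency requirement on B and H needs no separate check in this sketch, because a set of definite clauses (facts and rules without negation) always has a model.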
ILP - logical foundation
  Prolog (Programming with Logic) is used to represent:
    Background knowledge (of the domain): facts
    Examples (of the relation to be learned): facts
    Theories (as a result of learning): rules
  Supports two forms of logical reasoning:
    Deduction
    Induction

Logical reasoning: deduction
  From rules to facts: $B \cup T \vdash E$
  B:
    mother(penelope,victoria).
    mother(penelope,arthur).
    father(christopher,victoria).
    father(christopher,arthur).
  T:
    parent(X,Y) :- father(X,Y).
    parent(X,Y) :- mother(X,Y).
  E:
    parent(penelope,victoria).
    parent(penelope,arthur).
    parent(christopher,victoria).
    parent(christopher,arthur).

Logical reasoning: induction
  From facts to rules: $B \cup E \vdash T$
  B:
    mother(penelope,victoria).
    mother(penelope,arthur).
    father(christopher,victoria).
    father(christopher,arthur).
  T:
    parent(X,Y) :- father(X,Y).
    parent(X,Y) :- mother(X,Y).
  E:
    parent(penelope,victoria).
    parent(penelope,arthur).
    parent(christopher,victoria).
    parent(christopher,arthur).

Example
  Background knowledge B:
    parent_of(charles,george).
    parent_of(george,diana).
    parent_of(bob,harry).
    parent_of(harry,elizabeth).
  Positive examples $E^{+}$:
    grandparent_of(charles,diana).
    grandparent_of(bob,elizabeth).
  Generate hypothesis H:
    grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).

Example: Same Generation

Why ILP? - structured data
  Seed example of East-West trains (Michalski)
  What makes a train go eastward?

Why ILP? - multiple relations
  This is related to structured data.

  has_car
    Train  Car
    t1     c11
    t1     c12
    t1     c13
    t1     c14
    t2     c21
    …      …

  car_properties
    Car  Length  Shape      Axes  Roof    …
    c11  short   rectangle  2     none    …
    c12  long    rectangle  3     none    …
    c13  short   rectangle  2     peaked  …
    c14  long    rectangle  2     none    …
    c21  short   rectangle  2     flat    …
    …    …       …          …     …       …

Induction of a classifier: example
  Example of East-West trains
    B: relations has_car and car_properties (length, roof, shape, etc.), e.g., has_car(t1,c11)
    E: the trains t1 to t10
    C: east, west
  Possible T:
    east(T) :- has_car(T,C), length(C,short), roof(C,_).

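The candidate rule can be tried out against the rows from the has_car and car_properties tables on the previous slide. The Python sketch below reads roof(C,_) as "car C has some roof", i.e. a roof value other than none; that reading is an assumption, as are the dictionary layout and the extra train t3, which is hypothetical and added only to show a train the rule rejects.

```python
# Rows taken from the has_car and car_properties tables.
has_car = {
    "t1": ["c11", "c12", "c13", "c14"],
    "t2": ["c21"],
    "t3": ["c31"],  # hypothetical train, not in the slides
}
car_properties = {
    "c11": {"length": "short", "roof": "none"},
    "c12": {"length": "long", "roof": "none"},
    "c13": {"length": "short", "roof": "peaked"},
    "c14": {"length": "long", "roof": "none"},
    "c21": {"length": "short", "roof": "flat"},
    "c31": {"length": "long", "roof": "flat"},  # hypothetical car
}

def east(train):
    """east(T) :- has_car(T,C), length(C,short), roof(C,_).

    Assumption: roof(C,_) succeeds only when the car's roof is not 'none'.
    """
    return any(
        car_properties[car]["length"] == "short"
        and car_properties[car]["roof"] != "none"
        for car in has_car[train]
    )

predictions = {train: east(train) for train in has_car}
```

The rule has to join two relations through the shared variable C, which is the multi-relational step that single-table learners cannot express directly.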
ILP systems
  Two of the most popular ILP systems: Progol and FOIL
  Progol [Muggleton95]
    Developed by S. Muggleton et al.
    Learns first-order Horn clauses (no negation in head and body literals of hypotheses)
  FOIL [Quinlan93]
    Developed by J. Quinlan et al.
    Learns first-order rules (no negation in head literals of the hypotheses)

Rule Learning (Intuition)
  How to come up with a rule for grandparent_of(X,Y)?
  1. Take the example grandparent_of(bob,elizabeth).
  2. Find the subset of background knowledge relevant to this example:
       parent_of(bob,harry), parent_of(harry,elizabeth).
  3. Form a rule from these facts:
       grandparent_of(bob,elizabeth) :- parent_of(bob,harry), parent_of(harry,elizabeth).
  4. Generalize the rule:
       grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).
  5. Check whether this rule is valid w.r.t. the positive and the negative examples.

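The five steps can be sketched in Python as follows. This is a simplification, not the procedure of any particular ILP system. "Relevant" is taken to mean "mentions a constant of the example", generalization replaces every constant by a variable, and the negative example is hypothetical. All function names are assumptions.

```python
B = [
    ("parent_of", "charles", "george"),
    ("parent_of", "george", "diana"),
    ("parent_of", "bob", "harry"),
    ("parent_of", "harry", "elizabeth"),
]
E_pos = [("grandparent_of", "charles", "diana"), ("grandparent_of", "bob", "elizabeth")]
E_neg = [("grandparent_of", "charles", "george")]  # hypothetical negative example

def relevant_facts(example, facts):
    """Step 2: keep the facts that mention a constant of the example."""
    constants = set(example[1:])
    return [f for f in facts if constants & set(f[1:])]

def variabilize(head, body):
    """Steps 3-4: form the ground rule, then replace each constant by a variable."""
    names = iter("XYZUVW")
    mapping = {}
    def lift(literal):
        args = []
        for constant in literal[1:]:
            if constant not in mapping:
                mapping[constant] = next(names)
            args.append(mapping[constant])
        return (literal[0], *args)
    return lift(head), [lift(literal) for literal in body]

def match(literal, fact, theta):
    """Extend substitution theta so that literal equals fact, or return None."""
    if literal[0] != fact[0] or len(literal) != len(fact):
        return None
    theta = dict(theta)
    for term, value in zip(literal[1:], fact[1:]):
        if term[0].isupper():                  # variables start with an uppercase letter
            if theta.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return theta

def solve(body, facts, theta):
    """Check by backtracking whether all body literals can be satisfied."""
    if not body:
        return True
    for fact in facts:
        extended = match(body[0], fact, theta)
        if extended is not None and solve(body[1:], facts, extended):
            return True
    return False

def covers(rule, example, facts):
    head, body = rule
    theta = match(head, example, {})
    return theta is not None and solve(body, facts, theta)

seed = E_pos[1]                                           # step 1
rule = variabilize(seed, relevant_facts(seed, B))         # steps 2-4
valid = (all(covers(rule, e, B) for e in E_pos)
         and not any(covers(rule, e, B) for e in E_neg))  # step 5
```

Replacing every constant by a variable happens to produce the intended rule here. In general it can over-generalize, which is why the check in step 5 is needed.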
Top-down induction of logic programs
  Employs refinement operators
  Typical refinement operators on a clause:
    Apply a substitution to the clause
    Add a literal to the body of the clause
  Refinement graph:
    Nodes correspond to clauses
    Arcs correspond to refinements

Part of refinement graph
  Root:
    has_a_daughter(X).
  First level:
    has_a_daughter(X) :- male(Y).
    has_a_daughter(X) :- female(Y).
    has_a_daughter(X) :- parent(Y,Z).
  Second level:
    has_a_daughter(X) :- male(X).
    has_a_daughter(X) :- female(Y), parent(S,T).
    has_a_daughter(X) :- parent(X,Z).
  Third level:
    has_a_daughter(X) :- parent(X,Z), female(U).
  Fourth level:
    has_a_daughter(X) :- parent(X,Z), female(Z).

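The two refinement operators can be sketched in Python, together with one path through the has_a_daughter graph that ends at parent(X,Z), female(Z). The clause encoding (a head tuple plus a tuple of body literals) and the function names are assumptions. A real system would also bound and prune the space of refinements.

```python
from itertools import combinations

def add_literal(clause, literal):
    """Refinement: add one literal to the body of the clause."""
    head, body = clause
    return (head, body + (literal,))

def substitute(clause, theta):
    """Refinement: apply a substitution (variable -> term) to the whole clause."""
    head, body = clause
    def apply(literal):
        return (literal[0], *[theta.get(arg, arg) for arg in literal[1:]])
    return (apply(head), tuple(apply(literal) for literal in body))

def variables(clause):
    """Variables of the clause in order of first appearance."""
    head, body = clause
    seen = []
    for literal in (head, *body):
        for arg in literal[1:]:
            if arg[0].isupper() and arg not in seen:
                seen.append(arg)
    return seen

def refinements(clause, candidate_literals):
    """One-step children of a node in the refinement graph."""
    for literal in candidate_literals:
        if literal not in clause[1]:
            yield add_literal(clause, literal)
    for first, second in combinations(variables(clause), 2):
        yield substitute(clause, {second: first})   # unify two variables

# One path from the most general clause down to the target clause.
c0 = (("has_a_daughter", "X"), ())
c1 = add_literal(c0, ("parent", "Y", "Z"))   # has_a_daughter(X) :- parent(Y,Z).
c2 = substitute(c1, {"Y": "X"})              # has_a_daughter(X) :- parent(X,Z).
c3 = add_literal(c2, ("female", "U"))        # ... :- parent(X,Z), female(U).
c4 = substitute(c3, {"U": "Z"})              # ... :- parent(X,Z), female(Z).
```

Each call corresponds to one arc of the graph, so a top-down learner can be viewed as searching this graph from the root towards more specific clauses.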
Progol Algorithm Outline
  1. From a subset of positive examples, construct the most specific rule $r_s$.
  2. Based on $r_s$, find a generalized form $r_g$ of $r_s$ such that score($r_g$) has the highest value among all candidates.
  3. Remove all positive examples that are covered by $r_g$.
  4. Go to step 1 if there are still positive examples that are not yet covered.

Scoring hypotheses
  score(r) measures how well a rule r explains all the examples, with preference given to shorter rules.
    $p_r$ = number of positive examples correctly deducible from $r$
    $n_r$ = number of negative examples incorrectly deducible from $r$
    $c_r$ = number of body literals in rule $r$
    $\mathrm{score}(r) = p_r - (n_r + c_r)$

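The covering loop and the scoring function can be sketched together in Python, using the mother/father facts from the deduction and induction slides. This is not Progol's actual search. The candidate space is a hypothetical set of four one-literal rules, the most specific rule is approximated by keeping only candidates that cover the seed example, and the negative examples are invented for illustration.

```python
B = {
    ("mother", "penelope", "victoria"), ("mother", "penelope", "arthur"),
    ("father", "christopher", "victoria"), ("father", "christopher", "arthur"),
}
# Examples of parent(X,Y), written as (X, Y) pairs.
E_pos = {("penelope", "victoria"), ("penelope", "arthur"),
         ("christopher", "victoria"), ("christopher", "arthur")}
E_neg = {("victoria", "penelope"), ("arthur", "christopher")}  # hypothetical

def candidates():
    """Hypothetical rule space: parent(X,Y) :- p(X,Y) or parent(X,Y) :- p(Y,X)."""
    for pred in ("mother", "father"):
        yield (pred, False)   # body literal p(X,Y)
        yield (pred, True)    # body literal p(Y,X)

def covered(rule, pairs):
    """Subset of the given example pairs deducible from the rule and B."""
    pred, swapped = rule
    return {(x, y) for (x, y) in pairs
            if (pred, *((y, x) if swapped else (x, y))) in B}

def score(rule, pos, neg):
    p = len(covered(rule, pos))   # positives deducible from the rule
    n = len(covered(rule, neg))   # negatives deducible from the rule
    c = 1                         # every candidate here has one body literal
    return p - (n + c)

def learn(pos, neg):
    theory, remaining = [], set(pos)
    while remaining:                                    # step 4: repeat while uncovered
        seed = min(remaining)                           # step 1: pick a seed example
        usable = [r for r in candidates() if covered(r, {seed})]
        if not usable:                                  # no candidate explains the seed
            remaining.discard(seed)
            continue
        best = max(usable, key=lambda r: score(r, remaining, neg))  # step 2
        theory.append(best)
        remaining -= covered(best, remaining)           # step 3: drop covered positives
    return theory

theory = learn(E_pos, E_neg)
```

On this data the loop ends with one rule for father and one for mother, which corresponds to the two parent clauses on the induction slide.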
Applications of ILP
  Constructing Biological Knowledge Bases by Extracting Information from Text Sources (M. Craven & J. Kumlien) [Craven99]
  The automatic discovery of structural principles describing protein fold space (A. Cootes, S.H. Muggleton, and M.J.E. Sternberg) [Cootes03]
  More from UT-ML group (Ray Mooney)

Example of relation from text
  We want to extract the following relation:
  Sample sentence from biomedical articles:

References
  [Quinlan93] J. R. Quinlan, R. M. Cameron-Jones. FOIL: A Midterm Report. Proceedings of Machine Learning: ECML-93.
  [Muggleton95] S. Muggleton. Inverse Entailment and Progol. New Generation Computing Journal, 13.
  [Craven99] M. Craven & J. Kumlien (1999). Constructing Biological Knowledge Bases by Extracting Information from Text Sources. ISMB 99.
  [Cootes03] A. Cootes, S.H. Muggleton, and M.J.E. Sternberg. The automatic discovery of structural principles describing protein fold space. Journal of Molecular Biology, 330(4).