General Ideas in Inductive Logic Programming
FOPI-RG - 19/4/05

Outline
ILP
Problem settings
Bottom-up and top-down learning
Major Problem Areas
–Search Space
–Positive Only
–Noise
Additional Points
Applications
Final Thoughts

Inductive Logic Programming
Given:
–a set of examples, E
–background knowledge, BK
Produce a set of relations (clauses) using BK that describe E.
ILP has two problem settings:
–nonmonotonic
–normal

ILP Settings - nonmonotonic
Given: Two sets, I+ and I-, of Herbrand interpretations (positive and negative examples).
Find: A theory which is true under each I in I+, and false under each I in I-.
Often only positive examples are used for this problem setting.

ILP Settings - nonmonotonic
Examples:
begin(e1) penguin(e1) bird(e1) feathers(e1) swims(e1) end(e1)
begin(e2) carp(e2) fish(e2) scales(e2) swims(e2) end(e2)
begin(e3) ostrich(e3) bird(e3) feathers(e3) runs(e3) end(e3)
begin(e4) eagle(e4) bird(e4) feathers(e4) flies(e4) end(e4)
Background knowledge: (e.g. Claudien - complicated declarative language bias (Dlab) definition)
Relations:
bird(X):- feathers(X).
swims(X):- fish(X); penguin(X).

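A minimal Python sketch of what "true under each interpretation" means for the example above. The representation is an assumption made for illustration: each interpretation is a set of predicate names, and a clause is a head plus a list of alternative bodies. The `flies_clause` candidate is hypothetical and is included only to show a clause that fails.

```python
# Each example is one Herbrand interpretation: the set of facts true for it.
# Predicates are written without their argument because every fact in an
# interpretation is about that same example.
interpretations = {
    "e1": {"penguin", "bird", "feathers", "swims"},
    "e2": {"carp", "fish", "scales", "swims"},
    "e3": {"ostrich", "bird", "feathers", "runs"},
    "e4": {"eagle", "bird", "feathers", "flies"},
}

# A clause is (head, bodies): the head must hold whenever any one body
# (a conjunction of predicates) holds. Several bodies model the ";" disjunction.
bird_clause = ("bird", [["feathers"]])
swims_clause = ("swims", [["fish"], ["penguin"]])
flies_clause = ("flies", [["bird"]])  # hypothetical candidate, not on the slide


def true_in(clause, interpretation):
    """A clause is true in an interpretation if its head holds or no body does."""
    head, bodies = clause
    body_holds = any(all(p in interpretation for p in body) for body in bodies)
    return head in interpretation or not body_holds


def valid(clause):
    """True if the clause is true under every interpretation."""
    return all(true_in(clause, i) for i in interpretations.values())


def violations(clause):
    """Names of the interpretations in which the clause is false."""
    return [name for name, i in interpretations.items() if not true_in(clause, i)]
```

Here `valid(bird_clause)` and `valid(swims_clause)` both hold, while `violations(flies_clause)` lists the penguin and ostrich examples, which are birds that do not fly.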
ILP Settings - normal
Given: background knowledge, BK, and evidence E = E+ ∪ E-
Find: A hypothesis H such that
BK ∧ H ⊨ E+
BK ∧ H ⊭ E-
BK can be:
–Extensional (ground facts)
–Intensional (more general predicate definitions)
–Something specialised (depending on the ILP engine)

ILP Settings - normal
Examples:
Positive: bird(penguin) bird(eagle) bird(crow) bird(ostrich)
Negative: bird(carp) bird(bat) bird(horse)
Background knowledge:
fish(X) :- has_scales(X), swims(X).
mammal(X):- warm_blooded(X), live_young(X).
swims(carp). swims(penguin).
flies(crow). flies(bat). flies(eagle).
lays_eggs(penguin). lays_eggs(crow). lays_eggs(eagle). lays_eggs(ostrich). lays_eggs(carp).
runs(horse). runs(ostrich).
Theory (one or more clauses):
bird(X):- lays_eggs(X), flies(X).
bird(penguin).
bird(X):- lays_eggs(X), runs(X).

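The example above can be checked mechanically. The following Python sketch, which is an illustration rather than how any ILP engine works internally, tests whether the theory together with the ground background facts covers every positive example and no negative one. The intensional fish and mammal rules are left out because the slide gives no facts for their body predicates.

```python
# Ground background facts from the slide, grouped by predicate.
bk = {
    "swims": {"carp", "penguin"},
    "flies": {"crow", "bat", "eagle"},
    "lays_eggs": {"penguin", "crow", "eagle", "ostrich", "carp"},
    "runs": {"horse", "ostrich"},
}
positives = ["penguin", "eagle", "crow", "ostrich"]
negatives = ["carp", "bat", "horse"]

# The theory for bird/1: one ground fact and two rule bodies.
bird_facts = {"penguin"}
bird_rules = [["lays_eggs", "flies"], ["lays_eggs", "runs"]]


def entails_bird(x):
    """True if BK plus the theory derives bird(x)."""
    if x in bird_facts:
        return True
    return any(all(x in bk[p] for p in body) for body in bird_rules)


# Completeness: every positive example is derived.
complete = all(entails_bird(x) for x in positives)
# Consistency: no negative example is derived.
consistent = not any(entails_bird(x) for x in negatives)
```

Both `complete` and `consistent` come out true for this theory. Dropping the fact bird(penguin) would break completeness, since the penguin neither flies nor runs.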
Bottom-up vs Top-down
[Diagram: hypotheses for bird/1 ordered from most general (top) to most specific (bottom) by theta subsumption etc.]
bird(X):- true. (coverset E)
bird(X):- flies(X). (everything that flies)
bird(X):- lays_eggs(X).
bird(X):- flies(X),…..lays_eggs(X). (everything that is very like an eagle)
bird(penguin). bird(eagle). bird(crow). ….. (e+, 1 bird)
false. (coverset {})

Bottom-Up Approach
Relative least general generalisation (rlgg). Used in GOLEM [Muggleton, 90].
rlgg of bird(crow) and bird(eagle):
bird(X):- lays_eggs(X), flies(X), has(X, feathers), has(X, beak), has(X, talons), makes_nest(X), eats(X,rodents).
rlgg of that clause and bird(ostrich):
bird(X):- lays_eggs(X), has(X, feathers), has(X, beak), has(X, talons), makes_nest(X), eats(X,Y), validate_food(X,Y).

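The generalisation step can be illustrated with a small Python sketch of Plotkin-style least general generalisation (anti-unification). This is a simplification and not GOLEM's rlgg: it generalises two lists of ground literals directly and ignores the background knowledge. The literal sets and the ostrich's food ("plants") are assumptions made for illustration.

```python
def lgg_term(s, t, table):
    """Anti-unify two terms: strings are constants, tuples are (functor, args...)."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        args = tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Differing subterms become a variable; the same pair always maps
    # to the same variable so that shared structure is preserved.
    if (s, t) not in table:
        table[(s, t)] = "V%d" % len(table)
    return table[(s, t)]


def lgg_clause(c1, c2):
    """Generalise every pair of literals with the same predicate and arity."""
    table = {}
    result = set()
    for l1 in c1:
        for l2 in c2:
            if l1[0] == l2[0] and len(l1) == len(l2):
                result.add(lgg_term(l1, l2, table))
    return result


crow = [("bird", "crow"), ("lays_eggs", "crow"), ("flies", "crow"),
        ("eats", "crow", "rodents")]
eagle = [("bird", "eagle"), ("lays_eggs", "eagle"), ("flies", "eagle"),
         ("eats", "eagle", "rodents")]
# Assumed facts for the ostrich: it does not fly and eats something else.
ostrich = [("bird", "ostrich"), ("lays_eggs", "ostrich"),
           ("eats", "ostrich", "plants")]

g1 = lgg_clause(crow, eagle)
g2 = lgg_clause(sorted(g1), ostrich)
```

`g1` keeps eats(V0, rodents) because both birds eat rodents. `g2` loses flies(V0) and generalises the food to a second variable, which mirrors the move from eats(X,rodents) to eats(X,Y) on the slide.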
Top-down Approach
bird(X):-.
bird(X):- lays_eggs(X).
bird(X):- flies(X).
bird(X):- lays_eggs(X), flies(X).
…
Some ILP engines use standard top-down search algorithms: depth-first, breadth-first, A*, etc.
We can improve efficiency by:
–setting a depth-bound (maximum clause length).
–paying attention to clause evaluation scores - coverage, MDL.
–re-ordering candidate clauses based on score
–pruning candidate clauses below a score threshold
etc.

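A minimal sketch of a top-down search, assuming the ground facts from the normal-setting example. It runs breadth-first from the empty body, adds one literal at a time, applies a depth bound, and prunes clauses that cover no positives. The scoring (number of positives covered, with no negatives allowed) is a simple stand-in for coverage or MDL scores, and the function names are hypothetical.

```python
from collections import deque

bk = {
    "swims": {"carp", "penguin"},
    "flies": {"crow", "bat", "eagle"},
    "lays_eggs": {"penguin", "crow", "eagle", "ostrich", "carp"},
    "runs": {"horse", "ostrich"},
}
positives = ["penguin", "eagle", "crow", "ostrich"]
negatives = ["carp", "bat", "horse"]


def covered(body, examples):
    """Examples x for which every literal p(x) in the body is a BK fact."""
    return [x for x in examples if all(x in bk[p] for p in body)]


def refinements(body):
    """Specialise by adding one predicate; alphabetical order avoids duplicates."""
    last = body[-1] if body else ""
    return [body + (p,) for p in sorted(bk) if p > last]


def top_down(max_len):
    """Return the consistent body covering the most positives, and that count."""
    best, best_pos = None, 0
    queue = deque([()])  # start from the most general clause, bird(X):- true.
    while queue:
        body = queue.popleft()
        pos = len(covered(body, positives))
        neg = len(covered(body, negatives))
        if pos == 0:
            continue  # prune: specialising further cannot cover more positives
        if neg == 0 and pos > best_pos:
            best, best_pos = body, pos
        if len(body) < max_len:  # depth bound on clause length
            queue.extend(refinements(body))
    return best, best_pos
```

With `max_len=2` the search returns the body (flies, lays_eggs), covering two positives and no negatives. With `max_len=1` no single literal excludes all negatives, so nothing is returned.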
Problem areas
Most commonly encountered:
–Exploring large search spaces
–Positive-only data sets
–Noisy data

Search Space
The hypothesis space is bounded by:
–Maximum clause length
–Size of background knowledge (BK)
Techniques to reduce background knowledge include:
Excluding redundant predicates
–Feature subset selection
–Inverse entailment
Replacing existing BK with compound predicates (feature construction).

Progol and Aleph’s Approach
Uses inverse entailment.
1. Randomly pick a positive example, p.
2. Define the space of possible clauses that could entail that example.
–Generate the bottom clause, which contains all the literals defined in BK that could cover p.
3. Search this space.
(Progol and Aleph were originally C-Progol and P-Progol, I’m unclear on the precise development history)

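The three steps can be sketched roughly as follows. This is a heavily simplified assumption about what a bottom clause looks like, not Progol's or Aleph's actual construction, which is driven by mode declarations and a variable depth bound. Here the bottom clause is just every ground BK fact that mentions the example's constant, with that constant replaced by a variable. The example is passed in explicitly instead of being picked at random so the result is repeatable.

```python
from itertools import combinations

# Ground facts from the normal-setting slide, as (predicate, argument) tuples.
bk_facts = [
    ("swims", "carp"), ("swims", "penguin"),
    ("flies", "crow"), ("flies", "bat"), ("flies", "eagle"),
    ("lays_eggs", "penguin"), ("lays_eggs", "crow"), ("lays_eggs", "eagle"),
    ("lays_eggs", "ostrich"), ("lays_eggs", "carp"),
    ("runs", "horse"), ("runs", "ostrich"),
]


def bottom_clause(example, facts):
    """Collect the facts about the example's constant and variabilise them."""
    pred, const = example
    body = []
    for fact in facts:
        if const in fact[1:]:
            args = tuple("X" if a == const else a for a in fact[1:])
            body.append((fact[0],) + args)
    return (pred, "X"), body


def candidate_bodies(body):
    """The search space: every subset of the bottom clause's body literals."""
    for n in range(len(body) + 1):
        for subset in combinations(body, n):
            yield subset


head, body = bottom_clause(("bird", "eagle"), bk_facts)
space = list(candidate_bodies(body))
```

For bird(eagle) the bottom clause body is flies(X), lays_eggs(X), so the search space has only four candidate bodies. This is the point of the approach: the example bounds the space that has to be searched.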
Positive-only Learning
Assume output completeness
–Every other pattern in the example space is negative
Bottom-up learning
Clustering
–e.g. Relational Distance Based Clustering

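As an illustration of the output completeness assumption, the sketch below treats every pattern in the example space that is not a known positive as a negative. The example space (bird/1 over the constants from the earlier bird example) and the function name are assumptions made for illustration.

```python
constants = ["penguin", "eagle", "crow", "ostrich", "carp", "bat", "horse"]
positives = {("bird", c) for c in ["penguin", "eagle", "crow", "ostrich"]}


def assumed_negatives(predicate, constants, positives):
    """Every pattern in the example space that is not a positive is negative."""
    space = {(predicate, c) for c in constants}
    return space - positives


negatives = assumed_negatives("bird", constants, positives)
```

Here the assumed negatives are bird(carp), bird(bat) and bird(horse), which matches the negative examples given for the normal setting.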
Positive-only Learning
Create artificial negative examples
–Using constraints to represent artificial negatives
–Use a relational distance measure
Use Bayes’ theorem
–Progol creates an SLP from the data set and uses this to create random artificial negatives

Noisy Data
Techniques to avoid over-fitting.
–Pre-pruning: limit length of clauses learned
–Post-pruning: generalise/merge clauses that have a small cover set.
–Leniency: don’t insist on a perfect theory
Embed the uncertainty into the learning mechanism
–Stochastic Logic Programs
–Fuzzy ILP

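Pre-pruning and leniency can both be expressed as a clause acceptance test. The sketch below is one possible formulation under assumed parameter names (`max_len`, `max_noise`); real engines expose their own settings for this.

```python
def acceptable(body_len, pos_covered, neg_covered, max_len=3, max_noise=0.25):
    """Accept a clause that is short enough and only slightly inconsistent."""
    if body_len > max_len:
        return False  # pre-pruning: clause is too long
    total = pos_covered + neg_covered
    if total == 0:
        return False  # a clause that covers nothing explains nothing
    # Leniency: tolerate a small fraction of covered negatives.
    return neg_covered / total <= max_noise
```

Using the coverage counts from the earlier bird example, bird(X):- lays_eggs(X) covers 4 positives and 1 negative, so `acceptable(1, 4, 1)` holds. bird(X):- flies(X) covers 2 positives and 1 negative and is rejected. Setting `max_noise=0.0` insists on a perfect theory again.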
Numerical Reasoning
A lot of ILP engines don’t handle numerical reasoning without help.
e.g. bird(X):- number_of_legs(X,Y), lessthan(Y, 3).
–[Karalič & Bratko, 97] First-Order Regression
–(if possible) add predicates to the background knowledge
–[Anthony & Frisch, 97] Farm it out to another process
–[Srinivasan & Camacho, 99] Lazy evaluation

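A rough sketch of the kind of help an engine needs: a separate routine that proposes the constant for a literal such as lessthan(Y, T) from the numeric values seen in the examples. This does not reproduce any of the cited methods. The leg counts and the function name are assumptions made for illustration.

```python
# Assumed values for number_of_legs/2; these are not given on the slides.
number_of_legs = {"eagle": 2, "crow": 2, "ostrich": 2, "horse": 4}
positives = ["eagle", "crow", "ostrich"]
negatives = ["horse"]


def best_threshold(values, positives, negatives):
    """Pick T for lessthan(Y, T) maximising positives minus negatives covered."""
    distinct = sorted(set(values.values()))
    # Candidate thresholds are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best, best_score = None, None
    for t in candidates:
        pos = sum(1 for x in positives if values[x] < t)
        neg = sum(1 for x in negatives if values[x] < t)
        score = pos - neg
        if best_score is None or score > best_score:
            best, best_score = t, score
    return best


threshold = best_threshold(number_of_legs, positives, negatives)
```

With these values the only candidate is the midpoint between 2 and 4, so the routine proposes 3, the constant used in the example clause.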
Inventing Predicates
Some ILP engines can invent new predicates and add them to the existing BK.
FOIL only uses extensional BK and so can’t use this method.
e.g. Progol uses constraints to call a predicate invention routine.
:- constraint(invent/2)?
invent(P,X):- {complicated code that includes asserts}.

Applications
Natural Language Processing
–Part of speech tagging
–Semantic parsing
–Learning Language in Logic workshops
Bioinformatics
–Predicting toxicology (carcinogens)
–Discovering rules governing the 3D topology of protein structures
It doesn’t work very well for User Modelling.

Final Thoughts
There are lots of different approaches to ILP.
Well known ILP systems:
–Progol (and Aleph)
–TILDE
–Claudien
–WARMR
–FOIL
–Merlin

Further Reading
–ILP Theory and Methods
–Inverse Entailment and Progol
–Analogical Prediction? – probably the other paper instead.
–FOIL
–TILDE
–Stochastic Logic Programs
–Lazy Evaluation
–Interactive Theory Revision (Luc’s book)? Beginning chapters
–Foundations of ILP, Nienhuys-Cheng and de Wolf
–Claudien and Dlab
–Learnability – pick a paper by Dzeroski