1 Learning an Approximation to Inductive Logic Programming Clause Evaluation
Frank DiMaio and Jude Shavlik, Computer Sciences Department, University of Wisconsin - Madison, USA. Inductive Logic Programming, 8 September 2004

2 Motivation Given a bottom clause ⊥, |E| examples, and a maximum clause length c, ILP's runtime, assuming constant-time clause evaluation, is O(|⊥|^c |E|) for exhaustive search and O(|⊥| |E|) for greedy search

3 Motivation Evaluating a clause on one example is exponential in the number of variables (Dantsin et al., 2001) Many clause evaluations are needed in datasets with long bottom clauses, a long maximum clause length, or many examples Result: long running times

4 ILP Time Complexity Search algorithm improvements
Better heuristic functions and search strategies Random uniform sampling (Srinivasan, 2000) Stochastic search (Rückert & Kramer, 2003)

5 ILP Time Complexity Faster clause evaluations
Clause reordering & optimization (Blockeel et al., 2002; Santos Costa et al., 2003) Stochastic matching (Sebag et al., 2000) Sampling the training examples Evaluating a candidate clause is still O(|E|)

6 Outline Bottom clause and ILP search space
Learning a fast approximation to the clause evaluation function Using the clause evaluation function approximation to speed up ILP

7 Bottom Clause Given background knowledge as facts and relations in first-order logic (a stack of blocks A, B, C in example ex2):
onTopOf(blockB,blockA,ex2). onTopOf(blockC,blockB,ex2).
above(A,B,C) :- onTopOf(A,B,C). above(A,B,C) :- onTopOf(A,Z,C), above(Z,B,C).
Generate the example's bottom clause (⊥) by saturating that example (Muggleton, 1995). ⊥ is the complete set of all fully ground literals connected to the example.
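The saturation step can be sketched as a fixed point over shared constants. This toy version is a large simplification of Muggleton's procedure (it ignores mode declarations and variable-depth limits, and `saturate` is our name): it just collects every ground fact transitively connected to the example.

```python
def saturate(example_consts, facts):
    # facts: list of (predicate, argument-tuple) ground literals.
    # Repeatedly add any fact that shares a constant with something
    # already reachable from the example, until nothing changes.
    bottom, reachable = [], set(example_consts)
    changed = True
    while changed:
        changed = False
        for pred, args in facts:
            if (pred, args) not in bottom and reachable & set(args):
                bottom.append((pred, args))
                reachable |= set(args)
                changed = True
    return bottom
```

For the blocks example, saturating ex2 pulls in both onTopOf facts; a full implementation would also add the derivable above/3 literals.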

8 Bottom Clause onTopOf(blockB,blockA,ex2). onTopOf(blockC,blockB,ex2).
above(A,B,C) :- onTopOf(A,B,C). above(A,B,C) :- onTopOf(A,Z,C), above(Z,B,C).
positive(ex2) :- onTopOf(blockB,blockA,ex2), onTopOf(blockC,blockB,ex2), above(blockB,blockA,ex2), above(blockC,blockB,ex2), above(blockC,blockA,ex2).

9 Building Candidate Hypotheses
positive(E).
positive(E) :- onTopOf(A,B,E), above(B,C,E).
positive(ex2) :- onTopOf(blockB,blockA,ex2), onTopOf(blockC,blockB,ex2), above(blockB,blockA,ex2), above(blockC,blockB,ex2), ...

10 A Faster Clause Evaluation
Our idea: predict a clause's evaluation in O(1) time (i.e., independent of the number of examples) Use a multilayer feed-forward neural network to approximately score candidate clauses The NN inputs specify which bottom-clause literals are selected There is a unique input for every candidate clause in the search space

11 Neural Network Topology
Selected literals from ⊥: containsBlock(ex2,blockB), onTopOf(blockB,blockA), isRound(blockA), isRound(blockB). Candidate clause: positive(A) :- containsBlock(A,B), onTopOf(B,C), isRound(B), isRound(C).

12 Neural Network Topology
Input features, part 1: one 0/1 unit per literal in ⊥ (containsBlock(ex2,blockB), onTopOf(blockB,blockA), isRed(blockA), isRound(blockA), isRound(blockB), isBlue(blockB), ...). A unit is 1 iff its literal is selected for the candidate clause positive(A) :- containsBlock(A,B), onTopOf(B,C), isRound(B), isRound(C); here the four selected literals' units are 1, and the isRed and isBlue units are 0.

13 Neural Network Topology
Input features, part 2: one count unit per predicate symbol. For the same candidate clause: count(containsBlock) = 1, count(onTopOf) = 1, count(isRed) = 0, count(isRound) = 2.

14 Neural Network Topology
Input features, part 3: global clause statistics. For the same candidate clause: length = 5, number of variables = 3, number of shared variables = 3.
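The three feature groups of slides 12–14 can be assembled into one input vector. A minimal sketch (all function and parameter names are ours):

```python
def encode_clause(selected, bottom_literals, predicates, length, n_vars, n_shared):
    # One 0/1 entry per bottom-clause literal (slide 12).
    bits = [1 if lit in selected else 0 for lit in bottom_literals]
    # One count per predicate symbol (slide 13); the "pred(" prefix
    # check keeps e.g. isRed from matching isRedder.
    counts = [sum(1 for lit in selected if lit.startswith(pred + "("))
              for pred in predicates]
    # Global clause statistics (slide 14).
    return bits + counts + [length, n_vars, n_shared]
```

Encoding the slides' example clause this way yields a vector whose tail is [..., 1, 1, 0, 2, 5, 3, 3], matching the counts and statistics shown above.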

15 Neural Network Topology
The complete input vector (literal indicators; predicate counts count(containsBlock) = 1, count(onTopOf) = 1, count(isRed) = 0, count(isRound) = 2; length = 5; number of variables = 3; number of shared variables = 3) feeds a hidden layer of summing units, and the network produces two outputs: predicted positive cover and predicted negative cover.
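A minimal stand-in for such a network (pure Python; the single hidden layer, tanh activation, and layer sizes are our assumptions, not necessarily the topology used in the paper):

```python
import math
import random

random.seed(0)

def make_net(n_in, n_hidden=8, n_out=2):
    # Small random weights; in the paper's setting these would be
    # trained online as clauses are evaluated on real data.
    return {"W1": [[random.gauss(0, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)],
            "W2": [[random.gauss(0, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_out)]}

def predict(net, x):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in net["W1"]]
    # Two linear outputs: predicted positive cover, predicted negative cover.
    return [sum(w * h for w, h in zip(row, hidden)) for row in net["W2"]]
```

With the 13-element encoding of the example clause as input, `predict` returns a pair of (untrained, meaningless until fitted) cover estimates.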

16 Experiments Trained (clause → score) on benchmark datasets
Carcinogenesis, Mutagenesis, Protein Metabolism, Nuclear Smuggling. Clauses were generated by uniform random sampling. Clause evaluation metric: compression = (posCovered − negCovered − length + 1) / totalPositives. 10-fold cross-validation learning curves.
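The compression metric written out as code (argument names are ours):

```python
def compression(pos_covered, neg_covered, length, total_positives):
    # compression = (posCovered - negCovered - length + 1) / totalPositives
    return (pos_covered - neg_covered - length + 1) / total_positives
```

For example, a clause of length 3 covering 30 positives and 5 negatives, in a dataset of 100 positives, scores (30 − 5 − 3 + 1) / 100 = 0.23.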

17 Results

18 Why not just use a fraction of examples?
We compare the squared error of two estimators: scoring clauses with the trained network vs. scoring them on a subset of the examples.

19 Learning vs. Sampling

20 Using the Trained Network
1. Rapidly explore the search space 2. Explore the network-defined surface 3. Extract concepts from the trained network

21 Online Training Algorithm
Begin with an initial burn-in training phase.
When a new clause is evaluated on actual data, yielding I/O pair <C, [P,N]>:
insert <C, [P,N]> into recent_cache
if C is one of the top 100 clauses seen so far, insert <C, [P,N]> sorted into best_cache
At regular intervals:
train the net on recent_cache for a fixed number of epochs
train the net on best_cache for a fixed number of epochs
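The cache bookkeeping above can be sketched as follows (class, method, and cap names are ours, and we omit the actual network training on both caches):

```python
import heapq

class OnlineTrainer:
    def __init__(self, best_cap=100, recent_cap=500):
        self.best_cache = []    # min-heap of (score, clause, (P, N)): top clauses so far
        self.recent_cache = []  # FIFO of recently evaluated clauses
        self.best_cap, self.recent_cap = best_cap, recent_cap

    def observe(self, clause, P, N):
        # Every real evaluation <C, [P, N]> enters the recent cache...
        self.recent_cache.append((clause, (P, N)))
        if len(self.recent_cache) > self.recent_cap:
            self.recent_cache.pop(0)
        # ...and the best cache keeps only the top-scoring clauses seen.
        item = (P - N, clause, (P, N))      # P - N is a stand-in clause score
        if len(self.best_cache) < self.best_cap:
            heapq.heappush(self.best_cache, item)
        elif item > self.best_cache[0]:
            heapq.heapreplace(self.best_cache, item)
```

Training the net on the recent cache tracks the region currently being searched, while the best cache keeps the network accurate on the high-scoring clauses that matter most.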

22 1. Rapidly explore search space
Use the network as an O(1) clause evaluation tool: whenever a clause evaluation is needed, approximate it with the network, and evaluate a network-approximated clause against the real data only before expanding it. The behavior depends on the underlying search: branch and bound optimizes the order of evaluation, while A* (Aleph's default) ignores non-promising clauses.
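One way this plays out in a best-first search (a sketch under our own naming; `expand`, `nn_score`, and `true_score` are caller-supplied):

```python
import heapq

def nn_guided_search(root, expand, nn_score, true_score, max_expansions=50):
    # Open list ordered by the O(1) network estimate (negated: heapq is a min-heap).
    open_list = [(-nn_score(root), root)]
    best, best_val = root, true_score(root)
    for _ in range(max_expansions):
        if not open_list:
            break
        _, clause = heapq.heappop(open_list)
        val = true_score(clause)          # exact, O(|E|) evaluation, only on expansion
        if val > best_val:
            best, best_val = clause, val
        for child in expand(clause):
            heapq.heappush(open_list, (-nn_score(child), child))
    return best, best_val
```

Only popped clauses ever touch the real data; everything else on the open list costs a single forward pass.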

23 1. Rapidly explore search space (slides 23–29 animate one search)
The root pos(A) refines to pos(A) :- f(A,B) and pos(A) :- g(A), and pos(A) :- f(A,B) refines further to pos(A) :- f(A,B), g(A) and pos(A) :- f(A,B), g(B). Each newly generated clause enters the open list with its network estimate: pos(A) :- f(A,B) gets 2.3, pos(A) :- g(A) gets 3.7. When a clause is popped for expansion it is first re-scored on the real data: pos(A) :- g(A) drops from 3.7 to 2, so pos(A) :- f(A,B) (re-scored at 4) is expanded instead, and its children pos(A) :- f(A,B), g(A) and pos(A) :- f(A,B), g(B) enter the open list with network estimates 5.7 and 1.6.

30 2. Explore network-defined surface
The trained network defines a function over the space of candidate clauses

31 2. Explore network-defined surface
Explore this surface using stochastic gradient ascent. Rapid random restarts (Zelezny et al., 2002) alternate random clause generation with short local search. Use the network-defined surface to make "intelligent" rapid random restarts (Boyan & Moore, 2000).
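A biased restart can be as simple as sampling several random clauses and letting the network pick the starting point (a sketch; `k` and all names are our choices):

```python
def biased_restart(random_clause, nn_score, k=25):
    # Draw k random clauses; restart local search from the one the
    # trained network rates highest (each nn_score call is O(1)).
    return max((random_clause() for _ in range(k)), key=nn_score)
```

Since each candidate costs only one forward pass, screening k restart points is far cheaper than a single real clause evaluation over all examples.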

32 Algorithm Illustration
Alternate between searching the network-defined surface and exploring the true clause-evaluation-function surface. (Diagram: the network approximation of the clause evaluation function alongside the true clause evaluation function, over the space of candidate clauses.)

33 3. Extract concepts from trained net
Extract a decision tree from the trained neural network (Craven & Shavlik, 1995). Predicate invention: interpret high-weight edges into a single hidden unit as an invented predicate, and add invented predicates to the background knowledge.

34 Biased-RRR Results

35 Future Work Implement and test the other uses (#1 and #3) of the trained neural network. Consider the relative ranking of network predictions rather than squared error; Rankprop is concerned with correctly predicting rankings (Caruana et al., 1997). How good is the approximation near the phase transition? (Botta et al., 2003)

36 Conclusion We can learn to accurately estimate the scores of candidate clauses
There are several potential uses for speeding up ILP. This helps scale ILP to ever-larger datasets (in number of examples and search-space size).

37 Acknowledgements NLM Grant 1T15 LM007359-01
US Air Force Grant F NLM Grant 1R01 LM

