1 Learning an Approximation to Inductive Logic Programming Clause Evaluation
Frank DiMaio and Jude Shavlik, Computer Sciences Department, University of Wisconsin - Madison, USA. Inductive Logic Programming, 8 September 2004

2 Motivation Given a bottom clause ⊥, |E| examples, and a maximum clause length c, ILP's runtime, assuming constant-time clause evaluation, is O(|⊥|^c |E|) for exhaustive search and O(|⊥| |E|) for greedy search

3 Motivation Evaluating a clause on one example is exponential in the number of variables (Dantsin et al., 2001) Many clause evaluations are needed in datasets with long bottom clauses, a long maximum clause length, or many examples Result: long running times

4 ILP Time Complexity Search algorithm improvements
Better heuristic functions and search strategies Random uniform sampling (Srinivasan, 2000) Stochastic search (Rückert & Kramer, 2003)

5 ILP Time Complexity Faster clause evaluations
Clause reordering & optimization (Blockeel et al., 2002; Santos Costa et al., 2003) Stochastic matching (Sebag et al., 2000) Sampling the training examples Evaluating a candidate clause is still O(|E|)

6 Outline Bottom clause and ILP search space
Learning a fast approximation to the clause evaluation function Using the clause evaluation function approximation to speed up ILP

7 Bottom Clause Given background knowledge as facts and relations in first-order logic (a stack of blocks A, B, C in example ex2):
onTopOf(blockB,blockA,ex2). onTopOf(blockC,blockB,ex2).
above(A,B,C) :- onTopOf(A,B,C). above(A,B,C) :- onTopOf(A,Z,C), above(Z,B,C).
Generate the example's bottom clause (⊥) by saturating that example (Muggleton, 1995). ⊥ is the complete set of all fully ground literals connected to the example.
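The saturation step can be sketched as a fixed point over shared constants. This toy version is a large simplification of Muggleton's procedure (it ignores mode declarations and variable-depth limits, and `saturate` is our name): it just collects every ground fact transitively connected to the example.

```python
def saturate(example_consts, facts):
    # facts: list of (predicate, argument-tuple) ground literals.
    # Repeatedly add any fact that shares a constant with something
    # already reachable from the example, until nothing changes.
    bottom, reachable = [], set(example_consts)
    changed = True
    while changed:
        changed = False
        for pred, args in facts:
            if (pred, args) not in bottom and reachable & set(args):
                bottom.append((pred, args))
                reachable |= set(args)
                changed = True
    return bottom
```

For the blocks example, saturating ex2 pulls in both onTopOf facts; a full implementation would also add the derivable above/3 literals.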

8 Bottom Clause onTopOf(blockB,blockA,ex2). onTopOf(blockC,blockB,ex2).
above(A,B,C) :- onTopOf(A,B,C). above(A,B,C) :- onTopOf(A,Z,C), above(Z,B,C).
positive(ex2) :- onTopOf(blockB,blockA,ex2), onTopOf(blockC,blockB,ex2), above(blockB,blockA,ex2), above(blockC,blockB,ex2), above(blockC,blockA,ex2).

9 Building Candidate Hypotheses
positive(E).
positive(E) :- onTopOf(A,B,E), above(B,C,E).
positive(ex2) :- onTopOf(blockB,blockA,ex2), onTopOf(blockC,blockB,ex2), above(blockB,blockA,ex2), above(blockC,blockB,ex2), ...

10 A Faster Clause Evaluation
Our idea: predict a clause's evaluation in O(1) time (i.e., independent of the number of examples) Use a multilayer feed-forward neural network to approximately score candidate clauses The NN inputs specify which bottom-clause literals are selected There is a unique input for every candidate clause in the search space

11 Neural Network Topology
Selected literals from ⊥: containsBlock(ex2,blockB), onTopOf(blockB,blockA), isRound(blockA), isRound(blockB). Candidate clause: positive(A) :- containsBlock(A,B), onTopOf(B,C), isRound(B), isRound(C).

12 Neural Network Topology
Input features, part 1: one 0/1 unit per literal in ⊥ (containsBlock(ex2,blockB), onTopOf(blockB,blockA), isRed(blockA), isRound(blockA), isRound(blockB), isBlue(blockB), ...). A unit is 1 iff its literal is selected for the candidate clause positive(A) :- containsBlock(A,B), onTopOf(B,C), isRound(B), isRound(C); here the four selected literals' units are 1, and the isRed and isBlue units are 0.

13 Neural Network Topology
Input features, part 2: one count unit per predicate symbol. For the same candidate clause: count(containsBlock) = 1, count(onTopOf) = 1, count(isRed) = 0, count(isRound) = 2.

14 Neural Network Topology
Input features, part 3: global clause statistics. For the same candidate clause: length = 5, number of variables = 3, number of shared variables = 3.
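The three feature groups of slides 12–14 can be assembled into one input vector. A minimal sketch (all function and parameter names are ours):

```python
def encode_clause(selected, bottom_literals, predicates, length, n_vars, n_shared):
    # One 0/1 entry per bottom-clause literal (slide 12).
    bits = [1 if lit in selected else 0 for lit in bottom_literals]
    # One count per predicate symbol (slide 13); the "pred(" prefix
    # check keeps e.g. isRed from matching isRedder.
    counts = [sum(1 for lit in selected if lit.startswith(pred + "("))
              for pred in predicates]
    # Global clause statistics (slide 14).
    return bits + counts + [length, n_vars, n_shared]
```

Encoding the slides' example clause this way yields a vector whose tail is [..., 1, 1, 0, 2, 5, 3, 3], matching the counts and statistics shown above.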

15 Neural Network Topology
The complete input vector (literal indicators; predicate counts count(containsBlock) = 1, count(onTopOf) = 1, count(isRed) = 0, count(isRound) = 2; length = 5; number of variables = 3; number of shared variables = 3) feeds a hidden layer of summing units, and the network produces two outputs: predicted positive cover and predicted negative cover.
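A minimal stand-in for such a network (pure Python; the single hidden layer, tanh activation, and layer sizes are our assumptions, not necessarily the topology used in the paper):

```python
import math
import random

random.seed(0)

def make_net(n_in, n_hidden=8, n_out=2):
    # Small random weights; in the paper's setting these would be
    # trained online as clauses are evaluated on real data.
    return {"W1": [[random.gauss(0, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)],
            "W2": [[random.gauss(0, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_out)]}

def predict(net, x):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in net["W1"]]
    # Two linear outputs: predicted positive cover, predicted negative cover.
    return [sum(w * h for w, h in zip(row, hidden)) for row in net["W2"]]
```

With the 13-element encoding of the example clause as input, `predict` returns a pair of (untrained, meaningless until fitted) cover estimates.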

16 Experiments Trained (clause → score) on benchmark datasets
Carcinogenesis, Mutagenesis, Protein Metabolism, Nuclear Smuggling. Clauses were generated by uniform random sampling. Clause evaluation metric: compression = (posCovered − negCovered − length + 1) / totalPositives. 10-fold cross-validation learning curves.
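The compression metric written out as code (argument names are ours):

```python
def compression(pos_covered, neg_covered, length, total_positives):
    # compression = (posCovered - negCovered - length + 1) / totalPositives
    return (pos_covered - neg_covered - length + 1) / total_positives
```

For example, a clause of length 3 covering 30 positives and 5 negatives, in a dataset of 100 positives, scores (30 − 5 − 3 + 1) / 100 = 0.23.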

17 Results

18 Why not just use a fraction of examples?
We compare the squared error of two estimators: scoring clauses with the trained network vs. scoring them on a subset of the examples.

19 Learning vs. Sampling

20 Using the Trained Network
1. Rapidly explore the search space 2. Explore the network-defined surface 3. Extract concepts from the trained network

21 Online Training Algorithm
Begin with an initial burn-in training phase.
When a new clause is evaluated on actual data, yielding I/O pair <C, [P,N]>:
insert <C, [P,N]> into recent_cache
if C is one of the top 100 clauses seen so far, insert <C, [P,N]> sorted into best_cache
At regular intervals:
train the net on recent_cache for a fixed number of epochs
train the net on best_cache for a fixed number of epochs
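The cache bookkeeping above can be sketched as follows (class, method, and cap names are ours, and we omit the actual network training on both caches):

```python
import heapq

class OnlineTrainer:
    def __init__(self, best_cap=100, recent_cap=500):
        self.best_cache = []    # min-heap of (score, clause, (P, N)): top clauses so far
        self.recent_cache = []  # FIFO of recently evaluated clauses
        self.best_cap, self.recent_cap = best_cap, recent_cap

    def observe(self, clause, P, N):
        # Every real evaluation <C, [P, N]> enters the recent cache...
        self.recent_cache.append((clause, (P, N)))
        if len(self.recent_cache) > self.recent_cap:
            self.recent_cache.pop(0)
        # ...and the best cache keeps only the top-scoring clauses seen.
        item = (P - N, clause, (P, N))      # P - N is a stand-in clause score
        if len(self.best_cache) < self.best_cap:
            heapq.heappush(self.best_cache, item)
        elif item > self.best_cache[0]:
            heapq.heapreplace(self.best_cache, item)
```

Training the net on the recent cache tracks the region currently being searched, while the best cache keeps the network accurate on the high-scoring clauses that matter most.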

22 1. Rapidly explore search space
Use the network as an O(1) clause evaluation tool: whenever a clause evaluation is needed, approximate it with the network, and evaluate a network-approximated clause against the real data only before expanding it. The behavior depends on the underlying search: branch and bound optimizes the order of evaluation, while A* (Aleph's default) ignores non-promising clauses.
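One way this plays out in a best-first search (a sketch under our own naming; `expand`, `nn_score`, and `true_score` are caller-supplied):

```python
import heapq

def nn_guided_search(root, expand, nn_score, true_score, max_expansions=50):
    # Open list ordered by the O(1) network estimate (negated: heapq is a min-heap).
    open_list = [(-nn_score(root), root)]
    best, best_val = root, true_score(root)
    for _ in range(max_expansions):
        if not open_list:
            break
        _, clause = heapq.heappop(open_list)
        val = true_score(clause)          # exact, O(|E|) evaluation, only on expansion
        if val > best_val:
            best, best_val = clause, val
        for child in expand(clause):
            heapq.heappush(open_list, (-nn_score(child), child))
    return best, best_val
```

Only popped clauses ever touch the real data; everything else on the open list costs a single forward pass.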

23 1. Rapidly explore search space (slides 23–29 animate one search)
The root pos(A) refines to pos(A) :- f(A,B) and pos(A) :- g(A), and pos(A) :- f(A,B) refines further to pos(A) :- f(A,B), g(A) and pos(A) :- f(A,B), g(B). Each newly generated clause enters the open list with its network estimate: pos(A) :- f(A,B) gets 2.3, pos(A) :- g(A) gets 3.7. When a clause is popped for expansion it is first re-scored on the real data: pos(A) :- g(A) drops from 3.7 to 2, so pos(A) :- f(A,B) (re-scored at 4) is expanded instead, and its children pos(A) :- f(A,B), g(A) and pos(A) :- f(A,B), g(B) enter the open list with network estimates 5.7 and 1.6.

30 2. Explore network-defined surface
The trained network defines a function over the space of candidate clauses

31 2. Explore network-defined surface
Explore this surface using stochastic gradient ascent. Rapid random restarts (Zelezny et al., 2002) alternate random clause generation with short local search. Use the network-defined surface to make "intelligent" rapid random restarts (Boyan & Moore, 2000).
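A biased restart can be as simple as sampling several random clauses and letting the network pick the starting point (a sketch; `k` and all names are our choices):

```python
def biased_restart(random_clause, nn_score, k=25):
    # Draw k random clauses; restart local search from the one the
    # trained network rates highest (each nn_score call is O(1)).
    return max((random_clause() for _ in range(k)), key=nn_score)
```

Since each candidate costs only one forward pass, screening k restart points is far cheaper than a single real clause evaluation over all examples.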

32 Algorithm Illustration
Alternate between searching the network-defined surface and exploring the true clause-evaluation-function surface. (Diagram: the network approximation of the clause evaluation function alongside the true clause evaluation function, over the space of candidate clauses.)

33 3. Extract concepts from trained net
Extract a decision tree from the trained neural network (Craven & Shavlik, 1995). Predicate invention: interpret high-weight edges into a single hidden unit as an invented predicate, and add invented predicates to the background knowledge.

34 Biased-RRR Results

35 Future Work Implement and test the other uses (#1 and #3) of the trained neural network. Consider the relative ranking of network predictions rather than squared error; Rankprop is concerned with correctly predicting rankings (Caruana et al., 1997). How good is the approximation near the phase transition? (Botta et al., 2003)

36 Conclusion We can learn to accurately estimate the scores of candidate clauses
There are several potential uses for speeding up ILP. This helps scale ILP to ever-larger datasets (in number of examples and search-space size).

37 Acknowledgements NLM Grant 1T15 LM007359-01
US Air Force Grant F NLM Grant 1R01 LM

