1 Speeding Up Relational Data Mining by Learning to Estimate Candidate Hypothesis Scores
Frank DiMaio and Jude Shavlik
UW-Madison Computer Sciences
ICDM Foundations and New Directions of Data Mining Workshop
19 November 2003

2 Rule-Based Learning
Goal: Induce a rule (or rules) that explains ALL positive examples and NO negative examples
[Figure: positive examples and negative examples]

3 Inductive Logic Programming (ILP)
Encode background knowledge in first-order logic as facts …
  containsBlock(ex1,block1A).
  containsBlock(ex1,block1B).
  isRed(block1A).
  isSquare(block1A).
  isBlue(block1B).
  isRound(block1B).
  onTopOf(block1B,block1A).
… and logical relations
  above(A,B) :- onTopOf(A,B).
  above(A,B) :- onTopOf(A,Z), above(Z,B).

4 Inductive Logic Programming (ILP)
Covering algorithm applied to explain all data
[Figure: a set of positive (+) and negative (-) examples]
  1. Choose some positive example
  2. Generate the best rule that covers this example
  3. Remove all examples covered by this rule
  4. Repeat until every positive example is covered
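The four steps of the covering loop can be sketched as follows. This is a minimal illustration, not the authors' system: examples are modeled as plain feature sets, and `toy_learn_rule` is a hypothetical stand-in for the ILP clause search that picks a single feature rather than a first-order clause.

```python
from typing import Callable, FrozenSet, List

Example = FrozenSet[str]  # assumption: a toy example is just a set of features


def toy_learn_rule(seed: Example, positives: List[Example],
                   negatives: List[Example]) -> str:
    """Hypothetical learner: pick the seed feature that covers the most
    remaining positives minus negatives (ties broken alphabetically)."""
    def score(feature: str) -> int:
        pos = sum(feature in e for e in positives)
        neg = sum(feature in e for e in negatives)
        return pos - neg
    return max(sorted(seed), key=score)


def covers(rule: str, example: Example) -> bool:
    """A toy rule (one feature) covers an example that has that feature."""
    return rule in example


def covering(positives: List[Example], negatives: List[Example],
             learn_rule: Callable = toy_learn_rule) -> List[str]:
    remaining = list(positives)
    rules = []
    while remaining:                                   # step 4: repeat
        seed = remaining[0]                            # step 1: pick a positive
        rule = learn_rule(seed, remaining, negatives)  # step 2: best rule for it
        rules.append(rule)
        # step 3: drop covered positives; the seed is always dropped,
        # so the loop is guaranteed to terminate
        remaining = [e for e in remaining[1:] if not covers(rule, e)]
    return rules


POSITIVES = [frozenset({"red", "square"}),
             frozenset({"red", "round"}),
             frozenset({"blue", "round"})]
NEGATIVES = [frozenset({"blue", "square"})]

if __name__ == "__main__":
    print(covering(POSITIVES, NEGATIVES))
```

In a real ILP system step 2 is the expensive part, since every candidate rule is scored against the examples; that cost is what the later slides aim to reduce.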

5 Inductive Logic Programming (ILP)
Saturate an example by writing everything true about it
The saturation of an example is the bottom clause (⊥)
[Figure: example ex2, with blocks labeled A, B, C]
  positive(ex2) :-
      containsBlock(ex2,block2A), containsBlock(ex2,block2B), containsBlock(ex2,block2C),
      isRed(block2A), isRound(block2A),
      isBlue(block2B), isRound(block2B),
      isBlue(block2C), isSquare(block2C),
      onTopOf(block2B,block2A), onTopOf(block2C,block2B),
      above(block2B,block2A), above(block2C,block2B), above(block2C,block2A).

6 Inductive Logic Programming (ILP)
Candidate clauses are generated by
  - choosing literals from ⊥
  - converting ground terms to variables
Search through the space of candidate clauses using a standard AI search algorithm
The bottom clause ensures the search space is finite
Selected literals from ⊥
  containsBlock(ex2,block2B)
  isRed(block2A)
  onTopOf(block2B,block2A)
Candidate clause
  positive(A) :- containsBlock(A,B), onTopOf(B,C), isRed(C).
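The step from selected literals to a candidate clause can be illustrated with a small sketch. It assumes literals are represented as tuples of a predicate name followed by ground terms, and that every distinct ground term is replaced by a fresh variable in order of first appearance. The function name `variablize` is hypothetical, and real ILP systems apply further constraints (for example, some terms may stay constant) that are not modeled here.

```python
from string import ascii_uppercase
from typing import Dict, Sequence, Tuple

Literal = Tuple[str, ...]  # (predicate, term1, term2, ...)


def variablize(head: Literal, body: Sequence[Literal]) -> str:
    """Replace each distinct ground term with a variable A, B, C, ...

    Sketch only: supports at most 26 distinct terms.
    """
    mapping: Dict[str, str] = {}

    def var(term: str) -> str:
        if term not in mapping:
            mapping[term] = ascii_uppercase[len(mapping)]
        return mapping[term]

    def fmt(literal: Literal) -> str:
        predicate, *terms = literal
        return f"{predicate}({','.join(var(t) for t in terms)})"

    head_text = fmt(head)  # format the head first so its term becomes A
    body_text = ", ".join(fmt(lit) for lit in body)
    return f"{head_text} :- {body_text}."


if __name__ == "__main__":
    selected = [("containsBlock", "ex2", "block2B"),
                ("onTopOf", "block2B", "block2A"),
                ("isRed", "block2A")]
    print(variablize(("positive", "ex2"), selected))
```

With the three literals selected on this slide, the sketch yields the same candidate clause shown above.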

7 ILP Time Complexity
Time complexity of ILP systems depends on
  - Size of bottom clause |⊥|
  - Maximum clause length c
  - Number of examples |E|
  - Search algorithm Π
O(|⊥|^c |E|) for exhaustive search
O(|⊥| |E|) for greedy search
Assumes constant-time clause evaluation!

8 Ideas in Speeding Up ILP
Search algorithm improvements
  - Better heuristic functions, search strategy
  - Srinivasan's (2000) random uniform sampling (consider O(1) candidate clauses)
Faster clause evaluations
  - Evaluation time of a clause (on 1 example) is exponential in the number of variables
  - Clause reordering & optimizing (Blockeel et al. 2002, Santos Costa et al. 2003)
Evaluation of a candidate is still O(|E|)

9 A Faster Clause Evaluation
Our idea: predict a clause's evaluation in O(1) time (i.e., independent of the number of examples)
Use a multilayer feed-forward neural network to approximately score candidate clauses
NN inputs specify which bottom clause literals are selected
There is a unique input vector for every candidate clause in the search space

10 Neural Network Topology
Selected literals from ⊥
  containsBlock(ex2,block2B)
  isRed(block2A)
  onTopOf(block2B,block2A)
Candidate clause
  positive(A) :- containsBlock(A,B), onTopOf(B,C), isRed(C).
[Figure: network with one input per bottom clause literal, a summation unit (Σ), and the predicted output]
  input = 1: containsBlock(ex2,block2B)
  input = 1: onTopOf(block2B,block2A)
  input = 1: isRed(block2A)
  input = 0: isRound(block2A)
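The input encoding on this slide can be sketched as a binary vector with one position per bottom clause literal. The literal list below is copied from the bottom clause on slide 5. The names `BOTTOM_CLAUSE` and `encode` are illustrative, and the network itself (weights, hidden layer size) is not specified in the slides, so it is left out.

```python
from typing import Iterable, List

# Body literals of the bottom clause for ex2, in the order given on slide 5.
BOTTOM_CLAUSE = [
    "containsBlock(ex2,block2A)", "containsBlock(ex2,block2B)",
    "containsBlock(ex2,block2C)",
    "isRed(block2A)", "isRound(block2A)",
    "isBlue(block2B)", "isRound(block2B)",
    "isBlue(block2C)", "isSquare(block2C)",
    "onTopOf(block2B,block2A)", "onTopOf(block2C,block2B)",
    "above(block2B,block2A)", "above(block2C,block2B)",
    "above(block2C,block2A)",
]


def encode(selected: Iterable[str],
           bottom: List[str] = BOTTOM_CLAUSE) -> List[int]:
    """Return one 0/1 input per bottom clause literal (1 = literal selected)."""
    chosen = set(selected)
    unknown = chosen - set(bottom)
    if unknown:
        raise ValueError(f"literals not in bottom clause: {sorted(unknown)}")
    return [1 if literal in chosen else 0 for literal in bottom]


if __name__ == "__main__":
    vector = encode(["containsBlock(ex2,block2B)",
                     "isRed(block2A)",
                     "onTopOf(block2B,block2A)"])
    print(vector)
```

Because the vector length is fixed by the bottom clause, the cost of producing a network input does not depend on the number of training examples.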

11 Speeding Up ILP
The trained neural network provides a tool for approximate evaluation in O(1) time
Given enough examples (large |E|), approximate evaluation is essentially free compared with evaluation on data
During ILP's search over the hypothesis space …
  - Approximately evaluate every candidate explored
  - Only evaluate a clause on data if it is "promising"
  - Adaptive sampling: use real evaluations to improve the approximation during search

12 When to Evaluate Approximated Clauses?
Treat the neural network-predicted score as a Gaussian distribution over the true score
Only evaluate a clause when there is sufficient likelihood that it is the best seen so far, e.g.
[Figure: Gaussians over clause scores for two potential moves from the current hypothesis, with the current best marked at Best = 22]
  Pred = 18.9: P(Best) = 0.24, evaluate
  Pred = 11.1: P(Best) = 0.03, don't evaluate
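The decision rule can be sketched as a Gaussian tail probability. The slides give neither the standard deviation used for the prediction error nor the probability cutoff, so `sigma` and `threshold` below are assumed parameters, and the sketch does not attempt to reproduce the probabilities shown in the figure.

```python
import math


def prob_beats_best(pred: float, best: float, sigma: float) -> float:
    """P(true score > best), with true score ~ Normal(pred, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (best - pred) / sigma
    # Upper tail of the standard normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


def should_evaluate(pred: float, best: float, sigma: float,
                    threshold: float = 0.1) -> bool:
    """Evaluate on real data only if the clause is likely enough to be best.

    Both sigma and threshold are assumptions made for this sketch.
    """
    return prob_beats_best(pred, best, sigma) >= threshold


if __name__ == "__main__":
    for pred in (18.9, 11.1):
        p = prob_beats_best(pred, best=22.0, sigma=4.5)
        print(pred, round(p, 3), should_evaluate(pred, 22.0, 4.5))
```

In practice `sigma` might be estimated from the network's error on clauses that were actually evaluated, which fits the adaptive sampling idea on the previous slide. That estimation step is an assumption and is not shown.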

13 Results
Trained learning only on benchmark datasets
  - Carcinogenesis
  - Mutagenesis
  - Protein Metabolism
  - Nuclear Smuggling
Clauses generated by random sampling
Clause evaluation metric
  compression = (posCovered - negCovered - length + 1) / totalPositives
10-fold c.v. learning curves
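The evaluation metric can be written as a small function, reading the slide's formula as a fraction with totalPositives in the denominator. The function and argument names are illustrative, and the example counts are made up to show the arithmetic.

```python
def compression(pos_covered: int, neg_covered: int,
                length: int, total_positives: int) -> float:
    """Compression score of a clause, normalized by the number of positives."""
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")
    return (pos_covered - neg_covered - length + 1) / total_positives


if __name__ == "__main__":
    # A 3-literal clause covering 10 of 20 positives and 2 negatives:
    # (10 - 2 - 3 + 1) / 20
    print(compression(10, 2, 3, 20))
```

Longer clauses and covered negatives both lower the score, so the metric favors short rules that cover many positives.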

14 Results

15 Future Work
Test in an ILP system
  - Potential for speedup in datasets with many examples
  - Will inaccuracy hurt search?
The trained network defines a function over the space of candidate clauses
[Figure: predicted score plotted over the space of clauses]
We can use this function to …
  - Extract concepts
  - Escape local maxima in heuristic search

16 Acknowledgements
Funding provided by
  - NLM grant 1T15 LM007359-01
  - NLM grant 1R01 LM07050-01
  - DARPA EELD grant F30602-01-2-0571

