Frank DiMaio and Jude Shavlik Computer Sciences Department


Learning an Approximation to Inductive Logic Programming Clause Evaluation Frank DiMaio and Jude Shavlik Computer Sciences Department University of Wisconsin - Madison USA Inductive Logic Programming 8 September 2004

Motivation Given bottom clause ⊥, |E| examples, and maximum clause length c, ILP's runtime assuming constant-time clause evaluation is O(|⊥|^c |E|) for exhaustive search and O(|⊥| c |E|) for greedy search.

Motivation Evaluation time of a clause on one example is exponential in the number of variables (Dantsin et al., 2001). Many clause evaluations occur in datasets with long bottom clauses, long maximum clause lengths, or many examples. Result: long running times.

ILP Time Complexity Search algorithm improvements: better heuristic functions and search strategies; random uniform sampling (Srinivasan, 2000); stochastic search (Rückert & Kramer, 2003).

ILP Time Complexity Faster clause evaluations: clause reordering & optimization (Blockeel et al., 2002; Santos Costa et al., 2003); stochastic matching (Sebag et al., 2000); sampling the training examples. Evaluation of a candidate is still O(|E|).

Outline Bottom clause and ILP search space Learning a fast approximation to the clause evaluation function Using the clause evaluation function approximation to speed up ILP

Bottom Clause Given background knowledge as facts and relations in first-order logic [figure: block stack — C on B on A] onTopOf(blockB,blockA,ex2). onTopOf(blockC,blockB,ex2). above(A,B,C) :- onTopOf(A,B,C). above(A,B,C) :- onTopOf(A,Z,C), above(Z,B,C). Generate an example's bottom clause (⊥) by saturating that example (Muggleton, 1995). ⊥ is the complete set of all fully ground literals connected to the example.
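Saturation can be sketched as a fixed-point computation: starting from the constants in the seed example, repeatedly pull in every ground background fact that mentions an already-reached constant. This is a minimal illustration only — all function and predicate names below are hypothetical, and a real saturator (e.g., in Progol or Aleph) also applies mode declarations and background rules.

```python
def saturate(seed_constants, facts):
    """Collect the bottom clause for one example: all ground literals
    transitively connected to the seed constants.

    facts: iterable of (predicate, args) tuples,
           e.g. ("onTopOf", ("blockB", "blockA", "ex2"))
    """
    reached = set(seed_constants)
    bottom = set()
    changed = True
    while changed:
        changed = False
        for pred, args in facts:
            literal = (pred, args)
            if literal not in bottom and any(a in reached for a in args):
                bottom.add(literal)
                reached.update(args)  # new constants may connect more facts
                changed = True
    return bottom

facts = [
    ("onTopOf", ("blockB", "blockA", "ex2")),
    ("onTopOf", ("blockC", "blockB", "ex2")),
    ("above",   ("blockC", "blockA", "ex2")),
    ("isRound", ("blockD",)),  # unconnected: stays out of the bottom clause
]
bottom = saturate({"ex2"}, facts)
```

The unconnected fact isRound(blockD) is excluded, matching the slide's definition of ⊥ as only the literals connected to the example.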

Bottom Clause onTopOf(blockB,blockA,ex2). onTopOf(blockC,blockB,ex2). above(A,B,C) :- onTopOf(A,B,C). above(A,B,C) :- onTopOf(A,Z,C), above(Z,B,C). positive(ex2) :- onTopOf(blockB,blockA,ex2), onTopOf(blockC,blockB,ex2), above(blockB,blockA,ex2), above(blockC,blockB,ex2), above(blockC,blockA,ex2).

Building Candidate Hypotheses positive(E). positive(E) :- onTopOf(A,B,E), above(B,C,E). positive(ex2) :- onTopOf(blockB,blockA,ex2), onTopOf(blockC,blockB,ex2), above(blockB,blockA,ex2), above(blockC,blockB,ex2), ...

A Faster Clause Evaluation Our idea: predict a clause's evaluation score in O(1) time (i.e., independent of the number of examples). Use a multilayer feed-forward neural network to approximately score candidate clauses. The NN inputs specify which bottom-clause literals are selected; there is a unique input vector for every candidate clause in the search space.

Neural Network Topology [figure: candidate clause positive(A) :- containsBlock(A,B), onTopOf(B,C), isRound(B), isRound(C). mapped to the selected literals from ⊥: containsBlock(ex2,blockB), onTopOf(blockB,blockA), isRound(blockA), isRound(blockB)]

Neural Network Topology [figure: binary inputs, one per literal of ⊥ — selected literals containsBlock(ex2,blockB), onTopOf(blockB,blockA), isRound(blockA), and isRound(blockB) get activation 1; unselected literals such as isRed(blockA) and isBlue(blockB) get 0]

Neural Network Topology [figure: predicate-count inputs — count(containsBlock) = 1, count(onTopOf) = 1, count(isRed) = 0, count(isRound) = 2]

Neural Network Topology [figure: clause-statistic inputs — length = 5, number of variables = 3, number of shared variables = 3]

Neural Network Topology [figure: full network — binary literal inputs (containsBlock(ex2,blockB) = 1, onTopOf(blockB,blockA) = 1, isRed(blockA) = 0, isRound(blockA) = 1, isBlue(blockB) = 0, …), predicate counts (count(containsBlock) = 1, count(onTopOf) = 1, count(isRed) = 0, count(isRound) = 2, …), and clause statistics (length = 5, number of variables = 3, number of shared variables = 3) feed through hidden units (Σ) to two outputs: Predicted Positive Cover and Predicted Negative Cover]
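The input encoding described on these slides — binary literal selections, per-predicate counts, and clause statistics — can be sketched as a feature-vector builder. This is an illustrative reconstruction, not the paper's code; the literal strings and the `encode_clause` helper are made up for the example.

```python
from collections import Counter

def encode_clause(bottom_literals, selected, clause_vars, shared_vars):
    """Build the network's input vector for one candidate clause.

    bottom_literals: ordered list of literals in the bottom clause
    selected:        the subset chosen for this candidate clause
    """
    # one binary input per bottom-clause literal
    binary = [1 if lit in selected else 0 for lit in bottom_literals]
    # one count input per predicate symbol appearing in the bottom clause
    counts = Counter(lit.split("(")[0] for lit in selected)
    predicates = sorted({lit.split("(")[0] for lit in bottom_literals})
    count_inputs = [counts[p] for p in predicates]
    # global clause statistics
    stats = [len(selected) + 1,   # clause length (body literals + head)
             len(clause_vars),    # number of distinct variables
             len(shared_vars)]    # variables appearing in >1 literal
    return binary + count_inputs + stats

bottom = ["containsBlock(ex2,blockB)", "onTopOf(blockB,blockA)",
          "isRed(blockA)", "isRound(blockA)", "isRound(blockB)"]
chosen = ["containsBlock(ex2,blockB)", "onTopOf(blockB,blockA)",
          "isRound(blockA)", "isRound(blockB)"]
vec = encode_clause(bottom, chosen, {"A", "B", "C"}, {"A", "B", "C"})
```

For the slide's running example this yields the binary block [1, 1, 0, 1, 1], the counts [1, 0, 2, 1] (predicates in sorted order), and the statistics [5, 3, 3] — one unique vector per candidate clause.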

Experiments Trained (clause → score) on benchmark datasets: Carcinogenesis, Mutagenesis, Protein Metabolism, Nuclear Smuggling. Clauses generated by uniform random sampling. Clause evaluation metric: compression = (posCovered − negCovered − length + 1) / totalPositives. 10-fold cross-validation learning curves.
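The compression metric used as the regression target is a one-liner; a small sketch with illustrative numbers:

```python
def compression(pos_covered, neg_covered, length, total_positives):
    """Clause-evaluation metric used as the regression target:
    compression = (posCovered - negCovered - length + 1) / totalPositives
    """
    return (pos_covered - neg_covered - length + 1) / total_positives

# e.g. a length-4 clause covering 40 of 100 positives and 5 negatives
score = compression(pos_covered=40, neg_covered=5, length=4,
                    total_positives=100)
```

Here score = (40 − 5 − 4 + 1) / 100 = 0.32.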

Results

Why not just use a fraction of examples? We compare the squared error of scores estimated by the trained network against scores estimated from a subset of the examples.
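The sampling baseline being compared against can be sketched as follows: score a clause on a random subset of examples and scale up. Everything here is a toy stand-in — `covers` is a fake coverage test, not real clause evaluation — meant only to show what "estimating scores using a subset of examples" means.

```python
import random

def covers(clause, example):
    """Toy stand-in for clause coverage: treat a clause as a divisor."""
    return example % clause == 0

def full_score(clause, examples):
    """Exact positive cover, computed over all examples: O(|E|)."""
    return sum(covers(clause, e) for e in examples)

def sampled_score(clause, examples, fraction, rng):
    """Estimate the cover from a random subset, scaled to full size."""
    subset = rng.sample(examples, int(len(examples) * fraction))
    return full_score(clause, subset) / fraction

rng = random.Random(0)
examples = list(range(1, 1001))
true = full_score(3, examples)                    # exact cover
estimate = sampled_score(3, examples, 0.1, rng)   # 10% sample, scaled up
squared_error = (true - estimate) ** 2
```

The sampled estimator is still O(|E|·fraction) per clause, whereas the trained network answers in O(1) regardless of |E| — the trade-off the squared-error comparison measures.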

Learning vs. Sampling

Using the Trained Network Rapidly explore search space Explore network-defined surface Extract concepts from trained network

Online Training Algorithm Begin with an initial burn-in training period. When a new clause is evaluated on actual data, yielding the I/O pair ⟨C, [P, N]⟩: insert ⟨C, [P, N]⟩ into recent_cache; if it is one of the top 100 clauses seen so far, insert ⟨C, [P, N]⟩ (sorted) into best_cache. At regular intervals: train the net on recent_cache for a fixed number of epochs, then train the net on best_cache for a fixed number of epochs.
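The two-cache scheme above can be sketched as a small class. This is an assumed reconstruction, not the paper's implementation: `_StubNet.train_epochs` stands in for actual backpropagation, and cache sizes are illustrative (the slide fixes best_cache at the top 100).

```python
import heapq

class OnlineTrainer:
    """Online training with a FIFO cache of recently evaluated clauses
    plus a bounded cache of the best clauses seen so far."""

    def __init__(self, net, recent_size=1000, best_size=100):
        self.net = net
        self.recent = []            # recent <C, (P, N)> pairs, FIFO
        self.best = []              # min-heap of (score, C, (P, N))
        self.recent_size = recent_size
        self.best_size = best_size

    def observe(self, clause, p, n, score):
        """Record a clause evaluated on actual data."""
        self.recent.append((clause, (p, n)))
        if len(self.recent) > self.recent_size:
            self.recent.pop(0)
        if len(self.best) < self.best_size:
            heapq.heappush(self.best, (score, clause, (p, n)))
        elif score > self.best[0][0]:   # beats the worst of the best cache
            heapq.heapreplace(self.best, (score, clause, (p, n)))

    def train_interval(self, epochs=5):
        """At a regular interval, train on both caches."""
        self.net.train_epochs(self.recent, epochs)
        self.net.train_epochs([(c, pn) for _, c, pn in self.best], epochs)

class _StubNet:
    """Stand-in for the real network; just records training calls."""
    def __init__(self):
        self.calls = 0
    def train_epochs(self, data, epochs):
        self.calls += 1

trainer = OnlineTrainer(_StubNet(), recent_size=3, best_size=2)
for i in range(5):
    trainer.observe(f"clause{i}", p=i, n=0, score=i)
trainer.train_interval()
```

After five observations, the FIFO cache holds the last three pairs and the best cache keeps only the two highest-scoring clauses.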

1. Rapidly explore search space O(1) clause evaluation tool: whenever a clause evaluation is needed, approximate it with the network. Before expanding a network-approximated clause, evaluate it against the real data. Behavior depends on the underlying search: branch and bound optimizes the order of evaluation; A* (Aleph's default) ignores non-promising clauses.

1. Rapidly explore search space [animated illustration: best-first search on the refinement graph rooted at pos(A). Its children pos(A) :- f(A,B). and pos(A) :- g(A). enter the open list with cheap network estimates (2.3NN and 3.7NN). The network ranks pos(A) :- g(A). highest, but evaluating it on real data yields only 2; pos(A) :- f(A,B). is evaluated next and scores 4, so it becomes the current node, and its refinements pos(A) :- f(A,B),g(A). and pos(A) :- f(A,B),g(B). enter the open list with network estimates 5.7NN and 1.6NN]
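The network-guided best-first loop shown in these frames can be sketched as follows: children are scored by the cheap network approximation, and a clause is only evaluated on the real (expensive) data when it is popped for expansion. The function arguments (`refinements`, `nn_score`, `true_score`) are illustrative stand-ins, and the toy run uses strings as clauses.

```python
import heapq

def guided_search(root, refinements, nn_score, true_score, max_expansions):
    """Best-first search over clauses, ordered by the network's estimate.
    Real-data evaluation (true_score) happens only on popped clauses."""
    open_list = [(-nn_score(root), root)]   # max-heap via negated scores
    best_clause, best_score = None, float("-inf")
    for _ in range(max_expansions):
        if not open_list:
            break
        _, clause = heapq.heappop(open_list)
        score = true_score(clause)          # expensive: evaluate on real data
        if score > best_score:
            best_clause, best_score = clause, score
        for child in refinements(clause):
            # cheap: rank children by the O(1) network approximation
            heapq.heappush(open_list, (-nn_score(child), child))
    return best_clause, best_score

# toy run: clauses are strings, refinement appends a literal "x"
best, score = guided_search(
    "", refinements=lambda c: [c + "x"] if len(c) < 3 else [],
    nn_score=len, true_score=len, max_expansions=10)
```

Per the slide, the network may misrank clauses (3.7NN vs. a true score of 2), which is why each popped clause is re-checked against the real data before its refinements are generated.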

2. Explore network-defined surface Trained network defines function over space of candidate clauses

2. Explore network-defined surface Explore this surface using stochastic gradient ascent. Rapid random restarts (Zelezny et al., 2002): random clause generation followed by a short local search. Use the network-defined surface to make "intelligent" rapid random restarts (Boyan & Moore, 2000).
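One way the network can bias rapid random restarts is to generate several random restart clauses but begin the short local search only from the one the network scores highest, checking the final clause against real data. This is a hedged sketch of that idea, not the paper's algorithm; all arguments are toy stand-ins (clauses are integers, and the "surface" is a simple quadratic).

```python
import random

def biased_rrr(random_clause, neighbors, nn_score, true_score,
               restarts, candidates_per_restart, local_steps, rng):
    """Rapid random restarts, with restart points chosen by the network."""
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        # bias the restart: take the network's favourite of several candidates
        pool = [random_clause(rng) for _ in range(candidates_per_restart)]
        current = max(pool, key=nn_score)
        for _ in range(local_steps):        # short greedy local search
            step = max(neighbors(current), key=nn_score, default=None)
            if step is None or nn_score(step) <= nn_score(current):
                break                       # local optimum on the NN surface
            current = step
        score = true_score(current)         # final check on real data
        if score > best_score:
            best, best_score = current, score
    return best, best_score

rng = random.Random(0)
best, best_score = biased_rrr(
    random_clause=lambda r: r.randint(0, 10),
    neighbors=lambda n: [m for m in (n - 1, n + 1) if 0 <= m <= 10],
    nn_score=lambda n: -(n - 7) ** 2,       # toy surface peaking at 7
    true_score=lambda n: -(n - 7) ** 2,
    restarts=3, candidates_per_restart=4, local_steps=20, rng=rng)
```

On this toy surface every restart hill-climbs to the peak at 7; on the real network-defined surface the restarts land in different basins, which is the point of restarting.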

Algorithm Illustration Alternate between searching the network-defined surface and exploring the clause evaluation function surface [figure: the network's approximation of the clause evaluation function plotted alongside the true clause evaluation function over the space of candidate clauses]

3. Extract concepts from trained net Extract a decision tree from the trained neural network (Craven & Shavlik, 1995). Predicate invention: high-weight edges into a single hidden unit; add the invented predicates to the background knowledge.

Biased-RRR Results

Future Work Implement and test the other uses (#1 and #3) of the trained neural network. Look at the relative ranking of network predictions rather than squared error; RankProp is concerned with correctly predicting rankings (Caruana et al., 1997). Study approximation quality in the phase transition region (Botta et al., 2003).

Conclusion We can learn to accurately estimate the scores of candidate clauses. This has several potential uses for speeding up ILP, and helps scale ILP to ever-larger datasets (in number of examples and search-space size).

Acknowledgements NLM Grant 1T15 LM007359-01 US Air Force Grant F30602-01-2-0571 NLM Grant 1R01 LM07050-01