Speeding Up Relational Data Mining by Learning to Estimate Candidate Hypothesis Scores
Frank DiMaio and Jude Shavlik
UW-Madison Computer Sciences
ICDM Foundations and New Directions of Data Mining Workshop
19 November 2003
Rule-Based Learning
Goal: Induce a rule (or rules) that explains ALL positive examples and NO negative examples.
Inductive Logic Programming (ILP)
Encode background knowledge in first-order logic, as facts …

    containsBlock(ex1,block1A).   containsBlock(ex1,block1B).
    is_red(block1A).    is_square(block1A).
    is_blue(block1B).   is_round(block1B).
    on_top_of(block1B,block1A).

… and as logical relations:

    above(A,B) :- onTopOf(A,B).
    above(A,B) :- onTopOf(A,Z), above(Z,B).
Inductive Logic Programming (ILP)
Covering algorithm applied to explain all the data (see the sketch after this list):
1. Choose some positive example
2. Generate the best rule that covers this example
3. Remove all examples covered by this rule
4. Repeat until every positive example is covered
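A minimal sketch of this covering loop (Python; the learn_best_rule and covers callables are hypothetical stand-ins for an ILP system's rule search and coverage test, not the authors' implementation):

    from typing import Callable, Iterable, List, Set, TypeVar

    Example = TypeVar("Example")
    Rule = TypeVar("Rule")

    def covering_algorithm(
        positives: Iterable[Example],
        learn_best_rule: Callable[[Example, Set[Example]], Rule],
        covers: Callable[[Rule, Example], bool],
    ) -> List[Rule]:
        """Repeatedly learn a rule from a seed positive example and remove
        the positives that the rule explains, until none remain."""
        rules: List[Rule] = []
        uncovered: Set[Example] = set(positives)
        while uncovered:
            seed = next(iter(uncovered))              # 1. choose some positive example
            rule = learn_best_rule(seed, uncovered)   # 2. best rule covering the seed
            newly_covered = {e for e in uncovered if covers(rule, e)}
            if not newly_covered:                     # guard: avoid looping forever
                break
            rules.append(rule)
            uncovered -= newly_covered                # 3. remove covered examples
        return rules                                  # 4. repeat until all are covered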
Inductive Logic Programming (ILP)
Saturate an example by writing down everything true about it.
The saturation of an example is the bottom clause (⊥).

[Figure: blocks-world example ex2 — block C stacked on block B, which is stacked on block A]

    positive(ex2) :-
        contains_block(ex2,block2A), contains_block(ex2,block2B),
        contains_block(ex2,block2C),
        isRed(block2A), isRound(block2A),
        isBlue(block2B), isRound(block2B),
        isBlue(block2C), isSquare(block2C),
        onTopOf(block2B,block2A), onTopOf(block2C,block2B),
        above(block2B,block2A), above(block2C,block2B),
        above(block2C,block2A).
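A toy sketch of saturation (Python; the (predicate, args) tuple representation of facts is a hypothetical choice, and real ILP systems also respect mode and type declarations, which this ignores):

    def saturate(example_id, facts):
        """Collect every ground literal connected to the example: start from the
        example's own constant and repeatedly pull in facts that mention a
        constant already reached. The result is the body of the bottom clause."""
        reached = {example_id}
        bottom = []
        changed = True
        while changed:
            changed = False
            for literal in facts:
                pred, args = literal
                if literal not in bottom and any(a in reached for a in args):
                    bottom.append(literal)
                    reached.update(args)
                    changed = True
        return bottom

    facts = [
        ("contains_block", ("ex2", "block2A")),
        ("contains_block", ("ex2", "block2B")),
        ("isRed", ("block2A",)),
        ("onTopOf", ("block2B", "block2A")),
    ]
    print(saturate("ex2", facts))   # all four literals end up in the bottom clause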
Inductive Logic Programming (ILP)
Candidate clauses are generated by choosing literals from ⊥ and converting ground terms to variables.
Search through the space of candidate clauses using standard AI search algorithms.
The bottom clause ensures the search space is finite.

Selected literals from ⊥:
    containsBlock(ex2,block2B)
    isRed(block2A)
    onTopOf(block2B,block2A)

Candidate clause:
    positive(A) :- containsBlock(A,B), onTopOf(B,C), isRed(C).
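A sketch of the "ground terms to variables" step (Python; literals use the same hypothetical tuple form as the saturation sketch above):

    from itertools import count

    def variabilize(selected_literals):
        """Replace each distinct ground term with a fresh variable (A, B, C, ...),
        turning selected bottom-clause literals into a candidate clause.
        Assumes the example constant appears first, so the head becomes positive(A)."""
        fresh_names = (chr(c) for c in count(ord("A")))
        var_of = {}
        body = []
        for pred, args in selected_literals:
            vars_ = []
            for a in args:
                if a not in var_of:
                    var_of[a] = next(fresh_names)
                vars_.append(var_of[a])
            body.append(f"{pred}({','.join(vars_)})")
        return "positive(A) :- " + ", ".join(body) + "."

    selected = [
        ("containsBlock", ("ex2", "block2B")),
        ("onTopOf", ("block2B", "block2A")),
        ("isRed", ("block2A",)),
    ]
    print(variabilize(selected))
    # positive(A) :- containsBlock(A,B), onTopOf(B,C), isRed(C).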
ILP Time Complexity
Time complexity of ILP systems depends on:
- Size of the bottom clause |⊥|
- Maximum clause length c
- Number of examples |E|
- Search algorithm
O(|⊥|^c · |E|) for exhaustive search
O(|⊥| · |E|) for greedy search
Assumes constant-time clause evaluation!
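Plugging in hypothetical magnitudes (illustrative numbers, not from the talk) shows how quickly the cost grows when every candidate clause must touch every example:

    % illustrative only: |bottom clause| = 500 literals, c = 4, |E| = 1000 examples
    \[
      |\bot|^{c}\,|E| \;=\; 500^{4}\cdot 10^{3} \;\approx\; 6.3\times 10^{13}
      \qquad\text{vs.}\qquad
      |\bot|\,|E| \;=\; 5\times 10^{5}
    \]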
Ideas in Speeding Up ILP
- Search algorithm improvements
  - Better heuristic functions, search strategy
  - Srinivasan's (2000) random uniform sampling (consider O(1) candidate clauses)
- Faster clause evaluations
  - Evaluation time of a clause (on one example) is exponential in the number of variables
  - Clause reordering & optimization (Blockeel et al. 2002, Santos Costa et al. 2003)
  - Evaluation of a candidate is still O(|E|)
A Faster Clause Evaluation
- Our idea: predict a clause's evaluation score in O(1) time (i.e., independent of the number of examples)
- Use a multilayer feed-forward neural network to approximately score candidate clauses
- The NN inputs specify which bottom-clause literals are selected
- There is a unique input vector for every candidate clause in the search space
Neural Network Topology
[Figure: one binary input per bottom-clause literal (1 = selected, 0 = not), e.g. containsBlock(ex2,block2B) = 1, onTopOf(block2B,block2A) = 1, isRed(block2A) = 1, isRound(block2A) = 0, …; the inputs feed the network (Σ), which produces the predicted output score]

Selected literals from ⊥:
    containsBlock(ex2,block2B)
    onTopOf(block2B,block2A)
    isRed(block2A)

Candidate clause:
    positive(A) :- containsBlock(A,B), onTopOf(B,C), isRed(C).
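A sketch of this input encoding and a one-hidden-layer scorer (Python/NumPy; the hidden-layer size and the random, untrained weights are placeholders, since the slide does not specify the training details):

    import numpy as np

    # Bottom clause for ex2: one network input per literal.
    bottom_clause = [
        "contains_block(ex2,block2A)", "contains_block(ex2,block2B)",
        "contains_block(ex2,block2C)", "isRed(block2A)", "isRound(block2A)",
        "isBlue(block2B)", "isRound(block2B)", "isBlue(block2C)",
        "isSquare(block2C)", "onTopOf(block2B,block2A)",
        "onTopOf(block2C,block2B)",
    ]

    def encode(candidate_literals):
        """Binary input vector: 1 if the bottom-clause literal is in the candidate."""
        chosen = set(candidate_literals)
        return np.array([1.0 if lit in chosen else 0.0 for lit in bottom_clause])

    # A tiny feed-forward scorer (untrained; random weights just to show the shapes).
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(8, len(bottom_clause)))   # hidden-layer weights
    b1 = np.zeros(8)
    W2 = rng.normal(size=8)                         # output weights
    b2 = 0.0

    def predicted_score(candidate_literals):
        x = encode(candidate_literals)
        hidden = np.tanh(W1 @ x + b1)
        return float(W2 @ hidden + b2)              # O(1) in the number of examples

    print(predicted_score(["contains_block(ex2,block2B)",
                           "onTopOf(block2B,block2A)", "isRed(block2A)"]))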
Speeding Up ILP
- The trained neural network provides a tool for approximate clause evaluation in O(1) time
- Given enough examples (large |E|), approximate evaluation is essentially free compared with evaluation on the data
- During ILP's search over the hypothesis space …
  - Approximately evaluate every candidate explored
  - Only evaluate a clause on the data if it is "promising"
  - Adaptive sampling – use real evaluations to improve the approximation during the search (see the sketch below)
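A sketch of how such a search might be wired together (Python; every callable is a hypothetical stand-in for the ILP system's components and the trained network, not the authors' code):

    def guided_search(candidates, approx_score, true_score, is_promising, retrain):
        """Score every candidate cheaply with the network, pay for a real
        evaluation only when the prediction looks promising, and feed the real
        scores back to the approximator (adaptive sampling)."""
        best_clause, best_score = None, float("-inf")
        history = []
        for clause in candidates:
            guess = approx_score(clause)              # O(1) neural-net prediction
            if not is_promising(guess, best_score):
                continue                              # skip the expensive O(|E|) pass
            score = true_score(clause)                # evaluate on the data
            history.append((clause, score))
            retrain(history)                          # refine the approximation
            if score > best_score:
                best_clause, best_score = clause, score
        return best_clause, best_score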
When to Evaluate Approximated Clauses?
- Treat the neural network's predicted score as a Gaussian distribution over the true score
- Only evaluate a clause on the data when there is sufficient likelihood that it is the best seen so far
[Figure: clause-score axis with current best = 22; one potential move with predicted score 18.9 has P(best) = 0.24 → evaluate; another with predicted score 11.1 has P(best) = 0.03 → don't evaluate]
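The "sufficient likelihood" test has a closed form. A sketch assuming a fixed prediction-error standard deviation (the sigma below is hypothetical, so the probabilities differ slightly from the slide's):

    import math

    def prob_better_than_best(predicted, best_so_far, sigma):
        """P(true score > best) when the true score is modeled as
        Normal(predicted, sigma^2)."""
        z = (best_so_far - predicted) / sigma
        return 0.5 * math.erfc(z / math.sqrt(2.0))   # = 1 - Phi(z)

    # Predicted scores from the slide, with a hypothetical sigma of 4.5:
    print(prob_better_than_best(18.9, 22.0, 4.5))    # ~0.25  -> evaluate on data
    print(prob_better_than_best(11.1, 22.0, 4.5))    # ~0.008 -> skip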
Results
- Trained and tested the score learner on benchmark datasets: Carcinogenesis, Mutagenesis, Protein Metabolism, Nuclear Smuggling
- Clauses generated by random sampling
- Clause evaluation metric:
    compression = (posCovered - negCovered - length + 1) / totalPositives
- 10-fold cross-validation learning curves
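The compression metric as code, with a made-up clause for illustration:

    def compression(pos_covered, neg_covered, length, total_positives):
        """compression = (posCovered - negCovered - length + 1) / totalPositives"""
        return (pos_covered - neg_covered - length + 1) / total_positives

    # Hypothetical clause: covers 40 of 100 positives and 5 negatives, 4 literals long.
    print(compression(40, 5, 4, 100))   # 0.32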
Results
[Figure: 10-fold cross-validation learning curves on the benchmark datasets]
Future Work
- Test in an ILP system
  - Potential for speedup in datasets with many examples
  - Will inaccuracy hurt the search?
- The trained network defines a function (predicted score) over the space of candidate clauses
- We can use this function to …
  - Extract concepts
  - Escape local maxima in heuristic search
Acknowledgements
Funding provided by:
- NLM grant 1T15 LM
- NLM grant 1R01 LM
- DARPA EELD grant F