Online Structure Learning for Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Department of Computer Science The University of Texas at Austin ECML-PKDD-2011, Athens, Greece

Large-scale structured/relational learning 2
 Citeseer citation segmentation [Peng & McCallum, 2004], e.g.: "D. McDermott and J. Doyle. Non-monotonic Reasoning I. Artificial Intelligence, 13: 41-72,"
 Craigslist ad segmentation [Grenager et al., 2005], e.g.: "Modern, clean, quiet, $750 up--BIG pool, parking, laundry, elevator. Open viewing SAT/SUN, 10am-6pm, at Avenue, corner East 17 St. Other times call first: Sam,"

Motivation 3
 Markov Logic Networks (MLNs) [Richardson & Domingos, 2006] are an elegant and powerful formalism for handling complex structured/relational data.
 All existing structure learning algorithms for MLNs are batch learning methods.
   In effect, they are designed for problems that have a few "mega" examples.
   They do not scale to problems with a large number of smaller structured examples.
 There is no existing online structure learning algorithm for MLNs.
⇒ This work presents the first online structure learner for MLNs.

Outline 4
 Motivation
 Background
   Markov Logic Networks
 OSL: Online structure learning algorithm
 Experimental evaluation
 Summary

Background 5

Markov Logic Networks (MLNs) [Richardson & Domingos, 2006] 6
 An MLN is a weighted set of first-order formulas.
 Larger weight indicates stronger belief that the clause should hold.
   10   InField(f,p1,c) ∧ Next(p1,p2) ⇒ InField(f,p2,c)
   5    Token(t,p,c) ∧ IsInitial(t) ⇒ InField(Author,p,c) ˅ InField(Venue,p,c)
 Probability of a possible world (a truth assignment to all ground atoms) x:
   P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )
   where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x.
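To make the scoring concrete, here is a minimal sketch (not from the talk) of the unnormalized probability computation; the weights and grounding counts below are made-up toy values.

```python
import math

def unnormalized_log_prob(weights, true_grounding_counts):
    """Sum_i w_i * n_i(x); dividing exp(...) by the partition function Z gives P(X = x)."""
    return sum(w * n for w, n in zip(weights, true_grounding_counts))

# Toy example with the two clauses above (the counts n_i(x) are made up):
weights = [10.0, 5.0]
counts = [3, 1]
print(math.exp(unnormalized_log_prob(weights, counts)))  # unnormalized probability of world x
```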

Existing structure learning methods for MLNs 7
 Top-down approach: MSL [Kok & Domingos, 2005], DSL [Biba et al., 2008]
   Start from unit clauses and search for new clauses
 Bottom-up approach: BUSL [Mihalkova & Mooney, 2007], LHL [Kok & Domingos, 2009], LSM [Kok & Domingos, 2010]
   Use data to generate candidate clauses

OSL: Online Structure Learner for MLNs 8

Online Structure Learner (OSL) 9
[System diagram] At each step t, the current MLN predicts y_t^P for the incoming example x_t; comparing it with the true output y_t, max-margin structure learning proposes new clauses, and L1-regularized weight learning assigns new weights to the old and new clauses, producing the updated MLN.
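As a rough illustration of this loop, the following sketch uses placeholder method names (`predict`, `find_new_clauses`, `update_weights_l1`) that are assumptions, not the authors' API.

```python
# Sketch of the OSL online loop, assuming placeholder helpers for inference,
# mode-guided clause search, and L1-regularized weight updates.

def osl(examples, mln):
    for x_t, y_t in examples:                      # stream of structured examples
        y_pred = mln.predict(x_t)                  # MAP inference with current clauses/weights
        if y_pred != y_t:
            # search the hypergraph around wrongly predicted atoms for new clauses
            new_clauses = mln.find_new_clauses(x_t, y_t, y_pred)
            mln.add_clauses(new_clauses)
        # ADAGRAD_FB-style L1-regularized update over old and new clauses
        mln.update_weights_l1(x_t, y_t)
    return mln
```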

Max-margin structure learning 10

Relational pathfinding [Richards & Mooney, 1992] 11
 Learn definite clauses:
   Consider a relational example as a hypergraph:
     Nodes: constants
     Hyperedges: true ground atoms, connecting the nodes that are its arguments
   Search in the hypergraph for paths that connect the arguments of a target literal.
[Figure: a family-tree hypergraph over the constants Alice, Joan, Tom, Mary, Fred, Ann, Bob, Carol, with Parent and Married hyperedges, and the target literal Uncle(Tom, Mary)]
   Parent(Joan,Mary) ∧ Parent(Alice,Joan) ∧ Parent(Alice,Tom) ⇒ Uncle(Tom,Mary)
   Parent(x,y) ∧ Parent(z,x) ∧ Parent(z,w) ⇒ Uncle(w,y)

Relational pathfinding (cont.) 12
 We use a generalization of relational pathfinding:
   A path does not need to connect the arguments of the target atom.
   Any two consecutive atoms in a path must share at least one input/output argument.
   A similar approach is used in LHL [Kok & Domingos, 2009] and LSM [Kok & Domingos, 2010].
 This can result in an intractable number of possible paths.
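A minimal sketch of this generalized path search over true ground atoms, assuming atoms are represented as (predicate, argument-tuple) pairs; this is an illustration, not the OSL implementation.

```python
# Depth-first expansion of paths of true ground atoms, where each added atom must
# share at least one constant with the path built so far.

def find_paths(start_atom, true_atoms, max_len):
    paths = []

    def expand(path, constants):
        if len(path) > 1:
            paths.append(list(path))
        if len(path) == max_len:
            return
        for atom in true_atoms:
            _, args = atom
            if atom not in path and constants & set(args):   # must share an argument
                expand(path + [atom], constants | set(args))

    expand([start_atom], set(start_atom[1]))
    return paths

# Example: paths starting from a (wrongly predicted) atom
atoms = [("Token", ("To", "P09", "B2")), ("Next", ("P08", "P09")), ("Next", ("P09", "P10"))]
print(find_paths(("InField", ("Title", "P09", "B2")), atoms, max_len=3))
```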

Mode declarations [Muggleton, 1995] 13
 A language bias to constrain the search for definite clauses.
 A mode declaration specifies:
   The number of appearances of a predicate in a clause.
   Constraints on the types of arguments of a predicate.

Mode-guided relational pathfinding 14
 Use mode declarations to constrain the search for paths in relational pathfinding.
 Introduce a new mode declaration for paths, modep(r,p):
   r (recall number): a non-negative integer limiting the number of appearances of a predicate in a path to r
     r can be 0, i.e., don't look for paths containing atoms of a particular predicate
   p: an atom whose arguments are:
     Input (+): bound argument, i.e., must appear in some previous atom
     Output (–): can be a free argument
     Don't explore (.): don't expand the search on this argument

Mode-guided relational pathfinding (cont.) 15
 Example in citation segmentation: constrain the search space to paths connecting true ground atoms of two consecutive tokens.
   InField(field,position,citationID): the field label of the token at a given position
   Next(position,position): two positions are next to each other
   Token(word,position,citationID): the word appearing at a given position
   modep(2, InField(., –, .))
   modep(1, Next(–, –))
   modep(2, Token(., +, .))
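A sketch of how such modep declarations could be represented and enforced while growing a path; the dictionary encoding of recall numbers and '+'/'-'/'.' argument modes is an illustrative assumption, not the authors' data structure.

```python
from collections import Counter

# Hypothetical encoding of the modep declarations shown above.
MODES = {
    "InField": {"recall": 2, "args": [".", "-", "."]},
    "Next":    {"recall": 1, "args": ["-", "-"]},
    "Token":   {"recall": 2, "args": [".", "+", "."]},
}

def can_extend(path, atom):
    """Check whether a ground atom may be appended to the current path under MODES."""
    pred, args = atom
    mode = MODES.get(pred)
    if mode is None:
        return False
    # recall: limit how many atoms of this predicate the path may contain
    counts = Counter(p for p, _ in path)
    if counts[pred] >= mode["recall"]:
        return False
    # '+' (input) arguments must already appear somewhere in the path
    seen = {c for _, a in path for c in a}
    return all(arg_mode != "+" or c in seen
               for arg_mode, c in zip(mode["args"], args))

path = [("InField", ("Title", "P09", "B2"))]
print(can_extend(path, ("Token", ("To", "P09", "B2"))))   # True: '+' position P09 is bound
```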

Mode-guided relational pathfinding (cont.) 16
 Wrong prediction: InField(Title,P09,B2)
 Hypergraph (true ground atoms containing P09): { Token(To,P09,B2), Next(P08,P09), Next(P09,P10), LessThan(P01,P09), … }
 Paths found: {InField(Title,P09,B2), Token(To,P09,B2)}

Mode-guided relational pathfinding (cont.) 17
 Wrong prediction: InField(Title,P09,B2)
 Hypergraph (true ground atoms containing P09): { Token(To,P09,B2), Next(P08,P09), Next(P09,P10), LessThan(P01,P09), … }
 Paths found: {InField(Title,P09,B2), Token(To,P09,B2)}, {InField(Title,P09,B2), Token(To,P09,B2), Next(P08,P09)}

Generalizing paths to clauses 18
 Modes: modec(InField(c,v,v)), modec(Token(c,v,v)), modec(Next(v,v)), …
 Paths: {InField(Title,P09,B2), Token(To,P09,B2), Next(P08,P09), InField(Title,P08,B2)}, …
 Conjunctions: InField(Title,p1,c) ∧ Token(To,p1,c) ∧ Next(p2,p1) ∧ InField(Title,p2,c)
 Clauses:
   C1: ¬InField(Title,p1,c) ˅ ¬Token(To,p1,c) ˅ ¬Next(p2,p1) ˅ ¬InField(Title,p2,c)
   C2: InField(Title,p1,c) ˅ ¬Token(To,p1,c) ˅ ¬Next(p2,p1) ˅ ¬InField(Title,p2,c)
       i.e., Token(To,p1,c) ∧ Next(p2,p1) ∧ InField(Title,p2,c) ⇒ InField(Title,p1,c)
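A sketch of the variabilization step, assuming modec marks each argument as a kept constant ('c') or a variable ('v'); the string-based clause output is only for illustration.

```python
# Turn a path of ground atoms into a first-order clause: constants marked 'v' in the
# (assumed) modec declarations are replaced by shared variables, and the resulting
# conjunction is negated, optionally leaving one literal positive as the clause head.

MODEC = {"InField": ["c", "v", "v"], "Token": ["c", "v", "v"], "Next": ["v", "v"]}

def path_to_clause(path, head_index=None):
    var_of = {}                                   # constant -> variable name
    literals = []
    for i, (pred, args) in enumerate(path):
        new_args = []
        for arg_mode, const in zip(MODEC[pred], args):
            if arg_mode == "c":
                new_args.append(const)            # keep the constant
            else:
                var_of.setdefault(const, "v%d" % len(var_of))
                new_args.append(var_of[const])    # same constant -> same variable
        sign = "" if i == head_index else "¬"
        literals.append("%s%s(%s)" % (sign, pred, ",".join(new_args)))
    return " ˅ ".join(literals)

path = [("InField", ("Title", "P09", "B2")), ("Token", ("To", "P09", "B2")),
        ("Next", ("P08", "P09")), ("InField", ("Title", "P08", "B2"))]
print(path_to_clause(path))                # all-negative clause (like C1)
print(path_to_clause(path, head_index=0))  # clause with InField(...) as head (like C2)
```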

L1-regularized weight learning 19
 Many new clauses are added at each step, and some of them may not be useful in the long run.
   Use L1-regularization to zero out those clauses.
 Use a state-of-the-art online L1-regularized learning algorithm named ADAGRAD_FB [Duchi et al., 2010], an L1-regularized adaptive subgradient method.
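For intuition, here is a per-coordinate sketch of an ADAGRAD-style adaptive subgradient step combined with L1 soft-thresholding (a composite, FOBOS-style update); the step size, regularization strength, and smoothing constant are illustrative, and this is not presented as the exact ADAGRAD_FB variant used in OSL.

```python
import math

def adagrad_fb_step(w, g, sum_sq, eta=0.1, lam=0.01, delta=1e-6):
    """w: weights, g: subgradient of the loss at w, sum_sq: running sum of squared gradients."""
    new_w = []
    for i in range(len(w)):
        sum_sq[i] += g[i] * g[i]
        h = delta + math.sqrt(sum_sq[i])          # per-coordinate scaling
        u = w[i] - (eta / h) * g[i]               # unregularized adaptive step
        shrink = eta * lam / h                    # L1 proximal (soft-threshold) amount
        new_w.append(math.copysign(max(abs(u) - shrink, 0.0), u))
    return new_w
```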

Experimental Evaluation 20
 Investigate the performance of OSL in two scenarios:
   Starting from a given MLN
   Starting from an empty MLN
 Task: natural language field segmentation
 Datasets:
   CiteSeer: 1,563 citations, in 4 disjoint subsets corresponding to 4 different research areas
   Craigslist: 8,767 ads, but only 302 of them were labeled

Input MLNs 21
 A simple linear-chain CRF (LC_0):
   Only uses the current word as a feature:
     Token(+w,p,c) ⇒ InField(+f,p,c)
   Transition rules between fields:
     Next(p1,p2) ∧ InField(+f1,p1,c) ⇒ InField(+f2,p2,c)

Input MLNs (cont.) 22
 Isolated segmentation model (ISM) [Poon & Domingos, 2007], a well-developed MLN for citation segmentation:
   In addition to the current-word feature, it also has features based on the words that appear before or after the current word.
   It only has transition rules within fields, but takes punctuation into account as field boundaries:
     ¬HasPunc(p1,c) ∧ InField(+f,p1,c) ∧ Next(p1,p2) ⇒ InField(+f,p2,c)
     HasComma(p1,c) ∧ InField(+f,p1,c) ∧ Next(p1,p2) ⇒ InField(+f,p2,c)

Systems compared 23
 ADAGRAD_FB: weight learning only
 OSL-M2: a fast version of OSL where the parameter minCountDiff is set to 2
 OSL-M1: a slow version of OSL where the parameter minCountDiff is set to 1

Experimental setup 24
 OSL: specify mode declarations to constrain the search space to paths connecting true ground atoms of two consecutive tokens:
   A linear-chain CRF:
     Features based on the current, previous, and following words
     Transition rules with respect to the current, previous, and following words
 4-fold cross-validation
 Average F1

Average F1 scores on CiteSeer 25

Average training time on CiteSeer 26

Some good clauses found by OSL on CiteSeer 27
 OSL-M1-ISM: if the current token is in the Title field and is followed by a period, then the next token is likely in the Venue field:
   InField(Title,p1,c) ∧ FollowBy(PERIOD,p1,c) ∧ Next(p1,p2) ⇒ InField(Venue,p2,c)
 OSL-M1-Empty: consecutive tokens are usually in the same field:
   Next(p1,p2) ∧ InField(Author,p1,c) ⇒ InField(Author,p2,c)
   Next(p1,p2) ∧ InField(Title,p1,c) ⇒ InField(Title,p2,c)
   Next(p1,p2) ∧ InField(Venue,p1,c) ⇒ InField(Venue,p2,c)

Summary 28
 The first online structure learner (OSL) for MLNs:
   Can either enhance an existing MLN or learn an MLN from scratch.
   Can handle problems with thousands of small structured training examples.
   Outperforms existing algorithms on CiteSeer and Craigslist information extraction datasets.

Thank you! 29 Questions?