Markov Logic: A Representation Language for Natural Language Semantics Pedro Domingos Dept. Computer Science & Eng. University of Washington (Based on joint work with Stanley Kok, Matt Richardson and Parag Singla)

Overview: Motivation, Background, Representation, Inference, Learning, Applications, Discussion

Motivation
Natural language is characterized by:
Complex relational structure
High uncertainty (ambiguity, imperfect knowledge)
First-order logic handles relational structure; probability handles uncertainty. Let's combine the two.

Markov Logic [Richardson & Domingos, 2006]
Syntax: First-order logic + Weights
Semantics: Templates for Markov nets
Inference: Weighted satisfiability + MCMC
Learning: Voted perceptron + ILP

Overview: Motivation, Background, Representation, Inference, Learning, Applications, Discussion

Markov Networks
Undirected graphical models.
[Figure: four-node undirected graph over A, B, C, D.]
Potential functions defined over cliques:
P(x) = (1/Z) Π_c Φ_c(x_c),   Z = Σ_x Π_c Φ_c(x_c)

Markov Networks (log-linear form)
The same distribution can be written as a log-linear model:
P(x) = (1/Z) exp(Σ_i w_i f_i(x))
where w_i is the weight of feature i and f_i(x) is feature i.

First-Order Logic
Constants, variables, functions, predicates. E.g.: Anna, X, mother_of(X), friends(X, Y)
Grounding: replace all variables by constants. E.g.: friends(Anna, Bob)
World (model, interpretation): assignment of truth values to all ground predicates.

Overview: Motivation, Background, Representation, Inference, Learning, Applications, Discussion

Markov Logic Networks
A logical KB is a set of hard constraints on the set of possible worlds.
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible.
Give each formula a weight (higher weight → stronger constraint).

Definition
A Markov Logic Network (MLN) is a set of pairs (F, w), where F is a formula in first-order logic and w is a real number.
Together with a set of constants, it defines a Markov network with:
One node for each grounding of each predicate in the MLN.
One feature for each grounding of each formula F in the MLN, with the corresponding weight w.
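For reference, the probability distribution defined by this ground Markov network (as given in Richardson & Domingos, 2006) is

P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)),   Z = Σ_x' exp(Σ_i w_i n_i(x'))

where n_i(x) is the number of true groundings of formula F_i in world x.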

Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B).
[Figure: ground network with nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B).]

Example: Friends & Smokers (continued)
[Figure: the ground network now also contains Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B).]
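As an aside, a minimal sketch of how grounding works in this example. It assumes the usual Friends & Smokers formula from Richardson & Domingos (2006), Friends(x,y) => (Smokes(x) <=> Smokes(y)), which the transcript's figures do not show explicitly; the code simply enumerates its groundings over the two constants.

from itertools import product

# Illustrative grounding of Friends(x,y) => (Smokes(x) <=> Smokes(y))
# over the constants Anna (A) and Bob (B); each grounding becomes one
# feature of the ground Markov network, all sharing the formula's weight.
constants = ["A", "B"]

ground_formulas = [
    f"Friends({x},{y}) => (Smokes({x}) <=> Smokes({y}))"
    for x, y in product(constants, repeat=2)
]

for g in ground_formulas:
    print(g)   # 4 groundings: (A,A), (A,B), (B,A), (B,B)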

More on MLNs
An MLN is a template for ground Markov networks.
Typed variables and constants greatly reduce the size of the ground Markov net.
Functions, existential quantifiers, etc.
An MLN without variables = a Markov network (subsumes graphical models).

Relation to First-Order Logic
Infinite weights → first-order logic.
Satisfiable KB, positive weights → satisfying assignments = modes of the distribution.
MLNs allow contradictions between formulas.

Overview: Motivation, Background, Representation, Inference, Learning, Applications, Discussion

MPE/MAP Inference
Find the most likely truth values of non-evidence ground atoms given the evidence.
Apply a weighted satisfiability solver (maximizes the sum of weights of satisfied clauses).
MaxWalkSat algorithm [Kautz et al., 1997]:
Start with a random truth assignment.
With probability p, flip the atom that maximizes the sum of satisfied clause weights; else flip a random atom in an unsatisfied clause.
Repeat n times.
Restart m times.
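A minimal Python sketch of the MaxWalkSat loop described above. The clause representation (weight plus signed literals) and the parameter names are illustrative, not Alchemy's API; a real implementation would cache clause costs instead of recomputing them on every flip.

import random

def maxwalksat(clauses, atoms, p=0.5, max_flips=10000, max_tries=10):
    # clauses: list of (weight, literals); literals is a list of (atom, sign)
    # pairs, and a literal holds when assignment[atom] == sign.
    def unsatisfied(assignment):
        return [c for c in clauses
                if not any(assignment[a] == s for a, s in c[1])]

    def cost(assignment):
        # total weight of unsatisfied clauses (to be minimized)
        return sum(w for w, _ in unsatisfied(assignment))

    best, best_cost = None, float("inf")
    for _ in range(max_tries):                      # random restarts
        assignment = {a: random.random() < 0.5 for a in atoms}
        for _ in range(max_flips):
            unsat = unsatisfied(assignment)
            if not unsat:
                break
            _, lits = random.choice(unsat)          # pick an unsatisfied clause
            candidates = [a for a, _ in lits]
            if random.random() < p:
                # greedy move: flip the atom that gives the lowest resulting cost
                def cost_if_flipped(a):
                    assignment[a] = not assignment[a]
                    c = cost(assignment)
                    assignment[a] = not assignment[a]
                    return c
                atom = min(candidates, key=cost_if_flipped)
            else:
                # random-walk move: flip a random atom in the clause
                atom = random.choice(candidates)
            assignment[atom] = not assignment[atom]
        c = cost(assignment)
        if c < best_cost:
            best, best_cost = dict(assignment), c
    return best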

Conditional Inference
P(Formula | MLN, C) = ?  MCMC: sample worlds, check whether the formula holds.
P(Formula1 | Formula2, MLN, C) = ?  If Formula2 = conjunction of ground atoms:
First construct the minimal subset of the network necessary to answer the query (a generalization of KBMC).
Then apply MCMC (or another inference method).

Ground Network Construction
Initialize the Markov net to contain all query predicates.
For each node in the network, add the node's Markov blanket to the network.
Remove any evidence nodes.
Repeat until done.
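A sketch of this construction in Python, assuming a hypothetical markov_blanket(atom) helper that returns the ground atoms sharing some ground clause with the given atom:

from collections import deque

def build_ground_network(query_atoms, evidence_atoms, markov_blanket):
    # Start from the query atoms; repeatedly add each node's Markov blanket,
    # but stop expanding at (and drop) evidence nodes, whose values are known.
    visited, frontier = set(), deque(query_atoms)
    network = set()
    while frontier:
        atom = frontier.popleft()
        if atom in visited:
            continue
        visited.add(atom)
        if atom in evidence_atoms:
            continue                     # evidence nodes are not expanded further
        network.add(atom)
        frontier.extend(markov_blanket(atom))
    return network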

Probabilistic Inference
Recall: exact inference is #P-complete.
Conditioning on the Markov blanket is easy:
P(X_i = x_i | MB(X_i)) = exp(Σ_j w_j f_j(x_i, MB)) / [exp(Σ_j w_j f_j(X_i = 0, MB)) + exp(Σ_j w_j f_j(X_i = 1, MB))]
where the sum ranges over the ground clauses containing X_i. Gibbs sampling exploits this.

Markov Chain Monte Carlo: Gibbs Sampler
1. Start with an initial assignment to the nodes.
2. One node at a time, sample the node given the others.
3. Repeat.
4. Use the samples to compute P(X).
Apply to the ground network. Initialization: MaxWalkSat. Can use multiple chains.
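A minimal Gibbs-sampling sketch over the ground network. It assumes a hypothetical conditional(atom, state) function returning P(atom = True | its Markov blanket), e.g. computed from the weights of the ground clauses containing the atom; as the slide suggests, the initial state could come from MaxWalkSat.

import random
from collections import defaultdict

def gibbs_marginals(atoms, conditional, init_state, num_samples=1000, burn_in=100):
    state = dict(init_state)
    true_counts = defaultdict(int)
    for t in range(burn_in + num_samples):
        for a in atoms:                       # resample one node at a time
            state[a] = random.random() < conditional(a, state)
        if t >= burn_in:                      # discard burn-in samples
            for a in atoms:
                true_counts[a] += state[a]
    # estimated marginal P(atom = True) from the retained samples
    return {a: true_counts[a] / num_samples for a in atoms}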

Overview: Motivation, Background, Representation, Inference, Learning, Applications, Discussion

Learning
Data is a relational database; closed world assumption (if not: EM).
Learning parameters (weights):
Generatively: pseudo-likelihood
Discriminatively: voted perceptron + MaxWalkSat
Learning structure:
Generalization of feature induction in Markov nets
Learn and/or modify clauses
Inductive logic programming with pseudo-likelihood as the objective function

Generative Weight Learning
Maximize likelihood (or posterior); use gradient ascent. Requires inference at each step (slow!).
∂/∂w_i log P_w(x) = n_i(x) − E_w[n_i(x)]
n_i(x): feature count according to the data; E_w[n_i(x)]: feature count according to the model.

Pseudo-Likelihood [Besag, 1975]
Likelihood of each variable given its Markov blanket in the data.
Does not require inference at each step.
Widely used.
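In symbols, the pseudo-log-likelihood of a possible world x is (following Besag, 1975, and Richardson & Domingos, 2006)

log PL_w(X = x) = Σ_{l=1..n} log P_w(X_l = x_l | MB_x(X_l))

where MB_x(X_l) is the state of X_l's Markov blanket in the data.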

Optimization
Parameter tying over groundings of the same clause.
nsat_i(x = v) is the number of satisfied groundings of clause i in the training data when x takes value v.
Most terms are not affected by changes in the weights; after the initial setup, each iteration takes O(# ground predicates × # first-order clauses).
Maximize using L-BFGS [Liu & Nocedal, 1989].
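Using the nsat_i notation above, the gradient that L-BFGS maximizes can be written as (following Richardson & Domingos, 2006; notation lightly adapted to the slide's nsat_i)

∂/∂w_i log PL_w(x) = Σ_{l=1..n} [ nsat_i(X_l = x_l) − P_w(X_l = 0 | MB_x(X_l)) · nsat_i(X_l = 0) − P_w(X_l = 1 | MB_x(X_l)) · nsat_i(X_l = 1) ]

where nsat_i(X_l = v) counts the satisfied groundings of clause i with X_l forced to value v and everything else as in the data.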

Discriminative Weight Learning
Gradient of the conditional log-likelihood:
∂/∂w_i log P_w(y | x) = n_i(x, y) − E_w[n_i(x, y)]
n_i(x, y): number of true groundings of formula i in the DB; E_w[n_i(x, y)]: expected number of true groundings (slow to compute!).
Approximate the expected count by the MAP count.

Voted Perceptron [Collins, 2002]
Used for discriminative training of HMMs.
Expected count in the gradient is approximated by the count in the MAP state.
MAP state found using the Viterbi algorithm.
Weights averaged over all iterations.

initialize w_i = 0
for t = 1 to T do
    find the MAP configuration using Viterbi
    w_i ← w_i + η · (training count − MAP count)
end for

Voted Perceptron for MLNs [Singla & Domingos, 2004]
HMM is a special case of MLN.
Expected count in the gradient is approximated by the count in the MAP state.
MAP state found using MaxWalkSat.
Weights averaged over all iterations.

initialize w_i = 0
for t = 1 to T do
    find the MAP configuration using MaxWalkSat
    w_i ← w_i + η · (training count − MAP count)
end for
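A schematic Python version of this update, assuming a hypothetical map_counts(weights) callable that runs MaxWalkSat with the current weights and returns the number of true groundings of each clause in the MAP state:

def voted_perceptron(num_clauses, data_counts, map_counts, eta=1.0, T=100):
    # data_counts[i]: number of true groundings of clause i in the training data.
    # map_counts(w) -> list of true-grounding counts in the MAP state under w.
    w = [0.0] * num_clauses
    w_sum = [0.0] * num_clauses
    for _ in range(T):
        mc = map_counts(w)                  # MAP inference with current weights
        for i in range(num_clauses):
            w[i] += eta * (data_counts[i] - mc[i])
            w_sum[i] += w[i]
    return [s / T for s in w_sum]           # averaged ("voted") weights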

Overview: Motivation, Background, Representation, Inference, Learning, Applications, Discussion

Applications to Date
Entity resolution (Cora, BibServ)
Information extraction for biology (won the LLL-2005 competition)
Probabilistic Cyc
Link prediction
Topic propagation in scientific communities
Etc.

Entity Resolution
Most logical systems make the unique names assumption. What if we don't?
Equality predicate: Same(A,B), or A = B
Equality axioms: reflexivity, symmetry, transitivity.
For every unary predicate P: x1 = x2 => (P(x1) <=> P(x2))
For every binary predicate R: x1 = x2 ∧ y1 = y2 => (R(x1,y1) <=> R(x2,y2))
Etc.
But in Markov logic these are soft and learnable.
Can also introduce the reverse direction: R(x1,y1) ∧ R(x2,y2) ∧ x1 = x2 => y1 = y2
Surprisingly, this is all that's needed.

Example: Citation Matching

Markov Logic Formulation: Predicates
Are two bibliography records the same? SameBib(b1,b2)
Are two field values the same? SameAuthor(a1,a2), SameTitle(t1,t2), SameVenue(v1,v2)
How similar are two field strings? Predicates for ranges of cosine TF-IDF score:
TitleTFIDF.0(t1,t2) is true iff TF-IDF(t1,t2) = 0
TitleTFIDF.2(t1,t2) is true iff 0 < TF-IDF(t1,t2) < 0.2
Etc.

Markov Logic Formulation: Formulas
Unit clauses (defaults): ¬SameBib(b1,b2)
Two fields are the same => corresponding bib. records are the same:
Author(b1,a1) ∧ Author(b2,a2) ∧ SameAuthor(a1,a2) => SameBib(b1,b2)
Two bib. records are the same => corresponding fields are the same:
Author(b1,a1) ∧ Author(b2,a2) ∧ SameBib(b1,b2) => SameAuthor(a1,a2)
High similarity score => two fields are the same:
TitleTFIDF.8(t1,t2) => SameTitle(t1,t2)
Transitive closure (not incorporated in the experiments):
SameBib(b1,b2) ∧ SameBib(b2,b3) => SameBib(b1,b3)
25 predicates, 46 first-order clauses.

What Does This Buy You?
Objects are matched collectively.
Multiple types are matched simultaneously.
Constraints are soft, and strengths can be learned from data.
Easy to add further knowledge.
Constraints can be refined from data.
Standard approach still embedded.

Example: Subset of a Bibliography Database

Record  Title                             Author           Venue
B1      Object Identification using CRFs  Linda Stewart    PKDD 04
B2      Object Identification using CRFs  Linda Stewart    8th PKDD
B3      Learning Boolean Formulas         Bill Johnson     PKDD 04
B4      Learning of Boolean Formulas      William Johnson  8th PKDD

Standard Approach [Fellegi & Sunter, 1969]
[Figure: one record-match node per candidate pair (b1=b2?, b3=b4?), each connected to its own field-similarity evidence nodes Sim(·,·) for the Author, Title, and Venue fields.]

What's Missing?
[Figure: same network as in the standard approach.]
If from b1=b2 you infer that "PKDD 04" is the same as "8th PKDD", how can you use that to help figure out whether b3=b4?

Merging the Evidence Nodes
[Figure: the two Sim(PKDD 04, 8th PKDD) evidence nodes are merged into a single node shared by b1=b2? and b3=b4?.]
Still does not solve the problem. Why?

Introducing Field-Match Nodes: Full Representation in the Collective Model
[Figure: field-match nodes b1.A=b2.A?, b1.T=b2.T?, b1.V=b2.V?, b3.A=b4.A?, b3.T=b4.T?, b3.V=b4.V? are inserted between the record-match nodes (b1=b2?, b3=b4?) and the field-similarity evidence nodes; the two venue field-match nodes share the Sim(PKDD 04, 8th PKDD) evidence node.]

Flow of Information
[Figure sequence: in the collective model, evidence about b1 and b2 propagates up to b1=b2?, across the shared venue field-match nodes, and back down to help decide b3=b4?.]

Experiments
Databases:
Cora [McCallum et al., IRJ, 2000]: 1295 records, 132 papers
BibServ.org [Richardson & Domingos, ISWC-03]: 21,805 records, unknown # of papers
Goal: de-duplicate bib. records, authors, and venues.
Pre-processing: form canopies [McCallum et al., KDD-00]
Compared with naïve Bayes (standard method), etc.
Measured area under the precision-recall curve (AUC).
Our approach wins across the board.

Results: Matching Venues on Cora

Overview: Motivation, Background, Representation, Inference, Learning, Applications, Discussion

Relation to Other Approaches

Representation  Logical language     Probabilistic language
Markov logic    First-order logic    Markov nets
RMNs            Conjunctive queries  Markov nets
PRMs            Frame systems        Bayes nets
KBMC            Horn clauses         Bayes nets
SLPs            Horn clauses         Bayes nets

Going Further
First-order logic is not enough.
We can "Markovize" other representations in the same way.
Lots to do.

Summary
NLP involves relational structure and uncertainty.
Markov logic combines first-order logic and probabilistic graphical models.
Syntax: First-order logic + Weights
Semantics: Templates for Markov networks
Inference: MaxWalkSat + KBMC + MCMC
Learning: Voted perceptron + PL + ILP
Applications to date: entity resolution, IE, etc.
Software: Alchemy