
Markov Logic Networks Hao Wu Mariyam Khalid

Motivation

How would we model this scenario: employees who are friends may convince each other to play, and an employee who plays may be fired?
– Logical Approach
– Statistical Approach

First Order Logic

Four types of symbols:
Constants: concrete objects in the domain (e.g., people: Anna, Bob)
Variables: range over the objects in the domain
Functions: mappings from tuples of objects to objects (e.g., GrandpaOf)
Predicates: relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Fired)

Logical connectives and quantifiers: ¬, ∧, ∨, ⇒, ⇔, ∀, ∃
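For example, knowledge about the running scenario could be written as first-order formulas (these particular formulas are illustrative, not taken from the original slides):

∀x ( Plays(x) ⇒ Fired(x) )
∀x ∀y ( Friends(x,y) ∧ Plays(x) ⇒ Plays(y) )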

First Order Logic

Advantages:
Compact representation of a wide variety of knowledge
Flexible, modular incorporation of a wide range of domain knowledge

Disadvantages:
No way to handle uncertainty
No handling of imperfect or contradictory knowledge

Markov Networks

Set of variables X = (X_1, ..., X_n). The distribution is given by

P(X = x) = (1/Z) ∏_k φ_k(x_{k})

where x_{k} is the state of the k-th clique, φ_k is the corresponding potential function, and Z = ∑_x ∏_k φ_k(x_{k}) is the normalization factor.

Markov Networks

Representation as a log-linear model:

P(X = x) = (1/Z) exp( ∑_i w_i f_i(x) )

In our case there will be only binary features:
– Each feature corresponds to one possible state of a clique
The weight is equal to the log of the potential: w_i = log φ_i
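As a minimal sketch of this computation, the following Python snippet evaluates the log-linear distribution for a toy model with two binary variables; the feature set and weights are illustrative assumptions, not taken from the slides:

    import itertools, math

    # Toy log-linear model over two binary variables (Plays, Fired).
    # The features and weights below are illustrative only.
    weights = {"both_true": 2.0, "both_false": 2.0}

    def features(plays, fired):
        # One binary feature per state of the clique we care about.
        return {"both_true": 1.0 if (plays and fired) else 0.0,
                "both_false": 1.0 if (not plays and not fired) else 0.0}

    def unnorm(x):
        return math.exp(sum(weights[k] * v for k, v in features(*x).items()))

    states = list(itertools.product([False, True], repeat=2))
    Z = sum(unnorm(x) for x in states)          # partition function
    for x in states:
        print(x, unnorm(x) / Z)                 # P(X = x)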

Markov Networks (example network with nodes: Has playing friend, Plays, Fired)

Markov Networks

Whether an employee A can convince another employee B to play depends on the liability of B: for a high liability of B the weight, and hence the probability, is higher (ω = 4) than for a low liability (ω = 2).

Markov Networks

Markov Networks

Advantages:
Efficient handling of uncertainty
Tolerant of imperfect and contradictory knowledge

Disadvantages:
Very complex networks for a wide variety of knowledge
Difficult to incorporate a wide range of domain knowledge

Motivation

Ideally we want a framework that can incorporate the advantages of both

Markov Logic Networks

1. Description of the problem
2. Translation into first-order logic
3. Construction of an MLN "template"
4. Derivation of a concrete MLN for a given set of constants
5. Compute whatever you want


Markov Logic Networks

Each formula corresponds to one clique template
Each formula has a weight that reflects its importance
If a world violates a formula it becomes less probable, but not impossible

The probability of a world x is P(X = x) = (1/Z) exp( ∑_i w_i n_i(x) ), where n_i(x) is the number of true groundings of formula i in x and w_i is its weight.


Markov Logic Networks

Constants: Alice (A) and Bob (B)

Ground atoms: Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B)

Ground feature over Friends(x,y), Plays(x), Plays(y):

Friends(x,y)  Plays(x)  Plays(y)   ω
True          True      True       3
True          True      False      0
True          False     True       0
True          False     False      3
False         True      True       3
False         True      False      3
False         False     True       3
False         False     False      3

Ground feature over Plays(x), Fired(x):

Plays(x)  Fired(x)   ω
True      True       2
True      False      0
False     True       0
False     False      2
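As a rough sketch of how such a ground network can be enumerated from the template (in Python; predicate and constant names follow the running example, everything else is an assumption):

    from itertools import product

    constants = ["A", "B"]                      # Alice and Bob

    # Predicate name -> arity, taken from the running example.
    predicates = {"Friends": 2, "Plays": 1, "Fired": 1}

    # Enumerate all ground atoms (the nodes of the ground Markov network).
    ground_atoms = [f"{name}({','.join(args)})"
                    for name, arity in predicates.items()
                    for args in product(constants, repeat=arity)]
    print(ground_atoms)
    # ['Friends(A,A)', 'Friends(A,B)', 'Friends(B,A)', 'Friends(B,B)',
    #  'Plays(A)', 'Plays(B)', 'Fired(A)', 'Fired(B)']

    # Each first-order formula contributes one ground clique (carrying the
    # formula's weight) per substitution of constants for its variables,
    # e.g. for a formula over variables (x, y):
    for x, y in product(constants, repeat=2):
        clique = (f"Friends({x},{y})", f"Plays({x})", f"Plays({y})")  # one ground clique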

MAP/MPE Inference

Given evidence, find the most probable state of the world.

Let x be the evidence and y the query atoms:

argmax_y P(y | x) = argmax_y (1/Z_x) exp( ∑_i w_i n_i(x, y) ) = argmax_y ∑_i w_i n_i(x, y)

This is a weighted MaxSAT problem.

WalkSAT

for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip the variable in c that maximizes the number of satisfied clauses
return failure

MaxWalkSAT

for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip the variable in c that maximizes ∑ weights(sat. clauses)
return failure, best solution found
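A small runnable Python sketch of MaxWalkSAT over weighted clauses; the clause encoding and the toy example at the bottom are assumptions for illustration, not the slides' implementation:

    import random

    # A weighted clause is (weight, [(atom, is_positive), ...]);
    # it is satisfied if at least one literal holds under the assignment.
    def satisfied(clause, assignment):
        _, literals = clause
        return any(assignment[atom] == positive for atom, positive in literals)

    def sat_weight(clauses, assignment):
        return sum(w for w, lits in clauses if satisfied((w, lits), assignment))

    def max_walk_sat(clauses, atoms, max_tries=10, max_flips=1000, p=0.5, threshold=None):
        if threshold is None:
            threshold = sum(w for w, _ in clauses)      # demand all clauses satisfied
        best = None
        for _ in range(max_tries):
            a = {v: random.choice([True, False]) for v in atoms}
            for _ in range(max_flips):
                if sat_weight(clauses, a) >= threshold:
                    return a
                _, lits = random.choice([c for c in clauses if not satisfied(c, a)])
                if random.random() < p:                 # random walk step
                    v = random.choice(lits)[0]
                else:                                   # greedy step
                    v = max((atom for atom, _ in lits),
                            key=lambda s: sat_weight(clauses, {**a, s: not a[s]}))
                a[v] = not a[v]
                if best is None or sat_weight(clauses, a) > sat_weight(clauses, best):
                    best = dict(a)
        return best

    # Toy example: Plays(A) => Fired(A) with weight 2, plus evidence Plays(A).
    clauses = [(2.0, [("Plays(A)", False), ("Fired(A)", True)]),
               (10.0, [("Plays(A)", True)])]
    print(max_walk_sat(clauses, ["Plays(A)", "Fired(A)"]))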

LazySAT

MaxWalkSAT may need a lot of memory.
Most ground networks are sparse.
Exploit this sparseness: ground clauses lazily.

for i ← 1 to max-tries do
    active_atoms ← atoms in clauses unsatisfied by DB
    active_clauses ← clauses activated by active_atoms
    soln ← random truth assignment to active_atoms
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) ≥ threshold then
            return soln
        c ← random unsatisfied clause
        with probability p
            v_f ← a randomly chosen variable from c
        else
            for each variable v in c do
                compute DeltaGain(v), using weighted_KB if v ∉ active_atoms
            v_f ← v with highest DeltaGain(v)
        if v_f ∉ active_atoms then
            activate v_f and add clauses activated by v_f
        soln ← soln with v_f flipped
return failure, best soln found

Inference

What is P(F1 | F2, M_{L,C})?

P(F1 | F2, M_{L,C}) = P(F1 ∧ F2 | M_{L,C}) / P(F2 | M_{L,C})
                    = ∑_{x ∈ X_F1 ∩ X_F2} P(X = x | M_{L,C}) / ∑_{x ∈ X_F2} P(X = x | M_{L,C})

where X_Fi is the set of worlds in which Fi holds.

However, directly computing this is intractable in most cases.

Inference

First we construct the minimal network required to answer the query, given the evidence:

network ← Ø
queue ← query nodes
repeat
    node ← dequeue(queue)
    add node to network
    if node not in evidence then
        add neighbors(node) to queue
until queue = Ø
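A compact Python sketch of this construction, with the ground network given as an adjacency dictionary (the toy graph below is an assumption based on the running example):

    from collections import deque

    def minimal_network(query_nodes, evidence, neighbors):
        # Breadth-first expansion from the query nodes, stopping at evidence atoms.
        # neighbors: dict mapping each ground atom to the atoms it shares a ground
        # clause with (assumed precomputed from the grounded MLN).
        network = set()
        queue = deque(query_nodes)
        while queue:
            node = queue.popleft()
            if node in network:
                continue                                  # already added
            network.add(node)
            if node not in evidence:
                queue.extend(neighbors.get(node, []))
        return network

    # Toy adjacency for the running example (the edges are assumptions):
    neighbors = {
        "Fired(A)": ["Plays(A)"],
        "Plays(A)": ["Fired(A)", "Friends(A,B)", "Plays(B)"],
        "Plays(B)": ["Plays(A)", "Friends(A,B)", "Fired(B)"],
    }
    print(minimal_network(["Fired(A)"], {"Friends(A,B)", "Plays(B)"}, neighbors))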

Example

Ground atoms: Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B)

Query: Fired(A)
Evidence: Friends(A,B), Friends(B,A), Plays(B)

Inference

In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo).

Gibbs sampling:

state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of sampled states in which F is true
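A toy Gibbs sampler over binary ground atoms, to make the update concrete; the log-linear score, weights, and query below are illustrative assumptions, not a full MLN implementation:

    import math, random

    def conditional_true_prob(x, state, score):
        # P(x = True | all other atoms), for a log-linear score(state).
        s_true = score({**state, x: True})
        s_false = score({**state, x: False})
        return math.exp(s_true) / (math.exp(s_true) + math.exp(s_false))

    def gibbs(atoms, score, query, num_samples=5000):
        state = {v: random.choice([True, False]) for v in atoms}
        hits = 0
        for _ in range(num_samples):
            for x in atoms:
                state[x] = random.random() < conditional_true_prob(x, state, score)
            if query(state):
                hits += 1
        return hits / num_samples          # fraction of samples where the query holds

    # Toy model: weight 2 on the feature "Plays(A) and Fired(A)" (an assumption).
    def score(s):
        return 2.0 * (s["Plays(A)"] and s["Fired(A)"])

    print(gibbs(["Plays(A)", "Fired(A)"], score, query=lambda s: s["Fired(A)"]))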

Inference

But deterministic dependencies can break MCMC.

MC-SAT:
– Combines MCMC and WalkSAT to handle such dependencies

Learning

Learning weights
Learning structure (formulas)

Learning Weights

Assumption: closed-world assumption (anything not observed in the database is false)
– Otherwise, use EM

Learning Weights: Generative

Use the pseudo-likelihood:

PL(X = x) = ∏_i P(x_i | MB_x(x_i))

where MB_x(x_i) is the state of the Markov blanket of x_i in the data.

Pseudo-likelihood is efficient, but it handles long-range dependencies poorly.

Voted Perceptron

w_i ← 0
for t ← 1 to T do
    y_MAP ← Viterbi(x)
    w_i ← w_i + η [count_i(y_Data) – count_i(y_MAP)]
return ∑_t w_i / T

For MLNs, Viterbi is replaced by MaxWalkSAT:

w_i ← 0
for t ← 1 to T do
    y_MAP ← MaxWalkSAT(x)
    w_i ← w_i + η [count_i(y_Data) – count_i(y_MAP)]
return ∑_t w_i / T
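A Python sketch of the averaged update above, with MAP inference (e.g. a MaxWalkSAT call) and feature counting supplied as assumed callables; the tiny example at the bottom uses stubs:

    def voted_perceptron(x, y_data, count, map_infer, num_weights, T=100, eta=0.1):
        # count(y)        -> list of feature counts n_i(y)        (assumed callable)
        # map_infer(w, x) -> MAP assignment y given weights and evidence,
        #                    e.g. via MaxWalkSAT                  (assumed callable)
        w = [0.0] * num_weights
        w_total = [0.0] * num_weights
        for _ in range(T):
            y_map = map_infer(w, x)
            counts_data, counts_map = count(y_data), count(y_map)
            w = [wi + eta * (cd - cm) for wi, cd, cm in zip(w, counts_data, counts_map)]
            w_total = [wt + wi for wt, wi in zip(w_total, w)]
        return [wt / T for wt in w_total]       # averaged weights

    # Tiny illustration with one feature and stub inference (not a real MaxWalkSAT):
    count = lambda y: [float(y)]
    map_infer = lambda w, x: w[0] > 0.5
    print(voted_perceptron(x=None, y_data=True, count=count,
                           map_infer=map_infer, num_weights=1))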

Learning Structure

Structure (the formulas) can also be learned:
– Start with a hand-coded KB
– Add/remove literals, flip signs
– Evaluate using pseudo-likelihood + a structure prior
– Search


Alchemy

Open-source software package for Markov logic, developed at the University of Washington

Alchemy Example: Entity Resolution

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c'') => SameField(f,c,c'')
SameCit(c,c') ^ SameCit(c',c'') => SameCit(c,c'')

Applications

Information extraction
Entity resolution
Web mining
Natural language processing
Social network analysis
And more

Application: Jointly Disambiguating and Clustering Concepts and Entities

– Disambiguation
– Clustering
– Jointly disambiguating and clustering

Application: Jointly Disambiguating and Clustering Concepts and Entities

Features
– Local features
  Prior probability (p3, f7)
  Relatedness (p4, f8, f11)
  Local context similarity (p5, f9)
  String edit distance (p6, f10)
– Global features
  Shared lemma (p7, f12)
  Head match (p8, f6)
  Acronyms (p8, f6)
  Cross-document n-gram feature (p9, f13)

Application: Jointly Disambiguating and Clustering Concepts and Entities System


Application: Jointly Disambiguating and Clustering Concepts and Entities Evaluation


Application: SRL

Riedel & Meza-Ruiz (2008): semantic F-score 74.59%

Three stages:
1. Predicate identification
2. Argument identification
3. Argument classification

Application: SRL

They used 5 hidden predicates:
– Predicate identification: isPredicate(p) [p is a position], Sense(p,e) [e is a sense]
– Argument identification: isArgument(a) [a is a word], hasRole(p,a)
– Argument classification: Role(p,a,r) [r is a role]

Application: SRL Local formulae W(l2,l1) Lemma(a1,l1) W(l2,l3) hasRole(p,a1) hasRole(p,a2) Lemma(a1,l1)

Application: SRL

Global formulae act as structural constraints:
– They ensure consistency between all stages
– There are also some soft constraints

Application: SRL Global formulae hasRole(p,a1) isArg(a1) Role(p,a1,r1) hasRole(p,a2) isArg(a2) Role(p,a1,r2) isPredicate(p) sense(p,e)

Application: SRL

They compared five models:

Model        WSJ       Brown     Train Time   Test Time
Full         75.72%    65.38%    25h          24m
Up           76.96%    63.86%    11h          14m
Down         73.48%    59.34%    22h          23m
Isolated     60.49%    48.12%    11h          14m
Structural   74.93%    64.23%    22h          33m
