
1 Markov Logic Networks Hao Wu Mariyam Khalid

2 Motivation

3 How would we model this scenario?

4 Motivation How would we model this scenario? – Logical Approach

5 Motivation How would we model this scenario? – Logical Approach – Statistical Approach

6 First Order Logic Four types of symbols:
– Constants: concrete objects in the domain (e.g., people: Anna, Bob)
– Variables: range over the objects in the domain
– Functions: mappings from tuples of objects to objects (e.g., GrandpaOf)
– Predicates: relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Fired)

7 First Order Logic Four types of symbols:
– Constants: concrete objects in the domain (e.g., people: Anna, Bob)
– Variables: range over the objects in the domain
– Functions: mappings from tuples of objects to objects (e.g., GrandpaOf)
– Predicates: relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Fired)
– Logical connectives and quantifiers: ¬, ∧, ∨, ⇒, ⇔; ∀, ∃

8 First Order Logic

9 First Order Logic Advantages:
– Compactly represents a wide variety of knowledge
– Flexibly and modularly incorporates a wide range of domain knowledge

10 First Order Logic Advantages:
– Compactly represents a wide variety of knowledge
– Flexibly and modularly incorporates a wide range of domain knowledge
Disadvantages:
– No way to handle uncertainty
– No way to handle imperfect or contradictory knowledge

11 Markov Networks Set of variables X = (X_1, ..., X_n). The distribution is given by
P(X = x) = (1/Z) ∏_k φ_k(x_{k})
with the normalization factor (partition function) Z = ∑_x ∏_k φ_k(x_{k}) and φ_k as the potential function of the k-th clique.

12 Markov Networks Representation as a log-linear model:
P(X = x) = (1/Z) exp( ∑_j w_j f_j(x) )

13 Markov Networks Representation as a log-linear model:
P(X = x) = (1/Z) exp( ∑_j w_j f_j(x) )
In our case there are only binary features:
– each feature f_j is the indicator of one possible clique state
The weight is equal to the log of the corresponding potential: w_j = log φ_k(x_{k})
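To make the log-linear form concrete, here is a minimal Python sketch; the two-variable clique and its weights are our own toy example, not from the slides. Each binary feature is the indicator of one clique state, and its weight is the log of that state's potential value.

import itertools, math

# Toy example: one clique over two binary variables (Plays, Fired).
# One indicator feature per clique state; weight = log(potential).
weights = {
    (True, True): 2.0, (True, False): 0.0,
    (False, True): 0.0, (False, False): 2.0,
}

def unnormalized(x):
    # exp of the weighted feature sum; exactly one feature is active here
    return math.exp(weights[x])

states = list(itertools.product([True, False], repeat=2))
Z = sum(unnormalized(x) for x in states)    # normalization factor
P = {x: unnormalized(x) / Z for x in states}
print(P[(True, True)])                      # e^2 / (2e^2 + 2), about 0.44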

14 Markov Networks [Figure: example network with nodes Friends, Plays, Fired]

15 Markov Networks

16 Whether an employee A can convince another employee B to play depends on how easily B is influenced: if B is easily influenced the formula gets a larger weight (ω = 4) than if B is not (ω = 2). All else being equal, each satisfied grounding of a weight-ω formula makes a world e^ω times more probable.

17 Markov Networks

18 Markov Networks Advantages:
– Efficiently handle uncertainty
– Tolerant of imperfect and contradictory knowledge

19 Markov Networks Advantages:
– Efficiently handle uncertainty
– Tolerant of imperfect and contradictory knowledge
Disadvantages:
– Very complex networks for a wide variety of knowledge
– Difficult to incorporate a wide range of domain knowledge

20 Motivation

21 Ideally we want a framework that can incorporate the advantages of both

22 Markov Logic Networks
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

23 Markov Logic Networks
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

24 Markov Logic Networks

25
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

26 Markov Logic Networks
– Each formula corresponds to one clique
– Each formula carries a weight that reflects its importance
– If a world violates a formula it becomes less probable, but not impossible
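These three bullets are exactly the standard MLN distribution (Richardson & Domingos, 2006): a world x has probability P(X = x) = (1/Z) exp( ∑_i w_i n_i(x) ), where n_i(x) is the number of true groundings of formula F_i in x and w_i is the formula's weight. Violating a grounding of F_i just drops w_i from the exponent; it never zeroes the probability, which is how MLNs soften pure logic.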

27 Markov Logic Networks
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

28 Markov Networks Constants: Alice (A) and Bob (B). [Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B)]

29 Markov Logic Network [Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B)]

Friends(x,y) ⇒ (Plays(x) ⇔ Plays(y)):
Friends(x,y)  Plays(x)  Plays(y)  ω
True          True      True      3
True          True      False     0
True          False     True      0
True          False     False     3
False         True      True      3
False         True      False     3
False         False     True      3
False         False     False     3

Plays(x) ⇔ Fired(x):
Plays(x)  Fired(x)  ω
True      True      2
True      False     0
False     True      0
False     False     2
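As a sanity check, a brute-force Python sketch (our own encoding of the two tables above; only feasible because this ground network has just eight atoms) that scores every world and normalizes:

import itertools, math

people = ["A", "B"]
atoms = ([("Friends", x, y) for x in people for y in people]
         + [("Plays", x) for x in people]
         + [("Fired", x) for x in people])

def score(world):
    """Sum of the weights of satisfied ground features (tables above)."""
    s = 0.0
    for x in people:
        for y in people:
            # Friends(x,y) => (Plays(x) <=> Plays(y)), weight 3 when satisfied
            if (not world[("Friends", x, y)]
                    or world[("Plays", x)] == world[("Plays", y)]):
                s += 3.0
    for x in people:
        # Plays(x) <=> Fired(x), weight 2 when satisfied
        if world[("Plays", x)] == world[("Fired", x)]:
            s += 2.0
    return s

worlds = [dict(zip(atoms, v))
          for v in itertools.product([True, False], repeat=len(atoms))]
Z = sum(math.exp(score(w)) for w in worlds)    # 2^8 = 256 worlds
everyone = {a: True for a in atoms}            # the all-true world
print(math.exp(score(everyone)) / Z)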

30 MAP/MPE Inference Given evidence, find the most probable state.

31 MAP/MPE Inference Given evidence, find the most probable state: arg max_y P(y | x), where x is the evidence and y the remaining variables.

32 MAP/MPE Inference With x the evidence: arg max_y P(y | x) = arg max_y ∑_i w_i n_i(x, y). This is a weighted MaxSAT problem.

33 WalkSAT
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes number of satisfied clauses
return failure

34 MaxWalkSAT
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes ∑ weights(sat. clauses)
return failure, best solution found
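A runnable Python rendering of MaxWalkSAT (the clause encoding and names are ours): clauses are lists of (variable, sign) literals, satisfied when any variable matches its sign. Plain WalkSAT is the special case with unit weights and threshold equal to the number of clauses.

import random

def max_walksat(clauses, weights, variables,
                max_tries=10, max_flips=1000, p=0.5, threshold=None):
    """clauses: list of clauses; each clause is a list of (var, sign) pairs."""
    if threshold is None:
        threshold = sum(weights)            # demand full satisfaction
    def sat_weight(a):
        return sum(w for c, w in zip(clauses, weights)
                   if any(a[v] == s for v, s in c))
    best, best_w = None, float("-inf")
    for _ in range(max_tries):
        a = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            if sat_weight(a) >= threshold:
                return a
            unsat = [c for c in clauses
                     if not any(a[v] == s for v, s in c)]
            c = random.choice(unsat)
            if random.random() < p:         # random walk step
                v = random.choice(c)[0]
            else:                           # greedy step
                def gain(v):
                    a[v] = not a[v]
                    g = sat_weight(a)
                    a[v] = not a[v]
                    return g
                v = max((v for v, _ in c), key=gain)
            a[v] = not a[v]
            if sat_weight(a) > best_w:
                best, best_w = dict(a), sat_weight(a)
    return best                             # best solution found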

35 LazySAT MaxWalkSAT may need a lot of memory.

36 LazySAT MaxWalkSAT may need a lot of memory. Most networks are sparse. Exploit sparseness: ground clauses lazily.

37 LazySAT
for i ← 1 to max-tries do
    active_atoms ← atoms in clauses unsatisfied by DB
    active_clauses ← clauses activated by active_atoms
    soln ← random truth assignment to active_atoms
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) ≥ threshold then
            return soln
        c ← random unsatisfied clause
        with probability p
            v_f ← a randomly chosen variable from c
        else
            for each variable v in c do
                compute DeltaGain(v), using weighted_KB if v ∉ active_atoms
            v_f ← v with highest DeltaGain(v)
        if v_f ∉ active_atoms then
            activate v_f and add clauses activated by v_f
        soln ← soln with v_f flipped
return failure, best soln found
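The heart of LazySAT in Python (a sketch; clauses_containing is a hypothetical on-demand grounding function, not part of any real API): atoms absent from the active set are implicitly false, and a clause is grounded only when a flip first touches one of its atoms.

def lazy_flip(v, soln, active_atoms, active_clauses, clauses_containing):
    """Flip atom v, grounding its clauses on first touch.

    clauses_containing(v): hypothetical function that grounds and returns
    the clauses in which atom v appears."""
    if v not in active_atoms:
        active_atoms.add(v)                           # activate v ...
        active_clauses.update(clauses_containing(v))  # ... and its clauses
        soln[v] = False                               # inactive atoms were false
    soln[v] = not soln[v]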

38 Inference What is P(F1 | F2, M_{L,C})?

39 Inference What is P(F1 | F2, M_{L,C})?
P(F1 | F2, M_{L,C}) = P(F1 ∧ F2 | M_{L,C}) / P(F2 | M_{L,C}) = ( ∑_{x ∈ X_{F1} ∩ X_{F2}} P(x | M_{L,C}) ) / ( ∑_{x ∈ X_{F2}} P(x | M_{L,C}) )
where X_F is the set of worlds in which F holds.

40 Inference However, directly computing this equation is intractable in most cases.

41 Inference First we need to construct a minimal network for each set of evidence.

42 Inference First we need to construct a minimal network for each set of evidence.
network ← Ø
queue ← query nodes
repeat
    node ← dequeue(queue)
    add node to network
    if node not in evidence then
        add neighbors(node) to queue
until queue = Ø
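The same construction in Python (names ours): a plain breadth-first expansion from the query nodes that stops at evidence, since evidence atoms block the influence of everything behind them.

from collections import deque

def minimal_network(query_nodes, evidence, neighbors):
    """neighbors(n): iterable of atoms sharing some ground clause with n."""
    network = set()
    queue = deque(query_nodes)
    while queue:
        node = queue.popleft()
        if node in network:
            continue                        # already added
        network.add(node)
        if node not in evidence:            # evidence blocks expansion
            queue.extend(neighbors(node))
    return network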

43–49 Example Query: Fired(A). Evidence: Friends(A,B), Friends(B,A), Plays(B). [Figures: the minimal network for the query is built step by step over the ground atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B)]

50 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo).

51 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo). Gibbs sampling:
state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
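The same sampler in Python (names ours); conditional(v, state) stands in for P(v = True | its Markov blanket), which in an MLN is a logistic function of the weights of the ground clauses containing v.

import random

def gibbs(variables, conditional, query, num_samples):
    """Estimate P(query) by Gibbs sampling.

    conditional(v, state): P(v = True | rest of state) (not shown here).
    query(state): True iff the query formula holds in state."""
    state = {v: random.random() < 0.5 for v in variables}
    hits = 0
    for _ in range(num_samples):
        for v in variables:
            state[v] = random.random() < conditional(v, state)
        hits += query(state)
    return hits / num_samples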

52 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo). But deterministic dependencies can break MCMC.

53 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo). But deterministic dependencies can break MCMC. MC-SAT:
– Combines MCMC and WalkSAT
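A compressed sketch of the MC-SAT loop in Python (our rendering of Poon & Domingos, 2006): each round keeps every currently satisfied clause with probability 1 - e^(-w), then jumps to a near-uniform satisfying assignment of the kept subset. The real algorithm uses SampleSAT for that jump; here sat_sample is a hypothetical stand-in.

import math, random

def mc_sat(clauses, weights, variables, sat_sample, num_samples):
    """sat_sample(subset, variables): near-uniform satisfying assignment
    of the given clauses (stand-in for SampleSAT)."""
    hard = [c for c, w in zip(clauses, weights) if w == float("inf")]
    state = sat_sample(hard, variables)     # start from the hard clauses
    samples = []
    for _ in range(num_samples):
        kept = [c for c, w in zip(clauses, weights)
                if any(state[v] == s for v, s in c)      # satisfied now
                and random.random() < 1 - math.exp(-w)]  # keep w.p. 1 - e^-w
        state = sat_sample(kept, variables)
        samples.append(dict(state))
    return samples                          # P(F) = fraction of samples where F holds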

54 Learning
– Learning weights
– Learning structure (formulas)

55 Learning Weights Assumption: the closed-world assumption, i.e., anything not observed is false
– Otherwise use EM

56 Learning Weights: Generative Using pseudo-likelihood:

57 Pseudo-likelihood: log PL_w(x) = ∑_l log P_w(X_l = x_l | MB_x(X_l)), i.e., the sum over ground atoms of the log-probability of each atom given its Markov blanket.

58 It is efficient, but handles long-range dependencies poorly.
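For binary ground atoms the pseudo-log-likelihood has a closed form: flipping atom l changes the world's total weight by some delta, and P(X_l = x_l | MB) = sigmoid(delta). A sketch (our own encoding; score recomputes globally for clarity, though only the clauses containing l actually change):

import math

def pseudo_log_likelihood(world, atoms, score):
    """score(world): sum of weights of the ground clauses satisfied in world."""
    pll = 0.0
    base = score(world)
    for l in atoms:
        flipped = dict(world)
        flipped[l] = not flipped[l]
        delta = base - score(flipped)           # weight lost by flipping l
        pll += -math.log1p(math.exp(-delta))    # log sigmoid(delta)
    return pll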

59 Voted Perceptron
With Viterbi inference:
    w_i ← 0
    for t ← 1 to T do
        y_MAP ← Viterbi(x)
        w_i ← w_i + η [ count_i(y_Data) − count_i(y_MAP) ]
    return ∑_t w_i / T
For MLNs, with MaxWalkSAT as the MAP inference engine:
    w_i ← 0
    for t ← 1 to T do
        y_MAP ← MaxWalkSAT(x)
        w_i ← w_i + η [ count_i(y_Data) − count_i(y_MAP) ]
    return ∑_t w_i / T
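The same update in Python (names ours): map_inference stands in for Viterbi or MaxWalkSAT, and counts(y) for the true-grounding counts n_i(x, y).

def voted_perceptron(x, y_data, counts, map_inference, T, eta=1.0):
    """counts(y): list of feature counts n_i(x, y).
    map_inference(w, x): MAP assignment y under weights w (e.g. MaxWalkSAT)."""
    n = len(counts(y_data))
    w = [0.0] * n
    total = [0.0] * n
    for _ in range(T):
        y_map = map_inference(w, x)
        c_data, c_map = counts(y_data), counts(y_map)
        for i in range(n):
            w[i] += eta * (c_data[i] - c_map[i])
            total[i] += w[i]
    return [t / T for t in total]           # averaged (voted) weights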

60 Learning Structure Structure can also be learned:
– Start with a hand-coded KB
– Add/remove literals, flip signs
– Score with pseudo-likelihood + a structure prior
– Search

62 Alchemy Open-source software package developed at the University of Washington.

63 Alchemy: Example Entity resolution for citations:
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c'') => SameField(f,c,c'')
SameCit(c,c') ^ SameCit(c',c'') => SameCit(c,c'')
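Two notes on the syntax above: a '+' in front of a variable (as in Token(+t,i,c)) tells Alchemy to learn a separate weight for each value of that variable, and => / <=> are the usual logical connectives. A typical command-line session might look roughly like the following; the file names are hypothetical and the exact flags may vary across Alchemy versions:

learnwts -d -i ner.mln -o ner-trained.mln -t train.db -ne InField,SameField,SameCit
infer -ms -i ner-trained.mln -e test.db -r query.results -q SameCit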

64 Applications
– Information extraction
– Entity resolution
– Web mining
– Natural language processing
– Social network analysis
– And more

65 Application: Jointly Disambiguating and Clustering Concepts and Entities Disambiguation

66 Application: Jointly Disambiguating and Clustering Concepts and Entities Clustering

67 Application: Jointly Disambiguating and Clustering Concepts and Entities Jointly Disambiguating and clustering

68 Application: Jointly Disambiguating and Clustering Concepts and Entities

71 Features
– Local features:
    Prior probability (p3, f7)
    Relatedness (p4, f8, f11)
    Local context similarity (p5, f9)
    String edit distance (p6, f10)

72 Application: Jointly Disambiguating and Clustering Concepts and Entities Features
– Local features:
    Prior probability (p3, f7)
    Relatedness (p4, f8, f11)
    Local context similarity (p5, f9)
    String edit distance (p6, f10)
– Global features:
    Shared lemma (p7, f12)
    Head match (p8, f6)
    Acronyms (p8, f6)
    Cross-document n-gram feature (p9, f13)

73 Application: Jointly Disambiguating and Clustering Concepts and Entities System

74 Application: Jointly Disambiguating and Clustering Concepts and Entities System

75 Application: Jointly Disambiguating and Clustering Concepts and Entities System

76 Application: Jointly Disambiguating and Clustering Concepts and Entities System

77 Application: Jointly Disambiguating and Clustering Concepts and Entities System

78 Application: Jointly Disambiguating and Clustering Concepts and Entities Evaluation

79 Application: Jointly Disambiguating and Clustering Concepts and Entities Evaluation

80 Application: SRL Riedel & Meza-Ruiz (2008). Semantic F-score: 74.59%. Three stages:
– 1. Predicate identification
– 2. Argument identification
– 3. Argument classification

81 Application: SRL They used five hidden predicates:
– Predicate identification: isPredicate(p) [p is a position], Sense(p,e) [e is a sense]
– Argument identification: isArgument(a) [a is a word], hasRole(p,a)
– Argument classification: Role(p,a,r) [r is a role]

82 Application: SRL Local formulae [Figure: example local formulae combining lexical predicates such as Lemma(a1,l1) and W(l2,l1) to predict hasRole(p,a)]

83 Application: SRL Global formulae as structural constraints
– Ensure consistency across all stages. Example:

84 Application: SRL Global formulae as structural constraints
– Ensure consistency across all stages. Also some soft constraints:

85 Application: SRL Global formulae, e.g.:
hasRole(p,a1) => isArgument(a1)
Role(p,a1,r1) => hasRole(p,a1)
sense(p,e) => isPredicate(p)

86 Application: SRL They compared five models:
Model       WSJ      Brown    Train Time  Test Time
Full        75.72%   65.38%   25h         24m
Up          76.96%   63.86%   11h         14m
Down        73.48%   59.34%   22h         23m
Isolated    60.49%   48.12%   11h         14m
Structural  74.93%   64.23%   22h         33m

87 Application: SRL Global formulae as structural constraints
– Ensure consistency across all stages. Example:

