Markov Logic Networks
Hao Wu, Mariyam Khalid
Motivation
How would we model this scenario?
– Logical Approach
– Statistical Approach
First Order Logic
Four types of symbols:
Constants: concrete objects in the domain (e.g., people: Anna, Bob)
Variables: range over the objects in the domain
Functions: mappings from tuples of objects to objects (e.g., GrandpaOf)
Predicates: relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Fired)
Logical connectives and quantifiers: ¬, ∧, ∨, ⇒, ⇔, ∀, ∃
First Order Logic
Example: the office scenario can be encoded as FOL formulas, as sketched below.
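A sketch of what the lost example formulas plausibly looked like, built from the Friends/Plays/Fired predicates and the weighted formulas that appear later in the deck (this reconstruction is an assumption, not the original slide content):

\[ \forall x\,\forall y\;\; \mathit{Friends}(x,y) \Rightarrow \big(\mathit{Plays}(x) \Leftrightarrow \mathit{Plays}(y)\big) \]
\[ \forall x\;\; \mathit{Plays}(x) \Leftrightarrow \mathit{Fired}(x) \]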
First Order Logic
Advantages:
Compactly represents a wide variety of knowledge
Flexibly and modularly incorporates a wide range of domain knowledge
Disadvantages:
No way to handle uncertainty
No handling of imperfect or contradictory knowledge
Markov Networks
Set of variables: $X = (X_1, \dots, X_n)$
The distribution is given by:
$P(X = x) = \frac{1}{Z} \prod_k \phi_k(x_{\{k\}})$
with $Z$ as the normalization factor (partition function) and $\phi_k$ as the potential function over the $k$-th clique.
Markov Networks
Representation as a log-linear model:
$P(X = x) = \frac{1}{Z} \exp\big(\sum_i w_i f_i(x)\big)$
Markov Networks
In our case there will be only binary features:
– each feature $f_i$ corresponds to one possible state of a clique
The weight is equal to the log of the potential: $w_i = \log \phi_i$
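A minimal Python sketch of this log-linear view, using the Plays/Fired weights from the table later in the deck (the state representation and function names are illustrative, not from the slides):

import itertools
import math

# Binary features for the Plays(x) <=> Fired(x) clique from the deck:
# weight 2 when the two atoms agree, 0 otherwise.
features = [
    (lambda s: s["Plays"] and s["Fired"], 2.0),
    (lambda s: (not s["Plays"]) and (not s["Fired"]), 2.0),
]

def unnormalized(state):
    # exp(sum_i w_i f_i(x)) for one joint state
    return math.exp(sum(w for f, w in features if f(state)))

states = [dict(zip(("Plays", "Fired"), vals))
          for vals in itertools.product([False, True], repeat=2)]
Z = sum(unnormalized(s) for s in states)        # partition function
probs = {tuple(s.values()): unnormalized(s) / Z for s in states}
# The two agreeing states each get probability e^2 / (2e^2 + 2).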
Markov Networks
(Figure: network over the nodes Friends, Plays, Fired.)
Markov Networks
Whether an employee A can convince another employee B to play depends on the liability of B: for a high liability of B there is a higher weight (ω = 4) than for a low liability (ω = 2).
Markov Networks
Advantages:
Efficiently handle uncertainty
Tolerant of imperfect and contradictory knowledge
Disadvantages:
Very complex networks for a wide variety of knowledge
Difficult to incorporate a wide range of domain knowledge
Motivation
Ideally we want a framework that combines the advantages of both approaches.
Markov Logic Networks
1. Description of the problem
2. Translation into First-Order Logic
3. Construction of an MLN "template"
4. Derivation of a concrete MLN for a given set of constants
5. Compute whatever you want
Markov Logic Networks
Each formula corresponds to one clique template.
Each formula carries a weight that reflects its importance.
A world that violates a formula is less probable, but not impossible.
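This yields the standard MLN joint distribution (reconstructed here, since the slide's equation did not survive; $n_i(x)$ is the number of true groundings of formula $i$ in world $x$):

\[ P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i\, n_i(x) \Big), \qquad Z = \sum_{x'} \exp\Big( \sum_i w_i\, n_i(x') \Big) \]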
Markov Networks
Constants: Alice (A) and Bob (B)
Ground atoms: Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B)
Markov Logic Network
Ground atoms: Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B)

Friends(x,y) => (Plays(x) <=> Plays(y)):
Friends(x,y)  Plays(x)  Plays(y)  ω
True          True      True      3
True          True      False     0
True          False     True      0
True          False     False     3
False         any       any       3

Plays(x) <=> Fired(x):
Plays(x)  Fired(x)  ω
True      True      2
True      False     0
False     True      0
False     False     2
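A small Python sketch of grounding this template for the constants {A, B} and scoring a world (an illustrative reconstruction; the names and representation are mine, not the deck's):

import itertools
import math

CONSTANTS = ["A", "B"]
ATOMS = ([("Friends", x, y) for x in CONSTANTS for y in CONSTANTS]
         + [(p, x) for p in ("Plays", "Fired") for x in CONSTANTS])

def f_social(world, x, y):
    # Friends(x,y) => (Plays(x) <=> Plays(y)), weight 3
    return (not world[("Friends", x, y)]) or (world[("Plays", x)] == world[("Plays", y)])

def f_fired(world, x):
    # Plays(x) <=> Fired(x), weight 2
    return world[("Plays", x)] == world[("Fired", x)]

def log_weight(world):
    # sum_i w_i * n_i(x): each weight times its number of satisfied groundings
    total = sum(3.0 * f_social(world, x, y)
                for x, y in itertools.product(CONSTANTS, repeat=2))
    total += sum(2.0 * f_fired(world, x) for x in CONSTANTS)
    return total

# Normalizing over all 2^8 truth assignments gives the MLN distribution.
worlds = [dict(zip(ATOMS, vals))
          for vals in itertools.product([False, True], repeat=len(ATOMS))]
Z = sum(math.exp(log_weight(w)) for w in worlds)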
MAP/MPE Inference
Given evidence, find the most probable state of the remaining atoms.
Let x be the evidence and y the unknowns; maximizing P(y | x) reduces to a weighted MaxSAT problem, as the derivation below shows.
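The slide's equations did not survive; the standard reduction is:

\[ \arg\max_y P(y \mid x) \;=\; \arg\max_y \frac{1}{Z_x} \exp\Big( \sum_i w_i\, n_i(x, y) \Big) \;=\; \arg\max_y \sum_i w_i\, n_i(x, y) \]

so MAP inference maximizes the total weight of satisfied ground clauses.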
WalkSAT
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip the variable in c that maximizes the number of satisfied clauses
return failure
MaxWalkSAT
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip the variable in c that maximizes ∑ weights(sat. clauses)
return failure, best solution found
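A runnable Python sketch of MaxWalkSAT under simple assumptions (clauses as lists of signed integer literals; all names and defaults here are illustrative, not from the slides):

import random

def maxwalksat(clauses, weights, n_vars, max_tries=10, max_flips=1000,
               p=0.5, threshold=None):
    # clauses: list of lists of ints; literal +v / -v means var v true / false
    def satisfied(cl, a):
        return any(a[abs(l)] == (l > 0) for l in cl)
    def score(a):
        return sum(w for cl, w in zip(clauses, weights) if satisfied(cl, a))
    target = threshold if threshold is not None else sum(weights)
    best, best_score = None, float("-inf")
    for _ in range(max_tries):
        a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            s = score(a)
            if s > best_score:
                best, best_score = dict(a), s
            if s >= target:
                return a
            unsat = [cl for cl, w in zip(clauses, weights)
                     if w > 0 and not satisfied(cl, a)]
            if not unsat:
                return a
            cl = random.choice(unsat)
            if random.random() < p:
                v = abs(random.choice(cl))   # random-walk step
            else:                            # greedy step: best flip in c
                v = max((abs(l) for l in cl),
                        key=lambda u: score({**a, u: not a[u]}))
            a[v] = not a[v]
    return best                              # best assignment found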
LazySAT
MaxWalkSAT may need a lot of memory, but most networks are sparse.
Exploit sparseness: ground clauses lazily.
LazySAT
for i ← 1 to max-tries do
    active_atoms ← atoms in clauses unsatisfied by DB
    active_clauses ← clauses activated by active_atoms
    soln ← random truth assignment to active_atoms
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) ≥ threshold then
            return soln
        c ← random unsatisfied clause
        with probability p
            v_f ← a randomly chosen variable from c
        else
            for each variable v in c do
                compute DeltaGain(v), using weighted_KB if v ∉ active_atoms
            v_f ← v with highest DeltaGain(v)
        if v_f ∉ active_atoms then
            activate v_f and add clauses activated by v_f
        soln ← soln with v_f flipped
return failure, best soln found
Inference
What is P(Formula1 | Formula2, M_{L,C})?
$P(F_1 \mid F_2, M_{L,C}) = \frac{\sum_{x \in \mathcal{X}_{F_1} \cap \mathcal{X}_{F_2}} P(X = x)}{\sum_{x \in \mathcal{X}_{F_2}} P(X = x)}$
where $\mathcal{X}_F$ is the set of worlds in which $F$ holds.
However, directly computing this quantity is intractable in most cases.
Inference
First we construct a minimal network for each set of query and evidence atoms:
network ← Ø
queue ← query nodes
repeat
    node ← dequeue(queue)
    add node to network
    if node not in evidence then
        add neighbors(node) to queue
until queue = Ø
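A hedged Python rendering of this construction, assuming a neighbors() function that returns the ground atoms sharing a clique with a given atom (the names are mine, not the deck's):

from collections import deque

def minimal_network(query_nodes, evidence, neighbors):
    # BFS from the query atoms, stopping expansion at evidence atoms.
    network, queue = set(), deque(query_nodes)
    while queue:
        node = queue.popleft()
        if node in network:
            continue                      # already in the network
        network.add(node)
        if node not in evidence:          # evidence cuts off expansion
            queue.extend(neighbors(node))
    return network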
Example
Query: Fired(A)
Evidence: Friends(A,B), Friends(B,A), Plays(B)
(The original deck steps through this construction over several slides, growing the network outward from Fired(A) and stopping expansion at the evidence atoms.)
Inference
In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo).
Gibbs sampling:
state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of sampled states in which F is true
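The per-variable conditional used in the sampling step, reconstructed in the standard MLN form ($F_l$ is the set of ground formulas containing $X_l$, and $MB$ denotes the Markov blanket):

\[ P(X_l = x_l \mid MB(X_l)) = \frac{\exp\big(\sum_{f_i \in F_l} w_i f_i(X_l = x_l)\big)}{\exp\big(\sum_{f_i \in F_l} w_i f_i(X_l = 0)\big) + \exp\big(\sum_{f_i \in F_l} w_i f_i(X_l = 1)\big)} \]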
Inference
But deterministic dependencies can break MCMC.
MC-SAT: combines MCMC and WalkSAT.
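A small Python sketch of the MC-SAT loop (following Poon & Domingos, 2006, not this deck; uniform sampling over satisfying assignments is done by brute-force enumeration instead of SampleSAT, so it only scales to toy problems):

import itertools
import math
import random

def mc_sat(clauses, weights, n_vars, num_samples=100):
    # clauses: lists of signed ints; literal +v / -v means var v true / false
    def satisfied(cl, a):
        return any(a[abs(l)] == (l > 0) for l in cl)
    def assignments():
        for vals in itertools.product([False, True], repeat=n_vars):
            yield dict(zip(range(1, n_vars + 1), vals))
    # With hard clauses, real MC-SAT starts from a WalkSAT solution of them;
    # for soft-only clauses a random start is fine.
    x = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
    samples = []
    for _ in range(num_samples):
        # keep each currently-satisfied clause with probability 1 - e^(-w)
        m = [cl for cl, w in zip(clauses, weights)
             if satisfied(cl, x) and random.random() < 1 - math.exp(-w)]
        # x itself satisfies every clause in m, so this set is never empty
        x = random.choice([a for a in assignments()
                           if all(satisfied(cl, a) for cl in m)])
        samples.append(x)
    return samples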
Learning
– Learning weights
– Learning structure (formulas)
Learning Weights
Assumption: closed-world assumption (anything not observed is false); otherwise use EM.
Learning Weights: Generative
Use the pseudo-likelihood (optimizing the true likelihood would require intractable inference at every step):
$\mathrm{PL}(x) = \prod_{l=1}^{n} P(X_l = x_l \mid MB(X_l))$
where $MB(X_l)$ is the Markov blanket of $X_l$.
It is efficient, but poor at capturing long-range dependencies.
Voted Perceptron
(original version, with Viterbi decoding:)
w_i ← 0
for t ← 1 to T do
    y_MAP ← Viterbi(x)
    w_i ← w_i + η [count_i(y_Data) – count_i(y_MAP)]
return ∑_t w_i / T

(MLN version, replacing Viterbi with MaxWalkSAT:)
w_i ← 0
for t ← 1 to T do
    y_MAP ← MaxWalkSAT(x)
    w_i ← w_i + η [count_i(y_Data) – count_i(y_MAP)]
return ∑_t w_i / T
Learning Structure
Structure (the formulas) can also be learned:
– Start with a hand-coded KB
– Add/remove literals, flip signs
– Score candidates using pseudo-likelihood plus a structure prior
– Search over candidate formulas
Alchemy
Open-source software package developed at the University of Washington.
Alchemy Example: Entity Resolution
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c'') => SameField(f,c,c'')
SameCit(c,c') ^ SameCit(c',c'') => SameCit(c,c'')
(A '+' before a variable makes Alchemy learn a separate weight for each constant of that variable; the two-way arrows are reconstructed where the original operators were lost.)
Applications
– Information extraction
– Entity resolution
– Web mining
– Natural language processing
– Social network analysis
– And more
Application: Jointly Disambiguating and Clustering Concepts and Entities
– Disambiguation
– Clustering
– Jointly disambiguating and clustering
Application: Jointly Disambiguating and Clustering Concepts and Entities
Features
– Local features:
  Prior probability (p3, f7)
  Relatedness (p4, f8, f11)
  Local context similarity (p5, f9)
  String edit distance (p6, f10)
– Global features:
  Shared lemma (p7, f12)
  Head match (p8, f6)
  Acronyms (p8, f6)
  Cross-document n-gram feature (p9, f13)
Application: Jointly Disambiguating and Clustering Concepts and Entities
System
(Figure: system architecture.)
Application: Jointly Disambiguating and Clustering Concepts and Entities
Evaluation
(Figure: evaluation results.)
Application: SRL
Riedel & Meza-Ruiz (2008), semantic F-score 74.59%.
Three stages:
– 1. Predicate identification
– 2. Argument identification
– 3. Argument classification
Application: SRL
They use 5 hidden predicates:
– Predicate identification:
  isPredicate(p) [p is a position]
  Sense(p,e) [e is a sense]
– Argument identification:
  isArgument(a) [a is a word]
  hasRole(p,a)
– Argument classification:
  Role(p,a,r) [r is a role]
Application: SRL
Local formulae combine observed predicates such as W(l2,l1) and Lemma(a1,l1) with hidden predicates such as hasRole(p,a1) and hasRole(p,a2).
Application: SRL
Global formulae act as structural constraints that ensure consistency between all stages; some of them are soft constraints.
Application: SRL
Global formulae include, e.g. (connectives reconstructed):
hasRole(p,a1) => isArg(a1)
Role(p,a1,r1) ^ r1 != r2 => !Role(p,a1,r2)
sense(p,e) => isPredicate(p)
Application: SRL
They compare five models:
Model       WSJ      Brown    Train Time  Test Time
Full        75.72%   65.38%   25h         24m
Up          76.96%   63.86%   11h         14m
Down        73.48%   59.34%   22h         23m
Isolated    60.49%   48.12%   11h         14m
Structural  74.93%   64.23%   22h         33m