
1 Markov Logic Networks Hao Wu Mariyam Khalid

2 Motivation

3 How would we model this scenario?

4 Motivation How would we model this scenario? – Logical Approach

5 Motivation How would we model this scenario? – Logical Approach – Statistical Approach

6 First Order Logic Four types of symbols:
– Constants: concrete objects in the domain (e.g., people: Anna, Bob)
– Variables: range over the objects in the domain
– Functions: mappings from tuples of objects to objects (e.g., GrandpaOf)
– Predicates: relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Fired)

7 First Order Logic Four types of symbols:
– Constants: concrete objects in the domain (e.g., people: Anna, Bob)
– Variables: range over the objects in the domain
– Functions: mappings from tuples of objects to objects (e.g., GrandpaOf)
– Predicates: relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Fired)
– Logical connectives and quantifiers: ¬, ∧, ∨, ⇒, ⇔; ∀, ∃

8 First Order Logic

9 First Order Logic Advantages:
– Compactly represents a wide variety of knowledge
– Flexibly and modularly incorporates a wide range of domain knowledge

10 First Order Logic Advantages:
– Compactly represents a wide variety of knowledge
– Flexibly and modularly incorporates a wide range of domain knowledge
Disadvantages:
– No way to handle uncertainty
– No way to handle imperfect or contradictory knowledge

11 Markov Networks Set of variables X = (X_1, ..., X_n). The distribution is given by
P(X = x) = (1/Z) ∏_k φ_k(x_{k})
with the normalization factor (partition function) Z = ∑_x ∏_k φ_k(x_{k}) and φ_k as the potential function of the k-th clique.

12 Markov Networks Representation as a log-linear model:
P(X = x) = (1/Z) exp( ∑_j w_j f_j(x) )

13 Markov Networks Representation as a log-linear model:
P(X = x) = (1/Z) exp( ∑_j w_j f_j(x) )
In our case there are only binary features:
– each feature f_j is the indicator of one possible clique state
The weight is equal to the log of the corresponding potential: w_j = log φ_k(x_{k})
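To make the log-linear form concrete, here is a minimal Python sketch; the two-variable clique and its weights are our own toy example, not from the slides. Each binary feature is the indicator of one clique state, and its weight is the log of that state's potential value.

import itertools, math

# Toy example: one clique over two binary variables (Plays, Fired).
# One indicator feature per clique state; weight = log(potential).
weights = {
    (True, True): 2.0, (True, False): 0.0,
    (False, True): 0.0, (False, False): 2.0,
}

def unnormalized(x):
    # exp of the weighted feature sum; exactly one feature is active here
    return math.exp(weights[x])

states = list(itertools.product([True, False], repeat=2))
Z = sum(unnormalized(x) for x in states)    # normalization factor
P = {x: unnormalized(x) / Z for x in states}
print(P[(True, True)])                      # e^2 / (2e^2 + 2), about 0.44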

14 Markov Networks [Figure: example network with nodes Friends, Plays, Fired]

15 Markov Networks

16 Whether an employee A can convince another employee B to play depends on how easily B is influenced: if B is easily influenced the formula gets a larger weight (ω = 4) than if B is not (ω = 2). All else being equal, each satisfied grounding of a weight-ω formula makes a world e^ω times more probable.

17 Markov Networks

18 Markov Networks Advantages:
– Efficiently handle uncertainty
– Tolerant of imperfect and contradictory knowledge

19 Markov Networks Advantages:
– Efficiently handle uncertainty
– Tolerant of imperfect and contradictory knowledge
Disadvantages:
– Very complex networks for a wide variety of knowledge
– Difficult to incorporate a wide range of domain knowledge

20 Motivation

21 Ideally we want a framework that can incorporate the advantages of both

22 Markov Logic Networks
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

23 Markov Logic Networks
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

24 Markov Logic Networks

25
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

26 Markov Logic Networks
– Each formula corresponds to one clique
– Each formula carries a weight that reflects its importance
– If a world violates a formula it becomes less probable, but not impossible
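These three bullets are exactly the standard MLN distribution (Richardson & Domingos, 2006): a world x has probability P(X = x) = (1/Z) exp( ∑_i w_i n_i(x) ), where n_i(x) is the number of true groundings of formula F_i in x and w_i is the formula's weight. Violating a grounding of F_i just drops w_i from the exponent; it never zeroes the probability, which is how MLNs soften pure logic.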

27 Markov Logic Networks
– Description of the problem
– Translation into First-Order Logic
– Construction of an MLN "template"
– Derivation of a concrete MLN for a given set of constants
– Computation of whatever you want

28 Markov Networks Constants: Alice (A) and Bob (B). [Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B)]

29 Markov Logic Network [Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B)]

Friends(x,y) ⇒ (Plays(x) ⇔ Plays(y)):
Friends(x,y)  Plays(x)  Plays(y)  ω
True          True      True      3
True          True      False     0
True          False     True      0
True          False     False     3
False         True      True      3
False         True      False     3
False         False     True      3
False         False     False     3

Plays(x) ⇔ Fired(x):
Plays(x)  Fired(x)  ω
True      True      2
True      False     0
False     True      0
False     False     2
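As a sanity check, a brute-force Python sketch (our own encoding of the two tables above; only feasible because this ground network has just eight atoms) that scores every world and normalizes:

import itertools, math

people = ["A", "B"]
atoms = ([("Friends", x, y) for x in people for y in people]
         + [("Plays", x) for x in people]
         + [("Fired", x) for x in people])

def score(world):
    """Sum of the weights of satisfied ground features (tables above)."""
    s = 0.0
    for x in people:
        for y in people:
            # Friends(x,y) => (Plays(x) <=> Plays(y)), weight 3 when satisfied
            if (not world[("Friends", x, y)]
                    or world[("Plays", x)] == world[("Plays", y)]):
                s += 3.0
    for x in people:
        # Plays(x) <=> Fired(x), weight 2 when satisfied
        if world[("Plays", x)] == world[("Fired", x)]:
            s += 2.0
    return s

worlds = [dict(zip(atoms, v))
          for v in itertools.product([True, False], repeat=len(atoms))]
Z = sum(math.exp(score(w)) for w in worlds)    # 2^8 = 256 worlds
everyone = {a: True for a in atoms}            # the all-true world
print(math.exp(score(everyone)) / Z)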

30 MAP/MPE Inference Given evidence, find the most probable state.

31 MAP/MPE Inference Given evidence, find the most probable state: arg max_y P(y | x), where x is the evidence and y the remaining variables.

32 MAP/MPE Inference With x the evidence: arg max_y P(y | x) = arg max_y ∑_i w_i n_i(x, y). This is a weighted MaxSAT problem.

33 WalkSAT
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes number of satisfied clauses
return failure

34 MaxWalkSAT
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes ∑ weights(sat. clauses)
return failure, best solution found
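A runnable Python rendering of MaxWalkSAT (the clause encoding and names are ours): clauses are lists of (variable, sign) literals, satisfied when any variable matches its sign. Plain WalkSAT is the special case with unit weights and threshold equal to the number of clauses.

import random

def max_walksat(clauses, weights, variables,
                max_tries=10, max_flips=1000, p=0.5, threshold=None):
    """clauses: list of clauses; each clause is a list of (var, sign) pairs."""
    if threshold is None:
        threshold = sum(weights)            # demand full satisfaction
    def sat_weight(a):
        return sum(w for c, w in zip(clauses, weights)
                   if any(a[v] == s for v, s in c))
    best, best_w = None, float("-inf")
    for _ in range(max_tries):
        a = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            if sat_weight(a) >= threshold:
                return a
            unsat = [c for c in clauses
                     if not any(a[v] == s for v, s in c)]
            c = random.choice(unsat)
            if random.random() < p:         # random walk step
                v = random.choice(c)[0]
            else:                           # greedy step
                def gain(v):
                    a[v] = not a[v]
                    g = sat_weight(a)
                    a[v] = not a[v]
                    return g
                v = max((v for v, _ in c), key=gain)
            a[v] = not a[v]
            if sat_weight(a) > best_w:
                best, best_w = dict(a), sat_weight(a)
    return best                             # best solution found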

35 LazySAT MaxWalkSAT may need a lot of memory.

36 LazySAT MaxWalkSAT may need a lot of memory. Most networks are sparse. Exploit sparseness: ground clauses lazily.

37 LazySAT
for i ← 1 to max-tries do
    active_atoms ← atoms in clauses unsatisfied by DB
    active_clauses ← clauses activated by active_atoms
    soln ← random truth assignment to active_atoms
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) ≥ threshold then
            return soln
        c ← random unsatisfied clause
        with probability p
            v_f ← a randomly chosen variable from c
        else
            for each variable v in c do
                compute DeltaGain(v), using weighted_KB if v ∉ active_atoms
            v_f ← v with highest DeltaGain(v)
        if v_f ∉ active_atoms then
            activate v_f and add clauses activated by v_f
        soln ← soln with v_f flipped
return failure, best soln found
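The heart of LazySAT in Python (a sketch; clauses_containing is a hypothetical on-demand grounding function, not part of any real API): atoms absent from the active set are implicitly false, and a clause is grounded only when a flip first touches one of its atoms.

def lazy_flip(v, soln, active_atoms, active_clauses, clauses_containing):
    """Flip atom v, grounding its clauses on first touch.

    clauses_containing(v): hypothetical function that grounds and returns
    the clauses in which atom v appears."""
    if v not in active_atoms:
        active_atoms.add(v)                           # activate v ...
        active_clauses.update(clauses_containing(v))  # ... and its clauses
        soln[v] = False                               # inactive atoms were false
    soln[v] = not soln[v]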

38 Inference What is P(F1 | F2, M_{L,C})?

39 Inference What is P(F1 | F2, M_{L,C})?
P(F1 | F2, M_{L,C}) = P(F1 ∧ F2 | M_{L,C}) / P(F2 | M_{L,C}) = ( ∑_{x ∈ X_{F1} ∩ X_{F2}} P(x | M_{L,C}) ) / ( ∑_{x ∈ X_{F2}} P(x | M_{L,C}) )
where X_F is the set of worlds in which F holds.

40 Inference However, directly computing this equation is intractable in most cases.

41 Inference First we need to construct a minimal network for each set of evidence.

42 Inference First we need to construct a minimal network for each set of evidence.
network ← Ø
queue ← query nodes
repeat
    node ← dequeue(queue)
    add node to network
    if node not in evidence then
        add neighbors(node) to queue
until queue = Ø
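The same construction in Python (names ours): a plain breadth-first expansion from the query nodes that stops at evidence, since evidence atoms block the influence of everything behind them.

from collections import deque

def minimal_network(query_nodes, evidence, neighbors):
    """neighbors(n): iterable of atoms sharing some ground clause with n."""
    network = set()
    queue = deque(query_nodes)
    while queue:
        node = queue.popleft()
        if node in network:
            continue                        # already added
        network.add(node)
        if node not in evidence:            # evidence blocks expansion
            queue.extend(neighbors(node))
    return network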

43–49 Example Query: Fired(A). Evidence: Friends(A,B), Friends(B,A), Plays(B). [Figures: the minimal network for the query is built step by step over the ground atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B)]

50 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo).

51 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo). Gibbs sampling:
state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
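The same sampler in Python (names ours); conditional(v, state) stands in for P(v = True | its Markov blanket), which in an MLN is a logistic function of the weights of the ground clauses containing v.

import random

def gibbs(variables, conditional, query, num_samples):
    """Estimate P(query) by Gibbs sampling.

    conditional(v, state): P(v = True | rest of state) (not shown here).
    query(state): True iff the query formula holds in state."""
    state = {v: random.random() < 0.5 for v in variables}
    hits = 0
    for _ in range(num_samples):
        for v in variables:
            state[v] = random.random() < conditional(v, state)
        hits += query(state)
    return hits / num_samples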

52 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo). But deterministic dependencies can break MCMC.

53 Inference In principle, P(F1 | F2, M_{L,C}) can be approximated using MCMC (Markov chain Monte Carlo). But deterministic dependencies can break MCMC. MC-SAT:
– Combines MCMC and WalkSAT
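A compressed sketch of the MC-SAT loop in Python (our rendering of Poon & Domingos, 2006): each round keeps every currently satisfied clause with probability 1 - e^(-w), then jumps to a near-uniform satisfying assignment of the kept subset. The real algorithm uses SampleSAT for that jump; here sat_sample is a hypothetical stand-in.

import math, random

def mc_sat(clauses, weights, variables, sat_sample, num_samples):
    """sat_sample(subset, variables): near-uniform satisfying assignment
    of the given clauses (stand-in for SampleSAT)."""
    hard = [c for c, w in zip(clauses, weights) if w == float("inf")]
    state = sat_sample(hard, variables)     # start from the hard clauses
    samples = []
    for _ in range(num_samples):
        kept = [c for c, w in zip(clauses, weights)
                if any(state[v] == s for v, s in c)      # satisfied now
                and random.random() < 1 - math.exp(-w)]  # keep w.p. 1 - e^-w
        state = sat_sample(kept, variables)
        samples.append(dict(state))
    return samples                          # P(F) = fraction of samples where F holds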

54 Learning
– Learning weights
– Learning structure (formulas)

55 Learning Weights Assumption: the closed-world assumption, i.e., anything not observed is false
– Otherwise use EM

56 Learning Weights: Generative Using pseudo-likelihood:

57 Pseudo-likelihood: log PL_w(x) = ∑_l log P_w(X_l = x_l | MB_x(X_l)), i.e., the sum over ground atoms of the log-probability of each atom given its Markov blanket.

58 It is efficient, but handles long-range dependencies poorly.
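For binary ground atoms the pseudo-log-likelihood has a closed form: flipping atom l changes the world's total weight by some delta, and P(X_l = x_l | MB) = sigmoid(delta). A sketch (our own encoding; score recomputes globally for clarity, though only the clauses containing l actually change):

import math

def pseudo_log_likelihood(world, atoms, score):
    """score(world): sum of weights of the ground clauses satisfied in world."""
    pll = 0.0
    base = score(world)
    for l in atoms:
        flipped = dict(world)
        flipped[l] = not flipped[l]
        delta = base - score(flipped)           # weight lost by flipping l
        pll += -math.log1p(math.exp(-delta))    # log sigmoid(delta)
    return pll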

59 Voted Perceptron
With Viterbi inference:
    w_i ← 0
    for t ← 1 to T do
        y_MAP ← Viterbi(x)
        w_i ← w_i + η [ count_i(y_Data) − count_i(y_MAP) ]
    return ∑_t w_i / T
For MLNs, with MaxWalkSAT as the MAP inference engine:
    w_i ← 0
    for t ← 1 to T do
        y_MAP ← MaxWalkSAT(x)
        w_i ← w_i + η [ count_i(y_Data) − count_i(y_MAP) ]
    return ∑_t w_i / T
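The same update in Python (names ours): map_inference stands in for Viterbi or MaxWalkSAT, and counts(y) for the true-grounding counts n_i(x, y).

def voted_perceptron(x, y_data, counts, map_inference, T, eta=1.0):
    """counts(y): list of feature counts n_i(x, y).
    map_inference(w, x): MAP assignment y under weights w (e.g. MaxWalkSAT)."""
    n = len(counts(y_data))
    w = [0.0] * n
    total = [0.0] * n
    for _ in range(T):
        y_map = map_inference(w, x)
        c_data, c_map = counts(y_data), counts(y_map)
        for i in range(n):
            w[i] += eta * (c_data[i] - c_map[i])
            total[i] += w[i]
    return [t / T for t in total]           # averaged (voted) weights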

60 Learning Structure Structure can also be learned:
– Start with a hand-coded KB
– Add/remove literals, flip signs
– Score with pseudo-likelihood + a structure prior
– Search

62 Alchemy Open-source software package developed at the University of Washington.

63 Alchemy: Example Entity resolution for citations:
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c'') => SameField(f,c,c'')
SameCit(c,c') ^ SameCit(c',c'') => SameCit(c,c'')
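Two notes on the syntax above: a '+' in front of a variable (as in Token(+t,i,c)) tells Alchemy to learn a separate weight for each value of that variable, and => / <=> are the usual logical connectives. A typical command-line session might look roughly like the following; the file names are hypothetical and the exact flags may vary across Alchemy versions:

learnwts -d -i ner.mln -o ner-trained.mln -t train.db -ne InField,SameField,SameCit
infer -ms -i ner-trained.mln -e test.db -r query.results -q SameCit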

64 Applications
– Information extraction
– Entity resolution
– Web mining
– Natural language processing
– Social network analysis
– And more

65 Application: Jointly Disambiguating and Clustering Concepts and Entities Disambiguation

66 Application: Jointly Disambiguating and Clustering Concepts and Entities Clustering

67 Application: Jointly Disambiguating and Clustering Concepts and Entities Jointly Disambiguating and clustering

68 Application: Jointly Disambiguating and Clustering Concepts and Entities

71 Features
– Local features:
    Prior probability (p3, f7)
    Relatedness (p4, f8, f11)
    Local context similarity (p5, f9)
    String edit distance (p6, f10)

72 Application: Jointly Disambiguating and Clustering Concepts and Entities Features
– Local features:
    Prior probability (p3, f7)
    Relatedness (p4, f8, f11)
    Local context similarity (p5, f9)
    String edit distance (p6, f10)
– Global features:
    Shared lemma (p7, f12)
    Head match (p8, f6)
    Acronyms (p8, f6)
    Cross-document n-gram feature (p9, f13)

73 Application: Jointly Disambiguating and Clustering Concepts and Entities System

74 Application: Jointly Disambiguating and Clustering Concepts and Entities System

75 Application: Jointly Disambiguating and Clustering Concepts and Entities System

76 Application: Jointly Disambiguating and Clustering Concepts and Entities System

77 Application: Jointly Disambiguating and Clustering Concepts and Entities System

78 Application: Jointly Disambiguating and Clustering Concepts and Entities Evaluation

79 Application: Jointly Disambiguating and Clustering Concepts and Entities Evaluation

80 Application: SRL Riedel & Meza-Ruiz (2008). Semantic F-score: 74.59%. Three stages:
– 1. Predicate identification
– 2. Argument identification
– 3. Argument classification

81 Application: SRL They used five hidden predicates:
– Predicate identification: isPredicate(p) [p is a position], Sense(p,e) [e is a sense]
– Argument identification: isArgument(a) [a is a word], hasRole(p,a)
– Argument classification: Role(p,a,r) [r is a role]

82 Application: SRL Local formulae [Figure: example local formulae combining lexical predicates such as Lemma(a1,l1) and W(l2,l1) to predict hasRole(p,a)]

83 Application: SRL Global formulae as structural constraints
– Ensure consistency across all stages. Example:

84 Application: SRL Global formulae as structural constraints
– Ensure consistency across all stages. Also some soft constraints:

85 Application: SRL Global formulae, e.g.:
hasRole(p,a1) => isArgument(a1)
Role(p,a1,r1) => hasRole(p,a1)
sense(p,e) => isPredicate(p)

86 Application: SRL They compared five models:
Model       WSJ      Brown    Train Time  Test Time
Full        75.72%   65.38%   25h         24m
Up          76.96%   63.86%   11h         14m
Down        73.48%   59.34%   22h         23m
Isolated    60.49%   48.12%   11h         14m
Structural  74.93%   64.23%   22h         33m

87 Application: SRL Global formulae as structural constraints
– Ensure consistency across all stages. Example:

