Published by Alan Stanley. Modified over 9 years ago.
1
1 Learning the Structure of Markov Logic Networks Stanley Kok
2
2 Overview
- Introduction
- CLAUDIEN, CRFs
- Algorithm
- Evaluation Measure
- Clause Construction
- Search Strategies
- Speedup Techniques
- Experiments
3
3 Introduction
- Richardson & Domingos (2004) learned MLN structure in two disjoint steps:
  - Learn first-order (FO) clauses with an off-the-shelf ILP system (CLAUDIEN)
  - Learn clause weights by optimizing pseudo-likelihood
- We develop an algorithm that:
  - Learns FO clauses by directly optimizing pseudo-likelihood
  - Is fast enough
  - Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches
4
4 CLAUDIEN
- CLAUsal DIscovery ENgine
- Starts with the trivially false clause
- Repeatedly refines current clauses by adding literals
- Adds clauses that satisfy minimum accuracy and coverage to the KB
- Refinement example (from the slide's diagram), one level per line:
  - true ⇒ false
  - m ⇒ false; f ⇒ false; h ⇒ false
  - m^f ⇒ false; m ⇒ h; m ⇒ f; m^h ⇒ false; f ⇒ h; f ⇒ m; f^h ⇒ false; h ⇒ f; h ⇒ m
  - h ⇒ m v f
5
5 CLAUDIEN
- Language bias ≡ clause template
- Can refine a handcrafted KB
- Example: Professor(P) ⇐ AdvisedBy(S,P) in KB
  - dlab_template('1-2:[Professor(P),Student(S)]<-AdvisedBy(S,P)')
  - Professor(P) v Student(S) ⇐ AdvisedBy(S,P)
6
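6 Conditional Random Fields
- Markov networks used to compute P(y|x) (McCallum 2003)
- Model: [equation not preserved in the transcript]
- Features f_k, e.g. "current word is capitalized and next word is Inc"
- [Figure: chain of labels y_1, y_2, y_3, …, y_{n-1}, y_n over the observations x_1, x_2, …, x_n; example sentence "… IBM hired Alice …" with labels Org, Person, Misc]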
7
7 CRF – Feature Induction
- Set of atomic features (word=the, capitalized, etc.)
- Starts from an empty CRF
- While the convergence criterion is not met:
  - Create a list of new features consisting of:
    - Atomic features
    - Binary conjunctions of atomic features
    - Conjunctions of atomic features with features already in the model
  - Evaluate the gain in P(y|x) of adding each feature to the model
  - Add the best K features to the model (100s-1000s of features)
8
8 Algorithm
- High-level algorithm:
  - Repeat
    - Clauses <- FindBestClauses(MLN)
    - Add Clauses to MLN
  - Until Clauses = ∅
- FindBestClauses(MLN):
  - Search for, and create, candidate clauses
  - For each candidate clause c, compute the gain (evaluation measure) of adding c to MLN
  - Return the k clauses with highest gain
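The outer loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes the loop stops when the search returns no clauses, represents a clause as a plain string, and uses a hypothetical `make_toy_search` with fixed gains in place of the real FindBestClauses.

```python
from typing import Callable, Dict, List, Set

Clause = str  # hypothetical representation: a clause is just its text


def learn_structure(
    mln: Set[Clause],
    find_best_clauses: Callable[[Set[Clause]], List[Clause]],
) -> Set[Clause]:
    """Repeatedly add the best clauses until the search returns none."""
    mln = set(mln)
    while True:
        clauses = find_best_clauses(mln)
        if not clauses:  # assumed stopping rule: no clause was returned
            return mln
        mln.update(clauses)


def make_toy_search(gains: Dict[Clause, float], k: int = 2):
    """Hypothetical stand-in for FindBestClauses: fixed gains, top-k positive."""

    def find_best_clauses(mln: Set[Clause]) -> List[Clause]:
        scored = [(g, c) for c, g in gains.items() if c not in mln and g > 0]
        scored.sort(reverse=True)
        return [c for _, c in scored[:k]]

    return find_best_clauses
```

In the real algorithm the gain of a clause depends on the current MLN, so it has to be recomputed on every pass; the fixed gains here only keep the sketch short.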
9
9 Evaluation Measure
- Ideally use log-likelihood, but it is slow to compute
- Recall, Value, Gradient: [formulas not preserved in the transcript]
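The formulas from this slide are missing from the transcript. As a hedged reconstruction, assuming the slide recalled the standard MLN definitions of Richardson & Domingos, with $w_i$ the weight of clause $i$, $n_i(x)$ its number of true groundings in world $x$, and $Z$ the partition function:

```latex
\begin{align*}
\text{Recall:}\quad & P_w(X = x) = \frac{1}{Z} \exp\Big(\sum_i w_i\, n_i(x)\Big) \\
\text{Value:}\quad & \log P_w(X = x) = \sum_i w_i\, n_i(x) - \log Z \\
\text{Gradient:}\quad & \frac{\partial}{\partial w_i} \log P_w(X = x) = n_i(x) - E_w\big[n_i(X)\big]
\end{align*}
```

The value needs $Z$ and the gradient needs an expectation over all possible worlds, which is why exact log-likelihood is slow.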
10
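10 Evaluation Measure
- Use pseudo-log-likelihood (R&D 2004), but it gives undue weight to predicates with a large number of groundings
- Recall: [formula not preserved in the transcript]
- E.g.: [example not preserved in the transcript]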
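The formula on this slide is missing from the transcript. A hedged reconstruction, assuming the standard definition of pseudo-log-likelihood over the $n$ ground predicates $X_1, \dots, X_n$, where $MB_x(X_l)$ is the state of the Markov blanket of $X_l$ in the data:

```latex
\log P^{*}_w(X = x) = \sum_{l=1}^{n} \log P_w\big(X_l = x_l \mid MB_x(X_l)\big)
```

As an illustration (not the slide's own example): with $N$ constants, a unary predicate contributes $N$ terms to the sum while a binary predicate contributes $N^2$, so the binary predicate dominates the objective.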
11
11 Evaluation Measure
- Use weighted pseudo-log-likelihood (WPLL)
- E.g.: [example not preserved in the transcript]
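A small sketch of how WPLL differs from plain pseudo-log-likelihood. It assumes WPLL has the form $\sum_r c_r \sum_k \log P(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k}))$ with the per-predicate weight $c_r = 1/g_r$, where $g_r$ is the number of groundings of predicate $r$; that choice of $c_r$ and the function names are assumptions, and the per-grounding conditional log-likelihoods are taken as given.

```python
import math
from typing import Dict, List


def pll(cll_by_predicate: Dict[str, List[float]]) -> float:
    """Plain pseudo-log-likelihood: every grounding counts equally."""
    return sum(sum(clls) for clls in cll_by_predicate.values())


def wpll(cll_by_predicate: Dict[str, List[float]]) -> float:
    """Weighted version: each predicate's groundings are averaged, i.e.
    weighted by c_r = 1 / (number of groundings), an assumed choice."""
    total = 0.0
    for clls in cll_by_predicate.values():
        if clls:
            total += sum(clls) / len(clls)
    return total


# Toy data: AdvisedBy has twice as many groundings as Professor.
toy = {
    "Professor": [math.log(0.5)] * 2,
    "AdvisedBy": [math.log(0.9)] * 4,
}
```

With the toy data, `pll` counts AdvisedBy four times and Professor twice, while `wpll` gives each predicate one averaged term.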
12
12 Algorithm
- High-level algorithm:
  - Repeat
    - Clauses <- FindBestClauses(MLN)
    - Add Clauses to MLN
  - Until Clauses = ∅
- FindBestClauses(MLN):
  - Search for, and create, candidate clauses
  - For each candidate clause c, compute the gain (evaluation measure) of adding c to MLN
  - Return the k clauses with highest gain
13
13 Clause Construction
- Add a literal (negative or positive)
  - All possible ways the variables of the new literal can be shared with those of the clause
  - E.g. !Student(S) v AdvBy(S,P)
- Remove a literal (when refining an MLN)
  - Removes spurious conditions from rules
  - E.g. !Student(S) v !YrInPgm(S,5) v TA(S,C) v TmpAdvBy(S,P)
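The "add a literal" operator can be sketched as an enumeration over variable-sharing patterns. This is a simplified illustration under stated assumptions: argument types are ignored, each argument is either an existing variable or a fresh one named `_new<i>` (a hypothetical naming scheme), and at least one variable must be shared with the clause.

```python
from itertools import product
from typing import List, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]  # (is_positive, predicate, args)


def clause_variables(clause: List[Literal]) -> List[str]:
    """Variables of the clause, in order of first appearance."""
    seen: List[str] = []
    for _, _, args in clause:
        for a in args:
            if a not in seen:
                seen.append(a)
    return seen


def add_literal(clause: List[Literal], predicate: str, arity: int) -> List[List[Literal]]:
    """All clauses obtained by appending one positive or negative literal."""
    old_vars = clause_variables(clause)
    candidates = []
    for sign in (True, False):
        # Each argument slot holds an existing variable, or None for a fresh one.
        for choice in product(old_vars + [None], repeat=arity):
            if all(c is None for c in choice):
                continue  # assumed: the new literal must share a variable
            args = tuple(c if c is not None else f"_new{i}" for i, c in enumerate(choice))
            candidates.append(clause + [(sign, predicate, args)])
    return candidates
```

Starting from `!Student(S)` and adding the binary predicate `AdvBy`, this yields the slide's `!Student(S) v AdvBy(S,P)` up to the name of the fresh variable.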
14
14 Clause Construction
- Flip signs of literals (when refining an MLN)
  - Moves literals that are on the wrong side of the implication
  - E.g. !CseQtr(C1,Q1) v !CseQtr(C2,Q2) v !SameCse(C1,C2) v !SameQtr(Q1,Q2)
  - Done at the beginning of the algorithm
  - Expensive, optional
- Limit the number of distinct variables to restrict the search space
15
15 Algorithm
- High-level algorithm:
  - Repeat
    - Clauses <- FindBestClauses(MLN)
    - Add Clauses to MLN
  - Until Clauses = ∅
- FindBestClauses(MLN):
  - Search for, and create, candidate clauses
  - For each candidate clause c, compute the gain (evaluation measure) of adding c to MLN
  - Return the k clauses with highest gain
16
16 Search Strategies
- Shortest-first search (SFS), first pass over the candidate set:
  1. Find gain of each clause
  2. Sort clauses by gain
  3. Return top 5 with positive gain
  4. Add the 5 clauses to the MLN
  5. Retrain weights of the MLN
- Next pass over the candidate set:
  1. Find gain of each clause
  2. Sort them by gain (Yikes! All length-2 clauses have gains ≤ 0)
- [Diagram: MLN holding weighted clauses (wt1, !AdvBy(S,P); wt2, clause2; …) and a candidate set containing !AdvBy(S,P) v Stu(S)]
17
17 Shortest-First Search
  a. Extend the 20 length-2 clauses with highest gains
  b. Form a new candidate set
  c. Keep the 1000 clauses with highest gains
- [Diagram: MLN holding weighted clauses (wt1, !AdvBy(S,P); wt2, clause2; …); !AdvBy(S,P) v Stu(S) is extended to !AdvBy(S,P) v Stu(S) v Prof(P)]
18
18 Shortest-First Search
- Repeat the process
- Extend all length-2 clauses before length-3 ones
- How do you refine a non-empty MLN?
- [Diagram: MLN holding weighted clauses (wt1, clause1; wt2, clause2; …) and the candidate set]
19
19 SFS – MLN Refinement
  a. Extend the 20 length-2 clauses with highest gains
  b. Extend length-2 clauses in the MLN
  c. Remove a predicate from length-4 clauses in the MLN
  d. Flip signs of length-3 clauses in the MLN (optional)
  e. Clauses produced by b, c, and d replace the original clause in the MLN
- [Diagram: MLN holding weighted clauses (wt1, !AdvBy(S,P); wt2, clause2; …; wtA, clauseA; wtB, clauseB; …)]
20
20 Search Strategies
- Beam search
  1. Keep a beam of the 5 clauses with highest gains
  2. Track the best clause
  3. Stop when the best clause does not change after two consecutive iterations
- How do you refine a non-empty MLN?
- [Diagram: MLN holding weighted clauses (wt1, clause1; wt2, clause2; …; wtA, clauseA; wtB, clauseB; …)]
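The three beam-search steps can be sketched generically. This is an illustration rather than the author's code: `expand` and `score` are hypothetical callbacks (in the MLN setting they would be the clause-construction operators and the WPLL gain), and the toy usage searches over integers instead of clauses.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def beam_search(
    initial: Iterable[T],
    expand: Callable[[T], List[T]],
    score: Callable[[T], float],
    beam_width: int = 5,
    patience: int = 2,
) -> T:
    """Keep the `beam_width` best candidates; stop once the best candidate
    has not improved for `patience` consecutive iterations."""
    beam = list(initial)
    best = max(beam, key=score)
    unchanged = 0
    while unchanged < patience:
        candidates = [c for b in beam for c in expand(b)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
            unchanged = 0
        else:
            unchanged += 1
    return best


# Toy usage: walk over integers toward the maximum of -(n - 7)^2.
toy_best = beam_search(
    initial=[0],
    expand=lambda n: [n + 1, n + 2],
    score=lambda n: -(n - 7) ** 2,
)
```

The defaults `beam_width=5` and `patience=2` mirror the numbers on the slide.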
21
21 Algorithm
- High-level algorithm:
  - Repeat
    - Clauses <- FindBestClauses(MLN)
    - Add Clauses to MLN
  - Until Clauses = ∅
- FindBestClauses(MLN):
  - Search for, and create, candidate clauses
  - For each candidate clause c, compute the gain (evaluation measure) of adding c to MLN
  - Return the k clauses with highest gain
22
22 Difference from CRF – Feature Induction
- CRF feature induction:
  - Set of atomic features (word=the, capitalized, etc.)
  - Start from an empty CRF
  - While the convergence criterion is not met:
    - Create a list of new features consisting of atomic features, binary conjunctions of atomic features, and conjunctions of atomic features with features already in the model
    - Evaluate the gain in P(y|x) of adding each feature to the model
    - Add the best K features to the model (100s-1000s of features)
- Our algorithm:
  - We can refine a non-empty MLN
  - We use pseudo-likelihood, with different optimizations
  - Applicable to arbitrary Markov networks (not only linear chains)
  - Maintains a separate candidate set
  - Adds the best ≈ 10s to the model
  - Flexible enough to fit into different search algorithms
23
23 Overview
- Introduction
- CLAUDIEN, CRFs
- Algorithm
- Evaluation Measure
- Clause Construction
- Search Strategies
- Speedup Techniques
- Experiments
24
24 Speedup Techniques
- Recall FindBestClauses(MLN):
  - Search for, and create, candidate clauses
  - For each candidate clause c, compute the gain in WPLL of adding c to MLN
  - Return the k clauses with highest gain
- LearnWeights(MLN+c) optimizes WPLL with L-BFGS
- L-BFGS computes the value and gradient of WPLL
- There are many candidate clauses, so it is important to compute WPLL and its gradient efficiently
25
25 Speedup Techniques
- WPLL and CLL: [formulas not preserved in the transcript]
- Ignore clauses in which the predicate does not appear, e.g. predicate l does not appear in clause 1
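The formulas on this slide are missing from the transcript. A hedged reconstruction of the conditional likelihood of one ground predicate $X_l$, assuming the standard MLN form, where $x_{[X_l=0]}$ and $x_{[X_l=1]}$ denote the data with $X_l$ forced to false and true:

```latex
P_w\big(X_l = x_l \mid MB_x(X_l)\big)
  = \frac{\exp\big(\sum_i w_i\, n_i(x)\big)}
         {\exp\big(\sum_i w_i\, n_i(x_{[X_l=0]})\big) + \exp\big(\sum_i w_i\, n_i(x_{[X_l=1]})\big)}
```

If clause $j$ does not contain the predicate of $X_l$, then $n_j$ takes the same value in all three terms, so the factor $\exp(w_j n_j)$ cancels between numerator and denominator. That is why such clauses can be ignored when computing this CLL.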
26
26 Speedup Techniques
- A ground predicate's CLL is affected only by the clauses that contain it
- Most clause weights do not change significantly, so most CLLs do not change much
- We don't have to recompute all CLLs:
  - Store WPLL and CLLs
  - Recompute a CLL only if the weights affecting it change beyond some threshold
  - Subtract old CLLs from, and add new CLLs to, WPLL
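The caching idea on this slide can be sketched as follows. Everything named here is hypothetical: `compute_cll` stands in for the real CLL computation, `clauses_of` maps each ground predicate to the indices of the clauses that contain it, and `coef` holds the per-grounding WPLL weight.

```python
from typing import Callable, Dict, List, Sequence


class CachedWpll:
    """Stores WPLL and per-grounding CLLs; refreshes a CLL only when a
    weight that affects it has moved by more than `threshold`."""

    def __init__(
        self,
        compute_cll: Callable[[str, Sequence[float]], float],
        clauses_of: Dict[str, List[int]],
        coef: Dict[str, float],
        weights: Sequence[float],
        threshold: float = 1e-3,
    ) -> None:
        self.compute_cll = compute_cll
        self.clauses_of = clauses_of
        self.coef = coef
        self.threshold = threshold
        self.cll: Dict[str, float] = {}
        self.seen_w: Dict[str, Dict[int, float]] = {}
        self.wpll = 0.0
        self.recomputed = 0  # counts CLL evaluations, for illustration
        for g in clauses_of:
            self._refresh(g, weights)

    def _refresh(self, g: str, weights: Sequence[float]) -> None:
        old = self.cll.get(g, 0.0)
        new = self.compute_cll(g, weights)
        self.wpll += self.coef[g] * (new - old)  # subtract old, add new
        self.cll[g] = new
        self.seen_w[g] = {i: weights[i] for i in self.clauses_of[g]}
        self.recomputed += 1

    def update(self, weights: Sequence[float]) -> float:
        for g, idxs in self.clauses_of.items():
            if any(abs(weights[i] - self.seen_w[g][i]) > self.threshold for i in idxs):
                self._refresh(g, weights)
        return self.wpll
```

The result is approximate by design: CLLs whose weights moved by less than the threshold keep their stale values.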
27
27 Speedup Techniques
- WPLL is a sum over all ground predicates
- Estimate WPLL by:
  - Uniformly sampling groundings of each FO predicate
  - Sampling x% of the groundings, subject to a minimum and maximum
  - Extrapolating the average
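A minimal sketch of the subsampling estimate, assuming each predicate's term in WPLL is the average CLL over its groundings (weight $1/g_r$), so that the sample average is itself the extrapolated term. The function name, the callback `cll`, and the default bounds are illustrative assumptions.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional


def estimate_wpll(
    groundings_by_pred: Dict[str, List[Hashable]],
    cll: Callable[[Hashable], float],
    frac: float = 0.1,
    min_n: int = 1,
    max_n: int = 1000,
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate WPLL from a uniform sample of each predicate's groundings."""
    rng = rng or random.Random(0)
    total = 0.0
    for groundings in groundings_by_pred.values():
        if not groundings:
            continue
        # Sample a fraction of the groundings, clamped to [min_n, max_n].
        n = max(min_n, min(max_n, int(round(frac * len(groundings)))))
        n = min(n, len(groundings))
        sample = rng.sample(groundings, n)
        total += sum(cll(g) for g in sample) / n
    return total
```

With `frac=1.0` and a large enough `max_n`, every grounding is used and the estimate equals the exact value.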
28
28 Speedup Techniques
- WPLL and its gradient require computing the number of true groundings of a clause
  - #P-complete problem
- Karp & Luby (1983)'s Monte Carlo algorithm
  - Gives an estimate that is within ε of the true value with probability 1 − δ
  - Draws samples of a clause's groundings
- Found that the estimate converges faster than the algorithm specifies
  - Use a convergence test (DeGroot & Schervish 2002) after every 100 samples
  - Earlier termination
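To make the sampling idea concrete, here is a deliberately simplified sketch. It is plain uniform Monte Carlo sampling of groundings, not Karp & Luby's algorithm, and the stopping rule (the running fraction moves by less than `tol` between checks) is a stand-in for the convergence test cited on the slide. `domains` and `is_true` are hypothetical inputs: the constants each variable may take, and a truth test for one grounding.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def estimate_true_groundings(
    domains: Sequence[Sequence[object]],
    is_true: Callable[[Tuple[object, ...]], bool],
    max_samples: int = 10000,
    check_every: int = 100,
    tol: float = 0.01,
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate how many groundings of a clause are true by uniform sampling."""
    rng = rng or random.Random(0)
    total = 1
    for d in domains:
        total *= len(d)  # number of possible groundings
    hits = 0
    n = 0
    prev = None
    while n < max_samples:
        n += 1
        grounding = tuple(rng.choice(d) for d in domains)
        hits += bool(is_true(grounding))
        if n % check_every == 0:
            frac = hits / n
            # Assumed stopping rule: the estimate has stopped moving.
            if prev is not None and abs(frac - prev) < tol:
                break
            prev = frac
    return total * hits / n
```

Checking only every 100 samples keeps the test cheap and lets easy clauses terminate after a few hundred samples instead of the full budget.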
29
29 Speedup Techniques
- L-BFGS is used to learn the clause weights that optimize WPLL
- Two parameters:
  - Maximum number of iterations
  - Convergence threshold
- When evaluating a candidate clause's gain, use a smaller maximum number of iterations and a looser convergence threshold
  - Faster termination
30
30 Speedup Techniques
- Lexicographic ordering on clauses
  - Avoids redundant computations for clauses that are syntactically the same
  - Does not detect semantically identical but syntactically different clauses (NP-complete problem)
- Cache new clauses
  - Avoids recomputation
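One way to picture the ordering-plus-cache idea: sort a clause's literals to obtain a canonical key, then look gains up by that key. This is a sketch under assumptions: literals are plain strings, only reordered copies are recognized (renamed variables are not), and `gain_fn` is a hypothetical stand-in for the expensive gain computation.

```python
from typing import Callable, Dict, List, Tuple


def canonical_key(clause: List[str]) -> Tuple[str, ...]:
    """Sort literals lexicographically so reordered copies share one key."""
    return tuple(sorted(clause))


def cached_gain(
    clause: List[str],
    gain_fn: Callable[[List[str]], float],
    cache: Dict[Tuple[str, ...], float],
) -> float:
    """Compute the gain once per canonical clause and reuse it afterwards."""
    key = canonical_key(clause)
    if key not in cache:
        cache[key] = gain_fn(clause)
    return cache[key]
```

Two clauses that differ only in literal order, such as `!Student(S) v AdvBy(S,P)` and `AdvBy(S,P) v !Student(S)`, map to the same key, so the gain is computed once.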
31
31 Speedup Techniques
- Also used R&D04 techniques for the WPLL gradient:
  - Ignore predicates that don't appear in the i-th formula
  - Ignore ground formulas whose truth value is unaffected by changing the truth value of any literal
  - The number of true groundings of a clause is computed once and cached
32
32 Overview
- Introduction
- CLAUDIEN, CRFs
- Algorithm
- Evaluation Measure
- Clause Construction
- Search Strategies
- Speedup Techniques
- Experiments
33
33 Experiments
- UW-CSE domain
  - 22 predicates, e.g. AdvisedBy, Professor
  - 10 types, e.g. Person, Course, Quarter
  - Total number of ground predicates: about 4 million
  - Number of true ground predicates (in DB): 3212
  - Handcrafted KB with 94 formulas, e.g.:
    - Each student has at most one advisor
    - If a student is an author of a paper, so is her advisor
34
34 Experiments
- Cora domain
  - 1295 citations to 112 CS research papers
  - Author, Venue, Title, Year fields
  - 5 predicates: SameCitation, SameAuthor, SameVenue, SameTitle, SameYear
  - Evidence predicates, e.g. WordsInCommonInTitle20%(title1, title2)
  - Total number of ground predicates: about 5 million
  - Number of true ground predicates (in DB): 378,589
  - Handcrafted KB with 26 clauses, e.g.:
    - If two citations are the same, then they have the same authors, titles, etc., and vice versa
    - If two titles have many words in common, then they are the same
35
35 Systems
- MLN(KB): weight-learning applied to the handcrafted KB
- MLN(CL): structure-learning with CLAUDIEN; weight-learning
- MLN(KB+CL): structure-learning with CLAUDIEN, using the handcrafted KB as its language bias; weight-learning
- MLN(SLB): structure-learning with beam search, starting from an empty MLN
- MLN(KB+SLB): ditto, starting from the handcrafted KB
- MLN(SLB+KB): structure-learning with beam search, starting from an empty MLN, allowing handcrafted clauses to be added in a first search step
- MLN(SLS): structure-learning with SFS, starting from an empty MLN
36
36 Systems
- CL: CLAUDIEN alone
- KB: handcrafted KB alone
- KB+CL: CLAUDIEN with the KB as its language bias
- NB: naïve Bayes
- BN: Bayesian networks
37
37 Methodology
- UW-CSE domain
  - DB divided into 5 areas: ai, graphics, languages, systems, theory
  - Leave-one-out testing by area
- Cora domain
  - 5 different train-test splits
- Measured:
  - Average CLL of the predicates
  - Average area under the precision-recall curve of the predicates (AUC)
38
38 Results
- MLN(SLS), MLN(SLB) better than MLN(CL), MLN(KB), CL, KB, NB, BN
- [Charts: negative CLL and AUC]
39
39 Results
- MLN(SLS), MLN(SLB) better than MLN(CL), MLN(KB), CL, KB, NB, BN
- [Charts: negative CLL and AUC]
40
40 Results
- MLN(SLB+KB) better than MLN(KB+CL), KB+CL
- [Charts: negative CLL and AUC]
41
41 Results
- MLN(SLB+KB) better than MLN(KB+CL), KB+CL
- [Charts: negative CLL and AUC]
42
42 Results
- MLN( ) does better than corresponding
- [Charts: negative CLL and AUC]
43
43 Results
- MLN( ) does better than corresponding
- [Charts: negative CLL and AUC]
44
44 Results
- MLN(SLS) on UW-CSE, on a cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
  - With speedups: 5.3 hrs
  - Without speedups: didn't finish running in 24 hrs
- MLN(SLB) on UW-CSE, on a single 2.8 GHz Pentium 4 machine
  - With speedups: 8.8 hrs
  - Without speedups: 13.7 hrs
45
45 Future Work
- Speeding up the counting of the number of true groundings of a clause
- Probabilistically bounding the loss in accuracy due to subsampling
- Probabilistic predicate discovery
46
46 Conclusion
- Developed an algorithm that:
  - Learns FO clauses by directly optimizing pseudo-likelihood
  - Is fast enough
  - Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches