1 Learning the Structure of Markov Logic Networks Stanley Kok.


1 Learning the Structure of Markov Logic Networks
Stanley Kok

2 Overview
- Introduction
- CLAUDIEN, CRFs
- Algorithm
  - Evaluation Measure
  - Clause Construction
  - Search Strategies
  - Speedup Techniques
- Experiments

3 Introduction
- Richardson & Domingos (2004) learned MLN structure in two disjoint steps:
  - Learn first-order (FO) clauses with an off-the-shelf ILP system (CLAUDIEN)
  - Learn clause weights by optimizing pseudo-likelihood
- We develop an algorithm that:
  - Learns FO clauses by directly optimizing pseudo-likelihood
  - Is fast enough
  - Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches

4 CLAUDIEN
- CLAUsal DIscovery ENgine
- Starts with the trivially false clause
- Repeatedly refines the current clauses by adding literals
- Adds clauses that satisfy minimum accuracy and coverage to the KB
(Figure: refinement lattice over atoms m, f, h: true ⇒ false; m ⇒ false; f ⇒ false; h ⇒ false; m ∧ f ⇒ false; m ⇒ h; m ⇒ f; m ∧ h ⇒ false; f ⇒ h; f ⇒ m; f ∧ h ⇒ false; h ⇒ f; h ⇒ m; h ⇒ m v f)
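To make the refinement idea concrete, here is a minimal Python sketch over the three propositional atoms m, f, h from the lattice on this slide. The (body, head) representation, the rule that head atoms are added only once the body is non-empty (which reproduces the lattice), and the filter that keeps clauses true in every example (a stand-in for CLAUDIEN's accuracy and coverage thresholds) are all assumptions, not CLAUDIEN's actual implementation.

```python
# Hypothetical sketch of CLAUDIEN-style top-down refinement over propositional atoms.
ATOMS = ("m", "f", "h")

def show(clause):
    """Render (body, head) as 'body => head', using 'true'/'false' for empty sides."""
    body, head = clause
    lhs = "^".join(sorted(body)) or "true"
    rhs = " v ".join(sorted(head)) or "false"
    return f"{lhs} => {rhs}"

def refine(clause):
    """Add one unused atom to the body, or to the head if the body is non-empty."""
    body, head = clause
    out = []
    for a in ATOMS:
        if a in body or a in head:
            continue
        out.append((body | {a}, head))        # add a condition: more specific body
        if body:
            out.append((body, head | {a}))    # add an alternative conclusion
    return out

def holds(clause, example):
    """True in an example (set of true atoms) unless the body holds and no head atom does."""
    body, head = clause
    return not body <= example or bool(head & example)

top = (frozenset(), frozenset())              # true => false, the trivially false clause
examples = [frozenset({"m", "h"}), frozenset({"f"}), frozenset()]
level1 = refine(top)
print([show(c) for c in level1])
m_false = level1[0]                           # m => false
valid = [show(c) for c in refine(m_false) if all(holds(c, e) for e in examples)]
print(valid)
```

Refining `m => false` yields the lattice's next level (m ∧ f ⇒ false, m ⇒ f, m ∧ h ⇒ false, m ⇒ h); only the clauses consistent with the toy examples survive the filter.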

5 CLAUDIEN
- Language bias ≡ clause template
- Can be used to refine a handcrafted KB
- Example: Professor(P) ⇐ AdvisedBy(S,P) is in the KB
  dlab_template(‘1-2:[Professor(P),Student(S)] <- AdvisedBy(S,P)’)
  yields Professor(P) v Student(S) ⇐ AdvisedBy(S,P)
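A simplified reading of the template's `1-2:[...]` part is "choose between 1 and 2 of the listed literals for the head". The sketch below enumerates the clauses that reading produces; `expand_template` is a hypothetical helper, and real DLAB templates support considerably more than this.

```python
from itertools import combinations

def expand_template(lo, hi, head_choices, body):
    """Enumerate clauses 'h_1 v ... v h_k <- body' for every subset of head_choices with lo <= k <= hi."""
    clauses = []
    for k in range(lo, hi + 1):
        for heads in combinations(head_choices, k):
            clauses.append(" v ".join(heads) + " <- " + body)
    return clauses

for clause in expand_template(1, 2, ["Professor(P)", "Student(S)"], "AdvisedBy(S,P)"):
    print(clause)
```

The last clause generated is the generalization of the KB rule shown on the slide.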

6 Conditional Random Fields
- Markov networks used to compute P(y|x) (McCallum 2003)
- Model: log-linear in features f_k, e.g., "current word is capitalized and next word is Inc"
(Figure: linear-chain CRF with labels y1, y2, y3, …, yn-1, yn over observations x1, x2, …, xn; example input "IBM hired Alice…" with labels Org, Person, Misc)
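For a linear chain, the usual CRF form is P(y|x) = (1/Z_x) exp(Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, x, t)). The sketch below evaluates it by brute force on a tiny input. The label set, the choice to conjoin the slide's example feature with the label Org, and the weight 2.0 are assumptions made for illustration.

```python
import math
from itertools import product

LABELS = ("Org", "Person", "Misc")

def f_cap_next_inc(y_prev, y, x, t):
    """Hypothetical feature: current word capitalized, next word is 'Inc', current label is Org."""
    return 1.0 if x[t][:1].isupper() and t + 1 < len(x) and x[t + 1] == "Inc" and y == "Org" else 0.0

def score(y, x, features):
    """Sum over positions t and features k of lambda_k * f_k(y_{t-1}, y_t, x, t)."""
    total = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else None
        for lam, f in features:
            total += lam * f(y_prev, y[t], x, t)
    return total

def conditional_prob(y, x, features):
    """P(y|x) = exp(score) / Z(x), with Z(x) summed by brute force over all label sequences."""
    z = sum(math.exp(score(ys, x, features)) for ys in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x, features)) / z

features = [(2.0, f_cap_next_inc)]
x = ["Acme", "Inc"]
print(conditional_prob(("Org", "Misc"), x, features))
print(conditional_prob(("Person", "Misc"), x, features))
```

Brute-force normalization is exponential in sentence length; practical CRFs compute Z(x) with dynamic programming over the chain.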

7 CRF – Feature Induction
- Set of atomic features (word=the, capitalized, etc.)
- Starts from an empty CRF
- While the convergence criterion is not met:
  - Create a list of new features consisting of
    - Atomic features
    - Binary conjunctions of atomic features
    - Conjunctions of atomic features with features already in the model
  - Evaluate the gain in P(y|x) of adding each feature to the model
  - Add the best K features to the model (100s-1000s of features)

8 Algorithm
- High-level algorithm:
    Repeat
      Clauses <- FindBestClauses(MLN)
      Add Clauses to MLN
    Until Clauses = ∅
- FindBestClauses(MLN):
    Search for, and create, candidate clauses
    For each candidate clause c
      Compute gain (evaluation measure) of adding c to MLN
    Return k clauses with highest gain
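The high-level loop can be sketched in Python as follows. `find_best_clauses`, `learn_structure`, and the `make_candidates`/`gain` callbacks are hypothetical names; in the real system the gain comes from retraining weights and measuring the evaluation measure, which this sketch leaves to the callback. It keeps only positive-gain clauses (as the SFS slides do), which guarantees the loop terminates once no candidate helps.

```python
def find_best_clauses(mln, make_candidates, gain, k):
    """Score each new candidate by gain(mln, c); return up to k clauses with positive gain, best first."""
    scored = [(gain(mln, c), c) for c in make_candidates(mln) if c not in mln]
    scored = [(g, c) for g, c in scored if g > 0]
    scored.sort(key=lambda gc: gc[0], reverse=True)   # sort on gain only; ties keep candidate order
    return [c for _, c in scored[:k]]

def learn_structure(mln, make_candidates, gain, k=5):
    """Repeat: Clauses <- FindBestClauses(MLN); add Clauses to MLN; until Clauses is empty."""
    mln = list(mln)
    while True:
        clauses = find_best_clauses(mln, make_candidates, gain, k)
        if not clauses:
            return mln
        mln.extend(clauses)

# Toy usage with hypothetical fixed gains (a real gain would retrain weights and measure WPLL).
toy_gain = {"A": 3.0, "B": 1.0, "C": -1.0, "D": 2.0}
result = learn_structure([], lambda mln: list(toy_gain), lambda mln, c: toy_gain[c], k=2)
print(result)
```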

9 Evaluation Measure
- Ideally, use the log-likelihood, but it is slow to compute
- Recall: its value and gradient
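For an MLN with weights w_i, where n_i(x) is the number of true groundings of formula i in world x, the standard forms are log P_w(X=x) = Σ_i w_i n_i(x) - log Z and ∂/∂w_i log P_w(X=x) = n_i(x) - E_w[n_i(x)]. The sketch below computes both exactly for a toy model by enumerating every world; that 2^n enumeration (or, in practice, the inference that replaces it) is what makes this measure slow. The list-of-callbacks model representation is an assumption.

```python
import math
from itertools import product

def log_likelihood_and_gradient(weights, counts, x):
    """Exact log P_w(X=x) and its gradient for a tiny MLN, by enumerating all 2^n worlds.

    weights[i] is w_i; counts[i](world) returns n_i(world), the number of true groundings
    of formula i in that world (a tuple of booleans, one per ground atom)."""
    worlds = list(product((False, True), repeat=len(x)))

    def total_weight(world):
        return sum(w * n(world) for w, n in zip(weights, counts))

    scores = [total_weight(wd) for wd in worlds]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))   # stable log-sum-exp
    value = total_weight(x) - log_z
    probs = [math.exp(s - log_z) for s in scores]
    grad = [n(x) - sum(p * n(wd) for p, wd in zip(probs, worlds)) for n in counts]
    return value, grad

def implies(world):
    """n(world) for one ground formula, atom0 => atom1."""
    return 1 if (not world[0]) or world[1] else 0

value, grad = log_likelihood_and_gradient([0.0], [implies], (True, True))
print(value, grad)
```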

10 Evaluation Measure
- Use the pseudo-log-likelihood (R&D 2004) instead, but it gives undue weight to predicates with a large number of groundings
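The PLL replaces the likelihood with a sum of conditional log-likelihoods (CLLs), PLL(w, x) = Σ_l log P_w(X_l = x_l | MB_x(X_l)). Two worlds that differ only in atom l share every term not involving l, so each CLL needs just two weighted-count evaluations, as in this minimal sketch (same toy callback representation as an assumption). A predicate with many groundings contributes many such terms, which is the imbalance noted on the slide.

```python
import math

def total_weight(weights, counts, world):
    """Sum_i w_i * n_i(world)."""
    return sum(w * n(world) for w, n in zip(weights, counts))

def cll(weights, counts, x, l):
    """log P_w(X_l = x_l | MB_x(X_l)): compare world x with the world where only atom l is flipped."""
    flipped = list(x)
    flipped[l] = not flipped[l]
    d = total_weight(weights, counts, tuple(flipped)) - total_weight(weights, counts, x)
    # log(e^a / (e^a + e^b)) = -log(1 + e^d) with d = b - a, computed without overflow
    return -(max(d, 0.0) + math.log1p(math.exp(-abs(d))))

def pll(weights, counts, x):
    """Pseudo-log-likelihood: sum of the CLLs of all ground atoms."""
    return sum(cll(weights, counts, x, l) for l in range(len(x)))

def implies(world):
    """n(world) for one ground formula, atom0 => atom1."""
    return 1 if (not world[0]) or world[1] else 0

print(pll([1.0], [implies], (True, True)))
```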

11 Evaluation Measure
- Use the weighted pseudo-log-likelihood (WPLL)
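A minimal sketch of the weighting, assuming per-grounding CLLs are already available and assuming c_r = 1/g_r (g_r = number of groundings of first-order predicate r), so that WPLL = Σ_r c_r Σ_k CLL_{r,k} gives every predicate its average CLL. The predicate names reuse UW-CSE predicates, but the CLL values are made up for illustration.

```python
def pll_from_clls(clls_by_pred):
    """Plain PLL: every ground atom's CLL counts equally."""
    return sum(sum(values) for values in clls_by_pred.values())

def wpll_from_clls(clls_by_pred, coeffs=None):
    """WPLL = Sum_r c_r * Sum_k CLL_{r,k}; c_r defaults to 1/g_r (an assumption); lists must be non-empty."""
    total = 0.0
    for pred, values in clls_by_pred.items():
        c_r = coeffs[pred] if coeffs is not None else 1.0 / len(values)
        total += c_r * sum(values)
    return total

# Hypothetical CLLs: many well-predicted AdvisedBy groundings, few poorly predicted Professor ones.
clls = {"AdvisedBy": [-0.1] * 1000, "Professor": [-2.0] * 10}
print(pll_from_clls(clls))    # the 1000 AdvisedBy terms contribute -100 of the total
print(wpll_from_clls(clls))   # each predicate contributes its average CLL
```

Under plain PLL, improving Professor predictions barely moves the score; under WPLL both predicates matter equally.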

12 Algorithm (high-level algorithm, same as slide 8)

13 Clause Construction
- Add a literal (negative or positive)
  - In all possible ways the variables of the new literal can be shared with those of the clause
  - e.g., !Student(S) v AdvBy(S,P)
- Remove a literal (when refining the MLN)
  - Removes spurious conditions from rules
  - e.g., !Student(S) v !YrInPgm(S,5) v TA(S,C) v TmpAdvBy(S,P)

14 Clause Construction
- Flip signs of literals (when refining the MLN)
  - Moves literals that are on the wrong side of the implication
  - e.g., !CseQtr(C1,Q1) v !CseQtr(C2,Q2) v !SameCse(C1,C2) v !SameQtr(Q1,Q2)
  - Used at the beginning of the algorithm; expensive, so optional
- Limit the number of distinct variables to restrict the search space
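The three clause-construction operators can be sketched in Python as below. The tuple representation, the requirement that a new literal share at least one variable with the clause, the fresh-variable names New0, New1, and the default limit of 4 distinct variables are all assumptions; argument types are ignored.

```python
from itertools import product

# A literal is (positive, predicate, args); a clause is a list of literals (a disjunction).

def variables(clause):
    return sorted({v for _, _, args in clause for v in args})

def add_literal(clause, pred, arity, max_vars=4):
    """Add pred (both signs) in every way its arguments can reuse clause variables or take fresh ones."""
    existing = variables(clause)
    options = [existing + [f"New{i}"] for i in range(arity)]   # hypothetical fresh-variable names
    results = []
    for args in product(*options):
        if not any(a in existing for a in args):
            continue   # require at least one variable shared with the clause
        if len(set(existing) | set(args)) > max_vars:
            continue   # limit the number of distinct variables
        for positive in (False, True):
            results.append(clause + [(positive, pred, args)])
    return results

def remove_literal(clause):
    """All clauses with exactly one literal dropped."""
    return [clause[:i] + clause[i + 1:] for i in range(len(clause))]

def flip_signs(clause):
    """All clauses with exactly one literal's sign flipped."""
    return [clause[:i] + [(not s, p, a)] + clause[i + 1:] for i, (s, p, a) in enumerate(clause)]

def show(clause):
    return " v ".join(("" if s else "!") + f"{p}({','.join(a)})" for s, p, a in clause)

start = [(False, "Student", ("S",))]   # !Student(S)
grown = add_literal(start, "AdvBy", 2)
print([show(c) for c in grown])
print([show(c) for c in flip_signs(grown[0])])
```

The clause `!Student(S) v AdvBy(S,New1)` in the output is the slide's `!Student(S) v AdvBy(S,P)` up to the name of the fresh variable.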

15 Algorithm (high-level algorithm, same as slide 8)

16 Search Strategies
- Shortest-first search (SFS)
  1. Find the gain of each clause
  2. Sort clauses by gain
  3. Return the top 5 with positive gain
  4. Add the 5 clauses to the MLN
  5. Retrain the weights of the MLN
- For the new candidate set (e.g., !AdvBy(S,P) v Stu(S)):
  1. Find the gain of each clause
  2. Sort them by gain (Yikes! All length-2 clauses have gains ≤ 0)
(Figure: MLN with weighted clauses wt1, !AdvBy(S,P); wt2, clause2; …; and a candidate set)

17 Shortest-First Search
a. Extend the 20 length-2 clauses with the highest gains
b. Form a new candidate set
c. Keep the 1000 clauses with the highest gains
(Figure: MLN with wt1, !AdvBy(S,P); wt2, clause2; …; candidate !AdvBy(S,P) v Stu(S) extended to !AdvBy(S,P) v Stu(S) v Prof(P))

18 Shortest-First Search
- Shortest-first search (SFS)
  - Repeat the process
  - Extend all length-2 clauses before length-3 ones
(Figure: MLN with wt1, clause1; wt2, clause2; …; and a candidate set)
- How do you refine a non-empty MLN?

19 SFS – MLN Refinement
a. Extend the 20 length-2 clauses with the highest gains
b. Extend the length-2 clauses in the MLN
c. Remove a predicate from the length-4 clauses in the MLN
d. Flip the signs of the length-3 clauses in the MLN (optional)
e. Clauses produced by b, c, and d replace the original clauses in the MLN
(Figure: MLN with wt1, !AdvBy(S,P); wt2, clause2; …; wtA, clauseA; wtB, clauseB; …)

20 Search Strategies
- Beam search
  1. Keep a beam of the 5 clauses with the highest gains
  2. Track the best clause
  3. Stop when the best clause does not change for two consecutive iterations
(Figure: MLN with wt1, clause1; wt2, clause2; …; wtA, clauseA; wtB, clauseB; …)
- How do you refine a non-empty MLN?
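A minimal beam-search sketch with the slide's settings (beam of 5, stop after two consecutive iterations without a change in the best clause). `refine` and `gain` are hypothetical callbacks; the toy below uses strings in place of clauses and an invented gain function purely to exercise the control flow. Ties are broken by comparing the clauses themselves, so clauses must be orderable.

```python
def beam_search(initial, refine, gain, beam_width=5, patience=2):
    """Keep the beam_width best refinements each step and track the best clause seen;
    stop once the best clause has not changed for `patience` consecutive iterations."""
    def rank(c):
        return (gain(c), c)   # tie-break on the clause itself for determinism

    beam = sorted(initial, key=rank, reverse=True)[:beam_width]
    best, unchanged = beam[0], 0
    while unchanged < patience:
        candidates = {r for c in beam for r in refine(c)}
        if not candidates:
            break
        beam = sorted(candidates, key=rank, reverse=True)[:beam_width]
        if gain(beam[0]) > gain(best):
            best, unchanged = beam[0], 0
        else:
            unchanged += 1
    return best

# Toy problem: "clauses" are strings over "ab"; the invented gain prefers length 3 and many a's.
def refine(s):
    return [s + ch for ch in "ab"] if len(s) < 6 else []

def gain(s):
    return -(len(s) - 3) ** 2 + s.count("a")

print(beam_search([""], refine, gain))
```

Because only a strict improvement resets the counter, a longer clause with equal gain does not replace the current best.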

21 Algorithm (high-level algorithm, same as slide 8)

22 Difference from CRF – Feature Induction
- Set of atomic features (word=the, capitalized, etc.)
- Start from an empty CRF
- While the convergence criterion is not met:
  - Create a list of new features consisting of
    - Atomic features
    - Binary conjunctions of atomic features
    - Conjunctions of atomic features with features already in the model
  - Evaluate the gain in P(y|x) of adding each feature to the model
  - Add the best K features to the model (100s-1000s of features)
- Differences in our approach:
  - We can refine a non-empty MLN
  - We use pseudo-likelihood, with different optimizations
  - Applicable to arbitrary Markov networks (not only linear chains)
  - Maintain a separate candidate set
  - Add the best ≈10s of clauses to the model
  - Flexible enough to fit into different search algorithms

23 Overview (same outline as slide 2)

24 Speedup Techniques
- Recall: FindBestClauses(MLN)
    Search for, and create, candidate clauses
    For each candidate clause c
      Compute gain (WPLL) of adding c to MLN
    Return k clauses with highest gain
- LearnWeights(MLN+c) optimizes WPLL with L-BFGS
- L-BFGS computes the value and gradient of WPLL
- There are many candidate clauses, so it is important to compute WPLL and its gradient efficiently

25 Speedup Techniques
- WPLL: a weighted sum of the CLLs of all ground predicates
- When computing a ground predicate's CLL, ignore clauses in which its predicate does not appear
  - e.g., predicate l does not appear in clause 1

26 Speedup Techniques
- A ground predicate's CLL is affected only by the clauses that contain it
- Most clause weights do not change significantly
  - So most CLLs do not change much
  - We don't have to recompute all CLLs
- Store the WPLL and the CLLs
  - Recompute a CLL only if the weights affecting it change beyond some threshold
  - Subtract the old CLLs from the WPLL and add the new ones
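The caching scheme can be sketched as a small class. `IncrementalWPLL`, the `affecting` map, and the `cll_fn` callback are hypothetical names, and the toy CLL is a stand-in rather than a real conditional likelihood. Each clause keeps the weight its cached CLLs were last computed with, so small drifts accumulate until they cross the threshold.

```python
class IncrementalWPLL:
    """Caches each ground atom's CLL and the weighted total; on a weight update, recomputes only
    the CLLs of atoms whose clauses' weights moved more than `threshold` since last used."""

    def __init__(self, affecting, cll_fn, coeff, weights, threshold=1e-4):
        self.affecting = affecting        # atom -> indices of the clauses that contain it
        self.cll_fn = cll_fn              # hypothetical callback: (atom, weights) -> CLL of that atom
        self.coeff = coeff                # atom -> c_r of the atom's predicate
        self.threshold = threshold
        self.ref_weights = list(weights)  # per-clause weights the cached CLLs reflect
        self.cll = {a: cll_fn(a, self.ref_weights) for a in affecting}
        self.total = sum(coeff[a] * v for a, v in self.cll.items())
        self.recomputed = 0

    def update(self, new_weights):
        changed = {i for i, (old, new) in enumerate(zip(self.ref_weights, new_weights))
                   if abs(new - old) > self.threshold}
        for atom, clauses in self.affecting.items():
            if changed.intersection(clauses):
                new_value = self.cll_fn(atom, new_weights)
                self.total += self.coeff[atom] * (new_value - self.cll[atom])   # subtract old, add new
                self.cll[atom] = new_value
                self.recomputed += 1
        for i in changed:
            self.ref_weights[i] = new_weights[i]
        return self.total

affecting = {"a": [0], "b": [1], "c": [0, 1]}

def toy_cll(atom, w):
    # Stand-in for a real CLL: depends only on the weights of clauses containing the atom.
    return -sum(w[i] for i in affecting[atom])

cache = IncrementalWPLL(affecting, toy_cll, {"a": 1.0, "b": 1.0, "c": 1.0}, [1.0, 2.0])
print(cache.update([1.5, 2.0]), cache.recomputed)       # only atoms a and c are recomputed
print(cache.update([1.5, 2.00001]), cache.recomputed)   # change below threshold: nothing recomputed
```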

27 Speedup Techniques
- WPLL is a sum over all ground predicates
- Estimate WPLL instead:
  - Uniformly sample groundings of each FO predicate
  - Sample x% of its groundings, subject to a minimum and a maximum
  - Extrapolate the average
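A sketch of the subsampling estimate, assuming per-grounding CLLs come from a callback and assuming c_r = 1/g_r for the weighted sum. The function names and the defaults (10%, minimum 10, maximum 1000) are illustrative assumptions, not the settings used in the experiments.

```python
import random

def sample_size(g, fraction, min_n, max_n):
    """x% of the g groundings, clamped to [min_n, max_n] and never more than g."""
    return min(g, max(min_n, min(max_n, int(fraction * g))))

def estimate_cll_sum(groundings, cll_fn, fraction=0.1, min_n=10, max_n=1000, rng=None):
    """Estimate Sum_k CLL_k over one predicate's (non-empty) groundings from a uniform sample."""
    rng = rng or random.Random(0)
    n = sample_size(len(groundings), fraction, min_n, max_n)
    sample = rng.sample(groundings, n)                # uniform, without replacement
    mean = sum(cll_fn(gnd) for gnd in sample) / n
    return len(groundings) * mean                     # extrapolate the average to all groundings

def estimate_wpll(groundings_by_pred, cll_fn, **kw):
    """WPLL estimate with c_r = 1/g_r (assumed), so each predicate contributes its estimated mean CLL."""
    return sum(estimate_cll_sum(gs, cll_fn, **kw) / len(gs) for gs in groundings_by_pred.values())

print(sample_size(10_000, 0.1, 10, 1000), sample_size(50, 0.1, 10, 1000), sample_size(5, 0.1, 10, 1000))
```

With a cap on the sample size, the cost per predicate stays bounded no matter how many groundings it has, at the price of estimation error.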

28 Speedup Techniques
- WPLL and its gradient require computing the number of true groundings of a clause
  - A #P-complete problem
- Karp & Luby (1983)'s Monte-Carlo algorithm
  - Gives an estimate that is within ε of the true value with probability 1-δ
  - Draws samples of a clause
- Found that the estimate converges faster than the algorithm specifies
  - Use a convergence test (DeGroot & Schervish 2002) after every 100 samples
  - Earlier termination
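Below is a minimal sketch of a Karp-Luby-style union estimator, assuming one set S_i per literal of the clause (the groundings in which that literal is true), so that the union of the S_i is exactly the clause's set of true groundings. It needs |S_i|, uniform sampling from each S_i, and membership tests, supplied here as hypothetical callbacks over explicit toy sets. It uses a fixed sample count rather than the (ε, δ) bound or the convergence test described on the slide.

```python
import random

def karp_luby_union_size(sizes, sample_from, contains, num_samples, rng):
    """Monte-Carlo estimate of |S_1 u ... u S_m| (Karp-Luby style).

    sizes[i] = |S_i|; sample_from(i, rng) draws a uniform element of S_i; contains(j, x)
    tests x in S_j. Each union element is counted once, via the lowest-index set containing
    it, so total * (fraction of such hits) is an unbiased estimate of the union size."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    indices = range(len(sizes))
    hits = 0
    for _ in range(num_samples):
        i = rng.choices(indices, weights=sizes)[0]        # P(i) = |S_i| / total
        x = sample_from(i, rng)
        if not any(contains(j, x) for j in range(i)):     # i is the first set containing x
            hits += 1
    return total * hits / num_samples

# Toy check: explicit integer sets stand in for the groundings that satisfy each literal.
sets = [set(range(0, 60)), set(range(40, 100)), set(range(90, 150))]
members = [sorted(s) for s in sets]
estimate = karp_luby_union_size(
    [len(s) for s in sets],
    lambda i, rng: rng.choice(members[i]),
    lambda j, x: x in sets[j],
    num_samples=20_000,
    rng=random.Random(1),
)
print(estimate, len(set().union(*sets)))   # estimate vs. the exact count of 150
```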

29 Speedup Techniques
- L-BFGS is used to learn clause weights that optimize WPLL
- Two parameters:
  - Maximum number of iterations
  - Convergence threshold
- Use a smaller maximum number of iterations and looser convergence thresholds when evaluating a candidate clause's gain
  - Faster termination

30 Speedup Techniques
- Lexicographic ordering on clauses
  - Avoids redundant computations for clauses that are syntactically the same
  - Doesn't detect semantically identical but syntactically different clauses (NP-complete problem)
- Cache new clauses
  - Avoids recomputation
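A minimal sketch of a lexicographic canonical form combined with a gain cache. `canonical` and `GainCache` are hypothetical names, and literals are plain strings here. As the slide notes, purely syntactic keys miss some equivalent clauses; this sketch does not even normalize variable names.

```python
def canonical(clause):
    """Syntactic key: the clause's literals in lexicographic order (no variable renaming)."""
    return tuple(sorted(clause))

class GainCache:
    """Memoizes gain(clause) under the canonical key, so reordered duplicates are scored once."""

    def __init__(self, compute_gain):
        self.compute_gain = compute_gain
        self.cache = {}
        self.misses = 0

    def gain(self, clause):
        key = canonical(clause)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.compute_gain(clause)
        return self.cache[key]

cache = GainCache(lambda clause: -len(clause))   # hypothetical stand-in for a WPLL gain
cache.gain(["!Student(S)", "AdvBy(S,P)"])
cache.gain(["AdvBy(S,P)", "!Student(S)"])        # same literals, different order: cache hit
cache.gain(["!Student(X)", "AdvBy(X,P)"])        # renamed variable: not detected, recomputed
print(cache.misses)
```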

31 Speedup Techniques
- Also used R&D04 techniques for the WPLL gradient:
  - Ignore predicates that don't appear in the ith formula
  - Ignore ground formulas whose truth value is unaffected by changing the truth value of any literal
  - The number of true groundings of a clause is computed once and cached

32 Overview (same outline as slide 2)

33 Experiments
- UW-CSE domain
  - 22 predicates, e.g., AdvisedBy, Professor, etc.
  - 10 types, e.g., Person, Course, Quarter, etc.
  - Total number of ground predicates: about 4 million
  - Number of true ground predicates (in DB): 3212
  - Handcrafted KB with 94 formulas
    - Each student has at most one advisor
    - If a student is an author of a paper, so is her advisor, etc.

34 Experiments
- Cora domain
  - 1295 citations to 112 CS research papers
  - Author, Venue, Title, Year fields
  - 5 predicates, viz. SameCitation, SameAuthor, SameVenue, SameTitle, SameYear
  - Evidence predicates, e.g., WordsInCommonInTitle20%(title1, title2)
  - Total number of ground predicates: about 5 million
  - Number of true ground predicates (in DB): 378,589
  - Handcrafted KB with 26 clauses
    - If two citations are the same, then they have the same authors, titles, etc., and vice versa
    - If two titles have many words in common, then they are the same, etc.

35 Systems
- MLN(KB): weight learning applied to the handcrafted KB
- MLN(CL): structure learning with CLAUDIEN, then weight learning
- MLN(KB+CL): structure learning with CLAUDIEN, using the handcrafted KB as its language bias, then weight learning
- MLN(SLB): structure learning with beam search, starting from an empty MLN
- MLN(KB+SLB): ditto, starting from the handcrafted KB
- MLN(SLB+KB): structure learning with beam search, starting from an empty MLN, allowing handcrafted clauses to be added in the first search step
- MLN(SLS): structure learning with SFS, starting from an empty MLN

36 Systems
- CL: CLAUDIEN alone
- KB: handcrafted KB alone
- KB+CL: CLAUDIEN with the KB as its language bias
- NB: naïve Bayes
- BN: Bayesian networks

37 Methodology
- UW-CSE domain
  - DB divided into 5 areas: ai, graphics, languages, systems, theory
  - Leave-one-out testing by area
- Cora domain
  - 5 different train-test splits
- Measured
  - Average CLL of the predicates
  - Average area under the precision-recall curve of the predicates (AUC)
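The two measures can be sketched per predicate as follows, then averaged over predicates. `average_cll` and `average_precision` are hypothetical names; the step-wise average-precision form is one common way to compute area under a precision-recall curve and may differ from the interpolation the authors used.

```python
import math

def average_cll(probs, labels):
    """Mean log-probability assigned to each ground atom's actual truth value (probs are P(true))."""
    return sum(math.log(p if y else 1.0 - p) for p, y in zip(probs, labels)) / len(labels)

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    positives = sum(labels)
    tp = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            area += (tp / rank) * (1.0 / positives)   # precision at this point times recall increment
    return area

print(average_cll([0.9, 0.2], [1, 0]))
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```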

38 Results
- MLN(SLS), MLN(SLB) better than MLN(CL), MLN(KB), CL, KB, NB, BN
(Figure panels: CLL (-ve), AUC)

39 Results
- MLN(SLS), MLN(SLB) better than MLN(CL), MLN(KB), CL, KB, NB, BN
(Figure panels: CLL, AUC, CLL (-ve))

40 Results
- MLN(SLB+KB) better than MLN(KB+CL), KB+CL
(Figure panels: CLL (-ve), AUC)

41 Results
- MLN(SLB+KB) better than MLN(KB+CL), KB+CL
(Figure panels: CLL, AUC, CLL (-ve))

42 Results
- MLN( ) does better than corresponding
(Figure panels: CLL (-ve), AUC)

43 Results
- MLN( ) does better than corresponding
(Figure panels: CLL, AUC, CLL (-ve))

44 Results
- MLN(SLS) on UW-CSE; cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
  - With speedups: 5.3 hrs
  - Without speedups: didn't finish running in 24 hrs
- MLN(SLB) on UW-CSE; single 2.8 GHz Pentium 4 machine
  - With speedups: 8.8 hrs
  - Without speedups: 13.7 hrs

45 Future Work
- Speeding up counting of the number of true groundings of a clause
- Probabilistically bounding the loss in accuracy due to subsampling
- Probabilistic predicate discovery

46 Conclusion
- Developed an algorithm that:
  - Learns FO clauses by directly optimizing pseudo-likelihood
  - Is fast enough
  - Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches


