Unifying MAX SAT, Local Consistency Relaxations, and Soft Logic with Hinge-Loss MRFs
Stephen H. Bach (Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)
Modeling Relational Data with Markov Random Fields
3
3 Markov Random Fields Probabilistic model for high-dimensional data: The random variables represent the data, such as whether a person has an attribute or whether a link exists The potentials score different configurations of the data The weights scale the influence of different potentials
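As a rough illustration (not part of the talk), here is a minimal Python sketch of this definition; the variables, potentials, and weights are made up:

import itertools
import math

# Toy MRF over three binary variables: x0 = "A is a professor",
# x1 = "A advises B", x2 = "B is a student" (illustrative only).
# Each potential scores a configuration; each weight scales its influence.
potentials = [
    (2.0, lambda x: 1.0 if x[1] <= x[2] else 0.0),   # advising implies student
    (0.5, lambda x: 1.0 if x[0] == 1 else 0.0),      # prior that A is a professor
]

def unnormalized_log_prob(x):
    """log of the unnormalized probability: sum_j w_j * phi_j(x)."""
    return sum(w * phi(x) for w, phi in potentials)

# Normalize by summing over all 2^3 configurations to get P(x).
Z = sum(math.exp(unnormalized_log_prob(x)) for x in itertools.product([0, 1], repeat=3))
for x in itertools.product([0, 1], repeat=3):
    print(x, math.exp(unnormalized_log_prob(x)) / Z)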
4
4 Markov Random Fields Variables and potentials form graphical structure:
5
5 MRFs with Logic One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks Use first-order logic to define templates for potentials -Ground out weighted rules over graph data -The truth table of each ground rule is a potential -Each first-order rule has a weight that becomes the potential’s Richardson and Domingos, 2006
6
6 MRFs with Logic Let be a set of weighted logical rules, where each rule has the general form -Weights and sets and index variables Equivalent clausal form:
7
7 MRFs with Logic Probability distribution:
8
8 MAP (maximum a posteriori) inference seeks a most- probable assignment to the unobserved variables MAP inference is This MAX SAT problem is combinatorial and NP-hard! MAP Inference
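A brute-force sketch of the MAP-as-MAX-SAT view, just to make the combinatorial problem concrete; the clause sets and weights are invented for illustration:

import itertools

# Weighted clauses in the clausal form above: (weight, positive indices, negated indices).
# Toy example (made up): 2.0 : x0 v ~x1,   1.0 : x1 v x2,   0.5 : ~x0 v ~x2
clauses = [(2.0, [0], [1]), (1.0, [1, 2], []), (0.5, [], [0, 2])]
n = 3

def satisfied(pos, neg, x):
    return any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg)

def total_score(x):
    return sum(w for (w, pos, neg) in clauses if satisfied(pos, neg, x))

# MAP inference = weighted MAX SAT. Exhaustive search over 2^n assignments is
# only feasible for tiny n, which is why the relaxations below are needed.
best = max(itertools.product([0, 1], repeat=n), key=total_score)
print(best, total_score(best))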
MAX SAT Relaxation
10 Approximate Inference
View MAP inference as optimizing rounding probabilities p ∈ [0, 1]^n, where each x_i is independently set to 1 with probability p_i
Expected score of a clause is a weighted noisy-or function: w_j ( 1 - ∏_{i ∈ I_j+} (1 - p_i) ∏_{i ∈ I_j-} p_i )
Then the expected total score is ∑_j w_j ( 1 - ∏_{i ∈ I_j+} (1 - p_i) ∏_{i ∈ I_j-} p_i )
But this objective is highly non-convex!
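A small sketch of the expected-score computation as reconstructed above; the clauses and rounding probabilities are invented:

# Expected score of a single weighted clause under independent rounding:
# each x_i is set to 1 with probability p_i. A clause is unsatisfied only if
# every positive literal rounds to 0 and every negated literal rounds to 1,
# which gives the weighted noisy-or form below. (Toy clauses, for illustration.)
def expected_clause_score(w, pos, neg, p):
    prob_unsat = 1.0
    for i in pos:
        prob_unsat *= (1.0 - p[i])
    for i in neg:
        prob_unsat *= p[i]
    return w * (1.0 - prob_unsat)

def expected_total_score(clauses, p):
    return sum(expected_clause_score(w, pos, neg, p) for (w, pos, neg) in clauses)

clauses = [(2.0, [0], [1]), (1.0, [1, 2], []), (0.5, [], [0, 2])]
print(expected_total_score(clauses, [0.9, 0.2, 0.6]))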
11 Approximate Inference
It is the products in the objective that make it non-convex
The expected score can be lower bounded using the relationship between arithmetic and geometric means:
  w_j ( 1 - ∏_{i ∈ I_j+} (1 - p_i) ∏_{i ∈ I_j-} p_i ) ≥ w_j (1 - 1/e) min{ 1, ∑_{i ∈ I_j+} p_i + ∑_{i ∈ I_j-} (1 - p_i) }
This leads to the lower bound (1 - 1/e) ∑_j w_j min{ 1, ∑_{i ∈ I_j+} p_i + ∑_{i ∈ I_j-} (1 - p_i) } on the expected total score
Goemans and Williamson, 1994
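A quick numerical check of the bound as reconstructed above (assuming the (1 - 1/e) form), on random rounding probabilities and a made-up clause:

import math
import random

def expected_clause_sat(pos, neg, p):
    prob_unsat = 1.0
    for i in pos:
        prob_unsat *= (1.0 - p[i])
    for i in neg:
        prob_unsat *= p[i]
    return 1.0 - prob_unsat

def lower_bound(pos, neg, p):
    s = sum(p[i] for i in pos) + sum(1.0 - p[i] for i in neg)
    return (1.0 - 1.0 / math.e) * min(1.0, s)

pos, neg = [0, 2], [1]                         # toy clause x0 v x2 v ~x1
random.seed(0)
for _ in range(5):
    p = [random.random() for _ in range(3)]
    assert expected_clause_sat(pos, neg, p) >= lower_bound(pos, neg, p) - 1e-12
    print(round(expected_clause_sat(pos, neg, p), 3), ">=", round(lower_bound(pos, neg, p), 3))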
12 Approximate Inference
So, we solve the linear program: maximize over p ∈ [0, 1]^n the objective ∑_j w_j min{ 1, ∑_{i ∈ I_j+} p_i + ∑_{i ∈ I_j-} (1 - p_i) }
If we set the rounding probabilities to the solution p, a greedy rounding method will find a (1 - 1/e)-optimal discrete solution
If we instead set them using one of Goemans and Williamson's rounding functions, it improves to ¾-optimal
Goemans and Williamson, 1994
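One concrete greedy rounding scheme is the method of conditional expectations; whether this is exactly the procedure meant on the slide is an assumption, but it does recover a discrete assignment whose score is at least the expected score of random rounding with the given probabilities:

# Greedy (derandomized) rounding by the method of conditional expectations:
# fix variables one at a time to whichever value gives the larger expected
# total score, conditioned on the choices made so far.
clauses = [(2.0, [0], [1]), (1.0, [1, 2], []), (0.5, [], [0, 2])]   # toy clauses

def expected_total_score(p):
    total = 0.0
    for w, pos, neg in clauses:
        prob_unsat = 1.0
        for i in pos:
            prob_unsat *= (1.0 - p[i])
        for i in neg:
            prob_unsat *= p[i]
        total += w * (1.0 - prob_unsat)
    return total

def greedy_round(p):
    p = list(p)
    for i in range(len(p)):
        p_one, p_zero = list(p), list(p)
        p_one[i], p_zero[i] = 1.0, 0.0
        # Setting p_i to 1 or 0 conditions the expectation on that choice.
        p = p_one if expected_total_score(p_one) >= expected_total_score(p_zero) else p_zero
    return [int(v) for v in p]

p_lp = [0.9, 0.2, 0.6]             # stand-in for an LP solution (made up)
rounded = greedy_round(p_lp)
print(rounded, expected_total_score([float(v) for v in rounded]))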
Local Consistency Relaxation
14 Local Consistency Relaxation
LCR is a popular technique for approximating MAP in MRFs
- Often simply called linear programming (LP) relaxation
- Dual decomposition solves the dual of the LCR objective
Idea: relax the search over consistent marginals to a simpler set of locally consistent pseudomarginals
LCR admits fast message-passing algorithms, but no quality guarantees in general
Wainwright and Jordan 2008
15 Local Consistency Relaxation
  maximize over θ:  ∑_j ∑_{x_j} θ_j(x_j) w_j φ_j(x_j)
  subject to:  ∑_{x_j \ x_i} θ_j(x_j) = θ_i(x_i) for all j, i, x_i;   ∑_{x_i} θ_i(x_i) = 1 for all i;   θ ≥ 0
θ_i : pseudomarginals over variable states
θ_j : pseudomarginals over joint potential states
Wainwright and Jordan 2008
Unifying the Relaxations
17 Analysis
[Figure: the LCR objective viewed as a collection of per-potential subproblems, j = 1, j = 2, j = 3, and so on…]
Bach et al. AISTATS 2015
18 Analysis Bach et al. AISTATS 2015
19 Analysis
We can now analyze each potential's parameterized subproblem in isolation: for each potential j, maximize ∑_{x_j} θ_j(x_j) φ_j(x_j) over θ_j with the variable pseudomarginals θ_i held fixed
Using the KKT conditions, we can find a simplified expression for each solution based on the parameters θ_i: for a clause potential, the optimal value is min{ 1, ∑_{i ∈ I_j+} θ_i(1) + ∑_{i ∈ I_j-} θ_i(0) }
Bach et al. AISTATS 2015
20 Analysis
Substitute the per-potential solutions back into the outer objective
Bach et al. AISTATS 2015
21 Analysis
Leads to a simplified, projected LCR over the variable pseudomarginals alone:
  maximize over θ_i ∈ [0, 1]:  ∑_j w_j min{ 1, ∑_{i ∈ I_j+} θ_i(1) + ∑_{i ∈ I_j-} θ_i(0) }
Bach et al. AISTATS 2015
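A small numerical sanity check of this reconstruction (not code from the paper), assuming scipy is available: for a single clause, the inner LP over joint pseudomarginals consistent with given unary pseudomarginals attains exactly the min{1, ·} value:

import itertools
import numpy as np
from scipy.optimize import linprog   # SciPy assumed available

def inner_lp_value(pos, neg, y):
    """Max over joint pseudomarginals (consistent with unary marginals y) of the
    probability that the clause with positive literals `pos` and negated
    literals `neg` is satisfied."""
    k = len(y)
    states = list(itertools.product([0, 1], repeat=k))
    sat = np.array([1.0 if any(x[i] for i in pos) or any(1 - x[i] for i in neg) else 0.0
                    for x in states])
    # Equality constraints: marginals match y, and the joint sums to 1.
    A_eq = np.vstack([[x[i] for x in states] for i in range(k)] + [np.ones(len(states))])
    b_eq = np.append(np.asarray(y, dtype=float), 1.0)
    res = linprog(-sat, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(states))
    return -res.fun

y = [0.7, 0.2, 0.4]                          # made-up unary pseudomarginals
pos, neg = [0, 2], [1]                       # toy clause x0 v x2 v ~x1
closed_form = min(1.0, y[0] + y[2] + (1.0 - y[1]))
print(inner_lp_value(pos, neg, y), closed_form)   # the two values agree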
22 Analysis
The projected LCR objective is identical to the MAX SAT relaxation objective:
  Local Consistency Relaxation = MAX SAT Relaxation
Bach et al. AISTATS 2015
23 Consequences
The MAX SAT relaxation can be solved with a choice of algorithms, including LCR's fast message-passing algorithms
Rounding guarantees apply to LCR!
Bach et al. AISTATS 2015
Soft Logic and Continuous Values
25 Continuous Values
Continuous values can also be interpreted as similarities or degrees of truth
Łukasiewicz logic is a fuzzy logic for reasoning about imprecise concepts
Bach et al. In Preparation
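For reference, a minimal sketch of the standard Łukasiewicz operators on [0, 1] (the talk's exact notation is not shown in this transcript); they reduce to Boolean logic at {0, 1}:

# Lukasiewicz relaxations of AND, OR, NOT, and implication on truth values in [0, 1].
def luk_and(a, b):
    return max(0.0, a + b - 1.0)

def luk_or(a, b):
    return min(1.0, a + b)

def luk_not(a):
    return 1.0 - a

def luk_implies(a, b):
    return min(1.0, 1.0 - a + b)

print(luk_and(0.8, 0.7), luk_or(0.8, 0.7), luk_implies(0.9, 0.3))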
26 All Three are Equivalent
Local Consistency Relaxation = MAX SAT Relaxation = Exact MAX SAT for Łukasiewicz logic
Bach et al. In Preparation
27 Consequences
Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and to the local consistency relaxation for logical MRFs
So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.!
Bach et al. In Preparation
Hinge-Loss Markov Random Fields
29 Generalizing Relaxed MRFs
Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately
Define a new distribution over continuous variables y ∈ [0, 1]^n:
  P(y) ∝ exp( ∑_j w_j min{ 1, ∑_{i ∈ I_j+} y_i + ∑_{i ∈ I_j-} (1 - y_i) } ), equivalently P(y) ∝ exp( - ∑_j w_j max{ 0, 1 - ∑_{i ∈ I_j+} y_i - ∑_{i ∈ I_j-} (1 - y_i) } )
We can generalize this inference objective to be the energy of a new type of MRF that does even more
Bach et al. NIPS 12, Bach et al. UAI 13
30 Generalizations
- Arbitrary hinge-loss functions (not just logical clauses)
- Hard linear constraints
- Squared hinge losses
Bach et al. NIPS 12, Bach et al. UAI 13
31 Hinge-Loss MRFs
Define hinge-loss MRFs by using this generalized objective as the energy function:
  P(y) ∝ exp( - ∑_j w_j ( max{ 0, ℓ_j(y) } )^{p_j} ), where each ℓ_j is a linear function of y ∈ [0, 1]^n and p_j ∈ {1, 2}
Bach et al. NIPS 12, Bach et al. UAI 13
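A minimal sketch of an HL-MRF energy in this generalized form, with invented potentials; it illustrates the definition above rather than reproducing code from the talk:

# Sketch of an HL-MRF energy over continuous variables y in [0, 1]^n.
# Each potential is w * (max(0, linear(y))) ** power with power 1 or 2;
# hard linear constraints restrict the feasible set.
def hinge_potential(weight, coeffs, offset, power):
    """weight * (max(0, sum_i coeffs[i] * y[i] + offset)) ** power"""
    def phi(y):
        return weight * max(0.0, sum(c * y[i] for i, c in coeffs) + offset) ** power
    return phi

potentials = [
    hinge_potential(5.0, [(0, 1.0), (1, -1.0)], 0.0, 1),   # hinge on y0 - y1
    hinge_potential(0.3, [(2, 1.0)], -0.5, 2),             # squared hinge on y2 - 0.5
]

def energy(y):
    """Unnormalized negative log-density: P(y) proportional to exp(-energy(y))."""
    return sum(phi(y) for phi in potentials)

# Hard linear constraint example: y0 + y1 == 1 (checked, not enforced, here).
y = [0.7, 0.3, 0.9]
assert abs(y[0] + y[1] - 1.0) < 1e-9
print(energy(y))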
32 HL-MRF Inference and Learning
MAP inference for HL-MRFs is always a convex optimization
- Highly scalable ADMM algorithm for MAP
Supervised Learning
- No need to hand-tune weights
- Learn from training data
- Also highly scalable
Unsupervised and Semi-Supervised Learning
- New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review)
Bach et al. NIPS 12, Bach et al. UAI 13
More in Bert's talk
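The ADMM algorithm itself is not reproduced here; as a stand-in sketch (assuming scipy is available), a generic box-constrained solver can minimize a toy convex HL-MRF energy, which is only meant to show that MAP is a convex problem over [0, 1]^n. The non-smooth hinges are exactly why a dedicated consensus-ADMM solver is used in practice:

import numpy as np
from scipy.optimize import minimize   # generic solver as a stand-in for ADMM

# Toy convex MAP objective (made-up coefficients): sum_j w_j * max(0, a_j . y + b_j),
# minimized over the box [0, 1]^3.
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  1.0, -1.0]])
b = np.array([0.3, 0.1])
w = np.array([5.0, 0.3])

def energy(y):
    return float(np.sum(w * np.maximum(0.0, A @ y + b)))

res = minimize(energy, x0=np.full(3, 0.5), method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * 3)
print(res.x, res.fun)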
Probabilistic Soft Logic
34 Probabilistic Soft Logic (PSL)
Probabilistic programming language for defining HL-MRFs
PSL components
- Predicates: relationships or properties
- Atoms: (continuous) random variables
- Rules: potentials
35 Example: Voter Identification
[Figure: a person whose vote is unknown (?), with donation ($$) and social-media evidence (tweet, status update)]
5.0 : Donates(A, "Republican") -> Votes(A, "Republican")
0.3 : Mentions(A, "Affordable Care") -> Votes(A, "Democrat")
Votes(A, "Republican") + Votes(A, "Democrat") = 1.0 .
36 Example: Voter Identification
0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
[Figure: social network with spouse, colleague, and friend links]
37 Example: Voter Identification
/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party) (closed)
Mentions(Person, Term) (closed)
Colleague(Person, Person) (closed)
Friend(Person, Person) (closed)
Spouse(Person, Person) (closed)
/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, "Affordable Care") -> Votes(A, "Democrat")
0.3 : Mentions(A, "Tax Cuts") -> Votes(A, "Republican")
...
/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)
/* Range constraint */
Votes(A, "Republican") + Votes(A, "Democrat") = 1.0 .
38 PSL Defines HL-MRFs
The same program as above defines an HL-MRF: each ground rule becomes a weighted hinge-loss potential, and the range constraint becomes a hard linear constraint
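A sketch of how one ground rule could map to a hinge-loss potential under the Łukasiewicz semantics above; the constants ann and bob and the atom values are made up, and the distance-to-satisfaction reading of a rule is an assumption about the exact construction:

# A ground PSL rule  w : Body -> Head  has Lukasiewicz truth value
# min(1, 1 - body + head), so its "distance to satisfaction" is
# max(0, body - head), contributing w * max(0, body - head) to the energy.
def ground_rule_potential(weight, body_value, head_value):
    return weight * max(0.0, body_value - head_value)

# Ground rule from the program above (hypothetical constants ann, bob):
#   1.0 : Votes(ann, "Democrat") && Spouse(bob, ann) -> Votes(bob, "Democrat")
votes_ann_dem = 0.9      # made-up atom values
spouse_bob_ann = 1.0
votes_bob_dem = 0.4
body = max(0.0, votes_ann_dem + spouse_bob_ann - 1.0)    # Lukasiewicz AND
print(ground_rule_potential(1.0, body, votes_bob_dem))   # 0.5 added to the energy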
39 Open Source Implementation
PSL is implemented as an open source library and programming language interface
It's ready to use for your next project
Some of the other groups already using PSL:
- Jure Leskovec (Stanford) [West et al., TACL 14]
- Dan Jurafsky (Stanford) [Li et al., ArXiv 14]
- Ray Mooney (UT Austin) [Beltagy et al., ACL 14]
- Kevin Murphy (Google) [Pujara et al., BayLearn 14]
40 Other PSL Topics
Not discussed here:
- Smart grounding
- Lazy inference
- Distributed inference and learning
Future work:
- Lifted inference
- Generalized rounding guarantees
41 psl.cs.umd.edu
Applications
43 PSL Empirical Highlights
Compared with discrete MRFs:
                 Collective Classification     Trust Prediction
  PSL            81.8%    0.7 sec               .482 AuPR    0.32 sec
  Discrete       79.7%    184.3 sec             .441 AuPR    212.36 sec
Predicting MOOC outcomes via latent engagement (AuPR):
                 Tech     Women-Civil    Genes
  Lecture Rank   0.333    0.508          0.688
  PSL-Direct     0.393    0.565          0.757
  PSL-Latent     0.546    0.816          0.818
Bach et al. UAI 13, Ramesh et al. AAAI 14
44 PSL Empirical Highlights
Improved activity recognition in video:
                 5 Activities            6 Activities
  HOG            47.4%    .481 F1        59.6%    .582 F1
  PSL + HOG      59.8%    .603 F1        79.3%    .789 F1
  ACD            67.5%    .678 F1        83.5%    .835 F1
  PSL + ACD      69.2%    .693 F1        86.0%    .860 F1
Compared on drug-target interaction prediction:
                      AuPR           P@130
  Perlman's Method    .564 ± .05     .594 ± .04
  PSL                 .617 ± .05     .616 ± .04
London et al. CVPR WS 13, Fakhraei et al. TCBB 14
Conclusion
46 Conclusion
HL-MRFs unite and generalize different ways of viewing fundamental AI problems, including MAX SAT, probabilistic graphical models, and fuzzy logic
PSL's mix of expressivity and scalability makes it a general-purpose tool for network analysis, bioinformatics, NLP, computer vision, and more
Many exciting open problems, from algorithms, to theory, to applications