Unifying MAX SAT, Local Consistency Relaxations, and Soft Logic with Hinge-Loss MRFs
Stephen H. Bach (Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)
Modeling Relational Data with Markov Random Fields
3
3 Markov Random Fields Probabilistic model for high-dimensional data: The random variables represent the data, such as whether a person has an attribute or whether a link exists The potentials score different configurations of the data The weights scale the influence of different potentials
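As a rough illustration (not part of the talk), here is a minimal Python sketch of this definition; the variables, potentials, and weights are made up:

import itertools
import math

# Toy MRF over three binary variables: x0 = "A is a professor",
# x1 = "A advises B", x2 = "B is a student" (illustrative only).
# Each potential scores a configuration; each weight scales its influence.
potentials = [
    (2.0, lambda x: 1.0 if x[1] <= x[2] else 0.0),   # advising implies student
    (0.5, lambda x: 1.0 if x[0] == 1 else 0.0),      # prior that A is a professor
]

def unnormalized_log_prob(x):
    """log of the unnormalized probability: sum_j w_j * phi_j(x)."""
    return sum(w * phi(x) for w, phi in potentials)

# Normalize by summing over all 2^3 configurations to get P(x).
Z = sum(math.exp(unnormalized_log_prob(x)) for x in itertools.product([0, 1], repeat=3))
for x in itertools.product([0, 1], repeat=3):
    print(x, math.exp(unnormalized_log_prob(x)) / Z)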
4
4 Markov Random Fields Variables and potentials form graphical structure:
5
5 MRFs with Logic One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks Use first-order logic to define templates for potentials -Ground out weighted rules over graph data -The truth table of each ground rule is a potential -Each first-order rule has a weight that becomes the potential’s Richardson and Domingos, 2006
6
6 MRFs with Logic Let be a set of weighted logical rules, where each rule has the general form -Weights and sets and index variables Equivalent clausal form:
7
7 MRFs with Logic Probability distribution:
8
8 MAP (maximum a posteriori) inference seeks a most- probable assignment to the unobserved variables MAP inference is This MAX SAT problem is combinatorial and NP-hard! MAP Inference
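A brute-force sketch of the MAP-as-MAX-SAT view, just to make the combinatorial problem concrete; the clause sets and weights are invented for illustration:

import itertools

# Weighted clauses in the clausal form above: (weight, positive indices, negated indices).
# Toy example (made up): 2.0 : x0 v ~x1,   1.0 : x1 v x2,   0.5 : ~x0 v ~x2
clauses = [(2.0, [0], [1]), (1.0, [1, 2], []), (0.5, [], [0, 2])]
n = 3

def satisfied(pos, neg, x):
    return any(x[i] == 1 for i in pos) or any(x[i] == 0 for i in neg)

def total_score(x):
    return sum(w for (w, pos, neg) in clauses if satisfied(pos, neg, x))

# MAP inference = weighted MAX SAT. Exhaustive search over 2^n assignments is
# only feasible for tiny n, which is why the relaxations below are needed.
best = max(itertools.product([0, 1], repeat=n), key=total_score)
print(best, total_score(best))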
MAX SAT Relaxation
10 Approximate Inference
View MAP inference as optimizing rounding probabilities p ∈ [0, 1]^n, where each x_i is independently set to 1 with probability p_i
Expected score of a clause is a weighted noisy-or function: w_j ( 1 - ∏_{i ∈ I_j+} (1 - p_i) ∏_{i ∈ I_j-} p_i )
Then the expected total score is ∑_j w_j ( 1 - ∏_{i ∈ I_j+} (1 - p_i) ∏_{i ∈ I_j-} p_i )
But this objective is highly non-convex!
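A small sketch of the expected-score computation as reconstructed above; the clauses and rounding probabilities are invented:

# Expected score of a single weighted clause under independent rounding:
# each x_i is set to 1 with probability p_i. A clause is unsatisfied only if
# every positive literal rounds to 0 and every negated literal rounds to 1,
# which gives the weighted noisy-or form below. (Toy clauses, for illustration.)
def expected_clause_score(w, pos, neg, p):
    prob_unsat = 1.0
    for i in pos:
        prob_unsat *= (1.0 - p[i])
    for i in neg:
        prob_unsat *= p[i]
    return w * (1.0 - prob_unsat)

def expected_total_score(clauses, p):
    return sum(expected_clause_score(w, pos, neg, p) for (w, pos, neg) in clauses)

clauses = [(2.0, [0], [1]), (1.0, [1, 2], []), (0.5, [], [0, 2])]
print(expected_total_score(clauses, [0.9, 0.2, 0.6]))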
11 Approximate Inference
It is the products in the objective that make it non-convex
The expected score can be lower bounded using the relationship between arithmetic and geometric means:
  w_j ( 1 - ∏_{i ∈ I_j+} (1 - p_i) ∏_{i ∈ I_j-} p_i ) ≥ w_j (1 - 1/e) min{ 1, ∑_{i ∈ I_j+} p_i + ∑_{i ∈ I_j-} (1 - p_i) }
This leads to the lower bound (1 - 1/e) ∑_j w_j min{ 1, ∑_{i ∈ I_j+} p_i + ∑_{i ∈ I_j-} (1 - p_i) } on the expected total score
Goemans and Williamson, 1994
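A quick numerical check of the bound as reconstructed above (assuming the (1 - 1/e) form), on random rounding probabilities and a made-up clause:

import math
import random

def expected_clause_sat(pos, neg, p):
    prob_unsat = 1.0
    for i in pos:
        prob_unsat *= (1.0 - p[i])
    for i in neg:
        prob_unsat *= p[i]
    return 1.0 - prob_unsat

def lower_bound(pos, neg, p):
    s = sum(p[i] for i in pos) + sum(1.0 - p[i] for i in neg)
    return (1.0 - 1.0 / math.e) * min(1.0, s)

pos, neg = [0, 2], [1]                         # toy clause x0 v x2 v ~x1
random.seed(0)
for _ in range(5):
    p = [random.random() for _ in range(3)]
    assert expected_clause_sat(pos, neg, p) >= lower_bound(pos, neg, p) - 1e-12
    print(round(expected_clause_sat(pos, neg, p), 3), ">=", round(lower_bound(pos, neg, p), 3))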
12 Approximate Inference
So, we solve the linear program: maximize over p ∈ [0, 1]^n the objective ∑_j w_j min{ 1, ∑_{i ∈ I_j+} p_i + ∑_{i ∈ I_j-} (1 - p_i) }
If we set the rounding probabilities to the solution p, a greedy rounding method will find a (1 - 1/e)-optimal discrete solution
If we instead set them using one of Goemans and Williamson's rounding functions, it improves to ¾-optimal
Goemans and Williamson, 1994
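One concrete greedy rounding scheme is the method of conditional expectations; whether this is exactly the procedure meant on the slide is an assumption, but it does recover a discrete assignment whose score is at least the expected score of random rounding with the given probabilities:

# Greedy (derandomized) rounding by the method of conditional expectations:
# fix variables one at a time to whichever value gives the larger expected
# total score, conditioned on the choices made so far.
clauses = [(2.0, [0], [1]), (1.0, [1, 2], []), (0.5, [], [0, 2])]   # toy clauses

def expected_total_score(p):
    total = 0.0
    for w, pos, neg in clauses:
        prob_unsat = 1.0
        for i in pos:
            prob_unsat *= (1.0 - p[i])
        for i in neg:
            prob_unsat *= p[i]
        total += w * (1.0 - prob_unsat)
    return total

def greedy_round(p):
    p = list(p)
    for i in range(len(p)):
        p_one, p_zero = list(p), list(p)
        p_one[i], p_zero[i] = 1.0, 0.0
        # Setting p_i to 1 or 0 conditions the expectation on that choice.
        p = p_one if expected_total_score(p_one) >= expected_total_score(p_zero) else p_zero
    return [int(v) for v in p]

p_lp = [0.9, 0.2, 0.6]             # stand-in for an LP solution (made up)
rounded = greedy_round(p_lp)
print(rounded, expected_total_score([float(v) for v in rounded]))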
Local Consistency Relaxation
14 Local Consistency Relaxation
LCR is a popular technique for approximating MAP in MRFs
- Often simply called linear programming (LP) relaxation
- Dual decomposition solves the dual of the LCR objective
Idea: relax the search over consistent marginals to a simpler set of locally consistent pseudomarginals
LCR admits fast message-passing algorithms, but no quality guarantees in general
Wainwright and Jordan 2008
15 Local Consistency Relaxation
  maximize over θ:  ∑_j ∑_{x_j} θ_j(x_j) w_j φ_j(x_j)
  subject to:  ∑_{x_j \ x_i} θ_j(x_j) = θ_i(x_i) for all j, i, x_i;   ∑_{x_i} θ_i(x_i) = 1 for all i;   θ ≥ 0
θ_i : pseudomarginals over variable states
θ_j : pseudomarginals over joint potential states
Wainwright and Jordan 2008
Unifying the Relaxations
17 Analysis
[Figure: the LCR objective viewed as a collection of per-potential subproblems, j = 1, j = 2, j = 3, and so on…]
Bach et al. AISTATS 2015
18 Analysis Bach et al. AISTATS 2015
19 Analysis
We can now analyze each potential's parameterized subproblem in isolation: for each potential j, maximize ∑_{x_j} θ_j(x_j) φ_j(x_j) over θ_j with the variable pseudomarginals θ_i held fixed
Using the KKT conditions, we can find a simplified expression for each solution based on the parameters θ_i: for a clause potential, the optimal value is min{ 1, ∑_{i ∈ I_j+} θ_i(1) + ∑_{i ∈ I_j-} θ_i(0) }
Bach et al. AISTATS 2015
20 Analysis
Substitute the per-potential solutions back into the outer objective
Bach et al. AISTATS 2015
21 Analysis
Leads to a simplified, projected LCR over the variable pseudomarginals alone:
  maximize over θ_i ∈ [0, 1]:  ∑_j w_j min{ 1, ∑_{i ∈ I_j+} θ_i(1) + ∑_{i ∈ I_j-} θ_i(0) }
Bach et al. AISTATS 2015
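A small numerical sanity check of this reconstruction (not code from the paper), assuming scipy is available: for a single clause, the inner LP over joint pseudomarginals consistent with given unary pseudomarginals attains exactly the min{1, ·} value:

import itertools
import numpy as np
from scipy.optimize import linprog   # SciPy assumed available

def inner_lp_value(pos, neg, y):
    """Max over joint pseudomarginals (consistent with unary marginals y) of the
    probability that the clause with positive literals `pos` and negated
    literals `neg` is satisfied."""
    k = len(y)
    states = list(itertools.product([0, 1], repeat=k))
    sat = np.array([1.0 if any(x[i] for i in pos) or any(1 - x[i] for i in neg) else 0.0
                    for x in states])
    # Equality constraints: marginals match y, and the joint sums to 1.
    A_eq = np.vstack([[x[i] for x in states] for i in range(k)] + [np.ones(len(states))])
    b_eq = np.append(np.asarray(y, dtype=float), 1.0)
    res = linprog(-sat, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(states))
    return -res.fun

y = [0.7, 0.2, 0.4]                          # made-up unary pseudomarginals
pos, neg = [0, 2], [1]                       # toy clause x0 v x2 v ~x1
closed_form = min(1.0, y[0] + y[2] + (1.0 - y[1]))
print(inner_lp_value(pos, neg, y), closed_form)   # the two values agree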
22 Analysis
The projected LCR objective is identical to the MAX SAT relaxation objective:
  Local Consistency Relaxation = MAX SAT Relaxation
Bach et al. AISTATS 2015
23 Consequences
The MAX SAT relaxation can be solved with a choice of algorithms, including LCR's fast message-passing algorithms
Rounding guarantees apply to LCR!
Bach et al. AISTATS 2015
Soft Logic and Continuous Values
25 Continuous Values
Continuous values can also be interpreted as similarities or degrees of truth
Łukasiewicz logic is a fuzzy logic for reasoning about imprecise concepts
Bach et al. In Preparation
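For reference, a minimal sketch of the standard Łukasiewicz operators on [0, 1] (the talk's exact notation is not shown in this transcript); they reduce to Boolean logic at {0, 1}:

# Lukasiewicz relaxations of AND, OR, NOT, and implication on truth values in [0, 1].
def luk_and(a, b):
    return max(0.0, a + b - 1.0)

def luk_or(a, b):
    return min(1.0, a + b)

def luk_not(a):
    return 1.0 - a

def luk_implies(a, b):
    return min(1.0, 1.0 - a + b)

print(luk_and(0.8, 0.7), luk_or(0.8, 0.7), luk_implies(0.9, 0.3))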
26 All Three are Equivalent
Local Consistency Relaxation = MAX SAT Relaxation = Exact MAX SAT for Łukasiewicz logic
Bach et al. In Preparation
27 Consequences
Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and to the local consistency relaxation for logical MRFs
So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.!
Bach et al. In Preparation
Hinge-Loss Markov Random Fields
29 Generalizing Relaxed MRFs
Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately
Define a new distribution over continuous variables y ∈ [0, 1]^n:
  P(y) ∝ exp( ∑_j w_j min{ 1, ∑_{i ∈ I_j+} y_i + ∑_{i ∈ I_j-} (1 - y_i) } ), equivalently P(y) ∝ exp( - ∑_j w_j max{ 0, 1 - ∑_{i ∈ I_j+} y_i - ∑_{i ∈ I_j-} (1 - y_i) } )
We can generalize this inference objective to be the energy of a new type of MRF that does even more
Bach et al. NIPS 12, Bach et al. UAI 13
30 Generalizations
- Arbitrary hinge-loss functions (not just logical clauses)
- Hard linear constraints
- Squared hinge losses
Bach et al. NIPS 12, Bach et al. UAI 13
31 Hinge-Loss MRFs
Define hinge-loss MRFs by using this generalized objective as the energy function:
  P(y) ∝ exp( - ∑_j w_j ( max{ 0, ℓ_j(y) } )^{p_j} ), where each ℓ_j is a linear function of y ∈ [0, 1]^n and p_j ∈ {1, 2}
Bach et al. NIPS 12, Bach et al. UAI 13
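A minimal sketch of an HL-MRF energy in this generalized form, with invented potentials; it illustrates the definition above rather than reproducing code from the talk:

# Sketch of an HL-MRF energy over continuous variables y in [0, 1]^n.
# Each potential is w * (max(0, linear(y))) ** power with power 1 or 2;
# hard linear constraints restrict the feasible set.
def hinge_potential(weight, coeffs, offset, power):
    """weight * (max(0, sum_i coeffs[i] * y[i] + offset)) ** power"""
    def phi(y):
        return weight * max(0.0, sum(c * y[i] for i, c in coeffs) + offset) ** power
    return phi

potentials = [
    hinge_potential(5.0, [(0, 1.0), (1, -1.0)], 0.0, 1),   # hinge on y0 - y1
    hinge_potential(0.3, [(2, 1.0)], -0.5, 2),             # squared hinge on y2 - 0.5
]

def energy(y):
    """Unnormalized negative log-density: P(y) proportional to exp(-energy(y))."""
    return sum(phi(y) for phi in potentials)

# Hard linear constraint example: y0 + y1 == 1 (checked, not enforced, here).
y = [0.7, 0.3, 0.9]
assert abs(y[0] + y[1] - 1.0) < 1e-9
print(energy(y))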
32 HL-MRF Inference and Learning
MAP inference for HL-MRFs is always a convex optimization
- Highly scalable ADMM algorithm for MAP
Supervised Learning
- No need to hand-tune weights
- Learn from training data
- Also highly scalable
Unsupervised and Semi-Supervised Learning
- New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review)
Bach et al. NIPS 12, Bach et al. UAI 13
More in Bert's talk
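The ADMM algorithm itself is not reproduced here; as a stand-in sketch (assuming scipy is available), a generic box-constrained solver can minimize a toy convex HL-MRF energy, which is only meant to show that MAP is a convex problem over [0, 1]^n. The non-smooth hinges are exactly why a dedicated consensus-ADMM solver is used in practice:

import numpy as np
from scipy.optimize import minimize   # generic solver as a stand-in for ADMM

# Toy convex MAP objective (made-up coefficients): sum_j w_j * max(0, a_j . y + b_j),
# minimized over the box [0, 1]^3.
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  1.0, -1.0]])
b = np.array([0.3, 0.1])
w = np.array([5.0, 0.3])

def energy(y):
    return float(np.sum(w * np.maximum(0.0, A @ y + b)))

res = minimize(energy, x0=np.full(3, 0.5), method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * 3)
print(res.x, res.fun)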
Probabilistic Soft Logic
34 Probabilistic Soft Logic (PSL)
Probabilistic programming language for defining HL-MRFs
PSL components
- Predicates: relationships or properties
- Atoms: (continuous) random variables
- Rules: potentials
35 Example: Voter Identification
[Figure: a person whose vote is unknown (?), with donation ($$) and social-media evidence (tweet, status update)]
5.0 : Donates(A, "Republican") -> Votes(A, "Republican")
0.3 : Mentions(A, "Affordable Care") -> Votes(A, "Democrat")
Votes(A, "Republican") + Votes(A, "Democrat") = 1.0 .
36 Example: Voter Identification
0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
[Figure: social network with spouse, colleague, and friend links]
37 Example: Voter Identification
/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party) (closed)
Mentions(Person, Term) (closed)
Colleague(Person, Person) (closed)
Friend(Person, Person) (closed)
Spouse(Person, Person) (closed)
/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, "Affordable Care") -> Votes(A, "Democrat")
0.3 : Mentions(A, "Tax Cuts") -> Votes(A, "Republican")
...
/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)
/* Range constraint */
Votes(A, "Republican") + Votes(A, "Democrat") = 1.0 .
38 PSL Defines HL-MRFs
The same program as above defines an HL-MRF: each ground rule becomes a weighted hinge-loss potential, and the range constraint becomes a hard linear constraint
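A sketch of how one ground rule could map to a hinge-loss potential under the Łukasiewicz semantics above; the constants ann and bob and the atom values are made up, and the distance-to-satisfaction reading of a rule is an assumption about the exact construction:

# A ground PSL rule  w : Body -> Head  has Lukasiewicz truth value
# min(1, 1 - body + head), so its "distance to satisfaction" is
# max(0, body - head), contributing w * max(0, body - head) to the energy.
def ground_rule_potential(weight, body_value, head_value):
    return weight * max(0.0, body_value - head_value)

# Ground rule from the program above (hypothetical constants ann, bob):
#   1.0 : Votes(ann, "Democrat") && Spouse(bob, ann) -> Votes(bob, "Democrat")
votes_ann_dem = 0.9      # made-up atom values
spouse_bob_ann = 1.0
votes_bob_dem = 0.4
body = max(0.0, votes_ann_dem + spouse_bob_ann - 1.0)    # Lukasiewicz AND
print(ground_rule_potential(1.0, body, votes_bob_dem))   # 0.5 added to the energy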
39 Open Source Implementation
PSL is implemented as an open source library and programming language interface
It's ready to use for your next project
Some of the other groups already using PSL:
- Jure Leskovec (Stanford) [West et al., TACL 14]
- Dan Jurafsky (Stanford) [Li et al., ArXiv 14]
- Ray Mooney (UT Austin) [Beltagy et al., ACL 14]
- Kevin Murphy (Google) [Pujara et al., BayLearn 14]
40 Other PSL Topics
Not discussed here:
- Smart grounding
- Lazy inference
- Distributed inference and learning
Future work:
- Lifted inference
- Generalized rounding guarantees
41 psl.cs.umd.edu
Applications
43 PSL Empirical Highlights
Compared with discrete MRFs:
                 Collective Classification     Trust Prediction
  PSL            81.8%    0.7 sec               .482 AuPR    0.32 sec
  Discrete       79.7%    184.3 sec             .441 AuPR    212.36 sec
Predicting MOOC outcomes via latent engagement (AuPR):
                 Tech     Women-Civil    Genes
  Lecture Rank   0.333    0.508          0.688
  PSL-Direct     0.393    0.565          0.757
  PSL-Latent     0.546    0.816          0.818
Bach et al. UAI 13, Ramesh et al. AAAI 14
44 PSL Empirical Highlights
Improved activity recognition in video:
                 5 Activities            6 Activities
  HOG            47.4%    .481 F1        59.6%    .582 F1
  PSL + HOG      59.8%    .603 F1        79.3%    .789 F1
  ACD            67.5%    .678 F1        83.5%    .835 F1
  PSL + ACD      69.2%    .693 F1        86.0%    .860 F1
Compared on drug-target interaction prediction:
                      AuPR           P@130
  Perlman's Method    .564 ± .05     .594 ± .04
  PSL                 .617 ± .05     .616 ± .04
London et al. CVPR WS 13, Fakhraei et al. TCBB 14
Conclusion
46 Conclusion
HL-MRFs unite and generalize different ways of viewing fundamental AI problems, including MAX SAT, probabilistic graphical models, and fuzzy logic
PSL's mix of expressivity and scalability makes it a general-purpose tool for network analysis, bioinformatics, NLP, computer vision, and more
Many exciting open problems, from algorithms, to theory, to applications