Anytime Lifted Belief Propagation


Anytime Lifted Belief Propagation
Rodrigo de Salvo Braz (SRI International), Hung Bui (SRI International), Sriraam Natarajan (University of Wisconsin), Jude Shavlik (University of Wisconsin), Stuart Russell (UC Berkeley)

Slides online at http://www.ai.sri.com/~braz/ (go to "Presentations").

What we are doing
Regular lifted inference (de Salvo Braz et al., MLNs, Milch) shatters models (exhaustively splits them) before inference starts, so it needs to consider the entire model before giving a result. In this work, we interleave splitting and inference, obtaining exact bounds on the query as we go; we usually do not shatter or consider the entire model before yielding an answer.

Outline
Background: Relational Probabilistic Models; Propositionalization; Belief Propagation; Lifted Belief Propagation and shattering.
Anytime Lifted Belief Propagation: Intuition; Box propagation; Example.
Final Remarks: Connection to Theorem Proving; Conclusion; Future directions.

Background

Relational Probabilistic Model
A compact representation for graphical models: a parameterized factor (parfactor) stands for all its instantiations.
for all Y: φ1(funny(Y))
for all X, Y, X ≠ Y: φ2(funny(Y), likes(X,Y))
stands for
φ1(funny(a)), φ1(funny(b)), φ1(funny(c)), …, φ2(funny(a), likes(b,a)), φ2(funny(a), likes(c,a)), …, φ2(funny(z), likes(a,z)), …
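To make the parfactor-to-groundings correspondence concrete, here is a minimal Python sketch (not from the talk; the three-constant domain and the function name are invented for illustration):

```python
from itertools import product

def ground_parfactor(name, logical_vars, domain, constraint=lambda binding: True):
    """Enumerate the ground factors a parfactor stands for over a finite domain."""
    groundings = []
    for values in product(domain, repeat=len(logical_vars)):
        binding = dict(zip(logical_vars, values))
        if constraint(binding):
            groundings.append((name, binding))
    return groundings

people = ["a", "b", "c"]  # hypothetical domain of constants

# for all Y: phi1(funny(Y))
phi1 = ground_parfactor("phi1", ["Y"], people)

# for all X, Y with X != Y: phi2(funny(Y), likes(X,Y))
phi2 = ground_parfactor("phi2", ["X", "Y"], people,
                        constraint=lambda b: b["X"] != b["Y"])

print(len(phi1), len(phi2))  # 3 and 6 ground factors from just two parfactors
```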

Propositionalization
for all Y: φ1(funny(Y))
for all X, Y, X ≠ Y: φ2(funny(Y), likes(X,Y))
Query: P(funny(fred) | likes(tom,fred)) = ?
[Figure: the grounded factor graph. The query funny(fred) is connected through φ1, and through one φ2 factor each, to likes(tom,fred) (the evidence), likes(alice,fred), likes(bob,fred), …, likes(zoe,fred).]

Belief Propagation
Propagates messages all the way to the query.
for all Y: φ1(funny(Y))
for all X, Y, X ≠ Y: φ2(funny(Y), likes(X,Y))
Query: P(funny(fred) | likes(tom,fred)) = ?
[Figure: the same grounded factor graph. likes(tom,fred) sends a different message because it has evidence on it; the other likes(·,fred) variables send groups of identical messages toward funny(fred).]

Lifted Belief Propagation
Groups identical messages and computes them once.
for all Y: φ1(funny(Y))
for all X, Y, X ≠ Y: φ2(funny(Y), likes(X,Y))
Query: P(funny(fred) | likes(tom,fred)) = ?
[Figure: the lifted factor graph. The evidence likes(tom,fred) stays separate, while likes(alice,fred), likes(bob,fred), …, likes(zoe,fred) are grouped into one cluster of symmetric random variables, likes(Person, fred); their shared message is exponentiated by the number of individual identical messages.]
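As a rough numerical illustration of "messages exponentiated by the number of individual identical messages" (the message values and the count below are invented), grouping replaces repeated multiplication by a single exponentiation:

```python
import numpy as np

# Hypothetical factor-to-variable message sent by each symmetric
# likes(Person, fred) neighbor of funny(fred), and the number of such
# identical neighbors (both made up for illustration).
shared_msg = np.array([0.7, 0.3])
n_identical = 25

# Ground BP multiplies the same message once per neighbor...
ground_product = np.ones(2)
for _ in range(n_identical):
    ground_product *= shared_msg

# ...while lifted BP computes it once and raises it to the count.
lifted_product = shared_msg ** n_identical

assert np.allclose(ground_product, lifted_product)
print(lifted_product / lifted_product.sum())  # same normalized belief either way
```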

The Need for Shattering
Lifted BP depends on clusters of variables being symmetric, that is, sending and receiving identical messages. In other words, it is about dividing random variables into cases.
[Figure: clusters funny(Y), likes(X,Y), neighbors(X,Y), classmates(X,Y).]
Evidence: neighbors(tom,fred), classmates(mary,fred)

The Need for Shattering
Evidence on neighbors(tom,fred) makes it distinct from the others in the "neighbors" cluster. Even clusters without evidence need to be split, because distinct messages make their destinations distinct as well.
[Figure: resulting clusters. neighbors(tom,fred), likes(tom,fred), classmates(tom,fred) and funny(fred); neighbors(X,fred), likes(X,fred), classmates(X,fred) for X ≠ tom; neighbors(X,Y), likes(X,Y), classmates(X,Y), funny(Y) for Y ≠ fred.]

The Need for Shattering
Evidence on classmates(mary,fred) further splits the clusters. In regular lifted BP, we only get to cluster perfectly interchangeable objects (everyone who is not tom or mary "behaves the same"); if objects are merely similar, they still need to be considered separately.
[Figure: resulting clusters. neighbors(tom,fred), likes(tom,fred), classmates(tom,fred); neighbors(mary,fred), likes(mary,fred), classmates(mary,fred); neighbors(X,fred), likes(X,fred), classmates(X,fred) for X not in {tom, mary}; funny(fred); neighbors(X,Y), likes(X,Y), classmates(X,Y), funny(Y) for Y ≠ fred.]
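As a toy sketch of what shattering does to a cluster's constraint (the representation below is invented for illustration and is not the paper's data structure), evidence on a specific constant splits a singleton off the symmetric group:

```python
# A cluster such as neighbors(X, fred) is represented here only by the set of
# constants excluded from its logical variable X; evidence on a constant
# splits off a singleton cluster and shrinks the symmetric remainder.

def split_on_evidence(excluded, constant):
    """Return the new singleton cluster and the updated exclusion set."""
    singleton = f"X = {constant}"
    return singleton, excluded | {constant}

excluded = set()                                    # initially all X are symmetric
s1, excluded = split_on_evidence(excluded, "tom")   # evidence neighbors(tom, fred)
s2, excluded = split_on_evidence(excluded, "mary")  # evidence classmates(mary, fred)

print(s1)                              # X = tom
print(s2)                              # X = mary
print(f"X not in {sorted(excluded)}")  # X not in ['mary', 'tom']
```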

Anytime Lifted Belief Propagation

Intuition for Anytime Lifted BP
The alarm can go off due to an earthquake, or due to a burglary; a "prior" factor makes the alarm going off unlikely without those causes.
[Figure: relational model. alarm(House) is connected to an earthquake branch (in(House, Town), earthquake(Town)) and to a burglary branch (burglary(House)), which in turn involves next(House,Another), lives(Another,Neighbor), saw(Neighbor,Someone), masked(Someone), as well as partOf(Entrance,House), broken(Entrance), in(House,Item), missing(Item).]

Intuition for Anytime Lifted BP
Given a home in sf, with home2 and home3 next to it, whose neighbors jim and mary saw person1 and person2 respectively, several items in home (including a missing ring and non-missing cash), a broken front entrance but an unbroken back entrance, and an earthquake in sf: what is the probability that home's alarm goes off?
[Figure: the same relational model, with atoms alarm(House), earthquake(Town), in(House, Town), burglary(House), next(House,Another), lives(Another,Neighbor), saw(Neighbor,Someone), masked(Someone), in(House,Item), missing(Item), partOf(Entrance,House), broken(Entrance).]

Lifted Belief Propagation
Complete shattering before belief propagation starts; message passing over the entire model before obtaining the query answer.
[Figure: the fully shattered ground network around alarm(home): in(home, sf), earthquake(sf), burglary(home), next(home,home2), lives(home2,jim), saw(jim,person1), masked(person1), next(home,home3), lives(home2,mary), saw(mary,person2), masked(person2), in(home,cash), missing(cash), in(home,ring), missing(ring), in(home,Item) and missing(Item) for Item not in {ring, cash, …}, partOf(front,home), broken(front), partOf(back,home), broken(back), …]

Intuition for Anytime Lifted BP
Given the earthquake evidence, we already have a good lower bound on the query, regardless of the burglary branch; all the shattering done for that branch is wasted.
[Figure: the same fully shattered network, with the evidence in(home, sf) and earthquake(sf), the query alarm(home), and the entire burglary branch marked "Wasted shattering!".]

Using only a portion of a model
By using only a portion, we do not have to shatter other parts of the model. How can we use only a portion? A solution for propositional models already exists: box propagation.

Box Propagation
A way of getting bounds on the query without examining the entire network.
[Figure: the query variable A alone, with the vacuous bound [0, 1].]

Box Propagation
[Figure: factor f1 between A and B is expanded; B still carries the vacuous bound [0, 1], and the bound on A tightens to [0.36, 0.67].]

Box Propagation
[Figure: factors f2 and f3 behind B are expanded, with their remaining neighbors still at [0, 1]; the messages into B are bounded by [0.1, 0.6] and [0.05, 0.5], B's bound becomes [0.32, 0.4], and the bound on A tightens to [0.38, 0.50].]

Box Propagation
[Figure: expanding further tightens the incoming message bounds (for example [0.2, 0.8], [0.3, 0.4], [0.17, 0.3]), and the bound on A narrows to [0.41, 0.44].]

Box Propagation
Convergence after all messages are collected.
[Figure: with all messages collected, the message bounds collapse to point values (for example 0.45, 0.32, 0.21, 0.3, 0.36) and the belief on A converges to 0.42.]
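The following is a small numerical sketch of the bound computation on a two-variable fragment. The factor values are invented, and the interval arithmetic shown is one simple way to realize the idea, not necessarily the exact scheme used in the paper:

```python
import numpy as np

# Query A with a single expanded factor f1(A, B); B's own incoming messages
# are not expanded yet, so B's message to f1 may be ANY distribution.
f1 = np.array([[0.9, 0.2],   # f1[a, b], values invented
               [0.1, 0.8]])

# The message f1 -> A is sum_b f1[a, b] * m(b) for an unknown distribution m,
# so each component lies between the row-wise min and max of f1.
lower_msg = f1.min(axis=1)
upper_msg = f1.max(axis=1)

# Bounds on the normalized belief of A = true (index 0). As B's neighbors are
# expanded, its message box shrinks and these bounds tighten.
lower_belief = lower_msg[0] / (lower_msg[0] + upper_msg[1])
upper_belief = upper_msg[0] / (upper_msg[0] + lower_msg[1])
print(f"P(A = true) is in [{lower_belief:.2f}, {upper_belief:.2f}]")  # [0.20, 0.90]
```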

Anytime Lifted BP
Incremental shattering + box propagation

Anytime Lifted Belief Propagation
Start from the query alone: alarm(home), with the vacuous bound [0, 1]. The algorithm works by picking a cluster variable and including the factors in its blanket.

Anytime Lifted Belief Propagation
Expanding the query's blanket adds φ(alarm(home), in(home,Town), earthquake(Town)), obtained by unifying alarm(home) with alarm(House) in φ(alarm(House), in(House,Town), earthquake(Town)), producing the constraint House = home; the burglary(home) branch is brought in through unification in the same way. The blanket factors alone can already determine a bound on the query, here [0.1, 0.9]: the alarm has a probability of going off of at least 0.1 and at most 0.9, regardless of burglary or earthquakes.
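Since unification drives every expansion step, here is a toy unifier for flat atoms (a predicate applied to constants or variables, with variables written with a leading uppercase letter), illustrating how unifying alarm(home) with alarm(House) produces the constraint House = home. This is an illustrative sketch, not the paper's implementation:

```python
def is_var(term):
    """Treat identifiers starting with an uppercase letter as logical variables."""
    return term[0].isupper()

def unify_atoms(atom1, atom2):
    """Return a substitution unifying two flat atoms, or None if impossible."""
    (pred1, args1), (pred2, args2) = atom1, atom2
    if pred1 != pred2 or len(args1) != len(args2):
        return None
    subst = {}
    for t1, t2 in zip(args1, args2):
        t1, t2 = subst.get(t1, t1), subst.get(t2, t2)  # apply bindings so far
        if t1 == t2:
            continue
        if is_var(t1):
            subst[t1] = t2
        elif is_var(t2):
            subst[t2] = t1
        else:
            return None  # two distinct constants cannot be unified
    return subst

print(unify_atoms(("alarm", ["home"]), ("alarm", ["House"])))  # {'House': 'home'}
```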

Anytime Lifted Belief Propagation (in(home, sf)) in(home, sf) earthquake(sf) Cluster in(home, Town) unifies with in(home, sf) in (in(home, sf)) (which represents evidence) splitting cluster around Town = sf [0.1, 0.9] alarm(home) burglary(home) in(home, Town) Bound remains the same because we still haven’t considered evidence on earthquakes Town ≠ sf earthquake(Town)

Anytime Lifted Belief Propagation in(home, sf) (earthquake(sf)) represents the evidence that there was an earthquake earthquake(sf) [0.8, 0.9] alarm(home) burglary(home) Now query bound becomes narrow No need to further expand (and shatter) other branches If bound is good enough, there is no need to further expand (and shatter) other branches in(home, Town) Town ≠ sf earthquake(Town)

Anytime Lifted Belief Propagation
We can keep expanding at will for narrower bounds: bringing in, for example, partOf(front,home) and broken(front) tightens the query bound to [0.85, 0.9].

Anytime Lifted Belief Propagation
… until convergence, if desired: expanding the entire network eventually yields the exact answer, 0.8725. In this example it does not seem worth it, since we reach a narrow bound very early; it would be a lot of further processing for relatively little extra information.
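Putting the pieces together, the overall loop might look like the skeleton below. This shows structure only; the helper functions are passed in as parameters because they stand for machinery (unification-based splitting, frontier selection, box propagation) that the talk describes but that is not spelled out here:

```python
def anytime_lifted_bp(query, parfactors, evidence,
                      pick_frontier_cluster, unify_and_split,
                      boundary_of, box_propagation,
                      tolerance=0.05):
    """Interleave incremental shattering with box propagation on a growing
    fragment around the query, maintaining exact bounds at every step."""
    fragment = {query}      # portion of the lifted network expanded so far
    frontier = [query]
    bounds = (0.0, 1.0)     # vacuous bound on the query marginal
    while frontier and bounds[1] - bounds[0] > tolerance:
        cluster = pick_frontier_cluster(frontier)
        # Incremental shattering: split parfactors and evidence only as far
        # as needed to describe this cluster's blanket.
        blanket = unify_and_split(cluster, parfactors, evidence)
        fragment |= blanket
        frontier = boundary_of(fragment)
        bounds = box_propagation(fragment, query)  # exact bounds on the query
    return bounds
```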

Another Anytime Lifted BP Example
A more realistic example: a large commonsense knowledge base, with a large number of facts about many different constants, making shattering very expensive.

Anytime Lifted BP Intuition
Let's consider a large knowledge base formed by parfactors:
φ(hasGoodOffer(Person), offer(Job,Person), goodFor(Person,Job))
φ(goodFor(Person,Job), cityPerson(Person), inCity(Job))
φ(goodFor(Person,Job), goodEmployer(Job))
φ(goodFor(Person,Job), involves(Subject,Job), likes(Subject,Person))
φ(goodEmployer(Job), in(Subject,Job), profitable(Subject))
φ(likes(Subject,Person), takesTeamWork(Subject), social(Person))
... <many more parfactors representing rules>
0.9: offer(mary,Job), Job in {a,b,c}.
1.0: not offer(mary,Job), Job not in {a,b,c}.
0.8: goodEmployer(Job), Job in {a,c}.
1.0: social(mary).
0.7: involves(ai,a).
1.0: likes(theory,frank).
1.0: likes(graphics,john).
1.0: inCity(c).
... <and many more such facts, from a database for example>
The notation "0.9: offer(mary,Job), Job in {a,b,c}" is shorthand for a parfactor placing potentials 0.9 and 0.1 on offer(mary,Job) being true or false.
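If it helps, the weighted-fact shorthand can be read as a tiny data structure; the representation below is purely illustrative and not the paper's syntax:

```python
# "0.9: offer(mary,Job), Job in {a,b,c}" as an explicit parfactor record.
offer_fact = {
    "atom": "offer(mary, Job)",
    "constraint": "Job in {a, b, c}",
    "potential": {True: 0.9, False: 0.1},  # potential on the atom's truth value
}
print(offer_fact["potential"][True])
```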

Expensive shattering
Any two constants among the ones shown have distinct properties, so their clusters become singletons, and these singleton clusters appear separately in each parfactor. For example, φ(goodFor(Person,Job), involves(Subject,Job), likes(Subject,Person)) gets shattered into:
φ(goodFor(mary,a), involves(theory,a), likes(theory,mary))
φ(goodFor(mary,b), involves(theory,b), likes(theory,mary))
φ(goodFor(mary,c), involves(theory,c), likes(theory,mary))
…
φ(goodFor(mary,a), involves(ai,a), likes(ai,mary))
φ(goodFor(mary,b), involves(ai,b), likes(ai,mary))
φ(goodFor(mary,c), involves(ai,c), likes(ai,mary))
φ(goodFor(frank,a), involves(theory,a), likes(theory,frank))
φ(goodFor(frank,b), involves(theory,b), likes(theory,frank))
φ(goodFor(frank,c), involves(theory,c), likes(theory,frank))
φ(goodFor(Person,Job), involves(Subject,Job), likes(Subject,Person)), Person not in {mary, frank, …}, Subject not in {theory, ai, …}, Job not in {a, b, c, …}
And that's just a single parfactor!

Anytime Lifted BP Intuition
We can usually tell a lot from a tiny fraction of a model. Looking only at a few of the facts above: if either a or c is indeed a good employer (0.8) and has made an offer to mary (0.9), then it is likely that she has a good offer. So we can say that mary is likely to have a good offer without even looking at, let alone shattering, the rest of the huge model!

Anytime Lifted BP Example
Start from the query cluster alone: hasGoodOffer(P), with the vacuous bound [0, 1].

Anytime Lifted BP Example
Expanding the blanket brings in offer(J,P) and goodFor(P,J); the bound on hasGoodOffer(P) becomes [0.1, 1.0].

Anytime Lifted BP Example
Unifying with the fact 0.9: offer(mary,J), J in {a,b,c} splits the clusters: offer(J,mary) and goodFor(mary,J) each split into the cases J in {a,b,c} and J not in {a,b,c}, and hasGoodOffer(mary) (bound [0.1, 1.0]) splits off from hasGoodOffer(P), P ≠ mary (also [0.1, 1.0]), with its own offer(J,P) and goodFor(P,J) clusters. Let's leave the P ≠ mary tree aside from now on to concentrate on mary (it is still going to be part of the network, we are just not going to show it for now).

Anytime Lifted BP Example
[Figure: the remaining fragment. hasGoodOffer(mary) with bound [0.1, 1.0], connected to offer(J,mary) and goodFor(mary,J), each split into J in {a,b,c} and J not in {a,b,c}.]

Anytime Lifted BP Example
Expanding further brings in goodEmployer(J), J in {a,b,c}, through the parfactor φ(goodFor(mary,J), goodEmployer(J)), J in {a,b,c}, which is split from φ(goodFor(P,J), goodEmployer(J)) by using the current constraints on P and J (P = mary and J in {a,b,c}). The bound on hasGoodOffer(mary) is still [0.1, 1.0].

Anytime Lifted BP Example
Unifying with the fact 0.8: goodEmployer(J), J in {a,c} splits the J in {a,b,c} clusters into J in {a,c} and J = b, and the bound on hasGoodOffer(mary) narrows to [0.82, 1.0]. Note that a and c may not be interchangeable given the whole model, but for this bound they were kept as a group all along: they are approximately interchangeable. In regular lifted BP, we need to separate random variables on objects that are not perfectly interchangeable.

Final Remarks

Connection to Theorem Proving
Incremental shattering corresponds to building a proof tree.
[Figure: the expanded fragment around hasGoodOffer(mary), with the branch offer(J,mary), goodFor(mary,J), goodEmployer(J) for J in {a,c}, the separate branch offer(b,mary), goodFor(mary,b), and the branch offer(J,mary), goodFor(mary,J) for J not in {a,b,c}, read as a proof tree.]

Connection to Theorem Proving
The parfactor φ(goodFor(mary,J), goodEmployer(J)), J in {a,b,c}, split from φ(goodFor(P,J), goodEmployer(J)), results from a unification step between φ(goodFor(P,J), goodEmployer(J)) and goodFor(mary,J), J in {a,b,c}, in which goodFor(P,J) is unified with goodFor(mary,J) through P = mary.

Conclusions
Most of the query answer is computed (potentially) much sooner than in Lifted BP. Only the most necessary fraction of the model is considered and shattered. Sets of non-interchangeable objects are still treated as groups. Theorem-proving-like probabilistic inference narrows the gap to logic. A more intuitive algorithm, with natural explanations and proofs.

Future directions
Choosing which factor to expand next (for example, using utilities). More flexible bounds (for example, the belief may lie outside the bounds, but only with small probability).

Questions?