Efficient Inference Methods for Probabilistic Logical Models


Efficient Inference Methods for Probabilistic Logical Models Sriraam Natarajan Dept of Computer Science, University of Wisconsin-Madison

Take-Away Message: Inference in SRL models is very hard! This talk presents three different yet related inference methods; the methods are independent of the underlying formalism and have been applied to different kinds of problems.

The World is inherently Uncertain. Graphical models (here, e.g., a Bayesian network) model uncertainty explicitly by representing the joint distribution. [Figure: a Bayesian network whose random variables are Influenza, Fever, and Ache, with edges showing direct influences.] This is a propositional model!
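To spell out what such a network encodes, here is the factorization for this example, assuming the edges shown are Influenza → Fever and Influenza → Ache:

```latex
P(\mathrm{Influenza},\mathrm{Fever},\mathrm{Ache}) =
  P(\mathrm{Influenza})\,
  P(\mathrm{Fever}\mid\mathrm{Influenza})\,
  P(\mathrm{Ache}\mid\mathrm{Influenza})
```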

Real-World Data (Dramatically Simplified): multi-relational, non-i.i.d., with shared parameters. For example:
- Visits: PatientID, Date, Physician, Symptoms, Diagnosis (P1, 1/1/01, Smith, palpitations, hypoglycemic; P1, 2/1/03, Jones, fever and aches, influenza)
- Demographics: PatientID, Gender, Birthdate (P1, M, 3/22/63)
- Lab tests: PatientID, Date, Lab Test, Result (P1, 1/1/01, blood glucose, 42; P1, 1/9/01, blood glucose, 45)
- SNPs: PatientID, SNP1, SNP2, …, SNP500K (P1: AA, AB, …, BB; P2: AB, BB, …, AA)
- Prescriptions: PatientID, Date Prescribed, Date Filled, Physician, Medication, Dose, Duration (P1, 5/17/98, 5/18/98, Jones, prilosec, 10mg, 3 months)
Solution: First-Order Logic / Relational Databases.

Statistical Relational Learning (SRL): Logic + Probability = Probabilistic Logic, a.k.a. Statistical Relational Learning models. [Diagram: starting from Logic and adding probabilities, or starting from Probabilities and adding relations, leads to SRL.] Uncertainty in SRL models is captured by probabilities, weights, or potential functions.

Alphabetic Soup => Endless Possibilities. Formalisms: Probabilistic Relational Models (PRM), Bayesian Logic Programs (BLP), PRISM, Stochastic Logic Programs (SLP), Independent Choice Logic (ICL), Markov Logic Networks (MLN), Relational Markov Nets (RMN), CLP-BN, Relational Bayes Nets (RBN), Probabilistic Logic Programs (PLP), ProbLog, …. Application domains: web data (web), biological data (bio), social network analysis (soc), bibliographic data (cite), epidemiological data (epi), communication data (comm), customer networks (cust), collaborative filtering problems (cf), trust networks (trust), …. Courses: Fall 2003 – Dietterich @ OSU, Spring 2004 – Page @ UW, Spring 2007 – Neville @ Purdue, Fall 2008 – Pedro @ CMU.

Key Problem – Inference. Inference is equivalent to counting models of a 3SAT formula, hence #P-complete. The problem is even more pronounced in SRL models because of the prohibitively large number of objects and relations. Inference has been the biggest bottleneck for the use of SRL models in practice.

Grounding / Propositionalization Difficulty(C,D), Grade(S,C,G) :- Satisfaction(S) 1 student s1, 10 Courses Diff(c6,d3) Diff(c1,d1) Diff(c5,d1) Grade(s1,c5,A) Grade(s1,c6,B) Grade(s1,c1,B) Diff(c4,d4) Diff(c7,d2) Satisfaction(S) Grade(s1,c4,A) Grade(s1,c7,A) Diff(c8,d2) Diff(c3,d2) Grade(s1,c3,B) Grade(s1,c8,A) Grade(s1,c10,A) Diff(c2,d1) Diff(c9,d4) Grade(s1,c2,A) Diff(c10,d2) Grade(s1,c9,A)

Realistic Example – Gene-fold Prediction Thanks to Irene Ong

Recent Advances in SRL Inference
- Preprocessing for inference: FROG – Shavlik & Natarajan (2009)
- Lifted exact inference: Lifted Variable Elimination – Poole (2003), Braz et al. (2005), Milch et al. (2008); Lifted VE + Aggregation – Kisynski & Poole (2009)
- Sampling methods: MCMC techniques – Milch & Russell (2006); Logical Particle Filter – Natarajan et al. (2008), Zettlemoyer et al. (2007); Lazy Inference – Poon et al. (2008)
- Approximate methods: Lifted First-Order Belief Propagation – Singla & Domingos (2008); Counting Belief Propagation – Kersting et al. (2009); MAP Inference – Riedel (2008)
- Bounds propagation: Anytime Belief Propagation – Braz et al. (2009)

Fast Reduction of Grounded MLNs Counting Belief Propagation Anytime Lifted Belief Propagation Conclusion

Markov Logic Networks (Richardson & Domingos, MLJ 2006): weighted logic. Standard approach: 1) assume a finite number of constants; 2) create all possible groundings; 3) perform statistical inference (often via sampling). The probability of a world x is P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x.

Counting Satisfied Groundings. There is typically lots of redundancy in FOL sentences, e.g., ∀x, y, z: p(x) ⋀ q(x, y, z) ⋀ r(z) ⇒ w(x, y, z). If p(John) = false, then the formula is true for all y and z values whenever x = John.
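A small sketch of the redundancy being exploited (made-up three-constant domain and evidence): once p(x) is known false for some x, every grounding with that x is satisfied and need not be enumerated:

```python
from itertools import product

domain = ["John", "Anna", "Bob"]
p_true = {"Anna"}        # hypothetical evidence: p(John) and p(Bob) are false

# Naive: enumerate every grounding of  p(x) ^ q(x,y,z) ^ r(z) => w(x,y,z).
naive = sum(1 for _ in product(domain, repeat=3))

# Evidence-aware: if p(x) is false, the implication is already satisfied for
# every y and z, so only x values with p(x) true need their groundings expanded.
remaining = sum(1 for x, y, z in product(domain, repeat=3) if x in p_true)

print(naive, remaining)   # 27 vs. 9 groundings left to reason about
```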

Factoring Out the Evidence. Let A = the weighted sum of formulas satisfied by the evidence, and let B_i = the weighted sum of formulas in world i not satisfied by the evidence. Then Prob(world_i) = e^(A + B_i) / (e^(A + B_1) + … + e^(A + B_n)) = e^(B_i) / (e^(B_1) + … + e^(B_n)), so the common evidence factor e^A cancels.
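A quick numerical check of that cancellation (made-up weights, not from the talk): adding the same evidence term A to every world's exponent leaves the normalized probabilities unchanged:

```python
import math

B = [1.2, 0.3, 2.5]      # made-up per-world sums of weights not fixed by evidence
A = 7.0                  # made-up weighted sum of formulas satisfied by evidence

def normalize(exponents):
    zs = [math.exp(e) for e in exponents]
    total = sum(zs)
    return [z / total for z in zs]

with_evidence = normalize([A + b for b in B])
without_evidence = normalize(B)

print(with_evidence)      # matches the next line (up to floating point):
print(without_evidence)   # the common e^A factor cancels in the normalization
```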

Take-Away Message – I: efficiently factor out those formula groundings that the evidence satisfies; this can potentially eliminate the need for approximate inference.

Worked Example. Rule: ∀x, y, z GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x, z) ⋀ SameGroup(y, z) ⇒ AdvisedBy(x, y). The evidence: 10,000 people at some school, 2,000 graduate students, 1,000 professors, 1,000 TAs, and 500 pairs of professors in the same group. Total number of groundings = |x| × |y| × |z| = 10^12.

GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) ⇒ AdvisedBy(x,y). The evidence splits the people into 2,000 grad students (GradStudent(P1), GradStudent(P3), …), for whom GradStudent(x) is true, and 8,000 others (¬GradStudent(P2), ¬GradStudent(P4), …), for whom it is false. All of the "false" values for x satisfy the clause regardless of y and z, so FROG keeps only the grad-student values of x: instead of 10^4 values for x we have 2 × 10^3, and the groundings drop from 10^12 to 2 × 10^11.

Prof(y): 1,000 people are professors (Prof(P2), …) and 9,000 are not (¬Prof(P1), …). Keeping only the 1,000 professor values for y reduces the groundings from 2 × 10^11 to 2 × 10^10.

Prof(z): same as Prof(y). Keeping only the 1,000 professor values for z reduces the groundings from 2 × 10^10 to 2 × 10^9.

SameGroup(y, z): of the 10^6 (y, z) combinations, only 1,000 SameGroup facts are true (SameGroup(P1, P2), …); the remaining 10^6 − 1,000 are false (¬SameGroup(P2, P5), …). Keeping the 2,000 values of x and the 1,000 true y:z combinations reduces the groundings from 2 × 10^9 to 2 × 10^6.

TA(x, z): only 1,000 TA facts are true (TA(P7, P5), …); the rest of the 2 × 10^6 combinations are false (¬TA(P8, P4), …). This leaves ≤ 1,000 values of x and ≤ 1,000 y:z combinations, i.e., ≤ 10^6 groundings.

GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) ⇒ AdvisedBy(x,y): original number of groundings = 10^12; final number of groundings ≤ 10^6.
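For readers who want to retrace the arithmetic, a small sketch that reproduces the counts from this worked example (the numbers come from the slides; the last line simply restates the slide's ≤ 10^6 bound rather than deriving it from the TA facts):

```python
# Evidence counts taken from the worked-example slides.
people = 10_000
grads, profs = 2_000, 1_000
same_group = 1_000      # true SameGroup(y, z) facts
ta_facts = 1_000        # true TA(x, z) facts

stages = [
    ("all groundings |x||y||z|",        people * people * people),  # 10^12
    ("keep grad-student x",             grads * people * people),   # 2 x 10^11
    ("keep professor y",                grads * profs * people),    # 2 x 10^10
    ("keep professor z",                grads * profs * profs),     # 2 x 10^9
    ("keep true SameGroup(y,z)",        grads * same_group),        # 2 x 10^6
    ("keep true TA(x,z) (slide bound)", 10**6),                     # <= 10^6
]
for step, count in stages:
    print(f"{step:34s} {count:>16,}")
```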

Sample Results: UWash-CSE. [Chart comparing the fully grounded network, FROG's reduced network, and FROG's reduced network without one challenging rule: advisedBy(x,y) ⋀ advisedBy(x,z) ⇒ samePerson(y,z).]

Fast Reduction of Grounded MLNs Counting Belief Propagation Anytime Lifted Belief Propagation Conclusion

Belief Propagation: a message-passing algorithm for inference on graphical models, formulated on factor graphs. It is exact if the factor graph is a tree and approximate when it has cycles; loopy BP does not guarantee convergence, but is found to be very useful in practice. [Figure: a small factor graph with variables X1, X2, X3 and factors f1, f2.]
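As a refresher on the message updates, a minimal sketch (Python/numpy, made-up potentials) on a two-variable tree, where BP recovers the exact marginal:

```python
import numpy as np

# Tiny tree-structured factor graph: X1 -- f12 -- X2, both binary variables.
phi1 = np.array([0.7, 0.3])                  # made-up unary potential on X1
f12 = np.array([[0.9, 0.1],
                [0.2, 0.8]])                 # made-up pairwise potential f12[x1, x2]

# Variable-to-factor message from X1 (its only other "neighbor" is the unary).
m_x1_to_f12 = phi1

# Factor-to-variable message into X2: sum out X1 against the potential.
m_f12_to_x2 = (f12 * m_x1_to_f12[:, None]).sum(axis=0)

# Belief at X2 = normalized product of incoming messages (exact on a tree).
belief_x2 = m_f12_to_x2 / m_f12_to_x2.sum()
print(belief_x2)

# Sanity check against brute-force enumeration of the joint.
joint = phi1[:, None] * f12
print(joint.sum(axis=0) / joint.sum())
```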

Belief Propagation: identical factors. [Figure: the grounded factor graph contains many identical factors, which send and receive identical messages.]

Take-Away Message – II Counting shared factors can result in great efficiency gains for (loopy) belief propagation

Counting Belief Propagation: two steps. Step 1: compress the factor graph. Step 2: run modified BP.

Step 1: Compression

Step 2: Modified Belief Propagation
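As a rough illustration of the compression step, a simplified color-passing sketch (a made-up two-person grounding, not the exact algorithm from the CBP paper): nodes with the same evidence and the same multiset of (factor type, neighbor color) signatures end up in one supernode:

```python
# Hypothetical tiny grounding in the spirit of a two-person smokers example:
# each ground variable maps to its factor neighborhood as (factor type, neighbor).
neighbors = {
    "smokes(a)": [("friends", "smokes(b)"), ("cancer_rule", "cancer(a)")],
    "smokes(b)": [("friends", "smokes(a)"), ("cancer_rule", "cancer(b)")],
    "cancer(a)": [("cancer_rule", "smokes(a)")],
    "cancer(b)": [("cancer_rule", "smokes(b)")],
}
evidence = {}   # no evidence here, so all nodes start with the same color

def partition(coloring):
    """Group together the variables that currently share a color."""
    parts = {}
    for v, c in coloring.items():
        parts.setdefault(c, set()).add(v)
    return {frozenset(p) for p in parts.values()}

# Color passing: recolor each node by (own color, sorted neighbor signatures)
# until the induced grouping stops changing.
color = {v: ("ev", evidence.get(v)) for v in neighbors}
for _ in range(len(neighbors)):
    signature = {
        v: (color[v], tuple(sorted((f, color[n]) for f, n in neighbors[v])))
        for v in neighbors
    }
    if partition(signature) == partition(color):
        break
    color = signature

print(f"{len(neighbors)} ground variables compress into {len(partition(color))} supernodes")
```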

Factored Frontier (FF). Probabilistic inference over time is central to many AI problems. In contrast to static domains, we need approximation: variables easily become correlated over time by virtue of sharing common influences in the past. Factored Frontier [Murphy and Weiss 01]: unroll the DBN and run (loopy) BP. Lifted First-Order FF: use CBP in place of BP.
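A minimal sketch of the "unroll, then run BP" recipe (the predicate names, domain, and the commented-out run_loopy_bp call are all hypothetical illustrations, not the talk's actual model):

```python
# Unroll a tiny (hypothetical) DBN over T time steps into a list of ground
# factors; the resulting loopy factor graph would then be handed to BP / CBP.
T = 10
people = ["p1", "p2", "p3"]

factors = []
for t in range(1, T):
    for x in people:
        # persistence factor: state(x, t) depends on state(x, t-1)
        factors.append(("persist", (f"state({x},{t-1})", f"state({x},{t})")))
        # influence factors: state(x, t) also depends on the others' states at t-1
        for y in people:
            if y != x:
                factors.append(("influence", (f"state({y},{t-1})", f"state({x},{t})")))

print(len(factors), "ground factors in the unrolled network")
# run_loopy_bp(factors)   # hypothetical: run (loopy) BP, or CBP for the lifted version
```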

Lifted First-Order Factored Frontier. [Results plot. Experimental setup: successor fluent; 20 people over 10 time steps; maximum number of friends 5; cancer never observed; query time step randomly selected.]

Fast Reduction of Grounded MLNs Counting Belief Propagation Anytime Lifted Belief Propagation Conclusion

The Need for Shattering. Lifted BP depends on clusters of variables being symmetric, that is, sending and receiving identical messages. In other words, it is about dividing random variables into cases, which is called "shattering".

Intuition for Anytime Lifted BP. [Model graph over the atoms alarm(House), earthquake(Town), in(House,Town), burglary(House), next(House,Another), lives(Another,Neighbor), saw(Neighbor,Someone), masked(Someone), partOf(Entrance,House), broken(Entrance), in(House,Item), and missing(Item).] The alarm can go off due to an earthquake or due to a burglary; a "prior" factor makes the alarm going off unlikely without those causes.

Intuition for Anytime Lifted BP (continued). Query: given a home in sf with home2 and home3 next to it, with neighbors jim and mary who each saw someone (person1 and person2), several items in home including a missing ring and non-missing cash, a broken front but not broken back entrance to home, and an earthquake in sf, what is the probability that home's alarm goes off?

Lifted Belief Propagation: complete shattering happens before belief propagation starts, and messages are passed over the entire model before the query answer is obtained. [Figure: the fully shattered model around alarm(home), covering the neighbors, witnesses, items (including in(home,Item) for Item not in {ring, cash, …}), and entrances; the model for house ≠ home and town ≠ sf is not shown.]

Intuition for Anytime Lifted BP: given the earthquake evidence, we already have a good lower bound on the query, regardless of the burglary branch. [Figure: the query alarm(home) with evidence earthquake(sf) and in(home, sf); every other branch (neighbors, witnesses, items, entrances) is marked "wasted shattering".]

Using only a portion of a model: by using only a portion, we don't have to shatter other parts of the model. How can we use only a portion? A solution for propositional models already exists: box propagation (Mooij & Kappen, NIPS '08).

Box Propagation: a way of getting bounds on the query without examining the entire network. [Figure: the query node A alone, with trivial bound [0, 1].]

Box Propagation (continued). [Figure: adding factor f1 and neighbor B (still bounded by [0, 1]) tightens the bound on A to [0.36, 0.67].]

Box Propagation (continued). [Figure: expanding factors f2 and f3 behind B tightens the bound on A to [0.38, 0.50]; intermediate boxes shown include [0.1, 0.6], [0.05, 0.5], and [0.32, 0.4], with still-unexplored nodes at [0, 1].]

Box Propagation (continued). [Figure: tighter incoming boxes ([0.2, 0.8], [0.3, 0.4], [0.17, 0.3], [0.32, 0.4]) narrow the bound on A to [0.41, 0.44].]

Box Propagation (continued). [Figure: once all messages are collected the boxes collapse to point values (0.45, 0.32, 0.42, 0.21, 0.3, 0.36) and box propagation has converged.]
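To make the box idea concrete, a small sketch (binary variables, made-up potentials; only the core idea behind Mooij & Kappen's method, not the full algorithm): when the message from an unexplored neighbor is only known to lie in a box, the query belief is monotone in that message, so evaluating the box's endpoints gives guaranteed bounds:

```python
import numpy as np

# Made-up pairwise potential between the query A and its neighbor B: f1[b, a].
f1 = np.array([[0.9, 0.1],
               [0.3, 0.7]])

def belief_a(p_b):
    """Belief that A = 1 when the incoming message says P(B = 1) = p_b."""
    m_b = np.array([1.0 - p_b, p_b])
    m_to_a = m_b @ f1                 # factor-to-variable message into A
    return m_to_a[1] / m_to_a.sum()

# Nothing known yet about the network behind B: its message lies in [0, 1].
print(sorted([belief_a(0.0), belief_a(1.0)]))   # loose box on P(A = 1)

# After expanding part of the network (made-up tighter box on B's message).
print(sorted([belief_a(0.2), belief_a(0.4)]))   # tighter box on P(A = 1)
```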

Take-Away Message - III Anytime BP = Incremental Shattering + Box Propagation

Anytime Lifted Belief Propagation: start from the query alone, alarm(home), with trivial bound [0, 1]. The algorithm works by picking a cluster variable and including the factors in its blanket.

Anytime Lifted Belief Propagation: expanding the query's blanket brings in the factor (alarm(home), in(home,Town), earthquake(Town)), obtained by unifying alarm(home) with alarm(House) in (alarm(House), in(House,Town), earthquake(Town)) and producing the constraint House = home; burglary(home) is brought in the same way, again through unification. The blanket factors alone can already determine a bound on the query: [0.1, 0.9].

Anytime Lifted Belief Propagation (in(home, sf)) in(home, sf) earthquake(sf) Cluster in(home, Town) unifies with in(home, sf) in (in(home, sf)) (which represents evidence) splitting cluster around Town = sf [0.1, 0.9] alarm(home) burglary(home) in(home, Town) Bound remains the same because we still haven’t considered evidence on earthquakes Town ≠ sf earthquake(Town)

Anytime Lifted Belief Propagation: the factor (earthquake(sf)) represents the evidence that there was an earthquake. Now the query bound becomes narrow: [0.8, 0.9]. If the bound is good enough, there is no need to further expand (and shatter) the other branches.

Anytime Lifted Belief Propagation: we can keep expanding at will for narrower bounds, e.g., bringing in partOf(front,home) and broken(front) tightens the query bound to [0.85, 0.9].

Anytime Lifted Belief Propagation: … until convergence, if desired. [Figure: the fully expanded model (neighbors, witnesses, items, entrances), at which point the query bound collapses to the point value 0.8725.]
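Putting the pieces together, a high-level sketch of the anytime loop. The four callables are hypothetical stand-ins for the operations described on the slides (not an actual API from the talk or any library); they are passed in as parameters so the control flow itself is explicit and self-contained:

```python
def anytime_lifted_bp(query, model, evidence,
                      pick_cluster, expand_blanket, shatter, box_propagate,
                      width_threshold=0.05):
    """Sketch of the anytime loop: incremental shattering + box propagation."""
    frontier = {query}                  # the portion of the model considered so far
    bound = (0.0, 1.0)                  # trivial initial box on the query
    while bound[1] - bound[0] > width_threshold:
        cluster = pick_cluster(frontier, model)
        if cluster is None:             # whole model expanded: bound is now exact
            break
        new_factors = expand_blanket(cluster, model)
        # Incremental shattering: split only the clusters this expansion touches.
        frontier |= shatter(new_factors, evidence)
        # Box propagation over the expanded portion; unexplored neighbors send
        # [0, 1] messages, so the box is a guaranteed bound on the query.
        bound = box_propagate(query, frontier)
        yield bound                     # anytime: a valid bound after every step
```

Each yielded box is a valid answer, so the caller can stop as soon as the bound is tight enough.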

Connection to Resolution Refutation: incremental shattering corresponds to building a proof tree. [Figure: a proof tree for alarm(home), with one branch through in(home, sf) and earthquake(sf) resolving to true, one through burglary(home), and one through in(home, L) and earthquake(L) for L not in {sf}.]

Fast Reduction of Grounded MLNs Counting Belief Propagation Anytime Lifted Belief Propagation Conclusion

Conclusion
- Inference is the key issue in several SRL formalisms.
- FROG: keeps the count of unsatisfied groundings; gives an order-of-magnitude reduction in the number of groundings; compares favorably to Alchemy in different domains.
- Counting BP: BP + grouping nodes that send and receive identical messages; a conceptually easy, scalable BP algorithm; applications to challenging AI tasks.
- Anytime BP: incremental shattering + box propagation; only the most necessary fraction of the model is considered and shattered; status – implementation and evaluation.

Conclusion (continued): the algorithms are independent of the representation and have a variety of applications: parameter learning of relational models, social networks, object recognition, link prediction, activity recognition, model counting, bio-medical applications, and relational RL.

Future Work
- FROG: combine with lifted inference; exploit commonality across rules.
- CBP: integrate with parameter learning in SRL models; extend to multi-agent RL and lifted pairwise BP.
- Anytime BP: heuristics for expanding the network; understand closer connections to resolution.
- SRL models: learning dynamic SRL models; structure learning remains an open issue.

Acknowledgements* Babak Ahmadi - Fraunhofer Institute Rodrigo de Salvo Braz – SRI International Hung Bui – SRI International Vitor Santos Costa – U Porto Kristian Kersting - Fraunhofer Institute Gautam Kunapuli – UW Madison David Page – UW Madison Stuart Russell – UC Berkeley Jude Shavlik – UW Madison Prasad Tadepalli – Oregon State University * Ordered by Last name

Thanks!