1 First-Order Probabilistic Models Brian Milch 9.66: Computational Cognitive Science December 7, 2006.



2 Theories [Diagram: a prior over theories / inductive bias generates a theory, which in turn generates possible worlds / outcomes (partially observed).]

3 How Can Theories Be Represented?
Deterministic → Probabilistic:
– Propositional formulas → Bayesian network
– Finite state automaton → N-gram model / hidden Markov model
– Context-free grammar → Probabilistic context-free grammar
– First-order formulas → First-order probabilistic model

4 Outline
Motivation: Why first-order models?
Models with known objects and relations:
– Representation with probabilistic relational models (PRMs)
– Inference (not much to say)
– Learning by local search
Models with unknown objects and relations:
– Representation with Bayesian logic (BLOG)
– Inference by likelihood weighting and MCMC
– Learning (not much to say)

5 Propositional Theory (Deterministic)
Scenario with students, courses, profs: Dr. Pavlov teaches CS1 and CS120; Matt takes CS1; Judy takes CS1 and CS120.
Propositional theory:
PavlovDemanding → CS1Hard
PavlovDemanding → CS120Hard
MattSmart ∧ CS1Hard → MattGetsAInCS1
¬CS1Hard → MattGetsAInCS1
JudySmart ∧ CS1Hard → JudyGetsAInCS1
¬CS1Hard → JudyGetsAInCS1
JudySmart ∧ CS120Hard → JudyGetsAInCS120
¬CS120Hard → JudyGetsAInCS120
CS1Hard → MattTired
CS1Hard → JudyTired
CS120Hard → JudyTired

6 Propositional Theory (Probabilistic)
[Bayesian network over: PavlovDemanding; CS1Hard, CS120Hard; MattSmart, JudySmart; MattGetsAInCS1, JudyGetsAInCS1, JudyGetsAInCS120; MattTired, JudyTired]
Specific to a particular scenario (who takes what, etc.). No generalization of knowledge across objects.

7 First-Order Theory
General theory:
∀p ∀c [Teaches(p, c) ∧ Demanding(p) → Hard(c)]
∀s ∀c [Takes(s, c) ∧ Easy(c) → GetsA(s, c)]
∀s ∀c [Takes(s, c) ∧ Hard(c) ∧ Smart(s) → GetsA(s, c)]
∀s ∀c [Takes(s, c) ∧ Hard(c) → Tired(s, c)]
Relational skeleton:
Teaches(Pavlov, CS1), Teaches(Pavlov, CS120)
Takes(Matt, CS1), Takes(Judy, CS1), Takes(Judy, CS120)
Compact, and generalizes across scenarios and objects. But deterministic.

8 Task for First-Order Probabilistic Model
Given the model plus a relational skeleton, produce a ground Bayesian network.
Skeleton 1: Prof: Pavlov; Courses: CS1, CS120; Students: Matt, Judy; Teaches: (Pavlov, CS1), (Pavlov, CS120); Takes: (Matt, CS1), (Judy, CS1), (Judy, CS120).
Skeleton 2: Profs: Peterson, Quirk; Courses: Bio1, Bio160; Students: Mary, John; Teaches: (Peterson, Bio1), (Quirk, Bio160); Takes: (Mary, Bio1), (John, Bio160).
[Each skeleton yields its own ground BN over the D(·), H(·), S(·), T(·), A(·, ·) variables.]

9 First-Order Probabilistic Models with Known Skeleton
Random functions become indexed families of random variables: Demanding(p), Hard(c), Smart(s), Tired(s), GetsA(s, c).
For each family of RVs, specify:
– How to determine parents from relations
– A CPD that can handle varying numbers of parents
One way to do this: probabilistic relational models (PRMs) [Koller & Pfeffer 1998; Friedman, Getoor, Koller & Pfeffer 1999]

10 Probabilistic Relational Models
Functions/relations treated as slots on objects:
– Simple slots (random): p.Demanding, c.Hard, s.Smart, s.Tired
– Reference slots (nonrandom; value may be a set): p.Teaches, c.TaughtBy
Specify parents with slot chains: c.Hard ← {c.TaughtBy.Demanding}
Introduce link objects for non-unary functions:
– new type: Registration
– reference slots: r.Student, r.Course, s.RegisteredIn
– simple slots: r.GetsA

11 PRM for Academic Example
p.Demanding ← {}
s.Smart ← {}
c.Hard ← {c.TaughtBy.Demanding}
s.Tired ← {#True(s.RegisteredIn.Course.Hard)}
r.GetsA ← {r.Course.Hard, r.Student.Smart}
An aggregation function (here #True) takes the multiset of slot-chain values and returns a single value, so CPDs always get one parent value per slot chain.

12 Inference in PRMs
Construct a ground BN (warning: may be intractable):
– Node for each simple slot on each object (e.g. Pavlov.D; CS1.H, CS120.H; Matt.S, Matt.T, Judy.S, Judy.T; Reg1.A, Reg2.A, Reg3.A)
– Edges found by following parent slot chains
Run a BN inference algorithm:
– Exact (variable elimination)
– Gibbs sampling
– Loopy belief propagation
[Although see the Pfeffer et al. (1999) paper on SPOOK for a smarter method]
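The "ground BN plus generic inference" recipe can be sketched concretely. Below is a minimal Gibbs sampler in Python over a tiny ground network shaped like one fragment of the academic example, D → H → A ← S (Demanding, Hard, GetsA, Smart); the CPD numbers are illustrative assumptions of ours, not from the slides.

```python
import random

# Toy CPDs for the ground network D -> H -> A <- S (illustrative numbers)
P_D = 0.5                                                  # P(Demanding = 1)
def p_h(d):    return 0.8 if d else 0.3                    # P(Hard = 1 | D = d)
def p_a(h, s): return (0.8 if s else 0.2) if h else 0.9    # P(GetsA = 1 | h, s)

def gibbs(n_samples, a_obs=1, s_obs=1, seed=0):
    """Estimate P(Hard = 1 | GetsA = a_obs, Smart = s_obs) by Gibbs sampling
    over the non-evidence variables D and H."""
    rng = random.Random(seed)
    d, h = 1, 1                      # arbitrary initial state
    hits = 0
    for _ in range(n_samples):
        # Resample D from its Markov blanket {H}: P(D | h) prop. to P(D) P(h | D)
        w1 = P_D * (p_h(1) if h else 1 - p_h(1))
        w0 = (1 - P_D) * (p_h(0) if h else 1 - p_h(0))
        d = 1 if rng.random() < w1 / (w1 + w0) else 0
        # Resample H from {D, A, S}: P(H | d, a, s) prop. to P(H | d) P(a | H, s)
        l1 = p_a(1, s_obs) if a_obs else 1 - p_a(1, s_obs)
        l0 = p_a(0, s_obs) if a_obs else 1 - p_a(0, s_obs)
        w1 = p_h(d) * l1
        w0 = (1 - p_h(d)) * l0
        h = 1 if rng.random() < w1 / (w1 + w0) else 0
        hits += h
    return hits / n_samples
```

With these numbers the exact answer is P(H=1 | A=1, S=1) ≈ 0.52, and the sampler converges to it; the same loop applied to the full ground BN is exactly the "run Gibbs on the ground network" option above.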

13 Learning PRMs
Learn structure: for each simple slot, a set of parent slot chains with aggregation functions.
Score = prior × marginal likelihood:
– The marginal likelihood prefers structures that fit the data well, and penalizes having lots of parameters (i.e., lots of parents)
– The prior penalizes long slot chains

14 PRM Learning Algorithm
Local search over structures:
– Operators add, remove, or reverse slot chains
– Greedy: look at all possible moves, choose the one that increases the score the most
Proceed in phases:
– Increase the maximum slot chain length each time
– Stop when there is no improvement in score
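The phase-based greedy search can be sketched as follows. This is an illustrative Python skeleton, not the published algorithm's code: `candidates` and `score` are hypothetical stand-ins for the slot-chain enumerator and the prior × marginal-likelihood score, and the only move shown is toggling (adding/removing) a chain.

```python
def learn_structure(slots, candidates, score, max_phases=3):
    """Greedy phase-based structure search. `slots` lists the simple slots,
    `candidates(max_len)` yields parent slot chains up to max_len,
    `score(structure)` returns prior x marginal likelihood (stand-in)."""
    structure = {slot: set() for slot in slots}   # start with no parents
    best = score(structure)
    for max_len in range(1, max_phases + 1):      # phases allow longer chains
        while True:
            moves = []
            for slot in slots:
                for chain in candidates(max_len):
                    trial = {s: set(ps) for s, ps in structure.items()}
                    trial[slot] ^= {chain}        # toggle: add if absent, remove if present
                    moves.append((score(trial), trial))
            top_score, top = max(moves, key=lambda m: m[0])
            if top_score <= best:
                break                             # no improving move: next phase
            best, structure = top_score, top
    return structure
```

With a toy score that rewards one particular chain and penalizes structure size, the search recovers exactly that chain and nothing else.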

15 PRM Benefits and Limitations
Benefits:
– Generalization across objects: models are compact, and we don't need to learn a new theory for each new scenario
– A learning algorithm is known
Limitations:
– Slot chains are restrictive; e.g., can't say GoodRec(p, s) ← {GotA(s, c) : TaughtBy(c, p)}
– Objects and relations have to be specified in the skeleton [although see later extensions to the PRM language]

16 Basic Task for Intelligent Agents
Given observations, make inferences about underlying objects.
Difficulties:
– Don't know the list of objects in advance
– Don't know when the same object is observed twice (identity uncertainty / data association / record linkage)

17 Unknown Objects: Applications
Citation matching: variant citation strings such as
  "S. Russel and P. Norvig (1995). Artificial Intelligence: ..."
  "Russell, Stuart and Norvig, Peter. Articial Intelligence ..."
(note the misspellings) both refer to Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach.
Multi-target tracking: blips observed at t=1, t=2, ... must be associated with underlying aircraft.

18 Levels of Uncertainty
[Diagram with four observations A, B, C, D:]
– Attribute uncertainty: objects and relations are known; attribute values are uncertain
– Relational uncertainty: objects are known; which objects are related is uncertain
– Unknown objects: even the partition of observations into objects is uncertain, e.g. {A, B, C, D} vs. {A, C}, {B, D} vs. {A}, {C, D}, {B}

19 Bayesian Logic (BLOG)
Defines a probability distribution over possible worlds with varying sets of objects.
Intuition: a stochastic generative process with two kinds of steps:
– Set the value of a function on a tuple of arguments
– Add some number of objects to the world
[Milch et al., SRL 2004; IJCAI 2005]

20 Simple Example: Balls in an Urn
An urn contains an unknown number of balls; we make 4 draws with replacement, observing a color each time. Task: given the draws, compute P(n balls in urn | draws) from the prior P(n balls in urn).

21 Possible Worlds
[Figure: possible worlds with different numbers and colorings of balls, each annotated with its prior probability and the draws made in it.]

22 Generative Process for Possible Worlds
[Figure: first generate the balls in the urn, then generate the 4 draws (with replacement) from them.]

23 BLOG Model for Urn and Balls

type Color; type Ball; type Draw;
random Color TrueColor(Ball);
random Ball BallDrawn(Draw);
random Color ObsColor(Draw);
guaranteed Color Blue, Green;
guaranteed Draw Draw1, Draw2, Draw3, Draw4;
#Ball ~ Poisson[6]();
TrueColor(b) ~ TabularCPD[[0.5, 0.5]]();
BallDrawn(d) ~ UniformChoice({Ball b});
ObsColor(d)
  if (BallDrawn(d) != null)
    then ~ NoisyCopy(TrueColor(BallDrawn(d)));
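One way to read the model is as a recipe for forward simulation. The Python sketch below samples a possible world from it; the 0.2 flip probability for NoisyCopy is our assumption (the BLOG model delegates that number to the CPD).

```python
import math
import random

def poisson(rng, lam):
    # Knuth's method: count uniform draws until their product falls below e^-lam
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k - 1

def sample_world(rng, mean_balls=6, colors=("Blue", "Green"), noise=0.2, n_draws=4):
    """Forward-sample one possible world from the urn model."""
    n_balls = poisson(rng, mean_balls)                         # #Ball ~ Poisson[6]()
    true_color = [rng.choice(colors) for _ in range(n_balls)]  # TrueColor(b) ~ TabularCPD[[0.5, 0.5]]()
    obs = []
    for _ in range(n_draws):
        if n_balls == 0:
            obs.append(None)                                   # BallDrawn(d) = null
            continue
        b = rng.randrange(n_balls)                             # BallDrawn(d) ~ UniformChoice({Ball b})
        c = true_color[b]
        if rng.random() < noise:                               # NoisyCopy: flip w.p. `noise` (our assumption)
            c = colors[1 - colors.index(c)]
        obs.append(c)
    return n_balls, obs
```

Each call returns one world: the number of balls and the four observed colors. Repeated sampling recovers the Poisson[6] prior over #Ball.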

24 BLOG Model for Urn and Balls (annotated): in the model above, the type and function declarations form the header; #Ball ~ Poisson[6]() is a number statement; the remaining lines are dependency statements.

25 BLOG Model for Urn and Balls (annotated): BallDrawn(d) ~ UniformChoice({Ball b}) introduces identity uncertainty: is BallDrawn(Draw1) = BallDrawn(Draw2)?

26 BLOG Model for Urn and Balls (annotated): the right-hand sides are arbitrary conditional probability distributions (TabularCPD, UniformChoice, NoisyCopy), and expressions such as TrueColor(BallDrawn(d)) are CPD arguments.

27 BLOG Model for Urn and Balls (annotated): the if (BallDrawn(d) != null) clause in ObsColor(d) is a context-specific dependence.

28 Syntax of Dependency Statements

RetType Function(ArgType1 x1, ..., ArgTypek xk)
  if Cond1 then ~ ElemCPD1(Arg1,1, ..., Arg1,m)
  elseif Cond2 then ~ ElemCPD2(Arg2,1, ..., Arg2,m)
  ...
  else ~ ElemCPDn(Argn,1, ..., Argn,m);

– Conditions are arbitrary first-order formulas
– Elementary CPDs are names of Java classes
– Arguments can be terms or set expressions
Number statements are the same, except that their headers have the form #Type(OriginFn1 = x1, ...).

29 Generative Process for Aircraft Tracking
[Figure: aircraft in the sky generate blips on a radar screen.] The existence of radar blips depends on the existence and locations of aircraft.

30 BLOG Model for Aircraft Tracking

origin Aircraft Source(Blip);
origin NaturalNum Time(Blip);
#Aircraft ~ NumAircraftDistrib();
State(a, t)
  if t = 0 then ~ InitState()
  else ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t) ~ NumDetectionsDistrib(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsDistrib();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsDistrib(State(Source(r), Time(r)));

The two number statements generate blips with origin functions (Source = a, Time = t) for true detections, and with (Time = t) alone for false alarms.

31 Declarative Semantics What is the set of possible worlds? What is the probability distribution over worlds?

32 What Exactly Are the Objects? Objects are tuples that encode generation history Aircraft: (Aircraft, 1), (Aircraft, 2), … Blip from (Aircraft, 2) at time 8 : (Blip, (Source, (Aircraft, 2)), (Time, 8), 1)
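In Python-style data, the slide's nested-tuple encoding looks like this (illustrative only):

```python
# Objects as tuples that encode their generation history: the object's type,
# its generating objects (via origin functions), and an index distinguishing
# objects with the same history.
aircraft_2 = ("Aircraft", 2)
blip = ("Blip", ("Source", aircraft_2), ("Time", 8), 1)

# Two objects are the same object iff their entire histories match
same = blip == ("Blip", ("Source", ("Aircraft", 2)), ("Time", 8), 1)
```

Because the history is part of the object's identity, equality of objects reduces to structural equality of tuples.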

33 Basic Random Variables (RVs) For each number statement and tuple of generating objects, have RV for number of objects generated For each function symbol and tuple of arguments, have RV for function value Lemma: Full instantiation of these RVs uniquely identifies a possible world

34 Another Look at a BLOG Model

#Ball ~ Poisson[6]();
TrueColor(b) ~ TabularCPD[[0.5, 0.5]]();
BallDrawn(d) ~ UniformChoice({Ball b});
ObsColor(d)
  if !(BallDrawn(d) = null)
    then ~ NoisyCopy(TrueColor(BallDrawn(d)));

Dependency and number statements define CPDs for basic RVs.

35 Semantics: Contingent BN
Each BLOG model defines a contingent BN [Milch et al., AI/Stats 2005].
[Figure: #Ball and TrueColor(B1), TrueColor(B2), TrueColor(B3), ... feed into ObsColor(D1) via BallDrawn(D1); the edge from TrueColor(Bi) is active only under the condition BallDrawn(D1) = Bi.]
Theorem: Every BLOG model that satisfies certain conditions (analogous to BN acyclicity) fully defines a distribution.

36 Inference on BLOG Models
It is very easy to define models where exact inference is hopeless. Sampling-based approximation algorithms:
– Likelihood weighting
– Markov chain Monte Carlo

37 Likelihood Weighting (LW)
– Sample non-evidence nodes top-down
– Weight each sample by the probability of the observed evidence values given their parents
– Provably converges to the correct posterior
– Only need to sample ancestors of the query and evidence nodes

38 Application to BLOG
Given the ObsColor variables, get the posterior for #Ball. Problem: until we condition on BallDrawn(d), ObsColor(d) has infinitely many parents (TrueColor(B1), TrueColor(B2), ...). Solution: interleave sampling and relevance determination [Milch et al., AISTATS 2005].

39 LW for Urn and Balls

#Ball ~ Poisson();
TrueColor(b) ~ TabularCPD();
BallDrawn(d) ~ UniformChoice({Ball b});
ObsColor(d)
  if !(BallDrawn(d) = null)
    then ~ TabularCPD(TrueColor(BallDrawn(d)));

Evidence: ObsColor(Draw1) = Blue; ObsColor(Draw2) = Green. Query: #Ball.
A sample trace (stack-driven; the weight starts at 1):
– Sample #Ball = 7
– To sample ObsColor(Draw1), need BallDrawn(Draw1): sample BallDrawn(Draw1) = (Ball, 3)
– Need TrueColor((Ball, 3)): sample TrueColor((Ball, 3)) = Blue
– Weight by P(ObsColor(Draw1) = Blue | TrueColor = Blue): × 0.8
– Sample BallDrawn(Draw2) = (Ball, 3)
– Weight by P(ObsColor(Draw2) = Green | TrueColor = Blue): × 0.2
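The full LW loop for this model fits in a few lines of Python. This is a sketch, not the BLOG engine: the 0.8/0.2 observation factors match the worked trace, and TrueColor variables are instantiated lazily, mirroring the interleaved sampling and relevance determination described above.

```python
import math
import random
from collections import defaultdict

def poisson(rng, lam):
    # Knuth's method for Poisson sampling
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k - 1

def lw_urn(evidence, n_samples=30000, mean=6, noise=0.2, seed=1):
    """Likelihood weighting for the urn model: sample #Ball and each
    BallDrawn top-down, instantiate TrueColor lazily (only for balls that
    are actually drawn), and weight by P(ObsColor = observed | TrueColor).
    Returns the weighted posterior over #Ball."""
    rng = random.Random(seed)
    weights = defaultdict(float)         # total weight per value of #Ball
    for _ in range(n_samples):
        n_balls = poisson(rng, mean)
        true_color = {}
        w = 1.0
        for obs in evidence:             # e.g. ["Blue", "Green"]
            if n_balls == 0:
                w = 0.0                  # BallDrawn(d) = null: evidence impossible
                break
            b = rng.randrange(n_balls)
            if b not in true_color:      # lazy instantiation of TrueColor(b)
                true_color[b] = rng.choice(["Blue", "Green"])
            w *= (1 - noise) if true_color[b] == obs else noise
        weights[n_balls] += w
    z = sum(weights.values())
    return {n: w / z for n, w in sorted(weights.items())}
```

For example, `lw_urn(["Blue"] * 10)` concentrates posterior mass on small numbers of balls, mirroring the plots on the following slides.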

40 Examples of Inference
[Plot: prior vs. posterior over the number of balls, given 10 draws all appearing blue; 5 runs of 100,000 samples each.]

41 Examples of inference [Courtesy of Josh Tenenbaum]
[Plot: prior (o) and approximate posterior (x) over the number of balls.] Ball colors: {Blue, Green, Red, Orange, Yellow, Purple, Black, White}; given 10 draws, all appearing Blue; runs of 100,000 samples each.

42 Examples of inference [Courtesy of Josh Tenenbaum]
[Plot: prior (o) and approximate posterior (x) over the number of balls.] Ball colors: {Blue}; given 10 draws, all appearing Blue; runs of 100,000 samples each.

43 Examples of inference [Courtesy of Josh Tenenbaum]
[Plot: prior (o) and approximate posterior (x) over the number of balls.] Ball colors: {Blue, Green}; given 3 draws: 2 appear Blue, 1 appears Green; runs of 100,000 samples each.

44 Examples of inference [Courtesy of Josh Tenenbaum]
[Plot: prior (o) and approximate posterior (x) over the number of balls.] Ball colors: {Blue, Green}; given 30 draws: 20 appear Blue, 10 appear Green; runs of 100,000 samples each.

45 More Practical Inference
Drawback of likelihood weighting: as the number of observations increases,
– Sample weights become very small
– A few high-weight samples tend to dominate
More practical to use MCMC algorithms:
– Random walk over possible worlds
– Find high-probability areas and stay there

46 Metropolis-Hastings MCMC
Let s1 be an arbitrary state in E (the event defined by the evidence). For n = 1 to N:
– Sample s′ ∈ E from the proposal distribution q(s′ | sn)
– Compute the acceptance probability α = min(1, [p(s′) q(sn | s′)] / [p(sn) q(s′ | sn)])
– With probability α, let sn+1 = s′; else let sn+1 = sn
The stationary distribution is proportional to p(s), and the fraction of visited states in Q converges to p(Q | E).
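The loop above is easy to state in code. Below is a generic Metropolis-Hastings sampler in Python, demonstrated on a toy ring-shaped state space; the demo target and all names are ours, not from the slides.

```python
import random

def metropolis_hastings(p, propose, q, s0, n_steps, rng):
    """Generic MH loop from the slide: p is the (unnormalized) target density,
    propose(s, rng) samples s' from q(. | s), and q(s2, s1) evaluates q(s2 | s1).
    Returns the list of visited states."""
    s = s0
    visited = []
    for _ in range(n_steps):
        s_new = propose(s, rng)
        # alpha = min(1, p(s') q(s_n | s') / (p(s_n) q(s' | s_n)))
        alpha = min(1.0, (p(s_new) * q(s, s_new)) / (p(s) * q(s_new, s)))
        if rng.random() < alpha:
            s = s_new
        visited.append(s)
    return visited

# Toy demo: 10 states on a ring, target p(n) proportional to n + 1,
# symmetric nearest-neighbour proposal (so the q-ratio is 1)
def target(n):      return n + 1.0
def step(s, rng):   return (s + rng.choice([-1, 1])) % 10
def q_ring(s2, s1): return 0.5
```

On the toy ring, the fraction of visits to state 9 converges to 10/55 ≈ 0.18, illustrating the "fraction of visited states" claim.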

47 Toward General-Purpose Inference
Without BLOG, each new application requires new code for:
– Proposing moves
– Representing MCMC states
– Computing acceptance probabilities
With BLOG:
– The user specifies the model and the proposal distribution
– General-purpose code does the rest

48 General MCMC Engine [Milch & Russell, UAI 2006]
Division of labor:
– Model (in declarative language): defines p(s)
– Custom proposal distribution (Java class): proposes an MCMC state s′ given sn; computes the ratio q(sn | s′) / q(s′ | sn)
– General-purpose engine (Java code): computes the acceptance probability based on the model; sets sn+1
Two questions: 1. What are the MCMC states? 2. How does the engine handle arbitrary proposals efficiently?

49 Example: Citation Model

guaranteed Citation Cit1, Cit2, Cit3, Cit4;
#Res ~ NumResearchersPrior();
String Name(Res r) ~ NamePrior();
#Pub ~ NumPubsPrior();
Res Author(Pub p) ~ Uniform({Res r});
String Title(Pub p) ~ TitlePrior();
Pub PubCited(Citation c) ~ Uniform({Pub p});
String Text(Citation c)
  ~ FormatCPD(Title(PubCited(c)), Name(Author(PubCited(c))));

50 Proposer for Citations [Pasula et al., NIPS 2002]
Split-merge moves:
– Propose titles and author names for affected publications based on the citation strings
Other moves change the total number of publications.

51 MCMC States
MCMC states are not complete instantiations (no titles or author names for uncited publications). Instead, states are partial instantiations of random variables, e.g.
#Pub = 100, PubCited(Cit1) = (Pub, 37), Title((Pub, 37)) = "Calculus"
Each state corresponds to an event: the set of outcomes satisfying the description.

52 MCMC over Events
Run the Markov chain over events σ, with stationary distribution proportional to p(σ).
Theorem: The fraction of visited events in Q converges to p(Q | E) if:
– Each σ is either a subset of Q or disjoint from Q
– The events form a partition of E

53 Computing Probabilities of Events
The engine needs to compute p(σ′) / p(σn) efficiently (without summations). Use instantiations that include all active parents of the variables they instantiate. Then the probability is a product of CPDs:
p(σ) = ∏_X p_X(σ[X] | σ[Pa_σ(X)]), where the product runs over the variables X instantiated in σ.
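That product is straightforward to compute. Below is a Python sketch, where `cpd` and `parents` are hypothetical callbacks standing in for the model's CPDs and context-specific active parent sets:

```python
def event_prob(sigma, cpd, parents):
    """p(sigma) for a partial instantiation that includes all active parents
    of every variable it instantiates: the product of the instantiated
    variables' CPD values. `sigma` maps variable names to values;
    `parents(x, sigma)` returns x's active parents under sigma;
    `cpd(x, v, pa)` returns P(x = v | pa)."""
    prob = 1.0
    for x, v in sigma.items():
        pa = {u: sigma[u] for u in parents(x, sigma)}  # all must be in sigma
        prob *= cpd(x, v, pa)
    return prob
```

On a two-variable chain A → B with P(A=1) = 0.3 and P(B=1 | A=1) = 0.9, the instantiation {A = 1, B = 0} gets probability 0.3 × 0.1 = 0.03, as the product formula predicts.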

54 Computing Acceptance Probabilities Efficiently
The first part of the acceptance probability is the ratio p(σ′) / p(σn), a ratio of products of CPD factors. If moves are local, most factors cancel, so we need to compute the factors for a variable X only if the proposal changes X or one of its parents.

55 Identifying Factors to Compute
– Maintain a list of changed variables
– To find the children of changed variables, use a context-specific BN
– Update the context-specific BN as active dependencies change
[Figure: after a split move, Text(Cit1) and Text(Cit2), which both depended on Title((Pub, 37)) via PubCited(Cit1) and PubCited(Cit2), come to depend on Title((Pub, 37)) and Title((Pub, 14)) respectively.]

56 Results on Citation Matching

                   Face (349 cits)  Reinforce (406 cits)  Reasoning (514 cits)  Constraint (295 cits)
Hand-coded   Acc:      95.1%             81.8%                 88.6%                 91.7%
             Time:     14.3 s            19.4 s                19.0 s                12.1 s
BLOG engine  Acc:      95.6%             78.0%                 88.7%                 90.7%
             Time:     69.7 s            99.0 s                99.4 s                59.9 s

The hand-coded version uses domain-specific data structures to represent the MCMC state and proposer-specific code to compute acceptance probabilities. The BLOG engine takes about 5x as long to run, but it's faster than the hand-coded version was in 2003 (which took 120 secs on old hardware and JVM).

57 Learning BLOG Models
There is a much larger class of dependency structures:
– If-then-else conditions
– CPD arguments, which can be terms or set expressions (possibly containing conditions)
And we'd like to go further and invent new:
– Random functions, e.g., Colleagues(x, y)
– Types of objects, e.g., Conferences
The search space becomes extremely large.

58 Summary
First-order probabilistic models combine:
– Probabilistic treatment of uncertainty
– First-order generalization across objects
PRMs:
– Define a BN for any given relational skeleton
– Can learn structure by local search
BLOG:
– Expresses uncertainty about the relational skeleton
– Inference by MCMC over partial world descriptions
– Learning is an open problem