Unknown Objects and BLOG
Brian Milch, MIT
IPAM Summer School, July 16, 2007


2 Example 1: Bibliographies
Two citation strings for the same book; note the deliberate noise ("Russel" with one l in the first, "Articial" in the second):
  S. Russel and P. Norvig (1995). Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Prentice Hall.
  Russell, Stuart and Norvig, Peter. Articial Intelligence. Prentice-Hall,
[Diagram: both citation strings point via PubCited to a single Paper object (Title: …), whose AuthorOf links lead to Researcher objects (Name: …)]

3 Example 2: Aircraft Tracking
[Radar diagram: aircraft and blips over several time steps; a detection failure leaves an aircraft with no blip]

4 Example 2: Aircraft Tracking
[Radar diagram: a false detection (a blip with no aircraft behind it) and an unobserved object (an aircraft that was never detected)]

5 Handling Unknown Objects
Fundamental task: given observations, make inferences about initially unknown objects
But most RPM languages assume the set of objects is fixed and known (Herbrand models)
Bayesian logic (BLOG) lifts this assumption [Milch et al., IJCAI 2005. See also MEBN: Laskey & da Costa, UAI 2005; Dynamical Grammars: Mjolsness & Yosiphon, AMAI, to appear]

6 Outline
BLOG models with unknown objects
– Syntax
– Semantics
Inference with unknown objects
– Likelihood weighting
– MCMC
Applications
– Citation matching
– Multi-object tracking
Alternative approaches

7 Possible Worlds
[Figure: several possible worlds with different numbers of papers and different citation-to-paper mappings (not showing attribute values)]
How can we define a distribution over such outcomes?

8 Generative Process
Imagine a process that constructs worlds using two kinds of steps:
– Add some objects to the world
– Set the value of a function on a tuple of arguments

9 BLOG Model for Citations

#Paper ~ NumPapersPrior();                                      [number statement]
Title(p) ~ TitlePrior();
guaranteed Citation Cit1, Cit2, Cit3, Cit4, Cit5, Cit6, Cit7;   [part of skeleton: exhaustive list of distinct citations]
PubCited(c) ~ Uniform({Paper p});                               [familiar syntax for reference uncertainty]
Text(c) ~ NoisyCitationGrammar(Title(PubCited(c)));
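To make the generative reading concrete, here is a minimal Python forward-sampling sketch of this model; the specific distributions (uniform paper count, toy title list, character-dropping noise) are stand-ins for the named priors, which the slides leave unspecified.

import random

def sample_citation_world(num_citations=7):
    # #Paper ~ NumPapersPrior()  (stand-in: uniform over 1..10)
    num_papers = random.randint(1, 10)
    # Title(p) ~ TitlePrior()  (stand-in: a toy title list)
    titles = [random.choice(["calculus", "learning", "inference"])
              for _ in range(num_papers)]
    cites, texts = [], []
    for _ in range(num_citations):
        # PubCited(c) ~ Uniform({Paper p})
        p = random.randrange(num_papers)
        # Text(c) ~ NoisyCitationGrammar(Title(PubCited(c)))  (stand-in: drop characters)
        text = "".join(ch for ch in titles[p] if random.random() > 0.1)
        cites.append(p)
        texts.append(text)
    return num_papers, titles, cites, texts

print(sample_citation_world())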

10 Adding Authors

#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper ~ NumPapersPrior();
FirstAuthor(p) ~ Uniform({Researcher r});
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));

11 Generative Process for Aircraft Tracking
[Diagram: aircraft in the sky generate blips on a radar screen]
Existence of radar blips depends on existence and locations of aircraft

12 BLOG Model for Aircraft Tracking

#Aircraft ~ NumAircraftDistrib();
State(a, t)
  if t = 0 then ~ InitState()
  else ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t) ~ NumDetectionsDistrib(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsDistrib();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsDistrib(State(Source(r), Time(r)));

[Callouts: the first #Blip statement generates blips per (Source a, Time t) pair; the second generates false-alarm blips per Time t alone]

13 Declarative Semantics
What is the set of possible worlds?
What is the probability distribution over worlds?

14 What Exactly Are the Objects?
Objects are tuples that encode generation history:
– Aircraft: (Aircraft, 1), (Aircraft, 2), …
– Blips from (Aircraft, 2) at time 8: (Blip, (Source, (Aircraft, 2)), (Time, 8), 1), (Blip, (Source, (Aircraft, 2)), (Time, 8), 2), …
Point: if we specify how many blips were generated by (Aircraft, 2) at time 8, there's no ambiguity about which blips were generated
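The encoding is easy to mirror with nested Python tuples; this snippet is purely illustrative, showing that each generation history names exactly one object:

aircraft2 = ("Aircraft", 2)
blip1 = ("Blip", ("Source", aircraft2), ("Time", 8), 1)   # first blip from that aircraft at t = 8
blip2 = ("Blip", ("Source", aircraft2), ("Time", 8), 2)   # second blip, distinguished only by its index
# Tuples compare by structure, so distinct histories give distinct objects
# and equal histories give the same object:
assert blip1 != blip2
assert blip1 == ("Blip", ("Source", ("Aircraft", 2)), ("Time", 8), 1)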

15 Basic Random Variables (RVs)
For each number statement and tuple of generating objects, there is an RV for the number of objects generated
For each function symbol and tuple of arguments, there is an RV for the function value
Lemma: a full instantiation of these RVs uniquely identifies a possible world

16 Contingent Bayesian Network [Milch et al., AI/Stats 2005]
Each BLOG model defines a contingent Bayesian network (CBN) over the basic RVs
– Edges are active only under certain conditions
[Figure: #Pub and the nodes Title((Pub, 1)), Title((Pub, 2)), Title((Pub, 3)), … feed into Text(Cit1); each Title edge is labeled with the condition PubCited(Cit1) = (Pub, k) under which it is active. With unboundedly many publications there are infinitely many nodes!]

17 Probability Distribution
Through its CBN, a BLOG model specifies:
– Conditional distributions for the basic RVs
– Context-specific independence properties, e.g., Text(Cit1) is independent of Title((Pub, 1)) given PubCited(Cit1) = (Pub, 3)
Theorem: Under certain conditions (analogous to BN acyclicity), every BLOG model defines a unique distribution over possible worlds [Milch et al., IJCAI 2005]

18 Symbol Graphs and Unknown Objects
The symbol graph now contains not only random functions, but random types
Parents of a function or type node are:
– Functions and types that appear on the right-hand side of dependency or number statements for this function/type
– The types of this function/type's arguments or generating objects
[Figure: symbol graph for the citation model, with nodes Paper, Researcher, Title, Text, Name, PubCited]
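Applying those two rules to the model on slide 10 gives an adjacency structure like the following Python sketch; the exact edge set is one reading of the rules, so treat it as illustrative rather than definitive:

# Symbol graph as a mapping from each node to its parent nodes.
symbol_graph = {
    "Researcher": [],                       # #Researcher's number statement has no parents
    "Paper": [],                            # likewise for #Paper
    "Name": ["Researcher"],                 # argument type of Name(r)
    "Title": ["Paper"],                     # argument type of Title(p)
    "FirstAuthor": ["Researcher", "Paper"], # RHS mentions type Researcher; argument type is Paper
    "PubCited": ["Paper"],                  # RHS Uniform({Paper p}) mentions type Paper
    "Text": ["Name", "FirstAuthor", "Title", "PubCited"],  # all appear on Text's RHS
}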

19 Outline
BLOG models with unknown objects
– Syntax
– Semantics
Inference with unknown objects
– Likelihood weighting
– MCMC
Applications
– Citation matching
– Multi-object tracking
Alternative approaches

20 Inference for BLOG
Does the infinite set of basic RVs prevent inference?
No: a sampling algorithm only needs to instantiate a finite set of relevant variables
Algorithms:
– Rejection sampling [Milch et al., IJCAI 2005]
– Guided likelihood weighting [Milch et al., AI/Stats 2005]
Theorem: For any well-formed BLOG model, these sampling algorithms converge to the correct probability for any query, using finite time per sampling step

21 Approximate Inference by Likelihood Weighting
Sample non-evidence nodes top-down
Weight each sample by the product of the probabilities of the evidence nodes given their parents
Provably converges to the correct posterior
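As a reminder of the basic algorithm on an ordinary Bayes net, here is a minimal Python sketch; the two-node network (Rain -> WetGrass) and its CPD numbers are toy assumptions, not taken from the slides.

import random

def likelihood_weighting(num_samples=100_000):
    # Estimate P(Rain | WetGrass = True) with P(Rain) = 0.2,
    # P(WetGrass | Rain) = 0.9, P(WetGrass | not Rain) = 0.1.
    weighted_true = total = 0.0
    for _ in range(num_samples):
        rain = random.random() < 0.2          # sample the non-evidence node top-down
        weight = 0.9 if rain else 0.1         # weight = P(evidence | sampled parents)
        total += weight
        if rain:
            weighted_true += weight
    return weighted_true / total

print(likelihood_weighting())                 # converges to 0.18 / 0.26 ≈ 0.692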

22 Application to BLOG
Only need to sample ancestors of the query and evidence nodes
But until we condition on PubCited(Cit1), Text(Cit1) has infinitely many parents
Solution: interleave sampling and relevance determination
[Figure: the CBN from slide 16 again, with the active parent of Text(Cit1) determined by the sampled value of PubCited(Cit1)]

23 Likelihood Weighting for (Simplified) Citation Matching

#Paper ~ NumPapersPrior();
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Title(PubCited(c)));

Evidence: Text(Cit1) = "foo"; Text(Cit2) = "foob". Query: #Paper
[Animation: a stack of relevant variables drives a growing instantiation]
– Sample #Paper = 7
– Sample PubCited(Cit1) = (Paper, 3)
– Sample Title((Paper, 3)) = "Foo"
– Weight by P(Text(Cit1) = "foo" | Title): × 0.8
– Sample PubCited(Cit2) = (Paper, 3)
– Weight by P(Text(Cit2) = "foob" | Title): × 0.2
Final sample weight: 1 × 0.8 × 0.2 = 0.16
More realistically: use MCMC

24 MCMC for Citations [Pasula et al., NIPS 2002]
Split-merge moves:
– Propose titles and author names for the affected publications based on the citation strings
Other moves change the total number of publications

25 MCMC States
Not complete instantiations!
– No titles or author names for uncited publications
States are partial instantiations of random variables
– Each state corresponds to an event: the set of outcomes satisfying a description such as #Pub = 100, PubCited(Cit1) = (Pub, 37), Title((Pub, 37)) = "Calculus"
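One way to picture such a state is as a Python dict from basic-RV names to values; the dict encoding is an illustrative choice, not BLOG's internal representation:

event = {
    "#Pub": 100,
    "PubCited(Cit1)": ("Pub", 37),
    "Title((Pub, 37))": "Calculus",
}
# The event is the set of all worlds consistent with these assignments;
# the titles of the 99 uncited publications are deliberately left unspecified.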

26 MCMC over Events [Milch & Russell, UAI 2006]
Markov chain over events σ, with stationary distribution proportional to p(σ)
Theorem: The fraction of visited events in Q converges to p(Q | E) if:
– each event σ is either a subset of Q or disjoint from Q
– the events form a partition of E
[Figure: the evidence event E partitioned into events, some lying inside the query region Q]
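A generic Metropolis-Hastings loop over such events might look like the following Python sketch; the scoring function p, the proposal distribution, and the query test are placeholders to be supplied by a concrete model:

import random

def mcmc_over_events(init_event, p, propose, in_query, num_steps):
    # Estimates p(Q | E) as the fraction of visited events that lie inside Q.
    sigma = init_event
    hits = 0
    for _ in range(num_steps):
        # propose returns a candidate event and the proposal ratio q(old|new)/q(new|old)
        sigma_new, proposal_ratio = propose(sigma)
        accept_prob = min(1.0, (p(sigma_new) / p(sigma)) * proposal_ratio)
        if random.random() < accept_prob:
            sigma = sigma_new
        if in_query(sigma):
            hits += 1
    return hits / num_steps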

27 Computing Probabilities of Events
The engine needs to compute the ratio p(σ′) / p(σₙ) efficiently (without summations)
Use instantiations that include all active parents of the variables they instantiate
Then the probability is a product of CPDs:
  p(σ) = ∏_{X ∈ vars(σ)} p(σ_X | σ_{Pa(X)})
where Pa(X) denotes the parents of X that are active given σ
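In code, that product is a single pass over the instantiated variables; here cpd(var, value, event) is an assumed callback returning P(var = value | active parents in event), not an API from the BLOG engine:

import math

def event_log_prob(event, cpd):
    # log p(sigma) for a partial instantiation that includes all active parents;
    # summing logs avoids underflow in the product of CPD factors.
    return sum(math.log(cpd(var, value, event)) for var, value in event.items())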

28 Outline
BLOG models with unknown objects
– Syntax
– Semantics
Inference with unknown objects
– Likelihood weighting
– MCMC
Applications
– Citation matching
– Multi-object tracking
Alternative approaches

29 Citation Matching [Pasula et al., NIPS 2002]
Elaboration of the generative model shown earlier
Parameter estimation:
– Priors for names, titles, and citation formats learned offline from labeled data
– String corruption parameters learned with Monte Carlo EM
Inference:
– MCMC with split-merge proposals
– Guided by "canopies" of similar citations
– Accuracy stabilizes after ~20 minutes

30 Citation Matching Results Four data sets of ~ citations, referring to ~ papers

31 Cross-Citation Disambiguation
Wauchope, K. Eucalyptus: Integrating Natural Language Input with a Graphical User Interface. NRL Report NRL/FR/ (1994).
Is "Eucalyptus" part of the title, or is the author named K. Eucalyptus Wauchope?
Kenneth Wauchope (1994). Eucalyptus: Integrating natural language input with a graphical user interface. NRL Report NRL/FR/ , Naval Research Laboratory, Washington, DC, 39pp.
The second citation makes it clear how to parse the first one

32 Preliminary Experiments: Information Extraction
P(citation text | title, author names) modeled with a simple HMM
For each paper: recover the title, author surnames, and given names
Fraction whose attributes are recovered perfectly in the last MCMC state:
– among papers with one citation: 36.1%
– among papers with multiple citations: 62.6%
Can use inferred knowledge for disambiguation

33 Multi-Object Tracking
[Radar diagram, as on slide 4: a false detection and an unobserved object]

34 State Estimation for "Aircraft"

#Aircraft ~ NumAircraftPrior();
State(a, t)
  if t = 0 then ~ InitState()
  else ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsPrior();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsCPD(State(Source(r), Time(r)));

35 Aircraft Entering and Exiting

#Aircraft(EntryTime = t) ~ NumAircraftPrior();
Exits(a, t)
  if InFlight(a, t) then ~ Bernoulli(0.1);
InFlight(a, t)
  if t < EntryTime(a) then = false
  elseif t = EntryTime(a) then = true
  else = (InFlight(a, Pred(t)) & !Exits(a, Pred(t)));
State(a, t)
  if t = EntryTime(a) then ~ InitState()
  elseif InFlight(a, t) then ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t)
  if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));
…plus the last two statements from the previous slide
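To see how the InFlight recursion unfolds over discrete time, here is a small Python simulation; the per-step exit probability of 0.1 follows the model above, while the horizon and the entry-count distribution are toy assumptions:

import random

def simulate_entries_and_exits(horizon=10):
    aircraft = []  # each record: {"entry": t, "exit": t or None}
    for t in range(horizon):
        # #Aircraft(EntryTime = t) ~ NumAircraftPrior()  (stand-in: 0-2 new aircraft per step)
        for _ in range(random.randint(0, 2)):
            aircraft.append({"entry": t, "exit": None})
        for a in aircraft:
            # InFlight(a, t): entered by now and has not exited at an earlier step
            in_flight = a["entry"] <= t and a["exit"] is None
            # Exits(a, t) ~ Bernoulli(0.1), sampled only if InFlight(a, t)
            if in_flight and random.random() < 0.1:
                a["exit"] = t   # so InFlight(a, t+1) becomes false
    return aircraft

print(simulate_entries_and_exits())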

36 MCMC for Aircraft Tracking [Oh et al., CDC 2004]
Uses the generative model from the previous slide (although not with BLOG syntax)
Examples of Metropolis-Hastings proposals:
[Figures by Songhwai Oh: proposal moves over track hypotheses, e.g., birth/death and split/merge of tracks]

37 Aircraft Tracking Results [Oh et al., CDC 2004]
[Figures by Songhwai Oh: estimation error and running time comparisons]
– MCMC has the smallest error, and it hardly degrades at all as the tracks get dense
– MCMC is nearly as fast as the greedy algorithm, and much faster than MHT

38 BLOG Software
Bayesian Logic inference engine available:

39 Alternative Approaches
Many alternative languages exist for representing RPMs
Surveys:
– ILP invited talk [Milch & Russell 2006]
– Forthcoming book: Statistical Relational Learning, L. Getoor and B. Taskar, eds.

40 RPMs Based on Directed Graphical Models (besides BLOG)
– BUGS language [Gilks et al., 1994]: relational parent specs (height[father[i]]); Gibbs sampling engine available
– Probabilistic relational models (PRMs) [Koller & Pfeffer 1998; Friedman, Getoor, Koller & Pfeffer 1999]: several learning results; no software available
– Relational Bayesian networks (RBNs) [Jaeger 2001]: software available (Primula)
– Bayesian logic programs (BLPs) [Kersting & De Raedt 2001]
– Multi-entity Bayes nets (MEBN) [Laskey & da Costa 2005]

41 Stochastic Programming Languages
Idea: let random coin flips drive a deterministic program, and get a distribution over outputs
IBAL: stochastic functional (Lisp-like) language [Pfeffer 2001]
Stochastic Prolog:
– Probabilistic Horn abduction [Poole 1993]
– PRISM [Sato & Kameya 1997]

42 RPMs Based on Undirected Graphical Models
Relational Markov networks [Taskar et al. 2002]
Markov logic networks (MLNs) [Richardson & Domingos 2006]
Benefits compared to directed models:
– Don't have to worry about acyclicity
– Specify weights on features, not full CPDs
Drawbacks:
– Feature weights are harder to interpret than CPDs
– Parameters must be estimated jointly

43 Summary
Modeling unknown objects is essential
BLOG models define probability distributions over possible worlds with:
– varying sets of objects
– varying mappings from observations to objects
MCMC can provide effective inference for models of this kind

44 References
Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005). "BLOG: Probabilistic models with unknown objects". In Proc. 19th Int'l Joint Conf. on AI.
Milch, B., Marthi, B., Sontag, D., Russell, S., Ong, D. L., and Kolobov, A. (2005). "Approximate inference for infinite contingent Bayesian networks". In Proc. 10th Int'l Workshop on AI and Statistics.
Milch, B. and Russell, S. (2006). "General-purpose MCMC inference over relational structures". In Proc. 22nd Conf. on Uncertainty in AI.
Pasula, H., Marthi, B., Milch, B., Russell, S., and Shpitser, I. (2003). "Identity uncertainty and citation matching". In Advances in Neural Information Processing Systems 15, MIT Press.
Lawrence, S., Giles, C. L., and Bollacker, K. D. (1999). "Autonomous citation matching". In Proc. 3rd Int'l Conf. on Autonomous Agents.
Wellner, B., McCallum, A., Feng, P., and Hay, M. (2004). "An integrated, conditional model of information extraction and coreference with application to citation matching". In Proc. 20th Conf. on Uncertainty in AI.

45 References
Pasula, H., Russell, S. J., Ostland, M., and Ritov, Y. (1999). "Tracking many objects with many sensors". In Proc. 16th Int'l Joint Conf. on AI.
Oh, S., Russell, S., and Sastry, S. (2004). "Markov chain Monte Carlo data association for general multi-target tracking problems". In Proc. 43rd IEEE Conf. on Decision and Control.
Koller, D. and Pfeffer, A. (1998). "Probabilistic frame-based systems". In Proc. 15th AAAI National Conf. on AI.
Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. (1999). "Learning probabilistic relational models". In Proc. 16th Int'l Joint Conf. on AI.
Gilks, W. R., Thomas, A., and Spiegelhalter, D. J. (1994). "A language and program for complex Bayesian modelling". The Statistician 43(1).
Jaeger, M. (2001). "Complex probabilistic modeling with recursive relational Bayesian networks". Annals of Mathematics and AI 32.
Kersting, K. and De Raedt, L. (2001). "Adaptive Bayesian logic programs". In Proc. 11th Int'l Conf. on Inductive Logic Programming.

46 References
Laskey, K. B. and da Costa, P. C. G. (2005). "Of Starships and Klingons: First-order Bayesian logic for the 23rd century". In Proc. 21st Conf. on Uncertainty in AI.
Mjolsness, E. and Yosiphon, G. (to appear). "Stochastic process semantics for dynamical grammars". Annals of Mathematics and AI.
Pfeffer, A. (2001). "IBAL: A probabilistic rational programming language". In Proc. 17th Int'l Joint Conf. on AI.
Poole, D. (1993). "Probabilistic Horn abduction and Bayesian networks". Artificial Intelligence 64(1).
Sato, T. and Kameya, Y. (1997). "PRISM: A symbol-statistical modeling language". In Proc. 15th Int'l Joint Conf. on AI.
Taskar, B., Abbeel, P., and Koller, D. (2002). "Discriminative probabilistic models for relational data". In Proc. 18th Conf. on Uncertainty in AI.
Richardson, M. and Domingos, P. (2006). "Markov logic networks". Machine Learning 62.