Learning and Structural Uncertainty in Relational Probability Models Brian Milch MIT 9.66 November 29, 2007.

Slides:

Advertisements

Similar presentations

A Tutorial on Learning with Bayesian Networks

Advertisements

CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.

Exact Inference in Bayes Nets

Gibbs sampling in open-universe stochastic languages Nimar S. Arora Rodrigo de Salvo Braz Erik Sudderth Stuart Russell.

Introduction of Probabilistic Reasoning and Bayesian Networks

Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.

The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press IMPRS Summer School 2009, Prof. William H. Press 1 4th IMPRS Astronomy.

CSE 574 – Artificial Intelligence II Statistical Relational Learning Instructor: Pedro Domingos.

Bayesian Reinforcement Learning with Gaussian Processes Huanren Zhang Electrical and Computer Engineering Purdue University.

Machine Learning CUNY Graduate Center Lecture 7b: Sampling.

1 Learning Entity Specific Models Stefan Niculescu Carnegie Mellon University November, 2003.

Statistical Relational Learning for Link Prediction Alexandrin Popescul and Lyle H. Unger Presented by Ron Bjarnason 11 November 2003.

1 Department of Computer Science and Engineering, University of South Carolina Issues for Discussion and Work Jan 2007  Choose meeting time.

5/25/2005EE562 EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005.

CS 188: Artificial Intelligence Spring 2007 Lecture 14: Bayes Nets III 3/1/2007 Srini Narayanan – ICSI and UC Berkeley.

BLOG: Probabilistic Models with Unknown Objects Brian Milch CS /6/04 Joint work with Bhaskara Marthi, David Sontag, Daniel Ong, Andrey Kolobov, and.

Computer vision: models, learning and inference Chapter 10 Graphical Models.

CSE 590ST Statistical Methods in Computer Science Instructor: Pedro Domingos.

CIS 410/510 Probabilistic Methods for Artificial Intelligence Instructor: Daniel Lowd.

1 Bayesian Networks Chapter ; 14.4 CS 63 Adapted from slides by Tim Finin and Marie desJardins. Some material borrowed from Lise Getoor.

Statistical Relational Learning Pedro Domingos Dept. Computer Science & Eng. University of Washington.

CSE 515 Statistical Methods in Computer Science Instructor: Pedro Domingos.

Topics Combining probability and first-order logic –BLOG and DBLOG Learning very complex behaviors –ALisp: hierarchical RL with partial programs State.

Bayes Factor Based on Han and Carlin (2001, JASA).

Stochastic Algorithms Some of the fastest known algorithms for certain tasks rely on chance Stochastic/Randomized Algorithms Two common variations – Monte.

Relational Probability Models Brian Milch MIT 9.66 November 27, 2007.

First-Order Probabilistic Languages: Into the Unknown Brian Milch and Stuart Russell University of California at Berkeley, USA August 27, 2006 Based on.

A Logical Formulation of PRMs. Example Institute(InstId,Type) Researcher(RID,Area,Salary,InstID) Paper(PaperId,Topic) Author(RId,PaperID) Cites(PaperId1,PaperId2)

1 First-Order Probabilistic Models Brian Milch 9.66: Computational Cognitive Science December 7, 2006.

Machine Learning Lecture 23: Statistical Estimation with Sampling Iain Murray’s MLSS lecture on videolectures.net:

IJCAI 2003 Workshop on Learning Statistical Models from Relational Data First-Order Probabilistic Models for Information Extraction Advisor: Hsin-His Chen.

Representational and inferential foundations for possible large-scale information extraction and question-answering from the web Stuart Russell Computer.

Markov Logic And other SRL Approaches

Unknown Objects and BLOG Brian Milch MIT IPAM Summer School July 16, 2007.

1 CMSC 671 Fall 2001 Class #25-26 – Tuesday, November 27 / Thursday, November 29.

Bayes’ Nets: Sampling [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available.

Learning the Structure of Related Tasks Presented by Lihan He Machine Learning Reading Group Duke University 02/03/2006 A. Niculescu-Mizil, R. Caruana.

Learning With Bayesian Networks Markus Kalisch ETH Zürich.

4. Particle Filtering For DBLOG PF, regular BLOG inference in each particle Open-Universe State Estimation with DBLOG Rodrigo de Salvo Braz*, Erik Sudderth,

The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)

BLOG: Probabilistic Models with Unknown Objects Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov University of.

CPSC 322, Lecture 33Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 33 Nov, 30, 2015 Slide source: from David Page (MIT) (which were.

Bayesian networks and their application in circuit reliability estimation Erin Taylor.

Markov Chain Monte Carlo for LDA C. Andrieu, N. D. Freitas, and A. Doucet, An Introduction to MCMC for Machine Learning, R. M. Neal, Probabilistic.

Lecture #9: Introduction to Markov Chain Monte Carlo, part 3

Inference on Relational Models Using Markov Chain Monte Carlo Brian Milch Massachusetts Institute of Technology UAI Tutorial July 19, 2007.

Exact Inference in Bayes Nets. Notation U: set of nodes in a graph X i : random variable associated with node i π i : parents of node i Joint probability:

Learning and Acting with Bayes Nets Chapter 20.. Page 2 === A Network and a Training Data.

Learning Statistical Models From Relational Data Lise Getoor University of Maryland, College Park Includes work done by: Nir Friedman, Hebrew U. Daphne.

Representational and inferential foundations for possible large-scale information extraction and question-answering from the web Stuart Russell Computer.

1 CMSC 671 Fall 2001 Class #20 – Thursday, November 8.

1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.

1 Scalable Probabilistic Databases with Factor Graphs and MCMC Michael Wick, Andrew McCallum, and Gerome Miklau VLDB 2010.

04/21/2005 CS673 1 Being Bayesian About Network Structure A Bayesian Approach to Structure Discovery in Bayesian Networks Nir Friedman and Daphne Koller.

BLOG: Probabilistic Models with Unknown Objects Brian Milch Harvard CS 282 November 29,

CIS750 – Seminar in Advanced Topics in Computer Science Advanced topics in databases – Multimedia Databases V. Megalooikonomou Link mining ( based on slides.

Probabilistic Reasoning Inference and Relational Bayesian Networks.

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12

Uncertainty in an Unknown World

CSE 515 Statistical Methods in Computer Science

Relational Probability Models

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12

Instructors: Fei Fang (This Lecture) and Dave Touretzky

Machine learning, probabilistic modelling

CS 188: Artificial Intelligence

Class #19 – Tuesday, November 3

Gibbs sampling in open-universe stochastic languages

Statistical Relational AI

The open universe Stuart Russell

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12

Presentation transcript:

Learning and Structural Uncertainty in Relational Probability Models Brian Milch MIT 9.66 November 29, 2007

Outline Learning relational probability models Structural uncertainty – Uncertainty about relations – Uncertainty about object existence and identity Applications of BLOG

3 Review: Relational Probability Models Abstract probabilistic model for attributes Relational skeleton: objects & relations Graphical model

4 Reivew: Dependency Statements Specialty(r) ~TabularCPD[[0.5, 0.3, 0.2]]; Topic(p) ~ TabularCPD[[0.90, 0.01, 0.09], [0.02, 0.85, 0.13], [0.10, 0.10, 0.80]] (Specialty(FirstAuthor(p))); WordAt(wp) ~ TabularCPD[[0.03,..., 0.02, 0.001,...], [0.03,..., 0.001, 0.02,...], [0.03,..., 0.003, 0.003,...]] (Topic(Doc(wp))); BNsRLTheory BNsRLTheory | BNs | RL | Theory the Bayesian reinforcement | BNs | RL | Theory

5 Learning Assume types, functions are given Straightforward task: given structure, learn parameters – Just like in BNs, but parameters are shared across variables for same function, e.g., Topic(Smith98a), Topic(Jones00), etc. Harder: learn abstract dependency structure

6 Structure Learning for BNs Find BN structure M that maximizes Greedy local search over structures – Operators: add, delete, reverse edges – Exclude cyclic structures

7 Logical Structure Learning In RPM, want logical specification of each node ’ s parent set Deterministic analogue: inductive logic programming (ILP) [Dzeroski & Lavrac 2001; Flach and Lavrac 2002] Classic work on RPMs by Friedman, Getoor, Koller & Pfeffer [1999] – We ’ ll call their models FGKP models (they call them “ probabilistic relational models ” (PRMs))

8 FGKP Models Each dependency statement has form: where s 1,...,s k are slot chains Slot chains – Basically logical terms: Specialty(FirstAuthor(p)) – But can also treat predicates as “ multi-valued functions ” : Specialty(AuthorOf(p)) Func(x) ~ TabularCPD[...](s 1,..., s k ) Smith&Jones01 Smith Jones BNs RL AuthorOf Specialty aggregate

9 Structure Learning for FGKP Models Greedy search again – But add or remove whole slot chains – Start with chains of length 1, then 2, etc. – Check for acyclicity using symbol graph

Outline Learning relational probability models Structural uncertainty – Uncertainty about relations – Uncertainty about object existence and identity Applications of BLOG

11 Relational Uncertainty: Example Questions: Who will review my paper, and what will its average review score be? Reviews AuthorOf Reviews Specialty: RL Generosity: 2.9 Specialty: Prob. Models Generosity: 2.2 Topic: RL AvgScore: ? Topic: RL AvgScore: ? Topic: Prob Models AvgScore: ? Reviews Specialty: Theory Generosity: 1.8

12 Simplest Approach to Relational Uncertainty Add predicate Reviews(r, p) Can model this with existing syntax: Potential drawback: – Reviews(r, p) nodes are independent given specialties and topics – Expected number of reviews per paper grows with number of researchers in skeleton [Getoor et al., JMLR 2002] Reviews(r, p) ~ ReviewCPD(Specialty(r), Topic(p));

13 Another Approach: Reference Uncertainty Say each paper gets k reviews – Can add Review objects to skeleton – For each paper p, include k review objects rev with PaperReviewed(rev) = p Uncertain about values of function Reviewer(rev) [Getoor et al., JMLR 2002] PaperReviewed ? ? ? Reviewer

14 Models for Reviewer(rev) Explicit distribution over researchers? – No: won ’ t generalize across skeletons Selection models: – Uniform sampling from researchers with certain attribute values [Getoor et al., JMLR 2002] – Weighted sampling, with weights determined by attributes [Pasula et al., IJCAI 2001]

15 Choosing Reviewer Based on Specialty ReviewerSpecialty(rev) ~ SpecSelectionCPD (Topic(PaperReviewed(rev))); Reviewer(rev) ~ Uniform({Researcher r : Specialty(r) = ReviewerSpecialty(rev)});

16 Context-Specific Dependencies RevScore(rev) ~ ScoreCPD(Generosity(Reviewer(rev))); AvgScore(p) = Mean({RevScore(rev) for Review rev : PaperReviewed(Rev) = p}); random object Consequence of relational uncertainty: dependencies become context-specific – RevScore(Rev1) depends on Generosity(R1) only when Reviewer(Rev1) = R1

17 Semantics: Ground BN Can still define ground BN Parents of node X are all basic RVs whose values are potentially relevant in evaluating the right hand side of X ’ s dependency statement Example: for RevScore(Rev1) … – Reviewer(Rev1) is always relevant – Generosity(R) might be relevant for any researcher R RevScore(rev) ~ ScoreCPD(Generosity(Reviewer(rev)));

18 Ground BN Topic(P1) Specialty(R1) Specialty(R2) Specialty(R3) Generosity(R1) Generosity(R2) Generosity(R3) RevScore(Rev2) RevScore(Rev1) Reviewer(Rev2) Reviewer(Rev1) RevSpecialty(Rev2)RevSpecialty(Rev1)

19 Inference Can still use ground BN, but it ’ s often very highly connected Alternative: MCMC over possible worlds [Pasula & Russell, IJCAI 2001] – In each world, only certain dependencies are active

20 Metropolis-Hastings MCMC Metropolis-Hastings process: in world , – sample new world  from proposal distribution q(  |  ) – accept proposal with probability otherwise remain in  Stationary distribution is p(  )

21 Computing Acceptance Ratio Efficiently World probability is where pa  (X) is inst. of X ’ s active parents in  If proposal changes only X, then all factors not containing X cancel in p(  ) and p(  ) [Pasula et al., IJCAI 2001] Result: Time to compute acceptance ratio often doesn ’ t depend on number of objects

22 Learning Models for Relations Binary predicate approach: – Use existing search over slot chains Selecting based on attributes – Search over sets of attributes to look at – Search over parent slot chains for choosing attribute values Reviews(r, p) ~ ReviewCPD(Specialty(r), Topic(p)); ReviewerSpecialty(rev) ~ SpecSelectionCPD (Topic(PaperReviewed(rev))); Reviewer(rev) ~ Uniform({Researcher r : Specialty(r) = ReviewerSpecialty(rev)}); [Getoor et al., JMLR 2002]

Outline Learning relational probability models Structural uncertainty – Uncertainty about relations – Uncertainty about object existence and identity Applications of BLOG

24 S. Russel and P. Norvig (1995). Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Prentice Hall. Example 1: Bibliographies Russell, Stuart and Norvig, Peter. Articial Intelligence. Prentice-Hall, Title: … Name: … PubCited AuthorOf

25 Example 2: Aircraft Tracking Detection Failure

26 Example 2: Aircraft Tracking False Detection Unobserved Object

27 Handling Unknown Objects Fundamental task: given observations, make inferences about initially unknown objects But most RPM languages assume set of objects is fixed and known (Herbrand models) Bayesian logic (BLOG) lifts this assumption [Milch et al., IJCAI See also MEBN: Laskey & da Costa, UAI 2005; Dynamical Grammars: Mjolsness & Yosiphon, AMAI to appear]

28 Possible Worlds How can we define a distribution over such outcomes? (not showing attribute values)

29 Generative Process Imagine process that constructs worlds using two kinds of steps – Add some objects to the world – Set the value of a function on a tuple of arguments

30 BLOG Model for Citations #Paper ~ NumPapersPrior(); Title(p) ~ TitlePrior(); guaranteed Citation Cit1, Cit2, Cit3, Cit4, Cit5, Cit6, Cit7; PubCited(c) ~ Uniform({Paper p}); Text(c) ~ NoisyCitationGrammar(Title(PubCited(c))); number statement familiar syntax for reference uncertainty part of skeleton: exhaustive list of distinct citations

31 Adding Authors #Researcher ~ NumResearchersPrior(); Name(r) ~ NamePrior(); #Paper ~ NumPapersPrior(); FirstAuthor(p) ~ Uniform({Researcher r}); Title(p) ~ TitlePrior(); PubCited(c) ~ Uniform({Paper p}); Text(c) ~ NoisyCitationGrammar (Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));

32 Generative Process for Aircraft Tracking SkyRadar Existence of radar blips depends on existence and locations of aircraft

33 BLOG Model for Aircraft Tracking … #Aircraft ~ NumAircraftDistrib(); State(a, t) if t = 0 then ~ InitState() else ~ StateTransition(State(a, Pred(t))); #Blip(Source = a, Time = t) ~ NumDetectionsDistrib(State(a, t)); #Blip(Time = t) ~ NumFalseAlarmsDistrib(); ApparentPos(r) if (Source(r) = null) then ~ FalseAlarmDistrib() else ~ ObsDistrib(State(Source(r), Time(r))); 2 Source Time a t Blips 2 Time t Blips

34 Basic Random Variables (RVs) For each number statement and tuple of generating objects, have RV for number of objects generated For each function symbol and tuple of arguments, have RV for function value Lemma: Full instantiation of these RVs uniquely identifies a possible world

35 Each BLOG model defines contingent Bayesian network (CBN) over basic RVs – Edges active only under certain conditions Contingent Bayesian Network [Milch et al., AI/Stats 2005] Title((Pub, 1))Title((Pub, 2))Title((Pub, 3)) … Text(Cit1)PubCited(Cit1) #Pub PubCited(Cit1) = (Pub, 1) PubCited(Cit1) = (Pub, 2) PubCited(Cit1) = (Pub, 3) (Pub, 2) = infinitely many nodes!

36 Probability Distribution Through its CBN, BLOG model specifies: – Conditional distributions for basic RVs – Context-specific independence properties e.g., Text(Cit1) indep of Title((Pub, 1)) given PubCited(Cit1) = (Pub, 3) Theorem: Under certain “context-specific ordering” conditions, every BLOG model fully defines a distribution over possible worlds [Milch et al., IJCAI 2005]

37 Inference with Unknown Objects Does infinite set of basic RVs prevent inference? No: Sampling algorithms only need to instantiate finite set of relevant variables Generic algorithms: – Rejection sampling [Milch et al., IJCAI 2005] – Guided likelihood weighting [Milch et al., AI/Stats 2005] More practical: MCMC over partial worlds

Toward General-Purpose MCMC with Unknown Objects Successful applications of MCMC with domain-specific proposal distributions: – Citation matching [Pasula et al., 2003] – Multi-target tracking [Oh et al., 2004] But each application requires new code for: – Proposing moves – Representing MCMC states – Computing acceptance probabilities Goal: – User specifies model and proposal distribution – General-purpose code does the rest

Proposer for Citations Split-merge moves: – Propose titles and author names for affected publications based on citation strings Other moves change total number of publications [Pasula et al., NIPS 2002]

MCMC States Not complete instantiations! – No titles, author names for uncited publications States are partial instantiations of random variables – Each state corresponds to an event: set of outcomes satisfying description #Pub = 100, PubCited(Cit1) = (Pub, 37), Title((Pub, 37)) = “Calculus”

MCMC over Events Markov chain over events , with stationary distrib. proportional to p(  ) Theorem: Fraction of visited events in Q converges to p(Q|E) if: – Each  is either subset of Q or disjoint from Q – Events form partition of E E Q

Computing Probabilities of Events Engine needs to compute P(  ) / P(  n ) efficiently (without summations) Use instantiations that include all active parents of the variables they instantiate Then probability is product of CPDs:

States That Are Even More Abstract Typical partial instantiation: – Specifies particular publications, even though publications are interchangeable Let states be abstract partial instantiations: There are conditions under which we can compute probabilities of such events #Pub = 100, PubCited(Cit1) = (Pub, 37), Title((Pub, 37)) = “Calculus”, PubCited(Cit2) = (Pub, 14), Title((Pub, 14)) = “Psych”  x  y  x [#Pub = 100, PubCited(Cit1) = x, Title(x) = “Calculus”, PubCited(Cit2) = y, Title(y) = “Psych”]

Outline Learning relational probability models Structural uncertainty – Uncertainty about relations – Uncertainty about object existence and identity Applications of BLOG

45 Citation Matching Elaboration of generative model shown earlier Parameter estimation – Priors for names, titles, citation formats learned offline from labeled data – String corruption parameters learned with Monte Carlo EM Inference – MCMC with split-merge proposals – Guided by “ canopies ” of similar citations – Accuracy stabilizes after ~20 minutes [Pasula et al., NIPS 2002]

46 Citation Matching Results Four data sets of ~ citations, referring to ~ papers

47 Cross-Citation Disambiguation Wauchope, K. Eucalyptus: Integrating Natural Language Input with a Graphical User Interface. NRL Report NRL/FR/ (1994). Is "Eucalyptus" part of the title, or is the author named K. Eucalyptus Wauchope? Kenneth Wauchope (1994). Eucalyptus: Integrating natural language input with a graphical user interface. NRL Report NRL/FR/ , Naval Research Laboratory, Washington, DC, 39pp. Second citation makes it clear how to parse the first one

48 Preliminary Experiments: Information Extraction P(citation text | title, author names) modeled with simple HMM For each paper: recover title, author surnames and given names Fraction whose attributes are recovered perfectly in last MCMC state: – among papers with one citation: 36.1% – among papers with multiple citations: 62.6% Can use inferred knowledge for disambiguation

49 Multi-Object Tracking False Detection Unobserved Object

50 State Estimation for “Aircraft” #Aircraft ~ NumAircraftPrior(); State(a, t) if t = 0 then ~ InitState() else ~ StateTransition(State(a, Pred(t))); #Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t)); #Blip(Time = t) ~ NumFalseAlarmsPrior(); ApparentPos(r) if (Source(r) = null) then ~ FalseAlarmDistrib() else ~ ObsCPD(State(Source(r), Time(r)));

51 Aircraft Entering and Exiting #Aircraft(EntryTime = t) ~ NumAircraftPrior(); Exits(a, t) if InFlight(a, t) then ~ Bernoulli(0.1); InFlight(a, t) if t < EntryTime(a) then = false elseif t = EntryTime(a) then = true else = (InFlight(a, Pred(t)) & !Exits(a, Pred(t))); State(a, t) if t = EntryTime(a) then ~ InitState() elseif InFlight(a, t) then ~ StateTransition(State(a, Pred(t))); #Blip(Source = a, Time = t) if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t)); …plus last two statements from previous slide

52 MCMC for Aircraft Tracking Uses generative model from previous slide (although not with BLOG syntax) Examples of Metropolis-Hastings proposals: [Oh et al., CDC 2004] [Figures by Songhwai Oh]

53 Aircraft Tracking Results [Oh et al., CDC 2004] [Figures by Songhwai Oh] MCMC has smallest error, hardly degrades at all as tracks get dense MCMC is nearly as fast as greedy algorithm; much faster than MHT Estimation Error Running Time

54 BLOG Software Bayesian Logic inference engine available:

55 References Friedman, N., Getoor, L., Koller, D.,and Pfeffer, A. (1999) “ Learning probabilistic relational models ”. In Proc. 16 th Int ’ l Joint Conf. on AI, pages Taskar, B., Segal, E., and Koller, D. (2001) “ Probabilistic classification and clustering in relational data ”. In Proc. 17 th Int ’ l Joint Conf. on AI, pages Getoor, L., Friedman, N., Koller, D., and Taskar, B. (2002) “ Learning probabilistic models of link structure ”. J. Machine Learning Res. 3: Taskar, B., Abbeel, P., and Koller, D. (2002) “ Discriminative probabilistic models for relational data ”. In Proc. 18 th Conf. on Uncertainty in AI, pages Dzeroski, S. and Lavrac, N., eds. (2001) Relational Data Mining. Springer. Flach, P. and Lavrac, N. (2002) “ Learning in Clausal Logic: A Perspective on Inductive Logic Programming ”. In Computational Logic: Logic Programming and Beyond (Essays in Honour of Robert A. Kowalski), Springer Lecture Notes in AI volume 2407, pages Pasula, H. and Russell, S. (2001) “ Approximate inference for first-order probabilistic languages ”. In Proc. 17th Int ’ l Joint Conf. on AI, pages Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005) “ BLOG: Probabilistic Models with Unknown Objects ”. In Proc. 19th Int ’ l Joint Conf. on AI, pages

56 References Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005) “ BLOG: Probabilistic Models with Unknown Objects ”. In Proc. 19th Int ’ l Joint Conf. on AI, pages Milch, B., Marthi, B., Sontag, D., Russell, S., Ong, D. L., and Kolobov, A. (2005) “ Approximate inference for infinite contingent Bayesian networks ”. In Proc. 10 th Int ’ l Workshop on AI and Statistics. Milch, B. and Russell, S. (2006) “ General-purpose MCMC inference over relational structures ”. In Proc. 22 nd Conf. on Uncertainty in AI, pages Pasula, H., Marthi, B., Milch, B., Russell, S., and Shpitser, I. (2003) “ Identity uncertainty and citation matching ”. In Advances in Neural Information Processing Systems 15, MIT Press, pages Lawrence, S., Giles, C. L., and Bollacker, K. D. (1999) “ Autonomous citation matching ”. In Proc. 3 rd Int ’ l Conf. on Autonomous Agents, pages Wellner, B., McCallum, A., Feng, P., and Hay, M. (2004) “ An integrated, conditional model of information extraction and coreference with application to citation matching ”. In Proc. 20 th Conf. on Uncertainty in AI, pages

57 References Pasula, H., Russell, S. J., Ostland, M., and Ritov, Y. (1999) “ Tracking many objects with many sensors ”. In Proc. 16 th Int ’ l Joint Conf. on AI, pages Oh, S., Russell, S. and Sastry, S. (2004) “ Markov chain Monte Carlo data association for general multi-target tracking problems ”. In Proc. 43 rd IEEE Conf. on Decision and Control, pages