Relational Probability Models
Brian Milch, MIT
IPAM Summer School, July 16, 2007
Objects, Attributes, Relations
[Diagram: researchers with Specialty attributes (RL, BNs, Theory), connected by AuthorOf and Reviews relations to papers with Topic attributes (RL, BNs, Theory).]
Specific Scenario
[Diagram: Prof. Smith with AuthorOf links to papers Smith98a and Smith00; each paper has word positions (word 1, word 2, ...) with InDoc links, e.g. a paper beginning "Bayesian networks have become a ...".]
Bayesian Network for This Scenario
[Diagram: BN with node Smith Specialty (= BNs) as parent of Smith98a Topic and Smith00 Topic, which in turn are parents of the word nodes Smith98a word 1, word 2, ... and Smith00 word 1 ("Bayesian"), word 2 ("networks"), word 3 ("have"), word 4 ("become"), ....]
Note: this BN is specific to Smith98a and Smith00.
Humans have abstract knowledge that can be applied to any individuals. How can such knowledge be:
- Represented?
- Learned?
- Used in reasoning?
Outline
- Relational probability models (RPMs)
  - Representation
  - Inference
  - Learning
- Relational uncertainty: extending RPMs with probabilistic models for relations
[Diagram: abstract probabilistic model for attributes + relational skeleton (objects & relations) → graphical model.]
Representation
Have to represent:
- Set of variables
- Dependencies
- Conditional probability distributions (CPDs)
All depend on the relational skeleton.
Many proposed languages; we'll use Bayesian logic (BLOG) [Milch et al. 2005].
Typed First-Order Logic
Objects divided into types: Researcher, Paper, WordPos, Word, Topic
Express attributes and relations with functions and predicates:
- FirstAuthor(paper) → Researcher (non-random)
- Specialty(researcher) → Topic (random)
- Topic(paper) → Topic (random)
- Doc(wordpos) → Paper (non-random)
- WordAt(wordpos) → Word (random)
Set of Random Variables
For random functions, have one variable for each tuple of argument objects.
Objects:
- Researcher: Smith, Jones
- Paper: Smith98a, Smith00, Jones00
- WordPos: Smith98a_1, ..., Smith98a_3212, Smith00_1, ..., Smith00_2774, Jones00_1, ..., Jones00_4893
Variables:
- Specialty: Specialty(Smith), Specialty(Jones)
- Topic: Topic(Smith98a), Topic(Smith00), Topic(Jones00)
- WordAt: WordAt(Smith98a_1), ..., WordAt(Smith98a_3212), WordAt(Smith00_1), ..., WordAt(Smith00_2774), WordAt(Jones00_1), ..., WordAt(Jones00_4893)
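To make this concrete, here is a minimal Python sketch (hypothetical data structures, not BLOG's actual machinery) that enumerates one random variable per random function per tuple of argument objects:

    from itertools import product

    # Hypothetical skeleton: objects grouped by type (word positions truncated).
    skeleton = {
        "Researcher": ["Smith", "Jones"],
        "Paper": ["Smith98a", "Smith00", "Jones00"],
        "WordPos": ["Smith98a_1", "Smith98a_2", "Smith00_1"],
    }

    # Random functions with their argument types.
    random_functions = {
        "Specialty": ["Researcher"],
        "Topic": ["Paper"],
        "WordAt": ["WordPos"],
    }

    # One random variable per function per tuple of argument objects.
    variables = [
        (fname,) + args
        for fname, arg_types in random_functions.items()
        for args in product(*(skeleton[t] for t in arg_types))
    ]
    print(variables)  # [('Specialty', 'Smith'), ('Specialty', 'Jones'), ('Topic', 'Smith98a'), ...]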
Dependency Statements

    Specialty(r) ~ TabularCPD[[0.5, 0.3, 0.2]];
      // distribution over (BNs, RL, Theory)

    Topic(p) ~ TabularCPD[[0.90, 0.01, 0.09],
                          [0.02, 0.85, 0.13],
                          [0.10, 0.10, 0.80]]
               (Specialty(FirstAuthor(p)));
      // rows: Specialty = BNs | RL | Theory; columns: Topic = BNs, RL, Theory
      // Specialty(FirstAuthor(p)) is a logical term identifying the parent node

    WordAt(wp) ~ TabularCPD[[0.03, ..., 0.02,  0.001, ...],
                            [0.03, ..., 0.001, 0.02,  ...],
                            [0.03, ..., 0.003, 0.003, ...]]
                 (Topic(Doc(wp)));
      // rows: Topic = BNs | RL | Theory; columns: words ("the", ..., "Bayesian", "reinforcement", ...)
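As a rough Python reading of the Topic statement above (illustrative only; the word-level CPD is elided), the parent value Specialty(FirstAuthor(p)) selects a row of the table:

    import random

    TOPICS = ["BNs", "RL", "Theory"]
    topic_cpd = {                     # rows copied from the dependency statement above
        "BNs":    [0.90, 0.01, 0.09],
        "RL":     [0.02, 0.85, 0.13],
        "Theory": [0.10, 0.10, 0.80],
    }

    def sample_topic(first_author_specialty):
        # Sample Topic(p) given Specialty(FirstAuthor(p)).
        return random.choices(TOPICS, weights=topic_cpd[first_author_specialty])[0]

    print(sample_topic("BNs"))        # usually 'BNs'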
Conditional Dependencies
Predicting the length of a paper:
- Conference paper: generally equals conference page limit
- Otherwise: depends on verbosity of author
Model this with a conditional dependency statement, using a first-order formula as the condition:

    Length(p)
      if ConfPaper(p) then ~ PageLimitPrior()
      else ~ LengthCPD(Verbosity(FirstAuthor(p)));
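A sketch of how the conditional dependency could be read procedurally; the two distributions below are stand-ins, not the slide's actual PageLimitPrior or LengthCPD:

    import random

    def sample_length(is_conf_paper, author_verbosity):
        if is_conf_paper:
            return random.choice([6, 8, 10])    # stand-in for PageLimitPrior()
        # stand-in for LengthCPD(Verbosity(FirstAuthor(p)))
        return max(1, round(random.gauss(10 * author_verbosity, 3)))

    print(sample_length(True, 1.2), sample_length(False, 1.2))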
Variable Numbers of Parents
What if we allow multiple authors?
- Let the skeleton specify a predicate AuthorOf(r, p)
- Topic(p) now depends on the specialties of multiple authors
- Number of parents depends on the skeleton
Aggregation
Two options when Topic(p) depends on a multiset of values defined by a formula:
- Aggregate distributions: mixture of distributions conditioned on individual elements of the multiset [Taskar et al., IJCAI 2001]:

    Topic(p) ~ TopicAggCPD({Specialty(r) for Researcher r : AuthorOf(r, p)});

- Aggregation functions: collapse the multiset to a single value:

    Topic(p) ~ TopicCPD(Mode({Specialty(r) for Researcher r : AuthorOf(r, p)}));
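A small sketch (reusing the assumed Topic CPD from earlier) contrasting the two options for a paper whose authors have mixed specialties:

    from collections import Counter

    TOPICS = ["BNs", "RL", "Theory"]
    spec_to_topic = {                 # same assumed table as before
        "BNs":    [0.90, 0.01, 0.09],
        "RL":     [0.02, 0.85, 0.13],
        "Theory": [0.10, 0.10, 0.80],
    }
    author_specialties = ["RL", "RL", "Theory"]   # multiset from the skeleton

    # Aggregation function: collapse the multiset to its mode, then apply the CPD.
    mode = Counter(author_specialties).most_common(1)[0][0]
    dist_via_mode = spec_to_topic[mode]

    # Aggregate distribution: uniform mixture of the per-author rows.
    dist_via_mixture = [
        sum(spec_to_topic[s][i] for s in author_specialties) / len(author_specialties)
        for i in range(len(TOPICS))
    ]
    print(dist_via_mode, dist_via_mixture)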
Semantics: Ground BN
[Diagram: skeleton with researchers R1, R2 and papers P1 (3212 words), P2 (2774 words), P3 (4893 words), linked by FirstAuthor; the corresponding ground BN has nodes Spec(R1), Spec(R2) → Topic(P1), Topic(P2), Topic(P3) → word nodes W(P1_1), ..., W(P1_3212), W(P2_1), ..., W(P2_2774), W(P3_1), ..., W(P3_4893).]
When Is Ground BN Acyclic? [Koller & Pfeffer, AAAI 1998]
Look at the symbol graph:
- Node for each random function
- Read off edges from dependency statements
Theorem: If the symbol graph is acyclic, then the ground BN is acyclic for every skeleton.
Symbol graph here: Specialty → Topic → WordAt
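A sketch of the check itself: build the symbol graph from the dependency statements (an edge from each parent function to the child function) and run a depth-first cycle test:

    symbol_graph = {                 # edges read off the dependency statements above
        "Specialty": ["Topic"],      # Topic(p) depends on Specialty(FirstAuthor(p))
        "Topic": ["WordAt"],         # WordAt(wp) depends on Topic(Doc(wp))
        "WordAt": [],
    }

    def is_acyclic(graph):
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {v: WHITE for v in graph}
        def visit(v):
            color[v] = GRAY
            for w in graph[v]:
                if color[w] == GRAY or (color[w] == WHITE and not visit(w)):
                    return False     # found a back edge, i.e. a cycle
            color[v] = BLACK
            return True
        return all(color[v] != WHITE or visit(v) for v in graph)

    print(is_acyclic(symbol_graph))  # True, so the ground BN is acyclic for every skeleton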
Acyclic Relations [Friedman et al., IJCAI 1999]
Suppose a researcher's specialty depends on his/her advisor's specialty:

    Specialty(r)
      if Advisor(r) != null then ~ SpecCPD(Specialty(Advisor(r)))
      else ~ SpecialtyPrior();

The symbol graph now has a self-loop on Specialty!
Fix: require certain non-random functions to be acyclic, i.e. F(x) < x under some partial order. Label symbol-graph edges with "<" and "=" signs; this yields a stronger acyclicity theorem.
Symbol graph: Specialty (self-loop labeled "<") → Topic → WordAt
Inference: Knowledge-Based Model Construction (KBMC) [Breese 1992; Ngo & Haddawy 1997]
Construct only the relevant portion of the ground BN.
[Diagram: skeleton with researchers R1, R2 and papers P1, P2, P3; the constructed BN for query node Topic(P2) (marked "?") contains Spec(R1), Spec(R2), Topic(P1), Topic(P3), and the word nodes W(P1_1), ..., W(P1_3212), W(P3_1), ..., W(P3_4893).]
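A minimal sketch of the construction loop, with assumed helper names: start from the query and evidence variables and pull in parents until the set is closed under the parent relation:

    def construct_network(query_vars, evidence_vars, parents):
        # `parents(v)` returns the ground variables that v's dependency
        # statement mentions, given the skeleton.
        relevant, frontier = set(), list(query_vars) + list(evidence_vars)
        while frontier:
            v = frontier.pop()
            if v not in relevant:
                relevant.add(v)
                frontier.extend(parents(v))
        return relevant

    # Hypothetical fragment: the ancestors of the query Topic(P2).
    parent_map = {"Topic(P2)": ["Spec(R1)"], "Spec(R1)": []}
    print(construct_network(["Topic(P2)"], [], lambda v: parent_map.get(v, [])))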
Inference on Constructed Network
Run a standard BN inference algorithm:
- Exact: variable elimination / junction tree
- Approximate: Gibbs sampling, loopy belief propagation
Can exploit some repeated structure with lifted inference [Pfeffer et al., UAI 1999; Poole, IJCAI 2003; de Salvo Braz et al., IJCAI 2005]
Lifted Inference [Pfeffer et al., UAI 1999; Poole, IJCAI 2003; de Salvo Braz et al., IJCAI 2005]
Suppose:

    Specialty(r) ~ SpecCPD(ThesisTopic(r));

With n researchers, part of the ground BN is ThesisTopic(R1) → Specialty(R1), ..., ThesisTopic(Rn) → Specialty(Rn).
Could sum out the ThesisTopic(R) nodes one by one, but parameter sharing implies:
- Summing the same potential every time
- Obtaining the same potential over Specialty(R) for each R
So we can do the summation once, eliminate the whole family of RVs, and store a "lifted" potential on Specialty(r).
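A toy version of the lifted step with assumed numbers: sum out ThesisTopic once against the shared CPD, yielding a single potential over Specialty(r) that serves every researcher:

    thesis_prior = {"BNs": 0.4, "RL": 0.3, "Theory": 0.3}      # assumed prior
    spec_cpd = {                                                # assumed P(Specialty | ThesisTopic)
        "BNs":    {"BNs": 0.8, "RL": 0.1, "Theory": 0.1},
        "RL":     {"BNs": 0.1, "RL": 0.8, "Theory": 0.1},
        "Theory": {"BNs": 0.1, "RL": 0.1, "Theory": 0.8},
    }

    # One summation, reused for all n researchers instead of repeated n times.
    lifted_potential = {
        s: sum(thesis_prior[t] * spec_cpd[t][s] for t in thesis_prior)
        for s in ["BNs", "RL", "Theory"]
    }
    print(lifted_potential)   # marginal P(Specialty(r) = s), the same for every r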
Learning
Assume types and functions are given.
- Straightforward task: given structure, learn parameters. Just like in BNs, but parameters are shared across variables for the same function, e.g., Topic(Smith98a), Topic(Jones00), etc.
- Harder task: learn the dependency structure.
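A sketch of the shared-parameter estimate on hypothetical fully observed data: counts for the Topic CPD are pooled across all papers, because every Topic(p) variable uses the same parameters:

    from collections import Counter

    # (Specialty(FirstAuthor(p)), Topic(p)) pairs pooled across ALL papers.
    data = [("BNs", "BNs"), ("BNs", "BNs"), ("BNs", "Theory"),
            ("RL", "RL"), ("RL", "RL"), ("Theory", "Theory")]

    pair_counts = Counter(data)
    parent_counts = Counter(spec for spec, _ in data)

    # Shared MLE: P(Topic = t | Specialty = s) = N(s, t) / N(s).
    mle = {(s, t): n / parent_counts[s] for (s, t), n in pair_counts.items()}
    print(mle[("BNs", "BNs")])   # 2/3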
Structure Learning for BNs
Find the BN structure M that maximizes the posterior score P(M | data) ∝ P(data | M) P(M).
Greedy local search over structures:
- Operators: add, delete, reverse edges
- Exclude cyclic structures
Logical Structure Learning
In an RPM, we want a logical specification of each node's parent set.
- Deterministic analogue: inductive logic programming (ILP) [Dzeroski & Lavrac 2001; Flach & Lavrac 2002]
- Classic work on RPMs by Friedman, Getoor, Koller & Pfeffer [1999]. We'll call their models FGKP models (they call them "probabilistic relational models", or PRMs).
FGKP Models
Each dependency statement has the form:

    Func(x) ~ TabularCPD[...](s1, ..., sk)

where s1, ..., sk are slot chains.
Slot chains are basically logical terms, e.g. Specialty(FirstAuthor(p)). But a predicate can also be treated as a "multi-valued function": Specialty(AuthorOf(p)).
[Diagram: for paper Smith&Jones01, AuthorOf leads to Smith (Specialty BNs) and Jones (Specialty RL); the resulting multiset of specialties is aggregated.]
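A sketch of slot-chain evaluation on toy data: a chain of single-valued functions composes directly, while a predicate used as a multi-valued function yields a multiset to be aggregated:

    first_author = {"Smith98a": "Smith"}
    specialty = {"Smith": "BNs", "Jones": "RL"}
    author_of = [("Smith", "Smith&Jones01"), ("Jones", "Smith&Jones01")]

    # Single-valued slot chain: Specialty(FirstAuthor(p))
    print(specialty[first_author["Smith98a"]])                       # 'BNs'

    # Predicate as multi-valued function: Specialty(AuthorOf(p))
    p = "Smith&Jones01"
    print([specialty[r] for (r, paper) in author_of if paper == p])  # ['BNs', 'RL'], then aggregate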
Structure Learning for FGKP Models
Greedy search again, but add or remove whole slot chains:
- Start with chains of length 1, then 2, etc.
- Check for acyclicity using the symbol graph
Outline
- Relational probability models (RPMs): Representation, Inference, Learning
- Relational uncertainty: extending RPMs with probabilistic models for relations
[Diagram: abstract probabilistic model for attributes + relational skeleton (objects & relations) → graphical model.]
Relational Uncertainty: Example
[Diagram: researchers with Specialty and Generosity attributes (RL, 2.9; Prob. Models, 2.2; Theory, 1.8), connected by AuthorOf and Reviews relations to papers with Topic and AvgScore attributes (RL, ?; RL, ?; Prob Models, ?).]
Questions: Who will review my paper, and what will its average review score be?
Possible Worlds
[Diagram: several possible worlds, differing in which researchers review which papers and in the resulting review scores.]
Simplest Approach to Relational Uncertainty [Getoor et al., JMLR 2002]
Add a predicate Reviews(r, p). Can model this with existing syntax:

    Reviews(r, p) ~ ReviewCPD(Specialty(r), Topic(p));

Potential drawback:
- The Reviews(r, p) nodes are independent given specialties and topics
- The expected number of reviews per paper grows with the number of researchers in the skeleton
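The growth claim in one line of arithmetic: if each Reviews(r, p) is true with some fixed probability, the expected review count is a sum of n such probabilities:

    p_review = 0.05                  # assumed CPD entry
    for n in (10, 100, 1000):
        # E[#reviews of p] = sum over researchers r of P(Reviews(r, p) = true)
        print(n, "researchers ->", p_review * n, "expected reviews per paper")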
Another Approach: Reference Uncertainty [Getoor et al., JMLR 2002]
Say each paper gets k reviews.
- Add Review objects to the skeleton: for each paper p, include k review objects rev with PaperReviewed(rev) = p
- Be uncertain about the values of the function Reviewer(rev)
[Diagram: review objects with known PaperReviewed links and unknown Reviewer links, marked "?".]
Models for Reviewer(rev)
Explicit distribution over researchers? No: won't generalize across skeletons.
Selection models:
- Uniform sampling from researchers with certain attribute values [Getoor et al., JMLR 2002]
- Weighted sampling, with weights determined by attributes [Pasula et al., IJCAI 2001]
Examples of Reference Uncertainty
Choosing based on the Specialty attribute:

    ReviewerSpecialty(rev) ~ SpecSelectionCPD(Topic(PaperReviewed(rev)));
    Reviewer(rev) ~ Uniform({Researcher r : Specialty(r) = ReviewerSpecialty(rev)});

Choosing by weighted sampling (note the set of pairs as the CPD argument):

    Weight(rev, r) = CompatibilityWeight(Topic(PaperReviewed(rev)), Specialty(r));
    Reviewer(rev) ~ WeightedSample({(r, Weight(rev, r)) for Researcher r});
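A sketch of both selection models in Python; the compatibility weight is a stand-in for whatever attribute-based weight function the model uses:

    import random

    specialty = {"R1": "RL", "R2": "RL", "R3": "Theory"}

    def uniform_reviewer(wanted_specialty):
        # Uniform({Researcher r : Specialty(r) = wanted_specialty})
        pool = [r for r, s in specialty.items() if s == wanted_specialty]
        return random.choice(pool)

    def weighted_reviewer(paper_topic, compatibility_weight):
        # WeightedSample({(r, Weight(rev, r)) for Researcher r})
        researchers = list(specialty)
        weights = [compatibility_weight(paper_topic, specialty[r]) for r in researchers]
        return random.choices(researchers, weights=weights)[0]

    print(uniform_reviewer("RL"))
    print(weighted_reviewer("RL", lambda topic, spec: 2.0 if topic == spec else 0.5))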
Context-Specific Dependencies
A consequence of relational uncertainty: dependencies become context-specific.

    RevScore(rev) ~ ScoreCPD(Generosity(Reviewer(rev)));
      // Reviewer(rev) is a random object
    AvgScore(p) = Mean({RevScore(rev) for Review rev : PaperReviewed(rev) = p});

RevScore(Rev1) depends on Generosity(R1) only when Reviewer(Rev1) = R1.
Semantics: Ground BN
Can still define a ground BN. The parents of node X are all basic RVs whose values are potentially relevant in evaluating the right-hand side of X's dependency statement.
Example, for RevScore(Rev1) with

    RevScore(rev) ~ ScoreCPD(Generosity(Reviewer(rev)));

- Reviewer(Rev1) is always relevant
- Generosity(R) might be relevant for any researcher R
Ground BN
[Diagram: Topic(P1) is a parent of RevSpecialty(Rev1) and RevSpecialty(Rev2), which feed Reviewer(Rev1) and Reviewer(Rev2); RevScore(Rev1) and RevScore(Rev2) have as parents the Reviewer nodes and all of Generosity(R1), Generosity(R2), Generosity(R3).]
Inference
Can still use the ground BN, but it's often very highly connected.
Alternative: Markov chain over possible worlds [Pasula & Russell, IJCAI 2001]. In each world, only certain dependencies are active.
Markov Chain Monte Carlo (MCMC)
Run a Markov chain σ1, σ2, ... over the worlds in E (the worlds consistent with the evidence), designed so that its unique stationary distribution is proportional to p(σ).
Then the fraction of σ1, σ2, ..., σN that fall in the query event Q converges to P(Q | E) as N → ∞.
[Diagram: sample path moving through the set E of evidence-consistent worlds, with the query region Q inside E.]
Metropolis-Hastings MCMC
Metropolis-Hastings process: in world σ,
- Sample a new world σ′ from a proposal distribution q(σ′ | σ)
- Accept the proposal with probability min(1, p(σ′) q(σ | σ′) / (p(σ) q(σ′ | σ)))
- Otherwise remain in σ
Stationary distribution is proportional to p(σ).
Computing Acceptance Ratio Efficiently [Pasula et al., IJCAI 2001]
The world probability factors as p(σ) = ∏_X P(σ(X) | pa(X)), where pa(X) is the instantiation of X's active parents in σ.
If the proposal changes only X, then all factors not containing X cancel in the ratio p(σ′) / p(σ).
Result: the time to compute the acceptance ratio often doesn't depend on the number of objects.
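A sketch of one such Metropolis-Hastings step, with assumed interfaces: because only the changed variable's own CPD factor and those of its children differ between σ and σ′, only those factors are recomputed:

    import random

    def mh_step(world, propose, cpd_factor, children):
        # propose(world) -> (var, new_value, q_ratio), q_ratio = q(old|new) / q(new|old)
        # cpd_factor(v, world) -> P(world[v] | active parents of v in world)
        # children(var) -> variables whose dependency statements mention var
        var, new_value, q_ratio = propose(world)
        touched = [var] + list(children(var))

        new_world = dict(world)
        new_world[var] = new_value

        old_prob = new_prob = 1.0
        for v in touched:            # all other factors cancel in the ratio
            old_prob *= cpd_factor(v, world)
            new_prob *= cpd_factor(v, new_world)

        if random.random() < min(1.0, (new_prob / old_prob) * q_ratio):
            return new_world
        return world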
Learning Models for Relations [Getoor et al., JMLR 2002]
Binary predicate approach: use the existing search over slot chains:

    Reviews(r, p) ~ ReviewCPD(Specialty(r), Topic(p));

Selecting based on attributes:
- Search over which sets of attributes to look at
- Search over parent slot chains for choosing attribute values

    ReviewerSpecialty(rev) ~ SpecSelectionCPD(Topic(PaperReviewed(rev)));
    Reviewer(rev) ~ Uniform({Researcher r : Specialty(r) = ReviewerSpecialty(rev)});
Summary
- Human knowledge is more abstract than basic graphical models
- Relational probability models: logic-based representation; structure learning by search over slot chains; inference by KBMC
- Relational uncertainty: a natural extension to the logic-based representation; approximate inference by MCMC
References
- Wellman, M. P., Breese, J. S., and Goldman, R. P. (1992) "From knowledge bases to decision models". Knowledge Engineering Review 7:35-53.
- Breese, J. S. (1992) "Construction of belief and decision networks". Computational Intelligence 8(4):624-647.
- Ngo, L. and Haddawy, P. (1997) "Answering queries from context-sensitive probabilistic knowledge bases". Theoretical Computer Science 171(1-2):147-177.
- Koller, D. and Pfeffer, A. (1998) "Probabilistic frame-based systems". In Proc. 15th AAAI National Conf. on AI, pages 580-587.
- Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. (1999) "Learning probabilistic relational models". In Proc. 16th Int'l Joint Conf. on AI, pages 1300-1307.
- Pfeffer, A., Koller, D., Milch, B., and Takusagawa, K. T. (1999) "SPOOK: A system for probabilistic object-oriented knowledge". In Proc. 15th Conf. on Uncertainty in AI, pages 541-550.
- Taskar, B., Segal, E., and Koller, D. (2001) "Probabilistic classification and clustering in relational data". In Proc. 17th Int'l Joint Conf. on AI, pages 870-878.
- Getoor, L., Friedman, N., Koller, D., and Taskar, B. (2002) "Learning probabilistic models of link structure". J. Machine Learning Res. 3:679-707.
- Taskar, B., Abbeel, P., and Koller, D. (2002) "Discriminative probabilistic models for relational data". In Proc. 18th Conf. on Uncertainty in AI, pages 485-492.
References (continued)
- Poole, D. (2003) "First-order probabilistic inference". In Proc. 18th Int'l Joint Conf. on AI, pages 985-991.
- de Salvo Braz, R., Amir, E., and Roth, D. (2005) "Lifted first-order probabilistic inference". In Proc. 19th Int'l Joint Conf. on AI, pages 1319-1325.
- Dzeroski, S. and Lavrac, N., eds. (2001) Relational Data Mining. Springer.
- Flach, P. and Lavrac, N. (2002) "Learning in clausal logic: A perspective on inductive logic programming". In Computational Logic: Logic Programming and Beyond (Essays in Honour of Robert A. Kowalski), Springer Lecture Notes in AI volume 2407, pages 437-471.
- Pasula, H. and Russell, S. (2001) "Approximate inference for first-order probabilistic languages". In Proc. 17th Int'l Joint Conf. on AI, pages 741-748.
- Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005) "BLOG: Probabilistic models with unknown objects". In Proc. 19th Int'l Joint Conf. on AI, pages 1352-1359.