Learning Probabilistic Relational Models. Daphne Koller, Stanford University; Nir Friedman, Hebrew University; Lise Getoor, Stanford University; Avi Pfeffer, Stanford University.


Learning Probabilistic Relational Models
Daphne Koller (Stanford University), Nir Friedman (Hebrew University), Lise Getoor (Stanford University), Avi Pfeffer (Stanford University)

Learning from Relational Data

Data sources:
–relational and object-oriented databases
–frame-based knowledge bases
–World Wide Web

Traditional approaches:
–work well with flat representations: fixed-length attribute-value vectors
–assume IID samples

Problem:
–must fix attributes in advance → can represent only some limited set of structures
–the IID assumption may not hold

Our Approach

Probabilistic Relational Models (PRMs):
–rich representation language
–model relational dependencies and probabilistic dependencies

Learning PRMs:
–parameter estimation
–model selection
–from data stored in relational databases

Outline

–Motivation
–Probabilistic relational models; building on probabilistic logic programming [Poole, 1993; Ngo & Haddawy, 1994] and probabilistic object-oriented knowledge [Koller & Pfeffer, 1997; 1998; Koller, Levy & Pfeffer, 1997]
–Learning PRMs
–Experimental results
–Conclusions

Probabilistic Relational Models

Combine advantages of predicate logic and BNs:
–natural domain modeling: objects, properties, relations
–generalization over a variety of situations
–compact, natural probability models

Integrate uncertainty with the relational model:
–properties of domain entities can depend on properties of related entities
–uncertainty over the relational structure of the domain

Relational Schema

Describes the types of objects and relations in the database.
–Classes and their attributes: Professor (Popularity, Teaching-Ability, Stress-Level), Course (Difficulty, Rating), Registration (Grade, Satisfaction), Student (Intelligence, Performance)
–Relationships: Teach (Professor to Course), In (Registration to Course), Take (Student to Registration)
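The schema above can be sketched as plain Python data; this is an illustrative encoding (the `schema` dict and `attributes_of` helper are this sketch's own names, not from the paper):

```python
# A minimal encoding of the university schema: each class maps to its
# descriptive attributes, and each relation names the classes it connects.
schema = {
    "classes": {
        "Professor": ["Popularity", "Teaching-Ability", "Stress-Level"],
        "Course": ["Difficulty", "Rating"],
        "Registration": ["Grade", "Satisfaction"],
        "Student": ["Intelligence", "Performance"],
    },
    "relations": {
        "Teach": ("Professor", "Course"),
        "In": ("Registration", "Course"),
        "Take": ("Student", "Registration"),
    },
}

def attributes_of(cls):
    """Return the descriptive attributes declared for a class."""
    return schema["classes"][cls]
```

A PRM attaches a local probability model to each entry that `attributes_of` returns, which is why the schema, not a fixed attribute vector, is the unit of representation.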

Example Instance I

–Professor: Prof. Gump (Popularity high, Teaching-Ability medium, Stress-Level low)
–Courses: Phil101 (Difficulty low, Rating high); Phil142 (Difficulty low, Rating high)
–Registrations: Reg #5639 (Grade A, Satisfaction 3)
–Students: John Doe (Intelligence high, Performance average); Jane Doe (Intelligence high, Performance average)

What's Uncertain?

–Attribute values: e.g., a student's Intelligence or a course's Rating
–Objects: e.g., whether the instance contains Student Judy Dunn (Intelligence high, Performance high)
–Relations: which students take which registrations, and in which courses

Attribute Uncertainty

Fixed skeleton σ:
–set of objects in each class
–relations between them

Uncertainty over the assignment of values to attributes: in the example, the objects (Prof. Gump, Phil101, Phil142, Reg #5639, John Deer, Jane Doe) and their relations are known, but attribute values such as Intelligence, Difficulty, Grade, and Popularity are unknown.

PRM: Dependencies

[Diagram: the dependency structure at the class level, over Professor (Popularity, Teaching-Ability, Stress-Level), Course (Difficulty, Rating), Registration (Grade, Satisfaction), and Student (Intelligence, Performance)]

PRM: Dependencies (cont.)

[Diagram: the same dependencies unrolled over the example instance; e.g., the unknown Grade of John Deer's registration depends on the attributes of the related student and course objects]

PRM: aggregate dependencies

A student's Performance depends on the Grades of all of that student's registrations; e.g., Jane Doe has Reg #5077 (Grade C, Satisfaction 2), Reg #5054 (Grade C, Satisfaction 1), and Reg #5639 (Grade A, Satisfaction 3).

Problem!!! The number of parent registrations varies from student to student, so we would need CPTs of varying sizes. Solution: aggregate the parent values (e.g., avg).

PRM: aggregate dependencies (cont.)

An aggregate collapses a variable-sized multiset of parent values into a single value: sum, min, max, avg, mode, count.

PRM: Summary

A PRM specifies:
–a probabilistic dependency structure S: a set of parents for each attribute X.A
–a set of local probability models θ

Given a skeleton structure σ, a PRM specifies a probability distribution over instances I, i.e., over the attribute values of all objects in σ.
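The distribution the summary describes can be written out as a product over objects and attributes; a sketch, with notation matching the slides (σ for the skeleton, S for the structure, θ for the parameters):

```latex
P(\mathcal{I} \mid \sigma, S, \theta)
  \;=\; \prod_{X} \; \prod_{x \in \sigma(X)} \; \prod_{A \in \mathcal{A}(X)}
        P\bigl(x.A \mid \mathrm{Pa}(x.A)\bigr)
```

Here σ(X) is the set of objects of class X in the skeleton, 𝒜(X) its attributes, and Pa(x.A) the (possibly aggregated) parent values of x.A determined by S.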

Learning PRMs

[Diagram: the relational schema and a database instance I (Course, Student, Reg tables) feed into the learner, which performs parameter estimation and structure selection]

Parameter estimation in PRMs

Assume a known dependency structure S.
Goal: estimate the PRM parameters θ, the entries in the local probability models.

A parameterization θ is good if it is likely to generate the observed data, instance I.
MLE principle: choose θ* so as to maximize the likelihood l(θ : I, σ).
Crucial property: decomposition, with separate terms for the different attributes X.A.
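The decomposition property can be made explicit; a sketch, term-by-term consistent with the product form of the PRM distribution:

```latex
l(\theta : \mathcal{I}, \sigma)
  \;=\; \log P(\mathcal{I} \mid \sigma, S, \theta)
  \;=\; \sum_{X} \sum_{A \in \mathcal{A}(X)}
        \underbrace{\sum_{x \in \sigma(X)}
          \log P\bigl(x.A \mid \mathrm{Pa}(x.A)\bigr)}_{\text{one term per } X.A}
```

Each inner sum involves only the parameters of the local model for X.A, so each local model can be maximized independently given its counts.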

ML parameter estimation

Count sufficient statistics from the database: e.g., joint counts over Reg.Grade, Course.Difficulty, and Student.Intelligence, obtained by joining the Course, Reg, and Student tables.
DB technology is well suited to the computation of sufficient statistics.
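The "join + count" pattern the slide points to can be sketched with an in-memory SQLite database; table and column names are illustrative, modeled on the running example rather than taken from the paper:

```python
import sqlite3

# Sufficient statistics for a CPD such as
# P(Reg.Grade | Course.Difficulty, Student.Intelligence)
# are joint counts, which a relational DBMS computes with a join + GROUP BY.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Course(id TEXT PRIMARY KEY, difficulty TEXT);
    CREATE TABLE Student(id TEXT PRIMARY KEY, intelligence TEXT);
    CREATE TABLE Reg(id TEXT, student TEXT, course TEXT, grade TEXT);
    INSERT INTO Course VALUES ('Phil101','low'), ('Phil142','low');
    INSERT INTO Student VALUES ('jane','high'), ('john','low');
    INSERT INTO Reg VALUES
      ('r1','jane','Phil101','A'), ('r2','jane','Phil142','A'),
      ('r3','john','Phil101','C');
""")

# counts[(grade, difficulty, intelligence)] = N, the sufficient statistic.
counts = {}
for grade, diff, intel, n in con.execute("""
    SELECT r.grade, c.difficulty, s.intelligence, COUNT(*)
    FROM Reg r JOIN Course c ON r.course = c.id
               JOIN Student s ON r.student = s.id
    GROUP BY r.grade, c.difficulty, s.intelligence
"""):
    counts[(grade, diff, intel)] = n
```

Normalizing these counts within each parent configuration yields the MLE entries of the local probability model, which is why the whole estimation step reduces to standard database queries.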

Model Selection

Idea:
–define a scoring function
–do local search over legal structures

Key components:
–scoring models
–legal models
–searching the model space

Scoring Models

Bayesian approach: the score has a closed-form solution.
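For concreteness, one standard instantiation of such a closed form is the Bayesian-Dirichlet marginal likelihood from Bayesian network learning, applied per attribute; this is a sketch of that standard score, not a formula quoted from the slides:

```latex
\operatorname{score}(S : \mathcal{I}, \sigma)
  \;=\; \log P(S) + \log P(\mathcal{I} \mid S, \sigma),
\qquad
P(\mathcal{I} \mid S, \sigma)
  \;=\; \prod_{X.A} \prod_{u}
        \frac{\Gamma(\alpha_u)}{\Gamma(\alpha_u + N_u)}
        \prod_{v}
        \frac{\Gamma(\alpha_{u,v} + N_{u,v})}{\Gamma(\alpha_{u,v})}
```

Here u ranges over parent configurations of X.A, v over its values, N are the sufficient-statistic counts, and α the Dirichlet hyperparameters (with α_u = Σ_v α_{u,v}, N_u = Σ_v N_{u,v}). The score decomposes per attribute just as the likelihood does, so local search can re-score only the attributes an operator touches.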

Legal Models

Define a dependency ordering over attributes: an edge y.b → x.a whenever X.A depends on Y.B (e.g., Paper.Accepted depending on Researcher.Reputation via the author-of relation).
A PRM defines a coherent probability model over a skeleton σ if this ordering is acyclic.

Guaranteeing Acyclicity

How do we guarantee that a PRM is acyclic for every skeleton?
From the PRM dependency structure S, build a class dependency graph with an edge Y.B → X.A if X.A depends directly on Y.B.
Attribute stratification: if the dependency graph is acyclic, the PRM is acyclic for any skeleton σ.
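The stratification check is an ordinary cycle test on the class dependency graph; a minimal sketch using depth-first search (the `deps` example edges are illustrative, written parent → child as on the slide):

```python
# Test whether the class-level dependency graph (edge Y.B -> X.A when X.A
# depends directly on Y.B) contains a cycle, via three-color DFS.
def has_cycle(edges):
    """edges: dict mapping node -> list of successor nodes."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GREY
        for v in edges.get(u, []):
            c = color.get(v, WHITE)
            if c == GREY:            # back edge: cycle found
                return True
            if c == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(visit(u) for u in list(edges) if color.get(u, WHITE) == WHITE)

# Dependencies loosely modeled on the running example (parent -> child).
deps = {
    "Course.Difficulty": ["Reg.Grade"],
    "Student.Intelligence": ["Reg.Grade", "Student.Performance"],
    "Reg.Grade": ["Reg.Satisfaction"],
}
```

If `has_cycle(deps)` is false the structure is stratified and therefore coherent for every skeleton; the next slides show why this test is stricter than necessary.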

Limitation of stratification

Genetics example: Person.Blood-type depends on Person.M-chromosome and Person.P-chromosome, and a person's chromosomes depend on those of the related Mother and Father objects.
At the class level this puts Person.M-chromosome (and Person.P-chromosome) in a cycle with itself, so stratification rejects the model, even though every concrete family tree is acyclic.

Guaranteed acyclic relations

Prior knowledge: the Father-of relation is acyclic.
–A dependence of Person.A on Person.Father.B therefore cannot induce cycles.

Guaranteeing acyclicity (cont.)

With guaranteed acyclic relations, some cycles in the dependency graph are guaranteed to be safe. We color the edges in the dependency graph:
–yellow: within a single object (X.B → X.A)
–green: via a guaranteed acyclic (g.a.) relation (Y.B → X.A)
–red: via other relations (Y.B → X.A)

A cycle is safe if it has a green edge and no red edge. [Diagram: the colored dependency graph for the genetics example, over Person.M-chromosome, Person.P-chromosome, Person.Blood-type]
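The safety rule itself is a one-line predicate over the edge colors along a cycle; a minimal sketch (the function name and color strings are this sketch's own):

```python
# The slide's rule: a cycle in the colored dependency graph is safe
# iff it contains at least one green edge and no red edge.
#   yellow = edge within a single object
#   green  = edge via a guaranteed-acyclic relation (e.g., Father-of)
#   red    = edge via any other relation
def cycle_is_safe(edge_colors):
    """edge_colors: list of the colors along one cycle."""
    return "green" in edge_colors and "red" not in edge_colors
```

For example, the self-cycle on Person.M-chromosome that defeats plain stratification travels only green (Father-of) edges, so it is classified safe, while a cycle through an arbitrary relation would carry a red edge and be rejected.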

Searching Model Space

Phase 0: consider only dependencies within a class.
–candidate operations over Student, Course, Reg: e.g., Add C.A → C.B; Delete S.I → S.P
–score each candidate structure and apply the best operation

Phased structure search

Phase 1: consider dependencies from "neighboring" classes, via schema relations.
–candidate operations: e.g., Add C.A → R.B; Add S.I → R.C

Phased structure search (cont.)

Phase 2: consider dependencies from "further" classes, via longer relation chains.
–candidate operations: e.g., Add C.A → S.P; Add S.I → C.B
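The greedy loop underlying all three phases can be sketched as follows; the scoring function here is a toy stand-in (it just rewards edges from a fixed pool), whereas the real algorithm would use the Bayesian score:

```python
# Generic greedy local search: repeatedly apply the best score-improving
# operator until no candidate improves the score.
def greedy_search(candidate_ops, score, structure):
    while True:
        best_op, best_gain = None, 0.0
        for op in candidate_ops(structure):
            gain = score(op(structure)) - score(structure)
            if gain > best_gain:
                best_op, best_gain = op, gain
        if best_op is None:
            return structure
        structure = best_op(structure)

# Toy instantiation: structures are edge sets; each operator adds one edge.
POOL = {("C.Difficulty", "R.Grade"), ("S.Intelligence", "R.Grade")}

def candidate_ops(s):
    return [lambda s, e=e: s | {e} for e in POOL - s]

def score(s):
    return len(s & POOL)   # stand-in score, not the Bayesian score

result = greedy_search(candidate_ops, score, frozenset())
```

Phasing simply changes what `candidate_ops` returns: within-class edges in phase 0, edges via single schema relations in phase 1, and edges via longer relation chains in phase 2, so the expensive long-range candidates are only scored once the cheap ones are exhausted.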

Experimental Results: Movie Domain (real data)

11,000 movies, 7,000 actors.
–Actor: Gender
–Appears: Role-Type
–Movie: Process, Decade, Genre

Genetics domain (synthetic data)

–Person: M-chromosome, P-chromosome, Blood-type (with Father and Mother relations)
–Blood-Test: Contaminated, Result

Experimental Results

[Plot: score (median likelihood) vs. dataset size, with the learned models compared against the gold-standard model]

Benefits

–Summarization: a PRM provides a compact model
–Anomaly detection: identify change and deviation
–Interpretability: graphical representation of dependencies
–Dependency modeling: relational + statistical

Future directions

–Learning in complex real-world domains: drug treatment regimes, collaborative filtering
–Missing data
–Learning with structural uncertainty
–Discovery: hidden variables, causal structure, class hierarchy

Conclusions

PRMs are a natural extension of BNs:
–well-founded (probabilistic) semantics
–compact representation of complex models

Powerful learning techniques:
–build on BN learning techniques
–can learn directly from relational data

Parameter estimation: efficient, effective exploitation of DB technology.
Structure identification: builds on well-understood theory; major issues are guaranteeing coherence and search heuristics.