Learning Probabilistic Relational Models
Daphne Koller (Stanford University), Nir Friedman (Hebrew University), Lise Getoor (Stanford University), Avi Pfeffer (Stanford University)
Learning from Relational Data
Data sources
–relational and object-oriented databases
–frame-based knowledge bases
–World Wide Web
Traditional approaches
–work well with flat representations
–fixed-length attribute-value vectors
–assume IID samples
Problem:
–must fix attributes in advance, so can represent only a limited set of structures
–IID assumption may not hold
Our Approach
Probabilistic Relational Models (PRMs)
–rich representation language that models both relational dependencies and probabilistic dependencies
Learning PRMs
–parameter estimation
–model selection
from data stored in relational databases
Outline
Motivation
Probabilistic relational models
–Probabilistic Logic Programming [Poole 1993; Ngo & Haddawy 1994]
–Probabilistic object-oriented knowledge [Koller & Pfeffer 1997, 1998; Koller, Levy & Pfeffer 1997]
Learning PRMs
Experimental results
Conclusions
Probabilistic Relational Models
Combine advantages of predicate logic & BNs:
–natural domain modeling: objects, properties, relations;
–generalization over a variety of situations;
–compact, natural probability models.
Integrate uncertainty with relational model:
–properties of domain entities can depend on properties of related entities;
–uncertainty over relational structure of domain.
Relational Schema
Describes the types of objects and relations in the database
Classes and their attributes:
–Student: Intelligence, Performance
–Registration: Grade, Satisfaction
–Course: Difficulty, Rating
–Professor: Popularity, Teaching-Ability, Stress-Level
Relationships: Teach, In, Take
Example instance I
–Professor Prof. Gump: Popularity high, Teaching Ability medium, Stress-Level low
–Course Phil142: Difficulty low, Rating high
–Course Phil101: Difficulty low, Rating high
–Reg #5639: Grade A, Satisfaction 3 (this label is repeated on all three registration objects in the figure)
–Student John Doe: Intelligence high, Performance average
–Student Jane Doe: Intelligence high, Performance average
What’s Uncertain?
–Attribute values
–Relations
–Objects
[Figure: the example instance I (Prof. Gump, Phil142, Phil101, three registrations, John Doe, Jane Doe) plus a new object, Student Judy Dunn: Intelligence high, Performance high]
Attribute Uncertainty
Fixed skeleton
–set of objects in each class
–relations between them
Uncertainty
–over assignments of values to attributes
[Figure: the example instance with a new Student John Deer and most attribute values replaced by ???; two registrations keep Grade A, Satisfaction 3]
PRM: Dependencies
[Figure: class-level dependency structure over Student (Intelligence, Performance), Reg (Grade, Satisfaction), Course (Difficulty, Rating), Professor (Popularity, Teaching-Ability, Stress-Level)]
PRM: Dependencies (cont.)
[Figure: the example instance with Student John Deer (Intelligence low, Performance average) added and two registrations whose Grade is shown as ?]
PRM: aggregate dependencies
[Figure: Student Jane Doe (Intelligence high, Performance average) with three registrations: Reg #5077 (Grade C, Satisfaction 2), Reg #5054 (Grade C, Satisfaction 1), Reg #5639 (Grade A, Satisfaction 3)]
Problem!!! Need CPTs of varying sizes
Aggregate shown in the figure: avg
PRM: aggregate dependencies
[Figure: dependency structure over Student, Reg, Course, Professor, with avg and count nodes on dependencies that cross one-to-many relations]
Aggregates: sum, min, max, avg, mode, count
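The aggregation idea can be sketched in a few lines of Python: a student's multiset of grades, whose size differs from student to student, is collapsed into a single value so that the CPT of the dependent attribute has a fixed size. The numeric grade encoding and the function name `aggregate` are illustrative assumptions, not something the slides specify.

```python
from collections import Counter
from statistics import mean

# Assumed numeric encoding of grades, for illustration only.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2}

def aggregate(values, how):
    """Collapse a non-empty multiset of parent values into one value."""
    if how == "mode":
        # Most common value; ties go to the value seen first.
        return Counter(values).most_common(1)[0][0]
    if how == "count":
        return len(values)
    numeric = [GRADE_POINTS[v] for v in values]
    funcs = {"sum": sum, "min": min, "max": max, "avg": mean}
    return funcs[how](numeric)

# Grades of Jane Doe's three registrations from the slide.
jane_grades = ["C", "C", "A"]
summary = {how: aggregate(jane_grades, how)
           for how in ("sum", "min", "max", "avg", "mode", "count")}
```

Whatever the number of registrations, each aggregate yields exactly one parent value, which is what lets a single CPT serve every student.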
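PRM: Summary
A PRM specifies
–a probabilistic dependency structure S: a set of parents for each attribute X.A
–a set of local probability models θ
Given a skeleton structure σ, a PRM specifies a probability distribution over instances I:
–over attribute values of all objects in σ
$$P(I \mid \sigma, S, \theta) = \prod_{X} \prod_{x \in \sigma(X)} \prod_{A \in \mathcal{A}(X)} P\left(I_{x.A} \mid I_{\mathrm{Pa}(x.A)}\right)$$
The products range over classes X, objects x of class X in the skeleton, and attributes A of X; I_{x.A} is the value of attribute A in object x.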
Learning PRMs
[Figure: a relational schema and a database (instance I over Course, Student, Reg) are the input; the output is a PRM over Course, Student, Reg]
–Parameter estimation
–Structure selection
Parameter estimation in PRMs
Assume known dependency structure S
Goal: estimate PRM parameters θ
–entries in local probability models
A parameterization θ is good if it is likely to generate the observed data, instance I.
MLE Principle: choose θ so as to maximize the log-likelihood l(θ) = log P(I | σ, S, θ), where σ is the skeleton
Crucial property: decomposition
–separate terms for different X.A
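The decomposition can be written out explicitly. This is a sketch in notation the slides do not fix: it assumes multinomial (table) local models, with σ the skeleton, $N_{X.A}[v, u]$ an assumed symbol for the number of objects x of class X in instance I with x.A = v and parent values u, and $\theta_{X.A}(v \mid u)$ the matching entry of the local probability model.

```latex
l(\theta) = \log P(I \mid \sigma, S, \theta)
          = \sum_{X.A} \sum_{u} \sum_{v} N_{X.A}[v, u] \, \log \theta_{X.A}(v \mid u)

\hat{\theta}_{X.A}(v \mid u) = \frac{N_{X.A}[v, u]}{\sum_{v'} N_{X.A}[v', u]}
```

Each X.A contributes its own term, so under these assumptions the maximizing entries are just normalized counts, computed independently for each attribute.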
ML parameter estimation
[Figure: dependency structure over Student (Intelligence, Performance), Reg (Grade, Satisfaction), Course (Difficulty, Rating)]
DB technology is well suited to the computation of sufficient statistics:
–sufficient statistics are counts computed over the Course, Reg, and Student tables
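As a minimal sketch of how a database can compute these counts, the Python snippet below uses an in-memory SQLite database. The table layout, the toy rows, and the choice of Student.Intelligence and Course.Difficulty as parents of Reg.Grade are all assumptions made for illustration; the slides do not give a schema at this level of detail.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, intelligence TEXT);
CREATE TABLE course  (id INTEGER PRIMARY KEY, difficulty TEXT);
CREATE TABLE reg     (id INTEGER PRIMARY KEY, student_id INTEGER,
                      course_id INTEGER, grade TEXT);
INSERT INTO student VALUES (1, 'high'), (2, 'low'), (3, 'low');
INSERT INTO course  VALUES (1, 'low'), (2, 'high');
INSERT INTO reg VALUES (1, 1, 1, 'A'), (2, 1, 2, 'A'), (3, 2, 1, 'A'),
                       (4, 3, 1, 'C'), (5, 2, 2, 'C');
""")

# Sufficient statistics for the assumed dependency of Reg.Grade on
# Student.Intelligence and Course.Difficulty: one join plus a GROUP BY count.
rows = conn.execute("""
    SELECT r.grade, s.intelligence, c.difficulty, COUNT(*)
    FROM reg r
    JOIN student s ON r.student_id = s.id
    JOIN course  c ON r.course_id  = c.id
    GROUP BY r.grade, s.intelligence, c.difficulty
""").fetchall()
counts = {(g, i, d): n for g, i, d, n in rows}

# Maximum-likelihood CPT entries are the counts normalized per parent setting.
totals = defaultdict(int)
for (g, i, d), n in counts.items():
    totals[(i, d)] += n
cpt = {(g, i, d): n / totals[(i, d)] for (g, i, d), n in counts.items()}
```

The database does the join and the counting; the only work left in application code is the per-parent-configuration normalization.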
Model Selection
Idea:
–define scoring function
–do local search over legal structures
Key Components:
–scoring models
–legal models
–searching model space
Scoring Models
Bayesian approach:
–closed form solution
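One closed form of this kind can be sketched in Python under an explicit assumption: multinomial local models with symmetric Dirichlet priors, as is standard in Bayesian network learning. The slide's own formula did not survive, so the function names and the prior strength `alpha` below are hypothetical and illustrate the general technique only.

```python
from math import lgamma

def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of one multinomial's counts under a
    symmetric Dirichlet(alpha, ..., alpha) prior (assumed prior)."""
    k = len(counts)
    n = sum(counts)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        result += lgamma(alpha + c) - lgamma(alpha)
    return result

def family_score(counts_by_parent_config, alpha=1.0):
    """Score of one attribute given its parents: the sum of the
    per-parent-configuration log marginal likelihoods."""
    return sum(log_marginal_likelihood(c, alpha)
               for c in counts_by_parent_config.values())

# Toy counts of (Grade A, Grade C) for two assumed parent settings.
example_score = family_score({("low", "low"): [1, 1],
                              ("high", "low"): [2, 0]})
```

Because the score is a sum over parent configurations, and in the same way over attributes, changing the parents of one attribute only requires rescoring that attribute.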
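Legal Models
PRM defines a coherent probability model over skeleton σ if the dependency ordering ≺ over σ is acyclic
Dependency ordering over attributes of objects:
–y.b ≺ x.a if x.a depends on y.b, that is, if X.A depends on Y.B and y is related to x in σ
[Figure: Researcher (Reputation) and Paper (Accepted), linked by author-of]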
Guaranteeing Acyclicity
How do we guarantee that a PRM is acyclic for every skeleton?
PRM dependency structure S → class-level dependency graph
–edge Y.B → X.A if X.A depends directly on Y.B
Attribute stratification:
–dependency graph acyclic ⇒ dependency ordering acyclic for any skeleton σ
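The stratification test amounts to cycle detection on the class-level dependency graph. The depth-first check below is a minimal sketch; the edge lists are illustrative assumptions about what such graphs might contain, not structures taken from the slides.

```python
def is_acyclic(edges):
    """Return True if the directed graph given as (parent, child)
    pairs contains no cycle."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)
        graph.setdefault(child, [])
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                return False  # back edge: a cycle exists
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in graph if color[n] == WHITE)

# Assumed class-level edges for the university example.
university_edges = [("Student.Intelligence", "Reg.Grade"),
                    ("Course.Difficulty", "Reg.Grade"),
                    ("Reg.Grade", "Reg.Satisfaction")]
# Assumed edge for genetics: a person's M-chromosome depends on the
# mother's M-chromosome, a self-loop at the class level.
genetics_edges = [("Person.M-chrom", "Person.M-chrom"),
                  ("Person.M-chrom", "Person.B-type")]

university_ok = is_acyclic(university_edges)
genetics_ok = is_acyclic(genetics_edges)
```

The genetics graph fails the test even though no real pedigree is cyclic, which shows how conservative pure stratification can be.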
Limitation of stratification
[Figure: three Person objects (M-chromosome, P-chromosome, Blood-type) linked by Father and Mother relations]
[Figure: class-level dependency graph over Person.M-chrom, Person.P-chrom, Person.B-type, marked ???]
Guaranteed acyclic relations
[Figure: three Person objects (M-chromosome, P-chromosome, Blood-type) linked by Father and Mother relations]
Prior knowledge: the Father-of relation is acyclic
–dependence of Person.A on Person.Father.B cannot induce cycles
Guaranteeing acyclicity
With guaranteed acyclic relations, some cycles in the dependency graph are guaranteed to be safe.
We color the edges in the dependency graph:
–yellow: within single object, X.B → X.A
–green: via guaranteed acyclic (g.a.) relation, Y.B → X.A
–red: via other relations, Y.B → X.A
A cycle is safe if
–it has a green edge
–it has no red edge
[Figure: colored dependency graph over Person.M-chrom, Person.P-chrom, Person.B-type]
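The coloring rule can be checked mechanically. In the sketch below, every cycle contains a green edge exactly when the graph with green edges removed is acyclic, and no cycle contains a red edge exactly when no red edge's child can reach its parent. The function names and the genetics edge list are illustrative assumptions, not the authors' implementation.

```python
def reachable(graph, start, goal):
    """True if goal can be reached from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

def all_cycles_safe(colored_edges):
    """colored_edges: list of (parent, child, color) triples with
    color in {'yellow', 'green', 'red'}."""
    full, no_green = {}, {}
    for parent, child, color in colored_edges:
        full.setdefault(parent, []).append(child)
        if color != "green":
            no_green.setdefault(parent, []).append(child)
    for parent, child, color in colored_edges:
        # A red edge lies on a cycle iff its child can reach its parent.
        if color == "red" and reachable(full, child, parent):
            return False
        # A cycle with no green edge survives in the green-free graph.
        if color != "green" and reachable(no_green, child, parent):
            return False
    return True

# Assumed genetics edges: chromosomes are inherited via the Mother and
# Father relations (green); blood type depends on the person's own
# chromosomes (yellow).
genetics = [
    ("Person.M-chrom", "Person.M-chrom", "green"),
    ("Person.P-chrom", "Person.M-chrom", "green"),
    ("Person.M-chrom", "Person.P-chrom", "green"),
    ("Person.P-chrom", "Person.P-chrom", "green"),
    ("Person.M-chrom", "Person.B-type", "yellow"),
    ("Person.P-chrom", "Person.B-type", "yellow"),
]
genetics_safe = all_cycles_safe(genetics)
```

Recoloring any of the chromosome edges red, or adding a yellow two-cycle, makes the check fail.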
Searching Model Space
Phase 0: consider only dependencies within a class
[Figure: candidate structures over Student, Course, Reg, each labeled with its score; example moves: Add C.A → C.B, Delete S.I → S.P]
Phased structure search
Phase 1: consider dependencies from “neighboring” classes, via schema relations
[Figure: candidate structures over Student, Course, Reg, each labeled with its score; example moves: Add C.A → R.B, Add S.I → R.C]
Phased structure search
Phase 2: consider dependencies from “further” classes, via relation chains
[Figure: candidate structures over Student, Course, Reg, each labeled with its score; example moves: Add C.A → S.P, Add S.I → C.B]
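The phased search can be sketched as greedy hill-climbing in which phase k allows only dependencies whose classes are at most k relation steps apart. Everything named here is a hypothetical stand-in: `distance`, `toy_score`, the attribute list, and the target structure are toy assumptions, and a real scoring function would be computed from the data.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

def is_legal(structure):
    """Legal here means the (parent, child) edges form no cycle."""
    ts = TopologicalSorter()
    for parent, child in structure:
        ts.add(child, parent)
    try:
        ts.prepare()
    except CycleError:
        return False
    return True

def phased_search(attributes, distance, score, max_phase=2):
    """Greedy search; phase k admits edges with distance <= k."""
    structure = frozenset()
    best = score(structure)
    for phase in range(max_phase + 1):
        improved = True
        while improved:
            improved = False
            for parent in attributes:
                for child in attributes:
                    if parent == child or distance(parent, child) > phase:
                        continue
                    # Toggle the edge: add it if absent, delete if present.
                    cand = structure ^ frozenset([(parent, child)])
                    if is_legal(cand) and score(cand) > best:
                        best, best_cand, improved = score(cand), cand, True
            if improved:
                structure = best_cand
    return structure, best

# Toy setup (assumed): class distances along schema relations.
CLASS_DISTANCE = {frozenset(["Student", "Reg"]): 1,
                  frozenset(["Course", "Reg"]): 1,
                  frozenset(["Student", "Course"]): 2}

def distance(parent, child):
    a, b = parent.split(".")[0], child.split(".")[0]
    return 0 if a == b else CLASS_DISTANCE[frozenset([a, b])]

TARGET = frozenset([("Course.Difficulty", "Course.Rating"),
                    ("Student.Intelligence", "Reg.Grade"),
                    ("Course.Difficulty", "Reg.Grade"),
                    ("Student.Intelligence", "Course.Rating")])

def toy_score(structure):
    # Stand-in for a data-driven score: reward target edges, punish others.
    return sum(1.0 if e in TARGET else -1.0 for e in structure)

attrs = ["Student.Intelligence", "Reg.Grade",
         "Course.Difficulty", "Course.Rating"]
learned, learned_score = phased_search(attrs, distance, toy_score)
```

In this toy run, phase 0 finds the within-class edge, phase 1 adds the two edges into Reg.Grade, and phase 2 adds the Student-to-Course edge.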
Experimental Results: Movie Domain (real data)
11,000 movies, 7,000 actors
[Figure: schema with Actor (Gender), Appears (Role-type), Movie (Process, Decade, Genre)]
source:
Genetics domain (synthetic data)
[Figure: three Person objects (M-chromosome, P-chromosome, Blood-type) linked by Father and Mother relations, plus Blood-Test (Contaminated, Result)]
Experimental Results
[Plot: Score versus Dataset Size; series: Median Likelihood, Gold Standard]
Benefits
Summarization
–PRM provides compact model
Anomaly detection
–identify change and deviation
Interpretability
–graphical representation of dependencies
Dependency modeling
–relational + statistical
Future directions
Learning in complex real-world domains
–drug treatment regimes
–collaborative filtering
Missing data
Learning with structural uncertainty
Discovery
–hidden variables
–causal structure
–class hierarchy
Conclusions
PRMs are a natural extension of BNs:
–well-founded (probabilistic) semantics
–compact representation of complex models
Powerful learning techniques
–build on BN learning techniques
–can learn directly from relational data
Parameter estimation
–efficient, effective exploitation of DB technology
Structure identification
–builds on well understood theory
–major issues: guaranteeing coherence, search heuristics