Probabilistic Models of Relational Data
Daphne Koller, Stanford University
Joint work with: Lise Getoor, Ming-Fai Wong, Eran Segal, Avi Pfeffer, Pieter Abbeel, Nir Friedman, Ben Taskar
Why Relational?
- The real world is composed of objects that have properties and are related to each other
- Natural language is all about objects and how they relate to each other: "George got an A in Geography 101"
Attribute-Based Worlds
"Smart students get A's in easy classes":
Smart_Jane & easy_CS101 ⇒ GetA_Jane_CS101
Smart_Mike & easy_Geo101 ⇒ GetA_Mike_Geo101
Smart_Jane & easy_Geo101 ⇒ GetA_Jane_Geo101
Smart_Rick & easy_CS221 ⇒ GetA_Rick_CS221
World = assignment of values to attributes / truth values to propositional symbols
Object-Relational Worlds
World = relational interpretation:
- objects in the domain
- properties of these objects
- relations (links) between objects
∀x,y (Smart(x) ∧ Easy(y) ∧ Take(x,y) ⇒ Grade(A,x,y))
Why Probabilities?
- All universals are false (almost): "Smart students get A's in easy classes"
- True universals are rarely useful: "Smart students get either A, B, C, D, or F"
"The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful ... Therefore the true logic for this world is the calculus of probabilities ..." (James Clerk Maxwell)
Probable Worlds
Probabilistic semantics:
- a set of possible worlds
- each world associated with a probability
[Figure: possible worlds as joint assignments to course difficulty (easy/hard), student intelligence (smart/weak), and grade (A/B/C)]
Representation: Design Axes
[Figure: representations laid out along two axes, attributes vs. objects (vs. sequences) and categorical vs. probabilistic, with the categorical row split into epistemic state and world state: propositional logic and CSPs (categorical, attributes); first-order logic and relational databases (categorical, objects); automata and grammars (categorical, sequences); Bayesian nets and Markov nets (probabilistic, attributes); n-gram models, HMMs, and prob. CFGs (probabilistic, sequences)]
Outline
- Bayesian Networks: representation & semantics; reasoning
- Probabilistic Relational Models
- Collective Classification
- Undirected Discriminative Models
- Collective Classification Revisited
- PRMs for NLP
Bayesian Networks
- nodes = variables; edges = direct influence
- Graph structure encodes independence assumptions: Letter conditionally independent of Intelligence given Grade
[Figure: network over Difficulty, Intelligence, Grade, SAT, and Letter, with CPD P(G|D,I) at the Grade node]
BN Semantics
Compact & natural representation:
- nodes with at most k parents: 2^k · n vs. 2^n parameters
- parameters are natural and easy to elicit
- conditional independencies in the BN structure + local probability models = full joint distribution over the domain:
P(D,I,G,S,L) = P(D) P(I) P(G|D,I) P(S|I) P(L|G)
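To make the factorization concrete, here is a minimal runnable sketch (not from the talk; all probability values are made up for illustration) that computes the probability of one possible world as the product of the local CPDs:

```python
# Toy CPDs for the student network; every number below is an assumed value.
P_D = {"easy": 0.6, "hard": 0.4}
P_I = {"smart": 0.7, "weak": 0.3}
P_G = {  # P(Grade | Difficulty, Intelligence)
    ("easy", "smart"): {"A": 0.9, "B": 0.08, "C": 0.02},
    ("easy", "weak"):  {"A": 0.5, "B": 0.3,  "C": 0.2},
    ("hard", "smart"): {"A": 0.5, "B": 0.3,  "C": 0.2},
    ("hard", "weak"):  {"A": 0.1, "B": 0.3,  "C": 0.6},
}
P_S = {"smart": {"high": 0.8, "low": 0.2}, "weak": {"high": 0.1, "low": 0.9}}
P_L = {"A": {"strong": 0.9, "weak": 0.1},
       "B": {"strong": 0.6, "weak": 0.4},
       "C": {"strong": 0.1, "weak": 0.9}}

def joint(d, i, g, s, l):
    """Probability of one possible world, via the chain-rule factorization."""
    return P_D[d] * P_I[i] * P_G[(d, i)][g] * P_S[i][s] * P_L[g][l]

print(joint("easy", "smart", "A", "high", "strong"))
```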
Reasoning Using BNs
The full joint distribution specifies the answer to any query: P(variable | evidence about others)
[Figure: evidence on Letter and SAT propagated through the Difficulty/Intelligence/Grade network]
"Probability theory is nothing but common sense reduced to calculation." (Pierre Simon Laplace)
BN Inference
- BN inference is NP-hard in general; exact inference in dense BNs is intractable
- Structured BNs allow effective inference by exploiting graph structure: graph separation implies conditional independence, so we can do separate inference in the parts and combine the results over the interface
- Complexity: exponential in the largest separator
[Figure: network over A, B, C, D, E, F split into parts around a separator]
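As an illustration of exact inference by summing out variables, here is a toy sketch (my own construction, with assumed numbers) that computes a posterior by enumeration on a small chain; variable elimination generalizes this by caching intermediate factors:

```python
# P(Intelligence | Letter=strong) on a tiny chain I -> G -> L.
P_I = {"smart": 0.7, "weak": 0.3}
P_G = {"smart": {"A": 0.8, "C": 0.2}, "weak": {"A": 0.3, "C": 0.7}}
P_L = {"A": {"strong": 0.9, "weak": 0.1}, "C": {"strong": 0.2, "weak": 0.8}}

def posterior_I(letter):
    # Sum out Grade for each value of Intelligence, then renormalize.
    unnorm = {}
    for i in P_I:
        unnorm[i] = sum(P_I[i] * P_G[i][g] * P_L[g][letter] for g in ("A", "C"))
    z = sum(unnorm.values())
    return {i: p / z for i, p in unnorm.items()}

print(posterior_I("strong"))
```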
Approximate BN Inference
Belief propagation is an iterative message-passing algorithm for approximate inference in BNs. In each iteration (until "convergence"), nodes pass "beliefs" as messages to neighboring nodes.
- Cons: limited theoretical guarantees; might not converge
- Pros: linear time per iteration; works very well in practice, even for dense networks
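The message-passing loop can be sketched in a few lines. The following is a hedged illustration on a toy pairwise model of my own construction (not the talk's formulation): each directed edge carries a message, recomputed from the unary potential and all other incoming messages:

```python
import numpy as np

# Binary variables with unary potentials `phi` and one shared pairwise
# "agreement" potential `psi` on each edge of a loopy triangle.
phi = {0: np.array([0.7, 0.3]), 1: np.array([0.5, 0.5]), 2: np.array([0.4, 0.6])}
edges = [(0, 1), (1, 2), (0, 2)]
psi = np.array([[2.0, 1.0], [1.0, 2.0]])

# messages m[(i, j)]: message from node i to node j, initialized uniform
m = {(i, j): np.ones(2) for a, b in edges for i, j in [(a, b), (b, a)]}

for _ in range(50):  # iterate until (hopefully) convergence
    new = {}
    for (i, j) in m:
        # product of i's unary potential and messages into i from all but j
        incoming = phi[i].copy()
        for (k, l) in m:
            if l == i and k != j:
                incoming *= m[(k, l)]
        msg = psi.T @ incoming          # sum-product update
        new[(i, j)] = msg / msg.sum()   # normalize for numerical stability
    m = new

for node in phi:  # belief = unary potential times all incoming messages
    b = phi[node].copy()
    for (k, l) in m:
        if l == node:
            b *= m[(k, l)]
    print(node, b / b.sum())
```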
Outline
- Bayesian Networks
- Probabilistic Relational Models: language & semantics; web of influence
- Collective Classification
- Undirected Discriminative Models
- Collective Classification Revisited
- PRMs for NLP
Bayesian Networks: Problem
- Bayesian nets use a propositional representation
- The real world has objects, related to each other
[Figure: the Intelligence/Difficulty/Grade template copied for Jane in CS101, George in Geo101, and George in CS101]
These "instances" are not independent (e.g., George's grades in CS101 and Geo101 both depend on his intelligence).
Probabilistic Relational Models
Combine advantages of relational logic & BNs:
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
Integrate uncertainty with the relational model:
- properties of domain entities can depend on properties of related entities
- uncertainty over the relational structure of the domain
St. Nordaf University
[Figure: Prof. Smith and Prof. Jones (Teaching-ability) teach the courses "Welcome to CS101" and "Welcome to Geo101" (Difficulty); George and Jane (Intelligence) are registered in these courses, and each registration carries a Grade and Satisfaction]
Relational Schema
Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects:
- Professor: Teaching-Ability
- Course: Difficulty
- Student: Intelligence
- Registration: Grade, Satisfaction
- Relations: Teach (Professor to Course), Take (Student to Registration), In (Registration to Course)
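A relational schema of this kind maps naturally onto plain data structures. The sketch below is one possible encoding (class and attribute names taken from the slide; the code itself is my own illustration, not the paper's formalism):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Professor:
    teaching_ability: str            # e.g. "low" / "high"

@dataclass
class Course:
    difficulty: str                  # e.g. "easy" / "hard"
    taught_by: Optional[Professor] = None   # Teach relation

@dataclass
class Student:
    intelligence: str                # e.g. "weak" / "smart"

@dataclass
class Registration:                  # links a Student to a Course
    student: Student                 # Take relation
    course: Course                   # In relation
    grade: Optional[str] = None      # e.g. "A" / "B" / "C"
    satisfaction: Optional[str] = None
```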
Probabilistic Relational Models [K. & Pfeffer; Poole; Ngo & Haddawy]
- Universals: probabilistic patterns hold for all objects in a class
- Locality: represent direct probabilistic dependencies; links define potential interactions
[Figure: class-level dependency template over Professor.Teaching-Ability, Course.Difficulty, Student.Intelligence, Reg.Grade, and Reg.Satisfaction, with a CPD over grades A/B/C]
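The "universals" idea is that a single class-level CPD is shared by every instance. A minimal sketch, assuming Registration objects like those in the schema sketch above and with made-up numbers:

```python
# One shared CPD P(Grade | Course.Difficulty, Student.Intelligence) that
# applies to *every* Registration object in the domain (toy values).
GRADE_CPD = {
    ("easy", "smart"): {"A": 0.8, "B": 0.15, "C": 0.05},
    ("easy", "weak"):  {"A": 0.4, "B": 0.4,  "C": 0.2},
    ("hard", "smart"): {"A": 0.5, "B": 0.35, "C": 0.15},
    ("hard", "weak"):  {"A": 0.1, "B": 0.3,  "C": 0.6},
}

def grade_dist(reg):
    """Follow the registration's links to find the CPD's parent values."""
    return GRADE_CPD[(reg.course.difficulty, reg.student.intelligence)]
```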
PRM Semantics
Instantiated PRM = BN:
- variables: attributes of all objects
- dependencies: determined by links & PRM
[Figure: the PRM unrolled over Prof. Smith, Prof. Jones, "Welcome to CS101", "Welcome to Geo101", George, and Jane]
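Instantiation can be pictured as mechanically unrolling the template. A self-contained toy sketch (my own construction) that builds one ground Grade node per registration, with parents found by following the object's links:

```python
from collections import namedtuple

Registration = namedtuple("Registration", ["student", "course"])

def ground_bn(registrations):
    """Unroll the PRM template into ground-BN nodes, one per registration."""
    return [{
        "var": f"Grade[{r.student},{r.course}]",
        "parents": (f"Difficulty[{r.course}]", f"Intelligence[{r.student}]"),
        "cpd": "shared Grade CPD",   # same template CPD for every instance
    } for r in registrations]

print(ground_bn([Registration("Jane", "CS101"),
                 Registration("George", "CS101"),
                 Registration("George", "Geo101")]))
```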
The Web of Influence
[Figure: observed grades (A, C) in "Welcome to Geo101" and "Welcome to CS101" shift beliefs about course difficulty (easy/hard) and student intelligence (low/high) throughout the instantiated network]
Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering: learning models from data; collective classification of webpages
- Undirected Discriminative Models
- Collective Classification Revisited
- PRMs for NLP
Learning PRMs [Friedman, Getoor, K., Pfeffer]
[Figure: a relational database (Course, Student, Reg tables) and expert knowledge are fed to a learner, which outputs a PRM dependency structure]
Learning PRMs
Parameter estimation:
- probabilistic model with shared parameters: grades for all students share the same model
- can use standard techniques for maximum-likelihood or Bayesian parameter estimation
Structure learning:
- define a scoring function over structures
- use combinatorial search to find a high-scoring structure
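With fully observed data, maximum-likelihood estimation with shared parameters reduces to pooled counting across all instances. A sketch, again assuming Registration objects with course and student links (my own illustration):

```python
from collections import Counter, defaultdict

def fit_grade_cpd(registrations):
    """Every Registration contributes to the *same* CPD's counts."""
    counts = defaultdict(Counter)
    for reg in registrations:
        parents = (reg.course.difficulty, reg.student.intelligence)
        counts[parents][reg.grade] += 1
    # Normalize counts into conditional distributions.
    return {parents: {g: n / sum(c.values()) for g, n in c.items()}
            for parents, c in counts.items()}
```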
Web KB [Craven et al.]
[Figure: a fragment of the WebKB domain, with Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project connected by Advisor-of, Project-of, and Member links]
Web Classification Experiments
WebKB dataset:
- four CS department websites
- bag of words on each page
- links between pages
- anchor text for links
Experimental setup:
- trained on three universities, tested on the fourth
- repeated for all four combinations
Standard Classification
Naïve Bayes over the words on each page ("professor", "department", "extract information", "computer science", "machine learning", ...)
Categories: faculty, course, project, student, other
[Figure: Category node with children Word_1 ... Word_N; bar chart of test error for the words-only model]
Exploiting Links
Also use the anchor text of incoming links (e.g., "working with Tom Mitchell ...")
[Figure: model adds LinkWord nodes; bar chart comparing words-only vs. words + link-words error]
Collective Classification [Getoor, Segal, Taskar, Koller]
Classify all pages collectively, maximizing the joint label probability
Approximate inference: belief propagation
[Figure: From-page and To-page Category variables coupled through Link/Exists variables; bar chart comparing words-only, link-words, and collective error]
Learning with Missing Data: EM [Dempster et al. 77]
Estimate P(Registration.Grade | Course.Difficulty, Student.Intelligence) when some attributes are unobserved
[Figure: EM iterates between inferring hidden course difficulty (easy/hard) and student intelligence (low/high) and re-estimating the grade CPD (A/B/C)]
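A minimal EM sketch for this kind of setting (toy data and a single hidden attribute, my own construction): intelligence is unobserved, grades are observed, and the E- and M-steps alternate:

```python
import numpy as np

grades = np.array([0, 0, 1, 2, 0, 1, 0, 2, 2, 1])   # 0=A, 1=B, 2=C (toy data)
pi = np.array([0.5, 0.5])                 # P(Intelligence = smart / weak)
theta = np.array([[0.5, 0.3, 0.2],        # P(Grade | smart), assumed init
                  [0.2, 0.3, 0.5]])       # P(Grade | weak),  assumed init

for _ in range(100):
    # E-step: posterior over hidden Intelligence for each student
    resp = pi[:, None] * theta[:, grades]        # shape (2, n_students)
    resp /= resp.sum(axis=0, keepdims=True)
    # M-step: re-estimate shared parameters from expected counts
    pi = resp.sum(axis=1) / resp.shape[1]
    for g in range(3):
        theta[:, g] = resp[:, grades == g].sum(axis=1)
    theta /= theta.sum(axis=1, keepdims=True)

print(pi, theta)
```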
Discovering Hidden Types
Internet Movie Database: http://www.imdb.com
Discovering Hidden Types [Taskar, Segal, Koller]
[Figure: Actor, Director, and Movie classes, each with a hidden Type attribute; Movie attributes include Genres, Rating, Year, #Votes, MPAA Rating]
Discovering Hidden Types
- Directors: Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher; Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola
- Actors: Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman; Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger, ...
- Movies: Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson, ...; Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October
Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models: Markov networks; relational Markov networks
- Collective Classification Revisited
- PRMs for NLP
Directed Models: Limitations
- Acyclicity constraint limits expressive power: e.g., "two objects linked to by a student are probably not both professors" cannot be expressed
- Acyclicity forces modeling of all potential links: network size is O(N^2) and inference is quadratic
- Generative training: trained to fit all of the data, not to maximize accuracy
Solution: Undirected Models [Lafferty, McCallum, Pereira]
- allow arbitrary patterns over sets of objects & links
- influence flows over existing links, exploiting link-graph sparsity: network size is O(N)
- allow discriminative training: maximize P(labels | observations)
Markov Networks
Graph structure encodes independence assumptions: Chris conditionally independent of Eve given Alice & Dave
[Figure: network over Alice, Betty, Chris, Dave, and Eve, with compatibility potential Φ(A,B,C)]
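Unlike a BN, a Markov network defines the joint distribution only up to a normalization constant Z: P(x) is proportional to the product of the clique potentials. A tiny sketch with assumed potentials:

```python
import numpy as np
from itertools import product

phi_ab = np.array([[3.0, 1.0], [1.0, 3.0]])   # compatibility of (A, B), toy
phi_bc = np.array([[2.0, 1.0], [1.0, 2.0]])   # compatibility of (B, C), toy

def unnormalized(a, b, c):
    return phi_ab[a, b] * phi_bc[b, c]

# Partition function: sum over all joint assignments of three binary variables.
Z = sum(unnormalized(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(unnormalized(1, 1, 1) / Z)   # probability of one joint assignment
```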
Relational Markov Networks [Taskar, Abbeel, Koller '02]
- Universals: probabilistic patterns hold for all groups of objects
- Locality: represent local probabilistic dependencies; sets of links give us possible interactions
[Figure: a template potential coupling Reg.Grade and Reg2.Grade for two students in the same study group for a course]
RMN Semantics
Instantiated RMN = MN:
- variables: attributes of all objects
- dependencies: determined by links & RMN
[Figure: ground Markov network over George, Jane, and Jill, whose grades in "Welcome to CS101" and "Welcome to Geo101" are coupled through the CS and Geo study groups]
Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Collective Classification Revisited: discriminative training of RMNs; webpage classification; link prediction
- PRMs for NLP
Learning RMNs
- Parameter estimation is not closed form
- But the problem is convex, so there is a unique global maximum
- Maximize the conditional log-likelihood L = log P(Grades, Intelligence | Difficulty), with template potentials such as Φ(Reg1.Grade, Reg2.Grade) shared across all instantiations
[Figure: the objective evaluated on the instantiated network over difficulty (easy/hard), intelligence (low/high), and grades (A/B/C)]
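Discriminative training of a log-linear model maximizes conditional log-likelihood; its gradient is empirical feature counts minus model-expected counts. A toy sketch of that gradient ascent (my own construction, not the WebKB model):

```python
import numpy as np
from itertools import product

# Two binary labels y1, y2 with features f = [y1, y2, y1==y2]; toy data.
data = [(1, 1), (1, 1), (0, 0), (1, 0)]
feats = lambda y1, y2: np.array([y1, y2, int(y1 == y2)], dtype=float)
w = np.zeros(3)

for _ in range(200):
    scores = {y: np.exp(w @ feats(*y)) for y in product((0, 1), repeat=2)}
    Z = sum(scores.values())
    expected = sum(p / Z * feats(*y) for y, p in scores.items())
    empirical = sum(feats(*y) for y in data) / len(data)
    w += 0.5 * (empirical - expected)   # gradient ascent step

print(w)   # convexity guarantees this approaches the global optimum
```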
Flat Models
Logistic regression: P(Category | Words)
[Figure: Category node over Word_1 ... Word_N and link words]
Exploiting Links
42.1% relative reduction in error relative to the generative approach
[Figure: the Category variables of each link's From-page and To-page are modeled jointly]
More Complex Structure
[Figure: richer relational patterns, e.g., a Faculty page's category C and words W_1 ... W_n related to the categories S of the Students and Courses pages it links to]
Collective Classification: Results
35.4% relative reduction in error relative to the strong flat approach
Scalability
WebKB data set size: 1300 entities, 180K attributes, 5800 links
Network size per school:
- directed model: 200,000 variables, 360,000 edges
- undirected model: 40,000 variables, 44,000 edges
Running times (training / classification):
- directed models: 3 sec / 180 sec
- undirected models: 20 minutes / 15-20 sec
The difference in training time decreases substantially when some training data is unobserved, i.e., when we want to model with hidden variables.
Predicting Relationships
Even more interesting are the relationships between objects; e.g., verbs are almost always relationships.
[Figure: WebKB fragment with Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, with Advisor-of and Member links to be predicted]
Flat Model
Predict each link's Type (NONE, advisor, instructor, TA, member, project-of) independently
[Figure: Type node over the From-page words, To-page words, and LinkWord_1 ... LinkWord_N]
Collective Classification: Links
Predict page categories and link types jointly
[Figure: each link's Type coupled to the Category of its From-page and To-page]
Link Model
[Figure]
Triad Model
[Figure: template over a Professor, a Student, and a Group, coupling the Advisor and Member links]
Triad Model
[Figure: template over a Professor, a Student, and a Course, coupling the Advisor, Instructor, and TA links]
WebKB++
- Four new department web sites: Berkeley, CMU, MIT, Stanford
- Labeled page type (8 types): faculty, student, research scientist, staff, research group, research project, course, organization
- Labeled hyperlinks and virtual links (6 types): advisor, instructor, TA, member, project-of, NONE
- Data set size: 11K pages, 110K links, 2 million words
Link Prediction: Results
- Error measured over links predicted to be present
- Link-presence cutoff is at the precision/recall break-even point (≈30% for all models)
- 72.9% relative reduction in error relative to the strong flat approach
Summary
PRMs inherit the key advantages of probabilistic graphical models:
- coherent probabilistic semantics
- exploit the structure of local interactions
Relational models are inherently more expressive:
- "web of influence": use all available information to reach powerful conclusions
- exploit both relational information and the power of probabilistic reasoning
Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Collective Classification Revisited
- PRMs for NLP*: word-sense disambiguation; relation extraction; natural language understanding (?)
*An outsider's perspective, or "Why should I care?"
Word Sense Disambiguation
"Her advisor gave her feedback about the draft."
- Neighboring words alone may not provide enough information to disambiguate
- We can gain insight by considering compatibility between the senses of related words (e.g., advisor: financial vs. academic; gave: physical vs. figurative; feedback: electrical vs. criticism; draft: wind vs. paper)
Collective Disambiguation
"Her advisor gave her feedback about the draft."
- Objects: words in text
- Attributes: sense, gender, number, part of speech, ...
- Links: grammatical relations (subject-object, modifier, ...); close semantic relations (is-a, cause-of, ...); same word in different sentences (one sense per discourse)
- Compatibility parameters: learned from tagged data; based on prior knowledge (e.g., WordNet, FrameNet)
Can we infer grammatical structure and disambiguate word senses simultaneously rather than sequentially? Can we integrate inter-word relationships directly into our probabilistic model?
Relation Extraction
"ACME's board of directors began a search for a new CEO after the departure of current CEO, James Jackson, following allegations of creative accounting practices at ACME." [6/01] ... "In an attempt to improve the company's image, ACME is considering former judge Mary Miller for the job." [7/01] ... "As her first act in her new position, Miller announced that ACME will be doing a stock buyback." [9/01] ...
[Figure: extracted relation graph over Jackson, Miller, the CEO position, and an Announcement, with links labeled Departs, Candidate, Made, Of, and Concerns, and the inferred link Hired?]
Understanding Language
"Professor Sarah met Jane. She explained the hole in her proof."
[Figure: most likely interpretation: Professor Sarah (professor), Jane (student), and a proof of "Theorem: P=NP" whose hole is the step "N=1"]
Resolving Ambiguity [Goldman & Charniak; Pasula & Russell]
"Professor Sarah met Jane. She explained the hole in her proof."
- Professors often meet with students, so Jane is probably a student
- Professors like to explain, so "she" is probably Prof. Sarah
Ambiguity arises over attribute values, link types, and object identity; it is resolved by probabilistic reasoning about objects, their attributes, and the relationships between them.
Acquiring Semantic Models
Statistical NLP reveals patterns:
- Standard models learn patterns at the word level (e.g., verbs occurring with "teacher": be 24%, train 3%, hire 1.5%, pay 1.4%, fire 0.3%, serenade ...)
- But word patterns are only implicit surrogates for underlying semantic patterns: "teacher" objects tend to participate in certain relationships
- We can use such a pattern for objects not explicitly labeled as a teacher
Competing Approaches: Complementary Approaches
[Figure: desiderata are semantic understanding, scaling up (via learning), and handling noise & ambiguity; logical approaches provide semantic understanding, statistical approaches provide scaling and robustness to noise & ambiguity, and PRMs sit at their intersection]
Statistics: from Words to Semantics
- Represent statistical patterns at the semantic level: what types of objects participate in what types of relationships
- Learn statistical models of semantics from text
- Reason using the models to obtain a global semantic understanding of the text
[Image: Georgia O'Keeffe, "Ladder to the Moon"]