Published by Blaise Morrison. Modified over 9 years ago.
Probabilistic Models of Object-Relational Domains
Daphne Koller, Stanford University
Joint work with: Lise Getoor, Ben Taskar, Drago Anguelov, Nir Friedman, Pieter Abbeel, Rahul Biswas, Avi Pfeffer, Ming-Fai Wong, Evan Parker, Eran Segal
Bayesian Networks: Problem
Bayesian nets use propositional representation
Real world has objects, related to each other
These “instances” are not independent
[Figure: Intelligence and Difficulty as parents of Grade, instantiated as three ground fragments: (Intell_Jane, Diffic_CS101, Grade_Jane_CS101), (Intell_George, Diffic_CS101, Grade_George_CS101), (Intell_George, Diffic_Geo101, Grade_George_Geo101); observed grades A and C]
One of the key benefits of the propositional representation was our ability to represent our knowledge without explicit enumeration of the worlds. It turns out that we can do the same in the probabilistic framework. The key idea, introduced by Pearl in the Bayesian network framework, is to use locality of interaction. This is an assumption which seems to be a fairly good approximation of the world in many cases.
However, this representation suffers from the same problem as other propositional representations. We have to create separate representational units (propositions) for the different entities in our domain. And the problem is that these instances are not independent. For example, the difficulty of CS101 in one network is the same difficulty as in another, so evidence in one network should influence our beliefs in another.
Probabilistic Relational Models
Combine advantages of relational logic & BNs:
  Natural domain modeling: objects, properties, relations
  Generalization over a variety of situations
  Compact, natural probability models
Integrate uncertainty with relational model:
  Properties of domain entities can depend on properties of related entities
  Uncertainty over relational structure of domain
St. Nordaf University
[Figure: professors Prof. Jones and Prof. Smith (Teaching-ability); courses Geo101 and CS101 (Difficulty); students George and Jane (Intelligence); three registration records (Grade, Satisfac); links Teaches, In-course, Registered]
Let us consider an imaginary university called St. Nordaf. St. Nordaf has two faculty, two students, two courses, and three registrations of students in courses, each of which is associated with a registration record. These objects are linked to each other: professors teach classes, students register in classes, etc. Each of the objects in the domain has properties that we care about.
Relational Schema
Specifies types of objects in domain, attributes of each type of object & types of relations between objects
Classes: Professor, Student, Course, Registration
Attributes: Professor.Teaching-Ability, Student.Intelligence, Course.Difficulty, Registration.Grade, Registration.Satisfaction
Relations: Teach, Take, In
The university can be described in an organized form using a relational database. The schema of a database tells us what types of objects we have, what are the attributes of these objects that are of interest, and how the objects can relate to each other.
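The schema and the St. Nordaf instance can be sketched as plain Python dataclasses. This is only an illustration: the slides do not say which professor teaches which course or how the relations are stored, so the `teacher`, `student`, and `course` fields and the Jones/Smith assignment below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Professor:          # attribute of interest: Teaching-Ability
    name: str


@dataclass(frozen=True)
class Course:             # attribute of interest: Difficulty
    name: str
    teacher: Professor    # "Teach" relation (assumed to live on the course)


@dataclass(frozen=True)
class Student:            # attribute of interest: Intelligence
    name: str


@dataclass(frozen=True)
class Registration:       # attributes of interest: Grade, Satisfaction
    student: Student      # "Take" relation
    course: Course        # "In" relation


# Two faculty, two students, two courses, three registrations.
jones, smith = Professor("Jones"), Professor("Smith")
geo101 = Course("Geo101", teacher=jones)   # teaching assignment is assumed
cs101 = Course("CS101", teacher=smith)     # teaching assignment is assumed
jane, george = Student("Jane"), Student("George")
registrations = [
    Registration(jane, cs101),
    Registration(george, cs101),
    Registration(george, geo101),
]
```

The attributes themselves (Grade, Difficulty, and so on) are deliberately left out of the classes: in a PRM they are the random variables, while the objects and links form the fixed skeleton.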
Representing the Distribution
Very large probability space for a given context
  All possible assignments of all attributes of all objects
Infinitely many potential contexts
  Each associated with a very different set of worlds
Need to represent infinite set of complex distributions
Unfortunately, this is a very large number of worlds, and to specify a probabilistic model we have to specify a probability for each one. Furthermore, the resulting distribution would be good only for a limited time. If St. Nordaf hired a new faculty member, or got a new student, or even if the two students registered for different classes next year, it would no longer apply, and St. Nordaf would have to pay Acme consulting all over again. Thus, we want a model that holds for the infinitely many potential universities that can be described by this very simple schema.
We are therefore stuck with what seems to be an impossible problem: how do we represent an infinite set of possible distributions, each of which is by itself very complex?
Probabilistic Relational Models
Universals: Probabilistic patterns hold for all objects in class
Locality: Represent direct probabilistic dependencies
  Links define potential interactions
[Figure: class-level template with Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), and Reg (Grade with values A, B, C; Satisfaction)]
The two key ideas that come to our rescue derive from the two approaches that we are trying to combine. From relational logic, we have the notion of universal patterns, which hold for all objects in a class. From Bayesian networks, we have the notion of locality of interaction, which in the relational case has a particular twist: links give us precisely a notion of “interaction”, and thereby provide a roadmap for which objects can interact with each other.
In this example, we have a template, like a universal quantifier for a probabilistic statement. It tells us: “For any registration record in my database, the grade of the student in the course depends on the intelligence of that student and the difficulty of that course.” This dependency will be instantiated for every object (of the right type) in our domain. It is also associated with a conditional probability distribution that specifies the nature of that dependence. We can also have dependencies over several links, e.g., the satisfaction of a student can depend on the teaching ability of the professor who teaches the course.
[K. & Pfeffer; Poole; Ngo & Haddawy]
PRM Semantics
Instantiated PRM → BN
  variables: attributes of all objects
  dependencies: determined by links & PRM
[Figure: ground network over the Teaching-ability of Prof. Jones and Prof. Smith, the Difficulty of Geo101 and CS101, the Intelligence of George and Jane, and the Grade and Satisfac of each registration]
The semantics of this model is that of a template that generates an appropriate probabilistic model for any domain that we might encounter. The instantiation is the embodiment of universals on the one hand, and locality of interaction, as specified by the links, on the other.
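A minimal sketch of this instantiation step: given the objects and links, the class-level dependencies are copied once per registration to produce the parent sets of a ground Bayesian network. The variable naming scheme, the `teaches` assignment, and the function name `ground_network` are assumptions made for illustration.

```python
# Relational skeleton: which registrations exist and who teaches what.
registrations = [("Jane", "CS101"), ("George", "CS101"), ("George", "Geo101")]
teaches = {"CS101": "Smith", "Geo101": "Jones"}  # assumed assignment


def ground_network(registrations, teaches):
    """Unroll the template dependencies into ground variables and parents.

    Template (from the slides):
      Reg.Grade        <- Reg.Student.Intelligence, Reg.Course.Difficulty
      Reg.Satisfaction <- Reg.Course.Professor.Teaching-Ability
    """
    parents = {}
    for student, course in registrations:
        prof = teaches[course]
        intel = f"Intelligence({student})"
        diff = f"Difficulty({course})"
        ability = f"TeachingAbility({prof})"
        # Root variables: one per object, shared by every registration
        # that links to that object.
        for root in (intel, diff, ability):
            parents.setdefault(root, [])
        parents[f"Grade({student},{course})"] = [intel, diff]
        parents[f"Satisfaction({student},{course})"] = [ability]
    return parents


ground = ground_network(registrations, teaches)
```

Because `Difficulty(CS101)` is a single shared node, it becomes a parent of both Jane's and George's CS101 grades, which is exactly what couples the otherwise separate propositional fragments.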
The Web of Influence
[Figure: CS101 and Geo101 with Difficulty easy / hard, students with Intelligence low / high, and observed grades C and A]
This web of influence has interesting ramifications from the perspective of the types of reasoning patterns that it supports. Consider Forrest Gump. A priori, we believe that he is pretty likely to be smart. Evidence about two classes that he took changes our probabilities only very slightly. However, we see that most people who took CS101 got A’s. In fact, even people who did fairly poorly in other classes got an A in CS101. Therefore, we believe that CS101 is probably an easy class. To get a C in an easy class is unlikely for a smart student, so our probability that Forrest Gump is smart goes down substantially.
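The reasoning pattern can be reproduced by brute-force enumeration on a tiny ground network for a single course. All numbers below (the prior on intelligence, the prior on difficulty, and the grade CPD restricted to grades A and C) are invented for illustration and do not come from the slides.

```python
from itertools import product

P_HIGH = 0.6   # assumed prior P(Intelligence = high)
P_EASY = 0.5   # assumed prior P(Difficulty = easy)
# Assumed CPD: P(Grade = A | Intelligence, Difficulty); P(C) is the rest.
P_A = {
    ("high", "easy"): 0.9, ("high", "hard"): 0.5,
    ("low", "easy"): 0.6, ("low", "hard"): 0.1,
}


def p_grade(grade, intel, diff):
    p_a = P_A[(intel, diff)]
    return p_a if grade == "A" else 1.0 - p_a


def posterior_high(target_grade, other_grades):
    """P(target's Intelligence = high | all observed grades in one course)."""
    grades = [target_grade] + list(other_grades)
    weight = {"high": 0.0, "low": 0.0}
    for diff in ("easy", "hard"):
        p_diff = P_EASY if diff == "easy" else 1.0 - P_EASY
        # Enumerate the intelligence of every student in the course.
        for intels in product(("high", "low"), repeat=len(grades)):
            p = p_diff
            for intel, grade in zip(intels, grades):
                p_intel = P_HIGH if intel == "high" else 1.0 - P_HIGH
                p *= p_intel * p_grade(grade, intel, diff)
            weight[intels[0]] += p
    return weight["high"] / (weight["high"] + weight["low"])


own_grade_only = posterior_high("C", [])
with_classmates = posterior_high("C", ["A", "A", "A"])
```

Under these assumed numbers the belief that the student is smart drops once after seeing the C, and drops further once the classmates' A's make an easy course more likely, even though none of those classmates is directly connected to the student.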
Reasoning with a PRM
Generic approach:
  Instantiate PRM to produce ground BN
  Use standard BN inference
In most cases, resulting BN is too densely connected to allow exact inference
  Use approximate inference: belief propagation
Improvement: use domain structure (objects & relations) to guide computation
  Kikuchi approximation where clusters = objects
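For reference, here is a minimal sketch of plain sum-product belief propagation on a pairwise network with binary variables. It is not the object-clustered Kikuchi variant mentioned above; the data layout (`unary`, `pairwise`) and the example potentials are assumptions. On a tree the result is exact, while on a loopy graph the same updates give only an approximation.

```python
def loopy_bp(unary, pairwise, iters=20):
    """Sum-product message passing with synchronous updates.

    unary[i][xi] is a node potential; pairwise[(i, j)][xi][xj] is an
    edge potential. Returns normalized beliefs for every node.
    """
    neighbors = {i: set() for i in unary}
    for i, j in pairwise:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def edge_potential(i, j, xi, xj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    messages = {(i, j): [1.0, 1.0] for i in neighbors for j in neighbors[i]}
    for _ in range(iters):
        updated = {}
        for (i, j) in messages:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    # Product of node potential and all messages into i,
                    # except the one coming from the recipient j.
                    incoming = unary[i][xi]
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += incoming * edge_potential(i, j, xi, xj)
                out.append(total)
            norm = sum(out)
            updated[(i, j)] = [v / norm for v in out]
        messages = updated

    beliefs = {}
    for i in neighbors:
        b = [unary[i][0], unary[i][1]]
        for k in neighbors[i]:
            b = [b[x] * messages[(k, i)][x] for x in (0, 1)]
        norm = sum(b)
        beliefs[i] = [v / norm for v in b]
    return beliefs


# A three-node chain a - b - c with potentials that favor agreement.
unary = {"a": [0.7, 0.3], "b": [0.5, 0.5], "c": [0.2, 0.8]}
agree = [[2.0, 1.0], [1.0, 2.0]]
pairwise = {("a", "b"): agree, ("b", "c"): agree}
beliefs = loopy_bp(unary, pairwise)
```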
Data Model Objects
[Figure: a relational database (Course, Student, Reg) and expert knowledge feed a Learner, which outputs a Probabilistic Model; probabilistic inference then applies the model to data for a new situation]
What are the objects in the new situation? How are they related to each other?
Two Recent Instantiations
From a relational dataset with objects & links, classify objects and predict relationships:
  Target application: Recognize terrorist networks
  Actual application: From webpages to database
From raw sensor data to categorized objects:
  Laser data acquired by robot
  Extract objects, with their static & dynamic properties
  Discover classes of similar objects
Summary
PRMs inherit key advantages of probabilistic graphical models:
  Coherent probabilistic semantics
  Exploit structure of local interactions
Relational models inherently more expressive
“Web of influence”: use multiple sources of information to reach conclusions
Exploit both relational information and power of probabilistic reasoning
Discriminative Probabilistic Models for Relational Data
Ben Taskar, Stanford University
Joint work with: Pieter Abbeel, Daphne Koller, Ming-Fai Wong
Web KB
[Figure: Tom Mitchell (Professor), WebKB Project (Project), and Sean Slattery (Student), connected by Advisor-of, Project-of, and Member relations]
Of course, data is not always so nicely arranged for us as in a relational database. Let us consider the biggest source of data: the World Wide Web. Consider the webpages in a computer science department. Here is one webpage, which links to another. This second webpage links to a third, which links back to the first two. There is also a webpage with a lot of outgoing links to webpages on this site. This is not nice clean data. Nobody labels these webpages for us and tells us what they are.
We would like to learn to understand this data, and conclude from it that we have a “Professor Tom Mitchell”, one of whose interests is a project called “WebKB”. “Sean Slattery” is one of the students on the project, and Professor Mitchell is his advisor. Finally, Tom Mitchell is a member of the CMU CS faculty, which contains many other faculty members. How do we get from the raw data to this type of analysis?
[Craven et al.]
Undirected PRMs: Relational Markov Nets
Universals: Probabilistic patterns hold for all groups of objects
Locality: Represent local probabilistic dependencies
Address limitations of directed models:
  Increase expressive power by removing acyclicity constraint
  Improve predictive performance through discriminative training
[Figure: template potential connecting the Grade variables of two registrations (Reg, Reg2) whose students (Student, Student2) share a Study Group; each Grade is also linked to the student's Intelligence and the Course's Difficulty]
[Taskar, Abbeel, Koller ‘02]
RMN Semantics
Instantiated RMN → MN
  variables: attributes of all objects
  dependencies: determined by links & RMN
[Figure: ground Markov network over the Difficulty of Geo101 and CS101 and the Intelligence and Grade of George, Jane, and Jill, with the Geo Study Group and CS Study Group linking the grades of their members]
Learning RMNs
Parameter estimation is not closed form
Convex problem, so there is a unique global maximum
Maximize L = log P(Grades, Intelligence | Difficulty)
Parameters are not independent
[Figure: template potential over (Reg1.Grade, Reg2.Grade), with Difficulty values easy / hard, Grade values A, B, C, and Intelligence values low / high, instantiated over a ground network of Grade, Intelligence, and Difficulty variables]
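A toy sketch of this kind of conditional training, assuming a log-linear model with two shared (template) weights: the gradient of the conditional log-likelihood is the observed feature count minus the expected feature count, and since no closed form exists the weights are updated iteratively. The features, the two-label setup, and the step size are all assumptions chosen to keep exact enumeration feasible.

```python
import math
from itertools import product


def features(y, x):
    """Template features; each weight is shared by all its instantiations."""
    agree_obs = sum(1.0 for yi, xi in zip(y, x) if yi == xi)  # label vs. evidence
    agree_link = 1.0 if y[0] == y[1] else 0.0                 # linked labels agree
    return [agree_obs, agree_link]


def log_likelihood_and_grad(w, x, y_obs):
    """Conditional log-likelihood log P(y_obs | x) and its gradient."""
    scores = {}
    for y in product((0, 1), repeat=2):
        scores[y] = sum(wi * fi for wi, fi in zip(w, features(y, x)))
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    expected = [0.0, 0.0]
    for y, s in scores.items():
        p = math.exp(s - log_z)
        for k, fk in enumerate(features(y, x)):
            expected[k] += p * fk
    observed = features(y_obs, x)
    grad = [fo - fe for fo, fe in zip(observed, expected)]
    return scores[y_obs] - log_z, grad


x, y_obs = (1, 0), (1, 1)
w = [0.0, 0.0]
ll_start, grad_start = log_likelihood_and_grad(w, x, y_obs)
for _ in range(50):
    _, grad = log_likelihood_and_grad(w, x, y_obs)
    w = [wi + 0.5 * gi for wi, gi in zip(w, grad)]  # plain gradient ascent
ll_end, _ = log_likelihood_and_grad(w, x, y_obs)
```

In a real ground network the expectation cannot be enumerated and has to come from inference (for example belief propagation); with a single training example and no regularizer, as here, the weights also keep growing, so a prior on the weights would normally be added.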
Web Classification Experiments
WebKB dataset
  Four CS department websites
  Five categories (faculty, student, project, course, other)
  Bag of words on each page
  Links between pages
  Anchor text for links
Experimental setup
  Trained on three universities
  Tested on fourth
  Repeated for all four combinations
Exploiting Links
Classify all pages collectively, maximizing the joint label probability
35.4% relative reduction in error relative to strong flat approach
[Figure: From-Page and To-Page, each with a Category and words Word1 ... WordN, joined by a Link]
Scalability
WebKB data set size:
  1300 entities
  180K attributes
  5800 links
Network size / school:
  40,000 variables
  44,000 edges
Training time: 20 minutes
Classification time: seconds
Predicting Relationships
Even more interesting are relationships between objects
[Figure: Tom Mitchell (Professor), WebKB Project (Project), and Sean Slattery (Student), with Advisor-of and Member relations to be predicted]
WebKB++
Four new department web sites:
  Berkeley, CMU, MIT, Stanford
Labeled page type (8 types):
  faculty, student, research scientist, staff, research group, research project, course, organization
Labeled hyperlinks and virtual links (6 types):
  advisor, instructor, TA, member, project-of, NONE
Data set size:
  11K pages
  110K links
  2 million words
Flat Model
[Figure: each link's relation Rel (NONE, advisor, instructor, TA, member, project-of) is predicted from the From-Page and To-Page words Word1 ... WordN, the link Type, and the link words LinkWord1 ... LinkWordN]
Collective Classification: Links
[Figure: Link Model. From-Page and To-Page each have a Category and words Word1 ... WordN; the link between them has Type, Rel, and words LinkWord1 ... LinkWordN]
Triad Model
[Figure: triad of Professor, Student, and Group, linked by Advisor, Member, and Member]
[Figure: triad of Professor, Student, and Course, linked by Advisor, TA, and Instructor]
Link Prediction: Results
72.9% relative reduction in error relative to strong flat approach
Error measured over links predicted to be present
Link presence cutoff is at precision/recall break-even point (30% for all models)
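One common way to locate a precision/recall break-even point is to predict as present exactly as many links as are truly present: with k predictions and k true links, precision and recall are both (true positives) / k. The sketch below assumes this definition and ignores ties in the scores; the function name and the example scores are hypothetical.

```python
def break_even_point(scores, labels):
    """Precision (= recall) when the top-k scored links are predicted
    present, where k is the number of links that are truly present."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n_pos])
    return true_positives / n_pos


# Hypothetical link scores and ground-truth presence labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 1, 0]
bep = break_even_point(scores, labels)
```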
Summary
Use relational models to recognize entities & relations directly from raw data
Collective classification:
  Classifies multiple entities simultaneously
  Exploits links & correlations between related entities
  Uses web of influence reasoning to reach strong conclusions from weak evidence
Undirected PRMs allow high-accuracy discriminative training & rich graphical patterns
Learning Object Maps from Laser Range Data
Dragomir Anguelov, Daphne Koller, Evan Parker
Robotics Lab, Stanford University
Occupancy Grid Maps
Static world assumption
Inadequate for answering symbolic queries
[Figure: occupancy grid map with a person and a robot labeled]
Objects
Entities with coherent properties:
  Shape
  Color
  Kinematics (Motion)
Object Maps
Natural and concise representation
Exploit prior knowledge about object models
  Walls are straight, doors open and close
Learn global properties of the environment
  Primary orientation of walls, typical door width
Generalize properties across objects
  Objects viewed as instances of object classes, parameter sharing
Learning Object Maps
Define a probabilistic generative model
Suggest object hypotheses
Optimize the object parameters (EM)
Select highest-scoring model
[Figure labels: Object Properties, Object Segmentation]
Laser Sensor Data
Probabilistic Model
Data: A set of scans
  Scan: set of <robot position, laser beam reading> tuples
  Each scan is associated with a static map M_t
Global Map M
  A set of objects {1, …, J}
  Each object i = {S[i], D_t[i]}
    S[i]: static parameters
    D_t[i]: dynamic parameters
Non-static environment: dynamic parameters vary only between static maps
Fully dynamic environment: dynamic parameters vary between scans
Probabilistic Model - II
General map M
Static maps M_1 … M_T
Objects i
Robot positions s_it
Laser beams z_it
Correspondence variables C_it
Generative Model Specification
Sensor model
Object models
  Particular instantiation: Walls, Doors
Model score
Sensor Model
Modeling occlusion:
  Reading z_tk generated from:
    Random model (uniform probability)
    First object the beam intersects
    Actual object (Gaussian probability)
    MaxRange model (Delta function)
Why we should model occlusion:
  Realistic sensor model
  Helps to infer motion
  Improved model search
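A minimal sketch of a beam likelihood with these three components, written as a fixed mixture. The mixture weights, the noise level, the maximum range, and the function name are assumptions; the slides instead tie each reading to a component through correspondence variables, which this sketch marginalizes out with fixed weights.

```python
import math


def beam_likelihood(z, expected_range, z_max=10.0, sigma=0.05,
                    w_random=0.1, w_object=0.8, w_max=0.1):
    """Likelihood of range reading z when the first object intersected
    by the beam lies at distance expected_range.

    Components: uniform "random" readings, Gaussian noise around the
    object actually hit, and a point mass at the sensor's maximum range.
    """
    if z >= z_max:
        # Delta function at max range, represented as a probability mass.
        return w_max
    uniform = 1.0 / z_max
    gaussian = (math.exp(-0.5 * ((z - expected_range) / sigma) ** 2)
                / (sigma * math.sqrt(2.0 * math.pi)))
    return w_random * uniform + w_object * gaussian


near = beam_likelihood(3.0, expected_range=3.0)
far = beam_likelihood(5.0, expected_range=3.0)
at_max = beam_likelihood(10.0, expected_range=3.0)
```

Occlusion enters through `expected_range`: it is the distance to the first object along the beam, so a door swinging into the beam's path changes which object explains the reading.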
Wall Object Model
Wall model i
  A line defined by two parameters
  S intervals, each a pair of end positions denoting a segment along the line
  2S + 2 independent parameters
Collinear segments bias
Door Object Model
Door model i
  A pivot p
  Width w
  A set of angles, one per time step t (t = 1, 2, …)
  Limited rotation (90°)
  Arc “center” d
4 static + 1 dynamic parameter
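The two object models can be sketched as small data classes whose parameter counts match the slides (2S + 2 for a wall; 4 static + 1 dynamic for a door). The concrete parameterization is an assumption: the wall line is stored in normal form (angle `alpha`, offset `r`), and the door's arc "center" `d` is read as the middle direction of its allowed 90° sweep.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Wall:
    alpha: float      # direction of the line's normal (assumed form)
    r: float          # signed distance of the line from the origin
    intervals: list   # S pairs (t1, t2): segment ends as offsets along the line

    def num_parameters(self):
        return 2 * len(self.intervals) + 2

    def segments(self):
        """Segment endpoints in world coordinates."""
        foot = (self.r * math.cos(self.alpha), self.r * math.sin(self.alpha))
        direction = (-math.sin(self.alpha), math.cos(self.alpha))

        def point(t):
            return (foot[0] + t * direction[0], foot[1] + t * direction[1])

        return [(point(t1), point(t2)) for t1, t2 in self.intervals]


@dataclass
class Door:
    pivot: tuple       # static: 2 parameters
    width: float       # static: 1 parameter
    arc_center: float  # static: 1 parameter (assumed meaning, in radians)
    angles: dict = field(default_factory=dict)  # dynamic: one angle per time t

    def set_angle(self, t, angle):
        # Limited rotation: clamp to a 90 degree arc around arc_center.
        half = math.pi / 4
        low, high = self.arc_center - half, self.arc_center + half
        self.angles[t] = min(max(angle, low), high)

    def free_end(self, t):
        a = self.angles[t]
        return (self.pivot[0] + self.width * math.cos(a),
                self.pivot[1] + self.width * math.sin(a))


wall = Wall(alpha=0.0, r=1.0, intervals=[(0.0, 2.0)])
door = Door(pivot=(0.0, 0.0), width=1.0, arc_center=math.pi / 4)
door.set_angle(1, math.pi)   # request is clamped to the arc's upper limit
```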
Model Score
Maximize log-posterior probability of map M, data Z
Trade-off: increased data likelihood vs. increased number of parameters
Define structure prior p(M) over possible maps:
  |S[M]|: number of static parameters in M
  |D[M]|: number of dynamic parameters in M
  L[M]: total length of segments in M
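The slide's formulas did not survive extraction. One form consistent with the quantities listed above is sketched here; the decomposition of the posterior is just Bayes' rule, but the exponential form of the prior and the weights \alpha, \beta, \gamma are assumptions, not taken from the slides.

```latex
\log p(M \mid Z) = \log p(Z \mid M) + \log p(M) - \log p(Z),
\qquad
\log p(M) = -\alpha\,\lvert S[M]\rvert - \beta\,\lvert D[M]\rvert - \gamma\,L[M] + \text{const}
```

Under this reading, adding an object raises the likelihood term only if it explains enough readings to pay for its extra static and dynamic parameters and segment length.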
Learning Model Parameters (EM)
E-step
  Compute expectations
M-step
  Walls
    Optimize line parameters
    Optimize segment ends
  Doors
    Optimize pivot and angles
    Optimize door width
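A stripped-down analog of this EM loop for a single line: the E-step computes the expected correspondence of each point to the line versus a uniform "random" component, and the M-step refits the line by weighted least squares. The slope/intercept parameterization, the fixed noise level, the mixture weight, and the data are simplifying assumptions; segment ends and doors are not modeled.

```python
import math


def em_line_fit(points, iters=20, sigma=1.0, p_line=0.9, outlier_density=0.05):
    """Fit y = a*x + b to points that may include outliers."""
    a, b = 0.0, 0.0
    resp = [1.0] * len(points)
    for _ in range(iters):
        # E-step: expected correspondence of each point to the line.
        resp = []
        for x, y in points:
            resid = y - (a * x + b)
            on_line = (p_line * math.exp(-0.5 * (resid / sigma) ** 2)
                       / (sigma * math.sqrt(2.0 * math.pi)))
            resp.append(on_line / (on_line + (1.0 - p_line) * outlier_density))
        # M-step: weighted least squares for the line parameters.
        sw = sum(resp)
        mx = sum(w * x for w, (x, _) in zip(resp, points)) / sw
        my = sum(w * y for w, (_, y) in zip(resp, points)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(resp, points))
        sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(resp, points))
        a = sxy / sxx
        b = my - a * mx
    return a, b, resp


# Ten points on y = 2x + 1 plus two gross outliers (made-up data).
inliers = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outliers = [(2.0, 30.0), (7.0, -20.0)]
slope, intercept, resp = em_line_fit(inliers + outliers)
```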
Suggesting Object Hypotheses
Wall hypotheses
  Use Hough transform (histogram-based approach)
  Compute preferred direction of the environment
  Use both to suggest lines
Door hypotheses
  Use temporal differencing of static maps
  Check if points along segments in static maps M_t are well explained in the general map M
  If not, the segment is a potential door
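The histogram idea behind the Hough transform can be sketched in a few lines: every point votes for all (angle, offset) bins of lines passing through it, and the fullest bins become line hypotheses. The bin sizes, the function name, and the test points are assumptions; the preferred-direction step is not included.

```python
import math
from collections import Counter


def hough_lines(points, n_angles=180, r_step=0.1, top_k=1):
    """Return the top_k (theta, r, votes) line hypotheses, where a line
    is the set of points with x*cos(theta) + y*sin(theta) = r."""
    votes = Counter()
    for x, y in points:
        for a in range(n_angles):
            theta = math.pi * a / n_angles
            r = x * math.cos(theta) + y * math.sin(theta)
            votes[(a, round(r / r_step))] += 1   # histogram bin
    return [(math.pi * a / n_angles, r_bin * r_step, count)
            for (a, r_bin), count in votes.most_common(top_k)]


# Ten laser end points along the vertical wall x = 2.
wall_points = [(2.0, float(y)) for y in range(10)]
theta, r, count = hough_lines(wall_points)[0]
```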
Results for a Single Pass
Results for Two Passes
Future Work
Simultaneous localization & mapping
Object class hierarchies
Dynamic environments
Enrich the object representation:
  More sophisticated shape models
  Color
  3D
Hierarchical Object Maps
Learn to recognize object classes
Come visit our poster!