Causal Modelling for Relational Data Oliver Schulte School of Computing Science Simon Fraser University Vancouver, Canada.


Causal Modelling for Relational Data Oliver Schulte School of Computing Science Simon Fraser University Vancouver, Canada

Outline
- Relational Data vs. Single-Table Data
- Two key questions:
  - Definition of nodes (random variables)
  - Measuring fit of model to relational data
- Previous work: Parametrized Bayes Nets (Poole 2003), Markov Logic Networks (Domingos 2005); the cyclicity problem.
- New work:
  - The learn-and-join Bayes net learning algorithm
  - A pseudo-likelihood function for relational Bayes nets
Causal Modelling for Relational Data - CFE 2010

Single Data Table Statistics
The traditional paradigm:
- A single population.
- Random variables = attributes of population members.
- "Flat" data: can be represented in a single table.

Students (sample from the population):
Name | intelligence | ranking
Jack | 3 | 1
Kim  | 2 | 1
Paul | 1 | 2

Organizational Databases/Science
- Structured data.
- Multiple populations.
- Taxonomies, ontologies, nested populations.
- Relational structures.

Relational Databases
Input data: a finite (small) model/interpretation/possible world → multiple interrelated tables.

Student:
s-id | Intelligence | Ranking
Jack | 3 | 1
Kim  | 2 | 1
Paul | 1 | 2

Professor:
p-id   | Popularity | Teaching-ability
Oliver | 3 | 1
Jim    | 2 | 1

Course:
c-id | Rating | Difficulty

RA:
s-id | p-id   | Salary | Capability
Jack | Oliver | High   | 3
Kim  | Oliver | Low    | 1
Paul | Jim    | Med    | 2

Registration:
s-id | c-id | Grade | Satisfaction
Jack | 101  | A | 1
Jack | 102  | B | 2
Kim  | 102  | A | 1
Paul | 101  | B | 1

Link-Based Classification
Query: P(diff(101)) = ? — predict the difficulty of course 101.

Course:
c-id | Rating | Difficulty
101  | 3      | ???

(Other tables as on the Relational Databases slide.)

Link Prediction
Query: P(Registered(jack, 101)) = ? — predict whether Jack is registered in course 101.

(Tables as on the Relational Databases slide.)

Relational Data: What Are the Random Variables (Nodes)?
- A functor is a function symbol with 1st-order variables: f(X), g(X,Y), R(X,Y).
- Each variable ranges over a population or domain.
- A Parametrized Bayes Net (PBN) is a BN whose nodes are functors (Poole UAI 2003).
- Single-table data = all functors contain the same single free variable X.

Example: Functors and Parametrized Bayes Nets
Example nodes: intelligence(S), diff(C), Registered(S,C); wealth(X), Friend(X,Y), wealth(Y), age(X).
Parameters: conditional probabilities P(child | parents), e.g., P(wealth(Y) = T | wealth(X) = T, Friend(X,Y) = T). These define a joint probability for every conjunction of value assignments.

Domain Semantics of Functors (Halpern 1990, Bacchus 1990)
- Intuitively, P(Flies(X) | Bird(X)) = 90% means "the probability that a randomly chosen bird flies is 90%".
- Think of a variable X as a random variable that selects a member of its associated population with uniform probability.
- Then functors like f(X), g(X,Y) are functions of random variables, hence themselves random variables.

Domain Semantics: Examples
- P(S = jack) = 1/3.
- P(age(S) = 20) = Σ_{s : age(s) = 20} 1/|S|.
- P(Friend(X,Y) = T) = Σ_{x,y : Friend(x,y)} 1/(|X| · |Y|).
- In general, the domain frequency is the number of satisfying instantiations (groundings) divided by the total possible number of groundings.
- The database tables define a set of populations with attributes and links → a database distribution over functor values.
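As a concrete illustration, these domain frequencies can be computed by simple counting over toy tables. The data below is invented for illustration, loosely following the slide deck's Student example; it is not from the talk.

```python
# Toy populations: student -> age, and a Friend relation over students.
students = {"jack": 20, "kim": 22, "paul": 20}
friends = {("jack", "kim"), ("kim", "jack")}

# P(age(S) = 20): fraction of students whose age is 20.
p_age_20 = sum(1 for s in students if students[s] == 20) / len(students)

# P(Friend(X, Y) = T): satisfying groundings / all possible groundings.
pop = list(students)
p_friend = len(friends) / (len(pop) * len(pop))

print(p_age_20)  # 2/3: two of three students are 20
print(p_friend)  # 2/9: two friend pairs out of 3*3 groundings
```

The key point is that both queries reduce to counting groundings, which is exactly Halpern's domain-frequency semantics.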

Defining Likelihood Functions for Relational Data
- We need a quantitative measure of how well a model fits the data.
- Single-table data consists of independently and identically structured entities (IID).
- Relational data is not IID ⇒ the likelihood function is not a simple product of instance likelihoods.

(Tables as on the Relational Databases slide.)

Knowledge-Based Model Construction (Ngo and Haddawy 1997; Koller and Pfeffer 1997)
- The 1st-order model is a template. Instantiate it with the individuals from the database (fixed!) → ground model.
- The database facts correspond to an assignment of values → a likelihood measure for the database.
- Class-level template with 1st-order variables: intelligence(S), diff(C), Registered(S,C).
- Instance-level model with domain(S) = {jack, jane}, domain(C) = {100, 200}: intelligence(jack), intelligence(jane), diff(100), diff(200), Registered(jack,100), Registered(jack,200), Registered(jane,100), Registered(jane,200).
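The grounding step can be sketched mechanically: enumerate every instantiation of each template node over the (fixed) domains. This is only an illustration of the combinatorics, using the slide's two-student, two-course domains.

```python
from itertools import product

# Domains from the slide's instance-level model.
students, courses = ["jack", "jane"], [100, 200]

# Ground the template nodes intelligence(S), diff(C), Registered(S,C).
ground_nodes = (
    [f"intelligence({s})" for s in students]
    + [f"diff({c})" for c in courses]
    + [f"Registered({s},{c})" for s, c in product(students, courses)]
)

print(len(ground_nodes))  # 2 + 2 + 4 = 8 ground random variables
```

Even on this tiny domain the relationship node contributes |S| × |C| ground variables, which is why grounding blows up quickly on real databases.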

The Combining Problem
How do we combine information from different related entities (e.g., the several courses a student takes)?
- Aggregate properties of related entities (PRMs; Getoor, Koller, Friedman).
- Combine probabilities (BLPs; Poole, De Raedt, Kersting).
(Figure: template intelligence(S), diff(C), Registered(S,C), and its ground model over jack, jane and courses 100, 200.)

The Cyclicity Problem
- Class-level model (template): Rich(X), Friend(X,Y) → Rich(Y).
- Ground model: Rich(a) → Rich(b) → Rich(c) → Rich(a) via Friend(a,b), Friend(b,c), Friend(c,a) — a cycle.
- With recursive relationships, we get cycles in the ground model even if there are none in the 1st-order model.
- Jensen and Neville 2007: "The acyclicity constraints of directed models severely constrain their applicability to relational data."

Hidden Variables Avoid Cycles
- Add latent nodes U(X), U(Y) as parents of Rich(X), Friend(X,Y), Rich(Y).
- Assign unobserved values u(jack), u(jane): the probability that Jack and Jane are friends depends on their unobserved "type".
- In the ground model, rich(jack) and rich(jane) are correlated given that they are friends, but neither is an ancestor of the other.
- Common in social network analysis (Hoff 2001, Hoff and Raftery 2003, Fienberg 2009); central to the $1M Netflix Prize challenge; also used for multiple types of relationships (Kersting et al. 2009).
- Computationally demanding.

Undirected Models Avoid Cycles
- Class-level model (template): a Markov network over Rich(X), Friend(X,Y), Rich(Y).
- Ground model: undirected edges among Rich(a), Rich(b), Rich(c) via Friend(a,b), Friend(b,c), Friend(c,a) — no directed cycles arise.

Markov Network Example
An undirected graphical model over Smoking, Cancer, Asthma, Cough, with potential functions defined over cliques:

Smoking | Cancer | Φ(S,C)
False   | False  | 4.5
False   | True   | 4.5
True    | False  | 2.7
True    | True   | 4.5
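The potential table translates directly into an unnormalized probability: with a single clique it is just Φ(S,C), and normalizing means summing over all joint assignments. A minimal sketch using the slide's numbers:

```python
# Clique potential phi(S, C) from the slide's table.
phi = {
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}

def unnormalized_p(smoking: bool, cancer: bool) -> float:
    # With one clique the unnormalized probability is phi itself; with more
    # cliques it would be the product of all clique potentials.
    return phi[(smoking, cancer)]

# Normalizing constant Z sums the potential over all joint assignments.
Z = sum(phi[(s, c)] for s in (False, True) for c in (False, True))
p_smoker_with_cancer = unnormalized_p(True, True) / Z
```

This is the computation the moralized-BN pseudo-likelihood later avoids: dropping Z is what makes the relational likelihood tractable.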

Markov Logic Networks (Richardson and Domingos, Machine Learning 2006)
- An MLN is a set of formulas with weights; graphically, a Markov network with functor nodes.
- This solves both the combining and the cyclicity problems.
- For every functor BN there is a predictively equivalent MLN: the moralized BN (e.g., the directed Rich(X), Friend(X,Y) → Rich(Y) becomes the undirected clique {Rich(X), Friend(X,Y), Rich(Y)}).

New Proposal
- Causality at the token level (instances) is underdetermined by the type-level model: we cannot distinguish whether wealth(jane) causes wealth(jack), wealth(jack) causes wealth(jane), or both (feedback).
- ⇒ Focus on type-level causal relations.
- How? Learn a model of Halpern's database distribution.
- For token-level inference/prediction, convert to an undirected model.

The Learn-and-Join Algorithm (AAAI 2010)
Required: a single-table BN learner L, which takes as input (T, RE, FE): a single data table and a set of edge constraints (required/forbidden edges).
Nodes: descriptive attributes (e.g., intelligence(S)) and Boolean relationship nodes (e.g., Registered(S,C)).

1. RequiredEdges, ForbiddenEdges := emptyset.
2. For each entity table E_i:
   a) Apply L to E_i to obtain BN G_i. For two attributes X, Y from E_i:
   b) If X → Y in G_i, then RequiredEdges += X → Y.
   c) If X → Y not in G_i, then ForbiddenEdges += X → Y.
3. For each relationship-table join (= conjunction) of size s = 1, …, k:
   a) Compute the relationship-table join, joined with the entity tables =: J_i.
   b) Apply L to (J_i, RE, FE) to obtain BN G_i.
   c) Derive additional edge constraints from G_i.
4. Add relationship indicators: if edge X → Y was added when analyzing R_1 join R_2 … join R_m, add edges R_i → Y.
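The constraint-propagation loop (steps 1–3) can be sketched as follows. The single-table learner here is a hypothetical stub standing in for any BN structure learner that accepts edge constraints; the tables and edge names are invented for illustration.

```python
from itertools import permutations

def learn_and_join(tables, learn_bn):
    """tables: lists of attribute names, entity tables first, then
    relationship joins (rows omitted in this sketch). learn_bn(attrs,
    required, forbidden) returns a set of directed edges."""
    required, forbidden = set(), set()
    for attrs in tables:
        edges = learn_bn(attrs, required, forbidden)
        # Steps 2b-2c: freeze every edge decision among this table's attributes.
        for x, y in permutations(attrs, 2):
            (required if (x, y) in edges else forbidden).add((x, y))
    return required, forbidden

def stub_learner(attrs, required, forbidden):
    # Hypothetical learner: keeps inherited required edges, finds one new edge.
    edges = {e for e in required if set(e) <= set(attrs)}
    edges.add(("intelligence", "grade") if "grade" in attrs
              else ("intelligence", "ranking"))
    return edges

req, forb = learn_and_join(
    [["intelligence", "ranking"],                 # entity table (phase 1)
     ["intelligence", "ranking", "grade"]],       # relationship join (phase 2)
    stub_learner)
```

Note how the entity-table decision intelligence → ranking is inherited by the join analysis as a required edge, which is the core idea of the algorithm.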

Phase 1: Entity Tables

Students:
Name | intelligence | ranking
Jack | 3 | 1
Kim  | 2 | 1
Paul | 1 | 2
→ (BN learner L) → BN over ranking(S), intelligence(S).

Course:
Number | Prof   | rating | difficulty
101    | Oliver | 3 | 1
102    | David  | 2 | 2
103    | Oliver | 3 | 2
→ (BN learner L) → BN over diff(C), rating(C), popularity(p(C)), teach-ability(p(C)).

Phase 2: Relationship Tables
Join the Registration table with the Student and Course entity tables:

S.Name | C.number | grade | satisfaction | intelligence | ranking | rating | difficulty
Jack   | 101      | A     | 1            | 3            | 1       | 3      | 1
…      | …        | …     | …            | …            | …       | …      | …

→ (BN learner L) → BN over ranking(S), intelligence(S), grade(S,C), satisfaction(S,C), diff(C), rating(C), popularity(p(C)), teach-ability(p(C)).

Phase 3: Add Boolean Relationship Indicator Variables
The BN over ranking(S), intelligence(S), diff(C), rating(C), grade(S,C), satisfaction(S,C), popularity(p(C)), teach-ability(p(C)) is augmented with the Boolean indicator node Registered(S,C).

Running Time on Benchmarks
Time in minutes; NT = did not terminate; x + y = structure learning + parametrization.
- JBN: our join-based algorithm.
- MLN, CMLN: standard programs from the University of Washington (Alchemy).

Accuracy

Pseudo-Likelihood for Functor Bayes Nets
What likelihood function P(database, graph) does the learn-and-join algorithm optimize?
1. Moralize the BN (causal graph).
2. Use the Markov net likelihood function for the moralized BN, without the normalization constant:

P*(D, B) = ∏_{families} P(child | parents)^{#child–parent instances}

This is the pseudo-likelihood. (Relational causal graph → Markov logic network → likelihood function.)
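The product over families has a simple computational form: count the groundings of each (child, parent) configuration and weight the log-CPT entries by those counts. A minimal sketch with one family and invented toy counts (not from the talk):

```python
import math

# (child_value, parent_config) -> number of groundings in the database.
counts = {
    ("rich", "friend"): 30,
    ("poor", "friend"): 10,
}

def log_pseudo_likelihood(cpt):
    # log P*(D, B) = sum over families of count * log P(child | parents).
    return sum(n * math.log(cpt[key]) for key, n in counts.items())

# The maximizing estimate is the empirical conditional frequency 30/40 = 0.75.
empirical = {("rich", "friend"): 0.75, ("poor", "friend"): 0.25}
best = log_pseudo_likelihood(empirical)
```

Because the normalization constant is dropped, the maximizer decomposes per family, which is why parameter estimation reduces to empirical database frequencies.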

Features of the Pseudo-Likelihood P*
- Tractability: the maximizing parameter estimates are the empirical conditional database frequencies!
- Similar to the pseudo-likelihood function for Markov nets (Besag 1975, Domingos and Richardson 2007).
- Mathematically equivalent, but with a conceptually different interpretation: the expected log-likelihood for randomly selected individuals.

Halpern Semantics for Functor Bayes Nets (new)
1. Randomly select instances X_1 = x_1, …, X_n = x_n for each variable in the BN.
2. Look up their properties and relationships.
3. Compute the log-likelihood for the BN assignment obtained from the instances.
4. L_H = the average log-likelihood over uniform random selection of instances.
(Figure: instantiating the template Rich(X), Friend(X,Y), Rich(Y) with jack, jane gives Rich(jack), Friend(jack,jane), Rich(jane), whose truth values are read off from the database.)
Proposition: L_H(D, B) = ln(P*(D, B)) · c, where c is a (meaningful) constant. No independence assumptions!
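Since the selection of individuals is uniform, the expected log-likelihood can be computed exactly by averaging over all groundings rather than sampling. A toy Rich/Friend world makes the four steps concrete; the database facts and CPT numbers below are illustrative assumptions, not from the talk.

```python
import math
from itertools import product

# Toy database: populations, attribute values, and a relationship.
people = ["jack", "jane"]
rich = {"jack": True, "jane": False}
friend = {("jack", "jane")}

# Illustrative BN parameters for Rich(X), Friend(X,Y) -> Rich(Y).
def p_rich(v):            return 0.6 if v else 0.4
def p_friend(v):          return 0.3 if v else 0.7
def p_rich_given(v, parent_rich, parent_friend):
    base = 0.8 if (parent_rich and parent_friend) else 0.5
    return base if v else 1 - base

def log_joint(x, y):
    # Steps 2-3: look up the grounding's values, score it under the BN.
    rx, ry, f = rich[x], rich[y], (x, y) in friend
    return (math.log(p_rich(rx)) + math.log(p_friend(f))
            + math.log(p_rich_given(ry, rx, f)))

# Step 4: expected log-likelihood under uniform selection of (x, y).
L_H = sum(log_joint(x, y) for x, y in product(people, repeat=2)) / len(people) ** 2
```

On large domains one would estimate L_H by sampling groundings instead of enumerating them, but the definition is the same average.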

Summary of Review
Two key conceptual questions for relational causal modelling:
1. What are the random variables (nodes)?
2. How do we measure the fit of a model to the data?
Answers reviewed:
1. Nodes = functors, open function terms (Poole).
2. Instantiate the type-level model with all possible tokens; use the instantiated model to assign a likelihood to the totality of all token facts.
Problem: the instantiated model may contain cycles even if the type-level model does not. One solution: use undirected models.

Summary of New Results
- A new algorithm for learning causal graphs with functors: fast and scalable (e.g., 5 min vs. 21 hr), with substantial improvements in accuracy.
- A new pseudo-likelihood function for measuring the fit of model to data: tractable parameter estimation; similar to the Markov network (pseudo-)likelihood.
- New semantics: the expected log-likelihood of the properties of randomly selected individuals.

Open Problems
Learning:
- Learn-and-join learns dependencies among attributes, not dependencies among relationships.
- Parameter learning is still a bottleneck.
Inference/Prediction:
- The Markov logic likelihood does not satisfy Halpern's principle: if P(φ(X)) = p, then P(φ(a)) = p, where a is a constant (related to Miller's principle). Is this a problem?

Thank you! Any questions?

Choice of Functors
Functors can be complex, e.g.:
- Nested: wealth(father(father(X))).
- Aggregate: AVG_C {grade(S,C) : Registered(S,C)}.
In the remainder of this talk, we use functors corresponding to:
- Attributes (columns), e.g., intelligence(S), grade(S,C).
- Boolean relationship indicators, e.g., Friend(X,Y).
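An aggregate functor like AVG_C {grade(S,C) : Registered(S,C)} is just an average over the groundings satisfying the relationship. A small sketch on invented data (letter grades mapped to numbers for illustration):

```python
# Toy data: (student, course) -> numeric grade; registration holds exactly
# where a grade exists.
grades = {("jack", 101): 4.0, ("jack", 102): 3.0, ("kim", 102): 4.0}
registered = set(grades)

def avg_grade(student):
    # AVG_C {grade(S,C) : Registered(S,C)} for a fixed student S.
    vals = [g for (s, c), g in grades.items()
            if s == student and (s, c) in registered]
    return sum(vals) / len(vals) if vals else None

print(avg_grade("jack"))  # (4.0 + 3.0) / 2 = 3.5
```

Note that the aggregate is undefined (None here) for a student with no registrations, which is one reason the talk restricts attention to plain attribute and indicator functors.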

Typical Tasks for Statistical-Relational Learning (SRL)
- Link-based classification: given the links of a target entity and the attributes of related entities, predict the class label of the target entity.
- Link prediction: given the attributes of entities and their other links, predict the existence of a link.