Probabilistic Models of Relational Domains
Daphne Koller, Stanford University
© Daphne Koller, 2005

Relations are Everywhere
- The web: webpages (and the entities they represent), hyperlinks
- Corporate databases: customers, products, transactions
- Social networks: people, institutions, friendship links
- Biological data: genes, proteins, interactions, regulation
- Sensor data about the physical world: 3D points, objects, spatial relationships

Relational Data is Different
- Data instances are not independent: topics of linked webpages are correlated.
- Data instances are not identically distributed: instances are heterogeneous (papers, authors).
- No IID assumption: and this is a good thing.

Attribute-Based & Relational Probabilistic Models
- Attribute-based probabilistic models
- Relational logic
- Relational Bayesian networks
- Relational Markov networks

Bayesian Networks
- Nodes = variables; edges = direct influence.
- Graph structure encodes independence assumptions: Job is conditionally independent of Intelligence given Grade.
[Figure: the student network with nodes Difficulty, Intelligence, Grade, SAT, and Job, and a CPD P(G|D,I) over grades A, B, C.]

Reasoning Using BNs
The full joint distribution specifies the answer to any query: P(variable | evidence about other variables).
[Figure: the student network; e.g., querying Job given evidence on SAT.]
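To make the factorization and querying concrete, here is a minimal sketch of brute-force inference in the Difficulty/Intelligence/Grade fragment; the CPD numbers are invented for illustration (the slides give only the structure):

```python
# Toy CPDs for the student fragment; all numbers are made up.
P_D = {"easy": 0.6, "hard": 0.4}                       # P(Difficulty)
P_I = {"low": 0.7, "high": 0.3}                        # P(Intelligence)
P_G = {("easy", "low"):  {"A": 0.30, "B": 0.40, "C": 0.30},
       ("easy", "high"): {"A": 0.90, "B": 0.08, "C": 0.02},
       ("hard", "low"):  {"A": 0.05, "B": 0.25, "C": 0.70},
       ("hard", "high"): {"A": 0.50, "B": 0.30, "C": 0.20}}  # P(G | D, I)

def joint(d, i, g):
    # BN chain rule: P(D, I, G) = P(D) P(I) P(G | D, I)
    return P_D[d] * P_I[i] * P_G[(d, i)][g]

# Query P(Intelligence = high | Grade = A) by summing out Difficulty.
num = sum(joint(d, "high", "A") for d in P_D)
den = sum(joint(d, i, "A") for d in P_D for i in P_I)
print(round(num / den, 3))   # evidence on Grade shifts belief about I
```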

Bayesian Networks: Problem
- Bayesian networks use a propositional representation.
- The real world has objects, related to each other.
[Figure: the template fragment Difficulty, Intelligence, Grade replicated as Intell_Jane / Diffic_CS101 / Grade_Jane_CS101; Intell_George / Diffic_Geo101 / Grade_George_Geo101; Intell_George / Diffic_CS101 / Grade_George_CS101.]
These "instances" are not independent.

The University BN
[Figure: ground network with nodes Difficulty_Geo101, Difficulty_CS101, Intelligence_George, Intelligence_Jane, Grade_Jane_CS101, Grade_George_CS101, Grade_George_Geo101.]

The Genetics BN
G = genotype, B = bloodtype.
[Figure: family-tree network with genotype nodes G_Harry, G_Betty, G_Selma, G_Homer, G_Marge, G_Bart, G_Lisa, G_Maggie, each with a corresponding bloodtype node B_*.]

Simple Approach
A graphical model with shared parameters … and shared local dependency structure.
Want to encode this constraint:
- for the human knowledge engineer
- for the network learning algorithm
How do we specify which nodes share parameters, and which have shared (but different) structure across nodes?

Simple Approach II
We can write a special-purpose program for each domain:
- genetic inheritance (the family tree imposes constraints)
- university (course registrations impose constraints)
Is there something more general?

Relational Logic
A general framework for representing:
- objects & their properties
- classes of objects with the same model
- relations between objects
Represent a model at the template level, and apply it to an infinite set of domains. Given a finite domain, each instantiation of the model is propositional, but the template is not.

Relational Schema
Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects.
[Figure: schema with classes Professor (Teaching-Ability), Course (Difficulty), Student (Intelligence), Registration (Grade, Satisfaction), and relations Teach, Has, In.]
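As a sketch of my own (not part of the slides), the schema can be written down as plain record types, with relations rendered as object-valued links:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Professor:
    name: str
    teaching_ability: Optional[str] = None   # attribute

@dataclass
class Course:
    name: str
    difficulty: Optional[str] = None         # attribute
    instructor: Optional[Professor] = None   # Teach relation as a link

@dataclass
class Student:
    name: str
    intelligence: Optional[str] = None       # attribute

@dataclass
class Registration:                          # a "relation object"
    student: Student                         # Has
    course: Course                           # In
    grade: Optional[str] = None              # attributes
    satisfaction: Optional[str] = None
```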

St. Nordaf University
[Figure: an example world. Prof. Smith and Prof. Jones teach "Welcome to CS101" and "Welcome to Geo101"; George and Jane are registered in courses. The instantiation fixes teaching ability (high/low), difficulty (easy/hard), intelligence (smart/weak), grades (A/B/C), and satisfaction (like/hate) for every object.]

Relational Logic: Summary
Vocabulary:
- Classes of objects: Person, Course, Registration, …
- Individual objects in a class: George, Jane, …
- Attributes of these objects: George.Intelligence, Reg1.Grade
- Relationships between these objects: Of(Reg1, George), Teaches(CS101, Smith)
A world specifies:
- a set of objects, each in a class
- the values of the attributes of all objects
- the relations that hold between the objects

Binary Relations
Any relation can be converted into an object:
R(x_1, x_2, …, x_k) → a new "relation" object y with R_1(x_1, y), R_2(x_2, y), …, R_k(x_k, y).
E.g., registrations are "relation objects". We can therefore restrict attention to binary relations R(x, y).

Relations & Links
Binary relations can also be viewed as links, specifying the set of objects related to x via R:
R(x, y) iff y ∈ x.R_1 and x ∈ y.R_2.
E.g., for Teaches(p, c):
- p.Courses = {courses c : Teaches(p, c)}
- c.Instructor = {professors p : Teaches(p, c)}

Relational Bayesian Network
- Universals: probabilistic patterns hold for all objects in a class.
- Locality: represent direct probabilistic dependencies; links define potential interactions.
[Figure: template model in which Reg.Grade depends on Course.Difficulty and Student.Intelligence, and Reg.Satisfaction depends on Professor.Teaching-Ability; the CPD ranges over grades A, B, C.]
[K. & Pfeffer; Poole; Ngo & Haddawy]
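Continuing the schema sketch above (again my own illustration, with invented CPD numbers), the "universals" assumption means one CPD shared by every Registration when the template is unrolled into a ground network:

```python
# Shared template CPD: P(Grade | Course.Difficulty, Student.Intelligence).
# The same table is reused for every Registration object; numbers are toy.
CPD_GRADE = {("easy", "smart"): {"A": 0.8, "B": 0.15, "C": 0.05},
             ("easy", "weak"):  {"A": 0.3, "B": 0.40, "C": 0.30},
             ("hard", "smart"): {"A": 0.5, "B": 0.30, "C": 0.20},
             ("hard", "weak"):  {"A": 0.1, "B": 0.30, "C": 0.60}}

def ground_bn(registrations):
    """Unroll the template: one Grade variable per registration, with
    parents Difficulty_<course> and Intelligence_<student>."""
    nodes = []
    for reg in registrations:
        nodes.append({
            "var": f"Grade_{reg.student.name}_{reg.course.name}",
            "parents": [f"Difficulty_{reg.course.name}",
                        f"Intelligence_{reg.student.name}"],
            "cpd": CPD_GRADE,            # shared across all instances
        })
    return nodes
```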

RBN Semantics
[Figure: the template model unrolled over the St. Nordaf domain: teaching abilities for Smith and Jones, difficulties for CS101 and Geo101, intelligences for George and Jane, and a grade and satisfaction variable per registration.]

The Web of Influence
[Figure: evidence in the ground network (grades A and C, difficulty easy/hard, intelligence low/high) propagating between CS101 and Geo101 through shared variables.]

Why Undirected Models?
- Symmetric, non-causal interactions. E.g., web: categories of linked pages are correlated; cannot introduce directed edges because of cycles.
- Patterns involving multiple entities. E.g., web: "triangle" patterns; directed edges are not appropriate.
- "Solution": impose an arbitrary direction. But it is not clear how to parameterize a CPD for variables involved in multiple interactions; this is very difficult within a class-based parameterization.
[Taskar, Abbeel, K. 2001]

Markov Networks
[Figure: an undirected network over people (Laura, Noah, Mary, James, Kyle) with a template potential shared across edges.]

Markov Networks: Review
- A Markov network is an undirected graph over some set of variables V.
- The graph is associated with a set of potentials φ_i.
- Each potential is a factor over a subset V_i.
- The variables in V_i must form a (sub)clique in the network.
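The joint distribution this defines is the standard Gibbs form (implicit in, though not written on, the slide):

$$P(\mathbf{v}) \;=\; \frac{1}{Z}\prod_i \phi_i(\mathbf{v}_i), \qquad Z \;=\; \sum_{\mathbf{v}}\prod_i \phi_i(\mathbf{v}_i).$$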

Relational Markov Networks
- Probabilistic patterns hold for groups of objects.
- Groups are defined as sets of (typed) elements linked in particular ways.
[Figure: a study-group template relating Reg.Grade and Reg2.Grade for two students in the same study group of a course, via a template potential.]
[Taskar, Abbeel, K. 2002]

RMN Language
Define clique templates:
- all tuples {reg R_1, reg R_2, group G} s.t. In(G, R_1), In(G, R_2)
- compatibility potential φ(R_1.Grade, R_2.Grade)
The ground Markov network contains a potential φ(r_1.Grade, r_2.Grade) for all appropriate r_1, r_2.
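A sketch of grounding this clique template (my own rendering, with a made-up potential table) over a concrete set of registrations and study groups:

```python
from itertools import combinations

# Hypothetical compatibility potential phi(grade1, grade2): higher value
# when two students in the same study group get the same grade.
PHI = {(g1, g2): (2.0 if g1 == g2 else 1.0)
       for g1 in "ABC" for g2 in "ABC"}

def ground_mn(groups):
    """groups: dict mapping study-group id -> list of registration ids.
    Returns one ground potential per pair of registrations sharing a
    group, i.e. per instantiation of the clique template."""
    potentials = []
    for g, regs in groups.items():
        for r1, r2 in combinations(regs, 2):
            potentials.append(((f"Grade_{r1}", f"Grade_{r2}"), PHI))
    return potentials

print(len(ground_mn({"geo_study": ["jane_geo", "jill_geo"],
                     "cs_study": ["george_cs", "jane_cs", "jill_cs"]})))
# -> 4 ground potentials (1 + 3), all sharing the same template table
```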

Ground MN (or Chain Graph)
[Figure: the grounded network over George, Jane, and Jill's grades in CS101 and Geo101, with potentials tying grades within the Geo study group and the CS study group, plus the dependencies on difficulty and intelligence.]

Case Study I: Linked Documents
- Webpage classification
- Link prediction
- Citation matching

Web → KB
[Figure: extracting a knowledge base from the web: Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, connected by Advisor-of, Project-of, and Member links.]
[Craven et al.]

Standard Classification
Categories: faculty, course, project, student, other.
[Figure: a page model in which Category generates Word_1 … Word_N; page text such as "professor", "department", "extract information", "computer science", "machine learning" provides the features.]

Standard Classification (cont.)
- Add link words as features (e.g., "working with Tom Mitchell …").
- Discriminatively trained naïve Markov = logistic regression.
[Chart: logistic test-set error; 4-fold CV, trained on 3 universities, tested on the 4th.]
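For reference, the flat baseline amounts to bag-of-words logistic regression; a minimal scikit-learn sketch (my illustration; the original experiments predate this library):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for WebKB pages; the real data is pages from 4 universities.
pages = ["professor department machine learning publications",
         "course syllabus homework lecture notes",
         "student advisor research working with tom mitchell"]
labels = ["faculty", "course", "student"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(pages, labels)                        # train on 3 universities...
print(clf.predict(["lecture homework due"]))  # ...test on the 4th
```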

Power of Context
Professor? Student? Post-doc?

Collective Classification
Example: train on one year of student intelligence, course difficulty, and grades; given only the grades in the following year, predict all students' intelligence.
[Figure: pipeline in which a probabilistic relational model (Course, Student, Reg) is learned from training data, then applied to new data by inference to produce conclusions.]

Collective Classification Model
- Each link connects a From page and a To page, each with a Category and words Word_1 … Word_N.
- A compatibility potential φ(From, To) couples the categories of linked pages.
- Classify all pages collectively, maximizing the joint label probability.
[Chart: test-set error, logistic vs. links model.]
[Taskar, Abbeel, K., 2002]

More Complex Structure
[Figure: a page's category C depends on its words W_1 … W_n and on section variables S for linked Faculty, Students, and Courses pages.]
[Taskar, Abbeel, K., 2002]

Collective Classification: Results
[Chart: test-set error for Logistic, Links, Section, and Link+Section models; 35.4% error reduction over logistic.]
[Taskar, Abbeel, K., 2002]

Max Conditional Likelihood
- Estimation: learn w to maximize the conditional likelihood of the correct labeling, P_w(y* | x).
- Classification: predict argmax_y P_w(y | x).
But we don't actually care about the conditional distribution P(y | x) itself; we only care about predicting the correct labels.

Max Margin Estimation
What we really want: correct class labels.
- Estimation: maximize the margin by which the correct labeling y* beats every alternative labeling y, subject to ||w|| = 1, with the required margin scaled by the number of labeling mistakes in y.
- Classification: argmax_y w · f(x, y).
This is a quadratic program with exponentially many constraints.
[Taskar, Guestrin, K., 2003] (see also [Collins, 2002; Hoffman 2003])
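Written out (my reconstruction of the standard margin-scaled formulation, not verbatim from the slide):

$$\max_{\gamma,\;\|\mathbf{w}\|=1} \;\gamma \quad \text{s.t.} \quad \mathbf{w}^\top\big[\mathbf{f}(\mathbf{x},\mathbf{y}^*) - \mathbf{f}(\mathbf{x},\mathbf{y})\big] \;\ge\; \gamma\,\Delta(\mathbf{y}^*,\mathbf{y}) \qquad \forall \mathbf{y},$$

where $\Delta(\mathbf{y}^*,\mathbf{y})$ counts the labeling mistakes in $\mathbf{y}$: one constraint per joint labeling, hence exponentially many.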

Max Margin Markov Networks
- We use the structure of the Markov network to provide an equivalent formulation of the QP, exponential only in the tree width of the network.
- Complexity = that of max-likelihood classification.
- Can solve approximately in networks where the induced width is too large (analogous to loopy belief propagation).
- Can use kernel-based features: SVMs meet graphical models.
[Taskar, Guestrin, K., 2003]

WebKB Revisited
[Chart: max-margin RMNs achieve a 16.1% relative reduction in error relative to conditional-likelihood training.]

Predicting Relationships
Even more interesting: relationships between objects.
[Figure: Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, with Advisor-of and Member links.]

Flat Model
- Introduce an exists/type attribute Rel for each potential link: NONE, advisor, instructor, TA, member, project-of.
- Learn a discriminative model for this attribute given the page words (Word_1 … Word_N of the From and To pages) and link words (LinkWord_1 … LinkWord_N).

Collective Classification: Links
[Figure: the flat link model augmented with each page's Category variable, so link types and page categories are predicted jointly.]
[Taskar, Wong, Abbeel, K., 2002]

Link Prediction: Results
- Error measured over links predicted to be present.
- Link-presence cutoff is at the precision/recall break-even point (about 30% for all models).
[Chart: relative reduction in error relative to the strong flat approach.]
[Taskar, Wong, Abbeel, K., 2002]

Identity Uncertainty
- Background knowledge: a universe of potential objects.
- A PRM defines a distribution over worlds: assignments of values to object attributes, plus a partition of the objects into equivalence classes.
- Objects in the same class have the same attribute values.
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]

Citation Matching Model*
- Each citation object is associated with a paper object.
- Uncertainty over equivalence classes for papers: if P_1 = P_2, they have the same attributes & links.
[Figure: Paper (Title, PubType, Authors) connected to Citation (ObsTitle, Text, Author-as-Cited Name) by Appears-in, Refers-to, and Written-by; link chain: Appears-in.Refers-to.Written-by.]
* Simplified

Identity Uncertainty
Depending on the choice of equivalence classes:
- the number of objects changes
- the dependency structure changes
- there is no "nice" corresponding ground BN
Algorithm:
- Each partition hypothesis defines a simple BN.
- Use MCMC over equivalence-class partitions.
- Exact inference over the resulting BN defines the acceptance probability for the Markov chain.
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]
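A schematic sketch of the MCMC loop over partitions (mine, heavily simplified): `score` stands for the exact-inference log-likelihood of a partition hypothesis, and `propose_merge_or_split` is a hypothetical helper proposing a random merge or split of clusters.

```python
import math, random

def mcmc_over_partitions(citations, score, n_steps=10_000):
    """Metropolis-Hastings over partitions of citations into papers.
    score(partition) returns log P(data, partition), computed by exact
    inference in the BN that the partition hypothesis defines."""
    partition = [{c} for c in citations]          # start: all singletons
    logp = score(partition)
    for _ in range(n_steps):
        proposal = propose_merge_or_split(partition)   # hypothetical helper
        logp_new = score(proposal)
        # Accept with probability min(1, P(new)/P(old)); assumes a
        # symmetric proposal, so the proposal ratio cancels.
        if math.log(random.random()) < logp_new - logp:
            partition, logp = proposal, logp_new
    return partition
```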

Identity Uncertainty: Results
Accuracy of citation recovery: % of actual citation clusters recovered perfectly.
[Chart: 61.5% relative reduction in error relative to the state of the art.]
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]

Case Study II: 3D Objects
- Object registration
- Part finding
- Shape modeling
- Scene segmentation

3D Scene Understanding
Goal: understand 3D data in terms of objects and relations.
[Figure: a scan interpreted as a "puppet holding stick".]

3D Object Models
- Object models: discover object parts; model pose variation in terms of parts.
- Class models: model shape variation within a class.
- Models are learned from data.
[Figure: examples of pose variation and shape variation.]

The Dataset
Cyberware scans: 4 views, ~125k polygons, ~65k points each, with missing surfaces.
[Figure: two scan collections, of 70 scans and 48 scans.]

Standard Modeling Pipeline
1. Articulated template
2. Fit template to scans
3. Interpolation
A lot of human intervention; pose or body-shape deformations are modeled, but not both.
Similar to: [Lewis et al. '00], [Sloan et al. '01], [Mohr, Gleicher '03], …
[Allen, Curless, Popovic 2002]

Data Preprocessing: Registration
Task: establish correspondences between two surfaces.
[Anguelov et al., 2004]

Generative Model
Goal: given model mesh X and data mesh Z, recover the transformation and the correspondences C.
- Deformation/transformation maps the model mesh X to a transformed mesh X'.
- Data generation: correspondence c_k specifies which point x'_i generated point z_k.
[Anguelov, Srinivasan, K., Thrun, Pang, Davis, 2004]

Standard Method: Nonrigid ICP
Correspondences for different points (c_1, c_2, …) are computed independently, yielding poor correspondences and hence poor transformations.
[Figure: X and Z with independently chosen correspondences c_1, c_2.]

Geodesic Potentials: close → close
Nearby points in Z must be nearby in X; correlates pairs of adjacent points z_k, z_l.

Geodesic Potentials: far → far
Distant points in Z must be distant in X; correlates pairs of distant points z_k, z_l.
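A toy sketch (my own, with an arbitrary threshold and hard 0/1 potentials; the actual model uses soft potentials) of what such a pairwise geodesic constraint looks like:

```python
def geodesic_potential(xk, xl, zk, zl, dist_X, dist_Z, ratio=2.0):
    """0/1 compatibility of mapping scan points zk, zl to model points
    xk, xl. dist_X, dist_Z are precomputed geodesic-distance lookups.
    The hard cutoff and fixed `ratio` are simplifications of mine."""
    d_z = dist_Z[zk][zl]
    d_x = dist_X[xk][xl]
    close_close = d_x <= ratio * d_z    # nearby in Z -> nearby in X
    far_far = d_x >= d_z / ratio        # distant in Z -> distant in X
    return 1.0 if (close_close and far_far) else 0.0
```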

Collective Correspondence Model
- One correspondence variable C_i per scan point z_i, ranging over model points 1 … N.
- Appearance potential φ(C_i): compares the local appearance of the scan point to that of the candidate model point.
- Deformation potential φ(C_i, C_j): links the correspondence variables of related scan points.
- Label all points collectively, maximizing the joint label probability.

Inference is Hard!
- Large model, many edges: exact inference is intractable.
- Loopy belief propagation: an approximate inference algorithm that passes messages between nodes in the graph. It often works fairly well in practice, but it doesn't always converge, and when it does, the convergence point is not always very good.
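For concreteness, a bare-bones sketch of mine of sum-product loopy BP on a pairwise MRF; `unary` and `pairwise` are plain numpy tables:

```python
import numpy as np

def loopy_bp(unary, pairwise, n_iters=50):
    """Sum-product loopy BP on a pairwise MRF.
    unary: {node: (K,) array}; pairwise: {(i, j): (Ki, Kj) array}."""
    msgs = {}
    for (i, j) in pairwise:
        msgs[(i, j)] = np.ones(len(unary[j]))
        msgs[(j, i)] = np.ones(len(unary[i]))
    for _ in range(n_iters):
        for (i, j) in list(msgs):
            # Product of i's unary and all messages into i, except from j.
            inc = unary[i].copy()
            for (k, l) in msgs:
                if l == i and k != j:
                    inc = inc * msgs[(k, l)]
            pot = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            m = pot.T @ inc
            msgs[(i, j)] = m / m.sum()       # normalize for stability
    beliefs = {}
    for i, u in unary.items():
        b = u.copy()
        for (k, l) in msgs:
            if l == i:
                b = b * msgs[(k, l)]
        beliefs[i] = b / b.sum()
    return beliefs
```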

Inference is Hard! (cont.)
In our case:
- With a very fine mesh, loopy BP can converge to a poor result. Use a coarse-grained mesh, and then refine.
- There are O(n^2) "farness" constraints, and inference in the resulting fully connected model is completely intractable. Most constraints are never relevant, so add constraints only as needed.

Results: Pose Variation [Anguelov, Srinivasan, K., Thrun, Pang, Davis, 2004]

Results: Shape Variation [Anguelov, Srinivasan, K., Thrun, Pang, Davis, 2004]

Recovering Articulated Models
Input: models, correspondences. Output: rigid parts, skeleton.
[Anguelov, Koller, Pang, Srinivasan, Thrun '04]

Recovering Articulation: State of the Art
- The algorithm assigns points to parts independently, ignoring the correlations between the assignments; it is prone to local minima.
- Each joint is estimated from a separate sequence, and the results are combined.
[Figure: a skeleton with 9 parts.]
[Cheung et al., '03]

Recovering Articulation
Stages of the process:
1. Register meshes using the Correlated Correspondences algorithm.
2. Cluster the surface into rigid parts.
3. Estimate joints.
[Anguelov et al. '04]

Collective Clustering
Examples:
- Given only students' grades, cluster similar students.
- Given a set of 3D meshes, cluster "related" points.
[Figure: the same probabilistic relational model pipeline (Course, Student, Reg), now applied to unlabeled relational data to learn a clustering of instances.]

Learning w. Missing Data: EM
- Learn a joint probabilistic model with hidden variables; the EM algorithm applies essentially unchanged.
- E-step: compute expected sufficient statistics, aggregated over all objects in a class.
- M-step: ML (or MAP) parameter estimation.
- Key difference: in general, the hidden variables are not independent, so computing the expected sufficient statistics requires inference over the entire network.
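Schematically (a sketch of mine; `expected_sufficient_stats` and `max_likelihood_params` are hypothetical stand-ins for whatever inference and estimation routines the model provides):

```python
def relational_em(model, data, n_iters=20):
    """Generic EM loop for a relational model with hidden variables.
    The E-step is the expensive part: expected sufficient statistics
    must come from inference over the whole ground network, because
    the hidden variables of related objects are coupled."""
    params = model.init_params()
    for _ in range(n_iters):
        # E-step: one inference pass over the entire ground network,
        # aggregating statistics across all objects in each class.
        ess = model.expected_sufficient_stats(params, data)  # hypothetical API
        # M-step: closed-form ML (or MAP) estimates from the aggregated ESS.
        params = model.max_likelihood_params(ess)            # hypothetical API
    return params
```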

Learning w. Missing Data: EM (example)
[Figure: EM on the students-and-courses network for P(Registration.Grade | Course.Difficulty, Student.Intelligence), with hidden difficulty (easy/hard) and intelligence (low/high) and observed grades A, B, C.]
[Dempster et al. 77]

Collective Clustering Model
[Figure: template model for part discovery: each Model Point has an original position and a Part assignment; the part's Transformation determines the point's new position; each Data Point has a position and a Correspondence variable; In and Near relations connect the objects.]

Associative Markov Nets
- Variables y_i ∈ {1, …, K}; e.g., nearby pixels or laser scan points, or similar webpages.
- Node potentials φ_i and edge potentials φ_ij reward neighboring variables for taking the same label.
- For K = 2, the MAP labeling can be found exactly using min-cut.*
- For K > 2, can solve within a factor of 2 of optimal.
* Greig et al. 89, Kolmogorov & Zabih 02
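To make the K = 2 case concrete, here is a small sketch of mine of the standard min-cut construction (in the style of Greig et al.), using networkx; the numbers are toy costs (negative log-potentials):

```python
import networkx as nx

def map_binary_amn(unary, edges):
    """Exact MAP for a binary associative MN via s-t min-cut.
    unary: {i: (cost0, cost1)}; edges: {(i, j): lam} with lam >= 0,
    the penalty paid when i and j take different labels."""
    G = nx.DiGraph()
    for i, (cost0, cost1) in unary.items():
        G.add_edge("s", i, capacity=cost1)   # cut iff i is labeled 1
        G.add_edge(i, "t", capacity=cost0)   # cut iff i is labeled 0
    for (i, j), lam in edges.items():
        G.add_edge(i, j, capacity=lam)       # cut iff labels disagree
        G.add_edge(j, i, capacity=lam)
    _, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
    return {i: (1 if i in sink_side else 0) for i in unary}

# Three points: the middle one is ambiguous, but smoothness pulls it to 1.
print(map_binary_amn({0: (2.0, 0.5), 1: (1.0, 1.0), 2: (2.0, 0.5)},
                     {(0, 1): 1.5, (1, 2): 1.5}))   # -> all labeled 1
```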

Finding Parts
- Hidden variables: assignment of points to parts.
- Parameters: part transformations.
- Optimize using the EM algorithm; the E-step is performed efficiently using the min-cut algorithm.
- The number of clusters is determined automatically.
[Anguelov, K., Pang, Srinivasan, Thrun, 2004]

Results: Puppet Articulation [Anguelov, K., Pang, Srinivasan, Thrun, 2004]

Results: Arm Articulation [Anguelov, K., Pang, Srinivasan, Thrun, 2004]

Results: 50 Human Scans
Tree-shaped skeleton found; rigid parts found.
[Anguelov, K., Pang, Srinivasan, Thrun, 2004]

Modeling Human Deformation
Each deformed polygon is obtained from the template polygon via three factors:
- pose deformation, predicted from nearby joint angles
- body-shape deformation, modeled by a linear subspace (PCA)
- rigid part rotation
[Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

Pose Deformation
A regression function maps joint angles (input) to deformations (output).
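A minimal sketch of fitting such a map with linear regression (mine; the real model learns a separate regression per triangle, and the data here is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
angles = rng.uniform(-1, 1, size=(70, 3))        # joint angles per scan
# Stand-in target: the 9 entries of one triangle's 3x3 deformation matrix.
deform = angles @ rng.normal(size=(3, 9)) + 0.01 * rng.normal(size=(70, 9))

# Deformation = f(nearby joint angles), fit by least squares.
reg = LinearRegression().fit(angles, deform)
print(reg.predict(angles[:1]).shape)             # (1, 9)
```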

Pose deformation [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

Body Shape Deformation
A low-dimensional subspace (PCA) maps shape coefficients (input) to body-shape deformations (output).
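Again as an illustrative sketch (mine, with synthetic stand-in data), using PCA to obtain the low-dimensional shape space:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in data: one body-shape deformation vector per subject.
shape_deforms = rng.normal(size=(48, 9000))

pca = PCA(n_components=20)                  # low-dimensional shape space
betas = pca.fit_transform(shape_deforms)    # each subject -> 20-dim coeffs
recon = pca.inverse_transform(betas[:1])    # coeffs -> full deformation
print(betas.shape, recon.shape)             # (48, 20) (1, 9000)
```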

Shape Deformation Model

Shape Transfer [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

Shape Completion
Given sparse surface markers, find the most probable surface w.r.t. the model:
- joint angles / rotations R
- body shape, as coefficients in the PCA space
Output: a completed surface.
[Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

Partial View Completion [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

Motion Capture Animation [Anguelov, Srinivasan, K., Thrun, Rodgers, Davis, 2005]

Segmentation
- Train a model to assign points to parts; discriminative training using pre-segmented images.
- Collective classification: neighboring points are more likely to be assigned to the same part.
- Use an associative Markov network, with min-cut for inference.
[Anguelov, Taskar, Chatalbashev, Gupta, K., Heitz, Ng, 2005]

3D Mapping
- Sensors: laser range finder, GPS, IMU.
- Labels: ground, building, tree, shrub.
- Training: 30 thousand points; testing: 3 million points.
Data provided by Michael Montemerlo & Sebastian Thrun.

Segmentation Results
Hand-labeled 180K test points.
Model    Accuracy
SVM      68%
V-SVM    73%
AMN      93%

Segmentation Results Comparison
Labels: head, torso, legs, background.
Model    Accuracy
SVM      86.5%
V-SVM    87.2%
AMN      94.4%
[Figure: RMN vs. SVM vs. RMN without links.]
[Anguelov, Taskar, Chatalbashev, Gupta, K., Heitz, Ng, 2005]

Case Study III: Cellular Networks
- Discovering regulatory networks from gene expression data
- Predicting protein-protein interactions

Model-Based Approach
Biological processes are about objects & relations.
- Classes of objects: genes, experiments, tissues, patients.
- Properties: observed (gene sequence, experiment conditions) and hidden (gene function).
- Relations: gene regulation, protein-protein interactions.

Biology 101: Gene Expression
[Figure: DNA → RNA → protein; each gene has a control region and a coding region; the transcription factor Swi5 binds the control region of a gene to regulate its expression.]
Cells express different subsets of their genes in different tissues and under different conditions.

Gene Expression Microarrays
- Measure the mRNA level for all genes in one condition.
- Hundreds of experiments; highly noisy.
[Figure: genes × experiments matrix; entry (i, j) is the expression of gene i in experiment j, shown as induced or repressed.]

Gene Regulation Model
The expression level in each module is a function of the expression of its regulators.
[Figure: each gene g has a module assignment; given the module and the expression levels of Regulator 1, Regulator 2, and Regulator 3 in an experiment, the model predicts the gene's expression level.]
Segal et al. (Nature Genetics, 2003)

Module Networks
Goal: discover regulatory modules and their regulators.
- Module genes: a set of genes that are similarly controlled.
- Regulation program: expression as a function of the regulators.
[Figure: a regulation program as a decision tree with tests on HAP4 and CMK1 and true/false branches.]
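The regulation program is essentially a small regression tree over regulator expression; an illustrative sketch of mine, with synthetic data and a generic tree learner:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Columns: expression of candidate regulators (e.g., HAP4, CMK1) across
# experiments; target: mean expression of the module's genes.
regulators = rng.normal(size=(200, 2))
module_expr = np.where(regulators[:, 0] > 0,          # "HAP4 induced?"
                       np.where(regulators[:, 1] > 0, 1.5, 0.2),
                       -1.0) + 0.1 * rng.normal(size=200)

program = DecisionTreeRegressor(max_depth=2).fit(regulators, module_expr)
print(program.predict([[1.0, -1.0]]))    # predicted module expression
```

In the full algorithm, this program-learning step alternates with reassigning genes to modules, EM-style; the sketch shows only the program half.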

Global Module Map
- Are the module genes functionally coherent?
- Are some module genes known targets of the predicted regulators?
[Figure: global module map; regulators include Hap4, Xbp1, Yer184c, Yap6, Gat1, Ime4, Lsg1, Msn4, Gac1, Gis1, Ypl230w, Not3, Sip, Kin82, Cmk1, Tpk1, Ppt, Tpk2, Pph, Bmh1, Gcn; coherence counts of the form …/50 and 30/50.]
Segal et al. (Nature Genetics, 2003)

Wet Lab Experiments Summary
- 3/3 regulators regulate the computationally predicted genes.
- New yeast biology suggested:
  - Ypl230w activates protein-folding, cell-wall, and ATP-binding genes.
  - Ppt1 represses phosphate metabolism and rRNA processing.
  - Kin82 activates energy and osmotic-stress genes.
Segal et al. (Nature Genetics, 2003)

Human Data
Human is a lot more complicated than yeast: more genes, more regulators, more noise, and less ability to perturb the system. How do we identify "real" regulatory relationships?
Idea: use comparative genomics.
- "Accidental" relationships in expression data are uncorrelated in different species.
- Relevant relationships confer a selective advantage, and are likely maintained.
Goal: discover regulatory modules that are shared across organisms.

Conserved Gene Regulation Model
- Two coupled module networks, one per organism, each with Module, Experiment, Expression Level, and regulators (Regulator 1, 2, 3).
- Orthologs are more likely to be in the same module.
- Regulation programs for the same module are more likely to share regulators.
Goal: discover regulators that are shared across organisms.

Conserved Regulation: Data
- Human (90 arrays): normal brain (4), medulloblastoma (60), other brain tumors (26: gliomas, AT/RT, PNETs).
- Mouse (43 arrays): normal brain (20), medulloblastoma (23).
- 3718 human-mouse orthologous gene pairs measured in both human & mouse microarrays.
- 604 candidate regulators based on GO annotations, including both transcription factors & signaling proteins.

Does Adding Mouse Data Help?
[Chart: improvement in expression-prediction accuracy, measured as test-data log-likelihood (gain per gene) on unseen arrays for human*, as a function of C, the bonus for assigning orthologs to corresponding modules, relative to a human-only module network. * Similar results for mouse.]
By combining expression data from two species, we learn a better model of gene regulation in each.

Conserved Cyclin D Module
- 34/38 (human) and 34/40 (mouse) genes are shared.
- Significant split on medulloblastoma for both human (p < 0.02) and mouse (p < …), and poor survival in human (p < 0.03).
[Figure: human: 17/22 MB vs. 2/11 MB; mouse: 23/24 MB vs. 0/19 MB.]

Cyclin D1 & Medulloblastoma
Cyclin D1 is known to be an important mediator of Shh-induced proliferation and tumorigenesis in medulloblastoma (Oliver, 2003).
[Figure: human and mouse data.]

Conclusion

Under the Hood: Representation
- The "correct" modeling granularity is key: too simple, and we miss important structure; too rich, and the model cannot be identified from the data.
- Relational models provide significant value: exploiting correlations between related instances, and integrating across multiple data sources and multiple levels of abstraction.

Under the Hood: Inference
- Huge graphical models (…,000 hidden variables, often very densely connected): exact inference is intractable.
- Often use belief propagation, but additional ideas are key to scaling and convergence: "smart" initialization; hierarchical models, coarse to fine; incremental inference, with gradual introduction of constraints; pruning the space using heuristics.
- Important to identify & exploit additional structure, e.g., the use of min-cut for segmentation.

Under the Hood: Learning
- Relational models are inherently based on reuse: fewer parameters to estimate, more identifiable.
- Algorithmic ideas are key to good accuracy: phased learning of model components; discriminative training using the max-margin approach.
- Important to identify & exploit additional structure, e.g., convex optimization (relaxed linear programming) for discriminative training of certain Markov networks.

The Web of Influence
- The world contains many different types of objects, spanning multiple scales.
- Objects are related in complex networks.
"When we try to pick out anything by itself, we find that it is bound fast by a thousand invisible cords that cannot be broken, to everything in the universe." (John Muir, 1869)
It is this "web of influence" that provides powerful clues for understanding the world.