1
Computational methods for inferring cellular networks II. Stat 877, Apr 17th, 2014. Sushmita Roy
2
RECAP from last time
– A regulatory network has structure and parameters
– Network reconstruction: identify structure and parameters from data
– Classes of methods for network reconstruction: per-gene vs per-module
– Sparse candidates is an example of a per-gene method
– Key idea: restrict the parent set to a skeleton defined by “good” candidates
– Good candidates: high mutual information OR high predictive power
3
Goals for today
– Per-module methods: module networks
– Incorporating priors in graph structure learning: combining per-gene and per-module methods
– Assessing confidence in networks
4
Module Networks
Motivation:
– Most complex systems have too many variables
– Not enough data to robustly learn dependencies among them
– Large networks are hard to interpret
Key idea: group similarly behaving variables into “modules” and learn parameters for each module.
Relevance to gene regulatory networks: genes that are co-expressed are likely regulated in similar ways. Segal et al. 2005
5
An expression module: a set of genes that behave similarly across conditions. (Figure: expression heatmap, genes × conditions, grouped into modules; Gasch & Eisen, 2002)
6
Modeling questions in Module Networks
– What is the mathematical definition of a module? All variables in a module share the same conditional probability distribution.
– How to model the CPD between parents and children? With a regression tree.
– How to learn module networks?
7
Defining a Module Network. A module network is defined by three components:
– S: the structure, specifying the parents of each module
– A: the assignment of each variable X_i to a module k
– θ: the parameters of the CPD P(M_j | Pa_Mj), where Pa_Mj are the parents of module M_j
Each variable X_i in M_j has the same conditional distribution.
8
Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME
9
Bayesian network vs Module network
– Bayesian network: one CPD per random variable; learning only requires searching for parents
– Module network: one CPD per module; learning requires both a parent search and a module membership assignment
10
Learning a Module Network
Given: a training dataset D = {x^1, …, x^N} and the number of modules
Learn: the module assignment of each X_i, the CPDs θ, and the parents of each module
11
Score of a Module network. The likelihood of the module network given the data decomposes over modules:
L(M : D) = Π_{j=1}^{K} [ Π_{X_i ∈ M_j} Π_{n=1}^{N} P(x_i[n] | Pa_Mj[n], θ_j) ]
where K is the number of modules, M_j is the j-th module, Pa_Mj are the parents of module M_j, and the inner product is the likelihood of module j.
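The decomposed score can be sketched in code. A minimal version, assuming for illustration a single Gaussian per module instead of the tree-structured, parent-conditioned CPD:

```python
import math

def module_network_loglik(assignment, module_params, data):
    """Decomposed module-network log-likelihood: every variable X_i in
    module M_j shares that module's CPD. Here each module's CPD is a
    single Gaussian (mu, sigma) for simplicity; the real model
    conditions on Pa_Mj, e.g. via a regression tree."""
    logp = 0.0
    for var, samples in data.items():
        mu, sigma = module_params[assignment[var]]
        for x in samples:
            # log N(x; mu, sigma^2), summed over variables and samples
            logp += (-0.5 * math.log(2 * math.pi * sigma**2)
                     - (x - mu)**2 / (2 * sigma**2))
    return logp
```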
12
Module network learning algorithm
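The learning algorithm alternates between parent search and module re-assignment, accepting a move only when it improves the score. A minimal sketch under illustrative assumptions: the score function is supplied by the caller, initialization is a crude round-robin stand-in for clustering, scores are recomputed in full rather than via delta scores, and full acyclicity checking is omitted:

```python
def learn_module_network(variables, data, n_modules, score, n_iters=5):
    """Toy sketch of module-network learning: alternate between
    (1) greedy parent selection per module and (2) sequential
    re-assignment of variables to modules."""
    # Initialize modules (stand-in for clustering the variables).
    assignment = {v: i % n_modules for i, v in enumerate(variables)}
    parents = {m: set() for m in range(n_modules)}
    best = score(assignment, parents, data)
    for _ in range(n_iters):
        # Greedy parent search: try adding one candidate at a time.
        for m in range(n_modules):
            for cand in variables:
                if assignment[cand] == m or cand in parents[m]:
                    continue
                parents[m].add(cand)
                s = score(assignment, parents, data)
                if s > best:
                    best = s          # keep the parent
                else:
                    parents[m].remove(cand)
        # Sequential module re-assignment of each variable.
        for v in variables:
            old = assignment[v]
            for m in range(n_modules):
                if m == old:
                    continue
                assignment[v] = m
                s = score(assignment, parents, data)
                if s > best:
                    best, old = s, m  # keep the move
                else:
                    assignment[v] = old
    return assignment, parents, best
```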
13
Module initialization: cluster the variables into initial modules for the module network
14
Module re-assignment. Two requirements:
– must preserve the acyclic structure
– must improve the score
Perform a sequential update: compute the delta score of moving a variable from one module to another while keeping all other variables fixed.
15
Module re-assignment via sequential update
16
Regression tree to capture the CPD of X3 given X1 and X2. Internal nodes test a regulator's expression against a threshold (X1 > e1? if NO, X2 > e2?). Each root-to-leaf path captures a mode of regulation of X3 by X1 and X2, and the expression of the target is modeled by a Gaussian at each leaf node.
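A regression-tree CPD like the one on this slide can be sketched as follows; the thresholds e1, e2 and the leaf means/variances are illustrative placeholders, not values from the lecture:

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def tree_cpd_logp(x3, x1, x2, e1=0.5, e2=0.5):
    """Regression-tree CPD for P(X3 | X1, X2): each root-to-leaf path
    selects one Gaussian over the target's expression."""
    if x1 > e1:                  # split on regulator X1
        mu, sigma = 2.0, 0.5     # leaf: X1 high -> target up (illustrative)
    elif x2 > e2:                # split on regulator X2
        mu, sigma = -2.0, 0.5    # leaf: X2 high -> target down (illustrative)
    else:
        mu, sigma = 0.0, 1.0     # leaf: both low -> target unchanged
    return gaussian_logpdf(x3, mu, sigma)
```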
17
Assessing the value of using Module Networks
– Generate data D from a known module network M_true (M_true was in turn learned from real data; 10 modules, 500 variables)
– Learn a module network M from D
– Assess M's quality using: test data likelihood (higher is better) and agreement in parent-child relationships between M and M_true
18
Test data likelihood (figure: each line type represents the size of the training data)
19
Recovery of graph structure
20
Module networks have better performance than a simple Bayesian network (figure: gain in test data likelihood over the Bayesian network)
21
Application of Module networks to yeast expression data Segal, Regev, Pe’er, Gasch, Nature Genetics 2005
22
The Respiration and Carbon Module Regulation tree
23
Global view of modules: modules for common processes often share common regulators and binding site motifs
24
Goals for today
– Per-module methods: module networks
– Incorporating priors in graph structure learning: combining per-gene and per-module methods
– Assessing confidence in networks
25
Per-gene vs per-module
– Per-gene methods: precise regulatory programs per gene, but no modular organization revealed/captured
– Per-module methods: modular organization → simpler representation, but gene-specific regulatory information is lost
26
Can we combine the strengths of both approaches? (Figure: a per-gene network assigns each target its own regulators; a per-module network assigns one regulator set to a whole module; MERLIN is per-gene but module-constrained.)
27
Bayesian formulation of network inference: the graph G is an unknown random variable. Optimize the posterior distribution of the graph given the data: P(G | D) ∝ P(D | G) P(G), where P(G) is the graph prior and P(D | G) is the data likelihood.
28
A prior to combine per-gene and per-module methods. Let P(G) distribute independently over edges, and define a prior probability of each edge's presence:
P(G) ∝ Π_{present edges} p_e · Π_{absent edges} (1 − p_e)
where p_e depends on a prior-strength parameter, a graph-structure-complexity (sparsity) term, and the module's support for the edge.
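One way such an edge-factored prior could look in code; the logistic form and the parameter names (`beta_complexity`, `beta_module`) are assumptions for illustration, not MERLIN's exact published functional form:

```python
import math

def edge_prior(module_support, beta_complexity=2.0, beta_module=4.0):
    """Prior probability that an edge is present, as a logistic function
    of module support. beta_complexity penalizes every edge (sparsity);
    beta_module rewards edges supported by the target's module.
    Illustrative parameterization, not MERLIN's exact form."""
    return 1.0 / (1.0 + math.exp(beta_complexity - beta_module * module_support))

def log_graph_prior(present_edges, absent_edges, support):
    """Prior factorizes independently over edges:
    P(G) = prod_present p_e * prod_absent (1 - p_e)."""
    logp = sum(math.log(edge_prior(support[e])) for e in present_edges)
    logp += sum(math.log(1 - edge_prior(support[e])) for e in absent_edges)
    return logp
```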
29
Behavior of the graph structure prior (figure: probability of an edge as a function of module support)
30
Quantifying module support: for each candidate regulator X_j of X_i, measure how strongly X_i's module supports adding X_j to X_i's regulator set
31
MERLIN: learning upstream regulators of regulatory modules. Starting from measurements across multiple conditions, expression clustering yields initial modules of target genes. Candidate regulators are transcription factors (e.g., ATF1, RAP1, …) and signaling proteins (e.g., MCK1, HOG1, …). The algorithm then iterates: update each target's regulators using the current modules, and revisit the modules using both expression and the regulatory programs, until a final reconstructed network is produced. Roy et al., PLoS Comp Bio, 2013
32
MERLIN correctly infers edges between the true and inferred networks on simulated data (methods compared: GENIE3, MERLIN, MODNET, LINEAR-REGRESSION; evaluated by precision and recall).
Precision = # of correct edges / # of predicted edges
Recall = # of correct edges / # of true edges
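The two evaluation metrics above amount to set operations on directed edge sets:

```python
def precision_recall(predicted, true):
    """Edge-set precision and recall. Edges are (regulator, target)
    pairs; direction matters."""
    predicted, true = set(predicted), set(true)
    correct = predicted & true
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(true) if true else 0.0
    return precision, recall
```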
33
Goals for today Per-module methods – Module network Incorporating priors in graph structure learning – Combining per-gene and per-module methods Assessing confidence in networks
34
Assessing confidence in the learned network. Typically the number of training samples is not sufficient to reliably determine the “right” network. One can, however, estimate the confidence of specific features of the network, i.e., graph features f(G). Examples of f(G):
– an edge between two random variables
– order relations: is X an ancestor of Y?
35
How to assess confidence in graph features? What we want is P(f(G) | D) = Σ_G f(G) P(G | D), but it is not feasible to compute this sum over all graphs. Instead we will use a “bootstrap” procedure.
36
Bootstrap to assess graph feature confidence
For i = 1 to m:
– construct dataset D_i by sampling N samples with replacement from dataset D, where N is the size of the original D
– learn a network B_i
For each feature of interest f, calculate the confidence: conf(f) = (1/m) Σ_{i=1}^{m} f(B_i)
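The procedure above can be sketched for the edge-presence feature; `learn_network` is a hypothetical stand-in for any structure learner (e.g. sparse candidates) mapping a dataset to a set of edges:

```python
import random

def bootstrap_edge_confidence(data, learn_network, m=20, seed=0):
    """Bootstrap confidence for edge features: resample N conditions
    with replacement, relearn the network each time, and report each
    edge's frequency across the m learned networks."""
    rng = random.Random(seed)
    n = len(next(iter(data.values())))            # N = original sample count
    counts = {}
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]  # sample w/ replacement
        resampled = {g: [vals[i] for i in idx] for g, vals in data.items()}
        for edge in learn_network(resampled):
            counts[edge] = counts.get(edge, 0) + 1
    return {e: c / m for e, c in counts.items()}   # conf(f) = (1/m) * sum f(B_i)
```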
37
Does the bootstrap confidence represent real relationships? Compare the confidence distribution to that obtained from randomized data: shuffle the columns (experimental conditions) of each row (gene) independently, then repeat the bootstrap procedure.
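The randomization step is a per-row permutation, which can be written as a short helper (a minimal sketch of the null model described above):

```python
import random

def randomize_expression(data, seed=0):
    """Null model for the confidence analysis: permute the conditions
    within each gene (row) independently, destroying gene-gene
    dependencies while preserving each gene's value distribution."""
    rng = random.Random(seed)
    shuffled = {}
    for gene, values in data.items():
        vals = list(values)      # copy so the input is left untouched
        rng.shuffle(vals)        # independent permutation per gene
        shuffled[gene] = vals
    return shuffled
```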
38
Bootstrap-based confidence differs between real and randomized data (figure: confidence distributions of a feature f for random vs real data)
39
Example of a high-confidence sub-network: one learned Bayesian network vs the bootstrapped-confidence Bayesian network. Highlights a subnetwork associated with yeast mating.
40
Summary
– Biological systems are complex, with many components
– Learning networks from global expression data is challenging
– We have seen three strategies: sparse candidates, module networks, and methods to assess confidence in network structure
41
Other problems in regulatory network inference
– Combining different types of datasets to improve network structure (e.g., motif and ChIP binding data)
– Modeling dynamics in networks
– Incorporating perturbations of regulatory nodes
– Integrating upstream signaling networks with transcriptional networks
– Learning context-specific networks (differential wiring)