Published by Evelyn Mason. Modified over 8 years ago.
1
Identifying submodules of cellular regulatory networks
Guido Sanguinetti
Joint work with N.D. Lawrence and M. Rattray
2
Talk plan
Biological problem
Genome-wide inference of regulatory activity
Reducing the network
3
Biological networks
Living cells contain thousands of genes.
These interact in complex ways, forming dynamical networks (pathways).
We would like to model different processes as (approximately) independent pathways.
Can we infer pathways from gene-expression data?
4
Transcriptional networks
Some attempts have been made at inferring regulatory networks (as opposed to pathways) from expression data:
Hierarchical clustering (Eisen et al. 1998, Spellman et al. 1998)
Dynamic Bayesian networks (Friedman et al. 2000)
These approaches need very large amounts of data to achieve good results (Husmeier 2003).
5
Adding location data
Recent experimental techniques (ChIP) allow the binding of transcription factors (TFs) to target genes to be measured in vivo.
This can be viewed as a direct measurement of the wiring of the regulatory network.
The data are very noisy, and many binding events do not result in regulation.
6
Regulatory network of S. cerevisiae
Data from Lee et al., Science (2002)
7
Inferring regulatory activity
We wish to integrate location data and expression data.
This will allow us to infer which regulatory relations are confirmed by the expression data.
We use probabilistic models, as they handle the noise inherent in the system more naturally.
8
Genome-wide inference
Assume we have gene-expression measurements for N genes and we know the wiring of the regulatory network.
[Diagram: transcription factors TF1, ..., TFd connected to genes g1, g2, ..., gk, ..., gN, with edges labelled by regulatory strengths S_11, S_21, S_k1, S_2d, S_Nd]
Placing suitable prior assumptions on the TF concentrations and the regulatory strengths, we can obtain posterior estimates from the data.
9
Probabilistic model
We use a (log) linear model of regulation.
We place a state-space model (SSM) prior distribution on the TF concentrations.
We place a normal prior on the regulatory strengths.
G. Sanguinetti, N.D. Lawrence and M. Rattray, Bioinformatics, in press.
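To make the three modelling assumptions concrete, here is a minimal forward-simulation sketch in Python. It is not the authors' implementation: the function name `simulate_expression`, the Gaussian random walk used as a simple stand-in for the state-space prior, and all variances are illustrative assumptions.

```python
import random

def simulate_expression(connectivity, n_times, noise_sd=0.1, seed=0):
    """Draw toy log-expression data from a log-linear regulation model (illustrative)."""
    rng = random.Random(seed)
    n_genes = len(connectivity)
    n_tfs = len(connectivity[0])

    # State-space prior (assumed form): each TF concentration is a Gaussian random walk.
    conc = []
    for _ in range(n_tfs):
        path = [rng.gauss(0.0, 1.0)]
        for _ in range(n_times - 1):
            path.append(path[-1] + rng.gauss(0.0, 0.3))
        conc.append(path)

    # Normal prior on regulatory strengths, kept only where the wiring has an edge.
    strength = [[rng.gauss(0.0, 1.0) if connectivity[n][m] else 0.0
                 for m in range(n_tfs)] for n in range(n_genes)]

    # Log-expression = weighted sum of TF concentrations plus Gaussian noise.
    expr = [[sum(strength[n][m] * conc[m][t] for m in range(n_tfs))
             + rng.gauss(0.0, noise_sd)
             for t in range(n_times)] for n in range(n_genes)]
    return conc, strength, expr

connectivity = [[1, 0], [1, 1], [0, 1]]  # toy wiring: 3 genes, 2 TFs
conc, strength, expr = simulate_expression(connectivity, n_times=5)
```

Inference runs in the opposite direction: given `expr` and `connectivity`, one seeks posterior estimates of `conc` and `strength`.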
10
Estimating the parameters
Exact inference in this model is intractable.
Approximate inference is performed using a variational EM algorithm.
This exploits Jensen's inequality to get a bound on the log likelihood.
Under a factorization assumption on the approximating distribution q, the E-step becomes exactly solvable via fixed-point equations.
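For reference, the bound can be written out as follows. This is the standard variational bound rather than a formula taken from the slides, and the symbols are assumptions: y denotes the expression data, c the TF concentrations, b the regulatory strengths, θ the model parameters, and q the approximating distribution.

```latex
\log p(y \mid \theta)
  = \log \int q(c, b)\, \frac{p(y, c, b \mid \theta)}{q(c, b)}\, dc\, db
  \;\ge\; \int q(c, b)\, \log \frac{p(y, c, b \mid \theta)}{q(c, b)}\, dc\, db

% Factorization assumption and the resulting fixed-point updates
q(c, b) = q(c)\, q(b), \qquad
\log q(c) = \mathbb{E}_{q(b)}\!\left[\log p(y, c, b \mid \theta)\right] + \text{const}, \qquad
\log q(b) = \mathbb{E}_{q(c)}\!\left[\log p(y, c, b \mid \theta)\right] + \text{const}
```

The two updates depend on each other, which is why the E-step is solved by iterating them to a fixed point.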
11
Results
The left-hand picture shows the expression level of ACE2 in the yeast cell cycle, the middle picture shows the inferred protein concentration, and the right-hand picture shows the significance of the activities.
Location data from Lee et al. (2002), microarray data from Spellman et al. (1998)
12
Finding submodules
Knowledge of the posterior distribution of the regulatory intensities allows us to determine which regulations are significant.
We can use this information to identify approximately independent submodules of the regulatory network.
These can be used as a starting point for fine-grained modelling of small gene networks.
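One simple way to read "significant" is as a ratio of posterior mean to posterior standard deviation. The sketch below assumes Gaussian posteriors summarised by a mean and a variance per gene-TF pair; the function name `significant_edges`, the cut-off of 2.0, and the toy numbers are all hypothetical.

```python
import math

def significant_edges(post_mean, post_var, z_threshold=2.0):
    """Keep gene-TF pairs whose posterior mean is far from zero in units of posterior sd."""
    edges = []
    for (gene, tf), mean in post_mean.items():
        sd = math.sqrt(post_var[(gene, tf)])
        if sd > 0 and abs(mean) / sd >= z_threshold:
            edges.append((gene, tf))
    return sorted(edges)

# Toy posterior summaries (made-up values for illustration only).
post_mean = {("g1", "TF1"): 1.2, ("g2", "TF1"): 0.1, ("g2", "TF2"): -0.9}
post_var = {("g1", "TF1"): 0.09, ("g2", "TF1"): 0.25, ("g2", "TF2"): 0.04}
kept = significant_edges(post_mean, post_var)
```

Here the weak, uncertain link between g2 and TF1 is dropped while the other two are retained.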
13
Flood-fill
One way to reduce the density of the regulatory network is to threshold based on regulatory strength or significance level.
One can then recover submodules by exhaustive search (similar to the flood-fill algorithm in physics).
+ Conceptually easy, exhaustive
- Arbitrariness in setting threshold
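The exhaustive search can be sketched as a connected-components traversal of the thresholded network. This is a generic breadth-first flood-fill, not necessarily the exact procedure used in the talk; the function name `submodules` and the toy edge list are assumptions.

```python
from collections import defaultdict, deque

def submodules(edges):
    """Group genes and TFs into connected components of the thresholded network."""
    neighbours = defaultdict(set)
    for gene, tf in edges:
        neighbours[gene].add(tf)
        neighbours[tf].add(gene)

    seen = set()
    modules = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Flood outwards from an unvisited node until no new node is reached.
        seen.add(start)
        queue = deque([start])
        module = set()
        while queue:
            node = queue.popleft()
            module.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        modules.append(sorted(module))
    return modules

# Edges that survived thresholding (toy example).
edges = [("g1", "TF1"), ("g2", "TF1"), ("g3", "TF2"), ("g4", "TF2"), ("g4", "TF3")]
modules = submodules(edges)
```

Every node ends up in exactly one module, which is the sense in which the method is exhaustive; which modules appear still depends on the threshold that produced `edges`.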
14
Spectral clustering
Based on the observation that the presence of clusters implies an approximately block-diagonal structure for the affinity matrix.
Cluster identification is reduced to an eigenvalue problem.
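As a rough illustration of the eigenvalue formulation, the sketch below splits nodes into two groups using the sign of the second eigenvector of the symmetrically normalised affinity matrix, found by power iteration with deflation. It is a simplified two-cluster variant in plain Python, not the method used in the talk; the function name `spectral_bipartition` and the toy matrix are assumptions.

```python
import math

def spectral_bipartition(affinity, n_iter=500):
    """Two-way spectral split; assumes a symmetric affinity with positive row sums."""
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    # Symmetrically normalised affinity M = D^{-1/2} A D^{-1/2}.
    m = [[affinity[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(degree); unit-normalise it.
    norm = math.sqrt(sum(degree))
    top = [math.sqrt(d) / norm for d in degree]

    vec = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(n_iter):
        # Multiply by (M + I) / 2 so that all eigenvalues are non-negative.
        vec = [0.5 * (sum(m[i][j] * vec[j] for j in range(n)) + vec[i])
               for i in range(n)]
        # Deflate: remove the component along the leading eigenvector.
        proj = sum(t * v for t, v in zip(top, vec))
        vec = [v - proj * t for v, t in zip(vec, top)]
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]
    # The sign pattern of the second eigenvector gives the two clusters.
    return [0 if v < 0 else 1 for v in vec]

# Nearly block-diagonal toy affinity: two groups of three with weak cross links.
strong, weak = 1.0, 0.05
affinity = [[strong if (i < 3) == (j < 3) else weak for j in range(6)]
            for i in range(6)]
labels = spectral_bipartition(affinity)
```

Which group receives label 0 is arbitrary, since an eigenvector is only defined up to sign; only the grouping itself is meaningful.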
15
Spectral submodules
Use linear affinity matrix
Spectral decomposition yields very coherent submodules
Not exhaustive
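The slide does not define the linear affinity matrix. One plausible reading, assumed here, is the matrix of inner products between genes' regulatory-strength profiles, A = S Sᵀ, where row n of S holds the strengths with which each TF regulates gene n. The function name `linear_affinity` and the toy values are hypothetical.

```python
def linear_affinity(strengths):
    """Inner-product (linear kernel) affinity between rows of a genes x TFs matrix."""
    n = len(strengths)
    return [[sum(a * b for a, b in zip(strengths[i], strengths[j]))
             for j in range(n)] for i in range(n)]

# Toy strengths: genes 0 and 1 are regulated by TF 0 only, gene 2 by TF 1 only.
strengths = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
affinity = linear_affinity(strengths)
```

Genes that share no regulator get zero affinity, so groups of genes with disjoint regulators show up as blocks in `affinity`.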
16
Future directions
Extend the modelling of transcriptional regulation to incorporate nonlinear and combinatorial effects.
Model submodules in a detailed way (cf. Lawrence, Sanguinetti and Rattray, NIPS 2006).
Identify submodules in a semi-supervised fashion.