A Probabilistic Dynamical Model for Quantitative Inference of the Regulatory Mechanism of Transcription Guido Sanguinetti, Magnus Rattray and Neil D. Lawrence.


Talk plan: Overview of the problem. Extending regression. Introducing dynamics. Modelling concentrations separately. What next?

The problem: The Central Dogma. Genes → mRNA (transcription) → Proteins (translation) → Life (protein interactions). Diagram annotations: "Easy to measure", "Hard to measure", "COMPLEX!"

Specific problem: Transcription factor (TF) genes encode proteins that promote or repress the transcription of other genes; they play a fundamental role in gene networks. The goal is to deduce the activity of the transcription factor proteins (in a given experimental condition) from mRNA expression data.

Why not use the TFs' expression levels? TFs are often expressed at low levels and their measurements are noisy. TFs are post-transcriptionally regulated. TFs interact non-trivially with each other.

Current approaches: Integrate with ChIP-on-chip data. ChIP-on-chip gives a binary matrix X recording which transcription factors bind which genes (the connectivity matrix). Regress the microarray expression data on X; the regression coefficient b_mt is the transcription factor activity (TFA) of TF m at time t, which is monotonically linked to the protein concentration (Liao et al., Boulesteix and Strimmer, Gao et al., ...).
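A minimal sketch of the regression step for a single time point, assuming the model y_n = Σ_m X_nm b_m + noise fitted by ordinary least squares. The toy connectivity matrix, the expression values, and the helper name `regress_tfa` are illustrative assumptions and do not come from the talk or from the cited papers.

```python
def regress_tfa(X, y):
    """Ordinary least squares for y ~ X b via the normal equations (X^T X) b = X^T y.

    X: list of N rows (genes), each a list of M binary entries (1 if the TF binds the gene).
    y: list of N expression values at one time point.
    Returns the list of M estimated TFAs.
    """
    n_genes, n_tfs = len(X), len(X[0])
    # Build the augmented system [X^T X | X^T y].
    A = [[sum(X[n][i] * X[n][j] for n in range(n_genes)) for j in range(n_tfs)]
         + [sum(X[n][i] * y[n] for n in range(n_genes))]
         for i in range(n_tfs)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_tfs):
        pivot = max(range(col, n_tfs), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; TFAs are not identifiable")
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(n_tfs):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * c for a, c in zip(A[row], A[col])]
    return [A[i][n_tfs] / A[i][i] for i in range(n_tfs)]


# Toy data (hypothetical): 4 genes, 2 TFs, noise-free expression.
X = [[1, 0], [1, 1], [0, 1], [1, 0]]
true_b = [1.5, -0.5]
y = [sum(x * b for x, b in zip(row, true_b)) for row in X]
b_hat = regress_tfa(X, y)
```

Because each TF has a single coefficient, every gene it binds pulls on the same estimate, whatever the type of regulation.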

Problems: All genes bound by a TF contribute equally to the estimate of its TFA, regardless of the type of regulation. TFAs are gene-independent, but the influence of a transcription factor varies from gene to gene (and with the experimental condition). The model is linear (inevitable).

Extending regression: Modify the regression model to allow different TFAs for different genes and experiments. Reduce the number of parameters by placing a prior distribution over the gene-specific TFAs. The choice of prior depends on the situation being modelled; e.g., for independent samples we may assume that TFAs at different time points are independent.
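One way to write the two models side by side, as a sketch. The symbols y_{nt} (expression of gene n at time t), \epsilon_{nt} (noise) and b_{nmt} (gene-specific TFA), together with the index ranges, are assumed notation, since the slide equations are not preserved in the transcript.

```latex
% Standard regression: one TFA per transcription factor and time point
y_{nt} = \sum_{m=1}^{M} X_{nm}\, b_{mt} + \epsilon_{nt}

% Extended model: a gene-specific TFA for each gene-TF pair
y_{nt} = \sum_{m=1}^{M} X_{nm}\, b_{nmt} + \epsilon_{nt}
```

Under this notation the extended model has up to N x M x T unknown activities instead of M x T, which is why a prior over the gene-specific TFAs is needed to keep the number of free parameters small.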

Introducing dynamics: To model time series data, we choose a Kalman filter prior on the rows of B [defining equation not preserved in the transcript]. This is equivalent to assuming that TFAs vary smoothly.
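A minimal sketch of what a prior of this kind implies, assuming a stationary first-order autoregressive process b(t) = γ b(t-1) + noise with marginal variance σ². This parameterization and the names `gamma` and `sigma2` are assumptions for illustration; the prior actually used in the talk may be parameterized differently.

```python
import random


def ar1_covariance(n_times, gamma, sigma2):
    """Covariance of a stationary AR(1) process: Cov[b(t), b(s)] = sigma2 * gamma**|t - s|."""
    return [[sigma2 * gamma ** abs(t - s) for s in range(n_times)]
            for t in range(n_times)]


def sample_ar1(n_times, gamma, sigma2, rng):
    """Draw one TFA trajectory from the stationary AR(1) prior."""
    b = [rng.gauss(0.0, sigma2 ** 0.5)]
    # Innovation standard deviation chosen so the marginal variance stays sigma2.
    step_sd = (sigma2 * (1.0 - gamma ** 2)) ** 0.5
    for _ in range(n_times - 1):
        b.append(gamma * b[-1] + rng.gauss(0.0, step_sd))
    return b


K = ar1_covariance(4, gamma=0.9, sigma2=1.0)
trajectory = sample_ar1(10, gamma=0.9, sigma2=1.0, rng=random.Random(0))
```

With γ close to 1, neighbouring time points are strongly correlated and trajectories are smooth; with γ = 0 the time points become independent, which matches the independent-samples case.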

Likelihood function: Given the model and the prior, we can obtain a likelihood [equation not preserved in the transcript]. The likelihood can be computed efficiently by exploiting the sparsity of the covariance and recursion relations.
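The recursion idea can be sketched for the simplest case: one gene with a single regulator, so that y(t) = b(t) + ε(t) with b(t) following a stationary AR(1) prior. This scalar model, the parameter names, and the two-point dense check are assumptions for illustration, not the talk's full multi-TF likelihood.

```python
import math


def kalman_loglik(y, gamma, sigma2, noise2):
    """Log-likelihood of y under y(t) = b(t) + eps(t), with b a stationary AR(1) process.

    The cost is linear in len(y), because each step only needs the previous moments.
    """
    mean, var = 0.0, sigma2          # predictive moments of the first state
    loglik = 0.0
    for obs in y:
        s = var + noise2             # predictive variance of the observation
        resid = obs - mean
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + resid ** 2 / s)
        gain = var / s               # Kalman gain
        post_mean = mean + gain * resid
        post_var = (1.0 - gain) * var
        # Propagate one step through the AR(1) dynamics.
        mean = gamma * post_mean
        var = gamma ** 2 * post_var + sigma2 * (1.0 - gamma ** 2)
    return loglik


def dense_loglik_2(y, gamma, sigma2, noise2):
    """Direct bivariate Gaussian log-density, used only to check the recursion."""
    a = sigma2 + noise2              # diagonal of the 2x2 covariance of y
    c = sigma2 * gamma               # off-diagonal entry
    det = a * a - c * c
    quad = (a * y[0] ** 2 - 2.0 * c * y[0] * y[1] + a * y[1] ** 2) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


y = [0.8, 1.1]
recursive = kalman_loglik(y, gamma=0.9, sigma2=1.0, noise2=0.25)
direct = dense_loglik_2(y, gamma=0.9, sigma2=1.0, noise2=0.25)
```

The two values agree, but the recursive form never builds or inverts the full covariance matrix.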

Estimating the TFAs: TFAs can be estimated a posteriori using Bayes' theorem and moment matching. The error bar associated with each TFA is given by the square root of the corresponding diagonal entry of the posterior covariance. Mean TFAs can be obtained by averaging the gene-specific TFAs over the target genes.
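A minimal sketch of the posterior step in a deliberately simplified setting: a single time point, a gene bound by `n_tfs` regulators, independent zero-mean Gaussian priors, and Gaussian noise. These modelling choices and the function names are assumptions for illustration; the talk's model additionally couples time points through the dynamical prior.

```python
import math


def posterior_tfa(y, n_tfs, sigma2, noise2):
    """Posterior mean and error bar of each gene-specific TFA for one gene.

    Assumed model: y = sum_m b_m + eps, b_m ~ N(0, sigma2) independently,
    eps ~ N(0, noise2). By symmetry all regulators share the same posterior
    mean and the same posterior variance.
    """
    total = n_tfs * sigma2 + noise2          # marginal variance of y
    mean = sigma2 * y / total
    var = sigma2 - sigma2 ** 2 / total       # diagonal entry of the posterior covariance
    return mean, math.sqrt(var)              # error bar = square root of that entry


def mean_tfa(expressions, n_tfs_per_gene, sigma2, noise2):
    """Average the gene-specific posterior means over a TF's target genes."""
    means = [posterior_tfa(y, m, sigma2, noise2)[0]
             for y, m in zip(expressions, n_tfs_per_gene)]
    return sum(means) / len(means)


mean, error_bar = posterior_tfa(y=2.0, n_tfs=1, sigma2=1.0, noise2=1.0)
average = mean_tfa([2.0, 1.0], [1, 1], sigma2=1.0, noise2=1.0)
```

The posterior variance is always smaller than the prior variance `sigma2`, so observing the expression narrows the error bar.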

Testing the model: We compared our averaged TFAs with those obtained by regression on the Spellman dataset (Mol. Biol. Cell, 1998), using ChIP data from Lee et al. (Science, 2002). The diagrams show the TFA for ACE2p.

...but we also get... [Figure panels: TFA for SCW11; TFA for CTS1; TFA for YER124C; TFA for YKL151C]

...and we can do more! Error bars allow us to determine which regulations are significant. Correlations among TFs can be obtained from Σ.

Gene name    Maximum TFA with error
YER124C      ACE2=1.1±0.2, FKH2=0.03±0.04
YHR143W      ACE2=1.4±0.2, FKH1=0.011±0.009, FKH2=0.03±0.04
PHO3         NDD1=1.6±0.2, FKH2=0.06±0.02
AGA1         MBP1=1.5±0.4, SWI4=1.0±0.4, MCM1=0±0.003
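As an illustration of how the error bars could be used, the sketch below flags a regulation as significant when the magnitude of the TFA exceeds a chosen multiple of its error. The numbers are the ones listed above; the two-standard-deviation threshold and the function name `significant_regulators` are assumptions, not a criterion stated in the talk.

```python
# Values listed above: gene -> {TF: (maximum TFA, error)}.
TFAS = {
    "YER124C": {"ACE2": (1.1, 0.2), "FKH2": (0.03, 0.04)},
    "YHR143W": {"ACE2": (1.4, 0.2), "FKH1": (0.011, 0.009), "FKH2": (0.03, 0.04)},
    "PHO3": {"NDD1": (1.6, 0.2), "FKH2": (0.06, 0.02)},
    "AGA1": {"MBP1": (1.5, 0.4), "SWI4": (1.0, 0.4), "MCM1": (0.0, 0.003)},
}


def significant_regulators(tfas, n_sigma=2.0):
    """Return, per gene, the TFs whose |TFA| exceeds n_sigma times the error bar."""
    return {
        gene: sorted(tf for tf, (value, err) in regs.items()
                     if abs(value) > n_sigma * err)
        for gene, regs in tfas.items()
    }


significant = significant_regulators(TFAS)
for gene, tfs in significant.items():
    print(gene, tfs)
```

With this threshold, FKH2 is not flagged for YER124C or YHR143W, although the ChIP data report binding; this is the kind of false positive that the posterior estimate can help to filter out.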

Decoupling action and concentration: In the model it is not clear whether a high gene-specific TFA results from a high affinity or from a high protein concentration. We modify the model to distinguish the effects of protein concentration and affinity. Specifically, we model [equation not preserved in the transcript].

Estimating the parameters: Inference in the model is no longer exact, so approximate inference is performed using a variational EM algorithm. This exploits Jensen's inequality to obtain a lower bound on the log likelihood. Under a factorization assumption on the approximating distribution q, the E-step becomes exactly solvable via fixed-point equations.
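The bound can be written out in generic form. Here z stands for all latent variables and θ for the model parameters; both symbols are assumed notation, and the specific latent variables of the talk's model are not spelled out in the transcript.

```latex
% Jensen's inequality applied to the log marginal likelihood
\log p(y \mid \theta)
  = \log \int q(z)\, \frac{p(y, z \mid \theta)}{q(z)}\, dz
  \;\ge\; \int q(z)\, \log \frac{p(y, z \mid \theta)}{q(z)}\, dz
  = \mathbb{E}_{q}\!\left[\log p(y, z \mid \theta)\right] + \mathrm{H}[q]

% Under a factorized q(z) = \prod_i q_i(z_i), each optimal factor satisfies
\log q_i^{*}(z_i) = \mathbb{E}_{q_{j \neq i}}\!\left[\log p(y, z \mid \theta)\right] + \text{const}
```

Each factor depends on expectations under the others, so the E-step cycles through these updates until they reach a fixed point; the M-step then maximizes the bound with respect to θ.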

Results: The left-hand picture shows the expression level of ACE2 over the yeast cell cycle, the middle one shows the inferred protein concentration, and the right-hand one shows the significance of the activities.

Problems: ChIP data is notoriously noisy; for example, the same transcription factor (MSN4) under the same conditions (rich medium) is found to bind 32 genes in Lee et al. and 57 genes in Harbison et al. (the intersection is 20 genes). Posterior estimation helps with false positives, but not with false negatives. The model is additive (in log space) and does not model combinatorial effects.
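To put the MSN4 numbers in perspective, the sketch below computes simple overlap measures from the three counts quoted above. The choice of the Jaccard index as a summary and the function name `overlap_stats` are assumptions for illustration.

```python
def overlap_stats(n_a, n_b, n_both):
    """Overlap measures for two target sets given their sizes and intersection size."""
    union = n_a + n_b - n_both
    return {
        "jaccard": n_both / union,            # shared targets / all reported targets
        "fraction_of_a_in_b": n_both / n_a,   # share of set A confirmed by set B
        "fraction_of_b_in_a": n_both / n_b,   # share of set B confirmed by set A
    }


# 32 targets in Lee et al., 57 in Harbison et al., 20 in common.
stats = overlap_stats(32, 57, 20)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Of the 69 genes reported by either study, only 20 are reported by both.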

What next? Collaborate with biologists to validate our predictions on novel data. Microarray and ChIP data from the same lab should be more consistent. Use the model results as a starting point for systems biology modelling. Introduce combinatorial effects.