Gaussian Processes for Transcription Factor Protein Inference Neil D. Lawrence, Guido Sanguinetti and Magnus Rattray.



Talk plan
Biological problem; dynamical models of gene expression; introducing GPs into the equations; linear and non-linear response; results; future extensions?

Transcription
Transcription is the process by which the genetic information stored in DNA is expressed as mRNA molecules. It is promoted or repressed by proteins known as transcription factors (TFs). TF concentrations are hard to measure, and the effect of TFs on gene expression is hard to quantify precisely. (Figure from Alberts et al., Molecular Biology of the Cell.)

Simplified model
Consider only one transcription factor binding some target genes.
[Figure: a single TF regulating target genes g1, g2, ..., gN]
Model this simplified situation in detail, turning hard experimental problems into inference tasks.

Modelling transcription
A quantitative description of transcriptional regulation can be achieved only by inference. Assume a simplified situation where one TF regulates a few targets. Let x_j(t) be the mRNA concentration of gene j at time t. Then

$$\frac{\mathrm{d}x_j(t)}{\mathrm{d}t} = B_j + S_j\, g\big(f(t)\big) - D_j\, x_j(t). \qquad (1)$$

Here B_j is the baseline expression level, S_j is the sensitivity of gene j to the TF, D_j is the decay rate of mRNA for gene j, and f(t) is the TF protein concentration. The function g determines the response of the gene to the TF. Common choices for g are linear (Barenco et al., Gen. Biol., 2006) or Michaelis-Menten (Rogers et al., MASAMB, 2006).
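As a minimal sketch of these dynamics, the following Python snippet integrates the model with forward Euler, assuming the form dx_j/dt = B_j + S_j g(f(t)) - D_j x_j(t) with a gene-specific sensitivity S_j. The function name `simulate_mrna`, the step size and all parameter values are hypothetical choices for illustration, not fitted values. With a constant TF level f, the mRNA concentration should relax to the fixed point (B_j + S_j g(f)) / D_j.

```python
def simulate_mrna(f, B, S, D, x0, t_end, dt=1e-3, g=lambda v: v):
    """Forward-Euler integration of dx/dt = B + S * g(f(t)) - D * x.

    f: TF protein profile as a function of time (hypothetical input).
    g: response function; the identity gives the linear response.
    Returns a list of (t, x) pairs starting at (0, x0).
    """
    steps = int(round(t_end / dt))
    t, x = 0.0, x0
    path = [(t, x)]
    for _ in range(steps):
        x += dt * (B + S * g(f(t)) - D * x)   # Euler step using f at the current time
        t += dt
        path.append((t, x))
    return path

# Hypothetical toy parameters: baseline B, sensitivity S, decay rate D.
B, S, D = 0.5, 2.0, 0.8
path = simulate_mrna(lambda t: 1.0, B, S, D, x0=0.0, t_end=20.0)
x_final = path[-1][1]
# For constant f = 1 and linear g, the fixed point is (B + S) / D.
print(round(x_final, 3))
```

Passing, for example, `g=lambda v: v / (0.5 + v)` gives a Michaelis-Menten-type response; the constant 0.5 there is an arbitrary illustrative value.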

Inference
Previous Bayesian approaches discretised the system (1) at the observed time points and treated the TF function values as additional parameters, with parameter estimates obtained by MCMC. This has drawbacks: it is computationally expensive, inference is limited to a few points, and it requires the production rates to be evaluated. This can be difficult, as standard techniques (e.g. polynomial interpolation) perform poorly in the presence of noise.

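GPs for linear response
Treat the system (1) as a continuous system, placing a GP prior distribution on f. In the linear case, g(f) = f, equation (1) can be solved:

$$x_j(t) = \frac{B_j}{D_j} + \Big(x_j(0) - \frac{B_j}{D_j}\Big)e^{-D_j t} + S_j\, e^{-D_j t}\int_0^t f(u)\, e^{D_j u}\,\mathrm{d}u.$$

As this is a linear operation on the function f, it follows that the mRNA levels are also governed by a GP.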
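The linearity argument can be checked numerically. Assuming the linear model dx_j/dt = B_j + S_j f(t) - D_j x_j(t), its closed-form solution is x_j(t) = B_j/D_j + (x_j(0) - B_j/D_j) e^(-D_j t) + S_j e^(-D_j t) * integral_0^t f(u) e^(D_j u) du. The sketch below evaluates this with Simpson's rule and compares it with a brute-force Euler integration of the ODE; the TF profile `f`, the helper names and all parameter values are made-up toy choices.

```python
import math

def simpson(h_fn, a, b, n=400):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = h_fn(a) + h_fn(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h_fn(a + i * h)
    return total * h / 3

def x_linear(t, f, B, S, D, x0):
    """Closed-form solution of dx/dt = B + S f(t) - D x with x(0) = x0."""
    integral = simpson(lambda u: f(u) * math.exp(D * u), 0.0, t)
    return B / D + (x0 - B / D) * math.exp(-D * t) + S * math.exp(-D * t) * integral

def x_euler(t, f, B, S, D, x0, dt=1e-4):
    """Reference forward-Euler integration of the same ODE."""
    x = x0
    for k in range(int(round(t / dt))):
        x += dt * (B + S * f(k * dt) - D * x)
    return x

f = lambda t: math.exp(-((t - 3.0) ** 2))   # toy TF profile (assumption)
B, S, D, x0 = 0.2, 1.5, 0.6, 0.0
closed = x_linear(5.0, f, B, S, D, x0)
numeric = x_euler(5.0, f, B, S, D, x0)
print(closed, numeric)
```

The solution depends on f only through a weighted integral, which is why a Gaussian process prior on f induces a Gaussian process on each x_j.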

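Kernel computations
If we define g_i(t) = ∫_0^t f(u) e^{D_i u} du, the covariance of g_i and g_j follows from the covariance k_ff of f as

$$\operatorname{cov}\big(g_i(t), g_j(t')\big) = \int_0^{t}\!\int_0^{t'} e^{D_i u}\, e^{D_j u'}\, k_{ff}(u, u')\,\mathrm{d}u'\,\mathrm{d}u,$$

so that the mRNA covariances are

$$k_{x_i x_j}(t, t') = S_i S_j\, e^{-D_i t - D_j t'}\operatorname{cov}\big(g_i(t), g_j(t')\big).$$

We can then compute the cross-covariances between the various mRNA species and the latent function,

$$k_{x_i f}(t, t') = S_i\, e^{-D_i t}\int_0^{t} e^{D_i u}\, k_{ff}(u, t')\,\mathrm{d}u.$$

For RBF priors, these integrals can be computed analytically.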
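For the RBF case the cross-covariance integral has a closed form in terms of the error function. Assuming the parameterisation k_ff(u, u') = exp(-(u - u')^2 / l^2) (a different scaling of l changes the constants) and writing gamma_i = D_i l / 2, completing the square gives k_{x_i f}(t, t') = S_i (sqrt(pi) l / 2) exp(gamma_i^2) exp(-D_i (t - t')) [erf((t - t')/l - gamma_i) + erf(t'/l + gamma_i)]. The sketch below checks this expression against midpoint-rule quadrature of the defining integral; the function names and parameter values are illustrative assumptions.

```python
import math

def k_ff(u, v, ell):
    """RBF prior covariance of f, parameterised as exp(-(u - v)^2 / ell^2) (assumed)."""
    return math.exp(-((u - v) ** 2) / ell ** 2)

def k_xf_analytic(t, tp, S, D, ell):
    """Cov[x(t), f(t')] for the linear model, closed form via erf."""
    gamma = D * ell / 2.0
    bracket = math.erf((t - tp) / ell - gamma) + math.erf(tp / ell + gamma)
    return (S * math.sqrt(math.pi) * ell / 2.0 * math.exp(gamma ** 2)
            * math.exp(-D * (t - tp)) * bracket)

def k_xf_numeric(t, tp, S, D, ell, n=2000):
    """Midpoint-rule quadrature of S e^{-Dt} int_0^t e^{Du} k_ff(u, t') du."""
    h = t / n
    acc = sum(math.exp(D * (i + 0.5) * h) * k_ff((i + 0.5) * h, tp, ell)
              for i in range(n))
    return S * math.exp(-D * t) * acc * h

for t, tp in [(1.0, 0.5), (4.0, 2.0), (2.0, 6.0)]:
    a = k_xf_analytic(t, tp, 1.2, 0.7, 1.5)
    b = k_xf_numeric(t, tp, 1.2, 0.7, 1.5)
    print(f"t={t}, t'={tp}: analytic={a:.6f}, numeric={b:.6f}")
```

As a sanity check on the formula, at t = 0 the two erf terms cancel, matching the empty integral.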

We can jointly sample from the (x, f) process. Parameter estimation can be carried out using type II maximum likelihood, and the posterior distribution for the TF concentrations is obtained by standard GP regression.
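To make the regression step concrete, here is a minimal pure-Python sketch for a single target gene. It builds the mRNA covariance K_xx (plus Gaussian observation noise) and the cross-covariance k_xf by numerical quadrature instead of the analytic RBF expressions, then computes the posterior mean and variance of f and the log marginal likelihood that type II maximum likelihood would maximise over (B, S, D, l). The hyperparameter values, noise level and measurements `x_obs` are invented toy numbers; the transient term is dropped by assuming x(0) = B/D, and the hyperparameters are held fixed rather than optimised.

```python
import math

# Toy hyperparameters (assumed values, one target gene for brevity).
ELL, S, D, B, NOISE = 1.5, 1.0, 0.8, 0.4, 0.05

def k_ff(u, v):
    """RBF prior covariance of the TF profile f (assumed parameterisation)."""
    return math.exp(-((u - v) ** 2) / ELL ** 2)

def k_xf(t, tp, n=100):
    """Cov[x(t), f(t')] = S e^{-Dt} int_0^t e^{Du} k_ff(u, t') du, midpoint rule."""
    h = t / n
    acc = sum(math.exp(D * (i + 0.5) * h) * k_ff((i + 0.5) * h, tp) for i in range(n))
    return S * math.exp(-D * t) * acc * h

def k_xx(t, tp, n=60):
    """Cov[x(t), x(t')] as a midpoint-rule double integral."""
    h1, h2 = t / n, tp / n
    acc = 0.0
    for i in range(n):
        u = (i + 0.5) * h1
        for j in range(n):
            v = (j + 0.5) * h2
            acc += math.exp(D * (u + v)) * k_ff(u, v)
    return S * S * math.exp(-D * (t + tp)) * acc * h1 * h2

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

times = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
x_obs = [0.5, 1.4, 1.9, 1.2, 0.8, 0.6]   # made-up toy measurements
n = len(times)
K = [[k_xx(ti, tj) + (NOISE if i == j else 0.0) for j, tj in enumerate(times)]
     for i, ti in enumerate(times)]
L = cholesky(K)
resid = [x - B / D for x in x_obs]       # prior mean of x is B/D (transient dropped)
alpha = chol_solve(L, resid)

# Log marginal likelihood: the objective maximised in type II ML.
log_ml = (-0.5 * sum(r * a for r, a in zip(resid, alpha))
          - sum(math.log(L[i][i]) for i in range(n))
          - 0.5 * n * math.log(2 * math.pi))

def posterior_f(ts):
    """Posterior mean and variance of f(ts) given the mRNA data."""
    kx = [k_xf(ti, ts) for ti in times]
    mean = sum(k * a for k, a in zip(kx, alpha))
    var = k_ff(ts, ts) - sum(k * w for k, w in zip(kx, chol_solve(L, kx)))
    return mean, var

for ts in [1.0, 3.0, 5.0, 7.0, 9.0]:
    m, v = posterior_f(ts)
    print(f"f({ts}): mean={m:.3f}, var={v:.3f}")
```

In practice one would use the closed-form RBF kernels and a numerical optimiser over the hyperparameters; the quadrature here only keeps the sketch self-contained.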

Nonlinear response
If the response is not a linear function (or if the prior covariance is not RBF), inference can no longer be carried out exactly in closed form. MAP-Laplace estimation of the TF profiles is still possible by functional gradient descent, and the parameters can still be optimised. Details omitted on compassionate grounds.
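Since the slide omits the details, the following is only a simplified sketch of MAP estimation under an exponential response g(f) = exp(f), the choice used later to keep TF concentrations positive. It discretises f on a grid, whitens it as f = L z with L the Cholesky factor of the RBF prior covariance, and runs plain gradient descent with backtracking on the negative log posterior. This is a stand-in for, not a reproduction of, the authors' functional-gradient MAP-Laplace procedure: it computes no Laplace covariance and optimises no model parameters. All numbers, including the measurements `y`, are made-up toy values.

```python
import math

# Toy set-up (all values assumed; one target gene, exponential response).
T, M = 10.0, 40
B, S, D, ELL, SIGMA2 = 0.4, 1.0, 0.8, 1.5, 0.01
h = T / M
grid = [(k + 0.5) * h for k in range(M)]      # midpoints where f is represented
obs_t = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.1, 2.3, 1.6, 1.0, 0.7]                 # made-up toy measurements

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

# RBF prior on the grid, with a small jitter for numerical stability.
K = [[math.exp(-((a - b) ** 2) / ELL ** 2) + (1e-4 if i == j else 0.0)
      for j, b in enumerate(grid)] for i, a in enumerate(grid)]
L = cholesky(K)

def f_of(z):
    """Whitened parameterisation f = L z, so the prior on z is N(0, I)."""
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(M)]

def predict(f):
    """x(t) = B/D + S e^{-Dt} h sum_{u_k < t} e^{D u_k} exp(f_k)."""
    return [B / D + S * math.exp(-D * t) * h *
            sum(math.exp(D * u + fk) for u, fk in zip(grid, f) if u < t)
            for t in obs_t]

def objective(z):
    """Negative log posterior (up to a constant) in whitened coordinates."""
    try:
        r = [xp - yi for xp, yi in zip(predict(f_of(z)), y)]
    except OverflowError:
        return math.inf
    return 0.5 * sum(ri * ri for ri in r) / SIGMA2 + 0.5 * sum(zk * zk for zk in z)

def gradient(z):
    f = f_of(z)
    r = [xp - yi for xp, yi in zip(predict(f), y)]
    g_f = [0.0] * M
    for t, ri in zip(obs_t, r):
        for k, (u, fk) in enumerate(zip(grid, f)):
            if u < t:
                g_f[k] += ri / SIGMA2 * S * math.exp(-D * t) * h * math.exp(D * u + fk)
    # Chain rule through f = L z, plus the gradient of the N(0, I) prior on z.
    return [sum(L[i][k] * g_f[i] for i in range(k, M)) + z[k] for k in range(M)]

z = [0.0] * M
J0 = J = objective(z)
step = 1e-2
for _ in range(200):
    g = gradient(z)
    while True:                  # backtracking: shrink the step until J decreases
        z_new = [zk - step * gk for zk, gk in zip(z, g)]
        J_new = objective(z_new)
        if J_new < J or step < 1e-12:
            break
        step *= 0.5
    if not J_new < J:
        break                    # no further progress possible
    z, J = z_new, J_new
    step = min(step * 1.5, 1.0)

f_map = f_of(z)
tf_map = [math.exp(fk) for fk in f_map]   # TF profile, positive by construction
print(f"objective {J0:.2f} -> {J:.2f}")
```

Whitening turns the prior term into 0.5 z^T z, which tends to make plain gradient descent better conditioned than optimising f directly.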

Results: data set
We used GPs to reproduce the results of Barenco et al. (Gen. Biol., 2006). The task is to infer the TF concentration profile of p53, an important tumour suppressor, from the time-series profiles of five of its target genes. The model parameters are the RBF inverse width and, for each gene, the baseline expression level, decay rate and sensitivity to p53 (1 + 3 × 5 = 16 parameters). The data consist of 6 time points on three independent cell lines (human leukemia).

Results: linear response
Inferred TF profiles using a linear response with an RBF prior (left) and an MLP prior (right).

Results: parameter estimates
[Figure panels: baseline expression levels; sensitivities to p53; decay rates]

Results: non-linear response
We imposed positivity of the TF concentrations by using an exponential response.
[Figure panels: RBF prior; MLP prior]

Future directions
The efficiency and flexibility of GPs make them ideal for inference of regulatory networks.
Include biologically relevant features such as transcriptional delays.
Extend to more than one TF, accounting for logical regulatory functions.
Extend to model spatio-temporal data.