Inferring gene regulatory networks with non-stationary dynamic Bayesian networks. Dirk Husmeier, Frank Dondelinger, Sophie Lèbre. Biomathematics & Statistics Scotland.


Overview Introduction Non-homogeneous dynamic Bayesian network for non-stationary processes Flexible network structure Open problems

Can we learn signalling pathways from postgenomic data? From Sachs et al., Science, 2005.

Network reconstruction from postgenomic data

Friedman et al. (2000), J. Comp. Biol. 7. A marriage between graph theory and probability theory.

Bayes net versus ODE model.

Graph theory: a directed acyclic graph (DAG), with genes as nodes and regulatory interactions as edges, representing conditional independence relations. Probability theory: it is possible to score a network in light of the data, P(D|M), where D is the data and M the network structure. We can infer how well a particular network explains the observed data.

BGe (Linear model): [A] = w1[P1] + w2[P2] + w3[P3] + w4[P4] + noise, with target gene A regulated by P1-P4 through interaction weights w1-w4.
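The linear regulation model above can be sketched in a few lines; the gene names, weights, noise level, and sample size here are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Expression levels of four putative regulators P1..P4 over 50 samples.
P = rng.normal(size=(50, 4))
w_true = np.array([0.8, -0.5, 0.3, 0.1])   # illustrative interaction weights
noise = rng.normal(scale=0.1, size=50)

# Target gene A: weighted sum of its regulators plus Gaussian noise.
A = P @ w_true + noise

# With enough low-noise data the weights are recoverable by least squares.
w_hat, *_ = np.linalg.lstsq(P, A, rcond=None)
print(np.round(w_hat, 2))
```

This is exactly why the linear Gaussian model needs no discretization: the likelihood is computed directly on the continuous measurements.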

BDe (Nonlinear discretized model): regulators P1 and P2 act as activators or repressors, giving activation or inhibition of the target. Allow for noise: probabilities given by a conditional multinomial distribution.

Model parameters q: P(D|M) = ∫ P(D|q,M) P(q|M) dq. Integral analytically tractable!

BDe: UAI 1994 BGe: UAI 1995

Dynamic Bayesian network

Example: 2 genes → 16 different network structures. Best network: maximum score.
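The count of 16 can be verified by brute force: in a dynamic Bayesian network every edge points from time t-1 to time t, so acyclicity holds automatically and every subset of the 2 × 2 = 4 potential edges is a valid structure.

```python
from itertools import product

genes = ["X1", "X2"]

# Each gene at time t may be regulated by any gene at time t-1,
# giving 2 x 2 = 4 potential edges (self-loops across time allowed).
potential_edges = [(src, tgt) for src in genes for tgt in genes]

# Every subset of the potential edges is a valid DBN structure.
structures = [frozenset(e for e, keep in zip(potential_edges, bits) if keep)
              for bits in product([0, 1], repeat=len(potential_edges))]

print(len(structures))   # → 16
```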

Identify the best network structure Ideal scenario: Large data sets, low noise

Uncertainty about the best network structure Limited number of experimental replications, high noise

Sample of high-scoring networks

Feature extraction, e.g. marginal posterior probabilities of the edges

Sample of high-scoring networks. Feature extraction, e.g. marginal posterior probabilities of the edges: high-confidence edge, high-confidence non-edge, uncertainty about edges.

Can we generalize this scheme to more than 2 genes? In principle yes. However …

[Plot: the number of structures grows super-exponentially with the number of nodes]
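The super-exponential growth shown on that slide can be reproduced with Robinson's (1973) recursion for the number of labelled DAGs; a minimal sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labelled directed acyclic graphs on n nodes."""
    if n == 0:
        return 1
    # Robinson's recursion: inclusion-exclusion over the k nodes
    # that have no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print([num_dags(n) for n in range(1, 6)])   # → [1, 3, 25, 543, 29281]
```

Already at 10 nodes there are about 4.2 × 10^18 DAGs, which is why exhaustive scoring is replaced by sampling.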

Configuration space of network structures: find the high-scoring structures by sampling from the posterior distribution. (Figure taken from the MSc thesis by Ben Calderhead.)

Madigan & York (1995), Giudici & Castelo (2003).

Configuration space of network structures. MCMC with local changes: if the proposed structure scores at least as well, accept; otherwise accept with a probability given by the score ratio. (Figure taken from the MSc thesis by Ben Calderhead.)
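The local-move sampler can be sketched with a toy scoring function standing in for log P(D|M); the Metropolis-Hastings acceptance rule (for a symmetric proposal) is genuine, while the score, gene names, and "true" network are illustrative assumptions.

```python
import math
import random

random.seed(1)
genes = ["A", "B", "C"]
true_edges = {("A", "B"), ("B", "C")}

def log_score(net):
    # Toy stand-in for log P(D | M): penalises each edge that disagrees
    # with a hypothetical data-generating network.
    return -2.0 * len(net ^ true_edges)

candidates = [(s, t) for s in genes for t in genes if s != t]
net, best = set(), set()
for _ in range(2000):
    # Local change: toggle one randomly chosen edge (symmetric proposal).
    proposal = net ^ {random.choice(candidates)}
    log_ratio = log_score(proposal) - log_score(net)
    # Accept if the score ratio >= 1, otherwise with probability = ratio.
    if log_ratio >= 0 or random.random() < math.exp(log_ratio):
        net = proposal
    if log_score(net) > log_score(best):
        best = set(net)

print(sorted(best))
```

With a real marginal-likelihood score the same loop yields the sample of high-scoring networks used for feature extraction.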

Overview Introduction Non-homogeneous dynamic Bayesian networks for non-stationary processes Flexible network structure Open problems

Dynamic Bayesian network

Example: 4 genes, 10 time points. [Table: variables Xi,t for genes i = 1..4 at time points t1..t10]

Standard dynamic Bayesian network: homogeneous model. [Table: 4 genes × 10 time points]
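In a first-order homogeneous DBN, the time series is turned into (input, target) regression pairs, with one model shared by all time points; a small shape check (the data matrix is a placeholder):

```python
import numpy as np

# Expression matrix: 10 time points (rows) x 4 genes (columns).
data = np.arange(40, dtype=float).reshape(10, 4)

# First-order DBN: genes at time t are explained by genes at time t-1,
# so T time points yield T-1 (input, target) pairs for a single model.
inputs, targets = data[:-1], data[1:]

print(inputs.shape, targets.shape)   # → (9, 4) (9, 4)
```

The heterogeneous models that follow relax exactly this point: the pairs are divided among several component-specific models.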

Limitations of the homogeneity assumption

Our new model: heterogeneous dynamic Bayesian network. Here: 2 components. [Table: 4 genes × 10 time points]

Our new model: heterogeneous dynamic Bayesian network. Here: 3 components. [Table: 4 genes × 10 time points]

Learning with MCMC: parameters q, number of components k (here: 3), and allocation vector h.

Non-homogeneous model → non-linear model.

BGe: Linear model. [A] = w1[P1] + w2[P2] + w3[P3] + w4[P4] + noise, with target gene A regulated by P1-P4 through interaction weights w1-w4.

BDe: Nonlinear discretized model. Regulators P1 and P2 act as activators or repressors, giving activation or inhibition of the target. Allow for noise: probabilities given by a conditional multinomial distribution.

Pros and cons of the two models. Linear Gaussian model: restricted to linear processes; works on the original data → no information loss. Multinomial model: nonlinear; requires discretization → information loss.

Can we get an approximate nonlinear model without data discretization? [Plot: nonlinear relationship between x and y]

Idea: piecewise linear model. [Plot: nonlinear y(x) approximated by linear segments]
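A quick numerical illustration of the idea, with segment boundaries fixed by hand (in the model above they are inferred via changepoints); the curve and noise level are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# A nonlinear input-output relationship observed with a little noise.
x = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(x) + rng.normal(scale=0.05, size=x.size)

def sse_linear(xs, ys):
    """Sum of squared residuals of an ordinary least-squares line."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return float(np.sum((slope * xs + intercept - ys) ** 2))

# One global line versus four local lines on fixed segments.
sse_global = sse_linear(x, y)
edges = np.linspace(0.0, 2.0 * np.pi, 5)
sse_piecewise = sum(sse_linear(x[(x >= lo) & (x <= hi)],
                               y[(x >= lo) & (x <= hi)])
                    for lo, hi in zip(edges[:-1], edges[1:]))

print(sse_piecewise < sse_global)   # → True
```

The segment-wise fit captures the curvature that a single linear Gaussian model cannot, without discretizing the data.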

Inhomogeneous dynamic Bayesian network with common changepoints. [Table: 4 genes × 10 time points]

Inhomogeneous dynamic Bayesian network with node-specific changepoints. [Table: 4 genes × 10 time points]

NIPS 2009

Overview Introduction Non-homogeneous dynamic Bayesian network for non-stationary processes Flexible network structure Open problems

Non-stationarity in the regulatory process

Non-stationarity in the network structure

ICML 2010

Flexible network structure with regularization

Morphogenesis in Drosophila melanogaster Gene expression measurements over 66 time steps of 4028 genes (Arbeitman et al., Science, 2002). Selection of 11 genes involved in muscle development. Zhao et al. (2006), Bioinformatics 22

Transition probabilities: flexible structure with regularization. Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult.

Comparison with: Dondelinger, Lèbre & Husmeier; Ahmed & Xing.

Collaboration with Frank Dondelinger and Sophie Lèbre NIPS 2010

Method based on homogeneous DBNs Method based on differential equations

Sample of high-scoring networks

Feature extraction, e.g. marginal posterior probabilities of the edges

Method based on homogeneous DBNs Method based on differential equations

Overview Introduction Non-homogeneous dynamic Bayesian network for non-stationary processes Flexible network structure Open problems

Exponential versus binomial prior distribution Exploration of various information sharing options

How to deal with static data?

Change-point process Free allocation

Allocation sampler versus change-point process. Allocation sampler: more flexibility (unrestricted mixture model); not restricted to time series; higher computational costs. Change-point process: incorporates plausible prior knowledge for time series; reduced complexity; less universal, not applicable to static data.

Acknowledgements. Marco Grzegorczyk (University of Dortmund, Germany), Frank Dondelinger (Biomathematics & Statistics Scotland, United Kingdom), Sophie Lèbre (Université de Strasbourg, France).

Further details for discussion during question time

Details on exponential prior

Hierarchical Bayesian model

MCMC scheme (for symmetric proposal distributions)

Details on other priors


Partition function. Ignoring the fan-in restriction: [equation involving the number of genes].

Simulation study. We randomly generated 10 networks with 10 nodes each; the number of regulators for each node was drawn from a Poisson distribution with mean 3. Five time series segments; the number of network changes between segments was drawn from a Poisson distribution. For each segment: a time series of length 50 generated from a linear regression model, with interaction parameters drawn from N(0,1) and iid Gaussian noise from N(0,1).
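The data-generation recipe on this slide can be sketched as follows (one segment only; the random seed is arbitrary, and the cap on the number of regulators is an added assumption to keep sampling without replacement valid):

```python
import numpy as np

rng = np.random.default_rng(42)
n_genes, seg_len = 10, 50

# Number of regulators per gene ~ Poisson(mean=3), capped at n_genes - 1.
parents = {g: rng.choice([h for h in range(n_genes) if h != g],
                         size=min(rng.poisson(3), n_genes - 1), replace=False)
           for g in range(n_genes)}

# Interaction parameters drawn from N(0, 1).
weights = {g: rng.normal(size=len(parents[g])) for g in range(n_genes)}

# Linear regression model: each gene at time t is a weighted sum of its
# regulators at time t - 1 plus iid N(0, 1) Gaussian noise.
X = np.zeros((seg_len, n_genes))
X[0] = rng.normal(size=n_genes)
for t in range(1, seg_len):
    for g in range(n_genes):
        X[t, g] = X[t - 1, parents[g]] @ weights[g] + rng.normal()

print(X.shape)   # → (50, 10)
```

Repeating this for five segments, with a Poisson-distributed number of edge changes between consecutive segments, reproduces the full design described above.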

Synthetic simulation study: no information sharing between adjacent segments versus information sharing between adjacent segments. Frank Dondelinger, Sophie Lèbre, Dirk Husmeier: ICML 2010.