Reverse engineering gene and protein regulatory networks using Graphical Models. A comparative evaluation study. Marco Grzegorczyk Dirk Husmeier Adriano.

Slides:



Advertisements
Similar presentations
Systems biology SAMSI Opening Workshop Algebraic Methods in Systems Biology and Statistics September 14, 2008 Reinhard Laubenbacher Virginia Bioinformatics.
Advertisements

Bayesian network for gene regulatory network construction
DREAM4 Puzzle – inferring network structure from microarray data Qiong Cheng.
Bayesian Network and Influence Diagram A Guide to Construction And Analysis.
Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
The IMAP Hybrid Method for Learning Gaussian Bayes Nets Oliver Schulte School of Computing Science Simon Fraser University Vancouver, Canada
Mechanistic models and machine learning methods for TIMET Dirk Husmeier.
. Inferring Subnetworks from Perturbed Expression Profiles D. Pe’er A. Regev G. Elidan N. Friedman.
Introduction of Probabilistic Reasoning and Bayesian Networks
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
Goal: Reconstruct Cellular Networks Biocarta. Conditions Genes.
Simulation and Application on learning gene causal relationships Xin Zhang.
Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.
6. Gene Regulatory Networks
1 gR2002 Peter Spirtes Carnegie Mellon University.
Bayesian Networks Alan Ritter.
Causal Models, Learning Algorithms and their Application to Performance Modeling Jan Lemeire Parallel Systems lab November 15 th 2006.
Inferring subnetworks from perturbed expression profiles Dana Pe’er, Aviv Regev, Gal Elidan and Nir Friedman Bioinformatics, Vol.17 Suppl
Gaussian Processes for Transcription Factor Protein Inference Neil D. Lawrence, Guido Sanguinetti and Magnus Rattray.
Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli.
Statistical Bioinformatics QTL mapping Analysis of DNA sequence alignments Postgenomic data integration Systems biology.
A Brief Introduction to Graphical Models
Cis-regulation Trans-regulation 5 Objective: pathway reconstruction.
Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.
Reverse Engineering of Genetic Networks (Final presentation)
Using Bayesian Networks to Analyze Expression Data By Friedman Nir, Linial Michal, Nachman Iftach, Pe'er Dana (2000) Presented by Nikolaos Aravanis Lysimachos.
Probabilistic Models that uncover the hidden Information Flow in Signalling Networks.
Data Analysis with Bayesian Networks: A Bootstrap Approach Nir Friedman, Moises Goldszmidt, and Abraham Wyner, UAI99.
Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk.
Learning regulatory networks from postgenomic data and prior knowledge Dirk Husmeier 1) Biomathematics & Statistics Scotland 2) Centre for Systems Biology.
Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.
Inferring gene regulatory networks from transcriptomic profiles Dirk Husmeier Biomathematics & Statistics Scotland.
2 Syntax of Bayesian networks Semantics of Bayesian networks Efficient representation of conditional distributions Exact inference by enumeration Exact.
Using Bayesian Networks to Analyze Whole-Genome Expression Data Nir Friedman Iftach Nachman Dana Pe’er Institute of Computer Science, The Hebrew University.
Learning Linear Causal Models Oksana Kohutyuk ComS 673 Spring 2005 Department of Computer Science Iowa State University.
Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Course files
Learning the Structure of Related Tasks Presented by Lihan He Machine Learning Reading Group Duke University 02/03/2006 A. Niculescu-Mizil, R. Caruana.
Learning With Bayesian Networks Markus Kalisch ETH Zürich.
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
Inferring gene regulatory networks with non-stationary dynamic Bayesian networks Dirk Husmeier Frank Dondelinger Sophie Lebre Biomathematics & Statistics.
Reconstructing gene regulatory networks with probabilistic models Marco Grzegorczyk Dirk Husmeier.
Learning Bayesian networks from postgenomic data with an improved structure MCMC sampling scheme Dirk Husmeier Marco Grzegorczyk 1) Biomathematics & Statistics.
Inferring gene regulatory networks from transcriptomic profiles Dirk Husmeier Biomathematics & Statistics Scotland.
Review: Bayesian inference  A general scenario:  Query variables: X  Evidence (observed) variables and their values: E = e  Unobserved variables: Y.
MCMC in structure space MCMC in order space.
Lecture 2: Statistical learning primer for biologists
De novo discovery of mutated driver pathways in cancer Discussion leader: Matthew Bernstein Scribe: Kun-Chieh Wang Computational Network Biology BMI 826/Computer.
Reverse engineering of regulatory networks Dirk Husmeier & Adriano Werhli.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
04/21/2005 CS673 1 Being Bayesian About Network Structure A Bayesian Approach to Structure Discovery in Bayesian Networks Nir Friedman and Daphne Koller.
Introduction on Graphic Models
Mechanistic models and machine learning methods for TIMET
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.
Journal club Jun , Zhen.
Incorporating graph priors in Bayesian networks
Learning gene regulatory networks in Arabidopsis thaliana
1 Department of Engineering, 2 Department of Mathematics,
A Short Tutorial on Causal Network Modeling and Discovery
1 Department of Engineering, 2 Department of Mathematics,
Estimating Networks With Jumps
1 Department of Engineering, 2 Department of Mathematics,
Regulation Analysis using Restricted Boltzmann Machines
Network Inference Chris Holmes Oxford Centre for Gene Function, &,
Presentation transcript:

Reverse engineering gene and protein regulatory networks using Graphical Models. A comparative evaluation study. Marco Grzegorczyk Dirk Husmeier Adriano Werhli

Systems biology Learning signalling pathways and regulatory networks from postgenomic data

possibly completely unknown

E.g.: Flow cytometry experiments data Here: Concentrations of (phosphorylated) proteins

possibly completely unknown E.g.: Flow cytometry experiments data Machine Learning statistical methods

true network extracted network Is the extracted network a good prediction of the real relationships?

true network extracted network biological knowledge (gold standard network) Evaluation of learning performance

Reverse Engineering of Regulatory Networks Can we learn network structures from postgenomic data themselves? Are there statistical methods to distinguish between direct and indirect correlations? Is it worth applying time-consuming Bayesian network approaches although computationally cheaper methods are available? Do active interventions improve the learning performances? –Gene knockouts (VIGs, RNAi)

direct interaction common regulator indirect interaction co-regulation

Reverse Engineering of Regulatory Networks Can we learn network structures from postgenomic data themselves? Are there statistical methods to distinguish between direct and indirect correlations? Is it worth applying time-consuming Bayesian network approaches although computationally cheaper methods are available? Do active interventions improve the learning performances? –Gene knockouts (VIGs, RNAi)

Relevance networks Graphical Gaussian models Bayesian networks Three widely applied methodologies:

Relevance networks Graphical Gaussian models Bayesian networks

Relevance networks (Butte and Kohane, 2000) 1.Choose a measure of association A(.,.) 2.Define a threshold value t A 3.For all pairs of domain variables (X,Y) compute their association A(X,Y) 4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value t A

Relevance networks (Butte and Kohane, 2000)

1.Choose a measure of association A(.,.) 2.Define a threshold value t A 3.For all pairs of domain variables (X,Y) compute their association A(X,Y) 4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value t A

direct interaction common regulator indirect interaction co-regulation

Pairwise associations without taking the context of the system into consideration direct interaction pseudo-correlations between A and B E.g.:Correlation between A and C is disturbed (weakend) by the influence of B

12 X 21 X 21 ‘direct interaction’ ‘common regulator’ ‘indirect interaction’ X strong correlation σ 12

Relevance networks Graphical Gaussian models Bayesian networks

Graphical Gaussian Models direct interaction Partial correlation, i.e. correlation conditional on all other domain variables Corr(X 1,X 2 |X 3,…,X n ) But usually: #observations < #variables strong partial correlation π 12

Shrinkage estimation of the covariance matrix (Schäfer and Strimmer, 2005) with where 0<λ 0 <1 estimated (optimal) shrinkage intensity, is guaranteed

direct interaction common regulator indirect interaction co-regulation

Graphical Gaussian Models direct interaction common regulator indirect interaction P(A,B)=P(A)·P(B) But: P(A,B|C)≠P(A|C)·P(B|C)

Further drawbacks Relevance networks and Graphical Gaussian models can extract undirected edges only. Bayesian networks promise to extract at least some directed edges. But can we trust in these edge directions? It may be better to learn undirected edges than learning directed edges with false orientations.

Relevance networks Graphical Gaussian models Bayesian networks

A CB D EF NODES EDGES Marriage between graph theory and probability theory. Directed acyclic graph (DAG) represents conditional independence relations. Markov assumption leads to a factorization of the joint probability distribution:

Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependency relations - not necessarily causal interactions.

Bayesian networks versus causal networks A CB A CB True causal graph Node A unknown

Bayesian networks A CB D EF NODES EDGES Marriage between graph theory and probability theory. Directed acyclic graph (DAG) represents conditional independence relations. Markov assumption leads to a factorization of the joint probability distribution:

Bayesian networks Parameterisation: Gaussian BGe scoring metric: data~N(μ,Σ) with normal-Wishart distribution of the (unknown) parameters, i.e.: μ~N(μ*,(vW) -1 ) and W~Wishart(T 0 )

Bayesian networks BGe metric: closed form solution

Learning the network structure n46810 #DAGs5433,7· ,8 · ,2 ·10 18 Idea: Heuristically searching for the graph M* that is most supported by the data P(M*|data)>P(graph|data), e.g. greedy search graph → score BGe (graph)

MCMC sampling of Bayesian networks Better idea: Bayesian model averaging via Markov Chain Monte Carlo (MCMC) simulations Construct and simulate a Markov Chain (M t ) t in the space of DAGs {graph} whose distribution converges to the graph posterior distribution as stationary distribution, i.e.: P(M t =graph|data) → P(graph|data) t → ∞ to generate a DAG sample: G 1,G 2,G 3,…G T

Order MCMC (Friedman and Koller, 2003) Order MCMC generates a sample of node orders from which in a second step DAGs can be sampled: ABDEFC ABDEFC AFDEBC AFDEAC → DAG Acceptance probability (Metropolis Hastings) G 1,G 2,G 3,…G T DAG sample

Equivalence classes of BNs A B C A B A B A B C C C A B C completed partially directed graphs (CPDAGs) A C B v-structure P(A,B)=P(A)·P(B) P(A,B|C) ≠ P(A|C)·P(B|C) P(A,B)≠P(A)·P(B) P(A,B|C)=P(A|C)·P(B|C)

CPDAG representations A BC D EF DAGs CPDAGs Utilise the CPDAG sample for estimating the posterior probability of edge relation features: where I(G i ) is 1 if the CPDAG G i contains the directed edge A → B, and 0 otherwise A BC D EF

CPDAG representations A BC D EF A BC D EF DAGs CPDAGs Utilise the DAG (CPDAG) sample for estimating the posterior probability of edge relation features: where I(G i ) is 1 if the CPDAG of G i contains the directed edge A → B, and 0 otherwise A BC D EF interpretation superposition

Interventional data AB AB AB inhibition of A AB down-regulation of Bno effect on B A and B are correlated

Evaluation of Performance Relevance networks and Graphical Gaussian models extract undirected edges (scores = (partial) correlations) Bayesian networks extract undirected as well as directed edges (scores = posterior probabilities of edges) Undirected edges can be interpreted as superposition of two directed edges with opposite direction. How to cross-compare the learning performances when the true regulatory network is known? Distinguish between DGE (directed graph evaluation) and UGE (undirected graph evaluation)

data Probabilistic inference - DGE true regulatory network Thresholding edge scores TP:1/2 FP:0/4 TP:2/2 FP:1/4 concrete network predictions lowhigh

data Probabilistic inference - UGE undirected edge scores add up scores of directed edges with opposite direction skeleton of true regulatory network

data Probabilistic inference - UGE skeleton of true regulatory network undirected edge scores add up scores of directed edges with opposite direction + + +

data Probabilistic inference skeleton of true regulatory network undirected edge scores

data Probabilistic inference skeleton of true regulatory network Thresholding undirected edge scores TP:1/2 FP:0/1 TP:2/2 FP:1/1 highlow concrete network (skeleton) predictions

Evaluation 1: AUC scores Area under Receiver Operator Characteristic (ROC) curve AUC=0.5AUC=10.5<AUC≤1 sensitivity inverse specificity

Evaluation 2: TP scores We set the threshold such that we obtained 5 spurious edges (5 FPs) and counted the corresponding number of true edges (TP count).

Evaluation 2: TP scores 5 FP counts

Evaluation 2: TP scores 5 FP counts BN GGM RN

Evaluation On real experimental cytometric from the RAF signalling pathway for which a gold standard network is known On synthetic data simulated from this gold- standard network topology

Evaluation On real experimental cytometric from the RAF signalling pathway for which a gold standard network is known On synthetic data simulated from the gold- standard network topology

Evaluation: Raf signalling pathway Cellular signalling cascade which consists of 11 phosphorylated proteins and phospholipids in human immune systems cell Deregulation  carcinogenesis Extensively studied in the literature  gold standard network

‘gold standard RAF pathway‘ according to Sachs et al. (2004)

Raf pathway 11 nodes (proteins) and 20 directed edges

Data Intracellular multicolour flow cytometry experiments concentrations of 11 proteins 5400 cells have been measured under 9 different cellular conditions (cues) We decided to downsample our test data sets to 100 instances - indicative of microarray experiments

Two types of experiments

Evaluation On real experimental data, using the gold standard network from the literature On synthetic data simulated from this gold- standard network

Raf pathway

Gaussian simulated data

Netbuilder simulated data DNA + TF`s→ DNA●TF → mRNA → protein

DNA + TF A DNA●TF A DNA + TF B DNA●TF B Netbuilder simulated data Steady-state approximation KAKA KBKB

Generating data using Netbuilder tool The main idea of Netbuilder is instead of solving the steady state approximation to ODEs explicitly, we approximate them with a qualitatively equivalent combination of multiplications and sums of sigmoidal transfer functions. Netbuilder simulated data

Experimental Results

Synthetic data, observations

Synthetic data, interventions

Cytometry data, observations

Cytometry data, interventions

How can we explain the difference between synthetic and real data ?

Raf pathway

Regulation of Raf-1 by Direct Feedback Phosphorylation. Molecular Cell, Vol. 17, 2005 Dougherty et al Disputed structure of the gold- standard network

Complications with real data Interventions are not “ideal” owing to negative feedback loops. Putative negative feedback loops: Can we trust our gold-standard network?

Stabilisation through negative feedback loops inhibition

Conclusions 1 BNs and GGMs outperform RNs, most notably on Gaussian data. No significant difference between BNs and GGMs on observational data. For interventional data, BNs clearly outperform GGMs and RNs, especially when taking the edge direction (DGE score) rather than just the skeleton (UGE score) into account.

Conclusions 2 Performance on synthetic data better than on real data. Real data: more complex Real interventions are not ideal Errors in the gold-standard network

Additional analysis I: Raf pathway

CPDAGs of networks 3/20 directed edges13/16 directed edges ORIGINAL MODIFIED

ORIGINALMODIFIED Gaussian Netbuilder

Some additional analysis II

Thank you

References: Butte, A.S. and Kohane, I.S. (2003): Relevance networks: A first step toward finding genetic regulatory networks within microarray data. In Parmigiani, G., Garett, E.S., Irizarry, R.A. und Zeger, S.L. editors, The analysis of Gene Expression Data, pages Springer. Doughtery, M.K. et al. (2005): Regulation of Raf-1 by Direct Feedback Phosphorylation. Molecular Cell, 17, Friedman, N. and Koller, D. (2003): Being Bayesian about network structure. Machine Learning, 50: Madigan, D. and York, J. (1995): Bayesian graphical models for discrete data. International Statistical Review, 63: Sachs, K., Perez, O., Pe`er, D., Lauffenburger, D.A., Nolan, G.P. (2004): Protein-signaling networks derived from multiparameter single-cell data. Science, 308: Schäfer, J. and Strimmer, K. (2005): A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1): Article 32.