Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli.

Slides:



Advertisements
Similar presentations
Systems biology SAMSI Opening Workshop Algebraic Methods in Systems Biology and Statistics September 14, 2008 Reinhard Laubenbacher Virginia Bioinformatics.
Advertisements

Bayesian network for gene regulatory network construction
Probabilistic models Jouni Tuomisto THL. Outline Deterministic models with probabilistic parameters Hierarchical Bayesian models Bayesian belief nets.
Network biology Wang Jie Shanghai Institutes of Biological Sciences.
Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.
Mechanistic models and machine learning methods for TIMET Dirk Husmeier.
. Inferring Subnetworks from Perturbed Expression Profiles D. Pe’er A. Regev G. Elidan N. Friedman.
Introduction of Probabilistic Reasoning and Bayesian Networks
Relational Learning with Gaussian Processes By Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S.Sathiya Keerthi (Columbia, Chicago, Cambridge, Yahoo!) Presented.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Separation of Scales. Interpretation of Networks Most publications do not consider this.
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Cs726 Modeling regulatory networks in cells using Bayesian networks Golan Yona Department of Computer Science Cornell University.
Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break 14:45 – 15:15Regulatory pathways lecture 15:15 – 15:45Exercise.
Gene Set Analysis 09/24/07. From individual gene to gene sets Finding a list of differentially expressed genes is only the starting point. Suppose we.
J. Daunizeau Wellcome Trust Centre for Neuroimaging, London, UK Institute of Empirical Research in Economics, Zurich, Switzerland Bayesian inference.
Reverse engineering gene and protein regulatory networks using Graphical Models. A comparative evaluation study. Marco Grzegorczyk Dirk Husmeier Adriano.
Goal: Reconstruct Cellular Networks Biocarta. Conditions Genes.
Simulation and Application on learning gene causal relationships Xin Zhang.
Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.
6. Gene Regulatory Networks
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Graph Regularized Dual Lasso for Robust eQTL Mapping Wei Cheng 1 Xiang Zhang 2 Zhishan Guo 1 Yu Shi 3 Wei.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Analysis of GO annotation at cluster level by H. Bjørn Nielsen Slides from Agnieszka S. Juncker.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Statistical Bioinformatics QTL mapping Analysis of DNA sequence alignments Postgenomic data integration Systems biology.
Cis-regulation Trans-regulation 5 Objective: pathway reconstruction.
Gene Set Enrichment Analysis (GSEA)
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Probabilistic Models that uncover the hidden Information Flow in Signalling Networks.
Gene Regulatory Network Inference. Progress in Disease Treatment  Personalized medicine is becoming more prevalent for several kinds of cancer treatment.
Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk.
Learning regulatory networks from postgenomic data and prior knowledge Dirk Husmeier 1) Biomathematics & Statistics Scotland 2) Centre for Systems Biology.
Inferring gene regulatory networks from transcriptomic profiles Dirk Husmeier Biomathematics & Statistics Scotland.
Using Bayesian Networks to Analyze Whole-Genome Expression Data Nir Friedman Iftach Nachman Dana Pe’er Institute of Computer Science, The Hebrew University.
Learning Linear Causal Models Oksana Kohutyuk ComS 673 Spring 2005 Department of Computer Science Iowa State University.
Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Learning the Structure of Related Tasks Presented by Lihan He Machine Learning Reading Group Duke University 02/03/2006 A. Niculescu-Mizil, R. Caruana.
Learning With Bayesian Networks Markus Kalisch ETH Zürich.
Analysis of GO annotation at cluster level by Agnieszka S. Juncker.
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
Inferring gene regulatory networks with non-stationary dynamic Bayesian networks Dirk Husmeier Frank Dondelinger Sophie Lebre Biomathematics & Statistics.
Reconstructing gene regulatory networks with probabilistic models Marco Grzegorczyk Dirk Husmeier.
Learning Bayesian networks from postgenomic data with an improved structure MCMC sampling scheme Dirk Husmeier Marco Grzegorczyk 1) Biomathematics & Statistics.
MCMC in structure space MCMC in order space.
Introduction to biological molecular networks
Probabilistic models Jouni Tuomisto THL. Outline Deterministic models with probabilistic parameters Hierarchical Bayesian models Bayesian belief nets.
Bayesian networks and their application in circuit reliability estimation Erin Taylor.
Reverse engineering of regulatory networks Dirk Husmeier & Adriano Werhli.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
04/21/2005 CS673 1 Being Bayesian About Network Structure A Bayesian Approach to Structure Discovery in Bayesian Networks Nir Friedman and Daphne Koller.
Mechanistic models and machine learning methods for TIMET
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Identifying submodules of cellular regulatory networks Guido Sanguinetti Joint work with N.D. Lawrence and M. Rattray.
CSCI2950-C Lecture 12 Networks
Incorporating graph priors in Bayesian networks
Representation, Learning and Inference in Models of Cellular Networks
Recovering Temporally Rewiring Networks: A Model-based Approach
Markov Properties of Directed Acyclic Graphs
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
Estimating Networks With Jumps
1 Department of Engineering, 2 Department of Mathematics,
CSCI2950-C Lecture 13 Network Motifs; Network Integration
Incorporating graph priors in Bayesian networks
Network Inference Chris Holmes Oxford Centre for Gene Function, &,
Presentation transcript:

Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli

+

+ +

Learning Bayesian networks from data and prior knowledge

Bayesian networks A CB D EF NODES EDGES Marriage between graph theory and probability theory. Directed acyclic graph (DAG) representing conditional independence relations. It is possible to score a network in light of the data. We can infer how well a particular network explains the observed data.

Bayesian networks versus causal networks A CB A CB True causal graph Node A unknown

Bayesian networks versus causal networks A CB Equivalence classes: networks with the same scores. Equivalent networks cannot be distinguished in light of the data. We can only learn the undirected graph. Unless… we use interventions or prior knowledge. A CB A CB A CB

Learning Bayesian networks from data P(M|D) = P(D|M) P(M) / Z

Use TF binding motifs in promoter sequences

Biological prior knowledge matrix Biological Prior Knowledge Indicates some knowledge about the relationship between genes i and j

Biological prior knowledge matrix Biological Prior Knowledge Define the energy of a Graph G Indicates some knowledge about the relationship between genes i and j

Prior distribution over networks Energy of a network

Sample networks and hyperparameters from the posterior distribution Capture intrinsic inference uncertainty Learn the trade-off parameters automatically P(M|D) = P(D|M) P(M) / Z

Prior distribution over networks Energy of a network

Rewriting the energy Energy of a network

Approximation of the partition function

Multiple sources of prior knowledge

Rewriting the energy Energy of a network

Approximation of the partition function

MCMC sampling scheme

Sample networks and hyperparameters from the posterior distribution Metropolis-Hastings scheme Proposal probabilities

MCMC with one prior Sample graph and the parameter . Separate in two samples to improve the acceptance: 1.Sample graph with  fixed. 2.Sample  with graph fixed.

Sample graph and the parameter . BGe BDe MCMC with one prior Separate in two samples to improve the acceptance: 1.Sample graph with  fixed. 2.Sample  with graph fixed.

Sample graph and the parameter . BGe BDe MCMC with one prior Separate in two samples to improve the acceptance: 1.Sample graph with  fixed. 2.Sample  with graph fixed.

Sample graph and the parameter . BGe BDe MCMC with one prior Separate in two samples to improve the acceptance: 1.Sample graph with  fixed. 2.Sample  with graph fixed.

Sample graph and the parameter . BGe BDe MCMC with one prior Separate in two samples to improve the acceptance: 1.Sample graph with  fixed. 2.Sample  with graph fixed.

Approximation of the partition function

MCMC with two priors Sample graph and the parameters    and  2 Separate in three samples to improve the acceptance: 1.Sample graph with  1 and  2 fixed. 2.Sample  1 with graph and  2 fixed. 3.Sample  2 with graph and  1 fixed.

Bayesian networks with biological prior knowledge Biological prior knowledge: Information about the interactions between the nodes. We use two distinct sources of biological prior knowledge. Each source of biological prior knowledge is associated with its own trade-off parameter:  1 and  2. The trade off parameter indicates how much biological prior information is used. The trade-off parameters are inferred. They are not set by the user!

Bayesian networks with two sources of prior Data BNs + MCMC Recovered Networks and trade off parameters Source 1 Source 2 11 22

Bayesian networks with two sources of prior Data BNs + MCMC Source 1 Source 2 11 22 Recovered Networks and trade off parameters

Bayesian networks with two sources of prior Data BNs + MCMC Source 1 Source 2 11 22 Recovered Networks and trade off parameters

Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

Application to the Raf regulatory network

Raf regulatory network From Sachs et al Science 2005

Evaluation: Raf signalling pathway Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune systems cell Deregulation  carcinogenesis Extensively studied in the literature  gold standard network

Data Prior knowledge

Intracellular multicolour flow cytometry. Measured protein concentrations. 11 proteins: 1200 concentration profiles. We sample 5 separate subsets with 100 concentration profiles each. Flow cytometry data and KEGG

Microarray example Spellman et al (1998) Cell cycle 73 samples Tu et al (2005) Metabolic cycle 36 samples Genes time

Data Prior knowledge

KEGG PATHWAYS are a collection of manually drawn pathway maps representing our knowledge of molecular interactions and reaction networks. Flow cytometry data and KEGG

Prior knowledge from KEGG

Prior distribution

The data and the priors + KEGG + Random

Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

Bayesian networks with two sources of prior Data BNs + MCMC Recovered Networks and trade off parameters Source 1 Source 2 11 22

Bayesian networks with two sources of prior Data BNs + MCMC Source 1 Source 2 11 22 Recovered Networks and trade off parameters

Sampled values of the hyperparameters

Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

How to compare the recovered networks?

True networkPredicted network compare Thresholding Performance evaluation

True networkPredicted network compare True Positives False Positives Counting Performance evaluation

True networkPredicted network compare True Positives False Positives Counting Performance evaluation DGE – Consider edge directions UGE – Discard the edge directions

Performance evaluation: ROC curves

We use the Area Under the Receiver Operating Characteristic Curve (AUC). AUC=0.75 AUC=1 AUC=0.5 Performance evaluation: ROC curves

Evaluation 2: TP scores We set the threshold such that we obtain 5 spurious edges (5 FPs) and count the corresponding number of true edges (TP count).

Flow cytometry data and KEGG

Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

Learning the trade-off hyperparameter Repeat MCMC simulations for large set of fixed hyperparameters β Obtain AUC scores for each value of β Compare with the proposed scheme in which β is automatically inferred. Mean and standard deviation of the sampled trade off parameter

Learning the trade-off hyperparameters on simulated data Mean and standard deviation of the sampled trade-off parameter Repeat MCMC simulations for large set of fixed hyperparameters β Obtain AUC scores for each value of β Compare with the proposed scheme in which β is automatically inferred.

Regulation of Raf-1 by Direct Feedback Phosphorylation. Molecular Cell, Vol. 17, 2005 Dougherty et al New evidence for the accepted network

Conclusion Bayesian scheme for the systematic integration of different sources of biological prior knowledge. The method can automatically evaluate how useful the different sources of prior knowledge are. We get an improvement in the regulatory network reconstruction. This improvement is close to optimal.

Thank you