Learning Bayesian networks from postgenomic data with an improved structure MCMC sampling scheme

Dirk Husmeier, Marco Grzegorczyk
1) Biomathematics & Statistics Scotland
2) Centre for Systems Biology at Edinburgh

Systems Biology

[Figure: signalling inside the cell: protein activation cascade from the cell membrane to the nucleus; TF phosphorylation -> cell response.]

Raf signalling network (from Sachs et al., Science 2005)

[Figure: unknown regulatory network -> high-throughput experiments -> postgenomic data -> machine learning / statistical methods.]

Differential equation models
- Multiple parameter sets can offer equally plausible solutions.
- Multimodality in parameter space: point estimates become meaningless.
- Overfitting problem -> not suitable for model selection.
- Bayesian approach: computing the marginal likelihood is computationally challenging.

Bayesian networks
[Figure: example DAG with nodes A-F, illustrating nodes and edges.]
- Marriage between graph theory and probability theory.
- Directed acyclic graph (DAG) representing conditional independence relations.
- It is possible to score a network in light of the data: P(D|M), where D is the data and M the network structure.
- We can infer how well a particular network explains the observed data.

Learning Bayesian networks: P(M|D) = P(D|M) P(M) / Z, where M is the network structure, D the data, and Z the normalization constant.
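Spelled out (a reference rendering: the sum over competing structures M' gives the normalization Z, and the factorized likelihood assumes a standard modular score such as BDe or BGe, with psi denoting the local score of node X_i given its parents, a notational choice of this transcript):

```latex
P(M \mid D) = \frac{P(D \mid M)\, P(M)}{\sum_{M'} P(D \mid M')\, P(M')},
\qquad
P(D \mid M) = \prod_{i=1}^{n} \psi\big(X_i \mid \mathrm{Pa}_M(X_i), D\big)
```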

MCMC in structure space (Madigan & York 1995; Giudici & Castelo 2003)
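For orientation, a minimal sketch of classical single-edge structure MCMC in this spirit, assuming a user-supplied `log_post(G)` that returns log P(D|M) + log P(M); the neighbourhood-size ratio in the Hastings factor is omitted for brevity and flagged in the comments:

```python
import math
import random
import numpy as np

def is_acyclic(G):
    """G (0/1 adjacency matrix) is acyclic iff no power of G has a nonzero trace."""
    P = np.eye(len(G), dtype=int)
    for _ in range(len(G)):
        P = P @ G
        if np.trace(P) != 0:
            return False
    return True

def propose(G):
    """Single-edge proposal: add, delete, or reverse one edge."""
    Gp = G.copy()
    i, j = random.sample(range(len(G)), 2)
    if Gp[i, j]:                      # edge i -> j exists:
        Gp[i, j] = 0                  #   delete it, or ...
        if random.random() < 0.5:
            Gp[j, i] = 1              #   ... reverse it
    else:
        Gp[i, j] = 1                  # add edge i -> j
    return Gp

def structure_mcmc(G0, log_post, n_steps=100_000):
    """Metropolis-Hastings over DAG structures."""
    G, samples = G0.copy(), []
    for _ in range(n_steps):
        Gp = propose(G)
        # Reject cyclic proposals outright; a full implementation would also
        # multiply in the ratio of neighbourhood sizes (the Hastings factor).
        if is_acyclic(Gp) and math.log(random.random()) < log_post(Gp) - log_post(G):
            G = Gp
        samples.append(G.copy())
    return samples
```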

Alternative paradigm: order MCMC

Instead of MCMC in structure space (sampling individual network structures)...

...MCMC in order space: sample node orders, summing over all structures compatible with each order.

Problem: Distortion of the prior distribution

[Worked example with two nodes A and B. The three structures are A->B, B->A, and the empty graph. A->B is compatible only with the order A < B, and B->A only with B < A, but the empty graph is compatible with both orders. A uniform prior over orders therefore induces a distorted prior over structures: 0.25 for A->B, 0.25 for B->A, and 0.5 for the empty graph.]
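The distortion can be verified with a few lines of enumeration (a self-contained sketch: order-based sampling implicitly weights each structure by the orders it is compatible with):

```python
from itertools import permutations

# Two nodes A and B; a structure is "compatible" with an order if every
# edge points from an earlier node to a later one in that order.
structures = {"A->B": [("A", "B")], "B->A": [("B", "A")], "empty": []}
orders = list(permutations(["A", "B"]))          # (A, B) and (B, A)

def compatible(edges, order):
    pos = {v: k for k, v in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)

# Uniform prior over orders, uniform over the structures within each order:
prior = {s: 0.0 for s in structures}
for order in orders:
    comp = [s for s, e in structures.items() if compatible(e, order)]
    for s in comp:
        prior[s] += (1 / len(orders)) * (1 / len(comp))

print(prior)   # {'A->B': 0.25, 'B->A': 0.25, 'empty': 0.5} -- not uniform!
```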

Current work with Marco Grzegorczyk: a proposed new paradigm
- MCMC in structure space rather than order space.
- Design new proposal moves that achieve faster mixing and convergence.

First idea
- Propose new parents for a node from the “Boltzmann distribution” over parent sets.
- Identify those new parents that are involved in the formation of directed cycles.
- Orphan them, and sample new parents for them subject to the acyclicity constraint.

1) Select a node. 2) Sample new parents. 3) Find directed cycles. 4) Orphan the “loopy” parents. 5) Sample new parents for these parents.

Problem: this move is not reversible; the reverse path would have to pass via an illegal structure.

Devise a simpler move that is reversible
- Identify a pair of nodes connected by an edge X -> Y.
- Orphan both nodes.
- Sample new parents from the “Boltzmann distribution” subject to the acyclicity constraint, such that the inverse edge Y -> X is included.

1) Select an edge. 2) Orphan the nodes involved. 3) Constrained resampling of the parents.

This move is reversible!
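A sketch of this edge-reversal move, with hypothetical helpers: `log_score(node, parents)` would return the log local score, and `parent_sets(node, G)` would enumerate the candidate parent sets that keep the graph acyclic (in practice restricted by a fan-in limit). Sampling is from the "Boltzmann distribution", i.e. with probability proportional to the exponentiated local score:

```python
import random
import numpy as np

def sample_parent_set(node, candidate_sets, log_score, must_include=None):
    """Sample one parent set with probability proportional to exp(local score)
    (the 'Boltzmann distribution'), optionally forcing one parent to appear."""
    sets = [ps for ps in candidate_sets
            if must_include is None or must_include in ps]
    logw = np.array([log_score(node, ps) for ps in sets])
    w = np.exp(logw - logw.max())          # stabilized softmax weights
    w /= w.sum()
    return sets[np.random.choice(len(sets), p=w)]

def rev_move(G, log_score, parent_sets):
    """Sketch of the new edge-reversal move on adjacency matrix G
    (G[i, j] = 1 means i -> j)."""
    Gp = G.copy()
    edges = list(zip(*np.nonzero(Gp)))
    x, y = random.choice(edges)            # 1) select an edge x -> y
    Gp[:, x] = 0                           # 2) orphan both nodes involved
    Gp[:, y] = 0
    # 3) constrained resampling: the new parents of x must contain y, so the
    #    reversed edge y -> x is guaranteed; both draws respect acyclicity
    #    via the (assumed) filtering inside parent_sets.
    for p in sample_parent_set(x, parent_sets(x, Gp), log_score, must_include=y):
        Gp[p, x] = 1
    for p in sample_parent_set(y, parent_sets(y, Gp), log_score):
        Gp[p, y] = 1
    return Gp
```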

Simple idea; mathematical challenge: show that the condition of detailed balance is satisfied, and derive the Hastings factor, which is a function of various partition functions.

Acceptance probability
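For reference, the generic Metropolis-Hastings acceptance probability; for the new move, the proposal ratio q(M|M')/q(M'|M) works out to a ratio of partition functions of the constrained Boltzmann distributions (the exact expression is derived in the paper, so the formula below is only the template):

```latex
A(M \to M') = \min\left\{ 1,\;
\frac{P(D \mid M')\, P(M')}{P(D \mid M)\, P(M)}
\cdot
\underbrace{\frac{q(M \mid M')}{q(M' \mid M)}}_{\text{Hastings factor}}
\right\}
```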

Ergodicity
- The new move is reversible but not irreducible: on its own it always leaves the reversed edge in place, so in a two-node domain it can only toggle between A -> B and B -> A and never reach the empty graph.
- Theorem: a mixture with an ergodic transition kernel gives an ergodic Markov chain.
- REV-MCMC: at each step, randomly switch between a conventional structure MCMC step and the proposed new move.
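A minimal sketch of the resulting mixture kernel; the mixing weight `p_rev` is a hypothetical tuning parameter, and `structure_move` / `rev_move` stand for the two kernels sketched above:

```python
import random

def rev_mcmc_step(G, structure_move, rev_move, p_rev=0.1):
    """One REV-MCMC step: mix the (ergodic) classical kernel with the new move.
    The mixture inherits ergodicity from the classical kernel."""
    if random.random() < p_rev and G.any():  # the new move needs an edge to reverse
        return rev_move(G)
    return structure_move(G)
```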

Evaluation
- Does the new method avoid the bias intrinsic to order MCMC?
- How do convergence and mixing compare to structure and order MCMC?
- What is the effect on the network reconstruction accuracy?

Results
- Analytical comparison of the convergence properties
- Empirical comparison of the convergence properties
- Evaluation of the systematic bias
- Molecular regulatory network reconstruction with prior knowledge

Analytical comparison of the convergence properties
- Generate data from a noisy XOR; enumerate all 3-node networks.
- Compute the exact posterior distribution p*.
- Compute the Markov transition matrix A for each of the MCMC methods.
- Iterate the Markov chain: p(t+1) = A p(t).
- Compute the (symmetrized) KL divergence: KL(t) = KL(p(t) || p*) + KL(p* || p(t)).
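Under full enumeration this comparison amounts to a few lines, where `A` is the (column-stochastic) transition matrix of a given sampler over all enumerated DAGs and `p_star` the exact posterior:

```python
import numpy as np

def symmetrized_kl(p, q, eps=1e-12):
    """KL(p||q) + KL(q||p), with a small epsilon for numerical safety."""
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def convergence_curve(A, p0, p_star, n_steps=1000):
    """Iterate p(t+1) = A p(t) and record the divergence from the posterior."""
    p, curve = p0.copy(), []
    for _ in range(n_steps):
        p = A @ p                  # one step of the Markov chain
        curve.append(symmetrized_kl(p, p_star))
    return curve
```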

[Figure: KL(t) against iteration t. Solid line: REV-MCMC; other lines: structure MCMC and different versions of inclusion-driven MCMC.]

Results
- Analytical comparison of the convergence properties
- Empirical comparison of the convergence properties
- Evaluation of the systematic bias
- Molecular regulatory network reconstruction with prior knowledge

Empirical comparison of the convergence and mixing properties
- Standard benchmark data: the ALARM network (Beinlich et al. 1989) for monitoring patients in intensive care; 37 nodes, 46 directed edges.
- Generate data sets of different size.
- Compare the three MCMC algorithms at the same computational cost: structure MCMC (10^6 iterations), order MCMC (10^5), REV-MCMC (10^5).

What are the implications for network reconstruction? ROC curves; area under the ROC curve (AUROC). [Figure: example ROC curves with AUC = 1, AUC = 0.75, and AUC = 0.5.]
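Reconstruction is scored by ranking the directed edges by their posterior probabilities against the gold-standard network; a minimal sketch using scikit-learn, assuming `edge_post` and `gold` are n x n arrays:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def edge_auroc(edge_post, gold):
    """AUROC of posterior edge probabilities against a gold-standard network.

    edge_post[i, j]: estimated posterior probability of the edge i -> j.
    gold[i, j]:      1 if i -> j is in the gold-standard network, else 0.
    """
    mask = ~np.eye(gold.shape[0], dtype=bool)  # ignore the diagonal (self-loops)
    return roc_auc_score(gold[mask], edge_post[mask])
```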

Conclusion
- Structure MCMC has convergence and mixing difficulties.
- Order MCMC and REV-MCMC show a similar (and much better) performance.
- How about the bias?

Results
- Analytical comparison of the convergence properties
- Empirical comparison of the convergence properties
- Evaluation of the systematic bias
- Molecular regulatory network reconstruction with prior knowledge

Evaluation of the systematic bias using standard benchmark data
- Standard machine learning benchmark data: FLARE and VOTE.
- Restriction to 5 nodes -> complete enumeration possible (~10^4 structures).
- The true posterior probabilities of edge features can be computed exactly.
- Compute the difference between the true scores and those obtained with MCMC.
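With 5 nodes exhaustive enumeration is feasible; the sketch below computes exact edge posteriors given any modular `log_score` function (e.g., a BDe log marginal likelihood, supplied by the user and not shown here):

```python
import itertools
import numpy as np
from scipy.special import logsumexp

def is_acyclic(G):
    """G is acyclic iff no power of G has a nonzero trace."""
    P = np.eye(len(G), dtype=int)
    for _ in range(len(G)):
        P = P @ G
        if np.trace(P) != 0:
            return False
    return True

def all_dags(n):
    """Yield the adjacency matrices of all DAGs on n nodes (feasible for n <= 5)."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        G = np.zeros((n, n), dtype=int)
        for (i, j), b in zip(pairs, bits):
            G[i, j] = b
        if is_acyclic(G):
            yield G

def true_edge_posteriors(n, log_score):
    """Exact posterior probability of every directed edge, by full enumeration."""
    dags = list(all_dags(n))
    logw = np.array([log_score(G) for G in dags])
    w = np.exp(logw - logsumexp(logw))            # normalized posterior weights
    return sum(wk * G for wk, G in zip(w, dags))  # (n x n) matrix of P(i -> j | D)
```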

[Figure: deviations between true and estimated directed edge feature posterior probabilities.]

Results
- Analytical comparison of the convergence properties
- Empirical comparison of the convergence properties
- Evaluation of the systematic bias
- Molecular regulatory network reconstruction with prior knowledge

Raf regulatory network (from Sachs et al., Science 2005)

Raf signalling pathway
- Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells.
- Deregulation -> carcinogenesis.
- Extensively studied in the literature -> gold-standard network.

Data Prior knowledge

Flow cytometry data
- Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins.
- 5400 cells measured under 9 different cellular conditions (cues).
- Downsampling to 10 and 100 instances (5 separate subsets): indicative of sample sizes typical of microarray experiments.

Data Prior knowledge

Biological prior knowledge matrix B (for “belief”)
- Entry B_ij indicates some knowledge about the relationship between genes i and j.
- Define the energy of a graph G as its total disagreement with B: E(G) = sum_ij |B_ij - G_ij|.

Prior distribution over networks: a Gibbs distribution based on the energy of a network, P(G | beta) = exp(-beta E(G)) / Z(beta), where beta is a hyperparameter controlling the strength of the prior.
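A short sketch of this prior in code; note that the partition function Z(beta) is constant across graphs for fixed beta, so it cancels in Metropolis-Hastings acceptance ratios:

```python
import numpy as np

def energy(G, B):
    """E(G) = sum_ij |B_ij - G_ij|: total disagreement between the graph G
    (0/1 adjacency matrix) and the prior belief matrix B (entries in [0, 1])."""
    return np.abs(B - G).sum()

def log_prior(G, B, beta):
    """Unnormalized log of the Gibbs prior P(G | beta) = exp(-beta E(G)) / Z(beta).
    The constant -log Z(beta) is dropped, as it cancels in MCMC acceptance ratios."""
    return -beta * energy(G, B)
```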

[Figure: prior knowledge from Sachs et al.: distribution of prior matrix entries for edges versus non-edges of the gold-standard network.]

AUROC scores

Conclusion
- True prior knowledge that is strong -> no significant difference between the methods.
- True prior knowledge that is weak -> order MCMC leads to a slight yet significant deterioration (significant at the p = 0.01 level in a paired t-test).

Prior knowledge from KEGG

Flow cytometry data and KEGG

Conclusions
- The new method avoids the bias intrinsic to order MCMC.
- Its convergence and mixing are similar to order MCMC; both methods outperform structure MCMC.
- We can get an improvement over order MCMC when using explicit prior knowledge.

Thank you! Any questions?