Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.



James Watson & Francis Crick, 1953

Frederick Sanger, 1980

Microarrays; next-generation sequencing

PART 1 Genomics

Maximum likelihood: forward-backward algorithm, expectation-maximization algorithm
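The slide names the forward-backward algorithm without spelling it out. The following is a minimal sketch for a discrete hidden Markov model, assuming the standard scaled recursions; the function name, the argument layout, and the toy parameters are illustrative assumptions and do not come from the talk.

```python
import math

def forward_backward(pi, A, B, obs):
    """Posterior state marginals P(z_t | x_1..x_T) and log-likelihood of a discrete HMM.

    pi[k]   : initial probability of hidden state k
    A[i][j] : transition probability from state i to state j
    B[k][s] : probability that state k emits symbol s
    obs     : list of observed symbol indices
    """
    n_states, T = len(pi), len(obs)

    # Forward pass, normalising each step to avoid numerical underflow.
    alpha, scale = [], []
    for t, x in enumerate(obs):
        if t == 0:
            row = [pi[k] * B[k][x] for k in range(n_states)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * A[i][k] for i in range(n_states)) * B[k][x]
                   for k in range(n_states)]
        c = sum(row)
        scale.append(c)
        alpha.append([v / c for v in row])

    # Backward pass, reusing the scaling constants of the forward pass.
    beta = [[1.0] * n_states for _ in range(T)]
    for t in range(T - 2, -1, -1):
        x_next = obs[t + 1]
        beta[t] = [sum(A[i][j] * B[j][x_next] * beta[t + 1][j]
                       for j in range(n_states)) / scale[t + 1]
                   for i in range(n_states)]

    gamma = [[a * b for a, b in zip(alpha[t], beta[t])] for t in range(T)]
    log_lik = sum(math.log(c) for c in scale)
    return gamma, log_lik

# Hypothetical toy model: 2 hidden states, 2 observable symbols.
pi = [0.6, 0.4]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 1, 1]
gamma, log_lik = forward_backward(pi, A, B, obs)
```

In an expectation-maximization loop, the marginals in `gamma` would feed the E-step; the M-step that re-estimates `pi`, `A`, and `B` is not shown here.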

Bayesian inference: Gibbs sampling, stochastic forward-backward algorithm

Beta distribution

Factorial HMM

PART 2 Systems Biology

Network reconstruction from postgenomic data

Model; parameters θ

Friedman et al. (2000), J. Comp. Biol. 7, Marriage between graph theory and probability theory

Bayes net ODE model

Model; parameters θ. Probability theory → likelihood

Model; parameters θ. Bayesian networks: integral analytically tractable!
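The integral referred to here is presumably the marginal likelihood of a network structure, in which the parameters are integrated out. As a sketch in standard notation (the symbols M for the network structure, D for the data, and θ for the parameters are assumptions, since the slide's formula did not survive in the transcript):

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta,
\qquad
P(M \mid D) \propto P(D \mid M)\, P(M)
```

With a conjugate prior on θ the integral has a closed form, which is what makes scoring individual network structures feasible.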

UAI 1994

Identify the best network structure. Ideal scenario: large data sets, low noise

Uncertainty about the best network structure. Limited number of experimental replications, high noise

Sample of high-scoring networks

Feature extraction, e.g. marginal posterior probabilities of the edges: high-confidence edge, high-confidence non-edge, uncertainty about edges
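Given a sample of networks, the marginal posterior probability of an edge can be estimated as the fraction of sampled networks that contain it. A minimal sketch, assuming each sampled network is stored as a 0/1 adjacency matrix; the function name and the four toy graphs are hypothetical.

```python
def edge_posteriors(sampled_graphs):
    """Estimate P(edge i -> j | data) as the fraction of sampled graphs with that edge."""
    n_samples = len(sampled_graphs)
    n_nodes = len(sampled_graphs[0])
    probs = [[0.0] * n_nodes for _ in range(n_nodes)]
    for graph in sampled_graphs:
        for i in range(n_nodes):
            for j in range(n_nodes):
                probs[i][j] += graph[i][j] / n_samples
    return probs

# Hypothetical sample of four 3-node networks (graph[i][j] == 1 means edge i -> j).
sample = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
]
posterior = edge_posteriors(sample)
# posterior[0][1] is 1.0 (high-confidence edge), posterior[2][0] is 0.0
# (high-confidence non-edge), posterior[1][2] is 0.5 (uncertain edge).
```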

Sampling with MCMC [figure: number of structures versus number of nodes]
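The growth in the number of structures can be made concrete by counting labelled directed acyclic graphs with Robinson's recurrence. This is a sketch under the assumption that the slide's plot refers to DAGs, the structure space of static Bayesian networks; the function name is illustrative.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))

# First values of the sequence: 1, 3, 25, 543, 29281 for n = 1..5.
counts = [count_dags(n) for n in range(1, 6)]
```

Already for a handful of nodes, exhaustive enumeration is out of reach, which is the motivation for sampling structures with MCMC.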

Madigan & York (1995), Giudici & Castelo (2003)

Overview: Introduction; Limitations; Methodology; Application to morphogenesis; Application to synthetic biology

Homogeneity assumption: interactions don’t change with time

Limitations of the homogeneity assumption

Example: 4 genes, 10 time points

        t1     t2     t3     t4     t5     t6     t7     t8     t9     t10
X(1)    X1,1   X1,2   X1,3   X1,4   X1,5   X1,6   X1,7   X1,8   X1,9   X1,10
X(2)    X2,1   X2,2   X2,3   X2,4   X2,5   X2,6   X2,7   X2,8   X2,9   X2,10
X(3)    X3,1   X3,2   X3,3   X3,4   X3,5   X3,6   X3,7   X3,8   X3,9   X3,10
X(4)    X4,1   X4,2   X4,3   X4,4   X4,5   X4,6   X4,7   X4,8   X4,9   X4,10

Supervised learning. Here: 2 components (same 4 x 10 data matrix as in the example: genes X(1) to X(4) at time points t1 to t10)

Changepoint model: parameters can change with time


Unsupervised learning. Here: 3 components (same 4 x 10 data matrix as in the example: genes X(1) to X(4) at time points t1 to t10)

Extension of the model: parameters θ

θ, k, h; number of components (here: 3); allocation vector
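An allocation vector assigns each time point to one component. A minimal sketch of how a set of changepoints induces such a vector for the 10 time points of the example; the function name, the changepoint positions, and the convention that a changepoint marks the first time point of a new component are all assumptions made for illustration.

```python
from bisect import bisect_right

def allocation_vector(n_timepoints, changepoints):
    """Component label (1..K) for each time point 1..n_timepoints.

    Assumed convention: a changepoint c means a new component starts at time point c.
    """
    cps = sorted(changepoints)
    return [bisect_right(cps, t) + 1 for t in range(1, n_timepoints + 1)]

# Hypothetical changepoints at t4 and t8 give 3 components over 10 time points.
allocation = allocation_vector(10, [4, 8])
# allocation == [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
n_components = len(set(allocation))
```

With no changepoints the vector is constant, which recovers the homogeneous model.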

Analytically integrate out the parameters θ. k, h; number of components (here: 3); allocation vector

RJMCMC within Gibbs:
P(network structure | changepoints, data)
P(changepoints | network structure, data)
Birth, death, and relocation moves

Dynamic programming, complexity N²

Circadian rhythms in Arabidopsis thaliana
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Transcriptional profiles at 4*13 time points in 2h intervals under constant light for 4 experimental conditions

Comparison with the literature
Precision: proportion of identified interactions that are correct
Recall = sensitivity: proportion of true interactions that we successfully recovered
Specificity: proportion of non-interactions that are successfully avoided

Which interactions from the literature are found?
[Network figure over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3; blue: activations, red: inhibitions; true positives and false negatives marked]
True positives (TP) = 8, false negatives (FN) = 5
Recall = 8/13 = 62%

Which proportion of predicted interactions are confirmed by the literature?
[Network figure; blue: activations, red: inhibitions; true positives and false positives marked]
True positives (TP) = 8, false positives (FP) = 13
Precision = 8/21 = 38%

[Network figure over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3]
Precision = 38%, recall = 62%

Literature = gold standard → scores are pessimistic
Precision = 50%, recall = 50%: not random expectation

True positives (TP) = 8, false positives (FP) = 13, false negatives (FN) = 5
True negatives (TN) = 9² - TP - FP - FN = 81 - 26 = 55
Sensitivity (recall) = TP/[TP+FN] = 8/13 = 62%
Specificity (proportion of avoided non-interactions) = TN/[TN+FP] = 55/68 = 81%
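The scores on these slides can be recomputed from the four counts. A minimal sketch, assuming that TN is the number of the 9² possible directed gene pairs not counted as TP, FP, or FN, which reproduces the slide's value of 55; the function and variable names are illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision, recall (sensitivity) and specificity from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

n_genes = 9
tp, fp, fn = 8, 13, 5
# Assumption: every one of the n_genes**2 ordered gene pairs is a candidate interaction.
tn = n_genes ** 2 - tp - fp - fn
metrics = confusion_metrics(tp, fp, fn, tn)
percentages = {name: round(100 * value) for name, value in metrics.items()}
# percentages == {"precision": 38, "recall": 62, "specificity": 81}
```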

Model extension. So far: non-stationarity in the regulatory process

Non-stationarity in the network structure

Flexible network structure.

Model; parameters θ

Use prior knowledge!

Flexible network structure.

Flexible network structure with regularization [equation labels: hyperparameter, normalization factor]

Flexible network structure with regularization: exponential prior versus binomial prior with conjugate beta hyperprior

NIPS 2010

Overview: Introduction; Limitations; Methodology; Application to morphogenesis; Application to synthetic biology

Morphogenesis in Drosophila melanogaster
Gene expression measurements at 66 time points during the life cycle of Drosophila (Arbeitman et al., Science, 2002). Selection of 11 genes involved in muscle development. Zhao et al. (2006), Bioinformatics 22

Can we learn the morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult?

Average posterior probabilities of transitions. Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult

Can we learn changes in the regulatory network structure?

Overview: Introduction; Limitations; Methodology; Application to morphogenesis; Application to synthetic biology

Can we learn the switch Galactose → Glucose? Can we learn the network structure?

Task 1: Changepoint detection. Switch of the carbon source: Galactose → Glucose

Task 2: Network reconstruction
Precision: proportion of identified interactions that are correct
Recall: proportion of true interactions that we successfully recovered

BANJO: conventional homogeneous DBN
TSNI: method based on differential equations
Inference: optimization, “best” network

Sample of high-scoring networks

Marginal posterior probabilities of the edges [figure legend: P=1, P=0, P=0.5]

PART 3 Future work, strategic issues

Phylogenetics → phylogenomics; High performance computing

How are we getting from here …

… to there ?!

Phylogenetics → phylogenomics; High performance computing; Collaboration with computer scientists

Input: Learn: MCMC

Phylogenetics → phylogenomics; High performance computing; Collaboration with computer scientists; Collaboration with biologists

Phylogenetics → phylogenomics; High performance computing; Collaboration with computer scientists; Collaboration with biologists; MRC University of Glasgow Centre of Excellence in Virology (→ virus evolution, virus-host interactions)

Scottish Government science strategy: Climate change and biodiversity

Spatial autocorrelation and bio-climate variables
Spatial autocorrelation: Z = weighted abundance from Markov neighbourhood.
Bio-climate variables: Z = temperature, water, …

Ecological Informatics 5, , 2010

Collaboration with Andrej Aderhold and V. Anne Smith; School of Biology, University of St Andrews

Collaboration with Andrej Aderhold (computer scientist) and V. Anne Smith (biologist); School of Biology, University of St Andrews

Computer Science, Biology, Statistics

Phylogenetics → phylogenomics; High performance computing; Collaboration with computer scientists; Collaboration with biologists; MRC University of Glasgow Centre of Excellence in Virology (→ virus evolution, virus-host interactions); Ecological networks and biodiversity