Published by Adam Turner. Modified over 9 years ago.
1
Mechanistic models and machine learning methods for TIMET Dirk Husmeier
2
Protein signalling pathway (from Sachs et al., Science 2005). Figure legend: cell membrane, receptor molecules, inhibition, activation, interaction in signalling pathway, phosphorylated protein.
3
Can we learn the signalling pathway from data? (Figure from Sachs et al., Science 2005. Legend: cell membrane, receptor molecules, inhibition, activation, interaction in signalling pathway, phosphorylated protein.)
4
Network unknown. High-throughput experiments. Postgenomic data. Machine learning. Statistics.
5
Methodology: mechanistic models; machine learning methods
Workpackages:
WP 1.7: Re-calibrate the circadian clock model for mature plants growing without exogenous sugars.
WP 2.4: Bi-directional regulation: mechanistic modelling of each metabolic pathway, with connections to the clock.
WP 2.5: Bi-directional regulation: testing predictions of bidirectional models.
6
Methodology Mechanistic models Bayesian networks Integration of biological prior knowledge Non-homogeneous Bayesian network for non-stationary processes
7
Regulatory network
8
Elementary molecular biological processes
9
Description with differential equations
11
Kinetic parameters q Concentrations Rates
12
Description with differential equations Rates Concentrations Kinetic parameters q
13
Parameters q known: Numerically integrate the differential equations for different hypothetical networks
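A minimal sketch of what "numerically integrate the differential equations" can look like, using forward Euler steps. The one-gene rate law (constant production scaled by a regulator, minus first-order decay), the function names, and all parameter values are assumptions chosen for illustration; they are not taken from the slides.

```python
def integrate_euler(rhs, x0, t_end, dt):
    """Integrate dx/dt = rhs(t, x) with forward Euler; returns a list of (t, x) pairs."""
    n_steps = int(round(t_end / dt))
    t, x = 0.0, x0
    trajectory = [(t, x)]
    for _ in range(n_steps):
        x = x + dt * rhs(t, x)   # one explicit Euler step
        t = t + dt
        trajectory.append((t, x))
    return trajectory


def make_rhs(production, decay, regulator=1.0):
    """Hypothetical rate law: dx/dt = production * regulator - decay * x."""
    return lambda t, x: production * regulator - decay * x


# Known kinetic parameters q = (production, decay); values are illustrative only.
trajectory = integrate_euler(make_rhs(production=2.0, decay=0.5), x0=0.0, t_end=20.0, dt=0.01)
final_x = trajectory[-1][1]   # approaches the steady state production / decay
```

Each hypothetical network would supply its own right-hand side, and the resulting predicted time series could then be compared against the measured one.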
14
Experiment: Gene expression time series Can we infer the correct gene regulatory network?
15
Model selection for known parameters q Gene expression time series predicted with different models Measured gene expression time series Highest likelihood: best model Compare
16
Model selection for unknown parameters q Gene expression time series predicted with different models Measured gene expression time series Joint maximum likelihood:
17
1) Practical problem: numerical optimization over the parameters q
2) Conceptual problem: overfitting. The maximum likelihood increases with increasing network complexity.
18
Regularization, e.g. BIC. Formula annotations: maximum likelihood parameters; number of parameters; number of data points; data misfit term; regularization term.
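As a hedged illustration of the BIC idea, the sketch below uses one common convention: the maximized log-likelihood (data-fit term) minus (number of parameters / 2) times the log of the number of data points (regularization term). Sign and scaling conventions vary between texts, and the Gaussian noise model, function names, and numbers here are assumptions rather than the slide's own formula.

```python
import math


def gaussian_log_likelihood(measured, predicted, sigma):
    """Log-likelihood of the measurements under i.i.d. Gaussian noise around the predictions."""
    n = len(measured)
    squared_error = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return -0.5 * squared_error / sigma ** 2 - n * math.log(sigma * math.sqrt(2.0 * math.pi))


def bic_score(max_log_likelihood, n_params, n_points):
    """Data-fit term minus complexity penalty; higher is better in this convention."""
    return max_log_likelihood - 0.5 * n_params * math.log(n_points)


# Illustrative numbers: the complex model fits slightly better but pays a larger penalty.
simple_model = bic_score(max_log_likelihood=-10.0, n_params=2, n_points=100)
complex_model = bic_score(max_log_likelihood=-9.5, n_params=6, n_points=100)
```

With these made-up values the simpler model scores higher, which is the overfitting correction the penalty is meant to provide.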
19
Model selection: find the best pathway Select the model with the highest posterior probability: This requires an integration over the whole parameter space:
20
Model selection: find the best pathway Select the model with the highest posterior probability: This requires an integration over the whole parameter space: This integral is usually analytically intractable
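In standard Bayesian notation, the two statements on this slide are usually written as follows. The symbols are assumptions: M denotes a candidate pathway (model), D the data, and q the kinetic parameters, matching the slides' use of q.

```latex
P(M \mid D) \;\propto\; P(D \mid M)\, P(M),
\qquad
P(D \mid M) \;=\; \int P(D \mid q, M)\, P(q \mid M)\, \mathrm{d}q
```

The second expression is the marginal likelihood, that is, the likelihood averaged over the prior on q rather than maximized over q.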
21
Complexity problem This requires an integration over the whole parameter space: The numerical approximation is highly non-trivial q
23
Marginal likelihoods for the alternative pathways. Computationally expensive; network reconstruction ab initio infeasible.
24
NIPS 2008
25
Objective: Reconstruction of regulatory networks ab initio Higher level of abstraction: Bayesian networks
26
Machine learning methods Bayesian networks (overview) Integration of biological prior knowledge Non-homogeneous Bayesian networks for non-stationary processes Circadian gene regulatory network in Arabidopsis thaliana Current work
27
Friedman et al. (2000), J. Comp. Biol. 7, 601-620 Marriage between graph theory and probability theory
28
Bayes net ODE model
29
Linear model: [A] = w1[P1] + w2[P2] + w3[P3] + w4[P4] + noise. Graph: parent nodes P1, P2, P3, P4 each point to node A, with edge weights w1, w2, w3, w4.
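A small sketch of the linear model on this slide: the concentration of A is a weighted sum of its parents' concentrations plus noise. The Gaussian form of the noise, the function names, and the example values are assumptions for illustration.

```python
import random


def expected_concentration(weights, parent_concentrations):
    """Noise-free part of the model: w1*[P1] + w2*[P2] + ..."""
    return sum(w * p for w, p in zip(weights, parent_concentrations))


def simulate_concentration(weights, parent_concentrations, noise_sd, rng):
    """Weighted sum of the parents plus zero-mean Gaussian noise (assumed noise model)."""
    return expected_concentration(weights, parent_concentrations) + rng.gauss(0.0, noise_sd)


weights = [0.5, -1.0, 2.0, 0.0]    # illustrative w1..w4; a negative weight mimics inhibition
parents = [2.0, 1.0, 0.5, 3.0]     # illustrative [P1]..[P4]
mean_a = expected_concentration(weights, parents)
noisy_a = simulate_concentration(weights, parents, noise_sd=0.1, rng=random.Random(0))
```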
30
Model Parameters q Integral analytically tractable!
31
Example: 2 genes 16 different network structures Best network: maximum score
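The count of 16 is consistent with 2 genes having 4 possible directed edges (including self-loops), each present or absent, giving 2^4 structures. That reading is an assumption. The sketch below enumerates those structures and picks the one with the maximum score; the scoring function is a hypothetical stand-in for whatever score the slides use.

```python
from itertools import product


def enumerate_structures(genes):
    """All directed graphs over `genes` (self-loops allowed) as frozensets of (parent, child)."""
    possible_edges = [(p, c) for p in genes for c in genes]
    structures = []
    for mask in product([False, True], repeat=len(possible_edges)):
        structures.append(frozenset(e for e, keep in zip(possible_edges, mask) if keep))
    return structures


def best_structure(structures, score):
    """Exhaustive search: return the structure with the maximum score."""
    return max(structures, key=score)


structures = enumerate_structures(["A", "B"])

# Hypothetical score: penalize every edge that differs from an assumed reference network.
reference = {("A", "B")}
best = best_structure(structures, score=lambda g: -len(g ^ reference))
```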
32
Identify the best network structure Ideal scenario: Large data sets, low noise
33
Uncertainty about the best network structure Limited number of experimental replications, high noise
34
Sample of high-scoring networks
35
Feature extraction, e.g. marginal posterior probabilities of the edges
36
Sample of high-scoring networks Feature extraction, e.g. marginal posterior probabilities of the edges High-confident edge High-confident non-edge Uncertainty about edges
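One way to extract marginal posterior probabilities of the edges from a sample of networks is to count how often each edge appears. The sketch below assumes the sample is a list of edge sets drawn (approximately) from the posterior; the function name and the toy sample are illustrative.

```python
def edge_posteriors(sampled_graphs, nodes):
    """Fraction of sampled graphs that contain each directed edge (parent, child)."""
    n = len(sampled_graphs)
    return {
        (p, c): sum((p, c) in g for g in sampled_graphs) / n
        for p in nodes
        for c in nodes
        if p != c
    }


# Toy sample of four networks over two genes.
sample = [
    frozenset({("A", "B")}),
    frozenset({("A", "B"), ("B", "A")}),
    frozenset({("A", "B")}),
    frozenset(),
]
posteriors = edge_posteriors(sample, ["A", "B"])
```

Values near 1 correspond to high-confidence edges, values near 0 to high-confidence non-edges, and intermediate values to uncertain edges.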
37
Can we generalize this scheme to more than 2 genes? In principle yes. However …
38
Number of structures Number of nodes
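To illustrate how quickly the number of structures grows with the number of nodes, the sketch below computes two counts. Which count the slide plots is an assumption: 2^(n^2) counts directed graphs with self-loops (consistent with 16 structures for 2 genes), while Robinson's recurrence counts labelled directed acyclic graphs.

```python
from math import comb


def count_directed_graphs(n):
    """Directed graphs on n labelled nodes, self-loops allowed: each of n*n edges is on or off."""
    return 2 ** (n * n)


def count_dags(n):
    """Labelled DAGs on n nodes via Robinson's recurrence."""
    counts = [1]  # one (empty) DAG on zero nodes
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


growth = [(n, count_directed_graphs(n), count_dags(n)) for n in range(1, 6)]
```

Either way the growth is super-exponential, which is why exhaustive enumeration stops being practical after a handful of nodes.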
39
Configuration space of network structures Find the high-scoring structures Sampling from the posterior distribution
40
Configuration space of network structures. MCMC with local changes: if [condition], accept; if [condition], accept with probability [formula].
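A common reading of this slide is a Metropolis-Hastings scheme: propose a local change to the current structure, accept it outright if the score does not decrease, and otherwise accept it with probability equal to the score ratio. The sketch below makes several assumptions: the local change is a single-edge flip (a symmetric proposal), `log_score` is a hypothetical user-supplied log posterior score, and acyclicity is not enforced.

```python
import math
import random


def mh_structure_sampler(nodes, log_score, n_steps, seed=0):
    """Metropolis-Hastings over edge sets using single-edge flips as local changes."""
    rng = random.Random(seed)
    candidate_edges = [(p, c) for p in nodes for c in nodes if p != c]
    current = frozenset()
    current_score = log_score(current)
    samples = []
    for _ in range(n_steps):
        edge = rng.choice(candidate_edges)
        proposal = current ^ {edge}            # add the edge if absent, remove it if present
        proposal_score = log_score(proposal)
        log_ratio = proposal_score - current_score
        # Accept improvements always; accept deteriorations with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_score = proposal, proposal_score
        samples.append(current)
    return samples


# Toy log score (an assumption): penalize each edge that differs from a reference network.
reference = {("A", "B")}
samples = mh_structure_sampler(["A", "B"], lambda g: -3.0 * len(g ^ reference), n_steps=5000)
fraction_with_edge = sum(("A", "B") in g for g in samples) / len(samples)
```

The collected samples can then be summarized, for example by the fraction of samples containing each edge.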
41
Machine learning methods Bayesian networks (overview) Integration of biological prior knowledge Non-homogeneous Bayesian network for non-stationary processes Circadian gene network in Arabidopsis thaliana Current work
42
Bayesian inference Select the model based on the posterior probability: This requires an integration over the whole parameter space:
43
Uncertainty about the best network structure Limited number of experimental replications, high noise
44
Reduced uncertainty by using prior knowledge. Data + prior knowledge.
45
Hyperparameter β trades off data versus prior knowledge KEGG pathway Microarray data β Bayesian analysis: integration of prior knowledge
46
Hyperparameter β trades off data versus prior knowledge KEGG pathway Microarray data β small
47
Hyperparameter β trades off data versus prior knowledge KEGG pathway Microarray data β large
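One common way to encode such a trade-off is a Gibbs-type prior over network structures, P(G | β) proportional to exp(-β E(G)), where the energy E(G) measures the disagreement between a graph and the prior knowledge. The slides do not state the functional form, so this form, the energy definition, and all names below are assumptions for illustration.

```python
def prior_energy(graph_edges, prior_confidence):
    """Sum of |B_ij - G_ij| over candidate edges; B_ij in [0, 1] is the prior edge confidence."""
    return sum(
        abs(confidence - (1.0 if edge in graph_edges else 0.0))
        for edge, confidence in prior_confidence.items()
    )


def log_prior_unnormalised(graph_edges, prior_confidence, beta):
    """log P(G | beta) up to an additive constant: -beta * E(G)."""
    return -beta * prior_energy(graph_edges, prior_confidence)


# Hypothetical prior confidences, e.g. derived from a pathway database.
prior_confidence = {("A", "B"): 0.9, ("B", "A"): 0.1}
agrees = frozenset({("A", "B")})
disagrees = frozenset({("B", "A")})

flat = [log_prior_unnormalised(g, prior_confidence, beta=0.0) for g in (agrees, disagrees)]
strong = [log_prior_unnormalised(g, prior_confidence, beta=5.0) for g in (agrees, disagrees)]
```

With β = 0 the prior is flat and only the data matter; as β grows, graphs that contradict the prior knowledge are penalized more heavily.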
48
Input: Learn: MCMC
50
Raf signalling pathway (from Sachs et al., Science 2005). Figure legend: cell membrane, receptor molecules, inhibition, activation, interaction in signalling pathway, phosphorylated protein.
51
Flow cytometry data. Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins. 5400 cells have been measured under 9 different cellular conditions (cues). Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments.
52
Prior knowledge from KEGG: edge confidence values between 0 and 1 (values shown: 0, 0.25, 0.5, 0.71, 0.87, 1). Data: protein concentrations from flow cytometry experiments.
53
Protein signalling network from the literature
54
Predicted network: 11 nodes, 20 edges, 90 non-edges. 20 top-scoring edges: 15/20 correct (75%); 5/90 non-edges falsely predicted, i.e. 85/90 correctly excluded (94%).
55
Machine learning methods Bayesian networks (overview) Integration of biological prior knowledge Non-homogeneous Bayesian network for non-stationary processes Circadian gene regulatory network in Arabidopsis thaliana Current work
57
Example: 4 genes, 10 time points
       t1     t2     t3     t4     t5     t6     t7     t8     t9     t10
X(1)   X1,1   X1,2   X1,3   X1,4   X1,5   X1,6   X1,7   X1,8   X1,9   X1,10
X(2)   X2,1   X2,2   X2,3   X2,4   X2,5   X2,6   X2,7   X2,8   X2,9   X2,10
X(3)   X3,1   X3,2   X3,3   X3,4   X3,5   X3,6   X3,7   X3,8   X3,9   X3,10
X(4)   X4,1   X4,2   X4,3   X4,4   X4,5   X4,6   X4,7   X4,8   X4,9   X4,10
58
Standard dynamic Bayesian network: homogeneous model. [Data matrix: genes X(1) to X(4) by time points t1 to t10, entries Xi,j]
59
Our new model: heterogeneous dynamic Bayesian network. Here: 2 components. [Data matrix: genes X(1) to X(4) by time points t1 to t10, entries Xi,j]
60
Our new model: heterogeneous dynamic Bayesian network. Here: 3 components. [Data matrix: genes X(1) to X(4) by time points t1 to t10, entries Xi,j]
61
Learning with MCMC. Sampled quantities: q (parameters), k (number of components; here: 3), h (allocation vector).
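To make the allocation vector concrete: it assigns each time point to one of the components. The sketch below builds such a vector from changepoints, which assumes contiguous segments; the model on the slides may allow more general allocations, and the function name and changepoint positions are hypothetical.

```python
def allocation_vector(n_time_points, changepoints):
    """Component label (1-based) per time point; a new component starts at each changepoint."""
    boundaries = set(changepoints)
    labels = []
    component = 1
    for t in range(1, n_time_points + 1):
        if t in boundaries:
            component += 1
        labels.append(component)
    return labels


# 10 time points, new components starting at t4 and t8 (illustrative positions).
h = allocation_vector(10, changepoints=[4, 8])
n_components = max(h)
```

Here `h` is `[1, 1, 1, 2, 2, 2, 2, 3, 3, 3]`, so the number of components is 3. An MCMC scheme would propose changes to this vector alongside the network structure.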
62
Morphogenesis in Drosophila melanogaster Gene expression measurements over 66 time steps of 4028 genes (Arbeitman et al., Science, 2002). Selection of 11 genes involved in muscle development. Zhao et al. (2006), Bioinformatics 22
63
Heterogeneous dynamic Bayesian network: plausible segmentation? [Data matrix: genes X(1) to X(4) by time points t1 to t10, entries Xi,j]
64
Number of components
65
Four stages of the Drosophila life cycle: embryo larva pupa adult
66
time
67
Morphogenetic transitions over time: embryo → larva, larva → pupa, pupa → adult. Gene expression program governing the transition to adult morphology active well before the fly emerges from the pupa.
68
Machine learning methods Bayesian networks (overview) Integration of biological prior knowledge Non-homogeneous Bayesian network for non-stationary processes Circadian gene regulatory network in Arabidopsis thaliana Current work
69
Circadian rhythms in Arabidopsis thaliana
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)
2 time series, T20 and T28, of microarray gene expression data from Arabidopsis thaliana.
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Both time series measured under constant light conditions at 13 time points: 0h, 2h, …, 24h, 26h
- Plants entrained with different light:dark cycles: 10h:10h (T20) and 14h:14h (T28)
70
Gene expression time series plots (Arabidopsis data T20 and T28). Panels: T28, T20.
71
Predicted network
Edge colours: blue = activation, red = inhibition, black = mixture
Three different line widths:
- thin = PP > 0.5
- medium = PP > 0.75
- fat = PP > 0.9
72
Cogs of the Plant Clockwork Review – Rob McClung, Plant Cell 2006 Two major gene classes… Morning genes e.g. LHY, CCA1 … repress evening genes e.g. TOC1, ELF3, ELF4, GI, LUX … which activate LHY and CCA1
73
Literature vs. inferred network. Nodes: CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3. Legend: false positives, false negatives.
74
True positives (TP) = 8
False positives (FP) = 13
False negatives (FN) = 5
True negatives (TN) = 9² - 8 - 13 - 5 = 55
Sensitivity = TP/[TP+FN] = 62%
Specificity = TN/[TN+FP] = 81%
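The arithmetic on this slide can be checked with a few lines. The sketch follows the slide in counting all 9² ordered gene pairs as candidate edges; the function name is illustrative.

```python
def confusion_summary(tp, fp, fn, n_nodes):
    """True negatives, sensitivity and specificity, with n_nodes**2 candidate edges."""
    tn = n_nodes ** 2 - tp - fp - fn
    return {
        "tn": tn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


summary = confusion_summary(tp=8, fp=13, fn=5, n_nodes=9)
sensitivity_percent = round(100 * summary["sensitivity"])   # 8 / 13
specificity_percent = round(100 * summary["specificity"])   # 55 / 68
```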
75
Overview of the plant clock model (Locke et al., Mol. Syst. Biol. 2006). Unknown component X allows > 8h delay between TOC1 and LHY/CCA1 expression. Figure labels: X, LHY/CCA1, TOC1, Y (GI), PRR9/PRR7, ZTL; morning, evening.
76
Literature vs. inferred network. Nodes: CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3. Legend: false positives, false negatives.
77
Machine learning methods Bayesian networks (overview) Integration of biological prior knowledge Non-homogeneous Bayesian network for non-stationary processes Circadian gene regulatory network in Arabidopsis thaliana Current work
78
Flexible network structure with regularization Joint work with Sophie Lèbre and Frank Dondelinger
79
Drosophila melanogaster: expression of 11 muscle development genes over 66 time points. Morphogenetic transitions over time: embryo → larva, larva → pupa, pupa → adult. Gene expression program governing the transition to adult morphology active well before the fly emerges from the pupa. Fixed structure, flexible parameters.
80
Transition probabilities: flexible structure with regularization. Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult.
81
Comparison with: Dondelinger, Lèbre & Husmeier; Ahmed & Xing
82
Summary Mechanistic models Bayesian networks Integration of biological prior knowledge Non-homogeneous Bayesian network for non-stationary processes
83
Thank you! Any questions?