
Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.

1 Probabilistic modelling in computational biology. Dirk Husmeier, Biomathematics & Statistics Scotland

2 James Watson & Francis Crick, 1953

3 Frederick Sanger, 1980

5 Microarrays; next-generation sequencing

7 PART 1 Genomics

18 Maximum likelihood: forward-backward algorithm; expectation-maximization (EM) algorithm
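
The forward-backward algorithm computes, for a hidden Markov model (HMM) with known parameters, the posterior probability of every hidden state at every position, together with the likelihood. Below is a minimal pure-Python sketch for discrete emissions; the function name and the list-of-lists parameter layout are illustrative assumptions, not the code behind the talk. In the EM (Baum-Welch) algorithm, these posteriors (plus pairwise transition posteriors, omitted here) would drive the re-estimation of the parameters in an outer loop.

```python
import math


def forward_backward(obs, init, trans, emit):
    """Posterior state probabilities and log-likelihood for a discrete HMM.

    obs   -- observed symbols o_1..o_T (integers)
    init  -- init[k] = P(S_1 = k)
    trans -- trans[j][k] = P(S_{t+1} = k | S_t = j)
    emit  -- emit[k][o] = P(O_t = o | S_t = k)
    Returns (gamma, log_lik) with gamma[t][k] = P(S_t = k | o_1..o_T).
    """
    K, T = len(init), len(obs)
    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [init[k] * emit[k][obs[0]] for k in range(K)]
        else:
            a = [emit[k][obs[t]] * sum(alpha[-1][j] * trans[j][k] for j in range(K))
                 for k in range(K)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scale.append(c)
    # Backward pass, rescaled with the same constants.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[j][k] * emit[k][obs[t + 1]] * beta[t + 1][k] for k in range(K))
                   / scale[t + 1] for j in range(K)]
    # With this scaling, alpha[t][k] * beta[t][k] is already P(S_t = k | data).
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    # The product of the scaling constants equals the likelihood P(o_1..o_T).
    log_lik = sum(math.log(c) for c in scale)
    return gamma, log_lik
```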

19 Bayesian inference: Gibbs sampling; stochastic forward-backward algorithm
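
A Gibbs sampler for an HMM typically draws the whole hidden state sequence jointly rather than one state at a time. One common realization of a stochastic forward-backward step is forward filtering followed by backward sampling, sketched below under the assumption of fixed, known parameters and discrete emissions (the function name is hypothetical). In a full Gibbs sampler, this step would alternate with drawing the parameters from their conditional posteriors.

```python
import random


def sample_state_path(obs, init, trans, emit, rng=random):
    """Draw one hidden state path from P(S_1..S_T | o_1..o_T) for a discrete HMM."""
    K, T = len(init), len(obs)
    # Forward filtering: alpha[t][k] = P(S_t = k | o_1..o_t).
    alpha = []
    for t in range(T):
        if t == 0:
            a = [init[k] * emit[k][obs[0]] for k in range(K)]
        else:
            a = [emit[k][obs[t]] * sum(alpha[-1][j] * trans[j][k] for j in range(K))
                 for k in range(K)]
        z = sum(a)
        alpha.append([x / z for x in a])
    # Backward sampling: S_T from the last filter, then S_t given the sampled S_{t+1}.
    path = [0] * T
    path[-1] = rng.choices(range(K), weights=alpha[-1])[0]
    for t in range(T - 2, -1, -1):
        w = [alpha[t][j] * trans[j][path[t + 1]] for j in range(K)]
        path[t] = rng.choices(range(K), weights=w)[0]
    return path
```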

20 Beta distribution
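
One common role for the Beta distribution in this setting (assumed here; the slide gives only the heading) is as a conjugate prior for a transition probability of a two-state HMM. With a Beta(a, b) prior on the probability of staying in a state, observing n_stay self-transitions and n_switch switches out of that state in a sampled state path gives a Beta(a + n_stay, b + n_switch) posterior, from which a Gibbs step can draw directly. A minimal sketch with hypothetical function names:

```python
import random


def transition_posterior(path, a=1.0, b=1.0):
    """Beta posterior parameters for P(stay in state s), s in {0, 1}.

    Prior: P(stay in s) ~ Beta(a, b). Each observed s -> s transition adds 1 to
    the first parameter, each s -> other transition adds 1 to the second.
    """
    params = {}
    for s in (0, 1):
        pairs = [(x, y) for x, y in zip(path, path[1:]) if x == s]
        stay = sum(1 for _, y in pairs if y == s)
        params[s] = (a + stay, b + len(pairs) - stay)
    return params


def sample_transition_matrix(path, a=1.0, b=1.0, rng=random):
    """One Gibbs draw of the 2 x 2 transition matrix given a state path."""
    params = transition_posterior(path, a, b)
    p0 = rng.betavariate(*params[0])
    p1 = rng.betavariate(*params[1])
    return [[p0, 1.0 - p0], [1.0 - p1, p1]]
```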

25 Factorial HMM

29 PART 2 Systems Biology

32 Network reconstruction from postgenomic data

33 Model, parameters θ

34 Friedman et al. (2000), J. Comput. Biol. 7, 601-620. Marriage between graph theory and probability theory
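
The link between the two theories is the standard Bayesian network factorization: the directed acyclic graph G determines how the joint distribution of the node variables X_1, ..., X_n breaks into local conditional distributions, one per node given its parents. In our own notation (pa_G(X_i) for the parents of X_i in G, θ_i for the parameters of the local distribution of X_i):

```latex
P(X_1, \dots, X_n \mid G, \theta) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}_G(X_i), \theta_i\bigr)
```

For example, for the chain X_1 → X_2 → X_3 this reads P(X_1 | θ_1) P(X_2 | X_1, θ_2) P(X_3 | X_2, θ_3).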

35 Bayes net; ODE model

36 Model, parameters θ. Probability theory → likelihood

37 Model, parameters θ. Bayesian networks: the integral over θ is analytically tractable!
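
The quantity used to score a network structure G is the marginal likelihood, in which the parameters θ are integrated out. As an illustration, assume discrete data with independent Dirichlet priors on the conditional probability tables (one standard conjugate choice; the slide does not say which model is used). With N_ijk the number of observations in which node i is in state k while its parents are in configuration j, α_ijk the corresponding Dirichlet hyperparameters, N_ij = Σ_k N_ijk and α_ij = Σ_k α_ijk, the integral has the closed form:

```latex
P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta
  = \prod_{i=1}^{n} \prod_{j} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
    \prod_{k} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
```

For continuous Gaussian data, an analogous closed form arises with a normal-Wishart prior.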

38 UAI 1994

39 Identify the best network structure. Ideal scenario: large data sets, low noise

40 Uncertainty about the best network structure. Limited number of experimental replications, high noise

41 Sample of high-scoring networks

42 Feature extraction, e.g. marginal posterior probabilities of the edges: high-confidence edge; high-confidence non-edge; uncertainty about edges
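
Given a sample of networks from the posterior, the marginal posterior probability of an edge is estimated by the fraction of sampled networks that contain it. A minimal sketch, assuming each sampled network is represented as a set of directed (parent, child) edges; the function names and the thresholds used to label edges are hypothetical choices, not values from the talk:

```python
def edge_posteriors(sample, nodes):
    """Estimate P(u -> v | data) as the fraction of sampled graphs containing u -> v.

    All ordered pairs are scored, including self-loops, which dynamic Bayesian
    networks allow.
    """
    counts = {(u, v): 0 for u in nodes for v in nodes}
    for edges in sample:
        for edge in edges:
            counts[edge] += 1
    return {edge: c / len(sample) for edge, c in counts.items()}


def classify_edges(posteriors, upper=0.9, lower=0.1):
    """Label edges by confidence; `upper` and `lower` are illustrative thresholds."""
    labels = {}
    for edge, p in posteriors.items():
        if p >= upper:
            labels[edge] = "high-confidence edge"
        elif p <= lower:
            labels[edge] = "high-confidence non-edge"
        else:
            labels[edge] = "uncertain"
    return labels


# Usage: four sampled networks over three genes.
sample = [{("A", "B")}, {("A", "B"), ("B", "C")}, {("A", "B")}, {("A", "B")}]
post = edge_posteriors(sample, ["A", "B", "C"])
labels = classify_edges(post)
```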

43 Number of structures versus number of nodes. Sampling with MCMC
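
The number of possible network structures explodes with the number of nodes, which is why exhaustive enumeration is infeasible and posterior sampling is used instead. For static Bayesian networks the structures are directed acyclic graphs (DAGs), whose number on n labelled nodes is given by Robinson's recurrence a(n) = Σ_{k=1}^{n} (-1)^{k+1} C(n, k) 2^{k(n-k)} a(n-k), with a(0) = 1. A short sketch (function name hypothetical):

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the set of k nodes without incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))


print([num_dags(n) for n in range(1, 6)])  # [1, 3, 25, 543, 29281]
```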

44 Madigan & York (1995), Giudici & Castelo (2003)
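
Structure MCMC in the spirit of Madigan & York moves through the space of network structures by small local changes. The sketch below is a simplified Metropolis-Hastings version that assumes only single-edge additions and deletions (edge reversals omitted) and a user-supplied log score, such as a log marginal likelihood plus log prior; all function names are hypothetical. Because a proposal picks uniformly among the valid neighbours of the current graph, the acceptance ratio includes the neighbourhood sizes |N(G)| / |N(G')| as a Hastings correction.

```python
import math
import random


def has_path(edges, src, dst):
    """True if the directed graph `edges` contains a path src -> ... -> dst."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(v for (w, v) in edges if w == u)
    return False


def neighbours(edges, nodes):
    """DAGs reachable by deleting one edge or adding one edge without creating a cycle."""
    result = []
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                result.append(edges - {(u, v)})
            elif not has_path(edges, v, u):  # adding u -> v closes a cycle iff v reaches u
                result.append(edges | {(u, v)})
    return result


def structure_mcmc(log_score, nodes, n_iter, rng=random):
    """Metropolis-Hastings over DAGs (frozensets of edges), starting from the empty graph."""
    g = frozenset()
    nb = neighbours(g, nodes)
    samples = []
    for _ in range(n_iter):
        g_new = rng.choice(nb)
        nb_new = neighbours(g_new, nodes)
        # Hastings ratio: score difference plus correction for neighbourhood sizes.
        log_acc = (log_score(g_new) - log_score(g)
                   + math.log(len(nb)) - math.log(len(nb_new)))
        if rng.random() < math.exp(min(0.0, log_acc)):
            g, nb = g_new, nb_new
        samples.append(g)
    return samples
```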

46 Overview: introduction; limitations; methodology; application to morphogenesis; application to synthetic biology

47 Homogeneity assumption: interactions don’t change with time

48 Limitations of the homogeneity assumption

49 Example: 4 genes, 10 time points. Data matrix with rows X(1), ..., X(4) (one per gene) and columns t1, ..., t10 (one per time point); entry X_{i,t} is the measurement of gene i at time point t.

50 Supervised learning. Here: 2 components. Same 4 × 10 data matrix: rows X(1), ..., X(4), columns t1, ..., t10, entries X_{i,t}.

51 Changepoint model: parameters can change with time

52 Changepoint model: parameters can change with time

53 Unsupervised learning. Here: 3 components. Same 4 × 10 data matrix: rows X(1), ..., X(4), columns t1, ..., t10, entries X_{i,t}.

54 Extension of the model (parameters θ)

55 θ

56 Parameters θ, number of components k (here: 3), allocation vector h

57 Analytically integrate out the parameters θ. Number of components k (here: 3), allocation vector h
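
Changepoints and the allocation vector are two views of the same segmentation: k - 1 changepoints split the time points 1, ..., T into k contiguous segments, and the allocation vector records, for each time point, which segment (component) it belongs to. A minimal sketch, assuming the convention that a changepoint at τ closes a segment, so that time point τ still belongs to the earlier component; the function names are hypothetical. If each segment has its own parameters and these are integrated out independently, the marginal likelihood for a given allocation vector is a product of per-segment marginal likelihoods, which is why grouping the data by segment is useful.

```python
from bisect import bisect_left


def allocation_vector(T, changepoints):
    """h[t-1] = component (1-based) of time point t, for t = 1..T.

    Convention (an assumption): a changepoint at tau ends a segment, so time
    point tau belongs to the earlier component and tau + 1 to the next one.
    """
    cps = sorted(changepoints)
    # bisect_left counts the changepoints strictly smaller than t.
    return [bisect_left(cps, t) + 1 for t in range(1, T + 1)]


def split_by_component(columns, h):
    """Group the data columns (one per time point) by their component label."""
    segments = {}
    for col, k in zip(columns, h):
        segments.setdefault(k, []).append(col)
    return segments


h = allocation_vector(10, [3, 7])
# h == [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
```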

59 RJMCMC within Gibbs: alternate between sampling from P(network structure | changepoints, data) and P(changepoints | network structure, data); the changepoints are updated with birth, death, and relocation moves

60 Dynamic programming, complexity O(N²)
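
A dynamic programming recursion can sum over all 2^(N-1) ways of placing changepoints among N time points with only O(N²) segment evaluations. The sketch below assumes (our assumption, not a detail given on the slide) a score that factorizes over segments, for example a per-segment marginal likelihood with the parameters integrated out, supplied as a hypothetical function log_seg(s, e) for the segment from time point s to time point e; prior terms on the number of segments are omitted.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_sum_over_segmentations(N, log_seg):
    """log of the sum, over all segmentations of time points 1..N into contiguous
    segments, of the product of the segment scores exp(log_seg(s, e)).

    F[e] holds the log-sum over segmentations of 1..e, obtained by choosing the
    start s of the last segment; each F[e] needs e segment scores, so the total
    cost is O(N^2) calls to log_seg.
    """
    F = [0.0] * (N + 1)  # F[0] = log(1) for the empty prefix
    for e in range(1, N + 1):
        F[e] = log_sum_exp([F[s - 1] + log_seg(s, e) for s in range(1, e + 1)])
    return F[N]
```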

62 Circadian rhythms in Arabidopsis thaliana
- Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Transcriptional profiles at 4 × 13 time points in 2 h intervals under constant light, for 4 experimental conditions

63 Comparison with the literature
- Precision: proportion of identified interactions that are correct
- Recall = sensitivity: proportion of true interactions that we successfully recovered
- Specificity: proportion of non-interactions that are successfully avoided

64 Which interactions from the literature are found? Network over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, and PRR3, with true positives and false negatives marked (blue: activations; red: inhibitions). True positives (TP) = 8, false negatives (FN) = 5: recall = 8/13 = 62%

65 What proportion of predicted interactions is confirmed by the literature? Network with true positives and false positives marked (blue: activations; red: inhibitions). True positives (TP) = 8, false positives (FP) = 13: precision = 8/21 = 38%

66 Network over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, and PRR3: precision = 38%, recall = 62%

67 Literature = gold standard → scores are pessimistic. Precision = 50%, recall = 50%: not the random expectation

68 True positives (TP) = 8, false positives (FP) = 13, false negatives (FN) = 5, true negatives (TN) = 9² - 8 - 13 - 5 = 55. Sensitivity (= recall) = TP/(TP + FN) = 62%. Specificity (proportion of avoided non-interactions) = TN/(TN + FP) = 81%
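
The arithmetic on this slide can be reproduced in a few lines; the function name is hypothetical, and, as on the slide, the true negatives are counted over all 9² ordered gene pairs.

```python
def network_scores(tp, fp, fn, n_nodes):
    """Precision, recall (sensitivity) and specificity for a reconstructed network.

    True negatives are computed from all n_nodes**2 ordered gene pairs
    (self-interactions included), matching the count on the slide.
    """
    tn = n_nodes ** 2 - tp - fp - fn
    return {
        "tn": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


scores = network_scores(tp=8, fp=13, fn=5, n_nodes=9)
# tn = 55; precision ≈ 0.38, recall ≈ 0.62, specificity ≈ 0.81
```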

69 Model extension. So far: non-stationarity in the regulatory process

70 Non-stationarity in the network structure

71 Flexible network structure.

72 Model, parameters θ

73 Use prior knowledge!

74 Flexible network structure.

75 Flexible network structure with regularization: hyperparameter; normalization factor
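
As an illustration of a regularizing prior with a hyperparameter and a normalization factor, assume (a plausible reading of the slide, not a confirmed detail) a prior P(π_h | π_{h-1}, β) = exp(-β d) / Z(β) on the parent set π_h of a gene in segment h, where d is the number of edge changes relative to the previous segment's parent set π_{h-1}. A minimal sketch with hypothetical function names, computing Z(β) by enumeration so that an optional fan-in limit can be imposed:

```python
import math
from itertools import combinations


def log_normalizer(prev, candidates, beta, max_fan_in=None):
    """log Z(beta): log of the sum of exp(-beta * |pi symdiff prev|) over allowed parent sets pi."""
    limit = len(candidates) if max_fan_in is None else max_fan_in
    terms = [
        -beta * len(frozenset(pi) ^ prev)
        for r in range(limit + 1)
        for pi in combinations(candidates, r)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_exponential_prior(parents, prev, candidates, beta, max_fan_in=None):
    """log P(parents | prev, beta) = -beta * (number of edge changes) - log Z(beta)."""
    prev = frozenset(prev)
    changes = len(frozenset(parents) ^ prev)
    return -beta * changes - log_normalizer(prev, candidates, beta, max_fan_in)
```

Without a fan-in limit, the normalizer has the closed form n·log(1 + e^{-β}) for n candidate parents, because each candidate independently contributes a factor 1 + e^{-β}; enumeration is only needed when a fan-in limit breaks this independence.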

76 Flexible network structure with regularization: exponential prior versus binomial prior with conjugate beta hyperprior
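
For the binomial alternative, assume (again a plausible reading, not a confirmed detail) that each of n potential edges of a gene changes between adjacent segments independently with probability p, so the number of changes d is binomial, and that p has a conjugate Beta(a, b) hyperprior. Integrating p out gives, for one specific pattern of d changes, the ratio of Beta functions B(a + d, b + n - d) / B(a, b). A minimal sketch with hypothetical function names:

```python
from math import comb, exp, lgamma


def log_beta_fn(x, y):
    """log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_prior_change_pattern(d, n, a, b):
    """log P(one specific pattern of d changes among n potential edges), p ~ Beta(a, b) integrated out."""
    return log_beta_fn(a + d, b + n - d) - log_beta_fn(a, b)


# Summing over all patterns (comb(n, d) patterns with d changes) gives 1.
n, a, b = 5, 2.0, 3.0
total = sum(comb(n, d) * exp(log_prior_change_pattern(d, n, a, b)) for d in range(n + 1))
```

Compared with the exponential prior, the hyperprior lets the data inform how much change between segments is plausible, instead of fixing the penalty through a single hyperparameter.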

77 NIPS 2010

78 Overview: introduction; limitations; methodology; application to morphogenesis; application to synthetic biology

79 Morphogenesis in Drosophila melanogaster
- Gene expression measurements at 66 time points during the life cycle of Drosophila (Arbeitman et al., Science, 2002)
- Selection of 11 genes involved in muscle development (Zhao et al. (2006), Bioinformatics 22)

80 Can we learn the morphogenetic transitions embryo → larva, larva → pupa, pupa → adult?

81 Average posterior probabilities of transitions. Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult

83 Can we learn changes in the regulatory network structure?

85 Overview: introduction; limitations; methodology; application to morphogenesis; application to synthetic biology

88 Can we learn the switch galactose → glucose? Can we learn the network structure?

89 Task 1: Changepoint detection. Switch of the carbon source: galactose → glucose

91 Task 2: Network reconstruction
- Precision: proportion of identified interactions that are correct
- Recall: proportion of true interactions that we successfully recovered

92 BANJO: conventional homogeneous DBN. TSNI: method based on differential equations. Inference: optimization, “best” network

94 Sample of high-scoring networks

95 Marginal posterior probabilities of the edges: P = 1, P = 0, P = 0.5

97 Part 3: Future work. Strategic issues

101 Phylogenetics  phylogenomics High performance computing

102 How are we getting from here …

103 … to there?!

104 Phylogenetics  phylogenomics High performance computing Collaboration with computer scientists

105 Input: Learn: MCMC

107 Phylogenetics  phylogenomics High performance computing Collaboration with computer scientists Collaboration with biologists

108 Phylogenetics  phylogenomics High performance computing Collaboration with computer scientists Collaboration with biologists MRC University of Glasgow Centre of Excellence in Virology (  virus evolution, virus-host interactions)

109 Scottish Government 2011-2016 science strategy: Climate change and biodiversity

111 Spatial autocorrelation and bio-climate variables. Spatial autocorrelation: Z = weighted abundance from the Markov neighbourhood. Bio-climate variables: Z = temperature, water, …
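
As an illustration of the spatial autocorrelation covariate, assume (hypothetically; the slide does not give the weighting scheme) that the Markov neighbourhood of a site consists of all other sites within a fixed radius and that neighbours are weighted by inverse distance. Z for each site is then the weighted mean abundance of its neighbours; the function name is hypothetical.

```python
import math


def spatial_autocovariate(coords, abundance, radius):
    """Z_i = inverse-distance-weighted mean abundance of the sites within `radius` of site i.

    Sites at distance 0 (duplicate coordinates) are skipped to avoid division by
    zero; a site with no neighbours gets Z = 0.
    """
    z = []
    for i, (xi, yi) in enumerate(coords):
        num = den = 0.0
        for j, (xj, yj) in enumerate(coords):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            if 0.0 < d <= radius:
                w = 1.0 / d
                num += w * abundance[j]
                den += w
        z.append(num / den if den > 0 else 0.0)
    return z


# Usage: four sites; the isolated site at (5, 5) has no neighbours within 2.5.
z = spatial_autocovariate([(0, 0), (1, 0), (0, 2), (5, 5)], [10, 4, 7, 1], radius=2.5)
```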

112 Ecological Informatics 5, 451-464, 2010

113 Collaboration with Andrej Aderhold and V Anne Smith, School of Biology, University of St Andrews

114 Collaboration with Andrej Aderhold (computer scientist) and V Anne Smith (biologist), School of Biology, University of St Andrews

115 Computer science, biology, statistics

116 Phylogenetics  phylogenomics High performance computing Collaboration with computer scientists Collaboration with biologists MRC University of Glasgow Centre of Excellence in Virology (  virus evolution, virus-host interactions) Ecological networks and biodiversity

