Learning regulatory networks from postgenomic data and prior knowledge Dirk Husmeier 1) Biomathematics & Statistics Scotland 2) Centre for Systems Biology.


1 Learning regulatory networks from postgenomic data and prior knowledge Dirk Husmeier 1) Biomathematics & Statistics Scotland 2) Centre for Systems Biology at Edinburgh

2 Raf signalling network From Sachs et al Science 2005 Systems Biology

3

4 unknown high-throughput experiments postgenomic data machine learning statistical methods

5 Bayesian networks (figure: example graph with nodes A, B, C, D, E, F, labelled NODES and EDGES) Marriage between graph theory and probability theory. Directed acyclic graph (DAG) representing conditional independence relations. It is possible to score a network in light of the data: P(D|M), D: data, M: network structure. We can infer how well a particular network explains the observed data.
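The acyclicity requirement on this slide can be made concrete with a small check. The following is a minimal sketch, assuming the network is stored as an adjacency matrix in which `adj[i][j] == 1` means an edge from node i to node j; the function name `is_dag` is illustrative and not taken from the talk.

```python
from collections import deque

def is_dag(adj):
    """Return True if the adjacency matrix describes a directed acyclic graph.

    adj[i][j] == 1 means there is an edge from node i to node j.
    Uses Kahn's algorithm: repeatedly remove nodes without incoming edges.
    """
    n = len(adj)
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    queue = deque(j for j in range(n) if indegree[j] == 0)
    removed = 0
    while queue:
        i = queue.popleft()
        removed += 1
        for j in range(n):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    queue.append(j)
    # If some nodes could never be removed, they lie on a directed cycle.
    return removed == n

chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0
print(is_dag(chain), is_dag(cycle))
```

Any structure proposed during learning has to pass a check of this kind before it can be scored as a Bayesian network.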

6 Model

7 Parameters

8

9

10

11

12 Learning Bayesian networks P(M|D) = P(D|M) P(M) / Z M: Network structure. D: Data
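For a handful of candidate structures, the formula on this slide can be evaluated directly. The sketch below assumes that the log marginal likelihoods log P(D|M) and log priors log P(M) are already available as lists; the function name and the numbers in the usage example are hypothetical. The normalisation constant Z is computed with the log-sum-exp trick for numerical stability.

```python
import math

def posterior_from_log_scores(log_marginal_likelihoods, log_priors):
    """Normalise log P(D|M) + log P(M) over an explicit list of structures M."""
    log_unnorm = [ll + lp for ll, lp in zip(log_marginal_likelihoods, log_priors)]
    m = max(log_unnorm)
    # log Z via log-sum-exp, so very negative scores do not underflow to zero.
    log_z = m + math.log(sum(math.exp(v - m) for v in log_unnorm))
    return [math.exp(v - log_z) for v in log_unnorm]

# Hypothetical scores for three candidate structures with a uniform prior.
probs = posterior_from_log_scores([-120.0, -118.5, -125.0], [math.log(1 / 3)] * 3)
print(probs)
```

Explicit enumeration like this is only feasible for very small networks, because the number of DAGs grows super-exponentially with the number of nodes; that is the motivation for sampling structures with MCMC.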

13

14 MCMC in structure space Madigan & York (1995), Giudici & Castelo (2003)
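A structure MCMC sampler of the kind cited on this slide can be sketched as follows. This is a minimal illustration, assuming single-edge moves (add, delete, reverse), a uniform proposal over the neighbouring DAGs, and a user-supplied `log_score` function standing in for log P(D|M) + log P(M); all function names are illustrative.

```python
import math
import random

def is_acyclic(adj):
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    stack = [j for j in range(n) if indeg[j] == 0]
    seen = 0
    while stack:
        i = stack.pop()
        seen += 1
        for j in range(n):
            if adj[i][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    stack.append(j)
    return seen == n

def neighbours(adj):
    """All DAGs reachable by adding, deleting, or reversing a single edge."""
    n = len(adj)
    result = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            new = [row[:] for row in adj]
            if adj[i][j]:
                new[i][j] = 0
                result.append(new)            # deletion never creates a cycle
                rev = [row[:] for row in new]
                rev[j][i] = 1                 # reversal of i -> j
                if is_acyclic(rev):
                    result.append(rev)
            elif not adj[j][i]:
                new[i][j] = 1                 # addition of i -> j
                if is_acyclic(new):
                    result.append(new)
    return result

def structure_mcmc(log_score, n_nodes, n_steps, seed=0):
    """Metropolis-Hastings over DAGs, starting from the empty graph."""
    rng = random.Random(seed)
    current = [[0] * n_nodes for _ in range(n_nodes)]
    cur_nb, cur_ls = neighbours(current), log_score(current)
    samples = []
    for _ in range(n_steps):
        proposal = rng.choice(cur_nb)
        prop_nb, prop_ls = neighbours(proposal), log_score(proposal)
        # Hastings ratio: score ratio times the ratio of neighbourhood sizes.
        log_accept = prop_ls - cur_ls + math.log(len(cur_nb)) - math.log(len(prop_nb))
        if rng.random() < math.exp(min(0.0, log_accept)):
            current, cur_nb, cur_ls = proposal, prop_nb, prop_ls
        samples.append(current)
    return samples

# Hypothetical toy score that rewards the edge 0 -> 1.
samples = structure_mcmc(lambda g: 3.0 * g[0][1], n_nodes=3, n_steps=3000)
print(sum(g[0][1] for g in samples) / len(samples))
```

The fraction of sampled graphs containing a given edge estimates the marginal posterior probability of that edge. The neighbourhood-size correction is needed because different DAGs have different numbers of valid single-edge moves.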

15 Alternative paradigm: order MCMC Machine Learning, 2004

16

17 Successful application of Bayesian networks to the Raf regulatory network From Sachs et al Science 2005

18 Flow cytometry data Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins. 5400 cells have been measured under 9 different cellular conditions (cues). Optimization with hill climbing. Perfect reconstruction.

19 Microarray data Spellman et al (1998) Cell cycle 73 samples Tu et al (2005) Metabolic cycle 36 samples Genes time

20

21 AUC scores TP for FP=5

22 Part 1 Integration of prior knowledge

23

24 + + + + …

25

26

27 Use TF binding motifs in promoter sequences

28 Biological prior knowledge matrix Biological Prior Knowledge Define the energy of a Graph G Indicates some knowledge about the relationship between genes i and j

29 Prior distribution over networks Deviation between the network G and the prior knowledge B. Graph: G_ij ∈ {0,1}. Prior knowledge: B_ij ∈ [0,1]. “Energy”. Hyperparameter.
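The formula on this slide did not survive in the transcript. A minimal sketch of one plausible form is given here, assuming the energy is the summed absolute deviation between graph entries and prior-knowledge entries, and that the prior is a Gibbs distribution P(G|β) ∝ exp(-β E(G)) with hyperparameter β. Both the functional form and the function names are assumptions for illustration.

```python
import math

def energy(graph, prior_knowledge):
    """Assumed energy: sum over i != j of |B_ij - G_ij|."""
    n = len(graph)
    return sum(
        abs(prior_knowledge[i][j] - graph[i][j])
        for i in range(n)
        for j in range(n)
        if i != j
    )

def log_unnormalised_prior(graph, prior_knowledge, beta):
    """log of exp(-beta * E(G)); the partition function is left out."""
    return -beta * energy(graph, prior_knowledge)

G = [[0, 1], [0, 0]]          # graph with the single edge 0 -> 1
B = [[0.0, 0.9], [0.2, 0.0]]  # hypothetical prior knowledge matrix
print(energy(G, B), log_unnormalised_prior(G, B, beta=2.0))
```

Under this form, β = 0 makes the prior flat over graphs, while larger β pulls the sampled networks towards the prior knowledge.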

30 New contribution Generalization to more sources of prior knowledge Inferring the hyperparameters Bayesian approach

31 Multiple sources of prior knowledge

32 Sample networks and hyperparameters from the posterior distribution

33 Bayesian networks with two sources of prior Data BNs + MCMC Recovered Networks and trade-off parameters Source 1 Source 2 β1 β2

34 Bayesian networks with two sources of prior Data BNs + MCMC Source 1 Source 2 β1 β2 Recovered Networks and trade-off parameters

35 Bayesian networks with two sources of prior Data BNs + MCMC Source 1 Source 2 β1 β2 Recovered Networks and trade-off parameters

36 Sample networks and hyperparameters from the posterior distribution with MCMC Metropolis-Hastings scheme Proposal probabilities

37 Sample networks and hyperparameters from the posterior distribution Metropolis-Hastings scheme Proposal probabilities

38 Sample networks and hyperparameters from the posterior distribution Metropolis-Hastings scheme Proposal probabilities
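The hyperparameter part of such a Metropolis-Hastings scheme can be sketched as follows. This is an illustration only, assuming a prior of the form P(G|β) = exp(-β E(G)) / Z(β), a uniform hyperprior on [0, `beta_max`], and a symmetric uniform proposal window. The graph is held fixed here, whereas the slides describe sampling networks and hyperparameters together. All names and default values are hypothetical.

```python
import math
import random

def sample_beta(energy_value, log_partition, n_steps,
                beta_max=30.0, step=3.0, seed=0):
    """Metropolis-Hastings samples of beta for a fixed graph energy E(G)."""
    rng = random.Random(seed)
    beta = beta_max / 2.0

    def log_target(b):
        # log P(G | beta) up to the constant uniform hyperprior.
        return -b * energy_value - log_partition(b)

    samples = []
    for _ in range(n_steps):
        proposal = beta + rng.uniform(-step, step)
        # Proposals outside the hyperprior support have zero probability.
        if 0.0 <= proposal <= beta_max:
            log_accept = log_target(proposal) - log_target(beta)
            if rng.random() < math.exp(min(0.0, log_accept)):
                beta = proposal
        samples.append(beta)
    return samples

def toy_log_partition(b):
    # Hypothetical partition function for six independent edges with B = 0.9.
    return 6 * math.log(math.exp(-0.1 * b) + math.exp(-0.9 * b))

betas = sample_beta(energy_value=0.6, log_partition=toy_log_partition, n_steps=2000)
print(sum(betas) / len(betas))
```

Because the proposal is symmetric, the acceptance probability reduces to the ratio of target densities. The partition function Z(β) cannot be dropped, since it changes with β.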

39 Prior distribution

40 Rewriting the energy Energy of a network

41 Approximation of the partition function Partition function of an ideal gas
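One way to read the "ideal gas" analogy is that nodes are treated as non-interacting: the acyclicity constraint is ignored, so the partition function factorises into a product over nodes of sums over parent sets. The sketch below follows that reading, with a per-node energy that adds (1 - B_in) for each chosen parent i of node n and B_in for each non-parent. This reading and all names are assumptions, not taken from the slide.

```python
import math
from itertools import combinations

def node_energy(node, parents, prior_knowledge):
    """Assumed local energy of `node` given its parent set."""
    total = 0.0
    for i in range(len(prior_knowledge)):
        if i == node:
            continue
        b = prior_knowledge[i][node]
        total += (1.0 - b) if i in parents else b
    return total

def log_partition_ideal_gas(prior_knowledge, beta, max_parents=None):
    """log Z(beta), approximated as a product of per-node partition functions."""
    n = len(prior_knowledge)
    if max_parents is None:
        max_parents = n - 1
    log_z = 0.0
    for node in range(n):
        others = [i for i in range(n) if i != node]
        z_node = 0.0
        # Sum over all parent sets up to the fan-in limit.
        for k in range(max_parents + 1):
            for parents in combinations(others, k):
                z_node += math.exp(-beta * node_energy(node, set(parents), prior_knowledge))
        log_z += math.log(z_node)
    return log_z

B = [[0.0, 0.9, 0.5], [0.2, 0.0, 0.7], [0.5, 0.1, 0.0]]  # hypothetical values
print(log_partition_ideal_gas(B, beta=2.0))
```

The optional `max_parents` argument restricts the fan-in, which keeps the sum over parent sets tractable for larger networks. Without a restriction the result equals the exact partition function over all directed graphs, cyclic ones included.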

42 Evaluation on the Raf regulatory network From Sachs et al Science 2005

43 Evaluation: Raf signalling pathway Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells. Deregulation → carcinogenesis. Extensively studied in the literature → gold standard network.

44 Data Prior knowledge

45 Flow cytometry data Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins. 5400 cells have been measured under 9 different cellular conditions (cues). Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments.

46 Microarray example Spellman et al (1998) Cell cycle 73 samples Tu et al (2005) Metabolic cycle 36 samples Genes time

47 Data Prior knowledge

48 Prior knowledge from KEGG

49 Prior distribution

50 Prior knowledge from KEGG, Raf network (figure; edge values shown: 0.25, 0, 0.5, 0, 0.87, 0, 1, 0.5, 0, 0, 0, 1, 0.71, 0, 0)

51 Data and prior knowledge + KEGG + Random

52 Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

53 Sampled values of the hyperparameters

54 Bayesian networks with two sources of prior knowledge Data BNs + MCMC Recovered Networks and trade-off parameters Random KEGG β1 β2

55 Bayesian networks with two sources of prior knowledge Data BNs + MCMC Random KEGG β1 β2 Recovered Networks and trade-off parameters

56 Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

57 Performance evaluation: ROC curves. We use the Area Under the Receiver Operating Characteristic Curve (AUC). AUC = 1: perfect prediction. AUC = 0.5: random expectation. In general, 0.5 < AUC < 1.
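The AUC can be computed without tracing the ROC curve, using its equivalence to the probability that a randomly chosen true edge is ranked above a randomly chosen non-edge. The sketch below assumes each candidate edge has a score (for example its marginal posterior probability) and a 0/1 label from the gold-standard network; the function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of edges and non-edges."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    # A true edge scored above a non-edge counts 1, a tie counts 0.5.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise form is quadratic in the number of candidate edges, which is unproblematic for an 11-node network.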

58 Alternative performance evaluation: true positive (TP) scores at 5 false positive (FP) counts, for BN, GGM and RN.
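A TP score at a fixed FP count can be sketched as a walk down the ranked edge list. The code below assumes the same inputs as an AUC computation (edge scores and 0/1 gold-standard labels) and counts the true edges recovered before the false-positive budget is exceeded. Tie handling is ignored, and the function name is illustrative.

```python
def tp_at_fp(scores, labels, fp_budget=5):
    """Number of true positives ranked above the (fp_budget + 1)-th false positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
            if fp > fp_budget:
                break
    return tp

scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print(tp_at_fp(scores, labels))  # 4
```

Unlike the AUC, this score looks only at the top of the ranking, which is the part usually followed up experimentally.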

59

60 Flow cytometry data and KEGG

61 Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

62 Learning the trade-off hyperparameter Repeat MCMC simulations for a large set of fixed hyperparameters β. Obtain AUC scores for each value of β. Compare with the proposed scheme in which β is automatically inferred. Mean and standard deviation of the sampled trade-off parameter.

63

64 Flow cytometry data and KEGG

65 Part 2 Combining data from different experimental conditions

66

67 What if we have multiple data sets obtained under different experimental conditions? Example: Cytokine network. Conditions: infection; treatment with IFN; infection and treatment with IFN. Collaboration with Peter Ghazal, Paul Dickinson, Kevin Robertson, Thorsten Forster & Steve Watterson.

68

69

70

71 data Monolithic Individual

72 data Monolithic Individual Propose a compromise between the two

73 Figure labels: M1, M2, ..., MI; β1, β2, ..., βI; D1, D2, ..., DI; M*. Compromise between the two previous ways of combining the data

74 BGe or BDe Ideal gas approximation

75 MCMC

76 Empirical evaluation Real application: macrophages infected with CMV and pre-treated with IFN-γ No gold-standard Simulated data from the Raf signalling network

77 Simulated data Raf network

78 Simulated data

79 v-Raf network Simulated data

80 Raf network v-Raf network Simulated data

81

82 Simulated Data Weights between nodes are different for different data sets.

83 Simulated Data Weights between nodes are different for different data sets.

84 5 data sets, 100 data points each: 1 random data set (pure noise), 1 data set from the modified network, 3 data sets from the Raf network, but with different regulation strengths

85 Figure labels: M1, M2, ..., MI; β1, β2, ..., βI; D1, ..., DI; M*. Compromise between the two previous ways of combining the data

86 Corrupt, noisy data Modified network Raf network Posterior distribution of β

87 Figure labels: M1, M2, ..., MI; β1, β2, ..., βI; D1, ..., DI; M*. Compromise between the two previous ways of combining the data 0

88 5 data sets, 100 data points each: 1 random data set (pure noise), 1 data set from the modified network, 3 data sets from the Raf network, but with different regulation strengths

89 Corrupt, noisy data Modified network Raf network

90 Network reconstruction accuracy

91 Convergence problems Coupling method Std MCMC

92 Data sets: 1 rand (blue) 3 raf 1 vraf (cyan) Traceplots of sampled hyperparameters; Gaussian data set log likelihood

93 Conclusions – Part 2 The MCMC simulations have convergence problems. If the simulations “converge”: – Random data set is identified and switched off. – Data from a slightly modified network are also identified. – The reconstructed network outperforms the two competing approaches. Future work: The convergence problems need to be addressed.

94 Part 3 Markov chain Monte Carlo

95

96 Learning Bayesian networks P(M|D) = P(D|M) P(M) / Z M: Network structure. D: Data

97 MCMC in structure space Madigan & York (1995), Giudici & Castelo (2003)

98

99 Main idea Propose new parents from the distribution: Identify those new parents that are involved in the formation of directed cycles. Orphan them, and sample new parents for them subject to the acyclicity constraint.

100 1) Select a node 2) Sample new parents 3) Find directed cycles 4) Orphan “loopy” parents 5) Sample new parents for these parents
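The five steps above can be sketched as a proposal function. This is an illustration under loud assumptions: parent sets are drawn uniformly at random with a fan-in limit, standing in for the proposal distribution whose formula is missing from the transcript, and the Hastings factor and acceptance step are omitted. All function names are hypothetical.

```python
import random

def descendants(adj, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        i = stack.pop()
        for j, edge in enumerate(adj[i]):
            if edge and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_acyclic(adj):
    return all(i not in descendants(adj, i) for i in range(len(adj)))

def random_parents(candidates, max_parents, rng):
    # Stand-in proposal: uniform parent-set size, then uniform subset.
    k = rng.randint(0, min(max_parents, len(candidates)))
    return rng.sample(candidates, k)

def propose(adj, node, rng, max_parents=3):
    n = len(adj)
    new = [row[:] for row in adj]
    # 1)-2) Replace the parents of the selected node, ignoring acyclicity.
    for i in range(n):
        new[i][node] = 0
    parents = random_parents([i for i in range(n) if i != node], max_parents, rng)
    for p in parents:
        new[p][node] = 1
    # 3) A new parent closes a directed cycle iff it is a descendant of the node.
    reachable = descendants(new, node)
    loopy = [p for p in parents if p in reachable]
    # 4) Orphan the loopy parents, which breaks every such cycle.
    for p in loopy:
        for i in range(n):
            new[i][p] = 0
    # 5) Resample their parents among nodes that are not their descendants.
    for p in loopy:
        forbidden = descendants(new, p) | {p}
        allowed = [i for i in range(n) if i not in forbidden]
        for q in random_parents(allowed, max_parents, rng):
            new[q][p] = 1
    return new

rng = random.Random(1)
graph = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
for _ in range(5):
    graph = propose(graph, rng.randrange(4), rng)
print(graph, is_acyclic(graph))
```

A single move of this kind can change several parent sets at once, which is a larger step than adding, deleting, or reversing one edge. To obtain a valid sampler, each proposal would still have to be accepted or rejected with the acceptance probability discussed on the following slides.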

101 Mathematical Challenge: Show that condition of detailed balance is satisfied. Derive the Hastings factor … … which is a function of various partition functions

102 Acceptance probability

103

104 Summary Learning Bayesian networks from postgenomic data Integration of biological prior knowledge Learning regulatory networks from heterogeneous data obtained under different experimental conditions Improving MCMC

105 Acknowledgements Funding from the Scottish Government Rural and Environment Research and Analysis Directorate (RERAD) Collaboration with Adriano Werhli Marco Grzegorczyk

106 Adriano Werhli Marco Grzegorczyk

107 Thank you! Any questions?

