Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.

1

2

3 Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.

4 Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Can we improve the network reconstruction by systematically integrating different sources of biological prior knowledge?

5


9 Which sources of prior knowledge are reliable? How do we trade off the different sources of prior knowledge against each other and against the data?

10 Overview of the talk Revision: Bayesian networks Integration of prior knowledge Empirical evaluation

11 Overview of the talk Revision: Bayesian networks Integration of prior knowledge Empirical evaluation

12 Bayesian networks [Figure: example graph with nodes A, B, C, D, E, F joined by directed edges.] Marriage between graph theory and probability theory. Directed acyclic graph (DAG) representing conditional independence relations. It is possible to score a network in light of the data: P(D|M), where D is the data and M is the network structure. We can infer how well a particular network explains the observed data.
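A minimal sketch of what scoring a network against data can look like. It assumes a linear-Gaussian model and uses the BIC as a stand-in for log P(D|M); the slide does not say which score is used, and the names `is_dag` and `bic_score` are illustrative only.

```python
import numpy as np

def is_dag(adj):
    """Return True if adj (adj[i, j] = 1 means i -> j) has no directed cycle."""
    adj = np.asarray(adj, dtype=int)
    remaining = list(range(adj.shape[0]))
    while remaining:
        # A node with no parent among the remaining nodes can be peeled off.
        sources = [j for j in remaining if not adj[remaining, j].any()]
        if not sources:
            return False  # every remaining node has a parent, so a cycle exists
        remaining = [j for j in remaining if j not in sources]
    return True

def bic_score(data, adj):
    """BIC stand-in for log P(D|M) under an assumed linear-Gaussian model."""
    data = np.asarray(data, dtype=float)
    adj = np.asarray(adj, dtype=int)
    n, n_nodes = data.shape
    score = 0.0
    for j in range(n_nodes):
        parents = np.flatnonzero(adj[:, j])
        # Regress node j on an intercept plus its parents.
        X = np.column_stack([np.ones(n), data[:, parents]])
        coef, *_ = np.linalg.lstsq(X, data[:, j], rcond=None)
        resid = data[:, j] - X @ coef
        sigma2 = resid @ resid / n
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
        n_params = len(parents) + 2  # regression weights, intercept, variance
        score += loglik - 0.5 * n_params * np.log(n)
    return score
```

With data in which one variable strongly drives another, the graph containing that edge should score higher than the empty graph, while the two orientations of the edge score the same.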

13

14 Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

15 Bayesian networks versus causal networks [Figure: two graphs over nodes A, B, C: the true causal graph, and the graph obtained when node A is unknown.]

16 Bayesian networks versus causal networks Equivalence classes: networks with the same score P(D|M). Equivalent networks cannot be distinguished in light of the data. [Figure: four alternative graphs over nodes A, B, C.]
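Equivalence can be checked without any data. The sketch below relies on the standard result that two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures. The edge lists used in the usage note are illustrative and are not necessarily the graphs drawn on the slide.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures means same equivalence class."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))
```

For three nodes, the chain A -> B -> C, its reversal, and the fork A <- B -> C fall into one class, whereas the collider A -> B <- C stands alone.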

17 Symmetry breaking [Figure: graphs over nodes A, B, C, labelled "Prior knowledge".] P(M|D) = P(D|M) P(M) / Z. D: data. M: network structure.

18 P(D|M)

19 Prior knowledge: B is a transcription factor with binding sites in the upstream regions of A and C P(M)

20 P(M|D) ~ P(D|M) P(M)

21 Learning Bayesian networks P(M|D) = P(D|M) P(M) / Z M: Network structure. D: Data
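For a handful of candidate networks the posterior can be computed directly. The sketch below normalises in log space; the function name and the numbers in the usage note are made up for illustration. For realistic numbers of nodes the sum defining Z runs over far too many DAGs to enumerate, which is why sampling is needed.

```python
import numpy as np

def posterior_over_networks(log_likelihoods, log_priors):
    """P(M|D) for an explicit list of candidate networks M."""
    log_joint = (np.asarray(log_likelihoods, dtype=float)
                 + np.asarray(log_priors, dtype=float))
    log_z = np.logaddexp.reduce(log_joint)  # log of the normalising constant Z
    return np.exp(log_joint - log_z)
```

If three networks are equivalent and so have identical likelihoods, the posterior simply reproduces the prior: the prior is what breaks the symmetry.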

22

23

24 Overview of the talk Revision: Bayesian networks Integration of prior knowledge Empirical evaluation

25

26 Use TF binding motifs in promoter sequences

27 Biological prior knowledge matrix: indicates some knowledge about the relationship between genes i and j.

28 Biological prior knowledge matrix: indicates some knowledge about the relationship between genes i and j. Define the energy of a graph G.

29 Notation Prior knowledge matrix: P → B (for “belief”). Network structure: G (for “graph”) or M (for “model”). P: probabilities.

30 Prior distribution over networks Energy of a network
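The formulas on this slide did not survive in the transcript. As a sketch only, the code below assumes a common choice: the energy is the total absolute deviation between the graph G and the belief matrix B, E(G) = sum over i, j of |B_ij - G_ij|, and the prior is a Gibbs distribution P(G|β) ∝ exp(-β E(G)). Both the functional form and the function names are assumptions, not taken from the slides.

```python
import numpy as np

def energy(graph, belief):
    """Assumed energy: sum of |B_ij - G_ij| over off-diagonal entries."""
    diff = np.abs(np.asarray(belief, dtype=float) - np.asarray(graph, dtype=float))
    np.fill_diagonal(diff, 0.0)  # self-loops are not part of the model
    return float(diff.sum())

def prior_over_graphs(graphs, belief, beta):
    """Exact Gibbs prior over an explicit list of candidate graphs."""
    log_w = np.array([-beta * energy(g, belief) for g in graphs])
    return np.exp(log_w - np.logaddexp.reduce(log_w))
```

With two nodes there are only three DAGs (no edge, A -> B, B -> A), so the prior can be normalised by enumeration. Setting β = 0 gives a flat prior, and larger β concentrates mass on graphs that agree with B.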

31 Sample networks and hyperparameters from the posterior distribution. Capture intrinsic inference uncertainty. Learn the trade-off parameters automatically. P(M|D) = P(D|M) P(M) / Z

32 Prior distribution over networks Energy of a network

33 Rewriting the energy Energy of a network

34 Approximation of the partition function Partition function of a perfect gas
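The slide's derivation is missing from the transcript. One way to read the "perfect gas" analogy is that edges are treated as non-interacting: if the acyclicity constraint is ignored, the sum over graphs factorises into a product over edges. The sketch below assumes the energy E(G) = sum of |B_ij - G_ij| and that factorisation; both are assumptions, and the talk may use a refinement such as a limit on the number of parents per node.

```python
import numpy as np

def log_partition_factorised(belief, beta):
    """log Z(beta) when every off-diagonal edge is summed independently."""
    belief = np.asarray(belief, dtype=float)
    off_diag = ~np.eye(belief.shape[0], dtype=bool)
    b = belief[off_diag]
    # Each edge contributes exp(-beta * b) if absent and exp(-beta * (1 - b)) if present.
    return float(np.logaddexp(-beta * b, -beta * (1.0 - b)).sum())
```

The result equals the brute-force sum over all directed graphs, cyclic ones included, so it is an upper bound on the partition function restricted to DAGs.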

35 Multiple sources of prior knowledge

36 MCMC sampling scheme

37 Sample networks and hyperparameters from the posterior distribution. Metropolis-Hastings scheme. Proposal probabilities.
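The proposal probabilities themselves are not in the transcript. As a hedged sketch, the code below shows only the hyperparameter move of such a scheme, with the network held fixed. It assumes a uniform hyperprior on [0, beta_max], a symmetric uniform proposal, the energy E(G) = sum of |B_ij - G_ij|, and a partition function that ignores acyclicity. All names and default values are illustrative.

```python
import numpy as np

def _off_diag(m):
    m = np.asarray(m, dtype=float)
    return m[~np.eye(m.shape[0], dtype=bool)]

def log_prior(graph, belief, beta):
    """log P(G | beta) under the assumed energy and factorised partition function."""
    b, g = _off_diag(belief), _off_diag(graph)
    energy = np.abs(b - g).sum()
    log_z = np.logaddexp(-beta * b, -beta * (1.0 - b)).sum()
    return float(-beta * energy - log_z)

def mh_step_beta(beta, graph, belief, rng, width=3.0, beta_max=30.0):
    """One Metropolis-Hastings update of beta with the graph held fixed."""
    proposal = beta + rng.uniform(-width, width)  # symmetric proposal
    if not 0.0 <= proposal <= beta_max:
        return beta  # outside the support of the uniform hyperprior: reject
    # The data likelihood does not depend on beta, so only the prior ratio matters.
    log_ratio = log_prior(graph, belief, proposal) - log_prior(graph, belief, beta)
    if np.log(rng.uniform()) < log_ratio:
        return proposal
    return beta
```

In a full sampler this move would alternate with moves that change the network structure. When the current graph agrees with the belief matrix, the chain drifts to large β; when it contradicts the belief matrix, β is driven towards zero.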

38 Bayesian networks with biological prior knowledge Biological prior knowledge: information about the interactions between the nodes. We use two distinct sources of biological prior knowledge. Each source is associated with its own trade-off parameter: β1 and β2. The trade-off parameter indicates how much biological prior information is used. The trade-off parameters are inferred. They are not set by the user!
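A sketch of how two sources might enter the prior, assuming each source k has its own belief matrix and energy E_k(G), and that the energies are combined as β1 E_1(G) + β2 E_2(G). This additive form and the function names are assumptions; the slide gives no formula.

```python
import numpy as np

def energy(graph, belief):
    """Assumed energy: sum of |B_ij - G_ij| over off-diagonal entries."""
    diff = np.abs(np.asarray(belief, dtype=float) - np.asarray(graph, dtype=float))
    np.fill_diagonal(diff, 0.0)
    return float(diff.sum())

def log_prior_unnormalised(graph, beliefs, betas):
    """-(beta_1 * E_1(G) + beta_2 * E_2(G) + ...), one term per source."""
    return -sum(beta * energy(graph, b) for beta, b in zip(betas, beliefs))
```

A source whose trade-off parameter is zero drops out entirely, and a source whose entries are all 0.5 assigns the same energy to every graph, so it cannot favour one network over another.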

39 Bayesian networks with two sources of prior [Diagram labels: Data; Source 1 (β1); Source 2 (β2); BNs + MCMC; Recovered networks and trade-off parameters.]

40 Bayesian networks with two sources of prior [Diagram labels: Data; Source 1 (β1); Source 2 (β2); BNs + MCMC; Recovered networks and trade-off parameters.]

41 Bayesian networks with two sources of prior [Diagram labels: Data; Source 1 (β1); Source 2 (β2); BNs + MCMC; Recovered networks and trade-off parameters.]

42 Overview of the talk Revision: Bayesian networks Integration of prior knowledge Empirical evaluation

43 Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

44 Raf regulatory network From Sachs et al., Science 2005

45 Raf regulatory network

46 Evaluation: Raf signalling pathway Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells. Deregulation → carcinogenesis. Extensively studied in the literature → gold standard network.

47 Data Prior knowledge

48 Flow cytometry data Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins. 5400 cells have been measured under 9 different cellular conditions (cues). Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments.
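The downsampling step can be sketched as follows. The slide does not say how the five subsets were drawn, so this assumes independent draws without replacement within each subset; the function name and defaults are illustrative.

```python
import numpy as np

def downsample(data, n_instances=100, n_subsets=5, seed=0):
    """Draw n_subsets random subsets of n_instances rows each."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    return [
        data[rng.choice(data.shape[0], size=n_instances, replace=False)]
        for _ in range(n_subsets)
    ]
```

Applied to an array of shape (5400, 11), this returns five arrays of shape (100, 11).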

49 Microarray example Spellman et al (1998): cell cycle, 73 samples. Tu et al (2005): metabolic cycle, 36 samples. [Figure: expression data; axes: genes, time.]

50 Data Prior knowledge

51 Flow cytometry data and KEGG KEGG PATHWAYS are a collection of manually drawn pathway maps representing our knowledge of molecular interactions and reaction networks. http://www.genome.jp/kegg/

52 Prior knowledge from KEGG

53 Prior distribution

54 The data and the priors + KEGG + Random

55 Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

56 Bayesian networks with two sources of prior [Diagram labels: Data; Source 1 (β1); Source 2 (β2); BNs + MCMC; Recovered networks and trade-off parameters.]

57 Bayesian networks with two sources of prior [Diagram labels: Data; Source 1 (β1); Source 2 (β2); BNs + MCMC; Recovered networks and trade-off parameters.]

58 Sampled values of the hyperparameters

59 Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

60 How can we evaluate the reconstruction accuracy?

61

62 Flow cytometry data and KEGG

63 Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

64 Learning the trade-off hyperparameter Repeat MCMC simulations for a large set of fixed hyperparameters β. Obtain AUC scores for each value of β. Compare with the proposed scheme in which β is automatically inferred. Mean and standard deviation of the sampled trade-off parameter.
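A sketch of how an AUC score can be obtained from a matrix of posterior edge probabilities and a gold-standard network. It assumes directed edges are scored individually and uses the pairwise (Mann-Whitney) form of the area under the ROC curve; the talk may evaluate edges differently, and the function name is illustrative.

```python
import numpy as np

def edge_auc(edge_scores, gold):
    """Area under the ROC curve for off-diagonal edge scores against a gold standard."""
    scores = np.asarray(edge_scores, dtype=float)
    gold = np.asarray(gold, dtype=bool)
    off_diag = ~np.eye(scores.shape[0], dtype=bool)
    s, y = scores[off_diag], gold[off_diag]
    pos, neg = s[y], s[~y]
    # Fraction of (true edge, non-edge) pairs ranked correctly; ties count one half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())
```

An AUC of 1 means every true edge is ranked above every non-edge, and 0.5 corresponds to a random ranking. Repeating this for each fixed β gives the reference curve against which the inferred β is compared.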

65

66 Conclusion Bayesian scheme for the systematic integration of different sources of biological prior knowledge. The method can automatically evaluate how useful the different sources of prior knowledge are. We get an improvement in the regulatory network reconstruction. This improvement is close to optimal.

67 Thank you

