Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.

Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Can we improve the network reconstruction by systematically integrating different sources of biological prior knowledge?

Which sources of prior knowledge are reliable? How do we trade off the different sources of prior knowledge against each other and against the data?

Overview of the talk: Revision of Bayesian networks. Integration of prior knowledge. Empirical evaluation.

Bayesian networks [Figure: example graph over nodes A, B, C, D, E, F, with labels NODES and EDGES.] Marriage between graph theory and probability theory. Directed acyclic graph (DAG) representing conditional independence relations. A network can be scored in light of the data: P(D|M), where D is the data and M the network structure. The score tells us how well a particular network explains the observed data.

Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

Bayesian networks versus causal networks [Figure: two graphs over nodes A, B, C. Panel labels: "True causal graph" and "Node A unknown".]

Bayesian networks versus causal networks. Equivalence classes: networks with the same score P(D|M). Equivalent networks cannot be distinguished in light of the data. [Figure: four graphs over nodes A, B, C.]
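The equivalence statement can be checked mechanically. The sketch below is an illustration and not code from the talk: it relies on the standard result (Verma and Pearl) that two DAGs over the same nodes belong to the same equivalence class exactly when they share the same skeleton and the same v-structures. The three-node graphs and all function names are hypothetical examples.

```python
def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders p -> child <- q whose two parents are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for p in ps:
            for q in ps:
                if p < q and frozenset((p, q)) not in skel:
                    found.add((p, child, q))
    return found


def markov_equivalent(edges_1, edges_2):
    """Same skeleton and same v-structures (graphs over the same nodes)."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


# Hypothetical three-node examples; edges are (parent, child) pairs.
chain = [("A", "B"), ("B", "C")]
reverse_chain = [("C", "B"), ("B", "A")]
fork = [("B", "A"), ("B", "C")]
collider = [("A", "B"), ("C", "B")]

print(markov_equivalent(chain, reverse_chain))  # True
print(markov_equivalent(chain, fork))           # True
print(markov_equivalent(chain, collider))       # False
```

The chain, the reversed chain and the fork encode the same conditional independence (A and C independent given B), so the data alone cannot orient their edges; only the collider stands apart.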

Symmetry breaking [Figure: graphs over nodes A, B, C, with the label "Prior knowledge".] P(M|D) = P(D|M) P(M) / Z. D: data. M: network structure.

P(D|M)

Prior knowledge: B is a transcription factor with binding sites in the upstream regions of A and C P(M)

P(M|D) ~ P(D|M) P(M)

Learning Bayesian networks P(M|D) = P(D|M) P(M) / Z M: Network structure. D: Data

Use TF binding motifs in promoter sequences

Biological prior knowledge matrix. Each entry (i, j) indicates some knowledge about the relationship between genes i and j.

Biological prior knowledge matrix. Each entry (i, j) indicates some knowledge about the relationship between genes i and j. Define the energy of a graph G.

Notation. Prior knowledge matrix: P → B (for “belief”). Network structure: G (for “graph”) or M (for “model”). P: probabilities.

Prior distribution over networks Energy of a network
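The formulas on this slide did not survive extraction. As a hedged illustration only, the sketch below assumes one common choice: the energy of a graph G is the total absolute deviation between its adjacency matrix and the prior knowledge matrix B, E(G) = sum over i ≠ j of |B[i][j] - G[i][j]|, and the prior is a Gibbs distribution P(G|β) = exp(-β E(G)) / Z(β). The function names, the example matrices and the decision to skip the diagonal are assumptions, not content of the slide.

```python
def energy(B, G):
    """Assumed energy: sum of |B[i][j] - G[i][j]| over all i != j.

    B holds prior beliefs in [0, 1]; G is a 0/1 adjacency matrix with
    G[i][j] == 1 meaning an edge from node i to node j. The diagonal is
    skipped because a DAG has no self-loops (an assumption of this sketch).
    """
    n = len(B)
    return sum(abs(B[i][j] - G[i][j])
               for i in range(n) for j in range(n) if i != j)


def log_prior_unnormalised(B, G, beta):
    """log P(G | beta) up to the constant -log Z(beta)."""
    return -beta * energy(B, G)


# Hypothetical prior: an edge 0 -> 1 is believed likely, 1 -> 0 unlikely,
# and 0.5 encodes "no prior knowledge" for the remaining pairs.
B = [[0.0, 0.9, 0.5],
     [0.1, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
G_agree = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]     # contains 0 -> 1
G_disagree = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]  # contains 1 -> 0

print(energy(B, G_agree), energy(B, G_disagree))
```

Under these assumptions the graph that agrees with the prior has the lower energy and therefore the higher prior probability for any β > 0, while β = 0 makes the prior flat.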

Sample networks and hyperparameters from the posterior distribution. Capture intrinsic inference uncertainty. Learn the trade-off parameters automatically. P(M|D) = P(D|M) P(M) / Z

Prior distribution over networks Energy of a network

Rewriting the energy Energy of a network

Approximation of the partition function Partition function of a perfect gas
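The slide's formula is missing, so the following is a sketch under stated assumptions rather than the talk's derivation. It assumes the energy decomposes into per-node terms that depend only on a node's parent set, and that the acyclicity constraint is ignored when summing over graphs. The sum over all graphs then factorises into a product of per-node sums, much as the partition function of a perfect (ideal) gas factorises over non-interacting particles. The names `node_energy`, `log_partition_approx` and `max_fan_in` are hypothetical.

```python
import math
from itertools import combinations


def node_energy(B, n, parents):
    """Assumed local energy of node n given its parent set.

    Each parent i contributes 1 - B[i][n]; each non-parent contributes B[i][n].
    """
    return sum((1.0 - B[i][n]) if i in parents else B[i][n]
               for i in range(len(B)) if i != n)


def log_partition_approx(B, beta, max_fan_in):
    """log Z(beta) when acyclicity is ignored, so Z is a product over nodes."""
    N = len(B)
    log_z = 0.0
    for n in range(N):
        others = [i for i in range(N) if i != n]
        z_n = 0.0
        # Sum over every parent set of node n with at most max_fan_in parents.
        for k in range(max_fan_in + 1):
            for parents in combinations(others, k):
                z_n += math.exp(-beta * node_energy(B, n, set(parents)))
        log_z += math.log(z_n)
    return log_z


# Hypothetical flat prior over 3 nodes: every off-diagonal belief is 0.5.
B_flat = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
print(log_partition_approx(B_flat, beta=0.0, max_fan_in=2))
```

With β = 0 the result is the log of the number of directed graphs without self-loops (64 for three nodes), which includes cyclic graphs; that over-count is the price of the factorisation.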

Multiple sources of prior knowledge

MCMC sampling scheme

Sample networks and hyperparameters from the posterior distribution. Metropolis-Hastings scheme. Proposal probabilities.
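The sampling scheme itself is not recoverable from the transcript. The sketch below shows only one kind of move, a Metropolis-Hastings update of a single trade-off parameter β with the network held fixed, and it rests on several assumptions: a Gibbs prior P(G|β) = exp(-β E(G)) / Z(β), a partition function approximated by ignoring acyclicity, a uniform hyperprior on [0, beta_max], and a symmetric uniform proposal. Because the likelihood P(D|G) does not depend on β, it cancels in the acceptance ratio for this move. All names and default values are hypothetical.

```python
import math
import random
from itertools import combinations


def node_energy(B, n, parents):
    """Assumed local energy: parents cost 1 - B[i][n], non-parents cost B[i][n]."""
    return sum((1.0 - B[i][n]) if i in parents else B[i][n]
               for i in range(len(B)) if i != n)


def energy(B, G):
    """Total energy as a sum of local energies (G[i][n] == 1 means i -> n)."""
    N = len(B)
    return sum(node_energy(B, n, {i for i in range(N) if G[i][n]})
               for n in range(N))


def log_partition(B, beta, max_fan_in):
    """Approximate log Z(beta): product over nodes, acyclicity ignored."""
    N = len(B)
    total = 0.0
    for n in range(N):
        others = [i for i in range(N) if i != n]
        z_n = sum(math.exp(-beta * node_energy(B, n, set(ps)))
                  for k in range(max_fan_in + 1)
                  for ps in combinations(others, k))
        total += math.log(z_n)
    return total


def log_prior(B, G, beta, max_fan_in):
    return -beta * energy(B, G) - log_partition(B, beta, max_fan_in)


def mh_step_beta(B, G, beta, rng, beta_max=30.0, width=3.0, max_fan_in=2):
    """One Metropolis-Hastings update of beta for a fixed graph G."""
    proposal = beta + rng.uniform(-width, width)  # symmetric proposal
    if not 0.0 <= proposal <= beta_max:
        return beta  # outside the uniform hyperprior's support: reject
    log_ratio = (log_prior(B, G, proposal, max_fan_in)
                 - log_prior(B, G, beta, max_fan_in))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return beta


# Hypothetical example in which the prior agrees with the current graph.
rng = random.Random(0)
B = [[0.0, 0.9, 0.1], [0.1, 0.0, 0.9], [0.1, 0.1, 0.0]]
G = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
beta, trace = 1.0, []
for _ in range(200):
    beta = mh_step_beta(B, G, beta, rng)
    trace.append(beta)
print(min(trace), max(trace))
```

A full sampler would alternate moves like this with structure moves on the network (for example adding, deleting or reversing an edge), where the likelihood does enter the acceptance ratio.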

Bayesian networks with biological prior knowledge. Biological prior knowledge: information about the interactions between the nodes. We use two distinct sources of biological prior knowledge. Each source is associated with its own trade-off parameter: β1 and β2. The trade-off parameter indicates how much biological prior information is used. The trade-off parameters are inferred. They are not set by the user!

Bayesian networks with two sources of prior [Diagram: Data, Source 1 (β1) and Source 2 (β2) feed into BNs + MCMC, which returns the recovered networks and trade-off parameters.]

Evaluation Can the method automatically evaluate how useful the different sources of prior knowledge are? Do we get an improvement in the regulatory network reconstruction? Is this improvement optimal?

Raf regulatory network. From Sachs et al., Science 2005.

Raf regulatory network

Evaluation: Raf signalling pathway. Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells. Deregulation → carcinogenesis. Extensively studied in the literature → gold standard network.

Data Prior knowledge

Flow cytometry data. Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins. 5400 cells have been measured under 9 different cellular conditions (cues). Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments.
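The downsampling step could be sketched as follows. The transcript does not say how the subsets were drawn, so this assumes uniform random sampling and reads "separate" as non-overlapping; the function name and the seed are hypothetical.

```python
import random


def disjoint_subsets(n_cells, subset_size=100, n_subsets=5, seed=0):
    """Draw n_subsets non-overlapping index sets of subset_size cells each."""
    if subset_size * n_subsets > n_cells:
        raise ValueError("not enough cells for disjoint subsets")
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)  # uniform random order of the cells
    return [sorted(indices[k * subset_size:(k + 1) * subset_size])
            for k in range(n_subsets)]


# 5400 measured cells, reduced to 5 data sets of 100 instances each.
subsets = disjoint_subsets(5400)
print(len(subsets), [len(s) for s in subsets])
```

Each index set would then select 100 rows of the 5400 x 11 measurement table, giving five small data sets whose size is closer to that of a typical microarray study.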

Microarray example. Spellman et al. (1998): cell cycle, 73 samples. Tu et al. (2005): metabolic cycle, 36 samples. [Figure axes: genes, time.]

Data Prior knowledge

Flow cytometry data and KEGG. KEGG PATHWAYS are a collection of manually drawn pathway maps representing our knowledge of molecular interactions and reaction networks. http://www.genome.jp/kegg/

Prior knowledge from KEGG

Prior distribution

The data and the priors [Diagram: data + KEGG + Random.]

Bayesian networks with two sources of prior [Diagram: Data, Source 1 (β1) and Source 2 (β2) feed into BNs + MCMC, which returns the recovered networks and trade-off parameters.]

Sampled values of the hyperparameters

How can we evaluate the reconstruction accuracy?

Flow cytometry data and KEGG

Learning the trade-off hyperparameter. Repeat MCMC simulations for a large set of fixed hyperparameters β. Obtain AUC scores for each value of β. Compare with the proposed scheme, in which β is automatically inferred. [Figure annotation: mean and standard deviation of the sampled trade-off parameter.]
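One way to obtain such an AUC score, assuming it refers to the area under the ROC curve, is to rank all candidate edges by their estimated posterior probability and compare the ranking with the gold standard network. The sketch below uses the rank (Mann-Whitney) form of the AUC; the function name and the toy numbers are hypothetical and are not results from the talk.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: estimated posterior probabilities of the candidate edges.
    labels: 1 if the edge is in the gold standard network, else 0.
    Returns the fraction of (true edge, false edge) pairs that are ranked
    correctly, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four candidate edges, the first two are true edges.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random edge rankings and 1.0 to a perfect reconstruction, which makes the score comparable across fixed and inferred values of β.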

Conclusion Bayesian scheme for the systematic integration of different sources of biological prior knowledge. The method can automatically evaluate how useful the different sources of prior knowledge are. We get an improvement in the regulatory network reconstruction. This improvement is close to optimal.

Thank you
