Mechanistic Networks: explaining gene expression data using literature-based molecular interactions

Andreas Krämer, Stuart Tugendreich, Jeff Green
Ingenuity Systems, 1700 Seaport Blvd, 3rd Floor, Redwood City, CA 94063

Summary

The Ingenuity® Knowledge Base contains over 4.5 million findings curated from the scientific literature and third-party databases, and forms the basis for a large-scale network of molecular interactions. Most of these interactions represent direct or indirect causal relationships between genes and chemicals in various experimental situations. We present a method to identify causal interactions that are likely relevant in the context of a given gene expression data set, and to construct regulatory networks upstream of the genes whose expression has been observed to change. This identifies potential molecular signaling mechanisms that explain the observed expression changes.

Ingenuity Knowledge Base and literature-based global causal network

[Figure: literature findings + public databases → Ingenuity Knowledge Base → network of cause-effect relationships]

- The Ingenuity Knowledge Base contains ~4.5M findings from the biomedical literature.
- The causal network has ~39,000 nodes and ~116,000 edges.
- Nodes represent genes, chemicals, and microRNAs.
- Edges represent cause-effect relationships (experimental observations) and binding events.
- Edge types: expression/transcription, molecular modification, activation/inhibition, proteolysis, localization, etc.
- Edges are associated with a direction of effect (increase/decrease).
- Many edges represent indirect cause-effect relationships that were observed in various contexts (tissue, cell type, etc.).

Inference of upstream regulators from gene expression data

Given a data set of up- and down-regulated genes, determine upstream molecules in the causal network that are connected to data set molecules through transcription or expression edges.

Regulators can be:
- transcription factors
- any molecule (incl. endogenous chemicals, drugs, microRNA), using indirect expression findings

[Figure: overlap between the genes affected by a regulator and the data set, with up (+) and down (-) regulation]

A. Overlap p-value (used as a score): measures the significance of the overlap between the expression pattern and the genes affected by a given regulator (Fisher's Exact Test, right-tailed).
B. Activation z-score: infers the activation state of a regulator by testing for a match in the up/down regulation pattern (z-score).

Example 1: Estradiol exposure in MCF-7 breast cancer cells
C.Y. Lin et al., PLoS Genet 2007, 3(6): e87 (GEO: GSE11352)
[Table: top-scoring upstream regulators, ordered by overlap p-value]

Identification of causal relationships between upstream regulators that are likely implicated in the data set

Key idea: Relevant causal edges (A→B in the diagram below) are enriched in "causal triangles" with respect to regulated genes in the data set:

[Diagram: causal triangle A, B, C. A, B: upstream regulators; C: data set gene; expression edges connect the regulators to C]

Given the presence of a "causal mechanism" A→B→C, it is likely (with sufficient coverage) that the causal effect A→C has also been observed and is present in the causal network.

Method:
1. Determine the set S of upstream regulators that pass a given overlap p-value cutoff.
2. For each causal edge between any pair of regulators in S, calculate an edge p-value based on the overlap between the corresponding regulated genes in the data set (FET p-value, with the data set as universe).

[Diagram: overlap, within the data set, of the genes regulated by A and the genes regulated by B]

Causal relationships A→B with significant overlap (low edge p-value) are more likely to be part of a regulation mechanism that can explain the observed data.

Example 1: Top-scoring causal edges (low edge p-values) for the estradiol data set
[Table columns: Regulator A | Regulator B | Edge p-value]

Inferred upstream causal edges are enriched in canonical pathways

Consider signaling pathways in the Ingenuity Pathway Library and overlay relationships from the global causal network. Example:
- pool all 321 signaling pathways in one network
- use the edge p-value to predict actual pathway edges from all overlaid edges

[Figure: ROC for the estradiol data set; axes: true positive rate vs. false positive rate]

Building mechanistic networks from upstream regulators and upstream causal edges

We identify possible signaling cascades that connect an upstream regulator to the up- or down-regulated genes through several steps. Hypothesis networks are constructed "top down", starting with any identified regulator as the root node. The expectation is that the root node is an indirect regulator (such as a ligand or receptor), while the "bottom layer" of the network contains transcription factors that are directly connected to the data through expression edges. Expression edges to the data are not included in the networks shown below.

Method, for a network with "breadth" N and "depth" K:
1. Starting from any upstream regulator, select the N regulators that are connected downstream through the edges with the lowest edge p-values.
2. For each of those regulators, perform Step 1 recursively. Stop when the maximal path length K is reached. Avoid cycles.
3. Build the network from the union of all paths.

Example 1: A mechanistic network for the estradiol data set
[Figure: mechanistic network for the estradiol data set]

Example 2: Primary human endothelial cells stimulated with TNF-α
D. Viemann et al., J. Leukoc Biol 2006, 80(1): 174-185 (GEO: GSE2639)
[Table: top-scoring upstream regulators]
The mechanistic network for TNF as upstream regulator identifies the known NF-κB pathway:
[Figure: mechanistic network with TNF as root node]

Applications:
- Generate hypotheses about mechanism of action
- Find potential regulators with similar response
- Find potential regulators with opposite response

Upstream Regulator Analysis and Mechanistic Networks are new features available in Ingenuity® Pathway Analysis (IPA®).
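
The overlap p-value and the edge p-value described above are both based on Fisher's Exact Test. The following is a minimal sketch of how such p-values could be computed, not the implementation used in IPA. The function names and the toy gene sets are hypothetical, and two points are assumptions: the universe for the overlap p-value is left as a parameter because the poster does not state it, and the edge p-value is taken to be right-tailed like the overlap p-value.

```python
from math import comb


def fisher_right_tailed(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """Right-tailed Fisher's exact test: P(X >= overlap), where X is the
    size of the intersection of two random gene sets of sizes size_a and
    size_b drawn from a universe of `universe` genes (hypergeometric)."""
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def edge_p_value(targets_a: set, targets_b: set, dataset: set) -> float:
    """Edge p-value for a causal edge A -> B (assumed right-tailed).
    Only genes in the data set count, and the data set is the universe."""
    a = targets_a & dataset
    b = targets_b & dataset
    return fisher_right_tailed(len(a & b), len(a), len(b), len(dataset))


# Toy example with made-up gene identifiers.
dataset = {f"g{i}" for i in range(10)}
targets_a = {"g0", "g1", "g2", "g3", "x1"}   # x1 is not in the data set
targets_b = {"g0", "g1", "g2", "g3", "g4"}
p_edge = edge_p_value(targets_a, targets_b, dataset)
print(p_edge)  # 6/252, about 0.0238
```

The same `fisher_right_tailed` helper would serve for the overlap p-value of a single regulator once a universe is chosen, for example all genes that can appear in the data set.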
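
The poster describes the activation z-score only as a test for a match in the up/down regulation pattern. One simple statistic of that kind, shown here as an assumption rather than as the formula used in IPA, multiplies the literature direction of each expression edge (+1 increase, -1 decrease) by the observed direction of the gene (+1 up, -1 down) and normalizes the sum of these products by the square root of their number. The function name and the inputs are hypothetical.

```python
from math import sqrt


def activation_z_score(edge_signs: dict, observed: dict) -> float:
    """Sign-match z-score (illustrative assumption, not the IPA formula).

    edge_signs: gene -> +1 / -1, direction of the regulator's effect
    observed:   gene -> +1 / -1, observed up- or down-regulation
    A product of +1 is consistent with an activated regulator, -1 with an
    inhibited one. If each product were +1 or -1 with equal probability,
    the sum would have mean 0 and variance n, hence the division by sqrt(n).
    """
    products = [edge_signs[g] * observed[g] for g in edge_signs if g in observed]
    if not products:
        return 0.0
    return sum(products) / sqrt(len(products))


# Toy example: three of four genes match an activated regulator.
edge_signs = {"g0": +1, "g1": +1, "g2": -1, "g3": +1}
observed = {"g0": +1, "g1": +1, "g2": -1, "g3": -1}
z = activation_z_score(edge_signs, observed)
print(z)  # (3 - 1) / sqrt(4) = 1.0
```

A positive value suggests activation and a negative value suggests inhibition. Any significance threshold would be a further choice that the poster does not specify.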
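
The top-down construction of a mechanistic network with breadth N and depth K can be sketched as a depth-limited recursive expansion. This is a minimal illustration under stated assumptions, not the IPA implementation: the `edges` mapping, the regulator names and the function name are hypothetical, path length is counted in edges, and nodes already on the current path are excluded before the N best edges are selected (the poster says only "avoid cycles").

```python
def build_mechanistic_network(root, edges, breadth, depth):
    """Return the set of directed edges (source, target) in the network.

    edges: dict mapping regulator -> {downstream regulator: edge p-value}
    breadth: number N of lowest-p-value edges followed from each node
    depth: maximal path length K, counted in edges from the root
    """
    network = set()

    def extend(node, path):
        # Stop when the path from the root already has `depth` edges.
        if len(path) - 1 >= depth:
            return
        # Skip regulators already on this path to avoid cycles, then keep
        # the `breadth` edges with the lowest edge p-values.
        candidates = sorted(
            (p, nxt) for nxt, p in edges.get(node, {}).items() if nxt not in path
        )[:breadth]
        for _, nxt in candidates:
            network.add((node, nxt))  # union over all explored paths
            extend(nxt, path + (nxt,))

    extend(root, (root,))
    return network


# Toy example with made-up regulators and edge p-values.
edges = {
    "R0": {"R1": 1e-8, "R2": 1e-5, "R3": 1e-2},
    "R1": {"R0": 1e-9, "R4": 1e-4},
    "R2": {"R4": 1e-3},
    "R4": {"R5": 1e-6},
}
net = build_mechanistic_network("R0", edges, breadth=2, depth=2)
print(sorted(net))
```

With breadth 2 the weakest edge from the root (R0→R3) is dropped, the back edge R1→R0 is skipped as a cycle, and with depth 2 the expansion stops before R4→R5.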