Inferring Subnetworks from Perturbed Expression Profiles
D. Pe’er, A. Regev, G. Elidan, N. Friedman
Expression Profiling
- An expression profile is a simultaneous measurement of the levels of all mRNAs in a cell population
- Experimental design: measure profiles of mutated or treated cultures
- Goal: infer regulatory and molecular interactions
[Figure: wild-type profile compared with mutant profile]
Common Approaches
- Comparative analysis (Holstege et al. 1998)
- Clustering (Hughes et al. 2000)
- Limitations:
  - Cannot distinguish between direct and indirect interactions
  - Limited to pairwise relations
  - Cannot infer a finer context
Bayesian Network Framework
Friedman, Linial, Nachman, Pe’er (JCB 2000)
- Probabilistic: characterize statistical relationships between expression patterns of different genes
- Multi-variable interactions (beyond pairwise):
  - Identify intermediate interactions
  - Handle combinatorial regulation by several gene products
- Statistical confidence: assess the statistical significance of the interactions found
Our Contributions
- Modeling of mutations and treatments in the Bayesian network framework
- Novel data discretization based on guided k-means clustering
- New features: Mediator and Regulator
- Automatic reconstruction of statistically significant sub-networks
Modeling Gene Expression
- Expression level of each gene = random variable
- Gene interaction = probabilistic dependency
- A directed acyclic graph models the dependency structure of the distribution
- Each node has a probabilistic function conditioned on its parents in the graph, e.g. P(3 | 1, 2)
- Graph structure + local probabilities define a unique multivariate distribution
[Figure: example network over Gene 1 to Gene 5, with "Activator" and "Inhibitor" labels]
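The statement that a graph structure plus local probabilities define a unique multivariate distribution can be checked on a toy case. The sketch below assumes a hypothetical three-gene network (gene1 and gene2 are parents of gene3), binary expression levels, and made-up probability values; none of these come from the slides.

```python
from itertools import product

# Hypothetical three-gene network: gene1 -> gene3 <- gene2.
# Each gene is binary: 0 = low expression, 1 = high expression.
parents = {"gene1": [], "gene2": [], "gene3": ["gene1", "gene2"]}

# Probability that a gene is high (1), indexed by its parents' values.
# The numbers are illustrative, not taken from any dataset.
p_high = {
    "gene1": {(): 0.3},
    "gene2": {(): 0.6},
    # gene1 behaves like an activator of gene3, gene2 like an inhibitor.
    "gene3": {(0, 0): 0.2, (0, 1): 0.05, (1, 0): 0.9, (1, 1): 0.4},
}


def joint(assignment):
    """P(assignment) as the product of every node's local probability."""
    prob = 1.0
    for gene, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        p1 = p_high[gene][pa_values]
        prob *= p1 if assignment[gene] == 1 else 1.0 - p1
    return prob


genes = list(parents)
# Summing the product over all assignments gives 1, so the local
# probabilities together define a proper joint distribution.
total = sum(
    joint(dict(zip(genes, values)))
    for values in product([0, 1], repeat=len(genes))
)
```

For instance, `joint({"gene1": 1, "gene2": 0, "gene3": 1})` multiplies 0.3, 0.4 and 0.9, the three local terms selected by that assignment.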
Mutational Assay
- Wild-type measurements: P(rap1 | pgk1)
- Equivalence: two models explain the correlation between RAP1 and PGK1
- Mutated pgk1 measurements: P(rap1 | pgk1)
- Note causality into mutated variable
[Figure: the two candidate models over RAP1 and PGK1; surviving labels: "pgk", "0.5 pgk1", "0.5 pgk1"]
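A small numeric sketch may help show why mutant measurements can separate two models that are equivalent on wild-type data. It assumes binary expression levels, an invented joint distribution over (rap1, pgk1), and the standard treatment of an intervention (the forced variable no longer depends on its parents). The function names are hypothetical and this is not necessarily the exact scoring used in the talk.

```python
# Illustrative joint distribution over (rap1, pgk1); 0 = low, 1 = high.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
RAP1, PGK1 = 0, 1


def marginal(var, value):
    """P(var = value) under the wild-type joint."""
    return sum(p for key, p in JOINT.items() if key[var] == value)


def conditional(var, value, given_var, given_value):
    """P(var = value | given_var = given_value) under the wild-type joint."""
    both = sum(
        p for key, p in JOINT.items()
        if key[var] == value and key[given_var] == given_value
    )
    return both / marginal(given_var, given_value)


def observational(model, rap1, pgk1):
    """Joint probability of a wild-type sample under either structure."""
    if model == "RAP1->PGK1":
        return marginal(RAP1, rap1) * conditional(PGK1, pgk1, RAP1, rap1)
    if model == "PGK1->RAP1":
        return marginal(PGK1, pgk1) * conditional(RAP1, rap1, PGK1, pgk1)
    raise ValueError(model)


def rap1_given_pgk1_mutated(model, rap1, pgk1_forced):
    """P(rap1) when pgk1 is forced to a value by mutation."""
    if model == "RAP1->PGK1":
        # Forcing the child cuts its incoming edge; RAP1 keeps its marginal.
        return marginal(RAP1, rap1)
    if model == "PGK1->RAP1":
        # Forcing the parent still propagates to the child.
        return conditional(RAP1, rap1, PGK1, pgk1_forced)
    raise ValueError(model)
```

Both structures reproduce every entry of `JOINT` on wild-type data, yet with pgk1 forced low they predict P(rap1 = high) of 0.5 and 0.2 respectively, so a pgk1 mutant profile can favor one direction over the other.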
Compendium Dataset (Hughes et al., 2000)
- 300 samples of yeast deletion mutants and other treatments
- Deleted genes are from various functional families
- A rich variety of profiles, but…
- There is only one sample from each mutation
[Pipeline diagram]
- Preprocess: expression data, guided k-means discretization
- Learn model: Bayesian network learning algorithm + bootstrap
- Feature extraction: Markov, Separator, Edge, Regulator
- Feature assembly: reconstruct sub-networks
- Visualization: visualize using Pathway Explorer
Resulting PDAG
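Confidence Estimates: Bootstrap
- Resample the dataset D to obtain D1, D2, ..., Dm
- Learn a network from each resampled dataset
- Estimate: the confidence of a feature f is the fraction of the m learned networks that contain it, $C(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$, where $G_i$ is the network learned from $D_i$ and $f(G_i)$ is 1 if the feature holds in $G_i$ and 0 otherwise
- Bootstrap approach [FGW, UAI99]
[Figure: a network over nodes E, R, B, A, C learned from each resample]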
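The bootstrap procedure on this slide can be sketched in a few lines. The resampling and counting follow the slide; the learner is a deliberately crude stand-in (it reports a pair of binary variables as a feature when they agree in at least 80% of samples) and is an assumption for illustration only, not the Bayesian network learning algorithm used in the talk. The data are synthetic.

```python
import random


def bootstrap_confidence(data, learn_features, m=100, seed=0):
    """Fraction of m bootstrap resamples in which each feature is learned."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(m):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(data) for _ in data]
        for feature in learn_features(resample):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: count / m for feature, count in counts.items()}


def toy_learner(samples):
    """Stand-in learner: a pair is a feature if it agrees in >= 80% of samples."""
    n_vars = len(samples[0])
    features = set()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            agree = sum(s[i] == s[j] for s in samples) / len(samples)
            if agree >= 0.8:
                features.add((i, j))
    return features


# Synthetic data: variable 1 copies variable 0 in 90% of samples,
# variable 2 agrees with variable 0 in only half of them.
data = []
for i in range(100):
    x = i % 2
    y = 1 - x if i % 10 == 0 else x
    z = (i // 2) % 2
    data.append((x, y, z))

confidence = bootstrap_confidence(data, toy_learner, m=100)
```

Swapping `toy_learner` for a real structure learner that returns a set of features (edges, Markov relations, separators) leaves `bootstrap_confidence` unchanged.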
Estimating Confidence
- Common practice: pick a single top-scoring model
- Problem: insufficient information!
  - In gene expression data there are only a few hundred experiments, so many models score highly
  - An answer based on one model is useless
- Solution: search for features common to many likely models
  - Sample models from the posterior distribution P(Model | Data)
  - Confidence of a feature: $P(f \mid D) = \sum_{G} f(G)\, P(G \mid D)$, where $f(G)$ indicates a feature of G, e.g., X → Y
Markov Relations
Question: Do X and Y directly interact?
- Parent-child (one gene regulating the other)
- Hidden parent (two genes co-regulated by a hidden factor)
Examples:
- SST2 (mating pathway regulator) and STE6 (exporter of mating factor): (0.91, 0.67)
- ARG3 and ARG5 (arginine biosynthesis), with GCN4 (transcription factor): (0.84, 0.79)
Low Correlation Relations
- Previously unknown link, strongly supported by evidence in the literature
- High confidence, low correlation
  - Processes occur under specific conditions
  - Captured by our context-specific score
Example: ESC4 (chromatin silencing) and KU70 (DNA double-strand break repair): (0.91, 0.16)
Separators
Question: Given that X and Y are indirectly dependent, what mediates this dependence?
- Separator relation:
  - X affects Z, which in turn affects Y
  - Z regulates both X and Y
Example: AGA1 (mating), FUS1 (cell fusion), KAR4 (transcriptional regulator of nuclear fusion)
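The separator idea can be illustrated numerically: in a chain X → Z → Y, X and Y are dependent, but the dependence vanishes once Z is known. The sketch below uses invented conditional probabilities for binary variables and measures dependence with mutual information; both choices are assumptions for illustration, not the score used in the talk.

```python
from itertools import product
from math import log2

# Chain X -> Z -> Y with illustrative probabilities (0 = low, 1 = high).
p_x = {0: 0.5, 1: 0.5}
p_z_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

# Joint over (x, z, y) from the chain factorization.
joint = {
    (x, z, y): p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]
    for x, z, y in product((0, 1), repeat=3)
}


def marg(keep):
    """Marginal over the positions listed in keep (0 = x, 1 = z, 2 = y)."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def mutual_information_xy():
    """I(X; Y): dependence between X and Y when Z is ignored."""
    pxy, px, py = marg((0, 2)), marg((0,)), marg((2,))
    return sum(
        p * log2(p / (px[(x,)] * py[(y,)]))
        for (x, y), p in pxy.items() if p > 0
    )


def conditional_mutual_information_xy_given_z():
    """I(X; Y | Z): dependence that remains once Z is known."""
    pz, pxz, pzy = marg((1,)), marg((0, 1)), marg((1, 2))
    return sum(
        p * log2(p * pz[(z,)] / (pxz[(x, z)] * pzy[(z, y)]))
        for (x, z, y), p in joint.items() if p > 0
    )
```

Here `mutual_information_xy()` is clearly positive while `conditional_mutual_information_xy_given_z()` is zero up to floating-point error, which is the signature of Z separating X from Y. The same holds when Z is a common regulator of X and Y.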
Separators: Intra-cluster Context
[Figure: SLT2 (MAPK of cell wall integrity pathway) with CRH1, YPS3 and YPS1 (cell wall proteins) and SLR3 (protein of unknown function)]
- All gene pairs have high correlation
  - Clustering groups them together
- Assigned a putative function to SLR3: cell wall protein
- We can assign a regulatory role to SLT2
- Many other signaling and regulatory proteins were identified as direct and indirect separators
Sub-Networks
- Reconstruct a conserved sub-network
  - Provides a more global picture
  - Allows features with lower confidence to be included
  - Preserved in most networks with high posterior
  - Probably reflects a real biological process
- Automatic algorithm
  - Score: high concentration of pairwise features
  - Greedy search for high-scoring subgraphs
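One way to picture the automatic algorithm is a greedy expansion from a seed pair. The slide specifies only "high concentration of pairwise features" and "greedy search"; the concrete score below (sum of pairwise confidence minus a threshold over all pairs in the node set), the `threshold` parameter, the node names and the confidence values are all assumptions made for this sketch.

```python
def density_score(nodes, confidence, threshold=0.5):
    """Sum of (confidence - threshold) over all pairs inside the node set.

    Pairs with no recorded feature count as confidence 0, so adding a
    poorly connected node lowers the score.
    """
    ordered = sorted(nodes)
    return sum(
        confidence.get(frozenset((a, b)), 0.0) - threshold
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
    )


def greedy_subnetwork(seed_pair, all_nodes, confidence, threshold=0.5):
    """Grow a node set from a seed pair while the score keeps improving."""
    current = set(seed_pair)
    best = density_score(current, confidence, threshold)
    while True:
        candidates = [
            (density_score(current | {node}, confidence, threshold), node)
            for node in all_nodes if node not in current
        ]
        if not candidates:
            break
        score, node = max(candidates)
        if score <= best:
            break  # no single addition improves the score
        current.add(node)
        best = score
    return current, best


# Illustrative pairwise feature confidences (not real results).
confidence = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.85,
    frozenset(("C", "D")): 0.55,
    frozenset(("D", "E")): 0.9,
}
subnetwork, score = greedy_subnetwork(("A", "B"), "ABCDE", confidence)
```

Starting from the pair A, B, the search adds C (which is strongly linked to both) and then stops, because D and E would each bring more weak pairs than strong ones.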
Increased Confidence (simulated data)
[Figure: percent of false positives plotted against confidence, for the entire network and for the subnetwork]
Rosetta networks in Pathway Explorer
Summary
- Primary contribution: automated methodology for finding patterns of interactions among genes
  - Clear semantics
  - Principled handling of mutations and interventions
- Built-in handling of statistical significance
  - Feature confidence
  - Extracts significant sub-networks
- Differs from clustering
  - Inter-cluster relations
  - Finer intra-cluster structure
- Provides biologists with promising hypotheses