Published by Vivien Wilkins. Modified over 9 years ago.
1
Joint analysis of regulatory networks and expression profiles. Ron Shamir, School of Computer Science, Tel Aviv University. April 2013.
Sources:
Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics 25(9):1158-1164 (2009).
2
Outline: Background. Joint network and expression profiles: –MATISSE –CEZANNE
3
Background
4
DNA → (transcription) → RNA → (translation) → protein. Analogy: DNA is the hard disk, RNA is one program, protein is its output.
5
DNA Microarrays / RNA-seq: Simultaneous measurement of expression levels of all genes / transcripts. Perform 10^5 to 10^9 measurements in one experiment. Allow a global view of cellular processes. Among the most important biotechnological breakthroughs of the last / current decade. http://www.biomedcentral.com/1471-2105/12/323/figure/F2
6
The Raw Data: a matrix of genes × experiments. Entries of the raw data matrix: expression levels (ratios / absolute values / …). Expression pattern for each gene; profile for each experiment / condition / sample / chip. Needs normalization!
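One common way to act on "needs normalization" is to standardize each gene's profile before comparing genes. The sketch below assumes a genes × experiments NumPy array and uses Pearson correlation as the similarity measure; the function names and toy values are illustrative and do not come from the slides, which do not say which normalization was used.

```python
import numpy as np

def standardize_rows(expr):
    """Center each gene (row) to mean 0 and scale it to unit standard deviation."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant profiles become all zeros instead of dividing by 0
    return (expr - mean) / std

def pearson_similarity(expr):
    """Gene-by-gene Pearson correlation computed from standardized profiles."""
    z = standardize_rows(expr)
    return z @ z.T / z.shape[1]

# Toy matrix: 3 genes x 4 experiments (made-up values).
toy = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, 4.0, 6.0, 8.0],
                [4.0, 3.0, 2.0, 1.0]])
sim = pearson_similarity(toy)  # genes 0 and 1 correlate perfectly; gene 2 is anti-correlated
```

A similarity matrix of this kind is the sort of input the joint network and expression analysis later in the talk works from.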
7
EXPANDER: EXPression ANalyzer and DisplayER. http://acgt.cs.tau.ac.il/expander
Clustering: identify clusters of co-expressed genes (CLICK, KMeans, SOM, hierarchical).
Biclustering: identify homogeneous submatrices (SAMBA).
Functional enrichment: GO, TANGO.
Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA).
microRNA function inference: FAME.
Visualization.
References: A. Maron, R. Sharan, Bioinformatics 03. A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon, BMC Bioinformatics 05. Ulitsky et al., Nature Protocols 10.
8
Networks of protein-protein interactions (PPIs): Large, readily available resource. Representation: network with nodes = proteins/genes, edges = interactions. Analysis methods: global properties, motif content analysis, complex extraction, cross-species comparison.
9
The hairball syndrome
10
Potential inroad into pathways and function. Can the network help to improve the analysis?
11
Analysis of gene expression profiles + a network
12
Goal. Challenge: detect active functional modules, i.e. connected subnetworks of proteins whose genes are co-expressed. “Where is the action in the network in a particular experiment?”
13
Ron Shamir, RNA Antalia, April 08 13
14
15
Ulitsky & Shamir, BMC Systems Biology 07
16
MATISSE: Modular Analysis for Topology of Interactions and Similarity SEts. http://acgt.cs.tau.ac.il/matisse
Input: expression data and a PPI network. Output: a collection of modules: –Connected PPI subnetworks –Correlated expression profiles
Figure legend: interaction; high expression similarity.
17
Probabilistic model
$S_{ij}$: expression similarity of genes i, j. Event $M_{ij}$: i, j are mates = highly co-expressed.
$P(S_{ij} \mid M_{ij}) \sim N(\mu_m, \sigma_m^2)$
$P(S_{ij} \mid \neg M_{ij}) \sim N(\mu_n, \sigma_n^2)$
$H_0$: U is a set of unrelated genes. $H_1$: U is a module = connected subnetwork with high internal similarity.
$R_i$: gene i is transcriptionally regulated.
$\beta_m$: fraction of mates out of module gene pairs that are transcriptionally regulated, $\beta_m = P(M_{ij} \mid R_i \wedge R_j, H_1)$.
$p_m$: fraction of mates out of all gene pairs that are transcriptionally regulated.
18
Probabilistic model (2)
Is a connected gene set U a module? Assuming pair independence:
Define $\beta^{m}_{ij} = \beta_m P(R_i) P(R_j)$.
Define $\beta^{n}_{ij} = p_m P(R_i) P(R_j)$.
Likelihood ratio: $\Pr(\text{Data} \mid H_1) / \Pr(\text{Data} \mid H_0)$.
Taking the log turns the ratio into a sum of terms, one per gene pair i, j.
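The per-pair terms can be written out from the mixture model. This is a reconstruction from the definitions on the slide, not a formula copied from it. Here $f_m$ and $f_n$ denote the mate and non-mate Gaussian densities, and $\beta^{m}_{ij}$, $\beta^{n}_{ij}$ are the probabilities that i, j are mates under $H_1$ and $H_0$ respectively.

```latex
\begin{align*}
\Pr(S_{ij} \mid H_1) &= \beta^{m}_{ij}\, f_m(S_{ij}) + \bigl(1 - \beta^{m}_{ij}\bigr)\, f_n(S_{ij}) \\
\Pr(S_{ij} \mid H_0) &= \beta^{n}_{ij}\, f_m(S_{ij}) + \bigl(1 - \beta^{n}_{ij}\bigr)\, f_n(S_{ij}) \\
\log \frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)}
  &= \sum_{\substack{i, j \in U \\ i < j}}
     \log \frac{\beta^{m}_{ij}\, f_m(S_{ij}) + \bigl(1 - \beta^{m}_{ij}\bigr)\, f_n(S_{ij})}
               {\beta^{n}_{ij}\, f_m(S_{ij}) + \bigl(1 - \beta^{n}_{ij}\bigr)\, f_n(S_{ij})}
\end{align*}
```

Each summand is positive when the observed similarity is better explained by the module hypothesis and negative otherwise, which is why the resulting pair weights can carry either sign.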
19
Probabilistic model - summary: Similarities: mixture of two Gaussians. For a candidate group U, the score is the likelihood ratio of originating from a module versus from the background. Module score $W_U$ = log-likelihood ratio of the gene group = sum of per-pair terms over all gene pairs in U. Find connected subgraphs U with high $W_U$.
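As a rough illustration of the summary above, the score of a gene group can be computed as a sum of per-pair log-likelihood ratios under a two-Gaussian mixture. Everything numeric here is an assumption made for the sketch: the Gaussian parameters, the mate probabilities under the two hypotheses, and the toy similarities are invented, and the per-gene regulation probabilities are folded into two constants for brevity. Connectivity in the network is not checked here.

```python
import math
from itertools import combinations

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def pair_weight(s, p_mate_h1, p_mate_h0, mates=(0.8, 0.2), non_mates=(0.0, 0.3)):
    """Log-likelihood ratio of one similarity value s under H1 (module) vs H0 (background).

    mates / non_mates are (mean, std) of the two mixture components (assumed values).
    """
    f_m = gaussian_pdf(s, *mates)
    f_n = gaussian_pdf(s, *non_mates)
    h1 = p_mate_h1 * f_m + (1 - p_mate_h1) * f_n
    h0 = p_mate_h0 * f_m + (1 - p_mate_h0) * f_n
    return math.log(h1 / h0)

def module_score(genes, similarity, p_mate_h1=0.5, p_mate_h0=0.05):
    """Sum of pair weights over all gene pairs in the candidate group."""
    return sum(
        pair_weight(similarity[frozenset((a, b))], p_mate_h1, p_mate_h0)
        for a, b in combinations(genes, 2)
    )

# Toy similarities for three hypothetical genes.
toy_similarity = {
    frozenset(("g1", "g2")): 0.85,
    frozenset(("g1", "g3")): 0.75,
    frozenset(("g2", "g3")): 0.05,
}
score = module_score(["g1", "g2", "g3"], toy_similarity)
```

Highly similar pairs contribute positive weight and dissimilar pairs negative weight, so a group only scores well when most of its pairs look like mates.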
20
Complexity: Finding the heaviest connected subgraph is NP-hard, even without the connectivity constraint (edge weights can be positive or negative). Devised a heuristic for the problem.
21
MATISSE workflow: 1. Seed generation 2. Greedy optimization 3. Significance filtering
22
Finding seeds: Three seeding alternatives tested. All alternatives build a seed and delete it from the network. Building small seeds around single nodes: (1) best neighbors, (2) all neighbors. (3) Approximating the heaviest subgraph: delete low-degree nodes and record the heaviest subnetwork found.
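The third seeding alternative can be sketched as a peeling procedure. This is one plausible reading of "delete low-degree nodes and record the heaviest subnetwork found", not the MATISSE implementation: it assumes "degree" means weighted degree (the sum of a node's pair weights to the remaining nodes), and it does not enforce network connectivity. The function name and the toy weights are hypothetical.

```python
def heaviest_subgraph_by_peeling(nodes, weight):
    """Greedy peeling: repeatedly drop the node with the lowest weighted degree.

    weight maps frozenset({u, v}) to a pair weight; missing pairs count as 0.
    Returns the heaviest node set seen along the way and its total weight.
    """
    def w(u, v):
        return weight.get(frozenset((u, v)), 0.0)

    remaining = set(nodes)
    degree = {u: sum(w(u, v) for v in remaining if v != u) for u in remaining}
    total = sum(degree.values()) / 2.0  # every pair is counted twice in the degrees
    best_set, best_total = set(remaining), total

    while len(remaining) > 1:
        # Ties are broken by name so the result is deterministic.
        u = min(remaining, key=lambda x: (degree[x], str(x)))
        remaining.remove(u)
        total -= degree.pop(u)
        for v in remaining:
            degree[v] -= w(u, v)
        if total > best_total:
            best_set, best_total = set(remaining), total
    return best_set, best_total

# Toy weights: a, b, c are mutually similar; d is dissimilar to all of them.
toy_weights = {
    frozenset(("a", "b")): 2.0, frozenset(("a", "c")): 2.0, frozenset(("b", "c")): 2.0,
    frozenset(("a", "d")): -3.0, frozenset(("b", "d")): -1.0, frozenset(("c", "d")): -1.0,
}
seed, seed_weight = heaviest_subgraph_by_peeling(["a", "b", "c", "d"], toy_weights)
```

In this toy case the dissimilar node is peeled first and the remaining triangle is recorded as the seed.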
23
Greedy optimization: Simultaneous optimization of all the seeds. The following steps are considered: node addition, node removal, assignment change, module merge.
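A much-reduced sketch of the greedy step is given below. It handles a single module and only two of the four moves (node addition and node removal), whereas the slide describes simultaneous optimization of all seeds; the function names, the adjacency format, and the toy data are assumptions made for illustration. A move is accepted only if it raises the module score and keeps the module connected in the network.

```python
def is_connected(nodes, adjacency):
    """Depth-first search restricted to `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adjacency.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def greedy_improve(seed, adjacency, weight):
    """Apply the best score-improving add/remove move until none is left."""
    module = set(seed)

    def gain(u):
        # Total pair weight between u and the other module members.
        return sum(weight.get(frozenset((u, v)), 0.0) for v in module if v != u)

    while True:
        moves = []
        frontier = {v for u in module for v in adjacency.get(u, ())} - module
        for v in frontier:  # additions keep connectivity by construction
            moves.append((gain(v), "add", v))
        for u in module:    # removals must leave a connected module
            if len(module) > 1 and is_connected(module - {u}, adjacency):
                moves.append((-gain(u), "remove", u))
        if not moves:
            return module
        best_gain, kind, node = max(moves, key=lambda m: (m[0], m[1], str(m[2])))
        if best_gain <= 0:
            return module
        if kind == "add":
            module.add(node)
        else:
            module.remove(node)

# Toy network: a path a-b-c-d; a, b, c are co-expressed, d is not.
toy_adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_weight = {
    frozenset(("a", "b")): 1.0, frozenset(("b", "c")): 1.0, frozenset(("a", "c")): 1.0,
    frozenset(("a", "d")): -2.0, frozenset(("b", "d")): -2.0, frozenset(("c", "d")): -2.0,
}
module = greedy_improve({"a"}, toy_adjacency, toy_weight)
```

Because every accepted move strictly increases the score, the loop terminates; it can still stop in a local optimum, which is the usual price of a greedy heuristic.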
24
Front vs. Back nodes: Only a fraction of the genes (front nodes) have meaningful similarity values. MATISSE can link them using other genes (back nodes). Back nodes correspond to: –Unmeasured transcripts –Post-translational regulation –Partially regulated pathways
25
Advantages of MATISSE: No p-values needed for measurements. Works when only a fraction of the genes' expression patterns are informative. Can handle any similarity data. No prespecified number of modules.
26
Test case: Yeast osmotic shock. Network: 65,990 PPIs & protein-DNA interactions among 6,246 genes. Expression: 133 experimental conditions, response of perturbed strains to osmotic shock (O’Rourke & Herskowitz 04). Front nodes: 2,000 genes with the highest variance.
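Selecting front nodes by variance is simple to express in code. The sketch below assumes a genes × conditions NumPy array and a parallel list of gene identifiers; the function name, identifiers, and values are made up for illustration, and k would be 2,000 in the test case above.

```python
import numpy as np

def select_front_nodes(expr, gene_ids, k):
    """Return the ids of the k genes whose profiles have the highest variance."""
    expr = np.asarray(expr, dtype=float)
    variances = expr.var(axis=1)
    order = np.argsort(-variances, kind="stable")[:k]  # descending variance
    return [gene_ids[i] for i in order]

# Toy data: 4 hypothetical genes x 3 conditions.
toy_expr = np.array([[1.0, 1.0, 1.0],
                     [0.0, 5.0, 10.0],
                     [1.0, 2.0, 3.0],
                     [0.0, 0.0, 1.0]])
front = select_front_nodes(toy_expr, ["g1", "g2", "g3", "g4"], k=2)
```

Genes outside this set can still take part in a module as back nodes, so a variance cutoff removes a gene's similarity values rather than removing the gene from the network.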
27
Pheromone response subnetwork (figure legend: back nodes, front nodes)
28
Performance comparison. Measures: % of modules with category enrichment at p < 10^-3; % of annotations enriched at p < 10^-3 in modules.
29
GO and promoter analysis
30
Application to stem cells: ~150 human stem cell lines of diverse types profiled using microarrays. Clustered profiles into groups. Adjusted MATISSE to seek subnetworks that are characteristic of each group. Focused analysis on pluripotent stem cells. F. Müller, L. Laurent, D. Kostka, I. Ulitsky, R. Williams, C. Lu, I. Park, M. Rao, P. Schwartz, N. Schmidt, J. Loring, Nature 08
31
Pluripotent stem cells network: Highlights the key protein machinery underlying pluripotency.
32
Ulitsky & Shamir, Bioinformatics 2009
33
Accounting for PPI confidence: PPI-based analysis is made difficult by abundant false positive / negative interactions. Various methods can assign a confidence (probability) to individual edges. Idea: seek modules that are connected with high probability. Ulitsky & Shamir, Bioinformatics 2009
34
What is a confidently connected module? With high probability, any two parts of the module are connected by an edge. ▫Accommodates both sparse and dense pathways ▫Accommodates genes with low-confidence connectivity to many module genes. Confidently connected modules can be found efficiently.
35
Connected with high probability? Candidate definitions: (1) Every two genes are connected by a confident path ▫Bias to dense pathways. (2) There is a minimum spanning tree with high-confidence edges ▫Same as ignoring low-confidence edges. (3) Any two parts of the module are connected by an edge with high probability.
36
CEZANNE (Co-Expression Zone ANalysis using NEtworks)
Edge probability: p(e). Edge weight: -log(1-p(e)).
Requirement: for any W ⊂ U, at least one edge connects W with U\W with probability at least q (e.g. 0.95). Equivalently, the weight of the minimum cut of U is at least -log(1-q).
Algorithm: among the subnetworks whose minimum cut weight is at least -log(1-q), find the one with the maximum co-expression score.
Example cuts: P({A},{B,C,D}) = 1 - 0.3·0.3 = 0.91; P({A,C,D},{B}) = 0.94; P({A,B},{C,D}) = 0.94; P({A,B,D},{C}) = 0.994.
Figure labels: minimum cut; edge probabilities 0.7, 0.9, 0.7, 0.8 on nodes A, B, C, D.
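The link between cut probabilities and cut weights can be checked numerically. The sketch below assumes edges exist independently with probability strictly below 1, and it tests confident connectivity by brute force over all bipartitions, which is only feasible for very small modules; it is not the CEZANNE algorithm, and the function names and the toy triangle graph are hypothetical. The first three calls reproduce the cut probabilities quoted on the slide from the crossing-edge probabilities they imply.

```python
import math
from itertools import combinations

def cut_connection_probability(edge_probs):
    """Probability that at least one crossing edge exists (edges independent)."""
    p_none = 1.0
    for p in edge_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

def cut_weight(edge_probs):
    """Sum of -log(1 - p) over crossing edges; equals -log of P(no crossing edge)."""
    return sum(-math.log(1.0 - p) for p in edge_probs)

def is_confidently_connected(nodes, edges, q):
    """True if every bipartition of `nodes` has cut weight >= -log(1 - q).

    edges maps frozenset({u, v}) to a probability. Brute force: exponential in |nodes|.
    """
    node_list = sorted(nodes)
    node_set = set(node_list)
    threshold = -math.log(1.0 - q)
    first, rest = node_list[0], node_list[1:]
    for r in range(len(rest)):  # the side containing `first` never covers all nodes
        for extra in combinations(rest, r):
            side = {first, *extra}
            crossing = [p for e, p in edges.items()
                        if e <= node_set and len(e & side) == 1]
            if cut_weight(crossing) < threshold - 1e-12:
                return False
    return True

p_a = cut_connection_probability([0.7, 0.7])        # about 0.91
p_b = cut_connection_probability([0.7, 0.8])        # about 0.94
p_c = cut_connection_probability([0.7, 0.8, 0.9])   # about 0.994

# Hypothetical triangle with all edge probabilities 0.8.
triangle = {frozenset(("x", "y")): 0.8, frozenset(("y", "z")): 0.8, frozenset(("x", "z")): 0.8}
ok_95 = is_confidently_connected({"x", "y", "z"}, triangle, q=0.95)
ok_97 = is_confidently_connected({"x", "y", "z"}, triangle, q=0.97)
```

Working with -log(1-p) weights turns the product of "edge absent" probabilities into a sum, so a standard minimum-cut computation can replace the brute-force enumeration used here.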
37
How to find confidently connected modules? Seed identification: run MATISSE ignoring edge weights, then “slice” the modules using minimum cut until all subnetworks are “legal”. Greedy optimization (how to find legal moves?): ▫Adding nodes is easy to test (positive edge weights) ▫Merging modules is easy to test ▫(Re)moving nodes requires maintaining the set of ‘crucial’ nodes in each module. Solvable in minutes on real-world examples.
38
DNA damage response in S. cerevisiae: 47 DNA damage response expression profiles (Gasch et al., 01). Front nodes: 2,074 genes with at least two-fold expression change. Network and confidence values: purification enrichment (PE) scores (Collins et al. 07).
39
DNA damage response modules
Module size | GO biological process (p-value) | GO-slim protein complex (p-value)
346 | ribosome biogenesis and assembly (1.2·10^-117) | ribosome (5.9·10^-91)
    | translation (1.0·10^-85) | eukaryotic 43S preinitiation complex (3.8·10^-49)
    | rRNA processing (7.5·10^-79) | small nucleolar ribonucleoprotein complex (1.5·10^-41)
    | 35S primary transcript processing (4.6·10^-44) | DNA-directed RNA polymerase III complex (3.1·10^-17)
    | ribosome assembly (4.3·10^-39) | exosome (RNase complex) (4.4·10^-15)
    | ribosomal large subunit biogenesis (9.2·10^-14) | DNA-directed RNA polymerase I complex (5.7·10^-14)
    | rRNA modification (4.4·10^-12) | Noc complex (3.2·10^-6)
38 | protein catabolism (1.8·10^-46) | proteasome complex (sensu Eukaryota) (5.7·10^-71)
   | proteolysis (9.0·10^-44) | proteasome core complex (sensu Eukaryota) (9.4·10^-32)
   | ubiquitin cycle (1.1·10^-42) |
12 | histone acetylation (3.6·10^-13) | histone acetyltransferase complex (2.1·10^-12)
   | chromatin modification (5.9·10^-11) |
   | transcription from RNA polymerase II promoter (1.4·10^-6) |
12 | translation (1.1·10^-14) | ribosome (1.4·10^-15)
12 | nuclear mRNA splicing, via spliceosome (3.5·10^-21) | spliceosome complex (3.5·10^-17)
   | | small nuclear ribonucleoprotein complex (2.5·10^-15)
10 | barbed-end actin filament capping (4.8·10^-6) | F-actin capping protein complex (4.8·10^-6)
   | endocytosis (1.1·10^-5) |
   | cytoskeleton organization and biogenesis (2.8·10^-5) |
8 | establishment and/or maintenance of chromatin architecture (1.1·10^-5) | chromatin remodeling complex (4.6·10^-6)
7 | glycogen metabolism (3.0·10^-8) | protein phosphatase type 1 complex (3.3·10^-5)
  | sporulation (sensu Fungi) (2.0·10^-6) |
6 | translation (1.1·10^-7) | ribosome (4.0·10^-8)
6 | tRNA processing (2.5·10^-14) | ribonuclease P complex (9.2·10^-8)
  | rRNA processing (2.2·10^-9) |
4 | trehalose biosynthesis (6.8·10^-14) | alpha,alpha-trehalose-phosphate synthase complex (UDP-forming) (6.8·10^-14)
4 | ubiquitin-dependent protein catabolism (5.2·10^-7) |
3 | pseudohyphal growth (9.8·10^-7) | cAMP-dependent protein kinase complex (9.6·10^-7)
3 | proteasome assembly (3.2·10^-6) |
  | protein folding (3.9·10^-6) |
Module annotations on the slide: Cytoplasmic ribosome biogenesis; Proteasome; Mitochondrial ribosome – small subunit; Mitochondrial ribosome – large subunit; Spliceosome; Novel actin-localized pathway?; Hsp90; PKA; Trehalose biosynthesis; Ribonuclease P.
Callouts: Suggests SWS2 as a novel member. Novel pathway enriched with actin-localized proteins; supported in other datasets; similar deletion phenotypes.
40
Comparison with prior work: Combined measure of sensitivity (% of annotations enriched) and specificity (% of modules enriched) with p < 0.001. Methods compared: clustering of only expression data; clustering expression & network (Hanisch et al., 2002); expression similarity + network connectivity; expression similarity + confident network connectivity.
41
42
Summary: Algorithms using co-expression + networks to detect functionally coherent modules. Accommodate both sparse and dense subnetworks. Subnetworks linked to osmotic shock and DNA damage. A general framework for confident connectivity in PPI networks. The next steps: ▫Co-expression is not the only interesting way to utilize GE data ▫Scaling to complex human datasets