Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks Jacob Scott, Trey Ideker, Richard M. Karp, Roded Sharan RECOMB 2005.


1 Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks Jacob Scott, Trey Ideker, Richard M. Karp, Roded Sharan RECOMB 2005

2 Outline Motivation Theoretical foundations Biological extensions Implementation Validation techniques Results from yeast

3 Motivation Post-genomics, want to understand organisms’ protein-protein interaction network Model network as a probabilistic graph, with edge weights representing probabilities Interested in protein signaling cascades –Show up as simple paths in the graph Want to find biologically interesting paths efficiently –Score paths, with high scores reflecting importance –Extended graph algorithms provide speed –Automated modelling of signal transduction networks as baseline (Steffen et al 2002)

4 Theoretical Foundation Finding long, simple paths is NP-Hard –Reduce from TSP –Once we find these paths, want the best (lightest) ones Need for paths to be simple is what drives hardness Color-Coding is a randomized, dynamic-programming-based algorithm for finding paths of fixed length –Developed by Alon et al (1995) Randomly color graph and require paths be colorful (exactly one vertex of each color) –Number of colors = length of paths –A colorful path is always simple

5 Color-Coding Colorful paths can be found with dynamic programming Key point: a colorful path of length k contains a colorful path of length k-1. Store path information at each node for each subset of the k colors –Only 2^k color subsets, rather than O(n^k) node subsets Runtime is O(2^k km) << O(kn^k) brute force Space is O(2^k n) << O(kn^k) brute force
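The dynamic program on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `best_colorful_path` and `lightest_path`, the dict-of-edges input format, and the choice to store whole paths in the table are all assumptions made for readability. "Length k" is taken to mean k vertices, matching "number of colors = length of paths", and lighter total weight is treated as better.

```python
import random
from collections import defaultdict


def best_colorful_path(edges, k, rng):
    """One color-coding trial on an undirected weighted graph.

    edges: dict mapping (u, v) -> weight (hypothetical input format).
    Returns (weight, path) for the lightest colorful path on k vertices
    under one random coloring, or None if no colorful path exists.
    """
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Random coloring: each vertex gets one of k colors.
    color = {v: rng.randrange(k) for v in sorted(adj)}

    # table[(v, mask)] = lightest colorful path ending at v whose vertices
    # use exactly the colors in the bitmask `mask`.
    table = {(v, 1 << c): (0.0, [v]) for v, c in color.items()}

    # Each round extends every stored path by one vertex of an unused color,
    # so a colorful path on i+1 vertices is built from one on i vertices.
    for _ in range(k - 1):
        nxt = {}
        for (u, mask), (w, path) in table.items():
            for v, w_uv in adj[u]:
                bit = 1 << color[v]
                if mask & bit:
                    continue  # color already on the path: not colorful
                key = (v, mask | bit)
                cand = (w + w_uv, path + [v])
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        table = nxt

    if not table:
        return None
    return min(table.values(), key=lambda t: t[0])


def lightest_path(edges, k, trials, seed=0):
    """Repeat independent colorings and keep the lightest path seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        found = best_colorful_path(edges, k, rng)
        if found is not None and (best is None or found[0] < best[0]):
            best = found
    return best
```

The table is indexed by (vertex, color subset), so it has at most 2^k n entries, which is where the space bound on the slide comes from. A memory-conscious version would store only weights and back-pointers rather than full paths.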

6 Coloring Example Two different colorings on toy graph, k=3 In coloring I, W(A,RGB) is built C->BC->ABC In coloring II, W(A,RGB) is built G->BG->ABG ABC is not colorful in coloring II [Figure: toy graph on vertices A-H, shown under colorings I and II]

7 Monte Carlo Details A colorful path is simple, but a simple path may not be colorful under a given coloring Solution: run multiple independent trials After one trial, for paths of length k, a given simple path is colorful with probability k!/k^k > e^-k
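To make the "multiple independent trials" point concrete, here is a small sketch of how many random colorings are needed to reach a target success probability. It relies on the standard color-coding fact that a fixed simple path on k vertices is colorful under a uniformly random k-coloring with probability k!/k^k. The function names and the default target of 99.9% are illustrative assumptions.

```python
import math


def colorful_probability(k):
    """Chance that a fixed simple path on k vertices is colorful
    under one uniformly random coloring with k colors: k!/k^k."""
    return math.factorial(k) / k**k


def trials_needed(k, success=0.999):
    """Smallest number of independent colorings such that a fixed path
    is colorful in at least one of them with probability >= success."""
    p = colorful_probability(k)
    if p >= 1.0:
        return 1  # k = 1: every coloring works
    # Need (1 - p)^t <= 1 - success, i.e. t >= log(1 - success) / log(1 - p).
    return math.ceil(math.log(1.0 - success) / math.log(1.0 - p))
```

Because k!/k^k shrinks roughly like e^-k, the number of trials grows exponentially in the path length, which is consistent with the slides restricting attention to paths of length about 8 to 10.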

8 Adding Biology Color-Coding gives an algorithmic basis, now introduce biologically motivated extensions Can set the start or end of path by type –E.g. screening by Gene Ontology categories Can force the inclusion of a protein on the path by giving it a unique color Using counters, can specify “path must contain between x and y proteins of a given type” –Computational cost multiplicative in y per counter

9 Adding Biology - Segmented Paths Pathways may be ordered –Signaling pathways going from the membrane, to nuclear proteins and finally transcription factors Assign each protein an integer label based on biological information, build path out of ordered sequences of labeled proteins –Now only need to constrain color collisions among proteins with the same label –If path length is about equally split among labels, probability of correct coloring rises Modifications allow for inability to assign proteins to unique labels
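The claim that segmentation raises the probability of a correct coloring can be checked numerically. The sketch below assumes, as a simplification not spelled out on the slide, that each label gets its own disjoint palette with as many colors as path positions carrying that label, so color collisions only matter within a segment. `segmented_probability` is a hypothetical helper name.

```python
import math


def colorful_probability(k):
    """Chance that k given vertices receive k distinct colors
    when each is colored uniformly at random from k colors."""
    return math.factorial(k) / k**k


def segmented_probability(segment_lengths):
    """Chance that a fixed path is correctly colored when each segment
    is colored independently from its own palette (assumed model)."""
    p = 1.0
    for length in segment_lengths:
        p *= colorful_probability(length)
    return p


if __name__ == "__main__":
    # Compare an unsegmented path on 8 vertices with a 4 + 4 split.
    print(segmented_probability([8]), segmented_probability([4, 4]))
```

Under this model an even split helps most, because the product of the per-segment probabilities is larger than the single-palette probability for the same total length.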

10 Adding Biology - More Structures Modifications to the Color-Coding recurrence allow for the discovery of structures beyond simple paths –Example: Two-terminal series-parallel graphs Capture parallel signaling pathways [Figure: example two-terminal series-parallel graph]

11 Generating Edge Weights So far, have glossed over how weights (probabilities) on the protein graph are assigned Here, use our previous work: fit a logistic function of three variables (for a pair of proteins) –Number of times interaction between them was experimentally observed –Pearson correlation coefficient of expressions (for corresponding genes) –Their small world clustering coefficient Used data from MIPS (gold standard) to train the relative weighting of the three variables Taking log of weights makes path score additive
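A rough sketch of the weighting scheme described on this slide: a logistic function of the three variables gives an interaction probability, and taking logs turns the product of probabilities along a path into a sum. The coefficient values below are placeholders invented for illustration; the trained values are not given in the slides. Using the negative log, so that a lighter path is a more reliable one, is also an assumption about sign convention.

```python
import math


def edge_reliability(n_observations, expr_correlation, clustering,
                     coef=(-1.0, 0.8, 1.5, 2.0)):
    """Logistic model of interaction reliability.

    coef = (intercept, weight per variable...) holds hypothetical values;
    in the talk these would be fitted against a gold-standard set.
    """
    b0, b1, b2, b3 = coef
    z = b0 + b1 * n_observations + b2 * expr_correlation + b3 * clustering
    return 1.0 / (1.0 + math.exp(-z))


def path_score(reliabilities):
    """Additive path score: sum of -log(p) over the path's edges.

    exp(-score) equals the product of the edge probabilities, so the
    lightest path is the one with the highest joint reliability.
    """
    return sum(-math.log(p) for p in reliabilities)
```

With additive scores, the path weight can be accumulated edge by edge inside the color-coding dynamic program instead of multiplying probabilities.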

12 Application Tested our simple path implementation with the yeast interaction network –~4,500 vertices, ~14,500 edges –Based on interaction data from Database of Interacting Proteins (Feb 2004) –Runtimes varied from minutes (length 8) to under two hours (length 10) –Much faster than brute force for longer paths (14x for paths of length 9) –Focus on paths from membrane proteins to transcription factors

13 Validation Techniques Three methods of validation Two statistical –Functional enrichment p-value based on how many proteins in the path are similar (by GO category) –Weight p-value compares weights of paths to those found when the protein graph undergoes random degree-preserving shuffling Lastly, search for expected pathways –MAP-Kinase, ubiquitin-ligation
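The weight p-value depends on randomizing the network while keeping every protein's degree fixed. One common way to do that is repeated edge swapping, sketched below under the assumption of an undirected simple graph given as a list of vertex pairs. The slides do not say which shuffling procedure or p-value estimator was used, so `degree_preserving_shuffle` and `empirical_p_value` are illustrative stand-ins, and edge weights are ignored here.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every vertex in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_shuffle(edges, swaps, rng):
    """Randomize topology by swapping endpoints of random edge pairs.

    Replacing (a, b), (c, d) with (a, d), (c, b) leaves all degrees
    unchanged. Swaps that would create a self-loop or a duplicate edge
    are skipped.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: skip to avoid self-loops
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def empirical_p_value(observed, random_scores):
    """Fraction of random-network path weights at least as light
    as the observed one (lighter = better)."""
    hits = sum(1 for s in random_scores if s <= observed)
    return hits / len(random_scores)
```

In use, one would rerun the path search on many shuffled networks and pass the resulting best weights as `random_scores`.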

14 MAP-Kinase and Ubiquitin-Ligation Concentrated on three MAPK pathways (same as Steffen et al) –Pheromone response –Filamentous growth –Cell wall integrity Looked for shorter (length 4-6) ubiquitin-ligation pathways –Started at a cullin, ended at an F-Box –High functional enrichment under ubiquitin GO category

15 Statistical Results (CDFs) 100 best paths of length 8 @ 99.9% success 100 normal, 2000 random paths used for weight p-value

16 MAPK Recovery Results A) Cell wall integrity pathway in yeast: MID2 -> RHO1 -> PKC1 -> BCK1 -> MKK1/2 -> SLT2 -> RLM1 B) Best path of length 7 found from MID2 to RLM1: MID2 -> ROM2 -> RHO1 -> PKC1 -> MKK1 -> SLT2 -> RLM1 C) Pheromone response signaling pathway in yeast: STE2/3 -> STE4/18 -> CDC42 -> STE20 -> STE11 -> STE7 -> FUS3 -> DIG1/2 -> STE12 D) Best path of length 9 found from STE2/3 to STE12: STE3 -> AKR1 -> STE4 -> CDC24 -> BEM1 -> STE5 -> STE7 -> KSS1 -> STE12

17 Additional MAPK Recovery Results Pheromone response signaling pathway in yeast: STE2/3 -> STE4/18 -> CDC42 -> STE20 -> STE11 -> STE7 -> FUS3 -> DIG1/2 -> STE12 Pheromone response pathway assembly network [Figure: network over the proteins STE3, STE50, GPA1, FAR1, CDC24, REM1, STE11, CDC42, STE4/18, AKR1, KSS1, STE5, STE12, DIG1/2, FUS3, STE7; edges not recoverable from the transcript]

18 Conclusion Presented efficient, color-coding based algorithms for finding simple paths –Added biological extensions, other structures Integrated our well-founded reliability scores Applied our algorithms to yeast –Shown 60% of discovered pathways were significantly enriched –Recovered known MAP-Kinase, ubiquitin-ligation pathways

19 Simple vs. Segmented CDFs [Figure: CDFs of functional enrichment p-value. Simple: 54%, Segmented: 72%]

20 References Steffen, M., Petti, A., Aach, J., D’haeseleer, P., Church, G.: Automated modelling of signal transduction networks. BMC Bioinformatics 3 (2002) 34–44 Alon, N., Yuster, R., Zwick, U.: Color-coding. J. ACM 42 (1995) 844–856

