Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks Jacob Scott, Trey Ideker, Richard M. Karp, Roded Sharan RECOMB 2005.


Outline
Motivation
Theoretical foundations
Biological extensions
Implementation
Validation techniques
Results from yeast

Motivation
Post-genomics, want to understand an organism's protein-protein interaction network
Model the network as a probabilistic graph, with edge weights representing interaction probabilities
Interested in protein signaling cascades
  – Show up as simple paths in the graph
Want to find biologically interesting paths efficiently
  – Score paths, with high scores reflecting importance
  – Extended graph algorithms provide speed
  – "Automated modelling of signal transduction networks" serves as the baseline (Steffen et al. 2002)

Theoretical Foundation
Finding long, simple paths is NP-hard
  – Reduce from TSP
  – Once we find these paths, want the best (lightest) ones
The need for paths to be simple is what drives the hardness
Color-Coding is a randomized, dynamic-programming-based algorithm for finding paths of fixed length
  – Developed by Alon et al. (1995)
Randomly color the graph and require paths to be colorful (exactly one vertex of each color)
  – Number of colors = length of paths
  – A colorful path is always simple

Color-Coding
Colorful paths can be found with dynamic programming
Key point: a colorful path of length k contains a colorful path of length k-1
Store path information at each node for each subset of the k colors
  – Only $2^k$ color subsets, rather than $O(n^k)$ node subsets
Runtime is $O(2^k k m) \ll O(k n^k)$ for brute force (n vertices, m edges)
Space is $O(2^k n) \ll O(k n^k)$ for brute force
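The dynamic program on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the bitmask encoding of color subsets, and the `(u, v, weight)` edge format are all assumptions made for the example. Here k counts the vertices on the path, matching "number of colors = length of paths", and lower total weight is better.

```python
import random
from math import inf

def lightest_colorful_path(edges, k, coloring):
    """Lightest path on k vertices that uses each of the k colors exactly once.

    edges: iterable of (u, v, weight) for an undirected graph (assumed format).
    coloring: dict mapping each vertex to a color in range(k).
    Returns (weight, path) or (inf, None) if no colorful path exists.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    # State (v, S): lightest colorful path ending at v whose colors form bitmask S.
    best = {(v, 1 << coloring[v]): (0.0, (v,)) for v in adj}
    for _ in range(k - 1):
        nxt = {}
        for (u, used), (w, path) in best.items():
            for v, w_uv in adj[u].items():
                bit = 1 << coloring[v]
                if used & bit:
                    continue  # color already on the path, so skip this extension
                key = (v, used | bit)
                cand = (w + w_uv, path + (v,))
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        best = nxt
    full = (1 << k) - 1
    done = [val for (_, used), val in best.items() if used == full]
    return min(done, key=lambda r: r[0], default=(inf, None))

def color_coding(edges, k, trials, seed=0):
    """Repeat random colorings and keep the lightest colorful path found."""
    rng = random.Random(seed)
    vertices = sorted({x for u, v, _ in edges for x in (u, v)})
    best = (inf, None)
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in vertices}
        cand = lightest_colorful_path(edges, k, coloring)
        if cand[0] < best[0]:
            best = cand
    return best
```

Each state is keyed by an end vertex and a color subset rather than by the set of vertices visited, which is where the $2^k$ factor in the runtime and space bounds comes from.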

Coloring Example
Two different colorings of a toy graph, k=3
In coloring I, W(A,RGB) is built C->BC->ABC
In coloring II, W(A,RGB) is built G->BG->ABG
ABC is not colorful in coloring II
[Figure: toy graph on vertices A-H, shown under colorings I and II]

Monte Carlo Details
A colorful path is simple, but a simple path may not be colorful under a given coloring
Solution: run multiple independent trials
After one trial, a given simple path of length k is colorful with probability $k!/k^k > e^{-k}$
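As a rough worked example of the Monte Carlo argument, the sketch below computes how many independent colorings are needed so that a fixed simple path on k vertices is colorful at least once with a chosen probability. The function names and the 0.999 default are illustrative assumptions; the only fact used is that a uniformly random coloring with k colors makes a fixed k-vertex path colorful with probability $k!/k^k$.

```python
from math import ceil, factorial, log

def colorful_probability(k):
    """Probability that a fixed k-vertex path is colorful under one random k-coloring."""
    return factorial(k) / k**k

def trials_needed(k, success=0.999):
    """Smallest number of independent trials so the path is colorful at least once
    with probability >= success."""
    p = colorful_probability(k)
    return ceil(log(1.0 - success) / log(1.0 - p))
```

The trial count grows roughly like $e^k$, which is why the method is practical only for short paths.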

Adding Biology
Color-Coding gives an algorithmic basis; now introduce biologically motivated extensions
Can set the start or end of a path by type
  – E.g. screening by Gene Ontology categories
Can force the inclusion of a protein on the path by giving it a unique color
Using counters, can specify "path must contain between x and y proteins of a given type"
  – Computational cost is multiplicative in y per counter

Adding Biology - Segmented Paths
Pathways may be ordered
  – Signaling pathways going from the membrane, to nuclear proteins, and finally to transcription factors
Assign each protein an integer label based on biological information; build the path out of ordered sequences of labeled proteins
  – Now only need to constrain color collisions among proteins with the same label
  – If path length is about equally split among labels, probability of correct coloring rises
Modifications allow for inability to assign proteins to unique labels
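To see why splitting the path among labels helps, here is a small numerical sketch. It rests on an assumption that is not stated on the slide: each label's segment of length k_i is colored independently with its own palette of k_i colors, so the path is correctly colored when every segment is colorful on its own. The function names are hypothetical.

```python
from math import factorial

def colorful_probability(k):
    """Probability that a fixed k-vertex path is colorful under one random k-coloring."""
    return factorial(k) / k**k

def segmented_probability(segment_lengths):
    """Probability that every segment is colorful, assuming one independent
    palette of k_i colors per segment of length k_i (illustrative assumption)."""
    p = 1.0
    for k_i in segment_lengths:
        p *= colorful_probability(k_i)
    return p
```

Under that assumption, a path of 9 proteins split evenly over three labels is correctly colored with probability $(3!/3^3)^3 \approx 0.011$ per trial, compared with $9!/9^9 \approx 0.0009$ for an unsegmented path of the same length.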

Adding Biology - More Structures
Modifications to the Color-Coding recurrence allow for the discovery of structures beyond simple paths
  – Example: two-terminal series-parallel graphs
  – These capture parallel signaling pathways
[Figure: example two-terminal series-parallel graph]

Generating Edge Weights
So far, have glossed over how weights (probabilities) on the protein graph are assigned
Here, use our previous work: a logistic function of three variables (for a pair of proteins)
  – Number of times an interaction between them was experimentally observed
  – Pearson correlation coefficient of expression (for the corresponding genes)
  – Their small-world clustering coefficient
Used data from MIPS (gold standard) to train the relative weighting
Taking the log of the weights makes the path score additive
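The weighting scheme can be illustrated with a short sketch. The coefficient values below are placeholders invented for the example, not the values trained on MIPS, and the function names are hypothetical. The second function shows why taking logs makes the path score additive: the sum of negative log probabilities equals the negative log of their product.

```python
from math import exp, log

def interaction_probability(n_observations, expr_correlation, clustering_coeff,
                            coeffs=(-2.0, 1.0, 1.5, 1.0)):
    """Logistic function of the three variables on the slide.

    coeffs = (intercept, b_obs, b_corr, b_clust) are hypothetical placeholders.
    """
    b0, b_obs, b_corr, b_clust = coeffs
    z = (b0 + b_obs * n_observations
         + b_corr * expr_correlation
         + b_clust * clustering_coeff)
    return 1.0 / (1.0 + exp(-z))

def path_weight(edge_probabilities):
    """Additive path score: sum of -log(p); lighter paths are more reliable."""
    return sum(-log(p) for p in edge_probabilities)
```

With this convention, the lightest path is the one whose edge probabilities have the largest product.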

Application
Tested our simple-path implementation on the yeast interaction network
  – ~4,500 vertices, ~14,500 edges
  – Based on interaction data from the Database of Interacting Proteins (Feb 2004)
  – Runtimes varied from minutes (length 8) to under two hours (length 10)
  – Much faster than brute force for longer paths (14x for paths of length 9)
  – Focus on paths from membrane proteins to transcription factors

Validation Techniques
Three methods of validation
Two statistical
  – Functional enrichment p-value based on how many proteins in the path are similar (by GO category)
  – Weight p-value compares weights of paths to those found when the protein graph undergoes random degree-preserving shuffling
Lastly, search for expected pathways
  – MAP-Kinase, ubiquitin-ligation
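The weight p-value idea can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the graph is treated as unweighted, shuffling is done by random double-edge swaps, and the add-one correction in the empirical p-value is a common convention chosen here. All function names are hypothetical.

```python
import random

def degrees(edges):
    """Degree of each vertex in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize edges by double-edge swaps: (a,b),(c,d) -> (a,d),(c,b)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # also allow the other pairing of endpoints
        if len({a, b, c, d}) < 4:
            continue  # a shared endpoint would create a self-loop or change degrees
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # avoid duplicate edges
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def empirical_p_value(observed_weight, null_weights):
    """Fraction of null path weights at least as light as the observed one."""
    hits = sum(1 for w in null_weights if w <= observed_weight)
    return (hits + 1) / (len(null_weights) + 1)
```

In use, the best paths found on each shuffled graph would supply `null_weights`, and each real path's weight would be compared against them.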

MAP-Kinase and Ubiquitin-Ligation
Concentrated on three MAPK pathways (same as Steffen et al.)
  – Pheromone response
  – Filamentous growth
  – Cell wall integrity
Looked for shorter (length 4-6) ubiquitin-ligation pathways
  – Started at a cullin, ended at an F-Box
  – High functional enrichment under ubiquitin GO category

Statistical Results (CDFs)
100 best paths of length [missing], 99.9% success
100 normal, 2000 random paths used for weight p-value

MAPK Recovery Results
A) Cell wall integrity pathway in yeast: MID2 -> RHO1 -> PKC1 -> BCK1 -> MKK1/2 -> SLT2 -> RLM1
B) Best path of length 7 found from MID2 to RLM1: MID2 -> ROM2 -> RHO1 -> PKC1 -> MKK1 -> SLT2 -> RLM1
C) Pheromone response signaling pathway in yeast: STE2/3 -> STE4/18 -> CDC42 -> STE20 -> STE11 -> STE7 -> FUS3 -> DIG1/2 -> STE12
D) Best path of length 9 found from STE2/3 to STE12: STE3 -> AKR1 -> STE4 -> CDC24 -> BEM1 -> STE5 -> STE7 -> KSS1 -> STE12

Additional MAPK Recovery Results
Pheromone response signaling pathway in yeast: STE2/3 -> STE4/18 -> CDC42 -> STE20 -> STE11 -> STE7 -> FUS3 -> DIG1/2 -> STE12
Pheromone response pathway assembly network [figure; proteins shown]: STE3, STE50, GPA1, FAR1, CDC24, REM1, STE11, CDC42, STE4/18, AKR1, KSS1, STE5, STE12, DIG1/2, FUS3, STE7

Conclusion
Presented efficient, color-coding-based algorithms for finding simple paths
  – Added biological extensions, other structures
Integrated our well-founded reliability scores
Applied our algorithms to yeast
  – Showed that 60% of discovered pathways were significantly enriched
  – Recovered known MAP-Kinase and ubiquitin-ligation pathways

Simple vs. Segmented CDFs
[Figure: CDFs of functional-enrichment p-values for simple and segmented paths]
Simple: 54%
Segmented: 72%

References
Steffen, M., Petti, A., Aach, J., D’haeseleer, P., Church, G.: Automated modelling of signal transduction networks. BMC Bioinformatics 3 (2002) 34–44
Alon, N., Yuster, R., Zwick, U.: Color-coding. J. ACM 42 (1995) 844–856