Copyright (c) 2004 by Limsoon Wong Assessing Reliability of Protein-Protein Interaction Experiments Limsoon Wong Institute for Infocomm Research.

Slides:



Advertisements
Similar presentations
Molecular Biomedical Informatics Machine Learning and Bioinformatics Machine Learning & Bioinformatics 1.
Advertisements

Computational discovery of gene modules and regulatory networks Ziv Bar-Joseph et al (2003) Presented By: Dan Baluta.
Table 2 shows that the set TFsf-TGblbs of predicted regulatory links has better results than the other two sets, based on having a significantly higher.
PPI network construction and false positive detection Jin Chen CSE Fall 1.
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
Global Mapping of the Yeast Genetic Interaction Network Tong et. al, Science, Feb 2004 Presented by Bowen Cui.
Biological networks: Types and sources Protein-protein interactions, Protein complexes, and network properties.
A hub-attachment based method to detect functional modules from confidence-scored protein interactions and expression profiles Authors: Chia-Hao Chin 1,4,
CISC667, F05, Lec26, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Genetic networks and gene expression data.
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
Regulatory networks 10/29/07. Definition of a module Module here has broader meanings than before. A functional module is a discrete entity whose function.
T Monday, June 15, 2015Monday, June 15, 2015Monday, June 15, 2015Monday, June 15, 2015.
Functional genomics and inferring regulatory pathways with gene expression data.
Biological networks: Types and origin Protein-protein interactions, complexes, and network properties Thomas Skøt Jensen Center for Biological Sequence.
Copyright  2004 limsoon wong Assessing Reliability of Protein- Protein Interaction Experiments Limsoon Wong Institute for Infocomm Research.
Predicting protein functions from redundancies in large-scale protein interaction networks Speaker: Chun-hui CAI
Evidence for dynamically organized modularity in the yeast protein- protein interaction network Han, et al
Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
Ohnologs and Regulatory Networks Robbie Sedgewick Group Meeting March 2, 2006.
Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.
Biological networks: Types and origin
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. (1999). Detecting protein function and protein-protein interactions from genome sequences.
Protein-Protein Interaction Screens. Bacterial Two-Hybrid System selectable marker RNA polymerase DNA binding protein bait target sequence target.
Interaction Networks in Biology: Interface between Physics and Biology, Shekhar C. Mande, August 24, 2009 Interaction Networks in Biology: Interface between.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Exploiting indirect neighbors and topological weight to predict protein function from protein– protein interactions Hon Nian Chua, Wing-Kin Sung and Limsoon.
Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.
Large-scale organization of metabolic networks Jeong et al. CS 466 Saurabh Sinha.
Do not reproduce without permission 1 Gerstein.info/talks (c) (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Permissions Statement This Presentation.
Protein protein interactions
Mapping protein-DNA interactions by ChIP-seq Zsolt Szilagyi Institute of Biomedicine.
Protein analysis and proteomics (Part 2 of 2). Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by Jonathan.
Interactions and more interactions
Improving PPI Networks with Correlated Gene Expression Data Jesse Walsh.
Networks and Interactions Boo Virk v1.0.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
Reconstructing gene networks Analysing the properties of gene networks Gene Networks Using gene expression data to reconstruct gene networks.
Knowledge Discovery from Biological and Clinical Data: BASIC BACKGROUND.
Proteome and interactome Bioinformatics.
Analysis of the yeast transcriptional regulatory network.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
1 Having genome data allows collection of other ‘omic’ datasets Systems biology takes a different perspective on the entire dataset, often from a Network.
Lecture 9. Functional Genomics at the Protein Level: Proteomics.
TAP(Tandem Affinity Purification) Billy Baader Genetics 677.
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu.
Genome Biology and Biotechnology The next frontier: Systems biology Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute.
Introduction to biological molecular networks
DNAmRNAProtein Small molecules Environment Regulatory RNA How a cell is wired The dynamics of such interactions emerge as cellular processes and functions.
Biol 729 – Proteome Bioinformatics Dr M. J. Fisher - Protein: Protein Interactions.
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
How many interactions are there? ~6,200 genes ~6,200 proteins x 2-10 interactions/protein ~12, ,000 interactions Yeast.
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Biological Networks. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
Robustness, clustering & evolutionary conservation Stefan Wuchty Center of Network Research Department of Physics University of Notre Dame title.
1 Having genome data allows collection of other ‘omic’ datasets Systems biology takes a different perspective on the entire dataset, often from a Network.
1 Lesson 12 Networks / Systems Biology. 2 Systems biology  Not only understanding components! 1.System structures: the network of gene interactions and.
PROTEIN INTERACTION NETWORK – INFERENCE TOOL DIVYA RAO CANDIDATE FOR MASTER OF SCIENCE IN BIOINFORMATICS ADVISOR: Dr. FILIPPO MENCZER CAPSTONE PROJECT.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
EQTLs.
Biological networks CS 5263 Bioinformatics.
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
Walking the Interactome for Prioritization of Candidate Disease Genes
Principle of Epistasis Analysis
Brandon Ho, Anastasia Baryshnikova, Grant W. Brown  Cell Systems 
Presentation transcript:

Copyright (c) 2004 by Limsoon Wong Assessing Reliability of Protein-Protein Interaction Experiments Limsoon Wong Institute for Infocomm Research

Copyright (c) 2004 by Limsoon Wong Outline Reliability of experimental protein-protein interaction data A couple of existing reliability indices –interaction generality –interaction generality 2 A new reliability index –interaction pathway reliability Further refinements –interaction reliability by belief inference –false negative detection

Copyright (c) 2004 by Limsoon Wong How Reliable are Experimental Protein- Protein Interaction Data? Figure credit: Jeong et al. 2001

Complete genomes are now available Knowing the genes is not enough to understand how biology functions Proteins, not genes, are responsible for many cellular activities Proteins function by interacting w/ other proteins and biomolecules GENOME PROTEOME “INTERACTOME” Why Protein Interactions? Copyright (c) 2004 by Limsoon Wong

FACT: Generating large amounts of experimental data about protein-protein interactions can be done with ease. High-Tech Expt PPI Detection Methods Yeast two-hybrid assays Mass spec of purified complexes (e.g., TAP) Correlated mRNA expression Genetic interactions (e.g., synthetic lethality)...

High-throughput approach sacrifice quality for quantity: (a) limited or biased coverage: false negatives, & (b) high error rates : false positives Key Bottleneck Many high-throughput expt detection methods for protein-protein interactions have been devised. But... Copyright (c) 2004 by Limsoon Wong

Large disagreement betw methods Some Protein Interaction Data Sets Sprinzak et al., JMB, 327: , 2003 GY2H: genome-scale Y2H 2M, 3M, 4M: intersection of 2, 3, 4 methods Copyright (c) 2004 by Limsoon Wong

Ditto wrt co-cellular-role Expected proportion of co-localized pairs among non true interacting pairs Expected proportion of co-localized pairs among true interacting pairs Quantitative Estimates Sprinzak et al, JMB, 327: , 2003

% of TP based on co-localization % of TP based on shared cellular role (I = 1) % of TP based on shared cellular role (I =.95) TP = ~50% Reliability of Protein Interaction Data Sprinzak et al, JMB, 327: , 2003 Copyright (c) 2004 by Limsoon Wong

50-70% of interactions are spurious 10-30% of interactions catalogued Protein interaction data 90% of spots are good data 80-90% of transcripts represented mRNA profiling 99.9% correct99% of genome sequence DNA genome sequence Data qualityCoverage Are We There Yet? Copyright (c) 2004 by Limsoon Wong

Objective Some high-throughput protein interaction expts have as much as 50% false positives Can we find a way to rank candidate interaction pairs according to their reliability? How do we do this? –Would knowing their neighbours help? –Would knowing their local topology help? –Would knowing their global topology help?

Copyright (c) 2004 by Limsoon Wong Would knowing their neighbours help? The story of interaction generality

a b An Observation It seems that configuration a is less likely than b in protein interaction networks Can we exploit this? Copyright (c) 2004 by Limsoon Wong

Interaction Generality Saito et al., NAR, 30: , 2002 ig(YDR412W  GLC7) = 1 + # of yellow nodes The number of proteins that “interact” with just X or Y, and nobody else Copyright (c) 2004 by Limsoon Wong

a b Assessing Reliability Using Interaction Generality Recall configuration a is less likely than b in protein interaction networks The smaller the “ig” value of a candidate interaction pair is, the more likely that interaction is Copyright (c) 2004 by Limsoon Wong

There are 229 pairs in Ito having ig = 1. Of these, 66 (or 34%) are also reported by Uetz Evaluation wrt Intersection of Ito et al. & Uetz et al. Interacting pairs c’mon to Ito et al. & Uetz et al. are more reliable Also have smaller “ig”  “ig” seems to work Copyright (c) 2004 by Limsoon Wong

~60% of pairs in in Ito having ig=1 are known to have common localization Evaluation wrt Co-localization Interaction pairs having common cellular localization are more likely Also have lower “ig”  “ig” seems to work Copyright (c) 2004 by Limsoon Wong

A: before restrict to pairs with “ig = 1” B: after restrict to pairs with “ig = 1” reduced x-talk Evaluation wrt Co-cellular Role Interaction pairs having common cellular role are more likely Also have lower “ig”  “ig” seems to work Copyright (c) 2004 by Limsoon Wong

Would knowing their local topology help? The story of interaction generality 2

Observed 70 times in S. cerevisiaeObserved ~11 times in random data Existence of Network Motifs Milo et al., Science, 298: , 2002 A network motif is just a local topological configuration of the network “Detected” in gene regulation networks, WWW links, etc. Copyright (c) 2004 by Limsoon Wong

11 22 33 44 55 5 Possible Network Motifs Classify a protein C that directly interacts with the pair A  B according to these 5 topological configurations Copyright (c) 2004 by Limsoon Wong

A New Interaction Generality Saito et al., Bioinformatics, 19: , 2002 '

Copyright (c) 2004 by Limsoon Wong ~90% of pairs in intersection of Ito & Uetz have ig 2 < 0. ~60% of pairs not in intersection of Ito & Uetz have ig 2 < 0 Evaluation wrt Reproducible Interactions “ig 2 ” correlates to “reproducible” interactions  “ig 2 ” seems to work

~95% of pairs having ig2 = –6 have common cellular roles Evaluation wrt Common Cellular Role, etc. “ig 2 ” correlates well to common cellular roles, localization, & expression “ig 2 ” seems to work better than “ig” Copyright (c) 2004 by Limsoon Wong

Would knowing their global topology help? The story of interaction pathway reliability

Copyright (c) 2004 by Limsoon Wong Some “Reasonable” Speculations A true interacting pair is often connected by at least one alternative path (reason: a biological function is performed by a highly interconnected network of interactions) The shorter the alternative path, the more likely the interaction (reason: evolution of life is through “add-on” interactions of other or newer folds onto existing ones)

Copyright (c) 2004 by Limsoon Wong Therefore... Conjecture: “An interaction that is associated with an alternate path of reliable interactions is likely to be reliable.” Idea: Use alternative interaction paths as a measure to indicate functional linkage between the two proteins

Copyright (c) 2004 by Limsoon Wong Interaction Pathway Reliability IPR is also called IRAP, “Interaction Reliability by Alternate Pathways”

Copyright (c) 2004 by Limsoon Wong Non-reducible Paths Non-reducible paths are –A  F  E –A  B  E Reducible paths are –A  B  C  D  E –A  B  C  E A DCB E F

Copyright (c) 2004 by Limsoon Wong Evaluation wrt Reproducible Interactions “ipr” correlates well to “reproducible” interactions  “ipr” seems to work The number of pairs not in the intersection of Ito & Uetz is not changed much wrt the ipr value of the pairs The number of pairs in the intersection of Ito & Uetz increases wrt the ipr value of the pairs

At the ipr threshold that eliminated 80% of pairs, ~85% of the of the remaining pairs have common cellular roles Evaluation wrt Common Cellular Role, etc “ipr” correlates well to common cellular roles, localization, & expression  “ipr” seems to work better than “ig 2 ” Copyright (c) 2004 by Limsoon Wong

Part of the network of physical interactions reported by Ito et al., PNAS, 2001 Stability in Protein Networks Maslov & Sneppen, Science, 296: , 2002 According to Maslov & Sneppen –Links betw high-connected proteins are suppressed –Links betw high- & low-connected proteins are favoured This decreases cross talks & increases robustness Copyright (c) 2004 by Limsoon Wong

Evaluation wrt “Many-few” Interactions Number of “Many-few” interactions increases when more “reliable” IPR threshold is used to filter interactions Consistent with the Maslov-Sneppen prediction

Copyright (c) 2004 by Limsoon Wong Evaluation wrt “Cross-Talkers” A MIPS functional cat: –| 02 | ENERGY –| | glycolysis and gluconeogenesis –| | glycolysis methylglyoxal bypass –| | regulation of glycolysis & gluconeogenesis First 2 digits is top cat Other digits add more granularity to the cat  Compare high- & low- IPR pairs that are not co-localised to determine number of pairs that fall into same cat. If more high-IPR pairs are in same cat, then IPR works

Copyright (c) 2004 by Limsoon Wong Evaluation wrt “Cross-Talkers” For top cat –148/257 high-IPR pairs are in same cat –65/260 low-IPR pairs are in same cat For fine-granularity cat –135/257 high-IPR pairs are in same cat. 37/260 low-IPR pairs are in same cat  IPR works  IPR pairs that are not co-localized are real cross-talkers!

Copyright (c) 2004 by Limsoon Wong Example Cross Talkers

Copyright (c) 2004 by Limsoon Wong Would knowing their global topology help? The story of interaction reliability by belief inference

Copyright (c) 2004 by Limsoon Wong Shortcoming of IPR IPR considers only the strongest alternative path  may overlook other interactions close to the target pairs that may provide significant supporting evidence for their reliability  need an improve measure that considers all the protein interactions close to the target pair.

Copyright (c) 2004 by Limsoon Wong Interaction Reliability by Belief Inference

Copyright (c) 2004 by Limsoon Wong IPR Effectiveness of IRBI irbi is more effective than ipr, ig 1, ig 2

Copyright (c) 2004 by Limsoon Wong How a bout discovering false negatives? The story of detecting missing information

Copyright (c) 2004 by Limsoon Wong False Negatives A “false negative” is a failure to detect a real protein-protein interaction

Copyright (c) 2004 by Limsoon Wong IPR (& IRBI) Detects False Negatives To find out if there is a “missing” interaction between X and Y, we do: –compute ipr value of X  Y in G  {X  Y} –predict if X  Y as false negative if ipr is high

Copyright (c) 2004 by Limsoon Wong How do we test if this works? To test this, we mimic false negatives by random removal of 5% to 15% of known interactions. Then we check: –how many removed interactions are rediscovered? –is there diff in rediscovery rates of false negative vs random links? –Is there support in terms of gene expression correlation, common cellular roles, & common cellular locations?

Copyright (c) 2004 by Limsoon Wong Rediscover Hidden Interactions IPR > 0.95 IPR < 0.01

False Negatives vs Random Links Ave IPR Copyright (c) 2004 by Limsoon Wong

Indirect Evidence Randomly select 2 nodes “Pretend” they interact in G Compute their IPR Check if they have gene expression correlation, common cellular roles, common cellular location

Copyright (c) 2004 by Limsoon Wong Case Study 1 Dataset –10,199 non-redundant interactions between 4,336 yeast proteins from MIPS with date Jan. 18, 2005 –833 “verified true” interactions based on Ito et al.’s core set Apply IPR on dataset with the 833 true interactions hidden from the program IPR re-discovered 730 interactions

Copyright (c) 2004 by Limsoon Wong Case Study 1: Re-Discovered True Interactions Have High IPR Values

Only values greater than 0.95 are shown in the table Case Study 1: Comparison With Random Connections Copyright (c) 2004 by Limsoon Wong Example true x-talkers according to MIPS Example likely interaction predicted

Copyright (c) 2004 by Limsoon Wong Case Study 2 Yeast 26S Proteosome Complex –4 sub-structures found in PDB covering 13 proteins –23 physical interactions using the (10, 5)-rule Out of 23 physical interactions in this complex, only 1 was detected in MIPS expt dataset IPR detected 42 interactions (out of 78 possible interactions) betw the 13 proteins: –14 are physical interactions in complex (c.f. 1 detected in MIPS) –9 are expt detected interactions in MIPS –19 new interactions are also predicted MIPS Lots! 1 IPR 19 Known Complexes

Copyright (c) 2004 by Limsoon Wong Conclusions There are latent local & global network “motifs” that indicate likelihood of protein interactions These network “motifs” can be exploited in computational elimination of false positives & false negatives from high- throughput Y2H expt & possibly other highly erroneous interaction data IPR is so far the most effective topologically- based computational measure for assessing the reliability (false positives) of protein- protein interactions detected by high- throughput methods IPR can also discover new interactions (false negatives) not detected in the expt PPI network

Copyright (c) 2004 by Limsoon Wong Acknowledgements NUS SOC Jin Chen, Wynne Hsu, Mong Li Lee I 2 R See-Kiong Ng GIS Prasanna Kolatkar, Jer-Min Chia

Copyright (c) 2004 by Limsoon Wong References J. Chen et al, “Systematic assessment of high-throughput protein interaction data using alternative path approach”, Proc. ICTAI 2004, pages R. Saito et al, “Interaction generality, a measurement to assess the reliability of a protein-protein interaction”, NAR 30: , 2002 R. Saito et al, “Construction of reliable protein-protein interaction networks with a new interaction generality measure”, Bioinformatics 19: , 2002 S. Maslov & K. Sneppen, “Specificity and stability in topology of protein networks”, Science 296: , 2002 E. Sprinzak et al, “How reliable are experimental protein-protein interaction data?”, JMB 327: , 2003