Protein-protein interactions Ia. A combined algorithm for genome-wide prediction of protein function. Edward M. Marcotte, Matteo Pellegrini, Michael J.


Protein-protein interactions
Ia. A combined algorithm for genome-wide prediction of protein function. Edward M. Marcotte, Matteo Pellegrini, Michael J. Thompson, Todd O. Yeates, David Eisenberg (1999) Nature 402,
Protein function in the post-genomic era. David Eisenberg, Edward M. Marcotte, Ioannis Xenarios & Todd O. Yeates (2000) Nature 405,

FUNCTIONAL RELATIONSHIPS AMONG PROTEINS: GENOME-WIDE PREDICTION (FUNCTIONAL GENOMICS)
Does not rely on DIRECT SEQUENCE HOMOLOGY.
Combines 3 independent prediction methods with available experimental data.

STRATEGIES USED TO “FUNCTIONALLY LINK” PROTEINS (6,217 yeast proteins):
Correlated Evolution: Related phylogenetic profiles (a profile is the pattern of presence or absence of a particular protein across a set of organisms whose genomes have been sequenced). Proteins that operate together in a common pathway or complex tend to be inherited together.
Correlated mRNA Expression Patterns: Proteins whose mRNA expression patterns are correlated under different growth conditions.
Correlated Patterns of Domain Fusion: Link 2 proteins whose homologs are fused into a single gene (a Rosetta stone sequence) in another organism.
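The phylogenetic-profile idea can be illustrated with a minimal sketch. It assumes that each profile is a tuple of 0/1 flags over the same ordered set of genomes and that two proteins are linked when their profiles differ in at most `max_mismatches` positions. The function names, the threshold, and the toy profiles are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def profile_distance(a, b):
    """Number of genomes in which two presence/absence profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def link_by_profile(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    links = []
    for p, q in combinations(sorted(profiles), 2):
        if profile_distance(profiles[p], profiles[q]) <= max_mismatches:
            links.append((p, q))
    return links


# Toy data (invented): 1 = a homolog is present in that genome, 0 = absent.
toy_profiles = {
    "P1": (1, 0, 1, 1, 0),
    "P2": (1, 0, 1, 1, 0),
    "P3": (0, 1, 0, 1, 1),
    "P4": (1, 0, 1, 0, 0),
}

exact_links = link_by_profile(toy_profiles)
near_links = link_by_profile(toy_profiles, max_mismatches=1)
```

With the toy data, only P1 and P2 share an identical profile; allowing one mismatch also links P4 to both of them.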

STRATEGIES USED TO “FUNCTIONALLY LINK” PROTEINS (continued):
Gene Neighbour Method: If the genes that encode 2 proteins are neighbours on the chromosome in several genomes, the proteins tend to be functionally linked.
Experimental Evidence: Mass spectrometry, co-immunoprecipitation, yeast 2-hybrid data (DIP, MIPS yeast genome db).
Metabolic Pathway Neighbours: Proteins that participate in the same metabolic pathway, a common structural complex or biological process, or a closely related physiological function. Using BLAST homology searches, pairwise links were defined between yeast proteins whose E. coli homologs catalyse sequential reactions in a metabolic pathway (EcoCyc db).
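The gene neighbour method can be sketched as a simple count of genomes in which two genes sit close together. This is an illustration only: it assumes each genome is given as an ordered list of gene names on a single linear chromosome, it ignores strand and circularity, and the `max_gap` parameter and toy gene orders are hypothetical.

```python
def neighbour_genomes(gene_orders, a, b, max_gap=1):
    """Count the genomes in which genes a and b lie within max_gap positions."""
    count = 0
    for order in gene_orders.values():
        if a in order and b in order:
            if abs(order.index(a) - order.index(b)) <= max_gap:
                count += 1
    return count


# Toy gene orders (invented) for four genomes.
toy_orders = {
    "genome1": ["A", "B", "C", "D"],
    "genome2": ["C", "A", "B"],
    "genome3": ["A", "D", "B"],
    "genome4": ["B", "C"],
}

adjacent_count = neighbour_genomes(toy_orders, "A", "B")
loose_count = neighbour_genomes(toy_orders, "A", "B", max_gap=2)
```

A pair would then be called functionally linked when the count reaches some chosen number of genomes; the slide says only "several", so no threshold is fixed here.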

RESULTS:
Phylogenetic profiles: 20,749 links
mRNA expression patterns: 26,013 links
Domain fusion method: 45,502 links
In total: 93,750 pairwise functional links among 76% (4,701) of yeast proteins
4,130 “HIGHEST CONFIDENCE” links (experimental proof, or validated by 2 of the 3 prediction methods)
19,251 “HIGH CONFIDENCE” links (predicted by phylogenetic profiles)
Remainder: predicted by domain fusion or correlated mRNA expression patterns

VALIDATION:
Excellent reliability if 2 or more prediction methods agreed on a link.
These methods link many proteins that are already known, on the basis of experiments, to function together (ribosomal proteins, proteins of the flagellar motor apparatus, and metabolic pathways).
“Keyword recovery”: When both members of a linked pair have known function, the prediction can be compared to the actual annotation. The Swiss-Prot keyword annotations of the two proteins are compared, and a keyword is “recovered” if the keywords match.
Average signal-to-noise ratio for “keyword recovery”:
Phylogenetic profiles: 5
mRNA expression patterns: 2
When 2 prediction methods gave the same linkage: 8
Direct experimental data: 8
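The slide does not define how the signal-to-noise ratio is calculated, so the following is only one plausible reading, sketched under stated assumptions: "signal" is the mean keyword overlap over linked pairs, "noise" is the mean overlap over all possible pairs, and overlap is measured as shared keywords divided by total distinct keywords. All function names and the toy annotations are hypothetical.

```python
from itertools import combinations


def keyword_overlap(kw_a, kw_b):
    """Shared keywords divided by all distinct keywords of the two proteins."""
    union = kw_a | kw_b
    return len(kw_a & kw_b) / len(union) if union else 0.0


def mean_overlap(pairs, keywords):
    """Average keyword overlap over a collection of protein pairs."""
    pairs = list(pairs)
    total = sum(keyword_overlap(keywords[a], keywords[b]) for a, b in pairs)
    return total / len(pairs)


def signal_to_noise(linked_pairs, keywords):
    """Mean overlap of linked pairs relative to the mean overlap of all pairs."""
    signal = mean_overlap(linked_pairs, keywords)
    noise = mean_overlap(combinations(sorted(keywords), 2), keywords)
    return signal / noise if noise else float("inf")


# Toy keyword annotations (invented).
toy_keywords = {
    "A": {"ribosome", "translation"},
    "B": {"ribosome", "translation"},
    "C": {"glycolysis"},
    "D": {"glycolysis", "cytoplasm"},
    "E": {"dna repair"},
}
toy_links = [("A", "B"), ("C", "D")]

ratio = signal_to_noise(toy_links, toy_keywords)
```

A ratio well above 1 means that linked proteins share keywords more often than arbitrary pairs do, which is the sense in which the slide ranks the methods.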

OUTCOME:
Functional links between proteins of unknown function: A general function was assigned to more than half of the 2,557 previously uncharacterized yeast proteins (15% from high and highest confidence links, 62% using all links).
Functional links between non-homologous proteins, beyond traditional “sequence matching”: Sup35, MSH6.
Discovery of potential interactions within and across cellular processes and compartments.
Connections represent a “gold mine” for experimentally testing specific hypotheses about gene function.
Viewing protein-protein interactions globally as a network, and not as binary data sets, increases the confidence levels for individual interactions. Inspection of the interaction web at different steps identifies “unexpected” links between previously unconnected cellular processes.

Ib. A network of protein-protein interactions in yeast. Schwikowski B, Uetz P, Fields S. (2000). Nat Biotechnol. 18,

DATA SOURCE:
MIPS site
YPD
DIP
Yeast 2-hybrid studies
Biochemical experimental data

Prediction of function: The annotated functions of all neighbors of a protein P are ordered in a list, from the most frequent to the least frequent. Functions that occur the same number of times are ordered arbitrarily. Everything after the third entry in the list is discarded, and the remaining three or fewer functions are declared as predictions for the function of P.
Evaluation of the quality of the links: Treat a protein of known function as if it were unknown, and test the predicted functions against its annotation.
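The neighbor-counting rule above can be sketched as follows. This is a minimal sketch, assuming interactions are given as unordered pairs and annotations as sets of function labels. The slide says ties are ordered arbitrarily; alphabetical tie-breaking is used here only to make the output reproducible. The function name and the toy data are hypothetical.

```python
from collections import Counter


def predict_functions(protein, interactions, annotations, top_n=3):
    """Predict up to top_n functions from the annotations of direct neighbors."""
    neighbours = set()
    for a, b in interactions:
        if a == protein and b != protein:
            neighbours.add(b)
        elif b == protein and a != protein:
            neighbours.add(a)

    # Count how many neighbors carry each annotated function.
    counts = Counter()
    for n in neighbours:
        counts.update(annotations.get(n, ()))

    # Most frequent first; ties broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [function for function, _ in ranked[:top_n]]


# Toy network (invented): P is the protein of unknown function.
toy_interactions = [("P", "N1"), ("P", "N2"), ("N3", "P"), ("N1", "N2"), ("P", "N4")]
toy_annotations = {
    "N1": {"transport", "signalling"},
    "N2": {"transport"},
    "N3": {"transport", "metabolism"},
    "N4": {"signalling", "cell cycle", "stress"},
}

predicted = predict_functions("P", toy_interactions, toy_annotations)
```

Here "transport" occurs in three neighbors and "signalling" in two; the third slot goes to one of the functions that occur once.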

RESULTS:
Analyzed 2,709 published interactions involving 2,039 yeast proteins.
Found a single large network containing 2,358 links among 1,548 individual proteins. The other networks had few proteins.
65% of the interactions in the complete set of networks occur among proteins with at least one common functional assignment.
In 78% of the 1,432 interactions between proteins of known localization, the proteins share one or more compartments.
Correctly predicted a functional category for 72% of the 1,393 characterised proteins that have at least one partner of known function.
Cross-talk between and within functional groups and subcellular compartments.
Local function vs contextual/cellular function (extended web of interacting molecules).
Predicted functions for 364 uncharacterised proteins.

Reliability of the generated networks:
1,393 of the 2,039 proteins were annotated with some function and had at least one neighbor annotated with a function.
In 1,005 of these 1,393 cases (72.1%), at least one annotated function was predicted correctly by the above method.
As a control, the same prediction algorithm was run 100 times on randomly generated interactions. Only 12.2% of these predictions agreed with the known annotation.
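The reliability test can be sketched as a hit rate on the real network compared with the average hit rate on randomized networks. The slide does not say how the random interactions were generated, so shuffling one endpoint of every edge is an assumption made here, as are all function names, the alphabetical tie-breaking, and the toy data.

```python
import random
from collections import Counter


def neighbours(protein, edges):
    """Direct interaction partners of a protein, ignoring self-interactions."""
    return {b if a == protein else a
            for a, b in edges if protein in (a, b) and a != b}


def hit_rate(edges, annotations, top_n=3):
    """Fraction of testable proteins with at least one correctly predicted function."""
    tested = hits = 0
    for protein, known in annotations.items():
        counts = Counter()
        for n in neighbours(protein, edges):
            counts.update(annotations.get(n, ()))
        if not counts:
            continue  # no annotated neighbor, so no prediction can be made
        tested += 1
        ranked = sorted(counts, key=lambda f: (-counts[f], f))[:top_n]
        if known & set(ranked):
            hits += 1
    return hits / tested if tested else 0.0


def shuffled_edges(edges, rng):
    """Randomize a network by shuffling the second endpoint of every edge."""
    right = [b for _, b in edges]
    rng.shuffle(right)
    return [(a, b) for (a, _), b in zip(edges, right)]


def random_baseline(edges, annotations, trials=100, seed=0):
    """Average hit rate over several randomized versions of the network."""
    rng = random.Random(seed)
    rates = [hit_rate(shuffled_edges(edges, rng), annotations) for _ in range(trials)]
    return sum(rates) / len(rates)


# Toy network (invented).
toy_edges = [("A", "B"), ("C", "D"), ("A", "E")]
toy_annotations = {"A": {"x"}, "B": {"x"}, "C": {"y"}, "D": {"y"}, "E": {"z"}}

observed = hit_rate(toy_edges, toy_annotations)
baseline = random_baseline(toy_edges, toy_annotations)
reported_percent = round(100 * 1005 / 1393, 1)  # the 72.1% quoted on the slide
```

The comparison of interest is `observed` against `baseline`; on the slide the corresponding figures are 72.1% and 12.2%.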

PROBLEMS…
Interactions of membrane proteins are underrepresented in Y2H data.
Y2H data contain many false positives.
Only 15% agreement between this interaction data and Marcotte’s “high quality” prediction data.
Uncertainties remain that WILL require additional experimentation.

CHALLENGES:
Protein complexes are not static: they change with the metabolic state of the cell, external stimuli, etc.
Protein chip technology can be used to study transient interactions, and is amenable to a variety of assays such as nucleotide binding and enzymatic activity.

II. Mapping protein family interactions: intramolecular and intermolecular protein family interaction repertoires in the PDB and yeast. Park J, Lappe M, Teichmann SA. (2001). J Mol Biol. 307,

Protein DOMAIN interactions: interactions between whole structural families of evolutionarily related domains, as opposed to interactions between individual proteins.
Types of domain interactions:
1) Domain-domain (intra-chain) interactions in multi-domain polypeptide chains
2) Inter-chain protein interactions in multi-subunit protein complexes
3) Interactions in transient complexes between proteins that can also exist independently

METHODS:
Protein superfamilies were taken from the SCOP db.
Interactions between families in the PDB (domains of known 3D structure): the coordinates of each domain were parsed to check whether there are 5 or more contacts within 5 Å of another domain.
Interactions between families in the yeast genome, by homology:
- Protein structures were assigned to the yeast proteins using the domains from SCOP as queries in PSI-BLAST.
- Yeast sequences were also compared to the PDB-ISL with FASTA.
Assumption: Within polypeptide chains, structural domains interact if fewer than 30 amino acids separate them.
If one family F has 2 domains, a and b, and each of these interacts with a domain from a different family, then the number of interaction families for F will be 2.
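The two interaction criteria above can be sketched in a few lines. This is an illustration under assumptions: a "contact" is taken to be any pair of atoms, one from each domain, at most 5 Å apart, and the linker length is the number of residues between the last residue of one domain and the first residue of the next. The slide does not specify either detail, and the function names and toy coordinates are hypothetical.

```python
import math


def count_contacts(coords_a, coords_b, cutoff=5.0):
    """Number of atom pairs (one atom per domain) within cutoff angstroms."""
    return sum(1 for p in coords_a for q in coords_b if math.dist(p, q) <= cutoff)


def domains_interact(coords_a, coords_b, cutoff=5.0, min_contacts=5):
    """Structure-based rule: 5 or more contacts within 5 angstroms."""
    return count_contacts(coords_a, coords_b, cutoff) >= min_contacts


def adjacent_in_chain(end_a, start_b, max_linker=30):
    """Sequence-based rule: fewer than max_linker residues separate the domains."""
    separating = start_b - end_a - 1
    return separating < max_linker


# Toy coordinates in angstroms (invented).
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
domain_b = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)]
domain_c = [(0.0, 20.0, 0.0), (1.0, 20.0, 0.0)]

a_b_interact = domains_interact(domain_a, domain_b)
a_c_interact = domains_interact(domain_a, domain_c)
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a sketch; a spatial index would be the usual choice for whole-PDB scans.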

RESULTS:
First attempt at classifying interactions between all the known structural protein domains according to their families.
Could classify 8,151 interactions between individual domains in the PDB and yeast in terms of 664 types of interactions between pairs of protein families.
Scale-free network: Most protein families interact with only 1 or 2 other families. A few families are extremely versatile in their interactions and are connected to many families (hubs in the graph), for functional reasons. E.g. immunoglobulins, P-loop nucleotide triphosphate hydrolases.
In 45% of all families in the PDB, domains interact with other domains from the same family: internal duplication and domain oligomerisation are favourable.
Pairs of families that interact both within and between polypeptide chains belong mostly to 2 types of domains: enzyme domains and domains from the same family.
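Counting partner families and picking out hubs can be sketched as below. The sketch assumes the family-level interactions are available as a list of family-name pairs, treats same-family interactions separately from the partner count, and uses an arbitrary `min_partners` threshold for calling a family a hub. The function names, the threshold, and the toy pairs are invented for illustration.

```python
from collections import defaultdict


def family_partners(family_pairs):
    """Map each family to its distinct partner families; also collect self-interactors."""
    partners = defaultdict(set)
    self_interacting = set()
    for f, g in family_pairs:
        if f == g:
            self_interacting.add(f)
            partners[f]  # make sure the family appears even with no other partner
        else:
            partners[f].add(g)
            partners[g].add(f)
    return partners, self_interacting


def hubs(partners, min_partners=3):
    """Families connected to at least min_partners other families."""
    return sorted(f for f, others in partners.items() if len(others) >= min_partners)


# Toy family-level interactions (invented).
toy_pairs = [
    ("P-loop", "Ig"),
    ("P-loop", "kinase"),
    ("P-loop", "SH3"),
    ("Ig", "Ig"),
    ("kinase", "SH2"),
]

partners, self_interacting = family_partners(toy_pairs)
degree = {family: len(others) for family, others in partners.items()}
hub_families = hubs(partners)
```

In a scale-free network most entries of `degree` would be 1 or 2, with a small number of much larger values for the hub families.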

PROBLEMS:
Multi-domain proteins: it is not possible to resolve exactly which domains are interacting, so these were not used.
Members of 2 families can sometimes interact in different ways, using different types of interface (e.g. the different modes of oligomerisation of nucleoside diphosphate kinases).
Symmetric homooligomers for which only one monomer is in the PDB entry are not taken into account, so the number of homomultimeric family interactions may be underestimated.

FUTURE:
51 new interactions between superfamilies that do not have analogs in the PDB: potential targets for structure elucidation and experimental investigation of the interacting polypeptides.
For interactions in which one partner does not have a structural assignment, possible structures can be picked from the set of known family interactions.
Database of domain-domain interfaces.