Predicting interactions between genes based on genome sequence comparisons: the “genomic context” component of STRING. Bioinformatics seminar series, 15-11-2005.

Predicting interactions between genes based on genome sequence comparisons: the “genomic context” component of STRING. Bioinformatics seminar series. Berend Snel

Today
Announcement: the seminar of Jakob de Vlieg on 22 November is canceled. Please consult the website of the seminar series for the new date.
Seminar (today); please ask questions!
Handing out article and questions: “Identification of a bacterial regulatory system for ribonucleotide reductases by phylogenetic profiling.” Read the article and hand in the answers to the questions by Monday November 28th.

Contents
Predicting functional interactions between proteins; what & why
Genomic context methods
–General
–Gene fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Biochemistry by other means: BolA
In addition to genomic context: functional genomics data

Complete genomes, now what?
Post-genomic era = we have the parts list (complete genomes)
To understand the cell we need to know the functions of the genes

A bacterial genome
gene
  /gene="dnaA"
  /locus_tag="BCE33L0001"
  /old_locus_tag="BCZK0001"
CDS
  /gene="dnaA"
  /locus_tag="BCE33L0001"
  /old_locus_tag="BCZK0001"
  /inference="non-experimental evidence, no additional details recorded"
  /codon_start=1
  /transl_table=11
  /product="chromosomal replication initiator protein"
  /protein_id="AAU "
  /db_xref="GI: "
  /translation="MENISDLWNSALKELEKKVSKPSYETWLKSTTAHNLKKDVLTIT
  APNEFARDWLESHYSELISETLYDLTGAKLAIRFIIPQSQAEEEIDLPPAKPNAAQDD
  SNHLPQSMLNPKYTFDTFVIGSGNRFAHAASLAVAEAPAKAYNPLFIYGGVGLGKTHL
  MHAIGHYVIEHNPNAKVVYLSSEKFTNEFINSIRDNKAVDFRNKYRNVDVLLIDDIQF
  LAGKEQTQEEFFHTFNALHEESKQIVISSDRPPKEIPTLEDRLRSRFEWGLITDITPP
  DLETRIAILRKKAKAEGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLINKDIN
  ADLAAEALKDIIPNSKPKIISIYDIQKAVGDVYQVKLEDFKAKKRTKSVAFPRQIAMY
  LSRELTDSSLPKIGEEFGGRDHTTVIHAHEKISKLLKTDTQLQKQVEEINDILK"
gene
  /gene="dnaN"
  /locus_tag="BCE33L0002"
  /old_locus_tag="BCZK0002"
CDS
  /gene="dnaN"
  /locus_tag="BCE33L0002"
  /old_locus_tag="BCZK0002"
  /EC_number=" "
  /inference="non-experimental evidence, no additional details recorded"
  /codon_start=1
  /transl_table=11
  /product="DNA polymerase III, beta subunit"
  /protein_id="AAU "
  /db_xref="GI: "
  /translation="MRFTIQKDYLVRSVQDVMKAVSSRTTIPILTGIKVVATEEGVTL
  TGSDADISIESFIPVEEDGKEIVEVKQSGSIVLQAKYFSEIVKKLPKETVEISVENHL
  MTKITSGKSEFNLNGLDSAEYPLLPQIEEHHVFKIPTDLLKHMIRQTVFAVSTSETRP
  ILTGVNWKVYNSELTCIATDSHRLALRKAKIEGIADEFQANVVIPGKSLNELSKILDE
  SEEMVDIVITEYQVLFRTKHLLFFSRLLEGNYPDTTRLIPAESKTDIFVNTKEFLQAI
  DRASLLARDGRNNVVKLSTLEQAMLEISSNSPEIGKVVEEVQCEKVDGEELKISFSAK
  YMMDALKALDSTEIKISFTGAMRPFLIRTVNDESIIQLILPVRTY"

For most genes in any genome we need function prediction
E. coli, the most intensively studied organism: only 1924 genes (~43%) have been (partially) experimentally characterized.

Predicting protein function
What is function? There are various levels of description.
Sequence similarity/homology has the largest relevance for “Molecular Function”. This aspect of protein function is best conserved.
Molecular function can often be predicted from similarities between protein sequences (BLAST), or structures.

Homology: BLAST and/or SMART/PFAM/CDD
gi| | Mayven [Homo sapiens]
gi| | Klhl2 protein [Mus musculus]
gi| | hypothetical protein [Pongo pygmaeus]
gi| | Klhl3 [Homo sapiens]
gi| | Klhl3 protein [Mus musculus]
gi| | Ring canal kelch protein [Drosophila melanogaster]

“Beyond” homology and molecular function
Homology based function prediction works very well, yet:
A large fraction of genes are poorly described (no homologs, uncharacterized homologs; this holds for ~60% of the human genes)
There are other aspects of function: functional associations, e.g. the target of a protein kinase or a transcriptional regulator, i.e. to understand the cell we need to know the interactions of the genes
Thus: predicting associations

There are many types of functional associations (AKA functional interactions, interactions, functional links, functional relations) in molecular biology
[Figure panels: transcription regulation, signalling pathways, protein complexes, metabolic pathways, cellular process]

Types of functional associations: metabolic pathways (filling gaps)

Types of functional associations: transcription regulation, signalling pathways

Types of functional associations: cellular process (“DNA repair”, “Apoptosis”), protein complexes

So how can knowledge of the functional associations help?
If we did not know anything about the function of the protein, we can now say in which process it is involved
If we already knew something about the function, we might now know much more about the function (e.g. if we knew it was a hydrolase, we might now know in which metabolic pathway it is active)
If the gene was already well characterized, we might understand its role better (e.g. new targets for a kinase)

Contents
Predicting functional interactions between proteins
Genomic context methods
–General (how do we predict functional interactions)
–Gene fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Biochemistry by other means: BolA
In addition to genomic context: functional genomics data

How can we now predict / detect functional associations?
Functional genomics / high throughput experiments
GENOMIC CONTEXT

Functionally associated proteins leave evolutionary traces of their relation in genomes

We can thus detect evolutionary traces of a functional association by comparing genomes

Use the genome sequences themselves (through comparative genome analysis) for interaction prediction: genomic context methods
Genomic context is a tool to predict functional associations between genes
Genomic context methods have been shown to be reliable indicators for functional interaction
Genomic context is also known as in silico interaction prediction, or genomic associations

Three different genomic context methods in STRING
Gene fusion, Rosetta stone method
Conserved gene order between divergent genomes
Co-occurrence of genes across genomes, phylogenetic profiles

Gene fusion
i.e. the orthologs of two genes in another organism are fused into one polypeptide
A very reliable indicator for functional interaction, partly because it is a relatively infrequent evolutionary event: 3470 distinct fusions when surveying 179 genomes
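
A minimal sketch of how the gene fusion (Rosetta stone) signal could be detected. It is not the STRING implementation. It assumes that homology hits of the query genes against proteins of another genome are already available and that orthology has been established upstream; the `Hit` record, the function name `fusion_links` and the `max_overlap` tolerance are hypothetical names chosen for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One homology hit of a query gene on a protein from another genome."""
    query_gene: str  # gene in the genome of interest
    target: str      # protein in the other genome
    start: int       # first aligned residue on the target (1-based)
    end: int         # last aligned residue on the target (inclusive)


def fusion_links(hits, max_overlap=10):
    """Return (gene_1, gene_2, target) triples where two different query
    genes align to (nearly) non-overlapping regions of the same target.

    max_overlap is an assumed tolerance, in residues, for overlapping
    alignment ends.
    """
    by_target = {}
    for hit in hits:
        by_target.setdefault(hit.target, []).append(hit)

    links = set()
    for target, target_hits in by_target.items():
        for a, b in combinations(target_hits, 2):
            if a.query_gene == b.query_gene:
                continue
            # Length of the shared region on the target; negative when
            # the two alignments are separated by a gap.
            overlap = min(a.end, b.end) - max(a.start, b.start) + 1
            if overlap <= max_overlap:
                gene_1, gene_2 = sorted((a.query_gene, b.query_gene))
                links.add((gene_1, gene_2, target))
    return links
```

Two query genes that cover the same region of the target are treated as homologs of each other rather than as fusion partners, which is why the overlap test is needed.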

Gene fusion: an example

Gene order evolves rapidly But …

Differential retention of divergent / convergent gene pairs suggests that conservation implies a functional association “Operons”

Comparison to pathways: conservation implies a functional association

Conserved gene order
i.e. genes that are present over ‘sufficiently large’ evolutionary distances in the same gene cluster
Contributes by far the most predictions

Conserved gene order
NB1: predicting operons is not trivial; in fact conserved gene order or functional association is a major clue
NB2: using ‘only’ operons without requiring conservation results in much less reliable function prediction
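
A minimal sketch of the conserved gene order idea, under simplifying assumptions: each genome is given as an ordered list of (orthologous group, strand) tuples, only directly adjacent genes on the same strand are considered, circular chromosomes are ignored, and "sufficiently large evolutionary distance" is approximated by simply counting genomes. The function name `conserved_neighbours` and the `min_genomes` cutoff are hypothetical.

```python
from collections import Counter


def conserved_neighbours(genomes, min_genomes=2):
    """Count in how many genomes two orthologous groups are direct
    neighbours on the same strand.

    genomes maps a genome name to its genes in chromosomal order, each
    gene given as (orthologous_group, strand).
    Returns {frozenset({og_a, og_b}): number_of_genomes} for pairs seen
    in at least min_genomes genomes.
    """
    counts = Counter()
    for genes in genomes.values():
        seen = set()  # count each pair at most once per genome
        for (og_a, strand_a), (og_b, strand_b) in zip(genes, genes[1:]):
            if strand_a == strand_b and og_a != og_b:
                seen.add(frozenset((og_a, og_b)))
        counts.update(seen)
    return {pair: n for pair, n in counts.items() if n >= min_genomes}
```

Requiring the pair in several genomes is what separates this from plain operon prediction in a single genome, in line with NB2 above. A real analysis would also have to discount closely related genomes, where gene order has not yet had time to diverge.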

Conserved gene order: an example from metabolism of propionyl-CoA
[Figure labels: “query”, “target”]

Biochemical assays confirm the function of members of COG0346 as a DL-methylmalonyl-CoA racemase

Presence / absence of genes
Gene content, co-evolution. (The easy case, few genomes.)
Genomes share genes for phenotypes they have in common
Differences between gene content reflect differences in phenotypic potentialities

Presence / absence of genes
L. innocua (non-pathogenic), L. monocytogenes (pathogenic)
Genes involved in pathogenicity

Generalization: phylogenetic profiles / co-occurrence
[Table: presence / absence of genes (Gene 1, Gene 2, Gene 3, ...) across species (species 1, species 2, species 3, species 4, ...)]

Co-occurrence of genes across genomes
i.e. two genes have the same presence / absence pattern over multiple genomes: they have ‘co-evolved’
AKA phylogenetic profiles
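
A minimal sketch of comparing phylogenetic profiles. It assumes each gene is represented by a tuple of 0/1 values, one per genome and in the same genome order, and it uses the Hamming distance as the similarity measure, which is one simple choice among several and not necessarily the scoring STRING uses. The names `hamming`, `co_occurring_pairs` and `max_distance` are hypothetical.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def co_occurring_pairs(profiles, max_distance=0):
    """Return (gene_1, gene_2, distance) for gene pairs whose profiles
    differ in at most max_distance genomes.

    Genes present in all genomes or in none are skipped, because their
    profiles match each other trivially and carry no signal.
    """
    informative = {
        gene: profile
        for gene, profile in profiles.items()
        if 0 < sum(profile) < len(profile)
    }
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(informative.items()), 2):
        distance = hamming(p1, p2)
        if distance <= max_distance:
            pairs.append((g1, g2, distance))
    return pairs
```

With only a handful of genomes, identical profiles arise easily by chance, so the number and diversity of genomes compared matters for how much weight a match deserves.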

Predicting function of a disease gene protein with unknown function, frataxin, using co-occurrence of genes across genomes
Friedreich’s ataxia
No (homolog with) known function

Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly
[Figure: phylogenetic profiles. Species: A. aeolicus, Synechocystis, B. subtilis, M. genitalium, M. tuberculosis, D. radiodurans, R. prowazekii, C. crescentus, M. loti, N. meningitidis, X. fastidiosa, P. aeruginosa, Buchnera, V. cholerae, H. influenzae, P. multocida, E. coli, A. pernix, M. jannaschii, A. thaliana, S. cerevisiae, C. jejuni, C. albicans, S. pombe, H. sapiens, C. elegans, H. pylori, D. melan. Genes: cyaY, Yfh1, hscB, Jac1, hscA, ssq1, Nfu1, iscA, Isa1-2, fdx, Yah1, Arh1, RnaM, IscR, Hyp, iscS, Nfs1, iscU, Isu1-2, Atm1]

Iron-Sulfur (2Fe-2S) cluster in the Rieske protein

Prediction: Confirmation:

The opposite of co-occurrence: anti-correlation / complementary patterns: predicting analogous enzymes
Genes with complementary phylogenetic profiles tend to have a similar biochemical function.
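
A minimal sketch of scoring complementary patterns, assuming the same 0/1 profile representation as for co-occurrence (one value per genome, same genome order). The function name `complementarity` is hypothetical and the score is an illustrative choice, not a published measure.

```python
def complementarity(p, q):
    """Fraction of genomes in which exactly one of the two genes is present.

    1.0 means perfectly complementary profiles (candidate analogous
    enzymes); 0.0 means identical profiles.
    """
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    return sum(a != b for a, b in zip(p, q)) / len(p)
```

A high score alone is weak evidence: the signal becomes meaningful mainly when the rest of the pathway is present in the genomes concerned, so that one of the two genes is needed to fill the same gap.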

Complementary patterns in thiamin biosynthesis predict analogous enzymes
Morett E, Korbel JO, Rajan E, Saab-Rincon G, Olvera L, Olvera M, Schmidt S, Snel B, Bork P. Nature Biotech 2003

Prediction of analogous enzymes is confirmed

Benchmark and integration: KEGG maps

Integrating genomic context scores into one single score
Compare each individual method against an independent benchmark (KEGG), and find “equivalency”
Multiply the chances that two proteins are not interacting and subtract from 1; naive Bayesian, i.e. assuming independence
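
The integration step described above can be sketched as follows. The sketch assumes every per-method score has already been calibrated against the benchmark into a probability between 0 and 1 that the two proteins are functionally associated; the function name `combine_scores` is hypothetical.

```python
from math import prod


def combine_scores(scores):
    """Combine per-method probabilities of association into one score.

    Implements 1 - product(1 - s_i): the probability that at least one
    line of evidence is right, assuming the methods are independent.
    """
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
    return 1.0 - prod(1.0 - s for s in scores)
```

For example, two methods that each give 0.5 combine to 1 - 0.5 × 0.5 = 0.75. Because independence is assumed, correlated evidence (for instance fusion and gene order both reflecting the same operon) will tend to be overcounted.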

Benchmark
[Figure: accuracy (fraction of confirmed predictions, i.e. same KEGG map) and coverage (number of predicted links between orthologous groups) for Fusion (norm.), Fusion (abs.), Gene Order (norm.), Gene Order (abs.), Cooccurrence and Integrated]
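
A minimal sketch of how the two benchmark quantities could be computed. It assumes predicted links are pairs of orthologous-group identifiers and that `kegg_maps` maps each identifier to the set of KEGG maps it is annotated to; both names, and the decision to evaluate only pairs in which both members are annotated, are assumptions made for illustration.

```python
def benchmark(predicted_links, kegg_maps):
    """Return (accuracy, coverage) for a list of predicted links.

    accuracy: fraction of evaluated links whose two members share at
              least one KEGG map; links with an unannotated member are
              not evaluated (an assumption of this sketch).
    coverage: number of predicted links.
    """
    evaluated = 0
    confirmed = 0
    for a, b in predicted_links:
        maps_a = kegg_maps.get(a)
        maps_b = kegg_maps.get(b)
        if not maps_a or not maps_b:
            continue  # cannot be confirmed or refuted
        evaluated += 1
        if maps_a & maps_b:
            confirmed += 1
    accuracy = confirmed / evaluated if evaluated else float("nan")
    return accuracy, len(predicted_links)
```

Sweeping a score cutoff and recomputing both numbers at each cutoff would give one accuracy/coverage curve per method.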

Performance of genomic context compared to high-throughput interaction data
[Figure: accuracy (fraction of data confirmed by reference set) and coverage (fraction of reference set covered by data). Data sets: purified complexes (TAP, HMS-PCI), yeast two-hybrid, mRNA co-expression, genomic context, synthetic lethality, combined evidence (two methods, three methods). Legend: raw data, filtered data, parameter choices]

Data-mining proteins for protein function prediction: BolA

An interaction of BolA with a mono-thiol glutaredoxin ? (STRING) BolA

BolA and Grx occur as neighbors in a number of genomes

BolA and Grx have an (almost) identical phylogenetic distribution
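How close two phylogenetic distributions are can be expressed as a number by comparing presence/absence profiles across genomes. The sketch below uses the Jaccard index as a simple stand-in; it is not necessarily the measure STRING applies, and the profiles are invented for illustration rather than real BolA or Grx data.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles.

    Each profile holds one entry per genome (truthy = gene present),
    with genomes in the same order in both profiles.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    both = sum(1 for a, b in pairs if a and b)
    either = sum(1 for a, b in pairs if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six genomes (1 = present, 0 = absent).
gene_x = [1, 1, 0, 1, 0, 1]
gene_y = [1, 1, 0, 1, 0, 0]
print(profile_similarity(gene_x, gene_y))  # 3 shared / 4 with either gene
```

A value near 1 corresponds to an (almost) identical distribution. Genomes lacking both genes are ignored here, so widespread absence does not inflate the similarity.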

BolA and Grx have been shown to interact in Y2H in S. cerevisiae and D. melanogaster, and in Flag-tag experiments in S. cerevisiae. [Figure: BolA phylogeny]

BolA does have (predicted) interactions with cell-division / cell-wall proteins. Those appear secondary to the link with Grx. STRING has obtained a higher resolution in function prediction than phenotypic analyses. [Figure labels: cell division / cell wall; (oxidative) stress]

BolA is homologous to the peroxide reductase OsmC, suggesting a similar function

OsmC uses the thiol groups of two evolutionarily conserved cysteines to reduce substrates. Problem: the BolA family does not have conserved cysteines. …It would have to obtain its reducing equivalents from elsewhere… [Figure: BolA family alignment]

BolA is (homologous to) a reductase. BolA interacts with Grx? Grx provides BolA with reducing equivalents!? (or “scaffolding”?) Prediction of interaction partner and molecular function complement each other.

Genomic context: biochemistry by other means
Despite the high performance of genomic context methods, using them as a tool for function prediction is not a push-button procedure. It is more like biochemistry by other means: often quite a lot of manual input and expert knowledge from the researcher is needed to distill associations into a concrete function prediction.
Small-scale bioinformatics?

Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Fusion
–Gene order
–Co-occurrence across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data

STRING currently in addition includes:
Functional association data from large-scale / high-throughput biochemical experiments (functional genomics data)
–protein complex purification
–yeast-2-hybrid
–ChIP-on-chip
–micro-array gene expression
“Known” functional relations, so-called “legacy data”, as present in PubMed abstracts and databases like MIPS or KEGG.

Handing out article and questions: “Identification of a bacterial regulatory system for ribonucleotide reductases by phylogenetic profiling.” Read the article and hand in the answers to the questions by Monday, November 28th.