1
Predicting interactions between genes based on genome sequence comparisons
The “genomic context” component of STRING
Bioinformatics seminar series, 5-10-2004
Berend Snel
2
To do
Seminar (today); please ask questions
Article: “A gene co-expression network for global discovery of conserved genetic modules”
–Make schedule for article discussion (today)
–Read article (next couple of days)
–5 minute discussion per person of the article (preferably Monday 11 October)
3
http://string.embl.de
4
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Gene fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
5
Complete genomes, now what?
Post-genomic era = we have the parts list (complete genomes)
To understand the cell we need to know the functions of the genes
6
For most genes in any genome we need function prediction
E. coli, the most intensively studied organism: only 1924 genes (~43%) have been (partially) experimentally characterized.
7
Predicting protein function
What is function? Various levels of description:
Sequence similarity/homology has the largest relevance for “Molecular Function”. This aspect of protein function is best conserved. Molecular function can often be predicted from similarities between protein sequences (BLAST), or structures.
8
BLAST
9
“Beyond” homology and molecular function
Homology-based function prediction works very well, but …
… a large fraction of genes are poorly described (no homologs, uncharacterized homologs; this holds for ~60% of the human genes)
… there are other aspects of function: functional associations, e.g. the target of a protein kinase or a transcriptional regulator
Thus: predicting these associations
10
Genome sequences allow us to interpret the function of proteins within the context in which they occur.
Reverse this process: predict the function of a protein from the context in which it tends to occur (prediction of protein function/pathways from genome sequences).
Use the genome sequences (through comparative genome analysis) for interaction prediction: genomic context methods
Genomic context methods have been shown to be reliable indicators for functional associations
11
There are many types of functional associations (AKA functional interactions, interactions, functional links, functional relations) in molecular biology
[Figure panels: Transcription regulation; Signalling pathways; Protein complexes; Metabolic pathways; Cellular process]
12
Types of functional associations: metabolic pathways (filling gaps)
13
Types of functional associations: transcription regulation, signalling pathways
14
Types of functional associations: cellular process, protein complexes
15
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Gene fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
16
Genomic context is a tool to predict functional associations between genes
Use the genome sequences (through comparative genome analysis) for interaction prediction: genomic context methods
Genomic context methods have been shown to be reliable indicators for functional interaction
Genomic context is also known as in silico interaction prediction, or genomic associations
17
Genomic context methods detect evolutionary traces in genomes of functionally associated proteins
[Figure labels: trpA, trpB]
19
Three different genomic context methods in STRING
Gene fusion, Rosetta stone method
Conserved gene order between divergent genomes
Co-occurrence of genes across genomes, phylogenetic profiles
20
All genomic context methods use orthologs: corresponding genes between genomes
Orthologs, not just homologs; related by speciation
Orthologs are very likely to have the same function
orthologs : genomes = alignment : sequence
[Figure labels: Gene Duplication, Speciation]
21
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Gene fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
22
Gene fusion
i.e. the orthologs of two genes in another organism are fused into one polypeptide
A very reliable indicator for functional interaction, partly because it is a relatively infrequent evolutionary event: 3470 distinct fusions when surveying 179 genomes
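The fusion idea can be sketched in a few lines. This is only an illustration under stated assumptions: it presumes every protein has already been assigned to one or more orthologous groups, and the function name `fusion_links`, the protein identifiers, and the toy data are hypothetical, not taken from STRING.

```python
from itertools import combinations

def fusion_links(protein_to_groups):
    """Map each pair of orthologous groups to the proteins in which both occur.

    A protein assigned to two groups is treated as a fusion of those groups,
    which predicts a functional link between them.
    """
    links = {}
    for protein, groups in protein_to_groups.items():
        # Every unordered pair of distinct groups inside one protein is a link.
        for g1, g2 in combinations(sorted(set(groups)), 2):
            links.setdefault((g1, g2), []).append(protein)
    return links

# Toy data (hypothetical): two separate genes in genome A, one fused gene in genome B.
example = {
    "genomeA_p1": ["trpA"],
    "genomeA_p2": ["trpB"],
    "genomeB_p7": ["trpA", "trpB"],
}
links = fusion_links(example)
```

Here the single fused protein in genome B is enough to link the two groups, so the separate genes in genome A inherit a predicted association.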
23
Gene fusion: an example
24
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
25
Gene order evolves rapidly But …
26
Differential retention of divergent / convergent gene pairs suggests that conservation implies a functional association
27
Comparison to pathways: conservation implies a functional association
28
Conserved gene order
i.e. genes that are present over ‘sufficiently large’ evolutionary distances in the same gene cluster
Contributes by far the most predictions
29
Conserved gene order
NB1: predicting operons is not trivial; in fact conserved gene order or functional association is a major clue
NB2: using ‘only’ operons without requiring conservation results in much less reliable function prediction
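A minimal sketch of the conservation requirement, assuming each genome is given as an ordered list of (orthologous group, strand) pairs. It simply counts in how many genomes two groups are adjacent on the same strand; it does not weigh the evolutionary distance between those genomes, which the slides say matters. The function name, the `min_genomes` parameter, and the toy genomes are hypothetical.

```python
from collections import Counter

def conserved_neighbours(genomes, min_genomes=2):
    """Return group pairs that are same-strand neighbours in at least min_genomes genomes."""
    counts = Counter()
    for genes in genomes.values():
        seen = set()
        # Walk over consecutive genes along the chromosome.
        for (g1, s1), (g2, s2) in zip(genes, genes[1:]):
            if s1 == s2 and g1 != g2:
                seen.add(frozenset((g1, g2)))
        # Count each pair at most once per genome.
        counts.update(seen)
    return {pair: n for pair, n in counts.items() if n >= min_genomes}

# Toy data (hypothetical): A and B are same-strand neighbours in two of three genomes.
genomes = {
    "sp1": [("A", "+"), ("B", "+"), ("C", "-")],
    "sp2": [("D", "+"), ("B", "-"), ("A", "-")],
    "sp3": [("A", "+"), ("C", "+"), ("B", "+")],
}
conserved = conserved_neighbours(genomes)
```

Pairs seen in only one genome (here A with C, and B with C) are dropped, which mirrors NB2: adjacency without conservation is treated as weak evidence.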
30
Conserved gene order: an example from metabolism of propionyl-CoA
[Figure labels: “query”, “target”]
31
Biochemical assays confirm the function of members of COG0346 as a DL-methylmalonyl-CoA racemase
32
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Gene fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
33
Presence / absence of genes
Gene content co-evolution. (The easy case, few genomes.)
Genomes share genes for phenotypes they have in common
Differences between gene content reflect differences in phenotypic potentialities
34
Presence / absence of genes: L. innocua (non-pathogen), L. monocytogenes (pathogen)
35
Presence / absence of genes: L. innocua (non-pathogenic), L. monocytogenes (pathogenic)
Genes involved in pathogenicity
36
Generalization: phylogenetic profiles / co-occurrence
         species 1  species 2  species 3  species 4  species 5  ...
Gene 1:      1          0          1          1          0       1
Gene 2:      1          1          0          0          1       0
Gene 3:      0          1          0          0          1       0
....
37
… but phylogenetic signal in gene content!
[Figure labels: Escherichia coli, Haemophilus influenzae]
       sp1   sp2   sp3   sp4   …
sp1     1    0.2   0.4   0.2   …
sp2           1    0.9   0.1   …
sp3                 1    0.3   …
sp4                       1    …
…
38
Co-occurrence of genes across genomes
i.e. two genes have the same presence/absence pattern over multiple genomes: they have ‘co-evolved’
AKA phylogenetic profiles
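As a rough illustration, the three example profiles from the phylogenetic profile slide can be compared pairwise. The Hamming distance used here is an assumption chosen for simplicity; the slides do not state which similarity measure STRING applies, and the phylogenetic signal in gene content noted earlier is ignored.

```python
from itertools import combinations

# Presence (1) / absence (0) of each gene across six genomes, as on the profile slide.
profiles = {
    "Gene 1": [1, 0, 1, 1, 0, 1],
    "Gene 2": [1, 1, 0, 0, 1, 0],
    "Gene 3": [0, 1, 0, 0, 1, 0],
}

def hamming(a, b):
    """Number of genomes in which exactly one of the two genes is present."""
    return sum(x != y for x, y in zip(a, b))

# Distance for every unordered pair of genes.
distances = {
    (g1, g2): hamming(profiles[g1], profiles[g2])
    for g1, g2 in combinations(profiles, 2)
}

# The pair with the smallest distance has the most similar profile.
closest = min(distances, key=distances.get)
```

With these profiles, Gene 2 and Gene 3 differ in a single genome, so they would be the candidates for a predicted functional association.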
39
Predicting function of a disease gene protein with unknown function, frataxin, using co-occurrence of genes across genomes
Friedreich’s ataxia
No (homolog with) known function
40
Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly
Species in figure: A. aeolicus, Synechocystis, B. subtilis, M. genitalium, M. tuberculosis, D. radiodurans, R. prowazekii, C. crescentus, M. loti, N. meningitidis, X. fastidiosa, P. aeruginosa, Buchnera, V. cholerae, H. influenzae, P. multocida, E. coli, A. pernix, M. jannaschii, A. thaliana, S. cerevisiae, C. jejuni, C. albicans, S. pombe, H. sapiens, C. elegans, H. pylori, D. melan.
Gene labels in figure: cyaY, Yfh1, hscB, Jac1, hscA, ssq1, Nfu1, iscA, Isa1-2, fdx, Yah1, Arh1, RnaM, IscR, Hyp, iscS, Nfs1, iscU, Isu1-2, Atm1, Atm1
41
Iron-Sulfur (2Fe-2S) cluster in the Rieske protein
42
Prediction: Confirmation:
43
The opposite of co-occurrence: anti-correlation / complementary patterns: predicting analogous enzymes
Genes with complementary phylogenetic profiles tend to have a similar biochemical function.
[Figure labels: A, B]
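One way to make “complementary” concrete is a correlation coefficient between two presence/absence profiles: values near +1 indicate co-occurrence, values near -1 indicate complementary patterns. The sketch below uses the Pearson correlation, which is an assumption and not necessarily the measure used in the original analysis; the profiles are illustrative toy values.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long presence/absence profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A gene present everywhere or nowhere carries no signal.
        raise ValueError("profile has no variation")
    return cov / math.sqrt(var_a * var_b)

# Toy profiles: gene_a and gene_b are never present in the same genome.
gene_a = [1, 0, 1, 1, 0, 1]
gene_b = [0, 1, 0, 0, 1, 0]
gene_c = [1, 1, 0, 0, 1, 0]

r_complementary = pearson(gene_a, gene_b)   # perfectly anti-correlated
r_cooccurring = pearson(gene_c, gene_b)     # positively correlated
```

A strongly negative value only suggests that two genes might replace each other; as the following slides show for thiamin biosynthesis, such a prediction still needs experimental confirmation.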
44
Complementary patterns in thiamin biosynthesis predict analogous enzymes
45
Prediction of analogous enzymes is confirmed
46
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Gene fusion
–Gene order
–Presence / absence of genes across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
47
Benchmark and integration: KEGG maps
48
Integrating genomic context scores into one single score
Compare each individual method against an independent benchmark (KEGG), and find “equivalency”
Multiply the chances that two proteins are not interacting and subtract from 1; naive Bayesian, i.e. assuming independence
[Figure: fraction same KEGG map (0 to 1) plotted against score (0 to 1) for Fusion, Gene Order, Co-occurrence]
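The integration rule on this slide amounts to S = 1 - (1 - S_1)(1 - S_2)…(1 - S_n), where each S_i is a method's benchmarked probability that the two proteins are associated. A minimal sketch, assuming the per-method scores have already been calibrated against KEGG; the function name and the example scores are hypothetical.

```python
import math

def combine_scores(scores):
    """Combine independent association probabilities: 1 minus the product of (1 - s)."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities between 0 and 1")
    # Probability that none of the methods reflects a true association.
    p_no_association = math.prod(1.0 - s for s in scores)
    return 1.0 - p_no_association

# Hypothetical calibrated scores for fusion, gene order and co-occurrence.
combined = combine_scores([0.5, 0.6, 0.2])
```

Because of the independence assumption the combined score is never lower than the best single score, so correlated evidence types would be over-counted.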
49
Benchmark
[Figure: accuracy (fraction of confirmed predictions, i.e. same KEGG map; axis 0.5 to 1.0) against coverage (number of predicted links between orthologous groups; axis 10 to 100000) for Fusion (norm.), Fusion (abs.), Gene Order (norm.), Gene Order (abs.), Cooccurrence, Integrated]
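The two axes of this benchmark can be computed as follows. This is a sketch under assumptions: only links whose two groups both carry a KEGG annotation are counted, a prediction is “confirmed” when the groups share at least one map, and all identifiers below are placeholders rather than real KEGG entries.

```python
def benchmark(predictions, kegg_maps, threshold):
    """Return (accuracy, coverage) for predicted links scoring at least threshold.

    predictions: iterable of (group_a, group_b, score)
    kegg_maps:   dict mapping a group to the set of KEGG maps it belongs to
    """
    # Keep only links above the threshold for which both groups are annotated.
    kept = [
        (a, b)
        for a, b, score in predictions
        if score >= threshold and a in kegg_maps and b in kegg_maps
    ]
    confirmed = sum(1 for a, b in kept if kegg_maps[a] & kegg_maps[b])
    coverage = len(kept)
    accuracy = confirmed / coverage if coverage else float("nan")
    return accuracy, coverage

# Placeholder data (hypothetical groups and map names).
predictions = [("g1", "g2", 0.9), ("g1", "g3", 0.8), ("g2", "g4", 0.4), ("g3", "g4", 0.3)]
kegg_maps = {
    "g1": {"mapA"},
    "g2": {"mapA", "mapB"},
    "g3": {"mapC"},
    "g4": {"mapB"},
}

strict = benchmark(predictions, kegg_maps, threshold=0.85)
lenient = benchmark(predictions, kegg_maps, threshold=0.0)
```

Sweeping the threshold from high to low traces out a curve like the ones in the figure: coverage grows while accuracy tends to fall.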
50
Performance of genomic context compared to high-throughput interaction data
[Figure: accuracy (fraction of data confirmed by reference set) against coverage (fraction of reference set covered by data) for purified complexes (TAP), purified complexes (HMS-PCI), yeast two-hybrid, mRNA co-expression, genomic context, synthetic lethality, and combined evidence (two methods, three methods); legend: raw data, filtered data, parameter choices]
51
Genomic context: biochemistry by other means
Despite the high performance of genomic context methods, as a tool for function prediction it is not a button-press method.
It is more like biochemistry by other means. Often quite a lot of manual input and expert knowledge from the researcher is needed to distill associations into a concrete function prediction.
Small-scale bioinformatics?
52
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Fusion
–Gene order
–Co-occurrence across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
53
STRING allows a network view
e.g. see not only which genes the query gene is associated with, but also how these other genes are related to each other
54
STRING Network output (depth=1)
[Figure labels: Archeal flagellins; Archeal flagellin biosynth. ATPase; uncharacterized archeal proteins]
Assigning to a network around
55
STRING Network (depth=2)
[Figure labels: Archeal flagellins; Chemotaxis-related; Type IV secretion pathway; Archeal flagella components]
Connecting associated cellular processes
56
STRING Network (depth=3)
Zooming out to other cellular processes
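The depth settings on these network slides can be read as the number of association steps away from the query. That reading is an assumption, as is everything named below; the sketch uses a breadth-first search over a toy adjacency list rather than any actual STRING interface.

```python
from collections import deque

def neighbourhood(links, query, depth):
    """Return {node: distance} for all nodes within `depth` steps of the query."""
    distance = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in links.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

# Toy association network (hypothetical), stored as an adjacency list.
links = {
    "q": ["a", "b"],
    "a": ["q", "c"],
    "b": ["q"],
    "c": ["a", "d"],
    "d": ["c"],
}

depth1 = neighbourhood(links, "q", 1)
depth2 = neighbourhood(links, "q", 2)
depth3 = neighbourhood(links, "q", 3)
```

Each increase in depth pulls in the partners of the previous layer, which is how the view grows from the direct partners of the query to neighbouring cellular processes.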
57
Using the local network to detect multi-functional proteins
58
Contents
Predicting functional interactions between proteins
Genomic context methods
–General
–Fusion
–Gene order
–Co-occurrence across genomes
Integration and benchmarking of predictions
Interaction networks
In addition to genomic context: functional genomics data
59
STRING currently in addition includes:
Functional association data from large scale / high-throughput biochemical experiments (functional genomics data)
–protein complex purification
–yeast-2-hybrid
–ChIP-on-chip
–micro-array gene expression
“Known” functional relations, so-called “legacy data”, as present in PubMed abstracts and databases like MIPS or KEGG.