1
Orthology Analysis Erik Sonnhammer Center for Genomics and Bioinformatics Karolinska Institutet, Stockholm
2
Outline
–Basic concepts
–BLAST-based approaches to orthology
–Tree-based approaches to orthology
–Domain-level orthology
3
Homologs = genes with a common origin
–May be genes in the same or in different organisms
–Does not say that function is identical
–Can only be true or false, and not a percentage!
–Homologs have the same 3D-structure layout
4
Homologs Orthologs Paralogs
5
Orthologs: separated by speciation
[Figure: gene tree along a time axis. Node labels: Gene X in ancient animal; Gene X in ancient mammal and Gene Y in ancient mammal; Gene X in human, Gene X in rat, Gene Y in rat, Gene Y1 in human, Gene Y2 in human. Events are marked D (duplication) and S (speciation). Relationships marked: orthologs, in-paralogs, out-paralogs.]
6
In/Out-paralog definition
–In-paralogs ~ co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to a cluster in the other species
–Out-paralogs = not co-orthologs: paralogs that were duplicated before the speciation. Not necessarily in the same species.
Sonnhammer & Koonin, Trends Genet. 18:619-620 (2002)
7
Orthologs for functional genomics
–Co-orthologs / inparalogs are more likely than outparalogs to have identical biochemical functions and biological roles.
–Co-orthologs can be used to discover human gene function via model organism experiments.
–Co-orthologs are key to exploiting functional genomics/proteomics data in model organisms.
8
Orthology and function conservation Orthology does not say anything about evolutionary distance. Close orthologs, e.g. human-mouse are very likely to have the same biological role in the organism. Distant orthologs, e.g. human-worm are less likely to have the same phenotypical role, but may have the same role in the corresponding pathway.
9
Ortholog Databases
Sequence database | Orthology detection method | Ortholog database
SwTrembl proteomes | Inparanoid (blast) | Inparanoid
proteomes | COGs (blast) | COGs / KOGs
TIGR gene index | COGs (blast) | TOGA/EGO
proteomes | OrthoMCL (blast) | OrthoMCL
Pfam | Orthostrapper (tree) | HOPS
Pfam | RIO (tree) |
10
How to find orthologs?
1. Calculate a phylogenetic tree and look for orthologs in the tree (Orthostrapper, RIO).
2. Use two-way best matches between two species to find orthologs without trees. [However, in-paralogs are harder to find this way]
11
Two-way best match approach to finding orthologs
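The two-way best match idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score dictionaries stand in for parsed BLAST results, and the gene names (`hsA`, `ceA`, ...) and function names are hypothetical, not taken from any of the tools named on the slides.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): score}; here it is toy data
    standing in for parsed similarity-search output (an assumption).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def two_way_best_matches(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores for species 1 (hs*) against species 2 (ce*) and back.
a_vs_b = {("hsA", "ceA"): 250, ("hsA", "ceB"): 90,
          ("hsB", "ceB"): 180, ("hsC", "ceA"): 120}
b_vs_a = {("ceA", "hsA"): 245, ("ceA", "hsC"): 118,
          ("ceB", "hsB"): 175, ("ceB", "hsA"): 88}

pairs = two_way_best_matches(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data `hsC` also hits `ceA` best, but `ceA` prefers `hsA`, so `hsC` is left out. That is the situation in which in-paralogs are missed by a plain two-way best match.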
12
COGs
[Figure: COG2813, with orthologs and out-paralogs marked]
13
Inpara-n-oid Inparalog ‘n ortholog identification Blue = species 1 Red = species 2
14
Inparanoid Blue = species 1 Red = species 2
15
Resolve overlapping clusters
–No overlap: no problems
–Partial overlap: separate
–Complete overlap: merge
16
Inparalog score
Score for inparalog P = (scoreAP - scoreAB) / (scoreAA - scoreAB)
[Figure: scale from 0 to 100% (ticks at 0, 20, 40, 60, 80, 100) with A, P and B marked]
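The inparalog score formula translates directly into code. The sketch below assumes that A is the main ortholog in the same species as P and B is the main ortholog in the other species, which is how the slide's figure reads; the function name and the example scores are made up for illustration.

```python
def inparalog_score(score_ap, score_ab, score_aa):
    """Inparalog score for candidate P, as given on the slide.

    score_ap: similarity score between A and the candidate inparalog P
    score_ab: similarity score between A and B
    score_aa: self-score of A
    Returns 1.0 when P scores like A itself and 0.0 when P scores no
    better against A than B does.
    """
    denominator = score_aa - score_ab
    if denominator <= 0:
        raise ValueError("self-score of A must exceed the A-B score")
    return (score_ap - score_ab) / denominator


# Hypothetical bit scores: P sits halfway between B and A.
halfway = inparalog_score(score_ap=400, score_ab=200, score_aa=600)
print(f"{halfway:.0%}")
```

Multiplying by 100 gives the percentage scale shown on the slide.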
17
Confidence values for main orthologs from sampling
Original alignment:
TVHIVDDEEPVR---KSLAFM---LTMNGFA
T+ ++DD +R K L M +T+ G A
TILLIDDHPMLRTGVKQLISMAPDITVVGEA
Sampling with replacement; insertions kept intact
“Bootstrap alignment” -> “bootstrap score”:
GAFDEP---LVTHVR..........
GA + ++T +R
GAEEHMAPDILTLLR..........
Confidence = (number of bootstrap alignments that are best-best matches) / (number of bootstraps)
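A rough sketch of this sampling idea follows. It makes several simplifying assumptions that are not on the slide: each alignment is reduced to a list of per-column scores, the pair under test is compared against a single rival alignment, and the slide's rule of keeping insertions intact is not modelled. All names are hypothetical.

```python
import random


def bootstrap_score(column_scores, rng):
    """Score of one bootstrap alignment: columns drawn with replacement."""
    n = len(column_scores)
    return sum(rng.choice(column_scores) for _ in range(n))


def bootstrap_confidence(pair_columns, rival_columns, n_boot=1000, seed=0):
    """Fraction of bootstraps in which the tested pair still scores best.

    pair_columns / rival_columns: per-column scores of the alignment
    between the query and its best match, and between the query and a
    rival match (toy stand-ins for real alignment scores).
    """
    rng = random.Random(seed)
    wins = sum(
        bootstrap_score(pair_columns, rng) > bootstrap_score(rival_columns, rng)
        for _ in range(n_boot)
    )
    return wins / n_boot


# Toy column scores: the tested pair is only slightly better on average.
confidence = bootstrap_confidence([4, 5, -1, 6, 2, 3], [3, 5, 0, 4, 2, 2])
print(confidence)
```

A pair whose advantage rests on only a few columns loses it in many resampled alignments and so gets a low confidence value.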
18
http://inparanoid.cgb.ki.se
19
inparanoid.cgb.ki.se Remm et al., J. Mol. Biol. 314:1041-1052 (2001) Homo sapiens vs. C. elegans
20
Ortholog group sizes, human vs X
21
Number of inparalogs per ortholog group
Species | Avg. inparalogs in model organism ortholog groups | Avg. inparalogs in human ortholog groups
Mouse | 1.36 | 1.56
Fly | 1.77 | 2.75
Worm | 1.44 | 3.13
Mustard weed | 3.73 | 3.33
Yeast | 1.26 | 3.34
E. coli | 1.73 | 3.57
22
Drawbacks of BLAST-based orthology assignment
–No guarantee that the same segment is used in different sequences
–No evolutionary distance model
–Does not take multiple domains into account
23
Domain orthology
Inparanoid Human-Fly ortholog pairs with domains in Pfam-A 13.0: 20335
Different domain architectures: 5411
–Many of these are minor differences, e.g. 22 vs 21 Spectrin repeats
–Sometimes the difference is big:
[Figure: two domain architectures, ef-hand + UCH versus TBC + UCH]
24
Tree-based approaches
25
Distance-based tree building
Alignment:
A1 MKFYSLPNFPEN
A2 MKYYKLPDLPDE
A3 MRFYTACENPRS
Distance matrix:
   | A2 | A3
A1 | 4 | 8
A2 |   | 10
[Figure: tree of A1, A2 and A3 with branch lengths 1, 3, 5 and 2]
Bootstrapping:
–randomly pick columns to bootstrap alignment, calculate tree
–Repeat 1000 times, frequency of node = bootstrap support
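For three sequences the step from distance matrix to branch lengths can be worked out by hand. The sketch below applies the standard three-point formulas to the slide's distances, read here as A1-A2 = 4, A1-A3 = 8 and A2-A3 = 10; the function name is hypothetical, and how the slide's figure splits the A3 branch (5 and 2) depends on where the root is drawn, which is an interpretation.

```python
def three_taxon_branches(d12, d13, d23):
    """Branch lengths of the unrooted three-leaf tree for additive distances.

    Each leaf hangs off the single internal node; leaf 1's branch is
    (d12 + d13 - d23) / 2, and the other two follow by symmetry.
    """
    b1 = (d12 + d13 - d23) / 2
    b2 = (d12 + d23 - d13) / 2
    b3 = (d13 + d23 - d12) / 2
    return b1, b2, b3


# Distances as read from the slide's matrix.
b_a1, b_a2, b_a3 = three_taxon_branches(d12=4, d13=8, d23=10)
print(b_a1, b_a2, b_a3)

# Sanity check: path lengths through the internal node give back the matrix.
assert b_a1 + b_a2 == 4 and b_a1 + b_a3 == 8 and b_a2 + b_a3 == 10
```

The branches to A1 and A2 come out as 1 and 3, matching the figure, and the branch to A3 is 7, consistent with the figure's 5 + 2 if the root sits on that branch.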
26
Orthology by tree reconciliation Species tree Gene tree Infer 2 duplications and 2 losses
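One common way to reconcile a gene tree with a species tree is to map every gene-tree node to the smallest species clade that contains all species below it, and to call the node a duplication when a child maps to the same clade. The sketch below illustrates that rule; it is not the implementation of any tool named in these slides, it does not count losses, and the tree encoding (nested tuples) and gene names (modelled on the Gene X / Gene Y example) are assumptions.

```python
def clades(tree):
    """List the leaf set of every node in a nested-tuple tree (node itself last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        leaves |= sub[-1]  # the child's own leaf set is last in its list
    found.append(leaves)
    return found


def lca(species_clades, species):
    """Smallest species-tree clade that contains all the given species."""
    return min((c for c in species_clades if species <= c), key=len)


def reconcile(node, gene_species, species_clades, events):
    """Map a gene-tree node to a species clade and record its event type."""
    if isinstance(node, str):
        return frozenset([gene_species[node]])
    child_maps = [reconcile(c, gene_species, species_clades, events) for c in node]
    here = lca(species_clades, frozenset().union(*child_maps))
    # A child mapping to the same clade means both copies existed there already.
    kind = "duplication" if here in child_maps else "speciation"
    events.append((kind, sorted(here)))
    return here


species_tree = ("human", "rat")
gene_tree = (("X_human", "X_rat"), (("Y1_human", "Y2_human"), "Y_rat"))
genes = ("X_human", "X_rat", "Y1_human", "Y2_human", "Y_rat")
gene_species = {g: g.split("_")[1] for g in genes}

events = []
reconcile(gene_tree, gene_species, clades(species_tree), events)
n_dup = sum(kind == "duplication" for kind, _ in events)
n_spec = sum(kind == "speciation" for kind, _ in events)
print(n_dup, n_spec)
```

On this toy gene tree the root (X versus Y) and the human Y1/Y2 split are inferred as duplications, and the two human-rat splits as speciations.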
27
Drawbacks of tree reconciliation for orthology assignment
–Assumes that the species tree is fully known
–Does not give confidence values
–Gene trees become unreliable when they involve many sequences (more data -> less certainty)
–Computationally expensive
28
Partial tree reconciliation Find pairwise orthologs by computer parsing of tree.
29
Pairwise orthology confidence by ‘orthostrapping’
The original tree with bootstrap support values
[Figure: tree of C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168; node support values 99, 45, 85, 100, 82, 99]
30
Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree of C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168]
Worm \ Fly | AAF52138.1 | AAF49194.1
C14F5.4 | 0 | 1
T04F8.1 | 1 | 0
C47D12.3 | 0 | 0
Y6E2A.9 | 0 | 0
F37H8.4 | 0 | 0
AH6.2 | 0 | 0
31
Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree of C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168]
Worm \ Fly | AAF52138.1 | AAF49194.1
C14F5.4 | 0 | 2
T04F8.1 | 2 | 0
C47D12.3 | 1 | 0
Y6E2A.9 | 0 | 0
F37H8.4 | 0 | 0
AH6.2 | 0 | 0
32
Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree of C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168; node support values 99, 45, 85, 100, 82, 99]
Worm \ Fly | AAF52138.1 | AAF49194.1
C14F5.4 | 0 | 99
T04F8.1 | 98 | 0
C47D12.3 | 81 | 0
Y6E2A.9 | 77 | 0
F37H8.4 | 77 | 0
AH6.2 | 77 | 0
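The counting behind these tables can be sketched as follows: every bootstrap tree yields a set of pairwise orthology calls, and a pair's support is the percentage of trees in which it was called. The function name is hypothetical, the sequence IDs are borrowed from the slides, and the four sets of calls are invented toy data (the first two happen to mirror the running counts shown earlier), not output of Orthostrapper.

```python
from collections import Counter


def orthostrap_support(calls_per_tree):
    """Percentage of bootstrap trees in which each pair was called orthologous."""
    n_trees = len(calls_per_tree)
    counts = Counter(pair for calls in calls_per_tree for pair in set(calls))
    return {pair: 100.0 * n / n_trees for pair, n in counts.items()}


# Toy orthology calls (worm gene, fly gene) from four bootstrap trees.
trees = [
    {("C14F5.4", "AAF49194.1"), ("T04F8.1", "AAF52138.1")},
    {("C14F5.4", "AAF49194.1"), ("T04F8.1", "AAF52138.1"),
     ("C47D12.3", "AAF52138.1")},
    {("C14F5.4", "AAF49194.1"), ("T04F8.1", "AAF52138.1")},
    {("C14F5.4", "AAF49194.1")},
]

support = orthostrap_support(trees)
# Keep only pairs above a support cutoff (75 is used here as an example).
confident = sorted(pair for pair, pct in support.items() if pct > 75)
print(confident)
```

Because support is computed per pair, several worm genes can be confident orthologs of the same fly gene, as in the table's AAF52138.1 column.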
33
orthostrapper.cgb.ki.se
34
Orthology is not transitive! Multiple species at different distances may give erroneous groups that include out-paralogs
35
Orthology is not transitive!
-> Orthology strictly defined for only 2 species/clades
–Combining species of different distances is very dangerous
–But OK to combine multiple equidistant ones
[Figure: example trees with leaves Y, H1, H2, D1 and D2]
36
Domain-level orthology
37
HOPS - Hierarchy of Orthologs and Paralogs
1. All species in Pfam are bundled in groups according to scheme:
[Figure: hierarchy of groups with the labels eukaryota; metazoa, viridiplantae, fungi; nematoda, arthropoda, chordata]
2. Apply Orthostrapper to groups at same level in Pfam families
3. Display results in NIFAS
38
Pfam
39
Pfam in brief:
[Figure: Pfam workflow. Labels: SEED alignment (representative members), Description file, Profile-HMM (HMMer-2.0), Search database, FULL alignment; parts are marked as manually curated or automatically made]
Release 13.0 (April 2004):
–7426 Pfam-A domain families
–Based on 1160000 sequences (Swissprot & Trembl)
–21980 unique Pfam-A domain architectures
–73% of all proteins have >=1 Pfam-A domain
40
HOPS results
Pfam 10, 6190 families:
–2450 families (40%) have HOPS orthologs
–1319 families (21%) have HOPS orthologs in all 6 pairwise comparisons
–286356 pairwise orthology assignments (> 75% orthostrap)
Storm and Sonnhammer, Genome Research 13:2353-2362 (2003)
41
Ways to access HOPS
–NIFAS graphical browser
–By sequence ID at Pfam.cgb.ki.se/HOPS
–Flatfiles (Orthostrap tables of 2 clades)
42
Pfam.cgb.ki.se/HOPS
45
Evolution of Domain Architectures NIFAS:
46
ATP sulfurylase /APS kinase
47
Orthologous shuffled domains? ATP sulfurylase domain, metazoa vs fungi
48
APS kinase domain
49
HOPS orthologs of PPS1_HUMAN (ATP sulfurylase/APS kinase)
50
Summary of ATP sulfurylases/APS kinases: Shuffled non-orthologous domains Fungi Metazoa
51
Conclusions
–Orthologs can be detected by BLAST (fast) or by tree (slow but less error-prone)
–Species at different evolutionary distances should not be combined in orthology analysis
–Inparanoid and Orthostrapper were designed to find inparalogs but not outparalogs
–HOPS/NIFAS can be used to find domain orthologs and analyze domain architecture evolution
52
Future perspectives Multiparanoid – multiple species merging of pairwise Inparalogs. Functional divergence among inparalogs
53
Acknowledgments –Christian Storm –Maido Remm –Andrey Alexeyenko –Volker Hollich –Mats Jonsson http://sonnhammer.cgb.ki.se