Orthology Analysis Erik Sonnhammer Center for Genomics and Bioinformatics Karolinska Institutet, Stockholm.

1 Orthology Analysis Erik Sonnhammer Center for Genomics and Bioinformatics Karolinska Institutet, Stockholm

2 Outline
– Basic concepts
– BLAST-based approaches to orthology
– Tree-based approaches to orthology
– Domain-level orthology

3 Homologs = genes with a common origin
– May be genes in the same or in different organisms
– Does not say that function is identical
– Can only be true or false, and not a percentage!
– Homologs have the same 3D-structure layout

4 Homologs Orthologs Paralogs

5 [Figure: gene tree along a time axis. Node labels: Gene X in ancient animal; Gene X in ancient mammal; Gene Y in ancient mammal; Gene X in human; Gene X in rat; Gene Y in rat; Gene Y1 in human; Gene Y2 in human. Events: duplication (D), speciation (S). Relationship labels: orthologs (separated by speciation), in-paralogs, out-paralogs.]

6 In/Out-paralog definition
In-paralogs ~ co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to a cluster in the other species.
Out-paralogs = not co-orthologs: paralogs that were duplicated before the speciation. Not necessarily in the same species.
Sonnhammer & Koonin, Trends Genet. 18:619-620 (2002)

7 Orthologs for functional genomics
– Co-orthologs / inparalogs are more likely than outparalogs to have identical biochemical functions and biological roles.
– Co-orthologs can be used to discover human gene function via model organism experiments.
– Co-orthologs are key to exploiting functional genomics/proteomics data in model organisms.

8 Orthology and function conservation
– Orthology does not say anything about evolutionary distance.
– Close orthologs, e.g. human-mouse, are very likely to have the same biological role in the organism.
– Distant orthologs, e.g. human-worm, are less likely to have the same phenotypic role, but may have the same role in the corresponding pathway.

9 Ortholog Databases
Sequence database | Orthology detection method | Ortholog database
SwTrembl proteomes | Inparanoid (blast) | Inparanoid
proteomes | COGs (blast) | COGs / KOGs
TIGR gene index | COGs (blast) | TOGA/EGO
proteomes | OrthoMCL (blast) | OrthoMCL
Pfam | Orthostrapper (tree) | HOPS
Pfam | RIO (tree) |

10 How to find orthologs?
1. Calculate a phylogenetic tree and look for orthologs in the tree (Orthostrapper, RIO).
2. Two-way best matches between two species can be used to find orthologs without trees. [However, in-paralogs are harder to find this way]
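
The two-way best match idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Inparanoid or COGs implementation: the gene names and similarity scores are made up, and a real analysis would take the scores from BLAST output.

```python
def best_hits(scores):
    """Map each query to its highest-scoring hit, given {query: {hit: score}}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def two_way_best_matches(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genes of two species.
human_vs_rat = {
    "hX": {"rX": 900, "rY": 310},
    "hY1": {"rY": 850, "rX": 300},
    "hY2": {"rY": 800, "rX": 290},
}
rat_vs_human = {
    "rX": {"hX": 900, "hY1": 300, "hY2": 290},
    "rY": {"hY1": 850, "hY2": 800, "hX": 310},
}

pairs = two_way_best_matches(human_vs_rat, rat_vs_human)
print(pairs)
```

In this toy data hY2 also hits rY best, but rY's best hit is hY1, so hY2 is not reported. That is the sense in which in-paralogs are harder to find with two-way best matches alone.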

11 Two-way best match approach to finding orthologs

12 COGs
[Figure: COG2813, with out-paralogs and orthologs marked]

13 Inpara-n-oid Inparalog ‘n ortholog identification Blue = species 1 Red = species 2

14 Inparanoid Blue = species 1 Red = species 2

15 Resolve overlapping clusters
– No overlap: no problems
– Partial overlap: separate
– Complete overlap: merge

16 Inparalog score
Score for inparalog P = (scoreAP - scoreAB) / (scoreAA - scoreAB)
[Figure: scale from 0 to 100% (ticks at 0, 20, 40, 60, 80, 100%) with A, P and B marked]
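
A small worked example of the inparalog score formula, as a sketch. It assumes that A is the main ortholog in one species, B is its main ortholog in the other species, and P is a candidate inparalog of A; that reading of the figure labels, the function name, and the score values are assumptions made for illustration.

```python
def inparalog_score(score_ap, score_aa, score_ab):
    """Inparalog score for P: (scoreAP - scoreAB) / (scoreAA - scoreAB)."""
    denominator = score_aa - score_ab
    if denominator <= 0:
        raise ValueError("scoreAA must be larger than scoreAB")
    return (score_ap - score_ab) / denominator


# Hypothetical similarity scores.
score_aa = 1000.0  # A against itself
score_ab = 600.0   # A against its ortholog B in the other species

for label, score_ap in [("P = A itself", 1000.0), ("P halfway", 800.0), ("P as far as B", 600.0)]:
    print(f"{label}: {inparalog_score(score_ap, score_aa, score_ab):.0%}")
```

Under these assumptions the score runs from 100% for a sequence as similar to A as A itself down to 0% for a sequence no closer to A than B is.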

17 Confidence values for main orthologs from sampling
Original alignment:
TVHIVDDEEPVR---KSLAFM---LTMNGFA
T+ ++DD   +R   K L  M   +T+ G A
TILLIDDHPMLRTGVKQLISMAPDITVVGEA
Sampling with replacement; insertions kept intact.
“Bootstrap alignment” -> “bootstrap score”:
GAFDEP---LVTHVR..........
GA +     ++T +R
GAEEHMAPDILTLLR..........
Confidence = (number of bootstrap alignments that give best-best matches) / (number of bootstraps)
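
The sampling idea can be sketched as follows. This is a simplified illustration, not Inparanoid's procedure: it scores alignments by counting identical residues instead of using a substitution matrix, samples every column independently (so insertions are not kept intact), and compares the main pair against a single rival hit in one direction only. The rival sequence, the function names, and the default parameters are assumptions.

```python
import random


def identity_score(seq1, seq2, columns):
    """Toy score: count sampled columns where both sequences share a residue."""
    return sum(1 for i in columns if seq1[i] == seq2[i] and seq1[i] != "-")


def bootstrap_confidence(main, rival, n_boot=1000, seed=0):
    """Fraction of bootstraps in which the main pair outscores the rival pair.

    `main` and `rival` are (seq1, seq2) tuples of equal-length aligned strings.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        # Sample alignment columns with replacement to build a bootstrap alignment.
        main_cols = rng.choices(range(len(main[0])), k=len(main[0]))
        rival_cols = rng.choices(range(len(rival[0])), k=len(rival[0]))
        if identity_score(*main, main_cols) > identity_score(*rival, rival_cols):
            wins += 1
    return wins / n_boot


query = "TVHIVDDEEPVR---KSLAFM---LTMNGFA"
main_pair = (query, "TILLIDDHPMLRTGVKQLISMAPDITVVGEA")
# Hypothetical weaker hit, aligned to the same query.
rival_pair = (query, "TQKLLEDGKAWR---RTLEYQ---VSPDGYS")

confidence = bootstrap_confidence(main_pair, rival_pair)
print(f"confidence = {confidence:.2f}")
```

A fixed seed keeps the result reproducible; the number of bootstraps trades run time against the resolution of the confidence value.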

18 http://inparanoid.cgb.ki.se

19 inparanoid.cgb.ki.se Remm et al., J. Mol. Biol. 314:1041-1052 (2001) Homo sapiens vs. C. elegans

20 Ortholog group sizes, human vs X

21 Nr of inparalogs per ortholog group
Species | Avg. inparalogs in model organism ortholog groups | Avg. inparalogs in human ortholog groups
Mouse | 1.36 | 1.56
Fly | 1.77 | 2.75
Worm | 1.44 | 3.13
Mustard weed | 3.73 | 3.33
Yeast | 1.26 | 3.34
E. coli | 1.73 | 3.57

22 Drawbacks of Blast-based orthology assignment
– No guarantee that the same segment is used in different sequences
– No evolutionary distance model
– Does not take multiple domains into account

23 Domain orthology
Inparanoid Human-Fly ortholog pairs with domains in Pfam-A 13.0: 20335
Different domain architectures: 5411
– Many of these are minor differences, e.g. 22 vs 21 Spectrin repeats
– Sometimes the difference is big:
[Figure: two domain architectures, ef-hand + UCH and TBC + UCH]

24 Tree-based approaches

25 Distance-based tree building
Sequences:
A1 MKFYSLPNFPEN
A2 MKYYKLPDLPDE
A3 MRFYTACENPRS
Distance matrix:
   | A2 | A3
A1 | 4  | 8
A2 |    | 10
[Figure: tree over A1, A2, A3 with branch lengths 1, 3, 5, 2]
Bootstrapping:
– Randomly pick columns to make a bootstrap alignment, calculate a tree
– Repeat 1000 times; frequency of a node = bootstrap support
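
As a worked example of going from a distance matrix to branch lengths, the sketch below applies the standard three-point formula to the distances on this slide, read as d(A1,A2) = 4, d(A1,A3) = 8 and d(A2,A3) = 10. That reading of the matrix is an assumption, and the function name is hypothetical.

```python
def three_leaf_branch_lengths(d12, d13, d23):
    """Branch lengths from each of three leaves to the central node of the tree."""
    b1 = (d12 + d13 - d23) / 2
    b2 = (d12 + d23 - d13) / 2
    b3 = (d13 + d23 - d12) / 2
    return b1, b2, b3


b1, b2, b3 = three_leaf_branch_lengths(4, 8, 10)
print(b1, b2, b3)  # 1.0 3.0 7.0
```

The lengths 1 and 3 match two of the branch labels on the slide. The length 7 for A3 equals 5 + 2, which would fit a tree drawn with an internal branch of 2 and a branch of 5 leading to A3, although that layout is inferred rather than stated.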

26 Orthology by tree reconciliation Species tree Gene tree Infer 2 duplications and 2 losses
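
The trees on this slide are not recoverable from the transcript, so the sketch below reconciles a small hypothetical gene tree modeled on the human/rat example earlier in the talk. It uses the common lowest-common-ancestor mapping: each gene-tree node is mapped to the species-tree node that is the LCA of its children's mappings, and a node is called a duplication when it maps to the same species node as one of its children. Gene losses are not counted here, and all names are assumptions.

```python
def lca(a, b, parent):
    """Lowest common ancestor of species-tree nodes a and b."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent.get(a)
    while b not in ancestors:
        b = parent[b]
    return b


def reconcile(node, species_of, parent, events):
    """Map a gene-tree node onto the species tree and record the event type."""
    if isinstance(node, str):  # leaf: a gene name
        return species_of[node]
    left, right = node
    m_left = reconcile(left, species_of, parent, events)
    m_right = reconcile(right, species_of, parent, events)
    mapped = lca(m_left, m_right, parent)
    kind = "duplication" if mapped in (m_left, m_right) else "speciation"
    events.append((node, mapped, kind))
    return mapped


# Species tree as a child -> parent map (the root "mammal" has no entry).
parent = {"human": "mammal", "rat": "mammal"}
species_of = {"X_human": "human", "X_rat": "rat",
              "Y1_human": "human", "Y2_human": "human", "Y_rat": "rat"}
# Gene tree as nested pairs.
gene_tree = (("X_human", "X_rat"), (("Y1_human", "Y2_human"), "Y_rat"))

events = []
reconcile(gene_tree, species_of, parent, events)
for node, mapped, kind in events:
    print(kind, "at", mapped, ":", node)
```

Two genes are then orthologs when their last common ancestor in the gene tree is a speciation node, and paralogs when it is a duplication node.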

27 Drawbacks of tree reconciliation for orthology assignment
– Assumes that the species tree is fully known
– Does not give confidence values
– Gene trees become unreliable when involving a lot of sequences (more data -> less certainty)
– Computationally expensive

28 Partial tree reconciliation Find pairwise orthologs by computer parsing of tree.

29 Pairwise orthology confidence by ‘orthostrapping’
The original tree with bootstrap support values
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1, PIR-S67168; bootstrap support values 99, 45, 85, 100, 82, 99]

30 Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1, PIR-S67168]
Orthostrap table, worm (rows) vs fly (columns):
Worm | AAF52138.1 | AAF49194.1
C14F5.4 | 0 | 1
T04F8.1 | 1 | 0
C47D12.3 | 0 | 0
Y6E2A.9 | 0 | 0
F37H8.4 | 0 | 0
AH6.2 | 0 | 0

31 Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1, PIR-S67168]
Orthostrap table, worm (rows) vs fly (columns):
Worm | AAF52138.1 | AAF49194.1
C14F5.4 | 0 | 2
T04F8.1 | 2 | 0
C47D12.3 | 1 | 0
Y6E2A.9 | 0 | 0
F37H8.4 | 0 | 0
AH6.2 | 0 | 0

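32 Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1, PIR-S67168; bootstrap support values 99, 45, 85, 100, 82, 99]
Orthostrap table, worm (rows) vs fly (columns):
Worm | AAF52138.1 | AAF49194.1
C14F5.4 | 0 | 99
T04F8.1 | 98 | 0
C47D12.3 | 81 | 0
Y6E2A.9 | 77 | 0
F37H8.4 | 77 | 0
AH6.2 | 77 | 0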
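
The counting step behind these tables can be sketched as below. How orthologs are called within each bootstrap tree is not spelled out in the transcript, so the per-tree ortholog pairs are supplied as hypothetical input; only the tallying is shown. The gene identifiers are borrowed from the slide, but the per-tree calls and the function name are invented for illustration.

```python
from collections import Counter


def orthostrap_support(ortholog_sets):
    """Percent of bootstrap trees in which each pair was called orthologous."""
    counts = Counter(pair for pairs in ortholog_sets for pair in set(pairs))
    n_trees = len(ortholog_sets)
    return {pair: 100.0 * count / n_trees for pair, count in counts.items()}


# Hypothetical ortholog calls (worm gene, fly gene) from four bootstrap trees.
bootstrap_calls = [
    {("T04F8.1", "AAF52138.1"), ("C14F5.4", "AAF49194.1")},
    {("T04F8.1", "AAF52138.1"), ("C14F5.4", "AAF49194.1"), ("C47D12.3", "AAF52138.1")},
    {("T04F8.1", "AAF52138.1")},
    {("C14F5.4", "AAF49194.1")},
]

support = orthostrap_support(bootstrap_calls)
for pair, percent in sorted(support.items()):
    print(pair, f"{percent:.0f}%")
```

With many bootstrap trees, the resulting percentages can be thresholded to keep only well-supported ortholog pairs.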

33 orthostrapper.cgb.ki.se

34 Orthology is not transitive! Multiple species at different distances may give erroneous groups that include out-paralogs

35 Orthology is not transitive!
-> Orthology strictly defined for only 2 species/clades
– Combining species of different distances is very dangerous
– But OK to combine multiple equidistant ones
[Figure: trees with leaves labelled Y, H1, D1, H2, D2]

36 Domain-level orthology

37 HOPS - Hierarchy of Orthologs and Paralogs
1. All species in Pfam are bundled in groups according to scheme:
[Figure: hierarchy with eukaryota at the top; metazoa, viridiplantae and fungi below it; nematoda, arthropoda and chordata below metazoa]
2. Apply Orthostrapper to groups at same level in Pfam families
3. Display results in NIFAS

38 Pfam

39 Pfam in brief:
[Figure: Pfam workflow. Manually curated: SEED alignment (representative members), description file. Automatically made: profile-HMM (HMMer-2.0), database search, FULL alignment.]
Release 13.0 (April 2004):
– 7426 Pfam-A domain families
– Based on 1160000 sequences (Swissprot & Trembl)
– 21980 unique Pfam-A domain architectures
– 73% of all proteins have >=1 Pfam-A domain

40 HOPS results
Pfam 10, 6190 families:
– 2450 families (40%) have HOPS orthologs
– 1319 families (21%) have HOPS orthologs in all 6 pairwise comparisons
– 286356 pairwise orthology assignments (> 75% orthostrap)
Storm and Sonnhammer, Genome Research 13:2353-2362 (2003)

41 Ways to access HOPS
– NIFAS graphical browser
– By sequence ID at Pfam.cgb.ki.se/HOPS
– Flatfiles (Orthostrap tables of 2 clades)

42 Pfam.cgb.ki.se/HOPS

45 Evolution of Domain Architectures NIFAS:

46 ATP sulfurylase /APS kinase

47 Orthologous shuffled domains? ATP sulfurylase domain, metazoa vs fungi

48 APS kinase domain

49 HOPS orthologs of PPS1_HUMAN (ATP sulfurylase/APS kinase)

50 Summary of ATP sulfurylases/APS kinases: Shuffled non-orthologous domains Fungi Metazoa

51 Conclusions
– Orthologs can be detected by
  – Blast: fast
  – tree: slow but less error-prone
– Species at different evolutionary distances should not be combined in orthology analysis
– Inparanoid and Orthostrapper were designed to find inparalogs but not outparalogs
– HOPS/NIFAS can be used to find domain orthologs and analyze domain architecture evolution

52 Future perspectives Multiparanoid – multiple species merging of pairwise Inparalogs. Functional divergence among inparalogs

53 Acknowledgments
– Christian Storm
– Maido Remm
– Andrey Alexeyenko
– Volker Hollich
– Mats Jonsson
http://sonnhammer.cgb.ki.se

