MultiParanoid: Automatic Clustering of Orthologs and Inparalogs Shared by Multiple Proteomes. Andrey Alexeyenko, Ivica Tamas, Gang Liu, Erik L.L. Sonnhammer.

1 MultiParanoid: Automatic Clustering of Orthologs and Inparalogs Shared by Multiple Proteomes. Andrey Alexeyenko, Ivica Tamas, Gang Liu, Erik L.L. Sonnhammer. Stockholm

2 Homologs: orthologs and paralogs. Homologs are genes that have descended from a common ancestral gene. Homology is manifested by sequence similarity, for example a BLAST hit with a low E-value between Gene 1 and Gene 2. We do not believe in sequence similarity without shared ancestry. Orthologs are related via a speciation (S). Paralogs are related via a gene duplication (D) and may or may not be in the same species.

3 Homologs: orthologs and paralogs. Inparalogs (~ co-orthologs): paralogs that were duplicated after the speciation and hence are orthologs to the other species' genes. Outparalogs (= not co-orthologs): paralogs that were duplicated before the speciation. Reference: Sonnhammer ELL and Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends in Genetics, Volume 18, Issue 12, 1 December 2002, Pages 619-620.
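
The inparalog/outparalog distinction comes down to comparing two event times. A minimal sketch, assuming hypothetical ages (in million years ago) for the duplication and the speciation; the function name and the numbers are illustrative and do not come from the slides:

```python
def classify_paralogs(duplication_mya: float, speciation_mya: float) -> str:
    """Classify a paralog pair relative to one speciation event.

    Ages are in million years ago (larger = older). Hypothetical helper.
    """
    if duplication_mya < speciation_mya:
        # The duplication is more recent than the speciation: inparalogs.
        return "inparalogs"
    # The duplication is as old as or older than the speciation: outparalogs.
    return "outparalogs"


# Made-up ages for illustration only.
print(classify_paralogs(duplication_mya=50, speciation_mya=90))
print(classify_paralogs(duplication_mya=300, speciation_mya=90))
```

The same pair of paralogs can therefore be inparalogs with respect to one speciation and outparalogs with respect to a more recent one.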

4 Orthologs for functional genomics. Orthologs are more likely than outparalogs to have identical or similar biochemical functions and biological roles. Orthologs are optimal for discovering gene function via model organism counterparts. Reference: Hulsen T, Huynen MA, de Vlieg J, Groenen PM. Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 2006;7(4):R31. Epub 2006 Apr 13: “…the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.”

5 Outline. 1. InParanoid 2. The world of ortholog resources 3. Why MultiParanoid 4. Limitations 5. Future development

6 Homologs: orthologs and paralogs. [Figure: gene tree with speciation (S) and duplication (D) nodes, labeling orthologs, outparalogs, and inparalogs.]

7 InParanoid. [Figure: Proteome A compared against Proteome B.] Reciprocally best hits ~ seed orthologs; inparalogs are then added. Reference: Maido Remm, Christian E. V. Storm and Erik L. L. Sonnhammer. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 314, 5, 14 December 2001, Pages 1041-1052.
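
The seed-ortholog idea on this slide (reciprocally best hits between two proteomes) can be sketched as follows. This is a minimal illustration, not InParanoid itself: the score dictionaries, protein IDs, and function names are made up, and the later steps of InParanoid, such as adding inparalogs to each seed pair, are not reproduced here.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target.

    `scores` maps (query, target) -> similarity score, e.g. a BLAST bit score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores, invented for illustration.
a_vs_b = {("a1", "b1"): 900, ("a1", "b2"): 300,
          ("a2", "b2"): 700, ("a3", "b1"): 650}
b_vs_a = {("b1", "a1"): 880, ("b1", "a3"): 640, ("b2", "a2"): 710}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, `a3` hits `b1` best but `b1` prefers `a1`, so `a3` is not a seed ortholog; a protein like this is the kind of candidate a subsequent inparalog step would examine.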

8 Resources using InParanoid. Eukaryotic Ortholog Groups. 3409 diseases.

9 Multi-species ortholog resources: Clusters of Orthologous Groups; HOVERGEN release 47. Labels from the slide figure: “Massive download” friendly; tree-based, best for detailed analysis.

10 [Figure: gene tree with speciation (S) and duplication (D) nodes.] Any cluster containing genes from more than two species is controversial in terms of orthology, because a speciation gives rise to a pair of species.

11 MultiParanoid algorithm. 1. Take more than two species with maximally close speciation points. 2. Generate two-species InParanoid clusters: A-B, B-C, A-C. 3. Find protein counterparts across the clusters. [Figure: InParanoid clusters A-B, B-C, and A-C combined into one multi-species cluster.]
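
Step 3 can be pictured as linking pairwise clusters that share a protein. The sketch below uses plain single-linkage merging (connected components via union-find), which is an assumption made for illustration: the slides do not specify how MultiParanoid resolves conflicts between pairwise clusters. The protein IDs and the function name are hypothetical.

```python
from collections import defaultdict


def merge_pairwise_clusters(pairwise_clusters):
    """Merge clusters that share at least one protein (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for cluster in pairwise_clusters:
        members = list(cluster)
        if not members:
            continue
        first = members[0]
        find(first)  # register single-member clusters too
        for other in members[1:]:
            parent[find(first)] = find(other)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


# Toy two-species clusters for species A, B, and C (invented IDs).
clusters_ab = [{"A:p1", "B:p1"}, {"A:p2", "B:p2"}]
clusters_bc = [{"B:p1", "C:p1", "C:p1b"}, {"B:p2", "C:p2"}]
clusters_ac = [{"A:p1", "C:p1"}, {"A:p2", "C:p2"}]

multi = merge_pairwise_clusters(clusters_ab + clusters_bc + clusters_ac)
for cluster in multi:
    print(cluster)
```

Single linkage is the most permissive choice: one shared protein is enough to join two clusters, so a real implementation would likely need extra checks where the pairwise clusters disagree.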

12 MultiParanoid validation. However: tree conflicts. [Figure: InParanoid cluster membership of fly, worm, and human genes.] The MultiParanoid output was benchmarked on a manually curated set of 221 human-fly-worm clusters: - 214 MultiParanoid clusters found - 177 (almost) identical - The rest controversial, mainly due to: - differences between pairwise and multiple alignments - the curator's perception and InParanoid settings
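
The benchmark counts translate into proportions as follows. This is simple arithmetic on the three numbers from the slide, reading "the rest" as the found clusters that were not (almost) identical; the variable names are illustrative.

```python
curated = 221    # manually curated human-fly-worm clusters
found = 214      # curated clusters for which a MultiParanoid cluster was found
identical = 177  # found clusters that were (almost) identical

controversial = found - identical  # assumption: "the rest" refers to found clusters

print(f"found / curated:   {found / curated:.1%}")
print(f"identical / found: {identical / found:.1%}")
print(f"controversial:     {controversial}")
```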

13 MultiParanoid vs. … and … [Comparison figure; the names of the compared resources appeared as images.]

14 Current MultiParanoid release. [Figure: species tree of C. elegans, H. sapiens, C. intestinalis, and D. melanogaster, with “???” marked on it.] 40451 protein sequences classified into 7695 clusters. http://multiparanoid.cgb.ki.se/

15 A solution: expansion of MultiParanoid clusters. 1. Process all the possible 3-species combinations. 2. Merge respective cluster members across the clades.
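
The two steps can be sketched as below. The species list reuses the four names from the current release purely for illustration, the per-triple clusters are invented, and merging on any shared member is an assumed rule rather than the documented MultiParanoid procedure.

```python
from itertools import combinations

species = ["C.elegans", "H.sapiens", "C.intestinalis", "D.melanogaster"]

# Step 1: enumerate every 3-species combination.
triples = list(combinations(species, 3))
for triple in triples:
    print(triple)


def merge_overlapping(clusters):
    """Step 2 (assumed rule): merge clusters that share at least one member."""
    merged = []
    for cluster in clusters:
        cluster = set(cluster)
        overlapping = [group for group in merged if group & cluster]
        merged = [group for group in merged if not (group & cluster)]
        for group in overlapping:
            cluster |= group
        merged.append(cluster)
    return merged


# Invented clusters, each imagined as the output of one 3-species run.
triple_clusters = [
    {"worm:g1", "human:g1", "fly:g1"},
    {"worm:g1", "human:g1", "ciona:g1"},
    {"human:g2", "ciona:g2", "fly:g2"},
]
expanded = merge_overlapping(triple_clusters)
for cluster in expanded:
    print(sorted(cluster))
```

The number of runs grows quickly with the number of species (n choose 3), which is one practical reason the species set stays small.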

16 But still, orthology is a pairwise concept! A speciation gives rise to a pair of species.

17 How do the ortholog resources cope with it? Post-processing (bootstrap, synteny, tree manual curation, etc.). Cluster size ~ outparalogs/orthologs ratio. HOVERGEN release 47.

18 Overview and comparison of ortholog databases. Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Drug Discovery Today: Technologies (2006) v. 3; 2, 137-143. Databases: EGO, COG/KOG, HomoloGene, InParanoid/MultiParanoid, HOPS, KEGG, OrthoMCL, ENSEMBL Compara, PhiGs, MGD, HOGENOM, HOVERGEN, INVHOGEN, TreeFam, OrthologID.

19 How to reconcile the demand for multi-species clusters with pairwise gene relations? The common feature is a single ancestor gene at the root point. [Figure: gene tree with speciation (S) and duplication (D) nodes.]

20 Two new terms. Pseudo-proteome: a union of proteomes of the same clade. Cluster of pseudo-inparalogs: a within-clade gene family.

21 Pseudo-proteome A (reptiles); Pseudo-proteome B (mammals).
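
A pseudo-proteome, as defined on the previous slide, is simply the union of the proteomes of one clade. A minimal sketch with invented species and protein IDs; the function name and the species-prefix convention are assumptions made so that IDs stay unique after pooling:

```python
def build_pseudo_proteome(proteomes, clade_species):
    """Union the proteomes of all species in one clade.

    Each protein ID is prefixed with its species name so that identical
    IDs from different species do not collide.
    """
    pseudo = set()
    for sp in clade_species:
        pseudo |= {f"{sp}:{pid}" for pid in proteomes[sp]}
    return pseudo


# Toy proteomes, invented for illustration.
proteomes = {
    "lizard": ["p1"],
    "snake": ["p1", "p2"],
    "mouse": ["p1", "p2"],
    "human": ["p1", "p3"],
}

pseudo_a = build_pseudo_proteome(proteomes, ["lizard", "snake"])  # reptiles
pseudo_b = build_pseudo_proteome(proteomes, ["mouse", "human"])   # mammals

print(sorted(pseudo_a))
print(sorted(pseudo_b))
```

The two pooled sets can then be handed to a pairwise procedure in the same way as two ordinary proteomes, which is the point of the construct.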

22 Another view, “gene-family”-wise: all the members of the same cluster trace back to a single gene in the last common ancestor (LCA) of the two major clades. [Figure: gene tree with speciation (S) and duplication (D) nodes and the LCA marked.]

23 The clustering can be done at different levels, for example: fungi vs. animals, insects vs. mammals, rodents vs. primates. Having more than one species in a pseudo-proteome reduces mis-assignments in case of gene loss. Closer pseudo-proteomes increase resolution. Lineage-specific (~ pseudo-proteome-specific) expansions should also be available. [Figure: gene tree with speciation (S) and duplication (D) nodes, with orthologs labeled.]

24 Conclusions. Most ortholog resources can build clusters in the form of gene trees, but only InParanoid seems to correctly delineate ortholog/inparalog groups. The MultiParanoid algorithm has relieved the problem of “hidden outparalogs”, but the number and selection of species remain limited. The “LCA-Paranoid” concept: the long-awaited solution? - Each of the two clade-specific cluster parts may be regarded as a multi-species cluster. - Once all possible “clade-clade” clustering solutions have been found, each gene would receive a complete set of orthologs at a desired level of LCA. - With a sufficient number of complete proteomes, it would be possible to date each gene pair's point of divergence.

