Published by Bernard Horn. Modified over 9 years ago.
1
Identification of Ortholog Groups by OrthoMCL
Protein sequences from organisms of interest → all-against-all BLASTP
– Between species: reciprocal best similarity pairs → putative orthologs
– Within species: reciprocal better similarity pairs → (recent) paralogs
Similarity cutoff: P-value, % overlap
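The pair-selection step on this slide can be sketched in a few lines. This is a minimal illustration and not OrthoMCL's own code: the function name `reciprocal_pairs`, the input layout (a dict of directed hits) and the toy scores are assumptions, and "reciprocal better" is read here as "both within-species hits score higher than either protein's best hit in any other species".

```python
def reciprocal_pairs(hits, species_of):
    """Split BLAST-style hits into putative orthologs and recent paralogs.

    hits: dict mapping (query, subject) -> similarity score (higher is better).
    species_of: dict mapping protein id -> species label.
    Hypothetical helper for illustration only.
    """
    best_in_species = {}  # (protein, other species) -> best score there
    best_cross = {}       # protein -> best score against any other species
    for (q, s), score in hits.items():
        if species_of[q] != species_of[s]:
            key = (q, species_of[s])
            best_in_species[key] = max(best_in_species.get(key, 0.0), score)
            best_cross[q] = max(best_cross.get(q, 0.0), score)

    orthologs, paralogs = set(), set()
    for (q, s), score in hits.items():
        # Visit each unordered pair once, and only if both directions exist.
        if q >= s or (s, q) not in hits:
            continue
        back = hits[(s, q)]
        if species_of[q] != species_of[s]:
            # Reciprocal best: each protein is the other's top hit
            # in the other's species.
            if (score == best_in_species[(q, species_of[s])]
                    and back == best_in_species[(s, species_of[q])]):
                orthologs.add((q, s))
        else:
            # Reciprocal better: both directions beat every cross-species hit.
            if score > best_cross.get(q, 0.0) and back > best_cross.get(s, 0.0):
                paralogs.add((q, s))
    return orthologs, paralogs


# Toy data (illustrative scores for two species with two proteins each).
species_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
pair_scores = {("A1", "A2"): 200.0, ("A1", "B1"): 150.0, ("B1", "B2"): 220.0}
hits = {}
for (x, y), sc in pair_scores.items():
    hits[(x, y)] = sc
    hits[(y, x)] = sc

orthologs, paralogs = reciprocal_pairs(hits, species_of)
```

A real run would first drop hits that fail the P-value and overlap cutoffs; that filter is left out because the thresholds are not stated on this slide.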
2
Similarity matrix → Markov Clustering → ortholog groups with (recent) paralogs
Cluster tightness: inflation value (I)
3
Similarity Matrix (entries are similarity scores)
     A1    A2    B1    B2
A1   -     200   150   0
A2   200   -     0     0
B1   150   0     -     220
B2   0     0     220   -
Species A: A1, A2 (paralogs, score 200)
Species B: B1, B2 (paralogs, score 220)
Ortholog pair A1, B1: score 150
4
Markov Clustering (MCL) Algorithm
Similarity matrix → Markov matrix (transition probability matrix)
Repeat: matrix expansion (matrix powering), then matrix inflation (entry powering)
Terminate when no further change
Final matrix as clustering
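The loop on this slide can be written out directly. The following is a small pure-Python sketch and not the MCL program itself: the function names, the self-loop rule, the squaring used for expansion, and the tolerance values are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic Markov matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: matrix powering (here, squaring the matrix)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, inflation):
    """Inflation: raise every entry to a power, then renormalize columns."""
    return normalize_columns([[x ** inflation for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Return clusters as a set of frozensets of node indices."""
    n = len(similarity)
    m = [list(row) for row in similarity]
    for i in range(n):
        # Assumption: give each node a self-loop as heavy as its strongest
        # edge (1.0 for isolated nodes) so every column can be normalized.
        strongest = max((similarity[i][j] for j in range(n) if j != i),
                        default=0.0)
        m[i][i] = strongest or 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:  # terminate when no further change
            break
    # Read the final matrix as a clustering: each row that still holds
    # probability mass lists the nodes attracted to that row's node.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Four-protein toy graph: A1-A2 = 200, A1-B1 = 150, B1-B2 = 220.
names = ["A1", "A2", "B1", "B2"]
toy = [[0, 200, 150, 0],
       [200, 0, 0, 0],
       [150, 0, 0, 220],
       [0, 0, 220, 0]]
for cluster in mcl(toy, inflation=2.0):
    print(sorted(names[i] for i in cluster))
```

Trying several values of `inflation` on the same matrix is a cheap way to see how the parameter changes cluster granularity.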
5
Application of OrthoMCL to Plasmodium, human and other model organisms
Organisms: Plasmodium falciparum, Human, Arabidopsis, Worm, Fly, Yeast, E. coli
6241 ortholog groups
– 160 all included
– 551 only Eukaryotes
– 1182 only Metazoa
– 24 only Plasmodium & Arabidopsis
– 114 Plasmodium, not human
– …
6
An Example of Gamma-tubulin Ortholog Group
8
Comparing OrthoMCL with INPARANOID (two species)
INPARANOID clusters both orthologs and in-paralogs from two species by pairwise similarity
– Find two-way best hits from pairwise similarity scores; each such pair is a main ortholog pair
– For each main ortholog, add further orthologs (in-paralogs) from the same species by comparing the score between the main ortholog and each putative in-paralog with the score between the main ortholog pair
– Resolve overlapping groups by merging, deleting, or dividing them based on a set of rules
OrthoMCL can cluster orthologs and in-paralogs from multiple species
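The in-paralog step described for INPARANOID can be illustrated as follows. This is a rough sketch, not INPARANOID's implementation: the function name `add_in_paralogs`, the `>=` comparison, and the toy scores (including the extra protein A3) are assumptions, and the rules for resolving overlapping groups are not modelled.

```python
def add_in_paralogs(main_a, main_b, scores, species_of):
    """Grow a group around one main ortholog pair.

    scores: dict mapping frozenset({p, q}) -> similarity score.
    A same-species candidate joins the group when its score to the main
    ortholog is at least the score of the main ortholog pair (assumed rule).
    """
    pair_score = scores[frozenset((main_a, main_b))]
    group = {main_a, main_b}
    for main in (main_a, main_b):
        for pair, score in scores.items():
            if main not in pair or len(pair) != 2:
                continue
            (other,) = pair - {main}
            if species_of[other] == species_of[main] and score >= pair_score:
                group.add(other)
    return group


# Toy data: A1/B1 is the main ortholog pair (two-way best hit, score 150).
species_of = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B"}
scores = {
    frozenset(("A1", "B1")): 150.0,
    frozenset(("A1", "A2")): 200.0,  # closer to A1 than B1 is: in-paralog
    frozenset(("A1", "A3")): 90.0,   # weaker than the main pair: left out
    frozenset(("B1", "B2")): 220.0,  # in-paralog of B1
}
group = add_in_paralogs("A1", "B1", scores, species_of)
```

Because each group is seeded from one two-species ortholog pair, this procedure does not extend directly to three or more species, which is the contrast the slide draws with OrthoMCL.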
9
I. Yeast – Worm dataset (estimation)
Yeast: 6358 proteins; Worm: 19774 proteins
INPARANOID: 4985 proteins (Yeast: 2283, Worm: 2702) in 1805 groups
OrthoMCL (I = ?): 4428 proteins (Yeast: 2158, Worm: 2270) in ? groups (paralog groups?)
3931 proteins the same from both methods
? coherent grouping
10
Contained groups: an OrthoMCL group lies entirely within an INPARANOID group, or an INPARANOID group lies entirely within an OrthoMCL group
Coherent groups = same groups + contained groups
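The same / contained / coherent bookkeeping can be made concrete with a short sketch. How the percentages in this comparison were actually computed is not stated, so the per-sequence counting below, the function name `grouping_agreement`, and the toy groups are assumptions.

```python
def grouping_agreement(groups_a, groups_b):
    """Fractions of shared sequences with same, contained, coherent grouping.

    groups_a, groups_b: iterables of sets of sequence ids, one set per group.
    Only sequences placed in a group by both methods are counted.
    """
    group_of_a = {seq: frozenset(g) for g in groups_a for seq in g}
    group_of_b = {seq: frozenset(g) for g in groups_b for seq in g}
    shared = group_of_a.keys() & group_of_b.keys()
    same = contained = 0
    for seq in shared:
        ga, gb = group_of_a[seq], group_of_b[seq]
        if ga == gb:
            same += 1        # identical group in both methods
        elif ga < gb or gb < ga:
            contained += 1   # one group is a proper subset of the other
    n = len(shared)
    if n == 0:
        return 0.0, 0.0, 0.0
    return same / n, contained / n, (same + contained) / n


# Toy groups (hypothetical yeast "y" and worm "w" ids).
method_a = [{"y1", "w1"}, {"y2", "w2", "w3"}, {"y4", "w4"}]
method_b = [{"y1", "w1"}, {"y2", "w2"}, {"y4", "w5"}, {"w4", "y6"}]
same, contained, coherent = grouping_agreement(method_a, method_b)
```

In the toy data, y1 and w1 have the same grouping, y2 and w2 have a contained grouping, and y4 and w4 are split differently by the two methods, so they count toward neither.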
11
Inflation (I) | # groups | # groups of paralogs | % seqs with same grouping* | % seqs with contained grouping* | % seqs with coherent grouping*
2 (tight) | 1892 | 159 | 80.2 | 16.9 | 97.1
1.5 | 1857 | 89 | 82.4 | 14.8 | 97.2
1.2 | 1814 | 7 | 85.4 | 11.7 | 97.1
1.1 (loose) | 1811 | 2 | 85.4 | 11.9 | 97.3
* Percentage of 3931 sequences identified by both OrthoMCL and INPARANOID
Inflation value (I) regulates cluster tightness: higher values give tighter clusters, lower values looser ones
So, choose I = 1.1 as the optimal inflation value
12
Possible reasons for including different sequences
 | OrthoMCL | INPARANOID
BLAST version | WU-BLAST | NCBI-BLAST
BLAST search | All-against-all, SEG filtered, fixed database size | Pairwise
Similarity cutoff | P < 1e-5 | Score >= 50 bits, overlap > 50%
Reciprocal “best” hits | P-value, percent identity | Score
Recent paralogs | Bi-directional better within-species similarity | One-way better within-species similarity from orthologs
13
Yeast: 6358 proteins; Worm: 19774 proteins
INPARANOID: 4985 proteins (Yeast: 2283, Worm: 2702) in 1805 groups
OrthoMCL (I = 1.1): 3949 proteins (Yeast: 1927, Worm: 2022) in 1614 groups
3765 proteins the same from both methods
– 86.3% same groups
– 98.1% coherent groups
Default parameters:
– Similarity cutoff: P-value, 50% overlap
– Cluster tightness: inflation value I = 1.1
14
II. Worm – Fly dataset (test)
Worm: 19774 proteins; Fly: 13288 proteins
INPARANOID: 10100 proteins (Worm: 5399, Fly: 4761) in 3988 groups
OrthoMCL (I = 1.1): 9623 proteins (Worm: 4997, Fly: 4626) in 3764 groups
8856 proteins the same from both methods
– 86% same groups
– 98% coherent groups
In conclusion: OrthoMCL and INPARANOID have similar clustering behavior when comparing two species
15
Comparison of OrthoMCL with EGO (multiple species)
III. Yeast – Worm – Fly dataset
EGO: TC/NP → (BLASTP) → protein sequences; 10260 seqs → (remove redundancy) → 4776 proteins
EGO: 4776 unique proteins formed 3125 unique groups
OrthoMCL: 12459 proteins formed 4033 groups
16
4392 proteins the same from both methods
– 44.2% same groups
– 2.3% OrthoMCL contained in EGO
– 62% EGO contained in OrthoMCL
– 93.8% coherent groups
17
An Example: EGO Groups Contained by OrthoMCL Groups
Fly: Hsc70-1, Hsc70-4
Yeast: SSA1, SSA2, SSA3, SSA4
Worm: Hsp-1
EGO: Hsp-1, Hsc70-4, SSA2
OrthoMCL: Hsp-1, Hsc70-1, Hsc70-4, SSA1, SSA2, SSA3, SSA4
18
Back to Apicomplexa …
5333 proteins
– 1846 orthologous to the other 6 organisms
– 1693 orthologous to Arabidopsis
– 483 orthologous to E. coli
– 1421 orthologous to yeast
– 1771 orthologous to fly, worm or human
– 1824 non-orthologous to human
19
Summary
– OrthoMCL automatically delineates many-to-many orthologous relationships across multiple eukaryotic genomes
– When applied to pairwise comparison of two species, the performance of OrthoMCL is comparable to that of INPARANOID, which was designed for comparing two species
– When applied to multiple species and compared with the EGO database, OrthoMCL tends to identify more orthologous genes
– The underlying object-based relational storage model permits integration with organismal data, and queries based on user-defined species distribution provide a snapshot of shared/diversified biological processes across species
20
Related Posters and References
114A. Web-Based Biological Discovery using an Integrated Database.
146A. The Genomics Unified Schema (GUS).
170A. TESS-II: Describing and Finding Gene Regulatory Sequences with Grammars.
Remm et al. Automatic Clustering of Orthologs and In-paralogs from Pairwise Species Comparisons. J. Mol. Biol. (2001) 314
Lee et al. Cross-Referencing Eukaryotic Genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res. (2002) 12
Enright et al. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. (2002) 30