Alignment of primary structure is the basis of detection of putative homologous proteins. The software BLAST is the most popular and efficient tool for calculation of these alignments. This software prints a value (called by E-value) that represents the number of alignments which scores equal or better than those obtained without homology relationship. However, when score is under a certain limit (< 100), the E-value will not achieve a trustable value. In this work we used the pair-wise overlapping of secondary structure to build clusters of homolog groups, using as a model the COG of Superoxide dismutase (COG0605) bearing 74 sequences, enriched with UniRef50 members. HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED ON OVERLAP OF SECONDARY STRUCTURE Oto Coelho Jr., Lucas Santos, Adriano Silva, Izabella Pena, J Miguel Ortega¹ Supported by UFMG INTRODUCTION Metodology Results A total of 551 sequences were submitted to the prediction of secondary structure with the software SSPro4, resulting in output files using the alphabet of H: alpha-helix, E: beta-strand and C: unstructured coil. Predictions were pair-wised aligned (Clustal/W) and used to calculate the Structural overlap (SOV). The SOV result is a percentage of secondary structure overlap. After this, a local implementation of hierarchical clustering (Average and Simple Linkage politic) was used to create clusters (named CSO after clusters of structural overlap) over the filtered similarity values in a cutoff of 70%. In a Simple Linkage politic (considering the maximum SOV value between sequences) was able to create 15 clusters containing from two up to 224 members. In a Average Linkage politic (considering the average SOV between sequences) Using Sov values, a local implementation of hierarchical clustering was able to create 6 clusters containing from one up to 510 members We also calculated the hierarchical clustering in simple and average politics for four other groups of proteins: Ferredoxin, Peroxiredoxin, Hydrogenase Maturation Factor and Transcription Elongation Factor. 1. Laboratório de Biodados. Departamento de Bioquímica e Imunologia, ICB-UFMG Figure 3. Peroxiredoxin clusters using a simple and average linkage politic of hierarchical clustering. Figure 2. Ferredoxin clusters using a simple and average linkage politic of hierarchical clustering. Figure 4. Simple and Average politics of hierarchical clustering for Hydrogenase maturation factor group. Figure 5. Simple and Average politics of hierarchical clustering for Transcription elongation factor. We evaluated the clustering procedure by analyzing the CSO in respect of their UniRef_50 clusters composition. We found that CSO0605_1 (arrow) gathered members of 22 different UniRef50 clusters. Conversely, Seed Linkage software and PSI-BLAST grouped, respectively, 541 and 531 sequences.Conclusions Thus, hierarchical clustering of secondary structure represents a novel and more stringent clustering procedure than COG, Seed Linkage and PSI-BLAST although not so stringent as UniRef50 procedure. These pictures below show two representants of the superoxide dismutases group that have resolved structures in the PDB database. These proteins are aparently similar, but they are in a different clusters. Figure 6. Seed Linkage in Ferredoxin goup. Figure 7. Two proteins of Superoxide dismutases group that are in different clusters.. Figure 1. Superoxide dismutase clusters using a simple and average linkage politic of hierarchical clustering.