The Use of Bioinformatics Tools to Predict the Functions of Hypothetical Proteins from Pham 6637 Cluster AN of Bacteriophages
Guynup, Taylor; Reyes, Andrea; Chang, Joseph

Abstract
This study attempted to determine the functions of proteins of previously unknown function using bioinformatics tools available on the internet. Because the protein functions are unknown, there is only a slight chance that any of the existing websites or bioinformatics tools will be able to detect them. To address this question, Lore_14, Jessica_15, StewieGriff_14, and Toulouse_13 were run through Scratch Protein Predictor, NCBI, and TMHMM. The hope was that these tools would be able to determine the function of these genes without wet-lab work. Afterwards, the same methods were applied to the singleton BRock_Draft, and genes 36, 37, and 39 were analyzed. Overall, this research supports the conclusion that the majority of hypothetical proteins need to be taken to a wet lab to have their function determined, since the bioinformatics tools were not as reliable as expected.

Introduction
Phage genome mapping is a relatively new subject in biology, but it has profound applications in the field. A vital aspect of this subject is discovering phage gene functions. Before its function is verified, a potential gene is labeled a “hypothetical protein.” As more genomes are discovered, similar genomes are grouped into clusters, whereas genomes without other known related genomes are called singletons. Currently, several bioinformatics prediction programs infer relationships between unknown and known genes, but it is often the researcher's responsibility to determine a gene's function from the results of those tools, as there are often other factors those programs do not include in their calculations. Nevertheless, protein function prediction programs are constantly improving, and we are interested in evaluating their reliability in predicting the functions of hypothetical proteins.

Materials and Methods
Used PhagesDB to find genomic sequences of different genomes of Pham 3367 of the AN cluster
Ran genomic sequences through NCBI BLAST to find protein function
Ran genomic sequences through:
    Scratch Protein Predictor: predicts tail or capsid protein
    TMHMM: predicts transmembrane proteins
Compared NCBI BLAST results and results from the different databases
Compared the functions of the different Pham members to see commonalities

Results
Lore 14
    NCBI Results: Hypothetical Protein SEA_JESSICA [Arthrobacter phage Jessica]; E value: 0.0
    TMHMM Results: Negative
    Scratch Protein Predictor Result: Capsid Sequence: Yes (.994914); Tail Sequence: Yes (.141622)
    Q:S and percent identity with Lore: 1:1; 100%
Toulouse 13
    NCBI Results: Minor Tail Protein [Arthrobacter phage Toulouse]
    Scratch Protein Predictor Result: Capsid Sequence: Yes (1.206675); Tail Sequence: No (-0.032368)
    Q:S and percent identity with Lore: 1:2; 92%
Jessica 15
    NCBI Results: Minor Tail Protein [Arthrobacter phage Jessica]
    Scratch Protein Predictor Result: Capsid Sequence: Yes (1.156418); Tail Sequence: No (-0.092472)
StewieGriff 14
    Q:S and percent identity with Lore: 2:1
Figure 1

Discussion
From the data collected, shown in Figure 1, Lore 14 is predicted to be a minor tail protein. The negative TMHMM results indicate that Lore 14 and the rest of the Pham 3367 proteins are not transmembrane proteins. StewieGriff 14 supports Lore 14 being a minor tail protein through both its NCBI 1:1 Q:S score and its Scratch Protein Predictor results. Scratch Protein Predictor has a reported accuracy of 90% to 97% and uses protein amino acid composition, secondary structure, and alignment contact fragments of both capsid and tail proteins. Both Jessica 15 and Toulouse 13 support Lore 14 as a minor tail protein through their NCBI Q:S scores but not through Scratch Protein Predictor. The scores for Jessica 15 and Toulouse 13 are not far enough from the average sequence, so they do not meet the criteria for a tail protein. Bioinformatics tools can help predict the functions of hypothetical proteins; in addition, integrating homology-based inference can help narrow down the functions a protein could have. Homology-based inference has been used before, using both BLAST and E values to compare proteins (Hamp 2013).

Application of Methods to Singletons
BRock_Draft
Cluster: Singleton
Streptomyces Bacteriophage
Gene 36: Pham 22979; 1503 base pairs; 17538 to 16036 (reverse gene)
NCBI Results: Hypothetical protein [Streptomyces phage BRock]
TMHMM Results: Negative
Scratch Protein Predictor Results: Capsid Sequence: Yes (0.585782); Tail Sequence: No (0.324156)
Identity with Lore: No significant similarity found
Figure 2: Phamerator Map of BRock

Acknowledgements
Special thanks to Dr. Tamarah Adair and Lathan Lucas for advisement throughout the research process.

Works Cited
Hamp, T., Kassner, R., Seemayer, S., Vicedo, E., Schaefer, C., Achten, D., … Rost, B. (2013). Homology-based inference sets the bar high for protein function prediction. BMC Bioinformatics, 14(Suppl 3), S7. http://doi.org/10.1186/1471-2105-14-S3-S7
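
Supplementary sketch
The percent identity with Lore reported in the results presumably comes from the NCBI BLAST output. As a rough illustration of what that statistic measures, the Python sketch below counts identical residues over the columns of an already-aligned pair of sequences. The function name percent_identity and the two toy sequences are hypothetical and are not taken from any phage genome; BLAST builds its own alignment before reporting this value, which the sketch does not attempt.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (same length, '-' marking gaps).
    A column containing a gap never counts as a match.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    # Denominator is the full alignment length, gaps included.
    return 100.0 * matches / len(aligned_query)


# Hypothetical toy alignment (assumption: invented for illustration only).
query = "MKTAYIAKQR-LV"
subject = "MKTAYLAKQRELV"
print(f"{percent_identity(query, subject):.1f}%")
```

Dividing by the full alignment length, gaps included, is one common convention; other tools divide by the shorter sequence length or by ungapped columns only, so values from different sources are not always directly comparable.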