Training a Neural Network to Recognize Phage Major Capsid Proteins
Author: Michael Arnoult, San Diego State University
Mentors: Victor Seguritan, Anca Segall, and Peter Salamon, Department of Biology, San Diego State University

Background:
Bacteriophages are the most abundant biological entities on Earth and influence every environment in which bacteria exist. No current algorithm reliably analyzes phage structural protein sequences and predicts their function. This research classifies phage structural proteins using Artificial Neural Networks (ANNs), a computational method of analysis inspired by biological neurons. Features of phage protein sequences with known classifications were used to train neural networks, which then predict the function of unknown sequences. Analysis of the predictions will allow biologists to decide, with some accuracy, which proteins are the most appropriate candidates for their research needs.

Methods:

Phage Major Capsid Protein Sequence Collection
Major Capsid Proteins (MCPs) and Tail proteins were obtained from the NCBI RefSeq database using the keywords Phage, Proteins, and an additional structure-related keyword.

Annotation Data Filtering and Training Set Manipulation
Sequences with non-MCP/Tail annotations were removed from the positive data set.
Sequences with MCP/Tail annotations were removed from the negative set.
Only positive MCP sequences longer than 300 amino acids and Tail sequences longer than 150 amino acids were used.

Conversion of Sequences to Quantitative Features
One ANN input was one of four amino acid features translated into a quantitative representation: masses, isoelectric points, hydrophobicity ratings, or volumes. A second input was that feature divided by the sequence length. Twenty further ANN inputs represented the amino acid percent composition.
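The feature conversion and length filter described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the Kyte-Doolittle hydropathy scale stands in for the unspecified hydrophobicity rating, and summing the per-residue values over the sequence is an assumed reading of "translated into quantitative representations".

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the composition inputs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as an example scale;
# the poster does not say which hydrophobicity rating was used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def passes_length_filter(sequence, protein_class):
    """Length rule for the positive set: MCP > 300 aa, Tail > 150 aa."""
    minimum = {"MCP": 300, "Tail": 150}[protein_class]
    return len(sequence) > minimum


def feature_vector(sequence, scale=KYTE_DOOLITTLE):
    """Return 22 inputs: summed scale value, that sum divided by the
    sequence length, and 20 amino acid percent compositions."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Assumption: the whole-sequence feature is the sum of per-residue values.
    # Non-standard residue codes raise KeyError rather than being guessed at.
    total = sum(scale[aa] for aa in seq)
    counts = Counter(seq)
    composition = [100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS]
    return [total, total / len(seq)] + composition
```

Swapping in a different `scale` dictionary (for masses, isoelectric points, or volumes) would give the other single-feature variants; the 20 composition inputs do not depend on the scale.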
Training of ANNs
ANNs were trained on each of the five features individually and on combinations of two or more features. The architecture had one hidden layer with 100 neurons, and one input neuron was used for each feature.

Testing of ANNs using Phage Protein Sequences
Classification was run on a test set containing a randomly selected 20% of the positive and negative protein sequences, which were excluded from training.

Sequence collection keywords: Major, Shell, Coat, Capsid, Head, Prohead, Procapsid, Tail, Neck. Non-MCP/Tail sequences were also downloaded as negative examples.

Figure: Diagram of an Artificial Neural Network with 5 input layer neurons, 5 hidden layer neurons, and 1 output layer neuron (labels: Input Layer, Hidden Layer, Output Layer).
Figure: Example of T4 Bacteriophage. http://www.cosmosmagazine.com/node/1024

Conclusions:
Phage Major Capsid Proteins and Tail proteins are distinguishable from other phage proteins by trained Artificial Neural Networks. Classification of the test sets reveals that the ANNs distinguish phage Major Capsid Proteins more accurately than Tail proteins. Protein sequence sets may be contrasted by analyzing their combinations of physical features from the perspective of ANNs.

Future Directions:
1. Find a distribution of positive and negative protein sequence examples appropriate for phage or bacterial genomes to improve classification ability
2. ClustalW analysis of Tail/Major Capsid proteins against ANN-tested false positive proteins from bacterial genomes
3. Experimentally validate ANN predictions of unknown virus sequences
   a. Gene constructs of sequences predicted by ANNs to be MCPs or Tail proteins
   b. Gene expression in bacterial cells
   c. Visually verify potential Tail and Capsid proteins by Electron Microscopy

This research was funded in part by the NSF 0827278 UBM Interdisciplinary Training in Biology and Mathematics grant to AMS and PS.
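As a rough illustration of the training and testing setup described in the Methods (one hidden layer with 100 neurons, a random 20% of positive and negative sequences held out), the sketch below uses only the Python standard library. The poster gives no details beyond the architecture and the hold-out fraction, so the tanh hidden units, sigmoid output, stochastic gradient descent, learning rate, epoch count, and all names here are assumptions, and holding out 20% of each class separately is one possible reading of the split.

```python
import math
import random


def stratified_split(positives, negatives, test_fraction=0.2, seed=0):
    """Hold out a random test_fraction of each class.
    Returns (train, test), each a list of (features, label) pairs."""
    rng = random.Random(seed)
    train, test = [], []
    for label, group in ((1, positives), (0, negatives)):
        items = list(group)
        rng.shuffle(items)
        n_test = int(round(len(items) * test_fraction))
        test += [(x, label) for x in items[:n_test]]
        train += [(x, label) for x in items[n_test:]]
    return train, test


class OneHiddenLayerNet:
    """Feed-forward network: n_inputs -> n_hidden (tanh) -> 1 (sigmoid)."""

    def __init__(self, n_inputs, n_hidden=100, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
        return hidden, 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        """Return the network output in (0, 1); above 0.5 means positive."""
        return self._forward(x)[1]

    def train(self, data, epochs=100, lr=0.01):
        """Plain per-sample gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for x, y in data:
                hidden, out = self._forward(x)
                delta_out = out - y  # loss gradient at the output pre-activation
                for j, h in enumerate(hidden):
                    # Back-propagate through tanh using the pre-update weight.
                    delta_h = delta_out * self.w2[j] * (1.0 - h * h)
                    self.w2[j] -= lr * delta_out * h
                    self.b1[j] -= lr * delta_h
                    row = self.w1[j]
                    for i, xi in enumerate(x):
                        row[i] -= lr * delta_h * xi
                self.b2 -= lr * delta_out
```

In use, each sequence would first be converted to a numeric input vector, the positive and negative sets passed to `stratified_split`, the network trained on the returned training pairs, and `predict` compared against 0.5 on the held-out pairs. Inputs on very different scales (for example summed masses next to percent compositions) would likely need rescaling before training.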