Mike Arnoult 9/30/2010 The role of Artificial Neural Networks in Phage Research
What is an Artificial Neural Network?
  A mathematical and computational model
  Motivated by biological neurons
  Trained on features to learn the patterns and commonalities of a class of examples
  Uses the values (weights) of its neuron connections to classify an example
Why Apply Artificial Neural Networks to Phage Research?
  A neural network can be trained to recognize features of phage proteins and distinguish between them.
  I have trained ANNs to recognize and classify phage major capsid proteins.
What is a Bacteriophage?
  A virus that infects bacteria
  The most abundant biological entity on Earth
  Has a major impact on any environment that contains bacteria
  A virus with a distinctive structure that injects its genome into a host through its tail
  A possible alternative to antibiotics in medicine
How the ANN works:
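As a rough illustration of how connection values turn a feature vector into a class, here is a minimal Python sketch of a forward pass through a network with one hidden layer. The layer sizes, the tanh activation, the function names, and the hand-picked weights are all assumptions made for illustration. They are not the architecture or the trained values used in this work, which was done with Matlab.

```python
import math

def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer network with tanh units.

    hidden_weights holds one list of weights per hidden neuron.
    Returns a score between -1 and 1.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return math.tanh(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

def classify(score):
    """Map the network score onto the labels 1 (positive) and -1 (negative)."""
    return 1 if score >= 0 else -1

# Hypothetical two-feature example with hand-picked (untrained) weights.
score = forward(
    features=[0.2, 0.8],
    hidden_weights=[[1.0, -1.0], [0.5, 0.5]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, -1.0],
    output_bias=0.0,
)
print(classify(score))  # prints -1
```

Training adjusts the weights and biases so that the scores of known examples fall on the correct side of zero.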
Why Apply Artificial Neural Networks to Bioinformatics?
  A neural network can be trained to recognize features of proteins and distinguish between them.
  In my research, I will train neural networks to recognize phage major capsid or tail proteins.
What I’ve done so far:
  I’ve collected positive and negative data sets from NCBI.
  Positive data sets included phage major capsid proteins and synonyms:
    Major shell protein
    Major head protein
    Major coat protein
    Major procapsid protein
    Major prohead protein…
  Negative data sets included phage proteins unrelated to major capsid proteins:
    Packaging proteins
    Spike proteins
    DNA and RNA polymerases
    Assembly proteins
    Contractile sheath proteins
What I’ve done so far:
  I have written and used Perl scripts to filter the training data.
  Any sequences with conspicuously incorrect GenPept annotations were removed from the positive data set.
  All sequences with major capsid protein-related annotations were removed from the negative data set.
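The filtering step can be sketched as follows. This is a Python stand-in, not the Perl scripts themselves, and it assumes each record is an (annotation, sequence) pair. The regular expression covers only the synonyms listed on the earlier slide, the rule used for the positive set (keep only entries whose annotation matches a synonym) is one possible reading of "conspicuously incorrect", and the sample records are made up.

```python
import re

# Major capsid protein and the synonyms named in the positive data set.
CAPSID_PATTERN = re.compile(
    r"\bmajor\s+(capsid|shell|head|coat|procapsid|prohead)\s+protein\b",
    re.IGNORECASE,
)

def is_capsid_annotation(annotation):
    """True if the annotation text mentions a major capsid protein synonym."""
    return CAPSID_PATTERN.search(annotation) is not None

def filter_negative(records):
    """Drop negative-set records whose annotation is capsid-related."""
    return [(a, s) for a, s in records if not is_capsid_annotation(a)]

def filter_positive(records):
    """Keep positive-set records whose annotation matches a synonym (assumed rule)."""
    return [(a, s) for a, s in records if is_capsid_annotation(a)]

# Hypothetical records; the sequences are placeholders, not real proteins.
negative_raw = [
    ("DNA polymerase", "MKTLLV"),
    ("putative major capsid protein", "MASNDE"),
    ("tail spike protein", "MQLRST"),
]
negative_clean = filter_negative(negative_raw)
print(len(negative_clean))  # prints 2
```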
What I’ve done so far:
  I’ve converted the sequences into percent compositions of amino acids and side-chain groups, which are used to train the neural networks.
  Positive entries are labeled 1 and negative entries are labeled –1.
  Using a Matlab script, a random 20% of the positive data set is set aside and used as a test set against the other 80%.
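A minimal Python sketch of the feature and hold-out steps described above, assuming the features are the percentages of the 20 standard amino acids in each sequence. The function names, the fixed random seed, and the toy sequences are illustrative assumptions. The actual split is done by a Matlab script.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence):
    """Return 20 percentages, one per residue in AMINO_ACIDS order."""
    # Non-standard letters (e.g. X) are ignored in this sketch.
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(aa) / len(residues) for aa in AMINO_ACIDS]

def split_holdout(examples, test_fraction=0.2, seed=0):
    """Shuffle and set aside test_fraction of the examples as a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy positive set: (features, label) pairs with label 1.
toy_sequences = ["AACD", "MKTL", "GGGA", "WYFA", "DEKR",
                 "STNQ", "AVLI", "PPGA", "HKRD", "CCMA"]
positives = [(amino_acid_composition(s), 1) for s in toy_sequences]
train, test = split_holdout(positives)
print(len(train), len(test))  # prints 8 2
```

Fixing the seed makes the split repeatable. Using a different seed for each training run gives a different random 20% each time.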
What I’m doing now:
  To find which criteria are best suited to training the neural network to recognize phage major capsid proteins, I am:
    Training neural networks using different characteristics of amino acid side chains (polar, nonpolar, aromatic, positive, and negative)
    Adjusting the parameters that the Matlab script uses to train the neural networks
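The side-chain features can be sketched in Python as the percentage of residues falling into each of the five groups. The assignment of residues to groups below is a common textbook-style partition chosen for illustration. Textbooks differ on residues such as C, G, H, and Y, and the grouping actually used in this work is not stated here.

```python
# Assumed, non-overlapping grouping of the 20 standard amino acids.
SIDE_CHAIN_GROUPS = {
    "polar": "STNQC",
    "nonpolar": "AVLIMGP",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
}

def side_chain_composition(sequence):
    """Return the percentage of residues in each side-chain group."""
    sequence = sequence.upper()
    counts = {
        group: sum(1 for aa in sequence if aa in members)
        for group, members in SIDE_CHAIN_GROUPS.items()
    }
    total = sum(counts.values())  # residues outside all groups are ignored
    if total == 0:
        return {group: 0.0 for group in SIDE_CHAIN_GROUPS}
    return {group: 100.0 * n / total for group, n in counts.items()}

# Hypothetical four-residue sequence: two positive, two negative residues.
print(side_chain_composition("KKDE"))
```

Because the groups do not overlap, the five percentages sum to 100 for any sequence that contains at least one standard residue.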
Classification of Known Sequences:
  The values are average percentages of correctly classified sequences, over 1000 separately trained neural networks.
  [Table: percent of sequences correctly classified, by feature set. Feature sets: amino acid and side-chain percent compositions; amino acid percent compositions only (no side chains).]
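The reported figure is a mean of per-network accuracies. A minimal Python sketch of that arithmetic follows, assuming each trained network yields a list of predicted labels for the same test sequences. The function names and the three prediction lists are made up for illustration and are not results from this work.

```python
def percent_correct(predicted, actual):
    """Percentage of test sequences whose predicted label matches the true one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * matches / len(actual)

def mean_percent_correct(runs, actual):
    """Average percent_correct over the predictions of several trained networks."""
    scores = [percent_correct(predicted, actual) for predicted in runs]
    return sum(scores) / len(scores)

# Hypothetical labels and predictions from three separately trained networks.
actual = [1, 1, -1, -1]
runs = [
    [1, 1, -1, -1],   # 100% correct
    [1, -1, -1, -1],  # 75% correct
    [1, 1, 1, -1],    # 75% correct
]
average = mean_percent_correct(runs, actual)
print(round(average, 2))  # prints 83.33
```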
What I’m going to do soon:
  Test the neural networks using other phage major capsid proteins
    Ramy’s curated phage major capsid proteins
  Eventually verify the neural network predictions in the lab
THE END