Published by Mercy O’Connor. Modified over 9 years ago
1
Matching Protein β-Sheet Partners by Feedforward and Recurrent Neural Networks
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), pp. 25-36
P. Baldi, G. Pollastri, C. Anderson, and S. Brunak
Cho, Dong-Yeon
2
Introduction
Prediction of the Secondary Structure of Proteins
  A step toward understanding their three-dimensional conformations
  α-helices are built up from one contiguous region of the polypeptide chain.
  β-sheets are built up from a combination of several disjoint regions.
Previous Studies
  The best existing methods for predicting protein secondary structure achieve prediction accuracy in the 75-77% range.
  The β-sheet class is almost invariably the weakest category in terms of percentage correct.
Prediction of Amino Acid Partners in β-sheets
3
Data Preparation
Selecting the Data
  826 protein chains from the PDB select list of June 1998
Assigning β-Sheet Partners
  Partner pairs labeled in the figure: A2-B2, A3-B3, B2-C2, B3-C3, C2-D2, C3-D3
4
Statistical Analysis
First Order Statistics
  The frequency of occurrence of each amino acid
  General amino acid frequencies in the data
  Amino acid frequencies in β-sheets
5
The ratio of the amino acid frequencies in β-sheets to the general frequencies in the data
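The first order statistics on these slides can be illustrated with a short sketch. It assumes a sequence given as one-letter codes and a hypothetical boolean mask marking β-sheet residues; the function names and the toy data are illustrative and do not come from the paper.

```python
from collections import Counter

def frequencies(residues):
    """Return the relative frequency of each amino acid letter in an iterable."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def sheet_propensity(sequence, in_sheet):
    """Ratio of beta-sheet frequency to overall frequency for each amino acid.

    sequence: string of one-letter amino acid codes
    in_sheet: list of booleans, True where the residue lies in a beta-sheet
    """
    overall = frequencies(sequence)
    sheet = frequencies(aa for aa, flag in zip(sequence, in_sheet) if flag)
    # Amino acids never observed in a sheet get a ratio of 0.0.
    return {aa: sheet.get(aa, 0.0) / overall[aa] for aa in overall}

# Toy data (hypothetical, not taken from the paper).
seq = "VVAAGV"
mask = [True, True, False, False, False, True]
ratios = sheet_propensity(seq, mask)
```

A ratio above 1 means the residue is over-represented in β-sheets relative to the data as a whole; a ratio below 1 means it is under-represented.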
6
Second Order Statistics
  The conditional probabilities P(X|Y) of observing amino acid X given that its partner in a β-sheet is Y
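One way to estimate such conditional probabilities from a list of observed partner pairs is sketched below. Counting each pair in both directions is an assumption made here (it treats the partner relation as symmetric); the slides do not say how the counts were accumulated, and the pair list is toy data.

```python
from collections import defaultdict

def partner_conditionals(pairs):
    """Estimate P(X | Y) from a list of (x, y) partner pairs.

    Each pair is counted in both directions, which assumes the partner
    relation is symmetric (an assumption of this sketch).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for x, y in pairs:
        counts[y][x] += 1
        counts[x][y] += 1
    probs = {}
    for y, row in counts.items():
        total = sum(row.values())
        probs[y] = {x: n / total for x, n in row.items()}
    return probs  # probs[y][x] approximates P(X = x | partner = y)

# Toy partner pairs (hypothetical, not taken from the paper).
pairs = [("V", "I"), ("V", "V"), ("I", "L")]
p = partner_conditionals(pairs)
```

Each row `p[y]` is a probability distribution over partner residues, so its values sum to 1.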
7
Logo representation
8
Length Distribution
  Interval distances between paired β-strands, measured in residue positions along the chain
9
Artificial Neural Network Architecture
Feedforward Neural Network
  Large input windows
    They tend to dilute the sparse information in the input that is actually relevant to the prediction.
  Two-window approach
    The distance information can either be provided as a third input to the system, or a different architecture can be trained for each distance type.
10
The architecture
  Two input windows of length W
  The number D of amino acids separating the two windows is also given as an input unit, with scaled activity D/100.
  The goal is to output a probability reflecting whether the two amino acids located at the centers of the two windows are partners.
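The input described on this slide can be sketched as follows. The one-hot encoding over 20 residues, the all-zero padding beyond the chain ends, and measuring D as the difference between the two center positions are assumptions of this sketch; the slide only specifies two windows of length W and a unit with activity D/100.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def one_hot(aa):
    """20-dimensional one-hot vector; padding or unknown residues map to zeros."""
    return [1.0 if aa == a else 0.0 for a in AMINO_ACIDS]

def window(sequence, center, w):
    """Residues in a window of odd length w centered at `center`, padded with '-'."""
    half = w // 2
    return [sequence[k] if 0 <= k < len(sequence) else "-"
            for k in range(center - half, center + half + 1)]

def encode_pair(sequence, i, j, w=7):
    """Input vector for the candidate partners at positions i and j."""
    features = []
    for center in (i, j):
        for aa in window(sequence, center, w):
            features.extend(one_hot(aa))
    features.append(abs(j - i) / 100.0)  # scaled distance D/100 (assumed definition of D)
    return features

# Toy sequence (hypothetical): one of each residue.
x = encode_pair("ACDEFGHIKLMNPQRSTVWY", 3, 15, w=7)
```

With W = 7 the vector has 2 × 7 × 20 + 1 = 281 entries, which would feed the input layer of the feedforward network.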
11
Recurrent Neural Network
Bi-directional recurrent neural network (BRNN)
  Input layer
  Forward and backward Markov chains
  Output layer
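The forward and backward chains can be illustrated with a generic bidirectional recurrence. This is a minimal sketch, not the network from the paper: single-layer tanh state updates, a single sigmoid output per position, zero boundary states, and random untrained weights are all assumptions made for illustration.

```python
import math
import random

def dense(weights, vec, act):
    """Apply one fully connected layer: act(W @ vec), with W given as a list of rows."""
    return [act(sum(w * v for w, v in zip(row, vec))) for row in weights]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def brnn_forward(inputs, w_f, w_b, w_o, n_state):
    """Run a bidirectional recurrence over a sequence of input vectors."""
    T = len(inputs)
    fwd = [None] * T
    bwd = [None] * T
    state = [0.0] * n_state            # boundary state before the first position
    for t in range(T):                 # left-to-right chain
        state = dense(w_f, inputs[t] + state, math.tanh)
        fwd[t] = state
    state = [0.0] * n_state            # boundary state after the last position
    for t in reversed(range(T)):       # right-to-left chain
        state = dense(w_b, inputs[t] + state, math.tanh)
        bwd[t] = state
    # The output at position t sees the input plus both context summaries.
    return [dense(w_o, inputs[t] + fwd[t] + bwd[t], sigmoid)[0] for t in range(T)]

def random_matrix(rng, rows, cols):
    """Random weights in (-1, 1); a trained network would learn these."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_in, n_state = 3, 2
w_f = random_matrix(rng, n_state, n_in + n_state)
w_b = random_matrix(rng, n_state, n_in + n_state)
w_o = random_matrix(rng, 1, n_in + 2 * n_state)
seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
outputs = brnn_forward(seq, w_f, w_b, w_o, n_state)
```

The point of the two chains is that each output can draw on context from both sides of its position, rather than from a fixed-size window.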
12
Experiments and Results
Data
  The data are randomly split: 2/3 for training and 1/3 for testing.
  The classes are extremely unbalanced.
  At each epoch, all 37008 positive examples are presented together with 37008 randomly selected negative examples.
  The total balanced percentage is the average of the two percentages correct obtained on the positive and negative examples.
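The balanced sampling scheme and the balanced percentage can be sketched as below. The function names and the toy labels are hypothetical; only the procedure itself (all positives plus an equal number of random negatives per epoch, and the average of the two per-class percentages) comes from the slide.

```python
import random

def balanced_epoch(positives, negatives, rng):
    """All positives plus an equal-sized random sample of negatives, shuffled."""
    sampled = rng.sample(negatives, len(positives))
    epoch = [(x, 1) for x in positives] + [(x, 0) for x in sampled]
    rng.shuffle(epoch)
    return epoch

def balanced_percentage(labels, predictions):
    """Average of the percentage correct on positives and on negatives."""
    pos = [p == 1 for y, p in zip(labels, predictions) if y == 1]
    neg = [p == 0 for y, p in zip(labels, predictions) if y == 0]
    return 50.0 * (sum(pos) / len(pos) + sum(neg) / len(neg))

# Toy data (hypothetical): 5 positives, 100 negatives.
rng = random.Random(0)
epoch = balanced_epoch(list(range(5)), list(range(100, 200)), rng)

labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
preds = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
score = balanced_percentage(labels, preds)  # 50% on positives, 87.5% on negatives
```

In the toy example plain accuracy would be 80%, while the balanced percentage is 68.75%, which shows why the balanced measure is more informative when negatives dominate.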
13
Results Feedforward neural network The best architecture
14
The predicted second order statistics
15
Five-fold cross validation
BRNN Architecture
  Three values (7, 9, and 11) are used for the size of the two input windows.
  Length 7 again yields the best performance.
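A five-fold split can be sketched as follows. Splitting at the level of whole protein chains and assigning chains to folds by position are assumptions of this sketch; the slides do not describe how the folds were formed, and the chain identifiers are hypothetical.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) lists; each item appears in exactly one test fold."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

# Hypothetical chain identifiers standing in for the protein chains.
chains = [f"chain_{n}" for n in range(12)]
splits = list(k_fold_splits(chains, k=5))
```

Each candidate window size (7, 9, 11) would then be trained and evaluated once per fold, and the results averaged over the five folds.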
16
Five-fold cross validation
Ensemble architecture
  The ensemble of 3 BRNNs
  Five-fold cross validation
17
Summary of all the five-fold cross validation results
Profile approach
  Profiles were used as input to the artificial neural network.
  The overall performance is comparable, but no better.
  Profiles may provide more robust first order statistics, but weaker intrasequence correlations.
18
Discussion
We have developed an NN architecture that predicts β-sheet amino acid partners with a balanced performance close to 84% correct prediction.
It is insufficient by itself to reliably predict strand pairing because of the large number of false positive predictions.
Some directions for future work
  Profiles on the BRNNs
  Reduce the number of false positive predictions
  Improve the quality of the match
  Use of raw sequence information in addition to profiles
  β-sheet predictor
  Various combinations of the present architectures