Published by Maude Ford. Modified over 9 years ago.
Prediction of RNA-Protein Interfaces Using Structural Features

Fadi Towfic, David C. Gemperline, Cornelia Caragea, Feihong Wu, Drena Dobbs, and Vasant Honavar

Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science

MSCBB 2007

Acknowledgements: This research was supported in part by a grant from the National Institutes of Health (GM066387) to Vasant Honavar and Drena Dobbs, an Integrative Graduate Education and Research Training (IGERT) fellowship to Fadi Towfic, funded by the National Science Foundation grant (DGE 0504304) to Iowa State University, and a Bioengineering and Bioinformatics Summer Institute (BBSI) fellowship to David Gemperline, funded by a National Science Foundation award (EEC 0608769) to Iowa State University. This work has benefited from discussions with Dr. Robert Jernigan of Iowa State University.

Abstract

RNA-protein interactions play a critical role in gene expression: from splicing to translation, proteins must be able to recognize and interact with specific sites on RNA in order to perform their respective functions. In this paper, 147 different chains from RNA-binding proteins in the Protein Data Bank were characterized according to multiple structural features and the type of RNA bound to each protein chain. Furthermore, Naïve Bayes classifiers were constructed to predict protein-RNA interfaces on the surface residues of the proteins. The three structural features used in this study were surface roughness, solid angle and CX value.

Dataset and Classification

The protein chains in the RB147 dataset, available from the RNAbindr website (http://bindr.gdcb.iastate.edu/), were classified according to the type of RNA bound by each chain. The RNA types were then clustered using ANOVA, as described by Towfic et al. (2007); the resulting groups are shown in Table 1.
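The poster does not spell out how ANOVA was used to group the RNA types; those details are in Towfic et al. (2007). Purely as an illustration, the sketch below computes a one-way ANOVA F statistic over per-chain feature propensities grouped by RNA type. The function name `one_way_anova_f` and the propensity values are hypothetical, made up for this example, and are not taken from RB147.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict {label: [values]}.

    F = (between-group mean square) / (within-group mean square).
    """
    samples = [list(values) for values in groups.values()]
    k = len(samples)                      # number of groups
    n = sum(len(s) for s in samples)      # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = mean(x for s in samples for x in s)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy per-chain propensities (invented values, one list per RNA type).
toy_propensities = {
    "tRNA": [0.9, 1.1, 1.0],
    "rRNA": [1.0, 1.2, 0.8],
    "siRNA": [2.0, 2.2, 1.8],
}

# RNA types with similar means give a small F; adding a type whose mean
# differs gives a large F.
f_similar = one_way_anova_f({k: toy_propensities[k] for k in ("tRNA", "rRNA")})
f_all = one_way_anova_f(toy_propensities)
print(f_similar, f_all)
```

Under this reading, RNA types whose propensities are not separated by the F test would be candidates for the same group; whether that matches the authors' actual procedure is an assumption.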
A Naïve Bayes classifier (Witten and Frank, 2005) with a window size of 12 was then trained and evaluated with 10-fold cross-validation on each of the groups shown in Table 1.

A possible reason for the mixed effect of clustering reported in the Results (Table 2) is that the preliminary clustering using ANOVA may not have been sophisticated enough to identify subclusters that lie within each group. The poor clustering may have contributed to the poor classification performance by Naïve Bayes. However, each of the structural features had at least one cluster where classification performance increased compared to the “No clustering” baseline. This result demonstrates the potential of using more sophisticated clustering and classification algorithms to improve the performance of RNA-protein interface prediction.

| Structural Feature | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| CX Value (Alpha Carbon) | tRNA, mRNA | snRNA, rRNA, dsRNA, other | siRNA, SRP RNA, Viral RNA |
| Roughness Value | tRNA, SRP RNA, snRNA, rRNA, dsRNA | Viral RNA, siRNA, mRNA, other | |
| Solid Angle Value | tRNA, SRP RNA, snRNA, rRNA, dsRNA | Viral RNA, siRNA, mRNA, other | |

Table 1: Clustering of each RNA-binding type based on ANOVA analysis of the propensities for each chain.

| Method | Group | Accuracy | Correlation Coefficient | Sensitivity+ | Specificity+ |
|---|---|---|---|---|---|
| CX Value (Alpha Carbon) | No clustering | 0.724 | 0.196 | 0.346 | 0.399 |
| CX Value (Alpha Carbon) | Group 1 | 0.712 | 0.017 | 0.235 | 0.142 |
| CX Value (Alpha Carbon) | Group 2 | 0.645 | 0.204 | 0.408 | 0.524 |
| CX Value (Alpha Carbon) | Group 3 | 0.513 | -0.045 | 0.401 | 0.088 |
| Roughness Value | No clustering | 0.736 | 0.212 | 0.333 | 0.425 |
| Roughness Value | Group 1 | 0.720 | 0.236 | 0.362 | 0.477 |
| Roughness Value | Group 2 | 0.794 | 0.044 | 0.171 | 0.153 |
| Solid Angle Value | No clustering | 0.700 | 0.194 | 0.357 | 0.428 |
| Solid Angle Value | Group 1 | 0.709 | 0.250 | 0.374 | 0.517 |
| Solid Angle Value | Group 2 | 0.722 | 0.048 | 0.263 | 0.167 |

Table 2: Comparison of the performance of the Naïve Bayes classifier with and without clustering.
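The poster does not define the columns of Table 2. The sketch below assumes the usual confusion-matrix definitions with interface residues as the positive class: Sensitivity+ = TP/(TP+FN), Specificity+ = TP/(TP+FP), and the correlation coefficient taken to be the Matthews correlation coefficient. These definitions, the function name `interface_metrics`, and the residue counts are assumptions for illustration, not values from the study.

```python
from math import sqrt


def interface_metrics(tp, fp, tn, fn):
    """Compute the four Table 2 style measures from confusion-matrix counts.

    Positive class = interface residue. Definitions are assumed, see lead-in.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no residues to evaluate")

    accuracy = (tp + tn) / total
    # Fraction of true interface residues that were predicted as interface.
    sensitivity_plus = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of predicted interface residues that are true interface.
    specificity_plus = tp / (tp + fp) if (tp + fp) else 0.0

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    correlation = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "correlation": correlation,
        "sensitivity+": sensitivity_plus,
        "specificity+": specificity_plus,
    }


# Invented counts: 90 interface residues, 240 non-interface residues.
moderate = interface_metrics(tp=30, fp=40, tn=200, fn=60)

# A classifier that labels every residue as non-interface.
all_negative = interface_metrics(tp=0, fp=0, tn=240, fn=90)

print(moderate)
print(all_negative)
```

With these counts, the all-negative classifier scores a higher accuracy than the moderate one while its correlation coefficient is zero. When non-interface residues dominate, accuracy alone can therefore be misleading, which is one reason to read the accuracy column of Table 2 alongside the other three measures.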
Results

As shown in Table 2, clustering the RNA types improves the correlation coefficient, sensitivity+ and specificity+ in some cases (CX value Group 2, roughness value Group 1, solid angle value Group 1) while degrading them in others (CX value Group 3, roughness value Group 2, solid angle value Group 2) compared to the classifiers that do not use clustering. Accuracy does not track these changes consistently: it falls for CX value Group 2 (0.645 vs. 0.724) yet rises for roughness value Group 2 (0.794 vs. 0.736).

References

F. Towfic, D. C. Gemperline, C. Caragea, F. Wu, D. Dobbs, and V. Honavar. Structural Characterization of RNA-Binding Sites of Proteins: Preliminary Results. Computational Structural Bioinformatics Workshop proceedings, 2007. In press.

I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, 2005.