Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and Discovery Program.

Prediction of RNA-Protein Interfaces Using Structural Features

Fadi Towfic, David C. Gemperline, Cornelia Caragea, Feihong Wu, Drena Dobbs, and Vasant Honavar
Department of Computer Science, MSCBB 2007

Acknowledgements: This research was supported in part by a grant from the National Institutes of Health (GM066387) to Vasant Honavar and Drena Dobbs; an Integrative Graduate Education and Research Training (IGERT) fellowship to Fadi Towfic, funded by a National Science Foundation grant (DGE ) to Iowa State University; and a Bioengineering and Bioinformatics Summer Institute (BBSI) fellowship to David Gemperline, funded by a National Science Foundation award (EEC ) to Iowa State University. This work has benefited from discussions with Dr. Robert Jernigan of Iowa State University.

Abstract
RNA-protein interactions play a critical role in gene expression: from splicing to translation, proteins must recognize and interact with specific sites on RNA in order to perform their functions. In this work, 147 chains from RNA-binding proteins in the Protein Data Bank were characterized according to multiple structural features and the type of RNA bound to each chain. In addition, Naïve Bayes classifiers were constructed to predict protein-RNA interface residues among the surface residues of the proteins. The three structural features used in this study were surface roughness, solid angle, and CX value.

Dataset and Classification
The protein chains in the RB147 dataset, available from the RNAbindr website (…), were classified according to the type of RNA bound by each chain. The RNA types were then grouped on the basis of an ANOVA of the per-chain propensities for each structural feature, as described by Towfic et al. (2007); the resulting groups are shown in Table 1.
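The poster groups RNA types using an ANOVA of per-chain propensities but does not spell out the procedure. As a minimal sketch of the statistic involved, the code below computes the one-way ANOVA F statistic for a feature's propensity values split by RNA type. The function name and the propensity values are hypothetical, and the step that turns ANOVA results into the groups of Table 1 (for example, which pairwise comparisons were made) is not reproduced here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups maps a label (here, an RNA type) to a list of numeric observations
    (here, hypothetical per-chain propensity values). Returns (F, df_between, df_within).
    """
    samples = [list(v) for v in groups.values() if len(v) > 0]
    k = len(samples)
    n = sum(len(s) for s in samples)
    if k < 2 or n <= k:
        raise ValueError("need at least two non-empty groups and more observations than groups")
    grand_mean = mean(x for s in samples for x in s)
    group_means = [mean(s) for s in samples]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    f = float("inf") if ms_within == 0 else (ss_between / df_between) / ms_within
    return f, df_between, df_within


if __name__ == "__main__":
    # Made-up per-chain propensity values for three RNA types (illustration only).
    example = {
        "tRNA": [0.42, 0.47, 0.44, 0.50],
        "rRNA": [0.61, 0.58, 0.66],
        "mRNA": [0.45, 0.40, 0.49],
    }
    print(one_way_anova_f(example))
```

A large F suggests that mean propensity differs across RNA types. If SciPy is available, `scipy.stats.f_oneway(*samples)` returns the same F statistic together with its p-value.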
A Naïve Bayes classifier (Witten and Frank, 2005) was then trained and evaluated for each group in Table 1 using 10-fold cross-validation and a window size of 12.

Table 1: Grouping of RNA-binding types based on ANOVA of the propensities for each chain.
Structural feature | Group 1 | Group 2 | Group 3
CX value (alpha carbon) | tRNA, mRNA | snRNA, rRNA, dsRNA, other | siRNA, SRP RNA, viral RNA
Roughness value | tRNA, SRP RNA, snRNA, rRNA, dsRNA | viral RNA, siRNA, mRNA, other | (none)
Solid angle value | tRNA, SRP RNA, snRNA, rRNA, dsRNA | viral RNA, siRNA, mRNA, other | (none)

Table 2: Comparison of the performance of the Naïve Bayes classifier with and without clustering.
Method / Group | Accuracy | Correlation coefficient | Sensitivity+ | Specificity+
CX value (alpha carbon), no clustering
CX value (alpha carbon), Group 1
CX value (alpha carbon), Group 2
CX value (alpha carbon), Group 3
Roughness value, no clustering
Roughness value, Group 1
Roughness value, Group 2
Solid angle value, no clustering
Solid angle value, Group 1
Solid angle value, Group 2

Clustering improved performance for some groups but degraded it for others (see Results). A possible reason for this discrepancy is that the preliminary ANOVA-based clustering may not have been sophisticated enough to identify subclusters within each group, and this coarse clustering may have contributed to the poor classification performance of Naïve Bayes on some groups. Nevertheless, each structural feature had at least one group for which classification performance improved over the “No clustering” baseline. This result suggests that more sophisticated clustering and classification algorithms could improve RNA-protein interface prediction.
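The classification setup described above (Naïve Bayes, a sliding window, 10-fold cross-validation, and the four measures in Table 2) can be sketched as follows. This is a from-scratch stand-in, not the authors' implementation, and it rests on several assumptions the poster does not state. Each residue is represented by discretized structural-feature values (hypothetical "lo"/"hi" bins) of the residues in a window around it. `half_width` is left as a parameter because "window size of 12" could mean 12 neighbours per side or 12 residues in total. Folds are formed over whole chains, and predictions are pooled across folds. Sensitivity+ is taken as TP/(TP+FN) and specificity+ as TP/(TP+FP), with the interface (label 1) as the positive class. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

PAD = "-"  # placeholder for window positions that fall outside the chain


def window_features(values, half_width):
    """For each residue i, return the values at positions i-half_width .. i+half_width."""
    n = len(values)
    return [
        tuple(values[j] if 0 <= j < n else PAD for j in range(i - half_width, i + half_width + 1))
        for i in range(n)
    ]


class CategoricalNaiveBayes:
    """Naive Bayes over categorical window positions, with add-one smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][j][v]: training examples of class c having value v at window position j
        self.counts = {c: defaultdict(Counter) for c in self.class_counts}
        self.seen = defaultdict(set)  # seen[j]: values observed at position j
        for x, c in zip(X, y):
            for j, v in enumerate(x):
                self.counts[c][j][v] += 1
                self.seen[j].add(v)
        return self

    def predict(self, x):
        best, best_score = None, -math.inf
        for c, n_c in sorted(self.class_counts.items()):
            score = math.log(n_c / self.n)  # log prior
            for j, v in enumerate(x):
                k = len(self.seen[j]) + 1  # one extra slot for values unseen in training
                score += math.log((self.counts[c][j][v] + 1) / (n_c + k))
            if score > best_score:
                best, best_score = c, score
        return best


def evaluate(y_true, y_pred):
    """Accuracy, Matthews correlation, sensitivity+ and specificity+ (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "correlation": (tp * tn - fp * fn) / denom if denom else 0.0,
        "sensitivity+": tp / (tp + fn) if tp + fn else 0.0,  # TP / (TP + FN)
        "specificity+": tp / (tp + fp) if tp + fp else 0.0,  # TP / (TP + FP), assumed definition
    }


def cross_validate(chains, half_width, k=10, seed=0):
    """k-fold CV over whole chains; chains is a list of (values, labels) pairs."""
    order = list(range(len(chains)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    y_true, y_pred = [], []
    for test_ids in folds:
        held_out = set(test_ids)
        X_train, y_train = [], []
        for i, (values, labels) in enumerate(chains):
            if i not in held_out:
                X_train.extend(window_features(values, half_width))
                y_train.extend(labels)
        model = CategoricalNaiveBayes().fit(X_train, y_train)
        for i in test_ids:
            values, labels = chains[i]
            y_true.extend(labels)
            y_pred.extend(model.predict(x) for x in window_features(values, half_width))
    return evaluate(y_true, y_pred)


if __name__ == "__main__":
    # Toy, made-up data: a residue is an interface residue exactly when its own bin is "hi".
    rng = random.Random(1)
    chains = []
    for _ in range(20):
        values = [rng.choice(["lo", "hi"]) for _ in range(30)]
        chains.append((values, [int(v == "hi") for v in values]))
    print(cross_validate(chains, half_width=12, k=10))
```

Splitting folds by chain rather than by residue keeps neighbouring, highly correlated residues of one chain from appearing in both training and test sets. To evaluate one group from Table 1, pass only the chains that bind that group's RNA types.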
Results
As shown in Table 2, clustering the RNA types appears to improve accuracy, correlation coefficient, sensitivity, and specificity in some cases (CX value (alpha carbon) group 2, roughness value group 1, solid angle value group 1). In other cases (CX value (alpha carbon) group 3, roughness value group 2, solid angle value group 2) it contributes to poorer performance than the classifiers that do not use clustering.

References
F. Towfic, D. C. Gemperline, C. Caragea, F. Wu, D. Dobbs, and V. Honavar. Structural Characterization of RNA-Binding Sites of Proteins: Preliminary Results. Computational Structural Bioinformatics Workshop proceedings, in press.
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. Morgan Kaufmann, 2005.