Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and Discovery Program.

Department of Computer Science
Sixth Annual Joint Bioinformatics Symposium 2006

Machine Learning Versus Profile-Based Methods for Protein Phosphorylation Site Prediction
Yasser EL-Manzalawy, Cornelia Caragea, Drena Dobbs, and Vasant Honavar

Acknowledgements: This work is supported in part by grants from the National Science Foundation (IIS ) and the National Institutes of Health (GM ) to Vasant Honavar.

Prediction of Phosphorylation Sites: Motivation
Protein phosphorylation, performed by protein kinases, is a key process in signal transduction pathways. Predicting phosphorylation sites is an essential step towards understanding phosphorylation, which in turn is essential for understanding diseases and, ultimately, for designing drugs that can prevent or cure them.

In this study, we empirically compare a number of machine learning (ML) and profile-based methods for predicting kinase-specific protein phosphorylation sites.

We propose a method for combining PSSM profiles and ML approaches. Our proposed method yields fast and simple classifiers that consistently outperform profile-based methods for predicting kinase-specific phosphorylation sites.

Fig. 1: Addition of a phosphate to an amino acid
Fig. 2: Conformation changes caused by phosphorylation

Phospho.ELM Data Set
Phospho.ELM is a resource containing 1805 proteins from different species, covering 1372 Tyr, 3175 Ser, and 767 Thr experimentally verified phosphorylation sites manually curated from the literature. We constructed separate data sets for the kinase families that are well represented in the database, i.e., those known to recognize more than 50 phosphorylation sites (see Table 1).

Table 1: Kinase families considered in our study and the number of Ser and Thr sites known to be phosphorylated
Table 1 columns (kinase families): CDK, CK2, MAPK, PKA, PKB, PKC. Rows: Ser, Thr, Total.

Sequence-Based Machine Learning Methods
The features for each Ser or Thr are derived from a window of n amino acids (n = 15) centered on that residue. Each window is encoded as a 20*n binary vector whose entries denote whether a particular amino acid appears at a particular position. Using this binary encoding, we evaluate the performance of three machine learning algorithms: Support Vector Machine with Gaussian kernel (Bin(SVM)), Naïve Bayes (Bin(NB)), and Decision Tree (Bin(C4.5)).

PSSM-Based Representation: Our Approach (PSSMPhos)
PSSMPhos combines profile-based and machine learning approaches. PSSM motifs are obtained for each kinase family as in the Basic PSSM approach. Each window $x = x_1 \ldots x_n$ is encoded as an (n+1)-dimensional vector using the computed PSSM,

$(e_1(x_1), \ldots, e_n(x_n), \mathrm{Score}(x))$,

where $e_i(x_i)$ is the PSSM emitted score of observing amino acid $x_i$ at position $i$ and $\mathrm{Score}(x) = \sum_{i=1}^{n} e_i(x_i)$ is the sum of the n emitted PSSM scores. Kinase-specific classifiers (PSSMPhos(SVM), PSSMPhos(NB), PSSMPhos(C4.5)) are trained on this PSSM-based representation.

Results
Table 2 compares the performance of ML methods against profile-based methods for predicting kinase-specific phosphorylation sites. We also report the ROC curves for BasicPSSM and BasicHMM in Fig. 3.

Table 2: Prediction accuracy of different methods using 5-fold cross-validation
Table 2 columns (methods): BasicHMM, BasicPSSM, PSSMPhos(SVM), PSSMPhos(NB), PSSMPhos(C4.5), Bin(SVM), Bin(NB), Bin(C4.5), Scansite (low), Scansite (med), Scansite (high), KinasePhos (default), KinasePhos (90). Rows (kinase families): CDK, CK2, MAPK, PKA, PKB, PKC.

Fig. 3: Comparison of ROC curves for BasicPSSM and BasicHMM for the six kinase families considered

Conclusions
We proposed PSSMPhos, a method for combining PSSM profiles and ML methods. Our study demonstrates the superiority of ML over profile-based methods when enough training data is available. Our experiments suggest that ML methods and profile-based methods should complement each other to produce more efficient phosphorylation site prediction tools.
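The binary window encoding described under Sequence-Based Machine Learning Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the amino acid ordering, and the use of a padding character "X" for windows that run past the sequence ends are all assumptions, since the poster does not say how sites near the termini were handled.

```python
# Illustrative sketch only; names and padding strategy are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is arbitrary)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def extract_window(sequence, center, n=15, pad="X"):
    """Return the n-residue window centered on sequence[center].

    Positions beyond either end of the sequence are filled with `pad`
    (an assumption made here so that every window has length n).
    """
    half = n // 2
    padded = pad * half + sequence + pad * half
    # In `padded`, the original index `center` sits at `center + half`,
    # so the window starts at `center`.
    return padded[center:center + n]


def one_hot_window(window):
    """Encode a window as a flat 20*n binary vector.

    Entry (pos * 20 + k) is 1 if amino acid k occurs at position pos.
    Characters outside the 20 standard residues (such as the pad
    character) leave their 20 slots at zero.
    """
    vec = [0] * (20 * len(window))
    for pos, aa in enumerate(window):
        k = AA_INDEX.get(aa)
        if k is not None:
            vec[pos * 20 + k] = 1
    return vec
```

For n = 15 this produces 300-dimensional vectors, which could then be passed to any standard SVM, Naïve Bayes, or decision tree implementation.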
Profile-Based Approaches
Scansite: A web service that uses 63 experimentally developed motifs, represented as PSSMs, to identify potential Ser/Thr phosphorylation sites.
KinasePhos: Another web service, which uses kinase-specific HMMs for prediction.
Basic PSSM: Our implementation of PSSM motifs using the PROFILEWEIGHT program.
Basic HMM: Our implementation of HMM motifs using the HMMER software package.
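The (n+1)-dimensional PSSMPhos encoding, with one emitted score per window position plus their sum, can be sketched as below. The `build_pssm` helper is a hypothetical stand-in that computes simple log-odds scores against a uniform background with pseudocounts; it is not the PROFILEWEIGHT algorithm used on the poster and is included only so the example is self-contained. The function names and the `unknown_score` parameter are likewise assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows, pseudocount=1.0):
    """Toy PSSM: per-position log-odds versus a uniform background.

    Hypothetical stand-in for a real profile builder; returns a list of
    n dicts mapping amino acid -> score.
    """
    n = len(windows[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for i in range(n):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for w in windows:
            if w[i] in counts:
                counts[w[i]] += 1.0
        total = sum(counts.values())
        pssm.append({aa: math.log((c / total) / background)
                     for aa, c in counts.items()})
    return pssm


def pssm_features(window, pssm, unknown_score=0.0):
    """Encode a window as [e_1(x_1), ..., e_n(x_n), Score(x)].

    Score(x) is the sum of the n emitted scores. Residues missing from
    the PSSM get `unknown_score` (an assumption of this sketch).
    """
    scores = [pssm[i].get(aa, unknown_score) for i, aa in enumerate(window)]
    return scores + [sum(scores)]
```

A window of length 15 thus yields 16 features instead of the 300 produced by the binary encoding, which is one plausible reason the resulting classifiers are described as fast and simple.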