Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and Discovery Program.

Artificial Intelligence Research Laboratory; Bioinformatics and Computational Biology Program; Computational Intelligence, Learning, and Discovery Program; Department of Computer Science

Rocky 2006

Acknowledgements: This work is supported in part by a grant from the National Institutes of Health (GM ) to Vasant Honavar & Drena Dobbs.

Glycosylation Site Prediction using Machine Learning Approaches

Cornelia Caragea, Jivko Sinapov, Adrian Silvescu, Drena Dobbs and Vasant Honavar

Biological Motivation

Glycosylation is one of the most complex post-translational modifications (PTMs). It is the site-specific enzymatic addition of saccharides to proteins and lipids. Most proteins in eukaryotic cells undergo glycosylation.

[Figure: Types of Glycosylation. An amino acid sequence drawn from the H3N+ terminus to the COO- terminus, with candidate sites labelled "Non-Glycosylated?" or "Glycosylated?" and, if glycosylated, "N-linked?", "O-linked?" or "C-linked?".]

Problem: Predict glycosylation sites from amino acid sequence.

Previous Approaches

Trained Neural Networks used in the netOglyc prediction server (Hansen et al., 1995)
  Dataset: mucin type O-linked glycosylation sites in mammalian proteins

Trained SVMs based on physical properties, a 0/1 system, and a combination of these two (Li et al., 2006)
  Dataset: mucin type O-linked glycosylation sites in mammalian proteins
  Negative examples extracted from sequences with no known glycosylated sites
  Trained/tested using different ratios of positive and negative sites

Our Approach

We investigate 3 types of glycosylation and use an ensemble classifier approach.
  Dataset: N-, C- and O-linked glycosylation sites in proteins from several different species: human, rat, mouse, insect, worm, horse, etc.
  Negative examples extracted from sequences with at least one experimentally verified glycosylated site

Dataset

O-GlycBase v6.00: O-, N- & C-glycosylated proteins with 242 glycosylated entries, available at

  Columns: Glycosylation Type, Positive Sites, Negative Sites
  O-Linked (S/T)
  N-Linked (N)
  C-Linked (W) 4773
  Total

Training an ensemble classifier

[Figure: Training an ensemble classifier. Train DB -> Sampling -> train -> Bag of Trained Classifiers; Test DB + Bag of Trained Classifiers -> Weighted Majority Vote -> Predictions.]

Classifiers

SVM: 0/1 String Kernel; Substitution Matrix Kernel; Blast - Polynomial Kernel
J48
Naïve Bayes
Identity windows; Identity plus additional information

[Figure: Glycosylation types. N-linked glycosylation: N-acetylglucosamine (N-GlcNAc). O-linked glycosylation: O-N-acetylgalactosamine (O-GalNAc), O-N-acetylglucosamine (O-GlcNAc), O-fucose, O-glucose, O-mannose, O-hexose, O-xylose. C-mannosylation: C-mannose. GPI anchor.]

Results

[Figures: ROC Curves for N-Linked; ROC Curves for O-Linked; ROC Curves for C-Linked; Comparison of ROC Curves for single and ensemble classifier.]

Conclusion

In this work we addressed the problem of predicting glycosylation sites. Three types of machine learning algorithms were used: support vector machines (SVM), Naïve Bayes (NB), and decision trees (DT). We built predictive ensemble classifiers based on data corresponding to three forms of glycosylation: O-, N-, and C-Linked glycosylation. Our experiments show encouraging results.
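
The poster names "Identity windows", a "0/1" encoding, and a "Weighted Majority Vote" but gives no implementation details. The following is a minimal Python sketch of how those three pieces might fit together, not the authors' code. The window half-width, the padding symbol, the classifier weights, and all function names are assumptions made for illustration; sampling of the training database and training of the base classifiers are left out.

```python
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for windows that run past a sequence end


def extract_windows(sequence: str, targets: str, half_width: int) -> List[Tuple[int, str]]:
    """Return (position, window) for every residue of `sequence` found in `targets`.

    `targets` would be "ST" for O-linked, "N" for N-linked and "W" for C-linked
    candidate sites. `half_width` is an assumed parameter.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # The window is centred on residue i of the original sequence.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def one_hot(window: str) -> List[int]:
    """0/1 encoding: 20 indicator bits per position; padding becomes all zeros."""
    bits: List[int] = []
    for residue in window:
        bits.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return bits


def weighted_majority_vote(votes: Sequence[int], weights: Sequence[float]) -> int:
    """Combine +1/-1 votes from the bag of classifiers.

    Returns +1 (glycosylated) if the weighted sum is positive, otherwise -1.
    How the weights are chosen is an assumption left to the caller.
    """
    score = sum(w * v for v, w in zip(votes, weights))
    return 1 if score > 0 else -1
```

For example, `extract_windows("MKNLSTW", "ST", 2)` yields one five-residue window centred on the serine and one on the threonine, and each window can be passed through `one_hot` before being handed to a base classifier. A natural choice for the vote weights would be each classifier's accuracy on held-out data, though the poster does not say which weighting was used.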