Transmembrane Protein Topology Prediction Using Support Vector Machines Tim Nugent and David Jones Bioinformatics Group, Department of Computer Science,

Transmembrane Protein Topology Prediction Using Support Vector Machines
Tim Nugent and David Jones
Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT

Introduction

Alpha-helical transmembrane (TM) proteins constitute roughly 30% of a typical genome and are involved in a wide variety of important biological processes. However, because high-quality crystals are experimentally difficult to obtain, this class of protein is severely under-represented in structural databases, making up only 1% of known structures in the PDB. Given the biological and pharmacological importance of TM proteins, an understanding of their topology (the total number of TM helices, their boundaries and their in/out orientation relative to the membrane) is essential for structural and functional analysis, and for directing further experimental work. In the absence of structural data, bioinformatic strategies thus turn to sequence-based prediction methods.

Signal Peptides and Re-entrant Helices

One problem faced by modern topology predictors is the discrimination between TM helices and other features composed largely of hydrophobic residues. These include targeting motifs such as signal peptides and signal anchors, amphipathic helices, and re-entrant helices: membrane-penetrating helices that enter and exit the membrane on the same side, common in many ion channel families (Figure 1). The high similarity between the hydrophobic profile of such features and that of a TM helix frequently leads to crossover between the different types of predictions. Should these elements be predicted as TM helices, the ensuing topology prediction is likely to be disrupted.

Figure 1. A chain from a potassium channel (PDB code 1r3j) showing a re-entrant helix, thought to function as a selectivity filter.

A Novel Topology Predictor

We have developed a new TM topology predictor, trained and benchmarked with full cross-validation on a novel data set of 131 sequences with topologies derived solely from crystal structures. The method uses evolutionary information and four support vector machine (SVM) classifiers, combining their outputs with a dynamic programming algorithm to return a list of predicted topologies ranked by overall likelihood. It incorporates signal peptide and re-entrant helix prediction.

In training the SVMs, PSI-BLAST profiles were generated for each sequence and a sliding window approach was applied, with values normalised by Z-score to improve convergence time. Jackknife cross-validation was used to assess SVM performance, with sequences of >25% sequence identity removed from the training sets. Window size and SVM parameters were optimised using the Matthews correlation coefficient (Table 1).

Table 1. Per-residue SVM prediction accuracy. MCC: Matthews correlation coefficient.

A modified version of the original MEMSAT dynamic programming algorithm was used, treating TM helices as discrete units rather than separating them into inside, outside and middle components. Re-entrant helix and signal peptide states were added. Residues were therefore predicted to lie in one of five topological regions: inside loop, outside loop, TM helix, re-entrant helix and signal peptide.

The SVM-based method ('TMSVM') was benchmarked against a selection of leading topology predictors (Table 2), scoring 89% overall accuracy, an improvement of 10% over the next best method. TMSVM detected signal peptides with 92% accuracy and re-entrant helices with 44% accuracy, with no false positives predicted.

Table 2. Overall prediction accuracy. OCTOPUS [1] results were not cross-validated and are therefore likely to be overestimated, as there is considerable overlap between the test and training sets.

The graphical output from the program is shown in Figure 2.

Figure 2. Results for Ubiquinol Oxidase showing correct topology and signal peptide prediction. The raw SVM scores are shown below the topology schematic.

Discriminating between Globular and Transmembrane Proteins

An additional SVM was trained to discriminate between globular and transmembrane proteins, using a data set of 2685 non-redundant chains from globular proteins of known structure, combined with our novel set of 131 TM proteins. PSI-BLAST profiles were generated for all sequences and 10-fold cross-validation was used to assess performance. A false positive (FP) rate of 0% and a false negative (FN) rate of 0.4% were achieved, improving on the MEMSAT3 [2] neural network-based method (0.5% FP, 0.5% FN).

Figure 3. Ten genomes were filtered using the TM/globular discriminator. Those predicted to be TM proteins were subjected to full TM topology prediction. X-axis: TM helix count. Y-axis: number of proteins.

Conclusions

Overall, the method predicted the correct topology and location of TM helices for 89% of the test set, a significant improvement over recent methods. The SVM trained to discriminate between TM and globular proteins achieved a false positive rate of 0% and a false negative rate of 0.4%, making this method highly suitable for whole-genome analysis (Figure 3). However, there is still room for significant improvement in the detection of re-entrant helices.

[1] Viklund H., Elofsson A. (2008) OCTOPUS: Improving topology prediction by two-track ANN-based preference scores and an extended topological grammar. Bioinformatics (In press).
[2] Jones D.T. (2007) Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics 23:

Due to the paucity of alpha-helical transmembrane protein crystal structures, in silico approaches are essential for structural analysis.
We present a support vector machine-based topology predictor that integrates both signal peptide and re-entrant helix prediction, and present the results of application to a number of complete genomes.

This work was funded by the Biotechnology and Biological Sciences Research Council.
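
The sliding-window encoding of Z-score-normalised profiles mentioned in the predictor description could look roughly like the sketch below. It is only an illustration under stated assumptions: the poster does not say how the profile is stored, whether normalisation is per column or over the whole training set, or how the sequence termini are padded. The function names `zscore_columns` and `sliding_windows`, the per-column normalisation and the zero padding are all hypothetical choices.

```python
import math


def zscore_columns(profile):
    """Normalise each column of a (residues x features) profile to zero mean
    and unit variance. ASSUMPTION: per-column normalisation within one profile."""
    n_rows = len(profile)
    n_cols = len(profile[0])
    means = [sum(row[j] for row in profile) / n_rows for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((row[j] - means[j]) ** 2 for row in profile) / n_rows
        # A constant column has zero spread; use 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(variance) or 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)]
            for row in profile]


def sliding_windows(profile, window):
    """Return one flat feature vector per residue, built from a window centred
    on that residue. ASSUMPTION: positions beyond the termini are zero-padded."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    pad = [0.0] * len(profile[0])
    vectors = []
    for centre in range(len(profile)):
        vector = []
        for k in range(centre - half, centre + half + 1):
            vector.extend(profile[k] if 0 <= k < len(profile) else pad)
        vectors.append(vector)
    return vectors
```

Each returned vector has `window * n_columns` entries and would be paired with the topological label of the central residue when training a per-residue classifier.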
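
For reference, the Matthews correlation coefficient used to tune the SVMs, and the false positive and false negative rates quoted for the TM/globular discriminator, can be computed from a confusion matrix as sketched here. The function names are hypothetical, and the rate definitions (FP over all true negatives, FN over all true positives) are the conventional ones; the poster does not state which definitions it used.

```python
import math


def confusion_counts(true_labels, predicted_labels):
    """Count true/false positives and negatives for binary labels (truthy = positive)."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def error_rates(tp, fp, tn, fn):
    """False positive rate FP/(FP+TN) and false negative rate FN/(FN+TP)."""
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    fn_rate = fn / (fn + tp) if (fn + tp) else 0.0
    return fp_rate, fn_rate
```

MCC ranges from -1 to 1 and stays informative when the classes are unbalanced, as they are for per-residue labels such as re-entrant helix, which is one common reason for preferring it to raw accuracy.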
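
The idea of combining per-residue scores by dynamic programming while treating TM helices as discrete units can be illustrated with a heavily simplified two-state sketch. This is not the modified MEMSAT algorithm: it ignores inside/outside orientation, signal peptides and re-entrant helices, and it assumes a single per-residue score that is positive for helix-like residues and negative otherwise. The function name `best_helices`, the scoring scheme and the default length limits are assumptions made for the example.

```python
def best_helices(scores, min_len=17, max_len=25):
    """Place non-overlapping helices, separated by at least one loop residue,
    so that the total score is maximal. A helix residue contributes +score,
    a loop residue contributes -score. Returns half-open (start, end) pairs."""
    n = len(scores)
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)

    neg_inf = float("-inf")
    loop = [neg_inf] * (n + 1)    # best score of prefix i ending in a loop residue
    helix = [neg_inf] * (n + 1)   # best score of prefix i ending with a whole helix
    loop_after_helix = [False] * (n + 1)
    helix_start = [None] * (n + 1)
    loop[0] = 0.0                 # the empty prefix counts as "in a loop"

    for i in range(1, n + 1):
        # Residue i-1 labelled as loop, following either a loop or a helix.
        loop_after_helix[i] = helix[i - 1] > loop[i - 1]
        loop[i] = max(loop[i - 1], helix[i - 1]) - scores[i - 1]
        # Residues start..i-1 labelled as one helix, entered from a loop state.
        for length in range(min_len, min(max_len, i) + 1):
            start = i - length
            candidate = loop[start] + prefix[i] - prefix[start]
            if candidate > helix[i]:
                helix[i] = candidate
                helix_start[i] = start

    # Trace back from the better of the two final states.
    helices = []
    i = n
    in_helix = helix[n] > loop[n]
    while i > 0:
        if in_helix:
            start = helix_start[i]
            helices.append((start, i))
            i = start
            in_helix = False
        else:
            in_helix = loop_after_helix[i]
            i -= 1
    helices.reverse()
    return helices
```

Scoring whole segments, rather than individual residues, is what lets length constraints be enforced on each helix. A fuller version would add further states and keep several high-scoring paths in order to return a ranked list of topologies.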