Using Support Vector Machines for transmembrane protein topology prediction Tim Nugent.

Alpha-helical Transmembrane Proteins
Transmembrane proteins fulfil many critical cellular functions and comprise about 30% of the human proteome. They are composed of hydrophobic, membrane-spanning alpha-helices connected by loop regions. Because they are poorly represented in structural databases, predicting their structure and topology is an important challenge for bioinformatics.

Transmembrane Protein Topology
The topology of a transmembrane protein describes which portions of the amino-acid sequence lie within the plane of the surrounding lipid bilayer and which portions protrude into the watery environment on either side. It is defined by the regions of the polypeptide chain that span the membrane and by the position of the N-terminus relative to the membrane.

Identification of Transmembrane Regions
To generate data for a hydropathy plot, the protein sequence is scanned with a moving window spanning a fixed number of residues. At each position, the mean hydrophobic index of the amino acids within the window is calculated, and that value is plotted at the midpoint of the window.
Aquaporin V KGVWTQAFWKAVTAEFLAMLIFVLLSVGSTINWGGSEN
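The moving-window calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the scale is the standard Kyte-Doolittle hydropathy index, the window of 19 residues is an assumption (the slide text does not give the size), and `hydropathy_profile` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (midpoint_index, mean_hydropathy) for every full window.

    window=19 is an assumed value; the slide does not state the size.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        # The mean is assigned to the residue at the centre of the window.
        profile.append((start + half, mean))
    return profile


# Sequence fragment shown on the slide.
seq = "KGVWTQAFWKAVTAEFLAMLIFVLLSVGSTINWGGSEN"
profile = hydropathy_profile(seq, window=19)
peak_pos, peak_val = max(profile, key=lambda p: p[1])
print(f"most hydrophobic window is centred on residue index {peak_pos}")
```

Windows whose mean exceeds some cut-off are then treated as candidate membrane-spanning segments; the cut-off itself is a tunable choice and is not given in the slides.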

Discriminating between Inside and Outside Loops
Hydrophobic residues: Val, Phe, Ile, Leu, Met.
Positively charged residues: Lys, Arg, His.
Cytoplasmic loops are enriched in positively charged residues: the 'positive-inside rule' of von Heijne.
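As a rough illustration of how the positive-inside rule can orient a protein, the sketch below counts positively charged residues in alternating loops and places the richer side in the cytoplasm. It assumes the loops have already been separated from the helices, uses the slide's list of positive residues (Lys, Arg, His), and runs on made-up loop sequences; `predict_n_terminus` is a hypothetical helper, not part of any published method.

```python
# Positively charged residues as listed on the slide: Lys, Arg, His.
POSITIVE = set("KRH")


def count_positive(segment):
    """Number of positively charged residues in a loop sequence."""
    return sum(1 for aa in segment if aa in POSITIVE)


def predict_n_terminus(loops):
    """Guess whether the N-terminus is inside ('in') or outside ('out').

    `loops` lists the loop sequences from N- to C-terminus. Consecutive
    loops are separated by one TM helix, so they alternate sides: loops
    0, 2, 4, ... lie on the same side as the N-terminus.
    """
    n_side = sum(count_positive(loop) for loop in loops[0::2])
    other_side = sum(count_positive(loop) for loop in loops[1::2])
    if n_side > other_side:
        return "in"
    if other_side > n_side:
        return "out"
    return "undetermined"


# Hypothetical loops, for illustration only.
example_loops = ["MKRRLSK", "DSGTE", "RKHLRR", "NDE"]
print(predict_n_terminus(example_loops))  # the N-terminal side carries all the charge
```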

Using Evolutionary Information
(1) PSI-BLAST takes a single protein sequence as input and compares it to a protein database.
(2) The program constructs a multiple alignment, and then a profile, from any significant local alignments found.
(3) The profile is compared to the protein database, again seeking local alignments.
(4) PSI-BLAST estimates the statistical significance of the local alignments found.
(5) Finally, PSI-BLAST iterates, by returning to step (2), an arbitrary number of times or until convergence.

Using Support Vector Machines for Topology Prediction
Earlier approaches relied on physicochemical properties such as hydrophobicity to identify transmembrane helices (e.g. Kyte-Doolittle). More recently, methods using machine learning algorithms such as hidden Markov models (e.g. TMHMM, PHOBIUS) and neural networks (MEMSAT3) have achieved significant improvements in prediction accuracy (~80%). However, none of the top-scoring methods use SVMs. While hidden Markov models and neural networks may have multiple outputs, SVMs are binary classifiers. To deal with TM topology prediction, multiple SVMs have to be combined, e.g.:
TM helix / Loop
Inside Loop / Outside Loop
Signal Peptide / TM helix
Re-entrant Loop / TM helix
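To make the idea of combining binary classifiers concrete, here is a deliberately naive sketch. It assumes each SVM yields one signed decision value per residue (positive meaning "TM helix" for the first classifier and "inside loop" for the second) and labels residues by simple thresholding. The function name, the scores and the zero threshold are all hypothetical. The slides list the actual combination step as further work, so this is not the presenter's method.

```python
def label_residues(helix_scores, inside_scores, threshold=0.0):
    """Combine two per-residue binary decision values into M / i / o labels.

    helix_scores:  > threshold means TM helix, otherwise loop.
    inside_scores: > threshold means inside loop, otherwise outside loop.
    The helix/loop decision takes priority; the inside/outside decision
    is consulted only for residues already called loop.
    """
    if len(helix_scores) != len(inside_scores):
        raise ValueError("score lists must have one value per residue")
    labels = []
    for helix, inside in zip(helix_scores, inside_scores):
        if helix > threshold:
            labels.append("M")   # membrane helix
        elif inside > threshold:
            labels.append("i")   # inside (cytoplasmic) loop
        else:
            labels.append("o")   # outside loop
    return "".join(labels)


# Made-up decision values for a five-residue toy example.
toy_labels = label_residues([-1.0, -1.0, 1.0, 1.0, -1.0],
                            [1.0, 1.0, 0.2, -0.3, -1.0])
print(toy_labels)
```

Per-residue thresholding can produce inconsistent topologies (for example, the same side on both ends of a helix), which is one reason a more careful combination scheme would be needed in practice.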

Helix / Loop SVM Prediction Accuracy
TM helix / Loop SVM:
Database of 135 non-redundant protein sequences
Jack-knife cross-validation
PSI-BLAST profiles, normalised by Z-score
33-residue sliding window
Radial basis function kernel: gamma = 0.09, C = 0.8
Matthews correlation coefficient (MCC):
SVM: 0.82 (TP = 9129, FP = 1351, TN = 22140, FN = 1320)
Kyte-Doolittle: 0.66
MEMSAT3: 0.76
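The reported helix/loop MCC can be checked directly from the confusion-matrix counts on this slide. The sketch below uses the standard definition of the Matthews correlation coefficient; the precision and recall values are derived from the same counts and are not figures quoted in the presentation.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Per-residue counts reported for the TM helix / Loop SVM.
tp, fp, tn, fn = 9129, 1351, 22140, 1320

mcc = matthews_corrcoef(tp, fp, tn, fn)
precision = tp / (tp + fp)  # derived here, not quoted on the slide
recall = tp / (tp + fn)     # derived here, not quoted on the slide

print(round(mcc, 2))  # 0.82, matching the slide
```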

Inside Loop / Outside Loop SVM Prediction Accuracy
Inside Loop / Outside Loop SVM:
33-residue sliding window
Matthews correlation coefficient = 0.64
Precision = 0.86
Recall = 0.59
Signal Peptide / TM Helix and Re-entrant Loop / TM Helix SVMs: in training.
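One plausible way to turn a per-residue profile into SVM input vectors with a 33-residue sliding window is sketched below, together with the radial basis function kernel using the gamma value quoted for the helix/loop SVM. Several details are assumptions rather than facts from the slides: Z-scores are computed per profile column, positions beyond the sequence ends are zero-padded, and the toy profile is synthetic rather than real PSI-BLAST output. All function names are hypothetical.

```python
import math


def zscore_columns(profile):
    """Normalise each profile column to zero mean and unit variance.

    Assumption: normalisation is per column; constant columns are left at 0.
    """
    n = len(profile)
    columns = list(zip(*profile))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in profile]


def window_features(profile, window=33):
    """Build one feature vector per residue from a centred sliding window.

    Assumption: positions beyond either end of the sequence are zero-padded.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    pad = [0.0] * len(profile[0])
    features = []
    for i in range(len(profile)):
        vector = []
        for j in range(i - half, i + half + 1):
            vector.extend(profile[j] if 0 <= j < len(profile) else pad)
        features.append(vector)
    return features


def rbf_kernel(x, y, gamma=0.09):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


# Synthetic 40-residue, 20-column profile standing in for PSI-BLAST output.
toy_profile = [[float((i * j) % 7) for j in range(20)] for i in range(40)]
features = window_features(zscore_columns(toy_profile), window=33)
print(len(features), len(features[0]))  # one 33 x 20 vector per residue
```

Each resulting vector would be passed, with its residue label, to an SVM trainer; the kernel function is shown only to make the role of gamma explicit.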

SVM Results – Glycerol uptake facilitator

SVM Results – Photosystem II subunit A

SVM Results – Particulate Methane Monooxygenase subunit C

SVM Results – Cytochrome b6f subunit A

Further Work
Expand training set: additional sequences where the TMHs are known but the topology is not can be used to train the Helix/Loop classifier.
Parameter optimisation: window size, kernel type.
Transduction.
Signal peptide SVM.
Re-entrant loop SVM.
Combine SVM raw scores/probabilities into a topology.