Progress in Transmembrane Protein Research: 12 Month Report
Tim Nugent

Assignment of PROSITE motifs to topological regions
We explored whether motifs from the PROSITE database could be used as constraints in subsequent topology prediction steps, by identifying a bias in their inside/outside frequency.
[Figure panels: Extracellular, Cytoplasm]
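The inside/outside bias can be estimated by counting where each motif match falls. Below is a minimal sketch, assuming each training protein has a per-residue topology string ('i' cytoplasmic loop, 'o' extracellular loop, 'M' membrane) and that the PROSITE pattern has been hand-translated into a Python regex. The function names, the label alphabet and the choice of the N-glycosylation pattern (PS00001, N-{P}-[ST]-{P}) are illustrative assumptions, not the report's actual pipeline.

```python
import re
from collections import Counter

# Assumption: PROSITE N-glycosylation pattern N-{P}-[ST]-{P}, hand-translated.
# The lookahead lets overlapping matches all be reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")

def motif_side_counts(sequence, topology, pattern):
    """Count motif matches by the topology label at each match's first residue.

    Hypothetical helper: topology uses 'i' (inside loop), 'o' (outside loop)
    and 'M' (membrane helix), one character per residue.
    """
    if len(sequence) != len(topology):
        raise ValueError("sequence and topology must have equal length")
    counts = Counter()
    for match in pattern.finditer(sequence):
        counts[topology[match.start()]] += 1
    return counts

def inside_fraction(counts):
    """Fraction of loop-located matches that are inside; None if there are none."""
    loop_total = counts["i"] + counts["o"]
    return counts["i"] / loop_total if loop_total else None
```

Summing the counts over every protein in the dataset gives each motif's inside/outside frequency; a fraction far from 0.5 marks a motif as a candidate topological constraint.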

Alpha-helical protein PROSITE motif assignments

Using PROSITE motifs to enhance topology prediction

CLN3 Topology Prediction

The model is in agreement with all published experimental data.
Potential amphipathic helix: bias in hydrophobic/polar residue placement.
Two arginine residues in close proximity: a possible anion channel?
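One common way to quantify a hydrophobic/polar placement bias is Eisenberg's hydrophobic moment, which treats the segment as an ideal alpha-helix (100 degrees of rotation per residue) and sums hydropathy vectors. This is a sketch only: it substitutes the Kyte-Doolittle scale for Eisenberg's consensus scale, and the function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumption: used here in place of
# Eisenberg's consensus scale, purely for illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(segment, angle_deg=100.0):
    """Mean hydrophobic moment of a segment modelled as an ideal helix.

    Residue i points in direction i * angle_deg; a large value means
    hydrophobic and polar residues sit on opposite faces (amphipathic).
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(segment))
    cos_sum = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)
```

A uniformly hydrophobic 18-residue helix (exactly five turns) scores close to zero, while a designed sequence such as "LKKLLKKLLKLLKKLLKK", with leucines clustered on one face, scores roughly 2.5.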

Using Support Vector Machines for Topology prediction
Earlier approaches relied on physicochemical properties such as hydrophobicity to identify transmembrane helices (e.g. Kyte-Doolittle). More recently, methods based on machine learning algorithms such as hidden Markov models (e.g. TMHMM, PHOBIUS) and neural networks (MEMSAT3) have been developed, achieving significant improvements in prediction accuracy (~80%). However, none of the top-scoring methods use SVMs. While hidden Markov models and neural networks can have multiple outputs, SVMs are binary classifiers, so TM topology prediction requires combining multiple SVMs, e.g.:
TM helix / Loop
Inside Loop / Outside Loop
Signal Peptide / TM helix
Re-entrant Loop / TM helix
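One simple way to combine binary classifiers is a cascade: a TM helix / Loop decision first, then an Inside Loop / Outside Loop decision for residues classed as loop. The sketch below assumes each trained SVM is wrapped as a function returning a signed decision value per residue; those wrappers are hypothetical, and the Signal Peptide and Re-entrant Loop stages are left out for brevity.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained SVM decision function: positive values
# mean the first class of the pair (e.g. "TM helix" in TM helix / Loop).
Decision = Callable[[Sequence[str], int], float]

def label_residues(seq: Sequence[str],
                   helix_vs_loop: Decision,
                   inside_vs_outside: Decision) -> List[str]:
    """Cascade two binary classifiers into per-residue labels 'M', 'i' or 'o'."""
    labels = []
    for pos in range(len(seq)):
        if helix_vs_loop(seq, pos) > 0:
            labels.append("M")  # stage 1: transmembrane helix
        elif inside_vs_outside(seq, pos) > 0:
            labels.append("i")  # stage 2, loops only: cytoplasmic side
        else:
            labels.append("o")  # stage 2, loops only: extracellular side
    return labels

# Toy decision functions for demonstration only (not trained models).
def toy_helix(s, p):
    return 1.0 if s[p] in "LIVF" else -1.0

def toy_inside(s, p):
    return 1.0 if s[p] in "KR" else -1.0
```

For example, `"".join(label_residues("KKLLLLEE", toy_helix, toy_inside))` gives `"iiMMMMoo"`. Per-residue decisions like these ignore the constraint that loops must alternate sides, so a later step still has to assemble them into a consistent topology.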

Helix / Loop SVM Prediction Accuracy
TM helix / Loop SVM: PSI-BLAST profiles, normalised by Z-score; 29-residue sliding window; 3rd-order polynomial kernel function.
Matthews Correlation Coefficient (MCC) = 0.75, Precision = 0.86, Recall = 0.32
TP = 8384, FP = 1355, TN = , FN = 1969
For comparison: Kyte-Doolittle MCC = 0.64; MEMSAT3 MCC = 0.76.
Overlap of at least 37 sequences between the Moller dataset and the novel training set.
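The input encoding can be sketched as follows: each residue's PSI-BLAST profile row is Z-score normalised, then the rows inside a window centred on the residue are concatenated into one feature vector, padding with zeros beyond the termini. The report does not say which axis the Z-score is taken over; normalising each row is an assumption here, as are the function names and the zero padding.

```python
import statistics
from typing import List

def zscore_rows(profile: List[List[float]]) -> List[List[float]]:
    """Z-score normalise each profile row (assumption: one row = one residue)."""
    out = []
    for row in profile:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return out

def window_features(profile: List[List[float]], centre: int,
                    window: int = 29) -> List[float]:
    """Concatenate profile rows in a window centred on one residue.

    Positions beyond either terminus are zero-padded, so every residue
    yields window * len(row) features.
    """
    half = window // 2
    width = len(profile[0])
    features: List[float] = []
    for pos in range(centre - half, centre + half + 1):
        if 0 <= pos < len(profile):
            features.extend(profile[pos])
        else:
            features.extend([0.0] * width)
    return features
```

With a real 20-column profile and the 29-residue window, each residue becomes a 580-dimensional vector, which is what the SVM kernel operates on.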

Inside Loop/Outside Loop SVM Prediction Accuracy
Inside Loop/Outside Loop SVM: 27-residue sliding window.
Matthews Correlation Coefficient (MCC) = 0.60, Precision = 0.78, Recall = 0.50
TP = 4060, FP = 1028, TN = 4081, FN = 1007
The Signal Peptide/TM Helix and Re-entrant Loop/TM Helix SVMs are currently in training.
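The accuracy figures are derived from the four confusion-matrix counts. A small helper, using the standard definitions of MCC, precision (TP / (TP + FP)) and recall (TP / (TP + FN)), might look like this (the function name is hypothetical):

```python
import math

def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (MCC, precision, recall) from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return mcc, precision, recall
```

Under these definitions, the Inside Loop/Outside Loop counts (TP = 4060, FP = 1028, TN = 4081, FN = 1007) give MCC ≈ 0.60, precision ≈ 0.80 and recall ≈ 0.80.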

SVM Results – Glycerol uptake facilitator

SVM Results – Photosystem II subunit A

SVM Results – Particulate Methane Monooxygenase subunit C

SVM Results – Cytochrome b6f subunit A

Further work
Expand the training set: ~45 sequences to add. Additional sequences where the TM helices are known but the topology is not can be used to train the Helix/Loop classifier.
Parameter optimisation: window size, kernel type.
Signal peptide SVM.
Re-entrant loop SVM.
Combine SVM raw scores/probabilities into a topology.
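As a starting point for combining raw scores into a topology, the sketch below thresholds per-residue helix probabilities into helix segments of a minimum length, then picks the orientation (N-terminus inside or outside) whose alternating loop assignment best agrees with per-residue inside probabilities. This greedy scheme is an illustrative alternative to the dynamic-programming approaches used by existing predictors, not the planned method. The threshold, minimum helix length and function names are all hypothetical.

```python
from typing import List, Tuple

def helix_segments(helix_prob: List[float], threshold: float = 0.5,
                   min_len: int = 15) -> List[Tuple[int, int]]:
    """Half-open (start, end) runs with helix_prob >= threshold and length >= min_len."""
    segments, start = [], None
    for i, p in enumerate(helix_prob + [float("-inf")]):  # sentinel closes a final run
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments

def assemble_topology(helix_prob: List[float], inside_prob: List[float],
                      threshold: float = 0.5, min_len: int = 15) -> str:
    """Build an 'i'/'o'/'M' string whose loops alternate across the membrane."""
    n = len(helix_prob)
    segs = helix_segments(helix_prob, threshold, min_len)
    # Loops are the gaps before, between and after the helices (possibly empty).
    bounds = [0] + [x for seg in segs for x in seg] + [n]
    loops = [(bounds[k], bounds[k + 1]) for k in range(0, len(bounds), 2)]

    def agreement(n_in: bool) -> float:
        # Sum of per-residue agreement with inside_prob under one orientation.
        total = 0.0
        for k, (a, b) in enumerate(loops):
            inside = (k % 2 == 0) == n_in
            total += sum(p if inside else 1.0 - p for p in inside_prob[a:b])
        return total

    n_in = agreement(True) >= agreement(False)
    labels = ["M"] * n
    for k, (a, b) in enumerate(loops):
        side = "i" if (k % 2 == 0) == n_in else "o"
        for i in range(a, b):
            labels[i] = side
    return "".join(labels)
```

Because the orientation is chosen once for the whole chain, every loop is forced onto a consistent side, which is exactly the constraint that independent per-residue SVM decisions lack.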

Whole-Proteome TM Protein Analysis

Identifying Pore-forming TM Helices