Prediction of protein localization and membrane protein topology Gunnar von Heijne Department of Biochemistry and Biophysics Stockholm Bioinformatics Center.

Prediction of protein localization and membrane protein topology Gunnar von Heijne Department of Biochemistry and Biophysics Stockholm Bioinformatics Center Stockholm University

Stockholm Bioinformatics Center

Protein localization

Protein sorting in a eukaryotic cell

The 'canonical' signal peptide
- n-region: positively charged
- h-region: hydrophobic
- c-region: more polar, with small residues at positions -1 and -3
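
The small-residue requirement at -1 and -3 can be turned into a crude scan for candidate cleavage sites. This is a minimal sketch, not SignalP: the set of "small" residues (A, G, S, C, T) and the default search window of positions 15 to 40 are hypothetical choices made only for illustration.

```python
# Hypothetical choice of "small" residues allowed at positions -1 and -3.
SMALL = set("AGSCT")


def candidate_cleavage_sites(seq, min_pos=15, max_pos=40):
    """Return 1-based positions p (cleavage between p and p+1) where the
    residues at -1 (position p) and -3 (position p-2) are both small."""
    sites = []
    # Start at 3 so that position -3 exists; stop so the mature part is non-empty.
    for p in range(max(min_pos, 3), min(max_pos, len(seq) - 1) + 1):
        if seq[p - 1] in SMALL and seq[p - 3] in SMALL:
            sites.append(p)
    return sites
```

For example, `candidate_cleavage_sites("MLAGAD", min_pos=3, max_pos=10)` reports only position 5 (A at -1, A at -3). Real predictors also weigh the n- and h-regions, which this scan ignores.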

mTPs are rich in R and K and can form amphiphilic helices (Abe et al., Cell 100:551)
[Figure: mTP bound to Tom20]
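
Both properties can be quantified from sequence: the fraction of Arg plus Lys, and the hydrophobic moment of the segment drawn as an ideal alpha-helix (100 degrees of rotation per residue). This sketch swaps in the Kyte-Doolittle hydropathy scale for convenience; hydrophobic-moment calculations more often use the Eisenberg consensus scale, so the absolute numbers here are illustrative only.

```python
import math

# Kyte-Doolittle hydropathy values (swapped in for illustration).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def basic_fraction(seq):
    """Fraction of residues that are Arg or Lys."""
    return sum(aa in "RK" for aa in seq) / len(seq)


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment, treating seq as an ideal helix with
    `angle_deg` degrees of rotation between consecutive residues."""
    delta = math.radians(angle_deg)
    sx = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    sy = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sx, sy) / len(seq)
```

A uniform stretch such as 18 Leu has a moment of essentially zero (hydrophobicity is spread evenly around the helix), whereas a sequence that alternates hydrophobic and R/K-rich faces scores higher.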

Typical chloroplast transit peptide

A simple artificial neural network (ANN)
[Figure: input layer and output layer]

Artificial neural networks: a summary
- a high-quality dataset (positive and negative examples)
- an ANN architecture (can be optimized)
- all internal parameters in the ANN are systematically optimized during a training session
- evaluate the predictive performance using cross-validation
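
The recipe above can be sketched with the simplest possible network: one-hot encoded sequence windows as the input layer and a single sigmoid unit as the output layer, trained by stochastic gradient descent and evaluated by k-fold cross-validation. This is an assumption-laden toy (no hidden layer, hypothetical learning rate and epoch count), not the ChloroP or TargetP architecture.

```python
import math
import random

AA = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(window):
    """Input layer: 20 binary inputs per window position (other letters -> all zeros)."""
    x = []
    for aa in window:
        x.extend(1.0 if aa == a else 0.0 for a in AA)
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=100, lr=0.1, seed=0):
    """Fit one sigmoid output unit by SGD on cross-entropy; data = [(x, label 0/1)]."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x, y in order:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # derivative of cross-entropy w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


def cross_validate(data, k=5):
    """k-fold cross-validation: train on k-1 folds, test on the held-out fold."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i in range(k):
        train_set = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = train(train_set)
        correct += sum(predict(model, x) == bool(y) for x, y in folds[i])
    return correct / len(data)
```

The important point mirrored from the slide is that performance is reported only on examples held out from training, never on the training set itself.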

ChloroP (Prot. Sci. 8:978)

TargetP - a four-state SP/mTP/cTP/other predictor (JMB 300:1105)

TargetP sensitivity/specificity
[Table: sensitivity (sens) and specificity (spec) for the SP, mTP, cTP and other classes]
- sens = tp/(tp+fn)
- spec = tp/(tp+fp)
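
The two definitions can be computed per class from (true, predicted) pairs. Note that "spec" as defined here, tp/(tp+fp), is the quantity often called precision or positive predictive value elsewhere. This sketch simply applies the slide's formulas to each of the four classes.

```python
from collections import Counter


def sens_spec(pairs, classes=("SP", "mTP", "cTP", "other")):
    """Per-class sens = tp/(tp+fn) and spec = tp/(tp+fp) from (true, predicted) pairs."""
    tp, fn, fp = Counter(), Counter(), Counter()
    for true, pred in pairs:
        if true == pred:
            tp[true] += 1
        else:
            fn[true] += 1  # missed member of the true class
            fp[pred] += 1  # wrongly assigned to the predicted class
    result = {}
    for c in classes:
        sens = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else float("nan")
        spec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else float("nan")
        result[c] = (sens, spec)
    return result
```

Each misclassification counts twice: as a false negative for the true class and as a false positive for the predicted class.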

Other ways to predict localization
- amino acid composition
- sequence homology
- domain structure
- phylogenetic profiles
- expression profiles
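
Of these, amino acid composition is the easiest to illustrate: each protein becomes a 20-dimensional vector of residue fractions, and a new protein is assigned to the compartment with the closest average composition. The nearest-centroid rule and the toy class labels below are hypothetical stand-ins; published composition-based predictors use more elaborate classifiers.

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional amino acid composition (fractions of the standard residues)."""
    counts = [seq.count(a) for a in AA]
    total = sum(counts) or 1
    return [c / total for c in counts]


def centroid(vectors):
    """Component-wise mean of a list of composition vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid(seq, centroids):
    """Assign seq to the label whose mean composition is closest (Euclidean distance)."""
    x = composition(seq)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))
```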

Popular prediction programs
- SignalP (NN, HMM)
- ChloroP
- TargetP
- LipoP
- MitoProt
- PSORT

Membrane protein topology

A simulated lipid bilayer (Grubmüller et al.)

Only two basic structures: helix bundle and β-barrel (Quart. Rev. Biophys. 32:285)

Most membrane proteins (MPs) are synthesized at the ER

The basic model (courtesy Bill Skach)

Topology prediction

TM helix lengths are typically … residues (Bowie, JMB 272:780)

Trp and Tyr are enriched in the region near the lipid headgroups (Prot. Sci. 6:808; 7:2026)

Loops tend to be short (Tusnady & Simon, JMB 283:489)

The 'positive inside' rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)
Lys + Arg (KR) content of loops on each side of the membrane:
                                in      out
Bacterial inner membrane        16%     4%
Eukaryotic plasma membrane      17%     7%
Thylakoid membrane              13%     5%
Mitochondrial inner membrane    10%     3%
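
Given the positions of the TM segments, the rule can be applied directly: loops alternate between the two sides, so one counts K+R in the odd-numbered and even-numbered loops and places the K+R-richer set inside. A minimal sketch, assuming TM segments are given as 0-based, end-exclusive (start, end) pairs; the tie-breaking towards "in" is an arbitrary choice.

```python
def loops(seq, tm_segments):
    """Split seq into the loops flanking the (start, end) TM segments."""
    parts, prev = [], 0
    for start, end in sorted(tm_segments):
        parts.append(seq[prev:start])
        prev = end
    parts.append(seq[prev:])
    return parts


def kr(s):
    """Number of Lys + Arg residues."""
    return sum(aa in "KR" for aa in s)


def positive_inside_orientation(seq, tm_segments):
    """Return (N-terminal side, K+R inside, K+R outside).

    Loops 0, 2, 4, ... share the N-terminal side; loops 1, 3, ... are on the
    other side. The orientation that puts more K+R inside is chosen.
    """
    parts = loops(seq, tm_segments)
    even = sum(kr(p) for p in parts[0::2])
    odd = sum(kr(p) for p in parts[1::2])
    if even >= odd:
        return "in", even, odd
    return "out", odd, even
```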

The positive-inside rule applies to all organisms (Nilsson, Persson & von Heijne, submitted)
[Figure axes: amino acid; number of genomes]

Topology can be manipulated (Nature 341:456)
[Figure: Lep constructs expressed in E. coli; PK]

Topology prediction - a classical problem in bioinformatics

Three important characteristics
- ~20 hydrophobic residues
- 'positive inside' rule
- Trp, Tyr
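
The first characteristic is what a hydropathy plot ("h-plot") detects. The sketch below averages Kyte-Doolittle values over a 19-residue window and merges consecutive windows whose mean exceeds 1.6, the window size and cutoff Kyte and Doolittle suggested for membrane-spanning segments. Segment boundaries come out wider than the hydrophobic stretch itself because whole windows are merged; real predictors refine them.

```python
# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers residues i .. i+window-1."""
    vals = [KD.get(aa, 0.0) for aa in seq]
    return [sum(vals[i:i + window]) / window for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge runs of windows above threshold into 0-based, end-exclusive segments."""
    prof = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, h in enumerate(prof):
        if h > threshold and start is None:
            start = i
        elif h <= threshold and start is not None:
            segments.append((start, i - 1 + window))  # last window above was i-1
            start = None
    if start is not None:
        segments.append((start, len(prof) - 1 + window))
    return segments
```

For a 25-residue poly-Leu stretch flanked by ten Asp on each side, this reports one segment, (5, 40), which covers the stretch at residues 10 to 34 plus some flanking residues.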

Popular topology predictors
- TMHMM (HMM)
- HMMTOP (HMM)
- TopPred (h-plot + PI-rule)
- MEMSAT (dynamic programming)
- TMAP (h-plot, mult. alignment)
- PHD (NN, mult. alignment)

TopPred (JMB 225:487) seqanal/interfaces/toppred.html
- construct all possible topologies
- rank based on … +
[Figure: E. coli LacY]
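
The "construct all possible topologies" step can be sketched as follows: every topology includes all certain TM segments plus any subset of the putative ones, in either orientation. Because the ranking criterion on the slide is partly lost, this sketch assumes ranking by the K+R difference between the two sides, following the "h-plot + PI-rule" description of TopPred in the predictor list; the certain/putative split is likewise an assumed input.

```python
from itertools import combinations


def kr(s):
    return sum(aa in "KR" for aa in s)


def loops(seq, segments):
    """Loops flanking 0-based, end-exclusive (start, end) TM segments."""
    parts, prev = [], 0
    for start, end in sorted(segments):
        parts.append(seq[prev:start])
        prev = end
    parts.append(seq[prev:])
    return parts


def delta_kr(seq, segments):
    """K+R on the N-terminal side (loops 0, 2, ...) minus K+R on the other side."""
    parts = loops(seq, segments)
    return sum(kr(p) for p in parts[0::2]) - sum(kr(p) for p in parts[1::2])


def rank_topologies(seq, certain, putative):
    """All certain segments + every subset of putative ones, in both orientations,
    ranked by (K+R inside) - (K+R outside), best first."""
    ranked = []
    for r in range(len(putative) + 1):
        for subset in combinations(putative, r):
            segs = sorted(list(certain) + list(subset))
            d = delta_kr(seq, segs)
            ranked.append((d, "in", segs))    # N-terminus cytoplasmic
            ranked.append((-d, "out", segs))  # N-terminus periplasmic
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked
```

With n putative segments this enumerates 2 × 2^n topologies, which stays small because only a handful of hydrophobic segments are usually borderline.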

TMHMM (Sonnhammer et al., ISMB 6:175; Krogh et al., JMB 305:567): a hidden Markov model-based method
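
The core idea of an HMM-based topology predictor can be shown with a much smaller model: inside and outside loop states, plus two helix states (inside-to-outside and outside-to-inside) so that loops must alternate sides, decoded with the Viterbi algorithm. All states, probabilities and the three-letter residue alphabet below are hypothetical toy choices; TMHMM itself uses many more states, for example to constrain helix length.

```python
import math

STATES = ("i", "Hio", "o", "Hoi")
START = {"i": 0.5, "o": 0.5}  # the toy model cannot start inside a helix
TRANS = {
    "i": {"i": 0.9, "Hio": 0.1},
    "Hio": {"Hio": 0.95, "o": 0.05},
    "o": {"o": 0.9, "Hoi": 0.1},
    "Hoi": {"Hoi": 0.95, "i": 0.05},
}
HELIX_EMIT = {"hyd": 0.8, "KR": 0.02, "other": 0.18}
EMIT = {
    "i": {"hyd": 0.2, "KR": 0.3, "other": 0.5},    # K+R enriched inside
    "o": {"hyd": 0.2, "KR": 0.05, "other": 0.75},
    "Hio": HELIX_EMIT,
    "Hoi": HELIX_EMIT,
}
LABEL = {"i": "i", "o": "o", "Hio": "h", "Hoi": "h"}


def category(aa):
    if aa in "KR":
        return "KR"
    if aa in "AILMFVWC":
        return "hyd"
    return "other"


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Most probable state path, reported as one i/h/o label per residue."""
    cats = [category(aa) for aa in seq]
    v = {s: log(START.get(s, 0.0)) + log(EMIT[s][cats[0]]) for s in STATES}
    backpointers = []
    for c in cats[1:]:
        ptr, new_v = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[p] + log(TRANS[p].get(s, 0.0)))
            ptr[s] = prev
            new_v[s] = v[prev] + log(TRANS[prev].get(s, 0.0)) + log(EMIT[s][c])
        backpointers.append(ptr)
        v = new_v
    state = max(STATES, key=lambda s: v[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))
```

Because the K+R emission probability differs between the two loop states, the positive-inside rule is built into the model rather than applied afterwards.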

HMMTOP (Tusnady & Simon, JMB 283:489)

Helix and loop models in TMHMM

TMHMM performance (Krogh et al., JMB 305:567; Melén et al., JMB 327:735)
- Discrimination globular/membrane: sensitivity and specificity > 98%
- Correct topology: 55-60%
- Single TM helix identification: sensitivity 96%, specificity 98%
- Training set: 160 membrane proteins, 650 globular proteins

Can performance be improved?
- Consensus predictions
- Multiple alignments
- Experimental constraints

'Consensus' predictions indicate reliability (FEBS Lett. 486:267)
[Figure: fraction correct and coverage vs. majority level; 60 E. coli proteins, 5 prediction methods used]
46% of 764 predicted E. coli inner-membrane (IM) proteins are in the 5/0 or 4/1 classes.
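
The "5/0" and "4/1" classes can be read as "k methods agree on the majority topology, m disagree". A minimal sketch, assuming each method's prediction is reduced to a hashable topology label (the labels used in the example, such as "Nin-3TM", are hypothetical).

```python
from collections import Counter


def consensus_class(predictions):
    """Return (majority topology, 'k/m' class) for one protein's predictions."""
    top, k = Counter(predictions).most_common(1)[0]
    return top, f"{k}/{len(predictions) - k}"


def fraction_reliable(all_predictions, classes=("5/0", "4/1")):
    """Fraction of proteins whose consensus class is one of `classes`."""
    labels = [consensus_class(p)[1] for p in all_predictions]
    return sum(c in classes for c in labels) / len(labels)
```

For instance, `consensus_class(["Nin-3TM"] * 4 + ["Nout-2TM"])` gives `("Nin-3TM", "4/1")`.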

TMHMM reliability scores (Melén et al., JMB 327:735)
TMHMM output:
1. Mean probability, p_mean
2. Minimum probability of the predicted label, p_min
3. P_bestPath / P_allPaths
[Example: sequence M C Y G K C I, per-residue probabilities p(i), p(h), p(o), predicted labels i i i i i h h]
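
The three scores can be computed from quantities an HMM decoder already produces. This sketch assumes that both p_mean and p_min are taken over the posterior probability of the predicted label at each residue, and that the log-probabilities of the best path (Viterbi) and of all paths (forward algorithm) are supplied by the caller; the function name and argument layout are hypothetical.

```python
import math


def reliability_scores(posteriors, labels, log_p_best_path, log_p_all_paths):
    """posteriors: per-residue dicts mapping 'i'/'h'/'o' to posterior probability.
    labels: predicted label string, one character per residue."""
    p_label = [post[lab] for post, lab in zip(posteriors, labels)]
    s1 = sum(p_label) / len(p_label)  # mean probability of the predicted label
    s2 = min(p_label)                 # minimum probability of the predicted label
    s3 = math.exp(log_p_best_path - log_p_all_paths)  # P_bestPath / P_allPaths
    return s1, s2, s3
```

Working with log-probabilities and exponentiating only the difference avoids underflow, since both path probabilities are tiny for real-length proteins.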

TMHMM (score 3): prediction accuracy vs. coverage
[Figure: percent correct vs. coverage for 92 bacterial proteins; ~70%, ~45%]
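
An accuracy-vs.-coverage curve is obtained by sweeping a score cutoff: only predictions scoring at or above the cutoff are kept, coverage is the fraction kept, and accuracy is the fraction of kept predictions that are correct. A minimal sketch with hypothetical inputs of (score, correct) pairs.

```python
def accuracy_vs_coverage(results, thresholds):
    """results: list of (score, correct) pairs; returns (t, coverage, accuracy) per threshold."""
    curve = []
    n = len(results)
    for t in thresholds:
        kept = [ok for score, ok in results if score >= t]
        coverage = len(kept) / n
        accuracy = sum(kept) / len(kept) if kept else float("nan")
        curve.append((t, coverage, accuracy))
    return curve
```

Raising the cutoff trades coverage for accuracy, which is the trade-off the figure summarizes.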

"Experimentally known topologies" is a biased sample
[Figure axes: score interval; percent]

Correlation between accuracy and TMHMM S3 score
[Figure axes: mean score; percent correct]

Expected TMHMM performance on proteomes
[Figure: percent correct vs. coverage for E. coli, S. cerevisiae, C. elegans and the test set]

Experimental information helps (JMB 327:735)
[Figure: original TMHMM prediction, one TM helix missing; TMHMM prediction with the C-terminus fixed to the inside]

Experimental information helps (JMB 327:735)
When the location of the C-terminus is known, the correct topology is predicted for an estimated ~70% of all membrane proteins (~55% when it is not known).
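
A known C-terminal location constrains topology through simple parity: each TM helix flips the side, so the C-terminus is on the same side as the N-terminus exactly when the number of TM helices is even. The slides describe fixing the C-terminus within TMHMM's own prediction; this sketch swaps that for a simpler post-hoc filter over a ranked list of candidate topologies, given as hypothetical (score, N-terminal side, number of TM helices) tuples.

```python
def c_terminal_side(n_side, n_tm):
    """Side of the C-terminus given the N-terminal side and the number of TM helices."""
    if n_tm % 2 == 0:
        return n_side
    return "out" if n_side == "in" else "in"


def best_consistent(ranked, c_term):
    """Highest-ranked (score, n_side, n_tm) whose C-terminus is on side `c_term`."""
    for score, n_side, n_tm in ranked:
        if c_terminal_side(n_side, n_tm) == c_term:
            return score, n_side, n_tm
    return None
```

If the top-ranked candidate places the C-terminus on the wrong side, the filter falls back to the best candidate that agrees with the experiment, which can mean accepting one more or one fewer TM helix, as in the example of a missing helix above.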