Prediction of protein localization and membrane protein topology Gunnar von Heijne Department of Biochemistry and Biophysics Stockholm Bioinformatics Center.

1 Prediction of protein localization and membrane protein topology
Gunnar von Heijne
Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Stockholm University

2 Stockholm Bioinformatics Center, www.sbc.su.se

3 Protein localization

4 Protein sorting in a eukaryotic cell

5 The 'canonical' signal peptide
- n-region: positively charged
- h-region: hydrophobic
- c-region: more polar, with small residues in positions -1 and -3
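A minimal sketch of how the c-region feature above could be turned into a crude scan for candidate cleavage sites. The set of "small" residues (A, G, S, C, T), the search range, and the helper name `candidate_cleavage_sites` are assumptions for illustration; predictors such as SignalP learn position-specific preferences from data rather than applying a fixed rule like this.

```python
# Hypothetical set of "small" residues (an assumption for this sketch).
SMALL = set("AGSCT")


def candidate_cleavage_sites(seq, start=15, end=35):
    """Return 1-based positions p such that cutting after residue p puts
    small residues at both -1 (residue p) and -3 (residue p - 2).

    The search window [start, end] is a hypothetical default."""
    sites = []
    for p in range(max(start, 3), min(end, len(seq)) + 1):
        minus1 = seq[p - 1]  # residue p (1-based), i.e. position -1
        minus3 = seq[p - 3]  # residue p - 2 (1-based), i.e. position -3
        if minus1 in SMALL and minus3 in SMALL:
            sites.append(p)
    return sites


# Toy sequence: charged n-region, hydrophobic h-region, "ASA" c-region motif.
toy = "M" + "KK" + "L" * 12 + "ASA" + "E" * 10
print(candidate_cleavage_sites(toy))  # [18]
```

On real sequences a fixed rule like this fires at many positions; it only shows why the c-region is informative, not how well it discriminates.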

6 Mitochondrial targeting peptides (mTPs) are rich in R & K and can form amphiphilic helices (Abe et al., Cell 100:551)
[Figure: mTP bound to Tom20]
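One way to quantify an "amphiphilic helix" is an Eisenberg-style hydrophobic moment: each residue's hydrophobicity is treated as a vector rotated 100° further around an ideal α-helix, and the length of the summed vector measures how one-sided the hydrophobicity is. In this sketch the Kyte-Doolittle scale is an assumption (Eisenberg's own work used a consensus scale), and `hydrophobic_moment` and `rk_fraction` are hypothetical helpers.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Length of the vector sum of hydrophobicities, residue k pointing at k * angle_deg."""
    x = y = 0.0
    for k, aa in enumerate(seq):
        theta = math.radians(k * angle_deg)
        x += KD[aa] * math.cos(theta)
        y += KD[aa] * math.sin(theta)
    return math.hypot(x, y)


def rk_fraction(seq):
    """Fraction of residues that are Arg or Lys."""
    return sum(aa in "RK" for aa in seq) / len(seq)


uniform = "L" * 18
# Two-faced toy helix: L where cos(100° * k) > 0, R on the opposite face.
amphiphilic = "".join("L" if math.cos(math.radians(100 * k)) > 0 else "R" for k in range(18))
print(hydrophobic_moment(uniform), hydrophobic_moment(amphiphilic), rk_fraction(amphiphilic))
```

For 18 residues the 100° steps complete exactly five turns, so the uniformly hydrophobic sequence gives a moment of essentially zero, while the two-faced, R-rich sequence gives a large one.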

7 Typical chloroplast transit peptide (cTP)

8 A simple artificial neural network (ANN)
[Figure: input layer and output layer]

9 Artificial neural networks: a summary
- a high-quality dataset (positive and negative examples)
- an ANN architecture (can be optimized)
- all internal parameters in the ANN are systematically optimized during a training session
- evaluate the predictive performance using cross-validation
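The recipe above can be illustrated with the simplest possible network: an input layer connected directly to a single sigmoid output unit (equivalent to logistic regression), trained by gradient descent and evaluated by k-fold cross-validation. The two input features, the toy dataset and the learning settings are hypothetical; ChloroP and TargetP use larger networks with sequence-window inputs, so this only shows the shape of the procedure.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=200, lr=0.5):
    """Fit the weights and bias of one sigmoid output unit by per-example gradient descent on log-loss."""
    n_inputs = len(examples[0][0])
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            grad = p - y  # derivative of log-loss with respect to the unit's net input
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(model, x):
    w, b = model
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def cross_validate(examples, k=5):
    """Train on k-1 folds, test on the held-out fold, and return the overall accuracy."""
    folds = [examples[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        model = train(training)
        correct += sum(predict(model, x) == y for x, y in held_out)
    return correct / len(examples)


# Hypothetical toy data: label 1 when feature 0 exceeds feature 1 by a clear margin.
data = [((a / 10, b / 10), int(a > b)) for a in range(10) for b in range(10) if abs(a - b) >= 2]
print(cross_validate(data))
```

Each example is scored only by a model that never saw it during training, which is the point of the cross-validation step.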

10 ChloroP (Prot. Sci. 8:978)

11 TargetP: a four-state SP/mTP/cTP/other predictor (JMB 300:1105)

12 TargetP sensitivity/specificity

Class   sens   spec
SP      0.91   0.96
mTP     0.82   0.90
cTP     0.85   0.69
other   0.85   0.78

sens = tp/(tp+fn), spec = tp/(tp+fp)
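The two formulas on this slide can be applied class by class to a four-way prediction, treating one class as "positive" and the other three as "negative". Note that "spec" as defined here, tp/(tp+fp), is the quantity usually called precision or positive predictive value. The function names and toy labels are hypothetical.

```python
import math


def sens_spec(tp, fp, fn):
    """sens = tp/(tp+fn) and spec = tp/(tp+fp), as defined on the slide; NaN when undefined."""
    sens = tp / (tp + fn) if tp + fn else math.nan
    spec = tp / (tp + fp) if tp + fp else math.nan
    return sens, spec


def per_class_scores(true_labels, predicted, classes=("SP", "mTP", "cTP", "other")):
    """Score each class in turn against the other three."""
    pairs = list(zip(true_labels, predicted))
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        scores[c] = sens_spec(tp, fp, fn)
    return scores


print(per_class_scores(["SP", "SP", "mTP", "other"], ["SP", "mTP", "mTP", "other"]))
```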

13 Other ways to predict localization
- amino acid composition
- sequence homology
- domain structure
- phylogenetic profiles
- expression profiles

14 Popular prediction programs
- SignalP (NN, HMM), ChloroP, TargetP, LipoP (www.cbs.dtu.dk)
- MitoProt, PSORT

15 Membrane protein topology

16 A simulated lipid bilayer (Grubmüller et al.)

17 Only two basic structures: helix bundle and β-barrel (Quart. Rev. Biophys. 32:285)

18 Most membrane proteins (MPs) are synthesized at the ER

19 The basic model (courtesy Bill Skach)

20 Topology prediction

21 TM helix lengths are typically 20-30 residues (Bowie, JMB 272:780)

22 Trp & Tyr are enriched in the region near the lipid headgroups (Prot. Sci. 6:808; 7:2026)

23 Loops tend to be short (Tusnady & Simon, JMB 283:489)

24 The 'positive inside' rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)

Membrane                       in (% K+R)   out (% K+R)
Bacterial inner membrane       16           4
Eukaryotic plasma membrane     17           7
Thylakoid membrane             13           5
Mitochondrial inner membrane   10           3
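Percentages like those in the table can be computed from a sequence and a per-residue topology labelling. This sketch assumes labels i (inside), h (transmembrane) and o (outside), the same alphabet as the TMHMM output shown later, and reports K+R as a percentage of all loop residues on each side; the helper name is hypothetical.

```python
import math


def kr_percent_by_side(seq, labels):
    """Percentage of K+R among loop residues on the inside ('i') and outside ('o')."""
    counts = {"i": [0, 0], "o": [0, 0]}  # side -> [K+R residues, all loop residues]
    for aa, label in zip(seq, labels):
        if label in counts:  # transmembrane ('h') residues are skipped
            counts[label][1] += 1
            counts[label][0] += aa in "KR"
    return {side: 100.0 * kr / n if n else math.nan for side, (kr, n) in counts.items()}


seq = "KKAA" + "L" * 20 + "AAAA"
labels = "iiii" + "h" * 20 + "oooo"
print(kr_percent_by_side(seq, labels))  # {'i': 50.0, 'o': 0.0}
```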

25 The positive-inside rule applies to all organisms (Nilsson, Persson & von Heijne, submitted)
[Figure: number of genomes vs. amino acid]

26 Topology can be manipulated (Nature 341:456)
[Figure: Lep constructs expressed in E. coli; labels 10+, 2+, 4+, 0+, PK]

27 Topology prediction: a classical problem in bioinformatics

28 Three important characteristics
- ~20 hydrophobic residues
- the 'positive inside' rule
- Trp, Tyr
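The first characteristic, a stretch of roughly 20 hydrophobic residues, is classically detected with a sliding-window hydrophobicity plot (the "h-plot" listed on the next slide). A minimal sketch, assuming the Kyte-Doolittle scale, a 19-residue window and a cutoff of 1.6, which are common choices but not parameters given in this presentation; `hydrophobic_peaks` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry j is centred on 0-based residue j + window // 2."""
    return [sum(KD[aa] for aa in seq[j:j + window]) / window for j in range(len(seq) - window + 1)]


def hydrophobic_peaks(seq, window=19, threshold=1.6):
    """(first, last) 1-based window-centre positions of each run of windows above threshold."""
    half = window // 2
    runs, start, last = [], None, None
    for j, value in enumerate(hydropathy_profile(seq, window)):
        centre = j + half + 1  # 1-based position of the window centre
        if value > threshold:
            if start is None:
                start = centre
            last = centre
        elif start is not None:
            runs.append((start, last))
            start = None
    if start is not None:
        runs.append((start, last))
    return runs


toy = "D" * 20 + "L" * 21 + "D" * 20
print(hydrophobic_peaks(toy))  # [(25, 37)]
```

The result lists window centres, not helix boundaries; turning a peak into a predicted TM segment needs a further rule.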

29 Popular topology predictors
- TMHMM (HMM)
- HMMTOP (HMM)
- TopPred (h-plot + PI rule)
- MEMSAT (dynamic programming)
- TMAP (h-plot, multiple alignment)
- PHD (NN, multiple alignment)

30 TopPred (JMB 225:487), http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
- construct all possible topologies
- rank them by the difference in positively charged residues (K+R) between the two sides of the membrane
[Figure: E. coli LacY]
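The two TopPred steps can be sketched as follows: every combination of putative segments is tried (certain segments are always kept), each with the N-terminus inside or outside, and the topologies are ranked by their inside-minus-outside K+R count. Segment coordinates (1-based, inclusive, non-overlapping), the certain/putative split and all function names are assumptions; the published method includes further details not modelled here.

```python
from itertools import combinations


def loops(seq, segments):
    """Loop sequences before, between and after 1-based, inclusive, non-overlapping TM segments."""
    pieces, prev = [], 0
    for start, end in sorted(segments):
        pieces.append(seq[prev:start - 1])
        prev = end
    pieces.append(seq[prev:])
    return pieces


def delta_kr(seq, segments, n_term_inside):
    """Inside-minus-outside K+R count; loops alternate sides across each segment."""
    inside = outside = 0
    for k, loop in enumerate(loops(seq, segments)):
        kr = sum(aa in "KR" for aa in loop)
        if (k % 2 == 0) == n_term_inside:
            inside += kr
        else:
            outside += kr
    return inside - outside


def rank_topologies(seq, certain, putative):
    """All topologies from the certain segments plus any subset of putative ones, best first."""
    results = []
    for r in range(len(putative) + 1):
        for chosen in combinations(putative, r):
            segments = sorted(list(certain) + list(chosen))
            for n_in in (True, False):
                results.append((delta_kr(seq, segments, n_in), segments, "in" if n_in else "out"))
    results.sort(key=lambda t: t[0], reverse=True)
    return results


seq = "MKKR" + "L" * 20 + "DDDD" + "L" * 20 + "KRKA"
for delta, segments, n_term in rank_topologies(seq, certain=[(5, 24)], putative=[(29, 48)]):
    print(delta, segments, "N-terminus", n_term)
```

Here the top-ranked topology keeps the putative second segment, because that places both K/R-rich tails on the inside.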

31 TMHMM: a hidden Markov model-based method (Sonnhammer et al., ISMB 6:175; Krogh et al., JMB 305:567)
www.cbs.dtu.dk, www.sbc.su.se

32 HMMTOP (Tusnady & Simon, JMB 283:489)

33 Helix & loop models in TMHMM
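The idea of separate helix and loop submodels can be illustrated with a drastically reduced four-state HMM: an inside loop, an outside loop, and two helix states (inside-to-outside and outside-to-inside) that force loops to alternate sides. All probabilities and the three-class residue alphabet are made-up assumptions, and unlike TMHMM there is no helix- or loop-length modelling. The optional `fixed` argument (hypothetical) pins the label at chosen positions, the kind of constraint used on the later slides where the C-terminus is fixed to the inside.

```python
import math

HYDROPHOBIC = set("AILMFVWC")  # assumed residue grouping


def residue_class(aa):
    if aa in "KR":
        return "pos"
    return "hyd" if aa in HYDROPHOBIC else "other"


STATES = ["i", "Hio", "o", "Hoi"]  # Hio: helix crossing in->out, Hoi: out->in
LABEL = {"i": "i", "Hio": "h", "o": "o", "Hoi": "h"}
START = {"i": 0.5, "o": 0.5}
TRANS = {
    "i": {"i": 0.9, "Hio": 0.1},
    "Hio": {"Hio": 0.95, "o": 0.05},
    "o": {"o": 0.9, "Hoi": 0.1},
    "Hoi": {"Hoi": 0.95, "i": 0.05},
}
HELIX_EMIT = {"hyd": 0.85, "pos": 0.01, "other": 0.14}
EMIT = {
    "i": {"hyd": 0.2, "pos": 0.3, "other": 0.5},  # K/R enriched inside
    "o": {"hyd": 0.2, "pos": 0.05, "other": 0.75},
    "Hio": HELIX_EMIT,
    "Hoi": HELIX_EMIT,
}


def log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(seq, fixed=None):
    """Most probable i/h/o labelling; `fixed` maps 0-based positions to required labels."""
    fixed = fixed or {}

    def emit(state, k):
        if k in fixed and LABEL[state] != fixed[k]:
            return -math.inf  # forbid states that contradict the constraint
        return log(EMIT[state][residue_class(seq[k])])

    score = {s: log(START.get(s, 0.0)) + emit(s, 0) for s in STATES}
    backpointers = []
    for k in range(1, len(seq)):
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + log(TRANS[r].get(s, 0.0)))
            new_score[s] = score[prev] + log(TRANS[prev].get(s, 0.0)) + emit(s, k)
            ptr[s] = prev
        score = new_score
        backpointers.append(ptr)
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))


seq = "KKRKSKSK" + "L" * 20 + "SDNSTSGS"
print(viterbi(seq))
print(viterbi(seq, fixed={len(seq) - 1: "i"}))
```

Because this toy has no length constraints, pinning the C-terminus inside makes it insert an unrealistically short extra helix; a model with explicit helix-length submodels, such as TMHMM, avoids that.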

34 TMHMM performance (Krogh et al., JMB 305:567; Melén et al., JMB 327:735)
- Discrimination of globular vs. membrane proteins: sensitivity and specificity > 98%
- Correct topology: 55-60%
- Identification of single TM helices: sensitivity 96%, specificity 98%
- Training set: 160 membrane proteins, 650 globular proteins

35 Can performance be improved?
- consensus predictions
- multiple alignments
- experimental constraints

36 'Consensus' predictions indicate reliability (FEBS Lett. 486:267)
- 5 prediction methods used
- 46% of 764 predicted E. coli inner membrane proteins are in the 5/0 or 4/1 classes
[Figure: fraction correct and coverage vs. majority level, 60 E. coli proteins]
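Reading the 5/0 and 4/1 classes as "all five methods agree" and "four agree, one differs" (an assumption), a consensus class can be computed by counting identical predictions. Writing each predicted topology as a string, and the function names, are hypothetical choices for this sketch.

```python
from collections import Counter


def consensus_class(predictions):
    """Return the most common topology and an 'agree/disagree' class label such as '4/1'."""
    topology, votes = Counter(predictions).most_common(1)[0]
    return topology, f"{votes}/{len(predictions) - votes}"


def high_consensus_fraction(per_protein_predictions):
    """Fraction of proteins whose five predictions fall in the 5/0 or 4/1 classes."""
    classes = [consensus_class(p)[1] for p in per_protein_predictions]
    return sum(c in ("5/0", "4/1") for c in classes) / len(classes)


# Hypothetical topologies written as "N-terminal side,number of TM helices".
proteins = [
    ["in,6", "in,6", "in,6", "in,6", "in,6"],
    ["in,6", "in,6", "in,6", "in,6", "out,5"],
    ["in,4", "in,4", "out,3", "out,3", "in,2"],
]
print([consensus_class(p)[1] for p in proteins])  # ['5/0', '4/1', '2/3']
print(high_consensus_fraction(proteins))
```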

37 TMHMM reliability scores (Melén et al., JMB 327:735)
TMHMM output:
1. Mean probability, p_mean
2. Minimum probability, p_min (of the predicted label)
3. P_bestPath / P_allPaths

Sequence: M     C     Y     G     K     C     I
p(i):     0.78  0.78  0.78  0.76  0.76  0.08  0.03
p(h):     0.00  0.00  0.02  0.02  0.15  0.85  0.93
p(o):     0.22  0.22  0.20  0.20  0.08  0.07  0.04
Label:    i     i     i     i     i     h     h
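Assuming p_mean and p_min are the mean and minimum, over all positions, of the posterior probability of the predicted label (our reading of this slide), both can be computed directly from the example table. The third score needs the model's best-path and total probabilities, which the table does not give, so it is not computed here. The function name is hypothetical.

```python
def reliability_scores(posteriors, labels):
    """p_mean and p_min of the posterior probability assigned to each predicted label."""
    p_label = [post[label] for post, label in zip(posteriors, labels)]
    return sum(p_label) / len(p_label), min(p_label)


# Posterior probabilities copied from the slide (sequence M C Y G K C I).
p_i = [0.78, 0.78, 0.78, 0.76, 0.76, 0.08, 0.03]
p_h = [0.00, 0.00, 0.02, 0.02, 0.15, 0.85, 0.93]
p_o = [0.22, 0.22, 0.20, 0.20, 0.08, 0.07, 0.04]
posteriors = [{"i": a, "h": b, "o": c} for a, b, c in zip(p_i, p_h, p_o)]
p_mean, p_min = reliability_scores(posteriors, "iiiiihh")
print(round(p_mean, 3), p_min)  # 0.806 0.76
```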

38 TMHMM (score 3): prediction accuracy vs. coverage
[Figure: percent correct vs. coverage, 92 bacterial proteins; marked values ~70% and ~45%]
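An accuracy-versus-coverage curve of this kind can be produced by ranking proteins by reliability score and, for each cut-off, reporting the fraction of correct predictions among the proteins kept. A minimal sketch; the input format (score, whether the predicted topology was correct) and the example values are hypothetical.

```python
def accuracy_vs_coverage(scored):
    """scored: (score, correct) pairs. Returns (coverage, fraction correct) for each top-k cut-off."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    curve, n_correct = [], 0
    for k, (_, correct) in enumerate(ranked, start=1):
        n_correct += correct
        curve.append((k / len(ranked), n_correct / k))
    return curve


example = [(0.9, True), (0.8, True), (0.3, False), (0.5, True)]
print(accuracy_vs_coverage(example))  # [(0.25, 1.0), (0.5, 1.0), (0.75, 1.0), (1.0, 0.75)]
```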

39 "Experimentally known topologies" is a biased sample
[Figure: percent vs. score interval (0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1)]

40 Correlation between accuracy and TMHMM S3 score
[Figure: percent correct vs. mean score]

41 Expected TMHMM performance on proteomes
[Figure: percent correct vs. coverage for E. coli, S. cerevisiae, C. elegans and the test set]
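Because the experimentally characterized test set is biased toward high scores, one way to estimate performance on a whole proteome is to measure accuracy separately in each score interval of the test set and then reweight by how the proteome's scores are distributed. This sketch assumes that reading of the preceding slides and uses the four score intervals from slide 39; all names and numbers are hypothetical.

```python
BINS = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]


def bin_index(score):
    """Index of the score interval; the last interval includes 1.0."""
    for i, (lo, hi) in enumerate(BINS):
        if lo <= score < hi or (i == len(BINS) - 1 and score == hi):
            return i
    raise ValueError(f"score out of range: {score}")


def per_interval_accuracy(test_set):
    """Fraction correct per interval for (score, correct) pairs; None where the interval is empty."""
    hits, totals = [0] * len(BINS), [0] * len(BINS)
    for score, correct in test_set:
        i = bin_index(score)
        totals[i] += 1
        hits[i] += correct
    return [h / t if t else None for h, t in zip(hits, totals)]


def expected_accuracy(test_set, proteome_scores):
    """Per-interval test-set accuracy, weighted by the proteome's score distribution.

    Intervals without test-set proteins are skipped because their accuracy is unknown."""
    accuracy = per_interval_accuracy(test_set)
    counts = [0] * len(BINS)
    for score in proteome_scores:
        counts[bin_index(score)] += 1
    usable = [(n, acc) for n, acc in zip(counts, accuracy) if acc is not None]
    total = sum(n for n, _ in usable)
    return sum(n * acc for n, acc in usable) / total


test_set = [(0.1, False), (0.2, True), (0.6, True), (0.9, True), (0.95, True)]
proteome = [0.1, 0.1, 0.1, 0.8]
print(expected_accuracy(test_set, proteome))  # 0.625
```

The test set alone would suggest 80% correct, but a proteome dominated by low-scoring proteins pulls the expected accuracy down.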

42 Experimental information helps (JMB 327:735)
[Figure: original TMHMM prediction, with one TM helix missing; TMHMM prediction with the C-terminus fixed to the inside]

43 Experimental information helps (JMB 327:735)
When the location of the C-terminus is known, the correct topology is predicted for an estimated ~70% of all membrane proteins (~55% when it is not known).
