Localization prediction of transmembrane proteins Stefan Maetschke, Mikael Bodén and Marcus Gallagher The University of Queensland.

2 Protein classes
– Soluble
– Membrane
  – Peripheral
  – Integral
    – Anchored
    – Transmembrane
      – α-helical: single-spanning or multi-spanning
      – β-barrel

3 Transmembrane protein types
Diagram of Type-I, Type-II, Type-III and Type-IV (multi-spanning) transmembrane proteins, showing the orientation of the N- and C-termini relative to the cytosol (inside) and the signal peptide.

4 Eukaryotic cell
Diagram labels: nucleus, mitochondrion, peroxisome, lysosome, endoplasmic reticulum, Golgi complex, ERGIC (ER-Golgi intermediate compartment), endosome, RNA, ribosome.

5 Secretory and endocytic pathway

6 Problem and hypothesis
– Sorting signals for transmembrane proteins serve multiple purposes (targeting, retention, retrieval, avoidance) and are largely unknown (the problem is challenging and multi-faceted).
– Current localization prediction of eukaryotic transmembrane proteins is poor (models based on soluble proteins are ill-suited; previous work is inadequate or incomplete).
– Localization prediction for transmembrane proteins is virtually unexplored (data are scarce and variable; it is an open problem).
– Hypothesis: explicit modelling of protein topology should enhance localization prediction accuracy, because parameter tuning receives explicit guidance towards biologically sensible solutions (the way to do it!).

7 Hidden Markov model
– States S1, S2, S3, connected left to right
– Initial state probabilities
– State transition probabilities: a11, a12, a22, a23, a33
– Observation probabilities: b1, b2, b3, each a distribution over the 20 amino acids (A, R, ..., V)
– State sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3, which emits the observation sequence one residue per state
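The slide describes a standard first-order HMM: a hidden state path (here S1 to S2 to S3, with self-loops) emits one amino acid per position. As a minimal sketch, assuming this textbook formulation rather than the authors' implementation, the forward algorithm below computes the likelihood of a sequence by summing over all state paths. The function name `forward_log_likelihood` and the toy three-state parameters are illustrative assumptions.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard residues, A, R, ..., V


def forward_log_likelihood(seq, init, trans, emit):
    """Return log P(seq | model) for a first-order HMM (forward algorithm).

    init[i]     : initial probability of state i
    trans[i][j] : probability of a transition from state i to state j
    emit[i][x]  : probability that state i emits residue x
    Each step is rescaled to sum to 1 to avoid numerical underflow;
    the logs of the scaling factors add up to the log-likelihood.
    """
    n = len(init)
    alpha = [init[i] * emit[i][seq[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            # alpha_j(t) = sum_i alpha_i(t-1) * a_ij * b_j(x_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][seq[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


# Toy left-to-right model S1 -> S2 -> S3 (hypothetical numbers).
init = [1.0, 0.0, 0.0]
trans = [
    [0.7, 0.3, 0.0],  # a11, a12
    [0.0, 0.8, 0.2],  # a22, a23
    [0.0, 0.0, 1.0],  # a33
]
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
emit = [uniform, uniform, uniform]
print(forward_log_likelihood("MKTAYIAKQR", init, trans, emit))
```

With uniform emissions every path contributes the same emission term, so the printed value is 10 · log(1/20), about -29.96, whatever the transitions are; per-state emission distributions are what make the state path matter.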

8 Second-order hidden Markov model
– Same states (S1, S2, S3), initial state probabilities and state transition probabilities (a11, a12, a22, a23, a33) as the first-order model
– Observation probabilities: b1, b2, b3, each over the 400 di-peptides (AA, AR, AN, AD, ..., VV)
– State sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
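In the second-order model each state holds 400 observation probabilities, one per di-peptide. One way to read this, assumed here, is that a state emits residue x_t conditioned on the previous residue x_(t-1), i.e. 20 conditional distributions of 20 residues each. The sketch below estimates such tables by counting di-peptides in sequences whose state paths are known, with Laplace pseudocounts. The function name, the training-data format and the pseudocount of 1 are illustrative assumptions, not the authors' training procedure.

```python
from collections import defaultdict

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def estimate_second_order_emissions(labeled_seqs, pseudocount=1.0):
    """Estimate P(x_t | x_(t-1), state) from (sequence, state_path) pairs.

    Returns {state: {(prev, cur): probability}} with 20 x 20 = 400 entries
    per state. The first residue of each sequence has no predecessor and
    is skipped in this sketch.
    """
    counts = defaultdict(lambda: defaultdict(float))  # counts[state][(prev, cur)]
    states = set()
    for seq, path in labeled_seqs:
        states.update(path)
        for t in range(1, len(seq)):
            counts[path[t]][(seq[t - 1], seq[t])] += 1.0

    emit = {}
    for state in states:
        table = {}
        for prev in AMINO_ACIDS:
            # Smoothed counts for every residue that can follow `prev`.
            row = {cur: counts[state][(prev, cur)] + pseudocount for cur in AMINO_ACIDS}
            total = sum(row.values())
            for cur in AMINO_ACIDS:
                table[(prev, cur)] = row[cur] / total
        emit[state] = table
    return emit


# Hypothetical example: one sequence, every residue assigned to state "s1".
emit = estimate_second_order_emissions([("AAAR", ["s1", "s1", "s1", "s1"])])
print(len(emit["s1"]))         # 400 di-peptide entries
print(emit["s1"][("A", "A")])  # P(A | previous A, state s1)
```

Each of the 20 rows (one per preceding residue) is a proper distribution, so the pseudocount keeps unseen di-peptides from receiving probability zero.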

9 Third-order hidden Markov model
– Same states (S1, S2, S3), initial state probabilities and state transition probabilities (a11, a12, a22, a23, a33) as the first-order model
– Observation probabilities: b1, b2, b3, each over the 8000 tri-peptides (AAA, AAR, AAN, AAD, AAC, AAQ, ..., VVV)
– State sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3
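The observation alphabets on slides 7 to 9 grow as 20^k for order k: 20 amino acids, 400 di-peptides and 8000 tri-peptides per state. The short sketch below enumerates these contexts with `itertools.product` over the residue order A, R, N, D, C, Q, ..., V, which reproduces the listing on the slides (AA, AR, AN, AD, ...; AAA, AAR, AAN, AAD, AAC, AAQ, ...). The helper names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # A, R, N, D, C, Q, ..., V


def kmer_contexts(order):
    """All k-mers over the 20 amino acids, in the enumeration order used on the slides."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=order)]


def emission_table_entries(n_states, order):
    """Observation-probability table entries for an order-k model with n_states states."""
    return n_states * len(AMINO_ACIDS) ** order


for k in (1, 2, 3):
    contexts = kmer_contexts(k)
    print(k, len(contexts), contexts[:4], emission_table_entries(3, k))
```

For the three-state model drawn on the slides this prints 60, 1200 and 24000 table entries for orders 1, 2 and 3: each extra order multiplies the number of probabilities to estimate from the same training data by 20.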

10 Signal peptide
Regions from the N-terminus: N-terminal region, hydrophobic core, cleavage region, followed by the mature protein.
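In an HMM the three signal-peptide regions can be chained left to right, each state looping on itself before moving on. Assuming a single state per region (a simplification; the slide does not show the submodel's internal states), the number of residues spent in a region with self-transition probability p is geometric with mean 1/(1 - p). The sketch below builds such a chain and computes the expected signal-peptide length; the self-loop values and function names are hypothetical, not fitted parameters.

```python
# Hypothetical self-transition probabilities for a one-state-per-region chain.
SIGNAL_PEPTIDE_REGIONS = [
    ("n-region", 0.80),  # N-terminal region
    ("h-region", 0.90),  # hydrophobic core
    ("c-region", 0.75),  # cleavage region
]


def chain_transitions(regions, next_label="mature"):
    """Left-to-right transitions: stay with p, move to the next region with 1 - p."""
    labels = [name for name, _ in regions] + [next_label]
    trans = {}
    for k, (name, p) in enumerate(regions):
        trans[name] = {name: p, labels[k + 1]: 1.0 - p}
    return trans


def expected_region_length(self_loop):
    """Mean run length of a state with self-loop probability p: 1 / (1 - p)."""
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self-loop probability must lie in [0, 1)")
    return 1.0 / (1.0 - self_loop)


def expected_signal_peptide_length(regions):
    return sum(expected_region_length(p) for _, p in regions)


print(chain_transitions(SIGNAL_PEPTIDE_REGIONS)["c-region"])
print(expected_signal_peptide_length(SIGNAL_PEPTIDE_REGIONS))  # about 19 residues
```

A single self-looping state can only produce geometric length distributions (most likely length 1); chaining several states per region is the usual way to shape them differently.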

11 Transmembrane domain
Submodels: icap, TMD, ocap (the caps flank the TMD on the inside and the outside, respectively).

12 Protein topology model
Submodels: N-term, SP (signal peptide), icap, TMD, ocap and C-term, arranged relative to the inside and outside of the membrane.
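A decoded state path through this topology model can be read back as a topology. The sketch below assumes a hypothetical one-letter label per residue (n = N-term, s = SP, i = icap, M = TMD, o = ocap, c = C-term), collapses the path into segments, counts TMDs, and uses the cap in front of the first TMD to decide whether the N-terminal part lies inside (cytosol) or outside. The labels and function names are illustrative; how the authors' model encodes its states is not shown on the slide.

```python
from itertools import groupby

# Hypothetical per-residue labels:
# n = N-term, s = SP, i = icap, M = TMD, o = ocap, c = C-term


def segments(path):
    """Collapse a label string into (label, length) runs."""
    return [(label, len(list(run))) for label, run in groupby(path)]


def summarize_topology(path):
    """Count TMDs and locate the N-terminal part relative to the membrane."""
    segs = segments(path)
    tmd_positions = [k for k, (label, _) in enumerate(segs) if label == "M"]
    summary = {
        "tmds": len(tmd_positions),
        "signal_peptide": any(label == "s" for label, _ in segs),
        "n_term_side": None,
        "spanning": None,
    }
    if tmd_positions:
        first = tmd_positions[0]
        cap = segs[first - 1][0] if first > 0 else None
        # icap marks the inside (cytosol), ocap the outside of the membrane.
        summary["n_term_side"] = {"i": "inside", "o": "outside"}.get(cap)
        summary["spanning"] = "single" if len(tmd_positions) == 1 else "multi"
    return summary


print(summarize_topology("nnnniiMMMMMMMMooocccc"))    # one TMD, N-term inside
print(summarize_topology("ssssssoooMMMMMMMMiiiccc"))  # signal peptide, N-terminal part outside
```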

13 Localization model (5 × topology models)
Diagram of the eukaryotic cell: nucleus, mitochondrion, peroxisome, lysosome, endoplasmic reticulum, Golgi complex, ERGIC, endosome.
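The slide describes the localization model as five topology models, which matches the five locations evaluated on slide 15 (ER, PM, GO, EN, LY). A common way to combine per-class HMMs, assumed here rather than taken from the slides, is to score a protein under each model and predict the location whose model gives the highest log-likelihood, optionally adding a log prior. The sketch below uses toy single-state models; the parameters and function names are hypothetical.

```python
import math


def log_likelihood(seq, init, trans, emit):
    """Scaled forward algorithm: log P(seq | HMM) for a first-order HMM."""
    n = len(init)
    alpha = [init[i] * emit[i].get(seq[0], 0.0) for i in range(n)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j].get(seq[t], 0.0)
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


def predict_location(seq, models, log_priors=None):
    """Return the best-scoring location and the score of every location model."""
    scores = {}
    for location, (init, trans, emit) in models.items():
        prior = log_priors[location] if log_priors else 0.0
        scores[location] = log_likelihood(seq, init, trans, emit) + prior
    return max(scores, key=scores.get), scores


# Toy one-state models over a two-letter alphabet (hypothetical numbers).
models = {
    "ER": ([1.0], [[1.0]], [{"L": 0.8, "K": 0.2}]),
    "PM": ([1.0], [[1.0]], [{"L": 0.3, "K": 0.7}]),
}
print(predict_location("LLLK", models)[0])  # ER
print(predict_location("KKKL", models)[0])  # PM
```

Passing `log_priors` turns the rule into a maximum a posteriori decision, which shifts predictions towards frequent locations.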

14 LOCATE dataset
– Subset of the LOCATE database (FANTOM3, mouse proteome)
– Filtered for transmembrane proteins
– No multi-targeted proteins
– Redundancy reduced (<25%)
– TMDs and SPs are labeled (predicted)
– High-quality localization annotation
Proteins per location: 873 plasma membrane, 261 endoplasmic reticulum, 141 Golgi complex, 45 lysosome, 31 endosome (1351 in total).
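The location counts on this slide add up to the stated total of 1351 proteins and are strongly imbalanced. The short sketch below computes each location's share and the accuracy of a trivial classifier that always predicts the plasma membrane; the helper names are hypothetical. This imbalance is one reason a per-class measure such as the Matthews correlation coefficient is more informative than raw accuracy.

```python
# Protein counts per location, as listed on the slide.
LOCATE_COUNTS = {
    "Plasma membrane": 873,
    "Endoplasmic reticulum": 261,
    "Golgi complex": 141,
    "Lysosome": 45,
    "Endosome": 31,
}


def class_fractions(counts):
    total = sum(counts.values())
    return {location: n / total for location, n in counts.items()}


def majority_baseline_accuracy(counts):
    """Accuracy of always predicting the most frequent location."""
    return max(counts.values()) / sum(counts.values())


print(sum(LOCATE_COUNTS.values()))  # 1351
for location, share in class_fractions(LOCATE_COUNTS).items():
    print(f"{location:22s} {100 * share:5.1f}%")
print(f"majority baseline: {100 * majority_baseline_accuracy(LOCATE_COUNTS):.1f}%")
```

The shares come out at about 64.6%, 19.3%, 10.4%, 3.3% and 2.3%, so always answering "plasma membrane" already reaches about 64.6% accuracy while telling nothing about the other four locations.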

15 Prediction performance
Prediction performance (MCC) on the LOCATE dataset:
– Measure: mean Matthews correlation coefficient (MCC), 10-fold cross-validation repeated 10 times
– Five locations: ER (endoplasmic reticulum), PM (plasma membrane), GO (Golgi complex), EN (endosome), LY (lysosome)
– Models: SVM with linear kernel; first-, second- and third-order HMMs
– Confusion matrix shown for HMM-2
=> Di-peptide composition is superior to single amino acid composition
=> The topological model is superior to the non-topological model
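The MCC for one location is computed from its one-vs-rest counts: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below derives these counts from a multi-class confusion matrix and averages the per-location values. Averaging per-location MCCs is an assumption about what "mean" refers to here, and the confusion matrix in the example is a toy, not the HMM-2 results. Under the 10 × 10-fold protocol such scores could be computed per test fold or from a pooled confusion matrix; the slide does not say which.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; set to 0 by convention when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def per_class_mcc(confusion, labels):
    """confusion[i][j] = number of proteins of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                # class k predicted as another class
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        scores[label] = mcc(tp, tn, fp, fn)
    return scores


def mean_mcc(confusion, labels):
    scores = per_class_mcc(confusion, labels)
    return sum(scores.values()) / len(scores)


# Toy three-class confusion matrix (hypothetical counts, not the slide's results).
labels = ["ER", "PM", "GO"]
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 4],
]
print(per_class_mcc(confusion, labels))
print(mean_mcc(confusion, labels))
```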

16 Predictor comparison
Compared predictors (prediction accuracy in % on a test set of 20 PM, 20 ER and 20 Golgi proteins):
– CELLO 2.5: http://cello.life.nctu.edu.tw/
– WolfPSort: http://wolfpsort.seq.cbrc.jp/
– ProteomeAnalyst 2.5: http://www.cs.ualberta.ca/~bioinfo/PA/Sub/
– HMM-2: http://pprowler.itee.uq.edu.au/TMPHMMLoc
HMM-2: only three classes, but the test set is not part of the training set.
Other predictors: more classes, but the test set is part of their training sets.
→ Difficult to compare!

17 Conclusion
– Novel predictor for the subcellular localization of transmembrane proteins along the secretory pathway: http://pprowler.itee.uq.edu.au/TMPHMMLoc
– The protein model has fewer states than topology predictors (TMHMM, HMMTOP, etc.) but is of second order
– The localization model is trained and tested on LOCATE, a recent, high-quality localization dataset
– Overall better performance than current localization predictors (transmembrane proteins, eukaryotic, secretory pathway)
  – Di-peptide composition is superior to single amino acid composition
  – The "topological" model is superior to the "non-topological" baseline model

