Error tolerant search
A large number of spectra remain without a significant score, and a reasonable number of fragment ion peaks might not have matched. Possible reasons:
– Underestimated mass measurement error (should be visible in the peptide view graphs),
– Incorrect determination of the precursor charge state,
– The peptide sequence is not in the database,
– Missed cleavage and unexpected cleavage,
– Unexpected chemical and post-translational modifications.
The structure, function and activity of a protein can be determined by the modifications of that protein. An increasing share of the proteins that have been linked to, e.g., different diseases change not only in expression level but also, or exclusively, in their level of post-translational modification.
Post-Translational Modifications (PTMs)
A PTM alters the mass of an amino acid, and hence of the peptide, which results in peak shifts in the spectrum.
Example peptide: H Q S V M V G M V Q K
– b ions: b1: H, b2: HQ, b3: HQS, b4: HQSV, b5: HQSVM, …, b10: HQSVMVGMVQ
– y ions: y10: QSVMVGMVQK, y9: SVMVGMVQK, y8: VMVGMVQK, y7: MVGMVQK, y6: VGMVQK, …, y1: K
[Figure: fragment ion spectrum along the m/z axis, peaks labelled b1 … b10 and y1 … y10]
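The peak shift can be made concrete with a small calculation. The sketch below is only an illustration, not part of the original slides: it computes singly charged b- and y-ion m/z values for the example peptide and then repeats the calculation with a hypothetical oxidation (+15.9949 Da) on the methionine at position 5. The function name `fragment_ions` and the choice of modification are assumptions made for the example; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "V": 99.06841, "Q": 128.05858,
    "K": 128.09496, "M": 131.04049, "H": 137.05891,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def fragment_ions(peptide, mods=None):
    """Return (b, y) lists of singly charged fragment m/z values.

    mods maps a 0-based residue index to a mass shift in Da
    (hypothetical helper, written for this example only).
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    n = len(masses)
    # b_i: first i residues plus a proton; y_i: last i residues plus water and a proton.
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


peptide = "HQSVMVGMVQK"
b_plain, y_plain = fragment_ions(peptide)
# Assumed modification: oxidation of the methionine at position 5 (index 4).
b_ox, y_ox = fragment_ions(peptide, {4: 15.9949})

for i in range(len(b_plain)):
    print(f"b{i + 1}: {b_plain[i]:9.4f} -> {b_ox[i]:9.4f}   "
          f"y{i + 1}: {y_plain[i]:9.4f} -> {y_ox[i]:9.4f}")
```

Only the fragments that contain the modified residue move: b5 to b10 and y7 to y10 shift by the modification mass, while b1 to b4 and y1 to y6 stay where they were.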
PTMs
– Complete (fixed) modifications, e.g. chemical modifications
– Variable modifications

PTMs: Obstacles
– Complexity (means longer execution time): can increase the search space by orders of magnitude
– Significance
Obstacles - Complexity
Let the theoretical peptide be:
– HQSVMVGMVQK (11 amino acids)
– Each amino acid can be modified by, let’s say, 5 PTMs

# included PTMs | # modified theoretical spectra | time
0 | 1 | 1 sec
1 | 11*5 = 55 | 55 seconds (1 min)
2 | 11*25 = … | … mins
3 | 11*15*125 = … | … hours
… | … | … hours (3.5 years)

In general: peptide length = L, included PTMs = K, PTMs per amino acid = M
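The general formula itself did not survive on this slide. One counting model that is consistent with the surviving rows for 1 and 3 included PTMs (11*5 and 11*15*125) is "choose K of the L residues, then pick one of M PTMs for each chosen residue". The sketch below works under that assumption, and under the table's implied rate of one theoretical spectrum scored per second; `modified_spectra_count` is a name made up for this example.

```python
from math import comb


def modified_spectra_count(length, n_modified, mods_per_residue):
    """Number of modified variants under the assumed model:
    choose which residues carry a PTM, then choose the PTM on each."""
    return comb(length, n_modified) * mods_per_residue ** n_modified


SECONDS_PER_SPECTRUM = 1  # assumption taken from the first table row (1 spectrum, 1 sec)

for k in range(4):
    n = modified_spectra_count(11, k, 5)
    hours = n * SECONDS_PER_SPECTRUM / 3600
    print(f"{k} PTMs: {n} spectra, about {hours:.2f} hours")
```

Even in this toy setting, each additional PTM allowed on the peptide multiplies the number of candidate spectra, which is why execution time grows so quickly.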
Significance
Allowing PTMs increases the number of random matches.
– Inserting many PTMs makes the theoretical spectra too flexible; in the end, every theoretical spectrum can be aligned to the experimental spectrum.
Computational Identification of PTMs
3 approaches:
– Targeted
– Untargeted, also called restricted
– Unrestricted (de novo, blind search)
Targeted approach
Almost all search engines support it.
– The experimenter needs to guess the PTMs in the sample.
– Two-pass strategy: two rounds, refinement on a smaller …
– Sequest, Mascot

Targeted approach – X!Tandem

Targeted approach – InsPecT
Untargeted approaches
Use a large list (database) of known modifications.
– The search space is limited but can be huge.
– If we allow 5 of the 10 most frequent modifications to occur in a peptide at the same time, the search space grows by 3 orders of magnitude.
– The growth is more dramatic if, instead of 10 types of modifications, we wish to consider all of the roughly 500 known types.
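The slide does not say how the "3 orders of magnitude" figure was derived. As a rough plausibility check only, the sketch below counts the sets of modification types that could co-occur on one peptide, allowing up to 5 at a time. This counting model and the function name `type_combinations` are assumptions for illustration; they ignore where on the peptide each modification sits.

```python
from math import comb


def type_combinations(n_types, max_per_peptide):
    """Number of distinct sets of modification types with at most
    max_per_peptide members (assumed counting model)."""
    return sum(comb(n_types, k) for k in range(max_per_peptide + 1))


for n_types in (10, 500):
    print(f"{n_types} types, up to 5 at once: {type_combinations(n_types, 5)} combinations")
```

Under this assumption, 10 types give a few hundred combinations, close to the three orders of magnitude quoted above, while 500 types give more than 10^11.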
Databases of PTMs
– Unimod: contains 906 modifications
– RESID: 559 entries

Untargeted: PILOT_PTM
– Uses a large dataset of modifications.
– Binary linear programming: the objective function is the number of matched peaks; the linear constraints guarantee meaningful modifications of the peptide.

Unrestricted
– No a priori information about the PTMs is used.
– De novo identification of PTMs.
– The search space is infinite.
– In practice, no more than one or two PTMs can be identified on the same peptide.
TwinPeaks approach
– Based on the Sequest idea.
– Shifts the experimental spectrum over a range of mass offsets and plots the similarity score as a function of the mass shift.
[Figure: similarity score (sum of matched intensity) plotted against the mass shift]
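The shift-and-score idea can be illustrated with a toy example. This is a simplified sketch, not the TwinPeaks implementation: the peak lists are invented, the score is simply the summed intensity of experimental peaks that land on a theoretical peak after shifting, and `shift_scores` is a hypothetical helper name.

```python
def shift_scores(theoretical_mz, experimental_peaks, shifts, tol=0.5):
    """For each candidate mass shift, sum the intensity of the experimental
    peaks that match a theoretical peak once the shift is subtracted."""
    scores = {}
    for shift in shifts:
        total = 0.0
        for mz, intensity in experimental_peaks:
            if any(abs((mz - shift) - t) <= tol for t in theoretical_mz):
                total += intensity
        scores[shift] = total
    return scores


# Invented theoretical fragment masses for an unmodified peptide.
theoretical = [100.0, 230.0, 345.0, 470.0, 610.0]
# Invented experimental peaks: the last three fragments carry a +16 Da modification.
experimental = [(100.0, 1.0), (230.0, 1.0), (361.0, 1.0), (486.0, 1.0), (626.0, 1.0)]

scores = shift_scores(theoretical, experimental, range(-50, 51))
best_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(best_two)
```

In this toy data the score has two maxima, one at shift 0 (the unmodified fragments) and one at the modification mass (the modified fragments), which is the pattern the plot of score against mass shift is meant to reveal.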
MS-Alignment
– Based on the alignment of the theoretical spectrum to the experimental spectrum.
[Figure: theoretical spectrum aligned against the experimental spectrum]
Comparison of targeted and unrestricted results

X!Tandem (targeted). Columns: Scan ID, log(-E), Peptide
fqyr 295 ILTAAALCHF TSIEVVK 311 kasg (130)
rihr 159 FVEKPQVFVS NK 170 inag (471)
rtcr 30 SPEPGPSSSI GSPQASSPPR PN 51 hyll (48)
dvtr 473 TMHFGTPTAY EK 484 ecft (306)
ietk 133 FFDDDLLVST SR 144 vrlf (176)
pskr 237 QTNGCLNGYT PSR 249 krqa (112)
ntpr 149 KNGGLGHMNI ALLSDLTK 166 qisr (1776)
pqgr 19 IHQIEYAMEA VK 30 qgsa (10317)
kefk 80 DREDLVPYTG EK 91 rgkv (137)
dyhr 131 YLAEFATGND R 141 keaa (9406)
grar 16 QYTSPEEIDA QLQAEK 31 qkar (2754)
rlar 172 QDPQLHPEDP ER 183 raai (644)
iflh 92 ISDVEGEYVP VEGDEVTYK 110 mcsi (73)
mrsr 328 TASGSSVTSL DGTR 341 srsh (2698)
lgnk 29 YVQLNVGGSL YYTTVR 44 altr (71)
dlqk 183 EGEFSTCFTE LQR 195 dflk (239)
pkek 135 QPVAGSEGAQ YR 146 kkql (694)
lsar 446 ASNAWILQQH IATVPSLTHL CR 467 leir (107)
evyr 175 NSMPASSFQQ QK 186 lrvc (7099)
iygk 81 QFEDELHPDL K 91 ftga (491)

MS-Alignment, unrestricted (de novo). Columns: Scan ID, P-value, Peptide
3 | 1.00E-05 | R.ILTAAALCHFTSIEVVK.K
6 | 1.00E-05 | R.FVEKPQVFVSNK.I
… | …E-05 | K.FFDDDLLVSTSR.V
… | …E-05 | R.IHQIEYAMEAVK.Q
… | … | A.V+172LTAFANGR.S
… | …E-05 | K.QFEDELHPDLK.F
… | … | R.ETFY+18LAQDFFDR.F
… | …E-05 | R.TCLSQLLDIMK.S
… | …E-05 | K.EYFSTFGEVLM+16VQVK.K
… | …E-05 | K.QH-18LENDPGSNEDTDIPK.G
… | … | Q.L+128GVSHVFEYIR.S
… | … | C.T+160EDMTEDELR.E
… | …E-05 | R.EFFD-18SNGNFLYR.I
… | …E-05 | R.LVLESPAPVEVNLK.L
… | …E-05 | K.LQEFAYVTDGAC+14SEEDILR.M
… | …E-05 | K.SFDENGFDYLLTYSDNPQTVFP+156.R
… | …E-05 | R.GPATVEDLPSAFEEK.A
… | …E-05 | Y.ITD+163VLTEEDALEILQK.G
… | …E-05 | R.IYSYQMALTPVVVTLWYR.A
Validate your results

Summary
What you should remember:
– PTM identification is computationally expensive.
– There are 3 approaches (targeted, untargeted, unrestricted).
– Always examine the results and omit implausible PTMs.
– Searching for PTMs decreases the statistical significance.
– The more you look for, the less you get (due to significance).