Published by Timothy Hayden. Modified over 9 years ago.
1
“shotgun sequencing”
[Figure: an MS scan followed by MS2 scans of the 1st through 10th precursors; axes: Relative Intensity, Fill Times, Scan Times]
2
spectral matching
[Figure: MS/MS Spectrum matched against a Protein Database]
3
“shotgun sequencing”
[Figure; axis: time]
4
“shotgun sequencing”
[Figure: ms1 and ms2 scans along a time axis]
5
distributed spectral matching
LTQ Orbitrap base peak chromatogram, 37 min LC-MS/MS run-time
6186 MS/MS spectra
2308 peptide IDs (false-positive rate 1%)
287 protein IDs
6000 spectra x 10 s/spectrum = ~16 CPU hours
Server, single CPU: search time 16 hours
Server, 20 nodes (parallel CPUs): 0.8 hours
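The search-time estimate on this slide is plain arithmetic. A minimal sketch, assuming perfect scaling across nodes with no scheduling overhead (the function name is ours, not from the slide):

```python
# Numbers taken from the slide; 10 s/spectrum is the slide's own estimate.
SPECTRA = 6000
SECONDS_PER_SPECTRUM = 10
NODES = 20

def search_hours(n_spectra, sec_per_spectrum, n_nodes=1):
    """Wall-clock hours, assuming spectra split evenly across nodes (an idealization)."""
    return n_spectra * sec_per_spectrum / 3600 / n_nodes

single = search_hours(SPECTRA, SECONDS_PER_SPECTRUM)
cluster = search_hours(SPECTRA, SECONDS_PER_SPECTRUM, NODES)
print(f"single CPU: {single:.1f} h, {NODES} nodes: {cluster:.1f} h")
```

The single-CPU figure comes out slightly above the slide's rounded 16 hours; the 20-node figure rounds to the slide's 0.8 hours.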
6
sequest
XCorr: goodness of fit between the experimental MS/MS spectrum and the theoretical b and y ions from peptides in the database
dCn: fractional XCorr difference between the highest XCorr and the next-highest XCorr
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
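The dCn definition above can be written out directly. A minimal sketch; the function name and the example scores are hypothetical, and the error handling is our own choice:

```python
def delta_cn(xcorrs):
    """Fractional difference between the best and the second-best XCorr."""
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        # Assumption: dCn is undefined without two candidates and a positive top score.
        raise ValueError("need at least two candidates and a positive top XCorr")
    return (ranked[0] - ranked[1]) / ranked[0]

# Hypothetical XCorr values for three candidate peptides of one spectrum.
example = delta_cn([3.2, 2.4, 1.1])
print(round(example, 2))
```

A dCn near 0 means the top two candidates fit almost equally well; a dCn near 1 means the best candidate stands alone.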
7
sequest: all ms2 in LC run (5000 - 25000 ms2 spectra)
[Figure: ms1 and ms2 scans along a time axis]
8
sequest: all ms2 in LC run
raw: all ms2 = 1 file
dta: 1 ms2 = 1 file (all ms2 = ~10000 files)
dta 1: 501.000 (precursor m/z), +2 (charge state), ms2 array
dta 2: 1001.500 (precursor m/z), +3 (charge state), ms2 array
9
sequest: all ms2 in LC run, searched one dta at a time
human ipi database: 61236 proteins
dta 1, precursor 1000.000 +/- 1 Da:
  digest to next peptide: MSQVQVQVQNPSAALSGSQILNK
  calculate peptide mass: 2426.258812
  compare with precursor: not a candidate
  if candidate, calculate theoretical spectrum, correlate, score & return
  repeat 3,250,000 times
dta 2, precursor 3000.000 +/- 1 Da: 2 x 3,250,000 times
dta 3: 3 x 3,250,000 times
dta 10000: 10000 x 3,250,000 times
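The digest / mass / compare loop on this slide can be sketched as follows. This is an illustration under stated assumptions, not SEQUEST's implementation: standard monoisotopic residue masses, trypsin cleaving after K or R except before P, no missed cleavages, and function names of our own choosing. The protein string is only the opening fragment of the first database entry.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(protein):
    """Cleave after K or R, except before P (assumption: no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def candidates(proteins, precursor_mass, tolerance=1.0):
    """Yield every digested peptide whose mass lies within the precursor window."""
    for protein in proteins:
        for peptide in tryptic_digest(protein):
            if abs(peptide_mass(peptide) - precursor_mass) <= tolerance:
                yield peptide

fragment = "MSQVQVQVQNPSAALSGSQILNKNQSLLSQ"  # start of the first entry only
first_peptide = tryptic_digest(fragment)[0]
print(first_peptide, round(peptide_mass(first_peptide), 4))
print(list(candidates([fragment], 1000.0)))  # nothing falls in 1000 +/- 1 Da
```

Every spectrum repeats this full pass over the database, which is where the slide's "10000 x 3,250,000 times" comes from.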
10
[Figure: theoretical “candidate” spectrum, experimental peptide spectrum, correlation spectrum]
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
11
[Figure: correlation spectrum]
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
12
[Figure: correlation spectrum]
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
13
similarity scoring: XCorr score
[Figure: correlation spectrum]
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
14
similarity scoring: cross-correlation vs dot product
[Figure: panels labeled XCorr (cross-correlation) and Dot product]
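To make the comparison concrete, here is a minimal sketch of both scores on binned spectra. It assumes the commonly described formulation in which the cross-correlation score is the zero-lag correlation minus the mean correlation over lags from -75 to +75; the spectrum preprocessing done by real search engines is omitted, and all names and example spectra are hypothetical.

```python
def dot_product(observed, theoretical):
    """Plain dot product of two equally binned intensity vectors."""
    return sum(o * t for o, t in zip(observed, theoretical))

def correlation_at(observed, theoretical, lag):
    """Correlation with the theoretical spectrum shifted by `lag` bins."""
    n = len(observed)
    return sum(observed[i] * theoretical[i + lag]
               for i in range(n) if 0 <= i + lag < n)

def xcorr(observed, theoretical, max_lag=75):
    """Zero-lag correlation minus the mean correlation at non-zero lags."""
    if len(observed) != len(theoretical):
        raise ValueError("spectra must share the same binning")
    lags = [lag for lag in range(-max_lag, max_lag + 1) if lag != 0]
    background = sum(correlation_at(observed, theoretical, lag)
                     for lag in lags) / len(lags)
    return correlation_at(observed, theoretical, 0) - background

def spike_spectrum(bins, n_bins=200):
    """Toy spectrum with unit-intensity peaks at the given bin indices."""
    spectrum = [0.0] * n_bins
    for b in bins:
        spectrum[b] = 1.0
    return spectrum

observed = spike_spectrum([50, 100, 150])
theoretical = spike_spectrum([50, 100, 150])  # perfect match
shifted = spike_spectrum([60, 110, 160])      # same pattern, wrong offset

print(dot_product(observed, theoretical), xcorr(observed, theoretical))
print(dot_product(observed, shifted), xcorr(observed, shifted))
```

The dot product only rewards aligned peaks, while the background term additionally penalizes a candidate whose peaks line up at some other offset, which is why the shifted spectrum scores below zero here.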
15
non-indexed searching
human ipi database: 61236 proteins
1st entry:
>ipi00000001.2
MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSASITSTSAAAESITPTVELNAL….
61236th entry:
>ipi00853644.1
….AKPNINLITGHLEEPMPNPIDEMTEEQKEYEAMKLVNMLDKLSREELLKPMGLKPDGTIT
precursor: 1200 +/- 1 Da
16
indexed searching
human ipi database: 61236 proteins, indexed
>ipi00001234.11 G 75 Da
>ipi00344567.1 WEFGGHTVLR 1200 +/- 1 Da
>ipi00853644.1 AKPNINLITGHLEEPMPNPIDEMTEEQEYEAMLVNMLDLSEELLKPMGLKPDGTITAKPNINLITGHLEEPMPNPIDEMTEEQEYEAMLVNMLDLSEELLKPMGLKPDGTIT 20245 Da
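The point of the index is that peptides are sorted by mass once, so each precursor window becomes a binary search instead of a full pass over the database. A minimal sketch using Python's `bisect`; the data layout and function names are assumptions for illustration, and the masses are rounded monoisotopic values.

```python
from bisect import bisect_left, bisect_right

def build_index(peptide_masses):
    """Sort peptides by mass once; returns parallel lists (masses, peptides)."""
    pairs = sorted((mass, peptide) for peptide, mass in peptide_masses.items())
    return [m for m, _ in pairs], [p for _, p in pairs]

def lookup(index, precursor_mass, tolerance=1.0):
    """Return all indexed peptides within precursor_mass +/- tolerance."""
    masses, peptides = index
    lo = bisect_left(masses, precursor_mass - tolerance)
    hi = bisect_right(masses, precursor_mass + tolerance)
    return peptides[lo:hi]

# Tiny illustrative index; a real one would hold every peptide of every protein.
index = build_index({
    "MSQVQVQVQNPSAALSGSQILNK": 2426.26,
    "G": 75.03,
    "WEFGGHTVLR": 1200.60,
})
print(lookup(index, 1200.0))
```

Building the index costs one sort up front; each of the thousands of spectra then pays only a logarithmic lookup plus the scoring of its true candidates.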
17
scoring & analysis
[Figure: frequency vs score/criterion with a cutoff/threshold; regions labeled TP, TN, FN, FP]

            Score/Metric 1   Score/Metric 2   Score/Metric 3
Peptide A   7.65             0.99             97
Peptide B   6.99             0.87             97
Peptide C   6.21             0.65             97
Peptide D   5.57             0.71             96
Peptide E   3.31             0.44             50
Peptide F   1.85             0.41             41

sensitivity = TP / (TP + FN)
precision = TP / (TP + FP)
specificity = TN / (TN + FP)
accuracy = (TP + TN) / (TP + TN + FN + FP)
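The four definitions above translate directly into code. A minimal sketch; the counts in the example are hypothetical and not taken from the slide.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
    }

# Hypothetical counts after applying one cutoff/threshold.
metrics = classification_metrics(tp=90, tn=80, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Moving the cutoff trades these against each other: a stricter threshold lowers FP (better precision) while raising FN (worse sensitivity).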
18
The Results: Distinguishing Right from Wrong
In large proteomics data sets (for which manual data inspection is impossible), how can we distinguish between correct and incorrect peptide assignments?
Use “decoy” sequences to distract non-peptidic, non-uniquely matchable, or otherwise unmatchable spectra into a search space that is known a priori to be incorrect
Use the frequency of “decoy” sequences among total sequences to estimate the overall frequency of wrong answers (False Positive Rate)
Adjust filtering criteria to achieve a ~1% False Positive Rate
19
Decoy Sequences? A “Reversed” Database!
We generate decoy sequences by reversing each protein sequence in a given database, such that the resultant in silico digest contains nonsense peptides, then append the reversed database to the end of the forward database
Decoy references are labeled with #
Database searching with SEQUEST occurs from top to bottom. A wrong (random) match is equally likely to land on a decoy sequence as on a non-decoy sequence, so each decoy hit implies roughly one wrong forward hit that we cannot see.
So our FPR is (# of decoys) x 2 / total matches.
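The FPR estimate above can be sketched as follows, assuming each accepted match carries a protein reference string and that decoy references start with `#` as on the slide. The function name and the example references are hypothetical.

```python
def estimated_fpr(references, decoy_prefix="#"):
    """FPR = (# of decoys) x 2 / total matches, per the slide's formula."""
    if not references:
        raise ValueError("no matches to evaluate")
    decoys = sum(1 for ref in references if ref.startswith(decoy_prefix))
    return 2 * decoys / len(references)

# Hypothetical filtered list: 98 forward hits and 2 reversed (decoy) hits.
matches = ["ipi_forward"] * 98 + ["#ipi_reversed"] * 2
print(f"{estimated_fpr(matches):.1%}")
```

In practice the score filters are tightened or relaxed until this estimate reaches the target rate (about 1% on the previous slide).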
20
Target/Decoy Database Searching
Forward database: 1. MAGFA→ → →SHTRP
Reversed database: 1. PRTHS→ → →AFGAM
Composite Database, searched with Sequest
Right matches: F (100%)
Wrong (random) matches: F (50%), R (50%)
Filter (scoring, mass accuracy, etc.)
Generate final list
Estimate FP rate from 2 x Rev (i.e., 4%): reversed hits are the known FP, wrong forward hits are the unknown FP
21
sequest scores: finding true positives
[Figure: dCn vs XCorr for Forward Sequences and for Forward + Reverse; PSM number vs XCorr for TP and FP]
22
Mass “Accuracy” in Proteomics: Performance is related to the width of the distribution, not the average error
Precision of mass errors between observed and actual m/z (LTQ Orbitrap & LTQ FT)
0.1 ± 0.4 ppm
LTQ FT (SIM), AGC target 50,000 to avoid space-charge effects
Olsen et al. (2004) Mol. Cell. Proteomics 3, 608
-0.2 ± 1.0 ppm
High Mass Accuracy
Haas et al. (2006) Mol. Cell. Proteomics 5, 1326
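The "mean ± width" figures above are the mean and standard deviation of per-peptide mass errors in ppm. A minimal sketch with hypothetical m/z pairs; the function names are ours.

```python
from statistics import mean, stdev

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def summarize(errors_ppm):
    """Return (average error, width of the distribution) in ppm."""
    return mean(errors_ppm), stdev(errors_ppm)

# Hypothetical (observed, theoretical) m/z pairs for confidently identified peptides.
pairs = [(500.0005, 500.0), (750.0000, 750.0), (1000.0010, 1000.002)]
errors = [ppm_error(obs, theo) for obs, theo in pairs]
average, width = summarize(errors)
print(f"{average:.2f} ± {width:.2f} ppm")
```

A constant offset (the average) can be calibrated away after the fact; the width cannot, which is why it determines how narrow a mass filter can be.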
23
MMA: True Positives and False Positives
False positives are distributed evenly across MMA space
[Figure: PSM number vs MMA (centered on 0) for True Positives (TP) and False Positives (FP)]
24
MS/MS vs MMA: Precision vs Sensitivity
MS/MS criteria are strong precision filters: they require TP / FP separation for sensitivity
MMA criteria are weak precision filters: they assist MS/MS criteria in improving sensitivity
[Figure: axes centered on MMA 0]
25
Distracting Wrong from Right: MMA
[Figure: True Positives and False Positives vs MMA (centered on 0); labels: Extended Search Space; Search Space; Filtered]
26
Mass Accuracy: Another dimension of selectivity
[Figure: dCn vs XCorr panels: Forward Sequences; Forward + Reverse; Tryptic Search +/- 2 Da; Tryptic Search +/- 2 Da with 5 ppm filter]
27
Distracting Wrong from Right: Trypticity
[Figure: True Positives and False Positives for peptides with K/R at both cleavage sites (Tryptic Search) versus peptides with any other residue (A-G-C-S-T-I-L-F-P-M-V-H-D-E-Y-W-Q-N) at one cleavage site (Partial Enzyme Search); label: Filtered]
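Filtering a partial enzyme search down to fully tryptic matches needs a per-peptide count of trypsin-consistent ends. A minimal sketch, assuming the flanking residues are known and that `-` marks a protein terminus (our convention, not from the slide); the proline rule is ignored for brevity.

```python
def tryptic_termini(prev_aa, peptide, next_aa):
    """Count peptide ends (0, 1 or 2) consistent with cleavage after K or R."""
    n_term_ok = prev_aa in ("K", "R", "-")           # cleaved after K/R, or protein start
    c_term_ok = peptide[-1] in "KR" or next_aa == "-"  # ends in K/R, or protein end
    return int(n_term_ok) + int(c_term_ok)

def is_fully_tryptic(prev_aa, peptide, next_aa):
    return tryptic_termini(prev_aa, peptide, next_aa) == 2

# Hypothetical matches as (preceding residue, peptide, following residue).
psms = [("K", "WEFGGHTVLR", "A"), ("A", "WEFGGHTVLR", "A"), ("A", "WEFGGHTVL", "R")]
print([tryptic_termini(*psm) for psm in psms])
print([psm[1] for psm in psms if is_fully_tryptic(*psm)])
```

Because random matches in a partial enzyme search rarely have two tryptic ends, keeping only fully tryptic hits discards wrong answers disproportionately.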
28
What do we have here, hm?
[Figure: dCn (0 to 1) vs XCorr (0 to 8), n = 286; series: Phosphorylated, Unphosphorylated, Reversed Hits]
29
Phosphopeptides: Chemically disadvantaged…
Dataset of phosphorylated and unphosphorylated peptide MS/MS pairs (example: MSFEILR with and without P), n = 286
Singly Phosphorylated (n=207), Doubly Phosphorylated (n=79)
[Figure: XCorr (Phosphorylated) vs XCorr (Unphosphorylated), axes 0 to 8; dCn (Phosphorylated) vs dCn (Unphosphorylated)]
30
Phosphopeptides: Less power in XCorr & dCn
[Figure: XCorr (Ph/UnPh) and dCn (Ph/UnPh) ratios, axis 0 to 2, for Singly Phosphorylated and Doubly Phosphorylated peptides relative to Unphosphorylated; values shown: 86%, 93%]
31
Mass Accuracy: Can it help for phosphorylation?
Yeast Whole-Cell Lysate → Red., Alkyl. → SDS-PAGE (60-80 kDa) → Trypsin → IMAC-purification
32
Mass Accuracy: Rescuing phosphopeptides
SEQUEST partial enzyme search, fully tryptic peptide spectral matches
[Figure: XCorr vs MMA (ppm, -50 to 50) for LTQ TOP10 and Orbitrap TOP10; n=1390 and n=1311; XCorr cutoffs shown: +2: 1.3, +3: 2.3 and +2: 2.7, +3: 3.5]
33
Mission: Phosphopeptide rescue – accomplished!
[Figure: # of phosphopeptides for LTQ and Orbitrap, No MMA vs MMA; values shown: 600 (1.0% FP), 715 (1.0% FP), 1046 (0.4% FP); 74% increase]
34
search algorithms & phosphorylation
[Figure: sequest vs omssa; values shown: 936, 928, 98]
Bakalarski et al., Anal. Bioanal. Chem., 2007
35
phosphorylation site localization GFDSNQpTWR or GFDpSNQTWR? Beausoleil et al., Nat. Biotechnol, 2006
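The localization question starts by enumerating every place the phosphate could sit. A minimal sketch that lists the positional isoforms of a peptide; the function name is hypothetical, S/T/Y are assumed as acceptors, and the scoring step that discriminates between isoforms is not shown.

```python
from itertools import combinations

def site_isoforms(peptide, n_phospho=1, acceptors="STY"):
    """List all placements of n_phospho phosphates on acceptor residues."""
    positions = [i for i, aa in enumerate(peptide) if aa in acceptors]
    isoforms = []
    for combo in combinations(positions, n_phospho):
        chosen = set(combo)
        # Mark each phosphorylated residue with a leading lowercase 'p'.
        isoforms.append("".join(("p" + aa) if i in chosen else aa
                                for i, aa in enumerate(peptide)))
    return isoforms

print(site_isoforms("GFDSNQTWR"))
```

For this peptide the two isoforms share the same precursor mass and most fragment ions; only the fragments that cleave between the two candidate sites can tell them apart.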
36
phosphorylation site localization Beausoleil et al., Nat. Biotechnol, 2006
37
phosphorylation site localization Taus et al., JPR, 2011
38
phosphorylation false localization rate (FLR)
Chalkley & Clauser, MCP, 2012; Baker et al., MCP, 2011
use non-native phosphoacceptors as “decoys”
Ser + Thr (human proteome): 14.1%
Pro + Glu (human proteome): 14.5%
allow search engine / localization assessment tools to consider pP and pE as true negative “decoys”
calculate dataset FLR based on frequency of pP + pE “decoys”
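One way to turn the decoy counts into a dataset FLR is sketched below. The scaling is our assumption, not stated on the slide: a wrong localization is taken to land on a residue class in proportion to that class's frequency, so pP + pE decoy sites are rescaled by 14.1 / 14.5 to estimate the wrong Ser/Thr sites. The function name and example counts are hypothetical.

```python
ST_FREQ = 0.141  # Ser + Thr frequency in the human proteome (from the slide)
PE_FREQ = 0.145  # Pro + Glu frequency in the human proteome (from the slide)

def estimated_flr(n_target_sites, n_decoy_sites):
    """Estimate the fraction of reported S/T sites that are mislocalized.

    Assumption: false localizations hit S/T and P/E in proportion to
    residue frequency, so decoy sites are rescaled by ST_FREQ / PE_FREQ.
    """
    if n_target_sites <= 0:
        raise ValueError("need at least one target site")
    expected_false_targets = n_decoy_sites * ST_FREQ / PE_FREQ
    return expected_false_targets / n_target_sites

# Hypothetical dataset: 1000 sites localized to S/T, 29 to the pP/pE decoys.
print(f"{estimated_flr(1000, 29):.2%}")
```

Because the two frequencies are nearly equal, the correction factor is close to 1 and the decoy count is almost a direct estimate of the false Ser/Thr localizations.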