Published by Dorcas Terry. Modified over 9 years ago.
1
Analysis of tandem mass spectra - I
Prof. William Stafford Noble
GENOME 541: Intro to Computational Molecular Biology
2
Outline
Tandem mass spectrometry overview
Peptide identification
– De novo
– Peptide database search
– Spectrum database search
Score calibration
Statistical confidence estimation
3
Mass spectrometry background
4
EAMPK GDIFYPGYCPDVK LPLENENQGK ASVYNSFVSNGVK YVMTFK ENQGVVNR
5
Peptide fragmentation
6
Peptide fragmentation spectrum (figure: intensity vs. m/z for peptide VVVTGLGMLSPVGNTVESTWK; peaks annotated with charge states +1 and +2, including m/z 1304.4 and 888.14)
8
EAMPK EAMPK? Our first goal is to identify each spectrum
9
Peptide identification
10
Two approaches Frank et al. JPR. 2006.
11
The peptide can be inferred using pairs of nearby peaks IYEVEGMR
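The idea of reading residues off pairs of nearby peaks can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the function name `find_residue_edges`, the 0.01 m/z tolerance, and the restriction to singly charged peaks of one ion series are all assumptions made for the example. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da). Isoleucine has the same mass as
# leucine, so only "L" is listed here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def find_residue_edges(peaks_mz, tol=0.01):
    """Return (i, j, residue) for every pair of peaks whose m/z difference
    matches a residue mass within `tol` (hypothetical tolerance).

    Assumes singly charged peaks from a single ion series.
    """
    peaks = sorted(peaks_mz)
    edges = []
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j] - peaks[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges.append((i, j, residue))
    return edges


# b-ion ladder of the toy peptide "GAS" (b1, b2, b3), charge +1.
ladder = [58.02874, 129.06585, 216.09788]
edges = find_residue_edges(ladder)
```

Each edge is one step in the peptide sequence; a path through consecutive edges spells out a candidate sequence, which is the basis of the spectrum graph.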
12
The spectrum graph considers all possible peptides Frank et al. JPR. 2006.
13
What are the pros and cons of the de novo approach?
+ Dynamic programming can quickly find peptides that fit a given spectrum graph.
- Many real spectra don’t yield a very good spectrum graph.
+ It is possible to match against a noisy spectrum by allowing missing peaks.
- If you allow lots of missing peaks, then the DP is slow and leads to many false positive matches.
- If instead you search a database, you limit the number of possible false positives.
- In practice, the de novo approach does not identify many spectra.
+ The de novo approach is the only one that can find previously unknown peptides.
14
The database search approach Nesvizhskii et al. Nature Methods. 2007.
15
SEQUEST theoretical spectrum (figure labels: y-ion, charge 1; b-ion, charge 2; flanks; H2O loss; NH3 loss; a-ion)
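A theoretical spectrum starts from the m/z values of the b- and y-ions for each peptide bond. The sketch below computes only those two series; it is not SEQUEST's generator, and it leaves out the flanking peaks, neutral losses, and a-ions named on the slide. The function name `by_ions` and the constants are chosen for this example; the masses are standard monoisotopic values.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def by_ions(peptide, charge=1):
    """Return (b_mz, y_mz), one entry per peptide bond, N- to C-terminal.

    b_mz[k] is the prefix ion ending at bond k; y_mz[k] is the
    complementary suffix ion starting after bond k.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_mz, y_mz = [], []
    for bond in range(1, len(masses)):
        prefix = sum(masses[:bond])
        suffix = sum(masses[bond:])
        b_mz.append((prefix + charge * PROTON) / charge)
        y_mz.append((suffix + WATER + charge * PROTON) / charge)
    return b_mz, y_mz


b_mz, y_mz = by_ions("VNIQEELGK")  # peptide used later in the slides
```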
16
Theoretical peaks vs. observed peaks (figure)
17
SEQUEST cross-correlation score
Define $R_i$ as the scalar product of the two spectra, with one offset by $i$. The score, called XCorr, is $R_0$ minus the average of $R_i$ for $i$ in $-75, \dots, 75$:
$\mathrm{XCorr} = R_0 - \frac{1}{151} \sum_{i=-75}^{75} R_i$
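The definition above translates directly into code. This sketch assumes both spectra are already binned into equal-length intensity vectors and that bins shifted past either end contribute zero; the function names are made up for the example, and SEQUEST's own preprocessing is not reproduced.

```python
def shifted_dot(x, y, offset):
    """Scalar product of x with y shifted by `offset` bins.

    Bins that fall outside y after shifting contribute zero.
    """
    total = 0.0
    for k, xk in enumerate(x):
        j = k + offset
        if 0 <= j < len(y):
            total += xk * y[j]
    return total


def xcorr(theoretical, observed, max_offset=75):
    """R_0 minus the mean of R_i for i in -max_offset..max_offset."""
    r0 = shifted_dot(theoretical, observed, 0)
    offsets = range(-max_offset, max_offset + 1)
    background = sum(shifted_dot(theoretical, observed, i) for i in offsets)
    return r0 - background / len(offsets)


# Tiny example with a window of +/-1 bin instead of +/-75.
score = xcorr([0, 1, 0, 0], [0, 2, 1, 0], max_offset=1)
```

Subtracting the average over offsets penalizes spectra that match the theoretical peaks equally well at any shift, which is what a dense, noisy spectrum would do.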
18
(Figure labels: spectrum; peptide database; theoretical spectrum generator; spectrum comparison function)
19
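The X!Tandem hyperscore
$\mathrm{Hyperscore} = \left( \sum_i I_i P_i \right) \cdot N_b! \cdot N_y!$
$N_b$: number of matched b-ions
$N_y$: number of matched y-ions
$P_i$: Boolean: is the peak at m/z value $i$ a b- or y-ion?
$I_i$: intensity of the peak at m/z value $i$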
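A small sketch of the hyperscore, assuming its commonly cited form: the summed intensity of matched peaks multiplied by the factorials of the matched b- and y-ion counts. The function name, the peak-matching rule, and the 0.5 m/z tolerance are assumptions for illustration and are not taken from the X!Tandem source code.

```python
import math


def hyperscore(peaks, b_mz, y_mz, tol=0.5):
    """Hyperscore sketch: (sum of matched intensities) * N_b! * N_y!.

    peaks: list of (mz, intensity) observed peaks.
    b_mz, y_mz: theoretical b- and y-ion m/z values.
    tol: matching tolerance in m/z (hypothetical default).
    """
    def has_match(ion_mz):
        return any(abs(mz - ion_mz) <= tol for mz, _ in peaks)

    n_b = sum(1 for ion in b_mz if has_match(ion))
    n_y = sum(1 for ion in y_mz if has_match(ion))

    # Each observed peak counts once, even if it matches several ions.
    ions = list(b_mz) + list(y_mz)
    matched_intensity = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - ion) <= tol for ion in ions)
    )
    return matched_intensity * math.factorial(n_b) * math.factorial(n_y)


peaks = [(100.0, 2.0), (200.0, 3.0), (300.0, 5.0)]
score = hyperscore(peaks, b_mz=[100.1, 150.0], y_mz=[200.2, 299.9])
```

The factorial terms reward peptides that explain many ions of each series, so the score grows much faster than linearly with the number of matches.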
20
A third approach to identifying spectra
1. De novo identification
2. Search against a database of theoretical spectra
3. Search against a database of previously observed, identified spectra
21
Spectrum identification (figure)
MS/MS query spectra are searched either with SEQUEST against a FASTA protein database (entries such as ">MEKK1 (kinase) MDRILARMKKSTRRGGDKNIT PVRRLERR…" and ">ATMKK5 (kinase kinase) MKPIQSPSGVASPMKNRLRK RPDLSPPLPHRDVALAVLP…") or with BiblioSpec against a library of identified spectra (entries such as "3 NGISLTIVR", "3 QWDKEPPR", "2 FMACSDEK", "1 CGCCLYNT", "2 GDTIENFK").
Either search yields a peptide ID list with ranked, scored candidates per scan (for example "Scan1 0.7 EGSSDEEVP…", "Scan1 0.3 TFAEILNPI…", "Scan1 0.2 ARFDLNNHD…"), from which proteins are then identified.
22
Spectrum identification (figure, continued)
The same diagram, with an example comparison between a query spectrum and a library spectrum. Peak m/z labels shown: 765.1, 940.4, 593.9, 300.4, 522.3, 594.2. Resulting score = 0.2.
23
What are the pros and cons of library searching?
+ Because the spectrum library contains peak intensity information, matching can be done accurately.
+ Library searching is faster than database searching.
- Library searching can only identify peptides that have been previously identified.
24
Landscape of Peptide Identification Software
Database search tools: Greylag, MASCOT, MS-Tag (ProteinProspector), Massmatrix, MyriMatch, OMSSA, Olav, Pepfrag (Prowl), Pepprobe, Pepsplice, Pfind, Phenyx, ProLuCID (YADA), ProbID, RAId_DbS, SEQUEST, SpectrumMill, VEMS, X!Tandem
Spectral matching tools: Bibliospec, SpectraST, X! P3
De novo sequencing tools: Lutefisk, PEAKS, PepNovo, Sequit
Sequence tag/hybrid approaches: ByOnic/LookupPeaks, GutenTag, Inspect, Paragon, Popitam
Others: Proteinlynx, SonarMS/MS (knexus), Xproteo
25
Score calibration
26
Searching many spectra yields a set of peptide-spectrum matches (PSMs) (figure labels: spectra, peptides, PSMs)
27
Identifying spectra requires solving two distinct problems well
Task 1: Ranking candidate peptides with respect to a single spectrum
Task 2: Ranking PSMs with respect to one another
28
Different spectra yield different score distributions
30
Different charge states yield different score distributions
31
Estimating a p-value can improve calibration
The probability of observing a score > 4 under the null distribution is the area under the curve to the right of 4.
32
XCorr scores fit a Weibull distribution (Klammer J Proteome Research 2009)
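Once a Weibull distribution has been fitted to the null scores for a spectrum, the p-value of an observed score is the Weibull survival function evaluated at that score. The sketch below assumes a three-parameter form (shape, scale, and a location shift) and takes the parameters as given; the fitting step is not shown, and the function name and example parameter values are hypothetical.

```python
import math


def weibull_pvalue(score, shape, scale, shift=0.0):
    """P(X >= score) for a Weibull null with the given parameters.

    shift is a location parameter; scores at or below it get p = 1.
    """
    x = score - shift
    if x <= 0:
        return 1.0
    return math.exp(-((x / scale) ** shape))


# Hypothetical fitted parameters, chosen only to show the call.
p = weibull_pvalue(4.0, shape=1.5, scale=1.0, shift=0.5)
```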
33
X!Tandem and Comet use a log-linear fit (figure: log(count) vs. XCorr; an XCorr of 5 corresponds to a count of 10^-5) (Eng J Proteome Research 2009)
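The log-linear fit can be sketched as an ordinary least-squares line through (score, log10 count) points, extrapolated to the observed score. How each tool builds the counts and chooses which part of the histogram to fit is not covered here; the function names and the example numbers are invented for illustration.

```python
import math


def fit_log_linear(scores, counts):
    """Least-squares fit of log10(count) = intercept + slope * score."""
    logs = [math.log10(c) for c in counts]
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def expected_count(score, intercept, slope):
    """Extrapolate the fitted line to the given score."""
    return 10 ** (intercept + slope * score)


# Made-up survival counts: 1000 candidates score >= 1, 100 >= 2, 10 >= 3.
intercept, slope = fit_log_linear([1.0, 2.0, 3.0], [1000, 100, 10])
tail = expected_count(9.0, intercept, slope)
```

The extrapolated count plays the role of an expectation value: the number of candidates expected to reach that score by chance.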
34
Calibration improves statistical power to identify spectra (figure: number of identified spectra vs. FDR threshold, calibrated and uncalibrated scores)
35
Statistical confidence estimation (Elias & Gygi Nat Biotech 2007)
36
Our second goal is to identify a set of spectra at a given false discovery rate
37
Spectrum identification must account for two types of multiple testing (figure: peptides MSEDEIER, VDPSSWFNN, CSSSTEAEQR, CIVGLTK, QFIDFSTVFQP, ISLSGK, ALNDVGK; labels: minimum p-value, false discovery rate)
38
Decoy peptides can be used to estimate FDR.
Search results (target and decoy peptides, with scores): MSEDEIER 2.2; ISLSGK 1.6; CSSSTEAEQR 1.9; CIVGLTK 2.8; ALNDVGK 2.7; VDPSSWFNN 1.2; QFIDFSTVFQP 1.7
FDR = 1/4 = 25%
(Elias & Gygi Nat Biotech 2007)
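The target-decoy estimate can be written in a few lines: at a given score threshold, the number of accepted decoys estimates the number of false positives among the accepted targets. The transcript does not say which peptides are decoys or where the threshold sits, so the split below (last two peptides as decoys) and the threshold of 1.65 are assumptions chosen to reproduce the slide's 1/4.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR among targets scoring at or above the threshold."""
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets == 0:
        return 0.0
    return n_decoys / n_targets


# Assumed split of the slide's peptides into targets and decoys.
targets = {"MSEDEIER": 2.2, "ISLSGK": 1.6, "CSSSTEAEQR": 1.9,
           "CIVGLTK": 2.8, "ALNDVGK": 2.7}
decoys = {"VDPSSWFNN": 1.2, "QFIDFSTVFQP": 1.7}

# Assumed threshold: accepts 4 targets and 1 decoy.
fdr = decoy_fdr(targets.values(), decoys.values(), threshold=1.65)
```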
39
XCorr can be calculated as a dot product between two spectra
Observed spectrum: stage 1, square root; stage 2, normalize regions; stage 3, cross-correlation pre-processing
Theoretical spectrum (VNIQEELGK): for each peptide bond, b ion, y ion, neutral losses
The dot product of the two gives the XCorr score (Eng J Proteome Research 2008)
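The cross-correlation pre-processing step works because the background term can be folded into the observed spectrum once, after which every candidate peptide costs a single dot product. The sketch below shows only that step, using the same convention as the earlier XCorr definition (average over offsets -75 to 75, zero outside the spectrum). The square-root and region-normalization stages are omitted, and the function names are made up for the example.

```python
def preprocess_observed(observed, max_offset=75):
    """Subtract from each bin the windowed mean of its neighbors.

    The window spans -max_offset..max_offset bins; bins outside the
    spectrum count as zero, so the divisor is always 2*max_offset + 1.
    """
    n = len(observed)
    window = 2 * max_offset + 1
    processed = []
    for k in range(n):
        lo = max(0, k - max_offset)
        hi = min(n, k + max_offset + 1)
        processed.append(observed[k] - sum(observed[lo:hi]) / window)
    return processed


def xcorr_dot(theoretical, processed):
    """XCorr as a plain dot product with the preprocessed spectrum."""
    return sum(t * p for t, p in zip(theoretical, processed))


# Tiny example with a window of +/-1 bin instead of +/-75.
processed = preprocess_observed([0, 2, 1, 0], max_offset=1)
score = xcorr_dot([0, 1, 0, 0], processed)
```

The pre-processing is done once per spectrum, so its cost is shared across all candidate peptides scored against that spectrum.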
40
XCorr can be refactored to make the theoretical spectrum binary
Observed spectrum (after stage 3): the fingerprint of b / y / neutral losses centered at m_i = 347 is summed into a vector of cleavage evidence (sum of evidence for cleavage at m_i = 347)
Theoretical spectrum (VNIQEELGK): binary markers of backbone cleavage
The dot product of the two gives the refactored XCorr score (Howbert Molecular & Cellular Proteomics 2014)
41
The refactored XCorr is very similar to the original XCorr (ρ = 0.995)
42
Distribution of XCorr scores can be computed using dynamic programming
43
Each column holds a score distribution for all peptides with a given mass
Running time: $O(m \, (s_{\max} - s_{\min}) \, |AA|)$, ~1 sec for m = 1500
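A toy version of such a dynamic program is sketched below. It assumes integer residue masses and an integer evidence score for a cleavage at each prefix mass, and it counts every amino acid sequence of a given total mass by score. The function names, the two-letter alphabet, and the evidence values are invented for the example; this illustrates the recurrence, not the published implementation.

```python
from collections import Counter


def score_distribution(target_mass, residue_masses, evidence):
    """Count sequences of total mass `target_mass` by score.

    residue_masses: iterable of integer residue masses.
    evidence: dict mapping prefix mass -> integer score contribution.
    Returns a Counter {score: number of sequences}.
    """
    # dist[m] is the column for mass m: score -> number of sequences.
    dist = [Counter() for _ in range(target_mass + 1)]
    dist[0][0] = 1  # the empty sequence has mass 0 and score 0
    for m in range(1, target_mass + 1):
        gain = evidence.get(m, 0)
        for aa_mass in residue_masses:
            prev = m - aa_mass
            if prev < 0:
                continue
            for s, count in dist[prev].items():
                dist[m][s + gain] += count
    return dist[target_mass]


def exact_pvalue(observed_score, distribution):
    """Fraction of sequences scoring at least `observed_score`."""
    total = sum(distribution.values())
    tail = sum(c for s, c in distribution.items() if s >= observed_score)
    return tail / total


# Toy alphabet with masses 1 and 2; evidence at prefix masses 1 and 2.
dist = score_distribution(3, [1, 2], {1: 1, 2: 2})
p = exact_pvalue(2, dist)
```

The three nested loops run over masses, residues, and scores, which is where the running time quoted on the slide comes from.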
44
Calibration removes charge state dependency of scores
45
XCorr p-values must be corrected for multiple testing
Sidak correction is similar to Bonferroni
Accounts for the fact that database search considers many peptides for each spectrum
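The Sidak correction turns a single-test p-value p into the probability that at least one of n independent tests does as well: 1 - (1 - p)^n. A minimal sketch follows, with n standing for the number of candidate peptides scored against the spectrum; the function name is hypothetical, and `log1p`/`expm1` are used only to keep the result accurate for very small p.

```python
import math


def sidak_correct(p, n):
    """Sidak-corrected p-value: 1 - (1 - p)**n for n independent tests."""
    if p >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p))


# A p-value of 1e-6 for the best of 1000 candidate peptides.
corrected = sidak_correct(1e-6, 1000)
bonferroni = min(1.0, 1e-6 * 1000)  # for comparison
```

For small p the two corrections nearly coincide, with the Sidak value always slightly below the Bonferroni bound.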
46
The resulting p-values are distributed uniformly (figure panels: histogram of frequency vs. p-value; log(rank p-value) vs. log(p-value))
47
Exact calibration improves statistical power to identify spectra (figure: yeast-01, MS1 / MS2 low resolution)