Published by Kimberly Adele Terry. Modified over 9 years ago
1
Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search
Zhixin (Michael) Tian
CNCP, 11/15/2012
2
What is mass? Monoisotopic mass (m/z, z=+1)
L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.
3
Missing monoisotopic mass in protein
Monoisotopic mass (12C, 1H, 14N, 16O, 32S): most significant & accurate
Mass of the most abundant isotopic peak: error of ±1 Da or more (mis-assignment of # of contributing heavy isotopes)
Average mass (average of isotopic peak masses weighted by abundance): error of ±1 u at 16,000 u (13C/12C ratio’s variability)
The probability of incorporating multiple heavy isotopes increases with molecular mass, which decreases the relative abundance of the monoisotopic peak.
The monoisotopic peak is unlikely to be observed for molecules larger than 15 kDa.
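To make the gap between the two mass definitions concrete, the sketch below computes both for the elemental composition C378H630N105O118S1 that the deck quotes for ubiquitin. The atomic masses are rounded standard values, and the names `MONOISOTOPIC`, `AVERAGE`, and `mass` are illustrative assumptions, not part of ProteinGoggle.

```python
# Monoisotopic vs. average mass for an elemental composition.
# Atomic masses are rounded standard values (assumed here, not taken from the slides).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                "O": 15.99491462, "S": 31.97207100}
AVERAGE = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}


def mass(composition, table):
    """Sum atomic masses over an {element: count} composition."""
    return sum(table[element] * count for element, count in composition.items())


# Composition quoted in the deck for ubiquitin.
ubiquitin = {"C": 378, "H": 630, "N": 105, "O": 118, "S": 1}
mono = mass(ubiquitin, MONOISOTOPIC)
avg = mass(ubiquitin, AVERAGE)
print(f"monoisotopic: {mono:.4f}  average: {avg:.4f}  difference: {avg - mono:.4f}")
```

For this composition the two values differ by about 5 u, which is why mixing up the definitions matters at protein scale.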
4
Deisotoping (Deconvolution)
Algorithms: AID-MS, ESI-ISOCONV, LASSO, MapQuant, MasSPIKE, MATCHING, msInspect, Peplist, quadratic deisotoping, RAPID, THRASH, Wang’s method, Zhang’s program, and ZSCORE
Steps: calculate background noise level; determine charge state using FT/Patterson technique; calculate theoretical profile; fit with observed isotopic profile; monoisotopic mass
Search Engines: ProSightPC, SEQUEST, Mascot, X!Tandem, InsPecT, OMSSA, Andromeda, pFind
2. C. D. Wenger, M. T. Boyne, J. T. Ferguson, D. E. Robinson, N. L. Kelleher, Versatile Online-Offline Engine for Automated Acquisition of High-Resolution Tandem Mass Spectra. Anal Chem 80, 8055 (Nov 1, 2008).
3. J. K. Eng, A. L. McCormack, J. R. Yates, An Approach to Correlate Tandem Mass-Spectral Data of Peptides with Amino-Acid-Sequences in a Protein Database. J Am Soc Mass Spectr 5, 976 (Nov, 1994).
4. D. N. Perkins, D. J. C. Pappin, D. M. Creasy, J. S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551 (Dec, 1999).
5. S. Tanner et al., InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Anal Chem 77, 4626 (Jul 15, 2005).
6. L. Y. Geer et al., Open mass spectrometry search algorithm. J Proteome Res 3, 958 (Sep-Oct, 2004).
7. J. Cox et al., Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. J Proteome Res 10, 1794 (Apr, 2011).
8. D. Q. Li et al., pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. Bioinformatics 21, 3049 (Jul 1, 2005).
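The charge-state step can be illustrated with a much simpler stand-in than the FT/Patterson technique named on this slide: adjacent isotopic peaks of an ion with charge z are spaced by roughly 1.00335/z in m/z. The sketch below uses only that spacing rule; `estimate_charge` and its defaults are hypothetical names chosen for illustration.

```python
# Estimate charge state from the spacing of adjacent isotopic peaks.
# Simplified stand-in for the FT/Patterson step; names are illustrative.
from statistics import median

NEUTRON_SPACING = 1.00335  # approx. 13C - 12C mass difference in Da


def estimate_charge(peak_mzs, max_charge=25):
    """Return the integer charge whose expected spacing best fits the peaks."""
    mzs = sorted(peak_mzs)
    gaps = [b - a for a, b in zip(mzs, mzs[1:])]
    if not gaps:
        raise ValueError("need at least two isotopic peaks")
    spacing = median(gaps)  # the median tolerates one irregular gap
    charge = round(NEUTRON_SPACING / spacing)
    return max(1, min(max_charge, charge))


# Synthetic envelope: five isotopic peaks of a 10+ ion starting at m/z 857.0
peaks = [857.0 + i * NEUTRON_SPACING / 10 for i in range(5)]
print(estimate_charge(peaks))
```

A spacing-only estimate like this breaks down when envelopes overlap or resolution is low, which is one reason dedicated deisotoping algorithms exist.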
5
Peptide Mass Fingerprinting (PMF)
[Flowchart: Input (protein database, RAW file); MS spectrum (iE) and MS/MS spectra (iE); search engine steps A1/P1, A1/P2, A2/P3, A2/P4 comparing parent theoretical vs. experimental mass and fragment theoretical vs. experimental masses; candidates, initial IDs; Output (final IDs)]
6
Ubiquitin - MS spectrum (profile)
7
Ubiquitin – MS/MS (ETD) Spectrum (Profile)
8
Database search with PMF using ProSightPC
NMFs = 92 NUMFs = 219 P score = 4.86E-98
9
Definition of P_Score
f: the total number of observed fragments (NMFs + NUMFs)
n: the number of matching fragments (NMFs)
x: the mean probability that the mass of an observed fragment ion will randomly match one from a generic protein, based on the mass of the average amino acid weighted for its occurrence in proteins
2: the number of fragment ions generated from each bond cleavage, which is assumed to be 2 (b- and y-type ions or c- and z•-type ions)
Ma: the mass accuracy (a Ma of ±1 Da translates to a 2 Da window)
Neil L. Kelleher, et al. Nat. Biotechnol. 2001, 19, 952
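The formula itself did not survive in this transcript. The sketch below assumes a Poisson model of random matches, P(n) = (x·f)^n · e^(−x·f) / n!, with x = 2 × window / 111.1. Both the formula and the 111.1 Da average residue mass are assumptions made for illustration and should be checked against the cited paper; the function names are hypothetical.

```python
# Poisson-style random-match probability (assumed form, see the text above).
import math

AVG_RESIDUE_MASS = 111.1  # Da; assumed average amino acid mass, not stated on the slide


def random_match_probability(window_da):
    """x: chance that one observed fragment matches at random.

    Assumes 2 fragment ions per bond cleavage and a matching window of
    window_da (for example, +/-1 Da corresponds to window_da = 2.0).
    """
    return 2.0 * window_da / AVG_RESIDUE_MASS


def p_score(n_matched, n_observed, window_da):
    """Poisson probability of n_matched random matches among n_observed fragments."""
    lam = random_match_probability(window_da) * n_observed
    # Work in log space so that large n_matched does not overflow the factorial.
    log_p = n_matched * math.log(lam) - lam - math.lgamma(n_matched + 1)
    return math.exp(log_p)


print(p_score(5, 20, 0.1))
```

Under this model a smaller value means the observed number of matches is less likely to have arisen by chance.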
10
Is “MFs” really good?
11
Is “NUMFs” really good?
RAPID (28+49=77), THRASH (92+219=311)
PeakPicking: SNRThreshold = 3.0, BackgroundRatio = 5.0, FitType = Lorentzian
DeconvPep: MaxCharge = 25, ThScore = 0.0
AdvDeconv: MaxAbundancePeak = 3, ScanNoModifier = 0, MaxMissPeak = 3, MassErr = 1.0E-05, ThClustExt = 0.0, IntsRangeErr = 0.5
Better “deisotoping”? NO “deisotoping”?
12
What is a mass spectrum? MS of Ubiquitin
13
The nature of the iE of an ion
Profile vs. centroid (x, y coordinates: Exp. m/z, Abundance)
Abundance: 6061, 21811, 52841, 82342, 93523, 96019, 75857, 60680, 42420, 27294, 14752, 5685, 1120, 919, 316, 147
14
What is in a protein database?
Sequence: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
Elemental composition: C378H630N105O118S1
Centroid (x, y coordinates: Exp. m/z, Abundance)
Abundance: 3.95, 18.83, 45.88, 76.13, 96.65, 100.00, 87.76, 67.12, 45.63, 27.99, 15.67, 8.09, 3.87, 1.73, 0.73, 0.29
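A theoretical isotopic envelope like the abundance list on this slide can be derived from the elemental composition by convolving per-element isotope patterns. The sketch below is one minimal way to do that. It groups peaks by nominal mass shift and uses rounded natural abundances, both simplifying assumptions, so its numbers can differ somewhat from the slide's values. All names are illustrative.

```python
# Theoretical isotopic envelope by repeated convolution of per-element isotope patterns.
# Abundances are rounded natural values (assumed); list index = extra nominal mass units.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, n_peaks):
    """Polynomial product of two abundance lists, truncated to n_peaks terms."""
    out = [0.0] * min(len(a) + len(b) - 1, n_peaks)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def element_power(pattern, count, n_peaks):
    """Pattern of `count` atoms of one element, via exponentiation by squaring."""
    result, base = [1.0], pattern
    while count:
        if count & 1:
            result = convolve(result, base, n_peaks)
        base = convolve(base, base, n_peaks)
        count >>= 1
    return result


def isotopic_envelope(composition, n_peaks=16):
    """Relative abundances (most abundant peak = 100) of the first n_peaks isotopic peaks."""
    envelope = [1.0]
    for element, count in composition.items():
        pattern = element_power(ISOTOPES[element], count, n_peaks)
        envelope = convolve(envelope, pattern, n_peaks)
    top = max(envelope)
    return [100.0 * p / top for p in envelope]


ubiquitin = {"C": 378, "H": 630, "N": 105, "O": 118, "S": 1}
for i, rel in enumerate(isotopic_envelope(ubiquitin)):
    print(f"peak {i:2d}: {rel:6.2f}")
```

Truncating every intermediate product to `n_peaks` terms is safe because a low-order coefficient of a product depends only on low-order coefficients of its factors.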
15
iMEF (isotopic m/z & Envelope Fingerprinting)
[Flowchart: Input (protein database, RAW file); MS spectrum (iE) and MS/MS spectra (iE); search steps A1/P1, A1/P2, A2/P3, A2/P4 and A/P1, A/P2 comparing parent and fragment theoretical masses and theoretical iEs with their experimental counterparts; candidates, initial IDs; Output (final IDs)]
16
Top-down Screening – MS/MS2 (Targeted Screening – MS2)
iMEF = iMF (A1) + iEF (A2)
[Flowchart elements: DB; A1/F1 (parent ion exp. iE vs. parent ion theo. iE; 1st, 2nd, 3rd isotopic peak); protein candidates; A2/F2 and A2/F3 (fragment ion exp. iEs vs. fragment ion theo. iEs); preliminary protein candidates; preliminary protein IDs; NMFs; PTM_Scores; initial protein IDs; isotopic peak exclusion list; norm. isotopic peaks removed; combined initial protein IDs; remove duplicates; final IDs]
17
Pre-Step 1: Customized database
MS: precursor ions; MS/MS: fragment ions
18
Pre-Step 2: Noise level determination
19
Ubiquitin - MS spectrum (profile)
20
Ubiquitin – MS/MS (HCD) spectrum (profile)
21
Step 1: Profile to centroid (MS & MS2)
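As a minimal sketch of the profile-to-centroid conversion, the code below collapses each contiguous run of profile points above a noise level into one intensity-weighted m/z value. This is a generic approach chosen for illustration; the deck does not say how ProteinGoggle performs this step, and `centroid_profile` and `noise_level` are assumed names.

```python
# Convert profile data to centroids: one intensity-weighted m/z per contiguous
# run of points above the noise level. Illustrative scheme, not ProteinGoggle's.

def centroid_profile(mzs, intensities, noise_level=0.0):
    """Return [(centroid_mz, summed_intensity)] for each run above noise_level."""
    centroids = []
    i, n = 0, len(mzs)
    while i < n:
        if intensities[i] <= noise_level:
            i += 1
            continue
        # Collect the contiguous run of points above the noise level.
        start = i
        while i < n and intensities[i] > noise_level:
            i += 1
        run_mz = mzs[start:i]
        run_int = intensities[start:i]
        total = sum(run_int)
        weighted = sum(m * a for m, a in zip(run_mz, run_int))
        centroids.append((weighted / total, total))
    return centroids


profile_mz = [100.00, 100.01, 100.02, 100.03, 100.04, 100.10, 100.11, 100.12]
profile_int = [0.0, 10.0, 20.0, 10.0, 0.0, 0.0, 5.0, 0.0]
print(centroid_profile(profile_mz, profile_int))
```

Because a run is delimited only by the noise level, two overlapping peaks that never drop below it are merged into one centroid. A production peak picker would split runs at local minima.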
22
Step 2: iMF of precursor ion candidates (4 ppm)
Top-down Screening: IPMD 15 ppm
Isolation window: ±3 m/z units
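The m/z fingerprinting step amounts to a ppm-deviation test restricted to the isolation window. The sketch below shows that test with the 15 ppm and ±3 m/z values from this slide as defaults. The candidate tuples and function names are hypothetical, and IPMD is read here as an m/z deviation tolerance in ppm.

```python
# Match experimental isotopic peaks to theoretical candidates by m/z deviation in ppm.
# Candidate tuples and names are illustrative; 15 ppm and +/-3 m/z come from the slide.

def ppm_deviation(exp_mz, theo_mz):
    """Signed deviation of the experimental m/z from the theoretical m/z, in ppm."""
    return (exp_mz - theo_mz) / theo_mz * 1e6


def match_precursors(exp_peaks, candidates, isolation_center,
                     ipmd_ppm=15.0, half_window=3.0):
    """Return (name, theo_mz, exp_mz, ppm) for candidates inside the isolation
    window whose theoretical m/z matches an experimental peak within ipmd_ppm."""
    matches = []
    for name, theo_mz in candidates:
        if abs(theo_mz - isolation_center) > half_window:
            continue  # outside the isolation window
        for exp_mz in exp_peaks:
            ppm = ppm_deviation(exp_mz, theo_mz)
            if abs(ppm) <= ipmd_ppm:
                matches.append((name, theo_mz, exp_mz, ppm))
    return matches


candidates = [("A", 857.000), ("B", 857.050), ("C", 870.000)]
matches = match_precursors([857.010], candidates, isolation_center=857.0)
print(matches)
```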
23
Step 3: iEF of precursor ion candidates
IPACO 5%, IPMD 15 ppm, IPAD 30%
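One way to read these three thresholds: IPACO is an abundance cutoff below which theoretical isotopic peaks are not required, IPMD is the allowed m/z deviation, and IPAD is the allowed deviation in relative abundance. The deck does not spell the abbreviations out, so that reading and the check below are assumptions. It is a sketch of an envelope comparison, not the ProteinGoggle implementation.

```python
# Isotopic envelope fingerprinting (iEF) check for one candidate, as a hedged sketch.
# Assumed readings: IPACO = abundance cutoff (%), IPMD = m/z deviation (ppm),
# IPAD = relative-abundance deviation (%).

def envelope_matches(theo, exp, ipaco=5.0, ipmd=15.0, ipad=30.0):
    """theo: list of (mz, abundance); exp: matching list of (mz, abundance) or None.

    Every theoretical peak at or above the IPACO cutoff must be observed within
    IPMD ppm and within IPAD percent of its expected relative abundance.
    Both envelopes are normalized to the most abundant theoretical peak.
    """
    top = max(range(len(theo)), key=lambda i: theo[i][1])
    if exp[top] is None:
        return False
    for (theo_mz, theo_ab), observed in zip(theo, exp):
        theo_rel = 100.0 * theo_ab / theo[top][1]
        if theo_rel < ipaco:
            continue  # peaks below the abundance cutoff are not required
        if observed is None:
            return False
        exp_mz, exp_ab = observed
        exp_rel = 100.0 * exp_ab / exp[top][1]
        if abs(exp_mz - theo_mz) / theo_mz * 1e6 > ipmd:
            return False
        if abs(exp_rel - theo_rel) / theo_rel * 100.0 > ipad:
            return False
    return True


theo = [(1000.0, 50.0), (1000.5, 100.0), (1001.0, 60.0), (1001.5, 3.0)]
exp_ok = [(1000.005, 5200.0), (1000.505, 10000.0), (1001.004, 5500.0), None]
exp_bad = [(1000.005, 2000.0), (1000.505, 10000.0), (1001.004, 5500.0), None]
print(envelope_matches(theo, exp_ok), envelope_matches(theo, exp_bad))
```

Tightening any of the three thresholds makes the match stricter, which is the sense in which the deck later calls the confidence "as-strict-as-you-choose".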
24
Step 4: iMF of fragment ion candidates
Targeted Screening: IPMD 10 ppm (5 ppm)
C1;MAX_MZ= &C2;MAX_MZ= &C3;MAX_MZ= &C4;MAX_MZ= &C5;MAX_MZ= &C6;MAX_MZ= &C7;…
25
Step 5: iEF of fragment ion candidates
IPACO 5%, IPMD 10 ppm, IPAD 50%
26
Exemplary PTM_Score assignment
Human histone H4_S1acK16acK20me2
27
ID of ubiquitin from ETD
NMFs = 91
IPACO=10, IPMD=15, IPAD=100
IPMDO=20, IPMDOM=30, IPADO=20, IPADOM=200
Plots: NMFs vs. IPACO, NMFs vs. IPMD, NMFs vs. IPAD
28
Pros and Cons
Pros:
As-strict-as-you-choose confidence
Strict quality control (QC)
Fine discrimination of close iEs
In-situ unwrapping of overlapped iEs
Cons:
More complex and bigger database
More data points for fingerprinting
29
Pros: As-strict-as-you-choose confidence
Comparison with ProSightPC
30
Layman’s choice of parameters
Default values with statistical significance!
31
Pros: Fine discrimination of close iEs
[Table: Exp. m/z, Theo. m/z, and IPMD for candidate b-ion assignments of closely spaced iEs, including (b6-22-H2O)3+; IPMD values: 16, 11, -3, 13, 8, -6, 18, -1]
32
Pros: In-situ unwrapping of overlapped iEs
Proportional partition: the abundance of an overlapped isotopic peak is divided among the individual overlapped isotopic envelopes according to proportional abundances calculated from the experimental abundances and the theoretical relative abundance ratios.
k: # of overlapped isotopic peaks
m: # of isotopic peaks in each iE
n: # of overlapped iEs
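The partition formula is not preserved in this transcript, so the following is only one plausible reading of the description. Each envelope is scaled from its non-overlapped peaks, its expected contribution at the shared peak is predicted from its theoretical relative abundance, and the observed abundance is split in proportion to those expectations. The dictionary keys and function name are hypothetical.

```python
# Proportional partition of one overlapped isotopic peak among n overlapped envelopes.
# Assumed scheme: scale each envelope from its non-overlapped peaks, then split the
# observed abundance in proportion to the expected contributions.

def partition_overlapped_peak(observed_abundance, envelopes):
    """envelopes: list of dicts with keys
         'clean_exp'       - experimental abundances of the non-overlapped peaks
         'clean_theo'      - theoretical relative abundances of those same peaks
         'theo_at_overlap' - theoretical relative abundance at the overlapped peak
       Returns the share of observed_abundance assigned to each envelope."""
    expected = []
    for env in envelopes:
        scale = sum(env["clean_exp"]) / sum(env["clean_theo"])
        expected.append(scale * env["theo_at_overlap"])
    total = sum(expected)
    return [observed_abundance * e / total for e in expected]


envelope_a = {"clean_exp": [1000.0, 500.0], "clean_theo": [100.0, 50.0],
              "theo_at_overlap": 20.0}
envelope_b = {"clean_exp": [300.0], "clean_theo": [100.0],
              "theo_at_overlap": 60.0}
shares = partition_overlapped_peak(400.0, [envelope_a, envelope_b])
print(shares)
```

The shares always sum to the observed abundance, so no signal is created or lost by the split.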
33
Other improvements and utilities
Bi-section method for fast indexing of candidates
LASSO-like approach to untangle overlapped iEs
Additional utilities:
A comprehensive confidence score
False discovery rate (FDR)
Customized ion types to look for new dissociation channels
Customized MODs for the search of new modifications or labeled proteins
MS/MS spectrum annotation with matching fragments
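The bi-section idea can be sketched with Python's standard `bisect` module: if theoretical m/z values are kept sorted, the candidates within a ppm tolerance of a query form a contiguous slice that two binary searches locate. The flat sorted list used here is an illustrative assumption, not ProteinGoggle's database layout.

```python
# Bisection lookup of theoretical candidates whose m/z lies within a ppm tolerance.
# The index layout is an illustrative assumption.
from bisect import bisect_left, bisect_right


def candidates_within_ppm(sorted_mzs, query_mz, tolerance_ppm):
    """Return (lo, hi) such that sorted_mzs[lo:hi] lie within tolerance_ppm of query_mz.

    The tolerance is taken relative to the query m/z for simplicity.
    """
    delta = query_mz * tolerance_ppm * 1e-6
    lo = bisect_left(sorted_mzs, query_mz - delta)
    hi = bisect_right(sorted_mzs, query_mz + delta)
    return lo, hi


index = [499.990, 500.000, 500.004, 500.020, 612.300]
lo, hi = candidates_within_ppm(index, 500.002, 10.0)
print(index[lo:hi])
```

Each lookup costs O(log N) comparisons instead of a linear scan, which matters when the database holds theoretical isotopic peaks rather than single masses.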
34
Conclusions
An as-confident-as-you-choose protein database search algorithm, iMEF, has been created and implemented in the search engine ProteinGoggle.
The principle of iMEF with ProteinGoggle is demonstrated with the identification of ubiquitin from its tandem mass spectrum using ETD.
iMEF as implemented in ProteinGoggle is able to unwrap complex overlapping isotopic envelopes and confidently report embedded fragment ions.
iMEF could be adapted for peptide and glycan database search with customized databases.
35
Acknowledgements
DNL2003: Li Li, Bo Wang, Jing Li, Xu Zhao
The KENES. Co. Ltd.: Miao Zhou, Shijin Liu, Bin Yang
Funding: DICP “Research Start”; China “Youth 1000-talents Theme”
36
Thank you very much!