Published by Griselda McCormick. Modified over 9 years ago.
1
MS/MS Libraries of Identified Peptides and Recurring Spectra in Protein Digests
Lisa Kilpatrick, Jeri Roth, Paul Rudnick, Xiaoyu Yang, Steve Stein
Mass Spectrometry Data Center
2
Library searching is not new
Organize for Reuse
3
MS Library Searching
Hertz, Hites and Biemann, Anal. Chem. (1971).
PBM: McLafferty, Hertel, Villwock, Org. Mass Spectrom. (1974).
SISCOM: Damen, Henneberg, Weimann, Anal. Chim. Acta (1978).
INCOS: Sokolow, Karnofsky, Gustafson, Finnigan Application Report 2 (March 1978).
Stein, Scott, J. Amer. Soc. Mass Spectrom. (1994).
4
‘Dot Product’ (cosine of ‘angle’ between a pair of spectra)
Measured = f(m/z, abundance)
Reference = f(m/z, abundance)
f(abundance): Weight as you like
Sum over all peaks in common
Normalize
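The dot product on this slide can be sketched in a few lines. This is a minimal illustration, not the scoring function of any particular search program. It assumes each spectrum is a dict mapping a binned (nominal) m/z value to an abundance. The weighting form m/z^a × abundance^b and the default exponents are assumptions standing in for "weight as you like".

```python
import math

def weighted(spectrum, mz_power=0.0, abundance_power=0.5):
    """Apply an assumed weighting f = m/z**a * abundance**b to every peak."""
    return {mz: (mz ** mz_power) * (ab ** abundance_power)
            for mz, ab in spectrum.items()}

def dot_product(measured, reference, **weights):
    """Cosine of the angle between two weighted spectra (0.0 to 1.0)."""
    m = weighted(measured, **weights)
    r = weighted(reference, **weights)
    # Sum over all peaks in common.
    common = m.keys() & r.keys()
    numerator = sum(m[mz] * r[mz] for mz in common)
    # Normalize by the lengths of both weighted spectra.
    norm = (math.sqrt(sum(v * v for v in m.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return numerator / norm if norm else 0.0
```

Identical spectra score 1.0 and spectra with no peaks in common score 0.0; changing the exponents shifts how much the large peaks or the high-m/z peaks dominate the score.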
5
Traditional GC/MS Library Search
6
Variability Depends on S/N
~7,000 Radiodurans Peptides, LCQ (PNNL/NCRR)
Medians
7
Library Searching for Peptides
LIBQUEST (Yates)
–Yates et al., Anal. Chem. 1998, 70, 3557
X!Hunter (Beavis)
–Craig et al., J. Proteome Res. 2006, 5, 1843
BiblioSpec (MacCoss)
–Frewen et al., Anal. Chem. 2006, 78, 5678
Spectral Comparison (Kearney)
–Liu et al., Proteome Science 2007, 5:3
SpectraST (Aebersold)
–Lam et al., Proteomics 2007, 7, 655-667
NIST Peptide Ion Fragmentation Library
–June 2006 release (US-HUPO – March 2004)
9
Why Spectrum Libraries?
More sensitive
Better scoring
Faster
Annotation
Unrestricted precursor ion
10
Identification by Spectrum Matching is More Sensitive than by Spectrum/Sequence Matching
Simple Protein Mix
11
Spectrum/Spectrum Scores are More Robust than Sequence/Spectrum Scores
Sequence score
99% Confidence
12
Matching Spectra is Faster than Matching Sequence
0.005/s vs. 6.2/s per query spectrum
13
Reference Library Building
Extract identified spectra from sequence search
–Multiple search engines
–Instrument-class specific
Create ‘consensus’ spectra
–Two or more matching spectra, also save best
Assign probability of being correct
–Refine confidence starting from decoy FDR
–Classify peptides – tryptic, missed cleavage, semi, mods
Create searchable spectral library
–Resolve conflicts, add annotation
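The ‘consensus’ step can be illustrated with a small sketch. The slide only states that two or more matching spectra are combined. The peak-voting rule, the `min_fraction` parameter, and base-peak normalization below are assumptions chosen for illustration, not the library's documented procedure. Spectra are assumed to be non-empty dicts of binned m/z to abundance.

```python
def consensus_spectrum(replicates, min_fraction=0.5):
    """Average replicate spectra, keeping peaks seen in enough replicates.

    replicates: list of dicts {binned m/z: abundance}, all assumed to be
    spectra of the same peptide ion. min_fraction is a hypothetical quorum.
    """
    if len(replicates) < 2:
        raise ValueError("a consensus needs two or more matching spectra")

    # Scale each replicate so its largest peak is 1.0.
    normalized = []
    for spec in replicates:
        base = max(spec.values())
        normalized.append({mz: ab / base for mz, ab in spec.items()})

    quorum = min_fraction * len(normalized)
    consensus = {}
    for mz in sorted(set().union(*normalized)):
        values = [s[mz] for s in normalized if mz in s]
        # Drop peaks that appear in too few replicates (likely noise).
        if len(values) >= quorum:
            consensus[mz] = sum(values) / len(values)
    return consensus
```

With three replicates and the default quorum, a peak seen in only one replicate is discarded, while peaks seen in all three are averaged after scaling.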
14
Three Classes of Libraries
I. Conventional Target Identification
–Peptides (Proteins)
II. Identifiable
–By unconventional searching
III. Not Identifiable
–Account for all recurring spectra
–QA/QC
15
I. OMSSA overlap with MS/MS Library Search
Identified spectra (1% FDR) for 1-D Yeast, NCI/CPTAC – Vanderbilt
6/06 (34K): 747 1350 353
6/07 (78K): 318 1752 833
17
II. Identify What we Can
Derive Class-specific FDR
Tryptic
–Simple
–Expected missed cleavages
–Unexpected missed cleavages
Semitryptic (cleaved tryptic)
–No missed cleavage
   In source (with parent at same retention)
   In sample
–Missed cleavage
   In source (with parent)
   In sample (obey rules)
   Uncommon – reject
Others …
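A class-specific FDR can be sketched with the usual target-decoy estimate, applied separately to each peptide class. The tuple layout `(peptide_class, score, is_decoy)`, the function name, and the decoys/targets estimator are assumptions made for this illustration. The slide does not specify how the class-specific FDR is derived.

```python
from collections import defaultdict

def class_thresholds(hits, max_fdr=0.01):
    """Return, per peptide class, the lowest score cutoff meeting max_fdr.

    hits: iterable of (peptide_class, score, is_decoy) tuples.
    FDR is estimated as decoys / targets among hits at or above the cutoff.
    A class with no acceptable cutoff maps to None.
    """
    by_class = defaultdict(list)
    for peptide_class, score, is_decoy in hits:
        by_class[peptide_class].append((score, is_decoy))

    thresholds = {}
    for peptide_class, scored in by_class.items():
        scored.sort(key=lambda h: h[0], reverse=True)  # best score first
        targets = decoys = 0
        cutoff = None
        for score, is_decoy in scored:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            if targets and decoys / targets <= max_fdr:
                cutoff = score  # still within the allowed FDR
        thresholds[peptide_class] = cutoff
    return thresholds
```

Classes that attract many decoy hits, such as semitryptic peptides, end up with a stricter cutoff than simple tryptic peptides, which is the point of deriving the FDR per class.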
18
Atypical Peptide Ions use Sequence Search Method
Tryptic only with many mods
Less common: Methylation, Phosphorylation, …
Artifacts: Na, K, Carbamyl
InsPecT/Pevzner (Unidentified, +70)
High charge states, >2 missed cleavages
Use class-specific score thresholds
19
HSA/Fibrinogen/Transferrin Mix
6124 Consensus Peptide Spectra, IT, Qtof, TofTof
Ion Trap Peptide Ions: 1300 HSA, 1100 Fibrinogen, 700 Transferrin
20
contiguous = tryptic, exploded = semitryptic
21
III. Library of Recurring, Unidentified Spectra
Create consensus spectra
–From similar spectra from an experiment
Combine from multiple experiments
Identify spectra in other experiments
–QA/QC: Artifacts, in standards, …
–Apply other sequencing methods
22
Assign all Spectra
Identified Spectrum
–Matches library peptide or unidentified spectrum
–Subset of peaks match library spectrum (impure)
–Similar to a matched spectrum (cluster)
Not a Peptide
–Low S/N
   Maximum/Median <15
–High charge state (many large peaks)
   Proteins, large fragments, …
–One dominant peak
   Stable ion, not peptide
–Singly charged (high/low abund < 1.2)
   Probable artifact, lower probability of identification
–Narrow m/z range
   Peptide?
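Three of the "Not a Peptide" rules can be sketched as a simple filter. Only the Maximum/Median < 15 cutoff comes from the slide. The `dominant_fraction` and `min_mz_span` parameters, their default values, and the function name are hypothetical placeholders. The spectrum is assumed to be a non-empty dict of m/z to positive abundance.

```python
from statistics import median

def not_peptide_reasons(spectrum, dominant_fraction=0.9, min_mz_span=200.0):
    """List the reasons a spectrum looks unlike a peptide (empty if none).

    spectrum: dict {m/z: abundance}. The two keyword thresholds are
    illustrative assumptions, not values taken from the presentation.
    """
    abundances = list(spectrum.values())
    reasons = []

    # Low S/N: largest peak is less than 15x the median peak (from the slide).
    if max(abundances) / median(abundances) < 15:
        reasons.append("low S/N")

    # One dominant peak: a single peak carries nearly all of the signal.
    if max(abundances) / sum(abundances) > dominant_fraction:
        reasons.append("one dominant peak")

    # Narrow m/z range: peaks span too little of the m/z axis.
    if max(spectrum) - min(spectrum) < min_mz_span:
        reasons.append("narrow m/z range")

    return reasons
```

A spectrum that returns an empty list would pass on to library matching; the charge-state rules on the slide need precursor information and are left out of this sketch.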
23
exploded = identified, contiguous = unidentified
25
Library Pipeline of the Future
Diagram labels: Mass spectrometer; Garbage filter; Pep. Lib; Unass. Lib; Sequence Search, De Novo, Theoretical Spec, Similarity, ...
Outcomes: assigned, unassigned, No ID
26
NCI/NIH - CPTAC: Clinical Proteomic Technology Assessment for Cancer
http://proteomics.cancer.gov
Technology assessment; develop standard protocols and clinical reference sets; and evaluate methods to ensure data reproducibility.
Broad Institute of MIT and Harvard, Memorial Sloan-Kettering Cancer Center, Purdue University, University of California, San Francisco, and Vanderbilt University School of Medicine.
NCI grants (U24CA126476-01, U24CA126485-01, U24CA126480-01, U24CA126477-01, and U24CA126479-01).
28
Run-to-Run Chromatographic Reproducibility
29
Lab-to-Lab Chromatography
Peptide: YICENQDSISSK
Panels: Broad Orbitrap, Vandy Orbitrap, NYU Orbitrap, INCAPS LTQ, NIST LTQ, Vandy LTQ, Purdue LTQ
30
HSA_CAM_SigmaA9511_5H_8MS2_m2_10de_040406_05
31
Measures of Reproducibility
Identified ions
–Unique peptides, Ions, Spectrum counts
Unidentified components
–Classify by type, link to origin
Ion cluster analysis
–MS1 linked to MS2
Chromatography
–Time evolution of ion clusters
32
Ion Component Analysis
33
Ion Component Analysis (Yeast)
34
Components in Replicate Runs
Series: total, sampled, identified
Legend: ▲▼ run 1,2; ■ in both