A Reference Library of Peptide Ion Fragmentation Spectra: Yeast S.E. Stein, L.E. Kilpatrick, P. Neta, Q.L. Pu, J. Roth, X. Yang National Institute of Standards.

Presentation transcript:

A Reference Library of Peptide Ion Fragmentation Spectra: Yeast
S.E. Stein, L.E. Kilpatrick, P. Neta, Q.L. Pu, J. Roth, X. Yang
National Institute of Standards and Technology, Gaithersburg, MD/Charleston, SC

Rationale: Identifying peptides by matching their MS/MS spectra to reference spectra can be faster, more reliable and more informative than current sequence-based methods. The development and testing of a library for Baker’s Yeast is reported here. Yeast was selected for the first library largely because of the wide availability of its tryptic digest MS/MS data.

Steps:
1) Acquire and organize ‘Shotgun’ proteomics data files from diverse sources
2) Identify peptides with available sequence search engines
3) Create a ‘consensus spectrum’ from all replicate spectra and find the best single spectrum for each peptide ion
4) Derive reliability measures for each spectrum
5) Remove ambiguities and build library

Applications
- Direct peptide identification: search all spectra against library (sensitive, reliable, fast, comprehensive)
- Confirmation: confirm/reject peptides identified by sequence search programs by comparing to reference spectra
- Build library of recurring, unidentified spectra: link peptides between runs for later processing and identification

1) Sources of Yeast Spectra
2503 LC-MS/MS Data Files from 12 Labs
Online repositories:
- PeptideAtlas
- Open Proteomics Database
Collaborators/Contributors:
- ISB, PeptideAtlas (Deutsch/King/Aebersold/…)
- NIH/LNT (Markey/Geer/Kowalak…)
- Blueprint Initiative (Hogue)
- Steven Gygi (Harvard)
- Brian Haynes (U. Arizona)
- Ron Beavis (the GPM)
NIST Test Measurements

4) Measures of Reliability
A) Spectrum/Sequence Consistency
- Match theoretical spectrum, based on relative dissociation rates of adjacent amino acids (from statistical analysis of reliable spectra).
  Discrimination shown at right.
- Fraction of unassigned abundance (peaks not originating from a known fragmentation path)
- Y/B ion continuity and Y/B correlation

3) Create Consensus Spectrum and Find Best Replicate Spectrum
For all spectra matching a given peptide ion, a multi-step process aligns m/z peaks, rejects outliers and creates a consensus spectrum. It also finds the best replicate spectrum based on search engine scores and spectrum quality. A peak in a consensus spectrum must be present in a majority of the spectra that might have generated the peak.

Spectrum Matching Algorithms
Algorithms: Spectrum similarity scores have been adapted from algorithms used for electron ionization spectra. Peaks are weighted by their significance:
- Reduce significance of common impurity ions (e.g., neutral loss from parent ion)
- Reduce weight for uncertain and isotopic peaks
- Use library spectrum reliability
Speed: Straightforward indexing leads to very fast identification (<< sec) even for very large libraries.
Robustness: Spectrum match scores are less sensitive to spectral details than sequence scores (see Figure).

[Figure notes: Half of all spectra fall within quartile lines. Derived from reliably identified ion trap spectra used in building the Yeast library. Above S/N of 40, spectra are quite reproducible (0.7 or higher dot product).]

Consensus Spectra: When multiple good quality spectra (replicates) are available, a composite spectrum is derived. This rejects spurious peaks and averages peak abundances. Outlier spectra are rejected by clustering. Signal/noise considerations are used for peak selection and abundance weighting. Best replicate and ‘singular’ spectra are also included for reliable identifications.

Reproducibility: For an MS/MS spectrum, abundance variations depend primarily on the signal/noise ratio, defined here as:
S/N = maximum abundance / median abundance
At low S/N, abundances are more variable, peaks can disappear and impurity peaks are more prominent.
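The S/N definition above, and the ‘dot product’ used to compare a spectrum with its consensus, can be illustrated with a minimal Python sketch. The S/N formula is taken from the text. The dot product shown is an assumption: a plain normalized (cosine) dot product over unit-width m/z bins, without the peak weighting the poster describes. The function names and the bin width are hypothetical.

```python
import math
from statistics import median


def signal_to_noise(abundances):
    """S/N as defined in the text: maximum abundance / median abundance."""
    if not abundances:
        raise ValueError("spectrum has no peaks")
    med = median(abundances)
    if med <= 0:
        raise ValueError("median abundance must be positive")
    return max(abundances) / med


def bin_peaks(peaks, bin_width=1.0):
    """Sum the abundances of (m/z, abundance) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, abundance in peaks:
        key = int(round(mz / bin_width))
        binned[key] = binned.get(key, 0.0) + abundance
    return binned


def dot_product(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two binned spectra.

    Assumption: unweighted cosine; the poster's scoring also weights peaks
    by significance, which is not reproduced here.
    """
    a = bin_peaks(peaks_a, bin_width)
    b = bin_peaks(peaks_b, bin_width)
    numerator = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return numerator / norm if norm else 0.0
```

With these definitions, a spectrum compared against itself scores 1.0 and two spectra with no shared bins score 0.0, so the 0.7 figure quoted for S/N above 40 sits on a 0 to 1 scale.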
The strong dependence of spectrum reproducibility on S/N is shown below, where the similarity of each spectrum to its consensus spectrum is measured by the ‘dot product’.

Library Searching
‘Sequence’ Library Searching: Current peptide identification methods match each measured MS/MS spectrum against ‘theoretical’ spectra of all possible peptide sequences. Since relative abundances, neutral losses from parent and product ions, and ratios of products having different charge states are not predictable, this rich, peptide-specific information is not effectively used for establishing identity (see below). Also, prior occurrence information is ignored: each search identifies the peptide as if for the first time.

[Figure labels: HPYFYAPELLYYANK/2+ (Sequence Spectrum, MS/MS Spectrum); R.AALPEDVNAPSGEAA.-/2+ (Spectrum 1, Spectrum 2, Consensus, impurity); Match to Theoretical Spectrum at Fixed Sequence Score (Dot x 100)]

Goal: To create comprehensive, annotated libraries of MS/MS spectra of peptides from widely studied organisms. Include traceability and reliability of each spectrum along with implemented and tested search algorithms needed to integrate with existing data analysis systems.

[Figure labels: thresholds; Probably wrong; Probably right; Probably right; Confirmed]

2) Identify Peptides
Different search engines often give very different scores for matching a given peptide ion with a single spectrum (figure at right). To capture the largest number of identifications, the highest score of four different search engines was used. This increased the number of reliable identifications by 25% over use of any single method.

B) Peptide Sequence Confirmation
- Other peptide ions with same sequence (different charge state or modification)
- Sequence contained in (or contains) another peptide
- Number of peptides per protein / protein length

‘Spectrum’ Library Searching: Matching full spectra is routine for volatile compound identification by GC/MS. This method matches the entire profile.
It is not sensitive to small (hard-to-detect) peaks, and more informative peaks can be weighted.

5) Build Yeast Library
Create annotated spectra for consensus and best matching single spectra. Resolve problems of similar spectra generating multiple peptides (homologies, small peptides, ...).

Spectrum vs. Sequence Library Search
Results at right show 3-4 times as many spectra identified by searching against spectra than against sequence. Many are from spectra with insufficient numbers of theoretical peaks.
Test Set: Yeast analysis files from the Open Proteomics Database (OPD40, 12 LC-MS/MS runs).
Spectrum Library: Consensus spectra in current yeast library. Radiodurans library for false ids.
Sequence Library: Search forward and reverse yeast library using relative homology scores.
Search Speed: Spectrum searching was about 100 times faster than sequence searching.

MS/MS Library Search Identification: Problems & Remedies

Peptide and Background Mixture Spectra
Problem: Isobaric impurities can appear throughout an LC run. If sufficient peptide signals are observed for a high sequence score, a peptide consensus spectrum can be produced whose major peaks resemble background spectra. This can generate erroneous ids.
Remedy: Exclude spectra with a high fraction of unexplained peaks (>40%) and wide variation in retention.

Low Information Content Spectra
Problem: Few peaks can match with high similarity scores, but have little information content.
Remedy: Apply correction for peptide length and total signal. Fold in OMSSA sequence match score.

Homology Matches
Problem: The large number of spectra processed generates some homologous sequences (part of the sequence is correct).
Remedy: Develop corrections based on peptide type and unassigned abundance. Compare all spectra in library having similar spectra and reject homologs.

Incomplete Coverage
Problem: Observed peptides depend on digestion/MS conditions; many more are needed than generated in a single experiment.
15% new peptide ions found in yeast experiment at NIST.
Remedy: Add more contributed data and data measured at NIST. This also improves overall spectrum quality.

Planned Release Date: June 2006
Three formats:
1) Simple ASCII ‘msp’ format (EI MS Library)
2) Using NIST Search Software (Windows)
3) Dynamic Link Library (Source & Binary)
Contact:

Under Development
- Human Peptide Library
- Add sequence scoring component (OMSSA)
- Libraries of selected peptides and protein digests
- Deep peptide mining (PTM, uncommon charge states, de Novo, …)
- Collision cell (Q-TOF) spectra: build library and optimize search algorithms

[Poster section headings: Building the Library; Searching the Library; Introduction]

C) Peptide Class (for setting acceptance threshold)
- Tryptic or SemiTryptic
- SemiTryptic: In source or Unexpected
- SemiTryptic: Confirmed or Unconfirmed
- Missed Cleavages: None or Explained, or Unexplained
- Missed Cleavages: Confirmed (found contained peptide) or Unconfirmed

Peptide Class              # Peptides
Consensus                  41,288
Singular (one ID)          4,513
Simple Tryptic             30,590
Tryptic Missed Cleavage    6,979
Semi Tryptic               3, , , ,367
ICAT                       19,389

Small Missing Peaks Can Have A Big Effect on Sequence Scores
[Figure axis: Sequence Search Score]

Peptide Library Browser of Annotated Yeast Spectra (software from NIST/EPA/NIH Mass Spectral Library of EI Spectra)

- Target identification: sensitive detection of internal standards, biomarkers, target proteins
- Mixture analysis: subtract a component from mixture spectrum
- Source of reusable information: difficult-to-identify, unusual, manually identified, special meaning

NIST/EPA/NIH Library: Electron Ionization Spectra, 163K Compounds. Volatile Compound ID by GC/MS. With adjustments for instrument class, this method should be equally applicable to peptide identification.
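As an illustration of spectrum library searching as described in this transcript, the following Python sketch indexes library entries by precursor m/z and ranks candidates by a normalized dot product. It is a sketch under stated assumptions, not the NIST search software: the class name SpectrumLibrary, the precursor tolerance, the integer m/z binning and the example entries are all hypothetical, and the peak weighting and score corrections mentioned in the poster are omitted.

```python
import math
from bisect import bisect_left, bisect_right


def cosine(a, b):
    """Normalized dot product of two spectra given as {m/z bin: abundance}."""
    numerator = sum(a[k] * b[k] for k in a.keys() & b.keys())
    denominator = math.sqrt(
        sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    )
    return numerator / denominator if denominator else 0.0


class SpectrumLibrary:
    """Hypothetical in-memory library, sorted by precursor m/z for fast lookup."""

    def __init__(self, entries):
        # entries: iterable of (name, precursor_mz, peaks), where peaks maps
        # an integer m/z bin to an abundance.
        self._entries = sorted(entries, key=lambda entry: entry[1])
        self._precursors = [entry[1] for entry in self._entries]

    def search(self, precursor_mz, peaks, tolerance=1.5, min_score=0.0):
        """Return (score, name) hits within the precursor window, best first."""
        lo = bisect_left(self._precursors, precursor_mz - tolerance)
        hi = bisect_right(self._precursors, precursor_mz + tolerance)
        hits = [(cosine(peaks, ref), name) for name, _, ref in self._entries[lo:hi]]
        return sorted((hit for hit in hits if hit[0] >= min_score), reverse=True)


# Illustrative entries only; the names and m/z values are made up.
library = SpectrumLibrary([
    ("PEPTIDEK/2", 464.7, {175: 50.0, 304: 100.0}),
    ("ELVISK/2", 344.7, {147: 100.0, 260: 30.0}),
    ("SAMPLER/2", 465.2, {175: 100.0, 288: 80.0}),
])
hits = library.search(464.9, {175: 45.0, 304: 100.0, 410: 3.0})
```

Restricting scoring to entries inside the precursor window is one simple form of the indexing the poster credits for fast searches. Only two of the three example entries are scored for the query above.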

Goal
Overview
Background
–+S/N
–+Consensus Spectra
–+Spectrum Variability
–Completeness
–Library Searching
Library Building
–Sources – list of contributors
–Multiple SE – show scatter diagram
–Quality Control
    Individual Spectrum features
    Relative spectrum features (use rest of lib)
    –Common sequence (charge, mod, missed and semitryptic cleavage)
    –# peps in protein
–Classes of Peptides – table with thresholds
–False IDs and Low Quality Spectra
    Random match errors – handle with reverse and spectrum modeling
    Impurities – consensus spectra, set threshold
    Low S/N – dropped peaks in consensus – use best of replicates
    One hit wonders – accept singular spectra
    Homology matches – selected protein digests to set thresholds
Library Search Algorithms
–Show equation with corrections
Performance testing/optimization
Library Search Concerns
–Peptide length and spectrum complexity dependences
–Propagation of errors – false ids of common impurities
Special Features
–Unusual peptides
–Biomarkers
Plans
Acknowledgements
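The consensus-spectrum item in this outline refers to the rule stated in the transcript: a consensus peak must be present in a majority of replicate spectra, and abundances are averaged. A minimal Python sketch of that rule follows. It simplifies in ways that are assumptions rather than the authors' procedure: every replicate is counted as able to have produced every peak, peaks are aligned by unit-width m/z bins, abundances are scaled to each replicate's base peak, and the outlier clustering and S/N weighting are left out. The function name is hypothetical.

```python
from collections import defaultdict


def consensus_spectrum(replicates, bin_width=1.0):
    """Build a consensus from replicate spectra of one peptide ion.

    replicates: list of spectra, each a list of (m/z, abundance) pairs with
    positive abundances. Returns (mean m/z, mean relative abundance) pairs
    for peaks found in a strict majority of the replicates.
    """
    replicates = [spectrum for spectrum in replicates if spectrum]
    if not replicates:
        return []

    bins = defaultdict(list)  # m/z bin -> one (m/z, relative abundance) per replicate
    for spectrum in replicates:
        base = max(abundance for _, abundance in spectrum)
        strongest = {}
        for mz, abundance in spectrum:
            key = int(round(mz / bin_width))
            relative = abundance / base
            # Keep only the strongest peak of this replicate in each bin.
            if key not in strongest or relative > strongest[key][1]:
                strongest[key] = (mz, relative)
        for key, peak in strongest.items():
            bins[key].append(peak)

    needed = len(replicates) // 2 + 1  # strict majority
    consensus = []
    for key in sorted(bins):
        peaks = bins[key]
        if len(peaks) >= needed:
            mean_mz = sum(mz for mz, _ in peaks) / len(peaks)
            mean_abundance = sum(rel for _, rel in peaks) / len(peaks)
            consensus.append((round(mean_mz, 3), round(mean_abundance, 3)))
    return consensus


replicate_spectra = [
    [(175.1, 50), (304.2, 100)],
    [(175.2, 40), (304.1, 100), (410.0, 5)],
    [(175.0, 60), (304.2, 100)],
]
consensus = consensus_spectrum(replicate_spectra)
```

In this example the peak near m/z 410 appears in only one of three replicates, so it is dropped from the consensus as a spurious peak.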

Spectrum/Sequence Scores Vary More than Spectrum/Spectrum Scores
[Figure axis: Sequence score]
Small Missing Peaks Can Have A Big Effect on Sequence Scores