NoDupe algorithm to detect and group similar mass spectra.

Reducing the number of similar spectra in proteomic experiments: why?
Identifying peptides from large spectral collections is time-consuming. Detecting similar spectra reduces the number of spectra that must be processed. The dynamic exclusion feature of the mass spectrometer does not eliminate all duplicate spectra, because (a) a peptide may elute over an extended period of time, and (b) the peptide mixture may be highly complex.

MS/MS spectra from the same peptide may look different
The differences come from variations in signal-to-noise ratio, variations in collision energy, and random noise.

Finding the degree of similarity between two spectra
A dot-product comparison is used to measure similarity: a vector is built for each spectrum, and the angle between the two vectors is computed. Larger angles imply greater differences between the spectra; angles near zero imply considerable similarity.
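The vector comparison can be sketched in Python as below. This assumes each spectrum has already been reduced to a sparse mapping from m/z bin index to intensity; the dict representation and the name `spectral_angle` are illustrative and are not taken from NoDupe's Java code. The cosine of the angle is the dot product divided by the product of the two vector lengths.

```python
import math


def spectral_angle(a: dict[int, float], b: dict[int, float]) -> float:
    """Angle in radians between two spectra stored as {bin_index: intensity}.

    Bins missing from one spectrum count as zero intensity.
    """
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(ia * b[k] for k, ia in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return math.pi / 2  # an empty spectrum shares nothing with anything
    cos_theta = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))
```

Two identical spectra give an angle of (nearly) 0, while spectra with no peaks in common give exactly π/2, matching the interpretation above.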

NoDupe algorithm
NoDupe was created in the Java programming language. Spectra are grouped based on their similarity, and preprocessing is done first to reduce spectral complexity. Optionally, NoDupe removes duplicate spectra from each LC run, retaining only one representative spectrum per group.

NoDupe: preprocessing
All fragment ions in a run are assigned to bins 1.0057 m/z wide, and the intensities of peaks falling in the same bin are summed. Peak intensities are normalized by the sum of the intensities of all peaks. Smaller peaks are emphasized, and peaks of very low intensity are removed. The sum of the square roots of the intensities is calculated, and only the significant peaks are retained; the rest are discarded.
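A minimal Python sketch of these steps follows. Only the 1.0057 m/z bin width comes from the slide. The low-intensity cutoff `min_rel` and the retention rule (keep the strongest peaks until they cover a fraction `keep_fraction` of the summed square-root intensity) are hypothetical stand-ins, because the slide does not give NoDupe's exact criteria. Taking square roots is used here as the way smaller peaks are emphasized. The function also returns the fraction of binned peaks that survive, since that quantity is used later to break ties.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0057  # m/z bin width given on the slide


def preprocess(peaks, min_rel=0.001, keep_fraction=0.8):
    """Return ({bin: sqrt(normalized intensity)}, fraction of binned peaks kept).

    peaks: iterable of (mz, intensity) pairs.
    min_rel and keep_fraction are HYPOTHETICAL thresholds, not NoDupe's values.
    """
    # 1. Assign every fragment ion to a fixed-width m/z bin and sum intensities.
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // BIN_WIDTH)] += intensity
    total = sum(binned.values())
    if total <= 0:
        return {}, 0.0
    # 2. Normalize by total intensity; 3. square roots boost the small peaks.
    roots = {k: math.sqrt(v / total) for k, v in binned.items()}
    # 4. Drop peaks whose normalized intensity is very low.
    roots = {k: r for k, r in roots.items() if r * r >= min_rel}
    # 5. Keep the strongest peaks until they cover keep_fraction of the
    #    summed square-root intensity; discard the rest.
    budget = keep_fraction * sum(roots.values())
    kept, running = {}, 0.0
    for k, r in sorted(roots.items(), key=lambda kv: kv[1], reverse=True):
        if running >= budget:
            break
        kept[k] = r
        running += r
    return kept, len(kept) / len(binned)
```

For example, two peaks at m/z 100.0 and 100.3 land in the same bin and are summed before normalization.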

Results of preprocessing

NoDupe: finding similarities
Scans are sorted by precursor m/z, and spectral contrast angles are calculated for pairs of spectra whose precursors lie within 3 m/z of each other:

$$\theta = \cos^{-1}\left(\frac{\sum i_a i_b}{\sqrt{\sum i_a^{2} \cdot \sum i_b^{2}}}\right)$$

where i_a is a peak intensity of spectrum A, i_b is the corresponding peak intensity of spectrum B, and θ is the spectral contrast angle. For identical spectra, θ = 0; for completely dissimilar spectra, θ = π/2.
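Sorting by precursor m/z makes the 3 m/z restriction cheap to apply: after sorting, each scan only has to be compared with the scans that follow it until the gap exceeds the window. The sketch below enumerates those candidate pairs; the function name and the index-pair output are illustrative, and only the sorting step and the 3 m/z window come from the slide. Each yielded pair would then be scored with the spectral contrast angle.

```python
def candidate_pairs(precursor_mz, window=3.0):
    """Yield index pairs (i, j), i < j, whose precursor m/z differ by <= window.

    precursor_mz: list of precursor m/z values, one per scan.
    """
    # Visit scans in order of increasing precursor m/z.
    order = sorted(range(len(precursor_mz)), key=lambda i: precursor_mz[i])
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if precursor_mz[j] - precursor_mz[i] > window:
                break  # every later scan in sorted order is even further away
            yield (min(i, j), max(i, j))
```

With precursors `[500.0, 800.0, 502.5, 506.0, 801.0]`, only scans 0 and 2 and scans 1 and 4 are close enough to be compared, so two angle calculations replace ten.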

Spectral contrast angles

The similarity angle cutoff is taken as 1.1.
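One way to turn pairwise angles into groups is sketched below with a union-find structure. This is an assumption, not NoDupe's documented procedure: two spectra are linked when their angle is below the 1.1 cutoff (read as radians, since θ ranges from 0 to π/2), and each group is a connected component of those links. The function name and input format are hypothetical.

```python
def group_spectra(n, pair_angles, cutoff=1.1):
    """Group n spectra into clusters from {(i, j): angle_in_radians} scores.

    ASSUMPTION: spectra are linked when angle < cutoff, and groups are the
    connected components of those links (single linkage).
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), angle in pair_angles.items():
        if angle < cutoff:
            parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for s in range(n):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```

Single linkage is a simple choice, but it can chain dissimilar spectra together through intermediate ones; a stricter rule would require every member to match every other member.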

NoDupe: selecting representative spectra
A match count is calculated for each spectrum. Duplicates are detected based on the match count, and ties are broken based on the number of peaks removed during preprocessing.
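The selection rule can be sketched as follows, assuming a spectrum's match count is the number of other spectra it was found similar to. The tie-break direction (prefer the spectrum whose preprocessing kept the smaller fraction of peaks) is an assumption read off the scan 491 example in this talk, not a documented NoDupe rule; both function names are hypothetical.

```python
from collections import Counter


def match_counts(similar_pairs):
    """Count, for every spectrum, how many other spectra it was similar to."""
    counts = Counter()
    for i, j in similar_pairs:
        counts[i] += 1
        counts[j] += 1
    return counts  # Counter returns 0 for spectra with no matches


def pick_representative(group, counts, kept_fraction):
    """Pick the spectrum with the highest match count.

    Ties go to the spectrum with the smaller fraction of peaks kept by
    preprocessing (an ASSUMPTION based on the scan 491 example).
    """
    return min(group, key=lambda s: (-counts[s], kept_fraction[s]))
```

In a pair, both spectra have a match count of 1, so the tie-break decides. With a hypothetical partner scan 500 that keeps 24% of its peaks, scan 491 (21% kept) is chosen.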

Samples used
Gel band sample: a protein complex from a stable HEK 293 cell line. Microtubule-associated protein (MAP) sample: MAPs purified from bovine brains. Rat hippocampus sample: protein from rat brains. Sample complexity varied from 18.3 to 34.6 spectra/min.

Experimental process
LC separations were performed for all three samples. The 2to3 algorithm was applied to remove spectral copies with incorrect charge-state assignments, and NoDupe was then used to reduce the number of spectra.

Observations
Preprocessing removed a large number of peaks. For the peptide VAAPEEHPVLLTEAPLNPK, approximately 70% of the peaks in the spectra were removed. Both the number of peaks and the relative standard deviation diminished; the relative standard deviation fell from 26% to 20%.
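For reference, the relative standard deviation (RSD) is the standard deviation divided by the mean, expressed as a percentage. The slide does not say whether the sample or population standard deviation was used; the sketch below assumes the sample form, and the function name is illustrative.

```python
import statistics


def relative_std_dev(values):
    """Relative standard deviation in percent: sample stdev / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100
```

For peak counts of 10, 12, and 14, the sample standard deviation is 2 and the mean is 12, giving an RSD of about 16.7%.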

Observations: clusters
The average cluster size was found to be around 4, and spectral pairs were the most common kind of cluster. Two-thirds of the spectra were not significantly similar to any other spectrum. High-confidence peptide identifications were lost when duplicate spectra were removed.
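The three cluster statistics above can be computed from a list of groups as sketched below. The slide does not define how the average was taken, so this assumes the average cluster size is over clusters with at least two members, while the singleton figure is a fraction of spectra (matching "two-thirds of the spectra"). The function name and output keys are hypothetical.

```python
from collections import Counter


def cluster_summary(groups):
    """Summarize clusters given as lists of spectrum indices.

    ASSUMPTION: mean cluster size is taken over clusters with >= 2 members;
    singleton_fraction is the fraction of all spectra that are singletons.
    """
    sizes = [len(g) for g in groups]
    multi = [s for s in sizes if s >= 2]
    total = sum(sizes)
    return {
        "singleton_fraction": sizes.count(1) / total if total else 0.0,
        "mean_cluster_size": sum(multi) / len(multi) if multi else 0.0,
        "most_common_size": Counter(multi).most_common(1)[0][0] if multi else None,
    }
```

A long tail of large clusters can push the mean well above 2 even when pairs are the most common size.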

Identifications lost
4 to 14% of the identifications were lost. Without removing the duplicate spectra, 5 to 19% of the identifications were lost. The angle was found to be 0.847.

For a group of size 2
Since there are only two spectra in this group, the more representative one is chosen. Scan 491 is chosen because only 21% of its peaks remain after preprocessing, as opposed to 24% for the other spectrum. Because pairs are the most common kind of cluster, this could cause a significant loss of protein identifications.

Lost spectra
NoDupe did not find scan 4892 to be sufficiently similar.

Duplicate spectra and peptides identified

Where it can be used
Grouping results in substantial time savings: instead of finding the best sequence for each spectrum separately, a search can find the sequence that best matches the spectra of a group. The savings are greater when the database is large. A narrower mass window can be used, which alleviates random matching. Spectral libraries are more effective when they contain representative spectra rather than randomly chosen ones. Spectra that fall in the same group but receive different identifications from de novo interpretation can be flagged.
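The last use, flagging groups whose members receive conflicting identifications, is straightforward once groups exist. The sketch below assumes identifications are available as a mapping from spectrum index to peptide sequence, with unidentified spectra mapped to `None` or absent; the function name and input format are hypothetical.

```python
def flag_inconsistent_groups(groups, identification):
    """Return the groups whose members received more than one distinct sequence.

    identification: {spectrum_index: peptide_sequence or None}.
    Unidentified spectra are ignored rather than counted as a conflict.
    """
    flagged = []
    for group in groups:
        seqs = {identification.get(s) for s in group} - {None}
        if len(seqs) > 1:
            flagged.append(group)
    return flagged
```

Ignoring unidentified members is a design choice: a group where one spectrum is identified and the others are not is treated as consistent, which lets the identification be propagated to the whole group.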

Acknowledgments
The paper presented was “Similarity among tandem mass spectra from proteomic experiments: detection, significance, and utility” by David L. Tabb, Michael J. MacCoss, Christine C. Wu, Scott D. Anderson, and John R. Yates. Thanks to Prof. Haixu Tang for guiding me.