A Database of Peak Annotations of Empirically Derived Mass Spectra

Dennis Harman [1], Patrick Smyth [2], and David Sigfredo Angulo [3]
[1] DePaul University, CTI. d_harman60657@yahoo.com
[3] DePaul University, CTI. dangulo@cti.depaul.edu
[2] DePaul University, CTI. phsmyth@ameritech.net (to whom correspondence is to be addressed)

Abstract

Mass spectrometry is the central technology in proteomics research and has generated vast amounts of data. Several databases containing empirically derived tandem mass spectrum (MS/MS) data are now publicly available. These can be used singly or in concatenated form; together they contain the sequences of more than 12 million proteins. We have imported these, along with annotations, into the Illinois Bio-Grid Mass Spectrometry Proteomics Database (IBG-MSP). The aim is to consolidate these scattered public databases into a central resource and to allow that resource to be used for protein identification.

Database searching is the most popular approach to identifying unknown proteins: spectra of unknown proteins are matched against theoretical spectra derived from genomic or proteomic sequence databases. We have developed software that matches these unknown protein spectra against our empirical database, which allows for more accurate protein identification, especially in cases of post-translational modification.

The IBG-MSP contains extensive metadata, including the amino acid sequence and details of the experimental techniques used in collecting the samples. The overall format for the metadata closely follows mzXML, an industry standard supported by HUPO. The IBG-MSP also supports MS/MS annotations, in which peaks are annotated using the terminology conventionally used to describe MS/MS fragment ion series. This is accomplished through algorithms based on the fragmentation rules of Collisionally Induced Dissociation (CID) of protonated peptide ions.
An annotated theoretical spectrum is generated from each amino acid sequence, and the masses in each theoretical spectrum are matched to those in each experimental spectrum. The resulting annotations are then stored in the database. As a centralized, computational solution for mass spectrometry-based proteomic analyses, the IBG-MSP will be used not only to identify proteins but also to provide training data for the development of new proteomic analysis tools.

IBG-MSP Database and the Data Loaded

The ER diagram shown displays the principal tables of the IBG-MSP database and the relations among them. Metadata from the XML source files (accession ID, machine type, precursor mass, etc.) is stored in the database, and researchers can search on any data item. Annotations such as a, b, or c and x, y, or z ions, neutral loss or gain, immonium ion, or internal cleavage ion can be found in the following tables: ionSeriesDetail, neutrallosscharge, and internalclevageion.

The Batch Import Module is a Java program that is hosted on a 20-node cluster and is used to download data from various publicly curated databases. The Module takes as input mass spectra stored in mzXML files and the associated peptide sequences (stored in XML or CSV files). A Fragmentation Modeling Tool annotates the spectra as the data are imported into the database. The Module uses JavaBeans, Java JAXB technology, and the ProteomeCommons IO Framework [5].

Sources of Imported Data

Peptide Atlas [6] http://www.peptideatlas.org/repository
Tranche @ ProteomeCommons.org [5] http://www.proteomecommons.org/data.jsp

Fragmentation Modeling Tool

Peptide Structure and Fragmentation

Peptide ions do not fragment at random; they fragment in a consistent order that is well understood.
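As a rough illustration of this ordered fragmentation, the sketch below computes the m/z values of the b-ion series (fragments containing the N terminus) and the y-ion series (fragments containing the C terminus) for a peptide sequence. It is not the IBG-MSP Tool itself, which is written in Java; the function name `fragment_ions` is hypothetical, and the residue, water, and proton masses are standard monoisotopic values rather than figures taken from the poster.

```python
# Standard monoisotopic residue masses in Da (assumed; not from the poster).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # mass of H2O, Da


def fragment_ions(sequence, charge=1):
    """Return (label, m/z) pairs for the b- and y-ion series of a peptide.

    b_i holds the first i residues (N-terminal side);
    y_i holds the last i residues (C-terminal side) plus one water.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    ions = []

    # N-terminal fragments: running sum from the left.
    prefix = 0.0
    for i in range(1, n):
        prefix += masses[i - 1]
        ions.append((f"b{i}", (prefix + charge * PROTON) / charge))

    # C-terminal fragments: running sum from the right.
    suffix = 0.0
    for i in range(1, n):
        suffix += masses[n - i]
        ions.append((f"y{i}", (suffix + WATER + charge * PROTON) / charge))

    return ions


if __name__ == "__main__":
    for label, mz in fragment_ions("GAK"):
        print(label, round(mz, 4))
```

Each series is built with a running sum, so a peptide of n residues yields its n-1 b-ions and n-1 y-ions in a single pass from each end.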
A Fragmentation Modeling Software Tool was implemented to predict the ions that could theoretically be produced from a given amino acid sequence. The Tool implements the rules of the peptide fragmentation process and uses data structures that we had previously developed [2], [3]. Based on the amino acid sequence associated with the spectrum being imported, the Tool computes the theoretical peaks of the ions containing the N terminus and of those containing the C terminus (see figure to the right). These theoretical peaks are then compared with the peaks in the imported spectrum using a linear-time matching algorithm. Where a match is found, the annotation of the theoretical peak is used to annotate the actual peak in the imported spectrum.

Fragmentation of peptides typically occurs along the peptide backbone. The terminology conventionally used to describe MS ions encapsulates information about the fragmentation processes that produced them. Each residue in the peptide chain successively fragments off, in both the N-to-C and C-to-N directions. The location in the backbone where fragmentation occurs, together with the terminus that retains the ionization charge, determines which ion type is formed: a, b, or c and x, y, or z ions. Doubly charged tryptic peptides mainly yield singly charged y- and b-ions. Loss of a CO group from a b-ion, a mass difference of 27.9949 Da, can also occur and forms an a-ion. Other ions arising from neutral losses of H2O and NH3 are also possible.

References

[1] M. Kinter and N.E. Sherman, Protein Sequencing and Identification Using Tandem Mass Spectrometry, 2000; John Wiley & Sons, Inc, New York, NY.
[2] Harman, D. and D. S. Angulo. Annotation of Mass Spectrum Data (Poster). Proceedings of the DePaul CTI Research Symposium. Chicago, IL. May 5, 2997
[3] Harman, D.; Angulo, D.; Drew, K; Schilling, A. A Data Model for Annotating the Peaks of Mass Spectrum Data (Poster). Proceedings of the Midwest Software Engineering Conference/DePaul CTI Research Symposium. Chicago, IL. April 29, 2006.
[4] http://www.illinoisbiogrid.org/MSDB
[5] http://www.proteomecommons.org/
[6] http://www.peptideatlas.org/repository
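
Supplementary sketch: the linear-time matching step described under Fragmentation Modeling Tool can be illustrated with a two-pointer walk over two peak lists sorted by m/z. This is one plausible way to do it, not necessarily the algorithm the Tool uses; the function name `annotate_peaks`, the greedy one-to-one pairing, and the default tolerance are assumptions made for illustration.

```python
def annotate_peaks(theoretical, observed, tolerance=0.5):
    """Pair observed peaks with theoretical ion labels in one linear pass.

    theoretical: list of (label, m/z) tuples sorted by m/z
    observed:    list of m/z values sorted ascending
    tolerance:   maximum |observed - theoretical| in m/z units (assumed value)

    Returns a list of (observed_mz, label) for every pair within tolerance.
    Each peak is matched at most once (greedy, first match wins).
    """
    annotations = []
    i = j = 0
    while i < len(theoretical) and j < len(observed):
        label, t_mz = theoretical[i]
        o_mz = observed[j]
        if abs(o_mz - t_mz) <= tolerance:
            # Match: copy the theoretical label onto the observed peak.
            annotations.append((o_mz, label))
            i += 1
            j += 1
        elif o_mz < t_mz:
            j += 1   # observed peak is too low to match anything later
        else:
            i += 1   # theoretical peak has no observed partner
    return annotations


if __name__ == "__main__":
    theo = [("b1", 58.03), ("b2", 129.07), ("y1", 147.11), ("y2", 218.15)]
    obs = [58.0, 100.0, 147.2, 218.1, 300.0]
    print(annotate_peaks(theo, obs, tolerance=0.2))
```

Because each step advances at least one of the two indices, the running time is proportional to the combined length of the two lists, provided both are already sorted by m/z.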