De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides 18.12.2008 Hannu Peltoniemi



De novo vs. database matching. In database matching, the MS2 spectrum of an unknown glycan is compared against a glycan database, and the best-scoring glycan(s) in the database are reported. Only structures that are in the database can be found, which is fine if the database is comprehensive. If the glycan is not in the database, the result may be the closest matching (wrong) structure or no result at all.

In de novo search, candidate structures are generated and matched against the MS2 spectrum of the unknown glycan on the fly, and the best-scoring glycans are reported. No database is needed, so new structures can also be found. The approach is computationally intensive and requires high-quality spectra. Typically there is no definite answer, but rather a set of high-scoring structures.

De novo structure search. This is part of the N-glycopeptide workflow described in Joenväärä et al., N-Glycoproteomics - an automated workflow approach, Glycobiology 2008, 18(4).
Input: protonated, deconvoluted MS2 spectra.
Steps:
1) identification of peptides
2) identification of N-glycan compositions
3) de novo identification of N-glycan structures (branching only, no linkage)

Input data. A spectrum whose fragments are annotated with their glycopeptide and glycan compositions.

Example data. Peptide: QDQCIYNTTYLNVQR. Glycan composition: 6 Hex, 5 HexNAc, 3 NeuAc.

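Same data, different view. [Figure: measured fragment compositions for the glycan composition 6 Hex 5 HexNAc 3 NeuAc, plotted on Hex and HexNAc axes in separate panels for NeuAc = 0, 1, 2 and 3; glycan fragments attached to the peptide and free glycan fragments are shown separately.]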

The puzzle. All measured fragment compositions of an unknown structure with a given total composition are known. Some theoretical fragments may be missing, and some measured fragments may be false. Which structure best explains the data?

Solution. The problem is split into two phases:
1) Generation of possible structures: structures are grown starting from the N-glycan core. The population size is limited by removing the structures with the lowest fit to the peptide+glycan fragments.
2) Scoring: the resulting set of structures is scored against the full data. The final glycopeptide score is the sum of the peptide score and the glycan structure score.

Initialization. The misfit (cost) between a theoretical structure and the measured data is defined as the number of theoretical and measured fragments that do not match each other. [Figure: measured vs. theoretical fragments for the example data, peptide + 5 Hex 4 HexNAc.]
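The misfit can be sketched as the size of the symmetric difference between the two fragment sets. This is a minimal sketch, assuming each fragment is reduced to a hashable composition tuple; the tuple encoding and the name `misfit` are hypothetical, not taken from the original software.

```python
# Hypothetical fragment encoding (an assumption, not from the slides):
# (Hex, HexNAc, NeuAc, attached_to_peptide)
Fragment = tuple[int, int, int, bool]


def misfit(theoretical: set[Fragment], measured: set[Fragment]) -> int:
    """Cost = number of fragments found in only one of the two sets.

    A theoretical fragment that was not measured and a measured fragment
    that the structure cannot explain each add 1 to the cost.
    """
    return len(theoretical ^ measured)


theoretical = {(0, 1, 0, True), (0, 2, 0, True), (1, 2, 0, True), (1, 1, 0, False)}
measured = {(0, 1, 0, True), (0, 2, 0, True), (2, 2, 0, True)}
print(misfit(theoretical, measured))  # 3: two unmatched theoretical, one unexplained measured
```

Because both kinds of mismatch count equally, a structure is penalized the same for a missing fragment as for an unexplained one; a weighted variant would be a design choice beyond what the slide states.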

Growing structures. Starting from the core, structures are grown one unit at a time until they reach the final composition. If the population grows too large, the structures with the highest cost are removed.
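The growth step behaves like a beam search. The sketch below is a generic version under stated assumptions: `extend` (add one monosaccharide in every allowed position), `is_complete` (check against the final composition) and `cost` (the misfit against the measured data) are hypothetical callbacks, and a toy string example stands in for glycan trees. A real implementation would also need a canonical form so that identical branched structures reached in different orders are counted once.

```python
import heapq
from typing import Callable, Hashable, Iterable, TypeVar

S = TypeVar("S", bound=Hashable)


def grow_structures(
    starts: Iterable[S],
    extend: Callable[[S], Iterable[S]],
    is_complete: Callable[[S], bool],
    cost: Callable[[S], float],
    max_population: int,
) -> list[S]:
    """Grow structures one unit at a time, capping the population size.

    Returns the completed structures sorted from lowest to highest cost.
    """
    population = set(starts)
    finished: set[S] = set()
    while population:
        grown: set[S] = set()
        for structure in population:
            if is_complete(structure):
                finished.add(structure)
            else:
                grown.update(extend(structure))
        # If the population is too large, drop the highest-cost structures.
        population = set(heapq.nsmallest(max_population, grown, key=cost))
    return sorted(finished, key=cost)


# Toy usage: "structures" are strings grown to length 3; cost counts
# mismatches against a target, standing in for the fragment misfit.
target = "ABA"
result = grow_structures(
    starts=[""],
    extend=lambda s: [s + "A", s + "B"],
    is_complete=lambda s: len(s) == len(target),
    cost=lambda s: sum(a != b for a, b in zip(s, target)),
    max_population=2,
)
print(result[0])  # ABA
```

Pruning trades completeness for speed: with a small population limit the correct structure can be lost early, which is why the limit is listed as a tunable assumption.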

Scoring. The score is calculated as -log10(P), where P is the (binomial) probability that a random set of fragments would match as well as or better than the ranked structure. The final glycopeptide score is the sum of the peptide and structure scores. [Figure: candidate structures ranked from highest to lowest scoring.]
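A minimal sketch of the score, assuming P is the upper tail of a binomial distribution: `n` theoretical fragments, `k` of them matched, and `p` the chance that a single fragment matches at random. How the original software estimates `p` is not stated on the slide, so it is left as a parameter here; the function names are hypothetical.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): matching k or more by chance."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def structure_score(matched: int, n_theoretical: int, p_random: float) -> float:
    """Score = -log10(P); larger means less likely to be a chance match."""
    return -math.log10(binomial_tail(matched, n_theoretical, p_random))


def glycopeptide_score(peptide_score: float, glycan_score: float) -> float:
    """Final glycopeptide score: sum of peptide and structure scores."""
    return peptide_score + glycan_score


# 8 of 10 theoretical fragments matched, 50 % random match chance:
# P = (45 + 10 + 1) / 1024, so the score is about 1.26.
print(round(structure_score(8, 10, 0.5), 2))
```

For very large fragment counts P can underflow to 0.0, which would make `math.log10` fail; summing in log space avoids that.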

Options: all glycosidic bonds can be broken; the number of cuts is unlimited. Assumptions: monosaccharide names; number of possible connections for each monosaccharide; accepted connections between monosaccharides; start structures (N-glycan cores); maximum population size when growing structures.
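With every glycosidic bond breakable and no limit on the number of cuts, the fragments of a tree structure are its connected sub-trees: those containing the reducing-end residue stay attached to the peptide, and any connected sub-tree can appear as a free glycan. This is a minimal sketch, assuming a parent-index tree encoding and a fixed residue alphabet (both hypothetical), that collects fragment compositions rather than enumerating every sub-tree. Linkage is ignored, consistent with the branching-only search.

```python
from functools import lru_cache
from itertools import product

# Assumed residue alphabet; a composition is a count vector in this order.
RESIDUES = ("Hex", "HexNAc", "NeuAc", "dHex")
Comp = tuple[int, ...]


def _unit(name: str) -> Comp:
    counts = [0] * len(RESIDUES)
    counts[RESIDUES.index(name)] = 1  # ValueError for unknown residue names
    return tuple(counts)


def _add(a: Comp, b: Comp) -> Comp:
    return tuple(x + y for x, y in zip(a, b))


def theoretical_fragments(residues: list[str], parent: list[int]) -> set[tuple[Comp, bool]]:
    """Return (composition, attached_to_peptide) for every possible fragment.

    residues[i] names monosaccharide i; parent[i] is the index of the residue
    it is linked to, or -1 for the reducing-end residue bound to the peptide.
    """
    children: list[list[int]] = [[] for _ in residues]
    root = parent.index(-1)
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)
    zero: Comp = (0,) * len(RESIDUES)

    @lru_cache(maxsize=None)
    def rooted(v: int) -> frozenset[Comp]:
        # Compositions of connected pieces whose top residue is v: each child
        # branch is either cut off (zero) or contributes one of its own pieces.
        pieces = set()
        for choice in product(*({zero} | rooted(c) for c in children[v])):
            comp = _unit(residues[v])
            for part in choice:
                comp = _add(comp, part)
            pieces.add(comp)
        return frozenset(pieces)

    attached = {zero} | rooted(root)  # zero = bare peptide (whole glycan lost)
    free = set().union(*(rooted(v) for v in range(len(residues))))
    return {(c, True) for c in attached} | {(c, False) for c in free}


# Core pentasaccharide: HexNAc-HexNAc-Hex with two Hex branches.
core = theoretical_fragments(["HexNAc", "HexNAc", "Hex", "Hex", "Hex"], [-1, 0, 1, 2, 2])
print(len(core))  # 17: 6 peptide-attached + 11 free compositions
```

Working on composition sets keeps the cost bounded by the number of distinct compositions, which for N-glycans is far smaller than the number of distinct sub-trees.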

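Testing with in silico generated data. A structure is fragmented to obtain its theoretical spectrum; fragments are then randomly removed and noise fragments are added, and the randomized spectrum is used as input to the de novo algorithm. [Figure: theoretical and randomized spectra on Hex and HexNAc axes for NeuAc = 0, 1, 2 and 3, showing peptide+glycan and free glycan fragments.]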
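The randomization step can be sketched as below. This is an illustration under assumptions: the slides do not say how many fragments were removed or how noise fragments were chosen, so `keep_fraction`, `n_noise` and the pool of `candidates` (for example all sub-compositions of the total composition) are hypothetical parameters.

```python
import random


def randomize_spectrum(
    theoretical: set, candidates: set, keep_fraction: float, n_noise: int, rng: random.Random
) -> set:
    """Simulate a measured spectrum from a structure's theoretical fragments.

    Each theoretical fragment survives with probability keep_fraction, and
    n_noise false fragments are drawn from candidates the structure lacks.
    """
    kept = {f for f in sorted(theoretical) if rng.random() < keep_fraction}
    pool = sorted(set(candidates) - set(theoretical))
    noise = rng.sample(pool, min(n_noise, len(pool)))
    return kept | set(noise)


rng = random.Random(42)
theoretical = {(h, 2) for h in range(4)}                   # (Hex, HexNAc) pairs
candidates = {(h, n) for h in range(7) for n in range(6)}  # e.g. up to 6 Hex 5 HexNAc
spectrum = randomize_spectrum(theoretical, candidates, 0.5, 3, rng)
print(len(spectrum & theoretical), "kept,", len(spectrum - theoretical), "noise")
```

Sorting before sampling makes a run reproducible for a given seed, which helps when repeating each test setting many times.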

Results of the in silico tests. If about half of the theoretical fragments are present, the correct structure is among the few highest-scoring ones. Each mark is the result of 100 runs.

Testing with a serum sample. A very complex wet-lab data set: a human serum specimen, with the high-abundance proteins removed prior to LC-MS/MS. 80 spectra had identified peptide and glycan compositions, and 62 spectra had putative structures. These were mostly typical structures and mostly small ones; large structures seem to be hard to catch.

Example serum spectrum

Structures found from the serum sample. [Figure: glycan structures, each labeled with the proteins and glycosylation sites (in parentheses) where it was found: ANT3(224,187), FIBG(78), THRB(121), A1AG1(56), FETUA(156), HPT(241), HRG(344), FIBB(394), TRFE(630), IGHA1(144), A1AT(70,107,271), {VINEX(102), HPTR(126)}; FIBG(78), HRG(344), IGHA1(144); VTNC(169); IGHG1(180), IGHG2(176); IGHA1(144); A1AG1(93); IGHG2(176); IGHA1(144); CO2(621), CO3(85); IGHG2(176); IGHA1(144); CO3(85).]

Conclusions. De novo glycan structure identification of intact glycopeptides is possible. High-quality spectra are necessary. Typically there is no definite answer, but a few structures that match equally well, so biological insight is still needed if a single identified structure has to be picked.