Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications Shenheng Guan.


Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications Shenheng Guan and Alma L. Burlingame

Problem
Input: an MS/MS spectrum of a mixture of peptides
- Heavily modified protein
- Same amino acid sequence
- Same PTM
- Same total number of PTMs
- Different PTM configurations
Example: two peptides with two methylations each
- LATK[+32]AARKSAE
- LATK[+16]AARK[+16]SAE
Problem:
- Identify the PTM configurations
- Estimate their relative abundance

Workflow

Peptide identification
Input
- A deisotoped MS/MS spectrum of a mixture of peptides
- An identified peptide, the type of PTM, and the number of PTMs
Example
- Peptide: LATKAARKSAPATGGVKKPHRYRPGTVALRE
- PTM: methylation
- #PTM: 4
Problem
- Identify the PTM configurations
- Estimate their relative abundance

All possible configurations
Assumptions:
- All methylations are on lysine residues
- Each lysine residue carries at most 3 methyl groups
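The enumeration under these two assumptions can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `enumerate_configurations` and the dictionary representation (lysine position to methyl count) are assumptions made for the example.

```python
from itertools import product

def enumerate_configurations(peptide, total_methyls, max_per_lysine=3):
    """List every way to place total_methyls methyl groups on the lysines.

    Each configuration maps a 1-based lysine position to its methyl count.
    """
    lysine_positions = [i + 1 for i, aa in enumerate(peptide) if aa == "K"]
    configs = []
    # Try every per-lysine count from 0 to max_per_lysine and keep the
    # combinations whose counts add up to the required total.
    for counts in product(range(max_per_lysine + 1), repeat=len(lysine_positions)):
        if sum(counts) == total_methyls:
            configs.append(dict(zip(lysine_positions, counts)))
    return configs

peptide = "LATKAARKSAPATGGVKKPHRYRPGTVALRE"
configs = enumerate_configurations(peptide, total_methyls=4)
print(len(configs), "configurations, e.g.", configs[0])
```

For the example peptide, which has four lysines, this yields 31 configurations with 4 methyl groups in total.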

Configuration identification
Score of a spectrum-configuration pair
- Spectrum S: ETD peak list
- Configuration C: theoretical peak list (c-ions)
- Sc(S,C) is the number of peaks matched between the observed peak list and the theoretical peak list
Greedy algorithm
- Compute the matching score for each configuration
- Remove the configuration with the highest score from the configuration set, and remove the peaks in S that are matched to it
- Repeat the above steps until all remaining configurations have score 0
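The greedy loop can be sketched as below. This is an illustrative sketch only: the m/z tolerance `tol`, the toy peak values, and the function names are assumptions, and the theoretical c-ion lists are taken as given rather than computed from the sequence.

```python
def match_peaks(observed, theoretical, tol=0.01):
    """Observed peaks lying within tol of some theoretical peak."""
    return {o for o in observed if any(abs(o - t) <= tol for t in theoretical)}

def greedy_identify(observed, configurations, tol=0.01):
    """Repeatedly pick the best-scoring configuration and remove its peaks.

    configurations maps a configuration name to its theoretical peak list.
    Returns a list of (name, score) in the order they were selected.
    """
    remaining_peaks = set(observed)
    remaining_configs = dict(configurations)
    identified = []
    while remaining_configs:
        matches = {name: match_peaks(remaining_peaks, peaks, tol)
                   for name, peaks in remaining_configs.items()}
        best = max(matches, key=lambda name: len(matches[name]))
        if not matches[best]:
            break  # every remaining configuration has score 0
        identified.append((best, len(matches[best])))
        remaining_peaks -= matches[best]
        del remaining_configs[best]
    return identified

# Toy data (hypothetical m/z values, not taken from the slides).
observed = [100.0, 200.0, 300.0, 400.0, 214.0, 314.0]
configurations = {
    "A": [100.0, 200.0, 300.0, 400.0],
    "B": [100.0, 214.0, 314.0, 414.0],
    "C": [100.0, 200.0, 314.0, 414.0],
}
result = greedy_identify(observed, configurations)
print(result)
```

In this toy run, configuration C is never reported: once the peaks explained by A and B are removed, nothing is left for it to match.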

Configuration identification results

Estimation of relative abundance
We have four identified configurations C1, C2, C3, C4, with relative abundances x1, x2, x3, x4.
- The abundances sum to 1
Consider the i-th c-ion with charge z
- Five possible peaks p0, ..., p4
- Suppose p2 is matched to C1 and C2
- Observed peak intensity: I(p2)
- Theoretical peak intensity
Compute the observed and theoretical peak intensity pair for each matched c-ion

Estimation of relative abundance
Find x1, x2, x3, x4 such that the sum of the squared errors over these intensity pairs is minimized.
- Standard non-negative least-squares procedure
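A small sketch of the fitting step follows. It makes two assumptions that the slides do not confirm: the theoretical intensity of a peak is modelled as the sum of the abundances of the configurations matched to it (with observed intensities already normalized), and a simple projected coordinate descent stands in for a standard non-negative least-squares solver such as `scipy.optimize.nnls`. The matrix `A` and vector `b` are made-up toy values.

```python
def nnls_coordinate_descent(A, b, max_sweeps=20000, tol=1e-12):
    """Minimize ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    # Gram matrix H = A^T A and right-hand side A^T b.
    H = [[sum(A[r][i] * A[r][j] for r in range(n_rows)) for j in range(n_cols)]
         for i in range(n_cols)]
    Atb = [sum(A[r][i] * b[r] for r in range(n_rows)) for i in range(n_cols)]
    x = [0.0] * n_cols
    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(n_cols):
            if H[j][j] == 0.0:
                continue  # configuration matched to no peak; stays at 0
            gradient = sum(H[j][k] * x[k] for k in range(n_cols)) - Atb[j]
            updated = max(0.0, x[j] - gradient / H[j][j])  # project onto x_j >= 0
            largest_change = max(largest_change, abs(updated - x[j]))
            x[j] = updated
        if largest_change < tol:
            break
    return x

def relative_abundance(A, b):
    """Fit non-negative abundances, then rescale them to sum to 1."""
    x = nnls_coordinate_descent(A, b)
    total = sum(x)
    return [v / total for v in x] if total > 0 else x

# Row r marks which of C1..C4 are matched to peak r (toy pattern).
A = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [1, 1, 1, 1]]
b = [0.4, 0.7, 0.5, 0.3, 1.0]  # normalized observed intensities (toy values)
abundances = relative_abundance(A, b)
print([round(v, 3) for v in abundances])
```

Normalizing after the fit is one simple way to honour the sum-to-one condition; a constrained solver could enforce it directly instead.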

A Novel Approach for Untargeted Post-translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry Richard C. Baliban, Peter A. DiMaggio, Mariana D. Plazas-Mayorca, Nicolas L. Young, Benjamin A. Garcia and Christodoulos A. Floudas

Bottom-up PTM identification
Two approaches
- Tags
- Non-tags: restricted, unrestricted
- PILOT_PTM

Preprocessing
- Remove all peaks related to the precursor ion
- Keep only locally significant peaks
- Deisotope
- Remove a neutral-offset peak if it does not have a complementary peak
- Each candidate peak has a list of supporting peaks
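The slides do not say how local significance is decided. One common reading, sketched here purely as an assumption, is to keep a peak only if few neighbouring peaks within an m/z window are more intense. The `window` and `top_k` parameters and the function name are hypothetical.

```python
def keep_locally_significant(peaks, window=50.0, top_k=2):
    """Keep peaks that rank among the top_k most intense within +/- window m/z.

    peaks is a list of (mz, intensity) tuples; input order is preserved.
    """
    kept = []
    for mz, intensity in peaks:
        # Count neighbours in the window that are strictly more intense.
        stronger = sum(1 for other_mz, other_int in peaks
                       if abs(other_mz - mz) <= window and other_int > intensity)
        if stronger < top_k:
            kept.append((mz, intensity))
    return kept

# Toy peak list (hypothetical values).
peaks = [(100.0, 10.0), (110.0, 50.0), (120.0, 5.0), (130.0, 40.0), (300.0, 1.0)]
filtered = keep_locally_significant(peaks)
print(filtered)
```

Note that the isolated low-intensity peak at m/z 300 survives because it has no stronger neighbours, which is the point of a local rather than global threshold.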

ILP Model
Input
- A preprocessed, deisotoped spectrum S = {a1, a2, ..., am}
- A peptide (theoretical b-ion peak list) P = {b1, b2, ..., bn}
- A list of all known PTMs
Theoretical peak bk
- CSk is the set of all possible peaks (indices) in S that bk can be matched to with PTMs
Real peak aj
- Posj is the set of all possible peaks (indices) in P that aj can be matched to with PTMs
- Supportj is the set of all peaks (indices) supporting peak j in S
- Multj is the set of all peaks (indices) that peak j supports

ILP Model
Binary variables
- pj,k = 1 if peak aj in S is matched to bk in P, otherwise pj,k = 0
- yj = 1 if peak aj is a supporting peak or a matched peak, otherwise yj = 0

ILP Model
Objective
Subject to
- One peak in P can only match one peak in S
- One peak in S can only match one peak in P
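In the notation of the earlier slides, the two matching constraints can be written roughly as follows. This is a reconstruction from the verbal statements, not the formulas shown on the original slide; in particular the "at most one" reading of "can only match one" is an assumption.

```latex
% Each theoretical peak b_k matches at most one observed peak.
\sum_{j \in CS_k} p_{j,k} \le 1 \qquad \forall k \in \{1, \dots, n\}

% Each observed peak a_j matches at most one theoretical peak.
\sum_{k \in Pos_j} p_{j,k} \le 1 \qquad \forall j \in \{1, \dots, m\}

% All matching variables are binary.
p_{j,k} \in \{0, 1\}
```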

ILP Model
Subject to:
- No three consecutive missing peaks
- The intensity of peak i is counted iff there exists a peak j such that peak i supports j and peak j is a matched peak
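Both conditions can be checked on a candidate matching with a few lines of code. The sketch below is illustrative only: the function names and data layout are hypothetical, and it assumes that a matched peak also counts its own intensity, in line with the definition of yj.

```python
def has_three_consecutive_missing(matched_k, n):
    """True if three theoretical peaks in a row (indices 1..n) are unmatched."""
    run = 0
    for k in range(1, n + 1):
        run = 0 if k in matched_k else run + 1
        if run >= 3:
            return True
    return False

def counted_intensity(intensity, matched_j, support):
    """Total intensity of matched peaks plus the peaks that support them.

    intensity: observed peak index -> intensity
    matched_j: set of observed peak indices that are matched
    support:   observed peak index j -> indices of peaks supporting j
    """
    counted = set(matched_j)  # assumption: a matched peak counts itself
    for j in matched_j:
        counted |= set(support.get(j, ()))
    return sum(intensity[i] for i in counted)

# Toy example (hypothetical indices and intensities).
violates = has_three_consecutive_missing({1, 2, 6}, n=6)
acceptable = has_three_consecutive_missing({1, 3, 6}, n=6)
total = counted_intensity({1: 10.0, 2: 5.0, 3: 7.0, 4: 1.0},
                          matched_j={1, 3},
                          support={1: [2], 3: []})
print(violates, acceptable, total)
```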

ILP Model
Solve using CPLEX
- Report the top 10 variable assignments
Existing problem
- No constraint requires that the distance between two neighboring matched peaks match the mass of a residue (with PTM)

New constraints
For each pj,k
- The set of candidate ion peaks j' with respect to k' such that no valid jump exists between j and j'
- The maximum and minimum masses that can be reached from j, respectively

New constraints
- Neighboring matched peaks do not conflict
- Conflicting matched peaks must have a matched peak between them
- The distance between two matched peaks should be bounded
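The notion of a valid jump between two neighboring matched peaks can be illustrated for the single-residue case. This is a sketch under stated assumptions: the tolerance, the function name, and the tiny residue and PTM tables are chosen for the example (the masses are standard monoisotopic values), and it does not restrict which PTM may sit on which residue.

```python
# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
# Monoisotopic mass shifts (Da) for two common PTMs.
PTM_MASS = {"methyl": 14.01565, "phospho": 79.96633}

def valid_jump(mass_low, mass_high, tol=0.02):
    """List (residue, ptm) pairs whose mass explains mass_high - mass_low.

    ptm is None for an unmodified residue. An empty list means no valid
    single-residue jump exists between the two peaks.
    """
    delta = mass_high - mass_low
    hits = []
    for residue, residue_mass in RESIDUE_MASS.items():
        if abs(delta - residue_mass) <= tol:
            hits.append((residue, None))
        for ptm, ptm_mass in PTM_MASS.items():
            if abs(delta - residue_mass - ptm_mass) <= tol:
                hits.append((residue, ptm))
    return hits

methyl_lysine = valid_jump(500.0, 500.0 + 128.09496 + 14.01565)
ambiguous = valid_jump(100.0, 171.03711)
no_jump = valid_jump(100.0, 150.0)
print(methyl_lysine, ambiguous, no_jump)
```

The second call returns two explanations, because glycine plus a methyl group has the same mass as alanine. Ambiguities of this kind are why mass spacing alone cannot always settle a residue assignment.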

Postprocessing
Re-score the 10 candidate modified peptides
- Cross-correlation score
Recheck modifications if there are unmatched peaks indicating non-modification

Test data sets
- Test set A: 44 CID spectra (ion trap) and 174 ETD spectra (Orbitrap) of chemically synthesized phosphopeptides, manually validated
- Test set B: 58 ECD spectra (FTICR) of the histone H3-(1–50) N-terminal tail, manually validated
- Test set C: 553 CID spectra (Orbitrap) of propionylated histone fragments, manually validated
- Test set D: 525 modified and 6025 unmodified CID spectra (Orbitrap) from a chromatin fraction; identified by SEQUEST, validated by MASCOT, with low-quality spectra removed manually
- Test set E: unmodified CID spectra: 36 (ion trap), 37 (Q-TOF), 4061 (Orbitrap); validated in the same way as test set D

Residue prediction accuracy

Peptide prediction accuracy

Comparison on test sets C and D1 Peptide and residue prediction accuracy

Comparison on test sets C and D1 Subsequence prediction accuracy

Running time

Q & A