Refining Peptide Fragmentation Models for Improved Confidence in Sequence/Spectrum Matching. Karl Clauser, Broad Institute of MIT and Harvard, Cambridge, MA

Presentation transcript:

Refining Peptide Fragmentation Models for Improved Confidence in Sequence/Spectrum Matching
Karl Clauser, Broad Institute of MIT and Harvard, Cambridge, MA
Strategies for Improving Reliability in Protein/Peptide Identification Workshop
National Cancer Institute's (NCI) Clinical Proteomic Technologies for Cancer (CPTAC) Initiative and the National Institute of Standards and Technology (NIST)
Gaithersburg, MD, November 15, 2007
9/17/2018 7:50:32 AM

Continuum of MS/MS Interpretation Strategies
[Diagram: a continuum running from De Novo Sequence through Database Match to Spectral Library, with SCORING as the common element.]
Labels on the diagram, in extraction order:
- Unsequenced Genome; Unexpected PTM; Multiple Mutations; Validate DB match; Spectral Quality
- Sequenced Genome; Expected PTM; Expected Chem Mods; Single Mutations
- Repeat Sample Analysis; Related Spectra; Unexpected Partial Chem Mods; Overlapping Peptides
- Fragmentation Modeling; Peak Detection; Collection; Curation
- Each Digestion Enzyme; Each Instrument Model; Each Collision Energy

Are Spectral Similarity Scoring Functions Appropriate for Scoring Database Search Sequence/Spectrum Matches?

Scores (A = experimental intensities, B = theoretically possible intensities):
- Dot product (Sokolow, 1978): $\dfrac{\left(\sum_i A_i B_i\right)^2}{\sum_i A_i^2 \cdot \sum_i B_i^2}$
- Spectral Contrast Angle (Wan, 2001): $\dfrac{\sum_i A_i B_i}{\left(\sum_i A_i^2 \cdot \sum_i B_i^2\right)^{1/2}}$
- Similarity (Zhang, 2004): $\dfrac{\sum_i \left(A_i B_i\right)^{1/2}}{\left(\sum_i A_i \cdot \sum_i B_i\right)^{1/2}}$

General Features of the Scores
- Normalize intensities & number of peaks
- Score contribution for each ion scales with its intensity
- Max value is 1.0
- Numerator: bonus for matched ions
- Denominator: penalty for unmatched ions (A: experimental; B: theoretically possible)

Difference of the 3 Scores
- Extent of emphasis on low abundance ions

The naïve fragment ion models in current database search programs tend to substantially overpredict the number of fragment ions, i.e. they allow for all possible ion types with similar intensities.
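The three scores on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names are hypothetical, and `a` and `b` are assumed to be intensity lists already aligned by m/z bin (0.0 where a peak is absent). The example spectra are made up to mimic a naive model that predicts all ions at similar intensity.

```python
import math

def dot_product_score(a, b):
    """(sum A_i*B_i)^2 / (sum A_i^2 * sum B_i^2), as on the slide."""
    den = sum(x * x for x in a) * sum(y * y for y in b)
    if den == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) ** 2 / den

def contrast_angle_cosine(a, b):
    """sum A_i*B_i / sqrt(sum A_i^2 * sum B_i^2)."""
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if den == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / den

def zhang_similarity(a, b):
    """sum sqrt(A_i*B_i) / sqrt(sum A_i * sum B_i)."""
    den = math.sqrt(sum(a) * sum(b))
    if den == 0:
        return 0.0
    return sum(math.sqrt(x * y) for x, y in zip(a, b)) / den

# Made-up example: one dominant experimental peak versus a flat prediction.
experimental = [100.0, 1.0, 1.0, 1.0]
naive_model = [1.0, 1.0, 1.0, 1.0]
for fn in (dot_product_score, contrast_angle_cosine, zhang_similarity):
    print(fn.__name__, round(fn(experimental, naive_model), 3))
```

The square roots in the third score compress the intensity range, which is one way to see the "emphasis on low abundance ions" difference the slide mentions.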

Peptide: (R)L/Q|A|E|A|Q|Q|L|R(K)
[Figure panels: Zhang Kinetic Model (ZSS: 0.826), SM Core Model, Experimental]

Peptide: (R)L/Q|A|E|A|Q|Q|L|R(K)
[Figure panels: Zhang Kinetic Model (ZSS: 0.826), SM Dominant Ion Model v9, Experimental]
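The annotated peptide strings used on these slides can be split into their parts mechanically. The sketch below is a hypothetical helper, not part of any tool named in the talk. It assumes the layout seen in the transcript: a flanking residue in parentheses on each side, and exactly one marker character (`/`, `|`, `\` or a space) between adjacent residues. The transcript does not define the markers, so the parser records them without interpreting them.

```python
import re

MARKS = set("/|\\ ")  # marker characters seen between residues in the transcript

def parse_annotated_peptide(text):
    """Return (previous residue, sequence, next residue, list of markers)."""
    m = re.fullmatch(r"\((.)\)(.+)\((.)\)", text.strip())
    if m is None:
        raise ValueError("expected the form (X)SEQUENCE(Y)")
    prev_res, body, next_res = m.groups()
    residues = body[0::2]      # every other character is a residue
    marks = list(body[1::2])   # the characters in between are markers
    ok = (
        len(body) % 2 == 1
        and residues.isalpha()
        and residues.isupper()
        and all(c in MARKS for c in marks)
    )
    if not ok:
        raise ValueError("residues and markers do not alternate cleanly")
    return prev_res, residues, next_res, marks

prev_res, seq, next_res, marks = parse_annotated_peptide("(R)L/Q|A|E|A|Q|Q|L|R(K)")
print(prev_res, seq, next_res, marks)
```

A sequence of n residues always yields n - 1 markers, one per backbone cleavage site.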

SpectrumMill Core Scoring of MS/MS Interpretations

Score = Assignment Bonus (ion type weighted) - Non-assignment Penalty (intensity weighted)

- Peak selection: de-isotoping, S/N thresholding, parent - neutral removal, charge assignment
- Match to database candidate sequences
- SPI (%): Scored Peak Intensity

Example shown: (R)E F E|I|I|W|V T K(H), 9.24, 78.1%, DEBS: -3, DFwdRev: 1.75
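The shape of the score on this slide (a bonus that depends on ion type minus a penalty that depends on unassigned intensity) can be illustrated as follows. This is only a sketch: the ion-type weights, the `penalty_scale` parameter, and the exact penalty form are hypothetical placeholders, not Spectrum Mill's actual values. SPI is assumed here to be the percentage of selected-peak intensity accounted for by assigned peaks.

```python
def core_score(intensities, assignments, ion_weights, penalty_scale=1.0):
    """Return (score, spi_percent) for one candidate interpretation.

    intensities: peak intensities after peak selection
    assignments: dict mapping peak index -> ion type label, e.g. {0: "y"}
    ion_weights: hypothetical bonus per ion type
    """
    total = sum(intensities)
    if total <= 0:
        return 0.0, 0.0
    assigned = sum(intensities[i] for i in assignments)
    # Assignment bonus: weighted by ion type only.
    bonus = sum(ion_weights[ion] for ion in assignments.values())
    # Non-assignment penalty: weighted by the unassigned intensity fraction.
    penalty = penalty_scale * (total - assigned) / total
    spi = 100.0 * assigned / total
    return bonus - penalty, spi

weights = {"b": 1.0, "y": 1.0, "a": 0.5}  # illustrative values only
score, spi = core_score([50.0, 30.0, 20.0], {0: "y", 1: "b"}, weights, penalty_scale=2.0)
print(score, spi)
```

Keeping SPI separate from the score mirrors the slide, where SPI is reported alongside the score rather than folded into it.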

Refining Peptide Fragmentation Models to Account for Dominant Peaks

Goal
- Increase gap between score of correct and incorrect sequences for each spectrum
- Increase score of correct sequence
- Decrease score of incorrect sequence

Principles to Capture
- Bonus: dominant peaks should score a bonus for ion types and AA adjacencies that should be dominant, e.g. at N-term of Pro.
- Penalty: dominant peaks should be penalized if assigned to sites that should not be dominant. Correct peak assignments should rarely be penalized (<1%).
- Charge location: at a dominant site, formation of a b or y type ion depends primarily on the location of basic residues in the peptide, not the fragmentation site. The scoring scheme should still work if an enzyme besides trypsin is used. Do not try to predict which of b, y, b++, y++ will dominate.
- Frequency: dominant fragmentation does not always occur at particular AA adjacencies. Instead there are tendencies, with additional factors not yet understood.
- How dominant: it is very difficult to predict how dominant a peak will be when it occurs at a cleavage that is expected to be dominant. Do not try to predict the extent of dominance. Scale a peak's score with the extent of dominance that occurs.

Additional Intensity Scaled Score on Dominant Peaks
- Bonus: b or y at expected dominant cleavage
- Penalty: not expected dominant cleavage

Dominant Ions Expected by Peptide Sequence (M = mobile, PM = partially mobile, NM = non-mobile)

Cleavage                    M    PM   NM
n R                         2    2    2
c R                         0    0    2
n P                         2    2    2
c DnE                       1    2    2
nc HK                       2    2    0
c ILV                       1    1    0
AG, FG, QG, YG              1    0    0
n-1: HALIVFYW, !n: PG       1*   1*   0
b2/yn-2                     0    0    0
bn-2 contains RHK           0    0    0
nterm E b-h2o               1    1    1
nterm Q b-nh3               1    1    1
y++-h2o @ yn-2 DnEST        1    1    0
y++-nh3 @ yn-2 QH           1    1    0

* not counted as expected, but bonus if occurs

Proton Mobility
- Mobile: zpre > #Arg + #Lys + #His
- Partially mobile: zpre < #Arg + #Lys + #His and > #Arg
- Non-mobile: zpre < #Arg

[Plot: Bonus Score (amount to add to base score) against Intensity (Dominant Peak / Tallest non-dominant peak), with curves for Class 0, Class 1 and Class 2. Tick values as extracted: 1.5, 0.75, 0.25, 0.125, 0.0; 2.0, 3.0]
[Plot: Penalty Score (amount to subtract from base score) against Intensity (Dominant Peak / Tallest non-dominant peak). Tick values as extracted: 1.5, 0.25, 0.0; 2.0, 3.0]

Tallest non-dominant peak = min( seq: tallest not expected, spectrum: top 5 ratio gap )
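The intensity ratio that drives the bonus and penalty can be sketched as below. This is a simplified, hypothetical helper: it implements only the "seq: tallest not expected" branch of the slide's definition, because the transcript does not spell out how the "spectrum: top 5 ratio gap" alternative is computed. The input format (pairs of intensity and an expected-dominant flag) is also an assumption.

```python
def dominance_ratio(peaks):
    """Ratio of the tallest expected-dominant peak to the tallest other peak.

    peaks: list of (intensity, expected_dominant) pairs for assigned peaks.
    Returns None when the ratio is undefined.
    """
    dominant = [inten for inten, expected in peaks if expected]
    others = [inten for inten, expected in peaks if not expected]
    if not dominant or not others or max(others) <= 0:
        return None
    return max(dominant) / max(others)

# Made-up intensities: one peak at an expected dominant cleavage, two elsewhere.
example = [(900.0, True), (300.0, False), (100.0, False)]
print(dominance_ratio(example))
```

A ratio like this lets the score scale with the dominance actually observed, in line with the principle of not predicting the extent of dominance in advance.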

Frequency and Distribution of Dominant Ions (v9)
Precursor z=2, 6699 spectra, <0.5% FPR, from a trypsin GeLC/MS/MS experiment on an LTQ-FT
[Figure values as extracted: 67%, 72%, 76%; 5758, 2974, 177]

Proton Mobility
- Mobile: zpre > #Arg + #Lys + #His
- Partially mobile: zpre < #Arg + #Lys + #His and > #Arg
- Non-mobile: zpre < #Arg
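The proton mobility classes can be computed directly from a peptide sequence and precursor charge. The sketch below uses a hypothetical function name and makes one assumption that the transcript leaves open: when the charge equals a residue count (the slide text shows only strict inequalities), the peptide falls into the less mobile class.

```python
def proton_mobility(sequence, precursor_charge):
    """Classify a peptide as 'mobile', 'partially mobile' or 'non-mobile'."""
    n_arg = sequence.count("R")
    n_basic = n_arg + sequence.count("K") + sequence.count("H")
    if precursor_charge > n_basic:
        return "mobile"
    if precursor_charge > n_arg:
        # Assumption: ties with #Arg + #Lys + #His land here.
        return "partially mobile"
    # Assumption: ties with #Arg land here.
    return "non-mobile"

for seq in ("VNIVPVIAK", "QDAQSLHGDIPQK", "RQVEEEILALK"):
    print(seq, proton_mobility(seq, 2))
```

With a precursor charge of 2, this tie handling agrees with how the example slides later in the deck label these peptides (the first as Mobile, the other two as PM).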

Frequency of AA Adjacency-Dependent Dominant Ions – v9, z=2
[Figure: three panels, Mobile (4525 spectra), Partially Mobile (2061 spectra) and Non-mobile (114 spectra). Values shown are # dominant ions / # total cleavages, binned as >0.8, 0.4 - 0.8, 0.1 - 0.4, and - (<3 obsv).]

Frequency of Position-Dependent Dominant Ions – v9, z=2

Proton Mobility
- Mobile: zpre > #Arg + #Lys + #His
- Partially mobile: zpre < #Arg + #Lys + #His and > #Arg
- Non-mobile: zpre < #Arg

DIS performance: Partially Mobile, v9
Dataset: P050531_PBD_WT
- 2061 spectra unique peptides
- 1477 spectra w/ dominant peaks
- 2060 spectra yield same match, DIS v Orig
- z=2 only, m/z tolerance = 0.05
[Plot: axis labels Score and Rank1 – Rank2, each marked from DIS worse to DIS better. Regions labelled: DIS better R1-R2 only, DIS better both, DIS worse both, DIS better score only.]
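The four regions on this plot suggest a per-spectrum comparison of two scoring schemes along two axes: the top score and the gap between the rank 1 and rank 2 candidates. The sketch below is a hypothetical reading of that comparison, not the analysis code behind the slide. It assumes each spectrum has at least two scored candidates under each scheme and that "score" means the top-ranked candidate's score.

```python
def rank_gap(scores):
    """Score of the rank 1 candidate minus score of the rank 2 candidate."""
    top = sorted(scores, reverse=True)
    return top[0] - top[1]

def compare_scoring(orig_scores, dis_scores):
    """Label one spectrum by how DIS changes the top score and the R1-R2 gap."""
    d_score = max(dis_scores) - max(orig_scores)
    d_gap = rank_gap(dis_scores) - rank_gap(orig_scores)
    if d_score > 0 and d_gap > 0:
        return "DIS better both"
    if d_score < 0 and d_gap < 0:
        return "DIS worse both"
    if d_gap > 0:
        return "DIS better R1-R2 only"
    if d_score > 0:
        return "DIS better score only"
    return "no improvement"  # hypothetical label for remaining cases

# Made-up candidate scores for a single spectrum.
print(compare_scoring([9.0, 7.0], [10.5, 6.0]))
```

Tracking the gap as well as the score reflects the stated goal of widening the separation between correct and incorrect sequences, not just raising the correct sequence's score.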

Spectral Similarity Comparison of Fragmentation Models
[Figure panels: Zhang Kinetic Model, SM Core Model v9]

[Figure panels: Zhang Kinetic Model, SM Dominant Ion Model v9, SM Core Model v9]

Dominant Ions – Mobile: b2/yn-2
Peptide: (K)V/N|I|V|P V I\A K(A)
[Figure panels: Zhang Kinetic Model (ZSS: 0.697), Experimental]

Dominant Ions – Mobile: b2/yn-2
Peptide: (K)A N|S/N/L/V L|Q|A|D\R(S)
[Figure panels: Zhang Kinetic Model (ZSS: 0.534), Experimental]

Dominant Ions – PM: nterm Q b-nh3
Peptide: (K)Q D/A Q/S/L/H|G D|I|P Q\K(Q)
[Figure panels: Zhang Kinetic Model (ZSS: 0.808), Experimental]

Dominant Ions – PM: nterm E b-h2o
Peptide: (K)E/G E/D/S/S/V/I/H|Y\D|D\K(A)
[Figure panels: Zhang Kinetic Model (ZSS: 0.753), Experimental]

Dominant Ions – PM: y++-h2o @ yn-2 DnEST
Peptide: (K)V/A|E|I/E|H|A\E\K(E)
[Figure panels: Zhang Kinetic Model (ZSS: 0.724), Experimental]

Dominant Ions – PM: y++-nh3 @ yn-2 QH
Peptide: (R)I/L|Q|E/H|E|Q\I\K(K)
[Figure panels: Zhang Kinetic Model (ZSS: 0.779), Experimental]

Dominant Ions – PM: bn-2 contains RHK
Peptide: (R)R/Q V E|E\E|I\L|A|L\K(A)
[Figure panels: Zhang Kinetic Model (ZSS: 0.799), Experimental]

Non-dominant Ions – Mobile
Peptide: (R)T/V T V W E L I/S|S/E/Y|F|T|A|E\Q R(Q)
[Figure panels: Zhang Kinetic Model (ZSS: 0.427), Experimental]

Next Steps
- Refine definitions of dominant adjacencies & positions
- Add/combine factors to increase frequency of occurrence/expectation
- Bonus Class Cleavage Matrix
- Refine intensity ratio and dominant ion categories across charge states
- Verify suitability across multiple enzymes
- Verify suitability across multiple collision energies: 28, 30, 35
- Test scoring system on phospho-peptides and other labeling chemistries that produce strong marker ions and/or bear charge
- Examine diminishing the relative score of minor ion types (a, b/y-h2o, b/y-nh3)
- Minimize role of SPI (secondary score) in validation
- Evaluate/integrate Zhang fragment model

Acknowledgements
- Broad Institute: Steve Carr, Jake Jaffe
- MIT: Michael Yaffe, Majbrit Hjerrild, Drew Lowery
- Agilent Technologies: Joe Roark, David Horn
- Amgen: Zhongqi Zhang