Modified Peptide MS/MS Interpretation

Modified Peptide MS/MS Interpretation Karl R. Clauser Broad Institute of MIT and Harvard Bioinformatics for Protein Identification ASMS Fall Workshop Baltimore, MD November 5-6, 2009 11/16/2018 6:37:57 AM

Outline
Fixed, variable, and mix modifications and search space
Multiple rounds of searching
Diagnostic marker ions for modifications
Data acquisition methods specific for modifications
Ambiguity in localizing phosphorylation sites
Sample handling chemistry artifacts
Resources for masses/descriptions of known modifications

Fixed, Mix and Variable Modifications
Variable: Allow 2 possibilities for an AA. Allow both in 1 spectrum if more than one location/AA.
Fixed: Redefine the wild type as …
Mix: Search in 2 cycles. Cycle 1: all KR light. Cycle 2: all KR heavy. DO NOT allow both light and heavy in 1 spectrum.
Nearly all software packages allow configuring fixed and variable modifications. Some will allow a choice of mix; if not, one typically configures light as fixed and heavy as variable, or vice versa.

Variable Modifications Expand the Search Space
Fixed mods only: calculate MH+ (fixed mods only); precursor mass tolerance filter; candidates passing precursor mass filter.
Allow variable mods: precursor MH+ shift range filter; AA composition filter; calculate MH+ for variable mod combinations; precursor mass tolerance filter.
[Figure: table of precursor MH+ shifts (-256, -176, -160, -97, -81, -80, -32, -16, -2, -1, 0, 17) against the variable mod combinations that could explain each shift, written as counts of modified residues (3ST, 2ST, 1ST, 2M, 1M, 2N, 1N, *, ^Q), with an AA composition check for each candidate.]
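
The growth in search space can be sketched by enumerating how many of each variable modification a candidate peptide could carry, limited by its amino acid composition. This is only an illustration: the function name, the `max_per_mod` cap, and the choice of three modifications are assumptions, the mass values are standard monoisotopic deltas rather than numbers from the slide, and no particular search engine is claimed to work this way.

```python
from itertools import product

# Monoisotopic mass shifts in Da (standard values, not taken from the slide).
VARIABLE_MODS = {
    "phospho": (set("ST"), 79.96633),
    "oxidation": (set("M"), 15.99491),
    "deamidation": (set("N"), 0.98402),
}

def mod_combinations(peptide, max_per_mod=3):
    """Yield (counts, total_shift) for every allowed combination of mod counts.

    Only counts are enumerated here, not the positions of the modified residues.
    """
    names = list(VARIABLE_MODS)
    limits = []
    for name in names:
        residues, _ = VARIABLE_MODS[name]
        n_sites = sum(aa in residues for aa in peptide)  # AA composition filter
        limits.append(range(min(n_sites, max_per_mod) + 1))
    for counts in product(*limits):
        shift = sum(c * VARIABLE_MODS[n][1] for c, n in zip(counts, names))
        yield dict(zip(names, counts)), shift

combos = list(mod_combinations("SAMPLER"))
for counts, shift in combos:
    print(counts, round(shift, 4))
```

A peptide with one S and one M yields four count combinations; each additional modifiable residue multiplies the number of precursor masses that must be considered, before site permutations are even counted.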

Methods of Constraining Allowed # of Modifications/Peptide
Parent mass shift range: Spectrum Mill
Max number of mods/peptide: Sequest (all mods have same max); X!Tandem (?); Phenyx (each mod can have different max)
Max permutations of mods/peptide: Mascot (cap on permutations/peptide)
Candidate sequence contains sequence tags present in spectrum: ProteinPilot/Paragon

Multiple Rounds of Searching
Round 1: search all proteins
  Get high confidence peptide hits
  0-1 missed cleavages
  Minimal number of variable AA modifications
Round 2: limit the search to proteins identified in round 1
  Semi-/un-specific cleavage
  Increase the number of modifications
  Allow for AA substitutions
  Allow for undefined modifications
Alternate names for similar concept
  X!Tandem: refinement
  Mascot: error tolerant
  Spectrum Mill: search saved hits, homology mode, unassigned single mass gap
  Phenyx: 2-rounds
  ProteinPilot/Paragon: thorough ID, fraglet-taglet

X!Tandem - Refinement Search

Mascot - Error Tolerant Search
Submitting the search is just like submitting a standard search, except that you check the Error Tolerant checkbox.
Creasy DM, and Cottrell JS. (2002) Proteomics 2, 1426-1434.

Mascot - Error Tolerant Search Result
Take a look at the match to query 218. The mass tolerance for this search was fairly wide, so the observed mass difference could correspond to either carbamidomethylation or carboxymethylation at the N-terminus. Since this sample was alkylated with iodoacetamide, we would choose carbamidomethylation as the more likely suspect, especially as this brings the error on the precursor mass into line with the general trend, whereas carboxymethylation would give an error of +0.6 Da. The assignment to carbamidomethylation is also very believable, because this is a known artefact of over-alkylation. The same modification is found for query 260.
In other cases, the match may be good, but the assignment is not believable. Query 145 is listed with a substitution at F8 causing a loss of 48 Da. This seems unlikely because we have 2 other matches to the same peptide without any substitution. What else could it be? Well, notice that the other two matches are both oxidised at M7. If we suppose this peptide is also oxidised, then the mass shift becomes -64, which is a well-known loss for oxidised methionine (loss of methanesulfenic acid). This would seem a much more likely explanation for this match.
It is important to understand that the error tolerant search finds new matches by introducing mass shifts at different positions in the database sequences. The match may be very strong, but figuring out a credible assignment can require a bit of detective work.
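
The arithmetic behind the query 145 reasoning can be checked with a few lines. This is a worked example only; the element masses are standard monoisotopic values and the helper name `formula_mass` is an assumption introduced for illustration.

```python
# Monoisotopic element masses in Da (standard values).
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "S": 31.97207117}

def formula_mass(formula):
    """Sum element masses for a formula given as {element: count}."""
    return sum(MASS[el] * n for el, n in formula.items())

oxidation = formula_mass({"O": 1})                                       # Met -> Met sulfoxide
methanesulfenic_acid = formula_mass({"C": 1, "H": 4, "O": 1, "S": 1})    # CH4OS

# Oxidation adds about +16 Da; losing CH4OS from the oxidised side chain
# removes about 64 Da, so the net shift relative to the unmodified peptide is about -48 Da.
net_shift = oxidation - methanesulfenic_acid
print(f"{net_shift:.4f}")  # prints -48.0034
```

The net shift of about -48 Da is the value the error tolerant search reported as a substitution at F8, which is why oxidation plus the methanesulfenic acid loss at M7 is the more credible reading.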

Spectrum Mill Unassigned Single Mass Gap Search

Spectrum Mill - Unassigned mass gap
Wide open precursor mass filter coupled with complementary ion principle
[Figure: peptide S A M P L E R with b-ion positions numbered 1-6 (index i) and y-ion positions numbered 6-1 (index j); spectrum of relative abundance vs. mass (m/z) marking bi, bi*, yj, yj*, the experimental and database precursor masses Pe and Pdb, and the gap Pgap between them.]
0 sequence mismatches: bi, bi*, yj, yj*, Pe, Pdb match
1 sequence mismatch at A: bi*, yj match
2 sequence mismatches at A and P: yj matches
Every ion in a spectrum can be thought of as a measurement of the mass of two complementary entities, both the ion and the neutral (yj & yj*), resulting from fragmentation of a precursor. When comparing the experimental MS/MS spectrum of a peptide whose precursor mass is shifted from that of a candidate sequence in a database because of a single modification, either the masses of the fragment ions or the neutrals will match (bi* & yj), while the masses of their complements (bi & yj*) will be shifted by the gap (Pdb-Pe) between the two precursor masses. The entities containing the modification will have the shifted masses.
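
The consequence of a single modification for the fragment series can be sketched as follows. This is not Spectrum Mill's implementation; the function names, the +15.99491 Da example shift, and the matching tolerance are assumptions, and residue masses are standard monoisotopic values.

```python
PROTON, WATER = 1.007276, 18.010565
RESIDUE = {"S": 87.03203, "A": 71.03711, "M": 131.04049, "P": 97.05276,
           "L": 113.08406, "E": 129.04259, "R": 156.10111}

def fragments(seq, mod_index=None, delta=0.0):
    """Return singly charged b and y ion m/z lists; delta is added at mod_index."""
    masses = [RESIDUE[aa] for aa in seq]
    if mod_index is not None:
        masses[mod_index] += delta
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - j:]) + WATER + PROTON for j in range(1, n)]
    return b, y

def unshifted(seq, mod_index, delta, tol=0.01):
    """List the b and y ion numbers whose m/z still match the database sequence."""
    db_b, db_y = fragments(seq)
    ob_b, ob_y = fragments(seq, mod_index, delta)
    b_ok = [i + 1 for i, (d, o) in enumerate(zip(db_b, ob_b)) if abs(d - o) <= tol]
    y_ok = [j + 1 for j, (d, o) in enumerate(zip(db_y, ob_y)) if abs(d - o) <= tol]
    return b_ok, y_ok

# Hypothetical +15.99491 Da modification on the A of SAMPLER (0-based index 1).
b_ok, y_ok = unshifted("SAMPLER", mod_index=1, delta=15.99491)
print("unshifted b ions:", b_ok)
print("unshifted y ions:", y_ok)
```

Every fragment that does not contain the modified residue keeps its database mass, and every fragment that contains it is offset by exactly the precursor gap, which is what lets a wide precursor window still produce a confident match.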

Spectrum Mill - Unassigned Single Mass Gap Result
Removal of Met -131.0405, Acetylation +42.0106, net -89.0299
The b*-ions (b-ions plus the precursor mass shift) contain the modification and represent the complements of the detected y-ions. The absence of unmodified b-ions means that the modification is on the N-terminus.
Mass Gap | # IDs | Presumed Modification
-89 Da | 153 | Met loss + Acetylation
-17 | 49 | pyro-Glu, pyro-CamC
+16 | 12 | Oxidation
+32 | 28 | Dioxidation
+42 | 2 | Acetylation
+57 | 62 | Overalkylation
+80 | 7 | Phosphorylation
Number of identifications below 5% FDR for particular mass gaps from an Agilent 6520 Q-Tof LC-MS/MS dataset collected on a HeLa cell lysate digested with trypsin and separated on the basis of peptide isoelectric point into 24 fractions by off-gel electrophoresis.
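
Turning an observed mass gap into a presumed modification amounts to a lookup against known deltas, including pairs of deltas. The sketch below is illustrative only: `explain_gap`, the tolerance, and the assumption that "overalkylation" means an extra carbamidomethyl group (+57.02146 Da) are not from the slide, and the masses are standard monoisotopic values.

```python
from itertools import combinations

# Monoisotopic deltas in Da (standard values; labels follow the slide's table).
DELTAS = {
    "Met loss": -131.04049,
    "Acetylation": 42.01057,
    "pyro-Glu": -17.02655,
    "Oxidation": 15.99491,
    "Dioxidation": 31.98983,
    "Overalkylation": 57.02146,   # assumed to be an extra carbamidomethyl group
    "Phosphorylation": 79.96633,
}

def explain_gap(gap, tol=0.01):
    """Return single deltas, then pairs of deltas, that sum to the gap within tol."""
    hits = [(name,) for name, d in DELTAS.items() if abs(d - gap) <= tol]
    hits += [pair for pair in combinations(DELTAS, 2)
             if abs(DELTAS[pair[0]] + DELTAS[pair[1]] - gap) <= tol]
    return hits

print(explain_gap(-89.0299))
print(explain_gap(15.995))
```

With accurate precursor masses the -89.0299 Da gap is explained only by Met loss plus acetylation among these candidates; with a wider tolerance more combinations would start to collide.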

Phenyx: 2 Rounds

Phenyx: Effect of the parameters for one protein
1rnd, only 3 fixed mods: 131 valid, 75% cov.
2rnd, add variable mods: 205 valid, 84% cov.
2rnd, with all mods and half cleaved: 348 valid, 90% cov.

Phenyx: Use the Annotation in SwissProt, TrEMBL
In the Feature Tables
Sequence processing annotations
  Removal of signal peptides
  Removal of transit peptides
  Extraction of active chains
Post-translational modifications
Sequence variants
  Splicing variants
  Sequence mutations
57292 variants / 20328 human proteins

Phenyx: Search Annotated PTMs in SwissProt
15 unique spectra

Applied Biosystems ProteinPilot™ Software Paragon™ Algorithm
Limited de novo sequencing generates Taglets
A large number of short sequence tags ('Taglets') are called. Each Taglet is rated with the chance it is correct, allowing a large number to be used but more likely Taglets to have more influence.
Taglets: STI, TI, AS, YH, TIG, IT, SA, etc.
Shilov et al. Mol Cell Proteomics, 6:1638-1655 (2007).

The Paragon™ Algorithm: Varying Search Space on a Continuum
Taglets for Sequence Temperature Value (STV)
Sequence tags in order of decreasing certainty: ST, TI, STI, AS, DI, DIN, SE, EQ, NA, SEQ
>DHE3_BOVIN (P00366) Glutamate dehydrogenase 1, mitochondrial precursor (EC 1.4.1.3) (GDH)
MYRYLGEALLLSRAGPAALGSASADSAALLGWARGQPAAAPQPGLVPPARRHYSEAAADREDDPNFFKMVEGFFDRGASIVEDKLVEDLKTRETEEQKRNRVRSILRIIKPCNHVLSLSFPIRRDDGSWEVIEGYRAQHSQHRTPCKGGIRYSTDVSVDEVKALASLMTYKCAVVDVPFGGAKAGVKINPKNYTDNELEKITRRFTMELAKKGFIGPGVDVPAPDMSTGEREMSWIADTYASTIGHYDINAHACVTGKPISQGGIHGRISATGRGVFHGIENFINEASYMSILGMTPGFGDKTFVVQGFGNVGLHSMRYLHRFGAKCITVGESDGSIWNPDGIDPKELEDFKLQHGTILGFPKAKIYEGSILEVDCDILIPAASEKQLTKSNAPRVKAKIIAEGANGPTTPEADKIFLERNIMVIPDLYLNAGGVTVSYFEWLNNLNHVSYGRLTFKYERDSNYHLLMSVQESLERKFGKHGGTIPIVPTAEFQDRISGASEKDIVHSGLAYTMERSARQIMRTAMKYNLGLDLRTAAYVNAIEKVFRVYNEAGVTFT
[Figure: the sequence above with three highlighted regions: a segment with cold STV, a segment with warmer STV, and the segment with the hottest STV in this protein.]
The Paragon algorithm is not just another search engine. The next few slides will give you a flavor of what's really different about this algorithm by taking a quick look at the three major innovations of the Paragon algorithm.
The first major innovation is the use of a new kind of sequence tag algorithm. For the search of a single spectrum, we call many small tags. We don't make black and white decisions about what tags are correct or not; we give them quality ratings. We don't use thresholds on tag evidence; the degree of implication of all segments is rated on a continuum. We capture this as a quantity we call Sequence Temperature Value, which essentially tells you how likely we are to find the right answer in that sequence segment. This lets us search harder in places that are more likely to produce the right answer and not so hard in less likely places. Now all we need is a way to determine what features should be included where on that spectrum.

Controlling Search Space with the Paragon™ Algorithm
Using feature probabilities avoids include/exclude decisions and simplistic rules. When combined with STVs, search space is dynamic by spectrum and even segment of the database.
[Figure: features placed on a "Probability of Feature" axis (up to 1.0): MMTS on C; dehydration of E, D; oxidized M; deamidation on N, Q; pyroglutamic acid of E; iTRAQ on K, N-term; iTRAQ on Y.]
Try only most likely mods for 'cold' segments
Try only more likely mods for 'warm' segments
Try all mods for 'hot' segments in the database
With the Paragon algorithm, it doesn't work like that at all. Instead, features are given probabilities. This allows us to make very specific decisions about when to consider a feature like a modification. (Walk through animation series about which mods are considered with cold, warm, and hot Sequence Temperature Values.) The same concept is also used with digestion specificity, mass tolerances, etc. Thus, for example, you never have to decide whether to define trypsin to allow cleavage of K-P or not; it simply has a lower probability.

Pause for Questions

Diagnostic Marker Ions for Modifications (Immonium ions and Neutral Losses from Precursor)
Mass | Modification
P-98 (H3PO4) | phospho Ser, Thr
216, P-80 | phospho Tyr
P-64 (SOCH4) | oxidized Met
P-43 | carbamylated N-term
204, P-203 | N-Acetylglucosamine (GlcNAc)
[Figure: under CID, phospho-Ser loses phosphoric acid (98) to give dehydroalanine.]
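
The "P-98", "P-80", and similar entries are losses in mass, so the position of the corresponding peak on the m/z axis depends on the precursor charge. A minimal sketch, assuming standard monoisotopic neutral-loss masses and using the 3+ precursor at m/z 687.046 from the GlcNAc example elsewhere in this deck purely as an illustrative input (`neutral_loss_mz` is a hypothetical helper):

```python
# Neutral-loss masses in Da (standard monoisotopic values).
NEUTRAL_LOSS = {
    "H3PO4 (phospho Ser/Thr)": 97.97690,
    "HPO3 (phospho Tyr)": 79.96633,
    "CH4OS (oxidized Met)": 63.99829,
    "HNCO (carbamylated N-term)": 43.00581,
    "GlcNAc": 203.07937,
}

def neutral_loss_mz(precursor_mz, charge, loss_mass):
    """A neutral loss keeps the charge, so the m/z shift is loss_mass / charge."""
    return precursor_mz - loss_mass / charge

for label, mass in NEUTRAL_LOSS.items():
    print(f"{label}: {neutral_loss_mz(687.046, 3, mass):.3f}")
```

For a 2+ precursor the phosphoric acid loss therefore appears 49 m/z units below the precursor, and for a 3+ precursor about 32.7 units below it.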

Data Acquisition Methods Specific for Modifications
ETD - Electron transfer dissociation
ECD - Electron capture dissociation
MS3 - ion trap
Multi-stage activation - ion trap
Precursor ion scan - triple quadrupole, Q-Tof
Neutral-loss scan - triple quadrupole
Review: Boersema, P; Mohammed, S; and Heck, A. Phosphopeptide fragmentation and analysis by mass spectrometry. J. Mass. Spectrom. 2009, 44, 861–878.

Multi-stage Activation in an Ion Trap
Single fill, single isolation, multi activation, single mass analysis
Multi fill, multi isolation, multi activation, multi mass analysis
Figure 1. Schematic illustration of CAD-based methods for phosphopeptide ion dissociation: (A) full-scan mass spectrum containing a doubly charged, doubly phosphorylated peptide; (B) MS/MS spectrum containing typical fragment ions following conventional phosphopeptide ion dissociation of precursor ion by CAD (note the majority of fragment ion signal is represented by loss of two phosphoric acid residues, denoted by P); (C) MS3 spectrum following isolation and activation of the most abundant neutral loss product ion (dashed lines indicate fragment ions produced as a result of neutral loss activation); note that fragment ions generated from original MS/MS event are not retained; (D) Pseudo MSn spectrum, a composite containing fragment ions generated from both initial MS/MS event (solid lines) and subsequent activations of the neutral loss product ions (dashed lines).
Schroeder, MJ, Schabanowitz, J, Schwartz, JC, Hunt, DF and Coon JJ. Anal. Chem. 2004, 76, 3590-3598.

Single vs. Multi-stage Activation MS/MS in an Ion Trap
(K)L/G/V|S|V/s|P S R(A)
[Figure: spectra of the same phosphopeptide acquired with single activation and with multi-stage activation.]

Time Considerations for Different Acquisition Strategies
Figure 9. Time window of ESI-MS approaches in phosphoproteomics. Depicted here are approximations of ion accumulation and reaction and scan times. In reality, these times depend on the amount of ions injected. (∗ Current generation Q-TOFs).
Boersema, P; Mohammed, S; and Heck, A: J. Mass. Spectrom. 2009, 44, 861–878.

O-GlcNAcylation
Addition of a single sugar residue, N-Acetylglucosamine (GlcNAc), to serine or threonine residues of nuclear and cytoplasmic proteins.
Present in all multi-cellular organisms.
Different from 'conventional' glycosylation:
  Inside the cell
  Transient modification
  Enzymes responsible for addition and removal of the modification, i.e. analogous to phosphorylation
O-GlcNAc modification and phosphorylation interact with and affect each other.
Modification is involved in cellular response to nutritional and other stresses.
Clear links to diabetes and Alzheimer's disease; elevated in cancer.

Side-chain Fragmentation Yields Diagnostic Neutral Losses
[Figure: under CID, phospho-Ser loses phosphoric acid (98) to give dehydroalanine; GlcNAcylated Ser yields the GlcNAc oxonium ion (m/z 204) and an unmodified Ser.]
In CID, the O-GlcNAc bond is more labile than the peptide backbone, so neutral loss of the sugar occurs prior to peptide fragmentation. Site assignment is often not possible since an unmodified residue remains following neutral loss of the sugar (so multi-stage activation is ineffective).

CID/ETD MS/MS of Same Doubly GlcNAcylated Peptide
GLAGPTtVPAtKASLLR - Protein bassoon
Mass difference between z10-z11 identifies one site as residue T2941. Mass difference between c10-c11 identifies other site as residue T2945.
[Figure: CID and ETD MS/MS spectra of the same precursor (m/z 687.046, 3+). The CID spectrum is labeled with the GlcNAc oxonium ion and with precursor-derived ions that have lost one or two GlcNAc; the ETD spectrum is labeled with c10, c11, c13, c14, c16 and z2, z3, z4, z5, z6, z8, z10, z11, z12.]
The CID MS/MS spectrum of this peptide shows that the major fragment ions are all neutral losses of sugar residues. It is clearly a doubly O-GlcNAc-modified peptide. However, one cannot confidently identify the peptide, nor localize the modification sites. In contrast, the ETD MS/MS spectrum enables both peptide identification and modification site localization in a database search with Protein Prospector.
Chalkley, R. J. et al. Proc Natl Acad Sci USA (2009) 106, 22, 8894-8899
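
Why the c10-c11 and z10-z11 spacings pin down the two sites follows from simple indexing: consecutive c ions differ by one residue counted from the N-terminus, consecutive z ions by one residue counted from the C-terminus. The helper names below are hypothetical, and the masses are standard monoisotopic values rather than readings from the spectrum.

```python
PEPTIDE = "GLAGPTtVPAtKASLLR"        # lowercase t = GlcNAc-modified Thr, as on the slide
THR, GLCNAC = 101.04768, 203.07937   # monoisotopic masses in Da

def residue_between_c(i):
    """1-based position of the residue separating c(i-1) and c(i)."""
    return i

def residue_between_z(j, length=len(PEPTIDE)):
    """1-based position of the residue separating z(j-1) and z(j)."""
    return length - j + 1

for label, pos in (("c10-c11", residue_between_c(11)),
                   ("z10-z11", residue_between_z(11))):
    aa = PEPTIDE[pos - 1]
    # A modified Thr adds the sugar mass to the normal residue spacing.
    spacing = THR + GLCNAC if aa == "t" else None
    print(label, "-> residue", pos, aa, spacing)
```

The two spacings point to positions 11 and 7 of the peptide, four residues apart, consistent with the T2945 and T2941 assignments; each spacing is expected to be about 304.13 Da rather than the 101.05 Da of an unmodified Thr.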

Phospho Site Ambiguity – S/T
L A G G Q/T/S Q|P T T|P L\T s/P Q R (site-localizing ion)
L A G G Q/T/S Q|P T T|P L\t S/P Q R
The same spectrum is shown here labeled with two possible locations of the phosphorylated residue (either on Thr-14 or Ser-15). The presence of the b14 ion at m/z 1353.6 represents fragmentation between an unmodified Thr and a phospho-Ser and enables unambiguous assignment of the site of modification.
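
The b14 argument can be reproduced by predicting b14 for both candidate sites and comparing with the observed value. This is a minimal sketch: residue masses are standard monoisotopic values, the 0.5 Da tolerance is an assumed ion-trap-style tolerance, and `b_ion` is a hypothetical helper.

```python
PROTON, PHOSPHO = 1.007276, 79.96633
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "T": 101.04768, "L": 113.08406, "Q": 128.05858, "R": 156.10111}
SEQ = "LAGGQTSQPTTPLTSPQR"

def b_ion(seq, i, phospho_site):
    """Singly charged b_i m/z with a phosphate on the 1-based phospho_site."""
    mass = sum(RESIDUE[aa] for aa in seq[:i]) + PROTON
    # The phosphate only contributes if the modified residue lies within b_i.
    return mass + (PHOSPHO if phospho_site <= i else 0.0)

observed_b14 = 1353.6
for site in (14, 15):
    predicted = b_ion(SEQ, 14, site)
    print(site, round(predicted, 2), abs(predicted - observed_b14) <= 0.5)
```

Only the Ser-15 isoform predicts an unmodified b14 near m/z 1353.7; phosphorylation on Thr-14 would move b14 up by about 80 to roughly m/z 1433.7.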

Reliability of LC/MS/MS Phosphoproteomic Literature
Citation | Approach | Instrument | # sites | # ambiguous sites | Scores shown | Site ambiguity labeled | Supplemental spectra shown
Ballif, BA, … Gygi, SP. 2004, MCP, 3, 1093-1101 | 1D gel, digest, SCX, LC/MS/MS | LCQ Deca XP | 546 | 86 | yes | yes | no
Rush, J, … Comb, MJ. 2005, Nat Biotech, 23, 94-101 | digest lysate, pTyr Ab, LC/MS/MS | LCQ Deca XP | 628 | 0 | yes | no | no
Collins, MO, … Grant, SGN. 2005, J Biol Chem, 280, 5972-5982 | protein IMAC, peptide IMAC, LC/MS/MS | Q-Tof Ultima | 331 | 42 | no | yes | no
Gruhler, A, … Jensen, ON. 2005, MCP, 4, 310-327 | digest lysate, SCX, IMAC, LC/MS/MS | LTQ-FT | 729 | 0 | yes | no | no
Quotes from the cited papers:
“Resulting sequences were inspected manually …. When the exact site of phosphorylation could not be assigned for a given phosphopeptide, it was tabulated as ambiguous.”
“All identified phosphopeptides were manually validated, and localization of phosphorylated residues within the individual peptide sequences were manually assigned…”
“All spectra supporting the final list of assigned peptides used to build the tables shown here were reviewed by at least three people to establish their credibility.”
“Assignment of phosphorylation sites was verified manually with the aid of PEAK Studio (Bioinformatics Solutions) software.”

MCP draft Guideline for publishing PTM data
http://www.mcponline.org/
III. POST-TRANSLATIONAL MODIFICATIONS
Studies focusing on posttranslational modifications require specialized methodology and documentation to assign the presence and the site(s) of modification. No current MS data analysis software is infallible in the automatic assignment of modification sites in peptides, and these analyses are particularly error prone when multiple possible sites within a peptide are being utilized. For these reasons, additional documentation supporting assignment of PTMs is required. In addition to the tabular presentation(s) of the data described in guideline II:
The site(s) of modification within each peptide sequence must be clearly presented.
An indication of the certainty of localization for each PTM:
  The manner in which the modification was located (by computation or manually) and a description of the software used, if any.
  A justification for any localization score threshold employed.
Ambiguous assignments: Peptides containing ambiguous PTM site localizations must be listed in a separate table from those with unambiguous site localizations. In cases where there are multiple modification sites and at least one is ambiguous, then these peptides should be listed with the ambiguous assignments. Ambiguous assignments must be clearly labeled as such. Examples of ambiguities include:
  Modified peptides in which one or more modification sites are ambiguous.
  Instances where the peptide sequence is repeated in the same protein so the specific modification site cannot be assigned.
  Instances in which the same peptide is repeated in multiple proteins, e.g. paralogs and splice variants (See also Section IV).
  Isobaric modifications (e.g., acetylation vs. trimethylation, phosphorylation vs. sulfonation etc), where the possibilities may not be distinguished. Examples of methods able to distinguish between these include mass spectrometric approaches such as accurate mass determination, observation of signature fragment ions (e.g. m/z 79 vs. m/z 80 in negative ion mode for assignment of phosphorylation over sulfation), or biological or chemical strategies.
Annotated, mass labeled spectra: Spectra for ALL modified peptides must be either submitted to a public repository or accompany the manuscript as described in guideline II.
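
How much mass accuracy "accurate mass determination" requires for the nominally isobaric pairs named in the guideline can be estimated directly from elemental compositions. The sketch below uses standard monoisotopic element masses; the 2000 Da peptide mass and the helper names are assumptions chosen for illustration.

```python
# Monoisotopic element masses in Da (standard values).
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462,
        "P": 30.97376200, "S": 31.97207117}

def formula_mass(formula):
    """Sum element masses for a formula given as {element: count}."""
    return sum(MASS[el] * n for el, n in formula.items())

def ppm_needed(formula_a, formula_b, peptide_mass):
    """Mass difference between two modifications, in ppm of the peptide mass."""
    diff = abs(formula_mass(formula_a) - formula_mass(formula_b))
    return diff / peptide_mass * 1e6

ACETYL, TRIMETHYL = {"C": 2, "H": 2, "O": 1}, {"C": 3, "H": 6}
PHOSPHO, SULFO = {"H": 1, "P": 1, "O": 3}, {"S": 1, "O": 3}

print(f"acetyl vs trimethyl: {ppm_needed(ACETYL, TRIMETHYL, 2000.0):.2f} ppm")
print(f"phospho vs sulfo:    {ppm_needed(PHOSPHO, SULFO, 2000.0):.2f} ppm")
```

For a 2000 Da peptide the acetyl/trimethyl difference (about 0.036 Da) corresponds to roughly 18 ppm, while the phospho/sulfo difference (about 0.0095 Da) is under 5 ppm, which is why signature fragment ions are often needed for the latter.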

Phosphosite Localization Scoring
Figure 3. Resolving ambiguity in phosphorylation site localization.
(a) Peptides containing multiple serine, threonine and/or tyrosine residues should be evaluated for precise site assignment. This phosphopeptide is from Zinc finger protein 638.
(b) General scheme for calculating a probability-based ion matching score (Peptide Score) for each potential phosphorylation site. The tandem mass (MS/MS) spectrum for the phosphopeptide from panel a is shown. The spectrum was separated into 100 m/z windows where the top N most-intense peaks per window were matched to predicted b- and y-type ions for each possibility. This was repeated using from 1 to 10 ions in each window (6 is shown). The cumulative binomial probability P was calculated using the number of trials (all b- and y-type ions) and the number of successes (matched ions) for each possibility and plotted as -10 log (P) vs peak depth (peaks per 100 m/z). The peptide corresponding to the red line matched more ions at every peak depth than any other possibility. The actual ambiguity score (Ascore) for this peptide is calculated using only site-determining ions as shown in Figure 3c using information from this plot.
(c) The Ascore is a probability-based metric that measures the likelihood that a difference in site-determining ions between two site positions was matched by random chance. In this example, only six b- or y-type ions could potentially differentiate the two phosphorylation sites. A peak depth of six was determined from Figure 3b as the earliest maximal difference in the number of matched ions. The cumulative binomial probability was applied as in Figure 3b but using only the site-determining subset of ions. An Ascore of 53.57 would represent a probability of less than 1 in 200,000 of matching a difference of at least 5 ions in 6 trials by random chance. Any of these 5 ions, if not due to chance, can differentiate between the two potential sites.
http://ascore.med.harvard.edu/ (supports Sequest results only, Linux only)
Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP (2006) Nat Biotechnol 24:1285–1292.
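
The cumulative binomial calculation described in the caption can be sketched as below. This is an illustration of the idea, not the Ascore program: it assumes the per-ion match probability is the peak depth divided by 100, that the competing site matched none of its six site-determining ions, and the function names are invented for this example.

```python
from math import comb, log10

def cumulative_binomial(n, k, p):
    """Probability of at least k successes in n trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def peptide_score(n, k, peak_depth):
    """-10*log10 of the chance of matching k or more of n ions at random."""
    p = peak_depth / 100.0   # assumed: peaks kept per 100 m/z window / 100
    return -10.0 * log10(cumulative_binomial(n, k, p))

# Site A matches 5 of 6 site-determining ions, site B matches 0 of 6 (assumed).
ascore = peptide_score(6, 5, 6) - peptide_score(6, 0, 6)
print(round(ascore, 2))
```

Under these assumptions the score difference comes out close to the value quoted in the caption, and the underlying probability is a little over 4 in a million, i.e. less than 1 in 200,000.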

Phosphosite Localization Scoring
$$P = \frac{n!}{k!\,(n-k)!}\; p^{k}\,(1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\; 0.04^{k}\; 0.96^{n-k}$$
PTM score $= -10 \times \log_{10}(P)$
p: 0.04 (use the 4 most intense fragment ions per 100 m/z units)
n: total number of possible b/y ions in the observed mass range for all possible combinations of PO4 sites in a peptide
k: number of peaks matching the n possible ions
Figure S6. Derivation of Phosphorylation Site Probabilities from the PTM Score
(A) The matrix represents amino acid positions for a doubly phosphorylated peptide, with red positions indicating candidate phosphorylation sites. The second position is phosphorylated in each of the top four PTM scores but the second phosphorylation site could be located at position 4, 5, 6 or 7.
(B) The table shows a specific example of this situation (phosphopeptide of Eps8). The five top scoring possibilities for phosphorylation have PTM scores from 30.4 to 29.64. Corresponding inverted probabilities add up to 4954.31, which is set equal to one. P-site is the proportional probability for each possibility and is assigned to the two phosphorylation sites in each case. Next, probabilities are summed up for each candidate site and sites with less than 0.25 are discarded. In the example, site 2 would be a class I site (p>0.75). Sites 4 and 7 are class III sites unless they also match a known kinase motif, in which case they are class II sites.
Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Cell (2006), 127 (3), 635–48.
Olsen, J.V., and Mann, M. Proc. Natl. Acad. Sci. USA. (2004) 101, 13417–13422.
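
The step from PTM scores to per-site probabilities can be sketched as follows. The candidate list is hypothetical: only the highest (30.4) and lowest (29.64) scores come from the caption, the intermediate scores and the site pairings are invented for illustration, and the function names are assumptions rather than MaxQuant identifiers.

```python
from collections import defaultdict
from math import comb, log10

def ptm_score(n, k, p=0.04):
    """-10*log10 of the binomial probability of exactly k matches among n ions."""
    return -10.0 * log10(comb(n, k) * p**k * (1 - p)**(n - k))

def site_probabilities(candidates):
    """Convert (sites, score) candidates into summed per-site probabilities."""
    weights = [10 ** (score / 10.0) for _, score in candidates]  # inverted probabilities
    total = sum(weights)                                         # set equal to one
    probs = defaultdict(float)
    for (sites, _), weight in zip(candidates, weights):
        for site in sites:
            probs[site] += weight / total
    return dict(probs)

# Hypothetical doubly phosphorylated peptide: site 2 appears in every candidate.
candidates = [((2, 4), 30.40), ((2, 5), 30.10), ((2, 6), 29.90), ((2, 7), 29.64)]
probs = site_probabilities(candidates)
for site, prob in sorted(probs.items()):
    print(site, round(prob, 3))
```

Because site 2 is present in every candidate its probability sums to 1, while the second phosphate is spread over the remaining sites with probabilities near 0.25 each, which is the pattern the caption describes.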

True Probability or Just Effective Scores?
Peak selection assumptions
  All regions of spectrum equally likely
    multiply charged fragments below precursor
    some 100-300 m/z values not possible (dipeptide AA combinations)
  Tall and short peak intensities equally diagnostic
Fragment ion type assumptions
  All ion types equally probable
  Neutral losses ignored: y-H3PO4, y-H2O

Spectral Matching if Modified & Unmodified Peptides Present
FIG. 1. Identification of a novel modification on a peptide belonging to human saliva PRP. A, 9-min integrated survey scan showing two ions separated by 12.000 Da. B, CAD spectrum of the lowest mass ion in the survey scan identified as peptide GPPQQGGHQQ from PRP. The inset shows the mass deviation of the fragment masses for this identification. C, CAD spectrum of the 12.000-Da peptide. Note the similarity between this spectrum and the one depicted in B. Full sequence cleavage is achieved, and no fragment mass deviates more than 6 mDa.
ModifiComb - Savitski, MM; Nielsen, ML; and Zubarev, RA. Mol Cell Proteomics 5, 935–948, 2006.

Software Tools Specialized for Identifying Modifications and Localizing Sites
Ascore
  Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP. (2006) Nat Biotechnol. 24, 1285–1292.
MaxQuant
  Cox J, Mann M. (2008) Nat Biotechnol. 26, 1367–1372.
  Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M. (2006) Cell. 127, 635–48.
Inspect, MS-Alignment, PTMFinder
  Tanner S, Payne S, Dasari S, Shen Z, Wilmarth PA, David L, Loomis WF, Briggs SP, Bafna V. (2008) J Proteome Res. 7, 170–181.
  Payne S, Yau M, Smolka MB, Tanner S, Zhou H, Bafna V. (2008) J Proteome Res. 7, 3373–3381.
  Tsur D, Tanner S, Zandi E, Bafna V, Pevzner P. (2005) Nat Biotechnol. 23, 1562–1567.
  Tanner S, Shu H, Frank A, Wang LC, Zandi E, Mumby M, Pevzner P, Bafna V. (2005) Anal Chem. 77, 4626–4639.
PhosphoScore
  Ruttenberg BE, Pisitkun T, Knepper MA, Hoffert JD. (2008) J Proteome Res. 7, 3054–9.
Debunker
  Lu B, Ruse C, Xu T, Park SK, Yates J 3rd. (2007) Anal Chem. 79, 1301–10.
SloMo - ETD/ECD
  Bailey CM, Sweet SM, Cunningham DL, Zeller M, Heath JK, Cooper HJ. (2009) J Proteome Res. 8, 1965–71.
ModifiComb
  Savitski MM, Nielsen ML, Zubarev RA. (2006) Mol Cell Proteomics. 5, 935–48.

Pause for Questions

Expect Woes & Nuisances: Sample Handling Chemistry
Modification | Mass shift (Da) | Site | Source
Carbamylation | +43 | N-term, Lys | urea in digest buffer
Deamidation | +1 | N -> D | sample in acid
pyroGlutamic acid | -17 | N-term Q | sample in acid
pyroCarbamidomethyl Cys | -17 | N-term C | sample in acid
Oxidized Met | +16 | M | gels
Cys alkylation reagent | +x | N-term, W | side reaction

Stinkers (b-NH3) & Pyroglutamic Acid: -17 Da, Q to q
(R)Q L/Q/L/A|Q/E/A|A Q\K(R)    P(m/z)-NH3
(R)q L/Q|L|A|Q|E|A|A\Q\K(R)

Deamidation of Asn +1Da Asn –NH + O = Asp ionsource.com

Deamidation
G S/E/S|G|I|F|T|n\T K
G S/E/S|G|I|F|T|D\T K    18.35    96.9%    +0.007 Da
G S/E/S|G|I|F|T|D\T K
G S/E S\G\I\F\T\N/T K    6.62    43.4%    +0.986 Da
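
A precursor mass error close to +0.98 Da is the signature that separates deamidation from the other common source of a roughly +1 Da offset, selection of the first 13C isotope peak. A small sketch, using standard monoisotopic masses; the tolerance and the function name are assumptions:

```python
# Monoisotopic element masses in Da (standard values).
MASS = {"H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
C13_SPACING = 1.00335  # mass difference between 13C and 12C in Da

# Asn - NH + O = Asp, so the net change is +O -N -H.
deamidation = MASS["O"] - MASS["N"] - MASS["H"]

def closest_explanation(mass_error, tol=0.01):
    """Name the candidate offset nearest to the observed precursor mass error."""
    options = {"none": 0.0, "deamidation": deamidation, "13C isotope": C13_SPACING}
    name, delta = min(options.items(), key=lambda kv: abs(mass_error - kv[1]))
    return name if abs(mass_error - delta) <= tol else "unexplained"

print(round(deamidation, 5))
print(closest_explanation(0.986))
print(closest_explanation(0.007))
```

The two offsets differ by only about 0.019 Da, so the distinction relies on accurate precursor mass measurement; with a wide precursor tolerance the unmodified Asn interpretation would not be penalized.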

Carbamylation from Urea in Digest Buffer
+43 Da (CNHO)

Carbamylated N-term
I/G/E|G/T/y/G V|V|Y\K
[Figure: spectra of the unmodified and N-terminally carbamylated peptide. In the carbamylated spectrum the b ions are shifted by +43 and peaks are labeled P(m/z)-CNHO and P(m/z)-CNHO-H2O.]

Unimod Resource for Masses of Modifications
http://www.unimod.org/modifications_list.php

Delta Mass Resource for Masses of Modifications
http://www.abrf.org/index.cfm/dm.home

RESID Resource for Masses of Modifications
http://www.ebi.ac.uk/RESID/

Acknowledgements
Steven Carr, Philipp Mertins: Broad Institute of MIT and Harvard
Pierre-Alain Binz: GeneBio (Phenyx)
Robert Chalkley: University of California San Francisco (O-GlcNAc)
John Cottrell: Matrix Science (Mascot)
Chris Miller: Agilent Technologies (Spectrum Mill)
Sean Seymour: Applied Biosystems (Protein Pilot, Paragon)

iPRG-2010: Proteome Informatics Research Group Study - Phosphopeptide Identification
In this study, an LC-MS/MS dataset from a lysate digested with trypsin and enriched for phosphopeptides using strong cation exchange fractionation followed by immobilized metal affinity chromatography (SCX/IMAC) will be provided. Participants are asked to return a list of identified peptides and localized phosphorylation sites.
Requests to participate must be submitted by e-mail to iPRG2010@gmail.com prior to Monday, November 30, 2009. Please include the words “iPRG Study 2010 request” in the subject line and provide contact name and affiliation in the body of the message.
http://www.abrf.org