Constructing high resolution consensus spectra for a peptide library
Sergey L. Sheetlin, Yuri A. Mirokhin, Dmitrii V. Tchekhovskoi, Xiaoyu Yang, Stephen E. Stein
NIST Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology
Tandem mass spectrometry
Sample → Ionization → m/z ion sorting (MS1, precursor ion) → CID fragmentation → m/z ion sorting (MS2, product ions) → Detection (measuring m/z)
m/z: mass-to-charge ratio
CID: Collision-Induced Dissociation
MS: Mass Spectrum
Example of a chromatogram (using Thermo Xcalibur Qual Browser)
[Figure: relative abundance vs. time (min)]
Example of an MS1 spectrum (using Thermo Xcalibur Qual Browser)
[Figure; vertical axis: relative abundance]
Example of a high resolution MS2 spectrum (shown with the NIST MS Search program)
[Figure; vertical axis: relative abundance]
Peptides
Short chains of amino acids connected by amide bonds.
Example: Glu-Thr-Lys (ETK), glutamylthreonyllysine, C15H28N4O7
Peptide libraries
NIST MS Search libraries of peptide spectra are available at http://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:cdownload
Nomenclature for peptide CID fragment ions
    R1       R2             Rn-1     Rn
    |        |              |        |
H2N-CH-CO-NH-CH-CO- ... -NH-CH-CO-NH-CH-CO2H
I. A. Papayannopoulos. The interpretation of collision-induced dissociation mass spectra of peptides. Mass Spectrometry Reviews, 14:49–73, 1995.
Computation of fragment masses
Peptide monoisotopic masses
The 20 amino acid residues and their monoisotopic masses:
Residue  Code  Monoisotopic mass (Da)
Ala      A     71.0371
Arg      R     156.1011
Asn      N     114.0429
Asp      D     115.0269
Cys      C     103.0092
Glu      E     129.0426
Gln      Q     128.0586
Gly      G     57.0215
His      H     137.0589
Ile      I     113.0841
Leu      L     113.0841
Lys      K     128.095
Met      M     131.0405
Phe      F     147.0684
Pro      P     97.0528
Ser      S     87.032
Thr      T     101.0477
Trp      W     186.0793
Tyr      Y     163.0633
Val      V     99.0684
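The residue masses listed above are enough to compute a peptide's monoisotopic mass and the m/z values of its b and y fragment ions. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the water and proton masses are standard values added here as assumptions.

```python
# Monoisotopic residue masses (Da), as listed on the slide above.
RESIDUE_MASS = {
    "A": 71.0371, "R": 156.1011, "N": 114.0429, "D": 115.0269, "C": 103.0092,
    "E": 129.0426, "Q": 128.0586, "G": 57.0215, "H": 137.0589, "I": 113.0841,
    "L": 113.0841, "K": 128.095, "M": 131.0405, "F": 147.0684, "P": 97.0528,
    "S": 87.032, "T": 101.0477, "W": 186.0793, "Y": 163.0633, "V": 99.0684,
}
WATER = 18.010565   # H2O, added for the free N- and C-termini (assumed constant)
PROTON = 1.007276   # mass of a proton (assumed constant)


def peptide_mass(sequence):
    """Neutral monoisotopic mass: sum of residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def b_ions(sequence, charge=1):
    """m/z of b1..b(n-1): N-terminal prefixes plus `charge` protons."""
    result, running = [], 0.0
    for aa in sequence[:-1]:
        running += RESIDUE_MASS[aa]
        result.append((running + charge * PROTON) / charge)
    return result


def y_ions(sequence, charge=1):
    """m/z of y1..y(n-1): C-terminal suffixes plus water plus `charge` protons."""
    result, running = [], WATER
    for aa in reversed(sequence[1:]):
        running += RESIDUE_MASS[aa]
        result.append((running + charge * PROTON) / charge)
    return result


if __name__ == "__main__":
    print(round(peptide_mass("ETK"), 4))
    print([round(mz, 4) for mz in b_ions("ETK")])
    print([round(mz, 4) for mz in y_ions("ETK")])
```

For the tripeptide ETK this gives a neutral mass of about 376.196 Da, which agrees with the composition C15H28N4O7. A useful sanity check is that each singly charged pair b(i) and y(n-i) sums to the peptide mass plus two protons.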
Protein modifications for mass spectrometry
In vivo: posttranslational modifications (PTMs) are covalent modifications of proteins after their translation. PTMs play a role in the activity and function of proteins and in their interactions with other molecules.
Modifications caused by sample preparation.
Modification name   Composition            Monoisotopic mass (Da)
Carbamidomethyl     C2 H3 N O              57.021464
iTRAQ4plex          H12 C4 13C3 N 15N O    144.102063
Oxidation           O                      15.994915
Deamidation         H-1 N-1 O              0.984016
Phosphorylation     H O3 P                 79.966331
Glu->pyro-Glu       H-2 O-1                -18.010565
Gln->pyro-Glu       H-3 N-1                -17.026549
Source: www.unimod.org
Fragment neutral losses
Common neutral losses are H2O, NH3, CO, H3PO4, and iTRAQ (H12 C4 13C3 N 15N O).
[Figure; vertical axis: relative abundance]
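A neutral loss shifts a fragment's m/z down by the mass of the lost group divided by the ion's charge. The sketch below illustrates that arithmetic; the function name is hypothetical, and the loss masses for H2O, NH3, CO and H3PO4 are standard monoisotopic values supplied here rather than taken from the slides.

```python
# Monoisotopic masses (Da) of the neutral losses named on the slide.
# The iTRAQ value is the iTRAQ4plex mass from the modification table;
# the others are standard values added as assumptions.
NEUTRAL_LOSS = {
    "H2O": 18.010565,
    "NH3": 17.026549,
    "CO": 27.994915,
    "H3PO4": 97.976896,
    "iTRAQ": 144.102063,
}


def neutral_loss_mz(mz, charge, loss):
    """m/z of a fragment ion after it loses the named neutral group."""
    return mz - NEUTRAL_LOSS[loss] / charge


if __name__ == "__main__":
    # A doubly charged fragment at m/z 500 losing water.
    print(neutral_loss_mz(500.0, 2, "H2O"))
```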
Peptide “GHVIAAR”; charge 2; modification iTRAQ4plex on the first amino acid ‘G’
[Figure; vertical axis: relative abundance]
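The precursor m/z for this example can be worked out from the residue and modification masses given earlier. This is a small sketch under stated assumptions: the function name and the position-to-mass-shift dictionary are hypothetical conventions, and the water and proton masses are standard values not shown on the slides.

```python
# Residue masses (Da) for the residues in GHVIAAR, from the residue table.
RESIDUE_MASS = {
    "G": 57.0215, "H": 137.0589, "V": 99.0684,
    "I": 113.0841, "A": 71.0371, "R": 156.1011,
}
ITRAQ4PLEX = 144.102063  # from the modification table
WATER = 18.010565        # assumed standard value
PROTON = 1.007276        # assumed standard value


def precursor_mz(sequence, charge, mods=None):
    """Precursor m/z; `mods` maps 0-based residue positions to mass shifts."""
    mods = mods or {}
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    mass += sum(mods.values())
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    # iTRAQ4plex on the first residue (position 0), charge 2.
    print(round(precursor_mz("GHVIAAR", 2, {0: ITRAQ4PLEX}), 4))
```

With these inputs the doubly charged precursor comes out at roughly m/z 434.27.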
Experimental and theoretical isotopic peaks
[Figure; vertical axis: relative abundance]
Valkenborg, D., Mertens, I., Lemiere, F., Witters, E., Burzykowski, T.: The isotopic distribution conundrum. Mass Spectrom. Rev. 31(1), 96–109 (2012)
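One common way to obtain theoretical isotopic peaks is to convolve the isotope abundance vectors of all atoms in the elemental composition. The sketch below uses that generic approach; it is not necessarily the method used in this work. The isotope abundances are standard natural-abundance values supplied as assumptions, and the result is aggregated by nominal mass offset (M, M+1, M+2, ...), ignoring isotopic fine structure.

```python
# Natural isotope abundances by nominal mass offset (0, +1, +2).
# Standard values, added here as assumptions.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b, max_len):
    """Discrete convolution of two abundance vectors, truncated to max_len."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_pattern(composition, n_peaks=5):
    """Probabilities of the M, M+1, ... peaks for an elemental composition."""
    pattern = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPES[element], n_peaks)
    return pattern


if __name__ == "__main__":
    # Glu-Thr-Lys, C15H28N4O7
    pattern = isotope_pattern({"C": 15, "H": 28, "N": 4, "O": 7})
    print([round(p / pattern[0], 4) for p in pattern])
```

Truncating during convolution is safe because terms beyond the cutoff never contribute to lower-offset peaks. Adding one atom at a time is simple but slow for large molecules; repeated squaring of each element's vector would be the usual optimization.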
Annotation of peaks of experimental spectra
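Annotation assigns each experimental peak to the theoretical fragment ion that explains it. The slides do not spell out the matching rule, so the following is a generic sketch that matches each peak to the nearest theoretical m/z within a relative tolerance. The function name, the 20 ppm default, and the example peak list are all assumptions made for illustration.

```python
from bisect import bisect_left


def annotate(peaks_mz, theoretical, tol_ppm=20.0):
    """Match peaks to theoretical ions.

    theoretical: list of (mz, label) pairs.
    Returns a list of (peak_mz, (label, error_ppm) or None).
    """
    theo = sorted(theoretical)
    theo_mz = [mz for mz, _ in theo]
    annotated = []
    for mz in peaks_mz:
        i = bisect_left(theo_mz, mz)
        best = None
        # Only the neighbours on either side of the insertion point can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(theo):
                err = (mz - theo_mz[j]) / theo_mz[j] * 1e6
                if abs(err) <= tol_ppm and (best is None or abs(err) < abs(best[1])):
                    best = (theo[j][1], err)
        annotated.append((mz, best))
    return annotated


if __name__ == "__main__":
    # b1 and y1 of ETK computed from the residue masses; peaks are made up.
    ions = [(130.049876, "b1"), (147.112841, "y1")]
    for peak, match in annotate([130.0500, 147.2000, 147.1129], ions):
        print(peak, match)
```

The signed ppm error returned for each match is the kind of quantity whose distribution can be examined when estimating annotation error.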
Experimental and theoretical densities
[Three figures; vertical axis: probability density function]
Error of annotation of experimental peaks
[Figure; vertical axis: probability density function]
Statistics of different types of fragment ions (based on limited set of data)
Clustering experimental spectra
Set of unidentified MS2 spectra → Identification (peptide sequencing) → Grouping results by charge, peptide, modifications, collision energy → Filtering → Clusters of replicate spectra
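The grouping step amounts to collecting identified spectra under a key built from peptide, charge, modifications and collision energy. The sketch below shows that idea with hypothetical record fields. The filtering criterion is not given on the slide, so a minimum cluster size is used purely as a placeholder assumption.

```python
from collections import defaultdict


def cluster_replicates(identified, min_size=2):
    """Group identified spectra into clusters of replicates.

    Each record is a dict with hypothetical keys: 'peptide', 'charge',
    'mods' (a hashable tuple), 'collision_energy', and 'peaks'.
    Clusters smaller than min_size are dropped (placeholder filter).
    """
    clusters = defaultdict(list)
    for spectrum in identified:
        key = (
            spectrum["peptide"],
            spectrum["charge"],
            spectrum["mods"],
            spectrum["collision_energy"],
        )
        clusters[key].append(spectrum)
    return {key: group for key, group in clusters.items() if len(group) >= min_size}


if __name__ == "__main__":
    spectra = [
        {"peptide": "GHVIAAR", "charge": 2, "mods": ((0, "iTRAQ4plex"),),
         "collision_energy": 30, "peaks": []},
        {"peptide": "GHVIAAR", "charge": 2, "mods": ((0, "iTRAQ4plex"),),
         "collision_energy": 30, "peaks": []},
        {"peptide": "GHVIAAR", "charge": 3, "mods": ((0, "iTRAQ4plex"),),
         "collision_energy": 30, "peaks": []},
    ]
    for key, group in cluster_replicates(spectra).items():
        print(key, len(group))
```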
Peptide sequencing algorithms
Database search: comparing theoretical spectra for sequences from a database with the query spectrum. Examples: MS-GF+, Mascot.
De novo sequencing: finding the peptide that is optimal under some measure of similarity between its theoretical spectrum and the query spectrum. Examples: PEAKS, NovoHMM.
Library search: direct comparison of the query spectrum with identified spectra from a library. Example: NIST Mass Spectral Library.
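All three approaches rely on some similarity score between two spectra. As one illustration, a cosine score on binned spectra is a widely used choice; the slides do not say which measure any of the named tools uses, so the function below, its name, and the 0.01 bin width are assumptions for illustration only.

```python
from math import sqrt


def binned(spectrum, bin_width):
    """Sum peak abundances into integer m/z bins."""
    bins = {}
    for mz, abundance in spectrum:
        index = int(round(mz / bin_width))
        bins[index] = bins.get(index, 0.0) + abundance
    return bins


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = disjoint, 1 = identical)."""
    a, b = binned(spec_a, bin_width), binned(spec_b, bin_width)
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = [(130.05, 40.0), (147.11, 100.0), (248.16, 65.0)]
    library = [(130.05, 35.0), (147.11, 100.0), (300.00, 10.0)]
    print(round(cosine_similarity(query, library), 3))
```

Fixed-width bins can split two nearly identical peaks across a bin boundary, so real search tools typically use tolerance-based peak matching instead.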
Example of experimental spectra from the same cluster
[Figure; vertical axis: relative abundance]
Computing peaks of consensus spectra
[Diagram labels: MS-GF+; replicate spectra X1, X2, X3, X5 with associated values d1, d2, d3, d5; consensus]
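The slides do not give the consensus formula in text form, and the log-likelihood example suggests a statistical model is involved. The sketch below therefore substitutes a much simpler, generic technique: pool the peaks of all replicates, group peaks whose m/z values are within a tolerance of their neighbour, and take the median m/z and abundance of each group. It is not the authors' method. The function name and the 20 ppm tolerance are assumptions.

```python
from statistics import median


def consensus_peaks(replicates, tol_ppm=20.0):
    """Merge replicate spectra into consensus peaks.

    replicates: list of spectra, each a list of (mz, abundance) pairs.
    Returns a list of (median_mz, median_abundance, fraction_of_replicates).
    """
    pooled = sorted(
        (mz, abundance, index)
        for index, spectrum in enumerate(replicates)
        for mz, abundance in spectrum
    )
    groups, current = [], []
    for peak in pooled:
        # Start a new group when the gap to the previous peak exceeds the tolerance.
        if current and (peak[0] - current[-1][0]) / current[-1][0] * 1e6 > tol_ppm:
            groups.append(current)
            current = []
        current.append(peak)
    if current:
        groups.append(current)

    n = len(replicates)
    result = []
    for group in groups:
        members = {index for _, _, index in group}
        result.append((
            median(mz for mz, _, _ in group),
            median(abundance for _, abundance, _ in group),
            len(members) / n,
        ))
    return result


if __name__ == "__main__":
    reps = [
        [(100.000, 50.0), (200.000, 100.0)],
        [(100.001, 60.0), (200.001, 90.0)],
        [(200.002, 95.0)],
    ]
    for peak in consensus_peaks(reps):
        print(peak)
```

The third value in each tuple, the fraction of replicates that contain the peak, is the quantity a later filtering step can act on.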
Example of log-likelihood
Average fraction of replicates
Hypothesis: “good” peaks of the consensus spectrum have properties similar to those of annotated peaks.
Filtering peaks of consensus spectra
For each replicate: is there a peak in the replicate corresponding to the given consensus peak with abundance A?
Replicate number   Peak present?
1                  Yes
2
…
N-1                No
N
Bernoulli distribution: Yes with probability p; No with probability 1-p
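If each replicate independently contains the peak with probability p, the number of replicates that contain it follows a binomial distribution. The sketch below computes the observed fraction and the binomial tail probability of seeing at least that many occurrences under an assumed p. How the authors turn this into a filtering decision is not stated on the slide, so the function names and the example value of p are hypothetical.

```python
from math import comb


def observed_fraction(present):
    """Fraction of replicates in which the consensus peak was found."""
    return sum(present) / len(present)


def binomial_tail(k, n, p):
    """P(at least k successes in n independent Bernoulli(p) trials)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Yes/No answers for one consensus peak across 10 replicates (made-up data).
    present = [True, True, False, True, True, True, False, True, True, True]
    k, n = sum(present), len(present)
    print(observed_fraction(present))
    # Probability of seeing >= k occurrences if the per-replicate rate were 0.3.
    print(binomial_tail(k, n, 0.3))
```

A small tail probability means the peak appears in more replicates than the assumed rate p would explain by chance.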
Density of peaks for different relative abundances
Comparison of consensus and best replicate spectra
Further directions
Adjusting the parameters of the method for optimal performance of the existing search algorithms
Building peptide libraries of consensus spectra
Acknowledgements
NIST MS Data Center: Yuri A. Mirokhin, Dmitrii V. Tchekhovskoi, Xiaoyu Yang, Stephen E. Stein, William E. Wallace