Glycoprotein Microheterogeneity via N-Glycopeptide Identification
Kevin Brown Chandler, Petr Pompach, Radoslav Goldman, Nathan Edwards
Georgetown University Medical Center
The challenge
Identify glycopeptides in large-scale tandem mass-spectrometry datasets: many glycopeptide-enriched fractions, many tandem mass spectra per fraction.
Good, but not great, instrumentation: QStar Elite (CID, good MS1/MS2 resolution).
Strive for hypothesis-generating analysis: site-specific glycopeptide characterization and glycoform occupancy in differentiated samples.
Observations
Oxonium ions (m/z 204, 366) help distinguish glycopeptides from peptides, but do little to identify the glycopeptide.
There are few peptide b/y-ions to identify peptides, but intact-peptide fragments are common.
If the peptide can be guessed, then the glycan's mass can be determined.
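Once a peptide is guessed, the glycan's mass is what remains of the precursor. A minimal sketch, assuming a monoisotopic [M+zH]z+ precursor, a known charge state, and an unmodified peptide; the function names are hypothetical and not part of GPS.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids (Cys unmodified).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def precursor_neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of an [M+zH]z+ precursor ion (protonation assumed)."""
    return (mz - PROTON) * charge


def glycan_delta_mass(mz: float, charge: int, peptide: str) -> float:
    """Precursor neutral mass minus the guessed peptide's mass.

    For an N-linked glycan this equals the sum of its monosaccharide
    residue masses, since attachment to Asn releases one water.
    """
    return precursor_neutral_mass(mz, charge) - peptide_mass(peptide)
```

The resulting delta mass is what gets matched against glycan compositions or a glycan database.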
Glycopeptide Search Strategy
Glycan-peptide to spectrum matches, from least to most specific: multi-peptide, multi-glycan mass; (single peptide); single glycan mass; single glycan (topology).
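These levels can be read as a classification of each spectrum's candidate matches. A minimal sketch, assuming each candidate is a hypothetical (peptide, glycan mass, glycan structure id) tuple; it reports the most specific level the candidates support and is not the GPS implementation.

```python
from typing import Iterable, Tuple

# Hypothetical candidate record: (peptide, glycan mass in Da, glycan structure id).
Candidate = Tuple[str, float, str]


def match_level(candidates: Iterable[Candidate], mass_decimals: int = 3) -> str:
    """Most specific search-strategy level supported by one spectrum's candidates."""
    cands = list(candidates)
    if not cands:
        return "no match"
    peptides = {pep for pep, _, _ in cands}
    # Round masses so numerically identical compositions group together.
    masses = {round(mass, mass_decimals) for _, mass, _ in cands}
    structures = {struct for _, _, struct in cands}
    if len(peptides) > 1:
        return "multi-peptide"
    if len(masses) > 1:
        return "single peptide, multi-glycan mass"
    if len(structures) > 1:
        return "single glycan mass"
    return "single glycan (topology)"
```

Counting spectra per level gives summary fractions like "matched single-glycan" or "matched multi-peptide".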
Compromises
Single protein or simple protein mixture: few peptides to distinguish.
Single N-glycan per peptide: glycan mass by subtraction from the precursor.
Digest may not resolve the site: need peptide/glycan fragments to distinguish.
Isobaric peptide-glycan pairs are not resolved: need peptide/glycan fragments to distinguish.
Glycan Databases
Link putative glycan masses to N-linked glycan structures (and organism, etc.): Human N-linked GlycomeDB; Cartoonist structure enumeration; CFG Mammalian Array (v5.0); in-house database (Oxford notation).
Database(s) provide a "biased" search space: coverage vs. "reasonableness".
Trade-off: time, specificity, biology.
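Linking a putative glycan mass to database entries is a range query over masses. A minimal sketch, assuming entries are (mass, name) pairs and a ppm tolerance; the class and its names are hypothetical, not a GlycomeDB or GPS API.

```python
import bisect
from typing import List, Tuple


class GlycanMassIndex:
    """Sorted glycan-mass index supporting tolerance queries (hypothetical)."""

    def __init__(self, entries: List[Tuple[float, str]]):
        # Sort (mass, name) pairs by mass so queries can use binary search.
        self._entries = sorted(entries)
        self._masses = [m for m, _ in self._entries]

    def query(self, mass: float, tol_ppm: float = 10.0) -> List[Tuple[float, str]]:
        """All entries whose mass lies within +/- tol_ppm of the query mass."""
        delta = mass * tol_ppm * 1e-6
        lo = bisect.bisect_left(self._masses, mass - delta)
        hi = bisect.bisect_right(self._masses, mass + delta)
        return self._entries[lo:hi]
```

A larger database returns more candidates per query, which is the coverage vs. "reasonableness" trade-off in practice.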
Haptoglobin (HPT_HUMAN) standard
Peptides: NLFLNHSE*NATAK, MVSHHNLTTGATLINE, VVLHPNYSQVDIGLIK.
N-glycosylation motif: N-X-S/T. * Site of GluC cleavage.
Pompach et al., Journal of Proteome Research 11(3) (2012): 1728–1740.
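Locating the motifs in these peptides is a short pattern search. A minimal sketch, assuming the usual sequon rule in which X may be any residue except proline; `find_sequons` is a hypothetical helper.

```python
import re
from typing import List

# N-X-S/T sequon with X != P; the lookahead lets overlapping motifs
# (e.g. "NNST") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(peptide: str) -> List[int]:
    """0-based Asn positions of N-X-S/T motifs, after dropping '*' marks."""
    seq = peptide.replace("*", "")  # positions refer to the unmarked sequence
    return [m.start() for m in SEQUON.finditer(seq)]
```

Applied to the three peptides above, this finds two motifs in NLFLNHSENATAK and one in each of the others, consistent with the four haptoglobin sites.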
Haptoglobin standard
11 HILIC fractions enriched for glycopeptides; 11 LC-MS/MS acquisitions (≥15k spectra).
2887/3288 MS/MS spectra have oxonium ion(s); 317 have "intact-peptide" fragment ions.
263 spectra matched to peptide-glycan pairs: 52% matched a single glycan; 8% matched multiple peptides.
27 distinct glycans (by mass) on 11 peptides. Glycans identified on all 4 haptoglobin sites.
Algorithms & Infrastructure
Glycan databases indexed by composition, mass, N-linked status, and motif/type. Formats: IUPAC, Linear Code, GlycoCT_condensed. Implemented: GlycomeDB, Cartoonist, CFG Array.
Monosaccharide decomposition of glycan mass: Böcker et al., Efficient mass decomposition (2005).
χ² goodness-of-fit test for the precursor isotope cluster: theoretical isotope cluster from composition; ICScore based on the χ²-test p-value.
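Both steps can be sketched in a few lines. This is a minimal illustration, assuming four monosaccharide types and a theoretical isotope cluster that is already given. The decomposition is plain brute-force enumeration, not the Böcker et al. algorithm, and the p-value is a hypothetical stand-in for ICScore, whose exact definition is not given here.

```python
import math
from itertools import product
from typing import Dict, List, Sequence

# Monoisotopic monosaccharide residue masses (Da).
MONO = {"HexNAc": 203.079373, "Hex": 162.052824,
        "dHex": 146.057909, "NeuAc": 291.095417}


def decompose(mass: float, tol: float = 0.01,
              max_count: int = 12) -> List[Dict[str, int]]:
    """Compositions whose residue-mass sum is within +/- tol Da of mass.

    Brute-force enumeration over bounded counts; NOT the Böcker et al.
    round-robin decomposition algorithm.
    """
    names = list(MONO)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        total = sum(c * MONO[n] for c, n in zip(counts, names))
        if abs(total - mass) <= tol:
            hits.append({n: c for n, c in zip(names, counts) if c})
    return hits


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_i (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def isotope_fit_pvalue(observed: Sequence[float],
                       theoretical: Sequence[float]) -> float:
    """Chi-square goodness-of-fit p-value of an observed isotope cluster.

    Expected peaks are the theoretical relative abundances rescaled to the
    observed total; treating intensities as counts is an assumption.
    """
    if len(observed) != len(theoretical) or len(observed) < 2:
        raise ValueError("need matching clusters of at least two peaks")
    scale = sum(observed) / sum(theoretical)
    expected = [t * scale for t in theoretical]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf(stat, len(observed) - 1)
```

A high p-value means the observed cluster is consistent with the composition's theoretical cluster; a low one flags a likely wrong composition or a non-monoisotopic precursor pick.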
False Discovery Rate (FDR)
How confident can we be in these mass matches? FDR: 3.9% (~10/263 spectra).
Estimate the number of errors by also searching with decoy peptides that lack the N-linked motif. Count spectra matched to decoy peptide-glycan pairs. Rescale the decoy counts to balance the numbers of motif and non-motif peptides.
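The rescaling step reduces to a single ratio. A minimal sketch, assuming a decoy peptide is as likely as a motif peptide to produce a chance peptide-glycan match; the argument names are hypothetical and the numbers in the check are illustrative, not haptoglobin data.

```python
def decoy_fdr(target_matches: int, decoy_matches: int,
              n_motif_peptides: int, n_decoy_peptides: int) -> float:
    """Estimated FDR from decoy matches rescaled by the peptide-count ratio.

    Expected false target matches = decoy matches * (motif / decoy peptides).
    """
    if target_matches == 0:
        return 0.0
    expected_false = decoy_matches * n_motif_peptides / n_decoy_peptides
    return expected_false / target_matches
```

For instance, 12 decoy matches over three times as many decoy peptides as motif peptides imply about 4 false matches; among 200 target matches that is an FDR of 2%.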
Tuning the filters…
Adjust thresholds and parameters to either increase specificity (lower FDR, fewer spectra) or increase sensitivity (more spectra, higher FDR).
Oxonium ions: number & intensity; match tolerance.
"Intact-peptide" fragments: number & intensity; match tolerance.
Glycan composition: ICScore; constrained search space; match tolerance.
Glycan database: constrained search space; match tolerance.
Precursor ion: non-monoisotopic selection; sodium adducts; charge state.
Peptide search space: semi-specific peptides; non-specific peptides; peptide MW range; variable modifications.
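The oxonium-ion filter exposes exactly these knobs: number, intensity, and match tolerance. A minimal sketch, assuming centroided (m/z, intensity) peaks and the singly charged HexNAc and HexHexNAc oxonium ions; the function name and default thresholds are hypothetical, not GPS settings.

```python
from typing import Sequence, Tuple

# Singly charged HexNAc and HexHexNAc oxonium ions (m/z).
OXONIUM_MZ = (204.0867, 366.1395)


def has_oxonium(peaks: Sequence[Tuple[float, float]],
                tol: float = 0.02,
                min_rel_intensity: float = 0.05,
                min_count: int = 1) -> bool:
    """True if at least min_count oxonium ions exceed the intensity cutoff.

    Intensity is taken relative to the spectrum's base peak; a peak counts
    if it lies within +/- tol (m/z) of an oxonium ion.
    """
    if not peaks:
        return False
    base = max(inten for _, inten in peaks)
    found = 0
    for target in OXONIUM_MZ:
        if any(abs(mz - target) <= tol and inten >= min_rel_intensity * base
               for mz, inten in peaks):
            found += 1
    return found >= min_count
```

Raising `min_count` or `min_rel_intensity`, or tightening `tol`, moves toward specificity; relaxing them moves toward sensitivity.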
GlycoPeptideSearch (GPS) 1.3
Freely available implementation for Windows and Linux.
Reads open-format spectra (mzXML, MGF).
Pre-indexed glycan databases: Human & Mammalian GlycomeDB; Mammalian CFG Array (v5.0); user-named (Oxford notation).
In silico digest and N-linked motif identification.
Automatic target/decoy analysis for FDR.
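An in silico digest with motif identification can be sketched as follows. This is a minimal illustration, assuming trypsin only (cleave after K/R unless followed by P), whereas the haptoglobin peptides also involve GluC. Sequons that straddle a cleavage site are not reported. The function names are hypothetical, not GPS's own digest.

```python
import re
from typing import List

# N-X-S/T sequon with X != P.
SEQUON = re.compile(r"N(?=[^P][ST])")


def tryptic_digest(protein: str, missed_cleavages: int = 1) -> List[str]:
    """Trypsin digest: cleave after K/R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to missed_cleavages following pieces onto piece i.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def motif_peptides(protein: str, missed_cleavages: int = 1) -> List[str]:
    """Digest peptides containing at least one N-X-S/T (X != P) sequon."""
    return [p for p in tryptic_digest(protein, missed_cleavages)
            if SEQUON.search(p)]
```

Peptides without a sequon are natural candidates for the non-motif decoy set used in the FDR estimate.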
Where to from here?
Demonstrate utility on new instrument platforms, proteins, and samples.
Develop a scoring model for fragments.
Re-implement Cartoonist demerits.
Exploit relationships between MS2 and MSn spectra.
Explore application to O-glycopeptides, N-glycans, and O-glycans.
Acknowledgements
Edwards Lab (Georgetown): Kevin Brown Chandler [NSF] (Poster 32).
Goldman Lab (Georgetown): Radoslav Goldman (Poster 6); Petr Pompach; Miloslav Sanda (Poster 23).
Marshall Bern (Xerox PARC): Cartoonist, Peptoonist.
Rene Ranzinger (CCRC): GlycomeDB.