Biomarker discovery by automatic annotation of N-glycan species in MALDI-TOF-TOF spectra
Chuan-Yih Yu
Capstone Advisor: Prof. Haixu Tang
Introduction
Post-translational modification (PTM)
–Nitrosylation
–Phosphorylation
–Glycosylation
50% of all eukaryotic proteins are glycosylated¹
1. Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta, (1): p. 4-8
Glycoprotein
Protein glycosylation
–N-linked glycosylation
  Core structure: 2 GlcNAc + 3 Man
  Sequon: Asn-X-Ser or Asn-X-Thr, where X can be any residue except Pro
  Glycosylation before folding
–O-linked glycosylation
  Core structures
  Attached to serine or threonine
  Glycosylation after folding
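
The sequon rule above (Asn, then any residue except Pro, then Ser or Thr) can be checked with a short regular expression. This is a minimal sketch; the function name and the demo sequence are hypothetical, and a sequon match only marks a candidate site, not a confirmed glycosylation.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead consumes only the Asn, so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# Hypothetical sequence: N-G-S matches, N-P-T is excluded, N-N-T and N-T-S overlap.
demo_sequence = "MKNGSAANPTLLNNTS"
print(find_sequons(demo_sequence))
```

The lookahead matters for stretches such as NNTS, where two candidate sites share residues and a plain three-residue pattern would report only the first.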
Monosaccharides
Building blocks
Diverse linkage
Three types of N-linked glycan
–High mannose
–Complex
–Hybrid
412 combinations ->7,000 structures¹
Graphs: Varki, A., Essentials of glycobiology. 2nd ed. 2009, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press. xxix, 784 p.

Name              Molecular formula / structure
Mannose (Man)     C6H12O6
Galactose (Gal)   C6H12O6
Fucose (Fuc)      C6H12O5
GlcNAc            C8H15NO6
NeuNAc            C11H19NO9
NeuNGc            C11H19NO10

1. Krambeck, F.J. and M.J. Betenbaugh, A mathematical model of N-linked glycosylation. Biotechnol Bioeng, (6): p.
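
The molecular formulas in the table translate directly into monoisotopic masses. The sketch below parses each formula and sums standard monoisotopic atomic masses; the helper names are hypothetical and the values are for the free, underivatized monosaccharides.

```python
import re

# Standard monoisotopic atomic masses in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Formulas as listed in the monosaccharide table.
FORMULAS = {
    "Man": "C6H12O6",
    "Gal": "C6H12O6",
    "Fuc": "C6H12O5",
    "GlcNAc": "C8H15NO6",
    "NeuNAc": "C11H19NO9",
    "NeuNGc": "C11H19NO10",
}

def parse_formula(formula):
    """Turn 'C8H15NO6' into {'C': 8, 'H': 15, 'N': 1, 'O': 6}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts

def monoisotopic_mass(formula):
    """Sum of monoisotopic atomic masses for the given formula."""
    return sum(MONOISOTOPIC[e] * n for e, n in parse_formula(formula).items())

for name, formula in FORMULAS.items():
    print(f"{name:7s} {formula:11s} {monoisotopic_mass(formula):.4f}")
```

Man and Gal share a formula and therefore a mass, which is one reason a mass alone identifies a composition rather than a single structure.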
Mass Spectrometry
Weight scale of molecules
Ion source
–Electrospray ionization (ESI)
–Matrix-assisted laser desorption/ionization (MALDI)
Mass analyzer
–Time of flight (TOF)
–Quadrupole
–Fourier transform mass spectrometry (FTMS)
Detector
–Charge induced or current produced
MALDI-TOF-TOF
Graph: MALDI-TOF Mass Analysis. (2008, November 16). Retrieved May 2, 2009, from The Protein Facility of the Iowa State University Office of Biotechnology
Problem
Isotope pattern overlap
–Permethylated, sodium adduct
  2 GlcNAc + 9 Man = 2, GlcNAc + 3 Man = 2,
High-throughput glycan screening
–Find significant differences between groups of samples
Graphs: Isotope Pattern Calculator v
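
To illustrate how two permethylated, sodiated glycans can land within about one mass unit of each other, the sketch below computes [M+Na]+ values from elemental compositions. It assumes full permethylation with a free reducing end. The second composition (7 GlcNAc + 3 Man) is a hypothetical choice made only to produce a near-coincident mass; it is not a value recovered from the slide.

```python
# Standard monoisotopic atomic masses in Da.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
NA_ION = 22.98976928 - 0.00054858  # sodium atom minus one electron

# Elemental composition of fully permethylated residues (assumption).
RESIDUE = {
    "Man": {"C": 9, "H": 16, "O": 5},               # permethylated hexose residue
    "GlcNAc": {"C": 11, "H": 19, "N": 1, "O": 5},   # permethylated HexNAc residue
}
# CH3 on the non-reducing end plus OCH3 on the reducing end.
ENDS = {"C": 2, "H": 6, "O": 1}

def formula_mass(counts):
    """Monoisotopic mass of an element-count dictionary."""
    return sum(MASS[element] * n for element, n in counts.items())

def sodiated_mz(composition):
    """Singly charged [M+Na]+ m/z for a permethylated glycan composition."""
    total = formula_mass(ENDS)
    for residue, count in composition.items():
        total += count * formula_mass(RESIDUE[residue])
    return total + NA_ION

high_mannose = sodiated_mz({"GlcNAc": 2, "Man": 9})
hypothetical = sodiated_mz({"GlcNAc": 7, "Man": 3})  # illustrative only
print(round(high_mannose, 2), round(hypothetical, 2),
      round(hypothetical - high_mannose, 2))
```

When two species differ by roughly one mass unit, the monoisotopic peak of the heavier one sits on the second isotope peak of the lighter one, so the two patterns cannot be read independently.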
Major Features
Glycan profile correlation
–Report scores for non-overlapping and overlapping profiles
–Glycan examination
Glycan profile comparison
–Report significant glycans between groups
–Glycan biomarker discovery
Glycans Profile Correlation
For each glycan combination
–412 different glycan combinations
–Generate a theoretical isotope pattern
–Calculate the correlation for the following cases
  Glycan
  Glycan + glycan, linear combination applied
  Glycan + unknown, linear combination applied
Mercury algorithm¹
1. Rockwood, A., S. Van Orden, and R. Smith, Rapid Calculation of Isotope Distributions. Analytical Chemistry, : p.
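
The first two correlation cases can be sketched as follows. This is not the Mercury algorithm: the isotope pattern here comes from plain repeated convolution at nominal-mass resolution, swapped in for brevity. The two elemental compositions, the 0.7/0.3 mixture, and all function names are hypothetical, and the glycan + unknown case is not covered.

```python
# Natural isotope abundances by nominal mass offset (+0, +1, +2); standard values.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}

def convolve(a, b, n_peaks):
    """Discrete convolution truncated to the first n_peaks entries."""
    out = [0.0] * min(n_peaks, len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out

def isotope_pattern(counts, n_peaks=8):
    """Normalized relative abundances of the first n_peaks isotope peaks."""
    pattern = [1.0]
    for element, n in counts.items():
        for _ in range(n):
            pattern = convolve(pattern, ABUNDANCE[element], n_peaks)
    total = sum(pattern)
    return [p / total for p in pattern]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_two(observed, p, q):
    """Least-squares alpha, beta such that observed ~ alpha*p + beta*q."""
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    pq = sum(a * b for a, b in zip(p, q))
    po = sum(a * o for a, o in zip(p, observed))
    qo = sum(b * o for b, o in zip(q, observed))
    det = pp * qq - pq * pq
    return (po * qq - qo * pq) / det, (qo * pp - po * pq) / det

# Hypothetical compositions; sodium has one stable isotope, so it is omitted.
glycan_a = {"C": 105, "H": 188, "N": 2, "O": 56}
glycan_b = {"C": 106, "H": 187, "N": 7, "O": 51}

a = isotope_pattern(glycan_a) + [0.0]   # first peak at offset 0
b = [0.0] + isotope_pattern(glycan_b)   # first peak one mass unit higher

observed = [0.7 * x + 0.3 * y for x, y in zip(a, b)]  # synthetic overlapped spectrum
alpha, beta = fit_two(observed, a, b)
fitted = [alpha * x + beta * y for x, y in zip(a, b)]

print("single glycan score:", pearson(a, observed))
print("linear combination score:", pearson(fitted, observed), alpha, beta)
```

On the synthetic mixture the single-glycan score falls below the combined score, which is the signal that a second species is contributing to the observed peaks.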
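Three Cases
Figure: experiment spectrum scored against glycans alone, two glycans combined with weights α and β, and a glycan plus an unknown component combined with weights α and β.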
Glycan Profiling Comparison
Multiple spectra comparison
Biomarker discovery
–Given spectra under several conditions
–Find distinct glycans between samples
Graph: Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, (2): p.
HCC: hepatocellular carcinoma (liver cancer)
CLD: chronic liver disease
Concept
Healthy spectra (H1, H2, H3 … Hk)
Disease spectra (D1, D2, D3 … Dk)
Remove the least significant component.
Repeat until all scores are above the threshold.
1. Hastie, T., et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol, (2): p. RESEARCH
% identical with a cutoff at 0.5
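
The remove-and-repeat loop can be sketched as below. The slide does not define the score, so this sketch assumes an absolute Welch t statistic per glycan; the intensities, glycan labels, and the threshold of 2.0 are all hypothetical. With a per-glycan score like this one the loop reduces to a simple filter; the iteration only matters when the score depends on the set of components still kept, as in gene shaving.

```python
from statistics import mean, variance

def welch_t(healthy, disease):
    """Absolute Welch t statistic for one glycan (needs non-zero variance)."""
    se = (variance(healthy) / len(healthy) + variance(disease) / len(disease)) ** 0.5
    return abs(mean(healthy) - mean(disease)) / se

def shave(healthy, disease, score_fn, threshold):
    """Drop the lowest-scoring glycan until every remaining score meets the threshold.

    healthy and disease map a glycan label to its intensities across spectra.
    """
    kept = set(healthy)
    while kept:
        scores = {g: score_fn(healthy[g], disease[g]) for g in kept}
        worst = min(scores, key=scores.get)
        if scores[worst] >= threshold:
            break  # all remaining glycans pass
        kept.remove(worst)
    return sorted(kept)

# Hypothetical normalized intensities for three glycans in three spectra per group.
healthy_spectra = {
    "G1": [1.0, 1.1, 0.9],
    "G2": [1.0, 1.2, 0.8],
    "G3": [0.5, 0.6, 0.4],
}
disease_spectra = {
    "G1": [2.0, 2.1, 1.9],
    "G2": [1.1, 0.9, 1.0],
    "G3": [0.55, 0.65, 0.5],
}

candidates = shave(healthy_spectra, disease_spectra, welch_t, threshold=2.0)
print(candidates)
```

Passing the score as a function keeps the loop unchanged if a different statistic is substituted.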
Multi N-Glycan
Software requirements
–.NET Framework 2.0 using C#
–C++ runtime
–R
–Thermo Scientific Xcalibur
Input
–Spectrum
  Plain text (peak list)
  mzXML¹
  RAW (instrument raw file)
–Glycan list
  CSV file
1. Pedrioli, P., et al., A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment. Nature Biotechnology, (11): p.
Software Interface
HTML result export
Biomarker discovery settings
Result
Filtered out: glycan structure not found in the CFG database
Result
Future Work
Test on more clinical samples
Verify the correlation between glycan modifications and disease
Extend these tasks to O-linked glycans
References
Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta, (1): p.
Hastie, T., et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol, (2): p. RESEARCH0003.
Krambeck, F.J. and M.J. Betenbaugh, A mathematical model of N-linked glycosylation. Biotechnol Bioeng, (6): p.
Pedrioli, P., et al., A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment. Nature Biotechnology, (11): p.
Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, (2): p.
Rockwood, A., S. Van Orden, and R. Smith, Rapid Calculation of Isotope Distributions. Analytical Chemistry, : p.
Tang, Z., et al., Identification of N-glycan serum markers associated with hepatocellular carcinoma from mass spectrometry data. J Proteome Res, (1): p.
Varki, A., Essentials of glycobiology. 2nd ed. 2009, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press. xxix, 784 p.
Acknowledgments
Advisor: Prof. Haixu Tang
Co-worker: Anoop Mayampurath
Collaborator: Yehia Mechref, Department of Chemistry
This work will be presented on May 26 at the 58th ASMS Conference in Salt Lake City, Utah, and submitted to Bioinformatics as an Application Note.
Thank You