Automatic annotation of N-glycan species in MALDI-TOF-TOF spectra for rapid profiling and comparing. Chuan-Yih Yu, 2010.05.14. Capstone Advisor: Prof. Haixu Tang.

Presentation transcript:

Automatic annotation of N-glycan species in MALDI-TOF-TOF spectra for rapid profiling and comparing
Chuan-Yih Yu
Capstone Advisor: Prof. Haixu Tang
Indiana University Bloomington, School of Informatics and Computing

Outline
Introduction
  – Glycoprotein, monosaccharides, N-linked glycosylation, and mass spectrometry
Problem sets
Goals
MultiNGlycan
Result
Future works

Introduction
Post-translational modification (PTM)
  – An enzyme-catalyzed change to a protein after it is synthesized
  – Acetylation, cleavage, glycosylation, methylation, phosphorylation, and prenylation
50% of all eukaryotic proteins are glycosylated [1] [Apweiler, et al.]
1. Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta, (1): p

Glycosylation
N-linked glycosylation
  – Core structure: 2 GlcNAc + 3 Man
  – Attached at Asn-X-Ser or Asn-X-Thr, where X can be any amino acid except Pro (the glycosylation sequon)
  – Glycosylation occurs before folding
O-linked glycosylation
  – Many different core structures
  – Attached at serine or threonine
  – Glycosylation occurs after folding
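The sequon rule above (Asn, then any residue except Pro, then Ser or Thr) maps naturally onto a regular expression. The following is a minimal sketch, not part of the original slides; the function name `find_sequons` and the example sequence are made up for illustration, and the input is assumed to be an uppercase one-letter amino acid string.

```python
import re

# N, then any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position of Asn, matched tripeptide) for each sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


# Hypothetical sequence: "NPT" is skipped because X is Pro.
hits = find_sequons("MKNGSANPTLNNTSW")
print(hits)
```

A match only marks a candidate site; whether the site is actually glycosylated has to be established experimentally.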

N-linked glycosylation
Tree structure
Monosaccharides: building blocks of the polysaccharide chain
Diverse linkage: at most four branches
Three types of N-linked glycan tree
  – High mannose
  – Complex
  – Hybrid

Name              Molecular formula
Mannose (Man)     C6H12O6
Galactose (Gal)   C6H12O6
Fucose (Fuc)      C6H12O5
GlcNAc            C8H15NO6
NeuNAc            C11H19NO9
NeuNGc            C11H19NO10

Graphs: Varki, A., Essentials of glycobiology. 2nd ed. 2009, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press. xxix, 784 p
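The monosaccharide formulas listed above are enough to estimate the monoisotopic mass of a glycan composition: sum the monosaccharide masses and subtract one water per glycosidic bond. The sketch below is an illustration under stated assumptions, not the MultiNGlycan implementation. It assumes a neutral, underivatized glycan (no permethylation, no adduct ion), and the names `glycan_mass` and `formula_mass` are hypothetical.

```python
# Monoisotopic atomic masses (Da) of the lightest isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental formulas taken from the table on the slide.
MONOSACCHARIDES = {
    "Man": {"C": 6, "H": 12, "O": 6},
    "Gal": {"C": 6, "H": 12, "O": 6},
    "Fuc": {"C": 6, "H": 12, "O": 5},
    "GlcNAc": {"C": 8, "H": 15, "N": 1, "O": 6},
    "NeuNAc": {"C": 11, "H": 19, "N": 1, "O": 9},
    "NeuNGc": {"C": 11, "H": 19, "N": 1, "O": 10},
}
WATER = {"H": 2, "O": 1}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in formula.items())


def glycan_mass(composition):
    """Mass of a free glycan given as {monosaccharide name: count}.

    Joining n monosaccharides forms n - 1 glycosidic bonds, each losing one water.
    """
    n_units = sum(composition.values())
    if n_units == 0:
        return 0.0
    total = sum(formula_mass(MONOSACCHARIDES[name]) * count
                for name, count in composition.items())
    return total - (n_units - 1) * formula_mass(WATER)


# The N-glycan core: 2 GlcNAc + 3 Man.
core_mass = glycan_mass({"GlcNAc": 2, "Man": 3})
print(round(core_mass, 4))
```

Masses observed in a real MALDI spectrum would additionally be shifted by any derivatization and by the adduct ion, neither of which is modeled here.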

Mass Spectrometry
A weighing scale for molecules
Ion source
  – Electrospray ionization (ESI)
  – Matrix-assisted laser desorption/ionization (MALDI)
Mass analyzer
  – Time of flight (TOF)
  – Quadrupole
  – Fourier transform mass spectrometry (FTMS)
Detector
  – Measures the charge induced or the current produced

MALDI-TOF-TOF
Matrix-assisted laser desorption/ionization
Time of flight (TOF)
Graph: MALDI-TOF Mass Analysis. (2008, 11 16). Retrieved May 2, 2009, from The Protein Facility of the Iowa State University Office of Biotechnology

Problem Sets
Glycopeptide isotope pattern overlap
Graphs (Isotope Pattern Calculator): isotope patterns, mass vs. relative abundance (%), for GlcNAc + 9 Man and GlcNAc + 3 Man

Problem Sets
High-throughput glycan profiling

Goals
Glycan profile correlation
  – Report scores for non-overlapping and overlapping profiles
  – Glycan examination
Glycan profile comparison
  – Report significant glycans between groups
  – Glycan biomarker discovery

Glycans Profile Correlation
For each glycan combination
  – 412 different glycan combinations [Krambeck, et al.] [1]
  – Generate a theoretical isotope pattern
  – Calculate the correlation for the following cases
    1. Glycans
    2. Glycans + Glycans, linear combination applied
    3. Glycans + Unknown, linear combination applied
Mercury algorithm [2]
  – Generate the unknown isotope pattern
1. Krambeck, F.J. and M.J. Betenbaugh, A mathematical model of N-linked glycosylation. Biotechnol Bioeng, (6): p
2. Rockwood, A., S. Van Orden, and R. Smith, Rapid Calculation of Isotope Distributions. Analytical Chemistry, : p
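The first case (a single glycan) can be sketched as follows: build a theoretical isotope pattern from the elemental formula, then correlate it with the observed peak intensities. This is an illustration only. It uses plain repeated convolution of per-element isotope abundances, not the Mercury algorithm cited on the slide, and it assumes that the score is a Pearson correlation, which the slides do not state explicitly. The observed intensities are made up.

```python
# Natural isotope abundances, indexed by extra neutrons (0, +1, +2).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b, n_peaks):
    """Convolve two abundance lists, keeping only the first n_peaks entries."""
    out = [0.0] * n_peaks
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n_peaks:
                out[i + j] += x * y
    return out


def isotope_pattern(formula, n_peaks=6):
    """Relative abundances of the M, M+1, ... peaks for {element: count}."""
    pattern = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in formula.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPES[element], n_peaks)
    total = sum(pattern)
    return [p / total for p in pattern]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


# 2 GlcNAc + 3 Man as a free glycan: C34 H58 N2 O26.
CORE = {"C": 34, "H": 58, "N": 2, "O": 26}
theoretical = isotope_pattern(CORE)
observed = [0.64, 0.26, 0.08, 0.015, 0.004, 0.001]  # made-up peak intensities
score = pearson(theoretical, observed)
```

Repeated convolution is quadratic in the number of atoms, which is acceptable for glycans of this size; Mercury-style Fourier methods matter more for larger molecules.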

Three Cases
Figure: the experimental spectrum is scored against theoretical isotope patterns of glycans and an unknown component, combined with weights α and β.
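For the two overlap cases, the weights α and β of the linear combination have to be estimated from the data. One plausible way, shown here as a sketch, is ordinary least squares on the two theoretical patterns. The slides do not say how MultiNGlycan fits α and β, so the method, the function name `fit_two_components`, and the example patterns are all assumptions.

```python
def fit_two_components(observed, a, b):
    """Least-squares alpha, beta such that observed ~ alpha * a + beta * b.

    Solves the 2x2 normal equations directly; a and b are theoretical
    isotope patterns aligned to the same m/z positions as observed.
    """
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    sao = sum(x * o for x, o in zip(a, observed))
    sbo = sum(y * o for y, o in zip(b, observed))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("patterns are collinear; alpha and beta are not identifiable")
    alpha = (sao * sbb - sbo * sab) / det
    beta = (sbo * saa - sao * sab) / det
    return alpha, beta


# Made-up patterns: b starts two mass units above a, so their peaks overlap.
a = [0.64, 0.25, 0.08, 0.02, 0.01, 0.00]
b = [0.00, 0.00, 0.60, 0.28, 0.09, 0.03]
observed = [0.7 * x + 0.3 * y for x, y in zip(a, b)]

alpha, beta = fit_two_components(observed, a, b)
fitted = [alpha * x + beta * y for x, y in zip(a, b)]
```

The fitted mixture can then be correlated with the experimental spectrum to produce the score. Plain least squares does not constrain α and β to be non-negative, so a practical implementation would probably need that check.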

Glycan Profile Comparison
Multiple spectra comparison
Biomarker discovery
  – Given spectra from several conditions
  – Find distinct glycans between samples
HCC: hepatocellular carcinoma (liver cancer)
CLD: chronic liver disease
Graph: Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, (2): p

Concept
Healthy spectra (H1, H2, H3 … Hk)
Disease spectra (D1, D2, D3 … Dk)
Remove the least significant component.
Repeat until all scores are above the threshold.
1. Hastie, T., et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol, (2): p. RESEARCH0003.
% identical with a cutoff at
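The remove-and-repeat loop can be sketched as below. The slides do not define the significance score, so this example substitutes the absolute Welch t statistic between the two groups purely for illustration; gene shaving as described by Hastie et al. uses principal components instead. Because this stand-in score does not depend on which other glycans remain, the loop reduces to a threshold filter here, but it keeps the iterative structure described on the slide. All names and data are hypothetical.

```python
from statistics import mean, stdev


def welch_t(a, b):
    """Welch t statistic for two samples (each needs at least two values)."""
    se2 = stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    diff = mean(a) - mean(b)
    if se2 == 0:
        return float("inf") if diff != 0 else 0.0
    return diff / se2 ** 0.5


def shave(healthy, disease, threshold):
    """Drop the least significant glycan until every score reaches threshold.

    healthy and disease map glycan name -> list of intensities, one per spectrum.
    """
    kept = set(healthy)
    while kept:
        scores = {g: abs(welch_t(healthy[g], disease[g])) for g in kept}
        worst = min(scores, key=scores.get)
        if scores[worst] >= threshold:
            break
        kept.remove(worst)
    return sorted(kept)


# Made-up intensities: G1 differs between the groups, G2 does not.
healthy = {"G1": [1.0, 1.1, 0.9, 1.0], "G2": [1.0, 1.2, 0.8, 1.1]}
disease = {"G1": [2.0, 2.1, 1.9, 2.2], "G2": [1.1, 0.9, 1.0, 1.2]}
survivors = shave(healthy, disease, threshold=2.0)
```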

Multi N-Glycan Software
Requirements
  – .NET Framework 2.0 (C#)
  – C++ runtime
  – R
  – Thermo Scientific Xcalibur
Input
  – Spectrum: plain text (peak list), mzXML [1], RAW (Thermo Scientific raw file)
  – Glycan list: CSV file (user-defined)
Output
  – List of glycans with scores
1. Pedrioli, P., et al., A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment. Nature Biotechnology, (11): p

Software Interface

Software features
Signal preprocessing
  – Background subtraction
  – Peak smoothing
  – Tolerance for mass spectrometer accuracy
Flexible parameters to match the actual experiment
Useful tools provided
  – Isotope pattern generator
Content-rich output in multiple formats
  – CSV, text, HTML
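The three preprocessing steps can be illustrated with very simple stand-ins: a moving average for smoothing, a sliding-window minimum as the background estimate, and a parts-per-million window for mass tolerance. The slides do not say which algorithms MultiNGlycan actually uses, so the methods, function names, and default parameters below are assumptions chosen for brevity.

```python
def smooth(intensities, window=3):
    """Moving-average smoothing; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(intensities)):
        lo, hi = max(0, i - half), min(len(intensities), i + half + 1)
        out.append(sum(intensities[lo:hi]) / (hi - lo))
    return out


def subtract_background(intensities, window=5):
    """Subtract a local baseline estimated as the minimum in a sliding window."""
    half = window // 2
    out = []
    for i, y in enumerate(intensities):
        lo, hi = max(0, i - half), min(len(intensities), i + half + 1)
        out.append(y - min(intensities[lo:hi]))
    return out


def within_tolerance(observed_mz, theoretical_mz, ppm=50.0):
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= theoretical_mz * ppm / 1e6


# Made-up intensities with a peak sitting on a flat background of 5.
raw = [5, 6, 9, 6, 5]
cleaned = subtract_background(raw)
matched = within_tolerance(910.35, 910.3278, ppm=50.0)
```

Exposing the window sizes and the ppm value as parameters corresponds to the slide's point about flexible parameters that can be matched to the actual experiment.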

Software screenshot: HTML result export

Software screenshot

Result
Data set
  – Liver cancer: 73 individuals
  – Healthy: 78 individuals
412 glycan structures are tested
Glycan criteria
  – Correlation score cutoff < 0.5
  – Present in 30% of total spectra
Zhiqun T., et al., Identification of N-Glycan Serum Markers Associated with Hepatocellular Carcinoma from Mass Spectrometry Data. J Proteome Res, 2009
Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, (2): p
Anoop M., Chuan-Yih Y., A Multi-PCA Approach to Glycan Biomarker Discovery using Mass Spectrometry Profile Data. I690 project, 2009 Fall
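The two glycan criteria can be read as a filter over per-spectrum scores. The sketch below assumes one interpretation that the slide does not spell out: a glycan counts as present in a spectrum when its correlation score is at least 0.5, and it is kept when it is present in at least 30% of all spectra. The function name and the data are hypothetical.

```python
def select_glycans(scores_by_glycan, score_cutoff=0.5, min_fraction=0.3):
    """Keep glycans whose score reaches score_cutoff in enough spectra.

    scores_by_glycan maps glycan name -> list of correlation scores,
    one score per spectrum.
    """
    selected = []
    for glycan, scores in scores_by_glycan.items():
        if not scores:
            continue
        present = sum(1 for s in scores if s >= score_cutoff)
        if present / len(scores) >= min_fraction:
            selected.append(glycan)
    return selected


# Made-up scores over four spectra.
example = {
    "G1": [0.9, 0.8, 0.2, 0.1],  # present in 2 of 4 spectra: kept
    "G2": [0.6, 0.1, 0.2, 0.3],  # present in 1 of 4 spectra: dropped
    "G3": [0.4, 0.3, 0.2, 0.1],  # never reaches the cutoff: dropped
}
kept = select_glycans(example)
```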

Result
Figure labels: Filtered out; Can't find the glycan structure in CFG database; Correlation score; Overlap with 2192

Result

Future Works
Test on more clinical samples
Verify the correlation between the glycan modifications reported by MultiNGlycan and hepatocellular carcinoma
Perform these tasks on O-linked glycans
Apply de novo glycan sequencing to the reported glycans (ongoing)

References
Anoop M., Chuan-Yih Y., A Multi-PCA Approach to Glycan Biomarker Discovery using Mass Spectrometry Profile Data. I690 project, 2009 Fall
Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta, (1): p
Dalit Shental-Bechor and Yaakov Levy, Effect of glycosylation on protein folding: A close look at thermodynamic stabilization, PNAS June 11, 2008
Hastie, T., et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol, (2): p. RESEARCH0003.
Krambeck, F.J. and M.J. Betenbaugh, A mathematical model of N-linked glycosylation. Biotechnol Bioeng, (6): p
Pedrioli, P., et al., A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment. Nature Biotechnology, (11): p
Ressom, H.W., et al., Analysis of MALDI-TOF mass spectrometry data for discovery of peptide and glycan biomarkers of hepatocellular carcinoma. J Proteome Res, (2): p
Rockwood, A., S. Van Orden, and R. Smith, Rapid Calculation of Isotope Distributions. Analytical Chemistry, : p
Zhiqun, T., et al., Identification of N-glycan serum markers associated with hepatocellular carcinoma from mass spectrometry data. J Proteome Res, (1): p
Varki, A., Essentials of glycobiology. 2nd ed. 2009, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press. xxix, 784 p.

Acknowledgements
Advisor: Prof. Haixu Tang
Co-worker: Anoop Mayampurath
Collaborator: Yehia Mechref, Department of Chemistry
COL Lab members
This work will be presented on May 26 at the 58th ASMS Conference, Salt Lake City, Utah, and submitted to Bioinformatics Application Notes.

Thank You