Metabolomics Data Analysis PCB 5530 Tom Niehaus Fall 2016
Data Analysis Goals
• Huge data files
• Identify all peaks
  In practice this is very difficult if not impossible
[Figure: GC-MS chromatogram (x-axis: Time [min]; y-axis: Normalized Intensity or TIC) with a peak selector, and the mass spectrum of the selected peak (x-axis: m/z)]
Data Analysis Identifying peaks
• MS libraries (e.g. the NIST library) can identify peaks, mostly for GC/MS, especially when the spectra are combined with retention time (RT) information (GC/MS only)
Data Analysis Goals
• Huge data files
• Identify all peaks
  In practice this is very difficult if not impossible
• Quantification or semi-quantification of compounds
  Standards are needed to quantify a metabolite. For rigorous quantification, samples must be spiked with an isotopically labeled standard.
• Generally relative abundance of metabolites (e.g. fold change in KO vs WT)
• Various statistical methods to look for differences between the treatment groups, e.g. PCA, MCA, ANOVA
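The relative-abundance comparison above can be sketched in a few lines of Python. This is a minimal illustration only: the intensities are hypothetical values for a single metabolite, and the function name `log2_fold_change` is an assumption, not part of the course material.

```python
import math
from statistics import mean

def log2_fold_change(treated, control):
    """Return log2(mean(treated) / mean(control)) for one metabolite."""
    return math.log2(mean(treated) / mean(control))

# Hypothetical normalized peak intensities (illustrative values only)
wt = [100.0, 110.0, 90.0]    # wild-type replicates
ko = [400.0, 420.0, 380.0]   # knockout replicates

lfc = log2_fold_change(ko, wt)   # log2 scale, symmetric around 0
fold = 2 ** lfc                  # back on the linear "-fold" scale
```

Working on the log2 scale treats a 4-fold increase and a 4-fold decrease symmetrically (+2 and -2), which is why heat maps of fold changes are usually drawn on that scale.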
Data Analysis Example PCA analysis
"Metabolomics Approach Reveals Integrated Metabolic Network Associated with Serotonin Deficiency" (PMID: 26154191)
Figure 2: Multivariate statistical analysis results of serum metabolites in the pCPA-treated mice (n = 40) and the control mice (n = 40).
Data Analysis Identifying peaks
• Fold changes in metabolites can be visualized with a heat map.
PMID: 26154191 Figure 3: Heat map denoting fold changes (over normalized means) of the 21 biomarkers in mice injected with increasing dosages of pCPA and the control mice (n = 10).
Data Analysis Example output of a metabolomics experiment
• Open GC-TOF-MS dataset from class website:
  - How many compounds were identified? How many significant fold changes?
  - Pathway analysis at http://www.metaboanalyst.ca/MetaboAnalyst/
    - enter compound names or KEGG IDs for significant fold changes
    - choose organism ‘E. coli’ and submit
  Which pathways are affected in this dataset?
• Open HILIC-TOF-MS dataset from class website:
  - How many compounds were identified? How many significant fold changes?
  - How many unidentified peaks?
  - Can you identify an unknown peak with a significant fold change?
Data Analysis Activity 1: Identifying peaks
• Can you find sucrose in an MS dataset? Example: sucrose (C12H22O11)
Mass Spectrometry Why is resolution important?
• High resolution is needed to determine the accurate mass
• High resolution is also needed to determine accurate isotopic patterns
• Note: monoisotopic vs average mass
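The difference between monoisotopic and average mass can be checked with a short calculation. The sketch below uses sucrose (C12H22O11) as the example; the atomic masses are standard values rounded to 8 decimals, and the helper name `formula_mass` is an assumption for illustration.

```python
# Monoisotopic masses (most abundant isotope) and average atomic masses, in Da
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
AVERAGE = {"C": 12.011, "H": 1.008, "O": 15.999}

def formula_mass(counts, table):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(table[element] * n for element, n in counts.items())

sucrose = {"C": 12, "H": 22, "O": 11}
mono = formula_mass(sucrose, MONOISOTOPIC)   # about 342.1162 Da
avg = formula_mass(sucrose, AVERAGE)         # about 342.297 Da
```

A high-resolution instrument resolves the individual isotope peaks, so the value to search for in the data is the monoisotopic mass, not the average mass printed on a reagent bottle.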
Adduct formation – expect the unexpected
…around 290 different adducts
Statistics: Adducts in NIST12 MS/MS DB (80,000 spectra)
Most common adducts for LC-MS: [M+H]+, [M+Na]+, [M+NH4]+ (positive mode) and [M+acetate]− (negative mode)
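Expected m/z values for the common positive-mode adducts can be computed from the neutral monoisotopic mass. This is a rough sketch that assumes singly charged ions; the dictionary and function names are hypothetical, and the shifts are derived from standard atomic masses minus one electron mass.

```python
ELECTRON = 0.00054858  # electron mass in Da

# Mass added to the neutral molecule M for each singly charged adduct
ADDUCT_SHIFT = {
    "[M+H]+": 1.00782503 - ELECTRON,
    "[M+Na]+": 22.98976928 - ELECTRON,
    "[M+NH4]+": 14.00307401 + 4 * 1.00782503 - ELECTRON,
}

def adduct_mz(neutral_mass, adduct):
    """Return the expected m/z of a singly charged adduct of a neutral mass."""
    return neutral_mass + ADDUCT_SHIFT[adduct]

sucrose = 342.116211  # monoisotopic mass in Da
expected = {name: adduct_mz(sucrose, name) for name in ADDUCT_SHIFT}
```

Searching a dataset for all three values is more robust than searching for [M+H]+ alone, since sugars such as sucrose often ionize mainly as sodium or ammonium adducts.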
Data Analysis Activity 1: Identifying peaks
• Accurate mass can help determine the chemical formula. Example: sucrose (C12H22O11)
  - Determine the monoisotopic mass at http://www.chemspider.com/ (342.116211 Da)
  - Determine [M+H]+ from the MS adduct Excel sheet (class website) (343.123487 Da)
  Let's say you find that mass in the dataset, but is it really sucrose?
  - Download Molecular Weight Calculator at http://www.alchemistmatt.com/mwtwin.html
  - Open Formula Finder under Tools
  - Enter molecular weight target: 342.116211
  - How many isobars are there at 2 ppm? At 0.1 ppm?
  - Enter 342.116211 at ChemSpider. How many isomers?
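The ppm tolerances used in the activity translate into very narrow absolute mass windows. The sketch below shows the arithmetic; the function names and the observed m/z value are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def mass_window(theoretical, tol_ppm):
    """Return the (low, high) mass range for a given ppm tolerance."""
    delta = theoretical * tol_ppm * 1e-6
    return theoretical - delta, theoretical + delta

# 2 ppm around the sucrose monoisotopic mass is roughly +/- 0.0007 Da
low, high = mass_window(342.116211, 2.0)

# Hypothetical observed [M+H]+ peak compared with the theoretical value
err = ppm_error(343.1240, 343.123487)
```

Tightening the tolerance from 2 ppm to 0.1 ppm shrinks the window twentyfold, which is why fewer candidate formulas survive. Isomers share the same formula and exact mass, so no mass tolerance can separate them.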
Data Analysis Using stable isotopes
• Unwanted metabolites can be acetylated to target them for breakdown or disposal
  [Scheme: unwanted metabolite + acetyl-CoA → acetylated metabolite (C(O)CH3 group), targeted for secretion / breakdown]
• Labeling cells with heavy acetate (13C2, 2H3) can produce heavy-acetylated compounds
  [Scheme: metabolite + heavy-acetyl-CoA → heavy-acetate labeled metabolite (13C(O)13CD3 group), +5.0255 Da shift]
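The +5.0255 shift follows directly from the isotope mass differences: two 13C atoms replace two 12C atoms and three 2H atoms replace three 1H atoms in the acetyl group. A short check, using standard isotope masses rounded to 8 decimals:

```python
# Isotope masses in Da
C12, C13 = 12.0, 13.00335484
H1, H2 = 1.00782503, 2.01410178

# Heavy acetate (13C2, 2H3): two carbons and three hydrogens are swapped
shift = 2 * (C13 - C12) + 3 * (H2 - H1)   # about 5.0255 Da
```

Because the shift is not a whole number, it is distinctive enough to separate a true heavy-acetyl label from unrelated peaks that happen to sit about 5 Da apart.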
Data Analysis Using stable isotopes
• Experiment: find acetyl-maltose in E. coli
  acetyl-maltose: KEGG ID C02130; ChemSpider ID 388731; monoisotopic mass = 384.126770 Da
• Samples given either natural acetate or heavy acetate
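One way to look for acetylated compounds in such an experiment is to search for peaks in the heavy-acetate sample that sit 5.0255 above a peak in the natural-acetate sample. The sketch below assumes singly charged ions carrying one acetyl group; the peak lists, the 5 ppm tolerance, and the function name are all hypothetical.

```python
HEAVY_ACETATE_SHIFT = 5.0255  # Da, for one (13C2, 2H3) acetyl group

def find_label_pairs(light_mzs, heavy_mzs, shift=HEAVY_ACETATE_SHIFT, tol_ppm=5.0):
    """Return (light, heavy) m/z pairs where heavy is about light + shift."""
    pairs = []
    for light in light_mzs:
        target = light + shift
        tol = target * tol_ppm * 1e-6   # absolute tolerance in Da
        for heavy in heavy_mzs:
            if abs(heavy - target) <= tol:
                pairs.append((light, heavy))
    return pairs

# Hypothetical peak lists; 385.1340 is close to [M+H]+ of acetyl-maltose
natural_sample = [343.1235, 385.1340, 407.1160]
heavy_sample = [343.1235, 390.1596, 407.1160]

pairs = find_label_pairs(natural_sample, heavy_sample)
```

In this made-up example only the 385.1340 / 390.1596 pair matches, consistent with [M+H]+ of acetyl-maltose (384.126770 + 1.007276 = 385.134046) and its heavy-labeled form. Peaks present at the same m/z in both samples are unlabeled and are ignored.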
Homework
Use what you have learned about mass spectrometry to answer the following questions:
• You find a parent ion with m/z = 760.4861. How do you determine its identity?
• How would you search for an expected compound in an MS dataset (e.g. sucrose)?