Published by Derrick Greer. Modified over 8 years ago.
1
Canadian Bioinformatics Workshops www.bioinformatics.ca
3
Module 3 Metabolite Identification and Annotation – Part II
4
Goal of Metabolite Annotation
[Figure: annotated NMR spectrum, chemical shift axis from 1 to 7 ppm]
5
Metabolite ID by Spectral Deconvolution (NMR) Mixture Compound A Compound B Compound C
6
Alternatives to Chenomx
– AMIX (Bruker)
– AutoFit (automated fitting)
– MetaboMiner (2D NMR)
– HMDB (NMR spectral match)
– PRIMe Spin Assign (NMR spectral matching server)
– rNMR and BMRB Peaks Server
– CCPN-MP
7
AutoFit - Automated NMR Profiling
10
Performance of AutoFit
[Figure panels: Synthetic; Real]
P. Mercier et al. J Biomol NMR. 2011 Apr;49(3-4):307-23
11
NMR Compound ID from Mixtures - MetaboMiner Raw TOCSY Spectrum ID’d Compounds http://wishart.biology.ualberta.ca/metabominer/
12
MetaboMiner Software Design
Standard reference libraries
– 225 TOCSY spectra
– 488 HSQC spectra
– Specialized sub-libraries for CSF, plasma and urine
Algorithms for automatic processing & compound identification
– “Minimal signature peaks”
– 1D 1H peak list as sanity check
– Extra dimensional information for identification
Support for direct spectral annotation
13
MetaboMiner Performance
14
NMR Compound ID - HMDB
NMR spectrum of mixture → Peak list to HMDB → High scoring matches
High scoring matches: Phenyllactate, Phenylpyruvate, Phenylacetic acid, Tropic acid, Benzyl alcohol, …
http://www.hmdb.ca
15
PRIMe Spin Assign http://prime.psc.riken.jp/?action=nmr_search
16
rNMR http://rnmr.nmrfam.wisc.edu/
17
BMRB Peaks Server http://www.bmrb.wisc.edu/metabolomics/query_metab.php
18
CCPN - MP http://www.ccpn.ac.uk/ccpn/projects/metabolomics/
19
Metabolite ID by GC-MS
[Figure: GC-MS total ion chromatogram]
20
EI Breaks up Molecules in Predictable Ways Molecular ion Recall EI MS Generates Multiple Peaks
21
GC-MS Spectrum
22
Recall GC-MS Analytes are Derivatized Methoxime
23
Metabolite ID by GC-MS
– GC-MS is often best for identification of amino acids, organic acids, sugars, fatty acids and molecules with MW < 500
– GC has higher resolution and reproducibility than LC
– EI-MS is more standardized than soft ionization methods, so EI spectra are more comparable
– Most common route is to use AMDIS + NIST database
24
NIST 11 MS Database
– 243,893 EI spectra of 212,961 compounds
– 9934 ion trap MS spectra for 4649 compounds
– 91,557 Q-TOF & QqQ spectra for 3774 compounds
– 224,038 RI values for 21,847 compounds
25
NIST MS Search Software
26
AMDIS (Automated Mass Spectral Deconvolution and Identification System)
Noise analysis
– Determines background noise level
Component perception
– Identifies peaks by comparing to noise
Spectral deconvolution
– Generates a “clean” or model spectrum
Compound identification
– Identifies compounds via a library search using a match factor
27
Match Factor (MF)
– Measures the similarity of the MS spectrum of the query to the MS spectrum in the reference database
– Defined as the normalized dot product of the query and the reference spectra
– I_ref corresponds to the intensities of the reference spectrum, I_qry corresponds to the intensities of the query spectrum, M corresponds to the masses (m/z), and w is a weighting term that penalizes uncertain peaks
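The normalized dot product described on this slide can be sketched as follows. This is a minimal illustration, assuming spectra are given as {m/z: intensity} dictionaries; the function name, the mass and intensity exponents, the 0-100 scaling and the way the weight w is applied are all assumptions made for the sketch, not the parameters that AMDIS or the NIST search software actually use.

```python
import math


def match_factor(query, reference, mass_power=1.0, intensity_power=0.5, weights=None):
    """Weighted, normalized dot product (cosine) of two spectra, scaled to 0-100.

    query, reference: dicts mapping m/z -> intensity.
    weights: optional dict m/z -> w in [0, 1] applied to query peaks; a
             hypothetical stand-in for the "uncertain peak" penalty.
    """
    weights = weights or {}
    masses = sorted(set(query) | set(reference))

    def scaled(spectrum, mz, w=1.0):
        # Each peak contributes w * M^a * I^b; absent peaks contribute 0.
        return w * (mz ** mass_power) * (spectrum.get(mz, 0.0) ** intensity_power)

    q = [scaled(query, mz, weights.get(mz, 1.0)) for mz in masses]
    r = [scaled(reference, mz) for mz in masses]
    dot = sum(a * b for a, b in zip(q, r))
    norm = math.sqrt(sum(a * a for a in q) * sum(b * b for b in r))
    return 100.0 * dot / norm if norm else 0.0


# Toy spectra (intensities are made up for illustration only).
reference = {73: 100.0, 144: 80.0, 218: 20.0}
query = {73: 95.0, 144: 85.0, 100: 5.0}
print(round(match_factor(query, reference), 1))
```

Down-weighting a suspect query peak (for example `weights={100: 0.2}`) reduces its influence on the score, which is the role the slide assigns to w.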
28
GC-MS Protocol
– Prepare a set of external n-alkane standards (8-9 n-alkanes spanning octane to hexadecane) and run as an external calibration standard
– Run a “blank sample” containing just the solvent and derivatization agents
– Run the sample of interest (under the same conditions as the blank)
29
GC-MS Protocol External n-alkane standard used for RI calculation
30
GC-MS Protocol
– Create a calibration file using the n-alkane mixture (sets retention indices [RIs] to the standard values)
– Analyze the sample data file against the CAL (calibration) file for the alkane mixture (sets and recalculates RIs using the n-alkanes)
– Search the NIST database for matches and display the results of the search
– Get rid of “false” positives by comparing the “blank” against the sample spectrum
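The RI calibration step can be illustrated with a linear retention index, which interpolates a peak's retention time between the two bracketing n-alkanes. This is a sketch under assumptions: it uses the linear (temperature-programmed) form of the retention index, the function name is hypothetical, and the alkane retention times are invented for the example. AMDIS performs its own calculation from the CAL file and may differ in detail.

```python
from bisect import bisect_right


def retention_index(rt, alkane_rts):
    """Linear retention index of a peak eluting at `rt`.

    alkane_rts: dict carbon_number -> retention time (same units as rt),
                taken from the external n-alkane run. Times must increase
                with carbon number.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated alkane range")
    # Index of the alkane eluting at or just before the peak.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    fraction = (rt - times[i]) / (times[i + 1] - times[i])
    return 100.0 * (n + (n_next - n) * fraction)


# Hypothetical calibration run: octane, nonane, decane (minutes).
alkanes = {8: 4.0, 9: 6.0, 10: 8.0}
print(retention_index(5.0, alkanes))  # halfway between C8 and C9
```

By construction each n-alkane gets RI = 100 x carbon number (octane 800, nonane 900, and so on), which is what "sets retention indices to the standard values" refers to.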
31
Step 1- Create Calibration File AMDIS
32
Step 2 – Calibrate Sample Spectrum Using CAL-file
33
Step 3 – Search NIST Database for Matches
[Figure panels: AMDIS; GC Peak List; EI-MS Spectrum For 11.597]
34
Step 3 – Search NIST Database for Matches (Zero in)
[Figure panels: Reference Spectrum; Peak Spectrum]
– MF = 84% match to valine
– 73 & 144 are the 2 most abundant m/z
– Match factor 60% (if in doubt, compare “blank” and your signal)
35
Other GC-MS Options
Alternatives to AMDIS
– AnalyzerPro (SpectralWorks)
– ChromaTOF (Leco)
– Evaluated in TrAC Trends in Analytical Chemistry, Volume 27, Issue 3, March 2008, Pages 215-227
Alternatives to NIST08 or NIST 11
– Golm Database (open access)
– FiehnLib (Leco, Agilent)
– HMDB???
36
The Golm Database
– GC-MS (Quad and TOF) database
– Contains MSRI (MS + retention index) or MST data for 1450 identified metabolites
– Includes 10,336 spectra linked to analytes
– Downloadable libraries compatible with NIST08 and AMDIS software
– Primary focus on plant metabolites
– Supports compound name and MS queries
– MS submissions via NIST08 or AMDIS format
37
Golm Database http://gmd.mpimp-golm.mpg.de/
38
Golm Database
39
The FiehnLib GC-MS Database
– 2212 EI MS and RI data for quadrupole & TOF GC-MS
– Over 1000 primary metabolites below 550 Da
– Covers lipids, amino acids, fatty acids, amines, alcohols, sugars, amino-sugars, sugar alcohols, sugar acids, phosphates, hydroxyl acids, purines, and sterols
40
Metabolite ID by LC-MS
[Figure: LC-MS total ion chromatogram]
41
Levels of Metabolite Identification in MS
4 levels of metabolite identification:
– Positively identified compounds (confirmed by match to known standard)
– Putatively identified compounds (match to MS + RT or MS/MS + RT)
– Compounds putatively identified in a compound class
– Unknown compounds
42
Metabolite ID by LC-MS
– LC-MS is often best for identification of lipids, bases, amino acids, organic acids, fatty acids and other somewhat hydrophobic molecules
– Metabolite ID typically requires both MS and MS/MS data (along with retention time information) and internal standards
– Compound ID can be done by high accuracy mass matching and/or by MS/MS matching to spectral databases
43
Simple MW Search DBs ChEBI (www.ebi.ac.uk/chebi/) PubChem (http://pubchem.ncbi.nlm.nih.gov/) ChemSpider (www.chemspider.com) HMDB (www.hmdb.ca)
44
PubChem MW Search Available Under “Advanced Search”
45
PubChem Results
46
ChEBI MW Search http://www.ebi.ac.uk/chebi/advancedSearchForward.do
47
Advanced MS Search DBs NIST/AMDIS (http://chemdata.nist.gov) Metlin (http://metlin.scripps.edu/) HMDB (www.hmdb.ca) MassBank (www.massbank.jp)
48
Advanced MS Search DBs
– These databases support not only MW or MW range searches, but also parent ion searches (positive, negative, neutral), peak list searches (from MS or MS/MS data) and MS/MS spectral matching
– These DBs are intended more for MS-based metabolomics and compound ID than the simple MW search tools
49
MS Compound ID - HMDB
LC-MS spectrum → Peak list to HMDB → High scoring matches
High scoring matches: Phenyllactate, Phenylpyruvate, Atrolactic acid, Homovanillin, Coumaric acid
http://www.hmdb.ca
50
MS Compound ID - HMDB
– Database of ~100,000 predicted masses from ~10,000 known metabolites
– Includes adduct mass calculations for 30+ possible or expected metabolite adducts
– Allows selection of different databases (DrugBank, HMDB, FooDB, T3DB), mass tolerance and ionization mode
– Designed for mixture deconvolution (i.e. identification of multiple compounds at a time)
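The kind of search described above (a peak list, a mass tolerance and an ionization mode matched against stored masses) can be sketched in a few lines. This is only an illustration under assumptions: the three-compound candidate table, the function names and the restriction to [M+H]+ and [M-H]- ions are choices made for the example and do not reflect how the HMDB server is implemented.

```python
PROTON = 1.007276  # mass of a proton in Da

# Toy candidate list: monoisotopic neutral masses (Da) of three metabolites.
CANDIDATES = {
    "D-Glucose": 180.06339,
    "L-Phenylalanine": 165.07898,
    "L-Valine": 117.07898,
}


def search_by_mz(observed_mz, mode="positive", tol_ppm=10.0, candidates=CANDIDATES):
    """Return candidate names whose [M+H]+ (positive mode) or [M-H]-
    (negative mode) m/z lies within tol_ppm of observed_mz."""
    shift = PROTON if mode == "positive" else -PROTON
    hits = []
    for name, neutral_mass in candidates.items():
        expected = neutral_mass + shift
        if abs(observed_mz - expected) / expected * 1e6 <= tol_ppm:
            hits.append(name)
    return hits


def annotate_peaks(mz_list, **kwargs):
    """Search every peak in a list, as in a mixture ("multiple compounds at a time")."""
    return {mz: search_by_mz(mz, **kwargs) for mz in mz_list}


for mz, names in annotate_peaks([118.0863, 166.0863, 181.0707]).items():
    print(mz, names)
```

Widening `tol_ppm` or adding more adduct types increases the number of hits per peak, which is why the tolerance and ionization mode are exposed as search options.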
51
MS/MS Compound ID - HMDB
– Database of 1000 experimental MS/MS spectra (low, medium and high collision energies) collected on QqQ, but largely valid for ion trap instruments as well
– Allows selection of different instruments (QqQ, ion trap, FT-MS, Q-TOF), collision energies, ionization modes, parent ion mass tolerance and fragment ion mass tolerance
– Designed for identification of a single compound at a time
52
Metlin MS Search http://metlin.scripps.edu/metabo_search_alt2.php Step 1: Enter Mass Step 2: Select Charge Step 3: Select “all” Step 4: “Find Metabolites”
53
Metlin Results
54
Metlin MS/MS Search mzXML mzML mzData http://metlin.scripps.edu/upload.php
55
Metabolite ID - Complications
– LC-ESI-MS often leads to the production of salt adducts, neutral loss species and multiply charged species
– Up to 50% of LC-MS signals arise from these “noise” sources
– Key challenge is to distinguish adducts or multiply charged species from parent ions, or to group adducts or multiply charged species with parent ions
56
Adduct Formation Effect on ESI Mass Spectrum Sample Na Adducts
57
Common Adducts in DI-MS
58
Fiehn Lab Adduct Table http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/MS-Adduct-Calculator/
59
MZedDB – Adduct Calculator http://maltese.dbs.aber.ac.uk:8888/hrmet/search/genip.php
60
MZedDB – Results for C6H12O6
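An adduct calculator of this kind reduces to a small table of (multiplicity, charge, mass shift) entries. The sketch below computes expected m/z values for glucose, C6H12O6. The adduct list is a small illustrative subset, the function name is hypothetical, and the shifts are the usual ion mass differences (electron mass already accounted for); the Fiehn Lab and MZedDB tables cover many more species.

```python
# adduct name -> (number of molecules M, absolute charge, mass shift in Da)
ADDUCTS = {
    "[M+H]+":   (1, 1, 1.007276),
    "[M+NH4]+": (1, 1, 18.033823),
    "[M+Na]+":  (1, 1, 22.989218),
    "[M+K]+":   (1, 1, 38.963158),
    "[M+2H]2+": (1, 2, 2 * 1.007276),
    "[2M+H]+":  (2, 1, 1.007276),
    "[M-H]-":   (1, 1, -1.007276),
    "[M+Cl]-":  (1, 1, 34.969402),
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of `adduct` for a compound of monoisotopic mass `neutral_mass`."""
    n_molecules, charge, shift = ADDUCTS[adduct]
    return (n_molecules * neutral_mass + shift) / charge


GLUCOSE = 180.06339  # monoisotopic mass of C6H12O6 in Da
for name in ADDUCTS:
    print(f"{name:10s} {adduct_mz(GLUCOSE, name):.5f}")
```

Running the calculation in reverse (subtracting each shift from an observed m/z) gives the candidate neutral masses to search, which is how adduct peaks can be grouped with their parent ion.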
61
Neutral Loss Fragments
62
Handling MS Complications
– MZedDB, Metlin and HMDB are able to handle or predict adducts
– Metlin and MZedDB are able to handle or predict ion pairs or multiply charged species
– Metlin can potentially handle or predict neutral loss species
– Searching by MS or MS ranges can lead to lots of hits (high FP rate)
63
Exploiting High Mass Accuracy to ID Compounds
Type: Mass Accuracy
– FT-ICR-MS: 0.1 - 1 ppm
– Orbitrap: 0.5 - 1 ppm
– Magnetic Sector: 1 - 2 ppm
– TOF-MS: 3 - 5 ppm
– Q-TOF: 3 - 5 ppm
– Triple Quad: 3 - 5 ppm
– Linear IonTrap: 50 - 200 ppm (10 ppm in Ultra-Zoom)
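Mass accuracy in ppm translates directly into the width of the mass window that has to be searched. A minimal sketch, with hypothetical function names and an example mass chosen only for illustration:

```python
def ppm_error(observed, theoretical):
    """Signed mass error of an observed m/z relative to the theoretical value, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def mass_window(mass, tol_ppm):
    """(low, high) search window around `mass` for a tolerance given in ppm."""
    delta = mass * tol_ppm * 1e-6
    return mass - delta, mass + delta


# Search window at m/z 500 for the upper accuracy limit of three instrument types.
for instrument, tol in [("FT-ICR-MS", 1.0), ("Q-TOF", 5.0), ("Linear ion trap", 200.0)]:
    low, high = mass_window(500.0, tol)
    print(f"{instrument:16s} {tol:6.1f} ppm -> {low:.4f} to {high:.4f}")
```

At m/z 500 a 1 ppm tolerance is a window of about 0.001 Da, while 200 ppm is about 0.2 Da, so the number of candidate compounds or formulas inside the window differs accordingly.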
64
Molecular Formula Generators
– Formula generators are used to create molecular formulae from very accurate masses obtained by FT-MS or Orbitrap
– Assist in compound ID by LC-MS (formula is more restrictive than MW)
– Input typically requires:
  – Accurate isotopic mass (with or without adduct)
  – Error in ppm or mDa (milliDaltons)
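What a formula generator does with those two inputs can be shown with a brute-force enumeration. The sketch below is restricted to C, H, N and O and takes a neutral monoisotopic mass plus a ppm tolerance; the function names are hypothetical, and real tools (MWTWIN, HighChem, MZedDB) support more elements, adducts and filtering.

```python
# Monoisotopic masses in Da.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_string(c, h, n, o):
    """Format element counts as a formula, omitting zero counts and counts of 1."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count:
            parts.append(symbol if count == 1 else f"{symbol}{count}")
    return "".join(parts)


def formula_candidates(target_mass, tol_ppm=5.0):
    """Enumerate CHNO formulas whose monoisotopic mass is within tol_ppm of target_mass."""
    tol = target_mass * tol_ppm * 1e-6
    hits = []
    budget = target_mass + tol
    for c in range(int(budget // MASS["C"]) + 1):
        left_c = budget - c * MASS["C"]
        for n in range(int(left_c // MASS["N"]) + 1):
            left_n = left_c - n * MASS["N"]
            for o in range(int(left_n // MASS["O"]) + 1):
                heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                # The remaining mass must be made up by hydrogens alone.
                h = round((target_mass - heavy) / MASS["H"])
                if h < 0:
                    continue
                mass = heavy + h * MASS["H"]
                if abs(mass - target_mass) <= tol:
                    hits.append((formula_string(c, h, n, o), mass))
    return hits


for formula, mass in formula_candidates(180.06339, tol_ppm=5.0):
    print(formula, round(mass, 5))
```

The enumeration makes no attempt to reject chemically implausible formulas; it only applies the mass constraint, so the candidate list grows quickly with mass and with tolerance.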
65
Molecular Formula Generators (MWTWIN) Accurate mass Mass error http://www.alchemistmatt.com/mwtwin.html
66
Molecular Formula Generators (HighChem) http://www.highchem.com/formula-generator/
67
Molecular Formula Generator Server (MZedDB) http://maltese.dbs.aber.ac.uk:8888/hrmet/search/gr.html
68
Finding Compounds By Molecular Formula - PubChem http://pubchem.ncbi.nlm.nih.gov/search/search.cgi
69
Finding Compounds By Molecular Formula - ChEBI http://www.ebi.ac.uk/chebi/advancedSearchForward.do
70
Formula Filters
Use additional MS information (isotopic abundance) as well as chemical bonding restrictions (Lewis & Senior rules), known or presumed atomic compositional data and matches to known or hypothesized structures to reduce the number of possible structures/formulas that are generated
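Two of the simplest filters of this kind are a ring-and-double-bond-equivalent (RDBE) check and element-ratio limits. The sketch below is a simplified stand-in for the Lewis and Senior checks, limited to C, H, N and O; the function names are hypothetical and the ratio thresholds are illustrative assumptions, not values taken from this presentation.

```python
def rdbe(c, h, n, o):
    """Rings plus double bond equivalents for a neutral CHNO formula.
    Oxygen is divalent and does not change the value."""
    return c - h / 2 + n / 2 + 1


def passes_basic_filters(c, h, n, o,
                         hc_range=(0.2, 3.1), max_nc=1.3, max_oc=1.2):
    """Return True if the formula survives the RDBE and element-ratio checks.

    The ratio limits are assumed example thresholds for common small molecules.
    """
    if c == 0:
        return False
    value = rdbe(c, h, n, o)
    # A neutral, even-electron molecule needs a non-negative integer RDBE.
    if value < 0 or value != int(value):
        return False
    return (hc_range[0] <= h / c <= hc_range[1]
            and n / c <= max_nc
            and o / c <= max_oc)


for counts in [(6, 12, 0, 6), (6, 13, 0, 6), (1, 2, 0, 4)]:
    print(counts, passes_basic_filters(*counts))
```

Filters like these are applied after mass-based enumeration, so they remove candidates without needing any extra measurement; isotopic abundance matching then narrows the list further.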
71
Fiehn’s 7 Golden Rules (7GR) Formula Filter
72
7GR Software http://fiehnlab.ucdavis.edu/projects/Seven_Golden_Rules/Software/
73
Molecular Formula Space of Small Molecules
74
Frequency Distribution of Molecular Formulas
75
Impact of Mass Accuracy on Formula Numbers
76
Mass + Isotope Abundance
Example: ESI-MS (+) of solanine on an LTQ
– Resolving power: 1700
– Mass accuracy: 46 ppm
– Isotopic abundance error: ±1.46%
– C45H73NO15, MW = 867.49799, [M+H]+
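The numbers in this example can be reproduced approximately from the formula alone. The sketch below computes the monoisotopic mass of C45H73NO15 and a first-order estimate of the M+1 peak relative to the monoisotopic peak. The function names are hypothetical, the M+1 estimate ignores multiply substituted species, and the natural-abundance values are standard reference figures rather than values from the slide.

```python
# Monoisotopic masses in Da.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Abundance of the +1 isotope relative to the lightest isotope
# (13C/12C, 2H/1H, 15N/14N, 17O/16O).
PLUS_ONE_RATIO = {
    "C": 1.07 / 98.93,
    "H": 0.0115 / 99.9885,
    "N": 0.364 / 99.636,
    "O": 0.038 / 99.757,
}


def monoisotopic_mass(counts):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MASS[el] * n for el, n in counts.items())


def m_plus_one_fraction(counts):
    """First-order estimate of the M+1 intensity relative to the M peak."""
    return sum(PLUS_ONE_RATIO[el] * n for el, n in counts.items())


solanine = {"C": 45, "H": 73, "N": 1, "O": 15}
print(round(monoisotopic_mass(solanine), 4))
print(f"M+1 is about {100 * m_plus_one_fraction(solanine):.1f}% of M")
```

Because the M+1 intensity is dominated by the number of carbons, a measured isotope ratio constrains the carbon count even when the mass accuracy (46 ppm here) is too poor to fix the formula on its own.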
77
Mass Isomers Are Hard To Distinguish by MS Alone Use Retention Time or Isomer Generators to Distinguish
78
Molecular Isomer Generators
Example: MOLGEN DEMO (Bayreuth)
Creates all possible structural isomers from a given molecular formula
79
Size of Molecular Isomer Space is Unknown
Accurate mass   Formula     Number of isomers   Isomers in Beilstein DB
77.99531        CH2O4       6                   0
78.04293        CH6N2O2     28                  1
78.03169        C2H6O3      10                  8
78.02180        C4H2N2      465                 2
78.01056        C5H2O       151                 2
78.04695        C6H6        217                 29
150.04293       C7H6N2O2    100,082,479         153
150.09054       C7H10N4     66,583,863          105
150.03169       C8H6O3      6,717,404           90
150.07931       C8H10N2O    76,307,072          542
150.06808       C9H10O2     6,843,602           667
150.11569       C9H14N2     9,459,132           568
150.02180       C10H2N2     65,563,828          0
150.10446       C10H14O     1,548,361           1938
150.01056       C11H2O      9,414,509           0
150.14084       C11H18      84,051              762
150.04695       C12H6       34,030,905          12
80
Some Points of Caution
– Many databases (PubChem, ChEBI, Metlin, FiehnLib, NIST) mix non-metabolites with metabolites, plant metabolites with animal and/or microbial metabolites, or drugs/buffer reagents with metabolites
– This leads to many “silly” hits
– If you know the source organism, use this information to limit the search or use organism-specific metabolome databases (HMDB, FooDB, DrugBank, KnapSack, etc.)
81
Alternatives to Mass Filtering and Mass Matching
– Use chemoselective labeling (similar to proteomics) to simplify the identification of “true” metabolites, reduce the number of signals and eliminate false positives
– Use MS-based kits (Biocrates)
– Use concepts in Computer-Aided Structure Elucidation (CASE) to assist in compound ID
82
Quantitative MS Metabolomics With Chemoselective Labeling
[Figure panels: Individual Analysis; Pooled Analysis; Mix; LC-MS Analysis]
83
Quantitative MS Metabolomics With Chemoselective Labeling
84
Quantitative MS Metabolomics in Human Urine
– 672 peaks by amino labeling
– 120 standards spiked
– 92 peaks identified/quantified (30 nM - 2.51 mM)
– 820 peaks by carboxy labeling (still assessing)
Guo K. & Li L. Anal Chem. 2009 May 15;81(10):3919-32.
85
Advantages of Derivatization
– Tags can convert non-UV active compounds into UV or fluorescently detectable compounds
– Tags improve ionization efficiency and lower the limit of detection
– Tags permit affinity purification and concentration
– Tags make polar molecules hydrophobic, leading to better LC separations
– Tags permit isotope-based quantification
– Tags greatly increase the number of compounds detected
– Tags allow independent confirmation of “real” peaks
– Best route to automated ID & quantification by LC-MS
86
BioCrates IDQ Kit 40 acylcarnitines, 13 amino acids, 15 LysoPCs, 77 PCs, 15 SMs = 160
87
Multiple Reaction Monitoring
[Figure labels: Q1; Q3; CH3; CD3]
88
Sample Urine Metabolite List
Concentration range from 10 nM to 7.2 mM (roughly a 1,000,000-fold concentration range)
Arginine 38.7 uM; Tyrosine 204.0 uM; C14:2 Carn 0.03 uM; C4:1 Carn 0.235 uM; C8 Carnitine 1.05 uM; PC(36:5) aa 0.011 uM; LysoPC-20:4 0.039 uM; SM(22:3) 0.016 uM
Glutamine 531.0 uM; Valine 37.0 uM; C14:2-OH 0.02 uM; C5 Carnit 4.39 uM; C9 Carnitine 1.37 uM; PC(38:5) aa 0.016 uM; LysoPC-6:0 0.073 uM; SM(24:0) 0.342 uM
Glycine 922.0 uM; Leu/Ile 128.0 uM; C16 Carn 0.021 uM; C6-OH Carn 0.703 uM; PC(28:1) aa 0.059 uM; PC(42:4) aa 0.010 uM; SM(OH)16:1 0.020 uM; SM(24:1) 0.206 uM
Histidine 1146.0 uM; Carnitine 73.2 uM; C16-OH Cr 0.035 uM; C5-M-DC 0.531 uM; PC(30:2) aa 0.009 uM; PC(38:3) ae 0.021 uM; SM(OH)22:1 0.065 uM; SM(26:0) 0.020 uM
Methionine 15.6 uM; C10 Carn 0.324 uM; C16:1-OH 0.035 uM; C5-OH Carn 1.46 uM; PC(34:1) aa 0.094 uM; PC(38:4) ae 0.025 uM; SM(OH)22:2 0.060 uM; SM(26:1) 0.014 uM
Phenylalanine 52.7 uM; C10:1 Carn 1.83 uM; C2 Carnitine 45.2 uM; C5:1 Carn 1.84 uM; PC(34:2) aa 0.087 uM; PC(38:5) ae 0.092 uM; SM(OH)24:2 0.015 uM; Glucose 2264 uM
Proline 42.9 uM; C10:2 Carn 0.796 uM; C3 Carnitine 2.12 uM; C5:1-OH 0.367 uM; PC(34:4) aa 0.009 uM; PC(38:6) ae 0.068 uM; SM(16:0) 0.352 uM; Creatinine 7222 uM
Serine 408.0 uM; C12 Carn 0.203 uM; C3-OH Carn 0.163 uM; C6 Carnitine 0.814 uM; PC(36:1) aa 0.053 uM; PC(40:5) ae 0.014 uM; SM(16:1) 0.001 uM
Threonine 220.0 uM; C14 Carn 0.063 uM; C4 Carnint 11.0 uM; C6:1 Carnt 0.294 uM; PC(36:3) aa 0.054 uM; PC(42:3) ae 0.012 uM; SM(18:1) 0.023 uM
Tryptophan 15.0 uM; C14:1-OH 0.016 uM; C4-OH Carn 0.405 uM; C8-OH Carn 0.509 uM; PC(36:4) aa 0.051 uM; PC(44:3) ae 0.014 uM; SM(20:2) 0.020 uM
89
CASE – Computer-Aided Structure Elucidation
Two approaches: Bottom-Up and Top-Down
– Top-Down uses known metabolites and generates variants (via metabolic transformation or other bio-informed methods). Properties/spectra/MW are predicted and then compared to the observed spectra/properties of the unknown
– Bottom-Up uses known fragments of molecules, assembles the fragments into a logical structure, predicts the properties/spectra and compares them to the observed spectra/properties of the unknown
90
Top-Down CASE Methods
Known metabolites (20,000) → Predicted biotransformations (20,000 --> 200,000) → Predicted MS, MS/MS, NMR, GC-MS spectra → Match observed spectra to predicted spectra to ID
91
Bottom-Up (Traditional) CASE
Known metabolite substructures or metabolite EI or CID fragments + Predicted (or known) MS, MS/MS, NMR, GC-MS fragment spectra → Neural Network or GA driven fragment assembly → Match observed spectra to predicted spectra to ID