Proposal for a Standard Representation of the Results of GC-MS Analysis: A Module for ArMet


Helen Fuell 1, Manfred Beckmann 2, John Draper 2, Oliver Fiehn 3, Nigel Hardy 1 and Birgit Linkohr 3

1. Department of Computer Science, University of Wales, Penglais, Aberystwyth, SY23 3DB
2. Institute of Biological Sciences, University of Wales, Penglais, Aberystwyth, SY23 3DB
3. Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany

The authors gratefully acknowledge the UK Food Standards Agency for the funding of this project.

1. Introduction

ArMet (Architecture for Metabolomics) aims to provide a standard representation of the data associated with metabolomics research. It is designed to encompass the entire timeline of a metabolomics experiment: from descriptions of the biological source material and the experiments themselves, through sample growth, collection and preparation for analysis by technologies such as GC-MS, NMR and FTIR, to the results of those analyses. It does this by way of nine packages that define core data applicable to a wide range of experiments.

Extensive requirements analysis has resulted in sub-packages for ArMet which describe sample growth, collection and preparation for GC-MS analyses in plant metabolomics experiments. The aim of these sub-packages is to provide sufficient data on each experiment to enable meaningful comparison, using statistical and data mining techniques, of samples analysed in different laboratories on different GC-MS platforms. They therefore maintain data on as many of the sources of variability in the experimental process as possible.

This poster describes work carried out to develop a sub-package for ArMet that describes the results of GC-MS analysis of the samples produced during a plant metabolomics experiment.

2. GC-MS Data

There are several types of metabolomics analysis:

• Fingerprinting: Fingerprint analysis involves the development of complete metabolome descriptions for samples without knowledge of the chemical identities of the compounds that the samples contain. TIC (Total Ion Current) chromatograms are examples of such descriptions, which may be used to compare the metabolomes of samples globally.
• Targeted analysis: Targeted analysis involves detection and precise quantification of a single target compound, or a small set of target compounds, within the metabolome of a sample.
• Metabolite profiling: Profiling involves the detection, identification and approximate quantification of a large set of target compounds within the metabolome of a sample.
• Metabolomics: Metabolomics involves the detection, tentative identification and approximate quantification of as many of the compounds within the metabolome of a sample as possible.

The ArMet GC-MS sub-package supports all four of these types of analysis. Fingerprinting datasets have a simple structure which represents the raw data from instrumental analysis, i.e. values from a TIC chromatogram and its associated mass spectra, or a summed mass spectrum. For targeted analysis, metabolite profiling and metabolomics, however, the output from instrumental analysis must be processed to produce the quantified list of metabolites that makes up the metabolome description.

3. Data Pre-Processing

We identified the stages of pre-processing as follows:

• Noise removal: Thresholding mass intensities and reconstructing the chromatogram accordingly.
• Peak detection: Analysis to decide the start and end of peaks in the TIC chromatogram.
• Peak deconvolution: Analysis of mass spectra to identify separate and co-eluting compounds.
• Peak quantification: Determining the area under the peaks in order to quantify the individual metabolites, either with respect to one another or to an internal standard.

4. Operating Procedure Factors that Affect Dataset Comparability

Each of the stages of pre-processing was examined and characterised, on the assumption that this will give data analysts the greatest opportunity for meaningful comparison of data collected under different sets of procedures. This characterisation led to descriptions of the stages in the following terms:

• The software and algorithms used to perform the different stages of pre-processing.
• The parameters to the software and algorithms employed.
• Any procedures for manual adjustment or correction of the automatic output of each stage.

In the absence of standardised operating procedures in this area, it was decided that the ArMet GC-MS module should annotate the datasets that it maintains with these descriptions of the stages of pre-processing.

5. Peak Labelling

Peak identification for targeted analysis or metabolite profiling involves the comparison of the peaks for a sample with external standards or a reference list of target compounds. These datasets should, therefore, be annotated with these details.

Peak identification for metabolomics involves the use of spectral libraries to label GC-MS peaks with their chemical identities. Characterisation of this process resulted in the following description:

• The software and algorithms used to perform spectral library look-up.
• The parameters to the software and algorithms used.
• The spectral library used.

Because the use of two different spectral libraries can result in zero, one or more different chemical identities being attributed to a peak, it was decided that within the ArMet GC-MS sub-package the primary label for each metabolomics peak would be the 20 most abundant m/z values, and associated relative intensities, from the representative mass spectrum for the peak. In addition, each peak may be given candidate chemical identities by a number of different methods, each with an associated confidence value.

6. Results

Following our investigations we modelled the data required to represent the results of GC-MS plant metabolomics profiling experiments. Our model has the following features:

• Support for all four types of metabolomics analysis.
• Metabolomics datasets containing peaks described by retention and area information and labelled with the 20 most abundant m/z values, and associated relative intensities, from the mass spectrum that best describes each one. Each peak also has a set of zero or more candidate chemical identities annotated with provenance and confidence.
• Targeted analysis and metabolite profiling datasets annotated with information about the external standards or target compounds that they contain.
• Fingerprinting datasets as either TIC chromatograms, with or without associated mass spectra, or summed mass spectra.
• Descriptions of the stages of pre-processing.
• A reference to the archive of the raw data.

We have represented our model using the Unified Modelling Language (UML) and translated this representation into a database implementation which is undergoing evaluation.

[Figure: data flow diagram. GC-MS machine output (TIC chromatogram, mass spectra) passes through data pre-processing to a peak list (metabolite, area) and through peak labelling (mass spectrum, chemical identity). Inputs to the ArMet GC-MS module: machine data, quality control data, internal standard data, results data, pre-processing information. Example outputs: peaks labelled with spectra, peaks labelled with chemical identities, pre-processing data.]
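The peak representation described under Peak Labelling and Results can be illustrated with a small sketch. This is not the authors' UML model or database schema, which the poster does not reproduce. All class, field and function names below are hypothetical, and scaling relative intensities so that the most abundant ion equals 100 is an assumption made only for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CandidateIdentity:
    """A candidate chemical identity for a peak (hypothetical structure)."""
    name: str          # proposed chemical identity
    provenance: str    # method or spectral library that proposed it
    confidence: float  # scale is an assumption; the poster does not define one


@dataclass
class PreProcessingStage:
    """Description of one pre-processing stage (hypothetical structure)."""
    stage: str                   # e.g. "noise removal", "peak detection"
    software: str                # software or algorithm used
    parameters: Dict[str, str]   # parameters supplied to it
    manual_adjustment: str = ""  # any manual correction of the automatic output


@dataclass
class Peak:
    """A metabolomics peak labelled primarily by its spectrum, not by a name."""
    retention_time: float
    area: float
    label: List[Tuple[float, float]]  # (m/z, relative intensity) pairs
    candidates: List[CandidateIdentity] = field(default_factory=list)


def spectrum_label(spectrum: Dict[float, float], n: int = 20) -> List[Tuple[float, float]]:
    """Return the n most abundant m/z values, most abundant first.

    Intensities are rescaled relative to the most abundant ion (= 100),
    which is an assumed convention for this sketch.
    """
    if not spectrum:
        return []
    base = max(spectrum.values())
    if base <= 0:
        return []
    top = sorted(spectrum.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(mz, 100.0 * intensity / base) for mz, intensity in top]


# Usage with made-up numbers (not data from the poster).
spectrum = {73.0: 5000.0, 147.0: 2500.0, 217.0: 1000.0}
peak = Peak(retention_time=612.4, area=1.8e6, label=spectrum_label(spectrum))
peak.candidates.append(
    CandidateIdentity("hypothetical compound", "hypothetical spectral library", 0.8)
)
```

Keeping the spectrum-derived label separate from the list of candidate identities mirrors the poster's reasoning: different spectral libraries may assign different names to a peak, whereas the spectrum itself stays comparable across laboratories.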