Douglas Walker 1, Karan Uppal 2, Dean Jones 2, Tianwei Yu 3,*

Presentation transcript:

Hierarchical Processing for LC/MS Metabolomics Data Generated in Multiple Batches
Douglas Walker 1, Karan Uppal 2, Dean Jones 2, Tianwei Yu 3,*
1 Department of Environmental Health, 2 Department of Medicine, 3 Department of Biostatistics and Bioinformatics, Emory University
12/8/2018

Introduction
Liquid chromatography separates the sample along the retention time axis. Take “slices” in retention time and send them to MS, which records the mass-to-charge ratio (m/z).
[Figure: LC/MS data with retention time on one axis and mass-to-charge ratio (m/z) on the other.]

Introduction
An ideal peak should show a Gaussian curve in intensity along the retention time axis while staying constant along the m/z axis.
Issues in LC/MS data processing:
- Background noise
- Chemical noise (ridges in spectra, ion suppression, …)
- Peak intensity (along the retention time axis) deviates substantially from a Gaussian curve (fronting, breaks, …)
- Slight m/z shift across MS spectra (not severe with newer machines)
- Retention time shift across LC/MS spectra
- Multiple charge states of a single metabolite
- Isotopes
- ……

Introduction
The analysis of LC/MS metabolomics data involves a number of steps. Example: the XCMS workflow.
Software packages: XCMS, MZmine, apLCMS, MetAlign, … …
These steps assume that the peaks come from a similar distribution in terms of their variation in m/z and retention time.

Introduction
Batches can differ from each other in data characteristics:
- Retention time shift
- Levels of variation in m/z and retention time
- Different features may be missing at different rates
- ……
One set of parameters may not suit all batches. It can make tolerance levels unnecessarily large and cause false positives and false negatives in alignment.
An example (Batch 1 and Batch 2 along the RT axis): two tight clusters might make the program think there are two features.
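A minimal sketch of the effect described on this slide, using made-up retention times for one feature in two hypothetical batches. It assumes the alignment tolerance is driven by the spread of retention times; the names and numbers are illustrative and are not taken from XCMS, apLCMS, or the slides.

```python
from statistics import mean, pstdev

# Hypothetical retention times (seconds) of the SAME feature in two batches.
# Batch 2 is shifted by about 6 s relative to batch 1 (assumed values).
batch1_rt = [120.0, 120.2, 119.8, 120.1, 119.9]
batch2_rt = [126.0, 126.1, 125.9, 126.2, 125.8]


def rt_spread(values):
    """Population standard deviation of a list of retention times."""
    return pstdev(values)


# Spread within each batch: both clusters are tight.
within_spreads = [rt_spread(batch1_rt), rt_spread(batch2_rt)]

# Spread when the batches are pooled: dominated by the between-batch shift.
pooled_spread = rt_spread(batch1_rt + batch2_rt)

# Size of the between-batch shift itself.
batch_shift = mean(batch2_rt) - mean(batch1_rt)

print("within-batch spreads:", within_spreads)
print("pooled spread:", pooled_spread)
print("between-batch shift:", batch_shift)
```

With these assumed numbers, the pooled spread is far larger than either within-batch spread. A single tolerance would therefore have to be either wide enough to cover the shift (too loose within a batch) or tight enough for one batch (splitting the feature in two).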

Introduction
Common practice: profiles from all batches (Batch 1, Batch 2, Batch 3) are processed together in a single pass: peak detection, time adjustment, peak alignment, signal recovery.

Methods
Each batch (Batch 1, Batch 2, Batch 3) is processed separately: peak detection, time adjustment, peak alignment, signal recovery.

Methods
[Diagram labels] Between batch: treat each batch as a sample, then time adjustment and peak alignment. Within-batch correction + between-batch correction = overall correction. Remaining labels: time adjustment, peak alignment, signal recovery.

Results
The data: standard QC sample
- Same sample measured repeatedly
- 33 batches, each containing 3 profiles from the same sample
Parameter settings (all other parameters the same):
- Batch-wise procedure, within-batch detection proportion threshold: 0.4, 0.5, 0.6, 0.7, 0.8
- Batch-wise procedure, between-batch detection proportion threshold: 0.1, 0.2, …, 0.8
- Traditional procedure, detection threshold (number of profiles): 5, 10, 15, …, 95
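The slides do not spell out how the two detection proportion thresholds are applied. The sketch below shows one plausible reading, stated here as an assumption: a feature counts as detected in a batch when it is non-zero in at least the within-batch proportion of that batch's profiles, and it is kept when it is detected in at least the between-batch proportion of batches. The function names and the example intensities are hypothetical.

```python
def detected_in_batch(intensities, within_threshold):
    """True if the feature is non-zero in at least `within_threshold`
    (a proportion between 0 and 1) of the profiles in one batch."""
    n_detected = sum(1 for x in intensities if x > 0)
    return n_detected / len(intensities) >= within_threshold


def keep_feature(batches, within_threshold, between_threshold):
    """True if the feature passes the within-batch rule in at least
    `between_threshold` (a proportion) of the batches."""
    n_batches_ok = sum(
        1 for b in batches if detected_in_batch(b, within_threshold)
    )
    return n_batches_ok / len(batches) >= between_threshold


# Hypothetical intensities of one feature: 4 batches of 3 profiles each.
feature = [
    [5.1, 4.9, 5.0],  # detected in 3 of 3 profiles
    [0.0, 5.2, 4.8],  # detected in 2 of 3 profiles
    [0.0, 0.0, 5.0],  # detected in 1 of 3 profiles
    [0.0, 0.0, 0.0],  # not detected
]

# Looser within-batch rule: batches 1 and 2 pass, so 2 of 4 batches.
kept_loose = keep_feature(feature, within_threshold=0.6, between_threshold=0.5)

# Stricter within-batch rule: only batch 1 passes, so 1 of 4 batches.
kept_strict = keep_feature(feature, within_threshold=0.8, between_threshold=0.5)

print(kept_loose, kept_strict)
```

Under this reading, the traditional procedure corresponds to a single count over all profiles, with no notion of which batch a detection came from.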

Results
Evaluation of the results:
- Total number of zeros in the final data matrix
- Proportion of features with m/z matched to known metabolites (xMSAnnotator)
- Coefficient of variation (CV) in the final data matrix (without considering batches)
- Coefficient of variation (CV) after merging each triplet (batch)
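A small sketch of how the zero count and the two CV measures could be computed on a feature-by-profile matrix. It assumes CV is the sample standard deviation divided by the mean and that triplets are merged by averaging; the slides do not state either choice, and the data below are invented.

```python
from statistics import mean, stdev


def count_zeros(matrix):
    """Total number of zero entries in a feature-by-profile matrix."""
    return sum(1 for row in matrix for x in row if x == 0)


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    m = mean(values)
    return stdev(values) / m if m != 0 else float("nan")


def merge_replicates(row, size):
    """Average consecutive groups of `size` replicate profiles."""
    return [mean(row[i:i + size]) for i in range(0, len(row), size)]


# Hypothetical matrix: rows are features, columns are profiles
# (2 batches of 3 replicate profiles each).
data = [
    [10.0, 12.0, 11.0, 9.0, 11.0, 10.0],
    [0.0, 4.0, 5.0, 6.0, 0.0, 6.0],
]

n_zeros = count_zeros(data)

# CV across all profiles, ignoring batch membership.
cv_raw = [cv(row) for row in data]

# CV after merging each triplet (batch) into a single value.
cv_merged = [cv(merge_replicates(row, 3)) for row in data]

print(n_zeros, cv_raw, cv_merged)
```

In this toy example, merging triplets lowers the CV of both features because averaging smooths replicate-level noise and dilutes the zeros.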

Results

Results
The data: Emory/Georgia Tech Center for Health Discovery and Well Being (CHDWB) data
- Part of the data was used for testing: 115 subjects
- 6 batches, each containing 60 profiles
- Each subject measured in triplicate; a few were measured 6 times
Parameter settings (all other parameters the same):
- Batch-wise procedure, within-batch detection proportion threshold: 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
- Batch-wise procedure, between-batch detection proportion threshold: 0.25
- Traditional procedure, detection threshold (number of profiles): 30, 60, 90, 120, 180

Results
Evaluation of the results:
Before merging triplets for each subject:
- Total number of zeros in the final data matrix
- Proportion of features with m/z matched to known metabolites (xMSAnnotator)
- Average coefficient of variation across batches
After merging triplets for each subject:
- Coefficient of variation (CV) in the final data matrix (without considering batches)
- Coefficient of variation (CV) in the final data matrix using only non-zero values (without considering batches)
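To illustrate why the CV is reported both with and without zeros, here is a short sketch on invented merged values for one feature. It assumes CV is the sample standard deviation over the mean and that a zero means the feature was not detected for that subject; `cv_nonzero` is a hypothetical helper name.

```python
from statistics import mean, stdev


def cv_nonzero(values):
    """CV computed from non-zero values only; NaN if fewer than two remain."""
    nonzero = [x for x in values if x != 0]
    if len(nonzero) < 2:
        return float("nan")
    return stdev(nonzero) / mean(nonzero)


# Hypothetical merged intensities of one feature across four subjects.
# The zero stands for a subject in which the feature was not detected.
merged = [3.0, 0.0, 5.0, 4.0]

cv_all = stdev(merged) / mean(merged)  # zeros included
cv_nz = cv_nonzero(merged)             # zeros excluded

print(cv_all, cv_nz)
```

The gap between the two values separates variation caused by missed detections from variation in the quantified intensities themselves.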

Results Red: Batchwise processing; Blue: traditional processing

Conclusions
- The stepwise processing procedure that considers batches improves the consistency of feature detection and quantification.
- Better performance appears to be achieved when parameters require higher within-batch consistency and lower between-batch consistency.
- Further studies are necessary to validate and optimize the procedure: larger datasets, and examination at the single-feature level.
- The procedure needs better integration with the semi-supervised peak detection procedure.

Acknowledgements
People who work on metabolomics data in Dr. Jones lab: Dr. Shuzhao Li, Dr. ViLinh Tran, Dr. Youngja Park, Mr. Chunyu Ma
Biomedical collaborators who use metabolomics data: Dr. Jessica Alvarez, Dr. Jeremy Sarnat, Dr. Chandresh Ladva, Dr. Donghai Liang
Biostatisticians: Dr. Jian Kang, Dr. Qingpo Cai, Dr. Nancy Jin, Dr. Eugene Huang, Dr. Elizabeth Chong, Dr. John Hanfelt