Published by Herman Dharmawijaya; modified over 6 years ago
1
Hierarchical Processing for LC/MS Metabolomics Data Generated in Multiple Batches
Douglas Walker¹, Karan Uppal², Dean Jones², Tianwei Yu³,*
¹ Department of Environmental Health; ² Department of Medicine; ³ Department of Biostatistics and Bioinformatics; Emory University
12/8/2018
2
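Introduction
Liquid chromatography separates compounds along retention time. Take “slices” in retention time and send each slice to the MS, which records intensity against mass-to-charge ratio (m/z).
[Figure: LC/MS data as a map of retention time vs. mass-to-charge ratio (m/z)]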
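Under the slicing view above, one LC/MS profile can be pictured as a list of scans, each pairing a retention time with the spectrum recorded for that slice. The following is a minimal sketch, not any particular package's data model. It assumes that hypothetical `(rt, [(mz, intensity), ...])` layout and an illustrative ±10 ppm window, and it shows how following one m/z value across the retention time slices gives an extracted ion chromatogram.

```python
from typing import List, Tuple

# Hypothetical layout: one scan = (retention time, [(m/z, intensity), ...]).
Scan = Tuple[float, List[Tuple[float, float]]]


def extracted_ion_chromatogram(scans: List[Scan], target_mz: float, ppm: float = 10.0):
    """Sum the intensity within +/- ppm of target_mz in every scan (RT slice)."""
    half_width = target_mz * ppm * 1e-6
    trace = []
    for rt, spectrum in scans:
        # Signals outside the m/z window are ignored; an empty window gives 0.
        intensity = sum(i for mz, i in spectrum if abs(mz - target_mz) <= half_width)
        trace.append((rt, intensity))
    return trace


if __name__ == "__main__":
    scans = [
        (1.0, [(100.0, 5.0), (200.0, 1.0)]),
        (2.0, [(100.0005, 8.0)]),
        (3.0, [(150.0, 3.0)]),
    ]
    print(extracted_ion_chromatogram(scans, 100.0))
```

Each point of the resulting trace is one retention time slice. This trace is the curve that peak detection later inspects for a peak shape.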
3
Introduction
An ideal peak should follow a Gaussian curve in intensity along the retention time axis while staying constant along the m/z axis.
Issues in LC/MS data processing:
- Background noise
- Chemical noise (ridges in spectra, ion suppression, …)
- Peak intensity along the retention time axis deviates substantially from a Gaussian curve (fronting, breaks, …)
- Slight m/z shift across MS spectra (not severe with newer machines)
- Retention time shift across LC/MS spectra
- Multiple charge states of a single metabolite
- Isotopes
- …
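One way to make "deviates from a Gaussian curve" concrete is to compare an observed elution profile with a Gaussian that has the same intensity-weighted centre and spread. The following is a minimal sketch, assuming a simple correlation score. The function names are hypothetical, and this is not the shape criterion used by XCMS, MZmine or apLCMS.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_similarity(rts: Sequence[float], intensities: Sequence[float]) -> float:
    """Correlation between an elution profile and a Gaussian with the same
    intensity-weighted centre and variance (1.0 = same shape up to scale).
    Assumes at least two retention times with positive intensity."""
    total = sum(intensities)
    mu = sum(t * i for t, i in zip(rts, intensities)) / total
    var = sum(i * (t - mu) ** 2 for t, i in zip(rts, intensities)) / total
    model = [math.exp(-((t - mu) ** 2) / (2 * var)) for t in rts]
    return pearson(intensities, model)


if __name__ == "__main__":
    rts = list(range(101))
    ideal = [1000 * math.exp(-((t - 50) ** 2) / 50) for t in rts]  # sigma = 5
    # A peak with a break in the middle, modelled as two humps (sigma = 3 each).
    broken = [math.exp(-((t - 40) ** 2) / 18) + math.exp(-((t - 60) ** 2) / 18) for t in rts]
    print(round(gaussian_similarity(rts, ideal), 4))
    print(round(gaussian_similarity(rts, broken), 4))
```

A well-formed peak scores close to 1. A fronted or broken peak scores lower.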
4
Introduction
The analysis of LC/MS metabolomics data involves a number of steps (example: the XCMS workflow). Software for these steps includes XCMS, MZmine, apLCMS, MetAlign, …
These steps assume that the peaks come from a similar distribution in terms of their variation in m/z and retention time.
5
Introduction
Batches can differ from each other in their data characteristics:
- Retention time shift
- Levels of variation in m/z and retention time
- Different features may be missing at different rates
- …
One set of parameters may not suit all batches. It can make tolerance levels unnecessarily large and cause false positives and false negatives in alignment.
An example (figure: retention times of the same feature in Batch 1 and Batch 2): two tight clusters might make the program think there are two features.
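The two-cluster example can be reproduced with a toy grouping rule. The following is a minimal sketch. It assumes a simple gap-based grouping of retention times, which is a stand-in and not the grouping algorithm of any of the packages named above. All retention times are made-up values. A tolerance that is tight enough within one batch splits the pooled data into two "features". A tolerance wide enough to rejoin them also swallows a genuinely different feature.

```python
from typing import List


def group_by_gap(values: List[float], tol: float) -> List[List[float]]:
    """Sort 1-D values and start a new group whenever the gap to the
    previous value exceeds tol (a simple gap-based grouping)."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


if __name__ == "__main__":
    batch1 = [100.0, 100.2, 100.1]  # one feature, RT in seconds (hypothetical)
    batch2 = [103.0, 103.1, 102.9]  # same feature, shifted by about 3 s
    other = [104.5]                 # a genuinely different feature (hypothetical)
    print(len(group_by_gap(batch1, 1.0)))                   # tight tolerance, one batch
    print(len(group_by_gap(batch1 + batch2, 1.0)))          # pooled batches, same tolerance
    print(len(group_by_gap(batch1 + batch2 + other, 1.0)))
    print(len(group_by_gap(batch1 + batch2 + other, 5.0)))  # wide tolerance
```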
6
Introduction
Common practice: process all batches (Batch 1, Batch 2, Batch 3) together in a single pipeline of peak detection, time adjustment, peak alignment and signal recovery.
7
Methods
Each batch (Batch 1, Batch 2, Batch 3) is first processed separately with its own pipeline of peak detection, time adjustment, peak alignment and signal recovery.
8
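Methods
- Within batch: peak detection, time adjustment, peak alignment and signal recovery for each batch.
- Between batch: treat each batch as a sample; time adjustment and peak alignment.
- Overall correction: time adjustment, peak alignment and signal recovery.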
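The within-batch/between-batch structure can be illustrated for the time adjustment step alone. The following is a minimal sketch, not the authors' implementation. It makes three assumptions:
- retention time drift is a constant shift per profile;
- a set of hypothetical anchor features is present in every profile;
- medians are used as the consensus.

Each profile is first aligned to its own batch's consensus. Each batch consensus is then treated as one "sample" and aligned to a global consensus, and the two shifts are added.

```python
from statistics import median
from typing import Dict, List

Profile = Dict[str, float]  # hypothetical: anchor feature id -> observed retention time


def consensus(profiles: List[Profile]) -> Profile:
    """Median retention time of every anchor feature across the given profiles."""
    return {k: median(p[k] for p in profiles) for k in profiles[0]}


def shift_to(profile: Profile, reference: Profile) -> float:
    """Constant RT shift (median over anchors) that moves profile onto reference."""
    return median(reference[k] - profile[k] for k in reference)


def hierarchical_rt_correction(batches: List[List[Profile]]) -> List[List[Profile]]:
    # Within batch: each batch gets its own reference (consensus of its profiles).
    batch_refs = [consensus(profiles) for profiles in batches]
    # Between batch: each batch reference is treated as one "sample" and aligned
    # to a global reference built from the batch references.
    global_ref = consensus(batch_refs)
    corrected = []
    for profiles, ref in zip(batches, batch_refs):
        between = shift_to(ref, global_ref)
        batch_out = []
        for p in profiles:
            within = shift_to(p, ref)
            batch_out.append({k: rt + within + between for k, rt in p.items()})
        corrected.append(batch_out)
    return corrected


if __name__ == "__main__":
    batch_a = [{"a": 100, "b": 200}, {"a": 101, "b": 201}, {"a": 102, "b": 202}]
    batch_b = [{"a": 110, "b": 210}, {"a": 111, "b": 211}, {"a": 112, "b": 212}]
    for batch in hierarchical_rt_correction([batch_a, batch_b]):
        print(batch)
```

Splitting the correction this way lets the small, tight within-batch shifts and the larger between-batch shifts be estimated separately. Each level can then use its own tolerance.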
9
Results
The data: standard QC sample
- The same sample measured repeatedly
- 33 batches, each containing 3 profiles from the same sample
Parameter settings (all other parameters the same):
- Batch-wise procedure:
  - Within-batch detection proportion threshold: 0.4, 0.5, 0.6, 0.7, 0.8
  - Between-batch detection proportion threshold: 0.1, 0.2, …, 0.8
- Traditional procedure:
  - Detection threshold (number of profiles): 5, 10, 15, …, 95
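The two kinds of thresholds can be contrasted on a single feature. The following is a minimal sketch, assuming two things:
- a batch "passes" when the feature is detected in at least the within-batch proportion of its profiles;
- the feature is kept when at least the between-batch proportion of batches pass.

The `>=` comparisons and the detection patterns are illustrative assumptions, not taken from the slides.

```python
from typing import List

# detected[b][j] = 1 if the feature was detected in profile j of batch b, else 0
Detection = List[List[int]]


def batchwise_keep(detected: Detection, within_thr: float, between_thr: float) -> bool:
    """Keep a feature if it passes the within-batch detection proportion
    threshold in at least a between_thr proportion of the batches."""
    passing = [sum(flags) / len(flags) >= within_thr for flags in detected]
    return sum(passing) / len(passing) >= between_thr


def traditional_keep(detected: Detection, min_profiles: int) -> bool:
    """Keep a feature if it was detected in at least min_profiles profiles overall."""
    return sum(sum(flags) for flags in detected) >= min_profiles


if __name__ == "__main__":
    # 33 batches x 3 profiles, as in the QC data set; detection patterns are made up.
    consistent = [[1, 1, 1]] * 7 + [[0, 0, 0]] * 26  # seen in every profile of 7 batches
    scattered = [[1, 0, 0]] * 30 + [[0, 0, 0]] * 3   # seen once in each of 30 batches
    for name, feat in [("consistent", consistent), ("scattered", scattered)]:
        print(name, batchwise_keep(feat, 0.6, 0.2), traditional_keep(feat, 25))
```

With these settings, the batch-wise rule keeps the feature that is consistent within a few batches and drops the scattered one. The profile-count rule does the opposite.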
10
Results
Evaluation of the results:
- Total number of zeros in the final data matrix
- Proportion of features with m/z matched to known metabolites (xMSAnnotator)
- Coefficient of variation (CV) in the final data matrix (without considering batches)
- Coefficient of variation (CV) after merging each triplet (batch)
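Some of these criteria can be computed directly from a feature × profile intensity matrix. The following is a minimal sketch of three of them:
- the zero count;
- feature-wise CV over all profiles;
- feature-wise CV after averaging each triplet.

It assumes three things:
- CV is sample standard deviation divided by the mean;
- a zero marks a missing signal;
- feature-wise CVs are summarised by their median.

The summary statistic is a hypothetical choice, since the slides do not say how CVs were aggregated.

```python
from statistics import mean, median, stdev
from typing import Dict, List, Optional, Sequence


def count_zeros(matrix: List[List[float]]) -> int:
    """Total number of zero entries in a feature x profile matrix."""
    return sum(1 for row in matrix for v in row if v == 0)


def cv(values: Sequence[float]) -> float:
    """Coefficient of variation: sample SD / mean (needs at least 2 values)."""
    m = mean(values)
    return stdev(values) / m if m else float("nan")


def merge_by_batch(row: Sequence[float], batch_labels: Sequence[int]) -> List[float]:
    """Average the profiles of each batch (e.g. each QC triplet) into one value."""
    groups: Dict[int, List[float]] = {}
    for value, label in zip(row, batch_labels):
        groups.setdefault(label, []).append(value)
    return [mean(vals) for vals in groups.values()]


def median_cv(matrix: List[List[float]], batch_labels: Optional[Sequence[int]] = None) -> float:
    """Median feature-wise CV, optionally after merging profiles within each batch."""
    rows = matrix if batch_labels is None else [merge_by_batch(r, batch_labels) for r in matrix]
    return median([cv(r) for r in rows])


if __name__ == "__main__":
    data = [[10, 12, 14, 20, 22, 24], [8, 9, 10, 9, 8, 10]]
    labels = [0, 0, 0, 1, 1, 1]  # two triplets
    print(count_zeros(data), median_cv(data), median_cv(data, labels))
```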
11
Results
12
Results
The data: Emory/Georgia Tech Center for Health Discovery and Well Being (CHDWB) data. Part of the data was used for testing:
- 115 subjects
- 6 batches, each containing 60 profiles
- Each subject measured in triplicate; a few were measured 6 times
Parameter settings (all other parameters the same):
- Batch-wise procedure:
  - Within-batch detection proportion threshold: 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
  - Between-batch detection proportion threshold: 0.25
- Traditional procedure:
  - Detection threshold (number of profiles): 30, 60, 90, 120, 180
13
Results
Evaluation of the results:
Before merging triplets for each subject:
- Total number of zeros in the final data matrix
- Proportion of features with m/z matched to known metabolites (xMSAnnotator)
- Average coefficient of variation across batches
After merging triplets for each subject:
- Coefficient of variation (CV) in the final data matrix (without considering batches)
- Coefficient of variation (CV) in the final data matrix using only non-zero values (without considering batches)
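The remaining criteria can be sketched in the same style. The following sketch covers three of them:
- the share of features matched by m/z;
- a CV restricted to non-zero values;
- a per-batch CV averaged across batches.

It makes several assumptions:
- matching uses a plain ±10 ppm window against a hypothetical list of reference m/z values, ignoring adduct forms and the other evidence an annotation tool such as xMSAnnotator would use;
- "average CV across batches" means the CV computed within each batch and then averaged;
- CV is sample standard deviation divided by the mean, as before.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def ppm_error(observed: float, reference: float) -> float:
    """Absolute mass error in parts per million."""
    return abs(observed - reference) / reference * 1e6


def proportion_matched(feature_mz: Sequence[float], reference_mz: Sequence[float], ppm: float = 10.0) -> float:
    """Share of features whose m/z lies within ppm of any reference m/z."""
    hits = sum(1 for mz in feature_mz if any(ppm_error(mz, r) <= ppm for r in reference_mz))
    return hits / len(feature_mz)


def cv_nonzero(values: Sequence[float]) -> float:
    """CV after dropping zeros (missing signals); NaN if fewer than 2 values remain."""
    nz = [v for v in values if v > 0]
    if len(nz) < 2:
        return float("nan")
    return stdev(nz) / mean(nz)


def average_within_batch_cv(row: Sequence[float], batch_labels: Sequence[int]) -> float:
    """CV of each batch computed separately, then averaged across batches."""
    groups: Dict[int, List[float]] = {}
    for value, label in zip(row, batch_labels):
        groups.setdefault(label, []).append(value)
    return mean([stdev(v) / mean(v) for v in groups.values()])


if __name__ == "__main__":
    reference = [181.0707, 132.1019]  # hypothetical reference m/z values
    print(proportion_matched([181.0710, 181.0800, 132.1020], reference))
    print(cv_nonzero([0, 10, 14]))
    print(average_within_batch_cv([10, 12, 14, 20, 22, 24], [0, 0, 0, 1, 1, 1]))
```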
14
Results
[Figure] Red: batch-wise processing; blue: traditional processing.
15
Conclusions
The stepwise processing procedure that considers batches improves the consistency of feature detection and quantification.
Better performance appears to be achieved when parameters require:
- Higher within-batch consistency
- Lower between-batch consistency
Further studies are necessary to validate and optimize the procedure:
- Larger datasets
- Examination at the single-feature level
The procedure needs better integration with the semi-supervised peak detection procedure.
16
Acknowledgements
People who work on metabolomics data in Dr. Jones' lab:
- Dr. Shuzhao Li
- Dr. ViLinh Tran
- Dr. Youngja Park
- Mr. Chunyu Ma
Biomedical collaborators who use metabolomics data:
- Dr. Jessica Alvarez
- Dr. Jeremy Sarnat
- Dr. Chandresh Ladva
- Dr. Donghai Liang
Biostatisticians:
- Dr. Jian Kang
- Dr. Qingpo Cai
- Dr. Nancy Jin
- Dr. Eugene Huang
- Dr. Elizabeth Chong
- Dr. John Hanfelt