1
Feature extraction and alignment for LC/MS data
Tianwei Yu Department of Biostatistics and Bioinformatics Rollins School of Public Health Emory University April 25, 2019
2
LC/MS
Liquid chromatography (LC) separates compounds along the retention-time axis. Take “slices” in retention time and send each slice to MS, which measures the mass-to-charge ratio (m/z).
3
Here is an example of LC/MS data.
(a) Original data; (b) square root-transformed data to show smaller peaks; (c) A portion of the data showing details.
4
LC/MS
5
LC/MS
6
Some computational issues in LC/MS
Noise reduction and feature detection
Modeling peaks and feature quantification
Retention time correction
Feature alignment
Grouping multiple features from one molecule caused by (1) isotopes, (2) multiple charge states
Mapping MS2 for identification
7
Some notations of the data
8
Some notations of the data
9
Feature detection
10
Feature detection
11
Feature detection
Katajamaa & Oresic (2007) J Chr. A 1158:318
12
Feature detection: XCMS matched filter
Coefficients are equal to a second-derivative Gaussian function. The filtered chromatogram crosses the x-axis roughly at the peak inflection points.
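The zero-crossing behaviour can be checked on a synthetic chromatogram. The sketch below is only an illustration, not the XCMS implementation: the function names, the kernel width (sigma = 3 scans) and the noise-free Gaussian test peak (sigma = 5 scans) are all assumptions made for the example.

```python
import math

def second_derivative_gaussian(sigma, half_width):
    # Negated second derivative of a Gaussian (up to a constant factor),
    # sampled at integer offsets: positive in the middle, negative in the wings.
    return [(1.0 - (x / sigma) ** 2) * math.exp(-(x * x) / (2.0 * sigma * sigma))
            for x in range(-half_width, half_width + 1)]

def apply_filter(signal, kernel):
    # Direct filtering with zero padding at both ends of the trace.
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out

def peak_bounds(filtered):
    # Apex = maximum of the filtered trace; walk outwards to the first
    # non-positive sample on each side (the zero crossings).
    apex = max(range(len(filtered)), key=lambda i: filtered[i])
    left = apex
    while left > 0 and filtered[left] > 0:
        left -= 1
    right = apex
    while right < len(filtered) - 1 and filtered[right] > 0:
        right += 1
    return left, apex, right

# Hypothetical chromatogram: one Gaussian peak centred at scan 50, sigma = 5.
chromatogram = [math.exp(-((t - 50) ** 2) / (2.0 * 5.0 ** 2)) for t in range(101)]
kernel = second_derivative_gaussian(sigma=3.0, half_width=12)
filtered = apply_filter(chromatogram, kernel)
left, apex, right = peak_bounds(filtered)
```

Because filtering broadens the peak, the crossings land slightly outside the inflection points of the raw peak, which is why the slide says "roughly".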
13
Feature detection: XCMS centWave
Directly scan for regions where at least pmin centroids with a deviation less than μ ppm occur. Peak detection on multiple scales using Continuous Wavelet Transform (CWT), which reliably detects chromatographic peaks of differing width.
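The first step, the region scan, might be sketched as follows. This is a simplified illustration and not the centWave code: the data layout (one list of centroid m/z values per scan), the function name `find_rois`, and the rule of comparing each centroid with the running mean m/z of a region are assumptions. The wavelet step is not shown.

```python
def find_rois(scans, ppm, p_min):
    """Collect regions with at least p_min consecutive centroids within ppm.

    scans: list of lists of centroid m/z values, one inner list per scan.
    Returns a list of dicts holding the m/z values and scan indices of each region.
    """
    open_rois = []   # regions that were extended in the previous scan
    finished = []    # closed regions that reached p_min centroids
    for scan_idx, centroids in enumerate(scans):
        still_open = []
        used = set()
        for roi in open_rois:
            mean_mz = sum(roi["mz"]) / len(roi["mz"])
            tol = mean_mz * ppm * 1e-6
            match = None
            for j, mz in enumerate(centroids):
                if j not in used and abs(mz - mean_mz) <= tol:
                    match = j
                    break
            if match is None:
                # The region ends here; keep it only if it is long enough.
                if len(roi["mz"]) >= p_min:
                    finished.append(roi)
            else:
                used.add(match)
                roi["mz"].append(centroids[match])
                roi["scans"].append(scan_idx)
                still_open.append(roi)
        # Every unmatched centroid starts a new candidate region.
        for j, mz in enumerate(centroids):
            if j not in used:
                still_open.append({"mz": [mz], "scans": [scan_idx]})
        open_rois = still_open
    finished.extend(r for r in open_rois if len(r["mz"]) >= p_min)
    return finished

# Made-up example: one ion near m/z 100 persists for four scans.
scans = [[100.0000, 250.0], [100.0005], [100.0003, 300.0], [100.0002], [500.0]]
rois = find_rois(scans, ppm=10, p_min=4)
```

Each surviving region would then be passed to the wavelet-based peak detection described in the second sentence of the slide.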
14
Feature detection: adaptive binning
15
Feature detection apLCMS run filter Subject to:
16
Peak modeling and quantification
In high-resolution LC/MS data, every peak is a thin slice, so there is no need to model the MS dimension. Modeling the LC dimension is important for quantification. Models developed for traditional LC data can be applied here. Most empirical peak shape models were derived from the Gaussian model, with changes made to account for asymmetry in the peak shape.
17
Peak modeling and quantification
Generalized exponential function. Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.
18
Peak modeling and quantification
Log-normal function. Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.
19
Peak modeling and quantification
The bi-Gaussian model. Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.
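The slide's formula is not preserved in this transcript. A commonly used form of the bi-Gaussian peak joins two half-Gaussians with different standard deviations at the apex; the sketch below assumes that form, and the parameter names (`height`, `mu`, `sigma_left`, `sigma_right`) are chosen for the example.

```python
import math

def bi_gaussian(t, height, mu, sigma_left, sigma_right):
    # Left of the apex use sigma_left, right of it sigma_right.
    sigma = sigma_left if t < mu else sigma_right
    return height * math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))

def bi_gaussian_area(height, sigma_left, sigma_right):
    # Each half contributes half of a full Gaussian area with its own sigma.
    return height * math.sqrt(2.0 * math.pi) * (sigma_left + sigma_right) / 2.0

# Hypothetical tailing peak: narrow front (sigma 1), wide tail (sigma 3).
front = bi_gaussian(8.0, 5.0, 10.0, 1.0, 3.0)
tail = bi_gaussian(12.0, 5.0, 10.0, 1.0, 3.0)
area = bi_gaussian_area(5.0, 1.0, 3.0)
```

Under this form, quantification reduces to the closed-form area once the four parameters are estimated.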
20
Peak modeling and quantification
Some peaks share m/z and partially overlap in RT. Some heuristic methods exist for separating them, but they require low noise. Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.
21
Peak modeling and quantification
Statistical approach: select a set of smoother window sizes; for each window size, run the smoother and an EM-like algorithm to fit the data, and compute the corresponding BIC value; choose the result with the minimum BIC value.
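The selection step alone can be illustrated with a short sketch. It assumes a Gaussian-error form of BIC computed from the residual sum of squares, which the slide does not specify, and it takes the fits as given: the smoother and the EM-like fitting are not implemented here, and the numbers in `fits` are invented purely to show the arithmetic.

```python
import math

def bic(rss, n_points, n_params):
    # BIC under an assumed Gaussian error model: n * ln(RSS / n) + k * ln(n).
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)

def select_by_bic(fits, n_points):
    # Return (BIC, window) of the candidate fit with the smallest BIC.
    scored = [(bic(f["rss"], n_points, f["n_params"]), f["window"]) for f in fits]
    return min(scored)

# Invented fit summaries for three smoother window sizes, 100 data points each.
# A small window yields more components (more parameters), a large one fewer.
fits = [
    {"window": 3, "rss": 2.0, "n_params": 12},
    {"window": 5, "rss": 2.2, "n_params": 8},
    {"window": 9, "rss": 8.0, "n_params": 4},
]
best_bic, best_window = select_by_bic(fits, n_points=100)
```

In this made-up case the middle window wins: the extra component from the smallest window barely reduces the residual, so its parameter penalty outweighs the gain.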
22
Peak modeling and quantification
(Figure panels: bi-Gaussian mixture; Gaussian mixture.)
23
Retention time correction
With every run, the LC dimension data has some fluctuation. Identify “reliable” peaks in both samples, use non-linear curve fitting to adjust the retention time. Anal Chem Feb 1;78(3):
24
Retention time correction
Select a sample as reference.
Every other sample is corrected against the reference.
To correct:
Pair peaks in the two samples (m/z close enough; no multiple peaks at the same m/z).
Fit a nonlinear curve through their RT values.
Correct based on the nonlinear fit.
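The correction of one sample against the reference could look like the sketch below. The slide does not say which nonlinear curve is used, so a least-squares quadratic in the RT deviation is swapped in here as a stand-in; the function names, the m/z tolerance and the peak lists are hypothetical.

```python
def pair_peaks(ref, sample, mz_tol):
    # ref, sample: lists of (mz, rt). Keep a pair only when the m/z match is
    # unique in both samples. Returns (sample_rt, ref_rt) tuples.
    pairs = []
    for mz_s, rt_s in sample:
        ref_hits = [rt for mz, rt in ref if abs(mz - mz_s) <= mz_tol]
        sample_hits = [p for p in sample if abs(p[0] - mz_s) <= mz_tol]
        if len(ref_hits) == 1 and len(sample_hits) == 1:
            pairs.append((rt_s, ref_hits[0]))
    return pairs

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small linear system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_rt_curve(pairs, degree=2):
    # Fit the deviation (ref_rt - sample_rt) as a polynomial in rescaled sample RT.
    xs = [p[0] for p in pairs]
    center = sum(xs) / len(xs)
    scale = (max(xs) - min(xs)) or 1.0
    us = [(x - center) / scale for x in xs]
    ds = [ref_rt - x for x, ref_rt in pairs]
    a = [[sum(u ** (i + j) for u in us) for j in range(degree + 1)]
         for i in range(degree + 1)]
    b = [sum(d * u ** i for u, d in zip(us, ds)) for i in range(degree + 1)]
    coef = solve(a, b)

    def correct(rt):
        u = (rt - center) / scale
        return rt + sum(c * u ** k for k, c in enumerate(coef))
    return correct

# Made-up peak lists; the two sample peaks near m/z 600 are ambiguous and skipped.
ref = [(100.0, 53.44), (200.0, 112.36), (300.0, 172.0), (400.0, 232.36),
       (500.0, 293.44), (600.0, 350.0)]
sample = [(100.0, 50.0), (200.0, 110.0), (300.0, 170.0), (400.0, 230.0),
          (500.0, 290.0), (600.0, 340.0), (600.005, 345.0)]
pairs = pair_peaks(ref, sample, mz_tol=0.01)
correct = fit_rt_curve(pairs)
corrected = [(mz, correct(rt)) for mz, rt in sample]
```

Fitting the deviation rather than the raw reference RT keeps the fitted values small, and rescaling RT before building the normal equations keeps them well conditioned.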
25
Peak alignment
26
Peak alignment: dynamic programming. BMC Bioinformatics 2007, 8:419
27
Peak alignment
First align the m/z dimension by binning.
Use kernel density estimation to find “meta-peaks”. Anal Chem Feb 1;78(3):
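Within one m/z bin, the kernel density step might be sketched as below: the retention times of all features from all samples are pooled, a Gaussian kernel density is evaluated on a grid, and each local maximum is treated as a meta-peak. The bandwidth, grid and retention times are made-up values, and the function names are hypothetical.

```python
import math

def kde(points, grid, bandwidth):
    # Gaussian kernel density estimate evaluated at each grid position.
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-((g - p) ** 2) / (2.0 * bandwidth ** 2)) for p in points) / norm
            for g in grid]

def find_modes(grid, density):
    # Interior grid positions where the density is a local maximum.
    return [grid[i] for i in range(1, len(grid) - 1)
            if density[i] > density[i - 1] and density[i] >= density[i + 1]]

def assign_to_modes(points, modes):
    # Index of the nearest meta-peak for every feature.
    return [min(range(len(modes)), key=lambda k: abs(p - modes[k])) for p in points]

# Made-up retention times (seconds) of features from several samples in one m/z bin.
rts = [100.1, 100.4, 99.8, 150.2, 150.0, 149.7]
grid = [90.0 + 0.5 * i for i in range(141)]   # 90 s to 160 s in 0.5 s steps
density = kde(rts, grid, bandwidth=2.0)
modes = find_modes(grid, density)
labels = assign_to_modes(rts, modes)
```

Features that share a label are treated as the same aligned feature across samples. The bandwidth controls how much RT drift is tolerated before two groups are split.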
28
An example of the overall strategy in LC/MS metabolomics
Anal Chem Feb 1;78(3):
29
On real data
30
On real data
31
Semi-supervised detection
32
Semi-supervised detection
33
Semi-supervised detection
To reduce false-positives
34
Semi-supervised detection
Example of features found by hybrid approach only.
35
Semi-supervised detection
36
Semi-supervised detection
37
The data matrix