Feature extraction and alignment for LC/MS data


1 Feature extraction and alignment for LC/MS data
Tianwei Yu, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University. April 25, 2019

2 LC/MS
Liquid chromatography separates compounds along the retention time axis. Take “slices” in retention time and send each to MS, which measures the mass-to-charge ratio (m/z).

3 Here is an example of LC/MS data.
(a) Original data; (b) square root-transformed data to show smaller peaks; (c) A portion of the data showing details.

4 LC/MS

5 LC/MS

6 Some computational issues in LC/MS
Noise reduction and feature detection
Modeling peaks and feature quantification
Retention time correction
Feature alignment
Grouping multiple features from one molecule caused by (1) isotopes and (2) multiple charge states
Mapping MS2 for identification

7 Some notations of the data

8 Some notations of the data

9 Feature detection

10 Feature detection

11 Feature detection
Katajamaa & Oresic (2007) J Chr. A 1158:318

12 Feature detection XCMS Matched filter
The filter coefficients are equal to a second-derivative Gaussian function. The filtered chromatogram crosses the x-axis roughly at the peak inflection points.
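The matched-filter idea can be sketched in a few lines. This is a minimal illustration on a synthetic chromatogram, not the XCMS implementation: the function names, the kernel width of 4 sigma, and the sign convention (kernel negated so that peaks give a positive response) are assumptions made for the example.

```python
import numpy as np

def second_derivative_gaussian(sigma, half_width):
    """Negated second derivative of a Gaussian, sampled at integer scan offsets."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    kernel = (1 - x**2 / sigma**2) * np.exp(-x**2 / (2 * sigma**2))
    # Subtract the mean so the coefficients sum to zero and a flat baseline filters to 0.
    return kernel - kernel.mean()

def matched_filter(intensity, sigma=3.0):
    """Convolve one extracted ion chromatogram with the kernel (hypothetical helper)."""
    kernel = second_derivative_gaussian(sigma, half_width=int(4 * sigma))
    return np.convolve(intensity, kernel, mode="same")

# Synthetic chromatogram: one Gaussian peak at scan 80 on a constant baseline.
scans = np.arange(200)
signal = 100 * np.exp(-(scans - 80) ** 2 / (2 * 3.0**2)) + 5.0
filtered = matched_filter(signal, sigma=3.0)
apex = int(np.argmax(filtered))  # scan with the strongest filter response
```

The response is positive around the apex and changes sign a few scans to either side, which is what lets the zero crossings serve as approximate peak boundaries.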

13 Feature detection XCMS Centwave
Directly scan for regions of interest where at least pmin centroids with an m/z deviation of less than μ ppm occur in consecutive scans. Then detect peaks on multiple scales using the Continuous Wavelet Transform (CWT), which reliably detects chromatographic peaks of differing widths.
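The first step, scanning for regions of interest, might look roughly like the sketch below. It is a simplified illustration and not the centWave code: the data layout (one list of centroid m/z values per scan), the function name find_rois, and the rule of matching each open region to its closest centroid are assumptions. The wavelet step is not shown.

```python
def find_rois(scans, ppm=25.0, pmin=4):
    """Return (mean m/z, first scan, last scan) for m/z traces that persist
    for at least pmin consecutive scans within the ppm tolerance."""
    open_rois, finished = [], []

    def close(roi):
        if len(roi["mz"]) >= pmin:
            mean_mz = sum(roi["mz"]) / len(roi["mz"])
            finished.append((mean_mz, roi["start"], roi["start"] + len(roi["mz"]) - 1))

    for scan_index, centroids in enumerate(scans):
        unused = list(centroids)
        still_open = []
        for roi in open_rois:
            mean_mz = sum(roi["mz"]) / len(roi["mz"])
            tolerance = mean_mz * ppm * 1e-6
            matches = [m for m in unused if abs(m - mean_mz) <= tolerance]
            if matches:
                best = min(matches, key=lambda m: abs(m - mean_mz))
                roi["mz"].append(best)
                unused.remove(best)
                still_open.append(roi)
            else:
                close(roi)  # trace ended; keep it only if long enough
        # Every centroid that extended no region starts a new candidate region.
        still_open.extend({"mz": [m], "start": scan_index} for m in unused)
        open_rois = still_open
    for roi in open_rois:
        close(roi)
    return finished

# Made-up centroid lists: a trace near m/z 200 in scans 2..7 plus isolated noise.
example_scans = [[150.0], [310.2], [200.0000, 420.1], [200.0010], [199.9990, 99.5],
                 [200.0005], [200.0002], [199.9998], [512.3], []]
rois = find_rois(example_scans, ppm=25.0, pmin=4)
```

In this made-up input only the trace near m/z 200 survives, because every other centroid appears in a single scan.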

14 Feature detection Adaptive binning

15 Feature detection apLCMS run filter Subject to:

16 Peak modeling and quantification
In high-resolution LC/MS data, every peak is a thin slice, so there is no need to model the MS dimension. Modeling the LC dimension is important for quantification. Models developed for traditional LC data can be applied here. Most empirical peak shape models were derived from the Gaussian model, with changes made to account for asymmetry in the peak shape.

17 Peak modeling and quantification
Generalized exponential function. Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.

18 Peak modeling and quantification
Log-normal function. Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.

19 Peak modeling and quantification
The bi-Gaussian model. Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.
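As an illustration of the bi-Gaussian shape, the sketch below uses the common parameterization in which the two halves of the peak share one apex but have different standard deviations. The function and parameter names are chosen for this example; the slide's own notation did not survive in the transcript.

```python
import numpy as np

def bi_gaussian(t, height, mu, sigma_left, sigma_right):
    """Gaussian with width sigma_left before the apex mu and sigma_right after it."""
    t = np.asarray(t, dtype=float)
    sigma = np.where(t < mu, sigma_left, sigma_right)
    return height * np.exp(-(t - mu) ** 2 / (2 * sigma**2))

def bi_gaussian_area(height, sigma_left, sigma_right):
    """Closed-form area: each half contributes height * sigma * sqrt(pi / 2)."""
    return height * np.sqrt(np.pi / 2) * (sigma_left + sigma_right)

# A tailing peak: narrow on the leading edge, wide on the trailing edge.
t = np.linspace(0, 100, 10001)
y = bi_gaussian(t, height=10.0, mu=40.0, sigma_left=2.0, sigma_right=5.0)
numeric_area = y.sum() * (t[1] - t[0])
exact_area = bi_gaussian_area(10.0, 2.0, 5.0)
```

The closed-form area is what makes the model convenient for quantification: once the four parameters are fitted, the feature intensity follows directly.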

20 Peak modeling and quantification
Some peaks share m/z and partially overlap in RT. Some heuristic methods (require low noise). Source: A. Felinger, Data Analysis and Signal Processing in Chromatography.

21 Peak modeling and quantification
Statistical approach: Select a set of smoother window sizes. For each window size, run the smoother and an EM-like algorithm to fit the data, then compute the corresponding BIC value. Choose the result with the minimum BIC value.
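The selection loop can be sketched as follows. Several choices here are stand-ins and not taken from the slides: a moving average as the smoother, local maxima of the smoothed trace as starting centres, plain Gaussian components fitted by an intensity-weighted EM iteration, and a residual-based BIC with three parameters per component.

```python
import numpy as np

def moving_average(y, window):
    return np.convolve(y, np.ones(window) / window, mode="same")

def local_maxima(t, y_smooth, rel_height=0.1):
    """Positions of local maxima above rel_height times the global maximum."""
    mid = y_smooth[1:-1]
    keep = (mid > y_smooth[:-2]) & (mid >= y_smooth[2:]) & (mid > rel_height * y_smooth.max())
    return t[1:-1][keep]

def mixture(t, weight, mu, sigma):
    """Weighted Gaussian densities, shape (n_points, n_components)."""
    dens = np.exp(-(t[:, None] - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return weight * dens

def em_like_fit(t, y, centres, n_iter=200):
    """EM iteration in which each time point counts in proportion to its intensity."""
    k = len(centres)
    mu = np.asarray(centres, dtype=float)
    spread = np.sqrt(np.average((t - np.average(t, weights=y)) ** 2, weights=y))
    sigma = np.full(k, spread / k)
    weight = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        comp = mixture(t, weight, mu, sigma)
        resp = comp / np.maximum(comp.sum(axis=1, keepdims=True), 1e-300)
        mass = resp * y[:, None]
        total = np.maximum(mass.sum(axis=0), 1e-12)
        mu = (mass * t[:, None]).sum(axis=0) / total
        sigma = np.sqrt((mass * (t[:, None] - mu) ** 2).sum(axis=0) / total) + 1e-6
        weight = total / total.sum()
    area = y.sum() * (t[1] - t[0])
    return mu, area * mixture(t, weight, mu, sigma).sum(axis=1)

def select_by_bic(t, y, windows):
    best, n = None, len(t)
    for w in windows:
        centres = local_maxima(t, moving_average(y, w))
        if len(centres) == 0:
            continue
        mu, fitted = em_like_fit(t, y, centres)
        rss = max(float(((y - fitted) ** 2).sum()), 1e-300)
        bic = n * np.log(rss / n) + 3 * len(mu) * np.log(n)  # 3 parameters per peak
        if best is None or bic < best["bic"]:
            best = {"window": w, "bic": bic, "centres": np.sort(mu)}
    return best

# Two partially overlapping synthetic peaks sharing one m/z.
t = np.arange(0, 100.5, 0.5)
y = 100 * np.exp(-(t - 40) ** 2 / 18) + 60 * np.exp(-(t - 52) ** 2 / 18)
best = select_by_bic(t, y, windows=[3, 15, 41])
```

The window size matters because it controls how many starting centres the smoother yields; BIC then arbitrates between the resulting fits by trading residual error against the number of components.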

22 Peak modeling and quantification
[Figure panels: bi-Gaussian mixture; Gaussian mixture]

23 Retention time correction
Retention times in the LC dimension fluctuate from run to run. Identify “reliable” peaks in both samples, then use non-linear curve fitting to adjust the retention time. Anal Chem Feb 1;78(3):

24 Retention time correction
Select a sample as reference.
Every other sample is corrected against the reference.
To correct:
  Pair peaks in the two samples (m/z close enough, no multiple peaks at the same m/z).
  Fit a nonlinear curve through their RT values.
  Correct based on the nonlinear fit.
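The steps above can be sketched as below. The peak tables are assumed to be arrays with columns (m/z, RT), the m/z tolerance of 0.01 is arbitrary, and a cubic polynomial stands in for the nonlinear curve; a local regression smoother would be a natural alternative.

```python
import numpy as np

def pair_unique_peaks(ref, sample, mz_tol=0.01):
    """Return rows (sample RT, reference RT) for peaks whose m/z is unambiguous
    in both tables and agrees within mz_tol."""
    def unambiguous(peaks):
        mz = peaks[:, 0]
        keep = [i for i in range(len(mz)) if np.sum(np.abs(mz - mz[i]) <= mz_tol) == 1]
        return peaks[keep]

    ref_u, sample_u = unambiguous(ref), unambiguous(sample)
    pairs = []
    for mz, rt in sample_u:
        hits = ref_u[np.abs(ref_u[:, 0] - mz) <= mz_tol]
        if len(hits) == 1:
            pairs.append((rt, hits[0, 1]))
    return np.array(pairs)

def fit_rt_correction(pairs, degree=3):
    """Fit the RT deviation (reference minus sample) as a polynomial in sample RT."""
    coeffs = np.polyfit(pairs[:, 0], pairs[:, 1] - pairs[:, 0], degree)
    return lambda rt: rt + np.polyval(coeffs, rt)

# Synthetic example: the sample elutes late by an amount that grows with RT.
mz = 100 + 37.5 * np.arange(10)
sample_rt = np.linspace(50, 950, 10)
ref_rt = sample_rt - (2 + 0.004 * sample_rt)
ref = np.column_stack([mz, ref_rt])
sample = np.column_stack([mz + 0.002, sample_rt])
# A second sample peak at nearly the same m/z makes that m/z ambiguous.
sample = np.vstack([sample, [mz[0] + 0.004, 400.0]])

pairs = pair_unique_peaks(ref, sample)
correct = fit_rt_correction(pairs)
corrected_rt = correct(sample_rt)
```

Fitting the deviation instead of the retention time itself keeps the fitted curve close to zero, so a poor fit degrades gracefully toward no correction.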

25 Peak alignment

26 Peak alignment Dynamic programming. BMC Bioinformatics 2007, 8:419

27 Peak alignment
First align the m/z dimension by binning. Then use kernel density estimation to find “meta-peaks”. Anal Chem Feb 1;78(3):
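Within one m/z bin, the kernel density step might be sketched like this. The Gaussian kernel, the fixed bandwidth, and the evaluation grid are assumptions made for the illustration; the retention times are made up.

```python
import numpy as np

def kde_meta_peaks(rts, bandwidth=5.0, grid_step=0.5):
    """Return the retention times at which the kernel density of the pooled
    peak RTs (all samples, one m/z bin) has a local maximum."""
    rts = np.asarray(rts, dtype=float)
    grid = np.arange(rts.min() - 3 * bandwidth, rts.max() + 3 * bandwidth, grid_step)
    density = np.exp(-(grid[:, None] - rts) ** 2 / (2 * bandwidth**2)).sum(axis=1)
    mid = density[1:-1]
    is_max = (mid > density[:-2]) & (mid >= density[2:])
    return grid[1:-1][is_max]

def assign_to_meta_peaks(rts, meta_peaks):
    """Index of the nearest meta-peak for every individual peak."""
    rts = np.asarray(rts, dtype=float)
    return np.argmin(np.abs(rts[:, None] - meta_peaks), axis=1)

# Peaks pooled from several samples in one m/z bin: two groups in RT.
pooled_rts = [98.0, 99.5, 100.0, 101.0, 102.0, 158.0, 160.0, 161.0, 162.5]
meta = kde_meta_peaks(pooled_rts)
groups = assign_to_meta_peaks(pooled_rts, meta)
```

Each density maximum defines one aligned feature, and the assignment step gives the row of the final data matrix that each sample's peak contributes to.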

28 An example of the overall strategy in LC/MS metabolomics
Anal Chem Feb 1;78(3):

29 On real data

30 On real data

31 Semi-supervised detection

32 Semi-supervised detection

33 Semi-supervised detection
To reduce false positives.

34 Semi-supervised detection
Example of features found by hybrid approach only.

35 Semi-supervised detection

36 Semi-supervised detection

37 The data matrix

