Feature extraction and alignment for LC/MS data


Feature extraction and alignment for LC/MS data Tianwei Yu Department of Biostatistics and Bioinformatics Rollins School of Public Health Emory University April 25, 2019

LC/MS: Liquid chromatography separates the sample along retention time. Take “slices” in retention time and send them to MS, which measures the mass-to-charge ratio (m/z).

Here is an example of LC/MS data. (a) Original data; (b) square root-transformed data to show smaller peaks; (c) A portion of the data showing details.

LC/MS

Some computational issues in LC/MS: noise reduction and feature detection; modeling peaks and feature quantification; retention time correction; feature alignment; grouping multiple features from one molecule caused by (1) isotopes and (2) multiple charge states; mapping MS2 for identification.

Some notations of the data

Feature detection

Feature detection. Katajamaa & Oresic (2007) J. Chromatogr. A 1158:318

Feature detection: XCMS matched filter. The filter coefficients are equal to a second-derivative Gaussian function. The filtered chromatogram crosses the x-axis roughly at the peak inflection points.
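The matched-filter idea can be sketched in a few lines. This is a minimal illustration on a synthetic chromatogram, not the XCMS implementation; the function names, the 4-sigma kernel truncation, and the test peak are all assumptions made for the example.

```python
import numpy as np

def second_derivative_gaussian(sigma, half_width):
    """Negated second derivative of a Gaussian, sampled at integer offsets."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    # -(d^2/dx^2) g(x) = (1 / sigma^2) * (1 - x^2 / sigma^2) * g(x)
    return (1.0 - x**2 / sigma**2) * g / sigma**2

def matched_filter(chromatogram, sigma):
    """Convolve the chromatogram with the second-derivative Gaussian kernel."""
    kernel = second_derivative_gaussian(sigma, half_width=int(4 * sigma))
    return np.convolve(chromatogram, kernel, mode="same")

def crossing_bounds(filtered, apex):
    """Walk outward from the apex until the filtered signal is no longer positive."""
    left = apex
    while left > 0 and filtered[left] > 0:
        left -= 1
    right = apex
    while right < len(filtered) - 1 and filtered[right] > 0:
        right += 1
    return left, right

# Synthetic chromatogram: one Gaussian peak (hypothetical values).
t = np.arange(200, dtype=float)
peak_center, peak_sigma = 100.0, 5.0
signal = 1000.0 * np.exp(-(t - peak_center)**2 / (2 * peak_sigma**2))

filtered = matched_filter(signal, sigma=peak_sigma)
apex = int(np.argmax(filtered))
left, right = crossing_bounds(filtered, apex)
print(apex, left, right)
```

For an ideal Gaussian peak of width sigma_p filtered with width sigma_f, the zero crossings sit at a distance of sqrt(sigma_p^2 + sigma_f^2) from the apex. They therefore approach the inflection points (one sigma_p from the apex) when the filter is narrow, and lie somewhat outside them when the filter width matches the peak width, as in this example.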

Feature detection: XCMS centWave. Scan directly for regions where at least pmin centroids occur with a deviation of less than μ ppm. Then detect peaks on multiple scales using the continuous wavelet transform (CWT), which reliably detects chromatographic peaks of differing widths.
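The region-scanning step can be illustrated with a simplified sketch. It assumes that a region must be extended in every consecutive scan and that the ppm deviation is measured against the running mean m/z of the region; the function name `find_rois` and the data layout are hypothetical, and the CWT step is not shown.

```python
def find_rois(scans, ppm, pmin):
    """Collect regions with at least `pmin` centroids within `ppm` of the region mean.

    scans: list of lists of m/z centroids, one list per scan (hypothetical layout).
    """
    active = []    # regions still being extended
    finished = []  # closed regions that reached pmin centroids
    for scan_idx, centroids in enumerate(scans):
        extended = []
        unused = list(centroids)
        for roi in active:
            mean_mz = sum(roi["mz"]) / len(roi["mz"])
            match = next((mz for mz in unused
                          if abs(mz - mean_mz) / mean_mz * 1e6 < ppm), None)
            if match is None:
                # Region ends here; keep it only if it is long enough.
                if len(roi["mz"]) >= pmin:
                    finished.append(roi)
            else:
                unused.remove(match)
                roi["mz"].append(match)
                roi["scans"].append(scan_idx)
                extended.append(roi)
        # Unmatched centroids start new candidate regions.
        extended.extend({"mz": [mz], "scans": [scan_idx]} for mz in unused)
        active = extended
    finished.extend(roi for roi in active if len(roi["mz"]) >= pmin)
    return finished

# Toy data: one ion near m/z 200 persists over four scans, the rest is sporadic.
scans = [
    [200.0000, 350.10],
    [200.0010, 410.20],
    [200.0005],
    [199.9995, 350.10],
    [500.0],
]
rois = find_rois(scans, ppm=10, pmin=3)
print([(roi["scans"], roi["mz"]) for roi in rois])
```

Only the m/z 200 trace survives, because the other centroids never persist for `pmin` scans.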

Feature detection Adaptive binning

Feature detection apLCMS run filter Subject to:

Peak modeling and quantification: In high-resolution LC/MS data, every peak is a thin slice, so there is no need to model the MS dimension. Modeling the LC dimension is important for quantification. Models developed for traditional LC data can be applied here. Most empirical peak shape models were derived from the Gaussian model, with changes made to account for asymmetry in the peak shape.

Peak modeling and quantification: generalized exponential function. (A. Felinger, Data Analysis and Signal Processing in Chromatography.)

Peak modeling and quantification: log-normal function. (A. Felinger, Data Analysis and Signal Processing in Chromatography.)

Peak modeling and quantification: the bi-Gaussian model. (A. Felinger, Data Analysis and Signal Processing in Chromatography.)
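The slide's formula did not survive in this transcript. The sketch below therefore assumes the commonly used form of the bi-Gaussian: two half-Gaussians that share an apex and a height but have different standard deviations on the left and right. The parameter names are hypothetical.

```python
import numpy as np

def bi_gaussian(t, apex, sigma_left, sigma_right, height):
    """Asymmetric peak: Gaussian with sigma_left before the apex, sigma_right after."""
    t = np.asarray(t, dtype=float)
    sigma = np.where(t < apex, sigma_left, sigma_right)
    return height * np.exp(-(t - apex)**2 / (2 * sigma**2))

def bi_gaussian_area(sigma_left, sigma_right, height):
    """Each half contributes half the area of a full Gaussian with its own sigma."""
    return height * np.sqrt(2 * np.pi) * (sigma_left + sigma_right) / 2

# A tailing peak: narrow front, wide tail (hypothetical values).
t = np.linspace(0.0, 100.0, 100001)
y = bi_gaussian(t, apex=40.0, sigma_left=3.0, sigma_right=8.0, height=500.0)

numeric_area = y.sum() * (t[1] - t[0])   # simple Riemann sum
analytic_area = bi_gaussian_area(3.0, 8.0, 500.0)
print(numeric_area, analytic_area)
```

The closed-form area is what makes this model convenient for quantification: once the four parameters are fitted, the feature intensity follows directly.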

Peak modeling and quantification: Some peaks share m/z and partially overlap in RT. Some heuristic methods exist, but they require low noise. (A. Felinger, Data Analysis and Signal Processing in Chromatography.)

Peak modeling and quantification, statistical approach: Select a set of smoother window sizes. For each window size, run the smoother and an EM-like algorithm to fit the data, and compute the corresponding BIC value. Choose the result with the minimum BIC value.

Peak modeling and quantification. Figure panels: bi-Gaussian mixture; Gaussian mixture.

Retention time correction: The LC dimension fluctuates somewhat from run to run. Identify “reliable” peaks in both samples, then use nonlinear curve fitting to adjust the retention time. Anal Chem. 2006 Feb 1;78(3):779-87.

Retention time correction: Select one sample as the reference; every other sample is corrected against the reference. To correct a sample: (1) pair peaks in the two samples, requiring that the m/z values be close enough and that there be no multiple peaks at the same m/z; (2) fit a nonlinear curve through their RT values; (3) correct based on the nonlinear fit.
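The three steps above can be sketched as follows. The slides do not name the curve type, so a cubic polynomial fitted with `numpy.polyfit` stands in for the nonlinear curve here; the function names, the m/z tolerance, and the synthetic drift are all assumptions.

```python
import numpy as np

def pair_peaks(ref, sample, mz_tol):
    """Pair peaks by m/z, keeping only unambiguous matches.

    ref, sample: arrays of shape (n, 2) with columns (m/z, RT).
    Returns an array with columns (sample RT, reference RT).
    """
    pairs = []
    for mz, rt in sample:
        ref_hits = ref[np.abs(ref[:, 0] - mz) <= mz_tol]
        sample_hits = sample[np.abs(sample[:, 0] - mz) <= mz_tol]
        # Require exactly one peak at this m/z in each sample.
        if len(ref_hits) == 1 and len(sample_hits) == 1:
            pairs.append((rt, ref_hits[0, 1]))
    return np.array(pairs)

def fit_rt_correction(pairs, degree=3):
    """Fit the RT deviation (reference minus sample) as a polynomial in sample RT."""
    coeffs = np.polyfit(pairs[:, 0], pairs[:, 1] - pairs[:, 0], degree)
    return lambda rt: rt + np.polyval(coeffs, rt)

# Synthetic example: the sample drifts smoothly relative to the reference.
mz = 100.0 + 10.0 * np.arange(25)
sample_rt = np.linspace(60.0, 1140.0, 25)
drift = 3.0 + 2e-5 * (sample_rt - 600.0)**2
ref_rt = sample_rt - drift

ref = np.column_stack([mz, ref_rt])
sample = np.column_stack([mz, sample_rt])
# An extra sample peak at m/z 150 makes that m/z ambiguous, so it is not paired.
sample = np.vstack([sample, [150.0, 700.0]])

pairs = pair_peaks(ref, sample, mz_tol=0.01)
correct = fit_rt_correction(pairs)
corrected = correct(sample_rt)
print(len(pairs), float(np.max(np.abs(corrected - ref_rt))))
```

In practice a local smoother is often a safer choice than a global polynomial, which can behave badly outside the RT range covered by the paired peaks.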

Peak alignment

Peak alignment Dynamic programming. BMC Bioinformatics 2007, 8:419
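To illustrate the general idea only, and not the specific algorithm of the cited paper, here is a minimal dynamic-programming alignment of two RT-sorted peak lists. It assumes that matching two peaks costs their absolute RT difference and that leaving a peak unmatched costs a fixed gap penalty; both the cost model and the names are hypothetical.

```python
def align_peaks(a, b, gap):
    """Globally align two RT-sorted peak lists; returns matched index pairs (i, j)."""
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match the two peaks
                cost[i - 1][j] + gap,                            # a[i-1] unmatched
                cost[i][j - 1] + gap,                            # b[j-1] unmatched
            )
    # Trace back from the bottom-right corner to recover the matches.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

# Retention times of peaks at one m/z in two samples (hypothetical values).
sample_a = [10.0, 20.0, 30.0, 45.0]
sample_b = [10.5, 30.4, 44.0, 60.0]
matches = align_peaks(sample_a, sample_b, gap=2.0)
print(matches)
```

Because the alignment is monotone, matched peaks never cross in elution order. The gap penalty controls how large an RT difference is tolerated before two peaks are treated as distinct features.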

Peak alignment: First align the m/z dimension by binning. Then use kernel density estimation to find “meta-peaks”. Anal Chem. 2006 Feb 1;78(3):779-87.
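A rough sketch of this two-step grouping follows. It assumes fixed-width m/z bins and a Gaussian kernel density over the retention times of the peaks pooled from all samples within each bin, with density modes taken as meta-peaks; the bin width, bandwidth, and function names are hypothetical.

```python
import numpy as np

def kde(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    z = (grid[:, None] - values[None, :]) / bandwidth
    norm = len(values) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def meta_peaks(peaks, mz_bin_width, rt_bandwidth, rt_step=1.0):
    """Find meta-peaks as RT density modes within each m/z bin.

    peaks: array of shape (n, 2), columns (m/z, RT), pooled across samples.
    Returns a list of (m/z bin center, RT of density mode).
    """
    bins = np.floor(peaks[:, 0] / mz_bin_width).astype(int)
    found = []
    for b in np.unique(bins):
        rts = peaks[bins == b, 1]
        grid = np.arange(rts.min() - 3 * rt_bandwidth,
                         rts.max() + 3 * rt_bandwidth + rt_step, rt_step)
        density = kde(rts, grid, rt_bandwidth)
        is_mode = (density[1:-1] > density[:-2]) & (density[1:-1] >= density[2:])
        for rt in grid[1:-1][is_mode]:
            found.append(((b + 0.5) * mz_bin_width, float(rt)))
    return found

# Peaks pooled from three samples (hypothetical values): two compounds share an
# m/z bin but elute far apart, and a third compound has a different m/z.
peaks = np.array([
    [200.02, 100.0], [200.03, 102.0], [200.02, 101.0],
    [200.03, 300.0], [200.02, 298.0], [200.03, 301.0],
    [350.11, 150.0], [350.12, 151.0], [350.11, 149.0],
])
found = meta_peaks(peaks, mz_bin_width=0.1, rt_bandwidth=5.0)
print(found)
```

Fixed bins can split a group whose m/z values straddle a bin boundary, so a real implementation would need overlapping bins or a similar safeguard.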

An example of the overall strategy in LC/MS metabolomics Anal Chem. 2006 Feb 1;78(3):779-87.

On real data

Semi-supervised detection

Semi-supervised detection: to reduce false positives.

Semi-supervised detection: example of features found only by the hybrid approach.

The data matrix