Interactive Series Baseline Correction Algorithm
Andrey Bogomolov (a), Willem Windig (b), Susan M. Geer (c), Debra B. Blondell (c), and Mark J. Robbins (c)
(a) ACD/Labs, Russian Chemometrics Society, Moscow, Russia
(b) Eigenvector Research Inc., Rochester, NY, USA
(c) Eastman Kodak Company, Rochester, NY, USA
Baseline (Background) Problem
Baseline is an “eternal” issue in analytical data processing.
“Baseline” or “background”? There is no clear distinction:
baseline is associated with a smooth line reflecting a “physical” interference;
background tends to be used in a more general sense to designate any unwanted signal, including noise and chemical components.
We prefer the term “baseline” because smoothness of the background signal is the main assumption of the proposed correction algorithm.
Classical Approach to the Baseline Correction Problem
Classical baseline correction algorithms for a single curve have been described almost exhaustively in the literature.
The baseline to be subtracted is obtained by fitting a linear (or polynomial) function to nodes that lie in signal-free regions.
The nodes can be detected automatically by the software or placed manually by the user.
These methods are well suited to semi-automatic processing, where software-generated results need to be reviewed by a human expert.
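A minimal sketch of this classical single-curve approach, assuming the node indices have already been chosen in signal-free regions. It relies on NumPy's `polyfit` and `polyval`; the function name `single_curve_baseline` and the synthetic data are hypothetical illustrations, not code from the original work.

```python
import numpy as np


def single_curve_baseline(x, y, node_idx, degree=1):
    """Fit a polynomial to the curve values at the given nodes and
    evaluate it over the whole abscissa.

    The nodes are assumed to lie in signal-free regions of the curve.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    node_idx = np.asarray(node_idx)
    coeffs = np.polyfit(x[node_idx], y[node_idx], degree)
    return np.polyval(coeffs, x)


# Synthetic example: one Gaussian peak on a sloping baseline
x = np.linspace(0.0, 10.0, 101)
peak = np.exp(-((x - 5.0) ** 2) / 0.5)
y = peak + 0.2 + 0.05 * x
nodes = [0, 10, 20, 80, 90, 100]  # hypothetical nodes in signal-free regions
baseline = single_curve_baseline(x, y, nodes, degree=1)
corrected = y - baseline
```

With `degree=1` the baseline is a single straight line; higher degrees can follow curved baselines but tend to oscillate when the nodes are sparse.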
Serial (Batch) Methods
The development of two-dimensional spectroscopy and hyphenated techniques called for new methods applicable to data matrices.
Early work in this direction applied automated baseline correction algorithms to each individual curve of a matrix dataset.
The main problem with this approach is that it neglects internal (inter-spectral) correlations: instead of the expected rank reduction, it may introduce additional variance into the dataset.
It is also a “black-box” routine that is difficult to control.
Multivariate Background Correction
Multivariate data analysis had a revolutionary impact on the baseline problem in general.
The paradigm shift from hard (knowledge-driven) modeling to soft or self-modeling (data-driven) opened new horizons and introduced new concepts.
PLS provides a means to handle the background in the calibration context without subtracting it.
Orthogonal signal correction (OSC) by S. Wold turns the problem inside out: it eliminates from the data (X) the variance that is irrelevant for calibration (orthogonal to Y).
A number of other excellent algorithms…
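To make the OSC idea of “removing variance orthogonal to Y” concrete, here is a simplified single-component sketch in NumPy, assuming column-centred X and y. It orthogonalizes the first principal-component score of X against y and deflates X by that component. It deliberately omits the inner PLS iteration of Wold's published algorithm, so it is an illustration of the concept, not a reference OSC implementation; the function name `osc_one_component` and the synthetic data are hypothetical.

```python
import numpy as np


def osc_one_component(X, y):
    """Remove one y-orthogonal component from column-centred X.

    Simplified sketch (not Wold's full iterative OSC algorithm).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # First principal-component score vector of X (via SVD)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    t = U[:, :1] * s[0]
    # Remove the part of t explained by y, so t_orth is orthogonal to y
    b = (y.T @ t) / (y.T @ y)
    t_orth = t - y @ b
    # Loading for t_orth, then subtract the component t_orth p^T from X
    p = X.T @ t_orth / (t_orth.T @ t_orth)
    X_corr = X - t_orth @ p.T
    return X_corr, t_orth


# Synthetic example: a y-related signal plus a large y-unrelated offset
rng = np.random.default_rng(0)
n, m = 20, 50
y = rng.normal(size=n)
drift = rng.normal(size=n) * 5.0
X = np.outer(y, rng.normal(size=m)) + np.outer(drift, np.ones(m))
X = X - X.mean(axis=0)  # column-centre X
y = y - y.mean()        # centre y
X_corr, t_orth = osc_one_component(X, y)
```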
Our Objectives
Researchers typically concentrate on developing fully automated background correction methods.
Statement: the fuzzy character of the baseline problem casts doubt on the feasibility of automated (expert-free) baseline correction routines.
In contrast, we present an alternative approach that aims to maximize the control available to a human operator through simplicity, visualization, and an interactive, stepwise algorithm.
The Method
The method is applied to a series of curves (e.g., spectra or chromatograms) and consists of two distinct steps.
First, a prototype baseline is constructed from linear segments by selecting a set of nodes. To aid node selection, a mean curve is calculated to represent the entire series:
$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$,
where $x_{ij}$ is the value of the $i$-th curve at the $j$-th point and $n$ is the number of curves in the series.
Second, the prototype baseline is used to construct an individual baseline for each curve by moving the nodes vertically onto that curve; the resulting baseline is then subtracted from the curve.
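The two steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the abscissa is assumed to be increasing, the node positions are given as indices chosen on the mean curve (interactively, in the original method), and each individual baseline is made of linear segments whose node heights are read directly from that curve. The function name `series_baseline_correction` and the synthetic data are hypothetical.

```python
import numpy as np


def series_baseline_correction(axis, series, node_idx):
    """Two-step series baseline correction (sketch).

    axis     : 1-D increasing array of abscissa values
    series   : 2-D array, one curve per row
    node_idx : indices of baseline nodes chosen on the mean curve;
               include both ends, since np.interp extrapolates flat
    Returns (mean, prototype, baselines, corrected).
    """
    axis = np.asarray(axis, dtype=float)
    series = np.asarray(series, dtype=float)
    nodes = np.unique(node_idx)  # sorted, duplicates removed
    # Step 1: mean curve and prototype baseline (linear segments
    # through the mean curve at the nodes)
    mean = series.mean(axis=0)
    prototype = np.interp(axis, axis[nodes], mean[nodes])
    # Step 2: keep the node positions, move each node vertically onto
    # the individual curve ("snapping"), rebuild the segments, subtract
    baselines = np.array([np.interp(axis, axis[nodes], curve[nodes])
                          for curve in series])
    corrected = series - baselines
    return mean, prototype, baselines, corrected


# Synthetic example: two Gaussian components on drifting linear baselines
axis = np.arange(200, dtype=float)
s1 = np.exp(-(axis - 60.0) ** 2 / 50.0)
s2 = np.exp(-(axis - 140.0) ** 2 / 50.0)
c = np.linspace(0.0, 1.0, 10)
signal = np.outer(c, s1) + np.outer(1.0 - c, s2)
drift = np.outer(0.1 + 0.5 * c, np.ones_like(axis)) + np.outer(c, 0.002 * axis)
series = signal + drift
mean, prototype, baselines, corrected = series_baseline_correction(
    axis, series, [0, 30, 100, 170, 199])
```

Because linear interpolation is linear in the node heights, the mean of the individual baselines equals the prototype baseline drawn on the mean curve, so what the operator sees on the mean is, on average, exactly what is removed from the series.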
HPLC/DAD: Sample Data
[Figure panels: Raw; Calculating the mean; Selecting nodes; Subtracting the baseline; Corrected]
2nd Derivative for Node Selection
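The slide does not state how the second derivative is used. One plausible reading, offered here only as an assumption, is that baseline regions of the mean curve are locally linear and therefore have a second derivative close to zero, whereas peaks do not. The hypothetical helper `candidate_nodes` below flags such points. It requires the absolute second derivative to stay small over a whole window, because a single-point test would also select the inflection points on peak flanks, where the second derivative crosses zero; the `threshold` and `half_width` values are illustrative, not taken from the source.

```python
import numpy as np


def candidate_nodes(curve, threshold=0.01, half_width=3):
    """Hypothetical helper: indices where |second derivative| stays below
    threshold * max|second derivative| over a window of 2*half_width+1
    points, i.e. where the curve is locally linear."""
    curve = np.asarray(curve, dtype=float)
    d2 = np.abs(np.gradient(np.gradient(curve)))  # numerical 2nd derivative
    limit = threshold * d2.max()
    n = len(curve)
    flags = np.zeros(n, dtype=bool)
    for j in range(n):
        lo, hi = max(0, j - half_width), min(n, j + half_width + 1)
        flags[j] = d2[lo:hi].max() <= limit
    return np.flatnonzero(flags)


# Synthetic mean curve: one Gaussian peak on a sloping baseline
x = np.arange(200, dtype=float)
curve = np.exp(-(x - 100.0) ** 2 / 50.0) + 0.3 + 0.01 * x
nodes = candidate_nodes(curve)  # candidates for the operator to accept or reject
```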
Baseline Correction for Curve Resolution
Baseline correction is an application-specific preprocessing technique.
The present algorithm was developed to improve the performance of the SIMPLISMA (SIMPLe-to-use Interactive Self-modeling Mixture Analysis) curve resolution technique.
The algorithm has been used at Eastman Kodak Company for over 10 years for routine analysis of TGA/IR data, which represent a challenging case for curve resolution: many components, a high degree of overlap, and an intense background signal.
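One way to see why baselines hurt SIMPLISMA: in simplified form, its purity criterion for variable $j$ is $p_j = \sigma_j / (\mu_j + \alpha)$, the standard deviation over the mean across the spectra, where the offset $\alpha$ is a percentage of the largest mean. A constant baseline offset raises $\mu_j$ without changing $\sigma_j$, so the purity of otherwise pure variables drops. The sketch computes only this first purity spectrum; the correlation-based weights SIMPLISMA uses for subsequent pure variables are omitted, and the function name `simplisma_purity`, its default offset, and the data are hypothetical.

```python
import numpy as np


def simplisma_purity(D, offset_pct=10.0):
    """First SIMPLISMA purity spectrum (simplified sketch).

    D : 2-D array, one spectrum per row, one variable per column.
    The offset alpha damps low-intensity variables and avoids division
    by near-zero means.
    """
    D = np.asarray(D, dtype=float)
    mu = D.mean(axis=0)
    sigma = D.std(axis=0)
    alpha = offset_pct / 100.0 * mu.max()
    return sigma / (mu + alpha)


# Two components with pure variables near index 30 and index 70
x = np.arange(100, dtype=float)
s1 = np.exp(-(x - 30.0) ** 2 / 50.0)
s2 = np.exp(-(x - 70.0) ** 2 / 50.0)
c1 = np.linspace(0.0, 1.0, 10)
D = np.outer(c1, s1) + np.outer(1.0 - c1, s2)
D_raw = D + 0.5  # constant baseline offset on every spectrum
p_clean = simplisma_purity(D)
p_raw = simplisma_purity(D_raw)
first_pure = int(np.argmax(p_clean))  # SIMPLISMA's first pure variable
```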
TGA/IR Sample Data
Reprinted with permission from Eastman Kodak Company, 2005
Baseline Nature in TGA/IR
The most common causes of baseline drift in TGA/IR are temperature fluctuations over time, instrument drift, material scattering, impurities, an inappropriate background, etc.
In the present dataset, the baseline has miscellaneous causes.
The spectral domain is more suitable for series baseline correction because of its narrow peaks and explicit baseline regions.
TGA/IR: Baseline Correction
[Figure panels: Raw spectral series; Calculating the mean; “Snapping” the baseline; Subtracting; Raw; Corrected]
Reprinted with permission from Eastman Kodak Company, 2005
TGA/IR: Corrected Data Map
Reprinted with permission from Eastman Kodak Company, 2005
TGA/IR: SIMPLISMA Curve Resolution
Reprinted with permission from Eastman Kodak Company, 2005
IR Library Identification
Reprinted with permission from Eastman Kodak Company, 2005
Conclusions
A new interactive approach to the baseline correction problem has been proposed.
It allows traditional automated single-scan baseline correction routines to be adapted, or manual correction to be performed, on matrix data as if they were a single curve.
Advantages of the method include the “transparency” of the process and the means for extensive operator interaction.
The method has passed long-term testing in an industrial laboratory and has been integrated into a professional software package.
Despite the simplicity of the algorithm, it successfully eliminates baselines, even in complex cases such as TGA/IR data.
Acknowledgements
We thank Antony Williams for his friendly support and Michel Hachey for his help and valuable ideas.