Once again, how many points per peak?
Yuri Kalambet, Ampersand Ltd., Moscow
Yuri Kozmin, Institute of Bioorganic Chemistry, Moscow
Andrey Samokhin, Moscow State University, Chemical Department
Kalambet@ampersand.ru
Ampersand Ltd is a manufacturer of the chromatographic data station Chrom&Spec/МультиХром.
Why?
Second dimension in 2D chromatography
Slow scanning rate in …C-MS and fast chromatography
Fast chromatography
Experience (35+ years)
The Ampersand team has been developing chromatographic software since the early 1980s. New technologies have appeared that produce narrow peaks.
Estimate of peak integration error according to N. Dyson (J. Chromatography A, 1999, 382, 321-340)
Error of integration by the rectangle rule can be estimated as
$$I_{true} - I_{meas} = \frac{W_b^3}{12 n^2}\, h''(t) = \frac{\varepsilon^3 N}{12}\, f''(t)$$
Error of integration by Simpson’s rule is smaller than that by the rectangle/trapezoid rule
Proper area measurement requires 40 to 100 points per peak
Asymmetric (τ/σ=3) peaks require up to 2.5 times more points than Gaussian peaks to achieve similar uncertainty
Norman Dyson wrote a very useful and popular book, “Chromatographic Integration Methods”. Unfortunately, he also wrote a review article, “Peak distortion, data sampling errors and the integrator in the measurement of very narrow chromatographic peak”. Some conclusions from this article are listed on this slide. Most of them are misleading.
Peak-like function f(x)
Smooth
Analytic
The function and all its derivatives are “practically” zero outside a definite finite region
A chromatographic peak is a peak-like function
This is just a definition of some properties of the peak function. We will need them later.
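Average (over frame shift) value of the derivative add-on
We consider the integral of a peak-like function over the complete peak region.
Grid node coordinates: $x_i = \frac{h}{2} + i \cdot h + \tau$
Average value of the derivative add-on:
$$\sum_{i=0}^{N} \int_{-h/2}^{h/2} K f^{(2k)}(x_i)\, d\tau = K \int_{0}^{(N+1)h} f^{(2k)}(\tau)\, d\tau = K f^{(2k-1)}\big((N+1) h\big) - K f^{(2k-1)}(0) = 0$$
The result is zero because a peak-like function and all its derivatives vanish at both ends of the peak region.
Hence, A0 (the rectangle rule area) is an unbiased estimate of peak area.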
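The frame-shift argument above can be checked numerically. The sketch below is not taken from the presentation: it assumes a unit-area Gaussian with sigma = 1, and the helper names (`gaussian`, `rectangle_area`, `shift_scan`) are hypothetical. It sums the rectangle rule for many grid shifts within one step and reports the mean area and the worst single-grid error.

```python
import math

def gaussian(x, sigma=1.0):
    """Unit-area Gaussian peak centred at zero (assumed test peak)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def rectangle_area(step, shift, sigma=1.0, half_width=12.0):
    """Rectangle-rule area for nodes at i*step + shift covering at least +/- half_width sigma."""
    n = int(half_width * sigma / step) + 1
    return step * sum(gaussian(i * step + shift, sigma) for i in range(-n, n + 1))

def shift_scan(step, n_shifts=200):
    """Return (mean area, worst absolute error) over one full step of grid shifts."""
    areas = [rectangle_area(step, (j / n_shifts - 0.5) * step) for j in range(n_shifts)]
    return sum(areas) / n_shifts, max(abs(a - 1.0) for a in areas)

# A deliberately coarse grid: step = 2 sigma, i.e. 0.5 points per sigma.
mean_area, worst = shift_scan(step=2.0)
print(f"mean area = {mean_area:.9f}, worst single-grid error = {worst:.4f}")
```

Even on this coarse grid the mean over shifts stays at the true area, while an individual grid position can be off by more than one percent; that is the sense in which the estimate is unbiased.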
Integration rules Simpson’s rule Trapezoid rule Just a reminder about peak quantification and integration rules.
Tips and tricks of integration rules
All adequately implemented integration methods give the same result for a standalone peak, equal to that of the rectangle/trapezoidal rule
Rectangle rule weights: [0 1 1 1 1 … 1 1 1 1 0]
Trapezoid rule weights: [0 1 2 2 2 … 2 2 2 1 0]/2
Integration limits for the composite midpoint rectangle rule run from x0 − h/2 to xN + h/2; integration limits for the trapezoidal rule run from x0 to xN. Adjusting the integration limits to {x0, xN} leaves only one rule, the trapezoidal one
Simpson’s rule has two implementations, differing by start point. Their average, ([0 1 4 2 4 2 … 2 4 2 4 1 0]/3 + [0 0 1 4 2 4 2 … 2 4 2 4 1 0]/3)/2 = [0 1 5 6 6 6 6 … 6 6 6 5 1 0]/6, gives a result identical to that of the trapezoidal rule
The difference between integration methods is limited to the peak boundaries, where the peak function equals zero
Rules with alternating coefficients are not robust to periodic noise
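The weight arithmetic on this slide can be reproduced with exact fractions. The following is a minimal sketch, with hypothetical helper names, that builds two Simpson weight vectors offset by one node, averages them, and compares the result with trapezoidal weights over the same nodes (all weights are in units of the step h).

```python
from fractions import Fraction

def simpson_pattern(n_intervals):
    """Simpson weights 1,4,2,...,2,4,1 (divided by 3) for an even number of intervals."""
    if n_intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    w = [Fraction(2, 3)] * (n_intervals + 1)
    for i in range(1, n_intervals, 2):
        w[i] = Fraction(4, 3)
    w[0] = w[-1] = Fraction(1, 3)
    return w

def averaged_simpson(n_intervals):
    """Average of two Simpson rules shifted by one node, on n_intervals + 2 nodes."""
    s = simpson_pattern(n_intervals)
    first = s + [Fraction(0)]    # rule starting at node 0
    second = [Fraction(0)] + s   # same rule starting at node 1
    return [(a + b) / 2 for a, b in zip(first, second)]

def trapezoid_weights(n_nodes):
    """Trapezoidal weights 1/2, 1, ..., 1, 1/2."""
    w = [Fraction(1)] * n_nodes
    w[0] = w[-1] = Fraction(1, 2)
    return w

avg = averaged_simpson(8)
trap = trapezoid_weights(len(avg))
print("average * 6:   ", [str(w * 6) for w in avg])
print("difference * 6:", [str((a - t) * 6) for a, t in zip(avg, trap)])
```

The averaged weights come out as [1 5 6 … 6 5 1]/6, and the difference from the trapezoidal weights is confined to the two outermost nodes on each side, where a standalone peak is practically zero.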
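Proper averaging of composite Simpson’s rules

Rule 1
Node index           -1    0    1    2    3    4    5    6    7   Divisor
Simpson 1             0    4   16    8   16    8   16    8   16   12
Simpson 2             0    0    4   16    8   16    8   16    8   12
Simpson 2 add-on     -1    8    5    0    0    0    0    0    0   12
Average (Rule1)      -1   12   25   24   24   24   24   24   24   24
Trapezoidal           0   12   24   24   24   24   24   24   24   24
Difference           -1    0    1    0    0    0    0    0    0   24

Rule 2
Node index           -1    0    1    2    3    4    5    6    7   Divisor
Simpson 1             0    4   16    8   16    8   16    8   16   12
Simpson 2             0    0    4   16    8   16    8   16    8   12
Simpson 2 add-on      0    5    8   -1    0    0    0    0    0   12
Average (Rule2)       0    9   28   23   24   24   24   24   24   24
Trapezoidal           0   12   24   24   24   24   24   24   24   24
Difference            0   -3    4   -1    0    0    0    0    0   24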
Euler-Maclaurin formula
$$\int_{x_0}^{x_N} f(x)\,dx = h\left[\sum_{i=0}^{N} f(x_i) - \frac{f(x_0)+f(x_N)}{2}\right] + \sum_{k=1}^{\infty} \frac{h^{2k} B_{2k}}{(2k)!}\left[f^{(2k-1)}(x_0) - f^{(2k-1)}(x_N)\right]$$
Both Rule 1 and Rule 2 are implementations of the Euler-Maclaurin formula with k=1
Rule 1 uses the derivative estimate f’(x0) ≈ (f(x1) − f(x−1))/2h; Rule 2 uses f’(x0) ≈ (−3f(x0) + 4f(x1) − f(x2))/2h
Last time I came close to deriving a proper formula for the error estimate, but I was not the first. Euler and Maclaurin managed to derive this formula 300 years earlier.
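As a cross-check of how Rule 1 and Rule 2 follow from the k=1 term, the sketch below adds the left-end correction h²/12 · f’(x0) (using B2 = 1/6) to the trapezoidal weights, with f’(x0) replaced by each finite-difference stencil. The function name `start_weights` is hypothetical, and only the left end of the integration range is treated; the right end is symmetric.

```python
from fractions import Fraction

def start_weights(rule, n_shown=5):
    """Leading weights (units of h) for nodes -1, 0, 1, 2, ... of the
    trapezoidal rule plus the first Euler-Maclaurin term at the left end."""
    # Trapezoidal weights: node -1 is outside the range, node 0 gets 1/2.
    w = [Fraction(0), Fraction(1, 2)] + [Fraction(1)] * (n_shown - 2)
    if rule == 1:    # f'(x0) ~ (f(x1) - f(x-1)) / 2h
        stencil = {-1: Fraction(-1, 2), 1: Fraction(1, 2)}
    elif rule == 2:  # f'(x0) ~ (-3 f(x0) + 4 f(x1) - f(x2)) / 2h
        stencil = {0: Fraction(-3, 2), 1: Fraction(2), 2: Fraction(-1, 2)}
    else:
        raise ValueError("rule must be 1 or 2")
    # Correction h^2/12 * f'(x0) becomes h/12 * stencil coefficient per node.
    for node, coeff in stencil.items():
        w[node + 1] += coeff / 12
    return w

for rule in (1, 2):
    print(f"Rule {rule}, weights * 24:", [str(x * 24) for x in start_weights(rule)])
```

With a divisor of 24 this gives −1, 12, 25, 24, … for Rule 1 and 0, 9, 28, 23, 24, … for Rule 2, consistent with the 25, 24 and 9, 28, 23, 24 entries of the averaged Simpson rows.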
Tips and tricks of integration rules 2
The (most) efficient rule of peak integration is the trapezoidal rule
Simpson’s weights: w = {1,4,2,…,4,2,4,1}/3 = {1,2,2,…,2,2,2,1}/3 + {0,2,0,…,2,0,2,0}/3
Simpson’s rule = (2/3 of the trapezoidal rule with step h) + (1/3 of the rectangle rule with step 2h)
Let us assume that for the trapezoidal rule ΔA = O(h²) = E_h, so that E_2h = 4E_h; then E_Simpson ≈ 2E_h/3 + E_2h/3 = 2E_h/3 + 4E_h/3 = 2E_h
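The decomposition of Simpson’s rule can be verified directly on a sampled peak. This is an illustrative sketch only: the Gaussian test peak, the step of 1.25 sigma and the function names are assumptions, not values from the presentation.

```python
import math

def gaussian(x):
    """Unit-area Gaussian with sigma = 1 (assumed test peak)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def trapezoid(y, h):
    """Composite trapezoidal rule with step h."""
    return h * (sum(y) - 0.5 * (y[0] + y[-1]))

def midpoint_rectangle_2h(y, h):
    """Midpoint rectangle rule with step 2h, sampled at the odd nodes."""
    return 2.0 * h * sum(y[1::2])

def simpson(y, h):
    """Composite Simpson's rule; needs an odd number of nodes."""
    if len(y) % 2 == 0:
        raise ValueError("Simpson's rule needs an even number of intervals")
    return h / 3.0 * (y[0] + y[-1] + 4.0 * sum(y[1:-1:2]) + 2.0 * sum(y[2:-1:2]))

h = 1.25                      # 0.8 points per sigma
n_intervals = 16              # nodes from -10 sigma to +10 sigma
y = [gaussian(-10.0 + i * h) for i in range(n_intervals + 1)]

area_t = trapezoid(y, h)
area_r = midpoint_rectangle_2h(y, h)
area_s = simpson(y, h)
print(f"trapezoid error: {area_t - 1.0:.2e}")
print(f"Simpson error:   {area_s - 1.0:.2e}")
print(f"2/3*T + 1/3*R:   {2.0 * area_t / 3.0 + area_r / 3.0 - 1.0:.2e}")
```

In this example the Simpson error is dominated by the 2h rectangle term and is far larger than the trapezoidal error; the size of the gap depends on the peak shape and step, so it should not be read as a general ratio.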
Modelling of EMG peaks
No noise
Height = 100000 counts
Sigma = 0.35…8 points
Tau/Sigma = 0 (Gaussian); 1 (EMG-1); 3 (EMG-3)
100 peaks
Inter-peak distance = N*Sigma + 0.01
$\sigma_{EMG}^2 = \sigma_G^2 + \tau^2$
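For readers who want to reproduce the model peaks, here is a minimal sketch of an exponentially modified Gaussian (EMG) in its standard closed form, together with a numerical check of the variance relation. The function names, the area-based normalisation (the slide specifies height instead) and the integration limits are assumptions made for illustration.

```python
import math

def emg(x, mu=0.0, sigma=1.0, tau=1.0, area=1.0):
    """Exponentially modified Gaussian; plain Gaussian when tau == 0."""
    if tau == 0.0:
        return area * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    exponent = sigma * sigma / (2.0 * tau * tau) - (x - mu) / tau
    z = (sigma / tau - (x - mu) / sigma) / math.sqrt(2.0)
    return area / (2.0 * tau) * math.exp(exponent) * math.erfc(z)

def moments(f, lo, hi, step):
    """Area, mean and variance of f by a fine rectangle sum on [lo, hi]."""
    n = int(round((hi - lo) / step))
    xs = [lo + i * step for i in range(n + 1)]
    ys = [f(x) for x in xs]
    m0 = step * sum(ys)
    m1 = step * sum(x * y for x, y in zip(xs, ys)) / m0
    var = step * sum((x - m1) ** 2 * y for x, y in zip(xs, ys)) / m0
    return m0, m1, var

# EMG-3 case: tau / sigma = 3.
m0, m1, var = moments(lambda x: emg(x, sigma=1.0, tau=3.0), -10.0, 80.0, 0.05)
print(f"area = {m0:.6f}, mean = {m1:.4f}, variance = {var:.4f}")
```

The closed form can overflow in `math.exp` for points very far to the left of the peak when tau is small relative to sigma, so the limits here are kept within a moderate range.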
Average (for frame shift) sum of f(xn) Gaussian peak; Area vs peak number
Maximum error vs. data rate (modelling)
Sufficient data rates (points per sigma)
                 Trapezoidal rule      Simpson's rule
Threshold        1%        0.1%        1%        0.1%
Gaussian         0.52      0.62        0.92      1.15
EMG-1            0.65      0.80        1.14      1.46
EMG-3            1.28      1.64        2.17      2.96
Simpson’s rule requires a data rate about 1.8 times higher than the trapezoidal rule
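The Gaussian entries for the trapezoidal rule can be approximated with a short noise-free simulation. The sketch below is an assumption-laden illustration rather than the authors' modelling code: it uses a unit-area Gaussian with sigma = 1, scans grid shifts within one step, and bisects for the lowest data rate whose worst-case area error stays below the threshold. The EMG and Simpson columns are not reproduced here.

```python
import math

def worst_area_error(points_per_sigma, n_shifts=64):
    """Worst relative area error of the rectangle/trapezoidal rule over grid shifts."""
    h = 1.0 / points_per_sigma            # step in units of sigma
    n = int(12.0 / h) + 2                 # cover at least +/- 12 sigma
    norm = math.sqrt(2.0 * math.pi)
    worst = 0.0
    for j in range(n_shifts):
        shift = (j / n_shifts - 0.5) * h
        area = h * sum(math.exp(-0.5 * (i * h + shift) ** 2) for i in range(-n, n + 1)) / norm
        worst = max(worst, abs(area - 1.0))
    return worst

def sufficient_rate(threshold, lo=0.2, hi=2.0, iterations=40):
    """Bisect for the smallest points-per-sigma value meeting the error threshold."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if worst_area_error(mid) > threshold:
            lo = mid
        else:
            hi = mid
    return hi

rate_1pct = sufficient_rate(0.01)
rate_01pct = sufficient_rate(0.001)
print(f"1% threshold:   {rate_1pct:.3f} points per sigma")
print(f"0.1% threshold: {rate_01pct:.3f} points per sigma")
```

Under these assumptions the two values land close to the 0.52 and 0.62 points per sigma listed for the Gaussian peak.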
Peak moments
M0: Area
M1: Retention time
M2 (central): Dispersion; Dispersion^(1/2) = Standard deviation ≡ Sigma
M3 (central): Skewness = M3/Sigma³
Any peak moment is a restricted function
Peak moments
$$M_i = \int_{-\infty}^{\infty} x^i P(x)\, dx$$
M0: Area
M1: Retention time
M2 (central): Dispersion; Dispersion^(1/2) = Standard deviation ≡ Sigma
M3 (central): Skewness = M3/Sigma³

                 Average retention M1     Standard deviation σ     Asymmetry τ
Threshold        0.1∙σtrue   0.01∙σtrue   0.1∙σtrue   0.01∙σtrue   0.1∙σtrue   0.01∙σtrue
Gaussian         0.45        0.58         0.48        0.61         0.77        0.99
EMG-1            0.53        0.71         0.54        0.74         0.60        0.79
EMG-3            0.88        1.33         0.75        1.29         1.17
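Moments of a sampled peak can be estimated with the same rectangle sums used for the area. The sketch below assumes equally spaced samples and normalises the higher moments by the area; the function name `peak_moments` and the test peak (a Gaussian at 1 point per sigma, centred at 0.3) are illustrative choices, not taken from the slides.

```python
import math

def peak_moments(times, signal):
    """Area, mean retention, standard deviation and skewness
    from equally spaced samples, using rectangle sums."""
    h = times[1] - times[0]
    m0 = h * sum(signal)
    m1 = h * sum(t * s for t, s in zip(times, signal)) / m0
    m2 = h * sum((t - m1) ** 2 * s for t, s in zip(times, signal)) / m0
    m3 = h * sum((t - m1) ** 3 * s for t, s in zip(times, signal)) / m0
    sigma = math.sqrt(m2)
    return m0, m1, sigma, m3 / sigma ** 3

# Assumed test peak: unit-area Gaussian, sigma = 1, centre at 0.3, one point per sigma.
times = [float(i) for i in range(-12, 13)]
signal = [math.exp(-0.5 * (t - 0.3) ** 2) / math.sqrt(2.0 * math.pi) for t in times]

area, retention, sigma, skewness = peak_moments(times, signal)
print(f"area = {area:.6f}, M1 = {retention:.6f}, sigma = {sigma:.6f}, skewness = {skewness:.2e}")
```

At this data rate all four estimates agree with the true values of the test peak to several decimal places, which illustrates why moments remain usable for quite narrow peaks.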
Trapezoidal rule estimate of area is efficient, consistent, but biased? “True” area has the lowest probability
Integration of non-peak regions (partial integration of Gaussian)
Rules 1 & 2 often perform worse, and sometimes better, than Simpson’s rule
Rule (1+2)/2 performs “almost” always better than Simpson’s rule
The Euler-Maclaurin rule with an analytically calculated 1st derivative is always the best
Excel table available
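A single case of partial integration can be worked through as follows. This is only a sketch under stated assumptions: a unit-area Gaussian integrated from −1 to +1 sigma with four intervals, with the k=1 Euler-Maclaurin correction computed from the analytic first derivative. The function names are hypothetical, and one example does not establish the general ranking.

```python
import math

def phi(x):
    """Unit-area Gaussian with sigma = 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def dphi(x):
    """Analytic first derivative of phi."""
    return -x * phi(x)

def compare(a=-1.0, b=1.0, n_intervals=4):
    """Signed errors of three rules on the partial integral of phi over [a, b]."""
    h = (b - a) / n_intervals
    y = [phi(a + i * h) for i in range(n_intervals + 1)]
    exact = 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))
    trap = h * (sum(y) - 0.5 * (y[0] + y[-1]))
    simp = h / 3.0 * (y[0] + y[-1] + 4.0 * sum(y[1:-1:2]) + 2.0 * sum(y[2:-1:2]))
    # First Euler-Maclaurin term: h^2 * B2 / 2! * (f'(a) - f'(b)), with B2 = 1/6.
    em = trap + h * h / 12.0 * (dphi(a) - dphi(b))
    return {"trapezoid": trap - exact, "simpson": simp - exact, "euler_maclaurin": em - exact}

errors = compare()
for name, err in errors.items():
    print(f"{name:16s} error = {err:+.2e}")
```

In this case the corrected trapezoidal result is closer to the exact value than Simpson’s rule, and both are much closer than the uncorrected trapezoidal rule.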
Conclusions
The rectangle or trapezoidal rule is the best (and only) way of peak integration, giving an efficient estimate of peak area
Peak area can be measured reliably (within 0.1%) at 0.62 points per sigma for symmetric peaks and at up to 1.6 points per sigma for strongly asymmetric peaks
Peak moments can be used to evaluate the shape properties of narrow peaks
In the case of partial peak integration, the Euler-Maclaurin rule significantly outperforms Simpson’s rule