Once again, how many points per peak?

Presentation transcript:

Once again, how many points per peak?
Ampersand Ltd is a manufacturer of the chromatographic data station Chrom&Spec/МультиХром.
Yuri Kalambet, Ampersand Ltd., Moscow
Yuri Kozmin, Institute of Bioorganic Chemistry, Moscow
Andrey Samokhin, Moscow State University, Chemical Department
Kalambet@ampersand.ru

Why?
Second dimension (second D) in 2D chromatography
Slow scanning rate in …C-MS combined with fast chromatography
Fast chromatography
Experience (35+ years): the Ampersand team has been developing chromatographic software since the early 1980s.
New technologies have appeared that produce narrow peaks.

Estimate of peak integration error according to N. Dyson (J. Chromatography A, 1999, 382, 321-340)
The error of integration by the rectangle rule can be estimated as
$$I_{true} - I_{meas} = \frac{W_b^3}{12 n^2}\, h''(t) = \frac{\varepsilon^3 N}{12}\, f''(t)$$
The error of integration by Simpson's rule is smaller than that by the rectangle/trapezoid rule.
Proper area measurement requires 40 to 100 points per peak.
Asymmetric (τ/σ=3) peaks require up to 2.5 times more points than Gaussian peaks to achieve similar uncertainty.
Norman Dyson wrote a very useful and popular book, “Chromatographic Integration Methods”. Unfortunately, he also wrote a review article, “Peak distortion, data sampling errors and the integrator in the measurement of very narrow chromatographic peak”. Some conclusions from this article are listed on the slide. Most of them are misleading.

Peak-like function f(x)
Smooth
Analytic
The function and all its derivatives are “practically” zero outside a definite finite region
A chromatographic peak is a peak-like function.
This is just a definition of some properties of the peak function. We will need them later.

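Average (over frame shift) value of the derivative add-on
We consider the integral of a peak-like function over the complete peak region.
Grid node coordinates: $x_i = \frac{h}{2} + i\,h + \tau$
Average value of the derivative add-on:
$$\sum_{i=0}^{N} \int_{-h/2}^{h/2} K f^{(2k)}(x_i)\, d\tau = K \int_{0}^{(N+1)h} f^{(2k)}(\tau)\, d\tau = K f^{(2k-1)}\big((N+1)h\big) - K f^{(2k-1)}(0) = 0$$
Both boundary terms vanish because all derivatives of a peak-like function are practically zero outside the peak region. Hence $A_0$ (the rectangle rule area) is an unbiased estimate of the peak area.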
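The unbiasedness claim can be checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: a unit-area Gaussian with sigma = 1, a sampling step h = 2 (0.5 points per sigma) and 200 equally spaced frame shifts. The names `gaussian` and `rectangle_area` are hypothetical helpers.

```python
import math

def gaussian(x, sigma=1.0):
    """Unit-area Gaussian centred at zero (assumed peak model)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def rectangle_area(h, shift, half_width=12.0):
    """Rectangle-rule area: h times the sum of samples taken at i*h + shift."""
    n = int(half_width / h) + 1
    return h * sum(gaussian(i * h + shift) for i in range(-n, n + 1))

h = 2.0  # assumed step: 0.5 points per sigma
shifts = [h * j / 200 for j in range(200)]  # frame shifts covering one full step
areas = [rectangle_area(h, s) for s in shifts]

mean_area = sum(areas) / len(areas)             # average over frame shift
worst_error = max(abs(a - 1.0) for a in areas)  # largest single-shift error

print(f"mean area over frame shifts: {mean_area:.9f}")
print(f"worst single-shift error:    {worst_error:.4%}")
```

Individual frame shifts give areas that deviate visibly from the true value at this low data rate, while the average over all shifts recovers the true area.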

Integration rules
Simpson’s rule; trapezoid rule.
Just a reminder about peak quantification and integration rules.

Tips and tricks of integration rules
All adequately implemented integration methods give the same result for a standalone peak, equal to that of the rectangle/trapezoidal rule.
Rectangle rule weights: [0 1 1 1 1 … 1 1 1 1 0]
Trapezoid rule weights: [0 1 2 2 2 … 2 2 2 1 0]/2
Integration limits for the composite midpoint rectangle rule run from $x_0 - h/2$ to $x_N + h/2$; integration limits for the trapezoidal rule run from $x_0$ to $x_N$. Adjusting the integration limits to $\{x_0, x_N\}$ leaves only one rule, the trapezoidal one.
Simpson’s rule has two implementations, differing by start point. Their average,
([0 1 4 2 4 2 … 2 4 2 4 1 0]/3 + [0 0 1 4 2 4 2 … 2 4 2 4 1 0]/3)/2 = [0 1 5 6 6 6 6 … 6 6 6 5 1 0]/6,
gives a result identical to that of the trapezoidal rule.
The difference between integration methods is limited to the peak boundaries, where the peak function equals zero.
Rules with alternating coefficients are not robust to periodic noise.
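The averaging of the two Simpson implementations can be verified with exact fractions. This is a minimal sketch; the 12-node window, the 8-interval span and the helper name `simpson_weights` are illustrative assumptions rather than anything taken from the slides.

```python
from fractions import Fraction

def simpson_weights(total, start, intervals):
    """Composite Simpson weights 1,4,2,...,4,1 (divided by 3, in units of h)
    placed on nodes start..start+intervals; all other nodes get zero."""
    assert intervals % 2 == 0, "Simpson's rule needs an even number of intervals"
    w = [Fraction(0)] * total
    for j in range(intervals + 1):
        if j == 0 or j == intervals:
            c = 1
        elif j % 2 == 1:
            c = 4
        else:
            c = 2
        w[start + j] = Fraction(c, 3)
    return w

total = 12
simpson_1 = simpson_weights(total, start=1, intervals=8)  # starts at node 1
simpson_2 = simpson_weights(total, start=2, intervals=8)  # starts one node later
average = [(a + b) / 2 for a, b in zip(simpson_1, simpson_2)]

# Trapezoidal weights over the union of both spans (nodes 1..10).
trapezoid = [Fraction(0)] * total
for i in range(1, 11):
    trapezoid[i] = Fraction(1, 2) if i in (1, 10) else Fraction(1)

# Nodes where the averaged Simpson rule and the trapezoidal rule disagree.
differing_nodes = [i for i in range(total) if average[i] != trapezoid[i]]

print([int(w * 6) for w in average], "/ 6")
print("weights differ only at nodes", differing_nodes)
```

The averaged weights differ from the trapezoidal ones only at the two outermost node pairs, which is where a standalone peak is already zero.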

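Proper averaging of composite Simpson’s rules
[Table of node weights near the left integration limit; the Simpson 1, Simpson 2 and Simpson 2 add-on rows are not reproduced here.]

Node index          -1    0    1    2    3    4    5    6    7   Divisor
Average (Rule 1)    -1   12   25   24   24   24   24   24   24   24
Trapezoidal          0   12   24   24   24   24   24   24   24   24
Difference          -1    0    1    0    0    0    0    0    0   24

Node index          -1    0    1    2    3    4    5    6    7   Divisor
Average (Rule 2)     0    9   28   23   24   24   24   24   24   24
Trapezoidal          0   12   24   24   24   24   24   24   24   24
Difference           0   -3    4   -1    0    0    0    0    0   24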

Euler-Maclaurin formula
$$\int_{x_0}^{x_N} f(x)\,dx = h\left[\sum_{i=0}^{N} f(x_i) - \frac{f(x_0)+f(x_N)}{2}\right] + \sum_{k=1}^{\infty} \frac{h^{2k} B_{2k}}{(2k)!}\left[f^{(2k-1)}(x_0) - f^{(2k-1)}(x_N)\right]$$
Both Rule 1 and Rule 2 are implementations of the Euler-Maclaurin formula with k=1.
Rule 1 uses the derivative estimate $f'(x_0) \approx (f(x_1) - f(x_{-1}))/(2h)$; Rule 2 uses $f'(x_0) \approx (-3f(x_0) + 4f(x_1) - f(x_2))/(2h)$.
Last time I was close to deriving a proper formula for the error estimate, but I was not the first. Euler and Maclaurin managed to derive this formula 300 years earlier.
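The node weights implied by Rule 1 and Rule 2 at the left integration limit follow directly from the k=1 term, $\frac{h^2}{12} f'(x_0)$, once the derivative is replaced by its finite-difference estimate. The sketch below works this out with exact fractions; the dictionary layout and the name `corrected_weights` are illustrative assumptions.

```python
from fractions import Fraction

nodes = [-1, 0, 1, 2, 3]  # node indices around the left limit x0 (node 0)

# Trapezoidal weights in units of h: 1/2 at the limit, 1 inside, 0 outside.
trapezoid = {-1: Fraction(0), 0: Fraction(1, 2),
             1: Fraction(1), 2: Fraction(1), 3: Fraction(1)}

# Finite-difference stencils for f'(x0), written as coefficients c_j of
# f(x_j) in the numerator of sum(c_j * f(x_j)) / (2h).
rule1_stencil = {-1: -1, 1: 1}        # central difference
rule2_stencil = {0: -3, 1: 4, 2: -1}  # one-sided three-point difference

def corrected_weights(stencil):
    """Trapezoidal weights plus the k=1 Euler-Maclaurin term.
    (h^2/12) * c_j / (2h) = c_j * h / 24, i.e. c_j/24 in units of h."""
    return [trapezoid[j] + Fraction(stencil.get(j, 0), 24) for j in nodes]

rule1 = [int(w * 24) for w in corrected_weights(rule1_stencil)]
rule2 = [int(w * 24) for w in corrected_weights(rule2_stencil)]

print("nodes :", nodes)
print("Rule 1:", rule1, "/ 24")
print("Rule 2:", rule2, "/ 24")
```

The same correction with the opposite sign applies at the right limit $x_N$, mirrored.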

Tips and tricks of integration rules 2
The (most) efficient rule of peak integration is the trapezoidal rule.
Simpson’s rule weights decompose as
w() = {1,4,2,…,4,2,4,1}/3 = {1,2,2,…,2,2,2,1}/3 + {0,2,0,…,2,0,2,0}/3
that is, Simpson’s rule = (2/3 of the trapezoidal rule with step h) + (1/3 of the rectangle rule with step 2h).
Let’s assume that for the trapezoidal rule $\Delta A = O(h^2) = E_h$, so that $E_{2h} = 4E_h$. Then
$$E_{Simpson} \approx \tfrac{2}{3}E_h + \tfrac{1}{3}E_{2h} = \tfrac{2}{3}E_h + \tfrac{4}{3}E_h = 2E_h$$
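The decomposition of Simpson’s rule can be checked on a sampled peak. The following sketch assumes a unit-area Gaussian sampled at one point per sigma, with the apex placed on an odd node; these choices, and the variable names, are illustrative and not taken from the slides.

```python
import math

def gaussian(x, center, sigma=1.0):
    """Unit-area Gaussian (assumed peak model)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

h = 1.0           # assumed step: one point per sigma
n_intervals = 24  # even, as Simpson's rule requires
center = 11.0     # apex on an odd node, far from both integration limits
f = [gaussian(i * h, center) for i in range(n_intervals + 1)]

# Trapezoidal rule with step h.
trapezoid_h = h * (sum(f) - 0.5 * (f[0] + f[-1]))
# Midpoint rectangle rule with step 2h: the midpoints are the odd nodes.
rectangle_2h = 2.0 * h * sum(f[1::2])
# Composite Simpson's rule with weights 1,4,2,...,4,1 over 3.
simpson = h / 3.0 * (f[0] + f[-1] + 4.0 * sum(f[1:-1:2]) + 2.0 * sum(f[2:-1:2]))

combined = 2.0 / 3.0 * trapezoid_h + 1.0 / 3.0 * rectangle_2h

print(f"Simpson - (2/3 T_h + 1/3 R_2h) = {simpson - combined:.2e}")
print(f"trapezoid error = {trapezoid_h - 1.0:.2e}")
print(f"Simpson error   = {simpson - 1.0:.2e}")
```

For a whole peak the Simpson result inherits the error of the rectangle rule at half the data rate, which is why it comes out less accurate than the trapezoidal rule in this example.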

Modelling of EMG peaks
No noise
Height = 100000 counts
Sigma = 0.35…8 points
Tau/Sigma = 0 (Gaussian); 1 (EMG-1); 3 (EMG-3)
100 peaks
Inter-peak distance = N*Sigma + 0.01
$\sigma_{EMG}^2 = \sigma_G^2 + \tau^2$

Average (for frame shift) sum of f(xn) Gaussian peak; Area vs peak number

Maximum error vs. data rate (modelling)

Sufficient data rates (points per sigma)

              Trapezoidal rule     Simpson's rule
Threshold     1%       0.1%        1%       0.1%
Gaussian      0.52     0.62        0.92     1.15
EMG-1         0.65     0.80        1.14     1.46
EMG-3         1.28     1.64        2.17     2.96

Simpson’s rule requires a data rate about 1.8 times higher than the trapezoidal rule.
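A rough re-creation of the modelling experiment is sketched below. Several points are assumptions rather than statements from the slides: peaks are normalised to unit area instead of a height of 100000 counts, the frame shift simply advances by 0.01 points from peak to peak, the data rate is interpreted as points per $\sigma_{EMG}$ while Tau/Sigma is taken relative to $\sigma_G$, and the function names are hypothetical.

```python
import math

def emg(t, sigma, tau, center=0.0):
    """Unit-area exponentially modified Gaussian; plain Gaussian when tau == 0."""
    x = t - center
    if tau == 0.0:
        return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    arg = (sigma / tau - x / sigma) / math.sqrt(2.0)
    return math.exp(0.5 * (sigma / tau) ** 2 - x / tau) * math.erfc(arg) / (2.0 * tau)

def split_sigma(sigma_emg, ratio):
    """Split sigma_EMG into (sigma_G, tau) using sigma_EMG^2 = sigma_G^2 + tau^2
    and tau = ratio * sigma_G (assumed meaning of Tau/Sigma)."""
    sigma_g = sigma_emg / math.sqrt(1.0 + ratio ** 2)
    return sigma_g, ratio * sigma_g

def max_area_error(sigma, tau, n_peaks=100, shift_step=0.01):
    """Largest relative error of the unit-step rectangle sum over n_peaks
    peaks whose frame shift grows by shift_step points each."""
    lo = -int(12 * sigma) - 2
    hi = int(12 * sigma + 40 * tau) + 2
    worst = 0.0
    for k in range(n_peaks):
        shift = k * shift_step
        area = sum(emg(i - shift, sigma, tau) for i in range(lo, hi + 1))
        worst = max(worst, abs(area - 1.0))
    return worst

gauss_err = max_area_error(*split_sigma(0.62, 0.0))  # Gaussian, 0.62 points per sigma
emg3_err = max_area_error(*split_sigma(1.64, 3.0))   # EMG-3, 1.64 points per sigma

print(f"Gaussian, 0.62 points/sigma: max error {gauss_err:.3%}")
print(f"EMG-3,    1.64 points/sigma: max error {emg3_err:.3%}")
```

Under these assumptions both maximum errors land close to the 0.1% threshold listed for the trapezoidal rule, although the exact figures depend on how the frame shifts are sampled.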

Peak moments
M0: area
M1: retention time
M2 (central): dispersion, with $\sqrt{M_2}$ = standard deviation ≡ σ
M3 (central): skewness = $M_3/\sigma^3$
Any peak moment is a restricted function

Peak moments
$$M_i = \int_{-\infty}^{\infty} x^i P(x)\,dx$$

              Average retention M1         Standard deviation σ         Asymmetry τ
Threshold     0.1∙σtrue    0.01∙σtrue      0.1∙σtrue    0.01∙σtrue      0.1∙σtrue    0.01∙σtrue
Gaussian      0.45         0.58            0.48         0.61            0.77         0.99
EMG-1         0.53         0.71            0.54         0.74            0.60         0.79

EMG-3 (only five values are given): 0.88 1.33 0.75 1.29 1.17
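Moments of a sampled peak can be estimated with plain rectangle sums. The sketch below assumes that the first moment is normalised by the area and that M2 and M3 are taken about the mean, as the “(central)” labels suggest; the helper name `peak_moments` and the test peak (Gaussian, sigma = 1 point, apex at 10.3) are illustrative.

```python
import math

def peak_moments(times, signal):
    """Area, mean retention (M1), standard deviation and skewness of a peak
    sampled on an equally spaced grid, using rectangle-rule sums."""
    h = times[1] - times[0]
    area = h * sum(signal)
    mean = h * sum(t * s for t, s in zip(times, signal)) / area
    m2 = h * sum((t - mean) ** 2 * s for t, s in zip(times, signal)) / area
    m3 = h * sum((t - mean) ** 3 * s for t, s in zip(times, signal)) / area
    sigma = math.sqrt(m2)
    return area, mean, sigma, m3 / sigma ** 3

# Assumed test peak: unit-area Gaussian sampled at one point per sigma.
center, sigma_true = 10.3, 1.0
times = [float(i) for i in range(25)]
signal = [math.exp(-0.5 * ((t - center) / sigma_true) ** 2)
          / (sigma_true * math.sqrt(2.0 * math.pi)) for t in times]

area, mean, sigma, skewness = peak_moments(times, signal)
print(f"area={area:.6f} M1={mean:.6f} sigma={sigma:.6f} skewness={skewness:.2e}")
```

Even at one point per sigma the recovered retention time and standard deviation stay close to the true values for this symmetric peak.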

Trapezoidal rule estimate of area is efficient, consistent, but biased? “True” area has the lowest probability

Integration of non-peak regions (partial integration of a Gaussian)
Rules 1 and 2 often perform worse, and sometimes better, than Simpson’s rule.
Rule (1+2)/2 performs “almost” always better than Simpson’s rule.
The Euler-Maclaurin rule with an analytically calculated first derivative is always the best.
An Excel table is available.
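One case of partial integration is easy to reproduce. The sketch below integrates the standard normal density from -1 to 1 with four intervals and compares the trapezoidal rule, Simpson’s rule and the k=1 Euler-Maclaurin correction with the analytic first derivative. The interval, the step and the function names are assumptions chosen for illustration; a single example does not establish the general ranking.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pdf_derivative(x):
    """Analytic first derivative of the standard normal density."""
    return -x * pdf(x)

a, b, n = -1.0, 1.0, 4  # assumed integration limits and (even) interval count
h = (b - a) / n
f = [pdf(a + i * h) for i in range(n + 1)]

# Reference value from the error function.
exact = 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))

trapezoid = h * (sum(f) - 0.5 * (f[0] + f[-1]))
simpson = h / 3.0 * (f[0] + f[-1] + 4.0 * sum(f[1:-1:2]) + 2.0 * sum(f[2:-1:2]))
# Euler-Maclaurin with k=1: add (h^2/12) * (f'(x0) - f'(xN)).
euler_maclaurin = trapezoid + h * h / 12.0 * (pdf_derivative(a) - pdf_derivative(b))

for name, value in (("trapezoid", trapezoid), ("Simpson", simpson),
                    ("Euler-Maclaurin", euler_maclaurin)):
    print(f"{name:16s} error = {value - exact:+.2e}")
```

In this example the Euler-Maclaurin result has the smallest error, followed by Simpson’s rule and then the uncorrected trapezoidal rule.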

Conclusions
The rectangle or trapezoidal rule is the best (and only) way of peak integration, giving an efficient estimate of peak area.
Peak area can be measured reliably (to 0.1%) at data rates from 0.62 points per sigma for symmetric peaks up to 1.6 points per sigma for strongly asymmetric peaks.
Peak moments can be used to evaluate shape properties of narrow peaks.
In the case of partial peak integration, the Euler-Maclaurin rule significantly outperforms Simpson’s rule.