How to solve authentication problems Semenov Institute of Chemical Physics, RAS Moscow Russian Chemometric Society Oxana Rodionova, Alexey Pomerantsev WSC 10
What is authentication? It is the process of determining whether an object is, in fact, what it is declared to be. Approaches: direct chemical analysis; quick, relatively cheap, often non-destructive measurements + chemometrics.
Typical authentication problems: counterfeit drugs; illegal additives in fuels; food adulteration; confirmation of geographical origin.
Discriminant analysis. Fisher Iris data (1936): setosa, versicolor, virginica.
Discrimination vs. class modeling. Figure label: target class.
Main differences between class modeling problems and discriminant problems
The goal. Class modeling: determination whether an object is, in fact, what it is declared to be. Discriminant: determination of the membership of an object in one of the predefined classes.
Data sets. Class modeling: objects that represent a target class. Discriminant: several sets of objects that represent predefined classes.
Statistical/chemometric methods. Class modeling: UNEQ, SIMCA, SVDD, etc. Discriminant: LDA, QDA, PLS-DA, SVM, etc.
Result of data modeling / decision rule development. Class modeling: decision rule for a given value. Discriminant: boundaries/delineators between classes.
Figures of merit. Class modeling: sensitivity is given a priori; specificity can be found theoretically when an alternative class is given. Discriminant: sensitivity and specificity are found empirically post factum.
Main steps of class modeling
1 Definition of a target class: objects which undoubtedly belong to the target class.
2 Data are divided into training and validation sets.
3 Data processing.
4 Establish a decision rule: acceptance area and/or values of thresholds.
5 Validation. Carefully trained decision rules have to be validated skeptically against new genuine objects.
6 Figures of merit: type I error, sensitivity, type II error, specificity.
Figures of merit
'Pure' one-class classifier: type I error α is the rate of wrong rejections of the target class samples. Sensitivity = (1 − α)·100%.
Availability of alien class/classes: type II error β is the rate of wrong acceptances of aliens. Specificity = (1 − β)·100%.
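The two figures of merit can be computed directly from accept/reject decisions. The following is a minimal sketch, not taken from the slides: the function names and the decision lists are hypothetical and serve only to illustrate the formulas for α, β, sensitivity and specificity.

```python
def sensitivity(accepted_targets):
    """Percentage of target-class samples that the rule accepts: (1 - alpha) * 100."""
    # alpha (type I error) is the share of genuine samples wrongly rejected.
    alpha = accepted_targets.count(False) / len(accepted_targets)
    return (1 - alpha) * 100


def specificity(accepted_aliens):
    """Percentage of alien samples that the rule rejects: (1 - beta) * 100."""
    # beta (type II error) is the share of alien samples wrongly accepted.
    beta = accepted_aliens.count(True) / len(accepted_aliens)
    return (1 - beta) * 100


# Hypothetical decisions: True means "accepted as a member of the target class".
target_decisions = [True] * 19 + [False]     # 1 of 20 genuine samples rejected
alien_decisions = [False] * 48 + [True] * 2  # 2 of 50 alien samples accepted

print(sensitivity(target_decisions))
print(specificity(alien_decisions))
```

Sensitivity needs only target-class samples, whereas specificity cannot be computed at all until some alien samples are available.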
PLS-DA. Training: fingerprints (matrix X, objects x11 … xnk from CLASS 1, CLASS 2 and CLASS 3) are related by PLS to a class membership matrix. Validation/prediction: for new fingerprints the PLS model returns predicted class membership values close to 1 or 0 (for example 1.01, 0.02, −0.05).
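A minimal sketch of the two ends of PLS-DA: coding class membership for training, and turning predicted membership values into a class label. The PLS regression itself is omitted. The "largest predicted value wins" rule is an assumption (it is one common choice, and the slides do not say which rule is used), and the predicted values are illustrative, not results from the talk.

```python
CLASSES = ["CLASS 1", "CLASS 2", "CLASS 3"]


def dummy_code(labels, classes=CLASSES):
    """Class membership matrix: 1 in the column of the object's class, 0 elsewhere."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


def assign_class(predicted_row, classes=CLASSES):
    """Assign the class whose predicted membership value is the largest (assumed rule)."""
    best = max(range(len(classes)), key=lambda j: predicted_row[j])
    return classes[best]


# Training response for three objects, one from each class.
Y_train = dummy_code(["CLASS 1", "CLASS 2", "CLASS 3"])

# Hypothetical predicted memberships for two new objects (illustrative numbers only).
y_member = [1.01, 0.02, -0.05]  # resembles a genuine member of CLASS 1
y_alien = [0.35, 0.41, 0.24]    # an object that belongs to none of the classes

print(assign_class(y_member))
print(assign_class(y_alien))
```

Under this rule every object receives one of the predefined labels, including the object that belongs to none of the classes.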
DD-SIMCA. PCA model of the target class; the acceptance area is defined in terms of the orthogonal distance v_i and the score distance h_i.
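A minimal sketch of how the score distance h_i and the orthogonal distance v_i can be obtained from a PCA model, assuming NumPy and synthetic data in place of real spectra. The acceptance rule at the end is a simplified stand-in (an empirical quantile of the scaled total distance). It is an assumption made for illustration and does not reproduce the chi-squared-based cutoff of the published DD-SIMCA method.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Mean-center the training data and decompose it with SVD."""
    mean = X_train.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    loadings = Vt[:n_components].T       # P, shape (variables, components)
    eigenvalues = s[:n_components] ** 2  # sum of squared training scores per component
    return mean, loadings, eigenvalues


def distances(X, mean, loadings, eigenvalues):
    """Return score distances h_i and orthogonal distances v_i for the rows of X."""
    Xc = X - mean
    scores = Xc @ loadings                # T
    residuals = Xc - scores @ loadings.T  # E
    h = np.sum(scores ** 2 / eigenvalues, axis=1)
    v = np.sum(residuals ** 2, axis=1)
    return h, v


rng = np.random.default_rng(0)
# Synthetic stand-in for target-class data: 2 latent factors plus small noise.
X_train = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(50, 10))
X_alien = X_train[:5] + 1.0 * rng.normal(size=(5, 10))  # heavily perturbed copies act as aliens

mean, P, lam = fit_pca(X_train, n_components=2)
h_train, v_train = distances(X_train, mean, P, lam)
h_alien, v_alien = distances(X_alien, mean, P, lam)

# Simplified acceptance rule (assumption): empirical (1 - alpha) quantile of h/h0 + v/v0.
h0, v0 = h_train.mean(), v_train.mean()
alpha = 0.05
cutoff = np.quantile(h_train / h0 + v_train / v0, 1 - alpha)
accepted_alien = (h_alien / h0 + v_alien / v0) <= cutoff
print(accepted_alien)
```

Only target-class samples are used to build the model and the cutoff; the alien samples appear only when the rule is checked.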
Example. Raw spectra: measurements in the diffuse reflection mode through a PVC blister. Working range: 7482–4056 cm⁻¹ (889 wavenumbers).
Data description. Calcium channel blocker, uncoated tablets (API 10 mg).
Columns: Name, Marker, Number of training objects, Number of validation objects, Tablet mass (mg).
A3 ▲ 50 20 200
A4 ■ 30 180
A7 ● 80
O.Ye. Rodionova, K.S. Balyklova, A.V. Titova, A.L. Pomerantsev, "Quantitative risk assessment in classification of drugs with identical API content", J. Pharm. Biomed. Anal. 2014, 98, 186-192.
Discriminant analysis. Score plots (PLS 1 vs. PLS 2) for target class A4 and target class A7. Sensitivity/specificity, PLS2-DA (3 PLS components): A3 100%, A4 97%, A7 96%.
Class modeling approach. Target class A4: training, validation/prediction.
DD-SIMCA. Sensitivity and specificity.
α = 0.05, expected/observed (%), A3 A4 A7: 95/96, 100/100, 95/95, 95/99.
α = 0.01, expected/observed (%), A3 A4 A7: 99/97, 100/100, 99/100, 99/99.
Inactive substance. FT-NIR DR spectra, Batch 1 and Batch 2. Training set: 15 samples. Test set: 10 samples.
Difference in modeling: PLS-DA vs. PCA. Panel labels: prediction, W-loading.
Conclusions
1 Class-modeling methods develop the acceptance area around the target class and thus delimit the target objects from any other objects and classes.
2 When using a one-class classifier, we should always account for the risk of misclassifying alien objects. It is important both to validate the model using an independent set of target objects and to verify the model against a wide variety of alien objects.
3 A well-constructed discrimination method will classify a new sample correctly only if this sample is a member of one of the predefined classes. If the new sample does not belong to any of these classes, discriminant analysis is unable to properly define the membership of the sample. Discriminant methods are inappropriate for solving authentication problems.
4 Every task at hand requires the chemometric method best suited to answering the posed question.
Thank you for your attention!