Published by Hendra Kusnadi. Modified over 5 years ago.
1
Data Driven SIMCA – more than One-Class Classifier
Oxana Rodionova, Alexey Pomerantsev. Semenov Institute of Chemical Physics, RAS, Moscow. Russian Chemometric Society. WSC-11
2
Soft Independent Modeling of Class Analogy (SIMCA)
S. Wold, "Pattern Recognition by Means of Disjoint Principal Components Models" (1976). Disjoint PCA class modeling. Cut-off levels are set using orthogonal distances. A new object is compared with each class by calculating the orthogonal distance. Many additions and modifications have followed. [Figure: axes t1, t2, t3]
3
Data driven approach
Projection. Orthogonal distance $v_i$. Score distance $h_i$.
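The two distances can be illustrated with a minimal NumPy sketch. It assumes the usual PCA definitions, $h_i = t_i^T (T^T T)^{-1} t_i$ for the score distance and $v_i = \sum_j e_{ij}^2$ for the orthogonal distance, together with column centering; the function name `pca_distances` is hypothetical and is not taken from the authors' software.

```python
import numpy as np

def pca_distances(X, n_components):
    """Score distances h and orthogonal distances v for each row of X."""
    Xc = X - X.mean(axis=0)                  # column centering (an assumption)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = n_components
    T = U[:, :A] * s[:A]                     # scores
    P = Vt[:A].T                             # loadings
    E = Xc - T @ P.T                         # residuals after projection
    # T^T T is diagonal with entries s_a^2, so h_i = sum_a t_ia^2 / s_a^2
    h = np.sum((T / s[:A]) ** 2, axis=1)
    v = np.sum(E ** 2, axis=1)               # v_i = sum_j e_ij^2
    return h, v

rng = np.random.default_rng(0)
h, v = pca_distances(rng.normal(size=(30, 8)), n_components=3)
```

With this normalization the score distances of the training objects sum to the number of components, which is a convenient sanity check.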
4
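Distribution of distances: DoF estimation
$x = h/h_0$ or $x = v/v_0$; $x_1, \dots, x_I \sim \chi^2(N)/N$, $N = ?$ Two estimators of $N$: the Method of Moments and the Interquartile Approach. The latter uses the ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(I-1)} \le x_{(I)}$, where the IQR is the range left after cutting off ¼ of the values at each end.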
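The Method of Moments can be sketched as follows. If a distance $d$ follows $d_0 \chi^2(N)/N$, then $E(d) = d_0$ and $D(d) = 2 d_0^2 / N$, so $N \approx 2 \bar{d}^2 / s^2$. The function name `dof_moments` is hypothetical, and returning a non-rounded $N$ is a choice made for this illustration.

```python
import numpy as np

def dof_moments(d):
    """Method-of-moments estimates of the scale d0 and the DoF N."""
    d = np.asarray(d, dtype=float)
    d0 = d.mean()                    # E(d) = d0
    var = d.var(ddof=1)              # D(d) = 2 * d0**2 / N
    n_dof = 2.0 * d0 ** 2 / var
    return d0, n_dof

# Synthetic check: distances drawn from 3 * chi2(5) / 5
rng = np.random.default_rng(0)
d0, n_dof = dof_moments(3.0 * rng.chisquare(5, size=20000) / 5)
```

Because the estimate relies on the sample variance, a few outlying objects can pull it far from the true value, which is the motivation for a robust alternative.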
5
Total Distance and Cut-off Level
Total distance (TD). The cut-off level is set for a given α, the rate of wrong rejections of target-class samples (the type I error).
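The slide's formulas are not present in this transcript. A minimal sketch, assuming the form given in the published DD-SIMCA papers: the total distance is $c = N_h h/h_0 + N_v v/v_0 \sim \chi^2(N_h + N_v)$, and the cut-off is the $(1-\alpha)$ quantile of that distribution. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def total_distance(h, v, h0, v0, n_h, n_v):
    """Assumed form: c = N_h * h / h0 + N_v * v / v0."""
    return n_h * np.asarray(h, dtype=float) / h0 + n_v * np.asarray(v, dtype=float) / v0

def cutoff(alpha, n_h, n_v):
    """(1 - alpha) quantile of chi2(N_h + N_v)."""
    return chi2.ppf(1.0 - alpha, n_h + n_v)

def accept(h, v, h0, v0, n_h, n_v, alpha=0.05):
    """True for objects accepted as members of the target class."""
    return total_distance(h, v, h0, v0, n_h, n_v) <= cutoff(alpha, n_h, n_v)
```

By construction, about a fraction α of genuine target-class objects is expected to fall outside the acceptance area.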
6
χ² distribution (reminder 1)
$x = h/h_0$ or $x = v/v_0$; $x \sim \chi^2(N)/N$, where $N$ is the number of degrees of freedom (DoF); $E(x) = 1$, $D(x) = 2/N$. A chi-squared variable with $N$ degrees of freedom is defined as the sum of the squares of $N$ independent standard normal random variables.
7
χ² distribution (reminder 2)
E: 100×20 matrix with entries N(0, σ). Row sums of squares (100×1): $\sum_j E(i,j)^2 \sim \chi^2(20)$, up to the scale factor $\sigma^2$.
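The two reminders can be checked numerically. The sketch below simulates a Gaussian noise matrix, takes the row sums of squares, and recovers the DoF from $D(x) = 2/N$ after scaling to unit mean. The noise level `sigma=0.5` and the function names are assumptions made for illustration.

```python
import numpy as np

def simulate_row_ss(n_rows, n_cols, sigma, seed=0):
    """Row sums of squares of an n_rows x n_cols N(0, sigma) matrix."""
    rng = np.random.default_rng(seed)
    E = rng.normal(0.0, sigma, size=(n_rows, n_cols))
    return np.sum(E ** 2, axis=1)    # each value ~ sigma^2 * chi2(n_cols)

def dof_from_variance(d):
    """Scale to unit mean, then use D(x) = 2/N."""
    x = d / d.mean()
    return 2.0 / x.var(ddof=1)

row_ss = simulate_row_ss(100, 20, sigma=0.5)   # the 100 x 20 case from the slide
print(dof_from_variance(row_ss))               # scatters around 20 for 100 rows
```

With only 100 rows the estimate is noisy; increasing `n_rows` brings it close to 20.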
8
Simulated example 1. Gaussian noise only
E: 100×20 matrix with entries N(0, σ). PCA, then DoFs. Rank(E) = K = 20. DoF(SD) = A (the number of principal components). DoF(OD) = K − A.
9
Simulated example 2. Structure and no noise
S: 100×200 matrix, $S = U \Lambda V^T$, with the diagonal of $\Lambda$ equal to (150, 100, 50, 20, 1, 0.001). Rank(S) = 6. PCA, then DoFs.
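A structure matrix of this kind can be generated as in the sketch below. The slide does not say how U and V were produced; drawing random orthonormal columns through a QR decomposition is an assumption, and `make_structure` is a hypothetical name.

```python
import numpy as np

def make_structure(n_rows=100, n_cols=200,
                   sing_vals=(150, 100, 50, 20, 1, 0.001), seed=0):
    """S = U diag(sing_vals) V^T with random orthonormal U and V (assumed)."""
    rng = np.random.default_rng(seed)
    k = len(sing_vals)
    U, _ = np.linalg.qr(rng.normal(size=(n_rows, k)))   # n_rows x k, orthonormal columns
    V, _ = np.linalg.qr(rng.normal(size=(n_cols, k)))   # n_cols x k, orthonormal columns
    return U @ np.diag(sing_vals) @ V.T

S = make_structure()
s = np.linalg.svd(S, compute_uv=False)   # leading six values reproduce sing_vals
```

Adding a Gaussian noise matrix of the same shape to `S` gives data with structure plus additive noise.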
10
DoFs for matrix S, α=0.05
PC=1: Nh=1; Nv=5 (theory), Nv=2 (estimate); 10 out (theory), 6 out (estimate)
PC=2: Nh=2; Nv=4 (theory), Nv=1 (estimate); 9 out (theory), 4 out (estimate)
11
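Simulated example 3. Structure + additive noise
X = S + E, where S is the 100×200 structure matrix with the diagonal of $\Lambda$ equal to (150, 100, 50, 20, 1, 0.001), and E is a 100×200 noise matrix with entries N(0, σ). PCA. Estimates of DoFs for various noise levels.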
12
Extreme plot. Dependence on α
The extreme plot shows the observed number of extremes against the theoretically expected number, calculated as $n = \alpha I$. The plot is obtained by varying $\alpha = n/I$. Marked levels: α = 0.1, α = 0.05, α = 0.01.
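The numbers behind an extreme plot can be computed as in the sketch below. It assumes the total distances follow $\chi^2$ with a known number of degrees of freedom, and it adds a binomial band of ±2 standard deviations, $2\sqrt{n(1 - n/I)}$, as one possible tolerance area; both the band and the function name `extreme_counts` are assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2

def extreme_counts(c, dof):
    """Expected and observed numbers of extremes for alpha = n / I."""
    c = np.asarray(c, dtype=float)
    I = c.size
    n = np.arange(1, I + 1)                 # expected number of extremes
    alpha = n / I                           # alpha = n / I
    crit = chi2.ppf(1.0 - alpha, dof)       # cut-off for each alpha
    observed = (c[None, :] > crit[:, None]).sum(axis=1)
    band = 2.0 * np.sqrt(n * (1.0 - alpha)) # assumed binomial +/- 2 sd band
    return n, observed, band

rng = np.random.default_rng(0)
n, observed, band = extreme_counts(rng.chisquare(3, size=2000), dof=3)
```

Plotting `observed` against `n` together with `n - band` and `n + band` gives the plot; points that stay inside the band are consistent with the assumed distribution.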
13
Extreme plot. Training & Test sets
[Figure: extreme plots for the training set (80 objects) and the test set (20 objects); panels PC=1, PC=2, PC=3, PC=4, PC=7]
14
Simulated example 4. Test set partly carries different structure
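Training set (100×200): $S_{training} = U \Lambda V^t + E(0, \sigma)$. Test set (100×200): $S_{test} = U \Lambda V_n^t + E_1(0, \sigma)$. Here $U$ is 100×6, $V^t$ and $V_n^t$ are 6×200, and the diagonal of $\Lambda$ is (150, 100, 50, 20, 1, 0.001), so Rank(S) = 6 for $S = U \Lambda V^T$.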
15
Simulated example 4 (PCs=3)
[Figure: training set and test set; PC=3, Nh=4, Nv=1]
16
Simulated example 4 (PCs=4)
[Figure: training set and test set; PC=4, Nh=5, Nv=1]
17
Real-world example “Olives in brine”
3 classes, 233 objects, 1258 variables. Measurements: NIR spectra in DR mode, cm-1. Class 1: training set of 75 objects, test set of 44 objects. O.Ye. Rodionova, P. Oliveri, A.L. Pomerantsev, "Rigorous and compliant approaches to one-class classification", Chemom. Intell. Lab. Syst. 159 (2016).
18
Class 1. Training set 75×1258, PC=4, α=0.05
19
Application of the Extreme plot (1)
‘Olives in brine’, Class 1. Test set: 44 objects. [Figure: extreme plots for PCs=2, PCs=4, PCs=5, PCs=7]
20
Application of the Extreme plot (2)
Assessment of instrument performance for monitoring tablet quality and anti-counterfeiting. Instrument: MicroNIR 1700 by VIAVI Solutions. Working range is nm. Resolution is less than 12.5 nm. Aims: 1. Compare the results of 2 instruments. 2. Compare the results of measurements on 2 days.
21
Anti-inflammatory medicine packed in PVC blisters
Objects: 50 tablets from 5 batches. Dataset (50 × 125). DD-SIMCA model, PCs=3. Datasets: “Instrument #1 day 1”, “Second day”, “Second instrument”.
22
New Datasets
[Figure: “Second day” and “Second instrument” datasets; panels PCs=1, PCs=2, PCs=3, PCs=4]
23
Interquartile Approach
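Outlier detection. DoF: classical and robust estimates. $x = h/h_0$ or $x = v/v_0$; $x_1, \dots, x_I \sim \chi^2(N)/N$, $N = ?$ The classical estimate uses the Method of Moments; the robust estimate uses the Interquartile Approach, based on the ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(I-1)} \le x_{(I)}$, where the IQR is the range left after cutting off ¼ of the values at each end.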
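One way to turn the interquartile idea into an estimator is sketched below. It matches the sample ratio IQR/median to the same ratio for $\chi^2(N)$, which does not depend on the scale, and then recovers the scale from the median. This matching rule, the search interval `[1, n_max]`, and the name `dof_robust` are assumptions for illustration rather than the authors' exact procedure.

```python
import numpy as np
from scipy.stats import chi2
from scipy.optimize import brentq

def dof_robust(d, n_max=250.0):
    """Robust estimates of the scale d0 and DoF N from the median and IQR."""
    d = np.asarray(d, dtype=float)
    med = np.median(d)
    q1, q3 = np.percentile(d, [25, 75])
    ratio = (q3 - q1) / med                  # scale-free sample statistic

    def f(n):
        # Same ratio for chi2(n); it decreases as n grows
        q25, q50, q75 = chi2.ppf([0.25, 0.5, 0.75], n)
        return (q75 - q25) / q50 - ratio

    if f(1.0) <= 0:                          # spread too wide even for N = 1
        n_dof = 1.0
    elif f(n_max) >= 0:                      # spread narrower than chi2(n_max)
        n_dof = float(n_max)
    else:
        n_dof = brentq(f, 1.0, n_max)
    d0 = med * n_dof / chi2.ppf(0.5, n_dof)  # median(d) = d0 * chi2inv(0.5, N) / N
    return d0, n_dof

rng = np.random.default_rng(0)
d0, n_dof = dof_robust(2.0 * rng.chisquare(4, size=20000) / 4)
```

Because only the central half of the ordered sample enters the estimate, a few extreme objects in the upper tail have little influence on it.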
24
Real-world example (Poster #23)
Confocal Raman Spectroscopy and MDA in Evaluation of Spermatozoa with Normal and Abnormal Morphology. Morphology classification: 125 Normal, 36 Abnormal. Aims: study the sperm nuclear DNA quality; compare the results of morphology and Raman spectroscopy analysis in revealing normal and abnormal cells.
25
Sequential application of DD-SIMCA for outlier detection
‘Normal’ model. Initial step. [Figure: panels PCs=4 and PCs=3. Classical estimates: Nv=1, Nh=1; Nv=1, Nh=2. Robust estimates: Nv=2, Nh=3; Nv=3, Nh=4]
26
Sequential application of DD-SIMCA for outlier detection
‘Normal’ model. Final step. [Figure: panels PCs=4 and PCs=3. Classical estimates: Nv=2, Nh=3; Nv=2, Nh=3. Robust estimates: Nv=3, Nh=4; Nv=2, Nh=3]
27
Objects partitioning
Morphology classification: 125 Normal, 36 Abnormal. Spectral classification of the 125 Normal: 102 Normal, 23 Abnormal. Spectral classification of the 36 Abnormal: 17 Normal, 19 Abnormal. This gives 17 ‘Abnormal’-‘Normal’ objects.
28
Conclusions (1)
PCA. Determination of the number of principal components for…
Description of the X-data in detail
Determination of hidden structures, even for higher PCs
Separation of structure from random noise
Revealing the main common features of the X-data, without analyzing between-object differences
Outlier detection
Tool: estimation of the DoF for the SD and OD distances
29
Conclusions (2)
PCA, application of various datasets for…
Comparison of the extent to which the test set is similar to the training set
Comparison of new data with the training objects
Tool: extreme plots for the training and test/new sets
30
Conclusions (3)
Both tools may be used not only for DD-SIMCA but also for the preliminary analysis of any data set.
We acknowledge partial funding from the IAEA within the framework of projects D5240 and G42007.
31
Software tools
For Chemometrics Add-In users: SIMCA Template.xlsb
For MATLAB users: DD-SIMCA, a MATLAB GUI tool (GitHub). Y.V. Zontov, O.Ye. Rodionova, S.V. Kucheryavskiy, A.L. Pomerantsev, "DD-SIMCA – A MATLAB GUI tool for data driven SIMCA approach", Chemom. Intell. Lab. Syst. 167 (2017).