Multi-class PLS-DA: soft and hard approaches

Presentation transcript:

Multi-class PLS-DA: soft and hard approaches
Alexey Pomerantsev, Oxana Rodionova
Semenov Institute of Chemical Physics
WSC-11, 27.02.18

Papers by Scopus [figure: publication counts indexed in Scopus]

Motivation: 1 of 4

PLS-DA is an enormously popular method that is, however, supported by only a few theoretical papers:
1. L. Ståhle, S. Wold, Partial Least Squares Analysis with Cross-Validation for the Two-Class Problem: A Monte Carlo Study, J. Chemom., 1, 185-196 (1987)
2. M. Barker, W. S. Rayens, Partial least squares for discrimination, J. Chemom., 17, 166-173 (2003)
3. U. G. Indahl, H. Martens, T. Næs, From dummy regression to prior probabilities in PLS-DA, J. Chemom., 21, 529-536 (2007)

Motivation: 2 of 4

Most papers are concerned with binary PLS-DA. In attempts to avoid actual multi-class discrimination, researchers invent very complex schemes that split a multi-class task into a set of binary problems.
Fig. 1. Calculation of the reliability of classification for a classification problem with C = 3 classes.
N. F. Perez, J. Ferre, R. Boque, Multi-class classification with probabilistic discriminant partial least squares (p-DPLS), Anal. Chim. Acta, 664, 27-33 (2010)

Motivation: 3 of 4

The application of the PLS scores for classification can lead to incorrect results and wrong interpretations.
Figure 4. Score plot for a PLS-DA model of random data. Excellent (but meaningless) class separation is obtained.
K. Kjeldahl, R. Bro, Some common misunderstandings in chemometrics, J. Chemom., 24, 558-564 (2010)

Motivation: 4 of 4

PLS-DA is an inappropriate method of authentication. In fact, PLS-DA has a serious shortcoming: it is a hard classification tool. Is it possible to soften it?
Fig. 2. Application of PLS-DA for authentication.
O.Ye. Rodionova, A.V. Titova, A.L. Pomerantsev, Discriminant analysis is an inappropriate method of authentication, Trends Anal. Chem., 78 (4), 17-22 (2016)

Basics

The data consist of I samples described by J variables (fingerprints) and divided into K classes. The predictor block X is the I x J data matrix. The response block Y is the I x K dummy matrix: a sample from class k gets the row ek, i.e.
e1 = (1, 0, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., eK = (0, 0, ..., 0, 1).
A single PLS2 model is built for X against Y.
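To make the setup concrete, here is a minimal sketch that builds the dummy response matrix and fits a single PLS2 model. Using scikit-learn's PLSRegression and synthetic toy data is my assumption, not the authors' implementation (their own tools are the Excel template and Matlab code mentioned at the end).

```python
# Sketch of the PLS2 setup: class labels -> dummy matrix Y -> one PLS2 model.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
K, J = 3, 15                                   # K classes, J variables (fingerprints)
labels = np.repeat(np.arange(K), 20)           # class index for each of I samples
X = rng.normal(size=(labels.size, J)) + labels[:, None]   # toy I x J data matrix

Y = np.eye(K)[labels]                          # I x K dummy matrix: row i is e_k

pls = PLSRegression(n_components=2, scale=False)   # PLS2: multi-column response
pls.fit(X, Y)
Y_hat = pls.predict(X)                         # predicted dummy responses, I x K
```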

PLS2 Technical Details [figure: steps 1-3 relating the blocks X, Y and the transformed blocks Xs, Ys in the PLS2 decomposition]

Naïve decision

The predicted responses Ŷ are close to, but not exactly, 0 or 1 (e.g. 1.01, 0.02, ..., -0.05, 0.98). The naïve decision rule attributes a sample to the class whose predicted response is the largest, i.e. closest to 1.
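Continuing the sketch above, the naïve rule is a one-liner: take the column of Ŷ with the largest predicted response.

```python
# Naive decision: attribute each sample to the class whose predicted
# dummy response is the largest (closest to 1).
naive_class = Y_hat.argmax(axis=1)
print("naive training accuracy:", (naive_class == labels).mean())
```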

Key property of Ŷ

Each row of the training matrix Y sums to one, and so does each row of the predicted matrix Ŷ: Σk ŷik = 1 for every sample i.
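Why the rows of Ŷ sum to exactly one: a short derivation sketch, assuming the standard column-centered PLS2 regression form with coefficients B = W(PtW)^-1 Qt (this notation is mine, not from the slides).

```latex
% Each training row of Y sums to one, so each column-centered row sums to zero:
%   Y_c \mathbf{1}_K = \mathbf{0}.
% Every y-loading is built from Y_c, hence its entries also sum to zero:
\[
  q_a = \frac{Y_c^{\,t} t_a}{t_a^{\,t} t_a}
  \quad\Rightarrow\quad
  \mathbf{1}_K^{\,t} q_a = \frac{(Y_c \mathbf{1}_K)^{t} t_a}{t_a^{\,t} t_a} = 0
  \quad\Rightarrow\quad
  B\,\mathbf{1}_K = W (P^{t} W)^{-1} Q^{t} \mathbf{1}_K = \mathbf{0}.
\]
% The column means of Y sum to one (the mean of the row sums), so for any new x:
\[
  \hat{y}\,\mathbf{1}_K = \bar{y}\,\mathbf{1}_K + (x - \bar{x})\,B\,\mathbf{1}_K
                        = 1 + 0 = 1 .
\]
```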

Geometry of Ŷ

All Ŷ values are located on the hyperplane that passes through the basis vectors:
e1 = (1, 0, ..., 0)t, e2 = (0, 1, ..., 0)t, ..., eK = (0, 0, ..., 1)t

Superscores: PCA on Ŷ

Since all rows of Ŷ lie on a (K-1)-dimensional hyperplane, PCA converts Ŷ into the superscore matrix T with K-1 columns. The known centers of the classes, ck = (ck,1, ..., ck,K-1), are represented in the same superscore space. Distances are measured by the generic Mahalanobis metric ttΛ-1t, where Λ = TtT is diagonal due to the orthogonality of the PCA scores.
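A sketch of the superscore step under my reading of the slide: PCA reduces Ŷ to its K-1 informative dimensions, and distances to the class centers use the Mahalanobis metric with Λ = TtT. Estimating the centers as class means of T is my simplification; the slide treats them as known.

```python
# PCA of Y_hat yields K-1 superscores T; Lambda = T't T is diagonal because
# PCA scores are orthogonal, so the Mahalanobis metric t' Lambda^-1 t is cheap.
from sklearn.decomposition import PCA

pca = PCA(n_components=K - 1)
T = pca.fit_transform(Y_hat)                   # I x (K-1) superscores

# class centers in superscore space (estimated from the training samples here)
centers = np.vstack([T[labels == k].mean(axis=0) for k in range(K)])
Lambda_inv = np.linalg.inv(T.T @ T)

def d2(t):
    """Squared Mahalanobis distances d_ik from one superscore vector to all centers."""
    diff = t - centers                         # K x (K-1)
    return np.einsum('kj,jl,kl->k', diff, Lambda_inv, diff)
```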

PLS-DA route

PLS-DA is not a classifier. It serves as a feature extractor from the high-dimensional X space into the low-dimensional Y space, similar to PCA.

Datasets

Name     Classes   Variables              Samples
Juices   3         15 (concentrations)    20+7+11 = 38
Olives   3         1250 (NIR spectra)     111+72+50 = 233
Drugs    7         890 (NIR spectra)      30+50+70+50+30+50+100 = 380

References:
1. Fidelis, M., et al., Authentication of juices from antioxidant and chemical perspectives: A feasibility quality control study using chemometrics, Food Control, 73, 796-805 (2017)
2. Oliveri, P., et al., Partial least squares density modeling (PLS-DM) - A new class-modeling strategy applied to the authentication of olives in brine by near-infrared spectroscopy, Anal. Chim. Acta, 851, 30-36 (2014)
3. Rodionova, O.Ye., et al., Quantitative risk assessment in classification of drugs with identical API content, J. Pharm. Biomed. Anal., 98, 186-192 (2014)

Hard PLS-DA (LDA): Juices

Λ = TtT
dik = (ti - ck)t Λ-1 (ti - ck)
A sample is attributed to the class with the minimal distance, min over k of dik. All discriminating hyperplanes intersect at a single point.
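The hard rule, continuing the sketch above: each sample goes to the class with the smallest d_ik.

```python
# Hard (LDA-style) PLS-DA: the minimal Mahalanobis distance wins.
D = np.vstack([d2(t) for t in T])              # I x K matrix of distances d_ik
hard_class = D.argmin(axis=1)
```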

Soft PLS-DA (QDA): Juices

A sample is accepted by class k when dik < χ2(1-α, K-1), the (1-α) quantile of the chi-squared distribution with K-1 degrees of freedom. Soft classification: a sample can be attributed to several classes, to one class, or not classified at all.
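The soft rule, continuing the sketch: a chi-squared cutoff with K-1 degrees of freedom turns each class into an acceptance region. Reusing the common Λ here is a simplification of the QDA idea on the slide, which would use per-class scatter.

```python
# Soft PLS-DA: accept sample i into class k when d_ik < chi2(1 - alpha, K - 1).
# A sample may fall into several classes, one class, or none.
from scipy.stats import chi2

alpha = 0.05
cutoff = chi2.ppf(1 - alpha, df=K - 1)
accepted = D < cutoff                          # I x K boolean membership matrix
unclassified = ~accepted.any(axis=1)           # rejected by every class
```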

Outliers and aliens: Drugs

Training classes: A2, A5, A7. New classes: A4, A6.

Confusion matrix: Juices [tables: hard and soft classification results]

Figures of Merit: Juices

Class                      C      A      S
True Positive (TP)         18     7      11
False Positive (FP)        1      1      …
Class Sensitivity (CSNS)   90%    100%   100%
Class Specificity (CSPS)   94%    97%    …
Class Efficiency (CEFF)    92%    98%    …

Totals over all 38 samples: Total Sensitivity (TSNS) = 95%; Total Specificity (TSPS) = …; Total Efficiency (TEFF) = …
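The figures of merit are simple counts. A sketch with the Juices numbers, using definitions consistent with the percentages on the slides: CSNS = TP per class size, CSPS = 1 - FP over the out-of-class count, CEFF as their geometric mean, and the totals pooling the counts. The FP entry for class S is my assumption for illustration.

```python
# Figures of merit from per-class counts (classes C, A, S of the Juices set).
TP  = np.array([18, 7, 11])                # true positives per class
FP  = np.array([1, 1, 0])                  # false positives (FP for S assumed)
I_k = np.array([20, 7, 11])                # class sizes, I = 38 in total
I   = I_k.sum()

CSNS = TP / I_k                            # class sensitivity: 90%, 100%, 100%
CSPS = 1 - FP / (I - I_k)                  # class specificity: 94%, 97%, ...
CEFF = np.sqrt(CSNS * CSPS)                # class efficiency:  92%, 98%, ...

TSNS = TP.sum() / I                        # total sensitivity: 36/38 = 95%
TSPS = 1 - FP.sum() / (I - I_k).sum()      # total specificity
TEFF = np.sqrt(TSNS * TSPS)                # total efficiency
```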

PLS-DA complexity: Olives [figure: selection of the number of latent variables for the hard and soft versions]

Results: Olives

Hard PLS-DA, 9 LVs (classes T, L, C):
Training: sensitivity 100%; specificity …. Test: ….

Soft PLS-DA, 13 LVs (classes T, L, C):
Training: sensitivity 96% / 95% / 98%; specificity 100%. Test: ….

Binary (two-class) PLS-DA

Training classes: 3 and 7; class 4 is new. The response is a single column y coded +1 for one class and -1 for the other. [Figure: soft acceptance areas, hard acceptance border, and PLS outlier borders.]

Need for balance? Not at all!

Olives case [two tables: TEFF of the hard and soft versions under differently balanced designs; recoverable values: 92%, 86%, 91%, 81%, 89% and 92%, 91%, 75%].
PLS-DA utilizes a regression approach, the efficiency of which depends mostly on the design of the experiment rather than on the size of the data.

PLS-DA and SIMCA: the 'vatrushka' case (one tight class inside a broad one)

Data set [figure]. SIMCA: TEFF = 84%; hard/soft PLS-DA: TEFF = 53/52%.

Poster P23: Confocal Raman spectroscopy and multivariate data analysis in evaluation of spermatozoa with normal and abnormal morphology

PLS-DA and SIMCA: the 'stroopwafel' case (classes sharing the same major components but differing in impurities)

Data set [figure]. SIMCA: TEFF = 65%; hard/soft PLS-DA: TEFF = 84/72%.

'One vs All' or 'All vs All'

Drugs: samples A1-A7 comprise genuine, fake, and suspect drugs.
One vs All: a two-class model is trained for each class against the pooled others, then used for prediction.
All vs All: a single multi-class model is trained on all classes at once, then used for prediction.
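A sketch of the 'One vs All' scheme, reusing the toy objects from the earlier snippets; the ±1 coding follows the binary PLS-DA slide, while the accept-on-positive-response rule is my illustrative choice.

```python
# One vs All: one binary PLS-DA model per class (y = +1 inside, -1 outside).
# All vs All is the single multi-class PLS2 model built earlier.
def one_vs_all(X, labels, classes, n_components=2):
    models = {}
    for k in classes:
        y = np.where(labels == k, 1.0, -1.0).reshape(-1, 1)
        m = PLSRegression(n_components=n_components, scale=False)
        models[k] = m.fit(X, y)
    return models

ova = one_vs_all(X, labels, classes=range(K))
# e.g. hard acceptance by class 0: positive predicted response
accept_0 = ova[0].predict(X).ravel() > 0
```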

Results: Drugs

[Table: TSNS and TSPS for the One vs All, All vs All, and SIMCA approaches; recoverable values: 94%, 100%, 98%, 99%, 100%.]

PLS-DA Template for Excel

Matlab: see Poster P03, Software implementation of the Hard and Soft Partial Least Squares Discriminant Analysis, Y.V. Zontov, O.Ye. Rodionova, S.V. Kucheryavskiy, A.L. Pomerantsev

Conclusions 1 of 6

We proposed a multi-class version of PLS-DA that is, in fact, no more complex than the conventional binary (two-class) PLS-DA. The method does not utilize the PLS scores but is entirely based on the predicted dummy responses.

Conclusions 2 of 6

We suggested using PCA to convert the predicted response matrix Ŷ into the superscore matrix T, and discussed the amusing geometry of the score space.

Conclusions 3 of 6

We consider that PLS-DA is not a classifier. It serves as a feature extractor from the high-dimensional X space into the low-dimensional Y space. Classification methods come after.

Conclusions 4 of 6

We introduced two classification methods that utilize this concept. The first is the conventional hard PLS-DA approach based on LDA; the second is a novel soft PLS-DA method based on QDA.

Conclusions 5 of 6

We defined the principal measures of classification quality (sensitivity, specificity, and efficiency) for multi-class PLS-DA, and suggested using these characteristics to select the PLS model complexity.

Conclusions 6 of 6

We compared the discriminant (PLS-DA) and the class-modeling (SIMCA) approaches. It was shown that SIMCA is better when one class is tight and the other class is broad. On the contrary, PLS-DA is preferable when the classes have the same major components but different impurities.

Conclusions 7 of 6

The popular opinion that an equal number of objects in the training classes is needed for a good PLS-DA model was analyzed and found to be wrong. In fact, PLS-DA utilizes a regression approach, the efficiency of which depends primarily on the design of the experiment rather than on the size of the data.

Thank you for your attention [image: 'A Lawyer's failure']

PLS-DA end mile [image: signposts pointing to LDA, to QDA, and to kNN]