Chemometric functions in Excel


Chemometric functions in Excel Oxana Rodionova & Alexey Pomerantsev Semenov Institute of Chemical Physics rcs@chph.ras.ru 01.12.08

Distance Learning Course in Chemometrics for Technological and Natural-Science Mastership Education
Unfulfilled need for chemometric education in Russia:
Low number of qualified specialists in chemometrics
Large distances, e.g. Moscow – Barnaul is about 3000 km
No modern chemometrics books in Russian
No available chemometric software
No support from officials: government, Academy, etc.
Easily available everywhere => INTERNET
Interactive layout: all calculations should be clear and repeatable
Web-friendly environment for the calculations => EXCEL
Necessity to make and use our own (free) software => EXCEL Add-In

Chemometric calculations in Excel
Provides the user with all the possibilities of the Excel interface: worksheet calculations, worksheet functions, charts, etc.
VBA helps to simplify routine work
All calculations are made "on the fly" and very fast

Installation
Download: http://rcs.chph.ras.ru/down/sacs.zip
Chemometrics.dll → put in your Windows folder (C:\WINDOWS\)
Chemometrics.xla → put in the AddIns folder (C:\Documents and Settings\<User>\Application Data\Microsoft\AddIns\)
Load Chemometrics.xla via <Excel Options> → <Add-Ins> in the open workbook

Matrix calculations in Excel
Array formulas are entered with Ctrl-Shift-Enter:
={TRANSPOSE(B6:F10)}
={MMULT(B6:F10,TRANSPOSE(Barr))}
Ranges used: B6:F10 (cell range), Barr (named range)
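For readers who want to check the two array formulas outside a worksheet, the following is a minimal Python sketch of what TRANSPOSE and MMULT compute. The function names `transpose` and `mmult` and the small matrices `A` and `Barr` are illustrative stand-ins, not part of the add-in or of the original workbook.

```python
def transpose(m):
    """Swap rows and columns, as Excel's TRANSPOSE does for a range."""
    return [list(col) for col in zip(*m)]


def mmult(a, b):
    """Matrix product, as Excel's MMULT does; inner dimensions must match."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Hypothetical 2x3 stand-ins for the worksheet ranges (assumed values).
A = [[1, 2, 3], [4, 5, 6]]
Barr = [[1, 0, 1], [0, 1, 0]]

# Counterpart of ={MMULT(A, TRANSPOSE(Barr))}: a 2x2 result.
product = mmult(A, transpose(Barr))
```

As in Excel, the product is only defined when the number of columns of the first argument equals the number of rows of the second.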

Principal Component Analysis (PCA)
$X = TP^T + E$
X: initial data matrix (I × J)
T: score matrix (I × A)
P: loading matrix (J × A)
E: error matrix (I × J)
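The decomposition can be sketched in plain Python with the NIPALS iteration, one common way to compute scores and loadings component by component. The slides do not say which algorithm the add-in uses, so the function name `nipals_pca`, its tolerances, and the sample matrix are assumptions made for illustration only.

```python
import math


def nipals_pca(x, n_pc, tol=1e-12, max_iter=500):
    """Return (T, P, E) such that x = T P^T + E; x is a list of I rows."""
    e = [row[:] for row in x]              # residual matrix, starts as X
    n_rows, n_cols = len(e), len(e[0])
    scores, loadings = [], []
    for _ in range(n_pc):
        # Start from the residual column with the largest sum of squares.
        j0 = max(range(n_cols), key=lambda j: sum(r[j] ** 2 for r in e))
        t = [r[j0] for r in e]
        p = [0.0] * n_cols
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            if tt == 0.0:                  # nothing left to extract
                break
            # p = E^T t / (t^T t), then scaled to unit length.
            p = [sum(e[i][j] * t[i] for i in range(n_rows)) / tt
                 for j in range(n_cols)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # New scores: t = E p.
            t_new = [sum(e[i][j] * p[j] for j in range(n_cols))
                     for i in range(n_rows)]
            diff = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if diff <= tol * sum(v * v for v in t):
                break
        # Deflation: remove the extracted component from the residual.
        for i in range(n_rows):
            for j in range(n_cols):
                e[i][j] -= t[i] * p[j]
        scores.append(t)
        loadings.append(p)
    t_mat = [[scores[a][i] for a in range(n_pc)] for i in range(n_rows)]
    p_mat = [[loadings[a][j] for a in range(n_pc)] for j in range(n_cols)]
    return t_mat, p_mat, e


# Assumed rank-1 example: one component should leave a zero residual.
X = [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0], [3.0, 3.0, 6.0]]
T, P, E = nipals_pca(X, 1)
```

Here T is I × A, P is J × A and E is I × J, matching the dimensions on the slide.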

Chemometrics XLA. PCA Scores
={ScoresPCA(Xcal,5,1,Xtst)}
Arguments: Xcal (calibration data), 5 (nPC, number of PCs), 1 (centering AND/OR weighting), Xtst (test data)

Chemometrics XLA. PCA Loadings
Excel worksheet function: ={TRANSPOSE(LoadingsPCA(Xcal,5,1))}
Arguments: Xcal (calibration data), 5 (nPC, number of PCs), 1 (centering AND/OR weighting)

List of chemometric functions
PCA:
ScoresPCA <for calibration or test samples>
LoadingsPCA
PLS:
ScoresPLS <X-scores for calibration or test samples>
UScoresPLS <Y-scores for calibration or test samples>
LoadingsPLS <P-loadings>
WLoadingsPLS
QLoadingsPLS
PLS2:
ScoresPLS2 <X-scores for calibration or test samples>
UScoresPLS2 <Y-scores for calibration or test samples>
LoadingsPLS2 <P-loadings>
WLoadingsPLS2
QLoadingsPLS2
Options:
Centering AND/OR scaling
Number of PCs

ScoresPCA
ScoresPCA (rMatrix [, nPCs] [, nCentWeightX] [, rMatrixNew])
rMatrix: X data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling (1 centering, 2 scaling, 3 both)
rMatrixNew: test set
X[I×J] → T[I×A]

Validation Rules
If rMatrixNew is omitted, only calibration scores are calculated
If rMatrixNew is specified, only test scores are calculated
If rMatrixNew coincides with rMatrix, cross-validation (10%-out) is calculated
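The three rules amount to a small decision on the optional test-set argument. The sketch below restates them in Python; `validation_mode` is a hypothetical helper, and comparing the two matrices by content is an assumption, since the slides do not say how the add-in decides that rMatrixNew "coincides" with rMatrix.

```python
def validation_mode(r_matrix, r_matrix_new=None):
    """Return which scores would be calculated under the slide's rules."""
    if r_matrix_new is None:
        # rMatrixNew omitted: calibration scores only.
        return "calibration"
    if r_matrix_new == r_matrix:
        # rMatrixNew coincides with rMatrix: 10%-out cross-validation.
        return "cross-validation"
    # Any other rMatrixNew: test scores only.
    return "test"


# Assumed toy matrices standing in for worksheet ranges.
x_cal = [[1.0, 2.0], [3.0, 4.0]]
x_tst = [[5.0, 6.0]]
modes = [validation_mode(x_cal),
         validation_mode(x_cal, x_tst),
         validation_mode(x_cal, x_cal)]
```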

LoadingsPCA
LoadingsPCA (rMatrix [, nPCs] [, nCentWeightX])
rMatrix: X data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling (1 centering, 2 scaling, 3 both)
X[I×J] → P[J×A]

Explorative Data Analysis
Case study 1: People

People

Dataset in Excel Workbook (People.xls)
Number of objects (n) = 32
Number of variables (m) = 12

Data Preprocessing
Aim: to transform the data into the most suitable form for data analysis

Autoscaling
mean centering + scaling = autoscaling
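A minimal sketch of column-wise preprocessing, using the option codes quoted on the function slides (1 centering, 2 scaling, 3 both). The helper name `preprocess`, the extra code 0 for "no preprocessing", and the use of the sample standard deviation (divisor n − 1) are assumptions; the slides do not state which divisor the add-in applies.

```python
import math


def preprocess(x, mode):
    """Column-wise preprocessing: 0 none, 1 centering, 2 scaling, 3 both."""
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    # Sample standard deviation per column (assumed divisor n - 1).
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    out = []
    for row in x:
        new_row = []
        for v, m, s in zip(row, means, stds):
            if mode in (1, 3):
                v = v - m          # mean centering
            if mode in (2, 3):
                v = v / s          # scaling to unit standard deviation
            new_row.append(v)
        out.append(new_row)
    return out


# Assumed toy data: two variables on very different scales.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
autoscaled = preprocess(data, 3)
```

After autoscaling both columns have zero mean and unit standard deviation, so variables measured in different units contribute comparably. A constant column has zero standard deviation and would need separate handling.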

People: Scores & Loadings (PC1 vs. PC2)
[Figures: score plot, "Map of Samples"; loading plot, "Map of Variables"]

People: Scores & Loadings (PC1 vs. PC3)
[Figures: loading plot; score plot]

Case study 2: HPLC-DAD

Measurements

Dataset in Excel Workbook

Pure compounds A and B
$X = CS^T + E$
If we observe X, can we predict C and S?

Score plot
[Figure: score plot with the pure compounds A and B marked]

Conclusions from the Score Plot
1. Linear regions = Pure compounds
2. Curved line = Co-elution
3. Closer to the origin = Lower intensity
4. Number of bends = Number of different compounds

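Factor analysis vs. PCA
Factor analysis: $X = CS^T + E_1$, with 2 factors (X is I × J, C is I × 2, S is J × 2)
PCA: $X = TP^T + E_2$, with A components (X is I × J, T is I × A, P is J × A)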

Scores and Loadings

Procrustes transformation
$X \approx CS^T$, $X \approx TP^T$
$I = RR^T$ (identity matrix)
$X \approx T(RR^T)P^T = (TR)(PR)^T$
$\hat{C} \approx TR$, $\hat{S} \approx PR$
$R = R_{stretch} \times R_{rotation}$
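The identity $T(RR^T)P^T = (TR)(PR)^T$ can be checked numerically. The sketch below uses a pure rotation, for which $RR^T = I$ holds exactly. The matrices T and P, the 30 degree angle, and the helper names are arbitrary illustrative choices, not values from the HPLC-DAD case study.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Rotation by an arbitrary angle; R R^T is the identity matrix.
theta = math.radians(30.0)
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

# Assumed toy scores (3 x 2) and loadings (3 x 2).
T = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

X = matmul(T, transpose(P))                                  # T P^T
X_rotated = matmul(matmul(T, R), transpose(matmul(P, R)))    # (TR)(PR)^T

max_diff = max(abs(a - b)
               for row_a, row_b in zip(X, X_rotated)
               for a, b in zip(row_a, row_b))
```

Because the product is unchanged, the data alone cannot distinguish T and P from their rotated versions, which is why an additional criterion is needed to pick the R that turns scores into concentration profiles.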

Scores Transformation
[Figures: stretching; rotation]

Procrustes analysis results

Conclusions
Scaling and centering are problem dependent
In this example, number of PCs = number of different compounds

Regression

Principal Component Regression (PCR)
1) PCA: X is decomposed into scores T = ($t_1, \dots, t_A$) and loadings P
2) MLR: $y = Ta + e$
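The two PCR steps can be sketched for the simplest case of a single principal component. Everything named here (`first_pc`, `pcr_one_component`, the toy data) is hypothetical, mean centering is assumed as the preprocessing, and a fixed number of NIPALS iterations is used for brevity. It illustrates the idea, not the add-in's implementation.

```python
import math


def first_pc(x, n_iter=200):
    """Scores t and unit-length loading p of the first PC of centered x."""
    n_cols = len(x[0])
    j0 = max(range(n_cols), key=lambda j: sum(r[j] ** 2 for r in x))
    t = [r[j0] for r in x]
    p = [0.0] * n_cols
    for _ in range(n_iter):
        tt = sum(v * v for v in t)
        p = [sum(r[j] * ti for r, ti in zip(x, t)) / tt for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        t = [sum(rv * pv for rv, pv in zip(r, p)) for r in x]
    return t, p


def pcr_one_component(x, y):
    """Step 1: PCA on centered X. Step 2: regress centered y on the scores."""
    n = len(x)
    x_mean = [sum(c) / n for c in zip(*x)]
    y_mean = sum(y) / n
    xc = [[v - m for v, m in zip(r, x_mean)] for r in x]
    yc = [v - y_mean for v in y]
    t, p = first_pc(xc)
    # MLR with one regressor: a = t^T y / (t^T t).
    a = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        t_new = sum((v - m) * pv for v, m, pv in zip(row, x_mean, p))
        return y_mean + a * t_new

    return a, predict


# Assumed toy calibration set: collinear X columns, y = 2 * x1 + 1.
X_cal = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y_cal = [3.0, 5.0, 7.0, 9.0]
a, predict = pcr_one_component(X_cal, y_cal)
y_new = predict([5.0, 10.0])
```

The columns of this toy X are perfectly collinear, a case where ordinary MLR on X itself would fail but regression on the scores works.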

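Projection on Latent Structures (PLS)
[Diagram: X with scores T ($t_1, \dots, t_A$), loadings P ($p_1, \dots, p_A$) and weights W ($w_1, \dots, w_A$); Y with scores U ($u_1, \dots, u_A$) and loadings Q ($q_1, \dots, q_A$)]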

Projection on Latent Structures (PLS)
[Diagram: regression model for Y with coefficients B and residual e]

PLS and PLS2
PLS: $y = Tb + e$ (y has 1 column)
PLS2: $Y = TB + E$ (Y has M columns)

ScoresPLS
ScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set
X[I×J], Y[I×1] → T[I×A]
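To make the quantities returned by the PLS functions concrete, here is a sketch of a single PLS1 component in the usual NIPALS formulation: weights w, X-scores t, P-loadings p and the Q-loading q. The slides do not document the add-in's internal algorithm, so this is an assumed textbook variant, and the function name and data are illustrative.

```python
import math


def pls1_first_component(x, y):
    """One NIPALS PLS1 component for centered x (I x J) and centered y."""
    n_cols = len(x[0])
    # Weights: w is proportional to X^T y, scaled to unit length.
    w = [sum(r[j] * yi for r, yi in zip(x, y)) for j in range(n_cols)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # X-scores: t = X w.
    t = [sum(rv * wv for rv, wv in zip(r, w)) for r in x]
    tt = sum(v * v for v in t)
    # P-loadings: p = X^T t / (t^T t). Q-loading: q = y^T t / (t^T t).
    p = [sum(r[j] * ti for r, ti in zip(x, t)) / tt for j in range(n_cols)]
    q = sum(yi * ti for yi, ti in zip(y, t)) / tt
    return w, t, p, q


# Assumed toy data, already mean centered; y equals 2 * (first column).
Xc = [[-1.5, -3.0], [-0.5, -1.0], [0.5, 1.0], [1.5, 3.0]]
yc = [-3.0, -1.0, 1.0, 3.0]
w, t, p, q = pls1_first_component(Xc, yc)
y_fit = [ti * q for ti in t]      # fitted y from one component
```

Further components would be obtained by deflating X and y with t, p and q and repeating the same steps.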

UScoresPLS
UScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set
rMatrixYNew: Y test set
X[I×J], Y[I×1] → U[I×A]

WLoadingsPLS
WLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
X[I×J], Y[I×1] → W[J×A]

LoadingsPLS
LoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
X[I×J], Y[I×1] → P[J×A]

QLoadingsPLS
QLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
X[I×J], Y[I×1] → Q[1×A]

ScoresPLS2
ScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set
X[I×J], Y[I×K] → T[I×A]

UScoresPLS2
UScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
rMatrixXNew: X test set
rMatrixYNew: Y test set
X[I×J], Y[I×K] → U[I×A]

LoadingsPLS2, WLoadingsPLS2, QLoadingsPLS2
LoadingsPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])
rMatrixX: X data (calibration set)
rMatrixY: Y data (calibration set)
nPCs: number of PCs (A)
nCentWeightX: centering and/or scaling of X (1 centering, 2 scaling, 3 both)
nCentWeightY: centering and/or scaling of Y (1 centering, 2 scaling, 3 both)
X[I×J], Y[I×K] → P[J×A] or W[J×A] or Q[K×A]

Seventh Winter Symposium on Chemometrics, near Tula city, February 2010