1 2. The PARAFAC model Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP.

Presentation transcript:

1 2. The PARAFAC model Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP

2 Example: fluorescence data (1)
Each fluorescence spectrum is a matrix of emission vs excitation wavelengths: $X_i$ (201 × 61)

3 Example: fluorescence data (2)
Each spectrum is a linear sum of three components: tryptophan, phenylalanine and tyrosine.
$X_i = a_{i1} b_1 c_1^T + a_{i2} b_2 c_2^T + a_{i3} b_3 c_3^T + E_i$
– $a_{i1}$: concentration of tryptophan in sample i
– $b_1$: emission spectrum of pure tryptophan
– $c_1$: excitation spectrum of pure tryptophan
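As a rough illustration of the sum above, the following sketch builds one sample matrix from three outer products. The profiles are random placeholders rather than real spectra, the concentrations in `a_i` are hypothetical, and the noise term $E_i$ is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
J, K, R = 201, 61, 3               # emission points, excitation points, components

# Placeholder profiles (random numbers, NOT real spectra)
B = rng.random((J, R))             # column r: emission spectrum b_r
C = rng.random((K, R))             # column r: excitation spectrum c_r
a_i = np.array([2.0, 0.5, 1.0])    # hypothetical concentrations in sample i

# X_i = a_i1 b_1 c_1^T + a_i2 b_2 c_2^T + a_i3 b_3 c_3^T  (noise E_i omitted)
X_i = sum(a_i[r] * np.outer(B[:, r], C[:, r]) for r in range(R))

# The same sum written as one matrix product: X_i = B diag(a_i) C^T
X_i_compact = B @ np.diag(a_i) @ C.T
```

Both forms give the same 201 × 61 matrix; the compact form makes it clear that only the diagonal scaling changes from sample to sample.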

4 Example: fluorescence data (3)
Five samples were measured and stacked to give a three-way array: X (5 × 201 × 61).
[Figure: the slices $X_1, \dots, X_5$ stacked into X (5 samples × 201 emission λ's × 61 excitation λ's), drawn as the sum of three triads ($a_r$, $b_r$, $c_r$), r = 1, 2, 3, plus E. $a_1$: concentration of tryptophan in each sample.]

5 Example: fluorescence data (4)
If we are given a set of fluorescence spectra, X, how can we determine:
– How many chemical species are present?
– Which chemical species are present? What are their pure excitation and emission spectra? i.e. self-modelling curve resolution (SMCR)
– What is the concentration of each species in each sample? i.e. (second-order) calibration
Answer: use the PARAFAC model!

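6 The PARAFAC model (1)
[Figure: X (I × J × K) is modelled by the loadings A, B and C plus residuals E, drawn as a sum of R triads: ($a_1$, $b_1$, $c_1$) + ($a_2$, $b_2$, $c_2$) + … + ($a_R$, $b_R$, $c_R$) + E. Each term is a triad.]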

7 The PARAFAC model (2)
Loadings
– A (I × R) describes variation in the first mode.
– B (J × R) describes variation in the second mode.
– C (K × R) describes variation in the third mode.
Residuals
– E (I × J × K) are the model residuals.
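A minimal sketch of how the three loading matrices combine into the modelled array, assuming the usual element-wise reading of the triad picture (element (i, j, k) is the sum over r of A[i, r]·B[j, r]·C[k, r]). The function name `parafac_reconstruct` and the random loadings are illustrative only.

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Model part of X: element (i, j, k) is sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(1)
I, J, K, R = 5, 201, 61, 3         # sizes taken from the fluorescence example
A = rng.random((I, R))             # placeholder loadings, first mode
B = rng.random((J, R))             # placeholder loadings, second mode
C = rng.random((K, R))             # placeholder loadings, third mode

X_hat = parafac_reconstruct(A, B, C)   # shape (I, J, K)

# Each slice along the first mode is the bilinear model of one sample
slice_0 = B @ np.diag(A[0]) @ C.T
```

For measured data the residual array would then be `X - X_hat`, with the same shape as X.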

8 Example: fluorescence data (5)
Loadings
– A (5 × 3) describes the component concentrations.
– B (201 × 3) describes the pure component emission spectra.
– C (61 × 3) describes the pure component excitation spectra.
Residuals
– E (5 × 201 × 61) describes instrument noise.

9 Example: fluorescence data (6)
A 3-component PARAFAC model describes 99.94% of X.
[Figure: loadings B (201 × 3) and C (61 × 3), with the profiles labelled tryptophan, tyrosine and phenylalanine]
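The slide does not say how the percentage is computed. A common convention, assumed here, is 100·(1 − SSE/SSX), where SSE is the sum of squared residuals and SSX the sum of squared data elements. The function name and the numbers below are made up for the check.

```python
import numpy as np

def percent_explained(X, X_hat):
    """100 * (1 - SSE / SSX), with sums of squares taken over all elements."""
    sse = np.sum((X - X_hat) ** 2)
    ssx = np.sum(X ** 2)
    return 100.0 * (1.0 - sse / ssx)

# Tiny check with made-up numbers
X = np.array([[[3.0, 4.0]]])          # 1 x 1 x 2 array, SSX = 25
X_hat = np.array([[[3.0, 3.0]]])      # SSE = 1
fit = percent_explained(X, X_hat)     # 100 * (1 - 1/25)
```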

10 Example: fluorescence data (7)
The A-loadings describe the relative amounts of species 1 (tryptophan), 2 (tyrosine) and 3 (phenylalanine) in each sample. To obtain the absolute amounts, it is necessary to use a standard of known concentrations (here, sample 5).
[Tables: A (5 × 3); Concentrations (ppm)]
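One way to turn relative loadings into absolute amounts is sketched below. It assumes the signal of each species is proportional to its concentration, so one scale factor per species, fixed by the standard, is enough. All numbers are hypothetical and are not the values from the slide.

```python
import numpy as np

# Hypothetical A-loadings (5 samples x 3 species); NOT the values from the slide
A = np.array([
    [0.10, 0.40, 0.20],
    [0.30, 0.20, 0.10],
    [0.20, 0.10, 0.40],
    [0.40, 0.30, 0.30],
    [0.50, 0.50, 0.50],   # sample 5: the standard
])
standard_ppm = np.array([2.0, 4.0, 10.0])   # assumed known concentrations of sample 5

# One scale factor per species, fixed by the standard
scale = standard_ppm / A[4]
conc_ppm = A * scale                         # broadcasting scales each column
```

By construction the last row of `conc_ppm` reproduces the standard; the other rows are the estimated concentrations.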

11 The PARAFAC formula
Data array
– X (I × J × K) is matricized into $X_{(I \times JK)}$ (I × JK)
$X_{(I \times JK)} = A (C \odot B)^T + E_{(I \times JK)}$, where $\odot$ is the Khatri-Rao matrix product
Loadings
– A (I × R) describes variation in the first mode
– B (J × R) describes variation in the second mode
– C (K × R) describes variation in the third mode
Residuals
– E (I × J × K) is matricized into $E_{(I \times JK)}$ (I × JK)
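The matricized formula can be checked numerically on a small noise-free array. In this sketch the Khatri-Rao product is taken as the column-wise Kronecker product, and the unfolding places the K slabs (each I × J) side by side; the helper name `khatri_rao` and the random loadings are illustrative.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r equals kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J = B.shape[0]
    return np.einsum('kr,jr->kjr', C, B).reshape(K * J, R)

rng = np.random.default_rng(2)
I, J, K, R = 4, 3, 2, 2
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

X = np.einsum('ir,jr,kr->ijk', A, B, C)          # noise-free trilinear array

# Matricize: column index k*J + j, i.e. the K slabs (I x J) placed side by side
X_unf = X.transpose(0, 2, 1).reshape(I, K * J)
X_model = A @ khatri_rao(C, B).T                 # A (C kr B)^T
```

The ordering of the columns in the unfolding has to match the ordering of the rows in the Khatri-Rao product; a different unfolding convention would need the factors swapped accordingly.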

12 PCA vs PARAFAC
PCA
– Bilinear model: $X = AB^T + E$
– Components are calculated sequentially in order of importance.
– Solution has rotational freedom.
– Orthogonal, i.e. $B^T B = I$
PARAFAC
– Trilinear model: $X_{(I \times JK)} = A (C \odot B)^T + E_{(I \times JK)}$
– Components are calculated simultaneously in random order.
– Solution is unique (i.e. not possible to rotate factors without losing fit).
– Not (usually) orthogonal.

13 Rotational freedom
The bilinear model $X = AB^T + E$ contains rotational freedom. There are many sets of loadings (and scores) which give exactly the same residuals, E:
$X = AB^T + E = A R R^{-1} B^T + E = A^* B^{*T} + E$, with $A^* = AR$ and $B^{*T} = R^{-1} B^T$
This model is not unique: there are many different sets of loadings which give the same % fit.
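A small numerical check of this freedom, using random placeholder scores and loadings and an arbitrarily chosen invertible matrix `R`:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((6, 2))                   # scores
B = rng.random((4, 2))                   # loadings
R = np.array([[2.0, 1.0], [1.0, 1.0]])   # any invertible matrix (arbitrary choice)

A_star = A @ R
B_star = B @ np.linalg.inv(R).T          # so that B*^T = R^-1 B^T

same_fit = np.allclose(A @ B.T, A_star @ B_star.T)
different_loadings = not np.allclose(A, A_star)
```

The product, and hence the residuals, is unchanged although the scores and loadings themselves are different.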

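14 PARAFAC solution is unique
The trilinear model $X = A(C \odot B)^T + E$ is said to be unique, because it is not possible to rotate the loadings without changing the residuals, E:
$X = A(C \odot B)^T + E = A R R^{-1} (C \odot B)^T + E = A^* (C^* \odot B^*)^T + E^*$, with $E^* \neq E$
The residuals change because, for a general rotation R, $R^{-1}(C \odot B)^T$ can no longer be written as a Khatri-Rao product $(C^* \odot B^*)^T$.
This is why PARAFAC is able to find the correct fluorescence profiles: the unique solution is close to the true solution.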
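The sketch below illustrates the mechanism on toy numbers; it is not a proof of uniqueness. Each column of a Khatri-Rao product, reshaped to K × J, is a rank-one matrix (an outer product of one column of C and one of B). After mixing the columns with a general invertible matrix, the reshaped columns are no longer rank one, so no pair B*, C* can reproduce them exactly. The matrices and the helper `column_ranks` are made up for the illustration.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r equals kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J = B.shape[0]
    return np.einsum('kr,jr->kjr', C, B).reshape(K * J, R)

B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # J = 3, toy profiles
C = np.array([[1.0, 2.0], [3.0, 1.0]])               # K = 2, toy profiles
R = np.array([[2.0, 1.0], [1.0, 1.0]])               # arbitrary invertible mixing matrix

Z = khatri_rao(C, B)                  # (K*J) x 2
Z_rot = Z @ np.linalg.inv(R).T        # transpose of R^-1 Z^T

def column_ranks(M, K, J):
    """Rank of each column after reshaping it to a K x J matrix."""
    return [int(np.linalg.matrix_rank(M[:, r].reshape(K, J)))
            for r in range(M.shape[1])]

ranks_before = column_ranks(Z, 2, 3)       # rank-one columns: Khatri-Rao structure
ranks_after = column_ranks(Z_rot, 2, 3)    # structure lost after the rotation
```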

15 Spot the difference!
[Figure: PCA loadings compared with PARAFAC loadings]

16 Alternating least squares (ALS)
How to estimate the PCA model $X = AB^T + E$?
Step 0 - Initialize B
Step 1 - Estimate A using least squares: $A = X B (B^T B)^{-1}$
Step 2 - Estimate B using least squares: $B = X^T A (A^T A)^{-1}$
Step 3 - Check for convergence - if not, go to Step 1.
Each update must reduce (or at least not increase) the sum-of-squares of the residuals, $\|E\|^2 = \|X - AB^T\|^2$.
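The steps above can be sketched as follows. This is a minimal version under stated assumptions: the least-squares updates use a pseudoinverse, the start is random, and convergence is judged by the relative decrease of the sum of squares. The function name, tolerance and test data are illustrative choices, not taken from the slides.

```python
import numpy as np

def bilinear_als(X, R, n_iter=500, tol=1e-10, seed=0):
    """Fit X ~ A B^T by alternating least squares; returns A, B and the SSE history."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[1], R))        # Step 0: initialize B
    history = []
    sse_old = np.inf
    for _ in range(n_iter):
        A = X @ np.linalg.pinv(B.T)                 # Step 1: least-squares A given B
        B = X.T @ np.linalg.pinv(A.T)               # Step 2: least-squares B given A
        sse = float(np.sum((X - A @ B.T) ** 2))
        history.append(sse)
        if sse_old - sse < tol * max(sse, 1e-12):   # Step 3: convergence check
            break
        sse_old = sse
    return A, B, history

rng = np.random.default_rng(4)
X_true = rng.random((8, 2)) @ rng.random((6, 2)).T           # exact rank-2 data (made up)
X_noisy = X_true + 0.01 * rng.standard_normal(X_true.shape)  # add small noise
A, B, history = bilinear_als(X_noisy, R=2)
```

Because each step is an exact least-squares solution with the other factor held fixed, the values in `history` should never increase from one iteration to the next.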

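17 Three different unfoldings: the formula is symmetric
$X_{(I \times JK)} = A (C \odot B)^T + E_{(I \times JK)}$
$X_{(J \times KI)} = B (A \odot C)^T + E_{(J \times KI)}$
$X_{(K \times IJ)} = C (B \odot A)^T + E_{(K \times IJ)}$
[Figure: the array X unfolded as $X_{(I \times JK)}$, $X_{(J \times KI)}$ or $X_{(K \times IJ)}$]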
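A short check of the three unfoldings on a noise-free array. The axis permutations below are one choice of column ordering that matches the column-wise Kronecker definition of the Khatri-Rao product used in `khatri_rao`; other software may order the columns differently.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column r equals kron(P[:, r], Q[:, r])."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(P.shape[0] * Q.shape[0], -1)

rng = np.random.default_rng(5)
I, J, K, R = 4, 3, 2, 2
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
X = np.einsum('ir,jr,kr->ijk', A, B, C)           # noise-free trilinear array

X_I_JK = X.transpose(0, 2, 1).reshape(I, K * J)   # columns ordered k*J + j
X_J_KI = X.transpose(1, 0, 2).reshape(J, I * K)   # columns ordered i*K + k
X_K_IJ = X.transpose(2, 1, 0).reshape(K, J * I)   # columns ordered j*I + i

checks = [
    np.allclose(X_I_JK, A @ khatri_rao(C, B).T),
    np.allclose(X_J_KI, B @ khatri_rao(A, C).T),
    np.allclose(X_K_IJ, C @ khatri_rao(B, A).T),
]
```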

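18 How is the PARAFAC model calculated?
How to estimate the model $X = A(C \odot B)^T + E$?
Step 0 - Initialize B & C
Step 1 - Estimate A: $A = X_{(I \times JK)} Z (Z^T Z)^{-1}$ with $Z = C \odot B$
Step 2 - Estimate B in same way: $B = X_{(J \times KI)} Z (Z^T Z)^{-1}$ with $Z = A \odot C$
Step 3 - Estimate C in same way: $C = X_{(K \times IJ)} Z (Z^T Z)^{-1}$ with $Z = B \odot A$
Step 4 - Check for convergence. If not, go to Step 1.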
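A minimal sketch of these steps, assuming pseudoinverse-based least-squares updates, a random start and a relative-decrease stopping rule. The names `parafac_als` and `khatri_rao`, the tolerance and the test array are illustrative; practical implementations usually add constraints, line search and better initialization.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column r equals kron(P[:, r], Q[:, r])."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(P.shape[0] * Q.shape[0], -1)

def parafac_als(X, R, n_iter=1000, tol=1e-10, seed=0):
    """Fit a three-way PARAFAC model by ALS; returns A, B, C and the SSE history."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((J, R))               # Step 0: initialize B & C
    C = rng.standard_normal((K, R))
    X1 = X.transpose(0, 2, 1).reshape(I, K * J)   # X_(I x JK)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # X_(J x KI)
    X3 = X.transpose(2, 1, 0).reshape(K, J * I)   # X_(K x IJ)
    history = []
    sse_old = np.inf
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(C, B).T)   # Step 1: update A
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)   # Step 2: update B
        C = X3 @ np.linalg.pinv(khatri_rao(B, A).T)   # Step 3: update C
        sse = float(np.sum((X1 - A @ khatri_rao(C, B).T) ** 2))
        history.append(sse)
        if sse_old - sse < tol * max(sse, 1e-12):     # Step 4: convergence check
            break
        sse_old = sse
    return A, B, C, history

rng = np.random.default_rng(6)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (5, 4, 3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)            # made-up noise-free rank-2 array
A, B, C, history = parafac_als(X, R=2)
```

The recovered columns may come out in a different order and with different scaling than `A0`, `B0`, `C0`, since the components have no particular order and scale can be moved between modes.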

19 Good initialization is sometimes important
Initialization methods
– random numbers (do this ten times and compare models)
– use another method to give rough estimate (e.g. DTLD, MCR)
– use sensible guesses (e.g. elution profiles are Gaussian)
[Figure: response surface showing ALS paths from two initializations (B & C, and B* & C*), with a good solution and a local minimum]
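The random-start strategy could look roughly like this: run plain ALS from ten different random seeds and compare the final sums of squares. The fixed iteration count, the helper names and the simulated data are assumptions made for the sketch.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column r equals kron(P[:, r], Q[:, r])."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(P.shape[0] * Q.shape[0], -1)

def fit_parafac_once(X, R, seed, n_iter=200):
    """Plain ALS from one random start; returns the final sum of squared residuals."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B, C = rng.standard_normal((J, R)), rng.standard_normal((K, R))
    X1 = X.transpose(0, 2, 1).reshape(I, K * J)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 1, 0).reshape(K, J * I)
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(C, B).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(B, A).T)
    return float(np.sum((X1 - A @ khatri_rao(C, B).T) ** 2))

rng = np.random.default_rng(8)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
X = X + 0.01 * rng.standard_normal(X.shape)          # simulated noisy data

losses = [fit_parafac_once(X, R=2, seed=s) for s in range(10)]
best_seed = int(np.argmin(losses))
spread = max(losses) - min(losses)   # a large spread hints at local minima
```

If all starts end at practically the same loss, the solution is probably not a local minimum; if they differ, the model with the lowest loss is the natural candidate to inspect first.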

20 Conclusions (1)
The PARAFAC model decomposes a three-way array into three sets of loadings, one for each ‘mode’.
Each set of loadings describes the variation in that mode, e.g. differences in concentration, changes in time, spectral profiles etc.
PARAFAC components are calculated together and have no particular order.
PARAFAC components are not (usually) orthogonal and cannot be rotated without losing fit.
PARAFAC can be used for curve resolution and for calibration.

21 Conclusions (2) Some data sets have a chemical structure which is particularly suitable for the PARAFAC model, e.g. fluorescence spectroscopy. The PARAFAC model can also be used for four-way, five-way, N-way etc. data by simply using more sets of loadings.