Published by Loraine Georgina Boone. Modified over 9 years ago.
1 6. Other issues
Theoretical and Applied Chemometrics (Quimiometria Teórica e Aplicada)
Instituto de Química - UNICAMP
2 How many components to use?
Use the ‘unfolding trick’, i.e. look at the rank of each mode.
– does not have a strict statistical basis, but generally works well!
Use the core-consistency diagnostic (PARAFAC).
– also seems to work well in practice
Split-half analysis.
Does the algorithm converge without problems?
Use full cross-validation.
– the N-way Toolbox now has a routine for this, but it can be slow!
Look at loadings and residuals.
Use chemical knowledge.
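The ‘unfolding trick’ can be sketched in a few lines of NumPy. This is only an illustration on synthetic, noise-free data: the helper names `unfold` and `mode_ranks`, the array sizes and the relative tolerance are assumptions made for the example, not part of the slides or of the N-way Toolbox.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a three-way array so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_ranks(X, rel_tol=1e-8):
    """Numerical rank of each unfolding (singular values above rel_tol * largest)."""
    ranks = []
    for mode in range(X.ndim):
        s = np.linalg.svd(unfold(X, mode), compute_uv=False)
        ranks.append(int(np.sum(s > rel_tol * s[0])))
    return ranks

# Synthetic trilinear (PARAFAC-type) array with R = 2 components (hypothetical sizes).
rng = np.random.default_rng(0)
I, J, K, R = 8, 6, 5, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

ranks = mode_ranks(X)
print(ranks)  # one numerical rank per mode
```

With real, noisy data there is no sharp cut-off, so in practice one would inspect how quickly the singular values of each unfolding fall off rather than rely on a fixed tolerance.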
3 Preprocessing: centering (1)
We are often interested in the differences between objects, not in their absolute values.
– building calibration models: differences between samples
Mean-centering removes offsets from the data.
– removes constant background effects
– can help to linearize data, i.e.
4 Preprocessing: centering (2)
When performing a calibration, it is most common to remove the mean value from each column:
[Figure: two-way array X (object × variable); three-way array X (object × primary variable × secondary variable), with a column labelled x_jk]
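A minimal sketch of column mean-centering for both cases follows. It assumes that objects lie along the first axis and that, in the three-way case, a "column" is the set of values over all objects for one (primary variable, secondary variable) pair, which is what the x_jk label in the figure suggests. The array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-way: objects x variables. Subtract each variable's mean over the objects.
X2 = rng.normal(loc=5.0, size=(10, 4))
X2c = X2 - X2.mean(axis=0, keepdims=True)

# Three-way: objects x primary variables x secondary variables.
# Subtract, for every (j, k) pair, the mean over the objects (first mode).
X3 = rng.normal(loc=5.0, size=(10, 4, 3))
X3c = X3 - X3.mean(axis=0, keepdims=True)

print(np.abs(X2c.mean(axis=0)).max(), np.abs(X3c.mean(axis=0)).max())
```

After centering, every column mean is zero up to rounding error, so constant offsets per variable no longer contribute to the model.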
5 Preprocessing: scaling (1)
Sometimes we want to analyse variables measured in different units.
– chemical engineering: temperatures, pressures, flow rates
– QSAR: ionization constants, Hammett constants, dipole moments
These variables should be scaled so that each has an equal chance to appear in the model.
6 Preprocessing: scaling (2)
For two-way arrays (object × variable), it is common to divide by the standard deviation after mean-centering the data (‘autoscaling’):
[Figure: two-way array X (object × variable); three-way array X (object × primary variable × secondary variable), with a column labelled x_jk]
Autoscaling can destroy multilinear structure!
7 Preprocessing: scaling (3)
[Figure: three-way array X (object × process variable × time) with slab X_j]
Slab scaling maintains the multilinear structure!
[Figure: three-way array X (object × process variable 1 × process variable 2) with slabs X_j and X_k]
Double slab scaling may also be useful, but it is iterative.
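The contrast between slab scaling and column-wise scaling can be checked numerically on a synthetic trilinear array. In this sketch the scale factor is a root-mean-square value (an assumption, chosen so that scaling is isolated from centering), and the helpers `unfold` and `mode_ranks` are hypothetical names introduced for the example.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a three-way array so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_ranks(X, rel_tol=1e-8):
    """Numerical rank of each unfolding."""
    ranks = []
    for mode in range(X.ndim):
        s = np.linalg.svd(unfold(X, mode), compute_uv=False)
        ranks.append(int(np.sum(s > rel_tol * s[0])))
    return ranks

# Noise-free two-component trilinear array: object x variable x time.
rng = np.random.default_rng(0)
I, J, K, R = 8, 6, 5, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# Slab scaling: one scale factor per variable j, shared by the whole slab X_j.
slab_rms = np.sqrt(np.mean(X**2, axis=(0, 2)))   # shape (J,)
X_slab = X / slab_rms[None, :, None]

# Column-wise scaling: a separate factor for every (j, k) column.
col_rms = np.sqrt(np.mean(X**2, axis=0))         # shape (J, K)
X_col = X / col_rms[None, :, :]

ranks_slab = mode_ranks(X_slab)
ranks_col = mode_ranks(X_col)
print(ranks_slab, ranks_col)
```

Slab scaling only rescales the rows of the loading matrix for the scaled mode, so the two-component structure survives. Giving each (j, k) column its own factor generally raises the rank of the unfoldings of the variable and time modes, which is the sense in which autoscaling can destroy multilinear structure.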
8 Tucker models
Tucker1: $X = AG + E$
– Tucker1 = PCA
Tucker2: $X = G(B \otimes A)^T + E$
– $G$ ($I \times R_2 \times R_3$)
– very rarely used
Tucker3: $X = AG(C \otimes B)^T + E$
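The Tucker3 expression can be verified numerically. This sketch builds a noise-free array from hypothetical loading matrices A, B, C and a core G, then checks that the unfolded form A G (C ⊗ B)^T reproduces it. The unfolding convention (second-mode index running fastest along the columns) is an assumption chosen to match the C ⊗ B ordering; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K = 5, 4, 3          # array dimensions
R1, R2, R3 = 2, 3, 2       # number of components in each mode

A = rng.normal(size=(I, R1))
B = rng.normal(size=(J, R2))
C = rng.normal(size=(K, R3))
G = rng.normal(size=(R1, R2, R3))   # core array

# Element-wise definition: x_ijk = sum_pqr a_ip b_jq c_kr g_pqr
X = np.einsum("ip,jq,kr,pqr->ijk", A, B, C, G)

# Unfold X to I x (K*J) and G to R1 x (R3*R2), second-mode index fastest,
# so that the columns line up with the rows of kron(C, B).
X_unf = X.transpose(0, 2, 1).reshape(I, K * J)
G_unf = G.transpose(0, 2, 1).reshape(R1, R3 * R2)

X_model = A @ G_unf @ np.kron(C, B).T
print(np.allclose(X_unf, X_model))
```

With NumPy's default row-major reshape and no transpose, the same identity holds with `np.kron(B, C)` instead, so the order of the Kronecker product has to match the unfolding convention in use.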
9 PARAFAC2
[Figure: three-way array with modes object (I), wavelength (J) and time (K), showing time shifts between objects]
In PARAFAC2, only the matrix product $X_i X_i^T$ ($J \times J$) is modelled. It works if the correlation structures in the objects are the same.
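The following idealized sketch shows why modelling the cross-product makes the time shift irrelevant. It assumes a circular shift (`np.roll`), which is just a permutation of the time points; real retention-time shifts only approximately behave this way. The Gaussian profiles, spectra and sizes are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
J, K, R = 6, 40, 2                      # wavelengths, time points, components

# Hypothetical spectra (J x R) and Gaussian time profiles (K x R).
spectra = rng.uniform(0.1, 1.0, size=(J, R))
t = np.arange(K)
centers = np.array([12.0, 20.0])
profiles = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 2.0) ** 2)

# One object's slab X_i (wavelength x time) and a time-shifted version of it.
X_i = spectra @ profiles.T
X_shifted = np.roll(X_i, shift=5, axis=1)

# The slabs differ, but their J x J cross-products are identical.
cross_original = X_i @ X_i.T
cross_shifted = X_shifted @ X_shifted.T
print(np.allclose(X_i, X_shifted), np.allclose(cross_original, cross_shifted))
```

Because the shift only reorders the columns of X_i, the sum over time in X_i X_i^T is unchanged, which is the property the slide relies on.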
10 Missing data
Expectation-maximization (EM) is a technique for estimating models (PARAFAC, Tucker, PLS, PCA etc.) when some of the data are missing:
X = [X* X#], where X* is the known part and X# is the missing part.
0. Initialize X#
1. Estimate model (maximization)
2. Replace missing values with model values (expectation)
3. Repeat until convergence
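Steps 0 to 3 can be sketched for the simplest case, a bilinear model fitted by truncated SVD without centering. The function name `em_impute`, the column-mean initialization, the stopping rule and the toy data are all assumptions for this illustration; the same loop would apply to PARAFAC or Tucker with the SVD replaced by the corresponding model fit.

```python
import numpy as np

def em_impute(X, n_components, n_iter=1000, tol=1e-12):
    """Fill NaN entries of X by iterating a rank-`n_components` SVD model."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 0: initialize the missing values (here: column means of known values).
    filled = np.where(missing, np.nanmean(X, axis=0, keepdims=True), X)
    for _ in range(n_iter):
        # Step 1: estimate the model from the completed data.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        model = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 2: replace only the missing values with the model values.
        new_filled = np.where(missing, model, X)
        # Step 3: repeat until the imputed values stop changing.
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return filled

# Toy example: a rank-one matrix with three entries removed.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.0, 0.5, 2.0, 1.5])
X_true = np.outer(a, b)
X_obs = X_true.copy()
X_obs[0, 1] = X_obs[3, 2] = X_obs[5, 0] = np.nan

X_est = em_impute(X_obs, n_components=1)
print(np.max(np.abs(X_est - X_true)))
```

The known entries are never altered; only the missing positions are updated, so the fit to the observed data cannot get worse from one iteration to the next.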
11 Thank you very much for your attention!