1 6. Other issues Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP.

2 How many components to use?
Use the 'unfolding trick', i.e. look at the rank of each mode.
– does not have a strict statistical basis, but generally works well
Use the core-consistency diagnostic (PARAFAC).
– also seems to work well in practice
Use split-half analysis.
Does the algorithm converge without problems?
Use full cross-validation.
– the N-way Toolbox now has a routine for this, but it can be slow
Look at loadings and residuals.
Use chemical knowledge.
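The 'unfolding trick' can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are synthetic, the helper name `mode_singular_values` is hypothetical, and the threshold used to count "significant" singular values is an arbitrary choice rather than a rule from the slides.

```python
import numpy as np

def mode_singular_values(X):
    """Return the singular values of each mode's unfolding of an N-way array."""
    result = []
    for mode in range(X.ndim):
        # Move the chosen mode to the front, then flatten the remaining modes.
        unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
        result.append(np.linalg.svd(unfolded, compute_uv=False))
    return result

# Synthetic example (hypothetical data): a three-component trilinear array
# with a small amount of added noise.
rng = np.random.default_rng(0)
I, J, K, R = 10, 8, 6, 3
A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
X = np.einsum("ir,jr,kr->ijk", A, B, C) + 1e-6 * rng.standard_normal((I, J, K))

for mode, s in enumerate(mode_singular_values(X)):
    print(f"mode {mode + 1}:", np.round(s[:5], 4))

# Count the singular values that stand clearly above the noise floor.
# The factor 1e-3 is an assumed, data-dependent cut-off.
estimated_ranks = [int(np.sum(s > 1e-3 * s[0])) for s in mode_singular_values(X)]
```

On real data the drop in singular values is rarely this sharp, so the counts are best read as a starting point to be checked against the other diagnostics on this slide.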

3 Preprocessing: centering (1)
We are often interested in the differences between objects, not in their absolute values.
– building calibration models: differences between samples
Mean-centering removes offsets from the data.
– removes constant background effects
– can help to linearize data

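4 Preprocessing: centering (2)
When performing a calibration, it is most common to remove the mean value from each column.
[Figure: two-way array X (object × variable) and three-way array X (object × primary variable × secondary variable); in the three-way case a column is the vector x_jk running over the objects.]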
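A minimal sketch of column centering for a three-way array, assuming the objects lie along the first axis: each element x_ijk has the mean over objects i of its column (j, k) subtracted. The function name and the data are hypothetical.

```python
import numpy as np

def center_across_objects(X):
    """Subtract from each x_ijk the mean of column (j, k) taken over objects i."""
    column_means = X.mean(axis=0, keepdims=True)  # shape (1, J, K)
    return X - column_means, column_means

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4, 3)) + 10.0  # hypothetical data with a constant offset
Xc, means = center_across_objects(X)

# The same result is obtained by unfolding to (objects x JK) and centering
# the columns of that matrix in the usual two-way manner.
X_unfolded = X.reshape(5, -1)
Xc_unfolded = X_unfolded - X_unfolded.mean(axis=0)
```

Keeping `means` makes it possible to apply the same offsets to new samples at prediction time.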

5 Preprocessing: scaling (1)
Sometimes we want to analyse variables measured in different units:
– chemical engineering: temperatures, pressures, flow rates
– QSAR: ionization constants, Hammett constants, dipole moments
These variables should be scaled so that each has an equal chance of appearing in the model.

6 Preprocessing: scaling (2)
For two-way arrays (objects × variables), it is common to divide each column by its standard deviation after mean-centering the data ('autoscaling').
[Figure: two-way array X (object × variable) and three-way array X (object × primary variable × secondary variable) with column x_jk.]
Autoscaling the columns of a three-way array can destroy multilinear structure!

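7 Preprocessing: scaling (3)
[Figure: three-way array X (object × process variable × time) with slab X_j highlighted.]
Slab scaling maintains the multilinear structure!
[Figure: three-way array X (object × process variable 1 × process variable 2) with slabs X_j and X_k highlighted.]
Double slab scaling may also be useful, but it has to be done iteratively.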
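The difference between slab scaling and column-wise scaling can be checked numerically. The sketch below makes several assumptions: the data are an exact two-component trilinear array, centering is skipped for brevity, and each slab or column is scaled by its root-mean-square value. The function names are hypothetical. The rank of the second-mode unfolding serves as a simple indicator: for a two-component trilinear array it cannot exceed 2.

```python
import numpy as np

def unfold(X, mode):
    """Matricize X with the chosen mode along the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def slab_scale(X, mode):
    """Divide every slab of the given mode by that slab's root-mean-square value."""
    other_axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    rms = np.sqrt(np.mean(X**2, axis=other_axes, keepdims=True))
    return X / rms

def column_scale(X):
    """Two-way style scaling: one scale factor per column (j, k) of the array."""
    return X / np.sqrt(np.mean(X**2, axis=0, keepdims=True))

rng = np.random.default_rng(2)
A, B, C = (rng.standard_normal((n, 2)) for n in (6, 4, 5))
X = np.einsum("ir,jr,kr->ijk", A, B, C)  # exact two-component trilinear array

# Slab scaling within the second mode only rescales the rows of B,
# so the array remains a two-component trilinear array.
rank_slab = np.linalg.matrix_rank(unfold(slab_scale(X, mode=1), 1), tol=1e-8)

# Column-wise scaling uses a different factor for every (j, k) pair,
# which in general cannot be absorbed into the loading matrices.
rank_column = np.linalg.matrix_rank(unfold(column_scale(X), 1), tol=1e-8)
print(rank_slab, rank_column)
```

A rank above 2 after column-wise scaling means that no two-component trilinear model can describe the scaled data exactly any more.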

8 Tucker models
Tucker1: $X = AG + E$
– Tucker1 = PCA
Tucker2: $X = G(B \otimes A)^T + E$
– $G$ ($I \times R_2 R_3$)
– very rarely used
Tucker3: $X = AG(C \otimes B)^T + E$
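The Tucker3 expression can be verified numerically against its element-wise definition, x_ijk = Σ_pqr a_ip b_jq c_kr g_pqr. The sketch below is only an illustration: the dimensions and random factors are hypothetical, the residual E is left out, and it assumes the unfolding convention in which the second-mode index runs fastest along the columns, which is what makes the Kronecker product C ⊗ B line up.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 5, 4, 3        # array dimensions (hypothetical)
R1, R2, R3 = 2, 3, 2     # number of components in each mode (hypothetical)
A = rng.standard_normal((I, R1))
B = rng.standard_normal((J, R2))
C = rng.standard_normal((K, R3))
G_tensor = rng.standard_normal((R1, R2, R3))  # core array

# Element-wise definition: x_ijk = sum over p, q, r of a_ip * b_jq * c_kr * g_pqr
X_tensor = np.einsum("pqr,ip,jq,kr->ijk", G_tensor, A, B, C)

# Unfold the core and the data so that the second-mode index runs fastest.
G = G_tensor.transpose(0, 2, 1).reshape(R1, R3 * R2)   # R1 x (R3*R2)
X = X_tensor.transpose(0, 2, 1).reshape(I, K * J)      # I x (K*J)

# Matrix form of the noise-free Tucker3 model.
X_model = A @ G @ np.kron(C, B).T
print(np.allclose(X, X_model))
```

With a different unfolding order the roles of B and C inside the Kronecker product swap, so the convention should be fixed before comparing results between software packages.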

9 PARAFAC2
[Figure: three-way array with modes object (I), wavelength (J) and time (K); the time profiles are shifted from object to object.]
In PARAFAC2, only the matrix product $X_i X_i^T$ ($J \times J$) is modelled. It works if the correlation structures in the objects are the same.
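Why modelling the cross-product helps with time shifts can be seen in a small numerical sketch. It assumes a hypothetical object X_i of size J × K (wavelength × time) built from two Gaussian elution peaks, and it imitates a retention-time shift with a circular shift of the time axis, which is a simplification that is reasonable only because the signal is close to zero at both ends.

```python
import numpy as np

rng = np.random.default_rng(4)
J, K = 4, 50
time = np.arange(K)

# Hypothetical object: two Gaussian elution peaks with different spectra.
spectra = rng.uniform(0.0, 1.0, size=(J, 2))                       # J x 2
profiles = np.stack(
    [np.exp(-0.5 * ((time - centre) / 2.0) ** 2) for centre in (15, 25)]
)                                                                   # 2 x K
X_i = spectra @ profiles                                            # J x K

# The same object measured with every peak eluting 5 time points later.
X_i_shifted = np.roll(X_i, shift=5, axis=1)

# The raw data differ, but the J x J cross-products agree.
cross = X_i @ X_i.T
cross_shifted = X_i_shifted @ X_i_shifted.T
print(np.allclose(X_i, X_i_shifted), np.allclose(cross, cross_shifted))
```

The shift here moves all peaks of the object together; the sketch does not cover cases where the peaks move relative to one another, which would change the correlation structure.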

10 Missing data
Expectation-maximization (EM) is a technique for estimating models (PARAFAC, Tucker, PLS, PCA etc.) when some of the data are missing:
$X = [X^* \; X^\#]$, where $X^*$ is known and $X^\#$ is missing.
0. Initialize $X^\#$.
1. Estimate the model (maximization).
2. Replace the missing values with model values (expectation).
3. Repeat steps 1 and 2 until convergence.
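The steps above can be sketched for the simplest case, a PCA model fitted by truncated SVD. This is a minimal illustration under stated assumptions, not the routine of any particular toolbox: missing entries are marked with NaN, the initial guess is the column mean of the known values, no centering is applied, and the function name `em_pca` and the test data are hypothetical.

```python
import numpy as np

def em_pca(X, n_components, max_iter=1000, tol=1e-10):
    """Fit a PCA model to X, where missing entries are marked with NaN."""
    missing = np.isnan(X)
    # Step 0: initialise the missing entries with the column means of the known data.
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Step 1: estimate the model from the completed data (truncated SVD).
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        X_model = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 2: replace only the missing entries with the model values.
        X_new = np.where(missing, X_model, X)
        # Step 3: stop once the imputed values no longer change.
        change = np.linalg.norm(X_new - X_filled)
        X_filled = X_new
        if change < tol * np.linalg.norm(X_filled):
            break
    return X_filled, X_model

# Hypothetical test data: an exact rank-2 matrix with about 10% of entries removed.
rng = np.random.default_rng(5)
X_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 8))
missing = rng.random(X_true.shape) < 0.1
X_obs = np.where(missing, np.nan, X_true)

X_filled, X_model = em_pca(X_obs, n_components=2)

# Compare the imputation error with that of plain column-mean imputation.
mean_filled = np.where(missing, np.nanmean(X_obs, axis=0), X_obs)
error_em = np.sqrt(np.mean((X_filled[missing] - X_true[missing]) ** 2))
error_mean = np.sqrt(np.mean((mean_filled[missing] - X_true[missing]) ** 2))
print(error_em, error_mean)
```

The same loop applies to PARAFAC or Tucker models by swapping the SVD in step 1 for the corresponding fitting routine; only the missing entries are ever overwritten.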

11 Thank you very much for your attention!
