GRAPHICAL REPRESENTATIONS OF A DATA MATRIX
SYSTEM CHARACTERISATION
[Figure: Sample → Instrument + computer (UV, IR, NMR, MS, GC, GC-MS) → Instrumental profiles → Data matrix]
Measure → Numbers → Latent projections (graphics) → Information (modelling)
Data matrix $\mathbf{X}$: object vectors $\mathbf{x}'_k$ (row vectors) and variable vectors $\mathbf{x}_i$ (column vectors)
DATA MATRIX / DATA TABLE

Object/Sample   Variable i   Variable j
k               1            5
l               3            1
m               8            6

Object vectors (rows): $\mathbf{x}'_k = [1\ 5]$, $\mathbf{x}'_l = [3\ 1]$, $\mathbf{x}'_m = [8\ 6]$
Variable vectors (columns): $\mathbf{x}_i = [1\ 3\ 8]'$, $\mathbf{x}_j = [5\ 1\ 6]'$
Subtract the variable means, $\bar{x}_i = 4$ and $\bar{x}_j = 4$:

Original data matrix
Object   i    j
k        1    5
l        3    1
m        8    6

Column-centred data matrix
Object   i    j
k       -3    1
l       -1   -3
m        4    2
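Column centring can be sketched in a few lines of NumPy. The example matrix is the one on the slide; the variable names `X`, `x_bar` and `Xc` are illustrative choices.

```python
import numpy as np

# Example data matrix from the slide: rows are objects k, l, m; columns are variables i, j
X = np.array([[1.0, 5.0],
              [3.0, 1.0],
              [8.0, 6.0]])

# One mean per variable (column); both equal 4 here
x_bar = X.mean(axis=0)

# Column-centred data matrix: subtract each variable's mean from its own column
Xc = X - x_bar

print(x_bar)  # [4. 4.]
# Xc rows: [-3, 1], [-1, -3], [4, 2]
```

After centring, every column sums to zero. This is what later allows angles between variable vectors to be read as correlations.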
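VARIABLE SPACE
[Figure: object vectors $\mathbf{x}'_k$, $\mathbf{x}'_l$, $\mathbf{x}'_m$ of the column-centred matrix plotted on the axes variable i and variable j, with the angle $\theta_{kl}$ between $\mathbf{x}'_k$ and $\mathbf{x}'_l$]
Shows relationships between objects: the angle $\theta_{kl}$ measures the similarity of objects k and l.
$$\cos\theta_{kl} = \frac{\mathbf{x}'_k\mathbf{x}_l}{\|\mathbf{x}_k\|\,\|\mathbf{x}_l\|}$$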
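The object-similarity cosine can be computed directly from the centred example matrix. In this sketch, the helper `object_cosine` and the pairwise matrix `C_obj` are hypothetical names chosen for illustration.

```python
import numpy as np

# Column-centred example matrix (rows: objects k, l, m; columns: variables i, j)
Xc = np.array([[-3.0, 1.0],
               [-1.0, -3.0],
               [4.0, 2.0]])

def object_cosine(X, a, b):
    """Cosine of the angle between object (row) vectors a and b of X."""
    xa, xb = X[a], X[b]
    return xa @ xb / (np.linalg.norm(xa) * np.linalg.norm(xb))

# All pairwise object cosines at once: normalise each row, then take inner products
rows = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
C_obj = rows @ rows.T

print(round(object_cosine(Xc, 0, 1), 4))  # 0.0: objects k and l are orthogonal
print(round(object_cosine(Xc, 0, 2), 4))  # -0.7071: k and m point in roughly opposite directions
```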
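OBJECT SPACE
[Figure: variable vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ of the column-centred matrix plotted on the axes object k, object l and object m, with the angle $\theta_{ij}$ between them]
Shows relationships (correlation/covariance) between variables, i.e. the correlation structure. The angle $\theta_{ij}$ represents the correlation between variables i and j:
$$\cos\theta_{ij} = \frac{\mathbf{x}'_i\mathbf{x}_j}{\|\mathbf{x}_i\|\,\|\mathbf{x}_j\|}$$
Because the columns are centred, this cosine is exactly the correlation coefficient $r_{ij}$.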
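A quick numerical check confirms that, for centred columns, the cosine between two variable vectors equals the Pearson correlation returned by `np.corrcoef`. The uncentred cosine is shown for contrast. This is a sketch on the slide's example data.

```python
import numpy as np

# Original example data (rows: objects k, l, m; columns: variables i, j)
X = np.array([[1.0, 5.0],
              [3.0, 1.0],
              [8.0, 6.0]])
Xc = X - X.mean(axis=0)  # column centring

# Variable vectors are the columns of the centred matrix
xi, xj = Xc[:, 0], Xc[:, 1]
cos_ij = xi @ xj / (np.linalg.norm(xi) * np.linalg.norm(xj))

# Pearson correlation of the raw columns (corrcoef centres internally)
r_ij = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

# Without centring, the cosine is not the correlation
raw = X[:, 0] @ X[:, 1] / (np.linalg.norm(X[:, 0]) * np.linalg.norm(X[:, 1]))

print(round(cos_ij, 4), round(r_ij, 4))  # both 0.4193
```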
Object space shows common variation in a suite of variables! Common variation → underlying factor!
VARIABLE SPACE AND OBJECT SPACE CONTAIN TOGETHER ALL AVAILABLE INFORMATION IN A DATA MATRIX
WHAT TO DO IF THE NUMBER OF VARIABLES IS GREATER THAN 2-3? PROJECT ONTO LATENT VARIABLES (LV)!
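PROJECTING ONTO LATENT VARIABLES
[Figure: object vector $\mathbf{x}_k$ in the plane spanned by the axes $\mathbf{e}_1$ and $\mathbf{e}_2$, projected onto the latent variable (LV) $\mathbf{w}_a$, giving the score $t_{ka}$]
Projection (in variable space) of the object vector $\mathbf{x}_k$ (object k) onto the latent variable $\mathbf{w}_a$:
$$t_{ka} = \mathbf{x}'_k\mathbf{w}_a, \qquad k = 1, 2, \ldots, N \quad \text{(score)}$$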
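The score formula can be evaluated object by object or for all N objects at once as $\mathbf{t}_a = \mathbf{X}\mathbf{w}_a$. The direction `w_a` below, the bisector of $\mathbf{e}_1$ and $\mathbf{e}_2$, is a hypothetical choice made only for illustration. It is scaled to unit length so that each score is an orthogonal projection length.

```python
import numpy as np

# Column-centred example matrix (rows: objects k, l, m)
Xc = np.array([[-3.0, 1.0],
               [-1.0, -3.0],
               [4.0, 2.0]])

# Hypothetical latent variable: unit vector along the bisector of e1 and e2
w_a = np.array([1.0, 1.0]) / np.sqrt(2.0)

# Score of each object, t_ka = x'_k w_a, computed row by row ...
t_loop = np.array([x_k @ w_a for x_k in Xc])
# ... or for all objects at once as a matrix-vector product
t_a = Xc @ w_a

print(np.round(t_a, 4))
```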
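LATENT VARIABLE PROJECTIONS
[Figure: data matrix $\mathbf{X}$, with object vectors projected onto latent variables in variable space and variable vectors projected onto score vectors in object space]
Variable space (object correlation): scores $\mathbf{t}_a = \mathbf{X}\mathbf{w}_a$; score plot axes $\mathbf{w}_1, \mathbf{w}_2, \ldots$
Object space (variable correlation): loadings $\mathbf{p}'_a = \mathbf{t}'_a\mathbf{X}/\mathbf{t}'_a\mathbf{t}_a$; loading plot axes $\mathbf{t}_1/\|\mathbf{t}_1\|, \mathbf{t}_2/\|\mathbf{t}_2\|, \ldots$
Score plot + loading plot: BIPLOT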
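Given a score vector, the loadings follow from $\mathbf{p}'_a = \mathbf{t}'_a\mathbf{X}/\mathbf{t}'_a\mathbf{t}_a$. Each loading $p_{ia}$ is the least-squares regression coefficient of variable i on $\mathbf{t}_a$. The coordinate of a variable on the normalised loading-plot axis $\mathbf{t}_a/\|\mathbf{t}_a\|$ is $\|\mathbf{t}_a\|\,p_{ia}$. As before, `w_a` is a hypothetical unit weight chosen for illustration.

```python
import numpy as np

Xc = np.array([[-3.0, 1.0],
               [-1.0, -3.0],
               [4.0, 2.0]])
w_a = np.array([1.0, 1.0]) / np.sqrt(2.0)  # hypothetical unit weight vector

t_a = Xc @ w_a                    # scores: objects projected in variable space
p_a = (t_a @ Xc) / (t_a @ t_a)    # loadings: p'_a = t'_a X / t'_a t_a

# Coordinates of the variables on the normalised loading-plot axis t_a / ||t_a||
u_a = t_a / np.linalg.norm(t_a)
loading_coords = u_a @ Xc         # equals ||t_a|| * p_a
```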
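Successive orthogonal projections (SOP)
i) Select $\mathbf{w}_a$.
ii) Project the objects (samples, experiments) onto $\mathbf{w}_a$: $\mathbf{t}_a = \mathbf{X}_a\mathbf{w}_a$.
iii) Project the variable vectors onto $\mathbf{t}_a$: $\mathbf{p}'_a = \mathbf{t}'_a\mathbf{X}_a/\mathbf{t}'_a\mathbf{t}_a$.
iv) Remove latent variable a from the predictor space, i.e. replace $\mathbf{X}_a$ with $\mathbf{X}_{a+1} = \mathbf{X}_a - \mathbf{t}_a\mathbf{p}'_a$.
Repeat i) to iv) for a = 1, 2, ..., A, where A is the dimension of the model.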
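The four SOP steps translate into a short loop. This is a minimal sketch: `sop` and the `choose_w` hook are hypothetical names, and the weight rule plugged in here (first right singular vector, i.e. the PCA choice) is only one possibility. Because $\mathbf{t}'_a\mathbf{X}_{a+1} = \mathbf{0}$ after deflation, the resulting scores are mutually orthogonal whatever rule supplies $\mathbf{w}_a$.

```python
import numpy as np

def sop(X, choose_w, A):
    """Successive orthogonal projections on a column-centred X (N x M).

    choose_w is a hypothetical hook returning w_a for the current X_a.
    Returns scores T (N x A), loadings P (M x A) and the residual matrix E.
    """
    Xa = X.copy()
    T, P = [], []
    for _ in range(A):
        w = choose_w(Xa)              # i) select w_a
        t = Xa @ w                    # ii) t_a = X_a w_a
        p = (t @ Xa) / (t @ t)        # iii) p'_a = t'_a X_a / t'_a t_a
        Xa = Xa - np.outer(t, p)      # iv) X_{a+1} = X_a - t_a p'_a
        T.append(t)
        P.append(p)
    return np.column_stack(T), np.column_stack(P), Xa

def pca_weight(Xa):
    # PCA choice of w_a: first right singular vector of X_a
    return np.linalg.svd(Xa, full_matrices=False)[2][0]

Xc = np.array([[-3.0, 1.0],
               [-1.0, -3.0],
               [4.0, 2.0]])
T, P, E = sop(Xc, pca_weight, A=2)
```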
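METHOD OVERVIEW (choice of $\mathbf{w}_a$)
PCA/SVD                             $\mathbf{w}_a = \mathbf{p}_a/\|\mathbf{p}_a\|$
PLS                                 $\mathbf{w}_a = \mathbf{X}'_a\mathbf{u}_a/\|\mathbf{X}'_a\mathbf{u}_a\|$
MVP (marker variable projection)    $\mathbf{w}_a = \mathbf{e}_i$
MOP (marker object projection)      $\mathbf{w}_a = \mathbf{x}_k/\|\mathbf{x}_k\|$
TP (target projection)              $\mathbf{w}_a = \mathbf{b}_k/\|\mathbf{b}_k\|$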
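Several of these weight rules can be written as small functions that plug into an SOP loop. In this sketch, all function names are hypothetical. The PCA rule iterates $\mathbf{w} \to \mathbf{p}/\|\mathbf{p}\|$ until the weight equals the normalised loading, a NIPALS-style power iteration. The PLS rule assumes a single response `y`, taking $\mathbf{u}_a = \mathbf{y}$ as in PLS1, and the response values used in the test are invented for illustration. TP is omitted because it needs a target vector $\mathbf{b}_k$ that the slide does not define here.

```python
import numpy as np

def w_pca(Xa, n_iter=500, tol=1e-12):
    """PCA weight: iterate w -> p/||p|| until w equals the normalised loading."""
    w = Xa[np.argmax((Xa ** 2).sum(axis=1))].copy()  # start from the longest object vector
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        t = Xa @ w
        p = (t @ Xa) / (t @ t)
        w_new = p / np.linalg.norm(p)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def w_pls1(Xa, y):
    """PLS weight for a single response y (assumption: u_a = y, as in PLS1)."""
    v = Xa.T @ y
    return v / np.linalg.norm(v)

def w_mvp(Xa, i):
    """Marker variable projection: unit vector e_i selecting variable i."""
    e = np.zeros(Xa.shape[1])
    e[i] = 1.0
    return e

def w_mop(Xa, k):
    """Marker object projection: normalised object vector x_k."""
    return Xa[k] / np.linalg.norm(Xa[k])
```

With MVP the score vector is simply the marker variable itself, and with MOP the marker object's own score equals its length.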
METHOD OVERVIEW
Decomposition                        Properties/criteria
Principal components (PCA)           Maximum variance
Partial least squares (PLS)          Relevant components
Rotated (target)                     "Real" factors
Marker projections (MOP/MVP)         "Real" factors
LATENT PROJECTION IS AN INSTRUMENT TO CREATE ORDER (MODEL) OUT OF CHAOS (DATA)
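LATENT VARIABLE MODEL
$$\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E} = \mathbf{U}\mathbf{G}^{1/2}\mathbf{P}' + \mathbf{E}$$
$\mathbf{U}$: orthonormal matrix of score vectors $\{\mathbf{u}_a\}$
$\mathbf{G}$: diagonal matrix, $g_a = \mathbf{t}'_a\mathbf{t}_a$
$\mathbf{P}'$: loading matrix
BIPLOT (SVD, PLS, orthogonal rotations, ...): scores $\mathbf{U}\mathbf{G}^{1/2}$, loadings $\mathbf{G}^{1/2}\mathbf{P}'$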
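The factorisation $\mathbf{X} = \mathbf{U}\mathbf{G}^{1/2}\mathbf{P}'$ can be checked numerically with PCA scores from NumPy's SVD. This sketch computes the score-plot coordinates $\mathbf{U}\mathbf{G}^{1/2}$ and the loading-plot coordinates $\mathbf{G}^{1/2}\mathbf{P}'$. The latter coincide with $\mathbf{U}'\mathbf{X}$, the projections of the variable vectors onto the normalised score axes. It keeps all A = 2 components, so E = 0 for this example.

```python
import numpy as np

Xc = np.array([[-3.0, 1.0],
               [-1.0, -3.0],
               [4.0, 2.0]])

# PCA via SVD: X = U S V'
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U * s                                   # scores t_a (columns)
g = (T ** 2).sum(axis=0)                    # g_a = t'_a t_a, diagonal of G
P = (Xc.T @ T) / g                          # loadings p_a = X' t_a / t'_a t_a

score_coords = U * np.sqrt(g)               # U G^{1/2} (equals T)
loading_coords = np.sqrt(g)[:, None] * P.T  # G^{1/2} P'

# With all components kept, the model reproduces the centred data
X_hat = U @ np.diag(np.sqrt(g)) @ P.T
```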
PCA/PLS (orthogonal scores)
$$\mathbf{X} - \bar{\mathbf{X}} = \mathbf{T}\mathbf{P}' + \mathbf{E}$$
Centred data = scores × loadings + residuals, where every row of $\bar{\mathbf{X}}$ holds the variable means.
Scores: projections of the object vectors (samples) in variable space.
Loadings: projections of the variable vectors in object space; they show the correlation structure of the variables.
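A truncated model with A = 1 shows the residual term E explicitly. This sketch fits one principal component to the centred example data. It uses the SVD, for which the loadings $\mathbf{p}_a$ are the right singular vectors. It then reports the fraction of the total sum of squares that the component explains.

```python
import numpy as np

# Original example data; the model is fitted to the centred matrix
X = np.array([[1.0, 5.0],
              [3.0, 1.0],
              [8.0, 6.0]])
Xc = X - X.mean(axis=0)

A = 1                                      # keep one principal component
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :A] * s[:A]                       # scores: one row per sample
P = Vt[:A].T                               # loadings: one row per variable
E = Xc - T @ P.T                           # residuals

explained = 1.0 - (E ** 2).sum() / (Xc ** 2).sum()
print(round(explained, 4))                 # 0.75
```

The residuals are orthogonal to the retained scores, so $\mathbf{T}'\mathbf{E} = \mathbf{0}$.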
VISUAL INTERFACE
Score plot: variable space
Loading plot: object space
Biplot: scores and loadings in one plot!
EXTENDING THE LATENT VARIABLE MODEL
- Introduce interactions and squared terms in the variables (non-additive model): Horst (1968), Personality: Measurement of Dimensions; Clementi et al. (1988); Kvalheim (1988).
- Introduce interactions and squared terms in the latent variables: McDonald (1967), Nonlinear Factor Analysis; Wold, Kettaneh-Wold (1988); Vogt (1988).
- Introduce new sets of measurements (new data matrices) as a systematic method for induction: Kvalheim (1988).
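The first extension, adding interaction and squared terms in the variables, amounts to augmenting the data matrix before projection. This sketch is one plausible way to do that. The helper `add_quadratic_terms` is hypothetical, and re-centring the expanded matrix is an assumption rather than a step stated on the slide.

```python
import numpy as np
from itertools import combinations

def add_quadratic_terms(X):
    """Append squared terms x_i^2 and interaction terms x_i * x_j (i < j) to X."""
    M = X.shape[1]
    blocks = [X, X ** 2]
    pairs = list(combinations(range(M), 2))
    if pairs:
        blocks.append(np.column_stack([X[:, i] * X[:, j] for i, j in pairs]))
    return np.hstack(blocks)

Xc = np.array([[-3.0, 1.0],
               [-1.0, -3.0],
               [4.0, 2.0]])

# Columns: i, j, i^2, j^2, i*j
Xq = add_quadratic_terms(Xc)
# Assumption: re-centre the expanded matrix before projecting onto latent variables
Xq_c = Xq - Xq.mean(axis=0)
```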