1
The rank of a product of two matrices X and Y is at most the smaller of the ranks of X and Y: rank(XY) ≤ min(rank(X), rank(Y)), with equality when both factors have full rank equal to their common inner dimension. For an absorbance matrix written as A = C S, the rank of A is therefore limited by the number of absorbing components.
6
Eigenvectors and Eigenvalues
For a symmetric, real matrix R, an eigenvector v is obtained from:
Rv = λv
where λ is an unknown scalar, the eigenvalue. Rearranging,
Rv − λv = 0,  (R − λI)v = 0
The vector v is orthogonal to all of the row vectors of the matrix (R − λI).
7
A = [0.1 0.2; 0.2 0.4; 0.3 0.6]
R = AᵀA = [0.14 0.28; 0.28 0.56]
Rv = λv,  (R − λI)v = 0
([0.14 0.28; 0.28 0.56] − λ[1 0; 0 1]) [v1; v2] = [0; 0]
[0.14−λ 0.28; 0.28 0.56−λ] [v1; v2] = [0; 0]
8
Setting the determinant to zero:
|0.14−λ 0.28; 0.28 0.56−λ| = 0
(0.14 − λ)(0.56 − λ) − (0.28)(0.28) = 0
λ² − 0.7λ = 0,  so λ1 = 0.7 and λ2 = 0
9
For λ1 = 0.7:
[0.14−0.7 0.28; 0.28 0.56−0.7] [v11; v21] = [−0.56 0.28; 0.28 −0.14] [v11; v21] = [0; 0]
−0.56 v11 + 0.28 v21 = 0
0.28 v11 − 0.14 v21 = 0  →  v21 = 2 v11
If v11 = 1, then v21 = 2. Normalized vector: v1 = [0.4472; 0.8944]
10
For λ2 = 0:
[0.14 0.28; 0.28 0.56] [v12; v22] = [0; 0]
0.14 v12 + 0.28 v22 = 0
0.28 v12 + 0.56 v22 = 0  →  v12 = −2 v22
If v22 = 1, then v12 = −2. Normalized vector: v2 = [−0.8944; 0.4472]
11
A = [0.1 0.2; 0.2 0.4; 0.3 0.6],  R = AᵀA = [0.14 0.28; 0.28 0.56]
Rv = λv, or in matrix form RV = VΛ, with
V = [v1 v2] = [0.4472 −0.8944; 0.8944 0.4472],  Λ = [0.7 0; 0 0]
More generally, if R (p × p) is symmetric of rank r ≤ p, then R possesses r positive eigenvalues and (p − r) zero eigenvalues.
tr(R) = Σλi = 0.7 + 0.0 = 0.7
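A minimal MATLAB sketch (not part of the original slides) that reproduces this worked example and checks the trace identity:

% Worked example: eigenvalues and eigenvectors of R = A'*A
A = [0.1 0.2; 0.2 0.4; 0.3 0.6];    % 3 samples x 2 variables
R = A' * A;                         % symmetric 2 x 2 matrix [0.14 0.28; 0.28 0.56]

[V, L] = eig(R);                    % columns of V: eigenvectors, diag(L): eigenvalues
disp(diag(L)')                      % expected: 0 and 0.7 (order may differ)
disp(V)                             % columns proportional (up to sign) to [-0.8944 0.4472] and [0.4472 0.8944]

% The trace equals the sum of the eigenvalues
fprintf('trace(R) = %.2f, sum of eigenvalues = %.2f\n', trace(R), sum(diag(L)))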
15
Example
16
Consider 15 samples, each containing 3 absorbing components.
21
? Show that, in the presence of random noise, the number of non-zero eigenvalues is larger than the number of components.
22
Variance–Covariance Matrix
X = column mean-centered matrix:
X = [x11−mx1 x12−mx2 … x1p−mxp; x21−mx1 x22−mx2 … x2p−mxp; … ; xn1−mx1 xn2−mx2 … xnp−mxp]
XᵀX (scaled by 1/(n−1)) gives the variance–covariance matrix:
[var(x1) covar(x1x2) … covar(x1xp);
 covar(x2x1) var(x2) … covar(x2xp);
 … ;
 covar(xpx1) covar(xpx2) … var(xp)]
23
mmcn.m file for mean centering a matrix
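The mmcn.m listing itself is not reproduced in this transcript; a minimal sketch, assuming the function simply subtracts the column means, could be:

function Xc = mmcn(X)
% MMCN  Column mean-center a data matrix.
%   Xc = mmcn(X) subtracts the column mean from each column of X,
%   so that every column of Xc has zero mean.
mx = mean(X, 1);                       % 1 x p vector of column means
Xc = X - repmat(mx, size(X,1), 1);     % subtract the means from every row
end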
27
? Use the anal.m and mmcn.m files to verify that each eigenvalue of an absorbance data matrix is correlated with the variance of the data.
28
Singular Value Decomposition
SVD of a rectangular matrix X is a method which yields, at the same time, a diagonal matrix of singular values S and two matrices of singular vectors U and V such that:
X = U S Vᵀ,  UᵀU = VᵀV = Ir
The singular vectors in U and V are identical to the eigenvectors of XXᵀ and XᵀX, respectively, and the singular values are equal to the positive square roots of the corresponding eigenvalues.
X = U S Vᵀ,  Xᵀ = V S Uᵀ
X Xᵀ = U S Vᵀ V S Uᵀ = U S² Uᵀ
(X Xᵀ) U = U S²
29
X (m × n) = U (m × n) S (n × n) Vᵀ (n × n)
X = U S Vᵀ = s1 u1 v1ᵀ + … + sr ur vrᵀ
If the rank of matrix X is r, then:
X (m × n) = U (m × r) S (r × r) Vᵀ (r × n)
30
Singular value decomposition with MATLAB
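A short MATLAB sketch of this step (the slide's own code is not reproduced here), using the small matrix from the earlier example:

% SVD of a data matrix and its truncated (rank-r) reconstruction
X = [0.1 0.2; 0.2 0.4; 0.3 0.6];         % example matrix used earlier in the slides

[U, S, V] = svd(X);                      % full SVD: X = U*S*V'
r = rank(X);                             % numerical rank (1 for this matrix)

Xr = U(:,1:r) * S(1:r,1:r) * V(:,1:r)';  % keep only the first r singular triplets
disp(norm(X - Xr))                       % ~0: the rank-r part reproduces X exactly here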
33
Consider 15 samples containing 2 components with strong spectral overlap and construct their absorbance data matrix with added random noise.
Noised data nd − ideal data A = residual R1
Reconstructed data rd − ideal data A = residual R2
It can be shown that the reconstructed data matrix is closer to the ideal data matrix.
34
Anal.m file for constructing the data matrix
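The anal.m listing is not included in this transcript; a sketch of how such a simulated absorbance matrix might be built is given below. The Gaussian band shapes, band positions, and wavelength grid are illustrative assumptions, not the original file.

% Simulated absorbance data: 15 samples, 2 overlapping components, random noise
wl = 1:50;                                 % wavelength index (assumed)
s1 = exp(-((wl-20).^2)/(2*5^2));           % pure spectrum of component 1 (assumed band)
s2 = exp(-((wl-28).^2)/(2*5^2));           % pure spectrum of component 2, overlapping s1
S  = [s1; s2];                             % 2 x 50 matrix of pure spectra

C  = rand(15, 2);                          % 15 x 2 random concentration matrix
A  = C * S;                                % ideal absorbance matrix (Beer-Lambert)
nd = A + 0.005*randn(size(A));             % noised data matrix, as on the later slide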
35
Spectral overlapping of two absorbing species
36
Ideal data matrix A
37
Noised data matrix, nd, with 0.005 normally distributed random noise
38
nf.m file for investigating the noise-filtering property of SVD-reconstructed data
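A sketch of the idea behind nf.m (the actual file is not shown here), reusing A and nd from the simulation sketch above:

% Noise filtering by truncated SVD reconstruction
[U, S, V] = svd(nd);
k  = 2;                                        % number of significant factors (2 components)
rd = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';        % reconstructed (noise-filtered) data

fprintf('||nd - A|| = %.4f\n', norm(nd - A, 'fro'))   % residual R1
fprintf('||rd - A|| = %.4f\n', norm(rd - A, 'fro'))   % residual R2, smaller than R1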
44
? Plot the % relative standard error as a function of the number of eigenvectors.
45
Principal Component Analysis (PCA)
[Figure: the data plotted in the original variable space, axes x1 and x2, points x11 … x1,14 and x21 … x2,14]
46
[Figure: the same data after PCA, plotted along the principal component axes u1 and u2, with scores u11 … u1,14]
47
[Figure: the data again in the original x1–x2 variable space]
48
[Figure: the corresponding scores u11 … u1,14 and u21 … u2,14 in the u1–u2 principal component space]
49
Principal Components in Two Dimensions
u1 = a x1 + b x2
u2 = c x1 + d x2
Data (three samples s1, s2, s3 and two variables):
[0.1 0.2; 0.2 0.4; 0.3 0.6]
In the principal components model, new variables are found which give a clear picture of the variability of the data. This is best achieved by giving the first new variable maximum variance; the second new variable is then selected so as to be uncorrelated with the first one, and so on.
50
Orthogonality constraint
The new variables are uncorrelated if: ac + bd = 0
For example, a = 1, b = 2, c = −1, d = 0.5:
x1 = [0.1; 0.2; 0.3],  x2 = [0.2; 0.4; 0.6]
u1 = [0.5; 1.0; 1.5],  var(u1) = 0.25
With a = 2, b = 4, c = −2, d = 1:
u1 = [1.0; 2.0; 3.0],  var(u1) = 1.0
51
Normalizing constraint
a² + b² = 1,  c² + d² = 1
a = 1, b = 2, c = −1, d = 0.5  →  normalizing  →  a = 0.4472, b = 0.8944, c = −0.8944, d = 0.4472
a = 2, b = 4, c = −2, d = 1  →  normalizing  →  a = 0.4472, b = 0.8944, c = −0.8944, d = 0.4472
52
Maximum variance constraint
u1 = a x1 + b x2
σ²(u1) = a² σ²(x1) + b² σ²(x2) + 2ab cov(x1,x2)
σ²(u1) = [a b] [σ²(x1) cov(x1,x2); cov(x1,x2) σ²(x2)] [a; b]
Maximizing σ²(u1) subject to a² + b² = 1 leads to the eigenvalue equation
[σ²(x1) cov(x1,x2); cov(x1,x2) σ²(x2)] [a; b] = σ²(u1) [a; b]
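A quick MATLAB check (not from the slides) that the maximizing direction is the first eigenvector of the covariance matrix of x1 and x2:

% Direction of maximum variance = eigenvector of the covariance matrix
x1 = [0.1; 0.2; 0.3];
x2 = [0.2; 0.4; 0.6];
C  = cov([x1 x2]);            % 2 x 2 sample covariance matrix [0.01 0.02; 0.02 0.04]

[V, L] = eig(C);
[lmax, i] = max(diag(L));     % largest eigenvalue and its index
disp(V(:,i)')                 % ~[0.4472 0.8944] (up to sign): the normalized (a, b) above
disp(lmax)                    % 0.05 = var(u1) for the normalized coefficients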
53
Principal Components in m Dimensions
X = [x11 x12 … x1m; x21 x22 … x2m; … ; xn1 xn2 … xnm]
u1 = v11 x1 + v12 x2 + … + v1m xm
u2 = v21 x1 + v22 x2 + … + v2m xm
…
um = vm1 x1 + vm2 x2 + … + vmm xm
54
var(u1) = [v11 v21 … vm1] [var(x1) covar(x1x2) … covar(x1xm); covar(x2x1) var(x2) … covar(x2xm); … ; covar(xmx1) covar(xmx2) … var(xm)] [v11; v21; … ; vm1]
In matrix form: C V = V Λ, and X (n × m) V (m × m) = U (n × m)
55
X V = U,  with X (n × m), V (m × m), U (n × m)
V contains the loading vectors; U contains the score vectors.
X V Vᵀ = U Vᵀ and VᵀV = I, so X = U Vᵀ, written X = S Lᵀ with score matrix S (n × m) and loading matrix L (m × m).
Comparing with the SVD, X = U S Vᵀ, the score matrix is S = U S.
56
More generally, when one analyzes a data matrix consisting of n objects for which m variables have been determined, m principal components can be extracted (as long as m < n). PC1 represents the direction in the data containing the largest variation. PC2 is orthogonal to PC1 and represents the direction of the largest residual variation around PC1. PC3 is orthogonal to the first two and represents the direction of the largest residual variation around the plane formed by PC1 and PC2.
57
PCA.m file
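The PCA.m code is not reproduced in this transcript; a minimal sketch of PCA via the SVD of the mean-centered data (the function name pca_svd is chosen here to avoid clashing with MATLAB's built-in pca) could be:

function [scores, loadings, ev] = pca_svd(X)
% PCA_SVD  Principal component analysis via SVD of the mean-centered data.
Xc = X - repmat(mean(X,1), size(X,1), 1);  % column mean-centering
[U, S, V] = svd(Xc, 'econ');
scores   = U * S;                          % score matrix (n x r)
loadings = V;                              % loading matrix (m x r)
ev       = diag(S).^2;                     % eigenvalues of Xc'*Xc
end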
60
10 mixtures of two components (anal.m file)
76
? Perform PCA on a data matrix obtained from an evolutionary process, such as kinetic data (kin.m file), and interpret the score vectors.
77
Classification with PCA The most informative view of a data set, in terms of variance at least, will be given by consideration of the first two PCs. Since the scores matrix contains a value for each sample corresponding to each PC, it is possible to plot these values against one another to produce a low dimensional picture of a high-dimensional data set.
78
Suppose there are 20 samples from two different classes, Class I and Class II.
79
[Figure: score plot showing the Class I and Class II samples]
82
Multiple Linear Regression (MLR)
y = b1 x1 + b2 x2 + … + bp xp
[y1; y2; … ; yn] = [x11 x12 … x1p; x21 x22 … x2p; … ; xn1 xn2 … xnp] [b1; b2; … ; bp]
y = X b,  with y (n × 1), X (n × p), b (p × 1)
83
If p > n:
a1 = b1 c11 + b2 c12 + b3 c13
a2 = b1 c21 + b2 c22 + b3 c23
There is an infinite number of solutions for b, which all fit the equations.
If p = n:
a1 = b1 c11 + b2 c12 + b3 c13
a2 = b1 c21 + b2 c22 + b3 c23
a3 = b1 c31 + b2 c32 + b3 c33
This gives a unique solution for b, provided that the matrix C has full rank.
84
If p < n:
a1 = b1 c11 + b2 c12 + b3 c13
a2 = b1 c21 + b2 c22 + b3 c23
a3 = b1 c31 + b2 c32 + b3 c33
a4 = b1 c41 + b2 c42 + b3 c43
This does not allow an exact solution for b, but one can get a solution by minimizing the length of the residual vector e.
The least-squares solution is b = (CᵀC)⁻¹Cᵀa
85
Least Squares in Matrix Equations
y = X b,  with y (n × 1), X (n × p), b (p × 1)
y = x1 b1 + x2 b2 + … + xp bp,  where each xj is an (n × 1) column of X
For solving this system, Xb − y must be perpendicular to the column space of X.
86
Suppose the vector Xc is a linear combination of the columns of X. Then:
(Xc)ᵀ[Xb − y] = 0
cᵀ[XᵀXb − Xᵀy] = 0
XᵀXb = Xᵀy
b = (XᵀX)⁻¹Xᵀy
The projection of y onto the column space of X is therefore p = Xb = X(XᵀX)⁻¹Xᵀ y
87
Least Squares Solution
91
Projection of the y vector onto the column space of X
92
The error vector
93
The error vector is perpendicular to all columns of the X matrix
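A short MATLAB illustration of the normal-equation solution and the orthogonality of the residual; the numbers below are made-up example data, not from the slides:

% Least-squares solution and residual orthogonality check
X = [1 0.2; 1 0.5; 1 0.9; 1 1.3];      % 4 x 2 design matrix (illustrative)
y = [0.31; 0.58; 0.95; 1.32];          % 4 x 1 response vector (illustrative)

b = (X'*X) \ (X'*y);                   % normal equations; X\y gives the same result
p = X*b;                               % projection of y onto the column space of X
e = y - p;                             % error (residual) vector

disp(X'*e)                             % ~[0; 0]: e is perpendicular to every column of X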
94
MLR with more than one dependent variable
y1 = X b1,  y2 = X b2,  y3 = X b3,  with each yk (n × 1) and bk (p × 1)
Collecting the columns: Y (n × m) = X (n × p) B (p × m)
Y = X B  →  B = (XᵀX)⁻¹XᵀY
95
Classical Least Squares (CLS)
A = C K,  with A (n × m), C (n × p), K (p × m)
Calibration step: K = (CᵀC)⁻¹CᵀA
The number of calibration standards should be at least as large as the number of analytes, and the rank of C must be equal to p.
Prediction step: a_unᵀ = c_unᵀ K  →  c_un = (K Kᵀ)⁻¹ K a_un
The number of wavelengths must be equal to or larger than the number of components.
96
Advantages of CLS
The full spectral domain is used for estimating each constituent. Using redundant information has an effect equivalent to replicate measurements and signal averaging, and hence improves the precision of the concentration estimates.
Disadvantages of CLS
The concentrations of all the constituents in the calibration set have to be known.
97
Simultaneous determination of two components with CLS
98
Random design of concentration matrix
99
Pure component spectra
100
Absorbance data matrix
101
Data matrices for mlr.m file
102
mlr.m file for multiple linear regression
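The mlr.m code itself is not reproduced in this transcript; a sketch of the CLS calibration and prediction steps described above, with assumed variable names, might read:

% Classical least squares (CLS): calibration and prediction sketch
% A   : n x m absorbance matrix of the calibration standards
% C   : n x p concentration matrix of the calibration standards
% Aun : absorbance spectra of the unknown samples (one spectrum per row)

K   = (C'*C) \ (C'*A);       % calibration: K = inv(C'*C)*C'*A, p x m
Cun = Aun * K' / (K*K');     % prediction: c_un' = a_un' * K' * inv(K*K') for each row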
105
Predicted concentrations vs. real concentrations
106
? Use the CLS method for the determination of one component in binary mixture samples.
107
Inverse Least Squares (ILS)
c = A b,  with c1 (n × 1), A (n × p), b (p × 1)
Calibration step: b = (AᵀA)⁻¹Aᵀc1
The number of calibration samples should be at least as large as the number of wavelengths, and the rank of A must be equal to p.
Prediction step: c_un = a_unᵀ b
108
Advantages of ILS
It is not necessary to know all the information on the possible constituents (analyte of interest and interferents). The method can work in principle when unknown chemical interferents are present; it is important that such interferents are also present in the calibration samples.
Disadvantages of ILS
The number of calibration samples should be at least as large as the number of wavelengths.
109
Determination of x in the presence of y by ILS method
112
15 x 9 absorbance data matrix
113
ILS.m file
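The ILS.m listing is not shown in this transcript; a minimal sketch of the calibration and prediction steps, with assumed variable names, is:

% Inverse least squares (ILS): calibration and prediction sketch
% A   : n x p absorbance matrix of the calibration samples (n >= p)
% c   : n x 1 concentrations of the analyte of interest
% aun : 1 x p spectrum of an unknown sample

b   = (A'*A) \ (A'*c);       % calibration: regression vector b, p x 1
cun = aun * b;               % prediction: estimated analyte concentration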
114
ILS calibration
115
Predicted concentrations vs. real concentrations
116
? In the ILS method, does the accuracy of the final results depend on the number of wavelengths?
117
Principal Component Regression (PCR)
PCR is simply PCA followed by a regression step.
A = C E (concentration model) and A = S L (PCA model), so
A = C E = S L = (S R)(R⁻¹ L)
The concentrations are therefore related to the scores by a rotation, C = S R, and a single concentration column can be regressed on the score matrix: S r = c1
118
A data matrix can be represented by its score matrix. A regression of the score matrix against one or several dependent variables is possible, provided that scores corresponding to small eigenvalues are omitted. This regression gives no matrix-inversion problem. PCR has the full-spectrum advantage of the CLS method, and it has the ILS advantage of being able to perform the analysis one chemical component at a time while avoiding the ILS wavelength-selection problem.
119
Validation
How many meaningful principal components should be retained?
* Percentage of explained variance
If all possible PCs are used in the model, 100% of the variance is explained:
s_d² = ( Σ_{i=1}^{d} λi / Σ_{i=1}^{p} λi ) × 100
120
Percentage of explained variance for determination of number of PCs
121
Spectra of 20 samples with various amounts of 2 components
122
Pev.m file for percentage of explained variance method
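The pev.m file is not reproduced here; a sketch of the percentage-of-explained-variance calculation, assuming the nd data matrix from the earlier simulation, is:

% Percentage of explained variance as a function of the number of PCs
s   = svd(nd);                        % singular values of the data matrix
ev  = s.^2;                           % eigenvalues of nd'*nd
pev = 100 * cumsum(ev) / sum(ev);     % cumulative percentage of explained variance
disp([(1:length(pev))'  pev])         % one row per candidate number of PCs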
123
Running the pev.m file on the nd absorbance data matrix
126
? Show that the validity of the results of the percentage of explained variance method depends on the spectral overlap of the individual components.
127
PCA
[Diagram: A (n × p) is decomposed into a score matrix and a loading matrix; the inner dimension is n, or r when only r factors are retained]
128
Cross-Validation
[Diagram: one sample (a row a, 1 × p) is left out of A (n × p), leaving A' ((n−1) × p); PCA is performed on A' and the left-out sample is then predicted]
129
Creating absorbance data for performing the cross-validation method
130
Spectra of 15 samples with various amounts of 3 components
131
cross.m file for PCR cross-validation
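The cross.m code is not reproduced in this transcript; a leave-one-out cross-validation sketch for choosing the number of PCR factors, with assumed variable names (A: spectra, c: concentrations), is:

% Leave-one-out cross-validation for the number of PCR factors
[n, p] = size(A);
maxpc  = 8;                              % assumed upper limit on factors to test
press  = zeros(maxpc, 1);                % predicted residual error sum of squares

for k = 1:maxpc
    for i = 1:n
        idx = [1:i-1, i+1:n];            % leave sample i out
        [U, S, V] = svd(A(idx,:), 'econ');
        T  = U(:,1:k) * S(1:k,1:k);      % scores of the reduced calibration set
        b  = T \ c(idx);                 % regress concentration on the scores
        ti = A(i,:) * V(:,1:k);          % score of the left-out sample
        press(k) = press(k) + (ti*b - c(i))^2;
    end
end
plot(1:maxpc, press, '-o'), xlabel('number of PCs'), ylabel('PRESS')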
133
cross-validation plot
134
Calibration and Prediction Steps in PCR
Calibration step: c = S b, with c1 (n × 1), S (n × r), b (r × 1)
b = (SᵀS)⁻¹Sᵀc
Prediction step: Sx (m × r) = Ax (m × p) L (p × r)
cx = Sx b
135
Pcr.m file for calibration and prediction by PCR method
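The pcr.m listing itself is not included here; a sketch of the PCR calibration and prediction steps described above, with assumed variable names, is:

% Principal component regression (PCR): calibration and prediction sketch
% A  : n x p calibration spectra,  c : n x 1 analyte concentrations
% Ax : spectra of the unknown samples
r = 3;                                    % assumed number of significant PCs
[U, S, V] = svd(A, 'econ');
Sc = U(:,1:r) * S(1:r,1:r);               % score matrix of the calibration set (n x r)
L  = V(:,1:r);                            % loading matrix (p x r)

b  = (Sc'*Sc) \ (Sc'*c);                  % calibration: regression on the scores
Sx = Ax * L;                              % scores of the unknown samples
cx = Sx * b;                              % predicted concentrations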
136
Spectra of 20 samples with various amounts of 3 components
137
Input data for the pcr.m file and the pcr.m function
138
Predicted and real values for the first component
141
? Compare the CLS, ILS, and PCR methods for prediction in a two-component system with strong spectral overlap.