In the Name of God. Morteza Bahram Department of Chemistry, Faculty of Science, Urmia University, Urmia, Iran


1 In the Name of God

2 Morteza Bahram, Department of Chemistry, Faculty of Science, Urmia University, Urmia, Iran. m.bahram@urmia.ac.ir, morteza.bahram@gmail.com. Modeling Multi-Way Data with Linearly Dependent Loadings: PARALIND

3 1. Introduction. Many methods have been proposed for multivariate curve resolution and, more generally, for factor or component modeling of (multi-way) data: 1) Tucker 2) PARAFAC 3) Positive matrix factorization (PMF) 4) MCR-ALS 5) ….

4 With three-way data, it becomes possible for patterns generated by the underlying sources of variation to have independent effects in two modes, yet nonetheless be linearly dependent in a third mode. When such linear dependencies exist in the latent factor structure, the most appropriate PARAFAC solution would show the same dependencies in the recovered factors. This solution could be called rank deficient in the sense that the component matrices for one (or even several) modes would have less than full column rank. However, the PARAFAC solution obtained in practice will never have this property, because noise causes the estimated loadings for collinear factors to become linearly independent (though usually they are still highly correlated).

5 Kiers and Smilde rigorously proved that the uniqueness of PARAFAC does not hold in cases with collinear factors. For example, linear dependences could arise when two or more fluorophores are present at fixed ratios throughout a series of experiments. Linear dependences could also occur in the spectral modes because of certain types of fluorescence energy transfer from one type of fluorophore to another.

6 As stated by Bro, if a three-mode array is modeled by uninformed PARAFAC and two factors have collinear profiles in only one mode, the two factors cannot be uniquely determined in the other two modes; if two factors have collinear profiles in two modes, the two factors become indistinguishable and collapse into a single factor.

7 Kruskal gives even less restrictive conditions for uniqueness. He uses the k-rank of the loading matrices, a term introduced by Harshman & Lundy (1984). If every combination of $k_A$ columns of A has full column rank, and this does not hold for $k_A + 1$ columns, then the k-rank of A is $k_A$. The k-rank is thus related, but not equal, to the rank of the matrix, and the k-rank can never exceed the rank. Kruskal proves that if $k_A + k_B + k_C \geq 2F + 2$, where F is the number of components, then the PARAFAC solution is unique. The Kruskal condition is sufficient, but not necessary, for uniqueness.
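The k-rank definition and the Kruskal condition above can be checked numerically. The following is only a minimal sketch, assuming small loading matrices so that a brute-force search over column subsets is affordable; the helper names `k_rank` and `kruskal_sufficient` and the toy matrices are hypothetical and not taken from the presentation.

```python
from itertools import combinations

import numpy as np


def k_rank(M, tol=1e-10):
    """Largest k such that EVERY set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        # Check all column subsets of the current size.
        if all(np.linalg.matrix_rank(M[:, list(idx)], tol=tol) == size
               for idx in combinations(range(n_cols), size)):
            k = size
        else:
            break
    return k


def kruskal_sufficient(A, B, C):
    """True if k_A + k_B + k_C >= 2F + 2 (sufficient, not necessary)."""
    F = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * F + 2


# Toy example (assumed values): columns 2 and 3 of A_dep are identical.
A_dep = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
B = np.eye(4)[:, :3]  # three independent columns
C = np.eye(4)[:, :3]

print(np.linalg.matrix_rank(A_dep), k_rank(A_dep))  # rank vs. k-rank
print(kruskal_sufficient(A_dep, B, C))              # condition with dependent A
print(kruskal_sufficient(B, B, C))                  # condition with independent A
```

In this toy case the dependent matrix has rank 2 but k-rank 1, so the sum of k-ranks is 7, which falls short of 2F + 2 = 8, and uniqueness is not guaranteed by the condition.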

8 A

9 Which data are suitable for analysis by PARALIND? 1) Fluorescence excitation-emission matrices (EEMs) with correlated component concentrations 2) pH-spectrophotometric data at different concentrations 3) Flow injection analysis data 4) GC-MS data with linear dependency 5) Standard addition three-way data 6) etc.

10 [Figure: sample mode loadings and pH profiles for the species HA, A⁻, HB, B⁻, HC, C⁻]

11 = C

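12 Sample mode: the loadings for the six species (HA, A⁻, HB, B⁻, HC, C⁻) equal the loadings A for the three acids (HA, HB, HC) times the dependency matrix H:

$$\tilde{A} = A H, \qquad H = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$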

13

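14 Theoretical background. Assume that a three-way data array X (I × J × K) is given for which an S-component PARAFAC model holds. Hence,

$$X_k = \tilde{A} D_k B^{T}, \qquad k = 1, \ldots, K \qquad (1)$$

where $X_k$ is the kth (I × J) slab of X, Ã (I × S) and B (J × S) are the first- and second-mode loading matrices, and $D_k$ is the diagonal matrix holding the kth row of the third-mode loading matrix C (K × S). [Figure: example loading matrix with rank = 2 and k-rank = 1]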

15 PARALIND: when? The presence of a negative core consistency together with a perfect PARAFAC fit would imply the presence of very special linear dependences in the EEMs. This can serve as an "alarm" prompting investigators to interpret the data more carefully when dealing with complicated environmental EEMs in the absence of a priori knowledge.

16 1) Solving matrix effects in three-way data using parallel profiles with linear dependencies

17 Introduction When a multivariate calibration model is used it is usually required that there are no new constituent(s) in the samples being analyzed. If there are new constituents, a recalibration including this new constituent will be necessary in order to be able to predict accurately, but this will be possible only if the interference(s) can be identified. Several methods for doing so have been developed; most notably generalized rank annihilation methods and parallel factor analysis (PARAFAC). In case of multi-way data, it is possible to handle unknown interferences as part of the calibration.

18 Chemical analysis can be further complicated by matrix effects. When the sensitivity of the response depends on the matrix composition, quantitative predictions based on pure standards may be affected by differences in the sensitivity of the analyte response in the presence and in the absence of the chemical matrix of the sample.

19 The standard addition method can be used to compensate for such matrix effects. Standard addition can compensate for non-spectral interferences which enhance or depress the analytical signal of the analyte.
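For readers less familiar with the standard addition method, the classical univariate version can be sketched as follows: known amounts of analyte are added to the sample, the signal is regressed on the added amount, and the original concentration is read off as intercept divided by slope. This is only an illustrative sketch with noise-free, made-up numbers; the function name `standard_addition` and all values are assumptions, not data from the presentation.

```python
import numpy as np


def standard_addition(added, signal):
    """Fit signal = slope * added + intercept and extrapolate to zero signal.

    Returns the estimated concentration in the original sample
    (intercept / slope) together with the fitted slope and intercept.
    """
    slope, intercept = np.polyfit(added, signal, 1)
    return intercept / slope, slope, intercept


# Hypothetical example: the sample holds 5.0 units of analyte and the
# matrix-dependent sensitivity is 1.5 signal units per concentration unit.
added = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
true_c0, sensitivity = 5.0, 1.5
signal = sensitivity * (true_c0 + added)

c0, slope, intercept = standard_addition(added, signal)
print(c0, slope, intercept)
```

Because the calibration line is built inside the sample itself, the unknown sensitivity cancels out in the ratio intercept / slope, which is why the approach compensates for matrix effects.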

20 As stated above, certain second-order calibration methods are able to resolve and recover the pure analyte response even in the presence of new interferences. In these cases pure analyte standards are commonly used for quantifying unknown samples even though matrix effects may degrade the quality of the resulting predictions.

21 The main problem with using a curve-resolution method such as PARAFAC is that the model will not reflect what is known about the data. For example, it is a fact that the concentrations of the unknown interferences will be constant in all the samples, which vary only in the amount of added analyte. Recently several methods were presented that combine the second-order advantage with standard addition: 1) MCR-ALS 2) PARAFAC etc.

22 Due to the properties of the PARAFAC algorithm, however, each estimated component will typically have different estimated scores even though they should theoretically be identical. Another related problem is that the spectral loadings will be mathematically unique due to noise in the data even though they are in fact unidentified.

23 Fitting a PARAFAC model under such circumstances will not provide a unique solution for factors two and three, because they are dependent in the first mode. As the first mode loading matrix has a k-rank of one, the uniqueness of the model is not guaranteed by the Kruskal conditions. Another problem is that the linear dependency intrinsic to the physical model is not actively enforced if PARAFAC is used. Noise may therefore lead to actual PARAFAC models, which are not rank-deficient as they should be. The factor matrices that should physically be rank-deficient will obtain full rank by fitting the noise part of the data.
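The effect of noise on a loading matrix that should be rank-deficient can be illustrated numerically. The sketch below does not fit a PARAFAC model; it simply perturbs a hypothetical first-mode loading matrix with two identical columns and compares ranks and singular values. The noise level and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical first-mode loadings: factors two and three are identical.
a1 = rng.random(6)
a2 = rng.random(6)
A_true = np.column_stack([a1, a2, a2])

# Small perturbation standing in for the noise absorbed during fitting.
A_noisy = A_true + 1e-3 * rng.standard_normal(A_true.shape)

rank_true = np.linalg.matrix_rank(A_true)
rank_noisy = np.linalg.matrix_rank(A_noisy)
sv_true = np.linalg.svd(A_true, compute_uv=False)
sv_noisy = np.linalg.svd(A_noisy, compute_uv=False)

# The perturbed columns are no longer identical, only strongly correlated.
corr = np.corrcoef(A_noisy[:, 1], A_noisy[:, 2])[0, 1]

print(rank_true, rank_noisy)
print(sv_true, sv_noisy)
print(corr)
```

The smallest singular value of the perturbed matrix is tiny but nonzero, so the matrix formally has full column rank even though the underlying structure has rank two.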

24 By introducing a new matrix, H, which is called a dependency matrix (from a PARALIND perspective) or an interaction matrix (from a Tucker perspective), the intrinsic rank-deficiency can be explicitly incorporated into the model in a concise and parsimonious way (……… PARATUCK2, restricted Tucker3, …….). If the rank of Ã is R (≤ S), then the rank-deficient Ã may be written Ã = AH

25 where A is an I × R matrix and H is an R × S matrix. If there are, e.g., four different components in the above example, then S = 4. Assuming that the first component corresponds to the analyte, the last three columns of Ã must be identical. This can be achieved by defining A = [a1 a2] and

$$H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}$$

It directly follows that R = rank(Ã) = 2 and S = number of components = 4.
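The four-component example can be written out numerically. The sketch below assumes the dependency matrix takes the form H = [1 0 0 0; 0 1 1 1], which is one choice that makes the last three columns of Ã identical; the concentration values in a1 and a2 are made up for illustration.

```python
import numpy as np

# Hypothetical sample-mode profiles for a standard addition experiment:
# a1 = analyte (increases with each addition), a2 = interferents (constant).
a1 = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
a2 = np.ones(6)
A = np.column_stack([a1, a2])          # I x R, with R = 2

# Assumed dependency matrix (R x S, with S = 4): component 1 uses a1,
# components 2 to 4 all share a2.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])

A_tilde = A @ H                        # I x S rank-deficient loading matrix

print(A_tilde)
print(np.linalg.matrix_rank(A_tilde))  # R, not S
```

Only the two columns of A are free parameters here, so the model has fewer parameters than a four-component PARAFAC model and the known dependency is enforced exactly rather than left to the fit.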

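26 PARAFAC: $X^{(I \times JK)} = A (C \odot B)^{T}$. PARALIND: $X^{(I \times JK)} = \tilde{A} (C \odot B)^{T}$, or simply $X^{(I \times JK)} = A H (C \odot B)^{T}$, where $\odot$ denotes the Khatri-Rao (column-wise Kronecker) product.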
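The matricized PARALIND model can be assembled directly with NumPy. This is a minimal sketch with random, hypothetical loadings and an assumed 2 × 4 dependency matrix; it builds X (I × JK) from A, H, B and C and checks that each I × J slab agrees with the usual slab-wise way of writing a PARAFAC-type model, Ã diag(c_k) Bᵀ. The helper `khatri_rao` is written out here rather than taken from a library.

```python
import numpy as np


def khatri_rao(C, B):
    """Column-wise Kronecker product: column s equals kron(C[:, s], B[:, s])."""
    K, S = C.shape
    J, S_b = B.shape
    if S != S_b:
        raise ValueError("C and B need the same number of columns")
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, S)


rng = np.random.default_rng(1)
I, J, K, R, S = 6, 5, 4, 2, 4          # hypothetical dimensions

A = rng.random((I, R))                 # independent first-mode profiles
H = np.array([[1.0, 0.0, 0.0, 0.0],    # assumed dependency matrix (R x S)
              [0.0, 1.0, 1.0, 1.0]])
B = rng.random((J, S))
C = rng.random((K, S))

# PARALIND in matricized form: X (I x JK) = A H (C kr B)^T
X = A @ H @ khatri_rao(C, B).T

# Slab-wise check: columns k*J ... (k+1)*J - 1 hold the kth I x J slab.
A_tilde = A @ H
slabs_match = all(
    np.allclose(X[:, k * J:(k + 1) * J], A_tilde @ np.diag(C[k]) @ B.T)
    for k in range(K)
)
print(X.shape, slabs_match)
```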

27 “In some exploratory applications, the dependency matrix H need not even be predefined. This matrix, which defines the pattern and strength of the interactions, may also be estimated from the data if no prior knowledge is available. The approach would then be more similar to PARATUCK2.”

28 3. Data and models. 3.1. Simulated data. Several different EEM fluorescence samples were simulated. Each sample contained 3 chemical species, of which one was considered the analyte of interest. For every sample, five successive additions of the analyte were made, and a 6 (addition mode) × 91 (emission) × 21 (excitation) array was obtained for each sample.

29 [Figure: three-way array X of size 6 × 91 × 21]

30 For the three-component simulated data, the results of PARAFAC and PARALIND were comparable.

31 Salicylate determination in plasma using the standard addition method. For each sample a 5×13×442 three-way array was obtained. For each three-way array, three to four components were indicated by singular value decomposition of each excitation × emission slab. For a three-component model, for example, the PARALIND interaction matrix was defined as

32 [Figure panels: PARAFAC, PARALIND]

33 [Figure panels: PARAFAC, PARALIND]

34 Results obtained with PARALIND modeling for the analysis of salicylate in different plasma samples

Amount added (a)   Standard addition equation (b)   R² (c)   Predicted concentration (a)   Recovery (%)
6.0                y = 1.5498C + 9.1012             0.9957   5.87                          97.8
6.0                y = 0.0942C + 0.6086             0.9967   6.45                          107.5
7.5                y = 0.7173C + 5.4722             0.9553   7.62                          101.6
7.5                y = 2.2663C + 18.026             0.9719   7.9                           105.3
9.0                y = 2.1851C + 19.457             0.9978   8.9                           98.9
12.0               y = 1.069C + 12.825              0.9994   12.02                         100.2
12.0               y = 6.7145C + 83.418             0.9998   12.42                         103.5
15.0               y = 0.3589C + 5.6238             0.9993   15.65                         104.3
15.0               y = 3.2058C + 48.323             1.000    15.07                         100.5
15.0               y = 2.4244C + 33.563             0.9837   13.84                         92.3
1.5                y = 1.2448C + 2.3431             0.9999   1.88                          125.3
24.0               y = 0.0693C + 1.7396             0.9896   25.10                         104.6
24.0               y = 0.3018C + 7.4572             0.9659   24.68                         102.8
24.0               y = 1.4908C + 36.741             0.9972   24.64                         102.7
27.0               y = 1.2158C + 33.471             0.9958   27.5                          101.9
27.0               y = 0.0152C + 0.4198             0.9984   27.6                          102.2
3.0                y = 8.5343C + 27.508             0.9429   3.22                          107.3
3.0                y = 0.8407C + 2.6079             0.9798   3.10                          103.3
3.0                y = 1.0516C + 3.4382             0.9713   3.27                          109.0
4.5                y = 0.6065C + 3.0058             0.9992   4.9                           108.9
4.5                y = 1.534C + 7.3036              0.9882   4.7                           104.4

Mean recovery (%): 104.0
RSE (%): 3.5
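The tabulated predictions appear consistent with the usual standard addition extrapolation, predicted concentration = intercept / slope, with recovery = 100 × predicted / added. The short check below applies this to the first three equations in the table; that the authors computed the values exactly this way is an assumption, and small differences from the reported numbers come from rounding.

```python
# (amount added, slope, intercept, reported prediction, reported recovery %)
# taken from the first three rows of the table.
rows = [
    (6.0, 1.5498, 9.1012, 5.87, 97.8),
    (6.0, 0.0942, 0.6086, 6.45, 107.5),
    (7.5, 0.7173, 5.4722, 7.62, 101.6),
]

results = []
for added, slope, intercept, reported_pred, reported_rec in rows:
    predicted = intercept / slope          # extrapolation to zero signal
    recovery = 100.0 * predicted / added   # percent of the known amount
    results.append((predicted, recovery))
    print(f"added={added:5.1f}  predicted={predicted:6.3f} "
          f"(reported {reported_pred})  recovery={recovery:6.1f}% "
          f"(reported {reported_rec})")
```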

35 The results shown for three components indicate that similar results are obtained for PARAFAC and PARALIND with respect to predictions. In order to test a four-component model, a single experiment was modeled with both PARAFAC and PARALIND. In each case, the model was refitted leaving out one sample at a time in order to monitor how stable the model would be towards changes in the data.

36

37 [Figure: sample loadings for PARALIND and PARAFAC]

38 As shown in the figure, the PARALIND model is very stable and provides spectral estimates that are consistent across samples as well as consistent with the overall model. Even when possibly minor components are included, the model results remain stable. PARAFAC, on the other hand, fails completely to model the analyte spectrum because the analyte spectrum becomes mixed up with one of the interference spectra. Hence, PARAFAC is not able to predict the analyte concentration, and this points to the main advantage of using PARALIND for second-order standard addition.

39 2) Comparison of PARAFAC and PARALIND in modeling a three-way fluorescence data array with special linear dependences in three modes: a case study in 2-naphthol. Hao Chen, Binghui Zheng, Yonghui Song, J. Chemometrics (2010). The EEMs of 2-naphthol with linear dependencies in three modes are very different from any EEMs reported in the literature.

40

41 It was concluded in this paper that whether a proper fit is obtained depends on how constraints are imposed on the profile matrices (B and C) in PARALIND. When dealing with complicated environmental samples without a priori knowledge of the spectral characteristics of the underlying factors, investigators would employ PARAFAC rather than PARALIND. The presence of strongly overlapping spectra together with a fairly good fit (e.g. small residuals) despite a negative core consistency (CC) may function as an "alarm" that linear dependences are present in some modes due to complex physical or chemical processes, and great care must be taken in interpreting the data.

42 However, the concentration profiles became unique and chemically meaningful. Compared with uninformed PARAFAC, PARALIND therefore improves the recovery of the concentrations of collinear factors in this example. There has been increasing concern about linear dependencies in three-mode data, for instance sample-pH-absorbance data and sample-kinetic-spectra data. PARALIND is a constrained form of PARAFAC, and it can be implemented by imposing proper constraints in PARAFAC codes.

43 3

44 MODEL 1. The following were compared: 1) PARAFAC 2) PARALIND 3) MCR-ALS 4) PLS-RB

45 This report discusses a modified second-order standard addition method, in which the test data matrix is subtracted from the standard addition matrices, and quantitation proceeds via the classical external calibration procedure. It is shown that this novel data processing method allows one to apply not only parallel factor analysis (PARAFAC) and multivariate curve resolution alternating least-squares (MCR-ALS),

46 MODEL 2

47 V.A. Lozano et al. / Analytica Chimica Acta 651 (2009) 165–172

48 For the MCR-ALS results, inspection of the figure reveals a bias in the complete results using model 1, with a significant improvement when model 2 is employed (in fact, the small remaining bias is comparable to the uncertainty in the nominal concentrations, i.e., 0.01 units). The origin of the bias in the former case is unclear, but may be related to the strong correlations when mode 1 is used.

49 Experimental system 1: The determination of salicylate in serum requires standard addition, due to changes in the analyte spectrum caused by interactions with the serum background. Experimental system 2: The determination of fluoroquinolone antibiotics in serum, such as danofloxacin, requires standard addition, due to changes in the analyte spectrum caused by interactions with the serum background. Both experimental data sets were estimated to have 3 components. For experimental system 1, the RMSE values of PARAFAC, PARALIND and MCR-ALS are comparable.

50 For experimental system 2: specific prediction results for the set of spiked test samples

Algorithm   PARAFAC Model 2 / PARALIND   MCR-ALS Model 2
RMSE        10                           30

In this case, where lower sensitivity towards the analyte is attained and heavy spectral overlapping occurs in both data dimensions, the RMSE is rather high in comparison with the mean analyte concentration across the set of samples. As with the previous experimental system, the prediction results obtained from PARALIND were identical to those corresponding to PARAFAC model 2. When applying MCR-ALS, the predictions were clearly worse, indicating that the combination of low analyte signal and spectral overlapping has a stronger effect on this algorithm than on the PARAFAC decomposition.

51

52 R. Bro, R.A. Harshman, N.D. Sidiropoulos, M.E. Lundy, J. Chemom. 23 (2009) 324–340.

53 Want to know more and download some m-files? Look at Rasmus Bro’s website.

54 Thanks for your attention. Any questions?

