In the Name of God. Morteza Bahram Department of Chemistry, Faculty of Science, Urmia University, Urmia, Iran

Presentation transcript:


Modeling Multi-Way Data with Linearly Dependent Loadings: PARALIND

1. Introduction

Many methods have been proposed for multivariate curve resolution and, more generally, for factor or component modeling of (multi-way) data:
1) Tucker
2) PARAFAC
3) Positive matrix factorization (PMF)
4) MCR-ALS
5) …

With three-way data, it becomes possible for patterns generated by the underlying sources of variation to have independent effects in two modes, yet nonetheless be linearly dependent in a third mode. When such linear dependencies exist in the latent factor structure, the most appropriate PARAFAC solution would show the same dependencies in the recovered factors. This solution could be called rank deficient in the sense that the component matrices for one, or even several, modes would have less than full column rank. However, the PARAFAC solution obtained in practice will never have this property, because noise causes the estimated loadings for collinear factors to become linearly independent (though they usually remain highly correlated).

Kiers and Smilde rigorously proved that the uniqueness of PARAFAC does not hold in cases with collinear factors. For example, linear dependences can arise when two or more fluorophores are present at fixed ratios throughout a series of experiments. Linear dependences can also occur in the spectral modes because of certain types of fluorescence energy transfer from one type of fluorophore to another.

As stated by Bro, if a three-mode array is modeled by uninformed PARAFAC and two factors have collinear profiles in only one mode, the two factors cannot be uniquely determined in the other two modes; if two factors have collinear profiles in two modes, the two factors become indistinguishable and collapse to a single factor.

Kruskal gives even less restrictive conditions for uniqueness. He uses the k-rank of the loading matrices, a term introduced by Harshman & Lundy (1984). If every set of $k_A$ columns of A is linearly independent, and this does not hold for $k_A + 1$ columns, then the k-rank of A is $k_A$. The k-rank is thus related, but not equal, to the rank of the matrix, and it can never exceed the rank. Kruskal proves that the PARAFAC solution is unique if

$$k_A + k_B + k_C \geq 2F + 2,$$

where F is the number of components. The Kruskal condition is sufficient, but not necessary, for uniqueness.
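The k-rank and the Kruskal condition can be checked numerically for small loading matrices. The following is a minimal sketch, not taken from the slides: the function names `k_rank` and `kruskal_sufficient`, the tolerance, and the example matrices are all illustrative assumptions. It uses a brute-force search over column subsets, so it is only practical for a handful of components.

```python
from itertools import combinations

import numpy as np


def k_rank(M, tol=1e-10):
    """Largest k such that every set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        subsets = combinations(range(n_cols), size)
        if all(np.linalg.matrix_rank(M[:, list(idx)], tol=tol) == size
               for idx in subsets):
            k = size
        else:
            break
    return k


def kruskal_sufficient(A, B, C):
    """True if k_A + k_B + k_C >= 2F + 2 (sufficient, not necessary)."""
    F = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * F + 2


rng = np.random.default_rng(0)
B = rng.random((5, 3))
C = rng.random((6, 3))

# Hypothetical first-mode loadings: columns 2 and 3 are identical,
# so the rank is 2 but the k-rank is only 1.
A_dependent = np.array([[1.0, 1.0, 1.0],
                        [2.0, 1.0, 1.0],
                        [3.0, 1.0, 1.0],
                        [4.0, 1.0, 1.0]])
A_independent = rng.random((4, 3))

print(k_rank(A_dependent), np.linalg.matrix_rank(A_dependent))
print(kruskal_sufficient(A_dependent, B, C))
print(kruskal_sufficient(A_independent, B, C))
```

With identical columns in one mode the sum of k-ranks drops below 2F + 2, so uniqueness is no longer guaranteed by this condition.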


Which data are suitable for analysis by PARALIND?
1) Fluorescence excitation-emission matrices (EEMs) with correlated component concentrations
2) pH-spectrophotometric data at different concentrations
3) Flow injection analysis data
4) GC-MS data with linear dependency
5) Standard addition three-way data
6) etc.

[Figure: sample-mode loadings (species HA, A⁻, HB, B⁻, HC, C⁻) and pH profiles C; the sample-mode matrix is expressed through columns for HA, HB and HC and the matrix H.]

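Theoretical background

Assume that a three-way data array X (I × J × K) is given for which an S-component PARAFAC model holds. Hence,

$$\mathbf{X}_k = \tilde{\mathbf{A}} \mathbf{D}_k \mathbf{B}^{T} + \mathbf{E}_k, \quad k = 1, \dots, K \qquad (1)$$

where $\tilde{\mathbf{A}}$ is the (I × S) first-mode loading matrix, $\mathbf{B}$ is the (J × S) second-mode loading matrix, $\mathbf{D}_k$ is an (S × S) diagonal matrix holding the kth row of the third-mode loading matrix $\mathbf{C}$, and $\mathbf{E}_k$ holds the residuals.

[Figure: standard-addition example with analyte levels x+1 to x+5 and components C2, C3, C4; the (I × S) loading matrix has rank 2 and k-rank 1.]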

PARALIND: when?

The presence of a negative core consistency together with an apparently perfect PARAFAC fit implies the presence of very special linear dependences in the EEMs. It can serve as an "alarm" that prompts investigators to interpret the data more carefully when dealing with complicated environmental EEMs in the absence of a priori knowledge.
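To make the core consistency diagnostic concrete, here is a minimal sketch of how it is commonly computed: the least-squares Tucker3 core is estimated from the data and the PARAFAC loadings, and then compared with an ideal superdiagonal core. The function name `core_consistency` is an assumption, and the example uses the true loadings of noise-free simulated data rather than loadings from a fitted model, so it only illustrates the ideal case (100%).

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core consistency (%) of loadings A, B, C for the I x J x K array X.

    Assumes A, B and C have full column rank, so that the pseudo-inverses
    give the least-squares Tucker3 core.
    """
    F = A.shape[1]
    # Least-squares core: multiply X by the pseudo-inverse in each mode.
    G = np.einsum('pi,qj,rk,ijk->pqr',
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C), X)
    # Ideal PARAFAC core: ones on the superdiagonal, zeros elsewhere.
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)


rng = np.random.default_rng(0)
A = rng.random((6, 2))
B = rng.random((7, 2))
C = rng.random((5, 2))
X = np.einsum('if,jf,kf->ijk', A, B, C)  # exact two-component PARAFAC data

cc = core_consistency(X, A, B, C)
print(round(cc, 6))
```

Because the sum of squared deviations from the ideal core is not bounded by F, the value can become negative when the fitted loadings imply large off-superdiagonal core elements.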

1) Solving matrix effects in three-way data using parallel profiles with linear dependencies

Introduction

When a multivariate calibration model is used, it is usually required that there are no new constituents in the samples being analyzed. If there are new constituents, a recalibration including them is necessary in order to predict accurately, but this is possible only if the interferences can be identified. Several methods for doing so have been developed, most notably generalized rank annihilation methods and parallel factor analysis (PARAFAC). In the case of multi-way data, it is possible to handle unknown interferences as part of the calibration.

Chemical analysis can be further complicated by matrix effects. When the sensitivity of the response depends on the matrix composition, quantitative predictions based on pure standards may be affected by differences in the sensitivity of the analyte response in the presence and in the absence of the chemical matrix of the sample.

The standard addition method can be used to compensate for such matrix effects. Standard addition can compensate for non-spectral interferences that enhance or depress the analytical signal of the analyte.
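As a reminder of how classical (univariate) standard addition works, the sketch below fits a straight line to the signal as a function of the added amount and extrapolates to zero signal. All numbers (`true_c0`, `sensitivity`, the added amounts) are made up for illustration, and the response is assumed to be linear and noise-free.

```python
import numpy as np

# Hypothetical experiment: unknown analyte concentration c0 in the sample,
# with a matrix-dependent sensitivity that is the same for all additions.
true_c0 = 2.5
sensitivity = 0.8
added = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # amounts of analyte added

signal = sensitivity * (true_c0 + added)       # linear response, no noise

# Fit signal = slope * added + intercept; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(added, signal, 1)

# Extrapolating to zero signal gives -c0 on the x-axis, so c0 = intercept / slope.
estimated_c0 = intercept / slope
print(estimated_c0)
```

Because the calibration line is built in the sample's own matrix, the matrix-dependent sensitivity cancels in the ratio of intercept to slope.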

As stated above, certain second-order calibration methods are able to resolve and recover the pure analyte response even in the presence of new interferences. In these cases pure analyte standards are commonly used for quantifying unknown samples even though matrix effects may degrade the quality of the resulting predictions.

The main problem with using a curve-resolution method such as PARAFAC is that the model will not reflect what is known about the data. For example, it is known that the concentrations of the unknown interferences are constant in all samples that differ only in the amount of added analyte. Recently, several methods were presented that combine the second-order advantage with standard addition: 1) MCR-ALS 2) PARAFAC, etc.

Due to the properties of the PARAFAC algorithm, however, each estimated component will typically have different estimated scores even though they should theoretically be identical. Another related problem is that the spectral loadings will be mathematically unique due to noise in the data even though they are in fact unidentified.

Fitting a PARAFAC model under such circumstances will not provide a unique solution for factors two and three, because they are dependent in the first mode. As the first-mode loading matrix has a k-rank of one, the uniqueness of the model is not guaranteed by the Kruskal condition. Another problem is that the linear dependency intrinsic to the physical model is not actively enforced if PARAFAC is used. Noise may therefore lead to fitted PARAFAC models that are not rank-deficient as they should be: the factor matrices that should physically be rank-deficient obtain full rank by fitting the noise part of the data.

By introducing a new matrix, H, which is called a dependency matrix (from a PARALIND perspective) or an interaction matrix (from a Tucker perspective), the intrinsic rank-deficiency can be explicitly incorporated into the model in a concise and parsimonious way (compare PARATUCK2, restricted Tucker3, …). If the rank of $\tilde{\mathbf{A}}$ is R (≤ S), then the rank-deficient $\tilde{\mathbf{A}}$ may be written

$$\tilde{\mathbf{A}} = \mathbf{A}\mathbf{H}$$

where A is an I × R matrix and H is an R × S matrix. If there are, e.g., four different components in the above example, then S = 4. Assuming that the first component corresponds to the analyte, the last three columns of $\tilde{\mathbf{A}}$ must be identical. This can be achieved by defining $\mathbf{A} = [\mathbf{a}_1 \; \mathbf{a}_2]$ and

$$\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$

It directly follows that R = rank($\tilde{\mathbf{A}}$) = 2 and S = number of components = 4.
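The effect of the dependency matrix can be seen in a few lines. This is a small sketch with made-up numbers: the analyte scores `a1` and the constant interferent scores `a2` are assumptions, and H is taken to have a first row that selects the analyte and a second row shared by the three interferents, as implied by the requirement that the last three columns be identical.

```python
import numpy as np

# Hypothetical standard-addition scores: the analyte increases with each
# addition, the interferents stay constant.
a1 = 2.0 + np.arange(6.0)      # analyte scores for 6 additions
a2 = np.ones(6)                # shared score vector of the interferents
A = np.column_stack([a1, a2])  # I x R, with R = 2

# Assumed dependency matrix (R x S, with S = 4 components).
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])

A_tilde = A @ H                # I x S first-mode loadings

print(A_tilde.shape)
print(np.linalg.matrix_rank(A_tilde))
```

The product has four columns but only rank two, so the rank deficiency is built into the model instead of being left to the fit.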

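PARAFAC:

$$\mathbf{X}^{(I \times JK)} = \mathbf{A} (\mathbf{C} \odot \mathbf{B})^{T}$$

PARALIND:

$$\mathbf{X}^{(I \times JK)} = \tilde{\mathbf{A}} (\mathbf{C} \odot \mathbf{B})^{T}$$

or simply

$$\mathbf{X}^{(I \times JK)} = \mathbf{A}\mathbf{H} (\mathbf{C} \odot \mathbf{B})^{T}$$

where $\odot$ denotes the Khatri-Rao (column-wise Kronecker) product.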
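The unfolded PARALIND expression can be checked against a slice-by-slice construction. The sketch below is illustrative only: `khatri_rao` and `paralind_unfolded` are hypothetical helper names, the loadings are random, and the unfolding convention (second-mode index running fastest within each third-mode block) is an assumption chosen to match C ⊙ B.

```python
import numpy as np


def khatri_rao(C, B):
    """Column-wise Kronecker product of C (K x S) and B (J x S): (K*J) x S."""
    K, S = C.shape
    J = B.shape[0]
    return np.einsum('ks,js->kjs', C, B).reshape(K * J, S)


def paralind_unfolded(A, H, B, C):
    """Unfolded model X (I x JK) = A H (C kr B)^T."""
    return A @ H @ khatri_rao(C, B).T


rng = np.random.default_rng(1)
I, J, K, R, S = 4, 5, 3, 2, 3
A = rng.random((I, R))
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
B = rng.random((J, S))
C = rng.random((K, S))

X_unfolded = paralind_unfolded(A, H, B, C)

# Same array built slab by slab: X_k = (A H) diag(c_k) B^T.
A_tilde = A @ H
slabs = np.stack([A_tilde @ np.diag(C[k]) @ B.T for k in range(K)], axis=2)
X_from_slabs = slabs.transpose(0, 2, 1).reshape(I, K * J)

print(np.allclose(X_unfolded, X_from_slabs))
```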

In some exploratory applications, the dependency matrix H need not even be predefined. This matrix, which defines the pattern and strength of the interactions, may also be estimated from the data if no prior knowledge is available. The approach would then be more similar to PARATUCK2.

3. Data and models

3.1. Simulated data

Several different EEM fluorescence samples were simulated. Each sample contained three chemical species, of which one was considered the analyte of interest. For every sample, five successive additions of the analyte were made, and a 6 (addition mode) × 91 (emission) × 21 (excitation) array was obtained for each sample.
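A data set with the dimensions described above could be simulated along the following lines. This is only a sketch under stated assumptions: the Gaussian band shapes, their positions and widths, the initial concentration `c0`, the addition step and the function name `simulate_standard_addition` are all invented for illustration and are not the profiles used in the original study.

```python
import numpy as np


def gaussian(axis, center, width):
    """Gaussian band used as a stand-in for a spectral profile."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)


def simulate_standard_addition(c0=1.0, n_additions=5, step=0.5,
                               noise=0.0, seed=0):
    """Return a 6 x 91 x 21 array (addition x emission x excitation)."""
    rng = np.random.default_rng(seed)
    emission = np.linspace(0.0, 1.0, 91)
    excitation = np.linspace(0.0, 1.0, 21)

    # Three species: the analyte first, then two interferents.
    B = np.column_stack([gaussian(emission, m, 0.12) for m in (0.3, 0.5, 0.7)])
    C = np.column_stack([gaussian(excitation, m, 0.20) for m in (0.35, 0.5, 0.65)])

    # Addition mode: analyte grows with each addition, interferents constant.
    analyte = c0 + step * np.arange(n_additions + 1)
    A = np.column_stack([analyte, np.ones_like(analyte)])
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]])

    X = np.einsum('is,js,ks->ijk', A @ H, B, C)
    X = X + noise * rng.standard_normal(X.shape)
    return X, A, H, B, C


X, A, H, B, C = simulate_standard_addition()
print(X.shape)
print(np.linalg.matrix_rank(X.reshape(X.shape[0], -1)))
```

Without noise, the array unfolded along the addition mode has rank two even though three species are present, which is the rank deficiency that PARALIND is designed to model.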


For the three-component simulated data, the results of PARAFAC and PARALIND were comparable.

Salicylate determination in plasma using the standard addition method

For each sample, a 5 × 13 × 442 three-way array was obtained. For each three-way array, three to four components were indicated by singular value decomposition of each excitation × emission slab. For a three-component model, for example, the PARALIND interaction matrix was defined as

$$\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}$$

[Figures: PARAFAC and PARALIND results.]

[Table: Results obtained by PARALIND modeling for the analysis of salicylate in different plasma samples. Columns: amount added, standard addition equation, R², predicted concentration, recovery (%). Summary rows: mean recovery and RSE (%).]

The results shown for three components indicate that similar results are obtained for PARAFAC and PARALIND with respect to predictions. In order to test a four-component model, a single experiment was modeled with both PARAFAC and PARALIND. In each case, the model was refitted leaving out one sample at a time in order to monitor how stable the model would be towards changes in the data.

[Figure: sample loadings for PARALIND and PARAFAC.]

As shown in the figure, the PARALIND model is very stable and provides spectral estimates that are consistent across samples as well as consistent with the overall model. Even when possibly minor components are included, the model results remain stable. PARAFAC, on the other hand, fails completely to model the analyte spectrum, because the analyte spectrum becomes mixed up with one of the interference spectra. Hence, PARAFAC is not able to predict the analyte concentration, and this points to the main advantage of using PARALIND for second-order standard addition.

2) Comparison of PARAFAC and PARALIND in modeling three-way fluorescence data array with special linear dependences in three modes: a case study in 2-naphthol

Hao Chen, Binghui Zheng, Yonghui Song, J. Chemometrics (2010)

The EEMs of 2-naphthol, with linear dependencies in three modes, are very different from any EEMs reported in the literature.

It was concluded in this paper that whether a proper fit is obtained depends on how the constraints are imposed on the profile matrices (B and C) in PARALIND. When dealing with complicated environmental samples without a priori knowledge of the spectral characteristics of the underlying factors, investigators would employ PARAFAC rather than PARALIND. The presence of strongly overlapping spectra together with a fairly good fit (e.g. small residuals) despite a negative core consistency (CC) may function as an "alarm" that linear dependences are present in some modes due to complex physical or chemical processes, and great care must be taken in interpreting the data.

However, the concentration profiles became unique and chemically meaningful. Compared with uninformed PARAFAC, PARALIND therefore improves the recovery of the concentrations of collinear factors in this example. There has been increasing concern about linear dependencies in three-mode data, for instance sample-pH-absorbance data and sample-kinetic-spectra data. PARALIND is a constrained form of PARAFAC, and it can be implemented by imposing proper constraints in PARAFAC codes.

3) PARAFAC, PARALIND, MCR-ALS and PLS-RB were compared.

MODEL 1

This report discusses a modified second-order standard addition method, in which the test data matrix is subtracted from the standard addition matrices, and quantitation proceeds via the classical external calibration procedure. It is shown that this novel data processing method allows one to apply parallel factor analysis (PARAFAC) and multivariate curve resolution alternating least-squares (MCR-ALS), among other methods.
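The subtraction step of the modified method can be illustrated with a toy bilinear example. This sketch covers only the subtraction, not the subsequent quantitation by PARAFAC or MCR-ALS, and it assumes an ideal case: a single analyte with a fixed bilinear signal, a background that is identical in all matrices, and no noise. All variable names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
J, K = 8, 6

# Hypothetical unit-concentration analyte signal and constant background.
analyte = np.outer(rng.random(J), rng.random(K))
background = np.outer(rng.random(J), rng.random(K))

c0 = 1.5                              # unknown analyte level in the test sample
added = np.array([1.0, 2.0, 3.0])     # amounts added

X_test = c0 * analyte + background
X_added = np.stack([(c0 + d) * analyte + background for d in added])

# Subtract the test matrix from every standard addition matrix.
D = X_added - X_test                  # each slice now equals d * analyte

ranks = [np.linalg.matrix_rank(Di, tol=1e-10) for Di in D]
scores = np.array([np.linalg.norm(Di) for Di in D])

print(ranks)
print(scores / added)
```

After subtraction the background cancels, and the remaining matrices behave like external calibration standards measured in the sample's own matrix: each is rank one and its magnitude is proportional to the added amount.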

MODEL 2

V.A. Lozano et al. / Analytica Chimica Acta 651 (2009) 165–172

For the MCR-ALS results, inspection of the figure reveals a bias in the complete results using model 1, with a significant improvement when model 2 is employed (in fact, the small remaining bias is comparable to the uncertainty in the nominal concentrations, i.e., 0.01 units). The origin of the bias in the former case is unclear, but it may be related to the strong correlations when model 1 is used.

Experimental system 1: The determination of salicylate in serum requires standard addition, due to changes in the analyte spectrum caused by interactions with the serum background.

Experimental system 2: The determination of fluoroquinolone antibiotics in serum, such as danofloxacin, requires standard addition, due to changes in the analyte spectrum caused by interactions with the serum background.

Both experimental data sets were estimated to have three components.

For experimental system 1, the RMSE values of PARAFAC, PARALIND and MCR-ALS are comparable.

Experimental system 2: specific prediction results for the set of spiked test samples

Algorithm | PARAFAC model 2 / PARALIND | MCR-ALS model 2
RMSE      | 10                         | 30

In this case, where lower sensitivity towards the analyte is attained and heavy spectral overlap occurs in both data dimensions, the RMSE is rather high in comparison with the mean analyte concentration across the set of samples. As with the previous experimental system, the prediction results obtained from PARALIND were identical to those of PARAFAC model 2. When MCR-ALS was applied, the predictions were clearly worse, indicating that the combination of low analyte signal and spectral overlap has a stronger effect on this algorithm than on the PARAFAC decomposition.

R. Bro, R.A. Harshman, N.D. Sidiropoulos, M.E. Lundy, J. Chemom. 23 (2009) 324–340.

Want to know more and download some m-files? Look at Rasmus Bro’s website.

Thanks for your attention. Any questions?