Advanced process modelling with multivariate curve resolution. Anna de Juan (1,*) and Romà Tauler (2). (1) Chemometrics group, Universitat de Barcelona, Diagonal, Barcelona. (2) Dept. of Environmental Chemistry, IIQAB-CSIC, Barcelona.
Process. Definition and underlying model. A process is an evolving chemical system monitored by a multivariate signal, e.g. a reaction system with a known mechanism (kinetic process) or an evolving system with no underlying mechanism (chromatographic elution).
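Process. Definition and underlying model. Bilinear model: each compound contributes a rank-one matrix, $D_A = c_A s_A^T$ and $D_B = c_B s_B^T$, and the raw data are their sum, $D = D_A + D_B = c_A s_A^T + c_B s_B^T = CS^T$, with $C = [c_A\; c_B]$ and $S = [s_A\; s_B]$. With experimental error, $D = CS^T + E$.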
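The bilinear model can be checked numerically. The NumPy sketch below builds $D$ for two hypothetical compounds A and B as a sum of outer products and confirms that it equals $CS^T$; the Gaussian bands, exponential profiles and grid sizes are arbitrary assumptions made for the example.

```python
import numpy as np

# Hypothetical process: A decays into B (rows = process variable, columns = channels)
t = np.linspace(0.0, 10.0, 40)            # process variable (e.g. time)
wl = np.linspace(200.0, 400.0, 60)        # measurement channel (e.g. wavelength, nm)

c_A = np.exp(-0.5 * t)                    # concentration profile of A
c_B = 1.0 - c_A                           # concentration profile of B
s_A = np.exp(-((wl - 260.0) / 20.0) ** 2) # pure spectrum of A (Gaussian band)
s_B = np.exp(-((wl - 320.0) / 25.0) ** 2) # pure spectrum of B

# Each compound contributes a rank-one matrix (outer product of its profiles)
D_A = np.outer(c_A, s_A)
D_B = np.outer(c_B, s_B)
D = D_A + D_B

# The same data in matrix form, D = C S^T
C = np.column_stack([c_A, c_B])           # 40 x 2
S_T = np.vstack([s_A, s_B])               # 2 x 60
print(D.shape, np.allclose(D, C @ S_T))
```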
Process. Definition and underlying model. Known mechanism: hard-modeling (HM). No mechanism: soft-modeling (SM). In both cases $D = CS^T$, and the concentration profiles of A, B and C in $C$ follow an ordered evolving pattern over time.
Process soft-modeling (Multivariate Curve Resolution, MCR)
MCR in process analysis. MCR decomposes the process raw data $D$ (time × wavelength) as $D = CS^T$ into a process description: $C$ (concentration vs. time) gives the evolution of the process contributions (model), and $S^T$ (absorptivities vs. wavelength) gives structural information on the compounds (identification).
Multivariate Curve Resolution – Alternating Least Squares (MCR-ALS). Steps: (1) determination of the number of components (PCA); (2) building of initial estimates of $C$ or $S^T$ (EFA, SIMPLISMA, prior knowledge...); (3) iterative least-squares calculation of $C$ and $S^T$ subject to constraints; (4) check for satisfactory data reproduction by $CS^T$. Combining data exploration with the input of external information gives an optimal and chemically meaningful process description, $D = CS^T + E$. References: R. Tauler, Chemom. Intell. Lab. Sys. 30 (1995) 133. A. de Juan and R. Tauler, Anal. Chim. Acta 500 (2003) 195. J. Jaumot et al., Chemom. Intell. Lab. Sys. 76 (2005) 101.
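The four MCR-ALS steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the published MCR-ALS implementation: non-negativity is imposed by clipping negative values to zero (a simple stand-in for a proper non-negative least-squares step), the data are simulated and noise-free, and the initial $S^T$ is simply the true spectra plus an offset, an assumption standing in for EFA, SIMPLISMA or prior knowledge. All function names are hypothetical.

```python
import numpy as np

def consecutive_profiles(t, k1, k2):
    """Hypothetical A -> B -> C first-order kinetics, used only to simulate data."""
    c_a = np.exp(-k1 * t)
    c_b = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return np.column_stack([c_a, c_b, 1.0 - c_a - c_b])

def lack_of_fit(D, C, S_T):
    """Lack of fit (%) of the bilinear reproduction C S^T."""
    return 100.0 * np.sqrt(np.sum((D - C @ S_T) ** 2) / np.sum(D ** 2))

def mcr_als(D, S_T, n_iter=200, tol=1e-10):
    """Alternating least-squares updates of C and S^T under non-negativity (clipping)."""
    lof_old = np.inf
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S_T), 0.0, None)   # C = D (S^T)^+
        S_T = np.clip(np.linalg.pinv(C) @ D, 0.0, None)   # S^T = C^+ D
        lof = lack_of_fit(D, C, S_T)
        if abs(lof_old - lof) < tol:                       # fit no longer changes
            break
        lof_old = lof
    return C, S_T, lof

# Simulated, noise-free process data (hypothetical)
t = np.linspace(0.0, 10.0, 60)
wl = np.linspace(0.0, 1.0, 50)
S_true = np.column_stack([np.exp(-((wl - m) / 0.08) ** 2) for m in (0.2, 0.5, 0.8)])
D = consecutive_profiles(t, 1.0, 0.3) @ S_true.T

# Step 1: number of components from the singular values (PCA)
sv = np.linalg.svd(D, compute_uv=False)
n_comp = int(np.sum(sv > 1e-8 * sv[0]))

# Step 2: initial estimate of S^T (assumed: true spectra plus an offset)
S_T0 = S_true.T + 0.02

# Steps 3 and 4: constrained ALS and check of the data reproduction
C, S_T, lof = mcr_als(D, S_T0)
print(n_comp, round(lof, 4))
```

With real, noisy data the number of components and the initial estimates come from exploratory tools, and the stopping criterion is usually a relative change in lack of fit; with poor initial estimates, convergence to chemically meaningful profiles is not guaranteed.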
Constraints. Definition: any property systematically present in the profiles of the compounds in the data set, either of chemical origin or given by mathematical properties. Application: $C$ and $S^T$ can be constrained differently, and so can the individual profiles within $C$ and $S^T$. In process analysis, constraints reflect the inherent order of the process.
Process constraints: non-negativity ($C$, $S^T$); unimodality ($C$), for processes evolving in emergence-decay profiles; closure ($C$), for mass balance; and, above all, selectivity.
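Each of these constraints can be written as a small function applied to $C$ or $S^T$ after a least-squares step. The NumPy sketch below shows one simple way to implement three of them; the unimodality rule used here (capping values so they never rise again away from the maximum) is only one of several possible variants, and the function names are hypothetical.

```python
import numpy as np

def non_negativity(X):
    """Non-negativity for C or S^T: set negative values to zero."""
    return np.clip(X, 0.0, None)

def closure(C, total=1.0):
    """Closure (mass balance): rescale each row of C so that it sums to `total`."""
    row_sums = C.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # leave all-zero rows unchanged
    return C * (total / row_sums)

def unimodality(c):
    """Unimodality for one concentration profile: moving away from the
    maximum, each value is capped by its neighbour closer to the peak."""
    c = np.asarray(c, dtype=float).copy()
    peak = int(np.argmax(c))
    for i in range(peak - 1, -1, -1):      # left side of the peak
        c[i] = min(c[i], c[i + 1])
    for i in range(peak + 1, len(c)):      # right side of the peak
        c[i] = min(c[i], c[i - 1])
    return c

# Usage: the secondary maximum at index 4 is flattened to the preceding value
profile = np.array([0.0, 1.0, 3.0, 2.0, 2.5, 1.0, 0.5])
print(unimodality(profile))
C_closed = closure(np.array([[1.0, 1.0], [2.0, 2.0]]))   # each row sums to 1
```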
MCR in process modelling. Advantages (low requirements): a bilinear data structure; no process model required; no previous identification of process compounds needed. Limitations: we model only what we measure (non-absorbing species are not modelled); each compound should have a distinct concentration profile and spectrum (otherwise rank deficiency occurs).
MCR in process modelling. The limitations above (non-absorbing species are not modelled; compounds without distinct concentration profiles or spectra cause rank deficiency) can be addressed by multiset process analysis and by the incorporation of hard-modelling information.
Advanced process modeling (Multiset analysis)
Processes and multiset models: the same process monitored with different techniques; several processes/batches monitored with the same technique; several processes monitored with several techniques.
Multiset arrangements: advantages. Chemometric reasons: rotational ambiguity decreases or is suppressed; rank-deficiency problems are solved; the noise effect is minimized. Chemical reasons: more information is introduced in the process modelling; the process description is more robust; process compounds are better characterized (multitechnique analysis); the process evolution and the effect of inducing agents are described more globally (multiexperiment analysis).
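Rank-deficient systems (the concept). The detectable rank is lower than the number of process contributions. For $D = CS^T$, $\mathrm{rank}(D) = \min(\mathrm{rank}\,C, \mathrm{rank}\,S^T)$ when one of the two factors has full rank, so rank deficiency can be linked to $C$ or to $S^T$: equally shaped concentration profiles, e.g. A + B → C with $[A] = [B]$, give rank 2; equally shaped spectra, e.g. the enantiomers D and L (spectrum D = spectrum L), give rank 1.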
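Rank-deficient systems (the concept). Linearly dependent concentration profiles: A + B → C monitored in two experiments, $D_1 = C_1S^T$ with $[A]_0 = 1$ and $[B]_0 = 3$, and $D_2 = C_2S^T$ with $[A]_0 = 2$ and $[B]_0 = 1$. In each experiment the mass balance fixes the relation between the profiles ($c_B - c_A = [B]_0 - [A]_0$ and $c_A + c_C = [A]_0$ at all times), so $c_A$, $c_B$ and $c_C$ span only two dimensions and each matrix has rank 2.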
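Rank-deficient systems (the concept). Row-wise augmentation of the two experiments, $\begin{bmatrix} D_1 \\ D_2 \end{bmatrix} = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix} S^T$, breaks the rank deficiency: the linear relation between the profiles is different in each experiment, so no single relation holds over the augmented concentration matrix and the multiset has rank 3.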
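The rank argument can be checked numerically. The sketch below assumes second-order kinetics for A + B → C (integrated rate law for unequal initial concentrations), simulates both experiments with three arbitrary Gaussian spectra, and compares the numerical ranks of $D_1$, $D_2$ and the row-wise augmented matrix. The rate constant, time axis and spectra are hypothetical choices.

```python
import numpy as np

def a_plus_b_profiles(t, a0, b0, k):
    """A + B -> C, rate = k[A][B], with [A]0 != [B]0 (integrated rate law).
    Mass balance: c_B - c_A = b0 - a0 and c_A + c_C = a0 at all times."""
    delta = b0 - a0
    c_a = delta * a0 / (b0 * np.exp(delta * k * t) - a0)
    c_b = c_a + delta
    c_c = a0 - c_a
    return np.column_stack([c_a, c_b, c_c])

t = np.linspace(0.0, 5.0, 50)
wl = np.linspace(0.0, 1.0, 60)
# Hypothetical, clearly different pure spectra of A, B and C
S_T = np.vstack([np.exp(-((wl - m) / 0.1) ** 2) for m in (0.25, 0.5, 0.75)])

D1 = a_plus_b_profiles(t, 1.0, 3.0, k=1.0) @ S_T   # [A]0 = 1, [B]0 = 3
D2 = a_plus_b_profiles(t, 2.0, 1.0, k=1.0) @ S_T   # [A]0 = 2, [B]0 = 1
D_aug = np.vstack([D1, D2])                        # row-wise augmentation

ranks = [np.linalg.matrix_rank(X) for X in (D1, D2, D_aug)]
print(ranks)
```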
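Breaking rank-deficiency (multiset data). In UV, $D_{UV} = CS_{UV}^T$ with $s_A = k\,s_B$ (rank 1); in CD, $D_{CD} = CS_{CD}^T$ with $s_A \neq k\,s_B$. The column-wise augmented matrix $[D_{UV}\; D_{CD}] = C\,[S_{UV}^T\; S_{CD}^T]$ has rank 2, because the augmented spectra satisfy $s_A \neq k\,s_B$.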
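A numerical check of this argument, under assumed profiles and spectra: two species whose UV spectra are proportional (so $D_{UV}$ has rank 1) and whose CD spectra have different shapes. The band positions, widths and the A → B conversion are hypothetical; note that the two techniques need not share the number of channels.

```python
import numpy as np

t = np.linspace(0.0, 8.0, 50)
c_A = np.exp(-0.6 * t)                    # hypothetical A -> B conversion
C = np.column_stack([c_A, 1.0 - c_A])     # shared concentration direction

wl_uv = np.linspace(0.0, 1.0, 70)
wl_cd = np.linspace(0.0, 1.0, 40)
band = np.exp(-((wl_uv - 0.5) / 0.15) ** 2)
S_uv_T = np.vstack([band, 0.7 * band])    # UV: proportional spectra (s_A = k s_B)
S_cd_T = np.vstack([np.exp(-((wl_cd - 0.3) / 0.1) ** 2),    # CD: different shapes
                    -np.exp(-((wl_cd - 0.6) / 0.1) ** 2)])

D_uv = C @ S_uv_T
D_cd = C @ S_cd_T
D_aug = np.hstack([D_uv, D_cd])           # column-wise augmentation

ranks = [np.linalg.matrix_rank(X) for X in (D_uv, D_cd, D_aug)]
print(ranks)
```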
Multitechnique process analysis
Multitechnique data analysis. Only the concentration direction is shared by all the data sets, so completely different techniques can be treated together. Spectral discrimination among compounds is higher, because the augmented response contains the complementary information of all techniques (a 'superspectrum'). The single matrix of process profiles gives cleaner process profiles and a more robust description of the process: the profiles are not affected by the specific noise patterns of particular techniques, and the process description must be valid for all the measurements collected. (Figure: multiset vs. multi-way data arrangements.)
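The shared concentration direction can be illustrated with the $C$-update of ALS on a column-wise augmented matrix. In the sketch below, the pH transition, the random pure responses of two techniques and the noise level are all hypothetical, and the superspectra are assumed known so that only the least-squares step that uses every technique at once is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pH, n_uv, n_cd = 30, 80, 45          # techniques may have different numbers of channels
pH = np.linspace(2.0, 10.0, n_pH)

# Hypothetical two-state transition shared by all techniques (one C)
frac = 1.0 / (1.0 + 10.0 ** (5.0 - pH))   # sigmoid centred at pH 5
C = np.column_stack([1.0 - frac, frac])

S_uv_T = rng.random((2, n_uv))            # arbitrary pure responses per technique
S_cd_T = rng.standard_normal((2, n_cd))

D_uv = C @ S_uv_T + 0.01 * rng.standard_normal((n_pH, n_uv))
D_cd = C @ S_cd_T + 0.01 * rng.standard_normal((n_pH, n_cd))

# Column-wise augmentation: one C, one "superspectrum" per species
D_aug = np.hstack([D_uv, D_cd])
S_aug_T = np.hstack([S_uv_T, S_cd_T])     # 2 x (n_uv + n_cd)

# A single least-squares estimate of C uses the channels of both techniques
C_hat = D_aug @ np.linalg.pinv(S_aug_T)
print(D_aug.shape, np.abs(C_hat - C).max())
```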
pH-induced transitions in hemoglobin. Spectroscopic monitoring between pH 1.5 and 10.5 follows the evolution of protein conformations: changes in secondary structure, by UV (… nm) and far-UV CD (… nm); changes in tertiary structure, by UV, near-UV CD (… nm) and fluorescence (… nm); binding of the heme group, by UV and Soret CD (… nm). The global process involves many events at different structural levels, and no mechanism is defined. Muñoz, G.; de Juan, A. Anal. Chim. Acta 2007, 595, 198.
pH-induced transitions in hemoglobin (single-technique resolution). Figure: each data matrix, $D_1$ to $D_5$ (UV, far-UV CD, fluorescence, near-UV CD and Soret CD; spectra vs. pH, wavelengths in nm), is resolved separately to follow secondary structure, tertiary structure, heme binding and the global process.
pH-induced transitions in hemoglobin (single-technique resolution).
Technique | Chemical event | Nr. of process contributions | pH transition values | Explained variance (%)
Far-UV CD | Changes in secondary structure | … | … | …
Near-UV CD | Changes in tertiary structure | … | … | …
Fluorescence | Changes in tertiary structure | 3 | 4.2 / … | …
Soret CD | Heme binding | … | … | …
UV-visible | Global process | 4 | 2.8 / 3.9 / … | …
Some chemical events are simpler than the global process; non-absorbing species are not modelled; spectral contributions that are too similar may not be distinguished. Multitechnique analysis is needed to complete the puzzle.
pH-induced transitions in hemoglobin. Global process resolution (multitechnique analysis). Figure: the five data matrices $D_1$ to $D_5$ (UV, far-UV CD, fluorescence, near-UV CD and Soret CD; spectra vs. pH, wavelengths in nm), augmented column-wise and resolved together.
pH-induced transitions in hemoglobin. Global process resolution. Figure: a single concentration matrix $C$ vs. pH and the pure spectra of the five techniques (fluorescence, far-UV CD, UV, near-UV CD and Soret CD; wavelengths in nm), $S_1^T$ (2)*, $S_2^T$ (2), $S_3^T$ (2), $S_4^T$ (3) and $S_5^T$ (4); labels: native Hb ($D_1$), OxyHb ($D_2$). *Figures in parentheses are the number of species resolved in the single-technique analysis. Species that do not absorb in a given technique are now modelled (Soret CD), and similar spectral contributions are distinguished (near-UV CD).
Multiexperiment process analysis
Multiexperiment data analysis. Only the spectral direction is shared by all experiments, so no batch synchronisation is needed, and processes induced by different agents or performed in different conditions can be treated together. The single matrix $S^T$ provides cleaner pure spectra and a more robust structural characterisation of the process compounds. Minor process contributions are easier to model by using experiments with complementary information: a good experimental design may provide experiments with presence/absence of different species. (Figure: multiset vs. multi-way data arrangements.)
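The benefit of complementary experiments can be sketched with a row-wise augmented matrix. In the hypothetical design below, species 3 is absent from experiment 1 and experiment 2 is itself rank-deficient, so neither experiment alone can resolve three species; the two experiments also have different numbers of spectra, and no synchronisation is needed. All profiles and spectra are assumptions made for the illustration.

```python
import numpy as np

wl = np.linspace(0.0, 1.0, 60)
S_T = np.vstack([np.exp(-((wl - m) / 0.1) ** 2) for m in (0.3, 0.5, 0.7)])  # 3 species

# Experiment 1 (hypothetical): species 3 absent, 25 spectra
x1 = np.linspace(0.0, 1.0, 25)
C1 = np.column_stack([1.0 - x1, x1, np.zeros_like(x1)])
# Experiment 2 (hypothetical): other conditions, 40 spectra, rank-deficient profiles
x2 = np.linspace(0.0, 1.0, 40)
C2 = np.column_stack([0.2 * (1.0 - x2), 0.5 * np.ones_like(x2), x2])

D1, D2 = C1 @ S_T, C2 @ S_T
D_aug = np.vstack([D1, D2])       # row-wise augmentation: one shared S^T
C_aug = np.vstack([C1, C2])

rank_D1 = np.linalg.matrix_rank(D1)
rank_D2 = np.linalg.matrix_rank(D2)
rank_aug = np.linalg.matrix_rank(D_aug)
print(rank_D1, rank_D2, rank_aug)

# With the augmented C, one least-squares step recovers all three pure spectra
S_T_hat = np.linalg.pinv(C_aug) @ D_aug
print(np.allclose(S_T_hat, S_T))
```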
Protein-drug interaction. Protein + TSPP form a [Protein-TSPP] complex, dominant at low [ligand:protein] ratio and low [ligand], and a TSPP aggregate, dominant at high [ligand:protein] ratio and high [ligand]. Multiexperiment analysis of experiments favouring low and high [ligand:protein] ratios helps to define all the species involved.
Protein-drug interaction. $D_1$: protein concentration varied from 0 to 7.5 µM; the protein-ligand complex dominates. $D_2$: TSPP concentration varied from 0 to 40 µM; the aggregate dominates. Figure: absorbance (a.u.) vs. wavelength (nm) for each data set.
Protein-drug interaction. Figure: resolved pure spectra $S^T$, absorbance (a.u.) vs. wavelength (nm). The aggregate could not be recovered using only $D_1$, and TSPP and the complex are too minor to be correctly recovered from $D_2$ alone. The different presence/absence of species in $D_1$ and $D_2$, and the decorrelated information on [TSPP:complex:aggregate], give a better definition of the pure spectra.
Advanced process modeling (Incorporating hard models)
Process modelling. Hard-modeling: the variation of a process is fully described by fitting a specific mathematical model (physicochemical or empirical) to the experimental measurements. Soft-modeling: the variation of a process is described by the bilinear model of the measurements, optimised under chemical and/or mathematical constraints; no explicit mathematical model is used.
Process hard-modeling. A non-linear kinetic model gives $C = f(k_1, k_2)$, and the pure spectra follow by least squares, $S^T = \mathrm{LS}(D, C) = C^+D$, so that $D = CS^T$ is reproduced as $\hat{D} = CC^+D$. The parameters are fitted by $\min_{k_1, k_2} \lVert (I - CC^+)D \rVert^2$. Output: $C$, $S^T$ and the model parameters. Solutions are unique, but the model must describe all the experimental variation.
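The fitting criterion can be sketched with NumPy alone. The sketch below assumes an A → B → C first-order model and replaces a proper non-linear optimiser with a simple grid search over $(k_1, k_2)$; the spectra are eliminated through $S^T = C^+D$, so only the rate constants are searched. For this particular model the column space of $C$ is symmetric in $k_1$ and $k_2$, so the fit alone cannot tell which step is faster (the known slow-fast ambiguity); the search is therefore restricted to $k_1 > k_2$ as an assumption. Data, grid and function names are hypothetical.

```python
import numpy as np

def consecutive_profiles(t, k1, k2):
    """A -> B -> C, first order, [A]0 = 1; requires k1 != k2."""
    c_a = np.exp(-k1 * t)
    c_b = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return np.column_stack([c_a, c_b, 1.0 - c_a - c_b])

def residual_ssq(D, C):
    """Sum of squares of (I - C C^+) D: S^T is eliminated as C^+ D."""
    R = D - C @ (np.linalg.pinv(C) @ D)
    return float(np.sum(R ** 2))

def fit_rate_constants(D, t, grid):
    """Grid search over (k1, k2) with k1 > k2 (slow-fast ambiguity, see lead-in)."""
    best = (np.inf, None, None)
    for k1 in grid:
        for k2 in grid:
            if k1 <= k2:
                continue
            ssq = residual_ssq(D, consecutive_profiles(t, k1, k2))
            if ssq < best[0]:
                best = (ssq, k1, k2)
    return best

# Hypothetical noise-free data with k1 = 1.5 and k2 = 0.5
t = np.linspace(0.0, 10.0, 60)
wl = np.linspace(0.0, 1.0, 50)
S = np.column_stack([np.exp(-((wl - m) / 0.08) ** 2) for m in (0.2, 0.5, 0.8)])
D = consecutive_profiles(t, 1.5, 0.5) @ S.T

ssq, k1_hat, k2_hat = fit_rate_constants(D, t, np.linspace(0.1, 3.0, 30))
S_T_hat = np.linalg.pinv(consecutive_profiles(t, k1_hat, k2_hat)) @ D
print(k1_hat, k2_hat, ssq)
```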
Process hard-modeling (multibatch/multiexperiment). Batches/experiments 1 to n are analysed together, $D = CS^T$, with the model as the link among batches. This requires one global model or knowledge of the expression linking the different batch models.
Soft-modeling (one experiment). Constrained ALS optimisation: $S^{*T} = \mathrm{LS}(D, C)$ and $C^* = \mathrm{LS}(D, S^*)$ are computed alternately to $\min \lVert D - C^*S^{*T} \rVert$. Output: $C$ and $S^T$. Solutions might be ambiguous, but all absorbing contributions, in and out of the process, are modelled.
Soft-modeling (multibatch/multiexperiment). Batches/experiments 1 to n are analysed together, $D = CS^T$, linked by their common pure spectra. Different experiments can be analysed together even when the experimental conditions or the link among batches are unknown.
Incorporating hard-modeling in MCR: all or some of the concentration profiles can be constrained, and all or some of the batches can be constrained.
Hybrid hard- and soft-modeling MCR (HS-MCR). Hard models and soft-modeling constraints act simultaneously, and off-process contributions can be modelled separately, so the process model can be recovered in the presence of absorbing interferences. Output: $C$, $S^T$ and the model parameters.
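One way to combine both kinds of constraint in each ALS cycle is sketched below under simplifying assumptions: a hypothetical A → B first-order model (rate constant searched on a grid) constrains two columns of $C$, while a third column, an absorbing interference with a slowly drifting concentration, is left to non-negativity only (imposed by clipping). This is an illustration of the idea, not the authors' HS-MCR implementation; the function names, data and grid are hypothetical.

```python
import numpy as np

def first_order_profiles(t, k):
    """Hard model (hypothetical): A -> B, first order, [A]0 = 1."""
    c_a = np.exp(-k * t)
    return np.column_stack([c_a, 1.0 - c_a])

def hard_model_constraint(C, t, kin_cols, k_grid):
    """Fit the kinetic model to columns `kin_cols` of a least-squares C and
    replace them by the model profiles; other columns are left untouched."""
    target = C[:, kin_cols]
    best_ssq, best_k = np.inf, k_grid[0]
    for k in k_grid:
        M = first_order_profiles(t, k)
        # ALS profiles are only defined up to a scale factor per column
        scale = np.sum(M * target, axis=0) / np.sum(M * M, axis=0)
        ssq = np.sum((target - M * scale) ** 2)
        if ssq < best_ssq:
            best_ssq, best_k = ssq, k
    C_new = C.copy()
    C_new[:, kin_cols] = first_order_profiles(t, best_k)
    return C_new, best_k

def hs_mcr_als(D, S_T, t, kin_cols, k_grid, n_iter=100):
    """Each cycle: soft constraint (non-negativity) plus hard kinetic model."""
    k = None
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S_T), 0.0, None)        # soft
        C, k = hard_model_constraint(C, t, kin_cols, k_grid)   # hard
        S_T = np.clip(np.linalg.pinv(C) @ D, 0.0, None)        # soft
    lof = 100.0 * np.sqrt(np.sum((D - C @ S_T) ** 2) / np.sum(D ** 2))
    return C, S_T, k, lof

# Hypothetical data: A -> B (k = 0.8) plus an interference not described by the model
t = np.linspace(0.0, 10.0, 80)
wl = np.linspace(0.0, 1.0, 60)
c_int = 0.3 + 0.02 * t
C_true = np.column_stack([first_order_profiles(t, 0.8), c_int])
S_true_T = np.vstack([np.exp(-((wl - m) / 0.1) ** 2) for m in (0.3, 0.55, 0.8)])
D = C_true @ S_true_T

k_grid = np.linspace(0.05, 2.0, 40)
C, S_T, k_hat, lof = hs_mcr_als(D, S_true_T + 0.02, t, kin_cols=[0, 1], k_grid=k_grid)
print(k_hat, round(lof, 3))
```

Keeping the interference column free is what lets the kinetic model be fitted despite an absorbing contribution it does not describe; with poor initial estimates, convergence to the correct rate constant is not guaranteed.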
HS-MCR (multibatch/multiexperiment). Batches/experiments 1 to n are linked by their pure spectra. Global or individual models can be used, the link among different models can be unknown or non-existent, and model-free and model-based experiments can be analysed together.
Myoglobin denaturation. Mechanism: native (N), intermediate ($I_s$), denatured (D) and kinetic transient ($I_t$) forms. Steady-state process: UV spectra over a pH range; N, $I_s$ and D follow an unknown model. Kinetic process: UV spectra from pH-jump stopped-flow experiments; first-order consecutive reactions. P. Culberg, P.J. Gemperline, A. de Juan (submitted).
Myoglobin denaturation. The steady-state unfolding data (vs. pH) and the kinetic unfolding data (vs. time) are augmented row-wise into one $D = CS^T$: hard-modelling (first-order reactions) constrains the kinetic block $C_t$, while soft-modelling constraints act on the steady-state block $C_{pH}$. Model-free and model-based experiments can be analyzed together.
Myoglobin denaturation. The formation of a kinetic transient ($I_t$) was detected and hard-modelled, with $k_1 = 4.05\ \mathrm{s^{-1}}$ and $k_2 = 0.62\ \mathrm{s^{-1}}$. The steady-state unfolding, involving N and D, was modelled with soft constraints. (Figure: resolved profiles vs. pH and time, and pure spectra vs. wavelength.)
BDE-209 (flame retardant): photodegradation of decabromodiphenyl ether. UV kinetic monitoring in several THF/water mixtures (10%, 20%, 30% and 40% water), with three replicates per solvent composition. S. Mas, A. de Juan, S. Lacorte, R. Tauler (submitted).
Data arrangement: one global kinetic model per solvent composition (global model 1, global model 2, ...), $A \xrightarrow{k_1} B \xrightarrow{k_2} C \xrightarrow{k_3} D$, plus an off-process contribution (spectral solvent effects).
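A global kinetic model can be sketched as follows, assuming three consecutive first-order steps with distinct rate constants: the profiles of A, B, C and D come from the eigendecomposition solution of $dc/dt = Kc$, and the three replicates of one solvent composition share the same rate constants while each keeps its own time axis. The rate constants and time axes are hypothetical (not the fitted values), and the soft-modelled off-process column for the solvent effect is omitted.

```python
import numpy as np

def chain_profiles(t, k, c0=None):
    """A -> B -> C -> D, first order (rate constants k = (k1, k2, k3), distinct).
    Solves dc/dt = K c as c(t) = V diag(exp(lam t)) V^-1 c0."""
    k1, k2, k3 = k
    K = np.array([[-k1, 0.0, 0.0, 0.0],
                  [ k1, -k2, 0.0, 0.0],
                  [0.0,  k2, -k3, 0.0],
                  [0.0, 0.0,  k3, 0.0]])
    if c0 is None:
        c0 = np.array([1.0, 0.0, 0.0, 0.0])   # only A present at t = 0
    lam, V = np.linalg.eig(K)
    coef = np.linalg.solve(V, c0)              # V^-1 c0
    C = (np.exp(np.outer(t, lam)) * coef) @ V.T
    return C.real

# One global model: replicates share k but have their own time axes
k_hypothetical = (0.9, 0.4, 0.15)
replicate_times = [np.linspace(0, 30, 40), np.linspace(0, 30, 35), np.linspace(0, 25, 30)]
C_aug = np.vstack([chain_profiles(t, k_hypothetical) for t in replicate_times])
print(C_aug.shape)
```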
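Photodegradation of BDE-209. Figure: resolved concentration profiles $C$ for each water content and pure spectra $S^T$, for the model $A \xrightarrow{k_1} B \xrightarrow{k_2} C \xrightarrow{k_3} D$ plus an off-process contribution. Rate constants:
Composition | $k_1$ (× …)* | $k_2$ (× …)* | $k_3$ (× …)*
90:10 THF-water | 2.76 (1) | 2.60 (2) | 1.38 (6)
80:20 THF-water | … (8) | 1.613 (5) | 1.362 (4)
70:30 THF-water | 2.41 (1) | 0.99 (4) | 0.77 (4)
60:40 THF-water | … (6) | 1.092 (3) | 0.68 (2)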
MCR in process modelling. Conclusions. Low requirements: a bilinear data structure; no process model required; no previous identification of process compounds needed. High flexibility in data arrangements (multitechnique analysis, multiexperiment analysis, or both) and in input information (soft-modeling constraints and hard models), adaptable to individual compounds and/or experiments.
Acknowledgements: Glòria Muñoz (pH-dependent hemoglobin example); Susana Navea (protein-drug interaction); Sílvia Mas, UB and IIQAB-CSIC (BDE-209 example); Pat Culberg, East Carolina University (myoglobin example); Lionel Blanchet, UB and Université des Sciences et Technologies de Lille (photochemical example). Financial support by the Spanish Government. Group Web page:
Protein photochemical reaction. Photosynthetic reaction center of Rhodobacter sphaeroides: a photochemical kinetic process (quinone reduction to ubiquinol, $QH_2$) coupled to a protein conformational change, followed over light-on and light-off periods. Measurement: IR rapid-scan spectroscopy (difference spectra, … cm⁻¹). (Figure: scheme of the reaction center, with the quinones $Q_A$ and $Q_B$, and the cytochrome complex.) Blanchet, L.; Ruckebusch, C.; Huvenne, J. P.; de Juan, A. Chemom. Intell. Lab. Sys. 2007, 89, 26.
Protein photochemical reaction. The kinetics of ubiquinol are modelled in the presence of an interference (protein absorption): in $D = CS^T$, hard-modeling describes the ubiquinol formation and decay contribution and soft-modeling constraints describe the rest (profiles labelled $Q_2$ and $P_2$), over light-on and light-off periods.
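Protein photochemical reaction. The kinetics of ubiquinol formation and decay are modelled with a hard-modeling constraint ($k_1 = …\ \mathrm{s^{-1}}$, $k_{-1} = …\ \mathrm{s^{-1}}$), and the photoinduced protein conformational change is modelled model-free. (Figure: profiles vs. time (s) over off/on periods; spectra vs. wavenumber (cm⁻¹) showing the amide I and amide II bands and the $-Q_1$ and $+Q_2$ contributions.)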
Rotational ambiguity and noise minimization. A single set of process profiles is shared by all techniques, so fewer combinations of $C$ and $S^T$ give an optimal fit (rotational ambiguity decreases). Noise is technique- and data set-dependent, whereas $C$ contains the information common to all techniques (the noise effect is minimized).
Breaking rank-deficiency (multiset data). For equally shaped spectra, such as the enantiomers D and L (spectrum D = spectrum L in UV), $D_{UV} = CS_{UV}^T$ with $s_A = k\,s_B$ has rank 1, while $D_{CD} = CS_{CD}^T$ with $s_A \neq k\,s_B$ has rank 2.
$D = CS^T = (CT)(T^{-1}S^T)$ for any invertible $T$: the source of rotational ambiguity.
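The identity can be verified numerically, and the same example shows why constraints reduce the ambiguity. The sketch below uses two hypothetical species and a hypothetical mixing matrix $T$: the rotated pair $(CT, T^{-1}S^T)$ reproduces $D$ exactly, $CT$ stays non-negative, but $T^{-1}S^T$ does not, so non-negativity on $S^T$ rules this particular $T$ out.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 50)
wl = np.linspace(0.0, 1.0, 51)
c1 = np.exp(-0.4 * t)
C = np.column_stack([c1, 1.0 - c1])
S_T = np.vstack([np.exp(-((wl - 0.3) / 0.08) ** 2),
                 np.exp(-((wl - 0.7) / 0.08) ** 2)])
D = C @ S_T

# Any invertible T gives an equally good fit: D = (C T)(T^-1 S^T)
T = np.array([[1.0, 0.1],
              [0.0, 1.0]])          # hypothetical mixing matrix
C_rot = C @ T
S_rot = np.linalg.inv(T) @ S_T

print(np.allclose(D, C_rot @ S_rot))     # same data reproduction
# C_rot is non-negative, but S_rot is negative near the second band
print(C_rot.min() >= 0, S_rot.min() < 0)
```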