1 Modelling procedures for directed network of data blocks
Agnar Höskuldsson, Centre for Advanced Data Analysis, Copenhagen

Data structures:
- Directed network of data blocks
- Input data blocks
- Output data blocks
- Intermediate data blocks

Methods:
- Optimization procedures for each passage through the network
- Balanced optimization of fit and prediction (H-principle)
- Scores, loadings, loading weights, regression coefficients for each data block
- Methods of regression analysis applicable at each data block
- Evaluation procedures at each data block
- Graphic procedures at each data block
2 Chemometric methods

1. Regression estimation, X, Y.
   - Traditional presentation: Y_est = XB, with standard deviations for B.
   - Latent structure: X = TP' + X0 (X0 not used); Y = TQ' + Y0 (Y0 not explained).
2. Fit and precision. Both fit and precision are controlled.
3. Selection of score vectors.
   - As large as possible, describing Y as well as possible.
   - Modelling stops when no more are found (cross-validation).
4. Graphic analysis of latent structure.
   - Score and loading plots.
   - Plots of weight (and loading weight) vectors.
3 Chemometric methods (continued)

5. Covariance as measure of relationship.
   - X'Y for scaled data measures strength.
   - X1'Y = 0 implies that X1 is removed from the analysis.
6. Causal analysis, T = XR.
   - From score plots we can infer about the original measurement values.
   - Control charts for score values can be related to contribution charts.
7. Analysis of X.
   - Most of the analysis time is devoted to understanding the structure of X.
   - Points are marked by symbols to make them easier to identify in score or loading plots.
8. Model validation.
   - Cross-validation is used to validate the results.
   - Bootstrapping (re-sampling from data) is used to establish confidence intervals.
4 Chemometric methods (continued)

9. Different methods.
   - Different types of data/situations may require different types of methods.
   - One looks for interpretations of the latent structure found.
10. Theory generation.
   - Results from the analysis are used to establish views/theories on the data.
   - Results motivate further analysis (groupings, non-linearity, etc.).
5 Partitioning data, 1

- Measurement data: X1, X2, ..., XL
- Response data: Y1, Y2
- Reference data: Z1, Z2, Z3
6 Partitioning data, 2

- There is often a natural sub-division of data.
- It is often required to study the role of a sub-block.
- A data block with few variables may 'disappear' next to one with many variables; optical instruments, for example, often give many variables.
7 Path diagram 1

Directed network of data blocks X1, ..., X7. Examples:
- Production process
- Organisational data
- Diagram for sub-processes
- Causal diagram
8 Path diagram 2, schematic application of modelling

Network of blocks X1, ..., X7. x10 is a new sample from X1, x20 a new one from X2, and x30 a new one from X3; how do they generate new samples for X4, X5, X6 and X7?

Resulting estimating equations:
X4,est = X1 B14 + X2 B24 + X3 B34
X5,est = X1 B15 + X2 B25 + X3 B35
X6,est = X4 B46 + X5 B56
X7,est = X6 B67
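The estimating equations above can be sketched in numpy. The block sizes and B-matrices below are hypothetical stand-ins (random, not estimated from data); in practice the B's come out of the path modelling itself. The point is only how a new sample propagates through the network:

```python
import numpy as np

# Hypothetical sizes: X1, X2, X3 have 5, 4, 3 variables; X4 has 6,
# X5 has 2, X6 has 3, X7 has 1. All B-matrices are random stand-ins.
rng = np.random.default_rng(1)
x10, x20, x30 = rng.normal(size=5), rng.normal(size=4), rng.normal(size=3)
B14, B24, B34 = rng.normal(size=(5, 6)), rng.normal(size=(4, 6)), rng.normal(size=(3, 6))
B15, B25, B35 = rng.normal(size=(5, 2)), rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
B46, B56 = rng.normal(size=(6, 3)), rng.normal(size=(2, 3))
B67 = rng.normal(size=(3, 1))

# New samples propagate through the network via the estimating equations
x4_est = x10 @ B14 + x20 @ B24 + x30 @ B34
x5_est = x10 @ B15 + x20 @ B25 + x30 @ B35
x6_est = x4_est @ B46 + x5_est @ B56   # intermediate blocks feed X6
x7_est = x6_est @ B67                  # X6 feeds the output block X7
```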
9 Path diagram 3

Data blocks can be aligned to time (here the blocks are grouped at times t1 and t2). Modelling can start at time t2.
10 Notation and schematic illustrations

X: instrumental data, Y: response data.

- w: weight vector (to be found)
- t: score vector, t = Xw = w1 x1 + ... + wK xK
- q: loading vector, q = Y^T t = [(y1^T t), ..., (yM^T t)]
- u: Y-score vector, u = Yq = q1 y1 + ... + qM yM

Vectors are collected into matrices, e.g., T = (t1, ..., tA).

Adjustments:
X ← X - t p^T/(t^T t)
Y ← Y - t q^T/(t^T t)
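The notation above can be written as a single component step in numpy. This is a minimal sketch of the slide's definitions only; how the weight vector w is chosen is the subject of the optimization slides:

```python
import numpy as np

def pls_component(X, Y, w):
    """One pass of the slide's notation: given a weight vector w,
    compute score t, loadings p and q, Y-score u, then deflate X and Y."""
    t = X @ w                          # score vector, t = Xw
    p = X.T @ t                        # X-loading
    q = Y.T @ t                        # loading vector, q = Y^T t
    u = Y @ q                          # Y-score vector, u = Yq
    tt = t @ t
    X_new = X - np.outer(t, p) / tt    # X <- X - t p^T/(t^T t)
    Y_new = Y - np.outer(t, q) / tt    # Y <- Y - t q^T/(t^T t)
    return t, p, q, u, X_new, Y_new
```

After the adjustment the deflated blocks are orthogonal to t, i.e. X_new^T t = 0 and Y_new^T t = 0, which is what allows the next component to describe new variation.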
11 Conjugate vectors 1

Three schemes (orthogonality holds for a ≠ b throughout):
- Given w: t = Xw, p = X^T t, with p_a^T r_b = 0.
- Given q: t = Xq, with q_a^T r_b = 0.
- Given w and v: t = Xw, p = X^T v, with conjugate vectors r and s satisfying p_a^T r_b = 0 and t_a^T s_b = 0.
12 Conjugate vectors 2

The conjugate vectors R = (r1, r2, ..., rA) satisfy T = XR. Latent structure solution:

X = T P^T + X0, where X0 is the part of X that is not used
Y = T Q^T + Y0, where Y0 is the part of Y that could not be explained
Y = T Q^T + Y0 = X (R Q^T) + Y0 = X B + Y0, for B = R Q^T

The conjugate vectors are always computed together with the score vectors. Once the regression on score vectors has been computed, the regression on the original variables follows as shown.
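The identity Y_est = T Q^T = X (R Q^T) = X B can be checked numerically. R and Q below are random stand-ins, not vectors computed by the algorithm; the identity is pure matrix algebra and holds for any R and Q:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 6))
R = rng.normal(size=(6, 3))    # conjugate vectors (stand-ins)
T = X @ R                      # T = X R
Q = rng.normal(size=(4, 3))    # Y-loadings (stand-ins)

B = R @ Q.T                    # regression on the original variables: B = R Q^T
# the fitted part of Y is the same whether expressed via T or via X
assert np.allclose(T @ Q.T, X @ B)
```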
13 Optimization procedure, 1

Two data blocks (X1 → X2): find w1, giving t1 = X1 w1 and q2 = X2^T t1, such that |q2|² is maximal.
One data block (X1): find w1, giving t1 = X1 w1, such that |t1|² is maximal.
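For the two-block case, maximizing |q2|² = |X2^T X1 w1|² over unit-length w1 is an eigenvalue problem. The slide does not spell out an algorithm, so the following is one standard way to compute such a weight vector (an assumption, not taken from the slides): the maximizer is the dominant eigenvector of M = X1^T X2 X2^T X1.

```python
import numpy as np

def best_weight(X1, X2):
    """Unit-length w1 maximizing |q2|^2 = |X2^T X1 w1|^2, computed as the
    dominant eigenvector of the symmetric matrix M = X1^T X2 X2^T X1."""
    M = X1.T @ X2 @ X2.T @ X1
    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return vecs[:, -1]               # eigenvector of the largest eigenvalue
```

Any other unit-length weight vector gives |q2|² at most as large, which makes the claim easy to check by random sampling.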
14 Three data blocks

Start: find w giving tz and qz for Z with |qz|² max. Roles in the path: X basis → Y estimated, Y basis → Z estimated.

Adjustments:
- t1 describes X1: X1 ← X1 - t1 p1^T/(t1^T t1), p1 = X1^T t1.
- t1 describes X2: X2 ← X2 - t1 q2^T/(t1^T t1), q2 = X2^T t1.
- q2 describes X3: X3 ← X3 - t3 q2^T/(q2^T q2), t3 = X3 q2.
- t3 describes X4: X4 ← X4 - t3 q4^T/(t3^T t3), q4 = X4^T t3.
15 Optimization procedure, 2

Two input and two output data blocks:
Find w1 and w2 such that |q13 + q23 + q14 + q24|² is maximal.

Two input, one intermediate and one output data block:
Find w1 and w2 such that |q134 + q234|² is maximal.
16 Balanced optimization of fit and prediction (H-principle)

In linear regression we look for a weight vector w such that the resulting score vector t = Xw is good. The basic measure of quality is the prediction variance for a sample x0. Assuming negligible bias and standard assumptions, it can be written

F(w) = Var(y(x0)) = k [1 - (y^T t)²/(t^T t)] [1 + t0²/(t^T t)].

It can be shown that F(cw) = F(w) for all c > 0. Choose c such that (t^T t) = 1. Then

F(w) = k [1 - (y^T t)²] [1 + t0²].

To get a prediction variance as small as possible, it is natural to choose w such that (y^T t)² becomes as large as possible:

maximize (y^T t)² = maximize |q|²   (PLS regression)
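The prediction variance F(w) can be written directly, and its scale invariance F(cw) = F(w) checked numerically. Here t0 is taken as the score of the new sample, t0 = x0^T w, which is what makes the expression invariant under w → cw (an interpretation consistent with the formula, since both (y^T t)²/(t^T t) and t0²/(t^T t) are then scale-free):

```python
import numpy as np

def prediction_variance(w, X, y, x0, k=1.0):
    """F(w) = k [1 - (y^T t)^2/(t^T t)] [1 + t0^2/(t^T t)], with t = Xw
    and t0 = x0^T w the score of the new sample x0."""
    t = X @ w
    t0 = x0 @ w
    tt = t @ t
    return k * (1.0 - (y @ t) ** 2 / tt) * (1.0 + t0 ** 2 / tt)
```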
17 Optimization procedure, 3

Weighing along objects (rows); the same algorithm, but using the transposes:
- Task: find weight vector v1, giving p1 and t2, that maximizes |t2|².
- Task: find weight vector v1, giving p1, t2 and q3 (with a third block X3), that maximizes |q3|².
18 Optimization procedure, 4

Task: find weight vector w1 maximizing |q3|², where

q3 = X3^T t2 = X3^T X2 p1 = X3^T X2 X1^T t1 = X3^T X2 X1^T X1 w1

Regression equations:
X3,est = X2 B23
X2,est = X1 B12
X1,est = X1 B11

If p1 is a good weight vector for X2, a good result may be expected. Pre-processing may be needed to find variables in X1 and X2 that are highly correlated with each other.
19 Three types of reports

Reports:
- How a data block is doing in a network.
- How a data block can be described by the data blocks that lead to it.
- How a data block can be described by one data block that leads to it.
20 Production data, 1

X1: process parameters, 8 variables
X2: NIR data, 1560 variables (reduced to 120)

Explained variance (%) by number of score vectors:

No | |X2|²  | |Y|²   | |X|²   | |Y|²
 1 | 78.961 | 51.483 | 74.969 | 51.964
 2 | 91.538 | 67.559 | 86.786 | 69.553
 3 | 96.351 | 76.291 | 91.627 | 80.643
 4 | 97.942 | 81.383 | 95.373 | 85.058
 5 | 98.620 | 83.900 | 95.919 | 89.056
 6 | 98.967 | 85.705 | 97.054 | 90.050
 7 | 99.205 | 87.917 | 97.508 | 91.990
 8 | 99.294 | 90.472 | 97.990 | 93.455
 9 | 99.349 | 92.183 | 98.667 | 94.020
10 | 99.426 | 92.947 | 98.896 | 94.708
11 | 99.606 | 93.084 | 99.103 | 95.082
12 | 99.657 | 93.376 | 99.202 | 95.740

X1 'disappears' in the NIR data X2.
21 Production data, 2

At each step, weight vectors w1 and w2 give score vectors t1 = X1 w1 and t2 = X2 w2 used to describe Y. The score vectors are evaluated at each step; non-significant ones are excluded.

Results for X1, process parameters: 5 score vectors explain 11.92% of Y.

No | Step | |Y|²
 1 |  1   |  4.957
 2 |  2   |  9.315
 3 |  5   | 10.393
 4 |  6   | 10.929
 5 |  8   | 11.920

Results for X2, NIR data: 12 score vectors explain 84.141% of Y.

No | Step | |Y|²
 1 |  1   | 51.483
 2 |  2   | 69.121
 3 |  3   | 73.070
 4 |  4   | 76.506
 5 |  5   | 78.669
 6 |  6   | 80.923
 7 |  7   | 82.129
 8 |  8   | 82.552
 9 |  9   | 83.132
10 | 10   | 83.590
11 | 11   | 83.881
12 | 12   | 84.141

In total 96.06% = 11.920% + 84.141% of Y is explained.
22 Production data, 3

Plot of estimated versus observed quality variable using only the score vectors for the process parameters. R²-values shown in the diagram: 75.12%, 87.75%, 96.06%.

The process parameters contribute marginally, by 11.92%. But if only they were used, they would explain 75.12% of the variation of Y (R² = 0.7512).
23 Directed network of data blocks

- Input blocks: give weight vectors for initial score vectors.
- Intermediate blocks: are described by previous blocks and give score vectors for succeeding blocks.
- Output blocks: are described by previous blocks.
24 Magnitudes computed between two data blocks (Xi → Xk)

- Ti: score vectors
- Qi: loading vectors
- Bi: regression coefficients
- Measures of precision
- Measures of fit
- Etc.

Different views:
a) As a part of a path
b) If the results are viewed marginally
c) If only Xi → Xk is considered
25 Stages in batch processes

Batches run through stages 1, 2, ..., K (data blocks X1, X2, ..., XK over time), ending in the final quality Y.

Paths:
- X1 → X2 → ... → XK → Y: given a sample x10, the path model gives estimated samples for later blocks.
- [X1 X2 X3] → X4 → Y: given values of (x10 x20 x30), estimates for the values of x4 and y are given.
- [X1 X2 X3] → [X4 X5] → Y: given values of (x10 x20 x30), estimates for the values of (x4 x5) and y are given.
26 Schematic illustration of the modelling task for sequential processes

Stages: X1 (initial conditions, known process parameters), X2, X3, the next stage X4, later stages, and the response Y. 'Now' separates the observed stages from those still to come.
27 Plots of score vectors

Each block Xi gives a score vector ti. Plots of t1 against t2 (X1 - X2), ..., t1 against tL (X1 - XL) show how the changes are relative to the first data block.
28 Graphic software to specify paths

Blocks X1, X2, X3, ..., XL are dragged onto the screen and the relationships specified.
29 Pre-processing of data

- Centring. If desired, centring of data is carried out.
- Scaling. In the computations all variables are scaled to unit length (or unit standard deviation if centred). It is checked whether scaling disturbs the variable, e.g. if it is constant except for two values, or if the variable is at the noise level. When the analysis has been completed, values are scaled back so that units are in original values.
- Redundant variable. It is investigated whether a variable contributes to the explanation of any of the variables that the present block leads to. If it is redundant, it is eliminated from the analysis.
- Redundant data block. It is investigated whether a data block can provide a significant description of the blocks that it is connected to later in the network. If it cannot contribute to the description of those blocks, it is removed from the network.
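A minimal sketch of the centring and scaling step. The tolerance used to flag near-constant variables is an assumption; the slide only says such variables are checked:

```python
import numpy as np

def preprocess(X, centre=True):
    """Optionally centre X, then scale each variable (column) to unit length.
    Near-constant columns are flagged (not scaled); the tolerance is assumed."""
    X = np.asarray(X, dtype=float)
    if centre:
        X = X - X.mean(axis=0)
    norms = np.linalg.norm(X, axis=0)
    ok = norms > 1e-12 * max(1.0, norms.max())  # guard against constant columns
    Xs = X.copy()
    Xs[:, ok] = X[:, ok] / norms[ok]            # unit length per variable
    return Xs, norms, ok
```

Returning the norms makes it possible to scale results back to original units after the analysis, as the slide requires.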
30 Post-processing of results

- Score vectors computed in the passages through the network are evaluated in the analysis at one passage. Apart from the input blocks, the score vectors found between passages are not independent.
- The score vectors found in a relationship Xi → Xj are evaluated to see whether all are significant or some should be removed for this relationship.
- Cross-validation as in standard regression methods.
- Confidence intervals for parameters by resampling techniques.
31 International workshop on Multi-block and Path Methods
24-30 May 2009, Mijas, Malaga, Spain