Refitting PCA/MPCA and CLS/PARAFAC Models to Incomplete Data Records

Slides:



Advertisements
Similar presentations
Kin 304 Regression Linear Regression Least Sum of Squares
Advertisements

Regression analysis Relating two data matrices/tables to each other Purpose: prediction and interpretation Y-data X-data.
Uncertainty in fall time surrogate Prediction variance vs. data sensitivity – Non-uniform noise – Example Uncertainty in fall time data Bootstrapping.
Ch11 Curve Fitting Dr. Deshi Ye
Linear Methods for Regression Dept. Computer Science & Engineering, Shanghai Jiao Tong University.
Getting rid of Rayleigh Åsmund Rinnan. Introduction Fluorescence Light source Sample Detector Excites sample Emitted from sample.
Removal of the 1st order Rayleigh scatter effect Åsmund Rinnan.
Curve-Fitting Regression
Multivariate R e g r e s s i o n
The Basics of Regression continued
7. Least squares 7.1 Method of least squares K. Desch – Statistical methods of data analysis SS10 Another important method to estimate parameters Connection.
Basics of regression analysis
1 5. Multiway calibration Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP.
REGRESSION Predict future scores on Y based on measured scores on X Predictions are based on a correlation from a sample where both X and Y were measured.
1 2. The PARAFAC model Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP.
1 Chapter 17: Introduction to Regression. 2 Introduction to Linear Regression The Pearson correlation measures the degree to which a set of data points.
Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection Takafumi Kanamori Shohei Hido NIPS 2008.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
1 6. Other issues Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP.
1 A Fast-Nonegativity-Constrained Least Squares Algorithm R. Bro, S. D. Jong J. Chemometrics,11, , 1997 By : Maryam Khoshkam.
CSDA Conference, Limassol, 2005 University of Medicine and Pharmacy “Gr. T. Popa” Iasi Department of Mathematics and Informatics Gabriel Dimitriu University.
Linear Regression Andy Jacobson July 2006 Statistical Anecdotes: Do hospitals make you sick? Student’s story Etymology of “regression”
Statistical Methods Statistical Methods Descriptive Inferential
Chapter 14 Inference for Regression AP Statistics 14.1 – Inference about the Model 14.2 – Predictions and Conditions.
Factor Analysis Psy 524 Ainsworth. Assumptions Assumes reliable correlations Highly affected by missing data, outlying cases and truncated data Data screening.
1 Multiple Regression A single numerical response variable, Y. Multiple numerical explanatory variables, X 1, X 2,…, X k.
Solution of a Partial Differential Equations using the Method of Lines
Northern Hemisphere Southern Hemisphere Clouds Craters Ratio Are craters the cause? Are these two ratios equal? This is NOT a.
Robust data filtering in wind power systems
AP STATISTICS LESSON 14 – 1 ( DAY 1 ) INFERENCE ABOUT THE MODEL.
Data Modeling Patrice Koehl Department of Biological Sciences National University of Singapore
Chapter 3- Model Fitting. Three Tasks When Analyzing Data: 1.Fit a model type to the data. 2.Choose the most appropriate model from the ones that have.
1 4. Model constraints Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP.
Quimiometria Teórica e Aplicada Instituto de Química - UNICAMP
1 Robustness of Multiway Methods in Relation to Homoscedastic and Hetroscedastic Noise T. Khayamian Department of Chemistry, Isfahan University of Technology,
Statistics 350 Lecture 2. Today Last Day: Section Today: Section 1.6 Homework #1: Chapter 1 Problems (page 33-38): 2, 5, 6, 7, 22, 26, 33, 34,
ELEC 413 Linear Least Squares. Regression Analysis The study and measure of the statistical relationship that exists between two or more variables Two.
8- Multiple Regression Analysis: The Problem of Inference The Normality Assumption Once Again Example 8.1: U.S. Personal Consumption and Personal Disposal.
Estimating standard error using bootstrap
The simple linear regression model and parameter estimation
Notes on Weighted Least Squares Straight line Fit Passing Through The Origin Amarjeet Bhullar November 14, 2008.
Part 5 - Chapter
ECE3340 Numerical Fitting, Interpolation and Approximation
Analysis of variance ANOVA.
Probability Theory and Parameter Estimation I
Linear Regression.
Objectives By the end of this lecture students will:
REGRESSION DIAGNOSTIC II: HETEROSCEDASTICITY
Kin 304 Regression Linear Regression Least Sum of Squares
Projection on Latent Variables
CS548 Fall 2017 Anomaly Detection
Maximum Likelihood & Missing data
Chapter 3: TWO-VARIABLE REGRESSION MODEL: The problem of Estimation
BPK 304W Regression Linear Regression Least Sum of Squares
BA 275 Quantitative Business Methods
Regression Computer Print Out
Unfolding Problem: A Machine Learning Approach
Modelling data and curve fitting
The Normal Probability Distribution Summary
REGRESSION DIAGNOSTIC I: MULTICOLLINEARITY
Descriptive and Inferential
5.2 Least-Squares Fit to a Straight Line
Generally Discriminant Analysis
Department of Chemical Engineering National Tsing Hua University
Multivariate Analysis Regression
Chapter 14 Inference for Regression
POWER CHALLENGES Several Ways To Solve 7 CHALLENGES.
Multiple Regression Berlin Chen
Testing Causal Hypotheses
Probabilistic Surrogate Models
Presentation transcript:

Refitting PCA/MPCA and CLS/PARAFAC Models to Incomplete Data Records Barry M. Wise

Outline The partial refit problem Approaches Example Data Sets PCA/MPCA CLS/PARAFAC Example Data Sets Comparison of results Conclusions

Why Partial Partial Refitting? PCA/MPCA and CLS/PARAFAC models generally developed on complete data records Sometimes want to compare incoming data to model before complete record is available Batch process monitoring most common Where is process going to wind up? Can faults and off-normal batches be detected mid batch?

Approaches to Partial Records Fill record with data mean Propagate current deviation Solve for missing variables-complete the squares Solve for scores with truncated Classical Least Squares

Missing Variables in PCA

Partition Residual Q

Minimize Q Is intended to replace N-way Toolbox of Anderson and Bro Basis for PLS_Toolbox missing data functions and PCA cross-validation

Replacement Matrix Map RR into matrix RM which replaces variables with estimate most consistent with PCA model Most challenging computation is inverse of R11 B. M. Wise and N. L. Ricker, "Recent Advances in Multivariate Statistical Process Control: Improving Robustness and Sensitivity,” IFAC Symposium on Advanced Control of Chemical Processes, pps. 125-130, Toulouse, France, October 1991

Effect of Replacement Residuals on replaced variables are zero

Truncated CLS Approach Calculate scores by fitting truncated loadings to available data Normally done by projection, but loadings not orthogonal after truncation Use Classical Least Squares approach

CLS Approach to PCA Replacement P. Nomikos and J.F. MacGregor, Multivariate SPC Charts for Monitoring Batch Processes, Technometrics 1995, 37(1), 41-58. S. Garcia-Munoz, T. Kourti and J.F. MacGregor, Model Predictive Monitoring for Batch Processes, Ind. Eng. Chem. Res. 2004, 43, 5929-5941 PPOLS

Surprise! Solutions are the same numerically Not obvious mathematically though!

Refitting PARAFAC Models Refitting PARAFAC model with fixed loadings in all but one mode is a single CLS step Loadings of fixed factors multiplied out and unfolded Unfolded loadings fit to data unfolded in sample mode

Residuals in a CLS Fit

Comment on CLS with Missing Data Solution based on complete the squares works for CLS, as does solution based on truncated CLS Solutions the same, as before

Data Sets Semiconductor etch process EEM of sugar 80 time steps by 12 variables by 107 batches 20 test batches EEM of sugar 7 excitation by 44 emission by 268 samples Dupont batch polymer process 100 time steps by 10 variables by 47 batches 8 test batches

MPCA on Etch Cal

PARAFAC on Etch Cal

MPCA on Etch Test

PARAFAC on Etch Test MATLAB Demos

Summary Method based on minimizing residual and truncated CLS equivalent for refitting PCA/MPCA models to incomplete data records Same is true for CLS/PARAFAC models