Three-point calibration models by correlation constrained MCR-ALS: A feasibility study
B. Debus, D. Kirsanov, V. V. Panchuk, A. A. Goydenko, V. G. Semenov, A. Legin
St. Petersburg State University, St. Petersburg, Russia, March 1st 2016

Background
First order multivariate analysis:
- Quantitative analysis of complex mixtures in the presence of unknown interfering species.
- Recovery of the pure individual spectral components of the target analyte.
- ... and, eventually, evidence of the possible interfering species.

Background
Partial Least Squares (PLS)
(Figure: calibration step, Xcal and Ycal giving the regression vector β, with figures of merit R², RMSEC, RE (%); validation / prediction step, with figures of merit R², RMSEP, RE (%).)
PLS handles strongly collinear variables, noisy data, and the prediction of more than one Y variable.
For this presentation we will focus mainly on the first part, that is, calibration. (A worked sketch of this workflow follows below.)
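As a point of reference for the figures of merit above, one common set of definitions (assumed here; the slides do not spell them out) is:

$$\mathrm{RMSEP}=\sqrt{\tfrac{1}{n}\textstyle\sum_{i=1}^{n}(\hat{y}_i-y_i)^2},\qquad \mathrm{RE}\,(\%)=100\cdot\frac{\mathrm{RMSEP}}{\bar{y}}$$

And here is a minimal Python sketch of the calibration / validation workflow using scikit-learn's PLSRegression on hypothetical data (the array names stand in for Xcal, Ycal and the test set; this is an illustration, not the authors' code):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

# Hypothetical stand-ins for Xcal/Ycal and the validation set.
rng = np.random.default_rng(0)
X_cal, X_test = rng.normal(size=(20, 50)), rng.normal(size=(10, 50))
beta_true = rng.normal(size=50)
Y_cal, Y_test = X_cal @ beta_true, X_test @ beta_true

pls = PLSRegression(n_components=2)   # with only 3 calibration points, at most 2 LVs can be fit
pls.fit(X_cal, Y_cal)                 # calibration: estimates the regression vector beta
y_fit = pls.predict(X_cal).ravel()
y_pred = pls.predict(X_test).ravel()  # validation / prediction

rmsec = np.sqrt(mean_squared_error(Y_cal, y_fit))    # root mean squared error of calibration
rmsep = np.sqrt(mean_squared_error(Y_test, y_pred))  # root mean squared error of prediction
re_pct = 100 * rmsep / Y_test.mean()                 # RE (%), assumed convention above
r2 = pls.score(X_test, Y_test)                       # R² on the validation set
```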

Background
Calibration set design: evaluation of the number of samples; selection of a set of uniformly distributed samples (Kennard-Stone, k-means, D-optimal design, Kohonen mapping).
1) Large calibration set: if the number of calibration samples is too small, the predictive performance of the PLS model tends to decrease and gives biased estimates of future prediction errors.
2) A representative set of samples is needed.
3) Kennard-Stone: maximum distance between pairs of samples; D-optimal design: maximize the determinant of the information matrix X'X.
4) k-means and Kohonen mapping: clustering techniques.
5) Say that we are interested in decreasing the number of calibration samples.
Building Xcal and Xtest this way requires human decisions, reference data and complex algorithms (a Kennard-Stone sketch follows this list).
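Of the selection strategies named above, Kennard-Stone is simple enough to sketch directly. Below is a minimal, self-contained Python version of its maximin logic; the function name and interface are illustrative choices, not from the presentation:

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone selection: start from the two most distant samples,
    then repeatedly add the sample whose minimum distance to the already
    selected set is largest (maximin criterion)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_select:
        remaining = [k for k in range(len(X)) if k not in selected]
        # distance of each remaining sample to its nearest selected sample
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(min_d))])
    return selected

# Usage: pick 3 calibration samples from a hypothetical 35 x 200 data matrix.
X = np.random.default_rng(1).normal(size=(35, 200))
print(kennard_stone(X, 3))
```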

Raised issues
Can we save time and effort on calibration by using a limited number of samples? That is the question we will try to answer today; in our case we will use 3 points for the calibration.
Sample selection criterion? Robustness / accuracy?

Correlation constrained MCR-ALS (CC-MCR¹)
The resolved concentration profile C is divided into a calibration part (x) and a test-set prediction part (p). At each iteration:
1) Select the analyte's profile.
2) Build a local univariate calibration model against the known concentrations k: k = b x + b0.
3) Predict the test-set concentrations from the regression coefficients (b, b0): y = b p + b0.
4) Update the profile.
Everything is done in a single loop: calibration and test-set prediction are performed iteratively until convergence, so all the information in the dataset is used to optimize the model. (A minimal code sketch follows the reference below.)
¹Antunes, M. C.; Simao, J. E.; Duarte, A. C.; Tauler, R. Analyst 2002, 127, 809-817.
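The sketch below shows how the correlation constraint can sit inside an alternating least squares loop, following the scheme described on this slide. It is an illustrative reduction under stated assumptions (non-negativity by clipping, convergence on lack of fit, least-squares ALS steps), not the authors' implementation:

```python
import numpy as np

def cc_mcr_als(D, C0, y_cal, cal_idx, analyte=0, max_iter=200, tol=1e-10):
    """Illustrative correlation-constrained MCR-ALS loop.
    D: (samples x channels) data matrix, C0: initial guess of the
    concentration profiles, y_cal: known concentrations k for the
    calibration rows cal_idx; the remaining rows form the prediction part."""
    C, prev_lof = C0.astype(float).copy(), np.inf
    for _ in range(max_iter):
        S = np.linalg.lstsq(C, D, rcond=None)[0]         # spectra from C (ALS step 1)
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T   # C from spectra (ALS step 2)
        C = np.clip(C, 0.0, None)                        # non-negativity constraint
        # Correlation constraint: local univariate model k = b*x + b0 on the
        # calibration part (x) of the selected profile ...
        b, b0 = np.polyfit(C[cal_idx, analyte], y_cal, 1)
        # ... then update the whole profile, calibration and prediction parts
        # alike (y = b*p + b0), into real concentration units.
        C[:, analyte] = b * C[:, analyte] + b0
        lof = np.linalg.norm(D - C @ S) / np.linalg.norm(D)  # lack of fit
        if abs(prev_lof - lof) < tol:                        # converged
            break
        prev_lof = lof
    return C, S, (b, b0)
```

Because the constraint is applied inside the loop, the test-set predictions are refined at every iteration, which is the "single loop" property the slide emphasizes.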

Correlation constrained MCR-ALS (CC-MCR)
Advantages of CC-MCR compared with the standard PLS method:
- Not limited by the number of calibration samples: with 3 points PLS can use at most 2 latent variables, whereas CC-MCR works on the full C profile, directly in concentration units.
- Possibility to recover "pure" spectral contributions (MCR components rather than PLS regression coefficients).
- Evidence of potential interfering species.

Datasets
Simulated datasets: D1 (35 samples) and D2 (40 samples), both with uniformly distributed concentration profiles.

Datasets
Real datasets:
- DTXRF: mixture of 6 lanthanides² (Ce, Pr, Nd, Sm, Eu, Gd), measured by total reflection X-ray fluorescence (TXRF), 38 samples.
- DNMR: ternary alcohol mixture (propanol, butanol, pentanol), 225 samples (http://www.models.life.ku.dk).

Results
Simulated data D1: simulated profiles vs PLS and CC-MCR.

Method    R²     RMSEP       RE (%)
PLS       0.66   2.3 × 10⁻¹  24.42
CC-MCR    0.99   6.3 × 10⁻³   0.73

Speaker note: explain how the 3 points are selected (min, max and average).

Results
Simulated data D2: simulated profiles vs PLS and CC-MCR.

Method    R²     RMSEP       RE (%)
PLS       0.05   3.2 × 10⁰   91.83
CC-MCR    0.89   4.4 × 10⁻¹  12.43

Speaker note: do not discuss the recovery of the spectral profiles here; we will see it in detail with the real samples.

Results
Simulated datasets: full calibration set vs 3-point calibration.

D1:
Method    Ncal     R²     RMSEP       RE (%)
PLS       20 pts   0.99   5.4 × 10⁻³   0.64
CC-MCR    20 pts          5.3 × 10⁻³   0.62
PLS       3 pts    0.66   2.3 × 10⁻¹  24.42
CC-MCR    3 pts    0.99   6.3 × 10⁻³   0.73

D2:
Method    Ncal     R²     RMSEP       RE (%)
PLS       30 pts   0.98   2.0 × 10⁻¹   6.24
CC-MCR    30 pts          2.3 × 10⁻¹   7.17
PLS       3 pts    0.05   3.2 × 10⁰   91.83
CC-MCR    3 pts    0.89   4.4 × 10⁻¹  12.43

- Similar prediction performance when the number of calibration samples is large.
- Strong increase of the prediction error for 3-point PLS regression models.
- Moderate increase of the prediction error for 3-point CC-MCR regression models.

Results TXRF dataset

Results
TXRF dataset

Analyte   Method   R²      RMSEP (mol/L)   RE (%)
Nd        OLS      0.825   1.3 × 10⁻⁴      34.87
Nd        PLS      0.835   1.6 × 10⁻⁴      42.10
Nd        CC-MCR   0.985   4.8 × 10⁻⁵      13.02
Sm        OLS      0.645   1.7 × 10⁻⁴      54.71
Sm        PLS      0.430   3.1 × 10⁻⁴      98.99
Sm        CC-MCR   0.982   5.8 × 10⁻⁵      14.88

(Slide annotations next to the table: -7.21 %, +30.8 %, +4.5 %, +22.5 %, +81.8 %, +1.1 %.)

Speaker notes: say that the order of magnitude is the same for PLS and CC-MCR with a large calibration set; introduce the case of Sm, for which PLS cannot build a predictive model (spectral overlap); show the previous slide.
- CC-MCR gives better performance than OLS and PLS.
- Moderate increase of the relative error in predicted concentration for CC-MCR.
- PLS fails to build a predictive model for Sm.

Results
Pure spectra: PLS regression coefficients vs CC-MCR resolved components.
- Reliable estimate of the signal for the target analyte.
- The selectivity of the PLS model can be questioned.
- CC-MCR enables the estimation of possible interfering species.

Results
NMR³ dataset: low S/N ratio (< 40 %). Here we arbitrarily selected propanol and butanol for quantitative analysis, whereas pentanol was considered as an interferent.
³Winning et al., Journal of Magnetic Resonance, 2008.

Results
NMR dataset

Analyte    Method   R²      RMSEP (%)   RE (%)
Propanol   PLS      0.993   2.362       5.72
Propanol   CC-MCR   0.997   1.479       3.58
Butanol    PLS              2.343       5.60
Butanol    CC-MCR   0.998   1.248       2.98

(Slide annotations next to the table: +3.1 %, +0.5 %, +3.3 %, +0.6 %.)

- Similar performance reported for PLS and CC-MCR.
- Lower error in predicted concentration for CC-MCR.
- Possibility to accommodate a low S/N ratio with CC-MCR.

Results
Interpretation (figure: "pure spectra" and interfering species as resolved by PLS vs CC-MCR).

Perspectives
- CC-MCR can be extended to 3-point calibration models with a reasonable relative error in predicted concentrations (4 – 15 %).
- Simple selection criterion for the calibration samples (min, max, average).
- To a certain extent, it is possible to save time, money and effort on calibration.
- Both qualitative and quantitative information can be derived from CC-MCR regression models.
Speaker note: conclusion about NIR data (not very well appropriate).

Acknowledgments
A. Legin, D. Kirsanov, V. V. Panchuk, M. Khaydukova, A. A. Goydenko, V. G. Semenov
Thank you for your attention
Speaker note: mention that the results will be published soon in ACA.