Interval selection complexity


Interval selection complexity

WSC-10, Russia, Samara, 29 Feb. – 4 Mar. 2016

Reduced spectra (Xr), size(Xr) = [m, k]:

  1st interval   2nd interval   …   k-th interval
  V1,1           V1,2           …   V1,k
  V2,1           V2,2           …   V2,k
  ..
  Vm,1           Vm,2           …   Vm,k

Solution to be found: I1 I2 … Ik + width

The solution concerns k intervals plus a width. Each interval Ij can contain any number of variables, with a width smaller than the number of variables n in the spectrum. Complexity for that case:

C = n^{k+1}

Polynomial complexity (C, approx.):

  n        k = 5    k = 10
  100      10^11    10^21
  1000     10^16    10^31
  10000             10^41

With interval notation, the problem complexity is given by this polynomial expression. The task for one thousand variables is therefore far less complex than with binary notation.
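To make the comparison with binary notation concrete, here is a minimal Python sketch. It evaluates the formula C = n^{k+1} exactly as stated on the slide and compares it with 2^n, the number of possible on/off masks over n variables. The function names are illustrative. The formula gives upper bounds, so the printed orders of magnitude need not coincide with the approximate figures quoted on the slide.

```python
import math

def log10_interval_complexity(n: int, k: int) -> float:
    """log10 of C = n**(k + 1): k interval positions plus one shared width."""
    return (k + 1) * math.log10(n)

def log10_binary_complexity(n: int) -> float:
    """log10 of 2**n: every variable is independently switched on or off."""
    return n * math.log10(2)

for n in (100, 1000, 10000):
    for k in (5, 10):
        print(
            f"n={n:>5}, k={k:>2}: "
            f"interval ~1e{log10_interval_complexity(n, k):.0f}, "
            f"binary ~1e{log10_binary_complexity(n):.0f}"
        )
```

For n = 1000 the binary search space is on the order of 10^301, whereas the interval formulation stays many orders of magnitude smaller for small k.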

Idea

The idea of the JVSPO algorithm:
Search for the optimal solution as a combination of intervals (in practice, their centers).
Simultaneously optimize the interval width (one width for all intervals).
Simultaneously optimize the preprocessing, chosen from a list of candidates.
Place no restrictions on interval arrangement: intervals may lie on the spectral boundaries and may overlap.
Perform the optimization using any appropriate routine (GA, SA, PSO, MC, etc.).
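The interval encoding described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `intervals_to_indices` is hypothetical, and the choices to clip intervals at the spectral boundaries and to count overlapping variables once are one possible reading of "no restrictions on interval arrangement".

```python
def intervals_to_indices(centers, width, n_vars):
    """Return the sorted variable indices covered by intervals of one shared width.

    Intervals are clipped at the spectral boundaries and may overlap;
    a variable covered by several intervals is selected once.
    """
    half = width // 2
    selected = set()
    for c in centers:
        start = max(0, c - half)                 # clip at the left boundary
        stop = min(n_vars, c - half + width)     # clip at the right boundary
        selected.update(range(start, stop))
    return sorted(selected)

# One interval on the left boundary and two overlapping intervals.
print(intervals_to_indices(centers=[0, 10, 12], width=5, n_vars=20))
# -> [0, 1, 2, 8, 9, 10, 11, 12, 13, 14]
```

Because only the centers and a single width are optimized, the search vector has k + 1 entries regardless of how many variables the spectrum contains.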

Genetic algorithm based optimization

The slide shows the scheme of the genetic algorithm we use for optimization. Interval selection and data preprocessing are applied inside the fitness function.
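As a rough illustration of such a loop, the sketch below runs a small genetic algorithm over chromosomes made of k interval centers and one shared width. Everything in it is an assumption made for the example: the operators (tournament selection, uniform crossover, single-gene mutation, elitism), the parameter values, and above all `toy_fitness`, which merely stands in for the real fitness function. In a real run the fitness would preprocess the spectra, keep the decoded variables, fit a model and return its cross-validated error.

```python
import random

def decode(chrom, n_vars):
    """Chromosome = k interval centers followed by one shared width."""
    *centers, width = chrom
    half = width // 2
    selected = set()
    for c in centers:
        selected.update(range(max(0, c - half), min(n_vars, c - half + width)))
    return selected

def toy_fitness(chrom, n_vars, informative):
    """Stand-in for a cross-validated model error (lower is better).

    Counts informative variables that were missed plus uninformative
    variables that were selected.
    """
    selected = decode(chrom, n_vars)
    return len(informative - selected) + len(selected - informative)

def run_ga(fitness, n_vars, k=2, max_width=15, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)

    def random_chrom():
        return [rng.randrange(n_vars) for _ in range(k)] + [rng.randint(1, max_width)]

    def mutate(chrom):
        child = list(chrom)
        i = rng.randrange(len(child))
        if i < k:   # shift one interval center
            child[i] = min(n_vars - 1, max(0, child[i] + rng.randint(-5, 5)))
        else:       # change the shared width
            child[i] = min(max_width, max(1, child[i] + rng.randint(-2, 2)))
        return child

    def tournament(pop):
        return min(rng.sample(pop, 3), key=fitness)

    pop = [random_chrom() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = min(pop, key=fitness)
        history.append(fitness(best))
        nxt = [best]                                   # elitism: keep the best
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            nxt.append(mutate(child))
        pop = nxt
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history

N_VARS = 100
INFORMATIVE = set(range(20, 30)) | set(range(60, 70))   # hypothetical target bands
best, history = run_ga(lambda c: toy_fitness(c, N_VARS, INFORMATIVE), N_VARS)
print("best chromosome:", best, "fitness:", history[-1])
```

Keeping the fitness function as a plain callable means the same loop could be swapped for another optimizer without touching the decoding or the model evaluation.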

Examples of parameters for optimization

Interval wavelength selection only:
  I1 .. In | IW

Interval selection and preprocessing optimization:
  SNV, MSC, AS | I1 .. In | IW
  SG.W SG.P SG.D | I1 .. In | IW

Interval selection, preprocessing optimization and modeling metaparameters:
  SG.W SG.P SG.D | I1 .. In | IW | nLV

Generalized structure:
  Data pretreatment part | Variable selection part | Modeling part

These are the chromosome structures used in JVSPO. In general, our parameter set consists of three parts: data preprocessing, intervals, and modeling-algorithm metaparameters. It can be tempting to optimize the maximum number of available parameters in one task. However, this strategy does not seem optimal, because it may lead to excessive complexity that reduces the probability of finding a global optimum. We therefore propose keeping the chromosome reasonably compact: choose a few modeling strategies and try them, rather than optimizing everything at once.
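The generalized structure can be pictured as a small record that flattens to a gene vector and back. The Python sketch below is illustrative only: the class and field names are hypothetical, the field comments simply repeat the slide labels, and the example values are arbitrary.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Chromosome:
    # Data pretreatment part (labels as on the slide)
    sg_w: int            # SG.W
    sg_p: int            # SG.P
    sg_d: int            # SG.D
    # Variable selection part
    centers: List[int]   # I1 .. In
    width: int           # IW
    # Modeling part
    n_lv: int            # nLV

    def to_genes(self) -> List[int]:
        """Flatten the three parts into one gene vector for the optimizer."""
        return [self.sg_w, self.sg_p, self.sg_d, *self.centers, self.width, self.n_lv]

    @classmethod
    def from_genes(cls, genes: List[int]) -> "Chromosome":
        """Inverse of to_genes: first three genes, centers, then width and nLV."""
        sg_w, sg_p, sg_d, *centers, width, n_lv = genes
        return cls(sg_w, sg_p, sg_d, centers, width, n_lv)

# Arbitrary example values, chosen only to show the layout.
c = Chromosome(sg_w=15, sg_p=2, sg_d=1, centers=[120, 340, 610], width=25, n_lv=5)
genes = c.to_genes()
print(genes)   # [15, 2, 1, 120, 340, 610, 25, 5]
```

Dropping a part (for example, fixing nLV outside the optimization) shortens the gene vector, which is the compactness the notes argue for.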

Dataset 1

The optimization was tested on a previously published dataset of 1000 raw milk spectra in the visible and short-wave near-infrared region (400–1100 nm). The spectra were used to predict fat and total protein content by means of PLS regression in the presence of seasonal and geographical variability. Milk samples from dairy farms in the district of Samara and surrounding regions were collected and analyzed at a large dairy (Danone-Unimilk, Samara, Russia) over one year, from October 2013 to September 2014.

Reference: Melenteva A., Galyanin V., Savenkova E., Bogomolov A., Building global models for fat and total protein content in raw milk based on historical spectroscopic data in the visible and short-wave near infrared range. Food Chemistry 203 (2016) 190–198

Calibration and validation stats for Dataset 1

  Algorithm        nLV   Calibration       CV                Prediction
                         RMSE     R2       RMSE     R2       RMSE     R2
  Fat content
  raw data         5     0.163    0.882    0.172    0.867    0.165    0.881
  iPLS                   0.096    0.959    0.102    0.953    0.097    0.956
  SG1D2.19+int.          0.093 0.961 0.091 (row incomplete in the transcript)
  SG1D2.19               0.154 0.893 0.167 0.875 0.160 (row incomplete in the transcript)
  Protein content
  raw data         6     0.115    0.691    0.125    0.636    0.122    0.682
  iPLS                   0.101    0.760    0.107    0.730    0.104    0.776
  SG1D2.11+int.          0.797 0.758 0.805 (row incomplete in the transcript)
  SG1D2.11               0.121 0.657 0.135 0.571 0.677 (row incomplete in the transcript)

We compared several regression methods for determining fat and protein content. JVSPO gave the best results, using a Savitzky-Golay derivative as preprocessing. The derivative parameters and the interval width were included in the optimization. Without variable selection, the same preprocessing gives no noticeable improvement compared to the raw data. Another interval method, iPLS, also showed good model performance, but it is still worse than JVSPO.

Reference: Melenteva A. et al., Food Chemistry 203 (2016) 190–198

Models for Dataset 1

[Figure: full-spectra models and JVSPO models for fat and protein]

Dataset 2, IDRC 2014 shootout

[Figure: PCA, panels a) to c)]

The second dataset comes from the software shootout at IDRC 2014. Using JVSPO, we won second prize.

Validation stats for Dataset 2

           Participant 1     Participant 2   Participant 3   Participant 4
           (we, 2nd prize)                   (winner)
  RMSEP    0.119             0.567           0.105           0.202
  SEP      0.112             0.523           0.099           0.193
  R2       0.935             0.002           0.984           0.921
  Bias     -0.039            0.220           -0.035          -0.059

The winner used a local regression approach, building several models for different sample groups.

Reference: Benoit Igne, Andrey Bogomolov, Dongsheng Bu, Pierre Dardenne, Vladislav Galyanin and Peter Tillmann, Summary of the 2014 IDRC software shoot-out, NIR News 26 (2015) 8-14
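For readers unfamiliar with these four statistics, the following Python sketch computes them from reference and predicted values using common textbook definitions. Several points are assumptions rather than the shoot-out's documented procedure: bias is taken as the mean of (predicted minus reference), SEP uses an n − 1 denominator, and R2 is computed as the squared correlation between reference and prediction. The function name and the sample data are made up for illustration.

```python
import math

def validation_stats(y_true, y_pred):
    """Return RMSEP, SEP, R2 and bias for paired reference/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]   # assumed sign convention
    bias = sum(errors) / n
    rmsep = math.sqrt(sum(e * e for e in errors) / n)
    # SEP: standard deviation of the errors after removing the bias
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = cov * cov / (var_t * var_p)                   # squared correlation
    return {"RMSEP": rmsep, "SEP": sep, "R2": r2, "Bias": bias}

# Made-up values, only to exercise the function.
stats = validation_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 4.2])
print({name: round(value, 3) for name, value in stats.items()})
```

With these definitions, n·RMSEP² = (n − 1)·SEP² + n·bias², so for large n RMSEP² is close to SEP² + bias². The tabulated values follow this pattern: for Participant 1, sqrt(0.112² + 0.039²) is approximately 0.119.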

Memento

JVSPO is available online

JVSPO was run in TPT-cloud, a web-based chemometrics software package.
Both milk models are available online to registered users:
  fat: http://tptcloud.com/model/graph/369
  protein: http://tptcloud.com/model/graph/370
JVSPO and interval selection can be performed online in a few clicks: http://tptcloud.com/workspace

http://tptcloud.com

Conclusion

JVSPO is a very efficient algorithm for the optimization of multivariate models.
JVSPO typically works better than either interval selection or optimal preprocessing determined individually.
JVSPO is especially advantageous when analyzing spectroscopic data with multicollinearity.
The selected intervals and preprocessing may provide useful information for data interpretation.

Thank you for your attention!

v.galyanin@gmail.com