Definition and overview of chemometrics
Paul Geladi, Head of Research NIRCE, Chairperson NIR Nord; Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå; Technobothnia, Vasa; btk.slu.se, syh.fi
Project geography
Chemometrics: Mathematics, Statistics, Computer Science in Chemistry
Similar fields: Biometrics (±1900), Psychometrics (±1930), Econometrics (±1950), Technometrics (±1960)
Chemometrics: Design of Experiments (DOE), Exploratory Data Analysis, Classification, Regression and Calibration
Design of Experiments: Most important where possible. Uses: ANOVA, F-test, t-test, plots, response surfaces
Design of Experiments: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K + b_{11} x_1^2 + b_{22} x_2^2 + \dots + b_{KK} x_K^2 + b_{12} x_1 x_2 + \dots$ Factors $x_1, x_2, \dots, x_K$ are changed systematically; the response $y$ is measured and modeled.
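As a small illustration of the model above, the sketch below estimates the coefficients for two factors from a full factorial design at coded levels -1 and +1. The response values are hypothetical and generated without noise, and only the terms b_0, b_1, b_2 and b_12 are fitted: in a two-level design the squared terms equal 1 in every run and cannot be separated from b_0.

```python
from itertools import product

# Full factorial design in two factors at coded levels -1 and +1.
design = list(product([-1.0, 1.0], repeat=2))

# Hypothetical, noise-free responses generated from
# y = 10 + 2*x1 - 3*x2 + 0.5*x1*x2, so the estimates can be checked.
y = [10 + 2 * x1 - 3 * x2 + 0.5 * x1 * x2 for x1, x2 in design]

# Model columns: intercept, x1, x2 and the interaction x1*x2.
columns = {
    "b0": [1.0 for _ in design],
    "b1": [x1 for x1, _ in design],
    "b2": [x2 for _, x2 in design],
    "b12": [x1 * x2 for x1, x2 in design],
}

# The columns are mutually orthogonal, so each least-squares
# coefficient reduces to (column . y) / N.
n = len(design)
coef = {
    name: sum(c * yi for c, yi in zip(col, y)) / n
    for name, col in columns.items()
}
print(coef)
```

With real, noisy measurements the same calculation gives least-squares estimates rather than exact values, and a design with more than two levels per factor would be needed to estimate the squared terms.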
Exploratory Data Analysis: Design not possible. Sampling situations. Find structure, find groupings, find outliers
Classification: Check for groupings = UNSUPERVISED. Existing groupings = SUPERVISED. Visualize groupings, classify, test
Regression / Calibration: Two types of variables, X / y. Relationship linear / nonlinear. Model, diagnostics, residual
[Figure: axes x and y]
Multivariate Data Analysis
Sampled data and design with too many responses: mining, hospitals, agriculture, food industry, and more
Nomenclature: Samples are objects. What is measured on the object is a variable.
Vectors [Figure: a single number (34.92); a spectrum as a vector with elements 1 to K; samples as a vector with elements 1 to I]
A vector is a collection of numbers. It is always a column vector.
The transpose of a vector is a row vector. Symbols for transpose are ’ and T: $a'$ or $a^T$
Particle size, 1 sample
Small particles, 35 samples
The Data Matrix: A data matrix is a vector of vectors, of size I × K (I objects, K variables)
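A minimal sketch of this layout in plain Python, assuming that the I objects are stored as rows and the K variables as columns. The numbers are hypothetical.

```python
# Hypothetical data matrix: I = 3 objects (rows), K = 4 variables (columns).
X = [
    [0.12, 0.30, 0.45, 0.13],
    [0.10, 0.28, 0.50, 0.12],
    [0.15, 0.35, 0.40, 0.10],
]
I, K = len(X), len(X[0])

# One object is a row of X; one variable is a column of X.
first_object = X[0]
second_variable = [row[1] for row in X]

# The transpose swaps rows and columns, giving a K x I matrix.
X_T = [list(col) for col in zip(*X)]

print(I, K)
print(second_variable)
print(len(X_T), len(X_T[0]))
```

Each row could be one sample's size histogram or NIR spectrum, with one column per size class or wavelength.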
Size histograms, all samples Particle area
NIR wavelengths Times in batch reaction
Geometry of multivariate space
Problem: I and K can be large. Correlation. Univariate statistics does not apply
I patients 3 variables: blood oxygen, iron, hemoglobin
[Figure sequence: axes O2, Fe, Hb]
Properties of multivariate space: Rotation: vectors unchanged / distance unchanged. Translation: vectors changed / distance unchanged. Rescaling / change of units: everything changes
Consequences: We can move the coordinate system around. The relative distances between objects do not change. We can rotate the coordinate system. Scale changes are important. Move coordinate system to center of data. Scale properly
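These properties can be checked numerically. The sketch below uses two hypothetical objects measured on three variables and compares their distance before and after a translation, a rotation and a rescaling. It reads "vectors unchanged" as "the length of each position vector is unchanged", which is an interpretation of the slide. The shift, angle and scale factors are arbitrary illustrative choices.

```python
import math

def norm(p):
    """Length of the position vector p."""
    return math.sqrt(sum(pi ** 2 for pi in p))

def distance(a, b):
    """Euclidean distance between two objects."""
    return norm([ai - bi for ai, bi in zip(a, b)])

def translate(p, shift):
    return [pi + si for pi, si in zip(p, shift)]

def rotate_z(p, angle):
    """Rotate a 3-variable point about the third axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]

def rescale(p, factors):
    return [pi * fi for pi, fi in zip(p, factors)]

# Two hypothetical objects measured on three variables.
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

shift = [-2.5, -4.0, -3.0]    # moves the origin to the midpoint of a and b
angle = math.radians(30)      # arbitrary rotation angle
factors = [1000.0, 1.0, 1.0]  # first variable expressed in a smaller unit

d_original = distance(a, b)
d_translated = distance(translate(a, shift), translate(b, shift))
d_rotated = distance(rotate_z(a, angle), rotate_z(b, angle))
d_rescaled = distance(rescale(a, factors), rescale(b, factors))

# Distances: unchanged by translation and rotation, changed by rescaling.
print(d_original, d_translated, d_rotated, d_rescaled)

# Vector lengths: unchanged by rotation, changed by translation.
print(norm(a), norm(rotate_z(a, angle)), norm(translate(a, shift)))
```

After rescaling, the distance is dominated by the variable with the largest numbers, which is why the choice of scaling matters.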
Vectors (physics): $x = [x_1, x_2, x_3]$, $\|x\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$
Geometry: right triangle with sides a, b and hypotenuse c: $c^2 = a^2 + b^2$
Vectors (K dimensions): $x = [x_1, x_2, \dots, x_K]$, $\|x\| = (x_1^2 + x_2^2 + \dots + x_K^2)^{1/2}$
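The same formula applies for any number of dimensions. A minimal sketch, where `norm` is an illustrative helper name:

```python
import math

def norm(x):
    """Euclidean length of a vector with any number of elements K."""
    return math.sqrt(sum(xi ** 2 for xi in x))

print(norm([3.0, 4.0]))       # K = 2, the right-triangle case: 5.0
print(norm([1.0, 2.0, 2.0]))  # K = 3: 3.0
print(norm([1.0] * 16))       # K = 16: 4.0
```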
Problem: We cannot see in more than 3 dimensions. Paper, computer screen: 2 dimensions
[Figures: axes O2, Fe, Hb]
Projection: 2D plane (screen, paper). Many projections possible. Find a good one. Find a few good ones. What is good?
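To make "many projections possible" concrete, the sketch below projects three hypothetical objects from a 3-variable space onto different 2D planes. Each plane is defined by two orthonormal direction vectors, and the planes are arbitrary choices for illustration. The sketch does not answer which projection is good.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project(points, axis1, axis2):
    """Coordinates of each point in the plane spanned by two orthonormal axes."""
    return [(dot(p, axis1), dot(p, axis2)) for p in points]

# Three hypothetical objects measured on three variables.
points = [
    [1.0, 2.0, 3.0],
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 4.0],
]

# Projection A: look along the third axis (keep variables 1 and 2).
view_a = project(points, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

# Projection B: look along the second axis (keep variables 1 and 3).
view_b = project(points, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])

# Projection C: a plane that is not aligned with the original axes.
r = 1.0 / math.sqrt(2.0)
view_c = project(points, [r, r, 0.0], [0.0, 0.0, 1.0])

print(view_a)
print(view_b)
print(view_c)
```

The same objects look different in each view, so a criterion is needed to choose among them.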