Definition and overview of chemometrics. Paul Geladi Head of Research NIRCE Chairperson NIR Nord Unit of Biomass Technology and Chemistry Swedish University.

Presentation transcript:

Definition and overview of chemometrics

Paul Geladi, Head of Research NIRCE, Chairperson NIR Nord. Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå. Technobothnia, Vasa. btk.slu.se syh.fi

Project geography

Chemometrics: Mathematics, Statistics and Computer Science in Chemistry

Similar fields Biometrics ±1900 Psychometrics ±1930 Econometrics ±1950 Technometrics ±1960

Chemometrics Design of Experiments (DOE) Exploratory Data Analysis Classification Regression and Calibration

Design of Experiments: most important where possible. Uses: ANOVA, F-test, t-test, plots, response surfaces

Design of Experiments $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K + b_{11} x_1^2 + b_{22} x_2^2 + \dots + b_{KK} x_K^2 + b_{12} x_1 x_2 + \dots$ Factors $x_1, x_2, \dots, x_K$ changed systematically. Response $y$ measured and modeled.
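As a minimal sketch of how such a response-surface model can be fitted, the example below uses K = 2 factors on a three-level full factorial design and ordinary least squares. The design, the coefficient values and the helper names (`model_row`, `solve`, `fit`) are illustrative assumptions, not taken from the slides; the responses are generated without noise so that the fit simply recovers the coefficients.

```python
from itertools import product

def model_row(x1, x2):
    """Terms of the quadratic model for K = 2: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    X = [model_row(x1, x2) for x1, x2 in design]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical design: both factors at coded levels -1, 0, +1 (9 runs).
design = list(product([-1.0, 0.0, 1.0], repeat=2))
# Hypothetical coefficients b0, b1, b2, b11, b22, b12 used to simulate y.
true_b = [10.0, 2.0, -1.0, 0.5, 0.25, 1.5]
y = [sum(b * t for b, t in zip(true_b, model_row(x1, x2))) for x1, x2 in design]

b_hat = fit(design, y)
print([round(b, 6) for b in b_hat])
```

With real, noisy responses the same fit would give estimates rather than exact values, and the tests listed on the previous slide (ANOVA, F-test, t-test) would be used to judge which terms matter.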

Exploratory Data Analysis Design not possible Sampling situations Find structure Find groupings Find outliers

Classification. Check for groupings = UNSUPERVISED. Existing groupings = SUPERVISED. Visualize groupings, classify, test

Regression / Calibration Two types of variables X / y Relationship linear / nonlinear Model Diagnostics Residual
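A minimal sketch of the linear case with a single x variable: fit a straight line by least squares and inspect the residuals as a diagnostic. The numbers and the function name `fit_line` are made up for illustration and do not come from the slides.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical calibration data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
# Residual = measured y minus the value predicted by the model.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)
print(residuals)
```

For a least-squares line with an intercept the residuals sum to zero; a visible pattern in them (for example curvature) would suggest that the linear model is not adequate.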

[Figure: axes x and y]

Multivariate Data Analysis

Sampled data and design with too many responses: Mining Hospitals Agriculture Food industry More

Nomenclature Samples are objects What is measured on the object is a variable

[Figure: vectors. A spectrum with elements 1 to K; samples 1 to I]

A vector is a collection of numbers. It is always a column vector.

The transpose of a vector is a row vector. Symbols for transpose are ' and T: $a'$ or $a^T$

Particle size, 1 sample

Small particles, 35 samples

The Data Matrix: a data matrix is a vector of vectors, with I rows (objects) and K columns (variables)
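A small sketch of this idea, assuming the common layout in which each of the I rows is one object and each of the K columns is one variable. The values are hypothetical.

```python
# Hypothetical data matrix: I = 4 objects (rows), K = 3 variables (columns).
X = [
    [4.1, 12.0, 13.5],
    [3.8, 10.5, 12.9],
    [4.4, 13.2, 14.1],
    [4.0, 11.8, 13.0],
]

I = len(X)      # number of objects
K = len(X[0])   # number of variables

# One object is a row vector of K numbers.
first_object = X[0]

# One variable is a column of I numbers; transposing X makes columns easy to reach.
Xt = [list(col) for col in zip(*X)]
first_variable = Xt[0]

print(I, K)
print(first_object)
print(first_variable)
```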

Size histograms, all samples Particle area

NIR wavelengths Times in batch reaction

Geometry of multivariate space

Problem I and K can be large Correlation Univariate statistics does not apply

I patients 3 variables: blood oxygen, iron, hemoglobin

[Figure sequence: the patients shown in three-dimensional space with axes O2, Fe and Hb]

Properties of multivariate space. Rotation: vectors unchanged / distance unchanged. Translation: vectors changed / distance unchanged. Rescaling / change of units: all changes
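The effect on distances can be checked numerically. The sketch below takes two hypothetical points in three dimensions and compares their distance before and after a translation, a rotation about the third axis, and a change of units on the first variable. The helper names and all numbers are assumptions made for illustration.

```python
import math

def translate(p, shift):
    """Move a point by a fixed shift vector."""
    return [a + s for a, s in zip(p, shift)]

def rotate_z(p, angle):
    """Rotate a 3D point about the third axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return [c * x - s * y, s * x + c * y, z]

def rescale(p, factors):
    """Multiply each variable by its own factor (a change of units)."""
    return [a * f for a, f in zip(p, factors)]

p = [1.0, 2.0, 3.0]
q = [4.0, 6.0, 3.0]

d_original = math.dist(p, q)

shift = [10.0, -5.0, 2.0]
d_translated = math.dist(translate(p, shift), translate(q, shift))

angle = math.radians(30.0)
d_rotated = math.dist(rotate_z(p, angle), rotate_z(q, angle))

# Expressing the first variable in units 1000 times smaller.
factors = [1000.0, 1.0, 1.0]
d_rescaled = math.dist(rescale(p, factors), rescale(q, factors))

print(d_original, d_translated, d_rotated, d_rescaled)
```

Translation and rotation leave the distance unchanged, while the change of units makes the first variable dominate it completely.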

Consequences We can move the coordinate system around The relative distances between objects do not change We can rotate the coordinate system Scale changes are important Move coordinate system to center of data Scale properly
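One common way to carry out the last two steps is mean-centering followed by scaling each variable to unit standard deviation (often called autoscaling). The slides do not say which scaling is meant, so this choice, the function name `autoscale` and the data are assumptions.

```python
import statistics

def autoscale(X):
    """Center each column at zero and scale it to unit sample standard deviation."""
    columns = list(zip(*X))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

# Hypothetical data: three variables measured in very different units.
X = [
    [0.95, 120.0, 14.2],
    [0.97, 80.0, 13.1],
    [0.92, 150.0, 15.0],
    [0.98, 95.0, 12.7],
]

Z = autoscale(X)
for row in Z:
    print([round(v, 3) for v in row])
```

After this step every variable has mean 0 and standard deviation 1, so no variable dominates the distances merely because of its units.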

Vectors (physics) $x = [x_1, x_2, x_3]$, $\|x\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$

Geometry: triangle with sides a, b, c: $c^2 = a^2 + b^2$

Vectors (K dimensions) $x = [x_1, x_2, \dots, x_K]$, $\|x\| = (x_1^2 + x_2^2 + \dots + x_K^2)^{1/2}$
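The same length formula works for any number of dimensions, as this short sketch shows. The function name `norm` and the example vectors are illustrative.

```python
import math

def norm(x):
    """Euclidean length: square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for v in x))

print(norm([3.0, 4.0]))             # 2 dimensions
print(norm([1.0, 2.0, 2.0]))        # 3 dimensions
print(norm([1.0, 1.0, 1.0, 1.0]))   # 4 dimensions
```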

Problem: We cannot see in more than 3 dimensions. Paper, computer screen: 2 dimensions

[Figures: three-dimensional space with axes O2, Fe and Hb]

Projection 2D plane (screen, paper) Many projections possible Find a good one Find a few good ones What is good?
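A projection onto a plane can be sketched as taking, for every object, its coordinates along two orthonormal direction vectors. The example below compares two candidate planes for a small, already centered, hypothetical data set. Using the retained sum of squares as the measure of "good" is an assumption made here for illustration; the slide leaves the question open.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(X, p1, p2):
    """2D coordinates of each row of X in the plane spanned by orthonormal p1, p2."""
    return [(dot(x, p1), dot(x, p2)) for x in X]

def spread(points):
    """Sum of squared 2D coordinates: how much of the data's spread the plane keeps."""
    return sum(a * a + b * b for a, b in points)

# Hypothetical centered data: 4 objects, 3 variables.
X = [
    [1.0, 1.0, 0.1],
    [-1.0, -1.0, -0.1],
    [2.0, 2.0, 0.0],
    [-2.0, -2.0, 0.0],
]

r = 1.0 / math.sqrt(2.0)
plane_a = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # first two coordinate axes
plane_b = ([r, -r, 0.0], [0.0, 0.0, 1.0])      # a different orthonormal pair

spread_a = spread(project(X, *plane_a))
spread_b = spread(project(X, *plane_b))
print(spread_a, spread_b)
```

Under this criterion plane A shows almost all of the structure in the data, while plane B collapses the objects onto nearly the same point.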