BIOSYST-MeBioS (www.biw.kuleuven.be)
The Potential of Functional Data Analysis for Chemometrics
Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius

The Potential of FDA for Chemometrics
- Introduction to FDA
- Introduction to Chemometrics
- Using FDA in chemometrics
  - For prediction
  - For Analysis of Variance
- Conclusions

What is Functional Data Analysis?
- Developed by Ramsay & Silverman (1997)
- Analyse data by approximating it using some kind of functional basis
- Mainly for longitudinal data
  - High correlation between neighbouring data points

Why use FDA?
- Treat the data as a single entity instead of as individual observations
- Make a function of your data
  - Derivatives
- Reduce the amount of data
- Noise -> smoothing
- Impose some known properties on the data
  - Monotonicity, non-negativity, smoothness, ...

Basis Functions?
- Polynomials: 1, t, t², t³, ...
- Fourier: 1, sin(ωt), cos(ωt), sin(2ωt), cos(2ωt), ...
- Splines
- Wavelets
- The choice depends on your data
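As a rough illustration of the first two basis families on this slide, the sketch below builds polynomial and Fourier basis matrices with NumPy and smooths a noisy periodic signal by least squares. The function names, the synthetic signal and the choice of three harmonics are assumptions made for the example, not part of the presentation.

```python
import numpy as np

def polynomial_basis(t, n_basis):
    """Columns 1, t, t^2, ..., t^(n_basis - 1) evaluated at the points t."""
    return np.vander(t, N=n_basis, increasing=True)

def fourier_basis(t, n_pairs, omega):
    """Columns 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ... for n_pairs harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_pairs + 1):
        cols.append(np.sin(k * omega * t))
        cols.append(np.cos(k * omega * t))
    return np.column_stack(cols)

# Hypothetical data: a noisy periodic signal sampled on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t) + rng.normal(0.0, 0.05, t.size)

# Least-squares projection of the data onto the Fourier basis.
Phi = fourier_basis(t, n_pairs=3, omega=2 * np.pi)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_smooth = Phi @ coef  # smoothed version of y

print(Phi.shape, np.round(coef, 2))
```

The 200 noisy observations are summarised by 7 coefficients, which is the data reduction and smoothing effect mentioned earlier in the talk.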

Chemometrics
- Measure optical properties of material
  - Transmission or reflection of light
  - At a large number of wavelengths
- Use these properties to predict something else

Why Chemometrics?
- Fast
- Cheap
- Non-destructive
- Environment-friendly

Classical methods
- Ignore correlation between neighbouring wavelengths

FDA in chemometrics
- NIR spectra
  - Absorption peaks
  - Width and height
- Basis: B-splines
  - Similar in shape to absorption peaks
  - Preserve the vicinity constraint

Spline Functions
- Piecewise joined polynomials of order m
- Fast evaluation
- Continuity of derivatives
  - Up to order m-2
  - At the L interior knots
- Degrees of freedom: L + m
- Flexible
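The degrees-of-freedom count L + m can be checked numerically. The sketch below is one possible construction, assuming SciPy's `BSpline` class and boundary knots repeated m times; the helper name `bspline_basis` and the knot positions are illustrative choices, not taken from the slides.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, interior_knots, order, lower, upper):
    """Evaluate all B-spline basis functions of the given order at x.

    Returns an array of shape (len(x), L + order), where L is the number
    of interior knots.
    """
    degree = order - 1
    # Repeat each boundary knot `order` times so the basis spans [lower, upper].
    knots = np.concatenate([
        np.repeat(lower, order),
        interior_knots,
        np.repeat(upper, order),
    ])
    n_basis = len(knots) - order
    # With identity coefficients, column j of the result is basis function j.
    return BSpline(knots, np.eye(n_basis), degree)(x)

m = 4                                        # cubic splines have order 4
interior = np.linspace(0.0, 1.0, 7)[1:-1]    # L = 5 interior knots
x = np.linspace(0.0, 1.0, 101)
B = bspline_basis(x, interior, m, 0.0, 1.0)

print(B.shape)  # number of columns equals L + m
```

Each row of `B` sums to one, so the basis functions form a partition of unity on the interval.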

Constructing a spline basis
- Order
  - Depends on what the model will be used for
  - Mostly cubic splines (order 4)
- Number and position of knots
  - Use enough
  - Look at the data
  - Beware of overfitting

Position of knots
- More variation -> more knots

B-spline approximation
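The figure for this slide was not preserved. As a stand-in, here is a minimal sketch of a least-squares cubic B-spline approximation of a spectrum using SciPy's `make_lsq_spline`. The spectrum is synthetic (two Gaussian absorption peaks plus noise), and the wavelength range and the 40 equally spaced interior knots are assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Hypothetical NIR-like spectrum: two Gaussian peaks plus measurement noise.
rng = np.random.default_rng(1)
wavelength = np.linspace(1000.0, 2500.0, 301)  # nm, assumed range
clean = (0.8 * np.exp(-0.5 * ((wavelength - 1450.0) / 60.0) ** 2)
         + 0.5 * np.exp(-0.5 * ((wavelength - 1940.0) / 90.0) ** 2))
noisy = clean + rng.normal(0.0, 0.01, wavelength.size)

order = 4                                              # cubic spline
interior = np.linspace(1000.0, 2500.0, 42)[1:-1]       # 40 interior knots
knots = np.concatenate([
    np.repeat(wavelength[0], order),
    interior,
    np.repeat(wavelength[-1], order),
])

# Least-squares fit of the spline coefficients to the noisy spectrum.
spline = make_lsq_spline(wavelength, noisy, knots, k=order - 1)
fitted = spline(wavelength)
first_derivative = spline.derivative(1)(wavelength)    # derivative comes for free

rmse = np.sqrt(np.mean((fitted - clean) ** 2))
print(len(spline.c), rmse)
```

The 301 measured points are replaced by 44 spline coefficients, and derivatives of the spectrum can be evaluated directly from the fitted function.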

FDA for prediction
- Functional regression models
- P-Spline Regression (Marx and Eilers)
- Non-Parametric Functional Data Analysis (Ferraty and Vieu)

Functional Regression Models
- Project spectra to spline basis
- Apply Multivariate Linear Regression to the spline coefficients
- Great reduction in system complexity
- Natural shape of absorption peaks is used
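The two steps on this slide can be sketched as follows: project each spectrum onto a B-spline basis, then run an ordinary linear regression on the resulting coefficients. Everything concrete here (synthetic spectra whose peak height depends on the property of interest, 12 basis functions, a 40/20 train/test split) is an assumption for illustration and not the authors' setup.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
n_samples, n_wavelengths = 60, 200
x = np.linspace(0.0, 1.0, n_wavelengths)

# Hypothetical data: the height of one absorption peak scales with the property y.
y = rng.uniform(1.0, 5.0, n_samples)
peak = np.exp(-0.5 * ((x - 0.5) / 0.08) ** 2)
spectra = 0.1 * y[:, None] * peak[None, :] + rng.normal(0.0, 0.005, (n_samples, n_wavelengths))

# Cubic B-spline basis (order 4) with 8 interior knots: 12 basis functions.
order, n_interior = 4, 8
knots = np.concatenate([
    np.zeros(order),
    np.linspace(0.0, 1.0, n_interior + 2)[1:-1],
    np.ones(order),
])
B = BSpline(knots, np.eye(n_interior + order), order - 1)(x)   # shape (200, 12)

# Step 1: project every spectrum onto the spline basis (least squares).
coefs = np.linalg.lstsq(B, spectra.T, rcond=None)[0].T          # shape (60, 12)

# Step 2: linear regression of y on the spline coefficients, with intercept.
train, test = slice(0, 40), slice(40, None)
Z = np.column_stack([np.ones(n_samples), coefs])
beta = np.linalg.lstsq(Z[train], y[train], rcond=None)[0]
pred = Z[test] @ beta
rmsep = np.sqrt(np.mean((pred - y[test]) ** 2))

print(coefs.shape, rmsep)
```

The regression works with 12 predictors instead of 200 wavelengths, which is the reduction in system complexity the slide refers to.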

Functional Regression Models: case study
- 420 samples of hog manure
- Reflectance spectra
- Total nitrogen (TN) and dry matter (DM) content
- PLS and Functional Regression applied

Functional Regression: case study (cont'd)

Functional Regression: case study: results

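P-Spline Regression (PSR)
- By Marx and Eilers
- Construct the regression coefficients with B-splines [formula not preserved]
- Use a roughness parameter λ on the B-spline coefficients
- Minimize a penalised criterion [formula not preserved]
- Full spectra are used for regression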
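The formulas on this slide were lost, so the following is only a sketch of penalised signal regression as it is commonly described: the regression coefficient vector is written as β = Bα, and α minimises ‖y − XBα‖² + λ‖Dα‖², where D takes differences of neighbouring coefficients. The clamped knot vector, the second-order difference penalty, the centring used for the intercept and the synthetic data are assumptions of this example and may differ from what Marx and Eilers, or the authors of the talk, actually used.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(n_points, n_basis, order=4):
    """B-spline basis on [0, 1] with equally spaced knots, shape (n_points, n_basis)."""
    n_interior = n_basis - order
    knots = np.concatenate([
        np.zeros(order),
        np.linspace(0.0, 1.0, n_interior + 2)[1:-1],
        np.ones(order),
    ])
    grid = np.linspace(0.0, 1.0, n_points)
    return BSpline(knots, np.eye(n_basis), order - 1)(grid)

def psr_fit(X, y, B, lam, diff_order=2):
    """Fit beta = B @ alpha by penalised least squares; returns (intercept, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U = (X - x_mean) @ B                                    # full spectra enter here
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)   # difference penalty matrix
    alpha = np.linalg.solve(U.T @ U + lam * D.T @ D, U.T @ (y - y_mean))
    beta = B @ alpha
    return y_mean - x_mean @ beta, beta

# Hypothetical spectra: random walks, so neighbouring wavelengths are correlated.
rng = np.random.default_rng(3)
n_samples, n_wavelengths = 90, 150
X = rng.normal(size=(n_samples, n_wavelengths)).cumsum(axis=1) / np.sqrt(n_wavelengths)
B = bspline_basis(n_wavelengths, n_basis=7)
beta_true = B @ np.array([0.0, 0.02, 0.05, 0.08, 0.05, 0.02, 0.0])
y = 2.0 + X @ beta_true + rng.normal(0.0, 0.05, n_samples)

train, test = slice(0, 60), slice(60, None)
intercept, beta = psr_fit(X[train], y[train], B, lam=0.001)
rmsep = np.sqrt(np.mean((intercept + X[test] @ beta - y[test]) ** 2))
print(beta.shape, rmsep)
```

Unlike the two-step functional regression, the spectra themselves are not smoothed here; only the coefficient vector β is forced to be smooth.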

P-Spline Regression: case study
- 121 samples of seed pills
- y is % humidity
- PLS: RMSEP = 1.19
- PSR: RMSEP = 1.115
  - Number of B-spline coefficients = 7
  - λ = 0.001

Non-Parametric Functional Data Analysis
- By F. Ferraty and P. Vieu
- No regression model is involved
- Prediction by applying local kernel functions in function space
- No good results so far
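To make "local kernel functions in function space" concrete, the sketch below implements a Nadaraya-Watson-type estimator in which the weights depend on the distance between whole curves. The discretised L2 distance, the quadratic kernel on [0, 1], the nearest-neighbour fallback and the synthetic curves are all assumptions of this example; Ferraty and Vieu discuss several semi-metrics and kernels, and the slides do not say which were tried.

```python
import numpy as np

def functional_kernel_predict(X_train, y_train, X_new, bandwidth):
    """Kernel-weighted average of y_train, weighted by distances between curves."""
    # Pairwise root-mean-square distance between curves (equal grid spacing assumed).
    diff = X_new[:, None, :] - X_train[None, :, :]
    dist = np.sqrt(np.mean(diff ** 2, axis=2))
    u = dist / bandwidth
    weights = np.where(u <= 1.0, 1.0 - u ** 2, 0.0)   # quadratic kernel on [0, 1]
    total = weights.sum(axis=1)
    # Fall back to the nearest training curve when none lies within the bandwidth.
    nearest = y_train[np.argmin(dist, axis=1)]
    safe_total = np.where(total > 0, total, 1.0)
    return np.where(total > 0, (weights @ y_train) / safe_total, nearest)

# Hypothetical curves: sine waves whose amplitude determines the response.
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 50)

def make_curves(amplitudes):
    return amplitudes[:, None] * np.sin(2 * np.pi * t)[None, :]

a_train = rng.uniform(0.0, 2.0, 150)
a_test = rng.uniform(0.0, 2.0, 30)
X_train = make_curves(a_train) + rng.normal(0.0, 0.01, (150, t.size))
y_train = a_train ** 2
X_test = make_curves(a_test)

y_pred = functional_kernel_predict(X_train, y_train, X_test, bandwidth=0.1)
rmse = np.sqrt(np.mean((y_pred - a_test ** 2) ** 2))
print(rmse)
```

No regression coefficients are estimated; the prediction for a new curve is simply a weighted average of the responses of nearby training curves.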

FDA in an ANOVA setting: FANOVA
- ANOVA: "Study the relation between a response variable and one or more explanatory variables"
- Model terms [formula and symbols not preserved]:
  - the overall mean
  - the effects of belonging to a group g
  - the residuals

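FANOVA: theory
- Constraint: [formula not preserved]
- Introduce [formula not preserved] so that [formula not preserved]
- Introduce functional aspect: [formula not preserved]
- Constraint: [formula not preserved]; introduce [formula not preserved]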

FANOVA: goal and solution
- Goal: estimate [symbol not preserved] from [symbol not preserved]
- Solution: [formula not preserved]
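The formulas of the FANOVA slides did not survive in this transcript. As a hedged reconstruction, a one-way functional ANOVA is often written in the textbook form below. The symbols (y, μ, α, ε, Z, β, G) are assumptions and may not match the authors' notation.

```latex
\begin{align*}
\text{ANOVA:} \quad
  & y_{ig} = \mu + \alpha_g + \varepsilon_{ig},
  \qquad \sum_{g=1}^{G} \alpha_g = 0 \\
\text{FANOVA:} \quad
  & y_{ig}(t) = \mu(t) + \alpha_g(t) + \varepsilon_{ig}(t),
  \qquad \sum_{g=1}^{G} \alpha_g(t) = 0 \ \text{ for all } t \\
\text{Matrix form:} \quad
  & \mathbf{y}(t) = \mathbf{Z}\,\boldsymbol{\beta}(t) + \boldsymbol{\varepsilon}(t),
  \qquad \boldsymbol{\beta}(t) = \bigl(\mu(t), \alpha_1(t), \dots, \alpha_G(t)\bigr)^{\top} \\
\text{Least squares:} \quad
  & \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}
    \int \bigl\lVert \mathbf{y}(t) - \mathbf{Z}\,\boldsymbol{\beta}(t) \bigr\rVert^{2} \, dt
    \quad \text{subject to the constraint above}
\end{align*}
```

Here i indexes the observations within group g, and Z is the design matrix of zeros and ones that assigns each observed curve to the overall mean and to its group effect.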

FANOVA: significance testing
- Locally: [formula not preserved]
- Globally: [formula not preserved]
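Since the test statistics on this slide were not preserved, the sketch below shows one common way to do both kinds of testing and should be read as an assumption, not as the authors' method: locally, a classical one-way F statistic at every wavelength; globally, a permutation test on the maximum of that F curve. The synthetic data contain three groups, one of which has an extra bump around t = 0.5.

```python
import numpy as np

def pointwise_f(curves, groups):
    """One-way ANOVA F statistic computed separately at every wavelength."""
    labels = np.unique(groups)
    n_obs, n_groups = curves.shape[0], labels.size
    grand_mean = curves.mean(axis=0)
    ss_between = np.zeros(curves.shape[1])
    ss_within = np.zeros(curves.shape[1])
    for g in labels:
        block = curves[groups == g]
        group_mean = block.mean(axis=0)
        ss_between += block.shape[0] * (group_mean - grand_mean) ** 2
        ss_within += ((block - group_mean) ** 2).sum(axis=0)
    return (ss_between / (n_groups - 1)) / (ss_within / (n_obs - n_groups))

def permutation_p_value(curves, groups, n_perm=200, seed=0):
    """Global test: compare max_t F(t) with its distribution under label shuffling."""
    rng = np.random.default_rng(seed)
    observed = pointwise_f(curves, groups).max()
    exceed = 0
    for _ in range(n_perm):
        if pointwise_f(curves, rng.permutation(groups)).max() >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: 3 groups of 15 curves; group 2 has a bump near t = 0.5.
rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 50)
groups = np.repeat([0, 1, 2], 15)
bump = 0.5 * np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)
curves = rng.normal(0.0, 0.1, (45, t.size))
curves[groups == 2] += bump

f_curve = pointwise_f(curves, groups)
p_global = permutation_p_value(curves, groups)
print(t[np.argmax(f_curve)], p_global)
```

The F curve shows where along the spectrum the groups differ, while the permutation p-value answers whether they differ anywhere at all.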

FANOVA: case study
- Spectra of manure
- 4 types of animals: dairy, beef, calf, hog
- 3 ambient temperatures: 4°C, 12°C, 20°C
- 3 sample temperatures: 4°C, 12°C, 20°C
- 9 replicates => 324 samples
- Model: [formula not preserved]
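The sample count follows from the full factorial design: 4 × 3 × 3 × 9 = 324. A small sketch that enumerates the design, using only the factor levels listed on the slide (the variable names are illustrative):

```python
from itertools import product

animals = ["dairy", "beef", "calf", "hog"]
ambient_temperatures = [4, 12, 20]   # degrees Celsius
sample_temperatures = [4, 12, 20]    # degrees Celsius
replicates = range(1, 10)            # 9 replicates per combination

# Every combination of animal type, ambient temperature, sample temperature, replicate.
design = list(product(animals, ambient_temperatures, sample_temperatures, replicates))

print(len(design))  # 4 * 3 * 3 * 9 = 324
```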

FANOVA: case study (cont'd)

FANOVA: case study (cont'd)

Conclusions
- Splines are a good basis for fitting spectral data
- Using FDA, it is possible to include the vicinity constraint in prediction models in chemometrics
- FANOVA is a good tool to explore the variance in spectral data