From Data to Differential Equations Jim Ramsay McGill University With inspirations from Paul Speckman and Chong Gu.


The themes Differential equations are powerful tools for modeling data. We have new methods for estimating differential equations directly from data. Some examples are offered, drawn from chemical engineering and medicine.

Differential Equations as Models DIFEs make explicit the relation between one or more derivatives and the function itself. An example is the harmonic motion equation:
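The equation on this slide did not survive transcription; in its standard form (symbols assumed, not verbatim from the slide), harmonic motion links the second derivative to the function itself:

```latex
D^2 x(t) = -\beta \, x(t), \qquad \beta > 0
```

Its solutions are sinusoids of frequency √β.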

Why Differential Equations? The behavior of a derivative is often of more interest than the function itself, especially over short and medium time periods. How rapidly a system responds, rather than its level of response, is often what matters. Velocity and acceleration can reflect energy exchange within a system; recall equations like f = ma and e = mc².

Natural scientists often provide theory to biologists and engineers in the form of DIFEs. Many fields, such as pharmacokinetics and industrial process control, routinely use DIFEs as models, especially for input/output systems and for systems with two or more functional variables mutually influencing each other. DIFEs also arise when feedback systems must be developed to control the behavior of systems.

The solution to an mth order linear DIFE is an m-dimensional function space, and thus the equation can model variation over replications as well as average behavior. A DIFE requires that derivatives behave smoothly, since they are linked to the function itself. Nonlinear DIFEs can provide compact and elegant models for systems exhibiting exceedingly complex behavior.

The Rössler Equations This nearly linear system exhibits chaotic behavior that would be virtually impossible to model without using a DIFE:
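The system itself was shown as an image; the standard Rössler equations, with parameters a, b and c, are:

```latex
\begin{aligned}
Dx &= -y - z,\\
Dy &= x + a y,\\
Dz &= b + z(x - c).
\end{aligned}
```

The single product term zx is the only nonlinearity, which is why the system is described as nearly linear.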

Stochastic DIFEs We can introduce randomness into DIFEs in many ways: random coefficient functions; random forcing functions; random initial, boundary, and other constraints; and time unfolding at a random rate.

Deliverables If we can model data on functions or functional input/output systems, we will have a modeling tool that greatly extends the power and scope of existing nonparametric curve-fitting techniques. We may also get better estimates of functional parameters and their derivatives.

A simple input/output system We begin by looking at a first order DIFE for a single output function x(t) and a single input function u(t) (SISO). But our goal is the linking of multiple inputs to multiple outputs (MIMO) by linear or nonlinear systems of arbitrary order m.
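The equation referred to here, judging from the homogeneous part quoted on the next slide, is presumably the first order linear equation

```latex
Dx(t) = -\beta(t)\,x(t) + \alpha(t)\,u(t).
```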

u(t) is often called the forcing function, and is an exogenous functional independent variable. Dx(t) = -β(t)x(t) is called the homogeneous part of the equation. α(t) and β(t) are the coefficient functions that define the DIFE. The system is linear in these coefficient functions, and in the input u(t) and output x(t).

In this simple case, an analytic solution is possible. For most DIFEs, however, the solution must be found by numerical methods.
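As a sketch (notation assumed), the analytic solution of the first order equation is the standard variation-of-constants formula:

```latex
x(t) = e^{-B(t)}\Bigl[x_0 + \int_0^t e^{B(s)}\,\alpha(s)\,u(s)\,ds\Bigr],
\qquad B(t) = \int_0^t \beta(s)\,ds.
```

Differentiating confirms Dx(t) = -β(t)x(t) + α(t)u(t).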

A simpler constant coefficient example We can see more clearly what happens when the coefficients α and β are constants, α = 1, x₀ = 0, and u(t) is a step function, stepping from 0 to 1 at time t₁:

The constant α/β is the gain in the system. The constant β controls the responsivity of the system to a change in input.
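A minimal numerical sketch (hypothetical values, and plain Euler integration rather than any method from the talk) illustrates both claims: the step response settles at the gain α/β, and β sets how quickly it gets there.

```python
def step_response(alpha, beta, t1, t_end, dt=1e-3):
    """Euler-integrate Dx(t) = -beta*x(t) + alpha*u(t) with x(0) = 0
    and u(t) a unit step at time t1; return the value x(t_end)."""
    x, t = 0.0, 0.0
    while t < t_end:
        u = 1.0 if t >= t1 else 0.0   # step input
        x += dt * (-beta * x + alpha * u)
        t += dt
    return x

# With alpha = 1 and beta = 4, the system should settle at the
# gain alpha/beta = 0.25 within a few time constants 1/beta.
x_final = step_response(alpha=1.0, beta=4.0, t1=1.0, t_end=10.0)
```

Doubling α doubles the settling level, while doubling β halves it and makes the approach faster.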

A Real Example: Lupus treatment Lupus is an incurable auto-immune disease that mainly afflicts women. It flares unpredictably, inflicting wide damage with severe symptoms. The treatment is prednisone, an immune system suppressant used in transplants. But prednisone has serious short- and long-term side effects, and exposure to it must be controlled.

How to Estimate a Differential Equation from Raw Data A previous method, principal differential analysis, first smoothed the data to get functions x(t) and u(t), and then estimated the coefficient functions defining the DIFE. This two-stage procedure is inelegant and probably inefficient; going directly from data to DIFE would be better.

Profile Least Squares The idea is to replace the function fitting the raw data, x(t), by the equations defining the fit to the data conditional on the DIFE. Then we optimize the fit with respect to only the unknown parameters defining the DIFE itself. The fit x(t) is defined as a by-product of the process, but does not itself require additional parameters.

This profiling process is often used in nonlinear least squares problems where some parameters are easily solved for given other parameters. There we express the conditional estimates of these easy-to-estimate parameters as functions of the unknown hard-to-estimate parameters, and optimize only with respect to the hard parameters. This saves both computational time and degrees of freedom. An alternative strategy is to integrate over the easy parameters and optimize with respect to the hard ones, as in the EM algorithm.

The DIFE as a linear differential operator We can re-express the first order DIFE as a linear differential operator; more compactly, suppressing "(t)", we write the operator as Lαβ, making its dependency on α and β explicit.
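The operator itself was lost in transcription; consistent with the first order equation, it is presumably

```latex
L_{\alpha\beta}\, x = Dx + \beta x - \alpha u,
```

so that Lαβ x = 0 recovers Dx = -βx + αu.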

Smoothing data with the operator L If we know the differential equation, then the differential operator L defines a data smoother (Heckman and Ramsay, 2000). The larger λ is, the more the fitting function x(t) is forced to be a solution of the differential equation Lαβ x(t) = 0.
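The criterion shown on the slide is presumably the usual penalized least squares criterion (notation assumed):

```latex
\mathrm{SSE}_\lambda(x) = \sum_{i=1}^{N}\bigl[y_i - x(t_i)\bigr]^2
  + \lambda \int \bigl[L_{\alpha\beta}\, x(t)\bigr]^2\, dt.
```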

Let x(t) be expanded in terms of a set of K basis functions φk(t); let the N by K matrix Z contain the values of these basis functions at the time points ti; and let y be the vector of raw data.

Then the smooth values have the expression Zc, where c is the vector of coefficients. These coefficients are easy parameters to estimate given the operator Lαβ, so we remove the parameter vector c by replacing it with its closed-form expression.
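That closed-form expression, assuming the usual penalized least squares algebra for the unforced case (a forcing term adds a known vector on the right-hand side), is plausibly

```latex
\hat{c} = \bigl(Z'Z + \lambda R\bigr)^{-1} Z'y,
\qquad
R_{k\ell} = \int \bigl[L_{\alpha\beta}\,\phi_k(t)\bigr]\,\bigl[L_{\alpha\beta}\,\phi_\ell(t)\bigr]\,dt.
```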

How to estimate L L is a function of the weight coefficients α(t) and β(t). If these have basis function expansions, then we can optimize the profiled error sum of squares SSE(a,b) with respect to the coefficient vectors a and b.
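The profiling idea can be sketched on a toy problem (all values hypothetical, and a single exponential standing in for the basis expansions): for data y ≈ c·e^(-βt), the "easy" parameter c has a closed form given the "hard" parameter β, so the outer optimization is over β alone.

```python
import math

# Toy data generated from x(t) = 2 * exp(-1.5 t): the amplitude c is
# "easy" (linear given beta), the rate beta is "hard" (nonlinear).
ts = [i / 20 for i in range(21)]
ys = [2.0 * math.exp(-1.5 * t) for t in ts]

def c_hat(beta):
    """Closed-form least squares estimate of c conditional on beta."""
    num = sum(y * math.exp(-beta * t) for t, y in zip(ts, ys))
    den = sum(math.exp(-2.0 * beta * t) for t in ts)
    return num / den

def profiled_sse(beta):
    """Error sum of squares with c profiled out."""
    c = c_hat(beta)
    return sum((y - c * math.exp(-beta * t)) ** 2 for t, y in zip(ts, ys))

# Optimize only over the hard parameter, here by a simple grid search.
beta_best = min((b / 100 for b in range(1, 301)), key=profiled_sse)
```

The same structure carries over to the DIFE setting: c plays the role of the basis coefficients of x(t), and β plays the role of the coefficient vectors a and b.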

It is also a simple matter to: constrain some coefficient functions to be zero or a constant; or force some coefficient functions to be smooth, employing specific linear differential operators to smooth them towards specific target spaces. We do this by appending penalties to SSE(a,b), such as λβ ∫ [Mβ(t)]² dt, where M is a linear differential operator for penalizing the roughness of β.

And more … This approach is easily generalizable to: DIFEs and differential operators of any order; multiple inputs uj(t) and outputs xi(t); replicated functional data; and nonlinear DIFEs and operators.

Adaptive smoothing We can also use this approach to have the level of smoothing vary. We modify the differential operator so that an exponent function κ(t) plays the role of a log λ that varies with t.
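A plausible form of the modification, with the time-varying weight folded into the penalty:

```latex
\mathrm{SSE}_\kappa(x) = \sum_{i}\bigl[y_i - x(t_i)\bigr]^2
  + \int e^{\kappa(t)}\,\bigl[L_{\alpha\beta}\, x(t)\bigr]^2\, dt,
```

so that e^κ(t) acts as a smoothing parameter that varies with t.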

Choosing the smoothing parameter λ is always a delicate matter. The right value of λ will be rather large if the data can be well-modeled by a low-order DIFE, but it should not be so large as to smooth away additional functional variation that may be important. Estimating λ by generalized cross-validation seems to work reasonably well, at least for providing a tentative value to be explored further.

A First Example The first example simulates replicated data where the true curves are a set of tilted sinusoids. The operator L is of order 4 with constant coefficients. How precisely can we estimate these coefficients? How accurately can we estimate the curves and their first two derivatives?

For replications i = 1,…,N and time values j = 1,…,n, the curves are tilted sinusoids whose coefficients cik and errors εij are N(0,1), with t = 0(0.01)1. The functional variation satisfies a differential equation with β0(t) = β1(t) = β3(t) = 0 and β2(t) = (6π)² ≈ 355.3.
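A reconstruction consistent with the surrounding text (tilted sinusoids, order 4, β2 = (6π)²) is:

```latex
x_{ij} = c_{i1} + c_{i2} t_j + c_{i3}\sin(6\pi t_j) + c_{i4}\cos(6\pi t_j) + \varepsilon_{ij},
\qquad
D^4 x(t) + (6\pi)^2 D^2 x(t) = 0,
```

since each of 1, t, sin(6πt) and cos(6πt) is annihilated by the operator D⁴ + (6π)²D².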

For simulated data with N = 20 replications and constant bases for β0(t),…,β3(t): with L = D⁴ fixed, the RIMSEs for derivatives 0, 1 and 2 at the best λ are 0.32, 9.3 and 315.6, respectively; with L estimated, the best results are for λ = 10⁻⁵, and the RIMSEs are 0.18, 2.8 and 49.3, respectively, giving precision ratios of 1.8, 3.3 and 6.4. β2 was estimated close to its true value of (6π)², and β3 was estimated as 0.1 against a true value of 0.0.

In addition to better curve estimates and much better derivative estimates, note that the derivative RMSEs do not go wild at the end points, which is usually a serious problem with polynomial spline smoothing. This is because the DIFE ties the derivatives to the function values, and the function values are tamed at the end points by the data.

A decaying harmonic with a forcing function Data from a second order equation defining harmonic behavior with decay, forced by a step function, are generated with β0 = 4.04, β1 = 0.4, and a constant forcing coefficient α; u(t) = 0 for t < 2π and u(t) = 1 for t ≥ 2π; and added noise with standard deviation 0.2.
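In operator form (signs assumed), the generating equation is presumably

```latex
D^2 x(t) + \beta_1\, D x(t) + \beta_0\, x(t) = \alpha\, u(t),
\qquad \beta_0 = 4.04,\ \beta_1 = 0.4,
```

whose homogeneous roots -0.2 ± 2i indeed give a decaying oscillation.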

With only one replication, using minimum generalized cross-validation to choose λ, the results estimated over 100 trials are:

Parameter   True Value   Mean Estimate   Std. Error
β0          4.04         —               —
β1          0.4          —               —
α           —            —               —

An oil refinery example The single input is "reflux flow" and the output is the "tray 47" level in a distillation column. There were 194 sampling points. 30 B-spline basis functions were used to fit the output, and a step function was used to model the input.

After some experimentation with first and second order models, and with constant and varying coefficient models, the clear conclusion seems to be the constant coefficient model. The standard errors of β and α in this model, as estimated by parametric bootstrapping, were essentially the same as those given by the delta method.

Monotone smoothing Some constrained functions can be expressed as DIFEs. A smooth, strictly monotone function can be expressed as the solution of a second order DIFE:
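The DIFE referred to here is, in Ramsay's usual formulation (reconstructed, not verbatim from the slide),

```latex
D^2 x(t) = \beta_1(t)\, D x(t),
```

whose solutions have Dx(t) = C exp ∫ β1, which keeps a constant sign, so x is strictly monotone.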

We can monotonically smooth data by estimating the second order DIFE directly. We constrain β0(t) = 0, and give β1(t) enough flexibility to smooth the data. In the following artificial example, the smoothing parameter was chosen by generalized cross-validation, and β1(t) was expanded in terms of 13 B-splines.

Analyzing the Lupus data The weight function β(t) defining an order 1 DIFE for symptoms was estimated with and without prednisone dose as a forcing function. The weight was expanded using B-splines with knots at every observation time. The weight α(t) for prednisone is constant.

The forced DIFE for lupus

The data fit

Adding the forcing function halved the LS fitting criterion being minimized. We see that the fit improves where the dose is used to control the symptoms, but not where it is not used. These results are only suggestive, and much more needs to be done. We want to model treatment and symptom as mutually influencing each other; this requires a system of two differential equations.

Summary We can estimate differential equations directly from noisy data with little bias and good precision. This gives us a lot more modeling power, especially for fitting input/output functional data. Estimates of derivatives can be much better, relative to smoothing methods. Functions with special properties, such as monotonicity, can be fit by estimating the DIFE that defines them.