1
An Introduction to Functional Data Analysis
Jim Ramsay
McGill University
2
What is functional data analysis?
A data analysis is functional if either:
- The data to be analyzed are assumed to come from a smooth function (e.g., of time, space, space/time, ability, depression, frequency, or molecular weight).
- The data are to be modeled by a smooth function (e.g., a probability density, dose-response function, or intensity function).
3
So what’s new about that?
Two ideas are central:
- The functions are smooth, usually meaning that one or more derivatives can be estimated and are useful.
- No assumptions, such as stationarity, low dimensionality, or equally spaced sampling points, are made about the functions or the data.
4
Heights of 10 girls
5
Important features of growth data
- Times of measurements are not equally spaced.
- Growth is smooth; we want to study the second derivative of these curves.
- No assumptions are made initially about the shapes of these curves or their derivatives.
- A girl’s entire height record is viewed as a single unitary observation.
6
What new challenges are there?
Have a look at the estimated acceleration curves for these ten girls.
- Growth curves should be monotonic; how can we achieve this?
- How can we get a good estimate of acceleration?
- What’s wrong with the mean acceleration curve?
7
Acceleration curves for 10 girls
8
Phase and amplitude variation
We see that acceleration curves vary in terms of:
- The intensity of the pubertal growth spurt,
- and its timing.
There is both amplitude and phase variation here.
Unless we can remove the phase variation, the cross-sectional mean is worthless.
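To see why phase variation spoils the cross-sectional mean, here is a small sketch. The Gaussian bumps, the ages, and the function name `bump` are illustrative assumptions, not the growth data from the slides. Every curve has the same peak height, yet the pointwise mean peaks at about half that height because the spurts occur at different times.

```python
import math

def bump(t, center, height, width=1.0):
    """Gaussian-shaped bump standing in for a growth spurt (hypothetical shape)."""
    return height * math.exp(-0.5 * ((t - center) / width) ** 2)

# Five made-up curves: identical intensity, different timing (phase variation only).
times = [8.0 + 0.1 * i for i in range(101)]      # ages 8 to 18
centers = [10.0, 11.0, 12.0, 13.0, 14.0]
curves = [[bump(t, c, 1.0) for t in times] for c in centers]

# Cross-sectional mean: average the curves at each time point.
mean_curve = [sum(column) / len(column) for column in zip(*curves)]

peak_individual = max(max(curve) for curve in curves)
peak_mean = max(mean_curve)
print(f"peak of each curve: {peak_individual:.2f}")
print(f"peak of mean curve: {peak_mean:.2f}")
```

The mean curve is lower and wider than any single curve, so it resembles none of the individuals it is supposed to summarize.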
9
Do standard data analyses have functional counterparts?
They surely do. There are functional versions of:
- Analysis of variance
- Multiple regression analysis
- Principal components analysis
- Canonical correlation analysis
- Cluster and classification analysis
10
Is there anything new in FDA?
- Because the functions we estimate are assumed smooth, we can model the dynamic behavior of the data.
- This means using differential equations to model how the output of an input/output system changes in response to changes in the input.
11
Data from an oil refinery
12
- The data are from a tray in a distillation column.
- The output is the top plot; the input is the bottom plot.
- The solid line is a model using a simple first order differential equation:
  Dx(t) = -βx(t) + αu(t)
  where x(t) is the output function and u(t) is the input function.
- How can we estimate such models from noisy observed data?
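One simple way to approach the estimation question is to regress an estimate of Dx(t) on x(t) and u(t) by least squares. The sketch below does this on noise-free synthetic data. The parameter values, the sine-wave input, and the use of central differences are all assumptions made for illustration; this is not the refinery data and not necessarily the method used for the plotted fit. With noisy observations, x(t) would first have to be smoothed before differentiating.

```python
import math

# Hypothetical "true" parameters, used only to generate synthetic data.
beta_true, alpha_true = 0.5, 2.0

def u(t):
    """Input function; a sine wave is an arbitrary illustrative choice."""
    return math.sin(t)

def x(t):
    """Exact solution of Dx = -beta*x + alpha*u with x(0) = 0 for the u above."""
    a = alpha_true * beta_true / (1.0 + beta_true ** 2)
    b = -alpha_true / (1.0 + beta_true ** 2)
    return -b * math.exp(-beta_true * t) + a * math.sin(t) + b * math.cos(t)

h = 0.01
times = [i * h for i in range(1, 2000)]

# Central differences stand in for the derivative of a smoothed curve.
dx = [(x(t + h) - x(t - h)) / (2.0 * h) for t in times]
xs = [x(t) for t in times]
us = [u(t) for t in times]

# Least squares for Dx = beta * (-x) + alpha * u via the 2x2 normal equations.
s11 = sum(v * v for v in xs)
s12 = -sum(v * w for v, w in zip(xs, us))
s22 = sum(w * w for w in us)
b1 = -sum(v * d for v, d in zip(xs, dx))
b2 = sum(w * d for w, d in zip(us, dx))
det = s11 * s22 - s12 * s12
beta_hat = (b1 * s22 - s12 * b2) / det
alpha_hat = (s11 * b2 - s12 * b1) / det

print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.3f}")
```

Because the synthetic data contain no noise, the estimates land very close to the values used to generate them; the interesting difficulties begin when the derivative has to be estimated from noisy samples.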
13
Where do we start?
- The first task is to learn methods for estimating smooth functions from discrete noisy data.
- We use basis function expansions to model functions.
- We impose smoothness using roughness penalties.
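A minimal sketch of these two ideas together, under assumptions of my own choosing: a Fourier basis with five harmonics, a penalty on the integrated squared second derivative, simulated data at unequally spaced times, and a smoothing parameter picked by hand. The coefficients c minimize the sum of squared errors plus lam * c'Rc, which leads to the linear system (Φ'Φ + lam R) c = Φ'y. All function names here are hypothetical helpers, not part of any FDA software.

```python
import math
import random

N_HARMONICS, PERIOD = 5, 1.0
OMEGA = 2.0 * math.pi / PERIOD

def fourier_basis(t):
    """Evaluate 1, sin(k w t), cos(k w t) for k = 1..N_HARMONICS at time t."""
    row = [1.0]
    for k in range(1, N_HARMONICS + 1):
        row += [math.sin(k * OMEGA * t), math.cos(k * OMEGA * t)]
    return row

def penalty_diagonal():
    """Diagonal of R, where c'Rc is the integral of (D^2 x)^2 over one period."""
    diag = [0.0]
    for k in range(1, N_HARMONICS + 1):
        diag += [(k * OMEGA) ** 4 * PERIOD / 2.0] * 2
    return diag

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def smooth(times, values, lam):
    """Penalized least squares: solve (Phi'Phi + lam * R) c = Phi'y."""
    phi = [fourier_basis(t) for t in times]
    k = len(phi[0])
    diag = penalty_diagonal()
    gram = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    for i in range(k):
        gram[i][i] += lam * diag[i]
    rhs = [sum(p[i] * y for p, y in zip(phi, values)) for i in range(k)]
    return solve(gram, rhs)

def roughness(coefs):
    """Integrated squared second derivative of the fitted curve."""
    return sum(d * c * c for d, c in zip(penalty_diagonal(), coefs))

# Simulated noisy data at unequally spaced times (made up for illustration).
rng = random.Random(0)
times = sorted(rng.uniform(0.0, PERIOD) for _ in range(60))
values = [math.sin(OMEGA * t) + rng.gauss(0.0, 0.2) for t in times]

coefs_rough = smooth(times, values, lam=0.0)      # no penalty
coefs_smooth = smooth(times, values, lam=1e-4)    # hand-picked penalty

print("roughness without penalty:", round(roughness(coefs_rough), 1))
print("roughness with penalty:   ", round(roughness(coefs_smooth), 1))
```

The Fourier basis is convenient here because its penalty matrix is diagonal; for non-periodic data such as growth curves a spline basis would be the more usual choice, and the smoothing parameter would normally be chosen by a criterion such as cross-validation rather than by hand.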
14
And what’s next?
- Because most functional data show variation in both phase and amplitude, the next step is to learn how to separate phase from amplitude variation.
- This process is called curve registration.
- After that, we can use functional versions of standard multivariate data analyses.
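The simplest form of registration shifts each curve in time so that a chosen landmark, here the peak, occurs at a common time. The sketch below uses made-up bump-shaped curves and hypothetical helper names; it is only the shift special case, whereas registration in general uses smooth, strictly increasing time-warping functions. After alignment, the remaining differences between curves are in amplitude only, and the mean curve has a peak at the average intensity.

```python
import math

def bump(t, center, height):
    """Gaussian-shaped bump standing in for a growth spurt (hypothetical shape)."""
    return height * math.exp(-0.5 * (t - center) ** 2)

times = [8.0 + 0.1 * i for i in range(101)]
# (timing, intensity) pairs: both phase and amplitude vary across curves.
params = [(10.0, 0.8), (11.0, 1.0), (12.0, 1.2), (13.0, 0.9), (14.0, 1.1)]

def sample(center, height, shift=0.0):
    """Sample the curve x(t + shift) on the common time grid."""
    return [bump(t + shift, center, height) for t in times]

def peak_time(values):
    """Landmark: the grid time at which the sampled curve is largest."""
    return times[max(range(len(values)), key=values.__getitem__)]

def pointwise_mean(curves):
    return [sum(column) / len(column) for column in zip(*curves)]

raw = [sample(c, h) for c, h in params]
landmarks = [peak_time(v) for v in raw]
target = sum(landmarks) / len(landmarks)

# Registered curve: x*(t) = x(t + delta), with delta = landmark - target,
# so that every registered curve peaks at the target time.
registered = [sample(c, h, shift=lm - target)
              for (c, h), lm in zip(params, landmarks)]

mean_raw = pointwise_mean(raw)
mean_registered = pointwise_mean(registered)
print(f"peak of unregistered mean: {max(mean_raw):.2f}")
print(f"peak of registered mean:   {max(mean_registered):.2f}")
```

The shifts themselves (landmark minus target) carry the phase variation and can be analyzed separately from the registered curves.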
15
What about functional exploratory data analysis?
- As always, graphical display methods are indispensable.
- We will focus on the phase-plane plot as a way of exploring the interplay between derivatives.
- Principal components and cluster analyses are also useful.
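A phase-plane plot typically puts one derivative on each axis, for example velocity against acceleration. As a small sketch, assuming a pure sine wave as the curve and finite differences for the derivatives (both illustrative choices), the pairs (Dx, D²x) trace out an ellipse; departures from that shape in real data show where the dynamics differ from simple harmonic motion.

```python
import math

omega = 2.0 * math.pi          # one cycle per unit time (illustrative)
h = 1e-3
times = [i * h for i in range(1, 1000)]

def x(t):
    return math.sin(omega * t)

# Finite-difference estimates of the first and second derivatives.
velocity = [(x(t + h) - x(t - h)) / (2.0 * h) for t in times]
acceleration = [(x(t + h) - 2.0 * x(t) + x(t - h)) / h ** 2 for t in times]

# Points of the phase-plane plot: velocity on one axis, acceleration on the other.
phase_plane = list(zip(velocity, acceleration))

# For a sine wave the points satisfy (v / omega)^2 + (a / omega^2)^2 = 1.
radius_sq = [(v / omega) ** 2 + (a / omega ** 2) ** 2 for v, a in phase_plane]
print(min(radius_sq), max(radius_sq))
```

Passing `phase_plane` to any plotting tool gives the display; the plotting call is omitted so the sketch needs only the standard library.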
16
How do we use covariate information?
- Covariates or independent variables can be (a) multivariate or (b) functional.
- Regression analysis with a functional response and multivariate covariates is fairly straightforward.
- Regressing on functional covariates leads to new challenges, however.
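One reason the functional-response case is straightforward is that, once the curves are evaluated on a common grid, ordinary least squares can be applied separately at each time point, giving a regression coefficient function β(t). The sketch below assumes a single scalar covariate and noise-free made-up curves of the form x_i(t) = μ(t) + z_i β(t); the names `mu`, `beta`, and `covariate` are hypothetical.

```python
import math

times = [i / 20 for i in range(21)]
covariate = [-1.0, 0.0, 0.5, 2.0]      # hypothetical scalar covariate z_i

def mu(t):
    """Made-up intercept function."""
    return 1.0 + t

def beta(t):
    """Made-up regression coefficient function to be recovered."""
    return math.sin(math.pi * t)

# Functional responses sampled on a common grid: x_i(t) = mu(t) + z_i * beta(t).
curves = [[mu(t) + z * beta(t) for t in times] for z in covariate]

z_bar = sum(covariate) / len(covariate)
szz = sum((z - z_bar) ** 2 for z in covariate)

# Simple linear regression of x_i(t) on z_i, repeated at every time point.
beta_hat = []
for j in range(len(times)):
    column = [curve[j] for curve in curves]
    x_bar = sum(column) / len(column)
    sxz = sum((z - z_bar) * (xv - x_bar) for z, xv in zip(covariate, column))
    beta_hat.append(sxz / szz)

max_error = max(abs(bh - beta(t)) for bh, t in zip(beta_hat, times))
print(f"largest error in beta_hat: {max_error:.2e}")
```

With noisy curves the pointwise estimates would themselves be rough, which is where basis expansions and roughness penalties on β(t) come back in.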
17
Can derivatives be used, too?
- Every function, whether directly fit to data, or estimated from non-functional data, is assumed to have one or more derivatives available for an analysis.
- A differential equation is a model that contains one or more derivatives as a part of the model.
18
Unique Aspects of Functional Data Analysis
- The data are from a smooth process, so we can use derivatives in various ways.
- Time itself may be an elastic medium, and vary over functional observations.
- Differential equations can play a big role in a functional data analysis.
19
Finding out More
- Ramsay, J. O. and Silverman, B. W. (1997, 2004) Functional Data Analysis. Springer.
- Ramsay, J. O. and Silverman, B. W. (2002) Applied Functional Data Analysis. Springer.
- Visit the FDA website: www.psych.mcgill.ca/misc/fda/
- Software in Matlab, R and S-PLUS available at ego.psych.mcgill.ca/pub/ramsay