1
Basis Expansions and Generalized Additive Models
Outline: basis expansion; piecewise polynomials; splines; generalized additive models; MARS
2
Basis expansion f(X) = E(Y|X) is often nonlinear and non-additive in X. However, linear models are easy to fit and interpret. By augmenting the inputs with transformed variables h_m(X), we can use linear models to achieve nonlinear regression/classification: f(X) = Σ_{m=1}^M β_m h_m(X), a linear basis expansion in X.
3
Basis expansion Some widely used transformations:
- h_m(X) = X_m, m = 1, ..., p: the original linear model.
- h_m(X) = X_j^2, h_m(X) = X_j X_k, or higher-order polynomials: augment the inputs with polynomial terms; the number of variables grows exponentially in the degree, O(p^d) for a degree-d polynomial.
- h_m(X) = log(X_j), ...: other nonlinear transformations.
- h_m(X) = I(L_m ≤ X_k < U_m): breaking the range of X_k up into non-overlapping regions gives a piecewise-constant fit.
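As a concrete illustration of augmenting the inputs, here is a minimal Python sketch; the function name and the particular transformations chosen are illustrative, not part of any fixed recipe.

```python
import numpy as np

def expand_basis(X):
    """Hand-picked basis expansion for a 2-column input (illustrative).

    Each column of the result is one h_m(X)."""
    X1, X2 = X[:, 0], X[:, 1]
    return np.column_stack([
        np.ones(len(X)),   # constant term
        X1, X2,            # h_m(X) = X_m: the original linear model
        X1**2, X2**2,      # squared terms
        X1 * X2,           # an interaction X_j * X_k
        np.log(X2),        # another nonlinear transform (assumes X2 > 0)
    ])

# A linear fit in the expanded features is nonlinear in the original X:
# beta, *_ = np.linalg.lstsq(expand_basis(X), y, rcond=None)
```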
4
Basis expansion More often, we use basis expansions as a device to achieve more flexible representations for f(X). Polynomials are global: tweaking the functional form to suit one region can cause the function to flap about madly in remote regions. [Figure: red, degree-6 polynomial; blue, degree-7 polynomial.]
5
Basis expansion Piecewise polynomials and splines allow for local polynomial representations. Problem: the number of basis functions can grow too large to fit with limited data. Solution 1: restriction methods, which limit the class of functions in advance. Example: the additive model f(X) = Σ_{j=1}^p f_j(X_j), where each f_j has its own basis expansion.
6
Basis expansion Solution 2: selection methods, which allow a large number of basis functions but adaptively scan the dictionary and include only those basis functions h_m(·) that contribute significantly to the fit of the model. Example: multivariate adaptive regression splines (MARS). Solution 3: regularization methods, which use the entire dictionary but restrict the coefficients. Example: ridge regression. The lasso is both a regularization and a selection method.
7
Piecewise Polynomials Assume X is one-dimensional. Divide the domain of X into contiguous intervals, and represent f(X) by a separate polynomial in each interval. Simplest: piecewise constant, with basis functions h_1(X) = I(X < ξ_1), h_2(X) = I(ξ_1 ≤ X < ξ_2), h_3(X) = I(ξ_2 ≤ X); least squares gives β_m = the mean of y in the m-th region.
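A minimal sketch of that piecewise-constant fit, assuming every region contains at least one observation (the function name is illustrative):

```python
import numpy as np

def piecewise_constant_fit(x, y, knots):
    """Least squares with h_1 = I(X < xi_1), h_2 = I(xi_1 <= X < xi_2), ...

    reduces to taking the mean of y within each region."""
    region = np.digitize(x, knots)  # interval index for each observation
    means = np.array([y[region == r].mean() for r in range(len(knots) + 1)])
    return means[region]            # fitted values at the training points
```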
8
Piecewise Polynomials Piecewise linear: three additional basis functions are needed, h_{m+3}(X) = h_m(X) X, m = 1, 2, 3.
9
Piecewise Polynomials Piecewise linear, requiring continuity at the knots: each continuity constraint removes one free parameter, and the restrictions can be built directly into the basis, h_1(X) = 1, h_2(X) = X, h_3(X) = (X − ξ_1)_+, h_4(X) = (X − ξ_2)_+, where (·)_+ denotes the positive part.
10
Piecewise Polynomials [Figure, lower-right panel: cubic spline, continuous with continuous first and second derivatives at the knots.]
11
Spline An order-M spline with knots ξ_j, j = 1, ..., K is a piecewise polynomial of order M with continuous derivatives up to order M − 2. A cubic spline is an order-4 spline; a piecewise-constant function is an order-1 spline. Basis functions (the truncated power basis): h_j(X) = X^{j−1}, j = 1, ..., M, and h_{M+l}(X) = (X − ξ_l)_+^{M−1}, l = 1, ..., K. In practice the most widely used orders are M = 1, 2, and 4.
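A sketch of the truncated power basis assembled as a design matrix; the knots and the order M are whatever the modeler chooses:

```python
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """h_j(X) = X**(j-1) for j = 1..M, plus (X - xi_l)_+**(M-1) per knot.

    M = 4 gives a cubic spline basis."""
    cols = [x**j for j in range(M)]  # 1, x, ..., x**(M-1)
    cols += [np.maximum(x - xi, 0.0) ** (M - 1) for xi in knots]
    return np.column_stack(cols)
```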
12
Natural Cubic Splines Polynomials fit to data tend to be erratic near the boundaries, and extrapolation can be dangerous. With splines, the polynomials fit beyond the boundary knots behave even more wildly than global polynomials in that region. A natural cubic spline adds additional constraints: the function is required to be linear beyond the boundary knots, which frees up four degrees of freedom (two in each boundary region).
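The natural cubic spline can be coded via a reduced basis in which K knots give exactly K basis functions; this sketch follows the standard construction (ESL eqs. 5.4-5.5), with d_k(X) = ((X − ξ_k)_+^3 − (X − ξ_K)_+^3)/(ξ_K − ξ_k):

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """K knots give exactly K basis functions; the span is linear
    beyond the boundary knots.

    N_1(X) = 1, N_2(X) = X, N_{k+2}(X) = d_k(X) - d_{K-1}(X)."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):  # k is a 0-based index into the knot sequence
        return ((np.maximum(x - knots[k], 0.0) ** 3
                 - np.maximum(x - knots[K - 1], 0.0) ** 3)
                / (knots[K - 1] - knots[k]))

    cols = [np.ones_like(x), x] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)
```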
13
Natural Cubic Splines
14
FIGURE 5.4. Fitted natural-spline functions for each of the terms in the final model selected by the stepwise procedure. Included are pointwise standard-error bands. South African Heart Disease data.
15
Smoothing Splines Avoids the knot selection problem completely by using a maximal set of knots; the complexity of the fit is controlled by regularization. Setup: among all functions f(x) with two continuous derivatives, find the one that minimizes the penalized residual sum of squares RSS(f, λ) = Σ_{i=1}^N (y_i − f(x_i))^2 + λ ∫ (f''(t))^2 dt. λ is the smoothing parameter; the second term penalizes curvature in the function.
16
Smoothing Splines The solution is a natural cubic spline with knots at the unique values of the x_i, i = 1, ..., N. Writing f(x) = Σ_{j=1}^N N_j(x) θ_j in a natural-spline basis, the penalty term translates to a penalty on the spline coefficients: minimize (y − Nθ)^T (y − Nθ) + λ θ^T Ω_N θ, with {Ω_N}_{jk} = ∫ N_j''(t) N_k''(t) dt. The solution is the generalized ridge regression θ = (N^T N + λ Ω_N)^{-1} N^T y, which shrinks the fit toward the linear fit.
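A rough numerical sketch of this penalized solve, assuming a truncated-power cubic basis with knots at the unique x values and a grid approximation of Ω. The truncated power basis is ill-conditioned and real implementations use B-splines, so treat this as illustration only:

```python
import numpy as np

def smoothing_spline_fit(x, y, lam, grid_size=2000):
    """Solve theta = (B'B + lam * Omega)^{-1} B'y with knots at the
    unique x values; Omega_jk = integral of B_j''(t) B_k''(t) dt,
    approximated by a Riemann sum on a grid."""
    knots = np.unique(x)
    B = np.column_stack([np.ones_like(x), x, x**2, x**3]
                        + [np.maximum(x - k, 0.0) ** 3 for k in knots])
    t = np.linspace(x.min(), x.max(), grid_size)
    D2 = np.column_stack([np.zeros_like(t), np.zeros_like(t),
                          2 * np.ones_like(t), 6 * t]
                         + [6 * np.maximum(t - k, 0.0) for k in knots])
    Omega = D2.T @ D2 * (t[1] - t[0])   # approximate curvature penalty matrix
    theta = np.linalg.solve(B.T @ B + lam * Omega, B.T @ y)
    return B @ theta                    # fitted values; shrink toward linear
```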
17
Smoothing Splines
18
Effective degrees of freedom of a smoothing spline: the fit is linear in y, f = S_λ y, with smoother matrix S_λ = N (N^T N + λ Ω_N)^{-1} N^T, so df_λ = trace(S_λ).
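Given a basis matrix B and penalty Ω as in the sketch above, df_λ can be computed directly (a sketch under those assumptions):

```python
import numpy as np

def effective_df(B, Omega, lam):
    """df_lambda = trace(S_lambda) for the linear smoother
    S_lambda = B (B'B + lam * Omega)^{-1} B'."""
    S = B @ np.linalg.solve(B.T @ B + lam * Omega, B.T)
    return np.trace(S)
```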
19
Smoothing Splines Bias-variance trade-off: df_λ controls it; small df_λ (heavy smoothing) means low variance but high bias, large df_λ the reverse.
20
Multidimensional Splines Basis of functions h_{1k}(X_1), k = 1, ..., M_1 for X_1; basis of functions h_{2k}(X_2), k = 1, ..., M_2 for X_2. The tensor product basis g_{jk}(X) = h_{1j}(X_1) h_{2k}(X_2) represents f(X) = Σ_{j=1}^{M_1} Σ_{k=1}^{M_2} θ_{jk} g_{jk}(X). The coefficients can be fit by least squares, as before. But the dimension of the basis grows exponentially fast with the number of coordinates.
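A sketch of assembling the tensor-product design matrix from the two univariate basis matrices (function name illustrative):

```python
import numpy as np

def tensor_product_basis(B1, B2):
    """All pairwise products g_jk(X) = h_1j(X_1) * h_2k(X_2).

    B1 is (n, M1), B2 is (n, M2); the result is (n, M1 * M2), so the
    dimension multiplies with every extra coordinate."""
    return np.einsum('ij,ik->ijk', B1, B2).reshape(B1.shape[0], -1)
```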
21
Multidimensional Splines
22
Generalized Additive Models g(μ(X)) = α + f_1(X_1) + ... + f_p(X_p), where μ(X) = E(Y|X), g is a link function, and the f_j(·) are unspecified smooth functions. If we model each function using an expansion of basis functions, the model can be fit by simple least squares. Classical links: g(μ) = μ, the identity link, used for linear and additive models for Gaussian response data; g(μ) = logit(μ), or g(μ) = probit(μ), for modeling binomial probabilities; g(μ) = log(μ) for log-linear or log-additive models for Poisson count data.
23
Generalized Additive Models The penalized least squares criterion (eq. 9.7): PRSS(α, f_1, ..., f_p) = Σ_{i=1}^N (y_i − α − Σ_{j=1}^p f_j(x_ij))^2 + Σ_{j=1}^p λ_j ∫ f_j''(t_j)^2 dt_j, where the λ_j ≥ 0 are tuning parameters. The minimizer of (9.7) is an additive cubic spline model: each f_j is a cubic spline in the component X_j, with knots at each of the unique values of x_ij, i = 1, ..., N. To make the solution unique, we impose Σ_{i=1}^N f_j(x_ij) = 0 for all j, in which case α is the mean of the y_i.
24
Generalized Additive Models Fitting by backfitting: cycle over j, each time smoothing the partial residual {y_i − α − Σ_{k≠j} f_k(x_ik)} against X_j to update f_j, until the estimates stabilize; S_j denotes a cubic smoothing spline in X_j. For linear fits this is equivalent to multiple regression. Other univariate regression smoothers, such as local polynomial regression and kernel methods, can be used as S_j; a sketch of the loop follows below.
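The backfitting loop with the univariate smoother left abstract; any smoother with the assumed signature smoother(xj, r) -> fitted values can be plugged in (names illustrative):

```python
import numpy as np

def backfit(X, y, smoother, n_iter=20):
    """Backfitting for y ~ alpha + f_1(X_1) + ... + f_p(X_p).

    smoother(xj, r) must return fitted values of a univariate smooth of
    the partial residual r on xj (e.g. a cubic smoothing spline)."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))                             # current f_j(x_ij)
    for _ in range(n_iter):
        for j in range(p):
            r = y - alpha - f.sum(axis=1) + f[:, j]  # partial residual
            f[:, j] = smoother(X[:, j], r)
            f[:, j] -= f[:, j].mean()                # keep sum_i f_j(x_ij) = 0
    return alpha, f

# Example of a simple plug-in smoother (cubic polynomial regression):
# poly3 = lambda xj, r: np.poly1d(np.polyfit(xj, r, 3))(xj)
# alpha_hat, f_hat = backfit(X, y, smoother=poly3)
```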
25
Multidimensional Splines
26
MARS: Multivariate Adaptive Regression Splines An adaptive procedure for regression, well suited for high-dimensional problems. MARS uses expansions in piecewise linear basis functions of the form (X − t)_+ and (t − X)_+, a "reflected pair" with a knot at t, where (·)_+ denotes the positive part.
27
MARS: Multivariate Adaptive Regression Splines The idea is to form reflected pairs for each input X_j with knots at each observed value x_ij of that input. The collection of basis functions is C = {(X_j − t)_+, (t − X_j)_+ : t ∈ {x_{1j}, ..., x_{Nj}}, j = 1, ..., p}. If all of the input values are distinct, there are 2Np basis functions altogether. Model: f(X) = β_0 + Σ_{m=1}^M β_m h_m(X), where each h_m(X) is a function in C, or a product of two or more such functions.
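A sketch of the reflected pairs and an index of the candidate set C (names illustrative):

```python
import numpy as np

def reflected_pair(x, t):
    """The pair (x - t)_+ and (t - x)_+ with knot t."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

def candidate_knots(X):
    """Index set of C: one reflected pair for every input j and every
    observed knot value, i.e. up to 2*N*p basis functions."""
    return [(j, t) for j in range(X.shape[1]) for t in np.unique(X[:, j])]
```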
28
MARS: Multivariate Adaptive Regression Splines Model building is forward stepwise: in each iteration, select a function from the set C or a product of such functions; the coefficients β_m are estimated by standard linear regression. Terms are added in the form β_{M+1} h_l(X)(X_j − t)_+ + β_{M+2} h_l(X)(t − X_j)_+, where h_l is a basis function already in the model and t = x_ij.
29
MARS: Multivariate Adaptive Regression Splines At each stage we consider, as candidates, all products of a reflected pair in C with a basis function already in the model (including the constant h_0(X) = 1). The product that decreases the residual error the most is added into the current model.
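A brute-force sketch of one forward step; for simplicity it ignores MARS's rule that each input appears at most once per product, and it refits by least squares for every candidate, so it is illustrative rather than efficient:

```python
import numpy as np

def mars_forward_step(B, X, y):
    """One greedy step of the forward pass.

    B holds the values of the terms already in the model (column 0 is
    the constant 1). Every product of an existing term with a reflected
    pair is tried; the pair that most reduces the RSS is kept."""
    best_rss, best_B = np.inf, B
    for m in range(B.shape[1]):            # term already in the model
        for j in range(X.shape[1]):        # input variable
            for t in np.unique(X[:, j]):   # knot at an observed value
                pos = B[:, m] * np.maximum(X[:, j] - t, 0.0)
                neg = B[:, m] * np.maximum(t - X[:, j], 0.0)
                Bc = np.column_stack([B, pos, neg])
                beta, *_ = np.linalg.lstsq(Bc, y, rcond=None)
                rss = np.sum((y - Bc @ beta) ** 2)
                if rss < best_rss:
                    best_rss, best_B = rss, Bc
    return best_B
```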
30
MARS: Multivariate Adaptive Regression Splines
31
At the end of this process we have a large model that typically overfits the data, so a backward deletion procedure is applied: the term whose removal causes the smallest increase in residual squared error is removed, one at a time. This produces the best model of each size (number of terms) λ. Generalized cross-validation is used to compare the models and select the best λ: GCV(λ) = Σ_{i=1}^N (y_i − f_λ(x_i))^2 / (1 − M(λ)/N)^2, where M(λ) is the effective number of parameters in the model (accounting for both the coefficients and the selected knots).
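The GCV criterion is cheap to compute once the effective number of parameters M(λ) is known (a sketch; an overall 1/N scaling is omitted since it does not change which model is selected):

```python
import numpy as np

def gcv(y, yhat, m_eff):
    """GCV(lambda) = RSS / (1 - M(lambda)/N)**2 for model comparison."""
    n = len(y)
    return np.sum((y - yhat) ** 2) / (1.0 - m_eff / n) ** 2
```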
32
MARS: Multivariate Adaptive Regression Splines