1
Basis Expansion and Regularization Presenters: Hongliang Fei, Brian Quanz Date: July 03, 2008
2
Contents
- Introduction
- Piecewise Polynomials and Splines
- Filtering and Feature Extraction
- Smoothing Splines
- Automatic smoothing parameter selection
3
1. Introduction
Basis: in linear algebra, a basis is a set of vectors satisfying:
- Linear combinations of the basis can represent every vector in the given vector space;
- No element of the set can be represented as a linear combination of the others.
4
In a function space, the notion of a basis carries over to a set of basis functions: each function in the function space can be represented as a linear combination of the basis functions. Example: the quadratic polynomials have basis {1, t, t^2}.
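To make this concrete, here is a minimal numpy sketch (the sample points and target quadratic are arbitrary illustrative choices) showing that a quadratic is recovered exactly as a linear combination of the basis {1, t, t^2}:

```python
# A minimal sketch: representing a quadratic in the monomial basis
# {1, t, t^2}. Points and coefficients are illustrative assumptions.
import numpy as np

t = np.linspace(-1.0, 1.0, 5)                     # a few sample points
basis = np.vstack([np.ones_like(t), t, t**2]).T   # columns are 1, t, t^2

f = 3.0 - 2.0 * t + 0.5 * t**2                    # an arbitrary quadratic
coef, *_ = np.linalg.lstsq(basis, f, rcond=None)
print(coef)                                       # recovers [3.0, -2.0, 0.5]
```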
5
What is Basis Expansion? Given data X and transformations h_m(X): R^p -> R, m = 1, ..., M, we model
f(X) = Σ_{m=1}^M β_m h_m(X),
a linear basis expansion in X, where h_m is the m-th basis function.
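A hedged code sketch of this idea: the particular basis functions (1, x, x^2, sin x) and the simulated data are illustrative assumptions, but the pattern — expand X through the h_m, then fit the β_m by ordinary least squares — is the general one:

```python
# A sketch of a linear basis expansion: expand x through chosen basis
# functions h_m, then fit the coefficients beta by least squares.
import numpy as np

def expand(x):
    """Map scalar inputs to the chosen basis h_1..h_M (an assumption)."""
    return np.column_stack([np.ones_like(x), x, x**2, np.sin(x)])

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=100)
y = 1.0 + 0.5 * x - 0.1 * x**2 + 2.0 * np.sin(x) + rng.normal(0, 0.3, 100)

H = expand(x)                                  # N x M basis matrix
beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # f is linear in the basis
print(beta)                                    # approx [1.0, 0.5, -0.1, 2.0]
```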
6
Why Basis Expansion? In regression problems, f(X) will typically be nonlinear in X; a linear model is convenient and easy to interpret; and when the sample size is small but the number of attributes is large, a linear model may be all we can do while avoiding overfitting.
7
2. Piecewise Polynomials and Splines
Spline:
- In mathematics, a spline is a special function defined piecewise by polynomials;
- In computer science, the term spline more frequently refers to a piecewise polynomial (parametric) curve.
Splines are popular for their simple construction, ease and accuracy of evaluation, and capacity to approximate complex shapes through curve fitting and interactive curve design.
8
Example of a Spline http://en.wikipedia.org/wiki/Image:BezierInterpolation.gif
10
Assume a spline with four knots (two boundary knots and two interior knots ξ_1 < ξ_2), with X one-dimensional.
Piecewise constant basis: h_1(X) = I(X < ξ_1), h_2(X) = I(ξ_1 ≤ X < ξ_2), h_3(X) = I(ξ_2 ≤ X).
Piecewise linear basis: add h_{m+3}(X) = h_m(X) X, m = 1, 2, 3.
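A small numpy sketch of these two bases, assuming interior knots ξ_1 = 2 and ξ_2 = 4 (arbitrary values chosen for illustration):

```python
# Piecewise-constant and piecewise-linear bases for one-dimensional X
# with two interior knots xi1 < xi2 (values are assumptions).
import numpy as np

xi1, xi2 = 2.0, 4.0

def piecewise_constant_basis(x):
    # h1 = I(X < xi1), h2 = I(xi1 <= X < xi2), h3 = I(xi2 <= X)
    return np.column_stack([x < xi1,
                            (xi1 <= x) & (x < xi2),
                            xi2 <= x]).astype(float)

def piecewise_linear_basis(x):
    # adds h_{m+3}(X) = h_m(X) * X: a separate line in each region
    H = piecewise_constant_basis(x)
    return np.column_stack([H, H * x[:, None]])

x = np.linspace(0, 6, 7)
print(piecewise_linear_basis(x).shape)   # (7, 6)
```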
11
Piecewise Cubic Polynomial
12
Basis functions: h_1(X) = 1, h_2(X) = X, h_3(X) = X^2, h_4(X) = X^3, h_5(X) = (X − ξ_1)_+^3, h_6(X) = (X − ξ_2)_+^3, where t_+ denotes the positive part. Six functions corresponding to a six-dimensional linear space of cubic splines with two knots.
13
An order-M spline with knots ξ_j, j = 1, ..., K has continuous derivatives up to order M − 2 (a cubic spline is order M = 4). The general truncated-power basis set would be:
h_j(X) = X^(j−1), j = 1, ..., M;
h_{M+l}(X) = (X − ξ_l)_+^(M−1), l = 1, ..., K.
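A minimal implementation sketch of this truncated-power basis; the knot locations and the default M = 4 (cubic) are assumptions for illustration:

```python
# Order-M truncated power basis with knots xi_1..xi_K: polynomials
# X^0..X^{M-1} plus one (X - xi_l)_+^{M-1} term per knot.
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """M = 4 gives the cubic-spline basis; M and knots are assumptions."""
    poly = [x**j for j in range(M)]                       # X^0 .. X^{M-1}
    trunc = [np.maximum(x - xi, 0.0)**(M - 1) for xi in knots]
    return np.column_stack(poly + trunc)                  # N x (M + K)

x = np.linspace(0, 10, 50)
H = truncated_power_basis(x, knots=[2.5, 5.0, 7.5])
print(H.shape)   # (50, 7): 4 polynomial + 3 truncated-power columns
```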
14
Natural cubic spline A natural cubic spline adds additional constraints: the function is required to be linear beyond the boundary knots. A natural cubic spline with K knots is represented by K basis functions. One can start from a basis for cubic splines and derive the reduced basis by imposing the boundary constraints.
15
Example of a natural cubic spline basis Starting from the truncated power series basis, we arrive at:
N_1(X) = 1, N_2(X) = X, N_{k+2}(X) = d_k(X) − d_{K−1}(X),
where d_k(X) = [(X − ξ_k)_+^3 − (X − ξ_K)_+^3] / (ξ_K − ξ_k).
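A sketch of this reduced basis in numpy, following the d_k construction above; the knot sequence is an arbitrary choice:

```python
# Reduced natural-cubic-spline basis N_1..N_K derived from the
# truncated power series via the d_k construction.
import numpy as np

def natural_cubic_basis(x, knots):
    knots = np.asarray(knots, dtype=float)
    K = len(knots)
    def d(k):  # d_k(X) = [(X - xi_k)_+^3 - (X - xi_K)_+^3] / (xi_K - xi_k)
        return (np.maximum(x - knots[k], 0)**3
                - np.maximum(x - knots[-1], 0)**3) / (knots[-1] - knots[k])
    cols = [np.ones_like(x), x]                      # N_1 = 1, N_2 = X
    cols += [d(k) - d(K - 2) for k in range(K - 2)]  # N_{k+2}
    return np.column_stack(cols)                     # N x K

x = np.linspace(0, 10, 50)
N = natural_cubic_basis(x, knots=[1, 3, 5, 7, 9])
print(N.shape)   # (50, 5): K basis functions for K knots
```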
16
An example application (phoneme recognition)
17
Data: 1000 samples drawn from 695 "aa"s and 1022 "ao"s, with feature vectors of length 256. Goal: use these data to classify the spoken phoneme. The coefficients can be plotted as a function of frequency.
18
Fitting via maximum likelihood alone, the coefficient curve is very rough. Fitting through natural cubic splines instead: rewrite the coefficient function as an expansion in splines, β(f) = Σ_{m=1}^M h_m(f) θ_m, that is, β = Hθ, where H is a p by M basis matrix of natural cubic splines. Since x^T β = x^T Hθ = (H^T x)^T θ, we replace the input features x by their filtered versions x* = H^T x, fit θ via linear logistic regression on x*, and obtain the final smooth coefficient curve β̂ = Hθ̂.
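A hedged sketch of this filtering recipe with simulated stand-ins for the data: here H is random noise purely to show the shapes, whereas in the actual application it would be the natural-cubic-spline basis evaluated at the p frequencies (e.g., via natural_cubic_basis above), and X would hold the real log-periodogram features. Note that sklearn's default ridge penalty differs slightly from plain maximum likelihood:

```python
# Filtering idea: beta = H @ theta, so x^T beta = (H^T x)^T theta;
# fit theta by logistic regression on filtered features x* = H^T x.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p, M, n = 256, 12, 1000              # feature length, basis size, samples
X = rng.normal(size=(n, p))          # stand-in for log-periodograms
y = rng.integers(0, 2, size=n)       # stand-in for "aa" vs "ao" labels
H = rng.normal(size=(p, M))          # stand-in for the p x M spline basis

X_star = X @ H                       # filtered features, n x M
clf = LogisticRegression().fit(X_star, y)
theta_hat = clf.coef_.ravel()        # M spline coefficients
beta_hat = H @ theta_hat             # smooth coefficient curve, length p
```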
20
3. Filtering and Feature Extraction Preprocessing high-dimensional features is a powerful method for improving the performance of a learning algorithm. The previous example used a filtering approach to transform the features; the transformations need not be linear, but can take a general form. Another example is the wavelet transform; refer to Section 5.9.
21
4. Smoothing Splines Purpose: avoid the complexity of the knot selection problem by using a maximal set of knots; complexity is then controlled via regularization. Consider this problem: among all functions f with two continuous derivatives, minimize the penalized residual sum of squares
RSS(f, λ) = Σ_{i=1}^N (y_i − f(x_i))^2 + λ ∫ (f''(t))^2 dt,
where λ is a fixed smoothing parameter.
22
Though RSS is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the x_i, i = 1, ..., N. The penalty term translates to a penalty on the spline coefficients.
23
Rewrite the solution as f(x) = Σ_{j=1}^N N_j(x) θ_j, where the N_j(x) are an N-dimensional set of basis functions representing the family of natural splines. The criterion in matrix form:
RSS(θ, λ) = (y − Nθ)^T (y − Nθ) + λ θ^T Ω_N θ,
where {N}_{ij} = N_j(x_i) and {Ω_N}_{jk} = ∫ N_j''(t) N_k''(t) dt. By the ridge regression result, the solution is
θ̂ = (N^T N + λ Ω_N)^(−1) N^T y,
and the fitted smoothing spline is given by f̂(x) = Σ_{j=1}^N N_j(x) θ̂_j.
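A minimal sketch of this generalized ridge solve; the basis matrix N and penalty Ω_N are taken as given, with random and identity stand-ins here, since constructing the true curvature penalty is beyond this snippet:

```python
# Generalized ridge solve: theta_hat = (N^T N + lam * Omega)^{-1} N^T y.
import numpy as np

def smoothing_spline_coefs(N, Omega, y, lam):
    A = N.T @ N + lam * Omega
    return np.linalg.solve(A, N.T @ y)

rng = np.random.default_rng(0)
n = 30
N = rng.normal(size=(n, n))   # stand-in for the natural-spline basis matrix
Omega = np.eye(n)             # stand-in for the curvature penalty Omega_N
y = rng.normal(size=n)

theta_hat = smoothing_spline_coefs(N, Omega, y, lam=1.0)
fitted = N @ theta_hat        # f_hat at the training points
```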
24
Example of a smoothing spline
25
Degrees of freedom and smoother matrix A smoothing spline with prechosen λ is a linear operator. Let f̂ be the N-vector of fitted values f̂(x_i) at the training predictors x_i:
f̂ = N(N^T N + λ Ω_N)^(−1) N^T y = S_λ y.
Here S_λ is called the smoother matrix; it depends only on the x_i and λ.
26
Suppose B_ξ is an N by M matrix of M cubic spline basis functions evaluated at the N training points, with knot sequence ξ and M ≪ N. The fitted spline values are given by:
f̂ = B_ξ (B_ξ^T B_ξ)^(−1) B_ξ^T y = H_ξ y.
Here the linear operator H_ξ is a projection operator, known as the hat matrix in statistics.
27
Similarities and differences between H_ξ and S_λ:
- Both are symmetric and positive semidefinite.
- H_ξ is idempotent (H_ξ H_ξ = H_ξ), while S_λ S_λ ⪯ S_λ: S_λ is a shrinking smoother.
- Rank(S_λ) = N, Rank(H_ξ) = M.
- The trace of H_ξ gives the dimension of the projection space, i.e., the number of basis functions.
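A numerical check of these properties. As a stand-in for the spline smoother, this sketch uses the Reinsch form S_λ = (I + λK)^(−1) from the next slide with a simple second-difference penalty K = D^T D; the basis matrix B is random, since only its column space matters for the projection:

```python
# Checking: H is an idempotent projection, S shrinks; ranks and trace.
import numpy as np

N_pts, M, lam = 20, 5, 1.0
rng = np.random.default_rng(0)

B = rng.normal(size=(N_pts, M))                    # stand-in basis matrix
H = B @ np.linalg.inv(B.T @ B) @ B.T               # hat (projection) matrix

D = np.diff(np.eye(N_pts), n=2, axis=0)            # second differences
S = np.linalg.inv(np.eye(N_pts) + lam * D.T @ D)   # smoother matrix

print(np.allclose(H @ H, H))        # True: H is idempotent
print(np.allclose(S @ S, S))        # False: S shrinks instead
print(np.linalg.matrix_rank(S),     # N
      np.linalg.matrix_rank(H))     # M
print(np.trace(H))                  # M = dimension of projection space
```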
28
Define the effective degrees of freedom as:
df_λ = trace(S_λ).
By specifying df_λ, we can derive λ. Since S_λ is symmetric and positive semidefinite, it can be rewritten in the Reinsch form S_λ = (I + λK)^(−1); then f̂ = S_λ y is the solution of
min_f (y − f)^T (y − f) + λ f^T K f.
K is known as the penalty matrix.
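A sketch of deriving λ from a prespecified df_λ by bisection on trace(S_λ), using the same second-difference stand-in for K; the helper names are made up, and the search relies on df_λ being monotone decreasing in λ:

```python
# Solve df(lambda) = trace((I + lambda*K)^{-1}) = target_df for lambda.
import numpy as np

def df_lambda(lam, K):
    n = K.shape[0]
    return np.trace(np.linalg.inv(np.eye(n) + lam * K))

def lambda_for_df(target_df, K, lo=1e-8, hi=1e8, iters=100):
    # df is monotone decreasing in lambda, so bisect on log-lambda
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if df_lambda(mid, K) > target_df:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

n = 50
D = np.diff(np.eye(n), n=2, axis=0)
K = D.T @ D
lam = lambda_for_df(12.0, K)
print(lam, df_lambda(lam, K))   # df(lam) is approximately 12
```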
29
The eigen-decomposition of S_λ is given by:
S_λ = Σ_{k=1}^N ρ_k(λ) u_k u_k^T, with ρ_k(λ) = 1 / (1 + λ d_k),
where d_k and u_k are the eigenvalues and eigenvectors of K.
30
Highlights of the eigen-decomposition:
- The eigenvectors are not affected by changes in λ.
- Shrinking nature: S_λ y decomposes y in the basis {u_k} and differentially shrinks the contributions by ρ_k(λ), whereas a projection leaves components intact or zeroes them.
- The eigenvector sequence, ordered by decreasing ρ_k(λ), appears to increase in complexity.
- The first two eigenvalues are always 1, since d_1 = d_2 = 0: linear functions are never penalized.
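A short numerical illustration of these points under the same second-difference stand-in penalty: the eigenvalues of S_λ are exactly ρ_k = 1/(1 + λ d_k), the eigenvectors come from K alone (so they do not depend on λ), and the two zero eigenvalues of K give ρ = 1:

```python
# Eigen-view of the smoother: S_lambda shares eigenvectors with K and
# shrinks by rho_k = 1 / (1 + lambda * d_k); d_1 = d_2 = 0 gives rho = 1.
import numpy as np

n, lam = 30, 5.0
D = np.diff(np.eye(n), n=2, axis=0)   # stand-in penalty: K = D^T D
K = D.T @ D

d, U = np.linalg.eigh(K)              # eigenvalues/vectors of K
rho = 1.0 / (1.0 + lam * d)           # eigenvalues of S_lambda

S = np.linalg.inv(np.eye(n) + lam * K)
print(np.allclose(S, U @ np.diag(rho) @ U.T))   # True: same eigenvectors
print(np.sort(rho)[-2:])              # two eigenvalues equal to 1
```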
31
Figure: cubic smoothing spline fits to some data
32
5. Automatic selection of the smoothing parameters Selecting the placement and number of knots for regression splines can be a combinatorially complex task; for smoothing splines, only the penalty λ must be selected. Method: fix the effective degrees of freedom and solve df_λ = trace(S_λ) for λ. Criterion: the bias-variance tradeoff.
33
The Bias-Variance Tradeoff Integrated squared prediction error (EPE) combines both bias and variance:
EPE(f̂_λ) = E(Y − f̂_λ(X))^2.
Cross-validation estimates the prediction error from the training data:
CV(f̂_λ) = (1/N) Σ_{i=1}^N (y_i − f̂_λ^(−i)(x_i))^2 = (1/N) Σ_{i=1}^N [ (y_i − f̂_λ(x_i)) / (1 − S_λ(i,i)) ]^2,
where f̂_λ^(−i) is the fit with the i-th observation left out.
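A sketch of using the right-hand identity — leave-one-out CV computed from a single fit via the diagonal of S_λ — to pick λ; the smoother is again the second-difference stand-in (which ignores the uneven spacing of x), and the data are simulated:

```python
# LOOCV shortcut for linear smoothers: no refitting needed, since
# CV(lambda) = mean_i [ (y_i - f_hat(x_i)) / (1 - S_ii) ]^2.
import numpy as np

def loocv_score(y, lam):
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)
    S = np.linalg.inv(np.eye(n) + lam * D.T @ D)   # stand-in smoother
    resid = y - S @ y
    return np.mean((resid / (1.0 - np.diag(S)))**2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 80))
y = np.sin(x) + rng.normal(0, 0.3, 80)

lambdas = np.logspace(-3, 3, 13)
scores = [loocv_score(y, lam) for lam in lambdas]
print(lambdas[int(np.argmin(scores))])   # CV-chosen smoothing parameter
```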
34
An example:
35
Figure: EPE and CV curves, and fitted effects, for different degrees of freedom
36
Any questions?