Slide 1: Chapter 5, Part II: 5.3 Spread of Data, 5.4 Fisher Discriminant
Slide 2: Measuring the Spread of Data
Covariance of two random variables x and y
– the expectation of their product
x and y need to be standardized if they use different units of measurement
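A reconstruction of the definition behind this bullet (standard, not specific to these slides):

```latex
\operatorname{cov}(x,y) \;=\; \mathbb{E}\big[(x-\mu_x)(y-\mu_y)\big] \;=\; \mathbb{E}[xy] - \mu_x\mu_y
```

For zero-mean variables this is exactly the expectation of the product, E[xy].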
Slide 3: Correlation
The covariance of standardized x and y measures their correlation.
This treats coordinates independently
– in a kernel-induced feature space we do not have access to the coordinates
Slide 4: Spread in the Feature Space
Consider the ℓ × N matrix X whose rows are the training points in feature space.
Assuming zero mean, we obtain the covariance matrix C.
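The covariance matrix referred to here has the standard form (reconstructed; rows of X are the ℓ training points):

```latex
C \;=\; \frac{1}{\ell}\, X'X
```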
Slide 5: Spread in the Feature Space
Observe: for a unit vector v, the value of the projection of a point φ(x) onto v is the inner product v'φ(x).
Slide 6: Spread in the Feature Space
The variance of the norms of the projections onto v can be written in terms of the covariance matrix.
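A reconstruction of the missing derivation (zero-mean data, ‖v‖ = 1), which is where C enters:

```latex
\frac{1}{\ell}\sum_{i=1}^{\ell}\big(v'\phi(x_i)\big)^2
\;=\; \frac{1}{\ell}\, v'X'Xv \;=\; v'Cv
```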
Slide 7: Spread in the Feature Space
So the covariance matrix contains everything needed to calculate the variance of the data along any projection direction.
If the data is not centered, subtract the square of the mean projection.
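Spelled out, the correction for uncentered data is the usual variance identity:

```latex
\sigma_v^2 \;=\; \frac{1}{\ell}\sum_{i=1}^{\ell}\big(v'\phi(x_i)\big)^2
\;-\;\left(\frac{1}{\ell}\sum_{i=1}^{\ell} v'\phi(x_i)\right)^{2}
```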
Slide 8: Variance of Projections
The variance of the projections onto a fixed direction v in feature space can be computed using only inner products:
v is a linear combination of the training points, v = X'α, so every projection v'φ(x) is a sum of kernel evaluations (see the sketch below).
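A minimal numerical sketch of this computation, assuming an RBF kernel; the helper names and parameters here are illustrative, not from the slides:

```python
import numpy as np

def kernel_matrix(X, gamma=1.0):
    # Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # 20 training points in R^3
K = kernel_matrix(X)

alpha = rng.normal(size=20)
alpha /= np.sqrt(alpha @ K @ alpha)    # ||v||^2 = alpha' K alpha = 1, so v is a unit vector

# Projections of the training points onto v = X'alpha need only the kernel:
# v' phi(x_i) = sum_j alpha_j k(x_j, x_i) = (K alpha)_i
proj = K @ alpha
variance = np.mean(proj**2) - np.mean(proj)**2   # subtract squared mean (Slide 7)
print(variance)
```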
Slide 9
Now that we can compute the variance of projections in feature space, we can implement a linear classifier: the Fisher discriminant.
Slide 10: Fisher Discriminant
Classification function: a thresholded linear function in feature space,
where w is chosen to maximize the ratio of between-class separation to within-class variance.
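A reconstruction of the two missing displays, using the standard Fisher criterion; μ_w⁺, μ_w⁻ and σ_w⁺, σ_w⁻ denote the mean and standard deviation of the projections w'φ(x) over the positive and negative class:

```latex
f(x) \;=\; \operatorname{sign}\big(w'\phi(x) + b\big),
\qquad
\max_{w}\; \frac{\big(\mu_w^{+}-\mu_w^{-}\big)^2}
               {\big(\sigma_w^{+}\big)^2 + \big(\sigma_w^{-}\big)^2}
```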
Slide 11: Regularized Fisher Discriminant
Choose w to solve the regularized version of the quotient above.
The quotient is invariant to rescalings of w
– so use a fixed value C for the denominator
Using a Lagrange multiplier ν, the solution maximizes the resulting Lagrangian.
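Reconstructed under the same assumptions, with λ the regularization parameter:

```latex
\max_{w}\; \frac{\big(\mu_w^{+}-\mu_w^{-}\big)^2}
               {\big(\sigma_w^{+}\big)^2+\big(\sigma_w^{-}\big)^2+\lambda\lVert w\rVert^2}
\;\;\Longrightarrow\;\;
\max_{w}\; \big(\mu_w^{+}-\mu_w^{-}\big)^2
-\nu\Big(\big(\sigma_w^{+}\big)^2+\big(\sigma_w^{-}\big)^2+\lambda\lVert w\rVert^2 - C\Big)
```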
Slide 12: Regularized Fisher Discriminant
We can then rewrite the objective in matrix form, where
– y is the vector of labels in {−1, +1}
– I⁺ (I⁻) is the identity matrix with only the positive (negative) columns containing 1s
– j⁺ (j⁻) is the corresponding all-1s vector, zeroed outside the positive (negative) examples
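Written out entrywise, directly from the descriptions above:

```latex
(I^{+})_{ii} \;=\; \begin{cases}1 & y_i = +1\\ 0 & \text{otherwise}\end{cases},
\qquad
(j^{+})_{i} \;=\; \begin{cases}1 & y_i = +1\\ 0 & \text{otherwise}\end{cases}
```

with I⁻ and j⁻ defined analogously on the negative examples.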
Slide 13: Regularized Fisher Discriminant
Furthermore, the within-class variances can be collected into a single matrix B,
– where D is a diagonal matrix
– and where C⁺, C⁻ are given by constant blocks supported on the positive and negative examples
Slide 14: Regularized Fisher Discriminant
Then, with appropriate redefinitions of ν, λ and C, the objective takes a compact matrix form.
– Taking derivatives with respect to w produces the optimality condition for w.
Slide 15: Dual Expression of w
We can express w in feature space as a linear combination of the training samples, w = X'α.
– Substituting w = X'α into the optimality condition produces an expression involving only the kernel matrix K = XX'.
– This is invariant to rescalings of w, so we can rescale α by ν to obtain the dual solution.
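The step that makes this work (standard, and worth making explicit): with w = X'α, every quantity in the optimality condition reduces to kernel evaluations:

```latex
w'\phi(x) \;=\; \sum_{i=1}^{\ell}\alpha_i\,k(x_i,x),
\qquad
\lVert w\rVert^2 \;=\; \alpha'XX'\alpha \;=\; \alpha'K\alpha
```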
Slide 16: Regularized Kernel Fisher Discriminant
Solution given by the dual vector α.
Classification function is a thresholded kernel expansion,
– where k is the vector with entries k(x, xᵢ), i = 1, …, ℓ
– and b is chosen so that w'μ⁺ − b = b − w'μ⁻, i.e. the threshold lies midway between the projected class means
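Reconstructed displays, assuming the standard regularized kernel Fisher solution (with B the matrix assembled on Slide 13, up to the rescalings noted on Slide 14):

```latex
\alpha \;=\; (BK+\lambda I)^{-1}\,y,
\qquad
f(x) \;=\; \operatorname{sign}\!\left(\sum_{i=1}^{\ell}\alpha_i\,k(x_i,x) \;-\; b\right)
```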
Slide 17: Regularized Kernel Fisher Discriminant
Taking w = X'α, we have b = α'K(j⁺/ℓ⁺ + j⁻/ℓ⁻)/2,
– where w'μ⁺ = α'Kj⁺/ℓ⁺ and w'μ⁻ = α'Kj⁻/ℓ⁻ are the projected class means (a sketch of the full procedure follows).
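A minimal end-to-end sketch assembling Slides 13, 16 and 17. The exact scaling constants inside D, C⁺ and C⁻ are my assumption (rescaling invariance absorbs such factors into λ), and I solve against the class-mean difference vector rather than y; function and variable names are illustrative:

```python
import numpy as np

def kernel_fisher_train(K, y, lam=1e-3):
    """Regularized kernel Fisher discriminant in dual form (a sketch).

    K   : (l, l) kernel matrix of the training points
    y   : (l,) labels in {-1, +1}
    lam : regularization parameter lambda
    """
    l = len(y)
    jp = (y == 1).astype(float)          # j+ : indicator of positive examples
    jm = (y == -1).astype(float)         # j- : indicator of negative examples
    lp, lm = jp.sum(), jm.sum()

    # B = D - C+ - C- encodes the within-class scatter in dual form
    # (scaling constants here are assumptions, absorbed into lam).
    D = np.diag(jp / lp + jm / lm)       # diagonal part
    Cp = np.outer(jp, jp) / lp**2        # constant block on positives
    Cm = np.outer(jm, jm) / lm**2        # constant block on negatives
    B = D - Cp - Cm

    d = jp / lp - jm / lm                # difference of projected-class-mean coefficients
    alpha = np.linalg.solve(B @ K + lam * np.eye(l), d)

    # b lies midway between the projected class means:
    # w'mu+ = alpha' K j+ / l+  and  w'mu- = alpha' K j- / l-
    b = 0.5 * (alpha @ K @ (jp / lp + jm / lm))
    return alpha, b

def kernel_fisher_classify(k_vec, alpha, b):
    # k_vec: vector with entries k(x_i, x) for a test point x
    return np.sign(alpha @ k_vec - b)
```

Note that the decision function is invariant to positive rescalings of α, since α'k(x) and b scale together; this is the dual counterpart of the rescaling invariance used on Slides 11 and 15.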