Slide 1: Chapter 5, Part II: 5.3 Spread of Data, 5.4 Fisher Discriminant
Slide 2: Measuring the Spread of Data
Covariance of two random variables x and y
– the expectation of their product
x and y need to be standardized if they use different units of measurement
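A reconstruction of the definition behind this bullet (standard, not specific to these slides):

```latex
\operatorname{cov}(x,y) \;=\; \mathbb{E}\big[(x-\mu_x)(y-\mu_y)\big] \;=\; \mathbb{E}[xy] - \mu_x\mu_y
```

For zero-mean variables this is exactly the expectation of the product, E[xy].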
Slide 3: Correlation
The covariance of standardized x and y measures their correlation.
This treats coordinates independently
– in a kernel-induced feature space we do not have access to the coordinates
Slide 4: Spread in the Feature Space
Consider the ℓ × N matrix X whose rows are the training points in feature space.
Assuming zero mean, we obtain the covariance matrix C.
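The covariance matrix referred to here has the standard form (reconstructed; rows of X are the ℓ training points):

```latex
C \;=\; \frac{1}{\ell}\, X'X
```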
Slide 5: Spread in the Feature Space
Observe: for a unit vector v, the value of the projection of a point φ(x) onto v is the inner product v'φ(x).
Slide 6: Spread in the Feature Space
The variance of the norms of the projections onto v can be written in terms of the covariance matrix.
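A reconstruction of the missing derivation (zero-mean data, ‖v‖ = 1), which is where C enters:

```latex
\frac{1}{\ell}\sum_{i=1}^{\ell}\big(v'\phi(x_i)\big)^2
\;=\; \frac{1}{\ell}\, v'X'Xv \;=\; v'Cv
```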
Slide 7: Spread in the Feature Space
So the covariance matrix contains everything needed to calculate the variance of the data along any projection direction.
If the data is not centered, subtract the square of the mean projection.
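Spelled out, the correction for uncentered data is the usual variance identity:

```latex
\sigma_v^2 \;=\; \frac{1}{\ell}\sum_{i=1}^{\ell}\big(v'\phi(x_i)\big)^2
\;-\;\left(\frac{1}{\ell}\sum_{i=1}^{\ell} v'\phi(x_i)\right)^{2}
```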
Slide 8: Variance of Projections
The variance of the projections onto a fixed direction v in feature space can be computed using only inner products:
v is a linear combination of the training points, v = X'α, so every projection v'φ(x) is a sum of kernel evaluations (see the sketch below).
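A minimal numerical sketch of this computation, assuming an RBF kernel; the helper names and parameters here are illustrative, not from the slides:

```python
import numpy as np

def kernel_matrix(X, gamma=1.0):
    # Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # 20 training points in R^3
K = kernel_matrix(X)

alpha = rng.normal(size=20)
alpha /= np.sqrt(alpha @ K @ alpha)    # ||v||^2 = alpha' K alpha = 1, so v is a unit vector

# Projections of the training points onto v = X'alpha need only the kernel:
# v' phi(x_i) = sum_j alpha_j k(x_j, x_i) = (K alpha)_i
proj = K @ alpha
variance = np.mean(proj**2) - np.mean(proj)**2   # subtract squared mean (Slide 7)
print(variance)
```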
Slide 9
Now that we can compute the variance of projections in feature space, we can implement a linear classifier: the Fisher discriminant.
Slide 10: Fisher Discriminant
Classification function: a thresholded linear function in feature space,
where w is chosen to maximize the ratio of between-class separation to within-class variance.
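A reconstruction of the two missing displays, using the standard Fisher criterion; μ_w⁺, μ_w⁻ and σ_w⁺, σ_w⁻ denote the mean and standard deviation of the projections w'φ(x) over the positive and negative class:

```latex
f(x) \;=\; \operatorname{sign}\big(w'\phi(x) + b\big),
\qquad
\max_{w}\; \frac{\big(\mu_w^{+}-\mu_w^{-}\big)^2}
               {\big(\sigma_w^{+}\big)^2 + \big(\sigma_w^{-}\big)^2}
```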
Slide 11: Regularized Fisher Discriminant
Choose w to solve the regularized version of the quotient above.
The quotient is invariant to rescalings of w
– so use a fixed value C for the denominator
Using a Lagrange multiplier ν, the solution maximizes the resulting Lagrangian.
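Reconstructed under the same assumptions, with λ the regularization parameter:

```latex
\max_{w}\; \frac{\big(\mu_w^{+}-\mu_w^{-}\big)^2}
               {\big(\sigma_w^{+}\big)^2+\big(\sigma_w^{-}\big)^2+\lambda\lVert w\rVert^2}
\;\;\Longrightarrow\;\;
\max_{w}\; \big(\mu_w^{+}-\mu_w^{-}\big)^2
-\nu\Big(\big(\sigma_w^{+}\big)^2+\big(\sigma_w^{-}\big)^2+\lambda\lVert w\rVert^2 - C\Big)
```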
Slide 12: Regularized Fisher Discriminant
We can then rewrite the objective in matrix form, where
– y is the vector of labels in {−1, +1}
– I⁺ (I⁻) is the identity matrix with only the positive (negative) columns containing 1s
– j⁺ (j⁻) is the corresponding all-1s vector, zeroed outside the positive (negative) examples
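Written out entrywise, directly from the descriptions above:

```latex
(I^{+})_{ii} \;=\; \begin{cases}1 & y_i = +1\\ 0 & \text{otherwise}\end{cases},
\qquad
(j^{+})_{i} \;=\; \begin{cases}1 & y_i = +1\\ 0 & \text{otherwise}\end{cases}
```

with I⁻ and j⁻ defined analogously on the negative examples.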
Slide 13: Regularized Fisher Discriminant
Furthermore, the within-class variances can be collected into a single matrix B,
– where D is a diagonal matrix
– and where C⁺, C⁻ are given by constant blocks supported on the positive and negative examples
Slide 14: Regularized Fisher Discriminant
Then, with appropriate redefinitions of ν, λ and C, the objective takes a compact matrix form.
– Taking derivatives with respect to w produces the optimality condition for w.
Slide 15: Dual Expression of w
We can express w in feature space as a linear combination of the training samples, w = X'α.
– Substituting w = X'α into the optimality condition produces an expression involving only the kernel matrix K = XX'.
– This is invariant to rescalings of w, so we can rescale α by ν to obtain the dual solution.
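The step that makes this work (standard, and worth making explicit): with w = X'α, every quantity in the optimality condition reduces to kernel evaluations:

```latex
w'\phi(x) \;=\; \sum_{i=1}^{\ell}\alpha_i\,k(x_i,x),
\qquad
\lVert w\rVert^2 \;=\; \alpha'XX'\alpha \;=\; \alpha'K\alpha
```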
Slide 16: Regularized Kernel Fisher Discriminant
Solution given by the dual vector α.
Classification function is a thresholded kernel expansion,
– where k is the vector with entries k(x, xᵢ), i = 1, …, ℓ
– and b is chosen so that w'μ⁺ − b = b − w'μ⁻, i.e. the threshold lies midway between the projected class means
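Reconstructed displays, assuming the standard regularized kernel Fisher solution (with B the matrix assembled on Slide 13, up to the rescalings noted on Slide 14):

```latex
\alpha \;=\; (BK+\lambda I)^{-1}\,y,
\qquad
f(x) \;=\; \operatorname{sign}\!\left(\sum_{i=1}^{\ell}\alpha_i\,k(x_i,x) \;-\; b\right)
```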
Slide 17: Regularized Kernel Fisher Discriminant
Taking w = X'α, we have b = α'K(j⁺/ℓ⁺ + j⁻/ℓ⁻)/2,
– where w'μ⁺ = α'Kj⁺/ℓ⁺ and w'μ⁻ = α'Kj⁻/ℓ⁻ are the projected class means (a sketch of the full procedure follows).
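A minimal end-to-end sketch assembling Slides 13, 16 and 17. The exact scaling constants inside D, C⁺ and C⁻ are my assumption (rescaling invariance absorbs such factors into λ), and I solve against the class-mean difference vector rather than y; function and variable names are illustrative:

```python
import numpy as np

def kernel_fisher_train(K, y, lam=1e-3):
    """Regularized kernel Fisher discriminant in dual form (a sketch).

    K   : (l, l) kernel matrix of the training points
    y   : (l,) labels in {-1, +1}
    lam : regularization parameter lambda
    """
    l = len(y)
    jp = (y == 1).astype(float)          # j+ : indicator of positive examples
    jm = (y == -1).astype(float)         # j- : indicator of negative examples
    lp, lm = jp.sum(), jm.sum()

    # B = D - C+ - C- encodes the within-class scatter in dual form
    # (scaling constants here are assumptions, absorbed into lam).
    D = np.diag(jp / lp + jm / lm)       # diagonal part
    Cp = np.outer(jp, jp) / lp**2        # constant block on positives
    Cm = np.outer(jm, jm) / lm**2        # constant block on negatives
    B = D - Cp - Cm

    d = jp / lp - jm / lm                # difference of projected-class-mean coefficients
    alpha = np.linalg.solve(B @ K + lam * np.eye(l), d)

    # b lies midway between the projected class means:
    # w'mu+ = alpha' K j+ / l+  and  w'mu- = alpha' K j- / l-
    b = 0.5 * (alpha @ K @ (jp / lp + jm / lm))
    return alpha, b

def kernel_fisher_classify(k_vec, alpha, b):
    # k_vec: vector with entries k(x_i, x) for a test point x
    return np.sign(alpha @ k_vec - b)
```

Note that the decision function is invariant to positive rescalings of α, since α'k(x) and b scale together; this is the dual counterpart of the rescaling invariance used on Slides 11 and 15.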