Measure Independence in Kernel Space Presented by: Qiang Lou
References These slides are based on the following papers: F. Bach and M. Jordan. Kernel Independent Component Analysis. Journal of Machine Learning Research. Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, Bernhard Schölkopf. Kernel Methods for Measuring Independence. Journal of Machine Learning Research, 2005.
Outline Introduction Canonical Correlation Kernel Canonical Correlation Application Example
Introduction What is independence? Intuitively, two variables y1 and y2 are said to be independent if information on the value of one variable gives no information on the value of the other. Technically, y1 and y2 are independent if and only if the joint pdf factorizes in the following way: p(y1, y2) = p1(y1) * p2(y2)
Introduction How do we measure independence? -- Can we use correlation? -- Does uncorrelated imply independent? Remark: y1 and y2 are uncorrelated means: E[y1 y2] - E[y1]E[y2] = 0
Introduction The answer is "No" Fact: independence implies uncorrelatedness, but the converse is not true. That is: p(y1, y2) = p1(y1) * p2(y2) → E[y1 y2] - E[y1]E[y2] = 0, but E[y1 y2] - E[y1]E[y2] = 0 does NOT imply p(y1, y2) = p1(y1) * p2(y2). The forward direction is easy to prove…
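A minimal numerical sketch of this fact (the particular distribution is an illustrative assumption, not from the slides): take y1 uniform on {-1, 0, 1} and y2 = y1^2. The pair is exactly uncorrelated, yet y2 is a deterministic function of y1, so they are certainly not independent.

```python
import numpy as np

# Counterexample: y1 uniform on {-1, 0, 1}, y2 = y1**2 (fully dependent on y1).
y1 = np.array([-1.0, 0.0, 1.0])   # each value has probability 1/3
y2 = y1 ** 2

# Uncorrelated: E[y1*y2] - E[y1]*E[y2] = 0
cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)

# Not independent: P(y1=0, y2=0) = 1/3, but P(y1=0)*P(y2=0) = 1/9
p_joint = np.mean((y1 == 0) & (y2 == 0))
p_prod = np.mean(y1 == 0) * np.mean(y2 == 0)
```

Since the covariance vanishes while the joint probability fails to factorize, correlation alone cannot certify independence.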
Introduction Now comes the question: How to measure independence?
Canonical Correlation Canonical Correlation Analysis (CCA) is concerned with finding a pair of linear transformations such that one component within each set of transformed variables is correlated with a single component in the other set. We focus on the first canonical correlation, defined as the maximum possible correlation between the two projections w1^T x1 and w2^T x2 of x1 and x2: ρ(x1, x2) = max over w1, w2 of corr(w1^T x1, w2^T x2) = max over w1, w2 of (w1^T C12 w2) / sqrt((w1^T C11 w1)(w2^T C22 w2)), where C is the covariance matrix of (x1, x2) with blocks C11, C12, C21, C22.
Canonical Correlation Taking derivatives with respect to w1 and w2 and setting them to zero, we obtain the generalized eigenvalue problem: C12 w2 = ρ C11 w1 and C21 w1 = ρ C22 w2.
So, it can be extended to more than two sets of variables: stack the weight vectors and solve the generalized eigenvalue problem C w = λ D w, where C is the joint covariance matrix of all the sets and D is its block-diagonal part (find the smallest eigenvalue λ).
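For the two-set case, the stationarity conditions above can be solved numerically as one symmetric generalized eigenvalue problem. A minimal numpy/scipy sketch (the toy data, dimensions, and noise level are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Two 2-D variables sharing one latent source s, so they are correlated.
N = 2000
s = rng.standard_normal(N)
x1 = np.column_stack([s + 0.1 * rng.standard_normal(N), rng.standard_normal(N)])
x2 = np.column_stack([s + 0.1 * rng.standard_normal(N), rng.standard_normal(N)])

# Blocks of the joint covariance matrix C of (x1, x2).
C = np.cov(np.hstack([x1, x2]), rowvar=False)
d1 = x1.shape[1]
C11, C12 = C[:d1, :d1], C[:d1, d1:]
C21, C22 = C[d1:, :d1], C[d1:, d1:]

# Stationarity conditions as a generalized eigenvalue problem:
#   [0   C12] [w1]       [C11  0 ] [w1]
#   [C21  0 ] [w2] = rho [ 0  C22] [w2]
Z = np.zeros((d1, d1))
A = np.block([[Z, C12], [C21, Z]])
B = np.block([[C11, Z], [Z, C22]])
rhos, W = eigh(A, B)          # eigenvalues in ascending order
rho_max = rhos[-1]            # first canonical correlation

# Cross-check: the correlation of the optimal projections equals rho_max.
w1, w2 = W[:d1, -1], W[d1:, -1]
corr = np.corrcoef(x1 @ w1, x2 @ w2)[0, 1]
```

Because the two variables share the latent source s, the first canonical correlation comes out close to 1, and it matches the empirical correlation of the projected data.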
Kernel Canonical Correlation Kernel trick: define a map Φ from X to a feature space F such that we can find a kernel satisfying: k(x, y) = <Φ(x), Φ(y)>.
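For instance, the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) corresponds to such an (infinite-dimensional) map Φ. A quick sketch verifying the defining property on data: the Gram matrix of a valid kernel must be symmetric positive semidefinite, since its entries are inner products in F (bandwidth and sample here are assumptions for illustration):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) = <Phi(x), Phi(y)> in F
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = gaussian_kernel(X, X)

# Symmetric PSD check: all eigenvalues of the Gram matrix are non-negative.
eigvals = np.linalg.eigvalsh(K)
```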
Kernel Canonical Correlation F-correlation ρ_F -- the canonical correlation between Φ1(x1) and Φ2(x2), taken over functions in F: ρ_F = max over f1, f2 in F of corr(<Φ1(x1), f1>, <Φ2(x2), f2>) = max over f1, f2 in F of corr(f1(x1), f2(x2)).
Kernel Canonical Correlation Notes: if x1 and x2 are independent, then ρ_F = 0. Is the converse true? -- If F is 'large' enough, it is true. -- In particular, it holds if F is the space corresponding to a Gaussian kernel, which is a positive definite kernel on X = R.
Kernel Canonical Correlation Estimation of the F-correlation -- a kernelized version of canonical correlation. We will show that it depends only on the Gram matrices K1 and K2 of the observations; we will write ρ̂_F(K1, K2) for this canonical correlation. Suppose the data are centered in feature space (i.e. Σ_i Φ(x_i) = 0).
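The centering assumption can be enforced directly on the Gram matrix: replacing each Φ(x_i) by Φ(x_i) minus the feature-space mean amounts to K → H K H with H = I - (1/N) 1 1^T. A small sketch (the linear kernel is chosen only so the Gram matrix is easy to form; the same formula applies to any kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

K = X @ X.T                      # linear-kernel Gram matrix, for illustration
N = K.shape[0]

# Centering in feature space: K -> H K H with H = I - (1/N) * ones
H = np.eye(N) - np.ones((N, N)) / N
K_centered = H @ K @ H

# After centering, every row and column of the Gram matrix sums to zero,
# i.e. the mapped points have zero mean in feature space.
```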
Kernel Canonical Correlation We want to compute ρ̂_F(K1, K2), which means we need three quantities: the empirical covariance of the two projections, and the empirical variance of each projection.
Kernel Canonical Correlation For fixed f1 and f2, writing f1 = Σ_i α1i Φ1(x1i) and f2 = Σ_i α2i Φ2(x2i), the empirical covariance of the projections in feature space can be written: cov(f1(x1), f2(x2)) = (1/N) α1^T K1 K2 α2.
Kernel Canonical Correlation Similarly, we get the empirical variances: var(f1(x1)) = (1/N) α1^T K1^2 α1 and var(f2(x2)) = (1/N) α2^T K2^2 α2.
Kernel Canonical Correlation Putting the three expressions together, we get: ρ̂_F(K1, K2) = max over α1, α2 of (α1^T K1 K2 α2) / sqrt((α1^T K1^2 α1)(α2^T K2^2 α2)). As with the linear problem we discussed before, this is equivalent to the generalized eigenvalue problem: [0, K1 K2; K2 K1, 0] [α1; α2] = ρ [K1^2, 0; 0, K2^2] [α1; α2].
Kernel Canonical Correlation Problem: if the Gram matrices K1 and K2 have full rank, the canonical correlation will always be 1, whatever K1 and K2 are. Let V1 and V2 denote the subspaces of R^N spanned by the columns of K1 and K2; then we can rewrite: ρ̂_F = max over v1 in V1, v2 in V2 of (v1^T v2) / (||v1|| ||v2||). If K1 and K2 have full rank, V1 and V2 are both equal to R^N, so the maximum is always 1.
Kernel Canonical Correlation Solution: regularize by penalizing the norms of f1 and f2, giving the regularized F-correlation: ρ_F^κ = max over f1, f2 of cov(f1(x1), f2(x2)) / sqrt((var f1(x1) + κ||f1||^2)(var f2(x2) + κ||f2||^2)), where κ is a small positive constant. Expanding: var f1(x1) + κ||f1||^2 = (1/N) α1^T K1^2 α1 + κ α1^T K1 α1 ≈ (1/N) α1^T (K1 + (Nκ/2) I)^2 α1, up to terms of order κ^2.
Kernel Canonical Correlation Now we can get the regularized KCC: ρ̂_F^κ(K1, K2) = max over α1, α2 of (α1^T K1 K2 α2) / sqrt((α1^T (K1 + (Nκ/2) I)^2 α1)(α2^T (K2 + (Nκ/2) I)^2 α2)).
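Putting the pieces together, here is a minimal sketch of the regularized KCC solved as a generalized eigenvalue problem (the kernel width, κ, and the toy data are assumptions for illustration, not values from the slides):

```python
import numpy as np
from scipy.linalg import eigh

def centered_gram(x, sigma=1.0):
    # Centered Gaussian Gram matrix of a 1-D sample: K -> H K H.
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    N = len(x)
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

def regularized_kcc(x1, x2, kappa=1e-2):
    K1, K2 = centered_gram(x1), centered_gram(x2)
    N = len(x1)
    R1 = K1 + (N * kappa / 2) * np.eye(N)   # (K1 + N*kappa/2 * I)
    R2 = K2 + (N * kappa / 2) * np.eye(N)
    Z = np.zeros((N, N))
    # [0 K1K2; K2K1 0] a = rho [R1^2 0; 0 R2^2] a
    A = np.block([[Z, K1 @ K2], [K2 @ K1, Z]])
    B = np.block([[R1 @ R1, Z], [Z, R2 @ R2]])
    return eigh(A, B, eigvals_only=True)[-1]  # largest generalized eigenvalue

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
rho_dep = regularized_kcc(x, x + 0.05 * rng.standard_normal(200))  # dependent pair
rho_ind = regularized_kcc(x, rng.standard_normal(200))             # independent pair
```

On the dependent pair the statistic is markedly larger than on the independent pair, which is what makes it usable as an independence measure (and, in the papers, as an ICA contrast function).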
Kernel Canonical Correlation Generalizing to more than two sets of variables is equivalent to a generalized eigenvalue problem of the same form: the off-diagonal blocks are Ki Kj and the diagonal blocks are (Ki + (Nκ/2) I)^2, and the smallest generalized eigenvalue measures the dependence among the sets.
Example Application Applications: -- ICA (Independent Component Analysis) -- Feature selection See the demo for the application to ICA…
Thank you!!! Questions?