Properties of Kernels Presenter: Hongliang Fei Date: June 11, 2009
Overview Inner product and Hilbert space Characteristics of kernels The kernel Matrix Kernel construction
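Hilbert spaces Linear function: Given a vector space X over the reals, a function $f: X \to \mathbb{R}$ is linear if $f(ax) = a f(x)$ and $f(x + z) = f(x) + f(z)$ for all $x, z \in X$ and $a \in \mathbb{R}$. Inner product space: A vector space X over the reals $\mathbb{R}$ is an inner product space if there exists a real-valued symmetric bilinear (linear in each argument) map $\langle \cdot, \cdot \rangle$ that satisfies $\langle x, x \rangle \ge 0$ for all $x \in X$. The inner product is strict if $\langle x, x \rangle = 0$ only when $x = 0$.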
Hilbert spaces A Hilbert space F is an inner product space with the additional properties that it is separable and complete. Completeness refers to the property that every Cauchy sequence $\{h_n\}_{n \ge 1}$ of elements of F converges to an element $h \in F$. A space F is separable if and only if it admits a countable orthonormal basis.
Cauchy-Schwarz inequality In an inner product space, $\langle x, z \rangle^2 \le \|x\|^2 \|z\|^2$, and the equality sign holds in a strict inner product space if and only if x and z are rescalings of the same vector.
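Gram matrix Given a set of vectors $S = \{x_1, \dots, x_\ell\}$, the Gram matrix is the $\ell \times \ell$ matrix G with entries $G_{ij} = \langle x_i, x_j \rangle$. If the inner products are evaluated in a feature space through a kernel k with feature map $\phi$, the entries are $G_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j)$, and G is called the kernel matrix.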
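As a minimal sketch of the Gram matrix of a finite set of vectors, where entry (i, j) is the inner product of the i-th and j-th vectors, the following NumPy snippet builds it for three points in the plane. The helper name `gram_matrix` and the sample data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def gram_matrix(X):
    """Return G with G[i, j] = <x_i, x_j>, where x_i is the i-th row of X.

    Hypothetical helper for illustration; uses the standard dot product.
    """
    X = np.asarray(X, dtype=float)
    return X @ X.T

# Three sample points in R^2 (made-up data).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

G = gram_matrix(X)
print(G)
# The diagonal holds squared norms; the matrix is symmetric by construction.
```

Replacing the dot product with a kernel evaluation k(x_i, x_j) gives the kernel matrix in the same way.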
Positive semi-definite matrices A symmetric matrix A is positive semi-definite if and only if its eigenvalues are all non-negative; equivalently, $v^\top A v \ge 0$ for all vectors v. A symmetric matrix is positive definite if and only if its eigenvalues are all positive. Gram and kernel matrices are positive semi-definite.
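The eigenvalue characterization can be checked numerically. The sketch below assumes a small tolerance to absorb floating-point round-off; the function name `is_psd` and the random test data are illustrative choices.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check symmetry and that all eigenvalues are >= -tol.

    Hypothetical helper; tol is an assumed round-off allowance.
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    # eigvalsh is intended for symmetric (Hermitian) matrices.
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # five random points in R^3
G = X @ X.T                   # Gram matrix, so it should pass the check

not_psd = np.array([[1.0, 2.0],
                    [2.0, 1.0]])  # eigenvalues are 3 and -1

print(is_psd(G), is_psd(not_psd))
```

The Gram matrix passes because $v^\top G v = \|X^\top v\|^2 \ge 0$ for every v, while the second matrix fails because one eigenvalue is negative.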
Finitely positive semi-definite functions A function $k: X \times X \to \mathbb{R}$ satisfies the finitely positive semi-definite property if it is a symmetric function for which the matrices formed by restriction to any finite subset of the space X are positive semi-definite.
Mercer Kernel Theorem A function $k: X \times X \to \mathbb{R}$ which is either continuous or has a finite domain can be decomposed as $k(x, z) = \langle \phi(x), \phi(z) \rangle$, that is, into a feature map $\phi$ into a Hilbert space F applied to both its arguments followed by the evaluation of the inner product in F, if and only if it satisfies the finitely positive semi-definite property.
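One concrete instance of such a decomposition, chosen here purely for illustration, is the quadratic kernel $k(x, z) = \langle x, z \rangle^2$ on $\mathbb{R}^2$, whose explicit feature map is $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$. The sketch below checks that evaluating the kernel directly agrees with the inner product in feature space; the function names and sample points are assumptions.

```python
import numpy as np

def k_quadratic(x, z):
    """Quadratic kernel k(x, z) = <x, z>^2 (example kernel, not from the slides)."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit feature map for k_quadratic on R^2."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

direct = k_quadratic(x, z)                      # <x, z> = 3 - 2 = 1, squared
via_features = float(np.dot(phi(x), phi(z)))    # inner product in feature space

print(direct, via_features)
```

Both routes give the same value because $(x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2$.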
The kernel matrix Implementation issues Kernels and prior knowledge Kernel Selection Kernel Alignment
Kernel Selection Ideally, select the optimal kernel based on prior knowledge of the problem domain. In practice, consider a family of kernels defined in a way that again reflects our prior expectations, and choose a kernel from that family. Simple way: require only a limited amount of additional information from the training data. Elaborate way: combine label information.
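Kernel Alignment Measures the similarity between two kernels. The alignment $A(K_1, K_2)$ between two kernel matrices $K_1$ and $K_2$ is given by $A(K_1, K_2) = \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F \, \langle K_2, K_2 \rangle_F}}$, where $\langle K_1, K_2 \rangle_F = \sum_{i,j} (K_1)_{ij} (K_2)_{ij}$ is the Frobenius inner product.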
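A minimal sketch of the alignment as a normalized Frobenius inner product between two kernel matrices follows. Comparing a kernel matrix with the label matrix $y y^\top$ is a common use of alignment but is an assumption here, as are the function names and the toy data.

```python
import numpy as np

def frobenius_inner(A, B):
    """Frobenius inner product: sum over i, j of A[i, j] * B[i, j]."""
    return float(np.sum(np.asarray(A) * np.asarray(B)))

def alignment(K1, K2):
    """Alignment A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    return frobenius_inner(K1, K2) / np.sqrt(
        frobenius_inner(K1, K1) * frobenius_inner(K2, K2)
    )

# Toy two-class data (made up for illustration).
X = np.array([[1.0, 0.2],
              [0.9, 0.1],
              [-1.0, 0.0],
              [-0.8, -0.3]])
y = np.array([1.0, 1.0, -1.0, -1.0])

K_lin = X @ X.T            # linear kernel matrix
K_target = np.outer(y, y)  # assumed target matrix built from the labels

a_self = alignment(K_target, K_target)
a_lin = alignment(K_lin, K_target)
print(a_self, a_lin)
```

The alignment of a matrix with itself is 1, and for two positive semi-definite matrices the value lies between 0 and 1, so it behaves like a cosine between the matrices viewed as vectors.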
Kernel Construction
Operations on Kernel matrices Simple transformation Centering data Subspace projection: chapter 6 Whitening: Set all eigenvalues to 1 (spherically symmetric)
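Centering can be carried out on the kernel matrix alone, without access to the feature vectors. The sketch below assumes the standard centering matrix $H = I - \frac{1}{\ell} \mathbf{1}\mathbf{1}^\top$ and checks the result against explicit centering for a linear kernel; the function name and data are illustrative.

```python
import numpy as np

def center_kernel(K):
    """Return H K H with H = I - (1/l) * ones, i.e. the kernel matrix of
    the data after subtracting the feature-space mean."""
    K = np.asarray(K, dtype=float)
    l = K.shape[0]
    H = np.eye(l) - np.ones((l, l)) / l
    return H @ K @ H

rng = np.random.default_rng(1)
X = rng.normal(loc=2.0, size=(6, 3))  # made-up data with a non-zero mean

K = X @ X.T               # linear kernel on the raw data
K_centered = center_kernel(K)

# For the linear kernel the same matrix is obtained by centering X directly.
X_centered = X - X.mean(axis=0)
print(np.allclose(K_centered, X_centered @ X_centered.T))
```

After centering, every row and column of the kernel matrix sums to zero, which reflects the zero mean of the centered feature vectors.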
That's all. Any questions?