Using Manifold Structure for Partially Labeled Classification
Belkin and Niyogi, NIPS 2002
Presented by Chunping Wang
Machine Learning Group, Duke University
November 16, 2007
Outline
Motivations
Algorithm Description
Theoretical Interpretation
Experimental Results
Comments
Motivations (1)
Why is manifold structure useful? Data often lies on a lower-dimensional manifold, so dimension reduction is preferable.
An example: images of a handwritten digit 0.
Usually the dimensionality is the number of pixels, which is typically very high (e.g., 256).
Ideally, a 5-dimensional feature representation would suffice.
Actually the intrinsic dimensionality is higher than that, but perhaps no more than several dozen.
[Figure: a handwritten digit 0 annotated with candidate features d1, d2, f1, f2]
Motivations (2)
Why is manifold structure useful? The data representation in the original space is unsatisfactory for classification.
[Figure: labeled and unlabeled points shown in the original space and in the 2-d representation obtained with Laplacian Eigenmaps]
Algorithm Description (1)
Semi-supervised classification: given k points x_1, ..., x_k, the first s (s < k) are labeled with c_i in {-1, +1} (binary case).
Step 1. Construct the adjacency graph: set W_ij = 1 if x_i is among the n nearest neighbors of x_j, or x_j is among the n nearest neighbors of x_i; set W_ij = 0 otherwise.
Step 2. Compute eigenvectors: for the graph Laplacian L = D - W, where D is the diagonal degree matrix with D_ii = sum_j W_ij, compute the eigenvectors e_1, ..., e_p corresponding to the p smallest eigenvalues.
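To make Steps 1-2 concrete, here is a minimal NumPy/SciPy sketch of the graph construction and eigen-decomposition. The function name laplacian_eigenvectors and its arguments are my own choices rather than the paper's, and a dense Laplacian is used for brevity (a sparse one would scale better).

import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenvectors(X, n_neighbors, p):
    # X: k x d data matrix; returns the k x p matrix whose columns are the
    # eigenvectors e_1, ..., e_p of L = D - W with the p smallest eigenvalues.
    k = X.shape[0]
    dist = cdist(X, X)                                   # pairwise Euclidean distances
    nn = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # n nearest neighbors (skip self)
    W = np.zeros((k, k))
    for i in range(k):
        W[i, nn[i]] = 1.0
    W = np.maximum(W, W.T)            # symmetric "or" rule for the adjacency graph
    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # graph Laplacian
    _, eigvecs = eigh(L)              # eigenvalues returned in ascending order
    return eigvecs[:, :p]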
Algorithm Description (2)
Semi-supervised classification (continued): the same k points, of which the first s are labeled.
Step 3. Build the classifier: minimize the error function
  Err(a) = sum_{i=1}^{s} ( c_i - sum_{j=1}^{p} a_j e_j(i) )^2
over the coefficient vector a = (a_1, ..., a_p). The solution is the least-squares estimate
  a = (E_lab^T E_lab)^{-1} E_lab^T c,
where E_lab is the s-by-p matrix of eigenvector values at the labeled points, (E_lab)_{ij} = e_j(i), and c = (c_1, ..., c_s)^T.
Step 4. Classify the unlabeled points (i > s): assign c_i = 1 if sum_{j=1}^{p} a_j e_j(i) >= 0, and c_i = -1 otherwise.
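Steps 3-4 reduce to an ordinary least-squares fit on the labeled rows of the eigenvector matrix, followed by a sign decision. The sketch below (hypothetical helper name classify_from_eigenvectors; labels assumed to be +/-1 for the first s points) uses np.linalg.lstsq, which returns the same solution as the closed form (E_lab^T E_lab)^{-1} E_lab^T c.

def classify_from_eigenvectors(E, labels_s):
    # E: k x p eigenvector matrix from the previous step;
    # labels_s: array of +/-1 labels for the first s points.
    s = len(labels_s)
    E_lab = E[:s]                                          # rows for the labeled points
    a, *_ = np.linalg.lstsq(E_lab, labels_s, rcond=None)   # least-squares coefficients a
    scores = E @ a                                         # sum_j a_j e_j(i) for every point
    pred = np.where(scores >= 0, 1, -1)                    # sign decision
    pred[:s] = labels_s                                    # keep the given labels
    return pred

# Example usage (hypothetical data X and labels):
# E = laplacian_eigenvectors(X, n_neighbors=8, p=50)
# pred = classify_from_eigenvectors(E, labels_s)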
Theoretical Interpretation (1)
For a manifold M, the eigenfunctions of its Laplacian form a basis for the Hilbert space L^2(M): any function f in L^2(M) can be written as f = sum_i a_i e_i, with eigenfunctions satisfying Delta e_i = lambda_i e_i.
The simplest nontrivial example: the manifold is the unit circle S^1. There the Laplacian is -d^2/d(phi)^2, the eigenfunctions are the constant function together with sin(n*phi) and cos(n*phi), and the expansion above is the ordinary Fourier series.
Theoretical Interpretation (2)
Smoothness measure: S(f) = integral over M of |grad f|^2; a small S(f) means f is "smooth".
For the unit circle S^1, S(sin(n*phi)) = n^2 * pi, so higher-frequency eigenfunctions are less smooth.
Generally, S(e_i) = lambda_i for normalized eigenfunctions, so smaller eigenvalues correspond to smoother (lower-frequency) eigenfunctions; e_0, with lambda_0 = 0, is a constant function.
In terms of the p smoothest eigenfunctions, the approximation of an arbitrary function f is f ≈ sum_{i=1}^{p} a_i e_i.
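As a worked instance of the smoothness claim for the circle case (my own filling-in of the slide's missing equations):

S\big(\sin(n\phi)\big)
  = \int_{0}^{2\pi}\Big|\tfrac{d}{d\phi}\sin(n\phi)\Big|^{2}\,d\phi
  = \int_{0}^{2\pi} n^{2}\cos^{2}(n\phi)\,d\phi
  = n^{2}\pi,
\qquad
-\tfrac{d^{2}}{d\phi^{2}}\sin(n\phi) = n^{2}\sin(n\phi),

so the smoothness penalty of the n-th eigenfunction equals its eigenvalue n^2 (up to normalization), and the constant eigenfunction (n = 0) has S = 0.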
Theoretical Interpretation (3)
Back to our problem with a finite number of points: the graph Laplacian L = D - W replaces the manifold Laplacian, its eigenvectors replace the eigenfunctions, and the classifier of the previous slides is the solution of a discrete version of the approximation problem, fitted on the labeled points only.
For binary classification, the alphabet of the target function f contains only two possible values; for M-ary cases, the only difference is that the number of possible values is more than two.
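The discrete analogue of the smoothness functional explains why eigenvectors of L with small eigenvalues serve as the "smooth" basis on the graph; this is a standard identity, stated here for completeness. For any vector f on the k points,

f^{T} L f
  = f^{T}(D - W)f
  = \sum_{i} D_{ii} f_i^{2} - \sum_{i,j} W_{ij} f_i f_j
  = \tfrac{1}{2}\sum_{i,j} W_{ij}\,(f_i - f_j)^{2},

which is small exactly when f varies little across edges of the adjacency graph, mirroring the manifold smoothness measure S(f) = \int_M |\nabla f|^2.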
Results (1)
Handwritten digit recognition (MNIST data set): 60,000 gray-scale images of size 28-by-28 (the first 100 principal components are used as features).
The number of eigenvectors is set to p = 20% of the number of points k.
Results (2)
Text classification (20 Newsgroups data set): 19,935 document vectors with dimensionality 6,000.
The number of eigenvectors is set to p = 20% of the number of points k.
Comments
This semi-supervised algorithm essentially converts the original problem into a linear regression problem in a new space of lower dimensionality.
The linear regression problem is solved by standard least-squares estimation.
Only the n nearest neighbors are considered for each data point, so the graph is sparse and the cost of the eigen-decomposition is reduced.
Little additional computation is expended after the dimensionality reduction.
More comments ...