Manifold Learning Techniques: So which is the best?

Slides:

Advertisements

Similar presentations

Coherent Laplacian 3D protrusion segmentation Oxford Brookes Vision Group Queen Mary, University of London, 11/12/2009 Fabio Cuzzolin.

Advertisements

Filling Algorithms Pixelwise MRFsChaos Mosaics Patch segments are pasted, overlapping, across the image. Then either: Ambiguities are removed by smoothing.

Text mining Gergely Kótyuk Laboratory of Cryptography and System Security (CrySyS) Budapest University of Technology and Economics

Scale & Affine Invariant Interest Point Detectors Mikolajczyk & Schmid presented by Dustin Lennon.

Non-linear Dimensionality Reduction by Locally Linear Inlaying Presented by Peng Zhang Tianjin University, China Yuexian Hou, Peng Zhang, Xiaowei Zhang,

Unsupervised Learning Clustering K-Means. Recall: Key Components of Intelligent Agents Representation Language: Graph, Bayes Nets, Linear functions Inference.

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

Surface Reconstruction From Unorganized Point Sets

Surface normals and principal component analysis (PCA)

Siddharth Choudhary.  Refines a visual reconstruction to produce jointly optimal 3D structure and viewing parameters  ‘bundle’ refers to the bundle.

Presented by: Mingyuan Zhou Duke University, ECE April 3, 2009

Non-linear Dimensionality Reduction CMPUT 466/551 Nilanjan Ray Prepared on materials from the book Non-linear dimensionality reduction By Lee and Verleysen,

Clustering and Dimensionality Reduction Brendan and Yifang April

University of Joensuu Dept. of Computer Science P.O. Box 111 FIN Joensuu Tel fax Isomap Algorithm.

Ronald R. Coifman , Stéphane Lafon, 2006

“Random Projections on Smooth Manifolds” -A short summary

Mesh Simplification Global and Local Methods:

Locally Constraint Support Vector Clustering

Dimensionality Reduction and Embeddings

Dimensional reduction, PCA

Clustering Color/Intensity

Three Algorithms for Nonlinear Dimensionality Reduction Haixuan Yang Group Meeting Jan. 011, 2005.

reconstruction process, RANSAC, primitive shapes, alpha-shapes

CS664 Lecture #19: Layers, RANSAC, panoramas, epipolar geometry Some material taken from:  David Lowe, UBC  Jiri Matas, CMP Prague

Manifold learning and pattern matching with entropic graphs Alfred O. Hero Dept. EECS, Dept Biomed. Eng., Dept. Statistics University of Michigan - Ann.

A Global Geometric Framework for Nonlinear Dimensionality Reduction Joshua B. Tenenbaum, Vin de Silva, John C. Langford Presented by Napat Triroj.

Atul Singh Junior Undergraduate CSE, IIT Kanpur.  Dimension reduction is a technique which is used to represent a high dimensional data in a more compact.

Dimensionality Reduction

1 Numerical geometry of non-rigid shapes Non-Euclidean Embedding Non-Euclidean Embedding Lecture 6 © Alexander & Michael Bronstein tosca.cs.technion.ac.il/book.

NonLinear Dimensionality Reduction or Unfolding Manifolds Tennenbaum|Silva|Langford [Isomap] Roweis|Saul [Locally Linear Embedding] Presented by Vikas.

Lightseminar: Learned Representation in AI An Introduction to Locally Linear Embedding Lawrence K. Saul Sam T. Roweis presented by Chan-Su Lee.

Dimensionality Reduction. Multimedia DBs Many multimedia applications require efficient indexing in high-dimensions (time-series, images and videos, etc)

Nonlinear Dimensionality Reduction by Locally Linear Embedding Sam T. Roweis and Lawrence K. Saul Reference: "Nonlinear dimensionality reduction by locally.

Diffusion Maps and Spectral Clustering

Nonlinear Dimensionality Reduction Approaches. Dimensionality Reduction The goal: The meaningful low-dimensional structures hidden in their high-dimensional.

Manifold learning: Locally Linear Embedding Jieping Ye Department of Computer Science and Engineering Arizona State University

Gwangju Institute of Science and Technology Intelligent Design and Graphics Laboratory Multi-scale tensor voting for feature extraction from unstructured.

Graph Embedding: A General Framework for Dimensionality Reduction Dong XU School of Computer Engineering Nanyang Technological University

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Adaptive nonlinear manifolds and their applications to pattern.

Dimensionality reduction: Some Assumptions High-dimensional data often lies on or near a much lower dimensional, curved manifold. A good way to represent.

Learning a Kernel Matrix for Nonlinear Dimensionality Reduction By K. Weinberger, F. Sha, and L. Saul Presented by Michael Barnathan.

Computer Vision Lab. SNU Young Ki Baik Nonlinear Dimensionality Reduction Approach (ISOMAP, LLE)

ISOMAP TRACKING WITH PARTICLE FILTER Presented by Nikhil Rane.

GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer.

Manifold learning: MDS and Isomap

1 LING 696B: MDS and non-linear methods of dimension reduction.

Nonlinear Dimensionality Reduction Approach (ISOMAP)

Jan Kamenický.  Many features ⇒ many dimensions  Dimensionality reduction ◦ Feature extraction (useful representation) ◦ Classification ◦ Visualization.

Data Mining Course 0 Manifold learning Xin Yang. Data Mining Course 1 Outline Manifold and Manifold Learning Classical Dimensionality Reduction Semi-Supervised.

Project 11: Determining the Intrinsic Dimensionality of a Distribution Okke Formsma, Nicolas Roussis and Per Løwenborg.

Project 11: Determining the Intrinsic Dimensionality of a Distribution Okke Formsma, Nicolas Roussis and Per Løwenborg.

Tony Jebara, Columbia University Advanced Machine Learning & Perception Instructor: Tony Jebara.

Week 6. Statistics etc. GRS LX 865 Topics in Linguistics.

Math 285 Project Diffusion Maps Xiaoyan Chong Department of Mathematics and Statistics San Jose State University.

CSC321: Lecture 25: Non-linear dimensionality reduction Geoffrey Hinton.

CSC321: Extra Lecture (not on the exam) Non-linear dimensionality reduction Geoffrey Hinton.

Manifold Learning JAMES MCQUEEN – UW DEPARTMENT OF STATISTICS.

Spectral Methods for Dimensionality

Nonlinear Dimensionality Reduction

We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological.

Manifold Learning Student: Ali Taalimi 05/05/2012.

Unsupervised Riemannian Clustering of Probability Density Functions

کاربرد نگاشت با حفظ تنکی در شناسایی چهره

Dipartimento di Ingegneria «Enzo Ferrari»,

ISOMAP TRACKING WITH PARTICLE FILTERING

Jianping Fan Dept of CS UNC-Charlotte

Outline Nonlinear Dimension Reduction Brief introduction Isomap LLE

Domain-Modeling Techniques

Object Modeling with Layers

NonLinear Dimensionality Reduction or Unfolding Manifolds

Presentation transcript:

Manifold Learning Techniques: So which is the best? Todd Wittman Math 8600: Geometric Data Analysis Instructor: Gilad Lerman Spring 2005

Manifold Learning for Dimensionality Reduction Goal: We want to map a D-dimensional data set X to a d-dimensional data set Y, preserving the local geometries on the original manifold as much as possible. Methods we’ve learned MDS PCA ISOMAP LLE Hessian LLE Laplacian Eigenmap Diffusion Maps KNN Diffusion (Mauro) So which is the best? First let’s make some predictions...

PCA Summary Predictions Just SVD of covariance matrix XX’. Take largest d eigenvectors. Predictions Doesn’t really infer the geometry of the manifold. Assumes the data Y represents the projection onto a d-dimensional space. But it’s extremely fast, so it might be good to give it first crack at your data.

MDS Summary Predictions Iterates to answer by learning the correct eigen decomposition. Predictions Like PCA, does not seem to have any built-in mechanism for determining the geometry of the manifold. Also much slower than PCA. Requires parameters describing learning rate and number of iterations.

General Framework ISOMAP, LLE, Hessian, Laplacian, & KNN Diffusion are all based on KNN graphs. The graph-based algorithms have 3 basic steps. 1. Find K nearest neighbors. 2. Estimate local properties of manifold by looking at neighborhoods found in Step 1. 3. Find a global embedding that preserves the properties found in Step 2.

ISOMAP (Tennenbaum, DeSilva, & Langford) Summary Build graph from K Nearest Neighbors. Run MDS. Predictions Since MDS is slow, ISOMAP will be very slow. Need estimate of K. Assumes data set is convex (no holes). ISOMAP removes outliers in pre-processing. So is it extra-sensitive to noise?

LLE (Saul & Roweis) Summary (see Effie’s talk) Predictions Build graph from K Nearest Neighbors. Determine reconstruction weights by assuming neighborhoods are locally linear and insuring invariance. Determine embedding. Predictions If it’s locally linear, can it handle curvature and corners? What about noise?

Laplacian Eigenmap (Belkin & Nyogi) Summary Build graph from K Nearest Neighbors. Construct weighted adjacency matrix with Gaussian kernel. Compute embedding from normalized Laplacian. Predictions Assumes each point lies in the convex hull of its neighbors. So it might have trouble at the boundary. Will have difficulty with non-uniform sampling.

Hessian LLE (Donoho & Grimes) Summary Build graph from K Nearest Neighbors. Estimate tangent Hessians. Compute embedding based on Hessians. Predictions Specifically set up to handle non-convexity. Slower than LLE & Laplacian. Will perform poorly in sparse regions. Only method with convergence guarantees.

Diffusion Map (Coifman & Lafon) Summary Find Gaussian kernel Normalize kernel Apply weighted graph Laplacian. Compute SVD of A Predictions Doesn’t seem to infer geometry directly. Need to set parameters alpha and sigma.

KNN Diffusion (Mauro) Summary Predictions Build graph from K nearest neighbors. Run Diffusion Map on graph. Predictions Should infer geometry better than Diffusion Map. Now we have to set the parameters alpha, sigma, and K.

How do we compare the methods? Speed Manifold Geometry Non-convexity Curvature Corners High-Dimensional Data: Can the method process image manifolds? Sensitivity to Parameters K Nearest Neighbors: Isomap, LLE, Hessian, Laplacian, KNN Diffusion Sigma: Diffusion Map, KNN Diffusion Noise Non-uniform Sampling Sparse Data Clustering

Testing Examples I’ve generated a set of examples to test the methods. Many of these examples were taken from the original papers. Swiss Roll Swiss Hole Punctured Sphere Corner Planes 3D Clusters Except for the last one, we map 3D to 2D data sets so we can visualize the results. We’ll compare the speed and sensitivity to parameters throughout. Twin Peaks Toroidal Helix Gaussian Occluded Disks

Manifold Geometry First, let’s try to unroll the Swiss Roll. We should see a plane.

Hessian LLE is pretty slow, MDS is very slow, and ISOMAP is extremely slow. MDS and PCA don’t can’t unroll Swiss Roll, use no manifold information. LLE and Laplacian can’t handle this data. Diffusion Maps could not unroll Swiss Roll for any value of Sigma.

Non-Convexity Can we handle a data set with a hole? Swiss Hole: Can we still unroll the Swiss Roll when it has a hole in the middle?

Only Hessian LLE can handle non-convexity. ISOMAP, LLE, and Laplacian find the hole but the set is distorted.

Manifold Geometry Twin Peaks: fold up the corners of a plane. LLE will have trouble because it introduces curvature to plane.

PCA, LLE, and Hessian LLE distort the mapping the most.

Curvature & Non-uniform Sampling Gaussian: We can randomly sample a Gaussian distribution. We increase the curvature by decreasing the standard deviation. Coloring on the z-axis, we should map to concentric circles.

For std = 1 (low curvature), MDS and PCA can project accurately. Laplacian Eigenmap cannot handle the change in sampling.

For std = 0.4 (higher curvature), PCA projects from the side rather than top-down. Laplacian looks even worse.

For std = 0.3 (high curvature), none of the methods can project correctly.

Corners Corner Planes: We bend a plane with a lift angle A. We want to bend it back down to a plane. If A > 90, we might see the data points written on top of each other. A

For angle A=75, we see some disortions in PCA and Laplacian.

For A = 135, MDS, PCA, and Hessian LLE overwrite the data points. Diffusion Maps work very well for Sigma < 1. LLE handles corners surprisingly well.

Clustering A good mapping should preserve clusters in the original data set. 3D Clusters: Generate M non-overlapping clusters with random centers. Connect the clusters with a line.

For M = 3 clusters, MDS and PCA can project correctly. Diffusion Maps work well with large Sigma. LLE compresses each cluster into a single point. Hessian LLE has trouble with the sparse connecting lines.

For M=8 clusters, MDS and PCA can still recover. Diffusion Maps do quite well. LLE and ISOMAP are decent, but Hessian and Laplacian fail.

Noise & Non-uniform Sampling Can the method handle changes from dense to sparse regions? Toroidal Helix should be unraveled into a circle parametrized by t. We can change the sampling rate along the helix by changing the exponent R on the parameter t and we can add some noise.

With no noise added, ISOMAP, LLE, Laplacian, and Diffusion Map are correct. MDS and PCA project to an asterisk. What’s up with Hessian and KNN Diffusion?

Adde noise to the Helix sampling. LLE cannot recover the circle. ISOMAP emphasizes outliers more than the other methods.

When the sampling rate is changed along the torus, Laplacian starts to mess up and Hessian is completely thrown off. Hessian LLE code crashed frequently on this example. Diffusion maps handle it quite well for carefully chosen Sigma=0.3.

Sparse Data & Non-uniform Sampling Of course, we want as much data as possible. But can the method handle sparse regions in the data? Punctured Sphere: the sampling is very sparse at the bottom and dense at the top.

Only LLE and Laplacian get decent results. PCA projects the sphere from the side. MDS turns it inside-out. Hessian and Diffusion Maps get correct shape, but give too much emphasis to the sparse region at the bottom of the sphere.

High-Dimensional Data All of the examples so far have been 3D. But can the data handle high-dimensional data sets, like images? Disks: Create 20x20 images with a disk of fixed radius and random center. We should recover the centers of the circles.

??? LLE Crashed LLE crashed on high-dimensional data set. Number of images was not high enough, but ISOMAP did a very good job.

Occluded Disks We can add a second disk of radius R in the center of every image. As Joe Kenney pointed out, this creates a non-convex data set by occlusion.

??? LLE Crashed Hessian Crashed Both LLE and Hessian crashed, possibly # points is too small. Laplacian failed completely. Is ISOMAP the best for high-dimensional data?

Sensitivity to Parameters When the number of points is small or the data geometry is complex, it is important to set K appropriately, neither too big nor small. But if the data set is dense enough, we expect K around 8 or 10 to suffice. For Diffusion Maps, the method is very sensitive to the Sigma in Gaussian kernel. Varies from example to example.

Diffusion Map Sigma depends on manifold. Helix X Sigma = 10 Sigma = 0.2 Clusters

So what have you learned, Dorothy? MDS PCA ISOMAP LLE Hessian Laplacian Diffusion Map KNN Diffusion Speed Very slow Extremely fast Extremely slow Fast Slow Infers geometry? NO YES MAYBE Handles non-convex? Handles non-uniform sampling? Handles curvature? Handles corners? Clusters? Handles noise? Handles sparsity? may crash Sensitive to parameters? VERY

So which is the best? Depends... ISOMAP is just too slow, so don’t use it unless the data is high-dimensional or you’ve got a whole lot of time. It’s too hard to set the parameters for Diffusion Maps. Is the data low-dimensional and its projection is d-dimensional? YES NO Is the data non-convex and densely sampled? PCA YES NO Is the data noisy or uniformly sampled? Hessian YES NO Laplacian LLE

MANI MANIfold learning demonstration GUI Contains the methods and examples I’ve shown you. Now for a demo!

Some Notes on using MANI Hard to set K and Sigma just right. MDS and ISOMAP are very slow. Hessian LLE is pretty slow. Since Hessian needs a dense data set, this means it takes even longer when the # points is large. Occluded Disks is 400-dimensional data, which takes a long time and a lot of data points to correctly map. To run KNN Diffusion, requires unzipping Mauro’s EigenFunctions Package into the same directory as mani.m Matlab GUIs seem to run better on PC than Linux.

That’s All Folks! The mani.m file will be e-mailed to you. If you try it out, please let me know what you think. Drop me an e-mail at: wittman@math.umn.edu Thanks for listening!