University of Joensuu Dept. of Computer Science P.O. Box 111 FIN- 80101 Joensuu Tel. +358 13 251 7959 fax +358 13 251 7955 www.cs.joensuu.fi Isomap Algorithm.


1 University of Joensuu Dept. of Computer Science P.O. Box 111 FIN- 80101 Joensuu Tel. +358 13 251 7959 fax +358 13 251 7955 www.cs.joensuu.fi Isomap Algorithm http://isomap.stanford.edu/ Yuri Barseghyan Yasser Essiarab

2 Linear Methods for Dimensionality Reduction
–PCA (Principal Component Analysis): rotate the data so that the principal axes lie in the directions of maximum variance
–MDS (Multi-Dimensional Scaling): find coordinates that best preserve pairwise distances
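As a rough illustration of the MDS bullet, here is a minimal sketch of classical (Torgerson) MDS, which is one common variant and not necessarily the one the authors had in mind. The function name `classical_mds` and the unit-square example are assumptions made for illustration.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical MDS: coordinates whose Euclidean distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)

# Usage: the four corners of a unit square (hypothetical example data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, n_components=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

The recovered coordinates `Y` may be rotated or reflected relative to `X`, but the pairwise distances `D_rec` should match `D` up to floating-point error.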

3 Limitations of Linear Methods
What if the data does not lie within a linear subspace?
Do all convex combinations of the measurements generate plausible data?
Low-dimensional non-linear manifold embedded in a higher-dimensional space
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf

4 Non-linear Dimensionality Reduction
What about data that cannot be described by a linear combination of latent variables?
–Ex: Swiss roll, S-curve
In the end, linear methods such as PCA do nothing more than “globally transform” (rotate/translate/scale) the data.
Sometimes we need to “unwrap” the data first
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

5 Non-linear Dimensionality Reduction
Unwrapping the data = “manifold learning”
Assume the data can be embedded on a lower-dimensional manifold
Given a data set X = {x_i}, i = 1…n, find a representation Y = {y_i}, i = 1…n, where Y lies on a lower-dimensional manifold
Instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods

6 Isometry
From MathWorld: two Riemannian manifolds M and N are isometric if there is a diffeomorphism such that the Riemannian metric from one pulls back to the metric on the other.
For a complete Riemannian manifold: d(x, y) = geodesic distance between x and y
Informally, an isometry is a smooth invertible mapping that looks locally like a rotation plus translation
Intuitively, for the 2-dimensional case, isometries include whatever physical transformations one can perform on a sheet of paper without introducing tears, holes, or self-intersections

7 Trustworthiness [2]
The trustworthiness M(t) quantifies how trustworthy a projection of a high-dimensional data set onto a low-dimensional space is.
Specifically, a projection is trustworthy if the t nearest neighbors of each data point in the low-dimensional space are also close by in the original space.
r(i, j) is the rank of the data point j in the ordering according to the distance from i in the original data space
U_t(i) denotes the set of those data points that are among the t nearest neighbors of the data point i in the low-dimensional space but not in the original space.
The maximal value that trustworthiness can take is one. The closer M(t) is to one, the better the low-dimensional space describes the original data.
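The formula for M(t) did not survive in this transcript. The sketch below assumes the commonly cited definition from Venna and Kaski, M(t) = 1 − 2 / (N t (2N − 3t − 1)) · Σ_i Σ_{j ∈ U_t(i)} (r(i, j) − t), which is consistent with the symbols described on the slide but is not confirmed by it. The function names and the toy data are hypothetical.

```python
import math

def _ranks(points):
    """ranks[i][j] = rank of j by distance from i (nearest other point has rank 1)."""
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        r = [0] * n
        for pos, j in enumerate(order, start=1):
            r[j] = pos
        ranks.append(r)
    return ranks

def trustworthiness(X_high, X_low, t):
    """Assumed definition: 1 - 2/(N t (2N - 3t - 1)) * sum over U_t(i) of (r(i,j) - t)."""
    n = len(X_high)
    if not 0 < t < n / 2:
        raise ValueError("this normalisation assumes 0 < t < N/2")
    r_high = _ranks(X_high)   # ranks in the original space
    r_low = _ranks(X_low)     # ranks in the projection
    penalty = 0
    for i in range(n):
        for j in range(n):
            # j is in U_t(i): a t-nearest neighbor in the projection only.
            if j != i and r_low[i][j] <= t and r_high[i][j] > t:
                penalty += r_high[i][j] - t
    return 1.0 - 2.0 / (n * t * (2 * n - 3 * t - 1)) * penalty

# Usage with hypothetical 1-D data: a faithful projection and a scrambled one.
X_high = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
faithful = list(X_high)
scrambled = [(0.0,), (3.0,), (1.0,), (5.0,), (2.0,), (4.0,)]
m_faithful = trustworthiness(X_high, faithful, t=2)
m_scrambled = trustworthiness(X_high, scrambled, t=2)
```

A projection that keeps every neighborhood intact scores exactly one; the scrambled projection pulls distant points into each other's neighborhoods and scores lower.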

8 Several methods to learn a manifold
Two to start:
–Isomap [Tenenbaum 2000]
–Locally Linear Embeddings (LLE) [Roweis and Saul, 2000]
Recently:
–Semidefinite Embeddings (SDE) [Weinberger and Saul, 2005]

9 An important observation
Small patches on a non-linear manifold look linear
These locally linear neighborhoods can be defined in two ways
–k-nearest neighbors: find the k nearest points to a given point, under some metric. Guarantees all items are similarly represented, limits dimension to k-1
–ε-ball: find all points that lie within ε of a given point, under some metric. Best if the density of items is high and every point has a sufficient number of neighbors
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf
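The two neighborhood definitions can be compared with a small sketch. It assumes the Euclidean metric (the slide allows any metric), and the function names and sample points are made up for illustration.

```python
import math

def knn_neighbors(points, k):
    """For each point, the indices of its k nearest other points (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result

def eps_ball_neighbors(points, eps):
    """For each point, the indices of all other points within distance eps."""
    return [[j for j, q in enumerate(points)
             if j != i and math.dist(p, q) <= eps]
            for i, p in enumerate(points)]

# Hypothetical data: a dense cluster of three points plus one outlier.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
knn = knn_neighbors(pts, k=2)         # every point gets exactly 2 neighbors
ball = eps_ball_neighbors(pts, 0.5)   # the outlier gets no neighbors at all
```

The outlier illustrates the trade-off: k-nearest neighbors always returns k neighbors, even far-away ones, while the ε-ball can leave sparse regions disconnected.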

10 Isomap
Find coordinates on a lower-dimensional manifold that preserve geodesic distances instead of Euclidean distances
Key observation: if the goal is to discover the underlying manifold, geodesic distance makes more sense than Euclidean distance
Figure labels: small Euclidean distance, large geodesic distance
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf

11 Calculating geodesic distance
We know how to calculate Euclidean distance
Locally linear neighborhoods mean that we can approximate geodesic distance within a neighborhood using Euclidean distance
A graph is constructed by connecting each point to its K nearest neighbours.
Approximate geodesic distances are calculated by finding the length of the shortest path in the graph between points
Use Dijkstra’s algorithm to fill in the remaining distances
http://www.maths.lth.se/bioinformatics/calendar/20040527/NilssonJ_KI_27maj04.pdf

12 Dijkstra’s Algorithm
Greedy best-first algorithm (breadth-first search generalized to non-negative edge weights) to compute the shortest paths from one point to all other points
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
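A minimal sketch of Dijkstra’s algorithm using a binary heap follows. The adjacency-dict representation and the small example graph are assumptions for illustration; edge weights must be non-negative, which holds for distances.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path lengths from source; graph[u] is a dict {v: edge_length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical undirected graph with four nodes.
graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0, "d": 5.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"b": 5.0, "c": 1.0},
}
shortest = dijkstra(graph, "a")
```

Here the direct edge a–c (length 4) loses to the path a–b–c (length 3), which is the same effect that lets graph paths follow a curved manifold in Isomap.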

13 Isomap Algorithm
–Compute fully-connected neighborhood of points for each item (can be k nearest neighbors or ε-ball)
–Calculate pairwise Euclidean distances within each neighborhood
–Use Dijkstra’s algorithm to compute the shortest path from each point to non-neighboring points
–Run MDS on the resulting distance matrix
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
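The four steps above can be sketched end to end in a few lines. This is a simplified illustration, not the authors' implementation: it assumes k-nearest-neighbor graphs, no duplicate points, and it substitutes Floyd-Warshall for Dijkstra’s algorithm because the vectorized form is shorter (both yield the same all-pairs shortest paths). The function name `isomap` and the semicircle data are hypothetical.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Steps 1-2: Euclidean distances, kept only between k nearest neighbors.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
    G = np.minimum(G, G.T)  # make the neighborhood graph undirected
    # Step 3: all-pairs shortest paths (Floyd-Warshall stands in for Dijkstra).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Step 4: classical MDS on the approximate geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Usage: unroll a semicircle (a 1-D manifold in 2-D) onto a line.
theta = np.linspace(0.0, np.pi, 50)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
y = isomap(arc, n_neighbors=2, n_components=1)[:, 0]
```

The endpoints of the semicircle are only 2 apart in Euclidean distance, but the 1-D embedding spreads them roughly π apart, matching the arc length.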

14 Isomap Algorithm [3]

15 Time Complexity of Algorithm
http://www.cs.rutgers.edu/~elgammal/classes/cs536/lectures/NLDR.pdf

16 Isomap Results
Find a 2D embedding of the 3D S-curve
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

17 Residual Fitting Error
Plotting the eigenvalues from MDS indicates the intrinsic dimensionality of your data: look for the point where the eigenvalues drop off
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
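To make the eigenvalue argument concrete, the sketch below computes the MDS spectrum for hypothetical data: a flat 6×6 grid (intrinsically 2-D) placed linearly in 5-D. It uses plain Euclidean distances to keep the example short; in Isomap the matrix D would hold the graph-based geodesic distances instead. The name `mds_spectrum` and the matrix `W` are illustrative assumptions.

```python
import numpy as np

def mds_spectrum(D):
    """Eigenvalues (largest first) of the double-centered squared-distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    return np.sort(np.linalg.eigvalsh(B))[::-1]

# Hypothetical data: a 6x6 grid of 2-D coordinates mapped linearly into 5-D.
P = np.array([(i, j) for i in range(6) for j in range(6)], dtype=float)
W = np.array([[1.0, 1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0, 0.0]])
X = P @ W
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
spectrum = mds_spectrum(D)
```

Only the first two eigenvalues are substantially non-zero and the rest vanish up to rounding, which is the drop-off that signals a two-dimensional structure. On real, noisy data the drop is usually less sharp.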

18 Neighborhood Graph
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

19 More Isomap Results
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

20 Results on projecting the face dataset to two dimensions (Trustworthiness−Continuity) [1]

21 More Isomap Results
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

22 Isomap Failures
Isomap has problems on closed manifolds of arbitrary topology
http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

23 Isomap: Advantages
Nonlinear
Globally optimal
–Still produces a globally optimal low-dimensional Euclidean representation even though the input space is highly folded, twisted, or curved.
Guaranteed asymptotically to recover the true dimensionality.

24 Isomap: Disadvantages
Guaranteed only asymptotically to recover the geometric structure of nonlinear manifolds
–As N increases, pairwise distances provide better approximations to geodesics by “hugging the surface” more closely
–Graph discreteness overestimates d_M(i,j)
K must be small enough relative to the sampling density to avoid “linear shortcuts” near regions of high surface curvature
Mapping novel test images to the manifold space is not straightforward

25 Literature
[1] Jarkko Venna and Samuel Kaski, Nonlinear dimensionality reduction viewed as information retrieval, NIPS 2006 workshop on Novel Applications of Dimensionality Reduction, 9 Dec 2006 http://www.cis.hut.fi/projects/mi/papers/nips06_nldrws_poster.pdf
[2] Claudio Varini, Visual Exploration of Multivariate Data in Breast Cancer by Dimensional Reduction, March 2006 http://deposit.ddb.de/cgi-bin/dokserv?idn=98073472x&dok_var=d1&dok_ext=pdf&filename=98073472x.pdf
[3] Yiming Wu, Kap Luk Chan, An Extended Isomap Algorithm for Learning Multi-Class Manifold, Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference, Aug. 2004 http://ww2.cs.fsu.edu/~ywu/PDF-files/ICMLC2004.pdf

