Published by Anthony Pierce. Modified over 9 years ago.
1
Jan Kamenický
2
Many features ⇒ many dimensions
Dimensionality reduction
◦ Feature extraction (useful representation)
◦ Classification
◦ Visualization
3
What manifold?
◦ Low-dimensional embedding of high-dimensional data lying on a smooth nonlinear manifold
Linear methods fail
◦ e.g. PCA
4
Unsupervised methods
◦ Without any a priori knowledge
ISOMAP
◦ Isometric mapping
LLE
◦ Locally linear embedding
5
Core idea
◦ Use geodesic distances on the manifold instead of Euclidean distances
Classical MDS
◦ Maps the data to the lower-dimensional space
6
Select neighbours
◦ K-nearest neighbours
◦ ε-distance neighbourhood
Create a weighted neighbourhood graph
◦ Weights = Euclidean distances
Estimate the geodesic distances as shortest paths in the weighted graph
◦ Dijkstra's algorithm
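The graph construction and geodesic estimate above can be sketched in a few lines of NumPy/SciPy (the function name and the choice of a k-nearest-neighbour rule are my own; the slide equally allows an ε-neighbourhood):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def geodesic_distances(X, k=5):
    """Estimate geodesic distances for ISOMAP.

    Builds a k-nearest-neighbour graph on the rows of X (N x d),
    weighted by Euclidean distances, then runs Dijkstra's algorithm
    between all pairs of points.
    """
    D = squareform(pdist(X))              # full Euclidean distance matrix
    N = D.shape[0]
    G = np.zeros_like(D)                  # zero entries = no edge
    for i in range(N):
        nn = np.argsort(D[i])[1:k + 1]    # k nearest neighbours (skip self)
        G[i, nn] = D[i, nn]
    G = np.maximum(G, G.T)                # symmetrize the neighbourhood graph
    return shortest_path(G, method="D", directed=False)
```

If the neighbourhood graph is disconnected (k too small), unreachable pairs come back as `inf`.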
7
1) Set distances (0 for the initial node, ∞ for all other nodes); mark all nodes as unvisited
2) Select the unvisited node with the smallest distance as active
3) Update all unvisited neighbours of the active node (if the newly computed distance is smaller)
4) Mark the active node as visited (its distance is now minimal); repeat from 2) as necessary
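The four steps above can be sketched with Python's standard-library `heapq` as the priority queue (a binary heap rather than the Fibonacci heap discussed next; the adjacency-list format is my own choice):

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths on a weighted graph.

    adj: dict mapping node -> list of (neighbour, weight) pairs.
    Returns a dict of shortest distances from source.
    """
    dist = {v: float("inf") for v in adj}   # step 1: init distances
    dist[source] = 0.0
    pq = [(0.0, source)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)            # step 2: smallest unvisited
        if u in visited:
            continue
        visited.add(u)                      # step 4: distance now final
        for v, w in adj[u]:                 # step 3: relax neighbours
            if v not in visited and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```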
8
Time complexity
◦ O(|E|·T_dec + |V|·T_min), where T_dec and T_min are the costs of the heap's decrease-key and extract-min operations
Implementation
◦ Sparse edges
◦ Fibonacci heap as a priority queue
◦ O(|E| + |V| log |V|)
Geodesic distances in ISOMAP
◦ O(N² log N)
9
Input
◦ Dissimilarities (distances)
Output
◦ Data in a low-dimensional embedding, with distances corresponding to the dissimilarities
Many types of MDS
◦ Classical
◦ Metric / non-metric (number of dissimilarity matrices, symmetry, etc.)
10
Quantitative similarity
Euclidean distances (output)
One distance matrix (symmetric)
Minimizing the stress function
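The slide does not spell out which stress function is meant; a common choice is Kruskal's stress-1, sketched here (function name is my own):

```python
import numpy as np
from scipy.spatial.distance import pdist

def stress(Y, delta):
    """Kruskal's stress-1 between an embedding Y (N x q) and target
    dissimilarities delta (condensed vector, as returned by pdist)."""
    d = pdist(Y)                  # pairwise Euclidean distances in the embedding
    return np.sqrt(np.sum((d - delta) ** 2) / np.sum(d ** 2))
```

A perfect embedding (distances equal to the dissimilarities) gives stress 0.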
11
We can optimize directly
◦ Compute the double-centered distance matrix B = −½ J D⁽²⁾ J, where D⁽²⁾ contains the squared distances and J = I − (1/N)11ᵀ
◦ Note: for centered data X, B = XXᵀ
◦ Perform SVD (eigendecomposition) of B = VΛVᵀ
◦ Compute the final data Y = V_q Λ_q^{1/2}
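These steps translate directly into NumPy; a minimal sketch of classical MDS (function name is my own, and `eigh` is used in place of a full SVD since B is symmetric):

```python
import numpy as np

def classical_mds(D, q=2):
    """Classical MDS: embed N points in q dimensions so that Euclidean
    distances approximate the given distance matrix D (N x N)."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered matrix
    w, V = np.linalg.eigh(B)                  # eigenvalues ascending
    idx = np.argsort(w)[::-1][:q]             # q largest eigenvalues
    L = np.sqrt(np.maximum(w[idx], 0.0))      # clip tiny negative eigenvalues
    return V[:, idx] * L                      # N x q embedding
```

For a distance matrix that is exactly Euclidean in q dimensions, the embedding recovers the original configuration up to rotation, reflection, and translation.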
12
Covariance matrix S = (1/N) XᵀX
Projection of the centered X onto the eigenvectors of NS (i.e., the result of the PCA of X)
15
How many dimensions to use?
◦ Residual variance
Short-circuiting
◦ Too large a neighbourhood (not enough data)
◦ Non-isometric mapping
◦ Totally destroys the final embedding
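The residual-variance criterion is commonly computed as 1 − R² between the geodesic distances and the Euclidean distances in the candidate embedding; one then picks the dimensionality where this curve flattens out. A sketch under that assumption (function name is my own):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def residual_variance(D_geo, Y):
    """1 - R^2 between geodesic distances D_geo (N x N) and Euclidean
    distances in the candidate embedding Y (N x q)."""
    d_emb = squareform(pdist(Y))
    r = np.corrcoef(D_geo.ravel(), d_emb.ravel())[0, 1]
    return 1.0 - r ** 2
```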
16
Conformal ISOMAP
◦ Modified weights in the geodesic distance estimate: w(i,j) = d(i,j) / √(M(i)·M(j)), where M(i) is the mean distance from x_i to its neighbours
◦ Magnifies regions with high density
◦ Shrinks regions with low density
18
Landmark ISOMAP
◦ Use only the geodesic distances from several landmark points (on the manifold)
◦ Use Landmark-MDS to find the embedding
Involves triangulation of the non-landmark data
◦ Significantly faster, but a higher chance of "short-circuiting"; the number of landmarks has to be chosen carefully
19
Kernel ISOMAP
◦ Ensures that B (the double-centered distance matrix) is positive semidefinite by the constant-shifting method
20
Core idea
◦ Estimate each point as a linear combination of its neighbours – find the best such weights
◦ The same linear representation will hold in the low-dimensional space
21
Find the weights W_ij by constrained minimization
◦ minimize Σ_i ‖x_i − Σ_j W_ij x_j‖² subject to Σ_j W_ij = 1 (W_ij = 0 unless x_j is a neighbour of x_i)
Neighbourhood-preserving mapping
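The constrained minimization decouples over points: for each x_i it is a small least-squares problem on the local Gram matrix, with the sum-to-one constraint enforced by normalization. A sketch (function name and the regularization constant are my own; regularization is needed when the number of neighbours exceeds the input dimension):

```python
import numpy as np

def lle_weights(X, neighbours, reg=1e-3):
    """LLE reconstruction weights.

    X: data matrix (N x d); neighbours: list of neighbour-index lists.
    For each x_i, minimizes ||x_i - sum_j W_ij x_j||^2 s.t. sum_j W_ij = 1.
    """
    N = X.shape[0]
    W = np.zeros((N, N))
    for i, nn in enumerate(neighbours):
        Z = X[nn] - X[i]                                # centre on x_i
        C = Z @ Z.T                                     # local Gram matrix
        C += reg * np.trace(C) * np.eye(len(nn))        # regularize
        w = np.linalg.solve(C, np.ones(len(nn)))
        W[i, nn] = w / w.sum()                          # sum-to-one
    return W
```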
22
Low-dimensional representation Y
◦ Minimize Σ_i ‖y_i − Σ_j W_ij y_j‖² = tr(Yᵀ M Y) with M = (I − W)ᵀ(I − W)
We take the eigenvectors of M corresponding to its q+1 smallest eigenvalues, discarding the bottom (constant) one
Actually, different algebra is used to improve numerical stability and speed
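A direct (dense) sketch of this final step, before any of the stability/speed tricks the slide alludes to (function name is my own):

```python
import numpy as np

def lle_embedding(W, q):
    """Low-dimensional coordinates from LLE weights W (N x N).

    Forms M = (I - W)^T (I - W), takes the eigenvectors for the q+1
    smallest eigenvalues, and discards the bottom (constant) one.
    """
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    w, V = np.linalg.eigh(M)        # eigenvalues in ascending order
    return V[:, 1:q + 1]            # skip the constant eigenvector
```

In practice one would use a sparse eigensolver (e.g. `scipy.sparse.linalg.eigsh`) since W has only k nonzeros per row.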
25
ISOMAP
◦ Preserves global geometric properties (geodesic distances), especially for faraway points
LLE
◦ Preserves only local neighbourhood correspondence
◦ Overcomes non-isometric mappings
◦ The manifold is not explicitly required
◦ Difficult to estimate q (the number of dimensions)
26
The end