Non-linear Dimensionality Reduction
CMPUT 466/551 Nilanjan Ray
Prepared from materials in the book Nonlinear Dimensionality Reduction by Lee and Verleysen, Springer, 2007

Agenda
What is dimensionality reduction?
Linear methods
– Principal components analysis (PCA)
– Metric multidimensional scaling (MDS)
Non-linear methods
– Distance preserving
– Topology preserving
– Auto-encoders (deep neural networks)

Dimensionality Reduction
Mapping d-dimensional data points y to p-dimensional vectors x, with p < d.
Purposes
– Visualization
– Classification/regression
Most of the time we are only interested in the forward mapping y → x; the backward mapping is difficult in general.
If both the forward and the backward mappings are linear, the method is called linear; otherwise it is a non-linear dimensionality reduction technique.

Two Benchmark Manifolds: the "Swiss roll" and the "open box" (shown as figures in the original slides).

Distance Preserving Methods
Let's say the points y_i are mapped to x_i, i = 1, 2, …, N.
Distance preserving methods try to preserve pairwise distances, i.e., d(y_i, y_j) = d(x_i, x_j), or the pairwise dot products, <y_i, y_j> = <x_i, x_j>.
What is a distance?
– Nondegeneracy: d(a, b) = 0 if and only if a = b
– Triangle inequality: for any three points a, b, and c, d(a, b) ≤ d(c, a) + d(c, b)
The other two properties, nonnegativity and symmetry, follow from these two.
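
As a small, purely illustrative check of what "distance preserving" means in code, the hypothetical helper below compares the pairwise distance matrices of the original and mapped points (the names are made up for this sketch and are not part of any particular algorithm):

```python
import numpy as np

def pairwise_dist(P):
    """N x N matrix of Euclidean distances between the rows of P."""
    return np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(-1))

def distance_preservation_error(Y, X):
    """Largest discrepancy |d(y_i, y_j) - d(x_i, x_j)| over all pairs:
    zero would mean the mapping y_i -> x_i preserves all pairwise distances."""
    return np.abs(pairwise_dist(Y) - pairwise_dist(X)).max()
```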

Metric MDS
A multidimensional scaling (MDS) method is a linear generative model like PCA: y = W x, where the y's are the d-dimensional observed variables and the x's are the p-dimensional latent variables.
W is a d-by-p matrix with the property W^T W = I_p (orthonormal columns).
So, the dot product is preserved: y_i^T y_j = x_i^T W^T W x_j = x_i^T x_j.
How about Euclidean distances? Let d^2(y_i, y_j) = (y_i - y_j)^T (y_i - y_j). Then d^2(y_i, y_j) = (x_i - x_j)^T W^T W (x_i - x_j) = d^2(x_i, x_j).
So, Euclidean distances are preserved too!

Metric MDS Algorithm
– Center the data matrix Y and compute the dot-product matrix S = Y^T Y.
– If the data matrix is not available and only the matrix D of pairwise squared distances is available, do double centering to form the scalar-product matrix: S = -(1/2) (I - (1/N) 1 1^T) D (I - (1/N) 1 1^T).
– Compute the eigenvalue decomposition S = U Λ U^T.
– Construct the p-dimensional representation as X = Λ_p^{1/2} U_p^T, where Λ_p holds the p largest eigenvalues and U_p the corresponding eigenvectors.
Metric MDS is actually PCA and is a linear method.
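
A minimal NumPy sketch of this algorithm, assuming the input is an N x N matrix of squared pairwise distances (the function name and the commented usage are illustrative, not from the slides):

```python
import numpy as np

def metric_mds(D2, p):
    """Classical (metric) MDS: embed points into p dimensions from an
    N x N matrix of squared pairwise distances D2."""
    N = D2.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    S = -0.5 * J @ D2 @ J                        # double centering -> scalar-product matrix
    evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:p]            # keep the p largest eigenvalues
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))

# Usage: if the data matrix Y (rows = points) is available,
# squared distances can be computed directly from it:
# D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
# X = metric_mds(D2, 2)
```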

Metric MDS Result

Sammon's Nonlinear Mapping (NLM)
NLM minimizes the energy (stress) function
E_NLM = (1/c) Σ_{i<j} (d(y_i, y_j) - d(x_i, x_j))^2 / d(y_i, y_j), with c = Σ_{i<j} d(y_i, y_j).
– Start with initial x's.
– Update the x's by the quasi-Newton (diagonal Newton) step
x_{k,i} ← x_{k,i} - α (∂E_NLM/∂x_{k,i}) / |∂^2 E_NLM/∂x_{k,i}^2|,
where x_{k,i} is the k-th component of the vector x_i.
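
A sketch of Sammon's iteration in NumPy, using the diagonal quasi-Newton update described above with Sammon's usual "magic factor" step size; the function name, iteration count, initialization and step size are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def sammon(Y, p=2, n_iter=100, mf=0.35, eps=1e-9, seed=0):
    """Sammon's NLM sketch with the diagonal quasi-Newton update and
    the usual 'magic factor' step size mf."""
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    Dy = np.sqrt(((Y[:, None] - Y[None, :]) ** 2).sum(-1))   # input-space distances d(y_i, y_j)
    np.fill_diagonal(Dy, 1.0)                                # dummy value, masked out below
    c = Dy[np.triu_indices(N, 1)].sum()
    X = 1e-2 * rng.standard_normal((N, p))                   # initial x's
    for _ in range(n_iter):
        Dx = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1)) + eps  # embedding distances
        np.fill_diagonal(Dx, 1.0)
        diff = X[:, None] - X[None, :]                       # (x_i - x_j), shape N x N x p
        A = (Dy - Dx) / (Dy * Dx)
        np.fill_diagonal(A, 0.0)
        grad = (-2.0 / c) * (A[..., None] * diff).sum(1)     # dE/dx_{k,i}
        B = 1.0 / (Dy * Dx)
        np.fill_diagonal(B, 0.0)
        curv = (-2.0 / c) * (B[..., None] * ((Dy - Dx)[..., None]
                - diff ** 2 / Dx[..., None]
                * (1.0 + (Dy - Dx)[..., None] / Dx[..., None]))).sum(1)   # d2E/dx_{k,i}^2
        X -= mf * grad / np.maximum(np.abs(curv), eps)       # quasi-Newton step
    return X
```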

Sammon’s NLM

A Basic Issue with Metric Distance Preserving Methods
Euclidean distances are measured straight through the ambient space and can cut across a folded manifold; geodesic distances, measured along the manifold, seem to be better suited.

Graph Distance: Approximation to Geodesic Distance

ISOMAP
ISOMAP = MDS with graph distances.
Needs to decide how the graph is constructed, i.e., who is the neighbor of whom: the K-closest rule or the ε-distance rule can build the graph.
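
A compact sketch of ISOMAP under the K-closest rule: build the neighborhood graph, compute shortest-path (graph) distances as a stand-in for geodesic distances, then run metric MDS on them. It assumes the neighborhood graph is connected; the function name and defaults are illustrative:

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def isomap(Y, p=2, k=10):
    """ISOMAP sketch: K-nearest-neighbor graph -> shortest-path (graph)
    distances -> metric MDS on those distances."""
    N = Y.shape[0]
    D = distance_matrix(Y, Y)                          # Euclidean distances
    nbr = np.argsort(D, axis=1)[:, 1:k + 1]            # K closest rule (excluding self)
    rows = np.repeat(np.arange(N), k)
    G = csr_matrix((D[rows, nbr.ravel()], (rows, nbr.ravel())), shape=(N, N))
    DG = shortest_path(G, method="D", directed=False)  # graph distance ~ geodesic distance
    J = np.eye(N) - np.ones((N, N)) / N                # metric MDS on squared graph distances
    S = -0.5 * J @ (DG ** 2) @ J
    evals, evecs = np.linalg.eigh(S)
    idx = np.argsort(evals)[::-1][:p]
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))
```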

KPCA
Kernel PCA is closely related to the MDS algorithm: the double-centered distance (scalar-product) matrix is replaced by a centered kernel matrix.
(Figures: KPCA results using a Gaussian kernel.)
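
A sketch of kernel PCA with a Gaussian (RBF) kernel, to show how little changes relative to the metric MDS sketch: only the centered kernel matrix replaces the double-centered distance matrix. The bandwidth sigma and the function name are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import distance_matrix

def kpca_gaussian(Y, p=2, sigma=1.0):
    """Kernel PCA sketch with a Gaussian (RBF) kernel: the centered
    kernel matrix plays the role of the scalar-product matrix in MDS."""
    N = Y.shape[0]
    K = np.exp(-distance_matrix(Y, Y) ** 2 / (2.0 * sigma ** 2))  # kernel matrix
    J = np.eye(N) - np.ones((N, N)) / N
    Kc = J @ K @ J                                                # double centering
    evals, evecs = np.linalg.eigh(Kc)
    idx = np.argsort(evals)[::-1][:p]
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))
```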

Topology Preserving Techniques
Topology ≈ neighborhood relationship.
Topology preservation means two neighboring points in d dimensions should map to two neighboring points in p dimensions.
Distance preservation is often too rigid; topology preserving techniques can stretch or shrink point clouds where needed.
More flexible, but algorithmically more complex.

TP Techniques
Can be categorized broadly into:
– Methods with a predefined topology: SOM (Kohonen's self-organizing map)
– Data-driven lattice: LLE (locally linear embedding), Isotop, …

Kohonen's Self-Organizing Maps (SOM)
Step 1: Define a 2D lattice indexed by (l, k): l, k = 1, …, K.
Step 2: For a set of data vectors y_i, i = 1, 2, …, N, find a set of prototypes m(l, k). Note that by this indexing (l, k), the prototypes are mapped onto the 2D lattice.
Step 3: Iterate over the data points y_i:
1. Find the closest prototype m(l*, k*) (using Euclidean distance in the d-dimensional space): (l*, k*) = argmin_{(l,k)} ||y_i - m(l, k)||.
2. Update the prototypes in the lattice neighborhood of (l*, k*): m(l, k) ← m(l, k) + α h(||(l, k) - (l*, k*)||) (y_i - m(l, k)).
(prepared from the [HTF] book)
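
A minimal SOM training loop following these steps, with a Gaussian (soft) neighborhood and linearly decaying learning rate and neighborhood width; the lattice size, schedules and initialization are illustrative assumptions rather than the settings used in the slides:

```python
import numpy as np

def train_som(Y, K=10, n_iter=20, alpha=0.3, sigma=2.0, seed=0):
    """SOM sketch: K x K lattice of prototypes trained with a Gaussian
    (soft) neighborhood; learning rate and width decay over iterations."""
    rng = np.random.default_rng(seed)
    N, d = Y.shape
    # lattice coordinates (l, k) and prototypes m(l, k) initialized from random data points
    grid = np.stack(np.meshgrid(np.arange(K), np.arange(K)), -1).reshape(-1, 2).astype(float)
    M = Y[rng.choice(N, K * K, replace=True)].astype(float)
    for it in range(n_iter):
        a = alpha * (1.0 - it / n_iter)            # decaying learning rate
        s = sigma * (1.0 - it / n_iter) + 0.5      # decaying neighborhood width
        for i in rng.permutation(N):
            j = np.argmin(((M - Y[i]) ** 2).sum(1))                         # closest prototype
            h = np.exp(-((grid - grid[j]) ** 2).sum(1) / (2.0 * s ** 2))    # lattice neighborhood
            M += a * h[:, None] * (Y[i] - M)                                # pull neighbors toward y_i
    return M.reshape(K, K, d)
```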

Neighborhood Function for SOM
A hard threshold function: h(r) = 1 if r ≤ λ, and h(r) = 0 otherwise, where r is the lattice distance to the winning prototype and λ is a threshold.
Or, a soft threshold function: h(r) = exp(-r^2 / (2σ^2)).
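
The two neighborhood functions sketched as code (the threshold λ and width σ defaults are arbitrary examples; the SOM sketch above uses the soft one):

```python
import numpy as np

def hard_neighborhood(r, lam=1.5):
    """Hard threshold: full update inside lattice radius lam, none outside."""
    return (r <= lam).astype(float)

def soft_neighborhood(r, sigma=2.0):
    """Soft (Gaussian) threshold: update strength decays with lattice distance r."""
    return np.exp(-r ** 2 / (2.0 * sigma ** 2))
```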

Example: Simulated data

SOM for “Swiss Roll” and “Open Box”

Remarks
SOM is actually a constrained K-means:
– it constrains the K-means prototypes to lie on a smooth manifold;
– if only one neighbor (the winner itself) is allowed => K-means.
The learning rate (α) and the distance threshold (λ) usually decrease with the training iterations.
Mostly useful as a visualization tool: typically it cannot map to more than 3 dimensions.
Convergence is hard to assess.

Locally Linear Embedding (LLE)
Data-driven lattice, unlike SOM's predefined lattice.
Topology preserving: it is based on conformal mapping, a transformation that preserves angles; LLE is invariant to rotation, translation and scaling.
To some extent similar to preserving dot products.
A data point y_i is assumed to be a linear combination of its neighbors.

LLE Principle
Each data point y_i is a local linear combination of its neighbors, with reconstruction error
E(W) = Σ_i ||y_i - Σ_j w_ij y_j||^2.
Neighborhood of y_i: determined by a graph (e.g., K nearest neighbors).
Constraints on w_ij: Σ_j w_ij = 1, and w_ij = 0 if y_j is not a neighbor of y_i.
LLE first computes the matrix W by minimizing E. Then it assumes that in the low dimensions the same local linear combinations hold, so it minimizes
F(X) = Σ_i ||x_i - Σ_j w_ij x_j||^2
with respect to the x's (with W fixed) and obtains the low-dimensional mapping!
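
A sketch of the two LLE steps: compute the reconstruction weights W from each point's K nearest neighbors, then take the low-dimensional coordinates from the bottom eigenvectors of (I - W)^T (I - W). The regularization constant, neighborhood size and names are illustrative assumptions:

```python
import numpy as np

def lle(Y, p=2, k=10, reg=1e-3):
    """LLE sketch: (1) reconstruction weights W from each point's K
    nearest neighbors, (2) embedding from the bottom eigenvectors of
    (I - W)^T (I - W)."""
    N = Y.shape[0]
    D = ((Y[:, None] - Y[None, :]) ** 2).sum(-1)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]            # neighbors, excluding the point itself
    W = np.zeros((N, N))
    for i in range(N):
        Z = Y[knn[i]] - Y[i]                           # neighbors centered on y_i
        C = Z @ Z.T                                    # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)             # regularize for numerical stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, knn[i]] = w / w.sum()                     # enforce sum_j w_ij = 1
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    evals, evecs = np.linalg.eigh(M)
    return evecs[:, 1:p + 1]                           # skip the constant (trivial) eigenvector
```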

LLE Results Let’s visit: