Nonlinear Dimension Reduction Presenter: Xingwei Yang. The slides are organized from material by: 1. Ronald R. Coifman et al. (Yale University) 2. Jieping Ye (Arizona State University)

Motivation Linear projections will not detect nonlinear patterns in the data.

Nonlinear PCA using Kernels
- Traditional PCA applies a linear transformation, which may not be effective for nonlinear data.
- Solution: apply a nonlinear transformation into a potentially very high-dimensional feature space (see the sketch below).
- Computational efficiency: apply the kernel trick. This requires that PCA can be rewritten in terms of dot products.
- More on kernels later.
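A minimal sketch of the idea (the two-circles data and the quadratic feature map are illustrative assumptions, not from the slides): no linear function of the original coordinates separates the two circles, but after an explicit degree-2 feature map a single linear direction does, which is exactly the structure kernel methods exploit.

```python
import numpy as np

# Illustrative data (an assumption): two noisy concentric circles.
rng = np.random.default_rng(0)
n = 200
theta = rng.uniform(0.0, 2.0 * np.pi, n)
radius = np.r_[np.ones(n // 2), 3.0 * np.ones(n // 2)]
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
X += 0.05 * rng.standard_normal(X.shape)

# Explicit degree-2 feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
Phi = np.c_[X[:, 0] ** 2, np.sqrt(2.0) * X[:, 0] * X[:, 1], X[:, 1] ** 2]

# One linear direction in feature space, w = (1, 0, 1), recovers the squared
# radius x1^2 + x2^2 and cleanly separates the inner circle from the outer one.
score = Phi @ np.array([1.0, 0.0, 1.0])
print(score[: n // 2].max(), score[n // 2:].min())   # inner scores << outer scores
```

The kernel trick avoids forming Phi at all: for this particular map, φ(x)^T φ(z) = (x^T z)^2, so every feature-space dot product is a cheap function of the original dot product.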

Nonlinear PCA using Kernels
Rewrite PCA in terms of dot products. The covariance matrix S of the centered data points x_1, …, x_n can be written as
S = (1/n) Σ_{i=1}^{n} x_i x_i^T.
Let v be an eigenvector of S corresponding to a nonzero eigenvalue λ. Then
S v = λ v  ⟹  v = (1/(nλ)) Σ_{i=1}^{n} (x_i^T v) x_i,
so the eigenvectors of S lie in the space spanned by all data points.

Nonlinear PCA using Kernels
The covariance matrix can be written in matrix form: with the data matrix X = [x_1, …, x_n],
S = (1/n) X X^T,
and every eigenvector can be expressed as v = X α for some coefficient vector α.
Any benefits? Everything PCA needs can now be written in terms of the dot products collected in X^T X.

Nonlinear PCA using Kernels
Next consider the feature space: map each point x_i to φ(x_i) and replace X by Φ = [φ(x_1), …, φ(x_n)]. The (i,j)-th entry of Φ^T Φ is φ(x_i)^T φ(x_j). Apply the kernel trick:
K_{ij} = k(x_i, x_j) = φ(x_i)^T φ(x_j),
so the feature-space dot products are computed without ever forming φ explicitly. Substituting v = Φα into the feature-space eigenproblem yields an eigenproblem on K alone, K α = (nλ) α. K is called the kernel matrix.
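A short sketch of forming the kernel matrix (the Gaussian kernel, the parameter sigma, and the helper names are assumptions): kernel PCA also needs the kernel matrix centered in feature space, since the mapped points φ(x_i) are generally not mean-zero.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gaussian kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def center_kernel(K):
    """Center K in feature space: K_c = K - 1K - K1 + 1K1, with 1 = ones(n, n)/n."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one
```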

Nonlinear PCA using Kernels
Projection of a test point x onto v: with v = Σ_i α_i φ(x_i),
φ(x)^T v = Σ_i α_i φ(x)^T φ(x_i) = Σ_i α_i k(x, x_i).
Explicit mapping into feature space is not required here; only kernel evaluations are needed.
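Putting the pieces together, a minimal kernel PCA sketch (it reuses rbf_kernel_matrix and center_kernel from the previous sketch; num_components and sigma are assumed parameters): eigendecompose the centered kernel matrix, rescale the coefficient vectors α so the implicit feature-space eigenvectors have unit norm, and project points using kernel values only.

```python
import numpy as np

def kernel_pca(X, num_components=2, sigma=1.0):
    """Minimal kernel PCA sketch: eigendecompose the centered kernel matrix and
    project the training points onto the leading feature-space eigenvectors."""
    K = center_kernel(rbf_kernel_matrix(X, sigma))    # helpers from the sketch above
    eigvals, eigvecs = np.linalg.eigh(K)              # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    keep = eigvals[:num_components] > 1e-12           # discard numerically zero modes
    # Scale alpha so the implicit eigenvectors v = sum_i alpha_i phi(x_i) have unit
    # norm: ||v||^2 = alpha^T K alpha = eigval * ||alpha||^2.
    alphas = eigvecs[:, :num_components][:, keep] / np.sqrt(eigvals[:num_components][keep])
    # Projection of training point x_j onto v: sum_i alpha_i K[j, i] = (K @ alphas)[j].
    return K @ alphas

# Example: embed the two-circles data from the earlier sketch.
# Z = kernel_pca(X, num_components=2, sigma=1.0)
```

For a genuinely held-out test point one would also center its kernel vector against the training data before applying the same α coefficients; the sketch above projects only the training points.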

Diffusion distance and Diffusion map
Under the random-walk representation of the graph, M = D^{-1} K is the row-stochastic Markov matrix, where K is the kernel (affinity) matrix and D is the diagonal degree matrix with D_ii = Σ_j K_ij. A symmetric matrix M_s can be derived from M as
M_s = D^{1/2} M D^{-1/2} = D^{-1/2} K D^{-1/2}.
M and M_s have the same N eigenvalues λ_0 ≥ λ_1 ≥ … ≥ λ_{N-1}.
Notation:
φ : left eigenvector of M
ψ : right eigenvector of M
t : time step
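A sketch of this construction (the Gaussian kernel and the helper name are assumptions): build K, the degrees d, the Markov matrix M, and its symmetric conjugate M_s, whose orthonormal eigenvectors give the left and right eigenvectors of M.

```python
import numpy as np

def random_walk_matrices(X, sigma=1.0):
    """Gaussian affinities K, degrees d, Markov matrix M = D^-1 K, and the
    symmetric conjugate M_s = D^1/2 M D^-1/2 = D^-1/2 K D^-1/2."""
    sq_norms = np.sum(X**2, axis=1)
    K = np.exp(-(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T) / (2.0 * sigma**2))
    d = K.sum(axis=1)                        # degrees D_ii
    M = K / d[:, None]                       # row-stochastic Markov matrix
    M_s = K / np.sqrt(np.outer(d, d))        # symmetric, same eigenvalues as M
    return K, d, M, M_s

# If v is an orthonormal eigenvector of M_s, then psi = v / sqrt(d) is a right
# eigenvector of M and phi = v * sqrt(d) a left eigenvector, with the same eigenvalue:
# lam, V = np.linalg.eigh(M_s)
# psi = V / np.sqrt(d)[:, None]
# phi = V * np.sqrt(d)[:, None]
```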

Diffusion distance and Diffusion map
If one starts the random walk from location x_i, the probability of landing in location y after r time steps is given by
p(r, y | x_i) = (e_i M^r)(y),
where e_i is a row vector with all zeros except that the i-th position = 1. The scale of the diffusion has a dual representation (time step and kernel width). For a large kernel width, all points in the graph are connected (M_{i,j} > 0) and the eigenvalues of M satisfy
1 = λ_0 ≥ |λ_1| ≥ … ≥ |λ_{N-1}| ≥ 0.

Diffusion distance and Diffusion map
One can show that, regardless of the starting point x_i,
lim_{r→∞} p(r, y | x_i) = φ_0(y),
where φ_0 is the left eigenvector of M with eigenvalue λ_0 = 1,
φ_0(x) = d(x) / Σ_z d(z),  with d(x) = Σ_y K(x, y).
The eigenvector φ_0(x) has a dual representation:
1. It is the stationary probability distribution on the curve, i.e., the probability of landing at location x after taking infinitely many steps of the random walk (independent of the start location).
2. It is the density estimate at location x.
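A quick numerical check of this claim (it reuses random_walk_matrices from the sketch above; the data and sigma are illustrative assumptions): the normalized degrees d(x)/Σ_z d(z) form the stationary distribution, and a random walk started at any single point converges to it.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))                 # illustrative data
K, d, M, M_s = random_walk_matrices(X, sigma=0.7)

phi0 = d / d.sum()                                # claimed stationary distribution
print(np.max(np.abs(phi0 @ M - phi0)))            # ~1e-16: phi0 M = phi0

p = np.zeros(len(X))
p[0] = 1.0                                        # e_0: start the walk at x_0
for _ in range(2000):                             # p <- p M, i.e. e_0 M^r
    p = p @ M
print(np.max(np.abs(p - phi0)))                   # small: the walk forgets its start
```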

Diffusion distance
For any finite time r,
p(r, y | x_i) = φ_0(y) + Σ_{k≥1} λ_k^r ψ_k(x_i) φ_k(y),
where ψ_k and φ_k are the right and left eigenvectors of the random-walk matrix M, and λ_k^r is the k-th eigenvalue of M^r (arranged in descending order).
Given the definition of the random walk, we define the diffusion distance as a distance measure at time t between two probability mass functions,
D_t^2(x_0, x_1) = Σ_y (p(t, y | x_0) − p(t, y | x_1))^2 w(y),
with the empirical choice w(y) = 1/φ_0(y).
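A direct sketch of this definition (reusing M and phi0 from the check above; the point indices and time t are arbitrary): form the two distributions e_i M^t and e_j M^t and compute their weighted squared difference.

```python
import numpy as np

def diffusion_distance(M, phi0, i, j, t=2):
    """D_t(x_i, x_j) computed straight from the definition, with w(y) = 1/phi0(y)."""
    Mt = np.linalg.matrix_power(M, t)        # row i of M^t is p(t, . | x_i)
    diff = Mt[i] - Mt[j]
    return np.sqrt(np.sum(diff**2 / phi0))

# print(diffusion_distance(M, phi0, 0, 1, t=2))
```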

Diffusion Map
Diffusion distance:
D_t^2(x_0, x_1) = Σ_{j≥1} λ_j^{2t} (ψ_j(x_0) − ψ_j(x_1))^2.
Diffusion map: the mapping between the original space and the first k eigenvectors,
Ψ_t(x) = (λ_1^t ψ_1(x), λ_2^t ψ_2(x), …, λ_k^t ψ_k(x)).
Relationship:
D_t^2(x_0, x_1) ≈ ||Ψ_t(x_0) − Ψ_t(x_1)||^2.
This relationship justifies using the Euclidean distance in diffusion map space for spectral clustering. Since the eigenvalues decay, it is justified to stop at an appropriate k with a negligible error of order O((λ_{k+1}/λ_k)^t).
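A sketch of the diffusion map itself (it reuses random_walk_matrices and diffusion_distance from the earlier sketches; k, t, and sigma are assumed parameters): embed each point with the scaled right eigenvectors and check that Euclidean distances in the embedding approximate the diffusion distance.

```python
import numpy as np

def diffusion_map(X, k=2, t=2, sigma=0.7):
    """k-dimensional diffusion map Psi_t(x) = (lam_1^t psi_1(x), ..., lam_k^t psi_k(x))."""
    K, d, M, M_s = random_walk_matrices(X, sigma)
    lam, V = np.linalg.eigh(M_s)              # ascending; same eigenvalues as M
    lam, V = lam[::-1], V[:, ::-1]            # descending order, lam[0] == 1
    psi = V / np.sqrt(d)[:, None]             # right eigenvectors of M
    psi = psi / psi[:, [0]]                   # convention: psi_0(x) == 1
    return (lam[1:k + 1] ** t) * psi[:, 1:k + 1]

# Usage sketch: embedding distances approximate D_t (nearly exact for large k).
# rng = np.random.default_rng(2)
# X = rng.standard_normal((200, 2))
# Psi = diffusion_map(X, k=100, t=2, sigma=0.7)
# K, d, M, _ = random_walk_matrices(X, sigma=0.7)
# print(np.linalg.norm(Psi[0] - Psi[1]),
#       diffusion_distance(M, d / d.sum(), 0, 1, t=2))
```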

Example: Hourglass

Example: Image embedding

Example: Lip image

Shape description

Dimension Reduction of Shape space

References
1. Unsupervised Learning of Shape Manifolds. BMVC 2007.
2. Diffusion Maps. Appl. Comput. Harmon. Anal. 21 (2006).
3. Geometric diffusions for the analysis of data from sensor networks. Current Opinion in Neurobiology, 2005.