Diffusion Maps and Spectral Clustering

Presentation transcript:

Slide 1/14: Diffusion Maps and Spectral Clustering
Machine Learning Seminar Series
Author: Ronald R. Coifman et al. (Yale University)
Presenter: Nilanjan Dasgupta (SIG Inc.)

Slide 2/14: Motivation
[Figure: data points in (X, Y, Z) lying on a low-dimensional manifold]
Data lie on a low-dimensional manifold, and the shape of the manifold is not known a priori.
PCA fails to produce a compact representation because the manifold is not linear.
Spectral clustering serves as a non-linear dimensionality reduction scheme.

Slide 3/14: Outline
Non-linear dimensionality reduction and spectral clustering.
Diffusion-based probabilistic interpretation of spectral methods.
Eigenvectors of the normalized graph Laplacian are a discrete approximation of the continuous Fokker-Planck operator.
Justification of the success of spectral clustering.
Conclusions.

Slide 4/14: Spectral clustering
Normalized graph Laplacian: given $N$ data points $\{x_1, \dots, x_N\}$, each $x_i \in \mathbb{R}^d$, the distance (similarity) between any two points $x_i$ and $x_j$ is given by the Gaussian kernel of width $\varepsilon$, $L_{ij} = \exp\!\big(-\|x_i - x_j\|^2 / 2\varepsilon\big)$, with diagonal normalization matrix $D_{ii} = \sum_j L_{ij}$ and $M = D^{-1}L$.
Solve the normalized eigenvalue problem $M\psi = \lambda\psi$.
Use the first few eigenvectors of $M$ for a low-dimensional representation of the data, or as good coordinates for clustering.
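This construction is easy to prototype. The following is a minimal sketch (not from the slides; the toy data, the kernel width $\varepsilon = 0.1$, and all variable names are my own choices), written in Python/NumPy:

```python
import numpy as np

# Toy data: a noisy circle, a simple one-dimensional manifold embedded in R^2
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))

eps = 0.1
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
L = np.exp(-sq_dists / (2 * eps))          # Gaussian kernel of width eps
D = L.sum(axis=1)                          # diagonal normalization D_ii = sum_j L_ij
M = L / D[:, None]                         # M = D^{-1} L

eigvals, eigvecs = np.linalg.eig(M)        # right eigenvectors of M
order = np.argsort(-eigvals.real)
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]
print(eigvals[:5])                         # lambda_0 = 1 >= lambda_1 >= ...; the first few
                                           # columns of eigvecs give clustering coordinates
```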

Slide 5/14: Spectral Clustering: previous work
Non-linear dimensionality analysis by S. Roweis and L. Saul (published in Science, 2000).
Belkin & Niyogi (NIPS'02) show that if data are sampled uniformly from a low-dimensional manifold, the first few eigenvectors of $M = D^{-1}L$ are a discrete approximation of the Laplace-Beltrami operator on the manifold.
Meila & Shi (AIStat'01) interpret $M$ as a stochastic matrix representing a random walk on the graph.

Diffusion distance and Diffusion map 6/14 Diffusion distance and Diffusion map A symmetric matrix Ms can be derived from M as M and Ms has same N eigenvalues, Under random walk representation of the graph M f : left eigenvector of M y : right eigenvector of M e : time step

Slide 7/14: Diffusion distance and Diffusion map
$\varepsilon$ has a dual representation: time step of the random walk and width of the kernel.
If one starts the random walk from location $x_i$, the probability of landing at location $y$ after $r$ time steps is given by $p(y, r \mid x_i) = (e_i M^r)(y)$, where $e_i$ is a row vector of all zeros except a 1 in the $i$-th position.
For large $\varepsilon$, all points in the graph are connected ($M_{ij} > 0$) and the eigenvalues of $M$ satisfy $1 = \lambda_0 > |\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_{N-1}|$.

Slide 8/14: Diffusion distance and Diffusion map
One can show that, regardless of the starting point $x_i$, $\lim_{r \to \infty} p(y, r \mid x_i) = \phi_0(y)$, where $\phi_0$ is the left eigenvector of $M$ with eigenvalue $\lambda_0 = 1$, normalized so that $\sum_x \phi_0(x) = 1$.
The eigenvector $\phi_0(x)$ has a dual representation:
1. The stationary probability distribution on the curve, i.e., the probability of landing at location $x$ after taking infinitely many steps of the random walk (independent of the starting location).
2. The density estimate at location $x$.
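Both the $r$-step probabilities from the previous slide and their convergence to $\phi_0$ can be illustrated numerically, continuing the running example (`M` and `D` as before; the starting node and step counts are arbitrary choices of mine):

```python
import numpy as np

phi_0 = D / D.sum()      # stationary distribution of M = D^{-1} L, also a density estimate

e_i = np.zeros(M.shape[0])
e_i[0] = 1.0             # start the walk at node 0
for r in (1, 10, 100, 1000):
    p_r = e_i @ np.linalg.matrix_power(M, r)     # distribution after r steps
    print(r, np.abs(p_r - phi_0).max())          # deviation decays like |lambda_1|^r
```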

Slide 9/14: Diffusion distance
For any finite time $r$, $p(y, r \mid x) = \sum_k \lambda_k^r\, \psi_k(x)\, \phi_k(y)$, where $\psi_k$ and $\phi_k$ are the right and left eigenvectors of the graph Laplacian $M$, and $\lambda_k^r$ is the $k$-th eigenvalue of $M^r$ (arranged in descending order).
Given the definition of the random walk, we define the diffusion distance as a distance measure at time $t$ between two pmfs,
$D_t^2(x_0, x_1) = \sum_y \big(p(y, t \mid x_0) - p(y, t \mid x_1)\big)^2\, w(y)$,
with the empirical choice $w(y) = 1/\phi_0(y)$.
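In code, the diffusion distance at time $t$ can be computed directly from the rows of $M^t$ (a sketch reusing `M` and `phi_0` from above; the helper name is my own):

```python
import numpy as np

def diffusion_distance(M, phi_0, i, j, t):
    """D_t(x_i, x_j): weighted L2 distance between the t-step distributions."""
    Mt = np.linalg.matrix_power(M, t)
    diff = Mt[i] - Mt[j]                          # p(y, t | x_i) - p(y, t | x_j)
    return np.sqrt(np.sum(diff ** 2 / phi_0))     # weight w(y) = 1 / phi_0(y)

print(diffusion_distance(M, phi_0, 0, 1, t=5))
```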

Slide 10/14: Diffusion Map
Diffusion distance: $D_t^2(x_0, x_1) = \sum_{j \ge 1} \lambda_j^{2t}\, \big(\psi_j(x_0) - \psi_j(x_1)\big)^2$.
Diffusion map: the mapping between the original space and the first $k$ eigenvectors, $\Psi_t(x) = \big(\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \dots, \lambda_k^t \psi_k(x)\big)$.
Relationship: $D_t^2(x_0, x_1) \approx \|\Psi_t(x_0) - \Psi_t(x_1)\|^2$.
This relationship justifies using the Euclidean distance in diffusion-map space for spectral clustering.
Since the eigenvalues are in descending order, it is justified to stop at an appropriate $k$, with a negligible error of order $O\big((\lambda_{k+1}/\lambda_k)^t\big)$.
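The identity on this slide can be verified numerically. The sketch below reuses `lam`, `V`, `psi`, `D`, `M`, and `diffusion_distance` from the sketches above; the rescaling $\psi_j \mapsto \sqrt{\sum_i D_{ii}}\; D^{-1/2} v_j$ (which makes $\psi_0$ the constant vector of ones) is my own bookkeeping, not spelled out on the slide:

```python
import numpy as np

t = 5
phi_0 = D / D.sum()                    # stationary distribution (as on slide 8)
# Rescale so that sum_y phi_0(y) psi_l(y) psi_m(y) = delta_lm; under this
# normalization the diffusion-distance identity holds exactly.
psi_n = np.sqrt(D.sum()) * psi

# The full diffusion map (all nontrivial coordinates) reproduces D_t exactly;
# keeping only the first k coordinates drops terms of order (lambda_{k+1}/lambda_k)^t.
Psi_t = (lam[1:] ** t) * psi_n[:, 1:]
i, j = 0, 1
print(np.linalg.norm(Psi_t[i] - Psi_t[j]),
      diffusion_distance(M, phi_0, i, j, t))   # the two numbers agree
```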

Slide 11/14: Asymptotics of Diffusion Map
[Figure: data sampled from a density p(x) on a manifold in (X, Y, Z)]
Suppose $\{x_i\}$ are sampled i.i.d. from a probability density $p(x)$ defined over the manifold.
Write $p(x) = e^{-U(x)}$, where $U(x)$ is the potential (energy) at location $x$.
As $N \to \infty$, the random walk on the discrete graph converges to a random walk on the continuous manifold $\Omega$, with forward and backward operators
$T_f[\phi](x) = \int M(x \mid y)\, \phi(y)\, p(y)\, dy$ and $T_b[\psi](x) = \int M(y \mid x)\, \psi(y)\, p(y)\, dy$,
where $M(x \mid y) \propto \exp\!\big(-\|x - y\|^2 / 2\varepsilon\big)$ is the transition kernel.

Slide 12/14: Asymptotics of Diffusion Map
$T_f[\phi]$ is the probability distribution after one time step $\varepsilon$, where $\phi(x)$ is the probability distribution on the graph at $t = 0$.
$T_b[\psi](x)$ is the mean of the function $\psi$ after one time step $\varepsilon$, for a random walk that started at location $x$ at time $t = 0$.
Consider the limit $\varepsilon \to 0$, i.e., when each data point has infinitely many nearby neighbors. In that limit, the random walk converges to a diffusion process whose probability density evolves continuously in time as
$\dfrac{\partial p(x, t)}{\partial t} = \lim_{\varepsilon \to 0} \dfrac{T_f - I}{\varepsilon}\, p(x, t)$.

Slide 13/14: Fokker-Planck operator
Infinitesimal generators (propagators): $\mathcal{H}_f = \lim_{\varepsilon \to 0} \dfrac{T_f - I}{\varepsilon}$, $\quad \mathcal{H}_b = \lim_{\varepsilon \to 0} \dfrac{T_b - I}{\varepsilon}$.
The eigenfunctions of $T_f$ and $T_b$ converge to those of $\mathcal{H}_f$ and $\mathcal{H}_b$, respectively.
The backward generator is given by the Fokker-Planck operator $\mathcal{H}_b \psi = \Delta \psi - 2\nabla \psi \cdot \nabla U$, which corresponds to a diffusion process in a potential field $2U(x)$.

Slide 14/14: Spectral clustering and the Fokker-Planck operator
The term $-2\nabla \psi \cdot \nabla U$ is interpreted as a drift toward low potential (higher data density).
The left and right eigenvectors of $M$ can be viewed as discrete approximations of the eigenfunctions of $T_f$ and $T_b$, respectively.
$T_f$ and $T_b$ can in turn be viewed as approximations of $\mathcal{H}_f$ and $\mathcal{H}_b$, which in the asymptotic case ($N \to \infty$, $\varepsilon \to 0$) describe a diffusion process in the potential $2U(x)$, where $p(x) = e^{-U(x)}$.
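To tie the probabilistic picture back to clustering, here is a self-contained end-to-end sketch (the two-blob toy data and all names are my own, not from the slides): two well-separated Gaussian blobs act as two wells of the potential $U(x) = -\log p(x)$, so the first nontrivial eigenvector of $M$ is nearly piecewise constant on the wells and thresholding it recovers the clusters.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated blobs: two "wells" of the potential U(x) = -log p(x)
X = np.vstack([rng.normal(loc=(-2.0, 0.0), scale=0.3, size=(100, 2)),
               rng.normal(loc=(+2.0, 0.0), scale=0.3, size=(100, 2))])

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
L = np.exp(-sq / (2 * 0.5))                    # Gaussian kernel, eps = 0.5
D = L.sum(axis=1)
Ms = L / np.sqrt(np.outer(D, D))               # symmetric conjugate of M = D^{-1} L

lam, V = np.linalg.eigh(Ms)                    # ascending eigenvalues
psi1 = V[:, -2] / np.sqrt(D)                   # first nontrivial right eigenvector of M

labels = (psi1 > 0).astype(int)                # nearly piecewise constant, so threshold at 0
truth = np.array([0] * 100 + [1] * 100)
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))  # sign is arbitrary
print(accuracy)                                # close to 1.0 for well-separated blobs
```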