Diffusion Maps
Ronald R. Coifman, Stéphane Lafon, 2006
Big Data Seminar, Omri Zomet, January 11th, 2015
Table of contents
- Where are we?
- Motivation
- Markov chains
- Diffusion distance & Markov
- Some math (spectral analysis)
- Example + Demo
Where are we?
Data Analysis → Manifold Learning → Kernel Methods → Diffusion Maps
Motivation: Manifold Learning
- Efficient representations of complex geometric structures
- Global geometric information from local structures
- Robust to noise perturbation
- Dimensionality reduction (nonlinear)
The ingredients:
- A kernel function (e.g. Gaussian)
- Markov chains
- Spectral analysis
- Diffusion distance and mapping
- Dimensionality reduction
(Mix all ingredients together, put in the oven and VOILA!)
Kernel
A kernel $k: X \times X \to \mathbb{R}$ should satisfy:
- Symmetry: $k(x, y) = k(y, x)$
- Positivity preserving: $k(x, y) \ge 0$
Some kernel examples:
- Heat kernel (as seen earlier in the seminar)
- Gaussian kernel: $k(x_i, x_j) = \exp\!\left(-\|x_i - x_j\|^2 / \sigma\right)$
- Linux kernel (not really…)
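A minimal NumPy sketch of the Gaussian kernel above (the function name and the single width parameter `sigma` are illustrative choices, not from the paper):

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Pairwise Gaussian kernel k(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma).

    X: (N, p) array of N points in R^p. The result is symmetric and
    positivity preserving, as required of a kernel.
    """
    # Squared pairwise Euclidean distances via broadcasting.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / sigma)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, sigma=1.0)
assert np.allclose(K, K.T)           # symmetric
assert np.all(K > 0)                 # positivity preserving
assert np.allclose(np.diag(K), 1.0)  # k(x, x) = exp(0) = 1
```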
Markov Chains - reminder
[Two-state transition diagram with transition probabilities 0.6, 0.4 and 0.7, 0.3]
Andrey (Andrei) Andreyevich Markov
Markov Chains - properties
- Markov property: $\Pr(X_{n+1} = x \mid X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \Pr(X_{n+1} = x \mid X_n = x_n)$
- $M$ is a stochastic matrix (rows sum to 1)
- State after $t$ steps: $x^{(t)} = x^{(0)} M^t$
- Steady state: $M^\infty$, or solve $vM = v$; $v$ is the (normalized) left eigenvector with eigenvalue 1!
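As a sketch, the steady state can be found both ways described above: by powering $M$ and by solving $vM = v$. The 0.6/0.4 and 0.7/0.3 entries are taken from the two-state diagram on the reminder slide:

```python
import numpy as np

# Two-state chain from the reminder slide.
M = np.array([[0.6, 0.4],
              [0.7, 0.3]])
assert np.allclose(M.sum(axis=1), 1.0)  # stochastic: rows sum to 1

# Steady state, way 1: raise M to a large power.
M_inf = np.linalg.matrix_power(M, 50)

# Steady state, way 2: left eigenvector of eigenvalue 1
# (i.e. right eigenvector of M.T), normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(M.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v = v / v.sum()

assert np.allclose(M_inf[0], v)  # every row of M^inf equals v
assert np.allclose(v @ M, v)     # vM = v
```

For this chain the steady state is $v = (7/11, 4/11)$.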
From Kernel to Markov
Given $N$ data points $\{x_n\}_{n=1}^N$, where each $x_n \in \mathbb{R}^p$, the similarity between any two points $x_i$ and $x_j$ is given by a kernel, e.g. a Gaussian kernel of width $\varepsilon$: $L_{ij} = \exp(-\|x_i - x_j\|^2 / \varepsilon)$. Together with a diagonal normalization matrix $D$, $D_{ii} = \sum_j L_{ij}$, we get…
From Kernel to Markov
Define $M = D^{-1} L$. $M_{ij}$ is a new "kernel":
- Positivity preserving, i.e. $M_{ij} = k(i, j) > 0$
- Not symmetric, but it has a conservation property (rows sum to 1). Will be handled…
- $M_{ij}$ is the transition kernel of a Markov chain on $X$ (!)
Meila & Shi (AIStat'01) interpret $M$ as a stochastic matrix representing a random walk on the graph. This alone can infer geometric information…
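The construction above can be sketched in NumPy (the toy data and kernel width are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))  # toy data: 10 points in R^3

# Gaussian kernel matrix L and diagonal normalization D, as in the slides.
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
L = np.exp(-sq / 1.0)
D_inv = np.diag(1.0 / L.sum(axis=1))
M = D_inv @ L  # M = D^{-1} L

assert np.all(M > 0)                    # positivity preserving
assert np.allclose(M.sum(axis=1), 1.0)  # conservation: rows sum to 1
assert not np.allclose(M, M.T)          # generally not symmetric
```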
The power of power
Set $X$ of 900 points, union of 3 clusters, Gaussian kernel ($\varepsilon = 0.7$), and the corresponding Markov matrix.
The power of power
[Figures: the Markov matrix $M^t$ for increasing powers $t$]
The power of power
The key ideas from this example:
- A cluster, from a random-walk point of view, is a region in which the probability of escaping is low.
- $t$, in addition to being the time parameter, plays the role of a scale parameter.
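A small sketch of this idea, using two well-separated clusters as a stand-in for the slide's 900-point, 3-cluster set: starting inside a cluster, the probability of having escaped it stays tiny even as $t$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated clusters of 20 points each.
A = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
B = rng.normal(loc=5.0, scale=0.3, size=(20, 2))
X = np.vstack([A, B])

sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
L = np.exp(-sq / 0.7)                 # Gaussian kernel, eps = 0.7
M = L / L.sum(axis=1, keepdims=True)  # row-normalize: random-walk matrix

# Probability of being outside cluster A after t steps, starting at point 0.
escapes = {}
for t in (1, 8, 64):
    p = np.linalg.matrix_power(M, t)[0]
    escapes[t] = p[20:].sum()
    print(f"t={t:3d}  P(escaped cluster A) = {escapes[t]:.2e}")
```

The cross-cluster kernel values are vanishingly small, so the walk stays trapped in its cluster; raising $t$ changes the scale at which the walk explores, not this block structure.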
Diffusion Distance
A symmetric matrix $M_s$ can be derived from $M$ as $M_s = D^{1/2} M D^{-1/2}$.
- $M$ and $M_s$ have the same $N$ eigenvalues (the matrices are similar).
- $M_s$ symmetric ⇒ diagonalizable ⇒ $N$ real eigenvalues, orthonormal eigenbasis!
- $\phi$: left eigenvectors of $M$; $\psi$: right eigenvectors of $M$ (bi-orthogonal).
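A numerical sketch of these facts, assuming the standard conjugation $M_s = D^{1/2} M D^{-1/2}$ (equivalently $D^{-1/2} L D^{-1/2}$) and recovering the left/right eigenvectors of $M$ from the orthonormal eigenvectors of $M_s$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
L = np.exp(-sq / 1.0)
d = L.sum(axis=1)
M = L / d[:, None]  # M = D^{-1} L, not symmetric

# Similar symmetric matrix: M_s = D^{1/2} M D^{-1/2} = D^{-1/2} L D^{-1/2}.
Ms = L / np.sqrt(np.outer(d, d))
assert np.allclose(Ms, Ms.T)

# Same eigenvalues; those of Ms are real with orthonormal eigenvectors.
w_s, V = np.linalg.eigh(Ms)
w_m = np.sort(np.real(np.linalg.eigvals(M)))
assert np.allclose(np.sort(w_s), w_m)
assert np.allclose(V.T @ V, np.eye(15))

# Right/left eigenvectors of M from the eigenvectors v of Ms:
# psi = D^{-1/2} v and phi = D^{1/2} v. They are bi-orthogonal.
psi = V / np.sqrt(d)[:, None]
phi = V * np.sqrt(d)[:, None]
assert np.allclose(phi.T @ psi, np.eye(15))
```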
Diffusion Distance
Under the random-walk representation of the graph $M$:
- If one starts a random walk from location $x_i$, the probability of landing at location $y$ after $t$ time steps is given by $p_t(x_i, y) = M^t(x_i, y)$.
- For large enough $\varepsilon$, all points in $M$ are connected ($M_{ij} > 0$) and the eigenvalues of $M$ satisfy $1 = \lambda_0 > |\lambda_1| \ge |\lambda_2| \ge \dots$
Diffusion Distance
Regardless of the initial state, the walk converges: $p_t(x, y) \to \phi_0(y)$ as $t \to \infty$.
The eigenvector $\phi_0(x)$ has a dual representation:
1. The stationary probability distribution on the curve, i.e., the probability of landing at location $x$ after taking infinitely many steps of the random walk (independent of the start location).
2. The density estimate at location $x$.
$\phi_0$ is the left eigenvector of $M$ with eigenvalue $\lambda_0 = 1$. (Remember me? The steady state from the Markov chains slide.)
Diffusion Distance
For finite time $t$, decompose the probability distribution in the eigenbasis:
$p_t(x_i, y) = \phi_0(y) + \sum_{k \ge 1} \lambda_k^t \, \psi_k(x_i) \, \phi_k(y)$
where $\lambda_k^t$ is the $k$th eigenvalue of $M^t$ (arranged in descending order), $\phi$ are the left eigenvectors of $M$, $\psi$ the right eigenvectors (bi-orthogonal), and the $\psi_k(x_i)$ act as the coefficients of the expansion.
Diffusion Distance & Map
Diffusion distance:
$D_t^2(x_i, x_j) = \sum_y \left(p_t(x_i, y) - p_t(x_j, y)\right)^2 w(y)$
- $w(y)$ is a weight function (e.g. $w(y) = 1/\phi_0(y)$), not strictly necessary. $w(y)$ accounts for local density: more weight on low-density regions.
Diffusion map (for given $i, k$, the $k$th coordinate of $x_i$ is $\lambda_k^t \psi_k(x_i)$):
$\Psi_t(x_i) = \left(\lambda_1^t \psi_1(x_i), \lambda_2^t \psi_2(x_i), \dots, \lambda_{N-1}^t \psi_{N-1}(x_i)\right)$
Are these related? In 3 slides…
Diffusion Distance
Some intuition and insights:
- It is robust to noise perturbation, as it sums over all possible paths of length $t$ between the points.
- The diffusion distance is small if there are many high-probability paths of length $t$ between the two points.
- For the diffusion distance to be small, the path probabilities between $(x, u)$ and $(u, y)$ must be roughly equal; this happens when $x$ and $y$ are both well connected via $u$.
- As the diffusion process runs forward, revealing the geometric structure of the data, the main contributors to the diffusion distance are paths along that structure.
Diffusion Distance & Map
Relation between distance and mapping: the diffusion distance equals the Euclidean distance in diffusion-map space (with all $N-1$ eigenvectors):
$D_t(x_i, x_j) = \|\Psi_t(x_i) - \Psi_t(x_j)\|$
Did you know: diffusion maps reduce to Laplacian eigenmaps when the eigenvalues are discarded from the mapping (using merely the eigenvectors). In short: diffusion maps − eigenvalues = Laplacian eigenmaps.
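A numerical check of this identity on toy data, assuming the weight $w(y) = 1/\phi_0(y)$ and right eigenvectors scaled so that $\psi_0$ is the constant vector:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(12, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
L = np.exp(-sq / 1.0)
d = L.sum(axis=1)
M = L / d[:, None]  # random-walk matrix

# Eigendecomposition via the symmetric conjugate M_s = D^{-1/2} L D^{-1/2};
# psi are right eigenvectors of M, scaled so psi_0 is constant.
Ms = L / np.sqrt(np.outer(d, d))
w, V = np.linalg.eigh(Ms)
order = np.argsort(-w)  # descending eigenvalues
w, V = w[order], V[:, order]
psi = np.sqrt(d.sum()) * V / np.sqrt(d)[:, None]

t = 3
Pt = np.linalg.matrix_power(M, t)
i, j = 0, 5

# Diffusion distance with weight 1/phi_0, phi_0 = stationary distribution.
phi0 = d / d.sum()
D2 = np.sum((Pt[i] - Pt[j]) ** 2 / phi0)

# Squared Euclidean distance in diffusion-map space (all N-1 nontrivial coords).
diff = (w[1:] ** t) * (psi[i, 1:] - psi[j, 1:])
D2_map = np.sum(diff ** 2)

assert np.allclose(D2, D2_map)
```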
Dimensionality Reduction
- If $M$ has a spectral gap, approximate the distance with the first few $k$ eigenvectors:
$D_t^2(x_i, x_j) \approx \sum_{k=1}^{K} \lambda_k^{2t} \left(\psi_k(x_i) - \psi_k(x_j)\right)^2$
- The error in the $k$-term approximation of the diffusion distance can be bounded, for any $t$.
Remember the ingredients?
Recipe:
1. Compute the distance matrix $L$ with kernel $K$.
2. Normalize $L$ by its row sums to get $M$.
3. Spectral decomposition: compute the eigenvalues and eigenvectors of $M$ (eigenvalue decomposition, SVD, …).
4. Mapping: map the points to diffusion-map space.
5. Dimensionality reduction: keep the first $k$ mapping coordinates.
Notes: $M$ can be raised to some desired power $t$ (application dependent); in the mapping this amounts to raising the eigenvalues to the power $t$. A higher $t$ means faster decay of the eigenvalues.
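The recipe above as one minimal function (the Gaussian kernel choice, function name, and parameters are illustrative):

```python
import numpy as np

def diffusion_map(X, eps=1.0, k=2, t=1):
    """Minimal diffusion-maps recipe: kernel -> normalize -> eigendecompose
    -> map, keeping the first k nontrivial coordinates."""
    # 1. Distance matrix with a Gaussian kernel of width eps.
    sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    L = np.exp(-sq / eps)
    # 2. Normalize by row sums: M = D^{-1} L (done implicitly below).
    d = L.sum(axis=1)
    # 3. Spectral decomposition via the symmetric conjugate of M.
    Ms = L / np.sqrt(np.outer(d, d))
    w, V = np.linalg.eigh(Ms)
    order = np.argsort(-w)
    w, V = w[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]  # right eigenvectors of M
    # 4+5. Map with eigenvalues raised to the power t; drop the trivial
    # constant coordinate and keep the next k.
    return (w[1:k + 1] ** t) * psi[:, 1:k + 1]

# Usage: 60 points on a circle embed into a 2-D diffusion map.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
Y = diffusion_map(X, eps=0.5, k=2, t=2)
print(Y.shape)  # (60, 2)
```

Raising `t` here only re-scales the coordinates by powers of the eigenvalues, which is the recipe's note that powering $M$ amounts to powering the eigenvalues in the mapping.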
Toy Example..
Toy Example.. Details:
- 15 images of the same toy, each 127×169×3 pixels.
- The toy's rotation angle is the sole degree of freedom in the set. Will the Diffusion Maps algorithm find the main difference between the images?
- Each picture was reshaped to a vector in $\mathbb{R}^{127 \cdot 169 \cdot 3}$.
- A Gaussian kernel was built on this high-dimensional data and normalized, and the values of the first diffusion-map coordinate were examined.
Toy Example..
[Figures: values of the first diffusion-map coordinate for the 15 images]
Demo: Todd Wittman's Mani Matlab tool
Things to think about..
- Which kernel to use? Which kernel parameters?
- Which $t$? Raise $M$ to which power?
- How to compute the kernel efficiently (over all points)?
- How to find eigenvalues/eigenvectors efficiently?
Bibliography
- "Diffusion Maps" by Ronald R. Coifman and Stéphane Lafon.
- "Diffusion Maps for Signal Processing" by Ronen Talmon, Israel Cohen, Sharon Gannot, and Ronald R. Coifman.
- "Diffusion Maps - a Probabilistic Interpretation for Spectral Embedding and Clustering Algorithms" by Boaz Nadler, Stéphane Lafon, Ronald Coifman, and Ioannis G. Kevrekidis.
- "An Introduction to Diffusion Maps" by J. de la Porte, B. M. Herbst, W. Hereman, and S. J. van der Walt (very good for the math basis).
- "Diffusion Maps" presentation by Aviv Rotbart (some images).
- "Diffusion Maps and Spectral Clustering" presentation by Nilanjan Dasgupta (some images).
- Todd Wittman's Mani Matlab tool.
- Wikipedia (Markov chains).