1
Manifold Learning Student: Ali Taalimi 05/05/2012
2
When can we avoid the curse of dimensionality?
In what situations can we draw inferences from high-dimensional data without facing the curse of dimensionality? Smoothness: the rate behaves like (1/n)^(s/d), where n is the number of examples needed to learn the function, s is the smoothness, and d is the dimensionality; if the smoothness s grows at the rate of the dimension d, there is no curse of dimensionality. So if the function in high dimensions is very smooth, and the smoothness is on the order of the dimensionality, the problem is solved by fitting the smoothest function to the data: splines, kernel methods, L2 regularization, ... Sparsity: maybe the function you are going to learn is not smooth, but it can be represented as a sparse combination of some basis functions (using few relevant features): wavelets, L1 regularization, LASSO, compressed sensing, ... Geometry (the most recent): graphs, simplicial complexes, Laplacians, diffusions.
3
Geometry and Data: The Central Dogma
(A fact about natural datasets) In very high-dimensional spaces, data will not be distributed uniformly. The distribution of natural data is non-uniform and has some shape. Since it has a shape (geometry), it may concentrate around low-dimensional structures, because natural datasets come from systems with few free parameters. The shape (geometry) of the distribution can be exploited for efficient learning.
4
Manifold Learning: one setting for thinking about geometry is manifold learning. Manifold learning is not a single problem but rather a collection of problems unified by a common assumption: the data lives on or near some low-dimensional manifold embedded in a high-dimensional space. In other words, the data may have some geometry that is far from uniform, and we try to understand what the consequences of that fact might be. You have to work in the high-dimensional ambient space, although the data typically lies near a low-dimensional structure. You want to learn a function, but in this case the natural domain of the function is the manifold on which all the data lives. So we have to learn a function whose domain is this manifold and whose range might be a finite set (clustering, classification, dimensionality reduction, ...). PROBLEM: although all the data lives near some manifold, for the most part we do not know what this manifold is. We have to discover this manifold without knowing what it is in advance.
5
Suppose a compact manifold embedded in Euclidean space.
Suppose I sample it and give you a collection of points that sit on this manifold. All you can see is the cloud of points. What topology can you learn from these randomly drawn points? How many connected components does my manifold have? (Example: given samples from a mixture of Gaussians, can you tell the number of Gaussians?) In other words, we are trying to learn both the function and its domain simultaneously.
6
PCA: the simplest solution, fitting a linear manifold to the data
Fit the best linear subspace/manifold of a given rank to the data X1, ..., Xn. Let H be a subspace and P(Xi, H) the projection of Xi onto H. PCA finds the subspace H, among all choices of H, that minimizes the least-squares error: min_H Σi ||Xi − P(Xi, H)||².
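As a concrete illustration, here is a minimal numpy sketch of this least-squares fit (the function name and variables are mine, not from the slides): center the data and take the top right singular vectors; their span is the rank-d subspace H minimizing the squared projection error.

```python
import numpy as np

# Minimal PCA sketch: fit the best rank-d linear subspace to X (n samples x D features).
def pca_subspace(X, d):
    mu = X.mean(axis=0)                      # center so the subspace passes through the mean
    Xc = X - mu
    # Right singular vectors = principal directions; the top d of them span the best subspace H.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:d].T                             # D x d orthonormal basis of H
    proj = mu + (Xc @ V) @ V.T               # P(X_i, H): projection of each point onto H
    err = np.sum((X - proj) ** 2)            # the least-squares error being minimized
    return V, proj, err
```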
7
Manifold Model Suppose data does not lie on or near a linear subspace.
Yet the data inherently has one degree of freedom (even though it does not lie in a one-dimensional linear subspace).
8
Vision Example: consider an image f: R × R → [0, 1]
f(x, y) is the intensity of the image at location (x, y). Consider the following class of images: the set of images obtained by translating a fixed image by an amount (t, r). This set is embedded in the nonlinear space of all images, but there are only two degrees of freedom in this particular set: (t, r).
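A tiny sketch of such a two-parameter image family (the base image and shift ranges below are made up); each image is a point in a very high-dimensional space, yet the whole set is indexed by just (t, r):

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.random((32, 32))                  # a hypothetical 32x32 grayscale image

def translated(t, r):
    # cyclic shift keeps the example self-contained; a real camera would crop instead
    return np.roll(np.roll(base, t, axis=1), r, axis=0)

family = np.stack([translated(t, r).ravel()            # points in R^(32*32) = R^1024 ...
                   for t in range(8) for r in range(8)])  # ... indexed by only (t, r)
print(family.shape)   # (64, 1024): 64 samples from a 2-parameter set
```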
9
Manifold of the Sphere: consider a sphere as a model manifold to think about. Given a point p on the manifold, there is a tangent space at p, which for a k-dimensional manifold is essentially a k-dimensional linear space; you can think of it as a k-dimensional affine subspace of R^n. Exactly like the sphere embedded in R^3, whose tangent space at each point is a 2-dimensional affine subspace of R^3. Since the tangent space is a linear space, you can naturally think of tangent vectors.
10
Relation between tangent vectors and curves
A tangent vector can be thought of as a derivative. How? What is a curve on the manifold? A curve is a map φ(t): R → M^k. The derivative of the curve with respect to t, d(φ(t))/dt, is a tangent vector. So every tangent vector v is identified by a curve φ(t). With φ: R → M^k and f: M^k → R, the composition f(φ(t)): R → R, and df/dv = d(f(φ(t)))/dt. So you can think of a tangent vector as an operator that acts on a function f and takes the directional derivative in a certain direction.
11
Geodesic. Length of a curve: a curve is a map φ: [0, 1] → M^k; if you take its derivative, you get a tangent vector. Since all the curves live in a space where a norm and inner product are defined (Riemannian geometry), you have access to the norm of the derivative. So the length of the curve is l(φ) = ∫₀¹ ||dφ/dt|| dt. A geodesic is a shortest curve between two points.
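A quick numerical illustration of this length formula (the endpoints below are arbitrary choices): discretize the great-circle curve on the unit sphere joining two points, sum the norms of the derivative segments, and compare with the arc-length distance arccos⟨p, q⟩:

```python
import numpy as np

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
theta = np.arccos(p @ q)                       # geodesic (arc-length) distance on the unit sphere

# Discretize the great-circle curve phi(t), t in [0, 1], joining p and q.
ts = np.linspace(0.0, 1.0, 1001)
curve = np.array([(np.sin((1 - t) * theta) * p + np.sin(t * theta) * q) / np.sin(theta)
                  for t in ts])                # spherical interpolation: stays on the sphere

# Length = integral of ||d phi / dt|| dt, approximated by summing segment lengths.
length = np.sum(np.linalg.norm(np.diff(curve, axis=0), axis=1))
print(length, theta)                           # both ~pi/2: the great circle is the geodesic
```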
12
Gradient of the function
The gradient of a function acts as an operator on vectors in the tangent space: given any v in the tangent space, the inner product of the gradient with v is the derivative of the function in the direction v, ⟨∇f, v⟩ = df/dv. So I fix a function f, pick any vector v in the tangent space, differentiate f in the direction v, and get a number; the differential df is therefore a map from tangent vectors to numbers.
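A small numerical check of this statement on the unit sphere; the function f and the curve φ below are made-up examples, not taken from the slides:

```python
import numpy as np

# Point p on the unit sphere and a great-circle curve phi(t) through p with phi(0) = p.
p = np.array([0.0, 0.0, 1.0])
phi = lambda t: np.array([np.sin(t), 0.0, np.cos(t)])   # velocity at t = 0 is v = (1, 0, 0)
v = np.array([1.0, 0.0, 0.0])                           # the tangent vector identified by phi

f = lambda x: x[0] + 2.0 * x[2]                         # a made-up smooth function

# df/dv as the derivative of f(phi(t)) at t = 0 (central finite difference).
h = 1e-6
df_dv_curve = (f(phi(h)) - f(phi(-h))) / (2 * h)

# Ambient gradient of f, projected onto the tangent plane at p.
grad_ambient = np.array([1.0, 0.0, 2.0])
grad_tangent = grad_ambient - np.dot(grad_ambient, p) * p
df_dv_grad = np.dot(grad_tangent, v)

print(df_dv_curve, df_dv_grad)   # both ~1.0: <grad f, v> = df/dv
```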
13
Exponential Map: the exponential map takes you from the tangent space back to the manifold. So far we have the manifold, a point p on the manifold, the tangent space at the point p, and functions defined on the manifold. Now consider Tp(M), the tangent space of M at the point p; any element of this tangent space is a vector, and the exponential map takes me from Tp back to M. How? You essentially start moving along the geodesic in the direction of v, so that the length of the curve equals the norm of v. In other words, the vector v that you picked has a certain norm, and you go along the manifold for a distance equal to the length of the vector v.
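On the unit sphere the exponential map has a closed form, so the description above is easy to check numerically (the particular p and v below are arbitrary illustrative choices):

```python
import numpy as np

def exp_map_sphere(p, v):
    """Exponential map on the unit sphere: start at p, follow the geodesic
    (great circle) in the direction of v for an arc length of ||v||."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * (v / norm_v)

p = np.array([0.0, 0.0, 1.0])              # a point on the sphere
v = np.array([0.3, 0.4, 0.0])              # a tangent vector at p (orthogonal to p), ||v|| = 0.5

q = exp_map_sphere(p, v)
print(np.linalg.norm(q))                   # ~1.0: q is back on the manifold
print(np.arccos(np.clip(q @ p, -1, 1)))    # ~0.5: geodesic distance from p equals ||v||
```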
14
Laplace-Beltrami operator for manifold
Define the Laplacian for functions defined on the manifold. In R^k, if f is a twice-differentiable function from k-dimensional space to R, the Laplacian differentiates f twice along each direction and sums the results. On the manifold, I have this k-dimensional tangent space, and if I pick any vector in this space and apply the exponential map, I get a point on the manifold. f is a function from the k-dimensional manifold to R, so f composed with the exponential map takes us from the tangent space to R, and the Laplace-Beltrami operator at p can be defined as the ordinary Laplacian of f ∘ exp_p on the tangent space.
15
Dimensionality Reduction
Given x1, ..., xn ∈ R^D sampled from a d-dimensional manifold, find an embedding y1, ..., yn ∈ R^d with d ≪ D. If I give you a bunch of points sampled from the manifold, can you discover the map that embeds this manifold isometrically in a d-dimensional space, and then apply this map to the data, thereby embedding in d dimensions data that originally lives in D-dimensional space? ISOMAP (Tenenbaum, et al, 2000) LLE (Roweis, Saul, 2000) Laplacian Eigenmaps (Belkin, Niyogi, 2001) Local Tangent Space Alignment (Zhang, Zha, 02) Hessian Eigenmaps (Donoho, Grimes, 02) Diffusion Maps (Coifman, Lafon, et al, 04)
16
Algorithmic framework
There is a manifold in the high-dimensional space. We do not know this manifold; I only have a bunch of points from it. What should I do? Build a graph/mesh structure by connecting nearby points to each other. Nearby means near in Euclidean distance, because the ambient Euclidean distance is the only thing I can measure. This graph is an approximation to the manifold, so if I want to do something on the manifold, I will do it on the graph instead.
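A minimal numpy sketch of that construction (the neighborhood size k and the symmetrization rule are choices I am assuming, not prescribed by the slide):

```python
import numpy as np

def knn_graph(X, k):
    """Build a symmetric k-nearest-neighbor graph from points X (n x D).
    Entry [i, j] holds the Euclidean distance if i and j are neighbors, else 0 (no edge)."""
    n = X.shape[0]
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))  # pairwise distances
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]       # k nearest neighbors, skipping i itself
        A[i, nbrs] = dist[i, nbrs]
    A = np.maximum(A, A.T)                        # connect i and j if either is a neighbor of the other
    return A
```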
17
Isomap: Construct a nearest-neighbor graph from all data points; make a graph where every vertex is identified with a data point. Find the shortest-path (geodesic) distances between all pairs of points on the graph (Dij). Dij is not the Euclidean distance between Xi and Xj; this distance tries to approximate the geodesic distance between Xi and Xj along the manifold. Embed using Multidimensional Scaling.
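For the shortest-path step, a small scipy sketch (the knn_graph helper from the previous sketch is an assumed utility; A holds Euclidean edge lengths, and zeros mean "no edge"):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def graph_geodesics(A):
    """A: weighted k-NN adjacency matrix, e.g. from knn_graph above."""
    D = shortest_path(A, method="D", directed=False)   # Dijkstra over the neighbor graph
    if np.isinf(D).any():
        raise ValueError("graph is disconnected; increase k")
    return D   # D[i, j] approximates the geodesic distance along the manifold
```

The embedding step, multidimensional scaling, is described on the next slides.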
18
Multidimensional Scaling
MDS: you give me a distance matrix; if there is a set of points in Euclidean space from which this distance matrix could have arisen, then I can find those points for you. If indeed the matrix D arose from a set of vectors in Euclidean space, then the inner products between those points satisfy (A being the matrix of inner products): ⟨x, x⟩ − 2⟨x, y⟩ + ⟨y, y⟩ = ||x − y||², i.e. Aii − 2Aij + Ajj = Dij (with Dij the squared distance). This holds only if the distances are consistent with inner products of points in Euclidean space, which is not exactly the case when the distances are geodesic distances on a manifold. After finding the matrix of inner products A, how do we find the vectors? Simply by looking at the eigenvectors of this matrix.
19
Multidimensional Scaling
2) Embedding from inner products (same as PCA!): the matrix of inner products A is positive semidefinite. Then for any i ∈ {1, ..., n}, the mapping function is Ψ(i) = (√λ1 v1(i), ..., √λd vd(i)), where (λj, vj) are the top eigenpairs of A; Ψ will actually reproduce the inner-product and distance matrices. Summary: you start with the distance matrix, find a candidate set of inner products A, and then find vectors that are consistent with that inner-product matrix.
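A compact numpy sketch of this embedding step, which also completes the Isomap pipeline when fed the graph shortest-path distances from the earlier sketch (the names classical_mds, knn_graph and graph_geodesics are mine, not from the slides):

```python
import numpy as np

def classical_mds(D, d):
    """Embed from a distance matrix D (n x n) into R^d via classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    A = -0.5 * J @ (D ** 2) @ J                  # double centering: candidate inner-product matrix
    evals, evecs = np.linalg.eigh(A)
    idx = np.argsort(evals)[::-1][:d]            # top d eigenpairs
    scale = np.sqrt(np.maximum(evals[idx], 0))   # clip: A need not be PSD for geodesic distances
    return evecs[:, idx] * scale                 # Psi(i) = (sqrt(l1) v1(i), ..., sqrt(ld) vd(i))

# Isomap, assuming the knn_graph and graph_geodesics helpers sketched earlier:
# Y = classical_mds(graph_geodesics(knn_graph(X, k=10)), d=2)
```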
20
Isomap example: images generated by a human hand varying in finger extension and wrist rotation. So there are only 2 degrees of freedom.
21
Locally Linear Embedding
Construct the nearest-neighbor graph. Let x1, ..., xk be the neighbors of a point x. Project x onto the span of x1, ..., xk; call that projection x̄. Once the projection is found, find the set of coefficients that sum to one and express x̄ as a center of mass of the neighbors; these are the barycentric coordinates of x̄. Construct a sparse matrix W whose i-th row contains the barycentric coordinates of x̄i in the basis of its nearest neighbors. Use the lowest eigenvectors of (I − W)ᵀ(I − W) to embed.
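A bare-bones numpy sketch of those two steps (the neighborhood size k and the regularization constant reg are tuning choices, not values from the slides): solve a small constrained least-squares problem per point for the barycentric weights, then embed with the bottom eigenvectors of (I − W)ᵀ(I − W), dropping the constant one.

```python
import numpy as np

def lle(X, k, d, reg=1e-3):
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]
        Z = X[nbrs] - X[i]                       # neighbors in a frame centered at x_i
        C = Z @ Z.T                              # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)       # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()                 # barycentric: weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    evals, evecs = np.linalg.eigh(M)
    return evecs[:, 1:d + 1]                     # skip the constant eigenvector (eigenvalue ~0)
```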
22
Laplacian Eigenmaps (1)
Make a graph in which each data point is represented by a vertex (n vertices for n data points), with edges eij. Compute the Euclidean distances, which are the only thing we can measure. n nearest neighbors [parameter n ∈ N]: nodes i and j are connected by an edge if i is among the n nearest neighbors of j or j is among the n nearest neighbors of i. Heat kernel [parameter t ∈ R]: if nodes i and j are connected, put Wij = exp(−||xi − xj||²/t); if t is small, it puts a lot of penalty on far-apart points, and vice versa.
23
Laplacian Eigenmaps (2)
So far, I have built a graph on n vertices from the n data points and computed weights for the edges of the graph. The matrix W is a random matrix, because the original points were randomly sampled from the manifold. The idea is that by looking at the spectrum of this matrix, its eigenvalues and eigenvectors, we can recover the eigenvalues and eigenfunctions of the Laplace-Beltrami operator of the manifold from which the data was sampled. D is the diagonal matrix with Dii = Σj Wij. Construct the matrix L = D − W. [Eigenmaps] Compute eigenvalues and eigenvectors for the generalized eigenvector problem Lf = λDf. Let f0, ..., fk−1 be the eigenvectors. Leave out the eigenvector f0 and use the next m lowest eigenvectors for embedding in an m-dimensional Euclidean space.
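Putting the two Laplacian Eigenmaps slides together, a minimal numpy/scipy sketch of the pipeline (the neighborhood size k, embedding dimension m, and heat-kernel parameter t are tuning choices, not values from the slides):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, k, m, t):
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)    # heat-kernel weights on the k-NN graph
    W = np.maximum(W, W.T)                       # connect if either point is a neighbor of the other
    D = np.diag(W.sum(axis=1))                   # D_ii = sum_j W_ij
    L = D - W
    evals, evecs = eigh(L, D)                    # generalized problem L f = lambda D f
    return evecs[:, 1:m + 1]                     # drop f_0 (constant), keep the next m eigenvectors
```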
24
Diffusion Distance: start with an initial heat distribution on the manifold that is a pulse at x (δx), and do the same for y (δy). The diffusion distance between x and y (with heat diffusion operator Ht) is the difference between the two heat distributions after time t: Dt(x, y) = ||Ht δx − Ht δy||. On the manifold I want a distance: I take a point x, start with an initial pulse at x, and allow heat to dissipate over the manifold. Heat flows along the geometry of the manifold; after time t, I obtain the distribution of heat from the initial location x. Do the same for location y, and then measure the L2 distance between these two distributions. If heat flows along the manifold starting from a distribution concentrated at x, the resulting heat distribution is given by applying Ht (the diffusion operator) to δx; likewise, apply the heat diffusion operator Ht to δy (the initial condition that is a point mass at y).
25
Diffusion Distance relation of heat diffusion and Laplacian:
Diffusion of heat on the manifold is governed by the heat equation on the manifold, ∂u/∂t = −Δu: the Laplacian of the heat distribution gives its partial derivative with respect to time. So the way diffusion maps work, as opposed to Laplacian eigenmaps, is to look at the eigenfunctions of the Laplacian and embed every point x into a lower-dimensional space using the coordinates (e^{−λ1 t} f1(x), ..., e^{−λm t} fm(x)), where the λ's are the eigenvalues and the f's are the eigenfunctions.
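As a sketch of the discrete analogue, the embedding can be computed from the same kind of heat-kernel weight matrix W via the random-walk operator P = D⁻¹W; the function below is illustrative (the diffusion time t is a free parameter), not the specific construction from the slides:

```python
import numpy as np

def diffusion_maps(W, m, t=1.0):
    """W: symmetric nonnegative kernel/weight matrix of a connected graph; returns an m-dim embedding."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt              # symmetric conjugate of P = D^-1 W
    evals, evecs = np.linalg.eigh(S)
    idx = np.argsort(evals)[::-1]                # largest eigenvalues of P first
    evals, evecs = evals[idx], evecs[:, idx]
    psi = D_inv_sqrt @ evecs                     # right eigenvectors of P
    # Skip the trivial first eigenpair (eigenvalue 1, constant eigenvector);
    # coordinates are scaled by the eigenvalues raised to the diffusion time t.
    return (evals[1:m + 1] ** t) * psi[:, 1:m + 1]
```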
26
Justification for Laplacian Eigen map
We have a bunch of points sitting on a manifold in a high-dimensional space (x1, ..., xn from M ⊂ R^D) and we want points in a lower-dimensional space (y1, ..., yn in R). The Laplacian eigenmap tries to preserve locality: if xi and xj are near each other, then yi and yj should be near each other (smoothness and locality preservation). Wij is large if xi and xj are close to each other, so yi and yj should be close too; if Wij is small, xi and xj are far from each other, so we don't care.
27
Justification of Laplacian Eigen Map
It can be shown that minimizing Σij Wij (yi − yj)² = 2 yᵀLy (under a suitable normalization constraint) comes down to finding eigenvectors of the Laplacian L. Use the eigenvectors of L to embed; let Y = [y1, y2, ..., ym].
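The identity behind this objective is easy to verify numerically; in the sketch below, W and y are just random placeholders, not data from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
W = rng.random((n, n)); W = (W + W.T) / 2        # any symmetric weight matrix
y = rng.standard_normal(n)                       # any candidate 1-d embedding

L = np.diag(W.sum(axis=1)) - W
lhs = np.sum(W * (y[:, None] - y[None, :]) ** 2) # sum_ij W_ij (y_i - y_j)^2
rhs = 2 * y @ L @ y
print(np.allclose(lhs, rhs))                     # True
```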
28
On the Manifold: what is the relationship between the graph Laplacian and the manifold Laplacian? I need a smooth (locality-preserving) map f : M → R. On the graph, smoothness is handled by the quadratic form fᵀLf = ½ Σij Wij (fi − fj)²; a smooth function on the graph is one that does not change much from vertex to vertex. Finding a smooth function on the manifold corresponds to finding a smooth function on the graph: by Stokes' theorem, the smoothness ∫M ||∇f||² equals ∫M f Δf. Finding smooth functions on the manifold requires looking at the eigenfunctions of the Laplace-Beltrami operator on the manifold, while minimizing the smoothness condition on the graph leads us to the eigenvectors of the graph Laplacian. These two converge to each other.
29
{φi} forms an orthonormal basis for L2(M)
Given an arbitrary Riemannian manifold, we know the Laplace-Beltrami operator on this manifold and its eigensystem Δφi = λi φi. The {φi} form an orthonormal basis for L²(M), and the eigenvalues {λi} characterize the smoothness of the {φi}. If I knew the manifold on which the data lives, the eigenfunctions of the Laplacian would give me a set of basis functions adapted to the geometry of the manifold. These functions can be used to build classifiers.
30
But I don't have the manifold; I only have a collection of points on the manifold. So I make a graph out of these points and look at the Laplacian on this graph, which is an operator on functions defined on the vertex set of the graph. I can look at the eigenvalues and eigenvectors of this operator, and from them I can reconstruct the eigenvalues and eigenfunctions of the Laplace-Beltrami operator.
31
Results: consider a manifold M ⊂ R^D and the eigensystem Δf = λf that I am interested in. This gives eigenvalues λ1, λ2, ..., λi and eigenfunctions φi. I have randomly sampled points x1, x2, ..., xn on the manifold, and a graph G(V, E) with |V| = n. I look at functions f1: V → R defined on the vertex set, and at another set of functions f2: M → R that are solutions of the eigensystem, to which the Laplace-Beltrami operator is applied. Apply the graph Laplacian to f1, where the weights depend on t and L = D − W, and find the eigenvalues and eigenvectors of L. The eigenvalues of L converge to the eigenvalues of the eigensystem Δf = λf as n → ∞ and t → 0. The rate, roughly (1/n)^(1/d), does not depend on D.
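As an illustrative (not rigorous) check of this convergence, here is a sketch on the unit circle, whose Laplace-Beltrami eigenvalues are known to be 0, 1, 1, 4, 4, 9, 9, ...; the sample size n and bandwidth t are arbitrary choices, and the graph eigenvalues only match up to an overall scale, so the sketch compares eigenvalue ratios:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 1500, 0.01
theta = rng.uniform(0, 2 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])   # random samples on the unit circle

d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-d2 / t)                                   # heat-kernel weights
L = np.diag(W.sum(axis=1)) - W
evals = np.linalg.eigvalsh(L)[:7]                     # smallest graph-Laplacian eigenvalues

# Laplace-Beltrami spectrum of the circle: 0, 1, 1, 4, 4, 9, 9 (up to an unknown scale here),
# so ratios of the nonzero eigenvalues should be close to 4 and 9.
print(evals[3] / evals[1], evals[5] / evals[1])       # roughly 4 and 9
```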
32
Application of Manifolds
Motion estimation (estimate the motion of a person). Markerless motion estimation: inferring joint angles (16 cameras). Corazza, et al, Stanford Biomotion Lab, 05. Isometrically invariant representation: eigenfunctions of the Laplacian are invariant under isometries. What happens when we move? The surface of a walker's body is a 2D manifold in 3-dimensional space. If I put two markers on the arm, the shortest geodesic distance between them does not change while walking.
33
Motion estimation: two manifolds are isometrically equivalent if there is a correspondence between them that preserves geodesic distances. Moving the body is an isometric transformation of the surface of the body. So we have a bunch of data points from the body surface, and we compute eigenvectors of the Laplacian matrix. Each eigenvector is a function defined on the surface of the body. The color of each data point corresponds to the value of a certain eigenvector at that point. When the person moves, the colors do not change. This is useful because we now have functions that do not change during walking; we can use several of these functions to segment the body. No temporal information is needed, only point clouds: compute the eigenvectors at one time instant, then use them for the whole sequence.