Intrinsic Data Geometry from a Training Set
PhD Thesis Proposal. Nicola Rebagliati. Supervisor: Alessandro Verri. December 2007
Index
- Introduction
- The Training Set
- Intrinsic Geometry
- An interesting operator over manifolds
  - The Laplace-Beltrami operator
  - The Laplacian matrix
- Preliminary work
  - Image segmentation
  - Image signatures
- Work plan
Introduction
The Training Set
In the context of Statistical Learning we are given a set of training data, representative of the collection of all possible data: S = {(x_1, y_1), ..., (x_n, y_n)}, where, in the classification case, x_i ∈ R^d and y_i ∈ {-1, +1}. That is: vectors in a Euclidean space, and labels.
The Training Set
The labelling function is unknown. We are looking for a function f* that minimizes an error over future data. We should stay away from both overfitting and underfitting. (Figure: a good solution vs. one that overfits.)
Some examples of Training Sets
- Images in R^10000: Faces vs. Not Faces
- Patient information in R^15: Right quantity of iron vs. Exceeding quantity of iron
- Actions in R^(10000 x k): Walking vs. Resting
In each case the two classes are labelled -1 and +1.
Intrinsic Dimensionality
Without assumptions, vectors may be uniformly distributed in their Euclidean space. In many practical cases, however, the degrees of freedom of the training data are far fewer than n. For example, 100x100-pixel images representing squares may have only three degrees of freedom (e.g. position and size).
Data form a structure
We may have something more than simply a lower dimensionality: the data may live in a topological space where we can compute angles and distances, that is, a Riemannian Manifold.
Example: The Swiss Roll
The difference between Euclidean distance and geodesic distance.
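This contrast can be shown with a minimal numeric sketch (all parameters below are illustrative, not from the slides): on a roll-shaped curve, the two endpoints lie on different layers, so they are close in the ambient R^3 but far apart along the manifold.

```python
import numpy as np

# Points on a roll-shaped curve in R^3: t -> (t cos t, 0, t sin t).
# The parameter range is chosen so the endpoints sit on different
# layers of the roll (illustrative values).
t = np.linspace(1.5 * np.pi, 3.5 * np.pi, 500)
x = np.column_stack([t * np.cos(t), np.zeros_like(t), t * np.sin(t)])

# Euclidean distance between the endpoints: small, because the
# layers are close in the ambient space.
euclidean = np.linalg.norm(x[0] - x[-1])

# Geodesic distance, approximated by the length of the polyline
# along the curve: much larger, since it must follow the roll.
geodesic = np.sum(np.linalg.norm(np.diff(x, axis=0), axis=1))
```

Here the geodesic distance is several times the Euclidean one, which is exactly why nearest-neighbour reasoning in the ambient space can be misleading on such data.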
Semisupervised Learning
The geometry of the data is exploited in semisupervised learning [BelkinNiyogi2004]. In semisupervised learning the training set is made of labeled data and unlabeled data. The objective combines an empirical error term, a structural penalty w.r.t. the known data, and a structural penalty w.r.t. all the data.
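The three terms can be written out in the manifold-regularization form of [BelkinNiyogi2004] (the slide's formula did not survive extraction; the notation below is the standard one, reconstructed here): l labeled points, u unlabeled points, and the intrinsic penalty estimated from all l + u points through the graph Laplacian L.

```latex
f^{*} = \arg\min_{f \in \mathcal{H}_K}\;
\frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f)
\;+\; \gamma_A \,\|f\|_K^2
\;+\; \gamma_I \,\|f\|_I^2,
\qquad
\|f\|_I^2 \approx \frac{1}{(l+u)^2}\,\mathbf{f}^{\top} L\, \mathbf{f}
```

The first term is the empirical error; the two penalties, weighted by gamma_A and gamma_I, control the ambient smoothness and the intrinsic (data-geometry) smoothness of f.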
Geometrical Information
A manifold has some intrinsic geometrical information [McKeanSinger67]: the Volume, the Volume of the Boundary, the Euler Characteristic, ... The training set is extracted from the manifold. Is it possible to approximate this information with the training set?
An interesting operator over manifolds
The continuous Case: The Laplace-Beltrami operator
Let M denote a manifold. The Laplace-Beltrami operator maps functions to functions, both defined on the manifold. In a moment we will see why we are interested in its eigenvalues and eigenfunctions.
The heat trace
The heat trace of the Laplacian on a Riemannian manifold has an asymptotic expansion whose coefficients carry geometric information: C0 is the volume of the manifold, C1 is proportional to the volume of the boundary, ... [McKeanSinger]
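In the notation of [McKeanSinger], the heat trace and its small-time expansion (reconstructed here, since the slide's formulas were lost) read:

```latex
Z(t) \;=\; \sum_{i} e^{-\lambda_i t}
\;\sim\; (4\pi t)^{-d/2}\left(C_0 + C_1\, t^{1/2} + C_2\, t + \dots\right),
\qquad t \to 0^{+}
```

where the lambda_i are the eigenvalues of the Laplace-Beltrami operator, d is the dimension of the manifold, C_0 = vol(M), and C_1 is proportional to vol(boundary of M).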
The discrete Case: The Laplacian Matrix
The Training Set can be seen as an undirected weighted graph. Choose a comparing function for weighting the graph edges, giving the Weight matrix W; define the degree of a vertex as the sum of the weights of its edges, and collect the degrees in the diagonal Degree matrix D.
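The construction can be sketched in a few lines (the Gaussian comparing function and sigma = 1 are illustrative choices, not fixed by the slides):

```python
import numpy as np

# Toy training set: 10 points in R^3 (illustrative data).
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
sigma = 1.0

# Weight matrix from a comparing function, here a Gaussian:
# W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with no self-loops.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)

# Degree of a vertex: d_i = sum_j W_ij; D is the diagonal degree matrix.
d = W.sum(axis=1)
D = np.diag(d)
```

Since the graph is undirected, W is symmetric, and every degree is positive as long as the comparing function is.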
The discrete Case: The Laplacian Matrix
The Laplacian Matrix is simply L = D - W. A normalized version [VonLuxburg06], L_sym = I - D^(-1/2) W D^(-1/2), has some better properties.
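Both definitions can be written out directly; the 3-vertex weight matrix below is made up for illustration. One of the "better properties" of the normalized version is that its eigenvalues always lie in [0, 2], regardless of the scale of the weights.

```python
import numpy as np

# A made-up 3-vertex weighted graph.
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.2],
              [0.5, 0.2, 0.0]])
d = W.sum(axis=1)
D = np.diag(d)

# Unnormalized Laplacian: L = D - W.
L = D - W

# Normalized Laplacian [VonLuxburg06]: L_sym = I - D^{-1/2} W D^{-1/2}.
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(3) - D_inv_sqrt @ W @ D_inv_sqrt
```

The row sums of L are zero (the constant vector lies in its kernel), while the spectrum of L_sym is confined to [0, 2], which makes spectra of different graphs comparable.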
The discrete Case: The Laplacian Matrix
Now consider the spectral decomposition of the normalized Laplacian. The eigenvalues are ordered as 0 = lambda_1 <= lambda_2 <= ... <= lambda_n. The training set can be embedded in a possibly low-dimensional space through the eigenvectors and eigenvalues.
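A sketch of the embedding step (toy data; the bandwidth and the embedding dimension k are arbitrary choices here): the eigenvectors associated with the smallest nontrivial eigenvalues provide the new coordinates.

```python
import numpy as np

# Toy training set and its normalized Laplacian (bandwidth arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / 2.0)
np.fill_diagonal(W, 0.0)
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Spectral decomposition; eigh returns eigenvalues in ascending order,
# matching 0 = lambda_1 <= lambda_2 <= ... <= lambda_n.
lam, U = np.linalg.eigh(L_sym)

# Embed each training point using the first k nontrivial eigenvectors
# (the first eigenvector is trivial: it corresponds to lambda_1 = 0).
k = 2
embedding = U[:, 1:1 + k]
```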
The Convergence
We would like the discrete operator to be consistent with the continuous one. The pointwise distance between the two operators is bounded by an error, which splits into a bias term plus a variance term. The bound involves constants that depend on the possibly unknown underlying manifold.
PRELIMINARY WORK: Image segmentation
Image segmentation
A pixel is given by a digital camera as a tuple (x, y, R, G, B). We may define features that map a pixel, usually considering also its neighborhood, to a vector; for example (x, y, R, G, B) -> (R, G, B).
Image segmentation
Clustering methods group together different vectors. I want to compare clustering results with a 'ground truth', or 'human truth'.
Segmenting with the Laplacian
It can be proved that the second eigenvector of the Laplacian matrix gives a clustering that maximizes intra-group cohesion and minimizes inter-group cohesion. I consider as weighting function the Gaussian w(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)). What sigma should be used?
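A toy sketch of this bipartition (two synthetic well-separated clusters stand in for image features; sigma = 1 is arbitrary): the sign pattern of the second eigenvector recovers the two groups.

```python
import numpy as np

# Two well-separated synthetic clusters in R^2 (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
               rng.normal(5.0, 0.1, (10, 2))])

# Gaussian weights and the normalized Laplacian.
sigma = 1.0
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# The sign of the second eigenvector yields the two-way clustering.
lam, U = np.linalg.eigh(L_sym)
labels = (U[:, 1] > 0).astype(int)
```

With well-separated clusters the second eigenvector is nearly constant on each group, with opposite signs, so thresholding it at zero splits the data cleanly.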
Choosing sigma in image segmentation
Heuristic: choose sigma such that the underlying manifold, if it exists, is best approximated. The training set is sampled over the manifold, so it is a random variable; choose the sigma that gives the minimum bias-variance error. Let's see a synthetic case.
Choosing sigma in image segmentation
(Figure: error bound on a synthetic 3D sphere.)
Choosing sigma and the balancing principle
The error bound splits into a bias term and a variance term; an algorithm balances the two. However, we do not know the constants.
PRELIMINARY WORK: Image Signatures
3D-shape retrieval in databases
Reuter et al. [ReuterWolterPeinecke] propose to use the spectrum of the Laplace-Beltrami operator of a shape as its signature. This signature has many desirable properties within that context: it is representation invariant, scaling invariant, and continuous w.r.t. shape deformations.
Image Signature
In another article, Peinecke et al. [PeineckeReuterWolter] have proposed to look at an image as a 2D manifold and to use its Laplacian spectrum as a signature. It is not clear what the useful geometric invariants are in that case: Illumination? Scale?
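The "representation invariant" property of a spectral signature can be checked directly on a toy graph (made-up weights): relabelling the vertices permutes the Laplacian but leaves its sorted spectrum, i.e. the signature, unchanged.

```python
import numpy as np

# Made-up 3-vertex weighted graph and its Laplacian spectrum.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W
signature = np.sort(np.linalg.eigvalsh(L))

# Relabel the vertices with a permutation: the signature is unchanged.
perm = [2, 0, 1]
W_perm = W[np.ix_(perm, perm)]
L_perm = np.diag(W_perm.sum(axis=1)) - W_perm
signature_perm = np.sort(np.linalg.eigvalsh(L_perm))
```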
Image Signature: Laplacian Definitions
Different definitions of the Laplacian give different results:
- The Normalized Laplacian
- The Laplace-Kirchhoff operator
- The continuous operator, computed using the Finite Element Method (FEM)
Laplace-Kirchhoff operator, radius: 1
Laplace operator, sigma_xy = 30, sigma_gray = 50
Work to do
Short Term
Investigate an empirical criterion or an algorithm for choosing a suitable sigma. Tackle the problem of extracting geometrical information from a set of eigenvalues of the Laplacian matrix; in the case of 3D shapes, using FEM, a solution is proposed in [ReuterPeineckeWolter06].
Short Term
Develop the idea of an image signature: compare different Laplacian definitions, explore the invariance possibilities, and experiment with plugging geometrical information into the learning context.
Long Term
A wider view of plugging geometrical information into the learning context. Characterize the manifold of images inside the space of matrices (Torralba, Fergus and Freeman, "80 million tiny images: a large dataset for non-parametric object and scene recognition").