Intrinsic Data Geometry from a Training Set


1 Intrinsic Data Geometry from a Training Set
PhD Thesis Proposal Nicola Rebagliati Supervisor: Alessandro Verri December 2007

2 Index
Introduction
  The Training Set
  Intrinsic Geometry
An interesting operator over manifolds
  The Laplace-Beltrami operator
  The Laplacian matrix
Preliminary work
  Image segmentation
  Image signatures
Work plan

3 Introduction

4 The Training Set
In the context of Statistical Learning we are given a set of training data representative of the collection of all possible data: {(x_i, y_i)}_{i=1}^n
where, in the classification case, x_i ∈ R^d and y_i ∈ {-1, +1}
That is: vectors in a Euclidean space and labels
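As a concrete illustration of the definition above, here is a minimal NumPy sketch of a classification training set: vectors x_i in R^d with labels y_i in {-1, +1}. The data are random placeholders, not real patient records; the dimensions echo the R^15 example from the next slide.

```python
import numpy as np

# Toy training set: n vectors in R^d with binary labels in {-1, +1}.
# Values are random placeholders standing in for real measurements.
rng = np.random.default_rng(0)
n, d = 20, 15                      # e.g. patient records in R^15
X = rng.normal(size=(n, d))        # inputs x_1, ..., x_n
y = rng.choice([-1, 1], size=n)    # labels given by the unknown function
```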

5 The Training Set
The labelling function is unknown
We are looking for a function f* that minimizes an error over future data
We should stay away from overfitting and underfitting
[Figure: a solution vs. one that overfits]

6 Some examples of Training Sets
Images in R^10000: Faces vs. Not Faces
Patient information in R^15: Right quantity of iron vs. Exceeding quantity of iron
Actions in R^(10000 × k): Walking vs. Resting
Labels take values in {-1, +1}

7 Intrinsic Dimensionality
Without assumptions, vectors may be uniformly distributed in their Euclidean space
In many practical cases the degrees of freedom of training data are far fewer than n
For example, 100×100-pixel images representing squares may have just three degrees of freedom (position and size)

8 Data form a structure
We may have something more than a simple lower dimensionality
Data may live in a topological space where we can compute angles and distances
That is, a Riemannian manifold

9 Example: The Swiss Roll
Difference between Euclidean distance and geodesic distance
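The gap between the two distances can be seen numerically on a 2D section of the roll. This minimal NumPy sketch traces the spiral p(t) = (t cos t, t sin t) over two turns: the chord between the endpoints (Euclidean distance in the ambient space) is much shorter than the arc length along the roll (the geodesic distance on the manifold).

```python
import numpy as np

# 2D section of the Swiss roll: p(t) = (t cos t, t sin t).
t = np.linspace(3 * np.pi, 5 * np.pi, 2000)   # two turns of the spiral
pts = np.column_stack((t * np.cos(t), t * np.sin(t)))

# Straight-line distance between the endpoints in the ambient space.
euclid = np.linalg.norm(pts[-1] - pts[0])
# Arc length along the curve, approximated by summing segment lengths.
geodesic = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
```

The two endpoints lie on adjacent windings, so `euclid` is small (about 2π) while `geodesic` accumulates the full length of the spiral between them.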

10 Semisupervised Learning
The geometry of the data is exploited in semisupervised learning [BelkinNiyogi2004]
In semisupervised learning the training set is made of labeled data and unlabeled data
The minimized functional combines an empirical error with a structural penalty w.r.t. known data and a structural penalty w.r.t. all data

11 Geometrical Information
A manifold has some intrinsic geometrical information [McKeanSinger67]:
The Volume
The Volume of the Boundary
The Euler Characteristic
...
The training set is extracted from the manifold
Is it possible to approximate this information with the training set?

12 An interesting operator over manifolds

13 The continuous Case: The Laplace-Beltrami operator
Let M denote a manifold
The Laplace-Beltrami operator maps functions to functions, both defined on the manifold
We will shortly see why we are interested in its eigenvalues and eigenvectors

14 The heat trace
The heat trace of the Laplacian on a Riemannian manifold can be expressed as: h(t) = Σ_i e^{-λ_i t}
The heat trace has an asymptotic expansion for t → 0: h(t) ~ (4πt)^{-d/2} (C_0 + C_1 t^{1/2} + C_2 t + ...)
C_0 is the volume of the manifold
C_1 is proportional to the volume of the Boundary
... [McKeanSinger67]
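The volume-from-the-spectrum idea can be checked on the simplest manifold where the eigenvalues are known in closed form. This sketch assumes the unit interval [0, 1] with Dirichlet boundary conditions, whose Laplacian eigenvalues are λ_k = (kπ)²; summing the heat trace at a small t and multiplying by the leading factor √(4πt) should recover the volume, here 1.

```python
import numpy as np

# Heat trace h(t) = sum_k exp(-lambda_k t) for the unit interval [0, 1]
# with Dirichlet boundary conditions: lambda_k = (k*pi)^2, k = 1, 2, ...
t = 1e-4
lam = (np.arange(1, 5000) * np.pi) ** 2   # enough terms: the tail is negligible
h = np.sum(np.exp(-lam * t))

# Leading term of the expansion: h(t) ~ Vol(M) / sqrt(4*pi*t) + ...
vol_estimate = h * np.sqrt(4 * np.pi * t)  # should be close to Vol = 1
```

The estimate is slightly below 1 because the next term of the expansion (the boundary contribution) is ignored; taking t smaller shrinks that correction.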

15 The discrete Case: The Laplacian Matrix
The Training Set can be seen as an undirected weighted graph
Choose a comparing function for weighting graph edges, e.g. w_ij = k(x_i, x_j)
Define the degree of a vertex as d_i = Σ_j w_ij
Consider the Weight matrix W = (w_ij) and the diagonal Degree matrix D = diag(d_1, ..., d_n)
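The graph construction above can be sketched in a few lines of NumPy. This example uses a Gaussian comparing function w_ij = exp(-||x_i - x_j||² / (2σ²)) on a small random training set; both the data and the sigma value are placeholders.

```python
import numpy as np

# Toy training set: 8 points in R^2 (random placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
sigma = 1.0

# Gaussian edge weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)            # no self-loops in the graph

# Diagonal degree matrix with d_i = sum_j w_ij.
D = np.diag(W.sum(axis=1))
```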

16 The discrete Case: The Laplacian Matrix
The Laplacian Matrix is simply L = D - W
A normalized version [VonLuxburg06] has some better properties: L_sym = I - D^{-1/2} W D^{-1/2}

17 The discrete Case: The Laplacian Matrix
Now consider the spectral decomposition of the normalized Laplacian: L_sym v_i = λ_i v_i
The eigenvalues are ordered as 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n
The training set can be embedded in a possibly low-dimensional space through the eigenvectors and eigenvalues: x_i ↦ (v_2(i), ..., v_m(i))
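The embedding step can be sketched as follows: diagonalize the normalized Laplacian of a placeholder training set and keep the coordinates given by the first non-trivial eigenvectors (a Laplacian-eigenmaps-style sketch, with an arbitrary target dimension of 2).

```python
import numpy as np

# Normalized Laplacian of a placeholder training set (30 points in R^5).
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / 2.0)
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
d_is = 1.0 / np.sqrt(d)
L_sym = np.eye(len(d)) - d_is[:, None] * W * d_is[None, :]

# eigh returns eigenvalues in ascending order: 0 = lam_1 <= lam_2 <= ...
evals, evecs = np.linalg.eigh(L_sym)

# Embed each point by its coordinates in the first non-trivial eigenvectors.
embedding = evecs[:, 1:3]           # x_i -> (v_2(i), v_3(i))
```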

18 The Convergence
We would like the discrete operator to be consistent with the continuous one
The pointwise distance between the two operators is bounded by an error
The error is divided into Bias + Variance
The bound involves constants that depend on the possibly unknown underlying manifold

19 Image segmentation PRELIMINARY WORK

20 Image segmentation
A pixel is given by a digital camera as a tuple (x, y, R, G, B)
We may define features that map a pixel, usually considering also its neighborhood, to a vector in R^d
For example (x, y, R, G, B) → (R, G, B)

21 Image segmentation Clustering methods group together different vectors
I want to compare clustering results with a ‘ground truth’, or ‘human truth’

22 Segmenting with the Laplacian
It can be proved that the second eigenvector of the Laplacian matrix gives a clustering that:
maximizes intra-group cohesion
minimizes inter-group cohesion
As weighting function I consider the Gaussian w_ij = exp(-||x_i - x_j||² / (2σ²))
What sigma should be used?
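The second-eigenvector criterion can be demonstrated on synthetic data rather than an image: two well-separated point clouds stand in for two image regions, and the sign of the second eigenvector of the normalized Laplacian recovers the bipartition. Data, cluster positions, and sigma are all arbitrary choices for the sketch.

```python
import numpy as np

# Two well-separated clusters standing in for two image regions.
rng = np.random.default_rng(3)
A = rng.normal(loc=0.0, scale=0.3, size=(10, 2))
B = rng.normal(loc=5.0, scale=0.3, size=(10, 2))
X = np.vstack((A, B))

# Gaussian weights and normalized Laplacian, as on the previous slides.
sigma = 1.0
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
d_is = 1.0 / np.sqrt(d)
L_sym = np.eye(len(d)) - d_is[:, None] * W * d_is[None, :]

# The sign pattern of the second eigenvector bipartitions the graph.
_, evecs = np.linalg.eigh(L_sym)
labels = (evecs[:, 1] > 0).astype(int)
```

With sigma = 1 and clusters 5 units apart, the inter-cluster weights are nearly zero, so the cut found by the sign of the second eigenvector coincides with the two clouds.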

23 Choosing sigma in image segmentation
Heuristic: choose sigma such that the underlying manifold, if it exists, is best approximated
The training set is sampled over the manifold, so it is a random variable
Choose sigma giving the minimum Bias-Variance error
Let's see a synthetic case

24 Choosing sigma in image segmentation
[Figure: error bound for data sampled from a 3D sphere]

25 Choosing sigma and the balancing principle
The error bound is divided into bias and variance
An algorithm balances the two terms
We do not know the constants

26 Image Signatures PRELIMINARY WORK

27 3D-shape retrieval in databases
Reuter et al. [ReuterWolterPeinecke] propose to use the spectrum of the Laplace-Beltrami operator of a shape as its signature. This signature has many desirable properties within that context:
Representation invariant
Scaling invariant
Continuous w.r.t. shape deformations

28 Image Signature
In another article, Peinecke et al. [PeineckeReuterWolter] have proposed to look at an image as a 2D-manifold and to use its Laplacian spectrum as a signature. It is not clear what the useful geometric invariants are in that case: Illumination? Scale?
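One invariance is easy to verify in the discrete setting and motivates using the spectrum as a signature at all: the Laplacian spectrum does not change if the vertices of the graph are relabelled, so the signature is independent of the data representation order. A minimal sketch on placeholder data:

```python
import numpy as np

# Laplacian of a placeholder point set (12 points in R^3).
rng = np.random.default_rng(4)
X = rng.normal(size=(12, 3))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / 2.0)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Relabel the vertices: same graph, permuted rows and columns.
perm = rng.permutation(12)
L_perm = L[np.ix_(perm, perm)]

# The sorted eigenvalues (the signature) are identical for both.
sig = np.sort(np.linalg.eigvalsh(L))
sig_perm = np.sort(np.linalg.eigvalsh(L_perm))
```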

29 Image Signature: Laplacian Definition
Different kinds of Laplacian definitions give different results:
The Normalized Laplacian
The Laplace-Kirchhoff operator
The continuous operator, computed using the Finite Element Method (FEM)

30 Laplace-Kirchhoff operator, radius: 1

31 Laplace operator, sigma_xy = 30, sigma_gray = 50

32 Work to do

33 Short Term
Investigate an empirical criterion or an algorithm for choosing a suitable sigma
Tackle the problem of extracting geometrical information from a set of eigenvalues of the Laplacian matrix. In the case of 3D-shapes, using FEM, a solution is proposed in [ReuterPeineckeWolter06]

34 Short Term
Develop the idea of image signatures
Compare different Laplacian definitions
Explore the invariance possibilities
Experiment with plugging geometrical information into the learning context

35 Long Term
Develop a wider view for plugging geometrical information into the learning context
Characterize the manifold of images inside the space of matrices
Torralba, Fergus and Freeman, "80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition"

