Published by Egbert Harper. Modified over 9 years ago.
1
3D Shape Histograms for Similarity Search and Classification in Spatial Databases
Mihael Ankerst, Gabi Kastenmüller, Hans-Peter Kriegel, Thomas Seidl
University of Munich, Germany
2
Outline
- Introduction
- 3D Shape Similarity Model
- Quadratic Form Distance Functions
- Extensibility of Histogram Models
- Query Processing
- Experimental Results and Conclusion
4
Introduction
- Classification: the problem of assigning an appropriate class to a query object.
- Applications: molecular biology, medical imaging, mechanical engineering, astronomy.
- Objects of the same class have some characteristic properties in common, for example geometric or thematic properties.
5
Classification in Molecular Databases
- Classification schemata are already available.
- What is needed is a fast filter classification algorithm.
- Dali system: a sophisticated classification algorithm for proteins.
- CATH: a hierarchical classification of protein domain structures with four levels: class, architecture, topology, and homologous superfamily.
6
Nearest Neighbor Classification
- In general, classification is performed after a training phase: an object is assigned to a class if it matches the description of that class.
- Nearest neighbor classifiers find the nearest neighbor of the query object and return its class.
- k-nearest neighbor classifiers: the parameters are the number k of neighbors and the weights of the neighbors.
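A minimal sketch of a weighted k-nearest-neighbor classifier, assuming feature vectors compared by Euclidean distance and an inverse-distance weighting of the neighbors. The slide only says that neighbors may be weighted; the weighting scheme and the name `knn_classify` are hypothetical. With k = 1 the function reduces to the plain nearest neighbor classifier.

```python
import math
from collections import defaultdict


def knn_classify(query, training, k=3, weighted=True):
    """Assign a class to `query` by (weighted) k-nearest-neighbor voting.

    training: list of (feature_vector, class_label) pairs.
    Hypothetical weighting: each neighbor votes with 1 / (distance + eps).
    """
    # Rank all training objects by Euclidean distance to the query.
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    votes = defaultdict(float)
    for vector, label in ranked[:k]:
        d = math.dist(query, vector)
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    # The class with the largest accumulated vote wins.
    return max(votes, key=votes.get)


training = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.1), "B"),
]
print(knn_classify((0.15, 0.1), training, k=1))  # A (plain nearest neighbor)
print(knn_classify((0.8, 0.9), training, k=3))   # B
```

Inverse-distance weights let close neighbors dominate the vote, so a single very close object can outweigh several distant ones of another class.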
7
Geometry-Based Similarity Search
- Spatial objects are transformed into a high-dimensional vector space.
- In 2D, shapes can be represented as ordered sets of surface points, approximate rectangular coverings, etc.
- Section coding technique: the circumcircle of each polygon is decomposed into a number of sectors; for each sector, the polygon area inside it is normalized by the total polygon area.
- Similarity is defined as the Euclidean distance between the resulting feature vectors.
8
Invariance Properties
- Similarity models need to be invariant against translation, rotation, scaling, etc.
- Most methods include a preprocessing step, such as rotating objects to a normalized orientation or translating the center of mass to the origin.
- Robustness against errors is not considered in most of these models.
10
3D Shape Similarity Model
- The section coding technique is extended to 3D.
- Shape histograms serve as feature vectors.
- Similarity is measured by a quadratic form distance function.
11
Shape Histograms
- A feature transform maps a complex object onto a feature vector in a multidimensional space.
- 3D shape histograms are also feature vectors.
- They are based on a partitioning of the space into complete and disjoint cells, which correspond to the bins of the histogram.
- Any space can be used (geometric, thematic, etc.).
12
Shell Model
- The 3D space is decomposed into concentric shells around the center point.
- The model is independent of rotations around the center.
- The radii of the shells are determined from the extension of the objects.
- The shells have uniform thickness.
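The shell model can be sketched as follows, assuming the center point is the centroid of the (surface) points, the outer radius defaults to the largest point distance, and counts are normalized by the number of points. These choices and the name `shell_histogram` are hypothetical; passing a shared `outer_radius` mimics deriving the radii from the extension of the objects.

```python
import math


def shell_histogram(points, num_shells, outer_radius=None):
    """Shell-model histogram of a 3D point set (e.g. surface points).

    Bin i counts the points whose distance to the center falls into the i-th
    of `num_shells` concentric shells of uniform thickness. Assumptions: the
    center is the centroid, the outer radius defaults to the largest point
    distance, and counts are normalized by the number of points.
    """
    n = len(points)
    center = [sum(p[axis] for p in points) / n for axis in range(3)]
    radii = [math.dist(p, center) for p in points]
    r_max = outer_radius if outer_radius is not None else max(radii)
    thickness = r_max / num_shells
    hist = [0] * num_shells
    for r in radii:
        # Clamp so points on (or beyond) the outer radius land in the last shell;
        # a zero thickness (all points at the center) puts everything in shell 0.
        index = min(int(r / thickness), num_shells - 1) if thickness > 0 else 0
        hist[index] += 1
    return [count / n for count in hist]


# Two points at radius 1 and two at radius 2 around the origin.
points = [(1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0)]
print(shell_histogram(points, 4))  # [0.0, 0.0, 0.5, 0.5]
```

Because only distances to the center enter the histogram, rotating the points around the center leaves it unchanged.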
13
Sector Model
- The 3D space is decomposed into sectors that emerge from the center point of the model.
- To define the sectors, points are distributed uniformly on the surface of a sphere around the center.
- The Voronoi diagram of these points gives an appropriate decomposition of the space.
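A sketch of the sector model under stated assumptions: the sector centers are spread over the unit sphere with a Fibonacci lattice (a swapped-in choice; the slide only asks for a uniform distribution), and the center point is the centroid. On the sphere, the Voronoi cell of a sector center is the set of directions closer to it than to any other center, so each point is counted in the sector whose center has the largest dot product with the point's direction. The names `fibonacci_sphere` and `sector_histogram` are hypothetical.

```python
import math


def fibonacci_sphere(n):
    """n approximately uniformly distributed unit vectors (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        directions.append((r * math.cos(theta), r * math.sin(theta), z))
    return directions


def sector_histogram(points, num_sectors):
    """Sector-model histogram: each point is counted in the Voronoi cell
    (on the sphere) of the sector center closest to its direction."""
    n = len(points)
    center = [sum(p[axis] for p in points) / n for axis in range(3)]
    centers = fibonacci_sphere(num_sectors)
    hist = [0] * num_sectors
    for p in points:
        v = [p[axis] - center[axis] for axis in range(3)]
        # The smallest angle to a sector center is the largest dot product
        # (|v| is the same for every candidate); a point at the center goes to sector 0.
        best = max(range(num_sectors),
                   key=lambda s: sum(v[a] * centers[s][a] for a in range(3)))
        hist[best] += 1
    return [count / n for count in hist]


points = [(0, 0, 5), (0, 0, -5)]
print(sector_histogram(points, 6))  # [0.5, 0.0, 0.0, 0.0, 0.0, 0.5]
```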
15
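Combined Model
- Combination of the shell and sector models: each cell is the intersection of a shell and a sector.
- This results in a higher dimensionality (number of shells × number of sectors).
- For the same dimensionality, different combinations of shell and sector counts can be used.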
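A small sketch of the combined model's bin layout, assuming the shell and sector index of a point are already known and that cells are numbered shell-major. `combined_bin` and `shell_sector_splits` are hypothetical helpers; the second lists the shell/sector combinations that yield the same dimensionality.

```python
def combined_bin(shell_index, sector_index, num_sectors):
    """Flattened index of the cell where a shell and a sector intersect."""
    return shell_index * num_sectors + sector_index


def shell_sector_splits(dimensionality):
    """All (shells, sectors) pairs whose product equals the histogram dimensionality."""
    return [(shells, dimensionality // shells)
            for shells in range(1, dimensionality + 1)
            if dimensionality % shells == 0]


print(shell_sector_splits(12))
# [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]
print(combined_bin(2, 3, num_sectors=4))  # 11
```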
19
Euclidean Distance
- The Euclidean distance between two N-dimensional vectors p and q is given by $d_{\mathrm{euclid}}(p,q) = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2}$.
- The individual components of the feature vectors are assumed to be independent.
- Relationships between components, such as substitutability and compensability, cannot be taken into account.
20
Euclidean Distance
- Consider three objects a, b and c.
- Clearly, a and b are more closely related than a and c, or b and c.
- However, due to rotation, the peaks of a and b are mapped into different bins, so the Euclidean distance does not reflect their similarity in this case.
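The effect can be reproduced with three hypothetical one-peak histograms (not the objects from the original figure): a and b peak in neighboring bins, c peaks far away. Because the Euclidean distance compares bin i only with bin i, all three pairs come out equally distant.

```python
import math


def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2): bin i is only ever compared with bin i."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Hypothetical 6-bin histograms: the peaks of a and b lie in neighboring bins
# (as after a small rotation), the peak of c lies far away.
a = [0, 1, 0, 0, 0, 0]
b = [0, 0, 1, 0, 0, 0]
c = [0, 0, 0, 0, 0, 1]

# All three distances are sqrt(2): neighboring bins count as much as distant ones.
print(euclidean(a, b), euclidean(a, c), euclidean(b, c))
```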
23
Quadratic Form Distance Function
- The quadratic form distance function is defined in terms of a similarity matrix A: $d_A(p,q) = \sqrt{(p-q)\,A\,(p-q)^T}$.
- The components $a_{ij}$ of A represent the similarity of components i and j in the underlying space.
- The Euclidean distance is the special case A = I, the identity matrix.
24
Quadratic Form Distance Functions
- The Euclidean distance of two vectors is completely fixed; it has no parameters.
- The weighted Euclidean distance is more flexible, since its weights control the effect of each individual vector component on the overall distance (a diagonal matrix A).
- On top of this, a general quadratic form distance function also specifies cross-dependencies between the dimensions (off-diagonal entries of A).
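A direct sketch of $d_A(p,q) = \sqrt{(p-q)\,A\,(p-q)^T}$ showing the three cases: the identity (Euclidean), a diagonal matrix (weighted Euclidean), and a matrix with off-diagonal entries (cross-dependencies). The function name and the example matrices are illustrative assumptions; A is assumed symmetric positive semi-definite.

```python
import math


def quadratic_form_distance(p, q, A):
    """d_A(p, q) = sqrt((p - q) A (p - q)^T) for an N x N similarity matrix A."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    value = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    # A is assumed positive semi-definite; clip tiny negative rounding errors.
    return math.sqrt(max(value, 0.0))


p, q = [1.0, 2.0, 3.0], [2.0, 0.0, 3.0]   # p - q = (-1, 2, 0)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
weighted = [[4, 0, 0], [0, 1, 0], [0, 0, 1]]   # diagonal: per-component weights
cross = [[1, 0.5, 0], [0.5, 1, 0], [0, 0, 1]]  # off-diagonal: dimensions 0 and 1 interact

print(quadratic_form_distance(p, q, identity))  # sqrt(5), the Euclidean distance
print(quadratic_form_distance(p, q, weighted))  # sqrt(8)
print(quadratic_form_distance(p, q, cross))     # sqrt(3)
```

The double loop makes the cost explicit: N² terms per evaluation for N-dimensional vectors.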
25
Quadratic Form Distance Functions
- The neighborhood of the bins can be represented by the similarity weights.
- Let d(i, j) denote the distance between the cells that correspond to bins i and j.
- For shells, the bin distance is the difference of the corresponding radii.
- For sectors, the bin distance is the angle between the sector centers.
26
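Quadratic Form Distance Functions
- Given an appropriate bin distance function, the similarity matrix can be computed as $a_{ij} = e^{-\sigma \cdot d(i,j)}$, where the parameter σ controls the global shape of the similarity matrix.
- Since d(i, i) = 0, every diagonal entry is 1; the larger σ, the faster the off-diagonal weights decay and the closer A gets to the identity matrix (the Euclidean case).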
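A sketch that builds $a_{ij} = e^{-\sigma \cdot d(i,j)}$ for a shell model and revisits the three-histogram example from the Euclidean distance slides. Assumptions: 6 shells of uniform thickness with radii normalized to an outer radius of 1, σ = 10, and hypothetical one-peak histograms; the function names are illustrative. With the similarity matrix, the neighboring peaks of a and b now yield a smaller distance than the distant peaks of a and c.

```python
import math


def similarity_matrix(num_bins, bin_distance, sigma):
    """a_ij = exp(-sigma * d(i, j)) for a given bin distance function d."""
    return [[math.exp(-sigma * bin_distance(i, j)) for j in range(num_bins)]
            for i in range(num_bins)]


def quadratic_form_distance(p, q, A):
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    value = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(value, 0.0))


def shell_distance(i, j, num_shells=6):
    """Difference of the shell radii, assuming uniform thickness and outer radius 1."""
    return abs(i - j) / num_shells


A = similarity_matrix(6, shell_distance, sigma=10.0)

a = [0, 1, 0, 0, 0, 0]  # hypothetical histograms: a and b peak in neighboring shells,
b = [0, 0, 1, 0, 0, 0]  # c peaks in the outermost shell
c = [0, 0, 0, 0, 0, 1]
print(quadratic_form_distance(a, b, A))  # about 1.27
print(quadratic_form_distance(a, c, A))  # about 1.41
```

For sectors, `shell_distance` would be replaced by the angle between the sector centers.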
27
Invariance Properties
- During normalization, all objects are translated and rotated.
- Translation maps the center of mass (COM) onto the origin.
- Rotation is performed by the principal axes transform (PAT).
- This generally leads to a unique orientation of the object.
28
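Principal Axes Transform
- Compute the covariance matrix of a given set of 3D points (x, y, z):
$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{pmatrix}$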
29
Principal Axes Transform
- The eigenvectors of this matrix represent the principal axes of the original 3D point set.
- The eigenvalues indicate the variance of the points in the respective directions.
- As a result of the PAT, all covariances of the transformed points vanish.
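A dependency-free sketch of the normalization: translate the centroid to the origin (assuming equal point weights, so the centroid is the center of mass), compute the covariance matrix, and project the points onto its eigenvectors ordered by decreasing eigenvalue. Pure-Python Jacobi rotations stand in for a library eigensolver such as `numpy.linalg.eigh`, which one would normally use; all function names are hypothetical. The final check confirms that the covariances of the transformed points vanish.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(X):
    return [list(row) for row in zip(*X)]


def jacobi_eigen(S, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric 3x3 matrix
    by cyclic Jacobi rotations: each rotation zeroes one off-diagonal entry."""
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-24:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if A[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))
                V = matmul(V, J)
    return [A[i][i] for i in range(3)], V


def covariance(points):
    n = len(points)
    mean = [sum(pt[a] for pt in points) / n for a in range(3)]
    centered = [[pt[a] - mean[a] for a in range(3)] for pt in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)] for i in range(3)]
    return cov, centered


def principal_axes_transform(points):
    """Translate the centroid to the origin, then express every point in the
    eigenvector basis of the covariance matrix (largest variance first)."""
    cov, centered = covariance(points)
    eigenvalues, V = jacobi_eigen(cov)
    order = sorted(range(3), key=lambda i: -eigenvalues[i])
    return [[sum(c[k] * V[k][i] for k in range(3)) for i in order] for c in centered]


points = [(1, 2, 0), (2, 3, 1), (3, 5, 1), (4, 4, 2), (5, 7, 3), (0, 1, -1)]
new_cov, _ = covariance(principal_axes_transform(points))
print([[round(x, 6) for x in row] for row in new_cov])  # off-diagonal entries are ~0
```

Eigenvectors are only defined up to sign (and are not unique for repeated eigenvalues), which is one reason the transform only generally yields a unique orientation; this sketch does not resolve those cases.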
31
Extensibility of Histogram Models
- Along with spatial properties, thematic properties can also be considered.
- A general approach to manage both thematic and spatial properties is to use combined histograms.
- The combined histogram is the Cartesian product of the individual histograms.
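A sketch of a combined spatial/thematic histogram, assuming each point already carries a spatial bin index (e.g. its shell) and a thematic bin index (e.g. a hypothetical atom-type category). Each cell of the Cartesian product is flattened to one histogram bin; `combined_histogram` is a hypothetical name.

```python
def combined_histogram(spatial_bins, thematic_bins, num_spatial, num_thematic):
    """Histogram over the Cartesian product of spatial and thematic bins.

    spatial_bins[k] and thematic_bins[k] are the bin indices of the k-th point;
    cell (s, t) is flattened to index s * num_thematic + t.
    """
    hist = [0] * (num_spatial * num_thematic)
    for s, t in zip(spatial_bins, thematic_bins):
        hist[s * num_thematic + t] += 1
    return hist


# Hypothetical example: 3 shells x 2 thematic categories -> 6 bins.
print(combined_histogram([0, 0, 1, 2, 2], [0, 1, 1, 0, 0], 3, 2))  # [1, 1, 0, 1, 2, 0]
```

The dimensionality multiplies (here 3 × 2 = 6), which is why the dimensionality reduction discussed under query processing becomes important.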
34
Query Processing
- For quadratic form distance functions, the time to evaluate a single database object grows quadratically with the dimension: an N-dimensional histogram requires on the order of N² multiplications.
35
Optimal Multistep k-Nearest Neighbor Search
- To achieve good performance, the paradigm of multistep query processing is used.
- An index-based filter step produces a set of candidates.
- A refinement step performs the expensive exact evaluation of the candidates.
- The filter step is responsible for completeness (no false dismissals), the refinement step for correctness (no false hits).
37
Optimal Multistep k-Nearest Neighbor Search
- Based on a multidimensional index structure, the filter step performs an incremental ranking: objects are reported in order of increasing filter distance to the query.
- To guarantee that the filter step causes no false dismissals, the filter distance must lower-bound the object distance: $d_f(p,q) \le d_o(p,q)$, where $d_f$ is the filter distance and $d_o$ is the object distance.
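A sketch of a multistep k-NN loop in the spirit of these slides: candidates are consumed in increasing filter distance, each is refined with the exact distance, and the search stops once the next filter distance exceeds the current k-th exact distance (valid because $d_f \le d_o$). A plain heap over all filter distances stands in for the index-based incremental ranking, and the first-coordinate filter is a hypothetical lower bound of the Euclidean distance; `multistep_knn` is a hypothetical name.

```python
import heapq
import math


def multistep_knn(query, objects, k, filter_dist, object_dist):
    """Multistep k-nearest-neighbor search with a lower-bounding filter distance.

    Requires filter_dist(q, o) <= object_dist(q, o) for every object o.
    Returns (sorted list of (distance, index), number of exact evaluations).
    """
    # Stand-in for the index-based incremental ranking: a heap ordered by filter distance.
    ranking = [(filter_dist(query, o), i) for i, o in enumerate(objects)]
    heapq.heapify(ranking)
    best = []  # max-heap via negated distances, holding the k best exact results so far
    refinements = 0
    while ranking:
        fd, i = heapq.heappop(ranking)
        # Stop once the next filter distance exceeds the current k-th exact distance:
        # every remaining object has exact distance >= its filter distance >= fd.
        if len(best) == k and fd > -best[0][0]:
            break
        d = object_dist(query, objects[i])  # expensive refinement step
        refinements += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in best), refinements


def first_coordinate_distance(q, o):
    """Hypothetical filter: |q_0 - o_0| never exceeds the Euclidean distance."""
    return abs(q[0] - o[0])


objects = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5), (10, 0, 0), (0.5, 0.5, 0.5)]
neighbors, refinements = multistep_knn((0, 0, 0), objects, 2,
                                       first_coordinate_distance, math.dist)
print([i for _, i in neighbors], refinements)  # [0, 5] 3
```

Only 3 of the 6 objects need the exact distance here; the tighter the filter bound, the fewer refinements are needed.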
38
Reduction in Dimensionality of Quadratic Forms
- Objects in high-dimensional spaces are managed by reducing their dimensionality.
- Typical techniques are Principal Component Analysis, the Discrete Fourier Transform, similarity matrix decomposition, feature subselection, etc.
- These approaches can also be used for quadratic form distance functions.
39
Reduction in Dimensionality of Quadratic Forms
- An algorithm to reduce the similarity matrix from a high-dimensional space down to a low-dimensional space was developed in the context of multimedia databases.
- The method guarantees three properties:
  - the reduced distance function is a lower bound of the given high-dimensional distance function;
  - the reduced distance function is again a quadratic form;
  - the reduced distance function is the greatest of all lower-bounding distance functions in the reduced space.
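One construction with all three properties, under the simplifying assumption that the reduction keeps the first r coordinates of the difference vector: eliminate each dropped coordinate by minimizing the quadratic form over it, which yields a Schur complement. It is a quadratic form again, a lower bound (a minimum never exceeds the original value), and the greatest lower bound (the minimum is attained). This is an illustrative sketch with hypothetical names, not claimed to be the published algorithm; A is assumed symmetric positive definite.

```python
def drop_last_coordinate(A):
    """Eliminate the last coordinate of a quadratic form x^T A x.

    Returns the Schur complement A11 - a12 a21 / a22, whose quadratic form equals
    the minimum of x^T A x over the dropped coordinate. Assumes A is symmetric
    positive definite (so the pivot a22 is positive).
    """
    n = len(A)
    pivot = A[n - 1][n - 1]
    return [[A[i][j] - A[i][n - 1] * A[n - 1][j] / pivot for j in range(n - 1)]
            for i in range(n - 1)]


def reduce_quadratic_form(A, r):
    """Reduced matrix on the first r coordinates (eliminate the others one by one)."""
    while len(A) > r:
        A = drop_last_coordinate(A)
    return A


def quadratic_form(x, A):
    return sum(x[i] * A[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


A = [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]
A_reduced = reduce_quadratic_form(A, 2)
print(A_reduced)  # approximately [[2, 1], [1, 4/3]]
x = [1.0, -0.5, 0.3, 2.0]  # a difference vector p - q
print(quadratic_form(x[:2], A_reduced) <= quadratic_form(x, A))  # True
```

Taking square roots preserves the inequality, so the reduced distance can serve directly as a filter distance $d_f$ for the multistep search.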
40
Experimental Evaluation
- Data is taken from the Brookhaven Protein Data Bank.
- Molecules are represented by surface points for the computation of shape histograms.
- The reduced feature vectors for the filter step are managed by an X-tree of dimension 10.
41
Experimental Evaluation
- Similarity matrices are computed by an adapted formula from …, where the similarity weights $a_{ij}$ of bins i and j are defined as $a_{ij} = e^{-\sigma \cdot d(i,j)}$ with σ = 10.
42
Basic Similarity Search
43
Classification by Shape Similarity
- After preprocessing, 3,422 proteins have been classified into 281 classes; every class contains at least two molecules.
- Three models have been considered: the pure shell model, the pure sector model, and the combined model.
- The combined model achieves the best classification accuracy.
44
Classification by Shape Similarity