Published by Lizbeth Glenn. Modified over 6 years ago.
1
Christian Böhm, Bernhard Braunmüller, Florian Krebs, and Hans-Peter Kriegel, University of Munich
Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data
2
Feature Based Similarity
3
Simple Similarity Queries
Specify a query object and
Find similar objects – range query
Find the k most similar objects – nearest neighbor query
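The two query types can be illustrated with a minimal brute-force sketch. It assumes Euclidean distance on feature vectors; the function names `range_query` and `knn_query` are hypothetical and not taken from the slides.

```python
import heapq
import math

def range_query(points, q, eps):
    """Return all points within Euclidean distance eps of the query object q."""
    return [p for p in points if math.dist(p, q) <= eps]

def knn_query(points, q, k):
    """Return the k points closest to the query object q."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
q = (0.0, 0.0)
in_range = range_query(points, q, 1.5)
nearest = knn_query(points, q, 3)
```

Both functions scan every point, which is the baseline an index structure tries to avoid.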
4
Join Applications: Catalogue Matching
E.g. astronomical catalogues R and S
5
Join Applications: Clustering
Clustering (e.g. DBSCAN)
Similarity self-join
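As a reference point, the similarity self-join can be written as a nested loop over all pairs. This is only a naive baseline sketch assuming Euclidean distance, not the algorithm of the talk; `similarity_self_join` is a hypothetical name.

```python
import math
from itertools import combinations

def similarity_self_join(points, eps):
    """Return index pairs (i, j), i < j, whose points lie within distance eps."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= eps
    ]

points = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0), (2.0, 2.4)]
pairs = similarity_self_join(points, 0.6)
```

The quadratic number of distance computations is what grid-based and order-based join methods aim to reduce.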
6
Grid partitioning
General idea: grid approximation where grid line distance = ε
Similar idea in the ε-kdB-tree [Shim, Srikant, Agrawal: High-dimensional Similarity Joins, ICDE 1997]
Disadvantage of any grid approach: number of neighboring grid cells: 3^d − 1
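The neighbor count can be checked with a short sketch that enumerates, for a cell in a d-dimensional grid, every cell whose coordinates differ by at most one step in each dimension. The helper name `neighbor_cells` is hypothetical.

```python
from itertools import product

def neighbor_cells(cell):
    """Enumerate all grid cells adjacent to `cell`, excluding the cell itself."""
    return [
        tuple(c + o for c, o in zip(cell, offsets))
        for offsets in product((-1, 0, 1), repeat=len(cell))
        if any(offsets)  # skip the all-zero offset, which is the cell itself
    ]

counts = {d: len(neighbor_cells((0,) * d)) for d in (2, 3, 8)}
```

The count grows exponentially with the dimension, so visiting every neighboring cell quickly becomes impractical for high-dimensional data.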
7
Scalability of the ε-kdB-tree
Assumption: two adjacent ε-stripes fit in main memory
Unrealistic for large data sets which are ... clustered, skewed and high-dimensional
8
Epsilon Grid Order
9
ε-Grid-Order Is a Total Strict Order
Irreflexivity
Transitivity
Asymmetry
ε-grid-order can be used in any sorting algorithm
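The transcript does not preserve the definition shown on the slide. The sketch below assumes the definition from the published paper: a point p precedes q if, in the first dimension where their grid cells differ, p lies in the cell with the smaller index. This amounts to comparing the tuples of cell indices lexicographically. The names `ego_key` and `ego_less` are hypothetical.

```python
import math

def ego_key(p, eps):
    """Map a point to the tuple of its grid cell indices (cell width eps)."""
    return tuple(math.floor(x / eps) for x in p)

def ego_less(p, q, eps):
    """Assumed epsilon grid order: lexicographic comparison of cell indices."""
    return ego_key(p, eps) < ego_key(q, eps)

points = [(2.3, 0.1), (0.4, 1.7), (0.9, 0.2), (1.5, 1.5)]
eps = 1.0
# Because the order is a key comparison, any standard sort can apply it.
ordered = sorted(points, key=lambda p: ego_key(p, eps))
```

Under this assumed definition, two points in the same grid cell are not ordered relative to each other, so a sort may leave them in either sequence.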
10
ε-Interval
Coarse approximation of join mates: used for I/O processing
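One way to read the interval idea is sketched here, assuming the order compares grid cell indices lexicographically and that the interval of a point p runs from p shifted by −ε in every dimension to p shifted by +ε. Every join mate of p then falls inside that stretch of the sorted file, although the stretch may also contain non-mates. The names `ego_key` and `eps_interval` are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

def ego_key(p, eps):
    """Map a point to the tuple of its grid cell indices (cell width eps)."""
    return tuple(math.floor(x / eps) for x in p)

def eps_interval(sorted_points, p, eps):
    """Return the contiguous run of the sorted file that can contain join mates of p."""
    keys = [ego_key(x, eps) for x in sorted_points]
    lo = bisect_left(keys, ego_key([x - eps for x in p], eps))
    hi = bisect_right(keys, ego_key([x + eps for x in p], eps))
    return sorted_points[lo:hi]

eps = 1.0
points = [(0.2, 0.3), (0.9, 0.8), (1.1, 0.4), (2.5, 2.5), (3.7, 0.1)]
sorted_points = sorted(points, key=lambda x: ego_key(x, eps))
p = (0.9, 0.8)
candidates = eps_interval(sorted_points, p, eps)
mates = [x for x in points if math.dist(x, p) <= eps]
```

The candidates still need an exact distance test; the interval only bounds which part of the file has to be read.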
11
I/O Processing for the Self Join
Decompose the sorted file into I/O units
12
Epsilon Grid Order
13
CPU Processing
I/O units are further decomposed before joining
Simple divide-and-conquer: no further sorting
Decomposition: maximize active dimensions
14
CPU Processing
Point distance computations: order of dimensions
Neighboring inactive dimensions
Unspecified dimensions
Active dimension
Aligned inactive dimensions
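Why the order of dimensions matters can be shown with an early-abort distance test: the squared distance is accumulated one dimension at a time and the test stops once it exceeds ε². The sketch takes the order as a parameter, because the ranking of the dimension categories is not recoverable from the transcript. The name `within_eps` is hypothetical.

```python
def within_eps(p, q, eps, dim_order):
    """Test dist(p, q) <= eps, visiting dimensions in dim_order.

    Returns (result, number of dimensions evaluated before stopping).
    """
    limit = eps * eps
    total = 0.0
    evaluated = 0
    for d in dim_order:
        diff = p[d] - q[d]
        total += diff * diff
        evaluated += 1
        if total > limit:
            # Partial sum already exceeds eps^2, so the pair cannot match.
            return False, evaluated
    return True, evaluated

p = (0.0, 0.0, 0.0)
q = (0.1, 0.1, 5.0)
late = within_eps(p, q, 1.0, [0, 1, 2])   # distinguishing dimension visited last
early = within_eps(p, q, 1.0, [2, 0, 1])  # distinguishing dimension visited first
```

Visiting first the dimensions that are most likely to separate the two points lets non-matching pairs be rejected after fewer operations.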
15
Experimental Results
8-dimensional uniformly distributed vectors
16
Experimental Results (2)
16-d feature vectors from CAD application
17
Conclusions
Summary
High potential for performance gains of the similarity join by page capacity optimization
Necessary to separately optimize I/O and CPU
Future research potential
Similarity join for metric index structures
Approximate similarity join
Parallel similarity join algorithms