Christian Böhm, Bernhard Braunmüller, Florian Krebs, and Hans-Peter Kriegel, University of Munich. Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data


1 Christian Böhm, Bernhard Braunmüller, Florian Krebs, and Hans-Peter Kriegel, University of Munich
Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data

2 Feature Based Similarity

3 Simple Similarity Queries
Specify a query object and:
Find similar objects (range query)
Find the k most similar objects (nearest neighbor query)
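The two query types can be illustrated with a brute-force sketch. It assumes that objects are feature vectors compared by Euclidean distance; the function names `range_query` and `knn_query` are illustrative and do not come from the slides.

```python
import heapq
import math

def range_query(points, q, eps):
    """Return all points whose Euclidean distance to q is at most eps."""
    return [p for p in points if math.dist(p, q) <= eps]

def knn_query(points, q, k):
    """Return the k points closest to q (ties broken by input order)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
q = (0.0, 0.0)
print(range_query(points, q, 1.5))
print(knn_query(points, q, 3))
```

Both functions scan every point, which is exactly the cost that index structures and join algorithms try to avoid.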

4 Join Applications: Catalogue Matching
E.g. astronomical catalogues R and S
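As a baseline for what a similarity join between two catalogues R and S computes, here is a nested-loop sketch. It assumes Euclidean distance and a join threshold `eps`; it is the naive reference semantics, not the algorithm of the talk.

```python
import math

def similarity_join(r, s, eps):
    """Return all pairs (p, q) with p in r, q in s and dist(p, q) <= eps."""
    return [(p, q) for p in r for q in s if math.dist(p, q) <= eps]

r = [(0.0, 0.0), (10.0, 10.0)]
s = [(0.5, 0.0), (10.0, 10.2), (3.0, 3.0)]
print(similarity_join(r, s, 1.0))
```

The nested loop performs |R| * |S| distance computations, which motivates the grid-based pruning discussed on the following slides.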

5 Join Applications: Clustering
Clustering (e.g. DBSCAN) by means of a similarity self-join
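One way a self-join can feed a density-based clustering step is sketched below. The sketch assumes the usual DBSCAN convention that a point is a core point when its ε-neighborhood (including the point itself) holds at least `min_pts` points; the function names are illustrative.

```python
import math
from collections import Counter

def self_join(points, eps):
    """Return index pairs (i, j), i < j, of points within distance eps."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= eps]

def core_points(points, eps, min_pts):
    """Derive DBSCAN-style core points from the self-join result."""
    counts = Counter()
    for i, j in self_join(points, eps):
        counts[i] += 1
        counts[j] += 1
    # +1 because each point belongs to its own neighborhood.
    return [i for i in range(len(points)) if counts[i] + 1 >= min_pts]

points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
print(core_points(points, eps=1.0, min_pts=3))
```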

6 Grid partitioning
General idea: grid approximation where the grid line distance = ε
Similar idea in the ε-kdB-tree [Shim, Srikant, Agrawal: High-dimensional Similarity Joins, ICDE 1997]
Disadvantage of any grid approach: the number of neighboring grid cells is 3^d - 1
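The neighbor count can be checked by enumerating cell offsets: each of the d coordinates of a neighboring cell differs by -1, 0 or +1, and the all-zero offset is the cell itself. The helper name `neighbor_offsets` is illustrative.

```python
from itertools import product

def neighbor_offsets(d):
    """All offsets to adjacent cells of a d-dimensional grid cell."""
    return [off for off in product((-1, 0, 1), repeat=d) if any(off)]

for d in (2, 3, 8):
    print(d, len(neighbor_offsets(d)), 3 ** d - 1)
```

For d = 16 the formula already gives 43,046,720 neighboring cells, which is why explicitly visiting all neighbors does not scale to high dimensions.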

7 Scalability of the ε-kdB-tree
Assumption: two adjacent ε-stripes fit in main memory. This is unrealistic for large data sets which are ... clustered, skewed and high-dimensional.

8 Epsilon Grid Order

9 ε-Grid-Order Is a Total Strict Order
Irreflexivity, transitivity, asymmetry
ε-grid-order can be used in any sorting algorithm
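The slide defining the order carries no text in this transcript, so the following sketch rests on an assumption: a point p precedes q when the grid cell of p, with cell width ε, comes first in lexicographic order of the cell coordinates. The names `cell`, `ego_less` and `ego_sort` are illustrative.

```python
import math

def cell(p, eps):
    """Grid cell coordinates of point p for cell width eps."""
    return tuple(math.floor(x / eps) for x in p)

def ego_less(p, q, eps):
    """Assumed order: lexicographic comparison of the grid cells."""
    return cell(p, eps) < cell(q, eps)

def ego_sort(points, eps):
    """Any standard sort works once the cell is used as the sort key."""
    return sorted(points, key=lambda p: cell(p, eps))

points = [(2.5, 0.1), (0.2, 3.9), (0.9, 0.3)]
print(ego_sort(points, 1.0))
```

Under this assumption, irreflexivity, asymmetry and transitivity follow directly from the same properties of lexicographic tuple comparison.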

10 ε-Interval
Coarse approximation of join mates; used for I/O processing
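A possible reading of the ε-interval is sketched below, under two assumptions not stated in this transcript: the order is the lexicographic order of grid cells, and the interval of a point p runs from p shifted by -ε in every dimension to p shifted by +ε. Every join mate of p then lies inside this contiguous range of the sorted file, although the range may also contain non-mates.

```python
import math
from bisect import bisect_left, bisect_right

def cell(p, eps):
    """Grid cell coordinates of point p for cell width eps."""
    return tuple(math.floor(x / eps) for x in p)

def eps_interval(sorted_cells, p, eps):
    """Index range [lo, hi) of the sorted file that may hold join mates of p."""
    c = cell(p, eps)
    lower = tuple(ci - 1 for ci in c)   # cell of p shifted by -eps
    upper = tuple(ci + 1 for ci in c)   # cell of p shifted by +eps
    return bisect_left(sorted_cells, lower), bisect_right(sorted_cells, upper)

eps = 1.0
pts = sorted([(0.1, 0.1), (0.9, 0.2), (1.1, 0.1), (3.5, 3.5), (5.0, 0.0)],
             key=lambda p: cell(p, eps))
cells = [cell(p, eps) for p in pts]
print(eps_interval(cells, (0.9, 0.2), eps))
```

Because the interval is a range of positions in the sorted file, it can be mapped to a range of pages, which is what makes it usable for scheduling I/O.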

11 I/O Processing for the Self Join
Decompose the sorted file into I/O units

12 Epsilon Grid Order

13 CPU Processing
I/O units are further decomposed before joining
Simple divide-and-conquer: no further sorting
Decomposition: maximize active dimensions
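The divide-and-conquer idea can be sketched as a recursive join of two sequences that are already sorted by grid cell. This is a simplified illustration: the pruning test below compares only the boundary cells of the two sequences and stands in for the active-dimension criterion, whose details are not given in this transcript. All names and the `min_len` threshold are assumptions.

```python
import math

def cell(p, eps):
    return tuple(math.floor(x / eps) for x in p)

def one_up(c):
    return tuple(x + 1 for x in c)

def join_sequences(a, b, eps, out, min_len=4):
    """Join two cell-sorted point lists; append pairs within eps to out."""
    if not a or not b:
        return
    # Prune: all of b lies beyond the cells reachable from a, or vice versa.
    if cell(b[0], eps) > one_up(cell(a[-1], eps)):
        return
    if cell(a[0], eps) > one_up(cell(b[-1], eps)):
        return
    if len(a) <= min_len and len(b) <= min_len:
        out.extend((p, q) for p in a for q in b if math.dist(p, q) <= eps)
        return
    # Halve the longer sequence; halves stay sorted, so no re-sorting is needed.
    if len(a) >= len(b):
        mid = len(a) // 2
        join_sequences(a[:mid], b, eps, out, min_len)
        join_sequences(a[mid:], b, eps, out, min_len)
    else:
        mid = len(b) // 2
        join_sequences(a, b[:mid], eps, out, min_len)
        join_sequences(a, b[mid:], eps, out, min_len)

eps = 1.0
pts = [((i * 37) % 101 / 10.0, (i * 53) % 89 / 10.0) for i in range(60)]
pts.sort(key=lambda p: cell(p, eps))
result = []
join_sequences(pts, pts, eps, result)
print(len(result))
```

Called with the same list on both sides, the sketch returns every qualifying pair in both orientations as well as each point paired with itself; a production self-join would typically filter these.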

14 CPU Processing
Point distance computations: order of dimensions
Dimension categories: neighboring inactive dimensions, unspecified dimensions, active dimension, aligned inactive dimensions
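Why the order of dimensions matters can be seen in a distance test that stops as soon as the partial sum of squared differences exceeds ε². The sketch below takes the dimension order as a parameter; which category should come first is not recoverable from this transcript, so `dim_order` is a caller-supplied assumption.

```python
def within_eps(p, q, eps, dim_order=None):
    """True if the Euclidean distance of p and q is at most eps.

    Dimensions are visited in dim_order (default: natural order), and the
    loop aborts once the partial squared distance exceeds eps squared.
    """
    order = range(len(p)) if dim_order is None else dim_order
    limit = eps * eps
    total = 0.0
    for i in order:
        diff = p[i] - q[i]
        total += diff * diff
        if total > limit:
            return False
    return True

print(within_eps((0.0, 0.0), (3.0, 4.0), 5.0))
print(within_eps((0.0, 0.0), (3.0, 4.0), 4.9, dim_order=[1, 0]))
```

Visiting first the dimensions in which two points are most likely to differ by a large amount lets the loop abort earlier for non-matching pairs, while the result is the same for any order.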

15 Experimental Results
8-dimensional uniformly distributed vectors

16 Experimental Results (2)
16-d feature vectors from CAD application

17 Conclusions
Summary:
High potential for performance gains of the similarity join by page capacity optimization
Necessary to separately optimize I/O and CPU
Future research potential:
Similarity join for metric index structures
Approximate similarity join
Parallel similarity join algorithms

