1
Indexing Multidimensional Data
Rui Zhang, The University of Melbourne, Aug 2006
2
Outline
Backgrounds
  Multidimensional data and queries
Approaches
  Mapping based indexing: Z-curve, iDistance
  Hierarchical-tree based indexing: R-tree, k-d-tree, Quad-tree
  Compression based indexing: VA-file
3
Multidimensional Data
Spatial data (low-dimensionality)
  Geographic Information: Melbourne (37, 145). Which city is at (30, 140)?
  Computer Aided Design: width and height (40, 50). Is there any part that has a width of 40 and a height of 50?
Records with multiple attributes (medium-dimensionality)
  Employee (ID, age, score, salary, …)
  Is there any employee whose age is under 25, whose performance score is greater than 80, and whose salary is between 3000 and 5000?
Multimedia data (high-dimensionality)
  Multimedia features: color, shape, texture, e.g. color histograms of images
  Give me the most similar image to a given query image.
4
Multidimensional Queries
Point query
  Return the objects located at Q(x1, x2, …, xd). E.g. Q = (3.4, 6.6).
Window query
  Return all the objects enclosed or intersected by the hyper-rectangle W{[L1, U1], [L2, U2], …, [Ld, Ud]}. E.g. W = {[0,4],[2,5]}.
K-Nearest Neighbor Query (KNN Query)
  Return k objects whose distances to Q are no larger than any other object's distance to Q. E.g. 3NN of Q = (4,1).
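The three query types can be pinned down with a brute-force sketch that scans every object, with no index involved. The point set is a subset of the buildings used later in the slides, and the function names are illustrative rather than taken from the deck.

```python
import math

# Subset of the buildings from the later slides: name -> (x, y).
POINTS = {"A": (0.7, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
          "E": (6.6, 2.5), "F": (1.7, 3.8)}

def point_query(points, q):
    """Return the names of the objects located exactly at q."""
    return [name for name, p in points.items() if p == q]

def window_query(points, window):
    """Return the objects inside the hyper-rectangle.

    window is a list of (low, high) pairs, one per dimension.
    """
    return [name for name, p in points.items()
            if all(lo <= v <= hi for v, (lo, hi) in zip(p, window))]

def knn_query(points, q, k):
    """Return the k objects closest to q (Euclidean distance)."""
    return sorted(points, key=lambda name: math.dist(points[name], q))[:k]
```

On this subset, `window_query(POINTS, [(0, 4), (2, 5)])` selects C and F. Every indexing technique in the following slides aims to return the same answers as these scans while touching far fewer objects.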
5
Mapping Based Multidimensional Indexing
Name  x    y    Block  Height
A     0.7  1.2  2      100
B     5.8  1.2  19     50
C     2.7  2.3  12     80
D     5.5  2.4  25     90
E     6.6  2.5  28     40
F     1.7  3.8  11     120
G     2.8  4.7  36     -
H     0.6  5.8  34     -
I     1.6  6.7  41     60
J     3.4  6.6  45     -
Sorted by Block: A (2), F (11), C (12), B (19), D (25), E (28), H (34), G (36), I (41), J (45)
Story
  The CBD: [0,4] × [2,5]
  Blocks in the CBD are: [8,15], [32,33] and [36,37]
General strategy: three steps
  1. Data mapping and indexing
  2. Query mapping and data retrieval
  3. Filtering out false positives
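The three-step strategy can be sketched in a few lines. This is a minimal illustration under two assumptions: a sorted Python list stands in for the B+-tree, and only the rows whose coordinates are fully legible in the source are loaded. The block ranges for the CBD are the ones given in the slide.

```python
import bisect

# (block, name, x, y) rows from the slide's table, limited to the rows whose
# coordinates are fully legible in the source.
ROWS = [(2, "A", 0.7, 1.2), (12, "C", 2.7, 2.3), (25, "D", 5.5, 2.4),
        (28, "E", 6.6, 2.5), (11, "F", 1.7, 3.8), (36, "G", 2.8, 4.7),
        (41, "I", 1.6, 6.7)]

# Step 1: data mapping and indexing (a sorted list stands in for a B+-tree).
INDEX = sorted(ROWS)
KEYS = [row[0] for row in INDEX]

def window_query(block_ranges, window):
    (x_lo, x_hi), (y_lo, y_hi) = window
    hits = []
    for lo, hi in block_ranges:
        # Step 2: query mapping and retrieval, one range scan per block interval.
        start = bisect.bisect_left(KEYS, lo)
        stop = bisect.bisect_right(KEYS, hi)
        for _, name, x, y in INDEX[start:stop]:
            # Step 3: filter out false positives with the real coordinates.
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                hits.append(name)
    return hits

CBD_BLOCKS = [(8, 15), (32, 33), (36, 37)]
CBD_WINDOW = ((0, 4), (2, 5))
print(window_query(CBD_BLOCKS, CBD_WINDOW))  # ['F', 'C', 'G']
```

In this example every retrieved object also passes the filter. The filtering step still matters in general, because a block range can cover space that lies outside the query window.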
6
The Z-curve and Other Space-Filling Curves
Z-value calculation: bit-interleaving
Supports efficient window queries
Disadvantage: jumps
Other space-filling curves: Hilbert curve, Gray-code curve, column-wise scan
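Bit-interleaving can be sketched as follows. The grid resolution (3 bits per dimension, unit-sized cells) and the bit order (y bits in the more significant position of each pair) are assumptions, chosen because they reproduce the block numbers in the building table of the mapping slide, e.g. A at (0.7, 1.2) falls in cell (0, 1) and maps to block 2.

```python
def z_value(x_cell, y_cell, bits=3):
    """Interleave the bits of two cell numbers into one Z-value.

    Assumption: in each bit pair the y bit is the more significant one.
    """
    z = 0
    for i in range(bits):
        z |= ((x_cell >> i) & 1) << (2 * i)        # x bit goes to an even position
        z |= ((y_cell >> i) & 1) << (2 * i + 1)    # y bit goes to an odd position
    return z

def block_of(x, y):
    """Map a point to its block, assuming unit-sized grid cells."""
    return z_value(int(x), int(y))
```

For instance, `block_of(1.7, 3.8)` interleaves x = 001 and y = 011 into 001011, which is 11, the block listed for building F.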
7
Mapping for KNN Queries
Name  x    y    Street  Height
A     0.7  1.2  14      100
B     5.8  1.2  32      50
C     2.7  2.3  12      80
D     5.5  2.4  31      90
E     6.6  2.5  -       40
F     1.7  3.8  13      120
G     2.8  4.7  24      -
H     0.6  5.8  23      -
I     1.6  6.7  22      60
J     3.4  6.6  -       -
Sorted by Street: C, F, A, I, H, G, J, D, B, E
[Figure: streets drawn as numbered concentric circles, with the search circle around Q growing through R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10]
Story continued
  New factory at Q[4,1]
  Find 3 nearest buildings to Q
Termination condition
  K candidates
  All in the current search circle
Distances to Q: ||AQ|| = 3.31, ||BQ|| = 1.81, ||CQ|| = 1.84, ||DQ|| = 2.05, ||EQ|| = 3.00, ||FQ|| = 3.62
Candidate list (rank 1 to 3) as the search circle grows:
  A (3.31)
  A (3.31), F (3.62)
  B (1.81), A (3.31), F (3.62)
  B (1.81), E (3.00), A (3.31)
  B (1.81), C (1.84), E (3.00)
  B (1.81), C (1.84), D (2.05)
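The distances and the final answer on this slide can be checked with a few lines of arithmetic. This is a sketch, not the index-based search itself: it ranks the buildings by brute force and then finds the first search radius, growing in steps of 0.35, that satisfies the termination condition. B's y-coordinate is not legible in the source; the value 1.2 is an assumption that reproduces ||BQ|| = 1.81.

```python
import math

Q = (4.0, 1.0)
STEP = 0.35  # radius increment between consecutive search circles

# B's y-coordinate is an assumption (see the lead-in); the rest are from the table.
BUILDINGS = {"A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3),
             "D": (5.5, 2.4), "E": (6.6, 2.5), "F": (1.7, 3.8)}

def distances(points, q):
    """Euclidean distance from every point to q."""
    return {name: math.dist(p, q) for name, p in points.items()}

def knn_with_radius(points, q, k=3, step=STEP):
    """Return the k nearest names and the number of radius steps needed
    before all k of them lie inside the search circle."""
    ranked = sorted(distances(points, q).items(), key=lambda kv: kv[1])[:k]
    steps = math.ceil(ranked[-1][1] / step)
    return [name for name, _ in ranked], steps

names, steps = knn_with_radius(BUILDINGS, Q)
print(names, steps)  # ['B', 'C', 'D'] 6
```

Six steps of 0.35 give R = 2.10, the first radius that covers the third candidate D at distance 2.05, which matches the last search circle on the slide.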
8
The iDistance
Data partitioned into a number of clusters
Streets are concentric circles
Data mapping: objects mapped to street numbers
Query mapping: search circle mapped to streets intersected
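A minimal sketch of this mapping, following the key formula from the iDistance paper: an object in partition i with reference point O_i gets the one-dimensional key i * c + dist(object, O_i). The reference points, the constant `c`, and the radius step below are hypothetical choices for illustration, a sorted list stands in for the B+-tree, and B's y-coordinate (1.2) is an assumption because it is not legible in the source.

```python
import bisect
import math

def build(points, refs, c):
    """Data mapping: key = i * c + distance to the nearest reference point i."""
    entries, max_r = [], [0.0] * len(refs)
    for name, p in points.items():
        i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
        d = math.dist(p, refs[i])
        max_r[i] = max(max_r[i], d)      # radius of partition i
        entries.append((i * c + d, name))
    entries.sort()                       # sorted list stands in for a B+-tree
    return entries, max_r

def streets_hit(entries, max_r, refs, c, q, r):
    """Query mapping: one key range per partition the search circle intersects."""
    keys = [key for key, _ in entries]
    names = []
    for i, ref in enumerate(refs):
        dq = math.dist(q, ref)
        if dq - r > max_r[i]:
            continue                     # circle does not reach this partition
        lo = i * c + max(0.0, dq - r)
        hi = i * c + min(max_r[i], dq + r)
        start = bisect.bisect_left(keys, lo)
        stop = bisect.bisect_right(keys, hi)
        names.extend(name for _, name in entries[start:stop])
    return names

def knn(points, refs, c, q, k, step, max_steps=100):
    """Grow the search circle until k candidates all lie inside it."""
    entries, max_r = build(points, refs, c)
    for n in range(1, max_steps + 1):
        r = n * step
        found = streets_hit(entries, max_r, refs, c, q, r)
        best = sorted(found, key=lambda name: math.dist(points[name], q))[:k]
        if len(best) == k and math.dist(points[best[-1]], q) <= r:
            return best, r
    return None, None

# Hypothetical reference points and constant; B's y-coordinate is an assumption.
REFS = [(1.5, 2.5), (2.0, 6.0), (6.0, 2.0)]
POINTS = {"A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
          "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "I": (1.6, 6.7)}
print(knn(POINTS, REFS, c=10.0, q=(4.0, 1.0), k=3, step=0.35)[0])  # ['B', 'C', 'D']
```

The termination test is safe because of the triangle inequality: any object within distance r of the query has a distance to its reference point within r of the query's distance to that reference point, so it falls inside the scanned key range.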
9
Hierarchical Tree Structures
R-tree
  Minimum bounding rectangle (MBR)
  Incomplete and overlapping partitioning
  Disk-based; Balanced
  Problem: Overlap
K-d-tree
  Space division recursively
  Complete and disjoint partitioning
  In-memory; Unbalanced
  There are algorithms to page and balance the tree, but with more complex manipulations
  Problem: Empty space
[Figure: an example R-tree and an example k-d-tree built over objects A to G]
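The recursive space division of the k-d-tree can be sketched as below. This is a minimal in-memory version with no balancing, so the tree shape depends on insertion order, as the slide notes. The class and function names are illustrative, and the sample points are buildings from the earlier slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    point: Tuple[float, ...]
    left: Optional["Node"] = None    # points with a smaller value on the split axis
    right: Optional["Node"] = None   # points with an equal or larger value

def insert(node, point, depth=0):
    """Insert a point; the split axis cycles with depth (x, y, x, ...)."""
    if node is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node

def window_query(node, low, high, depth=0, out=None):
    """Collect points p with low[i] <= p[i] <= high[i] in every dimension."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l <= v <= h for v, l, h in zip(node.point, low, high)):
        out.append(node.point)
    axis = depth % len(low)
    if low[axis] < node.point[axis]:      # window reaches into the left half
        window_query(node.left, low, high, depth + 1, out)
    if high[axis] >= node.point[axis]:    # window reaches into the right half
        window_query(node.right, low, high, depth + 1, out)
    return out

root = None
for p in [(2.7, 2.3), (0.7, 1.2), (5.5, 2.4), (6.6, 2.5),
          (1.7, 3.8), (2.8, 4.7), (1.6, 6.7)]:
    root = insert(root, p)
print(sorted(window_query(root, (0, 2), (4, 5))))
```

Because the partitioning is complete and disjoint, each half-space is visited only when the window overlaps it. An R-tree search may instead have to follow several overlapping MBRs.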
10
Hierarchical Tree Structures (continued)
Quad-tree
  Space divided into 4 rectangles recursively
  Complete and disjoint partitioning
  In-memory; Unbalanced
  There are algorithms to page and balance the tree, but with more complex manipulations
The point quad-tree
[Figure: a point quad-tree over objects A to G, with NW, NE, SW and SE children]
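A point quad-tree can be sketched as follows: each stored point splits its region into four quadrants, and a new point descends into the quadrant it falls in until it finds an empty slot. The names and the tie-breaking rule (points on a dividing line go north or east) are assumptions for illustration.

```python
class QuadNode:
    def __init__(self, x, y, name):
        self.x, self.y, self.name = x, y, name
        self.children = {}  # quadrant label ("NW", "NE", "SW", "SE") -> QuadNode

def quadrant(node, x, y):
    """Quadrant of (x, y) relative to the point stored in node."""
    ns = "N" if y >= node.y else "S"
    ew = "E" if x >= node.x else "W"
    return ns + ew

def insert(root, x, y, name):
    """Insert a point and return the (possibly new) root."""
    if root is None:
        return QuadNode(x, y, name)
    node = root
    while True:
        q = quadrant(node, x, y)
        if q not in node.children:
            node.children[q] = QuadNode(x, y, name)
            return root
        node = node.children[q]

def find(root, x, y):
    """Point query: return the name stored at (x, y), or None."""
    node = root
    while node is not None:
        if (node.x, node.y) == (x, y):
            return node.name
        node = node.children.get(quadrant(node, x, y))
    return None

root = None
for name, (x, y) in [("C", (2.7, 2.3)), ("F", (1.7, 3.8)), ("D", (5.5, 2.4)),
                     ("A", (0.7, 1.2)), ("E", (6.6, 2.5))]:
    root = insert(root, x, y, name)
```

With this insertion order, F lands in C's NW quadrant, A in its SW quadrant, and E ends up two levels down (NE of D, which is NE of C). That shows how the tree depth depends on the data and the insertion order rather than being balanced.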
11
Compression Based Indexing
The dimensionality curse
The Vector Approximation File (VA-File)
VA-File and skewed data
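The idea behind the VA-File can be sketched as follows: each vector is compressed to a few bits per dimension (its grid cell), a scan over these small approximations yields a lower bound on each vector's distance to the query, and only vectors whose bound is promising are fetched in full. This is a simplified nearest-neighbor sketch under assumed parameters (data in [0, 1], 2 bits per dimension), not the exact algorithm from the VA-File paper.

```python
import math

def approximate(vec, bits, lo=0.0, hi=1.0):
    """Quantize each coordinate to one of 2**bits equal-width cells."""
    cells = 1 << bits
    width = (hi - lo) / cells
    return tuple(min(cells - 1, int((v - lo) / width)) for v in vec)

def lower_bound(q, approx, bits, lo=0.0, hi=1.0):
    """Smallest possible distance from q to any vector inside the cell."""
    width = (hi - lo) / (1 << bits)
    total = 0.0
    for qv, cell in zip(q, approx):
        c_lo = lo + cell * width
        c_hi = c_lo + width
        if qv < c_lo:
            total += (c_lo - qv) ** 2
        elif qv > c_hi:
            total += (qv - c_hi) ** 2
    return math.sqrt(total)

def nearest(vectors, q, bits=2):
    """Return (index of nearest vector, number of full vectors fetched)."""
    approxs = [approximate(v, bits) for v in vectors]   # the approximation file
    bounds = [lower_bound(q, a, bits) for a in approxs]
    order = sorted(range(len(vectors)), key=lambda i: bounds[i])
    best, best_d, fetched = None, math.inf, 0
    for i in order:
        if bounds[i] >= best_d:
            break                     # no remaining vector can be closer
        fetched += 1                  # access the full vector
        d = math.dist(vectors[i], q)
        if d < best_d:
            best, best_d = i, d
    return best, fetched

VECTORS = [(0.1, 0.1, 0.1, 0.1), (0.9, 0.9, 0.9, 0.9),
           (0.5, 0.5, 0.5, 0.5), (0.12, 0.2, 0.1, 0.15)]
print(nearest(VECTORS, (0.1, 0.2, 0.1, 0.1)))
```

When the data is skewed, many vectors fall into the same few cells, so their approximations and lower bounds are nearly identical and the filter prunes little. That is consistent with the weakness noted for the VA-File in the summary slide.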
12
Summary of the Indexing Techniques
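Columns: Disk-based / In-memory; Balanced; Efficient query type; Dimensionality; Comments
R-tree: Disk-based; Balanced: Yes; Efficient query type: Point, window, kNN; Dimensionality: Low; Comments: Disadvantage is overlap
K-d-tree: In-memory; Balanced: No; Efficient query type: Point, window, kNN(?); Comments: Inefficient for skewed data
Quad-tree: In-memory; Balanced: No
Z-curve + B+-tree: Efficient query type: Point, window; Comments: Order of the Z-curve affects performance
iDistance: Efficient query type: Point, kNN; Dimensionality: High; Comments: Not good for uniform data in very high-D
VA-File: Comments: Not good for skewed data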
13
Index Implementations in major DBMS
SQL Server
  B+-Tree data structure
  Clustered indexes are sparse
  Indexes maintained as updates/insertions/deletes are performed
Oracle
  B+-tree, hash, bitmap, spatial extender for R-Tree
  Clustered index: index organized table (unique/clustered)
  Clusters used when creating tables
DB2
  B+-Tree data structure, spatial extender for R-tree
  Clustered indexes are dense
  Explicit command for index reorganization
14
Recommended Readings and References
Survey on multidimensional indexing techniques
  Christian Böhm, Stefan Berchtold, Daniel A. Keim. Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys, 2001.
  Volker Gaede, Oliver Günther. Multidimensional Access Methods. ACM Computing Surveys, 1998.
Mapping based indexing
  Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005.
Space-filling curves
  H. V. Jagadish. Linear Clustering of Objects with Multiple Attributes. ACM SIGMOD Conference (SIGMOD), 1990.
iDistance
  H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005.
R-tree
  Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. ACM SIGMOD Conference (SIGMOD), 1984.
Quad-tree
  Hanan Samet. The Quadtree and Related Hierarchical Data Structures. ACM Computing Surveys, 1984.
VA-File
  Roger Weber, Hans-Jörg Schek, Stephen Blott. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. International Conference on Very Large Data Bases (VLDB), 1998.