Indexing Multidimensional Data


Indexing Multidimensional Data Rui Zhang http://www.csse.unimelb.edu.au/~rui The University of Melbourne Aug 2006

Outline
- Backgrounds
  - Multidimensional data and queries
- Approaches
  - Mapping based indexing: Z-curve, iDistance
  - Hierarchical-tree based indexing: R-tree, k-d-tree, Quad-tree
  - Compression based indexing: VA-file

Multidimensional Data
- Spatial data (low dimensionality)
  - Geographic information: Melbourne (37, 145). Which city is at (30, 140)?
  - Computer-aided design: width and height (40, 50). Is there any part that has a width of 40 and a height of 50?
- Records with multiple attributes (medium dimensionality)
  - Employee (ID, age, score, salary, …). Is there any employee whose age is under 25, whose performance score is greater than 80, and whose salary is between 3000 and 5000?
- Multimedia data (high dimensionality)
  - Color histograms of images. Give me the most similar image to a given query image.
  - Multimedia features: color, shape, texture.

Multidimensional Queries
- Point query: return the objects located at Q(x1, x2, …, xd). E.g. Q = (3.4, 6.6).
- Window query: return all the objects enclosed or intersected by the hyper-rectangle W{[L1, U1], [L2, U2], …, [Ld, Ud]}. E.g. W = {[0,4], [2,5]}.
- K-nearest-neighbor query (KNN query): return k objects whose distances to Q are no larger than any other object's distance to Q. E.g. the 3NN of Q = (4, 1).
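As a reference point for the three query types, here is a minimal brute-force sketch in Python. It scans every object, so it only defines the expected answers that an index should reproduce more cheaply. The building coordinates are taken from the running example on the later slides; a few values were lost in this transcript and are reconstructed here, so treat the data set as an assumption. The function names are illustrative.

```python
import math

# Running example (name -> (x, y)); partly reconstructed, see the note above.
POINTS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}

def point_query(points, q):
    """Names of the objects located exactly at q."""
    return sorted(name for name, p in points.items() if p == q)

def window_query(points, window):
    """Names of the objects inside the hyper-rectangle [(L1, U1), (L2, U2), ...]."""
    return sorted(
        name for name, p in points.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(p, window))
    )

def knn_query(points, q, k):
    """Names of the k objects closest to q (Euclidean distance), nearest first."""
    return sorted(points, key=lambda name: math.dist(points[name], q))[:k]

print(point_query(POINTS, (3.4, 6.6)))
print(window_query(POINTS, [(0, 4), (2, 5)]))
print(knn_query(POINTS, (4, 1), 3))
```

Every indexing technique in the rest of the talk can be checked against these scans: it should return the same objects while touching fewer of them.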

Mapping Based Multidimensional Indexing

Name  x    y    Block  Height
A     0.7  1.2   2     100
B     5.8  1.2  19      50
C     2.7  2.3  12      80
D     5.5  2.4  25      90
E     6.6  2.5  28      40
F     1.7  3.8  11     120
G     2.8  4.7  36
H     0.6  5.8  34      50
I     1.6  6.7  41      60
J     3.4  6.6  45      40

Sorting the rows by Block gives the order A, F, C, B, D, E, H, G, I, J.

Story
- The CBD: [0,4][2,5]
- Blocks in the CBD are: [8,15], [32,33] and [36,37]

General strategy: three steps
- Data mapping and indexing
- Query mapping and data retrieval
- Filtering out false positives
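The three steps can be sketched as follows. This is a minimal illustration, not the slides' implementation: a sorted Python list searched with `bisect` stands in for the one-dimensional index (a B+-tree in practice), the block numbers and the three block intervals for the CBD are copied from the slide, and coordinates missing from this transcript are reconstructed and should be treated as assumptions.

```python
import bisect

# Step 1: data mapping and indexing. Each building already carries its
# one-dimensional key (the block number); the rows are sorted by that key.
ROWS = sorted([
    (2, "A", 0.7, 1.2), (19, "B", 5.8, 1.2), (12, "C", 2.7, 2.3),
    (25, "D", 5.5, 2.4), (28, "E", 6.6, 2.5), (11, "F", 1.7, 3.8),
    (36, "G", 2.8, 4.7), (34, "H", 0.6, 5.8), (41, "I", 1.6, 6.7),
    (45, "J", 3.4, 6.6),
])
BLOCKS = [row[0] for row in ROWS]

def window_query(intervals, window):
    """Answer a window query given the block intervals that cover the window."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    hits = []
    for lo, hi in intervals:
        # Step 2: query mapping and data retrieval, one range scan per interval.
        start = bisect.bisect_left(BLOCKS, lo)
        end = bisect.bisect_right(BLOCKS, hi)
        for _block, name, x, y in ROWS[start:end]:
            # Step 3: filter out false positives using the real coordinates.
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                hits.append(name)
    return hits

cbd_intervals = [(8, 15), (32, 33), (36, 37)]   # from the slide
print(window_query(cbd_intervals, [(0, 4), (2, 5)]))
```

In this small example every retrieved candidate happens to lie inside the window. In general a block can straddle the window boundary, which is why the filtering step is needed.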

The Z-curve and Other Space-Filling Curves
- Z-value calculation: bit-interleaving
- Supports efficient window queries
- Disadvantage: jumps
- Other space-filling curves: Hilbert curves, Gray-code, column-wise scan
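Bit-interleaving can be sketched in a few lines of Python. The conventions here are assumptions that the slides do not state explicitly: unit grid cells, 3 bits per dimension, and the y bit placed above the x bit in each pair. They were chosen because they reproduce the block numbers in the example table. The helper names are illustrative.

```python
def z_value(col, row, bits=3):
    """Interleave the bits of a cell's column (x) and row (y) into one Z-value."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)        # x bit goes to the even position
        z |= ((row >> i) & 1) << (2 * i + 1)    # y bit goes to the odd position
    return z

def block_of(x, y, bits=3):
    """Z-value of the unit grid cell that contains the point (x, y)."""
    return z_value(int(x), int(y), bits)

def window_to_intervals(cols, rows, bits=3):
    """Z-values of all cells in a window, merged into runs of consecutive values."""
    zs = sorted(z_value(c, r, bits) for c in cols for r in rows)
    intervals = []
    for z in zs:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z                # extend the current run
        else:
            intervals.append([z, z])            # a jump in the curve starts a new run
    return [tuple(iv) for iv in intervals]

print(block_of(0.7, 1.2), block_of(3.4, 6.6))
print(window_to_intervals(range(0, 4), range(2, 5)))
```

The jumps named as a disadvantage are visible in the second call: one rectangular window breaks into several separate Z-value intervals, and each interval costs its own range scan.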

Mapping for KNN Queries

Name  x    y    Street  Height
A     0.7  1.2  14      100
B     5.8  1.2  32       50
C     2.7  2.3  12       80
D     5.5  2.4  31       90
E     6.6  2.5  32       40
F     1.7  3.8  13      120
G     2.8  4.7  24
H     0.6  5.8  23       50
I     1.6  6.7  22       60
J     3.4  6.6  24       40

Sorting the rows by Street gives the order C, F, A, I, H, G, J, D, B, E.

Story continued
- New factory at Q = (4, 1)
- Find the 3 nearest buildings to Q

Search
- The search circle around Q grows in steps: R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10
- Distances to Q: ||BQ|| = 1.81, ||CQ|| = 1.84, ||DQ|| = 2.05, ||EQ|| = 3.00, ||AQ|| = 3.31, ||FQ|| = 3.62
- The candidate list (rank 1 to 3) improves as the circle grows:
  - A (3.31)
  - A (3.31), F (3.62)
  - B (1.81), A (3.31), F (3.62)
  - B (1.81), E (3.00), A (3.31)
  - B (1.81), C (1.84), E (3.00)
  - B (1.81), C (1.84), D (2.05)

Termination condition
- There are K candidates
- All of them are in the current search circle

The iDistance
- Data partitioned into a number of clusters
- Streets are concentric circles
- Data mapping: objects mapped to street numbers
- Query mapping: search circle mapped to the streets it intersects
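The mapping and the expanding search can be sketched as below. This is a simplified sketch under stated assumptions, not the published algorithm in full. The three reference points (cluster centres) and the constant `c` that separates the key ranges of the clusters are hypothetical. A sorted list searched with `bisect` stands in for the B+-tree. Each round rescans its key ranges instead of extending the previous scan. Building coordinates are reconstructed from the example tables.

```python
import bisect
import math

POINTS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}
REFS = [(2.0, 2.5), (2.0, 6.0), (6.0, 2.0)]   # hypothetical cluster centres

def build(points, refs, c=100.0):
    """Data mapping: key = i * c + distance to the nearest reference point i."""
    index, max_dist = [], [0.0] * len(refs)
    for name, p in points.items():
        i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
        d = math.dist(p, refs[i])
        max_dist[i] = max(max_dist[i], d)
        index.append((i * c + d, name))
    index.sort()
    return index, max_dist

def knn(q, k, points, refs, index, max_dist, c=100.0, step=0.35):
    """Grow the search circle until k candidates all lie inside it."""
    keys = [key for key, _ in index]
    r = step
    while True:
        names = set()
        for i, ref in enumerate(refs):
            dq = math.dist(q, ref)
            if dq - r > max_dist[i]:
                continue                       # circle does not reach this cluster
            # Query mapping: the circle intersects the rings in [dq - r, dq + r].
            lo = i * c + max(0.0, dq - r)
            hi = i * c + min(max_dist[i], dq + r)
            start = bisect.bisect_left(keys, lo)
            end = bisect.bisect_right(keys, hi)
            names.update(name for _, name in index[start:end])
        best = sorted((math.dist(q, points[n]), n) for n in names)[:k]
        # Stop when k candidates are inside the circle, or everything was scanned.
        if (len(best) == k and best[-1][0] <= r) or r > c:
            return best
        r += step

index, max_dist = build(POINTS, REFS)
result = knn((4, 1), 3, POINTS, REFS, index, max_dist)
print([(name, round(dist, 2)) for dist, name in result])
```

Objects on an intersected ring are retrieved even when they lie outside the circle, which is why the candidate list can contain far-away buildings in early rounds and only settles once the termination condition holds.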

Hierarchical Tree Structures
- R-tree
  - Minimum bounding rectangle (MBR)
  - Incomplete and overlapping partitioning
  - Disk-based; balanced
- K-d-tree
  - Space divided recursively
  - Complete and disjoint partitioning
  - In-memory; unbalanced
  - There are algorithms to page and balance the tree, but with more complex manipulations

[Figure: example R-tree and k-d-tree partitionings with their tree diagrams, annotated "Problem: Overlap" and "Problem: Empty space"]
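The recursive space division of the k-d-tree can be sketched as follows. This is a minimal in-memory version under assumptions of my own: points are inserted one at a time with no rebalancing (so the shape depends on insertion order, matching the "unbalanced" remark), the splitting dimension alternates with depth, and ties go to the right subtree. The data set is the running example with reconstructed coordinates.

```python
class Node:
    def __init__(self, name, point):
        self.name, self.point = name, point
        self.left = self.right = None

def insert(node, name, point, depth=0):
    """Insert a point; the splitting dimension alternates with the depth."""
    if node is None:
        return Node(name, point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, name, point, depth + 1)
    else:
        node.right = insert(node.right, name, point, depth + 1)
    return node

def window_query(node, window, depth=0, out=None):
    """Collect the names of the points inside window = [(lo, hi), ...]."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(lo <= v <= hi for v, (lo, hi) in zip(node.point, window)):
        out.append(node.name)
    axis = depth % len(window)
    lo, hi = window[axis]
    if lo < node.point[axis]:        # window reaches into the left half-space
        window_query(node.left, window, depth + 1, out)
    if hi >= node.point[axis]:       # window reaches into the right half-space
        window_query(node.right, window, depth + 1, out)
    return out

POINTS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}
root = None
for name, point in POINTS.items():
    root = insert(root, name, point)
print(sorted(window_query(root, [(0, 4), (2, 5)])))
```

Because the half-spaces are disjoint, a subtree is skipped whenever the window lies entirely on the other side of the split. An R-tree cannot prune this cleanly when MBRs overlap.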

Hierarchical Tree Structures (continued)
- Quad-tree
  - Space divided into 4 rectangles recursively
  - Complete and disjoint partitioning
  - In-memory; unbalanced
  - There are algorithms to page and balance the tree, but with more complex manipulations
- The point quad-tree

[Figure: a point quad-tree over points A to G, with NW, NE, SW and SE children]
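A point quad-tree can be sketched in the same style. In this minimal version each stored point splits its region into four quadrants around itself. The tie-breaking rule (a coordinate equal to the node's goes north or east) and the use of the running example's reconstructed coordinates are assumptions for illustration.

```python
class QuadNode:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.children = {}           # keys: "NW", "NE", "SW", "SE"

def quadrant(node, x, y):
    """Quadrant of (x, y) relative to the point stored in node."""
    return ("N" if y >= node.y else "S") + ("E" if x >= node.x else "W")

def insert(root, name, x, y):
    """Walk down by quadrant and attach the new point at the first empty slot."""
    if root is None:
        return QuadNode(name, x, y)
    node = root
    while True:
        q = quadrant(node, x, y)
        if q not in node.children:
            node.children[q] = QuadNode(name, x, y)
            return root
        node = node.children[q]

def find(root, x, y):
    """Point query: name of the object stored at (x, y), or None."""
    node = root
    while node is not None:
        if (node.x, node.y) == (x, y):
            return node.name
        node = node.children.get(quadrant(node, x, y))
    return None

POINTS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}
root = None
for name, (x, y) in POINTS.items():
    root = insert(root, name, x, y)
print(find(root, 3.4, 6.6), find(root, 3.0, 3.0))
```

As with the k-d-tree, the shape depends on the insertion order, so skewed or sorted input can produce a deep, unbalanced tree.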

Compression Based Indexing
- The dimensionality curse
- The Vector Approximation File (VA-File)
- Skewed data
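The slide only names the VA-File, so the following is a hedged sketch of the general idea rather than the published algorithm. Each dimension is cut into 2^b equal-width cells, every vector is replaced by its small cell approximation, and a query scans the approximations first, fetching the full vectors only when a cell's lower-bound distance cannot rule them out. The data space [0, 8] x [0, 8], b = 2 bits, the simplified one-pass search, and all names are assumptions. The data set is the running example with reconstructed coordinates.

```python
import math

def build_va(points, lo, hi, bits):
    """Approximate each vector by the index of its grid cell in every dimension."""
    cells = 1 << bits
    width = [(h - l) / cells for l, h in zip(lo, hi)]
    approx = {
        name: tuple(min(cells - 1, int((v - l) / w)) for v, l, w in zip(p, lo, width))
        for name, p in points.items()
    }
    return approx, width

def lower_bound(q, cell, lo, width):
    """Smallest possible distance from q to any vector inside the given cell."""
    total = 0.0
    for qv, c, l, w in zip(q, cell, lo, width):
        cell_lo, cell_hi = l + c * w, l + (c + 1) * w
        if qv < cell_lo:
            total += (cell_lo - qv) ** 2
        elif qv > cell_hi:
            total += (qv - cell_hi) ** 2
    return math.sqrt(total)

def va_knn(q, k, points, approx, lo, width):
    """Filter on the approximations, then refine with the exact vectors."""
    order = sorted((lower_bound(q, cell, lo, width), name) for name, cell in approx.items())
    best, fetched = [], 0
    for bound, name in order:
        if len(best) == k and bound > best[-1][0]:
            break                     # no remaining cell can beat the k-th candidate
        fetched += 1                  # one access to the full vector
        best = sorted(best + [(math.dist(q, points[name]), name)])[:k]
    return best, fetched

POINTS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}
LO, HI = (0.0, 0.0), (8.0, 8.0)
approx, width = build_va(POINTS, LO, HI, bits=2)
best, fetched = va_knn((4, 1), 3, POINTS, approx, LO, width)
print([name for _, name in best], fetched)
```

With equal-width cells, skewed data crowds many vectors into a few cells, so their approximations stop discriminating and more full vectors must be fetched. This is consistent with the "skewed data" caveat on the slide.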

Summary of the Indexing Techniques
- R-tree: disk-based; balanced; efficient for point, window and kNN queries; low dimensionality. Disadvantage is overlap.
- K-d-tree: in-memory; not balanced; efficient for point, window and kNN(?) queries. Inefficient for skewed data.
- Quad-tree: in-memory; not balanced.
- Z-curve + B+-tree: efficient for point and window queries. Order of the Z-curve affects performance.
- iDistance: efficient for point and kNN queries; high dimensionality. Not good for uniform data in very high-D.
- VA-File: not good for skewed data.

Index Implementations in Major DBMS
- SQL Server
  - B+-tree data structure
  - Clustered indexes are sparse
  - Indexes maintained as updates/insertions/deletes are performed
- Oracle
  - B+-tree, hash, bitmap, spatial extender for R-tree
  - Clustered index: index-organized table (unique/clustered); clusters used when creating tables
- DB2
  - B+-tree data structure, spatial extender for R-tree
  - Clustered indexes are dense
  - Explicit command for index reorganization

Recommended Readings and References
- Survey on multidimensional indexing techniques
  - Christian Böhm, Stefan Berchtold, Daniel A. Keim. Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys, 2001.
  - Volker Gaede, Oliver Günther. Multidimensional Access Methods. ACM Computing Surveys, 1998.
- Mapping based indexing
  - Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005.
- Space-filling curves
  - H. V. Jagadish. Linear Clustering of Objects with Multiple Attributes. ACM SIGMOD Conference (SIGMOD), 1990.
- iDistance
  - H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005.
- R-tree
  - Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. ACM SIGMOD Conference (SIGMOD), 1984.
- Quad-tree
  - Hanan Samet. The Quadtree and Related Hierarchical Data Structures. ACM Computing Surveys, 1984.
- VA-File
  - Roger Weber, Hans-Jörg Schek, Stephen Blott. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. International Conference on Very Large Data Bases (VLDB), 1998.