Indexing Multidimensional Data Rui Zhang The University of Melbourne Aug 2006.

Presentation transcript:

Indexing Multidimensional Data Rui Zhang The University of Melbourne Aug 2006

Outline
  Background
    Multidimensional data and queries
  Approaches
    Mapping based indexing
      Z-curve
      iDistance
    Hierarchical-tree based indexing
      R-tree
      k-d-tree
      Quad-tree
    Compression based indexing
      VA-file

Multidimensional Data
  Spatial data (low-dimensionality)
    Geographic information: Melbourne (37, 145)
      Which city is at (30, 140)?
    Computer-aided design: width and height (40, 50)
      Is there any part that has a width of 40 and a height of 50?
  Records with multiple attributes (medium-dimensionality)
    Employee (ID, age, score, salary, …)
      Is there any employee whose age is under 25, whose performance score is greater than 80, and whose salary is between 3000 and 5000?
  Multimedia data (high-dimensionality)
    Color histograms of images
      Give me the most similar image to a given query image.
    Multimedia features: color, shape, texture

Multidimensional Queries
  Point query
    Return the objects located at Q = (x_1, x_2, …, x_d).
    E.g. Q = (3.4, 6.6)
  Window query
    Return all the objects enclosed or intersected by the hyper-rectangle W = {[L_1, U_1], [L_2, U_2], …, [L_d, U_d]}.
    E.g. W = {[0,4],[2,5]}
  K-Nearest Neighbor query (KNN query)
    Return k objects whose distances to Q are no larger than any other object's distance to Q.
    E.g. 3NN of Q = (4,1)
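The three query types can be illustrated with a brute-force sketch that scans every object. It uses no index at all and only pins down what each query returns; the point names and coordinates are hypothetical and are not taken from the slides.

```python
import math

# Hypothetical 2-D data set (names and coordinates are illustrative only).
points = {
    "p1": (3.4, 6.6), "p2": (1.0, 3.0), "p3": (3.5, 4.5),
    "p4": (5.0, 1.5), "p5": (4.0, 2.0), "p6": (6.0, 6.0),
}

def point_query(q):
    """Return the names of objects located exactly at q."""
    return sorted(name for name, p in points.items() if p == q)

def window_query(window):
    """window is a list of (L_i, U_i) pairs, one per dimension."""
    return sorted(
        name for name, p in points.items()
        if all(lo <= c <= hi for c, (lo, hi) in zip(p, window))
    )

def knn_query(q, k):
    """Return the k object names closest to q (Euclidean distance)."""
    return sorted(points, key=lambda name: math.dist(points[name], q))[:k]

print(point_query((3.4, 6.6)))          # objects at Q = (3.4, 6.6)
print(window_query([(0, 4), (2, 5)]))   # objects inside W = {[0,4],[2,5]}
print(knn_query((4, 1), 3))             # 3NN of Q = (4, 1)
```

Every technique in the rest of the talk aims to return the same answers while touching far fewer objects than this full scan.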

Mapping Based Multidimensional Indexing
  Story
    The CBD: [0,4][2,5]
    Blocks in the CBD are: [8,15], [32,33] and [36,37]
  General strategy: three steps
    1. Data mapping and indexing
    2. Query mapping and data retrieval
    3. Filtering out false positives
  [Table: buildings A to J with columns Name, x, y, Block, Height; cell values lost in extraction. Order after sorting: A, F, C, B, D, E, H, G, I, J.]
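The three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the slides: the mapping is a Z-value with the y bit placed before the x bit at each level, a sorted Python list with `bisect` stands in for a B+-tree, the query is mapped to a single covering key range, and the points are hypothetical.

```python
from bisect import bisect_left, bisect_right

def z_value(x, y, bits=3):
    """Assumed mapping: interleave bits, y bit before x bit at each level."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

# Hypothetical points on an 8 x 8 grid.
POINTS = {"p1": (1, 3), "p2": (5, 1), "p3": (3, 4), "p4": (6, 6), "p5": (0, 7)}

# Step 1: map every object to a one-dimensional key and index the keys.
INDEX = sorted((z_value(x, y), name) for name, (x, y) in POINTS.items())
KEYS = [key for key, _ in INDEX]

def window_query(x_lo, x_hi, y_lo, y_hi):
    # Step 2: map the window to a key range and retrieve the candidates.
    # The Z-value grows with both x and y, so the two corners bound every
    # cell inside the window.
    z_lo, z_hi = z_value(x_lo, y_lo), z_value(x_hi, y_hi)
    hits = INDEX[bisect_left(KEYS, z_lo):bisect_right(KEYS, z_hi)]
    candidates = [name for _, name in hits]
    # Step 3: filter out false positives using the original coordinates.
    answers = [
        n for n in candidates
        if x_lo <= POINTS[n][0] <= x_hi and y_lo <= POINTS[n][1] <= y_hi
    ]
    return candidates, answers

candidates, answers = window_query(0, 3, 2, 4)
print(candidates, answers)
```

With these points, `p2` falls inside the key range but outside the window, so it is retrieved in step 2 and discarded in step 3. Splitting the window into several tighter key ranges reduces such false positives at the cost of more range lookups.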

The Z-curve and Other Space-Filling Curves
  The Z-curve
    Z-value calculation: bit-interleaving
    Supports efficient window queries
    Disadvantage: jumps
  Other space-filling curves
    Hilbert curves
    Gray-code
    Column-wise scan
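Bit-interleaving can be sketched in a few lines. The bit order (y bit before x bit), the 3 bits per dimension, and the reading of the CBD window [0,4][2,5] as grid cells x = 0..3, y = 2..4 are assumptions; they were chosen because together they reproduce the block ranges [8,15], [32,33] and [36,37] quoted in the story.

```python
def z_value(x, y, bits=3):
    """Interleave the bits of y and x, most significant first (y bit leads)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

def window_to_ranges(x_cells, y_cells, bits=3):
    """Merge the Z-values of all cells in a window into contiguous ranges."""
    zs = sorted(z_value(x, y, bits) for x in x_cells for y in y_cells)
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z          # extend the current run
        else:
            ranges.append([z, z])      # a jump in the curve starts a new run
    return [tuple(r) for r in ranges]

cbd = window_to_ranges(range(0, 4), range(2, 5))
print(cbd)
```

The gap between 15 and 32 is one of the jumps listed as the disadvantage: cells that are adjacent in space can be far apart on the curve, so a single window may need several key ranges.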

Mapping for KNN Queries
  Story continued
    New factory at Q[4,1]
    Find 3 nearest buildings to Q
  Termination condition
    K candidates
    All in the current search circle
  [Table: buildings A to J with columns Name, x, y, Street, Height; cell values lost in extraction. Order after sorting: C, F, A, I, H, G, J, D, B, E.]
  Search radii: R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10
  Distances to Q: ||AQ|| = 3.31, ||FQ|| = 3.62, ||BQ|| = 1.81, ||EQ|| = 3.00, ||CQ|| = 1.84, ||DQ|| = (value lost in extraction)
  Candidate lists (ranks 1 to 3), in the order extracted from the slide: A; B, A, F; B, E, A; A, F; B, C, E; B, C, D
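The termination condition can be sketched with a search circle that grows in fixed steps. This is an illustration under assumptions: the square enclosing the circle stands in for whatever an index lookup would return, the step of 0.35 mirrors the radii listed on the slide, and the points are hypothetical.

```python
import math

def knn_expanding(points, q, k, step=0.35):
    """Enlarge the search circle until the k best candidates all lie inside it."""
    if k > len(points):
        raise ValueError("k exceeds the number of points")
    n = 0
    while True:
        n += 1
        radius = n * step
        # Stand-in for an index lookup: everything in the square that
        # encloses the circle, which can include points outside the circle.
        examined = [
            (math.dist(p, q), name) for name, p in points.items()
            if all(abs(c - qc) <= radius for c, qc in zip(p, q))
        ]
        candidates = sorted(examined)[:k]
        # Terminate only when there are k candidates and the farthest of
        # them is within the current search circle.
        if len(candidates) == k and candidates[-1][0] <= radius:
            return [name for _, name in candidates], radius

# Hypothetical buildings around a query point Q = (4, 1).
buildings = {"p1": (4, 2), "p2": (6, 1), "p3": (4, 4), "p4": (8, 1), "p5": (0, 5)}
names, final_radius = knn_expanding(buildings, (4, 1), 3)
print(names, final_radius)
```

A candidate found outside the circle cannot be trusted yet, because an unexamined object may be closer; that is why the check compares the k-th candidate's distance with the radius instead of just counting candidates.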

The iDistance
  Data partitioned into a number of clusters
  Streets are concentric circles
  Data mapping
    Objects mapped to street numbers
  Query mapping
    Search circle mapped to the streets it intersects
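A minimal sketch of the iDistance mapping follows, assuming the usual key formula key = i * C + dist(p, O_i), where O_i is the reference point of partition i and C is a constant larger than any distance within a partition. The reference points, the value of C, the sorted list standing in for a B+-tree, and the data are all hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

C = 100.0  # assumed separation constant, larger than any in-partition distance

def build(points, refs):
    """Data mapping: assign each point to its nearest reference point."""
    entries, max_dist = [], [0.0] * len(refs)
    for name, p in points.items():
        i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
        d = math.dist(p, refs[i])
        max_dist[i] = max(max_dist[i], d)
        entries.append((i * C + d, name))
    entries.sort()
    return entries, max_dist

def circle_to_key_ranges(q, r, refs, max_dist):
    """Query mapping: one key range per partition the search circle intersects."""
    ranges = []
    for i, ref in enumerate(refs):
        dq = math.dist(q, ref)
        if dq - r <= max_dist[i]:
            ranges.append((i * C + max(0.0, dq - r),
                           i * C + min(max_dist[i], dq + r)))
    return ranges

def range_search(q, r, points, refs, entries, max_dist):
    keys = [key for key, _ in entries]
    hits = []
    for lo, hi in circle_to_key_ranges(q, r, refs, max_dist):
        for _, name in entries[bisect_left(keys, lo):bisect_right(keys, hi)]:
            if math.dist(points[name], q) <= r:   # drop false positives
                hits.append(name)
    return sorted(hits)

refs = [(0, 0), (10, 10)]
data = {"a": (1, 0), "b": (3, 4), "c": (9, 10), "d": (10, 7), "e": (5, 0)}
entries, max_dist = build(data, refs)
ranges = circle_to_key_ranges((3, 3), 1.5, refs, max_dist)
result = range_search((3, 3), 1.5, data, refs, entries, max_dist)
print(ranges, result)
```

In this example `b` and `e` lie on the same ring around the first reference point, so both fall in the retrieved key range, but only `b` is inside the search circle. A kNN search repeats this range search with a growing radius.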

Hierarchical Tree Structures
  R-tree
    Minimum bounding rectangle (MBR)
    Incomplete and overlapping partitioning
    Disk-based; balanced
    Problem: overlap
  K-d-tree
    Space division recursively
    Complete and disjoint partitioning
    In-memory; unbalanced
    There are algorithms to page and balance the tree, but with more complex manipulations
    Problem: empty space
  [Figures: an R-tree and a k-d-tree built step by step over points A to G, with nodes N1 to N5; diagram content lost in extraction.]
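Recursive, disjoint space division can be sketched with a small in-memory k-d-tree. This version is built in bulk by splitting at the median, which happens to give a balanced tree; a tree grown by successive insertions, as the slide has in mind, can become unbalanced. The class and function names and the sample points are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    point: Tuple[float, ...]
    axis: int                      # dimension used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points, depth=0):
    """Split on one axis per level, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def window_query(node, window, out=None):
    """window is a list of (low, high) pairs, one per dimension."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(lo <= c <= hi for c, (lo, hi) in zip(node.point, window)):
        out.append(node.point)
    lo, hi = window[node.axis]
    split = node.point[node.axis]
    if lo <= split:                # window reaches into the lower half
        window_query(node.left, window, out)
    if hi >= split:                # window reaches into the upper half
        window_query(node.right, window, out)
    return out

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
found = sorted(window_query(tree, [(3, 8), (1, 5)]))
print(found)
```

Each split plane divides the whole remaining region, so the partitions never overlap, but a region can be split even where it holds no data, which is the empty-space problem.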

Hierarchical Tree Structures (continued)
  Quad-tree
    Space divided into 4 rectangles recursively
    Complete and disjoint partitioning
    In-memory; unbalanced
    There are algorithms to page and balance the tree, but with more complex manipulations
  The point quad-tree
  [Figure: a point quad-tree over points A to G, with children labelled NW, NE, SW and SE; diagram content lost in extraction.]
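A point quad-tree can be sketched as follows: each stored point splits its region into NW, NE, SW and SE quadrants. The tie-breaking rule (points on a dividing line go to the north or east side) and the sample points are assumptions made for the example.

```python
class QuadNode:
    def __init__(self, point):
        self.point = point
        self.children = {}          # quadrant label -> QuadNode

def quadrant(center, p):
    """Label of the quadrant of `center` that contains p."""
    ns = "N" if p[1] >= center[1] else "S"
    ew = "E" if p[0] >= center[0] else "W"
    return ns + ew

def insert(root, p):
    """Insert p and return the (possibly new) root."""
    if root is None:
        return QuadNode(p)
    node = root
    while True:
        label = quadrant(node.point, p)
        if label not in node.children:
            node.children[label] = QuadNode(p)
            return root
        node = node.children[label]

def contains(root, p):
    node = root
    while node is not None:
        if node.point == p:
            return True
        node = node.children.get(quadrant(node.point, p))
    return False

def depth(node):
    if node is None:
        return 0
    return 1 + max((depth(c) for c in node.children.values()), default=0)

spread = None
for p in [(5, 5), (2, 7), (8, 8), (1, 1), (9, 9)]:
    spread = insert(spread, p)

skewed = None
for p in [(1, 1), (2, 2), (3, 3), (4, 4)]:   # points along a diagonal
    skewed = insert(skewed, p)

print(depth(spread), depth(skewed))
```

The diagonal input degenerates into a chain with one node per level, which illustrates why the structure is unbalanced and performs poorly on skewed data.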

Compression Based Indexing
  The dimensionality curse
  The Vector Approximation File (VA-File)
  [Figures: a VA-file grid, and the same grid over skewed data; diagram content lost in extraction.]
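The idea behind the VA-File can be sketched as a two-phase nearest-neighbor search: scan compact grid-cell approximations to obtain a lower bound on each vector's distance, then fetch full vectors only while the bound can still beat the best distance found. This is a simplified illustration, not the published algorithm; it assumes data in the unit cube, 2 bits per dimension, and hypothetical vectors.

```python
import math

BITS = 2                  # assumed number of bits per dimension
CELLS = 1 << BITS         # each dimension is cut into 4 equal slices

def approximate(v):
    """Replace each coordinate in [0, 1] by the index of its slice."""
    return tuple(min(int(c * CELLS), CELLS - 1) for c in v)

def lower_bound(q, cell):
    """Smallest possible distance from q to any vector in the cell."""
    total = 0.0
    for qc, idx in zip(q, cell):
        lo, hi = idx / CELLS, (idx + 1) / CELLS
        gap = lo - qc if qc < lo else qc - hi if qc > hi else 0.0
        total += gap * gap
    return math.sqrt(total)

def nearest(q, vectors):
    approx = [approximate(v) for v in vectors]       # the approximation file
    # Phase 1: rank vectors by lower bound, using approximations only.
    order = sorted(range(len(vectors)), key=lambda i: lower_bound(q, approx[i]))
    best, best_dist, fetched = None, math.inf, 0
    # Phase 2: fetch full vectors until the bound rules out the rest.
    for i in order:
        if lower_bound(q, approx[i]) > best_dist:
            break
        fetched += 1
        d = math.dist(q, vectors[i])
        if d < best_dist:
            best, best_dist = vectors[i], d
    return best, fetched

vectors = [(0.1, 0.1), (0.9, 0.9), (0.55, 0.6), (0.2, 0.8)]
best, fetched = nearest((0.5, 0.5), vectors)
print(best, fetched)
```

When the data is skewed, many vectors share the same few cells, their lower bounds are identical, and the filter prunes little, which matches the skewed-data weakness noted for the VA-File.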

Summary of the Indexing Techniques

Index             | Disk-based / In-memory | Balanced | Efficient query type   | Dimensionality | Comments
R-tree            | Disk-based             | Yes      | Point, window, kNN     | Low            | Disadvantage is overlap
K-d-tree          | In-memory              | No       | Point, window, kNN(?)  | Low            | Inefficient for skewed data
Quad-tree         | In-memory              | No       | Point, window, kNN(?)  | Low            | Inefficient for skewed data
Z-curve + B+-tree | Disk-based             | Yes      | Point, window          | Low            | Order of the Z-curve affects performance
iDistance         | Disk-based             | Yes      | Point, kNN             | High           | Not good for uniform data in very high-D
VA-File           | Disk-based             |          | Point, window, kNN     | High           | Not good for skewed data

Index Implementations in Major DBMS
  SQL Server
    B+-tree data structure
    Clustered indexes are sparse
    Indexes maintained as updates/insertions/deletes are performed
  Oracle
    B+-tree, hash, bitmap, spatial extender for R-tree
    Clustered index
      Index-organized table (unique/clustered)
      Clusters used when creating tables
  DB2
    B+-tree data structure, spatial extender for R-tree
    Clustered indexes are dense
    Explicit command for index reorganization

Recommended Readings and References
  Survey on multidimensional indexing techniques
    Christian Böhm, Stefan Berchtold, Daniel A. Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys.
    Volker Gaede, Oliver Günther. Multidimensional Access Methods. ACM Computing Surveys, 1998.
  Mapping based indexing
    Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3).
  Space-filling curves
    H. V. Jagadish. Linear Clustering of Objects with Multiple Attributes. ACM SIGMOD Conference (SIGMOD).
  iDistance
    H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2).
  R-tree
    Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. ACM SIGMOD Conference (SIGMOD).
  Quad-tree
    Hanan Samet. The Quadtree and Related Hierarchical Data Structures. ACM Computing Surveys.
  VA-File
    Roger Weber, Hans-Jörg Schek, Stephen Blott. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. International Conference on Very Large Data Bases (VLDB), 1998.