Indexing for Multidimensional Data An Introduction.


1 Indexing for Multidimensional Data: An Introduction

2 Applications of Multidimensional Databases (Jaruloj Chongstitvatana, Advanced Data Structures)
Databases with multiple-attribute keys
Spatial databases
Geographic information systems (GIS)
Computer-aided design (CAD)
Multimedia databases
Medical applications

3 Characteristics of Good Index Structures
Dynamic Operations
– Queries: point queries, range queries, spatial queries
– Insert
– Delete
Simplicity
Performance
– Disk accesses
– Running time
– Storage utilization: low % of wasted space (memory, disk)
Scalability
– Data size
– Data dimension

4 Why Hierarchical Structures
ADVANTAGES
– Allow the search to be focused on the interesting subset of the data
– Eliminate useless search
– Clean and simple implementation
DISADVANTAGES
– Parallelism

5 Types of Data
Multi-dimensional point data
– Database with multiple-attribute key
– Point in 2D or 3D
Interval data
Multi-dimensional region data
High-dimensional point data
– Data mining

6 Comparison
B tree (the properties listed here describe a binary search tree)
– Binary tree
– Unbalanced
– Organizes data
– Memory-based index: cost is measured by running time
– Practical memory size
B+ tree
– N-ary tree
– Height-balanced
– Organizes data space
– Disk-based index: cost is measured by the number of disk accesses
– Disk page size

7 B tree
[Figure: example tree with keys 10, 4, 9, 20, 6, 7]

8 B+ tree
[Figure: example tree with keys 6, 11, 14, 48, 19, 22, 16, 31]

9 B+ tree
N-ary tree
Increase the breadth of the tree to decrease its height
Used for indexing large amounts of data (stored on disk)
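The effect of breadth on height can be checked with rough arithmetic. The sketch below is only an illustration: it assumes every node is completely full, and the function name `min_levels` is made up for this example.

```python
def min_levels(num_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= num_keys.

    Assumes every node is completely full, which real trees are not,
    so this is a lower bound on the height, not an exact figure.
    """
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# One million keys: a binary tree needs about 20 levels,
# a tree with fanout 100 needs about 3.
print(min_levels(1_000_000, 2))
print(min_levels(1_000_000, 100))
```

If each level costs one disk access, the wider tree reads far fewer pages per lookup, which is why the fanout is usually chosen so that a node fills one disk page.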

10 Example
[Figure: example B+ tree. Root keys: 12, 52, 78. Second-level nodes: (4, 8), (19, 26, 37, 46), (60, 69), (83, 91). The leaves hold keys from 0 to 99.]

11 Properties of B+ trees
For an M-ary B+ tree:
– The root has up to M children.
– Non-leaf nodes store up to M-1 keys and, except for the root, have between ⌈M/2⌉ and M children.
– All data items are stored at leaves.
– All leaves are at the same depth and store between ⌈L/2⌉ and L data items.
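The occupancy rules above can be written as two small checks. This is a sketch, not part of the slides: the function names are hypothetical, and the minimum of two children for a non-leaf root is the usual textbook convention, which the slide does not state.

```python
import math


def internal_node_ok(num_children: int, M: int, is_root: bool = False) -> bool:
    """Check the child-count rule for a non-leaf node of an M-ary B+ tree.

    Assumption: a non-leaf root needs at least 2 children
    (common convention, not stated on the slide).
    """
    low = 2 if is_root else math.ceil(M / 2)
    return low <= num_children <= M


def leaf_ok(num_items: int, L: int) -> bool:
    """Check that a leaf stores between ceil(L/2) and L data items."""
    return math.ceil(L / 2) <= num_items <= L


# With M = 5 and L = 5 (values assumed for illustration):
print(internal_node_ok(3, 5))                # 3 children is allowed
print(internal_node_ok(2, 5))                # 2 children is too few
print(internal_node_ok(2, 5, is_root=True))  # but fine for the root
print(leaf_ok(3, 5), leaf_ok(6, 5))
```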

12 Search
[Figure: the example B+ tree from slide 10]
Search for 66
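A search descends from the root to a single leaf, choosing one child per level. The sketch below assumes a convention the slides do not spell out: a key equal to a separator is routed to the right-hand child. The `Node` class and the trimmed two-level tree are hypothetical and only loosely based on the example figure.

```python
from bisect import bisect_right


class Node:
    """Minimal B+ tree node: leaves have children=None and hold data keys."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children


def search(node, key):
    """Return True if key is stored in a leaf of the subtree rooted at node."""
    while node.children is not None:
        # bisect_right sends a key equal to a separator to the right child
        # (assumed convention).
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys


# Trimmed, hypothetical subtree: separators 60 and 69 over three leaves.
subtree = Node(
    [60, 69],
    [
        Node([54, 56, 57, 59]),
        Node([60, 61, 62, 66, 67]),
        Node([69, 70, 71, 76, 77]),
    ],
)

print(search(subtree, 66))  # found in the middle leaf
print(search(subtree, 65))  # not stored anywhere
```

Only one node per level is visited, so the number of page reads equals the height of the tree.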

13 Insert
[Figure: the example B+ tree from slide 10]
Insert 55: split leaf

14 Insert
[Figure: the example B+ tree from slide 10]
Insert 32: split leaf, insert key 31; split node, insert key 31
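The leaf-split step can be sketched as follows. This is one common way to split (upper half moves to a new right leaf, its first key becomes the separator), not necessarily the exact rule used in the slides; the function name and the capacity L = 5 are assumptions for illustration.

```python
from bisect import insort


def insert_into_leaf(leaf, key, L):
    """Insert key into a sorted leaf with capacity L.

    Returns (left, right, separator). If no split is needed,
    right and separator are None. The input list is not modified.
    """
    items = list(leaf)
    insort(items, key)
    if len(items) <= L:
        return items, None, None
    mid = len(items) // 2
    left, right = items[:mid], items[mid:]
    # The first key of the new right leaf is passed up to the parent.
    return left, right, right[0]


# Inserting 32 into the full leaf 26 27 28 31 35 (L = 5 assumed):
left, right, separator = insert_into_leaf([26, 27, 28, 31, 35], 32, L=5)
print(left, right, separator)
```

With these assumptions the separator passed up is 31, which agrees with the "insert key 31" note on the slide. If the parent is already full, the same split is repeated one level up.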

15 Handling multiple attributes
Separate index structure for each attribute
– All index structures must be updated for each record update.
– Data are scattered across many disk pages.
[Figure: indexes a1, a2, a3, a4 and the disk]

16 Handling multiple attributes
Bit interleaving
Attribute interleaving
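Bit interleaving combines several attribute values into one key so that a single one-dimensional index can be used. A minimal sketch for two non-negative integer attributes follows; the function name and the 16-bit width are assumptions for illustration.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one key.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# x = 3 (binary 011), y = 5 (binary 101) -> 100111 in binary = 39
print(interleave_bits(3, 5))
```

Points that are close in both attributes tend to receive nearby keys, although the ordering has jumps, so a range query on the combined key may still have to check several key intervals.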

17 Multiple-attribute indexing
Quad-tree
k-d tree
k-d-B tree
Grid file
hB-tree
Issues
– Non-linear relationship
– Distance measure
– k-nearest-neighbor queries
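To make the k-nearest-neighbor query concrete, here is a brute-force baseline that scans every point. It is not one of the index structures listed above; it only shows what such an index has to answer faster. Euclidean distance is an assumed distance measure, and the function name is hypothetical.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Return the k points closest to query, nearest first.

    Brute force: computes the Euclidean distance to every point.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


points = [(0, 0), (1, 1), (5, 5), (2, 2)]
print(k_nearest(points, (0, 0), 2))
```

The result depends on the chosen distance measure, which is why the distance measure is listed as an issue of its own.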

18 Spatial Indexing
R-tree
R*-tree
SKD-tree
Issues
– Non-linear ordering
– Spatial queries
– High cost of determining spatial relationships
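One common way to reduce the cost of determining spatial relationships, used by the R-tree family, is to test cheap bounding rectangles first and run the exact geometric test only on the survivors. A minimal sketch of that filter step follows; the tuple layout `(xmin, ymin, xmax, ymax)` and the function name are assumptions for illustration.

```python
def mbr_intersects(a, b):
    """Return True if two axis-aligned rectangles overlap or touch.

    Each rectangle is (xmin, ymin, xmax, ymax).
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


query = (2, 2, 4, 4)
print(mbr_intersects(query, (3, 3, 6, 6)))  # overlapping rectangles
print(mbr_intersects(query, (5, 5, 6, 6)))  # disjoint rectangles
```

The test needs only four comparisons. If the rectangles do not intersect, the objects inside them cannot intersect either, so the expensive exact test is skipped.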

19 High-dimensional Indexing
SS-tree
TV-tree
Issues: curse of dimensionality
– Volume grows exponentially with dimension
– Partitioning in higher dimensions is coarser
– Distance measurement in higher dimensions is not practical
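Two quick calculations illustrate the exponential growth. The sketch assumes data spread over a unit hypercube; both function names are made up for this example.

```python
def num_cells(splits_per_dim: int, dims: int) -> int:
    """Number of cells when every dimension is cut into splits_per_dim parts."""
    return splits_per_dim ** dims


def query_side_for_fraction(fraction: float, dims: int) -> float:
    """Side length of a cube-shaped query that covers `fraction`
    of the volume of the unit hypercube."""
    return fraction ** (1.0 / dims)


# Splitting each dimension just once gives 2**d cells.
print(num_cells(2, 10), num_cells(2, 20))

# A query covering 1% of the volume needs a side of 0.1 in 2 dimensions,
# but a side of about 0.955 in 100 dimensions.
print(query_side_for_fraction(0.01, 2))
print(query_side_for_fraction(0.01, 100))
```

With far more cells than data points, most cells are empty, so only a few dimensions can actually be split. A query that selects a small fraction of the data still spans almost the full range of every dimension.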

