File Processing: Multi-Dimensional Index
Spring 2018, Pusan National University, Ki-Joune Li
Multi-Dimensional Index
  Multi-Attribute Query vs. Single-Attribute Query
    Single attribute: only ONE attribute specifies the query condition
      Example: find students whose record is in [3.5, 4.5]
    Multi-attribute: several attributes specify the query condition
      Example: find students whose height is greater than 180 cm and whose weight is less than 70 kg
  Each attribute corresponds to a dimension
  Multi-attribute query = multi-dimensional query
Processing Multi-dimensional Queries
  Example: find students whose height > 180 cm and weight < 70 kg
  Method 1: Using one B+-tree
    Step 1: use the B+-tree to find students taller than 180 cm
    Step 2: from the result of step 1, select students lighter than 70 kg
  Which order: height then weight, or weight then height?
  [Figure: query region (Result) bounded by height 180 and weight < 70]
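Method 1 can be sketched in a few lines. This is only an illustration under stated assumptions: a sorted list searched with `bisect` stands in for the B+-tree on height, and the student records and the function name `method1` are made up.

```python
from bisect import bisect_right

# Hypothetical records: (name, height in cm, weight in kg).
students = [
    ("Kim", 185, 68),
    ("Lee", 172, 60),
    ("Park", 190, 82),
    ("Choi", 181, 69),
    ("Jung", 178, 75),
]

# Stand-in for a B+-tree on height: records kept sorted by the height key.
by_height = sorted(students, key=lambda s: s[1])
height_keys = [s[1] for s in by_height]

def method1(min_height, max_weight):
    # Step 1: range search on the height index (height > min_height).
    start = bisect_right(height_keys, min_height)
    candidates = by_height[start:]
    # Step 2: scan the candidates and keep those with weight < max_weight.
    return [s for s in candidates if s[2] < max_weight]

result = method1(180, 70)
print(result)
```

Step 2 has no index support, so its cost grows with the number of candidates produced by step 1. That is why the choice of which attribute to index first matters.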
Processing Multi-dimensional Queries
  Method 2: Using two B+-trees
    Step 1: Result1 ← students taller than 180 cm, by the B+-tree on height
    Step 2: Result2 ← students lighter than 70 kg, by the B+-tree on weight
    Step 3: Result ← Result1 ∩ Result2
  Comparison of Method 1 and Method 2
  [Figure: Result = (region with height > 180) ∩ (region with weight < 70)]
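A minimal sketch of Method 2, assuming two sorted lists of (key, student id) pairs as stand-ins for the two B+-trees. The records and helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: student id -> (height in cm, weight in kg).
students = {1: (185, 68), 2: (172, 60), 3: (190, 82), 4: (181, 69), 5: (178, 75)}

# Two independent stand-ins for B+-trees: (key, id) pairs sorted by key.
height_index = sorted((h, sid) for sid, (h, w) in students.items())
weight_index = sorted((w, sid) for sid, (h, w) in students.items())

def ids_greater_than(index, key):
    # Everything strictly after the last entry whose key equals `key`.
    pos = bisect_right(index, (key, float("inf")))
    return {sid for _, sid in index[pos:]}

def ids_less_than(index, key):
    # Everything strictly before the first entry whose key equals `key`.
    pos = bisect_left(index, (key, float("-inf")))
    return {sid for _, sid in index[:pos]}

result1 = ids_greater_than(height_index, 180)  # Step 1
result2 = ids_less_than(weight_index, 70)      # Step 2
result = result1 & result2                     # Step 3: intersection
print(sorted(result))
```

Both intermediate results are built in full before the intersection, even when the final result is small. This is one point to weigh when comparing the method with Method 1.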
Processing Multi-dimensional Queries
  Method 3: Unified index for several attributes
    One index for several attributes
    Multi-dimensional space
  Two approaches
    Extending B+-tree
    Extending dynamic hashing
  [Figure: one index for Height and Weight over the Height-Weight plane]
Extending Hashing: Grid Approach
  Block pointer array: grid cells point to blocks
  Cell boundaries
    Fixed: Fixed Grid Method
    Variable: Grid File
  [Figure: Height-Weight grid with a block pointer array and a query rectangle]
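The fixed grid idea can be sketched as follows. The cell size of 10 units per axis, the class name `FixedGrid`, and the sample points are assumptions for illustration. A dictionary keyed by cell coordinates plays the role of the block pointer array.

```python
from collections import defaultdict

CELL_H = 10  # assumed cell width along the height axis (cm)
CELL_W = 10  # assumed cell width along the weight axis (kg)

def cell_of(height, weight):
    # Hash a point to its grid cell by integer division.
    return (int(height // CELL_H), int(weight // CELL_W))

class FixedGrid:
    def __init__(self):
        # Stand-in for the block pointer array: cell -> points in its block.
        self.blocks = defaultdict(list)

    def insert(self, height, weight):
        self.blocks[cell_of(height, weight)].append((height, weight))

    def range_query(self, h_lo, h_hi, w_lo, w_hi):
        (i_lo, j_lo), (i_hi, j_hi) = cell_of(h_lo, w_lo), cell_of(h_hi, w_hi)
        hits, cells_visited = [], 0
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                cells_visited += 1
                for (h, w) in self.blocks.get((i, j), []):
                    if h_lo <= h <= h_hi and w_lo <= w <= w_hi:
                        hits.append((h, w))
        return hits, cells_visited

grid = FixedGrid()
for point in [(185, 68), (172, 60), (190, 82), (181, 69)]:
    grid.insert(*point)

hits, cells_visited = grid.range_query(181, 200, 40, 69)
print(sorted(hits), cells_visited)
```

In this sample, nine cells are visited but only one of them holds data. The query is charged for every cell it overlaps, whether or not the cell contains objects.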
Extending Hashing: Grid File
  [Figure: directory of cells with corners (x1, y1) and (x2, y2), each with a block pointer, and a query rectangle]
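A rough sketch of a grid file directory lookup, assuming variable split points per axis that are searched with `bisect`. The split points, the block names, and the choice to let neighbouring cells share a block are illustrative assumptions, not taken from the slide.

```python
from bisect import bisect_right

# Hypothetical variable cell boundaries (split points) on each axis.
x_scale = [160, 175, 185]  # height: 4 intervals
y_scale = [55, 70]         # weight: 3 intervals

# Directory: one block pointer per cell; here some cells share a block.
directory = [
    ["B0", "B0", "B1"],
    ["B2", "B3", "B1"],
    ["B2", "B4", "B5"],
    ["B6", "B4", "B5"],
]

def block_for(height, weight):
    # Locate the cell through the two scales, then follow its block pointer.
    i = bisect_right(x_scale, height)
    j = bisect_right(y_scale, weight)
    return directory[i][j]

def blocks_for_range(h_lo, h_hi, w_lo, w_hi):
    # Collect the distinct blocks pointed to by all cells the query overlaps.
    i_lo, i_hi = bisect_right(x_scale, h_lo), bisect_right(x_scale, h_hi)
    j_lo, j_hi = bisect_right(y_scale, w_lo), bisect_right(y_scale, w_hi)
    return {directory[i][j]
            for i in range(i_lo, i_hi + 1)
            for j in range(j_lo, j_hi + 1)}

print(block_for(181, 69))
print(sorted(blocks_for_range(180, 200, 0, 69)))
```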
Problem 1: Dead Space
  Dead space: empty space with no objects
  Example: no objects in this query area, yet 5 block accesses
  How to reduce dead space?
Minimum Bounding Rectangle (MBR)
  With MBRs, the query needs only 1 disk access
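The saving can be illustrated with a small sketch: keep one MBR per block in the index, and read a block only if its MBR intersects the query. The blocks, the points, and the helper names `mbr_of` and `intersects` are hypothetical.

```python
def mbr_of(points):
    # Smallest axis-aligned rectangle containing all points.
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    # Rectangles are (xmin, ymin, xmax, ymax); touching counts as intersecting.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical blocks, each holding a few (height, weight) points.
blocks = [
    [(150, 45), (155, 50), (158, 48)],
    [(165, 60), (170, 66), (168, 72)],
    [(183, 64), (186, 68), (190, 69)],
    [(185, 85), (192, 90), (188, 95)],
]
directory = [mbr_of(b) for b in blocks]  # one MBR per block, kept in the index

query = (180, 0, 200, 70)  # height in [180, 200], weight in [0, 70]

# Only blocks whose MBR intersects the query are read from disk.
accessed = [i for i, box in enumerate(directory) if intersects(box, query)]
hits = [p for i in accessed for p in blocks[i]
        if query[0] <= p[0] <= query[2] and query[1] <= p[1] <= query[3]]
print(accessed, hits)
```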
Problem 2: Non-Point Object
  Where to store this object?
Minimum Bounding Rectangle (MBR)
  Also called Minimum Bounding Box
  Two-dimensional geometric simplification of objects
  Covers not the whole space, only the region occupied by objects
  Defined by two corners: (X1min, X2min) and (X1max, X2max)
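Computing the MBR of a non-point object reduces to taking the minimum and maximum along each axis. A minimal sketch, with a made-up triangle as the object:

```python
def mbr(vertices):
    """Return ((x1min, x2min), (x1max, x2max)) for a list of (x1, x2) vertices."""
    x1s = [v[0] for v in vertices]
    x2s = [v[1] for v in vertices]
    return (min(x1s), min(x2s)), (max(x1s), max(x2s))

# A hypothetical non-point object: a triangle given by its vertices.
triangle = [(2.0, 1.0), (6.0, 3.0), (3.0, 7.0)]
low, high = mbr(triangle)
print(low, high)
```

The index then stores only the two corner points per object, and the exact geometry is checked after the MBR test has passed.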
Extending B+-tree: R-tree
  B+-tree vs. R-tree
    B+-tree: interval (1-D rectangle)
    R-tree: multi-dimensional interval (rectangle)
  R-tree: rectangle B+-tree
    Each node holds an MBR (Minimum Bounding Rectangle) instead of an interval (or delimiter)
    No linked list for external nodes
    A certain amount of overlapping is unavoidable
Extending B+-tree: R-tree Example
  [Figure: R-tree with its root node and a query rectangle]
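A range search on an R-tree descends from the root into every child whose MBR intersects the query. The sketch below uses a tiny hand-built tree. The `Node` class, the rectangles, and the object ids are assumptions for illustration.

```python
from dataclasses import dataclass, field

def intersects(a, b):
    # Rectangles are (xmin, ymin, xmax, ymax).
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    mbr: tuple
    children: list = field(default_factory=list)  # child nodes (internal node)
    entries: list = field(default_factory=list)   # (mbr, object id) pairs (leaf)

def search(node, query):
    if not intersects(node.mbr, query):
        return []  # prune: nothing below this node can match
    if not node.children:  # leaf node
        return [oid for box, oid in node.entries if intersects(box, query)]
    found = []
    for child in node.children:
        found.extend(search(child, query))
    return found

# Two leaves whose MBRs overlap each other.
leaf_a = Node((0, 0, 3, 5), entries=[((0, 0, 2, 2), "a1"), ((1, 3, 3, 5), "a2")])
leaf_b = Node((2, 1, 8, 6), entries=[((2, 4, 5, 6), "b1"), ((6, 1, 8, 3), "b2")])
root = Node((0, 0, 8, 6), children=[leaf_a, leaf_b])

print(search(root, (2.5, 4.5, 3.5, 5.5)))
```

Because the two leaf MBRs overlap, this query has to descend into both subtrees. A B+-tree search, by contrast, follows a single path.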
Upward Split like B-tree
  Split the MBR in the case of overflow
  Line sweeping: compare Cost-X and Cost-Y
  [Figure: a splitting line dividing the overflowing node into new MBRs]
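One way to read the line-sweep comparison is sketched below. Entries are sorted along each axis and cut in the middle, and the axis with the lower cost wins. The slide does not define Cost-X and Cost-Y, so the cost used here (the sum of the two resulting MBR areas) is an assumption, as are the function names.

```python
def union(boxes):
    # MBR of a set of rectangles (xmin, ymin, xmax, ymax).
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def split_along(boxes, axis):
    # axis 0 = X, axis 1 = Y; sweep in order of rectangle centres.
    ordered = sorted(boxes, key=lambda b: b[axis] + b[axis + 2])
    mid = len(ordered) // 2
    left, right = ordered[:mid], ordered[mid:]
    cost = area(union(left)) + area(union(right))  # assumed cost measure
    return cost, left, right

def split(boxes):
    cost_x, left_x, right_x = split_along(boxes, 0)
    cost_y, left_y, right_y = split_along(boxes, 1)
    return (left_x, right_x) if cost_x <= cost_y else (left_y, right_y)

# Two clusters separated along Y: the Y split should be the cheaper one.
boxes = [(0, 0, 1, 1), (0, 9, 1, 10), (1, 0, 2, 1), (1, 9, 2, 10)]
print(split_along(boxes, 0)[0], split_along(boxes, 1)[0])
print(split(boxes))
```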
Splitting Strategy
  50:50 split
  Instead of a 50:50 split, use other cost measures
    Area, perimeter
    Overlapping area
  Good split vs. bad split
    1. Make the MBRs as COMPACT as possible
    2. Preserve spatial proximity as much as possible
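The three cost measures are simple to compute for a candidate split into two MBRs. The sketch below evaluates two made-up splits. The rectangles and the function `split_cost` are illustrative assumptions.

```python
def area(b):
    # b is (xmin, ymin, xmax, ymax).
    return (b[2] - b[0]) * (b[3] - b[1])

def perimeter(b):
    return 2 * ((b[2] - b[0]) + (b[3] - b[1]))

def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def split_cost(a, b):
    # Smaller values indicate more compact, less overlapping MBRs.
    return {
        "area": area(a) + area(b),
        "perimeter": perimeter(a) + perimeter(b),
        "overlap": overlap_area(a, b),
    }

good = split_cost((0, 0, 2, 2), (3, 0, 5, 2))  # compact, disjoint MBRs
bad = split_cost((0, 0, 4, 2), (1, 0, 5, 2))   # stretched, overlapping MBRs
print(good)
print(bad)
```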
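R*-tree: An Improvement of R-tree
  Re-insertion strategy on overflow
    Delete an object from the overflowing node and re-insert it
    Result: more compact MBRs
  Most popular index for multi-dimensional data
  [Figure: an overflow caused by a newly inserted object; one object is deleted and re-inserted, leaving a more compact MBR]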
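A rough sketch of choosing which entries to take out of an overflowing node. It assumes the entries whose centres lie farthest from the centre of the node's MBR are removed, and that about 30% of the entries are re-inserted. The fraction, the function names, and the rectangles are illustrative assumptions. The re-insertion itself, which starts again from the root, is not shown.

```python
import math

def union(boxes):
    # MBR of a set of rectangles (xmin, ymin, xmax, ymax).
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def center(b):
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def pick_for_reinsert(entries, fraction=0.3):
    # Rank entries by distance of their centre from the node MBR centre.
    cx, cy = center(union(entries))
    def dist(b):
        bx, by = center(b)
        return math.hypot(bx - cx, by - cy)
    ordered = sorted(entries, key=dist, reverse=True)
    k = max(1, int(len(entries) * fraction))
    return ordered[:k], ordered[k:]  # (to re-insert, to keep)

# An overflowing node: four clustered rectangles and one distant outlier.
entries = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4), (2, 2, 4, 4), (9, 1, 10, 2)]
removed, kept = pick_for_reinsert(entries)
print(removed, area(union(entries)), area(union(kept)))
```

In this sample, removing the outlier shrinks the node's MBR area from 40 to 16.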