Published by Arnold Mitchell. Modified over 8 years ago.
1
Indexing OLAP Data
Sunita Sarawagi
Monowar Hossain, York University
2
Agenda
– Requirements on indexing methods
– Existing indexing methods
– Optimization of R-Tree for OLAP data
– R-Tree vs. bit-mapped indices
– Conclusion
3
Requirements on indexing methods
Symmetric partial match queries
– Continuous, e.g. “time between Jan and July 94”
– Discontinuous, e.g. “first month of each year”
Indexing at multiple levels of aggregation
– Pre-computed group-bys
– Indexing summary data
Handling multiple traversal orders
Efficient batch update
Handling sparse data efficiently
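The two kinds of partial match query can be written as predicates on one dimension. This is only a sketch: the row layout (a dict per tuple) and the names `continuous`, `discontinuous` and `partial_match` are assumptions for illustration, not part of the slides.

```python
from datetime import date

def continuous(d):
    # Continuous restriction: "time between Jan and July 94"
    return date(1994, 1, 1) <= d <= date(1994, 7, 31)

def discontinuous(d):
    # Discontinuous restriction: "first month of each year"
    return d.month == 1

def partial_match(rows, **predicates):
    # Only the dimensions named in `predicates` are restricted;
    # every other dimension is left unconstrained.
    return [r for r in rows
            if all(pred(r[dim]) for dim, pred in predicates.items())]

rows = [
    {"time": date(1994, 3, 1), "product": "soap"},
    {"time": date(1995, 1, 15), "product": "tea"},
]
in_range = partial_match(rows, time=continuous)
january = partial_match(rows, time=discontinuous)
```

A linear scan like this is what an index is meant to avoid; the sketch only fixes what the queries mean.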
4
Existing methods
Multidimensional array-based methods
– Work efficiently when data is dense
– Essbase’s schema
    E.g. a four-dimensional cube: product and store (sparse), time and scenario (dense)
    – B-tree on product and store
    – Two-dimensional array on time and scenario
– Evaluation of Essbase’s schema
    May cause multiple searches
    – E.g. searching store = “something” on the product-store index
    Performance depends on the ability to find enough dense dimensions
    Efficient batch update
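The sparse/dense split described on this slide can be sketched as follows. This is not Essbase's actual implementation: a Python dict stands in for the B-tree, and the class name, dimension values and methods are hypothetical.

```python
# Assumed dense dimensions (values are made up for illustration).
TIMES = ["Q1", "Q2", "Q3", "Q4"]
SCENARIOS = ["actual", "budget"]

class SparseDenseCube:
    def __init__(self):
        # A dict stands in for the B-tree on the sparse (product, store) pair.
        self.index = {}

    def insert(self, product, store, time, scenario, value):
        # Each index entry owns a dense 2-D array over time x scenario.
        block = self.index.setdefault(
            (product, store),
            [[0.0] * len(SCENARIOS) for _ in TIMES],
        )
        block[TIMES.index(time)][SCENARIOS.index(scenario)] = value

    def lookup(self, product, store, time, scenario):
        block = self.index.get((product, store))
        if block is None:
            return None
        return block[TIMES.index(time)][SCENARIOS.index(scenario)]

    def by_store(self, store):
        # No product is given, so every (product, store) entry is examined.
        # This mirrors the "multiple searches" drawback noted above.
        return {k: b for k, b in self.index.items() if k[1] == store}

cube = SparseDenseCube()
cube.insert("soap", "S1", "Q1", "actual", 10.0)
cube.insert("tea", "S2", "Q3", "budget", 4.0)
```

Once the sparse key is found, the dense block is addressed by position, which is why the scheme depends on finding enough dense dimensions.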
5
Existing methods (cont.)
Bit-mapped indices
– Pros:
    For low-cardinality data, bit maps are both space and retrieval efficient
    Support bitwise operations
    Access to data is clustered
    All dimensions are handled symmetrically
– Cons:
    Range queries are not handled efficiently
    Increased space overhead of storing the bit maps, especially for high-cardinality data
    Expensive batch update, as all bit-mapped indices have to be modified even for a single row insertion
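A minimal sketch of a bit-mapped index, using Python integers as bit maps. The class and function names are assumptions for illustration; real systems use compressed, disk-resident bit maps.

```python
from collections import defaultdict

class BitmapIndex:
    """One bit map per distinct value; bit i is set when row i holds that value."""

    def __init__(self, column):
        self.bitmaps = defaultdict(int)
        for row_id, value in enumerate(column):
            self.bitmaps[value] |= 1 << row_id

    def equals(self, value):
        # An unknown value matches no rows.
        return self.bitmaps.get(value, 0)

    def any_of(self, values):
        # A range predicate has to OR one bit map per value in the range.
        result = 0
        for v in values:
            result |= self.equals(v)
        return result

def row_ids(bitmap):
    # Decode a bit map back into the list of matching row numbers.
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

product = ["soap", "tea", "soap", "tea"]
store = ["S1", "S1", "S2", "S2"]
p_idx, s_idx = BitmapIndex(product), BitmapIndex(store)

# Conjunction of two restricted dimensions is a single bitwise AND.
soap_in_s2 = row_ids(p_idx.equals("soap") & s_idx.equals("S2"))
```

Inserting one row means touching a bit map in every indexed column, and a column with many distinct values needs as many bit maps, which matches the cons listed above.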
6
Existing methods (cont.)
Bit-mapped index variants
– Compression
– Hybrid
– Dynamic bit maps
7
Existing methods (cont.)
Hierarchical indices
– Example: Product - Store
    Index product first; also store summaries at the product level
    For each product value, create an index on store and store summaries at the product-store level
– Pros:
    Allows faster access to higher-level data
    Dimensions are symmetrically handled
– Cons:
    Widely used index storage overhead
    The average retrieval efficiency can suffer because of the large indexing structure
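The Product - Store example can be sketched with nested dicts that keep a summary at each level. The class name, the choice of a sum as the summary, and the sample values are assumptions for illustration.

```python
class HierarchicalIndex:
    def __init__(self):
        # product -> {"total": product-level summary,
        #             "stores": {store: product-store-level summary}}
        self.products = {}

    def add(self, product, store, amount):
        node = self.products.setdefault(product, {"total": 0.0, "stores": {}})
        node["total"] += amount
        node["stores"][store] = node["stores"].get(store, 0.0) + amount

    def product_total(self, product):
        # Higher-level data is answered at the first level, without descending.
        return self.products[product]["total"]

    def product_store_total(self, product, store):
        return self.products[product]["stores"][store]

idx = HierarchicalIndex()
idx.add("soap", "S1", 5.0)
idx.add("soap", "S2", 7.0)
idx.add("tea", "S1", 3.0)
```

Every product carries its own store index, which is where the storage overhead of the full structure comes from.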
8
Existing methods (cont.)
Multidimensional indices
– Use of index methods designed for spatial data, e.g. R-Tree, grid files, etc.
9
Optimized R-Tree for OLAP data
Rectangular dense regions (only the boundaries of regions that contain more than a threshold number of points are stored)
– Each contains a pointer to a variable-length array of TIDs or of the tuples themselves
– Points in sparse regions are stored individually
Finding dense regions
– Ask an expert?
– Use a clustering algorithm (similar algorithms: image analysis)
Needs evaluation!
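The two kinds of entry described here can be sketched as follows. A flat list stands in for the tree's leaf level, so this shows only the entry layout and the range test, not a real R-Tree; all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DenseRegion:
    low: tuple    # inclusive lower corner, one coordinate per dimension
    high: tuple   # inclusive upper corner
    tids: list = field(default_factory=list)  # variable-length array of TIDs

    def intersects(self, q_low, q_high):
        # Two boxes overlap only if they overlap in every dimension.
        return all(lo <= qh and ql <= hi
                   for lo, hi, ql, qh in zip(self.low, self.high, q_low, q_high))

@dataclass
class SparsePoint:
    coords: tuple
    tid: int

def inside(q_low, q_high, coords):
    return all(ql <= c <= qh for ql, c, qh in zip(q_low, coords, q_high))

def range_query(entries, q_low, q_high):
    """Return candidate TIDs. TIDs from a partially covered dense region
    would still need a per-tuple check against the query box."""
    hits = []
    for e in entries:
        if isinstance(e, DenseRegion):
            if e.intersects(q_low, q_high):
                hits.extend(e.tids)
        elif inside(q_low, q_high, e.coords):
            hits.append(e.tid)
    return hits

entries = [
    DenseRegion((0, 0), (3, 3), [1, 2, 3]),
    SparsePoint((10, 10), 4),
    SparsePoint((5, 1), 5),
]
candidates = range_query(entries, (2, 0), (6, 2))
```

Storing only the boundary of a dense region keeps the index small, since its points share one entry instead of one entry each.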
10
R-Tree vs. bit-mapped indices
R-Tree pros:
– Allows range queries
– Smaller space overhead
– Update is more efficient
Bit-mapped pros:
– Faster bitwise operations
– Efficient for low cardinality, few restricted dimensions, and sparse data
11
Conclusion
High-level overview
Recommended readings
– MOLAP vs. OLAP
– R-Tree and variants
– R-Tree alternatives
– Computation of multidimensional aggregates
– And more…