Published by Margaret Chandler. Modified over 9 years ago.
1 Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7
2 Processing
- ROLAP servers vs. MOLAP servers
- Index structures
- Cube computation
- What to materialize?
- Algorithms
[Figure: warehouse architecture with Source, Integration, Warehouse, Metadata, Query & Analysis, and Client]
3 ROLAP Server
- Relational OLAP server
- Special indices, tuning; schema is “denormalized”
[Figure: tools and utilities on top of a ROLAP server, which sits on a relational DBMS]
4 MOLAP Server
- Multi-dimensional OLAP server
- Could also sit on a relational DBMS
[Figure: M.D. tools and utilities on top of a multi-dimensional server; a Sales cube with dimensions Product (milk, soda, eggs, soap), City (A, B), and Date (1, 2, 3, 4)]
5 MOLAP
[Figure: sales cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A., Canada, Mexico, sum); the highlighted cell is the total annual sales of TV in U.S.A.]
6 MOLAP
[Figure: a 3-D array with dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 numbered chunks]
7 Challenges in MOLAP
- Storing large arrays for efficient access
  - Row-major, column-major
  - Chunking
  - Compressing sparse arrays
- Creating array data from data in tables
- Efficient techniques for cube computation
Topics are discussed in the paper for reading
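The storage options listed above differ mainly in how a cell's coordinates map to a position on disk or in memory. The following is a minimal sketch of that mapping, not the layout of any particular MOLAP server; the function names and the example shapes are assumptions made for illustration.

```python
def row_major_offset(index, shape):
    """Linear offset of a cell when the last dimension varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        offset = offset * n + i
    return offset


def column_major_offset(index, shape):
    """Linear offset of a cell when the first dimension varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        offset = offset * n + i
    return offset


def chunk_address(index, shape, chunk_shape):
    """Return (chunk number, offset inside the chunk).

    Chunks are numbered in row-major order, and cells inside a chunk are
    also stored in row-major order (an assumption of this sketch).
    """
    chunks_per_dim = [-(-n // c) for n, c in zip(shape, chunk_shape)]  # ceiling division
    chunk_index = [i // c for i, c in zip(index, chunk_shape)]
    inner_index = [i % c for i, c in zip(index, chunk_shape)]
    return (row_major_offset(chunk_index, chunks_per_dim),
            row_major_offset(inner_index, chunk_shape))


print(row_major_offset((1, 2, 3), (4, 4, 4)))
print(column_major_offset((1, 2, 3), (4, 4, 4)))
print(chunk_address((5, 2, 7), (8, 8, 8), (4, 4, 4)))
```

With chunking, cells that are close in every dimension land in the same chunk, so a query that slices along any dimension touches fewer chunks than it would touch pages in a purely row-major or column-major layout.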
8 Index Structures
- Traditional access methods
  - B-trees, hash tables, R-trees, grids, …
- Popular in warehouses
  - Inverted lists
  - Bitmap indexes
  - Join indexes
  - Text indexes
9 Inverted Lists
[Figure: an age index pointing to inverted lists, which point to data records]
10 Using Inverted Lists
- Query:
  - Get people with age = 20 and name = “fred”
- List for age = 20: r4, r18, r34, r35
- List for name = “fred”: r18, r52
- Answer is intersection: r18
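The intersection step can be sketched as a merge of two sorted lists of record ids. This is an illustrative sketch only; the function name is an assumption, and the record ids r4, r18, ... from the slide are written as plain integers.

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists of record ids with a single merge pass."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


age_20 = [4, 18, 34, 35]   # inverted list for age = 20
name_fred = [18, 52]       # inverted list for name = "fred"

print(intersect_sorted(age_20, name_fred))  # [18]
```

Keeping each inverted list sorted by record id is what makes the single-pass merge possible; the cost is proportional to the combined length of the two lists.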
11 Bit Maps
[Figure: an age index pointing to bit maps, which correspond to data records]
12 Bitmap Index
- Index on a particular column
- Each value in the column has a bit vector; bit operations are fast
- The length of each bit vector is the number of records in the base table
- The i-th bit is set if the i-th row of the base table has that value in the indexed column
- Not suitable for high-cardinality domains
[Figure: a base table with bitmap indexes on Region and on Type]
13 Using Bit Maps
- Query:
  - Get people with age = 20 and name = “fred”
- Bit vector for age = 20: 1101100000
- Bit vector for name = “fred”: 0100000001
- Answer is intersection (bitwise AND): 0100000000
- Good if domain cardinality is small
- Bit vectors can be compressed
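As a small sketch of the query above, the two 10-bit vectors from the slide can be held as Python integers and combined with a single bitwise AND. The variable names are assumptions; the leftmost bit is taken to be row 1.

```python
N_ROWS = 10

age_20 = int("1101100000", 2)     # bit vector for age = 20
name_fred = int("0100000001", 2)  # bit vector for name = "fred"

# One AND answers the conjunctive query for all rows at once.
answer = age_20 & name_fred
answer_bits = format(answer, "0{}b".format(N_ROWS))

# Translate set bits back to row numbers (leftmost bit = row 1).
matching_rows = [pos + 1 for pos, bit in enumerate(answer_bits) if bit == "1"]

print(answer_bits)    # 0100000000
print(matching_rows)  # [2]
```

The result has the same length as the inputs, one bit per row of the base table, and only row 2 satisfies both conditions.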
14 Join
- “Combine” SALE, PRODUCT relations
- In SQL: SELECT * FROM SALE, PRODUCT WHERE...
15 Join Indexes
[Figure: join index]
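The slide only names the join index. As a hedged sketch of the general idea, a join index can be thought of as a precomputed mapping from each PRODUCT record to the SALE records that join with it, so the join does not have to be recomputed at query time. The join column `prod_id`, the record ids, and all rows below are hypothetical, since the slide does not show the join condition.

```python
from collections import defaultdict

# Hypothetical rows, keyed by record id.
PRODUCT = {
    "p1": {"prod_id": 1, "name": "milk"},
    "p2": {"prod_id": 2, "name": "soda"},
    "p3": {"prod_id": 3, "name": "soap"},
}
SALE = {
    "s1": {"prod_id": 1, "amount": 12},
    "s2": {"prod_id": 2, "amount": 5},
    "s3": {"prod_id": 1, "amount": 7},
}


def build_join_index(left_rows, right_rows, key):
    """Map each left record id to the right record ids with an equal key."""
    by_key = defaultdict(list)
    for rid, row in right_rows.items():
        by_key[row[key]].append(rid)
    return {
        lrid: sorted(by_key[lrow[key]])
        for lrid, lrow in left_rows.items()
        if lrow[key] in by_key
    }


join_index = build_join_index(PRODUCT, SALE, "prod_id")
print(join_index)

# Using the index: fetch the sales of product record p1 without scanning SALE.
p1_amounts = [SALE[rid]["amount"] for rid in join_index["p1"]]
print(p1_amounts)
```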
16 Cube Computation for Data Warehouses
17 Counting Exercise
- How many cuboids are there in a cube?
  - The full or nothing case
  - When dimension hierarchies are present
- What is the size of each cuboid?
18 Lattice of Cuboids
- city, product, date
- city, product; city, date; product, date
- city; product; date
- all
[Figure: the lattice of these cuboids, with example sales data for day 1 and day 2]
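The lattice for the three dimensions city, product, and date can be enumerated as every subset of the dimensions, from the base cuboid down to "all". This is a minimal sketch; the function name is an assumption.

```python
from itertools import combinations


def cuboids(dimensions):
    """List every group-by subset, most detailed first, ending with () for 'all'."""
    return [
        combo
        for k in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, k)
    ]


lattice = cuboids(["city", "product", "date"])
for cuboid in lattice:
    print(", ".join(cuboid) if cuboid else "all")

print(len(lattice))  # 8
```

Each cuboid is one level of aggregation: dropping a dimension from a cuboid corresponds to following an arc to one of its parents in the lattice.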
19 Dimension Hierarchies
[Figure: a hierarchy with levels all, state, city (most general to most detailed)]
20 Dimension Hierarchies
[Figure: lattice of cuboids with the city/state hierarchy; not all arcs shown]
- city, product, date
- city, product; city, date; product, date
- city; product; date
- state, product, date
- state, product; state, date
- state
- all
21 Efficient Data Cube Computation
- Data cube can be viewed as a lattice of cuboids
  - The bottom-most cuboid is the base cuboid
  - The top-most cuboid (apex) contains only one cell
  - How many cuboids in an n-dimensional cube with L levels?
- Materialization of data cube
  - Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization)
  - Selection of which cuboids to materialize
    - Based on size, sharing, access frequency, etc.
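For the counting question above, a standard result can be written as follows. The symbols are introduced here for illustration: n is the number of dimensions and L_i is the number of levels of dimension i, not counting the virtual top level "all".

```latex
\[
T_{\text{no hierarchies}} = 2^{n},
\qquad
T_{\text{with hierarchies}} = \prod_{i=1}^{n} \left( L_i + 1 \right)
\]
```

For example, with dimensions city, product, and date, where only the city dimension has a second level (state), this gives (2 + 1)(1 + 1)(1 + 1) = 12 cuboids, compared with 2^3 = 8 when no hierarchy is present.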
22 Derived Data
- Derived warehouse data
  - Indexes
  - Aggregates
  - Materialized views (next slide)
- When to update derived data?
- Incremental vs. refresh
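The difference between a refresh and an incremental update can be sketched on a simple aggregate, total sales per product. The rows and function names below are hypothetical and only illustrate the idea: a refresh recomputes the aggregate from all base data, while an incremental update folds in only the new rows.

```python
from collections import Counter


def refresh(base_rows):
    """Recompute the aggregate from scratch over all base rows."""
    totals = Counter()
    for product, amount in base_rows:
        totals[product] += amount
    return totals


def apply_increment(totals, new_rows):
    """Update the existing aggregate in place using only the new rows."""
    for product, amount in new_rows:
        totals[product] += amount
    return totals


base = [("milk", 12), ("soda", 5)]   # hypothetical rows already in the warehouse
delta = [("milk", 7), ("eggs", 3)]   # hypothetical newly arrived rows

view = refresh(base)
apply_increment(view, delta)

print(dict(view))
print(view == refresh(base + delta))  # True: both strategies agree
```

This works incrementally because SUM can be maintained from the delta alone; aggregates that cannot be updated this way may need a full refresh.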
23 Idea of Materialized Views
- Define new warehouse tables/arrays that do not exist at any source
24 Efficient OLAP Processing
- Determine which operations should be performed on available cuboids
  - Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection
- Determine which materialized cuboid(s) should be selected for OLAP:
  - Let the query to be processed be on {brand, province_or_state} with the condition “year = 2004”, and there are 4 materialized cuboids available:
    1) {year, item_name, city}
    2) {year, brand, country}
    3) {year, brand, province_or_state}
    4) {item_name, province_or_state} where year = 2004
    Which should be selected to process the query?
- Explore indexing structures & compressed vs. dense arrays in MOLAP
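One way to approach the question above is to first rule out cuboids that cannot answer the query at all. The sketch below assumes the hierarchies item_name < brand and city < province_or_state < country, which the slide does not state; the function names are also assumptions. A cuboid is usable if, for every attribute the query needs, it holds the same or a finer level of that dimension.

```python
# Assumed hierarchies, finest level first (not given on the slide).
HIERARCHIES = {
    "time": ["year"],
    "item": ["item_name", "brand"],
    "location": ["city", "province_or_state", "country"],
}
LEVEL = {
    attr: (dim, depth)
    for dim, levels in HIERARCHIES.items()
    for depth, attr in enumerate(levels)
}


def covers(cuboid_attrs, needed):
    """True if the cuboid has the needed attribute or a finer one in the same dimension."""
    dim, depth = LEVEL[needed]
    return any(LEVEL[a][0] == dim and LEVEL[a][1] <= depth for a in cuboid_attrs)


def can_answer(attrs, prefilter, group_by, condition):
    """Check whether a materialized cuboid can answer the query."""
    if not all(covers(attrs, a) for a in group_by):
        return False
    for attr, value in condition.items():
        already_applied = prefilter.get(attr) == value
        if not already_applied and not covers(attrs, attr):
            return False
    # A filter baked into the cuboid must also be part of the query.
    return all(condition.get(a) == v for a, v in prefilter.items())


CUBOIDS = {
    1: ({"year", "item_name", "city"}, {}),
    2: ({"year", "brand", "country"}, {}),
    3: ({"year", "brand", "province_or_state"}, {}),
    4: ({"item_name", "province_or_state"}, {"year": 2004}),
}

usable = [
    n for n, (attrs, pre) in CUBOIDS.items()
    if can_answer(attrs, pre, ["brand", "province_or_state"], {"year": 2004})
]
print(usable)  # [1, 3, 4]
```

Under these assumptions, cuboid 2 is ruled out because country is coarser than province_or_state. Choosing among cuboids 1, 3, and 4 then depends on factors the sketch does not model, such as cuboid sizes and available indexes.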
25 What to Materialize?
- Store in warehouse results useful for common queries
- Example:
[Figure: sales tables for day 1 and day 2, with total sales as the result to materialize]
26 Materialization Factors
- Type/frequency of queries
- Query response time
- Storage cost
- Update cost
Will study a concrete algorithm later
27 Iceberg Cube
- Compute only the cuboid cells whose count or other aggregate satisfies a condition like HAVING COUNT(*) >= minsup
- Motivation
  - Only a small portion of cube cells may be “above the water” in a sparse cube
  - Only calculate “interesting” cells: data above a certain threshold
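The iceberg condition can be illustrated with a naive sketch that groups the rows by every subset of the dimensions and keeps only cells whose count reaches minsup. This is not an efficient iceberg algorithm, since it still counts every cell before filtering; the rows, dimension names, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, dims, minsup):
    """Return {group-by tuple: {cell: count}} keeping only cells with count >= minsup."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group_by in combinations(dims, k):
            counts = Counter(tuple(row[d] for d in group_by) for row in rows)
            kept = {cell: n for cell, n in counts.items() if n >= minsup}
            if kept:
                result[group_by] = kept
    return result


rows = [  # hypothetical base data
    {"city": "A", "product": "milk"},
    {"city": "A", "product": "milk"},
    {"city": "A", "product": "soda"},
    {"city": "B", "product": "eggs"},
]

cube = iceberg_cube(rows, ("city", "product"), minsup=2)
for group_by, cells in cube.items():
    print(group_by, cells)
```

Even in this tiny example most cells fall below the threshold, which is the motivation for computing only the cells "above the water".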