1 Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7.

1 Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7

2 Processing
- ROLAP servers vs. MOLAP servers
- Index structures
- Cube computation
- What to materialize?
- Algorithms
[Figure: warehouse architecture with labels Client, Warehouse, Source, Query & Analysis, Integration, Metadata]

3 ROLAP Server
- Relational OLAP server
- Special indices, tuning; schema is “denormalized”
[Figure: tools and utilities on top of a ROLAP server, which sits on a relational DBMS]

4 MOLAP Server
- Multi-dimensional OLAP server
- Could also sit on a relational DBMS
[Figure: tools and utilities on top of a multi-dimensional (M.D.) server; a Sales cube with dimensions Product (milk, soda, eggs, soap), City (A, B), and Date (1, 2, 3, 4)]

5 MOLAP
[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); callout: total annual sales of TV in U.S.A.]

6 MOLAP
[Figure: three-dimensional array over dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), divided into chunks numbered 1 to 64]

7 Challenges in MOLAP
- Storing large arrays for efficient access
  - Row-major, column-major
  - Chunking
  - Compressing sparse arrays
- Creating array data from data in tables
- Efficient techniques for cube computation
Topics are discussed in the paper for reading.
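
The storage ideas listed above can be illustrated with a minimal Python sketch. It is not taken from the course reading: the function names `row_major_offset`, `chunk_of`, and `compress_sparse` are hypothetical, and the sketch assumes a dense array addressed by integer coordinates, with empty cells represented as `None`.

```python
def row_major_offset(coords, shape):
    """Linear offset of a cell when the last dimension varies fastest."""
    offset = 0
    for c, n in zip(coords, shape):
        if not 0 <= c < n:
            raise IndexError("coordinate out of range")
        offset = offset * n + c
    return offset


def chunk_of(coords, chunk_shape):
    """Return (chunk coordinates, position inside that chunk) for a cell."""
    chunk_id = tuple(c // s for c, s in zip(coords, chunk_shape))
    within = tuple(c % s for c, s in zip(coords, chunk_shape))
    return chunk_id, within


def compress_sparse(dense):
    """Keep only non-empty cells of a linearized array as (offset, value) pairs."""
    return [(i, v) for i, v in enumerate(dense) if v is not None]
```

Chunking keeps cells that are close in every dimension close on disk, whereas plain row-major order favors only the last dimension. The (offset, value) encoding is one simple way to avoid storing empty cells.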

8 Index Structures
- Traditional access methods
  - B-trees, hash tables, R-trees, grids, …
- Popular in warehouses
  - inverted lists
  - bit map indexes
  - join indexes
  - text indexes

9 Inverted Lists
[Figure: an age index pointing to inverted lists, which point to data records]

10 Using Inverted Lists
- Query:
  - Get people with age = 20 and name = “fred”
- List for age = 20: r4, r18, r34, r35
- List for name = “fred”: r18, r52
- Answer is intersection: r18
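
A minimal Python sketch of the lookup above, assuming record ids such as r18 are stored as integers and each inverted list is kept sorted. The helper names `build_inverted_index` and `intersect_sorted` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_index(records, column):
    """Map each value of `column` to the sorted list of record ids holding it."""
    index = defaultdict(list)
    for rid, record in enumerate(records):
        index[record[column]].append(rid)  # rids arrive in increasing order
    return dict(index)


def intersect_sorted(a, b):
    """Intersect two sorted id lists with a single merge pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# The two lists from the slide, with r4 written as 4 and so on.
age_20 = [4, 18, 34, 35]
name_fred = [18, 52]
answer = intersect_sorted(age_20, name_fred)  # [18], i.e. record r18
```

Keeping the lists sorted lets the intersection run in time linear in the two list lengths, without touching the data records until the final answer is known.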

11 Bit Maps
[Figure: an age index pointing to bit maps, which correspond to data records]

12 Bitmap Index
- Index on a particular column
- Each value in the column has a bit vector; bit operations are fast
- The length of the bit vector is the number of records in the base table
- The i-th bit is set if the i-th row of the base table has that value in the indexed column
- Not suitable for high-cardinality domains
[Figure: base table, index on Region, index on Type]

13 Using Bit Maps
- Query:
  - Get people with age = 20 and name = “fred”
- Bit vector for age = 20: 1101100000
- Bit vector for name = “fred”: 0100000001
- Answer is the intersection (bitwise AND): 0100000000
- Good if domain cardinality is small
- Bit vectors can be compressed
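
The bitmap lookup can be sketched in Python as follows. This is an illustrative sketch, not a production index: bit vectors are kept as strings of '0' and '1' for readability (a real system would use packed, possibly compressed, words), and the function names are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bit string; bit i is '1' if row i holds it."""
    n = len(values)
    index = {}
    for i, value in enumerate(values):
        bits = index.setdefault(value, ["0"] * n)
        bits[i] = "1"
    return {value: "".join(bits) for value, bits in index.items()}


def bitmap_and(a, b):
    """Bitwise AND of two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return format(int(a, 2) & int(b, 2), "0{}b".format(len(a)))


def matching_rows(bits):
    """Zero-based row positions whose bit is set."""
    return [i for i, bit in enumerate(bits) if bit == "1"]


# The two ten-bit vectors from the slide.
both = bitmap_and("1101100000", "0100000001")
rows = matching_rows(both)  # only the second row (position 1) satisfies both predicates
```

Because every vector has one bit per base-table row, the space cost grows with the number of distinct values, which is why the technique suits low-cardinality columns.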

14 Join
- “Combine” SALE, PRODUCT relations
- In SQL: SELECT * FROM SALE, PRODUCT WHERE...

15 Join Indexes
[Figure: join index]
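
The slide shows the join index only as a figure, so the following Python sketch rests on assumptions: SALE and PRODUCT are lists of dictionaries, they share a key column called `prod_id` here (a hypothetical name), and the join index stores, for each key value, the positions of the matching SALE records.

```python
from collections import defaultdict


def build_join_index(sales, key):
    """Precompute, for each key value, the positions of the SALE rows that join."""
    index = defaultdict(list)
    for rid, sale in enumerate(sales):
        index[sale[key]].append(rid)
    return dict(index)


def join_via_index(products, sales, index, key):
    """Produce the joined rows by following the precomputed index."""
    joined = []
    for product in products:
        for rid in index.get(product[key], []):
            joined.append({**sales[rid], **product})
    return joined
```

The index is derived data: it trades extra storage and update work for avoiding a scan of SALE each time the join is evaluated.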

16 Cube Computation for Data Warehouses

17 Counting Exercise
- How many cuboids are there in a cube?
  - The full or nothing case
  - When dimension hierarchies are present
- What is the size of each cuboid?

18 Lattice of Cuboids
- city, product, date
- city, product | city, date | product, date
- city | product | date
- all
[Figure: the lattice drawn next to an example cube labeled day 1 and day 2, with the value 129]
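
The lattice can be enumerated mechanically: each cuboid is a subset of the dimensions, and "all" is the empty subset. The Python sketch below is illustrative only; the function names and the `amt` measure column are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cuboids(dimensions):
    """All group-by sets, from the base cuboid down to the empty set ('all')."""
    return [
        combo
        for k in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, k)
    ]


def aggregate(facts, group_by, measure):
    """Sum `measure` for each combination of values of the `group_by` dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in group_by)
        totals[key] += row[measure]
    return dict(totals)


lattice = cuboids(("city", "product", "date"))
# 8 cuboids: 1 with three dimensions, 3 with two, 3 with one, and 'all'
```

Any cuboid can be computed from a cuboid that contains all of its dimensions, which is what the arcs of the lattice express.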

19 Dimension Hierarchies
[Figure: hierarchy from coarsest to finest: all, state, city]

20 Dimension Hierarchies
[Figure: lattice of cuboids extended with the state level; not all arcs shown]
Nodes: city, product, date; state, product, date; city, product; city, date; product, date; state, product; state, date; city; product; date; state; all

21 Efficient Data Cube Computation
- Data cube can be viewed as a lattice of cuboids
  - The bottom-most cuboid is the base cuboid
  - The top-most cuboid (apex) contains only one cell
  - How many cuboids in an n-dimensional cube with L levels?
- Materialization of data cube
  - Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization)
  - Selection of which cuboids to materialize
    - Based on size, sharing, access frequency, etc.
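
For the counting question above, the usual argument is that each dimension independently contributes one of its hierarchy levels or "all". If dimension i has L_i levels (not counting "all"), the number of cuboids is the product of (L_i + 1) over all dimensions, which reduces to 2^n when every dimension has a single level. A small Python sketch, with the hypothetical function name `count_cuboids`:

```python
from math import prod


def count_cuboids(levels_per_dimension):
    """Number of cuboids when dimension i has levels_per_dimension[i] levels,
    not counting the implicit 'all' level."""
    return prod(levels + 1 for levels in levels_per_dimension)


# city, product, date with no hierarchies: 2 * 2 * 2 = 8 cuboids
flat = count_cuboids([1, 1, 1])

# location has two levels (city, state); product and date have one: 3 * 2 * 2 = 12
with_state = count_cuboids([2, 1, 1])
```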

22 Derived Data
- Derived warehouse data
  - indexes
  - aggregates
  - materialized views (next slide)
- When to update derived data?
- Incremental vs. refresh

23 Idea of Materialized Views
- Define new warehouse tables/arrays that do not exist at any source

24 Efficient OLAP Processing
- Determine which operations should be performed on available cuboids
  - Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection
- Determine which materialized cuboid(s) should be selected for OLAP:
  - Let the query to be processed be on {brand, province_or_state} with the condition “year = 2004”, and there are 4 materialized cuboids available:
    1) {year, item_name, city}
    2) {year, brand, country}
    3) {year, brand, province_or_state}
    4) {item_name, province_or_state} where year = 2004
  - Which should be selected to process the query?
- Explore indexing structures & compressed vs. dense arrays in MOLAP
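
One way to reason about the question above is to check, for each materialized cuboid, whether it still holds every attribute the query needs at the same or a finer level. The Python sketch below assumes the hierarchies item_name < brand and city < province_or_state < country, which the slide does not state explicitly. The names `HIERARCHIES` and `can_answer` are hypothetical.

```python
# Assumed concept hierarchies, listed from finest to coarsest level.
HIERARCHIES = {
    "time": ["year"],
    "item": ["item_name", "brand"],
    "location": ["city", "province_or_state", "country"],
}


def _position(attr):
    """Return (dimension, level index) of an attribute."""
    for dim, levels in HIERARCHIES.items():
        if attr in levels:
            return dim, levels.index(attr)
    raise KeyError(attr)


def _covers(cuboid_attrs, attr):
    """True if the cuboid has the same dimension at the same or a finer level."""
    dim, level = _position(attr)
    return any(
        _position(a)[0] == dim and _position(a)[1] <= level for a in cuboid_attrs
    )


def can_answer(cuboid_attrs, cuboid_filter, query_attrs, query_filter):
    """Check whether a materialized cuboid can be used to answer a query."""
    # The cuboid must not have filtered away rows the query needs.
    for attr, value in cuboid_filter.items():
        if query_filter.get(attr) != value:
            return False
    # Every grouping attribute must be derivable by rolling up.
    if not all(_covers(cuboid_attrs, a) for a in query_attrs):
        return False
    # Every query condition must be pre-applied or still applicable.
    for attr, value in query_filter.items():
        if cuboid_filter.get(attr) != value and not _covers(cuboid_attrs, attr):
            return False
    return True


CANDIDATES = {
    1: ({"year", "item_name", "city"}, {}),
    2: ({"year", "brand", "country"}, {}),
    3: ({"year", "brand", "province_or_state"}, {}),
    4: ({"item_name", "province_or_state"}, {"year": 2004}),
}
QUERY_ATTRS = {"brand", "province_or_state"}
QUERY_FILTER = {"year": 2004}

usable = [
    n for n, (attrs, flt) in sorted(CANDIDATES.items())
    if can_answer(attrs, flt, QUERY_ATTRS, QUERY_FILTER)
]
```

Under these assumptions cuboid 2 is ruled out because country is coarser than province_or_state. Choosing among the remaining candidates depends on their sizes and available indexes, which this sketch does not model.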

25 What to Materialize?
- Store in warehouse results useful for common queries
- Example: total sales
[Figure: example cube labeled day 1 and day 2, with the value 129; the total sales aggregate is marked "materialize"]

26 Materialization Factors
- Type/frequency of queries
- Query response time
- Storage cost
- Update cost
Will study a concrete algorithm later.

27 Iceberg Cube
- Compute only the cuboid cells whose count or other aggregate satisfies a condition such as HAVING COUNT(*) >= minsup
- Motivation
  - Only a small portion of cube cells may be “above the water” in a sparse cube
  - Only calculate “interesting” cells: data above a certain threshold
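
The iceberg condition can be illustrated with a deliberately naive Python sketch: it counts every cell of every cuboid and then discards cells below the threshold. Dedicated iceberg algorithms prune much earlier. This version only shows what the result contains, and the function name `iceberg_cube` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, dimensions, minsup):
    """Return {cuboid: {cell: count}} keeping only cells with count >= minsup."""
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            counts = Counter(tuple(row[d] for d in dims) for row in rows)
            kept = {cell: n for cell, n in counts.items() if n >= minsup}
            if kept:
                result[dims] = kept
    return result
```

Since a cell's count can never exceed the count of any of its roll-ups, a cell that fails the threshold in a coarse cuboid cannot pass in a finer one, which is the property that smarter algorithms exploit for pruning.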

28 Challenges in MOLAP
- Storing large arrays for efficient access
  - Row-major, column-major
  - Chunking
  - Compressing sparse arrays
- Creating array data from data in tables
- Efficient techniques for cube computation
Topics are discussed in the paper for reading.

