Download presentation
Presentation is loading. Please wait.
1
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Physical Data Warehouse Design Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business
2
2 It’s all about trading storage for speed! Fundamentals Aggregates (Ch. 16, pp. 356 - 357) Indexes (Ch. 16, p. 357)
3
3 Fundamentals: the Storage Hierarchy CPU Cache Memory Disk Storage Capacity Small Large Access Speed Slow Fast 10 -8 second 10 -7 second 10 -2 second 500-1000 MIPS 512 KB 512 MB 512 GB
4
4 Fundamentals: the Storage Hierarchy CPU Memory Disk Disk Drive (I/O Channel) Cache Bus How long does it take to query sales by city? How large is the Sales Fact table? How long does it take to access the Sales Fact table?
5
5 Fundamentals How large is the fact table? e.g., 1 million records/day, 0.2KB/record 0.2 GB/day
6
6 Fundamentals How long does it take to access all the fact records? E.g., the small fact table is 1 Terabyte in size! –0.01s*10 12 =325 years LONG!!!!!!!!!!!!!
7
7 Fundamentals: the Storage Hierarchy CPU Memory Disk Disk Drive (I/O Channel) Cache Bus The logical unit of data transferred between disk and memory is block (e.g., 4k bytes)
8
8 Fundamentals How long does it take to access all the fact records? E.g., the small fact table is 1 Terabyte in size! –Number of blocks: 2.5 millions –Access time = 0.01s*2500000= < 7 hrs!!!
9
9 Aggregate In data warehouse design, we choose the gain of fact table to be the possible lowest level. Grain: orderline
10
10 Aggregate The reasons to choose the lowest level of fact: –(X) Analysts want to query on single record –(O) Analysts want to flexibly cut and group records.
11
11 Aggregate However, keeping the most detailed fact records could result in –huge-size fact table: TeraBytes?! (1 million records/day, 256 Bytes/record -> 0.2 GB/day) –slow query
12
12 Aggregate To keep s data warehouse flexible, fact tables need to store facts in their lowest levels of detail. To improve query performance, another type of fact table which stores pre-computed summaries of detailed facts helps. Reduced to a logical design solution
13
13 Aggregate An aggregate fact table is a fact table that summarizes base-level fact table records along one or several dimensions. An aggregate dimension table is a dimension table that summarizes base-level dimension table records. E.g., marketing managers check daily product sales by city --- aggregate by city in customer dimension
14
14 Aggregate Aggregate fact table Aggregate dimension table
15
15 Aggregate
16
16 Indexes How long does it take to find out the total purchase Amt by Tom Jones?
17
17 Indexes Customer table –1M records, each record 0.200 Kbytes long –Block is 4K size, block access time is 0.01s –Number of records/block: 4/0.2=20 –Number of blocks: 1M/20=50K Sequential search –Time: 25K*0.01s=250s=4min.
18
18 Indexes Binary search –Time: log(50K)*0.01s=16*0.01s=0.16s B+ tree index –Create index pn on customer(cname); –If each node (block) in B+ tree has 117 keys, then # of access to indexes: log 117 (1M)=3 (i.e.height of the tree) # of access to Customer Dimension: 1 Total time = 4*0.01 = 0.04s
19
19... (11 key values, 12 pointers)... B+-trees - P=12 Indexes to customer records ………. Indexes to indexes
20
20 Indexes How long does it take to find out the total sales of Desktop computers?
21
21 Performance Improvement Suppose there are only 4 product categories for 1M products Create a B+ tree index??? –Suppose the size of product category and block ID is 10 bytes –Size of index = 1M * 10 = 10 M bytes
22
22 Performance Improvement A bitmap index for an attribute A is a collection of bit vectors, one for each possible value of A. The vector for value v has 1 in position i if the ith record has v for attribute A.
23
23 Bitmaps Product record 1record 2record 3 A bitmap index for an attribute A is a collection of bit vectors, one for each possible value of A. The vector for value v has 1 in position i if the ith record has v for attribute A.
24
24 Performance Improvement Bitmap index is suitable for low cardinality attribute. –Cardinality(A) = # of possible values for A/#of records Compared with B+ tree index, bitmap index has the following advantages for low cardinality attributes –Storage space saving (1M*4/8=500K bytes) –Efficient for boolean operations CREATE BITMAP INDEX bitpc ON PRODUCT (PCNAME);
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.