Published by Jaheim Larrabee. Modified over 9 years ago.
1
V Storage Manager Shahram Ghandeharizadeh Computer Science Department University of Southern California
2
Simple Tests
Arbitrary.
Designed to stress your software and ensure its robustness: repeated creation of the same datazone name, deletion of records in iteration j when they were already deleted in iteration (j-1), etc.
Designed to be scalable. Your tests can be implemented with a variable A that captures the number of records in alpha (10,000). The values of all other important parameters can be a function of A, e.g., the number of records for beta or the number of iterations in Test 3.
Start with a small value for A, say 100. Once your tests are working, increase the value of A to 1,000. Once this works, increase A to 10,000.
Do not be surprised if your code breaks when you increase the value of A from 100 to 1,000. This is the reality of developing robust software: size matters!
3
Suggestion
Focus on a single-threaded version of your implementation.
Once all tests are running, extend it to analyze the impact of multi-threading. This may require revisiting your designs.
4
Heap versus Stack
A running program uses two kinds of dynamic memory: heap and stack.
Use of malloc and new allocates memory from the heap. The programmer is responsible for freeing this memory (with free or delete) and returning it to the heap.
Invocation of a method uses a stack. All local variables declared in a method are placed on the stack. When the method returns, its stack space is freed.
[Figure: process memory layout showing code, static data, heap, and stack.]
5
Heap versus Stack (Example)
In method Test 1, the character array named “payload” is declared on the stack when Test 1 is invoked.
Its memory is freed when Test 1 completes execution.
The programmer is NOT responsible for managing the memory assigned to payload because it is a local variable managed using the stack.
    void Test1() {
        char payload[10000];     // local array: lives on the stack
        vdt vptr;
        vptr.set_data(payload);
        ….
    }
[Figure: process memory layout showing code, static data, heap, and stack.]
6
Heap versus Stack (Example)
In method Test 1, the character array named “payload” is assigned memory from the heap (using new).
The variable payload is on the stack! The memory pointed to by “payload” is allocated from the heap.
The programmer is responsible for freeing this memory. Because it was allocated with new[], it must be freed with delete[] (the array form of delete).
    void Test1() {
        char *payload;              // the pointer itself lives on the stack
        vdt vptr;
        payload = new char[1000];   // the array it points to lives on the heap
        vptr.set_data(payload);
        ….
    }
[Figure: process memory layout showing code, static data, heap, and stack.]
7
Urban Legends about Heap
The following is FALSE: Memory allocated in method X can be freed only in method X. See the example as proof.
Cause: Debugging C/C++ programs is difficult. It is easy to corrupt memory if you are not careful! These errors are difficult to find. They are also stressful, resulting in beliefs that are not true: an urban legend is born.
How to avoid these kinds of conceptual traps? Write small programs to verify a belief that sounds too good to be true. It is simple and avoids digressions that waste your time and cause a lot of heartache.
    #include <cstdio>    // printf
    #include <cstring>   // memcpy
    #include <tchar.h>   // _tmain, _TCHAR (Microsoft-specific)
    // Vdt and GenMem are declared in the project's own headers.

    GenMem::GenMem(Vdt *v) {
        char *cptr;
        cptr = new char[10];          // allocated here, in the constructor
        v->set_data(cptr);
        memcpy(cptr, "Shahram", 7);   // copies 7 characters, no terminating '\0'
    }

    int _tmain(int argc, _TCHAR* argv[]) {
        Vdt vptr;
        char *cptr;
        GenMem *GM = new GenMem(&vptr);
        cptr = (char *) vptr.get_data();
        delete[] cptr;                // freed here, in a different method; new[] pairs with delete[]
        delete GM;                    // GM was also allocated with new
        printf("Exiting this simple test.");
        return 0;
    }
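The same point can be checked with a program that does not depend on the project's classes. The following is a minimal sketch, not the course's code: the names make_payload, release_payload, and allocate_here_free_there are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Allocates a 10-byte buffer on the heap and fills it. The caller owns the buffer.
char* make_payload() {
    char* p = new char[10];
    std::memcpy(p, "Shahram", 8);  // 7 characters plus the terminating '\0'
    return p;
}

// Frees a buffer obtained from make_payload(). Note the array form delete[].
void release_payload(char* p) {
    delete[] p;
}

// Returns true if memory allocated in one function is readable in another,
// and then frees it in yet another function.
bool allocate_here_free_there() {
    char* p = make_payload();
    assert(p != nullptr);
    bool ok = std::strcmp(p, "Shahram") == 0;
    release_payload(p);
    return ok;
}
```

Calling allocate_here_free_there() from main and running the program under a memory checker is one way to confirm that allocating in one method and freeing in another is legal.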
8
Variant Indexes by P. O’Neil and D. Quass Shahram Ghandeharizadeh Computer Science Department University of Southern California
9
Key Assumptions
A read-mostly database that is updated infrequently.
Complex indexes to speed up queries.
Focuses on physical designs to enhance performance.
10
Example Data Warehouse
McDonald's keeping track of different sandwich purchases.
SALES (Cid, Pid, Day, Amt, dollar_cost, Unit_sales)
PROD (Pid, Name, Size, Weight, Package_type)
TIME (Day, Week, Month, Year, Holiday, Weekday)
11
Example Data Warehouse
Key Observations: A handful of products, so the PROD table has tens of rows. Many millions of rows for the SALES table.
SALES (Cid, Pid, Day, Amt, dollar_cost, Unit_sales)
PROD (Pid, Name, Size, Weight, Package_type)
TIME (Day, Week, Month, Year, Holiday, Weekday)
12
A B+-Tree on the Pid of SALES
Assuming McDonald’s sells 12 different products.
SALES records, four per page, with their RIDs (page, slot):
(1,1) Joe, Big Mac, Lab day, …
(1,2) Mary, Fries, Pres day, …
(1,3) Harry, Big Mac, Pres day, …
(1,4) Henry, Big Mac, Pres day, …
(2,1) Jane, Happy Meal, Pres day, …
(2,2) Shideh, Happy Meal, Pres day, …
(2,3) Kam, Happy Meal, Pres day, …
(2,4) Bob, Big Mac, Pres day, …
B+-tree leaf page entry: (Big Mac, (1,1), (1,3), (1,4), (2,4), …)
What happens with a SALES table consisting of a million rows?
14
A B+-Tree on Major Holidays
A B+-tree index on different holidays of the SALES table.
SALES records, four per page, with their RIDs (page, slot):
(1,1) Joe, Big Mac, Lab day, …
(1,2) Mary, Fries, Pres day, …
(1,3) Harry, Big Mac, Pres day, …
(1,4) Henry, Big Mac, Pres day, …
(2,1) Jane, Happy Meal, Pres day, …
(2,2) Shideh, Happy Meal, Pres day, …
(2,3) Kam, Happy Meal, Pres day, …
(2,4) Bob, Big Mac, Pres day, …
B+-tree leaf page entry: (Pres day, (1,2), (1,3), (1,4), (2,1), …)
The whole leaf entry is a Value List; the sequence of RIDs that follows the key value is the RID List.
17
Conjunctive Queries
Count the number of Big Mac sales on “President’s Day”, assuming B+-trees on the product (pid) and day attributes of SALES.
With RID-Lists:
Get the Value-List for “Big Mac” using the B+-tree, obtain RID-List1.
Get the Value-List for “President’s Day” using the B+-tree, obtain RID-List2.
Compute the set intersection of RID-List1 and RID-List2.
Count the number of RIDs in the intersection set.
Is there a better way? Yes, use bitmaps and logical bit-wise operations.
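The RID-list steps can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the Rid alias and the function name count_conjunction are hypothetical, and both RID lists are assumed to be sorted, as they would be when read from a B+-tree leaf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Rid = std::pair<int, int>;  // (page number, slot number)

// Counts the records that appear in both sorted RID lists.
std::size_t count_conjunction(const std::vector<Rid>& list1,
                              const std::vector<Rid>& list2) {
    assert(std::is_sorted(list1.begin(), list1.end()));
    assert(std::is_sorted(list2.begin(), list2.end()));
    std::vector<Rid> both;
    std::set_intersection(list1.begin(), list1.end(),
                          list2.begin(), list2.end(),
                          std::back_inserter(both));
    return both.size();
}
```

With the eight sample SALES records from the B+-tree slides, the Big Mac list is (1,1), (1,3), (1,4), (2,4) and the Pres day list covers every record except (1,1), so the intersection holds three RIDs.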
18
Bitmap Indexes
Use a bitmap to represent the existence of a record with a certain attribute value.
Example: If a record has the indexed attribute value “Big Mac” then its corresponding entry in the bitmap is set to one. Otherwise, it is a zero.
19
A Bitmap
A Bitmap B is defined on T as a sequence of M bits.
For each row r with row number j that has the property P, we set bit j in B to one; all other bits are set to zero.
Assuming fixed-size disk pages that hold p records each, the RID of record j is (j/p, j%p): the page is j/p (integer division) and the slot number is j%p.
Example: (Pres Day, 0100001100111111110000011001…), where the leftmost bit corresponds to Record 0, the next bit to Record 1, the next to Record 2, and so on.
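A small sketch of both ideas on this slide: building a bitmap for a property and mapping a row number to its RID. The function names build_bitmap and rid_of are hypothetical, and row numbers, pages, and slots are all assumed to be 0-based, as in the formula (j/p, j%p).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Builds a bitmap over the rows of one column: bit j is true exactly when
// row j holds the given value (the "property P" of the slide).
std::vector<bool> build_bitmap(const std::vector<std::string>& column,
                               const std::string& value) {
    std::vector<bool> bits(column.size(), false);
    for (std::size_t j = 0; j < column.size(); ++j) {
        bits[j] = (column[j] == value);
    }
    return bits;
}

// Maps row number j to its RID (page, slot) when every page holds p records.
std::pair<std::size_t, std::size_t> rid_of(std::size_t j, std::size_t p) {
    assert(p > 0);
    return {j / p, j % p};
}
```

For example, with p = 4 records per page, row 9 lives on page 2 in slot 1.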
22
A B+-Tree on Major Holidays
A B+-tree index on different holidays of the SALES table.
SALES records in row order:
Joe, Big Mac, Lab day, …
Mary, Fries, Pres day, …
Harry, Big Mac, Pres day, …
Henry, Big Mac, Pres day, …
Jane, Happy Meal, Pres day, …
Shideh, Happy Meal, Pres day, …
Kam, Happy Meal, Pres day, …
Bob, Big Mac, Pres day, …
B+-tree leaf page entry, with a bitmap in place of the RID list: (Pres day, 01111111…)
23
Logical Bit-Wise Operations
Three key operators: AND, OR, NOT.
Assume a bitmap consisting of 4 bits:
0011 AND 0101 = 0001
0011 OR 0101 = 0111
NOT 0011 = 1100
This paper assumes bitmaps consisting of millions, if not billions, of bits. In Example 3.1, they assume a bitmap consisting of 100,000,000 bits, i.e., 12.5 megabytes.
A large bitmap is stored in a sequence of disk pages. Each disk page full of bits is termed a fragment.
Some bit positions may correspond to non-existent rows. An Existence Bitmap (EBM) has 1 bits in exactly those positions that correspond to existing rows.
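The 4-bit example can be reproduced with std::bitset. This is only a sketch: the helper names are hypothetical, and masking the result of NOT with the EBM is shown as one way to keep bits for non-existent rows at zero.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kBits = 4;
using Bitmap = std::bitset<kBits>;

Bitmap bitmap_and(const Bitmap& a, const Bitmap& b) { return a & b; }
Bitmap bitmap_or(const Bitmap& a, const Bitmap& b) { return a | b; }

// NOT flips every position, including positions of rows that do not exist,
// so the result is masked with the existence bitmap (EBM).
Bitmap bitmap_not(const Bitmap& a, const Bitmap& ebm) { return ~a & ebm; }

// Re-checks the three identities from the slide (all four rows exist).
void check_slide_example() {
    const Bitmap a("0011"), b("0101"), ebm("1111");
    assert(bitmap_and(a, b) == Bitmap("0001"));
    assert(bitmap_or(a, b) == Bitmap("0111"));
    assert(bitmap_not(a, ebm) == Bitmap("1100"));
}
```

A real bitmap index would apply the same operations word by word over fragments read from disk rather than over a fixed 4-bit value.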
24
Summary ANY QUESTIONS?
25
Range Predicate
SELECT target-list
FROM T
WHERE C-range
C-range = {C > c1, C >= c1, C = c1, C <= c1, C < c1, C between c1 and c2}
How to process with a bitmap index?
28
Conjunctive Queries
Count the number of Big Mac sales on “President’s Day”, assuming B+-trees on the product (pid) and day attributes of SALES.
With RID-Lists:
Get the Value-List for “Big Mac” using the B+-tree, obtain RID-List1.
Get the Value-List for “President’s Day” using the B+-tree, obtain RID-List2.
Compute the set intersection of RID-List1 and RID-List2.
Count the number of RIDs in the intersection set.
With bitmaps:
Get the Value-List for “Big Mac” using the B+-tree, obtain bit-map1.
Get the Value-List for “President’s Day” using the B+-tree, obtain bit-map2.
Recall that the Existence Bitmap (EBM) identifies the rows that exist.
Let RES = logical AND of bit-map1, bit-map2, and EBM.
Count the number of bits set to one in RES to identify how many Big Macs were sold on “President’s Day”.
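The bitmap variant reduces to a word-wise AND followed by a bit count. The sketch below assumes each bitmap is stored as an array of 64-bit words with row j in bit j of the array (least significant bit first); the names Words, popcount64, and count_conjunction_bitmaps are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Words = std::vector<std::uint64_t>;

// Counts the one bits in a word by repeatedly clearing the lowest set bit.
int popcount64(std::uint64_t w) {
    int n = 0;
    while (w != 0) {
        w &= w - 1;
        ++n;
    }
    return n;
}

// RES = bitmap1 AND bitmap2 AND EBM, computed word by word; returns the
// number of one bits in RES without materializing RES.
std::size_t count_conjunction_bitmaps(const Words& bitmap1,
                                      const Words& bitmap2,
                                      const Words& ebm) {
    assert(bitmap1.size() == bitmap2.size() && bitmap2.size() == ebm.size());
    std::size_t total = 0;
    for (std::size_t i = 0; i < ebm.size(); ++i) {
        total += popcount64(bitmap1[i] & bitmap2[i] & ebm[i]);
    }
    return total;
}
```

For the eight sample records, Big Mac sets bits 0, 2, 3, and 7 (0x8D), Pres day sets bits 1 through 7 (0xFE), and the EBM is 0xFF, so the AND leaves bits 2, 3, and 7.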
29
Example 2.1
30
Projection Index
Reminiscent of vertical partitioning.
Once the qualifying records are found, the projection index enables the system to find the amt attribute value of each record with a few disk I/Os.
[Figure: a SALES table with columns cid, pid, holiday, and amt, shown next to a projection index that stores only the amt values in the same row order.]
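A projection index can be pictured as the amt column stored on its own, in row order, so that row number j indexes directly into it. The sketch below is an in-memory illustration only; the struct ProjectionIndex and the function sum_amt are hypothetical names, and a real index would live on disk pages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The projected column: amt[j] is the Amt value of row j.
struct ProjectionIndex {
    std::vector<long> amt;
};

// Sums Amt over the qualifying rows by direct positional lookup,
// without touching the full SALES records.
long sum_amt(const ProjectionIndex& idx,
             const std::vector<std::size_t>& qualifying_rows) {
    long total = 0;
    for (std::size_t j : qualifying_rows) {
        assert(j < idx.amt.size());
        total += idx.amt[j];
    }
    return total;
}
```

Because the projected values are much narrower than a full record, many more of them fit on a page, which is where the saving in disk I/Os comes from.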
31
Projection Index (Definition)
Page 41, first paragraph of Section 2.2
32
Projection Index (Example Usage)
Page 41, middle of left hand column.
33
Bit-Sliced Indexes: Motivation
Assume the “Amt” values are in dollars and are as follows, shown with their binary representation and with the records numbered as before:
Record: 0, 1, 2, 3, 4, 5, 6
Amt: 1, 3, 5, 7, 3, 3, 1
Binary: 001, 011, 101, 111, 011, 011, 001
Construct a Bit-Sliced index, one bitmap per bit position, with the bit for Record 0 on the left:
Bit 0, 1111111
Bit 1, 0101110
Bit 2, 0011000
37
Bit-Sliced Indexes: Motivation
To compute the sum of all records using the existence bitmap Bnn (1111111):
Record: 0, 1, 2, 3, 4, 5, 6
Amt: 1, 3, 5, 7, 3, 3, 1
Binary: 001, 011, 101, 111, 011, 011, 001
Bit 0, 1111111
Bit 1, 0101110
Bit 2, 0011000
Sum = 1 * (7 records with bit 0 set to 1) + 2 * (4 records with bit 1 set to 1) + 4 * (2 records with bit 2 set to 1)
= (1 * 7) + (2 * 4) + (4 * 2)
= 23
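The computation above generalizes to any set of selected rows: weight the count of one bits in each slice, restricted to the selected rows, by the slice's power of two. The sketch below assumes the seven-row example; sum_bit_sliced is a hypothetical name. Bitmaps are written as on the slide (Record 0 on the left); because every bitmap uses the same convention, AND and count are unaffected by the ordering.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kRows = 7;
using Bitmap = std::bitset<kRows>;

// SUM = sum over bit positions i of 2^i * count(slice_i AND selected),
// where slices[i] is the bitmap for bit position i.
unsigned long sum_bit_sliced(const std::vector<Bitmap>& slices,
                             const Bitmap& selected) {
    assert(slices.size() < 32);  // keeps the shift below well defined
    unsigned long total = 0;
    for (std::size_t i = 0; i < slices.size(); ++i) {
        total += (1UL << i) * (slices[i] & selected).count();
    }
    return total;
}
```

Passing the existence bitmap 1111111 as the selection reproduces the total of 23; passing a narrower bitmap sums only the rows it marks.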
40
Bit-Sliced Indexes: Definition
Interpret the value of the “Amt” column as an integer number of pennies, represented as a binary number with N+1 bits. Define one bitmap per bit position, as in the motivating example.
[Figure: the formal definition from the paper; the image is not reproduced in this transcript.]
Why maintain Bn?
The result of a scalar operation such as SUM that involves a null will itself be a null. For an example, see: http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html
43
Bit-Sliced Index
20 bitmaps for the “Amt” column represent quantities up to 2^20 - 1 pennies, i.e., $10,485.75.
If we assume normal sales range up to $100.00, and all values are equally likely to occur, a Value-List index would have nearly 10,000 different values. A Bitmap representation would lose its effectiveness. However, Bit-sliced indexes continue to perform well.
44
Example with Value-List Index
Assume the SALES table has 100 million rows. Each row is 200 bytes in length. A disk page is 4 Kbytes, holding 20 rows.
Query:
SELECT SUM(AMT) FROM SALES WHERE condition
Bitmap Bf = the Foundset (the rows that satisfy the condition)
Bitmap Bv for each value v
Bnn = Existence bitmap
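One way to evaluate this query with a Value-List index is to add v * count(Bv AND Bf) for every distinct value v. The sketch below assumes that approach on the seven-row example used in the motivation slides; sum_value_list and the in-memory std::map are hypothetical stand-ins for the index.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>

constexpr std::size_t kRows = 7;
using Bitmap = std::bitset<kRows>;

// SUM = sum over distinct values v of v * count(Bv AND Bf).
unsigned long sum_value_list(const std::map<unsigned long, Bitmap>& value_bitmaps,
                             const Bitmap& foundset) {
    unsigned long total = 0;
    Bitmap seen;  // rows already covered by an earlier value
    for (const auto& entry : value_bitmaps) {
        // Each row holds exactly one value, so the bitmaps must not overlap.
        assert((seen & entry.second).none());
        seen |= entry.second;
        total += entry.first * (entry.second & foundset).count();
    }
    return total;
}
```

The loop runs once per distinct value, which is why this approach degrades when the column has thousands of distinct values.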
45
Example with Bit-Sliced Indexes
Query:
SELECT SUM(AMT) FROM SALES WHERE condition
Bitmap Bf = the Foundset (the rows that satisfy the condition)
Bitmap Bv for each value v
Bnn = Existence bitmap
20 bit slices:
Bit 0, 01010101010…
Bit 1, 10101011111…
…
Bit 19, 0000000001…
46
Other Aggregate Functions
Ignore MEDIAN & Column-Product.
SELECT AGG(C) FROM T WHERE condition
AGG(C) is one of COUNT, SUM, AVG, MIN, MAX
47
Range Queries
SELECT target-list
FROM T
WHERE C-range
C-range = {C > c1, C >= c1, C = c1, C <= c1, C < c1, C between c1 and c2}
48
Bit-Sliced Indexes
Assume c1 = 3, {011}.
Record: 0, 1, 2, 3, 4, 5, 6
Amt: 1, 3, 5, 7, 3, 3, 1
Binary: 001, 011, 101, 111, 011, 011, 001
Bit 0, 1111111
Bit 1, 0101110
Bit 2, 0011000
Initially: BGT = BLT = 0000000, BEQ = 1111111
Iteration 1 on Bit 2. Bit 2 is off in constant c1, so the Else branch applies:
BGT = BGT | (BEQ & B2) = 0000000 | (1111111 & 0011000) = 0011000
BEQ = BEQ & NOT(B2) = 1111111 & ~(0011000) = 1111111 & 1100111 = 1100111
After iteration 1: BLT = 0000000, BGT = 0011000, BEQ = 1100111
49
Bit-Sliced Indexes
Assume c1 = 3, {011}.
Bit 0, 1111111
Bit 1, 0101110
Bit 2, 0011000
Before: BLT = 0000000, BGT = 0011000, BEQ = 1100111
Iteration 2 on Bit 1. Bit 1 is on in constant c1, so:
BLT = BLT | (BEQ & NOT(B1)) = 0000000 | (1100111 & ~(0101110)) = 0000000 | (1100111 & 1010001) = 1000001
BEQ = BEQ & B1 = 1100111 & 0101110 = 0100110
After iteration 2: BLT = 1000001, BGT = 0011000, BEQ = 0100110
50
Bit-Sliced Indexes
Assume c1 = 3, {011}.
Bit 0, 1111111
Bit 1, 0101110
Bit 2, 0011000
Before: BLT = 1000001, BGT = 0011000, BEQ = 0100110
Iteration 3 on Bit 0. Bit 0 is on in constant c1, so:
BLT = BLT | (BEQ & NOT(B0)) = 1000001 | (0100110 & ~(1111111)) = 1000001 | (0100110 & 0000000) = 1000001
BEQ = BEQ & B0 = 0100110 & 1111111 = 0100110
After iteration 3: BLT = 1000001 (records 0 and 6, value 1), BGT = 0011000 (records 2 and 3, values 5 and 7), BEQ = 0100110 (records 1, 4, and 5, value 3)
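The three iterations traced on these slides follow one loop over the bit slices, from the most significant bit down. The sketch below is written from that trace, under the assumption of the seven-row example; RangeResult and compare_with_constant are hypothetical names. Bitmaps are written as on the slides (Record 0 on the left), which does not affect the bit-wise operations.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kRows = 7;
using Bitmap = std::bitset<kRows>;

struct RangeResult {
    Bitmap lt, gt, eq;  // rows with C < c1, C > c1, C = c1
};

// slices[i] is the bitmap for bit position i; bnn marks the rows to consider.
RangeResult compare_with_constant(const std::vector<Bitmap>& slices,
                                  const Bitmap& bnn, unsigned long c1) {
    assert(!slices.empty() && slices.size() < 32);
    assert((c1 >> slices.size()) == 0);  // c1 must fit in the available slices
    RangeResult r;                       // lt and gt start empty
    r.eq = bnn;
    for (std::size_t k = slices.size(); k-- > 0;) {
        const Bitmap& bi = slices[k];
        if ((c1 >> k) & 1UL) {
            r.lt |= r.eq & ~bi;  // rows equal so far with a 0 here are smaller
            r.eq &= bi;
        } else {
            r.gt |= r.eq & bi;   // rows equal so far with a 1 here are larger
            r.eq &= ~bi;
        }
    }
    return r;
}
```

With the slices from the slides and c1 = 3, the function returns the same BLT, BGT, and BEQ as the hand trace.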
51
Bit-Sliced Indexes & Range Queries
Note that the predicates <, <=, =, >=, and > are all computed using BEQ, BLT and BGT. For example, C <= c1 is BLT OR BEQ, and C >= c1 is BGT OR BEQ.
52
Range Queries
53
Variant Indexes
You are not responsible for Section 5, OLAP style queries.