Published by Jayson Blake. Modified over 9 years ago.
1
Improved Query Performance With Variant Indexes
Patrick O’Neil, Dallan Quass
Presented by Bo Han
2
Motivation
1. Speed up queries on data warehouses
   - Data warehouses are large and read-mostly
   - Queries almost always aggregate, filter, and group the data
3
Motivation
2. The first rigorous examination of variant indexes in the literature
   - Advantages over traditional Value-List indexes for certain classes of queries
   - More than one type of index can be available on a column
4
Motivation
3. Introducing a new indexing approach to support OLAP-type queries
   - Datacube
   - Multi-dimensional queries
   - Depends on summary tables
5
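Value-List Index (B+ tree)
Problem: a key value can have a large number of associated RIDs!
[Figure: a B+ tree leaf node holding the key values Brighton, Downtown, Mianus; each key's RIDs point to table rows such as (A212, Brighton, 750), (A101, Downtown, 500), (A110, Downtown, 600), ...]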
6
Bitmap Indexes
- A bitmap for a value: an array of bits. The ith bit is set to 1 if the ith record has the value
- A bitmap index: consists of one bitmap for each value that the attribute can take
- A bitmap is an alternate method of representing the RID-lists in a Value-List index (suited to low-cardinality columns)

Product table:
Pid      Brand  P_type  Size
0000001  Dell   P1      14
0000002  HP     P2      13
0000003  Sony   P1      15
0000004  Dell   P1      14
0000005  HP     P2      14
0000006  IBM    P3      12
0000007  HP     P2      12

Bitmap for Size:
12  0000011
13  0100000
14  1001100
15  0010000

Bitmap for Brand:
Dell  1001000
HP    0100101
Sony  0010000
IBM   0000010
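The bitmaps on this slide can be rebuilt mechanically from the Product table. The following is a minimal sketch, not the paper's implementation; the function name `build_bitmap_index` and the bit-string representation are assumptions chosen for readability.

```python
from collections import defaultdict

# Product table from the slide: (Pid, Brand, P_type, Size)
PRODUCT = [
    ("0000001", "Dell", "P1", 14),
    ("0000002", "HP",   "P2", 13),
    ("0000003", "Sony", "P1", 15),
    ("0000004", "Dell", "P1", 14),
    ("0000005", "HP",   "P2", 14),
    ("0000006", "IBM",  "P3", 12),
    ("0000007", "HP",   "P2", 12),
]

def build_bitmap_index(rows, col):
    """Map each distinct value in column `col` to a bit string.

    Character i of the string is '1' if row i holds that value.
    (Hypothetical helper, for illustration only.)
    """
    hits = defaultdict(set)
    for i, row in enumerate(rows):
        hits[row[col]].add(i)
    return {
        value: "".join("1" if i in where else "0" for i in range(len(rows)))
        for value, where in hits.items()
    }

brand_index = build_bitmap_index(PRODUCT, 1)
size_index = build_bitmap_index(PRODUCT, 3)
print(brand_index["HP"])  # 0100101
print(size_index[14])     # 1001100
```

A real index would pack the bits into machine words rather than characters; strings are used here only so the output can be compared with the slide by eye.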
7
Bitmap Indexes
1. More space-efficient than RID-lists in a Value-List index
   - No compression: |RID| = 32 bits, #rows = n, #distinct values = m. The bitmaps take m*n bits and the RID-lists take 32*n bits, so if m < 32 then m*n < 32*n
   - Compression: run-length encoding
2. More CPU-efficient for many functions
   - Boolean operations
     ex1: Select Brand From Product Where Brand='HP' and Size=13 (AND of the HP and Size=13 bitmaps)
     ex2: Select Pid From Product Where Size>12 and Size<15 (OR of the Size=13 and Size=14 bitmaps)
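The two example queries reduce to one bitwise operation each. Here is a small sketch using the Brand and Size bitmaps of the 7-row Product table from the previous slide; storing bitmaps as Python integers and the helper names `to_int` and `rows_of` are assumptions made for illustration.

```python
N_ROWS = 7
# Bitmaps copied from the Product example; leftmost bit is row 1.
BRAND = {"Dell": "1001000", "HP": "0100101", "Sony": "0010000", "IBM": "0000010"}
SIZE = {12: "0000011", 13: "0100000", 14: "1001100", 15: "0010000"}

def to_int(bits):
    """Parse a bit string into an integer so that & and | act on all rows at once."""
    return int(bits, 2)

def rows_of(bitmap):
    """Return the 1-based row numbers whose bit is set."""
    bits = format(bitmap, f"0{N_ROWS}b")
    return [i + 1 for i, b in enumerate(bits) if b == "1"]

# ex1: Brand = 'HP' AND Size = 13
ex1 = to_int(BRAND["HP"]) & to_int(SIZE[13])
# ex2: Size > 12 AND Size < 15, i.e. Size = 13 OR Size = 14
ex2 = to_int(SIZE[13]) | to_int(SIZE[14])

print(rows_of(ex1))  # [2]
print(rows_of(ex2))  # [1, 2, 4, 5]
```

Each operation touches one machine word per 32 or 64 rows, which is where the CPU advantage over merging RID-lists comes from.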
8
Bitmap Indexes
2. More CPU-efficient for many functions (continued)
   - Count: Select count(*) From Product Where Brand='Dell' and Size>14
3. Each individual bitmap is small, and frequently used ones can be cached in memory
4. Available in most major commercial DBMSs
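The count query needs no row access at all: OR the Size bitmaps above the threshold, AND with the Brand bitmap, and count the 1 bits. A minimal sketch over the Product bitmaps shown earlier; `popcount` and `count_brand_size_gt` are hypothetical helper names.

```python
# Bitmaps from the 7-row Product example; leftmost bit is row 1.
BRAND = {"Dell": "1001000", "HP": "0100101", "Sony": "0010000", "IBM": "0000010"}
SIZE = {12: "0000011", 13: "0100000", 14: "1001100", 15: "0010000"}

def popcount(bitmap):
    """Number of 1 bits in an integer bitmap."""
    return bin(bitmap).count("1")

def count_brand_size_gt(brand, min_size):
    """count(*) for Brand = brand AND Size > min_size, using bitmaps only."""
    size_gt = 0
    for size, bits in SIZE.items():
        if size > min_size:
            size_gt |= int(bits, 2)  # OR together every qualifying Size bitmap
    return popcount(int(BRAND[brand], 2) & size_gt)

print(count_brand_size_gt("Dell", 14))  # 0: no Dell product has Size > 14
print(count_brand_size_gt("Dell", 13))  # 2
```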
9
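Projection Index
A projection index on a column stores a copy of all of that column's values, in row order, for lookup by ordinal row number.
[Figure: a table with columns Col1, Col2, Col3, Col4; the Col2 values v1, v2, ..., vk are duplicated, in the same order, into the projection index for Col2]
- Easy to locate: N = 1000*p + s (p: page#, s: slot#)
- Few disk I/Os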
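The formula N = 1000*p + s can be inverted with a single division. A short sketch, assuming zero-based row, page, and slot numbers and 1000 values per page as on the slide; the list-of-lists page model and the function names are illustrative only.

```python
VALUES_PER_PAGE = 1000  # from N = 1000*p + s

def build_projection_index(column_values):
    """Split a column's values, in row order, into fixed-size pages."""
    return [
        column_values[i:i + VALUES_PER_PAGE]
        for i in range(0, len(column_values), VALUES_PER_PAGE)
    ]

def locate(n):
    """Return (page#, slot#) for ordinal row number n."""
    return divmod(n, VALUES_PER_PAGE)

def lookup(pages, n):
    """Fetch the column value of row n with a single page access."""
    p, s = locate(n)
    return pages[p][s]

column = [3 * i for i in range(2500)]   # made-up column values
pages = build_projection_index(column)
print(locate(2345))         # (2, 345)
print(lookup(pages, 2345))  # 7035
```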
10
Bit-Sliced Index
A set of bitmap slices which are orthogonal to the data held in a projection index (i.e. a bitwise vertical partition).

Col2  B5  B4  B3  B2  B1  B0
20    0   1   0   1   0   0
52    1   1   0   1   0   0
20    0   1   0   1   0   0
62    1   1   1   1   1   0
10    0   0   1   0   1   0
34    1   0   0   0   1   0
1     0   0   0   0   0   1
49    1   1   0   0   0   1

Each column B0 ... B5 is one bit-slice.
Bnn: bitmap representing the set of non-null values in the indexed column
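The slices follow directly from the binary form of the Col2 values. The sketch below derives them and checks that the column can be reassembled; `bit_slices` and `reassemble` are hypothetical names, and non-negative integer values are assumed.

```python
COL2 = [20, 52, 20, 62, 10, 34, 1, 49]  # column values from the slide

def bit_slices(values, n_slices):
    """Slice i is a bit string whose character r is bit i of values[r]."""
    return [
        "".join(str((v >> i) & 1) for v in values)
        for i in range(n_slices)
    ]

def reassemble(slices):
    """Rebuild the column values from the slices (inverse of bit_slices)."""
    n_rows = len(slices[0])
    return [
        sum(int(s[r]) << i for i, s in enumerate(slices))
        for r in range(n_rows)
    ]

slices = bit_slices(COL2, 6)
for i, s in enumerate(slices):
    print(f"B{i}: {s}")
```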
11
Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Select SUM(dollar_sales) From Sales Where condition
Analyze the disk page I/O cost
Plan 1: Direct access to the rows to calculate the Sum
   100 million rows, Len(row) = 200 B, |page| = 4K, 20 rows/page, |Foundset| = 2 million rows
Plan 2: Calculating the Sum through a Projection Index
   Len(dollar_sales) = 4 B, 1000 rows/page, 100,000 pages
12
Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Plan 3: Calculating the Sum through a Value-List (Bitmap) Index

   if (COUNT(Bf AND Bnn) == 0)
       Return null;
   SUM = 0.0;
   for each non-null value v in the index for C {
       Designate the set of rows with value v as Bv
       SUM += v * COUNT(Bf AND Bv);
   }
   Return SUM;

Bf: 100,000,000 bits = 12,500,000 B, 3125 pages
Bv: 100,000,000 RIDs of 4 bytes each, 100,000 pages
Total: 103,125 pages
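The Plan 3 pseudocode translates almost line for line into runnable form. This is a sketch under stated assumptions: bitmaps are Python integers with bit r standing for row r, `None` marks a null, and the sample column and foundset are invented for illustration.

```python
def popcount(bitmap):
    """COUNT() of the pseudocode: number of 1 bits."""
    return bin(bitmap).count("1")

def build_value_bitmaps(column):
    """Value-List (bitmap) index: one bitmap Bv per non-null value v."""
    bitmaps = {}
    for row, v in enumerate(column):
        if v is not None:
            bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    return bitmaps

def sum_via_value_list(bitmaps, bf):
    """SUM over the foundset bitmap bf, following Plan 3."""
    bnn = 0
    for bv in bitmaps.values():
        bnn |= bv                      # Bnn: rows with a non-null value
    if popcount(bf & bnn) == 0:
        return None                    # no non-null rows in the foundset
    total = 0
    for v, bv in bitmaps.items():
        total += v * popcount(bf & bv)
    return total

column = [20, 52, 20, 62, 10, 34, 1, 49]          # sample values
bf = sum(1 << row for row in (0, 2, 3, 7))        # foundset: rows 0, 2, 3, 7
print(sum_via_value_list(build_value_bitmaps(column), bf))  # 151
```

The loop runs once per distinct value, so the cost grows with the column's cardinality, which is why this plan suits low-cardinality columns best.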
13
Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Plan 4: Calculating the SUM through a Bit-Sliced Index

   if (COUNT(Bf AND Bnn) == 0)
       Return null;
   SUM = 0.0;
   for i = 0 to N
       SUM += 2^i * COUNT(Bi AND Bf);
   Return SUM;

Bf: 100,000,000 bits = 12,500,000 B, 3125 pages
2 million rows: 21 bitmaps
Total: 22 * 3125 = 68,750 pages (the 21 bit-slices plus Bf)
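Plan 4 weights the count of each slice by its power of two. A minimal runnable sketch, assuming non-negative integer values, integer bitmaps with bit r standing for row r, and `None` for nulls; the sample data is invented.

```python
def popcount(bitmap):
    """COUNT() of the pseudocode: number of 1 bits."""
    return bin(bitmap).count("1")

def build_bit_slices(column, n_slices):
    """Return (slices, bnn): slices[i] has bit r set if bit i of row r's value is 1."""
    slices = [0] * n_slices
    bnn = 0
    for row, v in enumerate(column):
        if v is None:
            continue
        bnn |= 1 << row
        for i in range(n_slices):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices, bnn

def sum_via_bit_slices(slices, bnn, bf):
    """SUM over the foundset bitmap bf, following Plan 4."""
    if popcount(bf & bnn) == 0:
        return None
    return sum((1 << i) * popcount(bi & bf) for i, bi in enumerate(slices))

column = [20, 52, 20, 62, 10, 34, 1, 49]      # sample values, all below 2**6
bf = sum(1 << row for row in (0, 2, 3, 7))    # foundset: rows 0, 2, 3, 7
slices, bnn = build_bit_slices(column, 6)
print(sum_via_bit_slices(slices, bnn, bf))    # 151
```

Unlike Plan 3, the loop length depends on the number of bits in the largest value, not on the number of distinct values.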
14
Comparison of Indexes (evaluating Single-Column Sum Aggregates)

Method            I/O      CPU contribution
Add from Rows     1,341K   I/O + 2M*(25 ins)
Projection index  100K     I/O + 2M*(10 ins)
Value-List index  103K     I/O + 100M*(10 ins)
Bit-Sliced index  69K      I/O + 197M*(1 ins)
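The index rows of the I/O column can be re-derived from the figures on the preceding slides. The arithmetic below is a sketch that assumes 4,000 usable bytes per page, which is what the slides' round numbers imply (12,500,000 B in 3125 pages); the 1,341K figure for direct row access is not re-derived here.

```python
ROWS = 100_000_000   # rows in the Sales table
PAGE_BYTES = 4_000   # assumption: 12,500,000 B / 3125 pages
VALUE_BYTES = 4      # Len(dollar_sales), also the size of one RID
N_SLICES = 21        # bit-slice bitmaps in Plan 4

# Plan 2: one 4-byte value per row
projection_pages = ROWS * VALUE_BYTES // PAGE_BYTES

# One bitmap (Bf or a bit-slice): one bit per row
bitmap_pages = ROWS // 8 // PAGE_BYTES

# Plan 3: Bf plus 100M RIDs of 4 bytes each
value_list_pages = bitmap_pages + ROWS * VALUE_BYTES // PAGE_BYTES

# Plan 4: the bit-slices plus Bf
bit_sliced_pages = (N_SLICES + 1) * bitmap_pages

print(projection_pages, value_list_pages, bit_sliced_pages)
# 100000 103125 68750
```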
15
Evaluating Aggregate Functions

Aggregate        Value-List Index  Projection Index  Bit-Sliced Index
COUNT            Not needed        Not needed        Not needed
SUM              Not bad           Good              Best
AVG              Not bad           Good              Best
MAX/MIN          Best              Slow              Slow
MEDIAN, N-Tile   Usually Best      Not Useful        Sometimes Best
Column-Product   Very Slow         Best              Very Slow
16
Range Evaluation Performance

Range Evaluation  Value-List Index  Projection Index  Bit-Sliced Index
Narrow Range      Best              Good              Good
Wide Range        Not Bad           Good              Best
17
Evaluating OLAP-style Queries
- The OLAP approach precalculates the results of some grouped queries and stores them in summary tables
   - Is the expected set of queries known in advance?
   - The size of the data in summary tables grows as the product of the number of values in the independent dimensions (space requirement?)
- How to speed up Join and Group By? Join Indexes and Bitmap Join Indexes
18
Join Indexes
A join index: an index on one table that involves a column value from a different table through a commonly encountered join.

Product table:
Pid      Brand  P_type  Size
0000001  Dell   P1      14
0000002  HP     P2      13
0000003  Sony   P1      15
0000004  Dell   P1      14
0000005  HP     P2      14
0000006  IBM    P3      12
0000007  HP     P2      12

Sales table:
Cid   Pid      Dollar_sales  Unit
0100  0000001  1200          1
0101  0000003  2600          2
0110  0000002  1600          1
0111  0000001  1200          1

Join index on Sales by Product.Size:
Cid   Size
0100  14
0111  14
0110  13
0101  15
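The (Cid, Size) pairs are what a Sales-Product join on Pid would produce, computed once and stored. A small sketch that derives them from the two tables on this slide; keying the result by Cid in a dictionary is an illustrative choice, not the storage layout of any particular DBMS.

```python
# Product.Pid -> Product.Size, from the Product table
PRODUCT_SIZE = {
    "0000001": 14, "0000002": 13, "0000003": 15, "0000004": 14,
    "0000005": 14, "0000006": 12, "0000007": 12,
}

# Sales rows: (Cid, Pid, Dollar_sales, Unit)
SALES = [
    ("0100", "0000001", 1200, 1),
    ("0101", "0000003", 2600, 2),
    ("0110", "0000002", 1600, 1),
    ("0111", "0000001", 1200, 1),
]

def build_join_index(sales, product_size):
    """Precompute the joined Size for every Sales row, keyed by Cid."""
    return {cid: product_size[pid] for cid, pid, _sales, _unit in sales}

join_index = build_join_index(SALES, PRODUCT_SIZE)
print(join_index)
# {'0100': 14, '0101': 15, '0110': 13, '0111': 14}
```

With the index in place, a predicate on Size can be answered from Sales alone, without touching the Product table at query time.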
19
Bitmap Join Index
A Bitmap Join Index spans multiple tables and improves query performance between the joined tables.

Sales table, with one bitmap column per Product.Brand value (Product table as on slide 18):
Cid   Pid      Dollar_sales  Unit  Dell  HP  Sony  IBM
0100  0000001  1200          1     1     0   0     0
0101  0000003  2600          2     0     0   1     0
0110  0000002  1600          1     0     1   0     0
0111  0000001  1200          1     1     0   0     0
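Reading the bitmap columns top to bottom gives one bitmap per brand over the Sales rows. The sketch below builds them from the Pid column of Sales and a Pid-to-Brand lookup; `bitmap_join_index` is a hypothetical helper name.

```python
# Product.Pid -> Product.Brand, from the Product table
PRODUCT_BRAND = {
    "0000001": "Dell", "0000002": "HP", "0000003": "Sony", "0000004": "Dell",
    "0000005": "HP", "0000006": "IBM", "0000007": "HP",
}

# Pid column of the four Sales rows, in row order
SALES_PIDS = ["0000001", "0000003", "0000002", "0000001"]

def bitmap_join_index(fact_keys, dim_lookup):
    """One bit string per dimension value; character r refers to fact row r."""
    values = sorted(set(dim_lookup.values()))
    return {
        v: "".join("1" if dim_lookup[k] == v else "0" for k in fact_keys)
        for v in values
    }

brand_bitmaps = bitmap_join_index(SALES_PIDS, PRODUCT_BRAND)
print(brand_bitmaps)
# {'Dell': '1001', 'HP': '0010', 'IBM': '0000', 'Sony': '0100'}
```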
20
Bitmap Join Index
(Product and Sales tables as on slides 18 and 19.)

Customer table:
Cid   State
0100  CA
0101  NY
0110  CA
0111  PA

Select Sum(Dollar_sales) From Sales S Natural Join Product P Natural Join Customer C Where P.Brand='Dell' AND C.State='PA'

Bitmaps over the Sales rows:
Brand = Dell   1 0 0 1
State = PA     0 0 0 1
AND            0 0 0 1
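With both bitmap join indexes available, the query is answered by one AND followed by a sum over the surviving Sales rows. A minimal sketch using the data from these slides; the CA and NY bitmaps are derived from the Customer table, and the function name is hypothetical.

```python
# Sales rows: (Cid, Pid, Dollar_sales, Unit)
SALES = [
    ("0100", "0000001", 1200, 1),
    ("0101", "0000003", 2600, 2),
    ("0110", "0000002", 1600, 1),
    ("0111", "0000001", 1200, 1),
]

# Bitmap join indexes over the Sales rows (character r refers to Sales row r)
BRAND_BITMAP = {"Dell": "1001", "HP": "0010", "Sony": "0100", "IBM": "0000"}
STATE_BITMAP = {"CA": "1010", "NY": "0100", "PA": "0001"}

def sum_dollar_sales(brand, state):
    """Sum(Dollar_sales) for P.Brand = brand AND C.State = state, with no join."""
    found = [
        b == "1" and s == "1"
        for b, s in zip(BRAND_BITMAP[brand], STATE_BITMAP[state])
    ]
    return sum(row[2] for row, hit in zip(SALES, found) if hit)

print(sum_dollar_sales("Dell", "PA"))  # 1200
```

Note that this sketch returns 0 for an empty foundset, whereas SQL's SUM would return NULL.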
21
Calculating Groupset Aggregates
Select Sum(F.A) From F, D1, D2, D3 Where condition Group by D1.d1, D2.d2, D3.d3
- Use Value-List indexes to determine each Groupset (F.di = Di.di, without a join!)
- Use a Projection index on F.A to get SUM(F.A)
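The two steps can be sketched as follows: intersect one bitmap per dimension value with the foundset to get each Groupset, then read F.A for the surviving rows from a projection index. This is an illustration under assumptions, not the paper's algorithm as written: bitmaps are Python integers with bit r standing for row r, only two dimensions are used, and all data is invented.

```python
from itertools import product

def value_bitmaps(column):
    """Value-List (bitmap) index on one dimension column of the fact table F."""
    bitmaps = {}
    for row, v in enumerate(column):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    return bitmaps

def groupset_sums(dim_columns, measure, bf):
    """SUM(measure) per group, for the rows selected by foundset bitmap bf."""
    indexes = [value_bitmaps(col) for col in dim_columns]
    result = {}
    for combo in product(*(idx.items() for idx in indexes)):
        bg = bf
        for _value, bitmap in combo:
            bg &= bitmap                      # Groupset = AND of the bitmaps
        if bg:
            key = tuple(value for value, _bitmap in combo)
            # `measure` plays the role of the projection index on F.A
            result[key] = sum(
                measure[r] for r in range(len(measure)) if (bg >> r) & 1
            )
    return result

d1 = ["a", "a", "b", "b"]      # F.d1 (made-up data)
d2 = ["x", "y", "x", "x"]      # F.d2
fa = [10, 20, 30, 40]          # F.A
print(groupset_sums([d1, d2], fa, 0b1111))
# {('a', 'x'): 10, ('a', 'y'): 20, ('b', 'x'): 70}
```

Enumerating every combination of dimension values is only practical when the dimensions are small; empty Groupsets are skipped as soon as the AND yields zero.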
22
Improved Grouping Efficiency
Problem: Groupsets and rows are randomly placed on disk.
- Segmentation: partition the rows in F into segments. Query evaluation: one segment at a time.
- Clustering: cluster the fact table F

D1  =d1-1   111111111111111111111110000000000000000000…
    =d1-2   000000000000000000000001111111111111111000…
    ……
D2  =d2-1   111111000000000000000001111110000000000000…
    =d2-2   000000111111100000000000000001111111100000…
    ……
D3  =d3-1   110000110000000000000001111110000000000000…
    =d3-2   001100001100000000000000000001111111100000…
    ……
    =d3-n3  000011000001100000000000000001111111100000…

- Groupset Indexes: key values are a concatenation of the dimensional primary-key values

(d1-1, d2-1, d3-1)  11000000000000000000000000000000000000…
(d1-1, d2-1, d3-2)  00110000000000000000000000000000000000…
23
Conclusion
- Analyzed the Value-List index, Bitmap index, Projection index, and Bit-Sliced index
- Combined Bitmap indexing and physical row clustering to evaluate OLAP queries involving aggregation and grouping
24
References
1. Improved Query Performance With Variant Indexes. Patrick O’Neil and Dallan Quass, Proc. ACM SIGMOD Conf. 1997, pages 38-49.
2. Bitmap Index Design and Evaluation. C.Y. Chan and Y.E. Ioannidis, 1998. 6
3. Database System Implementation. Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom, Prentice Hall, 2000.
4. Encoded Bitmap Indexing for Data Warehouses. M.C. Wu and A.P. Buchmann, 1998. 2
5. An Efficient Bitmap Encoding Scheme for Selection Queries. C.Y. Chan and Y.E. Ioannidis, 1998. 6
6. Multidimensional Indexing and Query Coordination for Tertiary Storage Management. A. Shoshani, L.M. Bernardo, et al., 1999. 10
7. Multi-Table Joins Through Bitmapped Join Indices. P. O’Neil and G. Graefe, 1995. 9