Partitioned Sorting of Bitmap Indices Kyle Brooks.

1 Partitioned Sorting of Bitmap Indices Kyle Brooks

2 What’s a bitmap index?
Database Indices:
– Like an index at the end of a book
– Allow quick lookups
Bitmap Indices:
– A kind of database index
– Rows are entries, columns are attributes
– Queries are answered with bitwise operations (AND, OR, NOT) that the CPU executes directly
– They are super fast!
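To make the "rows are entries, columns are attributes" picture concrete, here is a minimal sketch in Python. It is an illustration only, not the implementation used in this work: the function names and the toy data are made up, and each column is stored as a Python integer whose bit i stands for row i.

```python
def build_bitmap_index(rows, attributes):
    """Return {attribute: int}; bit i of the int is set if row i has the attribute."""
    index = {attr: 0 for attr in attributes}
    for i, row in enumerate(rows):
        for attr in row:
            index[attr] |= 1 << i
    return index


def matching_rows(bitmap):
    """List the row numbers whose bit is set in the given column bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical toy table: each row is the set of attributes it has.
rows = [{"red", "small"}, {"blue", "small"}, {"red", "large"}, {"red", "small"}]
index = build_bitmap_index(rows, ["red", "blue", "small", "large"])

# "red AND small" is a single bitwise AND over the two columns.
print(matching_rows(index["red"] & index["small"]))  # [0, 3]
```

A real bitmap index would keep the columns compressed, but the query step is the same idea: combine whole columns with bitwise operations instead of checking rows one at a time.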

3 Run Length Compression
We can represent a run of one thousand 1’s in a column of the bitmap as the tuple (1, 1000).
We can optimize a bitmap for Run Length Compression by reordering its rows so that there are fewer, longer runs.
Finding the perfect row order is NP-hard
– We need efficient heuristics that approximate it
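The (value, length) representation can be sketched in a few lines of Python. This is a plain run-length encoder for illustration, not EWAH or any specific scheme; the function name is made up.

```python
from itertools import groupby


def run_length_encode(bits):
    """Encode a sequence of bits as a list of (value, run_length) tuples."""
    return [(value, len(list(group))) for value, group in groupby(bits)]


print(run_length_encode([1] * 1000))          # [(1, 1000)]
print(run_length_encode([0, 0, 1, 1, 1, 0]))  # [(0, 2), (1, 3), (0, 1)]
```

The fewer runs a column has, the fewer tuples are needed, which is why row order matters.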

4 Sorting to improve compression
Sorting the rows of a bitmap increases run length.
Lexicographical sort can improve compression by a factor of nine.
Unfortunately, sorting can be too computationally expensive for very large bitmaps.
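A small sketch of why sorting helps: count the runs in every column before and after a lexicographic sort of the rows. The six-row bitmap is invented for illustration, and Python's built-in tuple ordering stands in for the lexicographic sort.

```python
from itertools import groupby


def count_runs(rows):
    """Total number of runs, summed over every column of the bitmap."""
    return sum(len(list(groupby(column))) for column in zip(*rows))


# Hypothetical bitmap: one tuple per row, one position per column.
rows = [(1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1)]

print(count_runs(rows))          # 17 runs in the original order
print(count_runs(sorted(rows)))  # 8 runs after sorting the rows
```

The first column benefits the most, since sorting makes it fully ordered; later columns improve less.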

5 Partition sorting
Bitmaps can be too large to fit into memory.
We partition a large bitmap into sections that we can sort in memory.
After sorting the individual sections, the partitions are recombined to form the complete bitmap once again.
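The idea can be sketched as follows. This is a simplified in-memory illustration with a made-up function name: a real run would stream each partition from disk, which this sketch does not attempt.

```python
def partition_sort(rows, partition_size):
    """Sort each block of `partition_size` consecutive rows on its own,
    then concatenate the sorted blocks in their original order."""
    result = []
    for start in range(0, len(rows), partition_size):
        result.extend(sorted(rows[start:start + partition_size]))
    return result


rows = [(1, 0), (0, 1), (1, 1), (0, 0), (1, 0), (0, 1)]

print(partition_sort(rows, 3))
# [(0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]

# With a single partition covering everything, this is just a complete sort.
print(partition_sort(rows, len(rows)) == sorted(rows))  # True
```

Note that identical rows in different partitions, such as (1, 0) here, do not end up next to each other, so some runs are lost compared with a complete sort.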

6 Stages of Experiment
Raw Data Set → Binning → Bitmap → Partitioning → Sort Individual Partitions → Recombine Partitions → Compress with EWAH
I started with two raw data sets: one was from US Census data from 1990, and the other was from a data mining competition in 1999.
High-cardinality attributes were “binned”: large intervals of continuous data were replaced with a single value representative of that interval.
The binned data sets were then transformed into bitmaps.
Then, the bitmaps were split into partitions. The size of these partitions varied from 50,000 to 500,000 lines.
The individual partitions were lexicographically sorted using a typical sorting algorithm.
Then, the sorted partitions were recombined to form the complete bitmap.
Finally, the complete bitmap was compressed using EWAH (a run length compression scheme using 64-bit words).
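The binning and bitmap-construction stages might look roughly like this. Everything specific here is an assumption for illustration: the attribute (ages), the bin edges, and the helper names are made up and are not taken from the data sets above.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return the index of the interval containing `value`.
    `edges` are the sorted boundaries between neighbouring bins."""
    return bisect_right(edges, value)


def to_bitmap_row(bin_idx, num_bins):
    """One bitmap column per bin; exactly one bit is set per row."""
    return tuple(1 if i == bin_idx else 0 for i in range(num_bins))


ages = [3, 17, 42, 65, 88]   # hypothetical continuous attribute
edges = [18, 40, 65]         # hypothetical bins: <18, 18-39, 40-64, >=65

bins = [bin_index(age, edges) for age in ages]
bitmap = [to_bitmap_row(b, len(edges) + 1) for b in bins]

print(bins)       # [0, 0, 2, 3, 3]
print(bitmap[2])  # (0, 0, 1, 0)
```

Binning keeps the number of bitmap columns small: without it, a continuous attribute would need one column per distinct value.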

7 Results
Partitioned sorting can improve compression enormously.
Compression performance is dependent on the number of partitions
– Doubling the number of lines in a partition → 8.5% reduction in the size of the compressed bitmap
A complete sort outperforms the partition-sorted bitmaps by a large margin.
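As a rough back-of-the-envelope reading of the 8.5% figure, suppose the reduction compounds at the same rate for every doubling across the 50,000 to 500,000 line range. That uniform rate is an assumption made here for illustration; the results above do not state it.

```latex
\[
\log_2\frac{500{,}000}{50{,}000} = \log_2 10 \approx 3.32 \text{ doublings},
\qquad
(1 - 0.085)^{3.32} = 0.915^{3.32} \approx 0.74
\]
```

Under that assumption, the compressed bitmap for the largest partitions would be about 26% smaller than for the smallest ones.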

8 Future Research: Smart Partitioning
Instead of partitioning blindly, use “smart” partitioning techniques.
Arranging a list so that identical prefixes are consecutive can be done in O(n) time
– Example output: [cat, dog, dad, day, bat, box]
Because of data correlation, arranging the first columns could lead to fewer runs in the rest.
I plan to present this new research at CCNC-NW in October.
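One way to get identical prefixes next to each other without a full sort is a single hashing pass, sketched below. This is an assumption about how such a grouping could be done, not the technique planned for the research; the function name and the input list are made up, and the input was chosen so that it reproduces the example output above.

```python
def group_by_prefix(words, prefix_len=1):
    """Make words sharing a prefix consecutive, in one pass over the input.
    Groups appear in order of first occurrence; nothing is sorted."""
    groups = {}
    for word in words:
        groups.setdefault(word[:prefix_len], []).append(word)
    return [word for group in groups.values() for word in group]


print(group_by_prefix(["cat", "dog", "bat", "dad", "box", "day"]))
# ['cat', 'dog', 'dad', 'day', 'bat', 'box']
```

Each word is handled once with a dictionary lookup, so the expected running time is linear in the number of words for a fixed prefix length. The result is not sorted (c, d, b), which is fine: only adjacency of equal prefixes matters for run length.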

