Lecture 6: Indexing Part 2 & Column Stores
Indexes Recap

             | Heap File | Bitmap    | Hash File | B+Tree
Insert       | O(1)      | O(1)      | O(1)      | O(log_B n)
Delete       | O(P)      | O(1)      | O(1)      | O(log_B n)
Range Scan   | O(P)      | -- / O(P) | -- / O(P) | O(log_B n + R)
Lookup       | O(P)      | O(C)      | O(1)      | O(log_B n)

n: number of tuples
P: number of pages in the file
B: branching factor of the B+Tree (keys per node)
R: number of pages in the range
C: cardinality (number of unique values of the key)
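For intuition (the specific numbers here are assumed for illustration): with a branching factor of B = 100 keys per node and n = 10^8 tuples, a B+Tree lookup descends about log_100(10^8) = 4 levels, i.e. roughly 4 page reads, whereas a heap-file lookup is O(P), a scan of every page in the file.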
B+ Tree Indexes
– Balanced, wide tree
– Fast value lookups and range scans
– Each node is a disk page (except the root)
– Leaves point to tuple pages
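To make the node layout concrete, here is a minimal in-memory sketch in Python. The class and field names are assumptions of this sketch, not from the lecture; in a real system each node is laid out on one disk page:

    class Node:
        """One B+Tree node; on disk this would occupy a single page."""
        def __init__(self, leaf=True):
            self.leaf = leaf
            self.keys = []       # sorted search keys
            # Internal node: len(keys) + 1 child Nodes.
            # Leaf node: one record pointer (to a tuple page) per key.
            self.children = []
            self.next = None     # leaf-to-leaf link used by range scans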
Secondary Indices Example
– An index record points to a bucket that contains pointers to all the actual records with that particular search-key value
– Secondary indices have to be dense
– Example: a secondary index on the balance field of account
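A minimal sketch of the bucket idea in Python, using invented account records (rid = record id):

    from collections import defaultdict

    # Illustrative account records: (record_id, branch_name, balance)
    accounts = [(0, "Downtown", 500), (1, "Mianus", 700),
                (2, "Perryridge", 500), (3, "Brighton", 900)]

    # Secondary index on balance: each index entry points to a bucket
    # holding pointers to every record with that search-key value.
    balance_index = defaultdict(list)
    for rid, _branch, balance in accounts:
        balance_index[balance].append(rid)

    # The index is dense: every balance value in the file has an entry.
    print(balance_index[500])   # -> [0, 2]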
B+ Tree Insertion
– Locate the leaf for the new key and pointer
– Insert into the leaf node
– If the node is overfull, split it
– Recursively update parents to keep the tree balanced and (non-root) nodes at least half full (a code sketch follows below)
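A runnable sketch of this insertion algorithm in Python, reusing the Node class sketched above. ORDER (the maximum number of keys per node) is an assumption of this sketch; a real node holds as many keys as fit on one disk page:

    import bisect

    ORDER = 4  # assumed max keys per node; really: as many keys as fit on a page

    def insert(node, key, rid):
        """Insert key -> rid under node.
        Returns (separator_key, new_right_node) if node split, else None."""
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            node.keys.insert(i, key)
            node.children.insert(i, rid)
            if len(node.keys) > ORDER:               # overfull: split the leaf
                mid = len(node.keys) // 2
                right = Node(leaf=True)
                right.keys, node.keys = node.keys[mid:], node.keys[:mid]
                right.children, node.children = node.children[mid:], node.children[:mid]
                right.next, node.next = node.next, right
                return right.keys[0], right          # separator is copied up
            return None
        i = bisect.bisect_right(node.keys, key)      # child to descend into
        split = insert(node.children[i], key, rid)
        if split is not None:
            sep, right_child = split
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right_child)
            if len(node.keys) > ORDER:               # overfull internal node: split
                mid = len(node.keys) // 2
                sep_up = node.keys[mid]              # middle separator is pushed up
                right = Node(leaf=False)
                right.keys = node.keys[mid + 1:]
                right.children = node.children[mid + 1:]
                node.keys = node.keys[:mid]
                node.children = node.children[:mid + 1]
                return sep_up, right
        return None

    def tree_insert(root, key, rid):
        """Top-level insert; returns the (possibly new) root."""
        split = insert(root, key, rid)
        if split is not None:                        # root split: tree grows a level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [root, right]
            return new_root
        return root

    # Usage: with ORDER = 4, the fifth insert splits the root leaf, and
    # "Clearview" (as in the figure below) then lands in the left leaf.
    root = Node(leaf=True)
    for k in ["Brighton", "Downtown", "Mianus", "Perryridge", "Redwood", "Clearview"]:
        root = tree_insert(root, k, k)               # key doubles as a stand-in rid

Note the asymmetry: a split leaf copies its separator up (the key stays in the leaf), while a split internal node pushes its middle separator up and drops it, which is what keeps non-root nodes at least half full.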
B+ Tree Insertion (figure): B+-Tree before and after insertion of “Clearview”
B+ Tree Deletion
– Find the leaf containing the key and pointer
– Delete them from the leaf
– If the leaf is underfull (less than ½ of its entries used), rebalance with neighbors
– Recursively update parents to keep the tree balanced and reflect the new leaf contents (a sketch follows below)
  – May delete the root if it is left with only one entry
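A simplified sketch of the rebalancing step, again in Python with the Node class from earlier. It assumes a two-level tree (the root's children are leaves) and handles only redistribution from, or merging with, the left sibling; a full implementation also handles the symmetric right-sibling case and recurses upward, possibly deleting a root left with one child:

    import bisect

    MIN_KEYS = 2  # = ORDER // 2 from the insertion sketch: nodes stay half full

    def delete(root, key):
        """Delete key, assuming a two-level tree: root's children are leaves."""
        i = bisect.bisect_right(root.keys, key)   # leaf that would hold key
        leaf = root.children[i]
        j = bisect.bisect_left(leaf.keys, key)
        if j == len(leaf.keys) or leaf.keys[j] != key:
            return                                # key not present
        leaf.keys.pop(j)
        leaf.children.pop(j)
        if len(leaf.keys) >= MIN_KEYS:
            return                                # leaf still at least half full
        if i == 0:
            return  # leftmost leaf: the symmetric right-sibling case is omitted
        left = root.children[i - 1]
        if len(left.keys) > MIN_KEYS:             # redistribute: borrow from left
            leaf.keys.insert(0, left.keys.pop())
            leaf.children.insert(0, left.children.pop())
            root.keys[i - 1] = leaf.keys[0]       # fix the separator key
        else:                                     # merge leaf into its left sibling
            left.keys += leaf.keys
            left.children += leaf.children
            left.next = leaf.next
            root.keys.pop(i - 1)                  # drop separator and empty child;
            root.children.pop(i)                  # a full version recurses upward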
B+ Tree Deletion Example (figure): before and after deleting “Downtown”
– Deleting “Downtown” causes merging of under-full leaves
– A leaf node can become empty only for n = 3!
Study Break: B+ Tree (see tree on board)
– Insert 9 into the tree
– Insert 3 into the original tree
– Delete 8 from the start tree with left-leaf redistribution
– Delete 8 with right redistribution
Column Store Performance
– How much do these optimizations matter?
– Goal: compare against the best you could do with a commercial row-store system
Emulating a Column Store
Two approaches:
1. Vertical partitioning: for an n-column table, store n two-column tables, the i-th containing (tuple-id, attribute i)
   – Sort on tuple-id
   – Use merge joins to reconstruct query results (see the sketch below)
2. Index-only plans
   – Create a secondary index on each column
   – Never follow pointers to the base table
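A sketch of approach 1's reconstruction step in Python (column contents invented for illustration): each vertical partition is a list of (tuple-id, value) pairs kept sorted on tuple-id, so rows can be stitched back together with a merge join.

    def merge_join(col_a, col_b):
        """Merge join two vertical partitions, each sorted on tuple-id."""
        out, i, j = [], 0, 0
        while i < len(col_a) and j < len(col_b):
            (ta, va), (tb, vb) = col_a[i], col_b[j]
            if ta == tb:
                out.append((ta, va, vb))
                i += 1
                j += 1
            elif ta < tb:
                i += 1
            else:
                j += 1
        return out

    # Two single-attribute partitions of the same table:
    price    = [(0, 9.99), (1, 4.50), (2, 7.25)]
    quantity = [(0, 3),    (1, 10),   (2, 1)]
    print(merge_join(price, quantity))   # -> [(0, 9.99, 3), (1, 4.5, 10), (2, 7.25, 1)]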
Bottom Line (figure: average query time in seconds)
– SSBM (Star Schema Benchmark, O’Neil et al., ICDE 2008): a data-warehousing benchmark based on TPC-H
– Scale 100 (60M-row table), 17 columns
– Times averaged across 12 queries
– Row store: a commercial DB tuned by a professional DBA, compared against C-Store
– Takeaway: the commercial system does not benefit from vertical partitioning
Problems with Vertical Partitioning
① Tuple headers
   – The total table is 4 GB, but each of the 17 column tables is ~1.0 GB
   – 17 × ~1 GB ≈ 17 GB vs. 4 GB: a factor-of-4 overhead from tuple headers and tuple-ids
② Merge joins
   – Answering queries requires joins
   – The row store doesn’t know that the column tables are sorted on tuple-id, so it sorts before joining, which hurts performance
Would need to fix these, plus add direct operation on compressed data, to approach C-Store performance
Problems with Index-Only Plans
Consider the query:

    SELECT store_name, SUM(revenue)
    FROM facts, stores
    WHERE facts.store_id = stores.store_id
      AND stores.country = 'Canada'
    GROUP BY store_name

– The two WHERE clauses yield a list of tuple IDs that pass all predicates
– We then need to fetch the store_name and revenue values for those tuples
– But indexes map from value → tuple ID!
– Column stores can efficiently go from tuple ID → value in each column (see the sketch below)
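A toy Python illustration of why the lookup direction matters (all names and values invented):

    # A secondary index maps value -> tuple ids: the wrong direction
    # for fetching the revenue of tuples that passed the predicates.
    revenue_index = {100: [0, 3], 250: [1], 75: [2]}

    def revenue_via_index(tid):
        """Fetching by tuple id through the index means searching it."""
        for value, tids in revenue_index.items():
            if tid in tids:
                return value

    # A column store keeps each column as a tuple-id-ordered array,
    # so tuple id -> value is a direct positional access.
    revenue_column = [100, 250, 75, 100]

    matching_tids = [0, 2]                     # tuples that passed all predicates
    print([revenue_via_index(t) for t in matching_tids])   # slow: scans the index
    print([revenue_column[t] for t in matching_tids])      # fast: direct access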
Recommendations for Row-Store Designers
It might be possible to get C-Store-like performance:
① Store tuple headers elsewhere (don’t require that they be read from disk with the tuples)
② Provide an efficient merge-join implementation that understands sorted columns
③ Support direct operation on compressed data (see the sketch below)
Requires a “late materialization” design
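As an illustration of ③, a sketch of direct operation on run-length-encoded data in Python (column contents invented): a SUM over an RLE column costs one multiply-add per run rather than one add per tuple.

    # revenue column compressed as (value, run_length) pairs
    rle_revenue = [(100, 3), (250, 2), (75, 5)]

    # SUM(revenue) computed without decompressing the runs
    total = sum(value * run_length for value, run_length in rle_revenue)
    print(total)   # 300 + 500 + 375 = 1175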
Study Break: Column Stores
Given the schema:

    grades(a_cid int, student_id int, grade char(2), grade_num int)

– Estimate how much data we would read to compute avg(grade_num) over 1M records in a column store
  – What about a row store?
– If we have 5K students, how much data do we need to access to count the number of students who earned an A in course a_cid = 339?
  – Do the same exercise with a row store
Column Stores Solution
– Column store: avg(grade_num) reads 8 bytes × 1M tuples = 8 MB
– Row store: (3 × 8 + 2) bytes × 1M tuples = 26 MB
– Counting the A grades touches two columns: (8 bytes (a_cid) + 2 bytes (grade)) × 1M tuples = 10 MB
– Row store: 26 MB again
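A quick check of this arithmetic in Python (assuming 8-byte ints and a 2-byte char(2), as above):

    n = 1_000_000
    print(8 * n)              # grade_num column only: 8,000,000 B = 8 MB
    print((3 * 8 + 2) * n)    # full rows: 26,000,000 B = 26 MB
    print((8 + 2) * n)        # a_cid + grade columns: 10,000,000 B = 10 MB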