C-Store: Tuple Reconstruction
Jianlin Feng
School of Software, SUN YAT-SEN UNIVERSITY
Mar 27, 2009
Motivation
- In a column-oriented DBMS, columns are stored separately.
- The separate column values of the same logical tuple must be stitched back together when the tuple is finally returned to the user.
How to Identify Column Values of the Same Logical Tuple?
- Attach a tuple ID to each column value, either a physical one (stored with the value) or a virtual one (the value's position in the column).
- In the Read Store of C-Store, a Storage Key is not stored: it equals the value's position in the column.
- In the Write Store of C-Store, a Storage Key is physically stored as a tuple ID.
- Tuple reconstruction is easy if the columns are sorted in the same order: join on positions instead of on physical tuple IDs.
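A minimal sketch of the two identification schemes, assuming each column is a plain Python list (blocks and compression are ignored). In the Read-Store style, a value's storage key is just its list index, so columns sorted in the same order are stitched by position; in the Write-Store style, each value carries an explicit key, so reconstruction becomes an equi-join on that key. The functions `stitch_by_position` and `stitch_by_tuple_id` and all data are hypothetical.

```python
# Read-Store style: the storage key is implicit, i.e. the list index.
rs_a = [10, 20, 30]
rs_b = ["x", "y", "z"]


def stitch_by_position(*columns):
    """Rebuild tuples from columns sorted in the same order (positional join)."""
    return list(zip(*columns))


# Write-Store style: every value carries an explicit storage key (tuple ID),
# and the two columns need not be in the same physical order.
ws_a = [(7, 10), (3, 20), (5, 30)]  # (storage_key, value)
ws_b = [(5, "z"), (7, "x"), (3, "y")]


def stitch_by_tuple_id(col1, col2):
    """Rebuild tuples by an equi-join on the stored tuple ID."""
    lookup = dict(col2)  # storage_key -> value
    return [(v1, lookup[key]) for key, v1 in col1 if key in lookup]


print(stitch_by_position(rs_a, rs_b))  # [(10, 'x'), (20, 'y'), (30, 'z')]
print(stitch_by_tuple_id(ws_a, ws_b))  # [(10, 'x'), (20, 'y'), (30, 'z')]
```

The positional version needs no lookup structure at all, which is why sorting all columns in the same order makes reconstruction cheap.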
Two Strategies of Tuple Reconstruction
- Early Materialization (EM): whenever a column C1 is accessed, its concrete values are added to an intermediate tuple representation if C1 is needed by some later operator or is one of the output columns.
- Late Materialization (LM): construct tuples as late as possible; until then, operators work on positions rather than on tuples.
Tuple Reconstruction: An Example (1)
- Assume a relation R has 3 columns, R.a, R.b, and R.c.
- All 3 columns are sorted in the same order and are stored in separate files.
- Suppose a query consists of 3 selection predicates σ1, σ2, σ3 over R.a, R.b, R.c, respectively.
- σ1 is the most selective predicate (the fewest tuples pass it) and σ3 is the least selective.
Tuple Reconstruction: An Example (2)
An early materialization strategy could process the query as follows:
1. Read in a block of R.a, a block of R.b, and a block of R.c from disk.
2. Stitch them together into block(s) of triples (R.a, R.b, R.c).
3. Apply σ1, σ2, σ3 in turn, letting only the tuples that match the predicates pass through.
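The early materialization steps above can be sketched as follows, assuming in-memory lists stand in for the column files and slicing stands in for reading a block. The data, the predicates `sigma1` to `sigma3`, and `BLOCK_SIZE` are hypothetical, chosen so that σ1 passes the fewest tuples and σ3 the most.

```python
# Hypothetical relation R, stored as three columns sorted in the same order.
R_a = [5, 1, 9, 1, 7, 1]
R_b = [2, 8, 3, 8, 8, 8]
R_c = [2, 4, 4, 6, 1, 0]

BLOCK_SIZE = 2  # hypothetical number of values per block


# Hypothetical predicates: sigma1 passes 3 of 6 tuples, sigma2 4 of 6, sigma3 5 of 6.
def sigma1(a):
    return a == 1


def sigma2(b):
    return b == 8


def sigma3(c):
    return c > 0


def early_materialization(a, b, c, block_size=BLOCK_SIZE):
    """Stitch each block into triples first, then filter the triples."""
    result = []
    for start in range(0, len(a), block_size):
        end = start + block_size
        # Steps 1-2: read one block of each column and stitch them into triples.
        triples = list(zip(a[start:end], b[start:end], c[start:end]))
        # Step 3: apply sigma1, sigma2, sigma3 in turn; each pass sees only survivors.
        triples = [t for t in triples if sigma1(t[0])]
        triples = [t for t in triples if sigma2(t[1])]
        triples = [t for t in triples if sigma3(t[2])]
        result.extend(triples)
    return result


print(early_materialization(R_a, R_b, R_c))  # [(1, 8, 4), (1, 8, 6)]
```

Note that every value of every column is stitched into a triple, including the values of tuples that σ1 rejects immediately.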
Tuple Reconstruction: An Example (3)
A late materialization strategy could process the query as follows:
1. Scan R.a and output the positions in R.a that satisfy σ1.
2. Scan R.b and output the positions in R.b that satisfy σ2.
3. Scan R.c and output the positions in R.c that satisfy σ3.
4. Use a position-wise AND to find the intersection of the 3 position lists.
5. Re-scan R.a, R.b, and R.c, extract the values at the positions in the intersection, and stitch these values together into output tuples.
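The same hypothetical query can be run with the late materialization steps above. This sketch assumes in-memory lists, so the "re-scan" in step 5 is modeled as indexed lookups; a real system would read the relevant blocks from disk again. All names and data are illustrative.

```python
# Same hypothetical relation R and predicates as in the early materialization sketch.
R_a = [5, 1, 9, 1, 7, 1]
R_b = [2, 8, 3, 8, 8, 8]
R_c = [2, 4, 4, 6, 1, 0]


def sigma1(a):
    return a == 1


def sigma2(b):
    return b == 8


def sigma3(c):
    return c > 0


def scan_positions(column, predicate):
    """Steps 1-3: scan one column and output the positions that satisfy predicate."""
    return [pos for pos, value in enumerate(column) if predicate(value)]


def position_and(*position_lists):
    """Step 4: intersect the position lists; result in ascending order."""
    common = set(position_lists[0])
    for positions in position_lists[1:]:
        common &= set(positions)
    return sorted(common)


def late_materialization(a, b, c):
    survivors = position_and(
        scan_positions(a, sigma1), scan_positions(b, sigma2), scan_positions(c, sigma3)
    )
    # Step 5: re-scan each column for the surviving positions only, then stitch.
    col_a = [a[p] for p in survivors]
    col_b = [b[p] for p in survivors]
    col_c = [c[p] for p in survivors]
    return list(zip(col_a, col_b, col_c))


print(scan_positions(R_a, sigma1))          # [1, 3, 5]
print(late_materialization(R_a, R_b, R_c))  # [(1, 8, 4), (1, 8, 6)]
```

The result matches the early materialization plan; the difference is that only the two surviving tuples are ever constructed.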
Late Materialization: Potential Pros and Cons
+ Operators work directly on position lists.
+ Only relevant tuples (those that pass all predicates) are constructed.
- The base columns must be re-scanned to form the output tuples.
Early Materialization: Advantages
- No column has to be re-scanned: each column is read once and its values travel with the intermediate tuples.
- If re-scanning at tuple reconstruction time is expensive, early materialization comes out ahead.
An Analytical Model for Comparing the Two Materialization Strategies
The model is composed of 3 types of operators:
- Data Source (DS) operator
- AND operator
- Tuple Construction operator
These operators are enough to express simple queries under either materialization strategy.
Data Source (DS) operator: Case 1
- Input: a column C_i of |C_i| blocks from disk; a predicate with selectivity SF.
- Output: a column of the positions of the tuples that satisfy the predicate.
- Used by late materialization.
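A minimal sketch of DS Case 1, assuming the column is an in-memory list rather than |C_i| disk blocks. The function name `ds_case1` and the sample data are hypothetical.

```python
from typing import Callable, List, Sequence


def ds_case1(column: Sequence[int], predicate: Callable[[int], bool]) -> List[int]:
    """DS Case 1: scan the column and output the positions whose value satisfies the predicate."""
    return [pos for pos, value in enumerate(column) if predicate(value)]


column = [4, 9, 2, 9, 7]
positions = ds_case1(column, lambda v: v > 5)
print(positions)                     # [1, 3, 4]
print(len(positions) / len(column))  # 0.6, the observed selectivity SF
```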
Data Source (DS) operator: Case 2
- Input: a column C_i of |C_i| blocks from disk; a predicate with selectivity SF.
- Output: a column of (position, value) pairs of the tuples that satisfy the predicate.
- Used by early materialization.
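DS Case 2 differs from Case 1 only in carrying the value along with its position. A sketch under the same in-memory assumption; `ds_case2` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple


def ds_case2(column: Sequence[int], predicate: Callable[[int], bool]) -> List[Tuple[int, int]]:
    """DS Case 2: like Case 1, but output (position, value) pairs."""
    return [(pos, value) for pos, value in enumerate(column) if predicate(value)]


print(ds_case2([4, 9, 2, 9, 7], lambda v: v > 5))  # [(1, 9), (3, 9), (4, 7)]
```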
Data Source (DS) operator: Case 3
- Input: a column C_i of |C_i| blocks from disk or memory; a list of positions, POSLIST.
- Output: a column of the values corresponding to the positions in POSLIST.
- Used by late materialization.
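A sketch of DS Case 3 that keeps the block structure visible, so it is clear which blocks a re-scan must touch. It assumes fixed-size blocks (`BLOCK_SIZE` is hypothetical) and counts distinct blocks instead of performing real I/O.

```python
from typing import List, Sequence, Tuple

BLOCK_SIZE = 3  # hypothetical: each block holds 3 values (the last may hold fewer)


def ds_case3(blocks: Sequence[Sequence[int]], poslist: Sequence[int]) -> Tuple[List[int], int]:
    """DS Case 3: return the values at the positions in POSLIST,
    plus the number of distinct blocks that had to be touched."""
    values = []
    touched = set()
    for pos in poslist:
        block_no, offset = divmod(pos, BLOCK_SIZE)
        touched.add(block_no)  # a real system would fetch each such block once
        values.append(blocks[block_no][offset])
    return values, len(touched)


blocks = [[4, 9, 2], [9, 7, 1], [3, 3, 8]]
print(ds_case3(blocks, [1, 3, 4]))  # ([9, 9, 7], 2): block 2 is never touched
```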
Data Source (DS) operator: Case 4
- Input: a column C_i of |C_i| blocks from disk; a predicate with selectivity SF; a set of intermediate tuples of the form (pos, <a_1, ..., a_n>).
- Output: a set of intermediate tuples of the new form (pos, <a_1, ..., a_n, a_(n+1)>), i.e., column C_i is added to the tuples whose C_i value satisfies the predicate.
- Used by early materialization.
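A sketch of DS Case 4, assuming intermediate tuples are Python pairs `(pos, values)` where `values` is a tuple of the columns gathered so far. Each tuple looks up its C_i value by position; the value is appended only if it satisfies the predicate. Names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Intermediate = Tuple[int, Tuple[int, ...]]  # (pos, <a_1, ..., a_n>)


def ds_case4(column: Sequence[int], predicate: Callable[[int], bool],
             tuples: List[Intermediate]) -> List[Intermediate]:
    """DS Case 4: extend (pos, <a_1..a_n>) to (pos, <a_1..a_n, a_(n+1)>)
    for every tuple whose value column[pos] satisfies the predicate."""
    out = []
    for pos, values in tuples:
        value = column[pos]
        if predicate(value):
            out.append((pos, values + (value,)))
    return out


# Intermediate tuples holding one column so far (e.g. built from DS Case 2 output).
tuples = [(1, (9,)), (3, (9,)), (4, (7,))]
column_b = [0, 5, 6, 1, 8]
print(ds_case4(column_b, lambda v: v > 2, tuples))  # [(1, (9, 5)), (4, (7, 8))]
```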
The AND Operator
- Input: k position lists, inpos_1, ..., inpos_k.
- Output: outpos, a new list of positions representing the intersection of the input lists.
- Operating on positions is fast.
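One way the AND operator could be implemented, assuming each position list is sorted in ascending order: intersect lists pairwise with a linear merge, so the cost is proportional to the total list length. This is only one representation choice; if positions were kept as bit-vectors instead, the AND would be a bitwise AND. The names `intersect_two` and `and_operator` are hypothetical.

```python
from functools import reduce
from typing import List, Sequence


def intersect_two(xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Intersect two ascending position lists in one linear merge pass."""
    i = j = 0
    out: List[int] = []
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            out.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return out


def and_operator(*inpos: Sequence[int]) -> List[int]:
    """AND over k >= 1 ascending position lists inpos_1, ..., inpos_k."""
    return list(reduce(intersect_two, inpos))


print(and_operator([1, 3, 5, 8], [1, 2, 3, 8], [0, 3, 4, 8]))  # [3, 8]
```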
Tuple Construction Operators
- The MERGE operator
  - Input: k sets of values VAL_1, ..., VAL_k.
  - Output: a set of k-ary tuples.
  - Used to construct tuples at the top of a late materialization plan.
- The SPC (Scan, Predicate, and Construct) operator
  - Input: k columns VAL_1, ..., VAL_k from disk; a set of predicates.
  - Output: a set of tuples that pass all predicates.
  - Can sit at the bottom of an early materialization plan.
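A sketch of both tuple construction operators over in-memory lists. MERGE assumes its k inputs are already position-aligned (as DS Case 3 outputs over the same position list would be). For SPC, predicates are passed as a dict from column index to predicate, which is a hypothetical format; this sketch builds each row and then tests it.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def merge(*vals: Sequence[int]) -> List[Tuple[int, ...]]:
    """MERGE: zip k position-aligned value lists VAL_1..VAL_k into k-ary tuples."""
    return list(zip(*vals))


def spc(columns: Sequence[Sequence[int]],
        predicates: Dict[int, Callable[[int], bool]]) -> List[Tuple[int, ...]]:
    """SPC: scan k columns together, construct each row, keep it if it passes all predicates."""
    out = []
    for row in zip(*columns):
        if all(pred(row[i]) for i, pred in predicates.items()):
            out.append(row)
    return out


print(merge([1, 1], [8, 8], [4, 6]))  # [(1, 8, 4), (1, 8, 6)]
print(spc([[5, 1, 9, 1], [2, 8, 3, 8], [2, 4, 4, 6]],
          {0: lambda a: a == 1, 1: lambda b: b == 8}))  # [(1, 8, 4), (1, 8, 6)]
```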
Example Query Plans: EM
Example Query Plans: LM
Optimization in Late Materialization
Data Source Case 3 produces values from positions:
- Input: a column C_i of |C_i| blocks from disk or memory; a list of positions, POSLIST.
- Output: a column of the values corresponding to the positions in POSLIST.
- Optimization: if the column is already in memory, do not read it from disk again. This reduces the cost of re-scanning a column.
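A toy illustration of this optimization, assuming a buffer pool that keeps every block in memory after its first read (no eviction, unlike a real buffer manager). The first scan (DS Case 1) loads the blocks; the later DS Case 3 re-scan is served from memory, so the disk read counter does not grow. `BufferPool`, `BLOCK_SIZE`, and the data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

BLOCK_SIZE = 3  # hypothetical values per block


class BufferPool:
    """Toy buffer pool: a block stays in memory after its first read (no eviction)."""

    def __init__(self, disk: Dict[Tuple[str, int], List[int]]):
        self.disk = disk  # (column, block_no) -> block values
        self.cache: Dict[Tuple[str, int], List[int]] = {}
        self.disk_reads = 0

    def get_block(self, column: str, block_no: int) -> List[int]:
        key = (column, block_no)
        if key not in self.cache:  # go to "disk" only on a miss
            self.disk_reads += 1
            self.cache[key] = self.disk[key]
        return self.cache[key]


def ds_case1(pool: BufferPool, column: str, n_blocks: int,
             predicate: Callable[[int], bool]) -> List[int]:
    """First scan: positions whose value satisfies the predicate."""
    positions = []
    for block_no in range(n_blocks):
        for offset, value in enumerate(pool.get_block(column, block_no)):
            if predicate(value):
                positions.append(block_no * BLOCK_SIZE + offset)
    return positions


def ds_case3(pool: BufferPool, column: str, poslist: List[int]) -> List[int]:
    """Re-scan: values at the given positions, served from memory when cached."""
    values = []
    for pos in poslist:
        block_no, offset = divmod(pos, BLOCK_SIZE)
        values.append(pool.get_block(column, block_no)[offset])
    return values


pool = BufferPool({("a", 0): [4, 9, 2], ("a", 1): [9, 7, 1]})
positions = ds_case1(pool, "a", 2, lambda v: v > 5)
print(positions, pool.disk_reads)                       # [1, 3, 4] 2
print(ds_case3(pool, "a", positions), pool.disk_reads)  # [9, 9, 7] 2
```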
LM Optimization: Multi-Columns
- A Multi-Column is a specialized data structure that allows blocks of column data to remain in memory after the first scan, so that those blocks can easily be scanned again later on.
- It contains a memory-resident, horizontal partition of some subset of the columns of a logical relation.
Components of a Multi-Column
- A covering position range: indicates the virtual start position and end position of the horizontal partition.
- An array of mini-columns: a mini-column is the set of corresponding values of one column for the specified position range. Each mini-column is kept compressed the same way as it was on disk.
- A position descriptor: indicates which positions in the position range remain valid.
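The three components can be sketched as a Python dataclass. Several simplifications are assumptions of this sketch: the "array of mini-columns" is a dict keyed by column name, mini-columns are stored uncompressed (in C-Store they stay compressed as on disk), the position range is inclusive at both ends, and the position descriptor is a list of booleans, one per position.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MultiColumn:
    start: int                          # covering position range: first position (inclusive)
    end: int                            # covering position range: last position (inclusive)
    mini_columns: Dict[str, List[int]]  # column name -> its values for start..end
    valid: List[bool]                   # position descriptor: one flag per position

    def __post_init__(self) -> None:
        width = self.end - self.start + 1
        if len(self.valid) != width or any(len(c) != width for c in self.mini_columns.values()):
            raise ValueError("mini-columns and descriptor must cover the whole range")

    def valid_positions(self) -> List[int]:
        """Positions in the range that are still valid."""
        return [self.start + i for i, ok in enumerate(self.valid) if ok]

    def values(self, name: str) -> List[int]:
        """Values of one mini-column at the still-valid positions."""
        return [v for v, ok in zip(self.mini_columns[name], self.valid) if ok]


mc = MultiColumn(start=6, end=8, mini_columns={"a": [3, 1, 1]}, valid=[True, False, True])
print(mc.valid_positions())  # [6, 8]
print(mc.values("a"))        # [3, 1]
```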
Construction of a Multi-Column
- Initially, a multi-column contains only one mini-column: when a page of a column is read from disk, a mini-column is created with a position descriptor indicating that all positions are valid.
- Each mini-column can be just a pointer to the page in the buffer.
- A modified AND operator is used to merge two multi-columns into a wider multi-column.
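A sketch of construction and merging under the same simplifying assumptions as before (dict of uncompressed mini-columns, boolean descriptor, inclusive range). `from_page` stores a reference to the page list rather than a copy, mimicking a pointer into the buffer. `apply_predicate` is a hypothetical helper that narrows the descriptor after a selection, and `and_merge` is one possible reading of the modified AND: it intersects the descriptors and keeps both inputs' mini-columns. For simplicity it only merges multi-columns covering the same position range.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MultiColumn:
    start: int                          # covering position range (inclusive)
    end: int
    mini_columns: Dict[str, List[int]]  # column name -> values (a reference, not a copy)
    valid: List[bool]                   # position descriptor


def from_page(name: str, start: int, page: List[int]) -> MultiColumn:
    """A freshly read page becomes a one-mini-column multi-column, all positions valid."""
    return MultiColumn(start, start + len(page) - 1, {name: page}, [True] * len(page))


def apply_predicate(mc: MultiColumn, name: str, predicate: Callable[[int], bool]) -> MultiColumn:
    """Hypothetical helper: narrow the descriptor with a predicate on one mini-column."""
    valid = [ok and predicate(v) for ok, v in zip(mc.valid, mc.mini_columns[name])]
    return MultiColumn(mc.start, mc.end, mc.mini_columns, valid)


def and_merge(left: MultiColumn, right: MultiColumn) -> MultiColumn:
    """Modified AND: intersect descriptors and keep the mini-columns of both inputs."""
    if (left.start, left.end) != (right.start, right.end):
        raise ValueError("this sketch only merges multi-columns with the same range")
    valid = [x and y for x, y in zip(left.valid, right.valid)]
    return MultiColumn(left.start, left.end, {**left.mini_columns, **right.mini_columns}, valid)


page_a = [5, 1, 9, 1]
page_b = [2, 8, 3, 0]
mc_a = apply_predicate(from_page("a", 0, page_a), "a", lambda v: v == 1)
mc_b = apply_predicate(from_page("b", 0, page_b), "b", lambda v: v == 8)
wide = and_merge(mc_a, mc_b)
print(sorted(wide.mini_columns))         # ['a', 'b']
print(wide.valid)                        # [False, True, False, False]
print(wide.mini_columns["a"] is page_a)  # True: the page was never copied
```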
The Use of a Multi-Column
If a DS Case 3 operator takes a multi-column as input rather than just a position list, it does not need to re-scan the column from disk.
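A sketch of DS Case 3 taking a multi-column as input, under the same simplifying assumptions as the earlier multi-column sketches. The descriptor plays the role of the position list. If the requested column is among the mini-columns, its values come straight from memory; otherwise the sketch falls back to a stand-in disk read that logs each access. `DISK`, `read_from_disk`, and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MultiColumn:
    start: int                          # covering position range (inclusive)
    end: int
    mini_columns: Dict[str, List[int]]  # in-memory mini-columns
    valid: List[bool]                   # position descriptor


DISK = {"a": [5, 1, 9, 1], "b": [2, 8, 3, 8], "c": [2, 4, 4, 6]}  # hypothetical column files
disk_reads: List[str] = []


def read_from_disk(name: str, start: int, end: int) -> List[int]:
    """Stand-in for reading a column's values for positions start..end from disk."""
    disk_reads.append(name)
    return DISK[name][start:end + 1]


def ds_case3(mc: MultiColumn, name: str,
             read: Callable[[str, int, int], List[int]]) -> List[int]:
    """DS Case 3 over a multi-column: values of `name` at the valid positions.
    Uses the in-memory mini-column when present; otherwise falls back to `read`."""
    if name in mc.mini_columns:
        column = mc.mini_columns[name]
    else:
        column = read(name, mc.start, mc.end)
    return [v for v, ok in zip(column, mc.valid) if ok]


mc = MultiColumn(0, 3, {"a": DISK["a"], "b": DISK["b"]}, [False, True, False, True])
print(ds_case3(mc, "a", read_from_disk), disk_reads)  # [1, 1] []
print(ds_case3(mc, "c", read_from_disk), disk_reads)  # [4, 6] ['c']
```

Only the column that was not already part of the multi-column ("c") triggers a disk read.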
Predicted vs. Actual Behavior
Heuristic for Choosing a Materialization Strategy
- Use late materialization if the query contains aggregation, or if the predicates in the query have a small selectivity factor SF (i.e., few tuples pass them).
- Use early materialization under the opposite conditions: no aggregation, and predicates that let a large fraction of tuples through.
References
- Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column-Oriented DBMS. VLDB.
- Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel R. Madden. Materialization Strategies in a Column-Oriented DBMS. Proceedings of ICDE, April 2007, Istanbul, Turkey.