Published by Bennett Dennis. Modified over 9 years ago.
1
C-Store: A Column-oriented DBMS. Speaker: Zhu Xinjie. Supervisor: Ben Kao.
2
C-Store: A Column-oriented DBMS. Outline: Introduction; Data model; RS (read-optimized store); WS (writeable store); Tuple mover; Performance comparison.
3
Introduction: Most existing DBMSs are record-oriented (row-oriented) storage systems. Their major features: they store complete tuples of tabular data along with auxiliary B-tree indexes on attributes in the table; they store values in their native data format; and they are effective for OLTP-style applications.
4
Introduction: A row-oriented store has several deficiencies. It brings irrelevant attributes into memory when processing a given query. It is ineffective in read-mostly (ad hoc query) environments, i.e., it is not read-optimized. Shifting data values onto byte or word boundaries in main memory is expensive.
5
Introduction: C-Store physically stores a collection of column-oriented, overlapping projections, each sorted on some attribute(s). It codes data elements into a more compact form, and its query executor operates on the compressed representation to avoid the cost of decompression.
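To make "operating on the compressed representation" concrete, here is a minimal sketch that answers two aggregates directly from run-length triples without expanding them. The triple layout (value, first position, run length) follows the Type 1 encoding described on the RS slides; the data and the function names `count_equal` and `column_sum` are hypothetical, not C-Store's actual operators.

```python
# Run-length triples (value, first_position, run_length) for a sorted column.
# The data and helper names are illustrative assumptions.
runs = [(3, 1, 4), (4, 5, 7), (9, 12, 2)]


def count_equal(runs, target):
    """Count rows equal to target by reading run lengths only."""
    return sum(n for v, _f, n in runs if v == target)


def column_sum(runs):
    """Sum the column as value * run_length, one multiplication per run."""
    return sum(v * n for v, _f, n in runs)


print(count_equal(runs, 4))
print(column_sum(runs))
```

Both functions touch three triples instead of the thirteen values the column would hold once decompressed.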
6
Introduction: C-Store is implemented in a grid environment of G nodes, each with private disk and private memory. Storing redundant objects in different sort orders provides both higher retrieval performance and high availability (K-safety). The aim is to achieve very high performance on queries and reasonable speed on OLTP-style transactions at the same time.
7
Introduction: In the C-Store architecture, updates and transactions are sent to WS, queries are sent to RS, and the tuple mover moves tuples from WS to RS.
8
Data Model: C-Store implements only projections. Each projection is anchored on a given logical table T and contains one or more attributes from T. A projection may also contain attributes from other, non-anchor tables.
9
Data Model: EMP1, EMP2 and EMP3 are anchored on table EMP. DEPT1 is anchored on table DEPT.
10
Data Model: A projection with k attributes is stored as k data structures, one per column, each sorted on the same sort key (which may be any column or columns of the projection).
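A minimal sketch of this layout, assuming an in-memory table of dictionaries: the projection becomes one list per attribute, and every list follows the same sort order. The function `build_projection` and the salary figures are hypothetical and only serve the illustration.

```python
# Hypothetical rows of a logical table EMP(name, age, salary).
emp = [
    {"name": "Bob", "age": 25, "salary": 10000},
    {"name": "Alice", "age": 23, "salary": 60000},
    {"name": "Jill", "age": 24, "salary": 50000},
]


def build_projection(rows, attributes, sort_key):
    """Return one list per attribute, all ordered by the same sort key."""
    ordered = sorted(rows, key=lambda row: row[sort_key])
    return {attr: [row[attr] for row in ordered] for attr in attributes}


# A projection holding (name, age), sorted on age.
emp1 = build_projection(emp, ["name", "age"], sort_key="age")
print(emp1)
```

Because both lists share one ordering, the value at position i of `name` and the value at position i of `age` belong to the same employee.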
11
Data Model: Every projection is horizontally partitioned into one or more segments, each identified by a segment identifier, Sid.
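The slide does not say how rows are assigned to segments. The sketch below assumes range partitioning on the sort key, with each segment covering one key range; the boundary values and the names `segment_id` and `partition` are hypothetical.

```python
from bisect import bisect_right


def segment_id(sort_value, boundaries):
    """Map a sort-key value to a segment id (Sid), numbered from 1.

    Assumption: segment i holds keys below boundaries[i - 1] and at or
    above the previous boundary.
    """
    return bisect_right(boundaries, sort_value) + 1


def partition(sort_values, boundaries):
    """Group sorted key values by the segment they fall into."""
    segments = {}
    for value in sorted(sort_values):
        segments.setdefault(segment_id(value, boundaries), []).append(value)
    return segments


print(partition([23, 45, 31, 62, 30], boundaries=[30, 50]))
```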
12
Data Model: For every table, there must be a covering set of projections such that every column is stored in at least one projection. Reconstructing complete rows of a table from the stored segments requires two mechanisms: storage keys and join indices.
13
Data Model: Storage keys: each segment associates every data value of every column with a storage key, SK. Values from different columns in the same segment with matching SKs belong to the same logical row. SKs are integers. They are not physically stored in RS, where they can be inferred from a value's position in its column, but they are physically stored in WS.
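As a rough illustration of positional storage keys in RS, the sketch below treats a segment as parallel lists and takes the storage key to be the 1-based position in each list. The segment contents and the helper `row_at` are hypothetical.

```python
# One RS segment of a projection: columns are parallel lists. The storage
# key of a value is assumed to be its 1-based position, so it is never stored.
segment = {"name": ["Alice", "Jill", "Bob"], "age": [23, 24, 25]}


def row_at(segment, sk):
    """Rebuild the logical row with storage key sk from every column."""
    return {column: values[sk - 1] for column, values in segment.items()}


print(row_at(segment, 2))
```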
14
Data Model: Join indices: if T1 and T2 are two projections anchored on a table T, a join index from T1 to T2 is logically a collection of tables, one per segment of T1, each consisting of rows of the form (s: Sid in T2, k: SK in s). The i-th row of such a table locates, in T2, the logical row stored at position i of the T1 segment.
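A small sketch of how such a join index could be derived, assuming each projection fits in a single segment and that we still know which logical row each position came from. The row data, the order lists, and `build_join_index` are hypothetical constructs for illustration only.

```python
# Hypothetical logical rows: (name, age, salary).
rows = [("Bob", 25, 10000), ("Alice", 23, 60000), ("Jill", 24, 50000)]

# T1 is sorted on salary and T2 on age; each list holds indexes into `rows`.
t1_order = sorted(range(len(rows)), key=lambda i: rows[i][2])
t2_order = sorted(range(len(rows)), key=lambda i: rows[i][1])


def build_join_index(t1_order, t2_order, t2_sid=1):
    """For each T1 position, give (Sid in T2, storage key in that segment)."""
    t2_position = {row_id: pos + 1 for pos, row_id in enumerate(t2_order)}
    return [(t2_sid, t2_position[row_id]) for row_id in t1_order]


join_index = build_join_index(t1_order, t2_order)
print(join_index)
```

Reading the first entry: the row at position 1 of T1 (Bob, the lowest salary) is found in segment 1 of T2 at storage key 3 (the highest age).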
15
RS: Any segment of any projection is broken into columns, each stored in the order of the projection's sort key. One of four encoding schemes is chosen for each column, depending on its ordering (self-order, i.e., ordered by its own values, or foreign-order, i.e., ordered by the values of another column) and on the proportion of distinct values it contains.
16
RS: Type 1 (self-order, few distinct values): the column is represented by a sequence of triples (v, f, n), where v is the value, f is the position where v first appears, and n is the number of times v appears. For example, (4, 12, 7) means that a run of 4’s occupies positions 12, 13, …, 18 of the column. Type 2 (foreign-order, few distinct values): the column is represented by a sequence of pairs (v, b), where v is a value and b is a bitmap indicating the positions where v appears. For example, 0,0,1,1,2,1,0,2 is encoded as (0, 11000010), (1, 00110100), (2, 00001001).
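The two encodings above can be sketched as follows. This is a simplified illustration: bitmaps are kept as strings of '0' and '1' for readability rather than packed bits, and the function names `encode_runs` and `encode_bitmaps` are hypothetical.

```python
def encode_runs(column):
    """Type 1: (value, first_position, run_length) triples, positions from 1."""
    triples = []
    for position, value in enumerate(column, start=1):
        if triples and triples[-1][0] == value:
            v, f, n = triples[-1]
            triples[-1] = (v, f, n + 1)  # extend the current run
        else:
            triples.append((value, position, 1))  # start a new run
    return triples


def encode_bitmaps(column):
    """Type 2: one (value, bitmap) pair per distinct value."""
    return [
        (value, "".join("1" if x == value else "0" for x in column))
        for value in sorted(set(column))
    ]


print(encode_runs([3, 3, 4, 4, 4]))
print(encode_bitmaps([0, 0, 1, 1, 2, 1, 0, 2]))
```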
17
RS: Type 3 (self-order, many distinct values): every value is represented as a delta from the previous one. For example, 1,4,7,7,8,12 is represented as 1,3,3,0,1,4. Type 4 (foreign-order, many distinct values): the values are left unencoded. Join indexes can be stored as normal columns.
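A minimal sketch of the Type 3 delta scheme, applied to a whole column at once (a real system would likely work block by block, which is not modeled here). The names `delta_encode` and `delta_decode` are hypothetical.

```python
from itertools import accumulate


def delta_encode(column):
    """Keep the first value, then store each value minus its predecessor."""
    if not column:
        return []
    return [column[0]] + [b - a for a, b in zip(column, column[1:])]


def delta_decode(deltas):
    """Recover the original values as running sums of the deltas."""
    return list(accumulate(deltas))


encoded = delta_encode([1, 4, 7, 7, 8, 12])
print(encoded)
print(delta_decode(encoded))
```

On a sorted column with many distinct values the deltas are small non-negative integers, which is what makes them cheaper to store than the raw values.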
18
WS: WS implements the same physical design as RS. Each column in a WS projection is represented as a collection of pairs (v, sk), where v is a value and sk is its corresponding storage key. The pairs are kept in a B-tree on the second field, sk. For example, “Name” is represented as (Alice,1), (Jill,2), (Bob,3) and “Age” as (23,1), (24,2), (25,3).
19
WS: The sort key(s) of each projection is additionally represented by pairs (s, sk), where s is a sort key value and sk is the storage key describing where s first appears. These pairs are kept in a B-tree on the sort key field(s). To perform a search, use this sort-key B-tree to find the storage keys of interest, then use the per-column B-trees to find the other fields of the record. For example, the sort key of EMP1 is “age”, so it is represented as (23,1), (24,2), (25,3).
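The two-step search can be sketched with ordinary Python containers standing in for the B-trees: a list sorted on the sort key plays the role of the sort-key B-tree, and one dictionary per column plays the role of the B-tree on storage key. This is a simplification (it keeps one index entry per record), and the name `lookup` is hypothetical.

```python
from bisect import bisect_left, bisect_right

# Stand-in for the sort-key B-tree: (sort key value, storage key) pairs.
sort_index = [(23, 1), (24, 2), (25, 3)]

# Stand-ins for the per-column B-trees: storage key -> value.
columns = {
    "name": {1: "Alice", 2: "Jill", 3: "Bob"},
    "age": {1: 23, 2: 24, 3: 25},
}


def lookup(low, high):
    """Return the records whose sort key lies in [low, high]."""
    keys = [s for s, _sk in sort_index]
    start, stop = bisect_left(keys, low), bisect_right(keys, high)
    # Step 1: the sort-key index yields the storage keys of interest.
    storage_keys = [sk for _s, sk in sort_index[start:stop]]
    # Step 2: each column is probed by storage key to assemble the record.
    return [{c: col[sk] for c, col in columns.items()} for sk in storage_keys]


print(lookup(24, 25))
```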
20
Tuple Mover: The tuple mover works in four steps: (1) create a new RS segment named RS’; (2) read in the unmarked records from the columns of the old RS segment and merge in column values from WS; (3) update any join indexes; (4) free the disk space used by the old RS segment.
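The merge step can be sketched as below, under two assumptions that the slide does not spell out: "marked" records are taken to be deleted ones, identified here by their position in the old segment, and records are (sort key, payload) tuples. Join index maintenance and space reclamation are not modeled; `merge_out` and the sample names are hypothetical.

```python
from heapq import merge


def merge_out(rs_segment, marked_positions, ws_records):
    """Build RS' from the unmarked RS records plus WS records, in sort order."""
    survivors = [
        record
        for position, record in enumerate(rs_segment, start=1)
        if position not in marked_positions
    ]
    # rs_segment is already sorted; WS records are sorted before merging.
    return list(merge(survivors, sorted(ws_records)))


old_rs = [(23, "Alice"), (24, "Jill"), (25, "Bob")]
ws = [(26, "Dan"), (22, "Eve")]
new_rs = merge_out(old_rs, marked_positions={2}, ws_records=ws)
print(new_rs)
```

Writing a fresh segment instead of updating the old one in place keeps RS strictly sorted and lets the old segment be dropped in one step once RS’ is complete.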
21
Performance Comparison: The performance analysis is limited to read-only queries and reports only single-site numbers. Experiment data: TPC-H at scale 10, which totals 60,000,000 line items (1.8 GB). Seven queries are run on each system: a commercial row store, a commercial column store, and C-Store.
22
Performance Comparison: Space-constrained case.
23
Performance Comparison: Space-unconstrained case.
24
Conclusion: C-Store contributes a column store representation with an associated query execution engine; a hybrid architecture that allows transactions on a column store; a focus on economizing the storage representation on disk; and a data model consisting of overlapping projections of tables.