Published by Maurice Randall. Modified over 9 years ago.
1
Cloud Computing Lecture
Column Store: alternative organization for big relational data
2
C-store
  C-store is read-optimized, for OLAP-type applications
  Traditional DBMSs are write-optimized (optimized for online transactions)
    Based on records (rows)
3
C-Store
  What are the major cost-sensitive factors in query processing?
    Size of database
    Indexed or not
    Joins
  Current hardware configuration and what a DBMS can do…
    Cheap storage: allows distributed, redundant data stores
    Fast CPUs: compression/decompression
    Limited disk bandwidth: reduce I/O
4
C-store
  Supporting OLAP (online analytical processing) operations
    Optimized read operations
    Balanced write performance
  Address the conflict between writes and reads
    Fast write: append records
    Fast read: indexed, compressed
  Think: if data is organized in columns, what are the unique challenges (different from the row organization)?
5
C-store’s features
  Column-based store saves space
    Compression is possible
    Index size is smaller
  Multiple projections
    Allow multiple indices
    Parallel processing on the same attributes
    Materialized join results
  Separation of writeable store and read-optimized store
    Both writes and reads are optimized
    Transactions are not blocked by write locks
6
Data model
  Same as relational data model
    Tables, rows, columns
    Primary keys and foreign keys
  Projections
    From a single table
    From multiple joined tables
  Example
    Normal relational model
      EMP(name, age, dept, salary)
      DEPT(dname, floor)
    Possible C-store model
      EMP1(name, age)
      EMP2(dept, age, DEPT.floor)
      EMP3(name, salary)
      DEPT1(dname, floor)
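A minimal sketch of how the four projections in the example could be derived from the two relational tables. The sample rows and the `project` helper are assumptions made for illustration; the slide gives only the schemas.

```python
# Hypothetical sample rows; the slide gives only the schemas.
EMP = [
    {"name": "Ann", "age": 31, "dept": "Sales", "salary": 50000},
    {"name": "Bob", "age": 25, "dept": "Eng", "salary": 62000},
    {"name": "Cy", "age": 47, "dept": "Sales", "salary": 41000},
]
DEPT = [
    {"dname": "Sales", "floor": 1},
    {"dname": "Eng", "floor": 2},
]


def project(rows, columns):
    """Store a projection column by column: one list per attribute."""
    return {col: [row[col] for row in rows] for col in columns}


# EMP2 carries a column from DEPT, so join EMP with DEPT on dept = dname first.
floor_of = {d["dname"]: d["floor"] for d in DEPT}
emp_with_floor = [{**e, "DEPT.floor": floor_of[e["dept"]]} for e in EMP]

EMP1 = project(EMP, ["name", "age"])
EMP2 = project(emp_with_floor, ["dept", "age", "DEPT.floor"])
EMP3 = project(EMP, ["name", "salary"])
DEPT1 = project(DEPT, ["dname", "floor"])
```

Note that `age` is stored twice (in EMP1 and EMP2): projections may overlap, which costs space but gives the query planner more than one layout to choose from.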
7
Physical projection organization
  Sort key
    Each projection has one
    Rows are ordered by sort key
    Partitioned by key range
  Linking columns in the same projection
    Storage key: (segment id, key), where the key is the offset in the segment
  Linking projections
    To reconstruct a table
    Join index
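The pieces above can be sketched together as follows. This is an illustrative toy, not C-store's actual layout: the `id` field is a hypothetical helper used only while building the join index, and the range boundaries (`bounds`) and sample rows are made up.

```python
from bisect import bisect_right


def build_projection(rows, columns, sort_key, bounds):
    """Sort rows by sort_key, partition them into segments by key range,
    and store each segment column-wise.

    Returns (segments, order, storage_keys): order lists row ids in storage
    order; storage_keys maps row id -> (segment id, offset in segment).
    """
    ordered = sorted(rows, key=lambda r: r[sort_key])
    segments = [{c: [] for c in columns} for _ in range(len(bounds) + 1)]
    storage_keys = {}
    for r in ordered:
        sid = bisect_right(bounds, r[sort_key])  # which key range the row falls in
        offset = len(segments[sid][columns[0]])
        storage_keys[r["id"]] = (sid, offset)
        for c in columns:
            segments[sid][c].append(r[c])
    return segments, [r["id"] for r in ordered], storage_keys


def build_join_index(source_order, target_storage_keys):
    """For each row of the source projection (in storage order), record the
    storage key of the same logical row in the target projection."""
    return [target_storage_keys[i] for i in source_order]


def reconstruct(src_segments, src_columns, join_index, dst_segments, dst_column):
    """Walk the source projection in order and fetch dst_column via the join index."""
    out = []
    entries = iter(join_index)
    for seg in src_segments:
        for offset in range(len(seg[src_columns[0]])):
            sid, off = next(entries)
            row = tuple(seg[c][offset] for c in src_columns)
            out.append(row + (dst_segments[sid][dst_column][off],))
    return out


rows = [
    {"id": 0, "name": "Ann", "age": 31, "salary": 50000},
    {"id": 1, "name": "Bob", "age": 25, "salary": 62000},
    {"id": 2, "name": "Cy", "age": 47, "salary": 41000},
]
p1_segs, p1_order, _ = build_projection(rows, ["name", "age"], "age", bounds=[30])
p2_segs, _, p2_keys = build_projection(rows, ["name", "salary"], "salary", bounds=[45000])
join_index = build_join_index(p1_order, p2_keys)
table = reconstruct(p1_segs, ["name", "age"], join_index, p2_segs, "salary")
```

Columns inside one projection need no such index: they share the same storage key, so the value at a given (segment id, offset) in `name` belongs to the same row as the value at that position in `age`.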
8
Conceptual organization
  [Figure: Projection 1 and Projection 2, each made of columns split into segments by sort key range, with a sort key column; a join index of (seg id, offset) entries links the two projections.]
9
Architectural consideration between writes and reads
  Reads often need indices for speedup
  Writes are often index-unfriendly: indices must be updated frequently
  Use a “read store” and a “write store”
10
Read store: Column encoding
  Use compression schemes and indices
  Self-order (key), few distinct values
    (value, position, # items)
    Indexed by clustered B-tree
  Foreign-order (non-key), few distinct values
    (value, bitmap index)
    B-tree index: position values
  Self-order, many distinct values
    Delta from the previous value
    B-tree index
  Foreign-order, many distinct values
    Unencoded
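The first three encodings can be sketched in a few lines. This is a simplified illustration: positions are 0-based here, the B-tree indices are left out, and the function names are assumptions, not C-store identifiers.

```python
from itertools import accumulate


def rle_encode(column):
    """Self-order, few distinct values: (value, first position, # items) per run."""
    runs = []
    for pos, v in enumerate(column):
        if runs and runs[-1][0] == v:
            value, start, count = runs[-1]
            runs[-1] = (value, start, count + 1)
        else:
            runs.append((v, pos, 1))
    return runs


def bitmap_encode(column):
    """Foreign-order, few distinct values: one bitmap of positions per value."""
    bitmaps = {}
    for pos, v in enumerate(column):
        bitmaps.setdefault(v, [0] * len(column))[pos] = 1
    return bitmaps


def delta_encode(column):
    """Self-order, many distinct values: first value, then delta from the previous value."""
    if not column:
        return []
    return [column[0]] + [b - a for a, b in zip(column, column[1:])]


def delta_decode(deltas):
    """Running sum restores the original sorted values."""
    return list(accumulate(deltas))


sorted_few = [1, 1, 1, 4, 4, 7]
unsorted_few = ["a", "b", "a"]
sorted_many = [1000, 1004, 1007, 1015]

runs = rle_encode(sorted_few)
bitmaps = bitmap_encode(unsorted_few)
deltas = delta_encode(sorted_many)
```

The fourth case (foreign-order, many distinct values) is stored as-is, since neither runs, bitmaps, nor small deltas are available.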
11
Write Store
  Same structure, but explicitly uses (segment, key) to identify records
    Easier to maintain the mapping
    Only concerns the inserted records
  Tuple mover
    Copies batches of records to the read store (RS)
  Delete record
    Mark it on RS
    Purged by tuple mover
12
Tuple mover
  Moves records from the write store (WS) to the read store (RS)
  Happens between read-only transactions
  Uses a merge-out process
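A rough sketch of what a merge-out step might look like, assuming records are `(sort_key, payload)` tuples, RS is already sorted, and deletions are tracked as a set of marked sort keys. All names here are hypothetical; the slides do not describe the real data structures.

```python
import heapq


def merge_out(read_store, write_store, deleted_keys):
    """Merge a batch of WS records into RS and purge records marked deleted.

    read_store   : list of (sort_key, payload), already sorted by sort_key
    write_store  : list of (sort_key, payload), in insertion order
    deleted_keys : set of sort keys marked as deleted
    Returns (new_read_store, new_write_store); the inputs are left untouched,
    so readers can keep using the old RS until the new one is swapped in.
    """
    batch = sorted(write_store)
    merged = heapq.merge(read_store, batch)  # linear merge of two sorted runs
    new_rs = [rec for rec in merged if rec[0] not in deleted_keys]
    return new_rs, []


rs = [(10, "a"), (30, "c")]
ws = [(40, "d"), (20, "b")]
new_rs, new_ws = merge_out(rs, ws, deleted_keys={30})
```

Building a new RS rather than updating in place is one way to keep the move from blocking readers; whether C-store does exactly this is not stated on the slide.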
13
How to solve read/write conflict
  Situation: one transaction updates record X while another transaction reads X
  Use snapshot isolation
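The idea behind snapshot isolation can be illustrated with a toy multi-version store: a reader fixes a snapshot timestamp and sees only versions committed at or before it, so a concurrent update of X never blocks or disturbs the read. The class below is a generic sketch under that assumption, not C-store's mechanism.

```python
class VersionedStore:
    """Toy multi-version store: every write adds a new timestamped version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Commit a new version of key and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Timestamp that a reader uses for all of its reads."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("X", 1)
reader_snapshot = store.snapshot()  # reader starts here
store.write("X", 2)                 # concurrent update of X
seen_by_reader = store.read("X", reader_snapshot)
seen_by_new_reader = store.read("X", store.snapshot())
```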
14
Benefits in query processing
  Selection: has more indices to use
  Projection: some “projections” already defined
  Join: some projections are materialized joins
  Aggregations: work on required columns only
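As a small illustration of the aggregation point, the sketch below computes an average per group while touching only the two columns the query needs. The sample data and the `avg_by` helper are made up for this example.

```python
from collections import defaultdict


def avg_by(columns, group_col, value_col):
    """Average of value_col per group_col value, reading only those two columns."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for g, v in zip(columns[group_col], columns[value_col]):
        totals[g][0] += v
        totals[g][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}


# Hypothetical column-wise table; "name" and "salary" are never read below.
emp_columns = {
    "name": ["Ann", "Bob", "Cy"],
    "dept": ["Sales", "Eng", "Sales"],
    "age": [31, 25, 47],
    "salary": [50000, 62000, 41000],
}
avg_age = avg_by(emp_columns, "dept", "age")
```

In a row store the same query would read every attribute of every record, including the unused `name` and `salary` values.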
15
Evaluation
  Uses TPC-H: decision support queries
  Storage
16
Query performance
17
Row store uses materialized views
18
Summary: the performance gain
  Column representation: avoids reads of unused attributes
  Storing overlapping projections: multiple orderings of a column, more choices for query optimization
  Compression of data: more orderings of a column in the same amount of space
  Query operators operate on compressed representation
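To make the last point concrete, here is a sketch of two operators that work directly on a run-length-encoded column, assuming runs of the form (value, position, # items). The function names are illustrative only.

```python
def rle_sum(runs):
    """SUM over a run-length-encoded column: one multiplication per run."""
    return sum(value * count for value, _position, count in runs)


def rle_count_equal(runs, target):
    """COUNT(*) WHERE column = target, without expanding any run."""
    return sum(count for value, _position, count in runs if value == target)


# Encodes the column [1, 1, 1, 4, 4, 7] as (value, position, # items) runs.
runs = [(1, 0, 3), (4, 3, 2), (7, 5, 1)]
total = rle_sum(runs)
fours = rle_count_equal(runs, 4)
```

The work is proportional to the number of runs rather than the number of rows, which is where the gain comes from when a sorted column has few distinct values.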