Cloud Computing Lecture Column Store – alternative organization for big relational data.

C-store
- C-store is read-optimized, for OLAP-type applications
- Traditional DBMS: write-optimized (optimized for online transactions)
  - Based on records (rows)

C-store
- What are the major cost-sensitive factors in query processing?
  - Size of database
  - Index or not
  - Join
- Current hardware configuration and what a DBMS can do…
  - Cheap storage: allows distributed, redundant data stores
  - Fast CPUs: compression/decompression
  - Limited disk bandwidth: reduce I/O

C-store
- Supporting OLAP (online analytical processing) operations
  - Optimized read operations
  - Balanced write performance
- Address the conflict between writes and reads
  - Fast write: append records
  - Fast read: indexed, compressed
- Think: if data is organized in columns, what are the unique challenges (different from row organization)?

C-store’s features
- Column-based store saves space
  - Compression is possible
  - Index size is smaller
- Multiple projections
  - Allow multiple indices
  - Parallel processing on the same attributes
  - Materialized join results
- Separation of writeable store and read-optimized store
  - Both writes and reads are optimized
  - Transactions are not blocked by write locks

Data model
- Same as relational data model
  - Tables, rows, columns
  - Primary keys and foreign keys
- Projections
  - From a single table
  - From multiple joined tables
- Example
  - Normal relational model: EMP(name, age, dept, salary), DEPT(dname, floor)
  - Possible C-store model: EMP1(name, age), EMP2(dept, age, DEPT.floor), EMP3(name, salary), DEPT1(dname, floor)
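
The example above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the `project` helper, and the choice of sort keys are assumptions and do not come from the slides.

```python
# Logical tables from the slide; the sample rows are made up for illustration.
EMP = [
    {"name": "Ann", "age": 31, "dept": "Sales", "salary": 5000},
    {"name": "Bob", "age": 25, "dept": "R&D", "salary": 6500},
    {"name": "Cid", "age": 42, "dept": "Sales", "salary": 7000},
]
DEPT = [
    {"dname": "Sales", "floor": 1},
    {"dname": "R&D", "floor": 2},
]


def project(rows, columns, sort_key):
    """Hypothetical helper: store the chosen columns one list per column,
    with rows ordered by sort_key."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    return {c: [r[c] for r in ordered] for c in columns}


# EMP1(name, age): a projection of a single table (sort key assumed: age).
emp1 = project(EMP, ["name", "age"], sort_key="age")

# EMP2(dept, age, DEPT.floor): a projection over EMP joined with DEPT
# through the foreign key EMP.dept -> DEPT.dname (sort key assumed: floor).
floor_of = {d["dname"]: d["floor"] for d in DEPT}
joined = [{**e, "DEPT.floor": floor_of[e["dept"]]} for e in EMP]
emp2 = project(joined, ["dept", "age", "DEPT.floor"], sort_key="DEPT.floor")

print(emp1)
print(emp2)
```

Note that EMP2 already contains the result of the EMP/DEPT join, so a query asking for ages per floor would not need to join at run time.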

Physical projection organization
- Sort key
  - Each projection has one
  - Rows are ordered by sort key
  - Partitioned by key range
- Linking columns in the same projection
  - Storage key: (segment id, key), where the key is the offset in the segment
- Linking projections
  - To reconstruct a table
  - Join index
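
A minimal sketch of storage keys and a join index, under simplifying assumptions: segments are cut at a fixed size rather than by key range, the join index is built by matching on a column assumed to be unique (`name`), and all function names are hypothetical.

```python
def build_projection(rows, columns, sort_key, seg_size):
    """Sort rows by sort_key and cut them into segments.
    Fixed-size segments stand in for key-range partitioning."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    segments = []
    for start in range(0, len(ordered), seg_size):
        chunk = ordered[start:start + seg_size]
        segments.append({c: [r[c] for r in chunk] for c in columns})
    return segments


def fetch(segments, storage_key, column):
    """Look up one value by storage key (segment id, offset in segment)."""
    sid, offset = storage_key
    return segments[sid][column][offset]


def build_join_index(src, dst, match_col):
    """For each row of src, in storage order, record the storage key of the
    corresponding row in dst. Assumes match_col values are unique."""
    where = {}
    for sid, seg in enumerate(dst):
        for off, value in enumerate(seg[match_col]):
            where[value] = (sid, off)
    return [[where[value] for value in seg[match_col]] for seg in src]


EMP = [
    {"name": "Ann", "age": 31, "salary": 5000},
    {"name": "Bob", "age": 25, "salary": 6500},
    {"name": "Cid", "age": 42, "salary": 7000},
    {"name": "Dee", "age": 38, "salary": 4000},
]
emp1 = build_projection(EMP, ["name", "age"], "age", seg_size=2)
emp3 = build_projection(EMP, ["name", "salary"], "salary", seg_size=2)
join_index = build_join_index(emp1, emp3, "name")

# Reconstruct (name, age, salary) rows in EMP1 order by following the join index.
rows = []
for sid, seg in enumerate(emp1):
    for off in range(len(seg["name"])):
        target = join_index[sid][off]
        rows.append((seg["name"][off], seg["age"][off], fetch(emp3, target, "salary")))
print(rows)
```

Within one projection, the same (segment id, offset) pair addresses the same logical row in every column, which is why no per-column row identifier is needed there.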

Conceptual organization
[Figure labels: column; segment (by sort key range); sort key column; join index (seg id, offset); Projection 1; Projection 2]

Architectural consideration between writes and reads
- Read often → needs indices to speed up
- Write often → index-unfriendly: indices must be updated frequently
- Use a “read store” and a “write store”

Read store: Column encoding
- Use compression schemes and indices
- Self-order (key), few distinct values
  - (value, position, # items)
  - Indexed by clustered B-tree
- Foreign-order (non-key), few distinct values
  - (value, bitmap index)
  - B-tree index: position → values
- Self-order, many distinct values
  - Delta from the previous value
  - B-tree index
- Foreign-order, many distinct values
  - Unencoded
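
The three compressed encodings listed above can be sketched as follows. This is a simplified illustration: the function names are hypothetical, bitmaps are held as Python integers, and the B-tree indices from the slide are left out.

```python
from itertools import accumulate


def rle_encode(sorted_col):
    """Self-order, few distinct values: (value, first position, # items) triples."""
    runs = []
    for pos, value in enumerate(sorted_col):
        if runs and runs[-1][0] == value:
            v, first, count = runs[-1]
            runs[-1] = (v, first, count + 1)
        else:
            runs.append((value, pos, 1))
    return runs


def bitmap_encode(col):
    """Foreign-order, few distinct values: one bitmap per distinct value.
    Bit i of a bitmap is set when row i holds that value."""
    bitmaps = {}
    for pos, value in enumerate(col):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << pos)
    return bitmaps


def delta_encode(sorted_col):
    """Self-order, many distinct values: first value, then differences
    from the previous value."""
    if not sorted_col:
        return []
    return [sorted_col[0]] + [b - a for a, b in zip(sorted_col, sorted_col[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


print(rle_encode([1, 1, 1, 2, 2, 5]))
print(bitmap_encode(["a", "b", "a", "c"]))
print(delta_encode([100, 103, 104, 110]))
```

The fourth case (foreign-order, many distinct values) has neither runs nor small deltas to exploit, which is why the slide leaves it unencoded.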

Write store
- Same structure, but explicitly uses (segment, key) to identify records
  - Easier to maintain the mapping
  - Only concerns the inserted records
- Tuple mover
  - Copies batches of records to the read store (RS)
- Delete record
  - Mark it on RS
  - Purged by tuple mover

Tuple mover
- Moves records from the write store (WS) to RS
- Runs between read-only transactions
- Uses a merge-out process
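
A rough sketch of what a merge-out step might look like, assuming records are (sort key, payload) tuples, the read store is already sorted, and deletions are tracked as a set of marked keys. The function name and data layout are assumptions, not the actual C-store implementation.

```python
import heapq


def merge_out(read_store, write_store, deleted_keys):
    """Build a new read store: purge RS records marked as deleted, then merge
    the surviving RS records with the WS batch in sort-key order.
    The inputs are left untouched, so readers can keep using the old RS."""
    survivors = [rec for rec in read_store if rec[0] not in deleted_keys]
    incoming = sorted(write_store, key=lambda rec: rec[0])
    return list(heapq.merge(survivors, incoming, key=lambda rec: rec[0]))


rs = [(10, "a"), (20, "b"), (40, "d")]   # sorted read store
ws = [(30, "c"), (5, "z")]               # appended in arrival order
new_rs = merge_out(rs, ws, deleted_keys={20})
print(new_rs)
```

Writing a new read store instead of updating the old one in place is what lets this work run alongside read-only queries.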

How to solve read/write conflict
- Situation: one transaction updates record X while another transaction reads X
- Use snapshot isolation
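
The idea can be illustrated with a toy versioned store. The epoch counter, the version layout, and the class name are assumptions made for this sketch; the slides do not describe the mechanism in detail.

```python
class VersionedStore:
    """Toy store where each update creates a new version stamped with an epoch."""

    def __init__(self):
        self.epoch = 0
        # Each version: [key, value, inserted_at, deleted_at or None]
        self.versions = []

    def update(self, key, value):
        """Writer: retire the live version of key (if any), then add a new one."""
        self.epoch += 1
        for version in self.versions:
            if version[0] == key and version[3] is None:
                version[3] = self.epoch
        self.versions.append([key, value, self.epoch, None])

    def snapshot(self):
        """Reader: remember the current epoch as the snapshot to read from."""
        return self.epoch

    def read(self, key, as_of):
        """Return the value of key that was live at epoch as_of, else None."""
        for k, value, inserted_at, deleted_at in self.versions:
            live = inserted_at <= as_of and (deleted_at is None or deleted_at > as_of)
            if k == key and live:
                return value
        return None


store = VersionedStore()
store.update("X", 1)
snap = store.snapshot()          # a read-only transaction starts here
store.update("X", 2)             # a concurrent writer updates X
old_view = store.read("X", snap)
new_view = store.read("X", store.snapshot())
print(old_view, new_view)
```

The reader that started before the update keeps seeing the old value of X, so neither transaction has to wait for the other.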

Benefits in query processing
- Selection: more indices to use
- Projection: some “projections” are already defined
- Join: some projections are materialized joins
- Aggregation: works on the required columns only

Evaluation
- Use TPC-H: decision support queries
- Storage

Query performance

Row store uses materialized views

Summary: the performance gain
- Column representation: avoids reading unused attributes
- Storing overlapping projections: multiple orderings of a column, more choices for query optimization
- Compression of data: more orderings of a column in the same amount of space
- Query operators operate on the compressed representation
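
As a small illustration of the last point, aggregates can be computed directly on run-length-encoded (value, position, # items) triples without expanding them back into rows. The helper names and the sample runs are hypothetical.

```python
def rle_sum(runs):
    """Sum a column from its runs: each run contributes value * count."""
    return sum(value * count for value, _first, count in runs)


def rle_count(runs):
    """Number of rows represented by the runs."""
    return sum(count for _value, _first, count in runs)


def rle_count_where(runs, predicate):
    """Count rows whose value satisfies predicate, testing once per run."""
    return sum(count for value, _first, count in runs if predicate(value))


# An 'age' column with 6 rows stored as 3 runs.
age_runs = [(25, 0, 2), (31, 2, 3), (42, 5, 1)]

total = rle_sum(age_runs)
n = rle_count(age_runs)
over_30 = rle_count_where(age_runs, lambda age: age > 30)
print(total, n, over_30)
```

Only the one column involved is touched, and the work is proportional to the number of runs rather than the number of rows.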
