Paritosh Aggarwal Rushi Nadimpally

Slides:



Advertisements
Similar presentations
1 CSIS 7102 Spring 2004 Lecture 9: Recovery (approaches) Dr. King-Ip Lin.
Advertisements

Transaction.
Presented by Vigneshwar Raghuram
File Management Chapter 12. File Management A file is a named entity used to save results from a program or provide data to a program. Access control.
C-Store: Updates Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 15, 2009.
BTrees & Bitmap Indexes
File Management.
1.1 CAS CS 460/660 Introduction to Database Systems File Organization Slides from UC Berkeley.
Distributed Databases
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
Cloud Computing Lecture Column Store – alternative organization for big relational data.
File Management Chapter 12. File Management File management system is considered part of the operating system Input to applications is by means of a file.
C-Store: A Column-oriented DBMS Speaker: Zhu Xinjie Supervisor: Ben Kao.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
1 C-Store: A Column-oriented DBMS New England Database Group (Stonebraker, et al. Brandeis/Brown/MIT/UMass-Boston) Extended for Big Data Reading Group.
1 C-Store: A Column-oriented DBMS By New England Database Group.
C-Store: How Different are Column-Stores and Row-Stores? Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 8, 2009.
C-Store: Concurrency Control and Recovery Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Jun. 5, 2009.
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
G063 - Distributed Databases. Learning Objectives: By the end of this topic you should be able to: explain how databases may be stored in more than one.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
Database structure and space Management. Database Structure An ORACLE database has both a physical and logical structure. By separating physical and logical.
C-Store: Data Model and Data Organization Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May 17, 2010.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
Database Recovery Techniques
Database Recovery Techniques
Virtual memory.
Jonathan Walpole Computer Science Portland State University
Module 11: File Structure
CPS216: Data-intensive Computing Systems
CHP - 9 File Structures.
Chapter 11: File System Implementation
Lecture 16: Data Storage Wednesday, November 6, 2006.
CSE-291 (Cloud Computing) Fall 2016
File System Implementation
Database Management Systems (CS 564)
Database System Implementation CSE 507
Review.
Lecture 11: DMBS Internals
Disk Storage, Basic File Structures, and Buffer Management
Database management concepts
Physical Database Design
Module 11: Data Storage Structure
Main Memory Background Swapping Contiguous Allocation Paging
Introduction to Database Systems
Lecture 15: Bitmap Indexes
Selected Topics: External Sorting, Join Algorithms, …
Printed on Monday, December 31, 2018 at 2:03 PM.
Lecture 3: Main Memory.
The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited)
File Storage and Indexing
Database management concepts
Chapter 13: Data Storage Structures
Contents Memory types & memory hierarchy Virtual memory (VM)
CSTORE E0261 Jayant Haritsa Computer Science and Automation
ICOM 5016 – Introduction to Database Systems
Database Recovery 1 Purpose of Database Recovery
Data Structures & Algorithms
Chapter 13: Data Storage Structures
Lecture Topics: 11/20 HW 7 What happens on a memory reference Traps
Chapter 13: Data Storage Structures
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #03 Row/Column Stores, Heap Files, Buffer Manager, Catalogs Instructor: Chen Li.
Lecture 20: Representing Data Elements
Presentation transcript:

Paritosh Aggarwal Rushi Nadimpally C-Store Paritosh Aggarwal Rushi Nadimpally 11/13/2018 EECS 584, Fall 2011

Agenda – Part I Motivation and Basic Idea C-Store Data Model Bimodal Architecture Compression Schemes Working on compressed data 11/13/2018 EECS 584, Fall 2011

Agenda – Part II Transactions and Updates C-Store Performance Experiments on Compression Conclusions 11/13/2018 EECS 584, Fall 2011

Agenda Motivation and Basic Idea C-Store Data Model Bimodal Architecture Compression Schemes Working on compressed data 11/13/2018 EECS 584, Fall 2011

Motivation and Basic Idea Row stores Traditional systems Most natural order for storing data Can we do better? Reads generally involve only a subset of the attributes Row stores perform well on OLTP workloads – small reads and writes with B-Trees, etc. They fare badly in read intensive applications e.g. OLAP 11/13/2018 EECS 584, Fall 2011

New Idea! What is C-Store? Store data by columns Read optimization for OLAP applications C-Stores lend themselves to a variety of other advantages over row stores 11/13/2018 EECS 584, Fall 2011

Main Strengths Read in only required columns COMPRESSION – probably the USP for C-Stores A contiguous sequence of domain limited values lend themselves very nicely to compression schemes Do away with padding You don’t even have to decompress sometimes! SELECT AVG(salary) from EMP: Work on RLE encoded data all along Memory and disk bandwidth is not as abundant as one would like – Compression is a good idea (better cache performance, lower page faults, etc.) 11/13/2018 EECS 584, Fall 2011

There’s no free lunch – Main Drawbacks To answer many queries, joins must be calculated on projections Join indices must be maintained between pairs of relations Leads to added complexity 11/13/2018 EECS 584, Fall 2011

Agenda C-Store Data Model Motivation and Basic Idea Bimodal Architecture Compression Schemes Working on compressed data 11/13/2018 EECS 584, Fall 2011

Data Model Standard relational logical data model Difference in physical representation – you can think of the relational interface as a view Projections are implemented and stored in column-order Projections represented as ti (<names of fields>) Move to #5 11/13/2018 EECS 584, Fall 2011

Data Model Projections Advantages in a distributed system One column may appear in multiple projections Aggressive compression ~ no explosion in space requirements All projects must form a covering on all attributes of the relation Advantages in a distributed system High availability which actually contributes something useful! K-safety 11/13/2018 EECS 584, Fall 2011

Segments, storage keys Every projection is horizontally partitioned into 1 or more segments, each with a segment identifier Sid Each segment is associated with a key (key range to be precise) The projections must form a covering of all columns All rows should be reconstructable from the given set of projections 11/13/2018 EECS 584, Fall 2011

Each segment associates every data value of every column with a storage key, SK Name Age Dept Salary 1 Bob 25 Math 10K 2 Bill 27 EECS 50K 3 Jill 24 Biology 80k 1 2 3 Name Age Dept Salary Items in numbers represent storage keys. Only one segment is shown, but this may have been divided into multiple segments 11/13/2018 EECS 584, Fall 2011

Sort orders, join indices Join indices: A mapping from one sort order to another sort order To reconstruct T segments Ti: find a path through a set of join indices that maps each attribute of T into some sort order O*. Formally: a join index from M segments in T1 to N segments in T2 is a collection of M tables, one per segment S of T1, consisting of rows of the form: (s: SID in T2, k:Storage Key in Segment s) 11/13/2018 EECS 584, Fall 2011

Agenda Bimodal Architecture Motivation and Basic Idea C-Store Data Model Bimodal Architecture Compression Schemes Working on compressed data 11/13/2018 EECS 584, Fall 2011

Bimodal Architecture We cannot ignore reads Retaining sort order on columns difficult while insertion Sub-optimal query performance if insertion order allowed in columns Simply consider: if we maintain insertion order on a given projection, the various advantages of sorting over a certain attribute go away (indexing, compression) 11/13/2018 EECS 584, Fall 2011

Bimodal Architecture Small WS : write-opt Larger RS : read-opt Tuple mover : transport from WS to RS 11/13/2018 EECS 584, Fall 2011

Bimodal Architecture Writes sent to WS / Reads from RS / Updates = Insert + Delete Tuple-Mover uses LSM-tree to bulk-move WS objects into RS Locking Snapshot isolation: queries are run in historical mode – read only queries run only at snapshot T, and are semantically accurate 11/13/2018 EECS 584, Fall 2011

Agenda Compression Schemes Motivation and Basic Idea C-Store Data Model Bimodal Architecture Compression Schemes Working on compressed data 11/13/2018 EECS 584, Fall 2011

Integrating Compression and Execution in Column-Oriented Database Systems Benefits of Compression Improved seek times Reduced transfer times Higher buffer hit rate CPU overhead of decompression compensated by I/O improvement Sometimes, compressed data can be worked on directly (shown later) 11/13/2018 EECS 584, Fall 2011

Various Compression Schemes Null Suppression Leading zeros or blanks in data are deleted and replaced with the location and number pair. Field sizes are allowed to be variable to avoid padding. Exact size of field is stored before the field, in two bits Run-length Encoding Runs of same value are replaced with triples (value, start position, run_length) Example: 111000000110000000000 may become 111110010110110 State their use cases - Add examples 11/13/2018 EECS 584, Fall 2011

Various Compression Schemes Dictionary Encoding 32 values for an attribute A  5 bits to represent it 1 value in 1 byte, 3 values in 2 bytes, 4 values in 3 bytes, etc. If 5 is 00000, 25 is 00001; 31 is 00010, then: X000000000100010 -> 31 25 1 Decoding: Read in two bytes, lookup bit pattern in dictionary. 11/13/2018 EECS 584, Fall 2011

Various Compression Schemes Bit-Vector Encoding Original String: 1132231 Compressed String(maybe stored in a table): 1 => 1100001 2 => 0001100 3 => 0010010 Lempel-Ziv Encoding Takes variable sized patterns and replaces them with fixed length codes Used by the unix program gzip 11/13/2018 EECS 584, Fall 2011

Lempel – Ziv Algorithm (Example thanks to: http://www.data-compression.com/lempelziv.html) Initialize the dictionary to contain all blocks of length one (D={a,b}). Search for the longest block W which has appeared in the dictionary. Encode W by its index in the dictionary. Add W followed by the first symbol of the next block to the dictionary. Go to Step 2. 11/13/2018 EECS 584, Fall 2011

Agenda Working on compressed data Motivation and Basic Idea C-Store Data Model Bimodal Architecture Compression Schemes Working on compressed data 11/13/2018 EECS 584, Fall 2011

Example

Compression-Aware Optimizations - Comments The example illustrates how processing can be done directly and efficiently on compressed data. It also shows how easily the code can digress to a separate if..then..else block for each section. What’s the solution? 11/13/2018 EECS 584, Fall 2011

Compression-Aware Optimizations -Solution Code based on properties of compression that actually determine the optimization SELECT c1, COUNT(*) FROM t GROUP BY c1 11/13/2018 EECS 584, Fall 2011

Agenda – Part II Transactions and Updates C-Store Performance Experiments on Compression Conclusions 11/13/2018 EECS 584, Fall 2011

Agenda – Part II Transactions and Updates C-Store Performance Experiments on Compression Conclusions 11/13/2018 EECS 584, Fall 2011

Updates & Transactions Dealt with in Writeable Store. Insertion is just a collection in WS. Implementation is shielded logically: User still deals with logical rows. No synchronization needed between nodes. Large main memories: Not an issue nowadays. 11/13/2018 EECS 584, Fall 2011

Snapshot Isolation Snapshots of RS at different epoch periods, after writes committed. Reads view the closest snapshot. Terms: High Water Mark: Most recent time at which a snapshot isolation can be run. Low Water Mark: The earliest time at which snapshot isolation can be run. IV (Insertion Vector) & DRV (Deleted Record Vector), maintained in WS. 11/13/2018 EECS 584, Fall 2011

Maintaining HWM HWM maintained by the Timestamp Authority(TA) and a system of messages. Reclamation of epochs to conserve space. 11/13/2018 EECS 584, Fall 2011

Locking-Based Concurrency No Force +Steal, locking mechanism. Distributed Commit Processing: Master distributes the work. No prepare phase. Not the conventional two phase mechanism. Transaction Rollback: Using UNDO log 11/13/2018 EECS 584, Fall 2011

Distributed Recovery Loss cases : No Data loss: Direct Recovery Track updates in the network. Catastrophic failure: RS and WS eliminated. Reconstruct using projections in other nodes. Not a likely scenario as only tuple mover writes to RS. Only WS fails: Most Likely 11/13/2018 EECS 584, Fall 2011

Recovering Writeable Store Find all the elements in the WS after that latest timestamp in RS. Are all columns overlapped? If yes, Done !! No? Then query remote site for all its content and perform a differencing operation with the local RS segment. Tuple Mover maintains a log of the segment , epoch number of tuple before it was moved from WS. 11/13/2018 EECS 584, Fall 2011

Tuple Mover Operates as a background task. Records on the DRV list are deleted. Records on IV list are inserted/updated. Join indexes are maintained. New RS segment created. Old one is not updated !! Old RS records are copied in bulk and system now ignores original RS segment. 11/13/2018 EECS 584, Fall 2011

Query Execution: Operators Decompress Select Mask Project Sort Aggregation Concatenation Permute Join Bit operations Optimization Cost depends on Compression type. A major decision: Which projections to use to answer a query ? 11/13/2018 EECS 584, Fall 2011

Agenda – Part II C-Store Performance Transactions and Updates Experiments on Compression Conclusions 11/13/2018 EECS 584, Fall 2011

How good is C-Store ? No write/update measurements !! 3.0 Ghz Pentium, RedHat Linux, 2 GB of memory. 750 GB of disk. 11/13/2018 EECS 584, Fall 2011

Why is it better? By How much? Reasons : Column representation Storing overlapping projections Better compression Operations on compressed data. Figures: 164 times X row store; 21 times X column store. Unconstrained space: 6.4 X row store (with space X 1/6) 16.5 X column store. (with space X 1/1.83) 11/13/2018 EECS 584, Fall 2011

Agenda – Part II Experiments on Compression Transactions and Updates C-Store Performance Experiments on Compression Conclusions 11/13/2018 EECS 584, Fall 2011

Experiments on Compression Purpose: To identify situations where encoding provides significant gains. Test Variations: Eager Decompression. Lazy Decompression. (After computation) Tests with CPU contention Identify performance criteria: The length of sorted runs The number of distinct values 11/13/2018 EECS 584, Fall 2011

Savings in Size Run length: 50 Run length: 1000 11/13/2018 EECS 584, Fall 2011

Improvements in Speed Run length: 50 Run length:1000 11/13/2018 EECS 584, Fall 2011

Computing on Compressed Data Run length: 50 Run length: 1000 11/13/2018 EECS 584, Fall 2011

Does CPU Contention matter in a database system ? 11/13/2018 EECS 584, Fall 2011

Extreme Cases: (High Cardinality) Run length:14 Run length:1 11/13/2018 EECS 584, Fall 2011

Relative Performance 11/13/2018 EECS 584, Fall 2011

How applicable is it to Real-World data ? 11/13/2018 EECS 584, Fall 2011

How Does Ordering Matter ? 11/13/2018 EECS 584, Fall 2011

Agenda – Part II Conclusions Transactions and Updates C-Store Performance Experiments on Compression Conclusions 11/13/2018 EECS 584, Fall 2011

Conclusions on C-Store: Column store representation. Hybrid architecture . Data Compression. Overlapping projections. Distributed transactions without redo log or two phase commit. Efficient Snapshot isolation. 11/13/2018 EECS 584, Fall 2011

Decision Tree for Compression Are these moderate length runs ? (Average size>4) Solution: Compress columns with RLE. Is the number of distinct values large (>50,000) ? Is this column likely to be used in a contiguous fashion ? Solution: Use Dictionary Encoding Is the number of distinct values > 50 Solution: Use Bit Vector Encoding. Does the data exhibit good locality ? Solution: Use LZ Solution: Don’t compress Yes No Yes No Yes Yes No No Yes 11/13/2018 EECS 584, Fall 2011

Conclusions on Compression Low cardinality items on the left side of the sorting order. (Provides larger runs) Its a good idea to operate directly on the compressed data. Sacrificing heavy weight compression schemes for those which can be efficiently computed on is a good idea. Compression-Aware Optimizer Taking only IO costs is a bad idea !! 11/13/2018 EECS 584, Fall 2011