1 C-Store: A Column-oriented DBMS New England Database Group (Stonebraker, et al. Brandeis/Brown/MIT/UMass-Boston) Extended for Big Data Reading Group.

Presentation transcript:

1 C-Store: A Column-oriented DBMS New England Database Group (Stonebraker, et al. Brandeis/Brown/MIT/UMass-Boston) Extended for Big Data Reading Group Presentation by Shimin Chen

2 M.I.T Relational Database
[Figure: a table with rows Record 1, Record 2, Record 3 and columns Attribute1, Attribute2, Attribute3]
E.g.
• Customer(cid, name, address, discount)
• Product(pid, name, manufacturer, price, quantity)
• Order(oid, cid, pid, quantity)

3 Current DBMS -- “Row Store”
[Figure: Records 1 to 4, each stored as a whole row]
E.g. DB2, Oracle, Sybase, SQLServer, …

4 Row Stores are Write Optimized (use white board)
• Store fields in one record contiguously on disk
• Use small (e.g. 4K) disk blocks
• Use B-tree indexing
• Align fields on byte or word boundaries
• Assume shifting data values is costly
• Transactions: write-ahead logging

5 Row Stores are Write Optimized
• Can insert and delete a record in one physical write
• Good for on-line transaction processing (OLTP)
• But not for read-mostly applications
  – Data warehouses
  – Customer Relationship Management (CRM)
  – Electronic library card catalogs
  – …

6 Column Stores

7 At 100K Feet….
• Read-optimized:
  – Periodically a bulk load of new data
  – Long period of ad-hoc queries
• Benefit:
  – Ad-hoc queries read 2 columns out of 20
  – Column store reads 10% of what a row store reads
• Previous pioneering work:
  – Sybase IQ (early ’90s)
  – Monet (see CIDR ’05 for the most recent description)
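
The 10% figure on this slide is simple arithmetic, sketched below. It assumes every column has the same width and that nothing is compressed, neither of which the slide states; the function name `fraction_read` is made up for this illustration.

```python
def fraction_read(columns_needed: int, total_columns: int) -> float:
    """Fraction of a table's bytes a column store must read.

    Assumption (not from the slides): all columns are equally wide and
    uncompressed, so bytes read scale with the number of columns touched.
    A row store reads every column of every row, i.e. a fraction of 1.0.
    """
    if not 0 < columns_needed <= total_columns:
        raise ValueError("columns_needed must be between 1 and total_columns")
    return columns_needed / total_columns


# The slide's example: an ad-hoc query touching 2 of 20 columns.
print(f"{fraction_read(2, 20):.0%} of what a row store reads")
```

With compression the real ratio can be lower still, since a compressed column occupies fewer blocks than its raw width suggests.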

8 C-Store Technical Ideas
• Data storage:
  – Only materialized views (perhaps many)
  – Compress the columns to save space
  – No alignment
  – Big disk blocks
• Innovative redundancy
• Optimize for grid (cluster) computing
• Focus on sorting, not indexing
• Automatic physical DBMS design
• Column optimizer and executor

9 How to Evaluate This Paper….
• None of the ideas in isolation merit publication
• Judge the complete system by its (hopefully intelligent) choice of
  – Small collection of inter-related powerful ideas
  – That together put performance in a new sandbox

10 Outline
• Overview
• Read-optimized column store
• Query execution and optimization
• Handling transactional updates
• Performance
• Summary

11 Data Model
• Projection (materialized view):
  – some number of columns from a fact table
  – plus columns in a dimension table, with a 1-n join between Fact and Dimension table
  – (conceptually) no duplicate elimination
  – Stored in order of a storage key(s)
• Note: base table is not stored anywhere

12 Example
• Logical base tables:
  – EMP (name, age, salary, dept)
  – DEPT (dname, floor)
• Example projections (the column after “|” is the sort key):
  – EMP1 (name, age | age)
  – EMP2 (dept, age, DEPT.floor | DEPT.floor)
  – EMP3 (name, salary | salary)
  – DEPT1 (dname, floor | floor)
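
To make the projections on this slide concrete, here is a minimal sketch that materializes EMP1, EMP2 and EMP3 from a tiny in-memory EMP and DEPT. The sample rows, the dictionary-of-lists layout and the helper `make_projection` are assumptions for illustration, not C-Store's actual storage format.

```python
from operator import itemgetter

EMP_COLUMNS = ("name", "age", "salary", "dept")
EMP = [  # made-up sample rows
    ("Jill", 24, 80000, "Biology"),
    ("Bob", 25, 10000, "Math"),
    ("Bill", 27, 50000, "EECS"),
]
DEPT = {"Math": 1, "EECS": 2, "Biology": 3}  # dname -> floor (made up)


def make_projection(rows, columns, sort_key):
    """Return {column: values} with every column sorted on sort_key."""
    ordered = sorted(rows, key=itemgetter(sort_key))
    return {c: [r[c] for r in ordered] for c in columns}


emp_rows = [dict(zip(EMP_COLUMNS, r)) for r in EMP]

# EMP1 (name, age | age)
emp1 = make_projection(emp_rows, ("name", "age"), "age")

# EMP2 (dept, age, DEPT.floor | DEPT.floor): floor comes from DEPT via dept.
joined = [{**r, "DEPT.floor": DEPT[r["dept"]]} for r in emp_rows]
emp2 = make_projection(joined, ("dept", "age", "DEPT.floor"), "DEPT.floor")

# EMP3 (name, salary | salary)
emp3 = make_projection(emp_rows, ("name", "salary"), "salary")
```

Note that the same employee sits at a different position in each projection, which is why reconstructing a full row needs extra bookkeeping.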

13 Optimize for Grid Computing
• I.e. shared-nothing
• Horizontal partitioning and intra-query parallelism as in Gamma
• Paper talks about “Grid computers … may have tens to hundreds of nodes …”

14 Projection Detail #1
• Each projection is horizontally partitioned into “segments”
  – Segment identifier
  – Unit of distribution and parallelism
  – Value-based partitioning, key range of sort key(s)
• Column-wise store inside segment
• Storage key: ordinal record number in segment
  – calculated as needed
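
The sketch below illustrates value-based partitioning into segments and the idea that a storage key is just a record's ordinal position inside its segment. The function names, the inclusive `upper_bounds` convention and the 0-based storage keys are assumptions made for this example.

```python
from bisect import bisect_left


def partition_by_key_range(rows, sort_key, upper_bounds):
    """Split rows into len(upper_bounds) + 1 segments by sort-key range.

    upper_bounds[i] is the largest key allowed in segment i (an assumed
    convention); keys above the last bound land in the final segment.
    """
    segments = [[] for _ in range(len(upper_bounds) + 1)]
    for row in sorted(rows, key=lambda r: r[sort_key]):
        segments[bisect_left(upper_bounds, row[sort_key])].append(row)
    return segments


def to_columns(segment_rows):
    """Store one segment column-wise: {column name: list of values}."""
    columns = {}
    for row in segment_rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"name": "Bill", "age": 27},
    {"name": "Jill", "age": 24},
    {"name": "Bob", "age": 25},
]
segments = [to_columns(s) for s in partition_by_key_range(rows, "age", [25])]

# The storage key is never stored: it is the list index (0-based here).
segment_id, storage_key = 0, 1
print(segments[segment_id]["name"][storage_key])
```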

15 Projection Detail #2
• Different encoding schemes for different columns
• Depends on ordering and value distribution
  – Self-order, few distinct values: (value, position, num_entries)
  – Foreign-order, few distinct values: (value, bitmap), bitmap is run-length encoded
  – Self-order, many distinct values: block-oriented, delta value encoding
  – Foreign-order, many distinct values: gzip
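
Two of the encodings listed above can be sketched in a few lines: the (value, position, num_entries) triples for a sorted column with few distinct values, and delta encoding for a sorted column with many distinct values. This is an illustrative sketch only; positions are 0-based here, the delta version ignores block boundaries, and the function names are not from the paper.

```python
def rle_encode(column):
    """Encode a sorted column as (value, first_position, num_entries) triples."""
    triples = []
    for position, value in enumerate(column):
        if triples and triples[-1][0] == value:
            v, first, count = triples[-1]
            triples[-1] = (v, first, count + 1)
        else:
            triples.append((value, position, 1))
    return triples


def rle_decode(triples):
    """Expand (value, first_position, num_entries) triples back to a column."""
    column = []
    for value, _first, count in triples:
        column.extend([value] * count)
    return column


def delta_encode(block):
    """Keep the first value, then store each value's difference from its predecessor."""
    if not block:
        return []
    return [block[0]] + [b - a for a, b in zip(block, block[1:])]


def delta_decode(deltas):
    """Rebuild the original values by running-summing the deltas."""
    values, total = [], 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        values.append(total)
    return values


print(rle_encode([1, 1, 1, 2, 2, 5]))
print(delta_encode([1, 4, 7, 7, 8, 12]))
```

Run-length triples pay off only when the column is stored in its own sort order, which is why the slide distinguishes self-order from foreign-order columns.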

16 Different Indexing
• Sequential (self-order), few values: RLE encoded; conventional B-tree at the value level
• Sequential (self-order), many values: delta encoded; conventional B-tree at the block level
• Non-sequential (foreign-order), few values: bitmap per value
• Non-sequential (foreign-order), many values: conventional gzip; conventional B-tree at the block level

17 Big Disk Blocks
• Tunable
• Big (minimum size is 64K)

18 Reconstructing Base Table from Projections
• Join index:
  – Projection T1 has M segments, projection T2 has N segments
  – T1 and T2 are on the same base table
  – Join index consists of M tables, one per T1 segment
  – Entry: segment ID and storage key of corresponding record in T2
• In general, needs multiple join indices for reconstructing a base table
• Join index is costly to store and maintain
  – Each column expected to be in multiple projections
  – Reduce # of join indices
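
A join index as described on this slide can be sketched as one list per T1 segment, where entry i holds the (segment ID, storage key) of the matching record in T2. The sample data, 0-based storage keys and the `reconstruct` helper are assumptions for illustration.

```python
# T1 = EMP3 (name, salary | salary), split into two segments.
emp3_segments = {
    0: {"name": ["Bob", "Bill"], "salary": [10000, 50000]},
    1: {"name": ["Jill"], "salary": [80000]},
}
# T2 = EMP1 (name, age | age), kept in a single segment.
emp1_segments = {
    0: {"name": ["Jill", "Bob", "Bill"], "age": [24, 25, 27]},
}
# One table per T1 segment; entry i = (T2 segment ID, T2 storage key)
# for the record whose storage key in that T1 segment is i.
join_index = {
    0: [(0, 1), (0, 2)],  # Bob -> EMP1[0][1], Bill -> EMP1[0][2]
    1: [(0, 0)],          # Jill -> EMP1[0][0]
}


def reconstruct(t1_segments, t2_segments, index, t1_columns, t2_columns):
    """Stitch T1 and T2 columns back into rows, in T1's sort order."""
    rows = []
    for segment_id in sorted(t1_segments):
        t1 = t1_segments[segment_id]
        for key in range(len(t1[t1_columns[0]])):
            t2_segment_id, t2_key = index[segment_id][key]
            t2 = t2_segments[t2_segment_id]
            rows.append(
                tuple(t1[c][key] for c in t1_columns)
                + tuple(t2[c][t2_key] for c in t2_columns)
            )
    return rows


rows = reconstruct(emp3_segments, emp1_segments, join_index,
                   ("name", "salary"), ("age",))
```

Every lookup through the index is a random access into T2, which hints at why the slide calls join indices costly and prefers overlapping projections that reduce how many are needed.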

19 Innovative Redundancy
• Hardly any warehouse is recovered by redo from log
  – Takes too long!
• Store enough projections to ensure K-safety
  – Column can be in K different projections
• Rebuild dead objects from elsewhere in the network

20 Automatic Physical DBMS Design
• Accept a “training set” of queries and a space budget
• Choose the projections and join indices auto-magically
• Re-optimize periodically based on a log of the interactions

21 Outline
• Overview
• Read-optimized column store
• Query execution and optimization
• Handling transactional updates
• Performance
• Summary

22 Operators
• Decompress
• Select: generate bitstring
• Mask: bitstring + projection → selected rows
• Project: choose a subset of columns
• Concat: combine multiple projections that are sorted in the same order
• Sort
• Permute: according to a join index
• Join
• Aggregation operators
• Bitstring operators
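
The Select, Mask and bitstring operators fit together roughly as sketched below: Select turns a predicate over one column into a bitstring, bitstring operators combine several such results, and Mask applies the final bitstring to a projection. Bitstrings are plain Python lists of 0/1 here, which is an assumption made for readability, and the function names are illustrative.

```python
def select(column, predicate):
    """Select: evaluate a predicate on one column, yielding a bitstring."""
    return [1 if predicate(value) else 0 for value in column]


def bit_and(left, right):
    """Bitstring operator: position-wise AND of two equal-length bitstrings."""
    if len(left) != len(right):
        raise ValueError("bitstrings must have the same length")
    return [a & b for a, b in zip(left, right)]


def mask(bitstring, projection):
    """Mask: keep only the positions of each column whose bit is set."""
    return {
        name: [v for v, bit in zip(values, bitstring) if bit]
        for name, values in projection.items()
    }


emp1 = {"name": ["Jill", "Bob", "Bill"], "age": [24, 25, 27]}
older = select(emp1["age"], lambda age: age > 24)
b_names = select(emp1["name"], lambda name: name.startswith("B"))
result = mask(bit_and(older, b_names), emp1)
```

Deferring Mask until the bitstrings are combined means only the qualifying positions of the other columns need to be materialized.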

23 Execution
• Query plan: a tree of operators (data flow)
  – Leaf: accessing the data storage
  – Internal: calls “get_next”
• Operators return 64KB blocks
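
A pull-based plan of this shape can be sketched as a small tree of objects that each expose `get_next`. The classes `Scan` and `Filter`, the `run` driver and the block size counted in values rather than bytes are all assumptions for illustration; the slide only says that operators return 64KB blocks.

```python
class Scan:
    """Leaf operator: hands out a stored column one block at a time."""

    def __init__(self, column, block_size):
        self.column = column
        self.block_size = block_size  # counted in values here, not bytes
        self.offset = 0

    def get_next(self):
        if self.offset >= len(self.column):
            return None  # end of data
        block = self.column[self.offset:self.offset + self.block_size]
        self.offset += self.block_size
        return block


class Filter:
    """Internal operator: pulls a block from its child and filters it."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def get_next(self):
        block = self.child.get_next()
        if block is None:
            return None
        return [v for v in block if self.predicate(v)]


def run(root):
    """Drive the plan by calling get_next on the root until it is exhausted."""
    output = []
    while True:
        block = root.get_next()
        if block is None:
            return output
        output.extend(block)


plan = Filter(Scan(list(range(10)), block_size=4), lambda v: v % 2 == 0)
print(run(plan))
```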

24 Column Optimizer (discussion)
• Cost-based estimation for query plan construction
• Chooses projections on which to run the query
• Cost model includes compression types
• When to perform “mask” operator
• Build in snowflake schemas
  – Which are simple to optimize without exhaustive search
• Looking at extensions

25 Outline
• Overview
• Read-optimized column store
• Query execution and optimization
• Handling transactional updates
• Performance
• Summary

26 Online Updates Are Necessary
• Transactional updates are necessary even in a read-mostly environment
• Online updates for error corrections
• Real-time data warehouses
  – Reduce the delay between OLTP system and warehouse towards zero

27 Solution – a Hybrid Store
• Write-optimized column store
• Tuple mover (batch rebuilder)
• Read-optimized column store (what we have been talking about so far)

28 Write Store
• Column store
• Horizontally partitioned as the read store
  – 1:1 mapping between RS segments and WS segments
• Storage keys are explicitly stored
  – B-tree: sort key → storage key
• No compression (the data size is small)

29 Handling Updates
• Optimize read-only query: do not hold locks
  – Snapshot isolation
  – The query is run on a snapshot of the data
  – Ensure transactions related to this snapshot have already committed
• Each WS site: insertion vector (with timestamps), deletion vector (updates become insertions and deletions)
• Maintain a high water mark and a low water mark of WS sites:
  – HWM: all transactions before HWM have committed
  – LWM: no records in read store are inserted after LWM
• Queries can specify a time before HWM
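
The snapshot read described on this slide can be sketched as a visibility test against the insertion and deletion vectors. The details below are assumptions for illustration: epochs are plain integers, `None` marks a record that was never deleted, and a query epoch is accepted only if it lies between LWM and HWM inclusive.

```python
def visible_at(records, insertion_vector, deletion_vector, epoch, lwm, hwm):
    """Return the records a lock-free read-only query sees as of `epoch`.

    insertion_vector[i]: epoch in which records[i] was inserted.
    deletion_vector[i]:  epoch in which it was deleted, or None.
    """
    if not lwm <= epoch <= hwm:
        raise ValueError("snapshot epoch must lie between LWM and HWM")
    return [
        record
        for record, inserted, deleted in zip(
            records, insertion_vector, deletion_vector
        )
        if inserted <= epoch and (deleted is None or deleted > epoch)
    ]


# An update of Bob's age in epoch 3 is a deletion plus an insertion.
records = [("Jill", 24), ("Bob", 25), ("Bob", 26)]
insertion_vector = [1, 1, 3]
deletion_vector = [None, 3, None]

print(visible_at(records, insertion_vector, deletion_vector, epoch=2, lwm=1, hwm=4))
print(visible_at(records, insertion_vector, deletion_vector, epoch=3, lwm=1, hwm=4))
```

Because every transaction up to HWM has committed, a query pinned to such an epoch never observes in-flight work and so needs no locks.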

30 HWM and epochs
• TA: time authority updates the coarse timer (epochs)

31 Transactions
• Undo from a log (that does not need to be persistent)
• Redo by rebuild from elsewhere in the network

32 Tuple-Mover
• Read RS segment
• Combine WS segment into a new version of the RS segment, do not update in place
• Record last move time for this segment in WS
• T last_move  LWM
• Time authority periodically sends out a new LWM epoch number
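
The merge step can be sketched as below. Several details are assumptions rather than statements from the slides: WS records are (sort key, insertion epoch, deletion epoch or `None`) triples, only records inserted at or before LWM are moved, and records already deleted by LWM are dropped instead of moved. The function name `move_tuples` is made up.

```python
import heapq


def move_tuples(rs_segment, ws_records, lwm):
    """Build a new RS segment version; the old one is left untouched.

    rs_segment: sorted list of sort keys already in the read store.
    ws_records: list of (sort_key, inserted_epoch, deleted_epoch_or_None).
    Returns (new_rs_segment, remaining_ws_records).
    """
    movable = sorted(
        key
        for key, inserted, deleted in ws_records
        if inserted <= lwm and (deleted is None or deleted > lwm)
    )
    # Records inserted after LWM are too recent to move and stay in WS.
    remaining = [record for record in ws_records if record[1] > lwm]
    # Merging two sorted runs yields a fresh list: no update in place.
    new_rs_segment = list(heapq.merge(rs_segment, movable))
    return new_rs_segment, remaining


rs = [10, 30]
ws = [(20, 1, None), (5, 1, 2), (40, 3, None)]
new_rs, remaining_ws = move_tuples(rs, ws, lwm=2)
```

Writing a new segment version instead of patching the old one lets in-flight read-only queries keep scanning the previous version until they finish.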

33 Current Performance
• Varying storage:
  – 100X popular row store in 40% of the space
  – 10X popular column store in 70% of the space
  – 7X popular row store in 1/6th of the space
• Code available with BSD license

34 Summary
• Column store is optimized for read queries
• Cluster parallelism
• Interesting data organization
• Handling write queries