H-store: A high-performance, distributed main memory transaction processing system Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alex.


H-store: A high-performance, distributed main memory transaction processing system Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alex Rasin, Stanley B. Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, Daniel J. Abadi

Paper highlights
- An experimental main-memory, parallel DBMS
- Optimized for on-line transaction processing (OLTP) applications
- Highly distributed, row-store-based relational database
- Runs on a cluster of shared-nothing, main-memory executor nodes

Background

Relational DBMS
- Data are stored in tables
- Each row corresponds to a record
- No pointers or other links: rows are related through matching keys

Example
- PSID is a key
  - Links rows among tables
  - Must be unique
- Student table (columns: Student, PSID, Major): Barbara (PSID 13339) and Alan (PSID 08887); majors CS and CE
- Grades table (columns: PSID, Mid, Final): rows for PSIDs 08887 and 13339; scores 89, 87, 90
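As a minimal sketch of how matching keys link rows among tables, the snippet below joins two small in-memory tables on PSID. The rows, the dictionary layout, and the function name `join_on_psid` are hypothetical and chosen only for illustration; they are not the values from the slide.

```python
# Hypothetical rows: names, IDs, and scores are illustrative only.
students = [
    {"name": "Ann", "psid": "10001", "major": "CS"},
    {"name": "Bob", "psid": "10002", "major": "CE"},
]
grades = [
    {"psid": "10002", "mid": 80, "final": 85},
    {"psid": "10001", "mid": 90, "final": 95},
]


def join_on_psid(students, grades):
    """Match rows from the two tables by equal PSID values (a key join)."""
    by_psid = {}
    for row in students:
        # A key must be unique, so a repeated PSID is treated as an error.
        if row["psid"] in by_psid:
            raise ValueError(f"duplicate key {row['psid']}")
        by_psid[row["psid"]] = row
    # No pointers are followed: rows are paired only because their keys match.
    return [
        {**by_psid[g["psid"]], **g}
        for g in grades
        if g["psid"] in by_psid
    ]


joined = join_on_psid(students, grades)
```

Each element of `joined` combines a student row with the grade row that carries the same PSID.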

Atomic Transactions: You buy a car

What are atomic transactions?
- A mechanism used in databases and financial systems
- The system guarantees that an atomic transaction will either execute properly or abort without leaving any trace
  - All-or-nothing semantics
- Atomic transactions satisfy the four ACID properties

The ACID properties
- Atomicity: all-or-nothing property
- Consistency: a transaction either brings the data to a new consistent state or returns them to their previous state
- Isolation: a transaction in progress and not yet committed has no effect on any other transaction
- Durability: committed data are stored by the system in some kind of crash-proof storage

Importance of atomic transactions
- Atomicity and consistency guarantee that a transaction either is correct or leaves no trace
  - No partial updates
  - No incorrect updates
- Isolation allows concurrent execution of transactions
  - Much faster than serial execution
- Durability ensures that committed transactions will not be lost
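The all-or-nothing behavior can be illustrated with a small sketch that uses Python's standard `sqlite3` module and the car purchase as the scenario. The table names, column names, amounts, and the function `buy_car` are assumptions made for this example; this is not how H-Store implements transactions.

```python
import sqlite3


def buy_car(conn, buyer, seller, price):
    """Move the money and the ownership together, or do neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (price, buyer),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE owner = ?", (buyer,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (price, seller),
            )
            conn.execute(
                "UPDATE cars SET owner = ? WHERE owner = ?", (buyer, seller)
            )
        return True
    except ValueError:
        return False


def balance_of(conn, owner):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE owner = ?", (owner,)
    ).fetchone()
    return row[0]


# Hypothetical schema and data, kept in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE cars (vin TEXT PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("buyer", 5000), ("seller", 0)]
)
conn.execute("INSERT INTO cars VALUES ('VIN1', 'seller')")
conn.commit()

failed = buy_car(conn, "buyer", "seller", 9000)  # aborts without leaving a trace
after_abort = balance_of(conn, "buyer")
ok = buy_car(conn, "buyer", "seller", 4000)      # commits all three updates
```

The first call fails partway through, and the rollback restores the buyer's balance, so no partial update is visible afterwards.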

Back to the paper

Motivation
- Legacy OLTP databases
  - Too many of their architectural components are old
  - Inherited from the original System R (mid-seventies!)
- Take advantage of recent trends
  - Multi-core architectures
  - Cheap, abundant main memory
  - Dominant use of stored procedures

The focus
- Reject the “one size fits all” approach
- On-line transaction processing (OLTP) systems have specific properties
  - Repetitive, short-lived transactions
  - Stored procedures
- Sole focus of this work

- Main issue
  - Poor I/O performance of RDBMS
- Their solution
  - Scale system “horizontally”
  - Partition responsibilities among multiple shared-nothing machines
  - Store the entire DB in the memory of a large cluster of server machines
  - Rely on replication to minimize the risk of data loss

H-Store
- Next-generation OLTP system
- Operates on a distributed cluster of shared-nothing machines
- Coordinates the work of multiple single-threaded engines
- All data are always kept in main memory

System Overview
- H-Store
  - Cluster containing two or more computational nodes
- Nodes
  - Single physical component that holds multiple sites
- Sites
  - Normally run on a dedicated core
  - Single-threaded
  - Do not share any data structure or memory with any collocated site

H-Store system architecture

System deployment
- Cluster deployment framework takes as inputs
  - A set of stored procedures
  - A database schema
  - A sample workload (used to optimize data layout)
  - A set of available sites in the cluster
- Two-phase optimization
  - First optimize stored procedures as if the database were not distributed
  - Then come up with distributed query plans

Run-time model
- All sites in the cluster are trusted
- Any site is able to execute any OLTP application request
- Execution plan is
  - Annotated with the locations of the target sites
  - Passed to a transaction manager
- No shared data structures
  - Everything is single-threaded

Database properties
- Physical layout of DB specifically optimized to execute precompiled transactions
- Not ad hoc queries
  - Can still be executed but could be very slow

Transaction classes
- Two important special cases
- Single-site transactions
  - Can be entirely executed on a single site
  - Easy to send the transaction to one of the target sites
- One-shot transactions
  - Each of its individual queries executes on a single site
  - Output of these queries is not reused as input for other queries
  - Easy to execute in parallel
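The two special cases can be expressed as a small classification routine. This is only a sketch under stated assumptions: the `Query` record, its `sites` and `uses_earlier_output` fields, and the `classify` function are hypothetical names for this example and do not come from H-Store itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Query:
    # Sites the query must touch (a hypothetical plan annotation).
    sites: FrozenSet[int]
    # True if the query consumes the output of an earlier query.
    uses_earlier_output: bool = False


def classify(queries: List[Query]) -> str:
    """Label a transaction as single-site, one-shot, or general."""
    if not queries:
        raise ValueError("a transaction needs at least one query")
    all_sites = frozenset().union(*(q.sites for q in queries))
    # Single-site: every query runs on the same one site.
    if len(all_sites) == 1:
        return "single-site"
    # One-shot: each query runs on one site and no output feeds another query,
    # so the queries can be sent out in parallel.
    if all(len(q.sites) == 1 and not q.uses_earlier_output for q in queries):
        return "one-shot"
    return "general"


single = classify([Query(frozenset({1})), Query(frozenset({1}), True)])
one_shot = classify([Query(frozenset({1})), Query(frozenset({2}))])
general = classify([Query(frozenset({1})), Query(frozenset({2}), True)])
```

The single-site check comes first because a transaction confined to one site needs no coordination at all, whether or not its queries feed each other.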

Physical layout
- Replicate frequently accessed or read-only tables on each site
- Horizontal partition of tables
  - Partitions can be accessed in parallel
  - Collocate them with related data
- Protect data against node failures
  - Important for in-memory DBs
  - k-safety: number k of node failures the DB must tolerate

DB layout loader
- Table replication
  - Replicate all read-only tables on all sites
- Data partitioning
  - Divide each table horizontally into four disjoint partitions
  - Each partition is stored on two different sites
  - Accent is on parallelism
- K-safety
  - k = 2
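A layout of this kind (four horizontal partitions, each stored on two sites) can be sketched as follows. The hash-based row assignment, the round-robin placement, the site names, and all function names are assumptions made for illustration; the slides do not say how H-Store chooses partitions or sites.

```python
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    """Assign a row to a partition by hashing its key (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def place_partitions(num_partitions, sites, copies):
    """Store each partition on `copies` distinct sites, round-robin."""
    if copies > len(sites):
        raise ValueError("not enough sites for the requested number of copies")
    return {
        p: [sites[(p + i) % len(sites)] for i in range(copies)]
        for p in range(num_partitions)
    }


def survives(placement, failed_sites):
    """True if every partition still has at least one copy on a live site."""
    return all(
        any(site not in failed_sites for site in holders)
        for holders in placement.values()
    )


# Hypothetical cluster of four sites holding four partitions, two copies each.
placement = place_partitions(4, ["s0", "s1", "s2", "s3"], copies=2)
one_failure_ok = survives(placement, {"s0"})
both_holders_lost = survives(placement, {"s0", "s1"})
```

With this placement, losing any single site leaves every partition reachable, while losing both sites that hold the same partition makes that partition unavailable.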