Chen Zhang Hans De Sterck University of Waterloo

Supporting Multi-row Distributed Transactions with Global Snapshot Isolation Using Bare-bones HBase. Chen Zhang, Hans De Sterck, University of Waterloo

Outline: Introduction (General Background, Snapshot Isolation (SI), HBase); System Design (Transactional SI Protocol); System Performance; Future Work

General Background (1): Database transactions are widely used by websites, analytical programs, etc. Snapshot isolation (SI) has been adopted by major DBMSs for its high throughput. No solution exists for easily replicating and scaling a traditional DBMS on clouds. Column-oriented data stores (BigTable, HBase) have proven scalable on clouds; however, they do not support multi-row distributed transactions out of the box.

General Background (2): Google recently published a paper at OSDI '10 (Oct. 4; submission deadline May 7) about its “Percolator” system, which provides multi-row distributed transactions with SI on top of BigTable. Our paper describes an approach to multi-row distributed transactions with SI on top of HBase, and it turns out that Google’s system has many design elements similar to ours.

Snapshot Isolation (1): For a transaction T1 that starts at timestamp ts1, T1 is given the database snapshot up to ts1 and can read and write on its own snapshot independently. When T1 tries to commit, it checks whether any other transaction has committed conflicting data updates. If not, T1 commits.
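
The commit rule on this slide can be illustrated with a minimal in-memory sketch. It is not the HBase-based protocol presented later in the talk: the class `SnapshotStore`, its methods, and the dict-based transaction record are hypothetical names chosen for illustration, and the sketch assumes a single process with a simple integer clock.

```python
class SnapshotStore:
    """Toy in-memory store illustrating snapshot isolation (hypothetical, not HBase)."""

    def __init__(self):
        self.clock = 0        # commit timestamp of the latest committed transaction
        self.versions = {}    # key -> list of (commit_ts, value), in commit order

    def begin(self):
        # The snapshot is everything committed up to the start timestamp.
        return {"start_ts": self.clock, "writes": {}}

    def read(self, txn, key):
        # A transaction sees its own writes first, then its snapshot.
        if key in txn["writes"]:
            return txn["writes"][key]
        visible = [value for ts, value in self.versions.get(key, [])
                   if ts <= txn["start_ts"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        # Writes stay private to the transaction until commit.
        txn["writes"][key] = value

    def commit(self, txn):
        # Conflict check: did anyone commit one of our keys after we started?
        for key in txn["writes"]:
            history = self.versions.get(key, [])
            if history and history[-1][0] > txn["start_ts"]:
                return False  # conflicting update found, so the transaction aborts
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True
```

With two transactions that start from the same snapshot and both write the same key, the first `commit` call succeeds and the second returns `False`.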

Snapshot Isolation (2): Strong SI vs. SI. Strong SI requires every transaction T to see the most up-to-date snapshot of data. SI requires every transaction T to see a consistent snapshot, which can be any snapshot taken earlier than T’s start timestamp.

HBase: HBase is a column-oriented data store. It presents a single global database-like table view. Multi-version data is distinguished by timestamp. A data table is horizontally split into row regions, and each region is hosted by a region server. HBase guarantees atomic single-row reads and writes.

Design-Overview (1): General design objectives. Deploy no extra programs and inherit the HBase properties: scalability, fault tolerance, high throughput, access transparency, etc. Be non-intrusive to user data and easy to adopt, with no modification to existing user data. Implement a client library that manages transactions autonomously at the client side, with no server-side changes. Transactions put their own information into the global tables and query those tables for information about other transactions to decide whether to commit or abort.

Design-Overview (2): General SI protocol. Every transaction, when it commits, obtains a unique, strictly increasing commit timestamp, which determines the order between transactions and is used to enforce SI. Every transaction commits successfully by inserting a row into the Committed table. Every transaction, when it starts, reads the commit timestamp of the most recently committed transaction and uses it as its start timestamp.

Design-Overview (3): Simplified protocol walkthrough for a transaction T. T gets its start timestamp S, which identifies a snapshot of the Committed table. T reads and writes versions of data identified by S. When T tries to commit: (1) T checks for conflicting updates committed by other transactions by scanning the Committed table. (2) T writes a row into the Precommit table to indicate its attempt to commit. (3) T checks for conflicting commit attempts by scanning the Precommit table. (4) If both checks find no conflict, T proceeds to commit by atomically inserting one row into the Committed table.

Design-SI Protocol: Read-only transactions only need to obtain a start timestamp and read the correct version of data from the snapshot; they need no Precommit or Commit step. Update transactions go through four steps: get start timestamp ts, read/write, Precommit, Commit.

Design-SI Protocol for Read-only Transaction Ti: Get start timestamp Si and maintain in memory a DataSet DS of the data read so far, {(L1, data1), …}. To read the data item at L1: if L1 is in DS, read it from DS. Otherwise, query the Version table and get C1; scan the Committed table and get the most recent transaction Ci that updated the item, to version V; update the Version table with Ci; then use V to read the data and add (L1, data) to DS if necessary.
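
The read path on this slide might look roughly like the sketch below. Several details are assumptions made for illustration and are not stated on the slide: the Version table is treated as a per-item hint that gives a lower bound for the Committed table scan, each Committed row maps an item to the write timestamp of the version it committed, and the scan falls back to the full range when the hint is newer than the reader's snapshot. The function name `read_item` and the dict-based tables are hypothetical.

```python
def read_item(location, start_ts, dataset, version_table, committed, data_table):
    """Read one item as seen by a transaction whose start timestamp is start_ts.

    dataset:       in-memory DS of this transaction, location -> data
    version_table: location -> commit timestamp hint (assumed meaning)
    committed:     commit timestamp -> {location: write timestamp} (assumed layout)
    data_table:    (location, write timestamp) -> data
    """
    # Step 1: serve the read from DS if the item was already read.
    if location in dataset:
        return dataset[location]

    # Step 2: use the Version table hint to shorten the Committed scan.
    hint = version_table.get(location, 0)
    lower = hint if hint <= start_ts else 0  # hint too new for this snapshot

    # Step 3: find the most recent commit in [lower, start_ts] that wrote the item.
    candidates = [ci for ci, columns in committed.items()
                  if lower <= ci <= start_ts and location in columns]
    if not candidates:
        return None
    latest = max(candidates)

    # Step 4: refresh the hint so later scans can start further along.
    if latest > hint:
        version_table[location] = latest

    # Step 5: read that exact version and remember it in DS.
    write_ts = committed[latest][location]
    value = data_table[(location, write_ts)]
    dataset[location] = value
    return value
```

The hint only narrows the scan; the result is the same as a full scan of the Committed table up to the start timestamp.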

Design-SI Protocol for Update Transaction Ti (1): Get start timestamp Si and maintain a DataSet DS of the data read or written, {(L1, data1), …}. Read: read the data item at L1 as in the read-only case. Write: write directly to the data tables with a unique timestamp Wi.

Design-SI Protocol for Update Transaction Ti (2), Precommit: Get precommit label Pi. Scan the Committed table over the range [Si+1, ∞) for conflicting commits, i.e. commits with an overlapping write set. If there are no conflicts, proceed. Add a row Pi to the Precommit table. Scan the full range of the Precommit table for other rows that have an overlapping write set and either nothing under the “Committed” column or a value under the “Committed” column larger than Si. If there are no conflicts, proceed.

Design-SI Protocol for Update Transaction Ti (3), Commit: Get commit timestamp Ci. Add a row Ci to the Committed table, with the data items in the write set as columns (an atomic HBase row write). Add “Ci” to row Pi in the Precommit table.
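
The Precommit and Commit steps of the last two slides might be sketched as follows. This is a single-process toy under stated assumptions: plain dicts stand in for the Committed and Precommit tables, `SITables` and `try_commit` are hypothetical names, the start timestamp is simply the largest commit timestamp present, and removing the Precommit row on abort is an assumption of this sketch (the slides leave the handling of failed transactions to future work).

```python
import itertools


class SITables:
    """Toy stand-ins for the Committed and Precommit tables (plain dicts, not HBase)."""

    def __init__(self):
        self.committed = {}   # commit timestamp Ci -> write set
        self.precommit = {}   # precommit label Pi -> {"writeset", "committed"}
        self._commit_ts = itertools.count(1)
        self._precommit_label = itertools.count(1)

    def start_timestamp(self):
        # Simplification: the most recent commit timestamp, or 0 if none.
        return max(self.committed, default=0)

    def try_commit(self, start_ts, writeset):
        """Return the commit timestamp Ci, or None if the transaction must abort."""
        writeset = set(writeset)

        # Precommit, check 1: scan Committed over [Si+1, inf) for overlapping write sets.
        for ci, committed_ws in self.committed.items():
            if ci > start_ts and committed_ws & writeset:
                return None

        # Precommit: announce the commit attempt under a fresh label Pi.
        pi = next(self._precommit_label)
        self.precommit[pi] = {"writeset": writeset, "committed": None}

        # Precommit, check 2: other rows with an overlapping write set that are
        # still uncommitted, or that committed after our start timestamp.
        conflict = any(
            label != pi
            and (row["writeset"] & writeset)
            and (row["committed"] is None or row["committed"] > start_ts)
            for label, row in self.precommit.items()
        )
        if conflict:
            del self.precommit[pi]  # assumption of this sketch, not from the slides
            return None

        # Commit: one row insert into Committed, then record Ci in row Pi.
        ci = next(self._commit_ts)
        self.committed[ci] = writeset
        self.precommit[pi]["committed"] = ci
        return ci
```

In a real deployment each dict operation would be a separate HBase call issued by the client library, which is why the protocol relies only on single-row atomicity.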

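Design-Timestamp Mechanism: For each transaction Ti, four labels/timestamps are used: the start timestamp Si, the write timestamp Wi, the precommit label Pi, and the commit timestamp Ci.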

Design-Timestamp Mechanism: Globally unique, incrementing timestamps/labels are issued by calling the atomic HBase incrementColumnValue method on a single HBase table.
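
The effect of issuing labels through an atomic counter can be shown with a small stand-in. The sketch below does not call HBase: a `threading.Lock` plays the role of the atomic increment, and `CounterTable`, `increment_column_value`, `next_label`, and the row and column names are hypothetical.

```python
import threading


class CounterTable:
    """In-memory stand-in for a table that offers an atomic column increment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cells = {}  # (row, column) -> current counter value

    def increment_column_value(self, row, column, amount=1):
        # The lock makes read-add-write one indivisible step, so no two
        # callers can ever receive the same value.
        with self._lock:
            value = self._cells.get((row, column), 0) + amount
            self._cells[(row, column)] = value
            return value


def next_label(table, column="commit"):
    """Issue the next globally unique label from a single counter cell."""
    return table.increment_column_value("counter", column)
```

Because every client draws from the same cell, the labels are unique and totally ordered, at the cost of routing all requests through one row.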

Design-Obtain Start Timestamp: For example, at the time T1 starts, there is a gap for C2 in the Committed table. The snapshot for T1 is C3, which includes L1 with version W1 and L2 with version W3. Before T1 commits, C2 appears; the snapshot of T1 should have included L1 with version W2. A CommittedIndex table is used to store a recent snapshot.
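
To make the gap problem concrete, the helper below computes the largest commit timestamp below which no commit is missing. This is only an illustration of what a gap-free snapshot means; it is not the CommittedIndex mechanism, whose details the slide does not give. The function name `gap_free_snapshot` is hypothetical, and commit timestamps are assumed to be issued consecutively starting at 1.

```python
def gap_free_snapshot(committed_timestamps):
    """Return the largest ts such that commits 1..ts are all present (0 if none)."""
    present = set(committed_timestamps)
    ts = 0
    # Advance while the next consecutive commit timestamp has already appeared.
    while ts + 1 in present:
        ts += 1
    return ts
```

In the slide's example, with C1 and C3 present and C2 missing, the gap-free snapshot is C1 rather than C3, so a late-arriving C2 cannot invalidate it.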

Performance (1): Test the basic timestamp/label issuing mechanism using a single HBase table.

Performance (2): Test the necessity of using the Version table to minimize the range of the Committed table scan needed to find the most recent data version.

Performance (3): Compare SI read performance with bare-bones HBase read performance.

Performance (4): Compare SI write performance with bare-bones HBase write performance.

Future Work: Support strong SI with non-blocking reads that provide high throughput. Add a mechanism for handling straggling/failed transactions. Explore and experiment with application usage scenarios.

General Background (3), Similarities Compared with Percolator: Both systems support ACID transactions and guarantee snapshot isolation by using the multi-version data support of the underlying column store. Both are implemented as a client library rather than as server-side middleware. Both dispense globally unique, well-ordered timestamps from a central location. They share some similar protocols for the commit process.

General Background (4), Differences Compared with Percolator: Percolator focuses on analytical workloads that tolerate large latency; our system focuses on random data access with high throughput and low latency for web applications. Percolator achieves strong SI, but reads may block, sacrificing throughput for data freshness; our system achieves SI and does not block reads, sacrificing data freshness for high throughput. Percolator requires modification to existing user data; our system uses a separate set of tables, which is non-intrusive to user data. Percolator relies on BigTable single-row atomic transactions, which are not supported by HBase.

Questions? Thank you!