CSCI5570 Large Scale Data Processing Systems

CSCI5570 Large Scale Data Processing Systems: NewSQL
James Cheng, CSE, CUHK
Slide Ack.: modified based on the slides from Lianghong Xu

Calvin: Fast Distributed Transactions for Partitioned Database Systems
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi
SIGMOD 2012

High-level Overview
A transaction processing and replication layer
- Built on a generic, non-transactional data store
- Full ACID for distributed transactions
- Active, consistent replication
- Horizontal scalability
Key technical feature
- Scalable distributed transaction processing via a deterministic locking mechanism that eliminates distributed commit protocols

The Problem
Distributed transactions are expensive
- Agreement protocol
  - Multiple round trips in two-phase commit (2PC)
  - Often much longer than the transaction logic itself
  - Limited scalability
- Locking
  - Locks are held for the entire duration of the transaction
  - Possible deadlock
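To make the round-trip cost concrete, here is a minimal, hypothetical sketch of a 2PC coordinator (`prepare` and `finish` are assumed participant RPCs, not any real API). Note that every participant keeps its locks across both rounds:

```python
# Minimal sketch (not Calvin code) of a 2PC coordinator: one prepare round plus
# one decision round per transaction, with locks held at every participant
# until the final decision arrives.
def two_phase_commit(participants, txn):
    # Round trip 1: prepare/vote
    votes = [p.prepare(txn) for p in participants]   # each call is a network round trip
    decision = "commit" if all(votes) else "abort"
    # Round trip 2: broadcast the decision; participants release their locks only now
    for p in participants:
        p.finish(txn, decision)
    return decision
```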

Consistent Replication
Many systems allow inconsistent replication
- Replicas can diverge, but are eventually consistent
- Dynamo, SimpleDB, Cassandra, ...
Consistent replication: an emerging trend (for OLTP)
- Instant failover
- Increased latency, especially for geo-replication
- Cost is only in latency, not in throughput or contention
Note: lock contention occurs whenever one process or thread attempts to acquire a lock held by another process or thread. The more fine-grained the available locks, the less likely one process/thread will request a lock held by another (for example, locking a row rather than the entire table, or a cell rather than the entire row). See http://stackoverflow.com/questions/1970345/what-is-thread-contention

Goals of Calvin
- Eliminate two-phase commit (scalability)
- Reduce lock duration (throughput)
- Provide consistent replication (consistency)
Approach
- Decouple transaction logic from "heavy-lifting" tasks: replication, locking, disk access, ...
- Deterministic concurrency control

Non-deterministic Database Systems
- The serial ordering cannot be pre-determined from the transaction inputs alone
- It is determined at runtime: the execution is logically equivalent to some serial ordering
- The ordering can diverge across executions, due to different thread scheduling, network latencies, ...
- Transactions abort on non-deterministic events, e.g., node failure

Serial Ordering in 2-phase Locking

Deterministic Database Systems
- Given the transactional inputs, the serial ordering is pre-determined
- Locks are acquired in the order of the pre-determined transactional input
- All replicas can be made to emulate the same serial execution order
- Database states are guaranteed not to diverge

Deterministic Database Systems (cont.)
Given the transactional inputs, the serial ordering is pre-determined
Benefits
- No agreement protocol: no need to check for node failures, since recovery is done from other replicas
- No aborts due to non-deterministic events
- Consistent replication is made easier: only the transactional inputs need to be replicated, and each replicating site executes the same transaction inputs
Disadvantage
- Reduced concurrency due to the pre-determined ordering
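A minimal sketch of why replicating only the inputs suffices, assuming each transaction is a deterministic function of the database state (illustrative, not Calvin's code):

```python
# Every replica applies the same ordered input log with deterministic
# transaction logic, so all replicas reach the same state without any
# per-transaction agreement protocol.
def apply_log(initial_state, input_log):
    state = dict(initial_state)
    for txn in input_log:            # same pre-determined order at every replica
        txn(state)                   # txn must be deterministic (no clock, no randomness)
    return state

def transfer(src, dst, amount):
    def txn(state):
        if state.get(src, 0) >= amount:
            state[src] -= amount
            state[dst] = state.get(dst, 0) + amount
    return txn

log = [transfer("a", "b", 10), transfer("b", "c", 5)]
assert apply_log({"a": 20}, log) == apply_log({"a": 20}, log)  # replicas agree
```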

Architecture
(Figure: client requests flow through three layers, shown for Partition 1 and Partition 2 of Replica A and Replica B)
- Sequencer layer: receives client requests, replicates them consistently, and emits batched partial request sequences
- Scheduler layer: performs deterministic locking and dispatches transactions for execution (local and remote record access)
- Storage layer: a simple "CRUD" (create, read, update, delete) storage interface, one storage node per partition per replica
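The storage layer only needs to expose a non-transactional CRUD interface. A hypothetical minimal in-memory version is sketched below (class and method names are illustrative, not Calvin's actual API):

```python
# Sketch of the minimal "CRUD" storage interface the upper layers assume; any
# non-transactional key-value store exposing these four operations could sit
# underneath.
class CRUDStorage:
    def __init__(self):
        self._data = {}

    def create(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)

    def update(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)
```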

Sequencer and Replication
- Transactional inputs are divided into batches, one per 10-ms epoch
- The sequencers place the transactions of each batch into a global sequence
- Nodes are organized into replication groups, each containing all replicas of a particular partition (e.g., partition 1 in replica A and partition 1 in replica B form a replication group)
- Two modes of replication: asynchronous and synchronous
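A rough sketch, under stated assumptions, of what a sequencer's batching loop might look like (the 10 ms epoch length comes from the paper; `replicate_batch` and `dispatch_batch` are hypothetical hooks):

```python
# Incoming requests are grouped into fixed-length epochs; each batch is handed
# to the replication layer and then forwarded to the schedulers.
import queue
import time

EPOCH_MS = 10

def sequencer_loop(incoming: "queue.Queue", replicate_batch, dispatch_batch):
    epoch = 0
    while True:
        deadline = time.monotonic() + EPOCH_MS / 1000.0
        batch = []
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(incoming.get(timeout=remaining))
            except queue.Empty:
                break
        replicate_batch(epoch, batch)   # replicate the inputs within the replication group
        dispatch_batch(epoch, batch)    # forward to the schedulers of participating partitions
        epoch += 1
```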

Sequencer and Replication (cont.)
Only the transactional inputs need to be replicated
Asynchronous replication
- One replica is designated as the master; the master replica handles all transaction inputs
- Each sequencer on the master propagates its batches to the slave sequencers in its replication group
- Low latency, but complex failure recovery
Synchronous replication
- All sequencers in a replication group use Paxos (ZooKeeper) to agree on a combined batch of transactions for each epoch
- Higher latency, but throughput is not affected: the contention footprint (i.e., the total duration a transaction holds its locks) remains the same
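A hedged sketch contrasting the two modes for a single epoch's batch; `forward_async` and `paxos_agree` are hypothetical stand-ins for the master-to-slave push and the Paxos/ZooKeeper agreement step:

```python
# Not Calvin's code: shows where the latency cost of each mode comes from, and
# why synchronous agreement does not enlarge the contention footprint (locks
# are only requested after the batch is decided).
def replicate_epoch(mode, epoch, local_batch, group, forward_async, paxos_agree):
    if mode == "async":
        # Master decides the batch immediately and pushes it to the slaves;
        # low latency, but recovery is complex if the master fails mid-push.
        for replica in group.slaves:
            forward_async(replica, epoch, local_batch)
        return local_batch
    else:
        # All sequencers in the replication group agree on one combined batch
        # per epoch; latency grows, but no locks are held during the agreement.
        return paxos_agree(group, epoch, local_batch)
```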

Scheduler and Deterministic Locking
- Each node's scheduler only locks records stored locally
- Single-threaded lock manager
- Resembles two-phase locking, but enforces the transaction ordering from the sequencers:
  - If transactions A and B both request locks on record R, and A precedes B in the sequence, then A must request its lock on R before B does, and the lock manager must grant the lock on R to A before B
- No deadlock
Question: what if A and B come from two sequencers, X and Y, that are not in the same replication group? How are A and B ordered? It seems that X and Y do not talk to each other (or do they also use Paxos to agree on a global order?)
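A minimal sketch of a deterministic lock manager along these lines (read/write lock modes are omitted for brevity; this is an illustration, not Calvin's implementation):

```python
# Lock requests are processed strictly in the pre-determined sequence order by
# a single thread, and each record's queue grants locks in that same order, so
# waits always point backwards in the sequence and no deadlock can form.
from collections import defaultdict, deque

class DeterministicLockManager:
    def __init__(self):
        self.queues = defaultdict(deque)   # record -> FIFO queue of waiting txn ids
        self.waiting = {}                  # txn id -> number of locks not yet granted

    def request_locks(self, txn_id, records):
        """Called strictly in sequence order, before a transaction may execute."""
        self.waiting[txn_id] = 0
        for r in records:                  # the records the transaction reads/writes
            q = self.queues[r]
            q.append(txn_id)
            if len(q) > 1:                 # an earlier transaction still holds this lock
                self.waiting[txn_id] += 1
        return self.waiting[txn_id] == 0   # True: all locks granted, txn can run now

    def release_locks(self, txn_id, records):
        """Called when a transaction finishes; returns txns that became runnable."""
        runnable = []
        for r in records:
            q = self.queues[r]
            q.popleft()                    # txn_id was at the head of each of its queues
            if q:
                nxt = q[0]
                self.waiting[nxt] -= 1
                if self.waiting[nxt] == 0:
                    runnable.append(nxt)
        return runnable
```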

Deterministic Locking (example)
(Figure: pre-determined serial ordering T1 {Read A, Read B, Write A}, T2 {Read B, Write B}, T3 {Read C}; records A, B, C reside in memory, backed by disk. T1 must precede T2: T1 acquires Lw(A) and Lr(B) before T2 acquires Lw(B); T3 acquires Lr(C) independently.)

Worker Execution
(Figure: the same serial ordering T1, T2, T3 over records A, B, C, now spread across partitions)
- Active participant: a node that stores part of the transaction's write set locally
- Passive participant: a node that stores only part of the read set

Worker Execution (cont.)
(Figure: the same example, showing execution at the active participant for T1.)
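A hedged sketch of how a multi-partition transaction might execute given these roles (based only on the role definitions above; `send` and `collect_remote_reads` are hypothetical messaging helpers):

```python
# Every participant performs its local reads and forwards them to the active
# participants; each active participant then has the full read set, runs the
# deterministic transaction logic, and applies only its local writes.
def execute_at_participant(node, txn, storage, send, collect_remote_reads):
    local_reads = {k: storage.read(k) for k in txn.read_set if node.owns(k)}
    for peer in txn.active_participants:              # nodes owning part of the write set
        if peer != node:
            send(peer, txn.id, local_reads)
    if node in txn.active_participants:
        reads = dict(local_reads)
        reads.update(collect_remote_reads(txn.id))    # wait for the other partitions' reads
        writes = txn.logic(reads)                     # deterministic on all replicas
        for k, v in writes.items():
            if node.owns(k):
                storage.update(k, v)                  # apply only the local writes
    # Passive participants are done after forwarding their reads.
```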

Problem of Deterministic Locking
(Figure: T1 {Read A, Read B, Write A}, T2 {Read B, Write B}, T3 {Read C}; record A must be fetched from disk. A non-deterministic scheduler could reorder T2 before T1.)
- Under deterministic locking, T2 cannot proceed because T1 is blocked waiting on the disk access to A, even if T1 has not yet acquired the lock on B
- T3 can still proceed

Calvin's Solution
- Delay forwarding transaction requests, and prefetch the needed data from disk in the meantime, to eliminate the contention
- Hope: most (e.g., 99% of) transactions then execute without any disk access
Problems (future work)
- How long to delay? Requires estimating disk and network latencies
- Tracking which records are in memory needs a scalable approach
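A speculative sketch of the delay-and-prefetch idea (the paper leaves the mechanism largely as future work; `estimate_fetch_ms`, `prefetch`, and `schedule_later` are hypothetical helpers):

```python
# Before forwarding a transaction into the locking sequence, warm any cold
# records and delay the transaction by an estimated fetch latency, so locks are
# only requested once the data is (probably) already in memory.
def admit_transaction(txn, storage, forward, estimate_fetch_ms, prefetch, schedule_later):
    cold = [k for k in txn.read_set | txn.write_set if not storage.in_memory(k)]
    if not cold:
        forward(txn)                                     # fast path: all records resident
        return
    for k in cold:
        prefetch(k)                                      # start asynchronous disk reads now
    delay_ms = max(estimate_fetch_ms(k) for k in cold)
    schedule_later(delay_ms, lambda: forward(txn))       # enter the sequence after the delay
```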

Checkpointing
Fault tolerance is made easier in Calvin
- Instant failover
- Only the transactional inputs are logged; no REDO log
Checkpointing enables fast recovery
- Failure recovery starts from a recent checkpointed state rather than from scratch

Checkpointing (cont.)
Naive synchronous checkpointing
- Freeze one replica just for checkpointing
- Difficult to bring the frozen replica back up to date
Calvin: an asynchronous variation of Zigzag
- Original Zigzag: 2 copies per record in each replica (2x memory footprint); one copy is up to date, the other is the latest checkpoint
- Calvin's variant: specify a point in the global serial order and keep two versions of each record:
  - "before" version: only updated by transactions preceding the point
  - "after" version: only updated by transactions after the point
- Once all transactions preceding the point have completed, the "before" versions are immutable and are checkpointed
- The "before" versions are then deleted; the "after" versions are used for the next checkpoint period
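A minimal sketch of the two-version record scheme, assuming a single checkpoint point at a time (illustrative only, not Calvin's implementation):

```python
# Writes by transactions ordered before the chosen point go to the "before"
# version, later writes go to the "after" version, so the "before" versions can
# be checkpointed asynchronously once all pre-point transactions have finished.
class ZigzagRecordStore:
    def __init__(self, checkpoint_point):
        self.point = checkpoint_point      # position in the global serial order
        self.before = {}                   # versions as of the checkpoint point
        self.after = {}                    # versions written after the point

    def write(self, txn_seq, key, value):
        if txn_seq < self.point:
            self.before[key] = value
        else:
            self.after[key] = value

    def read(self, txn_seq, key):
        if txn_seq >= self.point and key in self.after:
            return self.after[key]         # post-point txns see the newest version
        return self.before.get(key)

    def checkpoint(self):
        """Called once every transaction ordered before `point` has completed."""
        snapshot = dict(self.before)       # the immutable "before" versions
        # After the snapshot is persisted, the post-point versions become the
        # new baseline for the next checkpoint period.
        self.before = {**self.before, **self.after}
        self.after = {}
        return snapshot
```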

Scalability
- Calvin scales (near-)linearly
- Throughput comparable to world-record results, on much cheaper hardware

Scalability (cont.)
(Figure 5: Total and per-node microbenchmark throughput, varying deployment size.)
- Scales linearly for low-contention workloads
- Scales sub-linearly when contention is high
  - Stragglers (slow machines, execution process skew), exacerbated by high contention

Scalability under High Contention
- Slowdown is measured relative to running a perfectly partitionable, low-contention version of the same workload
- Calvin scales better than 2PC in the face of high contention