CS239-Lecture 15 Transactions Madan Musuvathi Visiting Professor, UCLA Principal Researcher, Microsoft Research.

No Compromises: Distributed Transactions with Consistency, Availability, and Performance Aleksandar Dragojević, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, Miguel Castro Presented by Lun Liu 2

Distributed Transactions Transaction –A unit of work that you want to treat as a “whole” –ACID Distributed transactions provide a powerful abstraction –Simplify building and reasoning about distributed systems Previous implementations performed poorly –Believed to be inherently slow –Not widely used 3

Hardware Trends Modern cluster –Large main memory –Non-volatile memory –Fast network with RDMA 4

Remote Direct Memory Access (RDMA) Read/write remote memory –The NIC (Network Interface Controller) performs the DMA (direct memory access) request 5

Remote Direct Memory Access (RDMA) 6

Remote Direct Memory Access (RDMA) Read/write remote memory –NIC performs the DMA (direct memory access) request Great performance –Bypasses the kernel –Bypasses the remote CPU 7

Hardware Trends Modern cluster –Large main memory –Non-volatile memory –Fast network with RDMA Eliminates storage and network bottlenecks but leaves the CPU as the bottleneck 8

FaRM: No Need to Compromise Main memory distributed computing platform –Use hardware effectively Strict serializability High performance Durability High availability 9

Outline FaRM background Transaction protocol Failure recovery Evaluation 10

FARM BACKGROUND 11

FaRM Programming Model Distributed shared memory abstraction FaRM API provides transparent access to local and remote objects 12

FaRM Programming Model Application thread can start a transaction –Becomes the coordinator During the transaction, it can execute arbitrary logic as well as read, write, allocate, and free objects At the end of execution, it invokes FaRM to commit 13
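The programming model on this slide can be illustrated with a toy, single-machine sketch. The `Farm`/`Txn` names and the buffered-write commit are illustrative assumptions, not the real FaRM API (which runs the full distributed lock/validate/replicate protocol at commit):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical single-machine sketch of the FaRM programming model:
// a thread starts a transaction (becoming its coordinator), reads and
// writes objects through it, then asks the runtime to commit.
struct Farm {
    std::unordered_map<std::string, int64_t> objects;  // stands in for shared memory
};

class Txn {
public:
    explicit Txn(Farm& farm) : farm_(farm) {}

    // Reads go to the shared store; writes are buffered locally until
    // commit, as in the execute phase described on the later slides.
    int64_t read(const std::string& key) {
        auto it = writes_.find(key);
        if (it != writes_.end()) return it->second;   // read-your-writes
        return farm_.objects[key];
    }
    void write(const std::string& key, int64_t value) { writes_[key] = value; }

    // Commit applies the buffered writes; the real protocol performs
    // locking, validation, and replication here.
    bool commit() {
        for (auto& kv : writes_) farm_.objects[kv.first] = kv.second;
        return true;
    }

private:
    Farm& farm_;
    std::unordered_map<std::string, int64_t> writes_;
};
```

A transaction sees its own buffered writes, but other threads see no effect until `commit()` is invoked.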

FaRM Architecture Primary–backup replication for fault tolerance Configuration Manager (CM) –Manages leases, detects failures, coordinates recovery 14

RDMA Read in FaRM 15

RDMA Read in FaRM 16

RDMA Write in FaRM 17

TRANSACTION PROTOCOL 18

FaRM Transaction Execution 19 C P1 B1 P2 B2 Execute One-sided RDMA read Buffer writes as local

Two-Phase Commit 20 C P1 B1 P2 B2 Execute Prepare Commit Two message round trips Requires CPU processing on all machines

FaRM Commit Use one-sided RDMA operations Reduce message counts Optimistic Concurrency Control (OCC) + read validation –Version numbers –Speculative execution -> lock write set -> validate read set (decision point) -> commit and unlock 21
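The lock -> validate -> commit pipeline can be sketched single-threaded with per-object version numbers. In FaRM the lock and validation steps are one-sided RDMA operations, so this only shows the control flow, not the distributed mechanics:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Single-threaded sketch of OCC commit with read validation:
// lock the write set, validate the read set (the decision point),
// then install values, bump versions, and unlock.
struct VersionedObject {
    int64_t value = 0;
    uint64_t version = 0;
    bool locked = false;
};

using ReadSet = std::vector<std::pair<VersionedObject*, uint64_t>>;  // object, version seen
using WriteSet = std::vector<std::pair<VersionedObject*, int64_t>>;  // object, new value

bool occ_commit(WriteSet& writes, const ReadSet& reads) {
    // Lock every object in the write set; abort on conflict.
    std::vector<VersionedObject*> locked;
    for (auto& w : writes) {
        if (w.first->locked) {                       // concurrent committer
            for (auto* o : locked) o->locked = false;
            return false;
        }
        w.first->locked = true;
        locked.push_back(w.first);
    }
    // Validate reads: abort if a version changed since it was read,
    // or the object is locked by a different transaction.
    for (const auto& r : reads) {
        bool lockedByUs = false;
        for (auto* o : locked) if (o == r.first) lockedByUs = true;
        if (r.first->version != r.second || (r.first->locked && !lockedByUs)) {
            for (auto* o : locked) o->locked = false;
            return false;
        }
    }
    // Decision point passed: install new values, bump versions, unlock.
    for (auto& w : writes) {
        w.first->value = w.second;
        w.first->version++;
        w.first->locked = false;
    }
    return true;
}
```

A transaction whose read set is stale at validation time aborts without having modified any object.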

FaRM Commit 22 C P1 B1 P2 B2 Execute Lock Writes LOCK record to primary of written objects Primary attempts to lock and sends back a message reporting success or failure

FaRM Commit 23 C P1 B1 P2 B2 Execute LockValidate Decision Validates read set using one-sided RDMA

FaRM Commit 24 C P1 B1 P2 B2 Execute Lock Validate Replicate Logs to backups using COMMIT-BACKUP Coordinator waits until it receives all hardware acks

FaRM Commit 25 C P1 B1 P2 B2 Execute Lock Validate Replicate Update and Unlock Writes COMMIT-PRIMARY Primary processes it by updating and unlocking Responds to the application upon receiving one hardware ack

FaRM Commit 26 C P1 B1 P2 B2 Execute Lock Validate Replicate Update and Unlock Truncate Coordinator truncates after receiving acks from all primaries Truncation is piggybacked on other log records Backups apply updates at truncation time

FaRM Commit Correctness Serialization point –Read-write Tx: all write locks acquired –Read-only Tx: last read –Equivalent to committing atomically at this point 27

FaRM Commit Performance Use primary–backup replication to reduce the number of copies of data Use one-sided RDMA extensively Pw(f + 3) RDMA writes, Pr RDMA reads –Pw/Pr: machines holding primaries of written/read objects; f: backups per region 28
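As a back-of-the-envelope check of these counts, a hypothetical helper can tabulate the cost. The per-primary breakdown into one LOCK, f COMMIT-BACKUPs, one COMMIT-PRIMARY, and one truncation write is an illustrative reading of the Pw(f + 3) formula, not quoted from the paper:

```cpp
#include <cstdint>

// A transaction that reads objects on pr machines and writes objects
// whose primaries live on pw machines, each region having f backups,
// issues pw * (f + 3) one-sided RDMA writes and pr RDMA reads.
struct CommitCost {
    uint64_t rdma_writes;
    uint64_t rdma_reads;
};

CommitCost commit_cost(uint64_t pw, uint64_t pr, uint64_t f) {
    // (f + 3) per written primary: LOCK + f COMMIT-BACKUPs
    // + COMMIT-PRIMARY + truncation (assumed breakdown).
    return CommitCost{pw * (f + 3), pr};
}
```

For example, with 2 written primaries, 3 read machines, and f = 2 backups, commit takes 10 one-sided writes and 3 reads.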

FAILURE RECOVERY 29

One-Sided Operations Complicate Recovery CPUs do not process remote accesses Configuration change –Precise membership Recovery –Drain logs before recovering 30

High Availability Short failure detection time Data available instantly –A backup promoted to primary Use parallelism –New transactions in parallel with recovery –Recover transactions in parallel –Recover data in parallel 31

Failure Recovery Steps Detect failure Reconfiguration Recover transactions Recover data 32

Failure Detection 33 FaRM uses leases to detect failure

Failure Detection 34 A short lease interval gives fast detection and high availability
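The lease mechanism on these slides can be sketched with integer timestamps. The `Lease` class and its millisecond clock are illustrative assumptions; FaRM's actual lease manager uses dedicated threads and very short leases:

```cpp
#include <cstdint>

// A machine is considered alive only while its lease is unexpired, so
// a shorter lease interval means faster failure detection, at the cost
// of more frequent renewals. Times are plain integers (milliseconds)
// so the logic is deterministic and easy to test.
class Lease {
public:
    explicit Lease(uint64_t interval_ms) : interval_(interval_ms) {}

    void renew(uint64_t now_ms) { expiry_ = now_ms + interval_; }
    bool expired(uint64_t now_ms) const { return now_ms >= expiry_; }

private:
    uint64_t interval_;
    uint64_t expiry_ = 0;
};
```

When a lease expires without renewal, the CM suspects the machine and starts reconfiguration.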

Reconfiguration 35 CM=S1 S2 S3 S4 suspect S3 Probe

Reconfiguration 36 CM=S1 S2 S3 S4 suspect S3 Probe ZooKeeper -> REMAP Backup becomes primary, load balancing

Reconfiguration 37 CM=S1 S2 S3 S4 suspect S3 Probe ZooKeeper -> REMAP NEW-CONFIG NEW-CONFIG-ACK NEW-CONFIG-COMMIT Machines apply the new configuration and stop accessing affected regions

Reconfiguration 38 CM=S1 S2 S3 S4 suspect S3 Probe ZooKeeper -> REMAP NEW-CONFIG NEW-CONFIG-ACK NEW-CONFIG-COMMIT Recovery starts after everyone stops accessing recovering regions

Transaction Recovery 39 C P B1 B2 Block Drain Process all log messages that have not yet been processed

Transaction Recovery 40 C P B1 B2 Block Drain Find Recovering Lock Region looks as it did before the failure

Transaction Recovery 41 C P B1 B2 Block Drain Find Recovering Lock Region is active Operations on the region in parallel with recovery

Transaction Recovery 42 C P B1 B2 Block Drain Find Recovering Lock Region is active Replicate logs Vote Decide New coordinators are spread around the cluster

Data Recovery Need to re-replicate data at new backups –To tolerate future failures Parallelize data replication Done in the background –Starts after locks are acquired at all primaries –Paced 43

EVALUATION 44

Evaluation Settings 90-machine cluster, plus 5 machines running ZooKeeper –256 GB DRAM, 16 cores –30 threads for foreground work, 2 threads for the lease manager –2x Mellanox ConnectX-3 Infiniband, 56 Gbps 3-way replication (1 primary, 2 backups) Standard OLTP benchmarks –TATP, TPC-C 45

TATP Performance 46 Read-dominated: 70% single-row reads, 10% reads of 2–4 rows, 20% updates

TPC-C Performance 47 Complex transactions: 10% distributed, 45% “new orders”

TPC-C Recovery 48

TPC-C Recovery 49

TATP 18 Machines Failing 50

Conclusion FaRM transactions –Tailored to hardware in modern data centers –RDMA, low message counts, parallelism High performance High availability 51

Backup 52

Paxos vs. Primary-Backup 53

Transactions and Serializability Serializability –The outcome is equal to the outcome of some serial execution of the transactions 54

Use one-sided RDMA operations Reduce message counts Effectively use parallelism 55

TPC-C Aggressive Data Recovery 56

TATP CM Failure 57

Lease Duration 58

Type-Aware Transactions for Faster Concurrent Code Presented by Nisarg Shah

Overview The paper presents an improvement over Software Transactional Memory (STM). STM is a programming-language alternative to traditional locking for consistency in concurrent applications. The paper introduces STO (Software Transactional Objects), which is faster than STM. STO tracks abstract operations on transactional datatypes. Predicates prevent false conflicts.

STM DB transactions — lazy initialization inside a TL2-style atomic block:

    Foo* getFoo() {
        static Foo* pFoo = 0;
        atomic {                      // transactional region (TL2)
            if (pFoo == 0)            // read
                pFoo = new Foo();     // write(s)
        }
        return pFoo;
    }

Problems with STM Not as fast as purpose-built concurrent code Can’t be implemented at the hardware level High costs of bookkeeping and concurrency control –Tracks all objects –Locking them or taking snapshots Solution? Abstract datatype operations! Abstract reads, writes, and predicates on transactional datatypes.

Software Transactional Objects (STO) Aims to –Use datatype semantics –Reduce bookkeeping –Limit false conflicts –Provide efficient concurrency control Datatypes instead of memory words –Vectors, trees, hash tables, priority queues –You can add your own datatype Optimistic predicate verification Builds on a lot of related work; do go through it in the paper.

Code

    bool transfer(TBox<int>& bal1, TBox<int>& bal2, int amt) {
        TRANSACTION {                  // open new transaction
            if (amt < 0 || bal1 < amt)
                return false;
            bal1 = bal1 - amt;
            bal2 = bal2 + amt;
        } RETRY(true);                 // commit with retry
        return true;
    }

STO platform TObject is the base class for all transactional objects Version numbers track updates [ID and helper bits] –Each version number protects a different logical segment of state, so commuting operations need not conflict Tracking set [class TItem: read version, write value, predicate value] Read/modify callbacks

Commit protocol Phase 1 –All ‘write’ TItems are locked –The lock() callback must lock the relevant segment of state Phase 2 –Read version numbers are verified via the check() callback –The transaction will commit after this point Phase 3 –install() applies the writes –Cleanup
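The three phases can be sketched over a simplified tracking-set item. The virtual lock/check/install/unlock methods stand in for STO's callbacks, and the concrete versioned integer cell is an illustrative assumption, not the paper's implementation:

```cpp
#include <vector>

// Simplified tracked item with the three commit-protocol callbacks.
struct TItem {
    virtual bool lock() = 0;     // phase 1: lock the written segment of state
    virtual bool check() = 0;    // phase 2: verify the read version
    virtual void install() = 0;  // phase 3: apply the buffered write
    virtual void unlock() = 0;
    virtual ~TItem() = default;
};

// Three-phase commit over the tracking set. For simplicity the read
// and write sets are assumed disjoint.
bool sto_commit(std::vector<TItem*>& writes, std::vector<TItem*>& reads) {
    std::vector<TItem*> locked;
    for (auto* w : writes) {                       // phase 1
        if (!w->lock()) { for (auto* l : locked) l->unlock(); return false; }
        locked.push_back(w);
    }
    for (auto* r : reads)                          // phase 2
        if (!r->check()) { for (auto* l : locked) l->unlock(); return false; }
    for (auto* w : writes) {                       // phase 3
        w->install();
        w->unlock();
    }
    return true;
}

// A minimal concrete item: a versioned integer cell.
struct IntItem : TItem {
    int value = 0, pending = 0;
    unsigned version = 0, seen = 0;
    bool is_locked = false;
    bool lock() override { if (is_locked) return false; is_locked = true; return true; }
    bool check() override { return version == seen && !is_locked; }
    void install() override { value = pending; version++; }
    void unlock() override { is_locked = false; }
};
```

If any read version has changed since it was recorded, phase 2 fails and all write locks are released before the abort.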

False Conflicts | Optimistic Predicates Operations that commute do not require rolling back Segmentation (a version number for each segment) –Logical partitions –Accumulated counter delta –Multiple read versions for a datatype with no natural partition Optimistic transactional predicates –A TItem records a ‘predicate expression’
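A minimal sketch of the accumulated-counter-delta idea, assuming a hypothetical `TCounter`/`TxnCounterOp` pair: because increments commute, each transaction buffers a delta and applies it at commit without validating the counter's value, so concurrent increments never falsely conflict.

```cpp
#include <cstdint>

// Shared counter; version would only be bumped for operations that
// actually observe the value (those still need validation).
struct TCounter {
    int64_t value = 0;
    uint64_t version = 0;
};

// Per-transaction view of the counter: updates are buffered as a
// commutative delta rather than recorded as a read-modify-write.
struct TxnCounterOp {
    int64_t delta = 0;
    void increment(int64_t n) { delta += n; }
    // Commit applies the delta without checking the current value,
    // because two increments commute in either order.
    void commit(TCounter& c) { c.value += delta; delta = 0; }
};
```

Two transactions that both increment the counter can commit in either order and reach the same final value.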

Opacity User code never observes transactionally inconsistent state STO uses a TL2-style scheme for this Unlike TL2, STO always records reads in the tracking set The scheme is fast, but the extra checks and contention on the global version clock add overhead.

Datatypes Consideration: Specification Insert element Absent element Read my writes Correctness Composition

Evaluation - STAMP
