Two phase commit. Failures in a distributed system Consistency requires agreement among multiple servers –Is transaction X committed? –Have all servers.

Slides:



Advertisements
Similar presentations
Replication techniques Primary-backup, RSM, Paxos Jinyang Li.
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
NETWORK ALGORITHMS Presenter- Kurchi Subhra Hazra.
1 Transactions and Web Services. 2 Web Environment Web Service activities form a unit of work, but ACID properties are not always appropriate since Web.
Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung
CS542: Topics in Distributed Systems Distributed Transactions and Two Phase Commit Protocol.
(c) Oded Shmueli Distributed Recovery, Lecture 7 (BHG, Chap.7)
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Sinfonia: A New Paradigm for Building Scalable Distributed Systems Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, Christos Karamanolis.
COS 461 Fall 1997 Transaction Processing u normal systems lose their state when they crash u many applications need better behavior u today’s topic: how.
CIS 720 Concurrency Control. Timestamp-based concurrency control Assign a timestamp ts(T) to each transaction T. Each data item x has two timestamps:
E-Transactions: End-to-End Reliability for Three-Tier Architectures Svend Frølund and Rachid Guerraoui.
ICS 421 Spring 2010 Distributed Transactions Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/16/20101Lipyeow.
Database Replication techniques: a Three Parameter Classification Authors : Database Replication techniques: a Three Parameter Classification Authors :
Two phase commit. What we’ve learnt so far Sequential consistency –All nodes agree on a total order of ops on a single object Crash recovery –An operation.
CS 582 / CMPE 481 Distributed Systems
Non-blocking Atomic Commitment Aaron Kaminsky Presenting Chapter 6 of Distributed Systems, 2nd edition, 1993, ed. Mullender.
Sinfonia: A New Paradigm for Building Scalable Distributed Systems Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, Christonos Karamanolis.
Persistent State Service 1 Distributed Object Transactions  Transaction principles  Concurrency control  The two-phase commit protocol  Services for.
1 ICS 214B: Transaction Processing and Distributed Data Management Replication Techniques.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Distributed Transactions Transaction may access data at several sites. Each site has a local.
1 More on Distributed Coordination. 2 Who’s in charge? Let’s have an Election. Many algorithms require a coordinator. What happens when the coordinator.
© Chinese University, CSE Dept. Distributed Systems / Distributed Systems Topic 11: Transactions Dr. Michael R. Lyu Computer Science & Engineering.
Failure and Availibility DS Lecture # 12. Optimistic Replication Let everyone make changes –Only 3 % transactions ever abort Make changes, send updates.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 18: Replication Control All slides © IG.
Low-Latency Multi-Datacenter Databases using Replicated Commit
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Distributed Databases
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.
Distributed Deadlocks and Transaction Recovery.
Distributed Commit Dr. Yingwu Zhu. Failures in a distributed system Consistency requires agreement among multiple servers – Is transaction X committed?
CS162 Section Lecture 10 Slides based from Lecture and
Distributed Transactions March 15, Transactions What is a Distributed Transaction?  A transaction that involves more than one server  Network.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
Distributed File Systems Overview  A file system is an abstract data type – an abstraction of a storage device.  A distributed file system is available.
Distributed Transactions
Distributed Transaction Management, Fall 2002Lecture Distributed Commit Protocols Jyrki Nummenmaa
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
University of Tampere, CS Department Distributed Commit.
Paxos: Agreement for Replicated State Machines Brad Karp UCL Computer Science CS GZ03 / M st, 23 rd October, 2008.
Distributed Transactions Chapter – Vidya Satyanarayanan.
GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.
Two-Phase Commit Brad Karp UCL Computer Science CS GZ03 / M th October, 2008.
IM NTU Distributed Information Systems 2004 Distributed Transactions -- 1 Distributed Transactions Yih-Kuen Tsay Dept. of Information Management National.
Manish Kumar,MSRITSoftware Architecture1 Remote procedure call Client/server architecture.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Recovery in Distributed Systems:
Two phase commit.
COS 418: Distributed Systems Lecture 6 Daniel Suo
Commit Protocols CS60002: Distributed Systems
RELIABILITY.
Outline Announcements Fault Tolerance.
Fault-tolerance techniques RSM, Paxos
From Viewstamped Replication to BFT
CSE 486/586 Distributed Systems Concurrency Control --- 3
Assignment 5 - Solution Problem 1
Distributed Transactions
Lecture 21: Replication Control
Atomic Commit and Concurrency Control
Causal Consistency and Two-Phase Commit
Database Recovery 1 Purpose of Database Recovery
UNIVERSITAS GUNADARMA
Transactions in Distributed Systems
Lecture 21: Replication Control
CIS 720 Concurrency Control.
CSE 486/586 Distributed Systems Concurrency Control --- 3
Presentation transcript:

Two phase commit

Failures in a distributed system Consistency requires agreement among multiple servers –Is transaction X committed? –Have all servers applied update X to a replica? Achieving agreement w/ failures is hard –Impossible to distinguish host vs. network failures This class: –all-or-nothing atomicity in distributed systems

Example Bank ABank B Transfer $1000 From A:$3000 To B:$2000 Clients want all-or-nothing transactions –Transfer either happens or not at all client

Strawman solution Bank ABank B Transfer $1000 From A:$3000 To B:$2000 client Transaction coordinator

Strawman solution What can go wrong? –A does not have enough money –B’s account no longer exists –B has crashed –Coordinator crashes client transaction coordinator bank Abank B start done A=A-1000 B=B+1000

Reasoning about correctness TC, A, B each has a notion of committing Correctness: –If one commits, no one aborts –If one aborts, no one commits Performance: –If no failures, A and B can commit, then commit –If failures happen, find out outcome soon

Correctness first client transaction coordinator bank Abank B start result prepare rBrB rArA outcome If r A ==yes && r B ==yes outcome = “commit” else outcome = “abort” B commits upon receiving “commit”

Performance Issues What about timeouts? –TC times out waiting for A’s response –A times out waiting for TC’s outcome message What about reboots? –How does a participant clean up?

Handling timeout on A/B TC times out waiting for A (or B)’s “yes/no” response Can TC unilaterally decide to commit? Can TC unilaterally decide to abort?

Handling timeout on TC If B responded with “no” … –Can it unilaterally abort? If B responded with “yes” … –Can it unilaterally abort? –Can it unilaterally commit?

Possible termination protocol Execute termination protocol if B times out on TC and has voted “yes” B sends “status” message to A –If A has received “commit”/”abort” from TC … –If A has not responded to TC, … –If A has responded with “no”, … –If A has responded with “yes”, … Resolves most failure cases except sometimes when TC fails

Handling crash and reboot Nodes cannot back out if commit is decided TC crashes just after deciding “commit” –Cannot forget about its decision after reboot A/B crashes after sending “yes” –Cannot forget about their response after reboot

Handling crash and reboot All nodes must log protocol progress What and when does TC log to disk? What and when does A/B log to disk?

Recovery upon reboot If TC finds no “commit” on disk, abort If TC finds “commit”, commit If A/B finds no “yes” on disk, abort If A/B finds “yes”, run termination protocol to decide

Summary: two-phase commit 1.All nodes that decide reach the same decision 2.No commit unless everyone says "yes". 3.No failures and all "yes", then commit. 4.If failures, then repair, wait long enough for recovery, then some decision.

A Case study of 2P commit in real systems Sinfonia (SOSP’07)

What problem is Sinfonia addressing? Targeted uses – systems or infrastructural apps within a data center Sinfonia: a shared data service –Span multiple nodes –Replicated with consistency guarantees Goal: reduce development efforts for system programmers

Sinfonia architecture Each memory node provides a shared address space with name (node-id, address)

Sinfonia mini-transactions Provide all-or-nothing atomic operations –as well as before-after atomicity (using locks) Trade off expressiveness for efficiency –fewer network roundtrips to execute –Less flexible, general-purpose than traditional transactions Result –a lightweight, short-lived type of transaction –over unstructured data

Mini-transaction details Mini-transaction –Check compare items –If match, retrieve data in read items, modify data in write items Example: t = new Minitransaction() t->cmp(node-X:0x000, 4, 3000) t->cmp(node-Y:0x100, 4, 2000 t->write(node-X:0x000, 4, 2000) t->write(node-Y:0x100, 4, 3000) Status = t->exec_and_commit()

Sinfonia uses 2P commit prepare commit action1 action2 actions… Traditional transactions: general but expensive BEGIN tx If (a > 0 && b== 0) b = a * a for (i = 0; i < a; i++) b += i END tx Mini-transaction: less general but efficient BEGIN tx If (a == 3000 && b==2000) { a=2000 b=3000 } END tx Prepare & exec commit Traditional transactions Mini- transactions coordinator

Potential uses of mini- transactions 1. atomic swap operation 2. atomic read of many data 3. try to acquire a lease 4. try to acquire multiple leases atomically 5. change data if lease is held 6. validate cache then change data

Sinfonia’s 2P protocol Transaction coordinator is at application node instead of memory node –Saves one RTT Problems: crashed TC blocks transaction progress –App nodes are less reliable than memory nodes

Sinfonia’s 2P protocol TC keeps no log A transaction is committed iff all participants have “yes” in their logs Recovery coordinator cleans up –Ask all participants for existing vote (or vote “no” if not voted yet) –Commit iff all vote “yes” Transaction blocks if a memory node crashes –Must wait for memory node to recovery from disk

Sinfonia applications SinfoniaFS –hosts share the same set of files, files stored in Sinfonia –scalable: performance improves with more memory nodes –fault tolerant SinfoniaFS exports a NFS interface –Each NFS op corresponds to 1 mini-transaction

SinfoniaFS architecture

Example use of mini-transaction setattr(ino_t inum, sattr_t newattr) { do { addr = address of inode curr_version = inode->version t = new Minitransaction; t->cmp(addr, 4, curr_version) t->write(addr, 4, curr_version+1) t->write(addr, 20, newattr); }while (status == fail); }

General use of mini- transaction in SinfoniaFS 1.If local cache is empty, load it 2.Make modifications to local cache 3.Issue a mini-transaction to check the validity of cache, apply modification 4.If mini-transaction fails, reload cached item and try again

More examples: append to file Find a free block in cached freemap Issue mini-transaction with –Compare items: cached inode, free status of the block –Write items: inode, append new block, freemap, new block If mini-transaction fails, reload cache

Sinfonia’s mini-transaction is fast