COS 418: Distributed Systems Lecture 6 Daniel Suo

Slides:



Advertisements
Similar presentations
Two phase commit. Failures in a distributed system Consistency requires agreement among multiple servers –Is transaction X committed? –Have all servers.
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
NETWORK ALGORITHMS Presenter- Kurchi Subhra Hazra.
6.852: Distributed Algorithms Spring, 2008 Class 7.
CS542: Topics in Distributed Systems Distributed Transactions and Two Phase Commit Protocol.
(c) Oded Shmueli Distributed Recovery, Lecture 7 (BHG, Chap.7)
COS 461 Fall 1997 Transaction Processing u normal systems lose their state when they crash u many applications need better behavior u today’s topic: how.
Consensus Algorithms Willem Visser RW334. Why do we need consensus? Distributed Databases – Need to know others committed/aborted a transaction to avoid.
E-Transactions: End-to-End Reliability for Three-Tier Architectures Svend Frølund and Rachid Guerraoui.
Transaction Processing Lecture ACID 2 phase commit.
Two phase commit. What we’ve learnt so far Sequential consistency –All nodes agree on a total order of ops on a single object Crash recovery –An operation.
Systems of Distributed Systems Module 2 -Distributed algorithms Teaching unit 3 – Advanced algorithms Ernesto Damiani University of Bozen Lesson 6 – Two.
Computer Science Lecture 17, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Distributed Transactions Transaction may access data at several sites. Each site has a local.
1 More on Distributed Coordination. 2 Who’s in charge? Let’s have an Election. Many algorithms require a coordinator. What happens when the coordinator.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 18: Replication Control All slides © IG.
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
Distributed Commit. Example Consider a chain of stores and suppose a manager – wants to query all the stores, – find the inventory of toothbrushes at.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Paxos Made Simple Jinghe Zhang. Introduction Lock is the easiest way to manage concurrency Mutex and semaphore. Read and write locks. In distributed system:
Distributed Commit Dr. Yingwu Zhu. Failures in a distributed system Consistency requires agreement among multiple servers – Is transaction X committed?
CS162 Section Lecture 10 Slides based from Lecture and
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
Chapter 19 Recovery and Fault Tolerance Copyright © 2008.
Distributed Transactions Chapter 13
University of Tampere, CS Department Distributed Commit.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 8 Fault.
Paxos: Agreement for Replicated State Machines Brad Karp UCL Computer Science CS GZ03 / M st, 23 rd October, 2008.
Commit Algorithms Hamid Al-Hamadi CS 5204 November 17, 2009.
Two-Phase Commit Brad Karp UCL Computer Science CS GZ03 / M th October, 2008.
Section 06 (a)RDBMS (a) Supplement RDBMS Issues 2 HSQ - DATABASES & SQL And Franchise Colleges By MANSHA NAWAZ.
Introduction Many distributed systems require that participants agree on something On changes to important data On the status of a computation On what.
Primary-Backup Replication
The consensus problem in distributed systems
Distributed Systems – Paxos
Chapter 19: Distributed Databases
Outline Introduction Background Distributed DBMS Architecture
Database System Implementation CSE 507
Two phase commit.
View Change Protocols and Reconfiguration
IS 651: Distributed Systems Consensus
Commit Protocols CS60002: Distributed Systems
Outline Announcements Fault Tolerance.
Distributed Systems, Consensus and Replicated State Machines
Fault-tolerance techniques RSM, Paxos
EEC 688/788 Secure and Dependable Computing
CSE 486/586 Distributed Systems Concurrency Control --- 3
Distributed Transactions
Lecture 21: Replication Control
Atomic Commit and Concurrency Control
EEC 688/788 Secure and Dependable Computing
EECS 498 Introduction to Distributed Systems Fall 2017
IS 651: Distributed Systems Final Exam
Causal Consistency and Two-Phase Commit
Lecture 20: Intro to Transactions & Logging II
Distributed Databases Recovery
View Change Protocols and Reconfiguration
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
UNIVERSITAS GUNADARMA
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Distributed Transactions
Paxos Made Simple.
Lecture 21: Replication Control
Implementing Consistency -- Paxos
CSE 486/586 Distributed Systems Concurrency Control --- 3
Last Class: Fault Tolerance
Transaction Communication
Presentation transcript:

COS 418: Distributed Systems Lecture 6 Daniel Suo Two-Phase Commit COS 418: Distributed Systems Lecture 6 Daniel Suo Acknowledgements: Kyle Jamieson, Mike Freedman, Irene Zhang

Plan Fault tolerance in a nutshell Safety and liveness Two-phase commit Floods, fires, top-of-rack switch failure & c.

Fault tolerance in a nutshell QUESTION: how is it different from parallelism?

What is fault tolerance? Building reliable systems from unreliable components Three basic steps Detecting errors: discovering the presence of an error in a data value or control signal Containing errors: limiting how far errors propagate Masking errors: designing mechanisms to ensure a system operates correctly despite error (and possible correct error)

Why is fault tolerance hard? Failures Propagate Say one bit in a DRAM fails… …it flips a bit in a memory address the kernel is writing to... ...causes big memory error elsewhere, or a kernel panic... ...program is running one of many distributed file system storage servers... ...a client can’t read from FS, so it hangs.

So what to do? Do nothing: silently return the failure Fail fast: detect the failure and report at interface Ethernet station jams medium on detecting collision Fail safe: transform incorrect behavior or values into acceptable ones Failed traffic light controller switches to blinking-red Mask the failure: operate despite failure Retry op for transient errors, use error-correcting code for bit flips, replicate data in multiple places

Masking failures We mask failures on one server via Atomic operations Logging and recovery In a distributed system with multiple servers, we might replicate some or all servers But if you give a mouse some replicated servers She’s going to want a way to keep them all consistent in a fault tolerant way

Safety and liveness QUESTION: how is it different from parallelism?

Reasoning about fault tolerance This is hard! How do we design fault-tolerant systems? How do we know if we’re successful? Often use “properties” that hold true for every possible execution We focus on safety and liveness properties

Safety “Bad things” don’t happen Examples No stopped or deadlocked states No error states Examples Mutual exclusion: two processes can’t be in a critical section at the same time Bounded overtaking: if process 1 wants to enter a critical section, process 2 can enter at most once before process 1 NOTE: can be violated in finite execution

Liveness “Good things” happen Examples …eventually Starvation freedom: process 1 can eventually enter a critical section as long as process 2 terminates Eventual consistency: if a value in an application doesn’t change, two servers will eventually agree on its value NOTE: might not be violated in finite execution

Often a tradeoff “Good” and “bad” are application-specific Safety is very important in banking transactions May take some time to confirm a transaction Liveness is very important in social networking sites See updates right away (what about the “breakup problem”?)

Two-phase commit QUESTION: how is it different from parallelism?

Motivation: sending money send_money(A, B, amount) { Begin_Transaction(); if (A.balance - amount >= 0) { A.balance = A.balance - amount; B.balance = B.balance + amount; Commit_Transaction(); } else { Abort_Transaction(); } }

Single-server: ACID Atomicity: all parts of the transaction execute or none (A’s decreases and B’s balance increases) Consistency: the transaction only commits if it preserves invariants (A’s balance never goes below 0) Isolation: the transaction executes as if it executed by itself (even if C is accessing A’s account, that will not interfere with this transaction) Durability: the transaction’s effects are not lost after it executes (updates to the balances will remain forever)

Distributed transactions? Partition databases across multiple machines for scalability (A and B might not share a server) A transaction might touch more than one partition How do we guarantee that all of the partitions commit the transactions or none commit the transactions?

Two-Phase Commit (2PC) Goal: General purpose, distributed agreement on some action, with failures Different entities play different roles in the action Running example: Transfer money from A to B Debit at A, credit at B, tell the client “okay” Require both banks to do it, or neither Require that one bank never act alone This is an all-or-nothing atomic commit protocol Later will discuss how to make it before-or-after atomic

Transaction Coordinator TC Straw Man protocol C  TC: “go!” Client C go! Transaction Coordinator TC Invent a transaction coordinator as a single authoritative entity Bank A B

Transaction Coordinator TC Straw Man protocol C  TC: “go!” TC  A: “debit $20!” TC  B: “credit $20!” TC  C: “okay” A, B perform actions on receipt of messages Client C go! okay Transaction Coordinator TC debit $20! credit $20! Invent a transaction coordinator as a single authoritative entity Bank A B

Reasoning about the Straw Man protocol What could possibly go wrong? Not enough money in A’s bank account? B’s bank account no longer exists? A or B crashes before receiving message? The best-effort network to B fails? TC crashes after it sends debit to A but before sending to B?

Safety versus liveness Note that TC, A, and B each have a notion of committing We want two properties: Safety If one commits, no one aborts If one aborts, no one commits Liveness If no failures and A and B can commit, action commits If failures, reach a conclusion ASAP Segue: Let’s start with correctness

A correct atomic commit protocol C  TC: “go!” Client C go! Transaction Coordinator TC Bank A B

A correct atomic commit protocol C  TC: “go!” TC  A, B: “prepare!” Client C Transaction Coordinator TC prepare! prepare! Bank A B

A correct atomic commit protocol C  TC: “go!” TC  A, B: “prepare!” A, B  P: “yes” or “no” Client C Transaction Coordinator TC yes yes Bank A B

A correct atomic commit protocol C  TC: “go!” TC  A, B: “prepare!” A, B  P: “yes” or “no” TC  A, B: “commit!” or “abort!” TC sends commit if both say yes TC sends abort if either say no Client C Transaction Coordinator TC commit! commit! Bank A B

A correct atomic commit protocol C  TC: “go!” TC  A, B: “prepare!” A, B  P: “yes” or “no” TC  A, B: “commit!” or “abort!” TC sends commit if both say yes TC sends abort if either say no TC  C: “okay” or “failed” A, B commit on receipt of commit message Client C okay Transaction Coordinator TC Bank A B

Reasoning about atomic commit Why is this correct? Neither can commit unless both agreed to commit What about performance? Timeout: I’m up, but didn’t receive a message I expected Maybe other node crashed, maybe network broken Reboot: Node crashed, is rebooting, must clean up Segue: Let’s deal with timeout first

Timeouts in atomic commit Where do hosts wait for messages? TC waits for “yes” or “no” from A and B TC hasn’t yet sent any commit messages, so can safely abort after a timeout But this is conservative: might be network problem We’ve preserved correctness, sacrificed performance A and B wait for “commit” or “abort” from TC If it sent a no, it can safely abort (why?) If it sent a yes, can it unilaterally abort? Can it unilaterally commit? A, B could wait forever, but there is an alternative… … 2. If A/B voted no, it can safely abort (since the TC will abort from receiving the no, or conservatively abort) If sent yes, can it unilaterally abort? (NO, since the TC may go ahead, or may have crashed before sending commit to B but will come back up) Can it unilaterally commit? (NO, never, needs consensus)

Server termination protocol Consider Server B (Server A case is symmetric) waiting for commit or abort from TC Assume B voted yes (else, unilateral abort possible) B  A: “status?” A then replies back to B. Four cases: (No reply from A): no decision, B waits for TC Server A received commit or abort from TC: Agree with the TC’s decision Server A hasn’t voted yet or voted no: both abort TC can’t have decided to commit Server A voted yes: both must wait for the TC TC decided to commit if both replies received TC decided to abort if it timed out Segue: Let’s deal with timeout first

Reasoning about the server termination protocol What are the liveness and safety properties? Safety: if servers don’t crash, all processes will reach the same decision Liveness: if failures are eventually repaired, then every participant will eventually reach a decision Can resolve some timeout situations with guaranteed correctness Sometimes however A and B must block Due to failure of the TC or network to the TC But what will happen if TC, A, or B crash and reboot? SEGUE: Crash and reboot? Let's see (next slide).

How to handle crash and reboot? Can’t back out of commit if already decided TC crashes just after sending “commit!” A or B crash just after sending “yes” If all nodes knew their state before crash, we could use the termination protocol… Use write-ahead log to record “commit!” and “yes” to disk Big Trouble: We’re back to debiting one bank and not crediting another Why write to log first? (If we write to log second we might send commit then crash!)

Recovery protocol with non-volatile state If everyone rebooted and is reachable, TC can just check for commit record on disk and resend action TC: If no commit record on disk, abort You didn’t send any “commit!” messages A, B: If no yes record on disk, abort You didn’t vote “yes” so TC couldn’t have committed A, B: If yes record on disk, execute termination protocol This might block

Two-Phase Commit This recovery protocol with non-volatile logging is called Two-Phase Commit (2PC) Safety: All hosts that decide reach the same decision No commit unless everyone says “yes” Liveness: If no failures and all say “yes” then commit But if failures then 2PC might block TC must be up to decide Doesn’t tolerate faults well: must wait for repair SEGUE: So, let’s investigate replicating the data.

Assignment 2 Due October 19 Wednesday topic Consensus I: FLP Impossibility and Paxos QUESTION: how is it different from parallelism?

Two-phase commit failure scenarios QUESTION: how is it different from parallelism?

What if participant fails before sending response? SEGUE: So, let’s investigate replicating the data.

What if participant fails after sending vote SEGUE: So, let’s investigate replicating the data.

What if participant lost a vote? SEGUE: So, let’s investigate replicating the data.

What if coordinator fails before sending prepare? SEGUE: So, let’s investigate replicating the data.

What if coordinator fails after sending prepare? SEGUE: So, let’s investigate replicating the data.

What if coordinator fails after receiving votes SEGUE: So, let’s investigate replicating the data.

What if coordinator fails after sending decision? SEGUE: So, let’s investigate replicating the data.

Do we need the coordinator? SEGUE: So, let’s investigate replicating the data.

What happens if we don’t have a decision? SEGUE: So, let’s investigate replicating the data.