1
Consensus on Transaction Commit
Paxos Commit. Jim Gray and Leslie Lamport, Microsoft Research. Preview of a paper in preparation, presented at Microsoft Research Techfest, 3 March 2004, Redmond, WA. Article: MSR-TR, "Consensus on Transaction Commit".
2
Commit is Common
Marriage ceremony: "Do you?" "I do." "I now pronounce you…"
Theater: "Ready on the set?" "Ready!" "Action!"
Contract law: Offer. Signature. Deal / lawsuit.
3
The Common Picture: the director asks each actor "Ready?"; each actor answers "Ready"; once all are ready, the director calls "Action!"
4
All or Nothing: if any actor says no, the deal is off.
The director asks each actor "Ready?". If every actor answers "Ready", the show goes on. If any actor answers "No!", or does not answer before a timeout, the director tells all actors "No deal!"
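The all-or-nothing rule, including the timeout case, fits in a few lines. A minimal Python sketch, where `director_decision` is a hypothetical name and a timeout is modeled as a missing (None) reply:

```python
from typing import Optional, Sequence

def director_decision(replies: Sequence[Optional[str]]) -> str:
    """Hypothetical helper: return "Action!" only if every actor replied "Ready".

    Each reply is "Ready", "No!", or None (no answer before the timeout).
    """
    for reply in replies:
        if reply != "Ready":
            # An explicit "No!" and a timeout (None) are treated the same way.
            return "No deal!"
    return "Action!"

print(director_decision(["Ready", "Ready", "Ready"]))  # Action!
print(director_decision(["Ready", "No!", "Ready"]))    # No deal!
print(director_decision(["Ready", None, "Ready"]))     # No deal!
```

A single missing answer is enough to cancel: silence can never be mistaken for consent.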
5
The Database Version
The client asks the TM to Commit. The TM (the director) asks each RM (the actors) "Ready?"; each RM answers "Ready"; the TM then tells every RM to Commit.
TM: Transaction Manager. RM: Resource Manager.
6
Two-Phase Commit
N Resource Managers (RMs). Goal: all RMs commit, or all abort.
Coordinated by a Transaction Manager (TM).
The TM sends Prepare, then Commit or Abort; each RM responds Prepared or Aborted.
Message flow: client → TM: RequestCommit; TM → RMs: Prepare; RMs → TM: Prepared; TM → RMs: Commit.
Cost: 3N+1 messages, N+1 stable writes, 4 message delays, 2 stable-write delays.
Blocking: if the TM fails, Commit/Abort stalls.
Resource Manager states: working, prepared, committed, aborted.
Transaction Manager states: working, committed, aborted.
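The 3N+1 message and N+1 stable-write counts can be checked with a small failure-free simulation. This is a sketch: `two_phase_commit` is a hypothetical name, and it assumes each RM forces one "prepared" record and the TM forces one decision record.

```python
from typing import List, Tuple

def two_phase_commit(votes: List[bool]) -> Tuple[str, List[str], int]:
    """Failure-free 2PC run. votes[i] is True if RM i can prepare.

    Returns (decision, message log, number of stable writes).
    """
    n = len(votes)
    log: List[str] = ["client->TM RequestCommit"]
    stable_writes = 0
    log += [f"TM->RM{i} Prepare" for i in range(n)]       # phase 1: Prepare
    for i, ok in enumerate(votes):
        if ok:
            stable_writes += 1        # RM forces a "prepared" record before replying
            log.append(f"RM{i}->TM Prepared")
        else:
            log.append(f"RM{i}->TM Aborted")
    decision = "Commit" if all(votes) else "Abort"
    stable_writes += 1                # TM forces its decision before announcing it
    log += [f"TM->RM{i} {decision}" for i in range(n)]    # phase 2: Commit/Abort
    return decision, log, stable_writes

decision, log, writes = two_phase_commit([True, True, True])
print(decision, len(log), writes)  # Commit 10 4
```

With N = 3 this gives 10 = 3N+1 messages and 4 = N+1 stable writes. The simulation cannot show the blocking problem: it simply assumes the TM never fails between the two phases.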
7
The Problem With 2PC: it blocks if the TM fails.
2PC provides the ACID properties:
Atomicity: all or nothing.
Consistency: does the right thing.
Isolation: no concurrency anomalies.
Durability / Reliability: state survives failures.
But not Availability (always up): 2PC blocks if the TM fails.
8
Problem Statement: ACID transactions make error handling easy.
One fault can make Two-Phase Commit block.
Goal: ACID and available, i.e. non-blocking despite up to F faults.
9
Fault-Tolerant Two-Phase Commit
If the 2PC Transaction Manager (TM) fails, the transaction blocks: the client has sent RequestCommit, the TM has sent Prepare, and the RMs have answered Prepared, but no decision ever arrives.
Solution: add a "spare" transaction manager (non-blocking commit, 3-phase commit).
10
Fault-Tolerant Two-Phase Commit
But what if the spare TM takes over while the original TM is still running? The RMs have answered Prepared; one TM sends commit while the other sends abort. Inconsistent! Now what?
Every "what if…?" adds another case: the complexity is a mess.
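A tiny simulation shows how a naive spare TM produces the inconsistent outcome. It is a hypothetical interleaving, not from the slides: the original TM is slow rather than dead, each RM acts on the first decision it receives, and `deliver` is an illustrative name.

```python
from typing import Dict, List, Tuple

def deliver(decisions: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Each tuple is (sender, rm, decision), in delivery order.

    An RM acts on the first decision it receives and ignores later ones.
    """
    outcome: Dict[str, str] = {}
    for _sender, rm, decision in decisions:
        outcome.setdefault(rm, decision)
    return outcome

# Hypothetical interleaving: the original TM has seen every Prepared and
# decides commit; the spare TM times out waiting and decides abort.
trace = [
    ("TM", "RM1", "commit"),
    ("spare", "RM2", "abort"),
    ("spare", "RM1", "abort"),   # too late for RM1
    ("TM", "RM2", "commit"),     # too late for RM2
]
print(deliver(trace))  # {'RM1': 'commit', 'RM2': 'abort'}
```

Two independent deciders plus message reordering are enough to break atomicity, which is why the spare TM needs an agreement protocol rather than a timeout.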
11
Fault-Tolerant 2PC: several workarounds have been proposed in the database community, often called "3-phase" or "non-blocking" commit.
None comes with a complete algorithm and a correctness proof.
12
“Reaching Agreement in the Presence of Faults”
Pease, Shostak, and Lamport, JACM, 1980. 25 years of theory. Now called the Consensus problem: N processes want to agree on a value, even if F of them have failed.
13
Consensus: a consensus box collects proposed values, picks one proposed value, and remembers it forever.
Example: one client proposes X and another proposes W; the box picks W, and every client learns "W Chosen".
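The consensus box is an abstraction; its contract can be stated as a few lines of Python. This is an idealized, failure-free sketch that assumes, for simplicity, the box picks whichever proposal arrives first (`ConsensusBox` is a hypothetical name; a real box is built from acceptors, as later slides show).

```python
from typing import Optional

class ConsensusBox:
    """Idealized consensus box: picks one proposed value and keeps it forever."""

    def __init__(self) -> None:
        self._chosen: Optional[str] = None

    def propose(self, value: str) -> str:
        if self._chosen is None:
            self._chosen = value   # pick one proposed value (here: the first)
        return self._chosen        # every proposer learns the same chosen value

box = ConsensusBox()
print(box.propose("W"))  # W
print(box.propose("X"))  # W: later proposals cannot change the choice
```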
14
Consensus for Commit: The Obvious Approach
The client sends RequestCommit to the TM; the TM sends Prepare to the RMs; the RMs answer Prepared; the TM proposes Prepared (its commit decision) to a consensus box; the box answers "Prepared Chosen"; the TM sends Commit to the RMs.
Get consensus on the TM's decision. The TM just learns the consensus value, so the TM is "stateless": a spare TM can learn the same value from the box.
15
Consensus for Commit: The Paxos Commit Approach
The client sends RequestCommit to the TM; the TM sends Prepare to the RMs; each RM proposes its own "RMi Prepared" to a separate consensus box; each box reports "RMi Prepared Chosen" to the TM; the TM sends Commit to the RMs.
Get consensus on each RM's choice. The TM just combines the consensus values, so the TM is "stateless".
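The TM's job in this approach reduces to combining per-RM values. A minimal sketch, with hypothetical names: each RM's consensus instance is modeled by an idealized first-proposal-wins box, and `tm_decision` keeps no state of its own.

```python
from typing import Dict, Optional

class ConsensusBox:
    """Stand-in for one consensus instance: the first proposal is chosen forever."""

    def __init__(self) -> None:
        self.chosen: Optional[str] = None

    def propose(self, value: str) -> str:
        if self.chosen is None:
            self.chosen = value
        return self.chosen

def tm_decision(boxes: Dict[str, ConsensusBox]) -> str:
    """The TM keeps no state: it only combines the values chosen for each RM."""
    chosen = [box.chosen for box in boxes.values()]
    if any(value == "Aborted" for value in chosen):
        return "Abort"
    if all(value == "Prepared" for value in chosen):
        return "Commit"
    return "Undecided"  # some RM's instance has not chosen anything yet

boxes = {rm: ConsensusBox() for rm in ("RM1", "RM2")}
boxes["RM1"].propose("Prepared")
print(tm_decision(boxes))  # Undecided
boxes["RM2"].propose("Prepared")
print(tm_decision(boxes))  # Commit
```

Because the decision is a pure function of the chosen values, any TM that can read the boxes computes the same answer.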
16
One fewer message delay
The Obvious Approach: Prepare → Prepared → Propose Prepared → Prepared Chosen → Commit.
Paxos Commit: Prepare → Propose RM1 Prepared / Propose RM2 Prepared → RM1 Prepared Chosen / RM2 Prepared Chosen → Commit.
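Counting the hops makes the saving explicit. A sketch under one assumption of ours: a consensus box is treated as one hop in and one hop out, as in the pictures.

```python
# Message chains from the slide, starting at the TM's Prepare. Each step is
# one message delay; a consensus box is treated as one hop in and one hop out.
obvious_approach = ["Prepare", "Prepared", "Propose Prepared", "Prepared Chosen", "Commit"]
paxos_commit = ["Prepare", "Propose RMi Prepared", "RMi Prepared Chosen", "Commit"]

# Paxos Commit drops the RM -> TM "Prepared" hop: each RM proposes directly.
saved = len(obvious_approach) - len(paxos_commit)
print(saved)  # 1
```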
17
Consensus in Action: the normal (failure-free) case takes two message delays.
Inside the consensus box: the RM sends "Propose RM Prepared" to every acceptor; each acceptor sends "Vote RM Prepared" to the TM; the TM learns "RM Prepared Chosen" once a majority of acceptors have voted. This can be optimized.
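The failure-free path can be sketched as a majority count. This is an illustration with a hypothetical function name; "working" stands for the acceptors that receive the proposal and vote.

```python
from typing import Set

def learns_prepared(n_acceptors: int, working: Set[int]) -> bool:
    """Failure-free path of one RM's consensus instance (a sketch).

    Delay 1: the RM sends "Propose RM Prepared" to every acceptor.
    Delay 2: each working acceptor sends "Vote RM Prepared" to the TM.
    The TM learns "RM Prepared Chosen" once a majority has voted.
    """
    votes = sum(1 for a in range(n_acceptors) if a in working)
    majority = n_acceptors // 2 + 1  # F+1 when there are 2F+1 acceptors
    return votes >= majority

# F = 1: three acceptors, so one failure is tolerated.
print(learns_prepared(3, {0, 1, 2}))  # True
print(learns_prepared(3, {0, 2}))     # True
print(learns_prepared(3, {1}))        # False
```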
18
Consensus in Action: any TM (including a spare) can always learn what was chosen, or get Aborted chosen if nothing has been chosen yet, provided a majority of acceptors are working.
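The recovery rule can be sketched with a textbook single-decree Paxos ballot. This is our illustration, not the paper's pseudocode: ballot 0 is assumed to be the RM's own proposal, a TM recovers by running a higher ballot, and `Acceptor` and `tm_recover` are hypothetical names. The key step: the TM must re-propose the value voted for in the highest earlier ballot it hears about, and proposes Aborted only when no acceptor in its majority has voted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vote = Tuple[int, str]  # (ballot number, value)

@dataclass
class Acceptor:
    promised: int = -1             # highest ballot this acceptor has joined
    vote: Optional[Vote] = None    # last (ballot, value) it voted for, if any
    up: bool = True                # False models a failed acceptor

    def phase1(self, ballot: int) -> Tuple[bool, Optional[Vote]]:
        """Join `ballot` if it is higher than any ballot seen; report the last vote."""
        if not self.up or ballot <= self.promised:
            return False, None
        self.promised = ballot
        return True, self.vote

    def phase2(self, ballot: int, value: str) -> bool:
        """Vote for (ballot, value) unless a higher ballot has been joined."""
        if not self.up or ballot < self.promised:
            return False
        self.promised = ballot
        self.vote = (ballot, value)
        return True

def tm_recover(acceptors: List[Acceptor], ballot: int) -> Optional[str]:
    """Learn this RM's value, or get Aborted chosen if nothing was chosen yet.

    Returns None if fewer than a majority of acceptors respond.
    """
    majority = len(acceptors) // 2 + 1
    replies = [vote for ok, vote in (a.phase1(ballot) for a in acceptors) if ok]
    if len(replies) < majority:
        return None
    prior = [v for v in replies if v is not None]
    # Re-propose the highest-ballot value already voted for; otherwise Aborted is safe.
    value = max(prior)[1] if prior else "Aborted"
    votes = sum(a.phase2(ballot, value) for a in acceptors)
    return value if votes >= majority else None

# Ballot 0: the RM's "Prepared" reached only acceptor 0 before the RM failed.
accs = [Acceptor() for _ in range(3)]
accs[0].phase2(0, "Prepared")
accs[2].up = False
print(tm_recover(accs, ballot=1))  # Prepared

# Nothing voted yet: the TM gets Aborted chosen.
print(tm_recover([Acceptor() for _ in range(3)], ballot=1))  # Aborted
```

In the first run "Prepared" had not yet been chosen (one vote out of three), yet the TM still adopts it; that conservative choice is what keeps a later TM from contradicting an earlier one.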
19
The Complete Algorithm
Subtle. More weird cases than most people imagine. Proved correct.
20
Paxos Commit: N RMs, 2F+1 acceptors (~2F+1 TMs)
If F+1 acceptors see all RMs prepared, then the transaction is committed.
Cost: 2F(N+1) + 3N + 1 messages, 5 message delays, 2 stable-write delays.
Message flow (client, TM, RM1…N, acceptors 0…2F): request commit → prepare → prepared → all prepared → commit.
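One way to account for the 2F(N+1) + 3N + 1 total, sketched in Python. The accounting is our assumption, not taken from the slide: the leader (TM) is taken to run on the same node as one acceptor, so that acceptor's vote message is local, and each acceptor is assumed to bundle its votes for all N RMs into one message to the leader. The five stages also line up with the five message delays.

```python
from typing import Dict

def paxos_commit_messages(n: int, f: int) -> Dict[str, int]:
    """Failure-free Paxos Commit message tally under the stated assumptions."""
    acceptors = 2 * f + 1
    return {
        "request commit (client -> leader)": 1,
        "prepare (leader -> RMs)": n,
        "prepared (each RM -> every acceptor)": n * acceptors,
        "all prepared (remote acceptors -> leader, bundled)": acceptors - 1,
        "commit (leader -> RMs)": n,
    }

counts = paxos_commit_messages(n=3, f=1)
print(sum(counts.values()))  # 18 = 3*3 + 2*1*(3+1) + 1
```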
21
Same algorithm when F=0 and TM = Acceptor
Two-Phase Commit: 3N+1 messages; N+1 stable writes; 4 message delays; 2 stable-write delays.
Paxos Commit (tolerates F faults): 3N + 2F(N+1) + 1 messages; N+2F+1 stable writes; 5 message delays; 2 stable-write delays.
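Substituting F = 0 into the Paxos Commit costs recovers the Two-Phase Commit costs. Our reading of why the message-delay count also drops from 5 to 4: when the TM is the single acceptor, the acceptor-to-TM vote message becomes local.

```latex
\begin{aligned}
\text{messages:}\quad & 3N + 2F(N+1) + 1 \,\Big|_{F=0} = 3N + 1,\\
\text{stable writes:}\quad & N + 2F + 1 \,\Big|_{F=0} = N + 1,\\
\text{message delays:}\quad & 5 - 1 = 4 \quad (\text{acceptor} \to \text{TM hop is local when TM} = \text{acceptor}),\\
\text{stable-write delays:}\quad & 2 = 2.
\end{aligned}
```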
22
Summary: Commit is common.
Two-Phase Commit is good, but it is the un-availability protocol: one TM failure can block it.
Paxos Commit is non-blocking if there are at most F faults.
When F=0 (no fault tolerance), Paxos Commit == 2PC.
24
Paxos Consensus: 6F+4 messages, 2F+1 stable writes
The group has a leader known to all; leader election is a subroutine.
A process proposes a value v to the leader.
The leader sends the proposal (ballot, value) to all acceptors (phase 2).
Acceptors respond with the maximum (ballot, value) they have seen.
If the leader sees no higher ballot and gets at least F+1 responses, it can announce (ballot, value).
The full protocol has 3 phases:
Phase 1: the leader starts a new ballot.
Phase 2: the leader proposes a value.
Phase 3: if the value is accepted by F+1 acceptors, it is chosen. If not, the leader tries to get the majority value accepted.
Cost: 6F+4 messages, 2F+1 stable writes, 4 message delays, 2 stable-write delays.
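The 6F+4 figure can be reproduced by counting one message to the leader plus three rounds among the 2F+1 acceptors. This breakdown is our assumption, not the slide's: it takes phase 1 as already done in the normal case and assumes the leader announces the chosen value to all 2F+1 group members. The four rounds match the four message delays.

```python
def paxos_consensus_messages(f: int) -> int:
    """Normal-case Paxos message count under the assumptions above (a sketch)."""
    group = 2 * f + 1  # acceptors
    rounds = [
        1,      # delay 1: a process proposes v to the leader
        group,  # delay 2: leader sends (ballot, value) to every acceptor
        group,  # delay 3: each acceptor answers the leader
        group,  # delay 4: leader announces the chosen value to the group (assumed)
    ]
    return sum(rounds)

print(paxos_consensus_messages(1))  # 10 = 6*1 + 4
```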
25
Using Consensus: have a consensus box for each RM.
The client sends RequestCommit to the TM; the TM sends Prepare to the RMs; each RM sends Prepared to its own consensus box; the boxes report to the TM; the TM sends Commit to the RMs.
26
The RM proposes X to its consensus box; a TM then proposes W to the same box. The box has already chosen X, so the RM and both TMs learn "X Chosen".
27
Paxos Commit (success case)
Message flow: Request Commit → Prepare → Prepared → All Prepared → Commit.
State diagrams: Resource Managers (working, prepared, committed, aborted); acceptors and leader (working, AllPrepared, aborted; working, committed, aborted).
28
Consensus: the distributed systems theory community has thought about this a lot. They call it Consensus: N processes want to agree on a value.
Want to tolerate F faults: F processes stopping, or F messages delayed or lost.
If there are at most F faults in a window, then consensus is achieved; with more than F faults, it stalls but stays safe.
Benign faults need 2F+1 "acceptors"; Byzantine faults need 3F+1 "acceptors".
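The acceptor counts come from a quorum-intersection argument that can be brute-forced for small F. A sketch, assuming the standard quorum sizes: F+1 out of 2F+1 for benign faults, and 2F+1 out of 3F+1 for Byzantine faults (`min_overlap` is a hypothetical name).

```python
from itertools import combinations

def min_overlap(n: int, q: int) -> int:
    """Smallest number of processes shared by any two quorums of size q out of n."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)

for f in range(1, 4):
    # Benign faults: 2F+1 acceptors, majority quorums of F+1.
    n, q = 2 * f + 1, f + 1
    assert n - f >= q               # a quorum survives F stopped acceptors
    assert min_overlap(n, q) >= 1   # any two quorums share an acceptor

    # Byzantine faults: 3F+1 acceptors, quorums of 2F+1.
    n, q = 3 * f + 1, 2 * f + 1
    assert n - f >= q                   # a quorum survives F faulty acceptors
    assert min_overlap(n, q) >= f + 1   # the overlap always holds a correct acceptor
print("quorum checks passed")
```

With only 3F acceptors and quorums of 2F, two quorums may share just F processes, all of which could be faulty; the extra acceptor is what guarantees an honest one in every overlap.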