Published by Anabel Bryant. Modified over 9 years ago.
1
HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems
James Cowling¹, Daniel Myers¹, Barbara Liskov¹, Rodrigo Rodrigues², Liuba Shrira³
¹ MIT CSAIL
² INESC-ID and Instituto Superior Técnico
³ Brandeis University
2
Byzantine Fault Tolerance
› Reliable client-server distributed systems
  » Server replicated across group of replica machines
› General operations
› Bounded number f of Byzantine replicas
› Must ensure correct system state
  » Consistent ordering of client operations
3
State of the Art
› Approaches:
  » State Machine Replication – BFT: 3f+1 replicas
  » Byzantine Quorums – Q/U: 5f+1 replicas; increased performance, but degradation when writes contend
4
Contributions
› Low overhead Byzantine Fault Tolerance
  » Performance of Byzantine Quorums without 5f+1 replicas or contention degradation
› Hybrid Quorum scheme for Byzantine Fault Tolerance
  » Quorum approach in normal case
  » Use Byzantine agreement to resolve write contention
5
Outline
› Current Approaches
› HQ Replication
› BFT Improvements
› Performance Evaluation
› Conclusions
6
State Machine Replication
› BFT - Castro and Liskov, TOCS ’02
  » Operations ordered by primary
  » Agreed upon by replicas
[Figure: message flow among Client, Primary, Replica 2, Replica 3, Replica 4: Request, Pre-Prepare, Prepare, Commit, Reply]
7
Byzantine Quorums
› Q/U - Abd-El-Malek et al., SOSP ’05
› Client-controlled protocol
  » Replicas order operations independently
› Optimistic
  » Best case: one-phase protocol
  » Worst case: unbounded, with randomized backoff
[Figure: Client exchanging Update and Reply messages with Replicas 1–6]
8
Advantages/Disadvantages
BFT
› Good
  » 3f+1 replicas
  » Bounded number of phases
› Bad
  » Higher latency
  » Quadratic communication
Q/U
› Good
  » Best-case performance: one-phase write, low replica load
› Bad
  » 5f+1 replicas
  » Degraded performance when writes contend
9
Outline
› Current Approaches
› HQ Replication
  » Normal-case Protocol
  » Contention Resolution
› BFT Improvements
› Performance Evaluation
› Conclusions
10
HQ Replication
› 3f+1 replicas
› Supports general operations
› No all-to-all communication in the normal case
› BFT used to resolve contention
11
HQ Replication
[Figure: Client exchanging Write1 / Write1 OK and Write2 / Write2 OK messages with Replicas 1–4]
› One-phase read
› Two-phase write
12
System Architecture
13
High-level Write Protocol
› Two-phase write protocol
› Phase 1:
  » Client obtains timestamp grant from each replica
› Phase 2:
  » Client forms certificate from 2f+1 matching grants
  » Sends to replicas to complete write
14
Grants
› Promise to execute operation at given sequence number
  » Assuming agreement from quorum
› Grant
  » Client ID
  » Object ID
  » Hash over requested operation
  » Sequence Number (timestamp)
  » Replica signature
15
Certificates
› Certificate
  » Quorum (2f+1) matching grants
› Proves quorum of replicas agree to ordering of operation
  » Uniquely identify client, operation and sequential ordering
  » Existence of certificate precludes existence of conflicting certificate
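The grant and certificate definitions above can be sketched in Python. This is a minimal illustration, not the HQ implementation: `Grant` and `form_certificate` are hypothetical names, the signature field is a placeholder that is never verified, and grants are assumed to "match" when they agree on client, object, operation hash and sequence number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Grant:
    # Fields follow the Grants slide; signature checking is omitted in this sketch.
    client_id: int
    object_id: int
    op_hash: str             # hash over the requested operation
    seq_no: int              # sequence number (timestamp)
    replica_id: int
    signature: bytes = b""   # placeholder, never verified here

    def ordering(self):
        """The part of a grant that must agree for grants to count as matching."""
        return (self.client_id, self.object_id, self.op_hash, self.seq_no)


def form_certificate(grants: List[Grant], f: int) -> Optional[List[Grant]]:
    """Return 2f+1 matching grants from distinct replicas, or None if no such quorum exists."""
    quorum = 2 * f + 1
    groups = {}
    for g in grants:
        # Keep at most one grant per replica for each proposed ordering.
        groups.setdefault(g.ordering(), {})[g.replica_id] = g
    for by_replica in groups.values():
        if len(by_replica) >= quorum:
            return [by_replica[r] for r in sorted(by_replica)[:quorum]]
    return None
```

Grouping by the ordering tuple, rather than comparing whole grants, is what lets grants issued by different replicas combine into one certificate, while repeated grants from the same replica still count only once.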
16
Replica State
› Multiple independent objects
› State per-object
  » Certificate supporting most recent write
  » Operation status
    – Active: write in progress, outstanding grant
    – Quiescent: no current write operation
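The per-object state and its Active/Quiescent transitions can be sketched as follows, together with the phase-1 rule from the Write Phase 1 slide. This is a simplified, hypothetical model (`ObjectState`, `write_1` and `write_2` are illustrative names): the stored certificate is reduced to its sequence number, and certificate validation is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Grant = Tuple[int, str, int]   # (client_id, op_hash, seq_no); simplified grant


@dataclass
class ObjectState:
    cert_seq: int = 0                    # seq no of the certificate supporting the most recent write
    outstanding: Optional[Grant] = None  # set while a write is in progress

    @property
    def status(self) -> str:
        return "Active" if self.outstanding is not None else "Quiescent"

    def write_1(self, client_id: int, op_hash: str) -> Grant:
        """Phase 1: a quiescent replica issues a new grant; an active one repeats its outstanding grant."""
        if self.outstanding is None:
            self.outstanding = (client_id, op_hash, self.cert_seq + 1)
        return self.outstanding

    def write_2(self, cert_seq: int) -> bool:
        """Phase 2 (simplified): a certificate for the next seq no completes the write."""
        if cert_seq != self.cert_seq + 1:
            return False
        self.cert_seq = cert_seq
        self.outstanding = None   # the operation would be executed here; back to Quiescent
        return True
```

Repeating the outstanding grant while Active is what makes a second client see the first client's grant, which is the situation the Incomplete Write walkthrough resolves with a writeback.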
17
Write Phase 1
› Client sends write request to replicas
  » If quiescent, replica assigns new grant to client
  » If active, replica sends currently outstanding grant
› Several possibilities
  » All grants match
  » Grants for different client
  » Grants conflict
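On the client side, the possibilities above could be distinguished roughly as follows. This sketch (hypothetical `classify_phase1`) assumes one simplified grant `(client_id, op_hash, seq_no)` per replica; it treats two different grants for the same sequence number as a conflict, and grants that differ only in sequence number as coming from slow replicas.

```python
from collections import Counter
from typing import List, Tuple

Grant = Tuple[int, str, int]   # (client_id, op_hash, seq_no); simplified grant


def classify_phase1(replies: List[Grant], my_client_id: int, f: int) -> str:
    """Decide the client's next step from the phase-1 grants it has collected."""
    quorum = 2 * f + 1
    if len(replies) < quorum:
        return "wait"                       # not enough replies yet
    grant, count = Counter(replies).most_common(1)[0]
    if count >= quorum:
        # All grants match: proceed if they are ours, else write back the other client's certificate.
        return "phase2" if grant[0] == my_client_id else "writeback"
    seqs = Counter(seq for _, _, seq in set(replies))
    if any(n > 1 for n in seqs.values()):
        return "resolve"                    # different grants for the same seq no: request resolution
    return "update_slow_replicas"           # grants differ only in seq no: bring slow replicas up to date
```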
18
Isolated Write
[Animation: client 1 and replicas 1, 2, 3; each replica shows its state and current grant (Client, Seq No, Operation)]
› Initially, all replicas are Quiescent with no grant (Client: ?, Seq No: 0, Operation: ?)
› Client 1 sends Write A to the replicas
› Each replica becomes Active and issues a grant (Client: 1, Seq No: 1, Operation: A)
› Replicas send Grant 1, Grant 2, Grant 3 to client 1
› Matching grants: client 1 starts the phase 2 write, sending Cert {G1, G2, G3}
› Each replica executes A and returns to Quiescent (Client: 1, Seq No: 1, Operation: A)
› Replicas send Result A to client 1: write complete
28
Incomplete Write
[Animation: clients 1 and 2, replicas 1, 2, 3; each replica shows its state and current grant (Client, Seq No, Operation)]
› Initially, all replicas are Quiescent (Client: ?, Seq No: 0, Operation: ?)
› Client 1 sends Write A; each replica becomes Active (Client: 1, Seq No: 1, Operation: A) and sends Grant 1, Grant 2, Grant 3 to client 1
› Client 1 is slow or has failed and does not complete phase 2
› Client 2 sends Write B
› Replicas are active, so each returns its current grant (Client: 1, Seq No: 1, Operation: A)
› Grants for a different client: client 2 performs a writeback, sending Cert {G1, G2, G3} together with Write B
› Each replica executes A and becomes Quiescent (Client: 1, Seq No: 1, Operation: A), then processes Write B
› Each replica becomes Active (Client: 2, Seq No: 2, Operation: B) and sends Grant 1, Grant 2, Grant 3 to client 2
› Matching grants: client 2 proceeds with the phase 2 write
42
Write Contention
[Animation: clients 1 and 2, replicas 1, 2, 3; each replica shows its state and current grant (Client, Seq No, Operation)]
› Initially, all replicas are Quiescent (Client: ?, Seq No: 0, Operation: ?)
› Client 1 sends Write A and client 2 sends Write B concurrently
› Replicas 1 and 2 receive Write A first and become Active (Client: 1, Seq No: 1, Operation: A)
› Replica 3 receives Write B first and becomes Active (Client: 2, Seq No: 1, Operation: B)
› Replicas return Grant 1, Grant 2, Grant 3
› Conflicting grants (two operations at Seq No 1): the client requests resolution, sending a Resolve Request with Cert {G1, G2, G3}
› Replicas run contention resolution
› Each replica executes A, then B, and ends Quiescent (Client: 2, Seq No: 2, Operation: B)
› Replicas send Result A to client 1 and Result B to client 2
57
Contention Resolution
› BFT module used to resolve contention
  » Establish sequential order on contending ops
› On receiving resolve request:
  » Freeze local object state
  » Send state to primary
› Primary runs BFT on combined state
› Replicas execute contending operations
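The resolution steps can be approximated in a few lines, with the BFT agreement itself abstracted away. In this hypothetical sketch (`resolve`, `apply_resolution`), each frozen state is reduced to the replica's last certified sequence number plus the contending operations it has seen, certificate checks are omitted, and ordering contending operations by client ID is an assumed stand-in for whatever order the primary actually proposes.

```python
from typing import Callable, Dict, List, Tuple

Op = Tuple[int, str]   # (client_id, operation)


def resolve(frozen: List[Tuple[int, List[Op]]], f: int) -> List[Tuple[int, Op]]:
    """Combine frozen replica states into (seq_no, op) assignments.
    frozen: one (last certified seq_no, contending ops) entry per replica."""
    if len(frozen) < 2 * f + 1:
        raise ValueError("need states from a quorum of 2f+1 replicas")
    # Real HQ backs each reported seq_no with a certificate; this sketch trusts it.
    base = max(seq for seq, _ in frozen)
    # Deduplicate, then use a deterministic order (by client ID: an assumption).
    contending = sorted({op for _, ops in frozen for op in ops})
    return [(base + i + 1, op) for i, op in enumerate(contending)]


def apply_resolution(value: str, assignments: List[Tuple[int, Op]],
                     execute: Callable[[str, str], str]) -> Tuple[str, Dict[int, str]]:
    """Every replica executes the agreed operations in sequence-number order."""
    results = {}
    for _, (client_id, op) in sorted(assignments):
        value = execute(value, op)
        results[client_id] = value   # result returned to that client
    return value, results
```

Fed the frozen states from the Write Contention walkthrough (A seen by replicas 1 and 2, B by replica 3, all at last certified Seq No 0), this assigns A to sequence number 1 and B to sequence number 2, matching the final replica state (Client: 2, Seq No: 2, Operation: B).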
58
Read Protocol
› Client sends read request to replicas
› Replica returns current object state
  » Supported by previous write certificate
› Read complete if quorum of matching responses
  » Writeback used to retry if responses inconsistent
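The client's read decision might look like this. The sketch (hypothetical `complete_read`) reduces each reply to `(cert_seq, value)`, where `cert_seq` stands in for the write certificate supporting the returned state; when no 2f+1 replies match, it reports the newest certificate the client should write back before retrying.

```python
from collections import Counter
from typing import List, Optional, Tuple, Union

Reply = Tuple[int, str]   # (seq no of supporting certificate, object state)


def complete_read(replies: List[Reply], f: int) -> Tuple[str, Union[str, Optional[int]]]:
    """Return ("done", value) on 2f+1 matching replies, else ("writeback", newest cert seq)."""
    quorum = 2 * f + 1
    counts = Counter(replies)
    if counts:
        (_, value), n = counts.most_common(1)[0]
        if n >= quorum:
            return ("done", value)
    # Inconsistent or too few replies: write back the newest certificate, then retry the read.
    newest = max((seq for seq, _ in replies), default=None)
    return ("writeback", newest)
```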
59
Additional Details
› Read protocol
› State transfer
› Multi-object transactions
› Performance enhancements
60
Performance Enhancements
› Preferred quorums
  » Core protocol run by only 2f+1 replicas
› Symmetric-key cryptography
  » Authenticators instead of signatures: a collection of 3f+1 MACs
  » Lower CPU overhead
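An authenticator replaces one signature with a vector of MACs, one per replica. The sketch below uses Python's standard `hmac` module with HMAC-SHA256; the per-replica session keys and the function names (`make_authenticator`, `verify_at_replica`) are assumptions for illustration, not HQ's actual key setup or MAC scheme.

```python
import hashlib
import hmac
from typing import Dict


def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def make_authenticator(msg: bytes, session_keys: Dict[int, bytes]) -> Dict[int, bytes]:
    """One MAC per replica, each computed with the key shared with that replica."""
    return {rid: mac(key, msg) for rid, key in session_keys.items()}


def verify_at_replica(msg: bytes, auth: Dict[int, bytes], rid: int, key: bytes) -> bool:
    """A replica checks only its own entry, using its own shared key."""
    tag = auth.get(rid)
    return tag is not None and hmac.compare_digest(tag, mac(key, msg))
```

Since each replica holds only its own key, it can check its own entry but not the others; the sender therefore computes one MAC per replica, which is cheaper in CPU than a public-key signature even though the message grows with the number of replicas.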
61
BFT Improvements
› Preferred quorums
  » Reduces degree of quadratic communication
› Single MAC per message
  » Significant improvements over authenticators
62
Outline
› Current Approaches
› HQ Replication
› BFT Improvements
› Performance Evaluation
  » Analysis
  » Experiments
› Conclusions
63
Non-Contention Message Overhead
[Figure: messages sent/received at each replica per write request]
64
Non-Contention Bandwidth Use
[Figure: total bandwidth at each replica per write request]
65
Experimental Setup
› HQ and BFT prototypes deployed on Emulab
  » Up to 16 replicas (f=5), 200 clients (4 per machine)
› New BFT codebase
› Implement counter service
  » Negligible operation payload
  » Multiple objects
    – Private non-contention objects
    – Shared contention object
66
Non-contention Throughput
[Figure: maximum operation throughput]
67
Resilience to Contention
[Figure: throughput degradation with increasing write contention]
68
Resilience to Contention
[Figure (new version): throughput degradation with increasing write contention]
69
BFT Batching
› BFT allows batching at primary
› Greatly reduces internal protocol communication
› Increased delay
[Figure: message flow among Client, Primary, Replica 1, Replica 2, Replica 3: Request, Pre-Prepare, Prepare, Commit, Reply; annotated "once per batch"]
70
Batched Performance
[Figure: effect of BFT batching on maximum write throughput]
71
Recommendations
› Use Q/U when
  » Latency critical
  » Contention low
  » 5f+1 replicas acceptable
› Use HQ when
  » Low latency important
  » Moderate contention
› Use BFT when
  » Contention high
  » Throughput more important than latency
72
Conclusions
› First Byzantine Quorum protocol with 3f+1 replicas
  » Supports general operations
  » Resilient to Byzantine clients
› Introduced hybrid technique
  » Resolve contention without performance degradation
  » Applicable to general quorum systems
› Found optimized BFT to perform well under high load
73
Questions?
74
Further Details
› HQ Replication: Properties and optimizations
  » James Cowling, Daniel Myers, Barbara Liskov, Rodrigo Rodrigues and Liuba Shrira. Technical Memo In Prep., MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, 2006.
› Contact:
  » cowling@csail.mit.edu
  » http://people.csail.mit.edu/cowling/
75
Write-back Operation
› Write certificate paired with a subsequent request
› Used to ensure progress with slow replicas or clients
  » Completes phase 2 for a slow client
  » Advances state of slow replicas
› Replica processes write phase 2 based on certificate, then the paired request
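The write-back ordering (certificate first, then the paired request) can be sketched with a toy replica. `Replica`, `write_1`, `write_2` and `writeback` are hypothetical names; operations are stored unhashed, and certificates are reduced to a sequence number and operation with validation omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Grant = Tuple[int, str, int]   # (client_id, operation, seq_no); operation kept unhashed for clarity


@dataclass
class Replica:
    value: str = ""
    last_seq: int = 0
    outstanding: Optional[Grant] = None

    def write_1(self, client_id: int, op: str) -> Grant:
        # Quiescent: issue a new grant; Active: repeat the outstanding one.
        if self.outstanding is None:
            self.outstanding = (client_id, op, self.last_seq + 1)
        return self.outstanding

    def write_2(self, cert_seq: int, cert_op: str, execute: Callable[[str, str], str]) -> None:
        # Execute a certified write only if it is the next one; stale certificates are ignored.
        if cert_seq == self.last_seq + 1:
            self.value = execute(self.value, cert_op)
            self.last_seq = cert_seq
            self.outstanding = None

    def writeback(self, cert_seq: int, cert_op: str, client_id: int, op: str,
                  execute: Callable[[str, str], str]) -> Grant:
        """Process the certificate (phase 2) first, then the paired phase-1 request."""
        self.write_2(cert_seq, cert_op, execute)
        return self.write_1(client_id, op)
```

In the Incomplete Write scenario, client 2's writeback of client 1's certificate executes A and immediately yields a fresh grant for B at sequence number 2; replaying the same certificate later is harmless because it is no longer the next write.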
77
Backups…
78
Slow Replicas
› Some grants in quorum have old timestamp
› Perform writeback to slow replicas, using certificate provided with highest grant
  » Brings replicas up to date and solicits new grants
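Choosing what to write back can be sketched as picking the certificate that accompanied the highest grant and targeting every replica whose grant lags behind it. The reply shape `(grant seq_no, certificate)` and the name `slow_replica_writeback` are assumptions for illustration; certificates are represented by opaque strings.

```python
from typing import Dict, List, Tuple

Reply = Tuple[int, str]   # (grant seq_no, certificate accompanying the grant)


def slow_replica_writeback(replies: Dict[int, Reply]) -> Tuple[str, List[int]]:
    """Return the certificate sent with the highest grant and the replicas that need it."""
    top_seq, top_cert = max(replies.values())
    slow = sorted(rid for rid, (seq, _) in replies.items() if seq < top_seq)
    return top_cert, slow
```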
79
Why 3f+1?
› 3f+1 replicas
  » f of which can be faulty
› 2f+1 agree on any ordering
  » f of these may be Byzantine
  » The remaining f may be slow
› Maximum of 2f can respond with old system state, but not 2f+1
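The counting behind this slide is the standard quorum-intersection argument, written out here for n = 3f+1 replicas and any two quorums Q_1, Q_2 of size 2f+1:

```latex
\begin{aligned}
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n = 2(2f+1) - (3f+1) = f + 1 > f \\
\text{stale replies in a quorum} &\le \underbrace{f}_{\text{Byzantine}} + \underbrace{f}_{\text{slow}} = 2f < 2f + 1
\end{aligned}
```

So any two quorums share at least one correct replica, and a set of 2f+1 replies can never consist entirely of old system state.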
80
› Won’t HQ have a higher rate of contention than Q/U, since it is two-phase (higher latency)?
  » No. The contention window spans only from when the first replica receives the phase 1 request to when the last replica receives it. It is therefore independent of the two-phase structure, and actually smaller than in Q/U.