

1 Tolerating Latency in Replicated State Machines through Client Speculation
April 22, 2009
Benjamin Wester (1), James Cowling (2), Edmund B. Nightingale (3), Peter M. Chen (1), Jason Flinn (1), Barbara Liskov (2)
(1) University of Michigan, (2) MIT CSAIL, (3) Microsoft Research

2 Simple Service Configuration
[Diagram: a client sends ++x to a single server; x goes from 0 to 1 and the reply is 1]
NSDI'09, Benjamin Wester, University of Michigan CSE

3 Replicated State Machines (RSM)
Agree on request
All non-faulty replies are identical
[Diagram: the client's ++x goes to every replica; each replica moves from x=1 to x=2 and replies 2]
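The slide's idea can be sketched in a few lines of Python. This is a minimal illustration, not the protocol from the talk: the class `CounterReplica`, the function `replicated_increment`, and the `needed` threshold are hypothetical names, and the agreement step that orders requests is assumed to have already happened.

```python
from collections import Counter


class CounterReplica:
    """One replica of a hypothetical shared-counter service."""

    def __init__(self, x=0, faulty=False):
        self.x = x
        self.faulty = faulty

    def increment(self):
        # Deterministic transition: every correct replica computes the same result.
        self.x += 1
        # A faulty replica may answer anything; -1 stands in for that here.
        return -1 if self.faulty else self.x


def replicated_increment(replicas, needed):
    """Send ++x to every replica and accept a value once `needed` replies agree."""
    votes = Counter(r.increment() for r in replicas)
    value, count = votes.most_common(1)[0]
    if count < needed:
        raise RuntimeError("not enough matching replies")
    return value


# Four replicas starting at x=1, one of them faulty (assumed setup).
replicas = [
    CounterReplica(x=1),
    CounterReplica(x=1),
    CounterReplica(x=1, faulty=True),
    CounterReplica(x=1),
]
result = replicated_increment(replicas, needed=2)
```

Because the client has to collect several matching replies before it trusts the value, its latency is set by the slower replicas, which is the cost the next slide points out.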

4 RSMs have high latency
Need many replies
Agreement
Geographic distribution
[Diagram: the client collects the reply 2 from several replicas, each holding x=2]

5 Hide the Latency
Use speculative execution inside RSM
Speculate before consensus is reached
Without faults, any reply predicts consensus value
Let client continue after receiving one reply

6 Overview
Introduction
Improving RSMs with speculation
Application to PBFT
Performance
Conclusion

7 Speculative Execution in RSM
Continue processing while waiting
[Diagram labels: Take Checkpoint; Predict: 1; Speculate!; Blocked; Commit; Rollback; x=1, x=2, x=1]
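The checkpoint, speculate, commit-or-rollback cycle named in the diagram can be sketched as follows. This is only an illustration under stated assumptions: `SpeculativeClient`, `run`, and the dictionary used as client state are hypothetical, and a real system would checkpoint process state rather than a Python object.

```python
import copy


class SpeculativeClient:
    """Hypothetical client that speculates on the first reply it receives."""

    def __init__(self, state):
        self.state = state
        self.checkpoint = None
        self.predicted = None

    def speculate(self, first_reply):
        # Take a checkpoint, then treat the first reply as the predicted result.
        self.checkpoint = copy.deepcopy(self.state)
        self.predicted = first_reply

    def resolve(self, consensus_reply):
        # Commit if the prediction matched the agreed reply, otherwise roll back.
        correct = consensus_reply == self.predicted
        if not correct:
            self.state = self.checkpoint
        self.checkpoint = None
        self.predicted = None
        return "commit" if correct else "rollback"


def run(first_reply, consensus_reply):
    client = SpeculativeClient({"x_seen": None, "work": []})
    client.speculate(first_reply)
    # Continue processing while waiting for the remaining replies.
    client.state["x_seen"] = first_reply
    client.state["work"].append("step that depends on x")
    outcome = client.resolve(consensus_reply)
    return outcome, client.state


good = run(first_reply=2, consensus_reply=2)  # prediction was right
bad = run(first_reply=1, consensus_reply=2)   # prediction was wrong
```

In the correct case the speculative work is kept; in the incorrect case the client returns to the checkpointed state and the work done on the wrong value disappears.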

8 Critical path: first reply
Completion latency less relevant
First reply latency sets critical path
  Speed
  Accuracy
Other desirable properties
  Throughput
  Stability under contention
  Smaller number of replicas

9 Requests while speculative
while not check_lottery():
    submit_tps()
buy_corvette()
[Diagram: the client predicts win? = yes from the first reply and issues buy while still speculative]
What do we do with this?
Hold request
  Bad performance
Distributed commit/rollback
  State tracking complex

10 Resolve speculations on the replicas
while not check_lottery():
    submit_tps()
buy_corvette()
[Diagram: the client predicts win? = yes and sends buy with the predicate "if win?=yes: buy"; a replica whose own win? reply is yes executes buy and returns keys]
Explicitly encode dependencies as predicates
No special request handling needed
Replicas need to log past replies
Local decision at replicas matches client
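A predicated request can be sketched as a request that carries the reply the client assumed for an earlier request. The sketch below is an illustration only: the `Replica` class, the `(request id, predicted reply)` tuple format, and the "predicate failed" reply string are assumptions, not the wire format used in the talk.

```python
class Replica:
    """Hypothetical replica that logs past replies and checks predicates locally."""

    def __init__(self, lottery_won):
        self.lottery_won = lottery_won
        self.reply_log = {}  # request id -> reply this replica produced
        self.purchases = 0

    def execute(self, req_id, op, predicate=None):
        # predicate = (earlier request id, reply the client predicted for it)
        if predicate is not None:
            dep_id, predicted = predicate
            if self.reply_log.get(dep_id) != predicted:
                # The client speculated on a reply this replica never gave.
                reply = "predicate failed"
                self.reply_log[req_id] = reply
                return reply
        if op == "win?":
            reply = "yes" if self.lottery_won else "no"
        elif op == "buy":
            self.purchases += 1
            reply = "keys"
        else:
            reply = "unknown op"
        self.reply_log[req_id] = reply
        return reply


# The client predicted win? = yes and sent: if win?=yes: buy
winner = Replica(lottery_won=True)
winner.execute(1, "win?")
winner_reply = winner.execute(2, "buy", predicate=(1, "yes"))

loser = Replica(lottery_won=False)
loser.execute(1, "win?")
loser_reply = loser.execute(2, "buy", predicate=(1, "yes"))
```

Each replica decides from its own reply log, so no distributed commit or rollback is needed: a replica that answered "no" simply refuses the dependent buy.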

11 Overview
Introduction
Improving RSMs with speculation
Application to PBFT
Performance
Conclusion

12 Practical BFT-CS [Castro and Liskov 1999]
[Diagram labels: client, primary, f=1]
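For readers unfamiliar with the f=1 label: f is the number of faulty replicas tolerated. In standard PBFT (Castro and Liskov), tolerating f faults takes 3f+1 replicas, agreement quorums of 2f+1, and f+1 matching replies before a client accepts a result. The helper below just works that arithmetic; the name `pbft_sizes` and its dictionary keys are made up for illustration, and it does not model how PBFT-CS lets the client act on the first reply.

```python
def pbft_sizes(f):
    """Group sizes in standard PBFT for f tolerated Byzantine faults."""
    return {
        "replicas": 3 * f + 1,       # total replicas required
        "quorum": 2 * f + 1,         # replicas that must agree in a protocol phase
        "matching_replies": f + 1,   # identical replies a client waits for
    }


sizes = pbft_sizes(1)  # the f=1 configuration shown on the slide
```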

13 Additional Details
Tentative execution
  PBFT/PBFT-CS complete in 4 phases
Read-only optimization
  Accurate answer from backup replica
Failure threshold
  Bound worst-case failure
Correctness

14 Overview
Introduction
Improving RSMs with speculation
Application to PBFT
Performance
Conclusion

15 Benchmarks
Shared counter
  Simple checkpoint
  No computation
NFS: Apache httpd build
  Complex checkpoint
  Significant computation

16 Topology
Primary-local
Primary-remote
Uniform
2.5 or 15 ms
[Diagram: position of the primary in each topology]

17 Base case: no replication
Primary-local
Primary-remote
Uniform
2.5 or 15 ms

18 Shared Counter
Primary-local topology

19 Shared Counter
Primary-local topology
[Kotla et al. 07]

20 Shared Counter
Uniform & Primary-remote topology

21 Shared Counter
Uniform & Primary-remote topology

22 NFS: Apache build
Primary-local topology

23 NFS: Apache build
Uniform topology

24 NFS: Apache build
Primary-remote topology

25 NFS: With Failure
Primary-local topology (1% fail)

26 Throughput (Shared Counter)
LAN topology

27 Conclusion
Integrate client speculation within RSMs
Predicated requests: performance without complexity
Clients less sensitive to latency between replicas
5x speedup over non-speculative protocol
Makes WAN deployments more practical

