Tolerating Latency in Replicated State Machines through Client Speculation
April 22, 2009
Benjamin Wester (1), James Cowling (2), Edmund B. Nightingale (3), Peter M. Chen (1), Jason Flinn (1), Barbara Liskov (2)
(1) University of Michigan, (2) MIT CSAIL, (3) Microsoft Research
Simple Service Configuration
[Diagram: a client sends ++x to a single server; the server's state goes from x=0 to x=1 and it replies 1.]
NSDI'09 Benjamin Wester University of Michigan CSE
Replicated State Machines (RSM)
[Diagram: the client's ++x request goes to every replica; each replica moves from x=1 to x=2 and replies 2.]
- Agree on request
- All non-faulty replies are identical
RSMs have high latency
[Diagram: replicas at x=2 each reply 2 to the client.]
- Need many replies
- Agreement
- Geographic distribution
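The "need many replies" point can be illustrated with a minimal sketch of a client that accepts a result only once enough replicas report the same value. The function name `accept_reply` and the threshold of f + 1 matching replies are assumptions made for illustration (f + 1 is a common client rule in Byzantine fault-tolerant protocols); the slide itself does not state a threshold.

```python
from collections import Counter

def accept_reply(replies, threshold):
    """Return the value reported by at least `threshold` replicas, else None.

    `replies` is the list of reply values received so far (hypothetical input).
    """
    counts = Counter(replies)
    if not counts:
        return None  # nothing received yet
    value, votes = counts.most_common(1)[0]
    return value if votes >= threshold else None

f = 1              # assumed number of tolerated faulty replicas
threshold = f + 1  # assumed acceptance rule, not taken from the slides

print(accept_reply([2], threshold))        # None: one reply is not enough
print(accept_reply([2, 7, 2], threshold))  # 2: two matching replies
```

The client cannot finish until the slowest reply needed to reach the threshold arrives, which is why replica placement dominates completion latency.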
Hide the Latency
- Use speculative execution inside RSM
- Speculate before consensus is reached
- Without faults, any reply predicts consensus value
- Let client continue after receiving one reply
Overview
- Introduction
- Improving RSMs with speculation
- Application to PBFT
- Performance
- Conclusion
Speculative Execution in RSM
[Diagram labels: Take Checkpoint; Predict: 1; Speculate! (instead of Blocked); Commit; Rollback. Replica state shown as x=1, x=2, x=1.]
- Continue processing while waiting
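The checkpoint, predict, speculate, commit-or-rollback cycle on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the class `SpeculativeClient`, its method names, and the use of `copy.deepcopy` as the checkpoint mechanism are all hypothetical.

```python
import copy

class SpeculativeClient:
    """Hypothetical client that speculates on the first reply it receives."""

    def __init__(self, state):
        self.state = state
        self.checkpoint = None
        self.predicted = None

    def on_first_reply(self, reply):
        # Take a checkpoint, then treat the first reply as the predicted result.
        self.checkpoint = copy.deepcopy(self.state)
        self.predicted = reply

    def speculate(self, update):
        # Keep processing instead of blocking; `update` mutates client state.
        update(self.state)

    def on_consensus(self, agreed):
        # Commit if the prediction matched the agreed value, else roll back.
        committed = (agreed == self.predicted)
        if not committed:
            self.state = self.checkpoint
        self.checkpoint = None
        self.predicted = None
        return committed

client = SpeculativeClient({"log": []})
client.on_first_reply(1)
client.speculate(lambda s: s["log"].append("used x=1"))
print(client.on_consensus(1), client.state)  # True {'log': ['used x=1']}
```

After a rollback, a real client would re-execute from the checkpoint using the agreed value; that step is omitted here.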
Critical path: first reply
- Completion latency less relevant
- First reply latency sets critical path
  - Speed
  - Accuracy
- Other desirable properties
  - Throughput
  - Stability under contention
  - Smaller number of replicas
Requests while speculative
    while not check_lottery():
        submit_tps()
    buy_corvette()
[Diagram: the client asks win?, predicts win? = yes from the first reply, and issues buy while still speculative.]
What do we do with this speculative request?
- Hold request: bad performance
- Distributed commit/rollback: state tracking complex
Resolve speculations on the replicas
    while not check_lottery():
        submit_tps()
    buy_corvette()
[Diagram: the client predicts win? = yes and sends buy with the predicate "if win?=yes: buy"; replicas whose own reply to win? was yes execute buy, with buy = keys.]
- Explicitly encode dependencies as predicates
- No special request handling needed
- Replicas need to log past replies
- Local decision at replicas matches client
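A predicated request can be sketched as a request that carries the replies the client predicted for earlier requests; each replica checks those predictions against its own reply log before executing. The `Replica` class, the dictionary-based predicate, and the "skipped" marker are assumptions made for illustration and are not taken from the talk.

```python
class Replica:
    """Hypothetical replica that logs past replies and checks predicates."""

    def __init__(self):
        self.reply_log = {}  # request id -> reply this replica produced

    def execute(self, req_id, op, predicate=None):
        # `predicate` maps earlier request ids to the replies the client predicted.
        for dep_id, predicted in (predicate or {}).items():
            if self.reply_log.get(dep_id) != predicted:
                # The client speculated on a reply this replica never gave.
                self.reply_log[req_id] = "skipped"  # hypothetical marker
                return "skipped"
        reply = op()
        self.reply_log[req_id] = reply
        return reply

winner = Replica()
winner.execute("win?", lambda: "yes")
print(winner.execute("buy", lambda: "keys", predicate={"win?": "yes"}))  # keys

loser = Replica()
loser.execute("win?", lambda: "no")
print(loser.execute("buy", lambda: "keys", predicate={"win?": "yes"}))   # skipped
```

Because every replica evaluates the predicate against the same agreed history, each one reaches the same local decision the client will reach when its speculation resolves, so no separate commit or rollback message is needed.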
Overview
- Introduction
- Improving RSMs with speculation
- Application to PBFT
- Performance
- Conclusion
Practical BFT-CS [Castro and Liskov 1999]
[Diagram labels: client, primary; f=1.]
Additional Details
- Tentative execution
  - PBFT/PBFT-CS complete in 4 phases
- Read-only optimization
  - Accurate answer from backup replica
- Failure threshold
  - Bound worst-case failure
- Correctness
Overview
- Introduction
- Improving RSMs with speculation
- Application to PBFT
- Performance
- Conclusion
Benchmarks
- Shared counter
  - Simple checkpoint
  - No computation
- NFS: Apache httpd build
  - Complex checkpoint
  - Significant computation
Topology
[Diagram: three configurations, labeled Primary-local, Primary-remote, and Uniform, with the primary marked; link latency is 2.5 or 15 ms.]
Base case: no replication
[Diagram: the Primary-local, Primary-remote, and Uniform configurations; link latency is 2.5 or 15 ms.]
Shared Counter: Primary-local topology
Shared Counter: Primary-local topology [Kotla et al. 07]
Shared Counter: Uniform & Primary-remote topology
NFS: Apache build, Primary-local topology
NFS: Apache build, Uniform topology
NFS: Apache build, Primary-remote topology
NFS: With Failure, Primary-local topology (1% fail)
Throughput (Shared Counter): LAN topology
Conclusion
- Integrate client speculation within RSMs
- Predicated requests: performance without complexity
- Clients less sensitive to latency between replicas
- 5x speedup over non-speculative protocol
- Makes WAN deployments more practical