Published by Katharina Walter; modified over 6 years ago
1
Prophecy: Using History for High-Throughput Fault Tolerance
Siddhartha Sen. Joint work with Wyatt Lloyd and Mike Freedman, Princeton University
2
Non-crash failures happen
Model them as Byzantine (malicious) faults. Non-crash failures are hard to model directly; treating them as Byzantine, and assuming replicas fail independently, is one way to do so.
3
Mask Byzantine faults
Given this failure model, we then try to mask faults so that clients never see them. (Diagram: Clients → Service)
4
Mask Byzantine faults
Goal: throughput. (Diagram: Clients → Replicated service)
5
Mask Byzantine faults
Goals: throughput and linearizability (strong consistency). (Diagram: Clients → Replicated service)
6
Byzantine fault tolerance (BFT)
Problems: low throughput; modifies clients; requires long-lived sessions
7
Prophecy: high throughput + good consistency
No free lunch: requires read-mostly workloads; slightly weakened consistency
8
Byzantine fault tolerance (BFT)
Problems: low throughput; modifies clients; requires long-lived sessions
Our systems: D-Prophecy, Prophecy
9
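Traditional BFT reads: every replica executes the read on its copy of the application, and the client checks whether the replies agree. (Diagram: Clients → Replica Group)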
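A minimal Python sketch of the client-side "Agree?" check for a traditional BFT read, in which every replica executes the read and returns its full reply. The helper name `agreed_result` is hypothetical, and the threshold of 2f + 1 matching replies is an assumption borrowed from PBFT's read-only optimization; the slide itself does not state a quorum size.

```python
from collections import Counter
from typing import Optional, Sequence


def agreed_result(replies: Sequence[bytes], f: int) -> Optional[bytes]:
    """Accept a read result only if at least 2f + 1 replicas returned it.

    Each entry in `replies` comes from a different replica, and every replica
    executed the read itself, so one read costs one execution per replica.
    Returns None when no single reply has enough matching votes.
    """
    quorum = 2 * f + 1  # assumed threshold (PBFT read-only optimization)
    if not replies:
        return None
    result, votes = Counter(replies).most_common(1)[0]
    return result if votes >= quorum else None


# Four replicas tolerate f = 1 fault; one faulty replica returns garbage.
replies = [b"<html>v2</html>", b"<html>v2</html>", b"<html>v2</html>", b"garbage"]
print(agreed_result(replies, f=1))  # b'<html>v2</html>'
```

The cost this illustrates is the one the talk targets: every replica does the full work of the read, so adding replicas adds no read throughput.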
10
A cache solution
(Diagram: a cache at the Clients, the application on the Replica Group, and an "Agree?" check on replies)
11
A cache solution
Problems: huge cache (your cache becomes as large as your disk); invalidation
12
A compact cache
Requests | Responses
req1 | resp1
req2 | resp2
req3 | resp3
… | …
13
A compact cache
Requests | Responses
sketch(req1) | sketch(resp1)
sketch(req2) | sketch(resp2)
sketch(req3) | sketch(resp3)
… | …
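The compact cache can be sketched as a table from sketch(request) to sketch(response). This assumes a sketch is a fixed-size cryptographic digest (SHA-256 here, an assumed choice), so each entry stays small no matter how large the webpage is; `CompactCache` and its methods are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


def sketch(data: bytes) -> bytes:
    """A sketch is a short digest of the data (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


class CompactCache:
    """Maps sketch(request) -> sketch(response): 64 bytes per entry
    (two 32-byte digests), regardless of request or response size."""

    def __init__(self) -> None:
        self._table: Dict[bytes, bytes] = {}

    def record(self, request: bytes, response: bytes) -> None:
        self._table[sketch(request)] = sketch(response)

    def lookup(self, request: bytes) -> Optional[bytes]:
        return self._table.get(sketch(request))


cache = CompactCache()
page = b"<html>" + b"x" * 1_000_000 + b"</html>"  # a ~1 MB response
cache.record(b"GET /index.html", page)
stored = cache.lookup(b"GET /index.html")
print(stored == sketch(page), len(stored))  # True 32
```

Storing only digests removes the "huge cache" problem, but a digest cannot be returned to the client, so the full response still has to come from some replica.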
14
A sketcher
(Diagram: Clients → sketcher → application on the Replica Group)
That solves the big-cache problem. To solve the invalidation problem, we just make sure to get the full response from one replica.
15
A sketcher
(Diagram: the sketcher holds a sketch; a replica in the Replica Group returns the full webpage)
16
Fast, load-balanced reads
Executing a read: a replica returns the full webpage, and the sketcher checks whether the webpage's sketch agrees with the cached sketch. Suppose they don't match… how could this happen?
17
Executing a read
18
Executing a read
The replica group may run any replicated service, such as a key-value store or a replicated state machine.
19
Executing a read
Maintain a fresh cache: if the sketches don't agree, perform a replicated read and update the cached sketch.
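A minimal sketch of this read path, assuming the sketcher keeps the compact cache of sketches: a fast read is executed by one randomly chosen replica and accepted only if its sketch matches the cached sketch; otherwise the sketcher falls back to a slow, replicated read and refreshes the cache. `Sketcher`, `slow_read`, and modeling replicas as plain callables are illustrative assumptions, not the authors' implementation.

```python
import hashlib
import random
from typing import Callable, Dict, List, Optional

# Hypothetical model: a replica executes a read and returns the full response.
Replica = Callable[[bytes], bytes]


def sketch(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Sketcher:
    def __init__(self, replicas: List[Replica], slow_read: Callable[[bytes], bytes],
                 rng: Optional[random.Random] = None) -> None:
        self.replicas = replicas
        self.slow_read = slow_read  # hypothetical: agreed read executed by every replica
        self.cache: Dict[bytes, bytes] = {}  # sketch(request) -> sketch(response)
        self.rng = rng or random.Random()
        self.fast_reads = 0
        self.slow_reads = 0

    def read(self, request: bytes) -> bytes:
        key = sketch(request)
        cached = self.cache.get(key)
        if cached is not None:
            # Fast path: one randomly chosen replica executes the read (load balancing).
            response = self.rng.choice(self.replicas)(request)
            if sketch(response) == cached:
                self.fast_reads += 1
                return response
        # Slow path: no history yet, or the fast response disagrees with history.
        self.slow_reads += 1
        agreed = self.slow_read(request)
        self.cache[key] = sketch(agreed)  # keep the cache fresh
        return agreed


state = {b"/index": b"v1"}
replicas: List[Replica] = [lambda q: state[q] for _ in range(4)]
s = Sketcher(replicas, slow_read=lambda q: state[q], rng=random.Random(0))
s.read(b"/index")         # no history yet: slow read
s.read(b"/index")         # sketch matches history: fast read
state[b"/index"] = b"v2"  # a write changes the page
s.read(b"/index")         # sketch mismatch: slow read, cache refreshed
print(s.fast_reads, s.slow_reads)  # 1 2
```

This also shows why fast reads may be stale: if the chosen replica is faulty and keeps returning the old page after a write, its sketch still matches the cached sketch and the fast read succeeds.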
20
Did we achieve linearizability?
NO!
21
Executing a read
22
Executing a read
23
Executing a read
Fast reads may be stale.
24
Load balancing
$\Pr(k \text{ stale}) = g^k$
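A quick check of the load-balancing formula, assuming g is the fraction of replicas that can return a stale response (for example, faulty replicas) and that each fast read picks a replica independently and uniformly at random. Both helper names are hypothetical; the simulation only confirms the closed form.

```python
import random


def prob_k_stale(g: float, k: int) -> float:
    """Closed form: probability that k consecutive fast reads are all stale."""
    return g ** k


def simulate_k_stale(n_replicas: int, n_stale: int, k: int,
                     trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate; replicas 0..n_stale-1 are the ones returning stale data."""
    rng = random.Random(seed)
    hits = sum(
        all(rng.randrange(n_replicas) < n_stale for _ in range(k))
        for _ in range(trials)
    )
    return hits / trials


# Four replicas with one stale replica: g = 1/4.
print(prob_k_stale(0.25, 3))  # 0.015625
print(simulate_k_stale(4, 1, 3, trials=200_000))
```

Because each read draws a fresh replica, a single bad replica cannot keep one client stale for long: the chance of k stale reads in a row shrinks geometrically.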
25
D-Prophecy vs. BFT
Traditional BFT: each replica executes the read, giving linearizability.
D-Prophecy: one replica executes the read, giving "delay-once" linearizability.
We call this system D-Prophecy.
26
Byzantine fault tolerance (BFT)
Problems: low throughput; modifies clients; requires long-lived sessions
Our systems: D-Prophecy, Prophecy
We have solved the throughput problem for read requests at the cost of slightly weakened consistency. But two problems remain, and both are bad for Internet services.
27
Key-exchange overhead
(Figure values: 11%, 3%.) For sessions containing only 8 requests, throughput is 1/10 of its maximum.
28
Internet services
If we apply our current design to Internet services, we face two problems: high per-session overhead and modified clients. We can solve both by introducing a proxy between the clients and the replica group.
29
A proxy solution
Consolidate the per-client sketchers into one sketcher at a proxy. (Diagram: Clients → Proxy/Sketcher → Replica Group)
30
A proxy solution
The sketcher must be fail-stop, so it is trusted. (Diagram: Clients → trusted Sketcher → Replica Group)
31
A proxy solution
The sketcher must be fail-stop. This is reasonable because we already trust middleboxes, and the sketcher is small and simple: our implementation required fewer than 3000 LOC.
32
Prophecy: executing a read
Fast, load-balanced reads. (Diagram: Clients → trusted Sketcher, holding a Req | Resp table of sketches such as s(q) for request q → Replica Group)
33
Prophecy
Fast reads may be stale. (Diagram: Clients → trusted Sketcher → Replica Group)
34
Delay-once linearizability
Makes no assumptions about the organization of system state or the consistency model of the replica group
35
Delay-once linearizability
Read-after-write property (example sequence: W, R, W, W, R, R, W, R)
Makes no assumptions about the organization of system state or the consistency model of the replica group
37
Example application: upload embarrassing photos
1. Remove colleagues from ACL
2. Upload photos
3. (Refresh)
Weak consistency may reorder these updates; delay-once preserves their order. (Under eventual consistency, updates may appear in different orders at different replicas.)
39
Implementation
Modified PBFT: PBFT is stable and complete, and competitive with Zyzzyva et al.
C++, Tamer async I/O
Sketcher: 2000 LOC; PBFT library: 1140 LOC; PBFT client: 1000 LOC
40
Evaluation
Proxied systems: Prophecy vs. proxied-PBFT
Non-proxied systems: D-Prophecy vs. PBFT
41
Evaluation: Prophecy vs. proxied-PBFT (proxied systems)
We will study: performance on "null" workloads; performance with a real replicated service; where the system bottlenecks, and how to scale it.
42
Basic setup: Clients (100) → Sketcher (concurrent) → Replica Group (PBFT)
43
Fraction of failed fast reads
(Figure: fraction of failed fast reads; annotation: Alexa top sites < 15%.) A transition ratio of 1 means every read is preceded by a write; a transition ratio of 0 means reads are preceded only by other reads.
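The notes define the transition ratio only by its endpoints. One plausible reading, sketched here as an assumption, is the fraction of reads whose immediately preceding operation is a write; applied to the read-after-write example sequence W, R, W, W, R, R, W, R, three of the four reads follow a write. `transition_ratio` is a hypothetical helper.

```python
from typing import Optional, Sequence


def transition_ratio(ops: Sequence[str]) -> float:
    """Fraction of reads ("R") immediately preceded by a write ("W").

    A read with no preceding operation counts as not preceded by a write.
    Returns 0.0 when the sequence contains no reads.
    """
    reads = 0
    after_write = 0
    prev: Optional[str] = None
    for op in ops:
        if op == "R":
            reads += 1
            if prev == "W":
                after_write += 1
        prev = op
    return after_write / reads if reads else 0.0


print(transition_ratio("W R W W R R W R".split()))  # 0.75
```

Under this reading, each read that follows a write is a point where the cached sketch is likely out of date, so the fast read fails and falls back to a replicated read.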
44
Small benefit on null reads
45
Apache webserver setup
(Diagram: Clients → Sketcher → Replica Group)
46
Large benefit on real workload
(Figure: concurrent reads of a 1-byte webpage; annotated speedups: 3.7x, 2.0x)
47
Benefit grows with work
(Figure annotation: 94 μs (Apache).) Null workloads are misleading!
48
Benefit grows with work
49
Single sketcher bottlenecks
Concurrent reads of a 1-byte webpage
50
Scaling out
The first tier does load balancing, and the second tier does consistent hashing. This is no more complex than how you would scale out and partition a system today.
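A minimal consistent-hashing ring for the second tier, assuming requests (or their sketches) are partitioned across several sketchers so that each sketcher holds the cached sketches for its share of requests. The class name, sketcher names, and the 100 virtual nodes per sketcher are illustrative assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


def _point(key: bytes) -> int:
    """Position on the ring: first 8 bytes of SHA-256, as an integer."""
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        # Each sketcher gets `vnodes` points on the ring to smooth the load.
        self._ring: List[Tuple[int, str]] = sorted(
            (_point(f"{node}#{i}".encode()), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, request: bytes) -> str:
        """The first ring point clockwise from the request's position owns it."""
        idx = bisect.bisect_right(self._points, _point(request)) % len(self._ring)
        return self._ring[idx][1]


ring3 = HashRing(["sketcher-a", "sketcher-b", "sketcher-c"])
ring4 = HashRing(["sketcher-a", "sketcher-b", "sketcher-c", "sketcher-d"])
keys = [f"GET /page/{i}".encode() for i in range(1000)]
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
print(len(moved), "of", len(keys), "requests moved")
```

Adding a sketcher moves only the requests that the new sketcher takes over; every other request keeps its sketcher, so most cached sketches stay valid.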
51
Scales linearly with replicas
Concurrent reads of a 1-byte webpage
52
Summary
Prophecy is good for Internet services; D-Prophecy is good for traditional services. Both provide fast, load-balanced reads.
Prophecy scales linearly, while PBFT stays flat.
Limitations: requires read-mostly workloads (our measurement study corroborates that these are common); provides delay-once linearizability (useful for many applications).
53
Thank You
54
Additional slides
55
Transitions: Prophecy is good for read-mostly workloads
Are transitions rare in practice?
56
Measurement study: Alexa top sites
Accessed each site's main page every 20 seconds for 24 hours
57
Mostly static content
58
Mostly static content (figure annotation: 15%)
59
Dynamic content: Rabin fingerprinting on transitions
43% of transitions differ by a single contiguous change, where a "change" is an insertion, deletion, or substitution (i.e., edit distance 1). We sampled 4000 of them; over half were due to load-balancing directives or random IDs in links and function parameters.
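The "single contiguous change" test can be sketched at the chunk level, assuming each version of a page has already been split into chunks (for example, by content-defined chunking with Rabin fingerprints, which this sketch does not implement). Two versions then differ by a single change exactly when their chunk sequences are at edit distance 1. `single_contiguous_change` is a hypothetical helper.

```python
from typing import Sequence


def single_contiguous_change(old: Sequence[bytes], new: Sequence[bytes]) -> bool:
    """True if `new` differs from `old` by exactly one chunk insertion,
    deletion, or substitution (chunk-level edit distance 1)."""
    if old == new or abs(len(old) - len(new)) > 1:
        return False
    shorter = min(len(old), len(new))
    # Strip the longest common prefix.
    i = 0
    while i < shorter and old[i] == new[i]:
        i += 1
    # Strip the longest common suffix without overlapping the prefix.
    j = 0
    while j < shorter - i and old[len(old) - 1 - j] == new[len(new) - 1 - j]:
        j += 1
    # At most one chunk may remain on each side.
    return len(old) - i - j <= 1 and len(new) - i - j <= 1


v1 = [b"<head>", b"id=123", b"<body>"]
v2 = [b"<head>", b"id=456", b"<body>"]  # e.g., a random ID in a link changed
print(single_contiguous_change(v1, v2))  # True
```

Stripping the common prefix and suffix keeps the check linear in the number of chunks, instead of computing a full edit-distance table.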