Published by Randall Jenkins. Modified over 8 years ago.
1
Space Bounds for Reliable Storage: Fundamental Limits of Coding
Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, Idit Keidar
6
Storage Blow-Up
Demands are growing exponentially
Replication is costly
Erasure coding can help
– [Goodson et al. DSN 2004]
– [Aguilera et al. DSN 2005]
– [Cachin and Tessaro DSN 2006]
– [Hendricks et al. SOSP 2007]
– [Dutta et al. DISC 2008]
– [Cadambe et al. NCA 2014]
But … within limits
7
Model
n servers
f can fail
Asynchronous
RMWs
8
Distributed Storage: Space Bounds
Replication: O(D f) bits
Coding: O(D c) bits with c concurrent writes
Lower bound: Ω(D min(f,c))
Best-of-both algorithm: O(D min(f,c))
9
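k-of-n Erasure Codes
[Figure: encode maps a value to n coded blocks]
10
k-of-n Erasure Codes
[Figure: encode maps a value to n coded blocks; decode restores the value from any k of them]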
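A minimal sketch of a k-of-n erasure code, to make the encode/decode picture concrete. This is not the scheme used in the talk (the slides allow an arbitrary black-box encoding); it is a textbook polynomial code over a prime field, and the names `P`, `lagrange_eval`, `encode` and `decode` are hypothetical.

```python
P = 257  # assumed prime modulus; every symbol is an integer in [0, P)

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Encode k data symbols into n blocks (x, y), with n < P.
    The code is systematic: blocks 1..k carry the data symbols themselves."""
    k = len(data)
    points = list(zip(range(1, k + 1), data))
    return [(x, lagrange_eval(points, x)) for x in range(1, n + 1)]

def decode(blocks, k):
    """Restore the k data symbols from any k distinct blocks."""
    if len(blocks) < k:
        raise ValueError("need at least k blocks to decode")
    points = blocks[:k]
    return [lagrange_eval(points, x) for x in range(1, k + 1)]

# Usage: a 3-of-5 code; any 3 of the 5 blocks are enough.
data = [72, 105, 33]
blocks = encode(data, 5)
restored = decode([blocks[4], blocks[1], blocks[3]], 3)
```

Each block holds one symbol, so a block is roughly 1/k of the value's size; this is the D/k per-block cost that appears later in the storage formulas.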
11
Register Emulation
[Figure label: decode]
12-18
Write
– Generate timestamp
– encode
– Wait for n-f replies
[Figure build: the writer and servers S1, S2, S3, S4]
19-22
Read
– Wait for n-f replies
– decode
[Figure build: the reader and servers S1, S2, S3, S4]
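The write and read steps can be sketched as a small sequential simulation. This is an illustration under stated assumptions, not the algorithm from the paper: n = 4, f = 1, a toy 2-of-4 code (so k = n - 2f, an assumption chosen so that any two quorums of n-f servers share at least k servers), and hypothetical names `encode`, `decode`, `write`, `read`.

```python
P = 257            # assumed prime modulus; symbols are integers in [0, P)
N, F, K = 4, 1, 2  # servers, tolerated failures, blocks needed to decode

def encode(value):
    """Toy 2-of-4 code: the value (a, b) defines the line a + b*x,
    and server i stores the line evaluated at x = i."""
    a, b = value
    return {i: (a + i * b) % P for i in range(1, N + 1)}

def decode(blocks):
    """Recover (a, b) from any two blocks given as {server_id: symbol}."""
    (i, yi), (j, yj) = list(blocks.items())[:2]
    b = (yi - yj) * pow((i - j) % P, -1, P) % P
    a = (yi - i * b) % P
    return (a, b)

def write(servers, ts, value, responders):
    """Store one timestamped block on each responding server.
    The write may return once n-f servers have replied."""
    assert len(responders) >= N - F, "a write waits for n-f replies"
    blocks = encode(value)
    for s in responders:
        servers[s] = (ts, blocks[s])

def read(servers, responders):
    """Collect n-f replies and decode the newest value with >= K blocks."""
    assert len(responders) >= N - F, "a read waits for n-f replies"
    by_ts = {}
    for s in responders:
        ts, block = servers[s]
        by_ts.setdefault(ts, {})[s] = block
    decodable = [ts for ts, blocks in by_ts.items() if len(blocks) >= K]
    if not decodable:
        return None  # no value can be restored from these replies
    return decode(by_ts[max(decodable)])

# Usage: initial value (0, 0) everywhere, then one write that S4 misses.
initial = encode((0, 0))
servers = {s: (0, initial[s]) for s in range(1, N + 1)}
write(servers, ts=1, value=(7, 42), responders=[1, 2, 3])
result = read(servers, responders=[2, 3, 4])
```

With no concurrency the read quorum overlaps the write quorum in at least two servers, so the written value is restored even though S4 still holds an old block.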
23
What about concurrent read and write?
24-31
Write
– Overwrite? Suppose yes, if TS is bigger
– A Read runs while the Writes are still in progress
– No written value can be restored!
[Figure build: blocks of concurrent writes on servers S1, S2, S3, S4]
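The failure can be reproduced with a few lines of bookkeeping. The sketch below is a hypothetical scenario, assuming n = 4, f = 1 and a code that needs k = 2 blocks: three concurrent writes have each reached a single server, and each server keeps only the block with the biggest timestamp. It tracks which write owns each stored block rather than real coded data.

```python
from itertools import combinations

N, F, K = 4, 1, 2  # servers, tolerated failures, blocks needed to decode (assumed)

def store_if_newer(servers, server_id, ts, writer):
    """Overwrite rule from the slide: replace the stored block if TS is bigger."""
    if ts > servers[server_id][0]:
        servers[server_id] = (ts, writer)

def restorable(servers, responders):
    """Timestamps for which the responders hold at least K blocks."""
    counts = {}
    for s in responders:
        ts, _ = servers[s]
        counts[ts] = counts.get(ts, 0) + 1
    return sorted(ts for ts, c in counts.items() if c >= K)

# Every server starts with a block of the initial value v0 (timestamp 0).
servers = {s: (0, "v0") for s in range(1, N + 1)}
before = restorable(servers, [1, 2, 3])

# Three pending writes, each of which has updated one server so far.
store_if_newer(servers, 1, ts=1, writer="w1")
store_if_newer(servers, 2, ts=2, writer="w2")
store_if_newer(servers, 3, ts=3, writer="w3")

# A read hears from some n-f servers; check every possible reply set.
after = [restorable(servers, q) for q in combinations(servers, N - F)]
```

Before the writes, timestamp 0 is restorable; afterwards every reply set of n-f servers holds at most one block per timestamp, so nothing can be decoded.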
32
What About Replication?
33
Write, Read: No problem!
34
Write, Read: No written value can be restored!
[Figure: coded blocks on servers S1, S2, S3, S4]
35
Enough for safe register only
[Figure: Write and Read on servers S1, S2, S3, S4]
36
Let’s Try to Make It Regular
– Return a written value
37-58
[Figure build: coded blocks of concurrent writes, shown in different colors, on servers S1, S2, S3, S4]
– Overwrite? No!
– What can be overwritten?
– Can green be overwritten? No! It’s the only value that can be returned if S3 fails
– Can yellow be overwritten?
– Read: the value cannot be restored! Regularity violation
– What can be overwritten? Nothing!
59
OK, so the standard algorithm can use loads of storage. But…
60-61
Inherent
– For asynchronous algorithms that store coded data
62
Distributed Storage: Space Bounds
Replication: O(D f) bits
Coding: O(D c) bits with c concurrent writes
Lower bound: Ω(D min(f,c))
Best-of-both algorithm: O(D min(f,c))
63
But First: More About The Model
Black-box encoding
Arbitrary encoding scheme
Storage holds:
– Coded blocks
– Unbounded data-independent meta-data
[Figure: encode is applied to each value independently of other values]
64
Observation
Every data bit in the storage can be associated with a unique written value
– Given our storage model (black-box encoding)
65
Storage Complexity
Storage measured in bits
Data only (neglect meta-data)
Clients write values from some domain V
– D = log₂ |V|
Count bits associated with value v ∈ V stored in servers and other clients
66
Need D = log₂ |V| Bits To Read A Value
For most values in V
Pigeonhole Argument
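One way to spell out the counting behind the pigeonhole argument, as a hedged illustration rather than the paper's exact statement: count how many distinct values can be told apart when fewer than D bits are available. The slack parameter l used below is the one the adversary is defined with later in the talk.

```latex
% Number of distinct bit strings of length at most m:
\[
  \sum_{i=0}^{m} 2^{i} \;=\; 2^{m+1} - 1 .
\]
% With fewer than D bits (m = D - 1) there are too few strings for all of V:
\[
  2^{D} - 1 \;<\; 2^{D} \;=\; |V| ,
\]
% so at least one value cannot be identified.
% With fewer than D - l bits (m = D - l - 1), the identifiable fraction of V is
\[
  \frac{2^{D-l} - 1}{2^{D}} \;<\; 2^{-l} ,
\]
% which is small already for moderate l: most values need close to D bits.
```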
67
Theorem
Every regular lock-free MWMR register emulation
– Tolerating f failures
– Allowing c concurrent writes
– Storing values from domain of size 2^D
Needs to store Ω(D min(f,c)) bits
– At some point
68
Adversary Ad
We’ll define a particular adversary structure
– Controls scheduling
– Prevents progress
– Blows up the storage
Defined with a parameter 0 < l < D
70
C+(t): clients with pending writes that store at least D - l bits at time t
71
C-(t): other clients with pending writes
72
C+ and C-
C+ have at least D - l bits in the storage
C- have less than D - l bits in the storage
– Cannot be read
– l bits are missing
73-80
[Figure build: clients with pending writes split into C+(t) and C-(t) by the number of bits they have in the storage, on a scale marked l, D - l and D]
81
F(t): servers that store at least l bits at time t
[Figure: C+(t), C-(t) and F(t) on a scale marked l, D - l and D]
82-83
Storage Size
At least (D - l) |C+(t)| bits
[Figure: F(t), C+(t) and C-(t) on a scale marked l, D - l and D]
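The three sets and the storage bound can be computed from a snapshot of the storage. The sketch below is illustrative only: the snapshot format (`storage` mapping each server to the number of bits it holds per pending write) and the function name `classify` are assumptions, and it counts bits on servers only, whereas the slides also count bits held by other clients. It also assumes F(t) is decided by the total number of data bits on a server.

```python
def classify(storage, pending, D, l):
    """Return (C_plus, C_minus, F) for a storage snapshot.

    storage: {server: {write_id: bits stored for that write}}
    pending: iterable of write ids that have not returned yet
    """
    bits_per_write = {w: 0 for w in pending}
    for per_server in storage.values():
        for w, bits in per_server.items():
            if w in bits_per_write:
                bits_per_write[w] += bits
    # C+: pending writes with at least D - l bits in the storage.
    c_plus = {w for w, b in bits_per_write.items() if b >= D - l}
    # C-: all other pending writes; they are missing too many bits to be read.
    c_minus = set(bits_per_write) - c_plus
    # F: servers that store at least l bits.
    frozen = {s for s, per_server in storage.items()
              if sum(per_server.values()) >= l}
    return c_plus, c_minus, frozen

# Usage with D = 8 and l = 4 (hypothetical numbers).
storage = {
    "S1": {"w1": 4},
    "S2": {"w1": 2, "w2": 1},
    "S3": {"w2": 2},
    "S4": {},
}
c_plus, c_minus, frozen = classify(storage, ["w1", "w2", "w3"], D=8, l=4)
total_bits = sum(sum(per_server.values()) for per_server in storage.values())
lower_bound = (8 - 4) * len(c_plus)
```

Here w1 has 6 bits stored and lands in C+, w2 and w3 land in C-, and only S1 reaches the threshold for F; the total of 9 stored bits respects the bound (D - l) |C+(t)| = 4.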
84-96
Ad’s Scheduling
Let clients trigger RMWs on servers
Allow pending RMWs to take effect and return
– freeze F: no RMWs take effect
– delay C+: no RMWs take effect
– Rule 1: choose the longest-pending RMW on a server not in F(t) by a client in C-(t)
– When no such RMWs exist, Rule 2: choose in round robin order a client that wants to trigger an RMW
[Figure build: C-(t), C+(t) and F(t) evolving from time t to t+6]
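The two rules can be read as a single scheduling step. The following sketch is an interpretation under assumptions, not the paper's formal definition: pending RMWs are records with an issue time, every client is assumed to always want to trigger an RMW, and the names `RMW` and `ad_step` are hypothetical.

```python
from collections import namedtuple

# A pending read-modify-write: when it was triggered, by whom, on which server.
RMW = namedtuple("RMW", "issued_at client server")

def ad_step(pending, c_minus, frozen, clients, rr_index):
    """Pick the adversary's next action.

    Rule 1: let the longest-pending RMW take effect, considering only RMWs
            by clients in C- on servers outside F (F is frozen, C+ is delayed).
    Rule 2: if no such RMW exists, let the next client in round robin order
            trigger a new RMW.
    Returns (action, subject, next_rr_index).
    """
    eligible = [r for r in pending
                if r.client in c_minus and r.server not in frozen]
    if eligible:
        oldest = min(eligible, key=lambda r: r.issued_at)
        return ("take_effect", oldest, rr_index)
    client = clients[rr_index % len(clients)]
    return ("trigger", client, (rr_index + 1) % len(clients))

# Usage (hypothetical state): c1 is in C+, s3 is in F.
pending = [RMW(5, "c1", "s1"), RMW(2, "c2", "s2"),
           RMW(1, "c3", "s3"), RMW(3, "c2", "s4")]
clients = ["c1", "c2", "c3"]
step1 = ad_step(pending, {"c2", "c3"}, {"s3"}, clients, rr_index=2)
step2 = ad_step(pending, set(), {"s3"}, clients, rr_index=2)
```

In the first call the oldest RMW overall targets the frozen server s3, so Rule 1 skips it and picks c2's RMW on s2; in the second call no client is in C-, so Rule 2 applies.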
97
Implications of Ad
F only grows
– Servers in F are “frozen”
Clients can move out of C+
– If their blocks are overwritten
Values written by clients in C- cannot be read
– Each such client stores less than D - l bits
We will next show that no client can complete a write operation
98-101
Observation
Every set of n-f servers must store D bits of some pending write for a write to return
– Otherwise, a Read may find that no value can be restored!
– Regularity violation
102
Lemma
[Figure: example with |F(t)| = 1 = f]
103
Lemma Proof
[Timeline: write(v) is pending; an RMW to server S responds]
Assume v can be read
– If the RMW writes l bits to S, S is added to F, and v cannot be restored without servers in F
– write(v) is in C-: l bits are missing
– Otherwise, at least one bit is still missing
Contradiction!
104-105
Corollary: Lemma + Observation
– Lock-freedom violation
[Figure: example with |F(t)| = 1 = f]
106
Are We Done?
No! The adversary is not fair
We need to guarantee:
– every correct client gets infinitely many opportunities to trigger RMWs
– every RMW triggered by a correct client on a correct server eventually takes effect
107
Not Fair!
RMWs on servers in F(t) never take effect from time t onward
RMWs by clients that remain in C+ from some point onward never take effect
– Call these frozen clients
108
Theorem Proof
109
Storage Size Recalled
At least (D - l) |C+(t)| bits
[Figure: F(t), C+(t) and C-(t) on a scale marked l, D - l and D]
110
Theorem Proof (Continued)
111
Constructing a Fair Run (Sketch)
Run r with Ad where some client is not frozen
– No write completes in r
Let r’ be an identical run to r, but in which we kill all the frozen clients and servers in F
r’ is fair with at least one correct process
– By lock-freedom, some write completes in r’
r and r’ are indistinguishable to all correct clients
– Therefore, some write completes in r
112-113
Constructing a Fair Run (Sketch)
Let r be an infinite run with Ad
– No write completes in r
– Therefore, some write completes in r
Contradiction
Details in the paper
114
Theorem (Restated)
Every regular lock-free MWMR register emulation
– Tolerating f failures
– Allowing c concurrent writes
– Storing values from domain of size 2^D
Needs to store Ω(D min(f,c)) bits
– At some point
115
Distributed Storage: Space Bounds
Replication: O(D f) bits
Coding: O(D c) bits with c concurrent writes
Lower bound: Ω(D min(f,c))
Best-of-both algorithm: O(D min(f,c))
116
Wrap Up
We illustrated erasure coded storage
– Storage-efficient safe wait-free register emulation
– Regular with unbounded storage
We proved a fundamental limit of coding
– Ω(D min(f,c)) bits for regular lock-free register
– Replication is the best solution in worst case
Why not enjoy both worlds?
117
Adaptive Algorithm
Replication storage cost: nD
Coding with k, n = O(f) storage cost: (c+1)(D/k)n
We combine both approaches
Storage cost: min(2nD, (c+1)(D/k)n) = O(D min(f,c))
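The three cost formulas on this slide can be compared numerically. The sketch below only evaluates the formulas as stated; the parameter values (f = 2, k = 2, n = 6, D = 1024 bits) and the function names are hypothetical choices for illustration.

```python
def replication_cost(n, D):
    """Replication: every one of the n servers stores a full D-bit value."""
    return n * D

def coding_cost(n, k, c, D):
    """Coding: each of the n servers keeps a D/k-bit block for up to
    c concurrent writes plus one more value."""
    return (c + 1) * D * n / k

def adaptive_cost(n, k, c, D):
    """Combined approach from the slide: the smaller of twice the
    replication cost and the coding cost."""
    return min(2 * replication_cost(n, D), coding_cost(n, k, c, D))

# Usage: k and n both proportional to f, with f = 2 (assumed values).
n, k, D = 6, 2, 1024
low_concurrency = adaptive_cost(n, k, c=1, D=D)    # coding term is smaller
high_concurrency = adaptive_cost(n, k, c=10, D=D)  # capped by 2nD
```

Because n/k is a constant when both are proportional to f, the coding term grows as O(D c) while the replication term stays at O(D f), which is how the minimum becomes O(D min(f,c)).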
118
Adaptive Algorithm
Details in the paper
119
Wrap Up
We illustrated erasure coded storage
– Storage-efficient safe wait-free register emulation
– Regular with unbounded storage
We proved a fundamental limit of coding
– Ω(D min(f,c)) bits for regular lock-free register
– Replication is the best solution in worst case
Enjoy both worlds