Space Bounds for Reliable Storage: Fundamental Limits of Coding
Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, Idit Keidar


1 Space Bounds for Reliable Storage: Fundamental Limits of Coding
Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, Idit Keidar


6 Storage Blow-Up
Demands are growing exponentially
Replication is costly
Erasure coding can help: [Goodson et al. DSN 2004], [Aguilera et al. DSN 2005], [Cachin and Tessaro DSN 2006], [Hendricks et al. SOSP 2007], [Dutta et al. DISC 2008], [Cadambe et al. NCA 2014]
But … within limits

7 Model
n servers, f can fail
Asynchronous RMWs (read-modify-write operations)

8 Distributed Storage: Space Bounds
Replication: O(D·f) bits
Coding: O(D·c) bits with c concurrent writes
Lower bound: Ω(D·min(f,c))
Best-of-both algorithm: O(D·min(f,c))

9 k-of-n Erasure Codes: encode

10 k-of-n Erasure Codes: encode, decode

11 Register Emulation 11 decode
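
To make the encode/decode steps concrete, here is a minimal sketch of the simplest k-of-n code: k data blocks plus one XOR parity block, so n = k + 1. The slides do not say which code is used; the function names `encode` and `decode` and the single-parity construction are assumptions made only for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(value: bytes, k: int) -> list[bytes]:
    """Split value into k data blocks and append one XOR parity block (n = k + 1)."""
    if len(value) % k != 0:
        raise ValueError("this sketch needs len(value) to be a multiple of k")
    size = len(value) // k
    data = [value[i * size:(i + 1) * size] for i in range(k)]
    return data + [reduce(xor_bytes, data)]


def decode(blocks: dict[int, bytes], k: int) -> bytes:
    """Rebuild the value from any k of the k + 1 blocks, keyed by block index."""
    if len(blocks) < k:
        raise ValueError("need at least k blocks")
    data = {i: b for i, b in blocks.items() if i < k}
    missing = [i for i in range(k) if i not in data]
    if missing:
        # Exactly one data block is absent, so the parity block is present:
        # XOR of all collected blocks yields the missing data block.
        data[missing[0]] = reduce(xor_bytes, blocks.values())
    return b"".join(data[i] for i in range(k))
```

Each block holds D/k bits of a D-bit value, which is where the storage saving over replication comes from: any k blocks suffice, but fewer than k blocks of the same value are useless.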

12 12 Write S1 S2 S3 S4

13 Write 13 Generate timestamp S1 S2 S3 S4

14 Write 14 Generate timestamp encode S1 S2 S3 S4

15 Write 15 Generate timestamp encode S1 S2 S3 S4

16 Write 16 Generate timestamp encode S1 S2 S3 S4

17 Write 17 S1 S2 S3 S4

18 Write 18 Wait for n-f replies S1 S2 S3 S4

19 Read 19 S1 S2 S3 S4

20 20 Read S1 S2 S3 S4

21 21 Read Wait for n-f replies S1 S2 S3 S4

22 Read decode 22 S1 S2 S3 S4

23 What about concurrent read and write? 23

24 24 Write S1 S2 S3 S4

25 25 Write S1 S2 S3 S4

26 Write: Overwrite? (S1 S2 S3 S4)

27 Write: Overwrite? Suppose yes, if the timestamp (TS) is bigger (S1 S2 S3 S4)

28 28 Write S1 S2 S3 S4

29 29 Write Read S1 S2 S3 S4

30 30 Write Read S1 S2 S3 S4

31 Write, Read: No written value can be restored! (S1 S2 S3 S4)
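
The scenario on slides 24 to 31 can be replayed with a small simulation. This is only a sketch under assumed parameters (n = 4 servers, f = 1, so a reader waits for n - f = 3 replies, and a 2-of-4 code); each server is modeled as holding a single block, identified by the timestamp of the write it belongs to. The helper names are hypothetical.

```python
from collections import Counter


def deliver(servers: dict[int, int], server_id: int, ts: int) -> None:
    """A server overwrites its block only if the incoming timestamp is bigger."""
    if ts > servers[server_id]:
        servers[server_id] = ts


def restorable(servers: dict[int, int], replied: list[int], k: int) -> list[int]:
    """Timestamps for which the reader collected at least k blocks."""
    counts = Counter(servers[s] for s in replied)
    return sorted(ts for ts, cnt in counts.items() if cnt >= k)


def run(k: int) -> list[int]:
    n = 4
    servers = {s: 1 for s in range(n)}  # the write with ts=1 reached every server
    # Three concurrent writes (ts = 2, 3, 4) have each reached one server so far.
    deliver(servers, 0, 2)
    deliver(servers, 1, 3)
    deliver(servers, 2, 4)
    # The reader hears from n - f = 3 servers; S4 (index 3) is slow or failed.
    return restorable(servers, replied=[0, 1, 2], k=k)
```

With `run(2)` the reader holds one block each of three different values and can restore none of them. With `run(1)`, which corresponds to replication (every block is a full copy), every reply is a readable value.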

32 What About Replication?

33 Write, Read: No problem!

34 Write, Read: No written value can be restored! (S1 S2 S3 S4)

35 Enough for safe register only. Write, Read (S1 S2 S3 S4)

36 Let’s Try to Make It Regular
Return a written value

37 37 S1 S2 S3 S4

38 38 S1 S2 S3 S4

39 Overwrite? (S1 S2 S3 S4)

40 Overwrite? No! (S1 S2 S3 S4)

41 41 S1 S2 S3 S4

42 42 S1 S2 S3 S4

43 What can be overwritten? (S1 S2 S3 S4)

44 Can green be overwritten? (S1 S2 S3 S4)

45 Can green be overwritten? No! It’s the only value that can be returned if S3 fails (S1 S2 S3 S4)

46 Can yellow be overwritten? (S1 S2 S3 S4)

47 47 S1 S2 S3 S4

48 48 S1 S2 S3 S4

49 49 S1 S2 S3 S4

50 50 S1 S2 S3 S4

51 51 S1 S2 S3 S4

52 52 S1 S2 S3 S4

53 Read cannot be restored! (S1 S2 S3 S4)

54 Read cannot be restored! Regularity violation (S1 S2 S3 S4)

55 What can be overwritten? Nothing! (S1 S2 S3 S4)

56 56 S1 S2 S3 S4

57 57 S1 S2 S3 S4

58 58 S1 S2 S3 S4

59 Ok, so the standard algorithm can use loads of storage. But…

60 Inherent

61 Inherent
For asynchronous algorithms that store coded data

62 Distributed Storage: Space Bounds
Replication: O(D·f) bits
Coding: O(D·c) bits with c concurrent writes
Lower bound: Ω(D·min(f,c))
Best-of-both algorithm: O(D·min(f,c))

63 But First: More About The Model
Black-box encoding
Arbitrary encoding scheme
Storage holds:
  – Coded blocks
  – Unbounded data-independent meta-data
Encode: independent of other values

64 Observation
Every data bit in the storage can be associated with a unique written value
  – Given our storage model (black-box encoding)

65 Storage Complexity
Storage measured in bits
Data only (neglect meta-data)
Clients write values from some domain V
  – D = log2 |V|
Count bits associated with value v ∈ V stored in servers and other clients

66 Need D = log2 |V| Bits To Read A Value
For most values in V
Pigeonhole Argument
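
The pigeonhole argument can be spelled out as a short counting bound. This is a sketch of the standard argument, not the paper's exact statement; the symbol b (the number of stored bits associated with a value) is introduced here only for illustration.

```latex
% Suppose only b < D bits associated with a written value are available to a reader.
% A decoder is a function from b-bit strings to V, so it has at most 2^b distinct outputs.
\[
  \#\{\, v \in V : v \text{ can be read from } b \text{ bits} \,\} \;\le\; 2^{b}
\]
% Since |V| = 2^D, the fraction of readable values is at most
\[
  \frac{2^{b}}{|V|} \;=\; \frac{2^{b}}{2^{D}} \;=\; 2^{\,b-D} \;\le\; \frac{1}{2}
  \qquad \text{for every } b \le D-1 .
\]
% Hence at least half of the values in V need all D bits to be read.
```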

67 Theorem
Every regular lock-free MWMR register emulation
  – Tolerating f failures
  – Allowing c concurrent writes
  – Storing values from domain of size 2^D
Needs to store Ω(D·min(f,c)) bits
  – At some point

68 Adversary Ad
We’ll define a particular adversary structure
  – Controls scheduling
  – Prevents progress
  – Blows up the storage
Defined with a parameter 0 < l < D

69 69

70 C+(t): clients with pending writes that store at least D-l bits at time t

71 C+(t), C-(t)
C-(t): other clients with pending writes

72 C+ and C-
C+ have at least D-l bits in the storage
C- have less than D-l bits in the storage
  – Cannot be read
  – l bits are missing

73 73 C + (t) C - (t)

74 74 C + (t) C - (t) D- l

75 75 C + (t) C - (t) D- l

76 76 C + (t) C - (t) D- l

77 77 C + (t) C - (t) D- l

78 78 C + (t) C - (t) D- l D

79 79 C + (t) C - (t) D D- l

80 80 C + (t) C - (t) l D- l D

81 F(t): servers that store at least l bits at time t

82 Storage Size: at least (D-l)·|C+(t)| bits (diagram: F(t), C+(t), C-(t); thresholds l, D-l, D)

83 Storage Size: at least (D-l)·|C+(t)| bits
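
The definitions of C+(t), C-(t) and F(t) and the resulting storage bound can be restated as a small function. This is a sketch for illustration only: the inputs (a per-client and a per-server bit count at time t) and the name `classify` are hypothetical, not taken from the paper.

```python
def classify(bits_by_client: dict[str, int],
             bits_by_server: dict[str, int],
             D: int, l: int):
    """Partition pending writers and servers at one point in time.

    bits_by_client[c]: bits of client c's pending written value currently in storage.
    bits_by_server[s]: bits currently stored on server s.
    """
    if not 0 < l < D:
        raise ValueError("the adversary is defined for 0 < l < D")
    c_plus = {c for c, b in bits_by_client.items() if b >= D - l}
    c_minus = set(bits_by_client) - c_plus  # fewer than D - l bits stored
    frozen = {s for s, b in bits_by_server.items() if b >= l}  # F(t)
    # Every client in C+ accounts for at least D - l bits of storage.
    lower_bound = (D - l) * len(c_plus)
    return c_plus, c_minus, frozen, lower_bound
```

For example, with D = 8 and l = 3, clients storing 8, 5 and 4 bits fall into C+, C+ and C- respectively, so the storage holds at least (8 - 3)·2 = 10 bits.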

84 Ad’s Scheduling
Let clients trigger RMWs on servers
Allow pending RMWs to take effect and return

85 Ad’s Scheduling: freeze F(t), no RMWs take effect

86 Ad’s Scheduling: delay C+(t), no RMWs take effect

87 Ad’s Scheduling (diagram: C+(t), C-(t), F(t))

88 Ad’s Scheduling
Rule 1: choose the longest pending RMW on a server not in F(t) by a client in C-(t)

89 Ad’s Scheduling (diagram: C-(t+1), C+(t+1), F(t+1))

90 Ad’s Scheduling: when no such RMWs

91 Ad’s Scheduling
Rule 2: choose in round robin order a client that wants to trigger an RMW

92 Ad’s Scheduling (diagram: C-(t+2), C+(t+2), F(t+2))

93 Ad’s Scheduling (diagram: C-(t+3), C+(t+3), F(t+3))

94 Ad’s Scheduling (diagram: C-(t+4), C+(t+4), F(t+4))

95 Ad’s Scheduling (diagram: C-(t+5), C+(t+5), F(t+5))

96 Ad’s Scheduling (diagram: C-(t+6), C+(t+6), F(t+6))
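
One scheduling decision of Ad, as described by Rule 1 and Rule 2, might be sketched as follows. The data model (`PendingRMW`, a list of waiting clients, a round-robin index) and the extra `"idle"` outcome are assumptions added for illustration; the slides define only the two rules.

```python
from dataclasses import dataclass


@dataclass
class PendingRMW:
    client: str
    server: str
    triggered_at: int  # time at which the RMW was triggered


def ad_step(pending, c_minus, frozen, waiting_clients, rr_index):
    """Return (action, payload, next_rr_index) for one step of the adversary."""
    # RMWs on servers in F(t) and RMWs by clients in C+(t) never take effect.
    eligible = [p for p in pending
                if p.client in c_minus and p.server not in frozen]
    if eligible:
        # Rule 1: the longest-pending eligible RMW takes effect.
        return "apply", min(eligible, key=lambda p: p.triggered_at), rr_index
    if waiting_clients:
        # Rule 2: pick, in round-robin order, a client that wants to trigger an RMW.
        client = waiting_clients[rr_index % len(waiting_clients)]
        return "trigger", client, rr_index + 1
    return "idle", None, rr_index  # not on the slides: nothing to schedule
```

After each step the sets C+, C- and F would be recomputed from the new storage state, since an applied RMW can move a client between C- and C+ or add a server to F.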

97 Implications of Ad
F only grows
  – Servers in F are “frozen”
Clients can move out of C+
  – If their blocks are overwritten
Values written by clients in C- cannot be read
  – Each such client stores less than D-l bits
We will next show that no client can complete a write operation

98 Observation
Every set of n-f servers must store D bits of some pending write for a write to return

99 Observation: Otherwise

100 Observation: Read. No value can be restored!

101 Observation: Read. No value can be restored! Regularity violation
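
The condition in the observation can be checked mechanically over all sets of n-f servers. The sketch below assumes a hypothetical bookkeeping structure `bits[s][v]` (bits of value v held by server s) and tests only the necessary condition stated on the slide, not a full protocol.

```python
from itertools import combinations


def write_may_return(bits: dict[str, dict[str, int]], n: int, f: int, D: int) -> bool:
    """True iff every set of n - f servers stores at least D bits of some single value."""
    if len(bits) != n:
        raise ValueError("bits must have one entry per server")
    values = {v for per_server in bits.values() for v in per_server}
    for quorum in combinations(bits, n - f):
        if not any(sum(bits[s].get(v, 0) for s in quorum) >= D for v in values):
            # A reader that hears only from this quorum can restore no value.
            return False
    return True
```

If some set of n-f servers fails this check, a read that hears only from those servers has no value to return, which is the regularity violation shown on slide 101.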

102 Lemma |F(t)|=1=f 102 l

103 Lemma Proof
Assume v can be read
write(v) is in C-: l bits are missing
An RMW to server S responds
  – If the RMW writes l bits to S, S is added to F and v cannot be restored without servers in F
  – Otherwise, at least one bit is still missing
Contradiction!

104 Corollary: Lemma + Observation (diagram: |F(t)| = 1 = f)

105 Corollary: Lemma + Observation
Lock-freedom violation (diagram: |F(t)| = 1 = f)

106 Are We Done?
No! The adversary is not fair
We need to guarantee:
  – every correct client gets infinitely many opportunities to trigger RMWs
  – every RMW triggered by a correct client on a correct server eventually takes effect

107 Not Fair!
RMWs on servers in F(t) never take effect from time t onward
RMWs by clients that remain in C+ from some point onward never take effect
  – Call these frozen clients

108 Theorem Proof 108

109 Storage Size Recalled: at least (D-l)·|C+(t)| bits (diagram: F(t), C+(t), C-(t); thresholds l, D-l, D)

110 Theorem Proof (Continued) 110

111 Constructing a Fair Run (Sketch)
Run r with Ad where some client is not frozen
  – No write completes in r
Let r’ be an identical run to r, but in which we kill all the frozen clients and servers in F
r’ is fair with at least one correct process
  – By lock-freedom, some write completes in r’
r and r’ are indistinguishable to all correct clients
  – Therefore, some write completes in r

112 Constructing a Fair Run (Sketch)
Let r be an infinite run with Ad
  – No write completes in r
  – Therefore, some write completes in r
Contradiction

113 Constructing a Fair Run (Sketch)
Let r be an infinite run with Ad
  – No write completes in r
  – Therefore, some write completes in r
Details in the paper

114 Theorem (Restated)
Every regular lock-free MWMR register emulation
  – Tolerating f failures
  – Allowing c concurrent writes
  – Storing values from domain of size 2^D
Needs to store Ω(D·min(f,c)) bits
  – At some point

115 Distributed Storage: Space Bounds
Replication: O(D·f) bits
Coding: O(D·c) bits with c concurrent writes
Lower bound: Ω(D·min(f,c))
Best-of-both algorithm: O(D·min(f,c))

116 Wrap Up
We illustrated erasure coded storage
  – Storage-efficient safe wait-free register emulation
  – Regular with unbounded storage
We proved a fundamental limit of coding
  – Ω(D·min(f,c)) bits for regular lock-free register
  – Replication is the best solution in worst case
Why not enjoy both worlds?

117 Adaptive Algorithm
Replication storage cost: nD
Coding with k,n = O(f) storage cost: (c+1)(D/k)n
We combine both approaches
Storage cost: min(2nD, (c+1)(D/k)n) = O(D·min(f,c))

118 Adaptive Algorithm
Replication storage cost: nD
Coding with k,n = O(f) storage cost: (c+1)(D/k)n
We combine both approaches
Storage cost: min(2nD, (c+1)(D/k)n) = O(D·min(f,c))
Details in the paper
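
The cost formulas on this slide are easy to evaluate numerically. The sketch below simply transcribes them; the function names and the sample parameters (n = 4, k = 2, D = 1024 bits) are assumptions chosen for illustration.

```python
def replication_cost(n: int, D: int) -> int:
    """Every one of the n servers keeps a full D-bit copy."""
    return n * D


def coding_cost(n: int, k: int, D: int, c: int) -> float:
    """Each server keeps up to c + 1 blocks of D/k bits each."""
    return (c + 1) * (D / k) * n


def adaptive_cost(n: int, k: int, D: int, c: int) -> float:
    """Combined bound from the slide: min(2nD, (c+1)(D/k)n)."""
    return min(2 * replication_cost(n, D), coding_cost(n, k, D, c))
```

With n = 4, k = 2 and D = 1024, one concurrent write (c = 1) gives a coding cost of 2·512·4 = 4096 bits, which is below the 2nD = 8192 cap. With c = 10 the coding cost grows to 11·512·4 = 22528 bits, so the combined bound stays at 8192.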

119 Wrap Up
We illustrated erasure coded storage
  – Storage-efficient safe wait-free register emulation
  – Regular with unbounded storage
We proved a fundamental limit of coding
  – Ω(D·min(f,c)) bits for regular lock-free register
  – Replication is the best solution in worst case
Enjoy both worlds

