1
Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011
2
Multiple node failures
Large-scale storage system
– Google data center (example from Kannan's talk)
– 800,000 servers, failure rate = 4% per year
– Repair takes 2 days
– Mean number of failed servers in 2 days ≈ 175
The lazy-repair policy in TotalRecall
– A repair process is triggered only after the number of failed nodes has reached a certain threshold.
Jul, 2011 2 kshum
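The mean of about 175 failed servers follows directly from the figures on the slide. A minimal sketch of the arithmetic, assuming failures are spread uniformly over a 365-day year (the slide does not state this):

```python
# Expected number of server failures during one repair window.
# Assumption: failures are spread uniformly over a 365-day year.
servers = 800_000          # from the slide
annual_fail_rate = 0.04    # 4% per year, from the slide
repair_days = 2            # length of the repair window, from the slide

mean_failed = servers * annual_fail_rate * repair_days / 365
print(round(mean_failed))
```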
3
Jointly repair multiple failures (Hu et al., JSAC, Feb 2010)
[Figure labels: Storage nodes, Newcomers, Data exchange]
Can we further reduce the repair-bandwidth?
4
Distributed storage (erasure coding) (Wu, Dimakis, ISIT09)
Storage nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
Data Collector: A1, A2, B1, B2
5
Naive Repair
Storage nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
The newcomer replacing (A1, A2) downloads B1, B2 and A1+B1, 2A2+B2.
4 packets required.
6
Repair with "code alignment"
Storage nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
The newcomer replacing (A1, A2) downloads B1+B2, A1+2A2+B1+B2 and 2A1+A2+B1+B2.
3 packets required.
Solve: P1 = A1+2A2, P2 = 2A1+A2
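The three-packet repair can be checked numerically. The sketch below assumes arithmetic over the prime field GF(5); the slide does not name the field, and any prime larger than 3 keeps the 2x2 system P1, P2 invertible. The function name `repair_node1` and the data values are illustrative only.

```python
# Repair of the node storing (A1, A2) from 3 downloaded packets.
# Assumption: arithmetic is over GF(5); the slide does not name the field.
P = 5

def repair_node1(node2, node3, node4):
    """Each helper sends the sum of its two packets; the newcomer solves for A1, A2."""
    s2 = (node2[0] + node2[1]) % P      # B1 + B2
    s3 = (node3[0] + node3[1]) % P      # A1 + 2*A2 + B1 + B2
    s4 = (node4[0] + node4[1]) % P      # 2*A1 + A2 + B1 + B2
    p1 = (s3 - s2) % P                  # P1 = A1 + 2*A2
    p2 = (s4 - s2) % P                  # P2 = 2*A1 + A2
    inv3 = pow(3, -1, P)                # 1/3 in GF(P), needs Python 3.8+
    a1 = (2 * p2 - p1) * inv3 % P       # because 2*P2 - P1 = 3*A1
    a2 = (2 * p1 - p2) * inv3 % P       # because 2*P1 - P2 = 3*A2
    return a1, a2

a1, a2, b1, b2 = 3, 1, 4, 2             # arbitrary example data
node2 = (b1, b2)
node3 = ((a1 + b1) % P, (2 * a2 + b2) % P)
node4 = ((2 * a1 + b1) % P, (a2 + b2) % P)
recovered = repair_node1(node2, node3, node4)
print(recovered)
```

Subtracting the packet B1+B2 removes the unwanted B terms from both other packets at once, which is why one download from the (B1, B2) node suffices.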
7
Multiple failures, separate repair
Storage nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the newcomers (B1, B2) and (2A1+B1, A2+B2) are repaired separately; links are labeled "2 packets".]
8 packets in total, 4 packets per newcomer
8
Multiple failures, cooperative repair (I)
Storage nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the newcomers (B1, B2) and (2A1+B1, A2+B2) are repaired jointly; transmitted packets are labeled A1, A2; A1+B1; 2A2+B2; and B1, B2.]
6 packets in total, 3 packets per newcomer
9
Multiple failures, cooperative repair (II)
Storage nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the two newcomers are repaired jointly; transmitted packets are labeled A1, A1+B1, A2, 2A2+B2, B2 and 2A1+B1.]
6 packets in total, 3 packets per newcomer
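A cooperative repair with 6 packets can be simulated as follows. This is a sketch under stated assumptions: arithmetic over GF(5), failed nodes (B1, B2) and (2A1+B1, A2+B2), and a schedule chosen to be consistent with the packet labels that survive in the transcript. The exact schedule drawn on the slide cannot be confirmed from the text, and the function name is hypothetical.

```python
# Cooperative repair of two failed nodes with 6 packets in total.
# Assumptions: GF(5) arithmetic; nodes (B1, B2) and (2A1+B1, A2+B2) failed;
# the schedule is one reading of the slide, not a confirmed transcription.
P = 5

def cooperative_repair(node1, node3):
    a1, a2 = node1                      # survivor storing (A1, A2)
    s1, s2 = node3                      # survivor storing (A1+B1, 2A2+B2)
    sent = 0
    # Phase 1: each newcomer downloads one packet from each survivor.
    x_a1, x_sum = a1, s1                # newcomer X receives A1 and A1+B1
    sent += 2
    y_a2, y_sum = a2, s2                # newcomer Y receives A2 and 2A2+B2
    sent += 2
    b1 = (x_sum - x_a1) % P             # X decodes B1
    b2 = (y_sum - 2 * y_a2) % P         # Y decodes B2
    # Phase 2: the newcomers exchange one packet in each direction.
    y_to_x = b2                         # Y sends B2
    x_to_y = (2 * x_a1 + b1) % P        # X sends 2A1+B1
    sent += 2
    new_x = (b1, y_to_x)                # rebuilt (B1, B2)
    new_y = (x_to_y, (y_a2 + b2) % P)   # rebuilt (2A1+B1, A2+B2)
    return new_x, new_y, sent

a1, a2, b1, b2 = 3, 1, 4, 2             # arbitrary example data
node1 = (a1, a2)
node3 = ((a1 + b1) % P, (2 * a2 + b2) % P)
new_x, new_y, sent = cooperative_repair(node1, node3)
print(new_x, new_y, sent)
```

Each newcomer receives one packet from each of the two survivors and one from the other newcomer, 3 packets per newcomer, compared with 4 under separate repair.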
10
Outline of the talk
Is it optimal in terms of repair-bandwidth?
What is the tradeoff between storage and repair-bandwidth for cooperative repair?
Can we achieve the Pareto-optimal operating points on the tradeoff curve by linear network coding?
– Exact repair
– Functional repair
11
Information flow graph
[Figure: information flow graph with source S, In/Out vertices for storage nodes 1 to 5, In/Mid/Out vertices for newcomers 6 and 7, and a data collector. Edge labels were lost in extraction.]
12
Is this regenerating code optimal?
Storage nodes: (A1, A2), (B1, B2), (A1+B1, 2A2+B2), (2A1+B1, A2+B2)
[Figure: the cooperative repair (II) example.]
6 packets in total, 3 packets per newcomer
13
First cut
[Figure: information flow graph with In/Out vertices for nodes 1 to 4, In/Mid/Out vertices for newcomers 6 and 7, and a data collector, with the first cut marked.]
$B \le 4\beta_1$
14
Second cut
[Figure: information flow graph with In/Mid/Out vertices for nodes 1 to 4 and a data collector, with the second cut marked.]
$B \le 2 + \beta_1 + \beta_2$
15
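A linear programming problem
Minimize $2\beta_1 + \beta_2$ (repair bandwidth)
Subject to
$4 \le 4\beta_1$
$4 \le 2 + \beta_1 + \beta_2$
$\beta_1, \beta_2 \ge 0$
[Figure: plot accompanying the linear program; axis labels were lost in extraction.]
At least 3 packets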
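The bound of 3 packets can be reproduced by solving the linear program directly. The sketch below is not the speaker's method: it enumerates the vertices of the feasible region with exact fractions, which is valid here because the objective has positive coefficients and is bounded below on the region. Reading beta1 as the amount downloaded from each surviving node and beta2 as the amount exchanged between newcomers is an assumption based on the cut slides; `solve_2d_lp` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations

# Constraints in the form a1*x + a2*y >= b, with x = beta1 and y = beta2.
# They restate the LP on the slide for file size B = 4.
constraints = [
    (4, 0, 4),   # 4*beta1 >= 4           (first cut)
    (1, 1, 2),   # beta1 + beta2 >= 4 - 2 (second cut)
    (1, 0, 0),   # beta1 >= 0
    (0, 1, 0),   # beta2 >= 0
]
objective = (2, 1)   # repair bandwidth 2*beta1 + beta2

def solve_2d_lp(constraints, objective):
    """Return (minimum value, x, y) found among the feasible vertices."""
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue                      # parallel boundary lines
        x = Fraction(b * c2 - a2 * d, det)   # Cramer's rule
        y = Fraction(a1 * d - b * c1, det)
        if all(p * x + q * y >= r for p, q, r in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best

value, beta1, beta2 = solve_2d_lp(constraints, objective)
print(value, beta1, beta2)
```

The minimum is attained at beta1 = beta2 = 1, which is the traffic pattern of the 6-packet cooperative repair example: one packet from each of two survivors plus one packet from the other newcomer.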
16
Non-homogeneous download traffic
[Figure: information flow graph as in the first cut, with the download edges to the newcomers labeled a, b, c, d.]
$B \le a + b + c + d$
17
Non-homogeneous traffic
[Figure: information flow graph as in the second cut, with edges labeled e, f, g, h, i, j.]
$B \le 2 + f + j$
18
Non-homogeneous traffic
[Figure: information flow graph as in the second cut, with edges labeled e, f, g, h, i, j.]
$B \le 2 + f + j$
$B \le 2 + h + i$
19
Non-homogeneous traffic
[Figure: information flow graph as in the second cut, with edges labeled e, f, g, h, i, j.]
$B \le 2 + f + j$
$B \le 2 + h + i$
$B \le 2 + e + j$
20
Non-homogeneous traffic
[Figure: information flow graph as in the second cut, with edges labeled e, f, g, h, i, j.]
$B \le 2 + f + j$
$B \le 2 + h + i$
$B \le 2 + e + j$
$B \le 2 + g + i$
21
The same LP problem
Minimize [expression lost in extraction]
Subject to [constraints lost in extraction]
At least 3 packets
22
TRADEOFF BETWEEN STORAGE AND REPAIR-BANDWIDTH
23
Storage vs Repair-bandwidth (S., ICC 2011; Kermarrec, Le Scouarnec and Straub, Netcod 2011)
[Figure: tradeoff curves for one-by-one repair and for repairing 3 newcomers jointly.]
File size = 420, d = 8, k = 4
24
Fair comparison?
Repair degree = 8
One-by-one repair: number of connections per newcomer = 8
Cooperative repair: number of connections per newcomer = 8+2
[Figure label: Surviving nodes]
25
MBCR and MSCR
[Figure: tradeoff curves for one-by-one repair and cooperative repair, with the two extreme points of the cooperative curve marked.]
Minimum bandwidth cooperative repair (MBCR)
Minimum storage cooperative repair (MSCR)
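The two extreme points have closed-form expressions in the cooperative regenerating code literature cited in this deck. The formulas in the sketch below are taken from that literature, not from the slide text, so treat them as an assumption here. The function names are hypothetical; alpha denotes storage per node and gamma repair bandwidth per newcomer.

```python
from fractions import Fraction

def mscr_point(B, k, d, r):
    """Minimum-storage point (alpha, gamma).
    Assumed formula: alpha = B/k, gamma = B(d+r-1) / (k(d+r-k))."""
    alpha = Fraction(B, k)
    gamma = Fraction(B * (d + r - 1), k * (d + r - k))
    return alpha, gamma

def mbcr_point(B, k, d, r):
    """Minimum-bandwidth point (alpha, gamma).
    Assumed formula: alpha = gamma = B(2d+r-1) / (k(2d+r-k))."""
    gamma = Fraction(B * (2 * d + r - 1), k * (2 * d + r - k))
    return gamma, gamma

# Parameters of the examples that appear elsewhere in this deck.
print(mscr_point(6, 3, 3, 2))     # MSCR example: B = 6, k = d = 3, r = 2
print(mbcr_point(8, 2, 2, 2))     # MBCR example: B = 8, k = d = 2, r = 2
print(mscr_point(420, 4, 8, 3))   # tradeoff plot: file size 420, d = 8, k = 4
print(mbcr_point(420, 4, 8, 3))
```

With these formulas the MSCR example gives storage 2 and repair bandwidth 4 per node, and the MBCR example gives 5 packets per node, which agrees with the figures stated on the construction slides.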
26
How much can we improve?
[Figure: tradeoff curves for one-by-one repair and for repairing 10 newcomers jointly.]
File size = 2275, d = 30, k = 5
When d is large, joint repair does not have a significant advantage over one-by-one repair.
27
How much can we improve?
[Figure: tradeoff curves for one-by-one repair and for repairing 10 newcomers jointly.]
File size = 616, d = 8, k = 4
Repair-bandwidth reduction is more prominent when d is not so large.
28
AN EXPLICIT CONSTRUCTION FOR MINIMUM-BANDWIDTH COOPERATIVE REPAIR
29
An explicit construction for MBCR (S., Hu, ISIT 2011)
B = 8 information packets
n = 4 nodes
Each node stores 5 packets.
Repair r = 2 failures simultaneously
No. of connections for each DC = k = 2
No. of helpers for each failed node = d = 2
Require d = k, r = n – d
[Figure labels: Minimum repair-bandwidth, Storage per node]
30
Min-Bandwidth point
[Figure: tradeoff curves for one-by-one repair and for repairing 2 new nodes cooperatively.]
31
Data Distribution
8 data packets: A, B, C, D, E, F, G, H
Node 1: A, B, C, D, F+G
Node 2: C, D, E, F, H+A
Node 3: E, F, G, H, B+C
Node 4: G, H, A, B, D+E
Each node stores 5 packets: 4 systematic, 1 parity-check (+ denotes XOR)
32
Data collection
Storage nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
[Figure: the data collector downloads A, B, C, D from one node and E, F, G, H from another.]
Data collector: A, B, C, D, E, F, G, H
33
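Data collection
Storage nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
[Figure: the data collector downloads A, B, C, F+G from one node and D, E, F, H+A from another.]
The received packets A, B, C, D, E, F, F+G, H+A form a triangular, full-rank system in A, B, C, D, E, F, G, H.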
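The data distribution and the any-two-nodes recovery property can be checked with a short simulation. This is a sketch under assumptions: packets are modeled as small integers, "+" is bitwise XOR, and the collector reads all 5 packets from each of the two nodes (the slides show it downloading only 4 from each). The names `LAYOUT`, `encode` and `collect` are illustrative.

```python
from itertools import combinations

# Assumption: packets are small integers and "+" on the slides is XOR.
NAMES = "ABCDEFGH"
LAYOUT = [  # (systematic packets, parity pair) for each of the 4 nodes
    ("ABCD", "FG"),
    ("CDEF", "HA"),
    ("EFGH", "BC"),
    ("GHAB", "DE"),
]

def encode(data):
    """Return the 5 packets stored on each node as {label: value} dicts."""
    nodes = []
    for systematic, (x, y) in LAYOUT:
        content = {s: data[s] for s in systematic}
        content[x + "+" + y] = data[x] ^ data[y]
        nodes.append(content)
    return nodes

def collect(node_a, node_b):
    """Recover all 8 packets from the contents of two nodes."""
    known = {}
    parities = []
    for node in (node_a, node_b):
        for label, value in node.items():
            if "+" in label:
                parities.append((label[0], label[2], value))
            else:
                known[label] = value
    progress = True
    while progress:   # peel parities that have exactly one unknown packet
        progress = False
        for x, y, value in parities:
            if x in known and y not in known:
                known[y] = known[x] ^ value
                progress = True
            elif y in known and x not in known:
                known[x] = known[y] ^ value
                progress = True
    return known

data = {name: 17 * (i + 1) for i, name in enumerate(NAMES)}  # arbitrary values
nodes = encode(data)
all_pairs_ok = all(collect(a, b) == data for a, b in combinations(nodes, 2))
print(all_pairs_ok)
```

Any two nodes hold at least 6 distinct systematic packets, and each parity packet then has at most one unknown, which is the triangular structure noted on the slide.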
34
Exact Repair
Storage nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
How to repair?
[Figure: repair of two failed nodes; transmitted packets are labeled A, B, C, D, E, F, G, H, F+G and B+C.]
Total repair-bandwidth = 10
35
Exact Repair
Storage nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
How to repair?
[Figure: repair of two failed nodes; transmitted packets are labeled C, D, G, H, D+E, H+A, B+C, F+G, E and F.]
Total repair-bandwidth = 10
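Two repair schedules with total bandwidth 10 can be written out explicitly. The sketch below is one reading of the packet labels that survive in the transcript: it assumes nodes 1 and 3 fail in the first case and nodes 2 and 3 in the second, with "+" as XOR on integer-valued packets. The function names and data values are hypothetical.

```python
# Assumption: "+" is XOR on integer packets; the schedules are one reading of
# the slide labels, not a confirmed transcription of the figures.
data = {name: 17 * (i + 1) for i, name in enumerate("ABCDEFGH")}
A, B, C, D, E, F, G, H = (data[n] for n in "ABCDEFGH")
node1 = [A, B, C, D, F ^ G]
node2 = [C, D, E, F, H ^ A]
node3 = [E, F, G, H, B ^ C]
node4 = [G, H, A, B, D ^ E]

def repair_nodes_1_and_3(n2, n4):
    """Nodes 1 and 3 fail; the helpers are nodes 2 and 4."""
    c, d, e, f, _ = n2
    g, h, a, b, _ = n4
    # Phase 1: 2 packets from each helper to each newcomer (8 in total).
    x = {"A": a, "B": b, "C": c, "D": d}     # newcomer replacing node 1
    y = {"E": e, "F": f, "G": g, "H": h}     # newcomer replacing node 3
    # Phase 2: one packet in each direction (2 in total).
    fg = y["F"] ^ y["G"]                     # sent from y to x
    bc = x["B"] ^ x["C"]                     # sent from x to y
    new1 = [x["A"], x["B"], x["C"], x["D"], fg]
    new3 = [y["E"], y["F"], y["G"], y["H"], bc]
    return new1, new3, 8 + 2

def repair_nodes_2_and_3(n1, n4):
    """Nodes 2 and 3 fail; the helpers are nodes 1 and 4."""
    _, b, c, d, f_g = n1
    g, h, a, _, d_e = n4
    # Phase 1: newcomer x gets C, D from node 1 and D+E, H+A from node 4;
    # newcomer y gets B+C, F+G from node 1 and G, H from node 4 (8 packets).
    x_c, x_d, x_de, x_ha = c, d, d_e, h ^ a
    y_bc, y_fg, y_g, y_h = b ^ c, f_g, g, h
    x_e = x_d ^ x_de                         # x decodes E
    y_f = y_fg ^ y_g                         # y decodes F
    # Phase 2: x sends E to y and y sends F to x (2 packets).
    new2 = [x_c, x_d, x_e, y_f, x_ha]
    new3 = [x_e, y_f, y_g, y_h, y_bc]
    return new2, new3, 8 + 2

print(repair_nodes_1_and_3(node2, node4)[2])
print(repair_nodes_2_and_3(node1, node4)[2])
```

In both cases each newcomer receives 2 packets from each of the d = 2 helpers and 1 packet from the other newcomer, 5 packets per newcomer and 10 in total.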
36
Min-Bandwidth point
[Figure: tradeoff curves for one-by-one repair and for repairing 2 new nodes cooperatively.]
37
AN EXPLICIT CONSTRUCTION FOR MINIMUM-STORAGE COOPERATIVE REPAIR
38
An explicit construction for MSCR (S., ICC 2011)
B = 6 information packets
n nodes
Each node stores 2 packets.
Repair r = 2 failures simultaneously
No. of connections for each DC = k = 3
No. of helpers for each failed node = d = 3
Require d = k
[Figure labels: Minimum repair-bandwidth, Storage per node]
39
The min-storage point
[Figure: tradeoff curves for non-cooperative and cooperative repair.]
k = 3, d = 3, r = 2, B = 6
Storage cost per node = 2
Repair bandwidth per node = 4
40
Data retrieval
MDS code with dimension k = 3
[Figure: the source data is encoded into a codeword and spread over the storage nodes; the data collector decodes.]
41
Repair: phase 1
[Figure: two storage nodes are lost; the newcomers download from the surviving storage nodes and decode.]
42
Repair: phase 2
[Figure: the newcomers re-encode and exchange packets to replace the lost storage nodes.]
Repair bandwidth per node = 8/2 = 4
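The two repair phases can be sketched with a concrete MDS code. Everything specific below is an assumption, since the transcript does not preserve the construction details: a Reed-Solomon code over GF(11) stands in for the MDS code, n = 5 nodes are used, and node j stores one symbol from each of two codewords. Lagrange interpolation combines the "decode" and "re-encode" steps.

```python
# Assumptions: Reed-Solomon code over GF(11) as the MDS code, n = 5 nodes,
# node j stores (c1[j], c2[j]) for two codewords c1, c2 of dimension k = 3.
P = 11
POINTS = [1, 2, 3, 4, 5]   # evaluation point assigned to each node

def evaluate(msg, x):
    """Evaluate msg[0] + msg[1]*x + msg[2]*x^2 modulo P."""
    return sum(c * pow(x, i, P) for i, c in enumerate(msg)) % P

def interpolate_at(samples, x):
    """Value at x of the lowest-degree polynomial through samples (mod P)."""
    total = 0
    for xi, yi in samples:
        num, den = 1, 1
        for xj, _ in samples:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def store(msg1, msg2):
    """Each node stores 2 packets: one symbol from each codeword."""
    return [(evaluate(msg1, x), evaluate(msg2, x)) for x in POINTS]

def cooperative_repair(nodes, failed, helpers):
    f1, f2 = failed
    # Phase 1: newcomer f1 downloads the first symbol of each helper and
    # newcomer f2 the second symbol (3 packets each).
    s1 = [(POINTS[h], nodes[h][0]) for h in helpers]
    s2 = [(POINTS[h], nodes[h][1]) for h in helpers]
    # Each newcomer recomputes its codeword at both failed positions.
    own1, for2 = interpolate_at(s1, POINTS[f1]), interpolate_at(s1, POINTS[f2])
    for1, own2 = interpolate_at(s2, POINTS[f1]), interpolate_at(s2, POINTS[f2])
    # Phase 2: the newcomers exchange one packet in each direction.
    total_packets = 2 * len(helpers) + 2
    return (own1, for1), (for2, own2), total_packets

nodes = store([1, 2, 3], [4, 5, 6])          # B = 6 information packets
new1, new2, total = cooperative_repair(nodes, failed=(0, 1), helpers=(2, 3, 4))
print(new1 == nodes[0], new2 == nodes[1], total)
```

The total of 8 packets for 2 newcomers gives the repair bandwidth of 4 per node stated on the slide.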
43
The construction is optimal
[Figure: tradeoff curves for non-cooperative and cooperative repair.]
k = 3, d = 3, r = 2, B = 6
Storage cost per node = 2
Repair bandwidth per node = 4
44
EXISTENCE OF COOPERATIVE REGENERATING CODES UNDER FUNCTIONAL REPAIR
45
Existence of optimal linear regenerating codes in general (S., Hu, Netcod 2011)
Sustainable storage system
– Will it work after arbitrarily many repairs?
Technical difficulty: the information flow graph is unbounded.
Can we work over a fixed finite field, for an unlimited number of regenerations?
– Yes if we can construct an exact regenerating code.
– The answer is also "yes" for cooperative functional repair in general.
46
Trellis structure
[Figure: trellis with stages 0, 1, 2, ...]
Message vector (row vector): $m$
$T_0$ is the "transfer matrix" in stage 0; output $m T_0$
$T_1$ is the "transfer matrix" in stage 1; output $m T_0 T_1$
$T_2$ is the "transfer matrix" in stage 2; output $m T_0 T_1 T_2$
47
Flow in information flow graph
[Figure: information flow graph with source S, In/Mid/Out vertices for nodes 1 to 4 over two repair stages, and a data collector (DC); edge capacities and flow values were lost in extraction.]
The cut-set bound says that the cut capacity is at least 8. Can we construct a flow with value 8?
48
Cross-sectional flow pattern
[Figure: the same information flow graph with cross-sectional flow values marked at each stage; the values were lost in extraction.]
49
A recursive construction of flow
[Figure: one segment of the information flow graph between stage s and stage s+1, with cross-section flows g1, g2, g3, g4 entering and h1, h2, h3, h4 leaving.]
1. Identify a set of cross-section flow patterns, say H.
2. For any cross-section flow pattern (h1, h2, h3, h4) in H at stage s+1, we can find a flow in this segment of the graph such that (g1, g2, g3, g4) is also in H.
3. Each pattern corresponds to a submatrix of the transfer matrix.
4. By the Schwartz-Zippel lemma, we can find local encoding vectors such that the determinants of all these submatrices are non-zero, if the finite field is sufficiently large.
50
Summary
Multiple node failures in medium-scale to large-scale storage systems
Formulation as a linear program
Functional repair: a linear regenerating code over a fixed finite field that matches the cut-set bound on repair-bandwidth exists.
Exact repair: two families of explicit code constructions
– Minimum-bandwidth point: d = k, r = n – d
– Minimum-storage point: d = k, r arbitrary
51
References
Y. Wu and A. G. Dimakis, Reducing repair traffic for erasure coding-based storage via interference alignment, ISIT, Jul, 2009.
Y. Hu, Y. Xu, X. Wang, C. Zhan and P. Li, Cooperative recovery of distributed storage systems from multiple losses with network coding, J. Sel. Area Comm., vol. 28, no. 2, pp. 268-275, Feb, 2010.
K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC, Jun, 2011.
A.-M. Kermarrec, N. Le Scouarnec and G. Straub, Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, Netcod, Jul, 2011.
K. W. Shum and Y. Hu, Existence of Minimum-Repair-Bandwidth Cooperative Regenerating Codes, Netcod, Jul, 2011.
K. W. Shum and Y. Hu, Exact Minimum-Repair-Bandwidth Cooperative Regenerating Codes for Distributed Storage Systems, ISIT, Aug, 2011.