Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011.


1 Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011

2 Multiple node failures
Large-scale storage system
– Google data center (example from Kannan’s talk)
– 800,000 servers, failure rate = 4% per year
– Repair takes 2 days
– Mean number of servers that fail within 2 days ≈ 175
The lazy-repair policy in TotalRecall
– A repair process is triggered only after the number of failed nodes has reached a certain threshold.
Jul, 2011 2 kshum
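The figure of 175 failed servers can be checked with a few lines of arithmetic. This is a minimal sketch, assuming failures are spread uniformly over a 365-day year; the function and constant names are illustrative and not from the talk.

```python
# Expected number of servers that fail while one repair is in progress.
# Assumption: failures are spread uniformly over a 365-day year.
SERVERS = 800_000
ANNUAL_FAILURE_RATE = 0.04
REPAIR_DAYS = 2
DAYS_PER_YEAR = 365


def mean_failures(servers, annual_rate, window_days, days_per_year=DAYS_PER_YEAR):
    """Mean number of failures inside a window of `window_days` days."""
    return servers * annual_rate * window_days / days_per_year


expected = mean_failures(SERVERS, ANNUAL_FAILURE_RATE, REPAIR_DAYS)
print(round(expected))
```

With these inputs the mean is a little over 175, so at this scale several nodes are normally waiting for repair at the same time.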

3 Jointly repair multiple failures (Hu et al., JSAC, Feb 2010)
[Figure labels: Storage nodes, Newcomers, Data exchange]
Can we further reduce the repair-bandwidth?

4 Distributed storage (erasure coding) (Wu, Dimakis, ISIT09)
Four storage nodes, each holding two packets:
– A1, A2
– B1, B2
– A1+B1, 2A2+B2
– 2A1+B1, A2+B2
Data collector: recovers A1, A2, B1, B2
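The sketch below checks that any two of the four nodes hold enough information to recover A1, A2, B1, B2. It assumes the packets are symbols of GF(5); the slide does not name the field, and the helper `rank_mod_p` is illustrative rather than part of the talk.

```python
from itertools import combinations

P = 5  # assumed field size; the slide does not state the field

# Each stored packet is a coefficient vector over (A1, A2, B1, B2).
NODES = [
    [(1, 0, 0, 0), (0, 1, 0, 0)],  # A1, A2
    [(0, 0, 1, 0), (0, 0, 0, 1)],  # B1, B2
    [(1, 0, 1, 0), (0, 2, 0, 1)],  # A1+B1, 2A2+B2
    [(2, 0, 1, 0), (0, 1, 0, 1)],  # 2A1+B1, A2+B2
]


def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p), by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                factor = rows[i][col]
                rows[i] = [(x - factor * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


# A data collector can decode from a pair of nodes iff their 4 packets have rank 4.
decodable = {
    (i, j): rank_mod_p(NODES[i] + NODES[j]) == 4
    for i, j in combinations(range(len(NODES)), 2)
}
print(all(decodable.values()))
```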

5 Naive Repair
The node storing A1, A2 fails.
The newcomer downloads B1, B2 and A1+B1, 2A2+B2 from two surviving nodes.
4 packets required.

6 Repair with “code alignment”
The node storing A1, A2 fails.
The newcomer downloads one packet from each surviving node:
– B1 + B2
– A1 + 2A2 + B1 + B2
– 2A1 + A2 + B1 + B2
3 packets required.
Solve: P1 = A1 + 2A2, P2 = 2A1 + A2
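A minimal sketch of the 3-packet repair follows, again assuming arithmetic in GF(5), which the slide does not specify. The final 2x2 system has determinant -3, so it cannot be solved over GF(3); that is one reason a different field is assumed here. The function names are illustrative.

```python
P = 5  # assumed field size; the determinant -3 must be non-zero in the field


def encode(a1, a2, b1, b2):
    """Contents of the four storage nodes from the erasure-coding slide."""
    return [
        (a1 % P, a2 % P),
        (b1 % P, b2 % P),
        ((a1 + b1) % P, (2 * a2 + b2) % P),
        ((2 * a1 + b1) % P, (a2 + b2) % P),
    ]


def repair_first_node(node2, node3, node4):
    """Rebuild (A1, A2) from one combined packet per surviving node."""
    s2 = (node2[0] + node2[1]) % P  # B1 + B2
    s3 = (node3[0] + node3[1]) % P  # A1 + 2A2 + B1 + B2
    s4 = (node4[0] + node4[1]) % P  # 2A1 + A2 + B1 + B2
    # The interference B1 + B2 is aligned, so one subtraction removes it.
    p1 = (s3 - s2) % P  # P1 = A1 + 2A2
    p2 = (s4 - s2) % P  # P2 = 2A1 + A2
    # Solve the 2x2 system: 3*A1 = 2*P2 - P1 and 3*A2 = 2*P1 - P2.
    inv3 = pow(3, -1, P)  # modular inverse, Python 3.8+
    a1 = (2 * p2 - p1) * inv3 % P
    a2 = (2 * p1 - p2) * inv3 % P
    return a1, a2


nodes = encode(3, 4, 1, 2)
print(repair_first_node(nodes[1], nodes[2], nodes[3]))
```

Each helper sends a single packet, so the newcomer receives 3 packets instead of the 4 needed by naive repair.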

7 Multiple failures, separate repair
The nodes storing (B1, B2) and (2A1+B1, A2+B2) fail and are repaired separately.
Each newcomer downloads 2 packets from each of the two surviving nodes.
4 packets per newcomer, 8 packets in total.

8 Multiple failures, cooperative repair (I)
The nodes storing (B1, B2) and (2A1+B1, A2+B2) fail and are repaired jointly.
[Figure: transfers labeled A1, A2; 2A2+B2; A1+B1; B1, B2]
3 packets per newcomer, 6 packets in total.

9 Multiple failures, cooperative repair (II)
The nodes storing (B1, B2) and (2A1+B1, A2+B2) fail and are repaired jointly.
[Figure: transfers labeled A1, A1+B1 (to one newcomer); A2, 2A2+B2 (to the other newcomer); B2 and 2A1+B1 (exchanged between the newcomers)]
3 packets per newcomer, 6 packets in total.
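One reading of the figure labels on this slide is sketched below: each newcomer downloads one packet from each surviving node, decodes locally, and then the two newcomers exchange one packet each. The assignment of packets to newcomers is an assumption reconstructed from the labels, and GF(5) is an assumed field; the names `cooperative_repair`, `x_*` and `y_*` are illustrative.

```python
P = 5  # assumed field size


def cooperative_repair(a1, a2, b1, b2):
    """Jointly rebuild (B1, B2) and (2A1+B1, A2+B2) with 6 packets."""
    # Surviving nodes.
    node1 = (a1 % P, a2 % P)
    node3 = ((a1 + b1) % P, (2 * a2 + b2) % P)

    # Phase 1: each newcomer downloads one packet from each survivor.
    x_recv = (node1[0], node3[0])  # newcomer X gets A1 and A1+B1
    y_recv = (node1[1], node3[1])  # newcomer Y gets A2 and 2A2+B2

    # Local decoding at each newcomer.
    x_b1 = (x_recv[1] - x_recv[0]) % P      # B1 = (A1+B1) - A1
    x_for_y = (x_recv[0] + x_recv[1]) % P   # 2A1+B1 = A1 + (A1+B1)
    y_b2 = (y_recv[1] - 2 * y_recv[0]) % P  # B2 = (2A2+B2) - 2*A2
    y_own = (y_recv[1] - y_recv[0]) % P     # A2+B2 = (2A2+B2) - A2

    # Phase 2: X sends 2A1+B1 to Y, and Y sends B2 to X (one packet each way).
    rebuilt_x = (x_b1, y_b2)      # (B1, B2)
    rebuilt_y = (x_for_y, y_own)  # (2A1+B1, A2+B2)
    packets = len(x_recv) + len(y_recv) + 2
    return rebuilt_x, rebuilt_y, packets


print(cooperative_repair(3, 4, 1, 2))
```

Under this reading each newcomer receives 3 packets, matching the count of 6 packets in total on the slide.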

10 Outline of the talk
Is it optimal in terms of repair-bandwidth?
What is the tradeoff between storage and repair-bandwidth for cooperative repair?
Can we achieve the Pareto-optimal operating points on the tradeoff curve by linear network coding?
– Exact repair
– Functional repair

11 Information flow graph
[Figure: information flow graph with source S, vertices In, Mid and Out for the storage nodes and newcomers, and a data collector]

12 Is this regenerating code optimal?
[Figure: the cooperative repair scheme of slide 9]
3 packets per newcomer, 6 packets in total.

13 First cut
[Figure: information flow graph with the data collector connected to Out 6 and Out 7; the cut crosses the four download edges into In 6 and In 7, each of capacity $\beta_1$]
$B \le 4\beta_1$

14 Second cut
[Figure: information flow graph with a second cut between the data collector and the source]
$B \le 2 + \beta_1 + \beta_2$

15 A linear programming problem
Minimize $2\beta_1 + \beta_2$ (repair bandwidth)
Subject to
$4 \le 4\beta_1$
$4 \le 2 + \beta_1 + \beta_2$
$\beta_1, \beta_2 \ge 0$
[Plot: feasible region in the $(\beta_1, \beta_2)$ plane]
At least 3 packets
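The bound of 3 packets can be reproduced by enumerating the vertices of the feasible region. This is a sketch rather than the method used in the talk: the variables `b1` and `b2` stand for the per-link download amounts from a surviving node and from the other newcomer, and the minimum is attained at a vertex because both variables are non-negative and the cost coefficients are positive.

```python
from fractions import Fraction
from itertools import combinations

# Constraints in the form a*b1 + c*b2 >= rhs.
CONSTRAINTS = [
    (4, 0, 4),  # first cut:  4 <= 4*b1
    (1, 1, 2),  # second cut: 4 <= 2 + b1 + b2
    (1, 0, 0),  # b1 >= 0
    (0, 1, 0),  # b2 >= 0
]


def cost(b1, b2):
    """Repair bandwidth per newcomer: 2*b1 + b2."""
    return 2 * b1 + b2


def feasible(b1, b2):
    return all(a * b1 + c * b2 >= rhs for a, c, rhs in CONSTRAINTS)


def vertices():
    """Feasible intersections of pairs of constraint boundaries."""
    for (a1, c1, r1), (a2, c2, r2) in combinations(CONSTRAINTS, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundaries never intersect
        b1 = Fraction(r1 * c2 - r2 * c1, det)  # Cramer's rule
        b2 = Fraction(a1 * r2 - a2 * r1, det)
        if feasible(b1, b2):
            yield b1, b2


best = min(vertices(), key=lambda v: cost(*v))
print(best, cost(*best))
```

The only feasible vertices are (1, 1) and (2, 0), with costs 3 and 4, so each newcomer must download at least 3 packets.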

16 Non-homogeneous download traffic
[Figure: first cut, with the four download edges into In 6 and In 7 carrying $\beta_a$, $\beta_b$, $\beta_c$, $\beta_d$]
$B \le \beta_a + \beta_b + \beta_c + \beta_d$

17 Non-homogeneous traffic
[Figure: second cut, with edges carrying $\beta_e$, $\beta_f$, $\beta_g$, $\beta_h$, $\beta_i$, $\beta_j$]
$B \le 2 + \beta_f + \beta_j$

18 Non-homogeneous traffic
$B \le 2 + \beta_f + \beta_j$
$B \le 2 + \beta_h + \beta_i$

19 Non-homogeneous traffic
$B \le 2 + \beta_f + \beta_j$
$B \le 2 + \beta_h + \beta_i$
$B \le 2 + \beta_e + \beta_j$

20 Non-homogeneous traffic
$B \le 2 + \beta_f + \beta_j$
$B \le 2 + \beta_h + \beta_i$
$B \le 2 + \beta_e + \beta_j$
$B \le 2 + \beta_g + \beta_i$

21 The same LP problem
[Formulas: Minimize, Subject to]
At least 3 packets

22 TRADEOFF BETWEEN STORAGE AND REPAIR-BANDWIDTH

23 Storage vs Repair-bandwidth (S., ICC 2011; Kermarrec, Le Scouarnec and Straub, Netcod 2011)
[Plot: tradeoff curves for one-by-one repair and for repairing 3 newcomers jointly. File size = 420, d = 8, k = 4]

24 Fair comparison?
One-by-one repair: repair degree = 8
– Number of connections per newcomer = 8
Cooperative repair
– Number of connections per newcomer = 8+2 (8 surviving nodes plus 2 other newcomers)

25 MBCR and MSCR
[Plot: tradeoff curves for one-by-one repair and cooperative repair]
Minimum bandwidth cooperative repair (MBCR)
Minimum storage cooperative repair (MSCR)

26 How much can we improve?
[Plot: tradeoff curves for one-by-one repair and for repairing 10 newcomers jointly. File size = 2275, d = 30, k = 5]
When d is large, joint repair has no significant advantage over one-by-one repair.

27 How much can we improve?
[Plot: tradeoff curves for one-by-one repair and for repairing 10 newcomers jointly. File size = 616, d = 8, k = 4]
Repair-bandwidth reduction is more prominent when d is not so large.

28 AN EXPLICIT CONSTRUCTION FOR MINIMUM-BANDWIDTH COOPERATIVE REPAIR

29 An explicit construction for MBCR (S., Hu, ISIT 2011)
[Plot labels: Minimum repair-bandwidth, Storage per node]
– B = 8 information packets
– n = 4 nodes
Each node stores 5 packets.
Repair r = 2 failures simultaneously.
No. of connections for each DC = k = 2
No. of helpers for each failed node = d = 2
Require d = k, r = n – d

30 Min-Bandwidth point
[Plot: tradeoff curves for one-by-one repair and for repairing 2 new nodes cooperatively]

31 Data Distribution
8 data packets: A, B, C, D, E, F, G, H
Each node stores 5 packets: 4 systematic, 1 parity-check (“+” denotes XOR)
– A, B, C, D, F+G
– C, D, E, F, H+A
– E, F, G, H, B+C
– G, H, A, B, D+E

32 Data collection
Nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
The data collector downloads A, B, C, D from one node and E, F, G, H from another, and obtains A, B, C, D, E, F, G, H.

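33 Data collection
Nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
The data collector downloads A, B, C, F+G from the first node and D, E, F, H+A from the second.
The coefficient matrix of A, B, C, D, E, F, F+G, H+A in terms of A, ..., H is triangular and full-rank.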
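The data-collection property of this construction (k = 2, so any two nodes should suffice) can be checked mechanically. The sketch below represents each stored packet as a bitmask over the 8 data packets and computes ranks over GF(2); the helpers `mask`, `rank_gf2` and `recoverable` are illustrative names, not from the talk. It checks the full contents of each pair of nodes, not the specific 8-packet download shown on the slide.

```python
from itertools import combinations

SYMBOLS = "ABCDEFGH"

NODES = [
    ["A", "B", "C", "D", "F+G"],
    ["C", "D", "E", "F", "H+A"],
    ["E", "F", "G", "H", "B+C"],
    ["G", "H", "A", "B", "D+E"],
]


def mask(expr):
    """Bitmask of a packet over GF(2), e.g. 'F+G' sets the bits of F and G."""
    m = 0
    for name in expr.split("+"):
        m ^= 1 << SYMBOLS.index(name)
    return m


def rank_gf2(vectors):
    """Rank over GF(2) of a list of bitmask vectors."""
    rows = list(vectors)
    rank = 0
    for bit in range(len(SYMBOLS)):
        pivot = next((r for r in rows if r >> bit & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        # Clear this bit from every remaining row.
        rows = [r ^ pivot if r >> bit & 1 else r for r in rows]
        rank += 1
    return rank


def recoverable(i, j):
    """True if nodes i and j together determine all 8 data packets."""
    packets = [mask(e) for e in NODES[i] + NODES[j]]
    return rank_gf2(packets) == len(SYMBOLS)


print(all(recoverable(i, j) for i, j in combinations(range(len(NODES)), 2)))
```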

34 Exact Repair
Nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
How to repair?
[Figure: repair transfers labeled A, B, C, D, E, F, G, H, F+G, B+C]
Total repair-bandwidth = 10

35 Exact Repair
Nodes: (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C), (G, H, A, B, D+E)
How to repair?
[Figure: repair transfers labeled C, D, E, F, G, H, D+E, H+A, B+C, F+G]
Total repair-bandwidth = 10

36 Min-Bandwidth point
[Plot: tradeoff curves for one-by-one repair and for repairing 2 new nodes cooperatively]

37 AN EXPLICIT CONSTRUCTION FOR MINIMUM-STORAGE COOPERATIVE REPAIR

38 An explicit construction for MSCR (S., ICC 2011)
[Plot labels: Minimum repair-bandwidth, Storage per node]
– B = 6 information packets
– n nodes
Each node stores 2 packets.
Repair r = 2 failures simultaneously.
No. of connections for each DC = k = 3
No. of helpers for each failed node = d = 3
Require d = k

39 The min-storage point
[Plot: tradeoff curves for non-cooperative and cooperative repair; k = 3, d = 3, r = 2, B = 6]
Storage cost per node = 2, repair bandwidth per node = 4

40 Data retrieval
[Figure: source data → encode (MDS code with dimension k = 3) → codeword → storage nodes ($\alpha = 2$) → data collector → decode]

41 Repair: phase 1
[Figure: source data → encode → codeword → storage nodes, some of them lost; the newcomers download from the surviving nodes and decode]

42 Repair: phase 2
[Figure: the newcomers re-encode and exchange packets to replace the lost storage nodes]
Repair bandwidth per node = 8/2 = 4

43 The construction is optimal
[Plot: tradeoff curves for non-cooperative and cooperative repair; k = 3, d = 3, r = 2, B = 6]
Storage cost per node = 2, repair bandwidth per node = 4

44 EXISTENCE OF COOPERATIVE REGENERATING CODES UNDER FUNCTIONAL REPAIR

45 Existence of optimal linear regenerating codes in general (S., Hu, Netcod 2011)
Sustainable storage system
– Will it work after arbitrarily many repairs?
Technical difficulty: the information flow graph is unbounded.
Can we work over a fixed finite field, for an unlimited number of regenerations?
– Yes, if we can construct an exact regenerating code.
– The answer is also “yes” for cooperative functional repair in general.

46 Trellis structure
m: message vector (row vector)
– Stage 0: m T0, where T0 is the “transfer matrix” in stage 0
– Stage 1: m T0 T1, where T1 is the “transfer matrix” in stage 1
– Stage 2: m T0 T1 T2, where T2 is the “transfer matrix” in stage 2

47 Flow in information flow graph
[Figure: information flow graph from the source S to the data collector DC, annotated with edge capacities and flow values]
The cut-set bound says that the cut capacity is at least 8. Can we construct a flow with value 8?

48 Cross-sectional flow pattern
[Figure: information flow graph from the source S to the data collector DC, annotated with the flow values crossing each stage]

49 A recursive construction of flow
[Figure: one segment of the information flow graph between stage s and stage s+1, with cross-sectional flows (g1, g2, g3, g4) and (h1, h2, h3, h4)]
1. Identify a set H of cross-sectional flow patterns.
2. For any cross-sectional flow pattern (h1, h2, h3, h4) in H at stage s+1, we can find a flow in this segment of the graph such that (g1, g2, g3, g4) is also in H.
3. Each pattern corresponds to a submatrix of the transfer matrix.
4. By the Schwartz-Zippel lemma, we can find local encoding vectors such that the determinants of all these submatrices are non-zero, provided the finite field is sufficiently large.

50 Summary
Multiple node failures in medium-scale to large-scale storage systems
Formulation as a linear program
Functional repair: a linear regenerating code over a fixed finite field that matches the cut-set bound on repair-bandwidth exists.
Exact repair: two families of explicit code constructions
– Minimum-bandwidth point: d = k, r = n – d
– Minimum-storage point: d = k, r arbitrary

51 References
– Y. Wu and A. G. Dimakis, Reducing repair traffic for erasure coding-based storage via interference alignment, ISIT, Jul. 2009.
– Y. Hu, Y. Xu, X. Wang, C. Zhan and P. Li, Cooperative recovery of distributed storage systems from multiple losses with network coding, J. Sel. Area Comm., vol. 28, no. 2, pp. 268-275, Feb. 2010.
– K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC, Jun. 2011.
– A.-M. Kermarrec, N. Le Scouarnec and G. Straub, Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, Netcod, Jul. 2011.
– K. W. Shum and Y. Hu, Existence of Minimum-Repair-Bandwidth Cooperative Regenerating Codes, Netcod, Jul. 2011.
– K. W. Shum and Y. Hu, Exact Minimum-Repair-Bandwidth Cooperative Regenerating Codes for Distributed Storage Systems, ISIT, Aug. 2011.

