Alex Dimakis based on collaborations with Mahesh Sathiamoorthy Megas Asteris Dimitris Papailiopoulos Kannan Ramchandran Scott Chen Ramkumar Vadali Dhruba.


1 Network Coding for Cloud Storage. Alex Dimakis, based on collaborations with Mahesh Sathiamoorthy, Megas Asteris, Dimitris Papailiopoulos, Kannan Ramchandran, Scott Chen, Ramkumar Vadali, Dhruba Borthakur. USC, Facebook.

2 Distributed storage systems. Numerous disk failures per day. Failures are the norm rather than the exception. Must introduce redundancy for reliability. Replication or erasure coding?

3 How to store using erasure codes. File or data object split into k=2 blocks, A and B. n=3: store A, B, A+B. (3,2) MDS code (single parity), used in RAID 5. n=4: store A, B, A+B, A+2B. (4,2) MDS code, tolerates any 2 failures, used in RAID 6.

4 Erasure codes are reliable. File or data object: A, B. Replication: A, A, B, B. vs (4,2) MDS erasure code: A, B, A+B, A+2B (any 2 suffice to recover).

5 Storing with an (n,k) code. An (n,k) erasure code provides a way to take k packets and generate n packets of the same size such that any k out of n suffice to reconstruct the original k. Optimal reliability for that given redundancy. Well-known and used frequently, e.g. Reed-Solomon codes, array codes, LDPC and Turbo codes. Each packet is stored at a different node, distributed in a network.
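A minimal sketch of the "any k out of n" property for the (4,2) code A, B, A+B, A+2B from the earlier slides. The prime field GF(257) and the names `encode`, `decode`, `all_pairs_decode` are assumptions made for illustration; the slides do not specify a field.

```python
from itertools import combinations

P = 257  # small prime field; an assumption for this sketch

# Row i gives the coefficients (of a, of b) stored by node i.
GENERATOR = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(a, b):
    """Return the four coded packets A, B, A+B, A+2B over GF(P)."""
    return [(p * a + q * b) % P for p, q in GENERATOR]

def decode(i, yi, j, yj):
    """Recover (a, b) from the packets yi, yj held by two distinct nodes i, j."""
    (p, q), (r, s) = GENERATOR[i], GENERATOR[j]
    det = (p * s - q * r) % P
    inv = pow(det, -1, P)  # modular inverse (Python 3.8+)
    a = ((s * yi - q * yj) * inv) % P
    b = ((p * yj - r * yi) * inv) % P
    return a, b

def all_pairs_decode(a, b):
    """Check that every pair of nodes reconstructs the original (a, b)."""
    packets = encode(a, b)
    return all(decode(i, packets[i], j, packets[j]) == (a % P, b % P)
               for i, j in combinations(range(4), 2))
```

Every 2x2 submatrix of the generator is invertible over this field, which is exactly what makes the code MDS.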

6 Current Hadoop architecture. 640 MB file => 10 blocks (1 to 10), each block stored three times. 3x replication is the current HDFS default. Very large storage overhead. Very costly for big data.

7 Facebook introduced Reed-Solomon (HDFS RAID). 640 MB file => 10 blocks (1 to 10) plus 4 parity blocks (P1 P2 P3 P4). Older files are switched from 3-replication to (14,10) Reed-Solomon.

8 HDFS RAID uses Reed-Solomon erasure codes. 3x replication (three copies of blocks 1 to 10): tolerates 2 missing blocks, storage cost 3x. Source file (blocks 1 to 10) plus parity file (P1 P2 P3 P4): tolerates 4 missing blocks, storage cost 1.4x. Diskreduce (B. Fan, W. Tantisiriroj, L. Xiao, G. Gibson).

9 RS codes save 5PB

10 Limitations of Reed-Solomon. Currently only 8% of Facebook's data warehouse is RS encoded (still a significant saving). Our goal: move to 40-50% of coded data. Save petabytes.

11 Coding + Storage Networks = New open problems. Issues: communication, update complexity, repair communication (network traffic), repair bits read, number of nodes accessed.

12 Overview. Storing information using codes. The repair problem. Exact repair. The state of the art. Interference alignment. Different repair metrics. The road to practice. Storage allocation problems.

13 The repair problem. Blocks 1 to 10 plus parities P1 P2 P3 P4. Great, we can tolerate n-k=4 node failures.

14 The repair problem. Great, we can tolerate 4 node failures. Most of the time we start with a single failure.

15 The repair problem. Block 3 is lost and a newcomer 3' must be rebuilt (???). Great, we can tolerate 4 node failures. Most of the time we start with a single failure.

16 The repair problem. Great, we can tolerate 4 node failures. Most of the time we start with a single failure. Read from any 10 nodes, send all data to 3', which can repair the lost block.

17 The repair problem. Great, we can tolerate 4 node failures. Most of the time we start with a single failure. Read from any 10 nodes, send all data to 3', which can repair the lost block. High network bandwidth, high disk IO at 10 nodes.

18 The repair problem. If we have 1 failure, how do we rebuild the redundancy in a new disk? Naïve repair: send k blocks. File size B, B/k per block. Do I need to reconstruct the whole data object to repair one failure?

19 Is repair frequent? 20 node failures * 15TB = 300TB. If 8% RS coded, 588TB network traffic/day (average total network: 2PB/day). ~30% of network traffic is repair in a normal day.
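One way to reproduce the 588TB figure, as a hedged reconstruction: it assumes (this is not stated on the slide) that a replicated block is repaired by copying one block, while an RS-coded block is repaired by reading 13 blocks.

```latex
\[
300\ \text{TB} \times \bigl(0.92 \times 1 + 0.08 \times 13\bigr)
  = 276\ \text{TB} + 312\ \text{TB} = 588\ \text{TB/day},
\qquad
\frac{588\ \text{TB}}{2000\ \text{TB}} \approx 0.29 .
\]
```

Under that assumption the 8% of coded data accounts for more than half of the repair traffic.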

20 The repair problem. Ok, great, we can tolerate n-k disk failures without losing data. If we have 1 failure, however, how do we rebuild the redundancy in a new disk? Naïve repair: send k blocks. File size B, B/k per block. Do I need to reconstruct the whole data object to repair one failure? Functional repair: 3' can be different from 3; it maintains the any-k-out-of-n reliability property. Exact repair: 3' is exactly equal to 3.

21 The repair problem (setup as on slide 20). Theorem: It is possible to functionally repair a code by communicating only [...] bits, as opposed to the naïve repair cost of B bits. (Regenerating Codes)
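A sketch of the quantity this theorem refers to, assuming the minimum-storage point and repair from d = n-1 surviving nodes, as in the regenerating codes paper cited later in the talk (D., Godfrey, Wu, Wainwright, Ramchandran). The symbol γ for total repair traffic follows the later slides.

```latex
\[
\gamma_{\mathrm{MSR}} \;=\; \frac{(n-1)\,B}{k\,(n-k)} \;<\; B
\qquad\text{for } 1 < k < n-1 .
\]
% Check against the examples in this talk:
%   (n,k) = (4,2), B = 4 GB:  3*4 / (2*2) = 3 GB
%   (n,k) = (5,3), B = 1 GB:  4*1 / (3*2) = 2/3 GB = 4 * (1/6 GB)
```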

22 Exact repair with 3GB. Blocks of 1GB. Node 1: a, b. Node 2: c, d. Node 3: a+c, b+d. Node 4: b+c, a+b+d. Node 1 is lost (a? b?). a = (b+d) + (a+b+d), b = d + (b+d).

23 Systematic repair with 1.5GB. Same nodes as on the previous slide. a = (b+d) + (a+b+d), b = d + (b+d). Reconstructing all the data: 4GB. Repairing a single node: 3GB. 3 equations were aligned, solvable for a, b.

24 Repairing the last node. Nodes: (a, b), (c, d), (a+c, b+d), (b+c, a+b+d). b+c = (c+d) + (b+d), a+b+d = a + (b+d).
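The two repairs above can be sketched with bytewise XOR standing in for "+". The function names are hypothetical, and treating each block as a byte string is an assumption of this sketch.

```python
def xor(*blocks):
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

def store(a, b, c, d):
    """Four nodes, two blocks each, laid out as on the slides."""
    return [(a, b), (c, d), (xor(a, c), xor(b, d)), (xor(b, c), xor(a, b, d))]

def repair_first(nodes):
    """Rebuild (a, b) from three downloaded blocks: d, b+d, a+b+d."""
    d = nodes[1][1]
    b_plus_d = nodes[2][1]
    a_b_d = nodes[3][1]
    return xor(b_plus_d, a_b_d), xor(d, b_plus_d)

def repair_last(nodes):
    """Rebuild (b+c, a+b+d) from three downloaded blocks: a, c+d, b+d."""
    a = nodes[0][0]
    c_plus_d = xor(*nodes[1])  # node 2 combines its two blocks before sending
    b_plus_d = nodes[2][1]
    return xor(c_plus_d, b_plus_d), xor(a, b_plus_d)
```

In both cases the newcomer downloads three blocks instead of the four needed to reconstruct the whole file.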

25 Network coding: multicasting. Source S, data collector 1, data collector 2. Each link carries one packet. You can use it once. Max number of packets I can send to dc2?

26 Network coding: multicasting. Each link carries one packet. You can use it once. Max number of packets I can send to dc2? Use the red and the green path: 2 packets.

27 Network coding: multicasting. Min cut (S, dc2)?

28 Network coding: multicasting. Max flow = min cut (Ford-Fulkerson; Elias, Feinstein, Shannon, 1956).

29 Network coding: multicasting. Max flow from S to dc(1)? Multicasting: maximum number of packets we can simultaneously send to many users. Cannot exceed the minimum over i of mincut(S, dc(i)).

30 Network coding: multicasting. Sending one packet to both is easy: routing. Can we always achieve the minimum over i of mincut(S, dc(i))?

31 The butterfly. mincut(S, dc(1)) = 2 = mincut(S, dc(2)). Can we send (the same) two packets to both dc1 and dc2 simultaneously?

32 The butterfly. How does dc2 get the green packet?

33 The butterfly. We need algebraic mixing of packets (network coding): S sends A and B, and the shared link carries A+B.
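A toy simulation of the butterfly, assuming packets are integers and "+" is XOR. The function name and link labels are illustrative only.

```python
def butterfly(a: int, b: int):
    """Simulate the butterfly with one packet per link.

    Returns what each data collector decodes, as (A, B) pairs.
    """
    # S sends A down the left branch and B down the right branch.
    left, right = a, b
    # The bottleneck link can carry only one packet, so it carries the XOR.
    mixed = left ^ right
    # dc1 hears A directly and A+B from the bottleneck; dc2 hears B and A+B.
    dc1 = (left, left ^ mixed)
    dc2 = (right ^ mixed, right)
    return dc1, dc2
```

With routing alone the bottleneck could forward only A or only B, so one collector would miss a packet; mixing lets both decode both.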

34 A (4,2) MDS code is a multicasting network code. Source S holds a, b; storage nodes hold a, b, a+b, a+2b; data collectors connect to the nodes over links of capacity ∞. If all dc's get both a, b, they can reconstruct the data object.

35 Wait a minute. Storage nodes: a, b, a+b, a+2b. Can I just recover from a+b only?

36 Adding storage links. Storage nodes: a, b, a+b, a+2b. Each node gets an internal link with capacity = storage of node, α.

37 Let's go back to this example. Nodes: (a, b), (c, d), (a+c, b+d), (b+c, a+b+d); blocks of 1GB; first node lost (a? b?). M=? α=? β=? d=?

38 Let's go back to this example. Same nodes. M=4GB, α=2GB, β=?, d=3.

39 Adding storage links. Capacity = storage of node, α = 2 GB. The newcomer downloads β from each surviving node; the data collector connects over ∞ links.

40 Proof idea: information flow graph. α = 2 GB. Cut: 2 + 2β ≥ 4 GB ⇒ β ≥ 1 GB. Total repair comm. ≥ 3 GB.

41 Proof sketch: reduction to multicasting. Functional repair = multicasting on the information flow graph. Sufficient iff the minimum of the min cuts is at least the file size M. (Ahlswede et al., Koetter & Medard, Ho et al.)

42 Quiz. (5,3) MDS code. M=1GB (storing a 1GB total file). k=3, n=5 (any 3 out of 5 must recover). 1 node lost. Newcomer can connect to d=4 nodes. α = α(MSR) (minimum storage to have the any 3 out of 5 guarantee). What is the minimum repair bandwidth β?

43 Quiz: repairing a (5,3) MDS code. Nodes a, b, c, d, e. Capacity = storage of node α, α = 1/3 GB. The newcomer downloads β from each of 4 surviving nodes.

44 Quiz. Same graph; the data collector connects over ∞ links.

45 Quiz. cut = 1 + β ≥ M.

46 Quiz. cut = 2/3 + 2β ≥ M. 2/3 + 2β ≥ 1, so β ≥ 1/6 GB.
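The cut arguments in the quiz and on slide 40 can be checked numerically. This sketch assumes the general cut-set expression from the regenerating codes paper, namely that the worst-case cut equals the sum over i = 0..k-1 of min(α, (d-i)β) for d ≥ k; the function names are hypothetical.

```python
from fractions import Fraction

def min_cut(k, d, alpha, beta):
    """Worst-case cut of the information flow graph (assumed formula, d >= k).

    The i-th node read by the data collector contributes either its
    storage alpha or what it downloaded from the d - i nodes outside the cut.
    """
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def repair_feasible(M, k, d, alpha, beta):
    """Functional repair is feasible when the worst-case cut carries the file."""
    return min_cut(k, d, alpha, beta) >= M
```

For the quiz (k=3, d=4, α=1/3) the three terms at β=1/6 are 1/3 + 1/3 + 2β = 1, which is the 2/3 + 2β cut on the slide.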

47 The repair problem (setup as on slide 20). Functional repair: 3' can be different from 3; it maintains the any-k-out-of-n reliability property. Exact repair: 3' is exactly equal to 3. Theorem: It is possible to functionally repair a code by communicating only [...] bits, as opposed to the naïve repair cost of B bits. (Regenerating Codes)

48 Increasing storage reduces β. α = 1/3 + ε GB. cut = 2/3 + 2ε + 2β ≥ M. 2/3 + 2ε + 2β ≥ 1, so β ≥ 1/6 − ε GB.

49 The infinite graph for repair. Nodes x1, x2, …, xn, each with storage α; each newcomer downloads β from each of d nodes; a data collector connects to k nodes.

50 Storage-communication tradeoff. Theorem 3: for any (n,k) code, where each node stores α bits, repairs from d existing nodes and downloads dβ=γ bits, the feasible region is a piecewise linear function described as follows:

51 Storage-communication tradeoff. Plot of α against γ=βd, with the Min-Storage Regenerating code and the Min-Bandwidth Regenerating code at the two ends of the curve. (D., Godfrey, Wu, Wainwright, Ramchandran, IT Transactions (2010))
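The two end points of the tradeoff curve can be computed in closed form. The expressions below are the MSR and MBR points as given in the cited paper, reproduced here as an assumption since the slide shows only the plot; the function names are hypothetical and M, k, d are taken as integers.

```python
from fractions import Fraction

def msr_point(M, k, d):
    """Minimum-storage point (alpha, gamma): store M/k, then minimize repair traffic."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(M, k, d):
    """Minimum-bandwidth point (alpha, gamma): the node stores exactly what it downloads."""
    gamma = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return gamma, gamma
```

For the (4,2) example with M=4GB and d=3 the MSR point is (2GB, 3GB), matching the 3GB repair shown earlier.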

52 Cooperative repair. The information flow graph is a generic tool to analyze distributed storage problems. Example: cooperative repair [Shum & Hu].

53 Overview. Storing information using codes. The repair problem. Exact repair. The state of the art. Interference alignment. Different repair metrics. Storage allocation problems.

54 Key problem: exact repair. Nodes a, b, c, d; newcomer e = a. From Theorem 1, an (n,k) MDS code can be repaired by communicating [...]. What if we require perfect reconstruction?

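55 Exact repair: (4,2) example (Wu and D., ISIT 2009). Node 1: x1, x2. Node 2: x3, x4. Node 3: x1+x3, x2+x4. Node 4: x1+2x3, 2x2+3x4. Exact repair of the first node (x1? x2?): trivial by communicating 4 blocks. Can it be done with 3? Node 2 sends x3+x4 (coefficients 1, 1). Node 3 sends x1+x2+x3+x4 (coefficients 1, 1). Node 4 sends 2^{-1}x1 + 2·3^{-1}x2 + x3 + x4 (coefficients 2^{-1}, 3^{-1}).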
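A sketch of this three-symbol repair. The slide does not name the field, so GF(5) is an assumption chosen because 2 and 3 are invertible there; the function names are hypothetical.

```python
P = 5  # assumed field size; the slide does not name the field

def inv(x):
    """Multiplicative inverse modulo P (Python 3.8+)."""
    return pow(x, -1, P)

def nodes(x1, x2, x3, x4):
    """Contents of the four storage nodes."""
    return [(x1 % P, x2 % P), (x3 % P, x4 % P),
            ((x1 + x3) % P, (x2 + x4) % P),
            ((x1 + 2 * x3) % P, (2 * x2 + 3 * x4) % P)]

def repair_node1(stored):
    """Rebuild (x1, x2) from one combined symbol sent by each of nodes 2, 3, 4."""
    y2 = (stored[1][0] + stored[1][1]) % P                    # x3 + x4
    y3 = (stored[2][0] + stored[2][1]) % P                    # x1 + x2 + x3 + x4
    y4 = (inv(2) * stored[3][0] + inv(3) * stored[3][1]) % P  # 2^-1 x1 + 2*3^-1 x2 + x3 + x4
    # The interference x3 + x4 is aligned in all three symbols; subtract it.
    s = (y3 - y2) % P            # x1 + x2
    t = (y4 - y2) % P            # c1*x1 + c2*x2
    c1, c2 = inv(2), (2 * inv(3)) % P
    x2 = ((t - c1 * s) * inv((c2 - c1) % P)) % P
    x1 = (s - x2) % P
    return x1, x2
```

The repair works because x3 and x4 always appear as the single unknown x3+x4, leaving two equations in x1 and x2.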

56 Repair vs exact repair. Nodes x1, x2, …, xn with storage α; newcomers download β from each of d nodes; a data collector connects to k nodes; the newcomer must reproduce x1 (x1?). Functional repair = multicasting. Exact repair = multicasting with intermediate nodes having (overlapping) requests. Cut-set region might not be achievable. Linear codes might not suffice (Dougherty et al.).

57 Exact storage-communication tradeoff? Plot of α against γ=βd. Exact repair feasible?

58 What is known about exact repair. For (n, k=2), E-MSR repair can match the cut-set bound [WD ISIT'09]. An (n=5, k=3) E-MSR systematic code exists (Cullina, D., Ho, Allerton'09). For k/n <= 1/2, E-MSR repair can match the cut-set bound [Rashmi, Shah, Kumar, Ramchandran (2010)]. E-MBR for all n, k, for d=n-1 matches the cut-set bound [Suh, Ramchandran (2010)].

59 What is known about exact repair. What can be done for high rates? Recently the symbol extension (S.E.) interference alignment technique of (Cadambe, Jafar, Maleki) and independently (Suh, Ramchandran) was shown to approach the cut-set bound for E-MSR, for all (k,n,d). (However, it requires enormous field size and sub-packetization.) This shows that linear codes suffice to approach the cut-set region for exact repair, for the whole range of parameters. Tamo et al., Papailiopoulos et al. and Cadambe et al. presented the first constructions of high-rate exact regenerating codes at ISIT 2011.

60 Exact storage-communication tradeoff? Plot of α against γ=βd. E-MSR point: Min-Storage Regenerating code (no known practical codes for high rates). E-MBR point: Min-Bandwidth Regenerating code (practical).

61 Exact storage-communication tradeoff? Plot of α against γ=βd. E-MSR point: Min-Storage Regenerating code (no known practical codes for high rates). E-MBR point: Min-Bandwidth Regenerating code (practical).

62 Exact storage-communication tradeoff? Plot of α against γ=βd. E-MSR point: Min-Storage Regenerating code (no known practical codes for high rates). E-MBR point: Min-Bandwidth Regenerating code (practical). The ouzo problem: characterize the exact repair tradeoff region.

63 Overview. Storing information using codes. The repair problem. Exact repair. The state of the art. Interference alignment. Different repair metrics. Storage allocation problems.

64 Interference alignment. Imagine getting three linear equations in four variables. In general none of the variables is recoverable (only a subspace). A1 + 2A2 + B1 + B2 = y1. 2A1 + A2 + B1 + B2 = y2. B1 + B2 = y3. The coefficients of some variables lie in a lower-dimensional subspace and can be canceled out. How to form codes that have multiple alignments at the same time?
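The three equations on this slide can be solved for A1 and A2 even though B1 and B2 stay unknown. A minimal sketch over the rationals; the function name is hypothetical.

```python
from fractions import Fraction

def recover_A(y1, y2, y3):
    """Recover (A1, A2) from the three equations on the slide.

    B1 and B2 only ever appear as B1 + B2 = y3, so subtracting y3
    leaves two equations in the two unknowns A1, A2.
    """
    r1 = Fraction(y1 - y3)   # A1 + 2*A2
    r2 = Fraction(y2 - y3)   # 2*A1 + A2
    a1 = (2 * r2 - r1) / 3
    a2 = (2 * r1 - r2) / 3
    return a1, a2
```

For example, A1=3, A2=-1, B1=10, B2=7 gives y = (18, 22, 17), and `recover_A(18, 22, 17)` returns (3, -1) without ever learning B1 or B2 individually.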

65 Exact repair: (4,2) example (Wu and D., ISIT 2009). Node 1: x1, x2. Node 2: x3, x4. Node 3: x1+x3, x2+x4. Node 4: x1+2x3, 2x2+3x4. Exact repair of the first node (x1? x2?): trivial by communicating 4 blocks. Can it be done with 3? Node 2 sends x3+x4 (coefficients 1, 1). Node 3 sends x1+x2+x3+x4 (coefficients 1, 1). Node 4 sends 2^{-1}x1 + 2·3^{-1}x2 + x3 + x4 (coefficients 2^{-1}, 3^{-1}).

66 Exact repair: interference alignment. In matrix form, applying the repair vectors v2 = (1, 1), v3 = (1, 1), v4 = (2^{-1}, 3^{-1}) to the encoding matrices of nodes 2, 3, 4 gives the coefficient vectors (0, 0, 1, 1), (1, 1, 1, 1), (2^{-1}, 2·3^{-1}, 1, 1) over (x1, x2, x3, x4).

67 Exact repair: interference alignment. Same encoding matrices and repair vectors. [Cadambe-Jafar 2008, Cadambe-Jafar-Maleki 2010]

68 Exact repair: interference alignment. We want this full rank. Want this in the span of V'. Choose same V' and V. Make all A diagonal iid.

69 Exact repair: interference alignment. We have to choose V, V' so that all the rows in [...] are contained in the rowspan of [...]. The T_i matrices are assumed iid diagonal; no assumption other than that they commute.

70 Exact repair: interference alignment. We have to choose V, V' so that all the rows in [...] are contained in the rowspan of [...]. Ok. Let's start by choosing V to be one vector w.

71 Exact repair: interference alignment. And fold it back in… By repeating this "folding", V and V' overlap more and more.

72 A combinatorial view. Look at the exponents of T_1, T_2 as lattice points. [Papailiopoulos, D., Cadambe, Allerton 2011]

73 A combinatorial view. Look at the exponents of T_1, T_2 as lattice points.

74 A combinatorial view. Look at the exponents of T_1, T_2 as lattice points. Overlap.

75 A combinatorial view. Cadambe-Jafar set V to be the interior of a hypercube.

76 Open problem: which set of dots overlaps with its two shifts maximally? (easy in 2D)

77 Connecting storage and wireless. Storage: given an error-correcting code, find the repair coefficients that reduce communication (over a field). Wireless: given some channel matrices, find the beamforming matrices that maximize the DoF (Cadambe and Jafar, Suh and Tse). Both problems reduce to rank minimization subject to full rank constraints. Polynomial reduction from one to the other. (Papailiopoulos & D., Asilomar 2010)

78 Storage codes through alignment techniques. The symbol extension alignment technique of [Cadambe and Jafar] leads to exact regenerating codes. Exact repair is a non-multicast problem where the cut-set region is achievable but needs alignment (unfortunately not practical). Should ergodic alignment have a storage code equivalent? Does real alignment have a finite-field equivalent?

79 Overview. Storing information using codes. The repair problem. Exact repair. The state of the art. Interference alignment. Different repair metrics. The road to practice. Storage allocation problems.

80 Different metrics of interest. Many companies are investigating the use of erasure codes (Google, Microsoft, NetApp, Wuala, Cleversafe) since large amounts of data require higher reliability, especially for archival storage. Several metrics of interest: 1. Bits communicated for repair (network traffic generated). 2. Bits read for repairs (open). 3. Number of nodes used during a repair. [Locally decodable codes; recent work by Gopalan, Yekhanin et al. and Oggier et al.]

81 Locality of a code. Example: (6,4) MDS code. M = 4Mb file, M/k = 1 Mb per node. Nodes: data1, data2, data3, data4, parity 1, parity 2. To rebuild data1, well… any k nodes can reconstruct everything. Lemma: An (n,k) MDS code has locality no less than k. MDS codes = worst locality.

82 What is the cost of locality? If we allow more storage, can we have i) high rate, ii) the erasure property, iii) local and simple repairs?

83 Locally Repairable Codes. Theorem 1: (Locality, Storage). Theorem 2: This is the optimal tradeoff between repair locality and storage.

84 Simple example.

85 (4,2) example. n=4 nodes, each node stores 2 data packets and one fork (f=2). Any k=2 nodes can recover (even without using the forks).

86 (4,2) example. n=4 nodes, each node stores 2 data packets and one fork (f=2). Any k=2 nodes can recover the file (f1,f2) (even without using the forks).

87 (4,2) example: exact repair.

88 Forks are used for local repairs. Outer MDS codes are used to provide the (n,k) safety. Must ensure that a fork and its parents are stored in different nodes (nontrivial combinatorial placement problems).

89 General construction. The file is separated into m blocks. A code (possibly an MDS code) produces T blocks. Each coded block is stored in r=1.5 nodes. Each of the n storage nodes stores d coded blocks. The placement is the adjacency matrix of an expander graph: every k right nodes are adjacent to m left nodes.

90 General construction. The file is separated into m blocks. An MDS code produces T blocks. Each coded block is stored in r nodes. Each of the n storage nodes stores d coded blocks. The placement is the adjacency matrix of an expander graph: every k right nodes are adjacent to m left nodes. Claim: I can still do easy lookup repair when d packets are lost.

91 General construction. The file is separated into m blocks. An MDS code produces T blocks. Each coded block is stored in r nodes. Each of the n storage nodes stores d coded blocks. The placement is the adjacency matrix of an expander graph: every k right nodes are adjacent to m left nodes. Claim: I can still do easy lookup repair when d packets are lost: 2d disk IO and communication. [Papailiopoulos et al., to be submitted]

92 Overview. Storing information using codes. The repair problem. Exact repair. The state of the art. Interference alignment. Different repair metrics. The road to practice. Storage allocation problems.

93 The road to practical use. [Hu, Yu, Li, Lee, Lui] CUHK Network Coding File System (NetCod 2011)

94 The road to practical use. [Hu, Yu, Li, Lee, Lui] CUHK Network Coding File System (NetCod 2011)

95 Hadoop MapReduce. Yahoo created an open-source version of GFS (called Hadoop Distributed File System, HDFS) plus an analytics infrastructure. Together the software is called Apache Hadoop MapReduce. Hundreds of companies are using Hadoop, tens of startups are developing tools for Hadoop (big data). It is open source, free, changing the world.

96 Code design. Blocks 1 to 10, RS parities p1 p2 p3 p4, local XOR parities x1, x2, x3. Local XORs allow single block recovery by transferring only 5 blocks (320MB) instead of 10 blocks (640 MB in HDFS RAID). 17 total blocks stored. Storage overhead increased to 1.7x from 1.4x.

97 Code design. Local XORs can be any local linear combinations (just invert in repair). Choose coefficients so that x1+x2=x3 (interference alignment). Do not store x3!

98 Code design. Same layout, with x3 implied by x1 and x2 instead of stored.

99 Code we implemented in HDFS. Single block failures can be repaired by accessing 5 blocks (vs 10). Stores 16 blocks. 1.6x storage overhead vs 1.4x in HDFS RAID. Implemented this in Hadoop (system available on github/madiator).
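A simplified sketch of local repair for the data blocks only. It assumes x1 is the plain XOR of blocks 1 to 5 and x2 the plain XOR of blocks 6 to 10; the grouping, the use of plain XOR instead of general coefficients, and the function names are all assumptions, and the Reed-Solomon parities and the implied x3 are left out.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def local_parities(data):
    """data: list of 10 equal-length blocks. Returns the local parities (x1, x2)."""
    return xor_blocks(data[:5]), xor_blocks(data[5:])

def repair_data_block(lost, data, parities):
    """Rebuild data[lost] from the 4 other blocks of its group plus its local parity.

    Returns (rebuilt_block, number_of_blocks_read).
    """
    group = range(0, 5) if lost < 5 else range(5, 10)
    helpers = [data[i] for i in group if i != lost]
    helpers.append(parities[0] if lost < 5 else parities[1])
    return xor_blocks(helpers), len(helpers)
```

Each repair reads 5 blocks, which matches the count quoted on the slide, and storing the two local parities next to the 14 RS blocks gives the 16 blocks and 1.6x overhead.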

100 Java implementation.

101 Some experiments.

102 Some experiments. 100 machines on Amazon EC2. 50 machines running HDFS RAID (Facebook version, (14,10) Reed-Solomon code). 50 running our version, USC3XOR (HDFS 3XOR regenerating code). 50 files uploaded on the system, 640MB per file. Killing nodes and measuring network traffic, disk IO, CPU, etc. during node repairs.


106 What we observe. New storage code reduces bytes read by roughly 2.6x. Network bandwidth reduced by approximately 2x. We use 14% more storage. Similar CPU. In several cases 30-40% faster repairs. Study on a larger scale is ongoing. Provides four more zeros of data availability compared to replication. Gains can be much more significant if larger codes are used (e.g. for archival storage systems).

107 Conclusions and open problems. There are several theoretical open problems in coding for distributed storage. Exact repair region? (12 bottles of ouzo) Repairing codes with a small finite field limit? Dealing with bit-errors (security) and privacy? Network topology awareness (same rack/data center)? Disk read bounds, locally repairable codes? Also, there seems to be significant potential for use in real systems (especially for large archival storage or across data centers).

108 Coding for Storage wiki

109 fin

110 A storage allocation problem

111 Allocations for one object.

112 Problem description. Can be generalized to other node failure models. Nonconvex problem. Harder than it looks.

113 Allocations for one object. A B C

114 Distributed storage allocations. Symmetric allocations can be suboptimal: given n = 5 storage nodes, budget T = 12/5, and p = 0.9, the nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial.

115 Distributed storage allocations (Leong, D., Ho, NetCod 2009, ICC, Globecom 2010). Results can be obtained for different access models. For the iid model: Theorem: Maximal spreading x_i = T/n, for all i in [1,n], has asymptotically zero gap from optimality if Tp>1. Conjecture: There is a phase transition from minimal spreading to maximal spreading being optimal, as n grows.

116 Storage allocations and combinatorics. The storage allocation problem was recently shown to be equivalent to an old conjecture by Erdős on uniform hypergraphs. A storage counterexample from Leong, D., Ho turns out to be a counterexample to the strong fractional Erdős conjecture on uniform hypergraphs (Alon et al. 2012).

117 On-going implementations. Network Coding File System (NCFS) @ CUHK: new file system over FUSE; uses exact MBR codes by Rashmi et al.; open source, available at http://ansrlab.cse.cuhk.edu.hk/software/ncfs/ Our own implementation over Hadoop (HDFS RAID): implementing locally repairable codes; Java open source implementation, easy to add new codes and experiment.

118 Open problems in distributed storage. Cut-set region matches exact repair region? Repairing codes with a small finite field limit? Dealing with bit-errors (security) and privacy? (Dikaliotis, D., Ho, ISIT'10) What is the role of (non-trivial) network topologies? Cooperative repair (Shum et al.). Lookup repair region? Disk IO region? What are the limits of interference alignment techniques? Repairing existing codes used in storage (e.g. EvenOdd, B-Code, Reed-Solomon etc.)? Real world implementation, benefits over HDFS for MapReduce? Archival storage, storage in Flash SSDs, cloud storage?

119 Overview. Storing information using codes. The repair problem. Exact repair. The state of the art. The role of interference alignment. Future directions: security through coding.

120 Coding allows secret sharing. Four coded blocks a, b, c, d are stored in four different cloud storage providers. Any two can be used to recover the data. Any single cloud storage provider learns nothing about the data. [Shamir, Blakley 1979] Distributed coding theory problems?
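A minimal sketch of a threshold-2, 4-share scheme in the style of Shamir's construction cited on the slide. The prime 2^61 - 1 and the function names are assumptions of this sketch; the slide does not give parameters.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; the field choice is an assumption

def share(secret, n=4):
    """Threshold-2 sharing: evaluate f(x) = secret + r*x at x = 1..n.

    The slope r is uniformly random, so a single share (x, f(x))
    is uniformly distributed and reveals nothing about the secret.
    """
    r = secrets.randbelow(P)
    return [(x, (secret + r * x) % P) for x in range(1, n + 1)]

def recover(s1, s2):
    """Interpolate the line through two shares and return f(0)."""
    (x1, y1), (x2, y2) = s1, s2
    slope = ((y2 - y1) * pow(x2 - x1, -1, P)) % P
    return (y1 - slope * x1) % P
```

Each provider would hold one `(x, y)` pair; any two pairs determine the line and hence the secret.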

121 Security during repair? Nodes a, b, c, d; newcomer e receives incorrect linear equations. Repair bandwidth in the presence of Byzantine adversaries?

122 Exact repair: (4,2) example (Wu and D., ISIT 2009). Node 1: x1, x2. Node 2: x3, x4. Node 3: x1+x3, x2+x4. Node 4: x1+2x3, 2x2+3x4. To repair the first node (x1? x2?), node 2 sends x3+x4 (coefficients 1, 1), node 3 sends x1+x2+x3+x4 (coefficients 1, 1), and node 4 sends 2^{-1}x1 + 2·3^{-1}x2 + x3 + x4 (coefficients 2^{-1}, 3^{-1}).

123 The ring code. n=5, k=3. Any 3 nodes must suffice to recover the data. Set x5 = x1+x2+x3+x4. Not an MDS code: it has rate 1/2 * 4/5 = 2/5, lower than k/n = 3/5.

124 The ring code. n=5, k=3. Any 3 nodes know m=4 packets. An MDS code produces T=5 blocks. Each coded block is stored in r=2 nodes.

125 The ring code. An MDS code produces T blocks. m=4, n=5.

126 The ring code: lookup repair. n=5, k=3. Node 1 fails: just read from d=2 other nodes. Total disk IO is proportional to d, so we want d small.
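A sketch of the ring placement and lookup repair. The layout, in which node i stores coded blocks i and i+1 around a ring, is a hypothetical placement consistent with "each coded block is stored in r=2 nodes"; the slides' figure is not available, and the function names are illustrative.

```python
from itertools import combinations

def ring_placement(n=5):
    """Node i stores coded blocks i and i+1 (mod n); a hypothetical layout."""
    return [{i, (i + 1) % n} for i in range(n)]

def any_k_nodes_recover(k=3, n=5, m=4):
    """True if every set of k nodes holds at least m distinct coded blocks.

    With x5 = x1+x2+x3+x4, any m=4 of the 5 coded blocks determine the file.
    """
    place = ring_placement(n)
    return all(len(set().union(*(place[i] for i in group))) >= m
               for group in combinations(range(n), k))

def lookup_repair(failed, n=5):
    """Return {block: helper_node}: each lost block is copied from its other replica."""
    place = ring_placement(n)
    return {blk: next(j for j in range(n) if j != failed and blk in place[j])
            for blk in place[failed]}
```

Repair here is a plain copy from the two ring neighbours, with no decoding, which is what makes it a lookup repair.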

