Self-repairing Homomorphic Codes for Distributed Storage Systems [1] Tao He Software Engineering Laboratory Department of Computer Science,

Presentation transcript:

Self-repairing Homomorphic Codes for Distributed Storage Systems [1]
Tao He, Software Engineering Laboratory, Department of Computer Science, Sun Yat-Sen University
With thanks to Frederique Oggier and Anwitaman Datta
Week 17 (Jun 15) of Advanced Topics on Computer Networking, Spring 2011, Sun Yat-Sen University, Guangzhou, China
[1] Frederique Oggier (Nanyang Technological University, Singapore) and Anwitaman Datta (NTU, Singapore), “Self-repairing Homomorphic Codes for Distributed Storage Systems”, IEEE INFOCOM 2011, Shanghai, China, April 10-15, 2011.

Outline  Background  Related work  Motivation  Evaluation  Reed-Solomon Codes  Self-repairing  Evaluation Details  Discussion

Background

Networked storage systems
• Have gained prominence in recent years:
  - Decentralized peer-to-peer storage systems
  - Dedicated infrastructure-based data centers
  - Storage area networks

Redundancy in networked storage systems
• Why is redundancy needed?
  - Storage node failures
  - User attrition in a peer-to-peer system
• Solutions
  - Replication
  - Erasure coding techniques

What are erasure codes?
• Erasure codes provide a storage-efficient alternative to replication-based redundancy in (networked) storage systems.

What are erasure codes? (cont’d)

Related work

• Regenerating codes (RGC) [2]
  - First reconstruct the whole object
  - Use classical erasure codes as a black box
• Hierarchical codes (HC) [3]
  - Encode two bits into three by an XOR operation
  - Any two encoded bits can recover the third one
[2] A. G. Dimakis, P. Brighten Godfrey, M. J. Wainwright, K. Ramchandran, “The Benefits of Network Coding for Peer-to-Peer Storage Systems”, Workshop on Network Coding, Theory, and Applications (Netcod),
[3] A. G. Dimakis, P. Brighten Godfrey, Y. Wu, M. J. Wainwright, K. Ramchandran, “Network Coding for Distributed Storage Systems”, available online at
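The "two bits into three by XOR" idea listed under hierarchical codes can be sketched in a few lines. This is only an illustration of that single XOR group, not of a full hierarchical code; the function names `encode_xor` and `recover` are hypothetical.

```python
def encode_xor(a: int, b: int) -> tuple[int, int, int]:
    """Encode two bits (or two equal-width words) into three fragments."""
    return a, b, a ^ b  # the third fragment is the XOR parity


def recover(known: dict[int, int]) -> tuple[int, int, int]:
    """Rebuild all three fragments from any two.

    `known` maps fragment position (0, 1 or 2) to its value.
    """
    if len(known) != 2:
        raise ValueError("exactly two fragments are required")
    missing = ({0, 1, 2} - set(known)).pop()
    values = dict(known)
    # XOR of the two surviving fragments equals the missing one.
    first, second = known.values()
    values[missing] = first ^ second
    return values[0], values[1], values[2]


fragments = encode_xor(1, 0)
# Lose fragment 1, rebuild it from fragments 0 and 2.
assert recover({0: fragments[0], 2: fragments[2]}) == fragments
```

Because XOR is its own inverse, the same rule repairs whichever of the three fragments is lost.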

Weakness of related work
• RGCs must contact many nodes
  - An RGC needs to communicate with at least k other nodes to recreate any fragment. The minimal overhead is achieved only when a single fragment is missing and information is downloaded from all other n-1 nodes.
• HCs do not give fragments symmetric roles
  - Replenishing a specific fragment in an HC depends on which specific fragments are missing, not solely on how many.

Motivation

• Minimize the number of nodes that must be contacted to reconstruct a missing block
• This translates into lower bandwidth consumption
• It also enables faster and parallel replenishment

The authors’ approach
• Self-repairing codes (SRC): a Reed-Solomon-like approach
  - Encoded fragments can be repaired directly from other subsets of encoded fragments, without first reconstructing the original data.
  - A fragment is repaired from a fixed number of encoded fragments. That number:
    • depends only on how many encoded blocks are missing
    • is independent of which specific blocks are missing

Evaluation

• The authors analyze the static resilience of SRCs with respect to traditional erasure codes and observe that SRCs incur marginally larger storage overhead.
• Advantages
  - Low communication overhead for reconstruction
  - Lower latency, because repairs can be carried out in parallel

Reed-Solomon Codes [4]
[4] I. S. Reed and G. Solomon, “Polynomial Codes over Certain Finite Fields”, Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 2, SIAM, 1960.

Reed-Solomon Codes

Reed-Solomon Codes (cont’d)
[Diagram labels: Generation, Encoding]
• Key idea: use equations to keep redundancy

Reed-Solomon Codes – An example
[Diagram labels: Division, Encoding, Generation]

Reed-Solomon Codes – An example  To decode, based on the linear algebra, we can solve linear equations:  By reconstructing the values of a and b  We get

Self-repairing

• Now take n = 7 and, say, the evaluation points $1, w, w^2, w^4, w^5, w^8, w^{10}$. After encoding we get $(p(1), p(w), p(w^2), p(w^4), p(w^5), p(w^8), p(w^{10}))$, with one value stored at each node. ($w$ is a root of an irreducible monic polynomial of degree $m$ over $\mathbb{F}_2$.)

Self-repairing (cont’)  Suppose node 5 which stores p(w 5 ) goes offline.  A new comer can get p(w 5 ) by asking for p(w 2 ) and p(w), since

Self-repairing (cont’)  Table I shows other examples of missing fragments and which pairs can reconstruct them, depending on if 1, 2, or 3 fragments are missing at the same time.

Evaluation Details

Static Resilience Analysis
• The static resilience of a distributed storage system is defined as the probability that an object, once stored in the system, stays available without any further maintenance, even when a certain fraction of the individual member nodes becomes unavailable.

A network matrix representation
• Recall that, using the above coding strategy, an object o of length M is decomposed into k fragments of length M/k:
• which are further encoded into n fragments of the same length:
• We thus have n nodes, each possessing a binary vector of length M/k, which can be represented as an n × M/k binary matrix:

Evaluation Metrics  R(x, d, r) as the number of x × d sub-matrices with rank r, voluntarily including all the possible permutations of the rows in the counting.  ρ(x, d, r) be the fraction of sub-matrices of dimension x×d with rank r out of all possible sub-matrices of the same dimension. Then

Evaluation Metrics (cont’d)
• Using an HSRC(n, k), the probability $p_{obj}$ of recovering the object is
• If we use an (n, k) erasure code, then the probability that the object is recoverable is:
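The closed-form expressions on these two slides are missing from the transcript, so the sketch below is a brute-force stand-in rather than the authors' formulas. It makes several assumptions: sub-matrices are formed from x distinct nonzero rows of $\mathbb{F}_2^d$; the code has $n = 2^d - 1$ fragments; each node is independently available with probability `p_node` (a symbol introduced here); the SRC object is recoverable when the live rows have rank at least k; and the erasure code is MDS, so any k live fragments suffice. All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def rank_gf2(rows) -> int:
    """Rank over F_2 of rows given as integer bitmasks (XOR basis)."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)      # clear b's leading bit from r if set
        if r:
            basis.append(r)
    return len(basis)


def rho(x: int, d: int, r: int) -> float:
    """Fraction of x-row sub-matrices with rank r (assumed row model).

    Enumerating unordered row sets gives the same fraction as counting
    every row permutation, since rank does not depend on row order.
    """
    total = hits = 0
    for rows in combinations(range(1, 2 ** d), x):
        total += 1
        hits += rank_gf2(rows) == r
    return hits / total if total else 0.0


def p_obj_src(d: int, k: int, p_node: float) -> float:
    """Recovery probability for the assumed SRC model with n = 2^d - 1."""
    n = 2 ** d - 1
    return sum(
        comb(n, x) * p_node ** x * (1 - p_node) ** (n - x)
        * sum(rho(x, d, r) for r in range(k, d + 1))
        for x in range(k, n + 1)
    )


def p_obj_ec(n: int, k: int, p_node: float) -> float:
    """Recovery probability for an (n, k) MDS erasure code."""
    return sum(comb(n, x) * p_node ** x * (1 - p_node) ** (n - x)
               for x in range(k, n + 1))


# For d = k = 3, 28 of the 35 three-row choices have full rank.
assert abs(rho(3, 3, 3) - 0.8) < 1e-12
# The SRC model can never beat the MDS code with the same (n, k).
assert p_obj_src(3, 3, 0.9) <= p_obj_ec(7, 3, 0.9)
```

The gap between `p_obj_src` and `p_obj_ec` is one way to see the price paid for self-repair: some sets of k live fragments are linearly dependent and cannot rebuild the object on their own.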

Experiment results

Experiment results (cont’d)

Discussion

Bandwidth usage  Figure 2 shows the average amount of network traffic to transfer encoded fragments per lost fragment when the various lazy variants of repair are used, namely parallel and sequential repairs with SRC, and (by default, sequential) repair when using EC.

Repairs in parallel
• A final advantage of SRC is the possibility of carrying out repairs of different fragments independently and in parallel (and hence quickly).
• SRC allows fast reconstruction of missing blocks. Orchestrating such distributed reconstruction to fully exploit this potential itself poses interesting algorithmic and systems research challenges.

Conclusion  The authors propose a new family of codes, called self-repairing codes The characteristics of distributed networked storage systems Low-bandwidth consumption for repairs Parallel and independent replenishment of lost redundancy

Q & A

Thank you! Contact me via