236601 - Coding and Algorithms for Memories Lecture 13

Presentation transcript:

Coding and Algorithms for Memories Lecture 13

Large Scale Storage Systems
Big Data Players: Facebook, Amazon, Google, Yahoo, …
Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)
Failures are the norm

Node failures at Facebook
[Figure: node failures at Facebook, plotted by date]
Source: XORing Elephants: Novel Erasure Codes for Big Data, M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013

Problem Setup
Disks are stored together in a group (rack)
Disk failures should be supported
Requirements:
– Support as many disk failures as possible
– And yet…
  Optimal and fast recovery
  Low complexity

Reed Solomon Codes

Advantages:
– Support the maximum number of disk failures
– Are very common in practice and have relatively efficient encoding/decoding schemes
Disadvantages:
– Require working over large fields
  Solution: EvenOdd Codes
– Need to read all the disks in order to recover even a single disk failure, so rebuild is not efficient
  Solution: ZigZag Codes

The Repair Problem
[Figure: storage nodes, including parity blocks P1, P2, P3, P4]
A disk is lost: repair job starts
Access, read, and transmit data of disks!
Overuse of system resources during single repair
Goal: Reduce repair cost in a single disk repair
Facebook’s storage scheme (RS code):
– 10 data blocks
– 4 parity blocks
– Can tolerate any four disk failures

ZigZag Codes
Designed by Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck
The goal: construct codes that correct the maximum number of erasures and yet allow efficient reconstruction when only a single drive fails

ZigZag Codes
Lower bound: the minimum amount of data that must be read to recover a single drive failure
– (n,k) code: n drives, k information, and n-k redundancy
– M: size of a single drive in bits
For an (n,n-2) code it is required to read at least 1/2 of the data from the remaining drives, that is, at least (1/2)(n-1)M bits
– The last example is optimal
In general, for an (n,n-r) code it is required to read at least 1/r of the data from the remaining drives, that is, at least (1/r)(n-1)M bits
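To make the bound concrete, here is a minimal sketch that evaluates (1/r)(n-1)M and compares it with a naive repair that reads k = n-r whole drives and decodes everything. The function names are hypothetical. Applying the bound to the (14,10) parameters of the Facebook scheme mentioned earlier is an illustration, not a figure taken from the slides.

```python
from fractions import Fraction


def min_repair_read(n: int, r: int, drive_size: Fraction) -> Fraction:
    """Lower bound (1/r)(n-1)M on the data read to rebuild one drive of an (n, n-r) code."""
    if not 1 <= r < n:
        raise ValueError("need 1 <= r < n")
    return Fraction(n - 1, r) * drive_size


def naive_repair_read(n: int, r: int, drive_size: Fraction) -> Fraction:
    """Assumed baseline: read k = n - r whole drives and decode all the data."""
    return (n - r) * drive_size


M = Fraction(1)                       # measure everything in units of one drive
bound = min_repair_read(14, 4, M)     # (14,10) code: 13/4 drives
naive = naive_repair_read(14, 4, M)   # 10 drives
print(f"bound = {bound} drives, naive = {naive} drives")
```

In drive units, the bound for r = 4 is 3.25 drives against 10 for the naive baseline, which is the gap that motivates codes with efficient single-drive rebuild.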

ZigZag Codes Example
[Table columns: info 1, info 2, info 3, Row parity, ZigZag parity]

Network Coding for Distributed Storage
Goal: show that, in general, for an (n,n-r) code it is required to read at least 1/r of the data from the remaining drives, that is, at least (1/r)(n-1)M bits
Reference: Network Coding for Distributed Storage, Dimakis, Godfrey, Wu, Wainwright, Ramchandran
File of size M is partitioned into k pieces of size M/k
The k pieces are encoded into n encoded pieces using an (n,k) MDS code
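The partition-and-encode step can be sketched as follows. The slides do not fix a particular MDS code, so this example assumes a non-systematic Reed-Solomon code over the prime field GF(257): each group of k symbols is treated as the coefficients of a polynomial and evaluated at n distinct points. All function names and the field size are illustrative choices.

```python
from itertools import combinations

P = 257  # assumed prime field size, large enough to hold one byte per symbol


def encode(symbols, n):
    """Evaluate the polynomial with the given k coefficients at x = 1..n."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(symbols)) % P
            for x in range(1, n + 1)]


def decode(points, k):
    """Recover k coefficients from k (x, y) pairs by Gaussian elimination mod P."""
    rows = [[pow(x, i, P) for i in range(k)] + [y] for x, y in points[:k]]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[i][k] for i in range(k)]


def encode_file(data: bytes, n: int, k: int):
    """Split data into k pieces of size M/k (zero-padded) and encode into n node contents."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    stripes = [encode(col, n) for col in zip(*pieces)]  # one codeword per position
    return [list(node) for node in zip(*stripes)]       # node i keeps symbol i of each stripe


def decode_file(available: dict, k: int, length: int) -> bytes:
    """Rebuild the file from any k nodes; keys are 0-based node indices."""
    ids = sorted(available)[:k]
    size = len(available[ids[0]])
    cols = [decode([(i + 1, available[i][j]) for i in ids], k) for j in range(size)]
    return bytes(v for piece in zip(*cols) for v in piece)[:length]


data = b"coding for memories"
nodes = encode_file(data, n=5, k=3)
# MDS property: every choice of k = 3 nodes out of n = 5 recovers the file.
ok = all(decode_file({i: nodes[i] for i in ids}, 3, len(data)) == data
         for ids in combinations(range(5), 3))
print(ok)
```

Encoded symbols can take the value 256, so a real system would work over a field such as GF(256) instead. The prime field is used here only to keep the arithmetic short.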

Network Coding for Distributed Storage
[Figure: file pieces y1, y2 encoded into storage nodes x1, x2, x3, x4]

Network Coding for Distributed Storage
[Figure: file pieces y1, y2 and storage nodes x1, x2, x3, x4; a new node x5 downloads β from each node it contacts. β=?]

Network Coding for Distributed Storage
[Figure: information flow graph with source S, storage nodes split into x_i^in and x_i^out (i = 1, …, 5), and a data collector DC; edge capacities are labeled ∞, α=1, and β (β=?)]
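The question β=? can be explored numerically. The sketch below assumes the cut-set condition from the cited paper: when each node stores α and a newcomer downloads β from each of d helpers, a data collector that contacts k nodes can recover a file of size M only if the sum over i = 0, …, k-1 of min(α, (d-i)β) is at least M. The parameters (M = 2, k = 2, d = 3, α = 1) are a reading of the figure, not values stated in the text. Note that M here is the file size, so one drive holds α = M/k.

```python
from fractions import Fraction


def min_cut(k: int, d: int, alpha, beta):
    """Cut-set value seen by a data collector that contacts k nodes (assumed formula)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def min_beta(file_size, k: int, d: int) -> Fraction:
    """Smallest beta satisfying the cut condition when alpha = file_size / k."""
    return Fraction(file_size) / (k * (d - k + 1))


M, k, d = 2, 2, 3                 # assumed reading of the figure: (4,2) code, 3 helpers
alpha = Fraction(M, k)            # alpha = 1, as labeled in the figure
beta = min_beta(M, k, d)          # 1/2
total = d * beta                  # repair traffic: 3/2

print(beta, total, min_cut(k, d, alpha, beta))
```

With d = n-1 helpers the total is (n-1)(M/k)/(n-k), which is the (1/r)(n-1)M bound from the earlier slide once M there is read as the size of a single drive.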

ZigZag Codes Example
a   b   a+b   a+2d
c   d   c+d   c+b
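The array above can be checked with a short sketch. It assumes the four columns are drives (two information drives, then a row-parity drive and a zigzag-parity drive) and that arithmetic is over GF(3), since the coefficient 2 needs a field with at least three elements. The repair schedules below are derived from the table and are not spelled out in the slides. Each one reads a single symbol from every surviving drive, that is, half of the remaining data.

```python
from itertools import product

Q = 3  # assumed field size GF(3)


def encode(a, b, c, d):
    """Return the four drives (columns), each holding two symbols."""
    return [
        (a, c),                          # first information drive
        (b, d),                          # second information drive
        ((a + b) % Q, (c + d) % Q),      # row parity
        ((a + 2 * d) % Q, (c + b) % Q),  # zigzag parity
    ]


def repair_info1(b, row_top, zig_bottom):
    """Rebuild (a, c) from b, a+b and c+b."""
    return ((row_top - b) % Q, (zig_bottom - b) % Q)


def repair_info2(a, row_top, zig_top):
    """Rebuild (b, d) from a, a+b and a+2d."""
    inv2 = pow(2, -1, Q)  # inverse of 2 in GF(3)
    return ((row_top - a) % Q, (zig_top - a) * inv2 % Q)


# Exhaustive check over all 3^4 choices of (a, b, c, d).
all_ok = True
for a, b, c, d in product(range(Q), repeat=4):
    cols = encode(a, b, c, d)
    all_ok &= repair_info1(cols[1][0], cols[2][0], cols[3][1]) == cols[0]
    all_ok &= repair_info2(cols[0][0], cols[2][0], cols[3][0]) == cols[1]
print(all_ok)
```

Each repair reads 3 of the 6 surviving symbols, which matches the (1/2)(n-1)M lower bound for n = 4 with M = 2 symbols per drive.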