Henry C. H. Chen and Patrick P. C. Lee

Slides:



Advertisements
Similar presentations
PROOFS OF RETRIEVABILITY VIA HARDNESS AMPLIFICATION Yevgeniy Dodis, Salil Vadhan and Daniel Wichs.
Advertisements

1 On the Speedup of Single-Disk Failure Recovery in XOR-Coded Storage Systems: Theory and Practice Yunfeng Zhu 1, Patrick P. C. Lee 2, Yuchong Hu 2, Liping.
RAID Oh yes Whats RAID? Redundant Array (of) Independent Disks. A scheme involving multiple disks which replicates data across multiple drives. Methods.
Secure Data Storage in Cloud Computing Submitted by A.Senthil Kumar( ) C.Karthik( ) H.Sheik mohideen( ) S.Lakshmi rajan( )
Analysis and Construction of Functional Regenerating Codes with Uncoded Repair for Distributed Storage Systems Yuchong Hu, Patrick P. C. Lee, Kenneth.
current hadoop architecture
Alex Dimakis based on collaborations with Dimitris Papailiopoulos Arash Saber Tehrani USC Network Coding for Distributed Storage.
CSCE430/830 Computer Architecture
1 NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System Yuchong Hu 1, Chiu-Man Yu 2, Yan-Kit Li 2 Patrick P. C.
HAIL (High-Availability and Integrity Layer) for Cloud Storage
BASIC Regenerating Codes for Distributed Storage Systems Kenneth Shum (Joint work with Minghua Chen, Hanxu Hou and Hui Li)
Data Integrity Proofs in Cloud Storage Sravan Kumar R, Ashutosh Saxena Communication Systems and Networks (COMSNETS), 2011 Third International Conference.
Yuchong Hu1, Henry C. H. Chen1, Patrick P. C. Lee1, Yang Tang2
Abstract HyFS: A Highly Available Distributed File System Jianqiang Luo, Mochan Shrestha, Lihao Xu Department of Computer Science, Wayne State University.
1 STAIR Codes: A General Family of Erasure Codes for Tolerating Device and Sector Failures in Practical Storage Systems Mingqiang Li and Patrick P. C.
PORs: Proofs of Retrievability for Large Files
1 Toward I/O-Efficient Protection Against Silent Data Corruptions in RAID Arrays Mingqiang Li and Patrick P. C. Lee The Chinese University of Hong Kong.
Ragib Hasan University of Alabama at Birmingham CS 491/691/791 Fall 2011 Lecture 10 09/15/2011 Security and Privacy in Cloud Computing.
REDUNDANT ARRAY OF INEXPENSIVE DISCS RAID. What is RAID ? RAID is an acronym for Redundant Array of Independent Drives (or Disks), also known as Redundant.
Beyond the MDS Bound in Distributed Cloud Storage
Network Coding for Large Scale Content Distribution Christos Gkantsidis Georgia Institute of Technology Pablo Rodriguez Microsoft Research IEEE INFOCOM.
A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation Yinlong Xu University of Science and Technology of.
An Integrated Framework for Dependable Revivable Architectures Using Multi-core Processors Weiding Shi, Hsien-Hsin S. Lee, Laura Falk, and Mrinmoy Ghosh.
HAIL (High-Availability and Integrity Layer) for Cloud Storage Kevin Bowers and Alina Oprea RSA Laboratories Joint work with Ari Juels.
Efficient Proactive Security for Sensitive Data Storage Arun Subbiah Douglas M. Blough School of ECE, Georgia Tech {arun,
Servers Redundant Array of Inexpensive Disks (RAID) –A group of hard disks is called a disk array FIGURE Server with redundant NICs.
NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds
Network Coding for Distributed Storage Systems IEEE TRANSACTIONS ON INFORMATION THEORY, SEPTEMBER 2010 Alexandros G. Dimakis Brighten Godfrey Yunnan Wu.
Network Coding Distributed Storage Patrick P. C. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong 1.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google∗
Cong Wang1, Qian Wang1, Kui Ren1 and Wenjing Lou2
Construction of efficient PDP scheme for Distributed Cloud Storage. By Manognya Reddy Kondam.
1 Convergent Dispersal: Toward Storage-Efficient Security in a Cloud-of-Clouds Mingqiang Li 1, Chuan Qin 1, Patrick P. C. Lee 1, Jin Li 2 1 The Chinese.
1 The Google File System Reporter: You-Wei Zhang.
RAID Shuli Han COSC 573 Presentation.
1 Failure Correction Techniques for Large Disk Array Garth A. Gibson, Lisa Hellerstein et al. University of California at Berkeley.
Redundant Array of Inexpensive Disks aka Redundant Array of Independent Disks (RAID) Modified from CCT slides.
Min Xu1, Yunfeng Zhu2, Patrick P. C. Lee1, Yinlong Xu2
Securing Every Bit: Authenticated Broadcast in Wireless Networks Dan Alistarh, Seth Gilbert, Rachid Guerraoui, Zarko Milosevic, and Calvin Newport.
A Survey on Secure Cloud Data Storage ZENG, Xi CAI, Peng
LT Codes-based Secure and ReliableCloud Storage Service
Redundant Array of Independent Disks.  Many systems today need to store many terabytes of data.  Don’t want to use single, large disk  too expensive.
1 CloudVS: Enabling Version Control for Virtual Machines in an Open- Source Cloud under Commodity Settings Chung-Pan Tang, Tsz-Yeung Wong, Patrick P. C.
A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes Yunfeng Zhu 1, Patrick P. C. Lee 2, Liping Xiang 1, Yinlong.
Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding Yuchong Hu, Yinlong Xu, Xiaozhao Wang, Cheng Zhan and Pei.
1 Making MapReduce Scheduling Effective in Erasure-Coded Storage Clusters Runhui Li and Patrick P. C. Lee The Chinese University of Hong Kong LANMAN’15.
1 Enabling Efficient and Reliable Transitions from Replication to Erasure Coding for Clustered File Systems Runhui Li, Yuchong Hu, Patrick P. C. Lee The.
Data Integrity Proofs in Cloud Storage Author: Sravan Kumar R and Ashutosh Saxena. Source: The Third International Conference on Communication Systems.
Exact Regenerating Codes on Hierarchical Codes Ernst Biersack Eurecom France Joint work and Zhen Huang.
NC-Audit: Auditing for Network Coding Storage Anh Le and Athina Markopoulou University of California, Irvine.
Hands-On Microsoft Windows Server 2008 Chapter 7 Configuring and Managing Data Storage.
Database Laboratory Regular Seminar TaeHoon Kim Article.
Cloud Computing Vs RAID Group 21 Fangfei Li John Soh Course: CSCI4707.
Seminar On Rain Technology
RAID Technology By: Adarsha A,S 1BY08A03. Overview What is RAID Technology? What is RAID Technology? History of RAID History of RAID Techniques/Methods.
RAID TECHNOLOGY RASHMI ACHARYA CSE(A) RG NO
CS Introduction to Operating Systems
A Tale of Two Erasure Codes in HDFS
Rekeying for Encrypted Deduplication Storage
Double Regenerating Codes for Hierarchical Data Centers
Steve Ko Computer Sciences and Engineering University at Buffalo
Steve Ko Computer Sciences and Engineering University at Buffalo
Repair Pipelining for Erasure-Coded Storage
Presented by Haoran Wang
Section 7 Erasure Coding Overview
RAID RAID Mukesh N Tekwani
Xiaoyang Zhang1, Yuchong Hu1, Patrick P. C. Lee2, Pan Zhou1
TECHNICAL SEMINAR PRESENTATION
UNIT IV RAID.
RAID RAID Mukesh N Tekwani April 23, 2019
Presentation transcript:

Henry C. H. Chen and Patrick P. C. Lee Enabling Data Integrity Protection in Regenerating-Coding-Based Cloud Storage Henry C. H. Chen and Patrick P. C. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong SRDS’12

Cloud Storage Cloud storage is an emerging service model for remote backup and data synchronization Is cloud storage fully reliable?

Problems in Cloud Storage

Cloud Storage Requirements Data integrity protection Detect any corrupted data chunks stored on cloud servers Fault tolerance Tolerate any cloud server failures Efficient recovery Recover any lost/corrupted data chunks with minimal overhead

Related Work Single server Multiple servers Provable Data Possession (PDP) [Ateniese et al. ’07] and Proof of Retrievability (POR) [Juels et al. ’07] Verify data integrity by spot-checking a fraction of large files via cryptographic means Multiple servers MR-PDP [Curtmola et al. ’08] Target replication HAIL [Bowers et al. ’09] Use erasure codes (e.g., Reed-Solomon codes) to provide less storage overhead than replication

Regenerating Codes Regenerating codes [Dimakis et al., 10] minimize repair traffic while maintaining same fault tolerance as erasure codes Implication: recover faster than erasure codes Idea: Do not reconstruct the whole file as in erasure codes Instead, read only the chunks (smaller than whole file) that are needed to recover the lost chunks Build on network coding NCCloud [Hu et al., ’12] provides an implementable design of regenerating codes Functional minimum storage regenerating (FMSR) codes Designed for long-term archival storage

Reed-Solomon Codes Conventional repair: Proxy Repair traffic = M File of size M A Server 1 A B Proxy Server 2 B B Server 3 A+B A A A+B Server 4 A+2B n = 4, k = 2 (n,k) MDS property: any k out of n servers can rebuild original file. Conventional repair: Reconstruct whole file and generate data in new server

[Hu et al., FAST’12] FMSR in NCCloud FMSR codes Repair traffic = 0.75M File of size M A Server 1 P1 B P2 C D Proxy Server 2 P3 P4 P5 P3 Server 3 P6 P1’ P1’ P5 P2’ P2’ Server 4 P7 P7 P8 n = 4, k = 2 Code chunk Pi = linear combination of original native chunks Repair in FMSR: Download one code chunk from each surviving server Reconstruct new code chunks via random linear combination

Encoding matrix rank = k(n-k) FMSR Property Servers (clouds) Proxy n(n-k) chunks P1 P1 k(n-k) chunks P2 P2 File A P3 P3 partition B encode P4 distribute P4 C P5 P5 D P6 P6 P7 P8 P7 P8 n=4, k=2 c1,1 c1,2 c1,3 c1,4 . c8,1 c8,2 c8,3 c8,4 P1 A P2 P3 B P4 P5 C P6 P7 D P8 Encoding matrix rank = k(n-k) Native chunks Code chunks

FMSR Codes FMSR codes Can achieve up to 50% repair traffic saving for general k = n-2 (i.e., RAID-6 tolerance) It is functional, i.e., fault tolerance is preserved It is suitable to long-term archival storage whose read frequency is rare Can we enable integrity protection atop regenerating codes, while preserving repair traffic saving?

Our Contributions Build a FMSR-DIP code, which enables data integrity protection, fault tolerance, and efficient recovery Export tunable parameters from FMSR-DIP to trade performance and security Implement and evaluate FMSR-DIP atop a real cloud storage testbed

FMSR-DIP Goals Threat model: Byzantine, mobile adversary [Bowers et al. ’09] exhibits arbitrary behavior corrupts different subsets of servers over time Design goals: Preserve advantages of FMSR codes Work on thin clouds (i.e., only basic PUT/GET operations need to be supported) Support byte sampling to minimize cost

FMSR-DIP: Overview NCCloud Servers / clouds FMSR code chunks FMSR-DIP code chunks file upload download NCCloud FMSR-DIP Storage interface Users Four operations: Upload, Check, Download and Repair

8 FMSR code chunks, 3 bytes each FMSR-DIP: Upload 8 FMSR code chunks, 3 bytes each

Apply error-correcting code (ECC) to each chunk individually FMSR-DIP: Upload Apply error-correcting code (ECC) to each chunk individually

XOR each byte with a pseudorandom value FMSR-DIP: Upload XOR each byte with a pseudorandom value

For each chunk, calculate the MAC of the first 3 bytes FMSR-DIP: Upload For each chunk, calculate the MAC of the first 3 bytes

FMSR-DIP: Upload Upload the chunks to clouds Encrypt the metadata from NCCloud (which contains the encoding matrix) Append all MACs to metadata Replicate metadata on all servers

FMSR-DIP: Check Pick a row to check

XOR with the previous pseudorandom values, and check their consistency FMSR-DIP: Check XOR with the previous pseudorandom values, and check their consistency

FMSR-DIP: Check Recall that from FMSR encoding: A x = P A = encoding matrix with rank = k(n-k) x = row of native chunks P = row of code chunks P is a linear combination of x Rank checking: Check if rank(A | P) = rank(A) = k(n-k)

Download chunks from any 2 servers and verify with their MACs FMSR-DIP: Download Download chunks from any 2 servers and verify with their MACs

Remove pseudorandom values and pass to NCCloud for decoding FMSR-DIP: Download Remove pseudorandom values and pass to NCCloud for decoding

FMSR-DIP: Repair

Download 1 chunk from all other servers and verify with their MACs FMSR-DIP: Repair Download 1 chunk from all other servers and verify with their MACs

Remove pseudorandom values and pass to NCCloud FMSR-DIP: Repair Remove pseudorandom values and pass to NCCloud

NCCloud generates new chunks FMSR-DIP: Repair NCCloud generates new chunks

Process the newly generated chunks as before FMSR-DIP: Repair Process the newly generated chunks as before

Upload chunks and update metadata on all servers FMSR-DIP: Repair Upload chunks and update metadata on all servers

FMSR-DIP Optimizations By default, FMSR-DIP operates in units of bytes FMSR-DIP can also operate in blocks A block is a sequence of bytes Better checking performance, but less security We export different trade-off parameters that tune between performance and security We conduct preliminary security analysis on FMSR-DIP See details in paper and technical report

FMSR-DIP: Experiments Testbed environment Openstack Swift 1.4.2 1 proxy connected to storage servers over LAN NCCloud and FMSR-DIP deployed on proxy NCCloud uses RAMDisk as storage Storage scheme (4,2)-FMSR Goal: evaluate FMSR-DIP overhead over FMSR codes

Running Time vs. File Size UPLOAD FMSR-DIP overhead comparable to network transfer time in a LAN environment File size DOWNLOAD File size REPAIR File size

The Check Operation Bottleneck in network transfer 1% check Download block size 256KB download block size Checking percentage

Conclusions Propose a design for efficient data integrity protection using FMSR on thin clouds Implement and evaluate the efficiency of the design Source code: http://ansrlab.cse.cuhk.edu.hk/software/fmsrdip/

Backup

Encoding matrix rank = k(n-k) Recall: FMSR Encoding P1 P2 P3 P4 P5 P6 P7 P8 A B C D c1,1 c1,2 c1,3 c1,4 c2,1 c2,2 c2,3 c2,4 c3,1 c3,2 c3,3 c3,4 c4,1 c4,2 c4,3 c4,4 c5,1 c5,2 c5,3 c5,4 c6,1 c6,2 c6,3 c6,4 c7,1 c7,2 c7,3 c7,4 c8,1 c8,2 c8,3 c8,4 Encoding matrix rank = k(n-k) Native chunks Code chunks