HAIL (High-Availability and Integrity Layer) for Cloud Storage

Presentation transcript:

HAIL (High-Availability and Integrity Layer) for Cloud Storage
Alina Oprea, joint work with Kevin Bowers and Ari Juels
RSA Laboratories

Cloud storage
Client and cloud storage provider (storage server, web server).
Mostly static data: back-up, archival.
Is my data available?

Proofs of Retrievability (PORs)
The client, holding key k, encodes file F before storing it with the cloud storage provider.
The encoding corrects small corruption.

Proofs of Retrievability (PORs)
The client, holding key k, sends a challenge; the cloud storage provider, storing the encoded file F, returns a response.
Requires integrity checks on server or client.
Detects large corruption.
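
The challenge-response exchange on this slide can be illustrated with a simple spot check. This is only a sketch under stated assumptions, not the construction of any of the cited POR papers: the client tags each block with an HMAC under its key k, the provider stores blocks and tags, and a challenge is a random set of block indices. The block size and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16  # hypothetical block size, chosen only for the example


def tag(key: bytes, index: int, block: bytes) -> bytes:
    """MAC over (index, block) so a block cannot be swapped to another position."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def encode(key: bytes, data: bytes) -> list:
    """Client side: split the file into blocks and attach a tag to each one."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [(block, tag(key, i, block)) for i, block in enumerate(blocks)]


def challenge(num_blocks: int, num_samples: int) -> list:
    """Client side: pick distinct random block indices to check."""
    return secrets.SystemRandom().sample(range(num_blocks), num_samples)


def respond(stored: list, indices: list) -> list:
    """Provider side: return the requested (block, tag) pairs."""
    return [stored[i] for i in indices]


def verify(key: bytes, indices: list, response: list) -> bool:
    """Client side: accept only if every returned block matches its tag."""
    return all(
        hmac.compare_digest(tag(key, i, block), t)
        for i, (block, t) in zip(indices, response)
    )
```

A check of this kind catches corruption only when a sampled block is affected, which is why it detects large corruption and leaves small corruption to the error-correcting encoding.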

When PORs fail
Challenge and response between the client (key k) and the cloud storage provider.
When the decoder cannot correct the corruption, F is unrecoverable.

HAIL Goals
Resilience against cloud provider failure or temporary unavailability:
  Amazon S3 went down several times, once for 8 hours.
  Linkup lost 45% of its customer data.
Use multiple cloud providers to construct a reliable cloud storage service out of unreliable components:
  RAID (Redundant Array of Inexpensive Disks) for cloud storage.
Provide clients verification capabilities:
  Efficient proofs of file availability by interacting with cloud providers.

Replicate across multiple providers
Naïve approach: the client stores a replica of F with each provider (Amazon S3, Google, EMC Atmos).
Sample and check consistency across providers.

Roadmap
Adversarial model for HAIL
Small-corruption attack on replication scheme
Encoding layer for each replica individually
Reduce storage overhead by dispersal
Increasing file lifetime with secret keys

Adversarial model
Static: corrupts a fixed number b of the n total providers over time.
  Create enough redundancy in the file to handle this (b+1 replicas).
  Is this realistic?
Mobile (proactive): corrupts b out of n providers in each epoch.
  Separate each server into code base and storage base.
  At the beginning of an epoch, the code base of all servers is cleaned (through reboot, for instance).
  All servers might have residual data corruption.
Reactive design: check integrity and redistribute.

Attack on replication scheme
Providers: Amazon S3, Google, EMC Atmos, each holding a replica of F.
The file cannot be recovered after ⌈n/b⌉ epochs.
The probability that the client samples the corrupted block is low.
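
To see why sampling rarely catches a small corruption, assume (as a simplification not stated on the slide) that the client checks blocks chosen uniformly and independently, and that a fixed fraction of the blocks is corrupted. The function name is hypothetical.

```python
def detection_probability(corrupted_fraction: float, samples: int) -> float:
    """Chance that at least one of `samples` independent uniform checks
    lands on a corrupted block."""
    return 1.0 - (1.0 - corrupted_fraction) ** samples
```

Under this model, with 0.1% of the blocks corrupted and 100 sampled blocks, the check fails to notice anything roughly nine times out of ten.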

Replication with POR
The client applies a POR encoding (ECC) to F and stores a replica with each provider (Amazon S3, Google, EMC Atmos).
Cons: requires integrity checks for each replica.

Replication with POR
Providers: Amazon S3, Google, EMC Atmos, each holding a POR-encoded replica of F.
Sample and check consistency across providers.

Replication with POR
Sample and check consistency across providers (Amazon S3, Google, EMC Atmos); each replica is marked with ε_d.
Large storage overhead due to replication.
File lifetime still limited by ⌈n/b⌉ · (ε_c / ε_d), where
  ε_c is the correction threshold of the POR encoding,
  ε_d is the detection threshold of the POR.
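
The lifetime bound on this slide can be evaluated directly. The sketch below assumes the bracket around n/b denotes a ceiling; the function and parameter names are hypothetical.

```python
import math


def lifetime_bound_epochs(n: int, b: int, eps_c: float, eps_d: float) -> float:
    """Bound from the slide: ceil(n / b) * (eps_c / eps_d).

    n: total providers, b: providers corrupted per epoch,
    eps_c: correction threshold, eps_d: detection threshold.
    """
    return math.ceil(n / b) * (eps_c / eps_d)
```

For example, 5 providers with 2 corrupted per epoch, a correction threshold of 0.5 and a detection threshold of 0.125 give a bound of 12 epochs.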

Reduce storage overhead
Dispersal with an (n,m) code: the client splits F into n fragments.
Decode: F is recovered from m fragments.
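
An (n,m) dispersal code can be sketched with polynomial interpolation, in the style of a Reed-Solomon code. This toy version is an assumption made for illustration and is not the encoding used by HAIL: it works over the small prime field GF(257), treats the m data symbols as the values of a polynomial at x = 1..m, and produces parity as its values at x = m+1..n. All names are hypothetical.

```python
P = 257  # small prime field, large enough for byte-valued symbols (illustrative choice)


def lagrange_eval(points: list, x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (x, y) points, working mod P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for k, (xk, _) in enumerate(points):
            if k != j:
                num = num * (x - xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def disperse(symbols: list, n: int) -> list:
    """Turn m data symbols into n fragments (x, y); the first m are the data itself."""
    points = list(enumerate(symbols, start=1))
    return [(x, lagrange_eval(points, x)) for x in range(1, n + 1)]


def reconstruct(fragments: list, m: int) -> list:
    """Recover the m data symbols from any m surviving fragments."""
    points = fragments[:m]
    return [lagrange_eval(points, x) for x in range(1, m + 1)]
```

Storing n fragments of size |F|/m costs n/m times the file size, compared with n times the file size for full replication across n providers.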

Dispersal code parity blocks
The (n,m) dispersal code extends F with parity blocks.

Dispersal code parity blocks
F is extended with dispersal code parity blocks and a POR encoding.
POR encoding to correct small corruption.
Check that a stripe is a codeword in the dispersal code.
How to increase file lifetime?
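
The stripe check can be illustrated with the simplest possible dispersal code, a single XOR parity symbol (n = m + 1). This is a stand-in chosen for brevity, not the Reed-Solomon code the slides use, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def add_parity(data_symbols: list) -> list:
    """Build a stripe: the data symbols followed by their XOR parity."""
    return data_symbols + [reduce(xor, data_symbols, 0)]


def is_codeword(stripe: list) -> bool:
    """A stripe is a codeword of the XOR parity code iff all symbols XOR to zero."""
    return reduce(xor, stripe, 0) == 0
```

The client fetches one symbol of the stripe from each provider and runs the check; a failed check shows that at least one provider returned a corrupted symbol.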

Increasing file lifetime with MACs
Each provider P1, P2, P3, P4, P5 stores a MAC with its data.
Can we reduce storage overhead?

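Integrity-protected dispersal code
Reed-Solomon dispersal code: the message m is mapped to symbols h_k1(m), h_k2(m), each a universal hash function (UHF) of m.
A PRF value is added to the UHF output (UHF + PRF).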

Integrity-protected dispersal code
A PRF value is added to the parity symbols computed from m.
MACs embedded into parity symbols.
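
The UHF + PRF combination can be sketched as follows. The concrete choices are assumptions made for illustration only: a polynomial-evaluation hash over the prime field of size 2^61 - 1 plays the role of the UHF, and HMAC-SHA256 reduced into the field plays the role of the PRF. The names are hypothetical and this is not the exact HAIL construction.

```python
import hashlib
import hmac

P = 2**61 - 1  # Mersenne prime used as the field size (illustrative choice)


def uhf(k1: int, message: list) -> int:
    """Polynomial-evaluation hash: sum of message[i] * k1**i mod P (Horner's rule)."""
    acc = 0
    for symbol in reversed(message):
        acc = (acc * k1 + symbol) % P
    return acc


def prf(k2: bytes, index: int) -> int:
    """Pseudorandom field element derived from the symbol's position."""
    digest = hmac.new(k2, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def mac(k1: int, k2: bytes, index: int, message: list) -> int:
    """MAC = UHF output masked with a PRF value."""
    return (uhf(k1, message) + prf(k2, index)) % P


def verify(k1: int, k2: bytes, index: int, message: list, tag: int) -> bool:
    """Recompute the MAC and compare it with the stored tag."""
    return mac(k1, k2, index, message) == tag
```

Because the hash is linear in the message, the hash of a sum of messages equals the sum of their hashes, so a combination of several symbols can be checked against the same combination of their tags.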

Current work and open problems
Proofs of Retrievability
  Lower bounds akin to Naor and Rothblum’s lower bounds for memory checking.
  What is the cost of file updates?
HAIL
  K. Bowers, A. Juels and A. Oprea – “HAIL (High-Availability and Integrity Layer) for Cloud Storage”, CCS 2009.
  Different adversarial models.
  Investigate alternative constructions.
  Supporting file updates.

Proofs of Retrievability (PORs)
Diagram: client (key k), cloud storage provider storing F, A, challenge, response.
Requires integrity checks on server or client.
Detects large corruption.

POR requirements
Efficient file encoding
Low storage overhead
Low bandwidth for challenge and response
Efficient proof construction and verification
Efficient file recoverability
[Juels, Kaliski 07] [Shacham, Waters 08] [Dodis et al. 09]

The requirements in designing a POR protocol are that the file encoding is efficient, that there is low storage overhead on both client and server, that the challenge-response protocol is efficient in both computation and bandwidth, and that it is efficient to recover the file from a fraction of correct server responses. There are a number of POR constructions (JK 07, SW 08, Dodis et al. 09) that offer different tradeoffs in these metrics. I will not get into details here; I just wanted to point out that the POR problem has been studied a lot and there are good solutions for it. But do PORs solve our problem?

HAIL
F is dispersed across providers P1, P2, P3, P4, P5 with Reed-Solomon parity blocks.
POR encoding protects against small corruption.
Protects the availability of static files against a mobile adversary.

HAIL
Providers P1, P2, P3, P4, P5.
MACs embedded into parity symbols.
Aggregates stripes for efficient integrity checking.
Periodic checking and reconstruction upon failure.