A Simulation Analysis of Reliability in Primary Storage Deduplication

A Simulation Analysis of Reliability in Primary Storage Deduplication
Min Fu¹, Patrick P. C. Lee², Dan Feng¹, Zuoning Chen³, Yu Xiao¹
¹Huazhong University of Science and Technology
²The Chinese University of Hong Kong
³National Engineering Research Center for Parallel Computer, China

Storage Deduplication
Modern storage systems adopt deduplication for storage and bandwidth savings:
- Backup and archival storage [Zhu, FAST'08]
- Primary storage (e.g., file systems) [Srinivasan, FAST'12]
- Virtualization storage [Clements, USENIX ATC'09]
- Cloud storage [Li, USENIX ATC'15]

Deduplication Basics
Deduplication removes content redundancy to improve storage efficiency:
- Divides data into variable-size chunks; chunking is done by Rabin fingerprinting
- Identifies each chunk with a fingerprint, typically a cryptographic hash of the chunk's content
- Stores fingerprints in a key-value index
- Stores only one copy of chunks that share the same fingerprint; other identical chunks refer to that copy by pointers
- Keeps a file recipe that lists the pointers to all chunks of a file (see the sketch below)
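To make the workflow concrete, here is a minimal sketch of fingerprint-based deduplication. It assumes fixed-size chunking for brevity (the system described above uses Rabin fingerprinting for variable-size chunks), and all names are illustrative rather than taken from the paper's code:

```python
# Minimal sketch of fingerprint-based deduplication (illustrative only).
import hashlib

CHUNK_SIZE = 4096          # hypothetical fixed chunk size
chunk_store = {}           # fingerprint -> chunk data (one physical copy)
ref_count = {}             # fingerprint -> number of references

def write_file(data: bytes) -> list:
    """Deduplicate a file and return its recipe (list of fingerprints)."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()   # fingerprint = cryptographic hash
        if fp not in chunk_store:                # store only the first copy
            chunk_store[fp] = chunk
        ref_count[fp] = ref_count.get(fp, 0) + 1
        recipe.append(fp)                        # pointer to the shared copy
    return recipe

def read_file(recipe: list) -> bytes:
    """Reassemble a file from its recipe."""
    return b"".join(chunk_store[fp] for fp in recipe)
```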

Deduplication Basics
Storage savings are huge. Rough estimates of the deduplication ratio (i.e., raw storage to actual storage):

Workload             Dedup ratio
Backup workloads     20:1
VM images            50:1
Primary workloads    2:1

How about Reliability?
Reliability is critical in storage systems, e.g.:
- Characterizing reliability issues of HDDs and SSDs [Schroeder, FAST'07 and '16]
- Improving reliability by redundancy [Patterson, SIGMOD'88] and monitoring [Ma, FAST'16]
How does deduplication change reliability? Only a few studies exist in the field, and none discuss:
- Loss variations: the impact of a data loss event depends on the granularity of what is lost (e.g., chunks vs. files)
- Repair strategies: reliability depends on whether important data copies are repaired first

Open Questions
Does deduplication improve reliability? Deduplication reduces the data loss probability because it shrinks the storage footprint.
Does deduplication degrade reliability? Each physical data loss may imply the loss of multiple logical chunks/files.
The accuracy of MTTDL (mean time to data loss) is debatable, and the question remains unexplored for deduplication storage.

Our Contributions
- A new simulation framework for deduplication storage reliability, including:
  - New reliability metrics extended from NOMDL [Greenan, HotStorage'10]
  - A disk model derived from field data
  - A deduplication model derived from real file system snapshots
- A comprehensive comparison of storage system reliability with and without deduplication
- A Deliberate Copy Technique (DCT) to improve the reliability of deduplication storage

Our Simulation Framework
Extended from Greenan's simulator:
- Disk model: generates disk events, e.g., whole-disk failures and disk repairs
- RAID state machine: monitors data loss events, e.g., uncorrectable sector errors and unrecoverable disk failures
- Deduplication model: calculates the magnitude of data loss in deduplication storage and outputs the reliability metrics
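The following is a hedged, illustrative sketch of how these three components could interact in an event-driven simulation; the class names, distributions, and parameter values are placeholders and are not taken from the authors' simulator:

```python
# Hypothetical sketch of the three-component simulation loop (illustrative only).
import heapq
import random

MISSION_TIME_HOURS = 10 * 365 * 24   # 10-year mission time

class DiskModel:
    """Generates disk events; the failure distribution here is a placeholder."""
    def next_failure(self, now):
        return now + random.expovariate(1.0 / 100_000)   # placeholder rate

class RaidStateMachine:
    """Tracks failed disks and detects data loss events."""
    def __init__(self, tolerated=2):                     # RAID-6 tolerates 2 failures
        self.failed, self.tolerated = 0, tolerated
    def on_failure(self):
        self.failed += 1
        return self.failed > self.tolerated              # unrecoverable disk failure?
    def on_repair(self):
        self.failed = max(0, self.failed - 1)

class DedupModel:
    """Maps a physical loss to a logical loss magnitude (stub)."""
    def logical_loss(self, physical_loss_fraction):
        return physical_loss_fraction                    # stub

def run_one_iteration():
    disks, raid, dedup = DiskModel(), RaidStateMachine(), DedupModel()
    events = [(disks.next_failure(0.0), "failure")]
    losses = []
    while events:
        t, kind = heapq.heappop(events)
        if t > MISSION_TIME_HOURS:
            break
        if kind == "failure":
            if raid.on_failure():                        # UDF: record a data loss event
                losses.append(dedup.logical_loss(1.0))   # (a real simulator would reset the array)
            heapq.heappush(events, (t + 24.0, "repair")) # placeholder rebuild time
            heapq.heappush(events, (disks.next_failure(t), "failure"))
        else:
            raid.on_repair()
    return losses
```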

Our Metrics
NOMDL [Greenan, HotStorage'10]: the expected amount of data loss per usable terabyte.
We extend NOMDL for deduplication storage: the logical amount of data loss normalized to the logical system capacity.
This yields 4 reliability metrics:
- Chunk-level and file-level
- Non-weighted (in numbers of chunks/files) and weighted (in amounts of data)
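As an illustration, the four metrics could be computed per data loss event roughly as follows; the structure of a lost-chunk record (size, reference count, referencing files) is an assumption made for this sketch:

```python
# Illustrative computation of the four extended-NOMDL metrics for one loss event.
# Averaging these over many simulated iterations gives the expected values.
def dedup_nomdl(lost_chunks, logical_capacity_tb):
    """lost_chunks: list of dicts with a chunk's size, reference count,
    and the files (id, size) that reference it."""
    chunks = sum(c["ref_count"] for c in lost_chunks)                    # non-weighted, chunk-level
    chunk_bytes = sum(c["size"] * c["ref_count"] for c in lost_chunks)   # weighted, chunk-level
    lost_files, file_bytes = set(), 0
    for c in lost_chunks:
        for f in c["files"]:                  # each file referencing the lost chunk is corrupted
            if f["id"] not in lost_files:
                lost_files.add(f["id"])
                file_bytes += f["size"]
    return {
        "chunks_per_tb": chunks / logical_capacity_tb,        # non-weighted, chunk-level
        "chunk_bytes_per_tb": chunk_bytes / logical_capacity_tb,
        "files_per_tb": len(lost_files) / logical_capacity_tb,  # non-weighted, file-level
        "file_bytes_per_tb": file_bytes / logical_capacity_tb,
    }
```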

Disk Model
Disk errors:
- Whole-disk failures: RAID repairs the failed disk using the operational disks
- Latent sector errors (LSEs): RAID uses disk scrubbing to periodically correct LSEs
Disk model parameters are derived from the near-line 1 TB SATA "Disk A" model [Elerath, ToS 2014]
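Elerath-style disk models characterize these events with Weibull distributions; a minimal sketch of drawing event times is shown below, with placeholder shape/scale parameters that are not the values used in the paper:

```python
# Sketch of sampling disk event times from Weibull distributions.
# Scale (hours) and shape values below are placeholders, NOT the paper's parameters.
import random

TIME_TO_FAILURE = (100_000.0, 1.1)   # whole-disk failure
TIME_TO_REPAIR  = (24.0, 1.5)        # disk rebuild
TIME_TO_LSE     = (50_000.0, 1.0)    # latent sector error arrival
TIME_TO_SCRUB   = (168.0, 1.0)       # scrubbing period

def sample(scale, shape):
    """Draw one event time (hours) from a Weibull(scale, shape) distribution."""
    return random.weibullvariate(scale, shape)

# Example: schedule the first failure and the first LSE for a fresh disk.
first_failure = sample(*TIME_TO_FAILURE)
first_lse = sample(*TIME_TO_LSE)
```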

Data Loss Events
Two types of data loss events:
- Unrecoverable Disk Failures (UDFs), e.g., two simultaneous whole-disk failures in RAID-5
- Uncorrectable Sector Errors (USEs), e.g., a single whole-disk failure paired with LSEs on other operational disks in RAID-5
Physical data loss depends on the repair progress; e.g., if the 2nd whole-disk failure in RAID-5 occurs after the 1st failed disk has been repaired by 40%, we lose 60% of the data in the physical view
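A minimal sketch of this accounting, assuming the rebuild proceeds linearly across the failed disk (purely illustrative):

```python
def physical_loss_fraction(repair_progress):
    """Fraction of a failed disk's physical data lost if a second, fatal
    failure strikes when `repair_progress` (0.0-1.0) has been rebuilt."""
    return 1.0 - repair_progress

print(physical_loss_fraction(0.40))   # 0.6, matching the slide's example
```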

Deduplication Model
Goal: calculate the logical amount of data loss for each data loss event
- A USE corrupts a physical sector and the associated physical chunk; the logical amount of data loss depends on the chunk's reference count and the files referencing it
- For a UDF, we assume a log-structured chunk layout and consider two repair strategies: forward and backward; we expect forward repair to give better reliability than backward repair
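A hedged sketch of the UDF case: chunks sit in a log, a UDF strikes part-way through the rebuild, and the logical loss is summed over the unrepaired chunks; the function and field names are illustrative only:

```python
# Illustrative sketch (not the paper's code): chunks are laid out in a log,
# and a UDF strikes when a fraction of the log has been rebuilt. Forward
# repair rebuilds from the log head; backward repair from the log tail.
def logical_loss_after_udf(chunk_log, repair_progress, strategy="forward"):
    """chunk_log: list of (chunk_size, ref_count) in physical log order.
    Returns the logical bytes lost (size * ref_count of unrepaired chunks)."""
    n_repaired = int(len(chunk_log) * repair_progress)
    if strategy == "forward":
        lost = chunk_log[n_repaired:]                    # tail is still unrepaired
    else:  # backward repair
        lost = chunk_log[:len(chunk_log) - n_repaired]   # head is still unrepaired
    return sum(size * refs for size, refs in lost)
```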

Deliberate Copy Technique
Our insight is that highly referenced chunks account for only a small fraction of the physical capacity, as chunk reference counts follow a long-tailed distribution.
The Deliberate Copy Technique (DCT) improves the reliability of deduplication storage:
- Allocate a small, dedicated physical area in the RAID for storing extra copies of highly referenced chunks
- Repair this physical area first during RAID reconstruction
- Copies can be made offline, with no impact on the write path
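A hedged sketch of how DCT might select the chunks to copy, greedily taking the most-referenced chunks until a small capacity budget (e.g., 1% of physical capacity) is filled; this illustrates the idea, not the authors' exact policy:

```python
# Hypothetical DCT chunk selection: copy the most-referenced chunks into the
# dedicated physical area until the capacity budget is exhausted.
def select_dct_chunks(chunks, physical_capacity, budget_fraction=0.01):
    """chunks: list of (chunk_id, size, ref_count). Returns ids to copy."""
    budget = physical_capacity * budget_fraction
    selected, used = [], 0
    for cid, size, refs in sorted(chunks, key=lambda c: c[2], reverse=True):
        if used + size > budget:
            break
        selected.append(cid)
        used += size
    return selected
```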

Experimental Setup
- RAID-6 with 16 1-TB SATA disks; 10-year mission time
- 1025 billion iterations of simulations to obtain enough data loss events for small relative errors: 1,389,250 UDFs and 332,993,652 USEs
- 18 deduplication models derived from 18 real file system snapshots: 9 from Stony Brook University and 9 from our lab
- DCT is configured with 1% of the physical capacity

Chunk-level USE
Observation (1) -- Deduplication does not significantly alter the expected amount of chunks corrupted by USEs, compared to without deduplication.

File-level USE
Observation (2) -- In the presence of individual chunk corruptions caused by USEs, deduplication decreases the expected amount of corrupted files, mainly due to the intra-file redundancy found in individual snapshots.
Intra-file redundancy is more common than inter-file redundancy.

Logical Repair Progress
Observation (3) -- The logical repair progress is affected by the placement of highly referenced chunks and the severity of chunk fragmentation.
[Figures: forward repair and backward repair. The Y-axis shows the logical repair progress relative to without deduplication; negative values indicate a slower restore rate.]
Backward repair restores files at a significantly slower rate.

Chunk-level UDF
Observation (4) -- If we do not carefully place highly referenced chunks and repair them preferentially, deduplication can lead to more corrupted chunks in the presence of UDFs.
Observation (5) -- DCT significantly reduces the expected amount of chunks corrupted by UDFs.
As expected, backward repair shows significantly worse reliability than without deduplication. With DCT, both forward and backward repair achieve reliability better than or comparable to without deduplication.

File-level UDF
Observation (6) -- Deduplication is significantly more vulnerable to UDFs in terms of the file-level metrics if popular small files and chunk fragmentation are not carefully handled.
Observation (7) -- DCT remarkably reduces the expected amount of corrupted files, but a defragmentation algorithm is necessary to further improve reliability in the weighted file-level metric.
[Figures: the top figure shows the file-level non-weighted metric, and the bottom shows the file-level weighted metric.]
Backward repair is worse than forward repair, because popular small files are more likely to sit at the beginning of the log, while the end of the log is much more fragmented. In the non-weighted metric, DCT is very useful because it achieves reliability better than or comparable to without deduplication. Although DCT improves the weighted metric, a defragmentation algorithm is still required.

Conclusions
- A new simulation framework for quantitative reliability analysis that characterizes the impacts of loss variations and repair strategies
- A DCT technique to improve reliability in deduplication storage
Source code: https://github.com/fomy/simd