DATA DEDUPLICATION By: Lily Contreras April 15, 2010.

What is data deduplication? Often called intelligent compression or single-instance storage, data deduplication deletes duplicate data so that only one copy is stored. The deduplication process divides incoming data into segments, uniquely identifies each segment, and compares the segments against the data that has already been stored. If an incoming segment is new, it is stored on disk; if it is a duplicate of data already stored, it is not stored again and a reference to the existing copy is created instead. "Only one unique instance of the data is actually retained on storage media, such as disk or tape. Redundant data is replaced with a pointer to the unique data copy."

What is data deduplication? Data deduplication operates at different levels: file, block, and bit. When a file is updated, only the changed data is saved. For example, if only a few bytes of a document or presentation change, only the changed blocks or bytes are written; the change does not create an entirely new copy of the file. This makes block- and bit-level deduplication far more efficient than file-level deduplication. Deduplication works by comparing chunks of data to detect duplicates. Each chunk is assigned a unique identifier calculated by the software, typically with a cryptographic hash function. When a new hash is computed, it is compared against an index of existing hashes. If the hash is already in the index, the data is considered a duplicate and does not need to be stored again; otherwise, the new hash is added to the index and the new data is stored.
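The chunk-and-index scheme described above can be sketched in Python. This is a minimal illustration, assuming fixed-size chunks and an in-memory dictionary standing in for the storage device and hash index; real products typically use variable-size chunking and persistent indexes:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking for simplicity


def dedupe_store(data: bytes, store: dict) -> list:
    """Split data into chunks, store each unique chunk only once,
    and return the list of hash references that reconstruct the data."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()  # unique identifier for the chunk
        if digest not in store:      # new data: write it
            store[digest] = chunk
        refs.append(digest)          # duplicate or not, keep only a reference
    return refs


def restore(refs: list, store: dict) -> bytes:
    """Rebuild the original data by following the chunk references."""
    return b"".join(store[d] for d in refs)
```

Storing the same data twice adds no new chunks to the store; the second copy costs only the list of references.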

Deduplication Methods In-line deduplication is the most efficient and economical method. Hash calculations are performed as the data arrives, in real time. If the target device identifies a block that has already been stored, it simply creates a reference to the existing block. An advantage of in-line deduplication over post-process deduplication is that it requires less storage, since duplicate data is never written. In-line deduplication significantly reduces the raw disk capacity needed in the system because the full, not-yet-deduplicated data set is never written to disk. "It optimizes time-to-DR (disaster recovery) far beyond all other methods since it does not need to wait to absorb the entire data set and then deduplicate it before it can begin replicating to the remote site." However, "because hash calculations and lookups take so long, it can mean that the data ingestion can be slower thereby reducing the backup throughput of the device."

Post-process deduplication first stores new data on the storage device and analyzes it for deduplication later. One of its advantages is that it does not need to wait for hash calculation and lookup to complete before storing the data. However, post-process deduplication may store duplicate data for a short period of time, which can be a serious problem if storage capacity is near its limit. Perhaps its major drawback is the inability to predict when the deduplication process will complete.
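The trade-off between the two methods can be sketched as a toy model, where a Python list stands in for the disk and the function names are illustrative, not taken from any product:

```python
import hashlib


def write_inline(chunks, disk, index):
    """In-line: hash and look up BEFORE writing, so duplicates never reach disk."""
    for chunk in chunks:
        h = hashlib.sha256(chunk).hexdigest()
        if h not in index:
            index[h] = len(disk)   # record where the unique chunk lives
            disk.append(chunk)     # only unique data is ever written


def write_post_process(chunks, disk):
    """Post-process: write everything immediately; deduplicate later."""
    disk.extend(chunks)            # fast ingest, but duplicates occupy disk for a while


def dedupe_pass(disk):
    """Background pass that removes duplicates after the fact."""
    index, unique = {}, []
    for chunk in disk:
        h = hashlib.sha256(chunk).hexdigest()
        if h not in index:
            index[h] = len(unique)
            unique.append(chunk)
    return unique
```

The in-line path pays the hash lookup on every write but never stores a duplicate; the post-process path ingests at full speed but temporarily holds the full, not-yet-deduplicated data set.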

Benefits of Data Deduplication
Eliminates redundant data.
Drives down cost.
Improves backup and recovery service levels.
Changes the economics of disk versus tape.
Reduces carbon footprint.

Problems with Data Deduplication
Hash collisions
Intensive computation power required
Effect of compression
Effect of encryption
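The hash-collision concern above can be put into perspective with the standard birthday-bound approximation, p ≈ n(n−1)/2^(b+1), where n is the number of stored chunks and b is the hash width in bits. A back-of-the-envelope sketch:

```python
def collision_probability(n_chunks: int, hash_bits: int) -> float:
    """Birthday-bound approximation of the chance that any two distinct
    chunks are assigned the same hash: p ~ n(n-1) / 2^(bits+1)."""
    return n_chunks * (n_chunks - 1) / 2 ** (hash_bits + 1)


# Even hundreds of billions of chunks (roughly a petabyte at 4 KiB per chunk)
# give a vanishingly small chance of a SHA-256 collision.
p = collision_probability(2 ** 38, 256)
```

With a 256-bit hash the probability is astronomically small, which is why systems accept the (theoretical) risk; the practical costs are the hash computation and index lookups themselves.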

How to choose a data deduplication solution?
Consider the broader implications of deduplication.
Think about how deduplication can be used to eliminate tape in your environment.
Data created by humans dedupes well, but data created by computers does not.
Compare multiple products.
Ensure ease of integration into your existing environment.
