DEDUPLICATION IN YAFFS KARTHIK NARAYAN PAVITHRA SESHADRIVIJAYAKRISHNAN.



Data deduplication: intelligent compression that addresses storage space requirements.
Solid state devices: a cutting-edge storage technology that addresses I/O performance requirements.
Data deduplication + SSD => a perfect match?

Storage systems need abundant capacity because they hold a great deal of redundant data. Redundant data increases storage cost and degrades performance. SSDs can be more cost effective than managing a group of mechanical hard drives.

What have we done?
We implemented deduplication in YAFFS2 (Yet Another Flash File System), a NAND flash file system. Deduplication addresses the problems caused by redundant data, and we implemented it using content-based fingerprinting. Properties of flash are harnessed to reduce overhead and implementation complexity. We show that both the write time for duplicate data and the storage space used are greatly reduced.

SSD ENTERS THE PICTURE
SSDs are high-performance storage that keeps data in non-volatile memory chips. As of 2010, most SSDs use NAND-based flash memory, which retains data even without power. Flash has been the single biggest change to drive technology in recent years, with the storage medium showing up in data centers, in laptops, and in the memory cards of mobile devices.

Some properties of Flash/SSD
Access time is faster than a disk's, because data can be accessed randomly and does not rely on a read/write head synchronizing with a rotating disk. With no moving parts, an SSD is also more resilient to vibration, shock, and extreme temperature fluctuations.

DESIGN
Source deduplication: takes place within the file system.
Content-based fingerprinting: files are divided into chunks, and each chunk is fingerprinted before every write operation.
Multiple redundant copies are redirected to the same device location.
Read performance is not affected.
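The design above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not the YAFFS2 implementation: the class name `DedupStore`, the 2048-byte `CHUNK_SIZE`, and the use of SHA-1 as the fingerprint are all hypothetical choices made for the example.

```python
import hashlib

CHUNK_SIZE = 2048  # assumed chunk size in bytes (hypothetical, not taken from the slides)


class DedupStore:
    """Toy in-memory model of chunk-level deduplication (illustrative only)."""

    def __init__(self):
        self.device = []          # simulated device: list index = location
        self.by_fingerprint = {}  # fingerprint -> device location
        self.files = {}           # file name -> list of device locations

    def write_file(self, name, data):
        locations = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            # Fingerprint the chunk before writing it.
            fingerprint = hashlib.sha1(chunk).digest()
            location = self.by_fingerprint.get(fingerprint)
            if location is None:
                # First copy of this content: store it on the device.
                location = len(self.device)
                self.device.append(chunk)
                self.by_fingerprint[fingerprint] = location
            # Duplicates are redirected to the existing location.
            locations.append(location)
        self.files[name] = locations

    def read_file(self, name):
        # Reads follow the location list, so they need no fingerprint work.
        return b"".join(self.device[loc] for loc in self.files[name])
```

Writing the same payload under two names stores each distinct chunk once, and reading either name returns the original bytes.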

A note on choice of hash function
The hash functions used for deduplication include standards such as SHA-1 and SHA-256. In most cases, the probability of data loss from a collision in these hashes is far lower than the risk of an undetected or uncorrected hardware error. Some cite the computational cost of hashing as a drawback of data deduplication. To improve performance, we can use weak hashes. Weak hashes are much faster to calculate, but the risk of a hash collision is greater. Systems that use weak hashes therefore follow a weak-hash match with a strong hash or a comparison of the actual data, and use that result to decide whether the data is really the same. On an SSD we can afford to compare the actual data, because a read is fast and spares precious CPU power.
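A minimal sketch of the weak-hash-then-verify idea follows. It assumes Adler-32 (via Python's `zlib.adler32`) as the weak hash, which the slides do not specify; the class `WeakHashIndex` and its fields are hypothetical names for illustration.

```python
import zlib


class WeakHashIndex:
    """Weak hash lookup followed by a byte-for-byte comparison (illustrative only)."""

    def __init__(self, weak_hash=zlib.adler32):
        self.weak_hash = weak_hash  # assumed weak hash; any fast checksum would do
        self.chunks = []            # simulated device: list index = location
        self.buckets = {}           # weak hash value -> list of candidate locations

    def store(self, chunk):
        """Return (location, was_duplicate) for the given chunk."""
        key = self.weak_hash(chunk)
        for location in self.buckets.get(key, []):
            # A weak hash match is only a hint: confirm by reading the
            # stored chunk and comparing the actual data.
            if self.chunks[location] == chunk:
                return location, True
        # No identical chunk found, so store a new one.
        location = len(self.chunks)
        self.chunks.append(chunk)
        self.buckets.setdefault(key, []).append(location)
        return location, False
```

Because the final decision rests on the data comparison, a weak-hash collision costs an extra read but never merges two different chunks.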

DESIGN (contd.)
The chunk fingerprints and the corresponding chunk IDs are maintained as in-memory structures. A backing store typically holds a large number of chunks, too many to keep every fingerprint in memory. This led to the idea of storing the hashes on the device and maintaining a cache of them in memory. A combination of LFU (Least Frequently Used) and LRU (Least Recently Used) cache replacement policies should yield good results. We have implemented LFU.
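As a rough sketch of an LFU fingerprint cache backed by an on-device table, assuming the device table can be modelled as a plain dictionary: the class `LFUFingerprintCache` and its method names are hypothetical, and ties between equally used entries are broken by insertion order, which is a choice made only for this example.

```python
class LFUFingerprintCache:
    """In-memory LFU cache over a simulated on-device fingerprint table."""

    def __init__(self, capacity, device_table):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.device_table = device_table  # simulated device: fingerprint -> chunk ID
        self.entries = {}                 # cached fingerprint -> chunk ID
        self.use_count = {}               # cached fingerprint -> number of uses

    def lookup(self, fingerprint):
        """Return the chunk ID for a fingerprint, or None if it is unknown."""
        if fingerprint in self.entries:
            self.use_count[fingerprint] += 1
            return self.entries[fingerprint]
        # Cache miss: fall back to the table stored on the device.
        chunk_id = self.device_table.get(fingerprint)
        if chunk_id is not None:
            self._insert(fingerprint, chunk_id)
        return chunk_id

    def _insert(self, fingerprint, chunk_id):
        if len(self.entries) >= self.capacity:
            # Evict the least frequently used entry.
            victim = min(self.use_count, key=self.use_count.get)
            del self.entries[victim]
            del self.use_count[victim]
        self.entries[fingerprint] = chunk_id
        self.use_count[fingerprint] = 1
```

A combined LFU and LRU policy, as the slide suggests, would additionally track recency; this sketch covers only the LFU part.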

Implementation
We chose YAFFS2 to implement deduplication; it is a popular, robust, commercially used file system for NAND flash. Our testing environment is the Android emulator, which runs a virtual CPU called Goldfish. Goldfish executes ARM926T instructions and has hooks for input and output, such as reading key presses from, or displaying video output in, the emulator.

Implementation
Primarily, we modified the function yaffs_WriteChunkDataToObject, which writes chunk data to the NAND. During every chunk write:
Determine the fingerprint for the chunk.
Check whether the fingerprint exists in the chunk cache.
If it is not present, fetch the fingerprint and corresponding chunk IDs from the device.

Implementation (contd.)
If a chunk ID corresponding to the hash is found on the device, remove the least frequently used entry from the chunk cache and replace it with the entry obtained from the device.
Update the metadata for this chunk to point to the existing chunk ID corresponding to its fingerprint value.
If no chunk ID is present for the hash, write the chunk to NAND and update the hash entry.
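The write path described in the two implementation slides can be sketched as follows. This is a Python model under stated assumptions, not the modified yaffs_WriteChunkDataToObject: `DedupWriter`, its fields, and the choice to cache newly written fingerprints are hypothetical, and SHA-1 stands in for whichever fingerprint the authors used.

```python
import hashlib


class DedupWriter:
    """Toy model of a deduplicating chunk write path (illustrative only)."""

    def __init__(self, cache_capacity=2):
        self.nand = []            # simulated NAND: list index = chunk ID
        self.device_hashes = {}   # simulated on-device table: fingerprint -> chunk ID
        self.cache = {}           # in-memory cache: fingerprint -> [chunk ID, use count]
        self.cache_capacity = cache_capacity
        self.metadata = {}        # (object ID, logical chunk) -> chunk ID

    def _cache_put(self, fingerprint, chunk_id):
        if len(self.cache) >= self.cache_capacity:
            # Remove the least frequently used cache entry.
            victim = min(self.cache, key=lambda k: self.cache[k][1])
            del self.cache[victim]
        self.cache[fingerprint] = [chunk_id, 1]

    def write_chunk(self, object_id, logical_chunk, data):
        """Return (chunk ID, was_duplicate)."""
        fingerprint = hashlib.sha1(data).digest()   # determine the fingerprint
        entry = self.cache.get(fingerprint)         # check the chunk cache
        if entry is not None:
            entry[1] += 1
            chunk_id, duplicate = entry[0], True
        else:
            # Cache miss: consult the table kept on the device.
            chunk_id = self.device_hashes.get(fingerprint)
            duplicate = chunk_id is not None
            if not duplicate:
                # New content: write it to NAND and record its hash entry.
                chunk_id = len(self.nand)
                self.nand.append(data)
                self.device_hashes[fingerprint] = chunk_id
            # Assumption: new entries are cached too, not only fetched ones.
            self._cache_put(fingerprint, chunk_id)
        # Point this logical chunk at the (possibly shared) chunk ID.
        self.metadata[(object_id, logical_chunk)] = chunk_id
        return chunk_id, duplicate
```

With a cache of one entry, a fingerprint that has been evicted is still found through the device table, so the duplicate is detected and no second copy reaches NAND.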

RESULTS

CONCLUSION
The decade's most important data storage technology. Deduplication on SSDs is likely to be at the forefront of backup solutions in the future. Together, these two technologies can control storage costs without sacrificing reliability or performance. De-dupe technology continues to spread, and as SSD costs drop, those benefits will become even more apparent.