Witold Litwin, Riad Mokadem, Thomas Schwarz: Disk Backup Through Algebraic Signatures


Witold Litwin, Riad Mokadem, Thomas Schwarz
Disk Backup Through Algebraic Signatures for a Scalable Distributed Data Structure in the SDDS-2002 System

2 Plan
- Introduction
- The SDDS-2002 Backup Scheme
- Experimental performance analysis
- Conclusion

3 Introduction
- Need to store RAM-resident SDDS files to disk
  - File backup
    - Failure of a server
  - File eviction
    - Sharing of RAM
      - Among different SDDS files
      - With other applications

4 Introduction
- Write to the disk only the parts (pages) changed since the last backup
- "Dirty bit" approach inapplicable
- Page signature calculus: a possibility, provided that it is:
  - Fast
  - Precise
  - Scalable
    - Shorter signatures may become longer without total recalculation
    - Not the case for SHA-1 or for any other previously proposed scheme

5 The SDDS-2002 Backup Scheme: File Backup
[Diagram: Client, Store command (multicast), Server RAM buckets, Server disks. Distributed storing.]

6 The SDDS-2002 Backup Scheme: File Load
[Diagram: Client, Load command (multicast), Server RAM buckets, Server disks. Distributed loading.]

7 Internal Organization of a Bucket in an SDDS Data File
- Index: a few KB up to a MB
- Data file: dozens of MB up to GB

8 Page Granularity
- Careful choice
- Smaller pages
  - More individual writes if there are many random updates
  - Less data transferred if there are only a few updates
- Larger pages
  - Vice versa
- Optimal size?
  - Good question
- Our choice
  - 16 KB for data
    - Although 64 KB pages proved best for data page signature calculus speed
  - 256 B for index
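The page-size trade-off on this slide can be illustrated with a little arithmetic. The following is a minimal sketch, not code from SDDS-2002: the function name `backup_cost` and the sample update offsets are hypothetical, and it assumes every touched page is rewritten in full.

```python
def backup_cost(update_offsets, page_size):
    """Return (pages to write, bytes to write) for a set of updated byte offsets.

    Assumption: any page containing at least one updated byte is rewritten whole.
    """
    touched = {offset // page_size for offset in update_offsets}
    return len(touched), len(touched) * page_size


# Hypothetical byte offsets of four small updates inside a bucket.
offsets = [0, 100, 20_000, 70_000]

small = backup_cost(offsets, 16 * 1024)  # 16 KB pages
large = backup_cost(offsets, 64 * 1024)  # 64 KB pages
print("16 KB pages:", small)
print("64 KB pages:", large)
```

With these offsets, the 16 KB layout issues more individual page writes but moves fewer bytes than the 64 KB layout, which is the tension the slide describes.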

9 Page Signature
- Algebraic signatures
- Galois field $GF(2^{16})$
- Log / antilog multiplication
- Page $P$ has 2-byte symbols $p_1, p_2, \dots, p_n$
- The signature formula is:
  - for each $i = 1, 2, \dots, n$: $p'_i = \operatorname{antilog} p_i$
  - for each $\beta = \alpha, \alpha^2, \alpha^3, \dots$: $\mathrm{Sign}_{\beta}(P) = \sum_{i=1}^{n} p'_i \, \beta^{i}$
  - $\mathrm{Sign}_{\alpha,m}(P) = \left(\mathrm{Sign}_{\alpha}(P), \mathrm{Sign}_{\alpha^2}(P), \dots, \mathrm{Sign}_{\alpha^m}(P)\right)$
- We use $m = 2$ in SDDS-2002
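The signature calculus on this slide can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the SDDS-2002 implementation: the primitive polynomial `0x1100B`, the choice of the base element as the field element 2 (so that its logarithm is 1), big-endian symbols, zero-padding of odd-length pages, and the convention that the symbol value 65535 stands for log(0) are all assumptions made here. The function names are hypothetical.

```python
ORDER = 65535        # number of nonzero elements in GF(2^16)
PRIM_POLY = 0x1100B  # assumed primitive polynomial x^16 + x^12 + x^3 + x + 1


def build_tables():
    """Build log / antilog tables for GF(2^16) with the element 2 as the base."""
    antilog = [0] * (ORDER + 1)
    log = [0] * (ORDER + 1)
    x = 1
    for exponent in range(ORDER):
        antilog[exponent] = x
        log[x] = exponent
        x <<= 1                # multiply by the base element
        if x & 0x10000:
            x ^= PRIM_POLY     # reduce modulo the field polynomial
    antilog[ORDER] = 0         # assumed convention: 65535 encodes log(0)
    log[0] = ORDER
    return log, antilog


LOG, ANTILOG = build_tables()


def symbols(page: bytes):
    """Split a page into 2-byte big-endian symbols (odd pages are zero-padded)."""
    if len(page) % 2:
        page += b"\x00"
    return [int.from_bytes(page[i:i + 2], "big") for i in range(0, len(page), 2)]


def signature(page: bytes, m: int = 2):
    """Return (Sign_a(P), Sign_a^2(P), ..., Sign_a^m(P)) for base element a.

    Each term antilog(p_i) * (a^k)^i is computed as antilog(p_i + k*i),
    so the inner loop needs one addition, one table lookup and one XOR.
    """
    syms = symbols(page)
    result = []
    for k in range(1, m + 1):
        acc = 0
        for i, p in enumerate(syms, start=1):
            if p != ORDER:     # antilog(65535) = 0 contributes nothing
                acc ^= ANTILOG[(p + k * i) % ORDER]
        result.append(acc)
    return tuple(result)


page = bytes(range(256)) * 64   # a 16 KB sample page
changed = bytearray(page)
changed[100] ^= 1               # flip one bit in one symbol
print(signature(page), signature(bytes(changed)))
```

Addition in the field is XOR, and multiplication reduces to adding logarithms modulo 65535, which is why the log / antilog tables make the calculus fast.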

10 Experimental Performance Analysis: Hardware Configuration
- 1.8 GHz P4 servers
- 800 MHz P3 client
- 500 MHz P3 name server
- 1 Gb/s Ethernet
- Windows 2000 Server OS

11 Experimental Performance of SDDS-2002: Initial File Store Time (No Signature Calculus)
[Chart: time (s) versus number of file servers. File size: 393 MB.]

12 Initial File Store Time (Time Series)
[Chart: storage time (ms) versus number of records.]

13 File Load Time
[Chart: file load time (s) versus number of servers. File size: 393 MB.]
- Practically the same as the 1st backup time

14 File Storage Performance Analysis
[Table columns: bucket size (MB); number of records; signature calculus (ms); signature calculus per MB (ms); total store time (ms); store time for 0 % change (ms); gain (%); store time for 5 % change (ms); gain (%).]

15 SHA-1 / Algebraic Signatures
[Table columns: bucket size (MB); number of records; algebraic signature calculus (ms); SHA-1 calculus (ms); initial store time with SHA-1 (ms); initial store time with algebraic signatures (ms); SHA-1 store time for 5 % change (ms); algebraic signature store time for 5 % change (ms); gain (%).]

16 Algebraic / SHA-1 Signature Calculus Time

17 Implementation in SDDS 2002: Interactive Client Interface
[Screenshot: user interface]

18 Implementation in SDDS 2002: Execution Listing at the Server
- 1st request for storage: new file
  - Signature calculus (375 ms)
  - Disk write of all pages (4922 ms)
- 2nd request for storage: no changes found (375 ms)
- 3rd request for storage: 1 page changed ( ms)
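The three storage requests in this listing follow a simple pattern: compute a signature per page, compare it with the signature kept from the previous backup, and write only the pages that differ. Below is a minimal sketch of that loop. It is not the SDDS-2002 code: `store`, `disk` and `sigs` are hypothetical names, the disk is modelled as a dictionary, and `zlib.crc32` is used only as a stand-in signature function where the system itself uses algebraic signatures.

```python
import zlib

PAGE_SIZE = 16 * 1024  # data page size quoted on the Page Granularity slide


def store(bucket: bytes, disk: dict, sigs: list, sign=zlib.crc32) -> int:
    """Write only the pages whose signature changed; return how many were written.

    `disk` maps page index -> page bytes (a stand-in for the server disk).
    `sigs` holds the signatures recorded at the previous store request.
    """
    written = 0
    for start in range(0, len(bucket), PAGE_SIZE):
        index = start // PAGE_SIZE
        page = bucket[start:start + PAGE_SIZE]
        new_sig = sign(page)
        if index >= len(sigs):
            sigs.append(None)      # page seen for the first time
        if sigs[index] != new_sig:
            disk[index] = page     # page is new or dirty: write it
            sigs[index] = new_sig
            written += 1
    return written


bucket = bytearray(8 * PAGE_SIZE)  # a small all-zero bucket of 8 pages
disk, sigs = {}, []

first = store(bytes(bucket), disk, sigs)    # new file: every page is written
second = store(bytes(bucket), disk, sigs)   # nothing changed: only signatures computed
bucket[3 * PAGE_SIZE + 7] = 0xFF            # update one byte in page 3
third = store(bytes(bucket), disk, sigs)    # only the dirty page is written
print(first, second, third)
```

Because the comparison needs only the bucket contents and the stored signatures, the update path of the application does not have to maintain dirty bits.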

19 Conclusion
- The algebraic signature based file backup works
  - Present in the SDDS-2002 prototype
- Offers advantages over the traditional approach
  - No change to existing code
  - No run-time overhead
- Future work
  - Signatures: calculus, algebraic properties, applications…
  - Automatic SDDS file eviction

Thank You for Your Attention