Disk Backup Through Algebraic Signatures for a Scalable Distributed Data Structure in the SDDS-2002 System
Witold Litwin, Riad Mokadem, Thomas Schwartz
2 Plan
- Introduction
- The SDDS-2002 Backup Scheme
- Experimental Performance Analysis
- Conclusion
3 Introduction
- Need to store RAM-resident SDDS data on disk
  - File backup
    - Failure of a server
  - File eviction
    - Sharing of RAM
      - Among different SDDS files
      - With other applications
4 Introduction
- Write to the disk only the parts (pages) changed since the last backup
- “Dirty bit” approach inapplicable
- Page signature calculus: a possibility, provided that it is:
  - Fast
  - Precise
  - Scalable
    - Shorter signatures may become longer without total recalculation
- Not the case for SHA-1, nor for any other previously proposed scheme
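The idea of writing only the changed pages can be sketched as follows. This is a minimal illustration, not the SDDS-2002 code: the names `backup`, `sig_map` and `disk` are hypothetical, the disk is modeled as a dictionary, and CRC-32 from the standard library stands in for the algebraic signature purely to keep the sketch short.

```python
import zlib

PAGE_SIZE = 16 * 1024  # 16 KB data pages (assumed here for illustration)


def page_signature(page: bytes) -> int:
    # Stand-in only: CRC-32 replaces the algebraic signature in this sketch.
    return zlib.crc32(page)


def split_pages(bucket: bytes, page_size: int = PAGE_SIZE):
    # Cut the bucket image into fixed-size pages (the last one may be shorter).
    return [bucket[i:i + page_size] for i in range(0, len(bucket), page_size)]


def backup(bucket: bytes, disk: dict, sig_map: dict, page_size: int = PAGE_SIZE):
    """Write to `disk` only the pages whose signature differs from `sig_map`.

    Returns the list of page numbers that were written.
    """
    written = []
    for n, page in enumerate(split_pages(bucket, page_size)):
        sig = page_signature(page)
        if sig_map.get(n) != sig:   # new page, or page changed since last backup
            disk[n] = page          # hypothetical disk write
            sig_map[n] = sig        # remember the signature for the next backup
            written.append(n)
    return written
```

A first call on a new file writes every page, a second call with no updates writes nothing, and a call after a single update writes only the page that contains it. No flag has to be maintained by the code that updates the bucket, which is why no dirty bit is needed.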
5 The SDDS-2002 Backup Scheme: File Backup
[Figure: the client sends a Store command (multicast) to the servers; server RAM buckets are written to the server disks (distributed storing).]
6 The SDDS-2002 Backup Scheme: File Load
[Figure: the client sends a Load command (multicast) to the servers; server RAM buckets are loaded from the server disks (distributed loading).]
7 Internal Organization of a Bucket in SDDS
- Index: a few KB up to MB
- Data file: dozens of MB up to GB
8 Page Granularity
- Careful choice
  - Smaller pages
    - More individual writes if there are many random updates
    - Less data transferred if there are only a few updates
  - Larger pages
    - Vice versa
- Optimal size?
  - Good question
- Our choice
  - 16 KB for data
    - Although 64 KB pages proved best for data page signature calculus speed
  - 256 B for index
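The trade-off can be made concrete with a small calculation. The function below is a hypothetical helper, not part of SDDS-2002: given the byte offsets of the updates, it counts how many pages become dirty and how many bytes a backup would then write for a given page size.

```python
def backup_cost(page_size: int, update_offsets):
    """Return (pages_written, bytes_written) for a set of updated byte offsets."""
    # Each updated offset dirties exactly one page; several updates
    # that fall in the same page cost a single page write.
    dirty_pages = {offset // page_size for offset in update_offsets}
    return len(dirty_pages), len(dirty_pages) * page_size


# Four updates scattered over the start of a bucket.
updates = [0, 10, 20_000, 70_000]
small = backup_cost(16 * 1024, updates)   # (3, 49152): more writes, less data
large = backup_cost(64 * 1024, updates)   # (2, 131072): fewer writes, more data
```

With 16 KB pages the four updates touch three pages and 48 KB are written. With 64 KB pages they touch two pages, but 128 KB are written.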
9 Page Signature
- Algebraic signatures
- Galois field GF($2^{16}$)
- Log / antilog multiplication
- Page P has 2-byte symbols $p_1, p_2, \ldots, p_n$
- The signature formula is:
  - for each $i = 1, 2, \ldots, n$: $p'_i = \mathrm{antilog}\, p_i$
  - for each $\beta = \alpha, \alpha^2, \alpha^3, \ldots$: $\mathrm{Sign}_\beta(P) = \sum_{i=1}^{n} p'_i \beta^i$
  - $\mathrm{Sign}(P) = (\mathrm{Sign}_\alpha(P), \mathrm{Sign}_{\alpha^2}(P), \ldots, \mathrm{Sign}_{\alpha^m}(P))$
- We set m = 2 in SDDS-2002
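A signature of this kind can be sketched in a few lines. Several details below are assumptions, since the slide does not give them: the primitive polynomial x^16 + x^12 + x^3 + x + 1 (0x1100B), the choice of the field element x (the integer 2) as the primitive element alpha, big-endian 2-byte symbols, and applying the sum directly to the page symbols. The function names are hypothetical.

```python
PRIM_POLY = 0x1100B        # assumed primitive polynomial for GF(2^16)
ORDER = (1 << 16) - 1      # number of nonzero field elements


def build_tables():
    # antilog[e] = alpha^e and log[alpha^e] = e, with alpha = 2 (assumption).
    antilog = [0] * ORDER
    log = [0] * (ORDER + 1)
    x = 1
    for e in range(ORDER):
        antilog[e] = x
        log[x] = e
        x <<= 1                 # multiply by alpha
        if x & 0x10000:
            x ^= PRIM_POLY      # reduce modulo the polynomial
    return log, antilog


LOG, ANTILOG = build_tables()


def gf_mul(a: int, b: int) -> int:
    # Log / antilog multiplication: add the logarithms modulo 2^16 - 1.
    if a == 0 or b == 0:
        return 0
    return ANTILOG[(LOG[a] + LOG[b]) % ORDER]


def symbols(page: bytes):
    # Split the page into 2-byte symbols (big-endian, zero-padded if odd).
    if len(page) % 2:
        page += b"\x00"
    return [int.from_bytes(page[i:i + 2], "big") for i in range(0, len(page), 2)]


def sig_component(syms, k: int) -> int:
    # Sum over i = 1..n of p_i * (alpha^k)^i; addition in GF(2^16) is XOR.
    acc = 0
    for i, p in enumerate(syms, start=1):
        if p:
            acc ^= ANTILOG[(LOG[p] + k * i) % ORDER]
    return acc


def signature(page: bytes, m: int = 2):
    # m-component signature (Sign_alpha, Sign_alpha^2, ..., Sign_alpha^m).
    syms = symbols(page)
    return tuple(sig_component(syms, k) for k in range(1, m + 1))
```

With m = 2 the signature is 4 bytes long. Raising m only appends components, so the components already computed stay valid when a longer signature is wanted.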
10 Experimental Performance Analysis: Hardware Configuration
- Servers: 1.8 GHz P4
- Client: 800 MHz P3
- Name server: 500 MHz P3
- 1 Gb/s Ethernet
- Windows 2000 Server OS
11 Experimental Performance: SDDS-2002 Initial File Store Time (No Signature Calculus)
[Figure: store time (s) versus number of file servers. File size: 393 MB.]
12 Initial File Store Time (Time Series)
[Figure: storage time (ms) versus number of records.]
13 File Load Time
[Figure: load time (s) versus number of servers. File size: 393 MB.]
- Practically the same as the first backup time
14 File Storage Performance Analysis
[Table columns: Bucket size (MB) | Number of records | Signature calculus (ms) | Signature calculus per MB (ms) | Total store time (ms) | Store time for 0% change (ms) | Gain (%) | Store time for 5% change (ms) | Gain (%)]
15 SHA-1 / Algebraic Signatures
[Table columns: Bucket size (MB) | Number of records | Algebraic signature calculus (ms) | SHA-1 calculus (ms) | Initial store time with SHA-1 (ms) | Initial store time with alg. sign. (ms) | SHA-1 store time for 5% change (ms) | Alg. sign. store time for 5% change (ms) | Gain (%)]
16 Algebraic / SHA-1 Signature Calculus Time
17 Implementation in SDDS-2002: Interactive Client Interface
[Screenshot: user interface]
18 Implementation in SDDS-2002: Execution Listing at the Server
- 1st request for storage: new file
  - Signature calculus (375 ms)
  - Disk write of all pages (4922 ms)
- 2nd request for storage: no changes found (375 ms)
- 3rd request for storage: 1 page changed ( ms)
19 Conclusion
- The algebraic-signature-based file backup works
  - Present in the SDDS-2002 prototype
- Offers advantages over the traditional approach
  - No change to existing code
  - No run-time overhead
- Future work
  - Signatures
    - Calculus, algebraic properties, apps…
  - Automatic SDDS file eviction
Thank You for Your Attention