
Slide 1: Pursuit of a Scalable High Performance Multi-Petabyte Database
16th IEEE Symposium on Mass Storage Systems, 17-Mar-99
Andrew Hanushevsky, SLAC Computing Services
Marcin Nowak, CERN
Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy

Slide 2: High Energy Experiments
- BaBar at SLAC
  - High-precision investigation of B-meson decays
  - Explore the asymmetry between matter and antimatter: where did all the antimatter go?
- ATLAS at CERN
  - Probe the Higgs boson energy range
  - Explore the more exotic reaches of physics

Slide 3: High Energy Physics Quantitative Challenge

                          BaBar/SLAC           ATLAS/CERN
  Starts                  May 1999             May 2005
  Data volume             0.2 petabytes/yr     5.0 petabytes/yr
    Total amount          2.0 petabytes        100 petabytes
  Aggregate xfr rate      200 MB/sec disk      100 GB/sec disk
                          60 MB/sec tape       1 GB/sec tape
  Processing power        5,000 SPECint95      250,000 SPECint95
    SPARC Ultra 10s       526                  27,000
  Physicists              800                  3,000
  Locations               87                   250
  Countries               9                    50

Slide 4: Common Elements
- Data will be stored in an object-oriented database
  - Objectivity/DB
    - Has the theoretical ability to scale to the size of the experiments
- Most data will be kept offline
  - HPSS
    - Heavy-duty, industrial-strength mass storage system
- BaBar will be blazing the path
  - First large-scale experiment to use this combination
  - The Year of the Hare will be a very interesting time

Slide 5: Objectivity/DB
- Client/server application
  - Primary access is through the Advanced Multithreaded Server (AMS)
    - Can have any number of AMSs
  - The AMS serves "pages", similar to other remote filesystem interfaces (e.g., NFS)
  - The Objectivity client reads and writes database pages via the AMS
    - Page sizes range from 512 bytes to 64K in powers of 2 (e.g., 1K, 2K, 4K); see the sketch after this list
[Diagram: clients reaching storage via the ams protocol and the ufs protocol]
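The page-size rule above is mechanical enough to show in code. Below is a minimal C++ sketch, assuming a hypothetical helper name (this is not Objectivity's actual API), of how a server might validate a requested page size:

```cpp
#include <cstddef>

// Hypothetical validator (not part of Objectivity/DB): an AMS page must be
// a power of two between 512 bytes and 64K.
bool isValidAmsPageSize(std::size_t bytes) {
    // A power of two has exactly one bit set, so bytes & (bytes - 1) == 0.
    const bool powerOfTwo = bytes != 0 && (bytes & (bytes - 1)) == 0;
    return powerOfTwo && bytes >= 512 && bytes <= 64 * 1024;
}
```

For example, 4096 (4K) passes the check, while 3000 or 128K would be rejected.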

Slide 6: High Performance Storage System
[Diagram: HPSS components connected by a control network and a data network]
- Bitfile Server
- Name Server
- Storage Servers
- Physical Volume Library
- Physical Volume Repositories
- Storage System Manager
- Migration/Purge Server
- Metadata Manager
- Log Daemon
- Log Client
- Startup Daemon
- Encina/SFS
- DCE

Slide 7: The Obvious Solution
[Diagram: database servers and a compute farm connected through a network switch to a mass storage system and external collaborators]
But... the devil is in the details.

Slide 8: Capacity and Transfer Rate
[Chart: tape cartridge capacity, tape transfer rate, disk system capacity, and disk transfer rate plotted by year, roughly 1988 through 2006; capacity in GB and rate in MB/sec, on doubling (log-scale) axes]

Slide 9: The Capacity/Transfer-Rate Gap
- Density is growing faster than the ability to transfer data
  - We can store the data just fine, but do we have the time to look at it?
- There are solutions short of poverty:
  - Striped tape? Only if you want a lot of headaches
  - Intelligent staging: primary access on RAID devices; cost/performance is still a problem; the UFS scaling problem must be addressed
  - Replication (a fatter pipe?): data synchronization problem; load-balancing issues
- Whatever the solution is, you'll need a lot of them

Slide 10: Part of the Solution: Together Alone
- HPSS
  - Highly scalable, with excellent I/O performance for large files, but high latency for small block transfers (i.e., Objectivity/DB pages)
- AMS
  - An efficient database protocol and highly flexible, but limited security and tied to the local filesystem
- Need to synergistically mate these systems

Slide 11: Opening Up New Vistas: The Extensible AMS
[Diagram: the AMS layered over the oofs interface, which in turn sits atop a system-specific interface]

Slide 12: As Big as It Gets: Scaling the File System
- Veritas Volume Manager
  - Concatenates disk devices to form very large capacity logical devices
- Veritas File System
  - High-performance (60+ MB/sec) journaled file system for fast recovery
- The combination is used as the HPSS staging target
  - Allows for fast streaming I/O and efficient small block transfers

Slide 13: Not Out of the Woods Yet: Other Issues
- Access patterns (random vs. sequential)
- Staging latency
- Scalability
- Security

Slide 14: No Prophets Here: Supplying Performance Hints
- Additional information is needed for optimum performance
  - Different from Objectivity clustering hints:
    - Database clustering
    - Processing mode (sequential/random)
    - Desired service levels
- The information is Objectivity-independent
  - Need a mechanism to tunnel opaque information
- The client supplies hints via an oofs_set_info() call; see the sketch after this list
  - The information is relayed to the AMS in a transparent way
  - The AMS relays the information to the underlying file system via oofs()
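As a rough illustration of the tunneling idea, here is a C++ sketch. The oofs_set_info() name comes from the slide; its signature, the key=value hint syntax, and the stub that merely records the string are assumptions for illustration only:

```cpp
#include <cstring>

// Assumed signature for the slide's oofs_set_info() call. In the real
// system the string would be relayed, unparsed, through the AMS to the
// underlying filesystem; this stub just records it.
static char g_hintBuffer[256];

extern "C" int oofs_set_info(const char* info, int len) {
    if (len < 0 || len >= static_cast<int>(sizeof(g_hintBuffer))) return -1;
    std::memcpy(g_hintBuffer, info, len);
    g_hintBuffer[len] = '\0';
    return 0;
}

int main() {
    // Hypothetical hint string: clustering, processing mode, service level.
    const char hints[] = "cluster=eventdata;mode=sequential;service=normal";
    return oofs_set_info(hints, static_cast<int>(sizeof(hints) - 1));
}
```

Because the string is opaque to both Objectivity and the AMS, the hint vocabulary can evolve without changing either layer.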

Slide 15: Where's the Data? Dealing With Latency
- Hierarchical filesystems may have high latency bursts (e.g., mounting a tape file)
- Need a mechanism to notify the client of the expected delay
  - Prevents request timeouts
  - Prevents retransmission storms
  - Also allows the server to degrade gracefully: it can delay clients when overloaded
- Defer Request Protocol (sketched below)
  - Certain oofs() requests, such as open(), can tell the client of the expected delay
  - The client waits the indicated amount of time and tries again
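A minimal client-side sketch of that defer-and-retry loop follows. The AmsReply structure, the amsOpen() stub, and the simulated tape-mount delay are illustrative assumptions; the slide specifies only that the server returns an expected delay and the client waits and retries:

```cpp
#include <unistd.h>  // sleep()

struct AmsReply {
    bool deferred;     // true: server asked the client to come back later
    int  waitSeconds;  // server's suggested delay before retrying
    int  fd;           // valid handle once the file is available
};

// Stub standing in for the real AMS open(): pretend the first two calls hit
// a tape mount and defer the client for five seconds each time.
static AmsReply amsOpen(const char* /*path*/) {
    static int calls = 0;
    if (++calls <= 2) return {true, 5, -1};
    return {false, 0, 42};
}

// Client side of the protocol: wait the indicated time and try again, which
// avoids request timeouts and retransmission storms.
int openWithDefer(const char* path, int maxTries) {
    for (int attempt = 0; attempt < maxTries; ++attempt) {
        AmsReply r = amsOpen(path);
        if (!r.deferred) return r.fd;
        sleep(static_cast<unsigned>(r.waitSeconds));
    }
    return -1;  // still deferred after maxTries attempts
}
```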

Slide 16: Many Out of One: Dynamically Replicated Databases
- Dynamically distributed databases
  - A single machine can't manage over a terabyte of disk cache
  - There is no good way to statically partition the database
- Dynamically varying database access paths
  - As load increases, add more copies, accessed in parallel
  - As load decreases, remove copies to free up disk space
- Objectivity catalog independence
  - Copies are managed outside of Objectivity, minimizing the impact on administration

Slide 17: If There Are Many, Which One Do I Go To?
- Request Redirect Protocol: oofs() routines supply an alternate AMS location (see the sketch below)
- The oofs routines are responsible for update synchronization
  - Typically, read-only access is provided on the copies
  - Only one read/write copy is conveniently supported
    - The client must declare its intention to update prior to access
    - Lazy synchronization is possible
- A good mechanism for largely read-only databases
- Load balancing is provided by an AMS collective, which has one distinguished member recorded in the catalogue
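The sketch below shows one way the distinguished member could make the redirect decision. The class names and the round-robin policy are assumptions; the slide specifies only that oofs() routines supply an alternate AMS location and that updates stay on the single read/write copy:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct RedirectReply {
    bool redirect;     // true: client should reconnect to `host`
    std::string host;  // alternate AMS holding a read-only replica
};

class AmsCollective {
    std::vector<std::string> members_;  // hosts with read-only copies
    std::size_t next_ = 0;              // rotating cursor for load balancing
public:
    explicit AmsCollective(std::vector<std::string> members)
        : members_(std::move(members)) {}

    // Distinguished member's open(): spread read-only clients across the
    // copies; clients that declared an intent to update are kept here.
    RedirectReply open(bool wantsUpdate) {
        if (wantsUpdate || members_.empty()) return {false, ""};
        return {true, members_[next_++ % members_.size()]};
    }
};
```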

Slide 18: The AMS Collective
[Diagram: distinguished members redirecting clients to other collective members]
Collective members are effectively interchangeable.

Slide 19: Keeping the Hackers at Bay: Object-Oriented Security
- No amount of performance is sufficient if you always have to recompute
  - Need a mechanism that provides security to thwart hackers
- Protocol-independent authentication model
  - Public or private key (PGP, RSA, Kerberos, etc.); can be negotiated at run time
- Automatically called by the client and server kernels
  - Supplied via replaceable shared libraries (see the sketch below)
- The client Objectivity kernel creates security objects as needed
  - Security objects supply context-sensitive authentication credentials
- Works only with the Extensible AMS via the oofs interface
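Here is a minimal sketch of the replaceable-shared-library idea using POSIX dlopen()/dlsym(). The SecurityObject interface, the module filename, and the createSecurityObject entry point are illustrative assumptions, not the actual Objectivity/SLAC design:

```cpp
#include <dlfcn.h>
#include <cstdio>

// Interface that every pluggable security module would implement.
struct SecurityObject {
    virtual const char* protocolName() const = 0;         // e.g., "krb5"
    virtual bool getCredentials(char* buf, int len) = 0;  // context-sensitive
    virtual ~SecurityObject() {}
};

typedef SecurityObject* (*SecFactory)();

// Load a protocol module chosen (or negotiated) at run time.
SecurityObject* loadSecurityModule(const char* soPath) {
    void* handle = dlopen(soPath, RTLD_NOW);
    if (!handle) return 0;
    // "createSecurityObject" is a hypothetical agreed-upon entry point.
    SecFactory make =
        reinterpret_cast<SecFactory>(dlsym(handle, "createSecurityObject"));
    return make ? make() : 0;
}

int main() {
    // The module name is hypothetical; any protocol library would do.
    if (SecurityObject* sec = loadSecurityModule("./libauth_krb5.so"))
        std::printf("using %s authentication\n", sec->protocolName());
    return 0;
}
```

Swapping protocols then means installing a different shared library; neither the client nor the server kernel has to change.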

Slide 20: Overall Effects
- Extensible AMS: allows use of any type of filesystem via the oofs layer
- Generic Authentication Protocol: allows proper client identification
- Opaque Information Protocol: allows passing hints to improve filesystem performance
- Defer Request Protocol: accommodates hierarchical filesystems
- Redirection Protocol: accommodates terabyte+ filesystems and provides for dynamic load balancing

Slide 21: Dynamic Load Balancing
[Diagram: clients dynamically selecting among hierarchical, secure AMS servers]

Slide 22: Summary
- The AMS is capable of high performance
  - Ultimate performance is limited by disk speeds; it should be able to deliver an average of 20 MB/sec per disk
- The oofs interface plus the other protocols greatly enhance performance, scalability, usability, and security
- 5+ TB of SLAC data has been processed using AMS+HPSS
  - Some AMS problems; no HPSS problems
- SLAC will be using this combination to store physics data
  - The BaBar experiment will produce a database of over 2 PB in 10 years: 2,000,000,000,000,000 = 2 x 10^15 bytes, or about 200,000 3590 tapes

Slide 23: Now for the Reality
- Full AMS features are not yet implemented
  - The SLAC/Objectivity design has been completed: the oofs OO interface, OO security, and the protocols (i.e., DRP, RRP, and GAP)
  - The oofs and ooss layers are completely functional; HPSS integration is full-featured and complete
  - Protocol development (DRP, RRP, and GAP) has been fully funded at SLAC
  - The initial feature set, to be deployed late summer: DRP, GAP, and limited RRP
  - Full asynchronous replication within 2 years
- The CERN and SLAC approaches are similar, but quite different in detail...

Slide 24: CERN Staging Approach: RFIO/RFCP + HPSS
[Diagram: on a Solaris disk server, the AMS handles file and catalog management and UNIX FS I/O of DB pages against a disk pool; an RFIO daemon services stage-in requests via RFIO calls; a migration daemon uses RFCP (RFIO copy) to move data to the HPSS server, whose HPSS mover drives the tape robot]

Slide 25: SLAC Staging Approach: PFTP + HPSS
[Diagram: as above, but a gateway daemon services stage-in requests via gateway requests, and the migration daemon moves data over PFTP (separate control and data connections) to the HPSS server, HPSS mover, and tape robot]

Slide 26: SLAC Ultimate Approach: Direct Tape Access
[Diagram: the AMS and migration daemon on the Solaris disk server talk to the HPSS server through the native API (RPC); data moves by direct transfer between the disk pool and the HPSS mover's tape robot]

Slide 27: CERN 1 TB Test Bed
[Diagram: current configuration (approximation) and planned future configuration; 1 Gb switched Ethernet in a star topology]

Slide 28: SLAC Configuration (approximate)
[Diagram]

Slide 29: SLAC Detailed Configuration
[Diagram]

