The GSI Mass Storage System
TAB GridKa, FZ Karlsruhe, Sep. 4, 2002
Horst Göringer, GSI Darmstadt
GSI Mass Storage: Overview
- recent history
- current (old) system
  - functionality
  - structure
  - usage
- new system
  - the requirements
  - structure
  - status
- Outlook
GSI Mass Storage: History
- till 1995: ATL Memorex 5400, capacity 1000 GByte
  - IBM 3480 cartridges
  - IBM HSM (MVS)
- 1995: IBM 3494 ATL, ADSM (AIX)
  - ADSM interface not acceptable:
    - no cross-platform support for clients (AIX, VMS)
    - ADSM "node-based"
    - no guaranteed availability for staged files
    - ADSM commands not so easy to use
  - => use the ADSM API for the GSI mass storage system (1996)
- 2000: broad discussions on a future system
- 2001: decision to enhance the GSI mass storage system
GSI Mass Storage: Server
current server hardware:
- IBM RS6000 H50, 2 processors, 1 GB memory
- Gigabit Ethernet (max rate ~21 MByte/s)
- IBM 3494 Automatic Tape Library
- 8 IBM 3590E tape drives (capacity 20 GByte uncompressed per volume)
- ATL capacity 60 TByte (currently used: 27 TB experiment data, 12 TB backup)
- ~350 GByte internal staging disks
current server software:
- AIX V4.3
- Tivoli Storage Manager Server V4.1.3
GSI Mass Storage: Logical Structure (figure)
GSI Mass Storage: Functionality
1. command interface
   - commands: archive, retrieve, query, stage, delete, pool_query, ws_query
2. RFIO API
   - functions available for open, close, read, seek, ...
   - RFIO client accesses the GSI mass storage
   - write functions: till end 2002
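For illustration, a hypothetical command session; the argument order and all names are invented, only the command names and the stage=no option come from the interface above:

    adsmcli archive  run001.lmd hades /raw/sep02            (copy a local file into the ATL)
    adsmcli query    hades /raw/sep02 run001.lmd            (look up the archived file)
    adsmcli stage    hades /raw/sep02 run001.lmd            (copy it to a stage pool on disk)
    adsmcli retrieve hades /raw/sep02 run001.lmd stage=no   (read from tape, bypassing the pools)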
GSI Mass Storage: Functionality (cont'd)
- identical client interface on all GSI platforms (Linux, AIX, VMS)
- unique name space
- security policy
- client tape support (ANSI label)
- server log files for
  - error reports
  - statistical analysis
- GSI software: C with sockets
- integrated in AliEn
GSI Mass Storage: Stage Pool Manager
administers several stage pools with different attributes:
- file life time
- max space
- user access
currently active pools:
- RetrievePool: no guaranteed life time
- StagePool: min life time guaranteed
future pools:
- ArchivePool
- pools dedicated to user groups
GSI Mass Storage: Stage Pool Manager (cont'd)
- administers an additional stage meta data DB
- locks each access to a pool
- handles disk clean requests from different sources:
  - from a process serving a user
  - from the watch daemon
- initiates and controls disk clean processes
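A minimal sketch of the bookkeeping behind such clean requests, with invented names and attributes (not the actual GSI implementation):

    #include <string>
    #include <vector>

    // one stage pool with the attributes listed above
    struct StagePool {
        std::string name;        // e.g. "StagePool"
        long maxSpaceMB;         // max space attribute
        int  minLifeDays;        // guaranteed life time, 0 = none
    };

    struct StagedFile {
        std::string repr;        // file representation in mass storage
        int  ageDays;
        long sizeMB;
    };

    // a disk clean request: free needMB in pool p, removing only
    // files whose guaranteed life time has expired
    long clean(const StagePool& p, std::vector<StagedFile>& files, long needMB)
    {
        long freed = 0;
        for (size_t i = 0; i < files.size() && freed < needMB; ) {
            if (files[i].ageDays >= p.minLifeDays) {
                freed += files[i].sizeMB;          // also remove the disk file
                files.erase(files.begin() + i);    // and its meta data entry
            } else {
                ++i;
            }
        }
        return freed;
    }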
GSI Mass Storage: Upgrade Requirements
a scalable system is needed (data capacity and max data rate):
1. higher bandwidth => several data movers
   - each with access to each tape device and robot
   - each with own disk pools
2. one master administering the complete meta data DB
3. hardware independence
This means:
- fully parallel data streams
- separation of control flow and data flow
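A sketch of that separation, with invented names: the master only decides which data mover serves a request (control flow); the client then moves the data directly to or from that mover (data flow):

    #include <string>
    #include <vector>

    struct DataMover {
        std::string host;       // own disk pools, SAN access to every drive
        int activeStreams;      // current load
    };

    // control flow: the master answers "which mover?" with a simple
    // load-balancing rule; the bulk data never passes through the master
    // (assumes at least one mover is configured)
    const DataMover& selectMover(const std::vector<DataMover>& movers)
    {
        size_t best = 0;
        for (size_t i = 1; i < movers.size(); ++i)
            if (movers[i].activeStreams < movers[best].activeStreams)
                best = i;
        return movers[best];
    }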
GSI Mass Storage: Upgrade Requirements (cont'd)
enabling technologies:
- Storage Area Network (SAN)
- Tivoli Storage Manager (TSM, successor of ADSM)
GSI Mass Storage: New Structure (figure)
GSI Mass Storage: New Hardware
new hardware:
- tape robot StorageTek L700 (max 68 TByte)
- 8 IBM 3580 Ultrium LTO tape drives (capacity 100 GByte uncompressed)
- 2 TSM servers (Intel PC, fail-safe Windows 2000 cluster)
- 4 (8) data movers (Intel PC, Windows 2000)
- SAN components: Brocade switch 4100 (16 ports, each 1 Gbit/s)
purpose:
- verification of the new concept
- hardware tests: SAN, ATL, tape drives, tape volumes
- later: new backup system for user data
GSI Mass Storage: Status
- hardware, TSM/Storage Agent: seems to work (tests still running)
- new GSI software (for Unix) nearly ready:
  - command client
  - RFIO client (read only)
  - server package (master and slaves on data movers)
  - stage pool manager (master and slaves on data movers)
- in Oct 2002: to be used for production with the current AIX server
GSI Mass Storage: Current Plans
in 2003: DAQ connection to mass storage
- n event builders will write in parallel via RFIO to dedicated archive disk pools
- enhanced performance and stability requirements
in 2003/2004: new ATL (several 100 TByte)
- to fulfill the current requirements for the next years
GSI Mass Storage: Outlook
the yearly increment of experiment data grows rapidly:
- an order of magnitude within the next years
- after 2006: Alice experiment running, "Future Project" of GSI
=> the mass storage system must be scalable in both storage capacity and data rate
=> the system must be flexible to follow the development of new hardware
Our new concept fulfills these requirements!
- TSM is a powerful storage manager satisfying our needs now and in the near future
- high flexibility with the GSI-made user interface
GSI Mass Storage: Appendix
More Details
GSI Mass Storage: DAQ Connection (figure)
GSI Mass Storage: the new System
not only an upgrade - an entrance into new hardware and a new platform!
- new server platform Windows 2000
- new tape robot
- new tape drives and media
- new network
consequences:
- more work necessary due to missing practice
- unknown problems
- lower quality of tape drives and media
- presumably more operation failures
=> costs reduced by cheaper components, but more manpower necessary (in development and operation)
however, we have many options for the future!
GSI Mass Storage: the new System - SW enhancements and adaptions (cont'd)
- adaption of the adsmcli server to the new concept => tsmcli
  - division of functionality into several processes
  - code restructuring and adaptions
  - communication between processes
  - data mover selection (load balancing)
- enhancement of the disk pool manager
  - subpools on each data mover => n slave disk pool managers
  - communication master - slaves
- enhancement of the metadata database
  - subpool selection
  - DAQ pool handling
Current Status: File Representation
an archived file is defined by:
- archive name
- path name (independent from the local file system)
- file name (identical with the name in the local file system)
user access is handled by an access table for all supported client platforms
files already archived are not overwritten
- except when explicitly required
local files are not overwritten (retrieve)
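For illustration, a hypothetical file representation (all names invented):

    archive name : hades
    path name    : /raw/sep02      (independent from the local file system)
    file name    : run001.lmd      (as in the local file system)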
Current Status: Local Tape Handling
- support of standard ANSI label tapes on the client side
- tape volumes portable between client platforms (AIX, Linux, VMS)
- enhanced error handling:
  - corrupt files do not affect others when handling a file list
  - a missing EOF mark is handled
- user friendly: archive a complete tape volume by invoking one command
Current Status: Why Disk Pools
disk pools help to avoid
1. tape mount and load times
2. concurrent access to the same tape volume
3. blocking of fast tape drives in the robot
this is useful if
- files are needed several times within a short time
- the network connection of clients is slow
- large working sets are retrieved file after file
- large working sets are accessed in parallel from a compute farm
Disk Pool Manager: the current Pools
RetrievePool:
- files stored via "adsmcli retrieve" - inhibit with option stage=no
- files stored when read via API
- no guaranteed life time
StagePool:
- files stored via "adsmcli stage"
- min life time guaranteed (currently 3 days)
current space:
- hardware shared
- overall: 350 GByte
- StagePool: 100 GByte max
- the RetrievePool uses space unused by the StagePool ("lent space")
API Client for Mass Storage
API client: functions are available e.g.
- to open/close files in the mass storage
- to read/write buffers in remote files
- to shift the file pointer in remote files
=> the data stream analysis program <-> mass storage is fully controlled by the user
useful if
- only selective access to (small) parts of a file is required
  - parts of a ROOT tree
  - ntuples
- local disk space is insufficient
requirement for the GSI API client: compatible with the CERN/HEP RFIO package
- the RFIO interface is available in CERN applications
API Client: Logical Structure (figure)
API Client: RFIO at GSI
RFIO functions developed:
- needed for ROOT: rfio_open, rfio_read, rfio_close, rfio_lseek
- additionally (e.g. for analysis programs): rfio_fopen, rfio_fread, rfio_fclose
- file name: file representation in the GSI mass storage system
currently available at GSI:
- enhanced adsmcli server, already in production
- sample C program using rfio_f... (read) on Linux: /GSI/staging/rfio
- ROOT with RFIO client (read)
- GO4 viewer with RFIO client (read)
in future: write functionality (rfio_write, rfio_fwrite)
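A minimal read sketch against the POSIX-like RFIO calls listed above; the header name and the file representation are assumptions:

    #include <fcntl.h>
    #include <stdio.h>
    #include "rfio.h"       /* RFIO prototypes; header name may differ */

    int main(void)
    {
        char buf[8192];
        int  n;
        /* hypothetical file representation in the GSI mass storage */
        int fd = rfio_open("hades/raw/sep02/run001.lmd", O_RDONLY, 0);
        if (fd < 0) {
            perror("rfio_open");
            return 1;
        }
        while ((n = rfio_read(fd, buf, sizeof buf)) > 0) {
            /* hand the buffer to the analysis code */
        }
        rfio_close(fd);
        return 0;
    }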
API Client: ROOT with RFIO
- ROOT with the RFIO API available at GSI
- for RFIO usage in ROOT:
  - load the shared library libRFIO.so in your ROOT session
  - for file open: use class TRFIOFile instead of TFile
  - prefix the file representation in the mass storage system with "rfio:"
- in the GO4 viewer: no prefix to the file representation needed
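In a ROOT session this looks like the following sketch (the file representation is a hypothetical example):

    // load the RFIO interface into this ROOT session
    gSystem->Load("libRFIO.so");
    // open via TRFIOFile instead of TFile, with the "rfio:" prefix
    TRFIOFile f("rfio:hades/raw/sep02/run001.root");
    f.ls();   // inspect the file contents as with any TFile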
GSI Mass Storage: The current Bottlenecks
data capacity:
- in April 2001: tape robot nearly completely filled (30 TByte uncompressed)
- since April 27, 2001: new tape drives IBM 3590E
  - write with double density: > 20 GByte/volume
  - copy all volumes => ~30 TByte free capacity
- current GSI requirements (TByte): table with experiment, backup, and accumulated values per year (2001: ...)
- additionally: multiple instances of experiment data!