The GSI Mass Storage for Experiment Data
DVEE-Palaver, GSI Darmstadt, Feb. 15, 2005
Horst Göringer, GSI Darmstadt

Overview
- different views
- current status
- recent enhancements:
  - write cache
  - on-line connection to DAQ
- future plans
- conclusions

GSI Mass Storage System
Gsi mass STORagE system -> gstore

gstore: storage view

gstore: hardware view
3 automatic tape libraries (ATL):

(1) IBM 3494 (AIX)
- 8 tape drives IBM 3590 (14 MByte/s)
- ca. ... volumes (47 TByte, 13 TByte backup)
- 1 data mover (adsmsv1)
- access via adsmcli, RFIO (read)
- read cache 1.1 TByte: StagePool, RetrievePool

gstore: hardware view (2)
StorageTek L700 (Windows 2000)
- 8 tape drives LTO2 ULTRIUM (35 MByte/s)
- ca. 170 volumes (32 TByte)
- 8 data movers (gsidmxx), connected via SAN
- access via tsmcli, RFIO
- read cache 2.5 TByte: StagePool, RetrievePool
- write cache: ArchivePool 0.28 TByte, DAQPool 0.28 TByte

gstore: hardware view (3)
StorageTek L700 (Windows 2000)
- 4 tape drives LTO1 ULTRIUM (15 MByte/s)
- ca. 80 volumes (10 TByte): backup copy of 'irrecoverable' archives ...raw
- mainly for backup of user data (~30 TByte)

gstore: software view
2 major components:

TSM (Tivoli Storage Manager)
- commercial
- handles tape drives and robots
- database

GSI software (~80,000 lines of code)
- C, sockets, threads
- interface to user (tsmcli / adsmcli, RFIO)
- interface to TSM (TSM API client)
- cache administration

gstore user view: tsmcli
tsmcli subcommands:
- archive    file*  archive  path
- retrieve   file*  archive  path
- query      file*  archive  path*
- stage      file*  archive  path
- delete     file   archive  path
- ws_query   file*  archive  path
- pool_query pool*

*: any combination of wildcard characters (*, ?) allowed
soon: file may contain a list of files (with wildcard characters)
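As a purely illustrative example (the file pattern, archive name and path are hypothetical, not actual GSI archives), archiving a run, checking it and staging it back could look like:

  tsmcli archive run042_*.lmd hadesraw may04/
  tsmcli query   run042_*.lmd hadesraw may04/
  tsmcli stage   run042_*.lmd hadesraw may04/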

gstore user view: RFIO
- rfio_[f]open
- rfio_[f]read
- rfio_[f]write
- rfio_[f]close
- rfio_[f]stat
- rfio_lseek

GSI extensions (for on-line DAQ connection):
- rfio_[f]endfile
- rfio_[f]newfile
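As an illustration, a minimal client reading one file from gstore via RFIO might look like the sketch below; the extern prototypes follow the usual RFIO client conventions but are assumptions here, and the archive path is hypothetical:

#include <stdio.h>
#include <fcntl.h>

/* prototypes of the RFIO client library as assumed here;
   the real header and prototypes may differ slightly */
extern int rfio_open(char *path, int flags, int mode);
extern int rfio_read(int fd, char *buffer, int nbytes);
extern int rfio_close(int fd);

int main(void)
{
    char buf[65536];
    int nread;

    /* hypothetical gstore archive path */
    int fd = rfio_open("/hadesraw/may04/run042_01.lmd", O_RDONLY, 0);
    if (fd < 0) {
        perror("rfio_open");
        return 1;
    }

    while ((nread = rfio_read(fd, buf, sizeof(buf))) > 0) {
        /* process nread bytes of event data here */
    }

    rfio_close(fd);
    return 0;
}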

gstore server view: query

gstore server view: archive to cache

gstore server view: archive from cache

gstore server view: retrieve from tape

server view: retrieve from write cache

gstore: overall server view

server view: gstore design concepts
- strict separation of control and data flow
- no bottleneck for data
- scalable in capacity (tape and disk) and I/O bandwidth
- hardware independent (as long as TSM supports the hardware)
- platform independent
- unique name space

server view: cache administration
multithreaded servers for read and write cache, each with its own metadata DB
main tasks:
- lock/unlock files
- select data movers and file systems
- collect current info on disk space
  (soon: data mover and disk load -> load balancing)
- trigger asynchronous archiving
- disk cleaning
several disk pools with different attributes: StagePool, RetrievePool, ArchivePool, DAQPool, ...

usage profile: batch farm
batch farm: ~120 dual-processor nodes
=> highly parallel mass storage access (read and write)
read requests:
- 'good' user: stages all files before use (wildcard characters)
- 'bad' user: reads lots of single files from tape
- 'bad' system: stage disk / data mover crashes during analysis
write requests:
- via write cache
- distribute as uniformly as possible

usage profile: experiment DAQ
- several continuous data streams from DAQ
- keep the same data mover during the lifetime of a data stream
- only via RFIO; GSI extensions necessary: rfio_[f]endfile, rfio_[f]newfile (see the sketch below)
- disks are emptied faster than they are filled:
  network -> disk: ~10 MByte/s
  disk -> tape: ~30 MByte/s
  => time to stage for on-line analysis
- enough disk buffer necessary in case of problems (robot, TSM, ...)
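A rough sketch of how a DAQ client might use these extensions to keep one RFIO connection, and hence one data mover, for a whole data stream. The prototypes of rfio_newfile/rfio_endfile are assumptions for illustration, not the documented GSI API:

#include <fcntl.h>

/* all prototypes, in particular those of the GSI extensions, are assumed here */
extern int rfio_open(char *path, int flags, int mode);
extern int rfio_write(int fd, char *buffer, int nbytes);
extern int rfio_endfile(int fd);              /* assumed: close current file, keep connection/data mover */
extern int rfio_newfile(int fd, char *path);  /* assumed: open the next file on the same data mover */
extern int rfio_close(int fd);

/* write two files of one DAQ stream over a single RFIO connection */
void write_stream(char *first_file, char *next_file,
                  char *events, int nbytes, int nevents)
{
    int fd = rfio_open(first_file, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return;

    for (int i = 0; i < nevents; i++)
        rfio_write(fd, events, nbytes);       /* stream event buffers into the DAQPool */

    rfio_endfile(fd);                         /* file complete, connection stays open */
    rfio_newfile(fd, next_file);              /* continue the stream in the next file */

    for (int i = 0; i < nevents; i++)
        rfio_write(fd, events, nbytes);

    rfio_close(fd);                           /* end of the data stream */
}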

current plans: new hardware
more and safer disks:
- write cache: all RAID, 4 TByte (ArchivePool, DAQPool)
- read cache: +7.5 TByte new RAID
  (StagePool, RetrievePool, new pools, e.g. with longer file lifetime)
5 new data movers
new fail-safe entry server:
- hosts query server and cache administration servers -> query performance!
- take-over in case of host failure
- metadata DBs mirrored on 2nd host

current plans: merge tsmcli / adsmcli
new command gstore: replaces tsmcli and adsmcli
- unique name space (already available)
- users need not care in which robot the data reside
- new archives: placement policy of the computing center

brief excursion: future of IBM 3494?
- still heavily used
- rather full
- hardware highly reliable
- should be decided this year!

usage of IBM 3494 (AIX)

brief excursion: future of IBM 3494?
2 extreme options (and more in between):

no further investment:
- use as long as possible
- in a few years: move data to another robot

upgrade tape drives and connect to SAN:
- 3590 (~30 GB, 14 MB/s) -> 3592 (300 GB, 40 MB/s)
- new media => 700 TByte capacity
- access with available data movers via SAN
- new fail-safe TSM server (Linux?)

current plans: load balancing
acquire current info on the number of read/write processes for each disk, data mover, pool
- new write request: select the resource with the lowest load (see the sketch below)
- new read request: avoid 'hot spots' -> create additional instances of a staged file
- new option '-randomize' for stage/retrieve:
  distribute equally over different data movers / disks, split into n (parallel) jobs
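A minimal sketch of the "lowest load" selection for a new write request, under the assumption that the cache server keeps a per-data-mover counter of active I/O processes and the free space of the candidate file system; the structure and field names are hypothetical:

#include <stddef.h>

typedef struct {
    const char *name;      /* data mover host, e.g. "gsidm01" (hypothetical) */
    int         io_procs;  /* current number of read/write processes */
    long        free_mb;   /* free space in the candidate file system (MByte) */
} DataMover;

/* return the index of the data mover with the fewest active I/O processes
   that still has enough free space for the request, or -1 if none fits */
static int select_data_mover(const DataMover *dm, size_t ndm, long needed_mb)
{
    int best = -1;
    for (size_t i = 0; i < ndm; i++) {
        if (dm[i].free_mb < needed_mb)
            continue;
        if (best < 0 || dm[i].io_procs < dm[best].io_procs)
            best = (int)i;
    }
    return best;
}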

current plans: new organization of data movers
Linux platform:
- more familiar environment (shell scripts, Unix commands, ...)
- case-sensitive file names
- current mainstream OS for experiment data processing
'2nd level' data movers:
- no SAN connection
- disks filled via '1st level' DMs with SAN connection
- for stage pools with guaranteed lifetime of files

current plans: new organization of data movers
integration of selected group file servers as '2nd level' data movers:
- disk space (logically) reserved for the owners
- pool policy according to the owners
many advantages:
- no NFS => much faster I/O
- files physically distributed over several servers
- load balancing of gstore
- disk cleaning
disadvantages: only for experiment data, access via gstore interface

current plans: user interface
a large number of user requests:
- longer file names
- option to rename files
- more specific return codes
- ...
program code consolidation
improved error recovery after HW failures
support for the successor of alien
GRID support:
- gstore as Storage Element (SE)
- Storage Resource Manager (SRM) -> new functionalities, e.g. reserving resources

Conclusions
GSI concept for mass storage successfully verified:
- hardware and platform independent
- scalable in capacity and bandwidth to keep up with
  - requirements of future batch farm(s)
  - data rates of future experiments
- gstore able to manage very different usage profiles
but still a lot of work... to fully realize all the plans discussed