1 CASTOR2@CNAF Status and plans
Giuseppe Lo Re INFN-CNAF 8/05/2007

2 Outline
Introduction to CASTOR2
CNAF Setup
Production experience
Monitoring & notifications
Open issues and future plans

3 Introduction to CASTOR2 (1)
System for disk caches and transparent tape media management.
Provides a name space that looks like a file system (/castor/cnaf.infn.it/xyz).
Assumes a mass-storage backend; not designed to work as a stand-alone disk cache.
CASTOR2 tries to address the limitations of CASTOR1:
- stager catalogue performance degrades when the number of files exceeds ~200k
- static migration streams and not enough optimization for migration/recall (e.g. requests for files on the same tape are not always processed together and in the right order; see the sketch after this list)
- code very old and difficult to maintain
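The recall-ordering limitation can be pictured with a small sketch. The code below is purely illustrative and not CASTOR2's actual recall policy: the request fields (tape id, file sequence number) are assumptions for the example. It groups pending recall requests by tape and processes them in on-tape order, so each tape is mounted once and read sequentially.

from collections import defaultdict

# Hypothetical recall requests: (castor_path, tape_id, file_seq_on_tape)
requests = [
    ("/castor/cnaf.infn.it/user/a/file1", "T00123", 42),
    ("/castor/cnaf.infn.it/user/b/file7", "T00777", 3),
    ("/castor/cnaf.infn.it/user/a/file2", "T00123", 7),
    ("/castor/cnaf.infn.it/user/c/file9", "T00123", 99),
]

# Group by tape so each tape is mounted only once ...
by_tape = defaultdict(list)
for path, tape, seq in requests:
    by_tape[tape].append((seq, path))

# ... and recall files in the order they sit on the tape.
for tape, files in by_tape.items():
    print(f"mount {tape}")
    for seq, path in sorted(files):
        print(f"  recall fseq={seq} -> {path}")
    print(f"dismount {tape}")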

4 Introduction to CASTOR2 (2)
[Architecture diagram: rhserver, stager, rtcpclientd, MigHunter, cleaning, dlfserver, rmmaster, expertd and the LSF master, with the stager and DLF databases on Oracle DB servers and the disk pools behind them.]
Database centric: requests are queued in the Oracle stager database and dispatched as jobs to LSF (a toy version of this flow is sketched below).
Disk server autonomy as far as possible: each disk server is in charge of its local resources, i.e. file system selection and execution of garbage collection.
Support for various access protocols. Currently: rfio, root, gridftp. Under way: xrootd, gridftp v2.
All components log to the Distributed Logging Facility (DLF).
Note: central services and the tape layer are largely unchanged.
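As a rough illustration of the database-centric flow, here is a minimal sketch. It uses SQLite in place of Oracle and a print in place of a real LSF submission; the table layout and the submit_to_scheduler helper are invented for the example, not the actual stager schema or interface.

import sqlite3

# Stand-in for the Oracle stager DB: a table of queued requests.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, castor_path TEXT, state TEXT)")
db.executemany("INSERT INTO requests (castor_path, state) VALUES (?, 'QUEUED')",
               [("/castor/cnaf.infn.it/user/a/file1",),
                ("/castor/cnaf.infn.it/user/b/file2",)])
db.commit()

def submit_to_scheduler(req_id, path):
    # In CASTOR2 this would become an LSF job; here we only print it.
    print(f"dispatching request {req_id} for {path} as a scheduler job")

# Dispatcher loop: pick up queued requests and hand them to the scheduler.
queued = db.execute("SELECT id, castor_path FROM requests WHERE state = 'QUEUED'").fetchall()
for req_id, path in queued:
    submit_to_scheduler(req_id, path)
    db.execute("UPDATE requests SET state = 'DISPATCHED' WHERE id = ?", (req_id,))
db.commit()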

5 CNAF Setup (1)
STK L5500 silos (5500 slots, partitioned with 2 form-factor slots, about 2000 LTO2 and 9940B 200 GB cartridges, total capacity ~1.1 PB uncompressed)
6 LTO2 and 9940B drives, 2 Gbit/s FC interface (3 more 9940B drives about to be acquired)
Sun Blade v100 with 2 internal IDE disks with software RAID-1, running ACSLS 7.0 on Solaris 9.0
13 tape servers on the SAN
~40 disk servers attached to a SAN with full redundancy: FC 2 Gb/s or 4 Gb/s (latest) connections, dual-controller hardware and Qlogic SANsurfer Path Failover software or vendor-specific software
Disk storage on the SAN: STK FlexLine 600, IBM FastT900, EMC Clariion, …
Core services run on machines with SCSI disks, hardware RAID-1 and redundant power supplies; tape servers and disk servers have lower-level hardware, like WNs.

6 CNAF Setup (2)
CASTOR core services on 4 machines:
castor-6: rhserver, stager, rtcpclientd, MigHunter, cleaningDaemon
castorlsf01: LSF master, rmmaster, expertd
dlf01: dlfserver, Cmonit, Oracle for DLF
castor-8: nsdaemon, vmgr, vdqm, msgd, cupvd
2 more machines for the Name Server Oracle DB (Oracle 9.2) and the Stager Oracle DB (Oracle 10.2)
2 SRMv1 endpoints, DNS balanced:
srm://castorsrm.cr.cnaf.infn.it:8443 (used for disk pools with tape backend)
srm://sc.cr.cnaf.infn.it: (used for the disk-only pool for ATLAS)
srm://srm-lhcb-durable.sc.cr.cnaf.infn.it (used as disk-only pool for LHCb)
1 SRM v2.2 endpoint: srm://srm-v2.cr.cnaf.infn.it:8443
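To make the split between the endpoints concrete, here is a tiny, hypothetical helper that picks one of the endpoints listed above according to the experiment and to whether the data needs a tape backend. The mapping is only what this slide states, and the function itself is not part of any CASTOR or SRM client.

# Hypothetical helper reflecting the endpoint layout described above.
ENDPOINTS = {
    ("any", "tape"): "srm://castorsrm.cr.cnaf.infn.it:8443",        # disk pools with tape backend
    ("atlas", "disk"): "srm://sc.cr.cnaf.infn.it",                   # disk-only pool for ATLAS
    ("lhcb", "disk"): "srm://srm-lhcb-durable.sc.cr.cnaf.infn.it",   # disk-only pool for LHCb
}

def choose_endpoint(experiment: str, storage: str) -> str:
    """Return the SRMv1 endpoint for an experiment and a storage type ('tape' or 'disk')."""
    if storage == "tape":
        return ENDPOINTS[("any", "tape")]
    return ENDPOINTS[(experiment.lower(), "disk")]

print(choose_endpoint("ATLAS", "disk"))
print(choose_endpoint("CMS", "tape"))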

7 CNAF Setup (3)
Svc class       Exp     Disk pool    Garbage Collector  Size (TB)
alice           ALICE   alice1       yes                22
cms             CMS     cms1                            100
atlas           ATLAS   atlas1                          22.5
atlasdisk       ATLAS   atlas1disk   no                 77
lhcb            LHCb    lhcb1                           13
lhcbdisk        LHCb    lhcb1disk                       33
argo            ARGO    argo1                           8.3
argo_download   ARGO    argo2                           2.2
ams             AMS     ams1                            2.7
pamela          PAMELA  pamela1                         3.6
magic           MAGIC   archive1                        1.6
babar           BABAR
lvd             LVD
virgo           VIRGO
cdf             CDF
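For the service classes with the garbage collector enabled, the disk pool behaves as a cache in front of tape: when usage passes a threshold, old, already-migrated files are dropped. The sketch below is a generic LRU watermark cleaner, not CASTOR2's actual GC policy; the file records, pool size and watermarks are made up for the example.

# Toy pool state: (castor_path, last_access_epoch, size_TB); all files already on tape.
pool = [
    ("/castor/cnaf.infn.it/cms/fileA", 1_170_000_000, 0.4),
    ("/castor/cnaf.infn.it/cms/fileB", 1_175_000_000, 1.1),
    ("/castor/cnaf.infn.it/cms/fileC", 1_172_000_000, 0.7),
]
POOL_SIZE_TB = 2.0           # e.g. a small slice of a pool
HIGH_WATERMARK = 0.90        # start cleaning above 90% usage
LOW_WATERMARK = 0.70         # stop cleaning below 70% usage

used = sum(size for _, _, size in pool)
if used > HIGH_WATERMARK * POOL_SIZE_TB:
    # Drop least recently accessed files first until we are under the low watermark.
    for path, _, size in sorted(pool, key=lambda f: f[1]):
        print(f"garbage-collect {path} ({size} TB)")
        used -= size
        if used <= LOW_WATERMARK * POOL_SIZE_TB:
            break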

8 Production experience (1)
Improvements after the CASTOR1 -> CASTOR2 upgrade:
- scalability: no catalogue limits; at present O(10^6) files in the stager catalogue, equivalent to >= 10 independent CASTOR1 stagers
- better logic in tape recall that minimizes the number of mount/dismount operations
- states for file systems and disk servers, which makes temporary unavailability and maintenance interventions easier to handle
Limitations that cause instability/inefficiency:
- rmmaster/LSF plugin melt-down when the number of PEND jobs is >= 1000 (a throttling sketch follows this list)
- prepareToGet and putDone generate a "useless" LSF job
- stager catalogue management/clean-up: the admin interface is insufficient, so many SQL operations have to be done directly in the stager Oracle DB; easy to make mistakes
- gsiftp is an external protocol: there is no way to tune parameters per protocol (rfio, gsiftp) in order to limit disk server load properly; it is only possible to modify the number of LSF slots, but that is not enough; GridFTP v2 will come as an internal protocol, like rfio, but not soon
- documentation still poor
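Until the new plug-in arrives, the ~1000 PEND-job melt-down essentially has to be handled by throttling on the submission side. The sketch below is one hypothetical way to do that; get_pending_jobs() is a placeholder (in practice something that counts PEND jobs in the CASTOR LSF queue), not an existing CASTOR or LSF API, and the limit and back-off interval are invented.

import time

PEND_LIMIT = 800   # stay safely below the ~1000-job melt-down threshold

def get_pending_jobs() -> int:
    """Placeholder: would count PEND jobs in the CASTOR LSF queue."""
    return 750

def submit_when_safe(submit, request):
    """Throttle submissions so the scheduler queue never reaches the melt-down region."""
    while get_pending_jobs() >= PEND_LIMIT:
        time.sleep(30)            # back off and re-check the queue
    submit(request)

submit_when_safe(lambda r: print("submitting", r), "/castor/cnaf.infn.it/user/a/file1")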

9 Production experience (2)
CSA06: ~70 TB in 1 month, up to 4-6 TB/day. Main problem: instability.
LoadTest07: much better throughput (200 MB/s) and stability, thanks to the larger number of disk servers and the admins' increased experience. But stability is still an issue.

10 Monitoring & notifications (1)

11 Monitoring & notifications (2)

12 Monitoring & notifications (3)
Checks: ping, ssh, local disk space, RAID-1 status, daemons, number of LSF PEND jobs, disk and tape free space, number of gridftp connections.
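A few of these checks are simple enough to sketch. The snippet below is an illustrative probe for three of them (ping, ssh reachability, local disk space); the hostname and thresholds are invented, and a real setup would feed these results into whatever notification system is in use.

import shutil
import socket
import subprocess

HOST = "diskserv-example.cr.cnaf.infn.it"   # hypothetical disk server name

# ping: one ICMP echo request, 2-second timeout (Linux ping syntax).
alive = subprocess.call(["ping", "-c", "1", "-W", "2", HOST],
                        stdout=subprocess.DEVNULL) == 0
print("ping ok:", alive)

# ssh: just check that something is listening on port 22.
try:
    socket.create_connection((HOST, 22), timeout=5).close()
    print("ssh port open: True")
except OSError:
    print("ssh port open: False")

# local disk space: report the free percentage of the local filesystem.
usage = shutil.disk_usage("/")
print("local disk free %:", round(100 * usage.free / usage.total, 1))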

13 Open issues and future plans
Upgrade to v2.1.3-x => new LSF plug-in:
- no more 1000-job limit on PEND requests in the CASTOR queue
- no more useless prepareToGet and putDone LSF jobs
- expected within May for the Tier1s; right now being tested at CERN by ATLAS
SRM v2.2: already running in test mode; in production after the CASTOR2 upgrade.
No more DISK1TAPE0: CASTOR2 still needs many changes to provide this storage class. For example, there is no notion of "no space left on device", and there are no clear priorities/plans (meeting tomorrow to define priorities). All CMS TAPE0DISK1 disk servers have been moved to TAPE1DISK0; ATLAS and LHCb still use TAPE0DISK1 but will have to move to StoRM+GPFS for this use case.
Reduce the points of failure from 5 to 2: Oracle RAC for the DBs; Stager and LSF master on the same machine.
Additional data access protocols, in particular xrootd (will come with version 2.1.3):
- handles concurrent accesses with a single open/close from the CASTOR point of view
- fewer operations to access a file
- smaller access time: keeps an in-memory catalogue with the file locations and keeps the connection open for recurrent requests (sketched below)
- load balancing between disk servers
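The "smaller access time" point boils down to caching: remember where a file lives and keep the data-server connection open across requests. The sketch below only illustrates that caching idea in generic terms; it is not the xrootd implementation, and the lookup/connect helpers are placeholders.

class LocationCache:
    """Toy in-memory catalogue of file locations plus reusable connections."""

    def __init__(self, lookup, connect):
        self.lookup = lookup          # placeholder: asks the redirector/stager where a file is
        self.connect = connect        # placeholder: opens a connection to a disk server
        self.locations = {}           # castor path -> disk server
        self.connections = {}         # disk server -> open connection

    def open(self, path):
        # Resolve the location only on the first access to this path.
        if path not in self.locations:
            self.locations[path] = self.lookup(path)
        server = self.locations[path]
        # Reuse an already open connection to that disk server if there is one.
        if server not in self.connections:
            self.connections[server] = self.connect(server)
        return server, self.connections[server]

cache = LocationCache(lookup=lambda p: "diskserv1", connect=lambda s: f"conn-to-{s}")
print(cache.open("/castor/cnaf.infn.it/lhcb/file1"))   # first access: lookup + connect
print(cache.open("/castor/cnaf.infn.it/lhcb/file1"))   # recurrent access: both cached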

14

15 [Throughput plot for disk servers and tape servers: max 301.3 MB/s, average 106.9 MB/s]

16
Jan 2006 – first installation, v2…: SC4 dteam throughput phase, 180 MB/s disk-disk, 70 Mb/s
Jun-Sep 2006 – migration CASTOR1 -> CASTOR2
Sep 2006 – upgrade to v…
Nov 2006 – upgrade to v…
May-Jun 2007 – upgrade from v… to v2.1.3-x … ?

17 XROOTD as internal protocol
[Sequence diagram: an XROOT client requests /castor/cern.ch/… from the XROOT redirector; the redirector talks to the stager through the XROOTD C++ API (Get/Put), and the stager schedules only the file open/close; the client is then redirected to the disk server, which performs the open and the data transfer (steps 1-8).]
If there are concurrent accesses to one file:
- steps 2, 3 are skipped
- steps 6, 8 are only issued once
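The short-circuit for concurrent accesses can be sketched as a redirector that only goes through the stager for the first open of a file and only issues the close when the last client is done. This is a conceptual sketch, not the XROOTD C++ API; the schedule_open/schedule_close callables stand in for the real scheduled open/close.

class Redirector:
    """Toy redirector: one scheduled open/close per file, however many clients attach."""

    def __init__(self, schedule_open, schedule_close):
        self.schedule_open = schedule_open     # placeholder for the scheduled file open
        self.schedule_close = schedule_close   # placeholder for the scheduled file close
        self.open_files = {}                   # path -> [disk server, reference count]

    def client_open(self, path):
        if path not in self.open_files:
            server = self.schedule_open(path)  # steps 2-3: only for the first client
            self.open_files[path] = [server, 0]
        self.open_files[path][1] += 1
        return self.open_files[path][0]        # redirect the client to the disk server

    def client_close(self, path):
        entry = self.open_files[path]
        entry[1] -= 1
        if entry[1] == 0:                      # steps 6, 8: only once, for the last client
            self.schedule_close(path)
            del self.open_files[path]

r = Redirector(schedule_open=lambda p: "diskserv2", schedule_close=lambda p: None)
print(r.client_open("/castor/cern.ch/user/file"))    # first open: goes through the scheduler
print(r.client_open("/castor/cern.ch/user/file"))    # concurrent open: no new scheduling
r.client_close("/castor/cern.ch/user/file")
r.client_close("/castor/cern.ch/user/file")          # last close triggers the scheduled close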

18 The LSF problem
[Diagram: the stager inserts pending jobs into the DB; the LSF scheduler plugin does a linear scan of the pending-job queue and queries the stager fs svc in the DB to pick the best filesystem; rmmaster/rmnode track the disk servers.]
The LSF plugin is not multithreaded, so everything is sequential, including DB access.
The latency of the network to the DB kills us; the DB itself is idle, and we would not see this latency if the plugin were multithreaded.
Due to the linear scan, the queue length is limited, currently to ~2000 jobs.
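A quick back-of-the-envelope calculation shows why the DB round trips dominate even though the DB itself is idle. The numbers below (round-trip latency, queries per job) are assumptions for illustration, not measurements from the CNAF setup.

# Rough model of one sequential scheduling pass of the single-threaded plugin.
pending_jobs = 2000          # the current practical queue limit
db_round_trip_s = 0.005      # assumed 5 ms network latency to the Oracle DB
queries_per_job = 3          # assumed: a few fs-svc lookups per job during the linear scan

pass_time_s = pending_jobs * queries_per_job * db_round_trip_s
print(f"one scan of the pending queue: ~{pass_time_s:.0f} s")   # ~30 s spent waiting on the network
# With the stats held in shared memory (next slide) the same scan does no DB round trips at all.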

19 The new architecture
[Diagram: the stager inserts pending jobs; the scheduler plugin does the linear scan and the best-filesystem selection against shared memory; RmMaster collects the rmNode stats from the disk servers; the DB is kept for backup and persistency.]
Usage of shared memory: shared between RmMaster and the plugin, containing the rmNode stats.
The database is used for regular backups and for initialization when restarting.
Best-filesystem selection is implemented in C++ inside the plugin.
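A minimal Python sketch of the shared-memory idea (the real code is C++ inside the LSF plugin; the segment name, filesystem names and numbers here are invented): RmMaster publishes the rmNode stats in a shared segment, and the scheduler plugin reads them and picks the best filesystem without any DB round trip.

from multiprocessing import shared_memory

# "RmMaster" side: publish per-filesystem free-space stats (GB) in shared memory.
filesystems = ["diskserv1:/data1", "diskserv1:/data2", "diskserv2:/data1"]
stats = shared_memory.ShareableList([0.0] * len(filesystems), name="rmnode_stats")
stats[0], stats[1], stats[2] = 120.5, 80.0, 300.2   # updated periodically from rmNode reports

# "Scheduler plugin" side: attach to the same segment and pick the best filesystem
# without any DB access (the DB is only used for backup and initialization).
view = shared_memory.ShareableList(name="rmnode_stats")
best = max(range(len(filesystems)), key=lambda i: view[i])
print("best filesystem:", filesystems[best])

# Clean up the shared segment (a real daemon would keep it alive).
view.shm.close()
stats.shm.close()
stats.shm.unlink()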

