CASTOR2@CNAF Status and plans Giuseppe Lo Re INFN-CNAF 8/05/2007

Outline
- Introduction to CASTOR2
- CNAF Setup
- Production experience
- Monitoring & notifications
- Open issues and future plans

Introduction to CASTOR2 (1)
- System for disk caches and transparent tape media management.
- Provides a name space that looks like a file system (/castor/cnaf.infn.it/xyz).
- Assumes a mass-storage backend; it is not designed to work as a stand-alone disk cache.
- CASTOR2 tries to address the limitations of CASTOR1:
  - stager catalogue performance degrades when the number of files exceeds ~200k;
  - static migration streams and not enough optimization of migration/recall (e.g. requests for files on the same tape are not always processed together and in the right order);
  - the code is very old and difficult to maintain.
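
For readers new to CASTOR, a minimal sketch of what the file-system-like name space means in practice, assuming the standard CASTOR client tools (nsls, stager_qry, rfcp) are installed on the client machine; the file name used is a hypothetical example:

```python
#!/usr/bin/env python
# Illustrative sketch only: browsing the CASTOR name space and reading a file
# with the standard CASTOR/RFIO client tools. The file name below is a
# hypothetical example, not an actual CNAF entry.
import subprocess

CASTOR_DIR = "/castor/cnaf.infn.it/xyz"           # name space path from the slide
CASTOR_FILE = CASTOR_DIR + "/file.root"           # hypothetical file

def run(cmd):
    """Run a CASTOR client command and return its stdout."""
    out = subprocess.run(cmd, capture_output=True, text=True)
    return out.stdout.strip()

# List a directory in the CASTOR name space (it looks like a regular file system).
print(run(["nsls", "-l", CASTOR_DIR]))

# Ask the stager whether the file is already on disk or only on tape.
print(run(["stager_qry", "-M", CASTOR_FILE]))

# Copy the file out through RFIO; a tape recall is triggered transparently
# if the file is not in the disk cache.
subprocess.run(["rfcp", CASTOR_FILE, "/tmp/file.root"], check=True)
```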

Introduction to CASTOR2 (2)
[Architecture diagram: LSF master, rtcpclientd, MigHunter, stager, cleaning, rhserver, dlfserver, rmmaster, expertd; stager and DLF databases on Oracle DB servers; disk pools.]
- Database centric: requests are queued in the Oracle stager database and dispatched as jobs to LSF.
- Disk server autonomy as far as possible: each disk server is in charge of its local resources (file system selection and execution of garbage collection).
- Support for various access protocols: currently rfio, root, gridftp; under way: xrootd, gridftp v2.
- All components log to the Distributed Logging Facility.
- Note: central services and the tape layer are largely unchanged.
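
To make the database-centric design concrete, here is a conceptual sketch of the pattern (requests queued in a table, then dispatched as LSF jobs). This is not CASTOR code: it uses sqlite3 in place of Oracle, and the LSF queue name and the stagerJob.sh wrapper are hypothetical.

```python
# Conceptual sketch of the "database centric" request flow described above:
# client requests are queued in a database table and a dispatcher turns them
# into LSF jobs. NOT CASTOR code; sqlite3 stands in for Oracle, and the queue
# name and job wrapper are hypothetical.
import sqlite3
import subprocess

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, castorfile TEXT, status TEXT)")

# 1. A request handler queues the request in the stager database.
db.execute("INSERT INTO requests (castorfile, status) VALUES (?, ?)",
           ("/castor/cnaf.infn.it/xyz/file.root", "PENDING"))
db.commit()

# 2. A dispatcher picks up pending requests and submits them as LSF jobs;
#    LSF then schedules each job on a suitable disk server.
pending = db.execute("SELECT id, castorfile FROM requests WHERE status = 'PENDING'").fetchall()
for req_id, castorfile in pending:
    subprocess.run(["bsub", "-q", "castor", "stagerJob.sh", castorfile])  # hypothetical wrapper
    db.execute("UPDATE requests SET status = 'DISPATCHED' WHERE id = ?", (req_id,))
db.commit()
```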

CNAF Setup (1)
- Core services run on machines with SCSI disks, hardware RAID-1 and redundant power supplies; tape servers and disk servers have lower-level hardware, like WNs.
- ACSLS 7.0 running on a Sun Blade v100 with 2 internal IDE disks in software RAID-1, OS Solaris 9.0.
- STK L5500 silos (5500 slots, partitioned with 2 form-factor slots: about 2000 for LTO-2 and 3500 for 9940B, 200 GB cartridges, total capacity ~1.1 PB uncompressed).
- 6 LTO-2 + 7 9940B drives, 2 Gb/s FC interface, 20-30 MB/s rate (3 more 9940B drives about to be acquired).
- 13 tape servers attached to the SAN.
- ~40 disk servers attached to a SAN with full redundancy: FC 2 Gb/s or 4 Gb/s (latest) connections, dual-controller hardware and Qlogic SANsurfer Path Failover software or vendor-specific software.
- Disk storage: STK FlexLine 600, IBM FAStT900, EMC CLARiiON, …

CNAF Setup (2)
- CASTOR core services v2.1.1-9 on 4 machines:
  - castor-6: rhserver, stager, rtcpclientd, MigHunter, cleaningDaemon
  - castorlsf01: LSF master, rmmaster, expertd
  - dlf01: dlfserver, Cmonit, Oracle for DLF
  - castor-8: nsdaemon, vmgr, vdqm, msgd, cupvd
- 2 more machines for the name server DB (Oracle 9.2) and the stager DB (Oracle 10.2).
- 2 SRMv1 endpoints, DNS balanced:
  - srm://castorsrm.cr.cnaf.infn.it:8443 (used for disk pools with tape backend)
  - srm://sc.cr.cnaf.infn.it:8443 (used for the disk-only pool for ATLAS)
  - srm://srm-lhcb-durable.sc.cr.cnaf.infn.it (used for the disk-only pool for LHCb)
- 1 SRMv2.2 endpoint: srm://srm-v2.cr.cnaf.infn.it:8443

CNAF Setup (3)

Svc class      Exp     Disk pool   Garbage Collector  Size (TB)
alice          ALICE   alice1      yes                22
cms            CMS     cms1                           100
atlas          ATLAS   atlas1                         22.5
atlasdisk      ATLAS   atlas1disk  no                 77
lhcb           LHCb    lhcb1                          13
lhcbdisk       LHCb    lhcb1disk                      33
argo           ARGO    argo1                          8.3
argo_download  ARGO    argo2                          2.2
ams            AMS     ams1                           2.7
pamela         PAMELA  pamela1                        3.6
magic          MAGIC   archive1                       1.6
babar          BABAR
lvd            LVD
virgo          VIRGO
cdf            CDF

Production experience (1)
Improvements after the CASTOR1 → CASTOR2 upgrade:
- Scalability: no catalogue limits; at present O(10^6) files in the stager catalogue, equivalent to ≥10 independent CASTOR1 stagers.
- Better logic in tape recall that minimizes the number of mount/dismount operations.
- States for file systems and disk servers, which make temporary unavailability and maintenance interventions easier to handle.
Limitations that cause instability/inefficiency:
- The rmmaster/LSF plugin melts down when the number of PEND jobs is ≥ 1000.
- prepareToGet and putDone generate a "useless" LSF job.
- Stager catalogue management/clean-up: the admin interface is insufficient, so many SQL operations have to be done directly in the stager's Oracle DB; it is easy to make mistakes.
- gsiftp is an external protocol: there is no way to tune parameters per protocol (rfio, gsiftp) in order to properly limit the disk server load; it is only possible to modify the number of LSF slots, which is not enough. GridFTP v2 will come as an internal protocol, like rfio, but is not imminent.
- Documentation is still poor.

Production experience (2)
- CSA06: ~70 TB in 1 month, up to 4-6 TB/day. Main problem: instability.
- LoadTest07: much better in throughput (200 MB/s) and stability, thanks to the increased number of disk servers and the admins' growing experience. But stability is still an issue.

Monitoring & notifications (1)

Monitoring & notifications (2)

Monitoring & notifications (3)
Checks in place:
- ping
- ssh
- local disk space
- RAID-1 status
- daemons
- number of LSF PEND jobs
- disk and tape free space
- number of gridftp connections
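
A minimal sketch of the kind of notification script behind these checks (ping and LSF PEND jobs only), assuming standard tools (ping, LSF's bjobs, a local mail server) are available on the monitoring host; host names, thresholds and mail addresses are hypothetical:

```python
#!/usr/bin/env python
# Illustrative monitoring sketch: ping the disk servers and count LSF PEND jobs,
# sending a mail notification on problems. Hosts, thresholds and addresses are
# hypothetical examples.
import subprocess
import smtplib
from email.mime.text import MIMEText

DISK_SERVERS = ["diskserv-01.cr.cnaf.infn.it"]   # hypothetical host list
PEND_LIMIT = 1000                                # see the LSF plugin limit above

def alert(message):
    """Send a notification mail via the local MTA (hypothetical addresses)."""
    msg = MIMEText(message)
    msg["Subject"] = "[CASTOR2 monitoring] " + message
    msg["From"] = "castor-mon@cnaf.infn.it"
    msg["To"] = "castor-admins@cnaf.infn.it"
    smtplib.SMTP("localhost").send_message(msg)

def check_ping(host):
    """Return True if the host answers a single ping within 2 seconds."""
    return subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          stdout=subprocess.DEVNULL).returncode == 0

def count_pending_lsf_jobs():
    """Count pending jobs for all users ('bjobs -u all -p'), dropping the header line."""
    out = subprocess.run(["bjobs", "-u", "all", "-p"], capture_output=True, text=True)
    lines = out.stdout.strip().splitlines()
    return max(len(lines) - 1, 0)

for host in DISK_SERVERS:
    if not check_ping(host):
        alert("host %s does not answer to ping" % host)

if count_pending_lsf_jobs() >= PEND_LIMIT:
    alert("number of LSF PEND jobs is above %d" % PEND_LIMIT)
```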

Open issues and future plans
- Upgrade from 2.1.1-9 to 2.1.3-x => new LSF plug-in:
  - no 1000-job limit on PEND requests in the CASTOR queue;
  - no useless prepareToGet and putDone LSF jobs;
  - expected within May for the Tier-1s; right now being tested at CERN by ATLAS.
- SRM v2.2 already running in test mode; in production after the CASTOR2 upgrade.
- No more DISK1TAPE0: CASTOR2 still needs many changes to provide this storage class (for example, there is no notion of "no space left on device"), and there are no clear priorities/plans; a meeting is scheduled tomorrow to define priorities. All CMS TAPE0DISK1 disk servers have been moved to TAPE1DISK0; ATLAS and LHCb are still on TAPE0DISK1 but will have to move to StoRM+GPFS for this use case.
- Reduce the number of points of failure from 5 to 2:
  - Oracle RAC for the DBs;
  - stager and LSF master on the same machine.
- Additional data access protocols, in particular xrootd (coming with version 2.1.3):
  - handles concurrent accesses with a single open/close from the CASTOR point of view, i.e. fewer operations to access a file;
  - smaller access time: keeps an in-memory catalogue with the file locations and keeps the connection open for recurrent requests;
  - load balancing between disk servers.

[Throughput plots for disk servers and tape servers: max 301.3 MB/s, average 106.9 MB/s]

- Jan 2006: first installation, v2.0.1-3; SC4 dteam throughput phase: 180 MB/s disk-disk, 70 Mb/s.
- Jun-Sep 2006: migration CASTOR1 → CASTOR2.
- Sep 2006: v2.0.1-3 → v2.1.0-6.
- Nov 2006: v2.1.0-6 → v2.1.1-9.
- May-Jun 2007: v2.1.1-9 → v2.1.3-x … ?

XROOTD as internal protocol
[Diagram: an XROOT client request for /castor/cern.ch/… goes to the XROOT redirector; the stager (via the C++ API, Get/Put) schedules only the file open/close; the client is then redirected to the disk server running XROOTD, which performs the transfer (steps 1-8).]
If there are concurrent accesses to one file:
- steps 2, 3 are skipped;
- steps 6, 8 are only issued once.
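
A conceptual sketch, not actual CASTOR/xrootd code, of why concurrent accesses cost a single open/close on the CASTOR side: the redirector keeps an in-memory catalogue of open files and only involves the stager for the first open and the last close. The stager interface (schedule_open/schedule_close) is an assumed simplification.

```python
# Conceptual sketch of the concurrent-access behaviour described above.
class RedirectorCatalogue:
    def __init__(self, stager):
        self.stager = stager          # object with schedule_open/schedule_close (assumed interface)
        self.open_files = {}          # castor path -> [disk server location, client count]

    def client_open(self, path):
        if path not in self.open_files:
            # First client: one scheduled open in CASTOR (steps 2-3 of the diagram).
            location = self.stager.schedule_open(path)
            self.open_files[path] = [location, 0]
        # Further clients are redirected to the same disk server; steps 2-3 are skipped.
        self.open_files[path][1] += 1
        return self.open_files[path][0]

    def client_close(self, path):
        entry = self.open_files[path]
        entry[1] -= 1
        if entry[1] == 0:
            # Last client gone: the CASTOR-side close (steps 6, 8) is issued only once.
            self.stager.schedule_close(path)
            del self.open_files[path]
```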

The LSF problem
[Diagram: Stager → Scheduler plugin (stager fs svc) → DB; rmmaster/rmnode on the disk servers; the plugin performs a linear scan over the pending jobs and queries the DB to find the best filesystem for each one.]
- The LSF plugin is not multithreaded, so everything is sequential, including DB access.
- The latency of the network to the DB kills us; the DB itself is idle (we would not see this if the plugin were multithreaded).
- Due to the linear scan, the queue length is limited, currently to ~2000 jobs.

The new architecture
[Diagram: Stager, Scheduler plugin, shared memory, Database, RmMaster, rmNode, Disk Servers; the plugin's linear scan over the pending jobs and the best-filesystem choice now use the shared memory instead of the database.]
- Shared memory between RmMaster and the plugin, containing the rmNode statistics.
- The database is only used for regular backups (persistency) and for initialization when restarting.
- The best-filesystem selection is implemented in C++ inside the plugin.
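
A conceptual sketch of the new scheduling path: the real implementation is C++ inside the LSF plugin reading rmNode statistics from shared memory, while this Python illustration only shows the idea of choosing the best filesystem from in-memory statistics instead of one database round trip per pending job. Field names and the selection policy are assumptions.

```python
# Illustrative sketch: best-filesystem selection from in-memory statistics,
# as opposed to per-job database queries. Field names and the weighting are
# assumptions; in CASTOR2 this logic lives in C++ inside the LSF plugin.
from dataclasses import dataclass

@dataclass
class FileSystemStats:
    diskserver: str
    mountpoint: str
    free_space: float     # bytes, as reported by rmNode
    n_streams: int        # currently running transfers

def best_filesystem(stats, candidates):
    """Pick the least loaded candidate filesystem from the in-memory stats."""
    usable = [fs for fs in stats if (fs.diskserver, fs.mountpoint) in candidates]
    # Prefer few running streams, then more free space (illustrative policy).
    return min(usable, key=lambda fs: (fs.n_streams, -fs.free_space))

# 'stats' would be refreshed by RmMaster from the rmNode daemons and shared
# with the scheduler plugin; here it is just a list held in memory.
stats = [
    FileSystemStats("diskserv-01", "/data1", 2.0e12, 3),
    FileSystemStats("diskserv-02", "/data1", 0.5e12, 1),
]
print(best_filesystem(stats, {("diskserv-01", "/data1"), ("diskserv-02", "/data1")}))
```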