CASTOR CNAF TIER1 SITE REPORT Geneve CERN 13-14 June 2005 Ricci Pier Paolo

Presentation transcript:

CASTOR CNAF TIER1 SITE REPORT, Geneve CERN, 13-14 June 2005, Ricci Pier Paolo

Slide 2: TIER1 CNAF PRESENTATION
- Hardware and software status of our CASTOR installation and management tools
- Usage of our installation by the LHC experiments
- Comments

Slide 3: MANPOWER
At present there are 4 people at TIER1 CNAF involved in administering our CASTOR installations and front-ends:
- Ricci Pier Paolo (50%; also active in SAN/NAS HA disk storage management and testing, and Oracle administration)
- Lore Giuseppe (50%; also active in the ALICE experiment as Tier1 reference, in SAN HA disk storage management and testing, and in managing the Grid front-end to our resources)
- Vilucchi Elisabetta (50%; involved in Oracle and RLS development and administration, and in SAN disk storage management and testing)
We also have 1 CNAF FTE (Lopresti) working with the development team at CERN, started March 2005.

Slide 4: HARDWARE STATUS
At present our CASTOR system consists of:
- 1 STK L5500 silo, partitioned with 2 slot form factors: about 2000 LTO-2 slots and about 3500 9940B slots
- 6 LTO-2 drives with 2 Gb/s FC interface
- 2 9940B drives with 2 Gb/s FC interface; 2 more have just been acquired and will be installed by the end of June
- 1 Sun Blade v100 with 2 internal IDE disks in software RAID-0, running ACSLS
- LTO-2 Imation tapes and 9940B Imation tapes

Slide 5: HARDWARE STATUS (2)
- 8 tape servers: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA, running the STK CSC Development Toolkit provided by CERN (under licence agreement with STK): ssi, tpdaemon and rtcpd.
- The 8 tape servers are connected directly to the FC drive outputs:
  DRIVE LTO-2 0,0,10,0 -> tapesrv-0.cnaf.infn.it
  DRIVE LTO-2 0,0,10,1 -> tapesrv-1.cnaf.infn.it
  DRIVE LTO-2 0,0,10,2 -> tapesrv-2.cnaf.infn.it
  DRIVE LTO-2 0,0,10,3 -> tapesrv-3.cnaf.infn.it
  DRIVE LTO-2 0,0,10,4 -> tapesrv-4.cnaf.infn.it
  DRIVE LTO-2 0,0,10,5 -> tapesrv-5.cnaf.infn.it
  DRIVE 9940B 0,0,10,6 -> tapesrv-6.cnaf.infn.it
  DRIVE 9940B 0,0,10,7 -> tapesrv-7.cnaf.infn.it
- 2 more tape servers (tapesrv-8, tapesrv-9) will be installed soon with the 2 new 9940B drives.
- Using the 9940B drives has drastically reduced the error rate: we report only one 9940 tape marked RDONLY due to a SCSI error, and we NEVER had "hung" drives in 6 months of activity.

Slide 6: HARDWARE STATUS (3)
- castor.cnaf.infn.it, central machine: 1 IBM x345 2U machine, 2x 3 GHz Intel Xeon, RAID-1, double power supply, O.S. Red Hat A.S. 3.0. It runs all central CASTOR services (nsdaemon, vmgrdaemon, Cupvdaemon, vdqmdaemon, msgdaemon) and the ORACLE client for the central database.
- castor-4.cnaf.infn.it, ORACLE machine: 1 IBM x345, O.S. Red Hat A.S. 3.0, running ORACLE Database 9i release 2.
- 2 more x345 machines are in standby; they store all the backup information of the ORACLE db (.exp, .dbf) and can replace the above machines if needed.
- castor-1.cnaf.infn.it, monitoring machine: 1 DELL 1650, R.H. 7.2. It runs the CASTOR monitoring service (Cmon daemon) and the central NAGIOS service for monitoring and notification. It also hosts the rtstat and tpstat commands, which are usually run with the -S option against the tape servers (a small polling sketch follows below).
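As an illustration of the monitoring described above, a minimal sketch of polling each tape server with rtstat/tpstat is shown here. It assumes, as the slide suggests, that both commands accept "-S <tapeserver>"; the exact options should be checked against the local CASTOR release, and the host list is the one from slide 5.

```python
#!/usr/bin/env python3
# Minimal sketch: poll each tape server with the CASTOR rtstat/tpstat
# utilities from the monitoring host. Assumes both commands accept
# "-S <tapeserver>" as described on the slide (verify with the local release).
import subprocess

TAPE_SERVERS = ["tapesrv-%d.cnaf.infn.it" % i for i in range(8)]

def poll(command, host):
    """Run e.g. 'tpstat -S tapesrv-0.cnaf.infn.it' and return (exit code, output)."""
    try:
        out = subprocess.run([command, "-S", host],
                             capture_output=True, text=True, timeout=30)
        return out.returncode, out.stdout
    except (OSError, subprocess.TimeoutExpired) as err:
        return -1, str(err)

if __name__ == "__main__":
    for host in TAPE_SERVERS:
        for cmd in ("rtstat", "tpstat"):
            rc, _ = poll(cmd, host)
            print("%-8s %-28s %s" % (cmd, host, "OK" if rc == 0 else "CHECK"))
```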

Slide 7: HARDWARE STATUS (4)
Stagers with diskserver: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA, accessing our SAN and running Cdbdaemon, stgdaemon and rfiod. One stager for each LHC experiment (an illustrative mapping is sketched below):
- disksrv-1.cnaf.infn.it: ATLAS stager with 2 TB locally
- disksrv-2.cnaf.infn.it: CMS stager with 3.2 TB locally
- disksrv-3.cnaf.infn.it: LHCB stager with 3.2 TB locally
- disksrv-4.cnaf.infn.it: ALICE stager with 3.2 TB locally
- disksrv-5.cnaf.infn.it: TEST and PAMELA stager
- disksrv-6.cnaf.infn.it: stager with 2 TB locally (archive purposes: LVD, ALICE TOF, CDF, VIRGO, AMS, BABAR, ARGO and other HEP experiments)
Diskservers: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA, accessing our SAN and running rfiod. Red Hat 3.0 Cluster has been tested but is not used in production for the rfiod.
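Since each experiment has its own dedicated stager, client-side tooling mainly needs to pick the right stager host. A small illustrative mapping (hostnames and staging sizes copied from the slide; the helper name is hypothetical):

```python
# Illustrative mapping of experiments to their dedicated CASTOR stagers,
# as listed on the slide (one stager per LHC experiment, plus shared ones).
STAGERS = {
    "atlas":   {"host": "disksrv-1.cnaf.infn.it", "staging_tb": 2.0},
    "cms":     {"host": "disksrv-2.cnaf.infn.it", "staging_tb": 3.2},
    "lhcb":    {"host": "disksrv-3.cnaf.infn.it", "staging_tb": 3.2},
    "alice":   {"host": "disksrv-4.cnaf.infn.it", "staging_tb": 3.2},
    "test":    {"host": "disksrv-5.cnaf.infn.it", "staging_tb": None},  # TEST and PAMELA
    "archive": {"host": "disksrv-6.cnaf.infn.it", "staging_tb": 2.0},   # LVD, CDF, VIRGO, ...
}

def stage_host(experiment):
    """Return the stager host a client would export as STAGE_HOST (hypothetical helper)."""
    return STAGERS[experiment.lower()]["host"]

if __name__ == "__main__":
    print(stage_host("cms"))   # -> disksrv-2.cnaf.infn.it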

Slide 8: HARDWARE STATUS (5)
Storage Element front-end for CASTOR: castorgrid.cr.cnaf.infn.it (DNS alias, load balanced over 4 machines, for WAN GridFTP access). SRM v.1 is installed and in production on these machines.
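For WAN access through this front-end, a client would typically move data with a GridFTP client such as globus-url-copy after obtaining a grid proxy. A minimal sketch is given below; the local source file and the CASTOR target path are made up for illustration only.

```python
# Minimal sketch of a WAN transfer to the castorgrid GridFTP front-end.
# Assumes the Globus client tools are installed and a valid proxy exists
# (grid-proxy-init); the source file and the target path are illustrative.
import subprocess

SE_HOST = "castorgrid.cr.cnaf.infn.it"
LOCAL_FILE = "file:///tmp/example.dat"                                      # hypothetical local file
REMOTE_URL = "gsiftp://%s/castor/cnaf.infn.it/user/example.dat" % SE_HOST   # hypothetical CASTOR path

def transfer(src, dst):
    """Copy one file with globus-url-copy and report success or failure."""
    try:
        return subprocess.call(["globus-url-copy", src, dst]) == 0
    except OSError:
        return False

if __name__ == "__main__":
    print("transfer", "succeeded" if transfer(LOCAL_FILE, REMOTE_URL) else "failed")
```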

Slide 9: TIER1 INFN CNAF Storage (diagram)
Diagram of the TIER1 storage layout: Linux SL 3.0 client nodes on the WAN or TIER1 LAN access the storage via NFS, RFIO, GridFTP and other protocols.
- Tape / HSM (about 400 TB): STK180 with 100 LTO-1 (10 TB native); STK L5500 robot (5500 slots) with 6 IBM LTO-2 and 2 (4) STK 9940B drives; CASTOR HSM servers.
- NAS (about 20 TB): PROCOM 3600 FC NAS units; NAS1/NAS4 3ware IDE units.
- SAN 1 (200 TB, plus 200 TB at the end of June) and SAN 2 (40 TB): AXUS BROWIE (about 2200 GB, 2 FC interfaces); STK BladeStore (4 FC interfaces); Infortrend 4 x 3200 GB SATA A16F-R1A2-M1; IBM FAStT900 (DS 4500), 3/4 units, 4 FC interfaces; Infortrend 5 x 6400 GB SATA A16F-R1211-M2 + JBOD; interconnected through 2 Gadzoox Slingshot FC switches and 2 Brocade Silkworm FC switches.
- H.A. diskservers with Qlogic FC HBA 2340; a W2003 server running LEGATO Networker for backup.

Slide 10: CASTOR HSM (diagram)
- STK L5500 library: 6 LTO-2 drives (20-30 MB/s) and 2 9940B drives (25-30 MB/s); 1300 LTO-2 cartridges (200 GB native) plus 9940B cartridges (200 GB native). Total capacity with 200 GB cartridges: 250 TB LTO-2 (400 TB) and 130 TB 9940B (700 TB); the arithmetic is checked in the sketch after this slide.
- Sun Blade v100 with 2 internal IDE disks in software RAID-1, running ACSLS 7.0 on Solaris.
- CASTOR (CERN) central services server, RH AS 3.0.
- 8 tape servers, Linux RH AS 3.0, Qlogic HBAs, point-to-point 2 Gb/s FC connections to the drives.
- Stagers with diskserver, RH AS 3.0, each with a local staging area (sizes in the table below).
- 1 ORACLE 9i rel 2 DB server, RH AS 3.0.
- Additional rfio diskservers, RH AS 3.0, with a minimum 20 TB staging area (variable), attached to SAN 1 and SAN 2 with fully redundant 2 Gb/s FC connections (dual controller HW and Qlogic SANsurfer Path Failover SW). Access from the WAN or TIER1 LAN.

Per-experiment staging areas and tape pools:
  EXPERIMENT        Staging area (TB)   Tape pool (TB native)
  ALICE             8                   12 (LTO-2)
  ATLAS             6                   20 (MIXED)
  CMS               2                   1 (9940B)
  LHCb              18                  30 (LTO-2)
  BABAR, AMS +oth   2                   4 (9940B)
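The capacity figures quoted on this slide follow directly from the cartridge counts, the slot counts on the earlier hardware slide, and the 200 GB native capacity per cartridge. A short check of the arithmetic:

```python
# Back-of-the-envelope check of the HSM capacity figures quoted on the slide.
# Both LTO-2 and 9940B media hold 200 GB native per cartridge.
NATIVE_GB = 200

lto2_tapes  = 1300   # LTO-2 cartridges currently in the library (this slide)
lto2_slots  = 2000   # LTO-2 form-factor slots (hardware slide)
b9940_slots = 3500   # 9940B form-factor slots (hardware slide)

print("LTO-2 installed :", lto2_tapes * NATIVE_GB / 1000.0, "TB")   # ~260 TB (~250 TB quoted)
print("LTO-2 all slots :", lto2_slots * NATIVE_GB / 1000.0, "TB")   # 400 TB
print("9940B all slots :", b9940_slots * NATIVE_GB / 1000.0, "TB")  # 700 TB
```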

Slide 11: CASTOR Grid Storage Element
GridFTP access goes through the castorgrid SE, a DNS cname pointing to 3 servers, with DNS round-robin for load balancing. During LCG Service Challenge 2 we also introduced a load-average-based selection: every M minutes the IP of the most loaded server is replaced in the cname (see graph). A sketch of this selection step follows.
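The selection described on the slide amounts to: periodically look at the load average of each GridFTP server behind the alias and drop (replace) the most loaded one in the DNS round-robin. A minimal sketch of that step is given below; the member hostnames are hypothetical, the load-average collection is a placeholder for the site's monitoring, and the actual DNS update is left to the site's own tooling.

```python
# Sketch of the "replace the most loaded server in the cname" selection step.
# How load averages are gathered and how the DNS record is rewritten are
# site-specific; both are stubbed out here.
SERVERS = ["castorgrid-1.cr.cnaf.infn.it",   # hypothetical member names
           "castorgrid-2.cr.cnaf.infn.it",
           "castorgrid-3.cr.cnaf.infn.it"]

def collect_load_averages(servers):
    """Placeholder: return {host: 1-minute load average} from the site's monitoring."""
    raise NotImplementedError("hook this up to Nagios or another collector")

def select_members(load_by_host, pool_size=2):
    """Keep the pool_size least-loaded servers behind the round-robin alias."""
    ranked = sorted(load_by_host, key=load_by_host.get)
    return ranked[:pool_size]

if __name__ == "__main__":
    # Example with made-up load values: the most loaded host is dropped.
    sample = {SERVERS[0]: 0.7, SERVERS[1]: 3.9, SERVERS[2]: 1.2}
    print(select_members(sample))   # -> the two least-loaded hosts
```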

Slide 12: NOTIFICATION (Nagios)

Slide 13: MONITORING (Nagios)
Example monitoring graphs: LHCb CASTOR tape pool; number of processes on a CMS disk SE; eth0 traffic through a CASTOR LCG SE.

Slide 14: DISK ACCOUNTING
Accounting charts: pure disk space (TB) and CASTOR disk space (TB).

Slide 15: CASTOR USAGE
Access to the CASTOR system is:
1) via Grid, using our SE front-ends (from the WAN);
2) via rfio, using the castor rpm and the rfio commands installed on our WNs and UIs (from the LAN); a minimal access sketch follows this slide.
Only 17% (65 TB / 380 TB) of the total HSM space was effectively used by the experiments over a 1.5-year period, because:
1) as TIER1 storage we offer "pure" disk as primary storage over SAN (GSIftp, nfs, xrootd, bbftp, GPFS, ...), which the experiments prefer;
2) the lack of optimization in parallel stage-in operations (pre-stage) and the reliability/performance problems seen with LTO-2 give in general very poor performance when reading from CASTOR, so the experiments generally ask for "pure" disk resources (next year's requests are NOT for tape HW).
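For the LAN path, access goes through the rfio client commands installed on the WNs and UIs, with the environment pointing at the experiment's stager. A minimal sketch of copying a file out of CASTOR with rfcp, assuming the standard CASTOR rfio client is installed; the CASTOR namespace path is illustrative only.

```python
# Minimal sketch of LAN access to CASTOR via the rfio client commands
# installed on the WNs/UIs. STAGE_HOST selects the experiment's stager
# (see the per-experiment stager list on slide 7); the path is illustrative.
import os
import subprocess

env = dict(os.environ, STAGE_HOST="disksrv-2.cnaf.infn.it")  # CMS stager, for example

CASTOR_FILE = "/castor/cnaf.infn.it/cms/example/file.root"   # hypothetical path
LOCAL_COPY  = "/tmp/file.root"

# rfcp copies between a CASTOR/rfio path and a local file; reading a file
# that is not on the staging disk triggers a stage-in from tape.
try:
    rc = subprocess.call(["rfcp", CASTOR_FILE, LOCAL_COPY], env=env)
except OSError:
    rc = -1  # rfio client not installed on this node
print("rfcp exit code:", rc)
```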

Slide 16: COMMENTS
As said, we have a lot of disk space to manage and no definitive solution yet (xrootd, GPFS, dCache still to be tested, etc.).
1) CASTOR already has a working SRM interface. Is CASTOR-2 reliable and scalable enough to manage pure disk-pool spaces? We think it should be conceived also for this use (as dCache and xrootd are).
2) The limits of the rfio protocol and of the new stager performance could seriously limit the potential scalability of a pure CASTOR disk pool (e.g. a single open() call needs to query many databases). A single diskserver running rfiod can fulfil only a limited number of requests. At our site we have a limited number of diskservers with a large amount of space each (10-15 TB), and the rfiod limit caused access failures for jobs (we use rfiod for DIRECT access to local filesystems outside CASTOR, e.g. for CMS).
SOLUTION TO FAILURES => possibility to use swap memory.
SOLUTION TO PERFORMANCE => more RAM? Other? Can rfio be modified for our site-specific use?

Slide 17: COMMENTS
1) We need the authorization method in CASTOR to be compatible also with LDAP, not only with the password and group files.
2) It would also be useful to include rfstage (or something similar) in the official release.
3) HA: we are planning to use the 2 stand-by machines as HA for the CASTOR central services and a vdqm replica, plus an Oracle 9i rel 2 stand-by database (Data Guard) or RAC.

Slide 18: CONCLUSION
- It is possible to set up collaborations with other groups (in order to expand the development team at CERN).
- TIER1 and LHC computing with IHEP.
THANK YOU FOR YOUR ATTENTION!