INFN Grid core services at CNAF: current setup
A. Cavalli, D. Dongiovanni, T. Ferrari, P. Veronesi. Steering committee meeting (riunione di cabina), CNAF
Outline
- FTS
- Top-level BDII
- WMS
- VOMS
FTS 1/2
Current version: gLite Update 42
Hardware layout: 3 Dell PowerEdge 1950 servers, each with 2 dual-core CPUs, 8 GB RAM and 2 SATA disks (160 GB, RAID 1)
Hosts: fts01-sc, fts02-sc, fts03-sc
Backend: Oracle cluster
59 channel agents, 8 VOs supported
FTS 2/2
Monitoring of the service:
- profiles managed via Quattor + YAIM
- Nagios checks (agents + web server)
- Lemon monitoring
- operational procedure described on a wiki page
- support: mailing list fts-support<at>cnaf.infn.it and FTS support on the Italian Ticketing System
- FTSMonitor
Configuration details:
- all channel agents are of type URLCOPY
- default SRM version for copies: 2.2
- SRMVERSIONPOLICY: with-space-token
- maximum number of transfer files and streams per channel configured as needed (details available via the command-line client; see the sketch below)
- timeouts tuned as needed (details available in the Quattor profiles)
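The slide notes that per-channel file and stream limits can be inspected with the command-line client. As a minimal sketch, assuming a UI with the gLite FTS client tools installed and a hypothetical channel name, the settings could be dumped like this:

```python
# Minimal sketch: dump the configuration of one FTS channel via the
# gLite command-line client mentioned in the slide. The channel name
# "CERN-CNAF" is an assumption used only for illustration.
import subprocess

def channel_details(channel):
    """Print the parameters (state, files, streams, ...) of one channel."""
    out = subprocess.check_output(
        ["glite-transfer-channel-list", channel], text=True)
    print(out)

if __name__ == "__main__":
    channel_details("CERN-CNAF")  # hypothetical channel name
```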
Top-level BDII, ROC-Italy
CURRENT:
- CNAF: 3 hosts behind a DNS round-robin: egee-bdii => egee-bdii-02, egee-bdii-05, egee-bdii-06 (SL3)
- INFN-PADOVA, INFN-FERRARA: 2 alternative hosts used in case of a major CNAF downtime (the DNS name is temporarily remapped to the external hosts)
- MONITORING: Nagios alerts, manual intervention (automatic DNS updates are not allowed under the current domain)
FUTURE:
- SL4 – gLite 3.1
- CNAF BDIIs moved under a new domain that allows automatic DNS updates
- Nagios: alerts + automatic exclusion of a bad BDII
- all 5 BDIIs integrated in the pool (internal + external)
A sketch of how clients see the round-robin pool is shown below.
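To illustrate the round-robin mechanism, the following sketch resolves the alias and prints the pool members a client would see. The fully qualified alias name is an assumption based on the short name in the slide; 2170 is the standard top-level BDII LDAP port.

```python
# Sketch: resolve the top-level BDII round-robin alias and list the
# hosts currently behind it. The FQDN below is an assumption; the
# slide only gives the short alias "egee-bdii".
import socket

ALIAS = "egee-bdii.cnaf.infn.it"  # assumed fully qualified alias name
BDII_PORT = 2170                  # standard top-level BDII LDAP port

def pool_members(alias):
    """Return the distinct IP addresses the alias currently resolves to."""
    infos = socket.getaddrinfo(alias, BDII_PORT, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # With three healthy CNAF hosts this prints three addresses; during
    # a major downtime the remapped name would show the external hosts.
    print(pool_members(ALIAS))
```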
WMS 1/4 – WMS + LB instances per VO (SL4, SL3, LCG-RB, total):
- ALICE: 1 + 1
- ATLAS: 3.5
- CDF: 2
- CMS: 3 + 2 (total 13.5*)
- LHCB: 1 + 0.5 (total 1.5)
- MULTI-VO: 2 + 0.5 (total 2.5)
Also a WMS + LB (SL4) for middleware test purposes (CMS test).
* 1 WMS + LB (SL4) temporarily borrowed from ATLAS
May 2008 – Overall submission activity on CNAF WMS
VO         SUBMITTED   DONE     COLLECTIONS
ALICE      831         686      -
ATLAS      20083       13520    1325
CDF        3970        2713     2
CMS        751774      583042   131597
LHCB       4958        3528     -
MULTI-VO   14122       11196    50
TOTAL      795738      614685   132974

Peak daily submission rate (per single WMS):
- on production WMS: 11 kjobs/day (wms009)
- including exp. service instances: 27.4 kjobs/day (devel07)
CCRC08, May – Daily submission activity on all 26 CNAF WMS/LB instances
Work in Progress: WMS Load Balancing
To better distribute the load over all available instances, we are working on an automatic load-balancing system:
- A load metric measuring several parameters is computed on each WMS instance.
- Based on the load metric, an ARBITER service ranks all WMS instances and builds a list of the least loaded ("best") WMS.
- The list of WMS is made available behind an alias hostname: pre-prod.wms.cnaf.infn.it
- The user submits to the alias and the jobs automatically go to one of the available WMS.
- The chosen WMS returns a job ID as usual.
A minimal sketch of the ranking step is shown below.
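Since the arbiter is still work in progress, the following is only a sketch of the ranking step under assumed inputs: the metric weights, status fields and host names are illustrative, not the actual CNAF implementation.

```python
# Sketch of the ARBITER ranking step: combine per-instance measurements
# into a single load metric, rank, and publish the least loaded hosts.
# Weights, fields and host names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class WmsStatus:
    host: str
    load15: float         # 15-minute load average
    jobs_queued: int      # jobs waiting in the WMS input queue
    disk_used_pct: float  # sandbox partition usage, percent

def load_metric(s: WmsStatus) -> float:
    """Lower score = less loaded (weights are made up for the sketch)."""
    return (0.5 * s.load15
            + 0.3 * (s.jobs_queued / 100.0)
            + 0.2 * (s.disk_used_pct / 100.0))

def best_unloaded(statuses, n=3):
    """Rank all instances and return the n least loaded host names."""
    return [s.host for s in sorted(statuses, key=load_metric)[:n]]

if __name__ == "__main__":
    sample = [
        WmsStatus("wms007.cnaf.infn.it", 2.1, 40, 55.0),
        WmsStatus("wms009.cnaf.infn.it", 6.8, 300, 80.0),
        WmsStatus("wms012.cnaf.infn.it", 1.2, 15, 30.0),
    ]
    # These hosts would then be published behind pre-prod.wms.cnaf.infn.it.
    print(best_unloaded(sample))
```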
VOMS SETUP
CNAF hosts two VOMS servers:
- voms.cnaf.infn.it (VOMS master for CDF) and voms2.cnaf.infn.it
- more than 20 VOs served in total
- INFN Grid VOMS replica in INFN Padova (since March 2008): voms-01.pd.infn.it
- master and replica both use a MySQL backend: as soon as something changes in the master DB, the change is immediately propagated to the replica server DB (a sketch of a replication-lag probe is given below)
- both instances run SL3, gLite 3.0 and voms-admin
- VOMS hosts are under Nagios monitoring (alarms)
Since March 2008 (INFN Tier-1 machine-room re-engineering) a dual power-supply system is possible for all racks:
- migration of the VOMS server to a new host with a redundant hardware configuration is needed to take advantage of this
- no suitable spare hardware is currently available
Fault tolerance is based on the existence of the VOMS replica outside of the INFN Tier-1 LAN, to cope with network outages. Local VOMS replicas sharing a single DB backend are not totally fault tolerant (this is the current setup at CERN; the VOMS replica of CERN will be put into production soon at CNAF).
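Since the master-replica propagation relies on standard MySQL replication, a Nagios-style probe of the replica could look like the sketch below. This is an assumption about how such a check might be written, not the actual CNAF probe; the lag threshold is invented.

```python
#!/usr/bin/env python
# Sketch of a Nagios-style probe: check that the VOMS MySQL replica is
# keeping up with the master. Not the actual CNAF check; the threshold
# is an assumption. Run on the replica host with MySQL client access.
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
MAX_LAG_SECONDS = 60  # assumed alarm threshold

def replica_lag():
    """Parse Seconds_Behind_Master from SHOW SLAVE STATUS (None if stopped)."""
    out = subprocess.check_output(
        ["mysql", "--batch", "-e", "SHOW SLAVE STATUS\\G"], text=True)
    for line in out.splitlines():
        if "Seconds_Behind_Master" in line:
            value = line.split(":", 1)[1].strip()
            return None if value == "NULL" else int(value)
    return None

if __name__ == "__main__":
    lag = replica_lag()
    if lag is None:
        print("UNKNOWN: replication not running")
        sys.exit(UNKNOWN)
    if lag > MAX_LAG_SECONDS:
        print("CRITICAL: replica %d s behind master" % lag)
        sys.exit(CRITICAL)
    print("OK: replica %d s behind master" % lag)
    sys.exit(OK)
```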
Service availability from Jan 2008 to date: 95%
(Availability plot; the marked dips correspond to a CNAF centre scheduled downtime and a Nagios update.)
Outstanding problems
- Automatic restart of daemons: a problem with the init scripts prevented the VOMS and VOMS-Admin services from restarting correctly after a manual reboot of the machine; the problem was under study (A. Cavalli, A. Paolini). SOLVED
- Nagios alarms: currently related only to the status of the host; the service processes are not under test. Extension of the Nagios VOMS test suite, action on P. Veronesi. DONE (a sketch of a process-level check is given below)
Plans:
- extension of the Nagios tests to raise SMS alarms, in addition to messages, for critical VOMS problems
- hardware and software upgrade to gLite 3.1
- major database structure upgrade; waiting for input from CDF
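For the planned extension of the Nagios tests to the service processes, a minimal process-level check might look like the following sketch. The host and port are assumptions (VOMS daemons listen on per-VO ports), not the actual CNAF configuration.

```python
#!/usr/bin/env python
# Sketch of a process-level Nagios check: instead of only pinging the
# host, verify that the VOMS daemon accepts TCP connections. Host and
# port are illustrative assumptions.
import socket
import sys

OK, CRITICAL = 0, 2
HOST = "voms.cnaf.infn.it"  # assumed target host
PORT = 15000                # assumed per-VO vomsd port

try:
    socket.create_connection((HOST, PORT), timeout=10).close()
    print("OK: vomsd accepting connections on %s:%d" % (HOST, PORT))
    sys.exit(OK)
except OSError as err:
    print("CRITICAL: cannot connect to %s:%d (%s)" % (HOST, PORT, err))
    sys.exit(CRITICAL)
```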