Accounting at the T1/T2 Sites of the Italian Grid DGAS: Distributed Grid Accounting System Riccardo Brunetti INFN-TORINO
Summary: What is DGAS?; DGAS Features; DGAS Components; Security and Privacy; DGAS Deployment in the Italian Grid; Recent Improvements and Results; Work in Progress
What is DGAS? DGAS is a distributed accounting system able to perform resource usage metering and billing in the Grid environment. It is based on a client/server infrastructure relying on a network of independent accounting servers. It was developed within the EDG/WP1 and EGEE/JRA1 projects by the INFN-TORINO group (A. Guarise, R. Piro, G. Patania).
DGAS Components Sensors on CEs: build usage records from LRMS accounting files. Resource (site) HLRs (multilevel structure): collect usage records from one or more sites. User (VO) HLRs: collect usage records for a whole VO. Query clients and visualization tools: allow data to be retrieved from the HLRs.
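The way these components fit together can be illustrated with a minimal sketch. It assumes a simple push model in which each HLR stores the records it receives and forwards them to the HLRs it is connected to. The `HLR` class, its methods and the record fields are hypothetical names chosen for illustration; they are not the DGAS API or its record schema.

```python
class HLR:
    """Toy accounting server: stores usage records and forwards them upstream.

    Hypothetical model for illustration only, not the DGAS implementation.
    """

    def __init__(self, name, accepts=None):
        self.name = name
        # Optional filter, e.g. a VO HLR that keeps only records of its own VO.
        self.accepts = accepts or (lambda record: True)
        self.records = []
        self.upstream = []

    def connect(self, other):
        """Register another HLR that should receive a copy of each record."""
        self.upstream.append(other)

    def receive(self, record):
        """Store the record if accepted, then forward it to connected HLRs."""
        if not self.accepts(record):
            return
        self.records.append(record)
        for hlr in self.upstream:
            hlr.receive(record)


# Two site HLRs feed one second-level HLR and one VO HLR (sample topology).
site_a = HLR("site-A")
site_b = HLR("site-B")
l2 = HLR("L2")
vo_atlas = HLR("VO-atlas", accepts=lambda r: r["vo"] == "atlas")
for site in (site_a, site_b):
    site.connect(l2)
    site.connect(vo_atlas)

# A sensor on a CE would build records like these from LRMS accounting files.
site_a.receive({"job": "j1", "vo": "atlas", "cpu_hours": 2.0})
site_a.receive({"job": "j2", "vo": "cms", "cpu_hours": 1.0})
site_b.receive({"job": "j3", "vo": "atlas", "cpu_hours": 3.0})
```

After these calls the second-level HLR holds all three records, while the VO HLR holds only the two ATLAS records.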
DGAS Features Granularity: resource accounting at the single-job level or in aggregate form per user, per VO, per resource (site) or per infrastructure (collection of sites); information can be collected for both grid and local jobs. Scalability: an arbitrary number of Resource/VO HLRs can be deployed. Hierarchical design: HLRs can be interconnected in order to provide multiple levels of aggregation.
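The granularity levels can be illustrated with a short sketch that aggregates the same set of per-job records by user, by VO or by site. The `UsageRecord` fields and the sample values are assumptions made for this example and do not reflect the real DGAS usage record schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical, simplified per-job usage record (not the DGAS schema)."""
    job_id: str
    user: str
    vo: str
    site: str
    cpu_hours: float


def aggregate(records, key):
    """Sum job counts and CPU hours, grouped by the given record attribute."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0})
    for rec in records:
        bucket = totals[getattr(rec, key)]
        bucket["jobs"] += 1
        bucket["cpu_hours"] += rec.cpu_hours
    return dict(totals)


# Sample data, invented for the example.
records = [
    UsageRecord("job-1", "alice", "atlas", "site-A", 2.0),
    UsageRecord("job-2", "bob", "atlas", "site-B", 1.5),
    UsageRecord("job-3", "alice", "cms", "site-A", 4.0),
]

per_vo = aggregate(records, "vo")      # aggregate per VO
per_site = aggregate(records, "site")  # aggregate per resource (site)
per_user = aggregate(records, "user")  # aggregate per user
```

Keeping the single-job records as the base data and deriving every aggregate from them is what makes the different views consistent with each other.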
DGAS Workflow (diagram: a job, a CE, a WN, a Site HLR, an L2 HLR and a VO HLR, with Usage Records flowing to the HLRs in numbered steps 1 to 3)
Security and Privacy Information confidentiality is guaranteed by the use of different authorization levels to access the Usage Records. Users can access their own detailed records and aggregates. Site Managers can access the detailed records and aggregates of their own site. VO Managers can access the detailed records and aggregates of all VO members. Full VOMS integration in query authorization is available (e.g. /atlas/Role=vomanager/Group=NULL), currently on the L2 HLR and on every HLR in future releases. Security and integrity of the data flow are guaranteed by the use of GSI and data encryption: no sensitive information is sent in clear text.
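The three authorization levels can be summarized in a small sketch. The role labels, the `Requester` and `Record` types and the `can_read` function are hypothetical names for illustration; in DGAS the role would be derived from the user's credentials (for example VOMS attributes), which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical usage record reduced to the fields needed for the check."""
    user_dn: str
    vo: str
    site: str


@dataclass(frozen=True)
class Requester:
    """Hypothetical requester; role is 'user', 'site_manager' or 'vo_manager'."""
    dn: str
    role: str
    vo: str = ""
    site: str = ""


def can_read(requester, record):
    """Return True if the requester may see the detailed record."""
    if requester.role == "user":
        return record.user_dn == requester.dn      # own records only
    if requester.role == "site_manager":
        return record.site == requester.site       # own site only
    if requester.role == "vo_manager":
        return record.vo == requester.vo           # all members of own VO
    return False                                   # unknown roles see nothing


rec = Record(user_dn="/CN=alice", vo="atlas", site="site-A")
alice = Requester(dn="/CN=alice", role="user")
bob = Requester(dn="/CN=bob", role="user")
site_admin = Requester(dn="/CN=carol", role="site_manager", site="site-A")
atlas_admin = Requester(dn="/CN=dave", role="vo_manager", vo="atlas")
```

Denying access by default for unrecognized roles keeps the check conservative.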
Interface to APEL DGAS Usage Records tables can be converted into the APEL LCGRecords table structure. The converted table can then be sent to the GOC APEL database using the existing APEL producer. The translation tool (dgas2Apel) is already tested and working. The current version of dgas2apel does not send the user DN with the records. A future version will be able to do so, since APEL can now encrypt them.
DGAS Deployment Italian Grid (L2 HLR) DGAS deployed at 43 sites (RPM+YAIM). L1 HLR at 1 T1 site (CNAF-T1). L1 HLRs at 10 T2 sites (2 of them registering data for small T3 sites).
DGAS Deployment Italian Grid (L2 HLR) L2 HLR at 1 site (Torino), collecting data for 4 T2 sites (Torino, Padova, Roma1, Milano). INFN-ROMA 1-2-3
Examples Site level Information Aggregate per VO Aggregate per User
Examples UI Query Client: aggregate (number of jobs, hours of CPU time) for the supported VOs at the INFN-TORINO site.
Examples VO level Information (L2 HLR Collecting 4 sites) Total Aggregate per VO VO Aggregate per Site
Examples HLR Query Client: aggregate (number of jobs, average hours of CPU time) for the ATLAS VO, grouped by computing site.
Recent Improvements Until now, complete grid accounting information (Grid Job ID, user DN) was available only for jobs going through a (patched) Italian RB. Other jobs were accounted using an “out-of-band” procedure. A new CE configuration (Maarten Litmaath's patch to the gatekeeper) is being tested, which allows jobs to be registered independently of the RB. It is currently being tested on two CEs (PBS and LSF) and two RBs, old (patched) and new (not patched). Results so far: 1598 jobs done -> 1598 accounted (PBS); 2000 jobs done -> 2000 accounted (LSF).
Test Facility Layout (diagram: a patched RB and a standard RB, an LSF CE (M.L. Patch), a PBS CE and an HLR)
Test Results
From patched RB, LSF CE (M.L. Patch): Done 100+1000, Accounted 100+1000
From standard RB, LSF CE (M.L. Patch): Done 100+800, Accounted 100+800
From patched RB, PBS CE (M.L. Patch): Done 162+177+537, Accounted 162+177+537
From standard RB, PBS CE (M.L. Patch): Done 165+557, Accounted 165+557
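As a cross-check, the per-run counts above can be summed per batch system and compared with the overall figures of 1598 jobs (PBS) and 2000 jobs (LSF). The short script below only re-adds the numbers listed in the test results; the data layout and the function name are choices made for this example.

```python
# Per-run job counts copied from the test results: (done, accounted)
# for each (RB, batch system) combination.
results = {
    ("patched RB", "LSF"): ([100, 1000], [100, 1000]),
    ("standard RB", "LSF"): ([100, 800], [100, 800]),
    ("patched RB", "PBS"): ([162, 177, 537], [162, 177, 537]),
    ("standard RB", "PBS"): ([165, 557], [165, 557]),
}


def totals_by_lrms(results):
    """Return {batch system: (total done, total accounted)}."""
    totals = {}
    for (_rb, lrms), (done, accounted) in results.items():
        d, a = totals.get(lrms, (0, 0))
        totals[lrms] = (d + sum(done), a + sum(accounted))
    return totals


totals = totals_by_lrms(results)
print(totals)  # {'LSF': (2000, 2000), 'PBS': (1598, 1598)}
```

The sums are 1100 + 900 = 2000 for LSF and 876 + 722 = 1598 for PBS, so every job done in these runs was accounted.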
Test at T1 Site Accounting at the T1 site is now performed using Red Eye (developed by F. Rosso) together with DGAS. Red Eye collects usage records from LSF and prepares records for the HLR (~4M records already collected). A new site HLR has recently been installed at the T1, together with DGAS sensors on one of the 4 CEs (gLite). We are currently testing the functionality, and in particular the impact on the CE load, using batches of jobs. Some fine tuning of the parameters is probably required because of the large number of CPUs at the T1.
Work in Progress Accounting for locally submitted jobs. Jobs submitted locally (not using a proxy certificate) usually cannot be associated with any VO (no certificate or Grid information). We are going to provide DGAS sensors with a configurable local lookup table containing the association between a local Unix username/group and a given VO. In this way local resource usage can also be automatically accounted as VO activity (if required).
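A minimal sketch of such a lookup is given here, assuming the table is keyed by (username, group) pairs and that a user-specific entry takes precedence over a group-wide one. The table layout, the precedence rule, the sample entries and the `resolve_vo` function are all assumptions made for illustration; the actual DGAS configuration format is not described here.

```python
# Hypothetical lookup table: local Unix (username, group) -> VO.
# None acts as a wildcard for the missing half of the key.
LOCAL_VO_MAP = {
    ("alice", None): "atlas",   # every job of local user "alice"
    (None, "cmsgrp"): "cms",    # every job of local group "cmsgrp"
}


def resolve_vo(username, group, table=LOCAL_VO_MAP):
    """Return the VO for a local job, or None if no entry matches.

    Tries the exact pair first, then the user-only entry, then the
    group-only entry (an assumed precedence order).
    """
    for key in ((username, group), (username, None), (None, group)):
        if key in table:
            return table[key]
    return None


print(resolve_vo("alice", "users"))   # atlas
print(resolve_vo("bob", "cmsgrp"))    # cms
print(resolve_vo("bob", "users"))     # None
```

Returning `None` for unmatched jobs leaves them unassigned, which fits the "if required" nature of the mapping.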
Work in Progress Web interface to the L2 HLR (S. Dalpra, P. Veronesi)
DGAS References General information about DGAS can be found at: DGAS website: http://www.to.infn.it/grid/accounting/ DGAS User Guides: https://edms.cern.ch/cedar/plsql/doc.info?cookie=3881073&document_id=571271&version=1