Grid Operations Centre LCG Accounting Trevor Daniels, John Gordon GDB 8 Mar 2004
LCG Accounting Overview
1. PBS log processed daily on site CE to extract required data; filter acts as R-GMA DBProducer -> PbsRecords table.
2. Gatekeeper log processed daily on site CE to extract required data; filter acts as R-GMA DBProducer -> GkRecords table.
3. Job Manager log processed daily on site CE to extract required data; filter acts as R-GMA DBProducer -> PbsJobIds table.
4. Site GIIS interrogated daily on site CE to obtain SpecInt and SpecFloat values for the CE; acts as DBProducer -> SpecRecords table, one dated record per day.
5. These four tables are joined daily on the MON to produce the LcgRecords table. As each record is produced, the program acts as a StreamProducer to send the entries to the LcgRecords table on the GOC site.
6. The site now has a table containing its own accounting data; GOC has an aggregated table over the whole of LCG.
7. Interactive and regular reports are produced by the site or at the GOC site as required.
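The join in step 5 can be sketched as follows. This is only an illustration using an in-memory SQLite database in place of R-GMA: the table names come from the slide, but every column name (JobId, GkJmId, UserDN, RecordDate and so on) and the choice of join keys are assumptions, not the actual schema.

```python
import sqlite3


def build_lcg_records(conn):
    """Join the four intermediate tables into LcgRecords.

    All column names and join keys here are assumed for illustration.
    """
    conn.execute("DROP TABLE IF EXISTS LcgRecords")
    conn.execute("""
        CREATE TABLE LcgRecords AS
        SELECT p.JobId      AS JobName,
               g.UserDN     AS GlobalUserName,
               p.GroupName  AS ProjectName,
               p.CpuSeconds AS CpuDuration,
               s.SpecInt    AS SpecInt2000,
               s.SpecFloat  AS SpecFloat2000
        FROM PbsRecords p
        JOIN PbsJobIds   j ON j.PbsJobId   = p.JobId    -- batch job -> job manager ID
        JOIN GkRecords   g ON g.GkJmId     = j.GkJmId   -- job manager ID -> user DN
        JOIN SpecRecords s ON s.RecordDate = p.EndDate  -- dated CE power record
    """)
    return conn.execute("SELECT * FROM LcgRecords ORDER BY JobName").fetchall()


def demo():
    """Build tiny example tables and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PbsRecords (JobId TEXT, GroupName TEXT, CpuSeconds INTEGER, EndDate TEXT);
        CREATE TABLE GkRecords (GkJmId TEXT, UserDN TEXT);
        CREATE TABLE PbsJobIds (GkJmId TEXT, PbsJobId TEXT);
        CREATE TABLE SpecRecords (RecordDate TEXT, SpecInt INTEGER, SpecFloat INTEGER);
        INSERT INTO PbsRecords VALUES ('101.ce.example.org', 'atlas', 3600, '2004-03-01');
        INSERT INTO PbsRecords VALUES ('102.ce.example.org', 'local', 10, '2004-03-01');
        INSERT INTO GkRecords VALUES ('gkjm-1', '/C=UK/O=Example/CN=Test User');
        INSERT INTO PbsJobIds VALUES ('gkjm-1', '101.ce.example.org');
        INSERT INTO SpecRecords VALUES ('2004-03-01', 400, 380);
    """)
    return build_lcg_records(conn)
```

In this sketch a batch job with no matching gatekeeper entry (job 102 above) is dropped by the inner joins; whether the real client discards or keeps such jobs is not stated in the slides.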
LCG Accounting Flow
[Diagram not reproduced. Labels: LCG Site CE (PBS log, gk log, site GIIS, filter); LCG Site MON; LCG Site Accounting DB; GOC Site; GOC Reports.]
Changes to GK Logs
The way in which the gatekeeper records information relevant to accounting was changed in LCG1_1_1_0, issued on 24 Oct 2003, when the lcgpbs job manager was introduced. The implications of this change were not communicated to GOC, and they were not discovered until February, during integration tests of the final system with live data. The code had been developed using the earlier log formats and required substantial changes to accommodate the new formats. An extra log file now has to be processed to generate a fourth intermediate table, which then has to be entered into the final four-table join. The code to do this has been designed and is now being written. The result is a delay in deploying the software of perhaps 2 weeks.
R-GMA Infrastructure
The Grid Deployment Area meeting on 23 Feb 2004 discussed the requirement to deploy an R-GMA infrastructure. It was needed for the immediate applications of accounting, network monitoring and CMS monitoring. It was proposed that RAL would provide effort to package, deploy and support R-GMA and work with the Deployment Team to achieve this. This was agreed, with a target of end-March for deployment to a few test sites following certification on the CERN testbed. RAL would also assist with installing and testing the applications, particularly accounting.
Interim Arrangements
Until R-GMA becomes operational throughout LCG, RAL will make arrangements to upload accounting files to the GOC system so they may be processed there. All sites have been instructed to preserve the relevant logs from 1 Feb 2004 until they can be uploaded.
A script has been written to automate the uploading of the required files. These are:
- the gzipped globus gatekeeper logs in /var/log/
- the gzipped gatekeeper job manager messages files in /var/log
- the pbs log files in /var/spool/pbs/server_priv/accounting/
The script has been packaged and will shortly be deployed to a few test sites. The files will be uploaded to a set of directories on the GOC system using a mutually authenticated transfer, to ensure that the files in a particular directory can only come from the CE associated with that directory.
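As a rough illustration of the file selection such an upload script has to perform, the sketch below collects candidate files from the three locations named above. It is not the deployed script: the two filename patterns are hypothetical (the slides give only the directories), the `root` parameter exists purely so the function can be tried against a scratch directory, and the mutually authenticated transfer itself is not shown.

```python
from pathlib import Path

# Hypothetical filename patterns; the slides name only the directories.
GATEKEEPER_PATTERN = "globus-gatekeeper.log*.gz"
MESSAGES_PATTERN = "messages*.gz"


def files_to_upload(root="/"):
    """Return the log files under `root` that would be sent to the GOC."""
    root = Path(root)
    var_log = root / "var/log"
    pbs_dir = root / "var/spool/pbs/server_priv/accounting"

    found = []
    if var_log.is_dir():
        # Gzipped gatekeeper logs, then gzipped job manager messages files.
        found += sorted(var_log.glob(GATEKEEPER_PATTERN))
        found += sorted(var_log.glob(MESSAGES_PATTERN))
    if pbs_dir.is_dir():
        # Every regular file in the PBS accounting directory.
        found += sorted(p for p in pbs_dir.iterdir() if p.is_file())
    return found
```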
Progress
Status on 2 Mar 2004:
- The code to extract data from the pbs and gk logs and to obtain the estimate of CE power is written, working and tested.
- Code to extract the mapping from GK JM ID to pbs jobID from the job manager messages and globus gatekeeper logs is being written.
- Scripts to automate the uploading of files to GOC are written, packaged and awaiting deployment.
- Directory structure at GOC to receive files is set up.
- Logs are being preserved from 1 Feb for later processing (some lost).
- An option flag has been added to suppress the publication of the DN if sites are unable to do this due to data protection or privacy laws.
To do:
- Complete rewriting of accounting client following log change: Done (5/3)
- Package accounting client for deployment: 3 days
- Write the report generators: 30 days (estimate; they are not yet designed)
Remaining Issues
1. The VO associated with a user is not available in the batch or gatekeeper logs. It will be assumed that the group ID used to execute user jobs, which is available, is the same as the VO name. This needs to be acknowledged as an LCG requirement.
2. The global jobID assigned by the Resource Broker is not available in the batch or gatekeeper logs. This global jobID cannot therefore appear in the accounting reports. The RB Events Database contains this, but that is not accessible, nor is it designed to be easily processed.
3. At present the logs provide no means of distinguishing sub-clusters of a CE which have nodes of differing processing power. Changes to the information logged by the batch system will be required before such heterogeneous sites can be accounted properly. At present it is believed all sites are homogeneous.
LcgRecords Table
Where possible, the fieldnames in the LcgRecords table have been chosen to correspond with the schema developed by the Global Grid Forum’s Usage Record Working Party. There is one record per job.

Field            Description
SiteName         site at which the job executed
JobName          (as known to the executing site)
LocalUserID      (as known to the executing site for that job)
GlobalUserName   submitting user’s Distinguished Name
ProjectName      user’s Groupname; assumed to be VO name
WallDuration     elapsed time while job was running
CpuDuration      cpu time used by job
EndTime          time job finished (in ISO 8601 format, local time and UTC)
StartTime        time job started (in ISO 8601 format, local time and UTC)
SubmitHost       domain name of CE
MemoryReal       real memory used
MemoryVirtual    virtual memory used
SpecInt2000      of Cluster/SubCluster associated with CE
SpecFloat2000    of Cluster/SubCluster associated with CE
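One way to picture a single LcgRecords row is as a simple record type. The sketch below uses the field names listed above; the Python types, the units (seconds for durations) and the `iso_utc` helper are assumptions for illustration, and only the UTC form of the ISO 8601 timestamps is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LcgRecord:
    """One accounting record per job; types and units are assumed."""
    SiteName: str
    JobName: str
    LocalUserID: str
    GlobalUserName: str   # submitting user's Distinguished Name
    ProjectName: str      # group name, assumed to be the VO name
    WallDuration: int     # seconds (assumed unit)
    CpuDuration: int      # seconds (assumed unit)
    EndTime: str          # ISO 8601
    StartTime: str        # ISO 8601
    SubmitHost: str
    MemoryReal: int
    MemoryVirtual: int
    SpecInt2000: int
    SpecFloat2000: int


def iso_utc(epoch_seconds):
    """Format a Unix timestamp as an ISO 8601 UTC string (hypothetical helper)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, `iso_utc(1078704000)` gives `2004-03-08T00:00:00Z`.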
Accounting Reports
The way in which the accounting records will be summarised in reports has not yet been designed in detail. The following show our current thoughts, and comments from GDB on these points would be helpful:
- Regular summary reports (monthly?) will be published automatically showing usage by site and by VO.
- Interactive reports generated with various selection criteria, including DN, site, VO and dates, will be available from the GOC website.
- ‘Usage’ could include:
  - raw cpu time;
  - cpu time normalised to SpecIntSeconds, or some agreed combination of SpecInt and SpecFloat powers (probably the default, once agreed);
  - a notional ‘charge’ based on any combination of cputime, real memory and virtual memory. However, the available logs do not include data on storage used, nor on the queue through which the job was submitted, both of which would be desirable for calculating a notional charge.
- The same reports will be available for running at sites on local data and at the GOC on aggregated data.
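Since the report design is still open, the following is only a sketch of how a usage-by-site-and-VO summary might be computed. It assumes that "SpecIntSeconds" means CPU seconds multiplied by the SpecInt2000 rating of the CE, which the slides do not define, and it represents records as plain dictionaries keyed by LcgRecords fieldnames.

```python
from collections import defaultdict


def specint_seconds(cpu_seconds, specint2000):
    """Normalised CPU time, assuming SpecIntSeconds = cpu seconds x SpecInt2000."""
    return cpu_seconds * specint2000


def usage_by_site_and_vo(records):
    """Sum normalised CPU time per (SiteName, ProjectName) pair."""
    totals = defaultdict(int)
    for rec in records:
        key = (rec["SiteName"], rec["ProjectName"])
        totals[key] += specint_seconds(rec["CpuDuration"], rec["SpecInt2000"])
    return dict(totals)
```

A combination of SpecInt and SpecFloat, if one is agreed, would replace only the `specint_seconds` function; the grouping logic would stay the same.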
Summary
- An accounting prototype has been deployed at GOC, using logs transferred from sites.
- All sites need to transfer records; a tool has been written but it still needs deploying.
- GOC will start to publish a few standard reports.

Next Steps
- Package and distribute
- Develop reports
- Consider normalisation of heterogeneous clusters
- Include more batch types