Resource Usage Monitoring and Accounting
Adrian Jackson, Stephen Booth
EPCC
GridSafe AHM
Introduction
Resource usage accounting has long been standard practice on high-end compute resources.
It has historically been less common on smaller systems, where it was easier to apportion costs locally.
– This is becoming less viable because of:
  – Full economic costing (FEC)
  – Grid computing (users are no longer local)
  – Virtualisation
GridSAFE
JISC-funded project to build a general-purpose accounting/monitoring solution.
– Builds on the accounting subsystem of SAFE, the user administration system used by HPCx and HECToR.
Challenges:
– Needs to work with a wide variety of local policies.
– Needs to work with both grids and local HPC resources.
One solution won't fit all potential users, so:
– Build a kit of parts.
– Provide pre-built solutions for common deployment scenarios.
Key aims:
– Modular design: individual functions can be deployed independently.
– Behaviour can be customised with plug-ins that implement different service policies.
End Users
End users are interested in accounting for their own usage, in order to:
– Compare the efficiency of different systems.
– Compare the cost-effectiveness of different systems.
– Check the resources available to them.
They are often interested in individual jobs as well as overall totals.
Resource Providers
Need to gather the raw accounting data.
– The format depends on the underlying technology.
Need to apply local policies:
– Charges
– Discounts
– Where to charge
Usage data may be useful for purposes other than accounting:
– Analysing queue wait times.
– Job size profiles.
– Some of this data may need to be kept private.
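As an illustration of using the same usage data for non-accounting purposes, the Python sketch below computes mean queue wait per queue and a job-size profile. It assumes each job is a plain dict with hypothetical keys `queue`, `submit`, `start` (epoch seconds) and `cores`; these names are illustrative only and do not come from GridSAFE, whose core is Java.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical job record: submit/start are epoch seconds, cores is the job size (>= 1).
Job = Dict[str, object]


def size_bucket(cores: int) -> int:
    """Round a job size up to the next power of two (1, 2, 4, 8, ...)."""
    return 1 << (cores - 1).bit_length()


def usage_profile(jobs: List[Job]) -> Tuple[Dict[str, float], Counter]:
    """Return (mean queue wait in seconds per queue, number of jobs per size bucket)."""
    waits: Dict[str, List[int]] = defaultdict(list)
    sizes: Counter = Counter()
    for job in jobs:
        waits[job["queue"]].append(job["start"] - job["submit"])
        sizes[size_bucket(job["cores"])] += 1
    return {queue: mean(w) for queue, w in waits.items()}, sizes


jobs = [
    {"queue": "short", "submit": 0, "start": 60, "cores": 1},
    {"queue": "short", "submit": 0, "start": 120, "cores": 4},
    {"queue": "long", "submit": 10, "start": 3610, "cores": 5},
]
waits, sizes = usage_profile(jobs)
print(waits)  # {'short': 90, 'long': 3600}
print(sizes)  # Counter({1: 1, 4: 1, 8: 1})
```

Power-of-two buckets are just one convenient choice for a size profile; a site could equally bucket by node count or by its own queue limits.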
Research Groups/Virtual Organisations
Research groups/VOs need to manage their resources across all available platforms.
– Ideally, all information should be available in a single place.
Where all resources reside within a single grid, this can be provided by grid-level accounting.
However, resources may come from multiple grids or from independent resource providers.
Overview
GridSAFE Core
Java code with data stored in a MySQL database.
– Normally run within a Tomcat container.
UsageRecords are treated as a collection of properties.
Highly customisable:
– The code does not mandate a single format.
– Sites can choose which of the available properties to store in the database.
– Sites can add new properties for site-local concepts.
Easily extendable to new types of data:
– Storage accounting
– Allocation tracking
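The idea of a usage record as an open collection of typed properties, with a separate choice of which properties become database columns, can be sketched as follows. This is a Python illustration under stated assumptions, not the GridSAFE Java API: `PropertyTag`, `UsageRecord`, `TableSchema` and the `site:FundingBody` property are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class PropertyTag:
    """Names a record property and the Python type its values must have."""
    name: str
    value_type: type


@dataclass
class UsageRecord:
    """A usage record as an open-ended collection of typed properties."""
    props: Dict[PropertyTag, Any] = field(default_factory=dict)

    def set(self, tag: PropertyTag, value: Any) -> None:
        if not isinstance(value, tag.value_type):
            raise TypeError(f"{tag.name} expects {tag.value_type.__name__}")
        self.props[tag] = value

    def get(self, tag: PropertyTag) -> Optional[Any]:
        return self.props.get(tag)


class TableSchema:
    """Selects which properties are persisted as database columns."""

    def __init__(self, columns: Iterable[PropertyTag]):
        self.columns: Tuple[PropertyTag, ...] = tuple(columns)

    def to_row(self, record: UsageRecord) -> Tuple[Any, ...]:
        # Properties not selected for this table are simply not stored.
        return tuple(record.get(tag) for tag in self.columns)


USER = PropertyTag("LocalUserId", str)
WALL = PropertyTag("WallDuration", int)        # seconds
FUNDER = PropertyTag("site:FundingBody", str)  # hypothetical site-local property

record = UsageRecord()
record.set(USER, "alice")
record.set(WALL, 3600)
record.set(FUNDER, "research-council")
print(TableSchema([USER, WALL]).to_row(record))  # ('alice', 3600)
```

Keeping the property set open while the table schema picks columns is what lets a site add local concepts without changing the parsing code.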
Accounting Code
Plug-in parser modules handle different types of input data:
– OGF-UR
– SGE
– PBS
– EGEE JobManager
– Etc.
Plug-in policy modules augment these, allowing site-local customisation.
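A minimal sketch of the parser/policy plug-in split, written in Python for brevity. It assumes PBS accounting-log lines of the form `date time;record type;job id;key=value ...` with keys such as `resources_used.walltime` and `Resource_List.ncpus`, as we understand the PBS log layout. The registries, decorators and the half-price `debug` queue rule are hypothetical and only illustrate how a site policy can run after a format-specific parser.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical registries for parser and policy plug-ins.
PARSERS: Dict[str, Callable[[str], Record]] = {}
POLICIES: List[Callable[[Record], Record]] = []


def parser(fmt: str):
    """Decorator registering a parser plug-in for one input format."""
    def register(fn: Callable[[str], Record]) -> Callable[[str], Record]:
        PARSERS[fmt] = fn
        return fn
    return register


def policy(fn: Callable[[Record], Record]) -> Callable[[Record], Record]:
    """Decorator registering a policy plug-in; policies run in registration order."""
    POLICIES.append(fn)
    return fn


@parser("pbs")
def parse_pbs(line: str) -> Record:
    # Assumed layout: "date time;type;job id;key=value key=value ...".
    timestamp, rec_type, job_id, message = line.rstrip("\n").split(";", 3)
    record: Record = {"timestamp": timestamp, "record_type": rec_type, "job_id": job_id}
    for token in message.split():
        key, _, value = token.partition("=")
        record[key] = value
    return record


@policy
def charge_core_hours(record: Record) -> Record:
    # Hypothetical site rule: charge walltime x cores, half price on the "debug" queue.
    h, m, s = (int(x) for x in record.get("resources_used.walltime", "0:0:0").split(":"))
    cores = int(record.get("Resource_List.ncpus", "1"))
    rate = 0.5 if record.get("queue") == "debug" else 1.0
    record["charge_core_hours"] = (h + m / 60 + s / 3600) * cores * rate
    return record


def ingest(fmt: str, line: str) -> Record:
    """Parse one raw line with the matching plug-in, then apply every policy."""
    record = PARSERS[fmt](line)
    for apply_policy in POLICIES:
        record = apply_policy(record)
    return record


line = ("04/18/2008 12:00:00;E;123.server;user=alice queue=debug "
        "Resource_List.ncpus=4 resources_used.walltime=01:00:00")
print(ingest("pbs", line)["charge_core_hours"])  # 2.0
```

Because policies only see the parsed record, the same charging rule applies unchanged whichever parser produced the data.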
Reporting Portal
GridSAFE uses XML templates to define reports:
– Can generate unified reports over multiple data tables containing different types of data.
– Tables/charts.
– Parameterised reports (e.g. to select a user or project).
Supports reports in multiple formats:
– PDF, HTML, CSV
Performance of report generation is a particular issue:
– Use the database effectively.
– Use aggregate tables for high-throughput systems.
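The aggregate-table idea can be sketched with Python's built-in `sqlite3`, standing in for MySQL: raw job records are rolled up into per-user, per-day totals, and a parameterised report then scans the much smaller aggregate table. The table and column names (`job_record`, `daily_usage`, `username`) are hypothetical, and the XML report templates themselves are not modelled.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE job_record (username TEXT, end_day TEXT, cpu_hours REAL);
CREATE TABLE daily_usage (username TEXT, day TEXT, jobs INTEGER, cpu_hours REAL,
                          PRIMARY KEY (username, day));
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_job(conn: sqlite3.Connection, username: str, end_day: str, cpu_hours: float) -> None:
    with conn:  # commits on success
        conn.execute("INSERT INTO job_record VALUES (?, ?, ?)", (username, end_day, cpu_hours))


def rebuild_daily_usage(conn: sqlite3.Connection) -> None:
    """Recompute the per-user, per-day aggregate from the raw job table."""
    with conn:
        conn.execute("DELETE FROM daily_usage")
        conn.execute(
            "INSERT INTO daily_usage (username, day, jobs, cpu_hours) "
            "SELECT username, end_day, COUNT(*), SUM(cpu_hours) "
            "FROM job_record GROUP BY username, end_day"
        )


def usage_report(conn: sqlite3.Connection, first_day: str, last_day: str) -> List[Tuple[str, float]]:
    """Parameterised report: total CPU hours per user over a date range."""
    return conn.execute(
        "SELECT username, SUM(cpu_hours) FROM daily_usage "
        "WHERE day BETWEEN ? AND ? GROUP BY username ORDER BY username",
        (first_day, last_day),
    ).fetchall()


conn = make_db()
add_job(conn, "alice", "2008-04-01", 2.0)
add_job(conn, "alice", "2008-04-01", 3.0)
add_job(conn, "bob", "2008-04-02", 1.5)
rebuild_daily_usage(conn)
print(usage_report(conn, "2008-04-01", "2008-04-30"))  # [('alice', 5.0), ('bob', 1.5)]
```

A full rebuild keeps the sketch simple; a high-throughput system would more likely update only the affected (user, day) rows as records arrive.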
Sample report
Web Services
Web service interface for access by other services.
The web service interfaces use OGF-UR XML as the common interchange format.
RUPI (Resource Usage Publishing Interface):
– Interface for uploading usage records to a remote repository.
– Currently an OGF-RUS-WG proposal.
RUQI (Resource Usage Query Interface):
– Interface for running queries on a remote repository.
– We aim to submit it to OGF-RUS-WG.
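For a feel of the interchange format, the sketch below writes and reads back a small OGF-UR `JobUsageRecord` using Python's `xml.etree.ElementTree`. The namespace and element names (`RecordIdentity`, `JobIdentity/LocalJobId`, `UserIdentity/LocalUserId`, `Status`, `WallDuration`) follow the OGF-UR schema as we understand it and should be checked against the specification; only a few fields are covered, and the RUPI/RUQI transport that would carry these documents is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict

URF = "http://schema.ogf.org/urf/2003/09/urf"  # OGF-UR namespace (as we understand it)
ET.register_namespace("urf", URF)


def q(tag: str) -> str:
    """Qualify a tag or attribute name with the OGF-UR namespace."""
    return f"{{{URF}}}{tag}"


def to_ogf_ur(record_id: str, create_time: str, local_job_id: str,
              local_user_id: str, status: str, wall_seconds: int) -> str:
    """Serialise one job as an OGF-UR JobUsageRecord XML string."""
    root = ET.Element(q("JobUsageRecord"))
    ET.SubElement(root, q("RecordIdentity"),
                  {q("recordId"): record_id, q("createTime"): create_time})
    ET.SubElement(ET.SubElement(root, q("JobIdentity")), q("LocalJobId")).text = local_job_id
    ET.SubElement(ET.SubElement(root, q("UserIdentity")), q("LocalUserId")).text = local_user_id
    ET.SubElement(root, q("Status")).text = status
    # xsd:duration, e.g. PT3600S for one hour.
    ET.SubElement(root, q("WallDuration")).text = f"PT{wall_seconds}S"
    return ET.tostring(root, encoding="unicode")


def from_ogf_ur(xml_text: str) -> Dict[str, object]:
    """Read back the fields written by to_ogf_ur (only handles PT<n>S durations)."""
    root = ET.fromstring(xml_text)
    wall = root.findtext(q("WallDuration"))
    return {
        "record_id": root.find(q("RecordIdentity")).get(q("recordId")),
        "local_job_id": root.findtext(f"{q('JobIdentity')}/{q('LocalJobId')}"),
        "local_user_id": root.findtext(f"{q('UserIdentity')}/{q('LocalUserId')}"),
        "status": root.findtext(q("Status")),
        "wall_seconds": int(wall[2:-1]),
    }


doc = to_ogf_ur("urn:example:1", "2008-09-10T12:00:00Z", "123.server", "alice", "completed", 3600)
print(from_ogf_ur(doc)["wall_seconds"])  # 3600
```

Using one agreed XML format at the service boundary is what allows the same records to be published (RUPI) or returned from a query (RUQI) regardless of which local parser produced them.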
Grid-Level Accounting
Grid accounting is not a solved problem.
– We aim to contribute useful technology, not to dictate a solution.
Different grids are pursuing different architectures:
– EGEE/NGS: hierarchical model
  – Data is published up a tree of repositories.
– DEISA: distributed model
  – Resource providers run local repositories and control access to data.
  – Accounting operations query multiple repositories.
There is some commonality:
– The OGF-UR format is generally accepted as the common data interchange format.
A combination of RUPI and RUQI can be used to implement either model.
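How publish and query operations can support both architectures can be sketched with in-memory repositories. In this Python illustration, `publish` plays the role of a RUPI-style upload and `query` a RUQI-style query; the `Repository` class, the `may_see` access hook and the record keys `vo` and `cpu_hours` are hypothetical. In the hierarchical case records are forwarded to a parent; in the distributed case each provider keeps its records and applies its own access rule, and the VO total is assembled by querying several repositories.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
# Hypothetical access rule: may `requester` see `record`?
AccessRule = Callable[[Record, str], bool]


class Repository:
    """Hypothetical in-memory usage-record repository."""

    def __init__(self, name: str, parent: Optional["Repository"] = None,
                 may_see: AccessRule = lambda record, requester: True):
        self.name = name
        self.parent = parent
        self.may_see = may_see
        self.records: List[Record] = []

    def publish(self, records: List[Record]) -> None:
        """RUPI-style upload; in the hierarchical model, forward up the tree."""
        self.records.extend(records)
        if self.parent is not None:
            self.parent.publish(records)

    def query(self, requester: str, vo: str) -> List[Record]:
        """RUQI-style query, filtered by this provider's access rule."""
        return [rec for rec in self.records
                if rec["vo"] == vo and self.may_see(rec, requester)]


def vo_total(repositories: List[Repository], requester: str, vo: str) -> float:
    """Total CPU hours for a VO, summed over every repository queried."""
    return sum(rec["cpu_hours"] for repo in repositories
               for rec in repo.query(requester, vo))


# Hierarchical: sites publish upwards, so only the top repository is queried.
top = Repository("national")
Repository("siteA", parent=top).publish([{"vo": "chem", "cpu_hours": 2.0}])
Repository("siteB", parent=top).publish([{"vo": "chem", "cpu_hours": 3.0}])
print(vo_total([top], "chem-manager", "chem"))  # 5.0

# Distributed: each provider keeps its data and decides who may see it.
only_vo_manager = lambda record, requester: requester == f"{record['vo']}-manager"
a = Repository("a", may_see=only_vo_manager)
b = Repository("b", may_see=only_vo_manager)
a.publish([{"vo": "chem", "cpu_hours": 2.0}])
b.publish([{"vo": "chem", "cpu_hours": 3.0}])
print(vo_total([a, b], "chem-manager", "chem"))  # 5.0
```

Note that in the hierarchical case a site's records also live in its parent, so querying both the site and the top repository would double count; a real deployment has to query one level of the tree.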
We are actively looking for sites to use the software.
– Sites don't need to use every component.