Accounting in LCG
Dave Kant, CCLRC e-Science Centre

LCG GDB Rome

APEL in LCG/EGEE
1. Quick overview
2. The current state of play
3. Integration with OSG
4. Accounting in gLite

Overview
- Data collection via sensors
- Transportation via RGMA
- High-level aggregation and reporting via a graphical front-end
  - High-level reporting built on aggregation: tables, pies, Gantt charts, metrics, trees

Component View of APEL
- Sensors (deployed at site):
  - Process log files; map DNs to batch usage
  - Build accounting records: DN, CPU time, wall-clock time (WCT), SpecInt2000, etc.
  - Account for grid usage (jobs) only
  - Support PBS, Sun Grid Engine, Condor and LSF
  - Not real-time accounting
- Data transport:
  - Uses RGMA to send data to a central repository
  - 196 sites publishing; 7.7 million job records collected
  - Could use other transport protocols
  - Allows sites to control the export of DN information from the site
- Presentation (GOC and regional portals):
  - Views, including EGEE View, GridPP View and Site View
  - Reporting based on data aggregation
  - Metrics (e.g. time-integrated CPU usage)
  - Tables, pies and Gantt charts
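
The sensor step that "maps DN to batch usage" is essentially a join on the local batch job ID. The sketch below is a hypothetical illustration, not APEL's own code (which performs the join in a database): the record fields and the function `build_records` are assumptions, and the two inputs stand for an already-parsed job-manager mapping (job ID to DN) and an already-parsed batch log (job ID to CPU and wall-clock seconds).

```python
from dataclasses import dataclass


@dataclass
class AccountingRecord:
    dn: str            # grid user's certificate DN
    local_job_id: str  # job ID assigned by the batch system
    cpu_seconds: int   # CPU time from the batch log
    wct_seconds: int   # wall-clock time (WCT) from the batch log
    site_si2000: int   # SpecInt2000 rating published by the site


def build_records(dn_by_job, usage_by_job, site_si2000):
    """Join DN mappings with batch usage on the local job ID.

    Jobs with no DN mapping are skipped, mirroring the
    "grid usage only" behaviour of the sensors.
    """
    records = []
    for job_id, (cpu, wct) in usage_by_job.items():
        dn = dn_by_job.get(job_id)
        if dn is None:
            continue  # local (non-grid) job: no DN to attribute usage to
        records.append(AccountingRecord(dn, job_id, cpu, wct, site_si2000))
    return records
```

Skipping unmatched jobs is what makes this grid-only accounting; the "DN accounting?" question raised under APEL issues amounts to deciding whether those skipped local jobs should be kept instead.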

Demos of Accounting Aggregation
Global views of CPU resource consumption:
- LHC View: shows aggregation for each LHC VO
  - Requirements driven by the RRB
  - Tier-1s and countries are the entry points
  - LHC VOs only
  - All data normalised in units of SI2000-hours
- GridPP View: shows aggregation for an organisation at Tier-1/Tier-2 level
- EGEE View (new!): regional views and detailed site-level reporting
  - Under active development by CESGA/RAL (Pablo Rey Mayo, Javier Lopez, Dave Kant)

VO/LCG/EGEE Requirements
One-line summary: "How much is done, and who did it?"
- High-level anonymous reporting
  - How much resource has been provided to each VO
  - Aggregation across VOs, countries, regions, grids and organisations
  - Time granularity: weekly, quarterly, annually
- Finer granularity at user level
  - If 10,000 CPU hours were consumed by the ATLAS VO, which users submitted the work?
- Data privacy laws
  - A grid DN is personal information that could be used to target an individual.
  - Who has access to this data, and how do they get it?
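
High-level anonymous reporting can be pictured as a group-by on (VO, time period) whose result never carries the user DN. A minimal sketch, assuming records have already been normalised and reduced to hypothetical (DN, VO, end date, normalised hours) tuples; `quarter` and `aggregate` are illustrative names, not part of APEL.

```python
from collections import defaultdict
from datetime import date


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2006-Q1."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def aggregate(records):
    """Sum normalised hours per (VO, quarter); the DN is read but discarded."""
    totals = defaultdict(float)
    for _dn, vo, end_date, norm_hours in records:
        totals[(vo, quarter(end_date))] += norm_hours
    return dict(totals)
```

The same loop keyed on (DN, VO) would give the finer user-level view, which is exactly the output that the privacy questions above apply to.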

APEL Developments
- Extending batch system support (testing phase)
  - Support for Condor and SGE. Both are being tested: SGE by CESGA and Condor by GridPP. Unofficial releases are available on the APEL home page.
- Gap publisher (testing phase)
  - Provides sites with better tools to identify missing data and publish it into the archiver. The reporting system uses Gantt charts to identify gaps, and enhancements to the publisher module are being tested.

APEL Issues (1)
- Normalisation (under investigation, CESGA/RAL)
  - To account for usage across heterogeneous compute farms, data are scaled to a common reference; in LCG the reference scale is 1 kSI2000.
  - The scale factor applied to job records is SI2000_Published_by_Site / Reference.
  - Some sites have a large number of job records where the site SI2000 is zero, so their normalised usage is also zero.
  - Identify these sites via the reporting tools and provide a recipe to fix them.
- APEL memory usage (important, will become urgent)
  - Site databases keep growing: APEL needs more memory to join records (the RAL Tier-1 requires 2 GB of RAM for a full build).
  - Implement a scheme to reduce the number of redundant records used in the join process: flag rows used in a successful build and delete them, as they are no longer needed.
- DN accounting?
  - Should APEL account for local usage as well as grid usage?
  - BNL recently sent us data that included both grid and local usage.
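
The normalisation rule above reduces to one multiplication. A minimal sketch, assuming usage is expressed in CPU hours and the reference is the 1 kSI2000 stated on the slide; the function names are hypothetical. It also shows why a published SI2000 of zero is harmful: every affected record normalises to zero.

```python
REFERENCE_SI2000 = 1000  # LCG reference scale: 1 kSI2000


def scale_factor(site_si2000: float) -> float:
    """SI2000_Published_by_Site / Reference."""
    return site_si2000 / REFERENCE_SI2000


def normalise(cpu_hours: float, site_si2000: float) -> float:
    """CPU hours scaled to the reference machine (kSI2000-hours)."""
    return cpu_hours * scale_factor(site_si2000)


def sites_needing_fix(records):
    """Sites with job records whose published SI2000 is zero.

    `records` is an iterable of (site_name, si2000) pairs.
    """
    return sorted({site for site, si2000 in records if si2000 == 0})
```

For example, 10 CPU hours at a site publishing 1500 SI2000 count as 15 normalised hours, while the same job at a site publishing 0 counts as nothing.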

APEL Issues (2)
- Handling large log files (under investigation)
  - Condor history and SGE batch logs are very large (> 1 GB).
  - Large logs are problematic: reading and storing the records takes a large amount of memory, and the application's run time grows. We also don't want to re-read data that was processed on a previous run (efficiency).
  - Options: develop an efficient way to parse these logs, ask batch log providers to support log rotation, or provide a recipe to site admins. A recipe for site admins only half works, because events are lost when event data is split over multiple lines.
- RGMA queries to the central repository
  - Query response time is very slow. This prevents some sites from checking that continuous consumers are actually listening for data.
  - Speeding up such queries would require archiving data from the central repository to another database.
  - Not an issue for the reporting front-end.
  - Does not appear to be something that sites urgently need (requested by IN2P3-CC).
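
One way to avoid re-reading data from a previous run, without splitting multi-line events, is to remember a byte offset and only advance it past complete events. This is a sketch under stated assumptions, not an APEL feature: the caller persists the returned offset between runs, the log is only appended to, and `is_separator` (a hypothetical parameter) recognises the line that ends an event. For Condor history files that might be the `***` line between job ads; check this against the batch system actually in use.

```python
def new_events(path, offset, is_separator):
    """Yield (event_lines, offset_after_event) for each complete event.

    Reading starts at `offset`; the file is iterated line by line, so
    memory use is bounded by one event rather than the whole log.
    """
    event = []
    pos = offset
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # line still being written: leave it for the next run
            pos += len(raw)
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if is_separator(line):
                yield event, pos  # safe point to save as the new offset
                event = []
            else:
                event.append(line)
    # An unterminated event is not yielded; since the saved offset still
    # points before it, the next run re-reads it from the start.
```

The caller stores the offset from the last yielded pair only after the event has been published, so a crash mid-run repeats at most one event instead of losing it.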

Integration with Open Science Grid
- A few OSG sites have deployed a minimal LCG front-end to publish accounting data into the APEL database (GOCDB registration + APEL sensors + RGMA MON node).
  - Successful deployment at Indiana University (PBS and Condor data published).
- Because of (subtle) differences in the grid middleware, APEL's core library must be modified to build accounting records in the OSG environment.
  - LCG: DN → local batch jobId mappings are encoded within three log files (LCG job manager).
  - OSG: DN → local batch jobId mappings are in a single log file (Globus job manager?).
- Main issues under consideration
  - There are currently THREE versions of the APEL core library, each sharing common batch system plugins: LCG production release, gLite 3 development, OSG development.
  - Refactor the core library to create a new plugin (LCG/gLite/OSG)?
  - A more sensible approach would be to use a *common* accounting file in BOTH gLite and OSG to provide the grid DN → local batch jobId mapping.
  - A common agreement on log rotation is needed: prefer logname-YYYYMMDD.gz (static file name) to logname-1.gz (not static).
- Very much in the early stages: we need some common agreements and a better understanding of the OSG middleware before proceeding.
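
The preference for logname-YYYYMMDD.gz is about stable identity: a dated file keeps its name forever, so a parser can remember which files it has already processed, whereas logname-1.gz names a different file after every rotation. A minimal sketch; `rotated_name`, `unprocessed` and the `processed` set are hypothetical helpers, not an agreed interface.

```python
from datetime import date, timedelta


def rotated_name(logname: str, day: date) -> str:
    """Static rotated-file name, e.g. logname-20060405.gz."""
    return f"{logname}-{day:%Y%m%d}.gz"


def unprocessed(logname: str, start: date, end: date, processed: set) -> list:
    """Rotated files from start to end (inclusive) not yet parsed."""
    names = []
    day = start
    while day <= end:
        name = rotated_name(logname, day)
        if name not in processed:
            names.append(name)
        day += timedelta(days=1)
    return names
```

With numbered rotation the same bookkeeping would need content checksums or timestamps instead of names, which is harder to get right.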

Accounting in gLite 3
- In gLite, the BLAH daemon (provided by Condor) is used to mediate jobs between the WMS and the Compute Element. Consequently, the accounting information needed by APEL is no longer in the gatekeeper logs but elsewhere, e.g. in the local user's home directory.
- An accounting mapping file has been proposed by DGAS and implemented by the gLite middleware developers to simplify building accounting records:
  - Maps grid-related information to the local job ID
  - Independent of the submission procedure (WMS or not)
  - No services or clients required on the WN
  - Format (one line per job, daily log rotation): timestamp= userFQAN= ceID= jobID= lrmsID= localUser=
- Already implemented for BLAH (and CREAM); work in progress for LCG.
- Did not make it into gLite 3.0, so there is no accounting for the gLite CE.
- APEL development to begin in April (D. Kant); development and testing are expected to take most of April.
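
Reading the mapping file amounts to splitting each line into key=value fields and indexing by lrmsID, so that batch log records can be joined to the grid identity. The slide lists only the keys; this sketch assumes each field is written as a double-quoted "key=value" pair (quoting is needed because a timestamp contains a space). The exact on-disk syntax, and the names `parse_mapping_line` and `lrms_to_grid`, are assumptions.

```python
import re

# Assumed syntax: "key=value" pairs, each enclosed in double quotes.
# The key stops at the first '=', so values such as an FQAN
# ("/atlas/Role=NULL/...") may themselves contain '='.
FIELD = re.compile(r'"([A-Za-z]+)=([^"]*)"')


def parse_mapping_line(line: str) -> dict:
    """Return the key=value fields of one mapping-file line."""
    return dict(FIELD.findall(line))


def lrms_to_grid(lines) -> dict:
    """Index mapping entries by the local batch job ID (lrmsID)."""
    index = {}
    for line in lines:
        fields = parse_mapping_line(line)
        if "lrmsID" in fields:
            index[fields["lrmsID"]] = fields
    return index
```

Because the file is rotated daily and written one line per job, this parse never needs to hold more than one day's entries, unlike the multi-file gatekeeper join used for the LCG CE.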

DGAS
- DGAS meets some requirements for privacy of user identity
  - User job information is readable only by the user, the site manager and the VO manager.
- DGAS cannot aggregate information across the whole Grid.
- Solution 1: DGAS sensors also publish anonymous data to the central APEL repository.
  - User details remain available to the VO in the DGAS HLR.
- Solution 2: a higher-level repository that all HLRs can publish into.
  - GGF Resource Usage Service: RHUL is working on an implementation.
- BUT DGAS is not in gLite 3.0.
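
Solution 1 implies a step where identifying fields are stripped before a copy of each record leaves the site, while the VO is kept so that central aggregation still works. A hypothetical sketch: the field names in `USER_FIELDS` are assumptions (userFQAN and localUser follow the mapping-file keys above), and deriving the VO from the first FQAN component is an illustrative choice.

```python
# Hypothetical names of fields that identify an individual user.
USER_FIELDS = {"userDN", "userFQAN", "localUser"}


def vo_from_fqan(fqan: str) -> str:
    """First component of an FQAN, e.g. '/atlas/Role=NULL' -> 'atlas'."""
    return fqan.lstrip("/").split("/", 1)[0]


def anonymise(record: dict) -> dict:
    """Copy of a record without user-identifying fields, keeping the VO."""
    out = {k: v for k, v in record.items() if k not in USER_FIELDS}
    if "VO" not in out and "userFQAN" in record:
        out["VO"] = vo_from_fqan(record["userFQAN"])
    return out
```

The original record is left untouched, matching the split where full details stay in the DGAS HLR and only the anonymous copy is published centrally.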

Summary
We have a working accounting system, but work is still required:
- to keep it working
- to meet (conflicting?) outstanding requirements for the privacy of user information