Dave Kant: LCG Accounting Overview, GDA, 7th June 2004.

LCG Accounting Overview
- Overview of the accounting program
- Offline analysis of core site data
- Accounting issues

Motivation
- Distinguish jobs submitted through the grid from jobs submitted by non-grid users.
- Collect job statistics and aggregate job information across sites and across virtual communities on a daily basis.
- Provide feedback into SLA and MoU.
Jason Leak, Rob Byrom, Martin Craig, Trevor Daniels, John Gordon, Dave Kant

Basic Concepts
The program runs daily on several nodes within a site, processing the log files needed to generate accounting information. It uses R-GMA to publish this information into a MySQL database on the site's R-GMA MON node, so the site must have a properly configured R-GMA installation. The MON node processes the data to create accounting records, which are then streamed to the GOC R-GMA MON node.

Event Log Processing
Data source: PBS event logs on the CE, in /var/spool/pbs/server_priv/accounting.
Every day, a PBS filter extracts data from the event log records. The R-GMA API publishes the data to a PbsRecords database table on the MON box and records the names of the processed logs for book-keeping.
[Diagram: CE (PBS event logs), dbProducer, and the SQL PbsRecords and LcgProcessed tables on the MON box.]

PBS Event Record Format
A typical “END” event record from PBS:
/30/ :00:47;E;21891.lcgce02.gridpp.rl.ac.uk;user=dteam001 group=dteam jobname=STDIN queue=short ctime= qtime= etime= start= exec_host=lcg0317.gridpp.rl.ac.uk/1 Resource_List.cput=00:15:00 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=02:00:00 session=9842 end= Exit_status=0 resources_used.cput=00:00:03 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:11
To identify these records, the program looks for the date and time at the start of the record and the E flag, which marks an end-of-job entry.
JobName = lcgce02.gridpp.rl.ac.uk
LocalUserID = dteam001
LocalUserGroup = dteam
WallDuration = 02:00:00
WallDurationSeconds = 120
CpuDuration = 00:00:03
CpuDurationSeconds = 3
StartTimeEpoch = StartTimeUTC
StopTimeEpoch = StopTimeUTC
SubmitHost = lcgce02.gridpp.rl.ac.uk
MemoryReal = 0
MemoryVirtual = 0
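The extraction step described on this slide can be sketched in Python. This is a minimal illustration, not the accounting program's actual filter: the function names, the regular expression and the sample record (including its date) are assumptions, and the sketch takes both durations from the resources_used.* attributes, which the slide does not confirm.

```python
import re

# Illustrative record: the date and the trimmed attribute list are
# placeholders, not copied from a real accounting log.
SAMPLE = (
    "03/30/2004 00:00:47;E;21891.lcgce02.gridpp.rl.ac.uk;"
    "user=dteam001 group=dteam jobname=STDIN queue=short "
    "resources_used.cput=00:00:03 resources_used.walltime=00:00:11"
)

# Date, time, record type, job name, then space-separated key=value pairs.
RECORD_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2});([A-Z]);([^;]+);(.*)$"
)


def hms_to_seconds(hms):
    """Convert an HH:MM:SS duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_record(line):
    """Return a dict of accounting fields, or None if not an E record."""
    match = RECORD_RE.match(line.strip())
    if match is None or match.group(3) != "E":
        return None
    _date, _time, _kind, job_name, message = match.groups()
    attrs = dict(item.split("=", 1) for item in message.split() if "=" in item)
    cpu = attrs.get("resources_used.cput", "00:00:00")
    wall = attrs.get("resources_used.walltime", "00:00:00")
    return {
        "JobName": job_name,
        "LocalUserID": attrs.get("user"),
        "LocalUserGroup": attrs.get("group"),
        "CpuDuration": cpu,
        "CpuDurationSeconds": hms_to_seconds(cpu),
        "WallDuration": wall,
        "WallDurationSeconds": hms_to_seconds(wall),
    }
```

Calling parse_end_record(SAMPLE) yields a dictionary keyed by the field names used on the slide; records of any other type (for example S, start) return None and are skipped.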

PbsRecords Table Schema
R-GMA publishes data to a PbsRecords database table on the MON box and records the names of the processed logs for book-keeping.
| Field               | Type         |
| RecordIdentityP     | varchar(255) |
| SiteName            | varchar(50)  |
| JobName             | varchar(100) |
| LocalUserID         | varchar(20)  |
| LocalUserGroup      | varchar(20)  |
| WallDuration        | varchar(30)  |
| CpuDuration         | varchar(30)  |
| WallDurationSeconds | int(11)      |
| CpuDurationSeconds  | int(11)      |
| StartTime           | varchar(30)  |
| StopTime            | varchar(30)  |
| SubmitHost          | varchar(50)  |
[Diagram: SQL PbsRecords table on the MON box.]

GateKeeper & Message Log Processing
Every day, the program extracts data from the globus-gatekeeper logs and the system messages logs.
Data sources (in /var/log): globus-gatekeeper.log gz messages.2.gz messages.3.gz
[Diagram: Globus gatekeeper logs and system messages logs on the GateKeeper; SQL GKRecords, LcgProcessed and JobNames tables on the MON box.]

GateKeeper Log File Format
The program searches through the gatekeeper logs looking for the JMA pairs of records:
JMA 2004/03/29 23:59:49 GATEKEEPER_JM_ID :59: for /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=dave kant on
JMA 2004/03/29 23:59:49 GATEKEEPER_JM_ID :59: has GRAM_SCRIPT_JOB_ID :lcgpbs:internal_ : manager type lcgpbs
This tells us that the job was submitted through the grid and that the jobmanager was lcgpbs. JMA record pairs that involve a fork are ignored.
GramScriptJobID = :lcgpbs:internal_ :
LocalJobID = :59:
GlobalUserName = /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=dave kant
MeasurementDate =
MeasurementTime = 23:59:32
SubmitHost => From Config File
SiteName => From Config File
Since the GOC may process logs independently of the sites and store the data in the same tables, we add a Unique Record Identifier to the database, derived from the GramScriptJobID, the MeasurementDate and the MeasurementTime.
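The pairing of JMA records can be sketched as follows. This is only an illustration under stated assumptions: the identifiers in the transcript are truncated, so the sample IDs (JM-1, GRAM-1 and so on), the host name and the exact line layout assumed by the regular expressions are hypothetical, as are the function and variable names.

```python
import re

# Assumed line layouts, inferred from the two example records on the slide.
FOR_RE = re.compile(r"^JMA \S+ \S+ GATEKEEPER_JM_ID (\S+) for (.+) on (\S+)$")
HAS_RE = re.compile(
    r"^JMA \S+ \S+ GATEKEEPER_JM_ID (\S+) has GRAM_SCRIPT_JOB_ID (\S+) "
    r"manager type (\S+)$"
)

# Hypothetical log lines: every ID and host name here is a placeholder.
SAMPLE_LINES = [
    "JMA 2004/03/29 23:59:49 GATEKEEPER_JM_ID JM-1 for "
    "/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=dave kant on host.example.org",
    "JMA 2004/03/29 23:59:49 GATEKEEPER_JM_ID JM-1 has "
    "GRAM_SCRIPT_JOB_ID GRAM-1 manager type lcgpbs",
    "JMA 2004/03/29 23:59:50 GATEKEEPER_JM_ID JM-2 for "
    "/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=dave kant on host.example.org",
    "JMA 2004/03/29 23:59:50 GATEKEEPER_JM_ID JM-2 has "
    "GRAM_SCRIPT_JOB_ID GRAM-2 manager type fork",
]


def pair_jma_records(lines):
    """Match each 'for <DN>' line with its 'has GRAM_SCRIPT_JOB_ID' line."""
    users = {}    # gatekeeper JM ID -> user DN, waiting for the second line
    records = []
    for line in lines:
        line = line.strip()
        match = FOR_RE.match(line)
        if match:
            jm_id, dn, _host = match.groups()
            users[jm_id] = dn
            continue
        match = HAS_RE.match(line)
        if match:
            jm_id, gram_id, manager = match.groups()
            dn = users.pop(jm_id, None)
            # Skip unmatched lines and pairs that involve a fork.
            if dn is None or manager == "fork":
                continue
            records.append({
                "GramScriptJobID": gram_id,
                "LocalJobID": jm_id,
                "GlobalUserName": dn,
            })
    return records
```

With the sample lines, pair_jma_records(SAMPLE_LINES) keeps the lcgpbs pair and drops the fork pair. SubmitHost and SiteName are left out because the slide takes them from a configuration file.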

GkRecords Table Schema
| Field           | Type         |
| RecordIdentityG | varchar(255) |
| GramScriptJobID | varchar(100) |
| LocalJobID      | varchar(50)  |
| GlobalUserName  | varchar(255) |
| SubmitHost      | varchar(50)  |
| SiteName        | varchar(50)  |
[Diagram: SQL GKRecords table on the MON box.]

Message Log Processing
Message log files contain gridinfo records which map lcgpbs GramScriptJobID records to PBS event log records. Such a mapping is not necessary in vanilla PBS, as these records are identical.
gridinfo records match GK “JMA” records to PBS “E” records.
[Diagram: Gatekeeper log, Messages log and PBS event log, linked by GramScriptJobID (:lcgpbs:internal_ :) and PBSJobNameID (lcgce02.gridpp.rl.ac.uk).]

Message Log File Format
Mar 30 00:02:25 lcgce02 gridinfo: [ ] Job :lcgpbs:internal_ : (ID lcgce02.gridpp.rl.ac.uk) has finished
GramScriptJobID = :lcgpbs:internal_ :
JobName = lcgce02.gridpp.rl.ac.uk
MeasurementDate = 30 Mar 2004
MeasurementTime = 00:02:
| Field           | Type         |
| GramScriptJobID | varchar(100) |
| LocalJobID      | varchar(50)  |
| MeasurementDate | varchar(255) |
| MeasurementTime |              |
[Diagram: SQL JobNameRecords table on the MON box.]

CPU Performance
Data source: LDAP GIIS server.
Every day, a GIIS filter collects CPU performance benchmarks for the worker nodes from the subclusters attached to the CE. The R-GMA API publishes the data to the SpecRecords database table on the MON box.

SpecRecords Schema
CPU performance benchmarks for the worker nodes in the subclusters attached to the CE.
| Field          | Type         |
| RecordIdentity | varchar(255) |
| SiteName       | varchar(50)  |
| ClusterID      | varchar(50)  |
| SubClusterID   | varchar(50)  |
| SpecInt2000    | int(11)      |
| SpecFloat2000  | int(11)      |
[Diagram: SQL SpecRecords table on the MON box.]

Joining Records Together
Every day, a 4-way join matches records from the GKRecords, PbsRecords, JobNames and SpecRecords tables on the MON box and writes them to the LcgRecords table. These records are unique. The site now has a copy of its own accounting data.
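The join can be illustrated with an in-memory SQLite database. This is a sketch under assumptions: the slides do not state the join keys, so the choice here (GramScriptJobID links the gatekeeper and job-name tables, the job-name table's LocalJobID is taken to hold the PBS job name, and SiteName links to the benchmark table) is a guess, the tables are cut down to a few columns, and all row values are invented for the example.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE GkRecords  (GramScriptJobID TEXT, GlobalUserName TEXT);
CREATE TABLE JobNames   (GramScriptJobID TEXT, LocalJobID TEXT);
CREATE TABLE PbsRecords (JobName TEXT, SiteName TEXT, LocalUserGroup TEXT,
                         CpuDurationSeconds INTEGER);
CREATE TABLE SpecRecords (SiteName TEXT, SpecInt2000 INTEGER);

-- Hypothetical rows: one grid job and one local (non-grid) job.
INSERT INTO GkRecords  VALUES ('GRAM-1', '/C=UK/O=eScience/CN=some user');
INSERT INTO JobNames   VALUES ('GRAM-1', '101.ce.example.org');
INSERT INTO PbsRecords VALUES ('101.ce.example.org', 'SITE-A', 'dteam', 3);
INSERT INTO PbsRecords VALUES ('102.ce.example.org', 'SITE-A', 'local', 50);
INSERT INTO SpecRecords VALUES ('SITE-A', 400);
"""

# Assumed join keys; see the lead-in paragraph.
JOIN_SQL = """
SELECT g.GlobalUserName, p.JobName, p.LocalUserGroup,
       p.CpuDurationSeconds, s.SpecInt2000
FROM GkRecords AS g
JOIN JobNames    AS j ON j.GramScriptJobID = g.GramScriptJobID
JOIN PbsRecords  AS p ON p.JobName = j.LocalJobID
JOIN SpecRecords AS s ON s.SiteName = p.SiteName
"""


def build_lcg_records():
    """Create the four source tables and return the joined rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return conn.execute(JOIN_SQL).fetchall()
    finally:
        conn.close()
```

Because these are inner joins, only the job that has a gatekeeper record survives: the local job 102 has no matching GkRecords row and so never reaches the joined output, which is how grid and non-grid jobs end up separated in this sketch.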

GOC
The GOC runs a special program on its MON node. This program listens for data streamed from the LcgRecords table by R-GMA. In this way, the GOC collects accounting data aggregated across all LCG sites every day.
[Diagram: LcgRecords tables on the MON nodes of Site 1 to Site n, a streamProducer, and the combined site LcgRecords table on the GOC MON node.]

Stand-Alone Test Results
Stand-alone means that the GOC has processed log data sent from the sites. Data was received from 7 sites, covering different periods of time.
| Site   | CE          | JobManager | Start    | End      |
| CAM    | farm012     | lcgpbs     | 30/03/04 | 15/05/04 |
| CERN   | lxn1181     | lcgpbs     | 16/02/04 | 15/04/04 |
| CNAF   | wn a        | lcgpbs     | 09/02/04 | 31/03/04 |
| FZK    | pbs-server2 | pbspro     | 11/03/04 | 31/03/04 |
| NIKHEF | tbn18       | lcgpbs     | 11/02/04 | 15/04/04 |
| RAL    | lcgce02     | pbs/lcgpbs | 20/01/04 | 15/04/04 |
| Taipei | lcg00125    | lcgpbs     | 02/02/04 | 24/05/04 |

Stand-Alone Test Results
CPU usage per VO per site. Note that ALICE jobs dominate by more than an order of magnitude.

Stand-Alone Test Results
CPU usage per VO, aggregated over sites.
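The two summaries (per VO per site, and per VO over all sites) amount to grouped sums of CPU time. A minimal sketch follows, assuming the VO is taken from the LocalUserGroup field; the function names are hypothetical and the records are invented values, not the test results shown in the plots.

```python
from collections import defaultdict

# Hypothetical joined records; the numbers are made up for illustration.
RECORDS = [
    {"SiteName": "SITE-A", "LocalUserGroup": "alice", "CpuDurationSeconds": 3000},
    {"SiteName": "SITE-B", "LocalUserGroup": "alice", "CpuDurationSeconds": 2000},
    {"SiteName": "SITE-A", "LocalUserGroup": "dteam", "CpuDurationSeconds": 3},
    {"SiteName": "SITE-B", "LocalUserGroup": "dteam", "CpuDurationSeconds": 10},
]


def cpu_per_vo_per_site(records):
    """Sum CPU seconds for each (site, VO) pair."""
    totals = defaultdict(int)
    for rec in records:
        totals[(rec["SiteName"], rec["LocalUserGroup"])] += rec["CpuDurationSeconds"]
    return dict(totals)


def cpu_per_vo(records):
    """Sum CPU seconds for each VO, aggregated over all sites."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["LocalUserGroup"]] += rec["CpuDurationSeconds"]
    return dict(totals)
```

For the invented records above, cpu_per_vo(RECORDS) gives 5000 seconds for alice and 13 for dteam.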

Accounting Issues
1. Support for vanilla pbs, lcgpbs and pbspro only. IN2P3 is supporting bqs. Extending support to LSF and other batch systems will depend on the amount of effort required. To be investigated.
2. The program has been tested in stand-alone mode using log files sent to the GOC by site administrators. It will begin production-mode testing this week.
3. At present the logs provide no means of distinguishing sub-clusters of a CE which have nodes of differing processing power.
4. VO synonyms: FZK prefers “d0” whereas other sites prefer “dzero”. Does LCG impose a fixed-name schema for VOs?
5. The VO associated with a user’s DN is not available in the batch or gatekeeper logs. It will be assumed that the group ID used to execute user jobs, which is available, is the same as the VO name. This needs to be acknowledged as an LCG requirement. (Refer to the next slide for an example.)

Specific Issues: CNAF
1.25% of all accounting records built have an unrecognised group in the PBS event END record. There is no way to trace this to the user without access to the log file.
03/31/ :49:02;S;40372.wn a.cr.cnaf.infn.it;user=dteam003 group=2688 jobname=STDIN queue=lcg ctime= qtime=
PBS log files show that group 2688 appeared on 16 March. Prior to this, a named “dteam” group was defined:
03/15/ :07:51;E;15099.wn a.cr.cnaf.infn.it;user=dteam004 group=dteam jobname=STDIN queue=lcg ctime= qtime=
CNAF to associate “dteam” with group ID 2688.