Enabling Grids for E-sciencE www.eu-egee.org APEL Accounting update Dave Kant (presented by Jeremy Coles) 2 nd EGEE/LCG Operations Workshop Bologna 25.

Enabling Grids for E-sciencE
APEL Accounting Update
Dave Kant (presented by Jeremy Coles)
2nd EGEE/LCG Operations Workshop, Bologna, 25th May 2005

Enabling Grids for E-sciencE, Operations Workshop, Bologna 25/05/2005

Overview
This is a summary of the status of Accounting & Reporting following its deployment in LCG2_4_0 (based on a talk given to the LCG Deployment Board last week).
1. Overview (mainly for background reading)
2. APEL Design (mainly for background reading)
3. What's New?
4. LCG Accounting (OSG, NorduGrid, EGEE)
5. Issues (for discussion)

Requirement Capture
Originally a requirement of the LHC Computing Grid (LCG) project. Requirements were first captured through presentations to:
- LCG's Grid Deployment Board (GDB); the LHC experiments and the Tier-1 centres are represented on the GDB
- the Deployment Team
Subsequent input came from:
- EGEE ROC Managers

Requirements
- A historical record of grid usage, to identify the use of individual sites by VOs as a function of time
- To demonstrate the total delivery of resources by each site to the Grid
- Aggregated views of the collected data, grouped by:
  - Virtual Organisation
  - Country (a requirement of LCG, which has a country-based structure)
  - EGEE Region (for use by the EGEE Regional Operations Centres (ROCs))
- A presentation front-end that allows on-demand selection of the views above for different VOs and periods of time
- Presentation of the data as:
  - a graphical view, for interpretation
  - a tabular view, for precision
- Support for sites that already had their own methods of data collection, by allowing arbitrary collection techniques and the insertion of the data, in the standard schema, into the central database

Requirements (continued)
- It was not an explicit requirement that user information be captured, but we included it in the design because we were sure it would become a secondary requirement.
- This is a reporting (or metering) system, not a charging mechanism. The information is under the control of the site, so:
  - it does not meet the requirement of a charging system that records be digitally signed by the user and irrefutable;
  - collection is lightweight and efficient.
- Information is gathered centrally, not under the control of the VO.

Design
- Information is collected at each site from batch logs, gatekeeper logs, etc.
- The information is joined at site level to select grid jobs, and stored in a database on the site's R-GMA MON box.
- The information is published through R-GMA and collected centrally in an R-GMA archive at the GOC.
- A web site presents various views of this data.
- The structure of the Grid is taken from the GOC DB, the grid configuration database.


How Does APEL Work?
- The PBS/LSF log is processed daily on the site CE to extract the required data; the filter acts as an R-GMA DBProducer, writing to the PbsRecords table.
- The gatekeeper log is processed daily on the site CE to extract the required data; the filter acts as an R-GMA DBProducer, writing to the GkRecords table.
- The message log is processed daily on the site CE to extract the required data; the filter acts as an R-GMA DBProducer, writing to the MessageRecords table.
- The site GIIS is interrogated daily on the site CE to obtain the SpecInt and SpecFloat values for the CE; this acts as a DBProducer writing to the SpecRecords table, one dated record per day.
- These tables are joined daily on the MON box to produce the LcgRecords table.
- As each record is produced, the program acts as a StreamProducer to send the entries to the LcgRecords table at the GOC.
- The site now has a table containing its own accounting data; the GOC has an aggregated table covering the whole of LCG.
- Interactive and regular reports are produced by the site or at the GOC as required.
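The daily join on the MON box can be pictured with a small in-memory model. This is a minimal Python sketch, not APEL's code (APEL joins the R-GMA tables in MySQL); the dictionaries stand in for the four tables, and their field names and shapes, as well as the `default_spec` fallback, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LcgRecord:
    # Hypothetical joined record; the real LcgRecords table has more columns.
    local_job_id: str
    user_dn: str
    cpu_seconds: int
    spec_int: int


def join_records(gk_records, message_records, batch_records, spec_by_day, default_spec):
    """Keep only batch jobs that can be traced back to a grid submission.

    gk_records:      {gram_job_id: user_dn}           stands in for GkRecords
    message_records: {gram_job_id: local_job_id}      stands in for MessageRecords
    batch_records:   {local_job_id: (day, cpu_secs)}  stands in for PbsRecords
    spec_by_day:     {day: spec_int}                  stands in for SpecRecords
    default_spec:    used when no dated SpecRecords entry exists (assumption)
    """
    # Invert the message-log mapping so batch jobs can be looked up by local ID.
    gram_for_local = {local: gram for gram, local in message_records.items()}
    joined = []
    for local_id, (day, cpu_secs) in batch_records.items():
        gram_id = gram_for_local.get(local_id)
        if gram_id is None or gram_id not in gk_records:
            continue  # no grid mapping: treat as a local (non-grid) job
        spec = spec_by_day.get(day, default_spec)
        joined.append(LcgRecord(local_id, gk_records[gram_id], cpu_secs, spec))
    return joined
```

Inverting the message-log map means each batch record needs only one dictionary lookup; in SQL the same effect comes from inner joins on the job-ID columns.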

[Figure: data flow at the GOC. Job records arrive via R-GMA at the GOC R-GMA MON archive (1 record per grid job; millions of records expected). One SQL query per hour refreshes the summary data on the accounting server (at most about 100K summary records per year). The on-demand accounting pages (home page, user queries, graphs) are based on SQL queries to the summary data.]
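The hourly summarisation step, which turns one record per job into a much smaller table of per-site, per-VO, per-month totals, amounts to a GROUP BY. The sketch below uses Python's built-in sqlite3 purely for illustration (the GOC used MySQL); the table and column names are hypothetical.

```python
import sqlite3

SUMMARY_SQL = """
DROP TABLE IF EXISTS Summary;
CREATE TABLE Summary AS
SELECT site,
       vo,
       substr(end_date, 1, 7) AS month,  -- 'YYYY-MM' from an ISO 'YYYY-MM-DD' date
       COUNT(*)               AS njobs,
       SUM(cpu_seconds)       AS cpu_seconds
FROM JobRecords
GROUP BY site, vo, substr(end_date, 1, 7);
"""


def build_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the per-site, per-VO, per-month summary from raw job records."""
    conn.executescript(SUMMARY_SQL)
```

The web pages then query only the summary table, whose size grows with sites x VOs x months rather than with the number of jobs.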

Description
The web interface allows information to be selected by:
- VO
- time range
- scope (whole Grid, Country, EGEE Region, site)
It also shows information on the data collected.

[Figure: web form to apply selection criteria to the data. Annotations: aggregate data across an organisation structure (default = all ROCs); select VOs (default = all); select date range; selectable views.]

[Figure: VO index. Summed CPU time (seconds) consumed by resources in the selected region over the selected date range.]

[Figure: list of sites belonging to the selected ROC, with a breakdown of resource usage per site, per VO, per month.]

Sites publishing data to the GOC (17th May 2005)
[Figure: map of publishing sites.] Over 1.7 million job records; ~50K records per week.

What is New?
- The original interface was designed for EGEE DSA1.3:
  - ROC and CIC views
- An LHC View has been added to the reporting interface:
  - requirements driven by the RRB / Kors Bos
  - Tier-1 and Country entry points
  - LHC VOs only
  - all data normalised in units of SI2000-hours
  - tabular summaries per Tier-1 / Country
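Normalising to SI2000-hours scales each job's raw CPU time by the benchmark rating of the hardware that ran it. A minimal sketch of the conversion and per-VO summation, assuming CPU time in seconds and a SpecInt2000 value already attached to each record (the record layout is hypothetical):

```python
from collections import defaultdict


def normalised_usage_by_vo(records):
    """Sum CPU usage per VO in SI2000-hours.

    Each record is (vo, cpu_seconds, spec_int2000), where spec_int2000 is the
    SpecInt2000 rating assumed for the node (or farm) that ran the job.
    """
    totals = defaultdict(float)
    for vo, cpu_seconds, spec_int2000 in records:
        totals[vo] += (cpu_seconds / 3600.0) * spec_int2000
    return dict(totals)
```

For example, two CPU-hours on a 1000 SI2000 node count the same as four CPU-hours on a 500 SI2000 node, which is what makes usage comparable across Tier-1s with different hardware.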

[Figure: breakdown of usage per Country / Tier-1. Annotations: Tier-1 resource usage per LHC VO; graphical plot showing usage per VO; select Tier-1 view / Country view using a navigable tree; select time frame; drill down to view data belonging to individual sites within a Tier-1/country.]

Batch System Support
- APEL supports PBS and LSF; the implementations are separate and independent of one another.
- Currently LCG2_4_0 has PBS support only.
- What is the current status of LSF support?
  - http://goc.grid-support.ac.uk/gridsite/accounting/lsf_dev.html
  - On this page you will find the release candidate edg-rgma-lsf noarch.rpm.
  - LSF currently comes in three flavours (4, 5 and 6), each with a different usage record format.
  - A new RPM, edg-rgma-apel-lsf, has been released to CERN for testing.
  - Expect both to be part of LCG 2_5_0.

CERN Data
- Meeting with Harry and Thorsten (17th May)
- Run APEL-LSF over the full 2005 dataset
- Compare APEL numbers with internal LSF reporting

LSF Deployment at CERN
1. The PBS and LSF implementations use the same APEL core. This has some consequences for the normalisation model:
   a. We use a global parameter, published by the CE, to normalise the data belonging to each site.
   b. PBS uses an internal normalisation model; LSF has no such model.
   c. There are two ways to perform normalisation in LSF:
      i. record by record (exact, but requires converting a CPU factor into an SI00 value), or
      ii. globally (approximate), as is done with PBS.
2. CERN has a heterogeneous farm. CERN calculates a weighted mean on a daily basis and publishes this value on the CE (via LDIF: GlueHostBenchmarkSI00).
3. Accuracy of the method:
   a. We will compare numbers with CERN very shortly.
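The weighted mean published for a heterogeneous farm, and the difference between record-by-record and global normalisation, can be illustrated as below. This is a sketch under stated assumptions: the mean is weighted by number of CPUs per node type (the slide does not say how CERN weights it), and the CPU-factor-to-SI00 conversion is a hypothetical linear per-site calibration, since that conversion is exactly the open question raised under Normalisation.

```python
def weighted_mean_si00(node_types):
    """Farm-wide SI00 as a CPU-weighted mean.

    node_types: list of (n_cpus, si00_per_cpu) pairs (weighting is an assumption).
    """
    total_cpus = sum(n for n, _ in node_types)
    return sum(n * si00 for n, si00 in node_types) / total_cpus


def normalise_per_record(jobs, si00_per_cpu_factor):
    """Method 1 (exact): scale each job by the node it ran on.

    jobs: list of (cpu_seconds, lsf_cpu_factor).
    si00_per_cpu_factor: hypothetical calibration, SI00 = cpu_factor * this value.
    Returns SI00-seconds.
    """
    return sum(cpu * factor * si00_per_cpu_factor for cpu, factor in jobs)


def normalise_globally(jobs, farm_si00):
    """Method 2 (approximate): scale every job by one farm-wide value, as with PBS."""
    return sum(cpu for cpu, _ in jobs) * farm_si00
```

The two methods agree only when the jobs happened to use the node types in the same proportions as the weights; otherwise the global figure drifts, which is why its accuracy needs checking against CERN's own numbers.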

Accounting Dissemination
1. CERN Courier
2. LCG Computing Newsletter (slightly more technical)
3. AHM 2005 (more technical still)

Issues
1. Which RPM version?
   Latest version for LCG2_4_0. Change log:
   - Apel (6th April): startup script modified for the R-GMA 2_4_0 software release
   - Apel (20th March): improved core functionality; better handling of DN suppression; check that the flexible archiver is on-line before attempting to send job records
   - Apel (2nd Feb): minor fix to SQL script
   - Apel (17th Jan): normalisation issue (see later); catch-all specInt/specFloat set to the value in the GIIS rather than 0
   - Apel (16th Dec): current PBS log excluded from archive
   - Apel (19th Nov): bug in "reprocess" option during join fixed; added "cleanAll" option
   - Apel (14th Oct): grant mechanism to allow the GK and CE to connect to the MySQL database

Issues
2. VO Filtering
- National Grid VO activities run on the same infrastructure as EGEE/LCG.
- For privacy reasons, some sites don't want to publish National VO data to the GOC.
- APEL does not discriminate between VOs.
- Develop a solution? What can we suggest today?
  - GOCDB can hide resources.
  - APEL was required to exclude local work: local work is not published, but non-LCG work does come through.
  - What's the model? 1 CE per VO... what do people do? If so, there is no need to install APEL on non-LCG VO CEs.
  - SARA-LCG2, IISAS-Bratislava, GridPP?

Issues
3. Development of Tests to Check the Accounting Service
- Is site accounting working? Is the GOC listening for new data? Is the R-GMA Registry working?
- GSTAT:
  - The GOC flexible archiver service listens for accounting producers.
  - If the service is down, no data can be sent to the GOC!
  - Use the service every 5 minutes to update a timestamp in a test record in the accounting database. GSTAT can query the table, read the timestamp and compare it with the current date/time.
  - Have a third party use the flexible archiver service.
  - Use R-GMA to compare records in the site database and at the GOC.
- Site Functional Tests:
  - can check the RPM version installed on the CE.
- Testing the whole thing instead of the pieces:
  - investigate an APEL "heart-beat": a site cron job writes a test record every hour and publishes it to the GOC.
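Both the GSTAT timestamp check and the proposed heart-beat reduce to asking whether a periodically updated record is too old. A minimal sketch, assuming the timestamp has already been read from the accounting database (the query itself is not shown) and assuming a tolerance of two missed updates, which the slides do not specify:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GOC_TEST_RECORD_INTERVAL = timedelta(minutes=5)  # GOC updates its test record every 5 minutes
SITE_HEARTBEAT_INTERVAL = timedelta(hours=1)     # proposed hourly site heart-beat


def is_fresh(last_update: datetime, interval: timedelta,
             missed_allowed: int = 2, now: Optional[datetime] = None) -> bool:
    """True if the record is no older than `missed_allowed` update intervals."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_update <= interval * missed_allowed
```

Allowing a missed update or two avoids raising an alarm for a single late cron run while still flagging an archiver that has stopped listening.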

Issues
4. Which Log Files Should Site Administrators Back Up?
To build accounting records, we need to process data from THREE log file sources. This is mandatory in order to reconstruct what was done during 2004.
- /var/log/globus-gatekeeper*: matches the grid user's DN to the GramScriptJobId
- /var/spool/pbs/server_priv/accounting/*: local job ID and details of the resources consumed; no distinction between grid jobs and non-grid jobs
- /var/log/messages*: maps the GramScriptJobID to the local job ID. This is how we separate grid jobs from local user jobs running on the local fabric.
If the site has deleted its messages files, we may be able to work around this by matching local Unix groups in the batch logs. Accounting records formed in this way will not contain the DN of the grid user.
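The batch log is the only one of the three sources that records resource consumption. As a sketch, the function below extracts end-of-job ("E") records from a PBS/Torque accounting file, whose lines have the form `date time;record_type;job_id;key=value ...`, and keeps the Unix group that the group-matching workaround above would rely on. The selection of fields is illustrative rather than what APEL stores, and values containing spaces are not handled.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' into seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def parse_pbs_end_records(lines):
    """Return one dict per job-end ('E') record in a PBS accounting log."""
    records = []
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # only job-end records carry the final resource usage
        timestamp, _, job_id, attrs = parts
        # Attributes are whitespace-separated key=value pairs.
        fields = dict(kv.split("=", 1) for kv in attrs.split() if "=" in kv)
        records.append({
            "job_id": job_id,
            "end_time": timestamp,
            "user": fields.get("user"),
            "group": fields.get("group"),
            "cpu_seconds": hms_to_seconds(fields.get("resources_used.cput", "0:0:0")),
            "wall_seconds": hms_to_seconds(fields.get("resources_used.walltime", "0:0:0")),
        })
    return records
```

Nothing in these records identifies the grid user, which is why the gatekeeper and messages logs must be kept alongside them.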

Issues
5. siteName Changes
- Recent problem with presenting data from the French ROC, where CCIN2P3 was renamed IN2P3-CC via the GOCDB portal.
- All records associated with the site are updated so that SQL queries match the new siteName.
6. Namespace Convention?
- A naming scheme to identify data belonging to large sites that provide services for different communities, etc.
- NIKHEF: lcgprod.nikhef.nl, lcg2prod.nikhef.nl, edgapptb.nikhef.nl
- *SiteName* is a bad choice because it gives multiple hits: *IC-LCG2* matches both PIC-LCG2 and IFIC-LCG2.
- We request that sites stick to the convention *.SiteName, e.g. h1.desy.de, zeus.desy.de.
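The namespace problem is easy to reproduce with shell-style wildcards. The sketch below uses Python's fnmatch rather than SQL LIKE, and the site list is illustrative (notdesy.de is a made-up name): a substring pattern for IC-LCG2 also catches PIC-LCG2 and IFIC-LCG2, whereas the suffix convention *.SiteName selects only DESY's sub-sites.

```python
from fnmatch import fnmatchcase


def matching_sites(pattern, site_names):
    """Return the site names that match a shell-style wildcard pattern."""
    return [name for name in site_names if fnmatchcase(name, pattern)]


sites = ["IC-LCG2", "PIC-LCG2", "IFIC-LCG2", "h1.desy.de", "zeus.desy.de", "notdesy.de"]
print(matching_sites("*IC-LCG2*", sites))  # substring match: three hits
print(matching_sites("*.desy.de", sites))  # suffix convention: DESY sub-sites only
```

The leading dot in the suffix pattern is what stops unrelated names that merely end in the same characters from matching.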

Issues
7. Normalisation
- We want a reasonably sensible first-order estimate to account for differences in worker node performance.
- Homogeneous vs heterogeneous farms.
- PBS job records don't contain any information about worker node benchmarks, so we must insert one manually.
- PBS farms are set up in different ways, which can lead to errors in the normalisation calculation (Blindman vs internal normalisation).
- Histories: which SpecInts do we use to process archived job records?
- LSF job records contain a CPU_FACTOR (1-4):
  - What does a value of 1 correspond to?
  - There is a different "calibration" value at each site.
  - A conversion table?
  - Can the site publish a weighted SpecInt2000 for the farm?

Issues
8. Service Reliability & Hardening
- If the flexible archiver is down, sites are unable to publish data to the GOC.
  - Update (/43): the APEL core checks whether the flexible archiver service is available before attempting to publish data.
  - The GOC publishes a test record every 5 minutes to check that the service is alive; an automatic service recovery mechanism is now in place.
- Investigate running multiple flexible archiver services:
  - 1 per GOC or 1 per ROC?
  - At the moment, the archiver service listens for all producers rather than for the producers belonging to a ROC.
- Single point of failure if the registry is down?
  - Multiple registry replicas supported in the RC1 (gLite) release?
  - Update: multiple registries supported in LCG2_4_0?
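A pre-publish availability check of the kind now in the APEL core can be sketched as a TCP probe with a timeout. This is not APEL's implementation: the host, port and `send` callback are placeholders, and a successful connect only shows that something is listening, not that the archiver will accept records.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def publish_if_available(records, host, port, send):
    """Send records only if the archiver endpoint answers.

    Returns the records that were not sent, so the caller can retry on its next run.
    """
    if not service_reachable(host, port):
        return list(records)
    for record in records:
        send(record)
    return []
```

Returning the unsent records, rather than dropping them, keeps a site's data intact across an archiver outage.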

Future Plans
1. Interoperate
2. CERN Courier / LCG News
3. Wiki Pages

APEL and gLite
- Is APEL integrated in gLite?
  - Work is currently in progress.
  - We have ported the APEL code into the gLite CVS repository, but need to understand the functional differences, e.g. the WMS and the use of Condor.
  - 3 components: core + PBS plugin + LSF plugin.
  - We sent our requirements to Erwin Laure... waiting for information.
- What about its deployment plan?
  - As soon as possible...
  - ...but we would also like to add some new features:
    - a global job ID to link with L&B
    - DN-to-VO mapping

LCG Accounting
The project involves combining results from all three infrastructures and presenting an aggregated view.
- Peer infrastructures in LCG:
  - Open Science Grid (Ruth Pordes, Philippe Canal, Matteo Melani)
  - NorduGrid (Per Oster)
  - EGEE
- Currently, the LHC View filters LHC VO data from the EGEE accounting data.

Requirements
Combine results from all three infrastructures:
- Ideally: distributed queries to multiple databases
  - each peer manages an accounting database
  - LHC VO filtering is provided through a web services interface
- Initial implementation: centralised collection
  - peers publish data into a global database
  - via web services or direct MySQL inserts
Common problem: different Grid infrastructures may use different schemas. The GGF defines a schema, but it is quite flexible. We may need "translators" to convert from one schema to another (these already exist).
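A schema "translator" is essentially a field-renaming table plus unit conversions. The sketch below is purely illustrative: both the peer field names and the common field names are hypothetical, and are not taken from the GGF Usage Record or from any peer infrastructure's actual schema.

```python
# Hypothetical mapping: peer field name -> (common field name, converter).
FIELD_MAP = {
    "SiteName": ("site", str),
    "VOName": ("vo", str),
    "CpuDurationMinutes": ("cpu_seconds", lambda minutes: int(minutes) * 60),
    "EndTime": ("end_time", str),
}


def translate(peer_record: dict) -> dict:
    """Rename fields and convert units; fields with no mapping are dropped."""
    common = {}
    for src_name, value in peer_record.items():
        if src_name in FIELD_MAP:
            dst_name, convert = FIELD_MAP[src_name]
            common[dst_name] = convert(value)
    return common
```

Keeping the mapping as data rather than code means each peer infrastructure needs only its own table, while the translation logic stays shared.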