Presentation is loading. Please wait.

Presentation is loading. Please wait.

Enabling Grids for E-sciencE www.eu-egee.org APEL Accounting update Dave Kant (presented by Jeremy Coles) 2 nd EGEE/LCG Operations Workshop Bologna 25.

Similar presentations


Presentation on theme: "Enabling Grids for E-sciencE www.eu-egee.org APEL Accounting update Dave Kant (presented by Jeremy Coles) 2 nd EGEE/LCG Operations Workshop Bologna 25."— Presentation transcript:

1 Enabling Grids for E-sciencE www.eu-egee.org APEL Accounting update Dave Kant (presented by Jeremy Coles) 2 nd EGEE/LCG Operations Workshop Bologna 25 th May 2005

2 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 2 Overview This is a summary of the status of Accounting & Reporting following its deployment in LCG2_4_0 (based on talk given to LCG deployment board last week) 1.Overview (mainly for background reading) 2.APEL Design (mainly for background reading) 3.What’s New? 4.LCG Accounting (OSG, NorduGrid, EGEE) 5.Issues (for discussion)

3 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 3 Requirement Capture Originally a requirement of the LHC Computing Grid project. Requirements were first captured through presentations to –LCG’s Grid Deployment Board –Deployment Team. –LHC experiments and the Tier1 centres are represented on the GDB. Subsequent input from –EGEE ROC Managers

4 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 4 Requirements A historical record of grid usage to identify the use of individual sites by VOs as a function of time To demonstrate the total delivery of resources by that site to the Grid Aggregated views of the collected data grouped by: –Virtual Organisation –Country – a requirement of LCG which has a country-based structure –EGEE Region – for use by EGEE Regional Operations Centre (ROC) A presentation front-end to the data to allow the selection on- demand of the views described above for different VOs and periods of time. To present the data as –A graphical view for interpretation –A tabular view for precision To support sites that already had their own methods of data collection by allowing arbitrary data collection techniques and insertion of the data in the standard schema into the central database.

5 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 5 Requirements It was not an explicit requirement that user information be captured but we included this in the design as we were sure this would be a secondary requirement This is a reporting (or metering) system, not a charging mechanism. The information is under the control of the site, –so it does not meet the requirement of a charging system to be digitally signed by the user and irrefutable. –collection is thus lightweight and efficient Information is gathered centrally, not under the control of the VO

6 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 6 Design Information collected at each site from batch logs, gatekeeper logs etc Information joined at site level to select grid jobs and stored in database on R-GMA MON box at site. Information published through R-GMA and collected centrally in an R-GMA archive at GOC Web site presents various views of this data for presentation Structure of Grid taken from GOC DB – the grid configuration database.

7 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 7

8 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 8 How APEL Works? PBS/LSF log processed daily on site CE to extract required data, filter acts as R-GMA DBProducer -> PbsRecords table Gatekeeper log processed daily on site CE to extract required data, filter acts as R-GMA DBProducer -> GkRecords table Message log processed daily on site CE to extract required data, filter acts as R-GMA DBProducer -> MessageRecords table Site GIIS interrogated daily on site CE to obtain SpecInt and SpecFloat values for CE, acts as DBProducer -> SpecRecords table, one dated record per day These three tables joined daily on MON to produce LcgRecords table. As each record is produced program acts as StreamProducer to send the entries to the LcgRecords table on the GOC site. Site now has table containing its own accounting data; GOC has aggregated table over whole of LCG. Interactive and regular reports produced by site or at GOC site as required.

9 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 9 Job Records In via RGMA RGMA MON SQL QUERY TO Accounting Server 1 Query / Hour On-Demand Accounting Pages based on SQL queries to summary data 1 Record per Grid Job (Millions of records expected) Summary data refreshed every hour (Max records about 100K per year) Home Page User queries Graphs GOC

10 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 10 Description Web allows information to be selected by –VO, time range, (Whole Grid, Country, EGEE Region, site) Also shows information on data collected

11 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 11 Web form to apply selection criteria on the data Aggregate data across an organisation structure (Default= All ROCs) Select VOs (Default = All) Select date range Selectable views

12 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 12 VO Index Summed CPU (Seconds) consumed by resources in selected Region Selected Date Range

13 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 13 List of Sites Belonging to the Selected ROC A breakdown of the resource usage per Site, per VO, per Month

14 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 14 72 Sites publishing data to GOC (May 17th 2005) Over 1.7 Million Job records ~ 50K records per week http://goc.grid-support.ac.uk/ /

15 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 15 What is New? –Original interface designed for EGEE DSA1.3  ROC and CIC views –Added LHC View to the reporting interface  Requirements driven by RRB / Kors Bos  Tier-1 and Country entry points  LHC VO only  All data normalised in units of 1000. SI2000. Hour  Tabular Summaries per Tier1/ Country

16 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 16 Breakdown of Usage per Country / Tier1 Tier-1 Resource Usage per LHC VO Graphical Plot showing Usage per VO Select Tier1 View / Country View using navigable tree Select time frame Drill Down to view data belonging to individual sites within a Tier1/country

17 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 17 Batch System Support APEL supports PBS and LSF Implementations are separate and independent of one another. Currently LCG2_4_0 has PBS support only. What is the current status about LSF Support? http://goc.grid- support.ac.uk/gridsite/accounting/lsf_dev.htmlhttp://goc.grid- support.ac.uk/gridsite/accounting/lsf_dev.html On this page you find release candidate edg-rgma-lsf-1.0.0-4.noarch.rpm LSF currently comes in three flavours (4, 5 and 6), each has a different usage record format New RPM edg-rgma-apel-lsf has been released to CERN for testing. Expect both to be part of LCG 2_5_0

18 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 18 CERN Data Meeting with Harry and Thorsten (17 th May) Run APEL-LSF over full 2005 dataset Compare APEL numbers to internal LSF reporting

19 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 19 LSF Deployment at CERN 1.PBS and LSF implementations use the same APEL core. This has some consequences for the normalisation model. 1.We use a global parameter to normalise data belonging to each site which is published by the CE. 2.PBS uses internal normalisation model, lsf has no such model 3.Two ways to perform nomalisation in lsf: 1.normalise record by record (exact but requires converting a cpu factor into an SI00 value) or 2.globally (approximate method) as is done with PBS. 2.CERN has a heterogeneous farm. They calculate a weighted mean on a daily basis and publish this value on the CE (via ldif : GlueHostBenchMark00) 3.Accuracy of Method 1.We will compare numbers with CERN very shortly.

20 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 20 Accounting Dissemination 1.CERN Courier 2.LCG Computing Newsletter (slightly more technical) 3.AHM 2005 (more technical still)

21 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 21 Issues 1.Which RPM Version? Latest version on http://goc.grid-support.ac.uk/gridsite/accountinghttp://goc.grid-support.ac.uk/gridsite/accounting 3.4.44 for LCG2_4_0 Change Log 3.4.37 to 3.4.43 oApel 3.4.43 (April 6 th ) Startup script modified for RGMA 2_4_0 s/w release oApel 3.4.42 (Mar 20 th ) Improved core functionality oBetter handling of dn suppression oCheck flexible archiver on-line before attempting to send job records oApel 3.4.41 (Feb 2 nd ) Minor fix to SQL script oApel 3.4.40 (Jan 17 th )  Normalisation issue (see later)  CatchAll specInt/specFloat set to value in GIIS rather than 0 oApel 3.4.39 (Dec 16 th ) Current PBS log excluded from archive oApel 3.4.38 (Nov 19 th )  Bug in “reprocess” option during Join  Added “cleanAll” option oApel 3.4.37 (Oct 14 th )  grant mechanism to allow GK and CE to connect to MySQL database

22 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 22 Issues 2.VO Filtering National Grid VOs activities running on same infrastructure as EGEE/LCG Privacy reasons why sites don’t want to publish National VO data to GOC APEL does not discriminate between the VOs Develop a solution? What can we suggest today? GOCDB can hide resources APEL made the requirement to exclude Local work not published but non LCG work does come through. Whats the model 1 CE per VO…what do people do? Don’t need to install Apel on non-LCG VO CEs SARA-LCG2, IISAS-Bratislava GridPP?

23 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 23 Issues 3.Development of Tests to Check the Accounting Service Is site accounting working? Is the GOC listening for new data? Is the RGMA Registry working? GSTAT oGOC Flexible archiver service listens for accounting producers oIf the service is down, no data can be sent to the GOC! oUse the service every 5 minutes to update a timestamp in a test record in the accounting database. GSTAT can query table, look at the timestamp and compare with the current date/time. o3 rd party to use the flexi service. oUse RGMA to compare records in the site database and GOC Site Functional Tests oCan check the RPM version installed on the CE Testing the Whole Thing instead of the Pieces oInvestigate an Apel “heart-beat” oSite cron writes a test record every hour and publishes to GOC

24 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 24 Issues 4.Which Log Files Should Site Administrators Backup? To build accounting records, we need to process data from THREE log file sources. This is a mandatory requirement in order to reconstruct what has been done during the 2004 period. /var/log/globus-gatekeeper* oMatch between grid-user dn to GramScriptJobId /var/spool/pbs/server_priv/accounting/* oLocal jobID and details of resources consumed oNo distinction between grid jobs and non-grid jobs. /var/log/messages* oMap GramScriptJobID to local JobID This is how we separate grid jobs from local user jobs which run on the local fabric. If the site has deleted its messages files, we may be able to work around this by matching local unix groups in the batch logs. Accounting records formed in this way will not contain the dn of the grid-user.

25 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 25 Issues 5.siteName Changes –Recent problem with presenting data from the French ROC where CCIN2P3 was renamed to IN2P3-CC via GOCDB portal –All records associated with the site are updated in order for SQL queries to match the new siteName. 6.Namespace Convention? –Naming scheme to identify data belonging to large sites which provide services for different communities etc. –NIKHEF: lcgprod.nikhef.nl, lcg2prod.nikhef.nl, edgapptb.nikhef.nl –*SiteName* is a bad choice because we get multiple hits o*IC-LCG2* gives multiple matches PIC-LCG2 and IFIC-LCG2 –Request sites stick to the convention *.SiteName oh1.desy.de, zeus.desy.de

26 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 26 Issues 7.Normalisation –We want to perform a reasonably sensible first order estimate to account for the differences in worker node performance. –Homogeneous vs Heterogeneous –PBS Job Records don’t have any information about the worker node benchmarks, so we must insert one manually –PBS Farms setup in different ways; can lead to an error in the normalisation calculation (Blindman vs internal normalisation) –Histories - What SpecInts do we use in order to process archived Job Records? –LSF Job Records have a CPU_FACTOR (1 - 4) in the Job Record. oWhat does a value of 1 correspond to? oDifferent “calibration” value at each site oConversion table? oCan the site publish a weighted specInt2000 for the farm?

27 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 27 Issues 8.Service Reliability & Hardening –If flexible archiver is down, sites unable to publish data to GOC  Update: 3.4.42/43 Apel core checks if flexible archiver service is available before attempting to publish data.  GOC publishes a test record every 5 minutes to check the service is alive: automatic service recovery mechanism now in place –Investigate running multiple flexible archiver services  1 per GOC or 1 per ROC?  At the moment, the archiver service listens for all producers rather than producers belonging to a ROC. –Single point of failure if registry is down?  Multiple registry replicas supported in the RC1 (gLite) release?  Update: Multiple registries supported in LCG2_4_0 ?

28 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 28 Future Plans 1.Interoperate 2.CERN Courier / LCG News 3.Wiki Pages

29 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 29 APEL and gLite –Is APEL integrated in g-Lite?  Work currently in progress.  We have ported the APEL code into the gLite CVS repository but need to understand functional differences e.g. WMS and use of Condor  3 Components: Core + PBS plugin + LSF plugin  Sent our requirements to Erwin Laure….waiting for information. –What about its deployment plan?  As soon as possible  …but would also like to add some new features Global Job ID to link with L&B DN to VO mapping

30 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 30 LCG Accounting Project involves combining results from all three infrastructures and presenting an aggregated view –Peer Infrastructures in LCG  Open Science Grid (Ruth Pordes, Philippe Canal, Matteo Melani)  Nordugrid (Per Oster)  EGEE  Currently, LHCView filters LHC VO data from EGEE accounting data.

31 Enabling Grids for E-sciencE Operations Workshop, Bologna 25/05/2005 31 Requirements Combine results from all three infrastructures … –Ideally: Distributed queries to multiple databases  Each peer manages an accounting database  LHC VO filtering provided through a web services interface –Initial Implementation: Centralised Collection  Peers publish data into a global database  WebServices or direct MySql inserts Common Problem: Different Grid infrastructures may use different Schemas. GGF define a schema, but quite flexible. May need “translators” to convert from one schema to another. (already exist)


Download ppt "Enabling Grids for E-sciencE www.eu-egee.org APEL Accounting update Dave Kant (presented by Jeremy Coles) 2 nd EGEE/LCG Operations Workshop Bologna 25."

Similar presentations


Ads by Google