Use of R-GMA in BOSS
Henry Nebrensky (Brunel University), VRVS, 26 April 2004
Some slides stolen from various talks at the EDG 2nd Review (http://documents.cern.ch/AGE/current/fullAgenda.php?ida=a021814), the WP3 overview at the GridPP middleware meeting (WP1-WP7, CERN, 18th June 2002) and Claudio Grandi's talk at CHEP'03.

R-GMA
– Grid monitoring infrastructure
– Based on the GGF GMA (Grid Monitoring Architecture)
– Discrete consumers and producers
– Registry acts as matchmaker

R-GMA
More on R-GMA: see e.g. the "RGMA Deployment" talk.

Basic BOSS components
– boss executable: the BOSS interface to the user
– MySQL database: where BOSS stores job information
– jobExecutor executable: the BOSS wrapper around the user job
– dbUpdator executable: the process that writes to the database while the job is running
– Local scheduler: may be a "Grid" scheduler

Basic flow
BOSS:
– accepts job submissions from users
– stores info about the job in a DB
– builds a wrapper around the job (jobExecutor)
– sends the wrapper to the local scheduler
– the wrapper sends info about the job back to the DB
[Diagram: the user's boss submit, boss query and boss kill commands act through the BOSS DB; BOSS hands the wrapper to the scheduler, which runs it on a farm node.]
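To make the wrapper's role concrete, here is a minimal sketch of a jobExecutor-style wrapper. It is written in Java for consistency with the other sketches in these notes (BOSS itself is C++), and forwardToUpdater is a hypothetical stand-in for the dbUpdator hook:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Minimal sketch of what a BOSS-style job wrapper does: run the user
    // job, tee its output line by line, and hand each line to an updater
    // for filtering. Illustrative only; not the actual BOSS implementation.
    public class JobWrapperSketch {
        public static void main(String[] args) throws Exception {
            ProcessBuilder pb = new ProcessBuilder(args); // args = the user job's command line
            pb.redirectErrorStream(true);                 // fold stderr into stdout
            Process job = pb.start();
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(job.getInputStream()))) {
                String line;
                while ((line = out.readLine()) != null) {
                    System.out.println(line);   // the "tee": preserve the job's own output
                    forwardToUpdater(line);     // dbUpdator-style hook (hypothetical)
                }
            }
            System.exit(job.waitFor());         // propagate the job's exit status
        }

        private static void forwardToUpdater(String line) {
            // In BOSS this is where registered filters extract job information
            // and write it to the MySQL database; with R-GMA it is published
            // as a tuple instead (see the later slides).
        }
    }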

BOSS in the CMS Grid
[Architecture diagram: UI with IMPALA/BOSS and the BOSS DB; RefDB supplies parameters; jobs go via JDL to the Workload Management System and run on CEs/WNs carrying the CMS software, with SEs for storage; the Replica Manager handles data registration and input data location; job output filtering and runtime monitoring feed the BOSS DB. Arrows distinguish "push data or info" from "pull info".]

Use of R-GMA in BOSS
[Diagram: on the WN, the BOSS wrapper tees the job's output (OutFile) and publishes it through the R-GMA API to the farm's servlets; these stream to the receiver servlets, located via the Registry, and the Receiver updates the BOSS DB on the UI/IMPALA/BOSS side.]

Use of R-GMA in BOSS
– Publish each update into R-GMA as a separate message, i.e. a separate row
– Each producer gives the host and name of its "home" BOSS DB, plus the jobId; together these identify it uniquely
– The receiver looks for all rows relating to its DB, and uses jobId and jobType to do an SQL UPDATE
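A minimal sketch of the receiver's update step, assuming an illustrative schema (the table layout and the example T_STAT field below are assumptions based on these slides, not the real BOSS schema):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Sketch of the BOSS receiver's update step (assumed schema): each
    // R-GMA tuple names its home DB, jobId, jobType and one column=value
    // update, which is folded into the matching row of the local BOSS
    // MySQL database with an SQL UPDATE.
    public class BossReceiverSketch {
        public static void applyTuple(Connection bossDb, String jobType,
                                      String jobId, String column, String value)
                throws Exception {
            // jobType selects the table and jobId the row (assumed layout).
            // Real code must check jobType/column against the known schema
            // before splicing them into the statement.
            String sql = "UPDATE " + jobType + " SET " + column + " = ? WHERE jobId = ?";
            try (PreparedStatement st = bossDb.prepareStatement(sql)) {
                st.setString(1, value);
                st.setString(2, jobId);
                st.executeUpdate();
            }
        }

        public static void main(String[] args) throws Exception {
            // The receiver subscribes only to rows whose bossDatabaseHost and
            // bossDatabaseName match its own database (see the next slide).
            try (Connection db = DriverManager.getConnection(
                    "jdbc:mysql://localhost/boss", "boss", "secret")) {
                applyTuple(db, "JOB", "112", "T_STAT", "running"); // example tuple
            }
        }
    }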

Use of R-GMA in BOSS The screenshot below shows the streamed output messages from a Brunel job (ID 112) being sent through R-GMA and displayed using the EDG Pulse tool from WP3. As Pulse can monitor multiple producers, it also shows the output from a longer job already running at Imperial (ID 72). The receivers that update the BOSS databases use the bossDatabaseHost and bossDatabaseName fields to select only the relevant messages, so that the database at each institute is updated with only the information about its own jobs.

Use of R-GMA in BOSS (1)
– R-GMA smoothes over "firewall" issues
– A consumer can watch many producers; producers can feed multiple consumers
– Provides uniform access to a range of monitoring data (WP7 network data, etc.)
– Doesn't depend on other EDG components

Use of R-GMA in BOSS (2)
– The BOSS job wrapper uses an R-GMA StreamProducer via the C++ API
  – Can define a minimum retention period
  – No guarantees
– The BOSS receiver is implemented in Java
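For orientation, here is what the producer side amounts to. The wrapper really uses the C++ API; to keep one language across these notes the sketch below is Java, and the StreamProducer interface is a local stand-in with hypothetical method names, not the real R-GMA API. The table name bossJobMessage is likewise an assumption:

    // Sketch of the producer side. StreamProducer is a stand-in, not the
    // real R-GMA API; names and signatures are illustrative assumptions.
    public class BossPublisherSketch {
        interface StreamProducer {
            void declareTable(String name, String predicate, int minRetentionSecs);
            void insert(String sqlInsertStatement);
        }

        static void publishUpdate(StreamProducer sp, String host, String db,
                                  String jobId, String name, String value) {
            // One message per update becomes one row in the monitoring table.
            sp.insert("INSERT INTO bossJobMessage "
                    + "(bossDatabaseHost, bossDatabaseName, jobId, name, value) "
                    + "VALUES ('" + host + "', '" + db + "', '" + jobId + "', '"
                    + name + "', '" + value + "')");
        }
    }

The slide's point about retention is that tuples are kept for at least the declared minimum period for late consumers, with no delivery guarantees beyond that.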

Scalability Tests with CMS, BOSS and R-GMA
Stolen from Rob Byrom's slides (presented at the 2003 IEEE NSS meeting; submitted to IEEE Trans. Nucl. Sci.)

Test Motivation
– Want to ensure R-GMA can cope with the volume of expected traffic, and is scalable
– CMS production load estimated at around 2000 jobs
– Initial tests with v… only managed about …; could do better

Test Design
A simulation of the CMS production system was created:
– A Java MC simulation was designed to represent a typical job
– Each job creates a StreamProducer
– Each job publishes a number of tuples depending on the job phase
– Each job contains 3 phases with varying time delays
– Emulates the "CMSIM" message-publishing pattern, but so far with the 10-hour run time compressed into a minute…
– …so there are actually fewer simultaneous jobs than in the real case, but overall a much higher rate of message production
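A minimal sketch of one such simulated job, reusing the StreamProducer stand-in from above; the per-phase tuple counts, delays and host names are invented for illustration, not the test's real parameters:

    // Sketch of one simulated CMS job: three phases, each publishing a few
    // tuples with a delay between them. Counts and delays are illustrative
    // assumptions only.
    public class McJobSketch implements Runnable {
        private final BossPublisherSketch.StreamProducer sp;
        private final String jobId;

        McJobSketch(BossPublisherSketch.StreamProducer sp, String jobId) {
            this.sp = sp;
            this.jobId = jobId;
        }

        public void run() {
            int[] tuplesPerPhase = {2, 10, 3};     // assumed message counts
            long[] delayMillis = {100, 500, 200};  // "10 hours compressed into a minute"
            try {
                for (int phase = 0; phase < 3; phase++) {
                    for (int i = 0; i < tuplesPerPhase[phase]; i++) {
                        BossPublisherSketch.publishUpdate(sp, "boss-db.example.org",
                                "bossdb", jobId, "phase" + phase, "msg" + i);
                        Thread.sleep(delayMillis[phase]);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }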

Test Design
An Archiver scoops up the published tuples:
– The Archiver DB is a representation of the BOSS DB, but stores a history of received messages rather than just a cumulative update
– Archived tuples are compared with the published tuples to verify the test outcome
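The verification is essentially a set comparison between what was published and what arrived; a minimal sketch, assuming each tuple can be reduced to a unique key string such as jobId|name|value:

    import java.util.HashSet;
    import java.util.Set;

    // Sketch of test verification: every published tuple must show up in
    // the archive. Assumes tuples on both sides reduce to comparable keys.
    public class VerifySketch {
        static Set<String> missing(Set<String> published, Set<String> archived) {
            Set<String> lost = new HashSet<>(published);
            lost.removeAll(archived);  // whatever remains was published but never archived
            return lost;               // empty set => test passed
        }
    }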

Topology
[Diagram: an MC Sim publishes through an SP Mon Box; tuples stream to the Archiver Mon Box, whose HistoryProducer writes into the DB; the Archiver Client reads back the Test Output for test verification; producers and consumers are matched through the IC (registry).]

Topology
[Diagram, scaled-up case: multiple MC Sims and SP Mon Boxes feed the same Archiver Mon Box, HistoryProducer DB and Archiver Client; otherwise as above.]

Test Setup
– Archiver & SP mon box set up at Imperial
– SP mon box & IC set up at Brunel
– Archiver and MC sim clients positioned at various nodes within both sites
– Tried 1 MC sim and 1 Archiver with a variable number of job submissions
– Also set up a similar test on the WP3 testbed, using 2 MC sims and 1 Archiver

Results
– 1 MC sim creating 2000 jobs and publishing 7600 tuples proven to work without a glitch (R-GMA v3.4.13)
– Demonstrated 2 MC sims each running 4000 jobs (with published tuples) on the WP3 testbed

Pitfalls Encountered
Lots of fun integration problems:
– Firewall access between Imperial and Brunel was initially blocked for streaming data (port 8088)
– Limitation on the number of open streaming sockets (1K)
– Discovered lots of OutOfMemoryErrors
– Various configuration problems at both the Imperial and Brunel sites
– These caused many test runs to fail, and probably explain the poor initial performance

Weaknesses
The test is time-consuming to set up:
– MC and Archiver are manually started
– The analysis bash script takes forever to complete
It requires continual attention while running:
– To ensure the MC simulation is working correctly
– The test takes considerable time for a large number of jobs (i.e. > 1000)
Need to run multiple MC sims:
– To generate a more realistic load
– How do we account for 'other' R-GMA traffic?

Improving the Design
Need to automate! Improvements made so far:
– The CMS test comes as part of the 'performance' R-GMA RPM
– The MC simulation can be configured to run repeatedly with random job ids
– Rather than monitoring the STDOUT of the MC Sim, test results are published to an R-GMA table
– The analysis script is now implemented in Java; verification of the data now takes seconds rather than hours!
– The CMS test can now be used as part of the WP3 testing framework, and is ready for Nagios integration

Measuring Performance
Need a nifty way of measuring performance! Things to measure:
– Average time taken for a tuple published via the MC simulation to arrive at the Archiver
– The tuple throughput of the CMS test (e.g. how do we accurately measure the rate at which tuples are archived?)
– Would be nice to measure the number of jobs each mon box can carry before hitting memory limits
Need to define a hardware spec that satisfies a given level of performance.
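For the first of these, one straightforward approach is to stamp each tuple at publication and again on arrival, then average the difference over the archive. The history table and its publishTime/arriveTime columns (Unix seconds) are assumptions, and the two sites' clocks would need to be synchronised for the numbers to mean anything:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Sketch of a latency measurement over the archive. Assumes the history
    // table stores a publishTime set by the MC sim and an arriveTime set on
    // archiving, both as Unix seconds (hypothetical column names).
    public class LatencySketch {
        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection(
                     "jdbc:mysql://archiver.example.org/history", "test", "secret");
                 Statement st = db.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT AVG(arriveTime - publishTime), COUNT(*) "
                   + "FROM bossJobMessageHistory")) {
                if (rs.next()) {
                    System.out.println("mean latency: " + rs.getDouble(1)
                            + " s over " + rs.getLong(2) + " tuples");
                }
            }
        }
    }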

Summary
– After initial configuration problems, the tests were successful
– But the scalability of the test largely depends on the specs of the Stream Producer/Archiver mon box
– It is possible to keep increasing the number of submitted jobs, but we will eventually hit an upper memory limit
– Need more accurate ways to record performance, particularly for data- and time-related measurements
– Need to pin down exact performance requirements!