Scalability Tests With CMS, Boss and R-GMA


Scalability Tests With CMS, Boss and R-GMA EDG WP3 Meeting 14/01/04 – 16/01/04

BOSS Overview CMS jobs are submitted for data analysis via a UI node. Each job is wrapped in a BOSS executable and delegated to a local batch farm managed by (for example) PBS. When a job runs, the BOSS wrapper spawns a separate process that intercepts the job's input, output and error streams, and the resulting job status information is written to a local DB.
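To make the wrapper's role concrete, here is a minimal Java sketch of the same pattern (spawn the job, tee its output, write filtered status into a local DB). It is not the real BOSS code, which is C++; the DB URL, the JOB_STATUS table and the EVENTS= filtering rule are illustrative assumptions.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;
import java.util.List;

public class WrapperSketch {
    public static void main(String[] args) throws Exception {
        String jobId = args[0];                        // first argument: the job id
        List<String> command = Arrays.asList(args).subList(1, args.length);

        // Hypothetical local DB and table; the real BOSS schema differs.
        Connection db = DriverManager.getConnection(
                "jdbc:mysql://localhost/bossdb", "boss", "secret");
        PreparedStatement upd = db.prepareStatement(
                "INSERT INTO JOB_STATUS (jobId, statusKey, statusValue) VALUES (?, ?, ?)");

        // Run the real job, merging stderr into stdout so one reader sees both streams.
        Process job = new ProcessBuilder(command).redirectErrorStream(true).start();
        BufferedReader out = new BufferedReader(new InputStreamReader(job.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println(line);                  // pass the stream through ("tee")
            if (line.startsWith("EVENTS=")) {          // illustrative filtering rule
                upd.setString(1, jobId);
                upd.setString(2, "events");
                upd.setString(3, line.substring("EVENTS=".length()));
                upd.executeUpdate();
            }
        }
        int rc = job.waitFor();                        // record the exit code as well
        upd.setString(1, jobId);
        upd.setString(2, "exitCode");
        upd.setString(3, String.valueOf(rc));
        upd.executeUpdate();
        db.close();
    }
}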

BOSS, CMS and EDG [Diagram: the CMS/EDG production chain around BOSS. The RefDB supplies parameters to IMPALA/BOSS on the UI; jobs are described in JDL and submitted through the Workload Management System to CEs hosting CMS software, with input data location, data registration and push/pull of data or info handled via SEs and the Replica Manager; job output filtering and runtime monitoring on the WNs feed back into the BOSS DB.]

Where R-GMA Fits In BOSS was designed for use within a local batch farm, so if a job is scheduled on a remote compute element its status info needs to be sent back to the submitter's site. Within a grid environment we want to collate job info from potentially many different farms, and job status info should be filtered depending on where the user is located – the use of predicates is therefore ideal, and R-GMA fits in nicely. The BOSS wrapper uses the R-GMA API: job status updates are published through a stream producer, an archiver positioned at each UI node scoops up the relevant job info and dumps it into the locally running BOSS DB, and users can then access job status info from the UI node.
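A minimal sketch of the publish side of this pattern. The Producer interface below is a stand-in, not the real R-GMA Java API (whose class names differ); the JOB_MONITORING table, its columns and the host names are illustrative. The point is simply that each status change becomes one SQL INSERT published through a stream producer, while an archiver consumes with a predicate that selects only its own UI's jobs.

public class RgmaPublishSketch {
    // Stand-in for an R-GMA stream producer; the real Java API differs in detail.
    interface Producer { void insert(String sqlInsert); }

    static void publishStatus(Producer p, String jobId, String uiHost, String key, String value) {
        // One tuple per status update; JOB_MONITORING and its columns are illustrative.
        p.insert("INSERT INTO JOB_MONITORING (jobId, uiHost, statusKey, statusValue) VALUES ('"
                 + jobId + "', '" + uiHost + "', '" + key + "', '" + value + "')");
    }

    public static void main(String[] args) {
        Producer demo = sql -> System.out.println("would publish: " + sql);
        publishStatus(demo, "cms_12345", "ui.example.ac.uk", "phase", "simulation");
        // The archiver at ui.example.ac.uk would register a consumer query such as
        //   SELECT * FROM JOB_MONITORING WHERE uiHost = 'ui.example.ac.uk'
        // so only tuples for jobs submitted from that UI are streamed back
        // and written into the local BOSS DB.
    }
}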

Use of R-GMA in BOSS [Diagram: on the WN, the BOSS wrapper runs the job and tees its output both to an OutFile in the sandbox and to the R-GMA API; tuples pass through the servlets on the CE/GK and the Registry to the receiver servlets at the UI, where IMPALA/BOSS fills the BOSS DB.]

Test Motivation We want to ensure that R-GMA can cope with the volume of expected traffic and is scalable. The CMS production load is estimated at around 2000 jobs. Initial tests with v3-3-28 only managed about 400 – we could do better.

Test Design A simulation of the CMS production system was created, with an MC simulation designed to represent a typical job. Each job creates a stream producer and publishes a number of tuples depending on the job phase; each job contains 3 phases with varying time delays. An Archiver scoops up the published tuples into a DB that is a representation of the BOSS DB. Archived tuples are compared with published tuples to verify the test outcome.
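A rough Java sketch of the kind of load the MC simulation generates; the phase names, per-phase tuple counts and delays below are illustrative assumptions, not the real test parameters.

import java.util.concurrent.atomic.AtomicInteger;

public class McSimSketch {
    static final String[] PHASES = {"submitted", "running", "done"};   // illustrative phase names
    static final int[] TUPLES_PER_PHASE = {1, 2, 1};                   // illustrative tuple counts
    static final long[] PHASE_DELAY_MS = {500, 2000, 500};             // illustrative delays

    static final AtomicInteger published = new AtomicInteger();

    static void simulateJob(int jobId) throws InterruptedException {
        // In the real test each simulated job creates its own stream producer here.
        for (int phase = 0; phase < PHASES.length; phase++) {
            Thread.sleep(PHASE_DELAY_MS[phase]);
            for (int t = 0; t < TUPLES_PER_PHASE[phase]; t++) {
                // Publish one tuple (jobId, PHASES[phase], t); here we only count it.
                published.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int jobs = args.length > 0 ? Integer.parseInt(args[0]) : 50;   // the real runs used 2000+
        for (int j = 0; j < jobs; j++) simulateJob(j);
        // Verification then amounts to checking that every published tuple
        // appears exactly once in the archiver's (BOSS-like) database.
        System.out.println("published " + published.get() + " tuples for " + jobs + " jobs");
    }
}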

Topology [Diagram: single-producer layout. The MC Sim client publishes through one SP MON box; the Archiver client, its Archiver MON box and the IC deliver the tuples into the BOSS DB, and the test output from the MC Sim is verified against that DB.]

Topology [Diagram: multiple-producer layout. Several MC Sims publish through their own SP MON boxes; a single Archiver MON box, Archiver client and the IC feed the BOSS DB, against which the test output is again verified.]

Test Setup The Archiver and an SP MON box were set up at Imperial, with a further SP MON box and the IC set up at Brunel. Archiver and MC sim clients were positioned at various nodes within both sites. We tried 1 MC sim and 1 Archiver with a variable number of job submissions, and also set up a similar test on the WP3 testbed using 2 MC sims and 1 Archiver.

Results 1 MC sim creating 2000 jobs and publishing 7600 tuples (about 3.8 tuples per job) was shown to work without a glitch. We also demonstrated 2 MC sims each running 4000 jobs (with 15200 published tuples) on the WP3 testbed.

Pitfalls Encountered Lots of fun integration problems. Firewall access between Imperial and Brunel was initially blocked for streaming data (port 8088). There was a limitation on the number of open streaming sockets – 1K. We discovered lots of OutOfMemoryErrors. Various configuration problems at both the Imperial and Brunel sites caused many test runs to fail, and probably explain the poor initial performance.

Weaknesses The test is time-consuming to set up: the MC sim and Archiver are started manually, and the analysis bash script takes forever to complete. It requires continual attention while running to ensure the MC simulation is working correctly, and it takes considerable time for a large number of jobs (i.e. > 1000). We need to run multiple MC sims to generate a more realistic load. And how do we account for 'other' R-GMA traffic?

Improving the Design We need to automate! Improvements made so far: the CMS test now comes as part of the 'performance' R-GMA RPM; the MC simulation can be configured to run repeatedly with random job ids; rather than monitoring the STDOUT of the MC Sim, test results are published to an R-GMA table; and the analysis script is now implemented in Java, so verification of the data takes seconds rather than hours! The CMS test can now be used as part of the WP3 testing framework and is ready for Nagios integration.
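A minimal sketch of why a set-based Java check is quick: load the published and archived tuple-id sets and compare them in memory, rather than making one query per tuple. The database, table and column names here are hypothetical stand-ins for the MC sim results table and the archived (BOSS-like) table.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.Set;

public class VerifySketch {
    static Set<String> ids(Connection c, String query) throws SQLException {
        Set<String> s = new HashSet<>();
        try (Statement st = c.createStatement(); ResultSet rs = st.executeQuery(query)) {
            while (rs.next()) s.add(rs.getString(1));
        }
        return s;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical tables: MC_SIM_RESULTS holds what was published,
        // JOB_MONITORING holds what the archiver wrote into the BOSS-like DB.
        Connection db = DriverManager.getConnection("jdbc:mysql://localhost/bossdb", "boss", "secret");
        Set<String> publishedIds = ids(db, "SELECT tupleId FROM MC_SIM_RESULTS");
        Set<String> archivedIds  = ids(db, "SELECT tupleId FROM JOB_MONITORING");
        Set<String> missing = new HashSet<>(publishedIds);
        missing.removeAll(archivedIds);
        System.out.println(missing.isEmpty() ? "PASS: all published tuples were archived"
                                             : "FAIL: " + missing.size() + " tuples missing");
        db.close();
    }
}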

Measuring Performance We need a nifty way of measuring performance! Things to measure: the average time taken for a tuple published via the MC simulation to arrive at the Archiver; the tuple throughput of the CMS test (e.g. how do we accurately measure the rate at which tuples are archived?); and, ideally, the number of jobs each MON box can carry before hitting memory limits. We also need to define a hardware spec that satisfies a given level of performance.
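One possible way to obtain the first two numbers, sketched in Java: assume the MC simulation stamps each tuple with its publish time and the archiver records an arrival time (both in hypothetical columns of the archived table), then latency is the mean of the differences and throughput is the tuple count over the archiving window.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LatencySketch {
    public static void main(String[] args) throws Exception {
        Connection db = DriverManager.getConnection("jdbc:mysql://localhost/bossdb", "boss", "secret");
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT AVG(arrivalMs - publishMs), COUNT(*), MAX(arrivalMs) - MIN(arrivalMs) "
                 + "FROM JOB_MONITORING")) {
            rs.next();
            double avgLatencyMs = rs.getDouble(1);          // mean publish-to-archive delay
            long   tuples       = rs.getLong(2);
            double windowSec    = rs.getLong(3) / 1000.0;   // span of the archiving window
            System.out.printf("avg latency %.0f ms, throughput %.1f tuples/s%n",
                              avgLatencyMs, tuples / windowSec);
        } finally {
            db.close();
        }
    }
}

Note that this only gives a meaningful latency if the publisher and archiver clocks are synchronised (e.g. via NTP); otherwise the figure is biased by the clock offset.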

Summary After the initial configuration problems the tests were successful. However, the scalability of the test depends largely on the specs of the Stream Producer/Archiver MON box: it is possible to keep increasing the number of submitted jobs, but an upper memory limit will eventually be hit. We need more accurate ways to record performance, particularly for data- and time-related measurements, and we need to pin down exact performance requirements!