BOSS: the CMS interface for job submission, monitoring and bookkeeping

Presentation transcript:

BOSS: the CMS interface for job submission, monitoring and bookkeeping
W. Bacchi, P. Capiluppi, G. Codispoti, C. Grandi (INFN - Bologna, Italy)
D. Colling, B. McEvoy, S. Wakefield, Y.J. Zhang (Imperial College London, UK)
EGEE User Forum, March 2nd 2006

BOSS: Batch Object Submission System
A tool for batch job submission, real-time monitoring and bookkeeping:
- interfaces to local and grid schedulers
- retrieves user-defined information from the process STDOUT/STDERR
- stores job-specific logging and bookkeeping information in a relational DB
- provides a real-time monitor
- optimized for use in a distributed environment
- gLite ready

Interfacing batch schedulers
- BOSS allows transparent use of many local or distributed schedulers (localhost, LSF, PBS, Condor, LCG, gLite, ...)
- Standard operations are supported: submit, query, kill, output retrieval
- Plug-in system:
  - Script interface: the site administrator can modify existing scripts or add new ones
  - Sample plug-ins are provided for many schedulers
  - Particular effort went into developing the gLite scripts
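As a rough illustration of a script-based plug-in system, the sketch below maps each scheduler name to one script per standard operation. The class name `ScriptScheduler`, the operation key `getoutput` and the stand-in echo scripts are assumptions made for this example; they are not the scripts or names shipped with BOSS.

```python
import subprocess
import sys


class ScriptScheduler:
    """Hypothetical scheduler plug-in: one external script per operation."""

    OPERATIONS = ("submit", "query", "kill", "getoutput")

    def __init__(self, name, scripts):
        missing = [op for op in self.OPERATIONS if op not in scripts]
        if missing:
            raise ValueError(f"{name}: missing scripts for {missing}")
        self.name = name
        self.scripts = scripts

    def run(self, operation, *args):
        # Run the script registered for this operation and return its stdout.
        command = self.scripts[operation] + list(args)
        done = subprocess.run(command, capture_output=True, text=True, check=True)
        return done.stdout.strip()


def echo_script(operation):
    # Stand-in for a real scheduler script: it only echoes its arguments.
    code = f"import sys; print({operation!r}, *sys.argv[1:])"
    return [sys.executable, "-c", code]


# A site administrator could register further schedulers in the same table.
registry = {
    "localhost": ScriptScheduler(
        "localhost", {op: echo_script(op) for op in ScriptScheduler.OPERATIONS}
    )
}

result = registry["localhost"].run("submit", "job-1")
```

The client code only ever calls `run` with an operation name, so replacing the scripts behind a scheduler does not change how jobs are submitted or queried.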

CMS requirements
- Petabytes (PB) of data produced by the online farm and MC simulations
- Data stored in a distributed environment
- Data accessed at the site where they are stored
- Multiple processing passes over the same dataset
- Chains of processing to be run over datasets (e.g. sim-digi-reco, monitored stage-in/out operations, analysis processes, etc.): complex jobs to handle
- Many homogeneous processes to be run simultaneously over several datasets
- (Presenter's note: find out the volume of MC data)

BOSS job concept
- A BOSS job is a single processing unit
  - Can be made of multiple processing steps (user executables)
  - Allows complex workflows through executable chaining
  - The chaining tool may be external
- Multiple identical jobs can be grouped in Tasks:
  - Logical grouping of jobs
  - Compact description of multiple homogeneous jobs using iterators
  - Multiple iterators allowed
  - XML description
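To make the idea of a compact task description concrete, here is a minimal sketch of expanding one job template into many homogeneous jobs. The template fields, the `{run}`/`{seed}` placeholder syntax and the choice to combine multiple iterators as a Cartesian product are assumptions for illustration; the actual BOSS XML schema is not shown in the slides.

```python
from itertools import product


def expand_task(template, iterators):
    """Expand a job template into one job per combination of iterator values."""
    names = list(iterators)
    jobs = []
    for values in product(*(iterators[name] for name in names)):
        binding = dict(zip(names, values))
        # Substitute the current iterator values into every template field.
        jobs.append({key: text.format(**binding) for key, text in template.items()})
    return jobs


# Hypothetical template: one description stands for all jobs in the task.
template = {
    "executable": "simulate.sh",
    "args": "--run {run} --seed {seed}",
    "stdout": "sim_{run}_{seed}.log",
}

jobs = expand_task(template, {"run": range(1, 3), "seed": [11, 12]})
```

Two iterators with two values each yield four jobs from a single description, which is the saving a task description offers over listing every job by hand.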

User defined information
- A program type can be defined for a given processing step:
  - Schema for the information to be monitored: a new table with the defined structure is created in the BOSS database
  - Algorithms to retrieve the information from the job: the program filters are stored in the database
- User-defined filters retrieve user-defined information from the process STDIN/STDOUT/STDERR
- One or more program types can be specified for a program
- Job standardization (analysis, MC, ...) => applications may define their own program type
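A program type pairs a schema with filters that extract the monitored values. The sketch below shows one way such a filter could work on STDOUT lines; the field names, the log message formats and the regular expressions are invented for the example and do not come from BOSS.

```python
import re

# Hypothetical program type: a schema (field -> type) plus one pattern per field.
PROGRAM_TYPE = {
    "schema": {"events_done": int, "last_run": int},
    "patterns": {
        "events_done": re.compile(r"Processed event\s+(\d+)"),
        "last_run": re.compile(r"Begin run\s+(\d+)"),
    },
}


def runtime_filter(lines, program_type):
    """Scan output lines and keep the latest value seen for each schema field."""
    record = {}
    for line in lines:
        for field, pattern in program_type["patterns"].items():
            match = pattern.search(line)
            if match:
                convert = program_type["schema"][field]
                record[field] = convert(match.group(1))
    return record


stdout_lines = ["Begin run 42", "Processed event 1", "noise", "Processed event 2"]
record = runtime_filter(stdout_lines, PROGRAM_TYPE)
```

The resulting record has one entry per schema field, which is the shape needed to fill a row of the table created for that program type.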

BOSS wrappers
- A wrapping system is needed to manage the execution of several processes on the worker node (WN)
- The BOSS wrapping system is made of:
  - a wrapper of the user job (jobExecutor): accesses local runtime information, starts the chaining of programs, starts the real-time monitor
  - a chainer: allows complex program workflows; linear chaining is provided by default; external tools may also be used
  - a program wrapper (programExecutor): accesses local runtime information for the single program; starts the pre, runtime and post filters, which give access to program-specific information
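The wrapper layering can be sketched as two small functions: a job executor that chains programs linearly, and a program executor that records pre, runtime and post information in a journal. The function names, the in-memory journal and the rule that the chain stops at the first non-zero exit code are assumptions for this sketch, not a description of the real BOSS wrappers.

```python
import subprocess
import sys


def program_executor(index, command, journal):
    """Run one program, journaling pre, runtime and post information."""
    journal.append((index, "pre", "start"))
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            # A runtime filter would parse each line; here it is stored as is.
            journal.append((index, "runtime", line.rstrip("\n")))
    journal.append((index, "post", proc.returncode))
    return proc.returncode


def job_executor(programs):
    """Linear chaining: run programs in order, stopping at the first failure."""
    journal = []
    for index, command in enumerate(programs):
        if program_executor(index, command, journal) != 0:
            break
    return journal


programs = [
    [sys.executable, "-c", "print('step one')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "print('never runs')"],
]
journal = job_executor(programs)
```

An external chainer would replace only the loop in `job_executor`, while each program would still run through `program_executor` so that its filters keep working.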

Job wrapper (diagram)
- Job level: JobExecutor (wrapper), JobChaining, JobMonitor (real-time updater), Journal
- Program level, repeated for each Program in the chain: programExecutor wrapping the user executable, with stdin, stdout and stderr passing through the pre-filter, runtime-filter and post-filter

Logging and Monitoring
- Logging provides long-term storage of information
  - uses a relational DB
  - a personal DB implemented in SQLite on local disk is allowed
  - the logging database is updated from the journal file retrieved at the end of the job and, optionally, at runtime from information in the RT server
- Monitoring provides real-time access to logging information
  - uses an intermediate Real Time (RT) server
  - RT clients are registered to the BOSS client as plug-ins
  - may use different servers and technologies: R-GMA, Clarens, MonaLisa, etc.
  - allows a different RT mechanism for each job
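The logging path described above (journal file retrieved at the end of the job, then loaded into a personal SQLite database) might look roughly like the following. The journal line format `job_id name value`, the table name `job_log` and its columns are assumptions for the example; the real BOSS journal format and schema are not given in the slides.

```python
import io
import sqlite3


def load_journal(conn, journal):
    """Apply journal lines to the logging table; later lines overwrite earlier ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_log ("
        "job_id INTEGER, name TEXT, value TEXT, PRIMARY KEY (job_id, name))"
    )
    for line in journal:
        line = line.strip()
        if not line:
            continue
        job_id, name, value = line.split(None, 2)
        conn.execute(
            "INSERT OR REPLACE INTO job_log (job_id, name, value) VALUES (?, ?, ?)",
            (int(job_id), name, value),
        )
    conn.commit()


# Stand-in for a journal file retrieved with the job output.
journal = io.StringIO("7 status running\n7 events_done 120\n7 status done\n")

conn = sqlite3.connect(":memory:")  # a file path would give a DB on local disk
load_journal(conn, journal)
rows = conn.execute(
    "SELECT name, value FROM job_log WHERE job_id = 7 ORDER BY name"
).fetchall()
```

Because the journal is replayed in order, the database ends up with the final state of each variable even if the real-time channel was never available.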

Real-time system: one server, two RT clients
- The real-time DB server
  - temporarily stores job information while the job is running
  - is shared by many users
  - has a simple structure: identification of the BOSS client and of the user; identification of the destination table/variable; value of the parameter and time-stamp
  - the final logging and bookkeeping (L&B) does not rely on it
- Two RT clients
  - the real-time updater, which runs on the execution host: inserts or updates information about the running job
  - a plug-in used by the BOSS client on the user interface: fetches information about selected jobs and deletes it afterwards
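The server and its two clients can be modelled in a few lines. In this sketch the server is an in-memory dictionary rather than a MySQL database, and the class and method names (`RealTimeServer`, `update`, `pop`) are hypothetical; only the stored fields and the fetch-then-delete behaviour follow the slide.

```python
import time


class RealTimeServer:
    """Temporary store keyed by client, user, job and destination table/variable."""

    def __init__(self):
        self._rows = {}

    def update(self, client, user, job_id, table, variable, value):
        # Called by the real-time updater on the execution host:
        # inserts a new row or overwrites the previous value.
        key = (client, user, job_id, table, variable)
        self._rows[key] = (value, time.time())

    def pop(self, client, job_ids):
        # Called by the client-side plug-in: fetch rows for the selected
        # jobs and delete them from the server afterwards.
        wanted = [k for k in self._rows if k[0] == client and k[2] in job_ids]
        return {key: self._rows.pop(key) for key in wanted}


server = RealTimeServer()
server.update("client-A", "alice", 1, "my_program_type", "events_done", 10)
server.update("client-A", "alice", 1, "my_program_type", "events_done", 25)
server.update("client-A", "alice", 2, "my_program_type", "events_done", 5)

fetched = server.pop("client-A", {1})
remaining = server.pop("client-A", {1, 2})
```

Since `pop` removes what it returns, the server holds data only for as long as no client has collected it, which is why the final logging cannot depend on it.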

Components (diagram)
- Worker Node: user process, BOSS job wrapper, BOSS real-time updater, BOSS journal
- User Interface: BOSS client, BOSS DB
- Between them: local or grid scheduler, real-time BOSS DB server
- Labelled interactions: job control and logging; file I/O control; submit or control job; get job running status; retrieve output files; set job logging info (possibly via proxy); pop job monitoring info

BOSS in CMS computing
- Used in CMS MC production for 4 years
- Prototype CMS distributed analysis system (GROSS) based on BOSS, and later a new analysis system using BOSS
- BOSS v4 with a new architecture and many new features
- (Diagram labels: production / analysis tool, BOSS, logging & bookkeeping, monitoring)

gLite bulk submission
- BOSS modified to profit from gLite bulk submission
- Chains are grouped for submission, allowing the creation of a single input sandbox with common files
- The actual implementation of bulk submission is delegated to the scheduler submit script
- Submission scripts implemented to use the JDL job types efficiently:
  - Normal: for single submission
  - Parametric: the most compact JDL, iterating over the BOSS job id; still to be investigated whether limitations arise from being able to iterate over only a single parameter
  - Collection: more general bulk submission possibilities; a single file holds all the job JDLs, shared input sandbox allowed
- Tested with version 1.4.1; 3.0.0 planned as soon as it is ready on the pre-production system (PPS)

Summarizing
- Transparent use of batch systems
- Persistent storage of the logging information
- Logging of specific information
- Real-time monitoring
- gLite optimization
- Sandbox packing for efficient use in distributed systems
- XML task description
- Command line interface
- Can be integrated into an experiment framework through the API (C++ & Python)

Status
- First version released with a full set of basic functionalities
- MySQL and SQLite back-ends for the local DB
- MySQL real-time DB: fully working RT monitoring
- XML task description at declaration time, nested iterators allowed
- Full task description in the database
- gLite bulk submission
- Basic linear chaining of executables as the default solution
- Plug-in system for the chainer implemented, but we need to better understand how to handle external chainers (mainly how to configure them so that the program wrapper can be used)
- MonaLisa monitoring allowed via ApMon
- Plug-ins for many schedulers: local submission, LSF, LCG, gLite; some effort is also going into Condor-G with v3.6, via end-user support, to allow use within OSG

Future plans
- Allow the use of chainer plug-ins, mainly external programs (e.g. SHREEK)
- Implement more back-end possibilities (e.g. Oracle)
- Implement more RT monitoring solutions (e.g. R-GMA, Clarens, web services, but also MySQL query encryption to avoid firewall problems)
- Finalize the API, increase query possibilities
- Use external standard libraries (mainly from Boost)
- Look at writing the wrapper in a scripting language, e.g. Perl/Python