DIRAC Production Manager Tools


DIRAC Production Manager Tools
Presenter: Gennady Kuznetsov, for the LHCb collaboration
CHEP 2006, 13-18 February, Mumbai

Outline
- Introduction to DIRAC
- DIRAC Authoring Tools
- Production Repository and Production Agent
- Automatic Data Processing
- Conclusion

DIRAC Introduction
DIRAC stands for Distributed Infrastructure with Remote Agent Control. It is the LHCb Workload and Data Management system for distributed data production and physics analysis (Monte-Carlo simulation, reconstruction, ...). It is composed of a set of light-weight services and a network of distributed agents that deliver workload to computing resources, and it integrates the computing resources available at LHCb production sites as well as on the LCG grid. It was demonstrated to support 5000 simultaneous jobs across 65 sites with no observed limitation, and it currently handles over 70 TB of data. (See talk [301] by A. Tsaregorodtsev: DIRAC, the LHCb Data Production and Distributed Analysis system.)

Production Manager
The Production Manager plays a major role in executing organised data production activities within LHCb on a very large scale. The main tasks are:
- Designing data processing sequences
- Creating many thousands of data processing jobs
- Submitting and monitoring jobs
- Verifying output files
- Replicating data files in different centres
All of these activities can be fulfilled efficiently only with automated tools that assist the Production Manager in handling many thousands of distributed jobs and files.

Jobs and Workflow
A Job performs a certain activity that can be described as a complex succession of operations (e.g. running applications). An example simulation sequence:
- Run the Gauss Monte Carlo simulation, writing a file xxxxx_1.sim and creating spill-over events in xxxxx_2.sim and xxxxx_3.sim
- Run the Boole digitization program, reading the detector simulation and spill-over files and writing the reconstructed detector response to xxxxxx_4.digi
- Run the Brunel reconstruction program, reading the digitization data and reconstructing xxxxxx_5.dst
Each operation takes input data, produces output data and is defined by parameters. The Workflow is an abstract description of the sequence of operations to be performed; the example above is a simulation workflow of five operations (three Gauss instances, Boole, Brunel). A Workflow Template is derived by specifying the internal parameters of the Workflow, such as the package versions (Gauss v15r5, LHCb v15r4, XMLDDDB v22r0, Boole v5r3, Brunel v23r5), the event type (61) and the number of events (500).
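The operation chain described above can be illustrated as a list of operations linked through their input and output files, with a small check that the chain is consistent. This is a toy sketch, not the real DIRAC workflow representation; the file names follow the slide's placeholders:

```python
# Toy model of a simulation workflow: each operation declares its
# application, input files and output files (hypothetical structure).
workflow = [
    {"app": "Gauss", "inputs": [],
     "outputs": ["xxxxx_1.sim", "xxxxx_2.sim", "xxxxx_3.sim"]},
    {"app": "Boole", "inputs": ["xxxxx_1.sim", "xxxxx_2.sim", "xxxxx_3.sim"],
     "outputs": ["xxxxxx_4.digi"]},
    {"app": "Brunel", "inputs": ["xxxxxx_4.digi"],
     "outputs": ["xxxxxx_5.dst"]},
]

def check_connected(workflow):
    """Every input of a later operation must be produced by an earlier one."""
    produced = set()
    for op in workflow:
        if not set(op["inputs"]) <= produced:
            return False
        produced |= set(op["outputs"])
    return True
```

A workflow whose operations consume only files produced earlier in the sequence passes the check; reordering Boole before Gauss would fail it.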

Step
Each Workflow can be decomposed into a series of Steps connected to each other via input-output files (e.g. Gauss1, Gauss2, Gauss3, Boole, Brunel in the simulation workflow above). A Step is the smallest entity that could run independently as a job (if persistent storage is used for the input-output files), although all Steps can also be combined into a single job.

Module
Example of the Step "Brunel" (reconstruction program): Module1 installs the Brunel package, Module2 executes Brunel, Module3 checks the log files, Module4 reports to the Bookkeeping. A Step usually consists of a series of smaller unbreakable operations, not necessarily producing output nor requiring input files, but which may produce, exchange or share some data. To brand these operations as reusable components, we introduce the smallest entity, called a Module. Instances of Modules can be configured, or linked with each other, by defining their parameters.
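The Module/Step relationship can be sketched as follows. All class and parameter names here are hypothetical stand-ins chosen for illustration, not the real DIRAC classes:

```python
# A Module is the smallest reusable unit; a Step chains Module instances
# and passes shared data between them (illustrative sketch only).
class Module:
    def __init__(self, **params):
        self.params = dict(params)

    def execute(self, data):
        raise NotImplementedError


class InstallPackage(Module):
    def execute(self, data):
        data["installed"] = self.params["package"]
        return data


class ExecuteApplication(Module):
    def execute(self, data):
        data["log"] = self.params["app"] + ".log"
        data["output"] = self.params["app"] + "_output.dst"
        return data


class CheckLogFiles(Module):
    def execute(self, data):
        data["log_ok"] = data["log"].endswith(".log")
        return data


class Step:
    """Smallest entity that could also run stand-alone as a job."""
    def __init__(self, *modules):
        self.modules = modules

    def execute(self, data=None):
        data = {} if data is None else data
        for module in self.modules:
            data = module.execute(data)
        return data


brunel_step = Step(
    InstallPackage(package="Brunel v23r5"),
    ExecuteApplication(app="Brunel"),
    CheckLogFiles(),
)
result = brunel_step.execute()
```

The point of the design is that the same Module classes are reused across many Steps, with only their parameters changing.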

Authoring Tools
In order to create and manipulate the objects mentioned above, we designed a set of tools (Module Editor, Step Editor, Workflow Editor) which allow the Production Manager to do complex editing of Modules, Steps and Workflows. Each object can be stored as an XML file in the file system, forming a Workflow Library. Since the tools themselves are general and completely content-independent, the only LHCb-specific part is the Workflow Library.
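Storing each object as an XML file could look like the following round-trip sketch (the element and attribute names are assumptions for illustration; the real DIRAC XML schema is not shown on the slide):

```python
# Minimal sketch of serialising a Workflow's parameters to XML and back,
# as the authoring tools store Modules, Steps and Workflows as XML files.
import xml.etree.ElementTree as ET

def workflow_to_xml(name, parameters):
    root = ET.Element("Workflow", name=name)
    for key, value in parameters.items():
        ET.SubElement(root, "Parameter", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def workflow_from_xml(text):
    root = ET.fromstring(text)
    params = {p.get("name"): p.text for p in root.findall("Parameter")}
    return root.get("name"), params

xml_text = workflow_to_xml("Simulation", {"EventType": 61, "Events": 500})
name, params = workflow_from_xml(xml_text)
```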

Production Repository & Agent
Since our task is to generate Jobs from Workflows, we need a place for this kind of operation. The Production Repository is a hierarchical storage (Production Templates, Productions, Jobs) designed to provide an area for job handling. It is populated via the Production Manager Console (command-line tools are also available) and its content is consumed by the DIRAC Production Agent, which submits jobs to the DIRAC WMS, checks job status and downloads the output sandboxes. An example hierarchy: an MCProduction template with Productions MCProduction1, MCProduction2 (containing Job 1 ... Job n) and MCProduction3, plus a Reconstruction template with Reconstruction1.

Submitting production jobs
The sequence of operations (shown on the slide as an animation):
1. The Production Manager creates a Workflow and adds some parameters.
2. The Workflow is published in the Production Repository as a Production Template.
3. The Production Console reads the Production Template, adds parameters, and publishes it as a Production in the Repository.
4. The Production inside the Repository generates the requested number of Jobs.
5. The Production Agent takes the Jobs and submits them to the DIRAC Workload Management System, where they are executed on a DIRAC computing resource.
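The agent side of this sequence follows a simple pull pattern, sketched below with hypothetical in-memory stand-ins for the Repository and the WMS (the real DIRAC interfaces differ):

```python
# Sketch of one polling cycle of the Production Agent: pull waiting jobs
# from the repository and push them to the workload management system.
class Repository:
    def __init__(self, jobs):
        self.jobs = {job: "waiting" for job in jobs}

    def waiting_jobs(self):
        return [j for j, status in self.jobs.items() if status == "waiting"]

    def mark_submitted(self, job):
        self.jobs[job] = "submitted"


class WMS:
    def __init__(self):
        self.queue = []

    def submit(self, job):
        self.queue.append(job)


def agent_cycle(repository, wms):
    """One cycle: submit every waiting job, then record it as submitted."""
    for job in repository.waiting_jobs():
        wms.submit(job)
        repository.mark_submitted(job)
    return len(wms.queue)


repo = Repository(["Job1", "Job2", "Job3"])
wms = WMS()
submitted = agent_cycle(repo, wms)
```

Because the agent only marks jobs as submitted after handing them over, a second cycle finds nothing left to submit, which makes the polling loop idempotent.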

Automatic Data Processing
Data processing is automated through a Processing Database and a series of Transformation Agents. The Processing Database is in fact a combination of a File catalogue, a Replica catalogue, Transformation tables and the Transformation definitions. A row of the Transformation table defines a Transformation Agent, for example: ID 00012, Request ID 00001028, Condition '(00000742)_[0-9]+_[0-9]+.dst', Files per Job 80. The appropriate Transformation Agents are triggered by the presence of a sufficient number of matching file entries in the Processing Database. The Transformation Agent then creates the required number of Jobs in the Production Repository, which are picked up by the Production Agent.
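The triggering logic can be sketched as follows: files matching the transformation's condition are grouped into jobs of "files per job" size, and any remainder below the threshold waits for the next polling cycle. This is an illustrative sketch (with the dot in the slide's pattern escaped), not the actual agent code:

```python
# Group matching file entries into job-sized batches; leftovers below
# the files-per-job threshold stay behind for the next cycle.
import re

def group_files_into_jobs(files, condition, files_per_job):
    matched = [f for f in files if re.search(condition, f)]
    full = len(matched) - len(matched) % files_per_job
    return [matched[i:i + files_per_job] for i in range(0, full, files_per_job)]

files = ["00000742_1_1.dst", "00000742_1_2.dst", "00000742_2_1.dst",
         "other_1_1.sim", "00000742_2_2.dst", "00000742_3_1.dst"]
jobs = group_files_into_jobs(files, r"(00000742)_[0-9]+_[0-9]+\.dst", 2)
```

With five matching .dst files and two files per job, two jobs are created and one file waits for more data to arrive.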

Automatic job submission
The sequence of events for automatic job submission (shown on the slide as an animation):
1. The Production Manager creates a Transformation and associates it with a specific Production.
2. The Transformation instantiates a Transformation Agent.

Automatic job submission (continued)
3. A previously submitted Job publishes a File Entry in the Processing Database; alternatively, the Production Manager can do the same by hand.
4. The Transformation Agent picks up the File Records satisfying its own requirements and activates the associated Production.
5. The Production generates Jobs in the Repository, and the Production Agent sends them to the DIRAC Workload Management System.

Operational experience in DC04
The Production Console was used during the DC04 production period:
- more than 100,000 jobs were created and executed
- 10 Workflows of 5 to 12 application Steps were used (10 workflows, 120 productions, 55 event types, 50M events)
- the whole production was handled by a small team of production managers, with just one person on shift at any moment

Use cases in DC06
In DC06 LHCb will test its complete Computing Model, which includes:
- Data acquisition at CERN with subsequent in-line replication to Tier1 centres
- Automatic submission of reconstruction and filtering jobs at Tier1 centres as soon as data become available
- DST data replication across the Tier1 centres
The Production Manager tools will be used to cover all these new tasks, to minimise human intervention and to increase the responsiveness of the production system.

Conclusion
- The DIRAC Production Manager tool suite allows Workflows of arbitrary complexity to be defined.
- It allows a single Production Manager to handle large numbers of jobs seamlessly.
- Automatic job preparation and submission, triggered by input data availability, reduces the need for human intervention.
- The Production Console was successfully used in the LHCb DC04. The complete system to be used in DC06 will support all the production operations defined in the LHCb Computing Model.

Backup slides

Variables
How can we avoid variable clashes if different programmers are creating Modules and Steps? With a global scope, multiple instances of the same Module would clash ($VAR2 = "string"; var1 = $VAR2; var3 = $VAR2). We therefore use a namespace-based scope: each Option carries a name, type, value, linked instance, linked option, input flag, output flag and description, and references are qualified by instance, e.g. var1 = self.var2 within Instance 1 and var3 = inst1.var2 from Instance 2. Options are grouped in the form of a list: all our objects (Module, Step, Workflow) are lists of Options.
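Namespace-based resolution of such references can be sketched in a few lines (instance and option names below are illustrative, matching the slide's example):

```python
# Resolve "self.<option>" against the current instance and
# "<instance>.<option>" against another named instance, so two instances
# of the same Module never clash.
def resolve(reference, current_instance, options):
    owner, option = reference.split(".")
    if owner == "self":
        owner = current_instance
    return options[owner][option]

options = {
    "inst1": {"var2": "value from instance 1"},
    "inst2": {"var2": "value from instance 2"},
}
```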

Architecture
A Variable (an Option with name, type, value, linked instance, linked option, input flag, output flag and description) is aggregated (n:1) into a VariableCollection. The Module Editor edits a ModuleDefinition, from which ModuleInstances are created; a StepDefinition, edited by the Step Editor, contains ModuleInstances; StepInstances are in turn contained in a WorkflowDefinition, edited by the Workflow Editor.

Step Editor: generated Python code
From the graph of connected module instances (ins1 ... ins4 of modules M1, M2, M3) and the Step variables, the Step Editor generates Python code such as:

    class HistogramTest:
        seed = 3
        numbers = 200
        filename = "WorkflowHistogramTest1.155"

        def execute(self):
            ins1 = M1()
            # <skipped>
            ins1.execute()
            ins2 = M2()
            ins2.seed = self.seed
            ins2.filename = 'default_file_name'
            ins2.mu = ins1.mu
            ins2.sigma = 2.5
            ins2.numbers = self.numbers
            ins2.execute()

Conclusion (backup)
The DIRAC Production Manager tool includes:
- a flexible and experiment- (content-) independent framework to formulate complex multi-staged jobs
- an LHCb-specific Workflow library assembled from smaller, reusable components
- tools for bulk job preparation, submission and control, based on the Production Repository and Production Agent
- automated data processing tools based on the Processing Database and Transformation Agents