CMS Production Management Software
Julia Andreeva, CERN
CHEP conference 2004

Overview
- Role of the CMS production system in the CMS Data Challenge of 2004 (DC04)
- Software components of the CMS production system (functionality, implementation, communication between them) and their use in DC04
- Problems encountered and DC04 lessons
- Conclusions

CMS DC04 tasks
The 2004 Data Challenge focused on the following tasks:
- Simulate a sustained data-taking rate equivalent to 25 Hz at a luminosity of 0.2×10^34 cm^-2 s^-1 for one month, corresponding to 25% of the LHC startup rate
- Distribute the data files produced at Tier-0 (CERN) to the Tier-1 centres
- Run analysis at the Tier-1s as soon as data files become available there ("real-time" analysis)
- User analysis
The production system was responsible for the first task.
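
As a back-of-the-envelope check (my own arithmetic, not from the slides), sustaining 25 Hz for one month implies an event count of the same order as the samples quoted later for the pre-challenge production:

```python
# Illustrative arithmetic only: what 25 Hz sustained for one month means in event counts.
rate_hz = 25                       # simulated data-taking rate
seconds_per_month = 30 * 24 * 3600

events_per_month = rate_hz * seconds_per_month
print(f"{events_per_month / 1e6:.1f} million events")   # ~64.8 million
```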

CMS DC04 phases
[Diagram: DC04 phases and the data flow between RefDB, MSS, Castor, POOL, the transfer-agent drop-box, RLS and TMDB]
- Pre-challenge production (generation, simulation, digitization), started in July 2003
- Reconstruction at Tier-0 (CERN), March-April 2004
- Data distribution from Tier-0 to the Tier-1s, March-April 2004
- Real-time analysis at the Tier-1s, March-April 2004
- User analysis

Pre-challenge production (PCP)
- Pre-challenge production covered the generation, simulation and digitization production steps.
- It ran in a distributed, heterogeneous environment:
  - 35 regional centers in Asia, Europe and the USA participated
  - Different local batch systems as well as two Grid flavors (Grid3 and LCG) were used
- The large scale of the requested/produced data and the variety of environments both contributed to the complexity of the production software.

PCP, planning and reality
According to the original planning:
- 50 million events
- Simulation: 5 months, until ~October 2003
- Digitisation: October and November 2003 (1 Mevt/day)
- Shipping data to the T0: November and December 2003 (~1 TB per day for 2 months, not at all trivial at that time)
- DC04 rehearsals scaling up from Q4 2003 to January 2004
The reality:
- >75 million events requested
- 50 million events simulated by the start of DC04
- Digitization code delivered in January; only ~10 million digitized events delivered before the start of DC04; digitisation continued through DC04
- DC04 began (with essentially no rehearsal) on March 1st

Production progress for digitization requests, July 2003 - May 2004
[Plot: cumulative digitization progress; annotations mark the ORCA_7_6_1 release, the 2×10^33 digitisation, the start and end of DC04, and a period of 24 Mevents produced in 6 weeks]

Subsystems of the OCTOPUS project (the CMS production software project)
- RefDB, the CMS Monte Carlo Reference Database: a MySQL database plus a web interface to it (PHP, CGI)
- McRunjob (Monte Carlo RunJob): the workflow planner for production processing, a framework (OOPython) for creating and submitting large batches of production jobs
- BOSS (Batch Object Submission System): local book-keeping and real-time monitoring system (C++, MySQL)
- UpdateRefDB: updates RefDB with the meta information sent by every successful job (a Perl module running at CERN as a crontab job)
- DAR (Distribution After Release): the distribution system for CMS application software (a Perl module that distributes the required version of a given CMS physics application to the production regional centers)
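
The slides do not show any of this code. Purely as an illustration, the sketch below shows the kind of crontab-driven loop an UpdateRefDB-like module performs: scan a directory of mailed job summary files, extract a few fields, and record which runs can be validated. The directory name, file extension and key names are invented (the real module is written in Perl and updates the RefDB MySQL database).

```python
import glob, os

def process_summaries(summary_dir="summaries"):       # hypothetical drop directory
    """Scan mailed job summaries and collect per-run metadata (illustration only)."""
    validated = {}
    for path in glob.glob(os.path.join(summary_dir, "*.summary")):
        meta = {}
        with open(path) as f:
            for line in f:
                if "=" in line:                        # summaries assumed to be key=value lines
                    key, value = line.strip().split("=", 1)
                    meta[key] = value
        run = meta.get("RunNumber")
        if run and meta.get("ValidationStatus") == "OK":
            validated[run] = meta                      # the real module would update RefDB here
        os.remove(path)                                # consume the processed summary
    return validated
```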

Overview of the CMS production cycle
[Diagram: physics groups, the production manager and site managers interact with RefDB; McRunjob creates executable scripts, JDL or DAGs, which are submitted through BOSS to a local batch system, to the Grid (LCG) scheduler, or to Grid3 via DAGMan (MOP), and run on the local farm, LCG or Grid3]

RefDB
Functionality:
- Recording of the production requests made by the physicists
- Distribution of the work to the production regional centers (RCs) and tracing of the production progress
- Central source of production instructions for the workflow planner: all necessary instructions and meta information for job creation and job splitting are passed to McRunjob
- Book-keeping and metadata catalog of the produced data: the job meta information of successfully accomplished jobs is fed back through UpdateRefDB
- Requests for creating a DAR distribution, with a description of the distribution content
Who interacts with RefDB:
- Production manager: assigning requests, following request processing, getting production statistics, managing applications, templates, datacards, software distributions and monitoring components, publishing information, requesting DAR distributions
- Physicist: getting information about available collections, submitting requests, browsing existing and inserting new datacards, executables and applications
- RC manager: getting an assignment, following request processing, getting production statistics, updating publishing information
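
RefDB itself is a MySQL database with a PHP/CGI front end and its schema is not shown in these slides. Purely as an illustration of the request lifecycle described above, the sketch below models a request moving from submission by a physicist, through assignment to a regional center, to validation once the jobs report back. The class, field and state names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProductionRequest:
    """Hypothetical model of a production request lifecycle (illustration only)."""
    request_id: int
    dataset: str
    events_requested: int
    regional_center: Optional[str] = None
    events_produced: int = 0
    state: str = "submitted"      # submitted -> assigned -> in_production -> validated

    def assign(self, rc: str):
        """The production manager assigns the request to a regional center."""
        self.regional_center = rc
        self.state = "assigned"

    def report_success(self, events: int):
        """Successful jobs report produced events back (via UpdateRefDB in the real system)."""
        self.events_produced += events
        self.state = ("validated" if self.events_produced >= self.events_requested
                      else "in_production")

# Example: a 1M-event request assigned to a (hypothetical) regional center
req = ProductionRequest(42, "example_dataset", 1_000_000)
req.assign("RC_example")
req.report_success(250_000)
```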

McRunjob
Modular framework for:
- creating batches of production jobs, possibly combining several processing steps in one job
- submitting them to different types of environment: different batch systems or different Grid flavors
- following the progress of job processing through a tracking directory
- publishing the processed data collections
- creating the POOL XML catalog for the collection chain
- updating initialized COBRA META files with the produced collection data
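
McRunjob's actual API is not shown in the transcript; the sketch below only illustrates the general pattern it implements: split a request into a batch of jobs and write one wrapper script per job into a tracking directory, from which the scripts would then be submitted to a local batch system, LCG or Grid3. Function and file names are invented for the example.

```python
import os

def split_request(total_events, events_per_job):
    """Split a production request into (first_event, n_events) chunks."""
    return [(first, min(events_per_job, total_events - first))
            for first in range(0, total_events, events_per_job)]

def create_batch(request_id, total_events, events_per_job, tracking_dir):
    """Write one wrapper script per job into a tracking directory (illustration only)."""
    os.makedirs(tracking_dir, exist_ok=True)
    scripts = []
    for i, (first, n) in enumerate(split_request(total_events, events_per_job)):
        path = os.path.join(tracking_dir, f"job_{request_id}_{i:04d}.sh")
        with open(path, "w") as f:
            f.write("#!/bin/sh\n")
            f.write(f"# job {i} of request {request_id}: events {first}..{first + n - 1}\n")
            f.write(f"echo 'running {n} events starting at event {first}'\n")
        scripts.append(path)
    # submission would then target a local batch system, LCG or Grid3 depending on the site
    return scripts
```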

BOSS
[Diagram: McRunjob passes the executable to the scheduler through BOSS; on the worker node the executable runs inside a BOSS wrapper that reports to the BOSS database]
- BOSS provides local book-keeping and real-time monitoring.
- The user creates filters that define which parameters of the standard output should be traced, or what actions to take when a given pattern is found in the STDOUT.
- The filters used by the CMS production jobs are stored in RefDB.
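
The slide does not show the filter syntax. As an illustration of the idea only, the snippet below applies a set of regular-expression "filters" to a job's standard output, recording matched parameters and reacting when a given pattern appears. The patterns and the returned dictionary are invented for the example; real BOSS filters are configured by the user and feed a MySQL database.

```python
import re

# Invented example filters: pattern -> name of the parameter to trace
FILTERS = {
    r"events processed\s*=\s*(\d+)": "events_processed",
    r"run number\s*=\s*(\d+)": "run_number",
}
ALERT_PATTERN = re.compile(r"FATAL|segmentation fault", re.IGNORECASE)

def monitor_stdout(lines):
    """Scan a job's STDOUT line by line, tracing parameters and reacting to patterns."""
    record = {}
    for line in lines:
        for pattern, name in FILTERS.items():
            match = re.search(pattern, line)
            if match:
                record[name] = match.group(1)      # BOSS would store this in its database
        if ALERT_PATTERN.search(line):
            record.setdefault("alerts", []).append(line.strip())
    return record

# Example use:
# print(monitor_stdout(open("job.stdout")))
```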

Data formats and applications used for reconstruction
- Reconstruction is done with ORCA (Object-oriented Reconstruction for CMS Analysis), which uses the CMS COBRA (Coherent Object-oriented Base for Reconstruction, Analysis and Simulation) framework (C++).
- Reconstruction takes as input the output of the digitization step (the simulation of the DAQ process); the output of the digitization step is called "Digis".
- The output of the reconstruction step is called DST (data summary tapes).
- Both input and output files are in the POOL format.

DC04 reconstruction
- Reconstruction was run at Tier-0 (CERN).
- There was no chance to test the reconstruction code at production scale before the start of DC04.
- Several ORCA releases were made during the two months of the data challenge, with bug fixes and code improvements (ORCA 7_7_1, ORCA 8_0_0, ORCA 8_0_1).
- There were many reruns on the same input data; in total ~24 million events were reconstructed.
- 25 Hz corresponds to 1000 jobs per day using 100% of 500 CPUs; this rate was reached during limited periods of time, not permanently (see the arithmetic sketch below).
- The bottleneck was not running reconstruction at the given rate but the further data distribution.
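
As a cross-check of the 25 Hz / 1000 jobs / 500 CPUs figure (my own arithmetic, not from the slides): at 25 Hz one day of data-taking is about 2.16 million events, so 1000 jobs per day means roughly 2160 events per job, and 1000 jobs spread over 500 CPUs is two jobs per CPU per day, i.e. about 12 CPU-hours per job at full utilization.

```python
# Illustrative arithmetic only.
rate_hz = 25
events_per_day = rate_hz * 24 * 3600            # 2,160,000 events per day
jobs_per_day, cpus = 1000, 500

events_per_job = events_per_day / jobs_per_day  # ~2160 events per job
hours_per_job = 24 * cpus / jobs_per_day        # ~12 CPU-hours per job
print(events_per_job, hours_per_job)
```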

Integrating reconstruction in the production machinery
- The reconstruction code was released just before DC04 started.
- Integrating reconstruction into the production machinery was done very quickly (of the order of a couple of days).
- Developing the procedure for publishing the produced data took longer, since the multiple output collections produced by the writeStreams executable were not supported by the RefDB schema and required a workaround.

DC04 data flow at Tier-0
[Diagram: data flow at Tier-0 between RefDB, McRunjob, the T0 worker nodes, the GDB Castor pool, the export buffers, the transfer agent and its drop-box, RLS and TMDB. RefDB supplies reconstruction instructions and meta information to McRunjob, which creates the reconstruction jobs for the T0 worker nodes; the input data files (Digis) are staged from Castor tapes through an input buffer; the reconstructed data go to the GDB Castor pool and the export buffers; the transfer agent checks what has arrived via the drop-box and updates RLS and TMDB; summaries of successful jobs update RefDB.]

META information of the reconstruction job
Sent by the reconstruction job to RefDB:
- RunNumber (set equal to the input run number, predefined by RefDB; key value)
- POOL XML catalog fragment containing only the data files produced by the given run (no real PFN information there, './' instead); used for recreating the POOL XML catalog for the whole data collection
- RUNID value (used by COBRA for attaching data files to COBRA META files)
- Time in seconds (clock time) to run the executable
- Validation status
This information is sent by mail to RefDB by every successfully accomplished job in a summary file, which is processed by the UpdateRefDB module; as a result the run is validated in RefDB.
Put into the transfer-agent drop-box:
- POOL XML catalog fragment, used by the transfer agent to update the RLS catalog
- Checksum file, containing the checksums of the produced files, used by the transfer agent to update TMDB
- "Go" flag, indicating that the files of the given run are ready for distribution
This information is used for the data distribution to the Tier-1s (a minimal sketch follows below).
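
None of this is shown as code in the slides. The sketch below only illustrates, with invented file names, how a successful job might populate a transfer-agent drop-box: a POOL XML catalog fragment, a checksum file, and a "go" flag written last so that the transfer agent only picks up complete runs.

```python
import hashlib, os

def md5sum(path):
    """Checksum of a produced file, as might be listed in the drop-box checksum file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def populate_dropbox(dropbox, run_number, produced_files, catalog_fragment_xml):
    """Write catalog fragment, checksum file and 'go' flag for one run (illustration only)."""
    os.makedirs(dropbox, exist_ok=True)
    with open(os.path.join(dropbox, f"run{run_number}.xml"), "w") as f:
        f.write(catalog_fragment_xml)                  # POOL XML catalog fragment
    with open(os.path.join(dropbox, f"run{run_number}.md5"), "w") as f:
        for path in produced_files:
            f.write(f"{md5sum(path)}  {os.path.basename(path)}\n")
    # the 'go' flag is written last: only then is the run ready for distribution
    open(os.path.join(dropbox, f"run{run_number}.go"), "w").close()
```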

DC04 lessons
The main problems in the production machinery were discovered at the level of publishing information about the available data collections to the users and of distributing the META information required for data access. These problems are being addressed by the current development driven by the production team:
- The publishing procedure was further automated and improved to solve performance problems.
- A distributed system for publishing the catalogs of the data collections available CMS-wide is being developed; a first prototype of PubDB (Publishing Data Base) is already deployed at CERN, FZK, INFN and PIC.
- The user interface to RefDB for getting information about the available collections has been improved following suggestions from the physics community.

Conclusions
- The CMS production software proved to be quite flexible and was updated very quickly to support DC04 reconstruction.
- No serious problems were discovered in the parts related to running jobs at the given rate or in the production book-keeping system.
- Ongoing development is focused on improving the system for publishing information about the available data collections to the CMS physics community and for distributing the meta information required for data access.