Runjob: A HEP Workflow Planner for Production Processing Greg Graham CD/CMS Fermilab CD Project Status Meeting 13-Nov-2003.

Purpose of MCRunjob

Applications in complex production processing environments often need to be tamed:
–Hundreds of input parameters are encountered during MC production
–Heterogeneous runtime environments across many different Regional Centers
–Complex multi-application workflows, spanning both MC production and analysis
–Dependencies and relationships among the metadata are often modeled inside obscure shell scripts

(MC)Runjob captures the specialized knowledge involved in workflow planning and makes it available to users and to higher-level tools:
–Metadata- and schema-oriented descriptions of workflow components
–Tracks dependencies among the metadata
–Tracks synonyms between groups of metadata, allowing schema evolution and versioning
–User-registered functions do the actual work within a framework, leading to enhanced modularity

(MC)Runjob has been in use since 1999 at DZero and since 2002 at CMS: it is a mainstream tool already supported by the respective experiments.

A user who wants to run applications A, B, and C attaches the corresponding Configurators to a Linker. The Linker verifies that dependencies are satisfied. Once the Configurators are attached, the user sets values for the various schema elements defined in each one, and defines filename rules, random-seed rules, and so on. The user then executes the framework. Each Configurator may generate scripts used to run the corresponding application; the scripts are collected by a ScriptGen object.
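The interaction above can be sketched in Python. This is a hypothetical illustration of the Configurator/Linker/ScriptGen pattern, not the real MCRunjob API; all class and method names here are invented for the example.

```python
class Configurator:
    """Describes one application step via a metadata schema."""
    def __init__(self, name, schema):
        self.name = name
        self.schema = set(schema)   # metadata keys this step requires
        self.values = {}

    def set(self, key, value):
        if key not in self.schema:
            raise KeyError(f"{key} is not in the schema of {self.name}")
        self.values[key] = value

    def make_script(self):
        # Each configurator emits the script that runs its application.
        body = "\n".join(f"export {k}={v}" for k, v in sorted(self.values.items()))
        return f"# run {self.name}\n{body}"


class ScriptGen:
    """Collects the generated per-application scripts."""
    def __init__(self):
        self.scripts = []

    def collect(self, script):
        self.scripts.append(script)


class Linker:
    """Verifies that each step's metadata is resolved, then generates jobs."""
    def __init__(self):
        self.steps = []

    def attach(self, configurator):
        self.steps.append(configurator)

    def execute(self, script_gen):
        for step in self.steps:
            missing = step.schema - set(step.values)
            if missing:
                raise ValueError(f"{step.name}: unresolved metadata {missing}")
            script_gen.collect(step.make_script())


# Chain applications A and B, set their metadata, and generate scripts.
linker, gen = Linker(), ScriptGen()
for name in ("A", "B"):
    cfg = Configurator(name, schema=["RunNumber"])
    cfg.set("RunNumber", 4021)
    linker.attach(cfg)
linker.execute(gen)
# gen.scripts now holds one generated script per attached configurator
```

The key design point is that the Linker only checks dependencies and drives execution; all application-specific knowledge lives in the Configurators.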

Use Cases - Current

Generation of complex multi-application workflows
–Derived data products may require multiple processing steps. MCRunjob chains individual steps together into tree structures, and allows logical and functional dependencies to be declared among the metadata and groups of metadata (for example, "Use the Run Number for the Random Number Seed," or "Filename='Metadata1_Metadata2.Metadata3'").

Planning of Monte Carlo requests
–MCRunjob can be used to split a large request into multiple smaller requests tailored for a specific farm or situation.

Tools to give non-experts access to the full spectrum of applications and services
–MCRunjob exposes all applications, tasks, and services to the user as metadata and takes care of hiding regional center or runtime-environment eccentricities.
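Declarative rules like the two quoted above can be illustrated with a small sketch. The rule syntax below is invented for this example; MCRunjob's actual macro language differs.

```python
def resolve(metadata, rules):
    """Expand each rule's {Key} placeholders from the (growing) metadata set."""
    resolved = dict(metadata)
    for target, template in rules:
        resolved[target] = template.format(**resolved)
    return resolved

meta = {"Dataset": "ttbar", "RunNumber": "4021", "Version": "v3"}
rules = [
    ("RandomSeed", "{RunNumber}"),                     # use the Run Number for the seed
    ("Filename",   "{Dataset}_{RunNumber}.{Version}"), # Filename='Metadata1_Metadata2.Metadata3'
]
out = resolve(meta, rules)
# out["Filename"] == "ttbar_4021.v3"; out["RandomSeed"] == "4021"
```

Because rules are applied in order against the growing metadata set, a later rule can depend on the output of an earlier one, which is what lets dependency chains among metadata be expressed declaratively rather than inside shell scripts.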

Use Cases - Current

Bringing a new Regional Center online quickly
–As part of the overall package of software needed to install an experiment software environment at a Regional Center, MCRunjob can be used to test the installation quickly by running the actual applications in complex workflows right away. MCRunjob can expose a uniform interface that hides Regional Center differences.

Integration of applications with experiment databases
–MCRunjob can be used to generate the metadata needed to track derived or created data products as part of the workflow description. MCRunjob can also be used to retrieve production processing requests from experiment databases (in DZero, this is SAM; in CMS it is the RefDB).

CMS and DZero Computing

Runjob is the official tool for Monte Carlo production at DZero and CMS:
–Used to generate millions of events at DZero regional centers worldwide
–Used to generate 1.5M events on the CMS Integration Grid Testbed using MOP and Condor-G (WBS 1.3.3)
–Used in Spring 2003 production and PCP'04 production

CMS and DZero continue to work on experiment-specific extensions to MCRunjob:
–SAM/JIM execution environment in DZero; runtime extensions
–Inclusion of new runtime scripts (CMSProd) and different Grid environments in CMS (MOP, VDL, EDG, etc.) (WBS 1.3.2, 1.3.3)

A common code project, Shahkar, started this year:
–Holds common base classes for (MC)Runjob

MCRunjob Core Services

–Linker and Configurator base classes
–Metadata description of workflows
–Methods for job generation
–Project planning, release management, documentation, and GUI

MCRunjob Core Services

The core services (Linker and Configurator base classes; metadata description of workflows; methods for job generation; project planning, release management, documentation, and GUI) are surrounded by capability areas:
–Metadata descriptions of applications
–Metadata descriptions of workflows and derived data
–Containers for results of catalog or parameter lookup
–Metadata descriptions of services and portals
–Smart wrappers; hooks into experiment frameworks
–Delayed abstract planning
–Metadata dependencies, ontologies
–Data provenance

MCRunjob Core Services

The core services and capability areas, in turn, serve four external domains:
–HEP experiments: database-driven MC request planning and tracking
–Runtime services: monitoring, experiment frameworks, production architecture
–Grid services: job submission, replica management, dataset management
–Knowledge management: provenance, virtual data, dependencies

MCRunjob Core Services

Concrete instances by domain:
–HEP experiments: DZero, CMS (LHC)
–Runtime services: SAM/JIM, ORCA/COBRA
–Grid services: SAM, PPDG
–Knowledge management: SAM, GriPhyN, DAWN (Large ITR)

Concrete components around the core: RefDB interface; SAM MC request interface and data descriptions; file iterators; replica catalogs; parameter sweeps; MOP; DAGMan; JDL; BOSS interface; SAM/JIM monitoring services; VDL ScriptGen; SAM metadata descriptions.

Project Definition

Scope of Project
–Support of core functionality needed by the experiments DZero and CMS:
   Chaining individual production processing descriptions together to form complex workflow descriptions
   Modularity to produce executable jobs from workflow descriptions for a variety of runtime environments, grid portals, and regional centers
   General APIs for connecting to experiment-specific processing request DBs and tracking DBs
   A common code project, code-named "Shahkar", has already started
–Support efforts to extend the functionality of MCRunjob to include more interfaces to Grid Services:
   Build upon current experience with MOP and DAGMan; extend this methodology to modules for JDL and work closely with the LCG
   Extend the current experience with File Iterators to make use of Replica Catalogues
   Extend the RunJob configurators to expose Grid Portals to MCRunjob

Project Definition

Scope of Project (cont'd)
–Support efforts to extend the work on knowledge management issues:
   Refine what is already there for Virtual Data Language by allowing MCRunjob-generated scripts to be used as Chimera transformations
   Refine the MCRunjob macro language to make it more terse, support expressions, and support more flavors of dependencies, perhaps by embedding the existing MCRunjob macro language directly in Python
–Support efforts to explore the possible runtime applications of MCRunjob:
   Runtime monitoring in the SAM/JIM context
   Write extension configurators to insert monitoring wrappers into existing workflows in a generic way
   Bridge experiment application frameworks to external services, such as Grid or experiment-supported DBs
–Interested in talking to GANGA about their experiences

Effort To Date (Annualized)

Current effort on the core project, 0.6 FTE:
–Core maintenance and testing: 0.5 FTE (Eric Wicklund, CEPA)
–CMS integration task: 0.1 FTE (Gerald Guglielmo, CMS)
–DZero integration task: 0.0 FTE (Peter Love, Lancaster)

Current effort on related projects:
–DZero maintenance and extensions: 1.0 FTE (Dave Evans, Peter Love, Iain Bertram; Lancaster)
–CMS maintenance and extensions: 1.0 FTE (Julia Andreeva, Veronique Lefebure, Greg Graham; CMS CCS)
–DZero Grid extensions: Igor Terekhov, CEPA(?)
–CMS Grid extensions: 0.5 FTE (Anzar Afaq, PPDG; Alessandra Fanfani, EDG and INFN)

Relationship to Other Projects

SAM
–One of the first great applications of MCRunjob was automatically generating the metadata needed by the SAM system in order to store MC production results.
–Closer integration with SAM is proceeding apace, in the context of automatic generation of MC jobs from request metadata stored in SAM.

GriPhyN
–MCRunjob has a ScriptGen which produces Virtual Data Language.
–Conceptually, Configurator schemas are like transformations, Configurators with values are like derivations, and ConfiguratorDescriptions and dependencies define "types" on the data appearing at the endpoints of a transformation.
–MCRunjob can generate VDL, generate VDL plus wrapper scripts (custom transformations), or function as an abstract planner.
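The schema-to-transformation and values-to-derivation correspondence above can be made concrete with a small sketch. The VDL text emitted below is schematic; real Chimera VDLt syntax differs in detail, and the application name and keys are invented for the example.

```python
def schema_to_transformation(app_name, schema):
    """A configurator schema plays the role of a VDL transformation (TR)."""
    args = ", ".join(f"in {key}" for key in schema)
    return f"TR {app_name}({args});"

def values_to_derivation(app_name, values):
    """A configurator with concrete values plays the role of a derivation (DV)."""
    args = ", ".join(f'{k}="{v}"' for k, v in values.items())
    return f"DV {app_name}({args});"

tr = schema_to_transformation("cmsim", ["RunNumber", "NumEvents"])
dv = values_to_derivation("cmsim", {"RunNumber": "4021", "NumEvents": "500"})
# tr is the abstract, reusable recipe; dv binds it to one concrete job
```

In this view a ScriptGen that emits VDL is just another back end for the same metadata description, which is why MCRunjob can act either as a VDL generator or as an abstract planner in its own right.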

Relationship to Other Projects

SAM/JIM
–In the JIM grid execution environment, "abstract" MCRunjob scripts are sent as the job instead of shell scripts or conventional executables.
–MCRunjob macro scripts are re-parallelized by a remote MCRunjob Linker process started up by Condor-G.
–Delayed abstract planning!

Data Provenance
–MCRunjob is already capable of a fully declarative specification of workflow, and can communicate with external databases and servers.
–Beyond a bare specification of parameters, MCRunjob keeps track of the dependencies that existed among parameters when they were created.

Relationship to Other Projects

Context Oriented
–New paradigm: Context-Oriented Programming, supported by CMS MCRunjob.
–The behavior of a workflow specification is context dependent, allowing more abstract workflow specification.

MOP
–Uses Condor-G/DAGMan to submit jobs to remote Globus job managers.
–This is ROUTINE in CMS.

EDG
–Uses RLS and the EDG Resource Broker to submit jobs to EDG sites and the CMS LCG-0 based grid.

Conclusions/Questions

–MCRunjob provides the functionality to model the complex workflows found in MC production.
–MCRunjob is a powerful workflow planner with modular, component-based interfaces to external services.
–Metadata from any one application area should be exposed to the other areas without compromising existing architectures.
–We should be able to run jobs on any resources in a unified way.

In preparation for analysis environments:
–Take it from a former kaon physicist: sharpening our understanding of coarse-grained production processing still has much to teach us about the more complex environments expected in physics analysis. Understanding the behavior of the underlying Grid services, and the coming challenges of knowledge management in the face of clean, predictable input and measurable results, has a lot of value.

References

USCMS MCRunjob page: –
DZero MCRunjob page: –

Previous Talks and Papers:
–MCRunjob: A Workflow Planner for HEP, G.E. Graham, Dave Evans, and Iain Bertram. Proceedings of Computing in High Energy Physics 2003 (CHEP 2003), San Diego, CA.
–Tools and Infrastructure for CMS Distributed Production (4-033), G.E. Graham et al. Proceedings of Computing in High Energy Physics 2001 (CHEP 2001), Beijing, China.
–DZero Monte Carlo Production Tools (8-027), G.E. Graham et al. Proceedings of Computing in High Energy Physics 2001 (CHEP 2001), Beijing, China.
–DZero Monte Carlo, G.E. Graham. Proceedings of Advanced Computing and Analysis Techniques 2000 (ACAT 2000), Fermilab, Batavia, IL.