Slide 1: Virtual Data in CMS Production
A. Arbree, P. Avery, D. Bourilkov, R. Cavanaugh, S. Katageri, G. Graham, J. Rodriguez, J. Voeckler, M. Wilde (CMS & GriPhyN)
Conference in High Energy Physics (CHEP 2003), UC San Diego

Slide 2: Virtual Data Motivations in Production (CHEP 2003, 03.25.2003)
- Data track-ability and result audit-ability
  - Universally sought by scientists
- Facilitates tool and data sharing and collaboration
  - Data can be sent along with its recipe
  - The recipe is useful when searching for data
- Workflow management
  - A new, structured paradigm for organizing, locating, and specifying data products
- Performance optimizations
  - The ability to delay execution planning until as late as possible

Slide 3: Initial CMS Production Tests Using the Chimera Virtual Data System
- Motivation
  - Simplify CMS production in a Grid environment
  - Evaluate the current state of Virtual Data technology
  - Understand issues related to the provenance of CMS data
- Use-case
  - Implement a simple 5-stage CMS production pipeline on the US CMS Test Grid
- Solution
  - Wrote an interface between Chimera and the CMS production software
  - Wrote a simple grid scheduler
  - Ran sample simulations to evaluate the system

Slide 4: What Is a DAG? (Directed Acyclic Graph)
- A DAG is the data structure used to represent job dependencies.
- Each job is a "node" in the DAG.
- Each node can have any number of "parent" or "child" nodes, as long as there are no loops!
- We usually talk about workflow in units of "DAGs".
[Figure: example DAG in which Job A fans out to Jobs B and C, which join at Job D; picture taken from Peter Couvares]
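The "no loops" requirement on the slide is what makes a DAG executable: there is always some order in which every job runs after all of its parents. A minimal sketch of that check, using the example graph from the figure (Job A fans out to B and C, which join at D):

```python
from collections import deque

def topo_order(edges):
    """Return a topological order of the DAG, or raise if a cycle exists."""
    children = {}
    indegree = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        indegree[child] = indegree.get(child, 0) + 1
        indegree.setdefault(parent, 0)
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for c in children.get(node, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order

# The example DAG from the slide: A fans out to B and C, which join at D.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(topo_order(edges))  # -> ['A', 'B', 'C', 'D']
```

Any order returned by this function is a valid execution schedule for the workflow; a cycle means no such schedule exists.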

Slide 5: Example CMS Data/Workflow
[Figure: data flow through Generator, Simulator, Formator, and ODBMS; a Digitiser fed by a calibration DB; writeESD, writeAOD, and writeTAG steps; Analysis Scripts consuming the output]

Slide 6: Data/Workflow Is a Collaborative Endeavour!
[Figure: the same data flow as slide 5, annotated with the groups responsible for each part: Online Teams, the (Re)processing Team, the MC Production Team, and Physics Groups]

Slide 7: A Simple CMS Production 5-Stage Workflow Use-case
- CMKIN: events are generated (pythia), producing an .ntpl file.
- CMSIM: the detector's response is simulated for each event (geant3), producing an .fz file.
- OOHITS: events are reformatted and written into an event database.
- OODIGI: the original events are digitised and reconstructed.
- NTUPLE: the reconstructed data is reduced and written to a flat .ntpl file.
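The five stages above form a strict chain: each stage consumes exactly what the previous one produced. A small sketch encoding that chain as data (the per-stage tools and formats are taken from the slide; fields left as None are simply not stated there):

```python
# The five production stages with their tools and file formats as given on
# the slide; None marks details the slide does not specify.
STAGES = [
    {"name": "CMKIN",  "tool": "pythia", "reads": None,       "writes": ".ntpl"},
    {"name": "CMSIM",  "tool": "geant3", "reads": ".ntpl",    "writes": ".fz"},
    {"name": "OOHITS", "tool": None,     "reads": ".fz",      "writes": "event DB"},
    {"name": "OODIGI", "tool": None,     "reads": "event DB", "writes": "event DB"},
    {"name": "NTUPLE", "tool": None,     "reads": "event DB", "writes": ".ntpl"},
]

def check_pipeline(stages):
    """Verify each stage reads exactly what the previous stage wrote."""
    for prev, cur in zip(stages, stages[1:]):
        assert cur["reads"] == prev["writes"], (prev["name"], cur["name"])
    return " -> ".join(s["name"] for s in stages)

print(check_pipeline(STAGES))  # CMKIN -> CMSIM -> OOHITS -> OODIGI -> NTUPLE
```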

Slide 8: 2-Stage DAG Representation of the 5-Stage Use-case
- A Fortran job wraps the CMKIN and CMSIM stages.
- A DB job wraps the OOHITS, OODIGI, and NTUPLE stages.
- This structure was used to enforce policy constraints on the workflow (e.g. an Objectivity/DB license is required for the DB stages).
- Initially a simple script was used to generate the Virtual Data Language (VDL); McRunJob is now used to generate the workflow in VDL (see the talk by G. Graham).
- Responsibility of a Workflow Generator: it creates the abstract plan.
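A sketch (not the actual McRunJob code) of the collapse described above: project the stage-level dependency edges onto the two wrapper jobs, dropping edges internal to a wrapper, so that the DB-licensed stages can be scheduled as one unit on license-holding sites:

```python
# Stage-level chain and the wrapper each stage belongs to, per the slide.
STAGE_EDGES = [("CMKIN", "CMSIM"), ("CMSIM", "OOHITS"),
               ("OOHITS", "OODIGI"), ("OODIGI", "NTUPLE")]
WRAPPER_OF = {"CMKIN": "fortran", "CMSIM": "fortran",
              "OOHITS": "db", "OODIGI": "db", "NTUPLE": "db"}

def collapse(edges, wrapper_of):
    """Project stage-level edges onto wrapper jobs, dropping internal edges."""
    out = []
    for a, b in edges:
        wa, wb = wrapper_of[a], wrapper_of[b]
        if wa != wb and (wa, wb) not in out:
            out.append((wa, wb))
    return out

print(collapse(STAGE_EDGES, WRAPPER_OF))  # -> [('fortran', 'db')]
```

The single surviving edge ('fortran', 'db') is exactly the 2-node DAG the slide describes.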

Slide 9: Mapping Abstract Workflows onto Concrete Environments
- Abstract DAGs (the virtual workflow)
  - Resource locations unspecified
  - File names are logical
  - Data destinations unspecified
  - "build" style
- Concrete DAGs (ready for submission)
  - Resource locations determined
  - Physical file names specified
  - Data delivered to and returned from physical locations
  - "make" style
- In general there is a range of planning steps between abstract workflows and concrete workflows.
[Figure: VDL feeds an abstract plan (logical names); the concrete planner, consulting the Virtual Data Catalog (VDC) and Replica Catalog (RC), produces a DAX (XML) that DAGMan executes as a DAG (physical names)]
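A toy illustration of that abstract-to-concrete step. The catalog entries, site name, and file names here are hypothetical, but the two responsibilities shown (resolving logical file names through a replica catalog and pinning the job to a physical site) are the ones the slide describes:

```python
# Hypothetical replica catalog: logical file name -> physical URL.
REPLICA_CATALOG = {
    "run42.ntpl": "gsiftp://example-site.edu/store/run42.ntpl",
}

def concretize(job, site, catalog):
    """Bind an abstract job to a site and resolve its logical file names."""
    resolved = {}
    for lfn in job["inputs"]:
        # None means no replica exists yet, so the file must be derived first.
        resolved[lfn] = catalog.get(lfn)
    return {"name": job["name"], "site": site, "inputs": resolved}

abstract_job = {"name": "fortran", "inputs": ["run42.ntpl"]}
concrete_job = concretize(abstract_job, "example-cluster", REPLICA_CATALOG)
print(concrete_job)
```

In the real system this decision is far richer (the planner may choose among several replicas or decide to re-derive the data), but the build-style to make-style transition has this shape.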

Slide 10: Concrete DAG Representation of the CMS Pipeline Use-case
- Each job node (Fortran, DB) is expanded into stage-file-in, execute-job, stage-file-out, and register-file nodes.
- Responsibilities of the Concrete Planner:
  - binds job nodes to physical grid sites;
  - queries the Replica and Transformation Catalogs for file existence and location;
  - dresses job nodes with stage-in/out nodes.
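A sketch of that "dressing" step: each compute job expands into a stage-in, execute, stage-out, register chain, written here in Condor DAGMan input-file syntax (JOB and PARENT ... CHILD lines). The submit-file names are made up for illustration:

```python
def dress(job):
    """Expand one job node into its stage-in/exec/stage-out/register chain."""
    steps = [f"{job}_stagein", f"{job}_exec", f"{job}_stageout", f"{job}_register"]
    lines = [f"JOB {s} {s}.sub" for s in steps]          # one DAGMan node per step
    lines += [f"PARENT {a} CHILD {b}"                    # chain them in order
              for a, b in zip(steps, steps[1:])]
    return "\n".join(lines)

print(dress("fortran"))
```

Running the original 2-node DAG through this expansion yields the 8-node concrete DAG the slide depicts.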

Slide 11: Default Middleware Configuration from the Virtual Data Toolkit
[Figure: on the submit host, Chimera feeds DAGMan, which submits through Condor-G and its gahp_server; jobs arrive at the remote host's gatekeeper, which hands them to a local scheduler (Condor, PBS, etc.) running on the compute machines]

Slide 12: Modified Middleware Configuration (to Enable Massive CMS Production Workflows)
[Figure: the slide-11 stack extended with RefDB, McRunJob (a generic workflow generator), and WorkRunner]

Slide 13: RefDB, the CMS Metadata Catalog
- Contains parameter/cards files
- Contains production requests
- Contains production status
- etc.
- See Veronique Lefebure's talk on RefDB.

Slide 14: McRunJob, the CMS Workflow Generator
- Constructs the production workflow from a request in the RefDB
- Writes the workflow description in VDL (via ScriptGen)
- Components: RefDB Module, VDL Generator, VDL Config, Linker
- See Greg Graham's talk on McRunJob.

Slide 15: WorkRunner, the Grid Scheduler
- A very simple placeholder (due to the lack of an interface to a resource broker)
- Submits Chimera workflows based on simple job-monitoring information from Condor-G
- Components: Condor-G Monitor, Chimera Interface, Job Tracking Module
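A minimal sketch of a WorkRunner-style throttle: keep submitting DAGs while the number of queued jobs, as reported by some monitor, stays below a threshold. The monitor value and threshold here are stand-ins, not real Condor-G API calls:

```python
def work_runner(pending_dags, queued_jobs, max_queued=10):
    """Return the DAGs to submit now, given the current queue depth.

    queued_jobs would come from polling Condor-G; here it is just a number.
    """
    slots = max(0, max_queued - queued_jobs)
    return pending_dags[:slots]

# With 8 jobs already queued and room for 10, only 2 more DAGs go out.
print(work_runner(["dag1", "dag2", "dag3"], queued_jobs=8))  # -> ['dag1', 'dag2']
```

This is the essence of "submits Chimera workflows based on simple job monitoring information": no brokering, just back-pressure from the queue.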

Slide 16: Modified Middleware Configuration (Complete Picture)
[Figure: the full modified stack from slides 12-15: RefDB, McRunJob, and WorkRunner alongside Chimera, DAGMan, Condor-G, and gahp_server on the submit host; gatekeeper, local scheduler, and compute machines on the remote host]

Slide 17: Initial Results
- Production test
  - Results:
    - 678 DAGs (250 events each)
    - 167,500 test events computed (not delivered to CMS)
    - 350 CPU-days on 25 dual-processor Pentium (1 GHz) machines over 2 weeks of wall-clock time
    - 200 GB of simulated data
  - Problems:
    - 8 failed DAGs
    - Cause: pre-emption by another user

Slide 18: Initial Results (cont.)
- Scheduling test
  - Results:
    - 5954 DAGs (1 event each, not used by CMS)
    - 300 CPU-days on 145 CPUs at 6 sites:
      - University of Florida: USCMS Cluster (8), HCS Cluster (64), GriPhyN Cluster (28)
      - University of Wisconsin-Milwaukee: CS Dept. Cluster (30)
      - University of Chicago: CS Dept. Cluster (5)
      - Argonne National Lab: DataGrid Cluster (10)
  - Problems:
    - 395 failed DAGs
    - Causes:
      - Failure to post final data from the UF GriPhyN Cluster (200-300 DAGs)
      - Globus bug: roughly 1 DAG in 50 fails when communication is lost
    - Primarily limited by the performance of lower-level grid middleware
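For context, the failure rates implied by the two tests above can be computed directly from the reported counts:

```python
# Counts as reported on slides 17 and 18.
prod_failed, prod_total = 8, 678
sched_failed, sched_total = 395, 5954

print(f"production test: {100 * prod_failed / prod_total:.1f}% of DAGs failed")
print(f"scheduling test: {100 * sched_failed / sched_total:.1f}% of DAGs failed")
```

The jump from roughly 1% to roughly 7% between the two tests is consistent with the slide's conclusion that the scheduling test was limited mainly by lower-level grid middleware rather than by the virtual-data layer.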

Slide 19: The Value of Virtual Data
- Provides full reproducibility (fault tolerance) of one's results:
  - tracks ALL dependencies between transformations and their derived data products
  - acts as a kind of "virtual logbook"
  - records the provenance of data products
- Provides transparency with respect to location and existence. The user need not know:
  - where the data is located
  - how many data files are in a data set
  - whether the requested derived data already exists
- Allows for optimal performance in planning. Should the derived data be:
  - staged in from a remote site (send the data to the job)?
  - processed where it resides (send the job to the data)?
  - re-created locally on demand?
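The three planning options in the last bullet can be framed as a cost comparison. A toy decision function, with purely illustrative cost numbers, showing the shape of the choice a planner faces:

```python
def plan(exists_remote, transfer_cost, remote_queue_cost, rederive_cost):
    """Pick the cheapest way to obtain the derived data.

    If no replica exists anywhere, re-derivation is the only option; the
    cost values here are arbitrary units for illustration.
    """
    options = {"re-derive locally": rederive_cost}
    if exists_remote:
        options["stage data in"] = transfer_cost       # send the data to the job
        options["send job to data"] = remote_queue_cost  # send the job to the data
    return min(options, key=options.get)

print(plan(True, transfer_cost=30, remote_queue_cost=5, rederive_cost=120))
# -> 'send job to data'
```

Because the virtual-data catalog records the recipe for every product, "re-derive locally" is always on the menu, which is precisely what lets the planner delay this decision until as late as possible.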

Slide 20: Summary, Grid Production of CMS Simulated Data
- CMS production of simulated data (to date):
  - O(10) sites
  - O(1000) CPUs
  - O(100) TB of data
  - O(10) production managers
- The goal is to double every year without increasing the number of production managers!
  - More automation will be needed for the upcoming Data Challenges!
- Virtual Data provides:
  - part of the abstraction required for automation and fault tolerance
  - mechanisms for data provenance (important for search engines)
- Virtual Data technology is "real" and maturing, but still in its childhood:
  - much functionality already exists
  - it still requires placeholder components for intelligent planning and optimisation

