FCC HTCondor Submission


FCC HTCondor Submission: scripts to support production of large numbers of FCC events, part of fcc_datasets.
7 December 2017, Alice Robson

Example

Example submission:

    source fcc_condor_submit.sh -p input_parameters.yaml -e 100000 -r 100

  -p  input parameters yaml
  -e  number of events per job
  -r  number of condor jobs
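As a rough illustration (not the actual script, and it is not shown here whether the parsing happens in the shell wrapper or in fcc_condor_setup.py), the flags could be read and used to override the yaml values along these lines; the argument handling and the condor_pars dictionary name are assumptions:

    # Hypothetical sketch of merging command-line overrides (-p/-e/-r)
    # with the input parameters yaml; names are assumed, not taken from the tool.
    import argparse
    import yaml

    parser = argparse.ArgumentParser(description="FCC HTCondor submission setup")
    parser.add_argument("-p", "--parameters", required=True,
                        help="input parameters yaml")
    parser.add_argument("-e", "--events", type=int,
                        help="number of events per job (overrides yaml)")
    parser.add_argument("-r", "--runs", type=int,
                        help="number of condor jobs (overrides yaml)")
    args = parser.parse_args()

    with open(args.parameters) as f:
        condor_pars = yaml.safe_load(f)

    # Command-line values take precedence over the yaml file.
    if args.events is not None:
        condor_pars["events"] = args.events
    if args.runs is not None:
        condor_pars["runs"] = args.runs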

Input Parameters yaml (1)

    name: "CMS_eeZZ"           # label
    rate: 100000               # expected events per hour => choose queue
    base_outputdir: outputs/   # directory for outputs
    gaudi_command: '''$FCCSWBASEDIR/run fccrun.py simple_papas_condor.py --rpythiainput ee_ZZ.txt --routput output.root --rmaxevents 100000'''   # main command

Input Parameters yaml (2)

    name: "CMS_eeZZ"
    events: 10          # events per job
    runs: 2             # number of jobs
    rate: 100000
    base_outputdir: /eos/experiment/fcc/ee/datasets/papas/
    xrdcp_base: root://eospublic.cern.ch/     # for xrdcp with eos
    input: $FCCDATASETS/htcondor/examples/papas/ee_ZZ.txt            # input parameters used by gaudi_command
    script: $FCCDATASETS/htcondor/examples/papas/simple_papas_condor.py
    gaudi_command: '''$FCCSWBASEDIR/run fccrun.py {} --rpythiainput {} --routput output.root --rmaxevents {}''.format(condor_pars["script"], condor_pars["input"], condor_pars["events"])'

Command-line options override the values in the yaml:

    source fcc_condor_submit.sh -p input_parameters.yaml -e 100000 -r 100
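As a minimal sketch of the templating mechanism (assuming the yaml has been loaded into a dict called condor_pars, as the slide suggests), the placeholders in gaudi_command are filled from the other yaml entries with Python's str.format:

    # Sketch: build the gaudi command from the yaml entries; condor_pars is
    # defined inline here only to make the example self-contained.
    condor_pars = {
        "script": "$FCCDATASETS/htcondor/examples/papas/simple_papas_condor.py",
        "input": "$FCCDATASETS/htcondor/examples/papas/ee_ZZ.txt",
        "events": 10,
    }

    gaudi_template = ("$FCCSWBASEDIR/run fccrun.py {} --rpythiainput {} "
                      "--routput output.root --rmaxevents {}")
    gaudi_command = gaudi_template.format(condor_pars["script"],
                                          condor_pars["input"],
                                          condor_pars["events"])
    print(gaudi_command)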

fcc_condor_submit.sh

What does fcc_condor_submit.sh do?

Calls fcc_condor_setup.py, which:
 - creates a uniquely named directory inside both the working directory (not EOS) and the output directory (may be EOS)
 - writes a parameter.yaml file
 - creates error/log/output directories in the working directory (for condor)
 - copies across the files needed for the condor runs
 - chooses the queue type based on expected timing (see the sketch after this list)
 - writes a condor dag.sub to submit several jobs (run.sub/run.sh/run.py)
 - sets the final summary stage of the condor DAG (finish.sub/finish.sh/finish.py)

Then the shell script:
 - unsets some local python-related environment variables
 - submits the DAG job
 - resets the environment variables
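The queue choice is driven by the expected wall time, i.e. events per job divided by the rate (events per hour). A minimal sketch of such a choice, assuming CERN HTCondor job flavours are used; the thresholds and the mapping are illustrative, not the ones in the script:

    # Illustrative sketch: pick a CERN HTCondor JobFlavour from the expected run time.
    # The use of JobFlavour and the thresholds below are assumptions.
    def choose_job_flavour(events_per_job, rate_per_hour):
        hours = float(events_per_job) / rate_per_hour
        if hours <= 1:
            return "microcentury"   # ~1 hour
        if hours <= 8:
            return "workday"        # ~8 hours
        if hours <= 24:
            return "tomorrow"       # ~1 day
        return "testmatch"          # ~3 days

    # e.g. 100000 events per job at 100000 events/hour -> "microcentury"
    print(choose_job_flavour(100000, 100000))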

Working Directory

Working directory after completion of the runs. The unique working directory name follows the pattern NAME_YYYYMMDD_eEVENTS_rRUNS_COUNTER, e.g.

    CMS_eeZZ_20171115_e100000_r10_0

Files used by condor:
 - dag.sub: main condor DAG submission file
 - run.sub: submission for each run
 - finish.sub: creates the final summary info.yaml

All other files are logs/errors/outputs from condor.
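A minimal sketch of how such a unique name can be built (the counter logic and the function name are assumptions; the actual code lives in fcc_condor_setup.py):

    # Sketch: build a unique directory name NAME_YYYYMMDD_eEVENTS_rRUNS_COUNTER,
    # incrementing the trailing counter until the directory does not exist yet.
    import os
    import datetime

    def unique_workdir(base, name, events, runs):
        date = datetime.date.today().strftime("%Y%m%d")
        counter = 0
        while True:
            subdir = "{}_{}_e{}_r{}_{}".format(name, date, events, runs, counter)
            path = os.path.join(base, subdir)
            if not os.path.exists(path):
                os.makedirs(path)
                return path
            counter += 1

    # e.g. unique_workdir(".", "CMS_eeZZ", 100000, 10) -> ./CMS_eeZZ_20171115_e100000_r10_0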

DAG submission file

Automatically generated DAG submission file:

    ######DAG file
    Job A0 run.sub
    Vars A0 runnumber="0"
    Job A1 run.sub
    Vars A1 runnumber="1"
    Job A2 run.sub
    ...
    Job A9 run.sub
    Vars A9 runnumber="9"
    FINAL FO finish.sub

The FINAL node runs when all the other jobs have finished and produces a summary info.yaml file.
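A minimal sketch of how such a DAG file could be generated (file names follow the slide; the function itself is illustrative):

    # Sketch: write a DAG file with one Job node per run and a FINAL summary node.
    def write_dag(path, nruns):
        with open(path, "w") as dag:
            dag.write("######DAG file\n")
            for i in range(nruns):
                dag.write("Job A{0} run.sub\n".format(i))
                dag.write("Vars A{0} runnumber=\"{0}\"\n".format(i))
            dag.write("FINAL FO finish.sub\n")

    # e.g. write_dag("dag.sub", 10), then submit with: condor_submit_dag dag.sub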

Outputs (EOS or otherwise)

The output directory gets the same unique name, e.g. CMS_eeZZ_20171115_e100000_r10_0, and contains:
 - the summary yaml file
 - the output files
 - other files, which are parameter/configuration files
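When the outputs live on EOS, they are copied with xrdcp using the xrdcp_base and base_outputdir from the yaml. A rough sketch of what each run might do; the destination layout, the per-run file name and the function are assumptions:

    # Sketch: copy a run's output.root to EOS with xrdcp. The destination is built
    # from xrdcp_base + base_outputdir + the unique subdirectory (all taken from
    # the yaml fields shown earlier); the output_<run>.root naming is assumed.
    import subprocess

    def copy_output_to_eos(local_file, xrdcp_base, base_outputdir, subdirectory, runnumber):
        dest = "{}{}{}/output_{}.root".format(xrdcp_base, base_outputdir,
                                              subdirectory, runnumber)
        subprocess.check_call(["xrdcp", "-f", local_file, dest])

    # e.g. copy_output_to_eos("output.root", "root://eospublic.cern.ch/",
    #                         "/eos/experiment/fcc/ee/datasets/papas/",
    #                         "CMS_eeZZ_20171115_e100000_r10_0", 0)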

Summary info.yaml file

    parameters:                    # run details
      base_outputdir: /eos/experiment/fcc/ee/datasets/papas/
      events: 3
      gaudi_command: '''LD_PRELOAD=$FCCSWBASEDIR/build.$BINARY_TAG/lib/libPapasUtils.so $FCCSWBASEDIR/run fccrun.py {} --rpythiainput {} --routput output.root --rmaxevents {}''.format(condor_pars["script"], condor_pars["input"], condor_pars["events"])'
      input: /afs/cern.ch/work/a/alrobson/papasdagruns/fcc-ee-higgs/ee_ZZ.txt
      name: CMS_ee_ZZ
      parameters: papas_CMS_ee_ZZ.yaml
      rate: 50000
      runs: 2
      script: $FCCDATASETS/htcondor/examples/papas/simple_papas_condor.py
      subdirectory: CMS_ee_ZZ_20171204_e3_r2_20    # unique directory_name
      xrdcp_base: root://eospublic.cern.ch//
    sample:
      id: !!python/object:uuid.UUID
        int: 220006486981591175498105702182002049109
      jobtype: fccsw
      mother: null
      nevents: 6                   # how many events were successfully produced
      nfiles: 2
      ngoodfiles: 2
      pattern: '*.root'
    software:                      # software versions
      fccdag: /cvmfs/fcc.cern.ch/sw/0.8.1/dag/0.1/x86_64-slc6-gcc62-opt
      fccedm: /cvmfs/fcc.cern.ch/sw/0.8.1/fcc-edm/0.5.1/x86_64-slc6-gcc62-opt
      fccpapas: /cvmfs/fcc.cern.ch/sw/0.8.1/papas/1.2.0/x86_64-slc6-gcc62-opt
      fccphysics: /cvmfs/fcc.cern.ch/sw/0.8.1/fcc-physics/0.2.1/x86_64-slc6-gcc62-opt
      fccsw: !!python/unicode 'dddd362ea142b51c25d11eb357155fcd2a19a38a'
      fccswstack: /cvmfs/fcc.cern.ch/sw/0.8.1
      podio: /cvmfs/fcc.cern.ch/sw/0.8.1/podio/0.7/x86_64-slc6-gcc62-opt
      pythia8: /cvmfs/sft.cern.ch/lcg/views/LCG_88/x86_64-slc6-gcc62-opt
      root: /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.08.06-c8fb4/x86_64-slc6-gcc62-opt
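A short sketch of how the summary might be inspected after a production run (field names are taken from the file above; the check itself is illustrative):

    # Sketch: read a summary info.yaml and report how many events/files were good.
    # The file contains python-specific tags (!!python/object:uuid.UUID), so the
    # full yaml Loader is needed; only do this for trusted files.
    import yaml

    with open("info.yaml") as f:
        info = yaml.load(f, Loader=yaml.Loader)

    sample = info["sample"]
    print("produced {} events in {} files ({} good)".format(
        sample["nevents"], sample["nfiles"], sample["ngoodfiles"]))
    if sample["ngoodfiles"] != sample["nfiles"]:
        print("warning: some output files are missing or bad")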

Not just for Papas: delphes example

    source fcc_condor_submit.sh -p delphes_parameters.yaml -e 10000 -r 10

    default_parameters:
      name: delphes
      events: 10
      runs: 2
      rate: 10000
      base_outputdir: /eos/experiment/fcc/ee/datasets/papas/
      xrdcp_base: root://eospublic.cern.ch
      script: $FCCDATASETS/htcondor/examples/delphes/PythiaDelphes_config.py
      gaudi_command: '''$FCCSWBASEDIR/run fccrun.py {} --nevents {}''.format(condor_pars["script"], condor_pars["events"])'

How to get going

 - Install FCCSW, then: source init.sh
 - Install fcc_datasets
 - Create a base working directory
 - Create input_parameters.yaml (see examples in fcc_datasets)
 - Submit: source fcc_condor_submit.sh -p input_parameters.yaml -e 100000 -r 10

See also: fcc_datasets/htcondor/CondorSubmit.md

How fast? Pythia generation with Papas runs

Depends on the configuration... and on the batch machine. Typical so far: ~100 000 events/hour (25 events/sec, roughly between 20 and 50 events/sec). At that rate a run of 1 000 000 events takes about 10 hours, so 100 such runs launched overnight could produce 100 000 000 events.

NB: Condor time is not normalized: a job in the "1 hour" queue finishes 1 hour after execution starts (real time).
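A back-of-the-envelope check of those numbers (plain arithmetic, using only the rates quoted above):

    # Back-of-the-envelope throughput check for the numbers quoted above.
    rate_per_hour = 100000             # ~25 events/sec * 3600 s
    events_per_run = 1000000
    n_runs = 100

    hours_per_run = events_per_run / float(rate_per_hour)   # ~10 hours, i.e. overnight
    total_events = n_runs * events_per_run                  # 100 million
    print("each run: ~{:.0f} hours, total: {} events".format(hours_per_run, total_events))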

Condor Comments

 - Documentation is not great
 - Need to be careful with the python environment
 - Cannot have the condor working directory on EOS
 - Condor not stable:
   - issues with failed submissions (store_cred)
   - EOS out of order for several days
   - several machines awaiting a cvmfs patch cause jobs to fail
 - ... but hopefully the scripts are now more robust