FCC HTCondor Submission
Scripts to support production of a large number of FCC events (part of fcc_datasets)
7 December 2017, Alice Robson
Example
Example submission:
    source fcc_condor_submit.sh -p input_parameters.yaml -e 100000 -r 100
Here -p gives the input parameters yaml, -e the number of events per job and -r the number of condor jobs, so this submits 100 jobs of 100000 events each.
Input Parameters yaml (1)
    name: "CMS_eeZZ"              # label
    rate: 100000                  # expected events per hour => used to choose the queue
    base_outputdir: outputs/      # directory for outputs
    gaudi_command: ''' $FCCSWBASEDIR/run fccrun.py simple_papas_condor.py --rpythiainput ee_ZZ.txt --routput output.root --rmaxevents 100000'''    # main command
Input Parameters yaml (2)
    name: "CMS_eeZZ"
    events: 10                    # events per job
    runs: 2                       # number of jobs
    rate: 100000
    base_outputdir: /eos/experiment/fcc/ee/datasets/papas/
    xrdcp_base: root://eospublic.cern.ch/                                  # for xrdcp with eos
    input: $FCCDATASETS/htcondor/examples/papas/ee_ZZ.txt                  # input parameter used by gaudi_command
    script: $FCCDATASETS/htcondor/examples/papas/simple_papas_condor.py    # input parameter used by gaudi_command
    gaudi_command: ''' $FCCSWBASEDIR/run fccrun.py {} --rpythiainput {} --routput output.root --rmaxevents {}''.format(condor_pars["script"], condor_pars["input"], condor_pars["events"])'
The -e and -r command-line options override the values in the yaml:
    source fcc_condor_submit.sh -p input_parameters.yaml -e 100000 -r 100
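The {} placeholders in gaudi_command are filled from the other parameters via the embedded .format(...) call on a condor_pars dictionary. A minimal sketch of the idea in Python (not the actual fcc_datasets code; the dictionary is built by hand here rather than read from the yaml):

    # Values taken from the yaml above; in the real scripts they come from the parameter file.
    condor_pars = {
        "script": "$FCCDATASETS/htcondor/examples/papas/simple_papas_condor.py",
        "input": "$FCCDATASETS/htcondor/examples/papas/ee_ZZ.txt",
        "events": 10,
    }

    # Fill the {} placeholders with the script, input and events parameters,
    # giving the command that each condor run will execute.
    gaudi_command = ("$FCCSWBASEDIR/run fccrun.py {} --rpythiainput {} "
                     "--routput output.root --rmaxevents {}").format(
        condor_pars["script"], condor_pars["input"], condor_pars["events"])

    print(gaudi_command)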
fcc_condor_submit.sh
What does fcc_condor_submit.sh do?
Calls fcc_condor_setup.py, which:
    - creates a uniquely named directory inside both the working directory (not EOS) and the output directory (may be EOS)
    - writes a parameter.yaml file
    - creates error/log/output directories in the working directory (for condor)
    - copies across the files needed for the condor runs
    - chooses the queue type (timing)
    - writes a condor dag.sub to submit several jobs (run.sub/run.sh/run.py)
    - sets the final summary stage of the condor dag (finish.sub/finish.sh/finish.py)
It then unsets some local python-related env variables, submits the dag job, and resets the env variables (see the sketch below).
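The submit/reset part looks roughly like the sketch below. This is illustrative only: which environment variables the real script saves and unsets is an assumption; only the dag.sub file name and the standard condor_submit_dag command are taken from the slides and stock HTCondor.

    # Illustrative sketch, not the real fcc_condor_submit.sh.
    saved_pythonpath="$PYTHONPATH"
    unset PYTHONPATH PYTHONHOME          # keep the local python environment out of the condor jobs (exact variables are a guess)

    cd "$WORKDIR"                        # placeholder for the uniquely named working directory created by fcc_condor_setup.py
    condor_submit_dag dag.sub            # submit all the runs plus the FINAL summary stage

    export PYTHONPATH="$saved_pythonpath"   # restore the environment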
Working Directory
Unique working directory name: CMS_eeZZ_20171115_e100000_r10_0, following the pattern NAME_YYYYMMDD_eEVENTS_rRUNS_COUNTER (here: name CMS_eeZZ, submitted 2017-11-15, 100000 events per job, 10 runs, counter 0).
Contents of the working directory after completion of the runs:
    - dag.sub: the main condor dag submission file
    - run.sub: the submission for each run
    - finish.sub: creates the final summary info.yaml
    - files copied in at setup, used by condor
    - all other files are logs/errors/outputs from condor
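Putting this together with the files listed on the previous slide, the working directory looks roughly as follows (layout reconstructed from the file names mentioned in the slides, not copied from the directory screenshot):

    CMS_eeZZ_20171115_e100000_r10_0/
        parameter.yaml                     # parameters for this submission
        dag.sub                            # main condor dag submission file
        run.sub  run.sh  run.py            # submission and payload for each run
        finish.sub  finish.sh  finish.py   # final summary stage, writes info.yaml
        error/  log/  output/              # condor error/log/output directories
        ...                                # copied configuration files plus condor logs/errors/outputs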
Dag submission file
Automatically generated dag submission file:
    ######DAG file
    Job A0 run.sub
    Vars A0 runnumber="0"
    Job A1 run.sub
    Vars A1 runnumber="1"
    Job A2 run.sub
    ...
    Job A9 run.sub
    Vars A9 runnumber="9"
    FINAL F0 finish.sub
The FINAL job runs when all the other jobs have finished and produces the summary info.yaml file.
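The run.sub referenced by each Job line is also generated automatically. A plausible shape for it, purely as an illustration (the real file is written by fcc_condor_setup.py; the JobFlavour line is a guess at how the queue chosen from the rate parameter is expressed):

    # run.sh receives the runnumber set by the Vars line in the dag file
    executable  = run.sh
    arguments   = $(runnumber)
    # condor stdout/stderr/log go into the directories created at setup
    output      = output/run.$(runnumber).out
    error       = error/run.$(runnumber).err
    log         = log/run.$(runnumber).log
    # queue/timing choice (the flavour name here is only an illustration)
    +JobFlavour = "workday"
    queue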
Outputs (EOS or otherwise)
The output directory gets the same unique name, e.g. CMS_eeZZ_20171115_e100000_r10_0, and contains:
    - the summary yaml file (info.yaml)
    - the output (.root) files
    - other files are parameter/configuration files
Summary info.yaml file

    parameters:
      base_outputdir: /eos/experiment/fcc/ee/datasets/papas/
      events: 3
      gaudi_command: '''LD_PRELOAD=$FCCSWBASEDIR/build.$BINARY_TAG/lib/libPapasUtils.so $FCCSWBASEDIR/run fccrun.py {} --rpythiainput {} --routput output.root --rmaxevents {}''.format(condor_pars["script"], condor_pars["input"], condor_pars["events"])'
      input: /afs/cern.ch/work/a/alrobson/papasdagruns/fcc-ee-higgs/ee_ZZ.txt
      name: CMS_ee_ZZ
      parameters: papas_CMS_ee_ZZ.yaml
      rate: 50000
      runs: 2
      script: $FCCDATASETS/htcondor/examples/papas/simple_papas_condor.py
      subdirectory: CMS_ee_ZZ_20171204_e3_r2_20
      xrdcp_base: root://eospublic.cern.ch//
    sample:
      id: !!python/object:uuid.UUID
        int: 220006486981591175498105702182002049109
      jobtype: fccsw
      mother: null
      nevents: 6
      nfiles: 2
      ngoodfiles: 2
      pattern: '*.root'
    software:
      fccdag: /cvmfs/fcc.cern.ch/sw/0.8.1/dag/0.1/x86_64-slc6-gcc62-opt
      fccedm: /cvmfs/fcc.cern.ch/sw/0.8.1/fcc-edm/0.5.1/x86_64-slc6-gcc62-opt
      fccpapas: /cvmfs/fcc.cern.ch/sw/0.8.1/papas/1.2.0/x86_64-slc6-gcc62-opt
      fccphysics: /cvmfs/fcc.cern.ch/sw/0.8.1/fcc-physics/0.2.1/x86_64-slc6-gcc62-opt
      fccsw: !!python/unicode 'dddd362ea142b51c25d11eb357155fcd2a19a38a'
      fccswstack: /cvmfs/fcc.cern.ch/sw/0.8.1
      podio: /cvmfs/fcc.cern.ch/sw/0.8.1/podio/0.7/x86_64-slc6-gcc62-opt
      pythia8: /cvmfs/sft.cern.ch/lcg/views/LCG_88/x86_64-slc6-gcc62-opt
      root: /cvmfs/sft.cern.ch/lcg/releases/ROOT/6.08.06-c8fb4/x86_64-slc6-gcc62-opt

The parameters section records the run details and the unique directory name (subdirectory); the sample section records how many events were successfully produced (nevents, nfiles, ngoodfiles); the software section records the software versions used.
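One obvious use of the summary file is to check that each run produced its output. A small sketch, assuming PyYAML (note that the file contains !!python/object tags, so the safe loader will not accept it):

    import yaml

    with open("info.yaml") as f:
        # unsafe_load is needed because of the !!python/object:uuid.UUID tag
        info = yaml.unsafe_load(f)

    sample = info["sample"]
    print("{} events in {} good files out of {}".format(
        sample["nevents"], sample["ngoodfiles"], sample["nfiles"]))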
Not just for Papas: delphes example
    source fcc_condor_submit.sh -p delphes_parameters.yaml -e 10000 -r 10

    default_parameters:
      name: delphes
      events: 10
      runs: 2
      rate: 10000
      base_outputdir: /eos/experiment/fcc/ee/datasets/papas/
      xrdcp_base: root://eospublic.cern.ch
      script: $FCCDATASETS/htcondor/examples/delphes/PythiaDelphes_config.py
      gaudi_command: ''' $FCCSWBASEDIR/run fccrun.py {} --nevents {} ''.format(condor_pars["script"], condor_pars["events"])'
How to get going
    - Install FCCSW and source init.sh
    - Install fcc_datasets
    - Create a base working directory
    - Create input_parameters.yaml (see examples in fcc_datasets)
    - Submit: source fcc_condor_submit.sh -p input_parameters.yaml -e 100000 -r 10
A possible session is sketched below. See also: fcc_datasets/htcondor/CondorSubmit.md
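The sketch below follows the steps above; the directory names are placeholders, $FCCDATASETS is assumed to point at the fcc_datasets checkout, and the example parameter file name is taken from the info.yaml slide and may differ in the repository:

    # set up the FCCSW environment (assumes FCCSW and fcc_datasets are already installed)
    cd FCCSW
    source init.sh

    # base working directory for submissions, not on EOS
    mkdir -p ~/condor_work
    cd ~/condor_work

    # start from one of the examples shipped with fcc_datasets and edit it
    cp "$FCCDATASETS"/htcondor/examples/papas/papas_CMS_ee_ZZ.yaml input_parameters.yaml

    source fcc_condor_submit.sh -p input_parameters.yaml -e 100000 -r 10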
How fast? Pythia generation with Papas runs
Depends on the configuration... and on the batch machine...
Typical so far: ~100000 events/hour (roughly 25 events/sec, ranging between 50 and 20 events/sec).
So 100 runs of 1 000 000 events each (about 10 hours per run at that rate), run overnight in parallel, could produce 100 000 000 events.
NB Condor time is not normalized: a job in the "1 hour" queue finishes 1 hour after execution starts (real time).
Condor Comments
    - Documentation is not great
    - Need to be careful with the python environment
    - Cannot have the condor working directory on EOS
    - Condor not stable:
        - issues with failed submissions (store_cred)
        - EOS out of order for several days
        - several machines awaiting a cvmfs patch cause jobs to fail
    - ... but hopefully the scripts are now more robust