IW2D migration to HTCondor

Presentation on theme: "IW2D migration to HTCondor" — Presentation transcript:

1 IW2D migration to HTCondor
D. Amorim. Thanks to N. Biancacci and A. Mereghetti.

2 IW2D migration to HTCondor
Outline
Motivation
In practice
What is different for the user
Monitoring the jobs
Managing the jobs
How to get the latest version
Issues with HTCondor
Conclusion

3 IW2D migration to HTCondor
Outline
Motivation
In practice
What is different for the user
Monitoring the jobs
Managing the jobs
How to get the latest version
Issues with HTCondor
Conclusion

4 IW2D migration to HTCondor
Motivation
ImpedanceWake2D jobs can be run on the batch system from an lxplus machine, allowing multiple computations to run in parallel.
Extensively used for LHC, HL-LHC and FCC impedance scenarios (~40 jobs for the collimators and ~10 for the different beam screens).
The batch service has been migrated from LSF (IBM, proprietary) to HTCondor (U. of Wisconsin-Madison, open source).
Only 10% of the computers will remain on LSF until the end of 2017; LSF will be shut down in 2018 and the remaining computers will be transferred to HTCondor.

5 IW2D migration to HTCondor
Outline
Motivation
In practice
What is different for the user
Monitoring the jobs
Managing the jobs
How to get the latest version
Issues with HTCondor
Conclusion

6 What is different for the user
Changes are mostly transparent for the user's workflow:
The Python functions keep the same arguments.
Results files are written in the same folders.
The queue argument used for LSF (1nh, 8nh, 1nd…) is not used by HTCondor.
lxplusbatch = None: run on the local computer.
lxplusbatch = 'launch': submit the jobs to HTCondor.
lxplusbatch = 'retrieve': retrieve the results.
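The three lxplusbatch modes above can be sketched as a simple dispatch. This is an illustrative sketch only: the function and action names (dispatch, "run locally", etc.) are hypothetical and not the actual IW2D API, which keeps its own function signatures as noted above.

```python
def dispatch(lxplusbatch):
    """Illustrative sketch: map an lxplusbatch value to the action taken.

    None       -> compute on the local machine
    'launch'   -> submit the jobs to HTCondor
    'retrieve' -> collect the finished result files
    """
    if lxplusbatch is None:
        return "run locally"
    elif lxplusbatch == 'launch':
        return "submit to HTCondor"
    elif lxplusbatch == 'retrieve':
        return "retrieve results"
    else:
        raise ValueError(f"unknown lxplusbatch mode: {lxplusbatch!r}")
```

The same script can thus be run twice with the same arguments: once with 'launch' to submit, and later with 'retrieve' to collect the results.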

7 What is different for the user
Each submitted job belongs to a cluster identified by a unique number.
There are different ways to monitor the jobs:
From the command line: condor_q -nobatch shows all the jobs currently running.
From the website.

8 IW2D migration to HTCondor
Monitoring the jobs
From the command line: condor_q -nobatch
The output lists, for each job, the cluster number, the run time, the executable launched and the job state (R: running, I: idle, H: held).
Use watch condor_q -nobatch to get a live view of the jobs (watch reruns the command every two seconds).

9 IW2D migration to HTCondor
Monitoring the jobs
From the website: the data is refreshed every 5 minutes.

10 IW2D migration to HTCondor
Managing the jobs
condor_rm is used to delete jobs:
condor_rm <cluster> deletes a specific job.
condor_rm -all deletes all the user's jobs.
HTCondor generates for each job (cluster) a log file, an output file and an error file:
The log file contains the submission time, the execution time and machine, and other information on the job.
The output file contains the STDOUT of the executable: for IW2D it contains what is printed on the screen (calculation time).
The error file contains the errors encountered during execution (wrong input file format…).
These files are stored along with the resulting impedance files.
No mail is sent to the user when a job finishes, fails or is removed.
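The log, output and error files above are declared in the HTCondor submit description file passed to condor_submit. A minimal sketch, with illustrative file names (iw2d.x and input.dat are placeholders, not the actual IW2D executable or input); $(ClusterId) is a standard HTCondor macro expanded at submission time:

```
executable = iw2d.x
arguments  = input.dat

output     = iw2d.$(ClusterId).out
error      = iw2d.$(ClusterId).err
log        = iw2d.$(ClusterId).log

queue
```

With this naming scheme each cluster gets its own set of files, which is what makes it possible to store them next to the resulting impedance files.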

11 IW2D migration to HTCondor
Outline
Motivation
In practice
What is different for the user
Monitoring the jobs
Managing the jobs
How to get the latest version
Issues with HTCondor
Conclusion

12 How to get the latest version of IW2D
If git is used to manage the repository (git clone was used to download it):
Go to the IW2D repository.
Run git pull.
Otherwise, download the archive.

13 IW2D migration to HTCondor
Outline
Motivation
In practice
What is different for the user
Monitoring the jobs
Managing the jobs
How to get the latest version
Issues with HTCondor
Conclusion

14 IW2D migration to HTCondor
Current issues
Errors could arise during job submission:
ERROR: store_cred failed
ERROR: failed to read any data from /usr/bin/batch_krb5_credential
This seemed to be a credential issue; the problem was submitted to IT and investigated.
Job submission was slow: it could take more than 10 minutes to submit 50 jobs. Check that all the jobs were properly submitted, otherwise relaunch the script.
Problem solved by IT: no more credential errors, and job submission is much faster.

15 IW2D migration to HTCondor
Outline
Motivation
In practice
What is different for the user
Monitoring the jobs
Managing the jobs
How to get the latest version
Issues with HTCondor
Conclusion

16 IW2D migration to HTCondor
Conclusions
HTCondor is now the default batch system at CERN.
ImpedanceWake2D has been modified to handle HTCondor.
The change is mostly transparent for the user's workflow:
The Python functions work the same.
The commands to monitor and manage the jobs change.
The IW2D repository is up to date.
Problems remained during job submission; the issue was followed up by IT.
Remarks, suggestions and bug reports on IW2D are welcome!
The migration of DELPHI is also finished and will soon be uploaded.

17 IW2D migration to HTCondor
References
A list of useful commands for HTCondor
CERN documentation for HTCondor
Quick start guide for HTCondor from U. of Wisconsin-Madison

