IW2D migration to HTCondor

D. Amorim
Thanks to N. Biancacci, A. Mereghetti
2017-06-02

Outline
- Motivation
- In practice
  - What is different for the user
  - Monitoring the jobs
  - Managing the jobs
- How to get the latest version
- Issues with HTCondor
- Conclusion

Motivation
- ImpedanceWake2D jobs can be run on the batch system from an lxplus machine, allowing multiple computations to run in parallel.
- Extensively used for LHC, HL-LHC and FCC impedance scenarios (~40 jobs for the collimators and ~10 for the different beam screens).
- The batch service has been migrated from LSF (IBM, proprietary) to HTCondor (U. of Wisconsin-Madison, open source).
  - Only 10% of the machines will remain on LSF until the end of 2017.
  - LSF will be shut down in 2018 and the remaining machines will be transferred to HTCondor.

What is different for the user
Changes are mostly transparent for the user's workflow:
- Python functions keep the same arguments.
- Result files are written in the same folders.
- The queue argument used for LSF (1nh, 8nh, 1nd…) is not used by HTCondor.
The lxplusbatch argument selects where the computation runs (a short, hypothetical usage sketch follows after this list):
- lxplusbatch = None: run on the local computer
- lxplusbatch = 'launch': submit the jobs to HTCondor
- lxplusbatch = 'retrieve': retrieve the results
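As a rough illustration of how these three values are typically used from a driver script, here is a minimal sketch in Python. It is not the actual IW2D API: only the lxplusbatch values (None / 'launch' / 'retrieve') come from the slides; the function name, the submit-file name and the commands invoked are illustrative placeholders.

import subprocess

def run_scenario(name, lxplusbatch=None):
    """Hypothetical driver for one impedance scenario."""
    if lxplusbatch is None:
        # Run the computation directly on the local machine (placeholder command).
        subprocess.run(['echo', f'running {name} locally'], check=True)
    elif lxplusbatch == 'launch':
        # Submit the prepared job to HTCondor ('job.sub' is a placeholder submit file).
        subprocess.run(['condor_submit', 'job.sub'], check=True)
    elif lxplusbatch == 'retrieve':
        # Collect the result files written by the finished job (placeholder location).
        print(f'retrieving results for {name} from the usual output folder')

# Typical workflow: launch everything first, rerun the same script later to retrieve:
# run_scenario('LHC_collimator_scan', lxplusbatch='launch')
# run_scenario('LHC_collimator_scan', lxplusbatch='retrieve')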

What is different for the user
- A job is submitted as part of a cluster identified by a unique number (the sketch below shows one way to capture it at submission time).
- There are different ways to monitor the jobs:
  - From the command line: condor_q -nobatch shows all the jobs currently in the queue.
  - From the website https://batch-carbon.cern.ch/grafana/
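To keep track of which cluster number corresponds to which IW2D scenario, the number can be captured from the message printed by condor_submit. A minimal sketch, assuming the usual "N job(s) submitted to cluster M." message and a placeholder submit-file name:

import re
import subprocess

def submit_and_get_cluster(submit_file='job.sub'):
    """Submit a job and return its HTCondor cluster number ('job.sub' is a placeholder)."""
    out = subprocess.run(['condor_submit', submit_file],
                         capture_output=True, text=True, check=True).stdout
    # condor_submit normally ends with e.g. "1 job(s) submitted to cluster 1234567."
    match = re.search(r'submitted to cluster (\d+)', out)
    return int(match.group(1)) if match else None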

Monitoring the jobs
From the command line: condor_q -nobatch
The output shows, for each job: the job cluster, the run time, the executable launched and the job state (R: running, I: idle, H: held).
Use watch condor_q -nobatch to get a live view of the jobs (watch relaunches the command every two seconds).
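When many collimator and beam-screen jobs are in flight, it can be handy to summarise the queue from Python instead of reading the full condor_q listing. A minimal sketch, not part of IW2D, assuming condor_q's -af (autoformat) option and the standard numeric JobStatus codes (1 = idle, 2 = running, 5 = held):

import subprocess
from collections import Counter

# Map the numeric HTCondor JobStatus codes to the letters shown by condor_q -nobatch.
STATUS = {'1': 'Idle (I)', '2': 'Running (R)', '5': 'Held (H)'}

def queue_summary():
    """Count the user's queued jobs per state using condor_q's autoformat output."""
    out = subprocess.run(['condor_q', '-af', 'JobStatus'],
                         capture_output=True, text=True, check=True).stdout
    return dict(Counter(STATUS.get(code, 'Other') for code in out.split()))

if __name__ == '__main__':
    print(queue_summary())   # e.g. {'Running (R)': 12, 'Idle (I)': 38}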

Monitoring the jobs
From the website https://batch-carbon.cern.ch/grafana
Data is refreshed every 5 minutes.

Managing the jobs
condor_rm is used to delete jobs:
- condor_rm <cluster> deletes a specific job.
- condor_rm -all deletes all of the user's jobs.
HTCondor generates for each job (cluster) a log file, an output file and an error file:
- The log file contains the submission time, the execution time and machine, information on the job…
- The output file contains the STDOUT of the executable: for IW2D it contains what is printed on the screen (calculation time).
- The error file contains the errors encountered during execution (wrong input file format…).
These files are stored along with the resulting impedance files (a sketch for scanning them follows below).
No mail is sent to the user when the job finishes, fails or is removed.
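Since no mail is sent, one simple way to spot failed runs is to look for non-empty error files next to the impedance results. A minimal sketch, assuming the HTCondor error files end in .err and live somewhere under the result folder (both are assumptions, not the fixed IW2D layout):

from pathlib import Path

def find_failed_jobs(result_dir='.'):
    """Return the HTCondor error files that are not empty (likely failed jobs)."""
    # '*.err' is an assumed naming convention for the per-job error files.
    return [p for p in Path(result_dir).rglob('*.err') if p.stat().st_size > 0]

if __name__ == '__main__':
    for err_file in find_failed_jobs():
        print(f'Possible failure, check: {err_file}')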

How to get the latest version of IW2D
If git is used to manage the repository (i.e. git clone was used to download it):
- Go to the IW2D repository
- Run git pull
Otherwise, download the archive from https://gitlab.cern.ch/IRIS/IW2D

Current issues
- Errors during job submission might arise:
  ERROR: store_cred failed
  ERROR: failed to read any data from /usr/bin/batch_krb5_credential
  - This seems to be a credential issue.
  - The problem was submitted to IT and is under investigation.
- Job submission is slow: it can take more than 10 minutes to submit 50 jobs.
  - Check that all the jobs were properly submitted, otherwise relaunch the script (see the sketch after this list).
- Update: the problem has been solved by IT. There are no more credential errors and job submission is much faster.
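One way to check that all jobs reached the queue before deciding whether to relaunch the submission script is to compare the number of jobs condor_q reports for the current user against the number that were meant to be submitted. A rough sketch (the expected count is the only input; condor_q's -af option and the owner argument are assumed to be available, and nothing here is part of IW2D):

import getpass
import subprocess

def all_jobs_submitted(expected):
    """Return True if the current user has at least `expected` jobs in the queue."""
    user = getpass.getuser()
    # Print one ClusterId per queued job belonging to this user, then count them.
    out = subprocess.run(['condor_q', user, '-af', 'ClusterId'],
                         capture_output=True, text=True, check=True).stdout
    return len(out.split()) >= expected

# Example: after launching ~50 collimator jobs
# if not all_jobs_submitted(50):
#     print('Some jobs are missing from the queue, relaunch the submission script')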

Conclusions
- HTCondor is now the default batch system at CERN.
- ImpedanceWake2D has been modified to handle HTCondor.
  - The change is mostly transparent for the user workflow: the Python functions work the same, only the commands to monitor and manage the jobs change.
- The IW2D repository on https://gitlab.cern.ch/IRIS/ is up to date.
- The problems encountered during job submission were followed up by IT and have since been solved.
- Remarks/suggestions/bug reports on IW2D are welcome!
- The migration of DELPHI is also finished and will soon be uploaded.

References
- A list of useful commands for HTCondor: http://www.iac.es/sieinvens/siepedia/pmwiki.php?n=HOWTOs.CondorUsefulCommands
- CERN documentation for HTCondor: http://batchdocs.web.cern.ch/batchdocs/index.html
- Quick start guide for HTCondor from U. of Wisconsin-Madison: https://research.cs.wisc.edu/htcondor/manual/quickstart.html