Condor Job Router
Condor Project
Computer Sciences Department
University of Wisconsin-Madison
Dan, Condor Week 2008

The Job Router: A Flexible Job Transformer
› Acts upon jobs in the queue
› Policy controls when to route a job:
   (jobs currently routed to site X) < max
   (idle jobs routed to site X) < max
   (rate of recent failures at site X) < max
› And how to transform it:
   Change attribute values (e.g. Universe)
   Insert new attributes (e.g. GridResource)
   Perform other arbitrary actions in hooks
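The three throttling conditions above combine into one per-site conjunction: a job is routed to site X only while every counter is below its limit. The Python sketch below illustrates that check. The class and field names (`RouteState`, `RoutePolicy`, `max_failure_rate`) are hypothetical and do not reflect the Job Router's internal code. The limits of 200 and 50 mirror the MaxJobs and MaxIdleJobs values in the sample configuration, and the failure-rate limit of 0.1 is an arbitrary illustrative value.

```python
from dataclasses import dataclass


@dataclass
class RouteState:
    """Hypothetical snapshot of one site's counters."""
    routed_jobs: int            # jobs currently routed to the site
    idle_routed_jobs: int       # routed jobs still idle at the site
    recent_failure_rate: float  # recent failure rate at the site (illustrative units)


@dataclass
class RoutePolicy:
    """Hypothetical per-site limits."""
    max_jobs: int
    max_idle_jobs: int
    max_failure_rate: float


def may_route(state: RouteState, policy: RoutePolicy) -> bool:
    """Allow another job to be routed only while every counter is below its limit."""
    return (
        state.routed_jobs < policy.max_jobs
        and state.idle_routed_jobs < policy.max_idle_jobs
        and state.recent_failure_rate < policy.max_failure_rate
    )


policy = RoutePolicy(max_jobs=200, max_idle_jobs=50, max_failure_rate=0.1)
print(may_route(RouteState(120, 10, 0.02), policy))  # True
print(may_route(RouteState(120, 50, 0.02), policy))  # False: idle limit reached
```

Because the limits use strict "<" comparisons, a site that has reached any limit stops receiving new jobs until its counters drop.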
Example: sending excess vanilla jobs to a grid site

Original (vanilla) job:
Universe = "vanilla"
Executable = "sim"
Arguments = "seed=345"
Output = "stdout.345"
Error = "stderr.345"
ShouldTransferFiles = True
WhenToTransferOutput = "ON_EXIT"

Routed (grid) job:
Universe = "grid"
GridType = "gt2"
GridResource = \
   "cmsgrid01.hep.wisc.edu/jobmanager-condor"
Executable = "sim"
Arguments = "seed=345"
Output = "stdout"
Error = "stderr"
ShouldTransferFiles = True
WhenToTransferOutput = "ON_EXIT"

[Figure: original (vanilla) job → JobRouter Routing Table (Site 1 …, Site 2 …) → routed (grid) job; an arrow labeled "final status" links the routed job back to the original]
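The transformation in this example can be sketched by treating each job ClassAd as a Python dictionary. This is a simplified, hypothetical model, and `route_to_grid` is not a Condor API. It copies the original ad, changes Universe, inserts GridType and GridResource, and leaves every other attribute untouched. It does not model the changed Output and Error values shown in the routed ad.

```python
def route_to_grid(job: dict, grid_type: str, grid_resource: str) -> dict:
    """Return a routed copy of a vanilla job ad; the input dict is not modified."""
    routed = dict(job)                      # shallow copy, so the original ad stays as-is
    routed["Universe"] = "grid"             # change an existing attribute value
    routed["GridType"] = grid_type          # insert new attributes
    routed["GridResource"] = grid_resource
    return routed


vanilla_job = {
    "Universe": "vanilla",
    "Executable": "sim",
    "Arguments": "seed=345",
    "ShouldTransferFiles": True,
    "WhenToTransferOutput": "ON_EXIT",
}
grid_job = route_to_grid(vanilla_job, "gt2", "cmsgrid01.hep.wisc.edu/jobmanager-condor")
print(grid_job["Universe"], grid_job["GridResource"])
# grid cmsgrid01.hep.wisc.edu/jobmanager-condor
print(vanilla_job["Universe"])  # vanilla
```

Keeping the original ad intact matches the slide's picture, in which the vanilla job and its routed grid copy are separate jobs.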
Using the Job Router
› The Job Router is a daemon
› It is disabled by default
› To use it:
   Modify the condor_config.local file
   Run condor_reconfig and condor_on
   Submit jobs that want the Job Router
Config File Settings

# Enable the Job Router
DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER

# These settings become the default settings for all routes
JOB_ROUTER_DEFAULTS = \
  [ \
    requirements=target.WantJobRouter is True; \
    MaxIdleJobs = 50; \
    MaxJobs = 200; \
    /* now modify routed job attributes */ \
    delete_WantJobRouter = true; \
    set_x509userproxy = "/home/jfrey/epikh.proxy"; \
  ]

# Now we define each of the routes to send jobs on
JOB_ROUTER_ENTRIES = \
  [ GridResource = "cream cream/services/CREAM2 condor stress3.chtc.wisc.edu"; \
    name = "CHTC"; ]

# How often (in seconds) the job router should check for jobs
JOB_ROUTER_POLLING_PERIOD = 10
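To make the configuration concrete, the following Python sketch models two steps. First, it assembles a route from JOB_ROUTER_DEFAULTS and one JOB_ROUTER_ENTRIES ad. Second, it shows how the `set_` and `delete_` prefixes modify the routed copy of a job.

Several parts of this sketch are assumptions for illustration:

- the precedence rule, in which entry values override defaults;
- the helper names `merge_route` and `apply_route`;
- the rule that a route's GridResource puts the routed job in the grid universe.

The sketch also simplifies ClassAd behavior. Real ClassAd attribute names are case-insensitive and expressions (such as the route's `requirements`) are evaluated. Plain dictionaries capture neither, so the requirements expression is left out.

```python
def merge_route(defaults: dict, entry: dict) -> dict:
    """Combine default route settings with one route entry (assumed: entry wins on conflicts)."""
    route = dict(defaults)
    route.update(entry)
    return route


def apply_route(job: dict, route: dict) -> dict:
    """Build a routed copy of a job ad from the route's set_/delete_ attributes."""
    routed = dict(job)
    for key, value in route.items():
        if key.startswith("delete_") and value is True:
            routed.pop(key[len("delete_"):], None)  # drop the named attribute if present
        elif key.startswith("set_"):
            routed[key[len("set_"):]] = value        # add or overwrite the named attribute
    # Assumption: a route's GridResource sends the routed copy to the grid universe
    if "GridResource" in route:
        routed["Universe"] = "grid"
        routed["GridResource"] = route["GridResource"]
    return routed


defaults = {
    "MaxIdleJobs": 50,
    "MaxJobs": 200,
    "delete_WantJobRouter": True,
    "set_x509userproxy": "/home/jfrey/epikh.proxy",
}
entry = {
    "GridResource": "cream cream/services/CREAM2 condor stress3.chtc.wisc.edu",
    "name": "CHTC",
}
job = {"Universe": "vanilla", "Executable": "job.sh", "WantJobRouter": True}
routed = apply_route(job, merge_route(defaults, entry))
print(sorted(routed))
# ['Executable', 'GridResource', 'Universe', 'x509userproxy']
```

Route-control attributes such as MaxJobs, MaxIdleJobs and name carry no `set_` or `delete_` prefix. The sketch therefore leaves them out of the routed job ad and keeps them only in the merged route.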
Sample Job
› You can use any vanilla job, but here's a simple example:
#!/bin/sh
/bin/date
echo
echo whoami
/usr/bin/whoami
echo
echo hostname
/bin/hostname
sleep 300
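Submit File
› Note the added +WantJobRouter line: it becomes true only after the job has stayed in its current status (e.g. idle) for more than 60 seconds, so the route's "WantJobRouter is True" requirement matches only jobs that have waited over a minute
universe = vanilla
executable = job.sh
output = out.$(cluster).$(process)
error = err.$(cluster).$(process)
log = job.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
+WantJobRouter = time() - EnteredCurrentStatus > 60
queue 50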
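Because the +WantJobRouter expression uses time(), its value changes as the job waits. A queued job therefore becomes routable only once it has been in its current status long enough. The sketch below reproduces that arithmetic in Python with a hypothetical helper, `wants_job_router`. It treats EnteredCurrentStatus as a Unix timestamp in seconds, which is also what ClassAd time() returns. It also accepts an explicit `now` so that results are deterministic.

```python
import time
from typing import Optional


def wants_job_router(entered_current_status: int,
                     now: Optional[int] = None,
                     threshold_seconds: int = 60) -> bool:
    """Mirror `time() - EnteredCurrentStatus > 60` from the submit file."""
    if now is None:
        now = int(time.time())  # current Unix time in seconds, like ClassAd time()
    # Strictly greater than: a job that has waited exactly 60 s is not yet routable
    return now - entered_current_status > threshold_seconds


# A job that entered its current status at t = 1000 s
print(wants_job_router(1000, now=1030))  # False: only 30 s have passed
print(wants_job_router(1000, now=1090))  # True: 90 s have passed
```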