The Condor JobRouter
aka “schedd on the side”
Dan, Condor Week 2008

Status
- It’s in the current development series: Condor 7.1.0, Unix (Windows soonish)
- Used heavily by the CMS physics experiment for simulation on the Open Science Grid (millions of jobs routed)

What is “job routing”?

original (vanilla) job:
    Universe = "vanilla"
    Executable = "sim"
    Arguments = "seed=345"
    Output = "stdout.345"
    Error = "stderr.345"
    ShouldTransferFiles = True
    WhenToTransferOutput = "ON_EXIT"

routed (grid) job:
    Universe = "grid"
    GridType = "gt2"
    GridResource = \
        "cmsgrid01.hep.wisc.edu/jobmanager-condor"
    Executable = "sim"
    Arguments = "seed=345"
    Output = "stdout"
    Error = "stderr"
    ShouldTransferFiles = True
    WhenToTransferOutput = "ON_EXIT"

(Diagram: the JobRouter, using its Routing Table (Site 1 … Site 2), turns the original job into the routed job; the final status flows back to the original job.)
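
The transformation on this slide can be sketched in a few lines of Python. This is only an illustration, not the JobRouter's code: job ads are modeled as plain dicts rather than real ClassAds, and the function name `route_job` is made up for the example.

```python
# Illustrative sketch only: job ads are plain dicts, not real ClassAds,
# and route_job is a hypothetical name, not part of Condor.

def route_job(job_ad, grid_resource, grid_type="gt2"):
    """Return a routed (grid) copy of a vanilla job ad; the original is untouched."""
    routed = dict(job_ad)                 # work on a copy of the original ad
    routed["Universe"] = "grid"           # change an existing attribute value
    routed["GridType"] = grid_type        # insert new attributes
    routed["GridResource"] = grid_resource
    routed["Output"] = "stdout"           # the routed copy on the slide uses
    routed["Error"] = "stderr"            # generic output file names
    return routed


original = {
    "Universe": "vanilla",
    "Executable": "sim",
    "Arguments": "seed=345",
    "Output": "stdout.345",
    "Error": "stderr.345",
    "ShouldTransferFiles": True,
    "WhenToTransferOutput": "ON_EXIT",
}

routed = route_job(original, "cmsgrid01.hep.wisc.edu/jobmanager-condor")
```

The original ad is left unchanged so that the final status of the routed copy can later be reported against it.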

Routing is just site-level matchmaking
- With feedback from job queue
  - number of jobs currently routed to site X
  - number of idle jobs routed to site X
  - rate of recent success/failure at site X
- And with power to modify job ad
  - change attribute values (e.g. Universe)
  - insert new attributes (e.g. GridResource)
  - add a “portal” grid proxy if desired
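
To make "matchmaking with feedback" concrete, here is a rough Python sketch. It assumes a simplified policy that the talk does not spell out: a site is skipped once its idle routed jobs reach `MaxIdleJobs` or its recent failure rate exceeds `FailureRateThreshold`. The `stats` layout and the name `pick_site` are hypothetical; the real JobRouter may apply these limits differently (for example, by throttling rather than excluding a site).

```python
# Illustrative sketch only: pick_site and the stats dict layout are
# hypothetical, and the policy is an assumed simplification.

def pick_site(sites, stats):
    """Return the name of the first site that can accept another job, or None."""
    for site in sites:
        # Feedback from the job queue; sites with no history start at zero.
        s = stats.get(site["Name"], {"idle": 0, "failure_rate": 0.0})
        if s["idle"] >= site["MaxIdleJobs"]:
            continue  # enough routed jobs are already waiting at this site
        if s["failure_rate"] > site["FailureRateThreshold"]:
            continue  # the site has been failing too often recently
        return site["Name"]
    return None  # no site qualifies; the job stays unrouted for now


sites = [
    {"Name": "Site 1", "MaxIdleJobs": 10, "FailureRateThreshold": 0.01},
    {"Name": "Site 2", "MaxIdleJobs": 10, "FailureRateThreshold": 0.01},
]
stats = {
    "Site 1": {"idle": 10, "failure_rate": 0.0},  # idle limit reached
    "Site 2": {"idle": 3, "failure_rate": 0.0},
}
chosen = pick_site(sites, stats)
```

Here `chosen` is "Site 2", because Site 1 already has `MaxIdleJobs` idle jobs routed to it.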

Configuring the Routing Table
- JOB_ROUTER_ENTRIES
  - list site ClassAds in configuration file
- JOB_ROUTER_ENTRIES_FILE
  - read site ClassAds periodically from a file
- JOB_ROUTER_ENTRIES_CMD
  - read periodically from a script
  - example: query a collector such as Open Science Grid Resource Selection Service
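
As a rough idea of what a JOB_ROUTER_ENTRIES_CMD script might look like, the Python sketch below prints one site ClassAd per line on standard output. It assumes the script's output is read as a list of site ClassAds in the bracketed syntax used in this talk; the site list, the example.org gatekeeper names, and the helper `format_route` are all made up. A real script would query something like a collector instead of using a hard-coded list.

```python
# Illustrative sketch only: the sites, host names, and format_route helper
# are hypothetical. Values are assumed to contain no embedded double quotes.

def format_route(name, grid_resource, max_idle_jobs):
    """Format one routing table entry as a bracketed ClassAd string."""
    return '[ Name = "%s"; GridResource = "%s"; MaxIdleJobs = %d; ]' % (
        name, grid_resource, max_idle_jobs)


def main():
    # Stand-in for a live query of a resource selection service.
    sites = [
        ("Grid Site 1", "gt2 gatekeeper1.example.org/jobmanager-condor", 10),
        ("Grid Site 2", "gt2 gatekeeper2.example.org/jobmanager-condor", 5),
    ]
    for name, resource, max_idle in sites:
        print(format_route(name, resource, max_idle))


if __name__ == "__main__":
    main()
```

Because the script is re-run periodically, changes in the set of available sites show up in the routing table without editing the configuration file.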

Syntax
- Read the 7.1 manual. It’s in the chapter on Grid Computing.

    [ Name = "Grid Site 1";
      GridResource = "gt2 gatekeeper…";
      MaxIdleJobs = 10;
      FailureRateThreshold = 0.01;
    ]

What Types of Input Jobs?
- Vanilla Universe
- Self Contained (everything needed is in file transfer list)
- High Throughput (many more jobs than CPUs)

What Target Grid Types?
- Globus, Condor-C work well
- others untested, but should be fine

Why only target the grid universe?
- no reason at all
- 7.1.1 now allows any destination universe

Grid Gotchas
- Globus gt2
  - no exit status from job (reported as 0)
  - must explicitly list desired output files

JobRouter vs. Glidein
- Glidein - Condor overlays the grid
  - job never waits in remote queue
  - job runs in its normal universe
  - private networks doable, but add to complexity
  - need something to submit glideins on demand
- JobRouter
  - some jobs wait in remote queue (MaxIdleJobs)
  - job must be compatible with target grid semantics
  - simple to set up, fully automatic to run