Condor Administration in the Open Science Grid
Dan Bradley, University of Wisconsin, Madison
Representing the Condor & DISUN (CMS) Teams
What is Condor to OSG?

Batch system: a flexible, multi-purpose job queue
  Submit jobs to the local cluster
  Submit jobs to the grid (a.k.a. Condor-G)
  Queue for fork jobs (managed-fork)
Grid overlay
  Condor Glidein
Grid-wide Collector/Negotiator
  OSG ReSS service
Example: Submitting Jobs within the UW Campus Grid

Job ClassAd (from condor_submit):
  Requirements = Arch == "INTEL" && Disk >= DiskUsage && Memory*1024 >= ImageSize
  ImageSize = 300000
  DiskUsage = 100000
  OSG_VO = "uscms"

Machine ClassAd (from the startd):
  Requirements = MY.PassedTest =!= False
  Rank = OSG_VO =?= "uscms"
  Arch = "INTEL"
  Disk = 20000000
  Memory = 2000
  PassedTest = True

[Diagram: condor_submit hands the job ClassAd to the schedd (job caretaker); the schedd flocks among the HEP, CS, and GLOW matchmakers; the job ClassAd is matched against the machine ClassAd of a startd (job executor), and the job runs.]
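A minimal submit description that would produce a job ClassAd like the one above might look like the sketch below (the file and executable names are examples; condor_submit fills in ImageSize and DiskUsage automatically, and the +OSG_VO line inserts the custom attribute):

  # sim.submit -- minimal sketch
  universe     = vanilla
  executable   = sim.exe
  log          = sim.log
  output       = sim.out
  error        = sim.err
  requirements = Arch == "INTEL" && Disk >= DiskUsage && Memory*1024 >= ImageSize
  +OSG_VO      = "uscms"
  queue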
Example: Submitting Jobs through OSG to the UW Campus Grid

[Diagram: condor_submit hands the job ClassAd to a local schedd (job caretaker); the condor gridmanager forwards it through the Globus gatekeeper to the site schedd (job caretaker); that schedd flocks among the HEP, CS, and GLOW matchmakers, matches the machine ClassAd of a startd (job executor), and the job runs.]
condor_status gives information about the pool:

Name          OpSys    Arch   State      Activ  LoadAv  Mem  ActvtyTime
perdita.cs.wi LINUX    INTEL  Owner      Idle   0.020   511  0+02:28:42
coral.cs.wisc LINUX    INTEL  Claimed    Busy   0.990   511  0+01:27:21
doc.cs.wisc.e LINUX    INTEL  Unclaimed  Idle   0.260   511  0+00:20:04
dsonokwa.cs.w LINUX    INTEL  Claimed    Busy   0.810   511  0+00:01:45
ferdinand.cs. LINUX    INTEL  Claimed    Suspe  1.130   511  0+00:00:55

To inspect full ClassAds:
% condor_status -long
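condor_status also accepts -constraint and -format options much like condor_q, which is handy for quick pool queries; a sketch (output omitted, attribute names are from the machine ClassAd):

  # list machines that are currently unclaimed and idle
  % condor_status -constraint 'State == "Unclaimed" && Activity == "Idle"'

  # print just the names of machines claimed by a particular user
  % condor_status -constraint 'RemoteUser == "frieda@cs.wisc.edu"' -format "%s\n" Name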
condor_submit & condor_q

% condor_submit sim.submit
Submitting job(s).
1 job(s) submitted to cluster 1.

% condor_q

-- Submitter: perdita.cs.wisc.edu : <128.105.165.34:1027> :
 ID     OWNER   SUBMITTED    RUN_TIME   ST PRI SIZE CMD
 1.0    frieda  6/16 06:52   0+00:00:00 I  0   0.0  sim.exe

1 jobs; 1 idle, 0 running, 0 held
%
View the full ClassAd

% condor_q -long

-- Submitter: perdita.cs.wisc.edu : <128.105.165.34:1027> :
MyType = "Job"
TargetType = "Machine"
ClusterId = 1
QDate = 1150921369
CompletionDate = 0
Owner = "frieda"
RemoteWallClockTime = 0.000000
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
…
View Specific Job Attributes

The unix way:
% condor_q -long | grep UserLog
/home/frieda/a.log
/home/frieda/b.log

The C way:
% condor_q -format " %s\n" UserLog
5.0  /home/frieda/a.log
6.0  /home/frieda/b.log

The XML way (and there's a direct SOAP interface too):
% condor_q -xml
…
Looking at Held Jobs

See why jobs are on hold:
% condor_q -held

-- Submitter: x.cs.wisc.edu : <128.105.121.53:510> : x.cs.wisc.edu
 ID     OWNER    HELD_SINCE  HOLD_REASON
 6.0    frieda   4/20 13:23  Error from starter on vm1@skywalker.cs.wisc

9 jobs; 8 idle, 0 running, 1 held

See full details for a job:
% condor_q -long 6.0
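Once the underlying problem is fixed, held jobs can be released; condor_release takes job IDs or a -constraint expression just like condor_q and condor_rm. A sketch, using the job ID from the example above:

  # release one job
  % condor_release 6.0

  # or release every held job owned by frieda
  % condor_release -constraint 'JobStatus == 5 && Owner == "frieda"'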
The Job's "User Log"

Look in the job log for clues:
% cat b.log
000 (031.000.000) 04/20 14:47:31 Job submitted from host: <128.105.121.53:48740>
...
007 (031.000.000) 04/20 15:02:00 Shadow exception!
    Error from starter on gig06.stat.wisc.edu:
    Failed to open '/scratch.1/frieda/workspace/v67/condor-test/test3/run_0/b.input' as standard input: No such file or directory (errno 2)
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
Using Constraints

% condor_q -constraint 'JobStatus == 5 && HoldReasonCode == 14 && HoldReasonSubCode == 2'

-- Submitter: login03.hep.wisc.edu : <144.92.180.6:32769> : login03.hep.wisc.edu
 ID      OWNER   SUBMITTED    RUN_TIME   ST PRI SIZE CMD
2687.0   frieda  10/19 11:37  0+00:00:02 H  0   9.8  hello_world
2688.0   frieda  10/19 11:39  0+00:00:01 H  0   9.8  hello_world

% sudo condor_rm -constraint 'JobStatus == 5 && HoldReasonCode == 14 && HoldReasonSubCode == 2'
Jobs matching constraint (JobStatus == 5 && HoldReasonCode == 14 && HoldReasonSubCode == 2) have been marked for removal

The queue super user:
QUEUE_SUPER_USERS = root, condor
Job History

% condor_history
 1.0   frieda   4/20 01:08   0+01:15:55  C  4/20 04:19  /home/frieda/a
 2.0   frieda   4/20 01:08   0+00:38:01  C  4/20 04:20  /home/frieda/b

Extract ClassAd details just like condor_q. Example:
% condor_history -long 2.0

Configurable history depth:
MAX_HISTORY_LOG = 20971520  (20 MB)
Resource Allocation
Fair Share

User priority
  Inversely proportional to the user's fair share of the pool
Example: two users, 60 batch slots
  priority 50  - gets 40 slots
  priority 100 - gets 20 slots
  (shares go as 1/priority, so 1/50 : 1/100 = 2 : 1 of the 60 slots)
Fair Share Dynamics

User priority changes over time: it decays toward the number of slots the user is currently using.
Example: a user steadily running 100 jobs has priority 100
  Stops running jobs:
    1 day later:  priority 50
    2 days later: priority 25
Configure the speed of adjustment:
  PRIORITY_HALFLIFE = 86400
Modified Fair Share

User Priority Factor
  Multiplies the "real user priority"; the result is called the "effective user priority"
Example:
  condor_userprio -setfactor cdf 4.0
  condor_userprio -setfactor cms 1.0
  cdf steadily uses 10 slots - effective priority 40
  cms steadily uses 20 slots - effective priority 20
Reporting Condor Pool Usage

% condor_userprio -usage -allusers
Last Priority Update: 7/30 09:59
                                 Accumulated      Usage             Last
User Name                        Usage (hrs)      Start Time        Usage Time
------------------------------   -----------    ----------------  ----------------
…
osg_usatlas1@hep.wisc.edu          599739.09    4/18/2006 14:37   7/30/2007 07:24
jherschleb@lmcg.wisc.edu           799300.91    4/03/2006 12:56   7/30/2007 09:59
szhou@lmcg.wisc.edu               1029384.68    4/03/2006 12:56   7/30/2007 09:59
osg_cmsprod@hep.wisc.edu          2013058.70    4/03/2006 16:54   7/30/2007 09:59
Number of users: 271              8517482.95    4/03/2006 12:56   7/29/2007 10:00

When upgrading Condor, preserve the central manager's AccountantLog.
This happens automatically if you follow the general rule: preserve Condor's LOCAL_DIR.
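To check where those files live before an upgrade, condor_config_val on the central manager will report the relevant paths (a quick sketch; the accountant data is kept under the SPOOL directory, which by default sits under LOCAL_DIR):

  # where is LOCAL_DIR (and therefore spool/log/execute) on this host?
  % condor_config_val LOCAL_DIR
  % condor_config_val SPOOL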
A Flexible Mechanism for Resource Allocation: The ClassAd Way
Machine Rank

A numerical expression: higher rank preempts lower rank.
Machine rank trumps user priority considerations (including PREEMPTION_REQUIREMENTS).

Example: CMS gets 1st priority, CDF gets 2nd, others 3rd
RANK = 2*(Owner == "cms@hep.wisc.edu") + 1*(Owner == "cdf@hep.wisc.edu")
Another Rank Example Rank = (Group =?= "LMCG") * (1000 + RushJob)
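For the rank above to have any effect, jobs must advertise the Group and RushJob attributes it refers to; a user could do that in the submit description, roughly like this (a sketch, and the attribute values are hypothetical):

  # fragment of a submit file -- custom job ClassAd attributes are prefixed with '+'
  +Group   = "LMCG"
  +RushJob = 5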
Note on Scope of Condor Policies

Pool-wide scope (negotiator):
  user priorities, factors, etc.
  preemption policy related to user priority
  steering jobs via negotiator job rank
Execute machine/slot scope (startd):
  machine rank, requirements
  preemption/suspension policy
  customized machine ClassAd values
Submit machine scope:
  automatic additions to job rank, requirements, and insertion of arbitrary ClassAd attributes
Personal scope:
  environmental configuration: _CONDOR_<config val>=value
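The personal scope works because any configuration variable can be overridden from the environment by prefixing its name with _CONDOR_; for example (a sketch, the host name is made up):

  # point the command-line tools at a different central manager for one command
  % _CONDOR_CONDOR_HOST=cm.example.edu condor_status

  # or turn up tool debugging for a single invocation
  % _CONDOR_TOOL_DEBUG=D_FULLDEBUG condor_q -debug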
Preemption Policy

Should Condor jobs yield to non-Condor activity on the machine?
Should some types of jobs never be interrupted?  After 4 days?
Should some jobs immediately preempt others?  After 30 minutes?
Is suspension more desirable than killing?
Can the need for preemption be decreased by steering jobs towards the right machines?
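As a concrete illustration of the first question, here is a minimal sketch of a startd policy that yields to non-Condor activity, in the spirit of the default desktop policy shipped with Condor; the thresholds are assumptions for illustration, not recommendations:

  # start jobs only when the keyboard has been idle and the machine is not otherwise busy
  START        = KeyboardIdle > 15 * 60 && (LoadAvg - CondorLoadAvg) <= 0.3
  # suspend rather than kill when the owner comes back
  WANT_SUSPEND = True
  SUSPEND      = KeyboardIdle < 60
  CONTINUE     = KeyboardIdle > 5 * 60
  # give up (vacate) if the job stays suspended too long
  PREEMPT      = Activity == "Suspended" && (CurrentTime - EnteredCurrentActivity) > 10 * 60
  KILL         = False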
Example Preemption Policy

When a claim is preempted, do not allow killing of jobs younger than 4 days old:
MaxJobRetirementTime = 3600 * 24 * 4

Applies to all forms of preemption: user priority, machine rank, machine activity.
Another Preemption Policy

The expression can refer to attributes of the batch slot and the job, so it can be highly customized:
MaxJobRetirementTime = 3600 * 24 * 4 * (OSG_VO =?= "uscms")
Preemption Policy Pitfall

If you disable all forms of preemption, you probably want to limit the lifespan of claims:
PREEMPTION_REQUIREMENTS = False
PREEMPT = False
RANK = 0
CLAIM_WORKLIFE = 3600

Otherwise, resources will not be reallocated until a user's jobs dry up.
Resource Allocation: Group Accounting
Fair Sharing Between Groups

Useful when:
  multiple user ids belong to same group
  group's share of pool is not tied to specific machines

# Example group settings
GROUP_NAMES = group_cms, group_cdf
GROUP_QUOTA_group_cms = 200
GROUP_QUOTA_group_cdf = 100
GROUP_AUTOREGROUP = True
GROUP_PRIO_FACTOR_group_cms = 10
GROUP_PRIO_FACTOR_group_cdf = 10
DEFAULT_PRIO = 100

http://hepuser.ucsd.edu/twiki/bin/view/UCSDTier2/UCSDCondorGroupConfig
Group Sharing (cont'd)

There are different modes of group accounting:
  users within a group are limited by the same shared quota
  users in a group may also share the same priority/usage history

The job advertises its own group identity:
+AccountingGroup = "group_cms.cmsprod"
+AccountingGroup = "group_cms"

For OSG jobs, that means modifying the Globus jobmanager for Condor:
$(OSG_LOCATION)/globus/lib/perl/Globus/GRAM/JobManager/condor.pm
More Condor Jobmanager Hacks
condor.pm hacks

Add an attribute to all jobs, specifying the VO affiliation of the user.
Use Condor's file staging to reduce reliance on NFS (UCSD's NFSLite).
Add a wrapper script to the job.
Clean up orphaned jobs.

http://www.hep.wisc.edu/~dan/condor_jobmanager_hacks/
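As one illustration of the wrapper-script idea, the jobmanager can substitute a small shell wrapper as the executable and pass the real program as its arguments; a minimal sketch (the paths and environment values are assumptions for illustration):

  #!/bin/sh
  # hypothetical job wrapper inserted by condor.pm
  # set up site-specific environment, then hand off to the user's real executable
  export OSG_APP=${OSG_APP:-/osg/app}          # assumed site-specific path
  cd "${_CONDOR_SCRATCH_DIR:-$PWD}"            # run in the per-job scratch dir if available
  exec "$@"                                    # run the original executable and arguments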
Before hacking…

Keep in mind the submit-side configuration options that already exist:
  APPEND_REQUIREMENTS
  APPEND_RANK
  SUBMIT_EXPRS

If you use managed-fork, you probably do not want to append the same requirements to "vanilla" and "local" universe jobs. Use APPEND_REQ_VANILLA.
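For example, a CE could use these knobs in the schedd's configuration to stamp every submitted job, roughly like this (a sketch; the attribute names and values are made up for illustration, not OSG-standard):

  # append a clause only to vanilla-universe (grid user) jobs
  APPEND_REQ_VANILLA = (TARGET.HasBigScratch =?= True)

  # insert an extra attribute into every job ClassAd at submit time
  SUBMIT_EXPRS = SubmitSite
  SubmitSite = "UW-HEP"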
Example: dedicated batch slot

I want software installation jobs from the CMS VO to run immediately on specific, dedicated machines.

On the execute node:
  IsCMSSoftSlot = True
  STARTD_ATTRS = IsCMSSoftSlot
  START = Owner == "osg_cmssoft"

On the submit node (i.e. the CE):
  APPEND_REQ_VANILLA = MY.Owner != "osg_cmssoft" || MY.Owner == "osg_cmssoft" && TARGET.IsCMSSoftSlot =?= True
Alternative to dedicated slot

Rather than preempting other jobs, suspend them while the high-priority job runs.
More complex, but doable.
Details in the Condor manual: Job Suspension
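The building blocks are the startd's suspension expressions; a highly simplified sketch of the shape of such a policy (the cross-slot attribute name here is a stand-in, and the manual's Job Suspension recipe is the authoritative, more involved version):

  # sketch of the idea only -- see the manual's Job Suspension section for the real recipe
  WANT_SUSPEND = True
  # suspend the low-priority slot whenever the high-priority slot is claimed
  # (Slot2_State stands in for the cross-slot attribute; the exact name depends on version/config)
  SUSPEND  = (Slot2_State =?= "Claimed")
  CONTINUE = (Slot2_State =!= "Claimed")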
My Favorite Condor Feature

Condor is not perfect. But we try to be directly engaged with users and move it in a helpful direction.
Need scalability tuning advice for Condor 6.8? http://www.cs.wisc.edu/CondorWeek2007/large_condor_pools.html