Published by James Hubbard. Modified over 9 years ago.
1
Condor Administrator's How-to
Dan Bradley
University of Wisconsin-Madison
Condor and DISUN Teams
dan@hep.wisc.edu
http://www.cs.wisc.edu/condor
2
www.cs.wisc.edu/condor Dan, Condor Week 2008
Where to Find the Online How-to Collection
1. Go to http://www.cs.wisc.edu/condor/
2. Click on "Condor Admin How-to Recipes".
Currently, that takes you here: http://nmi.cs.wisc.edu/node/1465
3
Brief Overview of Selected Bits
4
Question
› How does Condor decide which job gets to run on an execute machine?
5
The Life of a Condor Job
[Diagram: condor_submit places the job in the schedd (job queue); the schedd sends the job ClassAd and the startd (job executor) sends its machine ClassAd to the central manager (collector + negotiator); the schedd can also flock to central managers 2 and 3 (collector + negotiator); the matched job runs on the startd.]
6
First Stop: Authorization
› The user must be authorized to submit to the schedd:
    ALLOW_WRITE = allow1, allow2, …
    DENY_WRITE = deny1, deny2, …
    Entries have the form user@uid_domain/network.
› By default, all authenticated users within the trusted network may submit jobs:
    ALLOW_WRITE = */network
    HOSTALLOW_WRITE = network (old style)
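To make the allow/deny semantics concrete, here is a minimal Python model of checking an identity of the form user@uid_domain/host against ALLOW_WRITE and DENY_WRITE lists. It assumes shell-style * wildcards and that a DENY match overrides any ALLOW match. Real Condor also accepts IP addresses and subnets, which this sketch does not model, and the host names and the "baduser" account are made up.

```python
from fnmatch import fnmatch


def is_authorized(identity: str, allow: list, deny: list) -> bool:
    """Simplified model of ALLOW_WRITE/DENY_WRITE checking.

    identity has the form "user@uid_domain/host"; patterns may use '*'.
    Assumption: any DENY match wins, otherwise an ALLOW match is required.
    """
    if any(fnmatch(identity, pattern) for pattern in deny):
        return False
    return any(fnmatch(identity, pattern) for pattern in allow)


# Hypothetical policy: anyone on *.hep.wisc.edu hosts, except one user.
ALLOW_WRITE = ["*/*.hep.wisc.edu"]
DENY_WRITE = ["baduser@hep.wisc.edu/*"]
```

With these lists, dan@hep.wisc.edu/node1.hep.wisc.edu is accepted, the same host is refused for baduser, and dan@hep.wisc.edu/laptop.example.org is refused because it matches no ALLOW entry.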
7
Next Stop: The Job Queue
› MAX_JOBS_RUNNING = 200 (limit on the number of jobs the schedd runs at once)
› Job priority (an integer) orders a user's own jobs: higher priority runs sooner.
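As a rough illustration of how job priority orders one user's idle jobs, the Python sketch below sorts by priority (highest first) and caps the number started at the free room under MAX_JOBS_RUNNING. Two simplifications are assumptions, not Condor's exact logic: ties are broken by cluster and proc number (submission order), and every job is treated as though it already had a matching machine.

```python
from dataclasses import dataclass

MAX_JOBS_RUNNING = 200  # schedd-wide cap from the slide


@dataclass
class Job:
    cluster: int
    proc: int
    priority: int = 0  # job priority: higher runs sooner


def next_jobs_to_start(idle_jobs, running_count, max_jobs_running=MAX_JOBS_RUNNING):
    """Return the idle jobs that would be started next, best first."""
    free = max(0, max_jobs_running - running_count)
    # Assumption: equal priorities fall back to submission order (cluster, proc).
    ordered = sorted(idle_jobs, key=lambda j: (-j.priority, j.cluster, j.proc))
    return ordered[:free]
```

With 198 jobs already running only two more fit, so next_jobs_to_start([Job(10, 0), Job(10, 1, priority=5), Job(11, 0, priority=5)], 198) picks the two priority-5 jobs, 10.1 before 11.0.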
8
Authorization of the Schedd to Join Pool
› ALLOW_ADVERTISE_SCHEDD / DENY_ADVERTISE_SCHEDD
    Default: ALLOW/DENY_DAEMON, which in turn defaults to ALLOW/DENY_WRITE
› COLLECTOR_REQUIREMENTS
    Default: true
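The chain of defaults on this slide (ALLOW_ADVERTISE_SCHEDD falls back to ALLOW_DAEMON, which falls back to ALLOW_WRITE) can be pictured as a simple lookup. This Python sketch only models that fallback order; the FALLBACKS table and the example values are illustrative, not Condor's internal data structures.

```python
# Fallback order taken from the slide; the DENY_ knobs follow the same chain.
FALLBACKS = {
    "ALLOW_ADVERTISE_SCHEDD": "ALLOW_DAEMON",
    "ALLOW_DAEMON": "ALLOW_WRITE",
    "DENY_ADVERTISE_SCHEDD": "DENY_DAEMON",
    "DENY_DAEMON": "DENY_WRITE",
}


def effective_setting(config: dict, name: str):
    """Return the value that applies to `name`, following the default chain."""
    while name not in config:
        if name not in FALLBACKS:
            return None  # nothing in the chain is set
        name = FALLBACKS[name]
    return config[name]
```

If only ALLOW_WRITE is set, it governs schedd advertisements; setting ALLOW_DAEMON overrides it for daemons without touching user submission rights.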
9
Next Stop: Negotiator Fair Share
› User priority is inversely proportional to fair share (a lower priority number means a larger share).
› Example: two users, 60 batch slots
    priority 50 gets 40 slots
    priority 100 gets 20 slots
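The slide's arithmetic can be checked with a few lines of Python: each user's share is proportional to 1/priority, so priorities 50 and 100 split 60 slots 2:1. This is an idealized steady-state calculation, not the negotiator's actual matchmaking loop, and the user names are placeholders.

```python
def fair_share(priorities: dict, slots: int) -> dict:
    """Split `slots` in inverse proportion to each user's priority value."""
    weights = {user: 1.0 / prio for user, prio in priorities.items()}
    total = sum(weights.values())
    return {user: slots * w / total for user, w in weights.items()}


shares = fair_share({"user_a": 50, "user_b": 100}, 60)
# About 40 slots for user_a and 20 for user_b (floating point, so compare approximately).
```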
10
Fair Share Dynamics
› User priority changes over time; it tends toward the number of slots the user has in use.
› Example: a user steadily running 100 jobs has priority 100.
    After the user stops running jobs:
    1 day later: priority 50
    2 days later: priority 25
› Configure the speed of adjustment (in seconds):
    PRIORITY_HALFLIFE = 86400
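One way to model this is exponential smoothing with a half-life: over an interval dt the priority moves from its old value toward current usage with weight beta = 0.5 ** (dt / PRIORITY_HALFLIFE) on the old value. The Python below is a sketch under that assumption (it ignores any other adjustments Condor may apply) and reproduces the 100 → 50 → 25 example.

```python
PRIORITY_HALFLIFE = 86400  # seconds, as on the slide


def update_priority(priority: float, slots_in_use: float, elapsed: float,
                    halflife: float = PRIORITY_HALFLIFE) -> float:
    """Decay `priority` toward current usage over `elapsed` seconds."""
    beta = 0.5 ** (elapsed / halflife)
    return beta * priority + (1.0 - beta) * slots_in_use


one_day = update_priority(100, 0, 86400)       # 50.0
two_days = update_priority(100, 0, 2 * 86400)  # 25.0
steady = update_priority(100, 100, 3600)       # stays at 100 (up to rounding)
```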
11
Modified Fair Share
› The user priority factor multiplies the "real user priority"; the result is called the "effective user priority".
› Example:
    condor_userprio -setfactor atlas@hep.wisc.edu 4.0
    condor_userprio -setfactor cms@hep.wisc.edu 1.0
    atlas steadily uses 10 slots: effective priority 40
    cms steadily uses 20 slots: effective priority 20
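A sketch of the same calculation in Python, assuming (as the slide does) that each user's real priority has settled at the number of slots it steadily uses. Since a lower effective priority value is better, cms (20) is favored over atlas (40) even though it uses twice as many slots.

```python
def effective_priority(real_priority: float, priority_factor: float) -> float:
    """Effective user priority = real user priority * priority factor."""
    return real_priority * priority_factor


# (steady slots in use, factor set with condor_userprio -setfactor)
users = {
    "atlas@hep.wisc.edu": (10, 4.0),
    "cms@hep.wisc.edu": (20, 1.0),
}
eup = {name: effective_priority(slots, factor) for name, (slots, factor) in users.items()}
```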
12
Reporting Condor Pool Usage
% condor_userprio -usage -allusers
Last Priority Update: 7/30 09:59
                                   Accumulated  Usage             Last
User Name                          Usage (hrs)  Start Time        Usage Time
------------------------------     -----------  ----------------  ----------------
…
osg_usatlas1@hep.wisc.edu            599739.09  4/18/2006 14:37   7/30/2007 07:24
jherschleb@lmcg.wisc.edu             799300.91  4/03/2006 12:56   7/30/2007 09:59
szhou@lmcg.wisc.edu                 1029384.68  4/03/2006 12:56   7/30/2007 09:59
osg_cmsprod@hep.wisc.edu            2013058.70  4/03/2006 16:54   7/30/2007 09:59
------------------------------     -----------  ----------------  ----------------
Number of users: 271                8517482.95  4/03/2006 12:56   7/29/2007 10:00
› When upgrading Condor, preserve the central manager's AccountantLog.
    This happens automatically if you follow the general rule: preserve Condor's LOCAL_DIR.
13
Matchmaking
› Job requirements and machine requirements must both be met.
› Machine requirements are configured via the START expression:
    START = Owner == "appinstaller"
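The two-sided rule can be sketched in Python by modelling each ClassAd as a dict and each expression as a function of (MY, TARGET). This is only an analogy: real ClassAd expressions are evaluated by Condor with three-valued logic that the sketch leaves out, and the job requirement used here (OpSys == "LINUX") is a made-up example.

```python
from typing import Callable, Dict

Ad = Dict[str, object]
Expr = Callable[[Ad, Ad], bool]  # evaluated as (MY, TARGET)


def is_match(job: Ad, job_requirements: Expr, machine: Ad, machine_start: Expr) -> bool:
    """A match needs the job to accept the machine AND the machine to accept the job."""
    return job_requirements(job, machine) and machine_start(machine, job)


# Machine side, from the slide: START = Owner == "appinstaller"
start: Expr = lambda my, target: target.get("Owner") == "appinstaller"
# Job side (hypothetical): Requirements = OpSys == "LINUX"
requirements: Expr = lambda my, target: target.get("OpSys") == "LINUX"

machine = {"OpSys": "LINUX"}
```

An appinstaller job matches this machine; a job owned by anyone else is refused by START even though its own requirements are satisfied.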
14
Adding to Job Requirements
› Set in the submit machine's configuration; condor_submit appends it to each job's requirements:
    APPEND_REQUIREMENTS = MY.Owner != "appinstaller" || TARGET.IsAppInstallerMachine =?= True
15
Adding Attribute to Machine ClassAd
    IsAppInstallerMachine = True
    STARTD_ATTRS = $(STARTD_ATTRS) IsAppInstallerMachine
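On machines that never advertise IsAppInstallerMachine the attribute is UNDEFINED, and the meta-operator =?= still yields a plain False rather than UNDEFINED, so the appended requirement always evaluates to a definite True or False. The Python sketch below models UNDEFINED as None; it is a simplified stand-in for ClassAd evaluation that covers only this one expression.

```python
UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value


def meta_equal(a, b) -> bool:
    """Model of =?= : always True or False; UNDEFINED only matches UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is UNDEFINED and b is UNDEFINED
    return a == b


def appended_requirement(job: dict, machine: dict) -> bool:
    """MY.Owner != "appinstaller" || TARGET.IsAppInstallerMachine =?= True"""
    return (job.get("Owner") != "appinstaller"
            or meta_equal(machine.get("IsAppInstallerMachine"), True))
```

Ordinary users' jobs are unaffected, while appinstaller jobs only match machines that publish IsAppInstallerMachine = True through STARTD_ATTRS.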
16
Choosing Between Matching Machines (each later item breaks ties left by the earlier ones)
1. NEGOTIATOR_PRE_JOB_RANK
2. job rank expression
3. NEGOTIATOR_POST_JOB_RANK
4. PREEMPTION_RANK
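The ordering amounts to a lexicographic comparison. A Python sketch, assuming the four values have already been evaluated for each candidate machine (the slot names and numbers are invented, and the negotiator's real selection has more detail than this):

```python
def best_machine(candidates: list) -> dict:
    """Pick the candidate with the highest (pre, job, post, preemption) rank tuple."""
    return max(
        candidates,
        key=lambda m: (m["pre_job_rank"], m["job_rank"],
                       m["post_job_rank"], m["preemption_rank"]),
    )


candidates = [
    {"name": "slot1@a", "pre_job_rank": 1, "job_rank": 90, "post_job_rank": 0, "preemption_rank": 0},
    {"name": "slot1@b", "pre_job_rank": 2, "job_rank": 10, "post_job_rank": 0, "preemption_rank": 0},
    {"name": "slot1@c", "pre_job_rank": 2, "job_rank": 50, "post_job_rank": 0, "preemption_rank": 0},
]
```

slot1@c wins: slot1@a has the best job rank but loses on NEGOTIATOR_PRE_JOB_RANK, which is compared first.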
17
Example
    NEGOTIATOR_PRE_JOB_RANK = (IsDesktop =!= True && isUndefined(RemoteOwner)) + isUndefined(RemoteOwner)
› Most desirable to least:
    2: unclaimed and not a desktop
    1: unclaimed desktop
    0: claimed
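Evaluating the example expression for each case, with the ClassAd operators translated by hand into Python: isUndefined(RemoteOwner) becomes "RemoteOwner is absent from the ad", and IsDesktop =!= True becomes "IsDesktop is not exactly True". This is a sketch, not Condor's evaluator.

```python
def pre_job_rank(machine: dict) -> int:
    unclaimed = "RemoteOwner" not in machine            # isUndefined(RemoteOwner)
    not_desktop = machine.get("IsDesktop") is not True  # IsDesktop =!= True
    return int(not_desktop and unclaimed) + int(unclaimed)


print(pre_job_rank({}))                                   # 2: unclaimed, not a desktop
print(pre_job_rank({"IsDesktop": True}))                  # 1: unclaimed desktop
print(pre_job_rank({"RemoteOwner": "dan@hep.wisc.edu"}))  # 0: claimed
```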
18
Authorizing Schedd to Claim Startd
› ALLOW/DENY_WRITE
› It is the schedd that is authorized by the startd, not the user.
19
Preemption
20
Machine Rank
› Numerical expression: a higher number preempts a lower number.
    User priority is secondary to rank, because a higher-rank job preempts the existing claim to the machine.
› Example: CMS gets 1st priority, CDF 2nd, others 3rd:
    RANK = 2*(User == "cms@hep.wisc.edu") + 1*(User == "cdf@hep.wisc.edu")
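A Python sketch of the rank comparison, assuming preemption happens only when the candidate's rank is strictly higher than that of the running job (so jobs of equal rank do not displace each other through RANK). User priority plays no part in this check, which is the point of the slide.

```python
def machine_rank(job: dict) -> int:
    """RANK = 2*(User == "cms@hep.wisc.edu") + 1*(User == "cdf@hep.wisc.edu")"""
    user = job.get("User")
    return 2 * (user == "cms@hep.wisc.edu") + 1 * (user == "cdf@hep.wisc.edu")


def rank_preempts(running_job: dict, candidate_job: dict) -> bool:
    """Assumption: a strictly higher machine rank lets the candidate preempt."""
    return machine_rank(candidate_job) > machine_rank(running_job)
```

A CMS job preempts a running CDF job, and a CDF job preempts anyone else's, but not the other way around.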
21
Another Rank Example
    Rank = (Group =?= "LMCG") * (1000 + RushJob)
22
Note on Scope of Condor Policies
› Pool-wide scope (example: negotiator)
    user priorities, factors, etc.
    preemption policy related to user priority
    steering jobs via negotiator job rank
› Execute machine/slot scope (startd)
    machine rank, requirements
    preemption/suspension policy
    customized machine ClassAd values
› Submit machine scope
    queue policy, automatic additions to job requirements, and insertion of arbitrary ClassAd attributes into jobs
› Personal scope
    environmental configurations: _CONDOR_<name>=value
23
Preemption Policy
› Should Condor jobs yield to non-Condor activity on the machine?
› Should some types of jobs never be interrupted? Or not until after 4 days?
› Should some jobs immediately preempt others? Or only after 30 minutes?
› Is suspension more desirable than killing?
› Can the need for preemption be decreased by steering jobs toward the right machines?
24
Example Preemption Policy
› When a claim is preempted, do not allow killing of jobs younger than 4 days:
    MaxJobRetirementTime = 3600 * 24 * 4
› Applies to all forms of preemption: user priority, machine rank, machine activity, graceful shutdown.
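A worked version of the arithmetic: 3600 * 24 * 4 = 345600 seconds. The Python sketch below computes how much longer a preempted job would be allowed to run, assuming retirement time is measured against the job's run time in its current slot; it models only the timing, not what Condor does when the time runs out.

```python
MAX_JOB_RETIREMENT_TIME = 3600 * 24 * 4  # 345600 seconds = 4 days


def retirement_left(job_runtime: int, max_retirement: int = MAX_JOB_RETIREMENT_TIME) -> int:
    """Seconds a preempted job may keep running before it can be killed."""
    return max(0, max_retirement - job_runtime)


print(retirement_left(86400))      # 259200: a 1-day-old job gets 3 more days
print(retirement_left(5 * 86400))  # 0: older than 4 days, may be killed now
```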
25
Another Preemption Policy
› The expression can refer to attributes of the batch slot and the job, so it can be highly customized:
    MaxJobRetirementTime = 3600 * 24 * 4 * (OSG_VO =?= "uscms")
› Here only uscms jobs get the 4-day retirement time; for all other jobs the expression evaluates to 0.
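Because (OSG_VO =?= "uscms") evaluates to True or False, never UNDEFINED, the whole expression is either 4 days or 0. A Python sketch of that per-job value, treating OSG_VO as a job attribute and using a dict as a stand-in for the job ClassAd (the "cdf" value is only an example):

```python
def max_job_retirement_time(job: dict) -> int:
    # 3600 * 24 * 4 * (OSG_VO =?= "uscms"): the comparison is False, not UNDEFINED,
    # when OSG_VO is missing, so non-uscms jobs get 0 seconds of retirement.
    return 3600 * 24 * 4 * int(job.get("OSG_VO") == "uscms")
```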
26
More Preemption Controls
› PREEMPTION_REQUIREMENTS controls user-priority-based preemption at the level of the negotiator.
› PREEMPT/SUSPEND control preemption by machine activity (e.g. keyboard or CPU activity).
› RANK allows preemption by more desirable jobs.
27
Preemption Policy Pitfall
› If you disable all forms of preemption, you probably want to limit the lifespan of claims:
    PREEMPTION_REQUIREMENTS = False
    PREEMPT = False
    RANK = 0
    CLAIM_WORKLIFE = 3600
› Otherwise, resources will not be reallocated until a user runs out of matching jobs.
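To see why CLAIM_WORKLIFE matters here, consider a claim that keeps running one user's jobs back to back. The Python sketch below counts how many jobs such a claim accepts; it assumes a new job starts on the claim only while the claim's age is below the worklife, and the job lengths are invented.

```python
from typing import Optional


def jobs_run_on_claim(job_lengths: list, worklife: Optional[int]) -> int:
    """Count jobs started on one claim before it stops accepting new work."""
    age = 0
    started = 0
    for length in job_lengths:
        if worklife is not None and age >= worklife:
            break  # claim retires; the slot goes back to the negotiator
        started += 1
        age += length
    return started


queue = [1200] * 10  # ten 20-minute jobs from the same user
```

Without a worklife the claim serves all ten jobs; with CLAIM_WORKLIFE = 3600 it stops after three, giving other users a chance at the slot.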
28
What Happens to Preempted Jobs?
› They go back to idle in the job queue, with NumJobStarts >= 1.
› Job policy: periodic_hold, periodic_remove
› Admin policy: SYSTEM_PERIODIC_HOLD, SYSTEM_PERIODIC_REMOVE
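As an illustration of an admin policy, an expression along the lines of SYSTEM_PERIODIC_HOLD = JobStatus == 1 && NumJobStarts > 10 would hold idle jobs that keep getting restarted. The threshold of 10 is hypothetical; the Python below only mimics how such an expression would evaluate against job attributes (JobStatus 1 means Idle).

```python
IDLE = 1         # JobStatus value for an idle job
MAX_STARTS = 10  # hypothetical threshold chosen for this example


def system_periodic_hold(job: dict) -> bool:
    """Mimics: JobStatus == 1 && NumJobStarts > 10"""
    return job.get("JobStatus") == IDLE and job.get("NumJobStarts", 0) > MAX_STARTS
```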
29
Back to the Negotiator: Group Accounting
30
Fair Sharing Between Groups
› Useful when:
    multiple user ids belong to the same group
    the group's share of the pool is not tied to specific machines
    # Example group settings
    GROUP_NAMES = group_physics, group_chemistry
    GROUP_QUOTA_group_physics = 200
    GROUP_QUOTA_group_chemistry = 100
    GROUP_AUTOREGROUP = True
    GROUP_PRIO_FACTOR_group_physics = 10
    GROUP_PRIO_FACTOR_group_chemistry = 10
    DEFAULT_PRIO_FACTOR = 100
31
Setting Group Identity
› The job advertises its own group identity:
    +AccountingGroup = "group_physics.dan"
    (group name: group_physics; group user: dan)
› Anyone can declare any identity.
› This is not the Unix/Windows identity the job runs as; it is used solely for accounting and prioritization.
32
Monitoring Usage
% condor_userprio -usage -allusers
Last Priority Update: 7/30 09:59
                                   Accumulated  Usage             Last
User Name                          Usage (hrs)  Start Time        Usage Time
------------------------------     -----------  ----------------  ----------------
…
group_physics.atlas@hep.wisc.edu     599739.09  4/18/2006 14:37   7/30/2007 07:24
group_physics.cms@hep.wisc.edu       799300.91  4/03/2006 12:56   7/30/2007 09:59
group_chemistry.han@che.wisc.edu    1029384.68  4/03/2006 12:56   7/30/2007 09:59
group_chemistry.ben@che.wisc.edu    2013058.70  4/03/2006 16:54   7/30/2007 09:59
------------------------------     -----------  ----------------  ----------------
Number of users: 271                8517482.95  4/03/2006 12:56   7/29/2007 10:00
% condor_userprio -all -allusers
33
How Do Groups Compete?
› The group using the smallest fraction of its quota gets top priority in matchmaking.
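A sketch of that ordering in Python, using the quotas from the group settings example (200 for group_physics, 100 for group_chemistry) and invented usage numbers. It captures only the "smallest fraction of quota first" rule, not the rest of group negotiation.

```python
GROUP_QUOTA = {"group_physics": 200, "group_chemistry": 100}


def group_negotiation_order(usage: dict, quota: dict = GROUP_QUOTA) -> list:
    """Groups using the smallest fraction of their quota are served first."""
    return sorted(quota, key=lambda group: usage.get(group, 0) / quota[group])


order = group_negotiation_order({"group_physics": 150, "group_chemistry": 50})
# group_chemistry (50/100 = 0.5) comes before group_physics (150/200 = 0.75)
```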
34
How Do Users Within a Group Compete?
› Each group user has its own user priority.
› Fair share between group members is determined by the usual user priority mechanism.
35
May a Group Exceed Its Quota?
› Yes, but only if GROUP_AUTOREGROUP = True or, if that is undefined, GROUP_AUTOREGROUP_<groupname> = True.
36
When Exceeding Quota, How Do Users Compete?
› All non-group users, plus group users trying to exceed their quota, compete for the remaining machines.
› The user priority of the group user (e.g. "group_physics.dan") is used to determine fair share.
    A default priority factor can be set for all members of a group:
    GROUP_PRIO_FACTOR_<groupname> = 10
37
The End of the Story
38
The Life of a Condor Job
[Diagram repeated from slide 5.]
39
Extending the Reach
› FLOCK_TO = <list of central managers>
    requires bi-directional connectivity
    on Linux, GCB can be used to connect private networks
› Grid universe: Globus, Condor-C
› condor_glidein
› JobRouter
40
Trivia
› What's the difference?
    IsHighPrioUser = Owner == "dan"
    1. RANK = IsHighPrioUser
    2. RANK = $(IsHighPrioUser)
› Case 1 needs: STARTD_ATTRS = IsHighPrioUser
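The difference is when the name is resolved. $(IsHighPrioUser) is a configuration macro, replaced by its text when the configuration is read, so RANK becomes Owner == "dan". A bare IsHighPrioUser is a ClassAd attribute reference resolved during matchmaking, so the startd must publish it in its machine ClassAd. A small Python model of the macro step (simplified; it assumes an undefined macro expands to an empty string):

```python
import re


def expand_macros(value: str, config: dict) -> str:
    """Replace each $(NAME) with the raw text configured for NAME."""
    # Assumption: an undefined macro expands to the empty string.
    return re.sub(r"\$\((\w+)\)", lambda m: config.get(m.group(1), ""), value)


config = {"IsHighPrioUser": 'Owner == "dan"'}

print(expand_macros("$(IsHighPrioUser)", config))  # Owner == "dan"
print(expand_macros("IsHighPrioUser", config))     # IsHighPrioUser (left for ClassAd lookup)
```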