1
Accounting in HTCondor
Greg Thain INFN Workshop 2016
2
HTCondor Architecture
3
Overview of Condor Architecture
[Diagram: a Central Manager holding the Usage History; Schedd A with Greg Job1-3 and Ann Job1-3; Schedd B with Greg Job4-6, Ann Job7-8 and Joe Job1-3; several worker nodes]
4
2 Steps to Scheduling
5
Step 1: Central Manager assigns slots to users
[Diagram: Schedd A, Schedd B, Central Manager with Usage History, worker nodes]
The CM assigns SLOTS to USERS based on historical fair share
Keeps track of usage persistently
Informs schedds of the slots for their users
This is called the negotiation cycle
6
Step 2: Schedds assign slots to jobs
[Diagram: Schedd A, Schedd B, Central Manager with Usage History, worker nodes]
Schedds assign SLOTS to JOBS based on job prio and fit
A schedd will REUSE a slot for more than one job
Many schedds in pools
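Step 2 can be illustrated with a minimal Python sketch. It is not HTCondor's matchmaking code: the `Job` class, the `memory_mb` field used as the "fit" test, and the `run_on_slot` helper are all hypothetical. The only behavior taken from HTCondor is that a job with a higher numeric job priority runs before one with a lower priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    prio: int        # higher number runs first, as with HTCondor job priority
    memory_mb: int   # hypothetical stand-in for "fit"


def run_on_slot(jobs, slot_memory_mb):
    """Return the order in which one claimed slot would run the jobs that fit."""
    fitting = [j for j in jobs if j.memory_mb <= slot_memory_mb]
    # The same slot is reused for each fitting job, best job priority first.
    return [j.name for j in sorted(fitting, key=lambda j: -j.prio)]


jobs = [
    Job("Greg Job1", prio=0, memory_mb=1024),
    Job("Greg Job2", prio=5, memory_mb=1024),
    Job("Greg Job3", prio=0, memory_mb=8192),  # too large for the slot below
]
order = run_on_slot(jobs, slot_memory_mb=2048)
print(order)  # ['Greg Job2', 'Greg Job1']
```

The point of the sketch is that the CM never sees this list: it only handed the slot to Greg, and the schedd decided which of Greg's jobs to run on it.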
7
Consequences
All accounting happens in the CM
Both monitoring and control
Accounting data is persistent
Accounting is rolled-up, not a log
CM knows NOTHING about jobs!
8
What’s a user?
Is Bob in schedd1 the same as Bob in schedd2?
If they have the same UID_DOMAIN, they are.
Prevents cheating by adding schedds
Map files can define the local user name
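A small Python sketch of the identity rule, assuming the accounting name has the form `owner@UID_DOMAIN`. The `accounting_name` helper, the schedd names and the domains are hypothetical.

```python
def accounting_name(owner, uid_domain):
    """Build the name the CM would account against (assumed form owner@UID_DOMAIN)."""
    return f"{owner}@{uid_domain}"


# (owner, schedd, UID_DOMAIN) for three hypothetical submissions
submissions = [
    ("bob", "schedd1", "example.org"),
    ("bob", "schedd2", "example.org"),
    ("bob", "schedd3", "other.example"),
]

# The schedd name is deliberately ignored: it plays no part in identity.
users = {accounting_name(owner, domain) for owner, _schedd, domain in submissions}
print(sorted(users))  # ['bob@example.org', 'bob@other.example']
```

Bob on schedd1 and schedd2 collapses into one accounting entry, so submitting from an extra schedd in the same UID_DOMAIN does not earn a second fair share.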
9
condor_userprio
Tool to view/change user prio
Command usage to view current priorities:
condor_userprio -most
Columns: User Name, Effective Priority, Priority Factor, In Use, Usage (wghted-hrs), Last Usage
[Sample output rows lost in extraction; surviving Last Usage fragments: :46 :05 <now> :09 :37 :25 :42]
10
Metric: Effective Priority
Negotiator computes and stores the user prio
Inversely related to machines allocated (lower number is better priority)
A user with a priority of 10 will be able to claim twice as many machines as a user with a priority of 20
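The "twice as many machines" claim follows if each user's share is proportional to the inverse of their effective priority. The Python sketch below works through that arithmetic. The `fair_shares` helper and the 30-slot pool are hypothetical, and the real negotiator also has to account for what users actually request.

```python
def fair_shares(priorities, total_slots):
    """Split total_slots among users in proportion to 1 / effective priority."""
    weights = {user: 1.0 / prio for user, prio in priorities.items()}
    total_weight = sum(weights.values())
    return {user: total_slots * w / total_weight for user, w in weights.items()}


shares = fair_shares({"greg": 10.0, "ann": 20.0}, total_slots=30)
for user, slots in shares.items():
    print(f"{user}: {slots:.1f} slots")
```

With these inputs Greg (priority 10) is entitled to about 20 slots and Ann (priority 20) to about 10.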
11
Effective User Priority
(Effective) User Priority is determined by multiplying two components:
Real Priority * Priority Factor
12
Real Priority
Based on actual usage, starts at 0.5
Approaches the actual number of machines used over time
Asymptotically grows/shrinks to current usage
Configuration setting: PRIORITY_HALFLIFE
Default is one day (given in seconds)
If PRIORITY_HALFLIFE = +Inf, no history
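The asymptotic behavior can be sketched in Python, assuming the exponential-decay update given in the HTCondor manual: new = beta * old + (1 - beta) * current usage, with beta = 0.5 ** (dt / PRIORITY_HALFLIFE). The function name is hypothetical, and the 0.5 floor is an assumption taken from the starting value above.

```python
def update_real_priority(previous, resources_used, dt_seconds,
                         halflife_seconds=86400):
    """One accounting update, assuming the manual's exponential-decay rule."""
    beta = 0.5 ** (dt_seconds / halflife_seconds)
    updated = beta * previous + (1.0 - beta) * resources_used
    return max(0.5, updated)  # assumed floor: real priority never drops below 0.5


# A new user holds 10 machines steadily; update once per day for three days.
priority = 0.5
trace = []
for _day in range(3):
    priority = update_real_priority(priority, resources_used=10, dt_seconds=86400)
    trace.append(priority)
print(trace)  # [5.25, 7.625, 8.8125]
```

With the default one-day half-life, each day closes half of the remaining gap between the real priority and the current usage of 10 machines.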
13
Priority Factor
Assigned by administrator
Set/viewed with condor_userprio
Persistently stored in CM
Defaults to 1000 (DEFAULT_PRIO_FACTOR)
Allows admins to give prio to sets of users, while still having fair share within a group
“Nice user”s have Prio Factors of 1,000,000
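Putting real priority and priority factor together, here is a minimal Python sketch of how the factor separates users with identical usage. The user names, the real priority of 4.0 and Ann's factor of 500 are hypothetical; the 1000 and 1,000,000 factors are the values quoted on this slide.

```python
DEFAULT_PRIO_FACTOR = 1000.0     # default quoted on the slide
NICE_USER_FACTOR = 1_000_000.0   # "nice user" factor quoted on the slide


def effective_priority(real_priority, priority_factor=DEFAULT_PRIO_FACTOR):
    """Effective priority = real priority * priority factor (lower is better)."""
    return real_priority * priority_factor


# Three users with the same real priority (same recent usage).
users = {
    "greg": effective_priority(4.0),                    # default factor
    "ann": effective_priority(4.0, 500.0),              # hypothetical admin boost
    "joe": effective_priority(4.0, NICE_USER_FACTOR),   # nice user
}
ranking = sorted(users, key=users.get)  # best (lowest) effective priority first
print(ranking)  # ['ann', 'greg', 'joe']
```

Because the factor is a multiplier, two users with the same factor are still ranked against each other purely by usage.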
14
Condor principle #2
15
Condor principle #2
Condor provides access to operational info
You need to make it pretty…
16
condor_userprio -l
For machine-parseable formats
Parse it yourself, and create graphs and reports.
17
condor_userprio -l
Name =
ResourcesUsed = 11
WeightedResourcesUsed = 11.0
LastHeardFrom =
LastUpdate =
Priority =
WeightedAccumulatedUsage =
UpdateSequenceNumber = 0
PriorityFactor =
MyType = "Accounting"
IsAccountingGroup = false
AccumulatedUsage =
BeginUsageTime =
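As one way to "parse it yourself", the Python sketch below turns `Attribute = Value` text into dictionaries. It assumes one attribute per line and a blank line between records. The `parse_long_output` helper and the sample names and values are hypothetical, not captured output.

```python
def parse_long_output(text):
    """Parse 'Attr = Value' lines into a list of dicts, one per record."""
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line is assumed to end the current record.
            if current:
                ads.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip lines that carry no value
        current[key] = value.strip('"')
    if current:
        ads.append(current)
    return ads


# Hypothetical sample in the same shape as the slide.
sample = '''Name = "greg@example.org"
ResourcesUsed = 11
WeightedResourcesUsed = 11.0
IsAccountingGroup = false

Name = "ann@example.org"
ResourcesUsed = 3
'''
ads = parse_long_output(sample)
usage = {ad["Name"]: int(ad["ResourcesUsed"]) for ad in ads}
print(usage)  # {'greg@example.org': 11, 'ann@example.org': 3}
```

From a dictionary like `usage` it is a short step to a graph or a periodic report.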
18
Overview of Condor Architecture
[Diagram: a Central Manager holding the Usage History; Schedd A with Greg Job1-3 and Ann Job1-3; Schedd B with Greg Job4-6, Ann Job7-8 and Joe Job1-3; several worker nodes]
Schedds also keep logs
Rolled, not snapshots
Have per-job data
Gratia uses these
19
condor_history
ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
gthain 10/25 02: :00:51 C 10/25 02:10 /scratch/gthai
gthain 10/25 02: :00:50 C 10/25 02:10 /scratch/gthai
gthain 10/25 02: :00:50 C 10/25 02:10 /scratch/gthai
gthain 10/25 02: :00:49 C 10/25 02:10 /scratch/gthai
[ID values and parts of the time columns lost in extraction]
20
Worker nodes also have history
condor_history -f startd_history
21
Summary
HTCondor uses two steps for scheduling
All accounting is handled in the central manager
Monitoring data is also available in the schedd and startd