Matchmaker Policies: Users and Groups HTCondor Week, Madison 2016 Zach Miller Jaime Frey Center for High Throughput Computing Department of Computer Sciences University of Wisconsin-Madison

› So you have some resources… … how does HTCondor decide which job to run? › The admin needs to define a policy that controls the relative priorities › What defines a “good” or “fair” policy? HTCondor scheduling policy 2

› HTCondor does not use the same model as, for example, PBS, where jobs are placed into a first-in, first-out queue › Instead, it is built around a concept called “Fair Share”  Assumes users are competing for resources  Aims for long-term fairness First Things First 3

› Available compute resources are “The Pie” › Users, with their relative priorities, are each trying to get their “Pie Slice” › But it’s more complicated: Both users and machines can specify preferences. › Basic questions need to be answered, such as “do you ever want to preempt a running job for a new job if it’s a better match”? (For some definition of “better”) Spinning Pie 4

› First, the Matchmaker takes some jobs from each user and finds resources for them. › After all users have got their initial “Pie Slice”, if there are still more jobs and resources, we continue “spinning the pie” and handing out resources until everything is matched. Spinning Pie 5

› If two users have the same relative priority, then over time the pool will be divided equally among them. › Over time? › Yes! By default, HTCondor tracks usage and has a formula for determining priority based on both current demand and prior usage › However, prior usage “decays” over time Relative Priorities 6

› Example: (A pool of 100 cores) › User ‘A’ submits 100,000 jobs and 100 of them begin running, using the entire pool. › After 8 hours, user ‘B’ submits 100,000 jobs › What happens? Pseudo-Example 7

› Example: (A pool of 100 cores) › User ‘A’ submits 100,000 jobs and 100 of them begin running, using the entire pool. › After 8 hours, user ‘B’ submits 100,000 jobs › The scheduler will now allocate MORE than 50 cores to user ‘B’ because user ‘A’ has accumulated a lot of recent usage › Over time, each will end up with 50 cores. Pseudo-Example 8

Overview of Condor Architecture 9 [diagram: a Central Manager, which keeps the usage history, matchmaking between the worker machines and two schedds, Schedd A and Schedd B, each holding queued jobs from users Greg, Ann, and Joe]

› Negotiator computes, stores the user prio › View with condor_userprio tool › Inversely related to machines allocated (lower number is better priority)  A user with priority of 10 will be able to claim twice as many machines as a user with priority 20 Negotiator metric: User Priority 10

› Is Bob in schedd1 the same as Bob in schedd2? › If they have the same UID_DOMAIN, they are. › We’ll talk later about other user definitions. › Map files can define the local user name What’s a user? 11

› (Effective) User Priority is determined by multiplying two components › Real Priority * Priority Factor User Priority 12
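A quick worked example with illustrative numbers: a user with a Real Priority of 5.0 and the default Priority Factor of 100 has an Effective Priority of 5.0 * 100 = 500. A second user with the same Real Priority but a Priority Factor of 1000 has an Effective Priority of 5000 and, being numerically worse, is allocated proportionally fewer machines.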

› Based on actual usage › Starts at 0.5 › Approaches actual number of machines used over time  Configuration setting PRIORITY_HALFLIFE  If PRIORITY_HALFLIFE = +Inf, no history  Default one day (in seconds) › Asymptotically grows/shrinks to current usage Real Priority 13
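A central-manager configuration sketch matching the stated default of one day (the value is given in seconds):
PRIORITY_HALFLIFE = 86400   # accumulated usage decays by half every 24 hours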

› Assigned by the administrator  Set/viewed with condor_userprio  Persistently stored in the CM › Defaults to 100 ( DEFAULT_PRIO_FACTOR ) › Allows admins to give priority to sets of users, while still having fair share within a group › “Nice users” have Prio Factors of 1,000,000 Priority Factor 14

› Command usage: condor_userprio › Prints a table per user with columns: Effective Priority, User Name, Priority Factor, In Use, Usage (wghted-hrs), and Last Usage [sample output omitted] condor_userprio 15

› So far everything we saw was BETWEEN different users › Individual users can also control the priorities and preferences WITHIN their own jobs Different Type of Priority 16

› Set in the submit file with priority = 7 (stored in the job ad as JobPrio) › … or dynamically with the condor_prio command › Users can set the priority of their own jobs › Integers, larger numbers are better priority › Only impacts order between jobs for a single user on a single schedd › A tool for users to sort their own jobs Schedd Policy: Job Priority 17
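A small sketch of both routes, assuming a hypothetical already-submitted job with id 123.0:
# In the submit description file
priority = 7
# Or later, from the command line, raise job 123.0 to priority 10
condor_prio -p 10 123.0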

› Set in submit file with RANK = Memory › Not as powerful as you may think:  Remember steady state condition – there may not be that many resources to sort at any given time when pool is fully utilized. Schedd Policy: Job Rank 18
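A submit-file sketch (the executable name is hypothetical) expressing that preference:
executable = my_analysis
rank = Memory        # among matching slots, prefer those advertising more Memory
queue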

› Manage priorities across groups of users and jobs › Can guarantee maximum numbers of computers for groups (quotas) › Supports hierarchies › Anyone can join any group (well…) Accounting Groups (2 kinds) 19

› In submit file  Accounting_Group = “group1” › Treats all users as the same for priority › Accounting groups not pre-defined › No verification – HTCondor trusts the job › condor_userprio replaces user with group Accounting Groups as Alias 20
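A minimal submit-file sketch (the executable name is hypothetical) using the alias style:
executable = analyze
accounting_group = group1   # usage is accounted to "group1" rather than to the submitting user
queue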

condor_userprio -setfactor group1@your.uid.domain 10 condor_userprio -setfactor group2@your.uid.domain 20 Note that you must get the UID_DOMAIN part right (your.uid.domain above is a placeholder) Gives group1 members twice as many resources as group2 Prio factors with groups 21

› Must be predefined in the CM’s config file: GROUP_NAMES = a, b, c GROUP_QUOTA_a = 10 GROUP_QUOTA_b = 20 › And in the submit file: Accounting_Group = a Accounting_Group_User = gthain Accounting Groups w/ Quota 22

› “a” limited to 10 › “b” to 20 › Even if idle machines › What is the unit?  Slot weight. › With fair share for users within group Group Quotas 23

› Static versus Dynamic: Number of nodes versus proportion of the nodes › Dynamic scales to size of pool. › Static only “scales” if you oversubscribe your pool – HTCondor shrinks the allocations proportionally so they fit  This can be disabled in the configuration Hierarchical Group Quotas 24
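A configuration sketch of the two flavors, with hypothetical group names and values:
GROUP_NAMES = group_static, group_dynamic
GROUP_QUOTA_group_static = 100             # static: a fixed number of slot-weights
GROUP_QUOTA_DYNAMIC_group_dynamic = 0.25   # dynamic: a quarter of the pool, whatever its current size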

25 Hierarchical Group Quotas [diagram: a tree of groups, with physics (subgroups string theory and particle physics, the latter containing CMS, CDF, and ATLAS) and CompSci (subgroups architecture and DB), each node annotated with a static quota such as 200]

26 Hierarchical Group Quotas [diagram: the same tree of groups, now annotated with dynamic quotas expressed as fractions of the pool, such as 0.4]

27 Hierarchical Group Quotas [diagram: the physics subtree, with string theory and particle physics below it and CMS, CDF, and ATLAS below particle physics] GROUP_QUOTA_physics = 700 GROUP_QUOTA_physics.string_theory = 100 GROUP_QUOTA_physics.particle_physics = 600 GROUP_QUOTA_physics.particle_physics.CMS = 200 GROUP_QUOTA_physics.particle_physics.ATLAS = 200 GROUP_QUOTA_physics.particle_physics.CDF = 100 Group names follow the pattern group.sub-group.sub-sub-group…

28 Hierarchical Group Quotas [diagram: the same physics subtree, subgroup quotas highlighted in red] Look closely at the numbers in red: 600 - (200 + 200 + 100) = 100 There are extra resources there… now what?

29 Hierarchical Group Quotas [diagram: the same physics subtree] There are 100 extra resources there Who gets to use them? In this case, only “particle physics” (not the children… quotas are still strictly enforced there)

› Determines who can share extra resources › Allows groups to go over quota if there are idle machines › Creates the true hierarchy › Defined per group, or subgroup, or sub-sub… GROUP_ACCEPT_SURPLUS 30

31 Hierarchical Group Quotas [diagram: the same physics subtree, with the CMS, CDF, and ATLAS quotas shown in red] Numbers in RED are shared across the three children “Particle physics” is still capped at 600 even if “string theory” is completely idle CMS/ATLAS/CDF can now go over their quotas if the other groups have no jobs

32 Hierarchical Group Quotas [diagram: the same physics subtree] GROUP_ACCEPT_SURPLUS_physics.particle_physics.CMS = TRUE GROUP_ACCEPT_SURPLUS_physics.particle_physics.ATLAS = TRUE GROUP_ACCEPT_SURPLUS_physics.particle_physics.CDF = TRUE GROUP_ACCEPT_SURPLUS_physics.particle_physics = TRUE

› Also allows groups to go over quota if there are idle machines. › A “last chance” round, with every submitter for themselves. GROUP_AUTOREGROUP 33
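A configuration sketch (the group name is hypothetical); autoregroup can be set pool-wide or per group:
GROUP_AUTOREGROUP = False                        # pool-wide default
GROUP_AUTOREGROUP_physics.string_theory = True   # these submitters may also compete in the last-chance round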

› We’ll switch gears a little bit to talk about other pool-wide mechanisms that affect matchmaking… › Welcome Jaime! Enough with groups… 34

› Match between schedd and startd can be reused to run many jobs › May need to create opportunities to rebalance how machines are allocated  New user  Jobs with special requirements (GPUs, high memory) Rebalancing the Pool 35

› Have startds return frequently to negotiator for rematching  CLAIM_WORKLIFE  Draining  More load on system, may not be necessary › Have negotiator proactively rematch a machine  Preempt running job to replace with better job  MaxJobRetirementTime can minimize killing of jobs How to Rematch 36
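A startd-side configuration sketch of the first approach; the knobs are real, but the values are illustrative, not recommendations:
CLAIM_WORKLIFE = 3600            # in seconds; after an hour the claim closes and the slot is renegotiated
MAXJOBRETIREMENTTIME = 6 * 3600  # a preempted job gets up to 6 hours to finish before it is evicted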

› Fundamental tension between throughput and fairness › Preemption is required to have fairness › Need to think hard about runtimes, fairness, and preemption › The negotiator implements preemption › (Workers implement eviction, which is different) A note about Preemption 37

› Startd Rank  Startd prefers new job New job has larger startd Rank value › User Priority  New job’s user has higher priority (deserves increased share of the pool) New job has lower user prio value › No preemption by default  Must opt-in Two Types of Preemption 38

› Gets all the slot ads › Updates user prio info for all users › Based on user prio, computes submitter limit for each user › For each user, finds the schedd  For each job (up to submitter limit) Finds all matching machines for job Sorts the machines Gives the job the best sorted machine Negotiation Cycle 39

› Single sort on a five-value key  NEGOTIATOR_PRE_JOB_RANK  Job Rank  NEGOTIATOR_POST_JOB_RANK  No preemption > Startd Rank preemption > User priority preemption  PREEMPTION_RANK Sorting Slots: Sort Levels 40

› Evaluated as if in the machine ad › MY.Foo : Foo in machine ad › TARGET.Foo : Foo in job ad › Foo : check machine ad, then job ad for Foo › Use MY or TARGET if attribute could appear in either ad Negotiator Expression Conventions 41
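A toy expression, only to illustrate the conventions; the attribute names are standard, but the weights are made up and this is not a recommended policy:
# MY.Rank            -> Rank from the machine (slot) ad
# TARGET.RequestCpus -> RequestCpus from the candidate job ad
NEGOTIATOR_PRE_JOB_RANK = (1000 * MY.Rank) - TARGET.RequestCpus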

› Negotiator adds attributes about pool usage of job owners › Info about job being matched  SubmitterUserPrio  SubmitterUserResourcesInUse › Info about running job that would be preempted  RemoteUserPrio  RemoteUserResourcesInUse Accounting Attributes 42

› More attributes when using groups  SubmitterNegotiatingGroup  SubmitterAutoregroup  SubmitterGroup  SubmitterGroupResourcesInUse  SubmitterGroupQuota  RemoteGroup  RemoteGroupResourcesInUse  RemoteGroupQuota Group Accounting Attributes 43

› (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED)) - (100000 * Cpus) - Memory › Default › Prefer machines that like this job more › Prefer idle machines › Prefer machines with fewer CPUs, less memory NEGOTIATOR_PRE_JOB_RANK 44

› KFlops - SlotID › Prefer faster machines › Breadth-first filling of statically-partitioned machines NEGOTIATOR_POST_JOB_RANK 45

If the matched machine is already claimed, extra checks are required › PREEMPTION_REQUIREMENTS and PREEMPTION_RANK › Evaluated when the condor_negotiator considers replacing a lower-priority job with a higher-priority job › Completely unrelated to the PREEMPT expression (which should really be called “evict”) 46

› If False, will not preempt for user priority › Only replace jobs that have been running for at least one hour and whose user’s priority is at least 20% worse StateTimer = \ (CurrentTime - EnteredCurrentState) HOUR = (60*60) PREEMPTION_REQUIREMENTS = \ $(StateTimer) > (1 * $(HOUR)) \ && RemoteUserPrio > SubmitterUserPrio * 1.2 NOTE: the classad debug() function is very handy PREEMPTION_REQUIREMENTS 47

› Can restrict preemption to restoring quotas PREEMPTION_REQUIREMENTS = ( SubmitterGroupResourcesInUse < SubmitterGroupQuota ) && ( RemoteGroupResourcesInUse > RemoteGroupQuota ) Preemption with HQG 48

› Of all the claimed machines where PREEMPTION_REQUIREMENTS is true, picks which one to reclaim › Strongly prefer preempting jobs with a large (bad) priority and less runtime PREEMPTION_RANK = \ (RemoteUserPrio * 1000000) \ - TotalJobRuntime PREEMPTION_RANK 49

› NEGOTIATOR_CONSIDER_PREEMPTION = False › Negotiator completely ignores claimed startds when matching › Makes matching faster › Startds can still evict jobs, then be rematched No-Preemption Optimization 50

› Manage pool-wide resources  E.g. software licenses, DB connections › In central manager config › FOO_LIMIT = 10 › BAR_LIMIT = 15 › In submit file › concurrency_limits = foo,bar:2 Concurrency Limits 51
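To make the counting concrete with the values above: a job with concurrency_limits = foo,bar:2 holds one unit of the FOO limit and two units of the BAR limit while it runs, so FOO_LIMIT = 10 caps such jobs at 10, while BAR_LIMIT = 15 caps them at 7 (15 divided by 2, rounded down).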

› Many ways to schedule Summary 52