Published by Grant Flowers, modified over 8 years ago
1
Landing in the Right Nest: New Negotiation Features for Enterprise Environments
Jason Stowe
2
New Features for Negotiation
3
Experience in Enterprise Environments
4
What is an Enterprise Environment?
5
Any Organization Using Condor with
6
Demanding Users
8
Organization = Groups of Demanding Users
9
Purchased Computer Capacity
10
Guaranteed Minimum Capacity
11
Need As Many as Possible
12
As Soon as they submit
13
Vanilla/Java Universe
14
Avoid Preemption
15
How do we ensure Resources land in the right Group’s Nest?
16
A valid definition of “Enterprise Condor Users”?
17
I started off as a Demanding User
18
Follow up to earlier work
19
Condor Week 2005
20
Condor for Movies:
-75+ million jobs
-1000+ CPUs (Linux/OSX)
-70+ TB storage
21
(Project that added AccountingGroups)
22
Condor Week 2006
23
Web-based Management Tools, Consulting, and 24/7 Support
24
A Conversation with Miron
25
Bob Nordlund’s idea for Condor += Hooks
26
Configuration with Pipes (Condor 6.8)
CONDOR_CONFIG = cat /opt/condor/condor_config |
27
Demanding Condor Uses for Banks/Insurance Companies => This year, new features
28
Negotiation Policies to Manage Number of Resources
29
For Groups and Users
30
What are the Requirements?
31
-Guaranteed Minimum Quota
-Fast Claiming of Quota
-Avoid Unnecessary Preemption
32
Three Common Ways
33
“Fair-share” User Priority + PREEMPTION_REQUIREMENTS
34
Machine RANK
35
AccountingGroups + GROUP_QUOTA
36
Generally these are a progression
37
Story of a Pool
39
Fair-Share, User Priority
40
It Works! More Users…
42
condor_userprio -setfactor A 2
condor_userprio -setfactor B 2
43
PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio
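As a rough illustration of the arithmetic behind the two slides above: in Condor a lower effective user priority is better, the effective priority is the real priority (which tracks recent usage) multiplied by the factor set with `condor_userprio -setfactor`, and the negotiator aims to hand out machines in inverse proportion to effective priorities. The sketch below models only that arithmetic plus the preemption test; the `User` class, the helper names, and the real-priority values are hypothetical inputs, not what the negotiator actually computes.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    real_prio: float  # HYPOTHETICAL input: real user priority (tracks recent usage)
    factor: float     # the value set with `condor_userprio -setfactor`

    @property
    def effective_prio(self) -> float:
        # Effective priority = real priority * priority factor; lower is better.
        return self.real_prio * self.factor


def fair_shares(users: list[User], machines: int) -> dict[str, float]:
    """Split machines in inverse proportion to each user's effective priority."""
    weights = {u.name: 1.0 / u.effective_prio for u in users}
    total = sum(weights.values())
    return {name: machines * w / total for name, w in weights.items()}


def preemption_requirements(remote_user_prio: float, submittor_prio: float) -> bool:
    # PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio:
    # preempt only if the running job's owner has the worse (higher) priority.
    return remote_user_prio > submittor_prio


a = User("A", real_prio=10.0, factor=2.0)
b = User("B", real_prio=30.0, factor=2.0)
print(fair_shares([a, b], 100))  # A gets about 75 machines, B about 25
print(preemption_requirements(b.effective_prio, a.effective_prio))  # True: A may preempt B
```

Because both users carry the same factor here, only their recent usage decides the split; raising one user's factor would shrink that user's share.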
44
Works Well in Most cases
45
Suppose A has all 100 machines, and B submits 100 jobs
47
User Priorities Are Cached at the Beginning of Negotiation and Not Updated…
48
PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio
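A toy simulation can illustrate why caching hurts in the scenario above (A holds all 100 machines, B submits 100 jobs). It assumes a deliberately simplified, hypothetical priority model in which a user's priority is just machines in use times the factor of 2 set earlier; real Condor priorities follow usage with a half-life, so the exact numbers are illustrative only. The point is the contrast: with values frozen at the start of the cycle, the test never turns false, so every one of A's jobs is preempted.

```python
def toy_prio(in_use: int, factor: float = 2.0) -> float:
    # HYPOTHETICAL toy model: priority grows with machines currently in use.
    # (Real Condor priorities track usage with a half-life decay.)
    return max(in_use, 0.5) * factor


def negotiate(b_jobs: int, a_machines: int, cached: bool) -> int:
    """Return how many of A's jobs B preempts in one negotiation cycle."""
    a_in_use, b_in_use = a_machines, 0
    # Snapshot taken at the start of the cycle (the cached behaviour).
    cached_remote, cached_submitter = toy_prio(a_in_use), toy_prio(b_in_use)
    preempted = 0
    for _ in range(b_jobs):
        if a_in_use == 0:
            break
        if cached:
            remote, submitter = cached_remote, cached_submitter
        else:
            remote, submitter = toy_prio(a_in_use), toy_prio(b_in_use)
        # PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio
        if not remote > submitter:
            break
        a_in_use -= 1
        b_in_use += 1
        preempted += 1
    return preempted


print(negotiate(100, 100, cached=True))   # 100: all of A's jobs are preempted
print(negotiate(100, 100, cached=False))  # 50: stops once A and B are even
```

In the vanilla or java universe each of those extra preemptions discards the work the evicted job had done.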
50
Standard Universe = No Problem (Preemption doesn’t lose work)
51
Problem: Vanilla or Java Universe (Work is lost!)
52
Dampen these with:
NEGOTIATOR_MAX_TIME_PER_SUBMITTER
NEGOTIATOR_MAX_TIME_PER_PIESPIN
53
Slows matching rate, can lead to starvation
54
Time For RANK
55
RANK = Owner =?= "A" on 50 machines
RANK = Owner =?= "B" on 50 machines
56
Users get their “quota”
57
Tied to particular machines
59
Problem: Group A submits 100 jobs to an empty pool
61
50 jobs Finish
63
Group B submits 100 jobs
Empty machines get B jobs
A jobs on B machines are preempted
65
B Jobs on A Machines are preempted.
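The RANK story above can be replayed in a short simulation. It assumes a hypothetical pool where machines 0-49 carry `RANK = Owner =?= "A"` and machines 50-99 carry `RANK = Owner =?= "B"`, and it assumes the 50 jobs that finish happened to be running on A's own machines (the slides do not say which ones). The helper names are invented for illustration; the sketch models only RANK-driven eviction, not user priorities.

```python
from typing import Optional

# HYPOTHETICAL pool: machines 0-49 have RANK = Owner =?= "A",
# machines 50-99 have RANK = Owner =?= "B".
OWNERS = ["A"] * 50 + ["B"] * 50


def rank(machine_owner: str, job_owner: str) -> int:
    # RANK = Owner =?= "<group>" evaluates to 1 for that group's jobs, else 0.
    return 1 if job_owner == machine_owner else 0


def fill_idle(running: list[Optional[str]], queue: dict[str, int]) -> None:
    """Give every idle machine a queued job, preferring the machine's owner."""
    for m, job in enumerate(running):
        if job is not None:
            continue
        owner = OWNERS[m]
        for group in [owner] + [g for g in queue if g != owner]:
            if queue[group] > 0:
                queue[group] -= 1
                running[m] = group
                break


def rank_preempt(running: list[Optional[str]], queue: dict[str, int]) -> int:
    """Evict a running job whenever the owner has an idle job with higher RANK."""
    preemptions = 0
    changed = True
    while changed:
        changed = False
        for m, job in enumerate(running):
            owner = OWNERS[m]
            if job is None or queue[owner] == 0:
                continue
            if rank(owner, owner) > rank(owner, job):
                queue[job] += 1  # the evicted vanilla job restarts from scratch
                queue[owner] -= 1
                running[m] = owner
                preemptions += 1
                changed = True
    return preemptions


def story() -> tuple[int, dict[str, int]]:
    queue = {"A": 0, "B": 0}
    running: list[Optional[str]] = ["A"] * 100  # A's 100 jobs fill the empty pool
    for m in range(50):  # ASSUMPTION: the 50 jobs that finish ran on A's machines
        running[m] = None
    queue["B"] = 100  # group B submits 100 jobs
    fill_idle(running, queue)
    preemptions = rank_preempt(running, queue)
    return preemptions, {g: running.count(g) for g in queue}


print(story())  # (100, {'A': 50, 'B': 50})
```

Both groups end with 50 running jobs, the same split they would have had if A's surviving jobs had simply stayed put; under this assumption RANK buys that split with 100 preemptions.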
69
Skip Preemption, Use Empty Machines?
70
Accounting Groups, GROUP_QUOTA
71
#New Machines = 200
GROUP_QUOTA_A = 50
GROUP_QUOTA_B = 50
GROUP_QUOTA_C = 50
GROUP_QUOTA_D = 50
GROUP_AUTOREGROUP = True
73
A and B have 100 machines each; how does C get resources?
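The question above follows from how quotas and auto-regroup fill a pool. The sketch below is a simplified, hypothetical model of one negotiation cycle that hands out only idle machines: first each group claims up to its quota, then (the auto-regroup step) leftover machines go round-robin to groups that still have idle jobs. The real negotiator instead negotiates over-quota jobs by user priority, so treat the function and its round-robin rule as assumptions; the quotas and the 200-machine pool come from the configuration above.

```python
QUOTA = {"A": 50, "B": 50, "C": 50, "D": 50}  # GROUP_QUOTA_<group>
POOL = 200                                    # #New Machines = 200


def allocate(demand: dict[str, int], in_use: dict[str, int]) -> dict[str, int]:
    """One HYPOTHETICAL cycle with quotas and GROUP_AUTOREGROUP = True.

    `demand` is each group's total jobs (running + idle). Only idle machines
    are handed out; nothing is preempted.
    """
    alloc = dict(in_use)
    free = POOL - sum(alloc.values())
    # Pass 1: each group claims idle machines up to its quota.
    for g in sorted(demand):
        want = min(demand[g] - alloc.get(g, 0), QUOTA[g] - alloc.get(g, 0), free)
        if want > 0:
            alloc[g] = alloc.get(g, 0) + want
            free -= want
    # Pass 2 (auto-regroup): leftover machines go round-robin to unmet demand.
    while free > 0:
        hungry = [g for g in sorted(demand) if alloc.get(g, 0) < demand[g]]
        if not hungry:
            break
        for g in hungry:
            if free == 0:
                break
            alloc[g] = alloc.get(g, 0) + 1
            free -= 1
    return alloc


first = allocate({"A": 100, "B": 100}, {})
print(first)  # {'A': 100, 'B': 100}
second = allocate({"A": 100, "B": 100, "C": 50}, first)
print(second.get("C", 0))  # 0: no idle machines are left for C
```

Without preemption C is stuck below its guaranteed 50 machines, which is why the following slides turn back to PREEMPTION_REQUIREMENTS.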
74
PREEMPTION_REQUIREMENTS Still has cache/preemption issues
75
We need access to up-to-date usage/quota information in PREEMPTION_REQUIREMENTS
76
A Conversation with Todd
77
SubmitterUserPrio SubmitterUserResourcesInUse (RemoteUser as well)
78
SubmitterGroupQuota SubmitterGroupResourcesInUse (RemoteGroup as well)
79
With Great Power Comes Great Responsibility
80
IMPORTANT: Turn off caching (may slow down negotiation)
PREEMPTION_REQUIREMENTS_STABLE = False
PREEMPTION_RANK_STABLE = False
81
PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) && (RemoteGroupResourcesInUse > RemoteGroupQuota)
PREEMPTION_REQUIREMENTS_STABLE = False
RANK = 0
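To see why the group-quota policy above needs caching turned off, the sketch below evaluates the same two-part test, once against counters re-read after every match and once against a snapshot taken at the start of the cycle (a simplified stand-in for `PREEMPTION_REQUIREMENTS_STABLE = True`). It assumes A and B each hold 100 machines, C has 100 idle jobs, and every quota is 50. Victim choice (the running group furthest over its quota) is a hypothetical substitute for a PREEMPTION_RANK the slides do not show; `RANK = 0` means machine RANK never triggers preemption, so it is not modeled.

```python
QUOTA = {"A": 50, "B": 50, "C": 50}


def preemption_requirements(view: dict[str, int], submitter: str, remote: str) -> bool:
    # (SubmitterGroupResourcesInUse < SubmitterGroupQuota) &&
    # (RemoteGroupResourcesInUse > RemoteGroupQuota)
    return view[submitter] < QUOTA[submitter] and view[remote] > QUOTA[remote]


def claim(in_use: dict[str, int], submitter: str, idle_jobs: int, stable: bool) -> dict[str, int]:
    """Let `submitter` preempt jobs for up to `idle_jobs` machines in one cycle.

    stable=True evaluates the test against a start-of-cycle snapshot;
    stable=False re-reads the usage counters after every match.
    """
    in_use = dict(in_use)
    snapshot = dict(in_use)
    for _ in range(idle_jobs):
        view = snapshot if stable else in_use
        # HYPOTHETICAL victim choice: the running group furthest over quota.
        running = [g for g in sorted(in_use) if g != submitter and in_use[g] > 0]
        if not running:
            break
        victim = max(running, key=lambda g: in_use[g] - QUOTA[g])
        if not preemption_requirements(view, submitter, victim):
            break
        in_use[victim] -= 1
        in_use[submitter] += 1
    return in_use


start = {"A": 100, "B": 100, "C": 0}
print(claim(start, "C", 100, stable=False))  # {'A': 75, 'B': 75, 'C': 50}
print(claim(start, "C", 100, stable=True))   # {'A': 50, 'B': 50, 'C': 100}
```

With live values C stops exactly at its quota and A and B keep their surplus; with the snapshot C keeps preempting because its cached usage still reads 0, evicting 50 jobs it was never entitled to.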
83
Now we have everything needed!
84
Demanding Groups of Users
85
Getting Purchased Compute Capacity (Quota, not tied to machine)
86
Getting Guaranteed Minimum Capacity (GROUP_QUOTA)
87
Getting As Many as Possible (Auto-Regroup)
88
Getting As Soon as they submit (typically within one negotiation cycle)
89
Avoids Preemption
90
condor_status?
91
It Works! (patched 6.8 and 6.9+)
Code & Condor Community Process
92
Where do we go from here? What did we learn?
93
Wisconsin is working on making negotiation/scheduling more efficient in 6.9
94
In the future: allow us to specify what we account for per VM/slot (KFLOPS?)
95
That’s just me…
96
Come to tonight’s Reception
Participate in the Community
97
Talk with Condor Team. Talk with other users.
98
Help the community continue to work well for everyone.
99
Thank you. Questions?
http://www.cyclecomputing.com
jstowe @ cyclecomputing.com