Published by Jeffry Wilcox. Modified over 9 years ago.
1
Condor-G: A Case in Distributed Job Delegation
Jaime Frey
Computer Sciences Department
University of Wisconsin-Madison
jfrey@cs.wisc.edu
http://www.cs.wisc.edu/condor
2
Job Delegation
› Transfer of responsibility to schedule and execute a job
› Multiple delegations can form a chain
3
Job Delegation in Condor-G Today
[Diagram: Condor-G, Globus GRAM, Batch System Front-end, Execute Machine]
4
Expanding the Model
› What can we do with new forms of job delegation?
› Some ideas
    Mirroring
    Load-balancing
    Glide-in schedd
    Multi-hop grid scheduling
5
Mirroring
› What it does
    Jobs mirrored on two Condor-Gs
    If primary Condor-G crashes, secondary one starts running jobs
    On recovery, primary Condor-G gets job status from secondary one
› Removes Condor-G submit point as single point of failure
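The failover behaviour on this slide can be sketched roughly as follows. This is only an illustrative model: `SubmitPoint` and `MirroredPair` are hypothetical names, not part of Condor-G, and the real mechanism for detecting a crash is not covered by the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SubmitPoint:
    """Hypothetical stand-in for one Condor-G submit point."""
    name: str
    alive: bool = True
    job_status: dict = field(default_factory=dict)  # job id -> status string


class MirroredPair:
    """Illustrative model of two submit points mirroring the same jobs."""

    def __init__(self, primary: SubmitPoint, secondary: SubmitPoint):
        self.primary = primary
        self.secondary = secondary

    def submit(self, job_id: str) -> None:
        # Every job is recorded on both submit points.
        for point in (self.primary, self.secondary):
            point.job_status[job_id] = "idle"

    def active(self) -> SubmitPoint:
        # The secondary only takes over while the primary is down.
        return self.primary if self.primary.alive else self.secondary

    def update(self, job_id: str, status: str) -> None:
        # A crashed submit point misses status changes.
        for point in (self.primary, self.secondary):
            if point.alive:
                point.job_status[job_id] = status

    def recover_primary(self) -> None:
        # On recovery the primary copies job status from the secondary.
        self.primary.job_status = dict(self.secondary.job_status)
        self.primary.alive = True
```

In this model a job submitted through the pair survives a crash of the primary: the secondary becomes the active submit point, and the primary catches up from it once it returns.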
6
Mirroring Example
[Diagram: Condor-G 1, Condor-G 2, Matchmaker, Execute Machine]
8
Load-Balancing
› What it does
    Front-end Condor-G distributes all jobs among several back-end Condor-Gs
    Front-end Condor-G keeps updated job status
› Improves scalability
› Maintains single submit point for users
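A minimal sketch of the front-end role described above. The slide does not say how jobs are distributed, so round-robin is an assumption here, and `FrontEnd` is a hypothetical class rather than anything in Condor-G.

```python
from itertools import cycle


class FrontEnd:
    """Hypothetical front-end that delegates jobs to back-ends."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)  # assumed policy: round-robin
        self.assignment = {}  # job id -> back-end name
        self.status = {}      # job id -> last status reported by a back-end

    def submit(self, job_id):
        # Users submit here only; the front-end picks the back-end.
        backend = next(self._next_backend)
        self.assignment[job_id] = backend
        self.status[job_id] = "delegated"
        return backend

    def report(self, job_id, status):
        # Back-ends report changes so users can query the front-end alone.
        self.status[job_id] = status
```

Because every status change is reported back, the user never needs to know which back-end holds a given job.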
9
Load-Balancing Example
[Diagram: Condor-G Front-end, Condor-G Back-end 1, Condor-G Back-end 2, Condor-G Back-end 3]
10
Glide-In Schedd
› What it does
    Drop a Condor-G onto the front-end machine of a cluster
    Delegate jobs to the cluster through the glide-in schedd
› Apply cluster-specific policies to jobs
11
Glide-In Schedd Example
[Diagram: Condor-G, Glide-In Schedd, Batch System]
12
Multi-Hop Grid Scheduling
› Match a job to a Virtual Organization (VO), then to a resource within that VO
› Easier to schedule jobs across multiple VOs and grids
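The two-stage match above could look something like this. The dictionary layout (`community`, `accepts`, `free_cpus`, and so on) is invented purely for illustration; real matchmaking would use ClassAds and a resource broker at each hop.

```python
def match_multi_hop(job, vos):
    """Two-stage match: pick a VO first, then a resource inside that VO.

    `job` and `vos` use a made-up dictionary layout for this sketch.
    """
    for vo_name, vo in vos.items():
        if job["community"] not in vo["accepts"]:
            continue  # first hop: this VO does not accept the job
        for resource in vo["resources"]:
            if resource["free_cpus"] >= job["cpus"]:
                return vo_name, resource["name"]  # second hop succeeded
    return None  # no VO had a suitable resource
```

Splitting the decision this way means the first hop only needs to know about VOs, not about every resource in every grid.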
13
Multi-Hop Grid Scheduling Example
[Diagram: Experiment Condor-G, Experiment Resource Broker, VO Condor-G, VO Resource Broker, Globus GRAM, Batch Scheduler]
14
Endless Possibilities
› These new models can be combined with each other or with other new models
› Resulting system can be arbitrarily sophisticated
15
Job Delegation Challenges
› New complexity introduces new issues and exacerbates existing ones
› A few…
    Transparency
    Representation
    Scheduling Control
    Active Job Control
    Revocation
    Error Handling and Debugging
16
Transparency
› Full information about job should be available to user
    Information from full delegation path
    No manual tracing across multiple machines
› Users need to know what’s happening with their jobs
17
Representation
› Job state is a vector
› How best to show this to user
    Summary
        Current delegation endpoint
        Job state at endpoint
    Full information available if desired
    Series of nested ClassAds?
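One way to picture the "vector" and its summary, using nested plain dicts as a stand-in for nested ClassAds. The keys `manager`, `state` and `delegated_to` are invented for this sketch and do not correspond to real ClassAd attributes.

```python
def summarize(job):
    """Walk a nested job record down to its delegation endpoint.

    Each record is a plain dict standing in for a ClassAd; the keys
    "manager", "state" and "delegated_to" are assumptions of this sketch.
    """
    states = []  # the full state vector, one entry per hop
    hop = job
    while hop is not None:
        states.append((hop["manager"], hop["state"]))
        hop = hop.get("delegated_to")
    endpoint, endpoint_state = states[-1]
    return {"endpoint": endpoint, "state": endpoint_state, "path": states}
```

The `endpoint` and `state` fields give the short summary, while `path` keeps the full per-hop information for users who want it.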
18
Scheduling Control
› Avoid loops in delegation path
› Give user control of scheduling
    Allow limiting of delegation path length?
    Allow user to specify part or all of delegation path
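Both checks above (no loops, optional length limit) can be expressed against the list of managers a job has already visited. `may_delegate` and `max_path_len` are hypothetical names, and counting the submit point as part of the path is an assumption of this sketch.

```python
def may_delegate(path, next_hop, max_path_len=None):
    """Return True if delegating to `next_hop` is allowed.

    `path` lists the managers holding the job so far, submit point
    included; `max_path_len` is an optional user-set cap on that list.
    """
    if next_hop in path:
        return False  # delegating here would create a loop
    if max_path_len is not None and len(path) + 1 > max_path_len:
        return False  # user-imposed path length limit would be exceeded
    return True
```

A manager would run this check before each delegation, which requires the path so far to travel with the job.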
19
Active Job Control
› User may request certain actions
    hold, suspend, vacate, checkpoint
› Actions cannot be completed synchronously for user
    Must forward along delegation path
    User checks completion later
20
Active Job Control (cont)
› Endpoint systems may not support actions
    If possible, execute them at furthest point that does support them
› Allow user to apply action in middle of delegation path
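Choosing the "furthest point that does support" an action amounts to scanning the delegation path from the endpoint backwards. The path layout and the per-hop action sets below are invented for illustration.

```python
def furthest_supporting_hop(path, action):
    """Pick the hop closest to the endpoint that supports `action`.

    `path` is ordered from submit point to endpoint; each hop is a
    (name, supported_actions) pair made up for this sketch.
    """
    for name, supported in reversed(path):
        if action in supported:
            return name
    return None  # no hop on the path can carry out the action
```

For example, if the batch system at the endpoint cannot checkpoint but an intermediate Condor-G can, the request would stop at that intermediate hop.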
21
Revocation
› Leases
    Lease must be renewed periodically for delegation to remain valid
    Allows revocation during long-term failures
› What are good values for lease lifetime and update interval?
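A minimal lease model, assuming times are plain seconds passed in by the caller. The `Lease` class is hypothetical, and the rule that an expired lease cannot be renewed is a design choice of this sketch, not something the slides state.

```python
class Lease:
    """Illustrative lease; times are plain seconds supplied by the caller."""

    def __init__(self, lifetime, now):
        self.lifetime = lifetime
        self.expires_at = now + lifetime

    def is_valid(self, now):
        # The delegation is only valid until the lease expires.
        return now < self.expires_at

    def renew(self, now):
        # Assumed rule: renewal only succeeds while the lease is still valid.
        if not self.is_valid(now):
            return False
        self.expires_at = now + self.lifetime
        return True
```

Under these semantics the update interval has to be shorter than the lifetime, otherwise the lease lapses between renewals; how much shorter is the open question the slide raises.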
22
Error Handling and Debugging
› Many more places for things to go horribly wrong
› Need clear, simple error semantics
› Logs, logs, logs
    Have them everywhere
23
Current Status
› Done
    Mirroring
› In Progress
    Condor-G -> Condor-G delegation
        User must specify hops
    Glide-in schedd
        Set up by hand
24
Thank You!
› Questions?