1. Condor and the Grid
D. Thain, T. Tannenbaum, M. Livny
Christopher M. Moretti
23 February 2007
2. Problem & Opportunity
- Users need CPUs for:
  - scientific computing
  - mathematical modeling
  - data mining
- Many CPU cycles go unused on:
  - personal workstations
  - general-use laboratories
  - research machines
3. Solution: Condor, “a hunter of idle workstations”
- Keeps track of the resources needed and available
- Determines and assigns matches
- Monitors progress
- Cleans up and reports results
4. Architecture
- Three principals:
  - Agent: machine needing resources
  - Matchmaker
  - Resource: machine lending resources
- Three phases:
  - Advertising
  - Matching/Claiming
  - Deploying/Executing
5. Advertising
[Diagram: the agent (needy.cse.nd.edu) tells the matchmaker “I need X”; the lender (idle.cse.nd.edu) tells the matchmaker “I have Y”; the matchmaker asks “Does Y satisfy X?”]
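The following is a minimal Python sketch of the advertising step. It assumes a toy model in which each ad is a name, a dictionary of attributes, and a predicate over the other side's attributes. The classes `Ad` and `Matchmaker` and their methods are hypothetical and do not reflect Condor's actual protocol or data format. The sketch only shows agents and lenders advertising to a matchmaker, which then answers “Does Y satisfy X?”.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical model, not Condor's format: an ad is a name, a dict of
# attributes, and a predicate describing what it wants from the other side.
@dataclass
class Ad:
    name: str
    attrs: dict
    wants: Callable[[dict], bool] = lambda other: True

@dataclass
class Matchmaker:
    requests: List[Ad] = field(default_factory=list)  # agents: "I need X"
    offers: List[Ad] = field(default_factory=list)    # lenders: "I have Y"

    def advertise_request(self, ad: Ad) -> None:
        self.requests.append(ad)

    def advertise_offer(self, ad: Ad) -> None:
        self.offers.append(ad)

    def match(self, request: Ad) -> Optional[Ad]:
        # "Does Y satisfy X?": return the first offer the request accepts.
        for offer in self.offers:
            if request.wants(offer.attrs):
                return offer
        return None

mm = Matchmaker()
mm.advertise_offer(Ad("idle.cse.nd.edu", {"Arch": "INTEL", "OpSys": "LINUX"}))
need = Ad("needy.cse.nd.edu", {}, lambda y: y.get("Arch") == "INTEL")
mm.advertise_request(need)
print(mm.match(need).name)  # idle.cse.nd.edu
```

This check is one-way. The ClassAd slide below shows that in practice both sides state Requirements.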
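6. Matching & Claiming
[Diagram: the matchmaker tells the agent (needy.cse.nd.edu) “Use idle.cse.nd.edu” and tells the lender (idle.cse.nd.edu) “Listen for needy.cse.nd.edu”. The agent then asks the lender directly, “Are you still available?”, and the lender answers “Yes.”]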
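The claiming step can be sketched as follows. This is again a hypothetical toy rather than Condor's implementation: the `Resource` class, its `owner_active` flag, and the `claim` method are assumptions made for illustration.

The point is that the resource itself re-checks the match when the agent arrives, so a stale match is refused. Once the claim succeeds, the matchmaker is no longer involved. This is one reason a matchmaker failure need not disturb claims that already exist.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resource:
    name: str
    expected_agent: Optional[str] = None  # set when the matchmaker says "Listen for ..."
    claimed_by: Optional[str] = None
    owner_active: bool = False            # hypothetical flag: owner is back at the keyboard

    def listen_for(self, agent: str) -> None:
        self.expected_agent = agent

    def claim(self, agent: str) -> bool:
        # "Are you still available?" The resource re-checks its own state,
        # because the matchmaker's view may be stale by the time the agent calls.
        if agent != self.expected_agent or self.claimed_by is not None or self.owner_active:
            return False
        self.claimed_by = agent
        return True

idle = Resource("idle.cse.nd.edu")
idle.listen_for("needy.cse.nd.edu")    # matchmaker -> lender
print(idle.claim("needy.cse.nd.edu"))  # agent -> lender directly; prints True
```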
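7. Deploying / Executing
[Diagram: the agent (needy.cse.nd.edu) forks a shadow (“Fork!”), which tells the lender (idle.cse.nd.edu) “Run job J.” The lender runs J in a sandbox. When J needs a file (“I need file /tmp/foo.”), the request goes back to the shadow. This arrangement is split execution.]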
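Split execution can be illustrated with an in-process Python sketch. It assumes a hypothetical `Shadow` that holds the submitter's files and a `Sandbox` that runs the job without direct access to them. A real deployment crosses a network between the two machines. Here the “remote” file request is just a method call, so the sketch shows only the division of labor.

```python
from typing import Callable, Dict, List

class Shadow:
    """Agent-side helper (hypothetical): owns the submitter's files, modelled in memory."""
    def __init__(self, files: Dict[str, str]):
        self.files = files
        self.log: List[str] = []

    def handle_open(self, path: str) -> str:
        # Serve a file request that originated inside the remote sandbox.
        self.log.append(path)
        return self.files[path]

class Sandbox:
    """Resource-side environment (hypothetical): runs the job, forwards file I/O."""
    def __init__(self, shadow: Shadow):
        self.shadow = shadow

    def run(self, job: Callable[[Callable[[str], str]], str]) -> str:
        # The job gets an "open" function that is really a call back to the shadow.
        return job(self.shadow.handle_open)

def job_j(open_file: Callable[[str], str]) -> str:
    data = open_file("/tmp/foo")  # "I need file /tmp/foo."
    return data.upper()

shadow = Shadow({"/tmp/foo": "hello"})
result = Sandbox(shadow).run(job_j)
print(result)  # HELLO
```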
8. Matching
- How are matches determined?
  - Policy
  - ClassAds
- Why claim a match independently of the matchmaker?
  - What if the matchmaker dies?
9. ClassAds
Job ad:
  MyType = "Job"
  TargetType = "Machine"
  Requirements = ((other.Arch == "INTEL" && other.OpSys == "LINUX" && KeyboardIdle > 600))
  Cmd = "/tmp/a.out"
  Owner = "cmoretti"
Machine ad:
  MyType = "Machine"
  TargetType = "Job"
  Machine = "dustpuppy.cse.nd.edu"
  Requirements = ((KeyboardIdle > 600))
  Arch = "INTEL"
  OpSys = "LINUX"
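The two ads above can be mimicked in Python to show the two-way match. The job's Requirements are evaluated against the machine ad, the machine's Requirements are evaluated against the job ad, and both must be true.

This is a hypothetical stand-in, not the ClassAd language. Two assumptions are labeled in the code:
- the machine's `KeyboardIdle` value, which the slide does not show;
- the rule that an unscoped name such as `KeyboardIdle` resolves in the ad itself first and then in the other ad.

```python
def lookup(name, my, other):
    # Assumption: an unscoped name is looked up in the ad itself first,
    # then in the other ad; an undefined attribute becomes None here.
    return my[name] if name in my else other.get(name)

job = {
    "MyType": "Job",
    "TargetType": "Machine",
    "Cmd": "/tmp/a.out",
    "Owner": "cmoretti",
    # other.Arch == "INTEL" && other.OpSys == "LINUX" && KeyboardIdle > 600
    "Requirements": lambda my, other: (
        other.get("Arch") == "INTEL"
        and other.get("OpSys") == "LINUX"
        and (lookup("KeyboardIdle", my, other) or 0) > 600
    ),
}

machine = {
    "MyType": "Machine",
    "TargetType": "Job",
    "Machine": "dustpuppy.cse.nd.edu",
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "KeyboardIdle": 1200,  # hypothetical value; not shown on the slide
    # KeyboardIdle > 600
    "Requirements": lambda my, other: (lookup("KeyboardIdle", my, other) or 0) > 600,
}

def matches(a, b):
    # Two-way match: the types must line up and BOTH Requirements must hold.
    return (
        a["TargetType"] == b["MyType"]
        and b["TargetType"] == a["MyType"]
        and a["Requirements"](a, b)
        and b["Requirements"](b, a)
    )

print(matches(job, machine))                         # True
print(matches(job, dict(machine, KeyboardIdle=30)))  # False: keyboard recently used
```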
10. Flocking
- Using another pool’s resources to:
  - utilize more total resources
  - find resources that match needs
- Two methods:
  - Gateway flocking
  - Direct flocking
11. Gateway Flocking
- Each pool has a known “gateway”
- Gateways negotiate sharing:
  - advertise resources and needs
  - transmit requests to the local matchmaker
- Pool-level granularity:
  - accounting
  - policy
- Now obsolete
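Here is a simplified sketch of gateway flocking. It assumes a hypothetical `Gateway` class in which each pool's matchmaker is reduced to a list of idle machines. It shows two properties from the slide:
- requests cross between pools only through gateways;
- the lending pool sees, and accounts for, the requesting pool as a whole rather than the individual user.

The negotiation between gateways and the advertising of resources are not modelled.

```python
from typing import Dict, List, Optional

class Gateway:
    """Hypothetical gateway for one pool. Its local matchmaker is modelled as a
    list of idle machines; peers are the gateways of other pools."""

    def __init__(self, pool: str, idle: List[str]):
        self.pool = pool
        self.idle = list(idle)
        self.peers: List["Gateway"] = []
        self.lent: Dict[str, int] = {}  # accounting is kept per pool, not per user

    def local_match(self) -> Optional[str]:
        return self.idle.pop(0) if self.idle else None

    def serve_remote(self, from_pool: str) -> Optional[str]:
        # The request enters this pool's matchmaker on behalf of the whole
        # requesting pool; the individual agent behind it is not visible here.
        machine = self.local_match()
        if machine is not None:
            self.lent[from_pool] = self.lent.get(from_pool, 0) + 1
        return machine

    def find(self) -> Optional[str]:
        # Try the local pool first; otherwise ask each peer gateway in turn.
        machine = self.local_match()
        if machine is not None:
            return machine
        for peer in self.peers:
            machine = peer.serve_remote(self.pool)
            if machine is not None:
                return machine
        return None

a = Gateway("A", [])
b = Gateway("B", ["b1"])
a.peers.append(b)
print(a.find(), b.lent)  # b1 {'A': 1}
```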
12. Gateway Flocking
[Diagram: agent (A), matchmaker (MM), gateway, and resources (R), with the steps numbered 1 to 5.]
13. Direct Flocking
- Agents report to other matchmakers
- No gateways
- Equivalent to being in multiple pools?
- Now the preferred (only) method
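Direct flocking can be sketched as an agent that holds its own ordered list of matchmakers and tries each in turn, home pool first. The `flock` function and the model of a matchmaker as a plain function are hypothetical. HTCondor configures a comparable list of remote pools through its `FLOCK_TO` setting.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: a matchmaker is modelled as a function from a job id to a
# matched machine name, or None if it has nothing suitable.
Matchmaker = Callable[[str], Optional[str]]

def flock(job: str, pools: List[Tuple[str, Matchmaker]]) -> Optional[Tuple[str, str]]:
    # The agent advertises directly to each matchmaker on its list, home pool
    # first. No gateway is involved: each pool deals with this agent individually.
    for pool_name, matchmaker in pools:
        machine = matchmaker(job)
        if machine is not None:
            return pool_name, machine
    return None

def home(job: str) -> Optional[str]:
    return None  # home pool: nothing idle right now

def remote(job: str) -> Optional[str]:
    return "r7"  # hypothetical remote pool with a free machine

print(flock("J", [("home", home), ("remote", remote)]))  # ('remote', 'r7')
```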
14. Direct Flocking
[Diagram: agent (A), matchmaker (MM), and resources (R), with no gateway; steps numbered 1 to 3.]
15. Flocking Comparison
Gateway Flocking                      | Direct Flocking
+ Transparency                        | + No gateways
+ Fosters organization-level sharing  | + Individual relationships supported
- Poor accounting                     | - Non-transparent
- Complicated                         | - Fewer organization-level agreements
16. Things Aren’t Perfect
What happens if (when):
- the matchmaker goes down?
- the network or agent fails during deployment?
- the resource or application fails during computation?
Non-dedicated machines:
- How do we keep owners happy?
- What happens when an owner reclaims a resource?
17. Total Consumption in 2006
Condor at Notre Dame
Category                          CPU-hours   Share
Harnessed by Condor               1,161,176    48%
Totally unused                      934,277    39%
Consumed by owner at keyboard       281,003    11%
Total                             2,376,456   100%
http://www.cse.nd.edu/~ccl/operations/condor/2005/users.html
Source: Douglas Thain, “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”
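The 2006 figures can be checked with a few lines of Python. The three categories sum to the stated total. The slide's percentages appear to be truncated to whole numbers rather than rounded: for example, 48.9% is shown as 48%.

```python
hours = {
    "Harnessed by Condor": 1161176,
    "Totally unused": 934277,
    "Consumed by owner at keyboard": 281003,
}
total = sum(hours.values())
print(total)  # 2376456

for category, h in hours.items():
    share = 100 * h / total
    # The slide's whole-number percentages equal these shares truncated.
    print(f"{category}: {share:.1f}% (slide: {int(share)}%)")
```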
18. Current Donors, Feb 2007
Owner            Nodes   CPUs   Storage (TB)
CRC/OIT             92            3.7
CSE                 73    124    11.7
Prof. Thain         59     91     5.5
Prof. Flynn         18     35     0.65
Prof. Striegel      10     20     0.65
Misc                 7     17
Total              259    379    20.2
Source: Douglas Thain, “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”
19. CPU History
[Figure: CPU history]
Source: Douglas Thain, “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”
20. Recap
- Condor facilitates distributed computation on dedicated or scavenged CPUs, arranged by a matchmaker using ClassAds.
- Split execution is necessary to fit the job’s needs to the environment.
- An agent can advertise to multiple matchmakers to examine more potential matches.