Building Grids with Condor
Alan De Smet
Computer Sciences Department
University of Wisconsin-Madison
What are Grids?
Grids allow access to distributed, remote compute cycles:
send the input and executable over,
run the executable,
pull the output back.
Ask 10 people, get 10 different answers. Here’s mine...
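The three steps above can be mimicked on a single machine. This is a minimal sketch, not how any grid middleware actually moves files: a temporary directory stands in for the remote machine, plain file copies stand in for the network transfer, and the "executable" is assumed to be a Python script. The function name `run_remotely` is hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_remotely(executable: Path, inputs: list[Path], outputs: list[str]) -> dict[str, str]:
    """Mimic the three grid steps locally (hypothetical helper).

    A temporary directory stands in for the remote machine; no network is used.
    """
    with tempfile.TemporaryDirectory() as scratch:
        remote = Path(scratch)
        # 1. Send the input and executable over.
        shutil.copy(executable, remote / executable.name)
        for f in inputs:
            shutil.copy(f, remote / f.name)
        # 2. Run the executable (assumed to be a Python script) in the "remote" directory.
        subprocess.run([sys.executable, executable.name], cwd=remote, check=True)
        # 3. Pull the named output files back before the scratch space is deleted.
        return {name: (remote / name).read_text() for name in outputs}
```

A real grid adds everything this sketch leaves out: authentication, an unreliable network, and a queue that waits for a free machine.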
Grids typically...
have multiple sites with multiple administrative domains (different people, different rules and policies);
use the public internet, which is unreliable and insecure;
hand jobs off to a local batch system such as PBS, LSF, or Condor.
Archetypical Grid
[Diagram labels: Submit Node, Queue (Job 1, Job 2, …), Internet, Firewall, Remote Site, Head Node, Batch System, Execute Node]
Submitter: a human being with some work to do.
Submit node: the machine on which the submitter submits work.
Submit queue: tracks jobs for the submitter.
Remote site: non-local resources where your job can potentially run.
Head node: the machine that acts as the front end to a remote site; it accepts jobs on the remote site's behalf.
Execute node: where the job actually runs.
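To make the roles in the legend concrete, here is a toy model of a job's path through them. It is a sketch under obvious simplifying assumptions: jobs are strings, the site's batch system is reduced to round-robin placement inside `HeadNode`, and the class names simply mirror the legend rather than any real API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExecuteNode:
    """Where a job actually runs."""
    name: str

    def run(self, job: str) -> str:
        return f"{job} ran on {self.name}"


@dataclass
class HeadNode:
    """Front end to a remote site: accepts jobs on the site's behalf."""
    execute_nodes: list[ExecuteNode]
    _next: int = 0

    def accept(self, job: str) -> str:
        # Stand-in for the site's batch system (PBS, LSF, Condor, ...): round-robin.
        node = self.execute_nodes[self._next % len(self.execute_nodes)]
        self._next += 1
        return node.run(job)


@dataclass
class SubmitNode:
    """Holds the submit queue that tracks jobs for the submitter."""
    head_node: HeadNode
    queue: deque = field(default_factory=deque)

    def submit(self, job: str) -> None:
        self.queue.append(job)

    def dispatch_all(self) -> list[str]:
        # Send every queued job "across the internet" to the remote head node.
        results = []
        while self.queue:
            results.append(self.head_node.accept(self.queue.popleft()))
        return results
```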
Grids with Condor
Distributed Condor pools
Condor Flocking
Condor to Globus: Condor-G
Condor to Condor: Condor-C
Distributed pools with Condor
Simply have a single large Condor pool spanning multiple sites.
A variety of work has gone into better supporting the public internet:
encryption and authentication;
a starter and shadow that tolerate disconnection;
TCP instead of UDP for updates.
Distributed pools with Condor
[Diagram labels: Condor Pool, Submit Node, Queue (Job 1, Job 2, …), Internet, Execute Node]
You can certainly use Condor this way. Note the lack of a head node or a distinct “remote site”.
Distributed pools with Condor: Advantages
Unified system (all Condor)
Strong matchmaking capabilities: jobs are matched directly to execute nodes
Everything else that's great about Condor (see the rest of the talks for details)
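Condor's matchmaking is two-sided: a job and a machine each advertise attributes and a `Requirements` expression, a match needs both sides to accept, and the job's `Rank` picks among acceptable machines. The sketch below is a simplified illustration of that idea, assuming Python callables in place of real ClassAd expressions and ignoring the machine's own `Rank`; `matches` and `best_match` are hypothetical helpers, and the attribute values are examples only.

```python
from typing import Any, Optional

# A ClassAd-like advertisement: plain attributes plus Requirements/Rank callables.
# (Real ClassAds are expressions in their own language; callables are a stand-in.)
Ad = dict[str, Any]


def matches(job: Ad, machine: Ad) -> bool:
    """Two-way match: each side's Requirements must accept the other side."""
    return job["Requirements"](machine) and machine["Requirements"](job)


def best_match(job: Ad, machines: list[Ad]) -> Optional[Ad]:
    """Return the acceptable machine the job ranks highest, or None if none match."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=job["Rank"])
```

For example, a job requiring at least 1024 MB that ranks machines by `Memory` picks the largest machine whose owner also accepts it.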
Distributed pools with Condor: Disadvantages
Requires coordination between sites.
Weak with firewalls and network address translation (NAT); solutions such as Generic Connection Brokering (GCB) exist.
Centralized point of failure: a network outage at the central manager will stop jobs from starting.
These disadvantages are specific to running Condor as a large pool distributed across multiple sites; they don't apply to Condor as a local batch system.
Condor Flocking
A Condor submit node (condor_schedd) works with multiple Condor pools.
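One way to picture flocking: the schedd keeps a single queue and, when its local pool has no room, tries the other pools it is configured to flock to, in order. The sketch below is a simplified model (Condor's real negotiation is more involved); the pool names and `place_jobs` are hypothetical, and the ordered `flock_to` list plays the role of the schedd's FLOCK_TO setting.

```python
from typing import Optional


def place_jobs(jobs: list[str], local_pool: str, flock_to: list[str],
               free_slots: dict[str, int]) -> dict[str, Optional[str]]:
    """Assign each job to the first pool with a free slot (hypothetical model).

    The local pool is always tried first, then the remote pools in flock_to
    order. Jobs that fit nowhere stay idle in the schedd's queue (None).
    """
    placement: dict[str, Optional[str]] = {}
    for job in jobs:
        placement[job] = None
        for pool in [local_pool, *flock_to]:
            if free_slots.get(pool, 0) > 0:
                free_slots[pool] -= 1  # claim one slot in that pool
                placement[job] = pool
                break
    return placement
```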
Condor Flocking
[Diagram labels: Local Condor Pool, Remote Condor Pool, Submit Node, Queue (Job 1, Job 2, …), Internet, Execute Node, Batch System]
Condor Flocking Advantages
Unified system (all Condor)
Strong matchmaking capabilities: jobs are matched directly to execute nodes
Slightly less coordination between sites required
Sites can have different policies
Condor Flocking Disadvantages
As with a large distributed pool, weak with firewalls and network address translation (NAT)
Network-connection intensive
More complex than a single pool
Globus
Globus provides a remote front end to multiple batch systems, in addition to other functionality.
Globus
[Diagram labels: Submit Node, Internet, Firewall, Remote Site, Head Node, Batch System, Execute Node]
Note the lack of a queue.
Globus Advantages
Standard and widely used; a variety of tools built on top of Globus are available.
Can speak to a variety of batch systems (Condor, PBS, LSF, etc.), so each site can run its own system.
Note: I’m only discussing Globus as a way to distribute jobs at this point; I’m ignoring the other functionality Globus provides.
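Globus Disadvantages
Minimal: designed to be a lower layer, with simple command-line tools.
No job tracking, no matchmaking, no recovery from errors.
Must configure Globus in addition to the site's batch system.
Strictly remote side: Globus runs at the remote site, not on the submitter's machine.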
Condor-G
Condor can provide an interface and job queue for Globus: Condor-G.
universe = grid
grid_type = gt2 (or gt3, or gt4)
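A complete Condor-G submit description needs more than the two lines above. The helper below writes one out as text. It is a sketch: `condor_g_submit` and the gatekeeper name are hypothetical, and naming the remote gatekeeper with the `globusscheduler` command is our assumption about Condor 6.7-era submit files (later Condor versions use `grid_resource` instead). The contact-string format also differs between gt2, gt3, and gt4, so check the manual for your version.

```python
def condor_g_submit(executable: str, gatekeeper: str, grid_type: str = "gt2") -> str:
    """Return the text of a Condor-G submit description (hypothetical helper)."""
    if grid_type not in ("gt2", "gt3", "gt4"):
        raise ValueError(f"unsupported grid_type: {grid_type}")
    lines = [
        "universe = grid",
        f"grid_type = {grid_type}",
        # Assumption: 6.7-era syntax for naming the remote Globus gatekeeper.
        f"globusscheduler = {gatekeeper}",
        f"executable = {executable}",
        "output = job.out",
        "error = job.err",
        "log = job.log",  # the user log records the job's progress
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

Usage: `print(condor_g_submit("analysis", "gatekeeper.example.edu/jobmanager-pbs"))` produces a file you could hand to condor_submit after adjusting it for your site.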
Condor-G
[Diagram labels: Submit Node, Queue (Job 1, Job 2, …), Internet, Firewall, Remote Site, Head Node (Globus), Batch System, Execute Node]
Condor-G acts as the queue.
Condor-G Advantages
Interoperable with, and builds on the strengths of, Globus
Provides a persistent submit queue
Attempts to recover from errors automatically
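Persistence and automatic recovery are what Condor-G adds on the submit side. This toy `PersistentQueue` is a hypothetical illustration of the idea, not Condor-G's implementation: every state change is written to a JSON file so a restarted submitter picks up where it left off, and each job is resent up to `max_attempts` times before being marked held. The `send` callback stands in for handing the job to a remote Globus gatekeeper.

```python
import json
from pathlib import Path
from typing import Callable


class PersistentQueue:
    """Toy submit queue whose state survives restarts (hypothetical design)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload any jobs a previous run left behind.
        self.jobs: dict[str, dict] = json.loads(path.read_text()) if path.exists() else {}

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.jobs))

    def submit(self, job_id: str) -> None:
        self.jobs[job_id] = {"status": "idle", "attempts": 0}
        self._save()

    def run_pending(self, send: Callable[[str], bool], max_attempts: int = 3) -> None:
        """Try each idle job, resending on failure; hold it after max_attempts."""
        for job_id, job in self.jobs.items():
            while job["status"] == "idle" and job["attempts"] < max_attempts:
                job["attempts"] += 1
                if send(job_id):
                    job["status"] = "completed"
                self._save()  # persist after every attempt
            if job["status"] == "idle":
                job["status"] = "held"  # stop retrying; leave it for a human to inspect
                self._save()
```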
Condor-G Disadvantages
Must configure Condor-G in addition to Globus.
Strictly submitter side: the remote side doesn’t need to know.
Condor-G Status and News
Globus Toolkit 2 is stable.
Globus Toolkit 3 is supported, but we think most people are moving to…
Globus Toolkit 4, which is in progress:
the GT4 beta works now in Condor 6.7.6;
Condor will officially support GT4 soon after the official GT4 release.
GT4 is tentatively scheduled for late March or April.
Condor-C
Condor handing jobs off to Condor.
universe = grid
grid_type = condor
Once handed off, the job behaves like a normal Condor job.
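A toy model of the hand-off: the submitting schedd copies the job into a second schedd's queue, where it is just an ordinary job, and keeps a placeholder recording where the job went. Everything here (`Schedd`, `hand_off`, the host names) is hypothetical and ignores authentication, file transfer, and status updates flowing back to the submitter.

```python
from dataclasses import dataclass, field


@dataclass
class Schedd:
    """A toy job queue; real schedds do far more (hypothetical model)."""
    name: str
    queue: dict[int, dict] = field(default_factory=dict)
    _next_id: int = 1

    def submit(self, job: dict) -> int:
        job_id = self._next_id
        self._next_id += 1
        self.queue[job_id] = dict(job, status="idle")
        return job_id

    def hand_off(self, job_id: int, remote: "Schedd") -> int:
        """Copy a grid-universe job into another schedd's queue."""
        job = self.queue[job_id]
        # Drop the routing attributes: on the remote side it is an ordinary job.
        routing = {"universe", "grid_type", "status"}
        remote_id = remote.submit({k: v for k, v in job.items() if k not in routing})
        # Keep a local placeholder that records where the job went.
        job["status"] = "handed_off"
        job["remote"] = (remote.name, remote_id)
        return remote_id
```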
Condor-C
[Diagram labels: Submit Node, Queue, Internet, Firewall, Remote Site, Head Node, Job 1, Job 2, …, Batch System, Execute Node]
Condor-C Advantages
Unified system (all Condor)
Relatively easy to configure if you’re already using Condor
Can optionally speak to a variety of batch systems, so each site can run its own system: PBS, LSF, etc.
Condor-C is Flexible
A general way to redistribute Condor work between schedds.
Overloaded schedd? Fan the work out.
[Diagram labels: schedd, startd, schedd, startd]
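Fanning work out of an overloaded schedd is, at heart, a load-balancing decision. A minimal sketch, assuming a greedy least-loaded policy (the slide does not say which policy Condor-C uses) and hypothetical schedd names: each job goes to whichever schedd currently has the shortest queue.

```python
import heapq


def fan_out(jobs: list[str], schedd_loads: dict[str, int]) -> dict[str, str]:
    """Greedy least-loaded assignment of jobs to schedds (hypothetical policy)."""
    # Min-heap of (queued jobs, schedd name); ties break on the name.
    heap = [(load, name) for name, load in schedd_loads.items()]
    heapq.heapify(heap)
    assignment: dict[str, str] = {}
    for job in jobs:
        load, name = heapq.heappop(heap)
        assignment[job] = name
        heapq.heappush(heap, (load + 1, name))  # that schedd now has one more job
    return assignment
```

With an overloaded schedd holding 100 jobs and two idle ones, new work lands only on the idle schedds until their queues grow comparably long.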
Condor-C Disadvantages
Work in progress.
Not yet ready for multi-user environments; expected for Condor 6.8.0.
No strong security yet.
Speaking to other batch systems is very new and not yet distributed.
Condor-C
Available for evaluation in Condor 6.7
First stable release in Condor 6.8
The End?
More on Wednesday, at the Computer Science Building:
Demos, 9:00AM to Noon: Condor-C (Alan De Smet, room 4247); Condor-G (Jaime Frey, room 4254)
Birds of a Feather discussion, 1:00PM to 2:30PM, room 4331
Want gory technical details? Want to discuss how you might use one of these systems?
Map to Computer Science in your packets