Building Grids with Condor

1 Building Grids with Condor
Alan De Smet
Computer Sciences Department
University of Wisconsin-Madison

2 What are Grids?
Grids allow access to distributed, remote compute cycles: send the input and executable over, run the executable, and pull the output back. Ask 10 people and you'll get 10 different answers. Here's mine...
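In Condor terms, that send/run/retrieve cycle is what a submit description file asks for. The sketch below is a minimal vanilla-universe example, assuming a hypothetical executable `analyze` and input file `input.dat`; the file-transfer keywords are standard Condor submit commands, but check your release's manual before relying on their exact defaults.

```text
# Hypothetical job: run "analyze" on "input.dat" on some other machine.
universe                = vanilla
executable              = analyze
arguments               = input.dat

# Send the input file over along with the executable...
should_transfer_files   = YES
transfer_input_files    = input.dat
# ...and pull the output back when the job exits.
when_to_transfer_output = ON_EXIT

output = analyze.out
error  = analyze.err
log    = analyze.log
queue
```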

3 Grids typically...
...have multiple sites with multiple administrative domains (different people, different rules and policies), use the public internet (unreliable and insecure), and hand jobs off to a local batch system (PBS, LSF, Condor, etc.).

4 Archetypical Grid
[Diagram: Submit Node with Queue (Job 1, Job 2), Internet, Firewall, Remote Site with Head Node, Batch System, and Execute Node]
Submitter: a human being with some work to do.
Submit node: the machine on which the submitter submits work.
Submit queue: tracks jobs for the submitter.
Remote site: non-local resources where your job can potentially run.
Head node: the machine that acts as the front end to a remote site; it accepts jobs on the remote site's behalf.
Execute node: where the job actually runs.

5 Grids with Condor
Distributed Condor pools
Condor flocking
Condor to Globus: Condor-G
Condor to Condor: Condor-C

6 Distributed pools with Condor
Simply have a single large Condor pool spanning multiple sites. A variety of work has gone into better supporting the public internet: encryption and authentication, letting the starter and shadow survive a disconnection, and TCP instead of UDP for updates.
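A minimal sketch of what that looks like in each machine's condor_config, assuming a hypothetical central manager `cm.example.edu` shared by every site. The SEC_DEFAULT_* settings are Condor's standard security knobs; `UPDATE_COLLECTOR_WITH_TCP` is the knob name used in later Condor manuals for TCP updates and is an assumption for any particular 6.7 release.

```text
# Every machine at every site reports to the same central manager.
# (cm.example.edu is a hypothetical host name.)
CONDOR_HOST = cm.example.edu

# Traffic crosses the public internet: require authentication and
# encryption. SEC_DEFAULT_AUTHENTICATION_METHODS must also name a
# method (for example GSI or KERBEROS) that every site supports.
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_ENCRYPTION     = REQUIRED

# Assumption: send ClassAd updates to the collector over TCP instead
# of UDP; check that your Condor release supports this knob.
UPDATE_COLLECTOR_WITH_TCP = True
```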

7 Distributed pools with Condor
[Diagram: Condor Pool, with Submit Node and Queue (Job 1, Job 2), Internet, and Execute Node]
You can certainly use Condor this way. Note the lack of a head node or a distinct “remote site”.

8 Distributed pools with Condor: Advantages
Unified system (all Condor). Strong matchmaking capabilities: jobs are matched directly to execute nodes. Everything else that's great about Condor (see the rest of the talks for details).
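Matchmaking is driven by expressions in the job's submit description. A minimal sketch, assuming a hypothetical `analyze` executable; `OpSys`, `Arch`, and `Memory` (in megabytes) are standard machine ClassAd attributes, but the particular values are only illustrative.

```text
universe     = vanilla
executable   = analyze

# Only machines whose ClassAds satisfy this expression can match.
requirements = (OpSys == "LINUX") && (Arch == "INTEL") && (Memory >= 512)
# Among matching machines, prefer those with more memory.
rank         = Memory

log   = analyze.log
queue
```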

9 Distributed pools with Condor: Disadvantages
Requires coordination between sites. Weak with firewalls and network address translation (NAT), though solutions such as Generic Connection Brokering (GCB) exist. Centralized point of failure: a network outage at the central manager will stop jobs from starting. These disadvantages are specific to running Condor as a single large pool distributed across multiple sites; they don't apply to Condor as a local batch system.

10 Condor Flocking
A Condor submit node (condor_schedd) works with multiple Condor pools.
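Flocking is set up in configuration on both sides. A minimal sketch, assuming hypothetical hosts `submit.site-a.example.edu` (the submit node) and `cm.site-b.example.edu` / `cm.site-c.example.edu` (remote central managers). FLOCK_TO and FLOCK_FROM are standard Condor configuration macros; how FLOCK_FROM feeds into a remote pool's host authorization depends on that pool's security settings.

```text
# --- condor_config on the submit node (site A) ---
# Central managers of the pools this schedd may flock to, tried in
# order when the local pool cannot run all of its idle jobs.
FLOCK_TO = cm.site-b.example.edu, cm.site-c.example.edu

# --- condor_config on the machines of each remote pool (sites B, C) ---
# Submit nodes allowed to flock jobs into this pool. The default
# configuration files reference FLOCK_FROM in their HOSTALLOW_*
# settings; if you have customized those, add it there yourself.
FLOCK_FROM = submit.site-a.example.edu
```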

11 Condor Flocking
[Diagram: Local Condor Pool and Remote Condor Pool, with Submit Node and Queue (Job 1, Job 2), Internet, Execute Node, and Batch System]

12 Condor Flocking Advantages
Unified system (all Condor). Strong matchmaking capabilities: jobs are matched directly to execute nodes. Slightly less coordination between sites is required. Sites can have different policies.
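Because each pool keeps its own configuration, each site can set its own policy for which jobs its machines will run. A minimal sketch for a hypothetical remote pool that only starts jobs on machines whose keyboards have been idle for 15 minutes; `START` and `KeyboardIdle` (in seconds) are standard Condor names, and the threshold is just an example.

```text
# condor_config on a remote pool's execute nodes (hypothetical policy):
# start a job only after the keyboard has been idle for 15 minutes.
START = KeyboardIdle > 15 * 60
```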

13 Condor Flocking Disadvantages
As with a large distributed pool, weak with firewalls and network address translation (NAT). Network-connection intensive. More complex than a single pool.

14 Globus
Globus provides a remote front end to multiple batch systems, in addition to other functionality.

15 Globus
[Diagram: Submit Node, Internet, Firewall, Remote Site with Head Node, Batch System, and Execute Node]
Note the lack of a queue.

16 Globus Advantages
Standard and widely used. A variety of tools built on top of Globus are available. Can speak to a variety of batch systems (Condor, PBS, LSF, etc.), so each site can run its own system.
Note: I'm only discussing Globus as a way to distribute jobs at this point; I'm ignoring the other functionality Globus provides.

17 Globus Disadvantages
Minimal: designed to be a lower layer. Simple command-line tools, with no job tracking, no matchmaking, and no recovery from errors. Must configure Globus in addition to the local batch system. Strictly remote side.

18 Condor-G
Condor can provide an interface and job queue for Globus: Condor-G.
universe = grid
grid_type = gt2 (or gt3, or gt4)
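A minimal Condor-G submit description, assuming a hypothetical GT2 gatekeeper at `head.remote.example.edu` fronting a PBS batch system. `globusscheduler` names the Globus resource in the 6.7-era syntax shown on this slide; later Condor releases replace both it and `grid_type` with a single `grid_resource = gt2 ...` line.

```text
universe        = grid
grid_type       = gt2
# Hypothetical gatekeeper contact string: host/jobmanager-<batch system>
globusscheduler = head.remote.example.edu/jobmanager-pbs

executable = analyze
output     = analyze.out
error      = analyze.err
# The user log lets Condor-G, and you, track the job in the local queue.
log        = analyze.log
queue
```

Submitting still requires a valid Globus proxy, typically created beforehand with grid-proxy-init.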

19 Condor-G
[Diagram: Submit Node with Queue (Job 1, Job 2), Internet, Firewall, Remote Site with Head Node (Globus), Batch System, and Execute Node]
Condor-G acts as the queue.

20 Condor-G Advantages
Interoperable with, and builds on the strengths of, Globus. Provides a persistent submit queue. Attempts to recover from errors automatically.

21 Condor-G Disadvantages
Must configure Condor-G in addition to Globus. Strictly submitter side: the remote side doesn't need to know about Condor-G.

22 Condor-G Status and News
Globus Toolkit 2 support is stable. Globus Toolkit 3 is supported, but we think most people are moving to… Globus Toolkit 4, whose support is in progress: the GT4 beta works now in Condor 6.7.6, and Condor will officially support GT4 soon after its official release. GT4 is tentatively scheduled for late March or April.

23 Condor-C
Condor handing jobs off to Condor.
universe = grid
grid_type = condor
Once handed off, the job behaves like a normal Condor job.
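A minimal Condor-C sketch, assuming hypothetical remote schedd and central manager names. The `remote_schedd` and `remote_pool` keywords are an assumption based on the Condor 6.7 evaluation releases; this interface was still changing, and later releases express the same thing as `grid_resource = condor <schedd> <pool>`, so check the manual for your version.

```text
universe      = grid
grid_type     = condor
# Assumption: 6.7-era keywords naming the schedd to hand the job to
# and the central manager of that schedd's pool (hypothetical hosts).
remote_schedd = head.remote.example.edu
remote_pool   = cm.remote.example.edu

executable = analyze
output     = analyze.out
error      = analyze.err
log        = analyze.log
queue
```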

24 Condor-C
[Diagram: Submit Node with Queue (Job 1, Job 2), Internet, Firewall, Remote Site with Head Node, Batch System, and Execute Nodes]

25 Condor-C Advantages
Unified system (all Condor). Relatively easy to configure if you're already using Condor. Can optionally speak to a variety of batch systems, so each site can run its own system: PBS, LSF, etc.

26 Condor-C is Flexible
A general way to redistribute Condor work between schedds. Overloaded schedd? Fan the work out.
[Diagram: schedd, startd, schedd, startd]

27 Condor-C Disadvantages
Work in progress: not yet ready for multi-user environments (expected for Condor 6.8.0), and no strong security yet. Speaking to other batch systems is very new and not yet distributed.

28 Condor-C
Available for evaluation in Condor 6.7; first stable release in Condor 6.8.

29 The End?
More on Wednesday at the Computer Science Building.
Demos, 9:00AM to Noon: Condor-C (Alan De Smet, room 4247) and Condor-G (Jaime Frey, room 4254).
Birds of a Feather discussion, 1:00PM to 2:30PM, room 4331.
Want gory technical details? Want to discuss how you might use one of these systems? There's a map to Computer Science in your packets.

