Todd Tannenbaum Computer Sciences Department University of Wisconsin-Madison Condor RoadMap.


1 Condor RoadMap
Todd Tannenbaum
Computer Sciences Department, University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor

2 Outline
› The “Big Picture”
› Version 6.7.x
  - Availability
      Failover
  - Scalability
      Resources, jobs, matchmaking framework, files
  - Accessibility
      APIs, more Grid middleware, network

3 Big Picture: What do we want to achieve in a new Condor developer series?
› Technology Transfer
  - Building a bridge between the Condor production software development activity and the academic core research activity
      BAD-FS, Stork, Diskrouter, Parrot (transparent I/O), Schedd Glidein, VO Schedulers, HA, Management, Improved ClassAds…

4 What do we want to achieve, cont.
› New Ports: Go to where the cycles are!
  - The RedHat Dilemma
  - Our porting ‘hopper’:
      AIX 5.1L on the PowerPC architecture
      Redhat AS server on x86
      Fedora Core on x86
      Fedora Core 2 on x86
      Redhat AS server on AMD64
      SuSE 8.0 on AMD64
      Redhat AS server on IA64
      HPUX 11.11 64-bit

5 What do we want to achieve, cont.
› Improve existing ports
  - Move “clipped wing” ports to full ports (w/ checkpoint, process migration)
      Mac OS X, Windows
  - Better integration into environments
      Windows: operate better w/ DFS, use MSI
      Unix: operate w/ AFS

6 What do we want to achieve, cont.
› Address changes in the computing landscape
  - Firewalls, NATs
  - 64-bit operating systems
  - Emphasis on data
  - Movement towards standards such as WS, OGSA, …

7 Version 6.7.x Theme
› Version 6.7.x
  - Scalability
      Resources, jobs, matchmaking framework, security
  - Availability
      Failover
  - Accessibility
      APIs, more Grid middleware, network

8 High Availability in v6.7.x
What happens if my submit machine reboots?
Once upon a time, only one answer: job restarts.
Checkpoint? No Checkpoint?

9 New: Job progress continues if the connection is interrupted
› For Vanilla and Java universe jobs, Condor now supports reestablishment of the connection between the submitting and executing machines.
› To take advantage of this feature, put the following line into the job’s submit description file:
    JobLeaseDuration = <number of seconds>
  For example:
    JobLeaseDuration = 1200
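
The lease idea behind JobLeaseDuration can be illustrated with a small Python sketch. This is not Condor code: the JobLease class and its method names are hypothetical, and the model assumes the lease is simply a window of seconds, counted from the last contact with the submit machine, during which a reconnect is still accepted.

```python
from dataclasses import dataclass


@dataclass
class JobLease:
    """Hypothetical model of a job lease held on the execute machine."""
    duration: int        # seconds, e.g. JobLeaseDuration = 1200
    last_contact: float  # time of the last contact from the submit machine

    def expired(self, now: float) -> bool:
        # The lease runs out once the submit side has been silent
        # for longer than the lease duration.
        return now - self.last_contact > self.duration

    def reconnect(self, now: float) -> bool:
        """Try to re-establish the connection; succeeds only within the lease."""
        if self.expired(now):
            return False  # too late: the job would have to restart
        self.last_contact = now  # contact renews the lease
        return True


lease = JobLease(duration=1200, last_contact=0.0)
within = lease.reconnect(now=900.0)     # submit machine back after 15 minutes
too_late = lease.reconnect(now=2200.0)  # silent for 1300 s since last contact
```

With a 1200-second lease, the first reconnect is accepted and the second is refused, which is the trade-off the lease value controls: a longer lease tolerates longer outages but keeps an orphaned job running longer.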

10 What if the submission point spontaneously explodes?
(don’t try this at home)

11 More High Availability Solutions
› Condor can support a submit machine “hot spare”
  - If your submit machine is down for longer than N minutes, a second machine can take over
› Two mechanisms available
  - Job Mirroring
      Described by Jaime earlier today
  - High Availability Daemon Failover
      Just tell the condor_master to run ONE instance

12 Daemon Failover
[Diagram: Machine A and Machine B each run a Master and a SchedD. Labels: Active, (hot spare), Obtain Lock, Refresh Lock, Check Lock.]
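
The Obtain Lock / Refresh Lock / Check Lock cycle named on this slide can be sketched in Python. This is an illustration only, not how condor_master implements it: SharedLock, master_tick, and the 300-second staleness timeout are all assumptions, and a real lock on shared storage would need an atomic acquire step that this in-memory model skips.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedLock:
    """Hypothetical stand-in for a lock kept on storage both machines can see."""
    holder: Optional[str] = None
    refreshed_at: float = 0.0
    timeout: float = 300.0  # assumed: seconds before a lock counts as stale

    def is_stale(self, now: float) -> bool:
        return self.holder is None or now - self.refreshed_at > self.timeout


def master_tick(machine: str, lock: SharedLock, now: float) -> bool:
    """One polling step of a master; True means this machine runs the SchedD."""
    if lock.holder == machine:
        lock.refreshed_at = now  # Refresh Lock
        return True
    if lock.is_stale(now):       # Check Lock
        lock.holder = machine    # Obtain Lock
        lock.refreshed_at = now
        return True
    return False                 # remain the hot spare


lock = SharedLock()
a_active = master_tick("A", lock, now=0.0)   # A obtains the free lock
b_active = master_tick("B", lock, now=60.0)  # lock is fresh, B stays spare
# Machine A goes down and stops refreshing; B takes over once the lock is stale.
b_takeover = master_tick("B", lock, now=400.0)
```

The timeout plays the role of the "down for longer than N minutes" threshold: the spare never starts its own SchedD while the active machine keeps refreshing the lock.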

13 Accessibility
› Support for GCB
  - Condor working w/ NATs, Firewalls
› Distributed Resource Management Application API (DRMAA)
  - GGF Working Group
  - An API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems
  - Condor DRMAA interface to appear in v6.7.0

14 SOAP/Grid Service
[Diagram: condor_schedd with three interfaces labeled Cedar; OGSI: SOAP, HTTPG; Web Service: SOAP, HTTPS.]

15 New “Grid Universe”
› With the new Grid Universe, always specify a ‘gridtype’. So the old “globus” Universe is now declared as:
    universe = grid
    gridtype = gt2
› Other gridtypes? GT3 for the OGSA-based Globus Toolkit 3
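
For existing submit description files, the change from the old “globus” universe to the new declaration is a mechanical rewrite. The Python sketch below shows one way it might be done; the function to_grid_universe and the sample file contents are hypothetical, and the sketch only handles a plain "universe = globus" line.

```python
def to_grid_universe(submit_text: str, gridtype: str = "gt2") -> str:
    """Rewrite 'universe = globus' as 'universe = grid' plus a gridtype line."""
    out = []
    for line in submit_text.splitlines():
        key, sep, value = line.partition("=")
        is_universe = sep and key.strip().lower() == "universe"
        if is_universe and value.strip().lower() == "globus":
            out.append("universe = grid")
            out.append(f"gridtype = {gridtype}")
        else:
            out.append(line)  # every other line passes through unchanged
    return "\n".join(out)


# Hypothetical submit description used only for illustration.
old = "universe = globus\nexecutable = sim\nqueue"
new = to_grid_universe(old)
```

Here `new` holds the same file with the universe line replaced by the two-line form shown on the slide; the other lines are left as they were.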

16 Condor-G improvements
› Condor-G can submit to either Globus GT2 or GT3 resources, including support for GT3 with web services.
  - Condor-G includes everything required; no need for client to have a GT3 installation.
  - Good migration path to OGSA
› Condor-G to Nordugrid, Unicore, Condor, ORACLE
› Support for credential refresh via the MyProxy Online Credential Management in NMI
    http://grid.ncsa.uiuc.edu/myproxy/

17 Why Condor + MyProxy?
› Long-lived tasks or services need credentials
  - Task lifetime is difficult to predict
› Don’t want to delegate long-lived credentials
  - Fear of compromise
› Instead, renew credentials with MyProxy as needed during the task’s lifetime
  - Provides a single point of monitoring and control
  - Renewal policy can be modified at any time
      For example, disable renewals if compromise is detected or suspected

18 Credential Renewal
[Diagram: Condor-G Scheduler, MyProxy, Resource Manager, and Job, split into Home and Remote sides. Arrows labeled: Submit Jobs, Enable Renewal, Launch Job, Retrieve Credentials, Refresh Credentials.]
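
The renew-as-needed policy can be sketched in Python. Nothing here is the MyProxy or Condor-G API: RenewalPolicy, Credential, refresh_if_needed, the one-hour lifetime, and the 600-second threshold are all assumptions chosen to show the idea of short-lived credentials renewed under a policy that can be switched off centrally.

```python
from dataclasses import dataclass


@dataclass
class RenewalPolicy:
    """Hypothetical policy kept at the credential repository."""
    enabled: bool = True
    lifetime: int = 3600  # assumed: seconds granted per renewal


@dataclass
class Credential:
    expires_at: float


def refresh_if_needed(cred: Credential, policy: RenewalPolicy,
                      now: float, threshold: int = 600) -> bool:
    """Renew a short-lived credential shortly before it expires.

    Returns True if the credential was renewed.
    """
    if cred.expires_at - now > threshold:
        return False  # still valid for long enough
    if not policy.enabled:
        return False  # renewals disabled, e.g. compromise suspected
    cred.expires_at = now + policy.lifetime
    return True


policy = RenewalPolicy()
cred = Credential(expires_at=3600.0)
early = refresh_if_needed(cred, policy, now=1000.0)    # 2600 s left: no renewal
renewed = refresh_if_needed(cred, policy, now=3100.0)  # 500 s left: renew
policy.enabled = False                                 # compromise suspected
blocked = refresh_if_needed(cred, policy, now=6500.0)  # 200 s left, but disabled
```

Because the job only ever holds a credential that expires soon, disabling the policy is enough to cut it off: the last call is refused and the credential lapses on its own.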

19 More…
› Condor can now transfer job data files larger than 2 GB in size.
  - On all platforms that support 64-bit file offsets
› Real-time spooling of stdout/stderr/stdin in any universe, including Vanilla
  - Real-time monitoring of job progress
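
The 2 GB figure comes from the range of a signed 32-bit file offset. The short Python sketch below shows the arithmetic using the standard struct module; the helper name fits is ours and the 3 GiB value is just an example offset.

```python
import struct

# Largest offset a signed 32-bit integer can hold: 2 GiB minus one byte.
MAX_32BIT_OFFSET = 2**31 - 1


def fits(offset: int, fmt: str) -> bool:
    """Return True if `offset` can be packed with the given struct format."""
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        return False


three_gib = 3 * 2**30
fits_32 = fits(three_gib, "<i")  # signed 32-bit offset
fits_64 = fits(three_gib, "<q")  # signed 64-bit offset
```

An offset of 3 GiB does not fit the 32-bit format but fits the 64-bit one, which is why large-file support depends on the platform offering 64-bit file offsets.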

20 Thank you!

