Published by Darleen Patterson. Modified over 8 years ago.
1
Condor RoadMap
Todd Tannenbaum
Computer Sciences Department, University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
2
Outline
› The “Big Picture”
› Version 6.7.x
  › Availability: failover
  › Scalability: resources, jobs, matchmaking framework, files
  › Accessibility: APIs, more Grid middleware, network
3
Big Picture
What do we want to achieve in a new Condor developer series?
› Technology Transfer
  › Building a bridge between the Condor production software development activity and the academic core research activity
  › BAD-FS, Stork, Diskrouter, Parrot (transparent I/O), Schedd Glidein, VO Schedulers, HA, Management, Improved ClassAds…
4
What do we want to achieve, cont.
› New Ports: Go to where the cycles are!
  › The RedHat Dilemma
  › Our porting ‘hopper’:
    › AIX 5.1L on the PowerPC architecture
    › Redhat AS server on x86
    › Fedora Core on x86
    › Fedora Core 2 on x86
    › Redhat AS server on AMD64
    › SuSE 8.0 on AMD64
    › Redhat AS server on IA64
    › HPUX 11.11 64-bit
5
What do we want to achieve, cont.
› Improve existing ports
  › Move “clipped wing” ports to full ports (w/ checkpoint, process migration)
    › Mac OS X, Windows
  › Better integration into environments
    › Windows: operate better w/ DFS, use MSI
    › Unix: operate w/ AFS
6
What do we want to achieve, cont.
› Address changes in the computing landscape
  › Firewalls, NATs
  › 64-bit operating systems
  › Emphasis on data
  › Movement towards standards such as WS, OGSA, …
7
Version 6.7.x Theme
› Version 6.7.x
  › Scalability: resources, jobs, matchmaking framework, security
  › Availability: failover
  › Accessibility: APIs, more Grid middleware, network
8
High Availability in v6.7.x
› What happens if my submit machine reboots?
› Once upon a time, only one answer: job restarts.
  › Checkpoint?
  › No Checkpoint?
9
New: Job progress continues if connection is interrupted
› For Vanilla and Java universe jobs, Condor now supports reestablishment of the connection between the submitting and executing machines.
› To take advantage of this feature, put the following line into your job’s submit description file:
    JobLeaseDuration =
  For example:
    JobLeaseDuration = 1200
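The idea of a lease behind JobLeaseDuration can be illustrated with a toy model. This is only a sketch, not Condor's implementation: the class and method names are hypothetical, and it assumes the value 1200 is a number of seconds during which the executing machine keeps the job alive while waiting for the submitting machine to reconnect.

```python
from dataclasses import dataclass


@dataclass
class JobLease:
    """Toy model of a job lease (hypothetical names, not a Condor API)."""
    duration: int       # lease length, assumed to be in seconds
    last_contact: int   # time the submit machine was last heard from

    def renew(self, now: int) -> None:
        # Any successful contact from the submit machine restarts the lease.
        self.last_contact = now

    def expired(self, now: int) -> bool:
        # The job may keep running until the lease runs out without contact.
        return now - self.last_contact > self.duration


lease = JobLease(duration=1200, last_contact=0)
still_running = not lease.expired(now=900)      # connection lost, lease still valid
lease.renew(now=1000)                           # submit machine reconnects in time
after_reconnect = not lease.expired(now=2100)   # 1100 s since last contact
given_up = lease.expired(now=2300)              # 1300 s of silence exceeds the lease
```

In this model a longer lease tolerates longer outages, at the cost of a job running longer on a machine whose submitter may never come back.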
10
What if the submission point spontaneously explodes? (don’t try this at home)
11
More High Availability Solutions
› Condor can support a submit machine “hot spare”
  › If your submit machine is down for longer than N minutes, a second machine can take over
› Two mechanisms available
  › Job Mirroring
    › Described by Jaime earlier today
  › High Availability Daemon Failover
    › Just tell the condor_master to run ONE instance
12
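Daemon Failover
[Diagram: Machine A and Machine B (hot spare) each run a Master and a SchedD. The active machine's Master obtains the lock and keeps refreshing it; the hot spare's Master checks the lock, and once it obtains the lock it becomes active and refreshes it in turn.]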
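The obtain/refresh/check cycle in the diagram can be sketched as a small simulation. This is a toy model under stated assumptions, not Condor's code: SharedLock, Master, poll, and the 300-second hold time are all hypothetical, and the lock is an in-memory object standing in for a lock kept on storage both machines can reach.

```python
from typing import Optional


class SharedLock:
    """Stand-in for a lock on shared storage; it goes stale if not refreshed."""

    def __init__(self, hold_time: int) -> None:
        self.hold_time = hold_time
        self.holder: Optional[str] = None
        self.expires_at = 0

    def try_acquire(self, who: str, now: int) -> bool:
        # Succeeds if the lock is free, stale, or already ours (a refresh).
        if self.holder in (None, who) or now >= self.expires_at:
            self.holder = who
            self.expires_at = now + self.hold_time
            return True
        return False


class Master:
    """Toy master that runs its schedd only while it holds the lock."""

    def __init__(self, name: str, lock: SharedLock) -> None:
        self.name = name
        self.lock = lock
        self.schedd_running = False

    def poll(self, now: int) -> None:
        self.schedd_running = self.lock.try_acquire(self.name, now)


lock = SharedLock(hold_time=300)
a = Master("machine-a", lock)
b = Master("machine-b", lock)

# t=0: A obtains the lock, B stays a hot spare.
a.poll(0)
b.poll(0)
state_start = (a.schedd_running, b.schedd_running)

# t=200: A refreshes the lock, B checks it and finds it held.
a.poll(200)
b.poll(200)
state_refresh = (a.schedd_running, b.schedd_running)

# A crashes and stops refreshing; its lock expires at t=500, so B takes over.
b.poll(600)
state_failover = b.schedd_running
```

The hold time bounds how long the pool goes without an active schedd after a crash, which corresponds to the "down for longer than N minutes" condition on the previous slide.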
13
Accessibility
› Support for GCB
  › Condor working w/ NATs, Firewalls
› Distributed Resource Management Application API (DRMAA)
  › GGF Working Group
  › An API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems
  › Condor DRMAA interface to appear in v6.7.0
14
SOAP/Grid Service
[Diagram: the condor_schedd with three interfaces: Cedar; OGSI (SOAP, HTTPG); Web Service (SOAP, HTTPS).]
15
New “Grid Universe”
› With the new Grid Universe, always specify a ‘gridtype’. So the old “globus” Universe is now declared as:
    universe = grid
    gridtype = gt2
› Other gridtypes? GT3 for OGSA-based Globus Toolkit 3
16
Condor-G improvements
› Condor-G can submit to either Globus GT2 or GT3 resources, including support for GT3 with web services.
  › Condor-G includes everything required; no need for client to have a GT3 installation.
  › Good migration path to OGSA
› Condor-G to Nordugrid, Unicore, Condor, ORACLE
› Support for credential refresh via the MyProxy Online Credential Management in NMI
  › http://grid.ncsa.uiuc.edu/myproxy/
17
Why Condor + MyProxy?
› Long-lived tasks or services need credentials
  › Task lifetime is difficult to predict
› Don’t want to delegate long-lived credentials
  › Fear of compromise
› Instead, renew credentials with MyProxy as needed during the task’s lifetime
  › Provides a single point of monitoring and control
  › Renewal policy can be modified at any time
    › For example, disable renewals if compromise is detected or suspected
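The renew-as-needed policy can be sketched as follows. This is an illustrative model only and does not use the MyProxy protocol or any Condor-G interface: RenewalService, refresh_if_needed, the one-hour lifetime, and the ten-minute margin are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    expires_at: int  # time at which the short-lived credential stops being valid


class RenewalService:
    """Hypothetical stand-in for a credential repository such as MyProxy."""

    def __init__(self, lifetime: int) -> None:
        self.lifetime = lifetime
        self.renewals_enabled = True  # policy switch that can change at any time

    def renew(self, now: int) -> Credential:
        if not self.renewals_enabled:
            raise PermissionError("renewals disabled by policy")
        return Credential(expires_at=now + self.lifetime)


def refresh_if_needed(cred: Credential, service: RenewalService,
                      now: int, margin: int) -> Credential:
    # Renew only when the credential is about to expire.
    if cred.expires_at - now <= margin:
        return service.renew(now)
    return cred


service = RenewalService(lifetime=3600)
cred = service.renew(now=0)                                     # valid until t=3600
cred = refresh_if_needed(cred, service, now=1000, margin=600)   # plenty of time left
unchanged = cred.expires_at
cred = refresh_if_needed(cred, service, now=3200, margin=600)   # 400 s left: renew
renewed = cred.expires_at

service.renewals_enabled = False                                # compromise suspected
try:
    refresh_if_needed(cred, service, now=6500, margin=600)
    blocked = False
except PermissionError:
    blocked = True
```

Because only short-lived credentials are ever handed to the job, switching off renewals in one place limits the damage to the remaining lifetime of the current credential.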
18
Credential Renewal
[Diagram with a Home side and a Remote side. Components: Condor-G Scheduler, MyProxy, Resource Manager, Job. Arrows: Submit Jobs, Enable Renewal, Launch Job, Retrieve Credentials, Refresh Credentials.]
19
More…
› Condor can now transfer job data files larger than 2 GB in size.
  › On all platforms that support 64-bit file offsets
› Real-time spooling of stdout/err/in in any universe incl VANILLA
  › Real-time monitoring of job progress
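The link between the 2 GB figure and file offset width is plain arithmetic, shown here as a short check. It assumes file offsets are signed integers and that 2 GB means 2**31 bytes; the function name is made up for the example.

```python
# Largest value of a signed 32-bit offset versus a signed 64-bit offset.
MAX_OFFSET_32 = 2**31 - 1
MAX_OFFSET_64 = 2**63 - 1

TWO_GB = 2 * 1024**3  # 2 GB taken as 2**31 bytes


def fits(file_size: int, max_offset: int) -> bool:
    # A file size is representable only if it lies within the offset range.
    return file_size <= max_offset


fits_32 = fits(TWO_GB, MAX_OFFSET_32)   # one byte past the 32-bit limit
fits_64 = fits(TWO_GB, MAX_OFFSET_64)
```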
20
Thank you!