Condor Services for the Global Grid: Interoperability between OGSA and Condor

Clovis Chapman 1, Paul Wilson 2, Todd Tannenbaum 3, Matthew Farrellee 3, Miron Livny 3, John Brodholt 2, and Wolfgang Emmerich 1

1 Dept. of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
2 Dept. of Earth Sciences, University College London, Gower St, London WC1E 6BT, United Kingdom
3 Computer Sciences Department, University of Wisconsin, 1210 W. Dayton St., Madison, WI, U.S.A.

Goals
- Leverage acceptance of grid standards: investigate the potential for interoperability with established systems
- Complementary architectures: OGSA allows us to expose a range of Condor services
- Seamless integration of Condor resources in a standardized Grid environment
- Improving Condor’s grid capabilities: bring Condor in line with advances in grid computing and add significant new functionality
- Providing a set of high-throughput computing services to the grid community (workload management, scheduling, etc.)

Condor Architecture overview
[Figure: Condor pool components: central manager (Collector), submission machine(s) (Schedd), execution machine(s) (Startd) running user jobs]

Condor Architecture overview
[Figure: Condor pool components in more detail: central manager (Collector, Negotiator), submission machine(s) (Schedd, Shadow), execution machine(s) (Startd, Starter)]

Architectural alternatives
[Figure: two options for a remote client using a site's pool. Diagram labels: Site A, Site B, Remote client, Manager, Job queue, Local Scheduler. One flow is numbered "1. Job execution request, 2. Resource allocation request, 3. Job execution"; the other is numbered "1. Resource allocation request, 2. Job execution". Captions: Option 1, Option 2]

Comparisons
Must take into account real-world constraints such as:
- Firewalls or private LANs: might not have access to all machines of a pool, even though the use of SOAP should help ease access through firewalls
- Potential cost in resource usage (Condor is currently relatively lightweight): need to consider the weight of the hosting environment (debatable)
- Should avoid interfering with the intricate relationships between Condor components

Option 1: Job Delegation
Need to provide:
- Job submission and queue management interface
- Job execution management
- Resource information providers: allow external sources to estimate pool suitability before submission
Can be mapped to: schedd, collector (shadow)

Option 1: The scheduler
- Can present a transaction-oriented interface for job submission
- Transient schedulers: allow users to instantiate their own instances of the scheduler via a scheduler factory
  - Isolates user/application-specific sets of jobs
  - Can be destroyed when no longer required
  - Security benefits: scheduler would no longer require root access
- Expose job classAds as service data elements
  - Job classAds represent a job and its characteristics during its lifetime
  - Allows job information to be obtained via OGSA query mechanisms
  - Allows for asynchronous notifications of classAd updates
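
The scheduler ideas above (factory-created transient instances, transactional submission, classAds exposed with update notifications) can be sketched in a few lines of Python. This is only an illustrative in-memory model, not Condor or OGSA code: the class names `TransientScheduler` and `SchedulerFactory`, the method names, and the classAd attributes used here are all assumptions made for the example.

```python
from typing import Callable, Dict, List

ClassAd = Dict[str, object]               # a job classAd reduced to a flat dict
Listener = Callable[[int, ClassAd], None]


class TransientScheduler:
    """Hypothetical per-user scheduler instance (not a real Condor API)."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.jobs: Dict[int, ClassAd] = {}    # committed job classAds by id
        self._pending: List[ClassAd] = []     # jobs staged in the open transaction
        self._listeners: List[Listener] = []
        self._next_id = 1

    def begin(self) -> None:
        """Start a submission transaction, discarding any staged jobs."""
        self._pending = []

    def submit(self, ad: ClassAd) -> None:
        """Stage a job; it is not visible in the queue until commit()."""
        self._pending.append(dict(ad, Owner=self.owner, JobStatus="Idle"))

    def abort(self) -> None:
        """Drop all staged jobs without touching the queue."""
        self._pending = []

    def commit(self) -> List[int]:
        """Move staged jobs into the queue and return their ids."""
        ids = []
        for ad in self._pending:
            job_id = self._next_id
            self._next_id += 1
            self.jobs[job_id] = ad
            ids.append(job_id)
            self._notify(job_id)
        self._pending = []
        return ids

    def subscribe(self, listener: Listener) -> None:
        """Register a callback invoked whenever a job classAd changes."""
        self._listeners.append(listener)

    def update(self, job_id: int, **attrs: object) -> None:
        """Change classAd attributes and notify subscribers."""
        self.jobs[job_id].update(attrs)
        self._notify(job_id)

    def _notify(self, job_id: int) -> None:
        for listener in self._listeners:
            listener(job_id, dict(self.jobs[job_id]))


class SchedulerFactory:
    """Hypothetical factory that creates and destroys per-user schedulers."""

    def __init__(self) -> None:
        self.instances: Dict[str, TransientScheduler] = {}

    def create(self, owner: str) -> TransientScheduler:
        self.instances[owner] = TransientScheduler(owner)
        return self.instances[owner]

    def destroy(self, owner: str) -> None:
        self.instances.pop(owner, None)
```

In this sketch a client would call `create`, then `begin`, `submit` and `commit` to queue jobs, and `subscribe` to receive a copy of a job's classAd each time it changes. In an OGSA setting the notification callback would correspond to a service data subscription rather than a local function.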

Option 1: Resource Information Providers
The collector:
- Collects information about availability and characteristics of resources in a pool in the form of resource classAds
- Can expose resource classAds as individual service data elements
- Can complement this information with pool policies (priorities, job pre-emption rules, etc.), but need a clearer representation of customer capabilities in Condor
- Will the central manager be accessible? (Firewalls…) Might want to use proxy services or redirect queries through the scheduler
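
As a rough illustration of how an external client might estimate pool suitability from resource classAds before submitting, the sketch below filters a small list of classAds with a requirements predicate. It is a simplified model under stated assumptions: classAds are plain dicts, the pool contents and attribute values are invented for the example, and `summarize_pool` is a hypothetical helper, not a collector interface.

```python
from typing import Callable, Dict, List

ClassAd = Dict[str, object]
Requirements = Callable[[ClassAd], bool]

# Illustrative resource classAds; names and values are made up for the example.
POOL: List[ClassAd] = [
    {"Name": "node1", "OpSys": "LINUX", "Memory": 512, "State": "Unclaimed"},
    {"Name": "node2", "OpSys": "LINUX", "Memory": 2048, "State": "Claimed"},
    {"Name": "node3", "OpSys": "WINDOWS", "Memory": 1024, "State": "Unclaimed"},
]


def matching_resources(pool: List[ClassAd], requirements: Requirements) -> List[ClassAd]:
    """Return the resource classAds that satisfy the job's requirements."""
    return [ad for ad in pool if requirements(ad)]


def summarize_pool(pool: List[ClassAd], requirements: Requirements) -> Dict[str, int]:
    """Count total, matching, and currently unclaimed matching resources."""
    matches = matching_resources(pool, requirements)
    available = [ad for ad in matches if ad["State"] == "Unclaimed"]
    return {"total": len(pool), "matching": len(matches), "available": len(available)}


def needs_linux_512(ad: ClassAd) -> bool:
    """Example requirement: a Linux machine with at least 512 MB of memory."""
    return ad["OpSys"] == "LINUX" and int(ad["Memory"]) >= 512
```

Calling `summarize_pool(POOL, needs_linux_512)` gives a client a coarse view of whether the pool is worth submitting to. Pool policies such as priorities and pre-emption rules are deliberately left out of this sketch.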

Security and identity management
Must use a 2-layer approach:
- Condor has its own access control and authorization system, which defines different roles (administrator/owner/negotiator/user) and levels of access (read/write)
- It is possible for users to only need accounts on submission machines
- Can use GSI and X.509 certificates to manage global identities
- Can use a modified grid map-file to map global identities (Distinguished Names) to Condor identities
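
The two layers can be sketched as follows: a map-file lookup turns a certificate Distinguished Name into a Condor identity, and a second check consults Condor-side access levels for that identity. This assumes the conventional grid map-file layout of a quoted DN followed by a local name; the slides mention a modified map-file whose exact format is not given, so the format, the sample entries, and the `ACCESS` table below are assumptions for illustration only.

```python
import shlex
from typing import Dict, Set

# Hypothetical map-file contents: quoted DN, then the Condor identity.
GRIDMAP_TEXT = '''
# global identity (DN)                      Condor identity
"/C=UK/O=eScience/OU=UCL/CN=alice example" alice@ucl
"/C=UK/O=eScience/OU=UCL/CN=bob example"   bob@ucl
'''

# Hypothetical Condor-side access levels per identity (layer 2).
ACCESS: Dict[str, Set[str]] = {
    "alice@ucl": {"read", "write"},
    "bob@ucl": {"read"},
}


def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse map-file text into a DN -> Condor identity dictionary."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = shlex.split(line)         # honours the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"malformed map-file line: {line!r}")
        dn, identity = parts
        mapping[dn] = identity
    return mapping


def authorize(dn: str, level: str, gridmap: Dict[str, str],
              access: Dict[str, Set[str]]) -> bool:
    """Layer 1: map the DN to a Condor identity. Layer 2: check its access level."""
    identity = gridmap.get(dn)
    if identity is None:
        return False                      # unknown global identity
    return level in access.get(identity, set())
```

A request authenticated through GSI would supply the DN from the user's X.509 certificate; certificate validation itself is outside the scope of this sketch.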

Conclusion
- Starting point: implementation of a (transient) scheduler
  - Take advantage of OGSA concepts such as service data, notification and factories to boost Condor capabilities and ease remote access and integration in a grid environment
  - Couple this with (VO-wide) discovery and monitoring services
  - Move to WSRF and Web Services Notification
- VO-wide management tools will be the focus of future development work
Project funded by DTI, JISC and Microsoft