Condor Overview Bill Hoagland
Condor Workload management system for compute-intensive jobs Harnesses collection of dedicated or non-dedicated hardware under distributed ownership
Condor History Developed by University of Wisconsin-Madison Computer Science Department First put into production use 15 years ago –Mature and stable
Condor Availability Freely available under a BSD-style license Not open source: the code is not distributed publicly
Supported Systems Solaris 8, 9, & 10 (Sparc) Red Hat & Fedora Core (x86) MS Windows 2000, XP & 2003 Server (x86) Mac OS X 10.3 & 10.4 (PPC) Other Unixes (SuSE, AIX, HP-UX, Yellow Dog, Debian)
Condor Design Originally developed for “cycle stealing” from idle machines Retains robustness to failures and changing availability from this legacy
Condor Goal “High throughput” vs “High performance” –High performance - fast machines (e.g. Cray) –High throughput - many machines, fault tolerant infrastructure (e.g.
Condor Components Job queueing Scheduling policy Priority mechanism Resource monitoring Resource management
Condor Highlights Checkpointing –Checkpointing saves complete running process and I/O state to disk
Checkpointing Allows recovery from failures –Roll back to the last saved state Allows process migration –Move saved state and restart
Checkpointing continued Can compress checkpoint images Checkpoint mechanism can be used outside of Condor
Checkpointing continued Some limitations –Single process space –Single kernel thread –Cannot save state of file open for both read and write Not supported on all platforms
Checkpointing continued Must have object files Usually requires no changes Relink code to include condor library layer, e.g. $ condor_compile gcc -o foo foo.c
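The roll-back-and-restart idea from the checkpointing slides can be sketched by hand. This is only an illustration: Condor checkpoints the whole process transparently after relinking, whereas the sketch below saves application state explicitly. The function name `run_with_checkpoints`, the `fail_at` parameter, and the pickle file layout are all assumptions made for the example, not part of Condor.

```python
import os
import pickle


def run_with_checkpoints(total_steps, ckpt_path, fail_at=None):
    """Sum the integers 0..total_steps-1, saving state to disk after every step.

    fail_at is a hypothetical knob that simulates a machine failure.
    """
    # Roll back to the last saved state if a checkpoint file exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"next_step": 0, "total": 0}

    while state["next_step"] < total_steps:
        if fail_at is not None and state["next_step"] == fail_at:
            raise RuntimeError("simulated machine failure")
        state["total"] += state["next_step"]
        state["next_step"] += 1
        # Write to a temporary file and rename it, so a crash in the middle
        # of a save never leaves a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, ckpt_path)

    return state["total"]
```

Because the checkpoint is an ordinary file, moving it to another machine and calling the function there would resume the work, which is the same idea as process migration.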
Condor Highlights Remote system calls –Preserves user environment on remote machine –Users need not make files available or have access to remote machine
Condor Highlights Pools of Machines can be Hooked Together –Jobs submitted to one pool can migrate to a second –Subject to the policies of each pool's owner
Condor Highlights Jobs can be Ordered –Jobs that depend on one another can easily be run in order –Dependencies are described in a directed acyclic graph
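To show how a directed acyclic graph yields a run order, here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). It does not use Condor's own tools or file formats; the job names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each job maps to the set of jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}


def run_order(deps):
    """Return batches of jobs; every job in a batch has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run right now
        batches.append(ready)
        sorter.done(*ready)  # mark them finished so their children unlock
    return batches
```

Jobs in the same batch do not depend on each other, so a scheduler would be free to run them on different machines at the same time.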
Condor Highlights Condor Enables Grid Computing –Condor has been designed with grid support hooks –Globus-controlled resources
Condor Highlights Sensitive to the Desires of Machine Owners –Machine owners may set almost any usage policy
Condor Highlights Powerful priority policy mechanism –Requirements and preferences are associated with jobs and machines –A negotiation process matches job requirements then ranks on preferences
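The match-then-rank negotiation can be sketched as follows. This is a simplified illustration, not Condor's actual matchmaker: the dictionary layout, the attribute names, and the `match` function are assumptions, and only the job's preference is used for ranking (the slide notes that machines have preferences too).

```python
def match(job, machines):
    """Return the name of the best machine for the job, or None if none fits."""
    # Step 1: keep only machines where both sides accept each other,
    # i.e. the job's requirements and the machine owner's requirements hold.
    candidates = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    if not candidates:
        return None
    # Step 2: among acceptable machines, pick the one the job ranks highest.
    best = max(candidates, key=job["rank"])
    return best["name"]


# Hypothetical pool of three machines with owner-defined policies.
machines = [
    {"name": "node1", "memory_mb": 512, "os": "LINUX",
     "requirements": lambda job: True},
    {"name": "node2", "memory_mb": 2048, "os": "LINUX",
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "node3", "memory_mb": 4096, "os": "SOLARIS",
     "requirements": lambda job: True},
]

# Hypothetical job: needs Linux and 256 MB, prefers more memory.
job = {
    "owner": "alice",
    "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 256,
    "rank": lambda m: m["memory_mb"],
}
```

Here node3 has the most memory but fails the job's requirements, so ranking only ever happens among machines that already passed the requirements check.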
Condor Security Condor's purpose is to allow users to run arbitrary code on large numbers of machines Assumes users are trustworthy
Condor Security continued Cannot protect against users that can elevate their privileges Does not run user jobs in sandboxes
Condor Security continued Can prevent unauthorized access to Condor Optional authentication e.g. Kerberos, Grid Security Infrastructure (GSI), others
Condor Security continued Can ensure that user data has not been examined or tampered with Optional encryption and integrity checking of all network traffic
Condor Backfill When machine completely idle… –Configure default job –Support for BOINC
Condor Configuration Controlled by hierarchical config files –Well commented –Human readable –In some cases, clearer than the manual
Condor Administration CondorView –Web based statistics –Machine and user data
Condor Website