Zach Miller Computer Sciences Department University of Wisconsin-Madison What’s New in Condor
Overview › Condor Development Process Stable vs. Development › New Features in › Significant improvements which are covered in other talks: What’s new in Condor-G covered by Todd Tannenbaum Hawkeye covered by Nick LeRoy COD (Computing On Demand) covered by Derek Wright Packaging and Testing covered by Alain Roy
Condor Development Process › We maintain two different releases at all times Stable Series: second digit is even, e.g., 6.4.7 Development Series: second digit is odd, e.g., 6.5.2
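The even/odd numbering rule can be checked mechanically. The following is a minimal sketch; the function names `release_series` and `same_series` are hypothetical helpers for illustration, not part of Condor.

```python
def release_series(version: str) -> str:
    """Classify a dotted version string by the parity of its second digit."""
    parts = version.strip().split(".")
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"not a dotted version string: {version!r}")
    # Even second digit -> stable series, odd -> development series.
    return "stable" if int(parts[1]) % 2 == 0 else "development"


def same_series(a: str, b: str) -> bool:
    """True when two versions share their first two components (6.4.X vs 6.4.Y)."""
    return a.strip().split(".")[:2] == b.strip().split(".")[:2]


if __name__ == "__main__":
    for v in ("6.4.7", "6.5.2"):
        print(v, release_series(v))
```

`same_series` only tells you whether two versions belong to the same series; as the next slides explain, that implies compatibility for stable releases but not necessarily for development releases.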
Stable Series › Heavily tested › Runs on our production pool of nearly 1,000 CPUs › No new features, only bugfixes, are allowed into a stable series › A given stable release is always compatible with other releases from the same series 6.4.X is compatible with 6.4.Y › Recommended for production pools
Development Series › Less heavily tested › Runs on our small(er) test pool. › New features and new technology are added frequently › Versions from the same development series are not always compatible with each other
Overview of New Features Windows DAGMan Better Security Central Manager Improved Negotiation Black Holes New Utilities Smarter File Transfer Submit-time file staging New Installer ClassAd improvements And More!!
Improvements in Condor for Windows › Ability to run SCHEDULER universe jobs DAGMan Any executable or batch file › JAVA universe support JVM provided by execution site Better error management Ability to use CHIRP (Remote I/O)
Improvements in Condor for Windows (cont) › New Support for: Windows XP Foreign Language versions of Windows Legacy 16-bit applications › Improved Windows-to-UNIX job submission and vice versa. › BirdWatcher, a system tray icon which gives basic status and control of Condor
New Features in DAGMan › DAGMan previously required that all jobs share one log file › Each job can now have its own log file › Understands XML userlogs › Can produce .dot file graphs
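To give a sense of what a .dot graph of a DAG looks like, here is a small sketch that writes parent/child dependencies in Graphviz DOT syntax. The function `dag_to_dot` and the job names are hypothetical; this is not DAGMan's own code, and DAGMan's actual output may differ in detail.

```python
def dag_to_dot(edges, name="dag"):
    """Render (parent, child) job dependencies as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for parent, child in edges:
        # One directed edge per dependency: the parent must finish before the child.
        lines.append(f'    "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A diamond-shaped DAG: A runs first, then B and C, then D.
    print(dag_to_dot([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
```

The resulting text can be fed to the Graphviz `dot` tool to draw the dependency graph.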
Better Security › GSI (X.509 Certificates) implementation more complete and customizable Each Condor daemon can have its own certificate You can run a “Personal Condor” with your user proxy › Easier configuration If you already have Globus installed, very little additional configuration of Condor is necessary to start using X.509 certificates for authentication › Improved error messages if something goes wrong Tells you if the problem was network, authentication, or authorization related
Central Manager New Features › Keeps statistics on missed updates › Can use TCP instead of UDP, if you must › Redundant central managers can be run using the SECONDARY_COLLECTOR_LIST parameter If the main central manager goes down, you may still run administrative commands › Central Manager daemons can now run on any port COLLECTOR_HOST = condor.cs.wisc.edu:9019 NEGOTIATOR_HOST = condor.cs.wisc.edu:9020
Improved Negotiation › Allows the condor_schedd (the job queue manager) to send “classes” of jobs to the Negotiator for matching › Previously, jobs were sent one at a time. › Now, 1000 of the same job will take the same time to negotiate as 100, 10 or just one job › Currently, job classes are defined in the condor_config file. Very soon, they will be automatically determined… “Buckets” will be needed
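The idea behind job classes can be sketched as grouping jobs by the attributes that affect matching, so that the negotiator considers each group once. This is an illustration only: the attribute list `SIGNIFICANT_ATTRS`, the job dictionaries, and `group_into_classes` are assumptions, not the condor_schedd's actual data structures.

```python
from collections import defaultdict

# Hypothetical set of attributes that decide whether two jobs match identically.
SIGNIFICANT_ATTRS = ("Owner", "Requirements", "Rank")


def group_into_classes(jobs, attrs=SIGNIFICANT_ATTRS):
    """Group job dicts that agree on every significant attribute."""
    classes = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(a) for a in attrs)
        classes[key].append(job["id"])
    return dict(classes)


if __name__ == "__main__":
    jobs = [
        {"id": i, "Owner": "zmiller", "Requirements": "Memory > 64", "Rank": "0"}
        for i in range(1000)
    ]
    classes = group_into_classes(jobs)
    # 1000 identical jobs collapse into a single class to negotiate.
    print(len(jobs), "jobs ->", len(classes), "class(es)")
```

The amount of matchmaking work then scales with the number of distinct classes rather than with the number of jobs.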
Avoiding Black Holes › Condor can keep track of the last N resource matches › This can be used to prefer the same machine if the job is restarted › Can also be used to avoid a machine if the job is restarted, which is a first step towards avoiding “Black Holes” – machines that consume jobs but always fail to run them
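A bounded history of recent matches is enough to express both policies. The sketch below is a standalone illustration; the class `MatchHistory` and its methods are hypothetical and do not reflect how Condor stores or evaluates this information.

```python
from collections import deque


class MatchHistory:
    """Remember the last N machines a job was matched to."""

    def __init__(self, n):
        # A deque with maxlen silently drops the oldest entry when full.
        self._hosts = deque(maxlen=n)

    def record(self, host):
        self._hosts.append(host)

    def last_hosts(self):
        return list(self._hosts)

    def prefer(self, candidates):
        """Order candidates so previously matched machines come first."""
        seen = set(self._hosts)
        return sorted(candidates, key=lambda h: h not in seen)

    def avoid(self, candidates):
        """Drop candidates the job was recently matched to."""
        seen = set(self._hosts)
        return [h for h in candidates if h not in seen]


if __name__ == "__main__":
    history = MatchHistory(2)
    for host in ("a", "b", "c"):
        history.record(host)
    print(history.last_hosts())
    print(history.avoid(["a", "b", "d"]))
```

Preferring recent machines helps a restarted job return to where it last ran, while avoiding them keeps a job from returning to a machine that has just failed to run it.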
New Utilities › ‘condor_q -held’ gives you a list of held jobs and the reason they were put on hold › ‘condor_config_val -config’ tells you where (file and line number) an attribute is defined › ‘condor_rm -f’ will forcefully remove a job, which is particularly useful when the globus jobmanager is not cooperating › ‘condor_fetch_log’ will grab a log file from a remote machine: condor_fetch_log c2-15.cs.wisc.edu STARTD
Smarter File Transfer › New file transfer mechanism: ShouldTransferFiles = YES | NO | IF_NEEDED YES : Always transfer files to execution site NO : Rely on a shared filesystem IF_NEEDED : will automatically transfer the files if the submit and execute machine are not in the same FileSystemDomain › Very useful for cross-platform submitting and also for flocking
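The three settings reduce to a simple decision rule. The following sketch mirrors the description on the slide; `should_transfer` is a hypothetical function, and the domain strings are made-up examples.

```python
def should_transfer(setting: str, submit_domain: str, execute_domain: str) -> bool:
    """Decide whether files are transferred, following the three settings above."""
    setting = setting.strip().upper()
    if setting == "YES":
        return True  # always transfer files to the execution site
    if setting == "NO":
        return False  # rely on a shared filesystem
    if setting == "IF_NEEDED":
        # Transfer only when the machines are in different FileSystemDomains.
        return submit_domain != execute_domain
    raise ValueError(f"unknown ShouldTransferFiles value: {setting!r}")


if __name__ == "__main__":
    print(should_transfer("IF_NEEDED", "cs.wisc.edu", "cs.wisc.edu"))
    print(should_transfer("IF_NEEDED", "cs.wisc.edu", "example.org"))
```

With IF_NEEDED, one submit file can serve both a local pool with a shared filesystem and remote pools reached by flocking.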
Submit-Time File Staging › When submitting a job, you can tell Condor to create a “sandbox” of all necessary input files with ‘condor_submit -s’ › After completion, job can stay in queue with ‘leave_in_queue’ expression › Output files are then fetched manually
New Installer › For Windows Based on MSI (Microsoft Software Installer) Batch Install option › For UNIX Version will be available in RPMs Command line options specify the installation parameters, and no questions are asked Easier to automate
ClassAds › ClassAd attributes can be dynamically linked to external functions Example: [ label = "uptime"; value = some_func_that_calls_uptime() ]
Misc New Features › Jobs can be submitted via GRAM (the Globus Gatekeeper) › Daemons do not have to run as ‘root’ or ‘condor’ to have multiple different users submitting › Rudimentary load balancing between checkpoint servers by picking one randomly from a list › More job policy expressions PERIODIC_RELEASE GLOBUS_RESUBMIT
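Random selection from a list is about the simplest form of load balancing. A minimal sketch, assuming a hypothetical list of checkpoint server names; `pick_checkpoint_server` is illustrative and not Condor's implementation.

```python
import random


def pick_checkpoint_server(servers, rng=random):
    """Pick one checkpoint server uniformly at random from a non-empty list."""
    servers = list(servers)
    if not servers:
        raise ValueError("no checkpoint servers configured")
    return rng.choice(servers)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    servers = ["ckpt1.example.org", "ckpt2.example.org", "ckpt3.example.org"]
    print(pick_checkpoint_server(servers))
```

Over many jobs the load spreads roughly evenly across the servers, although no account is taken of how busy each server actually is.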
Conclusion › Todd Tannenbaum will tell you about the roadmap for future work › Questions?