Alain Roy Computer Sciences Department University of Wisconsin-Madison 24-June-2002 Using and Administering Condor
Good evening! › Thank you for having me! › I am: Alain Roy, Computer Science Ph.D. in Quality of Service with the Globus Project, now working with the Condor Project
Condor Tutorials Remaining › Monday (Today) 17:00-19:00: Using and administering Condor › Tuesday 17:00-19:00: Using Condor on the Grid
Review: What is Condor? › Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility. Run lots of jobs over a long period of time, Not a short burst of “high-performance” › Condor manages both machines and jobs with ClassAd Matchmaking to keep everyone happy
Condor Takes Care of You › Condor does whatever it takes to run your jobs, even if some machines… Crash (or are disconnected) Run out of disk space Don’t have your software installed Are frequently needed by others Are far away & managed by someone else
What is Unique about Condor? › ClassAds › Transparent checkpoint/restart › Remote system calls › Works in heterogeneous clusters › Clusters can be: Dedicated Opportunistic
What’s Condor Good For? › Managing a large number of jobs You specify the jobs in a file and submit them to Condor, which runs them all and sends you e-mail when they complete Mechanisms to help you manage huge numbers of jobs (1000’s), all the data, etc. Condor can handle inter-job dependencies (DAGMan)
What’s Condor Good For? (cont’d) › Robustness Checkpointing allows guaranteed forward progress of your jobs, even jobs that run for weeks before completion If an execute machine crashes, you only lose work done since the last checkpoint Condor maintains a persistent job queue - if the submit machine crashes, Condor will recover (Story)
What’s Condor Good For? (cont’d) › Giving your job the agility to access more computing resources Checkpointing allows your job to run on “opportunistic resources” (not dedicated) Checkpointing also provides “migration” - if a machine is no longer available, move! With remote system calls, run on systems which do not share a filesystem - You don’t even need an account on a machine where your job executes
Other Condor features › Implement your policy on when the jobs can run on your workstation › Implement your policy on the execution order of the jobs › Keep a log of your job activities
A Condor Pool In Action
A Bit of Condor Philosophy › Condor brings more computing to everyone A small-time scientist can make an opportunistic pool with 10 machines, and get 10 times as much computing done. A large collaboration can use Condor to control its dedicated pool with hundreds of machines.
The Idea › Computing power is everywhere; we try to make it usable by anyone.
Remember Frieda? Today we’ll revisit Frieda’s Condor explorations in more depth
I have 600 simulations to run. Where can I get help?
Install a Personal Condor!
Installing Condor › Download Condor for your operating system › Available as a free download from the Condor web site › Available for most Unix platforms and Windows NT
So Frieda Installs Personal Condor on her machine… › What do we mean by a “Personal” Condor? Condor on your own workstation, no root access required, no system administrator intervention needed—easy to set up.
Personal Condor?! What’s the benefit of a Condor “Pool” with just one user and one machine?
Your Personal Condor will... › Keep an eye on your jobs and will keep you posted on their progress › Keep a log of your job activities › Add fault tolerance to your jobs › Implement your policy on when the jobs can run on your workstation
What’s in a Personal Condor? › Everything that is in Condor, just on one machine. › Condor daemons: Condor_master Condor_collector—Stores ClassAds for jobs, machines Condor_negotiator—Matchmaking Condor_schedd—Submits, monitors jobs Condor_startd—Starts jobs Condor_starter—Launches a job Condor_shadow—Monitors remote job
A Condor Pool of One (Diagram: a single machine running condor_master, condor_collector, condor_negotiator, condor_schedd, condor_shadow, condor_startd, condor_starter, and the Condor job itself.)
condor_master › Starts up all other Condor daemons › If there are any problems and a daemon exits, it restarts the daemon and sends e-mail to the administrator › Checks the time stamps on the binaries of the other Condor daemons, and if new binaries appear, the master will gracefully shut down the currently running version and start the new version
condor_master (cont’d) › Acts as the server for many Condor remote administration commands: condor_reconfig, condor_restart, condor_off, condor_on, condor_config_val, etc.
condor_startd › Represents a machine to the Condor system › Responsible for starting, suspending, and stopping jobs › Enforces the wishes of the machine owner (the owner’s “policy”… more on this soon)
condor_schedd › Represents users to the Condor system › Maintains the persistent queue of jobs › Responsible for contacting available machines and sending them jobs › Services user commands which manipulate the job queue: condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio, …
condor_collector › Collects information from all other Condor daemons in the pool “Directory Service” / Database for a Condor pool › Each daemon sends a periodic update called a “ClassAd” to the collector › Services queries for information: Queries from other Condor daemons Queries from users (condor_status)
condor_negotiator › Performs “matchmaking” in Condor › Gets information from the collector about all available machines and all idle jobs › Tries to match jobs with machines that will serve them › Both the job and the machine must satisfy each other’s requirements
Frieda wants more… › She decides to use the graduate students’ computers when they aren’t in use, and get her work done sooner. › In exchange, they can use the Condor pool too.
Frieda’s Condor pool… (Diagram: Frieda’s computer acts as the Central Manager; the graduate students’ desktop computers join the pool.)
A larger Condor pool (Diagram: a Submitter runs condor_master, condor_schedd, condor_shadow; the Central Manager runs condor_master, condor_negotiator, condor_collector; a Submitter/Executor runs condor_master, condor_schedd, condor_startd, condor_shadow, condor_starter, and a Condor job; an Executor runs condor_master, condor_startd, condor_starter, and a Condor job.)
Happy Day! Frieda’s organization purchased a Beowulf Cluster! › Other scientists in her department have realized the power of Condor and want to share it. › The Beowulf cluster and the graduate student computers can be part of a single Condor pool.
Frieda’s Condor pool… (Diagram: the Central Manager, the graduate students’ desktop computers, and the Beowulf cluster, all in one pool.)
How would you set it up? › Grad student machines: Submitters and Executors › Beowulf cluster machines: Executors only › An independent machine for the collector/negotiator: it’s a big job, so take it off of Frieda’s computer; the collector and negotiator could even be split onto separate machines (see the sketch below)
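A minimal configuration sketch of that layout (the host name is made up; DAEMON_LIST and CONDOR_HOST are standard Condor configuration macros, shown here only to illustrate the division of roles):

# Central manager (a dedicated machine, not Frieda's desktop)
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# Grad student desktops (submit and execute)
DAEMON_LIST = MASTER, SCHEDD, STARTD

# Beowulf cluster nodes (execute only)
DAEMON_LIST = MASTER, STARTD

# On every machine, point at the central manager
CONDOR_HOST = central-manager.frieda-lab.edu

Each DAEMON_LIST above belongs in the local configuration file of the corresponding machine type.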
Frieda collaborates… › She wants to share her Condor pool with scientists from another lab.
Condor Flocking › Condor pools can work cooperatively
How would you set it up? › Two independent pools Each has its own collector/negotiator › Set up flocking from one pool to another (by machine or by pool) with the FLOCK_TO and FLOCK_FROM settings (sketch below) › Can be uni- or bi-directional
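A sketch of that configuration, assuming FLOCK_TO is set on Frieda's submit machines and FLOCK_FROM on the other pool's side (pool and domain names are made up):

# On Frieda's submit machines: try the other lab's pool when ours is busy
FLOCK_TO = condor.otherlab.edu

# In the other lab's pool: accept flocked jobs from Frieda's machines
FLOCK_FROM = *.frieda-lab.edu

Making it bi-directional just means setting the mirror-image macros in the other pool as well.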
Questions So Far?
How do you run a job? › It doesn’t matter if you have: Personal Condor Large Condor pool Condor pool with flocking › Four steps 1. Write program 2. Write submit file 3. Give it to Condor 4. Condor gives you the results
Step 1: Writing a program › Condor has universes Vanilla Universe: Run anything Less capable Java Universe: Works better for Java Standard Universe: Checkpointing Remote I/O Can’t work with all programs
Step 1: Vanilla Universe › You can run any program C/C++/Perl/Python/Fortran/Java/Lisp… No checkpointing: if your job is interrupted or the machine crashes, Condor has to restart it from the beginning. Can do anything you could do if you were logged in.
Step 1: Java Universe › Works better for Java programs › Checks for valid Java environment › Distinguishes Java environment exceptions from program exceptions (wrapper program) › No checkpointing (it could happen though) › Remote I/O
Step 1: Standard Universe › Requires re-linking your program: condor_compile gcc -o simple simple.o › Allows checkpointing and remote I/O › Restrictions on behavior: No threading Limited networking Restrictions on compiler used
Step 2: Write submit file

Executable   = simple
Universe     = vanilla
Arguments    = First
Log          = simple.log
Output       = simple.output
Error        = simple.error
Requirements = Memory > 512
Queue

Note: This assumes a shared filesystem
Step 2: Write submit file

Executable            = simple
Universe              = vanilla
Arguments             = First
Log                   = simple.log
Output                = simple.output
Error                 = simple.error
Transfer_input_files  = data.in
Transfer_output_files = data.out
Requirements          = Memory > 512
Queue

Note: This does not assume a shared filesystem
Step 2: Write submit file

Executable   = simple
Universe     = standard
Arguments    = First
Log          = simple.log
Output       = simple.output
Error        = simple.error
Requirements = Memory > 512
Queue

Note: This does not assume a shared filesystem, but uses remote I/O
Step 2: Submit Files › Condor is helpful: it expands your Requirements into a complete expression: Requirements = memory > 512 becomes… Requirements = (OpSys == "Linux") && (memory > 512) && … › Queue can take a parameter (example below; more later) › A single file can submit many jobs
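For example, a sketch of a submit file whose Queue statement takes a count (the file names are illustrative); each queued process gets its own number in $(Process):

Executable = simple
Universe   = vanilla
Arguments  = $(Process)
Output     = simple.$(Process).out
Error      = simple.$(Process).err
Log        = simple.log
Queue 3

This queues three jobs (processes 0, 1, and 2) in a single cluster, each writing to its own output and error files.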
Step 3: Give it to Condor › condor_submit submit.desc › condor_q

-- Submitter: dsonokwa.cs.wisc.edu : : dsonokwa.cs.wisc.edu
 ID    OWNER   SUBMITTED  RUN_TIME  ST PRI SIZE CMD
 5.0   roy     6/15 20:   :00:02    R           simple First

1 jobs; 0 idle, 1 running, 0 held
Step 4: Condor gives it back › The program’s output is where you asked it to be. › Condor left a log file documenting what it did. › Condor optionally sends you an e-mail telling you it’s done.
Step 4: Condor gives it back

000 ( )  06/15 21:00:01  Job submitted from host:
001 ( )  06/15 21:00:01  Job executing on host:
005 ( )  06/15 21:00:06  Job terminated.
         (1) Normal termination (return value 0)
             Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
             Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
             Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
             Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
Step 4: Condor gives it back

Date: Sat, 15 Jun (CDT)
From: Condor Project
Message-Id:
To:
Subject: [Condor] Condor Job

This is an automated e-mail from the Condor system on machine
"beak.cs.wisc.edu". Do not reply.

Your condor job exited with status 0.
Job: /scratch/roy/condor/simple/simple First
Clusters and Processes › If your submit file describes multiple jobs, we call this a “cluster”. › Each job within a cluster is called a “process” or “proc”. › If you only specify one job, you still get a cluster, but it has only one process. › A Condor “Job ID” is the cluster number, a period, and the process number (“23.5”) › Process numbers always start at 0.
Example Submit Description File for a Cluster

# Example condor_submit input file that defines
# a whole cluster of jobs at once
Universe   = standard
Executable = simple
Output     = my_job.stdout
Error      = my_job.stderr
Log        = my_job.log
Arguments  = -arg1 -arg2
InitialDir = /home/roy/condor/run.$(Process)
Queue 500
Questions So Far?
condor_q › Find out the status of your jobs from your condor_schedd › condor_q cluster: all jobs in a cluster › condor_q cluster.proc: a particular job › condor_q -sub name: jobs for a particular user (examples below)
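For example (the cluster number and user name are hypothetical):

condor_q 23           # all jobs in cluster 23
condor_q 23.5         # just process 5 of cluster 23
condor_q -sub frieda  # all jobs submitted by frieda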
Temporarily halt a Job › Use condor_hold to place a job on hold Kills job if currently running Will not attempt to restart job until released › Use condor_release to remove a hold and permit job to be scheduled again
condor_rm › You submitted a job, but you want to cancel it › condor_rm clusterid: condor_rm 6 removes all jobs in cluster 6 › condor_rm clusterid.procid: condor_rm 6.3 removes that specific job › condor_rm -all: all of your jobs › You can only remove your own jobs › Removals are reflected in the job log
condor_status › Find status of pool from condor_collector (simplified view here):

Name           OpSys   Arch   State      Activity
carmi.cs.wisc  LINUX   INTEL  Unclaimed  Idle
coral.cs.wisc  LINUX   INTEL  Unclaimed  Idle
doc.cs.wisc.e  LINUX   INTEL  Unclaimed  Idle
dsonokwa.cs.w  LINUX   INTEL  Unclaimed  Idle
...

(followed by a summary of Machines / Owner / Claimed / Unclaimed counts for LINUX, SOLARIS, and Total)
condor_status › condor_status -run: which machines are running jobs › condor_status -sub: whose jobs are running? › condor_status -constraint: restrict the output to a subset defined by the user (see the example below)
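For example, a plausible constraint query (not from the slides) that lists only the idle Linux machines:

condor_status -constraint '(OpSys == "LINUX") && (State == "Unclaimed")'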
DAGMan › Directed Acyclic Graph Manager › DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you. › (e.g., “Don’t run job “B” until job “A” has completed successfully.”)
What is a DAG? › A DAG is the data structure used by DAGMan to represent these dependencies. › Each job is a “node” in the DAG. › Each node can have any number of “parent” or “child” nodes – as long as there are no loops! (Diagram: the diamond DAG with Job A on top, Jobs B and C in the middle, and Job D at the bottom.)
Defining a DAG › A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D

› Each node will run the Condor job specified by its accompanying Condor submit file (a sketch of one follows).
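As an illustration, each node's submit file is an ordinary Condor submit description. A minimal sketch of what a.sub might contain (the executable and file names are assumptions):

# a.sub -- submit description for node A
Universe   = vanilla
Executable = node_a
Output     = a.out
Error      = a.err
Log        = diamond.log
Queue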
Submitting a DAG › To start your DAG, just run condor_submit_dag with your .dag file, and Condor will start a personal DAGMan daemon to begin running your jobs: % condor_submit_dag diamond.dag › condor_submit_dag submits a Scheduler Universe Job with DAGMan as the executable. › Thus the DAGMan daemon itself runs as a Condor job, so you don’t have to baby-sit it.
DAGMan Running a DAG › DAGMan acts as a “meta-scheduler”, managing the submission of your jobs to Condor based on the DAG dependencies. (Diagram: DAGMan reads the .dag file and feeds jobs into the Condor job queue.)
DAGMan Running a DAG (cont’d) › DAGMan holds & submits jobs to the Condor queue at the appropriate times. (Diagram: jobs entering the Condor job queue as their parents finish.)
DAGMan Running a DAG (cont’d) › In case of a job failure, DAGMan continues until it can no longer make progress, and then creates a “rescue” file with the current state of the DAG. (Diagram: a failed node and the resulting Rescue File.)
DAGMan Recovering a DAG › Once the failed job is ready to be re-run, the rescue file can be used to restore the prior state of the DAG. (Diagram: the Rescue File restoring the DAG’s state.)
DAGMan Recovering a DAG (cont’d) › Once that job completes, DAGMan will continue the DAG as if the failure never happened. (Diagram: the remaining nodes continuing through the Condor job queue.)
DAGMan Finishing a DAG › Once the DAG is complete, the DAGMan job itself is finished, and exits. (Diagram: the finished DAG with an empty Condor job queue.)
Additional DAGMan Features › Provides other handy features for job management… nodes can have PRE & POST scripts failed nodes can be automatically retried a configurable number of times job submission can be “throttled” (sketch below)
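A sketch of how those features are expressed (the script names and throttle value are made up; SCRIPT, RETRY, and -maxjobs are standard DAGMan syntax):

# In diamond.dag: PRE/POST scripts and automatic retries
Script Pre  A prepare_inputs.sh
Script Post D collect_results.sh
Retry B 3

# On the command line: throttle DAGMan to 10 submitted jobs at a time
condor_submit_dag -maxjobs 10 diamond.dag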
Questions So Far?
What if each job needed to run for 20 days? What if I wanted to interrupt a job with a higher priority job?
Condor’s Standard Universe to the rescue! › Condor can support various combinations of features/environments in different “Universes” › Different Universes provide different functionality for your job: Vanilla: runs any serial job; Java: well suited for Java programs; Standard: support for transparent process checkpoint and restart
Process Checkpointing › Condor’s Process Checkpointing mechanism saves all the state of a process into a checkpoint file Memory, CPU, I/O, etc. › The process can then be restarted from right where it left off › Typically no changes to your job’s source code needed – however, your job must be relinked with Condor’s Standard Universe support library
Linking for Standard Universe To do this, just place “condor_compile” in front of the command you normally use to link your job: condor_compile gcc -o myjob myjob.c OR condor_compile f77 -o myjob filea.f fileb.f
Limitations in the Standard Universe › Condor’s checkpointing is not at the kernel level. Thus in the Standard Universe the job may not: fork() Use kernel threads Use some forms of IPC, such as pipes and shared memory › Many typical scientific jobs are OK
When will Condor checkpoint your job? › Periodically, if desired For fault tolerance › To free the machine to do a higher priority task (higher priority job, or a job from a user with higher priority) Preemptive-resume scheduling › When you explicitly run the condor_checkpoint, condor_vacate, condor_off, or condor_restart commands
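For example (the host name is made up; both tools are the ones named above):

condor_checkpoint c01.cs.wisc.edu  # ask the jobs on that machine to write a checkpoint now
condor_vacate c01.cs.wisc.edu      # checkpoint the jobs and evict them from the machine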
Administering Condor › Condor provides extensive configuration files One per pool, one per machine, or anything in between › Extensive documentation Online manual Heavily commented sample configuration file
I am adding nodes to the Cluster… but the Chemistry Department has priority on these nodes. (Boss Fat Cat) Policy Configuration
The Machine (Startd) Policy Expressions START - When is this machine willing to start a job? RANK - Job preferences SUSPEND - When to suspend a job CONTINUE - When to continue a suspended job PREEMPT - When to nicely stop running a job KILL - When to forcibly kill a job that is being preempted
Frieda’s Current Settings

START    = True
RANK     =
SUSPEND  = False
CONTINUE =
PREEMPT  = False
KILL     = False
Frieda’s New Settings for the Chemistry nodes

START    = True
RANK     = Department == "Chemistry"
SUSPEND  = False
CONTINUE =
PREEMPT  = False
KILL     = False
Submit file with Custom Attribute

Executable  = chem-job
Universe    = standard
+Department = "Chemistry"
Queue
What if “Department” is not specified?

START    = True
RANK     = (Department =!= UNDEFINED) && (Department == "Chemistry")
SUSPEND  = False
CONTINUE =
PREEMPT  = False
KILL     = False
Another example

START    = True
RANK     = (Department =!= UNDEFINED) && ((Department == "Chemistry")*2 + (Department == "Physics"))
SUSPEND  = False
CONTINUE =
PREEMPT  = False
KILL     = False
The Cluster is fine. But not the desktop machines. Condor can only use the desktops when they would otherwise be idle. (Boss Fat Cat) Policy Configuration, cont
So Frieda decides she wants the desktops to: › START jobs when there has been no activity on the keyboard/mouse for 5 minutes and the load average is low › SUSPEND jobs as soon as activity is detected › PREEMPT jobs if the activity continues for 5 minutes or more › KILL jobs if they take more than 5 minutes to preempt
Macros in the Config File

NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BackgroundLoad   = 0.3
HighLoad         = 0.5
KeyboardBusy     = (KeyboardIdle < 10)
CPU_Busy         = ($(NonCondorLoadAvg) >= $(HighLoad))
CPU_Idle         = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
MachineBusy      = ($(CPU_Busy) || $(KeyboardBusy))
ActivityTimer    = (CurrentTime - EnteredCurrentActivity)
Desktop Machine Policy

START    = $(CPU_Idle) && KeyboardIdle > 300
SUSPEND  = $(MachineBusy)
CONTINUE = $(CPU_Idle) && KeyboardIdle > 120
PREEMPT  = (Activity == "Suspended") && $(ActivityTimer) > 300
KILL     = $(ActivityTimer) > 300
Policy Review › Users submitting jobs can specify Requirements and Rank expressions › Administrators can specify Startd Policy expressions individually for each machine (START, SUSPEND, etc.) › Expressions can use any job or machine ClassAd attribute › Custom attributes easily added › Bottom Line: Enforce almost any policy!
Administrator Commands › condor_vacate - Leave a machine now › condor_on - Start Condor › condor_off - Stop Condor › condor_reconfig - Reconfigure on-the-fly › condor_config_val - View/set configuration › condor_userprio - User priorities › condor_stats - View detailed usage accounting stats
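A few usage sketches (the host name is made up; the commands and the CONDOR_HOST variable are standard):

condor_off c01.cs.wisc.edu        # stop Condor on that machine
condor_on c01.cs.wisc.edu         # start it again
condor_reconfig c01.cs.wisc.edu   # re-read the configuration files on-the-fly
condor_config_val CONDOR_HOST     # which central manager is this machine using?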
Questions So Far?
Security in Condor › Since version 6.3.3, Condor has greatly improved security › Multiple authentication methods: X509 (Using GSI) Kerberos Filesystem (shared filesystem, known user) › Encryption: 3DES Blowfish
Security in Condor › Authentication Based on users, with optional wildcards Users can be given different permissions: Read Write Administrator Config
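A configuration sketch using the host-based HOSTALLOW_* macros of this Condor series (the slide describes user-level permissions, whose exact form depends on the authentication method in use; the host names here are made up):

HOSTALLOW_READ          = *.cs.wisc.edu
HOSTALLOW_WRITE         = *.cs.wisc.edu
HOSTALLOW_ADMINISTRATOR = central-manager.cs.wisc.edu
HOSTALLOW_CONFIG        = central-manager.cs.wisc.edu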
Version Numbers in Condor › Odd minor numbers are development releases: 6.3.1, 6.3.2, 6.5.0… Compatibility not guaranteed within a series, like 6.3.x. › Even minor numbers are stable releases 6.2.2, 6.4.0, 6.4.1… Compatibility guaranteed within a series, like 6.4.x.
Questions? Comments? › Web: ›