Published by Coral Thomas. Modified over 9 years ago.
1
Condor Tutorial
NCSA Alliance '98
Presented by: The Condor Team, University of Wisconsin-Madison
Email: condor-admin@cs.wisc.edu
URL: http://www.cs.wisc.edu/condor
2
Condor Tutorial, NCSA Alliance '98, April 27th 1998
Welcome to the Condor Tutorial!
Introductions
What is Condor? A system for High Throughput Computing
3
The “Religion” behind High Throughput Computing
Key Concepts:
– High Throughput Computing (HTC)
– Distributively owned resources
4
Performance vs. Throughput
High Performance - Very large amounts of processing capacity over short time periods (FLOPS - Floating Point Operations Per Second)
High Throughput - Large amounts of processing capacity sustained over very long time periods (FLOPY - Floating Point Operations Per Year)
FLOPY ≠ 30758400 * FLOPS
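The gap between peak speed and sustained yearly throughput can be shown with a little arithmetic. The Python sketch below is only an illustration: the 100 MFLOPS peak rate and the 40% availability are made-up numbers, and `flopy` is a hypothetical helper, not anything from Condor. It also shows that the constant 30758400 is the number of seconds in 356 days, while a 365-day year has 31536000 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_in_days(days: int) -> int:
    """Number of seconds in the given number of days."""
    return days * SECONDS_PER_DAY


def flopy(peak_flops: float, availability: float, days: int = 365) -> float:
    """Floating point operations delivered over `days` days.

    `availability` is the fraction of wall-clock time (0.0 to 1.0) during
    which the machine is actually doing useful work for the user.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0.0 and 1.0")
    return peak_flops * availability * seconds_in_days(days)


print(seconds_in_days(356))  # 30758400
print(seconds_in_days(365))  # 31536000

# Hypothetical machine: 100 MFLOPS peak, usable 40% of the time.
print(flopy(100e6, 0.40))
```

A machine that is fast but rarely available can deliver fewer operations per year than a slower one that keeps running, which is why HTC tracks sustained capacity rather than peak speed.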
5
Distributed Ownership
Due to the dramatic decrease in the cost-performance ratio of hardware, powerful computing resources are owned today by individuals, groups, departments, …
– Huge increase in the aggregate processing capacity owned by the organization
– Much smaller increase in the capacity accessible by a single person
6
The Challenge and Motivation behind Condor
Turn large collections of existing distributively owned (and perhaps non-dedicated) computing resources into effective High Throughput Computing Environments
Minimize Wait while Idle
7
Road Block: Sociology
Make owners (& system administrators) happy.
Give owners full control over
– when and by whom private resources are used for HTC
– impact of HTC on private Quality of Service
– membership and information on HTC-related activities
No changes to existing software, and make it easy
– to install, configure, monitor, and maintain
Happy owners → more resources → higher throughput
8
Road Block: Robustness
To be effective, an HTC environment must run as a 24-7-365 operation.
– Customers count on it
– Debugging and fault isolation may be very time-consuming processes
– In a large distributed system, everything that might go wrong will go wrong.
Robust system → less down time → higher throughput
9
Road Block: Portability
To be effective, the HTC software must run on and support the latest and greatest hardware and software.
– Owners select hardware and software according to their needs and tradeoffs
– Customers expect it to be there.
– Application developers expect only a few (if any) changes to their applications.
Portability → more platforms → higher throughput
10
Condor’s unique mechanisms for HTC
Matchmaking - enables requests for services and offers to provide services to find each other.
Checkpointing - enables preemptive-resume scheduling (go ahead and use it as long as it is available!).
Remote I/O - enables remote (from execution site) access to local (at submission site) data.
11
Condor Viewpoints
Owner
– Creates resource offers
User
– Creates resource requests
Administrator
– Drinks Coffee
– Manages the pool-wide configuration
– Could also be the Owner
12
Condor Agents
Condor Resource Agent
– condor_startd daemon
– allows a machine to execute Condor jobs
– enforces owner policy
Condor User Agent
– condor_schedd daemon
– allows a machine to submit jobs to a pool
13
The Tutorial Installation
[Diagram labels: Your Workstation, schedd, startd, CentralManager, Alliance '98 Pool]
14
The Tutorial Installation
[Diagram labels: Your Workstation, schedd, startd, CentralManager, Alliance '98 Pool, UW-Madison Pool]
15
Hands-on: Example #1
Joining the UW-Madison CS Condor Pool as a Submit-only node
16
Overview of Submitting a Job to Condor
– Create a Submit-Description File
– Run condor_compile to relink your program with the Condor Libraries, if Condor’s Checkpointing or Remote I/O support is desired
– Run condor_submit, which sends your request to the User Agent (condor_schedd)
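The first step, creating a submit-description file, can be sketched in Python as below. This is a minimal illustration rather than an official template: the helper `make_submit_description`, the program name `my_program`, and the file names `job.out`, `job.err` and `job.log` are all hypothetical. The keywords written into the file (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are standard submit-description commands.

```python
def make_submit_description(executable: str,
                            universe: str = "vanilla",
                            arguments: str = "",
                            count: int = 1) -> str:
    """Return the text of a small submit-description file."""
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments  = {arguments}")
    lines += [
        "output     = job.out",   # hypothetical file names
        "error      = job.err",
        "log        = job.log",
        f"queue {count}",         # number of job instances to queue
    ]
    return "\n".join(lines) + "\n"


# Hypothetical program name; print the file instead of writing it to disk.
print(make_submit_description("my_program", arguments="-n 10", count=3))
```

The generated text would be saved to a file and handed to condor_submit. A program that needs checkpointing or remote I/O would first be relinked with condor_compile.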
17
Condor System Structure
18
Hands-on: Example #2
Submit Jobs to Condor
19
Condor Universes
A Universe specifies a Condor runtime environment:
STANDARD
– Supports Checkpointing
– Supports Remote System Calls
– Has some limitations…
VANILLA
– Any Unix executable (shell scripts, etc.)
– No Condor Checkpointing or Remote I/O
20
Hands-on: Example #3
Tour of User Tools/Commands
21
User Priorities in Condor
Each active user in the pool has a user priority
– Viewed or changed with condor_userprio
– Like golf: the lower, the better
A given user’s share of available machines is inversely related to the ratio between user priorities.
Example: Fred’s priority is 10, Joe’s is 20. Fred will be allocated twice as many machines as Joe.
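The Fred and Joe example can be checked with a short calculation. The Python sketch below assumes that each user's share is proportional to the reciprocal of the priority value, which reproduces the 2:1 split in the example. The helper `machine_shares` is hypothetical and is not how Condor itself is implemented.

```python
from fractions import Fraction


def machine_shares(priorities: dict[str, int]) -> dict[str, Fraction]:
    """Split the pool so that each share is proportional to 1 / priority."""
    if any(p <= 0 for p in priorities.values()):
        raise ValueError("priority values must be positive")
    weights = {user: Fraction(1, p) for user, p in priorities.items()}
    total = sum(weights.values())
    return {user: w / total for user, w in weights.items()}


shares = machine_shares({"fred": 10, "joe": 20})
print(shares["fred"], shares["joe"])  # 2/3 1/3
```

With 30 available machines, these shares give Fred 20 machines and Joe 10.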
22
User Priorities in Condor, cont.
Condor continuously adjusts user priorities over time
– machines allocated > priority: priority worsens
– machines allocated < priority: priority improves
Priority Preemption
– Higher priority users will grab machines away from lower priority users (thanks to Checkpointing…)
– Starvation is prevented
– Priority “thrashing” is prevented
23
Parallel Jobs in Condor
Condor can run parallel applications (written to the popular PVM message passing library)
24
Master-Worker Paradigm
Condor-PVM is designed to run PVM applications which follow the master-worker paradigm.
Master
– has a pool of work, sends pieces of work to the workers, manages the work and the workers
Worker
– gets a piece of work, does the computation, sends the result back
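The shape of a master-worker program can be sketched without PVM. The Python example below is only an illustration of the pattern: threads and in-memory queues stand in for worker machines and PVM messages, and the "computation" (squaring a number) is a placeholder. None of the names come from Condor or PVM.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Get a piece of work, do the computation, send the result back."""
    while True:
        piece = tasks.get()
        if piece is None:          # sentinel from the master: no more work
            break
        results.put((piece, piece * piece))


def master(work, n_workers: int = 3) -> dict:
    """Hand out pieces of work to the workers and collect the results."""
    work = list(work)
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for piece in work:
        tasks.put(piece)
    for _ in threads:
        tasks.put(None)            # one sentinel per worker
    collected = {}
    for _ in work:                 # exactly one result per piece of work
        piece, value = results.get()
        collected[piece] = value
    for t in threads:
        t.join()
    return collected


print(sorted(master(range(5)).items()))
# [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

Because the master owns the pool of work, a piece handed to a worker that disappears could simply be handed out again, which suits non-dedicated machines. This sketch does not implement that retry.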
25
What does Condor-PVM do?
– Condor acts as the PVM resource manager. All pvm_addhost requests get re-mapped to Condor.
– Condor dynamically constructs PVM virtual machines out of non-dedicated desktop machines.
– When a machine leaves the pool, the user gets notified via the normal PVM notification mechanisms.
26
How to compile and submit Condor-PVM jobs
Binary Compatible
– Compile and link with the PVM library just as normal PVM applications. No need to link with Condor.
Submit
– In the submit file set:
universe = PVM
machine_count = ..
27
Classified Advertisements
ClassAds
– Language for expressing attributes
– Semantics for evaluating them
Intuitively, a ClassAd is a set of named expressions
– Each named expression is an attribute
Expressions are similar to C …
– Constants, attribute references, operators
28
Classified Advertisements: Example
MyType = "Machine"
TargetType = "Job"
Name = "froth.cs.wisc.edu"
StartdIpAddr = " "
Arch = "INTEL"
OpSys = "SOLARIS251"
VirtualMemory = 225312
Disk = 35957
KFlops = 21058
Mips = 103
LoadAvg = 0.011719
KeyboardIdle = 12
Cpus = 1
Memory = 128
Requirements = LoadAvg < … && KeyboardIdle > 15 * 60
Rank = 0
29
Classified Advertisements: Matching
ClassAds are always considered in pairs
– Does ClassAd A match ClassAd B (and vice versa)?
30
Classified Advertisements: Examples
ClassAd A
MyType = "Apartment"
TargetType = "ApartmentRenter"
SquareArea = 3500
RentOffer = 1000
HeatIncluded = False
OnBusLine = True
Rank = UnderGrad==False + TARGET.RentOffer
Requirements = MY.RentOffer - TARGET.RentOffer < 150
ClassAd B
MyType = "ApartmentRenter"
TargetType = "Apartment"
UnderGrad = False
RentOffer = 900
Rank = 1/(TARGET.RentOffer + 100.0) + 50*HeatIncluded
Requirements = OnBusLine && SquareArea > 2700
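Two-way matching of the apartment and renter ads can be imitated in Python. The sketch below is not the ClassAd evaluator: each ad is a plain dictionary, `Requirements` and `Rank` are written by hand as functions of (my ad, target ad), and the check that each ad's `TargetType` equals the other's `MyType` is an assumption about how the type attributes are used. Only ad B's `Rank` is modelled.

```python
apartment = {
    "MyType": "Apartment", "TargetType": "ApartmentRenter",
    "SquareArea": 3500, "RentOffer": 1000,
    "HeatIncluded": False, "OnBusLine": True,
    # Requirements = MY.RentOffer - TARGET.RentOffer < 150
    "Requirements": lambda my, target: my["RentOffer"] - target["RentOffer"] < 150,
}

renter = {
    "MyType": "ApartmentRenter", "TargetType": "Apartment",
    "UnderGrad": False, "RentOffer": 900,
    # Requirements = OnBusLine && SquareArea > 2700 (attributes of the target)
    "Requirements": lambda my, target: target["OnBusLine"] and target["SquareArea"] > 2700,
    # Rank = 1/(TARGET.RentOffer + 100.0) + 50*HeatIncluded
    "Rank": lambda my, target: 1 / (target["RentOffer"] + 100.0) + 50 * target["HeatIncluded"],
}


def matches(a: dict, b: dict) -> bool:
    """True only if each ad's Requirements accepts the other ad."""
    types_ok = a["TargetType"] == b["MyType"] and b["TargetType"] == a["MyType"]
    return bool(types_ok and a["Requirements"](a, b) and b["Requirements"](b, a))


print(matches(apartment, renter))          # True
print(renter["Rank"](renter, apartment))   # the renter's preference for this apartment
```

Lowering the renter's `RentOffer` to 800 makes the apartment's `Requirements` fail (1000 - 800 is not below 150), so the pair no longer matches even though the renter would still accept the apartment.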
31
ClassAds in the Condor System
ClassAds allow Condor to be a general system
– Constraints and ranks on matches expressed by entities themselves
– Only priority logic integrated into Manager
All principal entities in the Condor system are represented by ClassAds
– Machines, Jobs, Submitters
32
ClassAds in Condor: Requirements and Rank (Example)
Friend = Owner == "tannenba" || Owner == "wright"
ResearchGroup = Owner == "jbasney" || Owner == "raman"
Trusted = Owner != "rival" && Owner != "riffraff"
Requirements = Trusted && ( ResearchGroup || LoadAvg < … && KeyboardIdle > 15*60 )
Rank = Friend + ResearchGroup*10
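The effect of `Rank = Friend + ResearchGroup*10` can be seen by evaluating it for a few job owners. The Python sketch below models only `Friend`, `ResearchGroup`, `Trusted` and `Rank` from the example. The load-average and keyboard condition in `Requirements` is deliberately left out, and the owner name `stranger` is made up for illustration.

```python
FRIENDS = {"tannenba", "wright"}
RESEARCH_GROUP = {"jbasney", "raman"}
UNTRUSTED = {"rival", "riffraff"}


def trusted(owner: str) -> bool:
    """Trusted = Owner != "rival" && Owner != "riffraff"."""
    return owner not in UNTRUSTED


def rank(owner: str) -> int:
    """Rank = Friend + ResearchGroup*10, with booleans counted as 0 or 1."""
    friend = owner in FRIENDS
    research_group = owner in RESEARCH_GROUP
    return int(friend) + int(research_group) * 10


candidates = ["stranger", "wright", "rival", "raman"]
acceptable = [o for o in candidates if trusted(o)]
# Highest rank first: research group members, then friends, then everyone else.
print(sorted(acceptable, key=rank, reverse=True))
# ['raman', 'wright', 'stranger']
```

`Requirements` decides who may use the machine at all, while `Rank` only orders the owners who are acceptable.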
33
Hands-on: Example #4
Submit Jobs with ClassAd Constraints
34
Resource Owner’s Viewpoint
In Condor, the owner of the resource (machine owner) can dictate the terms and conditions under which that resource can be used
How? Configure the Resource Agent’s Policy (condor_startd configuration)
35
Resource Agent Configuration Expressions
START expression
– When TRUE, Condor can start a job
– True = Unclaimed State
– False = Owner State
SUSPEND expression
– When TRUE, Condor suspends any job running on this machine
CONTINUE expression
– When TRUE, Condor continues a suspended job
36
Resource Agent Configuration Expressions, cont.
VACATE expression
– When TRUE, kick the job off of the machine (via a Checkpoint if possible)
KILL expression
– When TRUE, kill the job immediately
– No Checkpoint
– On UNIX: a “kill -9”
37
Resource Agent Configuration Expressions, cont.
[Flowchart labels: START, WANT SUSPEND, SUSPEND, VACATE, WANT VACATE, KILL, with True/False branches]
38
Resource Agent Configuration Expressions, cont.
Default Setup
WANT_VACATE : True
WANT_SUSPEND : True
START : Keyboard_Idle && CPU_Idle
SUSPEND : Keyboard_Busy || CPU_Busy
CONTINUE : Keyboard and CPU idle again
VACATE : If Suspended > 10 minutes
KILL : If spent > 10 minutes in VACATE state
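The default setup can be read as a small state machine. The Python sketch below is a simplified model under stated assumptions: the state names `idle`, `running`, `suspended`, `vacating` and `killed` are invented for this illustration and are not condor_startd state names, a job is assumed to be waiting whenever START is true, and CONTINUE is checked before VACATE.

```python
def next_state(state: str, keyboard_idle: bool, cpu_idle: bool,
               minutes_in_state: float) -> str:
    """Apply the default policy once and return the resulting state."""
    machine_idle = keyboard_idle and cpu_idle
    if state == "idle":
        # START : Keyboard_Idle && CPU_Idle
        return "running" if machine_idle else "idle"
    if state == "running":
        # SUSPEND : Keyboard_Busy || CPU_Busy
        return "running" if machine_idle else "suspended"
    if state == "suspended":
        if machine_idle:
            return "running"       # CONTINUE : keyboard and CPU idle again
        if minutes_in_state > 10:
            return "vacating"      # VACATE : suspended for more than 10 minutes
        return "suspended"
    if state == "vacating":
        # KILL : more than 10 minutes spent vacating
        return "killed" if minutes_in_state > 10 else "vacating"
    raise ValueError(f"unknown state: {state}")


# The owner returns, stays busy, and the job is eventually vacated.
print(next_state("running", keyboard_idle=False, cpu_idle=True, minutes_in_state=0))
print(next_state("suspended", keyboard_idle=False, cpu_idle=True, minutes_in_state=11))
```

The two calls show a running job being suspended when the keyboard becomes busy, then vacated once the suspension has lasted more than 10 minutes.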
39
Hands-on: Example #5
UW-Madison CS Pool Startd Policy
40
Condor Administrator Features
The condor_master is the administrator’s best friend
– Watches/restarts other daemons
– Sends email if it notices suspicious problems
– Runs condor_preen
– Provides administrator remote control
41
Condor Administrator Commands
condor_off [ hostname … ]
– Down entire pool: condor_off `cat machines-file`
condor_on
condor_restart
condor_reconfig (“on-the-fly” reconfiguration)
condor_vacate
These commands could be used by the Owner as well, if desired
42
Condor Host-based Access Control
HOSTALLOW and HOSTDENY settings grant machines (subnets, domains) different access levels:
– READ access
– WRITE access
– ADMINISTRATOR access
– OWNER access
43
Example: Simple Host-based Access Control
HOSTDENY_READ = *.mil
HOSTALLOW_WRITE = *.ncsa.uiuc.edu
HOSTDENY_WRITE = ppp*.ncsa.uiuc.edu, 172.44.*
HOSTALLOW_ADMINISTRATOR = bigcheese.ncsa.uiuc.edu
HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)
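How such allow and deny lists might be evaluated can be sketched in Python with shell-style wildcard matching. The evaluation order here is an assumption, not taken from the slides: a deny match always wins, and an access level with no allow list admits every host that is not denied. The OWNER level is omitted because it depends on macro expansion, and host names other than `bigcheese.ncsa.uiuc.edu` are made up.

```python
from fnmatch import fnmatchcase

# Patterns copied from the example above, keyed by access level.
POLICY = {
    "READ": {"deny": ["*.mil"]},
    "WRITE": {"allow": ["*.ncsa.uiuc.edu"],
              "deny": ["ppp*.ncsa.uiuc.edu", "172.44.*"]},
    "ADMINISTRATOR": {"allow": ["bigcheese.ncsa.uiuc.edu"]},
}


def is_allowed(level: str, host: str) -> bool:
    """Decide whether `host` (a name or dotted IP string) has `level` access."""
    rules = POLICY.get(level, {})
    host = host.lower()
    if any(fnmatchcase(host, pattern) for pattern in rules.get("deny", [])):
        return False               # assumption: deny entries take precedence
    allow = rules.get("allow")
    if allow is None:
        return True                # assumption: no allow list means "not denied"
    return any(fnmatchcase(host, pattern) for pattern in allow)


print(is_allowed("WRITE", "ppp12.ncsa.uiuc.edu"))              # False
print(is_allowed("WRITE", "modi4.ncsa.uiuc.edu"))              # True
print(is_allowed("ADMINISTRATOR", "bigcheese.ncsa.uiuc.edu"))  # True
```

Under these assumptions the dial-up hosts matching `ppp*` lose WRITE access even though they fall inside the allowed `*.ncsa.uiuc.edu` domain.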
44
Configuration File Hierarchy
condor_config
– Pool-wide default
– Condor pool administrator’s requirements
condor_config.local
– Overrides for a specific machine
– Reflects Owner’s requirements
condor_config.root
– System Administrator requirements
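Layered configuration of this kind can be modelled as dictionaries merged in order. The Python sketch below assumes that files are read in the order condor_config, condor_config.local, condor_config.root and that a later file overrides an earlier one. Both the ordering and the setting values are illustrative assumptions, and `effective_config` is a hypothetical helper.

```python
def effective_config(*layers: dict) -> dict:
    """Merge configuration layers; later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Illustrative values only.
pool_wide = {"WANT_SUSPEND": "True", "WANT_VACATE": "True",
             "START": "Keyboard_Idle && CPU_Idle"}
local = {"START": "False"}          # this machine's owner opts out of running jobs
root: dict = {}                     # nothing overridden by the system administrator

config = effective_config(pool_wide, local, root)
print(config["START"])        # False
print(config["WANT_VACATE"])  # True
```

The owner changes a single setting for one machine while every other value still comes from the pool-wide defaults.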
45
Future Directions
– Condor for Windows NT
– SMP support
– More parallel job support
– Checkpoint parallel jobs
– MPI, MPI-2
– Flocking
– …
46
Obtaining Condor
Condor can be downloaded from the Condor web site at: http://www.cs.wisc.edu/condor
Complete Users and Administrators manual available at: http://www.cs.wisc.edu/condor/manual
Contracted Support is available
Questions? Email: condor-admin@cs.wisc.edu
47
Thank You!!
Thank you for your interest!
The Condor Team: Miron Livny, Marvin Solomon, Todd Tannenbaum, Derek Wright, Bin Song, Rajesh Raman, Tom Stanis, Jim Basney, Adiel Yoaz