Published by Neil Holland. Modified over 8 years ago.
1
Condor Project
Computer Sciences Department
University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
Condor and DAGMan
Barcelona, 2006
2
Agenda
Extended user’s tutorial
Advanced Uses of Condor
Java programs
DAGMan
Stork
MW
Grid Computing
Case studies, and a discussion of your application’s needs
3
Some jobs have dependencies…
Condor can help solve dependency problems
4
Frieda learns DAGMan
Directed Acyclic Graph Manager
DAGMan allows Frieda to specify the dependencies between her Condor jobs, so Condor manages the jobs automatically.
Dependency example: Do not run job B until job A has completed successfully.
5
What is a DAG?
Directed Acyclic Graph
A DAG is the data structure used by DAGMan to represent dependencies
[Figure: DAG with nodes A, B, C, D]
6
DAG Definitions
DAGs have one or more nodes (or vertices).
Dependencies are represented by arcs (or edges). These are arrows that go from parent to child.
No cycles!
[Figure: DAG with nodes A, B, C, D]
7
Condor and DAGs
Each node represents a Condor job
Dependencies define the possible order of job execution
[Figure: DAG with nodes Job A, Job B, Job C, Job D]
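To make "possible order of job execution" concrete, here is a minimal Python sketch that derives one valid order from parent/child pairs and rejects graphs with cycles. It is an illustration of the idea only, not DAGMan's implementation; the function name `execution_order` and the edge list for the four-node example are assumptions made for this sketch.

```python
from collections import deque


def execution_order(nodes, edges):
    """Return one valid execution order; raise ValueError if there is a cycle.

    nodes: list of node names.
    edges: list of (parent, child) pairs.
    """
    children = {n: [] for n in nodes}
    pending_parents = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        pending_parents[child] += 1

    # Nodes with no unfinished parents are ready to run.
    ready = deque(n for n in nodes if pending_parents[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            pending_parents[child] -= 1
            if pending_parents[child] == 0:
                ready.append(child)

    # Any node never reached must sit on a cycle.
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a DAG")
    return order


# Assumed edges for the four-node example: A before B and C, both before D.
DIAMOND_NODES = ["A", "B", "C", "D"]
DIAMOND_EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
```

B and C have no dependency on each other, so an order that runs C before B would be equally valid; the sketch simply returns one of the possibilities.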
8
Defining a DAG to Condor
A DAG input file defines a DAG:
    # file name: diamond.dag
    Job A a.submit
    Job B b.submit
    Job C c.submit
    Job D d.submit
    Parent A Child B C
    Parent B C Child D
[Figure: DAG with nodes A, B, C, D]
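The structure of the DAG input file can be read back with a few lines of Python. This sketch handles only the `Job` and `Parent ... Child ...` lines shown on the slide, treats lines starting with `#` as comments, and ignores everything else; `parse_dag` is a hypothetical helper, not part of Condor.

```python
def parse_dag(text):
    """Parse Job and Parent/Child lines from DAG input file text.

    Returns (jobs, edges): jobs maps node name -> submit file,
    edges is a list of (parent, child) pairs.
    """
    jobs = {}
    edges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        keyword = words[0].lower()
        if keyword == "job":
            jobs[words[1]] = words[2]
        elif keyword == "parent":
            split_at = [w.lower() for w in words].index("child")
            parents = words[1:split_at]
            kids = words[split_at + 1:]
            # Every listed parent precedes every listed child.
            edges.extend((p, c) for p in parents for c in kids)
    return jobs, edges


DIAMOND_DAG = """\
# file name: diamond.dag
Job A a.submit
Job B b.submit
Job C c.submit
Job D d.submit
Parent A Child B C
Parent B C Child D
"""
```

Note that `Parent B C Child D` expands to two edges, B to D and C to D, so D waits for both.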
9
Submit Description File
For node B:
    # file name: b.submit
    universe = vanilla
    executable = B
    input = B.in
    output = B.out
    error = B.err
    log = B.log
    queue
For node C:
    # file name: c.submit
    universe = standard
    executable = C
    input = C.in
    output = C.out
    error = C.err
    log = C.log
    queue
10
Submitting the DAG to Condor
To submit the entire DAG, run
    condor_submit_dag diamond.dag
condor_submit_dag creates a submit description file for DAGMan, and DAGMan itself is submitted as a Condor job!
11
A DAGMan requirement
The submit description file for each job must specify a log file
Log files may be separate or shared by different jobs within the DAG
The log files are used to synchronize job submission
12
Nodes
Job execution at a node either succeeds or fails
Based on the return value of the job:
    0: success
    not 0: failure
[Figure: DAG with nodes A, B, C, D]
13
Advanced DAGMan Tricks
Retry of a node
Abort the entire DAG
Setting a variable, a VARS entry
Throttles and DAGs
PRE and POST scripts: editing the DAG
Nested DAGs: loops and more
14
Retry
Before a node is marked as failed...
Retry N times. In the DAG input file:
    Retry C 4
(to rerun node C up to four times before calling the node failed)
Retry N times, unless a node returns a specific exit code. In the DAG input file:
    Retry C 4 UNLESS-EXIT 2
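The retry rule can be modelled in a few lines of Python. This is a sketch of the behaviour the slide describes (one initial run plus up to N reruns, stopping early on success or on the UNLESS-EXIT code), not DAGMan's own code; `run_with_retry` and its parameters are names invented for the illustration.

```python
def run_with_retry(run_job, retries, unless_exit=None):
    """Run a job, rerunning it on failure.

    run_job: callable returning the job's exit code.
    retries: maximum number of reruns after the first attempt.
    unless_exit: exit code that stops retrying immediately, or None.
    Returns (final_exit_code, attempts_made).
    """
    attempts = 0
    while True:
        code = run_job()
        attempts += 1
        if code == 0:
            return code, attempts  # success
        if unless_exit is not None and code == unless_exit:
            return code, attempts  # failure that must not be retried
        if attempts > retries:
            return code, attempts  # retries exhausted: node failed
```

Under this model, `Retry C 4` allows at most five runs of node C in total, and `Retry C 4 UNLESS-EXIT 2` gives up at once if C exits with code 2.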
15
Abort the Entire DAG
If a specific error value should cause the entire DAG to stop, place in the DAG input file:
    Abort-DAG-On B 3
Here B is the name of the node and 3 is the returned error code.
16
VARS
An entry in the DAG input file
Intended to reduce the number of unique submit description files needed
Defines a variable and value associated with a node
The value is used in a substitution macro
17
Invented Example: A Binary Tree
Assume that a single executable processes each node. But, handling is different based on a node’s position as a left or right child.
[Figure: binary tree with a Root node and nodes A, B, C, D, E, F]
18
The DAG Input File
    # tree example, file is tree.dag
    Job root node.submit
    Job A node.submit
    Vars A position="left"
    Job B node.submit
    Vars B position="right"
    Job C node.submit
    Vars C position="left"
    ...
    Parent root Child A B
    ...
[Figure: binary tree with a Root node and nodes A, B, C, D, E, F]
19
The Submit Description File
    # file name is node.submit
    executable = process.exe
    arguments = $(position)
    log = node.log
    queue
The job at node A has the command line:
    process.exe left
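The substitution step can be pictured with a short Python sketch: each `$(name)` in the arguments line is replaced by the value that the node's Vars entry supplies. This only mimics the effect described on the slide; `expand_macros` and `command_line` are hypothetical helpers, and leaving unknown macros untouched is a choice made for this sketch rather than documented Condor behaviour.

```python
import re


def expand_macros(template, variables):
    """Replace each $(name) in template with variables[name].

    Macros with no matching variable are left unchanged (sketch choice).
    """
    def replace(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return re.sub(r"\$\((\w+)\)", replace, template)


def command_line(executable, arguments, variables):
    """Build the command line a node's job would run."""
    return f"{executable} {expand_macros(arguments, variables)}".strip()
```

With the values from tree.dag, `command_line("process.exe", "$(position)", {"position": "left"})` gives the command line for node A, and the same call with `"right"` gives the one for node B, all from a single node.submit.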
20
Throttles
Throttles to control number of job submissions at one time
Maximum number of jobs submitted:
    % condor_submit_dag -maxjobs 40 bigdag.dag
Maximum number of jobs idle:
    % condor_submit_dag -maxidle 10 bigdag.dag
21
Throttling Example
Submit DAG with 200,000 nodes
No dependencies between jobs
Use DAGMan to throttle the jobs, because Condor is scalable, but will have problems with 200,000 simultaneous job submissions
[Figure: independent nodes A1, A2, A3, … A200000]
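A DAG input file of this size would normally be generated rather than written by hand. The Python sketch below produces one `Job` line per node and no Parent/Child lines, since the jobs are independent. The node prefix `A` follows the figure; the shared submit file name `a.submit` and the helper name `independent_dag` are assumptions for illustration.

```python
def independent_dag(count, submit_file="a.submit", prefix="A"):
    """Return DAG input file text with `count` independent nodes.

    There are no Parent/Child lines because no node depends on another.
    """
    lines = [f"Job {prefix}{i} {submit_file}" for i in range(1, count + 1)]
    return "\n".join(lines) + "\n"
```

Writing the result of `independent_dag(200000)` to bigdag.dag and submitting it with one of the throttle options from the Throttles slide would let DAGMan feed the jobs to Condor a batch at a time.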
22
DAGMan scripts
DAGMan allows PRE and/or POST scripts
Not necessarily a script: any executable
Run before (PRE) or after (POST) job
Run on the submit machine
In the DAG input file:
    Job A a.submit
    Script PRE A before-script
    Script POST A after-script
23
Node A within the DAG
[Figure: node A contains before-script, the Condor job described in a.submit, and after-script]
24
PRE script
PRE script can make decisions
Should I pass different arguments to the job?
Should I change a submit description file?
Lazy decision making
25
POST script
POST script is always run, independent of the Condor job’s return value
POST script can change return value
DAGMan marks the node failed for a non-zero return value from the POST script
POST script can look at error code or output files and return 0 (success) or non-zero (failure) based on deeper knowledge.
26
Pre-defined variables
In the DAG input file:
    Job A a.submit
    Script PRE A before-script $JOB
    Script POST A after-script $JOB $RETURN
$JOB and $RETURN are (optional) arguments to the script
$JOB becomes the string that defines the node name
$RETURN becomes the return value from the Condor job defined by the node
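Since a POST script can be any executable, it could be a small Python program. The sketch below assumes it is invoked with the node name and the job's return value as its two arguments (as in `after-script $JOB $RETURN`), and that the job writes its output to `<node name>.out`; that file-naming rule and the "output must be non-empty" check are assumptions chosen for the example, not something DAGMan requires.

```python
import os
import sys


def post_status(job_return, output_path):
    """Decide the node's final status: 0 for success, 1 for failure."""
    if job_return != 0:
        return 1  # the Condor job itself reported failure
    if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
        return 1  # job exited 0 but produced no usable output
    return 0


def main(argv):
    # argv[1] is the node name ($JOB), argv[2] the job's return value ($RETURN).
    node_name, job_return = argv[1], int(argv[2])
    return post_status(job_return, f"{node_name}.out")


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

The exit status of this program is what marks the node as succeeded or failed, so a job that exits 0 but leaves an empty output file would still fail the node.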
27
Script Throttles
Throttles to control the number of scripts running at one time
    % condor_submit_dag -maxpre 10 bigdag.dag
OR
    % condor_submit_dag -maxpost 30 bigdag.dag
28
Nested DAGs
Idea: any DAG node can be a script that does:
    1. Make decision
    2. Create DAG input file
    3. Call condor_submit_dag -no_submit
    4. Outer DAG waits for inner DAG
DAG node will not complete until the inner (nested) DAG finishes
Why?
Implement a fixed-length loop
Modify behavior on the fly
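Step 2, creating the inner DAG input file, is the part a script has to compute. As one way to picture the fixed-length loop, this Python sketch emits a chain of N nodes where each iteration is the parent of the next. The node names `step1`, `step2`, ... and the submit file name `step.submit` are hypothetical, and the sketch stops at producing the file text; it does not call any Condor tool.

```python
def loop_dag(iterations, submit_file="step.submit"):
    """Return DAG input file text for a chain of `iterations` nodes.

    Each node depends on the previous one, so the nodes run in sequence.
    """
    names = [f"step{i}" for i in range(1, iterations + 1)]
    lines = [f"Job {name} {submit_file}" for name in names]
    # Link consecutive iterations: step1 -> step2 -> ... -> stepN.
    lines += [f"Parent {a} Child {b}" for a, b in zip(names, names[1:])]
    return "\n".join(lines) + "\n"
```

A node script in the outer DAG could decide the iteration count at run time, write this text to a file, and then hand that file to condor_submit_dag as described in step 3.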
29
Nested DAG Example
[Figure: DAG diagram with nodes A, B, C, D and V, W, Z, X, Y]
C is