1
Intermediate HTCondor: More Workflows Monday pm
Greg Thain Center For High Throughput Computing University of Wisconsin-Madison
2
Before we begin… Any questions on the lectures or exercises up to this point?
3
Life cycle of HTCondor Job
[Diagram: job life cycle from Submit file to History file, with states Idle, Running, Held, Suspend, and Complete]
4
Life cycle of HTCondor Job
[Diagram: job life cycle from Submit file to History file, with states Idle, Running, Held, Suspend, and Complete; annotation: "Output files transferred here"]
5
What about a long-running job?
[Diagram: job life cycle from Submit file to History file, with states Idle, Running, Held, Suspend, and Complete]
6
What about a long-running job?
[Diagram: job life cycle from Submit file to History file, with states Idle, Running, Held, Suspend, and Complete; annotation: "output files removed"]
7
WHEN_TO_TRANSFER_OUTPUT
Universe = vanilla
Executable = gronk
should_transfer_files = YES
WHEN_TO_TRANSFER_OUTPUT = ON_EXIT_OR_EVICT
Arguments = $(OUTPUT)
queue
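Saving output on eviction only helps if the job can pick up where it left off. A minimal sketch of such a job in Python, assuming the executable records its progress in the output file named by its first argument; the JSON file format, the step count, and the stand-in "work" are hypothetical and not part of the original slides:

```python
import json
import os
import sys

TOTAL_STEPS = 5  # hypothetical amount of work


def load_progress(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_progress(path, state):
    """Write the state via a temp file so an eviction never leaves it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, total_steps=TOTAL_STEPS):
    """Resume from the last saved step and checkpoint after every step."""
    state = load_progress(path)
    for step in range(state["next_step"], total_steps):
        state["total"] += step  # stand-in for real work
        state["next_step"] = step + 1
        save_progress(path, state)
    return state


if __name__ == "__main__" and len(sys.argv) == 2:
    # Invoked with the output file name, e.g. the value of $(OUTPUT)
    print(run(sys.argv[1]))
```

If the job is evicted after a few steps and later restarts with the saved file in place, it continues from `next_step` instead of starting over.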
8
ON_EXIT_OR_EVICT
[Diagram: job life cycle from Submit file to History file, with states Idle, Running, Held, Suspend, and Complete; annotation: "output files saved!"]
9
Advanced DAGMan Tricks
DAGMan variables
DAGs without dependencies
Throttles
Sub-DAGs
Retries
Pre and post scripts: editing your DAG
SPLICEs: DAGs as subroutines
10
Throttles Throttles to control job submissions
Max jobs idle
    condor_submit_dag -maxidle XX work.dag
Max_jobs_submitted
Max scripts running
    condor_submit_dag -maxpre XX -maxpost XX
Useful for “big bag of tasks”
Schedd holds everything in memory
11
DAGMan variables
# Diamond dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
12
DAGMan variables (Cont)
# Diamond dag
Job A a.sub
Job B a.sub
Job C a.sub
Job D a.sub
VARS A OUTPUT="A.out"
VARS B OUTPUT="B.out"
VARS C OUTPUT="C.out"
VARS D OUTPUT="D.out"
Parent A Child B C
Parent B C Child D
13
DAGMan variables (cont)
# a.sub
Universe = vanilla
Executable = gronk
Arguments = $(OUTPUT)
queue
14
Retries
Failed nodes can be automatically retried a configurable number of times
Helps when jobs randomly crash
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
RETRY D 5
Parent A Child B C
Parent B C Child D
15
DAGs without dependencies
Submit DAG with:
    200,000 nodes
    No dependencies
Use DAGMan to throttle the job submissions:
    HTCondor is scalable, but it will have problems if you submit 200,000 jobs simultaneously
[Diagram: independent nodes A1, A2, A3, …]
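A DAG file this large is normally generated rather than written by hand. A minimal sketch in Python, assuming every node shares one submit file and takes its output name through a VARS line as in the earlier slides; the function name and the `A<i>` node naming are illustrative choices, not part of the original deck:

```python
def write_bag_of_tasks(path, n_nodes, submit_file="a.sub"):
    """Write a DAG file with n_nodes independent nodes and no PARENT/CHILD lines."""
    with open(path, "w") as dag:
        for i in range(1, n_nodes + 1):
            # One node per task; all nodes reuse the same submit file.
            dag.write(f"JOB A{i} {submit_file}\n")
            # Per-node value for $(OUTPUT) in the submit file.
            dag.write(f'VARS A{i} OUTPUT="A{i}.out"\n')
```

Calling `write_bag_of_tasks("work.dag", 200_000)` would produce a file that can then be submitted with a throttle such as `condor_submit_dag -maxidle XX work.dag`.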
16
Shish kebab DAG
Used for breaking long jobs into short ones
Easier for scheduling
[Diagram: nodes A1, A2, A3 in a chain]
17
Sub-DAG Idea: any given DAG node can be another DAG
SUBDAG EXTERNAL Name DAG-file
DAG node will not complete until the sub-DAG finishes
Interesting idea: a previous node could generate this DAG node
    Simpler DAG structure
    Implement a fixed-length loop
    Modify behavior on the fly
18
Sub-DAG
[Diagram: nodes A, B, C, D and nodes V, W, X, Y, Z]
19
Subdag Syntax
# Diamond dag
Job A a.sub
Job B b.sub
SUBDAG EXTERNAL C c.dag
Job D d.sub
Parent A Child B C
Parent B C Child D
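Because the sub-DAG file is only read when its node is ready to run, an earlier step can write it. A minimal sketch in Python of generating `c.dag` as a fixed-length loop, here a chain of identical steps; the submit file name `step.sub`, the `STEP<i>` node names, and the `ITERATION` variable are hypothetical and not part of the original slides:

```python
def write_loop_dag(path, iterations, submit_file="step.sub"):
    """Write a DAG that runs the same submit file `iterations` times in sequence."""
    names = [f"STEP{i}" for i in range(iterations)]
    lines = []
    for i, name in enumerate(names):
        lines.append(f"JOB {name} {submit_file}")
        # Pass the loop counter to the job as $(ITERATION).
        lines.append(f'VARS {name} ITERATION="{i}"')
    # Chain each step to the next so the iterations run one after another.
    for parent, child in zip(names, names[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return names
```

Such a generator could run as part of an earlier node or as a PRE script on the sub-DAG node, with the number of iterations decided from that node's results.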
20
DAGMan scripts
DAGMan allows pre & post scripts
Run before (pre) or after (post) job
Run on the same computer you submitted from
Don’t have to be scripts: any executable
Syntax:
JOB A a.sub
SCRIPT PRE A before-script $JOB
SCRIPT POST A after-script $JOB $RETURN
21
So What?
Pre script can make decisions
    Where should my job run? (Particularly useful to make a job run in the same place as the last job.)
    What should my job do?
    Generate Sub-DAG
Post script can change return value
    DAGMan decides a job failed on a non-zero return value
    Post script can look at {error code, output files, etc.} and return zero or non-zero based on deeper knowledge.
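The post-script idea above can be sketched in Python. This is a minimal illustration, assuming the script is invoked as `after-script $JOB $RETURN` and that each node writes a file named `<JOB>.out`, following the VARS naming used earlier; the rule that an empty or missing output file counts as failure is a hypothetical example of "deeper knowledge":

```python
import os
import sys


def judge(return_code, output_path):
    """Return 0 if the node should count as successful, 1 otherwise."""
    if return_code != 0:
        return 1  # the job itself reported failure
    if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
        return 1  # exit code was 0, but no usable output was produced
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments correspond to $JOB and $RETURN on the SCRIPT POST line.
    job_name, job_return = sys.argv[1], int(sys.argv[2])
    sys.exit(judge(job_return, f"{job_name}.out"))
```

DAGMan then treats the node as failed or successful according to the script's own exit status, which combines naturally with RETRY.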
22
SPLICEs: DAGs as subroutines
23
SPLICEs: DAGs as subroutines
24
[Diagram: nodes A, B, C, D shown three times, followed by node E]
25
SPLICE Syntax
SPLICE name dagfile.dag
Creates a new node with the DAG as the node
CHILD / PARENT / etc. all work on nodes
26
Example
JOB E E.submit
SPLICE DIAMOND diamond.dag
PARENT DIAMOND CHILD E
27
Let’s try it out! Exercises with DAGMan.
28
Questions? Questions? Comments? Feel free to ask me questions later: