Submitting Jobs (and how to find them)
  John (TJ) Knoeller
  Center for High Throughput Computing
Overview
  Development of an advanced submit file, using as many techniques and tricks as possible
  Custom print formats
  A few more random tricks (time permitting)
The Story
  I have a lot of media files that I have collected over the years.
  I want to convert them all to .mp4.
  (Sounds like a high-throughput problem…)
Basic submit file for conversion
    Executable = ffmpeg
    Transfer_executable = false
    Should_transfer_files = YES
    file = S1E2 The Train Job.wmv
    Transfer_input_files = $(file)
    Args = "-i '$(file)' '$(file).mp4'"
    Queue
  From now on, assume that Executable and Should_transfer_files are always present in the submit file.
Converting a set of files
    Transfer_input_files = $(file)
    Args = "-i '$(file)' '$(file).mp4' "
    Queue FILE from (
      S1E1 Serenity.wmv
      S1E2 The Train Job.wmv
      S1E3 Bushwhacked.wmv
      S1E4 Shindig.wmv
      …
    )
Output filename problems
  Output is $(file).mp4, so the output files are named
    S1E1 Serenity.wmv.mp4
    S1E2 The Train Job.wmv.mp4
    S1E3 Bushwhacked.wmv.mp4
    S1E4 Shindig.wmv.mp4
$F() to the rescue
  $Fqpdnxba() expands to parts of a filename
    file = "./Video/Firefly/S1E4 Shindig.wmv"
    $Fp(file)   -> ./Video/Firefly/
    $Fqp(file)  -> "./Video/Firefly"
    $Fqpa(file) -> './Video/Firefly'
    $Fd(file)   -> Firefly/
    $Fdb(file)  -> Firefly
    $Fn(file)   -> S1E4 Shindig
    $Fx(file)   -> .wmv
    $Fnx(file)  -> S1E4 Shindig.wmv
$Fn() is name without extension
    Transfer_Input_Files = $(file)
    Args = "-i '$Fnx(file)' '$Fn(file).mp4'"
  Resulting files are now
    S1E1 Serenity.mp4
    S1E2 The Train Job.mp4
    S1E3 Bushwhacked.mp4
    S1E4 Shindig.mp4
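To get a feel for which piece of the path each $F() option selects, the split can be imitated outside HTCondor. The sketch below is only an approximation built on Python's posixpath module; it is not how condor_submit implements $F(), it ignores the q and a quoting options, and the helper name f_parts is made up for this illustration.

```python
import posixpath

def f_parts(path):
    """Approximate a few $F() options for a POSIX-style path (illustration only)."""
    directory, filename = posixpath.split(path)   # "./Video/Firefly", "S1E4 Shindig.wmv"
    name, ext = posixpath.splitext(filename)      # "S1E4 Shindig", ".wmv"
    parent = posixpath.basename(directory)        # "Firefly"
    return {
        "p": directory + "/" if directory else "",  # full directory, trailing slash
        "d": parent + "/" if parent else "",        # parent directory only
        "db": parent,                               # parent directory, no slash
        "n": name,                                  # base name without extension
        "x": ext,                                   # extension, including the dot
        "nx": filename,                             # name plus extension
    }

parts = f_parts("./Video/Firefly/S1E4 Shindig.wmv")
output_name = parts["n"] + ".mp4"   # the same idea as '$Fn(file).mp4'
print(parts)
print(output_name)
```

The last two lines mirror what the Args line on this slide does: take the name without its extension and append .mp4.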
$Fq() and Arguments
  $Fq(file) expands to quoted "filename"
  Gives "parse error" with Arguments statement
  For Args use '$F(file)' instead.
    Becomes 'filename' on LINUX
    Becomes "filename" on Windows
  In 8.6 you can use $Fqa(file) instead
"new" Args preserves spaces
    FILE = The Train Job.wmv
    Args = "-i '$Fnx(file)' -w640 '$Fn(file).mp4' "
    # Tool Tip: see it before you submit it.
    condor_submit test.sub -dump test.ads
    condor_status -ads test.ads -af Arguments
    -i The' 'Train' 'Job.wmv -w640 The' 'Train' 'Job.mp4
    # On *nix the job sees
    -i 'The Train Job.wmv' -w640 'The Train Job.mp4'
    # on Windows the job sees
    -i "The Train Job.wmv" -w640 "The Train Job.mp4"
Sometimes you can't use Args
  Argument quoting is not portable across operating systems
    LINUX needs space and ' escaped
    Windows needs double quotes around filenames that have space or ^
  What the job sees can be hard to predict
  Transfer_input_files will not transfer a file with a comma in the name.
Alternative to Args
  Use a script as your executable
  Use custom job attributes to pass information to the script
    # these both set CustomAttr in the job ad
    +CustomAttr = "value"
    MY.CustomAttr = "value"
  You can refer to custom attributes in $() expansion in your submit file
    transfer_input_files = $F(My.CustomAttr)
  Use $F() here because it strips the quotes
Add custom attributes to the job
    Executable = xcode.pl
    Args = -s 640x360
    Transfer_executable = true
    Should_transfer_files = true
    # +WantIOProxy = true
    MY.SourceDir = $Fqp(FILE)
    MY.SourceFile = $Fqnx(FILE)
    +OutFile = "$Fn(FILE).mp4"
    Batch_name = $Fdb(FILE)
    Queue FILE matching files Firefly/*.wmv
Use a script to query the .job.ad
    #!/usr/bin/env perl
    # xcode.pl
    # Pull filenames from job ad
    my $src = `condor_status -ads .job.ad -af SourceFile`;
    my $out = `condor_status -ads .job.ad -af OutFile`;
    # find condor_chirp (also need +WantIOProxy in job)
    my $lib = `condor_config_val libexec`;
    # strip the trailing newline from each command's output
    chomp $src; chomp $out; chomp $lib;
    # fetch the input file
    system("$lib/condor_chirp fetch '$src' '$src'");
    # do the conversion
    system("ffmpeg -i '$src' @ARGV '$out'");
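The script above shells out to condor_status to read attributes from .job.ad. The file itself is plain text with one "Name = value" pair per line, so for simple string attributes it can also be read directly. The following is a minimal sketch under that assumption: it only handles values that are a single double-quoted string, it ignores escapes and expressions, and it is not a ClassAd parser. The function name and the sample ad text are hypothetical, with values borrowed from the slides.

```python
def read_job_ad_strings(text):
    """Collect attributes whose value is a simple double-quoted string."""
    attrs = {}
    for line in text.splitlines():
        # split on the first '=' only, so '=' inside the value is preserved
        name, sep, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if sep and len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            attrs[name] = value[1:-1]
    return attrs

# Hypothetical excerpt of a .job.ad file for one of the conversion jobs
sample_ad = '''ClusterId = 104
SourceDir = "Firefly/"
SourceFile = "S1E2 The Train Job.wmv"
OutFile = "S1E2 The Train Job.mp4"
'''

ad = read_job_ad_strings(sample_ad)
print(ad["SourceFile"], "->", ad["OutFile"])
```

In a real job the text would come from open('.job.ad').read(); anything beyond simple strings is better left to condor_status or the classad module.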
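Use python to query the .job.ad
    #!/usr/bin/env python
    # xcode.py
    import os
    import classad
    # note: chirp is not defined on the slide; it is assumed here to be a
    # chirp client object that provides fetch(remote, local)
    # load the job ad and get the source filename
    job = classad.parseOne(open('.job.ad').read())
    src = job['SourceFile']
    srcpath = job['SourceDir'] + src
    # fetch the input file
    chirp.fetch(srcpath, src)
    # do the conversion and return the exit code
    # (filenames are quoted because they contain spaces)
    exit(os.system("ffmpeg -i '{0}' {2} '{1}'".format(
        src, job['OutFile'], job['Arguments'])))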
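Since the whole point of moving to a script is to stop fighting with quoting, one option inside the script is to skip the shell entirely and pass ffmpeg an argument list. The sketch below assumes the extra options arrive as one string (as job['Arguments'] does above) and uses shlex.split to break it into words; build_ffmpeg_command is a hypothetical helper, not part of HTCondor.

```python
import shlex

def build_ffmpeg_command(src, out, extra_args=""):
    """Return the ffmpeg argument list; filenames stay single list items."""
    # shlex.split breaks the extra options into words using shell-like rules
    return ["ffmpeg", "-i", src] + shlex.split(extra_args) + [out]

cmd = build_ffmpeg_command(
    "S1E2 The Train Job.wmv", "S1E2 The Train Job.mp4", "-s 640x360")
print(cmd)
```

Inside the job, subprocess.call(cmd) would run the conversion and return ffmpeg's exit code. Because each filename is its own list element, spaces and quote characters in the name need no escaping.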
See how it's going..
    condor_q
    OWNER  BATCH_NAME  …  DONE  RUN  IDLE  TOTAL  JOB_IDS
    Tj     Firefly        _     2    2     _      104.0-4

    condor_q -af:jh JobStatus SourceFile SourceDir
    ID     JobStatus  SourceFile          SourceDir
    104.0  2          S1E1 Serenity       Firefly/
    104.1  2          S1E2 The Train Job  Firefly/
    104.2  1          S1E3 Bushwhacked    Firefly/
    104.4  1          S1E4 Shindig        Firefly/
The ClassAd "Pretty Printer"
  A C++ class within HTCondor that can print information from a ClassAd
  Used by condor_status / q / history for most of the standard output
  -format and -autoformat control it directly
  -print-format <format-file> gives the most control
Use a custom print format
  -print-format <format-file>
    control attributes, headings, format, constraint
    like -autoformat on steroids
  Works with condor_status, condor_q, and condor_history
  Config to make it your default output
  An "experimental" feature, but stable
    htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ExperimentalFeatures
Custom print format syntax
    SELECT [LABEL [SEPARATOR <string>]] \
           [FIELDPREFIX <string>] \
           [FIELDSUFFIX <string>] \
           [RECORDPREFIX <string>] \
           [RECORDSUFFIX <string>]
    <expr> [AS <label>] [PRINTF <fmt> \
           | PRINTAS <function> \
           | WIDTH [AUTO | <int>]] \
           [LEFT | RIGHT] [NOPREFIX] [NOSUFFIX]
    .. repeat, as needed
    [GROUP BY <sort-expr> [ASCENDING | DECENDING]]
Custom print format
  xcode.cpf
    SELECT
      ClusterId AS " ID" PRINTAS JOB_ID
      JobStatus AS ST PRINTAS JOB_STATUS
      (time()-EnteredCurrentStatus)/60.0 AS MIN PRINTF %7.2f
      JobBatchName AS BATCH
      SourceFile AS SOURCE
      RemoteHost ?: "_" AS SLOT
    # ignore jobs without the custom SourceFile attribute
    WHERE SourceFile isnt undefined
    SUMMARY STANDARD
  Output
    ID     ST  MIN   BATCH    SOURCE              SLOT
    104.0  R   5.02  Firefly  S1E2 The Train Job  slot1@crane
Make it your default output
  In your personal config
    ~/.condor/user_config
    %USERPROFILE%\.condor\user_config
  Save the xcode.cpf file and add this knob
    # uncomment one of these depending on Linux/Windows
    # PERSONAL = $ENV(HOME)/.condor
    # PERSONAL = $ENV(USERPROFILE)\.condor
    Q_DEFAULT_PRINT_FORMAT_FILE=$(PERSONAL)/xcode.cpf
And now a few random tips...
Put Queue on command line
    condor_submit xcod.sub -q FILE matching *.wmv
  (xcod.sub must NOT have a queue line)
    pick.sh | condor_submit x.sub -q FILE from -
Test using a subset of jobs
    Queue FILE from (
      S1E1 Serenity.wmv
      # S1E2 The Train Job.wmv
      # S1E3 Bushwhacked.wmv
      …
    )
  Or use a Python-style slice to define a subset
    Queue FILE matching files [:1] *.wmv
Even easier if you prepare
  Put $(slice) in your submit file
    Queue FILE matching files $(slice) *.wmv
  Then control the slice from the command line
    condor_submit 'slice=[:1]' firefly.sub
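Because the slice is described as Python-style, its effect on a file list can be previewed with ordinary Python slicing. This is a small sketch of that idea only; it does not reproduce how condor_submit parses the slice, and apply_slice is a hypothetical helper name.

```python
def apply_slice(spec, items):
    """Apply a '[start:stop:step]' string to a list using Python slice rules."""
    inner = spec.strip()
    if not (inner.startswith("[") and inner.endswith("]")) or ":" not in inner:
        raise ValueError("expected a slice such as [:1], got %r" % spec)
    fields = inner[1:-1].split(":")
    if len(fields) > 3:
        raise ValueError("too many ':' in slice %r" % spec)
    # an empty field means "no bound", which Python spells as None
    bounds = [int(f) if f.strip() else None for f in fields]
    bounds += [None] * (3 - len(bounds))
    return items[slice(*bounds)]

files = ["S1E1 Serenity.wmv", "S1E2 The Train Job.wmv",
         "S1E3 Bushwhacked.wmv", "S1E4 Shindig.wmv"]

first_only = apply_slice("[:1]", files)   # what slice=[:1] would select
print(first_only)
```

So a test submit with slice=[:1] queues one job for the first matching file, and widening the slice later needs no change to the submit file.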
Variable Tricks
    # transfer files starting with file001
    sequence = $(ProcId)+1
    transfer_input_files = file$INT(sequence,%03d)
    # Use the submit dir and cluster as the batch name
    batch_name = $Ffdb(SUBMIT_FILE)_$(ClusterId)
    # use the same random value for all jobs in this submit
    include command : /bin/echo rval=$RANDOM_INTEGER(1,100)
    Arguments = $(rval)
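The first trick relies on a printf-style format: ProcId starts at 0, so adding 1 and formatting with %03d yields file001, file002, and so on. The same arithmetic in Python, as a quick sanity check of the names each job would ask for (input_name is a hypothetical helper, not an HTCondor function):

```python
def input_name(proc_id):
    """Mirror file$INT(sequence,%03d) with sequence = ProcId + 1."""
    return "file%03d" % (proc_id + 1)

# names requested by the first three jobs of a cluster
names = [input_name(proc_id) for proc_id in range(3)]
print(names)
```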
Submit from python
    # HTCondor 8.7.9 or later for this code
    import htcondor
    schedd = htcondor.Schedd()
    sub = htcondor.Submit(open('xcode.sub').read())
    # override some things
    sub['MY.SourceDir'] = '"/media/Firefly/"'
    sub['FILE'] = '$(Item)'
    files = ['S1E1 Serenity.wmv',
             'S1E2 The Train Job.wmv',
             'S1E3 Bushwhacked.wmv',
             'S1E4 Shindig.wmv']
    # submit using the files list
    with schedd.transaction() as txn:
        cluster = sub.queue_with_iter(txn, 1, iter(files))
    # cluster object has ClusterId, range of ProcId's
    # and "common" job ClassAd
Questions?