Presentation is loading. Please wait.

Presentation is loading. Please wait.

Overview Development of an advanced submit file

Similar presentations


Presentation on theme: "Overview Development of an advanced submit file"— Presentation transcript:

1 Submitting Jobs (and how to find them) John (TJ) Knoeller Center for High Throughput Computing

2 Overview Development of an advanced submit file
Using as many techniques and tricks as possible. Custom print formats A few more random tricks (time permitting)

3 The Story I have a lot of media files that I have collected over the years. I want to convert them all to .mp4 (Sounds like a high-throughput problem…)

4 Basic submit file for conversion
Executable = ffmpeg Transfer_executable = false Should_transfer_files = YES file = S1E2 The Train Job.wmv Transfer_input_files = $(file) Args = "-i '$(file)' '$(file).mp4'" Queue I will assume that ffmpeg is installed on the execute nodes and is in the path Future examples will leave out Executable and Should_transfer_files keywords, you should assume that they are present

5 Converting a set of files
Transfer_input_files = $(file) Args = "-i '$(file)' '$(file).mp4' " Queue FILE from ( S1E1 Serenity.wmv S1E2 The Train Job.wmv S1E3 Bushwhacked.wmv S1E4 Shindig.wmv … ) The first refinement is to submit multiple jobs from a single submit. One way to do this is to imbed the list of filenames in the submit file after the queue statement. This list of filenames could also be a separate file or the output of a script. We will use the inline list for now because it makes things easier to follow, in the long run we will want to use an list file or a directory enumeration.

6 Output filename problems
Output is $(file).mp4. So output files are named S1E1 Serenity.wmv.mp4 S1E2 The Train Job.wmv.mp4 S1E3 Bushwhacked.wmv.mp4 S1E4 Shindig.wmv.mp4 The submit file as written produces output files that just have the .mp4 file extension tacked on to the original filename. this is not really what we want.

7 $F() to the rescue $Fqpdnxba() expands to parts of a filename
file = "./Video/Firefly/S1E4 Shindig.wmv" $Fp(file) -> ./Video/Firefly/ $Fqp(file) -> "./Video/Firefly" $Fqpa(file)-> './Video/Firefly' $Fd(file) -> Firefly/ $Fdb(file) -> Firefly $Fn(file) -> S1E4 Shindig $Fx(file) -> .wmv $Fnx(file) -> S1E4 Shindig.wmv The solution is to use the $F() macro to extract the file basename, and then use that to specify the output filename explicitly $F() takes a variety of options, in any order that control how it expands the various filename parts. the above is not a comprehensive list of combinations.

8 $Fn() is name without extension
Transfer_Input_Files = $(file) Args = "-i '$Fnx(file)' '$Fn(file).mp4'" Resulting files are now S1E1 Serenity.mp4 S1E2 The Train Job.mp4 S1E3 Bushwhacked.mp4 S1E4 Shindig.mp4 To solve the problem mentioned in slide. we will use $Fn() to extract the basename of the file. and then tack on the .mp4 extension. We also use $Fnx() to remove the path from the input filename since once the file has been transferred it will be in the current directory when ffmpeg is inovoked.

9 $Fq() and Arguments $Fq(file) expands to quoted "filename"
Gives "parse error" with Arguments statement For Args use '$F(file)' instead. Becomes 'filename' on LINUX Becomes "filename" on Windows In 8.6 you can use $Fqa(file) instead Note that we used '$F()' in the previous slide. This is because you cannot use " in a submit file arguments statement without getting a parse error. What condor_submit expects you to do is use '' and it will convert those to ' or " depending on the platform that the job executes on. Starting with 8.6 you can use $Fqa() when you use $F() on an arguments line.

10 "new" Args preserves spaces
FILE = The Train Job.wmv Args = "-i '$Fnx(file)' -w640 '$Fn(file).mp4' " # Tool Tip* see it before you submit it. condor_submit test.sub -dump test.ads condor_status -ads test.ads -af Arguments -i The' 'Train' 'Job.wmv -w640 The' 'Train' 'Job.mp4 # On *nix the job sees -i 'The Train Job.wmv' –w640 'The Train Job.mp4‘ # on Windows the job sees -i "The Train Job.wmv" –w640 "The Train Job.mp4" When an args or arguments line starts with ", it is parsed as *new* arguments. Where "new" means better the *really* broken way that arguments were parsed in the HTCondor dark ages. This is the only way to get HTCondor to preserve spaces on your arguments line. Unfortunately, this is ALL that is does. Notice that the actual value of arguments in the job ad. each space to be preserved is wrapped in ' '. Then when the arguments are built up on the execute node the spaces are preserved by inserting appropriate '' or "".

11 Sometimes you can't use Args
Argument quoting not portable across operating systems LINUX needs space and ' escaped Windows needs double quotes around filenames that have space or ^ What the job sees can be hard to predict Also - Transfer_input_files will not transfer a file with a comma in the name. Since the job argument attribute doesn't do general purpose escaping of special characters, and because it *does* attempt to re-write arguments for the target execute node, there are some things you just can't do within an arguments statement.

12 Alternative to Args Use a script as your executable
Use custom job attributes to pass information to the script # these both set CustomAttr in the job ad +CustomAttr = "value" MY.CustomAttr = "value" You can refer to custom attributes in $() expansion in your submit file transfer_input_files = $F(My.CustomAttr) So when you want to pass arguments to your job, you may need to use other mechanisms. You can store values in custom job attributes, and you can also refer to these custom attributes in $() expansions with a My prefix. Note that we use Use $F() here because it strips the quotes that we had to use when we set the custom attributes. You can also pass information to the job using the environment.

13 Add custom attributes to the job
Executable = xcode.pl Args = -s 640x360 Transfer_executable = true Should_transfer_files = true # +WantIOProxy = true MY.SourceDir = $Fqp(FILE) MY.SourceFile = $Fqnx(FILE) +OutFile = "$Fn(FILE).mp4" Batch_name = $Fdb(FILE) Queue FILE matching files Firefly/*.wmv

14 Use a script to query the .job.ad
#!/usr/bin/env perl # xcode.pl # Pull filenames from job ad my $src = `condor_status -ads .job.ad -af SourceFile`; my $out = `condor_status -ads .job.ad -af OutFile`; # find condor_chirp (also need +WantIOProxy in job) my $lib = `condor_config_val libexec`; chomp $src; chomp $out; chomp $lib; # fetch the input file system("$lib/condor_chirp fetch '$src' '$src'") # do the conversion system("ffmpeg -i '$out'"); you can use a script to pull values out of job attributes. The example here is perl, but these days python might be better because of the python bindings. you can use HTCondor tools in your script to query the job ad, and the local HTCondor configuration

15 Use python to query the .job.ad
#!/usr/bin/env python # xcode.py import htcondor, classad, htchirp, sys, os # load the job ad and get the source filename job = classad.parseOne(open('.job.ad').read()) src = job['SourceFile'] srcpath = job['SourceDir'] + src # fetch the input file htchirp.fetch(srcpath, src) # do the conversion and return the exit code sys.exit(os.system("ffmpeg -i {0} {2} {1}".format( src, job['OutFile'], job['Arguments'])); In some ways python is a better choice for a job wrapper script since the bindings make it easy to query the job ad and configuration and to do chirp operations.

16 Can I use a script to create my submit file?

17 Submit using python Use the htcondor.Submit class
Uses the condor_submit keywords Supports $() expansion like condor_submit Use queue_with_itemdata() optional count overrides queue arguments optional iterator of itemdata overrides queue arguments Returns a SubmitResult which contains a Classad, clusterId and range of procIds If you plan to use a script to generate your submit files, skip making the submit file entirely and submit using the HTCondor python bindings.

18 Other python submit methods
Submit.queue_with_itemdata() Use this one, ignore the others. Submit.queue() Ok. but ad_results can use a lot of memory Schedd.submit() and submitMany() Takes raw job classads Alters Requirements, Iwd Don’t use this method! There are older submit methods that you probably should not use anymore. submit and submitMany were the original methods, this is close to the raw command interface of submit to the schedd, but it is very hard to created correct job classads, especially since condor_submit doesn't make job ads until after it allocates a clusterId - something which is not possible using these interfaces. Submit.queue() is Ok to use, it operates in the same way as queue_with_itemdata(), but it has no way to supply itemdata, and for historical reasons it returns just the clusterId and not a SubmitResult. Also, in some older versions of HTCondor, passing anything other than none to the last argument of queue() can result in a memory leak.

19 Example: Submit from python
# HTCondor 8.8 or later for this code sub = htcondor.Submit(open('xcode.sub').read()) # override some things we read from xcode.sub srcdir = '/media/Firefly/' sub['MY.SourceDir'] = '"%s"' % srcdir sub['FILE'] = '$(Item)' files = os.listdir(srcdir) # submit using the files list with schedd.transaction() as txn: res = sub.queue_with_itemdata(txn, 1, iter(files)) # result object has ClusterId, a "common" job classad # and the range of ProcId's Python is also a good way to do job submission, particularly when you already have the list of files you want to operate on in a python data structure. Note that here we need to setup a translation of the temporary variable FILE that we used in the rest of the submit file, because queue_with_itemdata will use Item as the variable name when it is passed an iterator that iterates strings. You could also have passed a dictionary as the first item in the files arrary in which case the key names from that dictionary would be used rather than 'Item'.

20 See how it's going.. condor_q OWNER BATCH_NAME … DONE RUN IDLE TOTAL JOB_IDS Tj Firefly _ 2 2 _ condor_q -af:jh JobStatus SourceFile SourceDir ID JobStatus SourceFile SourceDir S1E1 Serenity Firefly/ S1E2 The Train Job Firefly/ S1E3 Bushwhacked Firefly/ S134 Shindig Firefly/ And now for a brief digression Since we have stored the source filename into the job ad, we can easily make a custom report showing source filename and job status. Unfortunately, the job status is stored in the job as and integer, so it's hard to translate. We can do an even better job by justing a custom print format.

21 And now a few random tips...

22 Put Queue on command line
condor_submit xcod.sub -q FILE matching *.wmv (xcod.sub must NOT have a queue line) pick.py | condor_submit x.sub -q FILE from - when a submit file does not have a queue statement, you can provide the queue line as a command line option to condor_submit. The first example above is correct for Windows. use -q 'FILE matching *.wmv' on Linux to avoid having *.wmv be globbed by the shell. The second example uses the output of a script (pick.sh) as the list of files, passing - as the "filename" to read from, which indiates that condor_submit should read the list of files from stdin - which is the output of the script. This example works the same on Windows as it does on Linux (assuming that Windows recognises pick.py as a valid executable)

23 Test using a subset of jobs
Queue FILE from ( S1E1 Serenity.wmv # S1E2 The Train Job.wmv # S1E3 Bushwhacked.wmv ) use a python-style slice to define a subset Queue FILE matching files [:1] *.wmv If your jobs list is inline in the submit file, you can comment out lines as a way to ignore them when submitting. This does not work when the file list is coming from an external file. A better way to do this is to use the optional python-style slice syntax to define a subse. for example the above will process only the first file that matches the pattern *.wmv

24 Even easier if you prepare
Put $(slice) in your submit file Queue FILE matching files $(slice) *.wmv Then control the slice from the command line condor_submit 'slice=[:1]' firefly.sub If you make a habit of putting $(slice) in your submit files, then you can set the value of slice on the command line when you invoke condor_submit. If you do not give slice a value when you submit it will default to nothing, which causes it to be replaced by nothing when we $() expand, and thus all files that match *.wmv will be processed.

25 Variable Tricks # transfer files starting with file001 sequence = $(ProcId)+1 transfer_input_files = $INT(sequence,file%03d) # Use the submit dir and cluster as the batch name batch_name = $Ffdb(SUBMIT_FILE)_$(ClusterId) # use the same random value for all jobs in this submit include command : /bin/echo rval=$RANDOM_INTEGER(1,100) Arguments = $(rval) the $INT() macro will lookup its first argument, then evaluate it and print it as an integer using the format specified in the second argument. So If ProcId = 0, the $INT() expansion would be file001 When condor_submit reads from a submit file, it stores the name into a magic variable SUBMIT_FILE, that you can use for $() expansion, and you can use $Ff() to force it to be turned into a full path, so $Ffdb() expands to the parent directory of the submit file. When a submit file contains an include command statement. $() expansion must happened on that line before the command is run, so you can use that fact to cause $RANDOM_INTEGER() to be run only once for a single submit (otherwise it would expand to a different random value for each job in that submission.)

26 Questions?


Download ppt "Overview Development of an advanced submit file"

Similar presentations


Ads by Google