Overview Development of an advanced submit file

Presentation transcript:

Submitting Jobs (and how to find them)
John (TJ) Knoeller
Center for High Throughput Computing

Overview
Development of an advanced submit file, using as many techniques and tricks as possible
Custom print formats
A few more random tricks (time permitting)

The Story
I have a lot of media files that I have collected over the years.
I want to convert them all to .mp4.
(Sounds like a high-throughput problem…)

Basic submit file for conversion
    Executable = ffmpeg
    Transfer_executable = false
    Should_transfer_files = YES
    file = S1E2 The Train Job.wmv
    Transfer_input_files = $(file)
    Args = "-i '$(file)' '$(file).mp4'"
    Queue
From now on, assume that Executable and Should_transfer_files are always present in the submit file.

Converting a set of files
    Transfer_input_files = $(file)
    Args = "-i '$(file)' '$(file).mp4' "
    Queue FILE from (
      S1E1 Serenity.wmv
      S1E2 The Train Job.wmv
      S1E3 Bushwhacked.wmv
      S1E4 Shindig.wmv
      …
    )
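The item list inside a Queue ... from ( ) block can be generated instead of typed by hand. A minimal sketch, assuming a hypothetical helper named queue_from_block (not part of HTCondor); it only builds the text of the statement and does not submit anything:

```python
def queue_from_block(var, items):
    """Return a 'Queue <var> from ( ... )' statement, one item per line."""
    lines = ["Queue {0} from (".format(var)]
    lines.extend("  " + item for item in items)
    lines.append(")")
    return "\n".join(lines)


# Two of the file names from the slide, used as sample input.
files = ["S1E1 Serenity.wmv", "S1E2 The Train Job.wmv"]
statement = queue_from_block("FILE", files)
print(statement)
```

The resulting text could be appended to a submit file that does not already contain a Queue line.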

Output filename problems
Output is $(file).mp4, so the output files are named
    S1E1 Serenity.wmv.mp4
    S1E2 The Train Job.wmv.mp4
    S1E3 Bushwhacked.wmv.mp4
    S1E4 Shindig.wmv.mp4

$F() to the rescue
$Fqpdnxba() expands to parts of a filename
    file = "./Video/Firefly/S1E4 Shindig.wmv"
    $Fp(file)   -> ./Video/Firefly/
    $Fqp(file)  -> "./Video/Firefly"
    $Fqpa(file) -> './Video/Firefly'
    $Fd(file)   -> Firefly/
    $Fdb(file)  -> Firefly
    $Fn(file)   -> S1E4 Shindig
    $Fx(file)   -> .wmv
    $Fnx(file)  -> S1E4 Shindig.wmv
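To get a feel for what the p, d, n and x letters select, the same split can be approximated in Python. This is only a sketch using posixpath; the function name f_parts is hypothetical, and this is not how HTCondor implements $F():

```python
import posixpath


def f_parts(path):
    """Split a path into the pieces the p, d, n and x letters select."""
    head, tail = posixpath.split(path)
    name, ext = posixpath.splitext(tail)
    return {
        "p": head + "/",                      # directory path, trailing slash
        "d": posixpath.basename(head) + "/",  # parent directory only
        "n": name,                            # base name without extension
        "x": ext,                             # extension, including the dot
    }


parts = f_parts("./Video/Firefly/S1E4 Shindig.wmv")
print(parts["n"] + ".mp4")
```

Combining the "n" piece with a new extension is the same idea as writing $Fn(file).mp4 in the submit file.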

$Fn() is name without extension
    Transfer_Input_Files = $(file)
    Args = "-i '$Fnx(file)' '$Fn(file).mp4'"
Resulting files are now
    S1E1 Serenity.mp4
    S1E2 The Train Job.mp4
    S1E3 Bushwhacked.mp4
    S1E4 Shindig.mp4

$Fq() and Arguments
$Fq(file) expands to a quoted "filename"
This gives a "parse error" with the Arguments statement
For Args use '$F(file)' instead
    Becomes 'filename' on Linux
    Becomes "filename" on Windows
In 8.6 you can use $Fqa(file) instead

"new" Args preserves spaces
    FILE = The Train Job.wmv
    Args = "-i '$Fnx(file)' -w640 '$Fn(file).mp4' "
    # Tool Tip: see it before you submit it.
    condor_submit test.sub -dump test.ads
    condor_status -ads test.ads -af Arguments
    -i The' 'Train' 'Job.wmv -w640 The' 'Train' 'Job.mp4
    # On *nix the job sees
    -i 'The Train Job.wmv' -w640 'The Train Job.mp4'
    # On Windows the job sees
    -i "The Train Job.wmv" -w640 "The Train Job.mp4"

Sometimes you can't use Args
Argument quoting is not portable across operating systems
    Linux needs space and ' escaped
    Windows needs double quotes around filenames that contain a space or ^
What the job sees can be hard to predict
Transfer_input_files will not transfer a file with a comma in the name.
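One way to build intuition for what a job receives on *nix is to tokenize a quoted argument string with POSIX shell rules. The sketch below uses Python's shlex purely as a stand-in; it is an assumption that this mirrors the single-quote grouping closely enough for illustration, and it is not HTCondor's own argument parser:

```python
import shlex

# Assumption: POSIX shell quoting is used here as a stand-in for how the
# single-quoted groups are kept together; it is not HTCondor's own parser.
args = "-i 'The Train Job.wmv' -w640 'The Train Job.mp4'"
argv = shlex.split(args)
print(argv)
```

Each quoted filename stays a single argument even though it contains spaces, which is the behavior the job depends on.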

Alternative to Args
Use a script as your executable
Use custom job attributes to pass information to the script
    # these both set CustomAttr in the job ad
    +CustomAttr = "value"
    MY.CustomAttr = "value"
You can refer to custom attributes in $() expansion in your submit file
    transfer_input_files = $F(My.CustomAttr)
Use $F() here because it strips the quotes

Add custom attributes to the job
    Executable = xcode.pl
    Args = -s 640x360
    Transfer_executable = true
    Should_transfer_files = true
    # +WantIOProxy = true
    MY.SourceDir = $Fqp(FILE)
    MY.SourceFile = $Fqnx(FILE)
    +OutFile = "$Fn(FILE).mp4"
    Batch_name = $Fdb(FILE)
    Queue FILE matching files Firefly/*.wmv

Use a script to query the .job.ad
    #!/usr/bin/env perl
    # xcode.pl
    # Pull filenames from job ad
    my $src = `condor_status -ads .job.ad -af SourceFile`;
    my $out = `condor_status -ads .job.ad -af OutFile`;
    # find condor_chirp (also need +WantIOProxy in job)
    my $lib = `condor_config_val libexec`;
    chomp $src; chomp $out; chomp $lib;
    # fetch the input file
    system("$lib/condor_chirp fetch '$src' '$src'");
    # do the conversion
    system("ffmpeg -i '$src' @ARGV '$out'");

Use Python to query the .job.ad
    #!/usr/bin/env python
    # xcode.py
    import os
    import classad
    # NOTE: the slide does not show where 'chirp' comes from; it is assumed
    # to be a helper that wraps condor_chirp
    # load the job ad and get the source filename
    job = classad.parseOne(open('.job.ad').read())
    src = job['SourceFile']
    srcpath = job['SourceDir'] + src
    # fetch the input file
    chirp.fetch(srcpath, src)
    # do the conversion and return the exit code
    # (filenames are quoted because they contain spaces)
    exit(os.system("ffmpeg -i '{0}' {2} '{1}'".format(
        src, job['OutFile'], job['Arguments'])))
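If the classad module is not available where the job runs, the handful of string attributes needed here can be read with a few lines of plain Python. This is a deliberately simplified sketch: the function name parse_simple_ad is hypothetical, it assumes one "Name = value" pair per line, and it does not handle escaped quotes or evaluate expressions the way a real ClassAd parser does. The sample ad text is illustrative, not taken from a real job:

```python
def parse_simple_ad(text):
    """Parse 'Name = value' lines; only quoted strings are unquoted."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        ad[name.strip()] = value
    return ad


# Illustrative ad text using the custom attributes from the submit file.
sample = (
    'SourceDir = "Firefly/"\n'
    'SourceFile = "S1E4 Shindig.wmv"\n'
    'ClusterId = 104\n'
)
ad = parse_simple_ad(sample)
srcpath = ad["SourceDir"] + ad["SourceFile"]
print(srcpath)
```

Values that are not quoted strings (such as ClusterId) are left as raw text, so this is only suitable for pulling out simple string attributes.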

See how it's going..
    condor_q
    OWNER BATCH_NAME … DONE RUN IDLE TOTAL JOB_IDS
    Tj Firefly _ 2 2 _ 104.0-4
    condor_q -af:jh JobStatus SourceFile SourceDir
    ID     JobStatus  SourceFile          SourceDir
    104.0  2          S1E1 Serenity       Firefly/
    104.1  2          S1E2 The Train Job  Firefly/
    104.2  1          S1E3 Bushwhacked    Firefly/
    104.4  1          S1E4 Shindig        Firefly/

The ClassAd "Pretty Printer"
A C++ class within HTCondor that can print information from a ClassAd
Used by condor_status / q / history for most of the standard output
-format and -autoformat control it directly
-print-format <format-file> gives the most control

Use a custom print format
-print-format <format-file>
    Controls attributes, headings, format, constraint
    Like -autoformat on steroids
Works with condor_status, condor_q, and condor_history
Config to make it your default output
An "experimental" feature, but stable
    htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ExperimentalFeatures

Custom print format syntax
    SELECT [LABEL [SEPARATOR <string>]] \
           [FIELDPREFIX <string>] \
           [FIELDSUFFIX <string>] \
           [RECORDPREFIX <string>] \
           [RECORDSUFFIX <string>]
    <expr> [AS <label>] [PRINTF <fmt> \
           | PRINTAS <function> \
           | WIDTH [AUTO | <int>]] \
           [LEFT | RIGHT] [NOPREFIX] [NOSUFFIX]
    .. repeat, as needed
    [GROUP BY <sort-expr> [ASCENDING | DECENDING]]

Custom print format xcode.cpf
    SELECT
      ClusterId AS " ID" PRINTAS JOB_ID
      JobStatus AS ST PRINTAS JOB_STATUS
      (time()-EnteredCurrentStatus)/60.0 AS MIN PRINTF %7.2f
      JobBatchName AS BATCH
      SourceFile AS SOURCE
      RemoteHost ?: "_" AS SLOT
    # ignore jobs without the custom SourceFile attribute
    WHERE SourceFile isnt undefined
    SUMMARY STANDARD
Sample output:
    ID ST MIN BATCH SOURCE SLOT
    104.0 R 5.02 Firefly S1E2 The Train Job slot1@crane
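The MIN column is plain arithmetic: seconds since the job entered its current status, divided by 60, then printed with %7.2f. A small worked example in Python, using hypothetical timestamps chosen to be 301 seconds apart:

```python
def minutes_in_state(now, entered_current_status):
    """Elapsed minutes since the job entered its current status."""
    return (now - entered_current_status) / 60.0


# Hypothetical timestamps, 301 seconds apart.
cell = "%7.2f" % minutes_in_state(1000301, 1000000)
print(repr(cell))
```

The %7.2f format pads the value to seven characters, which is what keeps the column aligned in the printed table.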

Make it your default output
In your personal config
    ~/.condor/user_config
    %USERPROFILE%\.condor\user_config
Save the xcode.cpf file and add this knob
    # uncomment one of these depending on Linux/Windows
    # PERSONAL = $ENV(HOME)/.condor
    # PERSONAL = $ENV(USERPROFILE)\.condor
    Q_DEFAULT_PRINT_FORMAT_FILE=$(PERSONAL)/xcode.cpf

And now a few random tips...

Put Queue on command line
    condor_submit xcod.sub -q FILE matching *.wmv
(xcod.sub must NOT have a queue line)
    pick.sh | condor_submit x.sub -q FILE from -

Test using a subset of jobs
    Queue FILE from (
      S1E1 Serenity.wmv
      # S1E2 The Train Job.wmv
      # S1E3 Bushwhacked.wmv
      …
    )
Use a Python-style slice to define a subset
    Queue FILE matching files [:1] *.wmv

Even easier if you prepare
Put $(slice) in your submit file
    Queue FILE matching files $(slice) *.wmv
Then control the slice from the command line
    condor_submit 'slice=[:1]' firefly.sub
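Since the slice follows Python conventions, its effect on a list of matched files can be previewed in Python before submitting. A minimal sketch, assuming a hypothetical helper apply_slice that accepts a bracketed string containing at least one colon; it mimics the selection only and does not reproduce HTCondor's file matching:

```python
def apply_slice(items, spec):
    """Apply a '[start:stop:step]' string to a list of items."""
    fields = spec.strip()[1:-1].split(":")
    bounds = [int(f) if f.strip() else None for f in fields]
    return items[slice(*bounds)]


files = ["S1E1 Serenity.wmv", "S1E2 The Train Job.wmv", "S1E3 Bushwhacked.wmv"]
print(apply_slice(files, "[:1]"))
```

With [:1] only the first item is selected, which is why it is handy for a one-job test run.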

Variable Tricks
    # transfer files starting with file001
    sequence = $(ProcId)+1
    transfer_input_files = file$INT(sequence,%03d)
    # Use the submit dir and cluster as the batch name
    batch_name = $Ffdb(SUBMIT_FILE)_$(ClusterId)
    # use the same random value for all jobs in this submit
    include command : /bin/echo rval=$RANDOM_INTEGER(1,100)
    Arguments = $(rval)
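The first trick is ordinary printf-style formatting applied to ProcId plus one. As a worked example, the same arithmetic in Python (the function name input_file_for is hypothetical):

```python
def input_file_for(proc_id):
    """Name of the input file for a given ProcId, counting from file001."""
    sequence = proc_id + 1
    return "file%03d" % sequence


print([input_file_for(p) for p in range(3)])
```

ProcId starts at 0, so adding one makes the first job pick up file001 rather than file000.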

Submit from Python
    # HTCondor 8.7.9 or later for this code
    import htcondor
    schedd = htcondor.Schedd()
    sub = htcondor.Submit(open('xcode.sub').read())
    # override some things
    sub['MY.SourceDir'] = '"/media/Firefly/"'
    sub['FILE'] = '$(Item)'
    files = ['S1E1 Serenity.wmv',
             'S1E2 The Train Job.wmv',
             'S1E3 Bushwhacked.wmv',
             'S1E4 Shindig.wmv']
    # submit using the files list
    # (method name as given on the slide; released bindings may expose it
    # as queue_with_itemdata, so check your version)
    with schedd.transaction() as txn:
        cluster = sub.queue_with_iter(txn, 1, iter(files))
    # cluster object has ClusterId, range of ProcId's
    # and "common" job ClassAd

Questions?