Running a job on the grid is easier than you think!
S. Bagnasco, INFN Torino
Torino, Jan 15-16, 2009

I lied

Steps:
- Pretend everything works
- Learn the AliEn JDL syntax and semantics
    - They're subtly different from the LCG ones...
- Write a master JDL file for a simple production job
    - We'll skip data management and collections for now
- Submit the job!
- Learn some aliensh commands to check its status
    - ps, top, spy, masterjob, registerOutput, ...
- Try using MonALISA to inspect the Task Queue
- Tomorrow, debug why it did not work...

Your first job

Run a "production" like Latchezar does!
- Submit a masterjob that gets split and run wherever the system deems appropriate
- Will not use AliRoot, for simplicity...
- Each subjob saves its output locally, wherever it ran

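Your first job

    #!/usr/bin/env bash
    # Quoted so the shell does not expand the asterisks as a glob
    echo "******* Run: $2 Event: $1"

    # Simulation
    aliroot -b <<EOF
    AliSimulation sim("myConfig.C");
    sim.SetDefaultStorage("alien://Folder=/alice/simulation/2008/v4-15-Release/Ideal/");
    sim.Run();
    .q
    EOF

    ls -l

    # Reconstruction, run in its own aliroot session like the simulation step
    aliroot -b <<EOF
    AliReconstruction rec;
    rec.SetDefaultStorage("alien://Folder=/alice/simulation/2008/v4-15-Release/Ideal/");
    rec.Run();
    .q
    EOF
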
Your first jdl

    # This is an AliEn JDL!
    # runHelloWorld.jdl
    # Requirements = other.CE=="Alice::Torino::LCG";
    Executable = "/alice/cern.ch/user/s/sbagnasc/bin/helloworld.csh";
    TTL = 720;
    OutputDir = "/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/job.#alien_counter#";
    OutputFile = { "result.txt", "stdout", "stderr" };
    Split = "production:1-100";
    SplitArguments = "#alien_counter#";

JDL syntax

For single values:
    <tag-name> = "<value>";
    <tag-name> = {"<value>"};

For a list:
    <tag-name> = { "<val1>", "<val2>", "<val3>" };

Reference: http://project-arda-dev.web.cern.ch/project-arda/dev/alice/apiservice//guide/ALICE%20Analysis%20User%20Guide%20V1.0.htm

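The two forms can be illustrated with a small Python sketch that renders a tag either as a single value or as a list. The helper name jdl_tag is hypothetical and not part of AliEn; writing integers without quotes is an assumption based on the "TTL = 720;" line in the example JDL.

```python
def jdl_tag(name, value):
    """Render one JDL line: a single value, or a brace-delimited list."""
    def quote(v):
        # Assumption: integers are written bare (as in TTL = 720),
        # everything else is wrapped in double quotes.
        return str(v) if isinstance(v, int) else f'"{v}"'

    if isinstance(value, (list, tuple)):
        body = ", ".join(quote(v) for v in value)
        return f"{name} = {{ {body} }};"
    return f"{name} = {quote(value)};"


print(jdl_tag("Split", "production:1-10"))
print(jdl_tag("OutputFile", ["result.txt", "stdout", "stderr"]))
print(jdl_tag("TTL", 720))
```

The printed lines have the same shape as the hand-written JDL examples in these slides.
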
Basic jdl 'tags'

Executable
    This is the only mandatory field in the JDL. It gives the name of the LFN that will be executed. The file has to be in /bin, /<VO>/bin or /<HOME>/bin in the AliEn catalogue.
Arguments
    These are passed to the executable.
Packages
    This constrains the job to run at a site where the package is installed. You can also require a specific version of a package. For example, Packages = "AliRoot"; requires the current version of AliRoot, while Packages = "AliRoot::3.07.02"; requires that specific version.
TTL
    Here you specify the maximum run time for your job. The system takes care to run your job on a worker node that provides the requested run time.

Input tags

InputFile
    These files will be transported to the node where the job is executed. LFNs have to be specified like "LF:/alice/cern.ch/mymacro.C".
InputData
    InputData is very similar to InputFile, but the physical location of the InputData automatically adds a requirement on the place of execution to the JDL: the job should run at a site close to the files specified here.
    - You can specify patterns, like "LF:/alice/sim/2003-02/V3.09.06/00143/*/tpc.tracks.root"; all the LFNs that match the pattern will be included.
    - If you don't want the files to be staged into the sandbox as is done for input files (typical for analysis), you can specify "LF:/alice/....../file.root,nodownload".
    - Use this tag only for a few files (<100); otherwise the JDL becomes very large and slows down job processing.

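As a rough illustration of how an InputData entry could be interpreted, the Python sketch below splits off the optional ",nodownload" flag and expands a pattern against a list of LFNs. The catalogue contents are made up, the function names are hypothetical, and modelling the pattern with shell-style wildcards (fnmatch) is an assumption about how AliEn matches it.

```python
from fnmatch import fnmatchcase


def parse_input_entry(entry):
    """Split 'LF:<lfn-or-pattern>[,nodownload]' into (pattern, download)."""
    if not entry.startswith("LF:"):
        raise ValueError("entry must start with 'LF:'")
    pattern, _, option = entry[3:].partition(",")
    return pattern, option != "nodownload"


def expand(entry, catalogue):
    """Return (lfn, download) for every catalogue LFN matching the entry."""
    pattern, download = parse_input_entry(entry)
    return [(lfn, download) for lfn in catalogue if fnmatchcase(lfn, pattern)]


# Hypothetical catalogue content, for illustration only.
catalogue = [
    "/alice/sim/2003-02/V3.09.06/00143/00001/tpc.tracks.root",
    "/alice/sim/2003-02/V3.09.06/00143/00002/tpc.tracks.root",
    "/alice/sim/2003-02/V3.09.06/00143/00002/galice.root",
]
entry = "LF:/alice/sim/2003-02/V3.09.06/00143/*/tpc.tracks.root,nodownload"
for lfn, download in expand(entry, catalogue):
    print(lfn, "download" if download else "nodownload")
```

Only the two tpc.tracks.root files match, and both are flagged as not to be staged into the sandbox.
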
More input tags

InputDataCollection
    An input data collection is used to specify long lists of input data files and allows corresponding files to be grouped together. Use the find command to create an input data collection. You should use this mechanism if you have many input files:
    - the submission is much faster
    - it is better for the job optimizer services
    - you don't need to specify the InputData field
InputDataList
    This is the name of the file in which the Job Agent writes the InputData list. The format of this file is specified in InputDataListFormat.
InputDataListFormat
    This is the format of the InputData list. Possible formats are:
    - xml-single: every file is equivalent to one event
    - xml-group: a new event starts every time the first base filename appears again

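The xml-group rule ("a new event starts every time the first base filename appears again") can be sketched in Python as follows. This only illustrates the grouping rule as stated on the slide; the file names are invented and the real XML output of the Job Agent is not reproduced here.

```python
import posixpath


def group_events(lfns):
    """Group LFNs into events: a new event starts whenever the base
    filename of the very first LFN shows up again."""
    events = []
    first_base = None
    for lfn in lfns:
        base = posixpath.basename(lfn)
        if first_base is None:
            first_base = base
        if base == first_base:
            events.append([])  # start a new event
        events[-1].append(lfn)
    return events


# Hypothetical input list, for illustration only.
lfns = [
    "/data/001/galice.root",
    "/data/001/Kinematics.root",
    "/data/002/galice.root",
    "/data/002/Kinematics.root",
]
for number, event in enumerate(group_events(lfns), start=1):
    print(number, event)
```

With xml-single, by contrast, each of the four files would count as its own event.
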
Output tags

OutputFile
    The files that will be registered in the catalogue when the job finishes. You can specify the storage element by adding "@<SE-Name>" to the file name.
OutputArchive
    Here you can define which output files are archived in ZIP archives. By default AliEn puts all OutputFiles together in ONE archive.

        OutputArchive = {
            "root_archive.zip:*.root@Alice::CERN::Castor2",
            "log_archive:*.log,stdout,stderr@Alice::CERN::se01"
        };

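To make the structure of an OutputArchive entry explicit, here is a small Python sketch that takes one apart into archive name, file patterns and storage element. The function name is hypothetical, and the "<archive>:<pattern>[,<pattern>...]@<SE>" layout is read off the example above rather than taken from a formal grammar.

```python
def parse_output_archive(entry):
    """Split '<archive>:<pattern>[,<pattern>...][@<SE>]' into its parts."""
    # The SE name itself contains '::', so cut at '@' first,
    # then at the first ':' of what remains.
    spec, _, storage_element = entry.partition("@")
    archive, _, patterns = spec.partition(":")
    return {
        "archive": archive,
        "patterns": patterns.split(","),
        "se": storage_element or None,
    }


for entry in (
    "root_archive.zip:*.root@Alice::CERN::Castor2",
    "log_archive:*.log,stdout,stderr@Alice::CERN::se01",
):
    print(parse_output_archive(entry))
```
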
Job splitting tags

Split
    If you want to split your job into several subjobs, you can define the method used to split it, according to the input data (collection) or to some production variables. Valid values are:
    - file: there will be one subjob per file in the InputData section
    - directory: all the files of one directory will be analyzed in the same subjob
    - se: this should be used for most analysis. The jobs are split according to the list of storage elements where the data are located. Job <1> reads all data at <SE1>, job <2> all data at <SE2>. You can however force a second-level split of jobs <1>, <2>, ... into several jobs using the two tags SplitMaxInputFileNumber and SplitMaxInputFileSize.
    - event: all the files with the same name of the last subdirectory will be analyzed in the same subjob
    - userdefined: check the field SplitDefinitions
    - production:(<#start>-<#end>): this kind of split does not require any InputData. It will submit the same JDL several times (from #start to #end). You can reference this counter in SplitArguments using "#alien_counter#".
SplitArguments
    Here you can define the arguments field for each subjob. For example, to pass each subjob the counter produced by the Split = "production:1-10" tag, you can write something like
        SplitArguments = "simrun.C --run 1 --event #alien_counter#";
    If you define more than one value, each subjob will be submitted as many times as there are items in this array, and the subjobs will have the elements of the array as arguments.

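The effect of a production split on SplitArguments can be sketched in a few lines of Python. This is a toy model of the substitution, not AliEn's implementation; the function name is hypothetical, and treating the range as inclusive at both ends is an assumption based on the "from #start to #end" wording.

```python
import re


def production_subjobs(split, split_arguments):
    """Return the argument string of every subjob of a production split."""
    match = re.fullmatch(r"production:(\d+)-(\d+)", split)
    if match is None:
        raise ValueError(f"not a production split: {split!r}")
    start, end = int(match.group(1)), int(match.group(2))
    # Assumption: both #start and #end are included.
    return [
        split_arguments.replace("#alien_counter#", str(counter))
        for counter in range(start, end + 1)
    ]


arguments = production_subjobs(
    "production:1-10", "simrun.C --run 1 --event #alien_counter#"
)
print(len(arguments))
print(arguments[0])
print(arguments[-1])
```

Each element of the returned list corresponds to one subjob of the masterjob.
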
More job splitting tags

SplitMaxInputFileNumber
    Defines the maximum number of files in each of the subjobs.
SplitMaxInputFileSize
    Similar to the previous one, but puts the limit on the size of the files. The size has to be given in bytes.
SplitDefinitions
    This is a list of JDLs. If the user defines them, AliEn will take those JDLs as the subjobs, and all of them will behave as if they were subjobs of the original job (for instance, if the original job gets killed, all of them will get killed, and once all of the subjobs finish, their output will be copied to the master job).

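A minimal Python sketch of how the two limits could cut a file list into subjobs is given below. The greedy, in-order grouping is an assumption made for illustration (the slides do not say how AliEn actually fills the groups), and the function and parameter names are hypothetical.

```python
def split_by_limits(files, max_number=None, max_size=None):
    """Group (lfn, size_in_bytes) pairs into subjobs, in order, closing a
    group when adding the next file would exceed one of the limits."""
    groups, current, current_size = [], [], 0
    for lfn, size in files:
        too_many = max_number is not None and len(current) >= max_number
        too_big = (
            max_size is not None
            and len(current) > 0
            and current_size + size > max_size
        )
        if too_many or too_big:
            groups.append(current)
            current, current_size = [], 0
        current.append(lfn)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical files with sizes in bytes.
files = [("a", 100), ("b", 200), ("c", 300), ("d", 400), ("e", 500)]
print(split_by_limits(files, max_number=2))
print(split_by_limits(files, max_size=500))
```

A single file larger than the size limit still gets a subjob of its own in this sketch, since a group is never left empty.
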
Your first jdl

    # This is an AliEn JDL!
    # runHelloWorld.jdl
    Requirements = member(other.GridPartitions,"INFNGRID");
    # Requirements = other.CE=="Alice::Torino::LCG";
    Executable = "/alice/cern.ch/user/s/sbagnasc/bin/helloworld.csh";
    TTL = 720;
    OutputDir = "/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/job.#alien_counter#";
    OutputFile = { "result.txt", "stdout", "stderr" };
    Split = "production:1-10";
    SplitArguments = "#alien_counter#";

Your second jdl

    Executable = "/alice/cern.ch/user/s/sbagnasc/bin/myProduction.sh";
    Arguments = "$1 #alien_counter#";
    Split = "production:1-100";
    Packages = {
        "VO_ALICE@AliRoot::v4-15-Rev-06",
        "VO_ALICE@GEANT3::v1-9-6",
        "VO_ALICE@ROOT::v5-21-01-alice",
        "VO_ALICE@APISCONFIG::V2.4"
    };
    # Requirements = (other.CE=="Alice::Torino::LCG");
    Validationcommand = "/alice/cern.ch/user/s/sbagnasc/bin/validation.sh";
    InputFile = { "LF:/alice/cern.ch/user/s/sbagnasc/TorinoWorkshop/myConfig.C" };
    OutputFile = { "AliESDs.root", "*.log", "stdout", "stderr" };
    OutputDir = "/alice/cern.ch/user/s/sbagnasc/TorinoWorkshop/JobOutput/$1/#alien_counter_03i#";

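The OutputDir above uses "#alien_counter_03i#" instead of the plain "#alien_counter#". The Python sketch below assumes that the suffix is a printf-style integer format (so "03i" means zero-padded to three digits); that reading is an assumption, not something stated on these slides, and expand_counter is a hypothetical helper.

```python
import re


def expand_counter(template, counter):
    """Replace #alien_counter# and #alien_counter_<fmt># in a template."""
    def replace(match):
        fmt = match.group(1)
        if fmt is None:
            return str(counter)
        # Assumption: <fmt> is a printf-style integer format such as 03i.
        return ("%" + fmt) % counter

    return re.sub(r"#alien_counter(?:_(\d*[di]))?#", replace, template)


print(expand_counter("JobOutput/run/#alien_counter_03i#", 7))
print(expand_counter("job.#alien_counter#", 7))
```

Under this assumption, zero-padded counters keep the output directories in numerical order when listed alphabetically.
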
Try it out!

- Check that you have the CSH and JDL files
- Change them to use your home dir instead of mine
- Submit runHelloWorld.jdl
- Follow its execution
- Check that the result.txt files are there with 'find'
- Submit checkFile.jdl
- Play with the commands

Job submission!
Job state machine
Your second job

    #!/bin/tcsh -vx
    # checkFile.csh
    set job=`grep 'This is job' result.txt | cut -f 5 -d ' '`
    echo This is from checkFile.csh, job $job
    echo "date: `date`"
    echo "hostname : `hostname -f`"
    # Now the same on a file
    echo "date: `date`" >> check.$job.txt
    echo "hostname : `hostname -f`" >> check.$job.txt
    echo "Helloworld for job $job was run on:" >> check.$job.txt
    grep date result.txt >> check.$job.txt

Your second JDL

    # This is an AliEn JDL!
    Executable = "/alice/cern.ch/user/s/sbagnasc/bin/checkFile.csh";
    InputData = { "LF:/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/*/result.txt" };
    OutputDir = "/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/output";
    OutputFile = { "check.*.txt", "stdout", "stderr" };
    Split = "file";

Checking the jobs

ps: report process states from the AliEn TQ

    -F {l}                "l" = long (output format)
    -f <flags/status>
    -u <userlist>
    -s <sitelist>
    -m <masterjoblist>
    -o <sortkey>
    -j <jobidlist>
    -l <query-limit>
    -X                    active jobs in extended format
    -A                    select all your owned jobs in any state
    -W                    select all YOUR jobs which are waiting for execution
    -E                    select all YOUR jobs which are in error state
    -a                    select jobs of ALL users
    jdl <jobid>           display the JDL of job <jobid>
    trace <jobid> [trace-tag[,trace-tag]]
                          display the job trace information

Trace tags:

    proc                  resource information
    state                 job state changes
    error                 error statements
    trace                 job actions (downloads etc.)
    all                   output with all previous tags
