Running a job on the grid is easier than you think!
S. Bagnasco, INFN Torino
Torino, Jan 15-16, 2009

I lied. Steps:
- Pretend everything works
- Learn the AliEn JDL syntax and semantics (they're subtly different from the LCG ones…)
- Write a master JDL file for a simple production job (we'll skip data management and collections for now)
- Submit the job!
- Learn some aliensh commands to check its status: ps, top, spy, masterjob, registerOutput, …
- Try using MonALISA to inspect the Task Queue
- Tomorrow, debug why it did not work…

Your first job

Run a "production" like Latchezar does!
- Submit a masterjob that gets split and run wherever the system deems appropriate
- Will not use AliRoot, for simplicity…
- Each subjob saves its output locally, to wherever it ran

Your first job

#!/usr/bin/env bash
echo "******* Run: $2 Event: $1"
aliroot -b <<EOF
AliSimulation sim("myConfig.C");
sim.SetDefaultStorage("alien://Folder=/alice/simulation/2008/v4-15-Release/Ideal/");
sim.Run();
AliReconstruction rec;
rec.SetDefaultStorage("alien://Folder=/alice/simulation/2008/v4-15-Release/Ideal/");
rec.Run();
.q
EOF
ls -l
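
If aliroot is not installed locally, you can still dry-run the wrapper's plumbing. A sketch, with the stub function being an assumption for illustration (it is not AliEn or AliRoot tooling):

```shell
# Stand-in for aliroot, so the wrapper logic can be exercised without an
# AliRoot installation: the function just prints a marker and echoes back
# the heredoc that the real aliroot would interpret as ROOT commands.
aliroot() { echo "aliroot would execute:"; cat; }

echo "******* Run: 2 Event: 1"
aliroot -b <<EOF
AliSimulation sim("myConfig.C");
sim.Run();
EOF
```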

Your first JDL

# This is an AliEn JDL!
# runHelloWorld.jdl
# Requirements = other.CE=="Alice::Torino::LCG";
Executable = "/alice/cern.ch/user/s/sbagnasc/bin/helloworld.csh";
TTL = 720;
OutputDir = "/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/job.#alien_counter#";
OutputFile = { "result.txt", "stdout", "stderr" };
Split = "production:1-100";
SplitArguments = "#alien_counter#";
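
The slides never show helloworld.csh itself. As an assumption, here is a minimal bash rendition of what such a payload could contain: it writes the result.txt that the JDL registers, with a "This is job" line and a date line of the shape that checkFile.csh later greps for:

```shell
#!/usr/bin/env bash
# Hypothetical hello-world payload (the real helloworld.csh is not shown in
# the slides). It writes result.txt so that the job number can later be
# extracted as the 5th space-separated field of the "This is job" line.
job=${1:-0}   # subjob counter, passed in via SplitArguments
{
  echo "Hello! This is job $job"
  echo "date: $(date)"
  echo "hostname: $(hostname)"
} > result.txt
cat result.txt
```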

JDL syntax

For single values:
    <tag-name> = "<value>";
or
    <tag-name> = { "<value>" };

For a list:
    <tag-name> = { "<val1>", "<val2>", "<val3>" };

Reference: http://project-arda-dev.web.cern.ch/project-arda/dev/alice/apiservice//guide/ALICE%20Analysis%20User%20Guide%20V1.0.htm

Basic JDL tags

Executable: the only mandatory field in the JDL. It gives the name of the LFN that will be executed, which has to be a file in /bin, /<VO>/bin or /<HOME>/bin in the AliEn catalogue.

Arguments: these will be passed to the executable.

Packages: constrains the execution of the job to a site where the package is installed. You can also require a specific version of a package: Packages = "AliRoot"; requires the current version of AliRoot, while Packages = "AliRoot::3.07.02"; requires that exact version.

TTL: the maximum run time of your job. The system takes care to run your job on a worker node that provides the requested run time.
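
A minimal sketch combining the tags above, reusing the executable path and TTL from the runHelloWorld.jdl slide (the Arguments value is illustrative):

```
Executable = "/alice/cern.ch/user/s/sbagnasc/bin/helloworld.csh";
Arguments = "1";
Packages = "AliRoot::3.07.02";
TTL = 720;
```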

Input tags

InputFile: these files will be transported to the node where the job will be executed. LFNs have to be specified like "LF:/alice/cern.ch/mymacro.C".

InputData: very similar to InputFile, but the physical location of the input data automatically adds a requirement on the place of execution to the JDL: the generated requirement is to execute the job at a site close to the files specified here. You can specify patterns, like "LF:/alice/sim/2003-02/V3.09.06/00143/*/tpc.tracks.root", and then all the LFNs that satisfy the pattern will be included. If you don't want the files to be staged into the sandbox (typical for analysis), as is done for input files, you can specify "LF:/alice/....../file.root,nodownload". Use this tag only for few files (<100); otherwise the JDL becomes very large and slows down the job processing.

More input tags

InputDataCollection: an input data collection is used to specify long lists of input data files, and allows grouping of corresponding files. Use the find command to create an input data collection. You should use this mechanism if you have many input files:
- submission is much faster
- it is better for the job optimizer services
- you don't need to specify the InputData field

InputDataList: the filename into which the Job Agent writes the InputData list. The format of this file is specified by InputDataListFormat.

InputDataListFormat: the list format of the InputData list. Possible formats are:
- xml-single: every file is equivalent to one event
- xml-group: a new event starts every time the first base filename appears again

Output tags

OutputFile: the files that will be registered in the catalogue when the job finishes. You can specify the storage element by adding "@<SE-Name>" to the file name.

OutputArchive: here you can define which output files are archived in ZIP archives. By default, AliEn puts all output files together in ONE archive. For example:

OutputArchive = { "root_archive.zip:*.root@Alice::CERN::Castor2", "log_archive:*.log,stdout,stderr@Alice::CERN::se01" };

Job splitting tags

Split: if you want to split your job into several subjobs, you can define the method to split the job according to the input data (collection) or some production variables. Valid values are:
- file: there will be one subjob per file in the InputData section.
- directory: all the files of one directory will be analyzed in the same subjob.
- se: this should be used for most analysis. The jobs are split according to the list of storage elements where the data are located: job <1> reads all data at <SE1>, job <2> all data at <SE2>. You can however force a second-level split of jobs <1>, <2>, … into several jobs using the two tags SplitMaxInputFileNumber and SplitMaxInputFileSize.
- event: all the files with the same name of the last subdirectory will be analyzed in the same subjob.
- userdefined: check the field SplitDefinitions.
- production:<#start>-<#end>: this kind of split does not require any InputData. It will submit the same JDL several times (from #start to #end). You can reference this counter in SplitArguments using "#alien_counter#".

SplitArguments: here you can define the arguments field for each subjob. If you want, e.g., to pass in the subjob counter produced by the Split = "production:1-10" tag, you can write something like
SplitArguments = "simrun.C --run 1 --event #alien_counter#";
If you define more than one value, each subjob will be submitted as many times as there are items in this array, with the array elements as arguments.
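
Not AliEn code, just a sketch of the idea: AliEn itself substitutes #alien_counter# server-side, but conceptually a production split expands the SplitArguments template once per subjob, like this loop:

```shell
# Toy expansion of Split = "production:1-10" with
# SplitArguments = "simrun.C --run 1 --event #alien_counter#".
# AliEn performs this substitution itself; the loop only illustrates it.
template='simrun.C --run 1 --event #alien_counter#'
for ((i = 1; i <= 10; i++)); do
  echo "subjob $i: ${template//"#alien_counter#"/$i}"
done
```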

More job splitting tags

SplitMaxInputFileNumber: defines the maximum number of files in each of the subjobs.

SplitMaxInputFileSize: similar to the previous one, but puts the limit on the size of the files. The size has to be given in bytes.

SplitDefinitions: a list of JDLs. If the user defines them, AliEn will take those JDLs as the subjobs, and all of them will behave as if they were subjobs of the original job (for instance, if the original job gets killed, all of them will get killed, and once all of the subjobs finish, their output will be copied to the master job).
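
A hedged sketch (the values are illustrative, not from the slides) of a storage-element split with a second-level limit on subjob size:

```
Split = "se";
SplitMaxInputFileNumber = "50";
SplitMaxInputFileSize = "2000000000";
```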

Your first JDL

# This is an AliEn JDL!
# runHelloWorld.jdl
Requirements = member(other.GridPartitions,"INFNGRID");
# Requirements = other.CE=="Alice::Torino::LCG";
Executable = "/alice/cern.ch/user/s/sbagnasc/bin/helloworld.csh";
TTL = 720;
OutputDir = "/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/job.#alien_counter#";
OutputFile = { "result.txt", "stdout", "stderr" };
Split = "production:1-10";
SplitArguments = "#alien_counter#";

Your second JDL

Executable = "/alice/cern.ch/user/s/sbagnasc/bin/myProduction.sh";
Arguments = "$1 #alien_counter#";
Split = "production:1-100";
Packages = { "VO_ALICE@AliRoot::v4-15-Rev-06", "VO_ALICE@GEANT3::v1-9-6", "VO_ALICE@ROOT::v5-21-01-alice", "VO_ALICE@APISCONFIG::V2.4" };
# Requirements = (other.CE=="Alice::Torino::LCG");
Validationcommand = "/alice/cern.ch/user/s/sbagnasc/bin/validation.sh";
InputFile = { "LF:/alice/cern.ch/user/s/sbagnasc/TorinoWorkshop/myConfig.C" };
OutputFile = { "AliESDs.root", "*.log", "stdout", "stderr" };
OutputDir = "/alice/cern.ch/user/s/sbagnasc/TorinoWorkshop/JobOutput/$1/#alien_counter_03i#";

Try it out!
- Check that you have the CSH and JDL files
- Change them to use your home dir instead of mine
- Submit runHelloworld.jdl
- Follow its execution
- Check that the result.txt files are there with 'find'
- Submit checkFile.jdl
- Play with the commands

Job submission!

Job state machine

Try it out!
- Check that you have the CSH and JDL files
- Change them to use your home dir instead of mine
- Submit runHelloworld.jdl
- Follow its execution
- Check that the result.txt files are there with 'find'
- Submit checkFile.jdl
- Play with the commands

Your second job

#!/bin/tcsh -vx
# checkFile.csh
set job=`grep 'This is job' result.txt | cut -f 5 -d ' '`
echo This is from checkFile.csh, job $job
echo "date: `date`"
echo "hostname : `hostname -f`"
# Now the same on a file
echo "date: `date`" >> check.$job.txt
echo "hostname : `hostname -f`" >> check.$job.txt
echo "Helloworld for job $job was run on:" >> check.$job.txt
grep date result.txt >> check.$job.txt
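
You can check the grep/cut parsing above outside the grid. A bash sketch (the sample result.txt line is invented for illustration; the job number must be the 5th space-separated field of the line containing "This is job"):

```shell
# Fabricate a result.txt like the one the hello-world job leaves behind,
# then extract the job number exactly as checkFile.csh does.
printf 'Hello! This is job 42\ndate: Mon Jan  1 00:00:00 UTC 2009\n' > result.txt
job=$(grep 'This is job' result.txt | cut -f 5 -d ' ')
echo "This is from checkFile, job $job"
```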

Your second JDL

# This is an AliEn JDL!
Executable = "/alice/cern.ch/user/s/sbagnasc/bin/checkFile.csh";
InputData = { "LF:/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/*/result.txt" };
OutputDir = "/alice/cern.ch/user/s/sbagnasc/GridSchool/run.$1/output";
OutputFile = { "check.*.txt", "stdout", "stderr" };
Split = "file";

Checking the jobs

Checking the jobs

Checking the jobs

ps: report process states from the AliEn TQ
  '-F {l}'              "l" = long output format
  '-f <flags/status>'
  '-u <userlist>'
  '-s <sitelist>'
  '-m <masterjoblist>'
  '-o <sortkey>'
  '-j <jobidlist>'
  '-l <query-limit>'
  '-X'                  active jobs in extended format
  '-A'                  select all your owned jobs in any state
  '-W'                  select all YOUR jobs which are waiting for execution
  '-E'                  select all YOUR jobs which are in error state
  '-a'                  select jobs of ALL users

'jdl <jobid>'           display the job JDL of <jobid>
'trace <jobid> [trace-tag[,trace-tag]]'   display the job trace information
  "proc"   resource information
  "state"  job state changes
  "error"  error statements
  "trace"  job actions (downloads etc.)
  "all"    output with all previous tags

Try it out!
- Check that you have the CSH and JDL files
- Change them to use your home dir instead of mine
- Submit runHelloworld.jdl
- Follow its execution
- Check that the result.txt files are there with 'find'
- Submit checkFile.jdl
- Play with the commands