Long term job submission and monitoring using grid services


Long term job submission and monitoring using grid services
Riccardo Bruno, INFN, Sez. CT
23/07/2007, Meeting sull'uso di applicazioni parallele in PI2S2 (Meeting on the use of parallel applications in PI2S2)

Outline
• Long term job submission
  – MyProxyServer
  – Renewal
  – The renewal process and JDL tag
• Long term job monitoring
  – Middleware tools
  – How to do monitoring efficiently
  – The Watchdog
  – Watchdog use example
  – The main script
  – The watchdog flow
  – The main script code
  – Some outputs
  – The future …
• References
Catania, Meeting sull'uso di applicazioni parallele in PI2S2 , 23.07.2007

Long term job submission

MyProxyServer
• A proxy has a limited lifetime (default is 12 h); creating a longer-lived proxy is a bad idea
• Use a MyProxy server instead:
  – myproxy-init --voms <voname> -s <host_name>: creates and stores a long-term proxy certificate; -s <host_name> specifies the hostname of the MyProxy server
  – myproxy-info: gets information about the stored long-lived proxy
  – myproxy-get-delegation: gets a new proxy from the MyProxy server
  – myproxy-destroy: removes the stored proxy from the server

Renewal
• A dedicated service on the RB can renew the proxy automatically: [edg-wl-renewd] - /etc/init.d/edg-wl-proxyrenewal
• Some dedicated flags are required when creating the long-term proxy credential with myproxy-init:
  – -d: use the proxy certificate subject (DN) as the default username, instead of the LOGNAME environment variable
  – -n: don't prompt for a passphrase
    bash-2.05b$ myproxy-init --voms cometa -d -n
    Your identity: /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo Bruno/Email=riccardo.bruno@ct.infn.it
    Enter GRID pass phrase for this identity:
    Creating proxy ......................................... Done
    Proxy Verify OK
    Your proxy is valid until: Fri Jul 23 09:30:33 2007
    A proxy valid for 168 hours (7.0 days) for user /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo Bruno/Email=riccardo.bruno@ct.infn.it now exists on grid001.ct.infn.it.

The renewal process and JDL tag
5 or 10 minutes before the proxy expires, the RB proxy renewal daemon performs the following steps:
• Contacts the MyProxyServer specified in the JDL and asks for a new delegation
• Contacts the VOMS server to add the ACs
• Transfers the new VOMS-enabled proxy to the WNs running the job
An additional attribute has to be added to the JDL:
    MyProxyServer = "grid001.ct.infn.it";
This attribute tells the RB which MyProxyServer to contact to renew the credentials. If it is missing, a default one is taken from the UI VO configuration settings: glite_wmsui.conf
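
As a minimal sketch, the attribute can be added when the JDL is generated from a shell script. The file name testmyproxy.jdl and the host grid001.ct.infn.it are the ones used in these slides; the sandbox contents and script.sh are placeholders, and the MYPROXY_HOST and JDL_FILE variables are hypothetical names introduced only for this example.

```shell
#!/bin/bash
# Write a JDL whose MyProxyServer attribute tells the RB which MyProxy
# server to contact for renewal.
# MYPROXY_HOST and JDL_FILE are hypothetical variable names (assumption).
MYPROXY_HOST="${MYPROXY_HOST:-grid001.ct.infn.it}"
JDL_FILE="${JDL_FILE:-testmyproxy.jdl}"

cat > "$JDL_FILE" <<EOF
Type = "Job";
JobType = "Normal";
Executable = "/bin/bash";
Arguments = "script.sh";
StdOutput = "file.out";
StdError = "file.err";
InputSandbox = {"script.sh"};
OutputSandbox = {"file.out", "file.err"};
MyProxyServer = "$MYPROXY_HOST";
EOF

echo "Wrote $JDL_FILE"
```

Keeping the host in a variable makes it easy to switch MyProxy servers without editing every JDL by hand.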

Long term job submission
• Create the long-term proxy on the MyProxy server:
    myproxy-init --voms cometa -d -n
• Create a new proxy, or get the delegation from the MyProxy server:
    voms-proxy-init --voms cometa
    myproxy-get-delegation -d -a $X509_USER_PROXY
  (Note that you must already have a valid proxy on the UI)
• Submit the job normally:
    edg-job-submit -o jid testmyproxy.jdl

Renewal feedback
    Starting at: 20070720124320
    subject : /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
    …
    type : limited proxy
    strength : 512 bits
    path : /tmp/globus-tmp.unime-wn-03.27834.0
    timeleft : 0:56:58
    === VO cometa extension information ===
    VO : cometa
    subject : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Riccardo Bruno
    issuer : /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it
    attribute : /cometa/Role=NULL/Capability=NULL
    timeleft : 11:56:01
    … Other output from the job's core execution (just a sleep)
    subject : /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
    timeleft : 8:45:18
    timeleft : 10:26:00
    Ending at: 20070720141321
• This job was executed with a delegated proxy 1 hour long (myproxy-get-delegation -d -t 1:00 -a $X509_USER_PROXY)
• The job ran from 12:43:20 to 14:13:21, about 90 minutes, so it outlived its original proxy
• The first call to voms-proxy-info returns 0:56:58 as time left
• After the job's core execution, the second call to voms-proxy-info gives 8:45:18 as time left
• Note also the different subjects:
    /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
    /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
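
The comparison of the two timeleft values can be checked mechanically. The sketch below converts the H:MM:SS strings reported above into seconds and compares them; hms_to_seconds is a hypothetical helper name, not part of any grid tool, and the two values are hard-coded from the output shown on this slide.

```shell
#!/bin/bash
# Convert an H:MM:SS "timeleft" value, as printed by voms-proxy-info,
# into a number of seconds. hms_to_seconds is a hypothetical helper.
hms_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10, so fields such as "08" are not parsed as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

before=$(hms_to_seconds "0:56:58")   # first call, at job start
after=$(hms_to_seconds "8:45:18")    # second call, after the core execution

# A proxy that was never renewed can only lose time, so a larger value
# on the second call means a renewal happened in between.
if [ "$after" -gt "$before" ]; then
    echo "proxy was renewed: timeleft grew from ${before}s to ${after}s"
fi
```

The same check could be placed at the end of a job script to log whether a renewal took place during the run.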

Long term job monitoring

Middleware tools
Currently gLite offers the following services for monitoring job execution:
• Interactive jobs, or direct use of X server communication via SSH tunneling
  – The user is forced to use an interactive JDL
  – The X client must be kept open for the whole job duration
• Use of RGMA
  – Using dedicated producers requires code changes, which are not always possible
  – Code changes are error prone and need to be tested
• Use of AMGA
  – Using the AMGA APIs requires code changes as well

How to do monitoring efficiently
IDEA: Perform job monitoring still using grid services, in the least invasive way possible.
Observations:
• Almost all jobs submitted to the grid are driven by shell scripts
  – Shell scripting makes it possible to get valuable information in case of faults
  – Shell scripting can drive more complex batch processing
• Both the SE and the file catalog can be used as the simplest IS on the grid
  – The lfc-* and lcg-* tools are already available for file creation and retrieval
  – The latency of the storage CLI tools is very low compared with the duration of long-term jobs
Requirements:
• It should be possible to configure the monitoring tool according to the user's needs
  – A few shell environment variables can be used to configure the monitoring tool

The Watchdog
The Watchdog is a shell script to be included in the main script. Some watchdog features:
• It starts in the background before the long-term job runs
• It runs for as long as the main job
• The main script can stop it and wait until it has finished
• It is easily and highly configurable
• It does not take significant CPU power away from the WN
• It is really simple, and its behavior can be extended by the user
The best way to explain the watchdog is through a usage example …

Watchdog use example
The simplest use case involves the following files:
• The JDL: script.jdl
• The main script file: script.sh
• The watchdog script file: watchdog.sh
script.jdl:
    Type = "Job";
    JobType = "Normal";
    Executable = "/bin/bash";
    StdOutput = "file.out";
    StdError = "file.err";
    InputSandbox = {"watchdog.sh", "script.sh"};
    OutputSandbox = {"file.out", "file.err", "watchdog.out"};
    Arguments = "script.sh";
Sandbox contents (from the slide diagram):
• InputSandbox: script.sh, watchdog.sh
• OutputSandbox: file.out, file.err, watchdog.out

The main script
It is good practice to give the main script the following structure:
• Get information about the WN
• Start the watchdog
• Execute and control the main job
• Stop the watchdog
• Collect information about the job execution

The watchdog flow
• Inputs: VO, USERPATH, file catalog, SE, DELAY, list of files
• Initialization: USERPATH/JobId on the file catalog/SE
• Enter the loop
  – For each file in the list, take a snapshot (only increments are copied), stored as <timestamp>_<file_1>, <timestamp>_<file_2>, …, <timestamp>_<file_n>
  – While the CTRL file exists, repeat the loop
• Create the notification (NTFY) file
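
The flow above can be sketched in a few lines of bash. This is not the actual watchdog.sh: a local directory stands in for the file catalog/SE (the real tool would call the lfc-* and lcg-* commands), and every WD_* variable name is hypothetical. The default control and notification file names, watchdog.ctrl and watchdog.done, are the ones the main script uses.

```shell
#!/bin/bash
# Sketch of the watchdog flow. Assumptions: a local directory replaces the
# file catalog/SE, and all WD_* variable names are hypothetical.
WD_DEST="${WD_DEST:-./wd_store}"            # stand-in for <USERPATH>/<JobId>
WD_DELAY="${WD_DELAY:-30}"                  # seconds between two snapshots
WD_FILES="${WD_FILES:-file.out file.err}"   # space-separated files to watch
WD_CTRL="${WD_CTRL:-watchdog.ctrl}"         # loop runs while this file exists
WD_DONE="${WD_DONE:-watchdog.done}"         # notification file written at the end

# Copy only the bytes appended to file $1 since the previous snapshot.
snapshot() {
    local file=$1 name offset_file size last
    [ -f "$file" ] || return 0
    name=$(basename "$file")
    offset_file="$WD_DEST/.offset_$name"
    size=$(( $(wc -c < "$file") ))
    last=$(cat "$offset_file" 2>/dev/null || echo 0)
    if [ "$size" -gt "$last" ]; then
        # tail -c +N starts at byte N, so skip the bytes already copied.
        tail -c "+$((last + 1))" "$file" >> "$WD_DEST/$(date +%y%m%d%H%M%S)_$name"
        echo "$size" > "$offset_file"
    fi
}

watchdog() {
    local f
    mkdir -p "$WD_DEST"
    : > "$WD_CTRL"        # the main script removes this file to stop the loop
    rm -f "$WD_DONE"
    echo "Starting watchdog at: $(date +%y%m%d%H%M%S)"
    while [ -e "$WD_CTRL" ]; do
        for f in $WD_FILES; do snapshot "$f"; done
        sleep "$WD_DELAY"
    done
    for f in $WD_FILES; do snapshot "$f"; done   # final pass for the last lines
    echo "Ending watchdog at: $(date +%y%m%d%H%M%S)"
    : > "$WD_DONE"        # tell the main script that the watchdog has finished
}
```

A main script would start it with `watchdog > watchdog.out &`; removing watchdog.ctrl then ends the loop after at most one more delay. Tracking a byte offset per file is one simple way to copy only increments; the real tool may do this differently.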

The main script code
    #
    # watchdog - Riccardo Bruno 200707
    #
    echo "Starting at: $(date +%y%m%d%H%M%S)"
    HOSTNAME=$(hostname -f)
    USER=$(whoami)
    ARG1=$1
    LOCALDIR=$(pwd)
    echo "*****************************"
    echo "HOST: $HOSTNAME"
    echo "USER: $USER"
    echo "ARGS: $ARG1"
    echo "LOCALDIR is: $LOCALDIR"
    echo "HOMEDIR is: $HOME"
    echo "Content of home:"
    ls -l "$HOME"
    echo "Content of current dir:"
    ls -l .
    echo "******************************"
    # start the watchdog in the background
    chmod +x watchdog.sh
    ./watchdog.sh > watchdog.out &
    # main job: perform 8 iterations, 15 seconds each (2 minutes)
    for i in $(seq 1 8)
    do
        echo "This is my output at: $(date +%y%m%d%H%M%S)"
        echo "This is my error at: $(date +%y%m%d%H%M%S)" 1>&2
        sleep 15
    done
    # stop the watchdog and wait until it has finished
    rm -f watchdog.ctrl
    while [ ! -e watchdog.done ]
    do
        sleep 1
        echo "Waiting for watchdog: $(date +%y%m%d%H%M%S)"
    done
    echo "Watchdog closed"
    echo "done"
    echo "done" 1>&2
    echo "Ending at: $(date +%y%m%d%H%M%S)"

Some outputs
    [brunor@glite-tutor tmp]$ lfc-ls -l /grid/gilda/brunor/2DFfQYycd5guISZSU3ZdOQ
    -rw-rw-r-- 1 1023 102 2211 Jul 18 16:13 070718161318_testmyproxy.out
    -rw-rw-r-- 1 1023 102 85 Jul 18 16:14 070718161347_testmyproxy.err
    …
    [brunor@glite-tutor brunor_2DFfQYycd5guISZSU3ZdOQ]$ cat file.out
    Starting at: 070713155443
    ****************************************
    <WN INFO …>
    This is my output at: 070713155443
    This is my output at: 070713155633
    done
    Ending at: 070713155643
    [brunor@glite-tutor brunor_2DFfQYycd5guISZSU3ZdOQ]$ cat file.err
    This is my error at: 070713155443
    [brunor@glite-tutor brunor_2DFfQYycd5guISZSU3ZdOQ]$ cat watchdog.out
    Starting watchdog at: 070713155443
    guid:205a2902-89e0-4c68-b963-2facf30efb6f
    guid:a21f30b4-46cf-4e63-919b-ceb911bfe710
    Ending watchdog at: 070713155443

The future …
The watchdog can be easily improved:
• Use a special folder in the catalog as a virtual UI on the WN, allowing the user to issue shell commands:
    WD_USER_PATH/<JobId>/
        <timestamp>_file_1
        <timestamp>_file_2
        …
        <timestamp>_file_n
        UI/
            commands
            <timestamp>_cmdresult_1
• Use AMGA/RGMA CLI tools instead of the catalog

References
• The watchdog wiki: https://grid.ct.infn.it/twiki/bin/view/PI2S2/WatchdogUtility

Questions…