Www.eu-eela.eu E-science grid facility for Europe and Latin America Watchdog: A job monitoring solution inside the EELA-2 Infrastructure Riccardo Bruno,

Presentation transcript:

E-science grid facility for Europe and Latin America Watchdog: A job monitoring solution inside the EELA-2 Infrastructure Riccardo Bruno, Roberto Barbera, Elisa Ingrà INFN Sez. Catania (Italy) 2nd EELA-2 Conference Choroni (Venezuela),

Job Monitoring in gLite
– Before gLite v3.1, no job monitoring system was available
– Jobs running on the WNs are treated as black boxes
– No prompt retrieval of the job status (Done/Abort/…)
– The Output Sandbox is available only after the WMS recognizes job completion
This situation is a poor fit for jobs that need very long computation times.
[Figure labels: Jobs, WMS, CE, WNs, WN (?), Output Sandbox]

Analysis
Need
– Get in touch with the jobs running on the WN (especially long-running jobs), monitoring and controlling their execution.
How
– Perform job control and monitoring using grid services, in the least invasive way for the application.
Observations
– Almost all Grid jobs are piloted by a main shell script, which is used to:
    • Get precious info in case of faults
    • Pilot complex batch workflows
– Both AMGA and SE+LFC can be used as a basic Grid information system:
    • lfc-* and lcg-* tools are already available for Grid file management
    • The mdcli AMGA command can be used by jobs on the WNs
    • The cp command can be used when a shared file system is mounted on the WN
    • The latency of the CLI tools is very low compared to the duration of long-running jobs

Requirements
Monitor job execution in a timely way by watching the files produced by the job while it executes on the WN
– File snapshots will be reported to LFC+SE, AMGA servers or mounted shared FSs
The monitoring tool should be configurable according to the user's needs
– The monitoring tool will consist only of bash script files
– A few shell environment variables configure the monitoring behavior
Control the job execution by accessing the WN directly
– It is possible to send user commands to the WN
– It is possible to change the monitoring while the Grid job runs
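The snapshot requirement can be illustrated with a minimal sketch for the mounted shared FS case. This is not the actual watchdog.sh: the variable names WD_FILES, WD_INTERVAL and WD_TARGET, the timestamp prefix and the stop-flag argument are all assumptions made for this example.

```shell
# Hypothetical settings; the real watchdog.conf variable names are not given in the slides.
WD_FILES="${WD_FILES:-file.out file.err}"     # space-separated files to watch (no spaces in paths)
WD_INTERVAL="${WD_INTERVAL:-60}"              # seconds between two snapshots
WD_TARGET="${WD_TARGET:-/tmp/wd_snapshots}"   # directory on the mounted shared FS

# Copy every watched file that exists to the target directory,
# prefixing the copy with the current Unix time.
take_snapshot() {
    local stamp file
    stamp=$(date +%s)
    mkdir -p "$WD_TARGET"
    for file in $WD_FILES; do
        [ -f "$file" ] && cp "$file" "$WD_TARGET/${stamp}_$(basename "$file")"
    done
    return 0
}

# Take snapshots until the given flag file appears, then take a final one.
watch_loop() {
    local stop_flag=$1
    while [ ! -e "$stop_flag" ]; do
        take_snapshot
        sleep "$WD_INTERVAL"
    done
    take_snapshot
}
```

A pilot script could run `watch_loop ./stop.flag &` before the application starts and `touch ./stop.flag` once it has finished. Reporting to LFC+SE or AMGA would replace the `cp` call with the corresponding Grid tools.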

The Watchdog
The Watchdog consists of a set of shell scripts to be included in the JDL InputSandbox and then called by the pilot script.
Watchdog features:
– It starts in the background before the Grid job runs
– It runs as long as the main job
– The monitoring process can be controlled until the pilot script finishes
– It is easily configurable and customizable
– It does not compromise the CPU power of the WN
– It can be used with MPI jobs
– Files may be reported fully or partially (only the latest changes)
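Partial reporting (only the latest changes) can be sketched by remembering how many bytes of a file have already been reported. The function name report_new_bytes and the state file are assumptions for illustration; the slides do not describe how the Watchdog tracks changes.

```shell
# Print only the bytes appended to FILE since the previous call.
# The number of bytes already reported is kept in STATE.
report_new_bytes() {
    local file=$1 state=$2 last=0 size
    [ -f "$state" ] && last=$(cat "$state")
    size=$(wc -c < "$file")
    size=$((size))                                   # drop padding printed by some wc versions
    if [ "$size" -lt "$last" ]; then last=0; fi      # file was truncated: report it again
    if [ "$size" -gt "$last" ]; then
        # tail skips the reported part; head caps the output at the size measured above
        tail -c "+$((last + 1))" "$file" | head -c "$((size - last))"
    fi
    echo "$size" > "$state"
}
```

Calling `report_new_bytes file.out .file.out.offset` at each monitoring cycle emits only the new chunk, which keeps the reported data small for long-running jobs.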

WD Main Components
watchdog.sh
– The WD core script; it is responsible for job monitoring, file snapshot reporting and user command execution
watchdog.ctrl
– This script controls the execution of the WD core script; it can start, stop, pause and resume the WD. It can also be used to alter the time interval, add/remove files to watch and change the reporting strategy (full/partial)
watchdog.conf
– This script contains all the environment variables needed to configure the WD
– AMGA reporting requires additional files
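One simple way for a control script to steer a background loop is through flag files in a control directory. The sketch below illustrates that idea only; it is not the real watchdog.ctrl. WDEND appears among the flags on the WD Interaction slide, but its exact meaning is assumed here, and WDPAUSE, WD_CTRL_DIR and both function names are hypothetical.

```shell
WD_CTRL_DIR="${WD_CTRL_DIR:-./_watchdog}"   # hypothetical control directory

# Record the user's request as a flag file.
wd_ctrl() {
    mkdir -p "$WD_CTRL_DIR"
    case "$1" in
        pause)  touch "$WD_CTRL_DIR/WDPAUSE" ;;
        resume) rm -f "$WD_CTRL_DIR/WDPAUSE" ;;
        stop)   touch "$WD_CTRL_DIR/WDEND" ;;
        *)      echo "usage: wd_ctrl {pause|resume|stop}" >&2; return 1 ;;
    esac
}

# Report the state a monitoring loop should honor at its next cycle.
wd_state() {
    if   [ -e "$WD_CTRL_DIR/WDEND" ];   then echo stopped
    elif [ -e "$WD_CTRL_DIR/WDPAUSE" ]; then echo paused
    else echo running
    fi
}
```

A monitoring loop would call `wd_state` at the start of every cycle, skip reporting while the state is `paused`, and exit once it is `stopped`.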

WD Additional Components
getinfo.sh / setinfo.sh, getcontent.sh / setcontent.sh (AMGA)
– Utilities to set/get the information reported by the WD to/from the AMGA metadata catalog
uuencode / uudecode (sharutils) (AMGA)
– Executables needed by the WD to encode binary and multi-line text content into the AMGA metadata catalog in Base64 text format
– In EELA-2 (prod VO) they are available under:
    • $VO_PROD_VO_EU_EELA_EU_SW_DIR
wdcli
– CLI application that lets the user interact with the WD
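The reason for the Base64 step is that a metadata attribute holds a single text value, while snapshots may be binary or span several lines. The sketch below shows the round trip using the coreutils `base64` command as a stand-in for uuencode/uudecode; that substitution, the `-d` decode flag of GNU base64, and the function names are assumptions of this example, not what the WD ships.

```shell
# Encode a file as one line of Base64 text, suitable for a single text attribute.
encode_content() {
    base64 < "$1" | tr -d '\n'      # strip the line wrapping added by base64
}

# Decode a Base64 string back to the original bytes on stdout.
decode_content() {
    printf '%s' "$1" | base64 -d
}
```

For example, `value=$(encode_content file.out)` yields a string that can be stored as text, and `decode_content "$value" > file.out.copy` restores the original content.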

WD Usage
1. Configure the Watchdog by setting up the watchdog.conf file
2. Applications using the Watchdog MUST include the files watchdog.sh, watchdog.ctrl, watchdog.conf and, in case of AMGA reporting, uuencode and uudecode (alternatively, add VO_PROD_VO_EU_EELA_EU_SW_DIR to the PATH on the WN)
3. Call watchdog.ctrl from the pilot script

App JDL:

    Type = "Job";
    JobType = "Normal";
    Executable = "/bin/bash";
    StdOutput = "file.out";
    StdError = "file.err";
    InputSandbox = {"watchdog.sh", "watchdog.ctrl", "watchdog.conf", "uuencode", "uudecode", "AppPilotScript.sh"};
    OutputSandbox = {"MyApp.out", "MyApp.err", "watchdog.log", "watchdog.err"};
    Arguments = "AppPilotScript.sh";

AppPilotScript.sh:

    #!/bin/sh
    …
    # prepare and start the watchdog
    PATH=${VO_PROD_VO_EU_EELA_EU_SW_DIR}/:${PATH}:.
    chmod +x watchdog.*
    ./watchdog.ctrl start
    # run the application
    …
    # use ./watchdog.ctrl to control the WD at any time
    # stop the watchdog and wait until it completes
    ./watchdog.ctrl stop

WD Interaction
[Diagram: watchdog.sh runs on the WN, reads watchdog.conf and writes file snapshots to LFC/AMGA or a mounted shared FS. Recoverable labels: a per-job directory named after the job ID (6-tPC2d2knO7m6GP2XC7-Q); the WD control directory _watchdog/ with the flags WDEND or WDPID, WDENV and WDHST; the command execution directory cmdlist/ with wdcli_cmd and the CMD/OUT/ERR files (_wdcli_cmd1.cmd, _wdcli_cmd1.out, _wdcli_cmd1.err, _wdcli_cmd7.cmd, _wdcli_cmd7.out, _wdcli_cmd7.err); the file snapshots (_13156_file.out, _13156_file.err, _13156_watchdog.log, _13156_watchdog.err, …).]

wdcli
CLI to ease user interaction with the WD
– Prompt: wd>
– Uses the watchdog.conf file to get the user configuration
Principal commands:
– set: set the MODE (LFC/AMGA/mounted shared FS)
– show jobs: get the list of monitored jobs
– Attach to a monitored job
– show snapshots: get the list of file snapshots
– View the snapshot content
– Get generic info: ENV, PID, CE, WN, Proxy …
– exec: execute a given command
    • Interactive commands are not allowed
    • It is possible to call the watchdog.ctrl command (use the -n option!)
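The WD Interaction slide lists command files in .cmd/.out/.err triplets, which suggests that user commands are delivered as files and executed by the watchdog on the WN. A minimal sketch of that pattern follows; the function name run_pending_commands and the rule "a command is done once its .out file exists" are assumptions, not the documented behavior of watchdog.sh.

```shell
# Run every pending *.cmd file in DIR exactly once,
# storing its stdout and stderr next to it as *.out and *.err.
run_pending_commands() {
    local dir=$1 cmd base
    for cmd in "$dir"/*.cmd; do
        [ -f "$cmd" ] || continue          # no command files: the glob stays literal
        base=${cmd%.cmd}
        [ -e "$base.out" ] && continue     # already executed in a previous cycle
        # stdin comes from /dev/null so an interactive command cannot block the loop;
        # a failing command must not stop the watchdog either
        bash "$cmd" > "$base.out" 2> "$base.err" < /dev/null || true
    done
    return 0
}
```

Redirecting stdin from /dev/null is one way to enforce the "interactive commands are not allowed" rule: a command that waits for input reads end-of-file immediately instead of hanging the job.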

WD in EELA-2
Presented for the first time at E2GRIS1 in Itacuruça (Brazil)
– G-HMMER/G-InterProScan (Bioinformatics): get semi-real-time info to be published on the Web
– CrossFire (Civil Protection): get semi-real-time info to view the simulation output
Presented for the second time at E2GRIS2 in Querétaro (Mexico)
– HeMoLab (Bioinformatics): long-running jobs, check output files while running
– AeroVANT (Engineering): long-running jobs, get data while running
– BioMD (Bioinformatics): long-running job, monitor the simulation
– Seismic Sensors, planned (Earth Science): monitor the job execution
– Cinefilia (Recommender Systems): monitor the computation

Conclusions
WD is mainly used for:
– Job monitoring (long runs)
– Checking/getting the data produced by a job
WD is used as:
– A debugging helper tool
– An application component (CrossFire)
WD is easy to integrate but needs a precise configuration
– EELA-2 has 2 different AMGA servers with different access rights (EU and LA)
– EELA-2 does not have the sharutils (uuencode/uudecode) package installed on the WNs. These tools are available under the WN path VO_PROD_VO_EU_EELA_EU_SW_DIR; alternatively, put the ‘uu**code’ commands in the InputSandbox
– Several EELA-2 WNs were using a different BDII, so some users were unable to easily retrieve the snapshot content (LFC)

Future
Improve the user interaction
– Improve wdcli (given its success at E2GRIS2)
– Create tools to easily build web-based front ends
– Provide tools to reconstruct a file monitored incrementally
Ease the application integration (AMGA)
– Remove the dependency on uuencode / uudecode
– Provide watchdog.conf file templates for VOs
Improve the monitoring
– Provide independent watching cycles for each file
– Provide a sandboxing mechanism for file I/O from/to the WN
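As an illustration of what a tool for reconstructing an incrementally monitored file could do, the sketch below concatenates partial snapshots in order. It assumes, purely for this example, that each partial snapshot is stored as `<sequence number>_<file name>` in one directory; the slides do not specify the real naming scheme, and rebuild_file is a hypothetical name.

```shell
# Concatenate the partial snapshots of NAME found in DIR,
# ordered by their numeric prefix, and write the result to stdout.
rebuild_file() {
    local dir=$1 name=$2 part
    for part in "$dir"/*_"$name"; do
        [ -f "$part" ] || continue         # no snapshot for this file
        basename "$part"
    done | sort -n | while IFS= read -r part; do
        cat "$dir/$part"
    done
}
```

With snapshots `1_file.out`, `2_file.out` and `10_file.out`, `rebuild_file ./snapshots file.out > file.out` restores the chunks in numeric rather than alphabetical order.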

Questions?