MPI probes OMB Meeting 26th February 2013

Outline
- Motivation: why do we need different / new MPI probes?
- New MPI probes description.
- New MPI probes integration process.

The need for new probes
Why the present MPI probes are not sufficient:
- Completely dependent on the EGI information system. If a site publishes the MPI-START tag correctly, the resource is tested; otherwise it is not.
- Poor testing of MPI functionality:
  - No sanity check of the published MPI information.
  - Execution of a "Hello World" MPI job in two slots.
  - Only uses a single WN (in the majority of cases).

New MPI probes description
- MPI Sanity Check (eu.egi.mpi.EnvSanityCheck): checks the values published by the information system (MPI tags, MPI flavours, and CPU and wall-clock values coherent with the execution of MPI applications).
- Simple Job (eu.egi.mpi.SimpleJob): checks the ability to execute a simple parallel job with MPI-Start using 2 different slots. The probe detects typical errors such as misconfiguration of the batch system, incorrect installation of the MPI implementation, or errors in the distribution of files across nodes.
- Complex Job (eu.egi.mpi.ComplexJob): checks the execution of larger jobs and tests a broader range of MPI functionality. Requests 4 slots with 2 instances running on different dedicated machines (JobType="Normal"; CpuNumber = 4; NodeNumber=2; SMPGranularity=2; WholeNodes=True).
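As an illustration of the job requirements quoted for the Simple Job and Complex Job probes, the following Python sketch renders the same JDL attributes from a few parameters. The function name build_jdl_requirements is hypothetical, and the check that NodeNumber * SMPGranularity equals CpuNumber is an assumption taken from the numbers on the slide (4 = 2 x 2), not a documented JDL rule.

```python
from typing import Optional


def build_jdl_requirements(cpu_number: int, node_number: int,
                           smp_granularity: Optional[int] = None,
                           whole_nodes: Optional[bool] = None) -> str:
    """Render the JDL attributes quoted on the slide as one string."""
    if cpu_number < 1 or node_number < 1:
        raise ValueError("cpu_number and node_number must be positive")
    if smp_granularity is not None and node_number * smp_granularity != cpu_number:
        # Assumption of this sketch: slots are spread evenly across nodes,
        # so nodes * slots-per-node must equal the total number of slots.
        raise ValueError("NodeNumber * SMPGranularity must equal CpuNumber")
    parts = ['JobType="Normal"',
             f"CpuNumber={cpu_number}",
             f"NodeNumber={node_number}"]
    if smp_granularity is not None:
        parts.append(f"SMPGranularity={smp_granularity}")
    if whole_nodes is not None:
        parts.append(f"WholeNodes={whole_nodes}")
    return "; ".join(parts)


# Simple Job: two slots on two different machines.
simple_job = build_jdl_requirements(cpu_number=2, node_number=2)
# Complex Job: four slots, two per node, on two dedicated machines.
complex_job = build_jdl_requirements(4, 2, smp_granularity=2, whole_nodes=True)
print(simple_job)
print(complex_job)
```

Keeping the attribute rendering in one place makes it easy to see that the two probes differ only in slot count, slots per node, and whether whole nodes are requested.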

MPI probe specification example (1/2)
Example: Simple Job (eu.egi.mpi.SimpleJob)
- Target: CREAM-CE tagged as an eu.egi.MPI service in GOCDB.
- Requirements: job submission requesting two slots on different machines (JobType="Normal"; CpuNumber = 2; NodeNumber=2).
- Dependencies: executed after eu.egi.mpi.envsanitycheck, and only if that probe exits with WARNING or OK status.
- Frequency and timeout: 2h
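The dependency rule for the Simple Job probe can be sketched in Python as below. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the names NagiosStatus and should_run_simple_job are hypothetical and are not taken from the actual probe code.

```python
from enum import IntEnum


class NagiosStatus(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def should_run_simple_job(env_sanity_status: NagiosStatus) -> bool:
    """Run the Simple Job probe only after the sanity check ended OK or WARNING."""
    return env_sanity_status in (NagiosStatus.OK, NagiosStatus.WARNING)


for status in NagiosStatus:
    print(f"{status.name}: run SimpleJob = {should_run_simple_job(status)}")
```

Treating UNKNOWN like CRITICAL (do not run) is a choice made for this sketch; the slides only state that the probe runs after a WARNING or OK result.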

MPI probe specification example (2/2)
Expected behaviour: Simple Job (eu.egi.mpi.SimpleJob)
WARNING:
- MPI-START is not able to determine the scheduler type.
- MPI-START is not able to determine if the MPI flavour under test is correctly set.
CRITICAL:
- The compilation of the parallel application fails.
- MPI-START fails to distribute the application binaries.
- The MPI application execution fails.
- MPI-START fails to collect the application results.
- The application executes with fewer slots than requested.
- The probe reaches a timeout and the execution is cancelled.
- The probe reaches a 2nd timeout and the execution is cancelled.
OK:
- The application executes successfully.
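A minimal Python sketch of how the observed conditions could be turned into a single probe result is given below. It assumes the flattened table groups the two "not able to determine" conditions under WARNING and every other failure under CRITICAL, and that the probe reports the most severe status seen. The condition keys and the function probe_status are hypothetical names used only for illustration.

```python
from enum import IntEnum
from typing import Iterable


class NagiosStatus(IntEnum):
    """Standard Nagios plugin exit codes used by the sketch."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


# Assumed reading of the expected-behaviour table; keys are hypothetical.
CONDITION_STATUS = {
    "scheduler_type_undetermined": NagiosStatus.WARNING,
    "mpi_flavour_undetermined": NagiosStatus.WARNING,
    "compilation_failed": NagiosStatus.CRITICAL,
    "binary_distribution_failed": NagiosStatus.CRITICAL,
    "execution_failed": NagiosStatus.CRITICAL,
    "result_collection_failed": NagiosStatus.CRITICAL,
    "fewer_slots_than_requested": NagiosStatus.CRITICAL,
    "timeout": NagiosStatus.CRITICAL,
    "second_timeout": NagiosStatus.CRITICAL,
}


def probe_status(observed: Iterable[str]) -> NagiosStatus:
    """Return the most severe status among the observed conditions.

    With no conditions observed the application ran cleanly, so the
    result is OK. Unknown condition names raise KeyError.
    """
    statuses = [CONDITION_STATUS[name] for name in observed]
    return max(statuses, default=NagiosStatus.OK)


print(probe_status([]).name)
print(probe_status(["scheduler_type_undetermined"]).name)
print(probe_status(["mpi_flavour_undetermined", "execution_failed"]).name)
```

Reporting the worst status means a single CRITICAL condition is never masked by accompanying warnings.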

MPI probes integration process
- Development of the new set of probes (SA3): DONE
- SAM Nagios testing (SA2/JRA1): DONE
- Delivery of SAM Nagios probes to SA1: ongoing
- Request to OTAG to add/modify the MPI probe: https://wiki.egi.eu/wiki/PROC07
- Integration in SAM. SAM U23?

References
- MPI VT wiki: https://wiki.egi.eu/wiki/VT_MPI_within_EGI
- New MPI Nagios details: https://wiki.egi.eu/wiki/VT_MPI_within_EGI:Nagios
- MPI VT report: https://documents.egi.eu/document/1260
- RT GOCDB new MPI service type: https://rt.egi.eu/rt/Ticket/Display.html?id=3396