Published by Miranda McKinney. Modified over 8 years ago.
1
PANDA PILOT FOR HPC Danila Oleynik (UTA)
2
Outline
- What is PanDA Pilot
- PanDA Pilot architecture (in a nutshell)
- HPC specifics
- PanDA Pilot for HPC
3
What is PanDA Pilot
PanDA Pilot is a lightweight application that manages the execution of a payload on a computing resource. PanDA Pilot is responsible for:
- Requesting an abstract description of the computing resource from the PanDA server
- Requesting information about the payload from the PanDA server
- VO-specific environment setup
- Stage-in/stage-out procedures
- Monitoring payload execution
- Updating the PanDA server with monitoring information
- Recovery of failed jobs
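The responsibilities listed above can be read as one sequential loop. The following is a minimal sketch of that loop, not the actual pilot code: `FakeServer`, `run_pilot`, the state names, and the dictionary fields are all hypothetical and exist only to show the order of the steps.

```python
# Illustrative sketch only: every class, method, and field name here is
# hypothetical, not the real PanDA Pilot or PanDA server API.

class FakeServer:
    """Stand-in for the PanDA server that records the updates it receives."""

    def __init__(self):
        self.updates = []

    def get_resource_description(self):
        return {"site": "EXAMPLE_HPC", "cores_per_node": 16}

    def get_job(self):
        return {"job_id": 1, "inputs": ["in.dat"], "outputs": ["out.dat"]}

    def update_job(self, job_id, state):
        self.updates.append((job_id, state))


def run_pilot(server, run_payload):
    """Walk through the pilot steps in order and return the final job state."""
    resource = server.get_resource_description()   # 1. describe the resource
    job = server.get_job()                         # 2. fetch the payload description
    env = {"SITE": resource["site"]}               # 3. VO-specific setup (placeholder)
    server.update_job(job["job_id"], "stage-in")   # 4. stage-in would happen here
    server.update_job(job["job_id"], "running")
    exit_code = run_payload(job, env)              # 5. run and monitor the payload
    server.update_job(job["job_id"], "stage-out")  # 6. stage-out would happen here
    state = "finished" if exit_code == 0 else "failed"
    server.update_job(job["job_id"], state)        # 7. final report; recovery could hook in here
    return state


server = FakeServer()
final_state = run_pilot(server, lambda job, env: 0)
print(final_state, server.updates)
```

Passing the payload runner in as a callable keeps the loop independent of how the payload is actually launched, which is the part that changes on an HPC resource.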
4
PanDA Pilot architecture (in a nutshell)
PanDA Pilot has a modular structure, and each module can be implemented through a plugin-based architecture:
- Environment setup and payload launch preparation can be VO specific
- Different data transfer technologies can be used through stage-in/stage-out plugins (data movers)
- The payload execution module may interact with different computational backends
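One way to picture the data mover plugins is a registry keyed by URL scheme. This is a sketch under assumptions: `MOVERS`, `register_mover`, `get_mover`, and the two mover classes are hypothetical names, not taken from the pilot code, and the movers only build a command line rather than running a transfer.

```python
# Hypothetical plugin registry; none of these names come from the PanDA Pilot code base.
MOVERS = {}


def register_mover(scheme):
    """Class decorator that registers a data mover for one URL scheme."""
    def decorator(cls):
        MOVERS[scheme] = cls
        return cls
    return decorator


@register_mover("file")
class LocalCopyMover:
    def command(self, source, destination):
        # Plain copy on a shared file system.
        return ["cp", source, destination]


@register_mover("gsiftp")
class GridFtpMover:
    def command(self, source, destination):
        # globus-url-copy takes a source URL followed by a destination URL.
        return ["globus-url-copy", source, destination]


def get_mover(url):
    """Pick a mover from the URL scheme; paths without a scheme are treated as local files."""
    scheme = url.split("://", 1)[0] if "://" in url else "file"
    try:
        return MOVERS[scheme]()
    except KeyError:
        raise ValueError("no data mover registered for scheme %r" % scheme)


mover = get_mover("gsiftp://example.org/data/in.dat")
print(mover.command("gsiftp://example.org/data/in.dat", "/scratch/in.dat"))
```

Supporting a new transfer technology then means adding one decorated class, without touching the stage-in/stage-out logic that calls `get_mover`.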
5
HPC specifics
Seymour Cray: “Supercomputer: it is hard to define, but you know it when you see it.” Despite this, we found some common features:
- Parallel file system shared between nodes
- Restricted access to the facility and (usually) limited access to computing nodes
- Internal job management tool: PBS/TORQUE/SLURM, etc.
- One job occupies at least one node
- Limit on the number of jobs one user may have in the scheduler
- Special facilities for non-computing processes: external data transfers, intensive I/O operations (merging)
6
PanDA Pilot on HPC
- Pilots run on an HPC interactive node
- Each pilot interacts with the local job scheduler to manage the job
- Number of running pilots = number of available slots in the local scheduler
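The last rule can be written as a small calculation. This sketch assumes that each pilot manages exactly one batch job and that the number of available slots equals the per-user job limit of the scheduler; the function name and parameters are hypothetical.

```python
def pilots_to_start(user_job_limit, active_pilots):
    """Return how many more pilots can be started on the interactive node.

    Assumption: one pilot manages one batch job, so the target number of
    pilots equals the per-user job limit of the local scheduler.
    """
    if user_job_limit < 0 or active_pilots < 0:
        raise ValueError("counts must be non-negative")
    return max(0, user_job_limit - active_pilots)


print(pilots_to_start(user_job_limit=4, active_pilots=1))
```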
7
SAGA API - uniform access layer
The SAGA API was chosen to encapsulate interactions with the HPC internal batch system:
- High-level job description API
- Set of adapters for different job submission systems (SSH and GSISSH; Condor and Condor-G; PBS and Torque; Sun Grid Engine; SLURM; IBM LoadLeveler)
- Local and remote communication with job submission systems
- API for different data transfer protocols (SFTP/GSIFTP; HTTP/HTTPS; iRODS)
To avoid deployment restrictions, the SAGA API modules were included directly in the PanDA Pilot code. The solution was successfully validated on Titan (OLCF), Hopper (NERSC), Edison (NERSC), and Anselm (IT4I).
http://saga-project.github.io
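As an illustration of the job description API, the sketch below submits one batch job through a SAGA job service. It assumes the saga-python package (imported as `saga`; later releases are distributed as `radical.saga`) and is not the pilot's own code: the helper names, the `pbs://localhost` URL, the queue name, the executable, and the resource sizes are placeholders.

```python
# Sketch assuming the saga-python package; helper names and all values are placeholders.

def build_job_attributes(executable, cores, minutes, queue):
    """Collect the attributes that will be copied onto a SAGA job description."""
    return {
        "executable": executable,
        "total_cpu_count": cores,
        "wall_time_limit": minutes,   # wall time limit, in minutes
        "queue": queue,
        "output": "payload.out",
        "error": "payload.err",
    }


def submit(attributes, service_url="pbs://localhost"):
    """Submit one job through the adapter selected by the URL scheme and wait for it."""
    import saga  # imported lazily so build_job_attributes works without SAGA installed

    description = saga.job.Description()
    for name, value in attributes.items():
        setattr(description, name, value)

    service = saga.job.Service(service_url)
    job = service.create_job(description)
    job.run()
    job.wait()
    return job.state, job.exit_code


attrs = build_job_attributes("/bin/hostname", cores=16, minutes=10, queue="batch")
print(attrs)
```

Calling `submit(attrs)` on an interactive node would hand the job to the local scheduler; changing only the URL scheme (for example to `slurm://localhost`) selects a different adapter, which is the point of the uniform access layer.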