
PANDA PILOT FOR HPC Danila Oleynik (UTA)

Outline
- What is PanDA Pilot
- PanDA Pilot architecture (in a nutshell)
- HPC specifics
- PanDA Pilot for HPC

What is PanDA Pilot
PanDA Pilot is a lightweight application that manages the execution of a payload on a computing resource. PanDA Pilot is responsible for:
- Requesting an abstract description of the computing resource from the PanDA server
- Requesting payload information from the PanDA server
- VO-specific environment setup
- Stage-in/stage-out procedures
- Monitoring the execution of the payload
- Updating the PanDA server with monitoring information
- Recovery of failed jobs
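The workflow above can be read as a simple control loop: fetch a job, stage data in, run and monitor the payload, stage data out, and report back. The following is a minimal, illustrative Python sketch of that loop; every helper (get_job, stage_in, report), the server URL and the queue name are hypothetical placeholders, not the actual PanDA Pilot code.

```python
import subprocess
import time

# Illustrative sketch of the pilot workflow described above.
# All helpers and names are hypothetical stand-ins for the real PanDA Pilot modules.

def get_job(panda_server, queue):
    # In the real pilot this is an HTTPS request to the PanDA server.
    return {"jobid": 1, "transformation": ["/bin/echo", "running payload"]}

def stage_in(job):
    pass  # fetch input files with the configured data mover

def stage_out(job):
    pass  # upload output files and register them

def report(panda_server, job, state):
    print("job %s -> %s" % (job["jobid"], state))  # heartbeat / status update

def run_pilot(panda_server, queue):
    job = get_job(panda_server, queue)              # ask PanDA for a payload
    # VO-specific environment setup would happen here
    stage_in(job)
    proc = subprocess.Popen(job["transformation"])  # launch the payload
    while proc.poll() is None:                      # monitor and send heartbeats
        report(panda_server, job, "running")
        time.sleep(5)
    stage_out(job)
    report(panda_server, job, "finished" if proc.returncode == 0 else "failed")

if __name__ == "__main__":
    run_pilot("https://panda.example.org", "EXAMPLE_QUEUE")
```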

PanDA Pilot architecture (in a nutshell)
PanDA Pilot has a modular structure, and each module can be realized through a plugin-based architecture:
- Environment setup and payload launch preparation can be VO specific
- Different data transfer technologies can be used through stage-in/stage-out plugins (data movers)
- The payload execution module can interact with different computational backends
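As a concrete illustration of the plugin idea, the stage-in/stage-out layer can be sketched as a small registry keyed by the copy tool configured for the site. The mover names and classes below are assumptions for illustration only, not the real PanDA Pilot data movers.

```python
# Hypothetical sketch of a plugin-based stage-in layer; names are illustrative.

class SiteMover(object):
    """Common interface that every data-mover plugin implements."""
    def stage_in(self, src, dst):
        raise NotImplementedError

class XrdcpMover(SiteMover):
    def stage_in(self, src, dst):
        print("xrdcp %s %s" % (src, dst))            # would invoke the xrootd client

class GridFTPMover(SiteMover):
    def stage_in(self, src, dst):
        print("globus-url-copy %s %s" % (src, dst))  # would invoke GridFTP

# Registry used to pick a plugin from the queue configuration.
MOVERS = {"xrdcp": XrdcpMover, "gsiftp": GridFTPMover}

def get_mover(copytool):
    return MOVERS[copytool]()  # KeyError signals an unsupported copy tool

get_mover("xrdcp").stage_in("root://storage.example.org//some/file", "./some_file")
```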

HPC specifics
Seymour Cray: “a supercomputer, it is hard to define, but you know it when you see it”. Despite this, HPC facilities share some common features:
- A parallel file system shared between nodes
- Restricted access to the facility and (usually) limited access to the computing nodes
- An internal job management tool: PBS/TORQUE/SLURM, etc.
- One job occupies a minimum of one node
- A limit on the number of jobs in the scheduler per user
- Special facilities for non-computing processes: external data transfers, I/O-intensive operations (merging)

PanDA Pilot on HPC
- Pilot(s) execute on an HPC interactive node
- The pilot interacts with the local job scheduler to manage the job
- Number of executing pilots = number of available slots in the local scheduler
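Since the number of running pilots is bounded by the free slots in the local scheduler, a pilot launcher on the interactive node typically counts the user's queued and running jobs before starting another one. A minimal sketch using SLURM's squeue follows; the per-user limit is an assumed value, and on a PBS/TORQUE machine qstat would play the same role.

```python
import getpass
import subprocess

MAX_USER_JOBS = 20  # assumed per-user limit imposed by the facility

def free_slots():
    # List this user's queued and running jobs; -h suppresses the header line.
    out = subprocess.check_output(
        ["squeue", "-h", "-u", getpass.getuser()], universal_newlines=True)
    used = len([line for line in out.splitlines() if line.strip()])
    return max(MAX_USER_JOBS - used, 0)

if __name__ == "__main__":
    print("pilots that can still be started:", free_slots())
```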

SAGA API - a uniform access layer
The SAGA API was chosen to encapsulate interactions with the HPC internal batch system:
- High-level job description API
- A set of adapters for different job submission systems (SSH and GSISSH; Condor and Condor-G; PBS and TORQUE; Sun Grid Engine; SLURM; IBM LoadLeveler)
- Local and remote communication with job submission systems
- API for different data transfer protocols (SFTP/GSIFTP; HTTP/HTTPS; iRODS)
To avoid deployment restrictions, the SAGA API modules were included directly in the PanDA Pilot code.
The solution was successfully validated on Titan (OLCF), Hopper (NERSC), Edison (NERSC), and Anselm (IT4I).
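For illustration, submitting a payload to a PBS/TORQUE batch system through the saga-python job API looks roughly like the sketch below. The service URL, queue name, core count and file names are placeholders for site-specific values, not settings taken from the slides.

```python
import saga  # saga-python (later releases are packaged as radical.saga)

# Minimal sketch of a job submission through the SAGA job API.
js = saga.job.Service("pbs://localhost")   # adaptor URL; site specific

jd = saga.job.Description()
jd.executable      = "/path/to/payload.sh"
jd.total_cpu_count = 16                    # e.g. one full node on the assumed machine
jd.wall_time_limit = 60                    # minutes
jd.queue           = "batch"
jd.output          = "payload.out"
jd.error           = "payload.err"

job = js.create_job(jd)
job.run()    # submit to the local batch system
job.wait()   # block until the job leaves the queue
print("final state:", job.state)
```

The same job description can be handed to a different adaptor (for example slurm:// or sge://) without changing the pilot-side logic, which is the main benefit of the uniform access layer described above.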