Integrating HPC into the ATLAS Distributed Computing environment
Doug Benjamin, Duke University

HPC Boundary conditions
There are many scientific HPC machines across the US and the world.
 o Need to design a system that is general enough to work on many different machines.
Each machine is independent of the others.
 o The "grid" side of the equation must aggregate the information.
There are several different machine architectures.
 o ATLAS jobs will not run unchanged on many of the machines.
 o Need to compile programs for each HPC machine.
 o Memory per node (each with multiple cores) varies from machine to machine.
The computational nodes typically do not have connectivity to the Internet.
 o Connectivity is through a login node/edge machine.
 o Pilot jobs typically cannot run directly on the computational nodes.
 o The TCP/IP stack is missing on the computational nodes.

Introduction
My take on a comparison between HPC and HTC (grid):
HTC – goes fast and steady.
HPC – goes really fast.
Similar, but different.

Additional HPC issues
Each HPC machine has its own job management system.
Each HPC machine has its own identity management system.
Login/interactive nodes have mechanisms for fetching information and data files.
HPC computational work is typically MPI-based.
Can get a large number of nodes.
The latency between job submission and completion can be variable (many other users).

Work Flow
Some ATLAS simulation jobs can be broken up into 3 components (Tom LeCompte has talked about this in greater detail); a sketch of this flow follows below.
1. Preparatory phase – make the job ready for HPC
 o For example, generate the computational grid for Alpgen
 o Fetch database files for simulation
 o Transfer input files to the HPC system
2. Computational phase – can be done on HPC
 o Generate events
 o Simulate events
3. Post-computational phase (cleanup)
 o Collect output files (log files, data files) from the HPC jobs
 o Verify output
 o Unweight (if needed) and merge files
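A minimal sketch of this three-phase split as a grid-side driver. The helper names (prepare_inputs, run_on_hpc, post_process) are hypothetical placeholders for the real ATLAS tools, not actual code from the system.
```python
# Hedged sketch of the three-phase HTC -> HPC -> HTC work flow described above.
# All helper names are hypothetical placeholders; each phase would wrap the real tools.

def prepare_inputs(job):
    """Preparatory phase, runs on the grid (HTC) side."""
    job["alpgen_grid"] = "computational grid"      # e.g. Alpgen warm-up/grid generation
    job["db_files"] = ["conditions.db"]            # fetch database files for simulation
    job["staged_in"] = True                        # transfer input files to the HPC system
    return job

def run_on_hpc(job):
    """Computational phase, runs on the HPC machine."""
    job["events"] = "generated and simulated events"
    return job

def post_process(job):
    """Post-computational phase (cleanup), back on the grid side."""
    outputs = ["log files", "data files"]          # collect outputs from the HPC jobs
    assert job.get("events") is not None           # verify output
    return {"merged_output": outputs}              # unweight (if needed) and merge

if __name__ == "__main__":
    print(post_process(run_on_hpc(prepare_inputs({"name": "alpgen_example"}))))
```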

HTC->HPC->HTC
The ATLAS job management system (PanDA) need not run on the HPC system.
 o This represents a simplification.
 o NorduGrid has been running this way for a while.
PanDA requires pilot jobs.
AutoPyFactory (APF) is used to submit PanDA pilots.
Direct submission of pilots to a Condor queue works well (see the sketch below).
 o Many cloud sites use this mechanism – straightforward to use.
The HPC portion should be coupled to, but independent of, the HTC work flow.
 o Use a messaging system to send messages between the domains.
 o Use grid tools to move files between HTC and HPC.
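For illustration, a hedged sketch of what direct pilot submission to a local Condor queue could look like using the condor_submit command-line tool; the submit description and the pilot wrapper path are hypothetical examples, not the actual AutoPyFactory configuration.
```python
# Hedged sketch: submitting a PanDA pilot to a local Condor queue via condor_submit.
# The submit description and pilot wrapper path are hypothetical placeholders.
import subprocess
import tempfile

SUBMIT_DESCRIPTION = """\
universe   = vanilla
executable = /usr/local/bin/pilot_wrapper.sh
arguments  = --queue HPC_FRONTEND_QUEUE
output     = pilot.$(Cluster).$(Process).out
error      = pilot.$(Cluster).$(Process).err
log        = pilot.$(Cluster).log
queue 1
"""

def submit_pilot():
    # Write the submit description to a temporary file and hand it to condor_submit.
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(SUBMIT_DESCRIPTION)
        submit_file = f.name
    result = subprocess.run(["condor_submit", submit_file],
                            capture_output=True, text=True, check=True)
    return result.stdout      # condor_submit reports the cluster id on success

if __name__ == "__main__":
    print(submit_pilot())
```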

HTC (“Grid side”) Infrastructure
APF pilot factory to submit pilots.
PanDA queue – currently testing an ANALY queue.
Local batch system.
Web server to provide steering XML files to the HPC domain.
Message broker system to exchange information between the Grid domain and the HPC domain.
GridFTP server to transfer files between the HTC domain and the HPC domain (a transfer sketch follows below).
 o Globus Online might be a good solution here (what are the costs?)
ATLAS DDM site – SRM and GridFTP server(s).
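A hedged sketch of a GridFTP transfer between the two domains using the globus-url-copy client; the host names and paths are hypothetical placeholders.
```python
# Hedged sketch: moving a file between the HTC and HPC domains over GridFTP
# with globus-url-copy. Host names and paths are hypothetical placeholders.
import subprocess

def gridftp_copy(src_url, dst_url):
    # -p 4: use 4 parallel data streams; -cd: create destination directories as needed
    subprocess.run(["globus-url-copy", "-p", "4", "-cd", src_url, dst_url], check=True)

if __name__ == "__main__":
    gridftp_copy(
        "gsiftp://grid-se.example.org/atlas/inputs/evgen.tar.gz",
        "gsiftp://hpc-dtn.example.org/scratch/atlas/inputs/evgen.tar.gz",
    )
```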

HPC code stack
Work done by Tom Uram (ANL).
Work on the HPC side is performed by two components:
 o Service: interacts with the message broker to retrieve job descriptions, saves jobs in a local database, and notifies the message broker of job state changes.
 o Daemon: stages input data from the HTC GridFTP server, submits the job to the queue, monitors the progress of the job, and stages output data back to the HTC GridFTP server.
The Service and Daemon are built in Python, using the Django Object Relational Mapper (ORM) to communicate with the shared underlying database (a model sketch follows below).
 o Django is a stable, open-source project with an active community.
 o Django supports several database backends.
The current implementation relies on GridFTP for data transfer and the ALCF Cobalt scheduler.
The modular design enables future extension to alternative data transfer mechanisms and schedulers.
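A hedged sketch of the kind of Django ORM model the Service and Daemon could share for tracking jobs; the field and state names are hypothetical, not the actual schema of the implementation.
```python
# Hedged sketch of a shared Django ORM model (lives inside a Django app on the HPC side).
# Field and state names are hypothetical placeholders.
from django.db import models

class Job(models.Model):
    CREATED, STAGED_IN, QUEUED, RUNNING, STAGED_OUT, FAILED = (
        "CREATED", "STAGED_IN", "QUEUED", "RUNNING", "STAGED_OUT", "FAILED")
    STATE_CHOICES = [(s, s) for s in
                     (CREATED, STAGED_IN, QUEUED, RUNNING, STAGED_OUT, FAILED)]

    panda_id = models.BigIntegerField(unique=True)        # job id from the grid side
    description = models.TextField()                      # job description from the broker
    state = models.CharField(max_length=16, choices=STATE_CHOICES, default=CREATED)
    local_batch_id = models.CharField(max_length=64, blank=True)  # e.g. Cobalt job id
    updated = models.DateTimeField(auto_now=True)

    def advance(self, new_state):
        """The Service would notify the message broker whenever the state changes."""
        self.state = new_state
        self.save(update_fields=["state", "updated"])
```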

Message Broker system
The system must have large community support beyond just HEP.
The solution must be open source (keeps costs manageable).
The message broker system must have good documentation.
Scalable.
Robust.
Secure.
Easy to use.
Must use a standard protocol (AMQP, for example).
Clients in multiple languages (e.g. Java/Python).

RabbitMQ message broker
ActiveMQ and RabbitMQ were evaluated.
Google Analytics shows both are equally popular.
Benchmark measurements show that the RabbitMQ server outperforms ActiveMQ.
Found it easier to handle message routing and our work flow with RabbitMQ.
The Pika Python client is easy to use (a publish sketch follows below).
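A minimal hedged sketch of publishing a job-state message with the Pika client (pika >= 1.0 API); the broker host and queue name are hypothetical placeholders.
```python
# Hedged sketch: publishing a persistent message to a durable queue with Pika.
# Broker host and queue name are hypothetical placeholders.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="broker.example.org"))
channel = connection.channel()
channel.queue_declare(queue="hpc.alcf.jobs", durable=True)    # queue survives broker restarts
channel.basic_publish(
    exchange="",                                              # default exchange
    routing_key="hpc.alcf.jobs",
    body=json.dumps({"panda_id": 12345, "state": "RUNNING"}),
    properties=pika.BasicProperties(delivery_mode=2),         # persistent message
)
connection.close()
```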

Basic Message Broker design
Each HPC has multiple permanent durable queues.
 o One queue per activity on the HPC.
 o Grid jobs send messages to the HPC machines through these queues.
 o Each HPC will consume messages from these queues.
 o A routing string is used to direct each message to the proper place.
Each grid job will have multiple durable queues.
 o One queue per activity (step in the process).
 o The grid job creates the queues before sending any message to the HPC queues.
 o On completion of the grid job, its queues are removed.
 o Each HPC cluster publishes messages to these queues through an exchange.
 o A routing string is used to direct each message to the proper place.
 o Grid jobs will consume messages only from their own queues.
The grid domains and HPC domains have independent polling loops.
The message producer and client code need to be tweaked for additional robustness.
A sketch of this queue topology follows below.
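A hedged sketch of the topology described above: a direct exchange with one durable queue per activity, bound by a routing string, plus a simple consumer loop (pika >= 1.0 API). The exchange, queue, and routing-key names are hypothetical placeholders.
```python
# Hedged sketch: per-activity durable queues bound to a direct exchange by routing key,
# with a consumer loop on the HPC side. Names are hypothetical placeholders.
import pika

ACTIVITIES = ["stage_in", "run", "stage_out"]

connection = pika.BlockingConnection(pika.ConnectionParameters(host="broker.example.org"))
channel = connection.channel()
channel.exchange_declare(exchange="hpc", exchange_type="direct", durable=True)

# One permanent durable queue per activity on this HPC; the routing string matches the queue name.
for activity in ACTIVITIES:
    queue = f"hpc.alcf.{activity}"
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(exchange="hpc", queue=queue, routing_key=queue)

def on_message(ch, method, properties, body):
    print(f"{method.routing_key}: {body.decode()}")   # dispatch to the proper handler here
    ch.basic_ack(delivery_tag=method.delivery_tag)

# The HPC side consumes only from its own queues; the grid side has a mirror setup.
for activity in ACTIVITIES:
    channel.basic_consume(queue=f"hpc.alcf.{activity}", on_message_callback=on_message)
channel.start_consuming()
```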

Open issues for a production system
Need federated identity management.
 o The grid identity system is not used in the HPC domain.
 o Need to strictly regulate who can run on the HPC machines.
Security, security (need I say more?).
What is the proper scale for the front-end grid cluster?
 o How many nodes are needed?
 o How much data needs to be merged?
The PanDA system must be able to handle large latencies.
 o Jobs could wait a week before running.
 o The system could be flooded with output once the jobs run.
The production task system should give the HTC-HPC system the flexibility to decide how to arrange the task.
 o HPC scheduling decisions might require a different task geometry to get the work through in an expedient manner.

Conclusions
Many ATLAS MC jobs can be divided into a grid (HTC) component and an HPC component.
We have demonstrated, using existing ATLAS tools, that we can design and build a system to send jobs from the grid to HPC and back to the grid.
The modular design of all components makes it easier to add new HPC sites and to clone the HTC side if needed for scaling reasons.
Lessons learned from the NorduGrid PanDA integration will be helpful.
A lightweight yet powerful system is being developed.