BigPanDA WMS for Brain Studies

Presentation transcript:

BigPanDA WMS for Brain Studies R. Mashinistov on behalf of BigPanDA team (University of Texas at Arlington)

Outline
- Introduction
- PanDA: Production and Distributed Analysis Workload Management System
- Extending PanDA beyond the Grid
- Blue Brain Project (BBP) computing model
- PanDA portal at BBP
- Summary & conclusions

Introduction In March 2017 we launched a pilot project with the Blue Brain Project (BBP) at the Swiss Federal Institute of Technology in Lausanne (EPFL). This proof-of-concept project aims to demonstrate the efficient application of the BigPanDA Workload Management System, initially developed for HEP applications, to the supercomputer-based brain reconstructions and simulations that offer a radically new approach to understanding the multilevel structure and function of the brain. In the first phase, the goal of this joint project is to automate the execution of BBP workflows on a variety of distributed computing systems.

Workload Management. PanDA: Production and Distributed Analysis System
https://twiki.cern.ch/twiki/bin/view/PanDA/PanDA

PanDA brief history:
- 2005: Initiated for US ATLAS (BNL and UTA)
- 2006: Support for analysis
- 2008: Adopted ATLAS-wide
- 2009: First use beyond ATLAS
- 2011: Dynamic data caching based on usage and demand
- 2012: ASCR/HEP BigPanDA
- 2014: Network-aware brokerage
- 2014: Job Execution and Definition I/F (JEDI) adds complex task management and fine-grained dynamic job management
- 2014: JEDI-based Event Service
- 2014: megaPanDA project supported by the RF Ministry of Science and Education
- 2015: New ATLAS Production System, based on PanDA/JEDI
- 2015: Management of heterogeneous computing resources
- 2016: DOE ASCR BigPanDA@Titan project
- 2016: PanDA for bioinformatics
- 2016: COMPASS adopted PanDA (JINR, CERN, NRC-KI); NICA (JINR)
- PanDA beyond HEP: LSST, Blue Brain

Global ATLAS operations:
- Up to ~250k concurrent jobs
- 25-30M jobs/month at >150 sites
- ~1400 ATLAS users
- First exascale workload manager in HENP
- 1.4 exabytes processed in 2016
- Exascale scientific data processing today

BigPanDA Monitor: http://bigpanda.cern.ch/
[Figure: ATLAS Production System performance, daily completed jobs]

PanDA: Production and Distributed Analysis System
The PanDA Workload Management System was developed for the ATLAS experiment at the LHC. PanDA has coped with increasing LHC luminosity and ATLAS data-taking rates, as well as the associated processing and analysis challenges. Recently PanDA has been extended to run HEP scientific applications on supercomputers.

PanDA beyond ATLAS:
- The HEP and astro-particle experiments COMPASS and AMS have chosen PanDA as their WMS for data processing and analysis.
- nEDM and LSST will evaluate PanDA.
- ALICE is interested in evaluating PanDA for OLCF.
- JINR (Russia) is considering PanDA as the main WMS for the NICA collider.
- Several PanDA instances exist beyond ATLAS: OLCF, Taiwan, Amazon EC2, Russia (Moscow, Dubna).

PanDA WMS basics
The basic components are:
- Server: maps all jobs in the system to all available resources
- Client: allows users to submit jobs, perform actions on jobs and on the Server, and retrieve information from the Server
- Pilot: launched on a resource, it retrieves jobs from the Server and handles payload execution (illustrated in the sketch below)
- Pilot schedulers: responsible for launching the pilots on resources
- Monitor: provides overall information about jobs, resources, and the system
Components external to PanDA:
- Distributed Data Management system
- File catalog
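The pull model behind these components is easiest to see in code. Below is a minimal, hypothetical sketch of the pilot cycle: the endpoint paths, JSON fields, and queue name are illustrative assumptions, not the actual PanDA REST API.

```python
# Minimal, illustrative sketch of the pilot pull cycle described above.
# Endpoint names, fields, and the queue name are assumptions.
import subprocess
import time

import requests

PANDA_SERVER = "https://pandaserver.example.org:25443/server/panda"  # assumed URL

def run_pilot(queue_name):
    """Ask the Server for a job matching this resource, run its payload, report back."""
    while True:
        resp = requests.post(PANDA_SERVER + "/getJob", data={"siteName": queue_name})
        job = resp.json()
        if not job.get("pandaID"):
            time.sleep(60)            # no work for this queue yet, try again later
            continue
        # Execute the payload command shipped with the job definition.
        cmd = job["transformation"] + " " + job.get("jobPars", "")
        result = subprocess.run(cmd, shell=True)
        # Report the final state so the Server can update its bookkeeping.
        requests.post(PANDA_SERVER + "/updateJob",
                      data={"jobId": job["pandaID"],
                            "state": "finished" if result.returncode == 0 else "failed"})

if __name__ == "__main__":
    run_pilot("BBP_GENEVA")           # hypothetical PanDA queue name
```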

Extending PanDA to Supercomputers
BigPanDA is an extension of the PanDA WMS to run ATLAS and non-ATLAS applications on Leadership Class Facilities and supercomputers, as well as on traditional grid and cloud resources.

Use of supercomputers in ATLAS: supercomputing centers in the USA, Europe, and Asia, in particular the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the National Energy Research Scientific Computing Center (NERSC) in the USA, the Ostrava supercomputing center in the Czech Republic, and NRC "Kurchatov Institute" (NRC KI) in Russia, are now integrated into the ATLAS workflow via the PanDA WMS.

PanDA is a pilot-based WMS. Using a pilot and a pilot submission system customized for the specific supercomputer allows any opportunistic resource to be integrated (a simplified pilot-wrapper sketch follows below).

Each supercomputer is unique: unique architecture and hardware, a specialized operating system, "weak" worker nodes with limited memory per node, different job submission systems, a unique security environment, its own usage policy, etc.

See also the talks by J. Wells, D. Oleynik, and M. Borodin.
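To make the pilot customization concrete, the sketch below shows one way a pilot wrapper could hand a payload to a PBS-managed machine such as Titan. It is an assumption-laden illustration (project account, aprun launcher, node counts), not the actual BigPanDA HPC pilot, which drives batch systems through a SAGA-based layer (see the ORNL deployment diagram later in this talk).

```python
# Illustrative sketch of handing a payload to an HPC batch system (PBS here).
# The project account, launcher, and file names are assumptions.
import subprocess
import textwrap

def submit_payload_to_pbs(payload_cmd, nodes, walltime):
    """Wrap a PanDA payload in a PBS batch script and submit it with qsub."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -l nodes={nodes}
        #PBS -l walltime={walltime}
        #PBS -A EXAMPLE_PROJECT
        cd $PBS_O_WORKDIR
        aprun -n {nodes} {payload_cmd}
    """)
    with open("pilot_payload.pbs", "w") as f:
        f.write(script)
    out = subprocess.run(["qsub", "pilot_payload.pbs"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()   # the batch job id, which the pilot then polls

# Example (values are illustrative):
# job_id = submit_payload_to_pbs("./payload.exe --input events.json", 16, "02:00:00")
```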

Extending PanDA beyond HEP: BigPanDA for Genomics
At NRC KI, the PALEOMIX pipeline was adapted to run on local supercomputer resources powered by PanDA. It was used to map mammoth DNA onto the African elephant reference genome. Using software tools initially developed for HEP and the Grid reduced the genomics payload execution time for the mammoth DNA sample from weeks to days: the sample was split into 135 chunks, and the processing time dropped from ~4-5 weeks to ~4 days with PanDA.
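The speed-up comes from turning one long sequential run into many independent jobs. The sketch below is a hypothetical illustration of that chunking step; the file names, wrapper script, and job-description fields are assumptions, not the actual PALEOMIX/PanDA integration.

```python
# Hypothetical sketch: split the DNA sample into independent chunks and build
# one job description per chunk, so all 135 pieces can run in parallel.
def split_into_chunks(read_files, n_chunks=135):
    """Distribute the input read files round-robin over n_chunks work units."""
    chunks = [[] for _ in range(n_chunks)]
    for i, read_file in enumerate(read_files):
        chunks[i % n_chunks].append(read_file)
    return chunks

def make_jobs(chunks, reference="african_elephant.fasta"):
    """Build one job description per chunk; a PanDA client/pilot would run them."""
    return [{"executable": "paleomix_map_chunk.sh",      # illustrative wrapper script
             "arguments": ["--reference", reference] + chunk,
             "name": f"mammoth-chunk-{i:03d}"}
            for i, chunk in enumerate(chunks)]

if __name__ == "__main__":
    reads = [f"mammoth_reads_{i}.fastq" for i in range(1350)]  # placeholder inputs
    jobs = make_jobs(split_into_chunks(reads))
    print(f"{len(jobs)} jobs ready for submission")
```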

BigPanDA service under OpenShift Origin
A new BigPanDA WMS instance was installed at ORNL to serve various experiments at OLCF. The components run as Docker containers within the OpenShift Origin system.
[Architecture diagram; labels: login node, PanDA Client, PanDA Monitor, PanDA Pilot, MySQL, SAGA, PBS, job submission, job execution, payload execution]

Science impact
The PanDA WMS instance at ORNL currently operates in debug mode. A series of initial tests has been done in collaboration with the following scientific areas:
- Human brain studies
- Biology
- Molecular dynamics
- HEP and astro-particle physics
Payloads: MPI+OpenMP, CPU+GPU

Blue Brain Project: a Swiss brain initiative
The goal of the Blue Brain Project is to build biologically detailed digital reconstructions and simulations of the rodent brain, and ultimately the human brain. The supercomputer-based reconstructions and simulations built by the project offer a radically new approach for understanding the multilevel structure and function of the brain. This novel research strategy allows the project to build computer models of the brain at an unprecedented level of biological detail. Supercomputer-based simulation turns understanding the brain into a tractable problem, providing a new tool to study the complex interactions within different levels of brain organization and to investigate the cross-level links leading from genes to cognition. http://bluebrain.epfl.ch

BBP target resources
[Diagram of target resources; label: PBS]

BBP resource metrics
- Intel x86 / NVIDIA GPU based BBP clusters located in Geneva (47 TFlops) and Lugano (81 TFlops)
- BBP IBM BlueGene/Q supercomputer (0.78 PFlops, 65 TB of DRAM) located in Lugano
- Titan supercomputer with a peak theoretical performance of 27 PFlops, operated by the Oak Ridge Leadership Computing Facility (OLCF)

Computing model of the Blue Brain Project
Moving from a machine-specific setup to common functional building blocks:
- Software deployment --> Modules
- Data distribution --> manual method (rsync)
- Job scheduling --> site-specific scheduler

Computing model of the Blue Brain Project
Moving from a machine-specific setup to common functional building blocks:
- Software deployment: Modules --> NIX
- Data distribution: manual method (rsync) --> Globus Toolkit
- Job scheduling: site-specific scheduler --> BigPanDA (PanDA server)
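As a concrete illustration of the "rsync to Globus Toolkit" step, the sketch below wraps globus-url-copy, the Globus Toolkit GridFTP client, in Python. The endpoints, paths, and stream count are assumptions about how BBP sites could be connected, not the actual BBP configuration.

```python
# Hedged sketch of data distribution via the Globus Toolkit GridFTP client.
# Endpoint hostnames and paths are illustrative assumptions.
import subprocess

def gridftp_copy(src_url, dst_url, parallel_streams=4):
    """Copy a file between GridFTP endpoints with parallel TCP streams."""
    subprocess.run(
        ["globus-url-copy",
         "-p", str(parallel_streams),   # number of parallel data streams
         "-cd",                         # create destination directories as needed
         src_url, dst_url],
        check=True)

# Example (hypothetical endpoints):
# gridftp_copy("gsiftp://bbp-geneva.example.org/data/circuit.h5",
#              "gsiftp://bbp-lugano.example.org/data/circuit.h5")
```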

Initial PanDA test @ BBP
For the initial test in March 2017, the PanDA Server and Client were installed on a VM at the Campus Biotech (Geneva). Jobs were submitted via the python-based PanDA Client (see the sketch below). Pilots were started manually on the Geneva and Lugano clusters, including BlueGene/Q, Titan, and Amazon resources.
[Architecture diagram; labels: PanDA Client, REST API, https, PanDA Server, PanDA DB, SQL, Pilots on Resource A, Resource B, ..., Resource Z]
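The sketch below shows what a submission through the python-based PanDA Client might look like. The import paths, queue name, and job attributes are assumptions and may differ between PanDA releases; it is an illustration of the idea, not the exact commands used in the test.

```python
# Hedged sketch of submitting one test job through the python PanDA Client.
from pandaserver.taskbuffer.JobSpec import JobSpec   # assumed import path
from pandaclient import Client                       # assumed import path

job = JobSpec()
job.jobDefinitionID = 1
job.jobName = "bbp-initial-test"
job.transformation = "run_bbp_payload.sh"            # illustrative payload script
job.computingSite = "BBP_GENEVA"                      # hypothetical PanDA queue
job.prodSourceLabel = "test"
job.currentPriority = 1000

# submitJobs returns a status code and, per job, the assigned PanDA ID.
status, output = Client.submitJobs([job])
print("submission status:", status, "result:", output)
```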

PanDA Portal @ BBP (1/2)
[Architecture diagram; labels: web services (authentication, file catalog, data transfer system, data management system), PanDA Server, PanDA DB (SQL), PanDA Monitor, PanDA Client (REST API, https), pilot schedulers, pilots on Resource A, Resource B, ..., Resource Z]

PanDA Portal @ BBP (2/2)
[Architecture diagram; same components as on the previous slide, now grouped under a single PanDA Portal that fronts the web services, PanDA Server, Monitor, and Client]

User interfaces
Web interfaces provide:
- Authentication and authorization
- Job submission
- Job monitoring
- Viewing of result summaries
Authentication and authorization are integrated with BBP SSO and are LDAP-group aware (a sketch follows below). Multiple sites (PanDA queues) are available.
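One way the "LDAP-group aware" part could work is sketched below: after BBP SSO login, the portal looks up the user's LDAP groups and exposes only the PanDA queues mapped to those groups. The server name, directory layout, and group-to-queue mapping are assumptions for illustration, not the BBP configuration.

```python
# Hedged sketch of LDAP-group-aware queue authorization in the portal.
from ldap3 import Server, Connection, ALL

GROUP_TO_QUEUES = {                      # hypothetical mapping
    "bbp-simulation": ["BBP_GENEVA", "BBP_LUGANO"],
    "bbp-hpc":        ["BBP_BGQ", "OLCF_TITAN"],
}

def allowed_queues(username, bind_dn, bind_password):
    """Return the PanDA queues this user may submit to, based on LDAP groups."""
    server = Server("ldap.example.org", get_info=ALL)
    conn = Connection(server, user=bind_dn, password=bind_password, auto_bind=True)
    conn.search("ou=groups,dc=example,dc=org",
                f"(member=uid={username},ou=people,dc=example,dc=org)",
                attributes=["cn"])
    groups = [entry.cn.value for entry in conn.entries]
    queues = set()
    for group in groups:
        queues.update(GROUP_TO_QUEUES.get(group, []))
    return sorted(queues)
```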

Summary & conclusions
- Phase 1, the "proof of concept", was successfully finished: test jobs were successfully submitted to the targeted resources via the PanDA portal.
- The project demonstrated that the software tools and methods for processing large volumes of experimental data, originally developed for experiments at the LHC, can be successfully applied to the BBP.
- Phase 2, "pre-production", is under consideration.

Thanks
This talk drew on presentations, discussions, comments, and input from many people. Thanks to our ATLAS, BigPanDA, and ORNL colleagues.