Scientific Data Processing Portal and Heterogeneous Computing Resources at NRC “Kurchatov Institute”
V. Aulov, D. Drizhuk, A. Klimentov, R. Mashinistov, A. Novikov, A. Poyda, E. Ryabinkin and I. Tertychnyy
NRC “Kurchatov Institute”

Project objective
Design, development and implementation of a scientific data processing portal that integrates large-scale heterogeneous computing facilities into a common computing infrastructure.

“Kurchatov Institute” Computing Resources
- Supercomputing facility for collective usage: a high-performance cluster with a peak performance of … TFLOPS, consisting of 1280 nodes with two CPUs (8 cores) per node (10,240 cores in total). The nodes run CentOS Linux under the SLURM batch system.
- One of the largest Russian Grid centres (a WLCG Tier-1 centre). The Tier-1 facility processes, simulates and stores up to 10% of the total data of three LHC experiments (ALICE, ATLAS and LHCb).
- An OpenStack-based Academic Cloud platform with a performance of 1.5 TFLOPS, providing 16 nodes, 256 cores, 512 GB of RAM and 60 TB of storage.

Method
Adopt the methods used by the LHC high-energy physics experiments for Big Data processing in other sciences.
- The ATLAS Production and Distributed Analysis workload management system (PanDA) has been chosen as the base technology for the portal.
Figure: basic PanDA scheme as used at the Kurchatov Institute.

Computing portal at Kurchatov Institute
- Web user interface (Python/Flask)
- File catalogue and data transfer system (DDM), plugin-based (FTP, HTTP, Grid, etc.)
- Authorization via OAuth 2.0
- API for external applications: token authentication, upload/fetch files, submit a job, get job status and statistics (see the sketch below)
- Private FTP storage for each user
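
The slide lists the API capabilities only at a high level, so the following is a hedged sketch of how an external application might use them; the host name, endpoint paths and JSON field names are assumptions, not the portal's documented API.

```python
import requests

PORTAL = "https://webpanda.example.org/api"  # hypothetical portal endpoint
TOKEN = "user-api-token"                     # token obtained via OAuth 2.0

headers = {"Authorization": f"Bearer {TOKEN}"}

# Upload an input file to the user's private FTP-backed storage
with open("input.dat", "rb") as f:
    requests.post(f"{PORTAL}/files", headers=headers,
                  files={"file": f}).raise_for_status()

# Submit a job that references the uploaded input (field names are illustrative)
job = requests.post(f"{PORTAL}/jobs", headers=headers, json={
    "distributive": "myapp.tar.gz",
    "input": ["input.dat"],
    "params": "--threads 8",
    "output": ["result.out"],
}).json()

# Poll the job status
status = requests.get(f"{PORTAL}/jobs/{job['id']}", headers=headers).json()["status"]
print(status)
```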

PanDA instance installation
- Components installed at KI: PanDA server, auto pilot factory (APF), monitor and database server (MySQL)
- After APF was installed in 2014 we immediately set up an Analysis queue
  ▫ Condor-SLURM connector (a sketch of this hand-off is given below)
- In 2015 we set up a Production queue
- 15.5% of ATLAS MC production and analysis jobs executed on it
- The setup is transparent for ProdSys and for users
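
The slide does not show how pilots reach SLURM, so here is a minimal, purely illustrative Python sketch of the idea behind a Condor-SLURM style hand-off: a pilot wrapper is packaged as an sbatch script and submitted to the local batch system. The partition name, walltime and pilot URL are invented placeholders, not the actual KI configuration.

```python
import subprocess
import tempfile
import textwrap

def submit_pilot(partition="atlas-analysis", walltime="24:00:00"):
    """Submit one PanDA pilot wrapper to SLURM (illustrative sketch only)."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=panda-pilot
        #SBATCH --partition={partition}
        #SBATCH --nodes=1
        #SBATCH --time={walltime}
        # Fetch and start the pilot; the URL below is a placeholder
        curl -sO https://pandaserver.example.org/cache/pilot.tar.gz
        tar xzf pilot.tar.gz && python pilot.py
        """)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    # sbatch prints "Submitted batch job <id>" on success
    result = subprocess.run(["sbatch", path], capture_output=True, text=True)
    return result.stdout.strip()

print(submit_pilot())
```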

Web user interface
- Job setup interface
  ▫ Easy setup of the distributive (application package), input files, parameters and output file names
  ▫ Local authentication (users do not need a certificate)
- Job monitor
  ▫ After submitting a job, its status can be monitored in the "Job list" tab
  ▫ When the job finishes, the results can be fetched from the detailed info page, which includes links to the input/output files

Portal operation: authorization, API, FTP storage interaction
- OAuth 2.0 authentication
- API for external applications, with Celery as the asynchronous executor (see the sketch below)
- Private FTP storage for users' input files
- External tools endpoint and web-based user interface for submitting new job files
Diagram: interaction of the WEBPANDA interface with DDM, the PanDA server, the FTP storage and the worker nodes (cloud VMs and HPC) through NFS and Lustre.
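
Since the slide names Celery as the asynchronous executor, the sketch below illustrates one plausible use of it: staging a user's input file from the private FTP storage to a shared area visible to the worker nodes, without blocking the web API. The broker URL, host names, paths and credentials are assumptions.

```python
from celery import Celery
from ftplib import FTP

# Hypothetical Celery application; the Redis broker is an assumption
app = Celery("webpanda", broker="redis://localhost:6379/0")

@app.task
def stage_in(user, filename):
    """Copy a file from the user's private FTP storage to shared storage."""
    dest = f"/lustre/webpanda/{user}/{filename}"          # illustrative path
    with FTP("ftp.portal.example.org") as ftp, open(dest, "wb") as out:
        ftp.login(user=user, passwd="app-credential")     # placeholder credential
        ftp.retrbinary(f"RETR {filename}", out.write)
    return dest

# A web view would enqueue the transfer and return immediately:
# stage_in.delay("alice", "input.dat")
```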

Typical portal jobs
The portal fits the standard PanDA computational scheme and is most efficient for compute-intensive tasks that can be split into many sub-jobs and computed in parallel. To hide execution complexity and manual routines from end users, we introduced an original pipeline control system, seamlessly integrated into the portal, which automatically (without user intervention) splits the input data, prepares and runs the sub-tasks as ordinary PanDA jobs, and merges the results (a minimal sketch of this fan-out/merge idea is given below).

Pipeline workflow
- Every pipeline consists of several steps executed in sequence: init, prerun, split, prun, merge, postrun, finish.
- Each step contains a server-side Python script and, where applicable, PanDA jobs.
- A step may contain one or many standard PanDA jobs; steps with many jobs are executed in parallel.
- This workflow reduces the total computational time through multi-node parallelization.
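
The fan-out/merge logic described above can be summarized with the minimal sketch below; the function names and the in-process parallelism are illustrative placeholders, whereas the real portal runs each sub-task as a separate PanDA job on the distributed resources.

```python
from concurrent.futures import ThreadPoolExecutor

def split(input_file, n_chunks):
    # placeholder: produce n_chunks smaller inputs from input_file
    return [f"{input_file}.part{i}" for i in range(n_chunks)]

def run_panda_job(chunk):
    # placeholder for submitting one ordinary PanDA job and waiting for it
    return f"{chunk}.out"

def merge(outputs, merged="result.out"):
    # placeholder: combine the per-chunk outputs into a single result
    return merged

def pipeline(input_file, n_chunks=8):
    chunks = split(input_file, n_chunks)         # "split" step
    with ThreadPoolExecutor() as pool:           # "prun" step: sub-jobs in parallel
        outputs = list(pool.map(run_panda_job, chunks))
    return merge(outputs)                        # "merge" step

print(pipeline("input.dat"))
```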

Approbation: a biology application
Next Generation Sequencing (NGS)
- Analysis of ancient genome sequencing data (mammoth DNA) with the popular PALEOMIX software pipeline can take a month even on a powerful computing resource. PALEOMIX includes a typical set of software tools used to process NGS data.
- We adapted the PALEOMIX pipeline to run in a distributed computing environment powered by PanDA.
- To run the pipeline we split the input files into chunks, which are processed on different nodes as separate PALEOMIX inputs, and the outputs are finally merged; this is very similar to what ATLAS does to process and simulate its data (a simplified chunking sketch follows below).
- Using software tools originally developed for HEP and the Grid reduces the payload execution time for the mammoth DNA samples from weeks to days: with the input split into 135 chunks, the runtime dropped from about 4-5 weeks to 2-4 days.
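
As an illustration of the chunking step described above, the sketch below cuts a FASTQ file into pieces on 4-line record boundaries so that each piece is a valid stand-alone input; it deliberately ignores real-world details such as paired-end reads and compressed files, and the file name is a made-up example (only the chunk count comes from the slide).

```python
from itertools import islice

def split_fastq(path, n_chunks):
    """Split a FASTQ file into n_chunks files, keeping 4-line records intact."""
    with open(path) as src:
        records = []
        while True:
            record = list(islice(src, 4))   # one FASTQ record = 4 lines
            if not record:
                break
            records.append(record)
    size = -(-len(records) // n_chunks)     # ceiling division
    for i in range(n_chunks):
        with open(f"{path}.chunk{i:03d}", "w") as out:
            for record in records[i * size:(i + 1) * size]:
                out.writelines(record)

split_fastq("mammoth_reads.fastq", 135)     # 135 chunks, as on the slide
```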

Summary and Conclusion
A PanDA instance has been installed at NRC KI, adapted to the scientific needs of the Centre's user community, and integrated with the computing resources at NRC KI. We developed:
- a user interface that gives scientists a single access point to the computing resources;
- a data management client that transfers data between the disk storage and the worker nodes and handles data stage-in/stage-out;
- an HTTP API for external applications.
The portal has demonstrated its efficiency in running bioinformatics applications, in particular for genome analysis.