PanDA@BBP R.Mashinistov (UTA) TIM@BNL July 20-21
Introduction In 2017 the pilot project between BigPanDA and Blue Brain Project (BBP) (Swiss Federal Institute of Technology in Lausanne) teams began. The proof of concept project aimed to demonstrate efficient application of the PanDA software for the supercomputer-based reconstructions and simulations offering a radically new approach for understanding the multilevel structure and function of the brain. In the first phase, the goal of this joint project is to support the execution of BBP software on a variety of distributed computing systems powered by PanDA. The targeted systems for demonstration include: Intel x86-NVIDIA GPU based BBP clusters located in Geneva (47 TFlops) and Lugano (81 TFlops), BBP IBM BlueGene/Q supercomputer ( 0.78 PFLops and 65 TB of DRAM memory) located in Lugano, the Titan Supercomputer with peak theoretical performance 27 PFlops operated by the Oak Ridge Leadership Computing Facility (OLCF), and Cloud based resources such as Amazon Cloud.
BBP computing infrastructure ssh slurm slurm slurm pbs api & sheduler BlueGene/Q viz cluster lxviz cluster Titan supercomputer On-demand cluster LUGANO GENEVE
Initial PanDA test @ BBP For initial test in March 2017 PanDA Server + Client were installed to the VM at the Campus Biotech (Geneva). Job submission has been done via the python based PanDA Client. Pilot were started manually on Geneve and Lugano clusters including BlueGene/Q. Titan & Amazon resources were integrated on later (on April). SQL https PanDA Server PanDA DB PanDA Client REST API REST API https Pilot Pilot Pilot Resource A Resource B Resource Z TIM@BNL July 20-21
PanDA Portal @ BBP (1/2) Authentication File Catalog Data Transfer System Data Management System Web Services Pre/post processing SQL SQL https PanDA Server PanDA DB PanDA Monitor PanDA Client REST API REST API https Pilot Scheduler Pilot Scheduler Pilot Pilot Pilot Resource A Resource B Resource Z TIM@BNL July 20-21
PanDA Portal @ BBP (2/2) PanDA Portal Authentication File Catalog Data Transfer System Data Management System Web Services Pre/post processing SQL SQL https PanDA Server PanDA DB PanDA Monitor PanDA Client REST API REST API https Pilot Scheduler Pilot Scheduler Pilot Pilot Pilot Resource A Resource B Resource Z TIM@BNL July 20-21
PanDA for BlueBrainProject ssh https https Various interfaces The test ‘Hello world’ jobs were successfully submited to the targeted resources via PanDA portal. CLI REST API WEB UI PanDA Portal@BBP Queues to execute jobs on different resources PanDA queues BlueGene/Q viz cluster lxviz cluster Titan OLCF GENEVE LUGANO
User Interfaces Web forms to define, submit and monitor jobs Authentication LDAP as a first solution Will be replaced by SSO Possible to integrate with Jupyter notebooks (widely used by BlueBrain scientists) Easy to change computing cluster (jobs are not atomic, typically scientist knows which cluster should be used with his computational job) TIM@BNL July 20-21
Impersonation Need to track job submitter on computing cluster Pilot should have access only to the private files of job owner It’s possible to propagate name and group of user inside job definition Root access approach Run pilot under root, use name/group as sbatch params Root writes transformation wrapper with SUID, use name/group as sbatch params inside wrapper No-root access approach Use separate queue for every working group (cannot add cluster user for every project participant) Run pilot under working group manager account Working groups can be mapped with existing groups in LDAP or SSO TIM@BNL July 20-21
Conclusion Phase 1 of “Proof of the concept” was successfully finished The project demonstrated that the software tools and methods for processing large volumes of experimental data, which have been developed initially for experiments at the LHC accelerator, can be successfully applied to BBP. Phase 2 of “Pre-production” is under investigation currently Next most significant tasks are: Data management system Jupyter integration