DIRAC services
Services
- FG-DIRAC
  - Maintenance, operation
    - Practically all the DIRAC@IN2P3 members are involved
    - How can this be presented to the benefit of the DIRAC@IN2P3 project?
  - Testing ground?
- DIRAC4EGI
  - CPPM, together with UB and Cyfronet, offered to maintain the service
    - Awaiting the EGI answer
  - Should DIRAC@IN2P3 be involved?
  - Playing ground for various activities, e.g. cloud management, COMDIRAC, data management
- FG-DIRAC beyond France-Grilles
  - Merge FG-DIRAC and DIRAC4EGI
    - Keep them logically separate but technically unique
  - Service administration tools should be further developed
    - Part of the DIRAC@IN2P3 contract?
The cloud case
Clouds
- VM scheduler developed for the Belle MC production system
  - Dynamic VM spawning taking Amazon EC2 spot prices and the Task Queue state into account (see the sketch below)
  - Discarding VMs automatically when they are no longer needed
- The DIRAC VM scheduler, by means of dedicated VM Directors, is interfaced to
  - OCCI compliant clouds: OpenStack, OpenNebula
  - Apache-libcloud API compliant clouds
  - Amazon EC2
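A minimal sketch of this spawn/discard decision, assuming simple thresholds and leaving the actual Task Queue and cloud API calls to the caller; it illustrates the logic only and is not the VMDIRAC scheduler code.

    # Illustrative decision logic for dynamic VM spawning/discarding (not VMDIRAC code).
    # The thresholds are assumptions made for the sketch.
    import math

    MAX_SPOT_PRICE = 0.10      # assumed acceptable EC2 spot price, USD per hour
    PAYLOADS_PER_VM = 8        # assumed single-core payload slots per VM

    def decide(waiting_payloads, running_vms, idle_vms, spot_price):
        """Return ('spawn', n), ('halt', [vm, ...]) or ('wait', None)."""
        capacity = running_vms * PAYLOADS_PER_VM
        if waiting_payloads > capacity and spot_price <= MAX_SPOT_PRICE:
            # The Task Queue holds more waiting work than the running VMs can absorb
            missing = math.ceil((waiting_payloads - capacity) / PAYLOADS_PER_VM)
            return ("spawn", missing)
        if waiting_payloads == 0 and idle_vms:
            # No matching work left: discard the idle VMs automatically
            return ("halt", idle_vms)
        return ("wait", None)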
VMDIRAC 2: VM submission
- Cloud endpoint abstraction (see the sketch below)
  - Implementations: Apache-libcloud, ROCCI, EC2
- CloudDirector, similar to the SiteDirector
- ToDo
  - Cloud endpoint testing/monitoring tools for site debugging
  - Follow the endpoint interface evolution
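One way to picture the endpoint abstraction: a common interface driven by the CloudDirector, with one implementation per cloud technology (a ROCCI endpoint would follow the same pattern). The class and method names below are illustrative only, not the actual VMDIRAC 2 API.

    # Sketch of a cloud endpoint abstraction; names are invented for the illustration.
    from abc import ABC, abstractmethod

    class CloudEndpoint(ABC):
        """Interface the CloudDirector would use, whatever the underlying cloud."""

        @abstractmethod
        def create_instances(self, n, user_data):
            """Start n VMs contextualized with the given user data, return their IDs."""

        @abstractmethod
        def stop_instance(self, vm_id):
            """Terminate one VM."""

        @abstractmethod
        def status(self, vm_id):
            """Return the endpoint-side state of a VM (running, stopped, error, ...)."""

    class LibcloudEndpoint(CloudEndpoint):
        """Would wrap an Apache-libcloud driver (OpenStack, OpenNebula, ...)."""
        def create_instances(self, n, user_data): ...
        def stop_instance(self, vm_id): ...
        def status(self, vm_id): ...

    class EC2Endpoint(CloudEndpoint):
        """Would wrap the EC2 API, including spot instance requests."""
        def create_instances(self, n, user_data): ...
        def stop_instance(self, vm_id): ...
        def status(self, vm_id): ...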
VMDIRAC 2: VM contextualization (current)
- Standard minimal images
  - No DIRAC proper images, no image maintenance costs, but …
- Cloud-init mechanism only (see the sketch below)
  - Using a passwordless certificate passed as user data (mardirac.in2p3.fr host certificate)
  - Using bootstrapping scripts similar to LHCb Vac/Vcycle
  - Using Pilot 2.0
- On-the-fly installation of DIRAC, CVMFS, …
  - Takes time, can be improved with custom images
- Starting the VirtualMachineMonitorAgent
  - Monitors and reports the VM state, VM heartbeats
  - Halts the VM in case of no activity
  - Gets instructions from the central service, e.g. to halt the VM
- Starting as many pilots as there are cores (single-core jobs)
- Starting one pilot for …
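The user data mentioned above could be assembled roughly as in the following sketch: a cloud-config document that writes the host certificate to disk and runs a bootstrap command installing DIRAC and CVMFS, then starting the pilots and the VirtualMachineMonitorAgent. All paths and script names are placeholders, not the actual VMDIRAC bootstrapping scripts.

    # Sketch: assembling cloud-init user data for VM contextualization.
    # Paths and script names below are placeholders, not the real VMDIRAC scripts.
    import textwrap

    def make_user_data(host_cert_pem, n_cores):
        """Return a cloud-config document as a string (one pilot per core assumed)."""
        # Indent the PEM block so it nests under the YAML 'content: |' key
        cert_block = textwrap.indent(host_cert_pem.strip(), " " * 6)
        return (
            "#cloud-config\n"
            "write_files:\n"
            "  - path: /opt/dirac/etc/hostcert.pem\n"
            "    permissions: '0600'\n"
            "    content: |\n"
            f"{cert_block}\n"
            "runcmd:\n"
            "  # On-the-fly installation of DIRAC and CVMFS, then pilot start-up\n"
            f"  - [ bash, /opt/dirac/bootstrap_vm.sh, --pilots, '{n_cores}' ]\n"
            "  # Start the monitor reporting heartbeats and halting the idle VM\n"
            "  - [ bash, /opt/dirac/start_vm_monitor.sh ]\n"
        )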
VMDIRAC 2: VM contextualization (in the works)
- Bootstrapping scripts shared with the recently introduced Pilot package
- Single pilot per VM, capable of running multiple payloads, single- or multi-core
  - Same logic as for multi-core queues
- VMMonitor agent with enhanced logic (see the sketch below)
  - Halting on no activity
  - Signaling pilots to stop
  - Machine/Job Features
- The goal: a fully functional dynamic cloud computing resource allocation system taking group fair shares into account
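The enhanced halting logic could take the shape sketched below: once no payload activity has been seen for a grace period, the monitor signals the pilot to drain (here via a Machine/Job Features style shutdowntime file, which is an assumption) and powers the VM off when the pilot has finished. Names, paths and thresholds are illustrative.

    # Sketch of the enhanced halting logic; names, paths and thresholds are illustrative.
    import os
    import subprocess
    import time

    IDLE_GRACE = 30 * 60                  # assumed: signal shutdown after 30 idle minutes
    MJF_DIR = "/etc/machinefeatures"      # Machine/Job Features style location (assumption)

    def monitor_cycle(payloads_running, last_activity, pilot_finished):
        """One monitoring cycle: signal the pilot to stop, then halt when it is done."""
        if not payloads_running and time.time() - last_activity > IDLE_GRACE:
            # Tell the pilot to stop taking new payloads (MJF-style shutdowntime file)
            os.makedirs(MJF_DIR, exist_ok=True)
            with open(os.path.join(MJF_DIR, "shutdowntime"), "w") as f:
                f.write(str(int(time.time())))
        if pilot_finished:
            # The pilot has drained its payloads: the VM can be discarded
            subprocess.run(["halt", "-p"], check=False)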
VMDIRAC 2: VM web application
- Enhanced monitoring, accounting
  - No Google tools!
- VM manipulation by administrators
  - Start, halt, other instructions to the VMMonitor agent
- Possibility to connect to a VM to debug problems
  - Web terminal console
  - On-the-fly public IP assignment
The supercomputer case
The supercomputer case
- Multiple HPC centers are available to large scientific communities
  - E.g., HEP experiments have started to get access to a number of HPC centers
    - Using traditional HTC applications, filling in the gaps of empty slots
    - Including HPC into their data production systems
- Advantages of federating HPC centers
  - More users and applications for each center, hence better efficiency of usage
  - Elastic usage: users can have more resources for a limited time period
- Example: Partnership for Advanced Computing in Europe (PRACE)
  - Common agreements on sharing HPC resources
  - No common interware for uniform access
The supercomputer case
- Unlike grid sites, HPC centers are not uniform
  - Different access protocols
  - Different user authentication methods
  - Different batch systems
  - Different connectivity to the outside world
- If we want to include HPC centers in a common infrastructure, we have to find a way to overcome these differences
  - Pilot agents can be very helpful here
  - This needs effort from both the interware and the HPC center sides
HPC example
- Pilot submitted to the batch system through a (GSI)SSH tunnel (see the sketch below)
- Pilot communicates with the DIRAC services through the Gateway proxy service
- Output uploaded to the target SE through the SE proxy
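A minimal sketch of the first step, assuming an SSH-reachable login node and a SLURM batch system; host names, directories and the pilot wrapper name are hypothetical, and GSI-SSH would simply replace the ssh/scp commands.

    # Sketch: submitting a pilot to an HPC batch system over SSH (SLURM assumed).
    # Host, directory and script names are hypothetical.
    import subprocess

    def submit_pilot_over_ssh(login_node, remote_dir, pilot_script="pilot_wrapper.sh"):
        """Copy the pilot wrapper to the login node and submit it with sbatch."""
        subprocess.run(["scp", pilot_script, f"{login_node}:{remote_dir}/"], check=True)
        # The wrapper is expected to reach DIRAC only through the Gateway proxy
        # service, since the worker nodes may have no outbound connectivity.
        result = subprocess.run(
            ["ssh", login_node, f"cd {remote_dir} && sbatch {pilot_script}"],
            check=True, capture_output=True, text=True)
        return result.stdout.strip()      # e.g. "Submitted batch job 123456"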
Co-design problem of distributed HPC
- Common requirements for HPC
  - Outside-world connectivity
  - User authentication
    - SSO schema with federated identity providers
    - Users representing whole communities
  - Application software provisioning
  - Monitoring, accounting
    - Can be delegated to the interware level
- Support from the interware
  - Common model for HPC resources description (see the sketch below)
  - Algorithms for HPC workload management with more complex payload requirements specification
  - Uniform user interface
- Support from applications
  - Allow running in multiple HPC centers, e.g. standardized MPI libraries
  - Granularity
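To give an idea of what a common resource description might have to capture, the sketch below lists plausible attributes for one HPC center; the field names and values are invented for the illustration and do not correspond to an existing DIRAC configuration schema.

    # Illustrative description of one HPC center for workload management purposes.
    # All field names and values are invented for the sketch.
    HPC_SITE = {
        "Name": "ExampleHPC.Somewhere.fr",        # hypothetical site name
        "AccessProtocol": "GSISSH",               # SSH, GSISSH, local gateway, ...
        "BatchSystem": "SLURM",                   # SLURM, PBS, LoadLeveler, ...
        "OutboundConnectivity": False,            # can worker nodes reach the outside world?
        "GatewayService": "dirac-gw.example.fr",  # proxy service used when they cannot
        "CoresPerNode": 32,
        "MaxNodesPerJob": 64,
        "MaxWallClockHours": 24,
        "MPIFlavors": ["OpenMPI-3"],              # needed for multi-node payloads
        "SharedSoftwareArea": "/cvmfs",           # application software provisioning
    }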
Towards Open Distributed Supercomputer Infrastructure
- A common project involving several supercomputer centers
  - Lobachevsky, NNU
  - HybriLIT, JINR, Dubna
  - CC/IN2P3, Lyon
  - Mesocenter, AMU, Marseille
  - LRZ, …
- The goal is to provide the components necessary to include supercomputers into a common infrastructure
  - Together with other types of resources
  - Based on the DIRAC interware technology
- Several centers are already connected
  - Simple “grid”-like applications, multi-core applications
  - Multi-processor, multi-node applications are in the works
Publications
- Workflows
  - High-level workflow treatment
  - Metadata in workflows
- Big Data
  - ??
- HPC
  - WMS for HPC (reservation, masonry, multi-core, multi-host)
  - WMS for hybrid HPC/HTC/Cloud systems
- Clouds
  - Managing cloud resources with community policies/shares/quotas
- COMDIRAC
  - Interface to a distributed computer (FSDIRAC included?)