Building application portals with DIRAC
A. Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille
27 April 2010, Journée LuminyGrid, Marseille
Outline
● Scope of the DIRAC project
● DIRAC Overview
● User Interfaces
● Web Portals
● Conclusion
Large VO issues
● HEP experiments collect unprecedented volumes of data to be processed on a large amount of geographically distributed computing resources
  ● 10s of PBytes of data per year
  ● 10s of thousands of CPUs in 100s of centers
  ● 1000s of users from 100s of institutions
● However, other application domains are quickly approaching these scales
● Large user communities (Virtual Organizations) have specific problems
  ● Dealing with heterogeneous resources: various computing clusters, grids, etc.
  ● Dealing with the intra-community workload management: user group quotas and priorities, priorities of different activities
  ● Dealing with a variety of applications: massive data productions, individual user applications, etc.
General problems and solutions
● Overcome deficiencies of the standard grid middleware
  ● Inefficiencies and failures: production managers can afford them, users cannot
  ● Lacking specific functionality
● Alleviate the excessive burden on sites (the resource providers) in supporting multiple VOs
  ● Avoid complex VO-specific configuration on sites
  ● Avoid VO-specific services on sites
● The complexity of managing the VO workload resulted in a specific software layer on top of the standard grid middleware. Among the LHC experiments:
  ● AliEn in ALICE
  ● PanDA in ATLAS
  ● glideinWMS and PhEDEx in CMS
  ● DIRAC in LHCb
DIRAC and LHCb
● The LHCb experiment is dedicated to the study of CP violation in B-meson systems
  ● Smallest of the 4 big LHC experiments: ~500 physicists, ~60 institutes from 15 countries
  ● Nevertheless, computing is also a challenge
● DIRAC was originally developed as the distributed data production and analysis system used by the LHCb experiment
  ● Includes workload and data management components
  ● Started with the MC data production tasks
  ● Extended to data processing and user analysis
● The goals were to:
  ● integrate all the heterogeneous computing resources available to LHCb
  ● minimize human intervention at LHCb sites
DIRAC Project scope
● Large volumes of computing and storage resources are supplied by various institutes
● Grids define common rules for how to access these resources
● Users organized in virtual communities provide payloads to be executed on the Grid resources
● Middleware is a set of software components enabling users to exploit the Grid resources
● DIRAC provides a complete set of middleware components for workload and data management tasks, seen from the user community perspective
Brief DIRAC Overview
DIRAC Framework
● Service-oriented architecture: DIRAC systems consist of services, light distributed agents and client tools
● All communications between the distributed components are secure
  ● DISET custom client/service protocol for control and data communications
  ● X509 and GSI security standards
● Fine-grained authorization rules
  ● per individual user FQAN
  ● per service interface method
  ● per job
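The fine-grained authorization rules listed above can be sketched as a simple lookup from a service method to the set of properties a caller must hold. This is a minimal illustration of the idea, not DIRAC's actual DISET implementation; the names `AUTH_RULES` and `check_access` are assumptions.

```python
# Illustrative sketch of per-method authorization rules (not DIRAC's API):
# each exposed service method lists the user properties allowed to call it.

AUTH_RULES = {
    "jobSubmit": {"NormalUser", "JobAdministrator"},
    "killJob": {"JobAdministrator"},
}

def check_access(method, user_properties):
    """Allow the call if the user holds at least one required property."""
    allowed = AUTH_RULES.get(method, set())
    return bool(allowed & set(user_properties))

# A normal user may submit jobs, but only an administrator may kill them.
print(check_access("jobSubmit", ["NormalUser"]))  # True
print(check_access("killJob", ["NormalUser"]))    # False
```

In the real system such rules are applied per user FQAN and per interface method, as listed on the slide.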
DIRAC base services
● Redundant Configuration Service: provides service discovery and setup parameters for all the DIRAC components
● Full-featured proxy management system
  ● Proxy storage and renewal mechanism
  ● Support for multi-user pilot jobs
● System Logging service: collects essential error messages from all the components
● Monitoring service: monitors the behavior of services and agents
Workload Management
● Workload Management System with Pilot Jobs, originally introduced by DIRAC
  ● Increases the visible user job efficiency
  ● Allows efficient and precise application of the community policies
  ● Allows aggregation of heterogeneous resources
[Diagram: jobs from the Production Manager and physicist users enter the central Matcher Service; EGEE, NDG, EELA and CREAM CE Pilot Directors submit pilots to the corresponding grids, which pull the matched jobs]
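The pilot-job pattern shown in the diagram can be sketched as follows: instead of pushing jobs to sites, a pilot starts on a resource, verifies it, and then pulls a matching job from the central task queue. This is a toy illustration of the matching step only; the data structures and function names are assumptions, not DIRAC code.

```python
# Minimal sketch of pilot/matcher interaction (illustrative, not DIRAC code).
from collections import deque

# Central task queue held by the Matcher Service.
task_queue = deque([
    {"id": 1, "platform": "slc4_ia32"},
    {"id": 2, "platform": "slc5_x86_64"},
])

def match_job(pilot_platform):
    """Return the first waiting job that the pilot's resource can run."""
    for job in list(task_queue):
        if job["platform"] == pilot_platform:
            task_queue.remove(job)
            return job
    return None  # nothing suitable: the pilot exits cleanly

# A pilot running on an slc5 worker node pulls the slc5 job (id 2).
print(match_job("slc5_x86_64"))
```

Because the pilot checks the resource before any user payload runs, failures of the resource are absorbed by the pilot rather than counted against the user job, which is what raises the visible job efficiency.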
WMS performance
● DIRAC performance measured in the recent Data Challenges and Production runs
  ● Up to 25K concurrent jobs at ~120 distinct sites
  ● One mid-range central server hosting the DIRAC services
● Further optimizations to increase capacity are possible
  ● Hardware, database optimizations, service load balancing, etc.
Support for MPI Jobs
● MPI Service developed for applications in the EELA Grid
  ● Non-HEP applications: astrophysics, biomedicine, seismology
● No special MPI support on sites is required
  ● MPI software is installed by Pilot Jobs
  ● Site MPI support is used where it exists
● MPI ring usage optimization
  ● Ring reuse for multiple jobs: lower load on the gLite WMS
  ● Variable ring sizes for different jobs
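The ring-reuse idea above can be sketched as a scheduling decision: once a pilot has built an MPI ring of a given size, any queued MPI job that fits within that size can reuse it, and only larger jobs require a new submission through the gLite WMS. The function below is a hypothetical illustration, not DIRAC's MPI Service.

```python
# Illustrative sketch of MPI ring reuse (not DIRAC's implementation).

def schedule(jobs, ring_size):
    """Split jobs into those the existing ring can serve and those needing a new ring."""
    reused, resubmit = [], []
    for job_id, needed_slots in jobs:
        if needed_slots <= ring_size:
            reused.append(job_id)      # runs on the already-built ring
        else:
            resubmit.append(job_id)    # needs a larger ring via the WMS
    return reused, resubmit

jobs = [("mpi-1", 8), ("mpi-2", 16), ("mpi-3", 4)]
reused, resubmit = schedule(jobs, ring_size=8)
print(reused)    # ['mpi-1', 'mpi-3']
print(resubmit)  # ['mpi-2']
```

Serving several jobs from one ring is what lowers the load on the gLite WMS, as noted on the slide.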
Other DIRAC components
● Request Management System
  ● Collects and executes asynchronously any kind of operation that can fail: data upload and registration, job status and parameter reports, etc.
  ● Essential in the ever unstable Grid environment
● Production Management System
  ● Automatic creation and submission of data processing jobs according to predefined scenarios
  ● Complex workflow management
  ● Organization and handling of O(100K) jobs
● Data Management System
  ● Full-featured File Replica and Metadata Catalogs
  ● Automatic data replication
  ● Storage resource monitoring, data integrity checking
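The Request Management pattern described above can be sketched in a few lines: a failed operation is recorded in a persistent store and retried asynchronously until it succeeds. The class below is an in-memory stand-in for illustration only; names like `RequestDB` are assumptions, not DIRAC's API.

```python
# Sketch of asynchronous retry of failable operations (illustrative only).

class RequestDB:
    """In-memory stand-in for a persistent request database."""
    def __init__(self):
        self.pending = []

    def add(self, operation):
        self.pending.append(operation)

    def execute_all(self):
        """Retry every pending operation; keep the ones that still fail."""
        self.pending = [op for op in self.pending if not op()]

db = RequestDB()
attempts = {"n": 0}

def flaky_upload():
    """Simulates a data upload that only succeeds on the third attempt."""
    attempts["n"] += 1
    return attempts["n"] >= 3

db.add(flaky_upload)
db.execute_all()  # fails, request stays pending
db.execute_all()  # fails again
db.execute_all()  # succeeds, request is removed
print(len(db.pending))  # 0
```

Decoupling the retry loop from the job that issued the operation is what makes the pattern robust against transient Grid failures.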
User Interfaces
DIRAC user interfaces
● Easy client installation for various platforms (Linux, MacOS); includes security components
● JDL notation for job description, simplified with respect to the "standard" JDL
● Command-line tools à la the gLite UI commands, e.g. dirac-wms-job-submit
● Extensive Python API for all the tasks
  ● Job creation and manipulation, results retrieval
  ● Possibility to use complex workflow templates
  ● Data operations, catalog inspection
  ● Used by the GANGA user front-end
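A simplified JDL job description of the kind mentioned above might look like the following sketch. The attribute names follow common JDL conventions; the exact attribute set accepted by DIRAC is an assumption here.

```
[
  JobName      = "Simple_Job";
  Executable   = "/bin/ls";
  Arguments    = "-ltr";
  StdOutput    = "StdOut";
  StdError     = "StdErr";
  OutputSandbox = {"StdOut", "StdErr"};
]
```

Such a file would be submitted from the command line with `dirac-wms-job-submit`, in the style of the gLite UI commands.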
Example job submission

```python
from DIRAC.Interfaces.API.Dirac import Dirac
from Extensions.LHCb.API.LHCbJob import LHCbJob
...
myJob = LHCbJob()
myJob.setCPUTime(50000)
myJob.setSystemConfig('slc4_ia32_gcc34')
myJob.setApplication('Brunel', 'v32r3p1', 'RealDataDst200Evts.opts', 'LogFileName.log')
myJob.setName('DIRAC3-Job')
myJob.setInputData(['/lhcb/data/CCRC08/RAW/LHCb/CCRC/420157/420157_raw'])
# myJob.setDestination('LCG.CERN.ch')

dirac = Dirac()
jobID = dirac.submit(myJob)
...
dirac.status(jobID)
dirac.parameters(jobID)
dirac.loggingInfo(jobID)
...
dirac.getOutputSandbox(jobID)
```
DIRAC: Secure Web Portal
● Web portal with an intuitive desktop-application-like interface
  ● Ajax, Pylons, the ExtJS Javascript library
● Monitoring and control of all activities
  ● User job monitoring and manipulation
  ● Data production controls
  ● DIRAC systems configuration
● Secure access
  ● Standard grid certificates
  ● Fine-grained authorization rules
● Web pages for standard DIRAC tasks
  ● System configuration, services administration
  ● Job monitoring and controls
  ● Resources accounting and monitoring
Web Portal: example interfaces
LHCb Web: Bookkeeping page
● Interface to the LHCb Metadata Catalog
● Part of the LHCb DIRAC Web Portal
LHCb Web: Production Requests
● Comprehensive forms to define Data Production requests
● Multiple input parameters, with help for the parameter choices
● Support for the complex request verification and approval procedure
Web Portal: user tasks
● Job submission through the Web Portal
  ● Full GSI security
  ● Sandbox uploading and downloading
● A generic Job Launchpad panel exists in the basic DIRAC Web Portal
  ● Can be useful for newcomers and occasional users
● Specific application Web Portals can be derived: Community Application Servers
  ● All grid computational tasks steered on the Web
  ● A VO "formation" DIRAC instance is to be deployed at CC/IN2P3
DIRAC: Getting started
● Get your Grid certificate (usercert.p12)
  ● dirac-cert-convert.sh converts it to PEM format
● Register in a Grid VO to have access to the Grid resources
● Delegate your user proxy: proxy-init -g dirac_user
● Start using the DIRAC Web Portal
DIRAC Installations
● Latin American EELA Grid
  ● Part of the production infrastructure of the GISELA Grid
  ● Astrophysics, biomedicine, seismology applications
● HEP experiments
  ● ILC Collaboration
  ● Belle Collaboration at KEK, Japan, using the Amazon EC2 Cloud Computing
● Installation at CC/IN2P3, Lyon for the VO vo.formation.idgrilles.fr (training program)
  ● dirac.in2p3.fr
  ● Documentation is in preparation
DIRAC Installations
● An installation can be prepared and maintained for the LuminyGrid users; this is a proposal to be discussed
  ● Ensures user access to the LuminyGrid, France NGI and EGEE resources
● Multiple VO support with a single DIRAC instance
  ● No need to have VO experts in the DIRAC services administration
● Support for specific VO applications
  ● Porting the applications to the grid
  ● Help in developing specific Web portals
  ● Interfacing existing Web Portals to the DIRAC backend
Conclusions
● The DIRAC project provides a secure framework for building distributed computing (grid) systems
● DIRAC provides a complete middleware stack and can integrate standard (gLite) services as well
● The DIRAC Framework can be used to build application-specific services and Web Portals
● Based on our many years of experience, we are looking for ways to help other users port their applications to the Grid and to make Grid usage fun
Backup slides
DIRAC development environment
● Python is the main development language
  ● Fast prototyping/development cycle
  ● Platform independence
● MySQL database for the main services
  ● ORACLE database backend for the LHCb Metadata Catalog
● Modular architecture allowing easy customization for the needs of a particular community
● Simple framework for building custom services and agents
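The custom-agent framework mentioned above can be sketched as a base class whose `execute()` method is called in a polling loop. The class and method names here are assumptions for illustration, not the real DIRAC framework API.

```python
# Illustrative sketch of the agent pattern (names are hypothetical).
import time

class AgentBase:
    poll_interval = 0  # seconds between cycles; 0 so this demo runs instantly

    def run(self, cycles):
        """Call execute() repeatedly, as the framework's polling loop would."""
        for _ in range(cycles):
            self.execute()
            time.sleep(self.poll_interval)

class JobCleaningAgent(AgentBase):
    """Toy agent: each cycle it processes one job from its work list."""
    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.cleaned = []

    def execute(self):
        if self.jobs:
            self.cleaned.append(self.jobs.pop(0))

agent = JobCleaningAgent(["job1", "job2"])
agent.run(cycles=3)         # third cycle finds nothing to do
print(agent.cleaned)        # ['job1', 'job2']
```

Subclassing a common base and overriding a single method is what makes it easy to customize the system for the needs of a particular community.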