Distributed Computing Framework A. Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille EGI Webinar, 7 June 2016

Plan 2  DIRAC Project  Origins  Agent based Workload Management System  Accessible computing resources  Data Management  Interfaces  DIRAC users  DIRAC as a service  Conclusions

[Figure: LHC data flow to permanent storage: 6-8 GB/sec overall, with individual streams at ~4 GB/sec, 1-2 GB/sec and MB/sec rates.]

Worldwide LHC Computing Grid Collaboration (WLCG)  >100 PB of data at CERN and major computing centers  Distributed infrastructure of 150 computing centers in 40 countries  300+ k CPU cores (~2M HEP-SPEC-06)  The biggest site has ~50k CPU cores; 12 T2 sites have 2-30k CPU cores  Distributed data, services and operation infrastructure 4

DIRAC Grid Solution  The LHC experiments all developed their own middleware to address the above problems  PanDA, AliEn, glideinWMS, PhEDEx, …  DIRAC was originally developed for the LHCb experiment  The experience collected with the production grid system of a large HEP experiment is very valuable  Several new experiments expressed interest in using this software, relying on its utility proven in practice  In 2009 the core DIRAC development team decided to generalize the software to make it suitable for any user community  A consortium was set up to develop, maintain and promote the DIRAC software  CERN, CNRS, University of Barcelona, IHEP, KEK  The result of this work is that DIRAC can be offered as a general-purpose distributed computing framework 5

Interware 6  DIRAC provides all the necessary components to build ad-hoc grid infrastructures interconnecting computing resources of different types, allowing interoperability and simplifying interfaces. This is why we speak of DIRAC as interware.

DIRAC Workload Management 7

[Diagram: Physicist Users and the Production Manager submit jobs to the central Matcher Service; dedicated Pilot Directors submit pilots to the EGI/WLCG Grid, the NDG Grid, the Amazon EC2 Cloud and CREAM CEs, where they pull the matched payloads.] 8

WMS: using heterogeneous resources  Including resources in different grids and standalone clusters is simple with Pilot Jobs  Needs a specialized Pilot Director per resource type  Users just see new sites appearing in the job monitoring 9

WMS: applying VO policies 10  In DIRAC both user and production jobs are handled by the same WMS  Same Task Queue  This makes it possible to apply policies efficiently for the whole VO  Assigning job priorities to different groups and activities  Static group priorities are used currently  A more powerful scheduler can be plugged in ● demonstrated with the MAUI scheduler ● Users perceive the DIRAC WMS as a single large batch system; a minimal job submission sketch is shown below
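
To illustrate the "single large batch system" view, here is a minimal job submission sketch using the DIRAC Python Job API. It assumes a configured DIRAC client with a valid proxy; the job name, the echo payload and the commented-out site name are illustrative.

    # Minimal DIRAC job submission sketch (assumes a configured client and a valid proxy)
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)   # standard DIRAC client initialization

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    job = Job()
    job.setName('hello-dirac')                               # illustrative job name
    job.setExecutable('/bin/echo', arguments='Hello DIRAC')  # payload run on the matched resource
    # job.setDestination('LCG.CERN.ch')                      # optional: pin a site (hypothetical name)

    dirac = Dirac()
    result = dirac.submitJob(job)
    if result['OK']:
        print('Submitted job %s' % result['Value'])          # job ID assigned by the WMS
    else:
        print('Submission failed: %s' % result['Message'])

The same Dirac API object also offers job monitoring and output retrieval calls, so the user follows the job in the same way whether the pilot that picks it up runs on a grid site, a cloud VM or a standalone cluster.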

DIRAC computing resources 11

Computing Grids 12  DIRAC was initially developed with a focus on accessing conventional Grid computing resources  WLCG grid resources for the LHCb Collaboration  It fully supports gLite middleware based grids  European Grid Infrastructure (EGI), Latin American GISELA, etc.  Using gLite/EMI middleware  North American Open Science Grid (OSG)  Using VDT middleware  Northern European Grid (NDGF)  Using ARC middleware  Other types of grids can be supported  As long as we have customers needing them

Clouds 13  VM scheduler developed for the Belle MC production system  Dynamic VM spawning taking Amazon EC2 spot prices and Task Queue state into account  Discarding VMs automatically when they are no longer needed  The DIRAC VM scheduler is interfaced, by means of dedicated VM Directors, to  OCCI compliant clouds:  OpenStack, OpenNebula  CloudStack  Amazon EC2

Standalone computing clusters 14  Off-site Pilot Director  The site delegates control to the central service  The site only has to define a dedicated local user account  Payload submission goes through an SSH tunnel  The site can be a single computer or a cluster with a batch system  LSF, BQS, SGE, PBS/Torque, Condor, OAR, SLURM  HPC centers  More to come:  LoadLeveler, etc.  The user payload is executed with the owner's credentials  No security compromises with respect to external services

15 Data Management

DM Problem to solve 16  Data is partitioned in files  File replicas are distributed over a number of Storage Elements worldwide  Data Management tasks  Initial file upload  Catalog registration  File replication  File access/download  Integrity checking  File removal  Need for transparent file access for users  Often working with many (tens of thousands of) files at a time  Make sure that ALL the elementary operations are accomplished  Automate recurrent operations
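
The elementary operations listed above map onto the DataManager client of the DIRAC Python API. The sketch below is illustrative, assuming a configured client with a valid proxy; the LFN and SE names are hypothetical.

    # DataManager sketch for the elementary DM operations (hypothetical LFN and SE names)
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)

    from DIRAC.DataManagementSystem.Client.DataManager import DataManager

    dm = DataManager()
    lfn = '/myvo/user/s/someuser/test.dat'                       # hypothetical logical file name

    print(dm.putAndRegister(lfn, '/tmp/test.dat', 'DISK-SE-1'))  # upload + catalog registration
    print(dm.replicateAndRegister(lfn, 'DISK-SE-2'))             # create and register a second replica
    print(dm.getFile(lfn))                                       # download a replica locally
    print(dm.removeFile(lfn))                                    # remove all replicas and the catalog entry

Each call returns the usual S_OK/S_ERROR dictionary, so recurrent operations can be automated by checking result['OK'] and retrying or recording failures.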

Storage plugins 17  Storage element abstraction with a client implementation for each access protocol  DIPS, SRM, XROOTD, RFIO, etc.  gfal2 based plugin gives access to all protocols supported by the library  DCAP, WebDAV, S3, HTTP, …  iRODS  Each SE is seen by the clients as a logical entity  With some specific operational properties  SEs can be configured with multiple protocols
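
From the client side, this logical SE abstraction looks roughly like the sketch below; the SE name and LFN are hypothetical, and the exact method set should be checked against the DIRAC version in use.

    # Accessing a Storage Element as a logical entity (names are hypothetical)
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)

    from DIRAC.Resources.Storage.StorageElement import StorageElement

    se = StorageElement('DISK-SE-1')            # logical SE name defined in the configuration
    lfn = '/myvo/user/s/someuser/test.dat'      # hypothetical LFN

    print(se.exists(lfn))                       # replica check; the protocol plugin is chosen internally
    print(se.getFile(lfn))                      # download via whichever protocol the SE supports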

File Catalog 18  Central File Catalog (DFC, LFC, …)  Keeps track of all the physical file replicas  Several catalogs can be used together  The same mechanism is used to send messages to "pseudocatalog" services, e.g.  Transformation service (see later)  Bookkeeping service of LHCb  A user sees it as a single catalog with additional features  DataManager is a single client interface for logical data operations

File Catalog 19  DFC is the central component of the DIRAC Data Management system  Defines a single logical name space for all the data managed by DIRAC  Together with the data access components, the DFC allows data to be presented to users as a single global file system  User ACLs  Rich metadata including user defined metadata

File Catalog: Metadata 20  DFC is both a Replica and a Metadata Catalog  User defined metadata  The same hierarchy for metadata as for the logical name space  Metadata associated with files and directories  Allows for efficient searches  Efficient Storage Usage reports  Suitable for user quotas  Example query:  find /lhcb/mcdata LastAccess < GaussVersion=v1,v2 SE=IN2P3,CERN Name=*.raw
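
The same kind of query can be issued programmatically through the FileCatalogClient. This is a minimal sketch, assuming the metadata fields shown on the slide are defined in the catalog and using illustrative values.

    # Metadata query sketch against the DFC (metadata names and values are illustrative)
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)

    from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

    fc = FileCatalogClient()
    query = {'GaussVersion': ['v1', 'v2'],      # "one of" conditions are expressed as lists
             'SE': ['IN2P3', 'CERN']}
    result = fc.findFilesByMetadata(query, path='/lhcb/mcdata')
    if result['OK']:
        for lfn in result['Value']:
            print(lfn)                          # LFNs matching the metadata query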

Bulk data transfers 21  Replication/Removal Requests with multiple files are stored in the RMS  By users, data managers, the Transformation System  The Replication Operation executor  Performs the replication itself or  Delegates replication to an external service  E.g. FTS  A dedicated FTSManager service keeps track of the submitted FTS requests  The FTSMonitor Agent monitors the request progress and updates the FileCatalog with the new replicas  Other data moving services can be connected as needed  EUDAT  Onedata
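
A bulk replication request can be composed with the RMS client classes along these lines. This is only a sketch: the LFN and SE names are hypothetical, and attribute names should be checked against the installed DIRAC version.

    # Sketch of a ReplicateAndRegister request stored in the RMS (hypothetical names)
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)

    from DIRAC.RequestManagementSystem.Client.Request import Request
    from DIRAC.RequestManagementSystem.Client.Operation import Operation
    from DIRAC.RequestManagementSystem.Client.File import File
    from DIRAC.RequestManagementSystem.Client.ReqClient import ReqClient

    request = Request()
    request.RequestName = 'replicate-demo'

    op = Operation()
    op.Type = 'ReplicateAndRegister'            # handled by the Replication Operation executor
    op.TargetSE = 'DISK-SE-2'

    rmsFile = File()
    rmsFile.LFN = '/myvo/user/s/someuser/test.dat'
    op.addFile(rmsFile)

    request.addOperation(op)
    print(ReqClient().putRequest(request))      # RMS agents then perform or delegate the transfer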

Transformation System 22  Data driven workflows as chains of data transformations  Transformation: input data filter + a recipe to create tasks  Tasks are created as soon as data with the required properties is registered in the system  Tasks: jobs, data replication, etc.  Transformations can be used for automatic data driven bulk data operations  Scheduling RMS tasks  Often as part of a more general workflow
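
For orientation, a data driven replication transformation might be defined along the following lines. This is only a sketch: the plugin name, body format and SE names are assumptions to be checked against the Transformation System documentation, and the attachment of input data (e.g. via a catalog metadata query) is not shown.

    # Sketch of a bulk replication transformation (plugin, body and SE names are assumptions)
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)

    from DIRAC.TransformationSystem.Client.Transformation import Transformation

    t = Transformation()
    t.setTransformationName('ReplicateRawDemo')     # must be unique in the system
    t.setType('Replication')
    t.setDescription('Replicate newly registered files to a second SE')
    t.setLongDescription('Data driven bulk replication created from an input data filter')
    t.setPlugin('Broadcast')                        # assumed plugin choosing destination SEs
    t.setBody([('ReplicateAndRegister', {'TargetSE': 'DISK-SE-2'})])  # tasks become RMS requests
    t.addTransformation()                           # register the transformation
    t.setStatus('Active')
    t.setAgentType('Automatic')                     # tasks are created as matching data arrives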

Interfaces 23

DM interfaces 24  Command line tools  Multiple dirac-dms-… commands  COMDIRAC  Represents the logical DIRAC file namespace as a parallel shell  dls, dcd, dpwd, dfind, ddu, etc. commands  dput, dget, drepl for file upload/download/replication  REST interface  Suitable for use with application portals  The WS-PGRADE portal is interfaced with DIRAC this way

Web Portal 25

Distributed Computer 26  DIRAC aims at providing, from the user's perspective, the abstraction of a single computer for massive computational and data operations  Logical Computing and Storage Elements (Hardware)  Global logical name space (File System)  Desktop-like GUI

DIRAC Users 27

LHCb Collaboration 28  Up to 100K concurrent jobs in ~120 distinct sites  Equivalent to running a virtual computing center with a power of 100K CPU cores  Further optimizations to increase the capacity are possible ● Hardware, database optimizations, service load balancing, etc

Community installations 29  Belle II Collaboration, KEK  First use of clouds (Amazon) for data production  ILC/CLIC detector Collaboration, CALICE VO  Dedicated installation at CERN, 10 servers, DB-OD MySQL server  MC simulations  The DIRAC File Catalog was developed to meet the ILC/CLIC requirements  BES III, IHEP, China  Using DIRAC DMS: File Replica and Metadata Catalog, Transfer services  Dataset management developed for the needs of BES III  CTA  CTA started as a France-Grilles DIRAC service customer  It now uses a dedicated installation at PIC, Barcelona  Using complex workflows  Geant4  Dedicated installation at CERN  Validation of MC simulation software releases  DIRAC evaluations by other experiments  LSST, Auger, TREND, Daya Bay, JUNO, ELI, NICA, …  Evaluations can be done with general purpose DIRAC services

National services 30  DIRAC services are provided by several National Grid Initiatives: France, Spain, Italy, UK, China, …  Support for small communities  Heavily used for training and evaluation purposes  Example: France-Grilles DIRAC service  Hosted by the CC/IN2P3, Lyon  Distributed administrator team  5 participating universities  15 VOs, ~100 registered users  In production since May 2012  >12M jobs executed in the last year  At ~90 distinct sites

DIRAC4EGI service 31  In production since 2014  Partners  Operated by EGI  Hosted by CYFRONET  DIRAC Project providing software, consultancy  10 Virtual Organizations  enmr.eu  vlemed  eiscat.se  fedcloud.egi.eu  training.egi.eu  …  Usage  > 6 million jobs processed in the last year  [Figure: DIRAC4EGI activity snapshot]

DIRAC Framework 32

DIRAC Framework 33  DIRAC has a well defined architecture and development framework  Standard rules to create a DIRAC extension  LHCbDIRAC, BESDIRAC, ILCDIRAC, …  A large part of the functionality is implemented as plugins  Almost the whole DFC service is implemented as a collection of plugins  Examples  Support for datasets was first added in BESDIRAC  LHCb has a custom Directory Tree module in the DIRAC File Catalog  This allows customizing DIRAC functionality for a particular application with minimal effort

Conclusions 34  Computational grids and clouds are no longer exotic; they are used in daily work for various applications  The agent based workload management architecture allows seamless integration of different kinds of grids, clouds and other computing resources  DIRAC provides a framework for building distributed computing systems and a rich set of ready to use services. It is now used in a number of DIRAC service projects at regional and national levels  Services based on DIRAC technologies can help users get started in the world of distributed computing and reveal its full potential

Demo 35

Demo 36  Using EGI sites and storage elements  Grid sites  Fed Cloud sites  DIRAC Storage Elements  Web Portal  Command line tools  Demo materials to try out off-line can be found here 