Download presentation
Presentation is loading. Please wait.
1
Управление большими массивами разнородных вычислительных ресурсов часть 2
А.Ю.Царегородцев, ведущий инженер-исследователь CPPM-IN2P3-CNRS, Marseille, главный научный сотрудник кафедры ЛОТАБД РЭУ им. Плеханова РЭУ, Москва, 6-7 октября 2016
2
Part 2 Managing large volumes of data User Interfaces
Interware approach Heterogeneous storage systems Distributed logical file system User Interfaces Development framework Examples of usage Large scientific communities DIRAC as a general purpose service Putting together resources at PRUE and other universities Qui a besoin de plus qu’un seul ordinateurs pour le travail ?
3
Big Data Data that exceeds the boundaries and sizes of normal processing capabilities, forcing you to take a non-traditional approach for the treatment Google trends: Big Data Cloud Computing Grid Computing Qui a besoin de plus qu’un seul ordinateurs pour le travail ?
4
DM Problem to solve There are many different formats in which data can be stored Structured and non-structured databases, objects stores, file systems, etc DM problem addressed By DIRAC Data is partitioned in files File replicas are distributed over a number of Storage Elements world wide Data Management tasks Initial File upload Catalog registration File replication File access/download Integrity checking File removal Need for transparent file access for users Often working with multiple ( tens of thousands ) files at a time Make sure that ALL the elementary operations are accomplished Automate recurrent operations
5
Definitions Элемент хранения данных (Storage Element, SE) – интернет сервис для удаленного доступа к системе накопления данных Протокол доступа (Access Protocol) – программный интерфейс для взаимодействия с удаленным сервисом Access Control List (ACL) – набор правил доступа к данным для пользователей и групп пользователей
6
Storage plugins Storage element abstraction with a client implementation for each access protocol DIPS – DIRAC data transfer protocol FTP, HTTP, WebDAV SRM, XROOTD, RFIO, DCAP, etc HEP centers specific protocols Using gfal2 library developed at CERN S3, Swift, CDMI: cloud specific data access protocols Like with CE’s, each SE is seen by the clients as a logical entity With some specific operational properties Archive, limited access, etc SE’s can be configured with multiple protocols Including new data access technologies requires creating new specific plug-in
7
Storage Element Proxy SE Proxy Service translates the DIRAC data transfer protocol to a particular storage protocol Using DIRAC authentication Using credentials specific to the target storage system SE Proxy Service allows access to storages not having access libraries on a given client machine DIRAC or HTTP protocol Allows third party like transfers between incompatible storages
8
File Catalog Service File Catalog is a service to keep track of all the physical file replicas in all the SE’s Stores also file properties: Size, creation/modification time stamps, ownership, checksums User ACLs DIRAC relies on a central File Catalog Defines a single logical name space for all the managed data Organizes files hierarchically like in common file systems Other projects, e.g. distributed file systems, keep file data in multiple distributed databases More scalable Maintaining data integrity is very difficult
9
File Catalog Interware
DIRAC, as for other components, defines an abstraction of a File Catalog service with several implementations LCG File Catalog (LFC) – de facto standard grid catalog (obsoleted) Alien File Catalog - catalog developed by the Alice experiment at LHC at CERN DIRAC File Catalog File Catalog implementation by the DIRAC Project itself Several catalogs can be used together The mechanism is used to send messages to “pseudocatalog” services: Transformation service (see later) Community specific services, e.g. Bookkeeping Service of LHCb A user sees it as a single catalog with additional features
10
Combined data API Together with the data access components DFC allows to present data to users as a single global file system Can be even mounted as a file system partition on a user computer (FSDIRAC project) DataManager API is a single client interface for logical data operations
11
File Catalog: Metadata
DFC is Replica and Metadata Catalog User defined metadata The same hierarchy for metadata as for the logical name space Metadata associated with files and directories Allow for efficient searches Efficient Storage Usage reports Suitable for user quotas Example query: find /lhcb/mcdata LastAccess < GaussVersion=v1,v2 SE=IN2P3,CERN Name=*.raw Result of file search is a precise list of corresponding files Unlike Google index
12
Massive data operations
DIRAC is dealing with large volumes of scientific data 10’s of Petabytes of files and directories There is a need for massive (bulk) operations Examples: Replicate 105 files from SE A to SE B Remove 105 files and all their replicas in all the storages Massive data operations require Asynchronous execution Automatic failure recovery Data integrity checking
13
Request Management system for asynchronous operations
Request Management System (RMS) receives and executes asynchronously requests for any kind of operation Data upload and registration Job status and parameter reports RMS is used heavily as part of the failure recovery procedure Any operation that can fail can be deferred to the RMS system Requests are collected by RMS instances at geographically distributed sites Extra redundancy in RMS service availability Requests are forwarded to the central Request Database For keeping track of the pending requests For efficient bulk request execution RequestExecution agents execute the stored requests With multiple retries if necessary until the operation is successful
14
Transformation System for data driven workflows
Data driven workflows as chains of data transformations Transformation: input data filter + recipe to create tasks Tasks are created as soon as data with required properties is registered into the system Tasks: jobs, data replication, etc Transformations can be used for automatic data driven bulk data operations Scheduling RMS tasks Often as part of a more general workflow
15
Bulk data transfers Replication/Removal Requests with multiple files are stored in the RMS By users, data managers, Transformation System The Request Executing Agent invokes a Replication Operation executor Performs the replication itself or Delegates replication to an external service E.g. File Transfer Service (FTS) A dedicated FTSManager service keeps track of the submitted FTS requests FTSMonitor Agent monitors the request progress, updates the FileCatalog with the new replicas Other data moving services can be connected as needed EUDAT, OneData
16
Other data services Data logging Data provenance Data integrity
Keeping a history of all operation for a given file Data provenance Keeping ancestor-descendant relations for each file Data integrity Collecting reports on all the data access failures Automated data recovery and validation Storage usage reports Storage resources consumption at any moment Help Data Managers Allow to impose user quotas Accounting Storage consumption Data transfer traffic and error rates
17
Interfaces
18
DM interfaces Command line tools COMDIRAC REST interface
Multiple dirac-dms-… commands COMDIRAC Representing the logical DIRAC file namespace as a parallel shell dls, dcd, dpwd, dfind, ddu etc commands dput, dget, drepl for file upload/download/replication REST interface Suitable for use with application portals WS-PGRADE portal is interfaced with DIRAC this way
19
Web Portal Desktop paradigm for the DIRAC Web interface
Intuitive for most of the users
20
Web Portal applications
21
DIRAC for CTA: DIRAC File Catalog
In use since 2012 in parallel with LFC. Full migration to DFC in summer 2015 More than 21 M of replicas registered About 10 meta-data defined to characterize MC datasets DFC web interface Query example: Catalog browsing cta-prod3-query --site=Paranal --particle=gamma --tel_sim_prog=simtel --array_layout=hex --phiP=180 --thetaP=20 --outputType=Data Typical queries return several hundreds of thousands of files Metadata selection L.Arrabito, LUPM Query result
22
Distributed Computer DIRAC is aiming at providing an abstraction of a single computer for massive computational and data operations from the user perspective Logical Computing and Storage elements (Hardware ) Global logical name space ( File System ) Desktop-like GUI Applications can be interfaced with graphical user interfaces in the DIRAC Web Portal framework
23
DIRAC Framework
24
DIRAC Framework Services Agents Clients Commands
DIRAC systems consist of well defined components with clear recipes for developing Services passive components reacting to client request Keep their state in a database Agents Light permanently running distributed components, animating the whole system Clients API’s used in user interfaces as well as in agent-service, service-service communications Commands Small applications using DIRAC APIs to perform a single well defined operation
25
DIRAC Framework All the communications between the distributed components are secure DISET custom client/service protocol Focus on efficiency Control and data transfer communications X509, GSI security standards Users and services are provided with digital certificates User certificate proxies ( passwordless, time limited temporary certificate copies ) are used for distributed operations on the users’s behalf Fine grained service access authorization rules
26
DIRAC Framework Standard rules to create DIRAC extension
LHCbDIRAC, BESDIRAC, ILCDIRAC, … Just create services specific for the community workflow Software releases, distribution and discovery are done by the DIRAC framework Using base services for standard tasks: configuration, monitoring, logging, etc Large part of the functionality is implemented as plugins Almost the whole DFC service is implemented as a collection of plugins Allows to customize the DIRAC functionality for a particular application with minimal effort Examples Support for datasets first added to the BESDIRAC LHCb has a custom Directory Tree module in the DIRAC File Catalog Custom workflow (massive job execution) for the CTA Astrophysics experiment Using base services for standard tasks: configuration, monitoring, logging, etc
27
DIRAC base services Configuration Service Provides service discovery and setup parameters for all the DIRAC components Backbone service of any distributed computing system Must be 100% available Multiple redundancy with automatically synchronized Slave Configuration Servers Single Master Server to ensure integrity of the configuration data
28
Accounting Comprehensive accounting of all the operations
Publication ready quality of the plots Plotting service can be used by users for there own data
29
Other framework services
Proxy Certificate Management service Proxy certificate storage and renewal mechanism Provisioning of short living limited proxies to perform operations on behalf of a user System Logging service Collect essential error messages from all the components Generate reports and alarms for the system administrators Monitoring service Monitor the service and agents behavior Load, CPU consumption, general availability Security Logging service Keep traces of all the service access events Mandatory service to track down incidents of the system misuse
30
Interware Users
31
LHCb Collaboration About 600 researchers from 40 institutes
Up to 100K concurrent jobs in ~120 distinct sites Equivalent to running a virtual computing center with a power of 100K CPU cores, which corresponds roughly to ~ 1PFlops Limited mostly by available capacity Further optimizations to increase the capacity are possible Hardware, database optimizations, service load balancing, etc
32
LHCb Production system
Making use of almost all the DIRAC service combining them to support the LHCb complex workflow Based on the DIRAC Transformation System Multiple extensions and custom plug-ins Data driven payload generation based on templates Generating data processing and replication tasks LHCb specific templates and catalogs
33
Experiments: Belle II Combination of the non-grid, grid sites and (commercial) clouds is a requirement 2 GB/s, 40 PB of data per year in 2019 Belle II grid resources WLCG, OSG grids KEK Computing Center Amazon EC2 cloud Thomas Kuhr, Belle II
34
Belle II DIRAC Scalability tests Hideki Miyake, KEK
35
Community installations
ILC/CLIC detector Collaboration, Calice VO Dedicated installation at CERN, 10 servers, DB-OD MySQL server MC simulations DIRAC File Catalog was developed to meet the ILC/CLIC requirements BES III, IHEP, China Using DIRAC DMS: File Replica and Metadata Catalog, Transfer services Dataset management developed for the needs of BES III CTA CTA started as France-Grilles DIRAC service customer Now is using a dedicated installation at PIC, Barcelona Using complex workflows Geant4 Dedicated installation at CERN Validation of MC simulation software releases DIRAC evaluations by other experiments LSST, Auger, TREND, Daya Bay, Juno, ELI, NICA, … Evaluations can be done with general purpose DIRAC services
36
Virtual Imaging Platform
Platform for medical image simulations at CREATIS, Lyon Example of a combined use of an Application Portal and DIRAC WMS Web portal with robot certificate File transfers, user/group/application management Workflow engine Generate jobs, (re-)submit, monitor, replicate DIRAC Resource provisioning, job scheduling Grid resources biomed VO LFC
37
DIRAC as a service DIRAC client is easy to install
Part of a usual tutorial DIRAC services are easy to install but Needs dedicated hardware for hosting Configuration, maintenance needs expert manpower Monitoring computing resources is a tedious every-day task Small user communities can not afford maintaining dedicated DIRAC services Still need easy access to computing resources Large grid infrastructures can provide DIRAC services for their users.
38
National services DIRAC services are provided by several National Grid Initiatives: France, Spain, Italy, UK, China, Romania … Support for small communities Heavily used for training and evaluation purposes Example: France-Grilles DIRAC service Hosted by the CC/IN2P3, Lyon Distributed administrator team 5 participating universities 15 VOs, ~100 registered users In production since May 2012 >12M jobs executed in the last year At ~90 distinct sites
39
DIRAC4EGI service In production since 2014 DIRAC4EGI activity snapshot
Partners Operated by EGI Hosted by CYFRONET, Krakow DIRAC Project providing software, consultancy 10 Virtual Organizations enmr.eu vlemed eiscat.se fedcloud.egi.eu training.egi.eu … Usage > 6 million jobs processed in the last year DIRAC4EGI activity snapshot
40
Multi-VO DIRAC services
Multi-VO DIRAC services are used by user communities in various scientific domains Life sciences, for example: Virtual Imaging Project (VIP): analysis of medical images WeNMR – molecular dynamics modeling Climate studies (Eiscat-3D project) Complex systems Services are also used for training programs to help users start using large computing infrastructures
41
At universities Several universities use DIRAC services for their local users Barcelona University Students course in distributed computing M3AMU Project at Aix-Marseille University Integration of computing resources available in the university units T2 grid site at Centre de Physique des Particlues HPC Mesocenter Small clusters and file servers in different laboratories Integration is done using DIRAC Interware Support for biomedical applications Student training programs
42
PRUE resources integration
Integration of various computing resources can be also done in Plekhanov University and its partners Cloud resources of the LCTBDA laboratory Cloud resources at JINR, Dubna Other computing and storage resources available at PRUE’s units Campus volunteer grid ? Can be a good project operated by the students community Work in progress to make a list of applications that can benefit from the integrated computing infrastructure Support for porting and running applications in the integrated computing resources is the essential part of the project
43
Conclusions http://diracgrid.org
Computational grids, clouds, supercomputers and volunteer grids are no more something exotic, they are used in a daily work for various applications The interware technology allows integration of different kinds of grids, clouds and other computing and storage resources transparently for the users DIRAC Interware provides a framework for building distributed computing systems and a rich set of ready to use services. This is used now in a number of DIRAC service projects on a regional and national levels Services based on DIRAC technologies can help users to get started in the world of distributed computations and reveal its full potential
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.