Managing Large Sets of Heterogeneous Computing Resources, Part 2
A.Tsaregorodtsev, leading research engineer, CPPM-IN2P3-CNRS, Marseille; chief research scientist, LCTBDA (ЛОТАБД) department, Plekhanov Russian University of Economics (PRUE)
Moscow, 6-7 October 2016

Part 2
- Managing large volumes of data: Interware approach, heterogeneous storage systems, distributed logical file system
- User Interfaces
- Development framework
- Examples of usage: large scientific communities, DIRAC as a general-purpose service, putting together resources at PRUE and other universities

Big Data
- Data that exceeds the boundaries and sizes of normal processing capabilities, forcing a non-traditional approach to its treatment
- Google Trends comparison: Big Data, Cloud Computing, Grid Computing

DM problem to solve
- There are many different formats in which data can be stored: structured and non-structured databases, object stores, file systems, etc.
- The DM problem addressed by DIRAC:
  - Data is partitioned into files
  - File replicas are distributed over a number of Storage Elements worldwide
- Data Management tasks:
  - Initial file upload
  - Catalog registration
  - File replication
  - File access/download
  - Integrity checking
  - File removal
- Requirements:
  - Transparent file access for users, often working with many (tens of thousands of) files at a time
  - Make sure that ALL the elementary operations are accomplished
  - Automate recurrent operations

Definitions
- Storage Element (SE): an internet service providing remote access to a data storage system
- Access Protocol: a programming interface for interacting with a remote service
- Access Control List (ACL): a set of data access rules for users and groups of users

Storage plugins
- Storage Element abstraction with a client implementation for each access protocol:
  - DIPS: the DIRAC data transfer protocol
  - FTP, HTTP, WebDAV
  - SRM, XROOTD, RFIO, DCAP, etc.: HEP-center specific protocols, accessed through the gfal2 library developed at CERN
  - S3, Swift, CDMI: cloud-specific data access protocols
- As with Computing Elements, each SE is seen by the clients as a logical entity, with some specific operational properties (archive, limited access, etc.)
- SEs can be configured with multiple protocols
- Supporting a new data access technology only requires writing a new plug-in (see the sketch below)
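To make the plug-in mechanism concrete, here is a minimal, purely illustrative sketch in Python. The class and registry names are hypothetical, not the actual DIRAC storage classes: every protocol exposes the same small interface, and adding a new access technology amounts to registering one more plug-in class.

```python
# Illustrative plug-in pattern (hypothetical names, not the DIRAC classes):
# each access protocol implements the same storage interface.
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Common interface that every protocol plug-in must implement."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    @abstractmethod
    def put_file(self, local_path, remote_path):
        """Upload a local file to the storage endpoint."""

    @abstractmethod
    def get_file(self, remote_path, local_path):
        """Download a file from the storage endpoint."""


class DIPSStorage(StoragePlugin):
    """Plug-in speaking the DIRAC data transfer protocol."""

    def put_file(self, local_path, remote_path):
        print("DIPS upload %s -> %s%s" % (local_path, self.endpoint, remote_path))

    def get_file(self, remote_path, local_path):
        print("DIPS download %s%s -> %s" % (self.endpoint, remote_path, local_path))


class GFAL2Storage(StoragePlugin):
    """Plug-in that would delegate to the gfal2 library (SRM, XROOTD, WebDAV, ...)."""

    def put_file(self, local_path, remote_path):
        print("gfal2 upload %s -> %s%s" % (local_path, self.endpoint, remote_path))

    def get_file(self, remote_path, local_path):
        print("gfal2 download %s%s -> %s" % (self.endpoint, remote_path, local_path))


# Registry keyed by protocol name: a new technology = one new entry here.
PLUGINS = {"dips": DIPSStorage, "srm": GFAL2Storage, "root": GFAL2Storage}


def make_storage(protocol, endpoint):
    """Return a storage client for the requested protocol."""
    return PLUGINS[protocol](endpoint)


if __name__ == "__main__":
    se = make_storage("srm", "srm://se.example.org")
    se.put_file("/tmp/data.raw", "/vo/user/data.raw")
```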

Storage Element Proxy
- The SE Proxy Service translates the DIRAC data transfer protocol into a particular storage protocol
  - Uses DIRAC authentication towards the client
  - Uses credentials specific to the target storage system towards the storage
- Gives access to storages for which no access libraries are installed on a given client machine: only the DIRAC or HTTP protocol is needed
- Allows third-party-like transfers between otherwise incompatible storages

File Catalog Service
- The File Catalog is a service keeping track of all the physical file replicas in all the SEs
- It also stores file properties: size, creation/modification time stamps, ownership, checksums, user ACLs
- DIRAC relies on a central File Catalog:
  - Defines a single logical name space for all the managed data
  - Organizes files hierarchically, as in common file systems
- Other projects, e.g. distributed file systems, keep file data in multiple distributed databases: more scalable, but maintaining data integrity is very difficult

File Catalog Interware
- DIRAC, as for other components, defines an abstraction of a File Catalog service with several implementations:
  - LCG File Catalog (LFC): the de facto standard grid catalog (now obsolete)
  - AliEn File Catalog: the catalog developed by the ALICE experiment at the LHC at CERN
  - DIRAC File Catalog (DFC): the File Catalog implementation of the DIRAC Project itself
- Several catalogs can be used together
  - The same mechanism is used to send messages to "pseudo-catalog" services: the Transformation Service (see later) and community-specific services, e.g. the Bookkeeping Service of LHCb
  - A user sees them as a single catalog with additional features

Combined data API
- Together with the data access components, the DFC allows presenting data to users as a single global file system
  - It can even be mounted as a file system partition on a user's computer (FSDIRAC project)
- The DataManager API is the single client interface for logical data operations
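A minimal sketch of what such logical data operations look like through the DataManager client, assuming an installed and configured DIRAC client with a valid user proxy. The LFN and Storage Element names are hypothetical, and the method names follow the DIRAC v6-era API, so details may differ between releases.

```python
# Hedged sketch: logical data operations via the DataManager client.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialize the DIRAC configuration

from DIRAC.DataManagementSystem.Client.DataManager import DataManager

dm = DataManager()
lfn = "/vo.example.org/user/a/auser/test.dat"  # hypothetical logical file name

# Upload a local file to a Storage Element and register it in the catalog
result = dm.putAndRegister(lfn, "./test.dat", "EXAMPLE-USER-SE")
if not result["OK"]:
    print("Upload failed:", result["Message"])

# Create an additional replica on another SE and register it
dm.replicateAndRegister(lfn, "EXAMPLE-ARCHIVE-SE")

# Download the file by its logical name, wherever a replica is available
dm.getFile(lfn)
```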

File Catalog: Metadata
- The DFC is both a Replica and a Metadata Catalog
  - User-defined metadata
  - The same hierarchy for metadata as for the logical name space: metadata is associated with files and directories
  - Allows for efficient searches
  - Efficient storage usage reports, suitable for user quotas
- Example query:
  find /lhcb/mcdata LastAccess < 01-01-2012 GaussVersion=v1,v2 SE=IN2P3,CERN Name=*.raw
- The result of a file search is a precise list of the corresponding files, unlike a Google index
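The same kind of metadata query can be issued from Python through the File Catalog client. This is a hedged sketch: the method names follow the DFC client API of that period, and the directory path and metadata key are hypothetical.

```python
# Hedged sketch: user metadata and metadata queries with the DFC client.
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

fc = FileCatalogClient()

# Attach user-defined metadata to a directory in the logical name space
fc.setMetadata("/lhcb/mcdata/prod01", {"GaussVersion": "v1"})

# Find all files matching a metadata selection: the result is an exact
# list of logical file names, not a ranked index as in a web search
result = fc.findFilesByMetadata({"GaussVersion": "v1"}, path="/lhcb/mcdata")
if result["OK"]:
    print(len(result["Value"]), "matching files")
```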

Massive data operations
- DIRAC is dealing with large volumes of scientific data: tens of Petabytes, 10^7-10^8 files and directories
- There is a need for massive (bulk) operations, for example:
  - Replicate 10^5 files from SE A to SE B
  - Remove 10^5 files and all their replicas in all the storages
- Massive data operations require:
  - Asynchronous execution
  - Automatic failure recovery
  - Data integrity checking

Request Management System for asynchronous operations
- The Request Management System (RMS) receives requests for any kind of operation and executes them asynchronously:
  - Data upload and registration
  - Job status and parameter reports
- The RMS is used heavily as part of the failure recovery procedure: any operation that can fail can be deferred to the RMS
- Requests are collected by RMS instances at geographically distributed sites, giving extra redundancy in RMS service availability
- Requests are forwarded to the central Request Database, for keeping track of the pending requests and for efficient bulk request execution
- RequestExecution agents execute the stored requests, with multiple retries if necessary, until the operation is successful
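As an illustration, deferring a replication to the RMS looks roughly like the sketch below. It assumes a configured DIRAC client and a valid proxy; the request name, target SE and LFN are hypothetical, and the class and attribute names follow the DIRAC v6-era RMS client, so they may differ in other releases.

```python
# Hedged sketch: building a request that the RMS will retry until it succeeds.
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.RequestManagementSystem.Client.Request import Request
from DIRAC.RequestManagementSystem.Client.Operation import Operation
from DIRAC.RequestManagementSystem.Client.File import File
from DIRAC.RequestManagementSystem.Client.ReqClient import ReqClient

request = Request()
request.RequestName = "replicate-run-1234"  # hypothetical request name

op = Operation()
op.Type = "ReplicateAndRegister"            # executed by an RMS operation handler
op.TargetSE = "EXAMPLE-ARCHIVE-SE"          # hypothetical Storage Element

f = File()
f.LFN = "/vo.example.org/data/run1234/file001.raw"  # hypothetical LFN
op.addFile(f)
request.addOperation(op)

# Store the request in the central Request Database; RequestExecution
# agents will pick it up and retry it until the operation is successful.
ReqClient().putRequest(request)
```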

Transformation System for data-driven workflows
- Data-driven workflows are built as chains of data transformations
  - A transformation is an input data filter plus a recipe to create tasks
  - Tasks are created as soon as data with the required properties is registered in the system
  - Tasks: jobs, data replication operations, etc.
- Transformations can be used for automatic, data-driven bulk data operations (see the conceptual sketch below)
  - Scheduling RMS tasks
  - Often as part of a more general workflow
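The idea can be shown with a small, purely conceptual sketch; this is not the DIRAC Transformation API, and all names and structures are invented for illustration. A transformation couples a metadata filter on newly registered files with a recipe that groups matching files into tasks.

```python
# Conceptual sketch of a data-driven transformation (not the DIRAC API):
# an input-data filter plus a recipe turning matching files into tasks.
def matches(file_meta, data_filter):
    """Input data filter: every key/value pair must match the file metadata."""
    return all(file_meta.get(k) == v for k, v in data_filter.items())


def run_transformation(registered_files, data_filter, make_task, group_size=2):
    """Create a task for every complete group of newly registered matching files."""
    selected = [lfn for lfn, meta in registered_files if matches(meta, data_filter)]
    tasks = []
    for i in range(0, len(selected) - len(selected) % group_size, group_size):
        tasks.append(make_task(selected[i:i + group_size]))
    return tasks


if __name__ == "__main__":
    registered = [
        ("/cta/mc/gamma/run1.simtel", {"particle": "gamma", "site": "Paranal"}),
        ("/cta/mc/proton/run2.simtel", {"particle": "proton", "site": "Paranal"}),
        ("/cta/mc/gamma/run3.simtel", {"particle": "gamma", "site": "Paranal"}),
        ("/cta/mc/gamma/run4.simtel", {"particle": "gamma", "site": "Paranal"}),
    ]
    tasks = run_transformation(
        registered,
        data_filter={"particle": "gamma"},
        make_task=lambda lfns: {"type": "ReplicateAndRegister", "files": lfns},
    )
    print(tasks)  # one replication task per complete group of gamma files
```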

Bulk data transfers
- Replication/removal requests with multiple files are stored in the RMS by users, data managers or the Transformation System
- The Request Executing Agent invokes a Replication Operation executor, which either performs the replication itself or delegates it to an external service, e.g. the File Transfer Service (FTS)
- A dedicated FTSManager service keeps track of the submitted FTS requests
- The FTSMonitor agent follows the request progress and updates the File Catalog with the new replicas
- Other data moving services can be connected as needed: EUDAT, OneData

Other data services
- Data logging: keeping a history of all operations on a given file
- Data provenance: keeping ancestor-descendant relations for each file
- Data integrity: collecting reports on all data access failures; automated data recovery and validation
- Storage usage reports: storage resource consumption at any moment; they help Data Managers and allow imposing user quotas
- Accounting: storage consumption, data transfer traffic and error rates

Interfaces

DM interfaces
- Command line tools: multiple dirac-dms-... commands
- COMDIRAC: represents the logical DIRAC file namespace as a parallel shell
  - dls, dcd, dpwd, dfind, ddu, etc. commands
  - dput, dget, drepl for file upload/download/replication
- REST interface: suitable for use with application portals; the WS-PGRADE portal is interfaced with DIRAC this way

Web Portal
- Desktop paradigm for the DIRAC Web interface
- Intuitive for most users

Web Portal applications

DIRAC for CTA: DIRAC File Catalog
- In use since 2012 in parallel with the LFC; full migration to the DFC in summer 2015
- More than 21 M replicas registered
- About 10 metadata fields defined to characterize MC datasets
- Query example:
  cta-prod3-query --site=Paranal --particle=gamma --tel_sim_prog=simtel --array_layout=hex --phiP=180 --thetaP=20 --outputType=Data
- Typical queries return several hundreds of thousands of files
- (Slide figures: DFC web interface, catalog browsing, metadata selection, query result; L.Arrabito, LUPM)

Distributed Computer
- From the user perspective, DIRAC aims at providing the abstraction of a single computer for massive computational and data operations:
  - Logical Computing and Storage Elements (hardware)
  - Global logical name space (file system)
  - Desktop-like GUI
- Applications can be given graphical user interfaces within the DIRAC Web Portal framework

DIRAC Framework

DIRAC Framework
- DIRAC systems consist of well-defined components with clear recipes for developing them:
  - Services: passive components reacting to client requests; they keep their state in a database
  - Agents: light, permanently running distributed components that animate the whole system
  - Clients: APIs used in user interfaces as well as in agent-service and service-service communications
  - Commands: small applications using the DIRAC APIs to perform a single well-defined operation
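To make the component model concrete, here is a hedged sketch of a trivial Service handler and the corresponding client call, following the pattern of the DIRAC developer tutorials of that era. The "Framework/Hello" system/service name is hypothetical, and the handler runs inside a DIRAC service container rather than as a standalone script.

```python
# Hedged sketch of a DIRAC Service handler (runs inside a service container).
from DIRAC import S_OK
from DIRAC.Core.DISET.RequestHandler import RequestHandler


class HelloHandler(RequestHandler):
    """A minimal Service: reacts to client requests, keeps no local state."""

    types_sayHello = [str]  # argument type checking for the exported RPC call

    def export_sayHello(self, name):
        # Exported methods return S_OK / S_ERROR structures
        return S_OK("Hello %s" % name)


# --- client side (from an agent, a command or a user script) -------------
# from DIRAC.Core.DISET.RPCClient import RPCClient
# rpc = RPCClient("Framework/Hello")   # hypothetical System/Service name
# result = rpc.sayHello("world")
# if result["OK"]:
#     print(result["Value"])
```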

DIRAC Framework
- All the communications between the distributed components are secure
  - DISET: the custom client/service protocol, with a focus on efficiency
  - Used both for control and for data transfer communications
  - X.509, GSI security standards
- Users and services are provided with digital certificates
- User certificate proxies (passwordless, time-limited temporary certificate copies) are used for distributed operations on the user's behalf
- Fine-grained service access authorization rules
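A hedged sketch of how a client-side script can inspect the proxy it will act with, assuming a proxy has already been created (e.g. with dirac-proxy-init); the exact dictionary keys may vary between DIRAC releases.

```python
# Hedged sketch: inspecting the user proxy used for secure DIRAC operations.
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Core.Security.ProxyInfo import getProxyInfo

result = getProxyInfo()
if result["OK"]:
    info = result["Value"]
    print("DIRAC user:", info.get("username"))
    print("DIRAC group:", info.get("group"))
    print("Proxy time left (s):", info.get("secondsLeft"))
```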

DIRAC Framework
- Standard rules to create a DIRAC extension: LHCbDIRAC, BESDIRAC, ILCDIRAC, ...
  - Just create the services specific to the community workflow
  - Software releases, distribution and discovery are handled by the DIRAC framework
  - Base services are reused for standard tasks: configuration, monitoring, logging, etc.
- A large part of the functionality is implemented as plugins
  - Almost the whole DFC service is implemented as a collection of plugins
  - This allows customizing DIRAC for a particular application with minimal effort
- Examples:
  - Support for datasets first added in BESDIRAC
  - LHCb has a custom Directory Tree module in the DIRAC File Catalog
  - A custom workflow (massive job execution) for the CTA astrophysics experiment

DIRAC base services
- Configuration Service
  - Provides service discovery and setup parameters for all the DIRAC components
  - The backbone of any distributed computing system: must be 100% available
  - Multiple redundancy with automatically synchronized slave Configuration Servers
  - A single master server ensures the integrity of the configuration data
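A hedged sketch of how a component reads its settings through the Configuration Service client. /DIRAC/Setup is a standard option, while the option path under /Systems below is hypothetical and only shows the shape of such lookups.

```python
# Hedged sketch: reading options from the Configuration Service via gConfig.
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC import gConfig

# A standard top-level option
setup = gConfig.getValue("/DIRAC/Setup", "unknown")

# A hypothetical service URL used here only to illustrate service discovery
fc_url = gConfig.getValue("/Systems/DataManagement/Production/URLs/FileCatalog", "")

print("Setup:", setup)
print("File Catalog URL:", fc_url or "(not defined)")
```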

Accounting
- Comprehensive accounting of all the operations
- Publication-quality plots
- The plotting service can also be used by users for their own data

Other framework services
- Proxy Certificate Management service: proxy certificate storage and renewal; provisioning of short-lived, limited proxies to perform operations on behalf of a user
- System Logging service: collects essential error messages from all the components; generates reports and alarms for the system administrators
- Monitoring service: monitors the behavior of services and agents (load, CPU consumption, general availability)
- Security Logging service: keeps traces of all service access events; a mandatory service to track down incidents of system misuse

Interware Users

LHCb Collaboration
- About 600 researchers from 40 institutes
- Up to 100K concurrent jobs at ~120 distinct sites
  - Equivalent to running a virtual computing center with a power of 100K CPU cores, corresponding roughly to ~1 PFlops
- Limited mostly by the available capacity
- Further optimizations to increase the capacity are possible: hardware and database optimizations, service load balancing, etc.

LHCb Production system
- Makes use of almost all the DIRAC services, combining them to support the complex LHCb workflows
- Based on the DIRAC Transformation System, with multiple extensions and custom plug-ins
- Data-driven payload generation based on templates
- Generates data processing and replication tasks
- LHCb-specific templates and catalogs

Experiments: Belle II
- A combination of non-grid sites, grid sites and (commercial) clouds is a requirement
- 2 GB/s, 40 PB of data per year expected in 2019
- Belle II grid resources: the WLCG and OSG grids, the KEK Computing Center, the Amazon EC2 cloud
- Thomas Kuhr, Belle II

Belle II DIRAC scalability tests (Hideki Miyake, KEK)

Community installations
- ILC/CLIC detector collaboration, Calice VO
  - Dedicated installation at CERN: 10 servers, DB-OD MySQL server
  - MC simulations
  - The DIRAC File Catalog was developed to meet the ILC/CLIC requirements
- BES III, IHEP, China
  - Uses the DIRAC DMS: File Replica and Metadata Catalog, transfer services
  - Dataset management developed for the needs of BES III
- CTA
  - Started as a customer of the France-Grilles DIRAC service; now uses a dedicated installation at PIC, Barcelona
  - Uses complex workflows
- Geant4
  - Dedicated installation at CERN
  - Validation of MC simulation software releases
- DIRAC evaluations by other experiments: LSST, Auger, TREND, Daya Bay, JUNO, ELI, NICA, ...
  - Evaluations can be done with the general-purpose DIRAC services

Virtual Imaging Platform
- A platform for medical image simulations at CREATIS, Lyon
- An example of the combined use of an application portal and the DIRAC WMS
- Web portal with a robot certificate: file transfers, user/group/application management
- Workflow engine: generates jobs, (re-)submits, monitors, replicates
- DIRAC: resource provisioning, job scheduling
- Grid resources of the biomed VO; LFC

DIRAC as a service
- The DIRAC client is easy to install: it is part of a usual tutorial
- DIRAC services are easy to install, but:
  - They need dedicated hardware for hosting
  - Configuration and maintenance require expert manpower
  - Monitoring computing resources is a tedious everyday task
- Small user communities cannot afford to maintain dedicated DIRAC services, yet they still need easy access to computing resources
- Large grid infrastructures can provide DIRAC services for their users

National services
- DIRAC services are provided by several National Grid Initiatives: France, Spain, Italy, UK, China, Romania, ...
  - Support for small communities
  - Heavily used for training and evaluation purposes
- Example: the France-Grilles DIRAC service
  - Hosted by the CC-IN2P3, Lyon
  - Distributed administrator team: 5 participating universities
  - 15 VOs, ~100 registered users
  - In production since May 2012
  - >12M jobs executed in the last year, at ~90 distinct sites
  - http://dirac.france-grilles.fr

DIRAC4EGI service
- In production since 2014
- Partners:
  - Operated by EGI
  - Hosted by CYFRONET, Krakow
  - The DIRAC Project provides software and consultancy
- 10 Virtual Organizations: enmr.eu, vlemed, eiscat.se, fedcloud.egi.eu, training.egi.eu, ...
- Usage: >6 million jobs processed in the last year
- (Slide figure: DIRAC4EGI activity snapshot)

Multi-VO DIRAC services
- Used by user communities in various scientific domains:
  - Life sciences, for example the Virtual Imaging Platform (VIP) for the analysis of medical images and WeNMR for molecular dynamics modeling
  - Climate studies (EISCAT-3D project)
  - Complex systems
  - CERN@school
- The services are also used in training programs to help users start working with large computing infrastructures

At universities
- Several universities use DIRAC services for their local users
- Barcelona University: a students' course in distributed computing
- M3AMU project at Aix-Marseille University
  - Integration of the computing resources available in the university units: the T2 grid site at the Centre de Physique des Particules, the HPC mesocenter, small clusters and file servers in different laboratories
  - Integration is done using the DIRAC Interware
  - Support for biomedical applications
  - Student training programs

PRUE resources integration
- Integration of various computing resources can also be done at Plekhanov University and its partners:
  - Cloud resources of the LCTBDA laboratory
  - Cloud resources at JINR, Dubna
  - Other computing and storage resources available at PRUE units
  - A campus volunteer grid? This could be a good project operated by the student community
- Work is in progress to compile a list of applications that can benefit from the integrated computing infrastructure
- Support for porting and running applications on the integrated computing resources is an essential part of the project

Conclusions
- Computational grids, clouds, supercomputers and volunteer grids are no longer exotic: they are used in daily work for various applications
- The interware technology allows integrating different kinds of grids, clouds and other computing and storage resources transparently for the users
- The DIRAC Interware provides a framework for building distributed computing systems and a rich set of ready-to-use services; it is now used in a number of DIRAC service projects at regional and national levels
- Services based on DIRAC technologies can help users get started in the world of distributed computing and reveal its full potential
http://diracgrid.org