Published by Harry Neal. Modified over 8 years ago.
1 DIRAC Data Management Components
A. Tsaregorodtsev, CPPM, Marseille
DIRAC review panel meeting, 15 November 2005, CERN
2 Outline
- DIRAC Data Management Components
  - Storage Element
  - File Catalogs
  - Replica Manager
3 DM Components
- DIRAC Data Management tools are built on top of, or provide interfaces to, the existing services
- The main components are:
  - Storage Element and Storage access plug-ins
  - Replica Manager
  - File Catalogs
4 DM Components
[Diagram: DIRAC Data Management Components. Labels: Data Management Clients (UserInterface, WMS, TransferAgent); ReplicaManager; FileCatalogA, FileCatalogB, FileCatalogC; StorageElement; SE Service; AAAStorage, BBBStorage, CCCStorage; Physical storage]
5 Storage Element
- DIRAC StorageElement is an abstraction of a storage facility
- The actual access to storage is provided by plug-in modules, one for each available access protocol
- The Storage Element properties are fully determined by its description in the Configuration Service
- Pluggable transport modules: srm, gridftp, bbftp, sftp, http, …
- SRM-like functionality for protocol (TURL) resolution
6 Storage components
7 Storage Element description
- Each Storage Element can be described with several access points
  - Each access point has a full description, sufficient to construct the corresponding URL
  - Including host, ports, paths
- A Storage Element is identified by its name
  - Aliases are possible
  - Very useful for defining generic Storage Elements with local specification, e.g. Tier1SE, LogSE

    [CERN_Log]
    SEName = CERN_Log
    SEHost = lxb2003.cern.ch
    SEPath = /storage
    SEProtocol = GSIFTP
    SEHost.1 = lxb2003.cern.ch
    SEPort.1 = 80
    SEPath.1 = /storage
    SEProtocol.1 = HTTP
    SEHost.2 = lxb2003.cern.ch
    SEPath.2 = /storage
    SEProtocol.2 = FTP

    [LogSE]
    SEName = CERN_Log
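The description above can be read mechanically. The following is a minimal sketch, not the DIRAC implementation: it parses the slide's configuration with Python's standard `configparser`, follows the `LogSE` alias, and builds one URL per access point. The helper names (`load_config`, `resolve_alias`, `access_points`, `build_url`) and the exact URL layout (`protocol://host[:port]/path/lfn`) are assumptions made for illustration.

```python
import configparser

# Configuration text copied from the slide.
CONFIG_TEXT = """
[CERN_Log]
SEName = CERN_Log
SEHost = lxb2003.cern.ch
SEPath = /storage
SEProtocol = GSIFTP
SEHost.1 = lxb2003.cern.ch
SEPort.1 = 80
SEPath.1 = /storage
SEProtocol.1 = HTTP
SEHost.2 = lxb2003.cern.ch
SEPath.2 = /storage
SEProtocol.2 = FTP

[LogSE]
SEName = CERN_Log
"""


def load_config(text):
    """Parse the SE descriptions, keeping option names case-sensitive."""
    parser = configparser.ConfigParser()
    parser.optionxform = str
    parser.read_string(text)
    return parser


def resolve_alias(config, name):
    """Follow SEName until a section names itself (hypothetical alias rule)."""
    seen = set()
    while config[name]["SEName"] != name:
        if name in seen:
            raise ValueError("alias loop at %s" % name)
        seen.add(name)
        name = config[name]["SEName"]
    return name


def access_points(config, name):
    """Collect the unnumbered access point, then .1, .2, ... until one is missing."""
    section = config[resolve_alias(config, name)]
    points = []
    index = 0
    while True:
        suffix = "" if index == 0 else ".%d" % index
        if "SEProtocol" + suffix not in section:
            break
        points.append({
            "protocol": section["SEProtocol" + suffix].lower(),
            "host": section["SEHost" + suffix],
            "port": section.get("SEPort" + suffix),  # None when no port is given
            "path": section["SEPath" + suffix],
        })
        index += 1
    return points


def build_url(point, lfn):
    """Assumed URL layout: protocol://host[:port]/path/lfn."""
    port = ":%s" % point["port"] if point["port"] else ""
    return "%s://%s%s%s%s" % (
        point["protocol"], point["host"], port, point["path"], lfn)


config = load_config(CONFIG_TEXT)
for point in access_points(config, "LogSE"):
    print(build_url(point, "/lhcb/user/a/atsareg/file.dat"))
```

Because `LogSE` only carries a name, changing the host or path of `CERN_Log` changes every URL built through the alias, which is the point of keeping the description in one place.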
8 Storage Element usage
- The Storage Element is used mostly to get access to the files
- It gives the Replica Manager (see below) a choice of all the available protocols, leaving it to decide which is the best one in the current context
- The file PFN is always constructed on the fly
  - No dependency on a possible change of the SE end-point: it is sufficient to change it in the Configuration Service
9 Storage plug-ins
- Plug-in modules are available for multiple protocols: srm, gridftp, bbftp, sftp, http, …
- The modules provide all the operations needed to manage the physical name space of the storage:
  - Creating directories
  - Uploading and getting files and entire directories
  - Removing files and directories
  - Checking existence
  - Getting file sizes and stat parameters
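One way to picture the plug-in idea is a common interface with one implementation per protocol. The sketch below is illustrative only: `StoragePlugin` and `LocalFileStorage` are hypothetical names, the method set simply mirrors the operations listed above, and the only "protocol" implemented is the local file system.

```python
import os
import shutil
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical interface covering the operations listed on the slide."""

    @abstractmethod
    def make_directory(self, path): ...

    @abstractmethod
    def put_file(self, local_path, remote_path): ...

    @abstractmethod
    def get_file(self, remote_path, local_path): ...

    @abstractmethod
    def remove(self, path): ...

    @abstractmethod
    def exists(self, path): ...

    @abstractmethod
    def size(self, path): ...


class LocalFileStorage(StoragePlugin):
    """Stand-in plug-in that maps the storage name space onto a local directory."""

    def __init__(self, root):
        self.root = root

    def _full(self, path):
        # Storage paths are absolute within the SE; anchor them under root.
        return os.path.join(self.root, path.lstrip("/"))

    def make_directory(self, path):
        os.makedirs(self._full(path), exist_ok=True)

    def put_file(self, local_path, remote_path):
        target = self._full(remote_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(local_path, target)

    def get_file(self, remote_path, local_path):
        shutil.copyfile(self._full(remote_path), local_path)

    def remove(self, path):
        full = self._full(path)
        if os.path.isdir(full):
            shutil.rmtree(full)
        else:
            os.remove(full)

    def exists(self, path):
        return os.path.exists(self._full(path))

    def size(self, path):
        return os.path.getsize(self._full(path))
```

A plug-in for a remote protocol would implement the same six methods by calling that protocol's client, so code written against `StoragePlugin` would not need to change.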
10 File Catalogs
- DIRAC Data Management was designed to work with multiple File Catalogs
  - No clear choice in the beginning
  - Turned out to be useful for redundancy and reliability
  - Easy to incorporate specialized services exposing a catalog interface
- All the catalogs have identical client APIs
  - Can be used interchangeably
11 File Catalogs (2)
Available catalogs:
- BK database replica tables
  - Limited to just the production data
  - No hierarchy
  - Continues to be used for the time being
- AliEn File Catalog
  - Was very useful for redundancy and gaining experience
  - Discontinued now
- LCG File Catalog (LFC)
  - Currently the baseline choice
  - See Juan’s presentation
- POOL XML catalog
  - Simple XML file based catalog
- Processing Database File Catalog
  - Exposes the Processing DB Datafiles and Replicas tables as a File Catalog
  - Made it easy to populate the Processing DB
12 Replica Manager
- The Replica Manager provides a high-level API for all the data management operations in DIRAC:
  - Uploading to Storage Elements and registering files
  - Getting files from the Storage Elements
  - Replication of the files
  - File removal
- Replica Manager users do not usually have to deal with the File Catalogs or Storage Elements directly
13 Replica Manager components
14 RM functionality
- Keeps a list of active File Catalogs
  - The initial list is obtained from the Configuration Service at construction time
  - All registration operations are applied to all the catalogs
  - Queries are also sent to all the catalogs; the result is the union of all the query results
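The "register everywhere, query everywhere, return the union" behaviour can be sketched in a few lines. This is an illustration under assumptions, not DIRAC code: `InMemoryCatalog` and `CatalogFanOut` are hypothetical classes, and a replica is modelled as an SE name mapped to a PFN.

```python
class InMemoryCatalog:
    """Toy catalog; every catalog exposes the same two methods."""

    def __init__(self, name):
        self.name = name
        self._replicas = {}  # LFN -> {SE name: PFN}

    def add_replica(self, lfn, se_name, pfn):
        self._replicas.setdefault(lfn, {})[se_name] = pfn

    def get_replicas(self, lfn):
        return dict(self._replicas.get(lfn, {}))


class CatalogFanOut:
    """Applies registrations to all catalogs and merges query results."""

    def __init__(self, catalogs):
        self.catalogs = list(catalogs)

    def register(self, lfn, se_name, pfn):
        for catalog in self.catalogs:
            catalog.add_replica(lfn, se_name, pfn)

    def replicas(self, lfn):
        merged = {}
        for catalog in self.catalogs:
            # Union of all answers; on a conflict the later catalog wins
            # (an arbitrary choice made for this sketch).
            merged.update(catalog.get_replicas(lfn))
        return merged
```

Because the catalogs share one interface, adding or dropping a catalog only changes the list passed to `CatalogFanOut`.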
15 RM functionality: File replication
- File replication details are handled entirely by the Replica Manager:
    dirac-rm-replicate /lhcb/user/a/atsareg/file.dat PIC_Castor
- Inside the Replica Manager:
  - It will find a replica which can be copied to the destination SE by a third-party transfer
  - If no replica can be copied by third-party transfer, the file is first copied to the local cache and then transferred to the destination
  - The new PFN is determined by the initial replica PFN
  - The new replica is registered in all the active catalogs
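The decision described above (third-party transfer if possible, otherwise through the local cache) can be written as a small planning function. This is a sketch under assumptions: `plan_replication` and the `third_party_ok` callback are hypothetical, the step names are invented for illustration, and the source SE names in the usage lines are made up (only `PIC_Castor` appears in the slide). How the new PFN is derived from the initial one is not specified in the slides, so it is left out.

```python
def plan_replication(replicas, dest_se, third_party_ok):
    """Return the ordered steps needed to create a replica on dest_se.

    replicas       -- dict mapping source SE name to PFN
    dest_se        -- name of the destination SE
    third_party_ok -- callable (source_se, dest_se) -> bool
    """
    if not replicas:
        raise ValueError("no replica to copy from")

    # Prefer any source that can push directly to the destination.
    for source_se in sorted(replicas):
        if third_party_ok(source_se, dest_se):
            return [("third_party_copy", source_se, dest_se),
                    ("register", dest_se)]

    # Otherwise stage through the local cache, then upload.
    source_se = sorted(replicas)[0]
    return [("copy_to_cache", source_se),
            ("upload_from_cache", dest_se),
            ("register", dest_se)]


# Hypothetical source SEs; third-party transfer works only from "SE_B".
replicas = {"SE_A": "pfn-a", "SE_B": "pfn-b"}
print(plan_replication(replicas, "PIC_Castor", lambda src, dst: src == "SE_B"))
print(plan_replication(replicas, "PIC_Castor", lambda src, dst: False))
```

The final `register` step stands for registration in all active catalogs.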
16 RM functionality: Getting files from the storage
- The Replica Manager applies the “best replica” strategy to get the files:
  - Checks if a replica exists on any local storage
    - The local SE is usually defined in the local configuration or in the global SiteToLocalSEMapping section
  - Attempts to get the file with some local protocol (file or rfio)
  - In case of failure, a remote protocol (e.g. gridftp) is tried on the local SE
  - If no replica is found on the local SE, or getting it fails, replicas on the remote SEs are tried one by one until a download succeeds
- It is possible to define in the configuration that a symbolic link is created instead of a physical copy for given protocols
  - Can be useful for the file or rfio protocols
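The ordering of attempts in the "best replica" strategy can be made explicit. The sketch below is an assumption-laden illustration: `ordered_attempts` and `get_file` are hypothetical names, the download itself is a callback, and the protocol lists default to the examples given on the slide.

```python
def ordered_attempts(replica_ses, local_ses,
                     local_protocols=("file", "rfio"),
                     remote_protocols=("gridftp",)):
    """List (SE, protocol) pairs in the order they should be tried."""
    attempts = []
    # 1. Local SEs: local protocols first, then remote protocols.
    for se in replica_ses:
        if se in local_ses:
            attempts.extend((se, proto) for proto in local_protocols)
            attempts.extend((se, proto) for proto in remote_protocols)
    # 2. Remote SEs, one by one, with remote protocols only.
    for se in replica_ses:
        if se not in local_ses:
            attempts.extend((se, proto) for proto in remote_protocols)
    return attempts


def get_file(replica_ses, local_ses, try_download):
    """Return the first (SE, protocol) that succeeds, or None if all fail."""
    for se, protocol in ordered_attempts(replica_ses, local_ses):
        if try_download(se, protocol):
            return se, protocol
    return None
```

For example, with replicas on a remote SE and a local SE, the local SE is tried with `file`, `rfio`, and `gridftp` before the remote SE is contacted at all.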
17 Replica Manager usage
- Command line tools:
  - dirac-rm-copyAndRegister [ ]
    - If a GUID is not specified, the Replica Manager assigns one derived from the file checksum
  - dirac-rm-get
  - dirac-rm-replicate
  - dirac-rm-remove
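The slides do not say how the GUID is derived from the checksum. One possible scheme, shown purely as an assumption, is to take the 128-bit MD5 digest of the file contents and format it as a GUID; `guid_from_file` is a hypothetical helper, not a DIRAC function.

```python
import hashlib
import uuid


def guid_from_file(path, chunk_size=1024 * 1024):
    """Derive a GUID-formatted string from the MD5 checksum of a file.

    Assumed scheme for illustration: the 32 hex digits of the digest are
    reinterpreted as a 128-bit UUID value.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return str(uuid.UUID(hex=digest.hexdigest())).upper()
```

A checksum-derived GUID has the property that uploading the same content twice yields the same identifier, which makes accidental duplicates easy to spot.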
18 Replica Manager usage (2)
- Job workflow
  - Downloading input sandbox/data
  - Uploading output sandbox/data
- Transfer Agent
- All the operations are performed through the Replica Manager
- The Replica Manager returns a result dictionary with a full log of all the performed operations
  - Status codes
  - Timings
- The Transfer Agent performs retries based on the Replica Manager logs
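A result dictionary carrying a per-step log, and a caller that retries based on it, might look like the following. This is a sketch only: the keys `OK`, `Log`, `Step`, `Status`, and `Time`, and the functions `run_operation` and `retry`, are assumptions chosen for illustration rather than the actual DIRAC structures.

```python
import time


def run_operation(steps):
    """Run (name, callable) steps in order and log status and timing for each."""
    log = []
    ok = True
    for name, action in steps:
        start = time.time()
        try:
            action()
            status = "OK"
        except Exception as error:
            status = "Failed: %s" % error
            ok = False
        log.append({"Step": name, "Status": status,
                    "Time": time.time() - start})
        if not ok:
            break  # later steps depend on earlier ones
    return {"OK": ok, "Log": log}


def retry(make_steps, max_attempts=3):
    """Re-run the operation until its result reports success or attempts run out."""
    results = []
    for _ in range(max_attempts):
        result = run_operation(make_steps())
        results.append(result)
        if result["OK"]:
            break
    return results
```

Keeping every attempt's result lets the caller inspect which step failed and how long each one took before deciding whether another retry is worthwhile.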
19 Status and outlook
- The Replica Manager was introduced when the LCG utilities were not yet available
  - More functionality should now move to lcg-utils, making the Replica Manager a thin layer
- Better “best replica” strategies can be elaborated
- Interface to the FTS service?