
1 DIRAC Data Management Components
A. Tsaregorodtsev, CPPM, Marseille
DIRAC review panel meeting, 15 November 2005, CERN

2 Outline
• DIRAC Data Management Components
  • Storage Element
  • File Catalogs
  • Replica Manager

3 DM Components
• DIRAC Data Management tools are built on top of, or provide interfaces to, the existing services
• The main components are:
  • Storage Element and storage access plug-ins
  • Replica Manager
  • File Catalogs

4 DM Components
[Architecture diagram: DIRAC Data Management Components. Data Management clients (UserInterface, WMS, TransferAgent), the ReplicaManager, the File Catalogs (FileCatalogA, FileCatalogB, FileCatalogC), the StorageElement with its protocol plug-ins (AAAStorage, BBBStorage, CCCStorage), the SE Service, and the physical storage.]

5 Storage Element
• DIRAC StorageElement is an abstraction of a storage facility
• The actual access to storage is provided by plug-in modules, one for each available access protocol
• The Storage Element properties are fully determined by its description in the Configuration Service
• Pluggable transport modules: srm, gridftp, bbftp, sftp, http, … (see the sketch below)
• SRM-like functionality for protocol (TURL) resolution
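
To make the plug-in idea concrete, here is a minimal Python sketch of protocol-based dispatch; the class names, the PLUGINS table and make_storage are illustrative stand-ins, not the actual DIRAC modules.

```python
class GridFTPStorage:
    """Illustrative transport plug-in for the gridftp protocol."""
    def __init__(self, host, port, path):
        self.host, self.port, self.path = host, port, path

class HTTPStorage:
    """Illustrative transport plug-in for the http protocol."""
    def __init__(self, host, port, path):
        self.host, self.port, self.path = host, port, path

# Protocol name -> plug-in class; supporting a new protocol means
# adding one entry here, the StorageElement code itself does not change.
PLUGINS = {"gridftp": GridFTPStorage, "http": HTTPStorage}

def make_storage(protocol, host, port, path):
    """Instantiate the plug-in that serves the requested protocol."""
    try:
        return PLUGINS[protocol.lower()](host, port, path)
    except KeyError:
        raise ValueError("No plug-in for protocol: %s" % protocol)
```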

6 Storage components
[Diagram slide: the storage component stack.]

7 Storage Element description
• Each Storage Element can be described with several access points
• Each access point has a full description sufficient to construct the corresponding URL
  • Including host, ports, paths
• A Storage Element is identified by its name
• Aliases are possible (see the sketch after the example)
  • Very useful for defining generic Storage Elements with a local specification, e.g. Tier1SE, LogSE

[CERN_Log]
SEName = CERN_Log
SEHost = lxb2003.cern.ch
SEPath = /storage
SEProtocol = GSIFTP
SEHost.1 = lxb2003.cern.ch
SEPort.1 = 80
SEPath.1 = /storage
SEProtocol.1 = HTTP
SEHost.2 = lxb2003.cern.ch
SEPath.2 = /storage
SEProtocol.2 = FTP

[LogSE]
SEName = CERN_Log
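
A small sketch of how such a description with aliases might be resolved, assuming a dictionary-shaped view of the configuration above; the AccessPoints layout and the resolve_se helper are hypothetical, not the actual Configuration Service schema.

```python
# Mirrors the [CERN_Log] / [LogSE] example above in dictionary form.
CONFIG = {
    "CERN_Log": {
        "SEName": "CERN_Log",
        "AccessPoints": [
            {"Protocol": "GSIFTP", "Host": "lxb2003.cern.ch", "Path": "/storage"},
            {"Protocol": "HTTP",   "Host": "lxb2003.cern.ch", "Port": 80, "Path": "/storage"},
            {"Protocol": "FTP",    "Host": "lxb2003.cern.ch", "Path": "/storage"},
        ],
    },
    "LogSE": {"SEName": "CERN_Log"},   # alias: points at the real SE
}

def resolve_se(name):
    """Follow SEName aliases until the full SE description is found."""
    section = CONFIG[name]
    while section.get("SEName", name) != name:
        name = section["SEName"]
        section = CONFIG[name]
    return section

print([ap["Protocol"] for ap in resolve_se("LogSE")["AccessPoints"]])
# -> ['GSIFTP', 'HTTP', 'FTP']
```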

8 Storage Element usage
• The Storage Element is used mostly to get access to the files
• It offers the Replica Manager (see below) the choice of all the available protocols, leaving it to decide which one is best in the current context
• The file PFN is always constructed on the fly (see the sketch below)
  • No dependency on a possible change of the SE end-point: it is sufficient to change it in the Configuration Service
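
A sketch of the on-the-fly PFN construction under the same assumed access-point layout; construct_pfn is illustrative, and the real code may format URLs differently.

```python
def construct_pfn(access_point, lfn):
    """Build a PFN from the current SE description plus the file's LFN.

    Because the PFN is never stored, a change of end-point only
    requires updating the SE description in the Configuration Service.
    """
    url = "%s://%s" % (access_point["Protocol"].lower(), access_point["Host"])
    if "Port" in access_point:
        url += ":%d" % access_point["Port"]
    return url + access_point["Path"] + lfn

ap = {"Protocol": "GSIFTP", "Host": "lxb2003.cern.ch", "Path": "/storage"}
print(construct_pfn(ap, "/lhcb/user/a/atsareg/file.dat"))
# gsiftp://lxb2003.cern.ch/storage/lhcb/user/a/atsareg/file.dat
```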

9 Storage plug-ins
• Plug-in modules are available for multiple protocols:
  • srm, gridftp, bbftp, sftp, http, …
• The modules provide all the operations needed to manage the physical name space of the storage (interface sketched below):
  • Creating directories
  • Uploading and getting files and entire directories
  • Removing files and directories
  • Checking existence
  • Getting file sizes and stat parameters
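
The operation set above suggests a common interface that every transport module implements. A hedged sketch, with method names that paraphrase the list rather than quote DIRAC's actual API:

```python
from abc import ABC, abstractmethod

class StoragePlugin(ABC):
    """Interface each protocol module (srm, gridftp, ...) implements."""

    @abstractmethod
    def makeDirectory(self, path): ...

    @abstractmethod
    def putFile(self, local_path, remote_path): ...

    @abstractmethod
    def getFile(self, remote_path, local_path): ...

    @abstractmethod
    def removeFile(self, remote_path): ...

    @abstractmethod
    def exists(self, remote_path): ...

    @abstractmethod
    def getFileSize(self, remote_path): ...
```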

10 File Catalogs
• DIRAC Data Management was designed to work with multiple File Catalogs
  • There was no clear choice in the beginning
  • This turned out to be useful for redundancy and reliability
  • It is easy to incorporate specialized services exposing a catalog interface
• All the catalogs have identical client APIs (see the sketch below)
  • They can be used interchangeably
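
The toy classes below only illustrate the shared-API idea; the addReplica/getReplicas method names are assumptions, and a real LFC client of course talks to a remote service rather than a local dictionary.

```python
class XMLCatalog:
    """Toy stand-in for a simple file-based catalog."""
    def __init__(self):
        self._replicas = {}    # LFN -> set of PFNs
    def addReplica(self, lfn, pfn):
        self._replicas.setdefault(lfn, set()).add(pfn)
    def getReplicas(self, lfn):
        return set(self._replicas.get(lfn, ()))

class LFCCatalog(XMLCatalog):
    # A real implementation would talk to the LFC service; here it only
    # needs to expose the same API to be usable wherever XMLCatalog is.
    pass

# Identical client APIs: the caller never cares which catalog it holds.
for catalog in (XMLCatalog(), LFCCatalog()):
    catalog.addReplica("/lhcb/user/a/atsareg/file.dat",
                       "gsiftp://lxb2003.cern.ch/storage/file.dat")
    print(catalog.getReplicas("/lhcb/user/a/atsareg/file.dat"))
```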

11 File Catalogs (2)
• Available catalogs:
  • BK database replica tables
    • Limited to just the production data
    • No hierarchy
    • Continues to be used for the time being
  • AliEn File Catalog
    • Was very useful for redundancy and for gaining experience
    • Discontinued now
  • LCG File Catalog (LFC)
    • Currently the baseline choice (see Juan's presentation)
  • POOL XML catalog
    • A simple XML file based catalog
  • Processing Database File Catalog
    • Exposes the Processing DB Datafiles and Replicas tables as a File Catalog
    • Made it easy to populate the Processing DB

12 Replica Manager
• The Replica Manager provides a high-level API for all the data management operations in DIRAC:
  • Uploading files to Storage Elements and registering them
  • Getting files from the Storage Elements
  • Replicating files
  • Removing files
• Replica Manager users usually do not have to deal with the File Catalogs or Storage Elements directly.

13 Replica Manager components
[Diagram slide: the Replica Manager component stack.]

14 RM functionality
• Keeps a list of active File Catalogs
  • The initial list is obtained from the Configuration Service at construction time
• All registration operations are applied to all the catalogs
• Queries are also done against all the catalogs; the result is the union of all the query results (see the sketch below)
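
A minimal sketch of this fan-out behaviour, reusing the assumed addReplica/getReplicas catalog API from the earlier sketch; MultiCatalog is an illustrative name, not a DIRAC class.

```python
class MultiCatalog:
    """Fan out writes to every active catalog, union reads across them."""

    def __init__(self, catalogs):
        # In DIRAC the initial list comes from the Configuration Service
        # at construction time; here it is simply passed in.
        self.catalogs = list(catalogs)

    def addReplica(self, lfn, pfn):
        for catalog in self.catalogs:      # registration goes to all catalogs
            catalog.addReplica(lfn, pfn)

    def getReplicas(self, lfn):
        result = set()
        for catalog in self.catalogs:      # query all, return the union
            result |= catalog.getReplicas(lfn)
        return result
```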

15 RM functionality: File replication
• File replication details are handled entirely by the Replica Manager:
  dirac-rm-replicate /lhcb/user/a/atsareg/file.dat PIC_Castor
• Inside the Replica Manager (sketched below):
  • It will find a replica which can be copied to the destination SE by a third-party transfer
  • If no replica can be copied by third-party transfer, the file is first copied to the local cache and then transferred to the destination
  • The new PFN is determined by the initial replica PFN
  • The new replica is registered in all the active catalogs
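
The replication steps can be summarised in pseudocode-style Python; the transfer primitives are injected as callables because the real ones live in the storage plug-ins, and every name here is an illustrative stand-in.

```python
def replicate(lfn, replicas, dest_se, third_party_copy, copy_via_cache, catalogs):
    """replicas: list of source PFNs; returns the new replica's PFN."""
    new_pfn = None
    # 1. Prefer a replica that can be third-party copied to dest_se;
    #    the new PFN is derived from the initial replica's PFN.
    for pfn in replicas:
        new_pfn = third_party_copy(pfn, dest_se)   # None on failure
        if new_pfn:
            break
    # 2. Otherwise stage through the local cache, then upload.
    if new_pfn is None:
        new_pfn = copy_via_cache(replicas[0], dest_se)
    # 3. Register the new replica in all the active catalogs.
    for catalog in catalogs:
        catalog.addReplica(lfn, new_pfn)
    return new_pfn
```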

16 RM functionality: Getting files from the storage
• The Replica Manager applies a "best replica" strategy to get the files (sketched below):
  • Checks if a replica exists on any local storage
    • The local SE is usually defined in the local configuration or in the global SiteToLocalSEMapping section
    • Attempts to get the file with some local protocol (file or rfio)
    • In case of failure a remote protocol (e.g. gridftp) is tried on the local SE
  • If no replica is found on the local SE, or getting it fails, the replicas on the remote SEs are tried one by one until the download succeeds
• It is possible to define in the configuration that a symbolic link is created instead of a physical copy for given protocols:
  • Can be useful for the file or rfio protocols
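
A sketch of the strategy as described, assuming the replicas come as an SE-name-to-PFN mapping and a single attempt(pfn, protocol) download primitive; both are simplifications of the real interfaces.

```python
LOCAL_PROTOCOLS = ("file", "rfio")
REMOTE_PROTOCOLS = ("gridftp",)

def get_file(lfn, replicas, local_ses, attempt):
    """replicas: {se_name: pfn}; attempt(pfn, protocol) -> local path or None."""
    # 1. Prefer replicas on a local SE: local protocols first, then a
    #    remote protocol on the same SE if those fail.
    for se in local_ses:
        if se in replicas:
            for protocol in LOCAL_PROTOCOLS + REMOTE_PROTOCOLS:
                path = attempt(replicas[se], protocol)
                if path:
                    return path
    # 2. Fall back to remote SEs, tried one by one until one succeeds.
    for se, pfn in replicas.items():
        if se in local_ses:
            continue
        path = attempt(pfn, REMOTE_PROTOCOLS[0])
        if path:
            return path
    return None
```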

17 Replica Manager usage
• Command line tools:
  • dirac-rm-copyAndRegister [ ]
    • If the GUID is not specified, the Replica Manager assigns one derived from the file checksum
  • dirac-rm-get
  • dirac-rm-replicate
  • dirac-rm-remove

18 Replica Manager usage (2)
• Job workflow
  • Downloading the input sandbox/data
  • Uploading the output sandbox/data
• Transfer Agent
  • All the operations are performed through the Replica Manager
  • The Replica Manager returns a result dictionary with a full log of all the performed operations (see the sketch below)
    • Status codes
    • Timings
  • The Transfer Agent performs retries based on the Replica Manager logs
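
What such a result dictionary, and a retry loop on top of it, could look like; the "OK"/"Operations" keys and the helper names are assumptions, not DIRAC's actual schema.

```python
import time

def timed_operation(name, func, *args):
    """Run one operation and record its status code and timing."""
    start = time.time()
    try:
        func(*args)
        status = "Done"
    except Exception as exc:
        status = "Failed: %s" % exc
    return {"Operation": name, "Status": status,
            "Time": time.time() - start}

def transfer_with_retries(operation, args, max_retries=3):
    """Retry an operation based on the logged status, the way the
    Transfer Agent retries using the Replica Manager's log."""
    log = []
    for _ in range(max_retries):
        record = timed_operation(operation.__name__, operation, *args)
        log.append(record)
        if record["Status"] == "Done":
            break
    return {"OK": log[-1]["Status"] == "Done", "Operations": log}
```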

19 Status and outlook
• The Replica Manager was introduced when the LCG utilities were not yet available
• More functionality should now move to the lcg-utils, making the Replica Manager a thin layer
• Better "best replica" strategies can be elaborated
• Interface to the FTS service?

