1 DIRAC Data Management Components A.Tsaregorodtsev, CPPM, Marseille DIRAC review panel meeting, 15 November 2005, CERN

2 Outline
 DIRAC Data Management Components
 Storage Element
 File Catalogs
 Replica Manager

3 DM Components
 DIRAC Data Management tools are built on top of, or provide interfaces to, existing services
 The main components are:
   Storage Element and Storage access plug-ins
   Replica Manager
   File Catalogs

4 DM Components
 [Diagram: DIRAC Data Management Components. Data Management clients (UserInterface, WMS, TransferAgent) use the ReplicaManager, which talks to the File Catalogs (FileCatalogA, FileCatalogB, FileCatalogC) and to the StorageElement / SE Service; storage access plug-ins (AAAStorage, BBBStorage, CCCStorage) connect the StorageElement to the physical storage.]

5 Storage Element
 The DIRAC StorageElement is an abstraction of a storage facility
 The actual access to storage is provided by plug-in modules, one per available access protocol
 The Storage Element properties are fully determined by its description in the Configuration Service
 Pluggable transport modules: srm, gridftp, bbftp, sftp, http, …
 SRM-like functionality for protocol (TURL) resolution

6 Storage components

7 Storage Element description
 Each Storage Element can be described with several access points
 Each access point has a full description sufficient to construct the corresponding URL
   Including host, port and path
 A Storage Element is identified by its name
 Aliases are possible
   Very useful for defining generic Storage Elements with a local specification, e.g. Tier1SE, LogSE

[CERN_Log]
SEName = CERN_Log
SEHost = lxb2003.cern.ch
SEPath = /storage
SEProtocol = GSIFTP
SEHost.1 = lxb2003.cern.ch
SEPort.1 = 80
SEPath.1 = /storage
SEProtocol.1 = HTTP
SEHost.2 = lxb2003.cern.ch
SEPath.2 = /storage
SEProtocol.2 = FTP

[LogSE]
SEName = CERN_Log
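As an illustration only, a minimal Python sketch of how one access-point description of this kind could be turned into an access URL. The dictionary keys mirror the configuration fragment above; the function itself is an assumption, not the actual DIRAC StorageElement code:

    # Sketch only: turn one access-point description into an access URL.
    # The dictionary keys mirror the configuration example above; the actual
    # DIRAC implementation may differ.

    def access_url(access_point, lfn):
        """Build a protocol-specific URL for a logical file name (LFN)."""
        protocol = access_point["SEProtocol"].lower()   # e.g. gsiftp, http, ftp
        host = access_point["SEHost"]
        port = access_point.get("SEPort")               # optional
        path = access_point["SEPath"]
        netloc = "%s:%s" % (host, port) if port else host
        return "%s://%s%s/%s" % (protocol, netloc, path, lfn.lstrip("/"))

    # Example with the second CERN_Log access point shown above:
    ap = {"SEProtocol": "HTTP", "SEHost": "lxb2003.cern.ch",
          "SEPort": "80", "SEPath": "/storage"}
    print(access_url(ap, "/lhcb/user/a/atsareg/file.dat"))
    # -> http://lxb2003.cern.ch:80/storage/lhcb/user/a/atsareg/file.dat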

8 Storage Element usage
 The Storage Element is used mostly to get access to files
 It offers the Replica Manager (see below) the choice of all the available protocols, leaving it to decide which one is best in the current context
 The file PFN is always constructed on the fly
 No dependency on a possible change of the SE end-point – it is sufficient to change it in the Configuration Service

9 Storage plug-ins
 Plug-in modules are available for multiple protocols:
   srm, gridftp, bbftp, sftp, http, …
 The modules provide all the operations needed to manage the physical name space of the storage:
   Creating directories
   Uploading and getting files and entire directories
   Removing files and directories
   Checking existence
   Getting file sizes and stat parameters
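As a purely illustrative sketch, a hypothetical Python base class capturing the operations listed above; the class and method names are assumptions, not the real DIRAC plug-in API:

    # Hypothetical storage-access plug-in interface covering the operations
    # listed on this slide.  Each protocol module (srm, gridftp, bbftp, sftp,
    # http, ...) would provide its own implementation of these calls.

    class StorageAccessPlugin:
        def __init__(self, host, port, path):
            self.host, self.port, self.path = host, port, path

        def makeDirectory(self, path):        # create directories
            raise NotImplementedError
        def putFile(self, local, remote):     # upload files or whole directories
            raise NotImplementedError
        def getFile(self, remote, local):     # download files or whole directories
            raise NotImplementedError
        def remove(self, path):               # remove files and directories
            raise NotImplementedError
        def exists(self, path):               # check existence
            raise NotImplementedError
        def getFileSize(self, path):          # file size and stat parameters
            raise NotImplementedError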

10 File Catalogs
 DIRAC Data Management was designed to work with multiple File Catalogs
   There was no clear choice in the beginning
   It turned out to be useful for redundancy and reliability
   It is easy to incorporate specialized services exposing a catalog interface
 All the catalogs have identical client APIs
   They can be used interchangeably
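A sketch of what such an identical client API could look like; the method names here are chosen for illustration only, the point being that any service implementing the same calls can be plugged in as one more catalog:

    # Illustrative common file-catalog client API.  The method names are
    # assumptions for this example; every catalog (LFC, BK replica tables,
    # POOL XML, Processing DB, ...) exposes the same calls and can therefore
    # be used interchangeably.

    class FileCatalogClient:
        def addFile(self, lfn, pfn, se, guid):    # register a new file
            raise NotImplementedError
        def addReplica(self, lfn, pfn, se):       # register an additional replica
            raise NotImplementedError
        def removeReplica(self, lfn, se):         # remove one replica entry
            raise NotImplementedError
        def getReplicas(self, lfn):               # return {SE name: PFN} for the LFN
            raise NotImplementedError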

11 File Catalogs (2)
 Available catalogs
   BK database replica tables
     Limited to just the production data
     No hierarchy
     Continues to be used for the time being
   AliEn File Catalog
     Was very useful for redundancy and for gaining experience
     Now discontinued
   LCG File Catalog – LFC
     Currently the baseline choice
     See Juan's presentation
   POOL XML catalog
     A simple XML file based catalog
   Processing Database File Catalog
     Exposes the Processing DB Datafiles and Replicas tables as a File Catalog
     Made it easy to populate the Processing DB

12 Replica Manager
 The Replica Manager provides a high-level API for all the data management operations in DIRAC:
   Uploading files to Storage Elements and registering them
   Getting files from the Storage Elements
   Replication of files
   File removal
 Replica Manager users do not usually have to deal with the File Catalogs or Storage Elements directly
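A hedged usage sketch of this high-level API: the method names and the SE name "CERN_Castor" are assumptions chosen to mirror the four operations listed above, not necessarily the exact DIRAC Replica Manager calls:

    # Usage sketch of the Replica Manager high-level API.  Method names and
    # the SE name "CERN_Castor" are illustrative assumptions.

    def example_session(rm):
        """rm: a Replica Manager instance obtained from the DIRAC client code."""
        lfn = "/lhcb/user/a/atsareg/file.dat"

        # Upload a local file to a Storage Element and register it in the catalogs
        rm.putAndRegister(lfn, "file.dat", "CERN_Castor")

        # Get a local copy of the file, using the "best replica" strategy
        rm.getFile(lfn)

        # Replicate to another Storage Element and register the new replica
        rm.replicateAndRegister(lfn, "PIC_Castor")

        # Remove the file everywhere (all replicas and all catalog entries)
        rm.removeFile(lfn)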

13 Replica Manager components

14 RM functionality
 Keeps a list of active File Catalogs
   The initial list is obtained from the Configuration Service at construction time
 All registration operations are applied to all the catalogs
 Queries are also sent to all the catalogs; the result is the union of the individual query results
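A minimal sketch of this fan-out behaviour, assuming the simplified catalog interface sketched earlier; error handling and the exact merge rules are omitted:

    # Sketch of the Replica Manager fan-out over the active catalogs:
    # registrations go to every catalog, query results are merged (union).

    class CatalogFanOut:
        def __init__(self, catalogs):
            self.catalogs = catalogs          # list of FileCatalogClient-like objects

        def addReplica(self, lfn, pfn, se):
            # Apply the registration to every active catalog
            for catalog in self.catalogs:
                catalog.addReplica(lfn, pfn, se)

        def getReplicas(self, lfn):
            # Query every catalog and return the union of the results
            replicas = {}
            for catalog in self.catalogs:
                replicas.update(catalog.getReplicas(lfn) or {})
            return replicas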

15 RM functionality: File replication
 File replication details are handled entirely by the Replica Manager:
   dirac-rm-replicate /lhcb/user/a/atsareg/file.dat PIC_Castor
 Inside the Replica Manager:
   It will find a replica which can be copied to the destination SE by a third-party transfer
   If no replica can be copied by third-party transfer, the file is first copied to the local cache and then transferred to the destination
   The new PFN is determined from the initial replica PFN
   The new replica is registered in all the active catalogs
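A sketch of this decision logic; all helper functions (get_replicas, can_third_party_copy, third_party_copy, download, upload) are hypothetical placeholders, and only the ordering of the steps described above is meant to be accurate:

    # Sketch of the replication steps described on this slide.

    def replicate(lfn, destination_se, catalogs, local_cache="/tmp"):
        replicas = get_replicas(lfn)                       # {SE: PFN} from the catalogs

        # 1. Prefer a replica that can be copied by third-party transfer
        for se, pfn in replicas.items():
            if can_third_party_copy(se, destination_se):
                new_pfn = third_party_copy(pfn, destination_se)
                break
        else:
            # 2. Otherwise go through the local cache
            source_pfn = next(iter(replicas.values()))
            local_copy = download(source_pfn, local_cache)
            new_pfn = upload(local_copy, destination_se)

        # 3. Register the new replica in all the active catalogs
        for catalog in catalogs:
            catalog.addReplica(lfn, new_pfn, destination_se)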

16 RM functionality: Getting files from the storage
 The Replica Manager applies the "best replica" strategy to get the files:
   Checks if a replica exists on any local storage
     The local SE is usually defined in the local configuration or in the global SiteToLocalSEMapping section
     Attempts to get the file with some local protocol (file or rfio)
     In case of failure, a remote protocol (e.g. gridftp) is tried on the local SE
   If no replica is found on the local SE, or getting it fails, the replicas on the remote SEs are tried one by one until the download succeeds
 It is possible to define in the configuration that a symbolic link is created instead of a physical copy for given protocols:
   Can be useful for the file or rfio protocols
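A sketch of the strategy with a hypothetical try_download helper; it reproduces the ordering above (local SE with local protocols, then a remote protocol, then the remote SEs one by one):

    # Sketch of the "best replica" strategy described on this slide.
    # try_download(pfn, protocol) is a hypothetical placeholder that returns
    # the local path on success and None on failure.

    def get_best_replica(lfn, local_se, replicas):
        """replicas: {SE name: PFN}; returns the local path of the downloaded file."""
        # 1. Try the local Storage Element first
        if local_se in replicas:
            pfn = replicas[local_se]
            for protocol in ("file", "rfio", "gridftp"):   # local protocols, then remote
                local_path = try_download(pfn, protocol)
                if local_path:
                    return local_path

        # 2. Fall back to the remote SEs, one by one, until a download succeeds
        for se, pfn in replicas.items():
            if se == local_se:
                continue
            local_path = try_download(pfn, "gridftp")
            if local_path:
                return local_path
        return None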

17 Replica Manager usage
 Command line tools
   dirac-rm-copyAndRegister [ ]
     If the GUID is not specified, the Replica Manager assigns one derived from the file checksum
   dirac-rm-get
   dirac-rm-replicate
   dirac-rm-remove

18 Replica Manager usage (2)
 Job workflow
   Downloading the input sandbox/data
   Uploading the output sandbox/data
 Transfer Agent
   All the operations are performed through the Replica Manager
   The Replica Manager returns a result dictionary with a full log of all the performed operations
     Status codes
     Timings
   The Transfer Agent performs retries based on the Replica Manager logs
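As an illustration of what such a result dictionary could look like: the key names and values below are assumptions, only the information content (a per-operation log with status codes and timings that an agent can use for retries) comes from the slide:

    # Illustrative shape of a Replica Manager result dictionary; all key
    # names and values are made up for this sketch.

    result = {
        "OK": False,
        "Operations": [
            {"Operation": "getReplicas",     "Status": "Done",   "Time": 0.4},
            {"Operation": "thirdPartyCopy",  "Status": "Failed", "Time": 12.1},
            {"Operation": "localCopy",       "Status": "Done",   "Time": 35.7},
            {"Operation": "registerReplica", "Status": "Done",   "Time": 0.2},
        ],
    }

    # e.g. a Transfer Agent could retry only the failed steps:
    to_retry = [op for op in result["Operations"] if op["Status"] == "Failed"]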

19 Status and outlook
 The Replica Manager was introduced when the LCG utilities were not yet available
 More functionality should now move to lcg-utils, making the Replica Manager a thin layer
 Better "best replica" strategies can be elaborated
 Interface to the FTS service?