INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Data Management System Jean Salzemann CNRS/IN2P3 ACGRID School, Hanoi (Vietnam) November 6th,

Presentation transcript:

INFSO-RI Enabling Grids for E-sciencE Data Management System Jean Salzemann CNRS/IN2P3 ACGRID School, Hanoi (Vietnam) November 6th, 2007 Credits: Giuseppe Misurelli

Outline
– Grid Data Management Challenge
– Storage Elements and SRM
– LFC File Catalog
– Data Movement Utils

Grid DM Challenge

The Grid DM Challenge /1
Heterogeneous
– Need: data are stored on different storage systems using different technologies.
– Requirement: a common interface to storage resources, in order to hide the underlying complexity.
– Solution: Storage Resource Manager (SRM) interface; (gLite File I/O Server)
Distributed
– Need: data are stored in different locations; in most cases there is no shared file system or common namespace.
– Requirement: data need to be moved between different locations, and their locations need to be tracked.
– Solutions: File Transfer Service (FTS), to move files among Grid sites; Catalog, to keep track of where data are stored.
Data Retrieving
– Need: applications are located in different places from where data are stored.
– Requirement: a scheduled, reliable file transfer service.
– Solutions: File Transfer Service, Data Scheduler, File Placement Service, Transfer Agent, File Transfer Library
Security
– Need: data must be managed according to the VO membership access control policy.
– Requirement: a centralized access control service.
– Solution: File Authorization Service

The Grid DM Challenge /2
DM works with files. This assumption rests on the following reasons:
– the semantics of a file are well understood by everyone
– a file is the smallest granularity of data.

File services
File access patterns:
– Write once, read many
– Rare append-only updates with one owner
– Frequently updated at one source: replicas check/pull the new version
– (NOT frequent updates, many users, many sites)
File naming:
– Users mostly see the “logical file name” (LFN)
– LFN must be unique:
  • includes logical directory name
  • in a VO namespace
– E.g. /gLite/myVOname.org/runs/12aug05/data1.res

Data Management Services
Storage Element: common interface to storage
– Storage Resource Manager: Castor, dCache, DPM, …
– POSIX-I/O: gLite-I/O
– Native access protocols: rfio, dcap
– Transfer protocols: gsiftp
Catalogs: keep track of where data are stored
– File Catalog
– Replica Catalog
– File Authorization Service
– Metadata Catalog
– Implementations: LFC, Metadata Catalog (e.g. AMGA)
File Transfer: schedules reliable file transfer
– Data Scheduler
– File Transfer Service
– Implementations: lcg-utils, gLite FTS

SE and SRM

SRM in an example /1
She is running a job which needs:
– Data for physics event reconstruction
– Simulated data
– Some data analysis files
She will write files remotely too.
– They are at CERN
– In dCache
– They are at Fermilab
– In a disk array
– They are at Nikhef
– In a classic SE

SRM in an example /2
– dCache: own system, own protocols and parameters
– Castor: no connection with dCache or DPM
– gLite DPM: independent system from dCache or Castor
You as a user need to know all the systems!!!
SRM:
– I talk to them on your behalf
– I will even allocate space for your files
– And I will use transfer protocols to send your files there

Storage Resource Management
Data are stored on disk pool servers or Mass Storage Systems.
Storage resource management needs to take into account:
– Transparent access to files (migration to/from disk pool)
– File pinning
– Space reservation
– File status notification
– Life time management
The SRM (Storage Resource Manager) takes care of all these details:
– The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world
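The "single interface" idea can be illustrated with a small Python sketch. It is not the SRM protocol or any gLite API: the names `StorageBackend`, `DiskPool`, `TapeBackedStore`, `reserve_space`, `put` and `store_everywhere` are hypothetical and only show how client code can stay unaware of which storage system sits behind the interface.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical common interface; real SRM operations are richer."""

    @abstractmethod
    def reserve_space(self, nbytes: int) -> bool:
        """Return True if nbytes can be stored."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> str:
        """Store data and return a backend-specific locator."""


class DiskPool(StorageBackend):
    """Toy disk pool with a fixed capacity in bytes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.files = {}

    def reserve_space(self, nbytes: int) -> bool:
        used = sum(len(d) for d in self.files.values())
        return used + nbytes <= self.capacity

    def put(self, name: str, data: bytes) -> str:
        if not self.reserve_space(len(data)):
            raise IOError("not enough space in the pool")
        self.files[name] = data
        return "disk:" + name


class TapeBackedStore(StorageBackend):
    """Toy mass storage: files land on a disk cache, then migrate to tape."""

    def __init__(self) -> None:
        self.disk_cache = {}
        self.tape = {}

    def reserve_space(self, nbytes: int) -> bool:
        return True  # assumption for the sketch: tape space is unlimited

    def put(self, name: str, data: bytes) -> str:
        self.disk_cache[name] = data
        return "tape:" + name

    def migrate(self) -> None:
        """Move everything from the disk cache to tape."""
        self.tape.update(self.disk_cache)
        self.disk_cache.clear()


def store_everywhere(backends, name: str, data: bytes) -> list:
    """Client code talks only to the common interface."""
    return [b.put(name, data) for b in backends if b.reserve_space(len(data))]
```

The caller of `store_everywhere` never needs to know whether a backend is a disk pool or a tape system, which is the role the slides assign to the SRM.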

gLite SE types /1
gLite 3.0 data access protocols:
– File Transfer: GSIFTP (GridFTP)
– File I/O (Remote File access): gsidcap, insecure RFIO, secured RFIO (gsirfio)
Classic SE:
– GridFTP server
– Insecure RFIO daemon (rfiod): file access limited to the LAN
– Single disk or disk array
– No quota management
– Does not support the SRM interface

gLite SE types /2
Mass Storage Systems (Castor)
– Files migrated between front-end disk and back-end tape storage hierarchies
– GridFTP server
– Insecure RFIO (Castor)
– Provides an SRM interface with all its benefits
Disk pool managers (dCache and gLite DPM)
– Manage distributed storage servers in a centralized way
– Physical disks or arrays are combined into a common (virtual) file system
– Disks can be dynamically added to the pool
– GridFTP server
– Secure remote access protocols (gsidcap for dCache, gsirfio for DPM)
– SRM interface

File Catalog and DM Tools

Files & replicas: Naming Conventions
Logical File Name (LFN)
– An alias created by a user to refer to some item of data, e.g. “lfn:cms/ /run2/track1”
Globally Unique Identifier (GUID)
– A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
– The location of an actual piece of data on a storage system, e.g.
  “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
  “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
Transport URL (TURL)
– Temporary locator of a replica + access protocol: understood by a SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
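The four kinds of name can be told apart by their prefix. The sketch below is a hypothetical helper, not part of gLite. It classifies a string by its scheme, and it assumes that the transport schemes are the access and transfer protocols named elsewhere in these slides.

```python
# Schemes treated as transport protocols (assumption: taken from the
# protocol lists in these slides, not from an official registry).
TRANSPORT_SCHEMES = {"rfio", "gsirfio", "dcap", "gsidcap", "kdcap", "gsiftp", "file"}


def classify_grid_name(name: str) -> str:
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' based on the scheme prefix."""
    scheme, sep, _rest = name.partition(":")
    if not sep:
        raise ValueError("no scheme prefix in %r" % name)
    scheme = scheme.lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme in ("srm", "sfn"):
        return "SURL"
    if scheme in TRANSPORT_SCHEMES:
        return "TURL"
    raise ValueError("unknown scheme %r" % scheme)


if __name__ == "__main__":
    for example in (
        "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
        "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
        "sfn://lxshare0209.cern.ch/data/alice/ntuples.dat",
        "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
    ):
        print(classify_grid_name(example), example)
```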

LFC - Description
Features
– User exposed transaction API
– Hierarchical namespace and namespace operations
– Integrated GSI Authentication and Authorization
– Access Control Lists (Unix Permissions and POSIX ACLs)
– Checksums
– Supported database backends: Oracle and MySQL
Provides
– Bulk operations
– Cursors for large queries
– Timeouts and retries for client operations

LFC - Architecture
– LFC stores both logical and physical mappings for the file in the same database, which speeds up operations
– Treats all entities as files in a UNIX-like filesystem; the file API is also similar to UNIX (create(), mkdir(), chown(), …)
– Hierarchical namespace of LFNs mapped to the GUIDs
– GUIDs mapped to the physical locations of file replicas in the storage
– System attributes of files (creation time, file size, checksum, …) stored as LFN attributes
– One field for user-defined metadata
– Multiple LFNs per GUID allowed as symbolic links to the primary LFN
Schema shown on the slide:
– File Metadata: Logical File Name (LFN), GUID, System Metadata (ACLs, Ownership, etc.)
– Symlinks: Link name
– User Metadata: User defined Metadata
– File Replica: Storage File Name, Storage Host
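The mappings described above (LFN to GUID, GUID to replicas, symbolic links to a primary LFN) can be modelled in memory with a few lines of Python. This is a toy model for illustration only. `ToyCatalog`, `CatalogEntry` and their methods are hypothetical names, and the real LFC keeps this information in an Oracle or MySQL database behind its own API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One file: a GUID, system attributes, one user field, replicas."""
    guid: str
    size: int = 0
    checksum: str = ""
    user_metadata: str = ""  # the single user-defined metadata field
    replicas: list = field(default_factory=list)  # SURLs of the replicas


class ToyCatalog:
    def __init__(self) -> None:
        self._by_lfn = {}    # primary LFN -> CatalogEntry
        self._symlinks = {}  # link name -> primary LFN

    def register(self, lfn: str, guid: str, surl: str) -> None:
        """Register a replica under a primary LFN, creating the entry if needed."""
        entry = self._by_lfn.setdefault(lfn, CatalogEntry(guid=guid))
        if entry.guid != guid:
            raise ValueError("LFN already registered with another GUID")
        if surl not in entry.replicas:
            entry.replicas.append(surl)

    def symlink(self, link: str, target: str) -> None:
        """Add a further LFN as a symbolic link to a primary LFN."""
        if target not in self._by_lfn:
            raise KeyError(target)
        self._symlinks[link] = target

    def list_replicas(self, lfn: str) -> list:
        """Resolve a primary LFN or a symlink to the list of replica SURLs."""
        primary = self._symlinks.get(lfn, lfn)
        return list(self._by_lfn[primary].replicas)
```

Because logical and physical mappings live in the same structure, resolving an LFN to its replicas is a single lookup, which mirrors the "same database" point made above.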

File Catalog and DM Tools

GFAL: Grid File Access Library
Interactions with SE require some components:
→ File catalog services to locate replicas
→ SRM
→ File access mechanism to access files from the SE on the WN
GFAL does all these tasks for you:
→ Hides all these operations
→ Presents a POSIX interface for the I/O operations
→ User can create all commands needed for storage management
→ It also offers an interface to SRM
Supported protocols:
→ file (local or nfs-like access)
→ dcap, gsidcap and kdcap (dCache access)
→ rfio (castor access) and gsirfio (dpm)

lcg-utils DM tools
High level interface (CL tools and APIs) to:
– Upload/download files to/from the Grid (UI, CE and WN SEs)
– Replicate data between SEs and locate the best replica available
– Interact with the file catalog
Definition: A file is considered to be a Grid File if it is both physically present in a SE and registered in the File Catalog
– lfc commands to interact with file catalog features
– lcg-utils commands ensure the consistency between files in the Storage Elements and entries in the File Catalog

LFC commands
LFC Catalog commands
lfc-chmod        Change access mode of the LFC file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-delcomment   Delete the comment associated with the file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment

lfc-ls
Listing the entries of an LFC directory
lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
where path specifies the LFN pathname (mandatory)
– Remember that LFC has a directory tree structure
  • /grid/ / (slide callouts: “LFC Namespace”, “Defined by the user”)
  • All members of a VO have read-write permissions for their own directory
– You can set LFC_HOME to use relative paths
> lfc-ls /grid/gilda/misurelli
> export LFC_HOME=/grid/gilda
> lfc-ls -l misurelli
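The effect of LFC_HOME on relative paths can be mimicked in Python. The helper below is a hypothetical illustration of the behaviour described above and is not code from the LFC client. It assumes that a relative path is simply joined onto LFC_HOME and that absolute paths ignore it.

```python
import posixpath
from typing import Optional


def resolve_lfc_path(path: str, lfc_home: Optional[str] = None) -> str:
    """Return the absolute catalog path for an absolute or relative LFN path."""
    if path.startswith("/"):
        # Absolute paths are used as they are; LFC_HOME plays no role.
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    # Relative paths are interpreted below LFC_HOME.
    return posixpath.normpath(posixpath.join(lfc_home, path))


if __name__ == "__main__":
    print(resolve_lfc_path("misurelli", "/grid/gilda"))
```

With LFC_HOME set to /grid/gilda, the relative path `misurelli` resolves to the same directory as the absolute path used in the first command of the slide.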

lfc-mkdir
Creating directories in the LFC
lfc-mkdir [-m mode] [-p] path...
where path specifies the LFC pathname
Remember that when registering a new file (using lcg-cr, for example) the corresponding destination directory must already have been created in the catalog:
– lfc-mkdir /grid/gilda/misurelli/practise
– lfc-ls -l /grid/gilda/misurelli

lcg-utils commands
Replica Management
lcg-cp    Copies a grid file to a local destination
lcg-cr    Copies a file to a SE and registers the file in the catalog
lcg-del   Deletes one file
lcg-rep   Replication between SEs and registration of the replica
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-sd    Sets file status to “Done” for a given SURL in a SRM request
File Catalog Interaction
lcg-aa    Adds an alias in LFC for a given GUID
lcg-ra    Removes an alias in LFC for a given GUID
lcg-rf    Registers in LFC a file placed in a SE
lcg-uf    Unregisters in LFC a file placed in a SE
lcg-la    Lists the aliases for a given SURL, GUID or LFN
lcg-lg    Gets the GUID for a given LFN or SURL
lcg-lr    Lists the replicas for a given GUID, SURL or LFN

lcg-utils: lcg-cr
Upload a file to a SE and register it into the catalog
lcg-cr -d dest_file | dest_host [-g guid] [-l lfn] [-v | --verbose] --vo vo src_file
where:
– dest_host is the fully qualified hostname of the destination SE
– dest_file is a valid SURL (both sfn:// and srm:// formats are valid)
– guid specifies the Grid Unique IDentifier. If this option is not present, a GUID is generated internally
– lfn specifies the Logical File Name associated with the file
– vo specifies the Virtual Organization the user belongs to
– src_file specifies the source file name: the protocol can be file:/// or gsiftp:///
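As a sketch of how the synopsis above fits together, the following Python helper assembles an lcg-cr argument list using only the options listed on the slide. The function `build_lcg_cr`, the host `se.example.org` and the file paths are hypothetical. The helper only builds the command and does not run it, so it can be tried without a Grid installation.

```python
import shlex


def build_lcg_cr(vo, src_file, dest, lfn=None, guid=None, verbose=False):
    """Build an lcg-cr argument list from the options in the synopsis.

    dest is either a destination SE hostname or a SURL (sfn:// or srm://).
    """
    if not src_file.startswith(("file:", "gsiftp:")):
        raise ValueError("src_file must use the file: or gsiftp: protocol")
    cmd = ["lcg-cr", "--vo", vo, "-d", dest]
    if guid:
        cmd += ["-g", guid]  # otherwise a GUID is generated internally
    if lfn:
        cmd += ["-l", lfn]
    if verbose:
        cmd.append("-v")
    cmd.append(src_file)
    return cmd


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_lcg_cr(
        vo="gilda",
        src_file="file:///home/user/data1.res",
        dest="se.example.org",
        lfn="lfn:/grid/gilda/misurelli/practise/data1.res",
    )
    print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where lcg-utils is installed and a valid proxy exists; that step is left out here on purpose.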

Advanced utilities: gridftp commands
Used for low level management of files/directories in SEs
edg-gridftp-exists TURL             Checks if file/dir exists on a SE
edg-gridftp-ls TURL                 Lists a directory on a SE
globus-url-copy srcTURL dstTURL     Copies files between SEs
edg-gridftp-mkdir TURL              Creates a directory on a SE
edg-gridftp-rename srcTURL dstTURL  Renames a file on a SE
edg-gridftp-rm TURL                 Removes a file from a SE
edg-gridftp-rmdir TURL              Removes a directory on a SE

Globus-url-copy
globus-url-copy srcTURL destTURL
– low level file transfer
Interaction with RLS components
– edg-lrc command (actions on LRC)
– edg-rmc command (actions on RMC)
– C++ and Java API for all catalog operations
Using low level CLI and API is STRONGLY discouraged
– Risk: losing consistency between SEs and catalogues
– REMEMBER: a file is in Grid if it is BOTH:
  • stored in a Storage Element
  • registered in the file catalog
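The consistency rule can be expressed as a set comparison. The sketch below is a hypothetical check, not a gLite tool. It assumes two lists of SURLs are already available, one taken from the catalog and one from a listing of the Storage Element, and it reports which files satisfy both conditions.

```python
def check_consistency(catalog_surls, storage_surls):
    """Compare SURLs known to the catalog with SURLs present on the SE."""
    catalog = set(catalog_surls)
    storage = set(storage_surls)
    return {
        # Both stored and registered: proper Grid files.
        "grid_files": sorted(catalog & storage),
        # Registered but missing on the SE, e.g. removed with a low level tool.
        "dangling_entries": sorted(catalog - storage),
        # Stored but never registered, e.g. copied with a low level tool.
        "orphan_files": sorted(storage - catalog),
    }


if __name__ == "__main__":
    # Hypothetical SURLs, for illustration only.
    report = check_consistency(
        ["srm://se.example.org/a", "srm://se.example.org/b"],
        ["srm://se.example.org/b", "srm://se.example.org/c"],
    )
    for kind, surls in report.items():
        print(kind, surls)
```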

References
– gLite documentation homepage
– LFC and DPM documentation: https://uimon.cern.ch/twiki/bin/view/LCG/DataManagementDocumentation

Questions…