Data Management Ouafa Bentaleb CERIST, Algeria


Data Management
Ouafa Bentaleb (o.bentaleb@dtri.cerist.dz), CERIST, Algeria
Africa 6, Rabat, 2011
Joint CHAIN/EUMEDGRID-Support/EPIKH School Application Porting

Outline
- gLite Data Management
  - Introduction
  - Scope of data services in gLite
  - Examples
  - Storage Elements
  - LCG File Catalog
- Data Management Practical

The overall goal of this presentation

Data Management System (DMS)
- Provides file manipulation services for users and other Grid services
- DMS enables the location, access and transfer of data
  - Users do not need to know the data location, just the logical name
  - Data is accessed through standard interfaces
  - Data can be replicated or transferred to several locations as needed
  - Data is shared within a VO

Scope of data services in gLite
- Simply put, DMS provides all the operations that all of us are used to performing:
  - Uploading/downloading files
  - Creating files/directories
  - Renaming files/directories
  - Deleting files/directories
  - Moving files/directories
  - Listing directories
  - Creating symbolic links
- Note:
  - Files are write-once, read-many
  - Files cannot be changed unless removed or replaced
  - There is no intention of providing a global file management system

Scope of data services in gLite
- Resource centers need to meet a growing demand for storage
  - Storage Elements capable of managing multiple disk pools: Disk Pool Manager (DPM), dCache, CASTOR
  - Allow users and applications (programs) to store/retrieve data (files)
- Data is stored on different storage system technologies
  - A common interface is required to hide the underlying complexity
  - Storage Resource Manager (SRM): storage management protocol
  - GridFTP: secure file transfer
- Data is stored at different locations with separate namespaces
  - A file catalogue provides a uniform view of Grid data
  - LCG File Catalog (LFC)
- Applications need to access Grid data management services
  - Data management API: GFAL

Data Management Example
[Diagram. Components: "User interface", Resource Broker, Computing, Storage Element, LCG File Catalogue (LFC). Arrow labels: Input "sandbox", Input "sandbox" + Broker Info, Output "sandbox", DataSets info. Caption: File replicated onto 2 SEs.]

Data Management Example
[Diagram. Components: "User interface", LCG File Catalogue (LFC), Storage Element 1, Storage Element 2. The file "Myfile.dat" is replicated onto 2 SEs; the catalogue entry Myfile.dat points, through its guid, to File_on_se1 and File_on_se2.]

Data Management Example
[Diagram. Components: "User interface", LCG File Catalogue (LFC), Storage Element 1, Storage Element 2.]
- Myfile.dat: the "Logical filename"
- File_on_se1, File_on_se2: the "SURL" (Storage URL) of each replica
- "GUID": Grid Unique IDentifier

Storage Element (SE)
- The Storage Element (SE) is the service which allows users and applications (programs) to store/retrieve data (files). It:
  - Provides storage space for files
  - Provides a transfer protocol (GSIFTP), i.e. a GSI-based FTP server
  - Provides an interface for the management of disk and tape storage resources: Storage Resource Manager (SRM)
- Files located in the Storage Elements (SEs):
  - Are mostly write-once, read-many
  - Are accessible by users and applications from "anywhere" in the Grid
  - Can have several replicas stored at different sites
  - Cannot be changed unless removed or replaced

A practical example (1)
[Diagram: StoRM at SiteA, dCache at SiteB, DPM at SiteC, DPM at SiteD.]
A user is working on a job which needs to:
- read Monte Carlo simulations from siteA
- read experiment data from siteB
- read environmental data from siteC
- write output to home siteD

File Naming conventions (1)
- Grid Unique IDentifier (GUID)
  - Every file has a GUID
  - A non-human-readable unique identifier, e.g.:
      guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d
  - Note: all replicas of a file share the same GUID
- Logical File Name (LFN)
  - An alias that can be used to refer to a file, e.g.:
      lfn:/grid/gilda/users/maroc/myfile.dat
[Diagram: Logical File Name 1 ... Logical File Name N all point to one GUID.]

File Naming conventions (2)
- Storage URL (SURL) or Physical File Name (PFN)
  - The location of an actual file on a storage system, e.g.:
      srm://se01.grid.arn.dz/dpm/home/eumed/project1/test.dat
  - Note: used by the system to find where the replica is physically stored
- Transport URL (TURL)
  - Complete URI with the necessary information to access a file in a SE (including the access protocol), e.g.:
      rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
[Diagram: Logical File Names 1 ... N point to one GUID; the GUID points to Physical Files SURL 1 ... SURL N, each with its own TURL.]
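The four kinds of name can be told apart by their prefix alone. The following is a minimal sketch, assuming only the prefixes shown on the two naming slides; `classify_name` is a hypothetical helper written for this illustration, not a gLite command, and it treats any other `scheme://` as a TURL.

```shell
# Classify a grid file name by its prefix.
# classify_name is a hypothetical helper, not part of gLite.
classify_name() {
    case "$1" in
        guid:*)  echo "GUID" ;;
        lfn:*)   echo "LFN" ;;
        srm://*) echo "SURL" ;;
        # Assumption: any other scheme is an access protocol, hence a TURL.
        *://*)   echo "TURL (${1%%://*})" ;;
        *)       echo "unknown" ;;
    esac
}

# Examples taken from the slides.
classify_name "guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d"
classify_name "srm://se01.grid.arn.dz/dpm/home/eumed/project1/test.dat"
classify_name "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat"
```

The order of the patterns matters: `srm://` has to be tested before the generic `*://*` case, otherwise every SURL would be reported as a TURL.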

Needles in a haystack: the File Catalogue
- How do I keep track of all the files I have on the Grid?
- Even if I remember all the LFNs of my files, what about someone else's files?
- How does the Grid keep track of the mapping between LFN(s), GUID and SURL(s)?
Answer: the File Catalogue
- LFC = LCG File Catalogue
- LCG = LHC Computing Grid
- LHC = Large Hadron Collider

File Catalogue
- The service which maintains mappings between LFN(s), GUID and SURL(s)
- It keeps track of the location of copies (replicas) of files
- It consists of a unique catalogue, where the LFN is the main key
- It looks like a "top-level" directory in the Grid
  - For each supported VO, a separate subdirectory exists under the "/grid" directory
  - All members of a given VO have read-write permissions in that directory
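The mapping the catalogue maintains can be pictured as a table with one row per replica. This is a toy sketch only: the entries, host names and the helpers `guid_of` and `replicas_of` are invented for illustration, and a real LFC is queried with the lcg-lg and lcg-lr commands instead.

```shell
# Toy catalogue: one line per replica, columns "LFN GUID SURL".
# All entries are made up; they are not real grid files.
CATALOGUE='lfn:/grid/eumed/demo/myfile.dat guid:0001 srm://se1.example.org/data/file_on_se1
lfn:/grid/eumed/demo/myfile.dat guid:0001 srm://se2.example.org/data/file_on_se2
lfn:/grid/eumed/demo/other.dat guid:0002 srm://se1.example.org/data/other'

# Print the GUID registered for an LFN (the LFN is the lookup key).
guid_of() {
    printf '%s\n' "$CATALOGUE" | awk -v lfn="$1" '$1 == lfn { print $2; exit }'
}

# Print every replica SURL registered for an LFN, one per line.
replicas_of() {
    printf '%s\n' "$CATALOGUE" | awk -v lfn="$1" '$1 == lfn { print $3 }'
}

guid_of "lfn:/grid/eumed/demo/myfile.dat"
replicas_of "lfn:/grid/eumed/demo/myfile.dat"
```

Both replicas of myfile.dat share one GUID, which is the property the naming slides describe: the GUID identifies the file, the SURLs identify its copies.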

The LFC Service
[Diagram: the User Interface asks the File Catalogue for lfn:/grid/gilda/tcaland/mpi.txt; the catalogue refers to replicas held on SE A, SE B and SE C.]

Job submission: example 1
Small files: InputSandbox / OutputSandbox
[Diagram: User Interface, WMS, CE, Worker Nodes.]

Data Management: example 2
[Diagram: User Interface, WMS, CE, Worker Nodes, LFC, SE, SE.]

LFC Commands
Summary of the LFC Catalog commands:
    lfc-chmod        Change access mode of the LFC file/directory
    lfc-chown        Change owner and group of the LFC file/directory
    lfc-delcomment   Delete the comment associated with the file/directory
    lfc-getacl       Get file/directory access control lists
    lfc-ln           Make a symbolic link to a file/directory
    lfc-ls           List file/directory entries in a directory
    lfc-mkdir        Create a directory
    lfc-rename       Rename a file/directory
    lfc-rm           Remove a file/directory
    lfc-setacl       Set file/directory access control lists
    lfc-setcomment   Add/replace a comment

lcg utils commands
Replica Management:
    lcg-cp    Copies a grid file to a local destination
    lcg-cr    Copies a file to a SE and registers the file in the catalog
    lcg-del   Deletes one file
    lcg-rep   Replication between SEs and registration of the replica
    lcg-gt    Gets the TURL for a given SURL and transfer protocol
    lcg-sd    Sets file status to "Done" for a given SURL in a SRM request
File Catalog Interaction:
    lcg-aa    Adds an alias in LFC for a given GUID
    lcg-ra    Removes an alias in LFC for a given GUID
    lcg-rf    Registers in LFC a file placed in a SE
    lcg-uf    Unregisters in LFC a file placed in a SE
    lcg-la    Lists the aliases for a given SURL, GUID or LFN
    lcg-lg    Gets the GUID for a given LFN or SURL
    lcg-lr    Lists the replicas for a given GUID, SURL or LFN

References
- gLite documentation homepage: http://glite.web.cern.ch/glite/documentation/default.asp
- DM subsystem documentation: http://egee-jra1-dm.web.cern.ch/egee-jra1-dm/doc.htm
- LFC and DPM documentation: https://twiki.cern.ch/twiki/bin/view/LCG/DataManagementDocumentation
- gLite Data Management Tutorial: https://grid.ct.infn.it/twiki/bin/view/GILDA/DataManagement#Create_directory

Let's practice!
Reference: https://grid.ct.infn.it/twiki/bin/view/GILDA/DataManagement

Environment Variables
- Pointing to the right BDII:
    echo $LCG_GFAL_INFOSYS
    export LCG_GFAL_INFOSYS=bdii.eumedgrid.eu:2170
- Pointing to the right LFC:
    echo $LFC_HOST
    export LFC_HOST=lfc.grid.arn.dz
Note: the shell does not accept spaces around "=" in an assignment.
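Before running any data management command it can help to confirm that both variables are actually set in the current shell. A minimal sketch follows; `check_grid_env` is a hypothetical helper, and the two host values are the ones given on the slide.

```shell
# Report whether the two variables used by the data management tools are set.
# check_grid_env is a hypothetical helper, not part of gLite.
check_grid_env() {
    status=0
    for name in LCG_GFAL_INFOSYS LFC_HOST; do
        # Read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}

export LCG_GFAL_INFOSYS=bdii.eumedgrid.eu:2170
export LFC_HOST=lfc.grid.arn.dz
check_grid_env
```

The function returns a non-zero status when either variable is missing, so it can be used as a guard at the top of a script.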

Before starting...
Make sure you have a proxy created:
    voms-proxy-info -all
    voms-proxy-init --voms eumed

LFC: creating a directory
- List directories:
    lfc-ls /grid/eumed/tutorials/
- Create your own personal directory inside /grid/eumed/tutorials/:
    lfc-mkdir /grid/eumed/tutorials/epikh
- You can check the creation by typing:
    lfc-ls /grid/eumed/tutorials/

Copying and registering a file 1/2
- lcg-cr copies a file to a SE and registers the file in the catalogue:
    lcg-cr --vo <vo name> -l <LFN destination> -d <SE> <local file>
- Make sure you have a directory in the LFC (/grid/eumed/tutorials/yourname/)
- Use the lcg-info or lcg-infosites commands to find the available SEs
- This command returns the GUID of your file

Copying and registering a file 2/2
    lcg-infosites --vo eumed se

    Avail Space(Kb)  Used Space(Kb)  Type  SEs
    ----------------------------------------------------------
    16241327         3464020         n.a   se01.grid.arn.dz
    2390269393       97444791        n.a   se02.grid.arn.dz
    24377757         13973700        n.a   se03.grid.arn.dz
    24377757         13973700        n.a   gilda-02.pd.infn.it
    80605789         4411630         n.a   sirius-se.ct.infn.it
    4089114112       704958582       n.a   grisuse.scope.unina.it

    lcg-cr --vo eumed -l lfn:/grid/eumed/tutorials/epikh/test.txt -d se01.grid.arn.dz file://$HOME/test.txt
    guid:0d8ef3b9-7f73-4c57-80c9-e827bace8597
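Choosing a destination SE from that listing can be automated. The sketch below assumes the four-column layout shown on the slide (available space, used space, type, host name) and works on a copy of the sample output; `largest_se` is a hypothetical helper, and on a User Interface the input would come from `lcg-infosites --vo eumed se` instead.

```shell
# Sample output copied from the slide.
SE_LISTING='Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
16241327 3464020 n.a se01.grid.arn.dz
2390269393 97444791 n.a se02.grid.arn.dz
24377757 13973700 n.a se03.grid.arn.dz
24377757 13973700 n.a gilda-02.pd.infn.it
80605789 4411630 n.a sirius-se.ct.infn.it
4089114112 704958582 n.a grisuse.scope.unina.it'

# Print the host name (4th column) of the SE with the most available space.
# Header and separator lines are skipped because their first field is not a number.
largest_se() {
    awk '$1 ~ /^[0-9]+$/ && ($1 + 0) > max { max = $1 + 0; se = $4 }
         END { if (se != "") print se }'
}

printf '%s\n' "$SE_LISTING" | largest_se
```

Available space is only one possible criterion; the slides also point out that jobs read from the closest SE, so proximity may matter more in practice.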

Downloading a file
First of all, let's download a file from a SE to start "playing" with it.
- Basic usage:
    lcg-cp --vo <vo name> <LFN origin> <local destination>
- Try it:
    lcg-cp --vo eumed lfn:/grid/eumed/tutorials/epikh/test.txt file://$HOME/test1.txt

Replicate a file between SEs
- Basic usage:
    lcg-rep --vo <vo name> -d <destination SE> <LFN of your file>
- Try it, choosing a destination SE that does not already hold a replica (the file was registered on se01.grid.arn.dz):
    lcg-rep --vo eumed -d se02.grid.arn.dz lfn:/grid/eumed/tutorials/epikh/test.txt

Where was it downloaded from?
- List the replicas of the file:
    lcg-lr --vo eumed lfn:/grid/eumed/tutorials/epikh/test.txt
- This command returns the SURLs of all replicas
- A file can be stored on multiple SEs so that a job can download it from the closest SE while it is running

Deleting a file
- Basic usage:
    lcg-del -a --vo <vo name> <LFN>
- When used with the '-a' switch, it deletes all replicas and removes the entry from the catalog
- Try it:
    lcg-del -a --vo eumed lfn:/grid/eumed/tutorials/epikh/test.txt

Removing an LFC directory
- Basic usage:
    lfc-rm -r <LFC directory path>
- Try it:
    lfc-rm -r /grid/eumed/tutorials/epikh
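The whole hands-on sequence can be collected into one script. This is a sketch under stated assumptions: the `run` and `practice` helpers and the `DRY_RUN` switch are invented for this illustration, `SE2` is simply a second host taken from the lcg-infosites listing, and only the commands and options shown on the slides are used. With `DRY_RUN=1` the commands are printed rather than executed, so the script can be read and checked away from a User Interface.

```shell
# Hands-on sequence as one script. DRY_RUN=1 prints each command only.
DRY_RUN="${DRY_RUN:-1}"
VO=eumed
SE=se01.grid.arn.dz
SE2=se02.grid.arn.dz            # assumption: any second SE from the listing
LFC_DIR=/grid/eumed/tutorials/epikh
LFN="lfn:$LFC_DIR/test.txt"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

practice() {
    run lfc-mkdir "$LFC_DIR"                                    # create the directory
    run lcg-cr --vo "$VO" -l "$LFN" -d "$SE" "file://$HOME/test.txt"  # upload and register
    run lcg-rep --vo "$VO" -d "$SE2" "$LFN"                     # replicate to a second SE
    run lcg-lr --vo "$VO" "$LFN"                                # list replicas
    run lcg-cp --vo "$VO" "$LFN" "file://$HOME/test1.txt"       # download a copy
    run lcg-del -a --vo "$VO" "$LFN"                            # delete all replicas
    run lfc-rm -r "$LFC_DIR"                                    # remove the directory
}

practice
```

The file is deleted before the directory is removed, mirroring the order of the slides.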

Any questions?