Published by Michael Gibson. Modified over 8 years ago.
1
EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org Data management in EGEE
2
Data services on Grids
Simple data files on grid-specific storage
Middleware supporting
  – Replica files
      - To be close to where you want computation
      - For resilience
  – Logical filenames
  – Catalogue: maps logical name to physical storage device/file
  – Virtual filesystems, POSIX-like I/O
  – Services provided: storage, transfer, catalogue that maps logical filenames to replicas
Solutions include
  – gLite data service
  – Globus: Data Replication Service
  – Storage Resource Broker
Other data, e.g.
  – Structured data: RDBMS, XML databases, ...
  – Files on project’s filesystems
  – Data that may already have other user communities not using a Grid
Require extendable middleware tools to support
  – Computation near to data
  – Controlled exposure of data without replication
Basis for integration and federation
OGSA-DAI
  – In Globus 4
  – Not (yet...) in gLite
3
Scope of data services in gLite
Files that are write-once, read-many
  – If users edit files then
      - They manage the consequences!
      - Maybe just create a new filename!
  – No intention of providing a global file management system
3 service types for data
  – Storage
  – Catalogs
  – Transfer
4
Data management example
[Diagram labels: “User interface”, Resource Broker, Computing Element, Storage Element 1, Storage Element 2, LCG File Catalogue (LFC); Input “sandbox”; Input “sandbox” + Broker Info; Output “sandbox”; Max. 20 MByte; DataSets info]
1st job writes and replicates output onto 2 SEs
5
Data management example 2
[Diagram labels: “User interface”, Resource Broker, Computing Element, Storage Element 1, Storage Element 2, LCG File Catalogue (LFC); Input “sandbox”; Input “sandbox” + Broker Info; Output “sandbox”; Max. 20 MByte; DataSets info]
Job reads input from an SE
Keep computation close to data
6
Logical file names
[Diagram labels: “User interface”, LCG File Catalogue (LFC), Storage Element 1, Storage Element 2; “Myfile.dat”, guid, File_on_se1, File_on_se2]
Content is available on 2 SEs
7
Resolving logical file name
[Diagram labels: “User interface”, LCG File Catalogue (LFC), Storage Element 1, Storage Element 2]
  – “Myfile.dat”: “Logical filename”
  – “GUID”: Global Unique Identifier
  – File_on_se1 (“SURL”: site URL)
  – File_on_se2 (“SURL”: site URL)
Content is available on 2 SEs
File content cannot change
No need to synchronize replicas
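The resolution chain on this slide (logical filename, then GUID, then one SURL per replica) can be pictured with a small in-memory model. This is only a sketch in Python and not the LFC API: the `ToyCatalogue` class, its method names, the `example.org` hosts and the LFN path are all hypothetical.

```python
import uuid


class ToyCatalogue:
    """Hypothetical in-memory stand-in for a file catalogue (not the LFC API)."""

    def __init__(self):
        self._guid_by_lfn = {}    # logical filename -> GUID
        self._surls_by_guid = {}  # GUID -> list of replica SURLs

    def register(self, lfn, surl):
        """Register a new file: mint a GUID and record its first replica."""
        if lfn in self._guid_by_lfn:
            raise ValueError(f"{lfn} is already registered; files are write-once")
        guid = f"guid:{uuid.uuid4()}"
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]
        return guid

    def add_replica(self, lfn, surl):
        """Record another copy; content never changes, so nothing is synchronized."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def resolve(self, lfn):
        """Return (GUID, list of SURLs) for a logical filename."""
        guid = self._guid_by_lfn[lfn]
        return guid, list(self._surls_by_guid[guid])


catalogue = ToyCatalogue()
catalogue.register("lfn:/grid/gilda/Myfile.dat",
                   "srm://se1.example.org/data/File_on_se1")
catalogue.add_replica("lfn:/grid/gilda/Myfile.dat",
                      "srm://se2.example.org/data/File_on_se2")
guid, replicas = catalogue.resolve("lfn:/grid/gilda/Myfile.dat")
print(guid)
print(replicas)
```

Because the GUID sits between the logical name and the replicas, a replica can be added without touching the name users work with.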
8
Name conventions
Logical File Name (LFN)
  – An alias created by a user to refer to some item of data, e.g.
      lfn:/grid/gilda/budapest23/run2/track1
Globally Unique Identifier (GUID)
  – A non-human-readable unique identifier for an item of data, e.g.
      guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  – The location of an actual piece of data on a storage system, e.g.
      srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
      sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
Transport URL (TURL)
  – Temporary locator of a replica + access protocol: understood by an SE, e.g.
      rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
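The four kinds of name differ in their scheme prefix, so a script can tell them apart mechanically. The sketch below assumes only the prefixes that appear in the examples on this slide; the `classify` helper and the `KIND_BY_SCHEME` table are hypothetical names, and real TURLs may use other access protocols than `rfio`.

```python
from urllib.parse import urlparse

# Scheme prefixes taken from the slide's examples; other schemes map to "unknown".
KIND_BY_SCHEME = {
    "lfn": "LFN",
    "guid": "GUID",
    "srm": "SURL",
    "sfn": "SURL",
    "rfio": "TURL",
}


def classify(name):
    """Return (kind, host) for a grid file name; host is None when the name has none."""
    parsed = urlparse(name)
    kind = KIND_BY_SCHEME.get(parsed.scheme, "unknown")
    return kind, parsed.hostname


for example in (
    "lfn:/grid/gilda/budapest23/run2/track1",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
):
    print(example, "->", classify(example))
```

Only SURLs and TURLs carry a host name: LFNs and GUIDs are location-independent, which is what lets the catalogue move or add replicas behind them.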
9
Name conventions
Users primarily access and manage files through “logical filenames”
Mapping by the “LFC” catalogue server
LFC Namespace
  – LFC has a directory tree structure
  – lfn:/grid/ / (path label: “Defined by the user”)
10
LFC directories
LFC directories = virtual directories
  – Each entry in the directory is a pointer to files stored on SEs
[Diagram: LCG File Catalogue (LFC) directory lfn:/grid/gilda/budapest23/run2/ with entries input1, input2, input3, and files on four Storage Elements]
  Storage Element 1: sfn://grid005.iucc.ac.il/storage/gilda/generated/2007-06-23/fileb233d43f-5bc6-4ede-a5fe-611d48be2ba5
  Storage Element 2: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/2007-06-23/filea21ab3e2-8ff6-4a44-82a7-f2
  Storage Element 3: sfn://trigriden01.unime.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f
  Storage Element 4: sfn://grid005.iucc.ac.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f
11
Two sets of commands
lfc-*
  (LFC = LCG File Catalogue; LCG = LHC Computing Grid; LHC = Large Hadron Collider)
  – Use LFC commands to interact with the catalogue only
      - To create a catalogue directory
      - To list files
  – Used by you, your application and by lcg-utils (see below)
lcg-*
  – Couples catalogue operations with file management
      - Keeps SEs and catalogue in step!
  – Copy files to/from/between SEs
  – Replicate files
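Why the lcg-* commands couple the two steps can be seen in a toy model where a storage write and a catalogue entry always happen together. This is a hedged sketch, not how lcg-utils is implemented: the dictionaries stand in for the SEs and the catalogue, and `copy_and_register`, `replicate`, `in_step`, the hosts and the paths are hypothetical.

```python
def copy_and_register(storage, catalogue, se_host, lfn, data):
    """Store data on one SE and record it under lfn in a single operation."""
    if lfn in catalogue:
        raise ValueError("logical name already in use")
    surl = f"srm://{se_host}/toy/file{len(storage)}"
    storage[surl] = data      # file management step
    catalogue[lfn] = [surl]   # catalogue step
    return surl


def replicate(storage, catalogue, lfn, dest_host):
    """Copy an existing replica to another SE and register the new copy."""
    source = catalogue[lfn][0]
    surl = f"srm://{dest_host}/toy/file{len(storage)}"
    storage[surl] = storage[source]
    catalogue[lfn].append(surl)
    return surl


def in_step(storage, catalogue):
    """True when the catalogued replicas are exactly the files held in storage."""
    catalogued = {surl for surls in catalogue.values() for surl in surls}
    return catalogued == set(storage)


storage, catalogue = {}, {}
copy_and_register(storage, catalogue, "se1.example.org",
                  "lfn:/grid/gilda/demo/file1", b"payload")
replicate(storage, catalogue, "lfn:/grid/gilda/demo/file1", "se2.example.org")
print(in_step(storage, catalogue))
```

Writing a file to storage without the matching catalogue update (or the reverse) makes `in_step` return False, which is the situation the coupled commands are meant to avoid.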
12
LFC basics
LFC Namespace
  – LFC has a directory tree structure
  – /grid/ / (path label: “Defined by the user”)
All members of a given VO have read-write permissions in their directory
Commands look like UNIX with “lfc-” in front (often)
13
Storage Element
Provides
  – Storage for files: massive storage system, disk or tape based
  – Transfer protocol (gsiFTP) ~ GSI based FTP server
      - Striped file transfer (cluster as back-end)
[Diagram labels: Storage Element server; File request + VOMS proxy; File system; Authentication, authorization]
14
GFAL C API
GFAL (Grid File Access Library) is a POSIX-like interface for operations on files on Storage Elements
  – Enables remote handling of files
Libraries are in C and can be included in C/C++ sources
  – GFAL Java API: wrapper around the C code
The most common I/O operations are available: just prefix gfal_ to the function name (open(), read(), ...)
  – man gfal for further details
The destination SE must provide secure rfio (classic SEs don’t)
GFAL API description
  – http://grid-deployment.web.cern.ch/grid-deployment/documentation/LFC_DPM/gfal/html
15
EGEE Tutorial, Taipei, 1 May 2006
GFAL API code snippet
Examples in gLite3 User Guide (Appendix F)
  – https://edms.cern.ch/file/722398//gLite-3-UserGuide.pdf

    #include <fcntl.h>      /* O_RDONLY */
    #include "gfal_api.h"   /* gfal_open, gfal_stat, gfal_read, gfal_close */

    int fd, cod_ex;
    struct stat file_stat;                              /* filled in by gfal_stat */
    fd = gfal_open(file_ref, O_RDONLY, 0644);           /* file_ref names the remote file */
    cod_ex = gfal_stat(file_ref, &file_stat);           /* obtain the file size */
    ...
    cod_ex = gfal_read(fd, buffer, file_stat.st_size);  /* read the whole file into buffer */
    ...
    cod_ex = gfal_close(fd);
16
Metadata on the GRID
Metadata is data about data
On the EGEE Grid: information about files
  – Describes files
  – Locate files based on their metadata
You may have 1000s of files, being shared with other researchers
  – Either: you all access data by remembering lfns (or guids...), and hope you know what is in the file...
  – Or: have a metadata catalogue
      - Allow selection of files based on metadata
Metadata is fundamental to e-research
17
AMGA Implementation
AMGA: ARDA Metadata Grid Application
  – ARDA: A Realisation of Distributed Analysis for LHC
Hundreds of millions of files
No special security requirements
Protection against DoS attacks
Now part of gLite middleware
  – Official Metadata Service for EGEE
  – Also available as standalone component
Expanding user community
  – HEP, Biomed, UNOSAT...
18
Metadata concepts
[Diagram: a Schema (Attribute 1: name 1, type 1; Attribute 2: name 2, type 2; ...) beside a Collection (Entry 1, Entry 2, Entry 3, ...)]
Collection: a set of entries
Entries: the objects (e.g. files) that need to be described with metadata
Schema: a set of attributes. Defines the structure of the metadata
19
Metadata concepts
[Diagram: a metadata catalog holding four schema/collection pairs (Schema and Collection, Schema 2 and Collection 2, Schema 3 and Collection 3, Schema 4 and Collection 4); each schema lists attributes (name, type) and each collection lists entries (Entry 1, Entry 2, Entry 3, ...)]
20
Metadata Concepts
Some concepts
  – Metadata: list of attributes associated with entries
  – Attribute: name/value pair with type information
      - Type: the type (int, float, string, ...)
      - Name: the name of the attribute
      - Value: value of an entry's attribute
  – Schema: a set of attributes
  – Collection: a set of entries associated with a schema
  – Think of schemas as tables, attributes as columns, entries as rows
21
Implementation of the concept in AMGA
[Diagram labels: AMGA server; Collection /grid/sipos/run2; Input1, Input2, images]
  Schema: lfn varchar(100), description varchar(200)
  Sub-Schema: lfn varchar(100), description varchar(200), X_res int, Y_res int
  Example records:
      lfn:/grid/gilda/sipos/maps/hungary, “Map of Hungary”
      lfn:/grid/gilda/sipos/temp/data, “Temperature values of Hungarian cities, 2007.06.25”
The collection is a directory on the AMGA file system
A schema is a table in a relational database. One schema is associated with each directory of the file system
Files in an AMGA directory are entities described by metadata
The content of AMGA files is irrelevant. Metadata is stored in the DB records. A DB record is stored for each file
Collections can be nested
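The "schema is a table, entry is a row" mapping on this slide can be tried out with any relational database. The sketch below uses Python's built-in `sqlite3` purely as a stand-in (an assumption; the slides do not say AMGA uses SQLite, and AMGA's own query commands are not shown here). The table name `run2` is hypothetical; the columns and the two records are the ones on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per collection; the columns follow the schema shown on the slide.
conn.execute("CREATE TABLE run2 (lfn VARCHAR(100), description VARCHAR(200))")

# One record per described file.
conn.executemany(
    "INSERT INTO run2 VALUES (?, ?)",
    [
        ("lfn:/grid/gilda/sipos/maps/hungary", "Map of Hungary"),
        ("lfn:/grid/gilda/sipos/temp/data",
         "Temperature values of Hungarian cities, 2007.06.25"),
    ],
)

# Select files by their metadata instead of remembering logical names.
rows = conn.execute(
    "SELECT lfn FROM run2 WHERE description LIKE ?", ("%Temperature%",)
).fetchall()
print(rows)
```

The query returns logical filenames only; turning those into replicas is still the job of the file catalogue.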
22
An example: AMGA and LFC in UNOSAT
LFC Catalogue
  – Mapping of LFN to TURL
UNOSAT requires
  – User will give as input data certain coordinates (x, y, z)
  – As output, want the satellite image file for downloading
The ARDA Group assists us setting up the AMGA tool for UNOSAT
[Diagram labels: ARDA APP; AMGA; Oracle DB; LFC; Storage Element; SRM; Metadata (x,y,z); LFN; TURL]
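The two-step lookup in this example (coordinates to LFN through the metadata catalogue, then LFN to a file location through the file catalogue) can be sketched as two small functions. Everything concrete here is an assumption made for illustration: the `ImageEntry` fields, the tile bounds, the exact-match treatment of z, the LFN paths and the `example.org` hosts are hypothetical and do not describe the real UNOSAT schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageEntry:
    """Hypothetical metadata record for one satellite image."""
    lfn: str
    x_range: tuple  # (min, max) covered by the image, assumed half-open
    y_range: tuple  # (min, max) covered by the image, assumed half-open
    z: int          # third coordinate, matched exactly in this sketch


def find_lfns(entries, x, y, z):
    """Metadata step: return the LFNs of images covering the point (x, y, z)."""
    return [
        e.lfn
        for e in entries
        if e.x_range[0] <= x < e.x_range[1]
        and e.y_range[0] <= y < e.y_range[1]
        and e.z == z
    ]


def locate(lfn, replica_table):
    """File catalogue step: return the known locations for an LFN."""
    return replica_table.get(lfn, [])


entries = [
    ImageEntry("lfn:/grid/toy/images/tile_a", (0, 10), (0, 10), 1),
    ImageEntry("lfn:/grid/toy/images/tile_b", (10, 20), (0, 10), 1),
]
replica_table = {
    "lfn:/grid/toy/images/tile_a": ["srm://se1.example.org/toy/tile_a"],
    "lfn:/grid/toy/images/tile_b": ["srm://se2.example.org/toy/tile_b"],
}

matches = find_lfns(entries, x=12, y=3, z=1)
print(matches)
print([locate(lfn, replica_table) for lfn in matches])
```

Keeping the two steps separate mirrors the slide: the metadata service never needs to know where a file is stored, and the file catalogue never needs to know what the file contains.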
23
During practicals 1: LFC and LCG utils
  – List directory
  – Create a local file then upload it to an SE and register with a logical name (lfn) in the catalogue
  – Create a duplicate in another SE
  – List the replicas
[Diagram labels: “User interface”, LCG File Catalogue (LFC), Storage Element 1, Storage Element 2; lfc-*, lcg-*]
24
During practicals 1: LFC and LCG utils
  – List directory
  – Create a local file then upload it to an SE and register with a logical name (lfn) in the catalogue
  – Create a duplicate in another SE
  – List the replicas
  – Create a second logical file name for a file
  – Download a file from an SE to the UI
[Diagram labels: “User interface”, LCG File Catalogue (LFC), Storage Element 1, Storage Element 2; lfc-*, lcg-*, ?]
25
During practicals 2: GFAL examples
  – Write a file to an SE
  – Read a file from an SE
  – Submit the reader code as a job into GILDA, read the file remotely
[Diagram labels: “User interface” (GFAL writer, GFAL reader), Storage Element 1, Computing Element (GFAL reader)]
26
During practicals 3: AMGA examples
  – Create metadata collections
  – Manage metadata schemas
  – ...

    $ mdclient
    Connecting to amga.ct.infn.it:8822...
    ARDA Metadata Server 1.3.0
    Query> help commands
    Query> help command_name
27
Please go to the web page for this practical
28
Spare slides follow (could be used after the practical)
29
LFC Catalog commands
Summary of the LFC Catalog commands

  Command         Description
  lfc-chmod       Change access mode of the LFC file/directory
  lfc-chown       Change owner and group of the LFC file/directory
  lfc-delcomment  Delete the comment associated with the file/directory
  lfc-getacl      Get file/directory access control lists
  lfc-ln          Make a symbolic link to a file/directory
  lfc-ls          List file/directory entries in a directory
  lfc-mkdir       Create a directory
  lfc-rename      Rename a file/directory
  lfc-rm          Remove a file/directory
  lfc-setacl      Set file/directory access control lists
  lfc-setcomment  Add/replace a comment
30
Summary of lcg-utils commands
Replica Management

  Command  Description
  lcg-cp   Copies a grid file to a local destination
  lcg-cr   Copies a file to an SE and registers the file in the catalog
  lcg-del  Deletes one file
  lcg-rep  Replication between SEs and registration of the replica
  lcg-gt   Gets the TURL for a given SURL and transfer protocol
  lcg-sd   Sets file status to “Done” for a given SURL in an SRM request
31
Summary of fts client commands
FTS client

  Command                      Description
  glite-transfer-submit        Submit a transfer job: needs at least source and destination SURL
  glite-transfer-status        Given one or more job IDs, query their status
  glite-transfer-cancel        Delete the transfer with the given job ID
  glite-transfer-list          Query the status of all the user’s jobs; supports options for query restrictions
  glite-transfer-channel-list  Show all available channels; detailed info only if the user has admin privileges
32
LFC server
If a site acts as a central catalog for several VOs, it can either have:
  – One LFC server, with one DB account containing the entries of all the supported VOs. You should then create one directory per VO.
  – Several LFC servers, each having a DB account containing the entries for a given VO.
Both scenarios have consequences for the handling of database backups
Minimum requirements (first scenario)
  – 2 GHz processor with 1 GB of memory (not a hard requirement)
  – Dual power supply
  – Mirrored system disk