EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org Data management in EGEE.


Data services on Grids
Simple data files on grid-specific storage
Middleware supporting
– Replica files
  • to be close to where you want computation
  • for resilience
– Logical filenames
– Catalogue: maps logical name to physical storage device/file
– Virtual filesystems, POSIX-like I/O
– Services provided: storage, transfer, and a catalogue that maps logical filenames to replicas
Solutions include
– gLite data service
– Globus: Data Replication Service
– Storage Resource Broker
Other data! e.g. …
– Structured data: RDBMS, XML databases, …
– Files on project’s filesystems
– Data that may already have other user communities not using a Grid
Require extendable middleware tools to support
– Computation near to data
– Controlled exposure of data without replication
Basis for integration and federation
OGSA-DAI
– In Globus 4
– Not (yet...) in gLite

Scope of data services in gLite
Files that are write-once, read-many
– If users edit files then
  • they manage the consequences!
  • maybe just create a new filename!
– No intention of providing a global file management system
3 service types for data
– Storage
– Catalogs
– Transfer

Data management example
[Diagram: User interface, Resource Broker, Computing Element, Storage Elements 1 and 2, LCG File Catalogue (LFC). The input “sandbox” goes from the user interface to the Resource Broker, and with the Broker Info on to the Computing Element; the output “sandbox” comes back to the user interface (sandbox max. 20 MByte); the Resource Broker gets dataset info from the LFC.]
1st job writes and replicates output onto 2 SEs

Data management example 2
[Diagram: same components and sandbox flow as the previous example (sandbox max. 20 MByte; dataset info from the LCG File Catalogue (LFC)).]
Job reads input from an SE
Keep computation close to data

Logical file names
[Diagram: the user interface refers to “Myfile.dat”; the LCG File Catalogue (LFC) maps Myfile.dat to a GUID and to File_on_se1 and File_on_se2 on Storage Elements 1 and 2.]
Content is available on 2 SEs

Resolving logical file name
[Diagram: the user interface looks up “Myfile.dat” (the “logical filename”) in the LCG File Catalogue (LFC), which maps it to a “GUID” (Globally Unique Identifier) and to File_on_se1 and File_on_se2, each a “SURL” (site URL) on Storage Elements 1 and 2.]
Content is available on 2 SEs
File content cannot change
  • no need to synchronize replicas
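The resolution chain on this slide (logical filename → GUID → SURLs of the replicas) can be sketched as a plain lookup table. This is a minimal C illustration, not the LFC client API: the `catalog_entry` struct, the `resolve_lfn` helper and the idea of passing a preferred SE host are all hypothetical. Because file content cannot change, any replica is as good as another, so the sketch prefers a replica on a given host and otherwise takes the first one.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REPLICAS 4

/* Hypothetical catalogue entry: one logical name, one GUID, N replicas. */
struct catalog_entry {
    const char *lfn;                  /* user-chosen alias */
    const char *guid;                 /* immutable unique identifier */
    const char *surls[MAX_REPLICAS];  /* site URLs of the physical copies */
    size_t n_replicas;
};

/* Resolve an LFN to one SURL. Replicas never change, so any of them is
 * valid: prefer one whose SURL contains preferred_host (may be NULL),
 * otherwise return the first. Returns NULL for an unknown LFN. */
const char *resolve_lfn(const struct catalog_entry *cat, size_t n,
                        const char *lfn, const char *preferred_host)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cat[i].lfn, lfn) != 0)
            continue;
        if (cat[i].n_replicas == 0)
            return NULL;              /* registered but no replica left */
        if (preferred_host != NULL)
            for (size_t r = 0; r < cat[i].n_replicas; r++)
                if (strstr(cat[i].surls[r], preferred_host) != NULL)
                    return cat[i].surls[r];
        return cat[i].surls[0];
    }
    return NULL;
}
```

For example, with Myfile.dat registered on two hypothetical SEs, se1.example.org and se2.example.org, asking for se2.example.org returns the File_on_se2 SURL, while passing NULL as the preferred host returns the first replica.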

Name conventions
Logical File Name (LFN)
– An alias created by a user to refer to some item of data, e.g. lfn:/grid/gilda/budapest23/run2/track1
Globally Unique Identifier (GUID)
– A non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
– The location of an actual piece of data on a storage system, e.g.
  • srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
  • sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
Transport URL (TURL)
– Temporary locator of a replica + access protocol, understood by an SE, e.g. rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
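The four kinds of name can usually be told apart by their prefix. The sketch below is a pair of hypothetical helpers, not part of any gLite library: `classify_name` uses the prefixes from the examples above, and `url_host` extracts the host from a URL-style name (SURL or TURL). Treating srm:// and sfn:// as SURLs and any other scheme:// as a transport URL is a simplifying assumption.

```c
#include <string.h>

enum name_kind { NAME_LFN, NAME_GUID, NAME_SURL, NAME_OTHER_URL, NAME_UNKNOWN };

/* Classify a grid file name by its prefix (simplified, see lead-in). */
enum name_kind classify_name(const char *name)
{
    if (strncmp(name, "lfn:", 4) == 0)
        return NAME_LFN;
    if (strncmp(name, "guid:", 5) == 0)
        return NAME_GUID;
    if (strncmp(name, "srm://", 6) == 0 || strncmp(name, "sfn://", 6) == 0)
        return NAME_SURL;
    if (strstr(name, "://") != NULL)
        return NAME_OTHER_URL;        /* e.g. rfio:// treated as a TURL */
    return NAME_UNKNOWN;
}

/* Copy the host of "scheme://host/path" into host (capacity cap).
 * Returns 0 on success, -1 if name is not a URL or the host is too long. */
int url_host(const char *url, char *host, size_t cap)
{
    const char *start = strstr(url, "://");
    if (start == NULL)
        return -1;
    start += 3;                       /* skip "://" */
    size_t len = strcspn(start, "/"); /* host ends at the first '/' */
    if (len == 0 || len >= cap)
        return -1;
    memcpy(host, start, len);
    host[len] = '\0';
    return 0;
}
```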

Name conventions
Users primarily access and manage files through “logical filenames”
– Mapping by the “LFC” catalogue server
LFC namespace (defined by the user)
– LFC has a directory tree structure: lfn:/grid/ /

LFC directories
LFC directories = virtual directories
– Each entry in the directory is a pointer to files stored on SEs
[Diagram: the LCG File Catalogue (LFC) directory lfn:/grid/gilda/budapest23/run2/ holds input1, input2 and input3, which point to files on four Storage Elements:
  • SE 1: sfn://grid005.iucc.ac.il/storage/gilda/generated/ /fileb233d43f-5bc6-4ede-a5fe-611d48be2ba5
  • SE 2: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/ /filea21ab3e2-8ff6-4a44-82a7-f2
  • SE 3: sfn://trigriden01.unime.it/flatfiles/SE00/gilda/generated/ /filec79a9e3c a2a5-235f
  • SE 4: sfn://grid005.iucc.ac.it/flatfiles/SE00/gilda/generated/ /filec79a9e3c a2a5-235f]
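Because LFC directories are virtual, listing one amounts to selecting the catalogue entries whose names sit directly below the directory path. The C sketch below does that over a flat array of logical names; `list_dir` is a hypothetical helper, not the LFC API or the exact behaviour of lfc-ls, and for brevity it skips subdirectories instead of showing them.

```c
#include <stddef.h>
#include <string.h>

/* Collect the entries that sit directly inside the virtual directory dir
 * (given without a trailing '/'), ignoring deeper levels. Stores pointers
 * to the basenames in out (at most cap) and returns how many were found. */
size_t list_dir(const char *dir, const char *const *lfns, size_t n,
                const char **out, size_t cap)
{
    size_t dlen = strlen(dir);
    size_t found = 0;
    for (size_t i = 0; i < n && found < cap; i++) {
        const char *name = lfns[i];
        if (strncmp(name, dir, dlen) != 0 || name[dlen] != '/')
            continue;                   /* not under dir */
        const char *rest = name + dlen + 1;
        if (*rest == '\0' || strchr(rest, '/') != NULL)
            continue;                   /* dir itself, or nested deeper */
        out[found++] = rest;            /* basename within dir */
    }
    return found;
}
```

Listing /grid/gilda/budapest23/run2 over a catalogue that also holds a deeper entry and a sibling directory would return only input1, input2 and input3.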

Two sets of commands
lfc-*
– LFC = LCG File Catalogue
  • LCG = LHC Computing Grid
  • LHC = Large Hadron Collider
– Use LFC commands to interact with the catalogue only
  • to create catalogue directories
  • to list files
– Used by you, your application and by lcg-utils (see below)
lcg-*
– Couples catalogue operations with file management
  • keeps SEs and catalogue in step!
– Copy files to/from/between SEs
– Replicate files
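What “keeps SEs and catalogue in step” means can be illustrated with a copy-and-register operation that undoes the upload when the catalogue registration fails, so that neither side ends up holding an entry the other does not know about. The sketch uses toy in-memory stand-ins (`struct store`, `store_add`, `copy_and_register`, all hypothetical); it shows the invariant only and is not how lcg-cr is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a Storage Element or the catalogue: up to 8 names. */
struct store { const char *names[8]; size_t n; };

static bool store_add(struct store *s, const char *name)
{
    if (s->n == 8)
        return false;                /* full: simulates a failure */
    s->names[s->n++] = name;
    return true;
}

static void store_remove_last(struct store *s)
{
    if (s->n > 0)
        s->n--;
}

/* Upload to the SE, then register in the catalogue. If registration
 * fails, remove the upload again so both sides stay consistent. */
bool copy_and_register(struct store *se, struct store *catalog,
                       const char *surl, const char *lfn)
{
    if (!store_add(se, surl))
        return false;                /* nothing changed */
    if (!store_add(catalog, lfn)) {
        store_remove_last(se);       /* roll back the upload */
        return false;
    }
    return true;
}
```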

LFC basics
LFC namespace (defined by the user)
– LFC has a directory tree structure: /grid/ /
– All members of a given VO have read-write permissions in their directory
Commands look like UNIX with “lfc-” in front (often)
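A simplified version of the rule “members of a VO can write in their own directory” is a test of whether a path lies under the VO's top-level directory, e.g. /grid/gilda for the gilda VO used in the examples above. The `in_vo_directory` function below is hypothetical: the LFC server enforces permissions through each entry's access mode and access control lists (see lfc-chmod and lfc-setacl in the spare slides), not by matching path prefixes.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if path is the VO directory /grid/<vo> or lies inside it.
 * Path-prefix check only; real LFC permissions come from modes and ACLs. */
bool in_vo_directory(const char *path, const char *vo)
{
    const char *prefix = "/grid/";
    size_t plen = strlen(prefix), vlen = strlen(vo);
    if (strncmp(path, prefix, plen) != 0)
        return false;
    if (strncmp(path + plen, vo, vlen) != 0)
        return false;
    char next = path[plen + vlen];
    return next == '/' || next == '\0';  /* reject "/grid/gildaX/..." */
}
```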

Storage Element
Provides
– Storage for files: massive storage system, disk or tape based
– Transfer protocol (gsiFTP) ~ GSI-based FTP server
  • striped file transfer, cluster as back-end
[Diagram: a file request + VOMS proxy reaches the Storage Element server; authentication, authorization; file system.]

GFAL C API
GFAL (Grid File Access Library) is a POSIX interface for operations on files on a Storage Element
– Enables remote handling of files
– Libraries are in C and can be included in C/C++ sources
– GFAL Java API: a wrapper around the C code
The most common I/O operations are available: just prefix gfal_ to the function name (open(), read(), …)
– See man gfal for further details
– The destination SE must provide secure rfio (classic SEs don't)
GFAL API description: http://grid-deployment.web.cern.ch/grid-deployment/documentation/LFC_DPM/gfal/html

EGEE Tutorial, Taipei, 1 May 2006
GFAL API code snippet
Examples in the gLite3 User Guide (Appendix F):
    int fd, cod_ex;
    char *buffer;                 /* must hold file_stat.st_size bytes */
    struct stat file_stat;        /* filled in by gfal_stat */
    fd = gfal_open(file_ref, O_RDONLY, 0644);
    cod_ex = gfal_stat(file_ref, &file_stat);
    ...
    cod_ex = gfal_read(fd, buffer, file_stat.st_size);
    ...
    cod_ex = gfal_close(fd);
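The open/stat/read/close sequence in the snippet is the same one a plain POSIX program uses; the point of GFAL is that only the gfal_ prefix changes. The sketch below reads a whole file with that sequence using local POSIX calls, so it runs without GFAL installed. Swapping in gfal_open, gfal_stat, gfal_read and gfal_close, with a grid file reference instead of a local path, is the assumed GFAL equivalent; check the signatures in man gfal before relying on it. `read_whole_file` is a hypothetical helper name.

```c
#include <fcntl.h>     /* open, O_RDONLY */
#include <stdlib.h>    /* malloc, free */
#include <sys/stat.h>  /* stat, struct stat */
#include <unistd.h>    /* read, close, ssize_t */

/* Read a whole file into a malloc'd buffer (caller frees).
 * Returns NULL on error; on success *len holds the bytes read. */
char *read_whole_file(const char *path, size_t *len)
{
    struct stat st;
    if (stat(path, &st) != 0)            /* gfal_stat in the GFAL version */
        return NULL;

    int fd = open(path, O_RDONLY);       /* gfal_open */
    if (fd < 0)
        return NULL;

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size > 0 ? size : 1);  /* avoid malloc(0) */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t total = 0;
    while (total < size) {               /* read() may return short counts */
        ssize_t n = read(fd, buf + total, size - total);  /* gfal_read */
        if (n < 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;                       /* EOF earlier than stat reported */
        total += (size_t)n;
    }

    close(fd);                           /* gfal_close */
    *len = total;
    return buf;
}
```

Looping on read() matters more for remote access than for local files, since a single call is not guaranteed to return the whole requested size.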

Metadata on the Grid
Metadata is data about data
On the EGEE Grid: information about files
– Describes files
– Locate files based on their metadata
You may have 1000s of files, being shared with other researchers
– Either:
  • you all access data by remembering LFNs (or GUIDs…) and hope you know what is in the file…
– Or:
  • have a metadata catalogue
  • allow selection of files based on metadata
Metadata is fundamental to e-research

AMGA implementation
AMGA: ARDA Metadata Grid Application
– ARDA: A Realisation of Distributed Analysis for LHC
  • hundreds of millions of files
  • no special security requirements
  • protection against DoS attacks
Now part of gLite middleware
– Official Metadata Service for EGEE
– Also available as a standalone component
Expanding user community
– HEP, Biomed, UNOSAT…

Metadata concepts
[Diagram: a schema lists attributes (Attribute 1: name 1, type 1; Attribute 2: name 2, type 2; …); a collection holds entries (Entry 1, Entry 2, Entry 3, …).]
– Collection: a set of entries
– Entries: the objects (e.g. files) that need to be described with metadata
– Schema: a set of attributes; defines the structure of the metadata

Metadata concepts
[Diagram: a metadata catalog contains several collections (Collection, Collection 2, Collection 3, Collection 4), each holding entries (Entry 1, Entry 2, Entry 3, …) and each associated with its own schema (Schema, Schema 2, Schema 3, Schema 4) of typed attributes.]

Metadata concepts
Some concepts
– Metadata: list of attributes associated with entries
– Attribute: name/value pair with type information
  • Type: the type (int, float, string, …)
  • Name: the name of the attribute
  • Value: value of an entry's attribute
– Schema: a set of attributes
– Collection: a set of entries associated with a schema
– Think of schemas as tables, attributes as columns, entries as rows

Implementation of the concept in AMGA
– A collection is a directory on the AMGA file system
– A schema is a table in a relational database; one schema is associated with each directory of the file system
– Files in an AMGA directory are entities described by metadata
  • the content of AMGA files is irrelevant: the metadata is stored in DB records, one record per file
– Collections can be nested
[Diagram: on the AMGA server, collection /grid/sipos/run2 (Input1, Input2) has the schema lfn varchar(100), description varchar(200). Example records: lfn:/grid/gilda/sipos/maps/hungary, “Map of Hungary”; lfn:/grid/gilda/sipos/temp/data, “Temperature values of Hungarian cities, ”. A nested collection, images, has the sub-schema lfn varchar(100), description varchar(200), X_res int, Y_res int.]
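The table analogy (schemas as tables, attributes as columns, entries as rows) can be made concrete with the schema on this slide. In the C sketch below, the field sizes mirror varchar(100) and varchar(200), the sub-schema of the nested collection is modelled by embedding the base struct, and a metadata query is a filter over rows. The struct and function names, and any resolution values used with them, are hypothetical; AMGA itself keeps these rows in a relational database and answers queries on the server.

```c
#include <stddef.h>

/* Base schema: one row (entry) per file; +1 for the terminating NUL. */
struct base_entry {
    char lfn[101];            /* lfn varchar(100) */
    char description[201];    /* description varchar(200) */
};

/* Sub-schema of the nested collection: base attributes plus two more. */
struct image_entry {
    struct base_entry base;
    int x_res;                /* X_res int */
    int y_res;                /* Y_res int */
};

/* Select rows whose resolution is at least min_x by min_y. Stores
 * pointers to matching rows in out (at most cap), returns the count. */
size_t select_by_resolution(const struct image_entry *rows, size_t n,
                            int min_x, int min_y,
                            const struct image_entry **out, size_t cap)
{
    size_t k = 0;
    for (size_t i = 0; i < n && k < cap; i++)
        if (rows[i].x_res >= min_x && rows[i].y_res >= min_y)
            out[k++] = &rows[i];
    return k;
}
```

For instance, if the “Map of Hungary” entry were given a hypothetical resolution of 4000 × 3000 and another entry 800 × 600, a query for at least 1024 × 768 would return only the map of Hungary.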

An example: AMGA and LFC in UNOSAT
LFC catalogue
– Mapping of LFN to the file's location on an SE, from which the SE's SRM interface provides the TURL
UNOSAT requires
– The user gives certain coordinates (x, y, z) as input data
– As output, they want the satellite image file for downloading
The ARDA group assists us in setting up the AMGA tool for UNOSAT
[Diagram: the ARDA application sends metadata (x, y, z) to AMGA (Oracle DB) and gets an LFN; the LFC and the Storage Element's SRM interface turn the LFN into a TURL.]
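The UNOSAT flow is a two-stage lookup: a metadata query first, then the file catalogue. The C sketch below chains the two with in-memory tables; the struct names, the exact-match rule on (x, y, z), and the example names used with it are hypothetical. It stops at the SURL, since turning that into a TURL is the job of the Storage Element's SRM interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata row (AMGA side): coordinates -> LFN. */
struct tile_meta { int x, y, z; const char *lfn; };

/* Hypothetical catalogue row (LFC side): LFN -> SURL. */
struct replica { const char *lfn; const char *surl; };

/* Query the metadata for the coordinates, then resolve the resulting LFN
 * in the catalogue. Returns NULL if either step finds nothing. */
const char *image_for_coords(int x, int y, int z,
                             const struct tile_meta *meta, size_t n_meta,
                             const struct replica *cat, size_t n_cat)
{
    const char *lfn = NULL;
    for (size_t i = 0; i < n_meta && lfn == NULL; i++)
        if (meta[i].x == x && meta[i].y == y && meta[i].z == z)
            lfn = meta[i].lfn;
    if (lfn == NULL)
        return NULL;                      /* no image for these coordinates */

    for (size_t i = 0; i < n_cat; i++)
        if (strcmp(cat[i].lfn, lfn) == 0)
            return cat[i].surl;
    return NULL;                          /* LFN not registered */
}
```

Keeping the two stages separate mirrors the slide: AMGA only needs to know LFNs, and the LFC only needs to know where the files are.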

During practicals 1: LFC and LCG utils
– List directory
– Create a local file, then upload it to an SE and register it with a logical name (LFN) in the catalogue
– Create a duplicate in another SE
– List the replicas
[Diagram: user interface, LCG File Catalogue (LFC), Storage Elements 1 and 2; lfc-* and lcg-* commands.]

During practicals 1: LFC and LCG utils
– List directory
– Create a local file, then upload it to an SE and register it with a logical name (LFN) in the catalogue
– Create a duplicate in another SE
– List the replicas
– Create a second logical file name for a file
– Download a file from an SE to the UI
[Diagram: user interface, LCG File Catalogue (LFC), Storage Elements 1 and 2; lfc-* and lcg-* commands.]

During practicals 2: GFAL examples
– Write a file to an SE
– Read a file from an SE
– Submit the reader code as a job to GILDA and read the file remotely
[Diagram: a GFAL writer and a GFAL reader on the user interface access Storage Element 1; a GFAL reader also runs on a Computing Element.]

During practicals 3: AMGA examples
– Create metadata collections
– Manage metadata schemas
– …
    $ mdclient
    Connecting to amga.ct.infn.it:
    ARDA Metadata Server
    Query> help commands
    Query> help command_name

Please go to the web page for this practical

Spare slides follow; they could be used after the practical

Summary of the LFC catalog commands
Command          Description
lfc-chmod        Change access mode of the LFC file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-delcomment   Delete the comment associated with the file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment

Summary of lcg-utils commands
Replica management
Command    Description
lcg-cp     Copies a grid file to a local destination
lcg-cr     Copies a file to an SE and registers the file in the catalog
lcg-del    Deletes one file
lcg-rep    Replicates a file between SEs and registers the replica
lcg-gt     Gets the TURL for a given SURL and transfer protocol
lcg-sd     Sets file status to “Done” for a given SURL in an SRM request

Summary of FTS client commands
Command                       Description
glite-transfer-submit         Submit a transfer job: needs at least source and destination SURL
glite-transfer-status         Given one or more job IDs, query their status
glite-transfer-cancel         Delete the transfer with the given job ID
glite-transfer-list           Query the status of all the user's jobs; supports options to restrict the query
glite-transfer-channel-list   Show all available channels; detailed info only if the user has admin privileges

LFC server
If a site acts as a central catalog for several VOs, it can have either:
– one LFC server, with one DB account containing the entries of all the supported VOs; you should then create one directory per VO
– several LFC servers, each with a DB account containing the entries for a given VO
Both scenarios have consequences for the handling of database backups
Minimum requirements (first scenario)
– 2 GHz processor with 1 GB of memory (not a hard requirement)
– Dual power supply
– Mirrored system disk