Data Management The European DataGrid Project Team

Slides:



Advertisements
Similar presentations
DataTAG WP4 Meeting CNAF Jan 14, 2003 Interfacing AliEn and EDG 1/13 Stefano Bagnasco, INFN Torino Interfacing AliEn to EDG Stefano Bagnasco, INFN Torino.
Advertisements

WP2: Data Management Gavin McCance University of Glasgow November 5, 2001.
ATLAS/LHCb GANGA DEVELOPMENT Introduction Requirements Architecture and design Interfacing to the Grid Ganga prototyping A. Soroko (Oxford), K. Harrison.
Data Management Expert Panel - WP2. WP2 Overview.
Data Management Expert Panel. RLS Globus-EDG Replica Location Service u Joint Design in the form of the Giggle architecture u Reference Implementation.
Author - Title- Date - n° 1 GDMP The European DataGrid Project Team
1 CHEP 2000, Roberto Barbera Tests of data management services in EDG 1.2 ALICE Off-line Week,
Job Submission The European DataGrid Project Team
Grid Data Management Assaf Gottlieb - Israeli Grid NA3 Team EGEE is a project funded by the European Union under contract IST EGEE tutorial,
10 April 2003Deploy Grid in Israel Universities1 Deploy Grid testbed in Israel universities Lorne Levinson David Front Weizmann Institute.
USING THE GLOBUS TOOLKIT This summary by: Asad Samar / CALTECH/CMS Ben Segal / CERN-IT FULL INFO AT:
DataGrid is a project funded by the European Union CHEP 2003 – March 2003 – Title – n° 1 Grid Data Management in Action Experience in Running and.
GRID DATA MANAGEMENT PILOT (GDMP) Asad Samar (Caltech) ACAT 2000, Fermilab October , 2000.
The EU DataGrid Architecture The European DataGrid Project Team
Magda – Manager for grid-based data Wensheng Deng Physics Applications Software group Brookhaven National Laboratory.
Data Grid Web Services Chip Watson Jie Chen, Ying Chen, Bryan Hess, Walt Akers.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
5 November 2001F Harris GridPP Edinburgh 1 WP8 status for validating Testbed1 and middleware F Harris(LHCb/Oxford)
QCDGrid Progress James Perry, Andrew Jackson, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
Grid Status - PPDG / Magda / pacman Torre Wenaus BNL U.S. ATLAS Physics and Computing Advisory Panel Review Argonne National Laboratory Oct 30, 2001.
File and Object Replication in Data Grids Chin-Yi Tsai.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE middleware Data Management in gLite.
- Distributed Analysis (07may02 - USA Grid SW BNL) Distributed Processing Craig E. Tull HCG/NERSC/LBNL (US) ATLAS Grid Software.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
MAGDA Roger Jones UCL 16 th December RWL Jones, Lancaster University MAGDA  Main authors: Wensheng Deng, Torre Wenaus Wensheng DengTorre WenausWensheng.
Author - Title- Date - n° 1 Partner Logo EU DataGrid, Work Package 5 The Storage Element.
Author - Title- Date - n° 1 Partner Logo WP5 Summary Paris John Gordon WP5 6th March 2002.
Magda Distributed Data Manager Status Torre Wenaus BNL ATLAS Data Challenge Workshop Feb 1, 2002 CERN.
Globus Replica Management Bill Allcock, ANL PPDG Meeting at SLAC 20 Sep 2000.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE middleware: gLite Data Management EGEE Tutorial 23rd APAN Meeting, Manila Jan.
First attempt for validating/testing Testbed 1 Globus and middleware services WP6 Meeting, December 2001 Flavia Donno, Marco Serra for IT and WPs.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
Grid Scheduler: Plan & Schedule Adam Arbree Jang Uk In.
T3 analysis Facility V. Bucard, F.Furano, A.Maier, R.Santana, R. Santinelli T3 Analysis Facility The LHCb Computing Model divides collaboration affiliated.
CGW 04, Stripped replication for the grid environment as a web service1 Stripped replication for the Grid environment as a web service Marek Ciglan, Ondrej.
Stephen Burke – Data Management - 3/9/02 Partner Logo Data Management Stephen Burke, PPARC/RAL Jeff Templon, NIKHEF.
University of Bristol 5th GridPP Collaboration Meeting 16/17 September, 2002Owen Maroney University of Bristol 1 Testbed Site –EDG 1.2 –LCFG GridPP Replica.
Data Management GridPP and EDG Gavin McCance University of Glasgow May 9, 2002
DGC Paris WP2 Summary of Discussions and Plans Peter Z. Kunszt And the WP2 team.
Jens G Jensen RAL, EDG WP5 Storage Element Overview DataGrid Project Conference Heidelberg, 26 Sep-01 Oct 2003.
ATLAS Magda Distributed Data Manager Torre Wenaus BNL PPDG Robust File Replication Meeting Jefferson Lab January 10, 2002.
Replica Management Kelly Clynes. Agenda Grid Computing Globus Toolkit What is Replica Management Replica Management in Globus Replica Management Catalog.
Managing Data DIRAC Project. Outline  Data management components  Storage Elements  File Catalogs  DIRAC conventions for user data  Data operation.
10 May 2001WP6 Testbed Meeting1 WP5 - Mass Storage Management Jean-Philippe Baud PDP/IT/CERN.
Data Management The European DataGrid Project Team
Magda Distributed Data Manager Prototype Torre Wenaus BNL September 2001.
Author - Title- Date - n° 1 Partner Logo WP5 Status John Gordon Budapest September 2002.
Testing the HEPCAL use cases J.J. Blaising, F. Harris, Andrea Sciabà GAG Meeting April,
Protocols and Services for Distributed Data- Intensive Science Bill Allcock, ANL ACAT Conference 19 Oct 2000 Fermi National Accelerator Laboratory Contributors:
Gennaro Tortone, Sergio Fantinel – Bologna, LCG-EDT Monitoring Service DataTAG WP4 Monitoring Group DataTAG WP4 meeting Bologna –
Grid Activities in CMS Asad Samar (Caltech) PPDG meeting, Argonne July 13-14, 2000.
1 DIRAC Data Management Components A.Tsaregorodtsev, CPPM, Marseille DIRAC review panel meeting, 15 November 2005, CERN.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Architecture of LHC File Catalog Valeria Ardizzone INFN Catania – EGEE-II NA3/NA4.
Grid Data Management Assaf Gottlieb Tel-Aviv University assafgot tau.ac.il EGEE is a project funded by the European Union under contract IST
Magda Distributed Data Manager Torre Wenaus BNL October 2001.
The EDG Testbed Deployment Details
Classic Storage Element
The Data Grid: Towards an architecture for Distributed Management
The European DataGrid Project Team
gLite Data management system overview
GGF OGSA-WG, Data Use Cases Peter Kunszt Middleware Activity, Data Management Cluster EGEE is a project funded by the European.
Data Management in Release 2
OGSA Data Architecture Scenarios
Stephen Burke, PPARC/RAL Jeff Templon, NIKHEF
The EU DataGrid Data Management
Data services in gLite “s” gLite and LCG.
The EU DataGrid Data Management
The EU DataGrid Genius Data Management Services
Grid Data Replication Kurt Stockinger Scientific Data Management Group Lawrence Berkeley National Laboratory.
Architecture of the gLite Data Management System
Presentation transcript:

Data Management The European DataGrid Project Team

EDG DataManagement Tutorial - n° 2 Overview  Data Management Issues  Main Components n EDG Replica Catalog n EDG Replica Manager n GDMP

EDG DataManagement Tutorial - n° 3 Data Management Issues

EDG DataManagement Tutorial - n° 4 Data Management Issues

EDG DataManagement Tutorial - n° 5 Data Management Tools  Tools for n Locating data n Copying data n Managing and replicating data n Meta Data management  On EDG Testbed you have n EDG Replica catalog n globus-url-copy (GridFTP) n EDG Replica Manager n Grid Data Mirroring Package (GDMP) n Spitfire

EDG DataManagement Tutorial - n° 6 EDG Replica Catalog  Based upon the Globus LDAP Replica Catalog  Stores LFN/PFN mappings and additional information (e.g. filesize): n Physical File Name (PFN): host + full path & and file name n Logical File Name (LFN): logical name that may be resolved to PFNs n LFN : PFN = 1 : n  Only files on storage elements may be registered  Each VO has a specific storage dir on an SE  Example PFN: lxshare0222.cern.ch/flatfiles/SE1/iteam/file1.dat host storage dir  LFN must be full path of file starting from storage dir LFN of above PFN: file1.dat

EDG DataManagement Tutorial - n° 7 EDG Replica Catalog  API and command line tools n addLogicalFileName n getLogicalFileName n deleteLogicalFileName n getPhysicalFileName n addPhysicalFileName n deletePhysicalFileName n addLogicalFileAttribute n getLogicalFileAttribute n deleteLogicalFileAttribute

EDG DataManagement Tutorial - n° 8 globus-url-copy  Low level tool for secure copying globus-url-copy :// \ ://  Main Protocols: n gsiftp – for secure transfer, only available on SE and CE n file – for accessing files stored on the local file system on e.g. UI, WN globus-url-copy file://`pwd`/file1.dat \ gsiftp://lxshare0222.cern.ch/ \ flatfiles/SE1/EDGTutorial/file1.dat

EDG DataManagement Tutorial - n° 9 The EDG Replica Manager  Extends the Globus replica manager  Only client side tool  Allows replication (copy) and registering of files in RC  Keeps RC consistent with stored data.

EDG DataManagement Tutorial - n° 10 The Replica Manager APIs  (un)registerEntry(LogicalFileName lfn, FileName source) n Replica Catalogue operations only - no file transfer  copyFile(FileName source, FileName destination, String protocol) n allows for third-party transfer n transfer between:  two StorageElements or  ComputingElement and Storage Element  Space management policies under development n all tools support parallel streams for file transfers

EDG DataManagement Tutorial - n° 11  copyAndRegisterFile(LogicalFileName lfn, FileName source, FileName destination, String protocol) n third-party transfer but : files can only be registered in Replica Catalogue if destination PFN contains a valid SE (i.e. needs to be registered in the RC)!  replicateFile(LogicalFileName lfn, FileName source, FileName destination, String protocol)  deleteFile(LogicalFileName lfn, FileName source) The Replica Manager APIs

EDG DataManagement Tutorial - n° 12  based on CMS requirements for replicating Objectivity files for High Level Trigger studies  production prototype project for evaluating Grid technologies (especially Globus)  experience will directly be used in DataGrid n input also for PPDG and GriPhyN 

EDG DataManagement Tutorial - n° 13 Overview of Components Globus Replica Catalogue Site1 Site3 Site2 GDMP client

EDG DataManagement Tutorial - n° 14 Subscription Model n All the sites that subscribe to a particular site get notified whenever there is an update in its catalog. Site 1 Site 3 Site 2 Subscriber list Subscriber list subscribe

EDG DataManagement Tutorial - n° 15 Export / Import Catalogue n Export Catalog  information about the new files produced.  is published n Import Catalog  information about the files which have been published by other sites but not yet transferred locally  As soon as the file is transferred locally, it is removed from the import catalogue. n Possible to pull the information about new files into your import catalogue. Site 1 Site 3 export catalog import catalog Site 2 export catalog 1)register, publish new files 2) transfer files 1) get info about new files 3) delete files

EDG DataManagement Tutorial - n° 16 Usage  gdmp_ping n Ping a GDMP server and get its status  gdmp_host_subscribe n first thing to be done by a site  gdmp_register_local_file n Registers a file in local file catalogue but NOT in Replica Catalogue (RC)  gdmp_publish_catalogue n send information of newly created files to subscribed hosts (no real data transfer) – update RC  gdmp_replicate_get - gdmp_replicate_put n get/put all the files from the import catalogue – update RC  gdmp_remove_local_file n Delete a local file and update RC  gdmp_get_catalogue n Get remote catalogue contents – for error recovery

EDG DataManagement Tutorial - n° 17 Using GDMP Site 2 Site 1 Site 3 Site 4 Site 5 Data produced at site 1 to be replicated to other sites Register all files in a directory at site 1 gdmp_register_local_file –d /data/files /data/files/file1 /data/files/file2 …

EDG DataManagement Tutorial - n° 18 Using GDMP 2  Start with subscription n gdmp_host_subscribe –r -p Site 2 Site 1 Site 3 Site 4 Site 5 gdmp_host_subscribe Subscriber list

EDG DataManagement Tutorial - n° 19 Using GDMP 3  Publish new files…can combine with filtering n gdmp_publish_catalogue (might use filter option) Site 2 Site 1 Site 3 Site 4 Site 5 Subscriber list gdmp_publish_catalogue Export catalog Import catalog Import catalog Import catalog

EDG DataManagement Tutorial - n° 20 Site 2 Site 1 Site 3 Site 4 Site 5 Subscriber list Export catalog Import catalog Import catalog Import catalog gdmp_get_catalogue Import catalog Using GDMP 4  Poll for change in catalog (pull model)…can combine with filtering…also used for error recovery. n gdmp_get_catalogue –host

EDG DataManagement Tutorial - n° 21 Site 2 Site 1 Site 3 Site 4 Site 5 Subscriber list Export catalog Import catalog Import catalog Import catalog Import catalog gdmp_replicate_get Using GDMP 5  Transfer files…can use the progress meter n gdmp_replicate_get n get_progress_meter…produces a progress.log. n replica.log has all files already transferred.

EDG DataManagement Tutorial - n° 22 GDMP vs. EDG Replica Manager  GDMP n Replicates sets of files n Replication between SEs n Mass storage interface n File size as logical attribute n Subscription model n Event notification n CRC file size check n Support for Objectivity  Replica Manager n Replicates single files n Replication between SEs, CEs to SE.

EDG DataManagement Tutorial - n° 23 File Management Summary Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D File Transfer

EDG DataManagement Tutorial - n° 24 File Management Summary Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer

EDG DataManagement Tutorial - n° 25 File Management Summary Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer Replica Selection: Get ‘best’ file

EDG DataManagement Tutorial - n° 26 File Management Summary Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer Pre- Post-processing: Prepare files for transfer Validate files after transfer Replica Selection: Get ‘best’ file

EDG DataManagement Tutorial - n° 27 File Management Summary Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer Pre- Post-processing: Prepare files for transfer Validate files after transfer Replica Selection: Get ‘best’ file Replication Automation: Data Source subscription

EDG DataManagement Tutorial - n° 28 File Management Summary Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer Pre- Post-processing: Prepare files for transfer Validate files after transfer Replica Selection: Get ‘best’ file Replication Automation: Data Source subscription Load balancing: Replicate based on usage

EDG DataManagement Tutorial - n° 29 File Management Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer Replica Manager: ‘atomic’ replication operation single client interface orchestrator Pre- Post-processing: Prepare files for transfer Validate files after transfer Replica Selection: Get ‘best’ file Replication Automation: Data Source subscription Load balancing: Replicate based on usage

EDG DataManagement Tutorial - n° 30 File Management Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer Replica Manager: ‘atomic’ replication operation single client interface orchestrator Pre- Post-processing: Prepare files for transfer Validate files after transfer Replica Selection: Get ‘best’ file Replication Automation: Data Source subscription Load balancing: Replicate based on usage Metadata: LFN metadata Transaction information Access patterns

EDG DataManagement Tutorial - n° 31 File Management Site A Storage Element AStorage Element B Site B File B File AFile X File YFile B File AFile C File D Replica Catalog: Map Logical to Site files File Transfer Replica Manager: ‘atomic’ replication operation single client interface orchestrator Pre- Post-processing: Prepare files for transfer Validate files after transfer Replica Selection: Get ‘best’ file Replication Automation: Data Source subscription Load balancing: Replicate based on usage Metadata: LFN metadata Transaction information Access patterns