
1 Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project
DataGrid is a project funded by the European Union
CHEP 2003, 24-28 March 2003
Flavia Donno (Former EDG WP2, LCG), Flavia.Donno@cern.ch
http://chep03.ucsd.edu/files/249.ppt

2 Talk Outline
- Introduction
- Replication Tools
- Architecture Overview
- GDMP and edg-replica-manager details
- History and Deployment
- Summary and Future Work

Authors
- Heinz Stockinger, CERN/EP, CMS
- Flavia Donno, CERN/IT LCG and INFN Pisa
- Erwin Laure, Shazhad Muzaffar, CERN/EP
- Giuseppe Andronico, INFN Catania
- Peter Kunszt, CERN/IT
- Paul Millar, PPARC

3 Introduction
- Data management: large amounts of data at distributed sites
- Assumption: data is read-only
- Replication is required between Storage Elements (SEs)
- In a Grid environment:
  - File transfer from User Interface and Computing Nodes to Storage resources
  - Upload of files into the Grid

4 Replication Tools
- We have designed, developed and deployed two major replication packages:
  - GDMP (Grid Data Mirroring Package)
  - edg-replica-manager
- GDMP was a pioneering effort that started in the CMS collaboration and later became a joint project between EDG and PPDG. It mirrors data between Storage Elements through a host subscription method.
- edg-replica-manager handles point-to-point replication of single files. The tool is built around the Globus Replica Manager and Replica Catalogue/Replica Location Service libraries.

5 GDMP in detail
[Diagram labels: StorageElement1, StorageElement2, StorageElement3, GDMP client, Globus Replica Catalog or Replica Location Service]

6 Subscription Model
- All the sites that subscribe to a particular site are notified whenever its catalog is updated.
[Diagram labels: Site 1, Site 2, Site 3, subscriber list, subscribe]
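The subscription model on this slide can be illustrated with a small in-memory sketch. This is not GDMP code: the `Site` class and its `subscribe_to`, `publish` and `notify` methods are hypothetical names chosen for illustration, and the network transport between sites is left out entirely.

```python
class Site:
    """Toy stand-in for a site with a local file catalog (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.catalog = []       # files published by this site
        self.subscribers = []   # sites to notify when the catalog changes
        self.pending = []       # (producer name, file) notifications received

    def subscribe_to(self, producer):
        # Add this site to the producer's subscriber list (once).
        if self not in producer.subscribers:
            producer.subscribers.append(self)

    def publish(self, filename):
        # Update the local catalog, then notify every subscriber.
        self.catalog.append(filename)
        for site in self.subscribers:
            site.notify(self.name, filename)

    def notify(self, producer_name, filename):
        # Only record the notification; fetching the file is a separate step.
        self.pending.append((producer_name, filename))


site1, site2, site3 = Site("Site 1"), Site("Site 2"), Site("Site 3")
site2.subscribe_to(site1)
site3.subscribe_to(site1)
site1.publish("fileA")
print(site2.pending)
print(site3.pending)
```

In this sketch a notification does not move any data; the subscriber only learns that a new file exists and can decide later whether to fetch it.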

7 Architecture Overview
[Diagram labels: SE (three), GDMP, GridFtp, gdmp_replicate_get, fileA, Globus Replica Catalog or Replica Location Service, MSS, UI, WN, GDMP Client]

GDMP Pros
- Very stable and scalable architecture
- Reliable and robust replication:
  - retries on error
  - file checksumming
  - complex logging
- Users can control file transfer via local catalogues
- Back-ends available for actions to be performed on replication (MSS hooks, automatic replication, post-replication actions, ...)
- MSS interface

GDMP Cons
- Designed to handle mirroring among sites, not point-to-point replication
- Several steps involved in replication
- Configuration is difficult; it can be improved with the introduction of new Grid services
- No space management provided, since that is the responsibility of the SE service
- Error messages not always clear
- Recovery from errors sometimes requires manual intervention
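The "retries on error" and "file checksumming" items listed under GDMP Pros can be sketched as follows. This is a minimal illustration under stated assumptions, not GDMP's implementation: the function names are hypothetical, a local file copy stands in for the GridFTP transfer, and CRC-32 is assumed as the checksum.

```python
import shutil
import tempfile
import zlib
from pathlib import Path


def crc32_of(path):
    """Return the CRC-32 of a file, read in 1 MiB chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def replicate_with_retries(source, dest, transfer=shutil.copyfile, max_attempts=3):
    """Transfer source to dest, verify size and CRC, retry on failure.

    Returns the number of attempts that were needed.
    """
    expected_size = Path(source).stat().st_size
    expected_crc = crc32_of(source)
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(source, dest)
            if Path(dest).stat().st_size == expected_size and crc32_of(dest) == expected_crc:
                return attempt
        except OSError:
            pass  # a transfer error is handled like a failed verification
        Path(dest).unlink(missing_ok=True)  # discard the bad copy before retrying
    raise RuntimeError(f"replication failed after {max_attempts} attempts")


def make_flaky_transfer():
    """Hypothetical transfer function that truncates the file on its first call."""
    calls = {"n": 0}

    def transfer(src, dst):
        calls["n"] += 1
        data = Path(src).read_bytes()
        if calls["n"] == 1:
            data = data[: len(data) // 2]  # simulate an interrupted transfer
        Path(dst).write_bytes(data)

    return transfer


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "fileA"
    dst = Path(tmp) / "fileA.replica"
    src.write_bytes(b"event data " * 1000)
    attempts = replicate_with_retries(src, dst, transfer=make_flaky_transfer())
    replica_ok = dst.read_bytes() == src.read_bytes()
print(attempts, replica_ok)
```

The first simulated transfer is truncated, so the size check fails, the partial copy is removed, and the second attempt succeeds.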

8 edg-replica-manager in detail
- Extends the Globus replica manager
- Client-side tool only
- Allows replication (copy) and registration of files in the Replica Catalog (RC)
  - Works with the LDAP-based Globus Replica Catalog and the Replica Location Service
- Keeps the RC consistent with stored data
- Uses GDMP's staging interface to stage to MSS
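The idea of copying a file and registering it so that the catalog stays consistent with stored data can be sketched with a toy in-memory catalog. Everything here is an assumption made for illustration: `ReplicaCatalog`, `copy_and_register` and `unregister_and_delete` are hypothetical names, local directories stand in for Storage Elements, and none of this reflects the actual edg-replica-manager or Globus interfaces.

```python
import shutil
import tempfile
from pathlib import Path


class ReplicaCatalog:
    """Toy catalog: logical file name -> set of physical locations."""

    def __init__(self):
        self._entries = {}

    def register(self, lfn, pfn):
        self._entries.setdefault(lfn, set()).add(str(pfn))

    def unregister(self, lfn, pfn):
        locations = self._entries.get(lfn, set())
        locations.discard(str(pfn))
        if not locations:
            self._entries.pop(lfn, None)  # drop logical names with no replicas

    def locations(self, lfn):
        return sorted(self._entries.get(lfn, set()))


def copy_and_register(catalog, lfn, source, dest):
    # Register only after the copy succeeded: copyfile raises on failure,
    # so the catalog never lists a replica that was not written.
    shutil.copyfile(source, dest)
    catalog.register(lfn, dest)


def unregister_and_delete(catalog, lfn, pfn):
    # Remove the catalog entry first so it never points at a deleted file.
    catalog.unregister(lfn, pfn)
    Path(pfn).unlink()


with tempfile.TemporaryDirectory() as tmp:
    se1, se2 = Path(tmp) / "se1", Path(tmp) / "se2"
    se1.mkdir()
    se2.mkdir()
    original = se1 / "fileA"
    original.write_bytes(b"payload")

    catalog = ReplicaCatalog()
    catalog.register("lfn:fileA", original)

    replica = se2 / "fileA"
    copy_and_register(catalog, "lfn:fileA", original, replica)
    after_copy = len(catalog.locations("lfn:fileA"))

    unregister_and_delete(catalog, "lfn:fileA", replica)
    after_delete = len(catalog.locations("lfn:fileA"))
    replica_exists = replica.exists()
print(after_copy, after_delete, replica_exists)
```

The ordering of the two steps is the design point: the catalog is updated only when the stored data already matches what the entry will claim.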

9 Architecture Overview
[Diagram labels: SE (three), GDMP, GridFtp, GDMP Client, Edg-replica-manager, edg-rm-creg, fileA, fileB, Globus Replica Catalog or Replica Location Service, MSS]

Edg-rm/edg-rc Pros
- User-friendly interface
- Functional
- Third-party transfer available
- GSI authorization available for RM and RC
- Easy configuration

Edg-rm/edg-rc Cons
- RM: Error messages not always clear
- RM: No roll-back; no transactions
- RM: No complete interface to schema
- RC: Performance deteriorates with the number of entries
- RC: Centralized, non-scalable
- RC: No high-level user CLI for browsing
- RC: Inflexible schema

10 GDMP vs edg-replica-manager

GDMP (client-server)
- Replicates sets of files
- Replication between SEs only
- Mass storage interface
- Logical file attributes (size, timestamp, etc.; extensible)
- Subscription model
- Event notification
- CRC file size check
- Support for Objectivity/DB
- Automatic retries
- Support for multiple VOs

Replica Manager (client side only)
- Replicates single files
- Replication between SEs, or from UI or CE to SE
- Uses GDMP's Mass Storage interface at the SE

11 History: Replication tool development

GDMP 1.x, September 2000
- First prototype of basic SE-SE replication of Objectivity files
- Based on Globus 1.1.3

GDMP 2.x, October 2001
- General file replication tools (not only Objectivity files)
- Uses GridFTP + Globus Replica Catalog
- Full Mass Storage support

GDMP 3.x, April 2002
- Split into client-side and server-side tools
- Improved server functionality/security
- Support for multiple VOs

Edg-replica-manager 1.x, May 2002
- Based on globus-replica-management and globus-replica-catalog libs

Edg-replica-manager 2.x, December 2002
- Several improvements, Replica Location Service binding

GDMP 3.2.x, October 2002
- RLS + several improvements

GDMP 4.0, October 2002
- Globus 2.2.4 + RH 7.3, gcc 2.95.2 + gcc 3.2

12 Deployment
- GDMP was first used for High Level Trigger studies ("production") of HEP experiments in 2000/2001
  - Replication between SEs
- Later also introduced in the European DataGrid testbed:
  - Requirements changed:
    - All user commands needed to be executed from a User Interface machine or from Worker Nodes of a Computing Element
  - Caused some redesign
- Both tools (GDMP and edg-replica-manager) are used in European and US testbeds:
  - EDG: ATLAS, CMS, Alice and LHCb stress tests
  - WorldGrid: first transatlantic testbed, interoperable tools
  - LCG-0: deployed and interoperable with WorldGrid and GLUE testbeds

We thank our user community for valuable feedback.

13 Summary and Future Work
- The first generation of EDG replica management tools satisfies the basic use cases and requirements
- Client-only tools are simple to use but provide no server-side logging
- Limitations of certain services were demonstrated: Globus and EDG are working together to design and implement new tools
- A lot of experience gained: new software tools are under development (see the talk "Next-Generation EU DataGrid Data Management Services")

Thanks to the EU and our national funding agencies for their support of this work.

