GRID DATA MANAGEMENT PILOT (GDMP) Asad Samar (Caltech) ACAT 2000, Fermilab October 16--20, 2000.


1 GRID DATA MANAGEMENT PILOT (GDMP) Asad Samar (Caltech) ACAT 2000, Fermilab October 16--20, 2000

2 Introduction
–Mission Statement
–Use case (CMS) requirements
–Data Model
–Architecture and Middle-ware
–Integration into the CMS environment
–Deliverables, milestones and status
–Performance results
–Conclusions

3 Mission Statement
–A prototype project for the data management modules of other projects such as DataGrid, PPDG and GriPhyN.
–Can be used to test new ideas, strategies and tools for later adoption.
–A low-inertia project; the team is open to everyone for collaboration.
–CMS is the first use case, so the implementation is currently CMS-specific.
–Discussions with BaBar, ROOT and LHCb are in progress.

4 Use Case (CMS) Requirements
Data Management Requirements
–Data production (currently tens of terabytes).
–Managing the data locally at regional centers.
–Replicating this data to other centers:
    high-speed transfers;
    secure access;
    disk management facilities;
    minimal human intervention;
    fault tolerance and error recovery mechanisms.
–Data integration at the destination.
–Logging and book-keeping.
[Figure: Site A, Site B and Site C. Labels: Submit job; Job writes data locally; Replicate data (twice).]
–Jobs are executed locally or remotely.
–Data is always written locally.
–Data is replicated to remote sites.

5 Data Model
Assumptions
–The replicated files are Objectivity files only (a current restriction, to be removed by December).
–All participating sites have their own Objectivity federations:
    all sites are schema compatible;
    database file names are globally unique;
    database IDs are globally unique.
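The uniqueness assumptions on this slide can be checked mechanically before sites start replicating. The following is a minimal sketch, not GDMP code: the function name `find_conflicts`, the `(db_id, file_name)` catalog layout and the sample file names are all hypothetical. It reports any database ID that is bound to more than one file name, and any file name that is bound to more than one ID, across all sites.

```python
from collections import defaultdict


def find_conflicts(site_catalogs):
    """Return (duplicate_ids, duplicate_names) across all sites.

    site_catalogs maps a site name to a list of (db_id, file_name) pairs.
    This layout is an assumption made for illustration only.
    """
    names_by_id = defaultdict(set)
    ids_by_name = defaultdict(set)
    for entries in site_catalogs.values():
        for db_id, file_name in entries:
            names_by_id[db_id].add(file_name)
            ids_by_name[file_name].add(db_id)
    # A replica of the same file at two sites has the same ID and name,
    # so it does not count as a conflict.
    dup_ids = {i: sorted(n) for i, n in names_by_id.items() if len(n) > 1}
    dup_names = {n: sorted(i) for n, i in ids_by_name.items() if len(i) > 1}
    return dup_ids, dup_names


if __name__ == "__main__":
    catalogs = {
        "site_a": [(1, "hits_001.db"), (2, "hits_002.db")],
        "site_b": [(2, "digis_001.db"), (3, "hits_001.db")],
    }
    print(find_conflicts(catalogs))
```

An empty result for both dictionaries means the two uniqueness assumptions hold for the catalogs supplied.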

6 …Data Model
Subscription Model
–All sites that subscribe to a particular site are notified whenever its catalog is updated.
–Sites that do not subscribe must themselves poll for changes in the catalog.
–Both a pull and a push mechanism are supported.
[Figure: Export/Import Catalog Model with Site 1, Site 2 and Site 3. Labels: Subscriber list (twice); subscribe; Poll for changes.]
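The push and pull paths of the subscription model can be sketched in a few lines. This is an illustration under stated assumptions, not the GDMP implementation: the `Site` class, its attributes and the in-memory lists are hypothetical stand-ins for the real catalogs and for notifications sent over the network.

```python
class Site:
    """Hypothetical in-memory model of one site in the subscription scheme."""

    def __init__(self, name):
        self.name = name
        self.export_catalog = []  # files this site has published, in order
        self.subscribers = []     # sites to notify on every update (push)
        self.notified = []        # (producer name, file name) notices received
        self._seen = {}           # producer name -> entries already polled

    def subscribe(self, consumer):
        """Register consumer to be notified of this site's catalog updates."""
        if consumer not in self.subscribers:
            self.subscribers.append(consumer)

    def publish(self, file_name):
        """Add a file to the export catalog and push a notice to subscribers."""
        self.export_catalog.append(file_name)
        for consumer in self.subscribers:
            consumer.notified.append((self.name, file_name))

    def poll(self, producer):
        """Pull path: return the producer's entries added since the last poll."""
        start = self._seen.get(producer.name, 0)
        new_files = producer.export_catalog[start:]
        self._seen[producer.name] = len(producer.export_catalog)
        return new_files


if __name__ == "__main__":
    site1, site2, site3 = Site("site1"), Site("site2"), Site("site3")
    site1.subscribe(site2)         # site2 uses the push mechanism
    site1.publish("run1.db")
    print(site2.notified)          # site2 was notified
    print(site3.poll(site1))       # site3 has to ask
```

Here a subscriber learns of an update as soon as it is published, while a non-subscriber sees it only on its next poll.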

7 …Data Model
Catalog Model
–Export Catalog
    Contains information about the new files produced.
–Import Catalog
    Contains information about files that have been published by other sites but not yet transferred locally.
    As soon as a file has been transferred locally, it is removed from the import catalog.
–A site can also pull information about new files into its import catalog.
[Figure: Export/Import Catalog Model with Site 1, Site 2 and Site 3, each holding export and import catalogs. Labels: 1) Publish new files; 1) Get info about new files; 2) Transfer files; 3) Delete files.]
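The life cycle of an import catalog entry described above (published remotely, transferred, then removed) can be sketched as follows. The names `SiteCatalogs`, `publish` and `transfer_pending`, and the caller-supplied `fetch` callable, are hypothetical; the actual transfer mechanism is left abstract. A failed transfer leaves the entry in the import catalog so that it can be retried, which is one simple way to get the error recovery listed in the requirements.

```python
class SiteCatalogs:
    """Hypothetical per-site state: local files plus export/import catalogs."""

    def __init__(self, name):
        self.name = name
        self.local_files = set()
        self.export_catalog = set()
        self.import_catalog = set()


def publish(producer, consumers, file_name):
    """Record a new file at the producer and list it as pending at consumers."""
    producer.local_files.add(file_name)
    producer.export_catalog.add(file_name)
    for consumer in consumers:
        if file_name not in consumer.local_files:
            consumer.import_catalog.add(file_name)


def transfer_pending(consumer, fetch):
    """Fetch every pending file; drop it from the import catalog on success.

    fetch is a caller-supplied function that copies one file and raises
    OSError on failure. Failed files stay pending for a later retry.
    """
    transferred = []
    for file_name in sorted(consumer.import_catalog):
        try:
            fetch(file_name)
        except OSError:
            continue
        consumer.local_files.add(file_name)
        consumer.import_catalog.discard(file_name)
        transferred.append(file_name)
    return transferred


if __name__ == "__main__":
    site1, site2 = SiteCatalogs("site1"), SiteCatalogs("site2")
    publish(site1, [site2], "run1.db")
    print(transfer_pending(site2, lambda name: None))
    print(site2.import_catalog)
```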

8 Architecture and Middle-ware
Layered Architecture
–Modular
–Flexibility
–Extensibility
–Re-usability
–Blah blah…
Globus solves most middleware problems
[Figure: Layered Architecture for Distributed Data Management. Labels: Application; Request Manager; Replica Manager; Information Service; Security; Control Comm.; Data Mover; DB Manager; gssapi; GIS; Globus Rep. Manager; Globus-ftp; Globus_io; Globus-dc; Globus-threads; Objy API.]

9 Integration into the CMS environment
[Figure: data flow between Site A and Site B over the WAN, divided into the CMS environment, the CMS/GDMP interface and the GDMP system. Labels: Physics software; Write DB; CheckDB script; DB completeness check; Production federation; User federation; catalog; MSS; Stage & Purge scripts; Copy file to MSS; Update catalog; Purge file; Stage file (opt); Generate new catalog; Publish new catalog; Subscriber’s list; GDMP export catalog; GDMP import catalog; GDMP server; Generate import catalog; Replicate files; Transfer & attach; trigger; write; read.]

10 Performance Results
[Figure: transfer rates in kBytes/sec, compared with plain manual FTPs. Hosts: pccit1, cmsb21, cmsun1, suncms66, jasper. Sites: FNAL, CERN, Caltech. Rates shown: 790 KB/sec, 1152 KB/sec, 802 KB/sec, 590 KB/sec.]

11 Deliverables and Milestones
First Prototype
–Released in September 2000
Second Prototype
–Jan 2001
    Updated Replica Manager (Globus Replica Catalog)
    File transfer updates (GridFTP libraries)
Final Prototype
–Jun 2001
    Information services
        –Network monitoring (using NWS)
        –Data server loads
    Replica selection

12 Conclusions
–A prototype project for other Data Grid projects
–A production system for CMS
–Design is flexible enough to incorporate extensions and/or modifications
–Automates the replication process
–Ready to be used…download from http://cmsdoc.cern.ch/cms/grid

