Globus Data Replication Services Ann Chervenak, Robert Schuler USC Information Sciences Institute.


1 Globus Data Replication Services Ann Chervenak, Robert Schuler USC Information Sciences Institute

2 Motivation for Data Replication Services
- Data-intensive applications need higher-level data management services that integrate lower-level Grid functionality
  - Efficient data transfer (GridFTP, RFT)
  - Replica registration and discovery (RLS)
  - Eventually validation of replicas, consistency management, etc.
- Goal is to generalize the custom data management systems developed by several application communities
- Eventually plan to provide a suite of general, configurable, higher-level data management services
- Globus Data Replication Service (DRS) is the first of these services

3 The Data Replication Service
- Included in the Tech Preview of GT4.0 release
- Design is based on the publication component of the Lightweight Data Replicator system
  - Developed by Scott Koranda from U. Wisconsin at Milwaukee
- Functionality
  - Replicate a set of files in the Grid on a local site
  - Users identify a set of desired files
  - DRS queries Replica Location Service to discover current locations of these files
  - Creates local replicas of desired files using the Reliable File Transfer Service
  - Registers new replicas in Replica Location Service for discovery

4 Outline
- Terminology
- Functionality of Data Replication Service
- Background: Components used by DRS
  - Replica Location Service
  - GridFTP Data Transport protocol
  - Reliable File Transfer Service
- DRS Design
- Implementation in GT4 environment
- Evaluation of DRS performance in a wide area Grid
- Future work

5 Some Terminology
- A logical file name (LFN) is a unique identifier for the contents of a file
  - Typically, a scientific collaboration defines and manages the logical namespace
  - Guarantees uniqueness of logical names within that organization
- A physical file name (PFN) is the location of a copy of the file on a storage system
  - The physical namespace is managed by the file system or storage system
- For example, the LIGO environment currently contains:
  - More than six million unique logical files
  - More than 40 million physical files stored at ten sites
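
The logical-to-physical relationship described above is a one-to-many mapping. The following is a minimal sketch of that idea only; the names `replica_map`, `register` and `lookup`, and the example identifiers and URLs, are hypothetical and are not part of RLS or any Globus API.

```python
from collections import defaultdict

# Hypothetical in-memory table: logical file name -> set of physical file names.
replica_map: dict[str, set[str]] = defaultdict(set)

def register(lfn: str, pfn: str) -> None:
    """Record that a copy of logical file `lfn` is stored at location `pfn`."""
    replica_map[lfn].add(pfn)

def lookup(lfn: str) -> set[str]:
    """Return a copy of all known physical locations of `lfn` (empty if none)."""
    return set(replica_map.get(lfn, set()))

# One logical file with two physical copies (illustrative names only).
register("lfn://example.org/run42/frame-001",
         "gsiftp://site-a.example.org/data/frame-001")
register("lfn://example.org/run42/frame-001",
         "gsiftp://site-b.example.org/archive/frame-001")
```

Because the logical name identifies content rather than location, adding a replica only adds a PFN under an existing LFN.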

6 DRS Overview
- Client uses DRS interface to specify which files are required at local site
- DRS uses:
  - Delegation Service to delegate proxy credentials
  - Globus RLS to discover whether replicas exist locally and where they exist in the Grid
  - Selection algorithm to choose among available source replicas
  - Globus Reliable File Transfer service to copy data to local site
    - This uses GridFTP data transport protocol
  - Globus RLS to register new replicas

7 Background: The Replica Location Service
- A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
  - RLS maintains mappings between logical identifiers and target names
  - Must perform and scale well: support hundreds of millions of objects, hundreds of clients
- E.g., Laser Interferometer Gravitational Wave Observatory (LIGO)
  - RLS servers at 8 sites
  - Maintain associations between 6 million logical file names & 40 million physical file locations

8 RLS Features
[Diagram: Replica Location Indexes (RLI) above Local Replica Catalogs (LRC)]
- Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
- Replica Location Index (RLI) nodes aggregate information about one or more LRCs
- LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of index
- Optional compression of state updates reduces communication, CPU and storage overheads
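
The soft state relationship between catalogs and index nodes can be sketched as follows. This is a simplified illustration, not the RLS implementation: the class and method names are hypothetical, updates are uncompressed, and expiry by a fixed timeout is an assumption chosen to show why the index is only loosely consistent with the catalogs.

```python
class ReplicaLocationIndex:
    """Hypothetical index: remembers which catalogs recently reported each name."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # (logical name, catalog name) -> time of last update

    def receive_update(self, lrc_name, lfns, now):
        # A soft state update refreshes the timestamp of every reported name.
        for lfn in lfns:
            self.entries[(lfn, lrc_name)] = now

    def query(self, lfn, now):
        # Entries that were not refreshed within the timeout are treated as stale.
        return {lrc for (name, lrc), seen in self.entries.items()
                if name == lfn and now - seen <= self.timeout}


class LocalReplicaCatalog:
    """Hypothetical catalog: the authoritative logical-to-target mappings."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # logical name -> set of target names

    def add(self, lfn, target):
        self.mappings.setdefault(lfn, set()).add(target)

    def soft_state_update(self, rli, now):
        # Send the full set of logical names currently held by this catalog.
        rli.receive_update(self.name, set(self.mappings), now)
```

A client would ask the index which catalogs may hold a name, then ask those catalogs for the actual target names; the index answer can lag behind the catalogs until the next update arrives.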

9 Background: GridFTP
- A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
- Features:
  - Standard FTP: get/put etc., third-party transfer
  - GSS binding, extended directory listing, simple restart
  - Striped/parallel data channels
  - Partial file
  - TCP buffer setting
  - Progress monitoring, extended restart
- The Globus Toolkit supplies a reference implementation:
  - Server
  - Client tools (globus-url-copy)
  - Development Libraries

10 Background: Reliable File Transfer Service
- RFT accepts SOAP description of transfer
- Writes state to a database
- Uses Java GridFTP client library to initiate third-party transfers
- Restart Markers stored in the database
  - Allow for restart in the event of RFT failure
- Supports concurrency, i.e., multiple files in transit
- Check status:
  - Subscribe to notifications
  - Poll for status
[Diagram: RFT Client exchanges SOAP Messages and (optional) Notifications with the RFT Service, which drives the control channels of two GridFTP endpoints; data flows between the endpoints]
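
The role of restart markers can be illustrated with a small simulation. This is a sketch under stated assumptions, not RFT code: the table layout, the function names and the `fail_after` parameter are hypothetical, and a single byte offset stands in for the richer restart markers that GridFTP actually reports.

```python
import sqlite3

def open_state_db(path=":memory:"):
    """Open the (hypothetical) state database and create the marker table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS markers "
               "(url TEXT PRIMARY KEY, byte_offset INTEGER)")
    return db

def save_marker(db, url, byte_offset):
    """Persist how many bytes of `url` have been transferred so far."""
    db.execute("INSERT OR REPLACE INTO markers (url, byte_offset) VALUES (?, ?)",
               (url, byte_offset))
    db.commit()

def load_marker(db, url):
    """Return the last persisted offset for `url`, or 0 if none exists."""
    row = db.execute("SELECT byte_offset FROM markers WHERE url = ?",
                     (url,)).fetchone()
    return row[0] if row else 0

def transfer(db, url, total_bytes, chunk, fail_after=None):
    """Simulate a copy that resumes from the stored marker.

    Returns True when the file is complete, False if the simulated
    failure (after `fail_after` bytes in this attempt) interrupts it.
    """
    offset = load_marker(db, url)
    sent = 0
    while offset < total_bytes:
        if fail_after is not None and sent >= fail_after:
            return False
        step = min(chunk, total_bytes - offset)
        offset += step
        sent += step
        save_marker(db, url, offset)
    return True
```

Calling `transfer` again after an interrupted attempt continues from the persisted offset instead of starting at byte 0, which is the property the restart markers are meant to provide.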

11 DRS Functionality
- Initiate a DRS Request
- Create a delegated credential
- Create a Replicator resource
- Monitor Replicator resource
- Discover replicas of desired files in Replica Location Service, select among replicas
- Transfer data to local site with Reliable File Transfer Service
- Register new replicas in RLS catalogs
- Allow client inspection of DRS results
- Destroy Replicator resource
DRS is implemented in Globus Toolkit Version 4 and complies with the Web Services Resource Framework (WS-RF)
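
The discover, select, transfer and register steps listed above can be sketched as a single loop. This is an illustrative outline only: `replicate`, its parameters and the status strings are hypothetical, the RLS is reduced to a dictionary, and the transfer is an injected function rather than a call to RFT.

```python
import random

def replicate(lfns, local_site, rls, copy_file, choose=random.choice):
    """Replicate each logical file to `local_site`.

    rls       -- dict: logical name -> list of replica URLs (stand-in for RLS)
    copy_file -- function(src_url, dst_url) -> bool (stand-in for RFT)
    choose    -- selection algorithm over the available source replicas
    Returns a dict: logical name -> outcome string.
    """
    results = {}
    for lfn in lfns:
        sources = rls.get(lfn, [])                       # discover
        if not sources:
            results[lfn] = "no-replica"
            continue
        if any(url.startswith(local_site) for url in sources):
            results[lfn] = "already-local"
            continue
        src = choose(sources)                            # select
        dst = f"{local_site}/{lfn}"
        if copy_file(src, dst):                          # transfer
            rls[lfn] = sources + [dst]                   # register
            results[lfn] = "replicated"
        else:
            results[lfn] = "transfer-failed"
    return results
```

Passing the selection and transfer steps in as functions keeps the sketch independent of any particular selection policy or transport.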

12 Relationship to Other Globus Services
At requesting site, deploy:
- WS-RF Services
  - Data Replication Service
  - Delegation Service
  - Reliable File Transfer Service
- Pre WS-RF Components
  - Replica Location Service (Local Replica Catalog, Replica Location Index)
  - GridFTP Server

13 WSRF in a Nutshell
- Service
- State Management:
  - Resource
  - Resource Property
- State Identification:
  - Endpoint Reference
- State Interfaces:
  - GetRP, QueryRPs, GetMultipleRPs, SetRP
- Lifetime Interfaces:
  - SetTerminationTime
  - ImmediateDestruction
- Notification Interfaces
  - Subscribe
  - Notify
- ServiceGroups
[Diagram: a Service fronting a Resource with its RPs, addressed by an EPR; operations shown: GetRP, GetMultRPs, SetRP, QueryRPs, Subscribe, SetTermTime, Destroy]
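
The pattern summarized above (stateful resources identified by a reference, exposing properties and a managed lifetime) can be mimicked in a few lines. This is a conceptual sketch only; `ResourceHome` and its methods are hypothetical names, the reference is a plain string rather than a WS-Addressing endpoint reference, and no SOAP messaging or notification is modeled.

```python
import uuid

class ResourceHome:
    """Hypothetical container of stateful resources, keyed by a reference string."""

    def __init__(self):
        self._resources = {}

    def create(self, properties, termination_time):
        """Create a resource and return the reference that identifies it."""
        epr = str(uuid.uuid4())
        self._resources[epr] = {"rps": dict(properties),
                                "termination_time": termination_time}
        return epr

    def exists(self, epr):
        return epr in self._resources

    # State interfaces
    def get_rp(self, epr, name):
        return self._resources[epr]["rps"][name]

    def get_multiple_rps(self, epr, names):
        return {n: self._resources[epr]["rps"][n] for n in names}

    def set_rp(self, epr, name, value):
        self._resources[epr]["rps"][name] = value

    # Lifetime interfaces
    def set_termination_time(self, epr, termination_time):
        self._resources[epr]["termination_time"] = termination_time

    def destroy(self, epr):
        del self._resources[epr]

    def sweep(self, now):
        """Destroy every resource whose termination time has passed."""
        expired = [e for e, r in self._resources.items()
                   if r["termination_time"] <= now]
        for epr in expired:
            self.destroy(epr)
```

The service itself stays stateless: every call carries the reference, and all state lives in the resource it names.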

14 Create Delegated Credential
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, user proxy]
- Initialize user proxy cert.
- Create delegated credential resource
- Set termination time
- Credential EPR returned

15 Create Replicator Resource
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, Replicator RP]
- Create Replicator resource
- Pass delegated credential EPR
- Set termination time
- Replicator EPR returned
- Access delegated credential resource

16 Monitor Replicator Resource
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, Replicator RP, Index RP]
- Periodically polls Replicator RP via GetRP or GetMultRP
- Add Replicator resource to MDS Information service Index
- Subscribe to ResourceProperty changes for “Status” RP and “Stage” RP
- Conditions may trigger alerts or other actions (Trigger service not pictured)
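
The polling option mentioned above amounts to reading the "Status" property until it reaches a terminal value. A minimal sketch follows, assuming a caller-supplied `get_rp` function as a stand-in for the real GetRP call; the function name, its parameters and the default terminal value are illustrative assumptions, not part of the DRS client interface.

```python
import time

def wait_until_finished(get_rp, interval=1.0, max_polls=100,
                        terminal=("Finished",), sleep=time.sleep):
    """Poll the "Status" resource property until it reaches a terminal value.

    get_rp -- function(property_name) -> current value (hypothetical stand-in)
    Raises TimeoutError if no terminal value is seen within `max_polls` reads.
    """
    for _ in range(max_polls):
        status = get_rp("Status")
        if status in terminal:
            return status
        sleep(interval)  # wait before the next read
    raise TimeoutError(f"status not terminal after {max_polls} polls")
```

Subscribing to property changes avoids the repeated reads, at the cost of the client having to receive notifications.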

17 Query Replica Information
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, Replicator RP, Index RP]
- Notification of “Stage” RP value changed to “discover”
- Replicator queries RLS Replica Index to find catalogs that contain desired replica information
- Replicator queries RLS Replica Catalog(s) to retrieve mappings from logical name to target name (URL)

18 Transfer Data
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, Replicator RP, Index RP, Transfer RP]
- Notification of “Stage” RP value changed to “transfer”
- Create Transfer resource
- Pass credential EPR
- Set Termination Time
- Transfer resource EPR returned
- Access delegated credential resource
- Set up GridFTP Server transfer of file(s)
- Data transfer between GridFTP Server sites
- Periodically poll “ResultStatus” RP via GetRP
- When “Done”, get state information for each file transfer

19 Register Replica Information
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, Replicator RP, Index RP, Transfer RP]
- Notification of “Stage” RP value changed to “register”
- Replicator registers new file mappings in RLS Replica Catalog
- RLS Replica Catalog sends update of new replica mappings to the Replica Index

20 Client Inspection of State
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, Replicator RP, Index RP, Transfer RP]
- Notification of “Status” RP value changed to “Finished”
- Client inspects Replicator state information for each replication in the request

21 Resource Termination
[Diagram: Client; Service Container; Delegation, Data Rep., RFT, Replica Index, Replica Catalogs, GridFTP Servers, MDS; resources shown: Credential RP, Replicator RP, Index RP, Transfer RP; time axis]
- Termination time (set by client) expires eventually
- Resources destroyed (Credential, Transfer, Replicator)

22 Performance Measurements: Wide Area Testing
- The destination for the pull-based transfers is located in Los Angeles
  - Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GBytes of memory and a 1 Gbit Ethernet
  - Runs a GT4 container and deploys services including RFT and DRS as well as GridFTP and RLS
- The remote site where desired data files are stored is located at Argonne National Laboratory in Illinois
  - Dual-processor, 3 GHz Intel Xeon workstation with 2 gigabytes of memory with 1.1 terabytes of disk
  - Runs a GT4 container as well as GridFTP and RLS services

23 DRS Operations Measured
- Create the DRS Replicator resource
- Discover source files for replication using local RLS Replica Location Index and remote RLS Local Replica Catalogs
- Initiate a Reliable File Transfer operation by creating an RFT resource
- Perform RFT data transfer(s)
- Register the new replicas in the RLS Local Replica Catalog

24 Experiment 1: Replicate 10 Files of Size 1 Gigabyte

Component of Operation        Time (milliseconds)
Create Replicator Resource                  317.0
Discover Files in RLS                       449.0
Create RFT Resource                         808.6
Transfer Using RFT                      1186796.0
Register Replicas in RLS                   3720.8

- Data transfer time dominates
- Wide area data transfer rate of 67.4 Mbits/sec
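
The reported transfer rate can be checked from the table with a short calculation. The sketch below assumes decimal units (1 gigabyte = 10^9 bytes, 1 Mbit = 10^6 bits), which is an assumption about how the figure was derived; the function name is illustrative.

```python
def throughput_mbits_per_sec(num_files, bytes_per_file, transfer_ms):
    """Aggregate transfer rate in Mbit/s, using 1 Mbit = 10**6 bits."""
    bits = num_files * bytes_per_file * 8
    seconds = transfer_ms / 1000.0
    return bits / 1e6 / seconds

# Experiment 1: 10 files of 1 gigabyte, "Transfer Using RFT" = 1186796.0 ms.
rate = throughput_mbits_per_sec(10, 10**9, 1186796.0)
print(f"{rate:.1f} Mbit/s")  # prints "67.4 Mbit/s"
```

Under that unit assumption the result agrees with the 67.4 Mbits/sec stated for this experiment.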

25 Experiment 2: Replicate 1000 Files of Size 10 Megabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource                 1561.0
Discover Files in RLS                         9.8
Create RFT Resource                        1286.6
Transfer Using RFT                       963456.0
Register Replicas in RLS                  11278.2

- Time to create Replicator and RFT resources is larger
  - Need to store state for 1000 outstanding transfers
- Data transfer time still dominates
- Wide area data transfer rate of 85 Mbits/sec
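
The claim that data transfer still dominates can be made concrete by computing each component's share of the total. The sketch below uses only the Experiment 2 timings from the table; the variable names are illustrative.

```python
# Experiment 2 timings in milliseconds, copied from the table above.
exp2_ms = {
    "Create Replicator Resource": 1561.0,
    "Discover Files in RLS": 9.8,
    "Create RFT Resource": 1286.6,
    "Transfer Using RFT": 963456.0,
    "Register Replicas in RLS": 11278.2,
}

total_ms = sum(exp2_ms.values())

# Fraction of the end-to-end time spent in each component.
shares = {name: ms / total_ms for name, ms in exp2_ms.items()}

for name, share in shares.items():
    print(f"{name:<28}{share:8.3%}")

# Average registration cost per file for the 1000 files in the request.
register_ms_per_file = exp2_ms["Register Replicas in RLS"] / 1000
```

The transfer step accounts for more than 98% of the measured time, so the larger resource creation and registration costs remain a small part of the total.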

26 Future Work
- Continued performance testing of DRS:
  - Increasing the size of the files being transferred
  - Increasing the number of files per DRS request
- Add and refine DRS functionality as needed by GEON and other applications
  - E.g., add a push-based replication capability
  - Add fine-grained authorization capability to RLS, DRS
- Long-term:
  - Will develop a suite of general, configurable, composable, high-level data management services

