The Data Replication Service Ann Chervenak Robert Schuler USC Information Sciences Institute.


1 The Data Replication Service
Ann Chervenak, Robert Schuler
USC Information Sciences Institute

2 The Data Replication Service
- Included in the Tech Preview of GT4.0 release
- Design is based on the publication component of the Lightweight Data Replicator system
  - Developed by Scott Koranda from U. Wisconsin at Milwaukee
- Functionality
  - Replicate a set of files in the Grid on a local site
  - Users identify a set of desired files
  - DRS queries Replica Location Service to discover current locations of these files
  - Creates local replicas of desired files using the Reliable File Transfer Service
  - Registers new replicas in Replica Location Service for discovery

3 Motivation for DRS
- Need for higher-level data management services that integrate lower-level Grid functionality
  - Efficient data transfer (GridFTP, RFT)
  - Replica registration and discovery (RLS)
  - Eventually validation of replicas, etc.
- Goal is to generalize the custom data management systems developed by several application communities
- Eventually plan to provide a suite of general, configurable, higher-level data management services
- DRS is the first of these services

4 Relationship to Other Globus Services
At requesting site, deploy:
- WS-RF Services
  - Data Replication Service
  - Delegation Service
  - Reliable File Transfer Service
- Pre WS-RF Components
  - Replica Location Service (Local Replica Catalog and Replica Location Index)
  - GridFTP Server

5 DRS Functionality
- Initiate a DRS Request
- Discover and select among replicas that act as source locations for data copies
- Transfer data to local site to create new replicas
- Register new replicas in catalogs

6 Initiating a DRS Request
- Client uses GT4 Delegation Service to create a delegated credential that may be used by other services to act on behalf of user
- Client creates a request file containing a replication request description including:
  - desired logical files
  - destination URLs
- Client sends message to DRS to create the Replicator resource and passes the request file’s URL
- Replicator retrieves the request file
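
The slides do not show what the request file looks like. As a minimal sketch, assuming a hypothetical tab-separated layout with one logical file name and one destination URL per line, a request description could be parsed as follows. The layout, the names `ReplicationItem` and `parse_request`, and the example file names and URLs are illustrative assumptions, not the actual DRS format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicationItem:
    """One entry of a replication request (hypothetical structure)."""
    logical_name: str      # desired logical file
    destination_url: str   # where the new replica should be written


def parse_request(text: str) -> List[ReplicationItem]:
    """Parse an assumed 'logical name <TAB> destination URL' request file."""
    items = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(parts)}"
            )
        items.append(ReplicationItem(parts[0].strip(), parts[1].strip()))
    return items


# Made-up example content; the hosts and names are placeholders.
EXAMPLE_REQUEST = (
    "# logical name\tdestination URL\n"
    "run42-a.dat\tgsiftp://dest.example.org/data/run42-a.dat\n"
    "run42-b.dat\tgsiftp://dest.example.org/data/run42-b.dat\n"
)

if __name__ == "__main__":
    for item in parse_request(EXAMPLE_REQUEST):
        print(item.logical_name, "->", item.destination_url)
```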

7 Replica Discovery and Selection
- Replicator queries the Globus Replica Location Service in a two-step process to discover locations of desired files:
  - Query local site’s Replica Location Index to find the catalogs at remote sites that contain mappings for the requested files
  - Query remote Local Replica Catalogs to get the physical file names of the replicas
- Replicator selects source file for each file to be copied
  - Current implementation chooses randomly
  - A callout is provided for more sophisticated replica selection decisions based on state of Grid
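
The two-step lookup and the selection callout can be sketched with in-memory dictionaries standing in for the Replica Location Index and the remote Local Replica Catalogs. This is an illustration under stated assumptions, not the RLS client API: the dictionary layout, `discover`, `select_sources`, `random_selector`, and the `Selector` signature are all hypothetical names introduced here.

```python
import random
from typing import Callable, Dict, List, Sequence

# Assumed stand-ins for the real services:
#   index:    logical name -> names of catalogs holding mappings for it
#   catalogs: catalog name -> (logical name -> physical file names)
Index = Dict[str, List[str]]
Catalogs = Dict[str, Dict[str, List[str]]]

# Hypothetical callout signature: given a logical name and its candidate
# physical replicas, return the one to use as the transfer source.
Selector = Callable[[str, Sequence[str]], str]


def random_selector(logical_name: str, candidates: Sequence[str]) -> str:
    """Default policy described on the slide: pick a source at random."""
    return random.choice(list(candidates))


def discover(logical_name: str, index: Index, catalogs: Catalogs) -> List[str]:
    """Step 1: ask the index which catalogs know the file.
    Step 2: ask each of those catalogs for physical file names."""
    replicas: List[str] = []
    for catalog_name in index.get(logical_name, []):
        replicas.extend(catalogs[catalog_name].get(logical_name, []))
    return replicas


def select_sources(
    logical_names: Sequence[str],
    index: Index,
    catalogs: Catalogs,
    selector: Selector = random_selector,
) -> Dict[str, str]:
    """Choose one source replica per requested logical file."""
    chosen: Dict[str, str] = {}
    for name in logical_names:
        candidates = discover(name, index, catalogs)
        if not candidates:
            raise LookupError(f"no replicas found for {name!r}")
        chosen[name] = selector(name, candidates)
    return chosen
```

Passing a different `selector`, for example one that prefers a particular host, plays the role of the callout without changing the discovery code.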

8 File Transfers to Create New Replicas
- The Replicator initiates a request with the Globus Reliable File Transfer Service
  - Creates RFT resource that holds state for each data transfer
- Control passes from DRS to RFT, which also retrieves the delegated credential from the Delegation Service
- RFT coordinates the file transfers
- Transfers are performed by GridFTP servers at the source and destination sites
- After transfers complete, the Replicator checks status of each file in the transfer request

9 Registration of New Replicas
- Replicator adds mappings for the newly created replicas to its Globus RLS Local Replica Catalog
- Local Replica Catalog updates Replica Location Indexes to make new replicas visible throughout Grid
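
The per-file status check and the registration step can be sketched together. The snippet below is a simplified model, assuming each transfer result is a `(logical name, destination URL, status)` tuple and that the catalog and index are plain dictionaries. The status string `"Finished"`, the function `register_completed`, and the site name `"local-lrc"` are hypothetical, and the index update is written as a direct call rather than the catalog-driven update the slide describes.

```python
from typing import Dict, List, Set, Tuple

TransferResult = Tuple[str, str, str]  # (logical name, destination URL, status)


def register_completed(
    results: List[TransferResult],
    local_catalog: Dict[str, List[str]],
    index: Dict[str, Set[str]],
    site_name: str = "local-lrc",
) -> List[str]:
    """Register successfully transferred files; return the names that failed.

    local_catalog maps logical names to physical locations at this site.
    index maps logical names to the sites whose catalogs hold a mapping.
    """
    failed: List[str] = []
    for logical_name, destination_url, status in results:
        if status != "Finished":  # assumed success marker
            failed.append(logical_name)
            continue
        locations = local_catalog.setdefault(logical_name, [])
        if destination_url not in locations:
            locations.append(destination_url)
        # Make the new replica discoverable from other sites.
        index.setdefault(logical_name, set()).add(site_name)
    return failed
```

Only files whose transfer succeeded are registered, so a failed transfer never becomes a discoverable replica.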

10 Performance Measurements: Wide Area Testing
- The destination for the pull-based transfers is located in Los Angeles
  - Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GBytes of memory and a 1 Gbit Ethernet
  - Runs a GT4 container and deploys services including RFT and DRS as well as GridFTP and RLS
- The remote site where desired data files are stored is located at Argonne National Laboratory in Illinois
  - Dual-processor, 3 GHz Intel Xeon workstation with 2 gigabytes of memory with 1.1 terabytes of disk
  - Runs a GT4 container as well as GridFTP and RLS services

11 DRS Operations Measured
- Create the DRS Replicator resource
- Discover source files for replication using local RLS Replica Location Index and remote RLS Local Replica Catalogs
- Initiate a Reliable File Transfer operation by creating an RFT resource
- Perform RFT data transfer(s)
- Register the new replicas in the RLS Local Replica Catalog

12 Experiment 1: Replicate 10 Files of Size 10 Gigabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource                  317.0
Discover Files in RLS                       449.0
Create RFT Resource                         808.6
Transfer Using RFT                      1186796.0
Register Replicas in RLS                   3720.8

- Data transfer time dominates
- Wide area data transfer rate of 67.4 Mbits/sec
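
The reported rate can be cross-checked against the table. The sketch below assumes decimal units (1 gigabyte = 10^9 bytes, 1 Mbit = 10^6 bits) and assumes the rate is the total payload divided by the "Transfer Using RFT" time; neither assumption is stated on the slide. Under these assumptions a total payload of 10 gigabytes gives about 67.4 Mbits/sec, which matches the reported figure. That suggests the ten files total about 10 gigabytes (1 gigabyte each), since ten files of 10 gigabytes each would imply a rate about ten times higher.

```python
def rate_mbits_per_sec(total_bytes: float, transfer_ms: float) -> float:
    """Average rate in Mbits/sec, using 1 Mbit = 10**6 bits (assumption)."""
    total_mbits = total_bytes * 8 / 1e6
    return total_mbits / (transfer_ms / 1000.0)


TRANSFER_MS = 1186796.0   # "Transfer Using RFT" from the Experiment 1 table
GIGABYTE = 1e9            # decimal gigabyte (assumption)

# Interpretation A: the ten files total 10 gigabytes.
rate_total_10gb = rate_mbits_per_sec(10 * GIGABYTE, TRANSFER_MS)

# Interpretation B: each of the ten files is 10 gigabytes.
rate_each_10gb = rate_mbits_per_sec(10 * 10 * GIGABYTE, TRANSFER_MS)

if __name__ == "__main__":
    print(f"10 GB in total: {rate_total_10gb:.1f} Mbits/sec")
    print(f"10 GB per file: {rate_each_10gb:.1f} Mbits/sec")
```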

13 Experiment 2: Replicate 1000 Files of Size 10 Megabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource                 1561.0
Discover Files in RLS                         9.8
Create RFT Resource                        1286.6
Transfer Using RFT                       963456.0
Register Replicas in RLS                  11278.2

- Time to create Replicator and RFT resources is larger
  - Need to store state for 1000 outstanding transfers
- Data transfer time still dominates
- Wide area data transfer rate of 85 Mbits/sec
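
To see how strongly transfer time dominates, the Experiment 2 timings can be broken down as below. This is a rough sketch using only the numbers in the table; the variable names are illustrative, and the rate check assumes 1 Mbit = 10^6 bits. With decimal megabytes the computed rate is about 83 Mbits/sec and with binary megabytes about 87 Mbits/sec, so the reported 85 Mbits/sec is roughly consistent with either reading.

```python
# Timings in milliseconds, copied from the Experiment 2 table.
TIMINGS_MS = {
    "Create Replicator Resource": 1561.0,
    "Discover Files in RLS": 9.8,
    "Create RFT Resource": 1286.6,
    "Transfer Using RFT": 963456.0,
    "Register Replicas in RLS": 11278.2,
}
NUM_FILES = 1000
FILE_MEGABYTES = 10

total_ms = sum(TIMINGS_MS.values())
transfer_ms = TIMINGS_MS["Transfer Using RFT"]

# Fraction of the end-to-end time spent moving data.
transfer_share = transfer_ms / total_ms

# Everything that is not data transfer, averaged per file.
overhead_ms_per_file = (total_ms - transfer_ms) / NUM_FILES


def rate_mbits_per_sec(bytes_per_megabyte: int) -> float:
    """Average transfer rate for a given definition of 'megabyte'."""
    total_bits = NUM_FILES * FILE_MEGABYTES * bytes_per_megabyte * 8
    return total_bits / 1e6 / (transfer_ms / 1000.0)


rate_decimal = rate_mbits_per_sec(10**6)   # 1 megabyte = 10**6 bytes
rate_binary = rate_mbits_per_sec(2**20)    # 1 megabyte = 2**20 bytes

if __name__ == "__main__":
    print(f"transfer share of total time: {transfer_share:.1%}")
    print(f"non-transfer overhead per file: {overhead_ms_per_file:.1f} ms")
    print(f"rate (decimal MB): {rate_decimal:.1f} Mbits/sec")
    print(f"rate (binary MB):  {rate_binary:.1f} Mbits/sec")
```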

14 Future Work
- We will continue performance testing of DRS:
  - Increasing the size of the files being transferred
  - Increasing the number of files per DRS request
- Add and refine DRS functionality as it is used by applications
  - E.g., add a push-based replication capability
- We plan to develop a suite of general, configurable, composable, high-level data management services

