The Data Replication Service
Ann Chervenak, Robert Schuler
USC Information Sciences Institute

The Data Replication Service
- Included in the Tech Preview of the GT4.0 release
- Design is based on the publication component of the Lightweight Data Replicator system
  - Developed by Scott Koranda from U. Wisconsin at Milwaukee
- Functionality
  - Replicate a set of files in the Grid on a local site
  - Users identify a set of desired files
  - DRS queries the Replica Location Service to discover current locations of these files
  - Creates local replicas of desired files using the Reliable File Transfer Service
  - Registers new replicas in the Replica Location Service for discovery

Motivation for DRS
- Need for higher-level data management services that integrate lower-level Grid functionality
  - Efficient data transfer (GridFTP, RFT)
  - Replica registration and discovery (RLS)
  - Eventually validation of replicas, etc.
- Goal is to generalize the custom data management systems developed by several application communities
- Eventually plan to provide a suite of general, configurable, higher-level data management services
- DRS is the first of these services

Relationship to Other Globus Services
At requesting site, deploy:
- WS-RF Services
  - Data Replication Service
  - Delegation Service
  - Reliable File Transfer Service
- Pre WS-RF Components
  - Replica Location Service (Local Replica Catalog and Replica Location Index)
  - GridFTP Server

DRS Functionality
- Initiate a DRS Request
- Discover and select among replicas that act as source locations for data copies
- Transfer data to local site to create new replicas
- Register new replicas in catalogs

Initiating a DRS Request
- Client uses the GT4 Delegation Service to create a delegated credential that may be used by other services to act on behalf of the user
- Client creates a request file containing a replication request description including:
  - desired logical files
  - destination URLs
- Client sends a message to DRS to create the Replicator resource and passes the request file's URL
- Replicator retrieves the request file
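
To make the request description concrete, here is a minimal sketch of reading such a file. The layout (one logical file name and one destination URL per line, separated by whitespace), the `ReplicationItem` type, the `parse_request` function, and the example host names are all assumptions made for illustration; the slides do not specify the actual DRS request file format.

```python
from typing import NamedTuple


class ReplicationItem(NamedTuple):
    logical_name: str     # logical file name the user wants replicated
    destination_url: str  # where the new local replica should be written


def parse_request(text: str) -> list[ReplicationItem]:
    """Parse a hypothetical request file: one 'LFN URL' pair per line."""
    items = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 'LFN URL', got {raw!r}")
        items.append(ReplicationItem(*parts))
    return items


# Example request (assumed layout, made-up names and hosts).
example = """\
# logical name      destination URL
run42/events.dat    gsiftp://local.example.org/data/run42/events.dat
run42/calib.dat     gsiftp://local.example.org/data/run42/calib.dat
"""
requests = parse_request(example)
```

Each parsed item pairs one desired logical file with the URL where its replica should land, which is the information the Replicator needs for the later discovery and transfer steps.
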

Replica Discovery and Selection
- Replicator queries the Globus Replica Location Service in a two-step process to discover locations of desired files:
  - Query local site's Replica Location Index to find the catalogs at remote sites that contain mappings for the requested files
  - Query remote Local Replica Catalogs to get the physical file names of the replicas
- Replicator selects source file for each file to be copied
  - Current implementation chooses randomly
  - A callout is provided for more sophisticated replica selection decisions based on state of Grid
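
The two-step lookup and the selection callout can be sketched with in-memory dictionaries standing in for the index and the catalogs. This is not the RLS client API: `rli`, `lrcs`, `discover`, `select_source`, and every name and URL below are hypothetical stand-ins chosen for illustration.

```python
import random
from typing import Callable, Sequence

# Hypothetical Replica Location Index: logical name -> catalogs that know it.
rli = {"run42/events.dat": ["lrc-site-a", "lrc-site-b"]}

# Hypothetical Local Replica Catalogs: logical name -> physical file names.
lrcs = {
    "lrc-site-a": {"run42/events.dat": ["gsiftp://a.example.org/d/events.dat"]},
    "lrc-site-b": {"run42/events.dat": ["gsiftp://b.example.org/d/events.dat"]},
}

Selector = Callable[[Sequence[str]], str]


def discover(lfn: str, rli: dict, lrcs: dict) -> list[str]:
    """Step 1: ask the index which catalogs to query. Step 2: query each one."""
    replicas = []
    for catalog_name in rli.get(lfn, []):
        # The index may point at a catalog that no longer holds the mapping.
        replicas.extend(lrcs.get(catalog_name, {}).get(lfn, []))
    return replicas


def select_source(replicas: Sequence[str], selector: Selector = random.choice) -> str:
    """Pick one source replica; the default mirrors the random choice."""
    if not replicas:
        raise LookupError("no replicas found")
    return selector(replicas)


candidates = discover("run42/events.dat", rli, lrcs)
source = select_source(candidates)
```

Passing a different `selector` plays the role of the callout: a function that ranks replicas by measured bandwidth or load could be substituted without touching the discovery code.
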

File Transfers to Create New Replicas
- The Replicator initiates a request with the Globus Reliable File Transfer Service
  - Creates an RFT resource that holds state for each data transfer
- Control passes from DRS to RFT, which also retrieves the delegated credential from the Delegation Service
- RFT coordinates the file transfers
- Transfers are performed by GridFTP servers at the source and destination sites
- After transfers complete, the Replicator checks the status of each file in the transfer request
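
The final status check can be pictured as partitioning the per-file results so that only successfully copied files move on to registration. The sketch below assumes a simple three-state model; the `TransferStatus` values and `split_by_status` are hypothetical and do not reflect the actual state names RFT reports.

```python
from enum import Enum


class TransferStatus(Enum):
    # Hypothetical states for illustration, not RFT's real status values.
    FINISHED = "finished"
    FAILED = "failed"
    PENDING = "pending"


def split_by_status(statuses: dict[str, TransferStatus]) -> tuple[list[str], list[str]]:
    """Return (destinations that finished, destinations that did not)."""
    finished = [url for url, s in statuses.items() if s is TransferStatus.FINISHED]
    unfinished = [url for url, s in statuses.items() if s is not TransferStatus.FINISHED]
    return finished, unfinished


# Made-up results for three destination URLs.
results = {
    "gsiftp://local.example.org/data/a.dat": TransferStatus.FINISHED,
    "gsiftp://local.example.org/data/b.dat": TransferStatus.FAILED,
    "gsiftp://local.example.org/data/c.dat": TransferStatus.FINISHED,
}
to_register, to_report = split_by_status(results)
```
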

Registration of New Replicas
- Replicator adds mappings for the newly created replicas to its Globus RLS Local Replica Catalog
- Local Replica Catalog updates Replica Location Indexes to make new replicas visible throughout Grid
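
Registration can be sketched as two writes: the new mapping goes into the local catalog, and the index learns that this catalog now holds the logical name. The dictionaries, `register_replicas`, and the catalog name are hypothetical stand-ins, and the index is updated immediately here as a simplification of however the catalog actually propagates its updates.

```python
def register_replicas(
    lrc: dict[str, list[str]],
    rli: dict[str, list[str]],
    catalog_name: str,
    new_replicas: list[tuple[str, str]],
) -> None:
    """Record (logical name, physical name) pairs locally, then update the index."""
    for lfn, pfn in new_replicas:
        pfns = lrc.setdefault(lfn, [])
        if pfn not in pfns:
            pfns.append(pfn)  # local catalog: logical name -> physical name
        catalogs = rli.setdefault(lfn, [])
        if catalog_name not in catalogs:
            catalogs.append(catalog_name)  # index: logical name -> catalog


local_lrc: dict[str, list[str]] = {}
index: dict[str, list[str]] = {"run42/events.dat": ["lrc-remote"]}
register_replicas(
    local_lrc,
    index,
    "lrc-local",
    [("run42/events.dat", "gsiftp://local.example.org/data/run42/events.dat")],
)
```

After the call, a lookup for `run42/events.dat` in the index lists both the remote and the local catalog, which is what makes the new replica discoverable by other sites.
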

Performance Measurements: Wide Area Testing
- The destination for the pull-based transfers is located in Los Angeles
  - Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GBytes of memory and a 1 Gbit Ethernet
  - Runs a GT4 container and deploys services including RFT and DRS as well as GridFTP and RLS
- The remote site where desired data files are stored is located at Argonne National Laboratory in Illinois
  - Dual-processor, 3 GHz Intel Xeon workstation with 2 gigabytes of memory and 1.1 terabytes of disk
  - Runs a GT4 container as well as GridFTP and RLS services

DRS Operations Measured
- Create the DRS Replicator resource
- Discover source files for replication using local RLS Replica Location Index and remote RLS Local Replica Catalogs
- Initiate a Reliable File Transfer operation by creating an RFT resource
- Perform RFT data transfer(s)
- Register the new replicas in the RLS Local Replica Catalog
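
One way to collect per-operation times like those reported in the experiments is to wrap each phase in a timer. The sketch below is a generic Python pattern, not the instrumentation used for the measurements; `timed`, `timings_ms`, and the placeholder phase bodies are assumptions for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(phase: str, sink: dict[str, float]):
    """Record the wall-clock duration of the enclosed block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[phase] = (time.perf_counter() - start) * 1000.0


phases = [
    "Create Replicator Resource",
    "Discover Files in RLS",
    "Create RFT Resource",
    "Transfer Using RFT",
    "Register Replicas in RLS",
]

timings_ms: dict[str, float] = {}
for phase in phases:
    with timed(phase, timings_ms):
        pass  # placeholder: the real operation for this phase would run here

for phase, elapsed in timings_ms.items():
    print(f"{phase:<28} {elapsed:10.1f} ms")
```
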

Experiment 1: Replicate 10 Files of Size 10 Gigabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource    317.0
Discover Files in RLS
Create RFT Resource
Transfer Using RFT
Register Replicas in RLS

- Data transfer time dominates
- Wide area data transfer rate of 67.4 Mbits/sec

Experiment 2: Replicate 1000 Files of Size 10 Megabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource
Discover Files in RLS         9.8
Create RFT Resource
Transfer Using RFT
Register Replicas in RLS

- Time to create Replicator and RFT resources is larger
  - Need to store state for 1000 outstanding transfers
- Data transfer time still dominates
- Wide area data transfer rate of 85 Mbits/sec
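
The relationship between data volume, transfer rate, and transfer time is simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes decimal units (1 megabyte = 10^6 bytes, 1 Mbit = 10^6 bits) and that the quoted rate is an average over the whole transfer. The result is a back-of-the-envelope estimate, not a measured value from the experiment.

```python
def implied_transfer_seconds(
    num_files: int, file_size_bytes: int, rate_mbits_per_sec: float
) -> float:
    """Time needed to move num_files files of equal size at a given average rate."""
    total_bits = num_files * file_size_bytes * 8
    return total_bits / (rate_mbits_per_sec * 1_000_000)


# Experiment 2 figures: 1000 files of 10 megabytes at 85 Mbits/sec.
seconds = implied_transfer_seconds(1000, 10 * 10**6, 85.0)
print(f"about {seconds:.0f} seconds")
```

Under these assumptions the transfer phase alone is on the order of hundreds of seconds, which is consistent with the observation that data transfer time dominates operations measured in milliseconds.
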

Future Work
- We will continue performance testing of DRS:
  - Increasing the size of the files being transferred
  - Increasing the number of files per DRS request
- Add and refine DRS functionality as it is used by applications
  - E.g., add a push-based replication capability
- We plan to develop a suite of general, configurable, composable, high-level data management services