1 Overview of GT4 Data Services

2 Globus Data Services Talk Outline

Summarize capabilities and plans for data services in the Globus Toolkit Version 4.0.2:
- Extensible IO (XIO) system: flexible framework for I/O
- GridFTP: fast, secure data transport
- The Reliable File Transfer Service (RFT): data movement services for GT4
- The Replica Location Service (RLS): distributed registry that records locations of data copies
- The Data Replication Service (DRS): integrates RFT and RLS to replicate and register files

3 The eXtensible Input/Output (XIO) System, GridFTP, and the Reliable File Transfer Service (RFT)
Bill Allcock, ANL

4 Technology Drivers

- Internet revolution: 100M+ hosts; collaboration and sharing are the norm
- Universal Moore's law: roughly 1000x every 10 years, for sensors as well as computers
- Petascale data tsunami: the gating step is analysis
- And our old infrastructure?
[Chart: growth in sequenced genomes - 114 complete, 735 in progress, "you are here"]
Slide courtesy of Ian Foster

5 Extensible IO (XIO) System

- Provides a framework that implements a Read/Write/Open/Close abstraction
- Drivers are written that implement the functionality (file, TCP, UDP, GSI, etc.)
- Different functionality is achieved by building protocol stacks
- GridFTP drivers will allow 3rd-party applications to easily access files stored under a GridFTP server
- Other drivers could be written to allow access to other data stores
- Changing drivers requires minimal change to the application code
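
The real XIO framework is a C library; the following is a minimal Python sketch of the driver-stack idea only, with invented Base64Transform and FileTransport drivers standing in for actual XIO drivers:

```python
import base64

class Driver:
    """Every driver exposes the same open/write/close abstraction and
    delegates to the driver below it in the stack."""
    def __init__(self, below=None):
        self.below = below
    def open(self, target):
        self.below.open(target)
    def write(self, data):
        self.below.write(data)
    def close(self):
        self.below.close()

class Base64Transform(Driver):
    """Transform driver: re-encodes the data, then passes it down the stack."""
    def write(self, data):
        self.below.write(base64.b64encode(data))

class FileTransport(Driver):
    """Transport driver: terminates the stack by writing to a local file."""
    def open(self, target):
        self.fh = open(target, "wb")
    def write(self, data):
        self.fh.write(data)
    def close(self):
        self.fh.close()

# Build a stack: application -> base64 transform -> file transport.
# Swapping FileTransport for a TCP, UDP, or GridFTP transport driver would
# not require any change to the three application-level calls below.
stack = Base64Transform(FileTransport())
stack.open("xio_demo.out")
stack.write(b"hello, driver stack")
stack.close()
```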

6 Globus XIO Framework

- Moves the data from the user to the driver stack
- Manages the interactions between drivers
- Assists in the creation of drivers (internal API for passing operations down the stack)
[Diagram: User API -> Framework -> Driver Stack (transform and transport drivers)]

7 GridFTP

- A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
- GridFTP protocol:
  - Defined through the Global/Open Grid Forum
  - Multiple independent implementations can interoperate
- The Globus Toolkit supplies a reference implementation:
  - Server
  - Client tools (globus-url-copy)
  - Development libraries

8 GridFTP Protocol

- FTP protocol is defined by several IETF RFCs
- Start with the most commonly used subset
  - Standard FTP: get/put operations, 3rd-party transfer
- Implement standard but often unused features
  - GSS binding, extended directory listing, simple restart
- Extend in various ways, while preserving interoperability with existing servers
  - Parallel data channels
  - Striped transfers
  - Partial file transfers
  - Automatic & manual TCP buffer setting
  - Progress monitoring
  - Extended restart
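
One reason the protocol bothers with automatic and manual TCP buffer setting: a single TCP stream can only fill a path whose bandwidth-delay product fits in its socket buffer. A rough sizing calculation (the 1 Gbit/s and 60 ms figures are illustrative, not taken from a specific deployment):

```python
def tcp_buffer_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: the minimum socket buffer needed to keep a
    single TCP stream fully utilized on the path."""
    return bandwidth_bits_per_s * rtt_s / 8

# Example: a 1 Gbit/s path with a 60 ms round-trip time needs ~7.2 MiB of
# buffer per stream; with a common 64 KiB default, one stream can reach
# only about 1% of the link rate, which is why GridFTP sets buffers
# explicitly and/or opens parallel data channels.
needed = tcp_buffer_bytes(1e9, 0.060)
print(f"required buffer: {needed / 2**20:.1f} MiB")
print(f"share of link achievable with 64 KiB: {64 * 2**10 / needed:.1%}")
```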

9 GridFTP Protocol (cont.)

- Existing standards:
  - RFC 959: File Transfer Protocol
  - RFC 2228: FTP Security Extensions
  - RFC 2389: Feature Negotiation for the File Transfer Protocol
  - Draft: FTP Extensions
  - GridFTP: Protocol Extensions to FTP for the Grid (Grid Forum Recommendation, GFD.20, http://www.ggf.org/documents/GWD-R/GFD-R.020.pdf)

10 GT4 GridFTP Implementation

- 100% Globus code; no licensing issues
- Striping support is provided in 4.0
- IPv6 support included
- Based on XIO
- Extremely modular to allow integration with a variety of data sources (files, mass stores, etc.)
  - Storage Resource Broker (SRB) DSI: allows use of GridFTP to access data stored in SRB systems
  - High Performance Storage System (HPSS) DSI: provides GridFTP access to hierarchical mass storage systems (HPSS 6.2) that include tape storage

11 GridFTP Server Architecture (www.gridftp.org)

[Architecture diagram:]
- Client interfaces: globus-url-copy, C library, RFT (3rd party)
- GridFTP server: separate control and data channels; striping, fault tolerance, metrics collection, access control
- Data Storage Interfaces (DSI): POSIX, SRB, HPSS, NEST
- XIO drivers: TCP, UDT (UDP), parallel streams, GSI, SSH
- Backends: file systems, I/O, network

12 Control and Data Channels

- GridFTP (and FTP) use (at least) two separate socket connections:
  - A control channel for carrying the commands and responses
  - A data channel for actually moving the data
- Control channel and data channel can optionally be completely separate processes
[Diagram: typical installation (control and data in one process), separate processes, and striped server configurations]

13 Parallel and Striped GridFTP Transfers

- A distributed GridFTP service that typically runs on a storage cluster
  - Every node of the cluster is used to transfer data into/out of the cluster
  - Head node coordinates transfers
- Multiple NICs/internal busses lead to very high performance
  - Maximizes use of Gbit+ WANs
[Diagram: parallel transfer fully utilizes the bandwidth of the network interface on a single node; striped transfer fully utilizes the bandwidth of a Gb+ WAN using multiple nodes backed by a parallel filesystem]

14 Striped Server Mode

- Multiple nodes work together *on a single file* and act as a single GridFTP server
- An underlying parallel file system allows all nodes to see the same file system and must deliver good performance (usually the limiting factor in transfer speed)
  - I.e., NFS does not cut it
- Each node then moves (reads or writes) only the pieces of the file that it is responsible for, as sketched below
- This allows multiple levels of parallelism: CPU, bus, NIC, disk, etc.
  - Critical if you want to achieve better than 1 Gb/s without breaking the bank
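
A minimal sketch of the "each node moves only its own pieces" idea, assuming a simple round-robin (block-cyclic) layout; the block size and layout are illustrative, not the actual GridFTP striping parameters:

```python
def stripe_ranges(file_size, node_index, num_nodes, block_size=1 << 20):
    """Yield the (offset, length) byte ranges owned by one node under a
    simple round-robin (block-cyclic) layout."""
    offset = node_index * block_size
    while offset < file_size:
        yield offset, min(block_size, file_size - offset)
        offset += num_nodes * block_size

# Node 2 of a 4-node striped server moving a 10 MiB file reads/writes
# only blocks 2, 6, ... of the file; the other nodes cover the rest,
# so CPUs, NICs, and disks all work in parallel.
for offset, length in stripe_ranges(10 * 2**20, node_index=2, num_nodes=4):
    print(f"node 2 handles {length} bytes at offset {offset}")
```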

15 TeraGrid Striping Results

- Ran a varying number of stripes
- Ran both memory-to-memory and disk-to-disk tests
- Memory-to-memory gave extremely good (nearly 1:1) linear scalability
- Achieved 27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes
- Disk-to-disk was limited by the storage system, but still achieved 17.5 Gb/s

16 Memory-to-Memory Transfers over a 30 Gb/s Network (San Diego to Urbana)
[Chart: throughput (up to 30 Gb/s) vs. degree of striping]

17 Disk-to-Disk Transfers over a 30 Gb/s Network (San Diego to Urbana)
[Chart: throughput (up to 20 Gb/s) vs. degree of striping]

18 Scalability Results

19 Lots Of Small Files (LOSF)

- Pipelining
  - Many transfer requests outstanding at once
  - Client sends the second request before the first completes
  - Latency of the request is hidden in data transfer time (see the sketch after this list)
- Cached data channel connections
  - Reuse established data channels (Mode E)
  - No additional TCP or GSI connect overhead
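
A back-of-the-envelope model of why pipelining matters for small files: without it, every file pays a full round trip before any of its bytes move, and that round trip dominates once files are small. The 60 ms RTT and 1 Gbit/s link mirror the experiment on the next slide; the model itself is a deliberate simplification:

```python
def effective_rate_mbit(file_bytes, rtt_s=0.060, link_bit_s=1e9, pipelined=False):
    """Achievable per-file rate when each file costs its serialization time
    on the link plus, unless requests are pipelined, one request round trip."""
    transfer_s = file_bytes * 8 / link_bit_s
    per_file_s = transfer_s if pipelined else rtt_s + transfer_s
    return file_bytes * 8 / per_file_s / 1e6

for kb in (16, 128, 1024):
    size = kb * 1024
    print(f"{kb:5d} KB files: {effective_rate_mbit(size):7.1f} Mbit/s sequential, "
          f"{effective_rate_mbit(size, pipelined=True):7.1f} Mbit/s pipelined")
```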

20 "Lots of Small Files" (LOSF) Optimization
[Chart: achieved throughput (Mbit/s) vs. file size (KB) and number of files, sending 1 GB partitioned into equal-sized files over a 60 ms RTT, 1 Gbit/s WAN with a 16 MB TCP buffer]
John Bresnahan et al., Argonne

21 Reliable File Transfer Service (RFT)

- Service that accepts requests for third-party file transfers
- Maintains state in a database about ongoing transfers
- Recovers from RFT service failures
- Increased reliability because state is stored in a database
- Service interface
  - The client can submit the transfer request and then disconnect and go away
  - Similar to a job scheduler for transfer jobs
- Two ways to check status
  - Subscribe for notifications
  - Poll for status (can check for missed notifications)

22 Reliable File Transfer (cont.)

- RFT accepts a SOAP description of the desired transfer
- It writes this to a database
- It then uses the Java GridFTP client library to initiate 3rd-party transfers on behalf of the requestor
- Restart markers are stored in the database to allow for restart in the event of an RFT failure (see the sketch below)
- Supports concurrency, i.e., multiple files in transit at the same time
  - This gives good performance on many small files
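
A sketch of the restart-marker idea: persist the byte ranges known to have arrived and, after a failure, request only the gaps. The SQLite table and helper functions here are illustrative stand-ins, not RFT's actual schema or API:

```python
import sqlite3

def save_marker(db, transfer_id, offset, length):
    """Record a contiguous byte range that is known to have been received."""
    db.execute("INSERT INTO markers VALUES (?, ?, ?)", (transfer_id, offset, length))
    db.commit()

def missing_ranges(db, transfer_id, file_size):
    """After a restart, compute the byte ranges that still need transferring."""
    rows = db.execute(
        "SELECT offset, length FROM markers WHERE id = ? ORDER BY offset",
        (transfer_id,)).fetchall()
    gaps, cursor = [], 0
    for offset, length in rows:
        if offset > cursor:
            gaps.append((cursor, offset - cursor))
        cursor = max(cursor, offset + length)
    if cursor < file_size:
        gaps.append((cursor, file_size - cursor))
    return gaps

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE markers (id TEXT, offset INTEGER, length INTEGER)")
save_marker(db, "xfer-1", 0, 4096)          # first 4 KiB arrived
save_marker(db, "xfer-1", 8192, 4096)       # a later 4 KiB block arrived
print(missing_ranges(db, "xfer-1", 16384))  # [(4096, 4096), (12288, 4096)]
```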

23 Reliable File Transfer: Third-Party Transfer

- Fire-and-forget transfer
- Web services interface
- Many files & directories
- Integrated failure recovery
- Has transferred 900K files
[Diagram: the RFT client sends SOAP messages to the RFT service and optionally receives notifications; the RFT service drives two GridFTP servers (protocol interpreters, master/slave DSIs, IPC links) over their control channels while data flows directly between their data channels]

24 Data Transfer Comparison
[Diagram: a globus-url-copy client managing GridFTP control and data channels directly, compared with an RFT client that only exchanges SOAP messages (and optional notifications) with the RFT service, which manages the control channels on its behalf]

25 Globus Replica Location Service

26 Replica Management in Grids

- Data-intensive applications produce terabytes or petabytes of data, stored as millions of data objects
- Replicate data at multiple locations for reasons of:
  - Fault tolerance: avoid single points of failure
  - Performance: avoid wide-area data transfer latencies; achieve load balancing
- Need tools for:
  - Registering the existence of data items and discovering them
  - Replicating data items to new locations

27 A Replica Location Service

- A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
  - Must perform and scale well: support hundreds of millions of objects and hundreds of clients
- E.g., the LIGO (Laser Interferometer Gravitational Wave Observatory) project
  - RLS servers at 8 sites
  - Maintain associations between 3 million logical file names and 30 million physical file locations
- RLS is one component of a replica management system
  - Other components include consistency services, replica selection services, reliable data transfer, etc.

28 A Replica Location Service

- A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows discovery of replicas
- RLS maintains mappings between logical identifiers and target names
- An RLS framework was designed in a collaboration between the Globus project and the DataGrid project (SC2002 paper)

29 RLS Framework
[Diagram: a layer of Replica Location Index (RLI) nodes above a layer of Local Replica Catalogs (LRCs)]

- Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
- Replica Location Index (RLI) nodes aggregate information about one or more LRCs
- LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of the index
- Optional compression of state updates reduces communication, CPU, and storage overheads
- A membership service registers participating LRCs and RLIs and deals with changes in membership
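
A toy model of the two-level registry (it assumes nothing about the real RLS API): LRCs hold exact logical-to-target mappings, while the RLI only records which LRCs claim to know a given logical name:

```python
from collections import defaultdict

class LocalReplicaCatalog:
    """Holds exact logical-name -> target (physical location) mappings for one site."""
    def __init__(self, name):
        self.name = name
        self.mappings = defaultdict(set)
    def add(self, lfn, pfn):
        self.mappings[lfn].add(pfn)
    def soft_state(self):
        # Uncompressed update: simply the full list of logical names held here.
        return set(self.mappings)

class ReplicaLocationIndex:
    """Aggregates soft-state updates: maps logical names to the LRCs that
    claim to hold them, not to the physical locations themselves."""
    def __init__(self):
        self.index = defaultdict(set)
    def update(self, lrc):
        for lfn in lrc.soft_state():
            self.index[lfn].add(lrc.name)
    def lookup(self, lfn):
        return self.index.get(lfn, set())

lrc_a = LocalReplicaCatalog("site-a")
lrc_a.add("run42.gwf", "gsiftp://site-a.example.org/data/run42.gwf")
rli = ReplicaLocationIndex()
rli.update(lrc_a)
print(rli.lookup("run42.gwf"))   # {'site-a'}: now ask site-a's LRC for the URL
```

Resolving a replica is thus two steps: the RLI names the catalogs worth asking, and the chosen LRC returns the actual target URLs.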

30 Replica Location Service in Context

- The Replica Location Service is one component in a layered data management architecture
- Provides a simple, distributed registry of mappings
- Consistency management is provided by higher-level services

31 Components of the RLS Implementation

- Common server implementation for LRC and RLI
- Front-end server
  - Multi-threaded, written in C
  - Supports GSI authentication using X.509 certificates
- Back-end server
  - MySQL, PostgreSQL, or Oracle relational database
  - Embedded SQLite DB
- Client APIs: C, Java, Python
- Client command-line tool

32 RLS Implementation Features

- Two types of soft state updates from LRCs to RLIs
  - Complete list of logical names registered in the LRC
  - Compressed updates: Bloom filter summaries of the LRC
- Immediate mode
  - Incremental updates
- User-defined attributes
  - May be associated with logical or target names
- Partitioning (without Bloom filters)
  - Divide LRC soft state updates among RLI index nodes using pattern matching of logical names
- Currently, static membership configuration only
  - No membership service

33 Alternatives for Soft State Update Configuration

- LFN list
  - Send the list of logical names stored on the LRC
  - Can do exact and wildcard searches on the RLI
  - Soft state updates get increasingly expensive as the number of LRC entries increases: space, network transfer time, CPU time on the RLI
  - E.g., with 1 million entries, it takes 20 minutes to update MySQL on a dual-processor 2 GHz machine (CPU-limited)
- Bloom filters
  - Construct a summary of LRC state by hashing logical names, creating a bitmap
  - Compression
  - Updates are much smaller and faster
  - Supports a higher query rate
  - Small probability of false positives (lossy compression)
  - Lose the ability to do wildcard queries
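
A minimal Bloom-filter summary in the spirit described above. The default of roughly 10 bits per logical name echoes the sizes in the table on slide 42, but the bitmap size, number of hash functions, and example names here are still just illustrative choices, not the parameters RLS actually uses:

```python
import hashlib

class BloomFilter:
    """Lossy summary of a set of logical names: membership tests may return
    false positives but never false negatives, and wildcards are impossible."""
    def __init__(self, num_bits=10_000_000, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, name):
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{name}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, name):
        # False means definitely absent; True may (rarely) be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(name))

summary = BloomFilter()
summary.add("lfn-000001")
print(summary.might_contain("lfn-000001"))   # True
print(summary.might_contain("lfn-999999"))   # almost certainly False
```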

34 Immediate Mode for Soft State Updates

- Immediate mode
  - Send updates after 30 seconds (configurable) or after a fixed number (100 by default) of updates
  - Full updates are sent at a reduced rate
  - The tradeoff depends on the volatility of the data / frequency of updates
  - Immediate mode updates the RLI quickly, reducing the period of inconsistency between LRC and RLI content
- Immediate mode usually sends less data
  - Because of less frequent full updates
- Usually advantageous
  - An exception would be the initial loading of a large database
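
A sketch of the immediate-mode trigger logic: buffer changes and flush an incremental update when either a time threshold or a count threshold is crossed. The 30-second and 100-update thresholds mirror the defaults above; the send callable is a stand-in for the real update mechanism:

```python
import time

class ImmediateModeSender:
    """Buffer catalog changes and flush an incremental update when either
    threshold (age of the oldest change, or number of changes) is crossed."""
    def __init__(self, send, max_age_s=30.0, max_updates=100):
        self.send = send                    # stand-in for the LRC -> RLI update
        self.max_age_s = max_age_s
        self.max_updates = max_updates
        self.pending = []
        self.oldest = None

    def record_change(self, lfn):
        self.pending.append(lfn)
        if self.oldest is None:
            self.oldest = time.monotonic()
        self.maybe_flush()

    def maybe_flush(self):
        if not self.pending:
            return
        too_old = time.monotonic() - self.oldest >= self.max_age_s
        too_many = len(self.pending) >= self.max_updates
        if too_old or too_many:
            self.send(list(self.pending))   # ship only the changed names
            self.pending, self.oldest = [], None

sender = ImmediateModeSender(send=lambda batch: print(f"incremental update: {len(batch)} names"))
for i in range(250):
    sender.record_change(f"lfn-{i:06d}")    # flushes at 100 and 200 changes
```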

35 Performance Testing

- Extensive performance testing reported in the HPDC 2004 paper
- Performance of individual LRC (catalog) or RLI (index) servers
  - Client program submits operation requests to the server
- Performance of soft state updates
  - Client LRC catalogs send updates to index servers
- Software versions:
  - Replica Location Service Version 2.0.9
  - Globus Packaging Toolkit Version 2.2.5
  - libiODBC library Version 3.0.5
  - MySQL database Version 4.0.14
  - MyODBC library (with MySQL) Version 3.51.06

36 Testing Environment

- Local area network tests
  - 100 Megabit Ethernet
  - Clients (either client program or LRCs) on a cluster: dual Pentium III 547 MHz workstations with 1.5 GB of memory running Red Hat Linux 9
  - Server: dual Intel Xeon 2.2 GHz processors with 1 GB of memory running Red Hat Linux 7.3
- Wide area network tests (soft state updates)
  - LRC clients (Los Angeles): cluster nodes
  - RLI server (Chicago): dual Intel Xeon 2.2 GHz machine with 2 GB of memory running Red Hat Linux 7.3

37 LRC Operation Rates (MySQL Back End)
[Chart: operation rates with up to 100 total requesting threads, clients and server on the LAN]
- Query: request the target of a logical name
- Add: register a new mapping
- Delete: remove a mapping

38 Comparison of LRC to Native MySQL Performance
[Chart: LRC operation rates relative to native MySQL]
- LRC overheads are highest for queries: the LRC achieves 70-80% of native rates
- Adds and deletes: ~90% of native performance for 1 client (10 threads)
- Similar or better add and delete performance with 10 clients (100 threads)

39 Bulk Operation Performance

- For user convenience, the server supports bulk operations
- E.g., 1000 operations per request
- Combine adds/deletes to maintain an approximately constant DB size
- For a small number of clients, bulk operations increase rates
- E.g., 1 client (10 threads) performs 27% more queries, 7% more adds/deletes

40 Uncompressed Soft State Updates

- Perform poorly when multiple LRCs update an RLI
- E.g., with 6 LRCs of 1 million entries updating an RLI, the average update takes ~5102 seconds in the local area
- Limiting factor: rate of updates to the RLI database
- Advisable to use incremental updates

41 Bloom Filter Compression

- Construct a summary of each LRC's state by hashing logical names, creating a bitmap
- The RLI stores in memory one bitmap per LRC
- Advantages:
  - Updates are much smaller and faster
  - Supports a higher query rate (satisfied from memory rather than the database)
- Disadvantages:
  - Lose the ability to do wildcard queries, since logical names are not sent to the RLI
  - Small probability of false positives (configurable): relaxed consistency model

42 Bloom Filter Performance: Single Wide Area Soft State Update (Los Angeles to Chicago)

LRC Database Size  | Avg. time to send soft state update (sec) | Avg. time for initial Bloom filter computation (sec) | Size of Bloom filter (bits)
100,000 entries    | Less than 1                               | 2                                                    | 1 million
1 million entries  | 1.67                                      | 18.4                                                 | 10 million
5 million entries  | 6.8                                       | 91.6                                                 | 50 million

43 Scalability of Bloom Filter Updates

- 14 LRCs with 5 million mappings each send Bloom filter updates continuously in the wide area (unlikely in practice; represents a worst case)
- Update times increase when 8 or more clients send updates
- 2 to 3 orders of magnitude better performance than uncompressed updates (e.g., 5102 seconds with 6 LRCs)

44 Bloom Filter Compression Supports Higher RLI Query Rates

- Uncompressed updates: about 3000 queries per second
- Higher rates with Bloom filter compression
- Scalability limit: significant overhead to check 100 bitmaps
- Practical deployments: fewer than 10 LRCs updating an RLI

45 RLS Performance Summary

- Individual RLS servers perform well and scale up to
  - Millions of entries
  - One hundred requesting threads
- Soft state updates of the distributed index scale well when using Bloom filter compression
- Uncompressed updates slow as the size of the catalog grows
  - Immediate mode is advisable

46 Current Work

- Ongoing maintenance and improvements to RLS
  - RLS is a stable component
  - Good performance and scalability
  - No major changes to existing interfaces
- Recently added features:
  - WS-RLS: a WS-RF compatible web services interface to the existing RLS service
  - Embedded SQLite database for easier RLS deployment
  - Pure Java client implementation completed

47 Wide Area Data Replication for Scientific Collaborations
Ann Chervenak, Robert Schuler, Carl Kesselman (USC Information Sciences Institute); Scott Koranda (Univa Corporation); Brian Moe (University of Wisconsin-Milwaukee)

48 Motivation

- Scientific application domains spend considerable effort managing large amounts of experimental and simulation data
- They have developed customized, higher-level Grid data management services
- Examples:
  - Laser Interferometer Gravitational Wave Observatory (LIGO): Lightweight Data Replicator system
  - High energy physics projects: EGEE system, gLite, LHC Computing Grid (LCG) middleware
  - Portal-based coordination of services (e.g., Earth System Grid)

49 Motivation (cont.)

- Data management functionality varies by application
- These systems share several requirements:
  - Publish and replicate large datasets (millions of files)
  - Register data replicas in catalogs and discover them
  - Perform metadata-based discovery of datasets
  - May require the ability to validate correctness of replicas
  - In general, data updates and replica consistency services are not required (i.e., read-only accesses)
- The systems provide production data management services to individual scientific domains
- Each project spends considerable resources to design, implement & maintain its data management system
  - Typically cannot be re-used by other applications

50 Motivation (cont.)

- Long-term goals:
  - Generalize the functionality provided by these data management systems
  - Provide a suite of application-independent services
- The paper describes one higher-level data management service: the Data Replication Service (DRS)
- DRS functionality is based on the publication capability of the LIGO Lightweight Data Replicator (LDR) system
- DRS ensures that a set of files exists on a storage site
  - Replicates files as needed and registers them in catalogs
- DRS builds on lower-level Grid services, including:
  - Globus Reliable File Transfer (RFT) service
  - Replica Location Service (RLS)

51 Outline

- Description of the LDR data publication capability
- Generalization of this functionality
  - Define characteristics of an application-independent Data Replication Service (DRS)
- DRS design
- DRS implementation in the GT4 environment
- Evaluation of DRS performance in a wide area Grid
- Related work
- Future work

52 A Data-Intensive Application Example: The LIGO Project

- Laser Interferometer Gravitational Wave Observatory (LIGO) collaboration
- Seeks to measure gravitational waves predicted by Einstein
- Collects experimental datasets at two LIGO instrument sites in Louisiana and Washington State
- Datasets are replicated at other LIGO sites
- Scientists analyze the data and publish their results, which may be replicated
- Currently LIGO stores more than 40 million files across ten locations

53 The Lightweight Data Replicator

- LIGO scientists developed the Lightweight Data Replicator (LDR) system for data management
- Built on top of standard Grid data services:
  - Globus Replica Location Service
  - GridFTP data transport protocol
- LDR provides a rich set of data management functionality, including:
  - A pull-based model for replicating necessary files to a LIGO site
  - Efficient data transfer among LIGO sites
  - A distributed metadata service architecture
  - An interface to local storage systems
  - A validation component that verifies that files on a storage system are correctly registered in a local RLS catalog

54 LIGO Data Publication and Replication

Two types of data publishing:
1. Detectors at Livingston and Hanford produce data sets
  - Approx. a terabyte per day during LIGO experimental runs
  - Each detector produces a file every 16 seconds
  - Files range in size from 1 to 100 megabytes
  - Data sets are copied to the main repository at Caltech, which stores them in a tape-based mass storage system
  - LIGO sites can acquire copies from Caltech or from one another
2. Scientists also publish new or derived data sets as they perform analysis on existing data sets
  - E.g., data filtering or calibration may create new files
  - These new files may also be replicated at LIGO sites

55 Some Terminology

- A logical file name (LFN) is a unique identifier for the contents of a file
  - Typically, a scientific collaboration defines and manages the logical namespace
  - This guarantees uniqueness of logical names within that organization
- A physical file name (PFN) is the location of a copy of the file on a storage system
  - The physical namespace is managed by the file system or storage system
- The LIGO environment currently contains:
  - More than 25 million unique logical files
  - More than 145 million physical files stored at ten sites

56 Components at Each LDR Site

- Local storage system
- GridFTP server for file transfer
- Metadata catalog: associations between logical file names and metadata attributes
- Replica Location Service:
  - Local Replica Catalog (LRC) stores mappings from logical names to storage locations
  - Replica Location Index (RLI) collects state summaries from LRCs
- Scheduler and transfer daemons
- Prioritized queue of requested files

57 LDR Data Publishing

- A scheduling daemon runs at each LDR site
  - Queries the site's metadata catalog to identify logical files with specified metadata attributes
  - Checks the RLS Local Replica Catalog to determine whether copies of those files already exist locally
  - If not, puts the logical file names on a priority-based scheduling queue
- A transfer daemon also runs at each site
  - Checks the queue and initiates data transfers in priority order
  - Queries the RLS Replica Location Index to find sites where the desired files exist
  - Randomly selects a source file from among the available replicas
  - Uses the GridFTP transport protocol to transfer the file to the local site
  - Registers the newly copied file in the RLS Local Replica Catalog
(A schematic sketch of this loop follows.)
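
Taken together, the two daemons amount to a query, compare, fetch, and register loop. A schematic, runnable rendering with toy in-memory catalogs; the file names, URLs, and helper callables are made up for illustration and are not LDR or Globus APIs:

```python
import random

def publish_pass(wanted, local_catalog, replica_index, copy_file, local_url_for):
    """One pass of the LDR-style loop: for each desired logical file not yet
    held locally, pick a random remote replica, copy it, and register it."""
    for lfn in wanted:
        if lfn in local_catalog:
            continue                              # already replicated here
        sources = replica_index.get(lfn, [])
        if not sources:
            continue                              # no known replica to pull yet
        source = random.choice(sources)           # random source selection
        destination = local_url_for(lfn)
        copy_file(source, destination)            # stand-in for GridFTP transfer
        local_catalog.setdefault(lfn, []).append(destination)

# Toy stand-ins for the metadata query result, local LRC, and RLI lookup.
wanted = ["H-R-000001.gwf", "H-R-000002.gwf"]
local_catalog = {"H-R-000001.gwf": ["file:///data/H-R-000001.gwf"]}
replica_index = {"H-R-000002.gwf": ["gsiftp://repository.example.org/data/H-R-000002.gwf"]}

publish_pass(wanted, local_catalog, replica_index,
             copy_file=lambda src, dst: print(f"copy {src} -> {dst}"),
             local_url_for=lambda lfn: f"file:///data/{lfn}")
print(sorted(local_catalog))    # both files now registered locally
```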

58 Generalizing the LDR Publication Scheme

- Want to provide a similar capability that is
  - Independent of LIGO infrastructure
  - Useful for a variety of application domains
- Capabilities include:
  - Interface to specify which files are required at the local site
  - Use of Globus RLS to discover whether replicas exist locally and where they exist in the Grid
  - Use of a selection algorithm to choose among available replicas
  - Use of the Globus Reliable File Transfer service and GridFTP data transport protocol to copy data to the local site
  - Use of Globus RLS to register new replicas

59 Relationship to Other Globus Services

At the requesting site, deploy:
- WS-RF services
  - Data Replication Service
  - Delegation Service
  - Reliable File Transfer Service
- Pre-WS-RF components
  - Replica Location Service (Local Replica Catalog, Replica Location Index)
  - GridFTP server

60 DRS Functionality

- Initiate a DRS request
- Create a delegated credential
- Create a Replicator resource
- Monitor the Replicator resource
- Discover replicas of desired files in RLS, select among replicas
- Transfer data to the local site with the Reliable File Transfer Service
- Register new replicas in RLS catalogs
- Allow client inspection of DRS results
- Destroy the Replicator resource

DRS is implemented in Globus Toolkit Version 4 and complies with the Web Services Resource Framework (WS-RF).
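
The slides that follow walk a Replicator resource through its stages. One compact way to read them is as a small state machine over the "Stage" and "Status" resource properties; the sketch below borrows the stage names from those slides but uses a placeholder callable in place of the real per-stage work, so it is only an illustration of the lifecycle, not the GT4 client API:

```python
def run_replicator(request, do_stage):
    """Drive a replication request through the DRS stages in order and
    return per-file state for client inspection."""
    results = {lfn: [] for lfn in request}
    for stage in ("discover", "transfer", "register"):
        for lfn in request:
            do_stage(stage, lfn)                 # placeholder for the real work
            results[lfn].append(stage)
    return {"Status": "Finished", "results": results}

report = run_replicator(["fileA", "fileB"],
                        do_stage=lambda stage, lfn: print(f"{stage}: {lfn}"))
print(report["Status"], report["results"])
```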

61 WSRF in a Nutshell

- Service
- State management: Resource, Resource Property
- State identification: Endpoint Reference (EPR)
- State interfaces: GetRP, QueryRPs, GetMultipleRPs, SetRP
- Lifetime interfaces: SetTerminationTime, ImmediateDestruction
- Notification interfaces: Subscribe, Notify
- ServiceGroups
[Diagram: a client uses an EPR to reach a Service whose Resource carries Resource Properties, invoking GetRP, GetMultRPs, SetRP, QueryRPs, Subscribe, SetTermTime, and Destroy]

62 Create Delegated Credential
[Diagram (used on slides 62-69): a client interacting with a GT4 service container that holds the Delegation, Data Replication, and RFT services, plus RLS Replica Index/Catalogs, GridFTP servers, and MDS]
- Initialize the user proxy certificate
- Create a delegated credential resource in the Delegation service and set its termination time
- The credential EPR is returned to the client

63 Create Replicator Resource
- Client creates a Replicator resource in the Data Replication service, passing the delegated credential EPR and setting the termination time
- The Replicator EPR is returned to the client
- The Replicator resource accesses the delegated credential resource

64 Monitor Replicator Resource
- Client periodically polls the Replicator resource properties via GetRP or GetMultRP
- The Replicator resource is added to the MDS Information service Index
- The Index subscribes to ResourceProperty changes for the "Status" and "Stage" RPs
- Conditions may trigger alerts or other actions (Trigger service not pictured)

65 Query Replica Information
- Notification that the "Stage" RP value has changed to "discover"
- The Replicator queries the RLS Replica Index to find catalogs that contain the desired replica information
- The Replicator queries the RLS Replica Catalog(s) to retrieve mappings from logical names to target names (URLs)

66 Transfer Data
- Notification that the "Stage" RP value has changed to "transfer"
- The Replicator creates a Transfer resource in RFT, passing the credential EPR and setting the termination time; the Transfer resource EPR is returned
- RFT accesses the delegated credential resource and sets up the GridFTP server transfer of the file(s)
- Data is transferred between the GridFTP server sites
- The Replicator periodically polls the "ResultStatus" RP via GetRP; when it is "Done", it gets state information for each file transfer

67 Register Replica Information
- Notification that the "Stage" RP value has changed to "register"
- The Replicator registers the new file mappings in the RLS Replica Catalog
- The RLS Replica Catalog sends an update of the new replica mappings to the Replica Index

68 Client Inspection of State
- Notification that the "Status" RP value has changed to "Finished"
- The client inspects the Replicator state information for each replication in the request

69 Resource Termination
- The termination time (set by the client) eventually expires
- Resources are destroyed (Credential, Transfer, Replicator)

70 Performance Measurements: Wide Area Testing

- The destination for the pull-based transfers is located in Los Angeles
  - Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GB of memory and 1 Gbit Ethernet
  - Runs a GT4 container and deploys services including RFT and DRS, as well as GridFTP and RLS
- The remote site where the desired data files are stored is located at Argonne National Laboratory in Illinois
  - Dual-processor, 3 GHz Intel Xeon workstation with 2 GB of memory and 1.1 TB of disk
  - Runs a GT4 container as well as GridFTP and RLS services

71 DRS Operations Measured

- Create the DRS Replicator resource
- Discover source files for replication using the local RLS Replica Location Index and remote RLS Local Replica Catalogs
- Initiate a Reliable File Transfer operation by creating an RFT resource
- Perform the RFT data transfer(s)
- Register the new replicas in the RLS Local Replica Catalog

72 Experiment 1: Replicate 10 Files of Size 1 Gigabyte

Component of Operation      | Time (milliseconds)
Create Replicator Resource  | 317.0
Discover Files in RLS       | 449.0
Create RFT Resource         | 808.6
Transfer Using RFT          | 1,186,796.0
Register Replicas in RLS    | 3,720.8

- Data transfer time dominates
- Wide area data transfer rate of 67.4 Mbit/s

73 Experiment 2: Replicate 1000 Files of Size 10 Megabytes

Component of Operation      | Time (milliseconds)
Create Replicator Resource  | 1,561.0
Discover Files in RLS       | 9.8
Create RFT Resource         | 1,286.6
Transfer Using RFT          | 963,456.0
Register Replicas in RLS    | 11,278.2

- Time to create the Replicator and RFT resources is larger
  - Need to store state for 1000 outstanding transfers
- Data transfer time still dominates
- Wide area data transfer rate of 85 Mbit/s
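
A quick arithmetic check of the quoted rates, dividing the total bytes moved by the "Transfer Using RFT" time from each table; the second figure lands within a few Mbit/s of the quoted 85, depending on rounding and whether megabytes are taken as decimal or binary:

```python
def implied_rate_mbit(num_files, file_bytes, transfer_ms):
    """End-to-end rate implied by the 'Transfer Using RFT' row of a table."""
    return num_files * file_bytes * 8 / (transfer_ms / 1000) / 1e6

print(implied_rate_mbit(10, 10**9, 1_186_796.0))       # ~67.4 Mbit/s (Experiment 1)
print(implied_rate_mbit(1000, 10 * 10**6, 963_456.0))  # ~83 Mbit/s  (Experiment 2)
```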

