A Replica Location Service
The Globus Project
USC Information Sciences Institute
Argonne National Laboratory

Motivation
- In a Data Grid, it may be desirable to create remote, read-only copies (replicas) of storage elements (files):
  - to reduce the latency of data accesses
  - to increase robustness
- A mechanism is needed for locating these replicas.
- Replica Location Problem: given a unique logical identifier for data content, determine the physical locations of one or more copies of that content.
- Replica Location Service (RLS): a Data Grid component that maintains and provides access to information about the physical locations of copies.

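To make the logical-to-physical mapping concrete, the sketch below shows a toy catalog keyed by logical file name (LFN). The class and method names (ReplicaCatalog, register, lookup) and the example URLs are illustrative only, not the actual RLS API.

```python
from collections import defaultdict

class ReplicaCatalog:
    """Toy catalog mapping logical file names (LFNs) to physical file names (PFNs)."""

    def __init__(self):
        self._mappings = defaultdict(set)   # LFN -> set of PFNs

    def register(self, lfn, pfn):
        """Record that a physical copy of `lfn` exists at `pfn`."""
        self._mappings[lfn].add(pfn)

    def lookup(self, lfn):
        """Return all known physical locations of `lfn` (may be empty)."""
        return sorted(self._mappings[lfn])

# Hypothetical LFN and PFNs, for illustration only.
catalog = ReplicaCatalog()
catalog.register("lfn://cms/higgs-run42.dat", "gsiftp://site-a.example.org/data/higgs-run42.dat")
catalog.register("lfn://cms/higgs-run42.dat", "gsiftp://site-b.example.org/store/higgs-run42.dat")
print(catalog.lookup("lfn://cms/higgs-run42.dat"))
```
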
A Replica Location Service Framework
- Applications may operate at different scales, have different resources, and have different tolerances for inconsistent RLS information.
- We therefore define a flexible RLS framework that allows users to make tradeoffs among:
  - consistency
  - space overhead
  - reliability
  - update costs
  - query costs
- By combining five essential elements in different ways, the framework supports a variety of RLS designs.

RLS Requirements
- Support read-only files.
  - Mutable files require greater consistency and must use a separate mechanism.
- Scale of a Data Grid (e.g., High Energy Physics):
  - 200 replica sites
  - 50 million logical files total
  - 500 million physical files (replicas) total
  - 20 million physical files at a replica site

RLS Requirements (cont.)
- Data Grid performance (e.g., High Energy Physics):
  - Average query response time: 10 milliseconds
  - Maximum query response time: 5 seconds
  - Maximum query rate: 10 to 100 per second
  - Maximum update/insertion rate: 5 to 20 per second

RLS Requirements (cont.)
- Security issues:
  - Authorization: verify that users are allowed to perform the requested operations.
  - Privacy: knowledge of the existence, location, and content of data must be controlled.
  - Integrity: prevent an adversary from tampering with replica location results returned by RLS queries.
- The RLS protects information about the existence and location of data; individual storage systems protect the privacy and integrity of the data contents.

RLS Requirements (cont.)
- Consistency:
  - Relaxed consistency: the RLS is not required to maintain strict consistency.
  - Strict consistency would require that the RLS always return a complete and accurate list of copies of the specified content.
  - This is difficult or impossible to achieve in a Grid, where local sites may delete replicas or become disconnected without warning.

RLS Requirements (cont.)
- Reliability:
  - No single point of failure: no single RLS site, if it fails or becomes inaccessible, can render the entire service inoperable.
  - Decoupling of local and global state: failure or inaccessibility of remote RLS components should not affect local access to local replicas.
  - Checksums

A Flexible RLS Framework
- Five essential elements:
  1. Reliable local state
  2. Unreliable global state
  3. Soft state mechanisms for maintaining global state
  4. Compression of state updates
  5. Membership protocol

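To illustrate the soft state idea, here is a minimal sketch of a global index whose per-site summaries expire unless refreshed, so stale information disappears without explicit deletes. The class, method names, and the 300-second TTL are assumptions for illustration, not part of the framework specification.

```python
import time

class GlobalIndex:
    """Toy global index holding soft state: per-site LFN summaries that
    expire unless the owning site refreshes them periodically."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._site_state = {}   # site name -> (set of LFNs, last update time)

    def soft_state_update(self, site, lfns):
        """Called periodically by each local catalog to refresh its summary."""
        self._site_state[site] = (set(lfns), time.time())

    def sites_with(self, lfn):
        """Return sites whose (unexpired) summaries claim to hold `lfn`."""
        now = time.time()
        return [site for site, (lfns, ts) in self._site_state.items()
                if now - ts <= self.ttl and lfn in lfns]
```

Because the global state is allowed to be unreliable, a lost update merely delays visibility of a replica until the next refresh; it never corrupts the reliable local state.
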
Example 1: A Centralized, Nonredundant Global Index
- All updates are sent to a single centralized GRIN.
- Not scalable: all queries are serviced by a single index.
- Not reliable: single point of failure.

Example 2: An RLS with LFN Partitioning, Redundancy and Bloom Filter Compression
- Updates are sent to specific, redundant GRINs based on the LFN.
- More scalable and more reliable.
- Limited storage and communication costs.

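A Bloom filter summarizes a site's set of LFNs in a compact bit array; an index tests membership against it, accepting a small false-positive rate in exchange for large space savings. The sketch below is illustrative only; the hashing scheme, filter size, and names are assumptions, not those of any particular RLS implementation.

```python
import hashlib

class BloomFilter:
    """Compact, lossy summary of a set of strings (e.g., a site's LFNs)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive `num_hashes` bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# A local catalog would periodically send its filter to the GRINs responsible
# for its LFNs, which answer queries by testing candidate LFNs against it.
site_filter = BloomFilter()
site_filter.add("lfn://cms/higgs-run42.dat")
print(site_filter.might_contain("lfn://cms/higgs-run42.dat"))  # True
print(site_filter.might_contain("lfn://cms/other-file.dat"))   # almost certainly False
```
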
Example 3: An RLS with Redundancy, Compression and Partitioning of Logical Collections
- Collection-level information is sent to the GRINs (a lossy summary).
- Advantage: partitioning can be done intelligently, based on file contents or on creation or access patterns.

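One simple way to assign a logical collection to a set of redundant index nodes is to hash the collection name onto the list of GRINs, as in the sketch below. The function name, hostnames, and hash-based assignment are assumptions for illustration; as the slide notes, a real deployment could instead partition by content, ownership, or access patterns.

```python
import hashlib

def responsible_grins(collection, grins, redundancy=2):
    """Pick `redundancy` index nodes for a logical collection by hashing its
    name onto the list of GRINs (illustrative assignment scheme only)."""
    h = int.from_bytes(hashlib.sha256(collection.encode()).digest()[:8], "big")
    start = h % len(grins)
    return [grins[(start + i) % len(grins)] for i in range(redundancy)]

grins = ["grin1.example.org", "grin2.example.org", "grin3.example.org"]
print(responsible_grins("cms/higgs-2002", grins))
```
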
Example 4: Hierarchical Index with Partitioning, Bloom Compression and Redundancy
- GRINs can also exchange soft state updates with one another.
- This allows a large variety of global index configurations.

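One way to picture the hierarchy: each lower-level GRIN forwards a merged summary of its children's state to a parent GRIN, which can then route queries down toward the sites that might hold a file. A minimal, self-contained sketch follows; the class, method names, and the use of plain sets rather than compressed filters are assumptions for illustration.

```python
class HierarchicalGrin:
    """Toy index node that aggregates summaries from children (sites or other
    GRINs) and pushes a merged summary to an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.child_summaries = {}   # child name -> set of LFNs

    def soft_state_update(self, child, lfns):
        self.child_summaries[child] = set(lfns)
        if self.parent is not None:
            # Forward an aggregated (possibly lossy) summary up the hierarchy.
            merged = set().union(*self.child_summaries.values())
            self.parent.soft_state_update(self.name, merged)

    def children_with(self, lfn):
        return [c for c, lfns in self.child_summaries.items() if lfn in lfns]

root = HierarchicalGrin("root-grin")
regional = HierarchicalGrin("eu-grin", parent=root)
regional.soft_state_update("site-a", ["lfn://cms/higgs-run42.dat"])
print(root.children_with("lfn://cms/higgs-run42.dat"))      # ['eu-grin']
print(regional.children_with("lfn://cms/higgs-run42.dat"))  # ['site-a']
```
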