Replica Consistency in a Data Grid: IX International Workshop on Advanced Computing and Analysis Techniques in Physics Research, December 1-5, 2003

1 Replica Consistency in a Data Grid
IX International Workshop on Advanced Computing and Analysis Techniques in Physics Research, December 1-5, 2003
High Energy Accelerator Research Organization (KEK), 1-1 Oho, Tsukuba, Ibaraki 305-0801, Japan
Replica Consistency Service in a Data Grid
Gianni Pucciani (DIIET/INFN), e-mail: gp.puccio@tin.it
On behalf of the Consistency Group: Andrea Domenici (DIIET/INFN), Flavia Donno (CERN/INFN), Kathrin Paschen (CERN), Heinz Stockinger (CERN), Kurt Stockinger (CERN)

2 Overview
- Data Grid
- The replication feature and the need for a Consistency Service
- Classification of consistency solutions from the literature
- Our approach and design
- Simulations of a Consistency Service using OptorSim

3 The Grid Vision
- Researchers perform their activities regardless of geographical location, interact with colleagues, share and access data
- Scientific instruments and experiments provide huge amounts of data
- The Grid: networked data processing centres and middleware software as the "glue" of resources
- Grid Computing: flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (Virtual Organization) [Foster, Kesselman, Tuecke]

4 The Benefits of Replication
Storing more copies (replicas) of the same data at multiple locations to increase:
- Performance: a single data source can be a bottleneck; data as close to the user as possible
- Availability and fault tolerance: in case of a server or network failure, other replicas of the data item can provide the same information

5 The Costs of Replication
- Data localization: we need mechanisms to correctly identify replicas: a file naming convention and a replica catalog
- Data consistency: if replicas can be modified by users, we need mechanisms to synchronize them, i.e. a consistency service
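To make the data localization point concrete, here is a minimal sketch of a replica catalog that maps one logical file name to the physical locations of its replicas. The class name `ReplicaCatalog`, its methods, and the example names and URLs are hypothetical and only illustrate the idea; they are not the interface of any real Grid service.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog: one logical file name maps to many physical replicas."""

    def __init__(self):
        # logical name -> set of physical locations (hypothetical naming)
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_location):
        """Record that a replica of logical_name exists at physical_location."""
        self._replicas[logical_name].add(physical_location)

    def unregister(self, logical_name, physical_location):
        """Forget one replica; unknown entries are ignored."""
        self._replicas[logical_name].discard(physical_location)

    def lookup(self, logical_name):
        """Return all known replica locations, sorted for stable output."""
        return sorted(self._replicas.get(logical_name, ()))


catalog = ReplicaCatalog()
catalog.register("lfn:run42.dat", "site-a.example.org/data/run42.dat")
catalog.register("lfn:run42.dat", "site-b.example.org/data/run42.dat")
locations = catalog.lookup("lfn:run42.dat")
```

A consistency service would use such a lookup to find every replica that must receive an update.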

6 The Replica Consistency Problem
There is a lot of literature in the fields of DBMS, distributed file systems, distributed applications, directory services…
The Grid is different because of:
- Variety of applications
- Heterogeneous data
- Very high number of files (500 million replicas!)
- Scalability problems (both users and resources)
- Highly dynamic resources

7 Classification of Consistency Solutions
- Eager replication vs lazy replication
- Taxonomy of lazy replication [Y. Saito]:
  - Where can an update be issued? Single-master vs multi-master
  - What is transferred as an update? Content-transfer vs log-transfer
  - Who transfers an update? Pull-based vs push-based
- Consistency guarantees:
  - Eventual consistency
  - View consistency (causal consistency, bounded inconsistency)
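The difference between content-transfer and log-transfer can be sketched in a few lines. The example below assumes a file modelled as a list of records and a log made of two hypothetical operations, `set` and `append`; neither the function names nor the log format come from the presentation.

```python
def content_transfer(replica, new_content):
    """Content-transfer: ship the whole new content and overwrite the replica."""
    return list(new_content)


def log_transfer(replica, log):
    """Log-transfer: ship only the operations and replay them on the replica."""
    result = list(replica)  # leave the caller's list untouched
    for op, *args in log:
        if op == "append":
            result.append(args[0])
        elif op == "set":
            index, value = args
            result[index] = value
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


old = ["a", "b", "c"]
new = ["a", "B", "c", "d"]
log = [("set", 1, "B"), ("append", "d")]

by_content = content_transfer(old, new)
by_log = log_transfer(old, log)
```

Both paths end in the same replica state; the log is smaller when updates touch only a small part of a large file, while content-transfer needs no knowledge of the file format.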

8 General Design Principles
- Provide a high-level user interface
- Provide several consistency models that the user can choose from
- Integrate as much as possible with existing Grid services like the Replica Manager

9 Functionalities Required
- Client interface: provides basic operations invoked by users and services
- File update mechanism
- Update propagation protocol

10 Interaction with the Consistency Service
Write operations on a file require:
- Obtaining a local working copy of a replica
- Modifying the working copy
- Telling the Consistency Service to update the logical file: file update and update propagation
Read operations may not require the consistency service, depending on the protocol used (are stale reads acceptable?)

11 Basic Architecture
- UI as the main entry point
- LCSs interact with each other and with the UI to implement the consistency protocol

12 Simulation of the RCS (1)
OptorSim, the Grid simulator (http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html)

13 Simulation of the RCS (2)
"Synchronous" protocol: write scenario
[Sequence diagram] Participants: User job, RM, RCS, LRCS1, LRCS2
Messages, in order: Get a working copy; Modify it; Update the file; Find all replicas; updateReplica; ok
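The synchronous write scenario can be sketched as follows, assuming that "synchronous" means the user job gets its answer only after every replica has acknowledged the update. The classes `LocalReplicaService` and `SynchronousRCS` are hypothetical stand-ins for the LRCS and RCS, and the list of replicas passed to the RCS stands in for the "find all replicas" lookup through the RM.

```python
class LocalReplicaService:
    """Hypothetical stand-in for one LRCS holding a single replica."""

    def __init__(self, name, content=""):
        self.name = name
        self.content = content
        self.version = 0

    def update_replica(self, content, version):
        """Overwrite the local replica and acknowledge."""
        self.content = content
        self.version = version
        return "ok"


class SynchronousRCS:
    """Hypothetical RCS: an update completes only when all replicas have it."""

    def __init__(self, replicas):
        # Stand-in for asking the RM to find all replicas of the file.
        self.replicas = list(replicas)
        self.version = 0

    def update_file(self, working_copy):
        self.version += 1
        acks = [
            replica.update_replica(working_copy, self.version)
            for replica in self.replicas
        ]
        # Report success to the user job only after every acknowledgement.
        return all(ack == "ok" for ack in acks)


lrcs1 = LocalReplicaService("LRCS1", "v0 data")
lrcs2 = LocalReplicaService("LRCS2", "v0 data")
rcs = SynchronousRCS([lrcs1, lrcs2])

working_copy = lrcs1.content              # get a working copy
working_copy = working_copy + " + edit"   # modify it
done = rcs.update_file(working_copy)      # update the file
```

The price of this scheme is that the writer waits for the slowest replica; in return, no reader can see stale data once the update has returned.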

14 Simulation of the RCS (3)
Asynchronous protocol: write scenario
[Sequence diagram] Participants: User job, RM, RCS, LRCS Master, LRCS slave
Messages, in order: Get a working copy; Modify it; Update the file; Find the master; Update master; Update sec replica; Ok, master updated; Update sec replica
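A minimal sketch of the asynchronous write scenario, assuming the user job is answered as soon as the master replica is updated and secondary replicas are brought up to date later. The names `Replica`, `AsynchronousRCS`, and `propagate`, and the use of an in-memory queue, are illustrative assumptions rather than the actual design.

```python
from collections import deque


class Replica:
    """Hypothetical replica holder (master or secondary)."""

    def __init__(self, name, content=""):
        self.name = name
        self.content = content
        self.version = 0


class AsynchronousRCS:
    """Hypothetical RCS: update the master now, the secondaries later."""

    def __init__(self, master, secondaries):
        self.master = master
        self.secondaries = list(secondaries)
        self.pending = deque()  # updates not yet applied to secondaries

    def update_file(self, working_copy):
        self.master.content = working_copy
        self.master.version += 1
        for secondary in self.secondaries:
            self.pending.append((secondary, working_copy, self.master.version))
        return "ok, master updated"  # the user job does not wait further

    def propagate(self):
        """Apply queued updates; return how many queue entries were handled."""
        handled = 0
        while self.pending:
            secondary, content, version = self.pending.popleft()
            if version > secondary.version:  # never move a replica backwards
                secondary.content = content
                secondary.version = version
            handled += 1
        return handled


master = Replica("LRCS master", "v0 data")
slave = Replica("LRCS slave", "v0 data")
rcs = AsynchronousRCS(master, [slave])

reply = rcs.update_file("v1 data")
stale_content = slave.content   # a read here would still see the old data
handled = rcs.propagate()
fresh_content = slave.content
```

Between `update_file` and `propagate`, a read served by the secondary replica is a stale read, which is the trade-off for the shorter write latency.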

15 Simulation of the RCS (4)
Execution flow of a job
[Flowchart] Select the next file to access, then branch on "r/w?":
- w: Get best file; Write the file; Send update command to the RCS; get result
- r: Get best file; Read the file
Other labels in the diagram: cont., Wait
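The job loop can be approximated with a toy simulation. This is not OptorSim code: the function `run_job`, the version-number model, the random choice standing in for "get best file", and the `propagate_all` switch are all assumptions made for illustration. A read counts as stale when the chosen replica is older than the newest version of the file.

```python
import random


def run_job(latest, replicas, n_accesses, write_prob, rng, propagate_all):
    """Simulate one job.

    latest:   dict file name -> newest version number
    replicas: dict file name -> list of per-replica version numbers
    """
    stats = {"reads": 0, "stale_reads": 0, "writes": 0}
    files = sorted(latest)
    for _ in range(n_accesses):
        name = rng.choice(files)                    # select the next file
        index = rng.randrange(len(replicas[name]))  # stand-in for "get best file"
        if rng.random() < write_prob:               # w branch
            latest[name] += 1
            if propagate_all:
                # every replica is updated before the job continues
                replicas[name] = [latest[name]] * len(replicas[name])
            else:
                # only the written replica is current; the others lag behind
                replicas[name][index] = latest[name]
            stats["writes"] += 1
        else:                                       # r branch
            stats["reads"] += 1
            if replicas[name][index] < latest[name]:
                stats["stale_reads"] += 1
    return stats


def fresh_state():
    """Two files with three replicas each, all at version 0."""
    return {"f1": 0, "f2": 0}, {"f1": [0, 0, 0], "f2": [0, 0, 0]}


latest, replicas = fresh_state()
sync_stats = run_job(latest, replicas, 200, 0.2, random.Random(0), True)

latest, replicas = fresh_state()
lazy_stats = run_job(latest, replicas, 200, 0.2, random.Random(0), False)
```

Under this model, a stale reads rate could be computed as `stale_reads / reads`. With `propagate_all=True` it is zero by construction; with `propagate_all=False` no propagation happens at all, so the value is an upper bound for this toy setup rather than an estimate of any real protocol.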

16 Simulation Results
[Two plots] Conflict rate of update operations; stale reads rate of simple read operations
Avg value: 12.9% - 25.3%
Avg value: 0.9% - 2.8%

17 Conclusions
- Consistency is necessary in applications where users can modify replicas
- The benefits of simulation:
  - Helps to point out some problems
  - Evaluation of possible solutions and their impact on the system
- A real Consistency Service is feasible, although a close interaction with end users and applications is advisable

