Replica Consistency in a Data Grid


IX International Workshop on Advanced Computing and Analysis Techniques in Physics Research
December 1-5, 2003, High Energy Accelerator Research Organization (KEK), 1-1 Oho, Tsukuba, Ibaraki, Japan

Replica Consistency Service in a Data Grid
Gianni Pucciani (DIIET/INFN), on behalf of the Consistency Group: Andrea Domenici (DIIET/INFN), Flavia Donno (CERN/INFN), Kathrin Paschen (CERN), Heinz Stockinger (CERN), Kurt Stockinger (CERN)

Overview
- Data Grid
- The replication feature and the need for a Consistency Service
- Classification of consistency solutions from the literature
- Our approach and design
- Simulations of a Consistency Service using OptorSim

The Grid Vision
- Researchers perform their activities regardless of geographical location: they interact with colleagues and share and access data.
- Scientific instruments and experiments provide huge amounts of data.
- The Grid: networked data processing centres, with middleware software as the “glue” of resources.
- Grid computing: flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (Virtual Organization) [Foster, Kesselman, Tuecke].

The Benefits of Replication
Storing multiple copies (replicas) of the same data at different locations increases:
- Performance: a single data source can be a bottleneck, and replication keeps data as close to the user as possible.
- Availability and fault tolerance: in case of a server or network failure, other replicas of the data item can provide the same information.
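To make the performance and fault-tolerance argument concrete, here is a minimal Python sketch of replica selection. It assumes each replica carries an estimated access cost and an availability flag; the `Replica` type, the site names and the latency figures are hypothetical and not part of the slides. The client reads from the cheapest reachable copy, so a failed server simply drops out of the candidate set.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str          # hypothetical site name
    latency_ms: float  # assumed estimate of the access cost from the user's site
    available: bool    # False simulates a server or network failure


def best_replica(replicas: list[Replica]) -> Replica:
    """Pick the cheapest reachable replica; fail only if none is reachable."""
    reachable = [r for r in replicas if r.available]
    if not reachable:
        raise RuntimeError("no replica of this data item is reachable")
    return min(reachable, key=lambda r: r.latency_ms)


replicas = [
    Replica("cern", 120.0, True),
    Replica("kek", 5.0, False),   # closest site, but currently down
    Replica("cnaf", 40.0, True),
]
print(best_replica(replicas).site)  # cnaf
```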

The Costs of Replication
- Data localization: we need mechanisms to correctly identify replicas, namely a file naming convention and a replica catalog.
- Data consistency: if users can modify replicas, we need mechanisms to synchronize them, i.e. a consistency service.
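The replica catalog can be pictured as a mapping from a logical file name to the physical locations of its replicas. The sketch below is a toy in-memory version; the `ReplicaCatalog` class, the LFN/PFN terminology and the `gsiftp://` URLs are illustrative assumptions, not the interface of any real Grid catalog.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory catalog: logical file name (LFN) -> physical file names (PFNs)."""

    def __init__(self) -> None:
        self._replicas: dict[str, set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a new physical copy of `lfn` exists at `pfn`."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        """Forget one physical copy; drop the LFN once no copies remain."""
        self._replicas[lfn].discard(pfn)
        if not self._replicas[lfn]:
            del self._replicas[lfn]

    def lookup(self, lfn: str) -> list[str]:
        """Return all known replicas of `lfn`, sorted for deterministic output."""
        return sorted(self._replicas.get(lfn, set()))


catalog = ReplicaCatalog()
lfn = "lfn:/grid/run42/events.root"
catalog.register(lfn, "gsiftp://se.cern.example/run42/events.root")
catalog.register(lfn, "gsiftp://se.kek.example/run42/events.root")
print(catalog.lookup(lfn))
```

Keeping the logical name stable while physical copies come and go is what lets the other services (replica selection, consistency) refer to "the file" without caring where its copies live.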

The Replica Consistency Problem
There is a large body of literature on this problem in the fields of DBMSs, distributed file systems, distributed applications, directory services… The Grid is different because of:
- the variety of applications
- heterogeneous data
- a very high number of files (500 million replicas!)
- scalability problems (both users and resources)
- highly dynamic resources

Classification of Consistency Solutions
- Eager replication vs. lazy replication
- Taxonomy of lazy replication [Y. Saito]:
  - Where can an update be issued? Single-master vs. multi-master
  - What is transferred as an update? Content transfer vs. log transfer
  - Who transfers an update? Pull-based vs. push-based
  - Consistency guarantees: eventual consistency, view consistency (causal consistency, bounded inconsistency)
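The content-transfer vs. log-transfer distinction can be illustrated with a toy single-master file that only ever grows by appended lines (a simplifying assumption). Content transfer ships the whole current state to a secondary; log transfer ships only the operations the secondary has not yet applied. The `Master` and `Secondary` classes and the "lines shipped" cost measure are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    content: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)  # assumed: one appended line per update

    def update(self, line: str) -> None:
        self.content.append(line)
        self.log.append(line)


@dataclass
class Secondary:
    content: list[str] = field(default_factory=list)
    version: int = 0  # number of master updates already applied

    def content_transfer(self, master: Master) -> int:
        """Copy the whole current state; returns the number of lines shipped."""
        self.content = list(master.content)
        self.version = len(master.log)
        return len(master.content)

    def log_transfer(self, master: Master) -> int:
        """Replay only the missing operations; returns the number shipped."""
        missing = master.log[self.version:]
        self.content.extend(missing)
        self.version = len(master.log)
        return len(missing)


master = Master()
for i in range(1000):
    master.update(f"event {i}")
by_log, by_content = Secondary(), Secondary()
by_log.log_transfer(master)
by_content.content_transfer(master)

master.update("event 1000")                  # one more update at the master
print(by_log.log_transfer(master))           # 1
print(by_content.content_transfer(master))   # 1001
```

Both secondaries end up identical; the difference is only in how much data each update costs to propagate.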

General Design Principles
- Provide a high-level user interface.
- Provide several consistency models from which the user can choose.
- Integrate as much as possible with existing Grid services such as the Replica Manager.

Functionalities Required
- Client interface: provides the basic operations invoked by users and services
- File update mechanism
- Update propagation protocol

Interaction with the Consistency Service
Write operations on a file require:
- obtaining a local working copy of a replica;
- modifying the working copy;
- telling the Consistency Service to update the logical file (file update and update propagation).
Read operations may not require the Consistency Service, depending on the protocol used (are stale reads acceptable?).
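A minimal sketch of the write and read interactions just described, assuming each copy carries a version number and that the service rejects an update whose working copy was taken from an older version. `Copy`, `StubRCS`, `write` and `read` are hypothetical names; the slides do not specify how conflicts are detected.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Copy:
    version: int
    data: str


class StubRCS:
    """Hypothetical stand-in for the Consistency Service of one logical file."""

    def __init__(self, data: str) -> None:
        self.master = Copy(1, data)

    def latest(self) -> Copy:
        return self.master

    def update_file(self, base_version: int, data: str) -> bool:
        # Assumed policy: reject updates made on top of an outdated copy.
        if base_version != self.master.version:
            return False
        self.master = Copy(base_version + 1, data)
        return True


def write(local: Copy, rcs: StubRCS, modify: Callable[[str], str]) -> bool:
    working = Copy(local.version, local.data)               # 1. obtain a local working copy
    working.data = modify(working.data)                     # 2. modify the working copy
    return rcs.update_file(working.version, working.data)   # 3. ask the RCS to update the logical file


def read(local: Copy, rcs: StubRCS, allow_stale: bool) -> str:
    if allow_stale:
        return local.data                   # the Consistency Service is not consulted
    latest = rcs.latest()
    if local.version < latest.version:      # refresh a stale replica before reading
        local.version, local.data = latest.version, latest.data
    return local.data


rcs = StubRCS("calib v1")
site_a, site_b = Copy(1, "calib v1"), Copy(1, "calib v1")
print(write(site_a, rcs, lambda s: s + " + fix"))    # True
print(read(site_b, rcs, allow_stale=True))           # calib v1
print(write(site_b, rcs, lambda s: s + " + other"))  # False
print(read(site_b, rcs, allow_stale=False))          # calib v1 + fix
```

The third call fails because site_b's working copy was based on version 1 while the logical file had already moved to version 2; whether stale reads are acceptable is left to the caller through `allow_stale`.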

Basic Architecture
- The UI is the main entry point.
- LCSs interact with each other and with the UI to implement the consistency protocol.

Simulation of the RCS (1)
OptorSim, the Grid simulator (

Simulation of the RCS (2)
“Synchronous” protocol: write scenario
Participants: user job, RM, RCS, LRCS1, LRCS2
1. Get a working copy.
2. Modify it.
3. Update the file.
4. Find all replicas.
5. updateReplica.
6. ok.
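The synchronous write scenario can be sketched as follows: the RCS asks the Replica Manager for all replicas of the logical file, sends updateReplica to each LRCS, and returns ok only if every replica acknowledged. Modeling the RM as a dictionary and the `LRCS`/`SyncRCS` classes are assumptions made for illustration.

```python
class LRCS:
    """Hypothetical local consistency service holding one replica at a site."""

    def __init__(self, site: str, up: bool = True) -> None:
        self.site = site
        self.up = up          # False simulates a server or network failure
        self.data = ""

    def update_replica(self, data: str) -> bool:
        if not self.up:
            return False
        self.data = data
        return True


class SyncRCS:
    """Synchronous protocol: 'ok' is returned only after every replica is updated."""

    def __init__(self, replica_manager: dict[str, list[LRCS]]) -> None:
        self.rm = replica_manager  # assumed model of the RM: LFN -> LRCSs holding replicas

    def update_file(self, lfn: str, data: str) -> bool:
        replicas = self.rm[lfn]                                   # "Find all replicas"
        acks = [lrcs.update_replica(data) for lrcs in replicas]  # "updateReplica" to each LRCS
        return all(acks)                                          # "ok" only if all acknowledged


lrcs1, lrcs2 = LRCS("cern"), LRCS("kek")
rcs = SyncRCS({"lfn:run42": [lrcs1, lrcs2]})
print(rcs.update_file("lfn:run42", "v2"))  # True
print(lrcs1.data, lrcs2.data)              # v2 v2
```

This sketch has no rollback: if one site is down, the other replicas are already updated and the call merely reports failure. A production synchronous protocol would need an atomic commitment step (for example two-phase commit), which is not shown here.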

Simulation of the RCS (3)
Asynchronous protocol: write scenario
Participants: user job, RM, RCS, LRCS master, LRCS slave
1. Get a working copy.
2. Modify it.
3. Update the file.
4. Find the master.
5. Update master.
6. Update secondary replica.
7. Ok, master updated.
8. Update secondary replica.
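In the asynchronous scenario the writer waits only for the master. The sketch below is a hypothetical single-master, push-based variant: it queues the "update secondary replica" messages and delivers them later, which opens the window in which a read from a secondary returns stale data. `Replica`, `AsyncSingleMasterRCS` and `propagate` are illustrative names, not the service's API.

```python
from collections import deque


class Replica:
    def __init__(self) -> None:
        self.version = 0
        self.data = ""


class AsyncSingleMasterRCS:
    """Hypothetical lazy single-master protocol: the writer waits only for the master."""

    def __init__(self, master: Replica, secondaries: list[Replica]) -> None:
        self.master = master
        self.secondaries = secondaries
        self.pending = deque()  # queued (replica, version, data) messages

    def update_file(self, data: str) -> bool:
        self.master.version += 1                  # "Update master"
        self.master.data = data
        for sec in self.secondaries:              # schedule "Update secondary replica"
            self.pending.append((sec, self.master.version, data))
        return True                               # "Ok, master updated"

    def propagate(self) -> int:
        """Deliver queued updates; a real system would do this in the background."""
        delivered = 0
        while self.pending:
            sec, version, data = self.pending.popleft()
            if version > sec.version:             # never overwrite with an older version
                sec.version, sec.data = version, data
            delivered += 1
        return delivered


master, sec1, sec2 = Replica(), Replica(), Replica()
rcs = AsyncSingleMasterRCS(master, [sec1, sec2])
rcs.update_file("v1")
print(repr(sec1.data))       # ''
print(rcs.propagate())       # 2
print(sec1.data, sec2.data)  # v1 v1
```

Between `update_file` and `propagate`, a read at either secondary would return the old content: this is the stale-read window the simulations measure.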

Simulation of the RCS (4)
Execution flow of a job:
1. Select the next file to access.
2. Read or write?
   - Write (w): get the best file, write the file, send the update command to the RCS, get the result, continue.
   - Read (r): get the best file, read the file, wait.
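The job flow above maps onto a short simulation loop. This is a Python sketch, not OptorSim's code; `get_best_file` and `send_update` are hypothetical callbacks, the read/write choice is drawn at random with a caller-supplied write fraction (an assumption), and the actual file I/O and the "wait" step are not modeled.

```python
import random
from collections import Counter
from typing import Callable


def run_job(files: list[str], n_accesses: int, write_fraction: float,
            get_best_file: Callable[[str], str],
            send_update: Callable[[str, str], bool], seed: int = 0) -> Counter:
    """Simulate one job following the read/write flow of the slide."""
    rng = random.Random(seed)
    stats = Counter()
    for _ in range(n_accesses):
        lfn = rng.choice(files)                  # select the next file to access
        if rng.random() < write_fraction:        # w branch
            replica = get_best_file(lfn)         # get best file (writing it is not modeled)
            ok = send_update(lfn, replica)       # send update command to the RCS, get result
            stats["writes_ok" if ok else "writes_rejected"] += 1
        else:                                    # r branch
            get_best_file(lfn)                   # get best file (reading it is not modeled)
            stats["reads"] += 1                  # the slide's "wait" step is omitted
    return stats


stats = run_job(["f1", "f2", "f3"], n_accesses=100, write_fraction=0.2,
                get_best_file=lambda lfn: "site-a:" + lfn,
                send_update=lambda lfn, replica: True)
print(stats)
```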

Simulation Results
- Conflict rate of update operations: average values between 12.9% and 25.3%
- Stale-read rate of simple read operations: average values between 0.9% and 2.8%
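The two metrics could be collected with simple counters. The definitions below are assumptions: an update counts as conflicting if its working copy was based on an older version than the latest one, and a read counts as stale if it returns a version older than the latest. The numbers in the usage lines are made up and are not the simulation results above.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    updates: int = 0
    conflicting_updates: int = 0
    reads: int = 0
    stale_reads: int = 0

    def record_update(self, base_version: int, latest_version: int) -> None:
        self.updates += 1
        if base_version < latest_version:   # working copy was based on an old version
            self.conflicting_updates += 1

    def record_read(self, read_version: int, latest_version: int) -> None:
        self.reads += 1
        if read_version < latest_version:   # replica had not yet received the latest update
            self.stale_reads += 1

    def conflict_rate(self) -> float:
        return self.conflicting_updates / self.updates if self.updates else 0.0

    def stale_read_rate(self) -> float:
        return self.stale_reads / self.reads if self.reads else 0.0


c = Counters()
c.record_update(base_version=3, latest_version=3)
c.record_update(base_version=2, latest_version=3)   # conflicting
for read_version in (5, 4, 5, 5):
    c.record_read(read_version, latest_version=5)   # one stale read
print(c.conflict_rate(), c.stale_read_rate())       # 0.5 0.25
```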

Conclusions
- Consistency is necessary in applications where users can modify replicas.
- The benefits of simulation: it helps point out problems and evaluate possible solutions and their impact on the system.
- A real Consistency Service is feasible, although close interaction with end users and applications is advisable.