Efficient Solutions to the Replicated Log and Dictionary Problems (Gene T.J. Wuu & Arthur J. Bernstein). Presented by: Megha Priyanka.

Overview
- Need for data replication
- Consistency constraints for replicated data
- Model of the distributed environment
- Dictionary and log structure
- The dictionary problem
- Prior work
- Proposed solution
- Comparison with other work
- 2DTT data structure improvement
- Extending the proposed solution
- Conclusion

Need for replicated data?
- Many applications share data objects.
- Reliability and fast access are in demand.
- Replication is a first step toward a comprehensive disaster-recovery plan.
- Data remains available even when an individual node fails.

Consistency constraints for replicated data
- Serializable transactions ensure the correctness of a database.
- Serial consistency is harder to achieve in an unreliable distributed system. Why? Availability conflicts with serial consistency, and concurrency and serializability are compatible only when concurrent transactions access disjoint parts of the database.
- So, lower the consistency bar: use a weaker consistency constraint together with additional information about the distributed transaction.

Event Model

[Figure: nodes n1-n6 exchanging messages; it shows a send event Send(m, T6, 6), a matching receive event, a non-communication (local) event, and a crashed node whose local data remains intact.]

Uses Lamport's total ordering and the happened-before relation.

Distributed Dictionary

Data replication needs an efficient data structure: scalable, available, and recoverable. The solution: a replicated dictionary maintained using a log.

Dictionary: an abstraction of a data object such as a file directory, a resource-management table, or an electronic appointment calendar.

[Figure: a dictionary entry X with its index and insert/delete markers.]

A Dictionary Snapshot

[Figure: a snapshot of one node's dictionary contents.]

Distributed Log

Data structure:

type Event = record
    op   : OperationType;
    time : TimeType;
    node : NodeId;
end

Example records:
1. ⟨delete, T_i, 3⟩
2. ⟨add, T_i + 4, 6⟩

(A Python rendering follows below.)
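To make the record concrete, here is a direct Python rendering of the record type above, a minimal sketch with the field types simplified to built-ins. The concrete timestamps in the example records are assumed values, since T_i is symbolic on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    op: str     # OperationType, e.g. "insert" or "delete"
    time: int   # TimeType: the originating node's clock value
    node: int   # NodeId of the node where the event occurred

# The two example records, assuming T_i = 7 for illustration:
e1 = Event(op="delete", time=7, node=3)    # <delete, T_i, 3>
e2 = Event(op="add", time=11, node=6)      # <add, T_i + 4, 6>
```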

Dictionary Problem

Notation:
- Each node N_i has a local, fully replicated dictionary copy V_i.
- V(e) = the dictionary contents at the node where event e occurred.
- X = a dictionary entry.
- C_X = the event that inserts X.
- X-delete event = an event that deletes X.

Dictionary problem restrictions:
R1) X ∈ V(e) iff C_X -> e and there is no X-delete event g with g -> e.
R2) Delete(X) can be invoked at N_i only if X ∈ V_i immediately prior to execution.
R3) For each dictionary entry X, there is at most one insert(X) event.

The dictionary problem: find a distributed algorithm for n nodes such that each node can perform insert, delete, send, and receive subject to restrictions R1, R2, and R3.

Prior Work

[Figure: three nodes P1, P2, P3, each holding a log and a dictionary; an insert(X) at P1 propagates by sending the whole log.]

- The whole log is sent in every message: excessive communication.
- The entire log is replayed to compute each dictionary entry (Y ∈ V(e) iff C_Y -> e with no Y-delete event g, g -> e): excessive computation.
- The entire log is stored at every node: excessive storage cost.

(A sketch of this naive scheme follows below.)
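For contrast with the proposed solution, here is a minimal sketch of the naive scheme criticized above: every message carries the entire log, and the receiver recomputes its dictionary by replaying all records in Lamport total order. The names (Event, dictionary_from_log) are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    op: str    # "insert" or "delete"
    value: str
    time: int
    node: int

def dictionary_from_log(log):
    """Replay the whole log: X is present iff it was inserted and not
    deleted afterward. Restrictions R1-R3 guarantee a delete always
    happens-after its matching insert, so set logic suffices."""
    v = set()
    for e in sorted(log, key=lambda e: (e.time, e.node)):  # Lamport total order
        if e.op == "insert":
            v.add(e.value)
        else:
            v.discard(e.value)
    return v

full_log = {Event("insert", "X", 1, 1), Event("insert", "Y", 2, 2),
            Event("delete", "X", 3, 1)}
assert dictionary_from_log(full_log) == {"Y"}
```

Replaying and shipping the full log on every message is exactly what makes the communication, computation, and storage costs excessive.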

The Proposed Solution

Data structures used:
- Log data structures: a 2-D time table T_i (recall matrix timestamps) and a partial log PL_i.
- Dictionary data structure: V_i, the set of dictionary entries.

Algorithm

Initialization:
  V_i = ∅; PL_i = ∅; T_i[j,k] = 0 for all (j,k).

Insert(X) / Delete(X):
  T_i[i,i] = Clock_i;
  PL_i = PL_i ∪ {⟨op, T_i[i,i], i⟩};
  if op = Insert(X): V_i = V_i ∪ {X};
  if op = Delete(X): V_i = V_i - {X}.

Send(m) to N_k:
  NP = {e | e ∈ PL_i and N_i cannot tell from T_i that N_k already knows e};
  send ⟨NP, T_i⟩ to N_k.

Receive(m) from N_k:
  NE = the records in the message that N_i does not already know;
  V_i = {X | (X ∈ V_i or an insertion of X is in NE) and X is not deleted by a record in NE};
  update T_i as with matrix timestamps;
  PL_i = {e ∈ PL_i ∪ NE | at least one node may still not know e}.

(A runnable sketch of these four operations follows below.)
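Below is a minimal, runnable Python sketch of the four operations, under stated assumptions: the class and method names are invented, the message is modeled as a plain tuple, and an explicit value field is added to the event record for clarity. The test T[k][e.node] >= e.time is how a node decides whether node k already knows an event.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    op: str        # "insert" or "delete"
    value: str     # the dictionary entry X
    time: int      # originating node's clock value
    node: int      # originating node id

class Node:
    def __init__(self, node_id: int, n: int):
        self.id = node_id
        self.n = n
        self.clock = 0
        self.T = [[0] * n for _ in range(n)]   # 2-D time table T_i
        self.PL = set()                        # partial log PL_i
        self.V = set()                         # dictionary copy V_i

    def _has_rec(self, e: Event, k: int) -> bool:
        # According to our time table, does node k already know event e?
        return self.T[k][e.node] >= e.time

    def _local_event(self, op: str, x: str):
        self.clock += 1
        self.T[self.id][self.id] = self.clock
        self.PL.add(Event(op, x, self.clock, self.id))

    def insert(self, x: str):
        self._local_event("insert", x)
        self.V.add(x)

    def delete(self, x: str):
        assert x in self.V                     # restriction R2
        self._local_event("delete", x)
        self.V.discard(x)

    def make_message(self, k: int):
        # NP: only the records the receiver may not have, plus our 2DTT.
        NP = {e for e in self.PL if not self._has_rec(e, k)}
        return NP, [row[:] for row in self.T], self.id

    def receive(self, msg):
        NP, Tk, k = msg
        NE = {e for e in NP if not self._has_rec(e, self.id)}  # new to us
        # R1-R3 guarantee a delete happens-after its insert, so plain
        # set arithmetic yields the correct dictionary contents.
        self.V |= {e.value for e in NE if e.op == "insert"}
        self.V -= {e.value for e in NE if e.op == "delete"}
        # Merge time tables: adopt the sender's direct row into our own,
        # then take the element-wise max (as with matrix timestamps).
        for j in range(self.n):
            self.T[self.id][j] = max(self.T[self.id][j], Tk[k][j])
        for r in range(self.n):
            for c in range(self.n):
                self.T[r][c] = max(self.T[r][c], Tk[r][c])
        # Discard a log record once every node is known to have it.
        self.PL = {e for e in (self.PL | NE)
                   if any(not self._has_rec(e, j) for j in range(self.n))}

# Tiny usage example: one insert propagates from n1 to n2.
n1, n2 = Node(0, 2), Node(1, 2)
n1.insert("X")
n2.receive(n1.make_message(1))
assert "X" in n2.V and n2.T[0][0] == 1
assert not n2.PL   # n2 discards the record: it can tell both nodes know it
```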

[Figures: a step-by-step trace of the algorithm on three nodes n1, n2, n3 (slides 11-17). The trace shows insert(X), insert(Y), and insert(Z) events entering the partial logs and dictionaries, the time tables T1-T3 being updated as messages are exchanged, and finally a log record being discarded once every node is known to have received it, while the dictionaries still hold the entry.]

Comparison with other work

Proposed by | Data structures used | Disadvantage
Fischer & Michael | Dictionary data structures | The entire dictionary copy must be sent in each message.
Allchin | Synchronization set (SS, roughly equivalent to a partial log) and a 1-D time table | The SS grows unboundedly.
Wuu & Bernstein | Dictionary, log, and 2-D time table | A 2DTT of size O(n^2) is sent in every message.

Improving 2DTT Message Complexity

Strategy | Data structure stored / sent | Cost | Pros & cons
0 | The complete 2DTT is stored at each node and sent in every message. | Store: O(n^2), Send: O(n^2) | Message complexity is O(n^2), since an n x n matrix is sent and stored.
1 | The complete 2DTT is stored, but a node sends only its own row. | Store: O(n^2), Send: O(n) | Requires direct messages to update each row; messages must include more event records.
2 | Stores its neighbors' rows and its own; sends the corresponding row to each neighbor. | Store: O(nk), Send: O(n) | Cannot determine when all nodes have learned of an event; instead, discard an event record once all neighbors know it.
3 | Stores all entries (rows and columns) corresponding to its neighbors; sends row information through gateway nodes. | Store: O(k^2), Send: O(k) | Better when the network is large and connectivity and communication are sparse.

(A sketch of Strategy 1's merge step follows below.)
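As an illustration of Strategy 1 from the table above, the sketch below shows only the time-table merge on receipt; the message shape (sender id plus the sender's own row) is an assumption, and the accompanying event records are omitted. Storage stays O(n^2), but each message carries only O(n) timestamps.

```python
def merge_own_row(T, sender_row, sender_id, my_id):
    """Element-wise max of the sender's row into the sender's stored row
    (what the sender knows directly) and into our own row (we know it now)."""
    n = len(T)
    for j in range(n):
        T[sender_id][j] = max(T[sender_id][j], sender_row[j])
        T[my_id][j] = max(T[my_id][j], sender_row[j])
    return T

T = [[3, 1, 0], [0, 2, 0], [0, 0, 1]]        # node 0's stored 2DTT
T = merge_own_row(T, [1, 4, 2], sender_id=1, my_id=0)
assert T[1] == [1, 4, 2] and T[0] == [3, 4, 2]
```

The trade-off noted in the table shows up here: rows other than the sender's and the receiver's are never refreshed, so log records are discarded later and messages must carry more of them.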

Extending the Proposed Solution

Replicated numeric data: the scheme also supports add-to and subtract-from operations, which are commutative. The log/2DTT solution ensures that no matter in what order the operations are applied, the result is consistent: result1 = b + a - c and result2 = b - c + a give result1 = result2. (A small demonstration follows below.)

Failure detection: to distinguish a node failure from a communication failure, the log is used to collect records of communication events. For example, if node N_1's 2DTT shows that no node has received any information from node N_2, then N_2 may be down.
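A tiny demonstration of the commutativity point, with sample values a = 5, c = 3, b = 10 assumed for illustration:

```python
# Add-to / subtract-from commute, so replaying the same log records in
# any order yields the same numeric value.

def apply_all(x, records):
    for op, amount in records:
        x = x + amount if op == "add" else x - amount
    return x

b, a, c = 10, 5, 3
records = [("add", a), ("subtract", c)]
assert apply_all(b, records) == apply_all(b, list(reversed(records))) == 12
```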

Conclusion
- Mutual consistency of replicated data is achieved.
- The algorithm works well in an unreliable network; a weaker consistency constraint is used.
- Excessive communication, computation, and storage costs are reduced.

Remember:
- The replicated log is used to compute other nodes' views of the data.
- Link failure / lost message: the information is recovered from other nodes.
- Node failure: information is preserved in the log and dictionary, which reside in stable storage.
- Reduced communication and storage cost: only the partial log is sent and stored.
- Reduced computation cost: only partial entries are recalculated in the dictionary.