Published by Nora May. Modified over 9 years ago.
1
Efficient Solutions to the Replicated Log and Dictionary Problems (Gene T.J. Wuu & Arthur J. Bernstein). Presented by: Megha Priyanka
2
Overview
- Need for data replication
- Consistency constraints for replicated data
- Model of the distributed environment
- Dictionary and log structure
- Dictionary problem
- Prior work
- Proposed solution
- Comparison with other work
- 2DTT data structure improvement
- Extending the proposed solution
- Conclusion
3
Need for replicated data?
- Many applications share data objects.
- Reliability and fast access are in demand.
- Replication is a first step toward a comprehensive disaster recovery plan.
- Data stays available even when an individual node fails.
4
Consistency constraints for replicated data
- Serializable transactions ensure correctness of the database.
- Serial consistency is harder to achieve in an unreliable distributed system. Why?
  -> Availability conflicts with serial consistency.
  -> Concurrency and serializability are compatible when concurrent transactions access disjoint parts of the database.
- So, lower the consistency bar: use a weaker consistency constraint together with additional information about the distributed transactions.
5
Event Model
[Figure: space-time diagram of nodes n1 to n6. Labels shown: "Send (m, T6, 6)", "receive", "Non-communication event", "crashed", "Local Data Intact!".]
Uses Lamport's total ordering and the happened-before relation.
6
Distributed Dictionary
- Data replication needs an efficient data structure that is scalable, available and recoverable.
- Solution: a replicated dictionary maintained using a log.
- Dictionary: an abstraction of a data object such as a file directory, a resource management table or an electronic appointment calendar.
[Figure: a dictionary holding entry X at index 1, with Insert and Delete operations.]
7
A Dictionary Snapshot.
8
Distributed Log
Data structure:
    type Event = record
        op   : OperationType;
        time : TimeType;
        node : NodeId;
    end
Example records:
1. (delete, T_i, 3)
2. (add, T_i + 4, 6)
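The record type above maps naturally onto a small immutable class. The following is a minimal sketch in Python, not the presenters' code: the `entry` field and the concrete value chosen for T_i are assumptions added for illustration, since the slide only names `op`, `time` and `node`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    op: str     # the slide's OperationType, e.g. "delete" or "add"
    entry: str  # assumption: the dictionary entry the operation targets
    time: int   # the slide's TimeType: clock value at the originating node
    node: int   # the slide's NodeId: node where the event occurred


T_i = 10  # assumption: an arbitrary clock value standing in for T_i

# The two example records from the slide.
log = [
    Event("delete", "X", T_i, 3),
    Event("add", "X", T_i + 4, 6),
]
```

Making the record frozen lets it be stored in sets, which is convenient when logs are merged and deduplicated.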
9
Dictionary Problem
Notation:
- Each node N_i holds a local copy V_i of the fully replicated dictionary.
- V(e) = contents of the dictionary copy at the node where event e occurred.
- X = a dictionary entry.
- C_X = the event that inserts X.
- X-delete event = an event that deletes X.
Dictionary problem restrictions:
R1) X ∈ V(e) iff C_X -> e and there is no X-delete event g such that g -> e.
R2) Delete(X) can be invoked at N_i only if X ∈ V_i immediately prior to execution.
R3) For each dictionary entry X, there is at most one insert(X) event.
Dictionary problem: find a distributed algorithm on n nodes such that each node can perform insert, delete, send and receive subject to restrictions R1, R2 and R3.
[Figure: timeline T showing the events Insert X, Delete X and e.]
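Restriction R1 can be read as a recipe for computing a view from the causal past of an event. A minimal sketch in Python, assuming the events that happened before e are already available as (op, entry) pairs; the function name `view` and the tuple layout are illustrative and do not come from the slides.

```python
def view(past):
    """Return the entries visible under R1, given the events preceding e.

    `past` is an iterable of (op, entry) pairs, op being "insert" or "delete".
    """
    inserted = {entry for op, entry in past if op == "insert"}
    deleted = {entry for op, entry in past if op == "delete"}
    # R3 allows at most one insert per entry, so a deleted entry never returns.
    return inserted - deleted


past_of_e = [("insert", "X"), ("insert", "Y"), ("delete", "X")]
visible = view(past_of_e)
```

Because of R3, the order of events within the causal past does not matter here: set difference is enough.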
10
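Prior Work
[Figure: nodes P1, P2 and P3, each with a log and a dictionary; an Insert X record is propagated between the logs (L1, L2), and the dictionaries show entries X and Y.]
- Every message carries the sender's whole log: excessive communication.
- The log is used to calculate each dictionary entry (Y ∈ V(e) iff C_Y -> e with no Y-delete event g such that g -> e): excessive calculation.
- The entire log is stored: excessive storage cost.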
11
Proposed Solution
Data structures used:
- Log data structures: a 2-D time table T_i (recall matrix timestamps) and a partial log PL_i.
- Dictionary data structure: V_i, the set of dictionary entries.
12
Algorithm
Initialization:
    V_i = ∅; PL_i = ∅; T_i[j,k] = 0 for all (j,k).
Insert(X) / Delete(X):
    T_i[i,i] = Clock_i.
    PL_i = PL_i ∪ {(Op, T_i[i,i], i)}.
    If Op = Insert(X), V_i = V_i ∪ {X}.
    If Op = Delete(X), V_i = V_i - {X}.
Send(m) to N_k:
    NP = {eR | eR ∈ PL_i and, according to T_i, N_i cannot conclude that N_k already knows about eR}.
    Send (NP, T_i) to N_k.
Receive(m) from N_k:
    m = (NP_k, T_k).
    NE = {fR | fR ∈ NP_k and N_i is not yet aware of fR}.
    V_i = {v | (v ∈ V_i or NE contains an insertion of v) and NE contains no deletion of v}.
    Update T_i as for a matrix timestamp: T_i[i,j] = max(T_i[i,j], T_k[k,j]) for all j, then T_i[j,l] = max(T_i[j,l], T_k[j,l]) for all (j,l).
    PL_i = {eR | eR ∈ PL_i ∪ NE and, according to T_i, at least one node may not yet know about eR}.
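The four operations can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' implementation: the names `Rec`, `Node` and `hasrec`, the per-node integer clock, zero-based node ids and the sender id carried in the message are all assumptions. The test "N_i knows that N_k knows about eR" is taken to be T_i[k][eR.node] >= eR.time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rec:
    op: str     # "insert" or "delete"
    entry: str  # dictionary entry the operation targets
    time: int   # clock value at the originating node
    node: int   # originating node id (0-based here)


class Node:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.clock = 0                        # assumption: local counter
        self.V = set()                        # dictionary view V_i
        self.PL = set()                       # partial log PL_i
        self.T = [[0] * n for _ in range(n)]  # 2-D time table T_i

    def hasrec(self, rec, k):
        # True if, according to T_i, node k already knows about rec.
        return self.T[k][rec.node] >= rec.time

    def _local(self, op, entry):
        self.clock += 1
        self.T[self.i][self.i] = self.clock
        self.PL.add(Rec(op, entry, self.clock, self.i))

    def insert(self, x):
        self._local("insert", x)
        self.V.add(x)

    def delete(self, x):
        if x not in self.V:                   # restriction R2
            raise KeyError(x)
        self._local("delete", x)
        self.V.discard(x)

    def send(self, k):
        # Only records that node k is not known to have, plus a copy of T_i.
        np_ = {r for r in self.PL if not self.hasrec(r, k)}
        return np_, [row[:] for row in self.T], self.i

    def receive(self, msg):
        np_k, t_k, k = msg
        ne = {r for r in np_k if not self.hasrec(r, self.i)}
        inserted = {r.entry for r in ne if r.op == "insert"}
        deleted = {r.entry for r in ne if r.op == "delete"}
        self.V = (self.V | inserted) - deleted
        # Matrix-timestamp update: own row absorbs the sender's row,
        # then every cell takes the element-wise maximum.
        for j in range(self.n):
            self.T[self.i][j] = max(self.T[self.i][j], t_k[k][j])
        for r in range(self.n):
            for s in range(self.n):
                self.T[r][s] = max(self.T[r][s], t_k[r][s])
        # Keep a record only while some node may still be missing it.
        self.PL = {r for r in self.PL | ne
                   if any(not self.hasrec(r, j) for j in range(self.n))}
```

A short usage run: with three nodes, an insert at node 0 that is relayed 0 -> 1 -> 2 leaves node 2 with an empty partial log, because its time table shows that every node already has the record.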
13
[Figure: worked example, step 1. Nodes n1, n2 and n3, each with a log, a dictionary and a 3 x 3 time table (T1, T2, T3). Log record shown: Insert(X,1,1). Dictionary entry shown: 1 x.]
14
[Figure: worked example, step 2. Nodes n1, n2 and n3 with logs, dictionaries and time tables T1, T2, T3. Log records shown: Insert(X,1,1), Insert(X,1,3), Insert(X,1,2). Dictionary entries shown: 1 x; 1 x.]
15
[Figure: worked example, step 3. Nodes n1, n2 and n3 with logs, dictionaries and time tables T1, T2, T3. Log records shown: Insert(X,1,1), Insert(X,1,3), Insert(X,1,2), Insert(Y,2,2). Dictionary entries shown: 1 x; 1 x, 2 y.]
16
[Figure: worked example, step 4. Nodes n1, n2 and n3 with logs, dictionaries and time tables T1, T2, T3. Log records shown: Insert(X,1,1), Insert(Y,3,1), Insert(X,1,3), Insert(Y,2,3), Insert(X,1,2), Insert(Y,3,2). Dictionary entries shown: 1 x; 1 x, 2 y.]
17
[Figure: worked example, step 5. Nodes n1, n2 and n3 with logs, dictionaries and time tables T1, T2, T3. Log records shown: Insert(X,1,1), Insert(Y,3,1), Insert(Y,2,3), Insert(Z,3,3), Insert(X,1,2), Insert(Y,3,2). Dictionary entries shown: 1 x; 1 x, 2 y.]
18
[Figure: worked example, step 6. Nodes n1, n2 and n3 with logs, dictionaries and time tables T1, T2, T3. Log records shown: Insert(X,1,1), Insert(Y,3,1), Insert(Z,4,1), Insert(Y,2,3), Insert(Z,4,3), Insert(X,1,2), Insert(Y,3,2), Insert(Z,4,2). Dictionary entries shown: 1 x; 1 x, 2 y, 3 z.]
19
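[Figure: worked example, step 7. Nodes n1, n2 and n3 with logs, dictionaries and time tables T1, T2, T3. Log records shown: Insert(Y,3,1), Insert(Z,4,1), Insert(Y,2,3), Insert(Z,4,3), Insert(Y,3,2), Insert(Z,4,2); the Insert(X, ...) records no longer appear in any log. Dictionary entries shown: 1 x; 1 x, 2 y, 3 z; 3 z.]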
20
Comparison with Other Work
Proposed by | Data structure used | Disadvantage
Fischer and Michael | Dictionary data structures | The entire copy of the dictionary has to be sent in each message.
Allchin | Synchronization set (SS) and 1-D time table; SS ≈ partial log | SS grows without bound.
Wuu & Bernstein | Dictionary, log and 2-D time table | The 2DTT, of size O(n^2), is sent in every message.
21
Improving 2-DTT Message Complexity
Strategy | Stored at the node | Sent in the message | Pros & cons | Store | Send
0 | Complete 2DTT | Complete 2DTT | Message complexity is as high as O(n^2), since an n x n matrix has to be sent and stored. | O(n^2) | O(n^2)
1 | Complete 2DTT | Only the node's own row | Requires direct messages to update each row. Needs to include more event records. | O(n^2) | O(n)
2 | Neighbors' rows and own row | The corresponding row information to each neighbor | Cannot determine when all nodes have come to know about an event; an event record is discarded once all neighbors know about it. | O(nk) | O(n)
3 | All entries (row and column) corresponding to neighbors | Row information, through the gateway nodes | Better when the network is large and connectivity and communication are low. | O(k^2) | O(k)
22
Extending the Proposed Solution
Replicated numeric data:
- Supports add-to and subtract-from operations, which are commutative.
- The log/2DTT solution ensures that the answer is consistent no matter in which order the operations are applied.
- So, with result1 = b + a - c and result2 = b - c + a, result1 = result2.
Detection of failure:
- To distinguish node failure from communication failure, a log is used to collect records of communication events.
- Suppose node N_1 has the 2DTT
      1 0 0
      0 0 0
      1 0 3
- It knows that no node has received any information from node 2, so node 2 might be down.
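Both points on this slide can be checked with a few lines of Python. This is a hedged sketch: the helper names `apply_ops` and `silent_nodes`, the sample values for a, b and c, and the row-major 3 x 3 reading of the slide's matrix are assumptions, and an all-zero column is only a hint that a node may be down, not proof.

```python
from itertools import permutations


def apply_ops(start, ops):
    """Apply a sequence of ("add", v) / ("sub", v) operations to start."""
    value = start
    for kind, v in ops:
        value = value + v if kind == "add" else value - v
    return value


# Commutativity: every ordering of the same operations gives one result.
a, b, c = 5, 20, 3                       # assumption: sample values
ops = [("add", a), ("sub", c)]
results = {apply_ops(b, order) for order in permutations(ops)}


def silent_nodes(T):
    """Return 1-based ids of nodes whose column in the time table is all zero.

    Column j holds what each node is known to have received from node j,
    so an all-zero column means nobody has heard from that node.
    """
    n = len(T)
    return [j + 1 for j in range(n) if all(T[k][j] == 0 for k in range(n))]


T1 = [[1, 0, 0],
      [0, 0, 0],
      [1, 0, 3]]                         # the 2DTT shown for node N_1
suspects = silent_nodes(T1)
```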
23
Conclusion
- Mutual consistency of replicated data is achieved.
- The algorithm works well in an unreliable network.
- A weaker consistency constraint is used.
- Excessive communication, computation and storage costs are reduced.
Remember:
- The replicated log is used to compute other nodes' views of the data.
- Link failure or lost message: get the information from other nodes.
- Node failure: the information is kept in the log and dictionary, which reside on stable storage.
- Reduction of communication and storage cost: only a partial log is sent and stored.
- Reduction of computation cost: only part of the dictionary entries are recalculated.