1 Efficient Dependency Tracking for Relevant Events in Shared Memory Systems Anurag Agarwal Vijay K. Garg

1. Efficient Dependency Tracking for Relevant Events in Shared Memory Systems
Anurag Agarwal (anurag@cs.utexas.edu), Vijay K. Garg (garg@ece.utexas.edu)
PDS Lab, University of Texas at Austin

2. Outline
- Motivation
- Background
- Chain Clock
- Instances of Chain Clock
- Experimental Results
- Conclusion

3. Motivation
- Dependencies between events are required to obtain global state information
- Applications such as monitoring and debugging
- Vector clocks [Fidge 88, Mattern 89]:
  - O(N) operations for a system with N processes
  - Difficulty with dynamic creation of processes

4. Outline (section: Background)

5. Relevant Events
- Events that are “useful” to the application
- Example, predicate detection: “There are no messages in the channel”
[Figure: a computation on processes p1, p2, p3, p4]

6. Vector Clocks [Fidge 88, Mattern 89]
- Assign an N-tuple V to every relevant event
- Clock condition: e → f iff e.V < f.V, where u < w means every component of u is at most the matching component of w and u ≠ w
Process P_i:
- Initially V = (0, …, 0)
- On an event e:
  I. If e is the receive of a message m: V = max(V, m.V), taken component-wise
  II. If e is a relevant event: V[i] = V[i] + 1
  III. If e is the send of a message m: m.V = V
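
A minimal Python sketch of rules I to III, assuming each process is simulated by an object and a message is represented only by its timestamp list. The names `VectorClockProcess`, `internal`, `send`, `receive`, and `less_than` are hypothetical, not taken from the slides.

```python
class VectorClockProcess:
    """Process P_i in an N-process system (hypothetical helper class)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v = [0] * n  # Initially V = (0, ..., 0)

    def _event(self, relevant: bool) -> list[int]:
        # Rule II: only relevant events advance this process's own component.
        if relevant:
            self.v[self.pid] += 1
        return list(self.v)  # timestamp of the event

    def internal(self, relevant: bool = True) -> list[int]:
        return self._event(relevant)

    def receive(self, msg_v: list[int], relevant: bool = False) -> list[int]:
        # Rule I: component-wise max with the timestamp carried by m.
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        return self._event(relevant)

    def send(self, relevant: bool = False) -> list[int]:
        # Rule III (applied after Rule II): m.V = V.
        return self._event(relevant)


def less_than(u: list[int], w: list[int]) -> bool:
    """u < w: every component of u is <= the one in w, and u != w."""
    return all(a <= b for a, b in zip(u, w)) and u != w


p0, p1 = VectorClockProcess(0, 2), VectorClockProcess(1, 2)
e = p0.internal()                  # relevant event on P0: [1, 0]
m = p0.send()                      # non-relevant send; m.V = [1, 0]
f = p1.internal()                  # relevant event on P1: [0, 1]
g = p1.receive(m, relevant=True)   # max([0, 1], [1, 0]) = [1, 1], then [1, 2]
print(less_than(e, g), less_than(e, f), less_than(f, e))  # True False False
```

Here e → g through the message, while e and f are concurrent, so neither timestamp is less than the other.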

7. Outline (section: Chain Clock)

8. Key Idea
- Any chain in the computation poset can function as a process
[Figure: events a through h on processes p1 to p4, regrouped into chains]

9. Chain Clocks
- Each component of the timestamp corresponds to a chain
- Change Rule II of the vector clock algorithm: if e is a relevant event, V[e.c] = V[e.c] + 1, where e.c is the chain assigned to e
- Theorem: chain clocks guarantee the clock condition
- Goal: decompose the poset online into as few chains as possible
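
The only change from vector clocks is which component Rule II increments. The Python sketch below stores timestamps as dictionaries keyed by chain id, so components can be added as new chains appear. The helper names are hypothetical, and the caller is assumed to supply a valid e.c, that is, a chain whose most recent event happened before e; choosing such a chain is the job of DCC, ACC, and VCC.

```python
Timestamp = dict[str, int]  # chain id -> component value; absent means 0


def merge(u: Timestamp, w: Timestamp) -> Timestamp:
    """Rule I: component-wise max, treating absent components as 0."""
    out = dict(u)
    for chain, x in w.items():
        out[chain] = max(out.get(chain, 0), x)
    return out


def advance(v: Timestamp, chain: str) -> Timestamp:
    """Modified Rule II: V[e.c] = V[e.c] + 1 for the chain e.c chosen for e."""
    out = dict(v)
    out[chain] = out.get(chain, 0) + 1
    return out


def less_than(u: Timestamp, w: Timestamp) -> bool:
    """u < w: no component of u exceeds w's, and at least one is smaller."""
    keys = set(u) | set(w)
    return all(u.get(k, 0) <= w.get(k, 0) for k in keys) and any(
        u.get(k, 0) < w.get(k, 0) for k in keys
    )


# a on P1 and b on P2 are ordered by a message, so they can share chain "A";
# c on P3 is concurrent with both and is placed on chain "B".
a = advance({}, "A")              # {'A': 1}
b = advance(merge({}, a), "A")    # receive a's timestamp, then {'A': 2}
c = advance({}, "B")              # {'B': 1}
print(less_than(a, b), less_than(a, c), less_than(c, a))  # True False False
```

Events a and b live on different processes yet share one component, which is how a chain clock can use fewer components than a vector clock.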

10. Outline (section: Instances of Chain Clock)
- DCC
- ACC
- VCC

11. Dynamic Chain Clocks (DCC)
- A shared vector Z maintains the up-to-date values of all components
- Each process starts with an empty vector
- Rule II:
  - e.c = j such that Z[j] = e.V[j], giving preference to the component last updated by P_i
  - V[e.c] = V[e.c] + 1

12. DCC: Example
I. If e is the receive of a message m: V = max(V, m.V)
II. If e is a relevant event: e.c = j such that Z[j] = V[j]; V[e.c] = V[e.c] + 1; Z[e.c] = Z[e.c] + 1
III. If e is the send of a message m: m.V = V
[Figure: worked example on processes p1, p2, p3 with local vectors V1, V2, V3 and the shared vector Z; e.g., a receive computes (1,1) = max{(1), (0,1)}]
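
A Python sketch that simulates DCC within one address space, assuming Z is a list guarded by a lock and that, when no component j satisfies Z[j] = V[j], a new component is appended (the slides say only that each process starts with an empty vector). `SharedZ`, `DCCProcess`, and `relevant_event` are hypothetical names.

```python
import threading


class SharedZ:
    """Shared vector Z: Z[j] is the latest value of component j (hypothetical)."""

    def __init__(self) -> None:
        self.z: list[int] = []
        self.lock = threading.Lock()  # Rule II reads and updates Z atomically


class DCCProcess:
    def __init__(self, shared: SharedZ) -> None:
        self.shared = shared
        self.v: list[int] = []  # each process starts with an empty vector
        self.last = -1          # component this process updated last (-1: none)

    def _get(self, j: int) -> int:
        return self.v[j] if j < len(self.v) else 0  # missing components are 0

    def receive(self, msg_v: list[int]) -> None:
        # Rule I: component-wise max; the shorter vector is padded with zeros.
        n = max(len(self.v), len(msg_v))
        self.v = [max(self._get(j), msg_v[j] if j < len(msg_v) else 0)
                  for j in range(n)]

    def send(self) -> list[int]:
        return list(self.v)  # Rule III: m.V = V

    def relevant_event(self) -> list[int]:
        z = self.shared.z
        with self.shared.lock:
            # Rule II: e.c = j with Z[j] = V[j], preferring the last-used component.
            ok = [j for j in range(len(z)) if z[j] == self._get(j)]
            if self.last in ok:
                c = self.last
            elif ok:
                c = ok[0]
            else:  # assumption: no usable chain, so open a new component
                z.append(0)
                c = len(z) - 1
            z[c] += 1                               # Z[e.c] = Z[e.c] + 1
        self.v += [0] * (c + 1 - len(self.v))
        self.v[c] += 1                              # V[e.c] = V[e.c] + 1
        self.last = c
        return list(self.v)


shared = SharedZ()
p1, p2 = DCCProcess(shared), DCCProcess(shared)
e1 = p1.relevant_event()  # opens component 0
e2 = p2.relevant_event()  # Z[0] = 1 but V2[0] = 0, so opens component 1
p2.receive(p1.send())     # V2 = max([0, 1], [1]) = [1, 1]
e3 = p2.relevant_event()  # both components qualify; prefers component 1
e4 = p1.relevant_event()  # only component 0 qualifies for p1
print(e1, e2, e3, e4, shared.z)  # [1] [0, 1] [1, 2] [2] [2, 2]
```

Every relevant event takes the lock on Z, which makes concrete why DCC depends on a shared data structure.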

13. Problem
- The number of processes can be much larger than the minimal number of chains
[Figure: DCC on processes p1 to p4; timestamps (1), (0,1), (1,2), (0,1,1), (1,2,2), (0,1,1,1), (1,2,2,2) gain one component per process]

14. Optimal Chain Decomposition
- Antichain: a set of pairwise concurrent elements
- Width: the maximum size of an antichain
- Dilworth’s Theorem [1950]: a poset of width k can be partitioned into k chains and no fewer
- Computing such a partition requires knowledge of the complete poset
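
One standard way to compute a minimum chain partition offline is a maximum bipartite matching (Fulkerson's reduction, a textbook technique rather than an algorithm from these slides): each matched pair x < y links x to its successor y, and the number of chains equals n minus the matching size. A Python sketch, assuming the poset is given as a list plus a strict-order predicate `less`; the example uses the product order on integer pairs, and all names are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def min_chain_partition(elems: Sequence[T], less: Callable[[T, T], bool]) -> list[list[T]]:
    """Minimum chain partition of a finite poset via maximum bipartite matching."""
    n = len(elems)
    succ = [-1] * n  # succ[i] = j means j follows i in i's chain
    pred = [-1] * n

    def augment(i: int, seen: list[bool]) -> bool:
        # Kuhn's augmenting-path step: find a partner j with elems[i] < elems[j].
        for j in range(n):
            if not seen[j] and less(elems[i], elems[j]):
                seen[j] = True
                if pred[j] == -1 or augment(pred[j], seen):
                    pred[j], succ[i] = i, j
                    return True
        return False

    for i in range(n):
        augment(i, [False] * n)

    # Elements with no predecessor start chains; follow succ links to the end.
    chains = []
    for start in range(n):
        if pred[start] == -1:
            chain, k = [], start
            while k != -1:
                chain.append(elems[k])
                k = succ[k]
            chains.append(chain)
    return chains


def product_less(p: tuple[int, int], q: tuple[int, int]) -> bool:
    return p != q and p[0] <= q[0] and p[1] <= q[1]


pts = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 3)]  # width 2
print(min_chain_partition(pts, product_less))
# [[(0, 1), (2, 1)], [(1, 0), (1, 2), (3, 3)]]
```

The matching finds two chains for this width-2 poset, as Dilworth's theorem predicts, but it needs every element up front, which the online setting does not allow.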

15. Online Chain Decomposition
- Elements of the poset are presented in a total order consistent with the poset
- Each element is assigned to a chain as it arrives
- Can be modeled as a game between:
  - Bob, who presents the elements
  - Alice, who assigns them to chains
- Felsner [1997]: for a poset of width k, Bob can force Alice to use k(k+1)/2 chains
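
To make the game concrete, here is a Python sketch of one simple strategy Alice could play, First-Fit: put each element on the first chain whose latest element is below it, otherwise open a new chain. First-Fit is neither Felsner's strategy nor ACC, and the sketch makes no claim about how many chains it needs in the worst case; it assumes Bob's stream is a linear extension of the poset, and the names are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def first_fit(stream: Iterable[T], less: Callable[[T, T], bool]) -> list[list[T]]:
    """Alice's moves: assign each element to a chain as soon as Bob presents it."""
    chains: list[list[T]] = []
    for z in stream:                # Bob presents z; earlier elements are never above it
        for chain in chains:
            if less(chain[-1], z):  # z extends this chain (by transitivity)
                chain.append(z)
                break
        else:
            chains.append([z])      # z is incomparable to every chain's top
    return chains


def product_less(p: tuple[int, int], q: tuple[int, int]) -> bool:
    return p != q and p[0] <= q[0] and p[1] <= q[1]


# Lexicographic order is a linear extension of the product order on pairs.
stream = sorted([(1, 2), (0, 1), (3, 3), (2, 1), (1, 0)])
print(first_fit(stream, product_less))
# [[(0, 1), (1, 2), (3, 3)], [(1, 0), (2, 1)]]
```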

16. Chain Partitioning Algorithm (ACC)
- Felsner gave an algorithm that meets the k(k+1)/2 bound
- Our algorithm is simpler and more efficient than Felsner's
- Queues (chains) are grouped into sets B_1, …, B_k with |B_i| = i
- For an element z:
  - Insert z into the first queue q in B_i whose head is less than z, for the smallest such i
  - Swap the queues of B_i and B_{i-1}, leaving q in its place
[Figure: queue sets B_1, B_2, B_3 and an arriving element z]
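
A Python sketch of the queue-swapping step as described on the slide, under assumptions the slide does not spell out: k is known in advance, all k(k+1)/2 queues start empty, an empty queue accepts any element, and a queue's head is its most recently inserted element. The function name is hypothetical. We do not reproduce the argument that some queue always accepts z when the width is at most k; the sketch raises an error if none does.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def acc_partition(stream: Iterable[T], less: Callable[[T, T], bool], k: int) -> list[list[T]]:
    """Online chain partition with sets B_1..B_k of queues, |B_i| = i."""
    queues: list[list[T]] = [[] for _ in range(k * (k + 1) // 2)]
    B: list[list[int]] = [[]]  # B[0] is a placeholder; B[i] lists queue ids
    start = 0
    for i in range(1, k + 1):
        B.append(list(range(start, start + i)))
        start += i

    for z in stream:
        for i in range(1, k + 1):
            # First queue in B_i whose head (latest element) is below z.
            q = next((qid for qid in B[i]
                      if not queues[qid] or less(queues[qid][-1], z)), None)
            if q is not None:
                break
        else:
            raise ValueError("no queue accepts z; is the width larger than k?")
        queues[q].append(z)
        if i > 1:
            # Swap B_i and B_{i-1}, leaving q in B_i (sizes stay i and i-1).
            rest = [qid for qid in B[i] if qid != q]
            B[i] = B[i - 1] + [q]
            B[i - 1] = rest
    return [qu for qu in queues if qu]


def product_less(p: tuple[int, int], q: tuple[int, int]) -> bool:
    return p != q and p[0] <= q[0] and p[1] <= q[1]


stream = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 3)]  # a linear extension, width 2
print(acc_partition(stream, product_less, k=2))
# [[(0, 1), (2, 1)], [(1, 0), (3, 3)], [(1, 2)]]
```

On this width-2 input the sketch uses three chains, the k(k+1)/2 total for k = 2, whereas the offline optimum is two.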

17. Drawback of DCC and ACC
- They require a shared data structure
  - Monitoring applications generally need a central server
- Hybrid clocks:
  - Multiple servers, each responsible for a subset of processes
  - Chains are found within each process group

18. Shared Memory System
- Accesses to shared variables induce dependencies
- Observation: the access events for a shared variable form a chain
- Variable-based Chain Clocks (VCC): associate a component with every variable
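
A Python sketch of VCC for threads, assuming each shared variable carries a lock (so its accesses are serialized and form a chain) together with the timestamp of its last access. Every access merges that timestamp, and a relevant access increments the variable's own component. `SharedVar`, `VCCThread`, and `access` are hypothetical names.

```python
import threading
from typing import Optional


class SharedVar:
    """A shared variable that owns VCC component `comp` (hypothetical helper)."""

    def __init__(self, comp: int, n_vars: int, value: int = 0) -> None:
        self.comp = comp
        self.value = value
        self.v = [0] * n_vars         # timestamp of the last access
        self.lock = threading.Lock()  # serializes accesses into one chain


class VCCThread:
    def __init__(self, n_vars: int) -> None:
        self.v = [0] * n_vars  # one component per shared variable

    def access(self, var: SharedVar, relevant: bool,
               new_value: Optional[int] = None) -> list[int]:
        with var.lock:
            # This access follows the previous access to var: merge its timestamp.
            self.v = [max(a, b) for a, b in zip(self.v, var.v)]
            if new_value is not None:
                var.value = new_value
            if relevant:
                self.v[var.comp] += 1  # Rule II with e.c = the variable's component
            var.v = list(self.v)
            return list(self.v)


x, y = SharedVar(0, 2), SharedVar(1, 2)  # component 0 for x, 1 for y
t1, t2 = VCCThread(2), VCCThread(2)
e1 = t1.access(x, relevant=True, new_value=1)  # x = 1: [1, 0]
e2 = t2.access(y, relevant=True, new_value=1)  # y = 1: [0, 1], concurrent with e1
e3 = t2.access(x, relevant=True, new_value=2)  # x = 2: max([0, 1], [1, 0]) -> [2, 1]
print(e1, e2, e3)  # [1, 0] [0, 1] [2, 1]
```

Accesses that should not count as relevant can pass `relevant=False`; they still merge and publish timestamps, so later relevant events remain correctly ordered.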

19. VCC Application: Predicate Detection
- Predicate: (x = 1) and (y = 1)
- Only events that change x or y are relevant
- Associate one VCC component with x and another with y
[Figure: successive values of x (0, 1, 2, 1) and y (1, 2); initially x = 0, y = 0]

20. Outline (section: Experimental Results)

21. Experiments
- Setup:
  - A multithreaded application
  - Each thread generates a sequence of events
- Parameters:
  - Number of processes
  - Number of events
  - Probability that an event is relevant
- Metrics:
  - Number of components used
  - Execution time

22. Components Used
[Figure: events = 100, relevant-event probability = 1%]

23. Execution Time
[Figure: events = 100, relevant-event probability = 1%]

24. Effect of Relevancy
[Figure: threads = 100, events = 100]

25. Conclusion
- Generalized vector clocks to a class of algorithms called chain clocks
- Dynamic Chain Clock (DCC) can provide large speedups and reduce memory requirements for applications
- Antichain-based Chain Clock (ACC) meets the lower bound for online chain decomposition

26. Questions?

28. Example: Poset of Width 2
- For a poset of width 2, Bob can force Alice to use 3 chains
[Figure: example poset whose elements are labeled with chains 1, 2, 1, 3]

32. Happened-Before Relation (→) [Lamport 78]
- A distributed computation with N processes
- Every process executes a series of events: internal, send, or receive
- e → f if there is a path from e to f
- e ║ f (e and f are concurrent) if there is no path between e and f
[Figure: events on processes p1 and p2]
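
The path-based definition can be checked directly. A Python sketch, assuming events are identified as (process, index) pairs, 0-based (so p1 and p2 become processes 0 and 1), and that a computation is given by per-process event counts plus a list of send-to-receive pairs; the function names are hypothetical.

```python
Event = tuple[int, int]  # (process, position of the event on that process)


def build_graph(counts: list[int], messages: list[tuple[Event, Event]]) -> dict[Event, set[Event]]:
    """Edges: each event to the next event on its process, and send -> receive."""
    succ: dict[Event, set[Event]] = {
        (p, i): set() for p, n in enumerate(counts) for i in range(n)
    }
    for p, n in enumerate(counts):
        for i in range(n - 1):
            succ[(p, i)].add((p, i + 1))
    for send, recv in messages:
        succ[send].add(recv)
    return succ


def precedes(succ: dict[Event, set[Event]], e: Event, f: Event) -> bool:
    """e -> f iff there is a non-empty path from e to f (depth-first search)."""
    stack, seen = list(succ[e]), set()
    while stack:
        g = stack.pop()
        if g == f:
            return True
        if g not in seen:
            seen.add(g)
            stack.extend(succ[g])
    return False


def concurrent(succ: dict[Event, set[Event]], e: Event, f: Event) -> bool:
    """e || f iff there is no path between e and f in either direction."""
    return e != f and not precedes(succ, e, f) and not precedes(succ, f, e)


# Two processes with three events each; event 1 on process 0 sends to event 2 on process 1.
succ = build_graph([3, 3], [((0, 1), (1, 2))])
print(precedes(succ, (0, 0), (1, 2)), concurrent(succ, (0, 2), (1, 1)))  # True True
```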

33. Future Work
- Lower bound for online chain decomposition when a decomposition into N chains is already known
- Other chain decomposition strategies

34. Distributed System: Time vs. Threads
[Figure: events = 100, relevant-event probability = 1%]

35. Distributed System: Events vs. Time
[Figure: threads = 100, relevant-event probability = 1%]

36. Effect of Number of Events
[Figure: threads = 100, relevant-event probability = 1%]

40. Example for DCC: is it appropriate?
- Is the content a bit too much for this amount of time? Where can it be reduced? Remove VCC or ACC?
- Chain clock:
  - Generalizes vector clocks
  - Reduces the time and memory overhead
  - Elegantly handles dynamic process creation

