1 Efficient Dependency Tracking for Relevant Events in Shared Memory Systems Anurag Agarwal Vijay K. Garg

Presentation transcript:

1 Efficient Dependency Tracking for Relevant Events in Shared Memory Systems
Anurag Agarwal, Vijay K. Garg
PDS Lab, University of Texas at Austin

2 Outline: Motivation, Background, Chain Clock, Instances of Chain Clock, Experimental Results, Conclusion

3 Motivation
Dependencies between events are required for global state information.
Applications include monitoring and debugging.
Vector clock [Fidge 88, Mattern 89]:
- O(N) operations for a system with N processes
- Dynamic creation of processes

4 Outline: Motivation, Background, Chain Clock, Instances of Chain Clock, Experimental Results, Conclusion

5 Relevant Events
Events “useful” for the application.
Predicate detection:
- “There are no messages in the channel”
[Figure: timelines of processes p1, p2, p3, p4]

6 Vector Clocks [Fidge 88, Mattern 89]
Assigns an N-tuple (V) to every relevant event:
- e → f iff e.V < f.V (clock condition)
Process P_i:
- Initially V = (0, …, 0)
- On an event e:
  I. If e is a receive of message m: V = max(V, m.V)
  II. If e is a relevant event: V[i] = V[i] + 1
  III. If e is a send of message m: m.V = V
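The three rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the class name `VectorClockProcess`, the helper `precedes`, and the two-process demo are assumptions made for this example.

```python
from typing import List, Tuple


class VectorClockProcess:
    """Process P_i in a system of n processes (illustrative names)."""

    def __init__(self, i: int, n: int) -> None:
        self.i = i
        self.v: List[int] = [0] * n  # V = (0, ..., 0)

    def receive(self, m_v: Tuple[int, ...]) -> None:
        # Rule I: V = max(V, m.V), taken componentwise.
        self.v = [max(a, b) for a, b in zip(self.v, m_v)]

    def relevant_event(self) -> Tuple[int, ...]:
        # Rule II: V[i] = V[i] + 1; the returned tuple is the event's timestamp.
        self.v[self.i] += 1
        return tuple(self.v)

    def send(self) -> Tuple[int, ...]:
        # Rule III: m.V = V.
        return tuple(self.v)


def precedes(e_v: Tuple[int, ...], f_v: Tuple[int, ...]) -> bool:
    # e.V < f.V: every component is <= and the vectors differ.
    return all(a <= b for a, b in zip(e_v, f_v)) and e_v != f_v


p0, p1 = VectorClockProcess(0, 2), VectorClockProcess(1, 2)
e = p0.relevant_event()   # (1, 0)
p1.receive(p0.send())     # p1 learns about e
f = p1.relevant_event()   # (1, 1)
g = p0.relevant_event()   # (2, 0), concurrent with f
```

Here `precedes(e, f)` holds because the message carries e's timestamp to p1, while f and g are incomparable, which is how the clock condition reports concurrency.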

7 Outline: Motivation, Background, Chain Clock, Instances of Chain Clock, Experimental Results, Conclusion

8 Key Idea
Any chain in the computation poset can function as a process.
[Figure: events a through h on processes p1, p2, p3, p4, regrouped into chains]

9 Chain Clocks
A component in the timestamp corresponds to a chain.
Change “Rule II” in the vector clock algorithm:
- If e is a relevant event: V[e.c] = V[e.c] + 1 (e.c is the component, i.e. the chain, chosen for e)
Theorem: Chain clocks guarantee the “clock condition”.
Goal: Online decomposition of the poset into as few chains as possible.

10 Outline: Motivation, Background, Chain Clock, Instances of Chain Clock (DCC, ACC, VCC), Experimental Results, Conclusion

11 Dynamic Chain Clocks (DCC)
A shared vector Z maintains up-to-date values of all components.
Each process starts with an empty vector.
Rule II:
- e.c = j such that Z[j] = e.V[j] (give preference to the component last updated by P_i)
- V[e.c] = V[e.c] + 1

12 DCC: Example
I. If e is a receive of message m: V = max(V, m.V)
II. If e is a relevant event: e.c = i s.t. Z[i] = V[i]; V[e.c] = V[e.c] + 1; Z[e.c] = Z[e.c] + 1
III. If e is a send of message m: m.V = V
[Figure: processes p1, p2, p3 with vectors V1, V2, V3 and shared vector Z; timestamps shown include (1), (0,1), (1,1) = max{(1),(0,1)}, (2,1), (3,1), (3,2)]
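The DCC rules can be sketched in Python as below. This is a hedged illustration, not the authors' implementation. The names `SharedZ` and `DCCProcess` are hypothetical, the lock around Z is an assumption about how the shared vector would be protected, and appending a fresh component when no j satisfies Z[j] = V[j] is inferred from "each process starts with an empty vector".

```python
import threading
from typing import List, Optional, Tuple


class SharedZ:
    """Shared vector Z holding the latest value of every component."""

    def __init__(self) -> None:
        self.z: List[int] = []
        self.lock = threading.Lock()  # assumption: Z is guarded by one lock


class DCCProcess:
    def __init__(self, shared: SharedZ) -> None:
        self.shared = shared
        self.v: List[int] = []            # starts empty
        self.last: Optional[int] = None   # component this process updated last

    def _pad(self, n: int) -> None:
        # Missing components are treated as 0.
        self.v.extend([0] * (n - len(self.v)))

    def receive(self, m_v: Tuple[int, ...]) -> None:
        # Rule I: V = max(V, m.V)
        self._pad(len(m_v))
        for j, x in enumerate(m_v):
            self.v[j] = max(self.v[j], x)

    def send(self) -> Tuple[int, ...]:
        # Rule III: m.V = V
        return tuple(self.v)

    def relevant_event(self) -> Tuple[int, ...]:
        # Rule II: pick e.c with Z[e.c] = V[e.c], preferring self.last.
        with self.shared.lock:
            z = self.shared.z
            self._pad(len(z))
            candidates = [j for j in range(len(z)) if z[j] == self.v[j]]
            if self.last in candidates:
                c = self.last
            elif candidates:
                c = candidates[0]
            else:
                # Assumed behaviour: no usable component, so start a new chain.
                z.append(0)
                self.v.append(0)
                c = len(z) - 1
            self.v[c] += 1
            z[c] += 1
            self.last = c
            return tuple(self.v)


shared = SharedZ()
p1, p2 = DCCProcess(shared), DCCProcess(shared)
a = p1.relevant_event()   # (1,)
b = p2.relevant_event()   # (0, 1)
p2.receive(p1.send())     # V2 = max{(1), (0,1)} = (1, 1)
c = p2.relevant_event()   # (1, 2): p2 prefers the component it updated last
```

Timestamps of different lengths would be compared after padding the shorter one with zeros, in the same way `_pad` does.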

13 Problem
The number of processes can be much larger than the minimal number of chains.
[Figure: processes p1, p2, p3, p4 with timestamps (1), (0,1), (1,2), (0,1,1), (1,2,2), (0,1,1,1), (1,2,2,2)]

14 Optimal Chain Decomposition
Antichain: a set of pairwise concurrent elements.
Width: the maximum size of an antichain.
Dilworth’s Theorem [1950]: A poset of width k can be partitioned into k chains and no fewer.
Requires knowledge of the complete poset.
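To make the definitions of antichain and width concrete, here is a small brute-force sketch in Python. It is only meant for tiny posets. The function names and the four-element example poset are assumptions made for illustration and do not come from the slides.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(relations: Iterable[Pair]) -> Set[Pair]:
    # Repeatedly add (a, d) whenever (a, b) and (b, d) are present.
    less = set(relations)
    changed = True
    while changed:
        changed = False
        for a, b in list(less):
            for c, d in list(less):
                if b == c and (a, d) not in less:
                    less.add((a, d))
                    changed = True
    return less


def width(elements: List[str], relations: Iterable[Pair]) -> int:
    """Size of the largest antichain, found by exhaustive search."""
    less = transitive_closure(relations)

    def concurrent(x: str, y: str) -> bool:
        return (x, y) not in less and (y, x) not in less

    for size in range(len(elements), 0, -1):
        for subset in combinations(elements, size):
            if all(concurrent(x, y) for x, y in combinations(subset, 2)):
                return size
    return 0


# Hypothetical poset: a < c, b < c, b < d.
elements = ["a", "b", "c", "d"]
relations = [("a", "c"), ("b", "c"), ("b", "d")]
w = width(elements, relations)  # 2
```

For this poset the width is 2, and {a, c} and {b, d} form a partition into two chains, as Dilworth's theorem promises. The search needs every element and relation up front, which is exactly the limitation the slide points out.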

15 Online Chain Decomposition
Elements of the poset are presented in a total order consistent with the poset.
Elements are assigned to chains as they arrive.
Can be modeled as a game between:
- Bob: presents elements
- Alice: assigns them to chains
Felsner [1997]: For a poset of width k, Bob can force Alice to use k(k+1)/2 chains.

16 Chain Partitioning Algorithm (ACC)
Felsner gave an algorithm which meets the k(k+1)/2 bound.
Our algorithm is simpler and more efficient.
Sets of queues B_1 … B_k with |B_i| = i.
For an element z:
- Insert z into the first queue q in B_i with head < z
- Swap the queues in B_i and B_{i-1}, leaving q in its place
[Figure: queue sets B1, B2, B3 and an incoming element z]
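One possible reading of the queue rule above is sketched below in Python. This is an interpretation of the slide, not a verified copy of the authors' algorithm. It assumes that an empty queue also accepts z, that "head" means the most recently inserted element, and that the swap exchanges B_{i-1} with B_i minus q. The names `ChainPartitioner` and `less` are hypothetical.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, ...]


class ChainPartitioner:
    def __init__(self, k: int, less: Callable[[Event, Event], bool]) -> None:
        self.less = less
        # B[i] holds i + 1 queues, so the total is k(k+1)/2 queues.
        self.B: List[List[List[Event]]] = [
            [[] for _ in range(i)] for i in range(1, k + 1)
        ]

    def insert(self, z: Event) -> List[Event]:
        """Append z to a queue (chain) and return that queue."""
        for i, group in enumerate(self.B):
            for q in group:
                # q[-1] is the head: the latest element placed in q.
                if not q or self.less(q[-1], z):
                    q.append(z)
                    if i > 0:
                        # Swap B[i-1] with B[i] minus q; q stays in B[i].
                        rest = [r for r in group if r is not q]
                        self.B[i] = self.B[i - 1] + [q]
                        self.B[i - 1] = rest
                    return q
        raise ValueError("no queue accepts z; is the width larger than k?")

    def chains(self) -> List[List[Event]]:
        return [q for group in self.B for q in group if q]


def less(a: Event, b: Event) -> bool:
    # Componentwise order on vector timestamps, used as the poset order here.
    return all(x <= y for x, y in zip(a, b)) and a != b


# Width-2 example: two independent processes, events in a valid arrival order.
part = ChainPartitioner(2, less)
for ev in [(1, 0), (0, 1), (2, 0), (0, 2)]:
    part.insert(ev)
result = part.chains()
```

In this run every returned queue is a chain under `less`, and the number of non-empty queues stays within k(k+1)/2 = 3 for k = 2, although it is not the optimal 2.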

17 Drawback of DCC and ACC
Require a shared data structure:
- Monitoring applications generally need a central server
Hybrid clocks:
- Multiple servers, each responsible for a subset of processes
- Finds chains within a process group

18 Shared Memory System
Accesses to shared variables induce dependencies.
Observation: Access events for a shared variable form a chain.
Variable-based Chain Clocks (VCC):
- Associate a component with every variable

19 VCC Application: Predicate Detection
Predicate: (x = 1) and (y = 1)
Only events changing x and y are relevant.
Associate one component of the VCC with x and another with y.
Initially: x = 0, y = 0
[Figure: thread timelines with the assignments x = 0, x = 1, x = 2, x = 1, y = 1, y = 2]
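A variable-based chain clock for the x and y example might be sketched as follows. The slides only say that each variable gets its own component. The rest is an assumption made for illustration: that each variable stores the timestamp of its last access, that an accessing thread merges it as in Rule I, and that accesses are serialized by a per-variable lock. `SharedVariable`, `VCCThread`, and `precedes` are hypothetical names.

```python
import threading
from typing import Dict

Timestamp = Dict[str, int]  # component name (variable) -> count


class SharedVariable:
    def __init__(self, name: str, value: int = 0) -> None:
        self.name = name
        self.value = value
        self.v: Timestamp = {}        # timestamp of the last access (assumed)
        self.lock = threading.Lock()  # serializes accesses, so they form a chain


class VCCThread:
    def __init__(self) -> None:
        self.v: Timestamp = {}

    def write(self, var: SharedVariable, value: int) -> Timestamp:
        with var.lock:
            # Merge the previous access to var into this thread's clock.
            for name, count in var.v.items():
                self.v[name] = max(self.v.get(name, 0), count)
            # Relevant event: increment the component owned by var.
            self.v[var.name] = self.v.get(var.name, 0) + 1
            var.value = value
            var.v = dict(self.v)
            return dict(self.v)


def precedes(e_v: Timestamp, f_v: Timestamp) -> bool:
    # Componentwise comparison; a missing component counts as 0.
    return e_v != f_v and all(c <= f_v.get(n, 0) for n, c in e_v.items())


x, y = SharedVariable("x"), SharedVariable("y")
t1, t2 = VCCThread(), VCCThread()
e1 = t1.write(x, 1)   # {"x": 1}
e2 = t2.write(y, 1)   # {"y": 1}
e3 = t2.write(x, 2)   # {"y": 1, "x": 2}
```

Here e1 and e2 are concurrent, so a state with x = 1 and y = 1 is consistent with the recorded order, while e3 follows both. The timestamps have one entry per variable touched, independent of the number of threads.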

20 Outline: Motivation, Background, Chain Clock, Instances of Chain Clock, Experimental Results, Conclusion

21 Experiments
Setup:
- A multithreaded application
- Each thread generates a sequence of events
- Parameters: number of processes, number of events, probability of a relevant event
Metrics:
- Number of components used
- Execution time

22 Components Used
[Plot: events = 100, probability of relevant event = 1%]

23 Execution Time
[Plot: events = 100, probability of relevant event = 1%]

24 Effect of Relevancy
[Plot: threads = 100, events = 100]

25 Conclusion
Generalized vector clocks to a class of algorithms called Chain Clocks.
Dynamic Chain Clock (DCC) can provide tremendous speedup and reduce the memory requirement for applications.
Antichain-based Chain Clock (ACC) meets the lower bound for online chain decomposition.

26 Questions?

28 Example: Poset of width 2
For a poset of width 2, Bob can force Alice to use 3 chains (k(k+1)/2 with k = 2).

32 Happened Before Relation (→) [Lamport 78]
Distributed computation with N processes.
Every process executes a series of events:
- Internal, send or receive event
e → f if there is a path from e to f.
e ║ f if there is no path between e and f.
[Figure: timelines of processes p1 and p2]
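The path-based definition can be checked directly on a small computation. The following Python sketch builds the graph from per-process event sequences and message pairs, then tests reachability. The function names and the two-process example are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def build_graph(
    process_events: Sequence[Sequence[str]],
    messages: Sequence[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Edges: consecutive events on a process, plus send -> receive."""
    succ: Dict[str, List[str]] = {}
    for events in process_events:
        for a, b in zip(events, events[1:]):
            succ.setdefault(a, []).append(b)
    for send, recv in messages:
        succ.setdefault(send, []).append(recv)
    return succ


def happened_before(succ: Dict[str, List[str]], e: str, f: str) -> bool:
    # e -> f iff f is reachable from e along one or more edges.
    stack, seen = [e], set()
    while stack:
        x = stack.pop()
        for y in succ.get(x, ()):
            if y == f:
                return True
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def concurrent(succ: Dict[str, List[str]], e: str, f: str) -> bool:
    return not happened_before(succ, e, f) and not happened_before(succ, f, e)


# Hypothetical run: p1 executes a, b; p2 executes c, d; b sends a message received at d.
graph = build_graph([["a", "b"], ["c", "d"]], [("b", "d")])
```

In this run a → d holds through the message from b, while a and c are concurrent because no path connects them.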

33 Future work
Lower bound for online chain decomposition when a decomposition into N chains is already known.
Other chain decomposition strategies.

34 Distributed System: Time vs Threads
[Plot: events = 100, probability of relevant event = 1%]

35 Distributed System: Events vs Time
[Plot: threads = 100, probability of relevant event = 1%]

36 Effect of Number of Events
[Plot: threads = 100, probability of relevant event = 1%]

40 Example for DCC: is it appropriate? Is the content a bit too much for this amount? Where can I reduce it: remove VCC or ACC?
Chain clock:
- Generalizes vector clocks
- Reduces the time and memory overhead
- Elegantly handles dynamic process creation