Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems On Building Reliable Concurrent Systems Vijay.

Parallel and Distributed Systems Laboratory
Paradise: A Toolkit for Building Reliable Concurrent Systems
On Building Reliable Concurrent Systems
Vijay K. Garg
Professor, Department of ECE and CS
Director, PDSL
The University of Texas at Austin, Austin, TX 78712

2 Motivation: Reliable Software
• Multithreaded and distributed programs are prone to errors.
  – Causes: concurrency, nondeterminism, process and channel failures
• Techniques to ensure program correctness:
  – Before program development: model checking
  – During development: testing and debugging
  – After development: software fault tolerance

3 Paradise Approach
Key abstraction: global properties
• Model checking and verification: check global properties against a model of the program (Promela)
• Testing and debugging: global breakpoints, trace analysis
• Software fault tolerance: monitoring for global properties, controlled re-execution

4 Talk Outline  Motivation  Monitoring Distributed Systems – Clock : Tracking Dependency – Camera : Global Snapshot (Checkpoint) – Sensor : Detecting Global Properties – Slicer : Computation Slicing – Supervisor: Controlling Execution  Other Projects at PDSL

5 Paradise Environment
[Figure: the Program, Monitor, Slicer, and Predicate components, connected by Observe and Control arrows]

6 Trace Model: Total Order vs. Partial Order
• Total order: an interleaving of the events in a trace
• Partial order: Lamport's happened-before model
[Figure: a partial-order trace of P1 (events e1, e2) and P2 (events f1, f2) with critical sections CS1 and CS2, shown next to a successful total-order trace and a faulty one with respect to a mutual-exclusion specification on CS1 and CS2]
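
A small sketch of why the partial-order model matters, assuming a hypothetical run with no messages between P1 and P2, where e1 and f1 enter the critical sections and e2 and f2 leave them. One observed total order may pass a mutual-exclusion check, while the partial order stands for every interleaving, some of which fail. The event encoding and helper names are illustrative only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Event = Tuple[str, str]  # (process, action); action is "enter" or "exit"

def interleavings(p: Sequence[Event], q: Sequence[Event]) -> List[List[Event]]:
    """All total orders of p and q that preserve each process's own order."""
    n = len(p) + len(q)
    traces = []
    for slots in combinations(range(n), len(p)):
        p_slots = set(slots)
        pi, qi = iter(p), iter(q)
        traces.append([next(pi) if k in p_slots else next(qi) for k in range(n)])
    return traces

def violates_mutex(trace: Sequence[Event]) -> bool:
    """True if two processes are ever inside their critical sections together."""
    inside = set()
    for proc, action in trace:
        if action == "enter":
            inside.add(proc)
            if len(inside) > 1:
                return True
        else:
            inside.discard(proc)
    return False

p1 = [("P1", "enter"), ("P1", "exit")]  # e1, e2 (hypothetical)
p2 = [("P2", "enter"), ("P2", "exit")]  # f1, f2 (hypothetical)
traces = interleavings(p1, p2)
faulty = [t for t in traces if violates_mutex(t)]
print(len(traces), len(faulty))  # 6 4
```

Only the two interleavings that run one critical section entirely before the other succeed, so a test that happened to observe one of them would miss the bug.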

7 Tracking Dependency
Computation: a set of events ordered by the “happened-before” relation
• Problem: timestamp events so that we can answer
  – Did e happen before f?
  – Is e concurrent with f?

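8 Clocks in a Distributed System
Vector clocks [Fidge 89, Mattern 89]
Result: s happened before t iff the vector at s is less than the vector at t (no component is larger, and the vectors differ).
[Figure: vector timestamps of the events on each process. P1: (1,0,0), (2,1,0), (3,1,0); P2: (0,1,0), (0,2,0); P3: (0,0,1), (0,0,2), (2,1,3)]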
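
A minimal Python sketch of the vector clock rules behind this result, under the usual assumptions: each process increments its own entry on every event, a message carries the sender's vector, and a receive takes the componentwise maximum before incrementing. The class and function names are hypothetical; this is not code from Paradise.

```python
from typing import List, Sequence, Tuple

VC = Tuple[int, ...]

class VectorClock:
    """Vector clock of process `pid` in a system of `n` processes."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v: List[int] = [0] * n

    def local_event(self) -> VC:
        self.v[self.pid] += 1          # every event ticks the process's own entry
        return tuple(self.v)

    def send(self) -> VC:
        return self.local_event()      # the message carries this timestamp

    def receive(self, msg: Sequence[int]) -> VC:
        # Componentwise maximum with the incoming vector, then tick.
        self.v = [max(a, b) for a, b in zip(self.v, msg)]
        return self.local_event()

def happened_before(s: VC, t: VC) -> bool:
    """s -> t iff s <= t in every component and s != t."""
    return all(a <= b for a, b in zip(s, t)) and s != t

def concurrent(s: VC, t: VC) -> bool:
    return not happened_before(s, t) and not happened_before(t, s)

p1, p2 = VectorClock(0, 2), VectorClock(1, 2)
a = p1.local_event()   # (1, 0)
m = p1.send()          # (2, 0)
b = p2.local_event()   # (0, 1)
c = p2.receive(m)      # (2, 2)
print(happened_before(a, c), concurrent(a, b))  # True True
```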

9 Dynamic Chain Clocks  Problem with vector clocks: scalability, dynamic process structure  Idea: Computing the “chains” in an online fashion [Aggarwal and Garg PODC 05] for relevant events a f e b d c h g abcd e fgh A computation with 4 processes The relevant subcomputation P1P2P3P4P1P2P3P4

10 Experimental Results
Simulation of a computation in which 1% of the events are relevant. Measured:
– number of clock components vs. number of threads
– total time overhead vs. number of threads

11 Talk Outline  Motivation and Overview  Instrumentation – Clock : Tracking Dependency  Property Checking – Camera : Global Snapshot (Checkpoint) – Sensor : Detecting Global Properties – Slicer : Computation Slicing

12 Global Snapshot
Problem: compute a global snapshot (checkpoint) of the system, i.e., the state of the processes and of the channels
Motivation:
– Checkpointing for fault tolerance
– Distributed debugging
– Detecting stable predicates

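13 Key Difficulties: Taking Care of Messages
Two sites, A and B, hold $400 each (total $800). Site A sends a message carrying $100 to site B.
• Problem 1: inconsistent state
  – A checkpoints before the message is sent: $400
  – B receives the message before its checkpoint: $500
  – The recorded total is $900, so money appears out of nowhere.
• Problem 2: messages in transit
  – A checkpoints after the message is sent: $300
  – B checkpoints before the message is received: $400
  – The recorded total is $700 unless the $100 in the channel is recorded as well.
[Figure: processes P1, P2, P3 with two global cuts G1 and G2]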
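
Both problems can be checked mechanically. The sketch below assumes a cut is given as the number of events each process has executed, and that each message records the 1-based positions of its send and receive events. It tests consistency (no message is received inside the cut but sent outside it) and adds in-transit messages to the recorded total. All names are illustrative.

```python
from typing import Dict, List, NamedTuple, Sequence

class Message(NamedTuple):
    sender: int
    send_event: int   # 1-based index of the send event on the sender
    receiver: int
    recv_event: int   # 1-based index of the receive event on the receiver
    amount: int

def is_consistent(cut: Sequence[int], msgs: Sequence[Message]) -> bool:
    """cut[i] = number of events of process i inside the cut. Consistent iff
    every message received inside the cut was also sent inside it."""
    return all(m.send_event <= cut[m.sender]
               for m in msgs if m.recv_event <= cut[m.receiver])

def in_transit(cut: Sequence[int], msgs: Sequence[Message]) -> List[Message]:
    """Messages sent inside the cut but received outside it (channel state)."""
    return [m for m in msgs
            if m.send_event <= cut[m.sender] and m.recv_event > cut[m.receiver]]

def recorded_total(cut: Sequence[int], balances: Dict[int, List[int]],
                   msgs: Sequence[Message]) -> int:
    local = sum(balances[i][k] for i, k in enumerate(cut))
    return local + sum(m.amount for m in in_transit(cut, msgs))

# The two-site example: A (process 0) sends $100 to B (process 1).
# balances[i][k] = balance of process i after its first k events.
balances = {0: [400, 300], 1: [400, 500]}
msgs = [Message(sender=0, send_event=1, receiver=1, recv_event=1, amount=100)]

problem1 = (0, 1)   # A checkpoints before sending, B after receiving
problem2 = (1, 0)   # A checkpoints after sending, B before receiving
print(is_consistent(problem1, msgs), recorded_total(problem1, balances, msgs))  # False 900
print(is_consistent(problem2, msgs), recorded_total(problem2, balances, msgs))  # True 800
```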

14 Current Algorithms
Key idea: white/red processes and messages (a process is white before it takes its local snapshot and red after; a message takes its sender's color)
• A process must be red to act on a red message.
• Record white messages received by red processes.
• [Chandy and Lamport 85]: a "marker" message on every channel
• [Mattern 89, SBF+ 04]: the number of white messages sent on every channel
• O(N²) messages for a completely connected topology
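
A toy, single-threaded simulation of the marker rule of [Chandy and Lamport 85], assuming FIFO channels and the two-site bank example. A process records its state when it initiates or on the first marker it sees, sends a marker on every outgoing channel, and records each incoming channel until that channel's marker arrives. Class and method names are hypothetical, and deliveries are driven by hand rather than by a real network.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple, Union

MARKER = "MARKER"
Msg = Union[int, str]  # a dollar amount, or the MARKER

class Process:
    def __init__(self, pid: int, balance: int, peers: List[int]) -> None:
        self.pid, self.balance, self.peers = pid, balance, peers
        self.recorded: Optional[int] = None            # recorded local state
        self.open: Dict[int, List[int]] = {}           # channels still being recorded
        self.channel_state: Dict[int, List[int]] = {}  # finished channel recordings

class System:
    def __init__(self, balances: List[int]) -> None:
        n = len(balances)
        self.procs = [Process(i, b, [j for j in range(n) if j != i])
                      for i, b in enumerate(balances)]
        # One FIFO queue per directed channel (src, dst).
        self.chan: Dict[Tuple[int, int], Deque[Msg]] = {
            (i, j): deque() for i in range(n) for j in range(n) if i != j}

    def transfer(self, src: int, dst: int, amount: int) -> None:
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, pid: int) -> None:
        """Record local state, open all incoming channels, send markers."""
        p = self.procs[pid]
        p.recorded = p.balance
        p.open = {q: [] for q in p.peers}
        for q in p.peers:
            self.chan[(pid, q)].append(MARKER)

    def deliver(self, src: int, dst: int) -> None:
        """Deliver the oldest message on channel src -> dst."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded is None:
                self.record(dst)   # first marker: channel src -> dst recorded empty
            p.channel_state[src] = p.open.pop(src)
        else:
            p.balance += msg
            if src in p.open:      # already recorded, src's marker not yet seen
                p.open[src].append(msg)

    def snapshot_total(self) -> int:
        # Call only after every process has recorded and every channel is closed.
        return sum(p.recorded + sum(sum(v) for v in p.channel_state.values())
                   for p in self.procs)

s = System([400, 400])
s.transfer(0, 1, 100)   # A sends $100 to B; it is now in transit
s.record(1)             # B initiates: records $400, marker to A
s.deliver(1, 0)         # A gets the marker: records $300, marker to B
s.deliver(0, 1)         # B receives the $100 while channel A->B is open
s.deliver(0, 1)         # B receives A's marker: channel A->B closed
print(s.snapshot_total())  # 800
```

Because B initiates after A's $100 is already in the channel, the $100 is recovered as channel state and the snapshot totals $800 instead of the $700 of Problem 2.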

15 Our Algorithms
Joint work with Rahul Garg and Yogish Sabharwal, IBM IRL (ACM International Conference on Supercomputing 2006)
– Lower bound: Ω(N log w) messages
– Implementation: Blue Gene/L with MPI
– w: average number of messages in transit

Algorithm   | Number of messages | Message size
Grid based  | O(N^(3/2))         | O(N^(1/2) log w)
Tree based  | O(N log N log w)   | O(log w + log N)
Centralized | O(N log w)         | O(log w + log N)

16 Global Property Detection
Predicate: a global condition expressed using variables on processes
– e.g., more than one process is in the critical section; there is no token in the system
Problem: find a global state that satisfies the given predicate
[Figure: processes P1 and P2 with critical sections and two global states G1 and G2]

17 The Main Difficulty in the Partial-Order Model
Algorithm for a general predicate [Cooper and Marzullo 91]
Too many global states: a computation may contain as many as O(k^n) global states
– k: maximum number of events on a process
– n: number of processes
[Figure: lattice of the consistent global states of a computation with P1: e1, e2 and P2: f1, f2, from {⊥} to ⊤: {⊥}, {e1, ⊥}, {f1, ⊥}, {e1, f1, ⊥}, {e2, e1, ⊥}, {e2, e1, f1, ⊥}, {e1, f2, f1, ⊥}, {e2, e1, f2, f1, ⊥}]

18 Efficient Predicate Detection for Special Cases
• Stable predicates [Chandy and Lamport 85]: once the predicate becomes true, it stays true; e.g., deadlock
• Unstable predicates:
  – Observer-independent predicates [Charron-Bost et al 95]: the predicate occurs in one interleaving iff it occurs in all interleavings; e.g., any disjunction of local predicates
  – Linear predicates [Chase and Garg 95]: e.g., conjunctive predicates such as “there is no leader in the system”
  – Relational predicates: x1 + x2 + … + xn ≥ k [Chase and Garg 95]; e.g., violation of k-mutual exclusion

19 Linear Predicates
The set of consistent cuts that satisfy a linear predicate is closed under intersection [Chase and Garg 95].
Examples:
• Conjunctive predicates: critical1 and critical2
• Channel predicates: “all channels are empty”, “there are exactly k messages in the channel from process P to Q”
• Some relational predicates: x1 + x2 ≤ k, when each xi is monotonically non-decreasing
[Figure: two consistent cuts W and X that satisfy the predicate, and their intersection W ∩ X]
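
A brute-force check of the closure property on a tiny hypothetical computation: two processes with two events each and no messages, so every cut is consistent, and xi is taken to be the number of events Pi has executed (hence non-decreasing). The intersection of two cuts is their componentwise minimum. This only illustrates the definition; it is not a detection algorithm.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Cut = Tuple[int, ...]

def meet(g: Cut, h: Cut) -> Cut:
    """Intersection of two cuts: componentwise minimum."""
    return tuple(min(a, b) for a, b in zip(g, h))

def is_meet_closed(cuts: Sequence[Cut], pred: Callable[[Cut], bool]) -> bool:
    """True if the intersection of any two satisfying cuts also satisfies pred."""
    sat = [c for c in cuts if pred(c)]
    return all(pred(meet(g, h)) for g in sat for h in sat)

cuts = list(product(range(3), repeat=2))   # all 9 cuts; no messages, all consistent
def leq(c: Cut) -> bool: return c[0] + c[1] <= 2   # x1 + x2 <= 2
def geq(c: Cut) -> bool: return c[0] + c[1] >= 2   # x1 + x2 >= 2

print(is_meet_closed(cuts, leq), is_meet_closed(cuts, geq))  # True False
```

The second predicate fails because the cuts (2, 0) and (0, 2) satisfy it while their intersection (0, 0) does not.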

20 Conjunctive Predicates
A predicate that can be expressed as l1 ∧ l2 ∧ … ∧ ln, where li is local to Pi.
Detects errors that may be hidden in some runs due to race conditions.
Examples:
– Mutual exclusion problem: (P1 in CS) and (P2 in CS)
– Missing primary: (P1 is secondary) and (P2 is secondary) and (P3 is secondary)
Importance: sufficient for detecting any boolean expression of local predicates (write the expression in disjunctive normal form and detect each disjunct, which is conjunctive)

21 Conjunctive Predicates: Centralized Algorithm
(l1 ∧ l2 ∧ … ∧ ln) holds in some consistent global state iff there exist states si in Pi such that li is true in si, and si and sj are incomparable (concurrent) for all distinct i, j.

22 Algorithms for Conjunctive Predicates
• Centralized algorithm [Garg and Waldecker 92]
  – Each non-checker process maintains its local vector and sends the chain clock to the checker process whenever its local predicate is true, at most once in each message interval.
  – Time complexity: the checker requires at most O(n²m) comparisons.
• Token-based algorithm [Garg and Chase 95]
• Completely distributed algorithm [Garg and Chase 95]
• Keeping queues shorter [Chiou and Korfhage 95]
• Avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]
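
An offline sketch of the checker's elimination rule, assuming each process's queue holds, in order, the vector timestamps of the states in which its local predicate is true. If the head of queue i happened before the head of queue j, it also happened before every later candidate of Pj, so it can never be concurrent with one and is discarded. Detection succeeds when all heads are pairwise concurrent, which is the condition stated for the centralized algorithm. Names and data are hypothetical, and the messages to the checker are elided.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

VC = Tuple[int, ...]

def happened_before(s: VC, t: VC) -> bool:
    return all(a <= b for a, b in zip(s, t)) and s != t

def detect_conjunctive(queues: Sequence[Sequence[VC]]) -> Optional[List[VC]]:
    """Return pairwise-concurrent candidate states, one per process, or None."""
    qs = [deque(q) for q in queues]
    while all(qs):                       # every queue still has a candidate
        changed = False
        for i in range(len(qs)):
            for j in range(len(qs)):
                if i != j and qs[i] and qs[j] and happened_before(qs[i][0], qs[j][0]):
                    qs[i].popleft()      # head of i can never be part of a solution
                    changed = True
        if not changed:                  # all heads pairwise concurrent
            return [q[0] for q in qs]
    return None

# Hypothetical candidate states of P1 and P2.
p1_states = [(1, 0), (3, 1)]
p2_states = [(2, 1), (2, 2)]
print(detect_conjunctive([p1_states, p2_states]))  # [(3, 1), (2, 2)]
```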

23 Other Special Classes of Predicates
• Relational predicates
  – Let xi be the number of tokens at Pi.
  – Σ xi < k: loss of tokens
  – Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98]
  – Dilworth's partition [Tomlinson and Garg 96]

24 Relational Predicates
Let xi ≥ 0 be a variable at Pi. Predicates of the form Σ xi < k [Chase and Garg 95]
Algorithm: the consistent cut with the minimum value of Σ xi corresponds to a min cut in a flow graph; the predicate can hold iff this minimum is below k.

25 Predicate Detection in General
Explore the state space (all global states must be examined) without constructing the graph:
• breadth-first [Cooper and Marzullo 91]
• depth-first [Alagar and Venkatesan 94]
• lexical order [Garg 03]
[Figure: the lattice of consistent global states of the two-process computation (P1: e1, e2; P2: f1, f2), from {⊥} to {e2, e1, f2, f1, ⊥}]
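
A breadth-first sketch in the spirit of [Cooper and Marzullo 91], not their exact procedure: start from the initial cut, advance one process by one event at a time, keep only consistent cuts, and stop at the first cut satisfying the predicate. The example encodes the two-process computation with one assumed message from e1 to f2, which yields the eight consistent global states listed for that computation. Function names are hypothetical.

```python
from collections import deque
from typing import Callable, Optional, Sequence, Set, Tuple

Cut = Tuple[int, ...]            # cut[i] = events of process i included
Msg = Tuple[int, int, int, int]  # (sender, send_event, receiver, recv_event), 1-based

def consistent(cut: Cut, msgs: Sequence[Msg]) -> bool:
    """No message is received inside the cut but sent outside it."""
    return all(s_ev <= cut[s] for s, s_ev, r, r_ev in msgs if r_ev <= cut[r])

def bfs_detect(num_events: Sequence[int], msgs: Sequence[Msg],
               pred: Callable[[Cut], bool]) -> Tuple[Optional[Cut], int]:
    """Return (first satisfying cut or None, number of cuts generated)."""
    start: Cut = tuple(0 for _ in num_events)
    seen: Set[Cut] = {start}
    frontier = deque([start])
    while frontier:
        cut = frontier.popleft()
        if pred(cut):
            return cut, len(seen)
        for i in range(len(cut)):
            if cut[i] < num_events[i]:
                nxt = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
                if nxt not in seen and consistent(nxt, msgs):
                    seen.add(nxt)
                    frontier.append(nxt)
    return None, len(seen)

# P1: e1, e2 and P2: f1, f2, with an assumed message from e1 to f2.
msgs = [(0, 1, 1, 2)]
print(bfs_detect([2, 2], msgs, lambda c: False))           # (None, 8)
print(bfs_detect([2, 2], msgs, lambda c: c == (2, 1))[0])  # (2, 1)
```

The exhaustive run visits every consistent cut once, which is exactly the cost the special-case algorithms avoid.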

26 Talk Outline  Motivation and Overview  Instrumentation – Clock : Tracking Dependency  Property Checking – Camera : Global Snapshot (Checkpoint) – Sensor : Detecting Global Properties – Slicer : Computation Slicing – Supervisor: Controlling Execution

27 The Main Idea of Computation Slicing
[Figure: slicing reduces a partial-order trace, which suffers from state explosion, to a slice that keeps all the red (highlighted) global states]

28 How Does Computation Slicing Help?
To check b1 ∧ b2 on a partial-order trace:
• Slice the trace for b1; the slice retains all global states that satisfy b1.
• Check b2 on the slice.

29 Example  Detect predicate (x*y + z < 5) Λ (x ≥1) Λ (z ≤ 3) P1P1 P2P2 P3P3 x y z a 1 b 2 c d 0 e 0 f 2 g 1 h 3 u 4 v 1 w 2 x 4 {a,e,f,u,v} {b} {w}{g} Computation Slice with respect to (x ≥1) Λ (z ≤3)

30 Slice
Slice: a sub-trace such that
• it contains all consistent cuts of the trace that satisfy the given predicate, and
• it contains the least number of consistent cuts among all such sub-traces
[Garg and Mittal 01, Mittal and Garg 01]
[Figure: the Slicer takes a trace and a predicate and outputs the slice]

31 Results
Efficient polynomial-time algorithms for computing the slice for:
– Linear predicates [Garg and Mittal 01]: time complexity O(n²m)
– General predicates. Theorem: given a computation, if a predicate b can be detected efficiently, then the slice for b can also be computed efficiently [Mittal, Sen and Garg 03]
– Combining slices: Boolean operators
– Temporal logic operators: EF, AG, EG
– Approximate slice: for arbitrary boolean expressions
n: number of processes, m: number of events

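32 POTA Architecture [Sen and Garg 04]
[Figure: POTA architecture. The Instrumentor produces an instrumented program; executing it yields a trace, which the Slicer reduces to a slice and the Predicate Detector (Analyzer) checks against the predicate (specification), answering yes/witness or no/counterexample. Alternatively, the Translator converts the program to Promela and SPIN is executed on it, answering yes or no/counterexample.]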

33 Experiments: Dining Philosophers Trace Verification
• POTA: Partial Order Trace Analyzer, based on slicing [Sen and Garg 03]
• SPIN: a widely used model checking tool [Holzmann 97]
Predicate: two neighboring dining philosophers do not eat concurrently
– SPIN: 250 seconds for n = 6; runs out of memory for n > 6
– POTA: can handle n = 200, using 400 seconds

34 Supervisor: Motivation for Control
• Maintain global invariants or the proper order of events
  – Examples from distributed debugging: ensure that busy1 ∨ busy2 is always true; ensure that m1 is delivered before m2
• Fault tolerance: on a fault, roll back and execute under control

35 Rollback Recovery for Software Faults
Re-execution problem: re-execute so as to avoid a recurrence of a previously detected failure
– Progressive retry [Wang et al 97]
– Controlled re-execution [Tarafdar and Garg 98]
[Figure: processes P1, P2, P3 rolled back from a faulty state to a restored state]

36 Controlled Re-execution
Add the synchronization necessary to maintain a safety property
– e.g., mutual exclusion
[Figure: critical sections on P1 and P2 with events a, b, c, d and a global state G1]
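
A small check of what one added synchronization arrow achieves, assuming a hypothetical layout in which each process's first event enters its critical section and its second event leaves it. An arrow from P1's exit to P2's entry is treated like a message: any cut containing P2's entry must also contain P1's exit. This verifies the effect of a given arrow; it does not compute which arrows to add, which is what the controlled re-execution algorithms do.

```python
from itertools import product
from typing import List, Sequence, Tuple

Cut = Tuple[int, ...]
Arrow = Tuple[int, int, int, int]  # (from_proc, from_event, to_proc, to_event), 1-based

def consistent(cut: Cut, arrows: Sequence[Arrow]) -> bool:
    """A cut that contains an arrow's head must also contain its tail."""
    return all(s_ev <= cut[s] for s, s_ev, r, r_ev in arrows if r_ev <= cut[r])

def in_cs(k: int) -> bool:
    # Hypothetical layout: event 1 enters the CS and event 2 leaves it,
    # so a process is inside after exactly one event.
    return k == 1

def mutex_violations(arrows: Sequence[Arrow]) -> List[Cut]:
    """Consistent cuts in which both processes are in their critical sections."""
    return [c for c in product(range(3), repeat=2)
            if consistent(c, arrows) and in_cs(c[0]) and in_cs(c[1])]

print(mutex_violations([]))              # [(1, 1)]  uncontrolled run
print(mutex_violations([(0, 2, 1, 1)]))  # []  P1 leaves its CS before P2 enters
```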

37 Results
Efficient algorithms for computing the synchronization for:
– Locks [Tarafdar and Garg, DISC 98]: O(nm) algorithm for various types of locks
– Disjunctive predicates [Mittal and Garg 00]: e.g., (n-1)-mutual exclusion; time complexity O(m²); minimizes the number of synchronization arrows
– Region predicates [Mittal and Garg, PODC 00]: e.g., the virtual clocks of processes are “approximately” synchronized; time complexity O(nm²); maximizes the concurrency in the controlled computation
n: number of processes, m: number of events

38 Conclusions  Efficient algorithms possible for monitoring global properties  Observation and Control: a powerful abstraction  Current execution engines are designed for performance rather than fault-tolerance

39 Other Research Projects
• Distributed simulation: GVT algorithms, fault tolerance
• Recovery schemes: optimistic message logging, fault tolerance without replication
• Model checking: partial-order methods
• Formal methods: Petri nets, lattice theory, max-plus algebra

40 Questions ??