On the Complexity of Buffer Allocation in Message Passing Systems

Alex Brodsky, University of British Columbia. Joint work with Jan B. Pedersen & Alan Wagner.

Outline: Motivation; Definitions; Buffer Allocation Problem; Buffer Sufficiency Problem; Nonblocking Buffer Allocation Problem; Other Models and Related Problems; Conclusion.

Motivation

What is the Problem? Two processes each call send (send(p2,...) and send(p1,...)) before calling recv (recv(p2,...)). Unless there is somewhere to put the message, the senders will deadlock. So, buffers are used.

What is the Problem? (Figure: a longer exchange using send(p2,...), send(p1,...), send(p2,...), recv(p2,...), and recv(p1,...).)

Problem Statement. Not all systems have an unlimited number of buffers, e.g., clusters that offload message-passing functionality to the network interface card (NIC). Hence, we must determine the number of buffers needed for a safe program execution. This is the Buffer Allocation Problem (BAP). Question: What is the complexity of BAP?

Assumptions. Processes are asynchronous. The communication pattern is static, i.e., it doesn't change from execution to execution. Send/recv calls are explicitly matched (e.g., send(p2,...) on p1 is matched with recv(p1,...) on p2). Buffers are allocated on the receiver. Sends block if no buffers are available and the receiver is not ready.

Problem Input. What is invariant across executions of a program? The static communication pattern. We use communication graphs to represent communication patterns, so the communication graph becomes the problem input.

Communication Graph. Processes (P0 to P5 in the figure) are denoted by vertical process arcs, with time running from top to bottom. Events (start, end, send, receive) are denoted by vertices. Communication arcs denote sends from one process to another. (Figure: examples of communication graphs.)
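The graph described above can be built mechanically from per-process event lists. The sketch below is illustrative only: it assumes a hypothetical encoding in which each process is a list of ("send", destination) and ("recv", source) events, and that the k-th send from p to q is matched with the k-th receive at q from p. The name communication_graph is not from the slides.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]   # ("send", destination) or ("recv", source)
Vertex = Tuple[int, int]  # (process, position); position 0 is start, the last is end


def communication_graph(prog: List[List[Event]]):
    """Return (vertices, process arcs, communication arcs) for a static pattern.

    Assumes the k-th send from p to q is matched with the k-th receive at q from p.
    """
    vertices: List[Vertex] = []
    process_arcs: List[Tuple[Vertex, Vertex]] = []
    sends: Dict[Tuple[int, int, int], Vertex] = {}
    recvs: Dict[Tuple[int, int, int], Vertex] = {}
    for p, events in enumerate(prog):
        last = len(events) + 1  # position of the end vertex of process p
        vertices.extend((p, i) for i in range(last + 1))
        # Process arcs run down the process: start -> events -> end.
        process_arcs.extend(((p, i), (p, i + 1)) for i in range(last))
        seen: Dict[Event, int] = {}
        for i, (kind, peer) in enumerate(events, start=1):
            k = seen.get((kind, peer), 0)  # identical events seen earlier
            seen[(kind, peer)] = k + 1
            if kind == "send":
                sends[(p, peer, k)] = (p, i)
            else:
                recvs[(peer, p, k)] = (p, i)
    # A communication arc joins each send vertex to its matching receive vertex.
    comm_arcs = [(sends[key], recvs[key]) for key in sends if key in recvs]
    return vertices, process_arcs, comm_arcs
```

For a 2-ring (each process sends to the other, then receives), this yields 8 vertices, 6 process arcs and 2 communication arcs.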

Arrival != Receipt. A receive event occurs when the message is received, but messages can arrive before their receive events. (Figure: the arrival interval in a graph over P0 to P3.)

Dependencies. All events depend on start events. Receive events depend on send events. If there are NO buffers, send events also depend on receive events. A send event depends on the preceding event in its process. The arrival interval is defined by a receive event and its dependency within the same process.

Circular Dependency & Deadlock. With no buffers, the send/receive events depend on each other. A circular dependency (with no buffers) represents deadlock.

The t-ring. We call such a circular dependency among t processes a t-ring; the figure shows a 6-ring over P0 to P5.
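With no buffers, a matched send and receive depend on each other, so one way to test for deadlock is to merge each matched pair into a single node, keep the program-order arcs, and look for a cycle. The sketch below does this with Kahn's topological sort. It assumes a hypothetical encoding (each process is a list of ("send", destination)/("recv", source) events) and first-to-first matching per channel; the helper names t_ring, pair_id and deadlocks_without_buffers are illustrative, not from the slides.

```python
from collections import defaultdict, deque
from typing import List, Tuple

Event = Tuple[str, int]  # ("send", destination) or ("recv", source)


def t_ring(t: int) -> List[List[Event]]:
    """Process i sends to i+1, then receives from i-1 (indices mod t)."""
    return [[("send", (i + 1) % t), ("recv", (i - 1) % t)] for i in range(t)]


def pair_id(prog: List[List[Event]], p: int, i: int) -> Tuple[int, int, int]:
    """Name an event by (sender, receiver, sequence number).

    The k-th send p->q and the k-th receive at q from p get the same name, so a
    matched pair becomes one node: with no buffers they must happen together.
    """
    kind, peer = prog[p][i]
    k = sum(1 for e in prog[p][:i] if e == (kind, peer))
    return (p, peer, k) if kind == "send" else (peer, p, k)


def deadlocks_without_buffers(prog: List[List[Event]]) -> bool:
    succ = defaultdict(set)
    nodes = set()
    for p, events in enumerate(prog):
        ids = [pair_id(prog, p, i) for i in range(len(events))]
        nodes.update(ids)
        for a, b in zip(ids, ids[1:]):  # program order within process p
            succ[a].add(b)
    indeg = {v: 0 for v in nodes}
    for a in list(succ):
        for b in succ[a]:
            indeg[b] += 1
    # Kahn's algorithm: if some nodes are never released, the graph has a cycle.
    queue = deque(v for v in nodes if indeg[v] == 0)
    released = 0
    while queue:
        v = queue.popleft()
        released += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return released < len(nodes)
```

Under these assumptions, a t-ring of any size reports a deadlock, while a pair in which one process receives before it sends does not.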

Solving Deadlock. To solve deadlock, we use buffers.

Buffer Assignment. Each process is assigned 0 or more buffers.

Solving Deadlock (P0 and P1, one buffer). Initially, neither process can complete a send. The message from process 0 is buffered by process 1. Process 0 can then proceed to receive from process 1. Finally, process 1 receives from process 0.

Solving Deadlock (P2 to P5). Initially, none of the sends can complete. Since message arrival is nondeterministic, 2 buffers are needed. Process P4 can then complete its receives, followed by process P3.

Safety. A program is k-safe if k buffers are sufficient to guarantee deadlock-free execution.

Buffer Allocation Problem (BAP). Informal question: How many buffers does a program need to avoid deadlock? Formal question: Given a communication graph, how many buffers are needed to avoid deadlock in the corresponding program? Decision question (BAP): Given a communication graph and an integer k, is the corresponding program k-safe?
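On small examples, k-safety can be checked by exhaustively exploring every interleaving of a program. The sketch below assumes a hypothetical model: each process is a list of ("send", destination)/("recv", source) events, matched first-to-first per channel; a send completes by rendezvous if the receiver is blocked on the matching receive, and otherwise only if the receiver has a free buffer. is_safe checks one fixed buffer assignment, and k_safe_allocation answers the BAP decision question by trying every placement of exactly k buffers. Both searches are exponential, as one would expect given the hardness results. STEALING is an invented program, not one from the slides.

```python
from itertools import product
from typing import Iterator, List, Optional, Tuple

Event = Tuple[str, int]  # ("send", destination) or ("recv", source)
State = Tuple[int, ...]  # index of the next event of every process


def t_ring(t: int) -> List[List[Event]]:
    """Process i sends to i+1, then receives from i-1 (indices mod t)."""
    return [[("send", (i + 1) % t), ("recv", (i - 1) % t)] for i in range(t)]


def _completed(events: List[Event], pc: int, event: Event) -> int:
    """How many copies of `event` a process finished before position pc."""
    return sum(1 for e in events[:pc] if e == event)


def successors(prog, buffers, pcs: State) -> Iterator[State]:
    n = len(prog)

    def sent(p, q):      # messages p has sent to q so far
        return _completed(prog[p], pcs[p], ("send", q))

    def received(q, p):  # messages q has received from p so far
        return _completed(prog[q], pcs[q], ("recv", p))

    def in_use(q):       # arrived at q but not yet received: each holds a buffer
        return sum(sent(p, q) - received(q, p) for p in range(n) if p != q)

    for p in range(n):
        if pcs[p] == len(prog[p]):
            continue  # process p has finished
        kind, q = prog[p][pcs[p]]
        step = list(pcs)
        step[p] += 1
        if kind == "recv":
            if sent(q, p) > received(p, q):  # the matching message is buffered
                yield tuple(step)
            continue
        # Rendezvous: q is blocked on the matching receive and nothing is queued.
        if (pcs[q] < len(prog[q]) and prog[q][pcs[q]] == ("recv", p)
                and received(q, p) == sent(p, q)):
            both = list(step)
            both[q] += 1
            yield tuple(both)
        # Or the message takes a free buffer at the receiver, if one is left.
        if in_use(q) < buffers[q]:
            yield tuple(step)


def is_safe(prog, buffers) -> bool:
    """Explore every interleaving; unsafe if a stuck, unfinished state is reachable."""
    start: State = tuple(0 for _ in prog)
    done: State = tuple(len(events) for events in prog)
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        nexts = list(successors(prog, buffers, state))
        if not nexts and state != done:
            return False  # reachable deadlock
        for s in nexts:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return True


def k_safe_allocation(prog, k: int) -> Optional[Tuple[int, ...]]:
    """Brute-force BAP: try every way of handing exactly k buffers to the processes."""
    for alloc in product(range(k + 1), repeat=len(prog)):
        if sum(alloc) == k and is_safe(prog, list(alloc)):
            return alloc
    return None


# Hypothetical buffer-stealing program: P0 waits for P3 while P1 and P2 both send to P0.
STEALING = [
    [("recv", 3), ("recv", 1), ("recv", 2)],  # P0
    [("send", 0), ("send", 3)],               # P1
    [("send", 0)],                            # P2
    [("recv", 1), ("send", 0)],               # P3
]
```

With these definitions, a 2-ring deadlocks with no buffers and completes with one buffer on either process. k_safe_allocation(STEALING, 1) returns None, while k_safe_allocation(STEALING, 2) returns (2, 0, 0, 0). With a single buffer at P0, P2's message can arrive first and steal it. That blocks P1's send to P0, so P3 never receives from P1 and P0 waits forever for P3.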

Theorem: BAP is NP-hard. Proof by reduction from 3SAT. 3SAT decision problem: does a formula of the form ∧_i (a_i ∨ b_i ∨ c_i) over n variables, where each a_i, b_i, c_i is either a variable x_j or its negation, have a satisfying assignment? Idea: for any 3SAT formula, we construct a corresponding communication graph and test it for n-safety (n buffers). The construction uses 2 widgets: one fixes an assignment and one checks the clauses.

The Construction. For a formula over n variables, create a graph with 2n processes, one for each literal (x0, ~x0, ..., x3, ~x3 in the figure).

Fixing the assignment. The 2-rings, one between each pair x_j and ~x_j, are used to fix a variable assignment.

Fixing the assignment. A buffer assignment fixes the variables, e.g., ~x0, x1, x2, ~x3. No more than n buffers may be selected, since we are testing for n-safety; because every 2-ring needs a buffer, each pair receives exactly one.

Use a 3-ring for each clause. Each clause, e.g., (x0 ∨ x1 ∨ x3) or (x0 ∨ ~x2 ∨ x3), is represented by a 3-ring over its literal processes, which will not deadlock only if one of those processes has a buffer.

Unsatisfied Clauses and 3-rings. With the assignment ~x0, x1, x2, ~x3, the first 3-ring, for (x0 ∨ x1 ∨ x3), does not deadlock: it corresponds to a satisfied clause.

Unsatisfied Clauses and 3-rings. For this buffer assignment, the second 3-ring, for (x0 ∨ ~x2 ∨ x3), will deadlock. The program is n-safe if none of the 3-rings deadlock.

A Better Buffer Assignment. Buffer assignments of size n that prevent deadlock correspond to satisfying assignments for the formula.

Thus, BAP is NP-hard!
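The correspondence between buffer placements and truth assignments can be mimicked at the level of abstraction the slides use. This sketch does not build the actual communication graph. It assumes that each 2-ring between x_j and ~x_j receives exactly one of the n buffers (on the literal that is set true), and that a clause 3-ring deadlocks exactly when none of its literal processes holds a buffer. Under those assumptions, searching for an n-buffer placement with no deadlocking 3-ring is a brute-force 3SAT search. All names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Literal = Tuple[int, bool]  # (j, True) is process x_j, (j, False) is process ~x_j
Clause = Tuple[Literal, ...]


def buffered_processes(assignment: Tuple[bool, ...]) -> Set[Literal]:
    """One buffer per 2-ring, placed on the literal process that is made true."""
    return {(j, value) for j, value in enumerate(assignment)}


def ring_deadlocks(clause: Clause, buffered: Set[Literal]) -> bool:
    """A clause 3-ring deadlocks if none of its processes holds a buffer."""
    return not any(lit in buffered for lit in clause)


def n_safe_placement(n: int, clauses: List[Clause]) -> Optional[Tuple[bool, ...]]:
    """Try all 2^n placements of n buffers; return one where no 3-ring deadlocks."""
    for assignment in product([False, True], repeat=n):
        buffered = buffered_processes(assignment)
        if not any(ring_deadlocks(c, buffered) for c in clauses):
            return assignment
    return None
```

For the clauses (x0 ∨ x1 ∨ x3) and (x0 ∨ ~x2 ∨ x3), the placement ~x0, x1, x2, ~x3 leaves the second 3-ring deadlocked, and the search returns the first safe placement it finds, (False, False, False, True).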

Buffer Sufficiency Problem (BSP). How about something easier? To solve BAP we need to verify that a buffer assignment is safe, and this is also HARD! Decision problem: Given a communication graph and a buffer assignment, does the buffer assignment yield a safe execution?

Theorem: BSP is coNP-complete. Proof by reduction from Tautology: given a DNF formula, is it true for all assignments? Construct a communication graph that corresponds to the given formula. Play a colouring game on the graph that simulates the execution of the program. There exists a simulation that deadlocks iff the formula is not a tautology.

Buffer Stealing. P2 is blocked on a receive and has 1 buffer, while P3 and P4 both send to it. Which process blocks, P3 or P4? Either execution is possible: the process whose message arrives first steals the buffer.

Fixing an Assignment. A buffer stealing widget (P2 with 1 buffer, together with processes xi and ~xi) is used to fix an assignment. Fixing an assignment corresponds to an execution.

Terms of a DNF. Processes T0 to T5, one per term, each have 1 buffer. When does this system deadlock? Message arrival corresponds to a term being false. If message arrivals steal all the buffers, the t-ring will deadlock.

Sketch of proof. A buffer stealing widget forces an execution that corresponds to a variable assignment. The simulation deadlocks on the sum widget, which also uses the buffer stealing mechanism, only if the formula is false on that assignment. Hence, a simulation can deadlock iff the corresponding formula is not a tautology.
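The Tautology side of the reduction can be illustrated by a brute-force search for an assignment that makes every term of a DNF formula false. In the reduction, such an assignment corresponds to a simulation that deadlocks. This is a sketch under a hypothetical encoding (a term is a tuple of (variable index, required value) literals), not the colouring game used in the proof.

```python
from itertools import product
from typing import List, Optional, Tuple

Literal = Tuple[int, bool]  # (j, True) means x_j, (j, False) means ~x_j
Term = Tuple[Literal, ...]  # a conjunction of literals


def term_true(term: Term, assignment: Tuple[bool, ...]) -> bool:
    return all(assignment[j] == value for j, value in term)


def falsifying_assignment(n: int, terms: List[Term]) -> Optional[Tuple[bool, ...]]:
    """Return an assignment that makes every term false, or None for a tautology."""
    for assignment in product([False, True], repeat=n):
        if not any(term_true(t, assignment) for t in terms):
            return assignment  # in the reduction: a deadlocking simulation
    return None
```

For example, (x0 ∧ x1) ∨ ~x0 ∨ ~x1 is a tautology, so no falsifying assignment exists, while x0 ∧ x1 alone is falsified by (False, False).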

How does this help us? While these results are interesting, they don't help us solve our problem: determining how many buffers our program needs. Suppose we add an additional restriction: the program should not block (or deadlock) due to a lack of buffers. Amazingly, this makes our problem tractable!

The Nonblocking Buffer Allocation Problem (NBAP). Informal question: How many buffers does a program need to execute without blocking on a send? Decision question: Given a communication graph and an integer k, are all executions of the corresponding program free of send blocking with k buffers? Upper bound: a process never needs more buffers than it has receives.

When is a buffer needed? A buffer is needed only during the arrival interval.

How Long is the Interval? The interval extends back to the preceding dependency in the same process.

Each interval requires a buffer. The figure's examples: 2 buffers with no overlap; 3 buffers with 1 overlap; 5 buffers with 3 overlaps of size 2 and 1 overlap of size 3.

The Algorithm. Compute the maximum per-process overlap (3 and 2 in the figure). Then sum the per-process buffers; in the figure's example the total is 5 = 3 + 2.
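The two steps of the algorithm above (maximum overlap per process, then a sum over processes) can be done with a standard sweep over interval endpoints. This sketch assumes each arrival interval is given as a half-open pair [start, end) of event positions in the receiving process, with the buffer freed at the receive `end`. The function names are hypothetical, and the intervals in the usage example are invented so that the per-process maxima come out to 3 and 2, matching the totals on the slide.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # [start, end): after event `start`, until receive `end`


def max_overlap(intervals: List[Interval]) -> int:
    """Largest number of one process's arrival intervals that are live at once."""
    points = []
    for start, end in intervals:
        points.append((start, 1))  # a message may arrive: one more buffer in use
        points.append((end, -1))   # the receive frees the buffer
    points.sort()  # at equal positions -1 sorts first, so a freed buffer is reused
    live = best = 0
    for _, delta in points:
        live += delta
        best = max(best, live)
    return best


def nbap_total(per_process: List[List[Interval]]) -> Tuple[List[int], int]:
    """Per-process buffer counts and their sum."""
    counts = [max_overlap(iv) for iv in per_process]
    return counts, sum(counts)
```

For example, nbap_total([[], [], [(0, 4), (1, 3), (2, 5)], [(0, 2), (1, 3), (3, 4)], [], []]) returns ([0, 0, 3, 2, 0, 0], 5). Sorting the endpoints dominates the cost, giving O(m log m) for m intervals.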

Implementation. First, detect dependencies to minimize arrival intervals, using depth-first search and dynamic programming. Second, compute the maximum overlap for each process: sort the arrival intervals of each process and find the maximum overlap. Total time: O(|V|n + |V| log |V|).
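One way to carry out the first implementation step is a memoized depth-first traversal that records, for every vertex, the latest event of each process it depends on (a vector-clock-style dynamic program). If n counts processes, this costs O(|V| n), which is consistent with the first term of the stated running time, although the slides do not say this is how it was implemented. The sketch assumes the nonblocking model, so sends never wait on receives and the only dependencies are program order and send to matching receive. It also assumes a hypothetical event-list encoding, first-to-first matching per channel, and an acyclic dependency graph.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]   # ("send", destination) or ("recv", source)
Vertex = Tuple[int, int]  # (process, position); position 0 is the start event


def arrival_intervals(prog: List[List[Event]]) -> List[List[Tuple[int, int]]]:
    """For each receive, the (start, end) positions in its process where the message may arrive."""
    n = len(prog)
    sends: Dict[Tuple[int, int, int], Vertex] = {}
    recvs: Dict[Tuple[int, int, int], Vertex] = {}
    for p, events in enumerate(prog):
        seen: Dict[Event, int] = {}
        for i, (kind, peer) in enumerate(events, start=1):
            k = seen.get((kind, peer), 0)  # k-th send p->q matches k-th recv at q from p
            seen[(kind, peer)] = k + 1
            if kind == "send":
                sends[(p, peer, k)] = (p, i)
            else:
                recvs[(peer, p, k)] = (p, i)
    match = {recvs[key]: sends[key] for key in recvs}  # receive vertex -> send vertex

    clocks: Dict[Vertex, List[int]] = {}

    def clock(p: int, i: int) -> List[int]:
        """clock(p, i)[q] is the latest event of process q that (p, i) depends on."""
        if (p, i) not in clocks:
            c = [0] * n if i == 0 else list(clock(p, i - 1))
            if (p, i) in match:  # a receive also depends on its matching send
                c = [max(a, b) for a, b in zip(c, clock(*match[(p, i)]))]
            c[p] = i
            clocks[(p, i)] = c
        return clocks[(p, i)]

    intervals: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for (q, r), (p, i) in match.items():
        # The message cannot arrive before the last event of q that its send depends on.
        intervals[q].append((clock(p, i)[q], r))
    return intervals
```

Suppose P0 sends and then receives, while P1 receives and then sends. Then P0's interval is (1, 2), because P1's reply cannot arrive before P0's send has completed. In a 2-ring, both intervals start at 0, so each process needs one buffer.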

Other Models. We considered a model where buffers are allocated on the receive side. Other models include send-side buffers and send/recv-side buffers.

Results. For these models we have the following results (coNP-C = coNP-complete):

Problem   Recv side   Send side   Send/Recv side
BAP       NP-hard     NP-hard     NP-hard
BSP       coNP-C      P           coNP-C
NBAP      P           P           NP-hard

Conclusions. Solving the buffer allocation problem for programs with static communication patterns and simple communication primitives is NP-hard. Even verifying a solution to the buffer allocation problem is hard (coNP-complete). Fortunately, if programs are required to be block-free as well as deadlock-free, the problem becomes tractable!

Thank you