Deadlock Detection for Distributed Process Networks


Alex G. Olson & Brian L. Evans
The University of Texas at Austin
ICASSP 2005

Motivation for Formal Models
- Applications may require higher input/output and computational rates than one CPU can handle
- Exploit parallelism for high performance
  - Parallel (one machine) or distributed (many machines)
- Pitfalls of parallel/distributed programming
  - Synchronization, shared memory, and deadlock
  - Debugging concurrent code on many processors
- Formal models have provable properties
  - Determinacy: programs are correct by construction
  - Validation: only debug each component separately
  - Scalability: faster execution with more CPUs

Applications

Application                              Input Data Rate  Computation Rate  Output Data Rate
Sonar Beamforming [Allen & Evans 00]     160 MB/s         4-20 GFLOPS       72 MB/s
Bzip2 (block-zip) Compression            1-4 MB/s         ~1-4 GIPS
MPEG4 Encoding (4CIF)                    18 MB/s          ~2 GIPS           ~1 MB/s
H.264 Video Server (QCIF) [Banerjee 02]  1 MB/s           ~1 GIPS           ~40 KB/s
Design Space Exploration [Vissers & Wolf, 1999]
Image Processing [Webb et al., 1999]

Process Networks [Kahn, 1974]
- Concurrently executing processes
  - Communicate only over one-way unbounded channels (FIFO queues)
  - Read one input port at a time
  - Node execution suspended until enough data available
  - Data that has been read is dequeued from channel
- Samples (tokens) flow along arcs
  - Samples have value but not time information
  - Flow of (untimed) data drives computation
- Determinate execution
  - Any scheduling algorithm that obeys above rules will produce same history of tokens on arcs
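
The process network rules above can be illustrated with a small sketch. This is not the authors' C++ framework: it uses Python threads and `queue.Queue` as unbounded FIFO channels, and the names `producer`, `scale`, `add`, and `run_network`, as well as the `None` end-of-stream marker, are assumptions made for the example. Every read is a blocking `get()` on exactly one channel, so the token history on the output arc should not depend on how the threads are scheduled.

```python
import queue
import threading

END = None  # end-of-stream marker (an assumption of this sketch, not part of the PN model)

def producer(out, n):
    """Source process: emits tokens 0..n-1, then the end marker."""
    for i in range(n):
        out.put(i)
    out.put(END)

def scale(inp, out, k):
    """Multiplies every input token by k."""
    while True:
        tok = inp.get()          # blocking read: suspended until a token arrives
        if tok is END:
            out.put(END)
            return
        out.put(k * tok)

def add(in_a, in_b, out):
    """Adds tokens pairwise, reading one input port at a time."""
    while True:
        a = in_a.get()
        b = in_b.get()
        if a is END or b is END:
            out.put(END)
            return
        out.put(a + b)

def run_network(n):
    """Builds the network, runs it to completion, returns the output history."""
    src1, src2, scaled, result = (queue.Queue() for _ in range(4))  # unbounded FIFOs
    procs = [
        threading.Thread(target=producer, args=(src1, n)),
        threading.Thread(target=producer, args=(src2, n)),
        threading.Thread(target=scale, args=(src1, scaled, 10)),
        threading.Thread(target=add, args=(scaled, src2, result)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    history = []
    while True:
        tok = result.get()
        if tok is END:
            break
        history.append(tok)
    return history
```

Calling `run_network(4)` yields the history `[0, 11, 22, 33]` on every run, whatever interleaving the thread scheduler picks.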

Bounding Size of PN Queues
- Bounded Scheduling [Parks & Lee, 1995]
  - Write to a full queue suspends node execution
  - On global deadlock, resize smallest queue
  - Favors incomplete bounded execution (non-determinate)
- Computational PN [Allen & Evans, 2000]
  - Processes may consume fewer tokens than read
  - All memory allocation can be handled by queues
- Bounded Scheduling [Geilen & Basten, 2003]
  - Show local deadlock may not lead to global deadlock
  - Artificial deadlock
- Deadlock detection required for bounded communication, but no framework detects local deadlock
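
A minimal sketch of the bounded-queue idea follows. It is a single-threaded simulation, not any published scheduler: processes are Python generators that yield `("read", channel)` or `("write", channel, token)` requests, and the names `Channel`, `run`, and `demo` are hypothetical. Writes block on a full channel; when every remaining process is blocked and at least one is blocked on a write, the deadlock is treated as artificial and the smallest full channel grows. Growing by exactly one token is an assumption of this sketch.

```python
from collections import deque

class Channel:
    """Bounded FIFO channel."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def full(self):
        return len(self.items) >= self.capacity

def run(processes, channels):
    """Round-robin scheduler; returns how many times a queue was resized."""
    pending = {p: next(p) for p in processes}   # current request of each process
    resizes = 0
    while pending:
        progressed = False
        for proc, req in list(pending.items()):
            ch = channels[req[1]]
            if req[0] == "read" and ch.items:
                reply = ch.items.popleft()       # data that is read is dequeued
            elif req[0] == "write" and not ch.full():
                ch.items.append(req[2])
                reply = None
            else:
                continue                         # blocked on empty or full channel
            progressed = True
            try:
                pending[proc] = proc.send(reply)
            except StopIteration:
                del pending[proc]                # process finished
        if not progressed:
            # Global deadlock: collect channels that some process is write-blocked on.
            full = [channels[r[1]] for r in pending.values() if r[0] == "write"]
            if not full:
                raise RuntimeError("true deadlock: all processes blocked on reads")
            min(full, key=lambda c: c.capacity).capacity += 1
            resizes += 1
    return resizes

def demo():
    """Producer sends 3 tokens on 'a' before 1 on 'b'; consumer reads 'b' first."""
    channels = {"a": Channel(1), "b": Channel(1)}
    received = []

    def producer():
        for i in range(3):
            yield ("write", "a", i)
        yield ("write", "b", "go")

    def consumer():
        yield ("read", "b")
        for _ in range(3):
            received.append((yield ("read", "a")))

    resizes = run([producer(), consumer()], channels)
    return resizes, received, channels["a"].capacity
```

In `demo()`, the producer stalls on its second write to `a` while the consumer is still waiting on `b`, so channel `a` is grown twice (to capacity 3) before the network completes with the tokens `[0, 1, 2]` delivered in order.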

Deadlock Detection Algorithm
- Mitchell & Merritt’s algorithm [1984]
  - Detects local and global deadlocks
- Exactly one process detects deadlock
  - Simplifies deadlock resolution
- Pair of labels (numbers) used for deadlock detection
- Deadlock detected when a label makes a “round-trip” among set of blocked processes

Mitchell-Merritt Example
[Figure: processes A, B, C, and D shown in two panels, "Initial State" and "Blocking Step". Each process carries a public (count, pid) label and a private (count, pid) label. Blocking operations shown: write to B, read from C, read from A; processes that are not blocked are marked BUSY. Arrows indicate waiting.]
Artificial deadlock without feedback.

Mitchell-Merritt Example
[Figure: processes A, B, C, and D shown in two panels, "Transmit Step" and "Deadlock Detected", each annotated with its public label and private label.]
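
The label rules in the example can be sketched in a few lines. This is a simplified, single-threaded rendering of the Mitchell-Merritt block, transmit, and detect steps using (count, pid) pairs compared lexicographically; the names `Process`, `block`, `transmit`, `detects`, and `find_deadlock` are hypothetical, and the wait edges (A waits on B, C waits on A, B waits on C) are one reading of the figure, assumed here for illustration.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.public = (1, pid)    # label visible to processes waiting on this one
        self.private = (1, pid)   # label known only to this process
        self.waits_for = None     # single-resource model: at most one wait edge

def block(p, q):
    """p starts waiting on q: both of p's labels take a fresh, larger value."""
    count = max(p.public[0], q.public[0]) + 1
    p.public = p.private = (count, p.pid)
    p.waits_for = q

def transmit(p):
    """A blocked p adopts the public label of q if that label is larger."""
    q = p.waits_for
    if q is not None and q.public > p.public:
        p.public = q.public
        return True
    return False

def detects(p):
    """p sees its own label come back: it has made a round trip."""
    q = p.waits_for
    return q is not None and p.public == p.private == q.public

def find_deadlock(processes):
    """Repeat transmit steps until one process detects deadlock or labels settle."""
    changed = True
    while changed:
        for p in processes:
            if detects(p):
                return p
        changed = False
        for p in processes:
            if transmit(p):
                changed = True
    return None

def example():
    a, b, c, d = (Process(pid) for pid in (1, 2, 3, 4))
    block(a, b)   # A blocks writing to B
    block(c, a)   # C blocks reading from A
    block(b, c)   # B blocks reading from C, closing the cycle
    return find_deadlock([a, b, c, d]), (a, b, c, d)
```

Under these assumptions, `example()` gives labels (2,1), (4,2), (3,3) after the blocking steps, the largest label (4,2) propagates around the cycle, and only process B (pid 2) reports the deadlock; D stays busy and is never involved.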

Implementation
- Distributed framework for Computational Process Networks
- TCP sockets for communication
- Transmit and receive queues (zero-copy)
- C++, POSIX threads
- http://www.ece.utexas.edu/~bevans/projects/pn

Execution Performance
- Overhead <1 μs per read/write

Conclusion
- Formal models simplify parallel design, implementation, and debugging
- Communication in PN model follows “Single-Resource” semantics
- Mitchell-Merritt algorithm applicable to non-distributed, parallel, and distributed PNs
- Can be used to implement bounded-memory scheduling algorithms