Deadlock Detection for Distributed Process Networks
Alex G. Olson & Brian L. Evans
The University of Texas at Austin
ICASSP 2005
Motivation for Formal Models
- Applications may require higher input/output and computational rates than one CPU can handle
- Exploit parallelism for high performance: parallel (one machine) or distributed (many machines)
- Pitfalls of parallel/distributed programming:
  - Synchronization, shared memory, and deadlock
  - Debugging concurrent code on many processors
- Formal models have provable properties:
  - Determinacy: programs are correct by construction
  - Validation: each component can be debugged separately
  - Scalability: execution gets faster as CPUs are added
Applications
Application                                  Input Data Rate   Computation Rate   Output Data Rate
Sonar Beamforming [Allen & Evans, 2000]      160 MB/s          4-20 GFLOPS        72 MB/s
Bzip2 (block-zip) Compression                1-4 MB/s          ~1-4 GIPS          (not listed)
MPEG4 Encoding (4CIF)                        18 MB/s           ~2 GIPS            ~1 MB/s
H.264 Video Server (QCIF) [Banerjee, 2002]   1 MB/s            ~1 GIPS            ~40 KB/s
Other applications: design space exploration [Vissers & Wolf, 1999]; image processing [Webb et al., 1999]
Process Networks [Kahn, 1974]
- Concurrently executing processes
- Processes communicate only over one-way unbounded channels (FIFO queues)
  - A process reads from one input port at a time
  - A read suspends the process until enough data is available
  - Data that has been read is dequeued from the channel
- Samples (tokens) flow along arcs
  - Samples carry values but no time information
  - The flow of (untimed) data drives computation
- Determinate execution
  - Any scheduling algorithm that obeys the above rules produces the same history of tokens on every arc
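The channel rules above can be sketched in C++ (the framework's language) with a mutex and a condition variable. This is a minimal illustration, not the framework's code: KahnChannel and run_pipeline are hypothetical names, the channel is unbounded so writes never block, and each process is told in advance how many tokens to handle. Whatever order the threads happen to run in, the token history on the output arc is the same.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical Kahn-style channel: an unbounded FIFO with a blocking read.
// Writes never block; a read suspends the caller until a token is available,
// and the token is dequeued once it has been read.
template <typename T>
class KahnChannel {
public:
    void write(const T& token) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(token);
        }
        not_empty_.notify_one();
    }

    T read() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T token = queue_.front();
        queue_.pop_front();
        return token;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<T> queue_;
};

// Hypothetical three-process network: source -> scale (x2) -> sink.
// Each process runs in its own thread and touches only its own channels.
std::vector<int> run_pipeline(int n) {
    KahnChannel<int> a, b;
    std::vector<int> history;  // written only by the sink thread

    std::thread source([&] {
        for (int i = 1; i <= n; ++i) a.write(i);
    });
    std::thread scale([&] {
        for (int i = 0; i < n; ++i) b.write(2 * a.read());
    });
    std::thread sink([&] {
        for (int i = 0; i < n; ++i) history.push_back(b.read());
    });

    source.join();
    scale.join();
    sink.join();
    return history;  // token history observed on arc b
}
```

For example, run_pipeline(5) yields 2, 4, 6, 8, 10 on every run, regardless of how the operating system interleaves the three threads.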
Bounding Size of PN Queues
- Bounded Scheduling [Parks & Lee, 1995]
  - A write to a full queue suspends the writing process
  - On global deadlock, increase the capacity of the smallest full queue
  - Favors bounded but incomplete execution (non-determinate)
- Computational PN [Allen & Evans, 2000]
  - Processes may consume fewer tokens than they read
  - All memory allocation can be handled by the queues
- Bounded Scheduling [Geilen & Basten, 2003]
  - Showed that a local deadlock need not lead to a global deadlock
  - Artificial deadlock: a deadlock caused by queue bounds rather than by the program itself
- Bounded communication requires deadlock detection, but no existing framework detects local deadlock
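The queue-resizing rule of Parks' bounded scheduling can be sketched as a single function, assuming the scheduler already knows that execution is globally deadlocked and can inspect every queue. QueueState and resolve_artificial_deadlock are hypothetical names, and doubling the capacity is an assumed growth policy, since the slide only says the queue is resized.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical snapshot of one bounded PN queue at the moment a global
// deadlock is detected.
struct QueueState {
    std::size_t capacity;  // current bound on the number of stored tokens
    std::size_t size;      // tokens currently stored
    bool writer_blocked;   // true if the producer is suspended on this queue
};

// If the global deadlock is artificial (some writer is blocked on a full
// queue), grow the smallest such full queue and return its index.
// Return -1 if no writer is blocked on a full queue, i.e. the deadlock is true.
// Assumption: growth by doubling.
int resolve_artificial_deadlock(std::vector<QueueState>& queues) {
    int smallest = -1;
    for (std::size_t i = 0; i < queues.size(); ++i) {
        const QueueState& q = queues[i];
        bool full = q.size >= q.capacity;
        if (!q.writer_blocked || !full) continue;
        if (smallest < 0 ||
            q.capacity < queues[static_cast<std::size_t>(smallest)].capacity) {
            smallest = static_cast<int>(i);
        }
    }
    if (smallest >= 0) {
        QueueState& q = queues[static_cast<std::size_t>(smallest)];
        q.capacity = (q.capacity == 0) ? 1 : 2 * q.capacity;
        q.writer_blocked = false;  // the suspended writer may now resume
    }
    return smallest;
}
```

Growing only the smallest full queue keeps total memory as low as possible, but as the slide notes, this rule acts only on global deadlock, so a local artificial deadlock elsewhere in the network goes unnoticed.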
Deadlock Detection Algorithm
- Mitchell & Merritt's algorithm [1984]
  - Detects both local and global deadlocks
  - Exactly one process detects each deadlock, which simplifies deadlock resolution
  - Each process holds a pair of labels (numbers), one public and one private, used for detection
  - A deadlock is detected when a label makes a "round trip" around a cycle of blocked processes
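The label rules can be sketched as three sequential operations, following the usual statement of Mitchell & Merritt's block, transmit, and detect steps with (count, pid) labels as in the example. The type and function names are hypothetical, the activate step is omitted, and a real implementation would run these steps concurrently inside blocking reads and writes.

```cpp
#include <algorithm>
#include <tuple>

// A label is a (count, pid) pair compared lexicographically; the pid makes
// every label unique.
struct Label {
    long count;
    int pid;
    bool operator<(const Label& o) const {
        return std::tie(count, pid) < std::tie(o.count, o.pid);
    }
    bool operator==(const Label& o) const {
        return count == o.count && pid == o.pid;
    }
};

struct Process {
    int pid;
    Label pub;   // public label, readable by other processes
    Label priv;  // private label, seen only by this process
};

// Block step: `self` starts waiting on `target`. Both of its labels become a
// fresh label whose count exceeds both public counts.
void block(Process& self, const Process& target) {
    Label fresh{std::max(self.pub.count, target.pub.count) + 1, self.pid};
    self.pub = fresh;
    self.priv = fresh;
}

// Transmit step: a blocked process adopts the public label of the process it
// waits on whenever that label is larger, so the largest label in a waiting
// cycle travels backward along the wait arrows.
void transmit(Process& self, const Process& target) {
    if (self.pub < target.pub) self.pub = target.pub;
}

// Detect step: the process's own label has come back to it around the cycle.
bool detect(const Process& self, const Process& target) {
    return self.pub == self.priv && target.pub == self.priv;
}
```

For a two-process cycle where X blocks on Y and then Y blocks on X, one transmit from Y to X is enough for Y (the process that blocked last, holding the largest label) to detect the deadlock, while X does not.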
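Mitchell-Merritt Example
Labels are (count, pid) pairs; each process has a public and a private label. Arrows indicate waiting.
- Initial state: A (1,1), B (1,2), C (1,3), D (1,4)
- Blocking step: A blocks writing to B (new labels 2,1), C blocks reading from A (3,3), and B blocks reading from C (4,2); D stays busy with (1,4)
- The waiting cycle A → B → C → A is an artificial deadlock without feedback: the processes wait on each other in a cycle, but the data flows A → B, A → C, and C → B contain no loop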
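Mitchell-Merritt Example
- Transmit step: each blocked process copies a larger public label from the process it waits on, so (4,2) travels backward around the cycle: A's public label becomes (4,2), then C's
- Private labels are unchanged: A (2,1), B (4,2), C (3,3), D (1,4)
- Deadlock detected: B sees its own label (4,2) as the public label of C, the process it waits on, and detects the deadlock; no other process does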
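The example can be replayed sequentially to check the labels. This is a hypothetical single-threaded sketch, not the framework's implementation: it assumes the processes block in the order A, C, B (the order consistent with the labels shown) and applies transmit steps until no public label changes, then runs the detect step for every blocked process.

```cpp
#include <algorithm>
#include <initializer_list>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

struct Label {
    long count;
    int pid;
    bool operator<(const Label& o) const {
        return std::tie(count, pid) < std::tie(o.count, o.pid);
    }
    bool operator==(const Label& o) const {
        return count == o.count && pid == o.pid;
    }
};

struct Proc {
    int pid;
    Label pub, priv;
    char waits_on = 0;  // 0 means busy (not blocked)
};

// Hypothetical replay of the four-process example. Fills `procs` with the
// final labels and returns the process that detects deadlock (0 if none).
char replay_example(std::map<char, Proc>& procs) {
    procs.clear();
    for (char name : {'A', 'B', 'C', 'D'}) {
        int pid = name - 'A' + 1;
        procs[name] = Proc{pid, {1, pid}, {1, pid}};
    }
    // Block step (assumed order): A waits on B, C waits on A, B waits on C.
    const std::vector<std::pair<char, char>> waits = {{'A', 'B'}, {'C', 'A'}, {'B', 'C'}};
    for (auto [waiter, target] : waits) {
        Proc& w = procs[waiter];
        Label fresh{std::max(w.pub.count, procs[target].pub.count) + 1, w.pid};
        w.pub = w.priv = fresh;
        w.waits_on = target;
    }
    // Transmit step: repeat until no public label changes.
    bool changed = true;
    while (changed) {
        changed = false;
        for (auto& [name, p] : procs) {
            if (!p.waits_on) continue;
            const Label& v = procs[p.waits_on].pub;
            if (p.pub < v) { p.pub = v; changed = true; }
        }
    }
    // Detect step: own label has returned via the process we wait on.
    for (auto& [name, p] : procs) {
        if (p.waits_on && p.pub == p.priv && procs[p.waits_on].pub == p.priv) return name;
    }
    return 0;
}
```

Under these assumptions the replay reports B as the only detector, with final public labels (4,2) for A, B, and C, (1,4) for the busy process D, and private labels matching those listed above.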
Implementation
- Distributed framework for Computational Process Networks
- TCP sockets for communication
- Transmit and receive queues (zero-copy)
- C++, POSIX threads
- http://www.ece.utexas.edu/~bevans/projects/pn
Execution Performance
- Overhead: <1 μs per read/write
Conclusion
- Formal models simplify parallel design, implementation, and debugging
- Communication in the PN model follows "single-resource" semantics: each blocked process waits on exactly one other process
- The Mitchell-Merritt algorithm applies to non-distributed, parallel, and distributed PNs
- It can be used to implement bounded-memory scheduling algorithms