Dataflow Languages
Prof. Stephen A. Edwards
Copyright © 2001 Stephen A. Edwards. All rights reserved.

Philosophy of Dataflow Languages
• A drastically different way of looking at computation
• Von Neumann imperative language style: the program counter is king
• Dataflow language style: the movement of data has priority
• Scheduling is the responsibility of the system, not the programmer

Dataflow Language Model
• Processes communicating through FIFO buffers
• [Figure: Process 1, Process 2, and Process 3 connected by FIFO buffers]

Dataflow Languages
• Every process runs simultaneously
• Processes can be described with imperative code
  – Compute … compute … receive … compute … transmit
• Processes can only communicate through buffers

Dataflow Communication
• Communication is only through buffers
• Buffers are usually treated as unbounded for flexibility
• The sequence of tokens read is guaranteed to be the same as the sequence of tokens written
• Destructive read: reading a value from a buffer removes the value
• Much more predictable than shared memory
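To make the buffer discipline concrete, here is a minimal single-threaded sketch of such a FIFO in C. The names (fifo_put, fifo_get) and the fixed capacity are illustrative assumptions, not part of any dataflow standard; a real dataflow runtime would block instead of asserting.

    #include <assert.h>
    #include <stdio.h>

    #define CAP 16                 /* illustrative fixed capacity */

    typedef struct { int data[CAP]; int head, count; } fifo;

    /* Append a token; insertion order is preserved. */
    void fifo_put(fifo *f, int v) {
        assert(f->count < CAP);
        f->data[(f->head + f->count++) % CAP] = v;
    }

    /* Destructive read: the token is removed from the buffer. */
    int fifo_get(fifo *f) {
        assert(f->count > 0);      /* a real process would block here instead */
        int v = f->data[f->head];
        f->head = (f->head + 1) % CAP;
        f->count--;
        return v;
    }

    int main(void) {
        fifo f = { {0}, 0, 0 };
        for (int i = 0; i < 5; i++) fifo_put(&f, i);
        for (int i = 0; i < 5; i++)      /* reads see exactly the written sequence */
            printf("%d\n", fifo_get(&f));
        return 0;
    }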

Dataflow Languages
• Once proposed for general-purpose programming
• Fundamentally concurrent: should map more easily to parallel hardware
• A few lunatics built general-purpose dataflow computers based on this idea
• Largely a failure: memory spaces are anathema to the dataflow formalism

Applications of Dataflow
• Not a good fit for, say, a word processor
• Good for signal-processing applications
• Anything that deals with a continuous stream of data
• Becomes easy to parallelize
• Buffers are typically used in signal-processing applications anyway

Applications of Dataflow
• Perfect fit for block-diagram specifications
  – Circuit diagrams
  – Linear/nonlinear control systems
  – Signal processing
• These suggest dataflow semantics
• Common in electrical engineering
• Processes are blocks, connections are buffers

Kahn Process Networks
• Proposed by Kahn in 1974 as a general-purpose scheme for parallel programming
• Laid the theoretical foundation for dataflow
• Unique attribute: deterministic
• Difficult to schedule
• Too flexible to make efficient, yet not flexible enough for a wide class of applications
• Never put to widespread use

Kahn Process Networks
• Key idea: reading an empty channel blocks until data is available
• No other mechanism for sampling a communication channel's contents:
  – Can't check to see whether a buffer is empty
  – Can't wait on multiple channels at once

Kahn Processes
• A C-like function (Kahn used Algol)
• Arguments include FIFO channels
• Language augmented with send() and wait() operations that write to and read from channels

A Kahn Process
• From Kahn's original 1974 paper
• Process alternately reads from u and v, prints the data value, and writes it to w

    process f(in int u, in int v, out int w)
    {
      int i;
      bool b = true;
      for (;;) {
        i = b ? wait(u) : wait(v);
        printf("%i\n", i);
        send(i, w);
        b = !b;
      }
    }

• [Figure: process f with input channels u and v and output channel w]

A Kahn Process
• From Kahn's original 1974 paper (the same process f as above)
• The process interface includes FIFOs
• wait() returns the next token on an input FIFO, blocking if it's empty
• send() writes a data value on an output FIFO

A Kahn Process
• From Kahn's original 1974 paper
• Process reads from u and alternately copies it to v and w

    process g(in int u, out int v, out int w)
    {
      int i;
      bool b = true;
      for (;;) {
        i = wait(u);
        if (b) send(i, v); else send(i, w);
        b = !b;
      }
    }

• [Figure: process g with input channel u and output channels v and w]

A Kahn System
• Prints an alternating sequence of 0s and 1s
• [Figure: f and g connected in a loop through two copies of h; one h emits a 1 then copies input to output, the other emits a 0 then copies input to output]
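One way to watch this network run is to map it onto POSIX threads and pipes: read() on an empty pipe blocks, which is exactly Kahn's wait(), and write() plays the role of send(). This is a hedged sketch, not Kahn's own formulation; the helper names (get, put, chans, hargs) and the exact wiring are assumptions for illustration.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void put(int fd, int v) { write(fd, &v, sizeof v); }            /* send() */
    static int  get(int fd) { int v; read(fd, &v, sizeof v); return v; }   /* wait(): blocks when empty */

    typedef struct { int u, v, w; } chans;        /* channel fds for f and g */
    typedef struct { int u, w, init; } hargs;     /* h also carries its initial token */

    static void *f(void *p) {                     /* alternately read u and v, print, copy to w */
        chans *c = p; int b = 1;
        for (int n = 0; n < 8; n++) {             /* stop after 8 tokens for the demo */
            int i = b ? get(c->u) : get(c->v);
            printf("%d\n", i);
            put(c->w, i);
            b = !b;
        }
        return NULL;
    }
    static void *g(void *p) {                     /* read u, alternately copy to v and w */
        chans *c = p; int b = 1;
        for (;;) { int i = get(c->u); put(b ? c->v : c->w, i); b = !b; }
    }
    static void *h(void *p) {                     /* emit one initial token, then copy input to output */
        hargs *a = p;
        put(a->w, a->init);
        for (;;) put(a->w, get(a->u));
    }

    int main(void) {
        int fg[2], gh1[2], gh0[2], h1f[2], h0f[2];  /* [0] = read end, [1] = write end */
        pipe(fg); pipe(gh1); pipe(gh0); pipe(h1f); pipe(h0f);
        chans cf = { h1f[0], h0f[0], fg[1] };       /* f: u from h1, v from h0, w to g */
        chans cg = { fg[0], gh1[1], gh0[1] };       /* g: u from f, v to h1, w to h0 */
        hargs c1 = { gh1[0], h1f[1], 1 };           /* h1 emits a 1 first */
        hargs c0 = { gh0[0], h0f[1], 0 };           /* h0 emits a 0 first */
        pthread_t tf, tg, t1, t0;
        pthread_create(&tf, NULL, f, &cf);
        pthread_create(&tg, NULL, g, &cg);
        pthread_create(&t1, NULL, h, &c1);
        pthread_create(&t0, NULL, h, &c0);
        pthread_join(tf, NULL);                     /* prints 1 0 1 0 1 0 1 0 */
        return 0;                                   /* exiting main ends the other threads */
    }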

Proof of Determinism
• Because a process can't check the contents of buffers, only read from them, each process sees only the sequence of data values coming in on its buffers
• Behavior of a process: compute … read … compute … write … read … compute
• Values written depend only on program state
• Computation depends only on program state
• Reads always return a sequence of data values, nothing more

Determinism
• Another way to see it:
• If I'm a process, I am affected only by the sequence of tokens on my inputs
• I can't tell whether they arrive early, late, or in what order
• I will behave the same in any case
• Thus, the sequence of tokens I put on my outputs is the same regardless of the timing of the tokens on my inputs

Scheduling Kahn Networks
• The challenge is running processes without accumulating tokens
• [Figure: processes A and B feeding process C]

Scheduling Kahn Networks
• The challenge is running processes without accumulating tokens
• [Figure: A and B always emit tokens; C only consumes tokens from A, so tokens accumulate on the B-to-C buffer]

Demand-driven Scheduling?
• Apparent solution: only run a process whose outputs are being actively solicited
• However...
• [Figure: four-process network A, B, C, D in which one process always consumes tokens and another always produces them]

Other Difficult Systems
• Not all systems can be scheduled without token accumulation
• [Figure: a process that produces two a's for every b, feeding a process that alternates between receiving one a and one b]

Tom Parks' Algorithm
• Schedules a Kahn process network in bounded memory if it is possible
• Start with bounded buffers
• Use any scheduling technique that avoids buffer overflow
• If the system deadlocks because of buffer overflow, increase the size of the smallest buffer and continue
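As a toy illustration of the grow-on-deadlock idea (my own construction, not from Parks' thesis): take a single arc where A produces 2 tokens per firing and B consumes 3, with the buffer capacity starting at 1. The run-anything-enabled policy and the B-before-A preference below are illustrative scheduling choices.

    #include <stdio.h>

    int main(void) {
        int cap = 1, count = 0;                   /* buffer capacity and fill level */
        for (int step = 0; step < 20; step++) {
            if (count >= 3) {                     /* B can fire: consumes 3 tokens */
                count -= 3;
                printf("fire B (count=%d, cap=%d)\n", count, cap);
            } else if (cap - count >= 2) {        /* A can fire: produces 2 tokens */
                count += 2;
                printf("fire A (count=%d, cap=%d)\n", count, cap);
            } else {                              /* artificial deadlock: A blocked writing */
                cap++;                            /* grow the (only) buffer and continue */
                printf("deadlock: grow buffer to %d\n", cap);
            }
        }
        return 0;                                 /* settles into bounded memory at cap = 4 */
    }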

Parks' Algorithm in Action
• Start with buffers of size 1
• Run A, B, C, D
• [Figure: the four-process network A, B, C, D; one process only consumes tokens from A]

Parks' Algorithm in Action
• B is blocked waiting for space in the B→C buffer
• Run A, then C
• The system will run indefinitely
• [Figure: the same network; one process only consumes tokens from A]

Parks' Scheduling Algorithm
• A neat trick
• Whether a Kahn network can execute in bounded memory is undecidable
• Parks' algorithm does not violate this
• It will run in bounded memory if possible, and use unbounded memory if necessary

Using Parks' Scheduling Algorithm
• It works, but…
• Requires dynamic memory allocation
• Does not guarantee minimum memory usage
• Scheduling choices may affect memory usage
• Data-dependent decisions may affect memory usage
• Relatively costly scheduling technique
• Detecting deadlock may be difficult

Kahn Process Networks
• Their beauty is that the scheduling algorithm does not affect their functional behavior
• Difficult to schedule because of the need to balance relative process rates
• The system inherently gives the scheduler few hints about appropriate rates
• Parks' algorithm is expensive and fussy to implement
• Might be appropriate for coarse-grain systems, where scheduling overhead is dwarfed by process behavior

Synchronous Dataflow (SDF)
• Edward Lee and David Messerschmitt, Berkeley, 1987
• Restriction of Kahn networks to allow compile-time scheduling
• Basic idea: each process reads and writes a fixed number of tokens each time it fires:

    loop
      read 3 A, 5 B, 1 C
      … compute …
      write 2 D, 1 E, 7 F
    end loop

SDF and Signal Processing
• The restriction is natural for multirate signal processing
• Typical signal-processing processes:
  – Unit-rate: adders, multipliers
  – Upsamplers (1 in, n out)
  – Downsamplers (n in, 1 out)

Multi-rate SDF System
• CD-to-DAT sample-rate converter
• Converts a 44.1 kHz sampling rate to 48 kHz
• [Figure: chain of upsamplers and downsamplers]
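The conversion ratio is rational, which is what makes a static multirate schedule possible: 48000/44100 = 160/147 in lowest terms (the factorization is plain arithmetic implied by the two rates, not taken from the slide). A quick check:

    #include <stdio.h>

    int main(void) {
        /* upsample by 160, downsample by 147: 44100 * 160 / 147 == 48000 */
        printf("%d\n", 44100 * 160 / 147);   /* prints 48000, exactly */
        return 0;
    }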

Delays
• Kahn processes often have an initialization phase
• SDF doesn't allow this, because the rates would then not be constant
• Alternative: an SDF system may start with tokens in its buffers
• These behave like delays (in the signal-processing sense)
• Delays are sometimes necessary to avoid deadlock

Example SDF System
• FIR filter (all single-rate)
• [Figure: tapped delay line; each tap is duplicated (dup), multiplied by a constant filter coefficient (*c), and the products are summed by a chain of adders; the blocks between taps are one-cycle delays]
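For reference, here is the same computation as a plain sequential C loop: a 4-tap FIR, y[n] = c0·x[n] + c1·x[n−1] + c2·x[n−2] + c3·x[n−3]. The tap count, coefficients, and input are illustrative; the dataflow graph above computes exactly this, one output token per input token.

    #include <stdio.h>

    #define TAPS 4

    int main(void) {
        double c[TAPS] = { 0.25, 0.25, 0.25, 0.25 };  /* illustrative coefficients */
        double delay[TAPS] = { 0 };                   /* the one-cycle delays, initially zero */
        double x[] = { 1, 2, 3, 4, 5, 6, 7, 8 };

        for (int n = 0; n < 8; n++) {
            /* shift the delay line (the dup/delay chain in the graph) */
            for (int k = TAPS - 1; k > 0; k--) delay[k] = delay[k - 1];
            delay[0] = x[n];
            /* multiply each tap by its coefficient and sum (the *c and + blocks) */
            double y = 0;
            for (int k = 0; k < TAPS; k++) y += c[k] * delay[k];
            printf("y[%d] = %g\n", n, y);
        }
        return 0;
    }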

SDF Scheduling
• The schedule can be determined completely before the system runs
• Two steps:
  1. Establish relative execution rates by solving a system of linear equations
  2. Determine a periodic schedule by simulating the system for a single round

SDF Scheduling
• Goal: a sequence of process firings that
  – Runs each process at least once, in proportion to its rate
  – Avoids underflow: no process is fired unless all the tokens it consumes are available
  – Returns the number of tokens in each buffer to its initial state
• Result: the schedule can be executed repeatedly without accumulating tokens in buffers

Calculating Rates
• Each arc imposes a constraint
• [Figure: four-process graph over a, b, c, and d]

    3a – 2b = 0
    4b – 3d = 0
    b – 3c = 0
    2c – a = 0
    d – 2a = 0

• Solution: a = 2c, b = 3c, d = 4c
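A quick way to check a candidate rate vector is to verify every arc's balance equation: producer rate times tokens produced equals consumer rate times tokens consumed. The sketch below encodes the arcs implied by the equations above (my reading of the lost figure, so treat the arc directions as assumptions; the check itself is direction-independent).

    #include <stdio.h>

    int main(void) {
        int q[4] = { 2, 3, 1, 4 };             /* candidate rates for a, b, c, d (with c = 1) */
        struct { int p, np, c, nc; } arc[] = { /* balance: q[p]*np == q[c]*nc */
            { 0, 3, 1, 2 },                    /* 3a - 2b = 0 */
            { 1, 4, 3, 3 },                    /* 4b - 3d = 0 */
            { 1, 1, 2, 3 },                    /* b - 3c = 0  */
            { 2, 2, 0, 1 },                    /* 2c - a = 0  */
            { 3, 1, 0, 2 },                    /* d - 2a = 0  */
        };
        for (int i = 0; i < 5; i++) {
            int lhs = q[arc[i].p] * arc[i].np, rhs = q[arc[i].c] * arc[i].nc;
            printf("arc %d: %d == %d %s\n", i, lhs, rhs, lhs == rhs ? "ok" : "VIOLATED");
        }
        return 0;                              /* all five arcs balance for (2, 3, 1, 4) */
    }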

Calculating Rates
• Consistent systems have a one-dimensional solution space
  – Usually want the smallest integer solution
• Inconsistent systems have only the all-zeros solution
• Disconnected systems have two- or higher-dimensional solution spaces

An Inconsistent System
• No way to execute it without an unbounded accumulation of tokens
• The only consistent solution is "do nothing"
• [Figure: three-process cycle over a, b, and c]

    a – c = 0
    a – 2b = 0
    3b – c = 0
    3a – 2c = 0

An Underconstrained System
• Two or more unconnected pieces
• Relative rates between the pieces are undefined
• [Figure: two disconnected arcs, a→b with rates 1 and 1, and c→d with rates 3 and 2]

    a – b = 0
    3c – 2d = 0

Consistent Rates Not Enough
• A consistent system with no schedule
• Rates do not avoid deadlock
• Solution here: add a delay on one of the arcs
• [Figure: two-process cycle between a and b]

SDF Scheduling
• Fundamental SDF scheduling theorem: if rates can be established, any scheduling algorithm that avoids buffer underflow will produce a correct schedule if one exists

Scheduling Example
• The theorem guarantees that any valid simulation will produce a schedule
• [Figure: the four-process graph from before]
• Rates: a = 2, b = 3, c = 1, d = 4
• Possible schedules:
  – BBBCDDDDAA
  – BDBDBCADDA
  – BBDDBDDCAA
  – … many more
• BC … is not valid

Scheduling Choices
• The SDF scheduling theorem guarantees a schedule will be found if one exists
• Systems often have many possible schedules
• How can we use this flexibility?
  – Reduced code size
  – Reduced buffer sizes

SDF Code Generation
• Often done with prewritten blocks
• For traditional DSP, handwritten implementations of large functions (e.g., FFT)
• One copy of each block's code is made for each appearance in the schedule
  – I.e., no function calls

Code Generation
• In this simple-minded approach, the schedule BBBCDDDDAA would produce code like

    B; B; B; C; D; D; D; D; A; A;

Looped Code Generation
• Obvious improvement: use loops
• Rewrite the schedule in "looped" form: (3 B) C (4 D) (2 A)
• The generated code becomes

    for (i = 0; i < 3; i++) B;
    C;
    for (i = 0; i < 4; i++) D;
    for (i = 0; i < 2; i++) A;

Single-Appearance Schedules
• Often possible to choose a looped schedule in which each block appears exactly once
• Leads to efficient block-structured code
  – Only requires one copy of each block's code
• Does not always exist
• Often requires more buffer space than other schedules

Finding Single-Appearance Schedules
• Always exist for acyclic graphs
  – Blocks appear in topological order
• For SCCs, look at the number of tokens that pass through an arc in each period (follows from the balance equations)
• If there is at least that much delay, the arc does not impose ordering constraints
  – Idea: no possibility of underflow
• [Figure: arc from a (produces 3) to b (consumes 2) with a delay of 6; with a = 2 and b = 3, 6 tokens cross the arc per period, so a delay of 6 is enough]

Finding Single-Appearance Schedules
• Recursive strongly-connected-component decomposition
• Decompose into SCCs
• Remove non-constraining arcs
• Recurse if possible
  – Removing arcs may break the SCC into two or more

Minimum-Memory Schedules
• Another possible objective
• Often increases code size (block-generated code)
• Static scheduling makes it possible to exactly predict memory requirements
• Simultaneously improving code size, memory requirements, buffer sharing, etc. remain open research problems

Cyclo-static Dataflow
• SDF suffers from requiring each process to produce and consume all tokens in a single firing
• Tends to lead to larger buffer requirements
• Example: an 8:1 downsampler
  – Doesn't really need to store 8 tokens in the buffer
  – This process simply discards 7 of them anyway
• [Figure: downsampler consuming 8 tokens and producing 1]

Cyclo-static Dataflow
• Alternative: have periodic, binary firings
• Semantics:
  – First firing: consume 1, produce 1
  – Second through eighth firings: consume 1, produce 0
• Consumption pattern {1,1,1,1,1,1,1,1}, production pattern {1,0,0,0,0,0,0,0}
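A minimal sketch of that 8:1 downsampler as a cyclo-static actor in C: a phase counter steps through the production pattern {1,0,0,0,0,0,0,0}, so at most one token per cycle of eight is ever buffered downstream. The names (downsampler, fire) are illustrative.

    #include <stdio.h>

    /* Cyclo-static 8:1 downsampler: consumes 1 token per firing,
       produces 1 token on phase 0 and 0 tokens on phases 1..7. */
    typedef struct { int phase; } downsampler;

    int fire(downsampler *d, int in, int *out) {   /* returns # of tokens produced */
        int produced = (d->phase == 0);
        if (produced) *out = in;
        d->phase = (d->phase + 1) % 8;
        return produced;
    }

    int main(void) {
        downsampler d = { 0 };
        for (int n = 0; n < 16; n++) {             /* feed tokens 0..15 */
            int out;
            if (fire(&d, n, &out))
                printf("%d\n", out);               /* prints 0 and 8 */
        }
        return 0;
    }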

Cyclo-Static Dataflow
• Scheduling is much like SDF
• Balance equations establish relative rates as before
• Any scheduler that avoids underflow will produce a schedule if one exists
• Advantage: even more schedule flexibility
• Makes it easier to avoid large buffers
• Especially good for hardware implementation: hardware likes moving single values at a time

Summary of Dataflow
• Processes communicating exclusively through FIFOs
• Kahn process networks
  – Blocking read, nonblocking write
  – Deterministic
  – Hard to schedule
  – Parks' algorithm requires deadlock detection and dynamic buffer-size adjustment

Summary of Dataflow
• Synchronous dataflow (SDF)
  – Firing rules: fixed token consumption/production
  – Can be scheduled statically
    · Solve the balance equations to establish rates
    · Any correct simulation will produce a schedule if one exists
  – Looped schedules
    · For code generation: implies loops in the generated code
    · Recursive SCC decomposition
• CSDF: breaks firing rules into smaller pieces
  – Scheduling problem largely the same