1 Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
S. M. Farhad, PhD Student
Supervisor: Dr. Bernhard Scholz
Programming Language Group, School of Information Technology, University of Sydney

2 Abstract
Synchronous data flow (SDF) differs from traditional data flow in that the amount of data produced and consumed by a data flow node is specified a priori for each input and output. The schedule of SDF nodes can therefore be determined at compile time (statically). Contribution of this paper: developing the theory for static scheduling of SDF programs on single or multiple processors.

3 Introduction
Need to depart from the simplicity of the von Neumann computer architecture
Programming signal processors using large grain data flow (LGDF) languages [W. B. Ackerman 82]
Eases the programming
Enhances the modularity of the code
Describes algorithms more naturally
Concurrency is immediately evident from the program description, so parallel hardware can be used more effectively

4 Data Flow Analysis [W. B. Ackerman 82]
P = X + Y;  Q = P/Y;  R = X*P;  S = R - Q;  T = R*P;  RESULT = S/T
Many of these instructions can run in parallel as long as some constraints are met. These constraints can be represented by a graph: a node represents an instruction, and an arrow from one instruction to another means that the second may not be executed until the first has been completed. Permissible computation sequences are therefore, for example, (1, 3, 5, 2, 4, 6), (1, 2, 3, 5, 4, 6), and others. A small sketch of this dependency graph follows below.
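A minimal sketch (not from the original slides) of these constraints as a dependency graph: Python's standard graphlib produces one permissible execution order via a topological sort, matching the sequences listed above.

```python
# Hypothetical sketch: dependencies of the six example instructions.
# Each entry maps an instruction number to the instructions it depends on.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

deps = {
    1: [],        # P = X + Y
    2: [1],       # Q = P / Y      needs P
    3: [1],       # R = X * P      needs P
    4: [2, 3],    # S = R - Q      needs Q and R
    5: [1, 3],    # T = R * P      needs R and P
    6: [4, 5],    # RESULT = S / T needs S and T
}

order = list(TopologicalSorter(deps).static_order())
print(order)      # e.g. [1, 2, 3, 4, 5, 6] -- one permissible sequence
```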

5 Sequencing Constraints
(1) P = X + Y
(2) Q = P/Y
(3) R = X*P
(4) S = R - Q
(5) T = R*P
(6) RESULT = S/T

6 The Data Flow Paradigm
A program is divided into pieces (nodes or blocks) which can execute whenever input data are available. An algorithm can be described as a data flow graph, with nodes representing functions and arcs representing data paths. Signal processing algorithms can also be described as data flow graphs, where a node is an atomic or non-atomic function and an arc is a signal path.

7 The Data Flow Paradigm Contd.
The complexity of the functions (granularity) determines the amount of parallelism available. No attempt is made to exploit concurrency inside a block. The functions within the blocks can be specified using von Neumann programming techniques. The blocks can themselves represent another data flow graph (hierarchical). LGDF is ideally suited for signal processing.

8 Synchronous Data Flow Graphs
A block is invoked when input is available. When it is invoked, it consumes a fixed number of input samples on each input path and produces a fixed number of output samples on each output path. A block is synchronous if we can specify a priori its number of input and output samples for each invocation. We assume that the signal processing system repetitively applies an algorithm to an infinite sequence of data.

9 A synchronous data flow graph
[Figure: an SDF graph with blocks A, B, C and labeled arcs]
An SDF graph requires buffering the data samples passed between blocks and scheduling blocks when data are available (static approach). This could also be done dynamically (with a runtime supervisor, a costly approach).

10 A synchronous data flow graph
SDF graphs can be scheduled statically (at compile time) regardless of the number of processors, so there is no need for dynamic control. Communication between nodes and processors is set up by the compiler, so no runtime control is needed. Thus the LGDF paradigm gives the programmer a natural way of programming with evident concurrency.

11 Scheduling an SDF graph
Schedule blocks onto processors in such a way that data is available when a block is invoked.
Assumptions: the SDF graph is non-terminating (and without deadlock), and the SDF graph is connected.
The goal is to find a periodic admissible parallel schedule (PAPS), or for a single processor a periodic admissible sequential schedule (PASS).
Non-termination is natural for signal processing; if the SDF graph is not connected, then each connected subgraph can be scheduled separately using a subset of the processors.

12 Construction of a PASS
[Figure: example SDF graph with three nodes and its topology matrix Γ; the (arc, node) entry of Γ is the number of samples the node produces on that arc per invocation, negative if it consumes them]
Connections to the outside world are ignored. A correctly constructed self loop has an equal amount of data produced and consumed, so its net entry is zero. The topology matrix need not be square in general. A small sketch of building such a matrix follows below.
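The slide's own graph and matrix are not reproduced above, so the sketch below uses a hypothetical 3-node SDF graph to show how a topology matrix is assembled: one row per arc, one column per node, production positive and consumption negative.

```python
# Hypothetical 3-node example (not the slide's figure): build the topology
# matrix Gamma with Gamma[arc, node] = samples produced per invocation
# (negative if consumed).
import numpy as np

nodes = [1, 2, 3]
# (source node, samples produced, destination node, samples consumed) per arc
arcs = [
    (1, 1, 2, 1),   # node 1 -> node 2
    (1, 2, 3, 1),   # node 1 -> node 3
    (2, 2, 3, 1),   # node 2 -> node 3
]

gamma = np.zeros((len(arcs), len(nodes)), dtype=int)
for row, (src, prod, dst, cons) in enumerate(arcs):
    gamma[row, src - 1] += prod     # production is positive
    gamma[row, dst - 1] -= cons     # consumption is negative
print(gamma)
# [[ 1 -1  0]
#  [ 2  0 -1]
#  [ 0  2 -1]]
```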

13 Construction of a PASS
Replace each arc with a FIFO queue (buffer) to pass data from one block to another; the queue sizes vary over time. The vector b(n) contains the queue sizes of all the buffers at time n. For a sequential schedule only one block can be invoked at a time; v(n) is the vector indicating which block is invoked at time n. A sketch of the resulting buffer-state recursion follows below.
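A minimal sketch of the buffer-state recursion b(n+1) = b(n) + Γ v(n), reusing the hypothetical 3-node topology matrix from the previous sketch; the invocation sequence shown is one admissible period.

```python
# Buffer-state update: invoking node i adds the i-th column of Gamma to b.
import numpy as np

gamma = np.array([[1, -1,  0],
                  [2,  0, -1],
                  [0,  2, -1]])     # hypothetical example from the previous sketch

def invoke(b, node):
    """Return the buffer state after invoking `node` (1-indexed)."""
    v = np.zeros(gamma.shape[1], dtype=int)
    v[node - 1] = 1                 # v(n) selects the node invoked at time n
    return b + gamma @ v

b = np.zeros(3, dtype=int)          # all buffers start empty (no delays)
for node in (1, 2, 3, 3):           # one hypothetical period of a schedule
    b = invoke(b, node)
    assert (b >= 0).all()           # an admissible schedule never goes negative
print(b)                            # [0 0 0] -- buffers return to their initial state
```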

14 Construction of a PASS
[Figure: example SDF graph with delays D and 2D on two of its arcs]
The change in the buffer sizes caused by invoking a node is the corresponding column of Γ. A unit delay on an arc from A to B means that the n-th sample consumed by node B is the (n-1)-th sample produced by node A, so the first sample consumed by the destination block is not produced by the source but is part of the initial state of the arc buffer.

15 Construction of a PASS
[Figure: the same example SDF graph with delays D and 2D]
Because of this initial condition, block 2 can be invoked once and block 3 can be invoked twice before block 1 is invoked at all. Delays therefore affect the way the system starts up.

16 Construction of a PASS
Given this computation model (eqns. 1-4), the goals are to:
Find necessary and sufficient conditions for the existence of a PASS, and hence of a PAPS
Find a practical algorithm that provably finds a PASS if one exists
Find a practical algorithm that constructs a reasonable (not necessarily optimal) PAPS, if a PASS exists

17 Necessary condition for the existence of a PASS
Here s denotes the number of nodes or blocks in the graph.
Definition 1: An admissible sequential schedule is a non-empty ordered list of nodes such that, if the nodes are executed in the sequence given by the schedule, the buffer sizes remain non-negative and bounded. Each node must appear in the schedule at least once.

18 Quick reminder of rank of a matrix
The rank of a matrix is the maximum number of linearly independent rows (or columns). The rank can be calculated by the Gaussian elimination algorithm. In the slide's example matrix, the 2nd column is twice column 1, so the columns are not independent and the rank drops.
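A quick check of this reminder (NumPy is an assumption of the sketch; the slide names no tool): a matrix whose second column is twice its first has linearly dependent columns, so the rank is 1.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 6]])              # column 2 = 2 * column 1
print(np.linalg.matrix_rank(m))     # 1, not 2
```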

19 Necessary condition for the existence of a PASS
Theorem 1: For a connected SDF graph with s nodes and topology matrix Γ, rank(Γ) = s - 1 is a necessary condition for a PASS to exist.
[Figure: example SDF graph and its topology matrix]
For a PASS of period p, equation (3) gives b(p) = b(0) + Γq, where q = v(0) + v(1) + ... + v(p-1). The q vector tells us how many times we should invoke each node in one period of a PASS; after a period the buffers end up once again in their initial state.

20 Necessary condition for the existence of a PASS
Since the PASS is periodic, we can write b(np) = b(0) + nΓq for any integer n. Since the PASS is admissible, the buffers must remain bounded, by Definition 1. The buffers remain bounded if and only if Γq = O, where O is a vector full of zeros. For q ≠ O, this implies that rank(Γ) < s, where s is the dimension of q. But for a connected graph rank(Γ) can only be s or s - 1, and so it must be s - 1 [Lemma 3].

21 Necessary condition for the existence of a PASS
Theorem 1 indicates that if we have an SDF graph with a topology matrix of rank s, then the graph is somehow defective and no PASS can be found for it.
[Figure: example SDF graph with inconsistent sample rates and its topology matrix of rank s]
Any schedule for this graph will result either in deadlock or in unbounded buffer sizes. A rank of s indicates a sample rate inconsistency in the graph.

22 Necessary condition for the existence of a PASS
Theorem 2: For a connected SDF graph with s nodes and topology matrix Γ, and with rank(Γ) = s - 1, we can find a positive integer vector q ≠ O such that Γq = O, where O is the zero vector.
Definition 2: A predecessor to a node x is a node feeding data to x.
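A minimal sketch of finding the smallest positive integer vector q promised by Theorem 2; sympy's nullspace is a library choice of this sketch, not of the paper, and the topology matrix is the hypothetical 3-node example used earlier.

```python
# Compute the smallest positive integer q with Gamma q = O.
from math import gcd, lcm           # variadic forms, Python 3.9+
from sympy import Matrix

gamma = Matrix([[1, -1,  0],
                [2,  0, -1],
                [0,  2, -1]])       # hypothetical 3-node example

(basis,) = gamma.nullspace()                    # rank s-1 => one basis vector
scale = lcm(*(int(x.q) for x in basis))         # clear the denominators
q = [abs(int(x * scale)) for x in basis]
common = gcd(*q)
q = [x // common for x in q]                    # reduce to the smallest integers
print(q)                                        # [1, 1, 2]
```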

23 Necessary condition for the existence of a PASS
Definition 3 (class S algorithm): Given a positive integer vector q such that Γq = O and an initial state for the buffers b(0), the i-th node is runnable at a given time if it has not yet been run q_i times and running it will not cause any buffer size to go negative. A class S algorithm is any algorithm that schedules a node if it is runnable, updates b(n), and stops only when no more nodes are runnable. If a class S algorithm terminates before it has scheduled each node the number of times specified in the q vector, then it is said to be deadlocked.

24 Necessary condition for the existence of a PASS
Theorem 3: Given an SDF graph with topology matrix Γ and given a positive integer vector q s.t. Γq = O, if a PASS of period p = 1ᵀq exists, where 1ᵀ is a row vector full of ones, then any class S algorithm will find such a PASS.

25 Necessary condition for the existence of a PASS
[Figure: (a) and (b), two SDF graphs with consistent sample rates but no admissible schedule]
Networks with insufficient delays in directed loops are not computable.

26 Necessary condition for the existence of a PASS
Theorem 4: Given an SDF graph with topology matrix Γ and given a positive integer vector q s.t. Γq = O, a PASS of period p = 1ᵀq exists if and only if a PASS of period Np exists for any integer N.
Theorem 4 tells us that it does not matter which positive integer vector we use from the null space of the topology matrix, so we can simplify our system by using the smallest such vector, thus obtaining a PASS with minimum period.

27 Class S algorithm given the theorems
1. Solve for the smallest positive integer vector q such that Γq = O
2. Form an arbitrary ordered list L of all nodes in the system
3. For each node in L, schedule it if it is runnable, trying each node once
4. If each node has been scheduled q_i times, STOP
5. If no node in L can be scheduled, indicate a deadlock
6. Else go to 3 and repeat
A sketch of this procedure follows below.
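A minimal sketch of a class S schedule construction under this model; Γ, q, and the empty initial buffer state continue the hypothetical 3-node example, and the ordered list L is simply the nodes in index order.

```python
# Class S sketch: schedule any runnable node, update b, stop when every node
# has run q_i times or no node is runnable (deadlock).
import numpy as np

gamma = np.array([[1, -1,  0],
                  [2,  0, -1],
                  [0,  2, -1]])     # hypothetical 3-node example
q = np.array([1, 1, 2])             # smallest positive integer null vector
b = np.zeros(3, dtype=int)          # initial buffer state b(0) (no delays)

runs = np.zeros(len(q), dtype=int)
schedule = []
while (runs < q).any():
    progressed = False
    for node in range(len(q)):      # the ordered list L, tried once per pass
        delta = gamma[:, node]
        if runs[node] < q[node] and (b + delta >= 0).all():
            b += delta              # invoke the node and update the buffers
            runs[node] += 1
            schedule.append(node + 1)
            progressed = True
    if not progressed:
        raise RuntimeError("deadlock: no node is runnable")
print(schedule)                     # [1, 2, 3, 3] -- a PASS of period sum(q) = 4
```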

28 Constructing a PAPS
If a workable schedule for a single processor can be generated, then a workable schedule for a multiprocessor system can also be generated. The first step is to construct an acyclic precedence graph for J periods of the PASS found by the class S algorithm.

29 Construct an acyclic precedence graph by example
[Figure: example SDF graph with three nodes and a delay D on one arc]
This graph is neither acyclic nor a precedence graph. Possible minimum PASSes are {1, 3, 1, 2}, {3, 1, 1, 2}, or {1, 1, 3, 2}, each with period 4. {2, 1, 3, 1} is not a PASS because node 2 is not immediately runnable.

30 Construct an acyclic precedence graph
[Figure: the acyclic precedence graphs constructed for J = 1 and J = 2]

31 Next step constructing a parallel schedule
By the critical path method [Adam 74] or by the Hu-level scheduling algorithm [T. C. Hu 61]. A level is determined for each node in the acyclic precedence graph, where the level of a given node is the worst-case total of the runtimes of the nodes on a path from the given node to a terminal node of the graph. A terminal node is a node with no successor; if there is no terminal node, one can be created with zero runtime. A sketch of the level computation follows below.
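A minimal sketch of this level computation on a small hypothetical precedence graph; the level here counts the node's own runtime, one common convention that the slide does not spell out.

```python
# Level = longest total runtime on any path from the node to a terminal node.
from functools import lru_cache

runtime = {"A": 1, "B": 2, "C": 3}                    # hypothetical runtimes
successors = {"A": ["B", "C"], "B": ["C"], "C": []}   # C is the terminal node

@lru_cache(maxsize=None)
def level(node):
    succ = successors[node]
    if not succ:                          # terminal node: just its own runtime
        return runtime[node]
    return runtime[node] + max(level(s) for s in succ)

print({n: level(n) for n in runtime})     # {'A': 6, 'B': 5, 'C': 3}
```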

32 Hu-level scheduling algorithm
[Figure: the precedence graphs for J = 1 and J = 2 annotated with the Hu level of each node]

33 Constructing a parallel schedule
The Hu-level scheduling algorithm simply schedules available nodes with the highest level first. When there are more available nodes with the same highest level than there are processors, a reasonable heuristic is to schedule the ones with the longest runtime first. A sketch of this list-scheduling step follows below.
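A minimal sketch of the list-scheduling step with this heuristic, reusing the hypothetical precedence graph and levels from the previous sketch; the processor count and tie-breaking rule are assumptions of the sketch.

```python
# Hu-level list scheduling: pick the ready node with the highest level
# (longest runtime breaks ties) and place it on the earliest-free processor,
# never starting a node before its predecessors have finished.
def hu_schedule(runtime, successors, level, num_procs):
    preds = {n: [] for n in runtime}
    for n, succ in successors.items():
        for s in succ:
            preds[s].append(n)
    unscheduled = set(runtime)
    proc_free = [0] * num_procs          # time at which each processor is free
    finish = {}                          # finish time of each scheduled node
    assignment = []                      # (node, processor, start time)
    while unscheduled:
        ready = [n for n in unscheduled if all(p in finish for p in preds[n])]
        node = max(ready, key=lambda n: (level[n], runtime[n]))
        proc = min(range(num_procs), key=lambda p: proc_free[p])
        start = max([proc_free[proc]] + [finish[p] for p in preds[node]])
        finish[node] = start + runtime[node]
        proc_free[proc] = finish[node]
        assignment.append((node, proc, start))
        unscheduled.remove(node)
    return assignment

runtime = {"A": 1, "B": 2, "C": 3}
successors = {"A": ["B", "C"], "B": ["C"], "C": []}
level = {"A": 6, "B": 5, "C": 3}
print(hu_schedule(runtime, successors, level, num_procs=2))
# [('A', 0, 0), ('B', 1, 1), ('C', 0, 3)]
```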

34 Constructing a parallel schedule
[Figure: Gantt charts of the resulting two-processor schedules for J = 1 and J = 2]
Two processors; the runtimes of nodes 1, 2, and 3 are 1, 2, and 3 time units respectively.

35 Limitations of the Model
No support for data-dependent (conditional) control flow as in general-purpose languages
Asynchronous graphs
Connecting to the outside world
Data-dependent runtimes of blocks

36 Summary
This paper describes the theory necessary to develop a signal processing programming methodology that offers: programmer convenience, a natural way to describe signal processing algorithms, and ready use of the available concurrency.

37 Questions? Thank you

