CS294-6 Reconfigurable Computing
Day 23: November 10, 1998
Stream Processing
Previously
– Computing requirements
– SCORE: stream-based computing model
  – uses streams, instead of shared memory locations, to link computations
  – exposes parallelism
  – leaves freedom of sequential or spatial implementation
Today
– Streams are moderately well developed for sequential atoms in a multithreaded/multiprocessor environment
– General dataflow (DF) case
– Synchronous dataflow (SDF)
– Expression
– Thoughts on adapting these ideas for SCORE-like execution
General Dataflow Case
– A dataflow graph exposes parallelism
– Operators are enabled as soon as their input data is available
– The graph captures only the partial ordering the computation requires
– Adaptive to, and tolerant of, latencies in the system
=> great for exposing parallelism
General Dataflow
– Fine-grained
  – exposes maximum parallelism
  – …but pays rendezvous/presence-checking overhead for every operator
– Who runs when is unpredictable
  – variable latencies
  – variable consumption/production
  – => forces runtime synchronization/scheduling
General Dataflow
What structure can we exploit to reduce these requirements?
– Spatial operator locality: most communication is local (sequential)
– Operation blocks
  – do the dataflow presence check only on input to a region of code
  – compute the subgraph sequentially/directly
  – all computations in the subgraph are local and deterministic
– Cyclic/predictable dataflow?
Dataflow Multithreading
– Original DF: synchronize per instruction
– Hybrid DF -> TAM (Threaded Abstract Machine)
  – synchronize on remote memory accesses (messages)
  – run scheduling quanta (several instructions)
– Multithreading
  – coarse-grain tasks
  – synchronize on input data
  – (also locking)
What to Watch For
– With arbitrary I/O rates, buffering requirements can be unbounded
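A minimal sketch of the hazard, assuming a hypothetical producer that emits 2 tokens per firing feeding a consumer that takes 1, with a naive scheduler that fires each of them once per round. Because the firing ratio does not match the token rates, the queue grows by one token every round. With arbitrary (data-dependent) rates, no fixed firing ratio is guaranteed to match.

```python
from collections import deque

def queue_depth_after(rounds, produce=2, consume=1):
    """Fire a producer and a consumer once per round (hypothetical rates)
    and record the queue length after each round."""
    queue = deque()
    depths = []
    for _ in range(rounds):
        queue.extend([0] * produce)              # producer firing
        for _ in range(min(consume, len(queue))):
            queue.popleft()                      # consumer firing
        depths.append(len(queue))
    return depths

print(queue_depth_after(5))  # [1, 2, 3, 4, 5]
```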
Synchronous Data Flow (SDF)
Restrictions
– the number of tokens produced/consumed per operator firing is constant
– these numbers are known at compile time
– each edge has a predetermined number of initial tokens
Consistent = admissible and periodic
SDF: Periodic
Periodic
– invoke each operator at least once
– return to the initial state (number of tokens on each edge)
– can be determined from the balance equations
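Consider an edge from u to v, where u produces p tokens per firing and v consumes c. Its balance equation is q[u]·p = q[v]·c, and the repetition vector q is the smallest positive integer solution over all edges. The sketch below solves these equations by propagating rate ratios. It assumes a connected graph, uses a hypothetical edge-list format, and requires Python 3.9+ for multi-argument `math.gcd`/`math.lcm`. The chain's rates are an assumption, chosen to agree with the SDF example later in these notes.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm

def repetition_vector(edges):
    """edges: list of (src, dst, produced, consumed).
    Returns the smallest positive integer q with q[src]*produced == q[dst]*consumed
    on every edge, or None if no such solution exists (inconsistent graph).
    Assumes the graph is connected."""
    adj = {}
    for s, d, p, c in edges:
        adj.setdefault(s, []).append((d, Fraction(p, c)))  # q[d] = q[s] * p / c
        adj.setdefault(d, []).append((s, Fraction(c, p)))  # q[s] = q[d] * c / p
    start = edges[0][0]
    rate = {start: Fraction(1)}
    work = deque([start])
    while work:                       # propagate relative rates breadth-first
        u = work.popleft()
        for v, ratio in adj[u]:
            if v not in rate:
                rate[v] = rate[u] * ratio
                work.append(v)
    for s, d, p, c in edges:          # undirected cycles may contradict each other
        if rate[s] * p != rate[d] * c:
            return None
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {n: int(r * scale) for n, r in rate.items()}
    g = gcd(*q.values())
    return {n: v // g for n, v in q.items()}

# Chain A -> B -> C: A emits 2 per firing, B consumes 1 and emits 2, C consumes 1.
print(repetition_vector([("A", "B", 2, 1), ("B", "C", 2, 1)]))  # {'A': 1, 'B': 2, 'C': 4}
```

A graph whose edges demand contradictory ratios (for example A→B and B→C at 1:1 but A→C at 1:2) has no periodic schedule, and the function returns None.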
SDF: Admissible
Admissible
– a firing sequence exists that does not yield deadlock
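One way to test admissibility, sketched below under the assumption that the repetition vector is already known, is to simulate execution. The simulation fires any enabled operator until each operator has fired its repetition count. In SDF each edge has a single consumer, so firing one operator never disables another. The greedy simulation therefore gets stuck only if every schedule does. The function name and the edge format (with initial tokens as the fifth field) are hypothetical.

```python
def admissible_schedule(edges, reps):
    """edges: list of (src, dst, produced, consumed, initial_tokens).
    reps: repetition vector from the balance equations.
    Greedily fire any enabled operator until each has fired reps[op] times.
    Returns the firing sequence, or None if execution deadlocks first."""
    tokens = [e[4] for e in edges]
    remaining = dict(reps)
    sequence = []

    def enabled(op):
        return remaining[op] > 0 and all(
            tokens[i] >= c for i, (s, d, p, c, _) in enumerate(edges) if d == op)

    while any(remaining.values()):
        op = next((o for o in reps if enabled(o)), None)
        if op is None:
            return None                   # deadlock: nothing can fire
        for i, (s, d, p, c, _) in enumerate(edges):
            if d == op:
                tokens[i] -= c            # consume inputs
            if s == op:
                tokens[i] += p            # produce outputs
        remaining[op] -= 1
        sequence.append(op)
    return sequence

# Two-operator cycle, one token per firing each way.
cycle_empty = [("A", "B", 1, 1, 0), ("B", "A", 1, 1, 0)]
cycle_primed = [("A", "B", 1, 1, 0), ("B", "A", 1, 1, 1)]
print(admissible_schedule(cycle_empty, {"A": 1, "B": 1}))   # None (deadlock)
print(admissible_schedule(cycle_primed, {"A": 1, "B": 1}))  # ['A', 'B']
```

Greedy firing finds a valid order, not a buffer-minimal one. Consider the chain with A emitting 2, B consuming 1 and emitting 2, and C consuming 1. There it returns ABBCCCC.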
SDF: Inadmissible
SDF: Admissible
Benefits
– Periodic schedules
– Bounded buffer requirements
  – acyclic graphs: optimal algorithm
  – cyclic graphs: NP-complete; heuristic algorithm … close to optimal buffering
SDF Example
By the balance equations: 1 A, 2 B, 4 C
Firing sequences and buffer costs
– ABCBCCC: 5 (AB=2, BC=3)
– ABCCBCC: 4 (AB=2, BC=2)
– ABBCCCC: 6 (AB=2, BC=4)
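The slide's graph is not reproduced here. The rates below (A emits 2 tokens per firing, B consumes 1 and emits 2, C consumes 1) are an assumption, chosen because they agree with both the 1/2/4 repetition counts and the listed buffer costs. The sketch records each edge's peak occupancy over the sequence and sums the peaks.

```python
def buffer_cost(sequence, edges):
    """edges: dict (src, dst) -> (produced, consumed); every edge starts empty.
    An edge's cost is the most tokens it ever holds during the sequence.
    Returns (total cost, per-edge peaks)."""
    tokens = {e: 0 for e in edges}
    peak = {e: 0 for e in edges}
    for op in sequence:
        for e, (p, c) in edges.items():          # consume inputs first
            if e[1] == op:
                if tokens[e] < c:
                    raise ValueError(f"{op} fired without enough tokens on {e}")
                tokens[e] -= c
        for e, (p, c) in edges.items():          # then produce outputs
            if e[0] == op:
                tokens[e] += p
                peak[e] = max(peak[e], tokens[e])
    return sum(peak.values()), peak

# Assumed rates: A emits 2, B consumes 1 and emits 2, C consumes 1.
EDGES = {("A", "B"): (2, 1), ("B", "C"): (2, 1)}
for seq in ["ABCBCCC", "ABCCBCC", "ABBCCCC"]:
    print(seq, buffer_cost(seq, EDGES)[0])   # prints 5, then 4, then 6
```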
Scheduling (Minimum Buffer)
F = fireable operators
D = deferrable(F) = operators in F whose output edge already has enough tokens to fire its sink
While (F ≠ ∅)
– if ((F − D) ≠ ∅) fire an operator from F − D
– else fire the operator that increases the number of tokens least
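Buffer Minimization
Repeat: 1 A, 2 B, 4 C
– F = {A}, D = ∅: fire A
– F = {B}, D = ∅: fire B
– F = {B, C}, D = {B}: fire C
– F = {B, C}, D = {B}: fire C
– F = {B}, D = ∅: fire B
– then C, C
Result: ABCCBCC, the buffer cost 4 sequence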
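A sketch of the heuristic as a Python function. Two details are assumptions rather than the slide's. First, "deferrable" is read as "every output edge already holds enough tokens to fire its sink." Under this reading an operator with no outputs, like C, is never deferrable, which matches D = {B} in the trace. Second, ties within F − D go to the first operator in the repetition vector's order. The rates (A emits 2, B consumes 1 and emits 2, C consumes 1) are also assumed.

```python
def min_buffer_schedule(edges, reps):
    """edges: dict (src, dst) -> (produced, consumed); reps: repetition vector.
    F: operators with repetitions left and enough input tokens.
    D: operators in F whose every output edge can already fire its sink.
    Fire from F - D if possible, else the operator in F adding the fewest tokens.
    Returns the firing order (partial if execution deadlocks)."""
    tokens = {e: 0 for e in edges}
    remaining = dict(reps)
    order = []

    def fireable(op):
        return remaining[op] > 0 and all(
            tokens[e] >= c for e, (p, c) in edges.items() if e[1] == op)

    def deferrable(op):
        outs = [e for e in edges if e[0] == op]
        return bool(outs) and all(tokens[e] >= edges[e][1] for e in outs)

    def net_tokens(op):
        produced = sum(p for e, (p, c) in edges.items() if e[0] == op)
        consumed = sum(c for e, (p, c) in edges.items() if e[1] == op)
        return produced - consumed

    while True:
        F = [op for op in reps if fireable(op)]
        if not F:
            break
        D = [op for op in F if deferrable(op)]
        candidates = [op for op in F if op not in D]
        op = candidates[0] if candidates else min(F, key=net_tokens)
        for e, (p, c) in edges.items():
            if e[1] == op:
                tokens[e] -= c
            if e[0] == op:
                tokens[e] += p
        remaining[op] -= 1
        order.append(op)
    return "".join(order)

EDGES = {("A", "B"): (2, 1), ("B", "C"): (2, 1)}
print(min_buffer_schedule(EDGES, {"A": 1, "B": 2, "C": 4}))  # ABCCBCC
```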
SDF → BDF
What is SDF missing?
– SDF restricts the range of expression
– but it allows static scheduling
SDF → BDF
Sufficient addition
– BDF (Boolean dataflow) = SDF + switch and select operators
– BDF is Turing complete
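A minimal sketch of the two BDF operators over Python deques, with an if-then-else (absolute value) built from them. The function names and the queue representation are hypothetical. The number of tokens switch sends to each output depends on the control values, and this is exactly what breaks the fixed-rate SDF restriction.

```python
from collections import deque

def switch(data, control, out_true, out_false):
    """Consume one data token and one Boolean control token and route the
    data token to out_true or out_false. Returns False if it cannot fire."""
    if not data or not control:
        return False
    target = out_true if control.popleft() else out_false
    target.append(data.popleft())
    return True

def select(in_true, in_false, control, out):
    """Consume one control token, then one token from the input it names.
    Returns False (consuming nothing) if that input has no token yet."""
    if not control:
        return False
    source = in_true if control[0] else in_false
    if not source:
        return False
    control.popleft()
    out.append(source.popleft())
    return True

def abs_stream(values):
    """If-then-else as dataflow on a list of numbers: negative values take the
    'true' branch and are negated; select restores the original order."""
    data = deque(values)
    ctl_switch = deque(v < 0 for v in values)    # control stream, forked to both operators
    ctl_select = deque(v < 0 for v in values)
    took_true, took_false, negated, out = deque(), deque(), deque(), deque()
    while switch(data, ctl_switch, took_true, took_false):
        pass
    negated.extend(-v for v in took_true)        # body of the 'true' branch
    while select(negated, took_false, ctl_select, out):
        pass
    return list(out)

print(abs_stream([3, -1, -4, 2]))  # [3, 1, 4, 2]
```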
Expression: Block Diagram
Ptolemy example from Buck ’94
Expression: Stream Language
Function AveragePairs(D: Signal returns Signal)
  stream integer [(D[0]+D[1])/2] || AveragePairs(stream_rest(D))
Example from Dennis ’94
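A Python generator can mimic the recursive stream definition. This is a literal reading of the code above: stream_rest(D) drops only the first element, so successive outputs average overlapping pairs (D[0], D[1]), (D[1], D[2]), and so on. After the first output, each step consumes one new token and emits one. That is fixed-rate behavior of the sort SDF can express. Floor division stands in for the language's integer division and matches truncation only for non-negative sums.

```python
from itertools import count, islice

def average_pairs(d):
    """Emit (D[0]+D[1])//2, then continue on stream_rest(D) (D without its
    first element), written as a loop instead of recursion.
    Stops when fewer than two elements remain."""
    it = iter(d)
    prev = next(it, None)
    if prev is None:
        return
    for cur in it:
        yield (prev + cur) // 2
        prev = cur

print(list(average_pairs([2, 4, 6, 8])))            # [3, 5, 7]
print(list(islice(average_pairs(count(0, 2)), 4)))  # [1, 3, 5, 7], taken from an infinite stream
```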
Convert to Static Data Flow
Composition of Stream Operators
Function Process(D: ImageStream, w: integer returns MarkStream)
  let R := for I in 1, w
             return array of FourForThree(AveragePairs(D[I]))
           end for
  in PeakDetect(TwoDimFilter(R, w))
  end let
end function
Adapting
How is this different?
– Expensive to change operators
– Possibility of spatial pipelining of operators
  – operator AT (area-time) tradeoffs
  – operator copies
– Allow dynamic rates… which violates fixed firing rates
SDF: Timeslice
– Run multiples of the repetition/firing schedule
  – valid for acyclic graphs
  – requires greater buffering
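A sketch of the buffering cost under one reading of the slide. To amortize the cost of loading operators, each timeslice runs k periods, firing each operator k times its repetition count back to back. The chain is assumed to be A → B → C, with A emitting 2, B consuming 1 and emitting 2, and C consuming 1. On it, total peak buffering grows linearly in k. In a cyclic graph, firing an operator k times in a row may need more tokens than the cycle holds, which is consistent with the restriction to acyclic graphs.

```python
def blocked_schedule(reps, order, k):
    """Fire each operator k * reps[op] times in a row, in the given topological order."""
    return [op for op in order for _ in range(k * reps[op])]

def peak_tokens(sequence, edges):
    """edges: dict (src, dst) -> (produced, consumed). Returns each edge's peak occupancy."""
    tokens = {e: 0 for e in edges}
    peak = {e: 0 for e in edges}
    for op in sequence:
        for e, (p, c) in edges.items():
            if e[1] == op:
                tokens[e] -= c
            if e[0] == op:
                tokens[e] += p
                peak[e] = max(peak[e], tokens[e])
    return peak

EDGES = {("A", "B"): (2, 1), ("B", "C"): (2, 1)}
REPS = {"A": 1, "B": 2, "C": 4}
for k in (1, 2, 4):
    print(k, sum(peak_tokens(blocked_schedule(REPS, "ABC", k), EDGES).values()))
# prints total buffering 6, 12, 24 for k = 1, 2, 4
```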
SDF: Spatial
– Can realize the graph spatially
– The repetition/firing schedule
  – gives relative throughput rates
  – in simple cases, suggests area-throughput points
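A rough sketch of how a repetition vector might map to area-throughput points. It rests on hypothetical assumptions:

– each copy of an operator performs one firing every cycles_per_firing[op] cycles
– copies share work perfectly
– buffer and interconnect area are ignored

To finish one period of the schedule every `period` cycles, an operator then needs ceil(reps[op] · cycles_per_firing[op] / period) copies.

```python
def copies_for_period(reps, cycles_per_firing, period):
    """Copies of each operator needed to complete one schedule period every
    `period` cycles (integer ceiling of work divided by period)."""
    return {op: (reps[op] * cycles_per_firing[op] + period - 1) // period for op in reps}

REPS = {"A": 1, "B": 2, "C": 4}
ONE_CYCLE = {"A": 1, "B": 1, "C": 1}       # hypothetical: every firing takes one cycle
for period in (4, 2, 1):
    copies = copies_for_period(REPS, ONE_CYCLE, period)
    print(period, copies, "area =", sum(copies.values()))
```

Under these assumptions the three points are 3 operators for one period every 4 cycles, 4 operators for one period every 2 cycles, and 7 operators for one period per cycle.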
Dynamic
– Note that adding switch/select gives general, dynamic dataflow
– Suggests we can identify
  – static regions (which obey the SDF restrictions)
  – dynamic boundaries (where dynamic operators exist)
– Schedule the static regions statically
– Use dynamic control at the boundaries/invocations of static blocks
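A minimal sketch of the partitioning step, assuming each operator is already labeled static or dynamic (for example, switch and select are dynamic). Static regions are taken to be the connected components of the subgraph of static operators. Dynamic boundaries are the edges that touch a dynamic operator. The function and the labeling scheme are hypothetical.

```python
def static_regions(operators, edges):
    """operators: dict name -> True if dynamic, False if static.
    edges: list of (src, dst).
    Returns (regions, boundary): connected components of static operators,
    and the edges entering or leaving a dynamic operator."""
    static = [n for n, dynamic in operators.items() if not dynamic]
    neighbors = {n: set() for n in static}
    boundary = []
    for s, d in edges:
        if operators[s] or operators[d]:
            boundary.append((s, d))           # crosses into or out of dynamic control
        else:
            neighbors[s].add(d)
            neighbors[d].add(s)
    regions, seen = [], set()
    for n in static:
        if n in seen:
            continue
        stack, region = [n], set()
        while stack:                          # depth-first walk over static edges
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            region.add(u)
            stack.extend(neighbors[u] - seen)
        regions.append(region)
    return regions, boundary

ops = {"A": False, "B": False, "SW": True, "C": False, "D": False,
       "E": False, "SEL": True, "F": False}   # True marks a dynamic operator
edges = [("A", "B"), ("B", "SW"), ("SW", "C"), ("C", "D"), ("D", "SEL"),
         ("SW", "E"), ("E", "SEL"), ("SEL", "F")]
regions, boundary = static_regions(ops, edges)
print(sorted(sorted(r) for r in regions))  # [['A', 'B'], ['C', 'D'], ['E'], ['F']]
```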
Dynamic Flow Rates
– Cannot schedule completely at compile time
– Use feedback to get the expected flow rate
  – schedule like SDF
  – track data presence at dynamic boundaries
  – allow additional buffer space (overflow)
  – stall the slower operator as necessary
  – carefully check for possible deadlock conditions
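A hypothetical model of a single dynamic boundary. The producer emits data-dependent bursts, given here as a fixed list. The consumer takes one token per step. The FIFO capacity stands for the expected buffer plus the overflow allowance. When its next burst does not fit, the producer stalls. A burst larger than the whole FIFO can never fit, so the boundary deadlocks. That is the only deadlock condition this sketch models.

```python
def run_boundary(bursts, capacity, max_steps=10_000):
    """bursts: tokens the producer emits on successive firings.
    The consumer removes one token per step; the producer stalls when its next
    burst does not fit. Returns a summary; done=False signals deadlock
    (or running out of steps)."""
    fifo = peak = stalls = consumed = 0
    i = 0
    for _ in range(max_steps):
        if i == len(bursts) and fifo == 0:
            return {"done": True, "stalls": stalls, "peak": peak, "consumed": consumed}
        if i < len(bursts):
            if bursts[i] > capacity:          # can never fit: the boundary deadlocks
                break
            if fifo + bursts[i] <= capacity:  # room available: producer fires
                fifo += bursts[i]
                i += 1
            else:                             # not enough room: stall the producer
                stalls += 1
        peak = max(peak, fifo)
        if fifo:                              # consumer fires if a token is present
            fifo -= 1
            consumed += 1
    return {"done": False, "stalls": stalls, "peak": peak, "consumed": consumed}

print(run_boundary([2, 2, 2], capacity=3))  # {'done': True, 'stalls': 1, 'peak': 3, 'consumed': 6}
print(run_boundary([3], capacity=2))        # {'done': False, 'stalls': 0, 'peak': 0, 'consumed': 0}
```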
Summary
– The stream datatype captures computational structure
  – good for spatial implementations
  – exposes parallelism
– Rich experience in DF/DSP to exploit
– Static scheduling is powerful where applicable
– Can still help schedule “mostly static” cases