Download presentation
Presentation is loading. Please wait.
Published byEmmeline Goodwin Modified over 9 years ago
1
Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney
2
Parallel Programming Gap Uniprocessor hits the physical limit Multicore, many core systems Programming challenges now Multiple flows of control and memories Finding algorithm that can run in parallel Correctness of the program Synchronization Issues Debugging the program 2
3
3 Stream Programming Paradigm General purpose programming paradigm Suitable for high performance applications Inherent parallelism Two types of elements in a program: Actors: computation elements Streams: channels to ship data Stream graph: composition of actors and streams Input and output: stream Actor Input channels Output channels
4
StreamIt Language StreamIt: stream programming language Architecture independent Modular parallel computation may be any StreamIt language construct joiner splitter pipeline feedback loop joiner splitter splitjoin filter 4
5
5 A Simple Filter in StreamIt int->int filter A { init { // empty } work pop 1 peek 2 push 1 { push(peek(0) + peek(1)); pop(); } A
6
StreamIt Example float->float pipeline Example() { add A(); add B(); add splitjoin { split duplicate; add D(); add E(); join roundrobin; } add G(); add H(); } 6 E F A B C split join D z 1 11 1 11 1 1
7
Research Question? 7 G H A B D split join E z 1 11 1 11 1 1 ? Parallel System Proc-1 Proc-2 Proc-N
8
8 8 Static Translation of Stream Programs We propose Novel steady state analysis describing the output bandwidth of an actor depending on the input of the stream program Quantitative analysis to resolve bottlenecks in stream programs Algorithm for mapping actors to parallel system Scheduling mapped actors to processor Our goal To statically optimize the throughput of a stream program on a parallel system
9
99 Overview of the Proposed System Finding a Closed Form for Steady State Resolve Bottleneck Find Mapping by ILP or Approximation Schedule Actors for Processors
10
Bandwidth Transfer Model Synchronous model Fundamental assumption: Actors deliver constant output bandwidth given a constant input bandwidth Output bandwidth of actor is normalized (scaled btw. 0..1) Bandwidth of channel (i,j): output bandwidth of actor i multiplied with weight w ij 10 i j w ij x i k Finding a Closed Form for Steady State
11
Simultaneous Linear Functional Equations System Bandwidth transfer functions: Resulting in a simultaneous functional equation system: 11 Finding a Closed Form for Steady State
12
Solving Equation System Solve by Gaussian-Style equation solver Bandwidth variables are successively eliminated Substitution Loop-breaking Solution of the system is a linear function for each actor, which does not depend on the predecessors depends only on the input bandwidth “z” Finding a Closed Form for Steady State 12
13
13 Mapping Actors to Processors An NP-hard problem Takes too long time Approximation algorithm Much faster O(n log n) Non-optimal solution We formulate Integer Linear Programming problem considering Input, output and processing bandwidths of each actor Processing capacity of each processor Inter processor communication bandwidths P1 P2 P3 G H A B D split join E z 1.0 0.5 1.0 Find Mapping by ILP or Approximation
14
Find Optimal Solution by Binary Search Developing a test for a given input bandwidth “z” of stream program Binary Search If solution is not feasible at the mid point, new upper bound for z If solution is feasible at the mid point, new lower bound for z 0 z sys 1 Solution space mid left right 14 Find Mapping by ILP or Approximation
15
Test for a given z: Simple Model is a 0-1 binary variable 1 ≤ i ≤ n, 1 ≤ j ≤ p for all i, 1 ≤ i ≤ n …… (1) for all j, 1 ≤ j ≤ p …… (2) where 15 Find Mapping by ILP or Approximation
16
Bottleneck Analysis Constraints limiting input bandwidth of the whole program Input, output and processing bandwidth constraints Bottleneck-free if System BW < Actor BW If System BW > Actor BW then there is a bottleneck in the system 16 Resolve Bottleneck 0 System BW 1 < Actor BW
17
Region Duplication 17 Resolve Bottleneck
18
18 Overview of the Proposed System Finding a Closed Form for Steady State Resolve Bottleneck Find Mapping by ILP or Approximation Schedule Actors for Processors
19
19 Example Program
20
Mapping to Processors 20
21
21 Execution Time of CPLEX
22
Execution Time of an Approx. Algorithm 22
23
23 Optimal vs. Approximation
24
Summary Synchronous model for stream programs Novel steady state model Statically optimize the throughput of stream programs Resolving bottleneck by simple quantitative analysis Finding an approximation for the mapping problem 24
25
Related Works [1] Static Scheduling of SDF Programs for DSP [Lee ‘87] [2] StreamIt: A language for streaming applications [Thies ‘02] [3] Phased Scheduling of Stream Programs [Thies ’03] [4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies ‘06] [5] Orchestrating the Execution of Stream Programs on Cell [Scott ’08] [6] Software Pipelined Execution of Stream Programs on GPUs [Udupa‘09] [7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa ‘09] 25
26
Future Plan to Complete YearDurationMilestones 20092 monthsPrepare for publication: Solving SLFE system, ILP model, Approx. Algorithm, Bottleneck resolving technique 2010 3 months Design approximation/heuristic algorithm for actor placement with communication constraints between processing elements 4 monthsExperiment on different benchmarks and compare the results 3 monthsInvestigate new scheduling techniques for stream programs 2 monthsFurther publications covering both topics (at least 2) 2011 4 monthsDesign a small stream programming language that uses our proposed technique 6 monthsImplement the small stream programming language that uses our proposed technique 2 monthsFurther publications 20123 monthsWrite up PhD thesis and further publication 26
27
Questions? 27
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.