Communication Overhead Estimation on Multicores S. M. Farhad The University of Sydney Joint work with Yousun Ko Bernd Burgstaller Bernhard Scholz.

1 Communication Overhead Estimation on Multicores
S. M. Farhad, The University of Sydney
Joint work with Yousun Ko, Bernd Burgstaller, and Bernhard Scholz

2 Outline
Motivation
  - Multicore trend
  - Stream programming
Profiling communication overhead
Related works

3 Motivation
[Figure: multicore trend (axis: # cores/chip) shown with programming languages and models: C/C++/Java, CUDA, X10, Peakstream, Fortress, Accelerator, Ct, CTM, Rstream, Rapidmind, Stream Programming. Courtesy: Scott '08]

4 Stream Programming Paradigm
Programs expressed as stream graphs
  - Streams: infinite sequences of data elements
  - Actors: functions applied to streams
[Figure: stream graph with the labels Actor and Stream]
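
As a minimal sketch of the paradigm (not taken from the slides), a stream can be modelled as an unbounded Python iterator and an actor as a function that consumes one stream and produces another. The names `source`, `scale`, and `add_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import count, islice

def source():
    """A stream: an unbounded sequence of data elements (0, 1, 2, ...)."""
    return count(0)

def scale(stream, factor):
    """An actor that pops one item and pushes one item per firing."""
    for item in stream:
        yield item * factor

def add_pairs(stream):
    """An actor that pops two items and pushes their sum per firing."""
    iterator = iter(stream)
    for first, second in zip(iterator, iterator):
        yield first + second

# Wire the actors into a small stream graph and look at a finite prefix.
first_three = list(islice(add_pairs(scale(source(), 2)), 3))
print(first_three)
```

Each actor only sees its input stream, so the producer/consumer dependencies are explicit in how the functions are wired together.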

5 Properties of Stream Programs
Regular and repeating computation
Independent actors with explicit communication
  - Producer / consumer dependencies
[Figure: example stream graph with the actors AtoD, FMDemod, Splitter, LPF 1, LPF 2, LPF 3, HPF 1, HPF 2, HPF 3, Joiner, Adder, Speaker]

6 StreamIt Language
An implementation of stream programming
Hierarchical structure
Each construct has a single input stream and a single output stream
[Figure: the constructs filter, pipeline, splitjoin (splitter, parallel computation, joiner), and feedback loop (joiner, splitter); each nested block may be any StreamIt language construct]
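
The hierarchical composition can be illustrated with a rough Python analogue. This is not StreamIt syntax: `make_filter`, `pipeline`, and `splitjoin` are hypothetical helpers, the splitter is assumed to duplicate its input, the joiner is assumed to be round-robin, and the feedback loop is left out.

```python
from itertools import count, islice, tee

def make_filter(fn):
    """Basic construct: apply fn to every item of the input stream."""
    def run(stream):
        return map(fn, stream)
    return run

def pipeline(*stages):
    """Chain constructs so each stage's output feeds the next stage."""
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run

def splitjoin(*branches):
    """Duplicate the input to every branch, then join round-robin."""
    def run(stream):
        copies = tee(stream, len(branches))
        outputs = [branch(copy) for branch, copy in zip(branches, copies)]
        for items in zip(*outputs):
            yield from items
    return run

# Every construct maps one input stream to one output stream,
# so constructs nest freely inside one another.
graph = pipeline(
    make_filter(lambda x: x + 1),
    splitjoin(make_filter(lambda x: x * 10), make_filter(lambda x: -x)),
)
first_four = list(islice(graph(count(0)), 4))
print(first_four)
```

Because `pipeline` and `splitjoin` return the same kind of object as `make_filter`, any of them can appear wherever a stage or branch is expected.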

7 How to Estimate the Communication Overhead?

8 Problems in Measuring Communication Overhead
Reasons:
  - Multicores are non-communication-exposed architectures
  - Complex cache hierarchy
  - Cache coherence protocols
Consequence:
  - The communication cost cannot be measured directly
  - Estimate the communication cost by measuring the execution time of actors

9 Measuring the Communication Overhead of an Edge
[Figure: actors i and k both on Processor 1 (no communication cost) versus actors i and k split across Processor 1 and Processor 2 (with communication cost)]
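
One way to read this slide is that the overhead of an edge is estimated as the difference between the actors' execution time when they are split across processors and when they share a processor. The sketch below assumes exactly that; the function name, the use of a median over repeated runs, and all timing values are hypothetical.

```python
from statistics import median

def estimate_edge_overhead(colocated_runs, split_runs):
    """Estimate the cost of edge i -> k from repeated timings.

    colocated_runs: execution times of actors i and k on one processor.
    split_runs:     execution times of the same actors on two processors.
    The median is an assumed way to damp measurement noise.
    """
    return median(split_runs) - median(colocated_runs)

# Hypothetical timings in milliseconds per steady-state iteration.
colocated = [10.2, 10.0, 10.1]
split = [12.6, 12.4, 12.5]
overhead = estimate_edge_overhead(colocated, split)
print(f"estimated overhead: {overhead:.1f} ms")
```

The estimate is indirect: it attributes the whole slowdown to the edge, which is only reasonable if nothing else changes between the two placements.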

10 How to Minimize the Required Number of Experiments
Pipeline A, B, C with edges 1 and 2: graph coloring requires 2+1 experiments
[Figure: a longer pipeline partitioned over Processor 1 and Processor 2 in two ways, once with the even edges across the partition and once with the odd edges across the partition]
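
The even/odd partitioning of a pipeline can be sketched as follows. The helper names are hypothetical, edges are numbered from 1 along the pipeline, and reading "2+1" as two colored experiments plus one baseline run on a single processor is an assumption.

```python
def pipeline_partition(actors, parity):
    """Place pipeline actors on processors 1 and 2 so that exactly the
    edges whose 1-based index has the given parity (0 even, 1 odd) cross."""
    processor = 1
    placement = {actors[0]: processor}
    # Edge `index` connects actors[index - 1] and actors[index].
    for index in range(1, len(actors)):
        if index % 2 == parity:
            processor = 3 - processor  # switch processor across this edge
        placement[actors[index]] = processor
    return placement

def crossing_edges(actors, placement):
    """Return the indices of edges whose endpoints sit on different processors."""
    return [i for i in range(1, len(actors))
            if placement[actors[i - 1]] != placement[actors[i]]]

actors = list("ABCDEF")
even_placement = pipeline_partition(actors, parity=0)
odd_placement = pipeline_partition(actors, parity=1)
print(crossing_edges(actors, even_placement))
print(crossing_edges(actors, odd_placement))
```

However long the pipeline is, two placements are enough for every edge to cross the partition once, and no two crossing edges share an actor within the same experiment.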

11 Obs. 1: There is no loop of three actors in a stream graph
[Figure: actors i, k, and l placed on Processor 1 and Processor 2]

12 Obs. 2: There is no interference of adjacent nodes between edges
[Figure: stream graph with actors A, B, C, D, E, F; the blue colored edges are placed across processors P-1, P-2, P-3, P-4]

13 Remove Interference
Convert to a line graph
Add interference edges
Use a vertex coloring algorithm
[Figure: stream graph with actors A, B, C, D, E, F and its line graph with vertices AB, BC, BD, CE, DE, EF]
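
The three steps can be sketched on the example graph with edges AB, BC, BD, CE, DE, EF. Two points are assumptions rather than statements from the slides: an interference edge is added between two stream edges whenever an endpoint of one is adjacent to an endpoint of the other, and a simple greedy heuristic stands in for the unspecified vertex coloring algorithm.

```python
from itertools import combinations

def conflict_graph(edges):
    """Line graph of the stream graph plus assumed interference edges."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    conflicts = {edge: set() for edge in edges}
    for e1, e2 in combinations(edges, 2):
        shares_actor = bool(set(e1) & set(e2))  # line-graph adjacency
        # Assumed interference rule: endpoints of e1 and e2 are neighbours.
        interferes = any(b in neighbours[a] for a in e1 for b in e2)
        if shares_actor or interferes:
            conflicts[e1].add(e2)
            conflicts[e2].add(e1)
    return conflicts

def greedy_coloring(conflicts):
    """Give each vertex the smallest color unused by its colored neighbours."""
    colors = {}
    for vertex in sorted(conflicts, key=lambda v: -len(conflicts[v])):
        used = {colors[n] for n in conflicts[vertex] if n in colors}
        colors[vertex] = next(c for c in range(len(conflicts)) if c not in used)
    return colors

stream_edges = [("A", "B"), ("B", "C"), ("B", "D"),
                ("C", "E"), ("D", "E"), ("E", "F")]
conflicts = conflict_graph(stream_edges)
colors = greedy_coloring(conflicts)
for edge, color in colors.items():
    print("".join(edge), color)
```

Edges that receive the same color can be measured in the same experiment. Under the assumed rule, AB and EF are the only pair in this graph that share a color.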

14 Processor Leveling Graph
[Figure: stream graph with actors A, B, C, D, E, F; for the blue colored edges, the processor leveling graph has the nodes A; B, C, D, E; and F]

15 Coloring the Processor Leveling Graph
[Figure: the leveling graph nodes A; B, C, D, E; and F colored with Processor 1 and Processor 2]
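
A possible construction, inferred from the example and not confirmed by the slides: actors joined by edges that are not being measured are merged into one node, the measured edges connect the merged nodes, and a two-coloring of that graph gives the processor assignment. The function names are hypothetical, and the sketch assumes that two processors suffice.

```python
from collections import deque

def processor_leveling_graph(actors, edges, measured):
    """Merge actors linked by unmeasured edges; return groups and links."""
    parent = {actor: actor for actor in actors}

    def find(actor):
        while parent[actor] != actor:
            parent[actor] = parent[parent[actor]]  # path halving
            actor = parent[actor]
        return actor

    for u, v in edges:
        if (u, v) not in measured:
            parent[find(u)] = find(v)
    groups = {}
    for actor in actors:
        groups.setdefault(find(actor), []).append(actor)
    links = {(find(u), find(v)) for u, v in measured}
    return groups, links

def assign_processors(groups, links):
    """Two-color the leveling graph so every measured edge crosses processors."""
    adjacent = {group: set() for group in groups}
    for g, h in links:
        adjacent[g].add(h)
        adjacent[h].add(g)
    processor = {}
    for start in groups:
        if start in processor:
            continue
        processor[start] = 1
        queue = deque([start])
        while queue:
            g = queue.popleft()
            for h in adjacent[g]:
                if h not in processor:
                    processor[h] = 3 - processor[g]
                    queue.append(h)
                elif processor[h] == processor[g]:
                    raise ValueError("leveling graph is not 2-colorable")
    return {a: processor[g] for g, members in groups.items() for a in members}

actors = list("ABCDEF")
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
blue = {("A", "B"), ("E", "F")}  # edges measured in this experiment
groups, links = processor_leveling_graph(actors, edges, blue)
assignment = assign_processors(groups, links)
print(sorted(groups.values()))
print(assignment)
```

For the blue edges AB and EF this yields the three nodes A; B, C, D, E; and F, with A and F sharing one processor and B, C, D, E on the other, so only the measured edges cross the partition.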

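16 Measuring the Communication Cost
[Figure: for the blue colored edges, the stream graph with actors A, B, C, D, E, F is mapped through the leveling graph nodes A; B, C, D, E; and F onto Processor 1 and Processor 2]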

17 Profiling Performance
Benchmark    Total Edges  Prof. Steps  Steps/Edge (%)  Err (%)
SAR                   44            3               7       10
MatrixMult            88           21              24       17
MergeSort             37            4              11       31
FMRadio               21            3              14       24
DCT                   28            9              32       14
RadixSort             12            2              17        5
FFT                   26            3              12       27
MPEG                  56           17              30       15
Channel               22            6              27       11
BeamFormer            39            5              13
GM                                                17%      15%
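
The Steps/Edge column and its GM entry can be rechecked from the edge and step counts. The sketch assumes that Steps/Edge is profiling steps divided by total edges, expressed as a percentage, and that GM stands for the geometric mean; the variable names are illustrative.

```python
from math import exp, log

# (benchmark, total edges, profiling steps) as listed on the slide.
rows = [
    ("SAR", 44, 3), ("MatrixMult", 88, 21), ("MergeSort", 37, 4),
    ("FMRadio", 21, 3), ("DCT", 28, 9), ("RadixSort", 12, 2),
    ("FFT", 26, 3), ("MPEG", 56, 17), ("Channel", 22, 6),
    ("BeamFormer", 39, 5),
]

def geometric_mean(values):
    """Geometric mean computed through logarithms."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))

# Assumed definition: profiling steps as a percentage of total edges.
steps_per_edge = {name: 100 * steps / total for name, total, steps in rows}
gm = geometric_mean(steps_per_edge.values())

for name, percent in steps_per_edge.items():
    print(f"{name:<11}{percent:5.1f}")
print(f"GM {gm:.1f}")
```

Under these assumptions the geometric mean rounds to the 17% shown in the GM row, meaning that on average the number of profiling experiments is about one sixth of the number of edges.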

18 Related Works
[1] Static Scheduling of SDF Programs for DSP [Lee '87]
[2] StreamIt: A language for streaming applications [Thies '02]
[3] Phased Scheduling of Stream Programs [Thies '03]
[4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies '06]
[5] Orchestrating the Execution of Stream Programs on Cell [Scott '08]
[6] Software Pipelined Execution of Stream Programs on GPUs [Udupa '09]
[7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa '09]
[8] Orchestration by approximation [Farhad '11]

19 Questions?

