Published by Gabriel Washington. Modified over 9 years ago.
1
Communication Overhead Estimation on Multicores
S. M. Farhad, The University of Sydney
Joint work with Yousun Ko, Bernd Burgstaller, and Bernhard Scholz
2
Outline
  Motivation
  Multicore trend
  Stream programming
  Profiling communication overhead
  Related works
3
Motivation
[Figure: number of cores per chip (# cores/chip), annotated with programming models: C/C++/Java, CUDA, X10, Peakstream, Fortress, Accelerator, Ct, CTM, Rstream, Rapidmind; highlighted: Stream Programming. Courtesy: Scott ’08]
4
Stream Programming Paradigm
  Programs are expressed as stream graphs
  Streams: infinite sequences of data elements
  Actors: functions applied to streams
[Figure: stream graph with an actor and a stream labeled]
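The three definitions above can be illustrated with a minimal Python sketch, in which generators stand in for actors and lazy iterators stand in for streams. The actor names `source`, `scale`, and `pairwise_sum` are hypothetical and chosen only for illustration; they do not come from the slides or from StreamIt.

```python
from itertools import count, islice


def source():
    """Actor with no input: emits the unbounded stream 0, 1, 2, ..."""
    yield from count()


def scale(stream, factor):
    """Actor: multiplies every element of its input stream by `factor`."""
    for x in stream:
        yield x * factor


def pairwise_sum(stream):
    """Actor: consumes two elements per firing and produces their sum."""
    it = iter(stream)
    for a, b in zip(it, it):
        yield a + b


# Stream graph: source -> scale -> pairwise_sum
graph = pairwise_sum(scale(source(), 3))

# The stream is infinite, so only a finite prefix is ever materialized.
first_outputs = list(islice(graph, 4))
print(first_outputs)
```

Each actor only sees its input stream and its output stream, which is the explicit producer/consumer structure that a stream graph makes visible.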
5
Properties of Stream Programs
  Regular and repeating computation
  Independent actors with explicit communication
  Producer/consumer dependencies
[Figure: example stream graph with actors AtoD, FMDemod, Splitter, LPF 1, LPF 2, LPF 3, HPF 1, HPF 2, HPF 3, Joiner, Adder, Speaker]
6
StreamIt Language
  An implementation of stream programming
  Hierarchical structure
  Each construct has a single input/output stream
[Figure: StreamIt constructs: filter; pipeline; splitjoin (splitter, parallel computation, joiner); feedback loop (joiner, splitter). Each nested element may be any StreamIt language construct]
7
How to Estimate the Communication Overhead?
8
Problems in Measuring Communication Overhead
  Reasons:
    Multicores are not communication-exposed architectures
    Complex cache hierarchy
    Cache coherence protocols
  Consequence:
    The communication cost cannot be measured directly
    Estimate the communication cost by measuring the execution time of actors
9
Measuring the Communication Overhead of an Edge
[Figure: actors i and k connected by an edge. No communication cost: both actors on Processor 1. With communication cost: the two actors split across Processor 1 and Processor 2]
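One plausible reading of this slide is that the overhead of an edge is estimated from the difference between two timed runs: one with both actors on the same processor and one with the actors on different processors. The sketch below assumes exactly that; the function name `estimate_edge_overhead`, the use of medians, and the timing values are all hypothetical and are not taken from the slides.

```python
from statistics import median


def estimate_edge_overhead(times_same_processor, times_split):
    """Estimate an edge's communication overhead from repeated timings.

    Assumption: overhead is the difference between the median execution time
    with the two actors on different processors and the median execution
    time with both actors on one processor, floored at zero.
    """
    return max(0.0, median(times_split) - median(times_same_processor))


# Made-up timings (arbitrary units), for illustration only.
same = [10.1, 10.0, 10.3]
split = [12.4, 12.6, 12.5]

overhead = estimate_edge_overhead(same, split)
print(f"estimated overhead: {overhead:.2f}")
```

Medians are used here only to damp measurement noise; any robust summary of repeated runs would serve the same purpose.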
10
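How to Minimize the Required Number of Experiments
  Pipeline graph coloring requires 2+1 experiments
[Figure: pipeline A, B, C with edges 1 and 2. Pipeline A to F with edges 1 to 5 mapped onto Processor 1 and Processor 2 so that the even edges cross the partition. Pipeline A to E with edges 1 to 4 mapped onto Processor 1 and Processor 2 so that the odd edges cross the partition]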
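The even/odd idea can be sketched as follows, assuming that "2+1" means two partitioned runs (even edges crossing, then odd edges crossing) plus one run with all actors on a single processor. The helpers `pipeline_placement` and `crossing_edges` are hypothetical names introduced for this illustration.

```python
def pipeline_placement(num_actors, parity):
    """Return a processor id (0 or 1) for each actor of a pipeline.

    Edge j (1-based) connects actor j-1 to actor j. The placement switches
    processors exactly on the edges whose number has the given parity
    (parity=0: even edges cross, parity=1: odd edges cross).
    """
    placement = [0]
    for edge in range(1, num_actors):
        crosses = 1 if edge % 2 == parity else 0
        placement.append(placement[-1] ^ crosses)
    return placement


def crossing_edges(placement):
    """List the 1-based edge numbers whose endpoints sit on different processors."""
    return [j for j in range(1, len(placement)) if placement[j - 1] != placement[j]]


# Six actors A..F with edges 1..5.
baseline = [0] * 6                        # run 0: no edge crosses
even_run = pipeline_placement(6, parity=0)
odd_run = pipeline_placement(6, parity=1)

print(even_run, crossing_edges(even_run))
print(odd_run, crossing_edges(odd_run))
```

Under this reading, every edge of the pipeline crosses the partition in exactly one of the two partitioned runs, so three runs cover all edges regardless of pipeline length.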
11
Obs. 1: There is no loop of three actors in a stream graph
[Figure: actors i, k, and l placed on Processor 1 and Processor 2]
12
Obs. 2: There is no interference of adjacent nodes between edges
[Figure: stream graph with actors A to F mapped onto processors P-1, P-2, P-3, P-4 for the blue-colored edges]
13
Remove Interference
  Convert to a line graph
  Add interference edges
  Use a vertex coloring algorithm
[Figure: stream graph with actors A to F, and its line graph with nodes AB, BC, BD, CE, DE, EF]
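The three steps above can be sketched in Python for the six-edge example on this slide. This is a minimal illustration, not the authors' implementation: the slides do not define which edge pairs interfere, so `interference` is left as an optional, caller-supplied list of pairs, and a simple greedy heuristic stands in for "a vertex coloring algorithm".

```python
from itertools import combinations, count


def line_graph(edges, interference=()):
    """Build the line graph: one node per stream-graph edge.

    Two nodes are adjacent when the corresponding edges share an actor.
    `interference` is an optional list of extra (edge, edge) pairs; how such
    pairs are derived is an assumption left to the caller.
    """
    adj = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            adj[e].add(f)
            adj[f].add(e)
    for e, f in interference:
        adj[e].add(f)
        adj[f].add(e)
    return adj


def greedy_coloring(adj):
    """Give each node the smallest color not used by an already colored neighbor."""
    color = {}
    for node in adj:
        used = {color[n] for n in adj[node] if n in color}
        color[node] = next(c for c in count() if c not in used)
    return color


edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
adj = line_graph(edges)
colors = greedy_coloring(adj)
for edge, c in colors.items():
    print("".join(edge), "-> color", c)
```

Edges that receive the same color are not adjacent in the line graph, so they are candidates for being profiled in the same experiment. Greedy coloring is not guaranteed to use the minimum number of colors.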
14
Processor Leveling Graph
[Figure: stream graph with actors A to F and, for the blue-colored edges, the processor leveling graph with nodes A; B, C, D, E; F]
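The node grouping shown on this slide (A; B, C, D, E; F) is consistent with merging all actors connected by edges that do not have the selected color. The sketch below assumes that construction and assumes the blue edges are AB and EF; neither assumption is stated explicitly on the slide, and `contract_except` is a hypothetical helper name.

```python
def contract_except(actors, edges, selected):
    """Merge actors joined by edges that are not in `selected`.

    Returns the sorted list of merged groups. Uses a small union-find
    structure with path halving.
    """
    parent = {a: a for a in actors}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in edges:
        if (u, v) not in selected:
            parent[find(u)] = find(v)

    groups = {}
    for a in actors:
        groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


actors = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
blue = {("A", "B"), ("E", "F")}  # assumed set of blue-colored edges

groups = contract_except(actors, edges, blue)
print(groups)
```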
15
Coloring the Processor Labelling Graph
[Figure: graph with nodes A; B, C, D, E; F, colored with Processor 1 and Processor 2]
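Assigning the merged nodes to two processors amounts to 2-coloring the merged graph so that the endpoints of every remaining edge land on different processors. The sketch below uses a breadth-first traversal for this; the node label `"BCDE"` and the function `two_color` are hypothetical, and the procedure is an assumed reading of the slide rather than the authors' stated method.

```python
from collections import deque


def two_color(nodes, edges):
    """Assign 0 or 1 to each node so that every edge joins different values.

    Raises ValueError if the graph is not 2-colorable (contains an odd cycle).
    """
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    side = {}
    for start in nodes:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    raise ValueError("graph is not 2-colorable")
    return side


# Merged graph from the slide: A, {B, C, D, E}, F.
nodes = ["A", "BCDE", "F"]
edges = [("A", "BCDE"), ("BCDE", "F")]

assignment = two_color(nodes, edges)
print(assignment)
```

With this assignment both remaining edges cross the processor boundary, so both can be measured in a single experiment.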
16
Measuring the Communication Cost
[Figure: stream graph with actors A to F and, for the blue-colored edges, the graph with nodes A; B, C, D, E; F mapped onto Processor 1 and Processor 2]
17
Profiling Performance
Benchmark    Total Edges  Prof. Steps  Steps/Edge (%)  Err (%)
SAR          44           3            7               10
MatrixMult   88           21           24              17
MergeSort    37           4            11              31
FMRadio      21           3            14              24
DCT          28           9            32              14
RadixSort    12           2            17              5
FFT          26           3            12              27
MPEG         56           17           30              15
Channel      22           6            27              11
BeamFormer   39           5            13
GM                                     17              15
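The Steps/Edge column and its summary row can be reproduced from the first two columns, assuming that Steps/Edge is the number of profiling steps divided by the total number of edges and that GM denotes the geometric mean. The sketch below is only a consistency check on the figures in the table.

```python
from math import prod

# (total edges, profiling steps) per benchmark, copied from the table.
rows = {
    "SAR": (44, 3),
    "MatrixMult": (88, 21),
    "MergeSort": (37, 4),
    "FMRadio": (21, 3),
    "DCT": (28, 9),
    "RadixSort": (12, 2),
    "FFT": (26, 3),
    "MPEG": (56, 17),
    "Channel": (22, 6),
    "BeamFormer": (39, 5),
}

# Steps per edge, in percent.
steps_per_edge = {name: 100 * steps / total for name, (total, steps) in rows.items()}

# Geometric mean over all benchmarks.
gm = prod(steps_per_edge.values()) ** (1 / len(steps_per_edge))

for name, pct in steps_per_edge.items():
    print(f"{name:<11}{pct:5.1f}%")
print(f"{'GM':<11}{gm:5.1f}%")
```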
18
Related Works
[1] Static Scheduling of SDF Programs for DSP [Lee ’87]
[2] StreamIt: A Language for Streaming Applications [Thies ’02]
[3] Phased Scheduling of Stream Programs [Thies ’03]
[4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies ’06]
[5] Orchestrating the Execution of Stream Programs on Cell [Scott ’08]
[6] Software Pipelined Execution of Stream Programs on GPUs [Udupa ’09]
[7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa ’09]
[8] Orchestration by Approximation [Farhad ’11]
19
Questions?