1
StreamX10: A Stream Programming Framework on X10
Haitao Wei
2012-06-14
School of Computer Science, Huazhong University of Science and Technology
2
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
3
Background and Motivation
- Stream programming: a high-level programming model that has been applied productively; however, implementations usually depend on a specific architecture, which makes them difficult to port between platforms.
- X10: a productive parallel programming environment that isolates the details of different architectures and provides a flexible parallel programming abstraction layer for stream programming.
- StreamX10: an attempt to make stream programs portable by building on X10.
4
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
5
COStream Language
- stream: a FIFO queue connecting operators.
- operator: the basic functional unit, an actor node in the stream graph; it has multiple inputs and multiple outputs, window-like pop, peek, and push operations, and init and work functions.
- composite: connected operators, a subgraph of actors; a stream program is composed of composites.
6
COStream and Stream Graph

    composite Main {
      graph
        stream S = Source() {
          state: { int x; }
          init:  { x = 0; }
          work:  { S[0].i = x; x++; }
          window S: tumbling, count(1);
        }
        stream P = MyOp(S) {
          param pn: N
        }
        () as SinkOp = Sink(P) {
          state: { int r; }
          work:  { r = P[0].j; println(r); }
          window P: tumbling, count(1);
        }
    }

    composite MyOp(output Out; input In) {
      param attribute: pn
      graph
        stream Out = Averager(In) {
          work: {
            int sum = 0, i;
            for (i = 0; i < pn; i++) sum += In[i].i;
            Out[0].j = sum / pn;
          }
          window
            In:  sliding, count(10), count(1);
            Out: tumbling, count(1);
        }
    }

(Stream graph from the slide: Source [push=1] -> Averager [peek=10, pop=1, push=1] -> Sink [pop=1], connected by the streams S and P; the legend marks stream, operator, and composite.)
7
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
8
Compilation Flow of StreamX10 (phase: function)
- Front-end: translates the COStream syntax into an abstract syntax tree.
- Instantiation: instantiates the composites hierarchically into static, flattened operators.
- Static Stream Graph: constructs the static stream graph from the flattened operators.
- Scheduling: calculates the initialization and steady-state execution orderings of the operators (see the sketch after this list).
- Partitioning: performs partitioning based on the X10 parallelism model for load balance.
- Code Generation: generates X10 code for COStream programs.
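To make the Scheduling phase concrete, the following is a minimal Java sketch, not the StreamX10 scheduler itself, of how steady-state repetition counts can be derived for a plain operator chain by balancing the tokens produced and consumed on each stream (r[i] * push[i] == r[i+1] * pop[i+1]). The rates follow the Source -> Averager -> Sink example shown earlier; the initialization schedule that prefills the peek window is not shown.

    public class SteadySchedule {
        static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

        // push[i]: tokens produced per firing; pop[i]: tokens consumed per firing.
        static long[] repetitions(long[] push, long[] pop) {
            long[] r = new long[push.length];
            r[0] = 1;
            for (int i = 1; i < r.length; i++) {
                long produced = r[i - 1] * push[i - 1];
                long scale = pop[i] / gcd(produced, pop[i]); // make produced divisible by pop[i]
                for (int j = 0; j < i; j++) r[j] *= scale;   // rescale earlier firings
                r[i] = r[i - 1] * push[i - 1] / pop[i];
            }
            return r;
        }

        public static void main(String[] args) {
            long[] push = {1, 1, 0};   // Source, Averager, Sink
            long[] pop  = {0, 1, 1};
            System.out.println(java.util.Arrays.toString(repetitions(push, pop))); // [1, 1, 1]
        }
    }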
9
The Execution Framework
- The nodes are partitioned among the places.
- Each node is mapped to an activity.
- The nodes execute in a pipelined fashion to exploit parallelism (see the sketch below).
- Local and global FIFO buffers are used for communication.
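A rough Java analogue of this execution model, with plain threads standing in for X10 activities and bounded queues standing in for the FIFO buffers; the operator names, queue sizes, and iteration counts are illustrative only.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipelineSketch {
        public static void main(String[] args) {
            BlockingQueue<Integer> q1 = new ArrayBlockingQueue<>(16); // Source -> Averager
            BlockingQueue<Integer> q2 = new ArrayBlockingQueue<>(16); // Averager -> Sink

            Thread source = new Thread(() -> {
                try {
                    for (int x = 0; x < 100; x++) q1.put(x);         // push
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread averager = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) q2.put(q1.take()); // pop, then push
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread sink = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) System.out.println(q2.take()); // pop
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            source.start(); averager.start(); sink.start();
        }
    }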
10
Work Partition (Inter-place)
- Objective: minimize communication and balance load across places (using METIS).
- Example (figure: a weighted stream graph split across three places): each place receives computation work of 10, so with total work 30 the speedup is 30/10 = 3, and the communication cost across places is 2 (see the sketch below).
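The following Java sketch shows how such a partition can be evaluated: per-place load from node weights, speedup as total work over the heaviest place, and communication as the weight of cross-place edges. The graph below is made up so the totals match the slide's example (total work 30, heaviest place 10, two cross-place edges); it is not the actual figure, and METIS itself is not invoked here.

    import java.util.*;

    public class PartitionMetrics {
        public static void main(String[] args) {
            int[] work  = {10, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2};  // node weights
            int[] place = { 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2};  // node -> place assignment
            int[][] edges = {{0, 1, 1}, {5, 6, 1}};             // {src, dst, weight}

            int total = Arrays.stream(work).sum();
            Map<Integer, Integer> load = new HashMap<>();
            for (int n = 0; n < work.length; n++)
                load.merge(place[n], work[n], Integer::sum);     // accumulate work per place
            int max = Collections.max(load.values());

            int cut = 0;
            for (int[] e : edges)
                if (place[e[0]] != place[e[1]]) cut += e[2];     // cross-place edge

            System.out.println("speedup = " + total + "/" + max + " = " + (total / (double) max));
            System.out.println("communication = " + cut);
        }
    }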
11
Global FIFO Implementation
- Each producer/consumer has its own local buffer.
- The producer uses the push operation to store data into its local buffer.
- The consumer uses the peek/pop operations to fetch data from its local buffer.
- When the local buffer becomes full (producer side) or empty (consumer side), the data is copied automatically between the local buffer and the global FIFO (a sketch follows).
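A minimal Java sketch of this buffering scheme, not the StreamX10 implementation: the producer pushes into a private array and copies it to the shared (global) FIFO only when it fills up, and the consumer refills its own private array from the global FIFO when it runs empty, so push/peek/pop remain cheap local operations. The buffer sizes and the single peek-at-head behavior are simplifying assumptions; a real sliding window would need to peek further ahead.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BatchedFifo {
        private final BlockingQueue<int[]> global = new ArrayBlockingQueue<>(8);

        // ---- producer side ----
        private int[] outBuf = new int[64];
        private int outCount = 0;

        public void push(int v) throws InterruptedException {
            outBuf[outCount++] = v;
            if (outCount == outBuf.length) flush();              // local buffer full
        }

        public void flush() throws InterruptedException {
            if (outCount == 0) return;
            global.put(java.util.Arrays.copyOf(outBuf, outCount)); // copy batch to global FIFO
            outCount = 0;
        }

        // ---- consumer side ----
        private int[] inBuf = new int[0];
        private int inPos = 0;

        private void refill() throws InterruptedException {
            if (inPos == inBuf.length) {                         // local buffer empty
                inBuf = global.take();                           // copy batch from global FIFO
                inPos = 0;
            }
        }

        public int peek() throws InterruptedException { refill(); return inBuf[inPos]; }
        public int pop()  throws InterruptedException { refill(); return inBuf[inPos++]; }
    }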
12
X10 Code in the Back-end
- Spawn an activity for each node at its place, according to the partition.
- Call the work functions following the initialization and steady-state schedules.
- Define the work function for each operator.
(A Java analogue of the generated driver is sketched below.)
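The real back-end emits X10 (activities spawned with constructs such as at/async); the following is only a Java analogue of the structure of the generated driver: each place's worker runs the initialization schedule once and then repeats the steady-state schedule. The names Operator, initSchedule, and steadySchedule are illustrative, not the generated identifiers.

    import java.util.List;

    public class GeneratedDriverSketch {
        interface Operator { void work(); }

        static void runPlace(List<Operator> initSchedule,
                             List<Operator> steadySchedule, int iterations) {
            for (Operator op : initSchedule) op.work();           // initialization schedule
            for (int i = 0; i < iterations; i++)
                for (Operator op : steadySchedule) op.work();     // steady-state schedule
        }

        public static void main(String[] args) {
            // Nodes assigned to this place by the partitioner; one trivial
            // operator here so the sketch runs.
            Operator hello = () -> System.out.println("work()");
            runPlace(List.of(hello), List.of(hello), 3);
        }
    }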
13
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
14
Experimental Platform and Benchmarks
- Platform: Intel Xeon processor (8 cores) at 2.4 GHz with 4 GB memory; Red Hat EL5 with Linux 2.6.18; X10 compiler and runtime version 2.2.0.
- Benchmarks: 11 benchmarks rewritten from StreamIt.
15
Throughput Comparison
- Throughput of 4 different configurations (NPLACE * NTHREAD = 8), normalized to 1 place with 8 threads.
- For most benchmarks, CPU utilization increases from 24% to 89% as the number of places varies from 1 to 4, except for benchmarks with a low computation/communication ratio.
- The benefit is small or negative when the number of places increases from 4 to 8.
16
Observation and Analysis
- Throughput goes up as the number of places increases, because multiple places raise CPU utilization.
- Multiple places expose parallelism but also bring more communication overhead.
- Benchmarks with a larger computation workload, such as DES and Serpent_full, still benefit as the number of places increases.
17
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
18
Conclusion
- We proposed and implemented StreamX10, a stream programming language and compilation system on X10.
- A preliminary partitioning optimization is proposed to exploit parallelism based on the X10 execution model.
- Preliminary experiments were conducted to study the performance.
19
Future Work
- How to choose the best configuration (number of places and number of threads) automatically for each benchmark.
- How to decrease the thread-switching overhead by mapping multiple nodes to a single activity.
20
Acknowledgment
- The X10 Innovation Award for funding support.
- QiMing Teng, Haibo Lin, and David P. Grove at IBM for their help on this research.