1
StreamX10: A Stream Programming Framework on X10
Haitao Wei
2012-06-14
School of Computer Science, Huazhong University of Science and Technology
2
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
3
Background and Motivation
- Stream programming: a high-level programming model that has been applied productively; however, implementations usually depend on a specific architecture, which makes them difficult to port between platforms.
- X10: a productive parallel programming environment that isolates the details of different architectures and provides a flexible parallel programming abstraction layer for stream programming.
- StreamX10: an attempt to make stream programs portable by building on X10.
4
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
5
COStream Language
- stream: a FIFO queue connecting operators.
- operator: the basic functional unit, an actor node in the stream graph; it has multiple inputs and multiple outputs, window-like pop, peek, and push operations, and init and work functions.
- composite: connected operators, a subgraph of actors; a stream program is composed of composites.
6
COStream and Stream Graph

    composite Main {
      graph
        stream S = Source() {
          state: { int x; }
          init:  { x = 0; }
          work:  { S[0].i = x; x++; }
          window S: tumbling, count(1);
        }
        stream P = MyOp(S) {
          param pn: N
        }
        () as SinkOp = Sink(P) {
          state: { int r; }
          work:  { r = P[0].j; println(r); }
          window P: tumbling, count(1);
        }
    }

    composite MyOp(output Out; input In) {
      param attribute: pn
      graph
        stream Out = Averager(In) {
          work: {
            int sum = 0, i;
            for (i = 0; i < pn; i++) sum += In[i].i;
            Out[0].j = sum / pn;
          }
          window
            In:  sliding, count(10), count(1);
            Out: tumbling, count(1);
        }
    }

(Stream graph from the slide: Source [push=1] -> Averager [peek=10, pop=1, push=1] -> Sink [pop=1], connected by the streams S and P; the legend marks stream, operator, and composite.)
7
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
8
Compilation Flow of StreamX10 (phase: function)
- Front-end: translates the COStream syntax into an abstract syntax tree.
- Instantiation: instantiates the composites hierarchically into static, flattened operators.
- Static Stream Graph: constructs the static stream graph from the flattened operators.
- Scheduling: calculates the initialization and steady-state execution orderings of the operators (see the sketch after this list).
- Partitioning: performs partitioning based on the X10 parallelism model for load balance.
- Code Generation: generates X10 code for COStream programs.
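To make the Scheduling phase concrete, the following is a minimal Java sketch, not the StreamX10 scheduler itself, of how steady-state repetition counts can be derived for a plain operator chain by balancing the tokens produced and consumed on each stream (r[i] * push[i] == r[i+1] * pop[i+1]). The rates follow the Source -> Averager -> Sink example shown earlier; the initialization schedule that prefills the peek window is not shown.

    public class SteadySchedule {
        static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

        // push[i]: tokens produced per firing; pop[i]: tokens consumed per firing.
        static long[] repetitions(long[] push, long[] pop) {
            long[] r = new long[push.length];
            r[0] = 1;
            for (int i = 1; i < r.length; i++) {
                long produced = r[i - 1] * push[i - 1];
                long scale = pop[i] / gcd(produced, pop[i]); // make produced divisible by pop[i]
                for (int j = 0; j < i; j++) r[j] *= scale;   // rescale earlier firings
                r[i] = r[i - 1] * push[i - 1] / pop[i];
            }
            return r;
        }

        public static void main(String[] args) {
            long[] push = {1, 1, 0};   // Source, Averager, Sink
            long[] pop  = {0, 1, 1};
            System.out.println(java.util.Arrays.toString(repetitions(push, pop))); // [1, 1, 1]
        }
    }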
9
The Execution Framework
- The nodes are partitioned among the places.
- Each node is mapped to an activity.
- The nodes execute in a pipelined fashion to exploit parallelism (see the sketch below).
- Local and global FIFO buffers are used for communication.
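A rough Java analogue of this execution model, with plain threads standing in for X10 activities and bounded queues standing in for the FIFO buffers; the operator names, queue sizes, and iteration counts are illustrative only.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipelineSketch {
        public static void main(String[] args) {
            BlockingQueue<Integer> q1 = new ArrayBlockingQueue<>(16); // Source -> Averager
            BlockingQueue<Integer> q2 = new ArrayBlockingQueue<>(16); // Averager -> Sink

            Thread source = new Thread(() -> {
                try {
                    for (int x = 0; x < 100; x++) q1.put(x);         // push
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread averager = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) q2.put(q1.take()); // pop, then push
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread sink = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) System.out.println(q2.take()); // pop
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            source.start(); averager.start(); sink.start();
        }
    }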
10
Work Partition (Inter-place)
- Objective: minimize communication and balance load across places (using METIS).
- Example (figure: a weighted stream graph split across three places): each place receives computation work of 10, so with total work 30 the speedup is 30/10 = 3, and the communication cost across places is 2 (see the sketch below).
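The following Java sketch shows how such a partition can be evaluated: per-place load from node weights, speedup as total work over the heaviest place, and communication as the weight of cross-place edges. The graph below is made up so the totals match the slide's example (total work 30, heaviest place 10, two cross-place edges); it is not the actual figure, and METIS itself is not invoked here.

    import java.util.*;

    public class PartitionMetrics {
        public static void main(String[] args) {
            int[] work  = {10, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2};  // node weights
            int[] place = { 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2};  // node -> place assignment
            int[][] edges = {{0, 1, 1}, {5, 6, 1}};             // {src, dst, weight}

            int total = Arrays.stream(work).sum();
            Map<Integer, Integer> load = new HashMap<>();
            for (int n = 0; n < work.length; n++)
                load.merge(place[n], work[n], Integer::sum);     // accumulate work per place
            int max = Collections.max(load.values());

            int cut = 0;
            for (int[] e : edges)
                if (place[e[0]] != place[e[1]]) cut += e[2];     // cross-place edge

            System.out.println("speedup = " + total + "/" + max + " = " + (total / (double) max));
            System.out.println("communication = " + cut);
        }
    }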
11
Global FIFO Implementation
- Each producer/consumer has its own local buffer.
- The producer uses the push operation to store data into its local buffer.
- The consumer uses the peek/pop operations to fetch data from its local buffer.
- When the local buffer becomes full (producer side) or empty (consumer side), the data is copied automatically between the local buffer and the global FIFO (a sketch follows).
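A minimal Java sketch of this buffering scheme, not the StreamX10 implementation: the producer pushes into a private array and copies it to the shared (global) FIFO only when it fills up, and the consumer refills its own private array from the global FIFO when it runs empty, so push/peek/pop remain cheap local operations. The buffer sizes and the single peek-at-head behavior are simplifying assumptions; a real sliding window would need to peek further ahead.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BatchedFifo {
        private final BlockingQueue<int[]> global = new ArrayBlockingQueue<>(8);

        // ---- producer side ----
        private int[] outBuf = new int[64];
        private int outCount = 0;

        public void push(int v) throws InterruptedException {
            outBuf[outCount++] = v;
            if (outCount == outBuf.length) flush();              // local buffer full
        }

        public void flush() throws InterruptedException {
            if (outCount == 0) return;
            global.put(java.util.Arrays.copyOf(outBuf, outCount)); // copy batch to global FIFO
            outCount = 0;
        }

        // ---- consumer side ----
        private int[] inBuf = new int[0];
        private int inPos = 0;

        private void refill() throws InterruptedException {
            if (inPos == inBuf.length) {                         // local buffer empty
                inBuf = global.take();                           // copy batch from global FIFO
                inPos = 0;
            }
        }

        public int peek() throws InterruptedException { refill(); return inBuf[inPos]; }
        public int pop()  throws InterruptedException { refill(); return inBuf[inPos++]; }
    }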
12
X10 Code in the Back-end
- Spawn an activity for each node at its place, according to the partition.
- Call the work functions following the initialization and steady-state schedules.
- Define the work function for each operator.
(A Java analogue of the generated driver is sketched below.)
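The real back-end emits X10 (activities spawned with constructs such as at/async); the following is only a Java analogue of the structure of the generated driver: each place's worker runs the initialization schedule once and then repeats the steady-state schedule. The names Operator, initSchedule, and steadySchedule are illustrative, not the generated identifiers.

    import java.util.List;

    public class GeneratedDriverSketch {
        interface Operator { void work(); }

        static void runPlace(List<Operator> initSchedule,
                             List<Operator> steadySchedule, int iterations) {
            for (Operator op : initSchedule) op.work();           // initialization schedule
            for (int i = 0; i < iterations; i++)
                for (Operator op : steadySchedule) op.work();     // steady-state schedule
        }

        public static void main(String[] args) {
            // Nodes assigned to this place by the partitioner; one trivial
            // operator here so the sketch runs.
            Operator hello = () -> System.out.println("work()");
            runPlace(List.of(hello), List.of(hello), 3);
        }
    }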
13
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
14
Experimental Platform and Benchmarks
- Platform: Intel Xeon processor (8 cores) at 2.4 GHz with 4 GB memory; Red Hat EL5 with Linux 2.6.18; X10 compiler and runtime version 2.2.0.
- Benchmarks: 11 benchmarks rewritten from StreamIt.
15
Throughput Comparison
- Throughput of 4 different configurations (NPLACE * NTHREAD = 8), normalized to 1 place with 8 threads.
- For most benchmarks, CPU utilization increases from 24% to 89% as the number of places varies from 1 to 4, except for benchmarks with a low computation/communication ratio.
- The benefit is small or negative when the number of places increases from 4 to 8.
16
Observation and Analysis
- Throughput goes up as the number of places increases, because multiple places raise CPU utilization.
- Multiple places expose parallelism but also bring more communication overhead.
- Benchmarks with a larger computation workload, such as DES and Serpent_full, still benefit as the number of places increases.
17
Outline
1. Introduction and Background
2. COStream Programming Language
3. Stream Compilation on X10
4. Experiments
5. Conclusion and Future Work
18
Conclusion
- We proposed and implemented StreamX10, a stream programming language and compilation system on X10.
- A preliminary partitioning optimization is proposed to exploit parallelism based on the X10 execution model.
- Preliminary experiments were conducted to study the performance.
19
Future Work
- How to choose the best configuration (number of places and number of threads) automatically for each benchmark.
- How to decrease the thread-switching overhead by mapping multiple nodes to a single activity.
20
Acknowledgment
- The X10 Innovation Award for funding support.
- QiMing Teng, Haibo Lin, and David P. Grove at IBM for their help on this research.