Published by Robert Maxwell. Modified over 9 years ago.
1
Packet Size Optimization for Supporting Coarse-Grained Pipelined Parallelism
Wei Du, Gagan Agrawal
Ohio State University
2
Context: Coarse-Grained Pipelined Parallelism
Motivating application scenarios
[Figure labels: Internet, data]
3
Motivating Application Classes
Scientific data analysis: solving shallow water equations (SWE); developing Eastern North Pacific Tidal model
Data mining: k-nearest neighbor search algorithm; k-means clustering; hot list query
Visualization: visualizing time-dependent, two-dimensional wake vortex computations; iso-surface rendering
Image analysis: virtual microscope
4
Ways to Implement: Local processing
[Figure labels: Internet, data]
5
Ways to Implement: Remote processing
[Figure labels: Internet, data]
6
Ways to Implement: Our approach
A coarse-grained pipelined execution model is a good match.
[Figure labels: Internet, data]
7
Coarse-Grained Pipelined Execution
Definition: computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units.
Increasing/dedicated WAN bandwidths are making this feasible.
Example: K-nearest neighbor. Given a 3-D range R and a point (a, b, c), find the K nearest neighbors of the point within R.
Stages: Range_query, then find the K nearest neighbors.
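The two stages of the K-nearest-neighbor example can be sketched as a small generator pipeline in which the first stage hands packets of points to the second. This is only an illustration: the function names, the `packet_size` parameter, and the representation of the range R as a pair of corner tuples `(lo, hi)` are assumptions, since the slide's own definition of R did not survive.

```python
import heapq
import math

def range_query(points, lo, hi, packet_size):
    """Stage 1 (hypothetical): keep points inside the box [lo, hi], emit them in packets."""
    packet = []
    for p in points:
        if all(l <= c <= h for c, l, h in zip(p, lo, hi)):
            packet.append(p)
            if len(packet) == packet_size:
                yield packet
                packet = []
    if packet:  # flush the last, possibly smaller, packet
        yield packet

def k_nearest(packets, query, k):
    """Stage 2 (hypothetical): consume packets and keep the k points closest to query."""
    best = []  # heap of (-distance, point); the root is the farthest point kept so far
    for packet in packets:
        for p in packet:
            d = math.dist(p, query)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    # Return the kept points ordered from nearest to farthest.
    return [p for _, p in sorted(best, key=lambda t: -t[0])]

if __name__ == "__main__":
    pts = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (5, 5, 5), (9, 9, 9)]
    stage1 = range_query(pts, lo=(0, 0, 0), hi=(5, 5, 5), packet_size=2)
    print(k_nearest(stage1, query=(2, 2, 2), k=2))
```

Because `range_query` is a generator, the second stage starts consuming as soon as the first packet is ready, which is the pipelining behavior the definition describes.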
8
Overview of Our Efforts
Language and Compiler Framework for Coarse-Grained Pipelined Parallelism (SC 2003)
Reduction Strategies (SC 2003)
Support for program adaptation (SC 2004)
DataCutter Runtime System (Saltz et al.)
Packet Size Optimization Problem (this talk)
9
Roadmap
Packet Size Optimization Problem
Fixed-frequency Communication Pattern
Fixed-size Communication Pattern
Experimental Results
Conclusion
10
DataCutter Runtime System
Ongoing project at OSU / Maryland (Kurc, Catalyurek, Beynon, Saltz et al.)
Targets a distributed, heterogeneous environment
Allows decomposition of application-specific data processing operations into a set of interacting processes
Provides a specific low-level interface: filter, stream, layout & placement
[Figure: filter 1, filter 2, filter 3 connected by streams]
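The filter and stream idea can be illustrated with a generic sketch built on Python threads and queues. This is not the DataCutter API: `run_filter`, `run_pipeline`, and the `_END` sentinel are hypothetical names chosen for the example, and layout and placement across machines are not modeled at all.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of a stream

def run_filter(fn, inbox, outbox):
    """Run one filter: read packets from its input stream, write results downstream."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # propagate end-of-stream to the next filter
            return
        result = fn(item)
        if result is not None:  # a filter may drop a packet by returning None
            outbox.put(result)

def run_pipeline(filters, packets):
    """Connect the filters with queues (streams), one thread per filter."""
    streams = [queue.Queue() for _ in range(len(filters) + 1)]
    threads = [
        threading.Thread(target=run_filter, args=(fn, streams[i], streams[i + 1]))
        for i, fn in enumerate(filters)
    ]
    for t in threads:
        t.start()
    for p in packets:
        streams[0].put(p)
    streams[0].put(_END)
    results = []
    while True:
        item = streams[-1].get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    double = lambda packet: [x * 2 for x in packet]
    total = lambda packet: sum(packet)
    print(run_pipeline([double, total], [[1, 2], [3, 4]]))
```

Each filter runs in its own thread, so a downstream filter can work on one packet while an upstream filter is already processing the next.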
11
Problem Definition
[Figure: pipeline of stages 1 to 5 and a stages-versus-time diagram; legend: per-byte cost, per-packet cost]
12
Problem Definition
Related work: optimizing packet size in a multi-link communication pipeline (Wang et al.)
Our challenge: both communication stages and computation stages are involved; the volume and/or frequency of communication can vary.
[Figure: pipeline of stages 1 to 5]
13
A Simple Pipeline
Bottleneck stage: the stage that stays busy continuously from the moment it starts working until the last packet leaves it.
$T = T_f + T_b + T_l$
$T_f$: time the first packet takes to reach the bottleneck stage
$T_b$: time all packets spend at the bottleneck stage
$T_l$: time the last packet takes after exiting the bottleneck stage
[Figure: stages-versus-time diagram for Stages 1 to 5 with intervals $T_f$, $T_b$, $T_l$; legend: per-byte cost, per-packet cost]
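The decomposition $T = T_f + T_b + T_l$ can be checked against a direct simulation. The sketch below assumes the simplest setting: every packet costs the same fixed time at a given stage, all packets are available at time zero, and buffers between stages are unbounded. The function names are illustrative.

```python
def simulate(stage_times, k):
    """Completion time of the last of k packets, by stepping through the pipeline."""
    finish = [0.0] * len(stage_times)  # finish[i]: when stage i finished its latest packet
    for _ in range(k):
        ready = 0.0  # time at which the current packet leaves the previous stage
        for i, t in enumerate(stage_times):
            start = max(ready, finish[i])  # wait for the packet and for the stage to be free
            finish[i] = start + t
            ready = finish[i]
    return finish[-1]

def decompose(stage_times, k):
    """Return (T_f, T_b, T_l) for the slowest stage taken as the bottleneck."""
    b = max(range(len(stage_times)), key=stage_times.__getitem__)
    t_f = sum(stage_times[:b])       # first packet reaching the bottleneck
    t_b = k * stage_times[b]         # all k packets passing through the bottleneck
    t_l = sum(stage_times[b + 1:])   # last packet draining out of the pipeline
    return t_f, t_b, t_l

if __name__ == "__main__":
    times, k = [1.0, 3.0, 2.0], 4
    print(simulate(times, k), sum(decompose(times, k)))
```

Under these assumptions the two values agree, because the slowest stage is never idle once the first packet reaches it.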
14
Problem Formulation
Key challenge: both frequency and volume of communication can be different across stages.
Consider two cases:
Fixed Frequency: only the size of a packet can change
Fixed Size: only the frequency of communication can change
15
Communication Patterns: Fixed-frequency communication
$B$: total size of data to be processed
$k$: # of packets
$\alpha_i$: ratio of input to output for stage $i$
[Figure: Stage 1, Stage 3, Stage $i$, Stage $i+2$; data-volume labels: $\alpha_1$, $\alpha_1 \alpha_2$, $\alpha_1 \cdots \alpha_{i-2}$, $\alpha_1 \cdots \alpha_i$, $\alpha_1 \cdots \alpha_{i+2}$]
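Under the fixed-frequency pattern every stage handles the same number of packets, so only the packet size changes from stage to stage. The sketch below assumes that stage $i$ scales the data volume by $\alpha_i$ (output = $\alpha_i$ × input), which is one reading of the products shown in the figure labels; the function name is illustrative.

```python
from itertools import accumulate
from operator import mul

def fixed_frequency_sizes(total_bytes, k, alphas):
    """Input packet size at each stage when every stage handles exactly k packets.

    Assumption: alphas[i] is the factor by which stage i+1 scales its data volume.
    """
    first = total_bytes / k  # packet size entering stage 1
    # scale[i] is the product of the alphas of all earlier stages (1.0 for stage 1).
    scale = accumulate(alphas[:-1], mul, initial=1.0)
    return [first * s for s in scale]

if __name__ == "__main__":
    print(fixed_frequency_sizes(1000, 10, [0.5, 1.0, 0.5]))
```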
16
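Fixed-Frequency Communication
$n$: depth of the pipeline
$g_i$: per-byte cost for stage $i$
$G_i$: per-packet cost for stage $i$
Cost for a packet at stage $i$: [expression not preserved in this transcript]
Bottleneck stage $b$: $b = \{\, i \mid \alpha_1 \cdots \alpha_{i-1}\, g_i + G_i = \max\{\alpha_1 \cdots \alpha_{j-1}\, g_j + G_j,\ j = 1, \dots, n\} \,\}$
[Figure: Stage $i$, Stage $i+2$; labels: $\alpha_1 \cdots \alpha_{i-1}$, $\alpha_1 \cdots \alpha_i$, $\alpha_1 \cdots \alpha_{i+1}$, input size]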
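The bottleneck definition can be evaluated directly once the per-stage parameters are known. The sketch below assumes an affine cost, with the per-byte term scaled by the packet size entering the stage and the per-packet term added on top; with `packet_size=1.0` it reduces to the expression in the bottleneck definition as it appears here. The packet-size factor itself is an assumption, because the slide's cost expression was not preserved.

```python
def stage_costs(g, G, alphas, packet_size=1.0):
    """Assumed per-packet cost at each stage: alpha_1..alpha_{i-1} * size * g_i + G_i."""
    costs = []
    scale = 1.0  # product of the alphas of all earlier stages
    for g_i, G_i, a_i in zip(g, G, alphas):
        costs.append(scale * packet_size * g_i + G_i)
        scale *= a_i
    return costs

def bottleneck(g, G, alphas, packet_size=1.0):
    """1-based index of the stage with the largest per-packet cost."""
    costs = stage_costs(g, G, alphas, packet_size)
    return 1 + max(range(len(costs)), key=costs.__getitem__)

if __name__ == "__main__":
    g = [1.0, 4.0, 2.0]       # per-byte costs (made-up values)
    G = [0.5, 0.5, 0.5]       # per-packet costs (made-up values)
    alphas = [0.5, 1.0, 1.0]  # volume scaling per stage (made-up values)
    print(stage_costs(g, G, alphas), bottleneck(g, G, alphas))
```

Note that the bottleneck can move when the packet size changes, since the per-byte term grows with size while the per-packet term does not.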
17
Fixed-Frequency Communication: Bottleneck at Stage Other than First
[Figure: stages-versus-time diagram for stages 1, …, b-1, b, b+1, …, n with intervals $T_f$, $T_b$, $T_l$; legend: per-byte cost, per-packet cost]
18
Fixed-frequency Communication: 1st stage bottleneck pipeline
[Figure: stages-versus-time diagram for stages 1, 2, …, n with intervals $T_b$ and $T_l$; legend: per-byte cost, per-packet cost]
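The trade-off behind packet size optimization can be explored numerically. The sketch below is not the closed-form result from the talk; it is a brute-force search under stated assumptions: fixed-frequency communication, an affine per-packet cost (per-byte cost times packet size plus per-packet cost), stage $i$ scaling data volume by $\alpha_i$, and a total time equal to one pass through all stages plus $k-1$ further packets at the slowest stage. All names and numbers are illustrative.

```python
def total_time(B, k, g, G, alphas):
    """Assumed total time for B bytes split into k packets (fixed-frequency pattern)."""
    size = B / k
    times, scale = [], 1.0
    for g_i, G_i, a_i in zip(g, G, alphas):
        times.append(scale * size * g_i + G_i)  # time one packet spends at this stage
        scale *= a_i
    # One packet traverses every stage; the other k-1 are paced by the slowest stage.
    return sum(times) + (k - 1) * max(times)

def best_packet_count(B, g, G, alphas, k_max):
    """Packet count in 1..k_max that minimizes the assumed total time."""
    return min(range(1, k_max + 1), key=lambda k: total_time(B, k, g, G, alphas))

if __name__ == "__main__":
    B, g, G, alphas = 100.0, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]
    k = best_packet_count(B, g, G, alphas, k_max=50)
    print(k, total_time(B, k, g, G, alphas))
```

In this model, very few packets leave stages idle while the pipeline fills and drains, and very many packets pay the per-packet cost too often, so the minimum lies in between.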
19
Communication Patterns: Fixed-size communication
$B$: total size of data to be processed
$k$: # of packets
$\alpha_i$: ratio of input to output for stage $i$
[Figure: Stage 1, Stage 3, Stage $i$, Stage $i+2$]
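Under the fixed-size pattern the packet size stays the same at every stage, so a change in data volume shows up as a change in the number of packets. The sketch below assumes that stage $i$ scales the data volume by $\alpha_i$ and that a partially filled last packet still counts as one packet; both are assumptions, and the function name is illustrative.

```python
import math

def fixed_size_packet_counts(total_bytes, packet_size, alphas):
    """Number of packets entering each stage when the packet size never changes."""
    counts = []
    volume = float(total_bytes)  # bytes entering the current stage
    for a_i in alphas:
        counts.append(math.ceil(volume / packet_size))  # round up for a partial packet
        volume *= a_i  # assumed volume leaving this stage
    return counts

if __name__ == "__main__":
    print(fixed_size_packet_counts(1000, 100, [0.5, 1.0, 0.5]))
```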
20
Formulas
Fixed-frequency communication: cases $b = 1$ and $b \neq 1$
Fixed-size communication: cases $b = 1$ and $b \neq 1$
[Formulas not preserved in this transcript]
21
Experimental Results
Experimental setting: 4 computation stages (Read, Sub-setting, Local Processing, Global Processing)
Experimental applications: Z-buffer based iso-surface rendering; active pixel based iso-surface rendering; K-nearest neighbor search algorithm
22
Fixed-frequency Communication
[Plots: KNN-1st, ZBUF-2nd]
23
Fixed-frequency Communication
[Plot: ACTP]
24
Fixed-size Communication
[Plots: KNN-2nd, ZBUF-1st]
25
Fixed-size Communication
[Plot: ACTP]
26
Related work
Wang et al.: optimize packet size in a multi-link communication pipeline (this work considers computation in the pipeline as well)
Elsie et al.: introduce pipelining for a remote storage broker; no modification to the original data
27
Conclusion
The coarse-grained pipelined parallel execution model is suitable for distributed applications.
The packet size optimization problem is defined.
Analytical models are developed for fixed-frequency and fixed-size communication patterns.
Experimental results show the accuracy of our model.
28
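Fixed-size Communication
$n$: depth of the pipeline
$g_i$: per-byte cost for stage $i$
$G_i$: per-packet cost for stage $i$
Cost for a packet at stage $i$: [expression not preserved in this transcript]
Bottleneck stage $b$: [expression only partly preserved; surviving term: $g_i + G_i$]
[Figure: Stage 1, Stage 3]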
29
Fixed-size Communication: non-1st stage bottleneck pipeline
[Figure: stages-versus-time diagram for stages …, b-1, b, b+1, …, n with intervals $T_f$, $T_b$, $T_l$; legend: per-byte cost, per-packet cost]
30
Fixed-size Communication: 1st stage bottleneck pipeline
[Figure: stages-versus-time diagram for stages 1, 2, …, n with intervals $T_b$ and $T_l$; legend: per-byte cost, per-packet cost]