Filter Decomposition for Supporting Coarse-grained Pipelined Parallelism. Wei Du, Gagan Agrawal, Ohio State University.


1 Filter Decomposition for Supporting Coarse-grained Pipelined Parallelism. Wei Du, Gagan Agrawal, Ohio State University

2 Distributed Data-Intensive Applications: fast-growing datasets; remote data access; distributed data storage; a more connected world.

3 Implementation: local processing. Requirements: huge storage, a powerful computer, and a fast connection.

4 Implementation: remote processing. Requirements: complex analysis at the data centers.

5 Our hypothesis: a coarse-grained pipelined execution model is a good match and a practical solution.

6 Coarse-Grained Pipelined Execution. Definition: the computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units. Example, K-nearest neighbors (KNN): given a 3-D range R = … and a point p = (a, b, c), find the K nearest neighbors of p within R. Stages: Range_query, then Find the K-nearest neighbors.
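The two KNN stages can be sketched in Python as two functions chained over a stream of data packets. This is a minimal illustration and not the authors' code. It assumes that each packet is a list of (x, y, z) tuples, that R is an axis-aligned box given by hypothetical corners `lo` and `hi`, and that distance is Euclidean. The names `range_query`, `k_nearest`, and `knn_pipeline` are chosen here for illustration.

```python
import heapq
import math

def range_query(points, lo, hi):
    """Stage 1 (hypothetical): keep only the points inside the 3-D box [lo, hi]."""
    return [p for p in points
            if all(lo[d] <= p[d] <= hi[d] for d in range(3))]

def k_nearest(candidates, p, k):
    """Stage 2 (hypothetical): return the k candidates closest to p (Euclidean)."""
    return heapq.nsmallest(k, candidates, key=lambda q: math.dist(p, q))

def knn_pipeline(packets, lo, hi, p, k):
    """Push each packet through both stages; stage 2 keeps a running best-k list,
    so every packet's filtered points are merged into the current answer."""
    best = []
    for packet in packets:
        filtered = range_query(packet, lo, hi)   # could run near the data
        best = k_nearest(best + filtered, p, k)  # could run near the client
    return best
```

Because the second stage only needs the current best K points plus the next packet's filtered points, the two stages could run on different computing units with packets flowing between them.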

7 Challenges: the computation associated with an application needs to be decomposed into stages; decomposition decisions depend on the execution environment; generating code for each stage (SC03); other performance issues for pipelined execution (ICPP04); adapting to a dynamic execution environment (SC04).

8 Roadmap: Filter Decomposition Problem; MIN_ONETRIP Algorithm; MIN_BOTTLENECK Algorithm; MIN_TOTAL Algorithm; Experimental Results; Related Work; Conclusion

9 Filter Decomposition (figure: a computation pipeline of computing units C_1, …, C_m connected by links L_1, …, L_{m-1}; a chain of atomic filters f_1, …, f_n; and example placements of the filters onto the units)

10 Filter Decomposition (figure: the computation pipeline C_1, …, C_m with links L_1, …, L_{m-1}, and the atomic filters f_1, …, f_n). Goal: find a placement $p(f_1, f_2, \dots, f_n) = (F_1, F_2, \dots, F_m)$, where $F_i = f_{i_1}, f_{i_1+1}, \dots, f_{i_k}$ with $1 \le i_1, i_k \le n$ for $1 \le i \le m$, such that the predicted execution time is minimal.
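The following Python sketch enumerates every placement of n ordered filters onto m units. This is essentially the space an exhaustive search has to walk through. The sketch makes two assumptions: filter order must be preserved along the pipeline, and a unit may receive no filters and simply forward data. The names `placements`, `count_placements`, and `groups` are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def placements(n, m):
    """Yield all placements of filters f_1..f_n onto units C_1..C_m.

    Each placement is a non-decreasing tuple a, where a[i] is the 1-based unit
    that runs f_{i+1}; non-decreasing means the filter order is kept along the
    pipeline (assumption: a unit may hold no filter at all).
    """
    yield from combinations_with_replacement(range(1, m + 1), n)

def count_placements(n, m):
    """Number of placements: multisets of size n drawn from m units."""
    return comb(n + m - 1, n)

def groups(a, m):
    """Turn an assignment tuple into (F_1, ..., F_m): the filter indices on each unit."""
    F = [[] for _ in range(m)]
    for i, unit in enumerate(a):
        F[unit - 1].append(i + 1)
    return F
```

For example, `groups((1, 1, 2, 4), 4)` gives `[[1, 2], [3], [], [4]]`. The count grows as $\binom{n+m-1}{n}$, which is 56 placements for 5 filters on 4 units. This growth is one reason polynomial-time dynamic programs are attractive compared with exhaustive enumeration.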

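11 Cost Model (figure: a three-unit pipeline C_1, L_1, C_2, L_2, C_3 with filters f_1, …, f_4 placed on it). Bottleneck stage b: the slowest stage in the pipeline. For N packets, with C_2 as the bottleneck, the execution time is $T = T(C_1) + T(L_1) + N \cdot T(C_2) + T(L_2) + T(C_3) = \sum_{i \ne b} T_i + N \cdot T_b = \sum_i T_i + (N-1) \cdot T_b$.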
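The cost model can be checked numerically. The Python sketch below computes $T = \sum_i T_i + (N-1) \cdot T_b$ from a list of per-stage times, treating computing units and links alike. It then compares the result with a small packet-by-packet simulation. The sketch assumes fixed per-packet stage times, one packet at a time per stage, and unbounded buffering between stages. Both function names are hypothetical.

```python
def predicted_time(stage_times, n_packets):
    """Cost model: the first packet passes every stage once, and each of the
    remaining N - 1 packets adds one bottleneck-stage time."""
    return sum(stage_times) + (n_packets - 1) * max(stage_times)

def simulated_time(stage_times, n_packets):
    """Cross-check: simulate N packets flowing through the stages in order.

    A stage (computing unit or link) handles one packet at a time; a packet
    starts stage s once it has left stage s-1 and stage s has finished the
    previous packet (assumes unbounded buffers between stages).
    """
    done = [0.0] * len(stage_times)  # finish time of the latest packet at each stage
    for _ in range(n_packets):
        ready = 0.0  # time this packet becomes available to the next stage
        for s, t in enumerate(stage_times):
            done[s] = max(ready, done[s]) + t
            ready = done[s]
    return done[-1]
```

Take hypothetical stage times of [1, 2, 5, 2, 1] ms for C_1, L_1, C_2, L_2, C_3 and N = 4. Both functions give 11 + 3·5 = 26 ms. This matches T(C_1) + T(L_1) + 4·T(C_2) + T(L_2) + T(C_3).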

12 Three Algorithms, given $T = \sum_i T_i + (N-1) \cdot T_b$: MIN_ONETRIP, a dynamic programming algorithm that minimizes $\sum_i T_i$; MIN_BOTTLENECK, a dynamic programming algorithm that minimizes $T_b$; MIN_TOTAL, a greedy algorithm that tries to minimize $T$ itself.
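A small worked example shows why the packet count N decides which term matters. The Python sketch below compares two hypothetical placements using $T = \sum_i T_i + (N-1) \cdot T_b$. Placement A has a short one-packet time but a slow bottleneck, and placement B has the opposite profile. The numbers (in ms) and the function names are made up for illustration.

```python
import math

def total_time(one_trip, bottleneck, n_packets):
    """Predicted time T = (sum of stage times) + (N - 1) * (bottleneck stage time)."""
    return one_trip + (n_packets - 1) * bottleneck

def crossover(a, b):
    """Smallest N for which placement b is strictly faster than placement a.

    a and b are (one_trip, bottleneck) pairs; assumes a has the smaller one-trip
    time and b the smaller bottleneck time.
    """
    (sa, ba), (sb, bb) = a, b
    # sa + (N-1)*ba > sb + (N-1)*bb  <=>  N - 1 > (sb - sa) / (ba - bb)
    return math.floor((sb - sa) / (ba - bb)) + 2

A = (20.0, 8.0)  # hypothetical MIN_ONETRIP-style placement
B = (30.0, 3.0)  # hypothetical MIN_BOTTLENECK-style placement
```

With these numbers, A wins for a single packet (20 ms vs 30 ms). The two tie at N = 3 (36 ms each), and `crossover(A, B)` returns 4, from which point B wins. At N = 4500, A would take 36,012 ms and B 13,527 ms.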

13 Filter Decomposition: MIN_ONETRIP. Goal: minimize the time spent by one packet on the pipeline.

14 Filter Decomposition: MIN_ONETRIP. $T[i,j]$: the minimum cost of performing computations $f_1, \dots, f_i$ on computing units $C_1, \dots, C_j$, where the results of $f_i$ are on $C_j$.
$$T[i,j] = \min\{\, T[i-1,j] + \mathrm{Cost\_comp}(P(C_j), \mathrm{Task}(f_i)),\; T[i,j-1] + \mathrm{Cost\_comm}(B(L_{j-1}), \mathrm{Vol}(f_i)) \,\}$$
Goal: $T[n,m]$. Cost: $O(mn)$.
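A direct Python transcription of this recurrence might look as follows. It is a sketch under stated assumptions, not the authors' implementation. It takes Cost_comp(P, Task) = Task / P and Cost_comm(B, Vol) = Vol / B. It also assumes that the raw data starts on C_1, with a hypothetical input volume `vol[0]`. The exhaustive baseline `exhaustive_onetrip` is included only to cross-check the dynamic program on small inputs.

```python
import math
from itertools import combinations_with_replacement

def min_onetrip(task, vol, power, bw):
    """MIN_ONETRIP sketch: minimize the time one packet spends on the pipeline.

    task[i-1]: work of f_i; vol[i]: output volume of f_i (vol[0] = raw input);
    power[j-1]: P(C_j); bw[j-1]: B(L_j). Returns (T[n][m], assignment), where
    assignment[i-1] is the 1-based unit that runs f_i.
    """
    n, m = len(task), len(power)
    T = [[math.inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    T[0][1] = 0.0  # assumption: raw data is available on C_1
    for i in range(n + 1):
        for j in range(1, m + 1):
            if i > 0:  # compute f_i on C_j
                c = T[i - 1][j] + task[i - 1] / power[j - 1]
                if c < T[i][j]:
                    T[i][j], move[i][j] = c, "comp"
            if j > 1:  # ship the results of f_i from C_{j-1} to C_j over L_{j-1}
                c = T[i][j - 1] + vol[i] / bw[j - 2]
                if c < T[i][j]:
                    T[i][j], move[i][j] = c, "comm"
    assignment, i, j = [0] * n, n, m  # walk back from T[n][m]
    while i > 0:
        if move[i][j] == "comp":
            assignment[i - 1] = j
            i -= 1
        else:
            j -= 1
    return T[n][m], assignment

def onetrip_cost(assignment, task, vol, power, bw):
    """One-packet time of a given placement: all computation plus all link transfers."""
    m = len(power)
    comp = sum(t / power[a - 1] for t, a in zip(task, assignment))
    comm = 0.0
    for j in range(1, m):
        k = sum(1 for a in assignment if a <= j)  # f_k is the last filter before L_j
        comm += vol[k] / bw[j - 1]
    return comp + comm

def exhaustive_onetrip(task, vol, power, bw):
    """Brute-force minimum over all order-preserving placements (small inputs only)."""
    return min(onetrip_cost(a, task, vol, power, bw)
               for a in combinations_with_replacement(range(1, len(power) + 1), len(task)))
```

The table has (n + 1) × m cells, each filled in constant time. This matches the O(mn) cost on the slide.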

15 Filter Decomposition: MIN_BOTTLENECK (figure: candidate placements of f_1, …, f_n onto the last units C_{m-2}, C_{m-1}, C_m). Goal: minimize the time spent at the bottleneck stage.

16 Filter Decomposition: MIN_BOTTLENECK. $N[i,j]$: the minimum cost of the bottleneck stage for computing $f_1, \dots, f_i$ on computing units $C_1, \dots, C_j$, where the results of $f_i$ are on $C_j$.
$$N[i,j] = \min_{1 \le k \le i} \max\Big\{ N[k,j-1],\; \mathrm{Cost\_comm}(B(L_{j-1}), \mathrm{Vol}(f_k)),\; \mathrm{Cost\_comp}\big(P(C_j), \textstyle\sum_{l=k+1}^{i} \mathrm{Task}(f_l)\big) \Big\}$$
The $k = i$ term places no computation on $C_j$, so it reduces to $\max\{N[i,j-1], \mathrm{Cost\_comm}(B(L_{j-1}), \mathrm{Vol}(f_i))\}$; the $k = 1$ term puts $f_2, \dots, f_i$ on $C_j$. Cost: $O(mn^2)$.
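The same recurrence can be written in Python as a hedged sketch. The cost functions are assumed to be Task / P and Vol / B, and `prefix` sums make each computation term O(1). The slide's recurrence starts k at 1, so in this sketch f_1 always runs on C_1; that would fit f_1 reading the data at the source, but this is our assumption. The `bottleneck` and `exhaustive_bottleneck` helpers exist only to cross-check the result on small inputs.

```python
import math
from itertools import combinations_with_replacement

def min_bottleneck(task, vol, power, bw):
    """MIN_BOTTLENECK sketch. task[i-1]: work of f_i; vol[i]: output volume of f_i;
    power[j-1]: P(C_j); bw[j-1]: B(L_j). Returns (N[n][m], assignment)."""
    n, m = len(task), len(power)
    prefix = [0.0]
    for t in task:
        prefix.append(prefix[-1] + t)  # prefix[i] = Task(f_1) + ... + Task(f_i)
    N = [[math.inf] * (m + 1) for _ in range(n + 1)]
    split = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        N[i][1] = prefix[i] / power[0]  # f_1..f_i all on C_1
    for j in range(2, m + 1):
        for i in range(1, n + 1):
            for k in range(1, i + 1):  # f_{k+1}..f_i go on C_j (none when k == i)
                c = max(N[k][j - 1],
                        vol[k] / bw[j - 2],
                        (prefix[i] - prefix[k]) / power[j - 1])
                if c < N[i][j]:
                    N[i][j], split[i][j] = c, k
    assignment, i = [0] * n, n  # recover the placement from the stored splits
    for j in range(m, 1, -1):
        k = split[i][j]
        for f in range(k + 1, i + 1):
            assignment[f - 1] = j
        i = k
    for f in range(1, i + 1):
        assignment[f - 1] = 1
    return N[n][m], assignment

def bottleneck(assignment, task, vol, power, bw):
    """Slowest stage (unit or link) of a placement, for one packet."""
    m = len(power)
    stages = [sum(t for t, a in zip(task, assignment) if a == j) / power[j - 1]
              for j in range(1, m + 1)]
    for j in range(1, m):
        k = sum(1 for a in assignment if a <= j)  # f_k is the last filter before L_j
        stages.append(vol[k] / bw[j - 1])
    return max(stages)

def exhaustive_bottleneck(task, vol, power, bw):
    """Brute force over placements with f_1 on C_1, matching the recurrence."""
    n, m = len(task), len(power)
    return min(bottleneck(a, task, vol, power, bw)
               for a in combinations_with_replacement(range(1, m + 1), n)
               if a[0] == 1)
```

The loops over j, i, and k give the O(mn^2) cost stated on the slide.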

17 Filter Decomposition: MIN_TOTAL. Goal: minimize the predicted execution time T. Example (figure: units C_1, …, C_4 linked by L_1, L_2, L_3; filters f_1, …, f_5): for C_1, estimate the cost of placing f_1 ($T_1$), f_1 and f_2 ($T_2$), f_1 to f_3 ($T_3$), and f_1 to f_4 ($T_4$); since $\min\{T_1, \dots, T_4\} = T_2$, f_1 and f_2 are placed on C_1.
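The slide does not spell out how the estimated costs $T_1, \dots, T_4$ are computed, so the Python sketch below fills that gap with an explicit assumption. When deciding how many of the remaining filters to put on $C_j$, it completes the trial placement by putting every still-unplaced filter on $C_{j+1}$. It then scores the trial with $T = \sum_i T_i + (N-1) \cdot T_b$. The completion rule, the Task / P and Vol / B cost functions, and all names are hypothetical. The sketch only shows the shape of a unit-by-unit greedy choice, and `exhaustive_T` is included for comparison.

```python
from itertools import combinations_with_replacement

def stage_times(assignment, task, vol, power, bw):
    """Per-packet times of the stages C_1, L_1, C_2, ..., L_{m-1}, C_m for a placement."""
    m = len(power)
    times = []
    for j in range(1, m + 1):
        times.append(sum(t for t, a in zip(task, assignment) if a == j) / power[j - 1])
        if j < m:
            k = sum(1 for a in assignment if a <= j)  # f_k is the last filter before L_j
            times.append(vol[k] / bw[j - 1])  # vol[0] = raw input volume
    return times

def predicted_T(assignment, task, vol, power, bw, n_packets):
    """T = sum of all stage times + (N - 1) * bottleneck stage time."""
    t = stage_times(assignment, task, vol, power, bw)
    return sum(t) + (n_packets - 1) * max(t)

def min_total_greedy(task, vol, power, bw, n_packets):
    """Decide, unit by unit, how many of the remaining filters to place on C_j."""
    n, m = len(task), len(power)
    assignment, placed = [], 0
    for j in range(1, m):
        best_cost, best_c = None, 0
        for c in range(n - placed + 1):
            # Hypothetical completion rule: all still-unplaced filters go on C_{j+1}.
            trial = assignment + [j] * c + [j + 1] * (n - placed - c)
            cost = predicted_T(trial, task, vol, power, bw, n_packets)
            if best_cost is None or cost < best_cost:
                best_cost, best_c = cost, c
        assignment += [j] * best_c
        placed += best_c
    assignment += [m] * (n - placed)  # the last unit runs whatever is left
    return predicted_T(assignment, task, vol, power, bw, n_packets), assignment

def exhaustive_T(task, vol, power, bw, n_packets):
    """Brute-force minimum of T over all order-preserving placements."""
    return min(predicted_T(list(a), task, vol, power, bw, n_packets)
               for a in combinations_with_replacement(range(1, len(power) + 1), len(task)))
```

Because it commits to one unit at a time, this greedy sketch can return a placement whose T is worse than the exhaustive optimum.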

18 Roadmap: Filter Decomposition Problem; MIN_ONETRIP Algorithm; MIN_BOTTLENECK Algorithm; MIN_TOTAL Algorithm; Experimental Results; Related Work; Conclusion

19 Experimental Results: 4 configurations (configuration table); 3 applications, including Virtual Microscope and Iso-Surface Rendering.

20 Used Applications: Virtual Microscope (Vmscope), an emulation of a microscope. Input: a rectangular region and a resolution value. Output: the portion of the original image at that resolution.

21 Experimental Results: Virtual Microscope. 3 queries: Q1, 1 packet; Q2, 4 packets; Q3, 4500 packets. 4 algorithms: MIN_ONETRIP, MIN_BOTTLENECK, MIN_TOTAL, Exhaustive_Search.

22 Experimental Results: Virtual Microscope (charts: execution time in ms per application)

23 Experimental Results: Virtual Microscope. Two observations: the performance variance between the different algorithms is small; and Exhaustive_Search does not always give the best placement, because the application characteristics it relies on are based on one-packet information, and because combining two filters into one saves copying cost.

24 Used Applications: Iso-surface rendering (Iso). Input: a 3-D grid, a scalar value, and a view screen with a specified angle. Output: the surface seen from that angle, which captures the points in the grid whose scalar value matches the given iso-surface value.

25 Experimental Results: Iso. 2 implementations: ZBUF, ACTP. 2 datasets: small (3 packets), large (47 packets). 4 algorithms: MIN_ONETRIP, MIN_BOTTLENECK, MIN_TOTAL, Exhaustive_Search.

26 Experimental Results: Iso (charts for the small and large datasets: execution time in ms per application)

27 Experimental Results: Iso. The MIN_TOTAL algorithm gives the best placement for the small dataset, while the MIN_ONETRIP algorithm finds the best placement for the large dataset. This application is very data-dependent!

28 Experimental Results: Iso (charts for ZBUF and ACTP: execution time in ms over the number of runs)

29 Conclusion & Future Work. Our algorithms perform quite well. Future work: find more accurate characteristics of applications, in particular an estimate of the performance change resulting from combining multiple atomic filters, and an estimate of the impact of data dependence.

30 Thank you !!!

