1
On-line adaptive parallel prefix computation
Jean-Louis Roch, Daouda Traoré and Julien Bernard
Presented by Andreas Söderström, ITN
2
The prefix problem
Given X = x_1, x_2, …, x_n, compute the n products π_k = x_1 ∘ x_2 ∘ … ∘ x_k for 1 ≤ k ≤ n, where ∘ is some associative operation.
Example: ∘ = + (i.e. addition), X = 1, 3, 5, 7
π_1 = 1
π_2 = 1+3 = 4
π_3 = 1+3+5 = 9
π_4 = 1+3+5+7 = 16
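As a point of reference, the definition can be sketched as a plain sequential computation. This is only an illustration; the function name `prefix` is a hypothetical helper, not something from the presentation.

```python
from itertools import accumulate
import operator

def prefix(xs, op=operator.add):
    """Return [pi_1, ..., pi_n], where pi_k = xs[0] op xs[1] op ... op xs[k-1].

    `op` only needs to be associative; it does not have to be commutative.
    """
    return list(accumulate(xs, op))

print(prefix([1, 3, 5, 7]))  # [1, 4, 9, 16], the example from the slide
```

This sequential version performs n-1 applications of the operation, which is the baseline the parallel variants are compared against.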
3
Parallel prefix sum (first pass)
Step 0: 1 2 3 4 5 6 7 8
Step 1: 3 7 11 15
Step 2: 10 26
Step 3: 36
4
Parallel prefix sum (second pass)
For every even position, use the value of the parent node.
For every odd position p_n, compute p_{n-1} + p_n.
[Figure: the first-pass tree traversed top-down over Steps 0 to 3; for the input 1 2 3 4 5 6 7 8 the resulting prefix sums are 1 3 6 10 15 21 28 36.]
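The two passes can be sketched as follows. This is a sequential simulation that assumes a power-of-two input length; the name `two_pass_prefix` is hypothetical. In a real parallel run, the entries within one level would be computed concurrently.

```python
import operator

def two_pass_prefix(xs, op=operator.add):
    """Two-pass prefix on a balanced tree (sketch; length must be a power of two)."""
    n = len(xs)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"

    # First pass: combine neighbours pairwise until a single value remains.
    levels = [list(xs)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([op(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])

    # Second pass: walk back down, turning each level into its prefix values.
    result = levels[-1]  # the root already holds the total
    for level in reversed(levels[:-1]):
        nxt = []
        for i, value in enumerate(level):  # i is 0-based; the position is i + 1
            if i % 2 == 1:
                nxt.append(result[i // 2])         # even position: parent's value
            elif i == 0:
                nxt.append(value)                  # first position is its own prefix
            else:
                nxt.append(op(nxt[i - 1], value))  # odd position: p_{n-1} + p_n
        result = nxt
    return result

print(two_pass_prefix([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Each pass applies the operation roughly n times in total, which is where the factor 2 in the 2n/p term of the parallel time comes from.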
5
Parallel prefix computation
Parallel time: 2n/p + O(log n) for p < n/(log n)
Lower bound for parallel time: 2n/(p+1) for n > p(p+1)/2
Assumes identical processors!
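To get a feel for the two expressions, they can be evaluated for concrete n and p. This is a minimal sketch: the function names are hypothetical, the O(log n) term is dropped because its constant is not given, and a base-2 logarithm is assumed for the range check.

```python
import math

def time_leading_term(n: int, p: int) -> float:
    """Leading term 2n/p of the stated parallel time (O(log n) term omitted)."""
    if n < 2 or not p < n / math.log2(n):  # assumption: log is base 2
        raise ValueError("outside the stated range p < n / log n")
    return 2 * n / p

def lower_bound(n: int, p: int) -> float:
    """Stated lower bound 2n/(p+1), valid for n > p(p+1)/2."""
    if not n > p * (p + 1) / 2:
        raise ValueError("outside the stated range n > p(p+1)/2")
    return 2 * n / (p + 1)

n, p = 1024, 8
print(time_leading_term(n, p))      # 256.0
print(f"{lower_bound(n, p):.1f}")   # 227.6
```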
6
Parallel prefix computation
Potential practical problems:
  The processor setup may be heterogeneous.
  Processor load may vary due to other users computing on the same machine.
  An off-line optimal schedule is potentially no longer optimal!
Solution:
  Use on-line scheduling!
7
The basic idea
Combine a sequentially optimal algorithm with fine-grained parallelism using work stealing.
[Figure: processors P0, P1, P2, …, Pn, with idle processors stealing work.]
8
The algorithm
Sequential process P_s:
The sequential process P_s starts working on [π_1, π_k], i.e. value indices [1,k], where indices [k+1,m] have been stolen.
When P_s reaches index k, it communicates π_k to the parallel process P_v that has stolen [k+1,m] and recovers the last index n computed by P_v together with the local prefix result r_n.
P_s uses associativity to calculate π_n = π_k ∘ r_n and continues the computation from index n+1.
9
The algorithm
Parallel process P_v:
P_v scans for active processes (either P_s or another P_v) and steals part of the work from one of them.
P_v computes the local prefix operation on the stolen interval.
The result computed by P_v depends on a previous value and needs to be finalized once that value is known.
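The interplay between the sequential process and one thief can be sketched as a sequential simulation of a single steal. This is an illustration under stated assumptions, not the authors' implementation: the names `local_prefix` and `prefix_with_one_steal` and the parameter `done` (how many stolen items the thief has processed when the sequential process catches up) are hypothetical, and real work stealing would run the two parts concurrently.

```python
import operator

def local_prefix(xs, op):
    """Prefixes of a block on its own, without any context from the left."""
    out = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out

def prefix_with_one_steal(xs, k, done, op=operator.add):
    """Simulate one steal: P_s owns xs[:k]; P_v stole the rest and processed `done` items."""
    assert 1 <= k <= len(xs) and 0 <= done <= len(xs) - k

    # P_s: sequential prefix on its own interval, ending with pi_k.
    pi = local_prefix(xs[:k], op)
    pi_k = pi[-1]

    # P_v: local prefixes r_j on the part of the stolen interval it got through.
    r = local_prefix(xs[k:k + done], op)

    # Finalize: once pi_k is known, each local r_j becomes pi_j = pi_k o r_j.
    pi.extend([op(pi_k, r_j) for r_j in r])

    # Jump: the sequential computation resumes right after the last index P_v reached.
    for x in xs[k + done:]:
        pi.append(op(pi[-1], x))
    return pi

print(prefix_with_one_steal([1, 2, 3, 4, 5, 6, 7, 8], k=3, done=3))
# [1, 3, 6, 10, 15, 21, 28, 36]
```

The finalize step relies only on associativity, so the sketch also works for non-commutative operations such as string concatenation.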
10
The algorithm
[Figure: processes P0, P1 and P2 working on indices 1 to 16, shown in the groups 1 to 3, 4 to 12 and 13 to 16; legend: Result, Jump, Finalize, Stealable.]
11
Performance
If a processor is or becomes slow, part of its work can be stolen by an idle processor.
Asymptotic optimality (proof provided in the paper).
12
Performance: P homogeneous processors
13
Performance: P heterogeneous processors
14
Questions?