Parallel Prefix and Data Parallel Operations
Motivation: basic parallel operations that occur repeatedly.
Let ⊕ be an associative operation: (a_1 ⊕ a_2) ⊕ a_3 = a_1 ⊕ (a_2 ⊕ a_3).
How can we compute a_1 ⊕ a_2 ⊕ … ⊕ a_n in parallel in O(log n) time?
Approach 1
Input: a0 a1 a2 a3 a4 a5 a6 a7. [i:j] denotes the combination of a_i through a_j.
d=1: [0:0] [0:1] [1:2] [2:3] [3:4] [4:5] [5:6] [6:7]
d=2: [0:0] [0:1] [0:2] [0:3] [1:4] [2:5] [3:6] [4:7]
d=4: [0:0] [0:1] [0:2] [0:3] [0:4] [0:5] [0:6] [0:7]
Assume that n = 2^k.
for i = 0 to k-1
  for j = 0 to n-1-2^i do in parallel
    x[j+2^i] = x[j] + x[j+2^i]
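The doubling loop above can be simulated sequentially. The sketch below is only an illustration: the function name `prefix_doubling` and the explicit copy `old` (which stands in for the synchronous parallel read) are assumptions, not part of the slides.

```python
def prefix_doubling(values, op):
    """Inclusive prefix of `values` under an associative `op`, by distance doubling."""
    x = list(values)
    n = len(x)
    d = 1  # d plays the role of 2^i in the slide
    while d < n:
        old = x[:]  # every processor reads the values from before this step
        for j in range(n - d):  # conceptually all j run in parallel
            x[j + d] = op(old[j], old[j + d])
        d *= 2
    return x


print(prefix_doubling([3, 1, 4, 1, 5, 9, 2, 6], lambda a, b: a + b))
# [3, 4, 8, 9, 14, 23, 25, 31]
```

The earlier element is always passed as the left operand, so the sketch also works for associative operations that are not commutative, such as string concatenation.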
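How to do on Tree Architecture?
for each node:
  if there is a signal from left and right: S_t <- S_l + S_r, sent to the parent; S_l is also sent to the right child as a signal R
  if there is a signal R: send R to both its children
  if the node is a leaf and there is a signal R: X <- X + R
[Figure: a tree node with signals S_l and S_r from its children, S_t to its parent, and R from its parent]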
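A sequential sketch of the tree rules follows, with each internal node forwarding its left sum S_l to its right subtree as the signal R. That forwarding step is an assumed reading of the rules, and the names `tree_prefix`, `up`, and `down` are hypothetical.

```python
def tree_prefix(leaves, op):
    """Inclusive prefix over the leaves of a binary tree, simulated recursively."""
    original = list(leaves)
    x = list(leaves)  # X values held at the leaves
    if not x:
        return x

    def down(lo, hi, r):
        # Signal R is passed to both children until it reaches the leaves,
        # and every leaf below this node folds it in.
        for i in range(lo, hi):
            x[i] = op(r, x[i])

    def up(lo, hi):
        # Returns S_t, the combination of the original leaves in [lo, hi).
        if hi - lo == 1:
            return original[lo]
        mid = (lo + hi) // 2
        s_l, s_r = up(lo, mid), up(mid, hi)
        down(mid, hi, s_l)  # assumed step: S_l goes to the right subtree as R
        return op(s_l, s_r)

    up(0, len(x))
    return x


print(tree_prefix([1, 2, 3, 4], lambda a, b: a + b))
# [1, 3, 6, 10]
```

On a real tree machine the up and down signals of different levels overlap in time. The recursion here only reproduces the values, not the timing.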
How to do on a Hypercube
A complete binary tree can be embedded into a hypercube.
Simpler solution: each node computes both the prefix and the total sum.
for i = 0 to k-1
  for j = 0 to n-1 do in parallel
    if i-th bit of j = 1: x[j] = x[j] + sum[j_i]
    sum[j] = sum[j] + sum[j_i]
where j_i and j have the same binary representation except for the i-th bit: the i-th bit of j_i is the complement of the i-th bit of j.
Prefix on Hypercube
Input: a0 a1 a2 a3 a4 a5 a6 a7, with x[j] = sum[j] = a_j initially.
for i = 0 to k-1
  for j = 0 to n-1 do in parallel
    if i-th bit of j = 1: x[j] = x[j] + sum[j_i]
    sum[j] = sum[j] + sum[j_i]
Node:     0     1     2     3     4     5     6     7
d=1 X:   [0:0] [0:1] [2:2] [2:3] [4:4] [4:5] [6:6] [6:7]
d=1 SUM: [0:1] [0:1] [2:3] [2:3] [4:5] [4:5] [6:7] [6:7]
d=2 X:   [0:0] [0:1] [0:2] [0:3] [4:4] [4:5] [4:6] [4:7]
d=2 SUM: [0:3] [0:3] [0:3] [0:3] [4:7] [4:7] [4:7] [4:7]
d=4 X:   [0:0] [0:1] [0:2] [0:3] [0:4] [0:5] [0:6] [0:7]
d=4 SUM: [0:7] [0:7] [0:7] [0:7] [0:7] [0:7] [0:7] [0:7]
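The hypercube loop can be checked with a small sequential simulation. This is a sketch under the assumption that n is a power of two. The name `hypercube_prefix` is hypothetical, and the operand order (partner total on the left) is chosen here so that non-commutative operations also work, whereas the slide writes a plain sum.

```python
def hypercube_prefix(values, op):
    """Return (prefix, total) lists after exchanging along each hypercube dimension."""
    n = len(values)  # assumed to be a power of two, n = 2^k
    x = list(values)  # x[j]: running prefix at node j
    s = list(values)  # s[j]: running total of node j's current subcube
    i = 0
    while (1 << i) < n:
        old = s[:]  # totals as they were before this exchange step
        for j in range(n):  # conceptually all nodes act in parallel
            partner = j ^ (1 << i)  # j_i: j with bit i complemented
            if j & (1 << i):
                # The partner's subcube holds the elements just before ours.
                x[j] = op(old[partner], x[j])
                s[j] = op(old[partner], old[j])
            else:
                s[j] = op(old[j], old[partner])
        i += 1
    return x, s


prefix, total = hypercube_prefix([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b)
print(prefix)  # [1, 3, 6, 10, 15, 21, 28, 36]
print(total)   # [36, 36, 36, 36, 36, 36, 36, 36]
```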
Applications of Data Parallel Operations
Any associative operation can be used. Examples:
– min, max, add
– adding two binary numbers
– finite state automata
– radix sort
– segmented prefix sum
– routing: packing, unpacking, broadcast (copy-scan)
– solving recurrence equations
– straight-line computation (parallel arithmetic evaluation)
Adding two n-bit numbers as parallel prefix
a = a_{n-1} … a_0, b = b_{n-1} … b_0, s = a + b
Note that s_i = a_i xor b_i xor c_{i-1}, where c_{i-1} is the carry into bit i.
To compute c_i, define g (generate) and p (propagate) as: g_i = a_i ∧ b_i, p_i = a_i xor b_i (a_i ∨ b_i gives the same carries).
Define the operator ∘ as: (g, p) ∘ (g', p') = (g ∨ (p ∧ g'), p ∧ p')
Then the carry bit c_i can be computed by:
(G_i, P_i) = (g_i, p_i) ∘ (g_{i-1}, p_{i-1}) ∘ … ∘ (g_0, p_0), and G_i = c_i
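The carry computation can be written as a prefix over (g, p) pairs. The sketch below assumes bit lists stored least significant bit first and uses xor for the propagate bit. `itertools.accumulate` is a sequential stand-in for a parallel prefix, and the names `carry_op` and `add_bits` are hypothetical.

```python
from itertools import accumulate


def carry_op(later, earlier):
    """(g, p) o (g', p') = (g or (p and g'), p and p'); `later` is the higher bit."""
    g, p = later
    g2, p2 = earlier
    return (g | (p & g2), p & p2)


def add_bits(a_bits, b_bits):
    """Add two non-empty, equal-length bit lists given least significant bit first."""
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    # accumulate calls func(running, new); the new pair belongs to the higher bit.
    prefix = list(accumulate(gp, lambda acc, new: carry_op(new, acc)))
    carries = [g for g, _ in prefix]  # G_i = c_i
    carry_in = [0] + carries[:-1]     # c_{i-1}, with no carry into bit 0
    s = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry_in)]
    return s + [carries[-1]]          # final carry becomes the top bit


print(add_bits([0, 1, 1], [1, 1, 1]))  # 6 + 7
# [1, 0, 1, 1]
```

Because `carry_op` is associative, the sequential `accumulate` could be replaced by any O(log n) parallel prefix scheme without changing the result.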
Hardware circuit of recursive look-ahead adder
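Parsing a regular language
Transition function: δ(q0,b) = q2, δ(q0,c) = q1, δ(q1,b) = q0, δ(q1,c) = qr, δ(q2,b) = qr, δ(q2,c) = q0, where qr is the reject state.
Each input symbol defines a mapping on the states, e.g. b: q0->q2, q1->q0, q2->qr.
Composition of these mappings is associative, so the mapping for the whole input can be computed as a parallel prefix.
[Figure: state diagram of the automaton and the pairwise composition of per-symbol state mappings for an example input]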
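The transition table above can be turned into per-symbol state mappings and combined by composition. In this sketch, treating qr as absorbing and sending unknown symbols to qr are assumptions, and the names `symbol_map`, `compose`, and `final_state` are hypothetical. `functools.reduce` stands in for a parallel reduction.

```python
from functools import reduce

STATES = ("q0", "q1", "q2", "qr")
DELTA = {
    ("q0", "b"): "q2", ("q0", "c"): "q1",
    ("q1", "b"): "q0", ("q1", "c"): "qr",
    ("q2", "b"): "qr", ("q2", "c"): "q0",
}


def symbol_map(sym):
    """State mapping induced by one input symbol (assumption: qr is absorbing)."""
    return {q: DELTA.get((q, sym), "qr") for q in STATES}


def compose(first, second):
    """Mapping that applies `first`, then `second`; composition is associative."""
    return {q: second[first[q]] for q in STATES}


def final_state(text, start="q0"):
    identity = {q: q for q in STATES}
    total = reduce(compose, (symbol_map(ch) for ch in text), identity)
    return total[start]


print(final_state("bc"))  # q0 -b-> q2 -c-> q0
# q0
```

Each mapping has a fixed size (one entry per state), so combining two mappings costs constant time for a fixed automaton.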
Segmented Prefix operation
[Figure: an array divided by segment boundaries, shown before and after the segmented prefix operation]
Segmented Prefix computation
Let ⊕ be any associative operation. For the segmented operation of ⊕, define ⊕' as follows, where a leading | marks a segment boundary in front of the element:
  ⊕'      b            |b
  a       a ⊕ b        |b
  |a      |(a ⊕ b)     |b
Then ⊕' is associative, and we can compute the segmented operation in O(log n) time.
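The table can be encoded with (flag, value) pairs, where the flag is true for an element that starts a segment. This is a minimal sketch under that assumed encoding. `segmented` and `segmented_prefix` are hypothetical names, and `itertools.accumulate` is a sequential stand-in for the parallel prefix.

```python
from itertools import accumulate


def segmented(op):
    """Lift an associative `op` to the segmented operator on (flag, value) pairs."""
    def op_seg(left, right):
        (fa, a), (fb, b) = left, right
        if fb:  # right operand starts a segment: the result restarts there
            return (True, b)
        return (fa, op(a, b))  # otherwise combine and keep the left flag
    return op_seg


def segmented_prefix(values, starts, op):
    pairs = [(bool(f), v) for f, v in zip(starts, values)]
    return [v for _, v in accumulate(pairs, segmented(op))]


print(segmented_prefix([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 0, 1], lambda a, b: a + b))
# [1, 3, 6, 4, 9, 6]
```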
Enumerating
data = [ ]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated = [0 x 1 2 x x 3 x 4 x]
Packing
data = [ ]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated = [0 x 1 2 x x 3 x 4 x]
packed data = [6 2 3 5 9 x x x x x]
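Enumerating is an exclusive prefix sum over the active flags, and packing writes each active value to its rank. The sketch below assumes 0/1 flags and uses `None` where the slides show x. The function names and the sample data are hypothetical, with inactive values shown as 0 placeholders.

```python
from itertools import accumulate


def enumerate_active(active):
    """Rank of each active processor among the active ones; None if inactive."""
    inclusive = list(accumulate(active))  # sequential stand-in for parallel prefix
    return [s - 1 if a else None for a, s in zip(active, inclusive)]


def pack(data, active):
    ranks = enumerate_active(active)
    packed = [None] * len(data)
    for value, r in zip(data, ranks):
        if r is not None:
            packed[r] = value  # each active processor writes to its own rank
    return packed


active = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
data = [6, 0, 2, 3, 0, 0, 5, 0, 9, 0]  # hypothetical values; zeros are placeholders
print(enumerate_active(active))
# [0, None, 1, 2, None, None, 3, None, 4, None]
print(pack(data, active))
# [6, 2, 3, 5, 9, None, None, None, None, None]
```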
Packing and Unpacking on Hypercube
Packing: adjust bit 0, adjust bit 1, adjust bit 2, ..., adjust bit k-1
Unpacking: adjust bit k-1, adjust bit k-2, ..., adjust bit 1, adjust bit 0
How about in the order of adjust bit 0, 1, ..., k-1 for packing?
Unpacking Address
data = [6 2 3 5 9 x x x x x]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated = [0 x 1 2 x x 3 x 4 x]
destination = [0 2 3 6 8 x x x x x]
unpacked data = [6 x 2 3 x x 5 x 9 x]
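Unpacking is the inverse routing step: each packed value is sent to its destination address. The sketch below assumes the destination of the r-th packed value is the index of the r-th active processor and uses `None` where the slides show x. The name `unpack` is hypothetical.

```python
def unpack(packed, destination, n):
    """Scatter packed[r] to position destination[r]; untouched slots stay None."""
    out = [None] * n
    for value, d in zip(packed, destination):
        if d is not None:
            out[d] = value  # conceptually all writes happen in parallel
    return out


packed = [6, 2, 3, 5, 9, None, None, None, None, None]
destination = [0, 2, 3, 6, 8, None, None, None, None, None]
print(unpack(packed, destination, 10))
# [6, None, 2, 3, None, None, 5, None, 9, None]
```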
Copy Scan (broadcast) address data = [ ] segmented bit = [ ] result = [ ]
Radix Sort
for j = 0 to k-1 // x has k bits; bit 0 is the least significant
  for all i in [0..n-1] do in parallel {
    if j-th bit of x[i] is 0 { y[i] = enumerate; c = count }
    if j-th bit of x[i] is 1 { y[i] = enumerate + c }
    x[y[i]] = x[i]
  }
Radix sort, another code
for j = 0 to k-1 // x has k bits; bit 0 is the least significant
  for all i in [0..n-1] do in parallel {
    pack left x[i] if j-th bit of x[i] is 0
    pack right x[i] if j-th bit of x[i] is 1
  }
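One pass of the radix sort is a stable split built from two enumerations and a count. The sketch below processes bits from least significant (bit 0) to most significant, which is what a stable split needs to produce a sorted result. It assumes non-negative integer keys below 2^k. The names `split_by_bit` and `radix_sort` are hypothetical.

```python
from itertools import accumulate


def split_by_bit(x, j):
    """Stable split: keys with bit j = 0 go left, keys with bit j = 1 go right."""
    ones = [(v >> j) & 1 for v in x]
    zeros = [1 - b for b in ones]
    # Exclusive prefix sums play the role of "enumerate" for each group.
    rank0 = [s - z for s, z in zip(accumulate(zeros), zeros)]
    rank1 = [s - o for s, o in zip(accumulate(ones), ones)]
    c = sum(zeros)  # "count": number of keys whose bit j is 0
    out = [None] * len(x)
    for i, v in enumerate(x):  # conceptually one parallel scatter
        y = rank0[i] if zeros[i] else c + rank1[i]
        out[y] = v
    return out


def radix_sort(x, k):
    x = list(x)
    for j in range(k):  # least significant bit first
        x = split_by_bit(x, j)
    return x


print(radix_sort([5, 3, 7, 0, 2, 6, 1, 4], 3))
# [0, 1, 2, 3, 4, 5, 6, 7]
```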
Quick Sort
1. Pick a pivot p.
2. Broadcast p.
3. For all PE i, compare A[i] with p:
   if A[i] < p, pack left A[i] in the segment
   if A[i] >= p, pack right A[i] in the segment
4. Mark the segment boundary.
5. Quick sort each segment recursively.
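Solving Linear Recurrence Equations
f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}
In matrix form:
[ f_n     ]   [ a_{n-1}  a_{n-2} ] [ f_{n-1} ]
[ f_{n-1} ] = [ 1        0       ] [ f_{n-2} ]
Matrix multiplication is associative, so all f_n can be obtained from a parallel prefix over the 2×2 matrices.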
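The recurrence can be solved as a prefix product of 2×2 matrices M_n = [[a_{n-1}, a_{n-2}], [1, 0]] applied to the initial vector (f_1, f_0). The sketch below is an illustration under that formulation. The names `matmul2` and `solve_recurrence` are hypothetical, and `itertools.accumulate` is a sequential stand-in for the parallel prefix.

```python
from itertools import accumulate


def matmul2(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return (
        (A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
        (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]),
    )


def solve_recurrence(a, f0, f1, n_max):
    """Return [f_0, ..., f_n_max] for f_n = a[n-1]*f_{n-1} + a[n-2]*f_{n-2}."""
    mats = [((a[n - 1], a[n - 2]), (1, 0)) for n in range(2, n_max + 1)]
    # Prefix products M_n * ... * M_2; the newer matrix multiplies on the left.
    prefixes = accumulate(mats, lambda acc, m: matmul2(m, acc))
    f = [f0, f1]
    for P in prefixes:
        f.append(P[0][0] * f1 + P[0][1] * f0)  # first row applied to (f_1, f_0)
    return f[: n_max + 1]


print(solve_recurrence([1] * 10, 0, 1, 10))  # all coefficients 1: Fibonacci
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```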
Pointer Jumping and Tree Computation
How to compute a prefix on a linked list?
for all i do in parallel, repeated O(log n) times:
  if NEXT[i] != NIL then
    X[i] <- X[i] + X[NEXT[i]]
    NEXT[i] <- NEXT[NEXT[i]]
How to make order?
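The pointer jumping step can be simulated round by round. In this sketch `None` plays the role of NIL, the copies `old_x` and `old_n` model the synchronous parallel read, and the list is assumed to be acyclic. The name `pointer_jumping` is hypothetical.

```python
def pointer_jumping(x, nxt):
    """For each node i, combine x from i to the tail of its linked list.

    nxt[i] is the index of the successor of i, or None at the tail.
    """
    x = list(x)
    nxt = list(nxt)
    while any(p is not None for p in nxt):
        old_x, old_n = x[:], nxt[:]  # values as seen at the start of the round
        for i in range(len(x)):  # conceptually all i run in parallel
            if old_n[i] is not None:
                x[i] = old_x[i] + old_x[old_n[i]]
                nxt[i] = old_n[old_n[i]]  # jump over the successor
    return x


# List order: 2 -> 0 -> 3 -> 1 -> NIL
print(pointer_jumping([10, 20, 30, 40], [3, None, 0, 1]))
# [70, 20, 100, 60]
```

With every X[i] set to 1, the result is each node's distance from the tail (counting the node itself), which is one way to recover the order of the list.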
Application: Tree computation
Pre-order numbering
[Figure: pre-order numbering example; panels "Each node" and "Leaf node", with weights 1]
Can also be applied to in-order and post-order numbering, number of children, depth, etc., and to bi-connected components.
Recurrence Equation Example: LU decomposition on a triangular matrix