Published by Jeffrey Terry. Modified over 8 years ago.
1
Chapter 11: Broadcasting with Selective Reduction (BSR). Serpil Tokdemir, GSU, Department of Computer Science
2
What is Broadcasting with Selective Reduction? BSR is an extension of the PRAM that requires asymptotically no more resources than the PRAM for its implementation. It consists of $N$ processors, $M$ shared-memory locations, and a memory access unit (MAU). Forms of memory access: exclusive read (ER), exclusive write (EW), concurrent read (CR), and concurrent write (CW).
3
The BSR Model of Parallel Computation
Figure: processors $P_1, P_2, \ldots, P_N$ access the shared memory locations through the memory access unit (MAU).
4
Broadcasting with Selective Reduction. During execution of an algorithm, several processors may read from or write to the same memory location. In addition, all processors may gain access to all memory locations at the same time for the purpose of writing: at each memory location, a subset of the incoming broadcast data is selected and reduced to one value according to an appropriate selection rule and reduction operator, and this value is finally stored in the memory location. BSR therefore accommodates all forms of memory access allowed by the PRAM, plus broadcasting with selective reduction.
5
BSR Continued. The resulting MAU has width $O(M)$, depth $O(\log M)$, and size $O(M \log M)$. How long does a step take in BSR? A memory access should require $a(N, M) = O(\log M)$ time; we assume here, however, that $a(N, M) = O(1)$. Similarly, a computational operation takes constant time: $c(N, M) = O(1)$.
6
THE BSR MODEL. BSR adds one form of concurrent access to shared memory, BROADCAST, which allows all processors to write to all shared memory locations simultaneously. BROADCAST has 3 phases. A broadcasting phase: each processor $P_i$, $1 \le i \le N$, broadcasts a datum $d_i$ and a tag $g_i$ destined to all memory locations. A selection phase: each memory location $U_j$, $1 \le j \le M$, uses a limit $l_j$ and a selection rule $\sigma$ to test the condition $g_i \,\sigma\, l_j$, where $\sigma$ is selected from the set $\{<, \le, =, \ge, >, \ne\}$.
7
The BSR Model (Continued). A reduction phase: all data $d_i$ selected by $U_j$ during the selection phase are combined into one datum that is finally stored in $U_j$. The reduction operator $\mathcal{R}$ is one of SUM, PRODUCT, AND, OR, EXCLUSIVE-OR, MAXIMUM, and MINIMUM. All three phases are performed simultaneously for all processors $P_i$ and all memory locations $U_j$.
8
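The three phases of the BROADCAST instruction
Figure: each processor $P_i$ broadcasts its pair $(g_i, d_i)$; each memory location $U_j$ tests $g_1 \,\sigma\, l_j, \ldots, g_N \,\sigma\, l_j$ and reduces the selected data to a single value.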
9
The BSR Model. If a datum or a tag is not in a processor's local register, the processor obtains it from the shared memory by an ER or a CR. The limits, selection rule, and reduction operator are assumed to be known by the memory locations; if not, they can be stored in memory by an EW or a CW. Notation for the BROADCAST instruction: a BROADCAST instruction of BSR is written as follows:
for j = 1 to M do in parallel
    for i = 1 to N do in parallel
        $U_j \leftarrow \mathcal{R}_{\,g_i \,\sigma\, l_j}\; d_i$
    end for
end for
10
THE BSR MODEL. If no data are accepted by a given memory location $U_j$, its value is not affected by the BROADCAST instruction. If only one datum is accepted, $U_j$ is assigned the value of that datum. Comparing BSR to the PRAM: in BSR, the BROADCAST instruction requires $O(1)$ time. On a PRAM with the same number of processors and memory locations, it requires $O(M)$ time, since one BROADCAST is equivalent to $M$ CW instructions. BSR is at least as powerful as the PRAM, since it allows every PRAM form of memory access, and the BROADCAST instruction makes BSR strictly more powerful than the PRAM.
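To make the semantics concrete, here is a minimal sequential Python sketch of a single BROADCAST instruction. It is a simulation under stated assumptions, not a model of the MAU hardware: it performs $O(NM)$ work, whereas the model charges one step. The names broadcast, SELECTION_RULES and REDUCTIONS are hypothetical, chosen only for illustration; a location that accepts nothing keeps its previous value, and a location that accepts one datum gets that datum, as stated above.

```python
import operator
from functools import reduce

# Selection rules sigma, applied as select(g_i, l_j), i.e. the test "g_i sigma l_j".
SELECTION_RULES = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt, "!=": operator.ne,
}

# Reduction operators R; MAXIMUM and MINIMUM use the two-argument builtins.
REDUCTIONS = {
    "SUM": operator.add, "PRODUCT": operator.mul,
    "AND": operator.and_, "OR": operator.or_, "XOR": operator.xor,
    "MAX": max, "MIN": min,
}

def broadcast(tags, data, limits, rule, reduction, memory):
    """Sequentially simulate one BSR BROADCAST (hypothetical helper).

    Processor P_i contributes the pair (tags[i], data[i]); memory location
    U_j currently holds memory[j] and uses limits[j]. Returns new contents.
    """
    select = SELECTION_RULES[rule]
    combine = REDUCTIONS[reduction]
    new_memory = list(memory)
    for j, limit in enumerate(limits):                 # every U_j ...
        accepted = [d for g, d in zip(tags, data)      # ... receives every (g_i, d_i)
                    if select(g, limit)]               # selection phase
        if accepted:                                   # nothing accepted: U_j unchanged
            new_memory[j] = reduce(combine, accepted)  # reduction phase (one datum: itself)
    return new_memory

print(broadcast([1, 2, 3], [4, 5, 6], [1, 2, 3], "<=", "SUM", [0, 0, 0]))  # [4, 9, 15]
```

Each memory location reads only the broadcast pairs and its own limit, never another location's result, so the sequential loop over j gives the same answer as the simultaneous execution the model assumes.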
11
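THE BSR MODEL. Example: given a sequence $X = \{x_1, x_2, \ldots, x_n\}$ in nondecreasing order and a sequence $L = \{l_1, l_2, \ldots, l_n\}$ of distinct numbers in increasing order, it is required to compute, for $1 \le j \le n$, the sum $s_j$ of all those elements of $X$ not equal to $l_j$. On the PRAM, this takes $O(n)$ time, which is obviously optimal: the sum $S$ of all the elements of $X$ is first computed; $X$ is merged with $L$ into a sequence $Y$ sorted in increasing order; $Y$ is then scanned, and each $s_j$ is computed by subtracting from $S$ all the elements of $X$ equal to $l_j$. With $n$ processors, any one of the $s_j$ can be computed in $O(1)$ time (so computing all $n$ of them this way takes $O(n)$ time).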
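A minimal sequential Python sketch of the $O(n)$ method, assuming (as in the problem statement) that X is nondecreasing and L is strictly increasing. Instead of materializing the merged sequence Y, it advances one pointer through X while walking L, which performs the same merge-and-scan in a single pass. The function name is hypothetical.

```python
def sums_excluding_sequential(X, L):
    """s_j = S - (sum of the elements of X equal to l_j), for every l_j in L.

    Assumes X is nondecreasing and L is strictly increasing; one pass, O(n).
    """
    S = sum(X)                       # total of all elements of X
    sums = []
    i = 0                            # position in X, only ever moves forward
    for l in L:                      # walk L and X together, as in the merge
        while i < len(X) and X[i] < l:
            i += 1                   # skip elements smaller than l_j
        equal_part = 0
        while i < len(X) and X[i] == l:
            equal_part += X[i]       # elements of X equal to l_j
            i += 1                   # safe to consume: later limits are larger
        sums.append(S - equal_part)
    return sums

print(sums_excluding_sequential([1, 2, 2, 5], [0, 2, 3, 5]))  # [10, 6, 10, 5]
```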
12
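THE BSR MODEL. BSR solves the problem with one BROADCAST instruction: processor $P_i$, $1 \le i \le n$, broadcasts $(x_i, x_i)$ as its tag and datum pair. Memory location $U_j$ uses $l_j$ as its limit and selects those $x_i$ not equal to $l_j$ (selection rule $\ne$); the selected $x_i$ are added up (reduction operator SUM) to obtain $s_j$. This requires $O(1)$ time and does not depend on $X$ and $L$ being sorted.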
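A sequential Python sketch of this single BROADCAST, assuming every $U_j$ starts at 0 so that an empty selection leaves 0 (the correct sum). Each list element plays the role of one memory location applying the $\ne$ selection and SUM reduction; the function name is hypothetical. The usage line passes unsorted inputs to illustrate that sorting is not needed.

```python
def bsr_sums_excluding(X, L):
    """Simulate the one-step BSR solution (hypothetical helper).

    P_i broadcasts (tag, datum) = (x_i, x_i); U_j uses limit l_j,
    selection rule '!=' and reduction SUM. Inputs need not be sorted.
    """
    pairs = [(x, x) for x in X]                 # broadcasting phase
    return [sum(d for g, d in pairs if g != l)  # selection + SUM reduction at U_j
            for l in L]

print(bsr_sums_excluding([5, 1, 2, 2], [3, 2, 0, 5]))  # [10, 6, 10, 5]
```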
13
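BSR ALGORITHMS. Prefix Sums: given $n$ numbers $x_1, x_2, \ldots, x_n$, the prefix sums are $s_j = x_1 + x_2 + \cdots + x_j$, $1 \le j \le n$. BSR PREFIX SUMS uses $n$ processors and $n$ memory locations. $P_i$ broadcasts its index $i$ as tag and $x_i$ as datum. Memory location $U_j$ uses its index $j$ as limit, $\le$ as the relation for selection, and SUM as the reduction operator. When the instruction terminates, $U_j$ holds $s_j$.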
14
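BSR Algorithms – Prefix Sums. Algorithm BSR PREFIX SUMS consists of one BROADCAST instruction:
for j = 1 to n do in parallel
    for i = 1 to n do in parallel
        $s_j \leftarrow \sum_{i \le j} x_i$
    end for
end for.
It uses $p(n) = n$ processors and runs in $t(n) = O(1)$ time, so its cost is $c(n) = p(n) \times t(n) = O(n)$, which is optimal.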
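The single BROADCAST of BSR PREFIX SUMS can be simulated sequentially in Python as below. This is a sketch with a hypothetical function name: tags are the 1-based indices $i$, the limit of $U_j$ is $j$, selection is $\le$, and reduction is SUM.

```python
def bsr_prefix_sums(x):
    """Simulate BSR PREFIX SUMS: P_i broadcasts (i, x_i); U_j uses limit j,
    selection rule '<=' and reduction SUM, so U_j ends with x_1 + ... + x_j."""
    n = len(x)
    pairs = list(zip(range(1, n + 1), x))  # (tag, datum) = (i, x_i), 1-based
    return [sum(d for g, d in pairs if g <= j) for j in range(1, n + 1)]

print(bsr_prefix_sums([1, 2, 3]))  # [1, 3, 6]
```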
15
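BSR Algorithms – Prefix Sums. Example with $n = 3$ (indices 1, 2, 3): $U_1$ accepts only $x_1$, $U_2$ accepts $x_1$ and $x_2$, and $U_3$ accepts $x_1$, $x_2$, and $x_3$, so $U_1$, $U_2$, $U_3$ end up holding $x_1$, $x_1 + x_2$, and $x_1 + x_2 + x_3$.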
16
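BSR Algorithms – Sorting. Given a sequence $X = \{x_1, x_2, \ldots, x_n\}$, rearrange the elements of $X$ in nondecreasing order. The algorithm requires $n$ processors and $n$ memory locations and consists of two steps. Step 1: the rank $r_j$ of each element $x_j$, i.e., the number of elements of $X$ smaller than $x_j$, is computed. $P_i$ broadcasts $x_i$ as tag and 1 as datum; $U_j$ uses $x_j$ as limit, $<$ as the relation for selection, and SUM as the reduction operator. $U_j$ then holds $r_j$, and $x_j$ is placed in position $1 + r_j$ of the sorted sequence $S$. If $x_j$ and $x_k$ are equal, then $r_j = r_k$ and both are assigned the same position.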
17
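BSR Algorithms - Sorting. Step 2: equal elements must fill consecutive positions, so $x_j$ is copied from position $1 + r_j$ to each subsequent position, up to the one just before the position where the next element, with the next higher rank, is placed in $S$. $P_i$ broadcasts the pair $(1 + r_i, x_i)$; $U_j$ uses its index $j$ as limit, $\le$ for selection, and MAXIMUM as the reduction operator. When this step terminates, $U_j$ holds $s_j$, that is, the $j$th element of the sorted sequence.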
18
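BSR Algorithms - Sorting. Algorithm BSR SORT
Step 1:
for j = 1 to n do in parallel
    for i = 1 to n do in parallel
        $r_j \leftarrow \sum_{x_i < x_j} 1$
    end for
end for
Step 2:
for j = 1 to n do in parallel
    for i = 1 to n do in parallel
        $s_j \leftarrow \max_{1 + r_i \le j} x_i$
    end for
end for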
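A sequential Python sketch of the two BROADCAST instructions of BSR SORT, with a hypothetical function name. Step 2 never meets an empty selection, because the smallest element has tag $1 + 0 = 1 \le j$ for every $j$. The usage line runs the example input {8, 5, 2, 5}.

```python
def bsr_sort(x):
    """Simulate BSR SORT (two BROADCASTs). Returns (ranks, sorted sequence)."""
    n = len(x)
    # Step 1: P_i broadcasts (x_i, 1); U_j uses limit x_j, rule '<', reduction SUM.
    # r_j is the number of elements strictly smaller than x_j.
    ranks = [sum(1 for xi in x if xi < xj) for xj in x]
    # Step 2: P_i broadcasts (1 + r_i, x_i); U_j uses limit j, rule '<=', reduction MAX.
    pairs = [(1 + r, xi) for r, xi in zip(ranks, x)]
    # The smallest element has tag 1, so every U_j accepts at least one datum.
    s = [max(d for g, d in pairs if g <= j) for j in range(1, n + 1)]
    return ranks, s

print(bsr_sort([8, 5, 2, 5]))  # ([3, 1, 0, 1], [2, 5, 5, 8])
```

The MAX reduction is what handles duplicates: every position from $1 + r_j$ up to just before the next higher rank sees $x_j$ as the largest accepted value, so repeated elements fill consecutive positions.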
19
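BSR Algorithms - Sorting. Example: $X = \{8, 5, 2, 5\}$. In Step 1, the processors broadcast the pairs $(8, 1)$, $(5, 1)$, $(2, 1)$, $(5, 1)$ to all memory locations; the limits are 8, 5, 2, and 5. Since $5 < 8$, $2 < 8$, and $5 < 8$, $r_1 = 3$. Only $2 < 5$, so $r_2 = 1$. No element is smaller than 2, so $r_3 = 0$. Only $2 < 5$, so $r_4 = 1$.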
20
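BSR Algorithms - Sorting. Example continued, Step 2 of the algorithm: the processors broadcast the pairs $(4, 8)$, $(2, 5)$, $(1, 2)$, $(2, 5)$, and the limits at the memory locations are 1, 2, 3, 4. $U_1$ accepts only 2; $U_2$ and $U_3$ accept 2, 5, 5 and keep the maximum 5; $U_4$ accepts all four values and keeps 8. This gives the sorted sequence $\{2, 5, 5, 8\}$.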
21
BSR Algorithms - Sorting. Analysis: BSR SORT uses $p(n) = n$ processors and runs in $t(n) = O(1)$ time, so $c(n) = O(n)$. This is a uniform analysis: the time $a(N, M)$ required for memory access was taken to be $O(1)$. Discriminating analysis: $a(N, M)$ is taken to be $O(\log M)$, for both BSR and the PRAM. For BSR, $N = M = O(n)$, so a memory access takes $O(\log n)$ time. Each step is executed once and contains a constant number of computations and memory accesses, so $t(n) = O(\log n)$ and $c(n) = O(n \log n)$.
22
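BSR Algorithms - Sorting. The BSR cost of $O(n \log n)$ is OPTIMAL. PRAM SORT: again $N = M = O(n)$, so each memory access takes $O(\log n)$ time. The PRAM algorithm executes $O(\log n)$ computational and memory access steps; therefore $t(n) = O(\log^2 n)$ and $c(n) = O(n \log^2 n)$, and the cost is NOT optimal.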
23
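BSR Algorithms – Computing Maximal Points. Let $S = \{q_1, q_2, \ldots, q_n\}$ be $n$ points in the plane, where $q_i = (x_i, y_i)$ for $1 \le i \le n$. A point $q_k$ dominates a point $q_i$ if $x_k > x_i$ and $y_k > y_i$. A point of $S$ is said to be maximal with respect to $S$ if and only if it is not dominated by any other point of $S$. The algorithm uses $n$ processors and $n$ memory locations and consists of three steps: (1) An auxiliary sequence $m_1, m_2, \ldots, m_n$ is created; $m_i$, associated with point $q_i$, is initially set equal to $y_i$. (2) For each $q_j$, the largest $y$-coordinate among the points with larger $x$-coordinate is found, and $m_j$ is assigned the value of that coordinate. $P_i$ broadcasts $x_i$ as tag and $y_i$ as datum.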
24
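BSR Algorithms – Computing Maximal Points. $U_j$ uses $x_j$ as its limit, $>$ for selection, and MAXIMUM for reduction to compute $m_j$. If $x_i > x_j$, $U_j$ accepts $y_i$; it thus accepts the $y$-coordinate of every point to the right of $q_j$ and assigns the maximum of these to $m_j$ (if there is no such point, $m_j$ keeps the value $y_j$). (3) A decision is made as to whether $q_i$ is a maximal point. Suppose $m_i$ was assigned the $y$-coordinate of some point $q_k$. If $m_i > y_i$, then $q_k$ dominates $q_i$. Otherwise, neither $q_k$ nor any other point dominates $q_i$, and $q_i$ is maximal.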
25
BSR Algorithms – Computing Maximal Points. Algorithm BSR MAXIMAL POINTS
Step 1:
for i = 1 to n do in parallel
    $m_i \leftarrow y_i$
end for
Step 2:
for j = 1 to n do in parallel
    for i = 1 to n do in parallel
        $m_j \leftarrow \max_{x_i > x_j} y_i$
    end for
end for
Step 3:
for i = 1 to n do in parallel
    if $m_i > y_i$ then $q_i$ is not maximal
    else $q_i$ is maximal
    end if
end for.
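A sequential Python sketch of the three steps, assuming the strict dominance rule ($q_k$ dominates $q_i$ when $x_k > x_i$ and $y_k > y_i$), which is the rule consistent with the $>$ selection and the $m_i > y_i$ test. The function name is hypothetical, and the loop over $j$ only reads the input coordinates, so running it sequentially matches the parallel step.

```python
def bsr_maximal_points(points):
    """Simulate BSR MAXIMAL POINTS for points q_i = (x_i, y_i).

    Assumes q_k dominates q_i when x_k > x_i and y_k > y_i.
    Returns a list of booleans: True if q_i is maximal.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Step 1: m_i <- y_i.
    m = list(ys)
    # Step 2: P_i broadcasts (x_i, y_i); U_j uses limit x_j, rule '>', reduction MAX.
    for j in range(len(points)):
        accepted = [yi for xi, yi in zip(xs, ys) if xi > xs[j]]
        if accepted:                 # no point to the right: m_j keeps y_j
            m[j] = max(accepted)
    # Step 3: q_i is dominated exactly when m_i > y_i.
    return [m[i] <= ys[i] for i in range(len(points))]

print(bsr_maximal_points([(1, 5), (2, 1), (3, 3)]))  # [True, False, True]
```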
26
BSR Algorithms – Computing Maximal Points. Analysis: each step uses $n$ processors and runs in $O(1)$ time, so $p(n) = n$, $t(n) = O(1)$, and $c(n) = O(n)$. Taking the memory access time to be $O(\log n)$, the cost becomes $O(n \log n)$. On the other hand, the cost for the PRAM is $O(n \log^2 n)$, which is not optimal. Example: $q_1$, $q_2$, and $q_3$ are three points in the plane.
27
BSR Algorithms – Computing Maximal Points. After Step 1 of the algorithm, $m_1 = y_1$, $m_2 = y_2$, $m_3 = y_3$. After Step 2, $m_1 = y_3$, $m_2 = y_3$, $m_3 = y_3$. Since $m_1 \le y_1$, $m_2 > y_2$, and $m_3 = y_3$, both $q_1$ and $q_3$ are maximal, while $q_2$ is dominated by $q_3$.
28
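BSR Algorithms – Maximum Sum Subsequence. Given a sequence $X = \{x_1, x_2, \ldots, x_n\}$, it is required to find indices $u$ and $v$, $u \le v$, such that the subsequence $x_u, x_{u+1}, \ldots, x_v$ has the largest possible sum among all subsequences of consecutive elements of $X$.
Algorithm BSR MAXIMUM SUM SUBSEQUENCE
Step 1:
for j = 1 to n do in parallel
    for i = 1 to n do in parallel
        $s_j \leftarrow \sum_{i \le j} x_i$
    end for
end for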
29
BSR Algorithms – Maximum Sum Subsequence
Step 2:
(2.1) for j = 1 to n do in parallel
          for i = 1 to n do in parallel
              $m_j \leftarrow \max_{i \ge j} s_i$
          end for
      end for
(2.2) for j = 1 to n do in parallel
          for i = 1 to n do in parallel
              $a_j \leftarrow \max_{s_i = m_j} i$
          end for
      end for
30
BSR Algorithms – Maximum Sum Subsequence
Step 3:
for i = 1 to n do in parallel
    $b_i \leftarrow m_i - s_i + x_i$
end for
Step 4:
(4.1) for i = 1 to n do in parallel
          (i) $L \leftarrow b_i$ (MAX CW)
          (ii) if $b_i = L$ then $u \leftarrow i$ end if (ARBITRARY CW)
      end for
(4.2) $v \leftarrow a_u$
31
BSR Algorithms – Maximum Sum Subsequence. Steps of the algorithm: (1) The prefix sums $s_j$ are computed using BSR PREFIX SUMS. (2) For each $j$, the maximum prefix sum to the right of $s_j$ (including $s_j$ itself) is found, together with its index: value $m_j$ and index $a_j$. To compute $m_j$, $P_i$ broadcasts $(i, s_i)$ as its tag and datum pair; $U_j$ uses $j$ as limit, $\ge$ for selection, and MAXIMUM for reduction. To compute $a_j$, $P_i$ broadcasts $(s_i, i)$ as its tag and datum pair; $U_j$ uses $m_j$ as limit, $=$ for selection, and MAXIMUM for reduction. (3) For each $i$, the sum $b_i$ of the maximum sum subsequence starting at $x_i$ is computed; this uses an EW instruction.
32
BSR Algorithms – Maximum Sum Subsequence. Steps of the algorithm, continued: (4) The sum $L$ and the starting index $u$ of the overall maximum sum subsequence are found, and its ending index is $v = a_u$. This requires a MAX CW instruction and an ARBITRARY CW instruction. Analysis: each step of the algorithm runs in $O(1)$ time and uses $n$ processors. Thus $p(n) = n$, $t(n) = O(1)$, and $c(n) = O(n)$, which is optimal.
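A sequential Python sketch of the four steps, assuming $n \ge 1$ and 1-based indices as in the text. Each list comprehension stands for one BROADCAST (or one round of EW/CW writes) evaluated at every memory location. The ARBITRARY CW is modeled by taking the first qualifying index, which is only one of the choices the model permits. The function name is hypothetical.

```python
def bsr_max_sum_subsequence(x):
    """Sequentially simulate BSR MAXIMUM SUM SUBSEQUENCE (assumes len(x) >= 1).

    Indices are 1-based, as in the text. Returns (s, m, a, b, L, u, v).
    """
    n = len(x)
    idx = range(1, n + 1)
    # Step 1 (BSR PREFIX SUMS): tag i, datum x_i; U_j: limit j, rule '<=', SUM.
    s = [sum(xi for i, xi in zip(idx, x) if i <= j) for j in idx]
    # Step 2.1: tag i, datum s_i; U_j: limit j, rule '>=', MAX -> m_j = max_{i>=j} s_i.
    m = [max(si for i, si in zip(idx, s) if i >= j) for j in idx]
    # Step 2.2: tag s_i, datum i; U_j: limit m_j, rule '=', MAX -> largest i with s_i = m_j.
    a = [max(i for i, si in zip(idx, s) if si == mj) for mj in m]
    # Step 3 (EW): b_i = m_i - s_i + x_i, the best sum of a subsequence starting at x_i.
    b = [mi - si + xi for mi, si, xi in zip(m, s, x)]
    # Step 4.1: MAX CW gives L; ARBITRARY CW picks some u with b_u = L (here: the first).
    L = max(b)
    u = next(i for i, bi in zip(idx, b) if bi == L)
    # Step 4.2: the subsequence ends at v = a_u.
    v = a[u - 1]
    return s, m, a, b, L, u, v

print(bsr_max_sum_subsequence([-1, 1, 2, -2]))
# ([-1, 0, 2, 0], [2, 2, 2, 0], [3, 3, 3, 4], [2, 3, 2, -2], 3, 2, 3)
```

Because $m_j$ is attained by some $s_i$ with $i \ge j$, the largest index with $s_i = m_j$ is also at least $j$, so $a_u \ge u$ and the reported subsequence is never empty.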
33
BSR Algorithms – Maximum Sum Subsequence. Example: $X = \{-1, 1, 2, -2\}$. After Step 1, the prefix sums are $s_j = -1, 0, 2, 0$. The second BROADCAST instruction gives $m_j = 2, 2, 2, 0$.
34
BSR Algorithms – Maximum Sum Subsequence. Example continued: the third BROADCAST instruction, which computes $a_j$, gives $a_j = 3, 3, 3, 4$. Step 3 computes $b_i = 2, 3, 2, -2$. Finally, $L = 3$, $u = 2$, and $v = a_2 = 3$: the maximum sum subsequence is $x_2, x_3 = 1, 2$, with sum 3.