Divide and Conquer Algorithms
Sathish Vadhiyar
Introduction
One of the important parallel algorithm models. The idea is to:
- decompose the problem into parts
- solve the problem on the smaller parts
- find the global result from the individual results
This model arises naturally and works well for parallelization.
Introduction
Various models:
- Recursive sub-division: has a division-and-computation phase, then a merge phase. E.g., merge sort.
- Local compute – merge/coordinate – local compute. E.g., the algorithms that follow.
Recursive Sub-division
- Merge sort (covered already)
- Solving tridiagonal systems
Parallel Solution of Linear Systems with Special Matrices: Tridiagonal Matrices

$$
\begin{pmatrix}
a_1 & h_1 & & & \\
g_2 & a_2 & h_2 & & \\
 & g_3 & a_3 & h_3 & \\
 & & \ddots & \ddots & \ddots \\
 & & & g_n & a_n
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}
$$

In general: $g_i x_{i-1} + a_i x_i + h_i x_{i+1} = b_i$.
Substituting for $x_{i-1}$ and $x_{i+1}$ in terms of $\{x_{i-2}, x_i\}$ and $\{x_i, x_{i+2}\}$ respectively:
$$ G_i x_{i-2} + A_i x_i + H_i x_{i+2} = B_i $$
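The slides state only the result of the substitution; for reference, carrying it out explicitly (using equation $i-1$ to eliminate $x_{i-1}$ and equation $i+1$ to eliminate $x_{i+1}$, the standard odd-even reduction step) gives:

$$ \alpha_i = -\frac{g_i}{a_{i-1}}, \qquad \beta_i = -\frac{h_i}{a_{i+1}} $$
$$ G_i = \alpha_i\, g_{i-1}, \quad A_i = a_i + \alpha_i h_{i-1} + \beta_i g_{i+1}, \quad H_i = \beta_i\, h_{i+1}, \quad B_i = b_i + \alpha_i b_{i-1} + \beta_i b_{i+1} $$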
Tridiagonal Matrices

$$
\begin{pmatrix}
A_1 & & H_1 & & & \\
 & A_2 & & H_2 & & \\
G_3 & & A_3 & & H_3 & \\
 & G_4 & & A_4 & & H_4 \\
 & & \ddots & & \ddots & \\
 & & G_n & & A_n &
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} B_1 \\ B_2 \\ B_3 \\ \vdots \\ B_n \end{pmatrix}
$$

Reordering:
Tridiagonal Matrices

$$
\begin{pmatrix}
A_2 & H_2 & & & & & \\
G_4 & A_4 & H_4 & & & & \\
 & \ddots & \ddots & \ddots & & & \\
 & & G_n & A_n & & & \\
 & & & & A_1 & H_1 & \\
 & & & & G_3 & A_3 & H_3 \\
 & & & & & \ddots & \ddots \\
 & & & & & G_{n-1} & A_{n-1}
\end{pmatrix}
\begin{pmatrix} x_2 \\ x_4 \\ \vdots \\ x_n \\ x_1 \\ x_3 \\ \vdots \\ x_{n-1} \end{pmatrix}
=
\begin{pmatrix} B_2 \\ B_4 \\ \vdots \\ B_n \\ B_1 \\ B_3 \\ \vdots \\ B_{n-1} \end{pmatrix}
$$
Tridiagonal Systems
Thus the problem of size n has been split into two independent systems of size n/2: one over the even-indexed equations and one over the odd-indexed equations.
This is odd-even reduction.
For parallelization, each process can divide the problem into smaller subproblems this way and solve the subproblems; the reduction step is sketched below.
This is the divide-and-conquer technique.
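A minimal sketch of one reduction step, assuming the coefficient formulas derived above and 0-indexed arrays with g[0] = h[n-1] = 0 (names and layout are illustrative, not from the slides):

```python
def odd_even_reduce(g, a, h, b):
    """One odd-even reduction step for g[i]*x[i-1] + a[i]*x[i] + h[i]*x[i+1] = b[i].

    Returns coefficients (G, A, H, B) such that equation i couples x[i-2],
    x[i], and x[i+2]; the even and odd equations then form two independent
    tridiagonal systems of half the size, solvable in parallel.
    """
    n = len(a)
    G, A, H, B = [0.0] * n, list(a), [0.0] * n, list(b)
    for i in range(n):
        if i > 0:  # eliminate x[i-1] using equation i-1
            alpha = -g[i] / a[i - 1]
            G[i] = alpha * g[i - 1]
            A[i] += alpha * h[i - 1]
            B[i] += alpha * b[i - 1]
        if i < n - 1:  # eliminate x[i+1] using equation i+1
            beta = -h[i] / a[i + 1]
            H[i] = beta * h[i + 1]
            A[i] += beta * g[i + 1]
            B[i] += beta * b[i + 1]
    return G, A, H, B
```

The even-indexed rows of the result and the odd-indexed rows each form a tridiagonal system over the correspondingly indexed unknowns, matching the reordered matrix above.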
Tridiagonal Systems - Parallelization
At each stage, one representative process from the domain of processes is chosen.
This representative performs the odd-even reduction of a problem of size i into two problems of size i/2.
The two subproblems are then distributed to two representatives.
[Figure: a binary tree over processes 1-8 showing the recursive subdivision, with problem sizes n, n/2, n/4, n/8 at successive levels.]
Local Compute – Merge – Local Compute
- Prefix computations
- Sample sort
Parallel Algorithm: Prefix Computations on Arrays
- The array X is partitioned into subarrays.
- Local prefix sums of each subarray are calculated in parallel.
- The last local prefix sum of each subarray (i.e., the subarray's total) is written to a separate array Y.
- Prefix sums of the elements in Y are calculated.
- Each prefix sum of Y is added to every element of the following block of X.
This is a divide-and-conquer strategy.
Example
Input: 1, 2, 3, 4, 5, 6, 7, 8, 9
Divide: (1, 2, 3) (4, 5, 6) (7, 8, 9)
Local prefix sums: (1, 3, 6) (4, 9, 15) (7, 15, 24)
Last elements passed to one processor: 6, 15, 24
Prefix sums of the last elements computed on that processor: 6, 21, 45
Adding each global prefix sum to the local prefix sums of the following block: 1, 3, 6, 10, 15, 21, 28, 36, 45
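A minimal sequential sketch of the same scheme (the function name and the sequential simulation of the blocks are illustrative; in a real implementation each block lives on its own processor):

```python
from itertools import accumulate

def blocked_prefix_sums(x, p):
    """Prefix sums of x computed block-by-block, mirroring the parallel scheme."""
    n = len(x)
    size = (n + p - 1) // p
    blocks = [x[i:i + size] for i in range(0, n, size)]

    # Phase 1: local prefix sums (one block per processor, in parallel).
    local = [list(accumulate(b)) for b in blocks]

    # Phase 2: gather the last elements into Y and prefix-sum them on one processor.
    y = list(accumulate(b[-1] for b in local))

    # Phase 3: add the total of the preceding blocks to each later block.
    out = list(local[0])
    for k in range(1, len(local)):
        out.extend(v + y[k - 1] for v in local[k])
    return out

print(blocked_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
# [1, 3, 6, 10, 15, 21, 28, 36, 45]  -- matches the example above
```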
Lessons Learned
- Local computations first
- Then global communication/coordination
- Then back to local computations
Sample Sort
Parallel Sorting by Regular Sampling (PSRS)
1. Each processor sorts its local data.
2. Each processor selects a sample vector of size p-1; the kth sample is the local element at position (k+1)·(n/p)/p, i.e., position (k+1)n/p².
3. The samples are sent to processor 0 and merge-sorted there.
4. From the sorted samples, processor 0 selects a vector of p-1 splitters, starting from element p/2 and taking every pth element (the kth splitter is element p(k+1/2)); the splitters are broadcast to all the other processors.
Steps 1-4 are sketched below.
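A minimal sketch of steps 1-4, simulating the p processors with plain Python lists (the function name and data layout are illustrative; a real implementation would gather the samples with message passing):

```python
def psrs_splitters(local_data, p):
    """PSRS steps 1-4: local sort, regular sampling, splitter selection.

    local_data: p lists of n/p elements each, one per simulated processor.
    Returns the per-processor sorted lists and the p-1 splitters.
    """
    # Step 1: each processor sorts its local data.
    local_sorted = [sorted(block) for block in local_data]

    # Step 2: each processor picks p-1 regularly spaced samples.
    samples = []
    for block in local_sorted:
        m = len(block)  # m = n/p
        samples.extend(block[(k + 1) * m // p] for k in range(p - 1))

    # Step 3: the p(p-1) samples are sorted on processor 0.
    samples.sort()

    # Step 4: every p-th sample, starting from position p/2, becomes a splitter.
    splitters = [samples[k * p + p // 2] for k in range(p - 1)]
    return local_sorted, splitters
```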
Example
PSRS (continued)
5. Each processor sends its local data to the correct destination processors based on the splitters, in an all-to-all exchange.
6. Each processor merges the data chunks it receives.
Step 5
Each processor finds where each of the p-1 splitters divides its sorted local list, using a binary search: for the jth splitter, it finds the boundary index separating the elements not larger than the splitter from those larger than it.
At this point each processor holds p sorted sublists with the property that every element in sublist i is greater than every element in sublist i-1, on any processor.
Step 6
Each processor i performs a p-way merge of the ith sublists from the p processors; steps 5 and 6 are sketched below.
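A minimal sketch of steps 5 and 6, continuing the simulation above; bisect provides the binary search and heapq.merge the p-way merge (again, the names are illustrative):

```python
from bisect import bisect_right
from heapq import merge

def psrs_exchange_and_merge(local_sorted, splitters):
    """PSRS steps 5-6: partition by splitters, all-to-all exchange, p-way merge."""
    p = len(local_sorted)

    # Step 5: binary-search the splitter boundaries in each sorted local list.
    partitions = []
    for block in local_sorted:
        cuts = [0] + [bisect_right(block, s) for s in splitters] + [len(block)]
        partitions.append([block[cuts[i]:cuts[i + 1]] for i in range(p)])

    # All-to-all exchange: processor i receives sublist i from every processor.
    received = [[partitions[src][i] for src in range(p)] for i in range(p)]

    # Step 6: each processor p-way merges the p sorted sublists it received.
    return [list(merge(*chunks)) for chunks in received]
```

Chaining the two sketches, psrs_exchange_and_merge(*psrs_splitters(data, p)) leaves processor i holding the ith chunk of the globally sorted data.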
Example Continued
Analysis
- The first phase of local sorting takes $O((n/p)\log(n/p))$.
- 2nd phase: sorting the p(p-1) samples on processor 0 takes $O(p^2 \log p^2)$; each processor also performs p-1 binary searches over its n/p elements, taking $O(p \log(n/p))$.
- 3rd phase: each processor merges the p-1 sublists it receives, plus its own. The size of the data merged by any processor is no more than 2n/p (this can be proved), so the merge takes $O((2n/p)\log p)$.
- Summing up: $O((n/p)\log n)$.
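Summing these terms (a quick check; the stated bound assumes n is large relative to p, roughly n ≥ p³, so that the $O(p^2 \log p^2)$ sample sort does not dominate):

$$
O\!\left(\frac{n}{p}\log\frac{n}{p}\right) + O\!\left(\frac{n}{p}\log p\right)
= O\!\left(\frac{n}{p}\left(\log\frac{n}{p} + \log p\right)\right)
= O\!\left(\frac{n}{p}\log n\right)
$$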
Analysis
- 1st phase: no communication.
- 2nd phase: p(p-1) sample elements are collected on processor 0; p-1 splitters are broadcast.
- 3rd phase: each processor sends p-1 sublists to the other p-1 processors; the processors then work on their sublists independently.
Analysis
Not scalable to large numbers of processors: the merging of the p(p-1) sample elements is done on a single processor.
For example, 16384 processors would require 16 GB of memory.
Sorting by Random Sampling
An interesting alternative: a random sample is flexible in size and is collected randomly from each processor's local data.
Advantage: a random sample can be retrieved before local sorting, allowing overlap between the sorting and the splitter calculation.
Sources/References
- Li et al. On the versatility of parallel sorting by regular sampling. Parallel Computing, 1993.
- Shi and Schaeffer. Parallel sorting by regular sampling. JPDC, 1992.
- Solomonik and Kale. Highly scalable parallel sorting. IPDPS, 2010.