Divide and Conquer Algorithms
Sathish Vadhiyar
Introduction
One of the important parallel algorithm models. The idea is to:
- decompose the problem into parts
- solve the problem on the smaller parts
- find the global result from the individual results
This model arises naturally and works well for parallelization.
Introduction
Various models:
- Recursive sub-division: has a division-and-computation phase, then a merge phase. E.g., merge sort.
- Local compute – merge/coordinate – local compute. E.g., the algorithms that follow.
Recursive Sub-division
- Merge sort (covered already)
- Solving tridiagonal systems
Parallel Solution of Linear Systems with Special Matrices: Tridiagonal Matrices

$$
\begin{pmatrix}
a_1 & h_1 & & & \\
g_2 & a_2 & h_2 & & \\
 & g_3 & a_3 & h_3 & \\
 & & \ddots & \ddots & \ddots \\
 & & & g_n & a_n
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}
$$

In general: $g_i x_{i-1} + a_i x_i + h_i x_{i+1} = b_i$.
Substituting for $x_{i-1}$ and $x_{i+1}$ in terms of $\{x_{i-2}, x_i\}$ and $\{x_i, x_{i+2}\}$ respectively:
$$ G_i x_{i-2} + A_i x_i + H_i x_{i+2} = B_i $$
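The slides state only the result of the substitution; for reference, carrying it out explicitly (using equation $i-1$ to eliminate $x_{i-1}$ and equation $i+1$ to eliminate $x_{i+1}$, the standard odd-even reduction step) gives:

$$ \alpha_i = -\frac{g_i}{a_{i-1}}, \qquad \beta_i = -\frac{h_i}{a_{i+1}} $$
$$ G_i = \alpha_i\, g_{i-1}, \quad A_i = a_i + \alpha_i h_{i-1} + \beta_i g_{i+1}, \quad H_i = \beta_i\, h_{i+1}, \quad B_i = b_i + \alpha_i b_{i-1} + \beta_i b_{i+1} $$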
Tridiagonal Matrices

$$
\begin{pmatrix}
A_1 & & H_1 & & & \\
 & A_2 & & H_2 & & \\
G_3 & & A_3 & & H_3 & \\
 & G_4 & & A_4 & & H_4 \\
 & & \ddots & & \ddots & \\
 & & G_n & & A_n &
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} B_1 \\ B_2 \\ B_3 \\ \vdots \\ B_n \end{pmatrix}
$$

Reordering:
Tridiagonal Matrices

$$
\begin{pmatrix}
A_2 & H_2 & & & & & \\
G_4 & A_4 & H_4 & & & & \\
 & \ddots & \ddots & \ddots & & & \\
 & & G_n & A_n & & & \\
 & & & & A_1 & H_1 & \\
 & & & & G_3 & A_3 & H_3 \\
 & & & & & \ddots & \ddots \\
 & & & & & G_{n-1} & A_{n-1}
\end{pmatrix}
\begin{pmatrix} x_2 \\ x_4 \\ \vdots \\ x_n \\ x_1 \\ x_3 \\ \vdots \\ x_{n-1} \end{pmatrix}
=
\begin{pmatrix} B_2 \\ B_4 \\ \vdots \\ B_n \\ B_1 \\ B_3 \\ \vdots \\ B_{n-1} \end{pmatrix}
$$
Tridiagonal Systems
Thus the problem of size n has been split into two independent systems of size n/2: one over the even-indexed equations and one over the odd-indexed equations.
This is odd-even reduction.
For parallelization, each process can divide the problem into smaller subproblems this way and solve the subproblems; the reduction step is sketched below.
This is the divide-and-conquer technique.
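A minimal sketch of one reduction step, assuming the coefficient formulas derived above and 0-indexed arrays with g[0] = h[n-1] = 0 (names and layout are illustrative, not from the slides):

```python
def odd_even_reduce(g, a, h, b):
    """One odd-even reduction step for g[i]*x[i-1] + a[i]*x[i] + h[i]*x[i+1] = b[i].

    Returns coefficients (G, A, H, B) such that equation i couples x[i-2],
    x[i], and x[i+2]; the even and odd equations then form two independent
    tridiagonal systems of half the size, solvable in parallel.
    """
    n = len(a)
    G, A, H, B = [0.0] * n, list(a), [0.0] * n, list(b)
    for i in range(n):
        if i > 0:  # eliminate x[i-1] using equation i-1
            alpha = -g[i] / a[i - 1]
            G[i] = alpha * g[i - 1]
            A[i] += alpha * h[i - 1]
            B[i] += alpha * b[i - 1]
        if i < n - 1:  # eliminate x[i+1] using equation i+1
            beta = -h[i] / a[i + 1]
            H[i] = beta * h[i + 1]
            A[i] += beta * g[i + 1]
            B[i] += beta * b[i + 1]
    return G, A, H, B
```

The even-indexed rows of the result and the odd-indexed rows each form a tridiagonal system over the correspondingly indexed unknowns, matching the reordered matrix above.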
Tridiagonal Systems - Parallelization
At each stage, one representative process from the domain of processes is chosen.
This representative performs the odd-even reduction of a problem of size i into two problems of size i/2.
The two subproblems are then distributed to two representatives.
[Figure: a binary tree over processes 1-8 showing the recursive subdivision, with problem sizes n, n/2, n/4, n/8 at successive levels.]
Local Compute – Merge – Local Compute
- Prefix computations
- Sample sort
Parallel Algorithm: Prefix Computations on Arrays
- The array X is partitioned into subarrays.
- Local prefix sums of each subarray are calculated in parallel.
- The last local prefix sum of each subarray (i.e., the subarray's total) is written to a separate array Y.
- Prefix sums of the elements in Y are calculated.
- Each prefix sum of Y is added to every element of the following block of X.
This is a divide-and-conquer strategy.
Example
Input: 1, 2, 3, 4, 5, 6, 7, 8, 9
Divide: (1, 2, 3) (4, 5, 6) (7, 8, 9)
Local prefix sums: (1, 3, 6) (4, 9, 15) (7, 15, 24)
Last elements passed to one processor: 6, 15, 24
Prefix sums of the last elements computed on that processor: 6, 21, 45
Adding each global prefix sum to the local prefix sums of the following block: 1, 3, 6, 10, 15, 21, 28, 36, 45
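A minimal sequential sketch of the same scheme (the function name and the sequential simulation of the blocks are illustrative; in a real implementation each block lives on its own processor):

```python
from itertools import accumulate

def blocked_prefix_sums(x, p):
    """Prefix sums of x computed block-by-block, mirroring the parallel scheme."""
    n = len(x)
    size = (n + p - 1) // p
    blocks = [x[i:i + size] for i in range(0, n, size)]

    # Phase 1: local prefix sums (one block per processor, in parallel).
    local = [list(accumulate(b)) for b in blocks]

    # Phase 2: gather the last elements into Y and prefix-sum them on one processor.
    y = list(accumulate(b[-1] for b in local))

    # Phase 3: add the total of the preceding blocks to each later block.
    out = list(local[0])
    for k in range(1, len(local)):
        out.extend(v + y[k - 1] for v in local[k])
    return out

print(blocked_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
# [1, 3, 6, 10, 15, 21, 28, 36, 45]  -- matches the example above
```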
Lessons Learned
- Local computations first
- Then global communication/coordination
- Then back to local computations
Sample Sort
Parallel Sorting by Regular Sampling (PSRS)
1. Each processor sorts its local data.
2. Each processor selects a sample vector of size p-1; the kth sample is the local element at position (k+1)·(n/p)/p, i.e., position (k+1)n/p².
3. The samples are sent to processor 0 and merge-sorted there.
4. From the sorted samples, processor 0 selects a vector of p-1 splitters, starting from element p/2 and taking every pth element (the kth splitter is element p(k+1/2)); the splitters are broadcast to all the other processors.
Steps 1-4 are sketched below.
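A minimal sketch of steps 1-4, simulating the p processors with plain Python lists (the function name and data layout are illustrative; a real implementation would gather the samples with message passing):

```python
def psrs_splitters(local_data, p):
    """PSRS steps 1-4: local sort, regular sampling, splitter selection.

    local_data: p lists of n/p elements each, one per simulated processor.
    Returns the per-processor sorted lists and the p-1 splitters.
    """
    # Step 1: each processor sorts its local data.
    local_sorted = [sorted(block) for block in local_data]

    # Step 2: each processor picks p-1 regularly spaced samples.
    samples = []
    for block in local_sorted:
        m = len(block)  # m = n/p
        samples.extend(block[(k + 1) * m // p] for k in range(p - 1))

    # Step 3: the p(p-1) samples are sorted on processor 0.
    samples.sort()

    # Step 4: every p-th sample, starting from position p/2, becomes a splitter.
    splitters = [samples[k * p + p // 2] for k in range(p - 1)]
    return local_sorted, splitters
```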
Example
PSRS (continued)
5. Each processor sends its local data to the correct destination processors based on the splitters, in an all-to-all exchange.
6. Each processor merges the data chunks it receives.
Step 5
Each processor finds where each of the p-1 splitters divides its sorted local list, using a binary search: for the jth splitter, it finds the boundary index separating the elements not larger than the splitter from those larger than it.
At this point each processor holds p sorted sublists with the property that every element in sublist i is greater than every element in sublist i-1, on any processor.
Step 6
Each processor i performs a p-way merge of the ith sublists from the p processors; steps 5 and 6 are sketched below.
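A minimal sketch of steps 5 and 6, continuing the simulation above; bisect provides the binary search and heapq.merge the p-way merge (again, the names are illustrative):

```python
from bisect import bisect_right
from heapq import merge

def psrs_exchange_and_merge(local_sorted, splitters):
    """PSRS steps 5-6: partition by splitters, all-to-all exchange, p-way merge."""
    p = len(local_sorted)

    # Step 5: binary-search the splitter boundaries in each sorted local list.
    partitions = []
    for block in local_sorted:
        cuts = [0] + [bisect_right(block, s) for s in splitters] + [len(block)]
        partitions.append([block[cuts[i]:cuts[i + 1]] for i in range(p)])

    # All-to-all exchange: processor i receives sublist i from every processor.
    received = [[partitions[src][i] for src in range(p)] for i in range(p)]

    # Step 6: each processor p-way merges the p sorted sublists it received.
    return [list(merge(*chunks)) for chunks in received]
```

Chaining the two sketches, psrs_exchange_and_merge(*psrs_splitters(data, p)) leaves processor i holding the ith chunk of the globally sorted data.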
Example Continued
Analysis
- The first phase of local sorting takes $O((n/p)\log(n/p))$.
- 2nd phase: sorting the p(p-1) samples on processor 0 takes $O(p^2 \log p^2)$; each processor also performs p-1 binary searches over its n/p elements, taking $O(p \log(n/p))$.
- 3rd phase: each processor merges the p-1 sublists it receives, plus its own. The size of the data merged by any processor is no more than 2n/p (this can be proved), so the merge takes $O((2n/p)\log p)$.
- Summing up: $O((n/p)\log n)$.
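Summing these terms (a quick check; the stated bound assumes n is large relative to p, roughly n ≥ p³, so that the $O(p^2 \log p^2)$ sample sort does not dominate):

$$
O\!\left(\frac{n}{p}\log\frac{n}{p}\right) + O\!\left(\frac{n}{p}\log p\right)
= O\!\left(\frac{n}{p}\left(\log\frac{n}{p} + \log p\right)\right)
= O\!\left(\frac{n}{p}\log n\right)
$$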
Analysis
- 1st phase: no communication.
- 2nd phase: p(p-1) sample elements are collected on processor 0; p-1 splitters are broadcast.
- 3rd phase: each processor sends p-1 sublists to the other p-1 processors; the processors then work on their sublists independently.
Analysis
Not scalable to large numbers of processors: the merging of the p(p-1) sample elements is done on a single processor.
For example, 16384 processors would require 16 GB of memory.
Sorting by Random Sampling
An interesting alternative: a random sample is flexible in size and is collected randomly from each processor's local data.
Advantage: a random sample can be retrieved before local sorting, allowing overlap between the sorting and the splitter calculation.
Sources/References
- Li et al. On the versatility of parallel sorting by regular sampling. Parallel Computing, 1993.
- Shi and Schaeffer. Parallel sorting by regular sampling. JPDC, 1992.
- Solomonik and Kale. Highly scalable parallel sorting. IPDPS, 2010.