Lecture 3: Parallel Algorithm Design

Techniques of Parallel Algorithm Design
- Balanced binary tree
- Pointer jumping
- Accelerated cascading
- Divide and conquer
- Pipelining
- Multi-level divide and conquer
- ...

Balanced binary tree
Processing on a binary tree: let the leaves correspond to the input and the internal nodes to processors.
Example: find the sum of n integers (x1, x2, ..., xn).
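
Below is a minimal sequential C sketch of the balanced-binary-tree summation (illustrative, not from the slides): each pass of the outer loop corresponds to one parallel round in which the processors at one tree level add pairs of children; the helper name tree_sum and the array level are made up for this sketch.

/* Simulate the tree-based parallel sum; one outer-loop pass = one PRAM round. */
#include <stdio.h>

#define N 8                      /* assume n is a power of two */

int tree_sum(const int x[], int n) {
    int level[N];
    for (int i = 0; i < n; i++)           /* leaves hold the input      */
        level[i] = x[i];
    for (int m = n; m > 1; m /= 2)        /* one tree level per round   */
        for (int j = 0; j < m / 2; j++)   /* done in parallel on a PRAM */
            level[j] = level[2 * j] + level[2 * j + 1];
    return level[0];                      /* the root holds the sum     */
}

int main(void) {
    int x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("sum = %d\n", tree_sum(x, N)); /* prints: sum = 36 */
    return 0;
}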

Problem of finding the prefix sum
Definition of the prefix sum problem
Input: n integers stored in array A[1..n] in the shared memory.
Output: array B[1..n], where B[i] = A[1] + A[2] + ... + A[i] for each i (1 ≤ i ≤ n).
Example
Input: A[1..5] = (5, 8, -7, -10, 3)
Output: B[1..5] = (5, 13, 6, -4, -1)
Sequential algorithm for the prefix sum
main () {
  B[1] = A[1];
  for (i = 2; i <= n; i++) {
    B[i] = B[i-1] + A[i];
  }
}
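
For reference, a runnable C version of the slide's sequential algorithm, checked against the example above; note that it uses 0-indexed arrays, unlike the 1-indexed pseudocode.

/* Sequential prefix sum in plain C, 0-indexed. */
#include <stdio.h>

int main(void) {
    int A[] = {5, 8, -7, -10, 3};
    int n = 5;
    int B[5];

    B[0] = A[0];
    for (int i = 1; i < n; i++)
        B[i] = B[i - 1] + A[i];        /* B[i] = A[0] + ... + A[i] */

    for (int i = 0; i < n; i++)
        printf("%d ", B[i]);           /* prints: 5 13 6 -4 -1 */
    printf("\n");
    return 0;
}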

Solving the prefix sum problem on a balanced binary tree (1)
Outline of the parallel algorithm for the prefix sum (to simplify the problem, let n = 2^k for an integer k):
(1) Calculate the sub-sums from the leaves to the root, bottom-up.
(2) Using the sub-sums obtained in (1), calculate the prefix sums from the root to the leaves, top-down.

Solving the prefix sum problem on a balanced binary tree (2)
(1) First read the input at the leaves; then calculate the sub-sums from the leaves to the root, bottom-up.
(2) From the root down to the leaves: each internal node sends its right son its own value (at the root, the sub-sum obtained in (1)), and sends its left son its own value minus the right son's sub-sum.
[Slide figure: a worked example of the downward pass on an 8-leaf tree, e.g. the root value 12 is passed unchanged to its right son and as 12 - 5 = 7 to its left son.]

Solving the prefix sum problem on a balanced binary tree (3)
Correctness of the algorithm
When step (1) has finished, the sub-sum stored at each internal node is the sum of the leaves of its subtree.

Solving the prefix sum problem on a balanced binary tree (4)
Correctness of the algorithm (continued)
In step (2), the value received by each node is the prefix sum of the input up to the last leaf of its subtree:
(a) the right son receives the same value as its parent, since their subtrees end at the same leaf;
(b) the left son receives the parent's value minus the sum of the right son's subtree, which is exactly the prefix sum ending at the last leaf of the left son's subtree.
[Slide figure: one internal node of the example tree illustrating cases (a) and (b), e.g. 12 - 10 = 2.]

Solving the prefix sum problem on a balanced binary tree (5)
Algorithm Parallel-PrefixSum (EREW PRAM algorithm)
main () {
  /* processor i stores the input at leaf i */
  if (processor number == i) B[0, i] = A[i];
  /* upward pass: compute the sub-sums level by level */
  for (h = 1; h <= log n; h++) {
    if (processor number j <= n/2^h) {
      B[h, j] = B[h-1, 2j-1] + B[h-1, 2j];
    }
  }
  /* downward pass: distribute the prefix sums */
  C[log n, 1] = B[log n, 1];
  for (h = (log n) - 1; h >= 0; h--) {
    if (j is even) C[h, j] = C[h+1, j/2];
    if (j is odd)  C[h, j] = C[h+1, (j+1)/2] - B[h, j+1];
  }
  /* the output prefix sums are C[0, 1], ..., C[0, n] */
}
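
As a sanity check (not part of the slides), here is a sequential C program that simulates the rounds of Parallel-PrefixSum: B[h][j] and C[h][j] follow the slide's notation with n = 2^k, and each inner loop stands for the work that the processors of one level perform in a single parallel step.

/* Sequential simulation of the EREW PRAM prefix-sum algorithm. */
#include <stdio.h>

#define K 3                 /* n = 2^K = 8 */
#define N (1 << K)

int B[K + 1][N + 1];        /* B[h][j]: sub-sum at node j of level h        */
int C[K + 1][N + 1];        /* C[h][j]: value received in the downward pass */

int main(void) {
    int A[N + 1] = {0, 3, 1, 4, 1, 5, 9, 2, 6};   /* A[1..8] (1-indexed) */

    for (int i = 1; i <= N; i++)                  /* leaves: B[0,i] = A[i] */
        B[0][i] = A[i];

    for (int h = 1; h <= K; h++)                  /* upward pass: sub-sums */
        for (int j = 1; j <= N >> h; j++)
            B[h][j] = B[h - 1][2 * j - 1] + B[h - 1][2 * j];

    C[K][1] = B[K][1];                            /* the root holds the total */
    for (int h = K - 1; h >= 0; h--)              /* downward pass */
        for (int j = 1; j <= N >> h; j++) {
            if (j % 2 == 0) C[h][j] = C[h + 1][j / 2];
            else            C[h][j] = C[h + 1][(j + 1) / 2] - B[h][j + 1];
        }

    for (int i = 1; i <= N; i++)                  /* prefix sums at the leaves */
        printf("%d ", C[0][i]);                   /* prints: 3 4 8 9 14 23 25 31 */
    printf("\n");
    return 0;
}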

Solving the prefix sum problem on a balanced binary tree (6)
[Slide figure, first step: an 8-leaf tree whose leaves hold B[0,1..8] = A[1..8]; level h holds the sub-sums B[h,1..n/2^h], up to the root B[3,1].]

Solving the prefix sum problem on a balanced binary tree (7)
[Slide figure, second step: the same tree in the downward pass; the root holds C[3,1] and each node (h, j) receives C[h, j], ending with the prefix sums C[0,1..8] at the leaves.]

Solving the prefix sum problem on a balanced binary tree (8)
Analysis of the algorithm
- Computing time: each for loop is repeated log n times and each iteration takes O(1) time → O(log n) time.
- Number of processors: not larger than n → n processors.
- Speedup: O(n/log n).
- Cost: O(n log n). It is not cost optimal, since the optimal sequential running time is Θ(n).

Accelerated cascading
To reduce the cost, solve the problem sequentially when the size of the problem is small.
Accelerated cascading is usually used together with the balanced binary tree and divide-and-conquer techniques.

Accelerated cascading for the prefix sum problem (1)
Policy for improving the algorithm
To make the algorithm cost optimal, we decrease the number of processors from n to n/log n. (Note: the computing time of the algorithm is O(log n).)
Steps:
- Instead of processing n elements in parallel, divide the n elements into n/log n groups of log n elements each.
- Assign one processor to each group and solve the problem for the group sequentially.

Accelerated cascading for the prefix sum problem (2)
Improved algorithm Parallel-PrefixSum
(1) Divide the n elements of A[1..n] into n/log n groups of log n elements each. (O(1) time, O(n/log n) processors)
(2) Assign one processor to each group and find the prefix sums within each group sequentially. (O(log n) time, O(n/log n) processors)
[Slide figure: A[1..n] split into consecutive blocks of log n elements.]

Accelerated cascading for the prefix sum problem (3)
Improved algorithm Parallel-PrefixSum (continued)
(3) Let S be the sequence of the last element of each group (after step (2) it holds the sum of the group). Use algorithm Parallel-PrefixSum to find the prefix sums of S. (O(log(n/log n)) = O(log n) time, O(n/log n) processors)
[Slide figure: the last element of each group is fed into algorithm Parallel-PrefixSum.]

Accelerated cascading for the prefix sum problem (4)
Improved algorithm Parallel-PrefixSum (continued)
(4) Use the prefix sums of S (the result of step (3)) to find the prefix sums of the whole input A[1..n]: each processor adds the total of all preceding groups to every element of its group. (O(log n) time, O(n/log n) processors)
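
The following sequential C sketch puts the whole cost-optimal scheme together (illustrative, not from the slides): block-local prefix sums for step (2), a prefix sum over the block totals for step (3), and per-block offsets for step (4). The group size G stands in for log n, and step (3) is done here with a plain loop over the few block totals instead of the tree algorithm; each loop over blocks corresponds to a phase that the n/log n processors execute in parallel.

/* Blocked (accelerated-cascading style) prefix sum, simulated sequentially. */
#include <stdio.h>

#define N 16
#define G 4                    /* group size, standing in for log n */

int main(void) {
    int A[N], S[N / G];
    for (int i = 0; i < N; i++) A[i] = i + 1;        /* input 1, 2, ..., 16 */

    for (int b = 0; b < N / G; b++)                  /* step (2): local prefix sums */
        for (int i = b * G + 1; i < (b + 1) * G; i++)
            A[i] += A[i - 1];

    for (int b = 0; b < N / G; b++) {                /* step (3): prefix sums of the */
        S[b] = A[(b + 1) * G - 1];                   /* group totals (on the PRAM,   */
        if (b > 0) S[b] += S[b - 1];                 /* done with the tree algorithm) */
    }

    for (int b = 1; b < N / G; b++)                  /* step (4): add the offset of  */
        for (int i = b * G; i < (b + 1) * G; i++)    /* all preceding groups         */
            A[i] += S[b - 1];

    for (int i = 0; i < N; i++)
        printf("%d ", A[i]);                         /* 1 3 6 10 15 21 ... 136 */
    printf("\n");
    return 0;
}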

Accelerated cascading for the prefix sum problem (5)
Analysis of the improved algorithm
- Computing time and number of processors: each step takes O(log n) time with O(n/log n) processors → in total, O(log n) time and O(n/log n) processors.
- Speedup: O(n/log n).
- Cost: O(log n × n/log n) = O(n) → it is cost optimal.
- It is also time optimal (the proof is not shown here), so it is an optimal algorithm.

Divide and Conquer
(1) 2-way divide and conquer
(2) n^ε divide and conquer (ε < 1)

Divide and Conquer: the divide-and-conquer technique
- A well-known technique in algorithm design.
- Solves problems recursively.
- Used very often in both sequential and parallel algorithms.
How to divide and conquer:
(1) Dividing step: divide the problem into a number of subproblems.
(2) Conquering step: solve each subproblem recursively.
(3) Merging step: merge the solutions of the subproblems into the solution of the original problem.

Divide and Conquer: the convex hull problem
Input: a set of n points in the plane.
Output: the smallest convex polygon that contains all the input points. (The convex polygon is represented by the list of its vertices in clockwise order.)
- A basic problem in computational geometry.
- Many applications.
- Solvable in O(n log n) time sequentially.
In the following we only consider the upper convex hull.
[Slide figure: ten points P0, ..., P9, their convex hull split into a lower and an upper part; the upper convex hull here is (P9, P8, P1, P0).]

Divide and Conquer: merging two upper convex hulls
Merging is done by finding the upper common tangent of the two hulls.
[Slide figure: two upper hulls over points p1, ..., p10 and their upper common tangent.]
It is known that common tangents can be found in O(log n) time sequentially.

Divide and Conquer: 2-way divide and conquer (1)
Outline of the algorithm Parallel-UpperConvexHull
Preprocessing: sort all the points according to their x coordinates, and let the result be the sequence (p1, p2, p3, ..., pn).
(1) If the size of the sequence is 2, return the sequence.
(2) Divide (p1, p2, p3, ..., pn) into the left half and the right half, and find the upper convex hull of each recursively.
(3) Find the upper common tangent of the two upper convex hulls obtained in (2), and output the solution of the problem.

Divide and Conquer: 2-way divide and conquer (2)
How 2-way divide and conquer works:
- Find the upper common tangent of pairs of upper convex hulls of two vertices each.
- Then of pairs of upper convex hulls of four vertices each.
- Then of pairs of upper convex hulls of eight vertices each, and so on.

Divide and Conquer: 2-way divide and conquer (3)
Recursive execution
When the problem is divided once, the size of each subproblem becomes half. Suppose the size of the subproblems becomes 2 when the problem has been divided k times: n/2^k = 2 ⇒ k = log2 n - 1.
[Slide figure: the recursion tree with subproblem sizes n, n/2, n/4, ..., 2; its height is log2 n - 1.]

Divide and Conquer: 2-way divide and conquer (4)
Complexity of the algorithm
- Preprocessing: O(log n) time, n processors.
- Steps (1)-(3): each step runs in O(log n) time and uses n/2 processors.
  T(n) = T(n/2) + O(log n), therefore T(n) = O(log^2 n).
∴ The algorithm runs in O(log^2 n) time using O(n) processors.
Computational model: there is no concurrent access ⇒ EREW PRAM.
[The slide repeats the algorithm outline: preprocessing sorts the points by x coordinate; (1) if the size of the sequence is 2, return it; (2) divide the sequence into the left half and the right half and find the upper convex hull of each recursively; (3) find the upper common tangent of the two upper convex hulls obtained in (2) and output the upper convex hull of the sequence.]
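
To make the recursive structure concrete, here is a sequential C sketch of the 2-way divide and conquer for the upper hull (illustrative, not from the slides). It assumes the points are already sorted by x coordinate, and for simplicity the merge rescans the two sub-hulls with an O(n) "right turn" test instead of the O(log n) common-tangent search used by the parallel algorithm; the identifiers (point, cross, merge_hulls, upper_hull) are made up for this sketch.

/* Sequential 2-way divide-and-conquer upper hull (merge by rescanning). */
#include <stdio.h>

typedef struct { double x, y; } point;

/* cross product of (b - a) x (c - a); > 0 means a left turn at b, 0 collinear */
static double cross(point a, point b, point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* merge two upper hulls (each sorted by x, left hull entirely to the left
   of the right hull) into hull[]; returns the number of hull vertices */
static int merge_hulls(const point *l, int nl, const point *r, int nr, point *hull) {
    int m = 0;
    for (int i = 0; i < nl + nr; i++) {
        point p = (i < nl) ? l[i] : r[i - nl];
        while (m >= 2 && cross(hull[m - 2], hull[m - 1], p) >= 0)
            m--;                    /* drop vertices where the chain fails to turn right */
        hull[m++] = p;
    }
    return m;
}

/* upper hull of pts[0..n-1] (already sorted by x); result written to hull[] */
static int upper_hull(const point *pts, int n, point *hull) {
    if (n <= 2) {                   /* step (1): base case */
        for (int i = 0; i < n; i++) hull[i] = pts[i];
        return n;
    }
    point lh[64], rh[64];           /* large enough for this small demo */
    int nl = upper_hull(pts, n / 2, lh);              /* step (2): left half  */
    int nr = upper_hull(pts + n / 2, n - n / 2, rh);  /* step (2): right half */
    return merge_hulls(lh, nl, rh, nr, hull);         /* step (3): merge      */
}

int main(void) {
    point pts[8] = {{0,0},{1,3},{2,1},{3,4},{4,0},{5,2},{6,3},{7,1}};
    point hull[8];
    int m = upper_hull(pts, 8, hull);
    for (int i = 0; i < m; i++)
        printf("(%g,%g) ", hull[i].x, hull[i].y);   /* (0,0) (1,3) (3,4) (6,3) (7,1) */
    printf("\n");
    return 0;
}

On the PRAM the merge step instead locates the upper common tangent in O(log n) time, which is what gives the recurrence T(n) = T(n/2) + O(log n) above.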

Divide and Conquer: 2-way divide and conquer (5)
Finding the complexity of the algorithm from the recursion tree
- Computing time: the levels of the tree take c log n, c log(n/2), c log(n/4), ..., time, so the total time is O(log^2 n).
- Number of processors: at the level of the leaves, n/2 processors are used at the same time ⇒ n/2 processors.
- T(n) × P(n) = O(n log^2 n) ⇒ it is not cost optimal.
[Slide figure: the recursion tree with subproblem sizes n, n/2, n/4, ..., 2, annotated with the time spent and the processors used at each level.]

Divide and Conquer: n^(1/2) divide and conquer
Outline of the algorithm
Preprocessing: sort the input points according to their x coordinates, and let the result be the sequence (p1, p2, p3, ..., pn).
(1) If the size of the sequence is 2, return the sequence.
(2) Divide (p1, p2, p3, ..., pn) into n^(1/2) equally-sized subsequences, and find the upper convex hull of each recursively.
(3) Merge the n^(1/2) upper convex hulls into the upper convex hull of the sequence.

Divide and Conquer: merging n^(1/2) upper convex hulls
Assign n^(1/2) processors to each upper convex hull to find its upper common tangents with the other hulls in O(log n) time, and then determine the edges which belong to the solution.
[Slide figure: the two cases (Case 1 and Case 2) of how a sub-hull contributes to the merged upper hull.]

Recursive tree of n^(1/2) divide and conquer
When the problem is divided once, the size of the subproblems becomes n^(1/2). Suppose that the size of the subproblems becomes 2 when the problem has been divided k times: n^(1/2^k) = 2 ⇒ k = log log n.
[Slide figure: the recursion tree with subproblem sizes n, n^(1/2), n^(1/4), ..., 2; its height is log log n.]

Divide and Conquer: analysis of the n^(1/2) divide and conquer algorithm
- Preprocessing: O(log n) time, n processors.
- Steps (1)-(3): each step takes O(log n) time with n processors.
  T(n) = T(n^(1/2)) + O(log n), therefore T(n) = O(log n).
∴ In total, the algorithm runs in O(log n) time using O(n) processors.
Computational model: concurrent reading happens in the procedure of finding the upper common tangents ⇒ CREW PRAM.
T(n) × P(n) = O(n log n) ⇒ optimal!
[The slide repeats the algorithm outline: preprocessing sorts the points by x coordinate; (1) if the size of the sequence is 2, return it; (2) divide the sequence into n^(1/2) equally-sized subsequences and find the upper convex hull of each recursively; (3) find the upper common tangents of the upper convex hulls obtained in (2) and determine the solution.]

Exercise
1. Suppose n×n matrices A and B are stored in two-dimensional arrays. Design PRAM algorithms for computing A×B using n and n×n processors, respectively. Answer the following questions:
   - Which PRAM model do you use in each algorithm?
   - What are the running times?
   - Are your algorithms cost optimal?
   - Are your algorithms time optimal?
2. Design a PRAM algorithm for A×B using k processors (k ≤ n×n). Answer the same questions.