1 CS 240A: Divide-and-Conquer with Cilk++
Thanks to Charles E. Leiserson for some of these slides.
∙Divide & conquer paradigm
∙Solving recurrences
∙Sorting: Quicksort and Mergesort

2 Work and Span (Recap)
∙TP = execution time on P processors
∙T1 = work
∙T∞ = span (also called critical-path length)
∙Speedup on P processors: T1/TP
∙Potential parallelism: T1/T∞
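For concreteness, a small worked example (the numbers are made up, not from the slides): if a program has work T1 = 1000 ms and span T∞ = 10 ms, its potential parallelism is T1/T∞ = 100. On P = 16 processors the best we can hope for is TP ≥ max(T1/P, T∞) = max(62.5 ms, 10 ms) = 62.5 ms, i.e. a speedup of at most 16.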

3 Sorting
∙Sorting is possibly the most frequently executed operation in computing!
∙Quicksort is the fastest sorting algorithm in practice, with an average running time of O(N log N) but O(N²) worst-case performance.
∙Mergesort has worst-case performance of O(N log N) for sorting N elements.
∙Both are based on the recursive divide-and-conquer paradigm.

4 QUICKSORT
∙Basic Quicksort sorting an array S works as follows:
  - If the number of elements in S is 0 or 1, then return.
  - Pick any element v in S. Call this the pivot.
  - Partition the set S−{v} into two disjoint groups:
      ♦ S1 = {x ∈ S−{v} | x ≤ v}
      ♦ S2 = {x ∈ S−{v} | x ≥ v}
  - Return quicksort(S1), followed by v, followed by quicksort(S2).
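Not from the slides: a literal serial sketch of the algorithm just described, written against std::vector for clarity. The pivot choice and the handling of duplicates are assumptions.

#include <vector>

// Serial quicksort, following the slide's description: pick a pivot v,
// split the remaining elements into S1 (<= v) and S2 (>= v), sort each,
// and concatenate. Duplicates of the pivot go into a middle bucket so
// that neither S1 nor S2 contains the pivot itself.
std::vector<int> quicksort(const std::vector<int>& S) {
    if (S.size() <= 1) return S;                  // 0 or 1 elements: done
    int v = S[S.size() / 2];                      // pick any element as pivot
    std::vector<int> S1, Sv, S2;
    for (int x : S) {
        if (x < v)      S1.push_back(x);
        else if (x > v) S2.push_back(x);
        else            Sv.push_back(x);          // pivot value(s)
    }
    std::vector<int> result = quicksort(S1);      // sort the left part
    result.insert(result.end(), Sv.begin(), Sv.end());
    std::vector<int> right = quicksort(S2);       // sort the right part
    result.insert(result.end(), right.begin(), right.end());
    return result;
}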

5 QUICKSORT: Select pivot (figure)

6 QUICKSORT: Partition around pivot (figure)

7 QUICKSORT: Quicksort recursively (figure)

8 Parallelizing Quicksort
∙Serial Quicksort sorts an array S as follows:
  - If the number of elements in S is 0 or 1, then return.
  - Pick any element v in S. Call this the pivot.
  - Partition the set S−{v} into two disjoint groups:
      ♦ S1 = {x ∈ S−{v} | x ≤ v}
      ♦ S2 = {x ∈ S−{v} | x ≥ v}
  - Return quicksort(S1), followed by v, followed by quicksort(S2).

9 Parallel Quicksort (Basic)
∙The second recursive call to qsort does not depend on the results of the first recursive call.
∙We therefore have an opportunity to speed things up by making both recursive calls in parallel, as sketched below.
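The slide does not reproduce the code, so here is a minimal Cilk++ sketch in the spirit of the lecture. The name sample_qsort matches the routine measured on the next slide, but the body (last-element pivot, std::partition) is an assumption, not the original source; it needs a Cilk-enabled compiler.

#include <algorithm>
#include <utility>
#include <cilk/cilk.h>

void sample_qsort(int* begin, int* end) {
    if (end - begin < 2) return;              // 0 or 1 elements: already sorted
    int* pivot = end - 1;                     // pick a pivot (here: last element)
    int* mid = std::partition(begin, pivot,
                              [=](int x) { return x < *pivot; });
    std::swap(*mid, *pivot);                  // pivot lands in its final slot
    cilk_spawn sample_qsort(begin, mid);      // first recursive call, spawned
    sample_qsort(mid + 1, end);               // second call runs concurrently
    cilk_sync;                                // wait for the spawned call
}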

10 Performance
∙./qsort cilk_set_worker_count 1  >>  0.083 seconds
∙./qsort cilk_set_worker_count 16 >>  0.014 seconds
  Speedup = T1/T16 = 0.083/0.014 = 5.93
∙./qsort cilk_set_worker_count 1  >>  10.57 seconds
∙./qsort cilk_set_worker_count 16 >>  1.58 seconds
  Speedup = T1/T16 = 10.57/1.58 = 6.67

11 Measure Work/Span Empirically
∙cilkscreen -w ./qsort
  Work =
  Span =
  Burdened span =
  Parallelism =
  Burdened parallelism =
  #Spawn =
  #Atomic instructions = 14
∙cilkscreen -w ./qsort
  Work =
  Span =
  Burdened span =
  Parallelism =
  Burdened parallelism =
  #Spawn =
  #Atomic instructions = 8

Measurement harness used around the call:

  workspan ws;
  ws.start();
  sample_qsort(a, a + n);
  ws.stop();
  ws.report(std::cout);

12 Analyzing Quicksort
Quicksort recursively (figure).
∙Assume we have a "great" partitioner that always generates two balanced sets.

13 Analyzing Quicksort
∙Work recurrence:
  T1(n) = 2 T1(n/2) + Θ(n)
  2 T1(n/2) = 4 T1(n/4) + 2 Θ(n/2)
  ....
  (n/2) T1(2) = n T1(1) + (n/2) Θ(2)
  Adding these up (each level contributes Θ(n), over lg n levels):
  T1(n) = Θ(n lg n)
∙Span recurrence:
  T∞(n) = T∞(n/2) + Θ(n)
  Solves to T∞(n) = Θ(n)

14 Analyzing Quicksort
∙Parallelism of this version: T1(n)/T∞(n) = Θ(lg n). Not much!
∙Indeed, partitioning (i.e., constructing the array S1 = {x ∈ S−{v} | x ≤ v}) can be accomplished in parallel in time Θ(lg n),
∙which gives a span T∞(n) = Θ(lg² n)
∙and parallelism Θ(n/lg n). Way better!
∙A basic parallel qsort can be found in CLRS.

15 The Master Method
The Master Method for solving recurrences applies to recurrences of the form*
  T(n) = a T(n/b) + f(n),
where a ≥ 1, b > 1, and f is asymptotically positive.
IDEA: Compare n^(log_b a) with f(n).
*The unstated base case is T(n) = Θ(1) for sufficiently small n.

16 Master Method: CASE 1
  T(n) = a T(n/b) + f(n)
n^(log_b a) ≫ f(n): specifically, f(n) = O(n^(log_b a − ε)) for some constant ε > 0.
Solution: T(n) = Θ(n^(log_b a)).
E.g. matrix multiplication: a = 8, b = 2, f(n) = n² ⇒ T1(n) = Θ(n³).

17 Master Method: CASE 2
  T(n) = a T(n/b) + f(n)
n^(log_b a) ≈ f(n): specifically, f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0.
Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n).
E.g. qsort (work): a = 2, b = 2, k = 0 ⇒ T1(n) = Θ(n lg n).

18 Master Method: CASE 3
  T(n) = a T(n/b) + f(n)
n^(log_b a) ≪ f(n): specifically, f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and f(n) satisfies the regularity condition a f(n/b) ≤ c f(n) for some constant c < 1.
Solution: T(n) = Θ(f(n)).
E.g.: the span of qsort.

19 Master Method Summary
  T(n) = a T(n/b) + f(n)
CASE 1: f(n) = O(n^(log_b a − ε)), constant ε > 0 ⇒ T(n) = Θ(n^(log_b a)).
CASE 2: f(n) = Θ(n^(log_b a) lg^k n), constant k ≥ 0 ⇒ T(n) = Θ(n^(log_b a) lg^(k+1) n).
CASE 3: f(n) = Ω(n^(log_b a + ε)), constant ε > 0, and regularity condition ⇒ T(n) = Θ(f(n)).
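Not from the slides: a quick numerical sanity check of CASE 2. The sketch below evaluates T(n) = 2 T(n/2) + n directly from the recurrence (base case T(n) = 1 for n < 2) and compares it against the predicted Θ(n lg n); all names here are made up for illustration.

#include <cmath>
#include <cstdio>
#include <functional>

// Evaluate the recurrence T(n) = a*T(n/b) + f(n) directly,
// with base case T(n) = 1 for n < 2 (i.e., Theta(1)).
double T(double n, double a, double b,
         const std::function<double(double)>& f) {
    if (n < 2) return 1.0;
    return a * T(n / b, a, b, f) + f(n);
}

int main() {
    // CASE 2 example from the slides: a = 2, b = 2, f(n) = n  =>  Theta(n lg n).
    for (double n = 1024; n <= (1 << 20); n *= 4) {
        double t = T(n, 2, 2, [](double m) { return m; });
        // The ratio should settle near a constant if T(n) = Theta(n lg n).
        std::printf("n = %8.0f   T(n) / (n lg n) = %.3f\n",
                    n, t / (n * std::log2(n)));
    }
    return 0;
}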

20 MERGESORT
∙Mergesort is an example of a recursive sorting algorithm.
∙It is based on the divide-and-conquer paradigm.
∙It uses the merge operation as its fundamental component (which takes in two sorted sequences and produces a single sorted sequence).
∙Simulation of Mergesort
∙Drawback of mergesort: not in-place (uses an extra temporary array).

21 Merging Two Sorted Arrays

template <typename T>
void Merge(T *C, T *A, T *B, int na, int nb) {
  while (na > 0 && nb > 0) {
    if (*A <= *B) {
      *C++ = *A++; na--;
    } else {
      *C++ = *B++; nb--;
    }
  }
  while (na > 0) { *C++ = *A++; na--; }
  while (nb > 0) { *C++ = *B++; nb--; }
}

Time to merge n elements = Θ(n).
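A tiny usage sketch for the Merge routine above; the test values are made up, and the code assumes the template above is in scope.

#include <iostream>

int main() {
    int A[] = {1, 4, 7};                    // sorted input
    int B[] = {2, 3, 9, 10};                // sorted input
    int C[7];                               // destination: na + nb elements
    Merge(C, A, B, 3, 4);                   // merge both inputs into C
    for (int x : C) std::cout << x << ' ';  // prints: 1 2 3 4 7 9 10
    std::cout << '\n';
    return 0;
}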

22 Parallel Merge Sort

template <typename T>
void MergeSort(T *B, T *A, int n) {
  if (n == 1) {
    B[0] = A[0];
  } else {
    T* C = new T[n];
    cilk_spawn MergeSort(C, A, n/2);
    MergeSort(C + n/2, A + n/2, n - n/2);
    cilk_sync;
    Merge(B, C, C + n/2, n/2, n - n/2);
    delete[] C;
  }
}

A: input (unsorted); B: output (sorted); C: temporary.

23 Work of Merge Sort
(Same MergeSort code as on the previous slide.)
Work: T1(n) = 2 T1(n/2) + Θ(n)
            = Θ(n lg n)
CASE 2: n^(log_b a) = n^(log_2 2) = n, and f(n) = Θ(n^(log_b a) lg^0 n).

24 Span of Merge Sort
(Same MergeSort code; the two spawned recursive calls run in parallel, but the Merge itself is serial.)
Span: T∞(n) = T∞(n/2) + Θ(n)
            = Θ(n)
CASE 3: n^(log_b a) = n^(log_2 1) = 1, and f(n) = Θ(n).

25 Parallelism of Merge Sort
Work: T1(n) = Θ(n lg n)
Span: T∞(n) = Θ(n)
Parallelism: T1(n)/T∞(n) = Θ(lg n)
We need to parallelize the merge!

26 Parallel Merge
(Figure: arrays A of length na and B of length nb, with na ≥ nb. Split A at its midpoint ma = na/2; binary-search B for the index mb so that B[0..mb−1] ≤ A[ma] ≤ B[mb..nb−1]; then recursively P_Merge the two "≤ A[ma]" pieces and the two "≥ A[ma]" pieces.)
KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most (3/4)n, because each recursive merge throws away at least na/2 ≥ n/4 elements.
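Spelling out the bound (a step that is only implicit on the slide): since na ≥ nb and na + nb = n, we have na ≥ n/2. Each of the two recursive merges excludes the pivot A[ma] together with at least na/2 ≥ n/4 of the remaining elements (the half of A that went to the other recursive merge), so each recursive merge handles at most n − n/4 = (3/4)n elements.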

27 Parallel Merge

template <typename T>
void P_Merge(T *C, T *A, T *B, int na, int nb) {
  if (na < nb) {
    P_Merge(C, B, A, nb, na);
  } else if (na == 0) {
    return;
  } else {
    int ma = na/2;
    int mb = BinarySearch(A[ma], B, nb);
    C[ma + mb] = A[ma];
    cilk_spawn P_Merge(C, A, B, ma, mb);
    P_Merge(C + ma + mb + 1, A + ma + 1, B + mb, na - ma - 1, nb - mb);
    cilk_sync;
  }
}

Coarsen base cases for efficiency.
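The slides never show BinarySearch. One plausible sketch (an assumption, not the lecture's code): return the number of elements of B that are less than the key, so that C[ma+mb] is exactly where A[ma] belongs in the merged output.

#include <algorithm>

// Hypothetical helper: how many of B[0..nb-1] are < key, i.e. the index at
// which key would be inserted to keep B sorted. Runs in Θ(lg nb).
template <typename T>
int BinarySearch(const T& key, const T* B, int nb) {
    return static_cast<int>(std::lower_bound(B, B + nb, key) - B);
}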

28 Span of Parallel Merge
(P_Merge code as above; the two recursive merges are spawned in parallel, and the binary search costs Θ(lg n).)
Span: T∞(n) = T∞(3n/4) + Θ(lg n)
            = Θ(lg² n)
CASE 2: n^(log_b a) = n^(log_{4/3} 1) = 1, and f(n) = Θ(n^(log_b a) lg^1 n).

29 Work of Parallel Merge
(P_Merge code as above.)
Work: T1(n) = T1(αn) + T1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
Claim: T1(n) = Θ(n).

30 Analysis of Work Recurrence
Work: T1(n) = T1(αn) + T1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
Substitution method: the inductive hypothesis is T1(k) ≤ c1 k − c2 lg k, where c1, c2 > 0. Prove that the relation holds, and solve for c1 and c2.
T1(n) = T1(αn) + T1((1−α)n) + Θ(lg n)
      ≤ c1(αn) − c2 lg(αn) + c1(1−α)n − c2 lg((1−α)n) + Θ(lg n)


32 Analysis of Work Recurrence
Work: T1(n) = T1(αn) + T1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
T1(n) = T1(αn) + T1((1−α)n) + Θ(lg n)
      ≤ c1(αn) − c2 lg(αn) + c1(1−α)n − c2 lg((1−α)n) + Θ(lg n)
      ≤ c1 n − c2 lg(αn) − c2 lg((1−α)n) + Θ(lg n)
      ≤ c1 n − c2 (lg(α(1−α)) + 2 lg n) + Θ(lg n)
      ≤ c1 n − c2 lg n − (c2 (lg n + lg(α(1−α))) − Θ(lg n))
      ≤ c1 n − c2 lg n
by choosing c2 large enough. Choose c1 large enough to handle the base case.

33 Parallelism of P_Merge
Work: T1(n) = Θ(n)
Span: T∞(n) = Θ(lg² n)
Parallelism: T1(n)/T∞(n) = Θ(n/lg² n)

34 Parallel Merge Sort

template <typename T>
void P_MergeSort(T *B, T *A, int n) {
  if (n == 1) {
    B[0] = A[0];
  } else {
    T C[n];   // temporary buffer as on the slide; a variable-length array needs a
              // compiler extension (new T[n]/delete[] as in MergeSort also works)
    cilk_spawn P_MergeSort(C, A, n/2);
    P_MergeSort(C + n/2, A + n/2, n - n/2);
    cilk_sync;
    P_Merge(B, C, C + n/2, n/2, n - n/2);
  }
}

Work: T1(n) = 2 T1(n/2) + Θ(n)
            = Θ(n lg n)
CASE 2: n^(log_b a) = n^(log_2 2) = n, and f(n) = Θ(n^(log_b a) lg^0 n).

35 Parallel Merge Sort: Span
(Same P_MergeSort code as on the previous slide.)
Span: T∞(n) = T∞(n/2) + Θ(lg² n)
            = Θ(lg³ n)
CASE 2: n^(log_b a) = n^(log_2 1) = 1, and f(n) = Θ(n^(log_b a) lg^2 n).

36 Parallelism of P_MergeSort
Work: T1(n) = Θ(n lg n)
Span: T∞(n) = Θ(lg³ n)
Parallelism: T1(n)/T∞(n) = Θ(n/lg² n)