CS 140 : Non-numerical Examples with Cilk++


1 CS 140 : Non-numerical Examples with Cilk++ — Divide-and-conquer paradigm for Cilk++: Quicksort, Mergesort. Thanks to Charles E. Leiserson for some of these slides.

2 Work and Span (Recap)
∙ T_P = execution time on P processors
∙ T_1 = work
∙ T_∞ = span* (*also called critical-path length or computational depth)
∙ Speedup on P processors: T_1/T_P
∙ Parallelism: T_1/T_∞

3 Scheduling
∙ Cilk++ allows the programmer to express potential parallelism in an application.
∙ The Cilk++ scheduler maps strands onto processors dynamically at runtime.
∙ Since on-line schedulers are complicated, we'll explore the ideas with an off-line scheduler.
[Figure: a shared-memory multiprocessor — processors with caches ($) connected to memory and I/O over a network.]

4 Greedy Scheduling
IDEA: Do as much as possible on every step.
Definition: A strand is ready if all its predecessors have executed.

5 Greedy Scheduling
IDEA: Do as much as possible on every step.
Definition: A strand is ready if all its predecessors have executed.
Complete step:
∙ ≥ P strands ready.
∙ Run any P.
(Example with P = 3.)

6 Greedy Scheduling
IDEA: Do as much as possible on every step.
Definition: A strand is ready if all its predecessors have executed.
Complete step:
∙ ≥ P strands ready.
∙ Run any P.
Incomplete step:
∙ < P strands ready.
∙ Run all of them.
(Example with P = 3.)

7 Analysis of Greedy
Theorem [G68, B75, BL93]. Any greedy scheduler achieves T_P ≤ T_1/P + T_∞.
Proof.
∙ # complete steps ≤ T_1/P, since each complete step performs P work.
∙ # incomplete steps ≤ T_∞, since each incomplete step reduces the span of the unexecuted dag by 1. ■

8 Optimality of Greedy
Corollary. Any greedy scheduler achieves within a factor of 2 of optimal.
Proof. Let T_P* be the execution time produced by the optimal scheduler. Since T_P* ≥ max{T_1/P, T_∞} by the Work and Span Laws, we have
T_P ≤ T_1/P + T_∞ ≤ 2⋅max{T_1/P, T_∞} ≤ 2T_P*. ■

9 Linear Speedup
Corollary. Any greedy scheduler achieves near-perfect linear speedup whenever P ≪ T_1/T_∞.
Proof. Since P ≪ T_1/T_∞ is equivalent to T_∞ ≪ T_1/P, the Greedy Scheduling Theorem gives us T_P ≤ T_1/P + T_∞ ≈ T_1/P. Thus, the speedup is T_1/T_P ≈ P. ■
Definition. The quantity T_1/(P⋅T_∞) is called the parallel slackness.

10 Sorting
∙ Sorting is possibly the most frequently executed operation in computing!
∙ Quicksort is the fastest sorting algorithm in practice, with an average running time of O(N log N) (but O(N^2) worst-case performance).
∙ Mergesort has worst-case performance of O(N log N) for sorting N elements.
∙ Both are based on the recursive divide-and-conquer paradigm.

11 Parallelizing Quicksort
∙ Serial Quicksort sorts an array S as follows:
  ▸ If the number of elements in S is 0 or 1, then return.
  ▸ Pick any element v in S. Call this the pivot.
  ▸ Partition the set S−{v} into two disjoint groups:
    ♦ S_1 = {x ∈ S−{v} | x ≤ v}
    ♦ S_2 = {x ∈ S−{v} | x ≥ v}
  ▸ Return quicksort(S_1), followed by v, followed by quicksort(S_2).
Not necessarily so!

12 Parallel Quicksort (Basic)
The second recursive call to qsort does not depend on the results of the first recursive call, so we have an opportunity to speed up the code by making both calls in parallel.

13 Performance
∙ ./qsort cilk_set_worker_count 1 >> 0.122 seconds
∙ ./qsort cilk_set_worker_count 4 >> 0.034 seconds
  Speedup = T_1/T_4 = 0.122/0.034 = 3.58
∙ ./qsort cilk_set_worker_count 1 >> 14.54 seconds
∙ ./qsort cilk_set_worker_count 4 >> 3.84 seconds
  Speedup = T_1/T_4 = 14.54/3.84 = 3.78

14 Measure Work/Span Empirically
∙ cilkview ./qsort
  Work: 6,008,068,218 instructions
  Span: 1,635,913,102 instructions
  Burdened span: 1,636,331,960 instructions
  Parallelism: 3.67
  …
∙ Only the qsort function is measured, excluding data generation (page 112 of the Cilk++ programmer's guide).

15 Analyzing Quicksort
Quicksort recursively sorts the two partitions. Assume we have a "great" partitioner that always generates two balanced sets.

16 Analyzing Quicksort
∙ Work recurrence: T_1(n) = 2T_1(n/2) + Θ(n)
    2T_1(n/2) = 4T_1(n/4) + 2Θ(n/2)
    …
    (n/2)T_1(2) = n⋅T_1(1) + (n/2)Θ(2)
  Summing the levels gives T_1(n) = Θ(n lg n).
∙ Span recurrence: T_∞(n) = T_∞(n/2) + Θ(n), which solves to T_∞(n) = Θ(n).

17 Analyzing Quicksort
∙ Parallelism: T_1(n)/T_∞(n) = Θ(lg n). Not much!
∙ Indeed, partitioning (i.e., constructing the array S_1 = {x ∈ S−{v} | x ≤ v}) can be accomplished in parallel in time Θ(lg n),
∙ which gives a span T_∞(n) = Θ(lg² n) and parallelism Θ(n/lg n). Way better!
∙ Basic parallel qsort can be found under $cilkpath/examples/qsort.
∙ Parallel partitioning might be a final project.

18 The Master Method (Optional)
The Master Method for solving recurrences applies to recurrences of the form T(n) = a T(n/b) + f(n), where a ≥ 1, b > 1, and f is asymptotically positive.*
IDEA: Compare n^(log_b a) with f(n).
(*The unstated base case is T(n) = Θ(1) for sufficiently small n.)

19 Master Method — CASE 1: n^(log_b a) ≫ f(n)
T(n) = a T(n/b) + f(n)
Specifically, f(n) = O(n^(log_b a − ε)) for some constant ε > 0.
Solution: T(n) = Θ(n^(log_b a)).

20 Master Method — CASE 2: n^(log_b a) ≈ f(n)
T(n) = a T(n/b) + f(n)
Specifically, f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0.
Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n).
Example (qsort work): a = 2, b = 2, k = 0 ⟹ T_1(n) = Θ(n lg n).

21 Master Method — CASE 3: n^(log_b a) ≪ f(n)
T(n) = a T(n/b) + f(n)
Specifically, f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and f(n) satisfies the regularity condition that a f(n/b) ≤ c f(n) for some constant c < 1.
Solution: T(n) = Θ(f(n)).
Example: the span of (basic) qsort, T_∞(n) = T_∞(n/2) + Θ(n), where n^(log_2 1) = 1 ≪ Θ(n) ⟹ T_∞(n) = Θ(n).

22 Master Method Summary — T(n) = a T(n/b) + f(n)
CASE 1: f(n) = O(n^(log_b a − ε)), constant ε > 0 ⟹ T(n) = Θ(n^(log_b a)).
CASE 2: f(n) = Θ(n^(log_b a) lg^k n), constant k ≥ 0 ⟹ T(n) = Θ(n^(log_b a) lg^(k+1) n).
CASE 3: f(n) = Ω(n^(log_b a + ε)), constant ε > 0, and regularity condition ⟹ T(n) = Θ(f(n)).

23 MERGESORT
∙ Mergesort is an example of a recursive sorting algorithm.
∙ It is based on the divide-and-conquer paradigm.
∙ It uses the merge operation as its fundamental component, which takes in two sorted sequences and produces a single sorted sequence.
∙ Simulation of Mergesort
∙ Drawback of mergesort: it is not in-place (it uses an extra temporary array).

24 Merging Two Sorted Arrays

template <typename T>
void Merge(T *C, T *A, T *B, int na, int nb) {
  while (na > 0 && nb > 0) {
    if (*A <= *B) {
      *C++ = *A++; na--;
    } else {
      *C++ = *B++; nb--;
    }
  }
  while (na > 0) { *C++ = *A++; na--; }
  while (nb > 0) { *C++ = *B++; nb--; }
}

Time to merge n elements = Θ(n).

25 Parallel Merge Sort

template <typename T>
void MergeSort(T *B, T *A, int n) {
  if (n == 1) {
    B[0] = A[0];
  } else {
    T* C = new T[n];
    cilk_spawn MergeSort(C, A, n/2);
    MergeSort(C+n/2, A+n/2, n-n/2);
    cilk_sync;
    Merge(B, C, C+n/2, n/2, n-n/2);
    delete[] C;
  }
}

A: input (unsorted); B: output (sorted); C: temporary.

26 Work of Merge Sort
Work: T_1(n) = 2T_1(n/2) + Θ(n) = Θ(n lg n).
CASE 2: n^(log_b a) = n^(log_2 2) = n; f(n) = Θ(n^(log_b a) lg^0 n).

27 Span of Merge Sort
Span: T_∞(n) = T_∞(n/2) + Θ(n) = Θ(n).
CASE 3: n^(log_b a) = n^(log_2 1) = 1; f(n) = Θ(n).

28 Parallelism of Merge Sort
Work: T_1(n) = Θ(n lg n)
Span: T_∞(n) = Θ(n)
Parallelism: T_1(n)/T_∞(n) = Θ(lg n)
We need to parallelize the merge!

29 Parallel Merge
[Figure: arrays A (length na) and B (length nb) with na ≥ nb. A is split at its midpoint ma = na/2; a binary search in B finds mb such that B[0..mb−1] ≤ A[ma] ≤ B[mb..nb−1]. The elements ≤ A[ma] and the elements ≥ A[ma] are then merged by two recursive calls to P_Merge.]
KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most (3/4)n, since each recursive call throws away at least na/2 ≥ n/4 elements.

30 Parallel Merge

template <typename T>
void P_Merge(T *C, T *A, T *B, int na, int nb) {
  if (na < nb) {
    P_Merge(C, B, A, nb, na);
  } else if (na == 0) {
    return;
  } else {
    int ma = na/2;
    int mb = BinarySearch(A[ma], B, nb);
    C[ma+mb] = A[ma];
    cilk_spawn P_Merge(C, A, B, ma, mb);
    P_Merge(C+ma+mb+1, A+ma+1, B+mb, na-ma-1, nb-mb);
    cilk_sync;
  }
}

Coarsen base cases for efficiency.

31 Span of Parallel Merge
Span: T_∞(n) = T_∞(3n/4) + Θ(lg n) = Θ(lg² n).
CASE 2: n^(log_b a) = n^(log_{4/3} 1) = 1; f(n) = Θ(n^(log_b a) lg^1 n).

32 Work of Parallel Merge
Work: T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
Claim: T_1(n) = Θ(n).

33 Analysis of Work Recurrence
Work: T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
Substitution method: the inductive hypothesis is T_1(k) ≤ c_1 k − c_2 lg k, where c_1, c_2 > 0. Prove that the relation holds, and solve for c_1 and c_2:
T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n)
       ≤ c_1(αn) − c_2 lg(αn) + c_1(1−α)n − c_2 lg((1−α)n) + Θ(lg n)


35 Analysis of Work Recurrence
Work: T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n)
       ≤ c_1(αn) − c_2 lg(αn) + c_1(1−α)n − c_2 lg((1−α)n) + Θ(lg n)
       ≤ c_1 n − c_2 lg(αn) − c_2 lg((1−α)n) + Θ(lg n)
       ≤ c_1 n − c_2 (lg(α(1−α)) + 2 lg n) + Θ(lg n)
       = c_1 n − c_2 lg n − (c_2 (lg n + lg(α(1−α))) − Θ(lg n))
       ≤ c_1 n − c_2 lg n, by choosing c_2 large enough.
Choose c_1 large enough to handle the base case.

36 Parallelism of P_Merge
Work: T_1(n) = Θ(n)
Span: T_∞(n) = Θ(lg² n)
Parallelism: T_1(n)/T_∞(n) = Θ(n/lg² n)

37 Parallel Merge Sort

template <typename T>
void P_MergeSort(T *B, T *A, int n) {
  if (n == 1) {
    B[0] = A[0];
  } else {
    T* C = new T[n];
    cilk_spawn P_MergeSort(C, A, n/2);
    P_MergeSort(C+n/2, A+n/2, n-n/2);
    cilk_sync;
    P_Merge(B, C, C+n/2, n/2, n-n/2);
    delete[] C;
  }
}

Work: T_1(n) = 2T_1(n/2) + Θ(n) = Θ(n lg n).
CASE 2: n^(log_b a) = n^(log_2 2) = n; f(n) = Θ(n^(log_b a) lg^0 n).

38 Parallel Merge Sort
Span: T_∞(n) = T_∞(n/2) + Θ(lg² n) = Θ(lg³ n).
CASE 2: n^(log_b a) = n^(log_2 1) = 1; f(n) = Θ(n^(log_b a) lg² n).

39 Parallelism of P_MergeSort
Work: T_1(n) = Θ(n lg n)
Span: T_∞(n) = Θ(lg³ n)
Parallelism: T_1(n)/T_∞(n) = Θ(n/lg² n)