Multithreaded Programming in Cilk
LECTURE 2
Charles E. Leiserson
Supercomputing Technologies Research Group
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
© 2006 by Charles E. Leiserson

Minicourse Outline
● LECTURE 1 Basic Cilk programming: Cilk keywords, performance measures, scheduling.
● LECTURE 2 Analysis of Cilk algorithms: matrix multiplication, sorting, tableau construction.
● LABORATORY Programming matrix multiplication in Cilk (Dr. Bradley C. Kuszmaul).
● LECTURE 3 Advanced Cilk programming: inlets, abort, speculation, data synchronization, & more.

LECTURE 2: Recurrences (Review), Matrix Multiplication, Merge Sort, Tableau Construction, Conclusion

The Master Method
The Master Method for solving recurrences applies to recurrences of the form T(n) = a T(n/b) + f(n), where a ≥ 1, b > 1, and f is asymptotically positive.*
IDEA: Compare n^(log_b a) with f(n).
*The unstated base case is T(n) = Θ(1) for sufficiently small n.

Master Method: CASE 1
T(n) = a T(n/b) + f(n)
n^(log_b a) ≫ f(n): specifically, f(n) = O(n^(log_b a − ε)) for some constant ε > 0.
Solution: T(n) = Θ(n^(log_b a)).

Master Method: CASE 2
T(n) = a T(n/b) + f(n)
n^(log_b a) ≈ f(n): specifically, f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0.
Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n).

Master Method: CASE 3
T(n) = a T(n/b) + f(n)
n^(log_b a) ≪ f(n): specifically, f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and f(n) satisfies the regularity condition a f(n/b) ≤ c f(n) for some constant c < 1.
Solution: T(n) = Θ(f(n)).

Master Method Summary
T(n) = a T(n/b) + f(n)
CASE 1: f(n) = O(n^(log_b a − ε)), constant ε > 0 ⇒ T(n) = Θ(n^(log_b a)).
CASE 2: f(n) = Θ(n^(log_b a) lg^k n), constant k ≥ 0 ⇒ T(n) = Θ(n^(log_b a) lg^(k+1) n).
CASE 3: f(n) = Ω(n^(log_b a + ε)), constant ε > 0, and regularity condition ⇒ T(n) = Θ(f(n)).

Master Method Quiz
T(n) = 4 T(n/2) + n:       n^(log_b a) = n² ≫ n ⇒ CASE 1: T(n) = Θ(n²).
T(n) = 4 T(n/2) + n²:      n^(log_b a) = n² = n² lg⁰ n ⇒ CASE 2: T(n) = Θ(n² lg n).
T(n) = 4 T(n/2) + n³:      n^(log_b a) = n² ≪ n³ ⇒ CASE 3: T(n) = Θ(n³).
T(n) = 4 T(n/2) + n²/lg n: Master method does not apply!
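As a quick numerical sanity check (not from the slides), the C sketch below iterates the first quiz recurrence T(n) = 4 T(n/2) + n over powers of two and prints T(n)/n²; the ratio settles toward a constant, as CASE 1 predicts.

  #include <stdio.h>

  int main(void) {
    /* T(1) = 1; T(n) = 4*T(n/2) + n for n a power of two. */
    double T = 1.0;
    for (int k = 1; k <= 24; k++) {
      double n = (double)(1L << k);
      T = 4.0 * T + n;              /* advance one level of the recurrence */
      if (k % 4 == 0)
        printf("n = 2^%d   T(n)/n^2 = %.4f\n", k, T / (n * n));
    }
    return 0;
  }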

LECTURE 2: Recurrences (Review), Matrix Multiplication, Merge Sort, Tableau Construction, Conclusion

Square-Matrix Multiplication
C = A × B, where C = [c_ij], A = [a_ij], and B = [b_ij] are n × n matrices, and
c_ij = Σ_{k=1}^{n} a_ik · b_kj.
Assume for simplicity that n = 2^k.
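For reference, a straightforward serial C sketch of this definition (row-major storage with leading dimension n; the function name is illustrative). It is not the divide-and-conquer Cilk code analyzed next, just the textbook triple loop.

  void mat_mul_serial(float *C, const float *A, const float *B, int n) {
    for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
          sum += A[i*n + k] * B[k*n + j];   /* c_ij = sum_k a_ik * b_kj */
        C[i*n + j] = sum;
      }
  }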

Recursive Matrix Multiplication
Divide and conquer:

  [ C11 C12 ]   [ A11 A12 ]   [ B11 B12 ]
  [ C21 C22 ] = [ A21 A22 ] × [ B21 B22 ]

                [ A11·B11 A11·B12 ]   [ A12·B21 A12·B22 ]
              = [ A21·B11 A21·B12 ] + [ A22·B21 A22·B22 ]

8 multiplications of (n/2) × (n/2) matrices.
1 addition of n × n matrices.

Matrix Multiply in Pseudo-Cilk
C = A · B

  cilk void Mult(*C, *A, *B, n) {
    float *T = Cilk_alloca(n*n*sizeof(float));
    ⟨base case & partition matrices⟩
    spawn Mult(C11,A11,B11,n/2);
    spawn Mult(C12,A11,B12,n/2);
    spawn Mult(C22,A21,B12,n/2);
    spawn Mult(C21,A21,B11,n/2);
    spawn Mult(T11,A12,B21,n/2);
    spawn Mult(T12,A12,B22,n/2);
    spawn Mult(T22,A22,B22,n/2);
    spawn Mult(T21,A22,B21,n/2);
    sync;
    spawn Add(C,T,n);
    sync;
    return;
  }

Note the absence of type declarations.

Matrix Multiply in Pseudo-Cilk (same code as above)
Coarsen base cases for efficiency.

Matrix Multiply in Pseudo-Cilk (same code as above)
Submatrices are produced by pointer calculation, not copying of elements. Also need a row-size argument for array indexing.

Matrix Multiply in Pseudo-Cilk (continued)
The helper Add computes C = C + T:

  cilk void Add(*C, *T, n) {
    ⟨base case & partition matrices⟩
    spawn Add(C11,T11,n/2);
    spawn Add(C12,T12,n/2);
    spawn Add(C21,T21,n/2);
    spawn Add(C22,T22,n/2);
    sync;
    return;
  }
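The slides note that the elided ⟨partition matrices⟩ step produces submatrices by pointer calculation and that a row-size argument is needed for indexing. A hypothetical C sketch of that step (the names Quads and partition and the parameter ld are illustrative, not part of the original code):

  /* The four quadrants of an n-by-n submatrix stored with leading
   * dimension ld are just pointer offsets; no elements are copied. */
  typedef struct { float *p11, *p12, *p21, *p22; } Quads;

  static Quads partition(float *M, int n, int ld) {
    Quads q;
    q.p11 = M;                    /* top-left     */
    q.p12 = M + n/2;              /* top-right    */
    q.p21 = M + (n/2)*ld;         /* bottom-left  */
    q.p22 = M + (n/2)*ld + n/2;   /* bottom-right */
    return q;
  }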

Work of Matrix Addition
Work: A₁(n) = 4 A₁(n/2) + Θ(1)
n^(log_b a) = n^(log₂ 4) = n² ≫ Θ(1), so CASE 1 applies:
A₁(n) = Θ(n²).

Span of Matrix Addition
Span: A∞(n) = A∞(n/2) + Θ(1)   (the four spawned quadrants run in parallel, so the span takes the maximum, not the sum)
n^(log_b a) = n^(log₂ 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg⁰ n), so CASE 2 applies:
A∞(n) = Θ(lg n).

Work of Matrix Multiplication
Work: M₁(n) = 8 M₁(n/2) + A₁(n) + Θ(1)
            = 8 M₁(n/2) + Θ(n²)
n^(log_b a) = n^(log₂ 8) = n³ ≫ Θ(n²), so CASE 1 applies:
M₁(n) = Θ(n³).

Span of Matrix Multiplication
Span: M∞(n) = M∞(n/2) + A∞(n) + Θ(1)
            = M∞(n/2) + Θ(lg n)
n^(log_b a) = n^(log₂ 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg¹ n), so CASE 2 applies:
M∞(n) = Θ(lg² n).

Parallelism of Matrix Multiply
Work: M₁(n) = Θ(n³)
Span: M∞(n) = Θ(lg² n)
Parallelism: M₁(n)/M∞(n) = Θ(n³/lg² n)
For 1000 × 1000 matrices, parallelism ≈ (10³)³/10² = 10⁷.

Stack Temporaries
The line float *T = Cilk_alloca(n*n*sizeof(float)); in Mult allocates a temporary to hold the second set of products. In hierarchical-memory machines (especially chip multiprocessors), memory accesses are so expensive that minimizing storage often yields higher performance.
IDEA: Trade off parallelism for less storage.

No-Temp Matrix Multiplication

  cilk void MultA(*C, *A, *B, n) {  // C = C + A * B
    ⟨base case & partition matrices⟩
    spawn MultA(C11,A11,B11,n/2);
    spawn MultA(C12,A11,B12,n/2);
    spawn MultA(C22,A21,B12,n/2);
    spawn MultA(C21,A21,B11,n/2);
    sync;
    spawn MultA(C21,A22,B21,n/2);
    spawn MultA(C22,A22,B22,n/2);
    spawn MultA(C12,A12,B22,n/2);
    spawn MultA(C11,A12,B21,n/2);
    sync;
    return;
  }

Saves space, but at what expense?

Work of No-Temp Multiply
Work: M₁(n) = 8 M₁(n/2) + Θ(1), so CASE 1 applies:
M₁(n) = Θ(n³).

Span of No-Temp Multiply
Span: M∞(n) = 2 M∞(n/2) + Θ(1)   (two groups of four parallel spawns execute in series; within each group the span takes the maximum)
CASE 1 applies: M∞(n) = Θ(n).

Parallelism of No-Temp Multiply
Work: M₁(n) = Θ(n³)
Span: M∞(n) = Θ(n)
Parallelism: M₁(n)/M∞(n) = Θ(n²)
For 1000 × 1000 matrices, parallelism ≈ (10³)³/10³ = 10⁶. Faster in practice!

Testing Synchronization
Cilk language feature: A programmer can check whether a Cilk procedure is "synched" (without actually performing a sync) by testing the pseudovariable SYNCHED:
SYNCHED = 0 ⇒ some spawned children might not have returned.
SYNCHED = 1 ⇒ all spawned children have definitely returned.

Best of Both Worlds

  cilk void Mult1(*C, *A, *B, n) {    // multiply & store
    ⟨base case & partition matrices⟩
    spawn Mult1(C11,A11,B11,n/2);     // multiply & store
    spawn Mult1(C12,A11,B12,n/2);
    spawn Mult1(C22,A21,B12,n/2);
    spawn Mult1(C21,A21,B11,n/2);
    if (SYNCHED) {
      spawn MultA1(C11,A12,B21,n/2);  // multiply & add
      spawn MultA1(C12,A12,B22,n/2);
      spawn MultA1(C22,A22,B22,n/2);
      spawn MultA1(C21,A22,B21,n/2);
    } else {
      float *T = Cilk_alloca(n*n*sizeof(float));
      spawn Mult1(T11,A12,B21,n/2);   // multiply & store
      spawn Mult1(T12,A12,B22,n/2);
      spawn Mult1(T22,A22,B22,n/2);
      spawn Mult1(T21,A22,B21,n/2);
      sync;
      spawn Add(C,T,n);               // C = C + T
    }
    sync;
    return;
  }

This code is just as parallel as the original, but it only uses more space if runtime parallelism actually exists.

Ordinary Matrix Multiplication
c_ij = Σ_{k=1}^{n} a_ik · b_kj
IDEA: Spawn n² inner products in parallel. Compute each inner product in parallel.
Work: Θ(n³)   Span: Θ(lg n)   Parallelism: Θ(n³/lg n)
BUT, this algorithm exhibits poor locality and does not exploit the cache hierarchy of modern microprocessors, especially CMPs.

LECTURE 2: Recurrences (Review), Matrix Multiplication, Merge Sort, Tableau Construction, Conclusion

Merging Two Sorted Arrays

  void Merge(int *C, int *A, int *B, int na, int nb) {
    while (na>0 && nb>0) {
      if (*A <= *B) {
        *C++ = *A++; na--;
      } else {
        *C++ = *B++; nb--;
      }
    }
    while (na>0) {
      *C++ = *A++; na--;
    }
    while (nb>0) {
      *C++ = *B++; nb--;
    }
  }

Time to merge n elements = Θ(n).
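A minimal usage sketch (not from the slides), assuming it is compiled together with the Merge routine above:

  #include <stdio.h>

  void Merge(int *C, int *A, int *B, int na, int nb);  /* the routine above */

  int main(void) {
    int A[] = {1, 4, 7, 9};
    int B[] = {2, 3, 8};
    int C[7];
    Merge(C, A, B, 4, 3);
    for (int i = 0; i < 7; i++)
      printf("%d ", C[i]);   /* prints: 1 2 3 4 7 8 9 */
    printf("\n");
    return 0;
  }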

Merge Sort
Sort the two halves recursively in parallel, then merge:

  cilk void MergeSort(int *B, int *A, int n) {
    if (n==1) {
      B[0] = A[0];
    } else {
      int *C;
      C = (int*) Cilk_alloca(n*sizeof(int));
      spawn MergeSort(C, A, n/2);
      spawn MergeSort(C+n/2, A+n/2, n-n/2);
      sync;
      Merge(B, C, C+n/2, n/2, n-n/2);
    }
  }

Work of Merge Sort
Work: T₁(n) = 2 T₁(n/2) + Θ(n)
n^(log_b a) = n^(log₂ 2) = n ⇒ f(n) = Θ(n^(log_b a) lg⁰ n), so CASE 2 applies:
T₁(n) = Θ(n lg n).

Span of Merge Sort
Span: T∞(n) = T∞(n/2) + Θ(n)
n^(log_b a) = n^(log₂ 1) = 1 ≪ Θ(n), so CASE 3 applies:
T∞(n) = Θ(n).

Parallelism of Merge Sort
Work: T₁(n) = Θ(n lg n)
Span: T∞(n) = Θ(n)
Parallelism: T₁(n)/T∞(n) = Θ(lg n)
We need to parallelize the merge!

Parallel Merge
(Figure: sorted arrays A[0..na) and B[0..nb) with na ≥ nb. Split A at its middle element A[na/2]; binary-search B for the position j with B[j] ≤ A[na/2] ≤ B[j+1]. Recursively merge the two "≤ A[na/2]" pieces and, in parallel, the two "≥ A[na/2]" pieces.)
KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most (3/4)n.
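The P_Merge code on the next slide calls a BinarySearch helper that the slides leave undefined. A hypothetical C sketch under one reasonable convention (return how many elements of B are less than the key), with the name and signature inferred from the call BinarySearch(A[ma], B, nb):

  /* Number of elements of the sorted array B[0..nb) that are < key,
   * i.e. the split point mb with B[0..mb) < key <= B[mb..nb). */
  int BinarySearch(int key, int *B, int nb) {
    int lo = 0, hi = nb;
    while (lo < hi) {
      int mid = lo + (hi - lo) / 2;
      if (B[mid] < key)
        lo = mid + 1;
      else
        hi = mid;
    }
    return lo;   /* runs in Theta(lg nb) time */
  }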

Parallel Merge

  cilk void P_Merge(int *C, int *A, int *B, int na, int nb) {
    if (na < nb) {
      spawn P_Merge(C, B, A, nb, na);
    } else if (na==1) {
      if (nb == 0) {
        C[0] = A[0];
      } else {
        C[0] = (A[0]<B[0]) ? A[0] : B[0];  /* minimum */
        C[1] = (A[0]<B[0]) ? B[0] : A[0];  /* maximum */
      }
    } else {
      int ma = na/2;
      int mb = BinarySearch(A[ma], B, nb);
      spawn P_Merge(C, A, B, ma, mb);
      spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb);
      sync;
    }
  }

Coarsen base cases for efficiency.

Span of P_Merge
Span: T∞(n) = T∞(3n/4) + Θ(lg n)
n^(log_b a) = n^(log_{4/3} 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg¹ n), so CASE 2 applies:
T∞(n) = Θ(lg² n).

Work of P_Merge
Work: T₁(n) = T₁(αn) + T₁((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
CLAIM: T₁(n) = Θ(n).

Analysis of Work Recurrence
T₁(n) = T₁(αn) + T₁((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4.
Substitution method: the inductive hypothesis is T₁(k) ≤ c₁k − c₂ lg k, where c₁, c₂ > 0. Prove that the relation holds, and solve for c₁ and c₂.

T₁(n) = T₁(αn) + T₁((1−α)n) + Θ(lg n)
      ≤ c₁(αn) − c₂ lg(αn) + c₁((1−α)n) − c₂ lg((1−α)n) + Θ(lg n)
      ≤ c₁n − c₂ lg(αn) − c₂ lg((1−α)n) + Θ(lg n)
      ≤ c₁n − c₂ (lg(α(1−α)) + 2 lg n) + Θ(lg n)
      ≤ c₁n − c₂ lg n − (c₂ (lg n + lg(α(1−α))) − Θ(lg n))
      ≤ c₁n − c₂ lg n,
by choosing c₁ and c₂ large enough.
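As an illustrative numerical check (not from the slides), the C sketch below evaluates the work recurrence with the most unbalanced split α = 3/4 and prints T₁(n)/n; the ratio settles toward a constant, corroborating the Θ(n) claim.

  #include <stdio.h>
  #include <math.h>

  /* T1(n) = T1(0.75 n) + T1(0.25 n) + lg n, with a Theta(1) base case. */
  static double T1(double n) {
    if (n < 2.0) return 1.0;
    return T1(0.75 * n) + T1(0.25 * n) + log2(n);
  }

  int main(void) {
    for (double n = 1e3; n <= 1e6; n *= 10.0)
      printf("n = %8.0f   T1(n)/n = %.4f\n", n, T1(n) / n);
    return 0;
  }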

Parallelism of P_Merge
Work: T₁(n) = Θ(n)
Span: T∞(n) = Θ(lg² n)
Parallelism: T₁(n)/T∞(n) = Θ(n/lg² n)

Parallel Merge Sort

  cilk void P_MergeSort(int *B, int *A, int n) {
    if (n==1) {
      B[0] = A[0];
    } else {
      int *C;
      C = (int*) Cilk_alloca(n*sizeof(int));
      spawn P_MergeSort(C, A, n/2);
      spawn P_MergeSort(C+n/2, A+n/2, n-n/2);
      sync;
      spawn P_Merge(B, C, C+n/2, n/2, n-n/2);
    }
  }

Work of Parallel Merge Sort
Work: T₁(n) = 2 T₁(n/2) + Θ(n), so CASE 2 applies:
T₁(n) = Θ(n lg n).

Span of Parallel Merge Sort
Span: T∞(n) = T∞(n/2) + Θ(lg² n)
n^(log_b a) = n^(log₂ 1) = 1 ⇒ f(n) = Θ(n^(log_b a) lg² n), so CASE 2 applies:
T∞(n) = Θ(lg³ n).

Parallelism of Parallel Merge Sort
Work: T₁(n) = Θ(n lg n)
Span: T∞(n) = Θ(lg³ n)
Parallelism: T₁(n)/T∞(n) = Θ(n/lg² n)

LECTURE 2: Recurrences (Review), Matrix Multiplication, Merge Sort, Tableau Construction, Conclusion

Tableau Construction
Problem: Fill in an n × n tableau A, where A[i, j] = f(A[i, j−1], A[i−1, j], A[i−1, j−1]).
Dynamic programming: longest common subsequence, edit distance, time warping.
Work: Θ(n²).
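For concreteness, a serial C sketch of this recurrence (illustration only: the names fill_tableau and f are placeholders for the problem-specific update, and row 0 and column 0 are assumed to be initialized elsewhere):

  /* Fill the tableau row by row, so each cell sees its left, upper,
   * and upper-left neighbors before it is computed. */
  void fill_tableau(int *A, int n, int (*f)(int left, int up, int upleft)) {
    for (int i = 1; i < n; i++)
      for (int j = 1; j < n; j++)
        A[i*n + j] = f(A[i*n + (j-1)],       /* A[i, j-1]   */
                       A[(i-1)*n + j],       /* A[i-1, j]   */
                       A[(i-1)*n + (j-1)]);  /* A[i-1, j-1] */
  }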

Recursive Construction
(Figure: the n × n tableau is divided into four (n/2) × (n/2) quadrants: I top-left, II top-right, III bottom-left, IV bottom-right.)
Cilk code:

  spawn I;
  sync;
  spawn II;
  spawn III;
  sync;
  spawn IV;
  sync;
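A hypothetical serial C sketch of this quadrant recursion (the function name and the stand-in cell update are illustrative, and size is assumed to be a power of two). In Cilk, quadrants II and III would be spawned in parallel; here the calls are sequential, so only the dependency order is shown.

  static void construct(int *A, int ld, int i0, int j0, int size) {
    if (size == 1) {                      /* leaf: one cell of the tableau */
      int i = i0, j = j0;
      if (i > 0 && j > 0)                 /* stand-in for f(left, up, upleft) */
        A[i*ld + j] = A[i*ld + j-1] + A[(i-1)*ld + j] + A[(i-1)*ld + j-1];
      return;
    }
    int h = size / 2;
    construct(A, ld, i0,     j0,     h);  /* I   (top-left)                      */
    construct(A, ld, i0,     j0 + h, h);  /* II  (top-right): independent of     */
    construct(A, ld, i0 + h, j0,     h);  /* III (bottom-left) in the Cilk code  */
    construct(A, ld, i0 + h, j0 + h, h);  /* IV  (bottom-right)                  */
  }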

Recursive Construction: Work
Work: T₁(n) = 4 T₁(n/2) + Θ(1), so CASE 1 applies:
T₁(n) = Θ(n²).

Recursive Construction: Span
Span: T∞(n) = 3 T∞(n/2) + Θ(1)   (I, then II in parallel with III, then IV: three quadrants lie on the critical path)
CASE 1 applies: T∞(n) = Θ(n^(lg 3)).

Analysis of Tableau Construction
Work: T₁(n) = Θ(n²)
Span: T∞(n) = Θ(n^(lg 3)) ≈ Θ(n^1.58)
Parallelism: T₁(n)/T∞(n) ≈ Θ(n^0.42)

A More-Parallel Construction
(Figure: the n × n tableau is divided into nine (n/3) × (n/3) blocks I through IX, filled antidiagonal by antidiagonal.)
Cilk code:

  spawn I;
  sync;
  spawn II;
  spawn III;
  sync;
  spawn IV;
  spawn V;
  spawn VI;
  sync;
  spawn VII;
  spawn VIII;
  sync;
  spawn IX;
  sync;

A More-Parallel Construction: Work
Work: T₁(n) = 9 T₁(n/3) + Θ(1), so CASE 1 applies:
T₁(n) = Θ(n²).

A More-Parallel Construction: Span
Span: T∞(n) = 5 T∞(n/3) + Θ(1)   (five antidiagonal phases lie on the critical path)
CASE 1 applies: T∞(n) = Θ(n^(log₃ 5)).

Analysis of Revised Construction
Work: T₁(n) = Θ(n²)
Span: T∞(n) = Θ(n^(log₃ 5)) ≈ Θ(n^1.46)
Parallelism: T₁(n)/T∞(n) ≈ Θ(n^0.54)
More parallel by a factor of Θ(n^0.54)/Θ(n^0.42) = Θ(n^0.12).

Puzzle
What is the largest parallelism that can be obtained for the tableau-construction problem using Cilk?
You may only use basic Cilk control constructs (spawn, sync) for synchronization. No locks, synchronizing through memory, etc.

LECTURE 2: Recurrences (Review), Matrix Multiplication, Merge Sort, Tableau Construction, Conclusion

Key Ideas
● Cilk is simple: cilk, spawn, sync, SYNCHED
● Recurrences, recurrences, recurrences, …
● Work & span

Minicourse Outline
● LECTURE 1 Basic Cilk programming: Cilk keywords, performance measures, scheduling.
● LECTURE 2 Analysis of Cilk algorithms: matrix multiplication, sorting, tableau construction.
● LABORATORY Programming matrix multiplication in Cilk (Dr. Bradley C. Kuszmaul).
● LECTURE 3 Advanced Cilk programming: inlets, abort, speculation, data synchronization, & more.