Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multithreaded Programming in Cilk L ECTURE 3 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence.

Similar presentations


Presentation on theme: "Multithreaded Programming in Cilk L ECTURE 3 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence."— Presentation transcript:

1 Multithreaded Programming in Cilk L ECTURE 3 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

2 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 2 Minicourse Outline ● L ECTURE 1 Basic Cilk programming: Cilk keywords, performance measures, scheduling. ● L ECTURE 2 Analysis of Cilk algorithms: matrix multiplication, sorting, tableau construction. ● L ABORATORY Programming matrix multiplication in Cilk — Dr. Bradley C. Kuszmaul ● L ECTURE 3 Advanced Cilk programming: inlets, abort, speculation, data synchronization, & more.

3 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 3 L ECTURE 3 Data Synchronization JCilk Speculative Computing Conclusion Under the Covers Inlets Abort

4 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 4 Operating on Returned Values Programmers may sometimes wish to incorporate a value returned from a spawned child into the parent frame by means other than a simple variable assignment. Cilk achieves this functionality using an internal function, called an inlet, which is executed as a secondary thread on the parent frame when the child returns. x += spawn foo(a,b,c); Example:

5 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 5 int max, ix = -1;  for (i=0; i<1000000; i++) { } sync; /* ix now indexes the largest foo(i) */ int max, ix = -1;  for (i=0; i<1000000; i++) { } sync; /* ix now indexes the largest foo(i) */ Semantics of Inlets The inlet keyword defines a void internal function to be an inlet. In the current implementation of Cilk, the inlet definition may not contain a spawn, and only the first argument of the inlet may be spawned at the call site. update ( spawn foo(i), i ); inlet void update ( int val, int index ) { if (idx == -1 || val > max) { ix = index; max = val; }

6 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 6 Semantics of Inlets int max, ix = -1; inlet void update ( int val, int index ) { if (idx == -1 || val > max) { ix = index; max = val; }  for (i=0; i<1000000; i++) { } sync; /* ix now indexes the largest foo(i) */ int max, ix = -1; inlet void update ( int val, int index ) { if (idx == -1 || val > max) { ix = index; max = val; }  for (i=0; i<1000000; i++) { } sync; /* ix now indexes the largest foo(i) */ 1.The non- spawn args to update() are evaluated. 2.The Cilk procedure foo(i) is spawned. 3.Control passes to the next statement. 4.When foo(i) returns, update() is invoked. update ( spawn foo(i), i );

7 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 7 Semantics of Inlets int max, ix = -1; inlet void update ( int val, int index ) { if (idx == -1 || val > max) { ix = index; max = val; }  for (i=0; i<1000000; i++) { update ( spawn foo(i), i ); } sync; /* ix now indexes the largest foo(i) */ int max, ix = -1; inlet void update ( int val, int index ) { if (idx == -1 || val > max) { ix = index; max = val; }  for (i=0; i<1000000; i++) { update ( spawn foo(i), i ); } sync; /* ix now indexes the largest foo(i) */ Cilk provides implicit atomicity among the threads belonging to the same frame, and thus no locking is necessary to avoid data races.

8 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 8 Implicit Inlets cilk int wfib(int n) { if (n == 0) { return 0; } else { int i, x = 1; for (i=0; i<=n-2; i++) { x += spawn wfib(i); } sync; return x; } cilk int wfib(int n) { if (n == 0) { return 0; } else { int i, x = 1; for (i=0; i<=n-2; i++) { x += spawn wfib(i); } sync; return x; } For assignment operators, the Cilk compiler automatically generates an implicit inlet to perform the update.

9 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 9 L ECTURE 3 Data Synchronization JCilk Speculative Computing Conclusion Under the Covers Inlets Abort

10 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 10 Computing a Product p =  A i i = 0 n int product(int *A, int n) { int i, p=1; for (i=0; i<n; i++) { p *= A[i]; int product(int *A, int n) { int i, p=1; for (i=0; i<n; i++) { p *= A[i]; } return p; } return p; } Optimization: Quit early if the partial product ever becomes 0.

11 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 11 Computing a Product p =  A i i = 0 n int product(int *A, int n) { int i, p=1; for (i=0; i<n; i++) { p *= A[i]; int product(int *A, int n) { int i, p=1; for (i=0; i<n; i++) { p *= A[i]; } return p; } return p; } if (p == 0) break; Optimization: Quit early if the partial product ever becomes 0.

12 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 12 Computing a Product in Parallel p =  A i i = 0 n cilk int prod(int *A, int n) { int p = 1; if (n == 1) { return A[0]; } else { p *= spawn product(A, n/2); p *= spawn product(A+n/2, n-n/2); sync; return p; } cilk int prod(int *A, int n) { int p = 1; if (n == 1) { return A[0]; } else { p *= spawn product(A, n/2); p *= spawn product(A+n/2, n-n/2); sync; return p; } How do we quit early if we discover a zero?

13 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 13 Cilk’s Abort Feature cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } 1. Recode the implicit inlet to make it explicit.

14 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 14 cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } Cilk’s Abort Feature 2. Check for 0 within the inlet.

15 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 15 cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; if (p == 0) { abort; /* Aborts existing children, */ } /* but not future ones. */ return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; if (p == 0) { abort; /* Aborts existing children, */ } /* but not future ones. */ return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } Cilk’s Abort Feature 2. Check for 0 within the inlet.

16 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 16 cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; if (p == 0) { abort; /* Aborts existing children, */ } /* but not future ones. */ return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; if (p == 0) { abort; /* Aborts existing children, */ } /* but not future ones. */ return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } Cilk’s Abort Feature

17 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 17 cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; if (p == 0) { abort; /* Aborts existing children, */ } /* but not future ones. */ return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); if (p == 0) { /* Don’t spawn if we’ve */ return 0; /* already aborted! */ } mult( spawn product(A+n/2, n-n/2) ); sync; return p; } cilk int product(int *A, int n) { int p = 1; inlet void mult(int x) { p *= x; if (p == 0) { abort; /* Aborts existing children, */ } /* but not future ones. */ return; } if (n == 1) { return A[0]; } else { mult( spawn product(A, n/2) ); if (p == 0) { /* Don’t spawn if we’ve */ return 0; /* already aborted! */ } mult( spawn product(A+n/2, n-n/2) ); sync; return p; } Cilk’s Abort Feature Implicit atomicity eases reasoning about races.

18 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 18 L ECTURE 3 Data Synchronization JCilk Speculative Computing Conclusion Under the Covers Inlets Abort

19 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 19 Min-Max Search ●Two players: MAX and MIN. ●The game tree represents all moves from the current position within a given search depth. ●At leaves, apply a static evaluation function. ●MAX chooses the maximum score among its children. ●MIN chooses the minimum score among its children.

20 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 20 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

21 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 21 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

22 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 22 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

23 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 23 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

24 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 24 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

25 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 25 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 3 3 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

26 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 26 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

27 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 27 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

28 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 28 5 5 Alpha-Beta Pruning 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

29 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 29 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

30 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 30 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

31 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 31 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

32 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 32 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 2 2 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

33 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 33 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 2 2 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

34 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 34 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 2 2 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 ¸6¸6 ¸6¸6 5 5  I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

35 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 35 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 2 2 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 ¸6¸6 ¸6¸6 5 5  I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

36 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 36 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 2 2 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 ¸6¸6 ¸6¸6 5 5  I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

37 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 37 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 2 2 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 ¸6¸6 ¸6¸6 5 5  I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

38 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 38 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 2 2 7 7 4 4 8 8 ¸6¸6 ¸6¸6 3 3 6 6 4 4 3 3 6 6 ¸6¸6 ¸6¸6 5 5  4 4 8 8 I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

39 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 39 5 5 Alpha-Beta Pruning 6 6 2 2 9 9 2 2 7 7 4 4 8 8 3 3 6 6 4 4 3 3 6 6 ¸6¸6 ¸6¸6 5 5  ¸6¸6 ¸6¸6  4 4 8 8 Unfortunately, this heuristic is inherently serial. I DEA : If MAX discovers a move so good that MIN would never allow that position, MAX’s other children need not be searched — beta cutoff.

40 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 40 O BSERVATION: In a best-ordered tree, the degree of every internal node is either 1 or maximal. Parallel Min-Max Search I DEA: [Feldman-Mysliwietz-Monien 91] If the first child fails to generate a cutoff, speculate that the remaining children can be searched in parallel without wasting any work: “young brothers wait.”

41 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 41 Parallel Alpha-Beta (I) cilk int search(position *prev, int move, int depth) { position cur; /* Current position */ int bestscore = -INF; /* Best score so far */ int num_moves; /* Number of children */ int mv; /* Index of child */ int sc; /* Child’s score */ int cutoff = FALSE; /* Have we seen a cutoff? */ cilk int search(position *prev, int move, int depth) { position cur; /* Current position */ int bestscore = -INF; /* Best score so far */ int num_moves; /* Number of children */ int mv; /* Index of child */ int sc; /* Child’s score */ int cutoff = FALSE; /* Have we seen a cutoff? */ ●View from MAX’s perspective; MIN’s viewpoint can be obtained by negating scores — negamax. ●The node generates its current position from its parent’s position prev and move. ●The alpha and beta limits and the move list are fields of the position data structure. #Cilk keywords used so far 1 1

42 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 42 Parallel Alpha-Beta (II) inlet void get_score(int child_sc) { child_sc = -child_sc; /* Negamax */ if (child_sc > bestscore) { bestscore = child_sc; if (child_sc > cur.alpha) { cur.alpha = child_sc; if (child_sc >= cur.beta) { /* Beta cutoff */ cutoff = TRUE; /* No need to search more */ abort; /* Terminate other children */ } inlet void get_score(int child_sc) { child_sc = -child_sc; /* Negamax */ if (child_sc > bestscore) { bestscore = child_sc; if (child_sc > cur.alpha) { cur.alpha = child_sc; if (child_sc >= cur.beta) { /* Beta cutoff */ cutoff = TRUE; /* No need to search more */ abort; /* Terminate other children */ } 1 1 2 2 3 3

43 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 43 Parallel Alpha-Beta (III) /* Create current position and set up for search */ make_move(prev, move, &cur); sc = eval(&cur); /* Static evaluation */ if ( abs(sc)>=MATE || depth<=0 ) { /* Leaf node */ return (sc); } cur.alpha = -prev->beta; /* Negamax */ cur.beta = -prev->alpha; /* Generate moves, hopefully in best-first order*/ num_moves = gen_moves(&cur); /* Create current position and set up for search */ make_move(prev, move, &cur); sc = eval(&cur); /* Static evaluation */ if ( abs(sc)>=MATE || depth<=0 ) { /* Leaf node */ return (sc); } cur.alpha = -prev->beta; /* Negamax */ cur.beta = -prev->alpha; /* Generate moves, hopefully in best-first order*/ num_moves = gen_moves(&cur); 3 3

44 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 44 Parallel Alpha-Beta (IV) /* Search the moves */ for (mv=0; !cutoff && mv<num_moves; mv++) { get_score( spawn search(&cur, mv, depth-1) ); if (mv==0) sync; /* Young brothers wait */ } sync; return (bestscore); } /* Search the moves */ for (mv=0; !cutoff && mv<num_moves; mv++) { get_score( spawn search(&cur, mv, depth-1) ); if (mv==0) sync; /* Young brothers wait */ } sync; return (bestscore); } 3 3 4 4 5 5 6 6 ● Only 6 Cilk keywords need be embedded in the C program to parallelize it. ● In fact, the program can be parallelized using only 5 keywords at the expense of minimal obfuscation.

45 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 45 L ECTURE 3 Data Synchronization JCilk Speculative Computing Conclusion Under the Covers Inlets Abort

46 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 46 Mutual Exclusion Cilk’s solution to mutual exclusion is no better than anybody else’s. Cilk provides a library of spin locks declared with Cilk_lockvar. To avoid deadlock with the Cilk scheduler, a lock should only be held within a Cilk thread. I.e., spawn and sync should not be executed while a lock is held. Fortunately, Cilk’s control parallelism often mitigates the need for extensive locking.

47 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 47 Cilk’s Memory Model Programmers may also synchronize through memory using lock-free protocols, although Cilk is agnostic on consistency model. If a program contains no data races, Cilk effectively supports sequential consistency. If a program contains data races, Cilk’s behavior depends on the consistency model of the underlying hardware. To aid portability, the Cilk_fence() function implements a memory barrier on machines with weak memory models.

48 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 48 Cilk’s Nondeterminator debugging tool provably guarantees to detect and localize data-race bugs. Debugging Data Races FAIL Information localizing a data race. PASS Every execution produces the same result. Input data set “Abelian” Cilk program A data race occurs whenever two logically parallel threads, holding no locks in common, access the same location and one of the threads modifies the location.

49 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 49 L ECTURE 3 Data Synchronization JCilk Speculative Computing Conclusion Under the Covers Inlets Abort

50 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 50 Compiling Cilk Cilk source Cilk source cilk2c C post- source C post- source source-to-source translator gcc object code object code C compiler ld Cilk RTS Cilk RTS binary linking loader cilk2c translates straight C code into identical C postsource. The cilkc compiler encapsulates the process.

51 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 51 SLOW FAST The fast clone is always spawned, saving live variables on Cilk’s work deque (shadow stack). fast clone—serial, common-case code. slow clone—code with parallel bookkeeping. The cilk2c translator generates two “clones” of each Cilk procedure: A check is made whenever a procedure returns to see if the resuming parent has been stolen. The slow clone is resumed if a thread is stolen, restoring variables from the shadow stack. Cilk’s Compiler Strategy

52 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 52 Compiling spawn — Fast Clone suspend parent run child resume parent remotely cilk2c x = spawn fib(n-1); Cilk source frame->entry = 1; frame->n = n; push(frame); x = fib(n-1); if (pop()==FAILURE) { frame->x = x; frame->join--; h clean up & return to scheduler i } C post- source entry join n n x x y y entry join Cilk deque frame

53 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 53 No synchronization overhead in the fast clone! Compiling sync — Fast Clone sync; cilk2c Cilk source ; C post- source SLOW FAST

54 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 54 Compiling the Slow Clone void fib_slow(fib_frame *frame) { int n,x,y; switch (frame->entry) { case 1: goto L1; case 2: goto L2; case 3: goto L3; }  frame->entry = 1; frame->n = n; push(frame); x = fib(n-1); if (pop()==FAILURE) { frame->x = x; frame->join--; h clean up & return to scheduler i } if (0) { L1:; n = frame->n; }  } entry join n n x x y y entry join Cilk deque restore program counter continue same as fast clone restore local variables if resuming frame

55 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 55 Breakdown of Work Overhead C state saving frame allocation stealing protocol MIPS R10000 UltraSPARC I Pentium Pro Alpha 21164 T1/TST1/TS Benchmark: fib on one processor. 0 27ns 1234567 78ns 113ns 115ns (circa 1997)

56 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 56 L ECTURE 3 Data Synchronization JCilk Speculative Computing Conclusion Under the Covers Inlets Abort

57 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 57 The JCilk System JCilk to Jgo Jgo Compiler JVM JCilk RTS Fib.jcilkFib.jgo Jgo = Java + goto. The Jgo compiler was built by modifying gcj to accept goto statements so that a continuation mechanism for JCilk could be implemented. JCilk Compiler Fib.class

58 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 58 JCilk Keywords cilk spawn sync SYNCHED inlet abort Same as Cilk, except that cilk can also modify try. Eliminated! JCilk leverages Java’s exception mechanism to render two Cilk keywords unnecessary.

59 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 59 Exception Handling in Java “During the process of throwing an exception, the Java virtual machine abruptly completes, one by one, any expressions, statements, method and constructor invocations, initializers, and field initialization expressions that have begun but not completed execution in the current thread. This process continues until a handler is found that indicates that it handles that particular exception by naming the class of the exception or a superclass of the class of the exception.” — J. Gosling, B Joy, G. Steele, and G. Bracha, Java Language Specification, 2000, pp. 219–220.

60 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 60 Exception Handling in JCilk private cilk void foo() throws IOException { spawn A(); cilk try { spawn B(); cilk try { spawn C(); } catch(ArithmeticEx’n e) { doSomething(); } } catch(RuntimeException e) { doSomethingElse(); } spawn D(); doYetSomethingElse(); sync; }

61 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 61 private cilk void foo() throws IOException { spawn A(); cilk try { spawn B(); cilk try { spawn C(); } catch(ArithmeticEx’n e) { doSomething(); } } catch(RuntimeException e) { doSomethingElse(); } spawn D(); doYetSomethingElse(); sync; } Exception! Exception Handling in JCilk An exception causes all subcomputations dynamically enclosed by the catching clause to abort!

62 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 62 private cilk void foo() throws IOException { spawn A(); cilk try { spawn B(); cilk try { spawn C(); } catch(ArithmeticEx’n e) { doSomething(); } } catch(RuntimeException e) { doSomethingElse(); } spawn D(); doYetSomethingElse(); sync; } Exception Handling in JCilk An exception causes all subcomputations dynamically enclosed by the catching clause to abort! ArithmeticEx’n Nothing aborts.

63 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 63 private cilk void foo() throws IOException { spawn A(); cilk try { spawn B(); cilk try { spawn C(); } catch(ArithmeticEx’n e) { doSomething(); } } catch(RuntimeException e) { doSomethingElse(); } spawn D(); doYetSomethingElse(); sync; } Exception Handling in JCilk An exception causes all subcomputations dynamically enclosed by the catching clause to abort! RuntimeEx’n

64 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 64 private cilk void foo() throws IOException { spawn A(); cilk try { spawn B(); cilk try { spawn C(); } catch(ArithmeticEx’n e) { doSomething(); } } catch(RuntimeException e) { doSomethingElse(); } spawn D(); doYetSomethingElse(); sync; } IOException Exception Handling in JCilk An exception causes all subcomputations dynamically enclosed by the catching clause to abort!

65 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 65 private cilk void foo() throws IOException { spawn A(); cilk try { spawn B(); cilk try { spawn C(); } catch(ArithmeticEx’n e) { doSomething(); } } catch(RuntimeException e) { doSomethingElse(); } spawn D(); doYetSomethingElse(); sync; } Exception Handling in JCilk RuntimeEx’n The appropriate catch clause is executed only after all spawned methods within the corresponding try block terminate. RuntimeEx’n

66 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 66 JCilk’s Exception Mechanism JCilk’s exception semantics allow programs such as alpha-beta to be coded without Cilk’s inlet and abort keywords. Unfortunately, Java exceptions are slow, reducing the utility of JCilk’s faithful extension.

67 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 67 L ECTURE 3 Data Synchronization JCilk Speculative Computing Conclusion Under the Covers Inlets Abort

68 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 68 Future Work Adaptive computing Get rid of --nproc. Build a job scheduler that uses parallelism feedback to balance processor resources among Cilk jobs. Integrating Cilk with static threads Currently, interfacing a Cilk program to other system processes requires arcane knowledge. Build linguistic support into Cilk for Cilk processes that communicate. Develop a job scheduler that uses pipeload to allocate resources among Cilk processes.

69 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 69 Key Ideas Cilk is simple: cilk, spawn, sync, SYNCHED, inlet, abort JCilk is simpler Work & span

70 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 70 Open-Cilk Consortium We are in the process of forming a consortium to manage, organize, and promote Cilk open-source technology. If you are interested in participating, please let us know.

71 © 2006 by Charles E. LeisersonMultithreaded Programming in Cilk — L ECTURE 3July 17, 2006 71 ACM Symposium on Parallelism in Algorithms and Architectures Cambridge, MA, USA July 30 – August 2, 2006 SPAA 2006


Download ppt "Multithreaded Programming in Cilk L ECTURE 3 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence."

Similar presentations


Ads by Google