Multithreaded Programming in Cilk
LECTURE 3
Charles E. Leiserson
Supercomputing Technologies Research Group
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

© 2006 by Charles E. Leiserson, Multithreaded Programming in Cilk, LECTURE 3, July 17

Minicourse Outline
● LECTURE 1 Basic Cilk programming: Cilk keywords, performance measures, scheduling.
● LECTURE 2 Analysis of Cilk algorithms: matrix multiplication, sorting, tableau construction.
● LABORATORY Programming matrix multiplication in Cilk (Dr. Bradley C. Kuszmaul)
● LECTURE 3 Advanced Cilk programming: inlets, abort, speculation, data synchronization, & more.

LECTURE 3
● Inlets
● Abort
● Speculative Computing
● Data Synchronization
● Under the Covers
● JCilk
● Conclusion

Operating on Returned Values
Programmers may sometimes wish to incorporate a value returned from a spawned child into the parent frame by means other than a simple variable assignment.

Example:
    x += spawn foo(a,b,c);

Cilk achieves this functionality using an internal function, called an inlet, which is executed as a secondary thread on the parent frame when the child returns.

Semantics of Inlets

    int max, ix = -1;

    inlet void update ( int val, int index ) {
      if (ix == -1 || val > max) {
        ix = index; max = val;
      }
    }
    ...
    for (i=0; i<N; i++) {   /* N stands in for the loop bound, which is missing from the transcript */
      update ( spawn foo(i), i );
    }
    sync; /* ix now indexes the largest foo(i) */

The inlet keyword defines a void internal function to be an inlet. In the current implementation of Cilk, the inlet definition may not contain a spawn, and only the first argument of the inlet may be spawned at the call site.

For the call update ( spawn foo(i), i ):
1. The non-spawn args to update() are evaluated.
2. The Cilk procedure foo(i) is spawned.
3. Control passes to the next statement.
4. When foo(i) returns, update() is invoked.

Cilk provides implicit atomicity among the threads belonging to the same frame, and thus no locking is necessary to avoid data races.
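The inlet in the slide can be read as a small reduction callback. The following is a minimal serial sketch in plain C, not Cilk: each "spawned" child simply runs to completion and the update runs right after it, which is one legal order of the parallel program. The names frame_state, argmax, and sample_foo are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-frame state that the inlet updates. */
typedef struct {
    int max;  /* largest value seen so far (meaningful only when ix != -1) */
    int ix;   /* index that produced max, or -1 if no child has returned */
} frame_state;

/* Plays the role of the inlet: folds one child's result into the frame. */
static void update(frame_state *f, int val, int index)
{
    if (f->ix == -1 || val > f->max) {
        f->ix = index;
        f->max = val;
    }
}

/* Serial stand-in for the spawn loop: returns the index i in [0, n)
 * that maximizes foo(i). In this serial order, ties go to the smallest i. */
static int argmax(int (*foo)(int), int n)
{
    assert(foo != NULL && n > 0);
    frame_state f = { 0, -1 };
    for (int i = 0; i < n; i++) {
        /* Cilk: update(spawn foo(i), i); */
        update(&f, foo(i), i);
    }
    /* Cilk: a sync goes here before f.ix is read. */
    return f.ix;
}

/* Example child: a parabola that peaks at i == 7. */
static int sample_foo(int i)
{
    return -(i - 7) * (i - 7);
}
```

Calling argmax(sample_foo, 20) returns 7. In real Cilk the children may return in any order, and the implicit atomicity of inlets is what keeps the ix/max pair consistent without a lock.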

Implicit Inlets

    cilk int wfib(int n) {
      if (n == 0) {
        return 0;
      } else {
        int i, x = 1;
        for (i=0; i<=n-2; i++) {
          x += spawn wfib(i);
        }
        sync;
        return x;
      }
    }

For assignment operators, the Cilk compiler automatically generates an implicit inlet to perform the update.
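To see what wfib computes, it can help to delete the Cilk keywords and run the remaining C. The sketch below does exactly that; the only addition is an assert on the argument, which is an assumption about the intended domain.

```c
#include <assert.h>

/* wfib with the Cilk keywords removed. The accumulation x += ... is the
 * work that the implicit inlet performs when a child returns. */
static int wfib(int n)
{
    assert(n >= 0);
    if (n == 0) {
        return 0;
    } else {
        int i, x = 1;
        for (i = 0; i <= n - 2; i++) {
            x += wfib(i);   /* Cilk: x += spawn wfib(i); */
        }
        /* Cilk: sync; */
        return x;
    }
}
```

Because F(n) = 1 + F(0) + F(1) + ... + F(n-2) for n >= 1, this returns the Fibonacci numbers; for example, wfib(10) is 55. Addition is commutative and associative, so the order in which the parallel children return does not change the result.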

Computing a Product
$p = \prod_{i=0}^{n-1} A_i$

    int product(int *A, int n) {
      int i, p=1;
      for (i=0; i<n; i++) {
        p *= A[i];
        if (p == 0) break;
      }
      return p;
    }

Optimization: Quit early if the partial product ever becomes 0.

Computing a Product in Parallel
$p = \prod_{i=0}^{n-1} A_i$

    cilk int product(int *A, int n) {
      int p = 1;
      if (n == 1) {
        return A[0];
      } else {
        p *= spawn product(A, n/2);
        p *= spawn product(A+n/2, n-n/2);
        sync;
        return p;
      }
    }

How do we quit early if we discover a zero?

Cilk's Abort Feature
1. Recode the implicit inlet to make it explicit.
2. Check for 0 within the inlet.

    cilk int product(int *A, int n) {
      int p = 1;
      inlet void mult(int x) {
        p *= x;
        if (p == 0) {
          abort;        /* Aborts existing children, */
        }               /* but not future ones. */
        return;
      }

      if (n == 1) {
        return A[0];
      } else {
        mult( spawn product(A, n/2) );
        if (p == 0) {   /* Don't spawn if we've */
          return 0;     /* already aborted! */
        }
        mult( spawn product(A+n/2, n-n/2) );
        sync;
        return p;
      }
    }

Implicit atomicity eases reasoning about races.
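The effect of the early exit can be checked with a serial C sketch of the same divide-and-conquer product. This is an illustration under assumptions, not the Cilk code: with no children running concurrently there is nothing for abort to cancel, so only the "don't spawn" check remains. The leaves_visited counter is a hypothetical addition used to show how much work is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Serial divide-and-conquer product over A[0..n).
 * *leaves_visited counts how many array elements are actually read. */
static int product(const int *A, int n, int *leaves_visited)
{
    assert(A != NULL && n >= 1 && leaves_visited != NULL);
    if (n == 1) {
        (*leaves_visited)++;
        return A[0];
    }
    int p = 1;
    p *= product(A, n / 2, leaves_visited);   /* Cilk: mult(spawn product(A, n/2)); */
    if (p == 0) {
        return 0;                              /* don't start the second half */
    }
    p *= product(A + n / 2, n - n / 2, leaves_visited);
    return p;
}
```

For the array {1, 2, 0, 4, 5, 6, 7, 8} this returns 0 after reading only three elements. In the parallel version the second half may already be running when the zero is found, which is the case the abort statement in the inlet handles.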

Min-Max Search
● Two players: MAX and MIN.
● The game tree represents all moves from the current position within a given search depth.
● At leaves, apply a static evaluation function.
● MAX chooses the maximum score among its children.
● MIN chooses the minimum score among its children.

Alpha-Beta Pruning
IDEA: If MAX discovers a move so good that MIN would never allow that position, MAX's other children need not be searched. This is a beta cutoff.

[Figure: game tree animated over several slides to show a beta cutoff; the legible node labels are ≥6 and 5.]

Unfortunately, this heuristic is inherently serial.
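A beta cutoff is easiest to see in a small serial search. The sketch below is a generic negamax alpha-beta in plain C over a binary tree whose leaves are stored in an array. It is not the lecture's search() procedure. The tree layout, the INF constant, and the evals counter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define INF 1000000   /* assumed to exceed every leaf score in magnitude */

/* Negamax alpha-beta over an implicit binary game tree: the node covers
 * leaves[0..n) and its two children cover the two halves. Leaf scores are
 * given from MAX's point of view; color is +1 when MAX is to move at this
 * node and -1 when MIN is. *evals counts the leaves actually evaluated. */
static int alphabeta(const int *leaves, int n, int alpha, int beta,
                     int color, int *evals)
{
    assert(leaves != NULL && n >= 1 && evals != NULL);
    if (n == 1) {
        (*evals)++;
        return color * leaves[0];
    }
    int best = -INF;
    int half = n / 2;
    for (int child = 0; child < 2; child++) {
        const int *sub = (child == 0) ? leaves : leaves + half;
        int len = (child == 0) ? half : n - half;
        /* Negamax: the child's score is negated and its window is flipped. */
        int score = -alphabeta(sub, len, -beta, -alpha, -color, evals);
        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;   /* cutoff: the remaining child is skipped */
    }
    return best;
}
```

With leaves {3, 5, 2, 9} and MAX to move at the root, the first MIN node is worth 3. The second MIN node sees the leaf 2, which is already worse for MAX than 3, so the leaf 9 is never evaluated: the search returns 3 after three evaluations. The cutoff depends on the result of the earlier sibling, which is why the heuristic is serial as stated.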

Parallel Min-Max Search
OBSERVATION: In a best-ordered tree, the degree of every internal node is either 1 or maximal.
IDEA: [Feldman-Mysliwietz-Monien 91] If the first child fails to generate a cutoff, speculate that the remaining children can be searched in parallel without wasting any work: "young brothers wait."

Parallel Alpha-Beta (I)

    cilk int search(position *prev, int move, int depth)
    {
      position cur;           /* Current position */
      int bestscore = -INF;   /* Best score so far */
      int num_moves;          /* Number of children */
      int mv;                 /* Index of child */
      int sc;                 /* Child's score */
      int cutoff = FALSE;     /* Have we seen a cutoff? */

● View from MAX's perspective; MIN's viewpoint can be obtained by negating scores (negamax).
● The node generates its current position from its parent's position prev and move.
● The alpha and beta limits and the move list are fields of the position data structure.
Cilk keywords used so far: 1

Parallel Alpha-Beta (II)

      inlet void get_score(int child_sc)
      {
        child_sc = -child_sc;           /* Negamax */
        if (child_sc > bestscore) {
          bestscore = child_sc;
          if (child_sc > cur.alpha) {
            cur.alpha = child_sc;
            if (child_sc >= cur.beta) { /* Beta cutoff */
              cutoff = TRUE;            /* No need to search more */
              abort;                    /* Terminate other children */
            }
          }
        }
      }

Parallel Alpha-Beta (III)

      /* Create current position and set up for search */
      make_move(prev, move, &cur);
      sc = eval(&cur);                        /* Static evaluation */
      if ( abs(sc)>=MATE || depth<=0 ) {      /* Leaf node */
        return (sc);
      }
      cur.alpha = -prev->beta;                /* Negamax */
      cur.beta = -prev->alpha;
      /* Generate moves, hopefully in best-first order */
      num_moves = gen_moves(&cur);

Cilk keywords used so far: 3

Parallel Alpha-Beta (IV)

      /* Search the moves */
      for (mv=0; !cutoff && mv<num_moves; mv++) {
        get_score( spawn search(&cur, mv, depth-1) );
        if (mv==0) sync;    /* Young brothers wait */
      }
      sync;
      return (bestscore);
    }

● Only 6 Cilk keywords need be embedded in the C program to parallelize it.
● In fact, the program can be parallelized using only 5 keywords at the expense of minimal obfuscation.

Mutual Exclusion
● Cilk's solution to mutual exclusion is no better than anybody else's.
● Cilk provides a library of spin locks declared with Cilk_lockvar.
● To avoid deadlock with the Cilk scheduler, a lock should only be held within a Cilk thread. That is, spawn and sync should not be executed while a lock is held.
● Fortunately, Cilk's control parallelism often mitigates the need for extensive locking.

Cilk's Memory Model
Programmers may also synchronize through memory using lock-free protocols, although Cilk is agnostic about the consistency model.
● If a program contains no data races, Cilk effectively supports sequential consistency.
● If a program contains data races, Cilk's behavior depends on the consistency model of the underlying hardware.
To aid portability, the Cilk_fence() function implements a memory barrier on machines with weak memory models.

Debugging Data Races
Cilk's Nondeterminator debugging tool provably detects and localizes data-race bugs.

Inputs: an "Abelian" Cilk program and an input data set.
● FAIL: Information localizing a data race.
● PASS: Every execution produces the same result.

A data race occurs whenever two logically parallel threads, holding no locks in common, access the same location and one of the threads modifies the location.

Compiling Cilk
Cilk source → cilk2c (source-to-source translator) → C post-source → gcc (C compiler) → object code → ld (linking loader, which also links the Cilk RTS) → binary

cilk2c translates straight C code into identical C postsource.
The cilkc compiler encapsulates the process.

Cilk's Compiler Strategy
The cilk2c translator generates two "clones" of each Cilk procedure:
● fast clone: serial, common-case code.
● slow clone: code with parallel bookkeeping.

● The fast clone is always spawned, saving live variables on Cilk's work deque (shadow stack).
● The slow clone is resumed if a thread is stolen, restoring variables from the shadow stack.
● A check is made whenever a procedure returns to see if the resuming parent has been stolen.

Compiling spawn: Fast Clone

Cilk source:
    x = spawn fib(n-1);

C post-source generated by cilk2c:
    frame->entry = 1;          /* suspend parent */
    frame->n = n;
    push(frame);
    x = fib(n-1);              /* run child */
    if (pop()==FAILURE) {      /* resume parent remotely */
      frame->x = x;
      frame->join--;
      ⟨clean up & return to scheduler⟩
    }

[Figure: the frame, with fields entry, join, n, x, and y, sits on the Cilk deque.]

Compiling sync: Fast Clone

Cilk source:
    sync;

C post-source generated by cilk2c:
    ;

No synchronization overhead in the fast clone!

Compiling the Slow Clone

    void fib_slow(fib_frame *frame)
    {
      int n,x,y;
      switch (frame->entry) {      /* restore program counter */
        case 1: goto L1;
        case 2: goto L2;
        case 3: goto L3;
      }
      ...
      frame->entry = 1;            /* same as fast clone */
      frame->n = n;
      push(frame);
      x = fib(n-1);
      if (pop()==FAILURE) {
        frame->x = x;
        frame->join--;
        ⟨clean up & return to scheduler⟩
      }
      if (0) {                     /* restore local variables if resuming */
        L1:;
        n = frame->n;
      }
      ...                          /* continue */
    }

[Figure: the frame, with fields entry, join, n, x, and y, sits on the Cilk deque.]
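The push/call/pop pattern of the fast clone can be exercised in ordinary C. The sketch below is a single-worker toy under stated assumptions, not the output of cilk2c: the deque is a plain array stack, there is no thief, and so pop() always finds the frame. The names deque, deque_init, fib_fast, SUCCESS, and DEQUE_CAP are hypothetical; push, pop, FAILURE, and the frame fields entry and n follow the slide.

```c
#include <assert.h>

#define DEQUE_CAP 64
enum { FAILURE = 0, SUCCESS = 1 };

typedef struct {
    int entry;   /* where a slow clone would resume */
    int n;       /* saved live variable */
} fib_frame;

typedef struct {
    fib_frame *slots[DEQUE_CAP];
    int tail;    /* the worker pushes and pops at this end */
} deque;

static void deque_init(deque *d) { d->tail = 0; }

static void push(deque *d, fib_frame *f)
{
    assert(d->tail < DEQUE_CAP);
    d->slots[d->tail++] = f;
}

/* With no thief in this sketch, the frame pushed last is always present. */
static int pop(deque *d)
{
    if (d->tail == 0) return FAILURE;   /* would mean the parent was stolen */
    d->tail--;
    return SUCCESS;
}

/* Shape of a fast clone for fib: save state, push, run the child as an
 * ordinary call, pop, and continue serially if nothing was stolen. */
static int fib_fast(deque *d, int n)
{
    assert(n >= 0 && n < DEQUE_CAP);
    if (n < 2) return n;
    fib_frame frame;
    int x, y;

    frame.entry = 1;                    /* Cilk: x = spawn fib(n-1); */
    frame.n = n;
    push(d, &frame);
    x = fib_fast(d, n - 1);
    if (pop(d) == FAILURE) return -1;   /* real code saves x and returns to the scheduler */

    frame.entry = 2;                    /* Cilk: y = spawn fib(n-2); */
    push(d, &frame);
    y = fib_fast(d, n - 2);
    if (pop(d) == FAILURE) return -1;

    /* Cilk: sync; compiles to an empty statement in the fast clone. */
    return x + y;
}
```

After deque_init(&d), fib_fast(&d, 10) returns 55 and leaves the deque empty. A real runtime lets other workers steal from the opposite end of the deque, and a failed pop is the signal that the parent now continues in its slow clone on another processor.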

Breakdown of Work Overhead
Benchmark: fib on one processor (circa 1997).

[Figure: bar chart of T1/TS on MIPS R10000, UltraSPARC I, Pentium Pro, and Alpha, with each bar broken down into C, state saving, frame allocation, and stealing protocol. Legible bar labels: 27ns, 113ns, 115ns.]

The JCilk System
Fib.jcilk → JCilk to Jgo → Fib.jgo → Jgo Compiler → Fib.class → JVM + JCilk RTS
(The JCilk Compiler comprises the "JCilk to Jgo" and "Jgo Compiler" stages.)

Jgo = Java + goto.
The Jgo compiler was built by modifying gcj to accept goto statements so that a continuation mechanism for JCilk could be implemented.

JCilk Keywords
cilk, spawn, sync, SYNCHED, inlet, abort
● Same as Cilk, except that cilk can also modify try.
● Eliminated! JCilk leverages Java's exception mechanism to render two Cilk keywords unnecessary.

Exception Handling in Java
"During the process of throwing an exception, the Java virtual machine abruptly completes, one by one, any expressions, statements, method and constructor invocations, initializers, and field initialization expressions that have begun but not completed execution in the current thread. This process continues until a handler is found that indicates that it handles that particular exception by naming the class of the exception or a superclass of the class of the exception."
Source: J. Gosling, B. Joy, G. Steele, and G. Bracha, Java Language Specification, 2000, pp. 219–220.

Exception Handling in JCilk

    private cilk void foo() throws IOException {
      spawn A();
      cilk try {
        spawn B();
        cilk try {
          spawn C();
        } catch(ArithmeticException e) {
          doSomething();
        }
      } catch(RuntimeException e) {
        doSomethingElse();
      }
      spawn D();
      doYetSomethingElse();
      sync;
    }

An exception causes all subcomputations dynamically enclosed by the catching clause to abort!

The slides walk through three cases for an exception thrown inside the nested cilk try blocks:
● ArithmeticException: Nothing aborts.
● RuntimeException: caught by the outer catch clause, so the subcomputations enclosed by the outer cilk try abort.
● IOException: not caught inside foo(), so it propagates to foo()'s caller.

The appropriate catch clause is executed only after all spawned methods within the corresponding try block terminate.

JCilk's Exception Mechanism
● JCilk's exception semantics allow programs such as alpha-beta to be coded without Cilk's inlet and abort keywords.
● Unfortunately, Java exceptions are slow, reducing the utility of JCilk's faithful extension.

Future Work
Adaptive computing
● Get rid of --nproc.
● Build a job scheduler that uses parallelism feedback to balance processor resources among Cilk jobs.
Integrating Cilk with static threads
● Currently, interfacing a Cilk program to other system processes requires arcane knowledge.
● Build linguistic support into Cilk for Cilk processes that communicate.
● Develop a job scheduler that uses pipeload to allocate resources among Cilk processes.

Key Ideas
● Cilk is simple: cilk, spawn, sync, SYNCHED, inlet, abort
● JCilk is simpler
● Work & span

Open-Cilk Consortium
We are in the process of forming a consortium to manage, organize, and promote Cilk open-source technology. If you are interested in participating, please let us know.

SPAA 2006
ACM Symposium on Parallelism in Algorithms and Architectures
Cambridge, MA, USA
July 30 – August 2, 2006