UNIVERSITY OF MASSACHUSETTS, AMHERST Department of Computer Science

Chapel: The Cascade High Productivity Language
Ting Yang
University of Massachusetts Amherst

Context
  DARPA HPCS Program
  Cray's Cascade Project
  Chapel Language
  - HPCS = High Productivity Computing Systems
    - Programmability
    - Performance
    - Portability
    - Robustness
  - Cascade = Cray's HPCS project
    - System-wide consideration of productivity impacts
    - Processors, memory, network, OS
    - Runtime, compilers, languages
  - Chapel = Cascade High-Productivity Language
  (Other labels in the diagram: IBM, Sun)

Introduction: Why Chapel
  Fragmented model: MPI, SHMEM, UPC
  - Code is written on a processor-by-processor basis
    - Breaks up data structures
    - Breaks up control flow
  - Mixes the algorithm with per-processor management details in the computation
    - Virtual processor topology
    - Communication details
    - Choice of data structures, memory layout
  - Fails to support composition of parallelism
  - Lacks productivity, flexibility, and portability
  - Difficult to understand and maintain

Introduction
  Global-view model: HPF, OpenMP, ZPL, NESL
  - No need to decompose data and control flow by hand
  - Decomposition is done by the compiler and runtime
  - Users provide high-level guidance
  - Natural and intuitive
  - Lacks abstractions such as sets, hash tables, and graphs
  - Performance is not as good as MPI
  - Difficult to compile

Introduction: Chapel
  Chapel: Cascade High-Productivity Language
  - Built from HPF and ZPL
  - Strictly typed
  Overall goals:
  - Simplify the creation of parallel programs
  - Provide high-performance, production-grade codes
  - More generality
  Motivating language technologies:
  - Multithreaded parallel programming
  - Locality-aware programming
  - Object-oriented programming
  - Generic programming and type inference

Outline
  - Introduction
  - Multithreaded Parallel Programming
    - Data Parallel
    - Task Parallel
  - Locality-aware Programming
    - Data Distribution
    - Computation Distribution
  - Other Features
  - Summary

Multithreaded Parallel Programming
  - Provides a global view of computation and data structures
  - Composition of parallelism
  - Abstraction of data and task parallelism
    - Data: domains, arrays, graphs
    - Task: cobegins, atomic, sync variables
  - Virtualization of threads, locales

Data Parallelism: Domains
  Domain: an index set (first class)
  - Specifies the size and shape of "arrays"
  - Supports sequential and parallel iteration
  - Potentially decomposed across locales
  - Each domain has an index type: index(domain)
  - Fundamental concept of data parallelism
  - Generalization of ZPL's region
  Important domains:
  - Arithmetic: indices are Cartesian tuples
    - Arrays, multidimensional arrays
    - Can be strided and arbitrarily sparse
  - Infinite: indices are hash keys
    - Maps, hash tables, associative arrays
  - Opaque: indices are anonymous
    - Sets, trees, graphs
  - Others: enumerated
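
The idea of a domain as a first-class index set, separate from the arrays declared over it, can be modeled outside Chapel. The following is a minimal Python sketch, not Chapel code and not how Chapel implements domains; the `Domain` class, `make_array` helper, and the sizes `m` and `n` are hypothetical names chosen for illustration.

```python
from itertools import product


class Domain:
    """Hypothetical stand-in for an arithmetic domain: a first-class index set."""

    def __init__(self, *ranges):
        self.ranges = ranges  # one Python range per dimension

    def __iter__(self):
        # Sequential iteration over all Cartesian index tuples.
        return product(*self.ranges)

    def __contains__(self, idx):
        return len(idx) == len(self.ranges) and all(
            i in r for i, r in zip(idx, self.ranges)
        )

    def __len__(self):
        size = 1
        for r in self.ranges:
            size *= len(r)
        return size


def make_array(domain, fill=0.0):
    """An 'array' is storage with one element per index of its domain."""
    return {idx: fill for idx in domain}


m, n = 4, 6  # assumed sizes, for illustration only
D = Domain(range(1, m + 1), range(1, n + 1))  # indices 1..m by 1..n
D_inner = Domain(range(2, m), range(2, n))    # interior indices 2..m-1 by 2..n-1

A = make_array(D)
B = make_array(D, fill=1.0)

# Sub-array assignment over the inner domain, element by element.
for ij in D_inner:
    A[ij] = B[ij]
```

Two arrays declared over the same domain share one index set, so a sub-domain such as `D_inner` can drive operations on both without repeating bounds.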

Domain Declaration

More Domain Declarations

Domain Uses
  - Declaring arrays
      var A, B: [D] float;
  - Sub-array references
      A(DInner) = B(DInner);
  - Sequential iteration
      for (i,j) in DInner { … A(i,j) … }
      or: for ij in DInner { … A(ij) … }
  - Parallel iteration
      forall (i,j) in DInner { … A(i,j) … }
      or: forall ij in DInner { … A(ij) … }
  - Array re-allocation
      D = [1..2*m, 1..2*n];
  (Figure: arrays A and B declared over domain D, with the sub-domain DInner marked in each)

Infinite Domains
    var People: domain(string);
    var Age: [People] integer;
    var Birthdate: [People] string;

    Age("john") = 60;
    Birthdate("john") = "12/11/1946";

    forall person in People {
      if (Birthdate(person) == today) {
        Age(person) += 1;
      }
    }

Opaque Domains
    var Vertices: domain(opaque);

    for i in (1..5) {
      Vertices.newIndex();
    }

    var AV, BV: [Vertices] float;
  (Figure: the Vertices domain with arrays AV and BV declared over it)

Building a Tree
    var Vertices: domain(opaque);
    var left, right: [Vertices] index(Vertices);
    var root: index(Vertices);

    root = Vertices.newIndex();
    left(root) = Vertices.newIndex();
    right(root) = Vertices.newIndex();
    left(right(root)) = Vertices.newIndex();
  (Figure: the resulting tree, with root at the top)
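
The tree construction above relies on indices that carry no value of their own and exist only to key arrays. A rough Python analogue is sketched below, assuming a hypothetical `OpaqueDomain` class whose `new_index` method hands out anonymous objects; the `left` and `right` dictionaries stand in for the arrays declared over `Vertices`, and `subtree_size` is an illustrative helper not found in the slides.

```python
class OpaqueDomain:
    """Hypothetical stand-in for an opaque domain: indices are anonymous tokens."""

    def __init__(self):
        self._indices = []

    def new_index(self):
        idx = object()  # an index with identity but no meaningful value
        self._indices.append(idx)
        return idx

    def __iter__(self):
        return iter(self._indices)

    def __len__(self):
        return len(self._indices)


vertices = OpaqueDomain()
left, right = {}, {}  # "arrays" over the domain, holding indices of the same domain

root = vertices.new_index()
left[root] = vertices.new_index()
right[root] = vertices.new_index()
left[right[root]] = vertices.new_index()


def subtree_size(v):
    """Count the vertices reachable from v through left/right links."""
    if v is None:
        return 0
    return 1 + subtree_size(left.get(v)) + subtree_size(right.get(v))
```

Because the links are stored in arrays over the domain rather than in node objects, further per-vertex data (weights, labels) can be added as more arrays without changing the tree itself.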

The Domain/Index Hierarchy
  - Every domain has an index type
  - Eliminates most runtime bounds checks

Task Parallelism
  - cobegins: statements that may run in parallel
      cobegin {
        ComputeTaskA(…);
        ComputeTaskB(…);
      }

      ComputeTaskA( ) {
        cobegin {
          ComputeTaskC(…);
          ComputeTaskD(…);
        }
        ComputeTaskE(…);
      }
  - atomic blocks
      atomic {
        newnode.next = insertpt;
        newnode.prev = insertpt.prev;
        insertpt.prev.next = newnode;
        insertpt.prev = newnode;
      }
  - sync and single-assignment variables
    - Synchronize tasks
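
The nested cobegin structure above (tasks A and B in parallel, with A running C and D in parallel before E) can be approximated with ordinary threads. This is a hedged Python sketch, not Chapel semantics: the `cobegin` helper, the `record` function, and the task names are hypothetical, and a plain lock is used as a coarse stand-in for an atomic block.

```python
import threading


def cobegin(*tasks):
    """Run each task in its own thread and wait for all of them to finish."""
    threads = [threading.Thread(target=task) for task in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


results = []
lock = threading.Lock()


def record(name):
    # The lock makes the update indivisible, loosely mirroring an atomic block.
    with lock:
        results.append(name)


def task_a():
    # C and D may run in parallel; E starts only after both have completed.
    cobegin(lambda: record("C"), lambda: record("D"))
    record("E")


def task_b():
    record("B")


cobegin(task_a, task_b)
```

The order of `B`, `C`, and `D` in `results` can vary between runs, but `E` always appears after both `C` and `D`, because the inner `cobegin` joins its tasks before returning.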

Locality-aware Programming
  - locale: machine unit of storage and processing
  - Specify the number of locales on the command line:
      ./myProgram -nl 8
  - Chapel provides a built-in locale array:
      const Locales: [1..numLocales] locale;
  - Users may define their own locale arrays:
      var CompGrid: [1..GridRows, 1..GridCols] locale = …;
      var TaskALocs: [1..numTaskALocs] locale = …;
      var TaskBLocs: [1..numTaskBLocs] locale = …;

Data Distribution
  - Domains can be distributed across locales
      var D: domain(2) distributed(block(2) to CompGrid) = …;
  - Distributions are specified by:
    - Mapping of indices to locales
    - Per-locale storage layout of domain indices and array elements
  - Distributions are implemented as a class hierarchy
    - Chapel provides a group of standard distributions
    - Users may also write their own ???
  - Supports reduce and scan (parallel prefix)
    - Including user-defined operations
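
A distribution's first job, mapping indices to locales, can be illustrated with a block mapping. The sketch below is in Python and shows one common way to compute block ownership; it is not taken from Chapel's implementation. The function names `block_owner` and `block_owner_2d` are hypothetical, and locales are numbered from 0 here for simplicity.

```python
def block_owner(i, lo, hi, num_locales):
    """Locale (0-based) owning index i of lo..hi under a 1-D block distribution."""
    n = hi - lo + 1
    # Contiguous blocks whose sizes differ by at most one element.
    return (i - lo) * num_locales // n


def block_owner_2d(idx, rows, cols, grid):
    """Grid coordinates of the locale owning idx in a 2-D block distribution.

    rows and cols are inclusive (lo, hi) bounds; grid is (grid_rows, grid_cols).
    """
    i, j = idx
    grid_rows, grid_cols = grid
    return (
        block_owner(i, rows[0], rows[1], grid_rows),
        block_owner(j, cols[0], cols[1], grid_cols),
    )


# Eight indices spread over four locales: two consecutive indices per locale.
owners = [block_owner(i, 1, 8, 4) for i in range(1, 9)]
```

For a 4 x 4 domain on a 2 x 2 locale grid, `block_owner_2d((3, 4), (1, 4), (1, 4), (2, 2))` places index (3, 4) on the locale at grid position (1, 1), the lower-right quadrant.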

Computation Distribution
  - The "on" keyword associates tasks with locale(s)
  - "on" can also be used in a data-driven manner

Other Features
  - Object-oriented interface
    - Optional OO style
    - Overloading
    - Advanced language features expressed in classes
  - Generics and type inference
    - Type variables and parameters
    - Similar to class templates in C++
  - Sequences ("seq"), iterators; the "ordered" keyword suppresses parallelism
  - Modules (for name-space management)
  - Parallel garbage collection ???

Chapel Status
  - First sequential prototype, running on one locale
    - Not finished yet
  - Currently can run programs with:
    - Simple domains, up to 2 dimensions
    - Partial type inference
  - A full prototype is expected in one or two years
  (Diagram labels: threads, locales, processors)