IBM’s X10 Presentation by Isaac Dooley CS498LVK Spring 2006
Sources: Report on the Experimental Language X10, Vijay Saraswat X10 Tutorial v10.pdf
Goal of X10 “The fundamental goal of X10 is to enable scalable, high-performance, high-productivity transformational programming for high-end computers -- for traditional numerical computation workloads… as well as commercial server workloads.”
Description “X10 is a type-safe, modern, parallel, distributed object-oriented language intended to be very easily accessible to Java(TM) programmers. It is targeted to future low-end and high-end systems with nodes that are built out of multi-core SMP chips with non- uniform memory hierarchies, and interconnected in scalable cluster configurations. …
Description … A member of the Partitioned Global Address Space (PGAS) family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; constructs for termination detection (finish) and phased computation (clocks); the use of lock-free synchronization (atomic blocks); and the manipulation of global arrays and data structures.”
Memory Model Not Distributed Memory, Not Shared Memory “Fragmented Memory Model” Partitioned Global Address Space(GAS) –Like UPC, Co-Array Fortran, Titanium Globally Asynchronous, Locally Synchronous (GALS)
Language Extended Subset of Java –Without Java Concurrency(threads, monitors) –With Distributed Arrays –With new built-in types –With Places, Activites, Clocks
Language Places –Host some data –Runs some activities(lightweight threads) –Must spawn activities to access data at other places –Correspond to an Address Space (SMP node) Immutable data Freely copied between places on access
Language Clocks –Used to order activities Distributed Arrays –Support Collectives
Language Atomic Sections –atomic S –Excecute locally, only accessing local data Asynchronous Activities –async (P) S –future (P) E
Regions Distributions are maps from a region to a subset of places A region [0:200,1:100] specifies a collection of 2-D points Points are used as array indices Operations on regions are provided –Union, disjunction, set difference
Deadlock, Data Races X10 guarantee −Any program written with async, finish, atomic, foreach, ateach, and clock parallel constructs will never deadlock Unrestricted use of future and force may lead to deadlock, but Restricted use of future and force in X10 can preserve guaranteed freedom from deadlocks To eliminate Data Races, atomic methods and blocks should be used
Example of “future” public class TutFuture1 { static int fib(final int n) { if ( n <= 0 ) return 0; else if ( n == 1 ) return 1; else { future fn_1 = future { fib(n-1) }; future fn_2 = future { fib(n-2) }; return fn_1.force() + fn_2.force(); } } // fib() public static void main(String[] args) { System.out.println("fib(10) = " + fib(10)); } Example of recursive divide-and- conquer parallelism --- calls to fib(n-1) and fib(n-2) execute in parallel
Example of “async” final int n=100; … finish { async for (int i=1 ; i<=n ; i+=2 ) oddSum.val += i; for (int j=2 ; j<=n ; j+=2 ) evenSum.val += j; } // Both oddSum and evenSum have been computed now System.out.println("oddSum = " + oddSum.val + " ; evenSum = " + evenSum.val); Parent activity creates new child to execute “ for (int i=1 ; i<=n ; i+=2 ) oddSum.val += i ” An async statement returns immediately Parent execution proceeds immediately to next statement Any access to parent’s local data must be through final variables
Example of “atomic” finish { async for (int i=1 ; i<=n ; i+=2 ) { double r = 1.0d / i ; atomic rSum += r; } for (int j=2 ; j<=n ; j+=2 ) { double r = 1.0d / j ; atomic rSum += r; } } An atomic statement/method is conceptually executed in a single step, while other activities are suspended − Note: programmer does not manage any locks explicitly An atomic section may not include − Blocking operations − Creation of activities
Example of “atomic” public class TutAtomic2 { const int a = new boxedInt(100); const int b = new boxedInt(100); public static atomic void incr_a() { a.val++ ; b.val-- ; } public static atomic void decr_a() { a.val-- ; b.val++ ; } public static void main(String args[]) { int sum; finish { async for (int i=1 ; i<=10 ; i++ ) incr_a(); for (int i=1 ; i<=10 ; i++ ) decr_a(); } atomic sum = a.val + b.val; System.out.println("a+b = " + sum); } Console output: a+b = 200
Code for Jacobi 2-D public class Jacobi { const region R = [0:N+1, 0:N+1]; const region RInner= [1:N, 1:N]; const distribution D = distribution.factory.block(R); const distribution DInner = D | RInner; const distribution DBoundary = D - RInner; double[D] B = new double[D] (point p[i,j]) {return DBoundary.contains(p) ? (N-1)/2 : N*(i-1)+(j-1); }; // Exploded Variable Declaration, implicitly defined for i,j public boolean run() { int iters = 0; double err; while(true) { double[.] Temp = new double[DInner] (point [i,j]) {return (read(i+1,j)+read(i-1,j)+read(i,j+1)+read(i,j-1))/4.0; }; if((err=((B | DInner) - Temp).abs().sum()) < epsilon) break; B.update(Temp); iters++; }} public double read(final int i, final int j) { return future(D[i,j]) B[i,j].force(); } public static void main(String args[]) { boolean b= (new Jacobi()).run(); System.out.println(" " + (b? "Test succeeded." :"Test failed.")); System.exit(b?0:1); }