IBM’s X10 Presentation by Isaac Dooley CS498LVK Spring 2006.

Sources: Report on the Experimental Language X10 (Vijay Saraswat); X10 Tutorial v10.pdf

Goal of X10 “The fundamental goal of X10 is to enable scalable, high-performance, high-productivity transformational programming for high-end computers -- for traditional numerical computation workloads… as well as commercial server workloads.”

Description “X10 is a type-safe, modern, parallel, distributed object-oriented language intended to be very easily accessible to Java(TM) programmers. It is targeted to future low-end and high-end systems with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in scalable cluster configurations. …

Description … A member of the Partitioned Global Address Space (PGAS) family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; constructs for termination detection (finish) and phased computation (clocks); the use of lock-free synchronization (atomic blocks); and the manipulation of global arrays and data structures.”

Memory Model
–Not Distributed Memory, Not Shared Memory: a “Fragmented Memory Model”
–Partitioned Global Address Space (PGAS), like UPC, Co-Array Fortran, Titanium
–Globally Asynchronous, Locally Synchronous (GALS)

Language An extended subset of Java:
–Without Java concurrency (threads, monitors)
–With distributed arrays
–With new built-in types
–With Places, Activities, Clocks

Language Places
–Host some data
–Run some activities (lightweight threads)
–Activities must be spawned at a place to access that place’s data
–Correspond to an address space (an SMP node)
Immutable data is freely copied between places on access.
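Since places are not plain Java, here is a rough, illustrative Java emulation (all names are hypothetical): each “place” is modeled as a single-threaded executor that owns its local data, and reading remote data means submitting an activity to that place’s executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PlacesSketch {
    // Hypothetical emulation: one single-threaded executor per "place".
    // Daemon threads let the JVM exit without an explicit shutdown.
    static ExecutorService place(int id) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "place-" + id);
            t.setDaemon(true);
            return t;
        });
    }

    static final ExecutorService[] places = { place(0), place(1) };
    static final int[] localData = { 10, 32 }; // localData[p] "lives" at place p

    // Spawn an activity at place p to read its data, then wait for the result
    // (loosely analogous to X10's future (P) E followed by force()).
    static int readAt(int p) {
        try {
            return places[p].submit(() -> localData[p]).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sum = " + (readAt(0) + readAt(1)));
    }
}
```

The point of the sketch is only the discipline: no code touches `localData[p]` except tasks running on place `p`'s thread.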

Language Clocks
–Used to order activities
Distributed Arrays
–Support collectives
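X10 clocks behave much like `java.util.concurrent.Phaser`, so a plain-Java sketch of phased ordering (not X10 itself) can show the idea: no activity starts its phase-1 work until every registered activity has finished phase 0.

```java
import java.util.concurrent.Phaser;

public class ClockSketch {
    // Two activities each do phase-0 work ("A"), synchronize on the clock,
    // then do phase-1 work ("B"). arriveAndAwaitAdvance() plays the role
    // of advancing the clock to the next phase.
    public static String run() {
        StringBuffer log = new StringBuffer(); // thread-safe appends
        Phaser clock = new Phaser(2);          // two registered activities
        Runnable activity = () -> {
            log.append("A");                   // phase 0
            clock.arriveAndAwaitAdvance();     // wait for all arrivals
            log.append("B");                   // phase 1
        };
        Thread t1 = new Thread(activity), t2 = new Thread(activity);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return log.toString(); // both A's always precede both B's
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Whatever the thread interleaving, the log is always "AABB": the barrier makes phase boundaries visible in the output.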

Language Atomic Sections
–atomic S
–Execute locally, accessing only local data
Asynchronous Activities
–async (P) S
–future (P) E

Regions
–A region such as [0:200,1:100] specifies a collection of 2-D points
–Points are used as array indices
–Distributions are maps from a region to a subset of places
–Operations on regions are provided: union, intersection, set difference
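To make the region algebra concrete, here is an illustrative Java sketch (not X10’s actual region implementation) that models a dense rectangular region as a set of 2-D points, so union, intersection, and difference become ordinary set operations — for example, the boundary of a region is the set difference between the region and its interior.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RegionSketch {
    // Model the rectangular region [lo1:hi1, lo2:hi2] as an explicit
    // set of (i, j) points. This is only for illustration; a real
    // implementation would keep the bounds symbolically.
    static Set<List<Integer>> region(int lo1, int hi1, int lo2, int hi2) {
        Set<List<Integer>> pts = new HashSet<>();
        for (int i = lo1; i <= hi1; i++)
            for (int j = lo2; j <= hi2; j++)
                pts.add(List.of(i, j));
        return pts;
    }

    public static void main(String[] args) {
        Set<List<Integer>> r = region(0, 200, 1, 100);     // [0:200,1:100]
        Set<List<Integer>> inner = region(1, 199, 2, 99);  // its interior
        Set<List<Integer>> boundary = new HashSet<>(r);
        boundary.removeAll(inner);                          // set difference
        System.out.println(r.size());        // 201 * 100 points
        System.out.println(boundary.size()); // region minus interior
    }
}
```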

Deadlock, Data Races
X10 guarantee:
–Any program written with only the async, finish, atomic, foreach, ateach, and clock parallel constructs will never deadlock
–Unrestricted use of future and force may lead to deadlock, but restricted use of future and force in X10 can preserve guaranteed freedom from deadlock
–To eliminate data races, atomic methods and blocks should be used

Example of “future”
public class TutFuture1 {
  static int fib(final int n) {
    if (n <= 0) return 0;
    else if (n == 1) return 1;
    else {
      future<int> fn_1 = future { fib(n-1) };
      future<int> fn_2 = future { fib(n-2) };
      return fn_1.force() + fn_2.force();
    }
  } // fib()
  public static void main(String[] args) {
    System.out.println("fib(10) = " + fib(10));
  }
}
Example of recursive divide-and-conquer parallelism: the calls to fib(n-1) and fib(n-2) execute in parallel.
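The same future/force pattern can be rendered in plain Java — not X10 — with `CompletableFuture`: `supplyAsync` plays the role of `future { … }` and `join()` plays the role of `force()`.

```java
import java.util.concurrent.CompletableFuture;

public class FibFuture {
    // Recursive divide-and-conquer parallelism: the two subproblems are
    // launched as asynchronous tasks and the parent blocks on both results.
    static int fib(int n) {
        if (n <= 0) return 0;
        if (n == 1) return 1;
        CompletableFuture<Integer> fn1 =
            CompletableFuture.supplyAsync(() -> fib(n - 1)); // like "future { fib(n-1) }"
        CompletableFuture<Integer> fn2 =
            CompletableFuture.supplyAsync(() -> fib(n - 2));
        return fn1.join() + fn2.join();                      // like "force()"
    }

    public static void main(String[] args) {
        System.out.println("fib(10) = " + fib(10)); // prints fib(10) = 55
    }
}
```

Note the slide’s deadlock caveat applies here too: `join()` blocks, so careless use of blocking futures inside a bounded thread pool can deadlock (Java’s common pool compensates for blocked workers, which is why this sketch runs to completion).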

Example of “async”
final int n = 100;
…
finish {
  async for (int i=1 ; i<=n ; i+=2) oddSum.val += i;
  for (int j=2 ; j<=n ; j+=2) evenSum.val += j;
}
// Both oddSum and evenSum have been computed now
System.out.println("oddSum = " + oddSum.val + " ; evenSum = " + evenSum.val);
–The parent activity creates a new child to execute “for (int i=1 ; i<=n ; i+=2) oddSum.val += i”
–An async statement returns immediately; the parent proceeds to the next statement
–Any access to the parent’s local data must be through final variables
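A minimal plain-Java sketch of the same async/finish idiom (assuming a single child activity): starting a thread stands in for `async`, and joining it stands in for the end of the `finish` block, after which both sums are safely readable.

```java
public class AsyncSketch {
    // Child computes the odd sum while the parent computes the even sum;
    // join() is the "finish" that guarantees both are done before reading.
    public static long[] run(int n) {
        final long[] oddSum = { 0 };           // written only by the child
        Thread child = new Thread(() -> {      // like "async S"
            long s = 0;
            for (int i = 1; i <= n; i += 2) s += i;
            oddSum[0] = s;
        });
        child.start();                         // returns immediately
        long evenSum = 0;                      // parent proceeds at once
        for (int j = 2; j <= n; j += 2) evenSum += j;
        try { child.join(); } catch (InterruptedException e) {
            throw new RuntimeException(e);     // like reaching end of "finish"
        }
        return new long[] { oddSum[0], evenSum };
    }

    public static void main(String[] args) {
        long[] r = run(100);
        System.out.println("oddSum = " + r[0] + " ; evenSum = " + r[1]);
    }
}
```

Java’s rule that lambdas capture only (effectively) final locals mirrors the slide’s note that a child activity must access the parent’s data through final variables.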

Example of “atomic”
finish {
  async for (int i=1 ; i<=n ; i+=2) {
    double r = 1.0d / i;
    atomic rSum += r;
  }
  for (int j=2 ; j<=n ; j+=2) {
    double r = 1.0d / j;
    atomic rSum += r;
  }
}
–An atomic statement/method is conceptually executed in a single step, while other activities are suspended
–Note: the programmer does not manage any locks explicitly
–An atomic section may not include blocking operations or the creation of activities
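Plain Java has no `atomic` statement, so a faithful-in-spirit sketch guards the shared update with a lock: each `rSum += r` becomes an indivisible read-modify-write. (Unlike X10, the programmer does manage the lock here, which is exactly the bookkeeping `atomic` hides.)

```java
public class AtomicSketch {
    static double rSum;
    static final Object lock = new Object();

    // Child adds the odd reciprocals, parent adds the even ones;
    // the synchronized blocks play the role of "atomic rSum += r".
    public static double run(int n) {
        rSum = 0.0;
        Thread child = new Thread(() -> {
            for (int i = 1; i <= n; i += 2) {
                double r = 1.0d / i;
                synchronized (lock) { rSum += r; } // the "atomic" section
            }
        });
        child.start();
        for (int j = 2; j <= n; j += 2) {
            double r = 1.0d / j;
            synchronized (lock) { rSum += r; }
        }
        try { child.join(); } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return rSum; // the n-th harmonic number, up to rounding order
    }

    public static void main(String[] args) {
        System.out.println("rSum = " + run(100));
    }
}
```

Without the lock, two activities could interleave their read and write of `rSum` and lose an update — the data race the slide’s `atomic` eliminates.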

Example of “atomic”
public class TutAtomic2 {
  const boxedInt a = new boxedInt(100);
  const boxedInt b = new boxedInt(100);
  public static atomic void incr_a() { a.val++ ; b.val-- ; }
  public static atomic void decr_a() { a.val-- ; b.val++ ; }
  public static void main(String args[]) {
    int sum;
    finish {
      async for (int i=1 ; i<=10 ; i++) incr_a();
      for (int i=1 ; i<=10 ; i++) decr_a();
    }
    atomic sum = a.val + b.val;
    System.out.println("a+b = " + sum);
  }
}
Console output: a+b = 200

Code for Jacobi 2-D
public class Jacobi {
  const region R = [0:N+1, 0:N+1];
  const region RInner = [1:N, 1:N];
  const distribution D = distribution.factory.block(R);
  const distribution DInner = D | RInner;
  const distribution DBoundary = D - RInner;
  double[D] B = new double[D] (point p[i,j]) {
    return DBoundary.contains(p) ? (N-1)/2 : N*(i-1)+(j-1);
  }; // Exploded variable declaration: i,j are implicitly defined
  public boolean run() {
    int iters = 0;
    double err;
    while (true) {
      double[.] Temp = new double[DInner] (point [i,j]) {
        return (read(i+1,j) + read(i-1,j) + read(i,j+1) + read(i,j-1)) / 4.0;
      };
      if ((err = ((B | DInner) - Temp).abs().sum()) < epsilon) break;
      B.update(Temp);
      iters++;
    }
    return true;
  }
  public double read(final int i, final int j) {
    return future (D[i,j]) { B[i,j] }.force();
  }
  public static void main(String args[]) {
    boolean b = (new Jacobi()).run();
    System.out.println(" " + (b ? "Test succeeded." : "Test failed."));
    System.exit(b ? 0 : 1);
  }
}
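Stripped of distributions and places, the numerical kernel above is ordinary Jacobi relaxation. A small serial Java sketch (illustrative only; grid size, boundary values, and tolerance are made up) shows the same update rule and the same convergence test on the summed absolute change:

```java
public class JacobiSketch {
    // Repeatedly replace each interior cell with the average of its four
    // neighbours until the total change per sweep falls below epsilon.
    static double[][] jacobi(double[][] b, double epsilon) {
        int n = b.length;
        while (true) {
            double[][] t = new double[n][n];
            double err = 0;
            for (int i = 1; i < n - 1; i++)
                for (int j = 1; j < n - 1; j++) {
                    t[i][j] = (b[i + 1][j] + b[i - 1][j]
                             + b[i][j + 1] + b[i][j - 1]) / 4.0;
                    err += Math.abs(t[i][j] - b[i][j]);  // like ((B|DInner) - Temp).abs().sum()
                }
            for (int i = 1; i < n - 1; i++)              // like B.update(Temp)
                for (int j = 1; j < n - 1; j++) b[i][j] = t[i][j];
            if (err < epsilon) return b;
        }
    }

    public static void main(String[] args) {
        double[][] b = new double[6][6];
        for (int j = 0; j < 6; j++) b[0][j] = 1.0;       // hot top boundary
        jacobi(b, 1e-6);
        System.out.printf("centre = %.4f%n", b[2][2]);   // between 0 and 1
    }
}
```

What X10 adds on top of this kernel is the data layout: `B` is block-distributed over places, and the `future (D[i,j]) { B[i,j] }.force()` in `read` fetches neighbour values that may live at another place.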