Coarray: a parallel extension to Fortran
Jim Xia, IBM Toronto Lab
Compilation Technology, Software Group
SCINET compiler workshop | February 17-18, 2009
© 2009 IBM Corporation

Agenda
- Coarray background
- Programming model
- Synchronization
- Comparing coarrays to UPC and MPI
- Q & A

Existing parallel models
- MPI: the de facto standard on distributed memory systems
  - Difficult to program
- OpenMP: popular on shared memory systems
  - Lacks data locality control
  - Not designed for distributed systems

Coarray background
- Proposed by Numrich and Reid [1998]
- A natural extension of Fortran's array language
- Originally named F-- (a joking reference to C++)
- One of the Partitioned Global Address Space (PGAS) languages
  - Other PGAS languages: UPC and Titanium
- Benefits
  - One-sided communication
  - User-controlled data distribution and locality
  - Suitable for a variety of architectures: distributed, shared, or hybrid
- Standardized as part of Fortran 2008, expected to be published in 2010

Programming model
- Single Program Multiple Data (SPMD)
- Fixed number of processes (images)
- "Everything is local!" [Numrich]
  - All data is local
  - All computation is local
- Explicit data partitioning with one-sided communication
  - Remote data movement through codimensions
- The programmer explicitly controls synchronization
  - Good or bad?
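The SPMD execution model above can be sketched with a minimal coarray program. This is an illustrative example, not taken from the slides; it assumes a Fortran 2008 compiler with coarray support (e.g. Cray ftn, or gfortran with -fcoarray=single):

```fortran
program hello_images
  implicit none
  ! Every image executes this same program (SPMD).
  ! this_image() and num_images() are the only things that
  ! distinguish one image from another.
  write (*, '(a,i0,a,i0)') 'Hello from image ', this_image(), &
                           ' of ', num_images()
end program hello_images
```

With a single-image runtime the program simply prints one line; under a multi-image runtime, each image prints its own index, in no guaranteed order.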

Coarray syntax
- The CODIMENSION attribute:
      double precision, dimension(2,2), codimension[*] :: x
  or simply the [ ] syntax:
      double precision :: x(2,2)[*]
- A coarray can have a corank higher than 1:
      double precision :: A(100,100)[5,*]
- From ANY single image, one can refer to the array x on image Q using [ ]:
      X(:,:)[Q]
      e.g.  Y(:,:) = X(:,:)[Q]
            X(2,2)[Q] = Z
- Coindexed objects are normally remote data
- Without [ ], the data reference is local to the image:
      X(1,1) = X(2,2)[Q]   ! LHS is local data; RHS is a coindexed
                           ! object, likely remote data

Coarray memory model
[Figure: logical view of the coarray X(2,2)[*]. Each image (1, 2, ..., p, ..., q, ..., n) holds its own local 2 x 2 array x.]
- A fixed number of images during execution
- Each image has a local array of shape (2 x 2)
- Examples of data access, local and remote:
      X(1,1) = X(2,2)[q]   ! assignment occurs on all images
      if (this_image() == 1) X(2,2)[q] = SUM(X(2,:)[p])
                           ! computation of SUM occurs on image 1

Example: circular shift by 1
- A "global view" operation on a coarray: the Fortran intrinsic CSHIFT only works on local arrays
- this_image(): index of the executing image
- num_images(): the total number of images
[Figure: Real :: X(N)[*]. Each image (1, ..., Left, Me, Right, ..., num_images()) holds its own x(1:n), with interior section x(2:n-1).]

Image indexing:
      me = this_image()
      if (me == 1) then
        left = num_images()
      else
        left = me - 1
      end if

Execute the shift:
      sync all
      temp = x(n-1)
      x(2:n-1) = x(1:n-2)
      x(1) = x(n)[left]
      sync all
      x(n) = temp

Synchronization primitives
- Multi-image synchronization
  - SYNC ALL: synchronization across all images
  - SYNC IMAGES: synchronization on a list of images
- Memory barrier
  - SYNC MEMORY
- Image serialization
  - CRITICAL ("the big hammer"): allows one image at a time to execute the block
  - LOCK: provides fine-grained disjoint data access; simple lock support
- Some statements may imply synchronization
  - SYNC ALL is implied when the application starts
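As an illustration of the CRITICAL construct, here is a hypothetical sketch (not from the slides; assumes a coarray-enabled Fortran 2008 compiler) in which each image adds its index to a counter held on image 1, one image at a time:

```fortran
program critical_demo
  implicit none
  integer :: total[*]   ! coarray scalar: one copy per image
  total = 0
  sync all              ! ensure total on image 1 is initialized first
  ! CRITICAL serializes the block: only one image executes it
  ! at a time, so no update of total[1] is lost.
  critical
    total[1] = total[1] + this_image()
  end critical
  sync all
  if (this_image() == 1) print *, 'sum of image indices =', total
end program critical_demo
```

The same update could instead use a LOCK variable; CRITICAL is the coarser "big hammer" because it serializes the whole block regardless of which data the images touch.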

Example: SYNC IMAGES
[Figure: image 1 (the master image) distributes data, performs its task, collects data and performs IO; each other image (p, q, ..., n) waits for its data, performs its task, then does other work.]

      if (this_image() == 1) then
        call distributeData()
        sync images (*)
        call performTask()
        sync images (*)
        call collectData()
        call performIO()
      else
        sync images (1)
        call performTask()
        sync images (1)
        call otherWork()
      end if

- Good: image q starts performTask as soon as its own data are set; it does not wait for image p
- Works well on a balanced system; bad if the load is not balanced
- Efficient when collaboration is among a small set of images

Atomic load and store
- Two atomic operations are provided for spin-lock loops: ATOMIC_DEFINE and ATOMIC_REF

      LOGICAL(ATOMIC_LOGICAL_KIND), SAVE :: LOCKED[*] = .TRUE.
      LOGICAL :: VAL
      INTEGER :: IAM, P, Q

      IAM = THIS_IMAGE()
      IF (IAM == P) THEN
        ! preceding work
        SYNC MEMORY
        CALL ATOMIC_DEFINE (LOCKED[Q], .FALSE.)
        SYNC MEMORY
      ELSE IF (IAM == Q) THEN
        VAL = .TRUE.
        DO WHILE (VAL)
          CALL ATOMIC_REF (VAL, LOCKED)
        END DO
        SYNC MEMORY
        ! subsequent work
      END IF

CAF implementations and performance studies
- Existing coarray implementations: Cray, Rice University, G95
- Coarray applications: most run on large distributed systems, e.g. ocean modeling
- Performance evaluation: a number of studies have compared CAF against Fortran 90 + MPI
- IBM is implementing coarrays, with CAF and UPC built on a common run-time

Standardization status
- Coarrays are in the base language of Fortran 2008
  - The standard could be finalized this May and published in 2010
  - Fortran would be the first general-purpose language to support parallel programming
- The coarray TR (future coarray features)
  - TEAM and collective subroutines
  - More synchronization primitives: notify / query (point-to-point)
  - Parallel IO: multiple images on the same file

Comparison between CAF and UPC
      CAF: REAL :: X(2)[*]
      UPC: shared [2] float x[2*THREADS]
[Figure: in CAF, each image 1 .. num_images() holds its own X(1) and X(2); in UPC, the single shared array x[0 .. 2*THREADS-1] is distributed in blocks of 2, so thread 0 owns x[0] and x[1], thread 1 owns x[2] and x[3], and thread THREADS-1 owns x[2*THREADS-2] and x[2*THREADS-1].]

Coarrays and MPI
- Early experience demonstrated that coarrays and MPI can coexist in the same application
- Migration from MPI to coarrays has shown some success
  - Major obstacle: CAF is not widely available
- The Fortran J3 committee is willing to work with the MPI Forum
- Two issues the Fortran committee is currently working on:
  - C interoperability with void *:
        void * buf;                      /* C */
        TYPE(*), DIMENSION(..) :: buf    ! Fortran
  - MPI nonblocking calls: MPI_ISEND, MPI_IRECV and MPI_WAIT

Example: comparing CAF to MPI (Ashby and Reid, 2008)

MPI:
      if (master) then
        r(1) = reynolds
        ...
        r(18) = viscosity
        call mpi_bcast(r, 18, real_mp_type, masterid, MPI_comm_world, ierr)
      else
        call mpi_bcast(r, 18, real_mp_type, masterid, MPI_comm_world, ierr)
        reynolds = r(1)
        ...
        viscosity = r(18)
      endif

CAF:
      sync all
      if (master) then
        do i = 1, num_images()-1
          reynolds[i] = reynolds
          ...
          viscosity[i] = viscosity
        end do
      end if
      sync all

Or simply:
      sync all
      reynolds = reynolds[masterid]
      ...
      viscosity = viscosity[masterid]

Q & A