A Coarray Fortran Implementation to Support Data-Intensive Application Development Deepak Eachempati 1, Alan Richardson 2, Terrence Liao 3, Henri Calandra.


1 A Coarray Fortran Implementation to Support Data-Intensive Application Development
Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)
(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P
Data-Intensive Scalable Computing Systems 2012 (DISCS’12) Workshop, November 16, 2012
DISCS'12 Workshop

2 Oil and Gas Industry: Compute Needs
Industry is looking for faster and more cost-effective ways to process massive amounts of data
  – more powerful hardware
  – more productive programming models
  – innovative software techniques

3 Outline
Fortran 2008 parallel processing additions (CAF)
CAF Implementation in OpenUH Fortran compiler
Application port to CAF and Results
Further extensions for Parallel I/O
Closing Remarks

5 Coarray Model in Fortran 2008
Derives from Co-Array Fortran (CAF)
SPMD execution model, PGAS memory model
  – execution entities called images
  – coarrays: globally accessible, symmetric data objects
Additional intrinsic subroutines/functions for querying process and data information
Additional statements in the language for synchronization
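
The model above can be illustrated with a minimal sketch. The module and function names (`coarray_demo`, `value_from_next_image`) and the variable `token` are hypothetical, not from the talk. It needs a coarray-enabled compiler (for example gfortran with `-fcoarray=single` or `-fcoarray=lib`) and uses only standard Fortran 2008 features: `this_image()`, `num_images()`, a coarray declaration, a square-bracket remote reference, and `sync all`.

```fortran
module coarray_demo
  implicit none
contains
  ! Each image publishes 10*this_image() in a coarray, then reads the value
  ! held by the next image (the last image wraps around to image 1).
  function value_from_next_image() result(v)
    integer :: v
    integer, save :: token[*]   ! coarray: one instance of token on every image
    integer :: next

    token = 10 * this_image()   ! no brackets: the local copy
    sync all                    ! every image has written before anyone reads
    next = merge(1, this_image() + 1, this_image() == num_images())
    v = token[next]             ! square brackets: one-sided read from image "next"
    sync all                    ! keep token unchanged until all reads are done
  end function value_from_next_image
end module coarray_demo
```

When run with a single image, the function reads back the image's own value, 10.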

6 Working with Distributed Data using Coarrays
[Figure: grid of images with cosubscript rows 1, 2, 3, 4, …, M and columns 1, 2, 3, 4, *]
    real :: B[M, *]
Viewed from the image with cosubscripts [3,4]:
B references local B
B[3,4] references local B
B[3,3] references B in left neighbor

7 Working with Distributed Data using Coarrays
[Figure: grid of images with cosubscript rows 1, 2, 3, 4, …, M and columns 1, 2, 3, 4, *]
    real :: B(10,10)[M, *]
Viewed from the image with cosubscripts [3,4]:
B(2:4,2:4) references local subarray of B
B(2:4,2:4)[3,4] references local subarray of B
B(2:4,2:4)[3,3] references subarray of B in left neighbor
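
A short sketch of how cosubscripts such as [3,4] relate to image numbers for a `[M, *]` codimension. In Fortran 2008 the image index follows array element order over the cosubscripts, which is what the intrinsics `image_index` and `this_image(coarray)` report. The helper names below are hypothetical, and the helpers are plain Fortran, so they can be checked without a coarray runtime.

```fortran
module cosubscript_map
  implicit none
contains
  ! Image index of cosubscripts [s1, s2] for a coarray declared with [m, *]
  ! (lower cobounds of 1, first cosubscript varies fastest).
  pure integer function image_from_cosub(s1, s2, m) result(img)
    integer, intent(in) :: s1, s2, m
    img = s1 + (s2 - 1) * m
  end function image_from_cosub

  ! Inverse mapping: cosubscripts of image img for a [m, *] codimension.
  pure function cosub_from_image(img, m) result(s)
    integer, intent(in) :: img, m
    integer :: s(2)
    s(1) = mod(img - 1, m) + 1
    s(2) = (img - 1) / m + 1
  end function cosub_from_image
end module cosubscript_map
```

Assuming M = 4 for illustration, the image at [3,4] is image 15 and its left neighbor [3,3] is image 11.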

8 2D Halo Exchange Example with CAF
    real :: a(0:R+1, 0:C+1)[pR,*]
    …
    a(R+1,1:C)[top(1),top(2)]     = a(1,1:C)
    a(0,1:C)[bottom(1),bottom(2)] = a(R,1:C)
    a(1:R,0)[right(1),right(2)]   = a(1:R,C)
    a(1:R,C+1)[left(1),left(2)]   = a(1:R,1)
    sync all
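
The slide does not show how `top`, `bottom`, `left`, and `right` are obtained. One possible sketch follows, assuming a pR x pC image grid with periodic wrap-around, with "top" meaning a smaller first cosubscript and "left" a smaller second cosubscript. These conventions and the names `halo_neighbors`, `wrap`, and `neighbors` are assumptions for illustration, not taken from the talk.

```fortran
module halo_neighbors
  implicit none
contains
  ! Wrap index i into the range 1..n (periodic boundary, an assumption).
  pure integer function wrap(i, n) result(w)
    integer, intent(in) :: i, n
    w = modulo(i - 1, n) + 1
  end function wrap

  ! Cosubscripts of the four neighbors of the image at cosubscripts me
  ! in a pR x pC image grid.
  pure subroutine neighbors(me, pR, pC, top, bottom, left, right)
    integer, intent(in)  :: me(2), pR, pC
    integer, intent(out) :: top(2), bottom(2), left(2), right(2)
    top    = [wrap(me(1) - 1, pR), me(2)]
    bottom = [wrap(me(1) + 1, pR), me(2)]
    left   = [me(1), wrap(me(2) - 1, pC)]
    right  = [me(1), wrap(me(2) + 1, pC)]
  end subroutine neighbors
end module halo_neighbors
```

In a coarray program, `me` could come from `this_image(a)` and `pC` from `num_images() / pR`.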

9 2D Halo Exchange with MPI
    real :: a(0:R+1, 0:C+1)
    …
    call mpi_isend( a(1,1:C), C, mpi_real, &
                    top(myp), TAG,...)
    call mpi_irecv( a(R+1,1:C), C, mpi_real, &
                    bottom(myp), TAG,...)
    call mpi_isend( a(R,1:C), C, mpi_real, &
                    bottom(myp), TAG,...)
    call mpi_irecv( a(0,1:C), C, mpi_real, &
                    top(myp), TAG,...)
    call mpi_isend( a(1:R,C), R, mpi_real, &
                    right(myp), TAG,...)
    call mpi_irecv( a(1:R,0), R, mpi_real, &
                    left(myp), TAG,...)
    call mpi_isend( a(1:R,1), R, mpi_real, &
                    left(myp), TAG,...)
    call mpi_irecv( a(1:R,C+1), R, mpi_real, &
                    right(myp), TAG,...)
    call mpi_waitall( 8,...)

11 Implementation of CAF
OpenUH compiler
  – an industry-quality, optimizing compiler based on Open64
  – features: dependence and data-flow analysis, interprocedural analysis, OpenMP
  – backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)
[Diagram labels (OpenUH Compiler): CAF Source Code; Fortran Front-End with coarray support; Coarray Translation Phase; Loop Optimizer; Global Optimizer; Code Gen; exec.; OpenUH CAF Runtime Library]

12 Runtime Support for CAF
Runtime Interface (libcaf)
1-sided Communication
PGAS Memory Allocation
Synchronization
Collectives Support (e.g. reductions)
Atomics
Portable Communication Substrate: GASNet or ARMCI

13 Comparison with other Implementations
Compiler        Commercial/Free                        Fortran 2008 Coarray Support?
OpenUH          Free                                   Yes
G95             Partially Free, No longer supported    Missing Locks Support
Gfortran        Free                                   In progress
Rice CAF 2.0    Free                                   Partially, but adds different features
Cray Fortran    Commercial                             Yes
Intel Fortran   Commercial                             Yes

15 Seismic Subsurface Imaging: Reverse Time Migration
A source wave is emitted per shot
Reflected waves are captured by an array of sensors
RTM (in the time domain) uses the finite difference method to numerically solve the wave equation and reconstruct the subsurface image (in parallel, with domain decomposition)

16 RTM Implementations
Isotropic
  – simplest model
  – assumes reflected waves propagate at same speed in every direction from a point
  – only swaps faces (8 swaps in halo exchange)
Tilted Transverse Isotropy (TTI)
  – assumes waves may propagate at different speeds
  – swaps faces and edges (18 swaps in halo exchange)

17 Typical Data Usage
Generally several thousand shots
  – data parallel problem, where each shot can be processed independently in parallel
  – each shot handles several GB of data
  – so, total data to analyze is in the terabytes range
Handling I/O
  – C I/O reads in velocity and coefficient models
  – Shot headers read by master and distributed
  – Each processor writes to a distinct file, and the files are merged in a post-processing step
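
As a small illustration of the "distinct file per processor" pattern, the sketch below builds a per-image file name. The naming scheme (prefix, six-digit index, `.bin` suffix) and the names `image_files` and `image_filename` are hypothetical; the talk does not describe the actual file layout.

```fortran
module image_files
  implicit none
contains
  ! Builds a file name such as "shot_000003.bin" for image number img.
  function image_filename(prefix, img) result(name)
    character(len=*), intent(in)  :: prefix
    integer,          intent(in)  :: img
    character(len=:), allocatable :: name
    character(len=6) :: tag

    write(tag, '(i6.6)') img            ! zero-padded image index
    name = prefix // '_' // tag // '.bin'
  end function image_filename
end module image_files
```

In a coarray program each image could call `image_filename('shot', this_image())` and open the result on its own unit, leaving the merge to a later step.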

18 Results for CAF RTM port
Total Domain Size: 1024 x 768 x 512 (3.0 GB, per shot)
Forward Shot
Isotropic case: up to 32% faster compared to corresponding MPI implementation
TTI case: competitive performance with MPI

19 Results for CAF RTM port
Total Domain Size: 1024 x 768 x 512 (3.0 GB, per shot)
Backward Shot
Isotropic case: performance hit at 256 procs
TTI case: lagging a bit behind MPI

21 Extending Fortran for Parallel I/O
We are currently designing a prototype implementation for a parallel I/O language extension
Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
  – original Co-Array Fortran specified a simple extension to Fortran I/O
  – parallel I/O may be added in a future version of the standard

22 Fortran I/O
Fortran provides interfaces for formatted and unformatted I/O
    open( 10, file='fn', action='write', &
          access='direct', recl=k )
    …
    write (10, rec=3) A
[Figure: file 'fn' connected to unit 10, shown as record 1, record 2, record 3, record 4, …; the write stores A in record 3]
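
A runnable sketch of the direct-access pattern on this slide, using only standard Fortran. The file name `direct_io_demo.dat` and the names `direct_io_demo` and `roundtrip_record` are hypothetical. The record length is taken from `inquire(iolength=...)` rather than a hand-computed `k`, since the unit of `recl` is processor dependent.

```fortran
module direct_io_demo
  implicit none
contains
  ! Writes vals as record number irec of a direct-access file,
  ! then reads the same record back and returns it.
  function roundtrip_record(irec, vals) result(back)
    integer, intent(in) :: irec
    real,    intent(in) :: vals(:)
    real    :: back(size(vals))
    integer :: u, k

    inquire(iolength=k) vals    ! record length needed to hold vals
    open(newunit=u, file='direct_io_demo.dat', access='direct', &
         form='unformatted', recl=k, action='readwrite', status='replace')
    write(u, rec=irec) vals     ! records can be written in any order
    read(u, rec=irec) back
    close(u, status='delete')   ! remove the temporary file
  end function roundtrip_record
end module direct_io_demo
```

For example, `roundtrip_record(3, [1.0, 2.0, 3.0])` writes record 3 and returns the same three values.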

23 Current limitations of I/O
Issues:
1. no defined, legal way for multiple images to access the same file
2. a file is a 1-dimensional sequence of records
3. records are read/written one at a time
4. no mechanism for collective accesses to a shared file amongst multiple images

24 Proposed Extension for Parallel I/O
Allow a file to be “share-opened”, e.g. OPEN( 10, file='fn', TEAM='yes', …)
  – all images form a team with shared access to the same file
  – implicit synchronization recommended only for direct access mode
FLUSH statement used to ensure changes by one image are visible to other images in team
CLOSE statement has implicit image synchronization

25 Further extensions we’re exploring
Multi-dimensional view of records
Read/write multiple records at a time
Collective read/write operations on shared files
    open( 10, file='fn', action='write', &
          access='direct', ndim=2, &
          dims=(/M/), team='yes', recl=k )
    …
[Figure: file 'fn' connected to unit 10, viewed as a 2D grid of records: 1,1 1,2 1,3 …; 2,1 2,2 2,3 …; 3,1 3,2 3,3 …; 4,1 4,2 4,3 …; 5,1 5,2 5,3 …; M,1 M,2 M,3 …]

26 Further extensions we’re exploring
Multi-dimensional view of records
Read/write multiple records at a time
Collective read/write operations on shared files
    write (10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) &
          A(1:4, 1:2)
[Figure: file 'fn' connected to unit 10, viewed as a 2D grid of records (1,1 through M,3 …); the write stores A(1:4,1:2) into the record range given by rec_lb and rec_ub]
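
The proposed `rec_lb`/`rec_ub` write is not standard Fortran. A rough emulation with today's direct-access I/O can be sketched as below, under two assumptions that the slides do not state: the 2D record grid is linearized in column-major order with `m` rows, and each record holds exactly one array element. The names `record_grid`, `linear_record`, and `write_block` are hypothetical.

```fortran
module record_grid
  implicit none
contains
  ! Linear record number of 2D record index (i, j), assuming column-major
  ! ordering over a record grid with m rows.
  pure integer function linear_record(i, j, m) result(r)
    integer, intent(in) :: i, j, m
    r = i + (j - 1) * m
  end function linear_record

  ! Writes one element of a per record for all records (i, j) with
  ! lb(1) <= i <= ub(1) and lb(2) <= j <= ub(2). Unit u must be open for
  ! direct access; a must have at least ub - lb + 1 elements per dimension.
  subroutine write_block(u, lb, ub, m, a)
    integer, intent(in) :: u, lb(2), ub(2), m
    real,    intent(in) :: a(:, :)
    integer :: i, j

    do j = lb(2), ub(2)
      do i = lb(1), ub(1)
        write(u, rec=linear_record(i, j, m)) a(i - lb(1) + 1, j - lb(2) + 1)
      end do
    end do
  end subroutine write_block
end module record_grid
```

The loop issues one write per record, which is the "one record at a time" limitation the multi-record extension is meant to remove.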

27 Further extensions we’re exploring
Multi-dimensional view of records
Read/write multiple records at a time
Collective read/write operations on shared files
    type(T) :: A(2,2)[3,*]
    …
    my_rec_lbs = get_rec_lbs( this_image() )
    my_rec_ubs = get_rec_ubs( this_image() )
    write_team( 10, rec_lb=my_rec_lbs, &
                rec_ub=my_rec_ubs) &
                A(:,:)
[Figure: file 'fn' connected to unit 10, viewed as a 6 x 4 grid of records (1,1 through 6,4); write_team stores the 2 x 2 blocks A(1:2,1:2)[1,1], A(1:2,1:2)[2,1], A(1:2,1:2)[3,1], A(1:2,1:2)[1,2], A(1:2,1:2)[2,2], A(1:2,1:2)[3,2]]
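
The slide leaves `get_rec_lbs` and `get_rec_ubs` undefined. One plausible sketch of the block mapping in the figure follows, assuming each image owns a contiguous block of records whose position follows its cosubscripts. The functions `rec_lbs` and `rec_ubs` are hypothetical stand-ins that take cosubscripts (as returned by `this_image(A)`) and the local block shape, rather than the image index used on the slide.

```fortran
module team_record_bounds
  implicit none
contains
  ! Lower record bounds of the block owned by the image at cosubscripts
  ! cosub, when every image holds a block of blk(1) x blk(2) records.
  pure function rec_lbs(cosub, blk) result(lb)
    integer, intent(in) :: cosub(2), blk(2)
    integer :: lb(2)
    lb = (cosub - 1) * blk + 1
  end function rec_lbs

  ! Upper record bounds of the same block.
  pure function rec_ubs(cosub, blk) result(ub)
    integer, intent(in) :: cosub(2), blk(2)
    integer :: ub(2)
    ub = cosub * blk
  end function rec_ubs
end module team_record_bounds
```

With `A(2,2)[3,*]` on six images, the image at [3,2] would own records 5,3 through 6,4, the last block of the 6 x 4 grid.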

28 Leverage Global Arrays as memory buffers for I/O
Implementation in progress which utilizes global arrays (GA) as I/O buffers in memory
[Diagram labels: compute nodes; I/O requests; asynchronous disk updates; I/O nodes]

30 In Summary
Fortran coarray model may be used for processing large data sets
Developed implementation that’s freely available and used it to develop RTM application
Fortran’s I/O model doesn’t support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this

31 Thanks

