A Coarray Fortran Implementation to Support Data-Intensive Application Development
Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)
Data-Intensive Scalable Computing Systems 2012 (DISCS’12) Workshop, November 16, 2012
(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P
DISCS'12 Workshop
Oil and Gas Industry: Compute Needs
Industry is looking for faster and more cost-effective ways to process massive amounts of data:
– more powerful hardware
– more productive programming models
– innovative software techniques
Outline
Fortran 2008 parallel processing additions (CAF)
CAF Implementation in OpenUH Fortran compiler
Application port to CAF and Results
Further extensions for Parallel I/O
Closing Remarks
Outline: Fortran 2008 parallel processing additions (CAF)
Coarray Model in Fortran 2008
Derives from Co-Array Fortran (CAF)
SPMD execution model, PGAS memory model
– execution entities are called images
– coarrays: globally accessible, symmetric data objects
Additional intrinsic subroutines/functions for querying process and data information
Additional statements in the language for synchronization
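To make these pieces concrete, here is a minimal sketch of the SPMD/PGAS model (module and procedure names are hypothetical): every image runs the same code, each image owns its own copy of the coarray val, the intrinsics this_image() and num_images() query the image layout, and sync all is one of the new synchronization statements. It needs a coarray-capable compiler; with gfortran, for example, -fcoarray=single builds it for a single image.

```fortran
! Sketch of the F2008 coarray model (module/procedure names are hypothetical).
! Every image executes this code (SPMD); val is a coarray, so each image owns
! one copy, and val[k] names the copy that lives on image k (PGAS).
module image_sum_mod
  implicit none
  private
  public :: sum_of_image_indices

  integer :: val[*]   ! module variables are implicitly SAVEd, as coarrays require
contains
  subroutine sum_of_image_indices(total)
    integer, intent(out) :: total
    integer :: img

    val = this_image()              ! no cosubscript: my own copy
    sync all                        ! every image has set val before anyone reads it

    total = 0
    do img = 1, num_images()
      total = total + val[img]      ! one-sided read from image img
    end do

    sync all                        ! a later call must not overwrite val while others still read
  end subroutine sum_of_image_indices
end module image_sum_mod
```

Every image ends up with the same total, n(n+1)/2 for n images.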
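Working with Distributed Data using Coarrays
[Figure: grid of images indexed by cosubscripts; the first cosubscript runs 1, 2, 3, 4, …, M down the rows and the second runs 1, 2, 3, 4, …, * across the columns]

    real :: B[M,*]

B references the local B
B[3,4] also references the local B (here the executing image is [3,4])
B[3,3] references B on the left neighbor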
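The cosubscripts in brackets map to an image index in column-major order, just like array subscripts map to element order. The sketch below spells that mapping out for codimensions [M,*] with the default lower cobound of 1. In real code the standard intrinsics image_index(B, [c1,c2]) and this_image(B) perform it; the helper names here are hypothetical and exist only to show the arithmetic.

```fortran
! Cosubscript <-> image-index mapping for codimensions [M,*], lower cobounds 1.
! image_index and this_image(coarray) do this in real code; these helpers
! (hypothetical names) only make the column-major arithmetic explicit.
module cosub_map_mod
  implicit none
contains
  ! Image index of cosubscripts [c1, c2].
  pure integer function image_of(c1, c2, m)
    integer, intent(in) :: c1, c2, m
    image_of = c1 + (c2 - 1) * m
  end function image_of

  ! Inverse: cosubscripts [c1, c2] of image index img.
  pure function cosubs_of(img, m) result(c)
    integer, intent(in) :: img, m
    integer :: c(2)
    c(1) = mod(img - 1, m) + 1
    c(2) = (img - 1) / m + 1
  end function cosubs_of
end module cosub_map_mod
```

With M = 4, for example, B[3,4] lives on image 3 + 3·4 = 15 and its left neighbor B[3,3] on image 11: the left neighbor is always M images earlier.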
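Working with Distributed Data using Coarrays
[Figure: grid of images indexed by cosubscripts; the first cosubscript runs 1, 2, 3, 4, …, M down the rows and the second runs 1, 2, 3, 4, …, * across the columns]

    real :: B(10,10)[M,*]

B(2:4,2:4) references the local subarray of B
B(2:4,2:4)[3,4] also references the local subarray (here the executing image is [3,4])
B(2:4,2:4)[3,3] references the subarray of B on the left neighbor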
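A sketch of the subarray case, assuming an allocatable coarray so that M can be chosen at run time: each image fills its B with its own image number, then reads B(2:4,2:4) from its left neighbor [c1, c2-1] in a single assignment. Images in the first column have no left neighbor and take their local section instead. The module and procedure names, the fill values, and that fallback rule are illustrative choices, not from the slides.

```fortran
! Sketch: one-sided read of a 3x3 section from the left neighbor's B.
! Names, fill values, and the first-column fallback are illustrative.
module left_fetch_mod
  implicit none
  real, allocatable :: b(:,:)[:,:]
contains
  subroutine fetch_left_block(m, buf, from_neighbor)
    integer, intent(in)  :: m               ! first codimension extent (M)
    real,    intent(out) :: buf(3,3)
    logical, intent(out) :: from_neighbor
    integer :: me(2)

    ! Coarray allocation is collective: all images execute it with the same
    ! bounds and cobounds, and it implies synchronization.
    if (.not. allocated(b)) allocate (b(10,10)[m,*])
    b = real(this_image())                  ! tag each image's data
    sync all                                ! every b is defined before remote reads

    me = this_image(b)                      ! my cosubscripts [c1, c2]
    from_neighbor = me(2) > 1               ! first column has no left neighbor
    if (from_neighbor) then
      buf = b(2:4,2:4)[me(1), me(2) - 1]    ! remote section on the left neighbor
    else
      buf = b(2:4,2:4)                      ! no cosubscript: local section
    end if
    sync all                                ! nobody overwrites b while others read
  end subroutine fetch_left_block
end module left_fetch_mod
```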
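2D Halo Exchange Example with CAF

    real :: a(0:R+1, 0:C+1)[pR,*]
    …
    a(R+1,1:C)[top(1),top(2)]     = a(1,1:C)   ! first row -> top's lower ghost row
    a(0,1:C)[bottom(1),bottom(2)] = a(R,1:C)   ! last row -> bottom's upper ghost row
    a(1:R,0)[right(1),right(2)]   = a(1:R,C)   ! last column -> right's left ghost column
    a(1:R,C+1)[left(1),left(2)]   = a(1:R,1)   ! first column -> left's right ghost column
    sync all                                   ! all puts complete before ghost cells are read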
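The slide elides how top, bottom, left and right are set. One plausible setup is sketched below under stated assumptions: a non-periodic pR x pC grid of images (with codimensions [pR,*], pC would be num_images()/pR), the first cosubscript indexing process rows from top to bottom, and [0,0] meaning "no neighbor". The routine name is hypothetical; me would come from this_image(a).

```fortran
! Sketch: neighbor cosubscripts on a non-periodic pr x pc image grid.
! Assumptions: first cosubscript = process row (top to bottom), [0,0] = none.
module halo_neighbors_mod
  implicit none
contains
  subroutine grid_neighbors(me, pr, pc, top, bottom, left, right)
    integer, intent(in)  :: me(2)       ! my cosubscripts, e.g. this_image(a)
    integer, intent(in)  :: pr, pc      ! process-grid extents
    integer, intent(out) :: top(2), bottom(2), left(2), right(2)

    top    = inside_or_none(me + [-1, 0])   ! row above
    bottom = inside_or_none(me + [ 1, 0])   ! row below
    left   = inside_or_none(me + [ 0,-1])   ! column to the left
    right  = inside_or_none(me + [ 0, 1])   ! column to the right
  contains
    ! Return c if it lies inside the grid, otherwise [0,0].
    pure function inside_or_none(c) result(r)
      integer, intent(in) :: c(2)
      integer :: r(2)
      if (c(1) >= 1 .and. c(1) <= pr .and. c(2) >= 1 .and. c(2) <= pc) then
        r = c
      else
        r = 0
      end if
    end function inside_or_none
  end subroutine grid_neighbors
end module halo_neighbors_mod
```

Images on the edge of the grid would then guard each put, e.g. `if (top(1) > 0) a(R+1,1:C)[top(1),top(2)] = a(1,1:C)`, while every image still executes the sync all.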
2D Halo Exchange with MPI

    real :: a(0:R+1, 0:C+1)
    …
    ! row sections such as a(1,1:C) are non-contiguous in Fortran; with nonblocking
    ! calls, real code should pack them into buffers or use an MPI derived datatype
    call mpi_isend( a(1,1:C),   C, mpi_real, top(myp),    TAG, ...)
    call mpi_irecv( a(R+1,1:C), C, mpi_real, bottom(myp), TAG, ...)
    call mpi_isend( a(R,1:C),   C, mpi_real, bottom(myp), TAG, ...)
    call mpi_irecv( a(0,1:C),   C, mpi_real, top(myp),    TAG, ...)
    call mpi_isend( a(1:R,C),   R, mpi_real, right(myp),  TAG, ...)
    call mpi_irecv( a(1:R,0),   R, mpi_real, left(myp),   TAG, ...)
    call mpi_isend( a(1:R,1),   R, mpi_real, left(myp),   TAG, ...)
    call mpi_irecv( a(1:R,C+1), R, mpi_real, right(myp),  TAG, ...)
    call mpi_waitall( 8, ...)
Outline: CAF Implementation in OpenUH Fortran compiler
Implementation of CAF
OpenUH compiler
– an industry-quality, optimizing compiler based on Open64
– features: dependence and data-flow analysis, interprocedural analysis, OpenMP
– backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)
[Figure: OpenUH compiler pipeline: CAF source code → Fortran front-end with coarray support → coarray translation phase → loop optimizer → global optimizer → code generation → executable, which uses the OpenUH CAF runtime library]
Runtime Support for CAF
Runtime interface (libcaf):
– 1-sided communication
– PGAS memory allocation
– synchronization
– collectives support (e.g. reductions)
– atomics
Portable communication substrate: GASNet or ARMCI
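One of the synchronization features a CAF runtime has to support is the F2008 LOCK/UNLOCK pair. In the sketch below (hypothetical names), every image increments a counter stored on image 1 under a lock; lock variables must be coarrays of type lock_type from iso_fortran_env. As before, it needs a coarray-capable compiler (e.g. gfortran with -fcoarray=single for one image).

```fortran
! Sketch of F2008 locks: every image increments a counter that lives on image 1.
! Module and procedure names are illustrative.
module lock_counter_mod
  use, intrinsic :: iso_fortran_env, only: lock_type
  implicit none
  type(lock_type) :: counter_lock[*]   ! lock variables must be coarrays
  integer :: counter[*]
contains
  subroutine count_images(total)
    integer, intent(out) :: total

    if (this_image() == 1) counter = 0
    sync all                        ! counter[1] is initialized before any update

    lock (counter_lock[1])          ! mutual exclusion across images
    counter[1] = counter[1] + 1     ! remote read-modify-write, now race-free
    unlock (counter_lock[1])

    sync all                        ! all increments are complete
    total = counter[1]
    sync all                        ! a later call must not reset counter during reads
  end subroutine count_images
end module lock_counter_mod
```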
Comparison with Other Implementations

Compiler | Commercial/Free | Fortran 2008 coarray support?
OpenUH | Free | Yes
G95 | Partially free, no longer supported | Missing locks support
Gfortran | Free | In progress
Rice CAF 2.0 | Free | Partially, but adds different features
Cray Fortran | Commercial | Yes
Intel Fortran | Commercial | Yes
Outline: Application port to CAF and Results
Seismic Subsurface Imaging: Reverse Time Migration
A source wave is emitted per shot
Reflected waves are captured by an array of sensors
RTM (in the time domain) uses a finite difference method to numerically solve the wave equation and reconstruct the subsurface image (in parallel, with domain decomposition)
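The core of a time-domain solver is an explicit stencil update. The sketch below is a deliberately simplified stand-in: the 2D constant-density acoustic wave equation p_tt = v²(p_xx + p_zz), second-order central differences in time and space, and uniform grid spacing h. All of these are assumptions; the production kernels here are 3D, and their stencils are not shown in the slides. Each update reads one neighbor on every side, which is why, after domain decomposition, the halo has to be refreshed before every step.

```fortran
! Sketch: one explicit step of the 2D acoustic wave equation (assumed model):
!   p_next = 2 p - p_prev + (v dt / h)^2 * (p(i+1,j) + p(i-1,j) + p(i,j+1) + p(i,j-1) - 4 p(i,j))
module wave_step_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine wave_step(p_prev, p, p_next, v, dt, h)
    real(dp), intent(in)    :: p_prev(:,:), p(:,:), v(:,:), dt, h
    real(dp), intent(inout) :: p_next(:,:)   ! only interior points are written
    integer :: i, j, nx, nz

    nx = size(p, 1)
    nz = size(p, 2)
    do j = 2, nz - 1            ! boundary/halo points are left to the caller
      do i = 2, nx - 1
        p_next(i,j) = 2*p(i,j) - p_prev(i,j) + (v(i,j)*dt/h)**2 * &
          (p(i+1,j) + p(i-1,j) + p(i,j+1) + p(i,j-1) - 4*p(i,j))
      end do
    end do
  end subroutine wave_step
end module wave_step_mod
```

This explicit scheme is only stable for a small enough time step (a CFL condition); for this 2D stencil with equal spacing, v·dt/h ≤ 1/√2.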
RTM Implementations
Isotropic
– simplest model
– assumes reflected waves propagate at the same speed in every direction from a point
– only swaps faces (8 swaps in halo exchange)
Tilted Transverse Isotropy (TTI)
– assumes waves may propagate at different speeds in different directions
– swaps faces and edges (18 swaps in halo exchange)
Typical Data Usage
Generally several thousand shots
– a data-parallel problem: each shot can be processed independently, in parallel
– each shot handles several GB of data
– so the total data to analyze is in the terabyte range
Handling I/O
– C I/O reads in the velocity and coefficient models
– shot headers are read by the master and distributed
– each processor writes to a distinct file, and the files are merged in a post-processing step
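The per-processor output scheme needs nothing beyond standard Fortran I/O. A sketch with a hypothetical file-name pattern and routine names: the caller passes its image number (in a coarray program that would be this_image()), and a later post-processing step would concatenate the parts. Stream access is an arbitrary choice here.

```fortran
! Sketch: each image writes its part of the result to its own file.
! File-name pattern and routine names are hypothetical.
module per_image_output_mod
  implicit none
contains
  function part_file_name(me) result(fname)
    integer, intent(in) :: me                ! image number, e.g. this_image()
    character(len=:), allocatable :: fname
    character(len=32) :: buf
    write (buf, '(a,i0,a)') 'image_part_', me, '.bin'
    fname = trim(buf)
  end function part_file_name

  subroutine write_part(me, data)
    integer, intent(in) :: me
    real,    intent(in) :: data(:)
    integer :: u
    open (newunit=u, file=part_file_name(me), access='stream', &
          form='unformatted', action='write', status='replace')
    write (u) data                           ! raw values, merged later
    close (u)
  end subroutine write_part
end module per_image_output_mod
```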
Results for CAF RTM port
Total domain size: 1024 x 768 x 512 (3.0 GB per shot)
Forward shot
– isotropic case: up to 32% faster than the corresponding MPI implementation
– TTI case: performance competitive with MPI
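The quoted per-shot size can be checked from the grid dimensions. The 3.0 GB figure matches 8 bytes per grid point (for instance one double-precision field, or two single-precision fields); the slides do not say which, so that reading is an inference. A single single-precision field would be half as large.

```latex
\begin{aligned}
1024 \times 768 \times 512 &= 2^{10} \cdot (3 \cdot 2^{8}) \cdot 2^{9} = 3 \cdot 2^{27} = 402{,}653{,}184 \ \text{grid points} \\
402{,}653{,}184 \times 8 \ \text{bytes} &= 3 \cdot 2^{30} \ \text{bytes} = 3.0 \ \text{GiB} \\
402{,}653{,}184 \times 4 \ \text{bytes} &= 1.5 \ \text{GiB}
\end{aligned}
```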
Results for CAF RTM port
Total domain size: 1024 x 768 x 512 (3.0 GB per shot)
Backward shot
– isotropic case: performance drops at 256 processes
– TTI case: lags slightly behind MPI
Outline: Further extensions for Parallel I/O
Extending Fortran for Parallel I/O
We are currently designing a prototype implementation of a parallel I/O language extension
Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
– the original Co-Array Fortran specified a simple extension to Fortran I/O
– parallel I/O may be added in a future version of the standard
Fortran I/O
Fortran provides interfaces for formatted and unformatted I/O

    open( 10, file='fn', action='write', &
          access='direct', recl=k )
    …
    write (10, rec=3) A

[Figure: file 'fn', connected to unit 10, shown as a sequence of records (record 1, record 2, record 3, record 4, …); the write stores A in record 3]
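A runnable version of the direct-access pattern, split into hypothetical write_record/read_record helpers. The one change from the slide is how recl is obtained: its units are processor dependent (bytes with some compilers, 4-byte words by default with others), so the sketch asks the compiler with inquire (iolength=...) instead of hard-coding k.

```fortran
! Sketch: fixed-length records in a direct-access file, one record per call.
! Helper names are hypothetical.
module direct_io_mod
  implicit none
contains
  ! Record length, in processor-dependent units, for n default reals.
  integer function rec_len(n)
    integer, intent(in) :: n
    real :: probe(n)
    integer :: k
    probe = 0.0
    inquire (iolength=k) probe
    rec_len = k
  end function rec_len

  subroutine write_record(fname, irec, a)
    character(len=*), intent(in) :: fname
    integer,          intent(in) :: irec
    real,             intent(in) :: a(:)
    integer :: u
    open (newunit=u, file=fname, action='readwrite', access='direct', &
          form='unformatted', recl=rec_len(size(a)), status='unknown')
    write (u, rec=irec) a              ! other records are left untouched
    close (u)
  end subroutine write_record

  subroutine read_record(fname, irec, a)
    character(len=*), intent(in)  :: fname
    integer,          intent(in)  :: irec
    real,             intent(out) :: a(:)
    integer :: u
    open (newunit=u, file=fname, action='read', access='direct', &
          form='unformatted', recl=rec_len(size(a)), status='old')
    read (u, rec=irec) a
    close (u)
  end subroutine read_record
end module direct_io_mod
```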
Current limitations of I/O
Issues:
1. no defined, legal way for multiple images to access the same file
2. a file is a 1-dimensional sequence of records
3. records are read/written one at a time
4. no mechanism for collective access to a shared file by multiple images
Proposed Extension for Parallel I/O
Allow a file to be "share-opened", e.g.

    OPEN( 10, file='fn', TEAM='yes', …)

– all images form a team with shared access to the same file
– implicit synchronization
– recommended only for direct access mode
The FLUSH statement is used to ensure that changes made by one image are visible to the other images in the team
The CLOSE statement has implicit image synchronization
Further extensions we’re exploring
Multi-dimensional view of records
Read/write multiple records at a time
Collective read/write operations on shared files

    open( 10, file='fn', action='write', &
          access='direct', ndim=2, &
          dims=(/M/), team='yes', recl=k )
    …

[Figure: file 'fn', connected to unit 10, viewed as a 2D grid of records: (1,1), (1,2), (1,3), … in the first row down to (M,1), (M,2), (M,3), … in row M]
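The proposed ndim/dims specifiers are not standard Fortran, but the view they describe can be emulated on an ordinary direct-access file. Assuming the column-major numbering suggested by the figure, with dims=(/M/) record (i,j) becomes linear record i + (j-1)·M. A sketch with hypothetical names; this is an emulation, not the proposed extension itself.

```fortran
! Sketch: emulate a 2D record view (first dimension of extent m) on a standard
! direct-access file. Names are hypothetical.
module rec2d_mod
  implicit none
contains
  ! Linear record number of 2D record (i, j), column-major.
  pure integer function linear_rec(i, j, m)
    integer, intent(in) :: i, j, m
    linear_rec = i + (j - 1) * m
  end function linear_rec

  ! Write one value to 2D record (i, j) of an already-open direct-access unit.
  subroutine write_rec2d(u, i, j, m, x)
    integer, intent(in) :: u, i, j, m
    real,    intent(in) :: x
    write (u, rec=linear_rec(i, j, m)) x
  end subroutine write_rec2d
end module rec2d_mod
```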
Further extensions we’re exploring: read/write multiple records at a time

    write (10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) &
          A(1:4, 1:2)

[Figure: A(1:4,1:2) written into a rectangular block of records in the 2D record grid of file 'fn', connected to unit 10]
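Likewise, the proposed multi-record write can be emulated in standard Fortran by looping over the block and writing one linear record per element, which is exactly the one-record-at-a-time limitation the extension aims to remove. In this sketch (hypothetical names), the local array must have extent rec_ub - rec_lb + 1 in each dimension, so the bounds (2,2) and (4,3) describe a 3 x 2 block of records.

```fortran
! Sketch: emulate writing a block of 2D records rec_lb..rec_ub (inclusive)
! from a local array, one record per element. Names are hypothetical.
module block_write_mod
  implicit none
contains
  subroutine write_block(u, m, rec_lb, rec_ub, a)
    integer, intent(in) :: u, m               ! open direct-access unit; records per column
    integer, intent(in) :: rec_lb(2), rec_ub(2)
    real,    intent(in) :: a(:,:)             ! shape must be rec_ub - rec_lb + 1
    integer :: i, j

    if (any(shape(a) /= rec_ub - rec_lb + 1)) error stop "write_block: shape mismatch"
    do j = rec_lb(2), rec_ub(2)
      do i = rec_lb(1), rec_ub(1)
        ! 2D record (i, j) -> linear record i + (j-1)*m (column-major)
        write (u, rec=i + (j - 1) * m) a(i - rec_lb(1) + 1, j - rec_lb(2) + 1)
      end do
    end do
  end subroutine write_block
end module block_write_mod
```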
Further extensions we’re exploring: collective read/write operations on shared files

    type(T) :: A(2,2)[3,*]
    …
    my_rec_lbs = get_rec_lbs( this_image() )
    my_rec_ubs = get_rec_ubs( this_image() )
    write_team( 10, rec_lb=my_rec_lbs, &
                rec_ub=my_rec_ubs ) &
          A(:,:)

[Figure: file 'fn' (unit 10) as a 6 x 4 grid of records, tiled by the 2 x 2 blocks A(1:2,1:2) of the six images [1,1] through [3,2], all written by one write_team]
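The slide leaves get_rec_lbs and get_rec_ubs undefined. For A(2,2)[3,*], assuming each image's block sits at the position given by its cosubscripts (with default lower cobounds of 1), they could compute record bounds as in this sketch (hypothetical routine name): image [c1,c2] owns records (c1-1)·2+1 through c1·2 in the first dimension and (c2-1)·2+1 through c2·2 in the second.

```fortran
! Sketch: per-image record bounds for a collective write of A(blk1,blk2)[ncorow,*].
! Routine name is hypothetical; default lower cobounds of 1 are assumed.
module team_rec_bounds_mod
  implicit none
contains
  subroutine rec_bounds(img, ncorow, blk, lb, ub)
    integer, intent(in)  :: img          ! image index, e.g. this_image()
    integer, intent(in)  :: ncorow       ! first codimension extent (3 here)
    integer, intent(in)  :: blk(2)       ! local block shape (2 x 2 here)
    integer, intent(out) :: lb(2), ub(2)
    integer :: c(2)

    c(1) = mod(img - 1, ncorow) + 1      ! cosubscripts of img in [ncorow,*]
    c(2) = (img - 1) / ncorow + 1
    lb = (c - 1) * blk + 1               ! first record of my block, per dimension
    ub = c * blk                         ! last record of my block, per dimension
  end subroutine rec_bounds
end module team_rec_bounds_mod
```

For example, image 6 has cosubscripts [3,2] and owns records (5,3) through (6,4), the bottom-right corner of the 6 x 4 record grid.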
Leverage Global Arrays as memory buffers for I/O
Implementation in progress that uses Global Arrays (GA) as in-memory I/O buffers
[Figure: compute nodes send I/O requests to I/O nodes, which perform asynchronous disk updates]
Outline: Closing Remarks
In Summary
The Fortran coarray model may be used to process large data sets
We developed an implementation that is freely available and used it to develop an RTM application
Fortran's I/O model does not support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this
Thanks