Parallel NetCDF Library Development
Formerly “Sensor Cloud Integration”
Kelsey Weingartner
Overview
- Background information
- Purpose
- Project artifacts
- Final product
- Results
- Takeaways
NetCDF and MASS
- NetCDF
  - Machine-independent format for representing scientific data
  - Files store data arranged in variables
  - Each variable holds an array of data
- MASS
  - Library for running a simulation in parallel
  - Reduces the complexity of creating and running 2D and 3D spatial simulations
  - A simulation is a grid of “Places” that may or may not have “Agents” on them
Purpose
- Within MASS
  - Make NetCDF file use simple and feasible for MASS
  - Maintain the benefits of a distributed environment running in parallel
- Real-World Applications
  - Climate change analysis
Artifacts
- Summer 2012
  - Sequential write with NetCDF
  - Worst-case parallel performance
- Fall 2013
  - Best-case parallel performance
  - File creator
  - File creator with parallel write
  - File creator with parallel write & read
- Winter 2013
  - Single instance per processor file creator and parallel writer
  - Final product
Sequential
- Each save requires the file to be opened only once
- callAll() gathers agent information from each Place
- The master node then handles writing to the NetCDF file
Parallel - Worst-Case
- On each save, the file is opened by every Place object
- Master triggers the save with callAll()
- Each Place gathers its Agents’ information and writes
JavaMPI Parallel Best-Case
- Select a NetCDF file to copy
- Master node creates a new file with the same dimensions
- Send an equal portion of data from the chosen file to each node
- Each node writes its received array to the newly created NetCDF file
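The "equal portion of data to each node" step can be sketched as a block partition of a flattened array. This is a minimal illustration only, not the project's code: the class `BlockPartition`, the method `sliceFor`, the flattening of the 50x50 file into one array, and the handling of a remainder when the data does not divide evenly are all assumptions, and the MPI send itself is left out.

```java
import java.util.Arrays;

public class BlockPartition {
    /**
     * Returns {offset, length} of the slice owned by `rank` when `total`
     * items are split over `nodes` nodes. Any remainder is spread over the
     * lowest ranks, so slice lengths differ by at most one (an assumption;
     * the slides only say "an equal portion").
     */
    public static int[] sliceFor(int total, int nodes, int rank) {
        int base = total / nodes;
        int extra = total % nodes;
        int length = base + (rank < extra ? 1 : 0);
        int offset = rank * base + Math.min(rank, extra);
        return new int[] { offset, length };
    }

    public static void main(String[] args) {
        // Stand-in for the values read from the source file (hypothetical data).
        int[] data = new int[50 * 50];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }

        int nodes = 6;
        for (int rank = 0; rank < nodes; rank++) {
            int[] slice = sliceFor(data.length, nodes, rank);
            // In the real setup this portion would be sent to `rank`,
            // which would then write it to the new file.
            int[] portion = Arrays.copyOfRange(data, slice[0], slice[0] + slice[1]);
            System.out.println("rank " + rank + ": offset=" + slice[0]
                    + ", length=" + portion.length);
        }
    }
}
```

With 4 nodes the 2,500 values divide evenly; with 6 nodes the first four ranks receive one extra value each.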
Final Product
- Single instance per processor file creator and parallel reader/writer
- Extends MASS Place
- Creates a file for the simulation if none exists
- Stores file contents in a buffer to increase read/write speed
- Each processor holds the portion of the file relevant to it
- A file is only opened by the first writer Place in each partition
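The "single instance per processor" idea with a buffered partition might look roughly like the following. This is a hedged sketch, not the actual Parallel_NetCDF class: `PartitionFileBuffer`, its methods, and the `openCount` counter are hypothetical, the real class extends MASS Place, and the file open is replaced here by a counter so the example runs without the NetCDF library.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionFileBuffer {
    // One shared instance per JVM process (assumed to map to one per processor).
    private static PartitionFileBuffer instance;

    // Counts simulated file opens so the single-open behavior can be observed.
    public static final AtomicInteger openCount = new AtomicInteger();

    private final double[] buffer; // in-memory copy of this partition's data
    private final int offset;      // global index of the first buffered element

    private PartitionFileBuffer(int offset, int length) {
        this.offset = offset;
        this.buffer = new double[length];
        // Stand-in for opening the file and loading this partition's portion.
        openCount.incrementAndGet();
    }

    /** The first caller creates the instance; later callers reuse it. */
    public static synchronized PartitionFileBuffer get(int offset, int length) {
        if (instance == null) {
            instance = new PartitionFileBuffer(offset, length);
        }
        return instance;
    }

    /** Writes go to the buffer, not to the file. */
    public synchronized void write(int globalIndex, double value) {
        buffer[globalIndex - offset] = value;
    }

    public synchronized double read(int globalIndex) {
        return buffer[globalIndex - offset];
    }

    public static void main(String[] args) {
        // Every "Place" in the partition asks for the buffer; only the first opens the file.
        for (int place = 0; place < 2500; place++) {
            PartitionFileBuffer.get(0, 2500).write(place, place * 0.5);
        }
        System.out.println("simulated file opens: " + openCount.get());
        System.out.println("value at index 10: " + PartitionFileBuffer.get(0, 2500).read(10));
    }
}
```

The design choice illustrated is that Places share one buffer per process, so the cost of opening the file is paid once per partition rather than once per Place as in the worst-case version.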
Results
- Sequential write (1 processor)
  - 100x100, 1,000 agents, 1,000 cycles = 225,712.8 msec
- Worst-case parallel write (1 processor)
  - 50x50, 500 agents, 100 cycles = 957,590.5 msec
- MPInetCDF results on a 50x50 file
  - On 4 processors: 22,114.4 msec / 246,444 bytes = B/msec
  - On 6 processors: 16,470.2 msec / 246,444 bytes = B/msec
- RandomWalk using parallel NetCDF (1 processor)
  - 100x100, 1,000 agents, 1,000 cycles = 204,997.2 msec / 472,484 bytes = B/msec
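The B/msec figures are not filled in on the slide. As a sketch of how such a throughput value could be derived from the reported times and file sizes (bytes divided by milliseconds), the following uses a hypothetical helper `bytesPerMsec`; the printed values are computed here from the slide's numbers and are not results reported by the author.

```java
import java.util.Locale;

public class Throughput {
    /** Throughput in bytes per millisecond: file size divided by elapsed time. */
    public static double bytesPerMsec(long bytes, double msec) {
        return bytes / msec;
    }

    public static void main(String[] args) {
        // Inputs are the times and sizes listed on the Results slide.
        System.out.printf(Locale.ROOT, "MPInetCDF, 4 processors: %.2f B/msec%n",
                bytesPerMsec(246_444L, 22_114.4));
        System.out.printf(Locale.ROOT, "MPInetCDF, 6 processors: %.2f B/msec%n",
                bytesPerMsec(246_444L, 16_470.2));
        System.out.printf(Locale.ROOT, "RandomWalk, 1 processor: %.2f B/msec%n",
                bytesPerMsec(472_484L, 204_997.2));
    }
}
```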
Final Product Results
- RandomWalk
  - 100 x 100 grid, 1,000 agents, 100 cycles: 7,843.7 msec
- RandomWalk with NetCDF
  - 100 x 100 grid, 1,000 agents, 100 cycles, writing to file every 20 cycles: 204,997.2 msec
  - Previous settings, but writing to file only once: 69,480.7 msec
- Wave2DMASS
  - 100 x 100 grid, 1,000 cycles: 16,913.5 msec
- Wave2DMASS with NetCDF
  - 100 x 100 grid, 1,000 cycles, writing to file every 50 cycles: 50,422.6 msec
  - Previous settings, but writing to file only once: 22,923.9 msec
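One way to read these timings is as overhead relative to the run without NetCDF. The sketch below computes the added time and the slowdown factor for each configuration; the class `NetcdfOverhead` and its methods are hypothetical, and the derived numbers are arithmetic on the slide's figures rather than measurements reported by the author.

```java
import java.util.Locale;

public class NetcdfOverhead {
    /** Extra time spent when NetCDF output is enabled. */
    public static double addedMsec(double withNetcdfMsec, double baselineMsec) {
        return withNetcdfMsec - baselineMsec;
    }

    /** How many times slower the run is with NetCDF output enabled. */
    public static double slowdown(double withNetcdfMsec, double baselineMsec) {
        return withNetcdfMsec / baselineMsec;
    }

    private static void report(String label, double withNetcdfMsec, double baselineMsec) {
        System.out.printf(Locale.ROOT, "%s: +%.1f msec, %.2fx baseline%n",
                label,
                addedMsec(withNetcdfMsec, baselineMsec),
                slowdown(withNetcdfMsec, baselineMsec));
    }

    public static void main(String[] args) {
        double randomWalkBaseline = 7_843.7;  // RandomWalk without NetCDF
        double wave2dBaseline = 16_913.5;     // Wave2DMASS without NetCDF

        report("RandomWalk, write every 20 cycles", 204_997.2, randomWalkBaseline);
        report("RandomWalk, single write", 69_480.7, randomWalkBaseline);
        report("Wave2DMASS, write every 50 cycles", 50_422.6, wave2dBaseline);
        report("Wave2DMASS, single write", 22_923.9, wave2dBaseline);
    }
}
```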
Future Work
- On Parallel_NetCDF
  - D0 array support
  - Object datatype support
  - Allow a whole variable to be read/written
  - Smaller buffer
- After Parallel_NetCDF
  - Conference paper for IEEE PacRim Conference
Key Lessons
- Working with external libraries
- Working with limited documentation
- Creating and meeting deadlines
- Experience with parallel and distributed systems
Questions?
Intermediate Products
- File Creators
  - FileCreator
    - Create uniform 2D or 3D grids
    - Can create NetCDF files with an unlimited dimension
  - FileManipulator 1.0
    - Create uniform 2D or 3D grids
    - Write 1D or 2D arrays of integers
  - FileManipulator 2.0
    - Create uniform 2D or 3D grids
    - Read or write whole variable or single value
    - 8 datatypes supported
- Single Instance Iterations
  - Single instance per processor reader
    - Create uniform 2D or 3D grids
    - Read or write whole variables
    - 8 datatypes supported
Tools Used
- Java
- Eclipse IDE
- JavaMPI
- MASS Library
- NetCDF Library