1
Parallel NetCDF + MASS Development
By: Sanjay Bappudi
2
Table of Contents
- Overview of Multi-Agent Spatial Simulation
- NetCDF library
- Integration work (Spring 2013)
- Extension with Hadoop (Summer 2013)
3
Spatial Simulation
Simulation “places”: each cell computes its own wave height, using its four neighboring cells’ information.
$$z[t][i][j] = 2.0\,z[t-1][i][j] - z[t-2][i][j] + c^2 \left(\frac{dt}{dd}\right)^2 \bigl(z[t-1][i+1][j] + z[t-1][i-1][j] + z[t-1][i][j+1] + z[t-1][i][j-1] - 4.0\,z[t-1][i][j]\bigr)$$
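The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the MASS implementation: the function name `wave_step` and the choice to leave boundary cells at zero are assumptions made only for this example.

```python
def wave_step(z_prev, z_prev2, c, dt, dd):
    """Compute z[t] from z[t-1] (z_prev) and z[t-2] (z_prev2).

    Hypothetical helper: boundary cells are simply left at 0.0,
    which is an assumption of this sketch, not stated on the slide.
    """
    n = len(z_prev)
    m = len(z_prev[0])
    k = (c * dt / dd) ** 2  # the c^2 (dt/dd)^2 coefficient
    z_new = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Sum of the four neighboring cells at time t-1
            neighbors = (z_prev[i + 1][j] + z_prev[i - 1][j]
                         + z_prev[i][j + 1] + z_prev[i][j - 1])
            z_new[i][j] = (2.0 * z_prev[i][j] - z_prev2[i][j]
                           + k * (neighbors - 4.0 * z_prev[i][j]))
    return z_new
```

Each interior cell reads only its own two previous values and its four neighbors at time t-1, which is why the grid can be split across processes with only boundary rows exchanged.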
4
Multi-Agent Simulation
Each agent represents an independent fish (prey) or shark (predator). A model designer wants to focus on the design of each agent.
5
MASS
6
NetCDF
- Machine-independent format for representing scientific data
- Files store data arranged in variables
- Each variable holds an array of data

    netCDF wave2D {
      dimensions: x = 4, y = 4, time = 3;
      variables: float wave( x, y, time );
      data:
        0, 0, 0, 0, 0, 0, 0, 0,
        1, 1, 1, 1, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 2, 2, 0,
        1, 1, 1, 1, 1, 0, 0, 1,
        0, 0, 0, 0, 0, 0, 0, 0,
        1, 1, 1, 1, 1, 1, 1, 1
    }
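To make "each variable holds an array of data" concrete, the sketch below maps an (x, y, time) coordinate to a position in the flat data list. It assumes the slide's values follow the usual NetCDF convention that the last declared dimension varies fastest; the names `flat_index` and `wave` are hypothetical helpers, not part of any NetCDF API.

```python
# Dimension sizes from the slide: x = 4, y = 4, time = 3
DIMS = (4, 4, 3)

# The 48 values listed in the slide's data section
DATA = [
    0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 2, 2, 0,
    1, 1, 1, 1, 1, 0, 0, 1,
    0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1,
]


def flat_index(x, y, t, dims=DIMS):
    """Row-major offset of wave(x, y, time); time varies fastest (assumed)."""
    nx, ny, nt = dims
    return (x * ny + y) * nt + t


def wave(x, y, t):
    """Look up one value of the hypothetical wave variable."""
    return DATA[flat_index(x, y, t)]
```

For example, `wave(1, 3, 0)` reads offset 21 of the flat list under this ordering assumption.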
7
Purpose
Within MASS
- Allow each simulation entity to save its own data
- Make NetCDF file use simple and feasible for MASS
- Maintain the benefits of a distributed environment running in parallel
Real-world application
- Climate change analysis
8
How it works
- Creates a file for the simulation if none exists
- Stores file contents in a buffer to increase read/write speed
- Each processor holds the portion of the file relevant to it
- A file is only opened by the first writer
- Place in each partition: a single instance per processor acts as file creator and parallel reader and writer
When a simulation using parallel_netCDF is started, a filename must be passed as one of the command-line parameters. If this file does not exist on disk, the parameters used to initialize the writer are used to create a file holding a uniform grid of variables for the simulation.
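As a rough illustration of "each processor holds the portion of the file relevant to it", the sketch below splits the rows of a grid into contiguous stripes, one per processor. The slides do not say how MASS actually partitions the file, so the striping scheme and the name `partition_rows` are assumptions made for this example only.

```python
def partition_rows(total_rows, num_procs):
    """Return a (start, end) row range per processor rank; end is exclusive.

    Hypothetical scheme: contiguous stripes, with any leftover rows
    given one each to the lowest-numbered ranks.
    """
    base, extra = divmod(total_rows, num_procs)
    ranges = []
    start = 0
    for rank in range(num_procs):
        count = base + (1 if rank < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges
```

With such a split, each processor only needs to buffer and write its own row range, which fits the idea of one reader/writer instance per processor.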
9
Spring 2013
- Understand previous work and gather information
- Debug and verify the NetCDF + MASS integration
- Run performance tests; store, compare, and analyze results
- Kelsey's work runs correctly on one processor, but fails otherwise
10
Summer 2013
- Port the MASS-parallelized NetCDF library to Hadoop
- HDFS stores data on distributed disks
- Useful for extremely large simulations
- Weather simulations produce data upwards of 40 terabytes
11
HDFS: Hadoop distributed file system
[Diagram: an HDFS client exchanges block data (block id, byte range) with several HDFS datanodes; each datanode stores its blocks on its own Linux local file system.]
12
Final Results
Running Wave2D simulation, size 100, for 1000 cycles, with 1 node and 1 thread

                      Without File Writes   1 File Write   Transfer to HDFS
Execution Time (s)    17.2                  23.4           26.2
Increased Overhead    -                     +6.2 s         +2.8 s
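The overhead figures are the differences between successive configurations, which can be checked with a short calculation. The timings are those reported on the slide; the helper name `incremental_overheads` is hypothetical.

```python
# Execution times in seconds, as reported on the slide
TIMES = [
    ("Without file writes", 17.2),
    ("1 file write", 23.4),
    ("Transfer to HDFS", 26.2),
]


def incremental_overheads(rows):
    """Difference of each configuration from the one before it, in seconds."""
    return [
        (rows[i][0], round(rows[i][1] - rows[i - 1][1], 1))
        for i in range(1, len(rows))
    ]
```

Rounding to one decimal place matches the precision of the reported timings and avoids floating-point noise in the subtraction.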
13
Questions?