Parallel NetCDF Library Development (formerly “Sensor Cloud Integration”), Kelsey Weingartner

Presentation transcript:

Parallel NetCDF Library Development (formerly “Sensor Cloud Integration”), Kelsey Weingartner

Overview
 Background information
 Purpose
 Project artifacts
 Final product
 Results
 Takeaways

NetCDF and MASS
NetCDF
 Machine-independent format for representing scientific data
 Files store data arranged in variables
 Each variable holds an array of data
MASS
 Library for running a simulation in parallel
 Eases the complexity of creating and running 2D and 3D spatial simulations
 A simulation is a grid of “Places” that may or may not have “Agents” on them
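
To make the data model concrete, here is a minimal sketch that creates a NetCDF file holding one 2D variable, assuming the NetCDF-Java 4.x NetcdfFileWriter API; the file name grid.nc and the variable name heat are illustrative, not part of the project.

    import ucar.ma2.Array;
    import ucar.ma2.DataType;
    import ucar.nc2.NetcdfFileWriter;
    import ucar.nc2.Variable;

    public class CreateGridFile {
        public static void main(String[] args) throws Exception {
            // Create a new classic-format NetCDF file
            NetcdfFileWriter writer =
                NetcdfFileWriter.createNew(NetcdfFileWriter.Version.netcdf3, "grid.nc");

            // Dimensions describe the shape shared by variables
            writer.addDimension(null, "x", 100);
            writer.addDimension(null, "y", 100);

            // A variable is a named, typed array laid out over those dimensions
            Variable heat = writer.addVariable(null, "heat", DataType.DOUBLE, "x y");

            writer.create();  // leave define mode; the file now exists on disk

            // Fill the whole variable with a 100x100 array (zeros by default)
            Array data = Array.factory(DataType.DOUBLE, new int[]{100, 100});
            writer.write(heat, data);
            writer.close();
        }
    }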

Purpose
 Within MASS
Make NetCDF file use simple and feasible for MASS
Maintain the benefits of a distributed environment running in parallel
 Real-World Applications
Climate change analysis

Artifacts
 Summer 2012
Sequential write with NetCDF
Worst-case parallel performance
 Fall 2013
Best-case parallel performance
File creator
File creator with parallel write
File creator with parallel write & read
 Winter 2013
Single instance per processor file creator and parallel writer
Final product

Sequential
 For each save, the file only needs to be opened once
 callAll() gathers agent information from each Place
 The master node then handles writing to the NetCDF file
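
A minimal sketch of this sequential pattern, assuming the MASS Java API as published by the UW Bothell DSL lab (the package path, the Places(int, String, Object, int...) constructor, and callAll(int, Object[]) may differ from the exact version used in the project); "Land" and COLLECT_AGENTS are hypothetical names.

    import edu.uw.bothell.css.dsl.MASS.MASS;    // assumed package path
    import edu.uw.bothell.css.dsl.MASS.Places;

    public class SequentialSave {
        // Illustrative function id dispatched by the Place's callMethod()
        static final int COLLECT_AGENTS = 0;

        public static void main(String[] args) throws Exception {
            MASS.init(args);

            // 100 x 100 grid of simulation Places ("Land" is a hypothetical Place subclass)
            Places land = new Places(1, "Land", null, 100, 100);

            // callAll() returns one result per Place: the agent data to record this cycle
            Object[] perPlace = land.callAll(COLLECT_AGENTS, new Object[100 * 100]);

            // The master node then writes perPlace into the NetCDF file
            // (see the NetcdfFileWriter sketch above); the file is opened only once here.

            MASS.finish();
        }
    }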

Parallel - Worst Case
 On each save, the file is opened by every Place object
 The master triggers the save with callAll()
 Each Place gathers its Agents’ information and writes

JavaMPI Parallel Best-Case
 Select a NetCDF file to copy
 The master node creates a new file with the same dimensions
 Send an equal portion of data from the chosen file to each node
 Each node writes its received array to the newly created NetCDF file
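
A sketch of the scatter-then-write step, assuming mpiJava-style bindings (MPI.Init, Rank, Size, Scatter) alongside NetCDF-Java; file and variable names are illustrative and the example assumes the data divides evenly across nodes.

    import mpi.MPI;
    import ucar.ma2.Array;
    import ucar.ma2.DataType;
    import ucar.nc2.NetcdfFileWriter;
    import ucar.nc2.Variable;

    public class ScatterWrite {
        public static void main(String[] args) throws Exception {
            MPI.Init(args);
            int rank = MPI.COMM_WORLD.Rank();
            int size = MPI.COMM_WORLD.Size();

            int total = 50 * 50;                 // the whole 50x50 variable, flattened
            int slice = total / size;            // equal portion per node
            double[] all = new double[total];    // on the master: data read from the source file
            double[] mine = new double[slice];

            // The master sends an equal portion of the data to each node
            MPI.COMM_WORLD.Scatter(all, 0, slice, MPI.DOUBLE,
                                   mine, 0, slice, MPI.DOUBLE, 0);

            // Each node writes its received slice into the newly created copy,
            // at the offset that corresponds to its rank
            NetcdfFileWriter writer = NetcdfFileWriter.openExisting("copy.nc");
            Variable v = writer.findVariable("data");   // illustrative 1D variable of length 2500
            writer.write(v, new int[]{rank * slice},
                         Array.factory(DataType.DOUBLE, new int[]{slice}, mine));
            writer.close();

            MPI.Finalize();
        }
    }

In practice, concurrent writes from several ranks to one classic-format file have to be coordinated (or go through a parallel I/O layer); the sketch only shows where each rank's portion lands.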

Final Product
Single instance per processor file creator and parallel reader/writer
 Extends MASS Place
 Creates a file for the simulation if none exists
 Stores file contents in a buffer to increase read/write speed
 Each processor holds the portion of the file relevant to it
 A file is only opened by the first writer Place in each partition
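
A sketch of what such a writer Place could look like, assuming the MASS Java Place.callMethod(int, Object) dispatch convention and package path; the class name, function ids, and the openLocalPortion()/localOffset() helpers are hypothetical, and the actual NetCDF calls are omitted.

    import edu.uw.bothell.css.dsl.MASS.Place;   // assumed package path

    public class NetcdfPlace extends Place {
        public static final int WRITE_ = 0;
        public static final int FLUSH_ = 1;

        // One buffer per processor (JVM): every Place in this partition shares it
        private static double[] buffer = new double[100 * 100];
        private static boolean fileOpen = false;

        @Override
        public Object callMethod(int functionId, Object argument) {
            switch (functionId) {
                case WRITE_: return write((double[]) argument);
                case FLUSH_: return flush();
                default:     return null;
            }
        }

        private Object write(double[] values) {
            // Only the first writer Place in this partition creates/opens the file
            if (!fileOpen) {
                openLocalPortion();              // hypothetical: open this node's slice of the file
                fileOpen = true;
            }
            // Stage the values in the in-memory buffer instead of hitting the disk
            System.arraycopy(values, 0, buffer, localOffset(), values.length);
            return null;
        }

        private Object flush() {
            // Push the buffered slice to the NetCDF file in one call (I/O omitted)
            return null;
        }

        private void openLocalPortion() { /* hypothetical: NetcdfFileWriter setup */ }
        private int localOffset()       { return 0; /* hypothetical: this Place's index */ }
    }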

Results
 Sequential write (1 processor):
100x100, 1,000 agents, 1,000 cycles: 225,712.8 msec
 Worst-case parallel write (1 processor):
50x50, 500 agents, 100 cycles: 957,590.5 msec
 MPInetCDF results on a 50x50 file:
On 4 processors: 22,114.4 msec / 246,444 bytes (≈ 11.1 B/msec)
On 6 processors: 16,470.2 msec / 246,444 bytes (≈ 15.0 B/msec)
 RandomWalk using parallel NetCDF (1 processor):
100x100, 1,000 agents, 1,000 cycles: 204,997.2 msec / 472,484 bytes (≈ 2.3 B/msec)

Final Product Results
 RandomWalk
100 x 100 grid, 1,000 agents, 100 cycles: 7,843.7 msec
 RandomWalk with NetCDF
100 x 100 grid, 1,000 agents, 100 cycles, writing to file every 20 cycles: 204,997.2 msec
Same settings, but writing to file only once: 69,480.7 msec
 Wave2DMASS
100 x 100 grid, 1,000 cycles: 16,913.5 msec
 Wave2DMASS with NetCDF
100 x 100 grid, 1,000 cycles, writing to file every 50 cycles: 50,422.6 msec
Same settings, but writing to file only once: 22,923.9 msec

Future Work
On Parallel_NetCDF
 D0 array support
 Object datatype support
 Allow a whole variable to be read/written
 Smaller buffer
After Parallel_NetCDF
 Conference paper for IEEE PacRim Conference

Key Lessons
 Working with external libraries
 Working with limited documentation
 Creating and meeting deadlines
 Experience with parallel and distributed systems

Questions?

Intermediate Products
File Creators
 FileCreator
Creates uniform 2D or 3D grids
Can create NetCDF files with an unlimited dimension
 FileManipulator 1.0
Creates uniform 2D or 3D grids
Writes 1D or 2D arrays of integers
 FileManipulator 2.0
Creates uniform 2D or 3D grids
Reads or writes a whole variable or a single value
8 datatypes supported
Single Instance Iterations
 Single instance per processor reader
Creates uniform 2D or 3D grids
Reads or writes whole variables
8 datatypes supported
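
For the "whole variable or single value" access that FileManipulator 2.0 and the single-instance reader provide, here is a minimal sketch of the two read styles through the NetCDF-Java API (NetcdfFile.open and Variable.read); this illustrates the library read path only, not the project's own code, and the file and variable names are illustrative.

    import java.util.Arrays;
    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    public class ReadExamples {
        public static void main(String[] args) throws Exception {
            NetcdfFile file = NetcdfFile.open("grid.nc");   // illustrative file name
            Variable heat = file.findVariable("heat");      // illustrative variable name

            // Whole-variable read: pulls the entire array into memory at once
            Array whole = heat.read();
            System.out.println("shape = " + Arrays.toString(whole.getShape()));

            // Single-value read: a 1x1 corner at row 10, column 20
            Array one = heat.read(new int[]{10, 20}, new int[]{1, 1});
            System.out.println("value = " + one.getDouble(0));

            file.close();
        }
    }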

Tools Used
 Java
 Eclipse IDE
 JavaMPI
 MASS Library
 NetCDF Library