Parallel In Situ Coupling of Simulation with a Fully Featured Visualization System
Brad Whitlock, Lawrence Livermore National Laboratory
Jean M. Favre, Swiss National Supercomputing Centre
Jeremy S. Meredith, Oak Ridge National Laboratory
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

2. What We're Doing
- We process simulation data in situ to avoid the high cost of I/O associated with writing and then reading the data.
- We have created a library that lets simulations interface to a fully featured parallel visualization system.
- We have used our library to instrument a parallel cosmology code and investigate some of its performance aspects compared to doing I/O.

3. The Case for Using In Situ
- I/O in supercomputers has not kept pace with compute power.
- Some applications report 90% of their time spent in I/O [Peterka et al.].
- Post processing simulation files requires a write and then a read, paying for I/O twice, in different applications.
- In situ processing may let us avoid some of this I/O.

Machine | Year | Writable FLOPS | Whole-System Checkpoint
ASCI Red | | | 300 sec
ASCI Blue Pacific | | | 400 sec
ASCI White | | | 480 sec
ASCI Red Storm | | | 660 sec
ASCI Purple | | | 500 sec
NCCS XT | | | 1400 sec
Roadrunner | | | 480 sec
NCCS XT | | | 1250 sec
ASC Sequoia | 2012 (planned) | 0.001% | 3200 sec

4. A Marriage Between Two Fairly Inflexible Partners…
(Diagram: a Simulation application and a Visualization and Analysis application joined by a layer in between.)
We are presenting a systems research paper that describes how a general purpose visualization tool can be flexibly coupled with a simulation.

5. In Situ Processing Strategies
We find three main strategies for in situ processing:

In Situ Strategy | Description | Negative Aspects
Loosely coupled | Visualization and analysis run on concurrent resources and access data over the network | 1) Data movement costs 2) Requires separate resources
Tightly coupled | Visualization and analysis have direct access to the memory of the simulation code | 1) Very memory constrained 2) Large potential impact (performance, crashes)
Hybrid | Data is reduced in a tightly coupled setting and sent to a concurrent resource | 1) Complex 2) Shares the negative aspects (to a lesser extent) of the others

6. Loosely Coupled In Situ Processing
- The I/O layer stages data into secondary memory buffers, possibly on other compute nodes.
- Visualization applications access the buffers to obtain data.
- Separates visualization processing from simulation processing.
- Copies and moves data.
(Diagram: simulation data passes through the I/O layer into a memory buffer, which the visualization tool reads, possibly across a network boundary.)

7. Tightly Coupled Custom In Situ Processing
- Custom visualization routines are developed specifically for the simulation and are called as subroutines:
  - Create the best visual representation
  - Optimized for the data layout
- Tendency to concentrate on very specific visualization scenarios.
- Write once, use once.
(Diagram: simulation data feeds custom visualization routines that produce images, etc.)

8. Tightly Coupled General In Situ Processing
- The simulation uses a data adapter layer to make its data suitable for a general purpose visualization library.
- A rich feature set can be called by the simulation.
- Operates directly on the simulation's data arrays when possible.
- Write once, use many times.
(Diagram: simulation data passes through a data adapter into a general visualization library, producing images, etc.)

9. Which Strategy Is Appropriate?
There have been many excellent papers and systems in this space. Different circumstances often merit different solutions.
(Diagram: a matrix of custom vs. general approaches against the tightly coupled, loosely coupled, and hybrid strategies, with some combinations marked ✖.)

10. Design Philosophy
- Visualization and analysis are done in the same memory space as the simulation, on native data, to avoid duplication.
- Maximize features and capabilities.
- Minimize code modifications to simulations.
- Minimize impact on simulation codes.
- Allow users to start an in situ session on demand instead of deciding before running the simulation:
  - Debugging
  - Computational steering

11. Selecting an In Situ Strategy
- Our strategy is tightly coupled, yet general.
- A fully featured visualization code connects interactively to the running simulation:
  - Allows live exploration of the data when we don't know the visualization setup a priori
  - Opportunities for steering
- We chose VisIt as the visualization code:
  - VisIt runs on several HPC platforms
  - VisIt has been used at many levels of concurrency
  - We know how to develop for VisIt

12. Visualization Tool Architecture
- Clients run locally and display results computed on the server.
- The server runs remotely in parallel, handling data processing for the clients.
- Data is processed in data flow networks.
- Filters in data flow networks can be implemented as plug-ins.
(Diagram: local VisIt clients connect over a network connection to parallel vis servers on a cluster; each vis server uses MPI, reads files through a data plugin, and processes data in a data flow network of filters.)

13. Coordination Among Filters Using Contracts
(Diagram: a Source and a Filter exchange Contract 0 and Contract 1 during the Update phase; data flows back through the filters during the Execute phase.)

14. Coupling of Simulations and VisIt
- We created Libsim, a library that simulations use to let VisIt connect and access their data.
(Diagram: the simulation links in the Libsim front end and user-supplied data access code; the Libsim runtime acts as the data source feeding VisIt's filters.)

15. A Simulation Using Libsim
- The front end library lets VisIt connect.
- The runtime library processes the simulation's data.
- The runtime library obtains data on demand through user-supplied Data Access Code callback functions.
(Diagram: on each node of the parallel cluster, the simulation code calls the Libsim front end and data access code; the Libsim runtime instances communicate via MPI and serve results to local VisIt clients over a network connection.)

16. In Situ Processing Workflow
1. The simulation code launches and starts execution.
2. The simulation regularly checks for connection attempts from the visualization tool.
3. The visualization tool connects to the simulation.
4. The simulation provides a description of its meshes and data types.
5. Visualization operations are handled via Libsim and result in data requests to the simulation.

17. Instrumenting a Simulation
Additions to the source code are usually minimal, and follow three incremental steps (a minimal initialization sketch follows below):
1. Initialize Libsim and alter the simulation's main iterative loop to listen for connections from VisIt.
2. Create data access callback functions so the simulation can share data with Libsim.
3. Add control functions that let VisIt steer the simulation.
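As a rough illustration of step 1, a minimal C sketch of the initialization is shown below. VisItSetupEnvironment and VisItInitializeSocketAndDumpSimFile are Libsim functions; the simulation name, comment, and path arguments here are illustrative assumptions, not values from these slides:

    #include <VisItControlInterface_V2.h>

    int main(int argc, char **argv)
    {
        /* Pick up VisIt's environment (paths to its runtime libraries). */
        VisItSetupEnvironment();

        /* Write a .sim2 file that VisIt clients use to locate and
         * connect to this running simulation. */
        VisItInitializeSocketAndDumpSimFile("mysim",
            "Simulation instrumented with Libsim",
            "/path/where/sim/was/started", NULL, NULL, NULL);

        /* ... enter the main iterative loop (see the next slide) ... */
        return 0;
    }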

18. Adapting the Main Loop
- Libsim opens a socket and writes out connection parameters.
- VisItDetectInput checks for:
  - Connection requests
  - VisIt commands
  - Console input
(Flowchart: Initialize, then Solve Next Step and check for convergence/end of loop; VisItDetectInput dispatches to Complete VisIt Connection, Process VisIt Commands, or Process Console Input before the loop continues, then Exit.)
A sketch of such a loop follows below.
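A hedged sketch of the instrumented loop in C. VisItDetectInput, VisItAttemptToCompleteConnection, VisItProcessEngineCommand, and VisItDisconnect are Libsim functions (VisItDetectInput returns distinct codes for a timeout, a connection request, and engine input); sim_is_paused, simulate_one_step, and register_callbacks are hypothetical placeholders for the simulation's own logic, and console input is omitted for brevity by passing -1:

    /* Inside main, after initialization. */
    do
    {
        /* Block waiting for input when paused; otherwise poll and keep computing. */
        int visitstate = VisItDetectInput(sim_is_paused(&sim) ? 1 : 0, -1);

        if(visitstate == 0)
        {
            /* No input from VisIt: advance the simulation one step. */
            simulate_one_step(&sim);
        }
        else if(visitstate == 1)
        {
            /* A VisIt client is trying to connect: load the VisIt runtime
             * and register the data access callbacks on success. */
            if(VisItAttemptToCompleteConnection())
                register_callbacks(&sim);
        }
        else if(visitstate == 2)
        {
            /* VisIt sent a command; a failure here means VisIt disconnected. */
            if(!VisItProcessEngineCommand())
                VisItDisconnect();
        }
    } while(!sim.done);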

19. Sharing Data
- VisIt requests data on demand through data access callback functions:
  - Return actual pointers to your simulation's data (nearly zero-copy)
  - Or return an alternate representation that Libsim can free
- Callbacks can be written in C, C++, Fortran, or Python.

20. Sharing Data Example

    // Example data access callback: returns the simulation's pressure
    // array to VisIt when it is requested by name.
    visit_handle GetVariable(int domain, char *name, void *cbdata)
    {
        visit_handle h = VISIT_INVALID_HANDLE;
        SimData_t *sim = (SimData_t *)cbdata;
        if(strcmp(name, "pressure") == 0)
        {
            VisIt_VariableData_alloc(&h);
            /* Hand VisIt a pointer to the simulation-owned buffer:
             * 1 component, nx*ny double values, no copy. */
            VisIt_VariableData_setDataD(h, VISIT_OWNER_SIM, 1,
                                        sim->nx * sim->ny, sim->pressure);
        }
        return h;
    }

(Diagram: the callback passes the simulation's pressure buffer (Nx=6, Ny=8) directly to Libsim.)
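For this callback to take effect, the simulation registers it with Libsim once a VisIt client has connected. VisItSetGetVariable is part of Libsim's C interface, though placing the registration in the connection-handling branch of the main loop is an assumption based on the workflow described above:

    /* After VisItAttemptToCompleteConnection() succeeds, register the
     * variable callback, passing the simulation state as callback data. */
    VisItSetGetVariable(GetVariable, (void *)&sim);

Because the callback uses VISIT_OWNER_SIM, the simulation retains ownership of the buffer and no copy is made; VISIT_OWNER_COPY could be used instead when the data must outlive the callback.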

21. Supported Data Model
- Mesh types:
  - Structured meshes
  - Point meshes
  - CSG meshes
  - AMR meshes
  - Unstructured & polyhedral meshes
- Materials
- Species
- Variables:
  - 1 to N components
  - Zonal and nodal
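As a sketch of how a mesh from this data model might be exposed, the following callback returns a 2D rectilinear mesh in the style of the variable example above. The VisIt_RectilinearMesh_* and VisIt_VariableData_* calls are Libsim's C interface; the mesh name and the SimData_t coordinate fields are assumptions:

    visit_handle GetMesh(int domain, const char *name, void *cbdata)
    {
        visit_handle h = VISIT_INVALID_HANDLE;
        SimData_t *sim = (SimData_t *)cbdata;
        if(strcmp(name, "mesh2d") == 0 &&
           VisIt_RectilinearMesh_alloc(&h) == VISIT_OKAY)
        {
            visit_handle hx, hy;
            /* Wrap the simulation's node coordinate arrays without copying. */
            VisIt_VariableData_alloc(&hx);
            VisIt_VariableData_alloc(&hy);
            VisIt_VariableData_setDataD(hx, VISIT_OWNER_SIM, 1, sim->nx, sim->x);
            VisIt_VariableData_setDataD(hy, VISIT_OWNER_SIM, 1, sim->ny, sim->y);
            VisIt_RectilinearMesh_setCoordsXY(h, hx, hy);
        }
        return h;
    }

A matching VisItSetGetMesh(GetMesh, (void *)&sim) call registers it, mirroring the variable registration.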

22. Adding Control Functions
- The simulation provides commands to which it will respond.
- Commands generate user interface controls in the Simulations window.
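A hedged sketch of such a control function in C. VisItSetCommandCallback is a Libsim function, while the command names ("halt", "step", "run") and SimData_t fields here are illustrative assumptions; the set of available commands is advertised to VisIt through the simulation's metadata:

    void ControlCommandCallback(const char *cmd, const char *args, void *cbdata)
    {
        SimData_t *sim = (SimData_t *)cbdata;
        /* Respond to buttons pressed in VisIt's Simulations window. */
        if(strcmp(cmd, "halt") == 0)
            sim->runMode = 0;          /* pause the main loop */
        else if(strcmp(cmd, "step") == 0)
            simulate_one_step(sim);    /* advance one iteration */
        else if(strcmp(cmd, "run") == 0)
            sim->runMode = 1;          /* resume free-running */
    }

    /* Registered alongside the data access callbacks: */
    VisItSetCommandCallback(ControlCommandCallback, (void *)&sim);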

23. Custom User Interfaces
- The simulation can provide a UI description for more advanced computational steering.

24. Libsim in Practice
- We conducted our experiments on a 216-node visualization cluster:
  - Two 6-core 2.8 GHz Intel Xeon 5660 processors per node
  - 96 GB of memory per node
  - InfiniBand QDR high-speed interconnect
  - Lustre parallel file system
- We measured the impact of Libsim on a simulation's main loop, without connecting to VisIt.
- We measured memory usage after loading VisIt.
- We instrumented GADGET-2, a popular cosmology code, with Libsim and measured performance.

25. Impact on the Main Loop
- Measures the cost of calling Libsim in the main loop.
- Instrumenting the main loop of a parallel simulation requires calling VisItDetectInput and MPI_Bcast.
- We timed both calls using 512 cores and 10K main loop iterations.

Cores | VisItDetectInput overhead | MPI_Bcast overhead | Overhead loading VisIt runtime libraries
512 | 2 μs | 8 μs | 1 s (one-time cost)
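To put those numbers in perspective: at roughly 10 μs of combined per-iteration overhead, the 10K main loop iterations measured here add only about 0.1 s in total, so the dominant cost is the 1 s one-time load of the VisIt runtime libraries.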

26. Impact on Memory Usage
- Measured memory use before and after VisIt connects.
- Measured with our updateplots example program.
- Values read from /proc/<pid>/smaps.

Event | Size | Resident Set Size
Simulation startup | 8.75 MB | 512 KB
After Libsim initialization | 8.75 MB | 614 KB
After loading VisIt | 222 MB | 43.5 MB
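A minimal sketch of one way to take such a measurement from within the program, summing the Rss entries in /proc/self/smaps (Linux-specific; the exact measurement procedure used for the table above is not described in these slides):

    #include <stdio.h>

    /* Return this process's resident set size in kB, or -1 on error. */
    long rss_kb(void)
    {
        long total = 0, kb;
        char line[256];
        FILE *f = fopen("/proc/self/smaps", "r");
        if(f == NULL)
            return -1;
        while(fgets(line, sizeof(line), f) != NULL)
            if(sscanf(line, "Rss: %ld kB", &kb) == 1)
                total += kb;
        fclose(f);
        return total;
    }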

27. Impact on the GADGET-2 Simulation
- GADGET-2 is a distributed-memory parallel code for cosmological simulations of structure formation.
- We measured in situ performance versus I/O performance at three levels of concurrency:
  - Rendering a 2048x2048 pixel image
  - Collective I/O and file-per-process I/O
  - 16 million particles and 100 million particles
- The results show that in situ processing with a fully featured visualization system can provide performance advantages over I/O.
- We relied on VisIt's ability to scale (runs on up to 65,536 cores have been demonstrated).

28. Libsim/GADGET-2 Timing Results

16M particles | 32 cores | 256 cores
I/O, 1 file | 2.76 s | 4.72 s
I/O, N files | 0.74 s | 0.31 s
In situ | 0.77 s | 0.34 s

100M particles | 32 cores | 256 cores | 512 cores
I/O, 1 file | 24.45 s | 26.7 s | 25.27 s
I/O, N files | 0.69 s | 1.43 s | 2.29 s
In situ | 1.70 s | 0.46 s | 0.64 s

- In situ is competitive with, or faster than, single-file I/O as core counts grow.
- It should be possible to do several in situ operations in the time needed for I/O.
- Time is saved compared to simulation followed by post processing.
- I/O results are the average of 5 runs per test case; in situ results are averaged from timing logs across cores.

29. VisIt Connected Interactively to GADGET-2

30. Additional Results
- We have recently instrumented additional simulations to investigate Libsim's scaling properties on a Cray XE6, using up to 4224 cores.
- We identified and corrected a performance bottleneck in Libsim's environment detection functions.
(Figure: time to detect the environment.)

31. Additional Results
- The simulation was run on 11, 22, 44, 88, and 176 nodes (24 cores/node).
- Each MPI task had a 512x512x100 block of data to isocontour at 10 different thresholds.
- Parallel I/O to disk was done with netCDF-4, in files of 27, 55, 110, 221, and 442 GB per iteration.
(Figure: time per iteration.)

32. Additional Results
- We completed a Python language binding to Libsim using SWIG:
  - Demonstrates that our choice to provide a pure C API for Libsim simplifies the generation of language bindings
  - Functions that accept a callback function pointer require some special care, and we are considering ways to mitigate this

33. Limitations of the Implementation
- Memory intensive:
  - The runtime library cost is larger than with static linking, since we use the whole feature set
  - Filters may use intermediate memory
  - Zero-copy is not fully implemented
- We currently require an interactive session, though with some changes we could avoid this.

34. Future Work
- Provide additional functions for setting up visualizations so that in situ processing can be less user-driven.
- Further limit the resources consumed by the VisIt runtime libraries, to lessen the impact of in situ analysis on the simulation.
- Characterize the performance costs of using shared libraries on larger scale runs.

35. Conclusion
- We have implemented Libsim, an easy-to-use library that enables in situ computation:
  - Provides access to a fully featured, parallel visualization and analysis tool that excels at scale
  - Minimizes the impact on simulation performance
  - Minimizes the amount of new code that must be written
  - Fully integrated with the open-source distribution of VisIt