Real-time Performance Visualization and Analysis of Large-Scale Uintah Simulations
Kai Li, Allen D. Malony, Sameer Shende, Robert Bell
Performance Research Laboratory, University of Oregon

Abstract
We designed and prototyped a system architecture for online performance access, analysis, and visualization in a large-scale parallel environment. The architecture consists of four components. The "performance data integrator" component interfaces with a performance monitoring system to merge parallel performance samples into a synchronous data stream for analysis. The "performance data reader" component reads the external performance data into the internal data structures of the analysis and visualization system. The "performance analyzer" component gives the analysis developer a programmable framework for constructing analysis modules that can be linked together for different functionality. The "performance visualizer" component can likewise be programmed to create different display modules. Our prototype is based on the TAU performance system, the Uintah Computational Framework, and the SCIRun computational steering and visualization system. Parallel profile data from a Uintah simulation are sampled and written to profile files during execution. A profile reader, implemented as a SCIRun module, stores profile samples in SCIRun memory. SCIRun provides a programmable system for building and linking the analysis and visualization components. We have developed two analysis modules and three visualization modules to demonstrate how parallel profile data from large-scale Uintah applications are processed online.

Motivation
Parallel performance tools offer the program developer insights into the execution behavior of an application and are a valuable component in the cycle of application development and deployment. However, most tools do not work well with large-scale parallel applications, where the performance data come from thousands of processes. Not only can the data be difficult to manage and the analysis complex, but existing performance display tools are mostly restricted to two dimensions and lack the customization and display interaction needed for full data investigation. In addition, it is increasingly important that performance tools be able to function online, making it possible to control and adapt long-running applications based on performance feedback. Again, large-scale parallelism complicates the online access and management of performance data, and it may be desirable to integrate performance analysis and visualization into existing computational steering infrastructure. The coupling of advanced three-dimensional visualization with large-scale, online performance data analysis could enhance application performance evaluation. The challenge is to develop a framework where the tedious work, such as access to the performance data and graphics rendering, is supported by the underlying system, leaving tool developers free to focus on the high-level design of the analysis and visualization capabilities.

System Architecture
[Architecture diagram: a Uintah application instrumented with the TAU system writes performance data streams to the (parallel) file system; the performance data integrator handles sample sequencing and reader synchronization and passes accumulated samples over a socket to the performance data reader inside SCIRun, where analyzer and visualizer modules support performance steering. The figure also lists the TAU performance system, EPILOG, and Paraver as performance data sources.]

Performance Visualizer
The role of the performance visualizer component is to read the Field objects generated by the performance analysis and show graphical representations of performance results. We have built three visualization modules to demonstrate the display of 2D and 3D data fields. The Terrain visualizer shows ImageMesh data as a surface; the user can select the resolution of the X and Y dimensions in the Terrain control panel, and a TerrainDenotator module was developed to mark interesting points in the visualization. A different display of 2D field data is produced by the KiviatTube visualizer: a "tube" surface is created where the distance of points from the tube's center axis is determined by metric values and the tube length follows the sample sequence. The visualization of PointCloudMesh data is accomplished by the PointCloud visualizer module. [Figures: the SCIRun program graph connecting the data reader, analyzer, and visualizer modules for a small Uintah test case, and a performance visualization from a 500-processor run.]
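The KiviatTube mapping can be sketched independently of SCIRun. The fragment below is a hypothetical illustration, not the actual KiviatTube module: each profile sample contributes one ring of vertices around a tube axis, with each vertex's distance from the axis scaled by the metric value of one process, so the radius encodes the metric and the tube length follows the sample sequence. The function name, vertex type, and normalization convention are assumptions made for the sketch.

// Hypothetical sketch of the KiviatTube idea: one ring of vertices per
// profile sample, one spoke per process; spoke length encodes the metric.
// This is NOT the SCIRun KiviatTube module, just an illustration.
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// values[s][p] = metric value for process p in profile sample s,
// assumed to be normalized to [0, 1] beforehand.
std::vector<std::vector<Vec3>> makeKiviatTube(
    const std::vector<std::vector<double>>& values,
    double ringSpacing = 1.0,   // distance between consecutive sample rings
    double baseRadius  = 0.1,   // minimum distance from the tube axis
    double scale       = 1.0)   // radial scaling of the metric value
{
    const double kTwoPi = 6.28318530717958647692;
    std::vector<std::vector<Vec3>> rings;
    for (std::size_t s = 0; s < values.size(); ++s) {
        const std::vector<double>& sample = values[s];
        std::vector<Vec3> ring;
        for (std::size_t p = 0; p < sample.size(); ++p) {
            // One spoke per process, evenly spaced around the axis.
            double angle  = kTwoPi * static_cast<double>(p) / sample.size();
            double radius = baseRadius + scale * sample[p];
            ring.push_back({ radius * std::cos(angle),
                             radius * std::sin(angle),
                             ringSpacing * static_cast<double>(s) });
        }
        rings.push_back(ring);
    }
    return rings;  // consecutive rings can be triangulated into the tube surface
}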
Performance Data Integrator
The performance data integrator reads the performance profile files, generated for each profile sample for each thread, and merges them into a single, synchronized profile sample dataset. Each profile sample file is assigned a sequence number, and the whole dataset is sequenced and timestamped. A socket-based protocol is maintained with the performance data reader to inform it of the availability of new profile samples and to coordinate dataset transfer. [Figure: the stream of merged profile sample datasets, Sample 1 through Sample k.]

Performance Data Reader
The performance data reader is implemented as a SCIRun module. It inputs the merged profile sample datasets sent by the data integrator and stores them in an internal C++ object structure. [Figure: the reader's accumulated samples, Sample 1 through Sample k.] A profile sample dataset is organized as a tree according to the TAU profile hierarchy: node → context → thread → profile data. Each object in the profile tree has a set of attribute access methods and a set of offspring access methods; these methods are the interface used by the performance analyzer. [Figure: the profile tree, where the triple (n, c, t) = (i, j, k) identifies node i, context j, thread k; nodes branch into contexts, contexts branch into threads, and each thread holds its profile data, with attribute and offspring access methods at every level.]

Performance Analyzer
Performance analysis of the profile sample data can take various forms. Using the access methods on the profile tree objects, all performance profile data, including cross-sample data, are available for analysis. A library of performance analysis modules can be developed, some simple and others more sophisticated. We have implemented two generic profile analysis modules, Gen2DField and Gen3DField. The modules provide a user control panel that allows them to be customized with respect to events, data values, number of samples, and filter options. Ultimately, the output of the analysis modules must be in a form that can be visualized. The Gen2DField and Gen3DField modules are so named because they produce 2D and 3D Field data, respectively. SCIRun has different geometric meshes available for Fields; we use an ImageMesh for 2D fields and a PointCloudMesh for 3D fields.
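A minimal sketch of the data integrator's merge-and-sequence step described above is given below. The poster does not specify the file naming, wire format, or socket protocol, so the types, function names, and the notifyReader callback standing in for the socket handshake are all assumptions.

// Hypothetical sketch of the data integrator's merge-and-sequence step.
#include <chrono>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// One merged, synchronized profile sample covering all threads.
struct ProfileSampleDataset {
    long sequenceNumber;                      // assigned in arrival order
    long long timestampMs;                    // when the merge completed
    std::vector<std::string> threadProfiles;  // raw per-thread profile contents
};

// Merge the per-thread profile files of one sample into a single dataset,
// then notify the reader that a new sample is available.
ProfileSampleDataset mergeSample(
    const std::vector<std::string>& perThreadFiles,
    long sequenceNumber,
    const std::function<void(const ProfileSampleDataset&)>& notifyReader)
{
    ProfileSampleDataset ds;
    ds.sequenceNumber = sequenceNumber;
    ds.timestampMs = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();

    for (const std::string& path : perThreadFiles) {
        std::ifstream in(path);
        std::ostringstream contents;
        contents << in.rdbuf();                // read one thread's profile file
        ds.threadProfiles.push_back(contents.str());
    }

    notifyReader(ds);  // in the prototype this is a socket-based handshake
    return ds;
}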
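The profile tree kept by the data reader and the kind of Gen2DField-style walk performed by the analyzer can be sketched as follows. The class and method names are invented for illustration; the actual SCIRun modules use their own C++ object structure and access methods.

// Hypothetical sketch of the reader's profile tree (node -> context ->
// thread -> profile data) and of a Gen2DField-style analysis over it.
#include <map>
#include <string>
#include <vector>

struct ThreadProfile {
    // Attribute access: metric value (e.g., exclusive time) per event name.
    std::map<std::string, double> metricByEvent;
    double metric(const std::string& event) const {
        auto it = metricByEvent.find(event);
        return it == metricByEvent.end() ? 0.0 : it->second;
    }
};

struct Context       { std::vector<ThreadProfile> threads;  };  // offspring: threads
struct Node          { std::vector<Context>       contexts; };  // offspring: contexts
struct ProfileSample { std::vector<Node>          nodes;    };  // offspring: nodes

// Gen2DField-style analysis: flatten node/context/thread into one axis and
// produce a 2D grid of metric values (thread x event), the kind of field a
// Terrain-style visualizer could render as a surface.
std::vector<std::vector<double>> gen2DField(
    const ProfileSample& sample,
    const std::vector<std::string>& events)
{
    std::vector<std::vector<double>> field;
    for (const Node& n : sample.nodes)
        for (const Context& c : n.contexts)
            for (const ThreadProfile& t : c.threads) {
                std::vector<double> row;
                for (const std::string& e : events)
                    row.push_back(t.metric(e));  // one cell per selected event
                field.push_back(row);
            }
    return field;  // field[thread][event], analogous to an ImageMesh field
}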
TAU Parallel Performance System
The TAU parallel performance system is an integrated toolkit for performance instrumentation, measurement, analysis, and visualization of large-scale parallel applications. TAU provides robust support for multi-threaded, message-passing, and mixed-mode programs written in C, C++, Fortran 77/90, and OpenMP. Multiple instrumentation options are available, including automatic source instrumentation, and the TAU measurement system supports both parallel profiling and tracing. TAU has been ported to all ASCI platforms and is used extensively in ASCI code development projects. TAU is being integrated into the Uintah Computational Framework (UCF) and will be applied to the evaluation and optimization of UCF and C-SAFE simulations.

Dr. Allen D. Malony, PI
Kai Li, Ph.D. graduate student
Sameer Shende, post-doctoral research associate
Robert Bell, research associate
Performance Research Laboratory, University of Oregon
{malony,likai,sameer,bertie}@cs.uoregon.edu
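For readers unfamiliar with TAU, a minimal sketch of manual source instrumentation using TAU's C++ macro API is shown below. The poster's Uintah integration relies on TAU's automatic instrumentation, so this hand-written example only illustrates how events enter a parallel profile; building it requires a TAU installation (e.g., via a TAU compiler wrapper such as tau_cxx.sh).

// Minimal sketch of manual TAU instrumentation; event names and the busy
// loop are illustrative only.
#include <TAU.h>

void compute()
{
    // Time this routine; it appears as an event in the parallel profile.
    TAU_PROFILE("compute()", "", TAU_USER);
    double s = 0.0;
    for (int i = 0; i < 1000000; ++i) s += i * 1e-6;  // stand-in for real work
    (void)s;
}

int main(int argc, char** argv)
{
    TAU_PROFILE_INIT(argc, argv);  // initialize the measurement system
    TAU_PROFILE_SET_NODE(0);       // single-node example (MPI rank otherwise)
    TAU_PROFILE("main()", "", TAU_DEFAULT);

    compute();
    return 0;                      // profile files are written at exit
}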