MPE/Jumpshot Evaluation Report
Adam Leko, Hans Sherburne
UPC Group, HCS Research Laboratory, University of Florida

Color encoding key:
- Blue: Information
- Red: Negative note
- Green: Positive note

2 Basic Information
Name: MPE/Jumpshot
Developer: Argonne National Laboratory
Current versions:
- MPE 1.26
- Jumpshot-4
Website:
Contacts:
- Anthony Chan
- David Ashton
- Rusty Lusk
- William Gropp

3 What Is MPE/Jumpshot?
The "quintessential" MPI logging and post-mortem visualization toolset.
MPE (Multi-Processing Environment):
- A software package for MPI programmers
- Has three main parts:
  - A tracing library that outputs all MPI calls to stdout
  - A shared-display parallel X graphics and animation library
  - A logging library for logging events
- Note: MPE/Jumpshot "logging" -> what we call tracing
Jumpshot:
- A visualization tool for logfiles created by the MPE package
- Written in Java (cross-platform)
- Provides a "timeline" (Gantt) view of MPI and program events
- Also has basic search and summary (histogram) functionality

4 Logfiles: What's In A Format?
Much thought has been put into logfile formats:
- "Traditional" tracing results in large trace files
- The trace file format can play a large part in a visualization tool's response time
ALOG: original format (Argonne LOGging format)
- Text-based format
- Visualization tool: Upshot
  - An X-windows application using the Athena widget toolset
  - Later rewritten using Tcl/Tk for easier coding; turned out to be too slow
  - Parts rewritten in C ("Nupshot"), but the Tcl->C interface kept changing
BLOG: intermediary format
CLOG:
- Binary file format created to improve upon ALOG and BLOG
- Visualization tools:
  - Jumpshot-1: complete rewrite of Upshot/Nupshot, coded in Java/AWT for cross-platform portability; bad performance, not widely used
  - Jumpshot-2: improved version using Java/Swing; slightly better performance
- By default, MPE still outputs logfiles in CLOG
  - Low overhead
  - Can be easily converted to other formats as needed
SLOG: "scalable" format
- State-based logging format
- Visualization tool: Jumpshot-3
  - Rewrite of Jumpshot-2 to use SLOG
  - Can scale to ~GB logfiles
SLOG-2: current logfile format
- Next-generation SLOG file format
- "Graphical" logfile format to speed logfile parsing
- Visualization tool: Jumpshot-4

5 MPE Overview
Tracing capability:
- Automatic instrumentation: mpicc -mpitrace
- Writes to stdout at every MPI call, e.g.:
  [1] Starting MPI_Send with count = 28, dest = 0, tag = 0
  ...
  [1] Ending MPI_Send
- Equivalent "manual" method: printf
- Very simple and intuitive
Parallel graphics ability:
- Automatic instrumentation: mpicc -mpianim -L/usr/X11R6/lib -lX11 -lm
  - Displays graphics on one machine
  - Circle for each process; arrows indicate sends/receives
  - Slows down execution considerably
- Graphics are also available via library calls
  - Calls seem relatively easy to use: MPE_Draw_string, MPE_Draw_circle, MPE_Update, etc.
  - Probably not all that useful

6 MPE Overview (2)
Logging ability:
- Automatic instrumentation: mpicc -mpilog
- Logs start and stop of events
- Can overlap starting and stopping of events
- Can add "custom" events; easy to do using library calls:
  - MPE_Log_get_event_number: creates a new event
  - MPE_Describe_state: gives a name and color to an event
  - MPE_Log_event: records an event in the logfile; uses MPI_Wtime to get a global time
- Custom events show up in Jumpshot-4 just like events from automatic instrumentation
- Conventions:
  - Automatic instrumentation uses all caps (SEND, RECV)
  - Manual instrumentation uses mixed case
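The custom-event workflow above can be sketched in a few lines of C. This is a minimal sketch, assuming an MPE-enabled MPICH installation; the "Compute" state name, color choice, and logfile prefix are illustrative, not part of MPE.

```c
/* Sketch: manual MPE logging around a computation phase.
 * Compile with (for example): mpicc -mpilog sketch.c -o sketch */
#include "mpi.h"
#include "mpe.h"

int main(int argc, char *argv[])
{
    int start_compute, end_compute;

    MPI_Init(&argc, &argv);
    MPE_Init_log();   /* redundant when linked via -mpilog, harmless otherwise */

    /* Allocate a pair of event numbers and describe the state they bracket;
     * this is what makes the state show up named and colored in Jumpshot. */
    start_compute = MPE_Log_get_event_number();
    end_compute   = MPE_Log_get_event_number();
    MPE_Describe_state(start_compute, end_compute, "Compute", "red");

    MPE_Log_event(start_compute, 0, "compute start");
    /* ... application work appears as a "Compute" interval in the timeline ... */
    MPE_Log_event(end_compute, 0, "compute end");

    MPE_Finish_log("sketch");   /* writes the CLOG logfile */
    MPI_Finalize();
    return 0;
}
```

Note the mixed-case state name, following the convention above that distinguishes manual instrumentation from the all-caps automatic events.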

7 MPE Overhead
- All programs executed correctly when instrumented
- Expect about 5% overhead for "real-world" applications
- The barrier recording mechanism has a lot of overhead
  - Most applications don't use many barriers, though

8 MPE Overhead: Barriers
Programs with large measurement overhead are shown below; the many barriers appear in yellow.
[Figures: PPerfMark diffuse procedure; PPerfMark random barrier]

9 Jumpshot Overview
Jumpshot-4 supports two types of visualizations for metrics:
- Timeline view
- Histogram view
Visualization is dependent on the SLOG-2 format and its data model:
- Real drawables
  - State: single timeline ID, start/end timestamps
  - Arrow: pair of timeline IDs, start/end timestamps
  - Event: single timeline ID, single timestamp
- Preview drawables
  - Amalgamation of real drawables
  - One corresponding type for each of the real drawables
  - Serve to optimize performance of visualization
[Figures: timeline view; histogram view]

10 Jumpshot Overview (2)
Emphasis on providing useful profile analysis from:
- A high-level (entire program execution) view
- A low-level (individual events) view
Nice features:
- Intuitive interface
- Automatically converts from CLOG to SLOG-2
- Very good support for zooming and scrolling
- Very thorough user manual
Things that could use improvement:
- Java application -> uses a lot of memory (~ MB during typical runs)
  - Memory usage seems to scale nicely with logfile size, though
- No direct support for non-event-based data (running averages, time-varying histograms for cache miss numbers, etc.)
- Documentation a little unclear/excessively technical in some places
[Figures: timeline view; histogram view]

11 Bottleneck Identification Test Suite
Testing metric: what did trace visualization tell us (automatic instrumentation)?
CAMEL: PASSED
- Identified a large number of small messages at the beginning of program execution
- Also identified sequential parts of the algorithm (sort on node 0, etc.)
- No other problems visible from the trace
NAS LU ("W" workload): PASSED
- Showed communication bottlenecks very clearly:
  - Large(!) number of small messages
  - Illustrated the time taken for repartitioning data
  - Showed sensitivity to latency for processors waiting on data from other processors

12 Bottleneck Identification Test Suite (2)
Big message: PASSED
- Traces illustrated a large amount of time spent in send and receive
Diffuse procedure: PASSED
- Traces illustrated a lot of synchronization, with one process doing more work
- Since there is no source code correlation, it was hard to tell why the problem existed
Hot procedure: FAILED
- CLOG trace file conversion failed (no communication events)
- Even if the trace had loaded, there were no communication problems to show
Intensive server: PASSED
- Traces showed that other nodes were waiting on node 0
Ping pong: PASSED
- Traces illustrated that the application was very latency-sensitive
- Much time was spent waiting for messages to arrive
Random barrier: PASSED
- Traces showed that one node was doing more work than the others
Small messages: PASSED
- Traces illustrated a large number of messages being sent to node 0
System time: FAILED
- CLOG trace file conversion failed (no communication events)
- Even if the trace had loaded, there were no communication problems to show
Wrong way: PASSED
- The first receive took a long time for its message to arrive in the trace

13 NAS LU (Class W) Visualization
[Figure annotations: much time taken for data redistribution; large number of small messages]

14 General Comments
Good things:
- Jumpshot-4 is a well-written, scalable event-based tracefile viewer
- The formats used by Jumpshot are well-defined
- Low measurement overhead in MPICH
- Mature GUI, few bugs; has been around for a long time in one form or another
- To leverage it, we just need to write logfiles in a specific format
Things that could use improvement:
- Adding support for metrics other than events would require hacking the SLOG-2 format
  - E.g., how would one show L2 miss rates as time increases?
- Automatic instrumentation is really necessary to make the tool useful
Seems like it would be best used as part of our toolkit: Jumpshot-4 can fit in as an event-based tracefile viewer if we can easily write to a format it understands.

15 Adding UPC/SHMEM Support
At a minimum, we need a mechanism to output CLOG trace files:
- The CLOG library currently uses many MPI calls (e.g., MPI_Wtime for timing information), so we cannot just insert MPE logging calls and use the MPE library unmodified
- However, the CLOG format is defined
  - We could (relatively) easily create a C implementation that uses UPC calls instead of MPI calls
  - We would need to come up with our own buffering scheme, though
    - Can't write files as data comes in; too slow
    - Should be able to reuse a lot of code from the MPE source
    - Not necessarily a problem, since we will most likely have to come up with a method if we go the tracing route anyway
We could also use the slog2sdk SDK for writing SLOG-2 files directly, but:
- The API is in Java only
- SLOG-2 may have larger creation overhead than simple event-based formats such as CLOG
Several examples (and example C code) are given for converting logfiles of arbitrary format to SLOG-2 format using slog2sdk:
- We can use our own log file format if needed!
- Recommend going with CLOG, though, so we can reuse existing code

16 Evaluation (1)
Available metrics: 1/5
- Only communication-based metrics (timeline + histograms) available
- Restricted to recording event-based metrics
Cost: 5/5 (free)
Documentation quality: 3.5/5
- Jumpshot-4 has a very good but lengthy user's manual
- slog2sdk (SDK for reading/writing SLOG-2 files) is not very clear, although SLOG-2 is also described in a lengthy paper
Extensibility: 3.5/5
- Jumpshot-4 is written in Java (easy to find Java coders at UF)
- Can easily add new events using MPE library calls
- Adding time-varying metrics (histograms, etc.) would require writing code from scratch
Filtering and aggregation: 3/5
- Can restrict which event types are displayed from a trace
- Preview drawables and histograms provide aggregation abilities
- Does not filter or aggregate data directly when recording
Hardware support: 4/5
- 64-bit Linux (Opteron, Itanium), Tru64 (AlphaServer), IRIX, IBM SP (AIX), Cray MPI, and many more
- Can be used with any MPICH or LAM installation
Heterogeneity support: 0/5 (not supported)

17 Evaluation (2)
Installation: 5/5
- About as easy as you could expect
- Zero effort if using MPICH already; compiling from source is also easy
Interoperability: 0.5/5
- No way provided to export SLOG-2 files to other viewers
- Example code provided in slog2sdk shows how to convert existing formats into SLOG-2 format
Learning curve: 4.5/5
- Easy to learn, well-written documentation
- MPE really easy to use (mpicc -mpilog)
Manual overhead: 1/5
- All MPI calls automatically instrumented when linking against MPE
- Adding other events requires manual work (not much, though)
- No way to turn tracing on/off in places without recompilation
Measurement accuracy: 5/5
- CAMEL overhead < 1%
- Correctness of programs not affected
- Measurements seem accurate to the millisecond (relies on MPI_Wtime resolution, though)
- Only large numbers of messages (10^6 or more back-to-back) or frequent barriers seem to introduce any appreciable overhead

18 Evaluation (3)
Multiple executions: 0/5 (not supported)
Multiple analyses & views: 2/5
- Only shows timeline and histograms (but does both very well)
- Excellent zooming and scrolling features (scalable to GB logfiles)
Performance bottleneck identification: 4.5/5
- No automatic methods supported
- Traces do a very good job of showing communication and synchronization bottlenecks
- Can also use custom events to indirectly determine some types of bottlenecks (e.g., load imbalance)
Profiling/tracing support: 3/5
- Only supports tracing
- Trace format is compact and scalable, so the viewer can comfortably show GB logfiles
- Automatic tracing is either entirely on or entirely off
- Turning manual tracing on/off requires code modification and recompilation

19 Evaluation (4)
Response time: 2/5
- No results until after the run
- For an 850 MB CLOG tracefile:
  - Converting to SLOG-2 took 5 minutes
  - Opening the resulting 350 MB SLOG-2 file took about 10 seconds
- However, large trace files will be slower than a method that incorporates more filtering and aggregation
  - A limitation of the tracing method, not of the tool implementation
Software support: 3/5
- Supports C & Fortran
- Tied closely to MPI applications
- Supports linking with any library supported by GCC/the platform C compiler, but linked libraries will not be profiled unless they contain MPI calls
Source code correlation: 1/5
- Not directly supported
- Can correlate indirectly by using custom events at function entry/exit points
Searching: 1.5/5
- Only a simple search function available

20 Evaluation (5)
System stability: 4.5/5
- MPE very stable (no problems observed)
- Jumpshot-4 has very few bugs (small ones exist but do not get in the way)
- Extremely good for a freely downloadable research project
Technical support: 4/5
- Jumpshot-4 gives very good error messages
- Developers responded within 24 hours
- Developers were willing to help point us in the right direction for writing SLOG-2 files using their APIs