HPCToolkit Evaluation Report
Hans Sherburne, Adam Leko
UPC Group, HCS Research Laboratory, University of Florida

Color encoding key:
Blue: Information
Red: Negative note
Green: Positive note

2 Basic Information
Name: HPCToolkit
Developer: Rice University
Current versions:
- HPCView:
Website:
Contact:
- John Mellor-Crummey
- Rob Fowler

3 HPCToolkit Overview
HPCToolkit is a suite of tools that aid the programmer in collecting, organizing, and displaying profile data. It consists of:
- hpcviewer
  - Sorts by any collected metric, from any of the processes displayed
  - Displays samples at various levels in the call hierarchy through "flattening"
  - Allows the user to focus on interesting sections of the program through "zooming"
- hpcquick
  - Simplifies the process by integrating hpcprof and hpcview
- hpcview
  - Creates "browsable" performance databases in HTML, or for use in hpcviewer
- bloop
  - Relates samples to loops, even if significant changes have been made by optimization
- hpcprof
  - Relates samples to source lines
- hpcrun
  - Collects profiles by sampling hardware performance counters

4 Available Metrics in HPCToolkit
Metrics obtained by sampling/profiling:
- PAPI hardware counters
- Any other source of data profiles that can output data in the "profile-like input format" (not tested)
- Wallclock time (WALLCLK)
  - However, PAPI metrics and wallclock time cannot be collected in a single run
Derived metrics:
- Combinations of existing metrics, created by specifying a mathematical formula in an XML configuration file (e.g., cycles per instruction as PAPI_TOT_CYC / PAPI_TOT_INS)
Source code correlation:
- Metrics reflect exclusive time spent in a function, based on counter overflow events
- Metrics are correlated at the source line level and the loop level
- Metrics are related back to source code loops even if the code has been significantly altered by optimization ("bloop")
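
To make the counter-overflow mechanism concrete, below is a minimal sketch of overflow-based sampling using the PAPI C API, which hpcrun builds on. This is illustrative only, not HPCToolkit source; the threshold value and the work loop are arbitrary choices.

#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

static long long samples = 0;

/* Invoked each time the counter crosses the threshold; a real profiler
   such as hpcrun would record the interrupted program counter in a
   histogram keyed by address rather than just counting. */
static void handler(int event_set, void *address,
                    long long overflow_vector, void *context)
{
    samples++;
}

int main(void)
{
    int event_set = PAPI_NULL;
    long long total;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);
    PAPI_create_eventset(&event_set);
    PAPI_add_event(event_set, PAPI_TOT_CYC);
    /* Deliver one sample every 10,000,000 cycles (arbitrary threshold). */
    PAPI_overflow(event_set, PAPI_TOT_CYC, 10000000, 0, handler);
    PAPI_start(event_set);

    volatile double x = 0.0;              /* the work being "profiled" */
    for (long i = 0; i < 200000000L; i++)
        x += (double)i * 0.5;

    PAPI_stop(event_set, &total);
    printf("total cycles: %lld, overflow samples: %lld\n", total, samples);
    return 0;
}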

5 Main Window in hpcviewer

6 Testing Notes
Used LAM instead of MPICH for testing:
- When MPICH's mpirun is used with hpcrun, hpcrun complains about a "-p" option, even though none was given
- Needed to reduce the message size in big-message.c because of LAM
- Unable to get NPB LU to run using LAM
Major stumbling blocks for bottleneck identification with HPCToolkit:
- Since profile data is related to the actual function rather than back to the callsite in the user's code, it is difficult to determine where in the user's code the problem lies
- Profiling wallclock time was glitchy; some profiles contained very little useful information

7 HPCToolkit Overhead
- All programs executed correctly when instrumented
- <20% overhead on all benchmarks when recording just PAPI_TOT_CYC (the default option)

8 Bottleneck Identification Test Suite
Testing metric: what did the profile data tell us?
CAMEL: TOSS-UP
- Profile showed work equally distributed across the processes
- Unable to determine communication costs from PAPI hardware counters
NAS LU: NOT TESTED
- Unable to get the LU benchmark to run successfully using LAM
- LAM was needed because MPICH could not be made to work with hpcrun
Big message: TOSS-UP
- Profiling wallclock time did not produce a profile with information in it
- Cycle counts are misleading and do not reveal time spent in communication

9 Bottleneck Identification Test Suite (2)
Diffuse procedure: PASSED
- Profile showed a large amount of time spent in the bottleneck procedure
- Time is diffused across processes
Hot procedure: PASSED
- Profile showed a large amount of time spent in the bottleneck procedure
Intensive server: TOSS-UP
- Profile showed a large amount of time spent in waste_time on one process
- The other processes show time spent in functions outside of user code, which is difficult to use for bottleneck identification
Ping pong: TOSS-UP
- From the profile it is clear that, within user code, the time is spent in two different loops (see the sketch at the end of this list)
- Profile shows time spent in functions outside of user code, which is difficult to use for bottleneck identification
Random barrier: TOSS-UP
- Profile shows a lot of time spent in waste_time
- Profile does not show the communication pattern among processes
Small messages: TOSS-UP
- Profile reveals that only one process spends time in Grecv_messages
- Profile shows time spent in functions outside of user code, which is difficult to use for bottleneck identification
System time: TOSS-UP
- Profile shows a lot of time spent in kill and execlp
- It is difficult to relate this information back to the callsite in waste_time
Wrong way: FAIL
- Profile does not show the communication pattern among processes
- Profile shows time spent in functions outside of user code, which is difficult to use for bottleneck identification
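
For reference, below is a minimal sketch of a ping-pong-style kernel (the actual test suite source is not reproduced here, so names, counts, and message sizes are assumptions). Nearly all of its wallclock time is spent inside MPI_Send/MPI_Recv, which illustrates why a flat, exclusive-time profile attributes the cost to MPI library internals rather than to these callsites in user code.

#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i;
    char buf[65536];                      /* message size is arbitrary */

    memset(buf, 0, sizeof buf);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Almost all time accrues inside the MPI library below, so a
       PC-sampling profiler charges it to internal MPI routines. */
    for (i = 0; i < 1000; i++) {
        if (rank == 0) {                  /* "ping" */
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {           /* "pong" */
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}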

10 General Comments
Good notes:
- The components of HPCToolkit work well for sequential code
- Provides access to the available (native event) PAPI counters on the system
- New metrics can be derived from sampled metrics using hpcview
- Data is correlated with source code
Things that could use improvement:
- Only a simple display of profiled metrics and source code correlation is provided
- Whether a metric should be created, hidden, or shown in hpcviewer must be specified before it is run
- Collecting multiple metrics may require multiple runs
- Parallel code may be difficult to analyze: different methods for launching parallel programs achieve varying levels of ease and usefulness with hpcrun
- Requires that line-mapping information be present in all executables/libraries to be analyzed (the "-g" option in many compilers)
- The ability to display inclusive time spent at callsites in user code, rather than exclusive time spent in all (library) functions, would increase the usefulness of the tool tremendously (illustrated below)
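
The following hypothetical example (function names and loop counts are invented for illustration) shows why inclusive time at callsites matters: under exclusive-time, flat profiling, nearly all samples land in the leaf routine, and nothing in the profile distinguishes the caller responsible for 90% of the time from the one responsible for 10%.

#include <stdio.h>

/* All PC samples are attributed here under exclusive-time profiling. */
static double leaf(long n)
{
    double s = 0.0;
    long i;
    for (i = 0; i < n; i++)
        s += (double)i * 0.5;
    return s;
}

static double caller_a(void) { return leaf(900000000L); } /* ~90% of runtime */
static double caller_b(void) { return leaf(100000000L); } /* ~10% of runtime */

int main(void)
{
    /* A flat profile reports ~100% of time in leaf(); inclusive time at
       the two callsites would immediately expose caller_a as the hot path. */
    printf("%f\n", caller_a() + caller_b());
    return 0;
}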

11 Evaluation (1)
Available metrics: 2/5
- Uses hardware counters only (PAPI)
- New metrics can be derived from existing ones
- No statistics regarding communication are provided
Cost: 5/5
- HPCToolkit is freely available
Documentation quality: 2.5/5
- Documentation is in the form of a PowerPoint presentation and man pages
- One comprehensive user manual would be helpful
Extensibility: 3.5/5
- HPCToolkit source code is freely available
- No tracing support
- Very good source code correlation
- Requires the use of PAPI for hpcrun (profile creation)
Filtering and aggregation: 3.25/5
- User can add and hide columns
- Filtering requires manual editing and can only be done on a per-node basis

12 Evaluation (2)
Hardware support: 2/5
- 64-bit Linux (Opteron and Itanium) with PAPI, IRIX, AlphaServer (Tru64)
Heterogeneity support: 0/5 (not supported)
Installation: 4/5
- Installation on the Linux platform is not bad
- Requires PAPI to be installed
Interoperability: 3/5
- Profile data is stored in XML format
- Works with SGI's ssrun and Compaq's uprofile
Learning curve: 3.5/5
- The interface is fairly intuitive, but it takes some use to get comfortable with the notion of "flattening"
- The separation of the tools for platform support increases user overhead
Manual overhead: 4/5
- Default instrumentation (the only option available) has the same effect as instrumenting all functions, loops, MPI calls, and function calls
- It is fairly straightforward to measure at the source line and loop level
- It is not possible to turn sampling on and off for selected parts of the source code
- Specifying derived metrics in XML is awkward
Measurement accuracy: 2/5
- CAMEL overhead: 17%
- Overhead is less than 20% when recording a single PAPI hardware counter

13 Evaluation (3)
Multiple executions: 3/5
- Comparison of metrics from multiple runs is possible
- There is no built-in scalability or optimization comparison, but one can be created using MathML expressions
Multiple analyses & views: 2/5
- A single view of profile data correlated with source code is provided
- Only profile data (not trace data) is viewable
- Comparison and ordering of hardware counter values is the only form of analysis
Performance bottleneck identification: 1/5
- All metrics can be sorted in increasing or decreasing order
- The "flattening" approach somewhat increases ease of comparison
- Bottleneck identification requires significant user insight when selecting which hardware counters to use and in locating points for improvement
- MPI time was sometimes not attributed to MPI callsites (instead it was attributed to internal LAM MPI routines)
- Seems better suited to sequential programs
Profiling/tracing support: 2.5/5
- Only profiling is supported
- Hardware counters must be used
- Profiling is done at the source line and loop level
- Communication profiling is not available
- Data from routines inside third-party libraries can be recorded

14 Evaluation (4)
Response time: 2.5/5
- Data is not available in HPCToolkit until after execution completes and performance data is processed
Searching: 0/5 (not supported)
Software support: 4/5
- Supports sequential and parallel programs
- Difficulty running with MPICH, even though it is mentioned in the tutorial presentation
- Profile information will show up for all binaries with debugging information present
Source code correlation: 5/5
- Source code correlation of profile data is the main view offered
System stability: 4/5
- hpcviewer works well
- Did not work well with MPICH
Technical support: 4/5
- Received timely and helpful responses from the developers

15 References
1. HPCToolkit website
2. HPCToolkit SC Tutorial Presentation