mpiP Evaluation Report
Hans Sherburne, Adam Leko
UPC Group, HCS Research Laboratory, University of Florida

2 Basic Information
Name: mpiP
Developers: Jeffrey Vetter (ORNL), Chris Chambreau (LLNL)
Current version: mpiP v2.8
Website:
Contacts:
- Jeffrey Vetter:
- Chris Chambreau:

3 mpiP: Lightweight, Scalable MPI Profiling
mpiP is a simple, lightweight tool for profiling MPI applications.
Gathers information through the MPI profiling (PMPI) layer, as sketched below
- Probably not a good candidate to be extended for UPC or SHMEM
Supports many platforms running Linux, Tru64, AIX, UNICOS, and IBM BG/L
Very simple to use, and the output file is very easy to understand
- Provides statistics for the top twenty MPI calls, ranked by time spent in the call and by total size of messages sent; also provides statistics for MPI I/O
- Callsite traceback depth is variable, allowing the user to differentiate between and examine the behavior of routines that are wrappers for MPI calls
An mpiP viewer, Mpipview, is available as part of Tool Gear
Some of mpiP's functionality is exposed to developers through an API: stack walking, address-to-source translation, symbol demangling, timing routines, and accessing the name of the executable
- These functions might be useful if source-code correlation is to be included in a UPC or SHMEM tool
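Because mpiP works through the PMPI layer, every MPI function has a PMPI_-prefixed twin, and a tool's wrapper can shadow the MPI_ name while deferring to the PMPI_ entry point. The sketch below illustrates this general mechanism; it is not mpiP's actual source, the accumulator variables are hypothetical, and an MPI-3 mpi.h is assumed for the const-qualified signature:

    /* Sketch of PMPI-based interception, the mechanism mpiP relies on.
     * Linking wrappers like this ahead of the MPI library makes this
     * MPI_Send shadow the real one; PMPI_Send reaches the real routine. */
    #include <mpi.h>

    /* Hypothetical per-callsite accumulators. */
    static double send_time = 0.0;
    static long long send_bytes = 0, send_count = 0;

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        send_time += MPI_Wtime() - t0;    /* time spent inside the call */
        PMPI_Type_size(datatype, &size);  /* bytes sent by this call    */
        send_bytes += (long long)size * count;
        send_count++;
        return rc;
    }

This is also why instrumentation is automatic once an application is relinked against the mpiP library: no source changes are required.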

4 What is mpiP Useful For?
The data collected by mpiP is useful for analyzing the scalability of parallel applications. By examining the aggregate time and the rank correlation of the time spent in each MPI call versus the total time spent in MPI calls as the number of tasks increases, one can locate flaws in load balancing and algorithm design. This technique is described in [1], "Statistical Scalability Analysis of Communication Operations in Distributed Applications", Vetter, J. & McCracken, M.
(The original slide shows example figures courtesy of [1].)
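Since mpiP reports the raw per-callsite and total MPI times but leaves the correlation analysis to the user (see the next slide), a small post-processing step is needed. Below is a hedged sketch in C of that step: Spearman's rank correlation between one callsite's time and total MPI time across runs at increasing task counts. The sample values are hypothetical, and the no-ties form of the formula is used for simplicity:

    /* Sketch: rank correlation of one callsite's MPI time against total
     * MPI time across several runs, as in the scalability analysis of [1].
     * Input values are assumed to have been read from mpiP report files.
     * No-ties Spearman formula: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)). */
    #include <stdio.h>

    /* Rank of x[i] within x[0..n-1] (1 = smallest); O(n^2) is fine here. */
    static void ranks(const double *x, int n, double *r)
    {
        for (int i = 0; i < n; i++) {
            int less = 0;
            for (int j = 0; j < n; j++)
                if (x[j] < x[i]) less++;
            r[i] = less + 1.0;
        }
    }

    static double spearman(const double *x, const double *y, int n)
    {
        double rx[64], ry[64], d2 = 0.0;  /* assumes at most 64 runs */
        ranks(x, n, rx);
        ranks(y, n, ry);
        for (int i = 0; i < n; i++)
            d2 += (rx[i] - ry[i]) * (rx[i] - ry[i]);
        return 1.0 - 6.0 * d2 / (n * ((double)n * n - 1.0));
    }

    int main(void)
    {
        /* Hypothetical measurements from four runs at 2, 4, 8, 16 tasks. */
        double callsite_time[]  = {1.2, 2.9, 6.5, 14.8};
        double total_mpi_time[] = {3.1, 5.0, 9.7, 21.3};
        printf("rho = %f\n", spearman(callsite_time, total_mpi_time, 4));
        return 0;
    }

A callsite whose time keeps pace with total MPI time as task counts grow (rho near 1) is a candidate scalability bottleneck, which is the load-balancing signal the slide describes.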

5 The Downside…
mpiP does provide the measurements of aggregate callsite time and total MPI call time necessary for computing the rank correlation coefficient
mpiP does NOT automate the process of computing the rank correlation, which must use data from multiple experiments
Equations for calculating the coefficients of correlation (linear and rank), care of [1]:
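The equations themselves were images on the original slide; the standard definitions of the linear (Pearson) and rank (Spearman, no-ties form) coefficients are reproduced below and are assumed to match those in [1]. For n paired samples (x_i, y_i), with d_i the difference between the ranks of x_i and y_i:

    r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

    \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}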

6 Partial Sample of mpiP Output

7 Information Provided by mpiP
Information is displayed in terms of task assignments and callsites, which correspond to machines and MPI calls in the source code, arranged in the following sections:
- Time per task (AppTime, MPITime, MPI%)
- Location of callsite in source code (callsite, line #, parent function, MPI call)
- Aggregate time per callsite (top twenty) (time, app%, MPI%, variance)
- Aggregate sent message size per callsite (top twenty) (count, total, avg., MPI%)
- Time statistics per callsite per task (all) (max, min, mean, app%, MPI%)
- Sent message size statistics per callsite per task (all) (count, max, min, mean, sum)
- I/O statistics per callsite per task (all) (count, max, min, mean, sum)

8 mpiP Overhead

9 Source Code Correlation in Mpipview

10 Bottleneck Identification Test Suite
Testing metric: what did the profile data tell us?
CAMEL: TOSS-UP
- Profile showed that MPI time is a small percentage of overall application time
- Profile reveals some imbalance in the amount of time spent in certain calls, but doesn't help the user understand the cause
- Profile does not provide information about what occurs when execution is not in MPI calls
- Difficult to grasp overall program behavior from profiling information alone
NAS LU: TOSS-UP
- Profile reveals that MPI function calls consume a significant portion of application time
- Profile reveals some imbalance in the amount of time spent in certain calls, but doesn't help the user understand the cause
- Profile does not provide information about what occurs when execution is not in MPI calls
- Difficult to grasp overall program behavior from profiling information alone

11 Bottleneck Identification Test Suite (2)
Big message: PASSED
- Profile clearly shows that Send and Recv dominate the application time
- Profile shows a large number of bytes transferred
Diffuse procedure: FAILED
- Profile showed a large amount of time spent in barriers
- Time is diffused across processes
- Profile does not show that in each barrier a single process is always delaying completion
Hot procedure: FAILED
- No profile output, due to no MPI calls (other than setup and breakdown)
Intensive server: PASSED
- Profile showed one process spent very little time in MPI calls, while the remaining processes spent nearly all their time in Recvs
- Profile showed one process sent an order of magnitude more data than the others, and spent far more time in Send
Ping pong: PASSED (see the sketch after this list)
- Profile showed time spent in MPI function calls dominated the total application time
- Profile showed an excessive number of Sends and Recvs with little load imbalance
Random barrier: PASSED
- Profile shows that the majority of execution time is spent in Barrier, called by processes not holding the "potato"
Small messages: PASSED
- Profile clearly shows a single process spends almost all of the total application time in Recv, and receives an excessive number of messages sent by all the other processes
System time: FAILED
- No profile output, due to no MPI calls (other than setup and breakdown)
Wrong way: TOSS-UP
- One process spends most of its execution time in sends; the other spends most of its time in receives
- Profile does not reveal the improperly ordered communication pattern
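For reference, a ping-pong test of the kind graded above bounces a message between two ranks many times; the minimal sketch below illustrates the pattern (the message size and iteration count are arbitrary choices, not the suite's actual parameters):

    /* Minimal MPI ping-pong kernel: ranks 0 and 1 bounce a buffer back
     * and forth. Under a profiler like mpiP, the Send/Recv callsites
     * dominate the report with little load imbalance. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        char buf[1024] = {0};                /* arbitrary 1 KB payload    */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < 10000; i++) {    /* arbitrary iteration count */
            if (rank == 0) {
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }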

12 Evaluation (1)
Available metrics: 1/5
- Only provides a handful of statistics about the time, message size, and frequency of MPI calls
- No hardware counter support
Cost: 5/5 (free)
Documentation quality: 4/5
- Though brief (a single webpage), the documentation adequately covers installation and available functionality
Extensibility: 2/5
- mpiP is designed around the MPI profiling layer, so it could not be readily adapted to UPC or SHMEM and would be of little use there
- The source code correlation functions work well
Filtering and aggregation: 2/5
- mpiP was designed to be lightweight, and presents statistics for the top twenty callsites
- Output size grows with the number of tasks (machines)
Hardware support: 5/5
- 64-bit Linux (Itanium and Opteron), IBM SP (AIX), AlphaServer (Tru64), Cray X1, Cray XD1, SGI Altix, IBM BlueGene/L
Heterogeneity support: 0/5 (not supported)

13 Evaluation (2)
Installation: 5/5
- About as easy as you could expect
Interoperability: 1/5
- mpiP has its own output format
Learning curve: 4/5
- Easy to use
- Simple statistics are easily understood
Manual overhead: 1/5
- All MPI calls are automatically instrumented when linking against the mpiP library
- No way to turn profiling on/off in places without relinking
Measurement accuracy: 4/5
- CAMEL overhead: ~5%
- Correctness of programs is not affected
- Overhead is low (less than 7% for all test suite programs)

14 Evaluation (3)
Multiple executions: 0/5 (not supported)
Multiple analyses & views: 2/5
- Statistics regarding MPI calls are displayed in the output file
- Source code location to callsite correlation is provided by Mpipview
Performance bottleneck identification: 2.5/5
- No automatic methods supported
- Some bottlenecks can be deduced by examining the gathered statistics
- Lack of trace information makes some bottlenecks impossible to detect
Profiling/tracing support: 2/5
- Only supports profiling
- Profiling can be enabled for various regions of code by editing the source code, as sketched below
- Turning profiling on/off requires recompilation (the documentation gives a runtime environment variable for deactivating profiling, and the profile output file acknowledges it when set, but profiling is not actually disabled)
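Region-based control of this kind typically goes through the standard MPI_Pcontrol hook, which PMPI-layer tools such as mpiP can interpret. A minimal sketch follows, assuming the convention that level 0 disables collection and level 1 enables it; the phase functions are hypothetical:

    /* Sketch: scoping profiling data collection to one region using
     * MPI_Pcontrol. MPI_Pcontrol is a standard MPI call whose level
     * argument is tool-defined; 0 = disable, 1 = enable is assumed. */
    #include <mpi.h>

    void setup_phase(void);    /* hypothetical uninteresting region */
    void solver_phase(void);   /* hypothetical region of interest   */

    void run(void)
    {
        MPI_Pcontrol(0);   /* stop collecting: ignore setup traffic   */
        setup_phase();

        MPI_Pcontrol(1);   /* collect only for the region of interest */
        solver_phase();

        MPI_Pcontrol(0);   /* stop again before teardown              */
    }

Since the Pcontrol calls live in the source, changing the profiled region requires recompiling, which is the limitation the slide notes.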

15 Evaluation (4)
Response time: 3/5
- No results until after the run
- Quickly assembles the report at the end of an experimental run
Searching: 0/5 (not supported)
Software support: 3/5
- Supports C, C++, Fortran
- Supports a large number of compilers
- Tied closely to MPI applications
Source code correlation: 4/5
- Source code line numbers are provided for each MPI callsite in the output file
- Automatic source code correlation is provided by Mpipview
System stability: 5/5
- mpiP and Mpipview work very reliably
Technical support: 5/5
- Co-author Chris Chambreau responded quickly and provided good information, allowing us to correct a problem with one of our benchmark apps