CEPBA Tools (DiP) Evaluation Report
Adam Leko, Hans Sherburne, UPC Group
HCS Research Laboratory, University of Florida
Color encoding key: Blue: Information, Red: Negative note, Green: Positive note

2 Basic Information
- Name: Dimemas, MPITrace, Paraver
- Developer: European Center for Parallelism of Barcelona (CEPBA)
- Current versions:
  - MPITrace 1.1
  - Paraver 3.3
  - Dimemas 2.3
- Website:
- Contact: Judit Gimenez

3 DiP Overview
- DiP = Dimemas, Paraver
  - Toolset used for improving the performance of parallel programs
  - Created by CEPBA ca. 1992/93, still in development
  - Has three main components:
    - Trace collection
      - MPITrace for MPI programs
      - OMPTrace for OpenMP programs (not evaluated)
      - OMPITrace for hybrid OpenMP/MPI programs (not evaluated)
    - Trace visualization: Paraver
    - Trace simulation: Dimemas
      - Uses MPIDTrace for instrumentation
- Workflow encouraged by DiP: a "measure-modify" cycle (pictured at right): write code, instrument (MPITrace), examine tracefile (Paraver), hypothesize about bottlenecks, verify via simulation (Dimemas), fix bottlenecks, test new hypothesis

4 MPITrace Overview
- Automatically profiles all MPI commands using the MPI profiling interface
  - Compilation command: mpicc -L/path/to/mpitrace/libs \
      -L/path/to/papi/libs -lmpitrace -lpapi \
  - Can record other information too:
    - Hardware counters via PAPI (MPItrace_counters)
    - Custom events (MPItrace_event)
- Requires special runtime wrapper script to produce tracefile
  - Command: mpitrace mpirun
  - mpitrace requires a license to run
  - mpitrace must be started from a machine listed in the license file

5 MPITrace Overview (2)
- After running mpitrace, several .mpit files are created (one per MPI process)
  - Collect them into a single tracefile with the command: mpi2prv -syn *.mpit
  - The -syn flag is necessary to line up events correctly (not mentioned in the docs [1])
  - This command creates a single logfile (.prv) and a Paraver config file (.pcf)
  - The .pcf file also contains names and colors of custom events
- Tracefile format
  - ASCII (plain text), well documented (see [1])
  - Can get to be quite large
  - .prv files can be converted to a faster-loading, platform-dependent, undocumented binary format via the prv2log command
- Was never able to get hardware counters working
  - Took several tries to get any tracefile to be created
  - PAPI installed with no problems on Kappas 1-8
  - No errors, but no hardware counter events in tracefile!
  - Rest of review assumes that this can be fixed given enough time
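To make the workflow on slides 4-5 concrete, here is a minimal sketch (not taken from the report) of an instrumented MPI program; the MPItrace_event/MPItrace_counters prototypes and the build/run/merge commands in the comments are assumptions based on the slides and should be verified against the MPITrace user's guide [1].

    /* Minimal sketch: instrumenting an MPI code for MPITrace.
     * Assumed build/run/merge steps, per slides 4-5:
     *   mpicc trace_demo.c -L/path/to/mpitrace/libs -L/path/to/papi/libs -lmpitrace -lpapi -o trace_demo
     *   mpitrace mpirun -np 4 ./trace_demo    (run under the mpitrace wrapper; requires a license)
     *   mpi2prv -syn *.mpit                   (merge per-process .mpit files into .prv/.pcf)
     */
    #include <mpi.h>

    /* Assumed prototypes for the MPITrace user API; verify names/arguments against [1]. */
    extern void MPItrace_event(unsigned int type, unsigned int value);
    extern void MPItrace_counters(void);

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPItrace_event(1000, 1);      /* mark start of a user-defined phase (type 1000, value 1) */
        MPItrace_counters();          /* record PAPI hardware-counter readings at this point */

        /* ... computation and MPI communication to be traced ... */
        MPI_Barrier(MPI_COMM_WORLD);  /* MPI calls are traced automatically via the profiling interface */

        MPItrace_event(1000, 0);      /* mark end of the phase (value 0) */

        MPI_Finalize();
        return 0;
    }

The custom event type/value pair (1000, 1/0) is purely illustrative; the .pcf file produced by mpi2prv is where such events receive their names and colors.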

6 MPITrace Overhead
- All programs executed correctly when instrumented
- Benchmarks marked with a star had high variability in execution time
  - Readings with stars are probably not accurate
- Based on the LU benchmark, expect ~30% tracing overhead
  - More communication == more overhead
- Wasn't able to test overhead of hardware counter instrumentation

7 Paraver Overview
- Four main pieces of Paraver (see right):
  - Filtering
  - Semantic module
  - Visualization: graphical timeline, text
  - Analysis (1D/2D)
- Complex piece of software!
  - Had to review several documents to get a feel for how to use it [2, 3, 4, 5]
  - Tutorial is short but not too clear
  - Reference manual is the best documentation, but lengthy
(Image courtesy [2])

8 Paraver: Process/Resource Models
- Process model (figure courtesy [3])
- Resource model (figure courtesy [3])

9 Paraver: Graphical Timeline
- Graphic display uses a standard timeline view
  - Event view similar to Jumpshot, Upshot, etc. (right, top)
  - Can also display time-varying data like global CPU utilization (right, bottom)
  - Tool can display more than one trace file at a time
- Uses a "tape" metaphor instead of scrolling
  - Play, pause, rewind to beginning, fast forward to end
  - Cumbersome and nonintuitive
  - Breaks intuition of what scroll bars do (scroll bars do not scroll the window)
  - Moving the window creates animations, which slows things down compared to regular scrolling
  - Interface is workable, but takes some getting used to
- Zooming always brings up another window
  - Quickly results in many open windows
  - This complexity is handled by a save/restore open windows function
  - Save/restore windows is a nice feature
- Interface is generally snappy
- Uses an ugly widget set by today's standards

10 Paraver: Text Views
- Provide very detailed information about trace files
  - Textual listing of events
  - Which events happen when
- Accessed by clicking on the graphical timeline

11 Paraver: 1D/2D Analysis
- 1D analysis (right, top)
  - Shows statistics about various types of events
  - Shown per thread, as text or as a histogram
- 2D analysis (right, bottom)
  - Shows statistics for one event type between pairs of threads
  - Item chosen by the semantic module
  - Uses color to encode information (high variance, max/min)
- Analysis mode takes into account the filter and semantic modules (described next)
- Very complex and user-unfriendly, but
  - Allows complicated analyses to be performed; can easily reconstruct most "normal" profiling information

12 Paraver: Filter Module
- Filter module allows filtering of events before they are
  - Shown in the timeline
  - Processed by the semantic module
  - Analyzed by the 1D/2D analyzers
- Can filter events by communication parameters
  - Who sends/receives the message
  - Message tag (MPI tag)
  - Logical times (when send/receive functions are called) or physical times (when the send/receive actually takes place)
  - Combinations of ANDs/ORs of the above
- Also by user events
  - Type and/or value
- Interface for filtering events is straightforward

13 Paraver: Semantic Module
- Interface between the raw tracefile data and what the user sees
  - Sits above the filter module, below the visualization modules
  - Makes heavy use of the runtime/process model
- Uses three different methods for getting values
  - Work with the process model (see slide 8): application, task, thread, and workload levels
  - Work with the available system resources (see slide 8): node, CPU, and system levels
  - Combine different existing views, e.g., combine TLB misses with loads for average TLB miss ratios (see below)
- In a few words: controls how trace file information is displayed
  - Flexible way of displaying disparate types of information (communication vs. hardware counters)
  - Can take a lot of work to get Paraver to show the information you're looking for
  - Saved window configurations can help greatly here (perform the steps only once, use for all traces later on)
- Easily the most confusing aspect of Paraver
  - Documentation doesn't necessarily help with this
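As a worked instance of the combined view mentioned above, the derived TLB metric is simply a ratio of two hardware-counter values recorded in the trace:

    \text{average TLB miss ratio} = \frac{\text{TLB misses}}{\text{loads}}

computed per thread over whatever time interval the semantic module is asked to evaluate.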

14 Dimemas Overview
- Uses a generic "network of SMPs" model to perform trace-driven simulation
- Outputs trace files that can be directly visualized by Paraver
- Uses a different tracefile format for input than Paraver
- Was never able to get this to work
  - "dimemas" GUI crashed; Java version works, but other problems exist...
  - "Dimemas" complained about a missing license even though one was in $DIMEMAS_HOME/etc/license.dat
  - Need MPIDTrace?
  - Rest of evaluation based on available documentation [4, 5, 6]

15 Dimemas: Architectural/Process Model
- Simulated architecture: network of SMPs
- Parameters for the interconnection network
  - Number of buses (models resource contention)
  - Bisection bandwidth of the network
  - Full-duplex/half-duplex links (from node to bus)
- Parameters for nodes
  - Bandwidth and latency for intra-node communication
  - Latency for inter-node communication
  - Processor speed (uses a linear speedup model; see below)
- Parameters for existing systems are collected (manually) via microbenchmarks
- Uses the same process model as Paraver
  - Application (Ptask), task, thread levels
  - Can model MPI, OMP, and hybrid models with this process model
(Image courtesy [5])
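The slides do not define the linear speedup model; the usual reading (an assumption here) is that each computation burst recorded in the trace is simply rescaled by a relative processor-speed factor \rho for the simulated machine:

    t_{\mathrm{sim}} = \frac{t_{\mathrm{measured}}}{\rho}

i.e., only the duration of computation changes, not its structure or overlap with communication.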

16 Dimemas: Communication Model
- Figures to the right illustrate the timing information that is simulated
- Point-to-point communication model (right, top)
  - Straightforward model based on latencies, bandwidth, and contention (bus model); see the sketch below
- Collective communication model (right, bottom)
  - Implicit barrier before all collective operations
  - Two phases: fan in, fan out
  - Collective communication time represented 3 ways (selected by user): constant, linear, logarithmic
- User specifies parameters
  - Located in special Dimemas "database" text files
  - Existing set covers IBM SP, SGI Origin 2000, and a few others
(Image courtesy [5])
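A hedged sketch of the point-to-point timing described above (the exact equations are given in [5], not in these slides): after waiting for one of the configured buses to become free, a message of S bytes on a link with latency L and bandwidth B is charged roughly

    T_{\mathrm{p2p}} \approx L + \frac{S}{B} + T_{\mathrm{bus\ wait}}

with the number of buses limiting how many transfers can be in flight simultaneously.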

17 Dimemas: Accuracy, Other Features
- Accuracy
  - On trivial applications (ping-pong), expected error with correct parameters is less than 12% [4]
  - Collective communication model for MPI verified in [6] on the NAS benchmark suite
    - Most applications within 30% accuracy (IS.A.8 jumped to over 150% error)
- Other features
  - Critical path selection: starts at the end, shows the dependency path back to the beginning of the critical path
  - Sensitivity analysis (factorial analysis, vary parameters within 10%)
  - "What-if" analysis
    - Can adjust the time taken for each function call to see what would happen if you could write a faster version
    - Can also answer questions like "what would happen if we double our bandwidth?"
- Simulation time: unknown (not reported in any documentation)
  - Only communication events are simulated
  - Therefore, assume simulation time is proportional to the amount of communication
  - Also, uses a simple (coarse bus-based) contention model, so simulation times should be reasonable

18 Bottleneck Identification Test Suite
- Testing metric: what did trace visualization tell us (automatic instrumentation)?
  - Assumed a fully functional installation of Paraver and Dimemas
- CAMEL: PASSED
  - Identified large number of small messages at the beginning of program execution
  - Assuming hardware counters worked, could also identify sequential parts of the algorithm (sort on node 0, etc.)
- NAS LU ("W" workload): PASSED
  - Showed communication bottlenecks very clearly
    - Large(!) number of small messages
    - Illustrated time taken for repartitioning data
    - Shows sensitivity to latency for processors waiting on data from other processors
  - Could use Dimemas to pinpoint the latency problem by testing on an ideal network with no/little latency
  - Moderately sized trace file (62 MB) loaded slowly (> 60 seconds) in Paraver

19 Bottleneck Identification Test Suite (2)
- Big message: PASSED
  - Traces illustrated a large amount of time spent in send and receive
- Diffuse procedure: PASSED
  - Traces illustrated a lot of synchronization, with one process doing more work
  - Since there is no source code correlation, hard to tell why the problem existed
- Hot procedure: TOSS-UP
  - Assuming hardware counters work, would be easy to see the extra CPU utilization
  - No source code correlation would make it difficult to pinpoint the problem
- Intensive server: PASSED
  - Traces showed that other nodes were waiting on node 0
- Ping pong: PASSED
  - Traces illustrated that the application was very latency-sensitive
  - Much time being spent waiting for messages to arrive
- Random barrier: PASSED
  - Traces showed that one process was doing more work than the others
- Small messages: PASSED
  - Traces illustrated a large number of messages being sent to node 0
  - Also illustrated the overhead of instrumentation for writing tracefile information
- System time: FAILED
  - No way to tell system time vs. user time
- Wrong way: PASSED
  - Trace showed the first receive took a long time for the message to arrive

20 General Comments
- Very large learning curve
  - Complex software with lots of concepts
  - Concepts must be totally understood, or:
    - The software doesn't make sense
    - The software seems like it has no functionality
  - Some "common" actions (e.g., view TLB cache misses) can be very difficult to do at first in Paraver
    - Stored window configurations help with this
- Older tools
  - Seem to have grown and gained features as the need for them arose
  - Lots of "cruft" and strange ways of presenting things
  - User interface clunky by today's standards
  - User interface complicated by anyone's standards!

21 General Comments (2)
- Trace-driven simulation: useful?
  - Can be useful for performing "what-if" studies and sensitivity analyses
  - But still limited in what you can explore without modifying the application
    - Can see what happens when a function runs twice as fast
    - Can't see the effect of different algorithms without rerunning the application
- Tools provide little guidance on what the user should do next
  - Heavily reliant on the skill of the user to make efficient use of the tools

22 Adding UPC/SHMEM Support
- Commercial tool!
  - No way to explicitly add support into Dimemas or Paraver for UPC or SHMEM
  - However, the tools are written using a modular design
    - Existing process and resource models can be used to model UPC and SHMEM applications
    - Paraver and Dimemas do not need to explicitly support UPC and SHMEM, just their trace files
- Assuming we have methods for instrumenting UPC and SHMEM code, all that is required is writing to the .prv file format (see the sketch below)
  - Documented!
  - Not sure about Dimemas' trace file format...
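Because the .prv format is documented plain text (slide 5), a UPC/SHMEM tracer would only need to emit records in that format. The following C sketch is hypothetical: the record-type code, field order, and header handling are assumptions that must be checked against the tracefile format documentation [1].

    /* Hypothetical sketch: emitting Paraver-style event records from a UPC/SHMEM
     * instrumentation layer. The .prv format is colon-separated ASCII; the assumed
     * field order here is record-type : cpu : application : task : thread : time : event-type : event-value.
     */
    #include <stdio.h>

    static void emit_event(FILE *prv, int cpu, int appl, int task, int thread,
                           unsigned long long time_ns, int ev_type, long long ev_value)
    {
        fprintf(prv, "2:%d:%d:%d:%d:%llu:%d:%lld\n",
                cpu, appl, task, thread, time_ns, ev_type, ev_value);
    }

    int main(void)
    {
        FILE *prv = fopen("upc_app.prv", "w");
        if (!prv) return 1;
        /* A real generator would first write the .prv header; omitted here. */
        emit_event(prv, 1, 1, 1, 1, 123456ULL, 60000001, 1);  /* e.g., entered a one-sided get */
        emit_event(prv, 1, 1, 1, 1, 130000ULL, 60000001, 0);  /* e.g., left the one-sided get */
        fclose(prv);
        return 0;
    }

A real generator would also need communication records for one-sided transfers and a matching .pcf file naming the custom event types, which is where this sketch stops.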

23 Evaluation (1)
- Available metrics: 5/5
  - Can use PAPI and existing hardware counters
  - Paraver can combine trace information and give you just about any metric you can think of
- Cost: 1/5
  - For Paraver, Dimemas, and MPITrace, 1 seat: 2000 euros (~$2,600)
- Documentation quality: 1/5
  - MPITrace: inadequate documentation for Linux
  - Dimemas: only a tutorial available, unless you want to read through conference papers and PhD theses
  - Paraver: user manual very thorough, but technical and unclear
  - Many grammar errors impair reading! ("temporal files" -> temporary files, and many more)
Note: evaluated the Linux version

24 Evaluation (2)
- Extensibility: 0/5
  - Commercial (no source), but
  - Can add new functions to the semantic module for Paraver
  - Flexible design lets you support a wide variety of programming paradigms by using the documented trace file format
- Filtering and aggregation: 5/5
  - Paraver has powerful filtering & aggregation capability
  - Filtering & aggregation only post-mortem, however
- Hardware support: 3/5
  - AlphaServer (Tru64), 64-bit Linux (Opteron, Itanium), IBM SP (AIX), IRIX, HP-UX
  - Most everything supported: Linux, AIX, IRIX, HP-UX
  - No Cray support
- Heterogeneity support: 0/5 (not supported)

25 Evaluation (3)
- Installation: 1/5
  - Linux installation riddled with errors and problems
  - PAPI dependency for hardware counters complicates things (needs a kernel patch)
  - Have had the software for over 2 months, still not working correctly
  - According to our contact this is not normal, but the other tools we evaluated were nowhere near as hard to install
- Interoperability: 1/5
  - No export interoperability with other tools
  - Apparently tools exist to import SDDF and other formats (but I couldn't find them)
  - Can import UTE traces
- Learning curve: 1/5
  - All graphical interfaces are unintuitive
  - Software is complex, and the tutorials do not lessen the learning curve very much
- Manual overhead: 1/5
  - MPITrace only records MPI events
  - Linux needs extra instructions in the source code to get hardware counter information
  - Need to relink or recode to turn tracing on or off
- Measurement accuracy: 4/5
  - CAMEL overhead: ~8%
  - Tracing overhead not negligible, but within acceptable limits
  - Dimemas accuracy decent; good enough for what Dimemas is intended for

26 Evaluation (4)
- Multiple executions: 1/5
  - Paraver supports displaying multiple tracefiles at the same time
  - This lets you relate different runs (with different parameters) to each other relatively easily
- Multiple analyses & views: 4/5
  - Semantic modules provide a convenient (if awkward) way of displaying different types of data
  - Semantic modules also allow displaying the same type of data in different ways
  - Analysis modules show statistical summary information over time ranges
- Performance bottleneck identification: 4.5/5
  - No automatic bottleneck identification
  - All the information you need to identify a bottleneck should be available between Paraver and Dimemas
  - However, much manual effort is needed to determine where bottlenecks are
  - Also, no information is related back to the source code level
- Profiling/tracing support: 2/5
  - Only supports tracing
  - Trace files can be quite large and can take some time to open
- Response time: 3/5
  - No data at all until after the run has completed and the tracefile has been opened
  - Dimemas requires the simulation to fully finish and Paraver to open the generated tracefile before information is shown to the user

27 Evaluation (5)
- Searching: 3/5
  - Search features provided by Dimemas
- Software support: 3.5/5
  - MPI profiling library allows linking against any existing libraries
  - OpenMP and OpenMP+MPI programs also supported via add-on instrumentation libraries
- Source code correlation: 0/5
  - Not supported directly; can use user events to identify program phases
- System stability: 3/5
  - MPITrace stable (had no problems other than installation)
  - Paraver crashed relatively often (>= 1 time per hour)
  - Dimemas stability not tested
- Technical support: 3/5
  - Responses from contact within hours
  - Some problems not resolved quickly, though

28 References
[1] "MPITrace tool version 1.1: User's guide," November.
[2] "Paraver version 2.1: Tutorial," November.
[3] "Paraver version 3.1: Reference manual (DRAFT)," October.
[4] "DiP: A Parallel Program Development Environment," Jesús Labarta et al. In Proc. of the 2nd International EuroPar Conference (EuroPar 96), Lyon, France, August 1996.

29 References (2)
[5] "Performance Prediction and Evaluation Tools," Sergi Turell. PhD thesis, Universitat Politecnica de Catalunya, March.
[6] "Validation of Dimemas communication model for collective MPI communications," S. Girona et al. In Proc. of EuroPVM/MPI 2000, Balatonfüred, Lake Balaton, Hungary, September 2000.
[7] "Introduction to Dimemas" (tutorial).