Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version]
Adam Leko, HCS Research Laboratory, University of Florida

2 Summary

Give characteristics of existing tools to aid our design discussions:
- Metrics (what is recorded, any hardware counters, etc.)
- Profiled entities
- Visualizations

Most information & some slides taken from tool evaluations.

Tools overviewed:
- TAU
- Paradyn
- MPE/Jumpshot
- Dimemas/Paraver/MPITrace
- mpiP
- Dynaprof
- KOJAK
- Intel Cluster Tools (old Vampir/VampirTrace)
- Pablo
- MPICL/ParaGraph

3 TAU

Metrics recorded
- Two modes: profile and trace
- Profile mode
  - Inclusive/exclusive time spent in functions
  - Hardware counter information
    - PAPI/PCL: L1/L2/L3 cache reads/writes/misses, TLB misses, cycles, integer/floating point/load/store instructions and stalls executed, wall clock time, virtual time
    - Other OS timers (gettimeofday, getrusage)
  - MPI message size sent
- Trace mode
  - Same as profile (minus hardware counters?)
  - Message send time, message receive time, message size, message sender/recipient(?)

Profiled entities
- Functions (automatic & dynamic instrumentation), loops and regions (manual instrumentation); see the sketch below
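As a minimal sketch of what TAU-style manual instrumentation of a region looks like in C, assuming TAU's standard profiling macros (TAU_PROFILE_TIMER, TAU_PROFILE_START/STOP from TAU.h); exact macro names and arguments may vary by TAU version and this is not taken from the slides:

    /* Sketch of TAU manual instrumentation (assumes TAU.h and the standard
     * TAU_PROFILE_* macros; details may differ across TAU versions). */
    #include <TAU.h>
    #include <stdio.h>

    void compute(int n)
    {
        /* Declare and start a timer for this user-defined region */
        TAU_PROFILE_TIMER(t, "compute loop", "", TAU_USER);
        TAU_PROFILE_START(t);

        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += i * 0.5;

        TAU_PROFILE_STOP(t);
        printf("sum = %f\n", sum);
    }

    int main(int argc, char **argv)
    {
        TAU_PROFILE_INIT(argc, argv);   /* initialize TAU profiling */
        TAU_PROFILE_SET_NODE(0);        /* single-node (non-MPI) run */
        compute(1000);
        return 0;
    }

TAU's automatic source instrumentation inserts equivalent timers at function entry/exit, so manual macros are only needed for loops and arbitrary regions.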

4 TAU

Visualizations
- Profile mode
  - Text-based: pprof, shows a summary of profile information
  - Graphical: racy (old), jracy a.k.a. ParaProf
- Trace mode
  - No built-in visualizations
  - Can export to CUBE (see KOJAK), Jumpshot (see MPE), and Vampir format (see Intel Cluster Tools)

5 Paradyn

Metrics recorded
- Number of CPUs, number of active threads, CPU and inclusive CPU time
- Function calls to and by
- Synchronization (# operations, wait time, inclusive wait time)
- Overall communication (# messages, bytes sent and received), collective communication (# messages, bytes sent and received), point-to-point communication (# messages, bytes sent and received)
- I/O (# operations, wait time, inclusive wait time, total bytes)
- All metrics recorded as "time histograms" (fixed-size data structure)

Profiled entities
- Functions only (but includes functions linked to in existing libraries)

6 Paradyn

Visualizations
- Time histograms
- Tables
- Barcharts
- "Terrains" (3-D histograms)

7 MPE/Jumpshot

Metrics collected
- MPI message send time, receive time, size, message sender/recipient
- User-defined event entry & exit

Profiled entities
- All MPI functions
- Functions or regions via manual instrumentation and custom events (see the logging sketch below)

Visualization
- Jumpshot: timeline view (space-time diagram overlaid on a Gantt chart), histogram
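A small sketch of logging user-defined events with MPE's logging API (MPE_Log_get_event_number, MPE_Describe_state, MPE_Log_event); this is an illustrative use of the API as commonly documented, not code from the slides:

    /* Sketch: user-defined MPE events around a compute region.
     * Assumes MPICH's MPE logging library (mpe.h). When linked against the
     * MPI logging wrapper library, MPI_Init/MPI_Finalize handle log setup
     * and output; the explicit MPE_Init_log/MPE_Finish_log calls below cover
     * the standalone case. */
    #include <mpi.h>
    #include <mpe.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPE_Init_log();

        /* Allocate event numbers and describe a "compute" state for Jumpshot */
        int ev_begin = MPE_Log_get_event_number();
        int ev_end   = MPE_Log_get_event_number();
        MPE_Describe_state(ev_begin, ev_end, "compute", "red");

        MPE_Log_event(ev_begin, 0, "start compute");
        double sum = 0.0;
        for (int i = 0; i < 1000000; i++)
            sum += 1.0 / (i + 1);
        MPE_Log_event(ev_end, 0, "end compute");

        MPE_Finish_log("compute_example");   /* writes the logfile */
        MPI_Finalize();
        return sum > 0.0 ? 0 : 1;
    }

The resulting logfile is what Jumpshot renders as colored states on its timeline view.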

8 Dimemas/Paraver/MPITrace

Metrics recorded (MPITrace)
- All MPI functions
- Hardware counters (2 at a time, one from each of the following lists, uses PAPI; see the PAPI sketch after this list)
  - Counter 1
    - Cycles
    - Issued instructions, loads, stores, store conditionals
    - Failed store conditionals
    - Decoded branches
    - Quadwords written back from scache(?)
    - Correctible scache data array errors(?)
    - Primary/secondary I-cache misses
    - Instructions mispredicted from scache way prediction table(?)
    - External interventions (cache coherency?)
    - External invalidations (cache coherency?)
    - Graduated instructions
  - Counter 2
    - Cycles
    - Graduated instructions, loads, stores, store conditionals, floating point instructions
    - TLB misses
    - Mispredicted branches
    - Primary/secondary data cache miss rates
    - Data mispredictions from scache way prediction table(?)
    - External intervention/invalidation (cache coherency?)
    - Store/prefetch exclusive to clean/shared block
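For context, a minimal sketch of how two hardware counters can be read around a code region with PAPI's low-level API; the event choices here (PAPI_TOT_CYC and PAPI_L1_DCM) are illustrative, not MPITrace's defaults:

    /* Sketch: reading two PAPI hardware counters around a code region. */
    #include <papi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int eventset = PAPI_NULL;
        long long values[2];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles */
        PAPI_add_event(eventset, PAPI_L1_DCM);    /* L1 data cache misses */

        PAPI_start(eventset);
        volatile double sum = 0.0;
        for (int i = 0; i < 1000000; i++)
            sum += i * 0.25;
        PAPI_stop(eventset, values);

        printf("cycles = %lld, L1 data misses = %lld\n", values[0], values[1]);
        return 0;
    }

A tracing library such as MPITrace does this sampling internally at MPI events rather than requiring the user to place the calls.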

9 Dimemas/Paraver/MPITrace

Profiled entities (MPITrace)
- All MPI functions (message start time, message end time, message size, message recipient/sender)
- User regions/functions via manual instrumentation

Visualization
- Timeline display (like Jumpshot)
  - Shows Gantt chart and messages
  - Can also overlay hardware counter information
- Clicking on the timeline brings up a text listing of events near where you clicked
- 1D/2D analysis modules

10 mpiP

Metrics collected
- Start time, end time, message size for each MPI call

Profiled entities
- MPI function calls, intercepted via the PMPI wrapper interface (see the sketch below)

Visualization
- Text-based output, with a graphical browser that displays statistics in-line with source
- Displayed information:
  - Overall time (%) for each MPI node
  - Top 20 callsites for time (MPI%, App%, variance)
  - Top 20 callsites for message size (MPI%, App%, variance)
  - Min/max/average/MPI%/App% time spent at each call site
  - Min/max/average/sum of message sizes at each call site
- App time = wall clock time between MPI_Init and MPI_Finalize
- MPI time = all time consumed by MPI functions
- App% = % of metric relative to overall app time
- MPI% = % of metric relative to overall MPI time
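To illustrate the PMPI interception mechanism mpiP relies on (this is not mpiP's actual code), a profiling layer can redefine an MPI call, time it, and forward to the real implementation:

    /* Sketch of PMPI-based interception: the profiling layer defines MPI_Send,
     * times it, and forwards to the real implementation via PMPI_Send.
     * mpiP's real wrappers also record call-site addresses and message-size
     * statistics. Note: pre-MPI-3 headers declare buf without const. */
    #include <mpi.h>

    static double total_mpi_time = 0.0;   /* accumulated time inside MPI */

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        total_mpi_time += MPI_Wtime() - t0;
        return rc;
    }

MPI% and App% then follow from the definitions above: a call site's time divided by total_mpi_time gives its MPI%, and divided by the MPI_Init-to-MPI_Finalize wall clock time gives its App%.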

11 Dynaprof

Metrics collected
- Wall clock time or PAPI metric for each profiled entity
- Collects inclusive, exclusive, and 1-level call tree % information

Profiled entities
- Functions (dynamic instrumentation)

Visualizations
- Simple text-based output
- Simple GUI (shows same info as text-based)

12 KOJAK

Metrics collected
- MPI: message start time, receive time, size, message sender/recipient
- Manual instrumentation: start and stop times
- 1 PAPI metric per run (only FLOPS and L1 data misses visualized)

Profiled entities
- MPI calls (MPI wrapper library)
- Function calls (automatic instrumentation, only available on a few platforms)
- Regions and function calls via manual instrumentation (see the sketch below)

Visualizations
- Can export traces to Vampir trace format (see ICT)
- Shows profile and analyzed data via CUBE
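A sketch of manual region instrumentation in the style KOJAK supports; the header and macro names used here (elg_user.h, ELG_USER_START/ELG_USER_END) are recalled from KOJAK's EPILOG user API and should be treated as assumptions that may differ by version:

    /* Sketch of KOJAK-style manual region instrumentation.
     * ASSUMPTION: header/macro names (elg_user.h, ELG_USER_START/ELG_USER_END)
     * follow the EPILOG user API and may differ between KOJAK versions. */
    #include <elg_user.h>

    void solve_step(double *x, int n)
    {
        ELG_USER_START("solve_step");      /* start of the named user region */
        for (int i = 0; i < n; i++)
            x[i] *= 0.99;
        ELG_USER_END("solve_step");        /* end of the named user region */
    }

The start/stop events recorded for such regions appear alongside the MPI events in the trace that CUBE and the Vampir export operate on.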

13 Intel Cluster Tools (ICT)

Metrics collected
- MPI functions: start time, end time, message size, message sender/recipient
- User-defined events: counter, start & end times
- Code location for source-code correlation

Instrumented entities
- MPI functions via wrapper library
- User functions via binary instrumentation(?)
- User functions & regions via manual instrumentation

Visualizations
- Different types: timelines, statistics & counter info

14 Pablo

Metrics collected
- Time inclusive/exclusive of a function
- Hardware counters via PAPI
- Summary metrics computed from timing info: min/max/avg/stdev/count

Profiled entities
- Functions, function calls, and outer loops
- All selected via GUI

Visualizations
- Displays derived summary metrics color-coded and inline with source code

15 MPICL/ParaGraph

Metrics collected
- MPI functions: start time, end time, message size, message sender/recipient
- Manual instrumentation: start time, end time, "work" done (up to the user to pass this in)

Profiled entities
- MPI function calls via the PMPI interface
- User functions/regions via manual instrumentation

Visualizations
- Many, separated into 4 categories: utilization, communication, task, "other"

16 ParaGraph visualizations

Utilization visualizations
- Display a rough estimate of processor utilization
- Utilization broken down into 3 states:
  - Idle – the program is blocked waiting for a communication operation (or has stopped execution)
  - Overhead – the program is performing communication but is not blocked (time spent within the MPI library)
  - Busy – the program is executing anything other than communication
- "Busy" doesn't necessarily mean useful work is being done, since the classification assumes anything that is not communication is busy (see the classification sketch below)

Communication visualizations
- Display different aspects of communication: frequency, volume, overall pattern, etc.
- "Distance" computed by setting the topology in the options menu

Task visualizations
- Display information about when processors start & stop tasks
- Require manually instrumented code to identify when processors start/stop tasks

Other visualizations
- Miscellaneous displays
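As a hedged illustration of the three-state breakdown (not ParaGraph's actual code), each interval of a process's trace could be classified roughly like this:

    /* Illustrative sketch of ParaGraph's utilization classification rule
     * (idle / overhead / busy); this is not ParaGraph's implementation. */
    typedef enum { STATE_IDLE, STATE_OVERHEAD, STATE_BUSY } util_state;

    /* in_mpi:  interval spent inside the MPI library
     * blocked: interval spent blocked waiting for a communication operation */
    util_state classify(int in_mpi, int blocked)
    {
        if (in_mpi && blocked)
            return STATE_IDLE;       /* waiting on communication */
        if (in_mpi)
            return STATE_OVERHEAD;   /* communicating but not blocked */
        return STATE_BUSY;           /* anything that is not communication */
    }

This is why the "Busy" state overstates useful work: any non-communication activity, including waiting in user code, falls into it.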