Paradyn
Paradyn Goals Performance measurement tool that –scales to long-running programs on large parallel and distributed systems –automates much of the search for performance bottlenecks –avoids space and time overhead of trace-based tools
Paradyn Approach –Dynamically instrument the application –Automatically control instrumentation in search of performance problems –Look for high-level problems (e.g., too much synchronization blocking, I/O blocking, or memory delays) using a small amount of instrumentation –Once a general problem is found, selectively insert more instrumentation to find specific causes
Paradyn Components Front end and user interface that allow user to –display performance visualization –use the Performance Consultant to find bottlenecks –start and stop the application –monitor status of the application Paradyn daemons –monitor and instrument application processes –pvmd, mpid, winntd
Using Paradyn Program preparation: –Future releases will be able to instrument unmodified binary files –Current release 2.0 requires linking applications with Paradyn instrumentation libraries –Static linking is required on IBM AIX platforms –Application must be compiled with -g flag
Paradyn Run-time Analysis Paradyn is designed to either start up application processes and kill them upon exit, or to attach to and detach from running (or stopped) processes. –Attaching to a running process is currently implemented on Solaris. –Paradyn currently does not detach but only kills upon exit.
Metric-Focus Pairs The metric-focus grid is based on two vectors: –list of performance metrics (e.g., CPU time, blocking time, message rates, I/O rates) –list of program components (e.g., procedures, processes, message channels, barrier instances) The cross product forms a matrix from which the user selects metric-focus pairs. Elements of the matrix can be single-valued (e.g., current value, average, min, max) or time-histograms. A time-histogram is a fixed-size data structure that records the behavior of a metric over time.
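A fixed-size time-histogram can be sketched as follows. This is a minimal illustration, not Paradyn's implementation: the class name `TimeHistogram`, its methods, and the folding policy (merge adjacent buckets and double the bucket width when the run outlasts the covered interval) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size time histogram. The number of buckets never grows;
// when a sample falls beyond the covered interval, adjacent buckets are merged
// pairwise and the bucket width doubles (assumed folding policy).
class TimeHistogram {
public:
    TimeHistogram(std::size_t numBuckets, double bucketWidth)
        : buckets_(numBuckets, 0.0), width_(bucketWidth) {
        assert(numBuckets >= 2 && numBuckets % 2 == 0 && bucketWidth > 0.0);
    }

    // Add `value` (e.g., CPU seconds) observed at time `t` >= 0.
    void addSample(double t, double value) {
        assert(t >= 0.0);
        while (t >= width_ * static_cast<double>(buckets_.size())) fold();
        std::size_t i = static_cast<std::size_t>(t / width_);
        i = std::min(i, buckets_.size() - 1);  // guard against rounding at the edge
        buckets_[i] += value;
    }

    double bucketWidth() const { return width_; }
    std::size_t size() const { return buckets_.size(); }
    double bucket(std::size_t i) const { return buckets_.at(i); }

private:
    // Merge buckets (0,1), (2,3), ... into the lower half and clear the upper half.
    void fold() {
        const std::size_t half = buckets_.size() / 2;
        for (std::size_t i = 0; i < half; ++i)
            buckets_[i] = buckets_[2 * i] + buckets_[2 * i + 1];
        for (std::size_t i = half; i < buckets_.size(); ++i)
            buckets_[i] = 0.0;
        width_ *= 2.0;
    }

    std::vector<double> buckets_;
    double width_;
};
```

With this policy the memory cost stays constant for arbitrarily long runs, at the price of coarser time resolution for older data.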
“Where” Axis After loading program, Paradyn adds entries for program resources to Where Axis window –files –procedures –processes –machines
Multiple foci selection on Where Axis
Performance Visualizations Before or while running a program, the user can define performance visualizations in terms of metric-focus pairs –select focus from Where Axis –select metrics from Metrics Dialog Box –select visualization from Start Visualization Menu
Metrics Dialog Box
Start Visualization Menu
Paradyn Phases Contiguous time intervals within an application’s execution Two kinds: –global phase: starts at the beginning of program execution and extends to the current time –local phases: non-overlapping subintervals of the global phase
Paradyn Phases (cont.) Data collection for new phase occurs at finer granularity than for global phase. Visualizations can show data for either local phase or global phase. Performance Consultant can simultaneously search both local phase and global phase.
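The relationship between the global phase and local phases can be sketched as below. The names `Phase` and `PhaseTable` are hypothetical, and the rule that starting a new local phase closes the previous one is an assumption used here to keep local phases non-overlapping.

```cpp
#include <cassert>
#include <vector>

// A phase is a contiguous time interval; end < 0 marks a phase that is still open.
struct Phase {
    double start;
    double end;
};

// Hypothetical bookkeeping for phases (not Paradyn's data structures).
class PhaseTable {
public:
    // Assumed policy: opening a new local phase closes the current one,
    // so local phases can never overlap.
    void startLocalPhase(double now) {
        assert(now >= 0.0);
        if (!locals_.empty()) {
            assert(now >= locals_.back().start);
            locals_.back().end = now;
        }
        locals_.push_back(Phase{now, -1.0});
    }

    // The global phase always runs from program start (time 0) to the current time.
    Phase globalPhase(double now) const { return Phase{0.0, now}; }

    const std::vector<Phase>& localPhases() const { return locals_; }

private:
    std::vector<Phase> locals_;
};
```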
Performance Consultant Based on W3 Search Model –“Why” - type of performance problems –“Where” - where in the program these problems occur –“When” - time during execution during which problems occur
Performance Consultant (cont.) Automatically locates potential bottlenecks in your application –Contains definitions of a set of performance problems in terms of hypotheses, e.g., PerfMetricX > Specified Threshold –Continually selects and refines which performance metrics are enabled and for which foci –Reports bottlenecks that exist for a significant portion of the phase being measured
Why Axis Hypotheses shown, starting from the top node TopLevelHypothesis: –CPUBound –ExcessiveSyncWaitingTime –ExcessiveIOBlockingTime –TooManySmallIOOps
Why Axis (cont.) –CPUBound: compares CPU time to the tunable constant PC_CPUThreshold –ExcessiveSyncWaitingTime: compares total synchronization waiting time to the tunable constant PC_SyncThreshold –ExcessiveIOBlockingTime: compares total I/O waiting time to the tunable constant PC_IOThreshold –TooManySmallIOOps: compares average number of bytes per I/O operation to PC_IOThreshold
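The threshold comparisons above can be sketched as a simple test function. This is an illustration only: the struct names, the use of fractions of elapsed time as units, and the function `testHypotheses` are assumptions, and the sketch covers only the three time-based hypotheses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed units: each value is a fraction of elapsed time in [0, 1].
struct Thresholds {
    double cpu;   // stands in for PC_CPUThreshold
    double sync;  // stands in for PC_SyncThreshold
    double io;    // stands in for PC_IOThreshold
};

struct Sample {
    double cpuFraction;
    double syncWaitFraction;
    double ioWaitFraction;
};

// Return the names of the hypotheses whose metric exceeds its threshold.
std::vector<std::string> testHypotheses(const Sample& s, const Thresholds& t) {
    std::vector<std::string> trueHypotheses;
    if (s.cpuFraction > t.cpu) trueHypotheses.push_back("CPUBound");
    if (s.syncWaitFraction > t.sync) trueHypotheses.push_back("ExcessiveSyncWaitingTime");
    if (s.ioWaitFraction > t.io) trueHypotheses.push_back("ExcessiveIOBlockingTime");
    return trueHypotheses;
}
```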
Search History Graph DAG with (hypothesis : focus) pairs as nodes Top node represents (TopLevelHypothesis : WholeProgram) Child nodes represent possible refinements Search is expanded any time a (hypothesis : focus) pair tests true
Search History Graph (cont.) Node status given by color –green background indicates Unknown status –white foreground indicates active test –pink background indicates hypothesis tested false –blue background indicates hypothesis tested true –yellow line represents Why Axis refinement –purple line represents Where Axis refinement
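The expansion rule of the Search History Graph (refine a node only when its test comes back true) can be sketched as follows. All names here (`SearchHistoryGraph`, `recordResult`, `Refinement`) are hypothetical, and how candidate refinements are generated is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Status { Unknown, Active, TestedFalse, TestedTrue };

struct Node {
    std::string hypothesis;
    std::string focus;
    Status status;
    std::vector<std::size_t> children;  // indices into the node table
};

using Refinement = std::pair<std::string, std::string>;  // (hypothesis, focus)

// Hypothetical search history graph: a DAG stored as a table of nodes.
class SearchHistoryGraph {
public:
    SearchHistoryGraph() {
        nodes_.push_back(Node{"TopLevelHypothesis", "WholeProgram", Status::Unknown, {}});
    }

    // Record a test outcome; children are added only when the test is true.
    void recordResult(std::size_t id, bool testedTrue,
                      const std::vector<Refinement>& refinements) {
        assert(id < nodes_.size());
        nodes_[id].status = testedTrue ? Status::TestedTrue : Status::TestedFalse;
        if (!testedTrue) return;
        for (const Refinement& r : refinements) {
            const std::size_t child = findOrAdd(r.first, r.second);
            nodes_[id].children.push_back(child);
        }
    }

    std::size_t size() const { return nodes_.size(); }
    const Node& node(std::size_t id) const { return nodes_.at(id); }

private:
    // Reuse an existing node so that a pair reachable from two parents is
    // shared, which makes the structure a DAG rather than a tree.
    std::size_t findOrAdd(const std::string& hypothesis, const std::string& focus) {
        for (std::size_t i = 0; i < nodes_.size(); ++i)
            if (nodes_[i].hypothesis == hypothesis && nodes_[i].focus == focus) return i;
        nodes_.push_back(Node{hypothesis, focus, Status::Unknown, {}});
        return nodes_.size() - 1;
    }

    std::vector<Node> nodes_;
};
```

Nodes refer to each other by index rather than by pointer, so growing the table never invalidates existing links.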
Search History Graph as search begins
Refinement to CPUbound hypothesis
Further refinement of CPUbound hypothesis
Two searches in progress
Final refinement
Tunable Constants –PC_CPUThreshold: used for hypothesis CPUBound –PC_SyncThreshold: used for hypothesis ExcessiveSyncWaitingTime –PC_IOThreshold: used for hypothesis ExcessiveIOBlockingTime –MinObservationTime: all tests continue for at least this interval of time before any conclusions are drawn –costLimit: upper bound on the total amount of instrumentation that can be active at a given time
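The roles of MinObservationTime and costLimit can be sketched as two small checks. This is a hedged illustration: the function `conclude`, the class `CostBudget`, and the idea of a per-request predicted cost are assumptions for the example, not Paradyn's interfaces.

```cpp
#include <cassert>

enum class Verdict { Pending, TestedTrue, TestedFalse };

// No conclusion is drawn until the test has been observed for at least
// minObservationTime; after that, the metric is compared to its threshold.
Verdict conclude(double observedSeconds, double minObservationTime,
                 double metricValue, double threshold) {
    assert(observedSeconds >= 0.0 && minObservationTime >= 0.0);
    if (observedSeconds < minObservationTime) return Verdict::Pending;
    return metricValue > threshold ? Verdict::TestedTrue : Verdict::TestedFalse;
}

// Hypothetical budget: a new instrumentation request is admitted only if the
// total cost of active instrumentation stays within costLimit.
class CostBudget {
public:
    explicit CostBudget(double costLimit) : limit_(costLimit), active_(0.0) {}

    bool tryEnable(double predictedCost) {
        if (active_ + predictedCost > limit_) return false;
        active_ += predictedCost;
        return true;
    }

    void disable(double predictedCost) {
        active_ -= predictedCost;
        if (active_ < 0.0) active_ = 0.0;
    }

    double active() const { return active_; }

private:
    double limit_;
    double active_;
};
```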
Visualization Modules (visi’s) External processes that use VisiLib RPC interface to access performance data in real time Visi’s provided with Paradyn –time-histogram –bar chart –table –3-d terrain
Time Histogram with Actions and View menus expanded
Barchart Visualization
Table Visualization
3-d Histogram Visualization
Dyninst API –Machine-independent interface for runtime program instrumentation –Insertion and removal of instrumentation code into and from running processes –Process and OS independent specification of instrumentation code –C++ library interface –Can be used to build debuggers, performance measurement tools, simulators, and computation steering systems
Dyninst API (cont.) Currently supported platforms –SPARC SunOS and Solaris –x86 Solaris and NT –IBM AIX/SP –DEC Alpha Planned for near future –SGI Origin 2000
Dyninst Terminology –point: location in a program where instrumentation can be inserted –snippet: representation of a bit of executable code to be inserted into a program at a point Example: to record the number of times a procedure is invoked –point: first instruction in the procedure –snippet: statement to increment a counter
Dyninst Terminology (cont.) –thread: thread of execution, which may be a normal process or a lightweight thread –image: static representation of a program on disk –application: process being modified –mutator: program that uses the API to modify the application
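The point and snippet vocabulary can be illustrated with a toy model in plain C++. This does not use Dyninst and does not patch any code: `ToyApplication`, `insertAtEntry`, and `call` are hypothetical names, and a snippet is modeled as an ordinary callable that runs when a named procedure is "entered".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Snippet = std::function<void()>;

// Toy stand-in for an application whose procedures have an entry point.
class ToyApplication {
public:
    // Mutator side: attach a snippet to the entry point of a named procedure.
    void insertAtEntry(const std::string& procedure, Snippet snippet) {
        entryPoints_[procedure].push_back(std::move(snippet));
    }

    // Application side: entering a procedure runs every snippet at its entry point.
    void call(const std::string& procedure) const {
        const auto it = entryPoints_.find(procedure);
        if (it == entryPoints_.end()) return;  // no instrumentation at this point
        for (const Snippet& s : it->second) s();
    }

private:
    std::map<std::string, std::vector<Snippet>> entryPoints_;
};
```

Counting invocations then amounts to inserting a snippet that increments a counter at the procedure's entry point, for example `app.insertAtEntry("InterestingProcedure", [&counter] { ++counter; });`.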
Using the dyninst API Declare a single object of class BPatch. Identify the application process to be modified: –appThread = bpatch.createProcess(pathname, argv); –appThread = bpatch.attachProcess(pathname, processId); Define the snippet and the points where it should be inserted.
Dyninst Example
BPatch_image *appImage;
BPatch_Vector<BPatch_point*> *points;
// Open the program image associated with the thread and return a handle to it.
appImage = appThread->getImage();
// Find and return the entry point to "InterestingProcedure".
points = appImage->findProcedurePoint("InterestingProcedure", BPatch_entry);
// Create a counter variable (but first get a handle to the correct type).
BPatch_variableExpr *intCounter = appThread->malloc(*appImage->findType("int"));
// Create a code block to increment the integer by one:
//   intCounter = intCounter + 1
BPatch_arithExpr addone(BPatch_assign, *intCounter,
    BPatch_arithExpr(BPatch_plus, *intCounter, BPatch_constExpr(1)));
// Insert the snippet of code into the application.
appThread->insertBlock(addone, *points);
DAIS Dynamic Application Instrumentation System Proposed by Douglas Pase at IBM Platform-independent client-server library for building debugging and performance tools Based on dyninst
DAIS (cont.) Support proposed for –code patches –periodic instrumentation –inferior remote procedure calls (IRPCs) –remote memory reads and writes –dynamic subroutine placement –process control for debugging Planned demo tools –dynamic printf –trace capture for MPI