CSE 225: Performance Analysis Tools
Nadya Williams
Spring 2000, UCSD (May 30, 2000)

Outline
Background
Performance measurement
SvPablo
Autopilot
Paradyn
XPVM

Background
Goal: high-performance computing for applications that are distributed:
–by design, e.g. collaborative environments, distributed data analysis, computer-enhanced instruments
–by implementation, e.g. metacomputing, high-throughput computing
Goal: to achieve and maintain performance guarantees in heterogeneous, dynamic environments

Background
Performance-robust grid applications need to:
–identify the resources required to meet application performance requirements
–select from problem specifications, algorithms, and code variants
–establish hierarchical performance contracts
–select and manage adaptation strategies when performance contracts are violated

Computational grids
[Diagram: visualization engines, an MPP, and real-time data analysis connected by a network, supporting visualization and steering]
Shared resources: computation, network, and data archives

Complexity
Emerging applications are dynamic:
–time-varying resource demands
–time-varying resource availability
–heterogeneous execution environments
–geographically distributed
Display and analysis hierarchy:
–code, thread, process, processor
–system and local area network
–national/international network

Grid performance challenges
Wide-area infrastructure
Many resource models
Behavioral variability:
–complex applications, diverse systems and networks
–irreproducible behavior
Heterogeneous applications:
–multilingual and multimodel
–real-time constraints and shared resources
Prediction and scheduling

Outline
Background
Performance measurement
SvPablo
Autopilot
Paradyn
XPVM

Performance analysis
The ability to:
–capture
–analyze
–present
–optimize
Multiple analysis levels:
–hardware
–system software
–runtime systems
–libraries
–applications
Good tools must accommodate all of these.

Real-time multilevel analysis
Multilevel drill-down:
–multiple sites
–multiple metrics
–real-time display
Problems:
–uncertainty and perturbation
–confusion of cause and effect

Guidelines
Design for locality:
–regardless of programming model
–threads, MPI, data parallel: it's the same
Recognize historical models:
–large codes develop over time
–assumptions change
Think about more than FLOPS:
–I/O, memory, networking, user interfaces

Initial steps
Develop infrastructure for structural and performance information
Provide instrumentation of end-user applications and communication libraries
Study the performance characteristics of real grid applications

Peak and sustained performance
Peak performance:
–perfect conditions
Actual performance:
–considerably less
Environment dictates performance:
–locality really matters
–we must design for performance stability
–more of less may be better than less of more

Instrumentation approaches
At least four major techniques:
–profiling
–counting
–interval timing
–event tracing
Each strikes a different balance between:
–detail and insight
–measurement perturbation
Understand the overheads and benefits of each; see the timing sketch below.
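
To make interval timing concrete, here is a minimal C sketch (my illustration, not from the slides; the timer_start/timer_stop names are invented) that brackets a region of code with wall-clock timestamps and accumulates the elapsed time:

```c
#include <stdio.h>
#include <time.h>

/* Accumulated wall-clock time for one instrumented region. */
static double region_seconds = 0.0;
static struct timespec region_t0;

static void timer_start(void) {
    clock_gettime(CLOCK_MONOTONIC, &region_t0);
}

static void timer_stop(void) {
    struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    region_seconds += (double)(t1.tv_sec - region_t0.tv_sec)
                    + (double)(t1.tv_nsec - region_t0.tv_nsec) * 1e-9;
}

int main(void) {
    for (int i = 0; i < 1000; i++) {
        timer_start();
        /* ... region of interest ... */
        timer_stop();
    }
    printf("region total: %g s\n", region_seconds);
    return 0;
}
```

The two clock_gettime calls are themselves work, which is exactly the measurement perturbation the slide warns about: the shorter the timed region, the larger the relative overhead.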

Measurement developments
Hardware counters:
–once rare (Cray), now common (Sun, IBM, Intel, Compaq)
–metrics: operation types, memory stalls
Object code patching:
–run-time instrumentation
Compiler integration:
–inverse compiler transformations
–high-level language analysis
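
One portable way to read such hardware counters today is the PAPI library; the following is a hedged sketch of standard PAPI usage (my addition, not from the slides), counting instructions and cycles around a loop:

```c
/* Build with: cc prog.c -lpapi (assumes PAPI is installed). */
#include <stdio.h>
#include <papi.h>

int main(void) {
    int events[2] = { PAPI_TOT_INS, PAPI_TOT_CYC };  /* instructions, cycles */
    long long values[2];
    int eventset = PAPI_NULL;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return 1;
    if (PAPI_create_eventset(&eventset) != PAPI_OK) return 1;
    if (PAPI_add_events(eventset, events, 2) != PAPI_OK) return 1;

    PAPI_start(eventset);
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; i++) x += i * 0.5;  /* region of interest */
    PAPI_stop(eventset, values);

    printf("instructions: %lld  cycles: %lld\n", values[0], values[1]);
    return 0;
}
```

Counters of this kind expose the operation mix and memory stalls the slide mentions at near-zero perturbation, which interval timing alone cannot see.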

Correlating semantic levels
Performance measurements:
–capture the behavior of executing software
–reflect the output of multi-level transformations
Performance tools:
–must relate data to the "user" semantic model (cache-miss ratios cannot help a MATLAB user; message counts cannot help an HPF user)
–should suggest possible performance remedies

Analysis developments
Visualization techniques:
–traces and statistics
Search and destroy:
–AI suggestions and consultants
–critical paths and zeroing
Data reduction and processing:
–statistical clustering / projection pursuit
–neural-net and time-series classification
Real-time control:
–sensor/actuator models

Performance tool checkpoint
An incomplete view:
–representative techniques and tools
Major evolution:
–from architectural views and post-mortem analysis
–to deeper correlation and derived metrics
Key open problems:
–adaptivity
–scale
–semantic correlation

Representative vendor tools
–IBM VT: "ParaGraph" trace display and statistical metrics
–Silicon Graphics SpeedShop: R10000 and R12000 hardware counter tools
–Pallas Vampir: event tracing and display tools
–Cray ATExpert (autotasking): basic AI suggestions for tuning
–Intel SPV: ParaGraph and hardware counter displays
–TMC/Sun Prism: data-parallel and message-passing analysis

Representative research tools
Illinois SvPablo:
–performance data metaformat
–Globus integration (sensor/actuator control)
Illinois Autopilot:
–performance steering
Wisconsin Paradyn:
–runtime code patching
–performance consultant
Oak Ridge National Lab XPVM:
–X Windows-based graphical console and monitor for PVM

Outline
Background
Performance measurement
SvPablo
Autopilot
Paradyn
XPVM

SvPablo: a graphical source code browser for performance tuning and visualization
Department of Computer Science, University of Illinois at Urbana-Champaign

SvPablo outline
Background
SvPablo overview
SvPablo model
Automatic/interactive instrumentation of programs
The Pablo Self-Defining Data Format

SvPablo background
Motivations:
–emerging high-level languages (HPF and HPC++)
–aggressive code transformations for parallelism
–large semantic gap between user and code
Goals:
–relate dynamic performance data to source
–hide the semantic gap
–generate instrumented executable/simulated code
–support performance scalability predictions

Background
Tools should provide performance data and suggestions for performance improvements at the level of an abstract, high-level program.
Tools should integrate dynamic performance data with information recorded by the compiler that describes the mapping from the high-level source to the resulting low-level, explicitly parallel code.

SvPablo overview
A graphical user interface tool for:
–source code instrumentation
–browsing runtime performance data
Two major components:
–performance instrumentation libraries
–performance analysis and presentation
Provides performance data capture, analysis, and presentation.

SvPablo overview
Instrumentation:
–automatic: HPF (from PGI)
–interactive: ANSI C, Fortran 77, Fortran 90
Data capture:
–dynamic software statistics (no traces)
–SGI R10000 counter values

SvPablo overview
Source code instrumentation:
–HPF: the PGI runtime system invokes instrumentation at each procedure call and each HPF source line
–C and Fortran programs: interactively instrumented at outer loops and function calls
Instrumentation maintains statistical summaries.
Summaries are correlated across processors.
The correlated summary is the input to the browser.

SvPablo overview
Architectures:
–any system with the PGI HPF compiler
–any system with F77 or F90
–C applications supported on single-processor Unix workstations, networks of Unix workstations using MPI, the Intel Paragon, and the Meiko CS-2
GUI supports:
–Sun (Solaris)
–SGI (IRIX)

Statistics metrics
For procedures:
–count
–exclusive/inclusive duration
–send/receive message duration (HPF only)
For lines:
–count
–duration and exclusive duration
–message send and receive (HPF only): duration, count, size
–event counters (SGI)
Each metric is reported with mean, standard deviation, min, and max; a sketch of such a summary follows.
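
A minimal sketch (mine, not SvPablo source) of the kind of per-construct summary such a tool can keep: count, min, max, and a running mean and standard deviation via Welford's method, so no per-event trace is ever stored:

```c
#include <math.h>
#include <float.h>

/* Running summary statistics for one instrumented construct
   (procedure or source line); no event trace is kept. */
typedef struct {
    long   count;
    double min, max;
    double mean, m2;    /* Welford: running mean, sum of squared deviations */
} summary_t;

static void summary_init(summary_t *s) {
    s->count = 0;
    s->min   = DBL_MAX;
    s->max   = -DBL_MAX;
    s->mean  = s->m2 = 0.0;
}

static void summary_record(summary_t *s, double x) {
    double delta = x - s->mean;
    s->count++;
    s->mean += delta / (double)s->count;
    s->m2   += delta * (x - s->mean);
    if (x < s->min) s->min = x;
    if (x > s->max) s->max = x;
}

static double summary_std(const summary_t *s) {
    return s->count > 1 ? sqrt(s->m2 / (double)(s->count - 1)) : 0.0;
}
```

Keeping only these few doubles per construct is what makes "dynamic software statistics (no traces)" scale to long runs.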

SvPablo model
[Diagram of the SvPablo model]

New project dialog box
[Screenshot of the new-project dialog box]

HPF performance analysis data flow
[Diagram: HPF source code → PGI HPF compiler → instrumented object code → linker → instrumented executable → runs on the parallel architecture with the SvPablo data capture library → per-process performance files → SvPabloCombine → performance file → graphical performance browser]

HPF instrumentation
pghpf -c -Mprof=lines source1.F
pghpf -c -Mprof=lines source2.F
pghpf -Mstats -o prog source1.o source2.o /usr/local/SvPablo/lib/pghpf2SDDF.o
prog -pghpf -np 8
SvPabloCombine HPF_SDDF*

Performance visualization
[Screenshot: source browser displaying the metrics count and exclusive duration]

Performance metric selection dialog
[Screenshot of the metric selection dialog]

C / F77 / F90 data flow
[Diagram: create or edit a project → instrument C or Fortran files → instrumented source code → compiler → instrumented object code → linker → instrumented executable → runs on the parallel architecture with the SvPablo data capture library → per-process performance files → SvPabloCombine → performance file → visualize]

Interactive instrumentation
[Screenshot: instrumentable constructs (function calls and outer loops)]

Generating an instrumented executable program
mpicc -c file1.Context1.inst.c
mpicc -c file2.Context1.inst.c
mpicc -c Context1/InstrumentationInit.c
mpicc -o instFile InstrumentationInit.o file1.Context1.inst.o file2.Context1.inst.o svPabloLib.a

SDDF: a medium of exchange
Self-Defining Data Format:
–a data meta-format language for performance data description
–specifies both data record structures and data record instances
–separates data structure and semantics
–allows the definition of records containing scalars and arrays
–supported by the Pablo SDDF library

SDDF files: classes of records
–Command: conveys an action to be taken
–Stream Attribute: gives information pertinent to the entire file
–Record Descriptor: declares record structure
–Record Data: encapsulates data values

Record descriptors
Describe record layout. Each record descriptor contains:
–a unique tag and record name
–an optional record attribute
–field descriptors, each one containing: an optional field attribute, a field type specifier, a field name, and an optional field dimension

SDDF: record descriptor and data

#300:                              // tag
// "description" "PGI Line-Based Profile Record"
"PGI Line Profile" {               // record name
  // field descriptors:
  int    "Line Number";
  int    "Processor Number"[];
  int    "Procedure ID";
  int    "Count";
  double "Inclusive Seconds";
  double "Exclusive Seconds";
  int    "Send Data Count";
  int    "Send Data Byte";
  double "Send Data Seconds";
  int    "Receive Data Count";
  int    "Receive Data Byte";
  double "Receive Data Seconds";
};

"PGI Line Profile" {359, [2]{7,9}, 4, 399384, 31.071, 31.071,
                    0, 0, 0, 0, 0, 0};;
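
Because the ASCII form of SDDF is plain text, emitting a record instance needs nothing beyond fprintf. A small sketch (mine; the struct and the zeroed message fields are illustrative) that writes one "PGI Line Profile" instance in the layout above:

```c
#include <stdio.h>

/* One line-profile measurement, mirroring the descriptor above. */
typedef struct {
    int    line, procedure_id, count;
    int    processors[2];              /* "Processor Number"[] */
    double inclusive_s, exclusive_s;
} line_profile_t;

/* Emit an ASCII SDDF record instance; message fields written as zeros. */
static void emit_record(FILE *f, const line_profile_t *p) {
    fprintf(f, "\"PGI Line Profile\" {%d, [2]{%d,%d}, %d, %d, %g, %g, "
               "0, 0, 0, 0, 0, 0};;\n",
            p->line, p->processors[0], p->processors[1],
            p->procedure_id, p->count, p->inclusive_s, p->exclusive_s);
}

int main(void) {
    line_profile_t p = {359, 4, 399384, {7, 9}, 31.071, 31.071};
    emit_record(stdout, &p);
    return 0;
}
```

In practice the Pablo SDDF library handles this (including a binary encoding); the sketch only shows how little machinery the ASCII format itself demands.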

SvPablo language transparency
Meta-format for performance data:
–language constructs identified by line and byte offsets
–metrics defined by mapping to those offsets
SDDF records:
–performance mapping information
–performance measurements
Result:
–a language-independent performance browser
–a mechanism for scalability model integration

SvPablo conclusions
Versatility: yes
–the analysis GUI is quite versatile and provides the ability to define new modules, but has a steep learning curve
–theoretically, any type of view could be constructed from the toolkit provided
Portability: not quite
–intended for a wide range of parallel platforms and programming languages; the reality is different (Sun, SGI)
Scalability: some
–the Pablo trace library monitors and dynamically alters the volume, frequency, and types of event data recorded
–not clear how: automatically, or by the user at a low level?
–need to integrate predictions

Outline
Background
Performance measurement
SvPablo
Autopilot
Paradyn
XPVM

Autopilot: a performance steering toolkit
Provides a flexible infrastructure for real-time adaptive control of parallel and distributed computing resources
Department of Computer Science, University of Illinois at Urbana-Champaign

Autopilot outline
Background
Autopilot overview
Autopilot components
Conclusions

Autopilot background
HPC has moved from single parallel systems to distributed collections of heterogeneous sequential and parallel systems.
Emerging applications are irregular:
–complex, data-dependent execution behavior
–dynamic, with time-varying resource demands
There is a failure to recognize that resource allocation and management must evolve with applications.
Consequence: small changes in application structure can lead to large changes in observed performance.

Autopilot background
Interactions between applications and system resources change:
–across applications
–during a single application's execution
Autopilot approach: create adaptable
–runtime libraries
–resource management policies

Autopilot overview
Integrates:
–dynamic performance instrumentation
–on-the-fly performance data reduction
–configurable, malleable resource management algorithms
–a real-time adaptive control mechanism
The result is an adaptive resource management infrastructure.
Given application request patterns and observed system performance, it automatically chooses and configures resource management algorithms to:
–increase portability
–increase achieved performance

Autopilot components
1. Autopilot: implements the core features of the Autopilot system
2. Fuzzy library: the classes supporting the fuzzy-logic decision procedure infrastructure
3. Autodriver: provides a graphical user interface (written in Java)
4. Performance monitor: tools to retrieve and record system performance statistics on a set of machines

1. Autopilot component
libAutopilot.a supports the creation, registration, and use of:
–sensors
–actuators (enable and configure resource management policies)
–decision procedures
AutopilotManager: a utility program that displays the sensors and actuators currently registered with the Autopilot Manager
A sketch of the sensor/actuator idea follows.

2. Fuzzy library component
A fuzzy-rules-to-C++ translator and the related classes used by the Autopilot fuzzy-logic decision procedure infrastructure.

3. Autodriver component
Autopilot Adapter program:
–provides a Java interface to Autopilot (must run on Unix)
Java GUI:
–talks to Autopilot through the Adapter
–allows a user to monitor and interact with live sensors and actuators
–runs on any platform that supports Java

4. Performance monitor component
Two kinds of processes:
–Collectors: run on the machines to be monitored; capture quantitative application and system performance data
–Recorders: compute performance metrics and record or output them
The processes communicate via the Autopilot component.

Closed-loop adaptive control
[Diagram: Illinois Autopilot toolkit (Reed et al.). Sensors feed system inputs through a fuzzifier into a fuzzy-logic decision process driven by a knowledge repository of fuzzy sets and rules; a defuzzifier turns the outputs into actuator commands applied back to the system.]
–real-time measurement
–Globus integration
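
To make the fuzzify/decide/defuzzify loop concrete, a toy C sketch (mine, with invented membership functions and rules; not Autopilot's fuzzy library) that turns a utilization reading into a buffer-size adjustment:

```c
#include <stdio.h>

/* Fuzzifier: degree in [0,1] to which utilization is "high" or "low". */
static double mu_high(double u) { return u <= 0.5 ? 0.0 : (u - 0.5) / 0.5; }
static double mu_low(double u)  { return u >= 0.5 ? 0.0 : (0.5 - u) / 0.5; }

/* Rules: IF high THEN shrink (-1); IF low THEN grow (+1).
   Defuzzifier: weighted average of the rule outputs. */
static double decide(double utilization) {
    double h = mu_high(utilization), l = mu_low(utilization);
    return (h + l) > 0.0 ? (h * -1.0 + l * +1.0) / (h + l) : 0.0;
}

int main(void) {
    double util = 0.8;              /* sensor input */
    double delta = decide(util);    /* actuator command in [-1, +1] */
    printf("utilization %.2f -> buffer adjustment %+.2f\n", util, delta);
    return 0;
}
```

The fuzzy formulation matters because sensor data are noisy; smooth membership functions avoid the oscillation a hard threshold would cause at the decision boundary.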

Autopilot conclusions
The goal is an infrastructure for building resilient distributed and parallel applications:
–software that can change its behavior and optimize its performance in response to real-time data on software dynamics and performance
–order-of-magnitude performance improvements

Outline
Background
Performance measurement
SvPablo
Autopilot
Paradyn
XPVM

Paradyn: a performance measurement tool for parallel and distributed programs
Computer Science, University of Wisconsin

Paradyn outline
Motivations
Approach
Performance Consultant
Conclusions

Paradyn motivations
–provide a performance measurement tool that scales to long-running programs on large parallel and distributed systems
–automate much of the search for performance bottlenecks
–avoid the space and time overhead typically associated with trace-based tools
–go beyond post-mortem analysis

Paradyn approach
Dynamic instrumentation:
–based on dynamically controlling what performance data are collected
–allows data-collection instructions to be inserted into an application program during runtime
Paradyn dynamically instruments the application and automatically controls the instrumentation in its search for performance problems; a rough approximation of the idea is sketched below.
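
Paradyn patches the running binary directly. As a rough, compiler-assisted approximation of the same effect (my illustration, not Paradyn's mechanism), GCC's -finstrument-functions flag inserts a call at every function entry and exit that a measurement hook can count:

```c
/* Build with: gcc -finstrument-functions prog.c
   GCC inserts calls to these hooks at every function entry and exit;
   Paradyn gets a similar effect by patching the running binary instead. */
#include <stdio.h>

static unsigned long entries = 0;

void __cyg_profile_func_enter(void *fn, void *site)
        __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *fn, void *site)
        __attribute__((no_instrument_function));

void __cyg_profile_func_enter(void *fn, void *site) {
    entries++;                       /* count every function entry */
}
void __cyg_profile_func_exit(void *fn, void *site) { }

static void work(void) { /* ... */ }

int main(void) {
    for (int i = 0; i < 100; i++) work();
    printf("function entries: %lu\n", entries);
    return 0;
}
```

The crucial difference is that this instrumentation is fixed at compile time, while Paradyn can insert and remove it while the program runs, paying overhead only where the bottleneck search is currently looking.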

Paradyn model
The Paradyn front end and user interface:
–display performance visualizations
–use the Performance Consultant to find bottlenecks
–start and stop the application
–monitor the status of the application
The Paradyn daemons:
–monitor and instrument the application processes

Performance Consultant module
–automatically directs the placement of instrumentation
–has a knowledge base of performance bottlenecks and program structure
–can associate bottlenecks with specific causes and with specific parts of a program

Paradyn runtime
Concepts for performance data analysis and presentation:
1. Metric-focus grid: the cross-product of two vectors
–a list of performance metrics (CPU time, blocking time, ...)
–a list of program components (procedures, processors, disks)
–elements of the matrix can be single-valued (e.g., current value, average, min, or max) or time-histograms
2. Time-histogram: a fixed-size data structure recording the behavior of a metric as it varies over time (sketched below)
Performance data granularity:
–global phase
–local phase
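
A sketch of how a time-histogram can stay fixed-size over an arbitrarily long run: in the published Paradyn design, the bucket width doubles when the buckets fill; the code below is my illustration of that folding idea, not Paradyn's source:

```c
#define NBUCKETS 8   /* small for illustration; a real tool uses many more */

/* Fixed-size time histogram: when execution outgrows the buckets,
   adjacent pairs are folded together and the bucket width doubles. */
typedef struct {
    double bucket[NBUCKETS];  /* metric value accumulated per interval */
    double width;             /* seconds covered by one bucket */
} thist_t;

static void thist_add(thist_t *h, double t, double value) {
    int i = (int)(t / h->width);
    while (i >= NBUCKETS) {                   /* out of room: fold */
        for (int j = 0; j < NBUCKETS / 2; j++)
            h->bucket[j] = h->bucket[2 * j] + h->bucket[2 * j + 1];
        for (int j = NBUCKETS / 2; j < NBUCKETS; j++)
            h->bucket[j] = 0.0;
        h->width *= 2.0;
        i = (int)(t / h->width);
    }
    h->bucket[i] += value;
}
```

This is why a metric-focus grid full of time-histograms has a bounded memory cost no matter how long the program runs.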

Performance Consultant
[Screenshots: the Performance Consultant search display from the Wisconsin Paradyn toolkit (Miller et al.); hypotheses in the search tree are marked unknown, true, or false]

Outline
Background
Performance measurement
SvPablo
Autopilot
Paradyn
XPVM

XPVM
A graphical console and monitor for PVM, developed at Oak Ridge National Laboratory
–provides a graphical user interface to the PVM console commands
–provides several animated views to monitor the execution of PVM programs

XPVM overview
XPVM generates trace records during PVM program execution; the resulting trace file can be used to "play back" a program's execution.
The XPVM views provide information about the interactions among tasks in a parallel PVM program, to assist in debugging and performance tuning.
XPVM writes a Pablo self-defining (SDDF) trace file.
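
For reference, a minimal PVM 3 program of the kind XPVM monitors (standard PVM calls; the message tag 1 is arbitrary): the task enrolls, sends one message to its parent if it has one, and exits.

```c
/* Build with: cc prog.c -lpvm3 (assumes PVM 3 is installed). */
#include <stdio.h>
#include "pvm3.h"

int main(void) {
    int mytid  = pvm_mytid();        /* enroll in PVM and get a task ID */
    int parent = pvm_parent();       /* negative PvmNoParent if none */

    printf("task t%x running\n", mytid);
    if (parent > 0) {
        pvm_initsend(PvmDataDefault);  /* prepare a send buffer */
        pvm_pkint(&mytid, 1, 1);       /* pack one int */
        pvm_send(parent, 1);           /* send it with message tag 1 */
    }
    pvm_exit();                      /* leave PVM before exiting */
    return 0;
}
```

Calls like these are what the views described below visualize: the Call Trace view shows each task's most recent PVM call, and the Space-Time view colors the time spent inside them.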

XPVM menus
–Hosts menu: lets you configure the parallel virtual machine by adding or removing hosts
–Tasks menu: lets you spawn, signal, or kill PVM processes, and monitor selected PVM system tasks such as the group server process

XPVM menus
–Reset menu: resets the parallel virtual machine, the XPVM views, or the trace file
–Help menu: provides help features
–Views menu: selects any of the five XPVM displays for monitoring program execution

XPVM menus
–Trace file playback controls: play, step forward, stop, or reset the execution trace file
–Trace file selection window: displays the name of the current trace file

XPVM views (five in all): Network
Displays high-level activity on each node in the virtual machine.
Each host is represented by an icon showing host name and architecture.
Icons are color-illuminated to indicate status:
–Active: at least one task on that host is doing useful work
–System: no tasks are doing user work, and at least one task is busy executing PVM system routines
–no tasks

Space-Time view
Shows the status of all tasks as they execute across all hosts:
–Computing: executing useful user computations
–Overhead: executing PVM system routines for communication, task control, etc.
–Waiting: waiting for messages from other tasks
–Message: indicates communication between tasks

Utilization view
Summarizes the Space-Time view at each instant by showing the aggregate number of tasks computing, in overhead, or waiting for a message.
Shares the same horizontal time scale as the Space-Time view.
Supports zooming in and out.

Call Trace view
–displays each task's most recent PVM call
–changes as the program executes
–useful for debugging
–clicking on a task in the scrolling task list displays that task's full name and TID

Task Output view
–shows output (stdout) generated by tasks in a scrolling window
–the output can be saved to a file at any point

Concluding remarks
System complexity is rising fast:
–computational grids
–multidisciplinary applications
–performance tools
There are many open problems:
–adaptive optimization
–performance prediction
–compiler/tool integration
–performance "quality of service" (QoS)

Concluding remarks
–the software problems are large and cannot be solved in isolation
–open-source collaboration
–vendors, laboratories, and academics
–technology assessment