Paradyn. Paradyn Goals Performance measurement tool that –scales to long-running programs on large parallel and distributed systems –automates much of.

Slides:

Advertisements

Similar presentations

Parallel Virtual Machine Rama Vykunta. Introduction n PVM provides a unified frame work for developing parallel programs with the existing infrastructure.

Advertisements

Chapter 6: Process Synchronization

Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.

Automated Instrumentation and Monitoring System (AIMS)

MCTS GUIDE TO MICROSOFT WINDOWS 7 Chapter 10 Performance Tuning.

Operating Systems Parallel Systems (Now basic OS knowledge)

Processes CSCI 444/544 Operating Systems Fall 2008.

A Guide to Oracle9i1 Introduction To Forms Builder Chapter 5.

Concurrency CS 510: Programming Languages David Walker.

Process Management. External View of the OS Hardware fork() CreateProcess() CreateThread() close() CloseHandle() sleep() semctl() signal() SetWaitableTimer()

1 Chapter 4 Threads Threads: Resource ownership and execution.

Chapter 6 Implementing Processes, Threads, and Resources.

OllyDbg Debuger.

Instrumentation and Measurement CSci 599 Class Presentation Shreyans Mehta.

Chapter Ten Performance Tuning. Objectives Create a performance baseline Create a performance baseline Understand the performance and monitoring tools.

Network File System (NFS) in AIX System COSC513 Operation Systems Instructor: Prof. Anvari Yuan Ma SID:

Memory Management ◦ Operating Systems ◦ CS550. Paging and Segmentation  Non-contiguous memory allocation  Fragmentation is a serious problem with contiguous.

Chapter 8: String Manipulation

Dr. Pedro Mejia Alvarez Software Testing Slide 1 Software Testing: Building Test Cases.

MCTS Guide to Microsoft Windows 7

University of Maryland parseThat: A Robust Arbitrary-Binary Tester for Dyninst Ray Chen.

1 Lecture 4: Threads Operating System Fall Contents Overview: Processes & Threads Benefits of Threads Thread State and Operations User Thread.

Process Management. Processes Process Concept Process Scheduling Operations on Processes Interprocess Communication Examples of IPC Systems Communication.

University of Maryland The DPCL Hybrid Project James Waskiewicz.

Support for Debugging Automatically Parallelized Programs Robert Hood Gabriele Jost CSC/MRJ Technology Solutions NASA.

Scalable Analysis of Distributed Workflow Traces Daniel K. Gunter and Brian Tierney Distributed Systems Department Lawrence Berkeley National Laboratory.

A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Tekin Bicer Gagan Agrawal 1.

Chapter 34 Java Technology for Active Web Documents methods used to provide continuous Web updates to browser – Server push – Active documents.

Paradyn Evaluation Report Adam Leko, UPC Group HCS Research Laboratory University of Florida Color encoding key: Blue: Information Red: Negative note Green:

VAMPIR. Visualization and Analysis of MPI Resources Commercial tool from PALLAS GmbH VAMPIRtrace - MPI profiling library VAMPIR - trace visualization.

A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Graduate Student Department Of CSE 1.

CSC 230: C and Software Tools Rudra Dutta Computer Science Department Course Introduction.

Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,

Replay Compilation: Improving Debuggability of a Just-in Time Complier Presenter: Jun Tao.

1 Workshop Topics - Outline Workshop 1 - Introduction Workshop 2 - module instantiation Workshop 3 - Lexical conventions Workshop 4 - Value Logic System.

CS 346 – Chapter 4 Threads –How they differ from processes –Definition, purpose Threads of the same process share: code, data, open files –Types –Support.

© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Memory: Relocation.

Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.

1 Threads Chapter 11 from the book: Inter-process Communications in Linux: The Nooks & Crannies by John Shapley Gray Publisher: Prentice Hall Pub Date:

® IBM Software Group © 2006 IBM Corporation PurifyPlus on Linux / Unix Vinay Kumar H S.

Debugging parallel programs. Breakpoint debugging Probably the most widely familiar method of debugging programs is breakpoint debugging. In this method,

University of Maryland Using Dyninst to Measure Floating-point Error Mike Lam, Jeff Hollingsworth and Pete Stewart.

© 2001 Barton P. MillerParadyn/Condor Week (12 March 2001, Madison/WI) The Paradyn Port Report Barton P. Miller Computer Sciences Department.

Silberschatz, Galvin and Gagne  Operating System Concepts Process Concept An operating system executes a variety of programs:  Batch system.

Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version] Adam Leko HCS Research Laboratory University of Florida.

Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Binary Rewriting with Dyninst Madhavi Krishnan and Dan McNulty.

Module 2.0: Threads.

Overview of Previous Lesson(s) Over View 3 Program.

Time Management.  Time management is concerned with OS facilities and services which measure real time.  These services include:  Keeping track of.

© 2002 IBM Corporation Confidential | Date | Other Information, if necessary Copyright © 2009 Ericsson, Made available under the Eclipse Public License.

A Dynamic Tracing Mechanism For Performance Analysis of OpenMP Applications - Caubet, Gimenez, Labarta, DeRose, Vetter (WOMPAT 2001) - Presented by Anita.

Dynamic Tuning of Parallel Programs with DynInst Anna Morajko, Tomàs Margalef, Emilio Luque Universitat Autònoma de Barcelona Paradyn/Condor Week, March.

OSSIM Technology Overview Mark Lucas. “Awesome” Open Source Software Image Map (OSSIM)

Chapter 2: The Visual Studio.NET Development Environment Visual Basic.NET Programming: From Problem Analysis to Program Design.

Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,

1 Module 3: Processes Reading: Chapter Next Module: –Inter-process Communication –Process Scheduling –Reading: Chapter 4.5, 6.1 – 6.3.

Efficient Instrumentation for Code Coverage Testing

Threads vs. Events SEDA – An Event Model 5204 – Operating Systems.

Hands-On Microsoft Windows Server 2008

MCTS Guide to Microsoft Windows 7

CompSci 725 Presentation by Siu Cho Jun, William.

Capriccio – A Thread Model

A configurable binary instrumenter

Multithreading.

Lecture Topics: 11/1 General Operating System Concepts Processes

Threads Chapter 4.

Efficient x86 Instrumentation:

Introduction to Data Structure

CS510 Operating System Foundations

Dynamic Binary Translators and Instrumenters

Presentation transcript:

Paradyn

Paradyn Goals Performance measurement tool that –scales to long-running programs on large parallel and distributed systems –automates much of the search for performance bottlenecks –avoids space and time overhead of trace-based tools

Paradyn Approach Dynamically instrument application Automatically control instrumentation in search of performance problems Look for high level problems (e.g., too much synchronization blocking, I/O blocking, or memory delays) using small amount of instrumentation Once general problem is found, selectively insert more instrumentation to find specific causes

Paradyn Components Front end and user interface that allow user to –display performance visualization –use the Performance Consultant to find bottlenecks –start and stop the application –monitor status of the application Paradyn daemons –monitor and instrument application processes –pvmd, mpid, winntd

Using Paradyn Program preparation: –Future releases will be able to instrument unmodified binary files –Current release 2.0 requires linking applications with Paradyn instrumentation libraries –Static linking is required on IBM AIX platforms –Application must be compiled with -g flag

Paradyn Run-time Analysis Paradyn is designed to either start up application processes and kill them upon exit, or to attach to and detach from running (or stopped) processes. –Attaching to a running process is currently implemented on Solaris. –Paradyn currently does not detach but only kills upon exit.

Metric-Focus Pairs Metric-focus grid based on two vectors –list of performance metrics (e.g., CPU time, blocking time, message rates, I/O rates) –list of program components (e.g., procedures, processes, message channels, barrier instances) Cross product forms matrix from which user selects metric-focus pairs Elements of matrix can be single-valued (e.g., current value, average, min, max) or time-histograms Time-histogram is a fixed size data structure that records behavior of a metric over time

“Where” Axis After loading program, Paradyn adds entries for program resources to Where Axis window –files –procedures –processes –machines

Multiple foci selection on Where Axis

Performance Visualizations Before or while running a program, the user can define performance visualizations in terms of metric-focus pairs –select focus from Where Axis –select metrics from Metrics Dialog Box –select visualization from Start Visualization Menu

Metrics Dialog Box

Start Visualization Menu

Paradyn Phases Contiguous time intervals within an application’s execution Two kinds – global phase starts at beginning of program execution and extends to current time –local phases non-overlapping subintervals of the global phase

Paradyn Phases (cont.) Data collection for new phase occurs at finer granularity than for global phase. Visualizations can show data for either local phase or global phase. Performance Consultant can simultaneously search both local phase and global phase.

Performance Consultant Based on W3 Search Model –“Why” - type of performance problems –“Where” - where in the program these problems occur –“When” - time during execution during which problems occur

Performance Consultant (cont.) Automatically locates potential bottlenecks in your application –Contains definitions of a set of performance problems in terms of hypotheses - e.g., PerfMetricX > Specified Threshold –Continually selects and refines which performance metrics are enabled and for which foci –Reports bottlenecks that exist for significant portion of phase being measured

Why Axis ExcessiveIOBlockingTime TopLevelHypothesis TooManySmallIOOps ExcessiveSyncWaitingTime CPUBound

Why Axis (cont.) CPUBound: Compares CPU time to the tunable constant PC_CPUThreshold ExcessiveSyncTime: Compares total synchronization waiting time to the tunable constant PC_SyncThreshold ExcessiveIOBlockingTime: Compares total I/O waiting time to the tunable constant PC_IOThreshold TooManySmallIOOps: Compares average number of bytes per I/O operation to PC_IOThreshold

Search History Graph DAG with (hypothesis : focus) pairs as nodes Top node represents (TopLevelHypothesis : WholeProgram) Child nodes represent possible refinements Search is expanded dantime a (hypothesis : focus) pair tests true

Search History Graph (cont.) Node status given by color –green background indicates Unknown status –white foreground indicates active test –pink background indicates hypothesis tested false –blue background indicates hypothesis tested true –yellow line represents Why Axis refinement –purple line represents Where Axis refinement

Search History Graph as search begins

Refinement to CPUbound hypothesis

Further refinement of CPUbound hypothesis

Two searches in progress

Final refinement

Tunable Constants PC_CPUThreshold: used for hypothesis CPUBound PC_SyncThreshold: used for hypothesis ExcessiveSyncWaitingTime PC_IOThreshold: used for hypothesis ExcessiveIOBlockingTime MinObservationTime: all tests will be continued for at least this interval of time before any conclusions are drawn. costLimit: determines an upper bound on the total amount of instrumentation that can be active at a given time.

Visualization Modules (visi’s) External processes that use VisiLib RPC interface to access performance data in real time Visi’s provided with Paradyn –time-histogram –bar chart –table –3-d terrain

Time Histogram with Actions and View menus expanded

Barchart Visualization

Table Visualization

3-d Histogram Visualization

Dyninst API Machine-independent interface for runtime program instrumentation Insertion and removal of instrumentation code into and from running processes Process and OS independent specification of instrumentation code C++ library interface Can be used to build debuggers, performance measurement tools, simulators, and computation steering systems

Dyninst API (cont.) Currently supported platforms –SPARC SunOS and Solaris –x86 Solaris and NT –IBM AIX/SP –DEC Alpha Planned for near future –SGI Origin 2000

Dyninst Terminology point - location in a program where instrumentation can be inserted snippet - representation of a bit of executable code to be inserted into a program at a point e.g., To record number of times a procedure is invoked: –point - first instruction in the procedure –snippet - statement to increment a counter

Dyninst Terminology (cont.) thread - thread of execution, which may be a normal process or a lightweight thread image - static representation of a program on disk application - process being modified mutator - program that uses the API to modify the application

Using the dyninst API Declare single object of class Bpatch Identify application process to be modified –appThread = bpatch.createProcess(pathname, argv); –appThread = bpatch.attachProcess(pathname, processId) Define snippet and points where it should be inserted

Dyninst Example Bpatch_image *appImage; Bpatch_Vector(Bpatch_point*) *points; // Open the program image associated with the thread and return a handle to it. appImage = appThread->getImage(); // find and return the entry point to the “InterestingProcedure”. Points = appImage->findProcedurePoint(“InterestingProcedure”, Bpatch_entry); // create a counter variable (but first get a handle to the correct type). Bpatch_variableExpr *intCounter = appThread->malloc(*appImage->findType(“int”)); // create a code block to increment the integer by one. // intCounter = intCounter + 1 // Bpatch_arithExpr addone(Bpatch_assign, *intCounter, Bpath_arithExpr(Bpatch_plus, *intCounter, Bpatch_constExpr(1))); // insert the snippet of code into the application. appThread->insertBlock(addone, *points);

DAIS Dynamic Application Instrumentation System Proposed by Douglas Pase at IBM Platform-independent client-server library for building debugging and performance tools Based on dyninst

DAIS (cont.) Support proposed for –code patches –periodic instrumentation –inferior remote procedure calls (IRPCs) –remote memory reads and writes –dynamic subroutine placement –process control for debugging Planned demo tools –dynamic printf –trace capture for MPI