Performance Analysis and Debugging Tools Performance analysis and debugging intimately connected since they both involve monitoring of the software execution.

Slides:



Advertisements
Similar presentations
Code Tuning Strategies and Techniques
Advertisements

Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
Intel® performance analyze tools Nikita Panov Idrisov Renat.
Tools for applications improvement George Bosilca.
1 Tuesday, November 07, 2006 “If anything can go wrong, it will.” -Murphy’s Law.
1 Lecture 6 Performance Measurement and Improvement.
IBM RS/6000 SP POWER3 SMP Jari Jokinen Pekka Laurila.
MPI Program Performance. Introduction Defining the performance of a parallel program is more complex than simply optimizing its execution time. This is.
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
1 CHAPTER 4 LANGUAGE/SOFTWARE Hardware Hardware is the machine itself and its various individual equipment. It includes all mechanical, electronic.
1 1 Profiling & Optimization David Geldreich (DREAM)
Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı
Spring 2014 SILICON VALLEY UNIVERSITY CONFIDENTIAL 1 Introduction to Embedded Systems Dr. Jerry Shiao, Silicon Valley University.
P51UST: Unix and Software Tools Unix and Software Tools (P51UST) Compilers, Interpreters and Debuggers Ruibin Bai (Room AB326) Division of Computer Science.
Chocolate Bar! luqili. Milestone 3 Speed 11% of final mark 7%: path quality and speed –Some cleverness required for full marks –Implement some A* techniques.
Shared Memory Parallelization Outline What is shared memory parallelization? OpenMP Fractal Example False Sharing Variable scoping Examples on sharing.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Gary MarsdenSlide 1University of Cape Town Computer Architecture – Introduction Andrew Hutchinson & Gary Marsden (me) ( ) 2005.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh Computer Science Education Korea University COM503 Parallel Computer Architecture &
1 Computing Software. Programming Style Programs that are not documented internally, while they may do what is requested, can be difficult to understand.
Adventures in Mastering the Use of Performance Evaluation Tools Manuel Ríos Morales ICOM 5995 December 4, 2002.
Lecture 2 Process Concepts, Performance Measures and Evaluation Techniques.
PMaC Performance Modeling and Characterization Performance Modeling and Analysis with PEBIL Michael Laurenzano, Ananta Tiwari, Laura Carrington Performance.
CS 584. Performance Analysis Remember: In measuring, we change what we are measuring. 3 Basic Steps Data Collection Data Transformation Data Visualization.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
Martin Schulz Center for Applied Scientific Computing Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory, P. O. Box 808, Livermore,
Performance.
Application Profiling Using gprof. What is profiling? Allows you to learn:  where your program is spending its time  what functions called what other.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 12/6/2006 Lecture 24 – Profilers.
1 Code optimization “Code optimization refers to the techniques used by the compiler to improve the execution efficiency of the generated object code”
Debugging and Profiling With some help from Software Carpentry resources.
CS 3500 L Performance l Code Complete 2 – Chapters 25/26 and Chapter 7 of K&P l Compare today to 44 years ago – The Burroughs B1700 – circa 1974.
Debugging parallel programs. Breakpoint debugging Probably the most widely familiar method of debugging programs is breakpoint debugging. In this method,
CS 460/660 Compiler Construction. Class 01 2 Why Study Compilers? Compilers are important – –Responsible for many aspects of system performance Compilers.
Operating System Principles And Multitasking
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Introduction to OpenMP Eric Aubanel Advanced Computational Research Laboratory Faculty of Computer Science, UNB Fredericton, New Brunswick.
Introduction to ECE 454 Computer Systems Programming Topics: Lecture topics and assignments Profiling rudiments Lab schedule and rationale Cristiana Amza.
Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version] Adam Leko HCS Research Laboratory University of Florida.
© 2006, National Research Council Canada © 2006, IBM Corporation Solving performance issues in OTS-based systems Erik Putrycz Software Engineering Group.
CSC Multiprocessor Programming, Spring, 2012 Chapter 11 – Performance and Scalability Dr. Dale E. Parson, week 12.
Programming Fundamentals Lecture No. 2. Course Objectives Objectives of this course are three fold 1. To appreciate the need for a programming language.
Lecture 2a: Performance Measurement. Goals of Performance Analysis The goal of performance analysis is to provide quantitative information about the performance.
The Process From bare bones to finished product. The Steps Programming Debugging Performance Tuning Optimization.
Vertical Profiling : Understanding the Behavior of Object-Oriented Applications Sookmyung Women’s Univ. PsLab Sewon,Moon.
PAPI on Blue Gene L Using network performance counters to layout tasks for improved performance.
Time Management.  Time management is concerned with OS facilities and services which measure real time.  These services include:  Keeping track of.
Chapter – 8 Software Tools.
Software Engineering Prof. Dr. Bertrand Meyer March 2007 – June 2007 Chair of Software Engineering Lecture #20: Profiling NetBeans Profiler 6.0.
Parallel Computing Presented by Justin Reschke
Introduction to HPC Debugging with Allinea DDT Nick Forrington
If you have a transaction processing system, John Meisenbacher
Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.
Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
Spark on Entropy : A Reliable & Efficient Scheduler for Low-latency Parallel Jobs in Heterogeneous Cloud Huankai Chen PhD Student at University of Kent.
July 10, 2016ISA's, Compilers, and Assembly1 CS232 roadmap In the first 3 quarters of the class, we have covered 1.Understanding the relationship between.
Performance Analysis Tools
Code Optimization.
14 Compilers, Interpreters and Debuggers
Lecture Topics: 11/1 Processes Process Management
Green Software Engineering Prof
Algorithm An algorithm is a finite set of steps required to solve a problem. An algorithm must have following properties: Input: An algorithm must have.
Processes Hank Levy 1.
Min Heap Update E.g. remove smallest item 1. Pop off top (smallest) 3
Andy Wang CIS Computer Systems Performance Analysis
Processes Hank Levy 1.
Performance.
CSC Multiprocessor Programming, Spring, 2011
Presentation transcript:

Performance Analysis and Debugging Tools Performance analysis and debugging intimately connected since they both involve monitoring of the software execution. Just different goals:  Debugging -- achieve correct code behaviour  Performance analysis -- achieve performance criteria Both are prey to the pragmatic pitfalls of experimental science:  They must measure but not unduly perturb the measured system (e.g. debugging threads can perturb the scheduling of events; saved counters can shit memory alignment)  However, data collected must be sufficiently detailed to capture the phenomenon of interest. Part of the problem is that the application developer is “once-removed” from the REAL code: i.e. developer sees the machine hardware architecture through the lens of the programming model. Compilers may dramatically change the programming model code to fit the architecture. Developers can only fiddle with the model. This is particularly noticeable with the higher level programming systems (OpenMP vs. MPI) Therefore, runtime measures from data collection must relate the measurements to the model code to be useful I.e. the “inverse transformation” must be applied. Tools must be simple and easy to use despite this, or else people won’t take the time to learn them and use them.

Performance Analysis Object is to minimise wall time in general. Hopefully, did performance analysis earlier to remove major problems and therefore just looking for:  Hotspots -- places where the code spends a lot of time necessarily -- make these as efficient as possible  Bottlenecks -- places where the code spends a lot of time unnecessarily -- remove these Four basic performance data collection techniques:  Timers : simple command line timing utilities, Fortran/C timing utilities  Profilers record the amount of time spent in different parts of a program. Profiles typically are gathered automatically.  Counters record either frequencies of events or cumulative times. The insertion of counters may require some programmer intervention (extrinsic or intrinsic)  Event tracers record each occurrence of various specified events, thus typically producing a large amount of data. Tracers can be produced either automatically or with programmer intervention (xtrinsic and intrinsic) Then often require data transformation -- extract useful information from vast amount of data collected. And then data visualisation to interpret the distilled results. Generally, an iterative process!

Performance Analysis When selecting a tool for a particular task, the following issues should be considered: 1. Accuracy. In general, performance data obtained using sampling techniques are less accurate than data obtained by using counters or timers. In the case of timers, the accuracy of the clock must be taken into account (low clock accuracy, high clock overhead). 2. Simplicity. The best tools in many circumstances are those that collect data automatically, with little or no programmer intervention, and that provide convenient analysis capabilities. 3. Intrusiveness. Unless a computer provides hardware support, performance data collection inevitably introduces some overhead. We need to be aware of this overhead and account for it when analyzing data. Not just time shift: e.g. setting breakpoint in MPI can change the order in which receives gets sent mssages. “Uncertainty principle”.

Performance Analysis # The most important goal of performance tuning is to reduce a program's wall clock execution time. Reducing resource usage in other areas, such as memory or disk requirements, may also be a tuning goal. # Performance tuning is an iterative process used to optimize the efficiency of a program. It usually involves finding your program's hot spots and eliminating the bottlenecks in them. * Hot Spot: An area of code within the program that uses a disproportionately high amount of processor time. * Bottleneck: An area of code within the program that uses processor resources inefficiently and therefore causes unnecessary delays. # Performance tuning usually involves profiling - using software tools to measure a program's run- time characteristics and resource utilization. # Use profiling tools and techniques to learn which areas of your code offer the greatest potential performance increase BEFORE you start the tuning process. Then, target the most time consuming and frequently executed portions of a program for optimization.

Performance Analysis # Consider optimizing your underlying algorithm: an extremely fine-tuned O(N * N) sorting algorithm may perform significantly worse than a untuned O(N log N) algorithm. # Finally, know when to stop - there are diminishing returns in successive optimizations. Consider a program with the following breakdown of execution time percentages for the associated parts of the program: A 20% increase in the performance of procedure3() results in a 10% performance increase overall. A 20% increase in the performance of main() results in only a 2.6% performance increase overall. Procedure% CPU Time main()13% procedure1()17% procedure2()20% procedure3()50%

Timers Shell: time timex mon (mon -top, mon -smp), ps, netstat C/C++: gettimeofday read_real_time Fortran: rtc irtc etime MPI: MPI_WTIME

Standard Unix Profilers Prof Basically, percentage of CPU used by each procedure Compile with the “-p” option Produces files “mon.out.*” Type “prof -m mon.out.*” (-m = merge) or “prof mon.out.n” for individual stats on each processor Gprof Same thing except profiles according to call graphs i.e. it gives percentages for procedure and all child procedures -- timing includes timing for embedded calls further down the call tree. Compile with “-pg” Produces files “gmon.out.*” Type “gprof -m mon.out.*” or “gprof mon.out.n” Xprofiler (not standard? IBM only?) Xwindows graphical processor Compile and link with “-g -pg” Produces gmon files Type “xprofiler myprog gmon.out.*” (uncluster functions, hide libraries, zoom in)

Counters Lot of vendors provide “on-chip” hardware counters these days and these can be accessed and read. e.g. Cray/IBM Power series -- Hardware performance monitor HPM: Gives clock cycles, instructions completed, load/store cache misses … Run executable with “hpmcount -o outputfile myprog” Examine results with “perf outfile”

Commercial/vendor tracing tools Vampir -- event-trace visualisation using MPI traces (regular MPI) Jumpshot -- event-trace visualisation using MPI traces (MPICH/MPE) MPI_trace Paragraph Paraver/Dimemas -- MPI/OpenMP, Linux Pablo -- Instrumentation and perf analysis of C, F77, F90 and HPF IBM AIX parallel environment Paradyn Pgprof Peekperf

Debugging Tools Parallel computing means that software bugs may arise from the complex interactions of the large number of parallel software components. These bugs may have high latency -- i.e. they don’t manifest themselves until sometime after they were caused. They may also be subtly dependent on timing conditions that are not easily reproducible.  Debugging can be extremely time consuming and tedious! Some tips for avoiding having to debug:  Implicit none  Comment/indent/etc -- write readable code  Use CVS -- version update system Tools: Print statements still often the best way to go! Breakpoint debuggers: Dbx -- ouch Totalview -- all singing, all dancing commercial software. Pray it is on your system!

Visualisation Tool : VAPOR Visualisation and Analysis Platform for Oceanographic/atmospheric/solar-physics Research John Clyne, Alan Norton: NCAR Scientific Visualization Lab Volume renderer: Assigns a colour value and an opacity value to each value of a scalar variable. Creates a translucent “cloud” of the variable. Much better than isosurfaces. Contours and isosurfaces too Flow visualisation with streamlines, streaklines, magnetic field lines. Animation for movies ** Ability to interact directly with IDL to create new fields etc.