SE-292 High Performance Computing Profiling and Performance R. Govindarajan

Slides:



Advertisements
Similar presentations
Processes and Threads Chapter 3 and 4 Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee Community College,
Advertisements

SE-292 High Performance Computing Profiling and Performance R. Govindarajan
CSC 360- Instructor: K. Wu Overview of Operating Systems.
Process management Information maintained by OS for process management  process context  process control block OS virtualization of CPU for each process.
Chapter 2 Machine Language.
Lecture 11: Operating System Services. What is an Operating System? An operating system is an event driven program which acts as an interface between.
Intel® performance analyze tools Nikita Panov Idrisov Renat.
INSE - Lectures 19 & 20 SE for Real-Time & SE for Concurrency  Really these are two topics – but rather tangled together.
MCTS GUIDE TO MICROSOFT WINDOWS 7 Chapter 10 Performance Tuning.
Computer Organization and Architecture 18 th March, 2008.
Yevgeny Petrilin Shay Dan Shadi Ibrahim. GUI : Graphical User Interface DAQ :Data Acquisition Data Acquisition device  a self-powered system that communicated.
CSC 501 Lecture 2: Processes. Von Neumann Model Both program and data reside in memory Execution stages in CPU: Fetch instruction Decode instruction Execute.
1 Lecture 6 Performance Measurement and Improvement.
Concurrency. What is Concurrency Ability to execute two operations at the same time Physical concurrency –multiple processors on the same machine –distributing.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 2: Computer-System Structures Computer System Operation I/O Structure Storage.
1 CS 333 Introduction to Operating Systems Class 2 – OS-Related Hardware & Software The Process Concept Jonathan Walpole Computer Science Portland State.
Chapter 13 Reduced Instruction Set Computers (RISC) Pipelining.
Midterm Tuesday October 23 Covers Chapters 3 through 6 - Buses, Clocks, Timing, Edge Triggering, Level Triggering - Cache Memory Systems - Internal Memory.
CSCI2413 Lecture 6 Operating Systems Memory Management 2 phones off (please)
1 Tuning PL/SQL procedures using DBMS_PROFILER 20-August 2009 Tim Gorman Evergreen Database Technologies, Inc. Northern California Oracle.
Control Structures - Repetition Chapter 5 2 Chapter Topics Why Is Repetition Needed The Repetition Structure Counter Controlled Loops Sentinel Controlled.
Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.
MCTS Guide to Microsoft Windows 7
Gary MarsdenSlide 1University of Cape Town Computer Architecture – Introduction Andrew Hutchinson & Gary Marsden (me) ( ) 2005.
Lecture 8. Profiling - for Performance Analysis - Prof. Taeweon Suh Computer Science Education Korea University COM503 Parallel Computer Architecture &
CSC 501 Lecture 2: Processes. Process Process is a running program a program in execution an “instantiation” of a program Program is a bunch of instructions.
Free Running Counter & Real Time Control
Copyright 1995 by Coherence LTD., all rights reserved (Revised: Oct 97 by Rafi Lohev, Oct 99 by Yair Wiseman, Sep 04 Oren Kapah) IBM י ב מ 7-1 Measuring.
Programmer's view on Computer Architecture by Istvan Haller.
Timing and Profiling ECE 454 Computer Systems Programming Topics: Measuring and Profiling Cristiana Amza.
1. Definition and General Structure 2. Small Example 1 3. Simplified Structure 4. Short Additional Examples 5. Full Example 2 6. Common Error The for loop.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
Algorithms and Algorithm Analysis The “fun” stuff.
The Structure of Processes (Chap 6 in the book “The Design of the UNIX Operating System”)
Lecture 8 February 29, Topics Questions about Exercise 4, due Thursday? Object Based Programming (Chapter 8) –Basic Principles –Methods –Fields.
Timer Timer is a device, which counts the input at regular interval (δT) using clock pulses at its input. The counts increment on each pulse and store.
Silberschatz, Galvin and Gagne  2002 Modified for CSCI 399, Royden, Operating System Concepts Operating Systems Lecture 15 Scheduling Read Ch.
Time Management.  Time management is concerned with OS facilities and services which measure real time, and is essential to the operation of timesharing.
CE Operating Systems Lecture 7 Threads & Introduction to CPU Scheduling.
Adv. UNIX: Profile/151 Advanced UNIX v Objectives –introduce profiling based on execution times and line counts Special Topics in Comp.
Processes CS 6560: Operating Systems Design. 2 Von Neuman Model Both text (program) and data reside in memory Execution cycle Fetch instruction Decode.
CE Operating Systems Lecture 2 Low level hardware support for operating systems.
Operating Systems 1 K. Salah Module 1.2: Fundamental Concepts Interrupts System Calls.
1 Computer Systems II Introduction to Processes. 2 First Two Major Computer System Evolution Steps Led to the idea of multiprogramming (multiple concurrent.
SNU IDB Lab. Ch4. Performance Measurement © copyright 2006 SNU IDB Lab.
(a) What is the output generated by this program? In fact the output is not uniquely defined, i.e., it is not necessarily the same in each execution. What.
O PERATING S YSTEM. What is an Operating System? An operating system is an event driven program which acts as an interface between a user of a computer,
Lecture 2a: Performance Measurement. Goals of Performance Analysis The goal of performance analysis is to provide quantitative information about the performance.
 In computer programming, a loop is a sequence of instruction s that is continually repeated until a certain condition is reached.  PHP Loops :  In.
CE Operating Systems Lecture 2 Low level hardware support for operating systems.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Structure and Role of a Processor
Time Management.  Time management is concerned with OS facilities and services which measure real time.  These services include:  Keeping track of.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
Networked Embedded Systems Pengyu Zhang & Sachin Katti EE107 Spring 2016 Lecture 4 Timers and Interrupts.
Lecturer 5: Process Scheduling Process Scheduling  Criteria & Objectives Types of Scheduling  Long term  Medium term  Short term CPU Scheduling Algorithms.
Mergesort example: Merge as we return from recursive calls Merge Divide 1 element 829.
1 Module 3: Processes Reading: Chapter Next Module: –Inter-process Communication –Process Scheduling –Reading: Chapter 4.5, 6.1 – 6.3.
Performance Analysis Tools
Chapter 6: CPU Scheduling
Timer and Interrupts.
Agenda Make Utility Command Line Arguments in Unix
Lecture 2d1: Quality of Measurements
Iteration: Beyond the Basic PERFORM
PROCESS MANAGEMENT Information maintained by OS for process management
Lecture 2 Part 3 CPU Scheduling
8253 – PROGRAMMABLE INTERVAL TIMER (PIT). What is a Timer? Timer is a specialized type of device that is used to measure timing intervals. Timers can.
Parameters that affect it How to improve it and by how much
Concurrency: Threads, Address Spaces, and Processes
Presentation transcript:

SE-292 High Performance Computing Profiling and Performance R. Govindarajan

2 Performance Measurement and Tuning  Tools to help you measure the performmance of program  Determining program execution time % time a.out real 0m0.019s user 0m0.014s system 0m0.002s Gives elapse time, user, and system  Tools to identify the important parts of your program for perf. Improvement Concentrate optimization efforts on those parts

3 Amdahl’s Law Which part of the program to optimize? Amdahl’s Law: Speedup is limited by the part of program which does not benefit by the optimization IOW, Sp ≈ 1/s ! Implies concentrate on part of the program where maximum time is spent!

4 Timing Timing: measuring the time spent in specific parts of your program Examples of `parts’: Functions, loops, … Recall: Different kinds of time that can be measured (real/wallclock/elapsed vs virtual/CPU) 1.Decide which time you are interested in measuring at what granularity 2.Find out what mechanisms are available and their granularity of measurement

5 Timing Mechanisms gettimeofday  Real time in seconds and microseconds since 00:00 1/1/1970  Q: Overflow of 32b second value? getrusage times system call High resolution timers  Example: gethrtime

6 Profiling Profiler: tool that helps you identify the `important’ parts of your program to concentrate your optimization efforts Profile: breakup (of execution time) across different parts of the program Can be done by adding statements to your program (instrumentation) -- so that during execution, data is gathered, outputted and possibly processed later Automation: where a profiling tool adds those instructions into your program for you

7 Profiling Mechanisms Levels of Granularity typically supported  Function level  Statement level  Basic block level: A basic block is a sequence of contiguous instructions in a program with a single entry point (the first instruction in the basic block) and a single exit point (the last instruction in the basic block) Two kinds of profile data  execution time  execution counts We will look at examples of profiling mechanisms at the function and basic block level

8 Prof: UNIX Function Level Profiling  Usage % cc –p program.c /generates instrumented a.out % a.out / execution; instrumentation / generates data and mon.out % prof / processing of profile data  Output gives a function by function breakup of execution time  Useful in identifying which functions to concentrate optimization efforts on

9 Output: %TimeSecondsCumSecs#Calls Name _baz _bar _foo … _main _strcpy

10 Prof: How it Works Instrumentation does three things 1. At entry of each function: increment an execution count for that function 2. At program entry: make a call to system call profil to get execution times 3. At program exit: write profile data to output file that can later be processed by prof profil(): execution time profiler  Generates an execution time histogram, execution time in each function

11 Profil: What it does One of the parameters in call to profil is a buffer Used as an array of counters initialized to 0 Array elements are associated with contiguous regions of program text During execution, PC value is sampled (once every clock tick, default: 10 msec); triggered on timer interrupt Corresponding buffer element is incremented Later associated with a function; time weight of 10 msec used to estimate CPU times

12 Using prof From how it works, we understand that  Granularity is at best 10 msec  Generated profile could differ for multiple runs of a program with same input!  Could be completely wrong; observe that there could be a particular function that just happens to be running each time the timer interrupt occurs Some usage guidelines  Run under light load conditions  Run a few times and see if results vary a lot  Note that function execution counts are exact, while execution times are estimates