Timing and Profiling ECE 454 Computer Systems Programming Topics: Measuring and Profiling Cristiana Amza.



– 2 – “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories instead of theories to suit facts.” - Sherlock Holmes

– 3 – Measuring Programs and Computers

– 4 – Why Measure a Program/Computer?
- To compare two computers/processors
  - Which one is better/faster? Which one should I buy?
- To optimize a program
  - Which part of the program should I focus my effort on?
- To compare program implementations
  - Which one is better/faster? Did my optimization work?
- To find a bug
  - Why is it running much more slowly than expected?

– 5 – Basic Measurements
- IPS: instructions per second
  - MIPS: millions of IPS
  - BIPS: billions of IPS
- FLOPS: floating-point operations per second
  - megaFLOPS: 10^6 FLOPS
  - gigaFLOPS: 10^9 FLOPS
  - teraFLOPS: 10^12 FLOPS
  - petaFLOPS: 10^15 FLOPS
  - E.g., PlayStation 3 capable of 20 GFLOPS
- IPC: instructions per processor cycle
- CPI: cycles per instruction
  - CPI = 1 / IPC

– 6 – How not to compare processors
- Clock frequency (MHz)?
  - IPC for the two processors could be radically different
- The Megahertz Myth
  - Dates from 1984
  - Apple II, CPU: MOS Technology 6502 at 1 MHz
    - LD: 2 cycles (2 microseconds)
  - IBM PC, CPU: Intel 8088 at 4.77 MHz
    - LD: 25 cycles (5.24 microseconds)
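The load latencies on this slide follow from dividing the cycle count by the clock frequency. The worked arithmetic below assumes clock rates of 1 MHz and 4.77 MHz, which are the rates implied by the slide's cycle counts and microsecond figures.

```latex
\[
t_{\text{LD}} = \frac{\text{cycles}}{f_{\text{clock}}}
\qquad
t_{\text{Apple II}} = \frac{2}{1\,\text{MHz}} = 2\,\mu\text{s}
\qquad
t_{\text{IBM PC}} = \frac{25}{4.77\,\text{MHz}} \approx 5.24\,\mu\text{s}
\]
```

Under these assumptions the machine with the higher clock frequency has the slower load, which is the point of the myth.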

– 7 – How not to compare processors
- Clock frequency (MHz)?
  - IPC for the two processors could be radically different
- CPI/IPC?
  - dependent on instruction sets used
  - dependent on efficiency of code generated by compiler
- FLOPS?
  - only if FLOPS are important for the expected applications
  - also dependent on instruction set used

– 8 – How to measure a processor
- Use wall-clock time (seconds)
- time = IC x CPI x ClockPeriod
  - IC = instruction count (total instructions executed)
  - CPI = cycles per instruction
  - ClockPeriod = 1 / ClockFrequency = 1 / (MHz x 10^6) seconds
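The formula on this slide can be sketched as a pair of small C helpers. The function names `exec_time_seconds` and `mips_rating` are illustrative and do not come from the course material. The clock is passed in MHz to match the slide.

```c
#include <assert.h>

/* Execution time in seconds: time = IC x CPI x ClockPeriod. */
double exec_time_seconds(double instruction_count, double cpi, double clock_mhz)
{
    assert(instruction_count >= 0.0 && cpi > 0.0 && clock_mhz > 0.0);
    double clock_period = 1.0 / (clock_mhz * 1e6); /* seconds per cycle */
    return instruction_count * cpi * clock_period;
}

/* Millions of instructions per second achieved at a given CPI and clock. */
double mips_rating(double cpi, double clock_mhz)
{
    assert(cpi > 0.0 && clock_mhz > 0.0);
    return clock_mhz / cpi; /* (MHz x 10^6 / CPI) / 10^6 */
}
```

For example, 2 billion instructions at a CPI of 1.5 on a 3000 MHz clock give `exec_time_seconds(2e9, 1.5, 3000.0)` of about 1.0 second, and `mips_rating(1.5, 3000.0)` of about 2000 MIPS.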

– 9 – Amdahl’s Law: Optimizing part of a program
- speedup = OldTime / NewTime
- E.g., my program used to take 10 minutes; after optimization it takes only 5 minutes
  - speedup = 10 min / 5 min = 2.0, i.e., 2x faster
- If only optimizing part of a program (on following slide):
  - let f be the fraction of execution time that the optimization applies to (1.0 > f > 0)
  - let s be the improvement factor (speedup of the optimization)

– 10 – Amdahl’s Law Visualized
[Figure: OldTime is made up of f and 1-f; after the optimization, NewTime is made up of f/s and 1-f.]
The best you can do is eliminate f; 1-f remains.

– 11 – Amdahl’s Law: Equations
- let f be the fraction of execution time that the optimization applies to (1.0 > f > 0)
- let s be the improvement factor
- NewTime = OldTime x [(1-f) + f/s]
- speedup = OldTime / (OldTime x [(1-f) + f/s])
- speedup = 1 / (1 - f + f/s)
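One consequence of the last equation is the upper bound on speedup when the optimized part becomes arbitrarily fast. The short derivation below uses only the symbols f and s defined on this slide.

```latex
\[
\lim_{s \to \infty} \text{speedup}
  = \lim_{s \to \infty} \frac{1}{(1 - f) + f/s}
  = \frac{1}{1 - f}
\]
```

For instance, with f = 0.7 the speedup can never exceed 1 / 0.3, or about 3.33, no matter how large s is.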

– 12 – Example 1: Amdahl’s Law
If an optimization makes loops go 3 times faster, and my program spends 70% of its time in loops, how much faster will my program go?
speedup = 1 / (1 - f + f/s)
        = 1 / (1 - 0.7 + 0.7/3.0)
        = 1 / (0.3 + 0.2333)
        = 1 / 0.5333
        = 1.875
My program will go 1.875 times faster.
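The same calculation can be written as a small C function. This is a minimal sketch. The name `amdahl_speedup` is illustrative, and the assertions simply encode the slide's condition that f lies strictly between 0 and 1.

```c
#include <assert.h>

/* Overall speedup when a fraction f of the original run time is sped up by s. */
double amdahl_speedup(double f, double s)
{
    assert(f > 0.0 && f < 1.0); /* the slides require 1.0 > f > 0 */
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

With the numbers from this example, `amdahl_speedup(0.7, 3.0)` evaluates to about 1.875.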

– 13 – Example 2: Amdahl’s Law
If an optimization makes loops go 4 times faster, and applying the optimization to my program makes it go twice as fast, what fraction of my program is loops?
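The slide leaves this question open. One way to work it out is to set speedup = 2 and s = 4 in the equation speedup = 1 / (1 - f + f/s) and solve for f:

```latex
\[
2 = \frac{1}{1 - f + f/4}
\;\Longrightarrow\;
1 - \tfrac{3}{4} f = \tfrac{1}{2}
\;\Longrightarrow\;
f = \tfrac{2}{3} \approx 0.667
\]
```

Under this reading, about two thirds of the program's original execution time is spent in loops.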

– 14 – Implications of Amdahl’s Law
[Figure: execution time divided into common and uncommon cases, before and after optimization.]
- Optimize the common case
- The common case may change!

– 15 – Tools for Measuring and Understanding Software

– 16 – Tools for Measuring/Understanding
- Software Timers
  - C library and OS-level timers
- Hardware Timers and Performance Counters
  - Built into the processor chip
- Instrumentation
  - Decorates your program with code that counts & measures
  - gprof
  - gcov
GNU: “GNU’s Not Unix”, founded by Richard Stallman

– 17 – Software Timers: Command Line
Example: /usr/bin/time
- Measures the time spent in user code and OS code
- Measures entire program (can’t measure a specific function)
- Not super-accurate, but good enough for many uses
    $ time ls
- user & sys: CPU time
- /usr/bin/time gives you more information

– 18 – Software Timers: Library: Example
- can measure within a program
- used in HW2

    #include <sys/times.h>  // C library functions for time
    #include <unistd.h>     // sysconf()

    double get_seconds() {
        struct tms t;
        times(&t);  // fills the struct
        // tms_utime is user program time (as opposed to OS time), in clock ticks
        return (double)t.tms_utime / sysconf(_SC_CLK_TCK);  // ticks -> seconds
    }
    …
    double start_time, end_time, elapsed_time;
    start_time = get_seconds();
    do_work();  // function to measure
    end_time = get_seconds();
    elapsed_time = end_time - start_time;
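To make the difference between user CPU time and wall-clock time concrete, the sketch below times the same workload with both `times()` and `clock_gettime()`. The helper names and the `busy_work` workload are hypothetical. Only `times()`, `sysconf()` and `clock_gettime()` are real library calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* User-mode CPU time consumed by this process so far, in seconds. */
double user_cpu_seconds(void)
{
    struct tms t;
    times(&t);
    return (double)t.tms_utime / (double)sysconf(_SC_CLK_TCK);
}

/* Wall-clock time read from a monotonic clock, in seconds. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical workload: spin on the CPU for a fixed number of iterations. */
double busy_work(long iterations)
{
    volatile double sum = 0.0;
    for (long i = 0; i < iterations; i++)
        sum += (double)i * 0.5;
    return sum;
}

/* Time one call of busy_work() and report both kinds of elapsed time. */
void time_busy_work(long iterations, double *wall_elapsed, double *cpu_elapsed)
{
    double wall_start = wall_seconds();
    double cpu_start = user_cpu_seconds();
    busy_work(iterations);
    *cpu_elapsed = user_cpu_seconds() - cpu_start;
    *wall_elapsed = wall_seconds() - wall_start;
}
```

Because `times()` counts in clock ticks (often 10 ms each), a short workload may report zero CPU time even though the wall-clock reading is nonzero.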

– 19 – Hardware: Cycle Timers
- can be more accurate than library (if used right)
- used in HW2
- Programmer can access on-chip cycle counter
  - E.g., via the x86 instruction rdtsc (read time stamp counter)
  - We use this in hw2:clock.c:line94 to time your solutions
- Example use:
    start_cycles = get_tsc();  // executes rdtsc
    do_work();
    end_cycles = get_tsc();
    total_cycles = end_cycles - start_cycles;
- Can be used to compute #cycles to execute code
- Watch out for multi-threaded programs!
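A possible shape for `get_tsc()` is sketched below. The body is an assumption, not the code in hw2's clock.c: it uses the `__rdtsc()` compiler intrinsic from `<x86intrin.h>` on x86 builds and, purely so the sketch compiles elsewhere, falls back to a nanosecond clock on other architectures. `cycles_for` and the body of `do_work` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* get_tsc() is the name used on the slide; this body is an assumption. */
uint64_t get_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc(); /* compiler intrinsic that executes rdtsc */
#else
    /* Fallback for non-x86 builds: nanoseconds, not cycles. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical workload to measure. */
void do_work(void)
{
    volatile long sum = 0;
    for (long i = 0; i < 100000; i++)
        sum += i;
}

/* Counter ticks that elapse while work() runs. */
uint64_t cycles_for(void (*work)(void))
{
    uint64_t start_cycles = get_tsc();
    work();
    uint64_t end_cycles = get_tsc();
    return end_cycles - start_cycles;
}
```

The difference covers everything that happened between the two reads, including any time the thread was descheduled, which is one reason for the slide's warning about multi-threaded programs.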

– 20 – Hardware: Performance Counters
- Special on-chip event counters
  - Can be programmed to count low-level architecture events
  - E.g., cache misses, branch mispredictions, etc.
- Can be difficult to use
  - Require OS support
  - Counters can overflow
  - Must be sampled carefully
- Software packages can make them easier to use
  - E.g., Intel’s VTune, perf (recent Linux)
  - perf used in HW2

– 21 – Instrumentation
- Compiler/tool inserts new code & data structures
  - Can count/measure anything visible to software
  - E.g., instrument every load instruction to also record the load address in a trace file
  - E.g., instrument every function to count how many times it is called
- “Observer effect”:
  - can’t measure a system without disturbing it
  - Instrumentation code can slow down execution
- Example instrumentors (open/freeware):
  - Intel’s Pin: general-purpose tool for x86
  - Valgrind: tool for finding bugs and memory leaks
  - gprof: counting/measuring where time is spent via sampling
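The idea of instrumenting every function to count its calls can be imitated by hand. In the sketch below the counters, the `COUNT_CALL` macro and the two example functions are all hypothetical. A real instrumentor would insert the equivalent of `COUNT_CALL` automatically.

```c
#include <stdio.h>

/* One counter per instrumented function (hypothetical names). */
enum { FN_SQUARE, FN_SUM_SQUARES, FN_COUNT };
static unsigned long call_counts[FN_COUNT];
static const char *fn_names[FN_COUNT] = { "square", "sum_squares" };

/* The "instrumentation": one extra increment on every function entry. */
#define COUNT_CALL(id) (call_counts[(id)]++)

long square(long x)
{
    COUNT_CALL(FN_SQUARE);
    return x * x;
}

long sum_squares(long n)
{
    COUNT_CALL(FN_SUM_SQUARES);
    long total = 0;
    for (long i = 1; i <= n; i++)
        total += square(i);
    return total;
}

/* Print how many times each instrumented function was entered. */
void print_call_counts(void)
{
    for (int i = 0; i < FN_COUNT; i++)
        printf("%-12s called %lu times\n", fn_names[i], call_counts[i]);
}
```

The extra increment in `square` runs on every call, so the instrumented program does slightly more work than the original. That is a small instance of the observer effect named on this slide.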

– 22 – Instrumentation: Using gprof
gprof: how it works
- Periodically (~ every 10 ms) interrupt the program
- Determine what function is currently executing
- Increment the time counter for that function by the interval (e.g., 10 ms)
- Approximates time spent in each function, #calls made
- Note: interval should be random for rigorous sampling!
Usage: compile with “-pg” to enable
    gcc -O2 -pg prog.c -o prog
    ./prog
- Executes in normal fashion, but also generates file gmon.out
    gprof prog
- Generates profile information based on gmon.out
- used in HW1
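The sampling steps described on this slide can be imitated in a few lines of C. This is a simplified sketch, not gprof's implementation: the program records the "current function" in a variable itself, and a `SIGPROF` handler driven by `setitimer(ITIMER_PROF, ...)` charges one sample to it on each tick. All function and variable names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

enum { FN_NONE, FN_A, FN_B, FN_COUNT }; /* hypothetical function ids */

static volatile sig_atomic_t current_fn = FN_NONE;
static volatile sig_atomic_t samples[FN_COUNT];

/* Runs on every SIGPROF: charge one interval to whatever is executing. */
static void on_sample(int signo)
{
    (void)signo;
    samples[current_fn]++;
}

/* Request a SIGPROF every interval_us microseconds of CPU time. */
int start_sampling(long interval_us)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval timer;
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = interval_us; /* must be below 1000000 */
    timer.it_value = timer.it_interval;
    return setitimer(ITIMER_PROF, &timer, NULL);
}

/* Disarm the profiling timer. */
void stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);
}

/* Burn CPU while marked as function `id`, then restore the caller's id. */
static void spin(int id, long iterations)
{
    int caller = current_fn;
    current_fn = id;
    volatile double x = 0.0;
    for (long i = 0; i < iterations; i++)
        x += 1.0;
    current_fn = caller;
}

void func_a(void) { spin(FN_A, 90000000L); }
void func_b(void) { spin(FN_B, 30000000L); }
```

After `start_sampling(10000)`, calling `func_a()` and `func_b()` and then `stop_sampling()` leaves sample counts in `samples[FN_A]` and `samples[FN_B]`. Each count times the 10 ms interval approximates the CPU time spent in that function, so `func_a` should collect roughly three times as many samples as `func_b`.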

– 23 – Instrumentation: Using gcov
- Gives profile of execution within a function
  - E.g., how many times each line of C code was executed
  - Can decide which loops are most important
  - Can decide which part of if/else is most important
Usage: compile with “-g -fprofile-arcs -ftest-coverage” to enable
    gcc -g -fprofile-arcs -ftest-coverage -c file.c -o file.o
    gcc -fprofile-arcs file.o -o prog
    ./prog
- Executes in normal fashion
- Compiling generates file.gcno and running generates file.gcda, for each file.o
    gcov -b file.c
- Generates profile output in file.c.gcov
- used in HW1
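The kind of information gcov reports, such as how often a loop body ran and which arm of an if/else was taken, can be illustrated with hand-written counters. The sketch below shows the idea only and is not how gcov is implemented. The counter names and the example function are hypothetical.

```c
#include <stdio.h>

/* Hypothetical counters standing in for the per-line counts gcov reports. */
static unsigned long loop_iterations, taken_even, taken_odd;

/* Sum the even entries of values[0..n-1], counting each path taken. */
long sum_even_values(const int *values, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        loop_iterations++;       /* how often the loop body ran */
        if (values[i] % 2 == 0) {
            taken_even++;        /* "if" arm */
            total += values[i];
        } else {
            taken_odd++;         /* "else" arm */
        }
    }
    return total;
}

/* Print the counts, one line per measured path. */
void print_branch_counts(void)
{
    printf("loop body: %lu\n", loop_iterations);
    printf("if arm:    %lu\n", taken_even);
    printf("else arm:  %lu\n", taken_odd);
}
```

Counts like these show which arm of the if/else dominates and therefore which one deserves optimization effort.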