Measuring Performance Based on slides by Henri Casanova.

Slides:



Advertisements
Similar presentations
Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.
Advertisements

TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Evaluating Performance
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
1 Lecture 11: Digital Design Today’s topics:  Evaluating a system  Intro to boolean functions.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
Chapter 4 Assessing and Understanding Performance
Gordon Moore Gordon Moore, cofounder of Intel 1965: 2 x trans. per chip/year After 1970: 2 x trans. per chip/1.5year 摩爾定律.
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 Measuring Performance Chris Clack B261 Systems Architecture.
1/18/02CSE Performance I Measuring Performance Part I.
CMSC 611: Advanced Computer Architecture Benchmarking Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Computer Organization and Design Performance Montek Singh Mon, April 4, 2011 Lecture 13.
1 Computer Performance: Metrics, Measurement, & Evaluation.
1 Interconnects Shared address space and message passing computers can be constructed by connecting processors and memory unit using a variety of interconnection.
Lecture 2b: Performance Metrics. Performance Metrics Measurable characteristics of a computer system: Count of an event Duration of a time interval Size.
Copyright 1995 by Coherence LTD., all rights reserved (Revised: Oct 97 by Rafi Lohev, Oct 99 by Yair Wiseman, Sep 04 Oren Kapah) IBM י ב מ 7-1 Measuring.
2016/5/26\ELEG323-08F\Topic4.ppt1 Topics 4: Performance Measurement Introduction to Computer Systems Engineering (CPEG 323)
1 CHAPTER 2 THE ROLE OF PERFORMANCE. 2 Performance Measure, Report, and Summarize Make intelligent choices Why is some hardware better than others for.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
Lecture 5: 9/10/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
L12 – Performance 1 Comp 411 Computer Performance He said, to speed things up we need to squeeze the clock Study
EGRE 426 Computer Organization and Design Chapter 4.
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 2: Performance Evaluation and Benchmarking * Jeremy R. Johnson Wed. Oct. 4,
COD Ch. 1 Introduction + The Role of Performance.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
Measuring Performance II and Logic Design
Computer Organization
Lecture 2: Performance Today’s topics:
Lecture 2: Performance Evaluation
Lecture 3: MIPS Instruction Set
September 2 Performance Read 3.1 through 3.4 for Tuesday
How do we evaluate computer architectures?
Defining Performance Which airplane has the best performance?
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
Defining Performance Section /14/2018 9:52 PM.
Computer Performance He said, to speed things up we need to squeeze the clock.
CMSC 611: Advanced Computer Architecture
Performance of computer systems
Process & its States Lecture 5.
Performance of computer systems
Lecture 3: MIPS Instruction Set
Performance of computer systems
CMSC 611: Advanced Computer Architecture
Computer Performance Read Chapter 4
Performance.
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Computer Organization and Design Chapter 4
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
CS2100 Computer Organisation
Presentation transcript:

Measuring Performance Based on slides by Henri Casanova

2 Performance as Time Time between the start and the end of an operation  Also called running time, elapsed time, wall-clock time, response time, latency, execution time,...  Most straightforward measure: “my program takes 12.5s on a Pentium 3.5GHz”  Can be normalized to some reference time Must be measured on a “dedicated” machine

3 Performance as Rate Used often so that performance can be independent on the “size” of the application  e.g., compressing a 1MB file takes 1 minute. compressing a 2MB file takes 2 minutes. The performance is the same. Millions of instructions / sec (MIPS)  But “Instructions Set Architectures” are not equivalent 1 CISC instruction = many RISC instructions Different programs use different instruction mixes May be ok for same program on same architectures

4 Performance as Rate Millions of floating point operations /sec: (MFlops)  MFlops is application-specific  Very popular, but can be misleading  e.g., A high MFlops rate in a stupid algorithm could have poor application performance Application-specific:  Millions of frames rendered per second (graphics)frames rendered  Millions of amino-acid compared per second (bio-computing)  Millions of HTTP requests served per second (web) Application-specific metrics are often preferable and others may be misleading  For instance: I want to add two n-element vectors This requires n Floating Point Operations Therefore MFlops is a good measure

5 Measuring Performance Rates How do we measure performance rates? Time a section of code Count how many “items” are done in that section of the code: e.g. floating point operations Compute the rate as the number of items divided by the measured time

6 Measuring Performance Rates: II Example: start_timer(...) for (i=0; i< ; i++) x = y * z + a stop_timer(...)  Number of Mflop= 2 2 million floating point operations: additions, multiplications  Number of MFlops: 2 / time

7 “Peak” Performance? Computer vendors always talk about peak performance rate  computed based on specifications of the machine  For instance: I build a machine with 2 floating point units Each unit can do an operation in 2 cycles My CPU is at 1000 MHz = 1GHz Therefore I have a (1000M cycles/sec ) * (2 floating point operations / 2 cycles) = 1000 Mflops Machine (1GFlops Machine)

8 “Peak” Performance? II Problem:  In real code you will never be able to use the two floating point units constantly (all the time)  Data needs to come from memory and cause the floating point units to be idle Typically, real code achieves often only a small fraction of the peak performance

9 Benchmarks Since many performance metrics turn out to be misleading, people have designed benchmarks These benchmarks are typically a collection of several codes that come from “real-world software” Example: SPEC Benchmark  Integer benchmark  Floating point benchmark The question “what is a good benchmark?” is difficult  If the benchmarks do not correspond to what you’ll do with the computer, then the benchmark results are not relevant to you

10 How About GHz? This is often the way in which people say that a computer is better than another  More instruction per seconds for higher clock rate Faces the same problems as MIPS But usable within a specific architecture ProcessorClock RateSPEC FP2000 Benchmark IBM Power3450 MHz434 Intel PIII1.4 GHz456 Intel P42.4GHz833 Itanium-21.0GHz1356

11 Program Performance In this course we’re not really concerned with determining the performance of a compute platform (whichever way the performance is defined) Instead we’re concerned with improving a program’s performance

12 For a given platform, take a given program, run it and measure its wall-clock time Enhance it, run it again and quantify the performance improvement  i.e., the reduction in wall-clock time For each enhanced program version compute its performance  preferably as a relevant performance rate  so that you can say: the best implementation we have so far goes “this fast” (perhaps a % of the peak performance) Improving a Program’s Performance

13 The UNIX time Command You can put time in front of any UNIX command you invoke When the invoked command completes, time prints out timing (and other) information % time ls /home/casanova/ -la -R 0.520u 1.570s 0: % 0+0k io 0pf+0w  0.520u0.52 seconds of user time  1.570s1.57 seconds of system time  0: seconds of wall-clock time  10.1%10.1% of CPU was used  0+0kmemory used (text + data)  io570 input, 105 output (file system I/O)  0pf+0w0 page faults and 0 swaps

14 User, System, Wall-Clock? User Time: time that the code spends executing user code (i.e., non system calls) System Time: time that the code spends executing system calls Wall-Clock Time: time from start to end Wall-Clock ≥ User + System  in our example: ≥ Why?  because the process can be suspended by the O/S due to contention for the CPU by other processes  because the process can be blocked waiting for I/O

15 Using time Wall-clock - system - user ≈ I/O + suspended  If the system is dedicated, suspended ≈ 0  Therefore one can estimate the cost of I/O  If I/O is really high, one may want to look at reducing I/O or doing I/O better Therefore, time can give us insight into bottlenecks and gives us wall-clock time

16 Drawbacks of UNIX time The time command has poor resolution  “Only” milliseconds  Sometimes we want a higher precision, especially if our performance improvements are in the 1-2% range time times the whole code  Sometimes we’re only interested in timing some part of the code, for instance the one that we are trying to optimize  Sometimes we want to compare the execution time of different sections of the code

17 Dedicated Systems Measuring the performance of a code must be done dedicated system  No other user can start a process  The user measuring the performance only runs the minimum amount of processes Nevertheless, one should always present measurement results as averages over several experiments  Because the (small) load imposed by the O/S is not deterministic In your assignments, always show averages over 10 experiments, or more if asked to do so explicitly