Lecture 2: Performance Measurement

Performance Evaluation
The primary duty of software developers is to create functionally correct programs. Performance evaluation is the part of software development that makes those programs perform well.

Performance Analysis Cycle
Have an optimization phase, just like the testing and debugging phases.
Code development -> functionally complete and correct program -> measure -> analyze -> modify / tune -> complete, correct and well-performing program -> usage

Goals of Performance Analysis
The goal of performance analysis is to provide quantitative information about the performance of a computer system.
- Compare alternatives: provide quantitative information when purchasing a new computer system.
- Determine the impact of a feature: provide a before-and-after comparison when designing a new system or upgrading.
- System tuning: find the parameters that produce the best overall performance.
- Identify relative performance: quantify performance relative to previous generations.
- Performance debugging: identify performance problems and correct them.
- Set expectations: determine the expected capabilities of the next generation.

Performance Evaluation
Performance evaluation steps:
- Measurement / prediction: what to measure? how to measure? Modeling for prediction: simulation, analytical modeling.
- Analysis and reporting: performance metrics.

Performance Measurement: Interval Timers
Two kinds: hardware timers and software timers (Lilja, p. 86).

Performance Measurement: Hardware Timers
A clock with period $T_c$ drives an n-bit counter, which is connected to the processor memory bus; the counter value is read from a memory location (Lilja, p. 86).
Time is calculated as $\text{Time} = (x_2 - x_1) \times T_c$.

Performance Measurement: Software Timers
Interrupt-based: the clock (period $T_c$) passes through a prescaling counter, whose output (period $T'_c$) drives the processor interrupt input. When the interrupt occurs, the interrupt-service routine increments the timer value, which is read by a program (Lilja, p. 86).
Time is calculated as $\text{Time} = (x_2 - x_1) \times T'_c$.

Performance Measurement: Timer Rollover
Rollover occurs when an n-bit counter undergoes a transition from its maximum value $2^n - 1$ to zero. There is a trade-off between rollover time and accuracy.

T'c      32-bit       64-bit
10 ns    42 s         5850 years
1 µs     1.2 hours    0.5 million years
1 ms     49 days      0.5 x 10^9 years

Timers: Solutions to Rollover
- Use a 64-bit integer (rollover after more than half a million years).
- Have the timer return two values: one represents seconds, the other represents microseconds since the last second. With a 32-bit seconds value, the rollover time is over 100 years.

Performance Measurement: Interval Timers

    T0 <- read current time
    event_being_timed();
    T1 <- read current time

Time for the event is T1 - T0.

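Performance Measurement: Timer Overhead
Sequence of a timed measurement:
- T1: read_time is initiated; at the end of T1 the current time is read.
- T2: the timer call returns; at the end of T2 the event begins.
- T3: the event being timed; at the end of T3 the event ends and read_time is initiated again.
- T4: the second read_time; at the end of T4 the current time is read.
Measured time: $T_m = T_2 + T_3 + T_4$
Desired measurement: $T_e = T_m - (T_2 + T_4) = T_m - (T_1 + T_2)$, since $T_1 = T_4$
Timer overhead: $T_{ovhd} = T_1 + T_2$
$T_e$ should be 100 to 1000 times greater than $T_{ovhd}$.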

Performance Measurement: Timer Resolution
Resolution is the smallest change that can be detected by an interval timer: a reading of n ticks only tells us that $n T'_c < T_e < (n+1) T'_c$.
If $T_c$ is large relative to the event being measured, it may be impossible to measure the duration of the event.

Performance Measurement: Measuring Short Intervals
When $T_e < T_c$, a single measurement reads either 1 (a clock tick falls inside the event) or 0 (no clock tick falls inside the event).

Performance Measurement: Measuring Short Intervals
Solution: repeat the measurement n times.
- Counting ticks: average execution time $T'_e = (m \times T_c) / n$, where m is the number of 1s measured.
- Timing all repetitions together: average execution time $T'_e = (T_t / n) - h$, where $T_t$ is the total execution time of the n repetitions and h is the repetition overhead.

Performance Measurement: Time
- Elapsed time / wall-clock time / response time: latency to complete a task, including disk access, memory access, I/O, operating system overhead, and everything else (including time consumed by other programs in a time-sharing system).
- CPU time: the time the CPU is computing, not including I/O time or waiting time.
  - User time / user CPU time: CPU time spent in the program.
  - System time / system CPU time: CPU time spent in the operating system performing tasks requested by the program.

Performance Measurement: UNIX time Command
Example output: 90.7u 12.9s 2:39 65%
- 90.7u: user time
- 12.9s: system time
- 2:39: elapsed time
- 65%: percentage of elapsed time, (90.7 seconds + 12.9 seconds) / (2 minutes + 39 seconds) = 65%
Drawbacks: the resolution is in milliseconds, and different sections of the code cannot be timed.

Timers
A timer is a function, subroutine or program that can be used to return the amount of time spent in a section of code.

    /* Style 1: timer() returns the time elapsed since its argument */
    zero = 0.0;
    t0 = timer(&zero);
    … < code segment >
    t1 = timer(&t0);
    time = t1;

    /* Style 2: timer() returns the current time */
    t0 = timer();
    … < code segment >
    t1 = timer();
    time = t1 - t0;

Timers
Read Wadleigh, Crawford pp. 130-136 for: time, clock, gettimeofday, etc.

Timers: Measuring Timer Resolution

    main()
    {
        . . .
        zero = 0.0;
        t0 = timer(&zero);
        t1 = 0.0;
        j = 0;
        /* Increase the amount of work until the timer reports a nonzero time */
        while (t1 == 0.0) {
            j++;
            zero = 0.0;
            foo(j);
            t1 = timer(&t0);
        }
        printf("It took %d iterations for a nonzero time\n", j);
        if (j == 1)
            printf("timer resolution <= %13.7f seconds\n", t1);
        else
            printf("timer resolution is %13.7f seconds\n", t1);
    }

    foo(n)
    {
        . . .
        i = 0;
        for (j = 0; j < n; j++)
            i++;
        return (i);
    }

Timers: Measuring Timer Resolution (sample results)
Using clock(): It took 682 iterations for a nonzero time; timer resolution is 0.0200000 seconds
Using times(): It took 720 iterations for a nonzero time; timer resolution is 0.0200000 seconds
Using getrusage(): It took 7374 iterations for a nonzero time; timer resolution is 0.0002700 seconds

Timers: Spin Loops
Used for codes that take less time to run than the resolution of the timer. The first call to a function may require an inordinate amount of time, so the minimum of all times may be desired.

    main()
    {
        . . .
        zero = 0.0;
        t2 = 100000.0;
        for (j = 0; j < n; j++) {
            t0 = timer(&zero);
            foo(j);
            t1 = timer(&t0);
            t2 = min(t2, t1);    /* keep the smallest time seen so far */
        }
        t2 = t2 / n;
        printf("Minimum time is %13.7f seconds\n", t2);
    }

    foo(n)
    {
        . . .
        < code segment >
    }

Profilers
A profiler automatically inserts timing calls into applications. It is used to identify the portions of a program that consume the largest fraction of the total execution time. It may also be used to find system-level bottlenecks in a multitasking system. Profilers may alter the timing of a program's execution.

Profilers
Data collection techniques:
- Sampling-based: this type of profiler uses a predefined clock; at every multiple of the clock tick the program is interrupted and its state information is recorded. It gives a statistical profile of the program's behavior and may miss some important events.
- Event-based: events are defined (e.g. entry into a subroutine) and data about these events are collected. The collected information shows the exact execution frequencies, at the cost of substantial run-time overhead and memory requirements.
Information kept:
- Trace-based: the profiler keeps all the information it collects.
- Reductionist: only statistical information is kept.

Performance Evaluation
Performance evaluation steps:
- Measurement / prediction: what to measure? how to measure? Modeling for prediction: simulation, analytical modeling, queuing theory.
- Analysis and reporting: performance metrics.

Predicting Performance
The performance of simple kernels can be predicted to a high degree. Theoretical performance and peak performance must be close: it is preferred that the measured performance is over 80% of the theoretical peak performance.

Performance Metrics
Measurable characteristics of a computer system:
- Count of an event
- Duration of a time interval
- Size of a parameter
- Rate: operations executed per second

Performance Metrics: Clock Speed
Clock speed/frequency ($f$): the rate of clock pulses (e.g. 1 GHz).
Cycle time ($T_c$): the time between two clock pulses, $T_c = 1/f$.

Performance Metrics: Instruction Execution Rate
Cycles per instruction (CPI) is an average that depends on the design of the micro-architecture (hardwired/microprogrammed, pipelined).
Number of instructions is the number of instructions executed at run time; it depends on the instruction set architecture (ISA) and the compiler.
$CPI_i$: number of cycles required for instruction type i
$I_i$: number of executed instructions of type i
$CPI = \frac{\sum_i CPI_i \times I_i}{\sum_i I_i}$

Performance Metrics: CPU Performance
CPU time of a program: $T = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}}$, where cycles per instruction is the CPI.
$T = \text{instruction count} \times CPI \times \frac{1}{f}$

Performance Metrics: CPU Performance
Drawbacks:
- In modern computers, no program runs without some operating system running on the hardware.
- Comparing performance between machines with different operating systems will be unfair.

Performance Metrics: Execution Time
- Elapsed time / wall-clock time / response time: latency to complete a task, including disk access, memory access, I/O, operating system overhead, and everything else (including time consumed by other programs in a time-sharing system).
- CPU time: the time the CPU is computing, not including I/O time or waiting time.
  - User time / user CPU time: CPU time spent in the program.
  - System time / system CPU time: CPU time spent in the operating system performing tasks requested by the program.

Performance Metrics: Performance Comparison
Relative performance: $\text{Performance}_X = \frac{1}{\text{Execution time}_X}$
$\text{Performance ratio} = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$

Performance Metrics: Relative Performance
If the workload consists of more than one program, total execution time may be used. If more than one machine is to be compared, one of them must be selected as a reference.

Performance Metrics: Throughput
Total amount of work done in a given time, measured in tasks per time unit. Can be used for:
- Operating system performance
- Pipeline performance
- Multiprocessor performance

Performance Metrics: MIPS (million instructions per second)
- Includes both integer and floating-point performance.
- The number of instructions in a program varies between different computers.
- The number of instructions varies between different programs on the same computer.
$MIPS = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{CPI \times 10^6}$

Performance Metrics: MFLOPS (million floating-point operations per second)
Gives the performance of floating-point operations only. Different mixes of integer and floating-point operations may have different execution times:
- Integer and floating-point units work independently.
- Instruction and data caches provide instructions and data concurrently.

Performance Metrics: Utilization and Speciality Ratio
$\text{Utilization} = \frac{\text{Busy time}}{\text{Total time}}$
$\text{Speciality ratio} = \frac{\text{Maximum performance}}{\text{Minimum performance}}$; a speciality ratio of 1 indicates a general-purpose system.

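Performance Metrics: Asymptotic and Half Performance
$r_\infty$: asymptotic performance
$n_{1/2}$: half performance
$T = r_\infty^{-1} (n + n_{1/2})$, with $r_\infty = 1/t$ and $n_{1/2} = t_0 / t$
[Figure: T plotted against n is a straight line with slope $r_\infty^{-1}$; it crosses the n axis at $-n_{1/2}$, crosses the T axis at $t_0$, and reaches $2 t_0$ at $n = n_{1/2}$.]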

Performance Metrics: Speedup
Expresses how much faster system 2 is than system 1; calculated directly from execution time.
$\text{Performance}_X = \frac{1}{\text{Execution time}_X} = \frac{1}{T_X}$
$\text{Speedup}_{2,1} = \frac{\text{Performance}_2}{\text{Performance}_1} = \frac{T_1}{T_2}$

Performance Metrics: Relative Change
Expresses the performance of system 2 relative to system 1.
$\text{Relative change}_{2,1} = \frac{\text{Performance}_2 - \text{Performance}_1}{\text{Performance}_1} = \frac{T_1 - T_2}{T_2} = \text{Speedup}_{2,1} - 1$

Performance Metrics: Statistical Analysis
- Used to compare performance.
- The workload consists of many programs.
- Depends on the nature of the data as well as the distribution of the test results.

Performance Metrics: Indices of Central Tendency
Used to summarize multiple measurements: mean, median, mode.

Performance Metrics: Mean (average)
Gives equal weight to all measurements.
$\text{Arithmetic mean} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Measurement   Execution time
X1            10
X2            20
X3            15
X4            18
X5            16
Mean          15.8

Performance Metrics: Median
Order all n measurements; the middle value is the median. If n is even, the median is the mean of the middle two values. Using the median instead of the mean reduces the skewing effect of outliers.

Measurement   Execution time
X1            10
X2            20
X3            15
X4            18
X5            16
X6            200
Mean          46.5

Sorted: X1 = 10, X3 = 15, X5 = 16, X4 = 18, X2 = 20, X6 = 200
Median = (16 + 18) / 2 = 17

Performance Metrics: Mode
The mode is the value that occurs most frequently. If all values occur once, there is no mode. If several values occur equally often, there are several modes.
Measurements: X1 = 10, X2 = 20, X3 = 36, X4, X5, X6; Mode = 20

Mean, Median, Mode
Mean: incorporates information from all of the measured values; sensitive to outliers.
Median and mode: less sensitive to outliers; do not effectively use all of the information.

Performance Metrics: Arithmetic mean (average)
May be misleading if the data are skewed or scattered.
$\text{Arithmetic mean} = \frac{1}{n} \sum_{i=1}^{n} x_i$

          MA     MB     MC
Prog1     50     100    500
Prog2     400    800    800
Prog3     5550   5100   4700
Average   2000   2000   2000

Performance Metrics: Weighted average
The weight is the frequency of each program in daily processing. Results may change with a different set of execution frequencies.
$\text{Weighted average} = \sum_{i=1}^{n} w_i \, x_i$

          Weight   MA     MB     MC
Prog1     60%      50     100    500
Prog2     30%      400    800    800
Prog3     10%      5550   5100   4700
Average            705    810    1010

Performance Metrics: Geometric mean
Results are stated in relation to the performance of a reference machine.
$\text{Geometric mean} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$

                   MA     Normalized   MB (reference)   Normalized   MC     Normalized
Prog1              50     2            100              1            500    0.2
Prog2              400    2            800              1            800    1
Prog3              5550   0.92         5100             1            4700   1.085
Geometric mean            1.54                          1                   0.60

Results are consistent no matter which system is chosen as reference.

Performance Metrics: Harmonic mean
Used to compare performance results that are expressed as a rate (e.g. operations per second, throughput, etc.). The slowest rates have the greatest influence on the result, so it identifies areas where performance can be improved.
$\text{Harmonic mean} = \frac{n}{\sum_{i=1}^{n} 1/x_i}$

Performance Metrics: Characteristics of a good performance metric
If time values are averaged, the resulting mean value must be directly proportional to the total time.
If rate values are averaged, the resulting mean value must be inversely proportional to the total time.

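Performance Metrics: Example
n benchmark programs; $T_i$ is the execution time of program i; each program executes F floating-point operations; $M_i = F / T_i$ is the execution rate of program i (MFLOP/s).

Arithmetic mean: appropriate for summarizing execution times, inappropriate for summarizing rates.
$T_A = \frac{1}{n} \sum_{i=1}^{n} T_i$, so $T_A$ is directly proportional to the total execution time.
$M_A = \frac{1}{n} \sum_{i=1}^{n} M_i = \frac{F}{n} \sum_{i=1}^{n} \frac{1}{T_i}$, so $M_A$ is not inversely proportional to the total execution time.

Performance Metrics: Harmonic mean
Inappropriate for summarizing execution times, appropriate for summarizing rates.
$T_H = \frac{n}{\sum_{i=1}^{n} 1/T_i}$, so $T_H$ is not directly proportional to the total execution time.
$M_H = \frac{n}{\sum_{i=1}^{n} 1/M_i} = \frac{F \, n}{\sum_{i=1}^{n} T_i}$, so $M_H$ is inversely proportional to the total execution time.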

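Performance Metrics: Geometric mean
Inappropriate for summarizing execution times and inappropriate for summarizing rates.
$T_G = \left( \prod_{i=1}^{n} T_i \right)^{1/n}$, so $T_G$ is not directly proportional to the total execution time.
$M_G = \left( \prod_{i=1}^{n} M_i \right)^{1/n} = \left( \prod_{i=1}^{n} \frac{F}{T_i} \right)^{1/n}$, so $M_G$ is not inversely proportional to the total execution time.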

Performance Metrics: Geometric mean
Produces a consistent ordering of the systems, but it is the wrong ordering.

                                     M1      M2      M3
Prog1                                417     244     134
Prog2                                83      70      70
Prog3                                66      153     135
Prog4                                39449   33527   66000
Prog5                                772     368     369
Geometric mean (normalized wrt M1)   1.0     0.86    0.84
Geometric mean (normalized wrt M2)   1.17    1.0     0.99
Rank                                 3       2       1
Total time                           40787   34362   66708
Arithmetic mean                      8157    6872    13342

Performance Metrics: Histogram
Used to display the distribution of a set of measured values (variability). First find the minimum and maximum values, then divide the range into b subranges, called cells.

Histogram

Message size (kbytes)   Network A   Network B
0 < xi ≤ 5              11          39
5 < xi ≤ 10             27          25
10 < xi ≤ 15            41          18
15 < xi ≤ 20            32          5
20 < xi ≤ 25            21          19
25 < xi ≤ 30            12          42
30 < xi ≤ 35            4

Performance Metrics: Index of Dispersion
An index of dispersion is used to compare the spread of measurements around the mean value.
Range is the simplest index of dispersion: $R = \max_i x_i - \min_i x_i$. It is sensitive to a few extreme values.

Performance Metrics: Index of Dispersion
Maximum of the absolute values of the difference of each measurement from the mean: $\Delta_{max} = \max_i |x_i - \bar{x}|$. It is also sensitive to extreme values.

Performance Metrics: Index of Dispersion
Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ requires 2 passes through the data, to calculate first $\bar{x}$ and then $s^2$.
$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n (n - 1)}$ requires 1 pass.

Performance Metrics: Index of Dispersion
Standard deviation: $s = \sqrt{s^2}$
Coefficient of variation (COV): normalizes the standard deviation with respect to the mean, $COV = s / \bar{x}$

Performance Evaluation Methods
- Benchmarking
- Monitoring
- Analytical modeling
- Queuing theory

Benchmarking
A benchmark is a program that is run on a computer to measure its performance and compare it with other machines. The best benchmark is the users' workload, i.e. the mixture of programs and operating system commands that users run on a machine, but this is not practical. Standard benchmarks are used instead.

Benchmarking: Types of Benchmarks
- Synthetic benchmarks
- Toy benchmarks
- Microbenchmarks
- Program kernels
- Real applications

Benchmarking: Synthetic benchmarks
Artificially created benchmark programs that represent the average frequency of operations (instruction mix) of a large set of programs.
- Whetstone benchmark
- Dhrystone benchmark
- Rhealstone benchmark

Benchmarking: Synthetic benchmarks (Whetstone)
- First written in Algol60 in 1972; today Fortran, C/C++ and Java versions are available.
- Represents the workload of numerical applications.
- Measures floating-point arithmetic performance.
- Unit is millions of Whetstone instructions per second (MWIPS).
Shortcomings: does not represent constructs in modern languages, such as pointers; does not consider cache effects.

Benchmarking: Synthetic benchmarks (Dhrystone)
- First written in Ada in 1984; today a C version is available.
- Represents the workload of system software: statistics were collected on system software such as operating systems, compilers and editors, and on a few numerical programs.
- Measures integer and string performance; no floating-point operations.
- Unit is the number of program iteration completions per second.
Shortcomings: does not represent real-life programs; compiler optimization overstates system performance; the code is small and may fit in the instruction cache.

Benchmarking: Synthetic benchmarks (Rhealstone)
For multi-tasking real-time systems. Factors are: task switching time, pre-emption time, interrupt latency time, semaphore shuffling time, deadlock breaking time, datagram throughput time.
Metric is Rhealstones per second: $\sum_{i=1}^{6} w_i \cdot (1 / t_i)$

Benchmarking: Toy benchmarks
10-100 lines of code whose result is known before running the toy program.
- Quick sort
- Sieve of Eratosthenes: finds prime numbers
  http://upload.wikimedia.org/wikipedia/commons/8/8c/New_Animation_Sieve_of_Eratosthenes.gif

    func sieve( var N )
        var PrimeArray as array of size N
        initialize PrimeArray to all true
        for i from 2 to N
            for each j from i + 1 to N, where i divides j
                set PrimeArray( j ) = false

Benchmarking: Microbenchmarks
Small, specially designed programs used to test some specific function of a system (e.g. floating-point execution, I/O subsystem, processor-memory interface, etc.).
- Provide values for important parameters of a system.
- Characterize the maximum performance if the overall performance is limited by that single component.

Benchmarking: Kernels
Key pieces of code from real applications.
- LINPACK and BLAS
- Livermore Loops
- NAS

Benchmarking: Kernels (LINPACK and BLAS libraries)
LINPACK: linear algebra package
- Measures floating-point computing power.
- Solves a system of linear equations Ax = b with Gaussian elimination.
- Metric is MFLOP/s.
- DAXPY is the most time-consuming routine.
- Used as the measure for the TOP500 list.
BLAS: basic linear algebra subprograms
- LINPACK makes use of the BLAS library.

Benchmarking: Kernels (LINPACK and BLAS libraries)
SAXPY: Scalar Alpha X Plus Y, $Y = aX + Y$, where X and Y are vectors and a is a scalar. SAXPY is the single-precision and DAXPY the double-precision version.
Generic implementation:

    for (int i = m; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }

Benchmarking: Kernels (Livermore Loops)
- Developed at LLNL.
- Originally in Fortran, now also in C.
- 24 numerical application kernels, such as: hydrodynamics fragment, incomplete Cholesky conjugate gradient, inner product, banded linear systems solution, tridiagonal linear systems solution, general linear recurrence equations, first sum, first difference, 2-D particle in a cell, 1-D particle in a cell, Monte Carlo search, location of a first array minimum, etc.
- Metrics are the arithmetic, geometric and harmonic means of the CPU rate.

Benchmarking: Kernels (NAS Parallel Benchmarks)
- Developed at the NASA Advanced Supercomputing division.
- Paper-and-pencil benchmarks.
- 11 benchmarks, such as: discrete Poisson equation, conjugate gradient, fast Fourier transform, bucket sort, embarrassingly parallel, nonlinear PDE solution, data traffic, etc.

Benchmarking: Real Applications
Programs that are run by many users:
- C compiler
- Text processing software
- Frequently used user applications
- Modified scripts used to measure particular aspects of system performance, such as interactive behavior and multiuser behavior

Benchmarking: Benchmark Suites
- Desktop and server benchmarks: SPEC benchmark suite
- Server benchmarks: TPC
- Embedded benchmarks: EEMBC

Benchmarking: SPEC Benchmark Suite
Desktop benchmarks:
- CPU-intensive: SPEC CPU2000, with 12 integer (CINT2000) and 14 floating-point (CFP2000) benchmarks. Real application programs: C compiler, finite element modeling, fluid dynamics, etc.
- Graphics-intensive:
  - SPECviewperf: measures rendering performance using OpenGL.
  - SPECapc: Pro/Engineer (3D rendering with solid models), SolidWorks (3D CAD/CAM design tool, CPU-intensive and I/O-intensive tests), Unigraphics (solid modeling for an aircraft design).
Server benchmarks:
- SPECWeb: for web servers.
- SPECSFS: for NFS performance, throughput-oriented.

Benchmarking: TPC Benchmark Suite
Server benchmark: transaction processing (TP) benchmarks based on real applications.
- TPC-C: simulates a complex query environment.
- TPC-H: ad hoc decision support.
- TPC-R: business decision support system where users run a standard set of queries.
- TPC-W: business-oriented transactional web server.
Measures performance in transactions per second. Throughput performance is measured only when the response time limit is met. Allows cost-performance comparisons.

Benchmarking: EEMBC
Benchmarks for embedded computing systems: 34 benchmarks from 5 different application classes.
- Automotive/industrial
- Consumer
- Networking
- Office automation
- Telecommunications

Benchmarking Strategies
- Fixed-computation benchmarks
- Fixed-time benchmarks
- Variable-computation and variable-time benchmarks

Benchmarking: Fixed-Computation benchmarks
W: fixed workload (number of instructions, number of floating-point operations, etc.)
T: measured execution time
R: speed
Systems are compared on the same workload W.

Benchmarking: Fixed-Computation benchmarks
Amdahl's Law

Benchmarking: Fixed-Time benchmarks
On a faster system, a larger workload can be processed in the same amount of time.
T: fixed execution time
W: workload
R: speed
Systems are compared on the same execution time T.

Benchmarking: Fixed-Time benchmarks
Scaled Speedup

Benchmarking: Variable-Computation and Variable-Time benchmarks
In this type of benchmark, the quality of the solution is improved.
Q: quality of the solution
T: execution time
Quality improvements per second: $Q / T$

Quality of Measurement
Characteristics of a measurement tool (timer):
- Accuracy: absolute difference between a measured value and the corresponding standard reference value (such as the duration of a second).
- Precision: reliability of the measurements made with the tool. Highly precise measurements are tightly clustered around a single value.
- Resolution: smallest incremental change that can be detected, e.g. the interval between clock ticks.

Quality of Measurement
[Figure: distribution of measured values, marking the mean value and the true value; accuracy and precision are indicated on the distribution.]

Quality of Measurement
The uncertainties in the measurements are called errors or noise. Sources of errors:
- Accuracy, precision and resolution of the measurement tool
- Time required to read and store the current time value
- Time-sharing among multiple programs
- Processing of interrupts
- Cache misses, page faults

Quality of Measurement
Types of errors:
- Systematic errors: the result of some experimental mistake; usually constant across all measurements. Example: temperature may affect the clock period.
- Random errors: unpredictable and nondeterministic; they affect the precision of the measurement. Example: timer resolution ±T affects measurements with equal probability.

Quality of Measurement: Experimental measurements are assumed to follow a Gaussian (normal) distribution. Ex (Pg 48): x: measured value; ±E: random error. Two sources of error, each shifting the measurement by -E or +E with 50% probability. The actual value x is measured half of the time.

Error 1   Error 2   Measured value   Probability
-E        -E        x - 2E           1/4
-E        +E        x                1/4
+E        -E        x                1/4
+E        +E        x + 2E           1/4

Confidence Intervals: Used to find a range of values that has a given probability of including the actual value. Case 1: the number of measurements is large (n ≥ 30). Samples: {x1, x2, …, xn}, following a Gaussian distribution with mean μ and standard deviation σ. Confidence interval: [c1, c2]. Confidence level: (1 - α) × 100%. Pr[c1 ≤ x ≤ c2] = 1 - α, and Pr[x < c1] = Pr[x > c2] = α/2.

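Confidence Intervals: Case 1: the number of measurements is large (n ≥ 30). Confidence interval [c1, c2]: $c_1 = \bar{x} - z_{1-\alpha/2}\, s/\sqrt{n}$ and $c_2 = \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}$, where $\bar{x}$ is the sample mean, $s$ is the standard deviation, and $z_{1-\alpha/2}$ is obtained from the precomputed table.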
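A minimal sketch of the large-sample case, assuming the usual interval: sample mean plus or minus z times s / sqrt(n). The table lookup is replaced here by `statistics.NormalDist().inv_cdf`; the function name and the default confidence level are choices made for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval_large_n(samples, confidence=0.90):
    """Return (c1, c2) for the mean, using the normal approximation (n >= 30)."""
    n = len(samples)
    if n < 30:
        raise ValueError("normal approximation assumes n >= 30; use the t distribution")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1-alpha/2}, replaces the table lookup
    x_bar = mean(samples)
    s = stdev(samples)  # sample standard deviation (n - 1 divisor)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


if __name__ == "__main__":
    # Synthetic timings for illustration only.
    timings = [10.0 + (i % 5) for i in range(40)]
    for level in (0.90, 0.95, 0.99):
        c1, c2 = confidence_interval_large_n(timings, level)
        print(f"{level:.0%}: [{c1:.3f}, {c2:.3f}]")
```

Running it at several confidence levels shows the interval widening as the requested confidence rises.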

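Confidence Intervals: Case 2: the number of measurements is small (n < 30). Sample variances s² can vary significantly, so the t distribution with n - 1 degrees of freedom is used: $c_1 = \bar{x} - t_{1-\alpha/2;\,n-1}\, s/\sqrt{n}$ and $c_2 = \bar{x} + t_{1-\alpha/2;\,n-1}\, s/\sqrt{n}$, where $\bar{x}$ is the sample mean, $s$ is the standard deviation, and $t_{1-\alpha/2;\,n-1}$ is obtained from the precomputed table.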
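For the small-sample case, the sketch below assumes the usual interval: sample mean plus or minus t times s / sqrt(n), with n - 1 degrees of freedom. Python's standard library has no t quantile function, so the caller passes the table value. The three quantiles hard-coded below are the standard two-sided table values for 7 degrees of freedom. The eight measurements are illustrative assumptions, not data taken from the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided t quantiles t_{1-alpha/2; 7} for 7 degrees of freedom (n = 8),
# copied from a standard t table; other sample sizes need other table rows.
T_TABLE_7_DOF = {0.90: 1.895, 0.95: 2.365, 0.99: 3.499}


def t_interval(samples, t_quantile):
    """Return (c1, c2) = mean -/+ t * s / sqrt(n) for a small sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    half_width = t_quantile * stdev(samples) / sqrt(n)
    x_bar = mean(samples)
    return x_bar - half_width, x_bar + half_width


# Illustrative measurements (assumed values, not taken from the slides).
samples = [8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, 8.5]

if __name__ == "__main__":
    for level, t in T_TABLE_7_DOF.items():
        c1, c2 = t_interval(samples, t)
        print(f"{level:.0%}: c1 = {c1:.1f}, c2 = {c2:.1f}")
```

With SciPy installed (an assumption about the environment), `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` could supply the quantile instead of the hard-coded table.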

Confidence Intervals: Ex (Pg 51): the number of measurements is small (n < 30). A 90% confidence interval means that there is a 90% chance that the actual mean is within that interval.

Confidence Intervals: 90%: c1 = 6.5, c2 = 9.4. 95%: c1 = 6.1, c2 = 9.7. 99%: c1 = 5.3, c2 = 10.6. A wider interval means less precise knowledge about the mean.

Confidence Intervals: Determining the Number of Measurements Needed.

Confidence Intervals: Determining the Number of Measurements Needed. Estimating s: make a small number of measurements; estimate the standard deviation s; calculate n; make n measurements.
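The steps above can be sketched as follows. The formula for n is missing from the extracted slide, so the code assumes the usual derivation: require the interval half-width z * s / sqrt(n) to equal e times the mean, which gives n = (z * s / (e * mean))². The relative error `rel_error` (e) and the function name are assumptions for this sketch.

```python
from math import ceil
from statistics import NormalDist, mean, stdev


def measurements_needed(pilot, rel_error=0.05, confidence=0.95):
    """Estimate n so that the interval half-width is about rel_error * mean.

    pilot: a small set of preliminary measurements used to estimate mean and s
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    x_bar, s = mean(pilot), stdev(pilot)
    # Solve z * s / sqrt(n) = rel_error * x_bar for n, then round up.
    return ceil((z * s / (rel_error * x_bar)) ** 2)


if __name__ == "__main__":
    # Illustrative pilot measurements (assumed values).
    pilot = [8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, 8.5]
    print(measurements_needed(pilot, rel_error=0.05, confidence=0.95))
```

Using z with a very small pilot sample is itself an approximation. A t quantile would give a somewhat larger, more conservative n.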

Confidence Intervals Ex: Pg 53

Confidence Intervals for Proportions: Used when we are interested in the number of times events occur. Binomial distribution: if np ≥ 10, it approximates a Gaussian distribution with mean p and variance p(1 - p)/n, where n is the total number of events recorded, m is the number of times the desired outcome occurs, and p = m/n is the sample proportion.
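A minimal sketch of the resulting interval, assuming the normal approximation with standard deviation sqrt(p(1 - p)/n). The names `m` (occurrences of the desired outcome) and `n` (total events recorded) are assumptions for this example, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(m, n, confidence=0.95):
    """Return (c1, c2) for a proportion p = m / n using the normal approximation."""
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n and n > 0")
    p = m / n
    if n * p < 10:
        raise ValueError("normal approximation assumes n * p >= 10")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


if __name__ == "__main__":
    # Hypothetical counts: 200 occurrences out of 1000 recorded events.
    c1, c2 = proportion_interval(200, 1000)
    print(f"[{c1:.4f}, {c2:.4f}]")
```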

Confidence Intervals for Proportions: Determining the number of measurements needed.
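The formula for this slide was lost in extraction. A hedged reconstruction, assuming the interval half-width is required to equal a fraction $e$ of the proportion $p$ (the symbol $e$ is an assumption for this sketch):

```latex
% Assumed: e = allowed relative error on the proportion p.
\begin{align*}
e\,p &= z_{1-\alpha/2}\,\sqrt{\frac{p\,(1-p)}{n}} \\
n &= \frac{z_{1-\alpha/2}^{2}\; p\,(1-p)}{(e\,p)^{2}}
\end{align*}
```

Because $p$ is not known in advance, it would have to be estimated from a small preliminary set of measurements.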

Confidence Intervals Ex: Pg 55

Comparing Alternatives: Three different cases: before-and-after comparison; comparison of non-corresponding (unpaired) measurements; comparisons involving proportions.

Comparing Alternatives: Before-and-after comparison. Used to determine whether some change made to a system has a statistically significant impact on its performance. Find a confidence interval for the mean of the differences of the paired observations. If this interval includes 0, then the measured differences are not statistically significant.

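Comparing Alternatives: Before-and-after comparison. Before measurements: b1, …, bn. After measurements: a1, …, an. Differences: d1 = a1 - b1, d2 = a2 - b2, …, dn = an - bn. $\bar{d}$: arithmetic mean of the differences. $s_d$: standard deviation of the differences. For n ≥ 30: $(c_1, c_2) = \bar{d} \mp z_{1-\alpha/2}\, s_d/\sqrt{n}$.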
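A minimal sketch of the before-and-after test for n ≥ 30, assuming the interval is the mean difference plus or minus z times s_d / sqrt(n), where s_d is the standard deviation of the differences. The function names are hypothetical and the data are synthetic.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_difference_interval(before, after, confidence=0.95):
    """Confidence interval for the mean of d_i = a_i - b_i (assumes n >= 30)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 30:
        raise ValueError("normal approximation assumes n >= 30")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * stdev(diffs) / sqrt(n)
    d_bar = mean(diffs)
    return d_bar - half_width, d_bar + half_width


def is_significant(interval):
    """The change is statistically significant only if the interval excludes 0."""
    c1, c2 = interval
    return not (c1 <= 0.0 <= c2)


if __name__ == "__main__":
    # Synthetic timings for illustration only.
    before = [100.0 + (i % 7) for i in range(36)]
    after = [b - 5.0 + (i % 3) for i, b in enumerate(before)]
    ci = paired_difference_interval(before, after)
    print(ci, is_significant(ci))
```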

Comparing Alternatives: Before-and-after comparison. Ex: pg 65

Comparing Alternatives: Non-corresponding Measurements. There is no direct correspondence between pairs of measurements. First system: n1 measurements; find the mean $\bar{x}_1$ and standard deviation $s_1$. Second system: n2 measurements; find $\bar{x}_2$ and $s_2$. Calculate the difference of the means and the standard deviation of the difference of the means. If the confidence interval for the difference includes 0, then there is no significant difference.

Comparing Alternatives: Non-corresponding Measurements, n1 ≥ 30 and n2 ≥ 30.
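The formulas for this case did not survive extraction. The sketch below assumes the usual large-sample form: the difference of means plus or minus z times s_x, with s_x = sqrt(s1²/n1 + s2²/n2). The function name is hypothetical and the data are synthetic.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def unpaired_difference_interval(xs1, xs2, confidence=0.95):
    """Confidence interval for mean(xs1) - mean(xs2); assumes n1 >= 30 and n2 >= 30."""
    n1, n2 = len(xs1), len(xs2)
    if n1 < 30 or n2 < 30:
        raise ValueError("normal approximation assumes n1 >= 30 and n2 >= 30")
    diff = mean(xs1) - mean(xs2)
    # Standard deviation of the difference of the means.
    s_x = sqrt(variance(xs1) / n1 + variance(xs2) / n2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return diff - z * s_x, diff + z * s_x


if __name__ == "__main__":
    # Synthetic measurements for illustration only.
    system1 = [10.0 + (i % 4) for i in range(40)]
    system2 = [12.0 + (i % 4) for i in range(40)]
    c1, c2 = unpaired_difference_interval(system1, system2)
    print(f"[{c1:.3f}, {c2:.3f}]  includes 0: {c1 <= 0.0 <= c2}")
```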

Comparing Alternatives: Non-corresponding Measurements, n1 < 30 or n2 < 30.
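The slide's formulas are missing here as well. A common treatment of the small-sample case, given as an assumption about what the slide showed, replaces the normal quantile with a t quantile whose effective degrees of freedom $n_{df}$ come from the Welch-Satterthwaite approximation:

```latex
\begin{align*}
\bar{x} &= \bar{x}_1 - \bar{x}_2, \qquad
s_x = \sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}} \\
(c_1, c_2) &= \bar{x} \mp t_{1-\alpha/2;\,n_{df}}\; s_x \\
n_{df} &= \frac{\left(\dfrac{s_1^{2}}{n_1} + \dfrac{s_2^{2}}{n_2}\right)^{2}}
               {\dfrac{\left(s_1^{2}/n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^{2}/n_2\right)^{2}}{n_2 - 1}}
\end{align*}
```

$n_{df}$ is generally not an integer and is rounded to the nearest whole number before the table lookup.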

Comparing Alternatives: Non-corresponding Measurements. Ex: pg 67

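Comparing Alternatives: Comparing Proportions. m1 is the number of times the event occurs in system 1 out of a total of n1 events measured; m2 and n2 are defined likewise for system 2. If m1 > 10 and m2 > 10, then the proportions p1 = m1/n1 and p2 = m2/n2 approximate normal distributions with means p1 and p2 and variances p1(1 - p1)/n1 and p2(1 - p2)/n2.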

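Comparing Alternatives: Comparing Proportions. Confidence interval: $(c_1, c_2) = p \mp z_{1-\alpha/2}\, s_p$, where $p = p_1 - p_2$ and the standard deviation is $s_p = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$.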
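A minimal sketch of the proportion comparison, assuming the interval is (p1 - p2) plus or minus z times s_p, with s_p = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2). The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_difference_interval(m1, n1, m2, n2, confidence=0.90):
    """Confidence interval for p1 - p2, where p1 = m1 / n1 and p2 = m2 / n2."""
    if m1 <= 10 or m2 <= 10:
        raise ValueError("normal approximation assumes m1 > 10 and m2 > 10")
    p1, p2 = m1 / n1, m2 / n2
    # Standard deviation of the difference of the two proportions.
    s_p = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    diff = p1 - p2
    return diff - z * s_p, diff + z * s_p


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 events on system 1, 300 of 1000 on system 2.
    c1, c2 = proportion_difference_interval(100, 1000, 300, 1000)
    print(f"[{c1:.4f}, {c2:.4f}]  includes 0: {c1 <= 0.0 <= c2}")
```

As with the other comparisons, an interval that includes 0 means the difference between the two proportions is not statistically significant at the chosen confidence level.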

Comparing Alternatives: Comparing more than Two Alternatives.

Timers Roll Over: Suppose a timer returns 32-bit integer data and measures microseconds. It rolls over after 2^32 microseconds (≈ 1.2 hours). Timers that measure milliseconds and use 32-bit data roll over after 2^32 milliseconds (≈ 49.7 days). There is a trade-off between roll-over time and resolution.
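The roll-over figures can be checked with a few lines of arithmetic. The sketch also shows one common way to compute an elapsed interval that stays correct across a single wrap, using modular subtraction. The helper names are hypothetical.

```python
COUNTER_BITS = 32


def rollover_seconds(tick_seconds: float, bits: int = COUNTER_BITS) -> float:
    """Time until a counter with the given tick length wraps around."""
    return (1 << bits) * tick_seconds


def elapsed_ticks(start: int, end: int, bits: int = COUNTER_BITS) -> int:
    """Ticks between two readings; correct if the counter wrapped at most once."""
    return (end - start) % (1 << bits)


if __name__ == "__main__":
    print(f"microsecond timer: {rollover_seconds(1e-6) / 3600:.2f} hours")
    print(f"millisecond timer: {rollover_seconds(1e-3) / 86400:.2f} days")
    # Reading taken 10 ticks before the wrap, then 5 ticks after it.
    print(elapsed_ticks(2**32 - 10, 5))
```

Intervals longer than one full roll-over period cannot be recovered this way, which is why the tick length has to be chosen with the longest expected measurement in mind.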

Performance Evaluation: Steps: Measurement / Prediction. What to measure? How to measure? Modeling for prediction. Analysis & Reporting. Performance metrics.