Performance Evaluation II
November 10, 1998
15-213, class23.ppt
Topics: Amdahl's Law; Benchmarking (lying with numbers)



Amdahl's law

You plan to visit a friend in Normandy, France, and must decide whether it is worth it to take the Concorde SST ($3,100) or a 747 ($1,021) from NY to Paris, assuming it will take 4 hours from Pittsburgh to NY and 4 hours from Paris to Normandy.

          time NY->Paris   total trip time   speedup over 747
747       8.5 hours        16.5 hours        1
SST       3.75 hours       11.75 hours       1.4

Taking the SST (which is 2.2 times faster) speeds up the overall trip by only a factor of 1.4!

Amdahl's law (cont)

Old program (unenhanced): T = T1 + T2
  T1 = time that can NOT be enhanced
  T2 = time that can be enhanced

New program (enhanced): T' = T1' + T2'
  T1' = T1 (the unenhanceable part is unchanged)
  T2' <= T2 (time after the enhancement)

Speedup: S_overall = T / T'

Amdahl's law (cont)

Two key parameters:
  F_enhanced = T2 / T    (fraction of original time that can be improved)
  S_enhanced = T2 / T2'  (speedup of the enhanced part)

T' = T1' + T2'
   = T1 + T2'
   = T(1 - F_enhanced) + T2'
   = T(1 - F_enhanced) + T2 / S_enhanced               [by def of S_enhanced]
   = T(1 - F_enhanced) + T * F_enhanced / S_enhanced   [by def of F_enhanced]
   = T((1 - F_enhanced) + F_enhanced / S_enhanced)

Amdahl's Law:
  S_overall = T / T' = 1 / ((1 - F_enhanced) + F_enhanced / S_enhanced)

Key idea: Amdahl's law quantifies the general notion of diminishing returns. It applies to any activity, not just computer programs.
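As a sanity check, here is a minimal Python sketch of the formula. The function name is ours; the numbers plugged in come from the trip example above (8.5 of the 16.5 hours are improvable, and the SST speeds that leg up by 8.5/3.75 ~ 2.27x):

```python
def amdahl_speedup(f_enhanced, s_enhanced):
    """Overall speedup when a fraction f_enhanced of the original
    running time is sped up by a factor of s_enhanced."""
    return 1.0 / ((1.0 - f_enhanced) + f_enhanced / s_enhanced)

# Trip example: only the NY->Paris leg (8.5 of 16.5 hours) is enhanced.
print(amdahl_speedup(8.5 / 16.5, 8.5 / 3.75))   # ~1.40, matching the table
```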

Amdahl's law (cont)

Trip example: Suppose that for the New York to Paris leg, we now consider the possibility of taking a rocket ship (15 minutes) or a handy rip in the fabric of space-time (0 minutes):

          time NY->Paris   total trip time   speedup over 747
747       8.5 hours        16.5 hours        1
SST       3.75 hours       11.75 hours       1.4
rocket    0.25 hours       8.25 hours        2.0
rip       0.0 hours        8 hours           2.1

Amdahl's law (cont)

Useful corollary to Amdahl's law:

  1 <= S_overall <= 1 / (1 - F_enhanced)

No matter how large S_enhanced becomes, the overall speedup is capped by the fraction of time that cannot be improved: if F_enhanced = 0.5, S_overall can never exceed 2; if F_enhanced = 0.9, it can never exceed 10.

Moral: It is hard to speed up a program.
Moral++: It is easy to make premature optimizations.
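Continuing the sketch above, the corollary's upper bound is the limit of the formula as S_enhanced grows without bound; with the trip example's F_enhanced = 8.5/16.5:

```python
# Limit of the speedup as the enhanced part becomes infinitely fast:
# only the 8 hours of ground travel remain (the "rip" row above).
f = 8.5 / 16.5
print(1.0 / (1.0 - f))                    # ~2.06: the cap on S_overall
print(amdahl_speedup(f, float("inf")))    # same value, via the formula
```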

Characterizing computer performance

Computer buyers want a single number that predicts the performance of real applications. Computer makers have resisted measures that would allow meaningful direct comparisons:
  lack of operating system and language standards
  difficult to develop portable and realistic applications

1970s and 1980s: era of meaningless rates (e.g., MIPS)
1980s: age of meaningless benchmarks (e.g., Whetstone)
1990s: dawn of semi-realistic benchmarks (e.g., SPEC CPU95)

Meaningless rate #1: MHz

MHz = millions of clock cycles/sec. MHz doesn't predict running time:

  T (secs) = I (instructions) x C/I (cycles/instruction) x 1 / (MHz x 10^6 cycles/sec)

CPU           MHz   System        SPECfp95 time (secs)
Pentium Pro   180   Alder         6,440
POWER2         77   RS/6000 591   5,263

The 77 MHz POWER2 finishes the SPECfp95 suite faster than the 180 MHz Pentium Pro.
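The equation above is easy to play with in code. In this sketch the instruction counts and CPIs are made-up illustrative values (not measurements of the machines in the table), chosen to show how a lower CPI lets a slower clock win:

```python
# Execution time from the performance equation on this slide.
def exec_time_secs(instructions, cpi, mhz):
    return instructions * cpi / (mhz * 1e6)

# A 180 MHz machine with a higher CPI can lose to a 77 MHz machine:
print(exec_time_secs(1e9, 1.8, 180))   # 10.0 secs
print(exec_time_secs(1e9, 0.7, 77))    # ~9.1 secs
```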

Meaningless rate #2: peak MIPS

MIPS = millions of instructions / second. Peak MIPS = MIPS for some optimal instruction stream.

Peak MIPS doesn't predict running time:
  the number of instructions executed doesn't predict running time
  the optimal instruction stream can be meaningless

Example: If the instruction stream is a sequence of NOPs, then a 100 MHz Pentium is a 3,200 MIPS machine!
  The instruction decoder looks at 32 bytes at a time.
  NOP is a one-byte instruction.
  The decoder discards NOPs.
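The 3,200 MIPS figure is just the arithmetic of the three bullets above; a throwaway sketch:

```python
# 32 one-byte NOPs fit in the 32-byte decoder window, and all 32 are
# examined (and discarded) every cycle at 100e6 cycles/sec.
nops_per_cycle = 32
clock_hz = 100e6
peak_mips = nops_per_cycle * clock_hz / 1e6
print(peak_mips)   # 3200.0 "MIPS", yet no useful work is done
```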

Meaningless rate #3: peak MFLOPS

MFLOPS = millions of floating-point operations/sec. Peak MFLOPS = MFLOPS for some optimal instruction stream.

MFLOPS doesn't predict execution time:
  floating-point operation counts do not predict running time
  even if they did, the ideal instruction stream is usually unrealistic

Measured MFLOPS on Intel i860 (peak MFLOPS = 80):

Program   1d fft   sasum   saxpy   sdot   sgemm   sgemv   spvma
% peak    11%      4%      7%      13%    8%      19%     10%

(% peak = measured MFLOPS / 80.)

Benchmarking

Goal: Measure a set of programs (benchmarks) that represent the workload of real applications and that predict the running time of those applications.

Steps in the benchmarking process:
(1) Choose representative benchmark programs.
    - difficult to find realistic AND portable programs
(2) Choose an individual performance measure (for each benchmark).
    - time, normalized time, rate?
(3) Choose an aggregate performance measure (for all benchmarks).
    - sum, normalized sum, mean, normalized mean?

Why do benchmarking?

It is how we evaluate differences, both between different systems and between changes to a single system.

Benchmarks provide a target for system developers:
  benchmarks should represent a large class of important programs
  improving benchmark performance should then help many programs

For better or worse, benchmarks shape a field:
  good ones accelerate progress (a good target for development)
  bad ones hurt progress: do they help real programs, or just sell machines and papers? Inventions that help real programs may not help the benchmark.

"An ounce of honest data is worth more than a pound of marketing hype."

Benchmark examples

Toy benchmarks: small programs of a few lines
  e.g., sieve, puzzle, quicksort

Synthetic benchmarks: attempt to match the average operation frequencies of real workloads
  e.g., Whetstone, Dhrystone

Kernels: time-critical excerpts of REAL programs
  e.g., the 8x8 Discrete Cosine Transform (DCT) from JPEG and MPEG compression; the sparse matrix-vector product from unstructured finite element models

A successful benchmark suite: SPEC

1980s: RISC industry mired in "bench marketing": "Egads! That is an 8 MIPS machine, but they claim 10 MIPS!"

1988: EE Times + 5 companies (Sun, MIPS, HP, Apollo, DEC) band together to form the Systems Performance Evaluation Committee (SPEC).

SPEC created a standard list of programs, inputs, and reporting rules: some real programs, includes OS calls, some I/O.

SPEC now comprises more than 40 computer companies, including Compaq, Cray, DEC, HP, Hitachi, IBM, Intel, Motorola, Netscape, SGI, and Sun.

SPEC benchmarks

New incarnations are required every three years: SPEC89, SPEC92, SPEC95.
  droh's entry for SPEC98: quake
    ground motion modeling of an earthquake using unstructured finite elements
    irregular access pattern in the sparse matrix-vector product stresses the memory system
    still in the running (40 entries left)

Causes of benchmark obsolescence:
  increasing processor speed
  increasing cache sizes
  increasing application code size
  library code dependences
  aggressive benchmark engineering by vendors

SPEC95 integer benchmarks

benchmark   description
go          plays a game of go
m88ksim     Motorola 88k chip simulator
gcc         GNU C compiler
compress    in-memory LZW file compression
li          Lisp interpreter
jpeg        spectral-based image compression/decompression
perl        Perl program that manipulates strings and primes
vortex      database program

SPEC95 floating point benchmarks

benchmark   description
tomcatv     mesh generation program
swim        513x513 shallow water finite difference model
su2cor      Monte Carlo simulation
hydro2d     2D Navier-Stokes solver
mgrid       3D multigrid solver
applu       parabolic/elliptic PDE solver
turb3d      turbulence model
apsi        air pollution model
fpppp       quantum chemistry model
wave5       electromagnetic particle model

SPEC CPU performance measures

SPECfp = (NT_1 x NT_2 x ... x NT_n)^(1/n)

Each NT_k is a normalized time:
  NT_k = (reference time for benchmark k) / (measured time for benchmark k)
Reference times are measured on a SPARCstation 10/40 (40 MHz SuperSPARC with no L2 cache).

Problem: SPEC performance measures don't predict execution time!!! The slide's table compares two Pentium Pro systems at different clock rates: the one with the higher SPECfp95 rating has the longer total running time.
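Here is a minimal sketch of the SPECfp formula above. The function name and the times fed to it are made-up illustrative values, not real SPEC reference data:

```python
from math import prod

def specfp(ref_times, measured_times):
    """Geometric mean of normalized times, as defined on this slide."""
    norm = [r / m for r, m in zip(ref_times, measured_times)]
    return prod(norm) ** (1.0 / len(norm))

# Normalized times are 2.0, 5.0, and 1.0; their geometric mean is ~2.15.
print(specfp([100.0, 200.0, 400.0], [50.0, 40.0, 400.0]))
```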

Lying with means and ratios

Running times for two programs (each processing a given number of frames) on three systems:

          sys A     sys B     sys C
prog 1    20 secs   10 secs   40 secs
prog 2    40 secs   80 secs   20 secs
total     60 secs   90 secs   60 secs

Total running time is the ultimate performance measure.

Lying with means and ratios (cont)

seconds:
                   sys A   sys B   sys C
prog 1             20      10      40
prog 2             40      80      20
total              60      90      60
normalized to A    1.0     1.5     1.0
normalized to B    0.67    1.0     0.67
normalized to C    1.0     1.5     1.0

Normalized total running time is OK too: it tracks with total running time.

Lying with means and ratios (cont)

Arithmetic mean (AM) = (T_1 + T_2 + ... + T_n) / n

seconds:
                      sys A   sys B   sys C
prog 1                20      10      40
prog 2                40      80      20
total                 60      90      60
AM                    30      45      30
AM normalized to A    1.0     1.5     1.0
AM normalized to B    0.67    1.0     0.67
AM normalized to C    1.0     1.5     1.0

Normalized and unnormalized arithmetic means predict running time.

Lying with means and ratios (cont)

seconds (normalized to A) (normalized to B):
          sys A             sys B             sys C
prog 1    20 (1.0) (2.0)    10 (0.5) (1.0)    40 (2.0) (4.0)
prog 2    40 (1.0) (0.5)    80 (2.0) (1.0)    20 (0.5) (0.25)
total     60 (2.0) (2.5)    90 (2.5) (2.0)    60 (2.5) (4.25)
AM        30 (1.0) (1.25)   45 (1.25) (1.0)   30 (1.25) (2.13)

Sums of normalized times and arithmetic means of normalized times do NOT predict running time!!! (Normalized to A, sys B and sys C both total 2.5, even though B actually takes 90 seconds and C only 60.)

Lying with means and ratios (cont)

Geometric mean (GM) = (T_1 x T_2 x ... x T_n)^(1/n)

seconds (normalized to A) (normalized to B):
          sys A              sys B              sys C
prog 1    20 (1.0) (2.0)     10 (0.5) (1.0)     40 (2.0) (4.0)
prog 2    40 (1.0) (0.5)     80 (2.0) (1.0)     20 (0.5) (0.25)
total     60 (2.0) (2.5)     90 (2.5) (2.0)     60 (2.5) (4.25)
GM        28.3 (1.0) (1.0)   28.3 (1.0) (1.0)   28.3 (1.0) (1.0)

The geometric means are consistent (i.e., independent of the system they are normalized to), but they are consistently wrong!!! All three systems get the same GM, even though sys B is 50% slower overall. This is why the SPECfp95 numbers don't always predict running time.
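The consistency (and uselessness) of the GM is easy to demonstrate with the table's own numbers; a short sketch:

```python
from math import prod

times = {"A": [20, 40], "B": [10, 80], "C": [40, 20]}   # seconds, from the table

def gm(xs):
    return prod(xs) ** (1.0 / len(xs))

# The GM of normalized times is identical no matter which system we
# normalize to: every entry comes out 1.0 ...
for base in times:
    print(base, {s: gm([t / b for t, b in zip(ts, times[base])])
                 for s, ts in times.items()})
# ... yet the actual total times are 60, 90, and 60 seconds.
```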

Lying with means and ratios (cont)

The harmonic mean (HM) is a measure for rates (and ratios in general) that predicts running time.

Suppose the rate for each program k is W_k / T_k, where:
  W_k = work for program k
  T_k = running time for program k

Then HM = sum_{k=1..n}(W_k) / sum_{k=1..n}(T_k)
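A minimal sketch of this definition. The per-program frame counts here are made-up illustrative values (the example's actual frame counts aren't shown above); the times are sys A's from the table:

```python
def hm_rate(work, times):
    """Harmonic-mean-style aggregate rate: total work / total time."""
    return sum(work) / sum(times)

# sys A: prog 1 takes 20 secs, prog 2 takes 40 secs.
# Assume (illustratively) each program renders 1200 frames.
frames = [1200, 1200]
print(hm_rate(frames, [20, 40]))                 # 40 frames/sec overall
print(sum(frames) / hm_rate(frames, [20, 40]))   # 60 secs: recovers total time
```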

Lying with means and ratios (cont)

The slide tabulates frames/sec for prog 1 and prog 2 on systems A, B, and C, along with the AM, GM, and HM of those rates. Dividing total frames by the HM recovers each system's total running time (60, 90, and 60 seconds); the AM and GM of the rates do not correspond to the running times.

HM is the only measure for rates (and ratios in general) that predicts running time.

Alternate formulation of HM

If W_k = W_j = W for all k and j, and R_k = W / T_k, then

  HM = n / sum_{k=1..n}(1/R_k)

(Since each T_k = W/R_k, the general form gives HM = nW / sum(W/R_k) = n / sum(1/R_k).)
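A quick numerical check that the two formulations agree when every program does the same work; W and the times are arbitrary illustrative values:

```python
# Equal work W per program, so both HM formulas must give the same rate.
W = 1200.0
times = [20.0, 40.0]
rates = [W / t for t in times]

hm_general = (W * len(times)) / sum(times)            # total work / total time
hm_rates = len(rates) / sum(1.0 / r for r in rates)   # n / sum(1/R_k)
print(hm_general, hm_rates)                           # both 40.0
```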

Summary

1. Total running time is the true performance measure.
2. A performance metric should track total running time.
3. AM can be used to summarize performance expressed as an unnormalized time.
4. AM should NOT be used to summarize performance expressed as a ratio (i.e., a rate or normalized time).
5. GM should NOT be used to summarize performance expressed as either a time or a rate.
6. HM should be used to summarize any performance expressed as a ratio.
7. If you want to normalize, compute the aggregate measure first, then normalize.