EECE476: Computer Architecture Lecture 12: Evaluating Performance …and the Importance of Benchmarks! Chapter 4.3, 4.4, 4.5 There is more material in this.


EECE476: Computer Architecture
Lecture 12: Evaluating Performance …and the Importance of Benchmarks!
Chapter 4.3, 4.4, 4.5
There is more material in this lecture than what appears in the textbook. You must know the lecture material as well.
The University of British Columbia, EECE 476, © 2005 Guy Lemieux

2 Overview
Last week…
  – Define Execution Time, Performance
  – Performance Equation
Today…
  – Evaluating Performance: of one and many programs
  – Benchmarking: SPEC2000 standard suite, others
Tomorrow…
  – Summarizing Performance: distilling it all down to 1 magic number

3 Review: Two Fundamental Performance Concepts
1. Throughput (aka bandwidth)
  – Total amount of work done in a given time
      Boeing 747
      Laundromat with many washers & dryers
      Important for computer data centres
2. Response time (aka latency)
  – Time from start to end of a given task
      Concorde
      One fast, modern laundry machine at home
      Important for personal computers
Which is more important for this course?
  – Mostly response time!
  – Better response time usually implies higher throughput (but not the reverse)

4 Evaluating Performance... of one program! (aka latency)

5 MIPS and others…
MIPS and its relatives…
  – Relative MIPS (VAX 11/780 = 1 MIPS)
  – Dhrystone MIPS
  – GIPS
  – MFLOPS, GFLOPS, TFLOPS
  – MOPS, GOPS
What’s wrong with these?

6 Performance Equation (3)
Full version:
$$\mathrm{CPUTime} = \sum_i (\mathrm{InstrCount}_i \times \mathrm{CPI}_i) \times \mathrm{CycleTime}$$
  $\mathrm{InstrCount}_i$: count of instructions of type $i$
  $\mathrm{CPI}_i$: cycles per instruction of type $i$

7 Example
Same CPU with 3 different instruction types
CycleTime = 20ns
Program A, Program B
  InstrType i   CPI_i
  i=1           1
  i=2           2
  i=3           3
  Program   InstrCount_i
            i=1   i=2   i=3
  A         4     2     4
  B         8     2     2

8 Example (cont’d)
$$\mathrm{CPUTime} = \sum_i (\mathrm{InstrCount}_i \times \mathrm{CPI}_i) \times \mathrm{CycleTime}$$
Program A (total 10 instructions):
  = [ (4*1) + (2*2) + (4*3) ] * 20
  = [4+4+12]*20
  = 400 ns/program

9 Example (cont’d)
$$\mathrm{CPUTime} = \sum_i (\mathrm{InstrCount}_i \times \mathrm{CPI}_i) \times \mathrm{CycleTime}$$
Program B (total 12 instructions):
  = [ (8*1) + (2*2) + (2*3) ] * 20
  = [8+4+6]*20
  = 360 ns/program

10 Example (cont’d, final)
Program A (total 10 instructions): 400ns
Program B (total 12 instructions): 360ns
Program B is faster! (Intuitively, why should we expect this?)
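The arithmetic in this example can be checked with a short Python sketch. The instruction counts, CPI values, and 20 ns cycle time come from the example slides; the function and variable names are illustrative only. The sketch also prints the average CPI of each program, which is one way to approach the "why" question: B executes more instructions, but most of them are the cheap type-1 instructions.

```python
# Worked check of the slide example:
# CPUTime = sum_i(InstrCount_i * CPI_i) * CycleTime
CYCLE_TIME_NS = 20            # cycle time from the example slide
CPI = {1: 1, 2: 2, 3: 3}      # cycles per instruction, keyed by instruction type

# Instruction counts per type for each program (from the example table).
PROGRAMS = {
    "A": {1: 4, 2: 2, 3: 4},
    "B": {1: 8, 2: 2, 3: 2},
}


def total_cycles(instr_counts, cpi=CPI):
    """Sum InstrCount_i * CPI_i over all instruction types."""
    return sum(count * cpi[i] for i, count in instr_counts.items())


def cpu_time_ns(instr_counts, cpi=CPI, cycle_time_ns=CYCLE_TIME_NS):
    """CPU time of one program in nanoseconds."""
    return total_cycles(instr_counts, cpi) * cycle_time_ns


def average_cpi(instr_counts, cpi=CPI):
    """Total cycles divided by total instructions executed."""
    return total_cycles(instr_counts, cpi) / sum(instr_counts.values())


for name, counts in PROGRAMS.items():
    print(name, cpu_time_ns(counts), "ns, average CPI", average_cpi(counts))
```

Program A averages 2.0 cycles per instruction and program B averages 1.5, so B finishes sooner even though it runs 12 instructions instead of 10.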

11 Evaluating Performance … of many programs! First… choose the programs!

12 Benchmarks
You’re in engineering and you want to buy a computer…
… which one should you buy?
The fastest one of course!
But you can’t trust:
  – MHz/GHz
  – MIPS
  – Or even your friend!

13 Benchmarks
Important: Choose A Realistic Workload
Best solution: Try it before you buy it!
  – Run your program on the computer
  – Mix and match your most-frequently used programs
      Quake 3, MSN, Quartus (help!)
  – Called a workload
  – Measure the CPUTime (fast stopwatch?)
  – Use TOTAL CPUTime as your metric (?)
Problem: salesman doesn’t want you to try it!
  – Find a new salesman!
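As a rough illustration of "measure the CPUTime" and "use TOTAL CPUTime as your metric", the Python sketch below times each task in a small workload with `time.process_time()`, which reports CPU time consumed by the current process rather than wall-clock time. The two tasks are hypothetical stand-ins, not programs from the lecture; a real workload would use the programs you actually run.

```python
import time


def measure_workload(tasks):
    """Run each (name, callable) task once.

    Returns a dict of per-task CPU seconds and the total CPU seconds.
    """
    per_task = {}
    for name, run in tasks:
        start = time.process_time()   # CPU time of this process, not wall-clock
        run()
        per_task[name] = time.process_time() - start
    return per_task, sum(per_task.values())


# Hypothetical stand-in tasks; replace with your most frequently used programs.
def sum_of_squares():
    return sum(i * i for i in range(200_000))


def sort_descending():
    return sorted(range(200_000), reverse=True)


if __name__ == "__main__":
    workload = [
        ("sum_of_squares", sum_of_squares),
        ("sort_descending", sort_descending),
    ]
    per_task, total = measure_workload(workload)
    for name, seconds in per_task.items():
        print(f"{name}: {seconds:.4f} s CPU")
    print(f"TOTAL: {total:.4f} s CPU")
```

Note that `time.process_time()` does not include time spent in child processes, so this approach only fits tasks that run inside the measuring process.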

14 Benchmarks
Problem: your programs are not portable!
  – Different OS, different CPU architectures, …
  – Find a new OS? A new program?
  – Write a tiny version of your program to be portable
      Toy Benchmarks: Sieve of Eratosthenes, Puzzle, Quicksort
      Synthetic Benchmarks: Dhrystone (int), Whetstone (fp)
      Computational Kernels: Livermore Loops, Linpack
Problem: your program wasn’t tuned for this computer
  – Spend an eternity tuning it for each one?
  – Rely upon compiler?
Benchmarking is problematic!

15 Benchmarks
Compromise solution: Let somebody else try it!
  – They run their program
  – You trust them
  – Who? 3rd parties, eg: ZDnet? CNET? Tom’s Hardware?
  – Who? Manufacturer, eg: IBM? Intel? Dell?
  – Who do you trust?
SPEC: System Performance Evaluation Cooperative
  – Collect and distribute set of programs: a benchmark suite
  – Benchmark suite represents some typical workload (whose?)
  – Founded by industry (Apollo/HP, DEC, MIPS, and Sun)
  – Note: this is a bit like buying a car from an auto-mechanic… “of course, it runs just fine”

16 SPEC Benchmarks
  – Measure speed of a system
  – System = CPU + memory subsystem + compiler + OS
  – Improve any of these => improved performance
  – Valuable indicator of system performance!
SPEC Rules
  – Strict data reporting and collecting standards
  – Rules of gameplay (benchmarks can be abused!)
  – SPEC is the best we’ve got (so far…)!
  – Only possible in recent years due to portable software (C)
SPEC Periodically Updates Benchmarks…
  – 1989, 1992, 1995, 2000, 2004/2005/2006?
  – Eventually computers get too fast!
  – Or nature of the workload changes!
  – Or compilers get too smart!

17 Benchmarks – Compiler Result!

18 SPEC Benchmark Evolution
SPEC89
  – Originally called “SPEC”
  – 4 integer programs, 6 floating-point programs
  – One number: geometric mean of speedup relative to VAX 11/780
  – Represents a scientific workload (note: fp bias)
SPEC92
  – 6 integer, 14 floating-point (int, fp results are always separated)
  – Eliminates matrix300 from SPEC89
  – Called CINT92, CFP92 or SPECint92, SPECfp92
  – Each number: geometric mean of speedup relative to VAX 11/780
SPEC95
  – 8 integer, 10 floating-point
  – Two numbers: SPECint95, SPECfp95, relative to Sun 10/40
SPEC history

19 Modern SPEC Lots of workloads!

20 SPEC CPU2000
Two benchmark sets
  – 12 Integer, 14 Floating-Point
Two measurement conditions
  – Speed (“response time” or latency)
      SPECint2000, SPECfp2000
  – Throughput
      SPECint_rate2000, SPECfp_rate2000
Why throughput numbers?
  – Computers with multiple CPUs (or multiple cores)
  – Computers with “virtual multiple” CPUs (eg, hyperthreading)
How to measure throughput?
  – Run N copies of the benchmark, measure completion time
  – Convert execution time into a rate
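The last two bullets (run N copies, then convert the completion time into a rate) can be sketched as a small Python function. This is a simplified illustration under stated assumptions, not the official SPEC rate formula: it assumes a hypothetical reference time per copy and reports how many reference machines' worth of work the system completes per unit time.

```python
def throughput_rate(copies, reference_seconds, elapsed_seconds):
    """Normalized work completed per unit time when `copies` run concurrently.

    copies            -- number of benchmark copies started together (N)
    reference_seconds -- assumed time for one copy on a reference machine
    elapsed_seconds   -- time until all N copies have finished

    Simplified for illustration; this is not the official SPEC rate formula.
    """
    if copies < 1 or elapsed_seconds <= 0:
        raise ValueError("need at least one copy and a positive elapsed time")
    return copies * reference_seconds / elapsed_seconds


# Hypothetical numbers: one copy alone takes 100 s,
# four concurrent copies all finish within 125 s.
print(throughput_rate(1, 1000, 100))   # single copy
print(throughput_rate(4, 1000, 125))   # four copies on a multi-CPU system
```

With these made-up numbers, each copy runs slower when four run together (125 s instead of 100 s), yet the rate rises from 10.0 to 32.0, which is the distinction between response time and throughput.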

21 SPEC CPU2000 Benchmarks

22 SPECint2000 Results

23 SPEC CPU2000: Base vs Peak
“Base” results
  – same compiler flags used for all programs, “typical user”
“Peak” results
  – choose best compiler flags for each program, “power user”
Base, Peak numbers are normalized “percentage” results
  – Base Machine: Sun ULTRA5-10, 300MHz
  – Each program “problem size” scaled once
      Runtime ~1000s-3000s on Base Machine
      Takes ~40hrs to run full suite! (3 passes, CINT + CFP)
  – Base machine Performance defined to be “100%”

GHz Pentium 4
Base Ratio = 100*(1300/74.2) = 1752
Peak Ratio = 100*(1300/62.2) = 2090
Geometric Mean
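The ratio arithmetic on this slide, and the geometric mean used to combine per-program ratios, can be sketched in Python as follows. Reading 1300 as the Base Machine time and 74.2 / 62.2 as the measured times in seconds is an assumption inferred from the slide's arithmetic; the function names and the second list of ratios are hypothetical.

```python
import math


def spec_ratio(reference_seconds, measured_seconds):
    """Normalized result where the reference machine scores 100."""
    return 100.0 * reference_seconds / measured_seconds


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logs."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Numbers from the slide (assumed to be seconds).
print(round(spec_ratio(1300, 74.2)))   # base ratio
print(round(spec_ratio(1300, 62.2)))   # peak ratio

# Hypothetical per-program ratios, combined into one suite score.
print(geometric_mean([100.0, 400.0]))
```

The geometric mean of the two made-up ratios is about 200, noticeably lower than their arithmetic mean of 250, because a single large ratio pulls a geometric mean up less.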

25 SPEC CPU2000 Measurement
SPECint2000base score of 1510
  – Means “15.10 times faster” than Base Machine
  – This is an average result (how was average computed?)
Fair measurement requirement
  – Run each program an ODD number of times, report the median execution time
  – Must not create special compiler flags, eg “-spec_cpu2000”
Reporting requirements
  – Vendor supplies results, SPEC accepts and publishes them
  – Must report which OS, compiler version, all compiler flags used
  – Must report complete system configuration
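The "odd number of runs, report the median" rule can be sketched in Python. With an odd run count the median is always one of the times actually measured, and a single unusually slow or fast run does not shift it. The helper name and the stand-in benchmark are hypothetical, and this sketch uses wall-clock timing via `time.perf_counter()` rather than SPEC's own tooling.

```python
import statistics
import time


def median_run_time(benchmark, runs=3):
    """Run `benchmark` an odd number of times.

    Returns (median_seconds, all_times). The median of an odd-length
    list is the middle element after sorting, so it is a real measurement.
    """
    if runs < 1 or runs % 2 == 0:
        raise ValueError("runs must be a positive odd number")
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        benchmark()
        times.append(time.perf_counter() - start)
    return statistics.median(times), times


# Hypothetical stand-in for a benchmark program.
def toy_benchmark():
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    median_seconds, all_times = median_run_time(toy_benchmark, runs=3)
    print("runs:", [f"{t:.4f}" for t in all_times])
    print(f"median: {median_seconds:.4f} s")
```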

26 Top 20 Computer Systems According to SPEC CPU2000 Data Source: (Sept 27, 2005)

27 Top 20 Computer Systems Source: (Oct, 2004) (Sept 27, 2005) What’s Changed?

28 Coming Soon: New SPEC aka “CPU2005” or “CPU2006”

29 Top 500 Supercomputers
Supercomputing
  – 1,000s of processors
  – Specialized programs
      Nuclear simulations, weather prediction
  – Usually: floating-point operations on dense matrices
Linpack performance
LINPACK GFLOPS !!

30 Other Benchmarks
Banking/database transactions
  – Transaction Processing Performance Council
Embedded CPUs
  – Embedded Microprocessor Benchmark Consortium
Multimedia, Network processors
  – MediaBench, NetBench (broken link?)
Supercomputing (Linpack)
  – Equation solving of dense matrices
Reconfigurable computing
  – RAW
  – VersaBench
Some toy benchmarks or misc benchmarks…

31 Summary: Evaluating Performance
Performance of one program
  – Execution time, performance equation
  – Instruction counts, CPI, clock cycle time
Performance of many programs
  – First, choose the benchmark program(s)
      This is the most critical step !!!
  – Second, compute execution time for each task/program
      SPEC has strict rules about fair play…
  – Third, summarize performance
      SPEC computes geometric average of normalized performance
We’ll investigate the rationale behind these rules tomorrow…