Topic IV - Cont’d Performance Measurement

Topic IV - Cont’d Performance Measurement Introduction to Computer Systems Engineering (CPEG 323) 11/21/2018 CPEG323-05F\Topic4a

Relative MIPS

$$\text{Relative MIPS} = \frac{\text{Time}_{\text{reference}}}{\text{Time}_{\text{unrated}}} \times \text{MIPS}_{\text{reference}}$$

where
- $\text{Time}_{\text{reference}}$ = execution time of a program on the reference machine
- $\text{Time}_{\text{unrated}}$ = execution time of the same program on the machine to be rated
- $\text{MIPS}_{\text{reference}}$ = agreed-upon MIPS rating of the reference machine
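As a minimal sketch of the relative MIPS definition, the function below applies the formula directly. The function name and the timings are hypothetical, chosen only for illustration.

```python
def relative_mips(time_reference: float, time_unrated: float,
                  mips_reference: float) -> float:
    """Relative MIPS = (Time_reference / Time_unrated) * MIPS_reference."""
    if time_unrated <= 0:
        raise ValueError("time_unrated must be positive")
    return (time_reference / time_unrated) * mips_reference


# Hypothetical numbers: the reference machine is rated at 1 MIPS and needs
# 100 s for the program; the machine being rated needs 25 s for the same run.
rating = relative_mips(time_reference=100.0, time_unrated=25.0,
                       mips_reference=1.0)
print(rating)  # 4.0
```

The result is only meaningful for the program and input that were timed, which is the limitation the following slides discuss.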

Relative MIPS Cont’d
- Relative MIPS only tracks execution time for the given program and input.
- Even when the program and input are identified, it becomes harder to find a reference machine as that machine ages.

Relative MIPS Cont’d
The question also arises whether the older machine should be run with the newest release of the compiler and operating system, or whether the software should be fixed so that the reference machine does not get faster over time.

Relative MIPS Cont’d
In summary, the advantage of relative MIPS is questionable:
- Which program and input?
- Which reference machine to use? (It changes over time.)
- Which compiler and/or OS to use? (They change over time.)
- Which benchmark to use?
Patterson: the advantage is small.

Peak Rates vs. Sustained Rates
The peak rate is the rate that can be attained if every resource involved in the measurement is used at its maximum rate. For example, suppose a 1 GHz processor can do 1 floating-point addition and 1 floating-point multiplication each clock cycle. We can then say this processor has a peak rate of 2 GFLOPS. That is the theory: can we get this in practice?
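The peak-rate arithmetic in this example can be sketched as follows. The helper name is hypothetical; the numbers are the ones from the slide (1 GHz, one add plus one multiply per cycle).

```python
def peak_flops(clock_hz: float, flops_per_cycle: int) -> float:
    """Peak rate: every cycle issues the maximum number of FP operations."""
    return clock_hz * flops_per_cycle


# 1 GHz clock, 1 FP add + 1 FP multiply per cycle = 2 operations per cycle.
peak = peak_flops(clock_hz=1e9, flops_per_cycle=2)
print(peak / 1e9, "GFLOPS")  # 2.0 GFLOPS
```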

Limitations on Peak Rates
- Other resources may not be able to keep up.
- Your program may not be able to use the resources in the manner needed to get the peak performance.
  - Example: your program only does floating-point additions, not multiplications.
- The peak rate may not be physically sustainable.

Peak Rate Example
The i860 (1991, Intel) was advertised as having an 80 MFLOPS peak rate (1 FP add and 1 FP multiply per cycle of a 40 MHz clock). However, when compiling and running various linear algebra programs, experimenters found the actual rate ranged from 15.0 MFLOPS (19% of peak) down to 3.2 MFLOPS (4% of peak)! What’s wrong?
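The percentages quoted for the i860 can be checked with a few lines of arithmetic. This is only a sketch; the function and constant names are made up for the example, and the inputs are the figures given on the slide.

```python
# 40 MHz clock, 1 FP add + 1 FP multiply per cycle -> 80 MFLOPS peak.
PEAK_MFLOPS = 40 * 2


def percent_of_peak(sustained_mflops: float,
                    peak_mflops: float = PEAK_MFLOPS) -> float:
    """Sustained rate expressed as a percentage of the peak rate."""
    return 100.0 * sustained_mflops / peak_mflops


for measured in (15.0, 3.2):
    print(f"{measured} MFLOPS is {percent_of_peak(measured):.0f}% of peak")
# 15.0 MFLOPS is 19% of peak
# 3.2 MFLOPS is 4% of peak
```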

Benchmarks
- Real programs: C, TeX, Spice
- Kernels: Livermore Loops, LINPACK; best for isolating the performance of individual features of the machine
- Toy benchmarks: 10 to 100 lines; Sieve of Eratosthenes, Puzzle, Quicksort, N-Queens
- Synthetic benchmarks: try to match the average frequency of operations in a large set of programs

More Benchmarks
- Dhrystone [Weicker84]
- Whetstone [Curnow & Wichmann76]
  - University computer center jobs
  - 12 loops
- SPEC benchmarks: SPEC89, SPEC92, SPEC95, SPEC2000

More Benchmarks
- MediaBench
- CommBench

Small Benchmarks and Kernels
- Early benchmarks used “toy problems” (quicksort, Towers of Hanoi).
- Other benchmarks took small fragments of code from inside application loops. One early example was the Livermore Loops (21 code fragments).

Small Benchmarks: Pluses and Minuses
+ Drawn from real applications, so they seem realistic
+ Easy to understand and analyze
+ Highly portable (can even be converted to other languages)
+ Emphasize the “make the common case fast” principle
- Still too much like MIPS if your app is not like theirs
- Not representative if your app is complex and has many different parts

Synthetic Benchmarks
A synthetic benchmark attempts to exercise the hardware in a manner that mimics real-world applications, but in a small piece of code. Examples: Whetstone and Dhrystone. Each repeatedly executes a loop that performs a varied mix of instructions and uses memory in various ways; the figure of merit is how many “Whetstones” or “Dhrystones” per second your computer can do.
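To make the idea concrete, here is a toy sketch in the spirit of a synthetic benchmark. It is not Whetstone or Dhrystone: the loop body, the function names, and the iteration count are all invented for illustration, and in Python the score mostly reflects the interpreter rather than the hardware.

```python
import time


def toy_synthetic_loop(iterations: int) -> float:
    """Run a small loop with a mix of operations and return a checksum."""
    data = list(range(64))
    acc = 0.0
    for i in range(iterations):
        acc += (i % 7) * 1.5          # integer and floating-point arithmetic
        data[i % 64] = i ^ (i >> 3)   # memory write with integer logic
        if data[(i * 5) % 64] & 1:    # memory read and data-dependent branch
            acc -= 0.25
    return acc


def loops_per_second(iterations: int = 200_000) -> float:
    """Figure of merit: loop iterations completed per second."""
    start = time.perf_counter()
    toy_synthetic_loop(iterations)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return iterations / elapsed


print(f"{loops_per_second():.0f} toy loops per second")
```

A single number like this is easy to report and compare, which is exactly why it is also easy to over-interpret.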

Synthetic Benchmarks: Pluses and Minuses
+ Seem to be more realistic than kernels
+ Still easy to understand and analyze
+ Still highly portable
- Reliance on a single benchmark skews perceptions
- Nobody can agree on a single benchmark
- Easy to abuse: designers focus on improving that benchmark instead of real apps

Application Benchmarks
OK, we admit it; you can’t capture real-world complexity in a few dozen (or hundred) lines of C code! So use some real programs instead. If you’re going to buy a machine, you’re best off trying the apps you will use. But you may not always be able to do this.

Using Real Applications for Benchmarks
- LINPACK (Linear Algebra Package) is used to rank the world’s 500 fastest computers (www.top500.org).
- Tom’s Hardware (www.tomshardware.com) and some other hardware reviewers run a game (such as Quake) and measure the FPS (frames per second).

Application Benchmarks: Pluses
+ Closer to applications, so more realistic
+ Better at exposing weaknesses and performance bottlenecks in systems

Application Benchmarks: Minuses
- Harder to compare different machines unless you use a common standard (e.g., ANSI C)
- Difficult to determine why a particular program runs fast or slow, due to complexity
- Whose benchmark? (You can always find one benchmark that makes your product look best.)
- Takes too long to simulate (a big issue for researchers)

Benchmark Suites - Objectives
- Run a bunch of programs and combine the results.
- Get everyone to agree to use the same benchmarks.
- Lay down common ground rules.
- Develop a method for reporting and disseminating results.
- Put caveats everywhere and try to educate the people using the results.
- May be targeted toward application domains (e.g., web servers, transaction processing, multimedia, HPC).

The SPEC Benchmarks
- SPEC (System Performance Evaluation Cooperative) was formed to write “standard” benchmark suites with industry acceptance.
- Main releases: SPEC89, SPEC92, SPEC95, SPEC2000
- Divided into integer and FP-intensive apps, e.g., SPECfp95, CINT2000, CFP2000, etc.
- Recent domain-specific suites (e.g., SPEC HPC2002, SPECweb99)
- On ECE/CIS machines, type “sysinfo hostname” to see scores.

SPEC RATIO: Measuring Latency
Results for each individual benchmark of the SPEC benchmark suites are expressed as the ratio of a fixed "SPEC reference time" to the wall-clock time needed to execute a single copy of the benchmark. The reference time was chosen as the execution time on a Sun Ultra 5_10 with a 300 MHz processor. From: P&H, Third Ed., p. 259
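A minimal sketch of the ratio, assuming the usual convention that the reference time goes in the numerator so that a larger ratio means a faster machine. The function name and both timings are hypothetical, not published SPEC data.

```python
def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio of the fixed reference time to the measured wall-clock time."""
    if measured_time_s <= 0:
        raise ValueError("measured_time_s must be positive")
    return reference_time_s / measured_time_s


# Hypothetical benchmark: 1400 s on the reference machine, 350 s on the
# machine under test, so the machine under test is 4 times faster.
print(spec_ratio(reference_time_s=1400.0, measured_time_s=350.0))  # 4.0
```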

SPEC RATE: Measuring Throughput
Several copies of a given SPEC benchmark are executed. The method is particularly suitable for multiprocessor systems. The results, called SPEC rate, express how many jobs of a particular type (characterised by the individual benchmark) can be executed in a given time. (The SPEC reference time happens to be a week; the execution times are normalized with respect to a VAX 11/780.) From: http://www.hyperdictionary.com/

SPEC Ground Rules and Reproducibility
- Everyone uses the same code; modifying the code is not allowed.
- Also use the same data inputs!
- You must describe the configuration, including the compiler.
- If you report numbers, the system must be commercially available.
- Everything is compiled the same way with “standard” optimizations.
  - A separate score is allowed for program-specific tuning.

[Figure: The SPEC CINT2000 and CFP2000 ratings for the Intel Pentium III and Pentium 4 processors at different clock speeds.] (Note: this chart is for the “base case”. For more detail, see www.spec.org.)

SPEC Examples (Integer Benchmarks)
[Figure: SPECint (y-axis, 1 to 10) for the Pentium and Pentium Pro, plotted against x-axis values from 50 to 250.] (From Patterson and Hennessy, p. 73. Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved.)

SPEC Examples (FP Benchmarks)
[Figure: SPECfp (y-axis, 1 to 10) for the Pentium and Pentium Pro, plotted against x-axis values from 50 to 250.] (From Patterson and Hennessy, p. 74. Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved.)

Tuning for SPEC
[Figure: SPEC performance ratio (y-axis, 100 to 800) for the benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv, comparing the compiler with the enhanced compiler.] (From Patterson and Hennessy, p. 68. Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved.)

Summary of Performance Measurement
- Latency: how long does it take to get a particular task done?
- Throughput: how many tasks can you perform in a unit of time?

$$\text{Performance} = \frac{1}{\text{Execution time}}$$

- Execution time (wall-clock time) is made up of user time, system time, and other time.
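The split of wall-clock time into user, system, and other time can be observed with the Python standard library. This is a sketch under assumptions: the helper name `measure` is made up, and "other" is computed here simply as whatever wall-clock time is not accounted for by user or system CPU time (for example, waiting on I/O or on other processes).

```python
import os
import time


def measure(func, *args):
    """Run func(*args) and return (result, timing breakdown in seconds)."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    result = func(*args)
    cpu_end = os.times()
    wall = time.perf_counter() - wall_start
    user = cpu_end.user - cpu_start.user
    system = cpu_end.system - cpu_start.system
    # Clamp at zero: timer granularity or multiple threads can make
    # user + system slightly exceed the wall-clock time.
    other = max(wall - user - system, 0.0)
    return result, {"wall": wall, "user": user,
                    "system": system, "other": other}


result, timing = measure(sum, range(1_000_000))
print(result, timing)
```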

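Summary of Performance Measurement (Cont’d)
- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
- CPI: cycles per instruction; smaller is better
- IPC: instructions per cycle; bigger is better

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

Weighted CPI, where $I_i$ is the number of instructions of class $i$ and $\text{CPI}_i$ is the CPI of that class:

$$\text{CPU time} = \frac{\sum_{i=1}^{n} \left(\text{CPI}_i \times I_i\right)}{\text{Clock rate}}$$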
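A short sketch of the weighted-CPI form of the CPU time equation. The instruction mix and the 200 MHz clock rate are hypothetical values picked to keep the arithmetic easy to follow.

```python
def cpu_time(instruction_classes, clock_rate_hz: float) -> float:
    """CPU time = sum(CPI_i * I_i) / clock rate.

    instruction_classes is a list of (CPI_i, I_i) pairs.
    """
    total_cycles = sum(cpi * count for cpi, count in instruction_classes)
    return total_cycles / clock_rate_hz


# Hypothetical mix: (CPI, instruction count) for three instruction classes.
mix = [(1, 50_000_000), (2, 30_000_000), (4, 20_000_000)]

total_instructions = sum(count for _, count in mix)
total_cycles = sum(cpi * count for cpi, count in mix)

print(total_cycles / total_instructions)   # average CPI: 1.9
print(cpu_time(mix, clock_rate_hz=200e6))  # 0.95 (seconds)
```

The average CPI times the total instruction count gives the same cycle total, so both forms of the equation agree.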

Summary of Performance Measurement (Cont’d)
- MIPS (Millions of Instructions Per Second)
- MOPS (Millions of Operations Per Second)
- MFLOPS (Millions of Floating-point Operations Per Second)

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

- Benchmarks: SPEC ratio and rate
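The two forms of the MIPS formula give the same number when the execution time is itself derived from the instruction count, CPI, and clock rate. The sketch below checks this with hypothetical values; the function names are illustrative only.

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical program: 100 million instructions, CPI of 2, 200 MHz clock.
instruction_count = 100e6
cpi = 2.0
clock_rate_hz = 200e6
execution_time_s = instruction_count * cpi / clock_rate_hz  # 1.0 second

print(mips_from_time(instruction_count, execution_time_s))  # 100.0
print(mips_from_cpi(clock_rate_hz, cpi))                    # 100.0
```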