1/17/2016 CPEG323-08F\Topic4a
Topic IV - Cont'd: Performance Measurement
Introduction to Computer Systems Engineering (CPEG 323)

Relative MIPS

Relative MIPS = (Time_reference / Time_unrated) × MIPS_reference

where
Time_reference = execution time of a program on the reference machine
Time_unrated = execution time of the same program on the machine to be rated
MIPS_reference = agreed-upon MIPS rating of the reference machine
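The definition above can be sketched numerically. This is a minimal illustration; the machine and the timings are made up, not taken from the slides:

```python
def relative_mips(time_reference, time_unrated, mips_reference):
    """Relative MIPS = (Time_reference / Time_unrated) * MIPS_reference."""
    return (time_reference / time_unrated) * mips_reference

# Hypothetical example: the reference machine is rated at 1 MIPS and runs
# the program in 10 s; the machine being rated runs the same program in 2 s.
print(relative_mips(10.0, 2.0, 1.0))  # -> 5.0 relative MIPS
```

A machine that runs the program five times faster than a 1-MIPS reference machine is thus rated at 5 relative MIPS.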

Relative MIPS Cont'd

Relative MIPS tracks execution time only for the given program and input. Even when program and input are identified, it becomes harder over time to find the reference machine.

Relative MIPS Cont'd

The question also arises whether the older machine should be run with the newest release of the compiler and operating system, or whether the software should be fixed so that the reference machine does not get faster over time.

Relative MIPS Cont'd

In summary, the advantage of relative MIPS is questionable:
- Which program and input?
- Which reference machine to use? (times change)
- Which compiler and/or OS to use? (times change)
- Which benchmark to use?
Patterson: the advantage is small.

Peak Rates vs. Sustained Rates

The peak rate is the rate that can be attained if every resource involved in the measurement is used at its maximum rate. For example, a 1 GHz processor that can do 1 floating-point addition and 1 floating-point multiplication each clock cycle has a peak rate of 2 GFLOPS. That is the theory: can we get this in practice?
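The 2 GFLOPS figure follows directly from the definition; a minimal sketch:

```python
def peak_flops(clock_hz, flops_per_cycle):
    """Peak rate: every resource used at its maximum rate, every cycle."""
    return clock_hz * flops_per_cycle

# 1 GHz clock; 1 FP add + 1 FP multiply per cycle = 2 FLOPs/cycle.
print(peak_flops(1e9, 2) / 1e9)  # -> 2.0 GFLOPS
```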

Limitations on Peak Rates
- Other resources may not be able to keep up
- Your program may not be able to use the resources in the manner needed to reach peak performance (example: your program only does floating-point +, not ×)
- The peak rate may not be physically sustainable

Peak Rate Example

The i860 (Intel, 1991) was advertised as having an 80 MFLOPS peak rate (1 FP add and 1 FP multiply per cycle of a 40 MHz clock). However, when compiling and running various linear algebra programs, experimenters found the actual rate ranged from 15.0 MFLOPS (19% of peak) down to 3.2 MFLOPS (4% of peak)! What went wrong? Try the same experiment on a new machine today.
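The percent-of-peak figures quoted above can be checked directly from the slide's numbers:

```python
def percent_of_peak(measured, peak):
    """Fraction (in percent) of the advertised peak rate actually achieved."""
    return 100.0 * measured / peak

# i860: 80 MFLOPS advertised peak; measured rates of 15.0 and 3.2 MFLOPS.
print(round(percent_of_peak(15.0, 80.0)))  # -> 19 (% of peak)
print(round(percent_of_peak(3.2, 80.0)))   # -> 4  (% of peak)
```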

Benchmarks
- Real programs: a C compiler, TeX, Spice
- Kernels: Livermore Loops, LINPACK — best for isolating the performance of individual features of the machine
- Toy benchmarks (10–100 lines): Sieve of Eratosthenes, Puzzle, Quicksort, N-Queens
- Synthetic benchmarks: try to match the average frequency of operations of a large set of programs

More Benchmarks
- Dhrystone [Weicker84]
- Whetstone [Curnow & Wichmann76]: university computer center jobs, 12 loops
- SPEC benchmarks: SPEC 89, SPEC 92, SPEC 95, SPEC 2000, SPEC 2005

More Benchmarks
- MediaBench
- CommBench

Small Benchmarks and Kernels

Early benchmarks used "toy problems" (quicksort, Towers of Hanoi). Other benchmarks took small fragments of code from inside application loops. One early example was the Livermore Loops (21 code fragments).

Small Benchmarks: Pluses and Minuses
+ Drawn from real applications, so they seem realistic
+ Easy to understand and analyze
+ Highly portable (can even be converted to other languages)
+ Emphasize the "make the common case fast" principle
- Still too much like MIPS if your application is not like theirs
- Not representative if your application is complex and has many different parts

Synthetic Benchmarks

A synthetic benchmark attempts to exercise the hardware in a manner that mimics real-world applications, but in a small piece of code. Examples: Whetstone, Dhrystone. Each repeatedly executes a loop that performs a varied mix of instructions and uses memory in various ways; the figure of merit is how many "Whetstones" or "Dhrystones" per second your computer can do.
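A toy loop in the Whetstone/Dhrystone spirit, where the figure of merit is loop iterations per second. This is only an illustrative sketch of the idea, not the real benchmark code:

```python
import time

def toy_synthetic_benchmark(iterations=100_000):
    """Run a loop with a mixed workload; report iterations per second."""
    data = list(range(64))           # exercise memory a little
    acc = 0.0
    start = time.perf_counter()
    for i in range(iterations):
        acc += data[i % 64] * 1.5    # floating-point multiply-add
        data[i % 64] = (i * 7) % 97  # integer ops plus a memory write
    elapsed = time.perf_counter() - start
    return iterations / elapsed      # the "Dhrystones per second"-style score

print(f"{toy_synthetic_benchmark():.0f} iterations/second")
```

The score is a single number, which is exactly why such benchmarks are both attractive and easy to abuse, as the next slide notes.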

Synthetic Benchmarks: Pluses and Minuses
+ Seem more realistic than kernels
+ Still easy to understand and analyze
+ Still highly portable
- Reliance on a single benchmark skews perceptions
- Nobody can agree on a single benchmark
- Easy to abuse: designers focus on improving that benchmark instead of real applications

Application Benchmarks

OK, we admit it: you can't capture real-world complexity in a few dozen (or a few hundred) lines of C code! So use some real programs instead. If you're going to buy a machine, you're best off trying the applications you will use, but you may not always be able to do this.

Using Real Applications as Benchmarks
- LINPACK (the linear algebra package) is used to rank the world's 500 fastest computers ( )
- Tom's Hardware ( ) and some other hardware reviewers run a game (such as Quake) and measure the FPS (frames per second)

Application Benchmarks: Pluses
+ Closer to real applications, so more realistic
+ Better at exposing weaknesses and performance bottlenecks in systems

Application Benchmarks: Minuses
- Harder to compare different machines unless you use a common standard (e.g., ANSI C)
- Difficult to determine why a particular program runs fast or slow, due to complexity
- Whose benchmark? (You can always find one benchmark that makes your product look best)
- Takes too long to simulate (a big issue for researchers)

Benchmark Suites - Objectives

Run a bunch of programs and combine the results:
- Get everyone to agree to use the same benchmarks
- Lay down common ground rules
- Develop a method for reporting and disseminating results
- Put caveats everywhere and try to educate the people using the results
- May be targeted toward application domains (e.g., web servers, transaction processing, multimedia, HPC)

The SPEC Benchmarks
- SPEC (Standard Performance Evaluation Corporation) was formed to write "standard" benchmark suites with industry acceptance
- Main releases: SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2005
- Divided into integer- and FP-intensive applications, e.g., SPECfp95, CINT2000, CFP2000, etc.
- Recent domain-specific suites (e.g., SPEC HPC2002, SPECweb99)
- On ECE/CIS machines, type "sysinfo hostname" to see scores

SPEC RATIO: Measuring Latency

The result for each individual benchmark of a SPEC suite is expressed as the ratio of a fixed "SPEC reference time" to the wall clock time needed to execute one single copy of the benchmark. The reference times were chosen as the execution times on a Sun Ultra 5/10 with a 300 MHz processor. From P&H, Third Ed., p. 259.
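As a formula, SPECratio = reference time / measured time, so bigger is better; SPEC summarizes a whole suite as the geometric mean of the individual ratios. A sketch with made-up timings:

```python
def spec_ratio(reference_time, measured_time):
    """SPECratio: the reference machine's time divided by the measured time."""
    return reference_time / measured_time

def geometric_mean(ratios):
    """SPEC summarizes a suite as the geometric mean of the ratios."""
    product = 1.0
    for r in ratios:
        product *= r
    return product ** (1.0 / len(ratios))

# Hypothetical suite: three benchmarks with reference times 100, 200, 50 s
# that run in 25, 50, 25 s on the machine under test.
ratios = [spec_ratio(100, 25), spec_ratio(200, 50), spec_ratio(50, 25)]
print(round(geometric_mean(ratios), 2))  # -> 3.17
```

The geometric mean is used because it is independent of which machine is chosen as the reference.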

SPEC RATE: Measuring Throughput

Several copies of a given SPEC benchmark are executed; this method is particularly suitable for multiprocessor systems. The results, called SPEC rate, express how many jobs of a particular type (characterized by the individual benchmark) can be executed in a given time. (The SPEC reference time happens to be a week, and the execution times are normalized with respect to a VAX 11/780.) From:

SPEC Ground Rules and Reproducibility
- Everyone uses the same code; modifying the code is not allowed
- Everyone also uses the same data inputs!
- You must describe the configuration, including the compiler
- If you report numbers, the system must be commercially available
- Everything is compiled the same way with "standard" optimizations; a separate score is allowed for program-specific tuning

[Chart] The SPEC CINT2000 and CFP2000 ratings for the Intel Pentium III and Pentium 4 processors at different clock speeds. Note: this chart is for the "base case". For more detail, see

SPEC Examples (Integer Benchmarks)

[Chart: SPECint results for the Pentium and Pentium Pro.] (From Patterson and Hennessy, p. 73: COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED)

SPEC Examples (FP Benchmarks)

[Chart: SPECfp results for the Pentium and Pentium Pro.] (From Patterson and Hennessy, p. 74: COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED)

Tuning for SPEC

[Chart: SPEC performance ratio with the standard compiler vs. an enhanced compiler, for the benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv.] (From Patterson and Hennessy, p. 68: COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED)

Summary of Performance Measurement
- Latency: how long does it take to get a particular task done?
- Throughput: how many tasks can you perform in a unit of time?

Performance = 1 / Execution time

Execution time (wall clock time) includes user time, system time, and other time.
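Since performance is the reciprocal of execution time, saying "X is n times faster than Y" means Perf_X / Perf_Y = Time_Y / Time_X = n. A minimal sketch with made-up timings:

```python
def performance(execution_time):
    """Performance is the reciprocal of (wall clock) execution time."""
    return 1.0 / execution_time

def speedup(time_x, time_y):
    """Perf_X / Perf_Y, which equals Time_Y / Time_X."""
    return time_y / time_x

# Hypothetical: machine X runs the program in 5 s, machine Y in 15 s.
print(speedup(5.0, 15.0))  # -> 3.0 (X is 3 times faster than Y)
```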

Summary of Performance Measurement (Cont'd)
- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
- CPI: cycles per instruction (smaller is better)
- IPC: instructions per cycle (bigger is better)

CPU time = (Instruction count × CPI) / Clock rate

With a weighted CPI over n instruction classes:

CPU time = (Σ_{i=1}^{n} CPI_i × I_i) / Clock rate
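The weighted-CPI form can be sketched with a hypothetical instruction mix; the classes and counts below are made up for illustration:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = (instruction count * CPI) / clock rate."""
    return instruction_count * cpi / clock_rate_hz

def cpu_time_weighted(counts, cpis, clock_rate_hz):
    """CPU time = (sum over classes of CPI_i * I_i) / clock rate."""
    total_cycles = sum(cpi * count for cpi, count in zip(cpis, counts))
    return total_cycles / clock_rate_hz

# Hypothetical 1 GHz machine running 2e9 ALU ops (CPI 1),
# 1e9 loads (CPI 2), and 0.5e9 branches (CPI 2).
t = cpu_time_weighted([2e9, 1e9, 0.5e9], [1, 2, 2], 1e9)
print(t)  # -> 5.0 seconds (2e9 + 2e9 + 1e9 cycles at 1 GHz)
```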

Summary of Performance Measurement (Cont'd)
- MIPS (Millions of Instructions Per Second)
- MOPS (Millions of Operations Per Second)
- MFLOPS (Millions of Floating-point Operations Per Second)

MIPS = Instruction count / (Execution time × 10^6) = Clock rate / (CPI × 10^6)

- Benchmarks
- SPEC ratio and SPEC rate
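The two forms of the MIPS equation can be checked against each other; the clock rate and instruction count below are a made-up illustration:

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_cpi(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical: 500 MHz clock, CPI = 2, so 1e9 instructions take 4 s.
print(mips_from_time(1e9, 4.0))   # -> 250.0
print(mips_from_cpi(500e6, 2.0))  # -> 250.0 (the two forms agree)
```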