Slide 1: Topic IV (Cont'd) - Performance Measurement

Introduction to Computer Systems Engineering (CPEG 323)
Slide 2: Relative MIPS

Relative MIPS = (Time_reference / Time_unrated) * MIPS_reference

where
- Time_reference = execution time of a program on the reference machine
- Time_unrated = execution time of the same program on the machine to be rated
- MIPS_reference = agreed-upon MIPS rating of the reference machine
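To make the formula concrete, here is a minimal sketch in C. The timings are invented, and the 1-MIPS reference rating follows the common convention of treating the VAX-11/780 as the 1-MIPS reference machine:

```c
#include <stdio.h>

/* Relative MIPS = (Time_reference / Time_unrated) * MIPS_reference */
double relative_mips(double time_ref, double time_unrated, double mips_ref)
{
    return (time_ref / time_unrated) * mips_ref;
}

int main(void)
{
    /* Hypothetical numbers: the reference machine (rated at 1 MIPS)
       runs the program in 120 s; the machine being rated runs the
       same program in 12 s. */
    printf("Relative MIPS = %.1f\n", relative_mips(120.0, 12.0, 1.0));
    /* Prints 10.0: the rated machine is a "10 relative MIPS" machine. */
    return 0;
}
```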
Slide 3: Relative MIPS (Cont'd)

Relative MIPS only tracks execution time for the given program and input. Even when the program and input are identified, it becomes harder over time to find the reference machine.
Slide 4: Relative MIPS (Cont'd)

The question also arises whether the older machine should be run with the newest release of the compiler and operating system, or whether the software should be fixed so that the reference machine does not get faster over time.
Slide 5: Relative MIPS (Cont'd)

In summary, the advantage of relative MIPS is questionable:
- Which program and input?
- Which reference machine? (its times change)
- Which compiler and/or OS? (times change with releases)
- Which benchmark?

Patterson: the advantage is small.
Slide 6: Peak Rates vs. Sustained Rates

The peak rate is the rate that can be attained if every resource involved in the measurement is used at its maximum rate. For example, a 1 GHz processor that can do 1 floating-point addition and 1 floating-point multiplication each clock cycle has a peak rate of 2 GFLOPS.

So much for the theory: can we get this in practice?
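The arithmetic behind this claim is direct; the short sketch below just restates the slide's 1 GHz example in C:

```c
#include <stdio.h>

int main(void)
{
    double clock_hz        = 1e9;  /* 1 GHz clock */
    double flops_per_cycle = 2.0;  /* 1 FP add + 1 FP multiply per cycle */

    double peak_flops = clock_hz * flops_per_cycle;
    printf("Peak rate = %.1f GFLOPS\n", peak_flops / 1e9);  /* 2.0 GFLOPS */
    return 0;
}
```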
Slide 7: Limitations on Peak Rates

- Other resources may not be able to keep up.
- Your program may not be able to use the resources in the manner needed to reach peak performance. Example: your program only does floating-point adds, not multiplies.
- The peak rate may not be physically sustainable.
Slide 8: Peak Rate Example

The Intel i860 (1991) was advertised as having an 80 MFLOPS peak rate (1 FP add and 1 FP multiply per cycle of a 40 MHz clock). However, when compiling and running various linear algebra programs, experimenters found the actual rate ranged from 15.0 MFLOPS (19% of peak) down to 3.2 MFLOPS (4% of peak)! What went wrong?

Try a similar experiment on a new machine today?
Slide 9: Benchmarks

- Real programs: C, TeX, Spice
- Kernels: Livermore Loops, LINPACK - best for isolating the performance of individual features of the machine
- Toy benchmarks (10-100 lines): Sieve of Eratosthenes, Puzzle, Quicksort, N-Queens (sketched below)
- Synthetic benchmarks: try to match the average operation frequency of a large set of programs
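As an illustration of just how small a toy benchmark is, here is a minimal Sieve of Eratosthenes in C. This is a sketch in the spirit of such benchmarks, not any specific historical benchmark code, and the problem size is arbitrary:

```c
#include <stdio.h>

#define N 8192  /* arbitrary small problem size */

int main(void)
{
    static char composite[N + 1];  /* zero-initialized: nothing marked yet */
    int count = 0;

    for (int i = 2; i <= N; i++) {
        if (!composite[i]) {       /* i was never marked, so it is prime */
            count++;
            for (int j = 2 * i; j <= N; j += i)
                composite[j] = 1;  /* mark every multiple of i */
        }
    }
    printf("%d primes up to %d\n", count, N);
    return 0;
}
```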
Slide 10: More Benchmarks

- Dhrystone [Weicker, 1984]
- Whetstone [Curnow & Wichmann, 1976]: modeled on university computer center jobs; 12 loops
- SPEC benchmarks: SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006
Slide 11: More Benchmarks

- MediaBench
- CommBench
Slide 12: Small Benchmarks and Kernels

- Early benchmarks used "toy problems" (quicksort, Towers of Hanoi).
- Other benchmarks took small fragments of code from inside application loops. One early example was the Livermore Loops (21 code fragments).
Slide 13: Small Benchmarks: Pluses and Minuses

+ Drawn from real applications, so they seem realistic
+ Easy to understand and analyze
+ Highly portable (can even be converted to other languages)
+ Emphasize the "make the common case fast" principle
- Still too much like MIPS if your application is not like theirs
- Not representative if your application is complex and has many different parts
Slide 14: Synthetic Benchmarks

A synthetic benchmark attempts to exercise the hardware in a manner that mimics real-world applications, but in a small piece of code.

Examples: Whetstone, Dhrystone. Each repeatedly executes a loop that performs a varied mix of instructions and uses memory in various ways; the figure of merit is how many "Whetstones" or "Dhrystones" per second your computer can do.
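The sketch below illustrates the shape of the idea only; the operation mix is invented, not actual Whetstone or Dhrystone code. A synthetic benchmark times many iterations of a fixed blend of work and reports iterations per second (compile with -lm):

```c
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <time.h>

/* One "synthetic iteration": a fixed, invented mix of integer, FP,
   branch, and memory work meant to mimic an "average" program. */
static double one_iteration(double x, char *buf)
{
    int k = (int)x % 7;                            /* integer op + branch fodder */
    x = fmod(x * 1.000001 + sqrt(x), 1e6) + 1.0;   /* bounded floating-point mix */
    if (k == 3)
        memcpy(buf, "synthetic string work", 22);  /* occasional memory traffic */
    return x;
}

int main(void)
{
    enum { LOOPS = 10 * 1000 * 1000 };
    char buf[64] = "";
    volatile double x = 1.0;  /* volatile keeps the loop from being optimized away */

    clock_t t0 = clock();
    for (long i = 0; i < LOOPS; i++)
        x = one_iteration(x, buf);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* The figure of merit: iterations per second, a la Dhrystones/second. */
    printf("%.0f synthetic iterations per second\n", LOOPS / secs);
    return 0;
}
```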
Slide 15: Synthetic Benchmarks: Pluses and Minuses

+ Seem more realistic than kernels
+ Still easy to understand and analyze
+ Still highly portable
- Reliance on a single benchmark skews perceptions
- Nobody can agree on a single benchmark
- Easy to abuse: designers focus on improving the benchmark instead of real applications
Slide 16: Application Benchmarks

OK, we admit it; you can't capture real-world complexity in a few dozen (or even a few hundred) lines of C code! So use some real programs instead.

If you're going to buy a machine, you're best off trying the applications you will use. But you may not always be able to do this.
Slide 17: Using Real Applications for Benchmarks

- LINPACK (Linear Algebra Package) is used to rank the world's 500 fastest computers (www.top500.org).
- Tom's Hardware (www.tomshardware.com) and other hardware reviewers run a game (such as Quake) and measure the frames per second (FPS).
Slide 18: Application Benchmarks: Pluses

+ Closer to real applications, hence more realistic
+ Better at exposing weaknesses and performance bottlenecks in systems
Slide 19: Application Benchmarks: Minuses

- Harder to compare different machines unless you use a common standard (e.g., ANSI C)
- Difficult to determine why a particular program runs fast or slow, due to complexity
- Whose benchmark? (You can always find one benchmark that makes your product look best)
- Takes too long to simulate (a big issue for researchers)
Slide 21: Benchmark Suites - Objectives

- Run a bunch of programs and combine the results.
- Get everyone to agree to use the same benchmarks.
- Lay down common ground rules.
- Develop a method for reporting and disseminating results.
- Put caveats everywhere and try to educate the people using the results.
- May be targeted toward application domains (e.g., web servers, transaction processing, multimedia, HPC).
Slide 22: The SPEC Benchmarks

- SPEC (Standard Performance Evaluation Corporation) was formed to write "standard" benchmark suites with industry acceptance.
- Main releases: SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006.
- Divided into integer and FP-intensive applications: e.g., SPECfp95, CINT2000, CFP2000, etc.
- Recent domain-specific suites (e.g., SPEC HPC2002, SPECweb99).
- On ECE/CIS machines, type "sysinfo hostname" to see scores.
Slide 23: SPEC Ratio: Measuring Latency

Results for each individual benchmark of the SPEC benchmark suites are expressed as the ratio of a fixed "SPEC reference time" to the wall-clock time needed to execute one single copy of the benchmark. The reference time was chosen as the execution time on a Sun Ultra 5/10 with a 300 MHz processor.

(From P&H, 3rd ed., p. 259)
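Numerically, that works out as in the sketch below. The per-benchmark times are invented; the geometric-mean combination is how SPEC computes the overall suite score from the individual ratios (compile with -lm):

```c
#include <stdio.h>
#include <math.h>

/* SPECratio = SPEC reference time / measured time (bigger is better).
   The overall suite score is the geometric mean of the per-benchmark
   ratios. */
int main(void)
{
    /* Hypothetical reference and measured times (seconds) for three
       benchmarks of a suite. */
    double ref[]      = { 1400.0, 1800.0, 500.0 };
    double measured[] = {   70.0,  120.0,  25.0 };
    int n = 3;

    double log_sum = 0.0;
    for (int i = 0; i < n; i++) {
        double ratio = ref[i] / measured[i];
        printf("benchmark %d: SPECratio = %.1f\n", i, ratio);
        log_sum += log(ratio);  /* accumulate logs for the geometric mean */
    }
    printf("overall score (geometric mean) = %.1f\n", exp(log_sum / n));
    return 0;
}
```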
Slide 24: SPEC Rate: Measuring Throughput

Several copies of a given SPEC benchmark are executed; the method is particularly suitable for multiprocessor systems. The results, called the SPEC rate, express how many jobs of a particular type (characterized by the individual benchmark) can be executed in a given time. (The SPEC reference time happens to be a week, and the execution times are normalized with respect to a VAX 11/780.)

(From: http://www.hyperdictionary.com/)
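A much-simplified sketch of the throughput idea follows. The copy count and elapsed time are invented, and the real SPEC rate additionally normalizes against the reference machine, which this sketch omits:

```c
#include <stdio.h>

/* Simplified throughput in the spirit of SPEC rate: run `copies`
   copies of a benchmark simultaneously and report how many jobs of
   that type would complete per week. */
double jobs_per_week(int copies, double elapsed_secs)
{
    const double week = 7 * 24 * 3600.0;  /* seconds in a week */
    return copies * (week / elapsed_secs);
}

int main(void)
{
    /* Hypothetical: a 4-processor box finishes 4 simultaneous copies
       of a benchmark in 300 seconds. */
    printf("throughput = %.0f jobs/week\n", jobs_per_week(4, 300.0));
    return 0;
}
```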
Slide 25: SPEC Ground Rules and Reproducibility

- Everyone uses the same code; modifying the code is not allowed.
- Everyone also uses the same data inputs!
- You must describe the configuration, including the compiler.
- If you report numbers, the system must be commercially available.
- Everything is compiled the same way, with "standard" optimizations. A separate score is allowed for program-specific tuning.
Slide 26: SPEC CINT2000 and CFP2000 Ratings

The SPEC CINT2000 and CFP2000 ratings for the Intel Pentium III and Pentium 4 processors at different clock speeds.

Note: this chart shows the "base" case. For more detail, see www.spec.org.
Slide 27: SPEC Examples (Integer Benchmarks)

[Figure: SPECint ratings vs. clock rate (50-250 MHz) for the Pentium and Pentium Pro. From Patterson and Hennessy, p. 73. Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved.]
Slide 28: SPEC Examples (FP Benchmarks)

[Figure: SPECfp ratings vs. clock rate (50-250 MHz) for the Pentium and Pentium Pro. From Patterson and Hennessy, p. 74. Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved.]
Slide 29: Tuning for SPEC

[Figure: SPEC performance ratio (0-800) with a standard compiler vs. an enhanced compiler for the SPEC89 benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv. From Patterson and Hennessy, p. 68. Copyright 1998 Morgan Kaufmann Publishers, Inc. All rights reserved.]
Slide 30: Summary of Performance Measurement

- Latency: how long does it take to get a particular task done?
- Throughput: how many tasks can you perform in a unit of time?

Performance = 1 / Execution time

Execution time (wall-clock time) = user time + system time + other time
Slide 31: Summary of Performance Measurement (Cont'd)

- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
- CPI: cycles per instruction - smaller is better
- IPC: instructions per cycle - bigger is better

CPU time = (Instruction count * CPI) / Clock rate

Weighted CPI:

CPU time = (sum over i = 1..n of CPI_i * I_i) / Clock rate
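The weighted-CPI formula above can be evaluated directly. In the sketch below, the clock rate, instruction-class counts, and per-class CPIs are all hypothetical:

```c
#include <stdio.h>

/* CPU time = (sum over instruction classes of CPI_i * I_i) / clock rate,
   i.e. total cycles divided by cycles per second. */
int main(void)
{
    /* Hypothetical 2 GHz machine and instruction mix: per-class
       counts I_i and per-class CPI_i. */
    double clock_hz = 2e9;
    double count[]  = { 5e9, 1e9, 2e9 };  /* ALU, load/store, branch */
    double cpi[]    = { 1.0, 2.0, 1.5 };
    int n = 3;

    double cycles = 0.0, insts = 0.0;
    for (int i = 0; i < n; i++) {
        cycles += cpi[i] * count[i];  /* cycles spent in this class */
        insts  += count[i];
    }
    printf("CPU time     = %.2f s\n", cycles / clock_hz);  /* 5.00 s */
    printf("weighted CPI = %.2f\n",   cycles / insts);     /* 1.25   */
    return 0;
}
```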
Slide 32: Summary of Performance Measurement (Cont'd)

- MIPS (Millions of Instructions Per Second)
- MOPS (Millions of Operations Per Second)
- MFLOPS (Millions of Floating-point Operations Per Second)

MIPS = Instruction count / (Execution time * 10^6) = Clock rate / (CPI * 10^6)

- Benchmarks
- SPEC ratio and SPEC rate
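And the MIPS formula, continuing with the hypothetical numbers from the previous sketch:

```c
#include <stdio.h>

/* MIPS = instruction count / (execution time * 10^6)
        = clock rate / (CPI * 10^6)                    */
int main(void)
{
    double clock_hz = 2e9;   /* hypothetical 2 GHz machine */
    double cpi      = 1.25;  /* weighted CPI from the previous sketch */

    printf("MIPS = %.0f\n", clock_hz / (cpi * 1e6));  /* 1600 MIPS */
    return 0;
}
```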