CDA 3101 Fall 2013 Introduction to Computer Organization Benchmarks 30 August 2013
Overview
– Benchmarks
– Popular benchmarks
  – Linpack
  – Intel's iCOMP
– SPEC Benchmarks
– MIPS Benchmark
– Fallacies and Pitfalls
Benchmarks
– Benchmarks measure different aspects of component and system performance
– Ideal situation: use the real workload
  – Engineering or scientific applications
  – Software development tools
  – Transaction processing
  – Office applications
– Types of benchmarks: real programs, kernels, toy benchmarks, synthetic benchmarks
– Risk: adjusting the design to benchmark requirements
  – (Partial) solution: use real programs and update them constantly
A Benchmark Story
1. You create a benchmark called the vmark
2. Run it on lots of different computers
3. Publish the vmark results
4. The vmark (and you) become popular
   – Users start buying their PCs based on vmark scores
   – Vendors are banging on your door
5. Vendors examine the vmark code and tune their compilers and/or microarchitecture to run vmark well
6. Your vmark benchmark has been broken
7. Create vmark 2.0
Performance Reports
– Reproducibility
  – Include hardware / software configuration (SPEC)
  – Evaluation process conditions
– Summarizing performance
  – Total time: Σ exec time_i
  – Arithmetic mean: AM = 1/n * Σ exec time_i
  – Harmonic mean: HM = n / Σ (1/rate_i)
  – Weighted mean: WM = Σ w_i * exec time_i
  – Geometric mean: GM = (Π exec time ratio_i)^(1/n)
  – Key GM property: GM(X_i) / GM(Y_i) = GM(X_i / Y_i)
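The GM property above is why SPEC-style summaries use the geometric mean: the summary does not depend on which machine the ratios are normalized to. As a concrete illustration, here is a minimal C sketch (all input numbers are hypothetical) computing each of the means listed above:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Hypothetical execution times (s), rates (MFLOPS), weights, and
           execution-time ratios against some reference machine. */
        double time[4]   = {2.0, 4.0, 8.0, 16.0};
        double rate[4]   = {500.0, 250.0, 125.0, 62.5};
        double weight[4] = {0.4, 0.3, 0.2, 0.1};   /* weights sum to 1 */
        double ratio[4]  = {1.5, 2.0, 0.8, 1.2};
        int n = 4;

        double am = 0.0, hm = 0.0, wm = 0.0, gm = 1.0;
        for (int i = 0; i < n; i++) {
            am += time[i] / n;          /* AM = 1/n * sum(time_i)     */
            hm += 1.0 / rate[i];        /* HM = n / sum(1/rate_i)     */
            wm += weight[i] * time[i];  /* WM = sum(w_i * time_i)     */
            gm *= ratio[i];             /* GM = (prod(ratio_i))^(1/n) */
        }
        hm = n / hm;
        gm = pow(gm, 1.0 / n);

        printf("AM=%.2f s  HM=%.2f MFLOPS  WM=%.2f s  GM=%.3f\n", am, hm, wm, gm);
        return 0;
    }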
Ex.1: Linpack Benchmark
– "Mother of all benchmarks"
– Time to solve a dense system of linear equations
– Inner kernel (DAXPY):
    DO I = 1, N
      DY(I) = DY(I) + DA * DX(I)
    END DO
– Metrics
  – R_peak: system peak Gflops
  – N_max: matrix size that gives the highest Gflops
  – N_1/2: matrix size that achieves half the rated R_max Gflops
  – R_max: the Gflops achieved for the N_max size matrix
– Used in the TOP500 supercomputer ranking
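For intuition, here is a minimal C sketch of that DAXPY-style inner loop with a rough Gflops estimate; the vector length, repeat count, and clock()-based timing are illustrative choices, not the official Linpack/HPL harness:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* DY(I) = DY(I) + DA * DX(I), the inner kernel shown above */
    static void daxpy(int n, double da, const double *dx, double *dy) {
        for (int i = 0; i < n; i++)
            dy[i] += da * dx[i];
    }

    int main(void) {
        int n = 1 << 20, reps = 1000;          /* illustrative sizes */
        double *dx = malloc(n * sizeof *dx);
        double *dy = malloc(n * sizeof *dy);
        for (int i = 0; i < n; i++) { dx[i] = 1.0; dy[i] = 2.0; }

        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            daxpy(n, 3.0, dx, dy);
        double sec = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* 2 floating-point operations (multiply + add) per element */
        printf("%.3f Gflops (rough estimate)\n",
               2.0 * (double)n * reps / sec / 1e9);
        free(dx); free(dy);
        return 0;
    }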
Ex.2: Intel’s iCOMP Index 3.0
– New version (3.0) reflects:
  – Mix of instructions for existing and emerging software
  – Increasing use of 3D, multimedia, and Internet software
– Benchmarks
  – 2 integer productivity applications (20% each)
  – 3D geometry and lighting calculations (20%)
  – FP engineering and finance programs and games (5%)
  – Multimedia and Internet application (25%)
  – Java application (10%)
– Weighted GM of relative performance
  – Baseline processor: Pentium II processor at 350 MHz
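A short sketch of how such a weighted geometric mean is formed against a baseline machine; the per-category relative-performance scores below are made up for illustration, only the 20/20/20/5/25/10 weighting follows the category mix listed above:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Relative performance vs. the baseline (baseline = 1.0) for the six
           benchmark categories; scores here are hypothetical. */
        double rel[6]    = {1.8, 1.6, 2.1, 1.4, 1.9, 1.7};
        double weight[6] = {0.20, 0.20, 0.20, 0.05, 0.25, 0.10}; /* sums to 1 */

        double index = 1.0;
        for (int i = 0; i < 6; i++)
            index *= pow(rel[i], weight[i]);  /* weighted GM = prod(rel_i ^ w_i) */

        printf("weighted index = %.2f x baseline\n", index);
        return 0;
    }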
Ex.3: SPEC CPU Benchmarks
– SPEC: Standard Performance Evaluation Corporation
– Need to update/upgrade benchmarks
  – Longer run times
  – Larger problems
  – Application diversity
– Rules to run and report
  – Baseline and optimized runs
  – Geometric mean of normalized execution times
  – Reference machine: Sun Ultra5_10 (300 MHz SPARC, 256 MB)
– CPU2006: latest SPEC CPU benchmark (4th version)
  – 12 integer and 17 floating point programs
– Metrics: response time and throughput
Ex.3: SPEC CPU Benchmarks Previous Benchmarks, now retired
Ex.3: SPEC CPU Benchmarks
– Observe: We will use SPEC 2000 and 2006 CPU benchmark data in this set of notes.
– Task: You are also asked to read about the SPEC CPU2006 benchmark suite, described on SPEC's website.
– Result: Compare SPEC CPU2006 with SPEC CPU2000 data to answer the extra-credit questions in Homework #2.
SPEC CINT2000 Benchmarks
  gzip     C    Compression
  vpr      C    FPGA Circuit Placement and Routing
  gcc      C    C Programming Language Compiler
  mcf      C    Combinatorial Optimization
  crafty   C    Game Playing: Chess
  parser   C    Word Processing
  eon      C++  Computer Visualization
  perlbmk  C    PERL Programming Language
  gap      C    Group Theory, Interpreter
  vortex   C    Object-oriented Database
  bzip2    C    Compression
  twolf    C    Place and Route Simulator
SPEC CFP2000 Benchmarks
  wupwise   F77  Physics / Quantum Chromodynamics
  swim      F77  Shallow Water Modeling
  mgrid     F77  Multi-grid Solver: 3D Potential Field
  applu     F77  Parabolic / Elliptic Partial Differential Equations
  mesa      C    3-D Graphics Library
  galgel    F90  Computational Fluid Dynamics
  art       C    Image Recognition / Neural Networks
  equake    C    Seismic Wave Propagation Simulation
  facerec   F90  Image Processing: Face Recognition
  ammp      C    Computational Chemistry
  lucas     F90  Number Theory / Primality Testing
  fma3d     F90  Finite-element Crash Simulation
  sixtrack  F77  High Energy Nuclear Physics Accelerator Design
  apsi      F77  Meteorology: Pollutant Distribution
SPECINT2000 Metrics
– SPECint2000: geometric mean of 12 normalized ratios (one for each integer benchmark) when each benchmark is compiled with "aggressive" optimization
– SPECint_base2000: geometric mean of 12 normalized ratios when compiled with "conservative" optimization
– SPECint_rate2000: geometric mean of 12 normalized throughput ratios when compiled with "aggressive" optimization
– SPECint_rate_base2000: geometric mean of 12 normalized throughput ratios when compiled with "conservative" optimization
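As a concrete illustration of how these numbers are put together, here is a minimal C sketch: each benchmark's normalized ratio is the reference-machine time divided by the measured time, and the reported metric is the geometric mean of the twelve ratios (all times below are hypothetical, not actual SPEC data):

    #include <math.h>
    #include <stdio.h>

    #define N 12   /* the 12 CINT2000 benchmarks */

    int main(void) {
        /* Hypothetical reference-machine and measured run times (seconds) */
        double ref_time[N] = {1400, 1600, 1100, 1800, 1000, 1800,
                              1300, 1800, 1100, 1900, 1500, 3000};
        double run_time[N] = { 700,  900,  500,  800,  450,  850,
                               600,  800,  550,  950,  700, 1400};

        double gm = 1.0;
        for (int i = 0; i < N; i++)
            gm *= ref_time[i] / run_time[i];   /* normalized ratio per benchmark */
        gm = pow(gm, 1.0 / N);

        printf("SPECint-style score = %.1f\n", gm);
        return 0;
    }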
SPECint_base2000 Results [results chart comparing Alpha/Tru64, MIPS/IRIX (400 MHz), and Intel/NT systems]
SPECfp_base2000 Results [results chart comparing Alpha/Tru64, MIPS/IRIX (400 MHz), and Intel/NT systems]
Effect of CPI: SPECint95 Ratings
– Microarchitecture improvements lower CPI
– CPU time = IC * CPI * clock cycle time
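To see how a CPI improvement alone translates into CPU time, here is a small worked sketch in C; the instruction count, CPI values, and clock rate are hypothetical:

    #include <stdio.h>

    int main(void) {
        double ic = 2e9;            /* hypothetical instruction count           */
        double clock_hz = 500e6;    /* 500 MHz clock (cycle time = 1/clock_hz)  */
        double cpi_old = 2.0;       /* before the microarchitecture improvement */
        double cpi_new = 1.4;       /* after the improvement                    */

        /* CPU time = IC * CPI * clock cycle time */
        double t_old = ic * cpi_old / clock_hz;
        double t_new = ic * cpi_new / clock_hz;

        printf("old: %.2f s  new: %.2f s  speedup: %.2fx\n",
               t_old, t_new, t_old / t_new);
        return 0;
    }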
Effect of CPI: SPECfp95 Ratings
– Microarchitecture improvements lower CPI
SPEC Recommended Readings
– SPEC 2006 – Survey of Benchmark Programs
– SPEC 2006 Benchmarks – Journal Articles on Implementation Techniques and Problems
– SPEC 2006 Installation, Build, and Runtime Issues
Another Benchmark: MIPS
– MIPS: Millions of Instructions Per Second
– MIPS = IC / (CPU time * 10^6)
– Comparing apples to oranges
– Flaw: 1 MIPS on one processor does not accomplish the same work as 1 MIPS on another
  – Like determining the winner of a foot race by counting who used fewer steps
  – Some processors do FP in software (e.g., 1 FP op = 100 INT ops)
  – Different instructions take different amounts of time
– Useful for comparisons between two processors from the same vendor that support the same ISA with the same compiler (e.g., Intel's iCOMP benchmark)
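The following small C sketch illustrates the flaw with made-up numbers: the machine that posts the higher MIPS rating can still take longer to run the same program, because each of its (simpler) instructions accomplishes less work:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical: the same program compiled for two machines. */
        double ic_a = 1.0e9, time_a = 2.0;   /* 1 billion instructions, 2.0 s */
        double ic_b = 4.0e9, time_b = 2.5;   /* 4 billion instructions, 2.5 s */

        double mips_a = ic_a / (time_a * 1e6);   /* MIPS = IC / (CPU time * 10^6) */
        double mips_b = ic_b / (time_b * 1e6);

        /* B posts 1600 MIPS vs. A's 500 MIPS, yet A finishes half a second
           sooner: instruction counts are not comparable units of work. */
        printf("A: %.0f MIPS, %.1f s   B: %.0f MIPS, %.1f s\n",
               mips_a, time_a, mips_b, time_b);
        return 0;
    }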
Fallacies and Pitfalls
– Ignoring Amdahl's law
– Using clock rate or MIPS as a performance metric
– Using the arithmetic mean of normalized CPU times (ratios) instead of the geometric mean
– Using hardware-independent metrics
  – e.g., using code size as a measure of speed
– "Synthetic benchmarks predict performance"
  – They do not reflect the behavior of real programs
– "The geometric mean of CPU time ratios is proportional to total execution time" [NOT!!]
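The arithmetic-mean pitfall is easy to demonstrate with two programs on two machines. In the hypothetical numbers below, the AM of normalized ratios makes each machine look about 5x slower than the other depending on which one is used as the reference, while the geometric means stay consistent:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Hypothetical execution times (s) of two programs on machines A and B */
        double a[2] = {1.0, 100.0};
        double b[2] = {10.0, 10.0};

        double am_b_over_a = (b[0]/a[0] + b[1]/a[1]) / 2.0;  /* A as reference */
        double am_a_over_b = (a[0]/b[0] + a[1]/b[1]) / 2.0;  /* B as reference */
        double gm_b_over_a = sqrt((b[0]/a[0]) * (b[1]/a[1]));
        double gm_a_over_b = sqrt((a[0]/b[0]) * (a[1]/b[1]));

        /* AM: each machine looks ~5x slower than the other (a contradiction).
           GM: the two results are exact reciprocals (here both 1.0).          */
        printf("AM: B/A=%.2f  A/B=%.2f\n", am_b_over_a, am_a_over_b);
        printf("GM: B/A=%.2f  A/B=%.2f\n", gm_b_over_a, gm_a_over_b);
        return 0;
    }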
Conclusions
– Performance is specific to a particular program or set of programs
– CPU time is the only adequate measure of performance
– For a given ISA, performance increases come from:
  – increases in clock rate (without adverse CPI effects)
  – improvements in processor organization that lower CPI
  – compiler enhancements that lower CPI and/or IC
– Your workload is the ideal benchmark
– You should not always believe everything you read!
Happy & Safe Holiday Weekend