BITS Pilani, Pilani Campus Today’s Agenda Role of Performance
BITS Pilani, Pilani Campus Definition of Performance For some program running on machine X: Performance X = 1 / Execution time X X is n times faster than Y means: Performance X / Performance Y = n
BITS Pilani, Pilani Campus Performance Equation I So, to improve performance one can either: – reduce the number of cycles for a program, or – reduce the clock cycle time, or, equivalently, – increase the clock rate CPU execution time CPU clock cycles Clock cycle time for a program = equivalently
BITS Pilani, Pilani Campus Performance Equation II CPU execution time Instruction count average CPI Clock cycle time for a program Derive the above equation from Performance Equation I = Average CPI: It provides one way to compare two different implementations of same ISA since instruction count for a program remain same It defers with processor design It also depends on the mix of instruction types executed in an application.
BITS Pilani, Pilani Campus CPI Example Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? A is faster… …by this much
BITS Pilani, Pilani Campus CPI for Instruction Class CPU clock cycles can be computed by looking at different types of instructions and using their individual clock cycle counts. Formula: n CPU clock cycles = (CPI i X C i ) i=1 Where, Ci is count of instructions of class I and CPIi is number of cycles per instruction for that instruction class. 6
BITS Pilani, Pilani Campus CPI Example Alternative compiled code sequences using instructions in classes A, B, C ClassABC CPI for class123 IC in sequence 1212 IC in sequence 2411 Sequence 1: IC = 5 Clock Cycles = 2×1 + 1×2 + 2×3 = 10 Avg. CPI = 10/5 = 2.0 Sequence 2: IC = 6 Clock Cycles = 4×1 + 1×2 + 1×3 = 9 Avg. CPI = 9/6 = 1.5
BITS Pilani, Pilani Campus Performance Summary Performance depends on – Algorithm: affects IC, possibly CPI – Programming language: affects IC, CPI – Compiler: affects IC, CPI – Instruction set architecture: affects IC, CPI – Clock cycle time and CPI are determined by Implementation of the processor.
BITS Pilani, Pilani Campus Evaluating Performance of a machine One way is to compare the execution time of the workload on two different computers. Another better way is to evaluate a computer using set of benchmark programs specially chosen to measure performance. The best type of programs to use for benchmarks are real applications. For example, typical engineering or scientific applications For software development engineers benchmarks may include programs such as compiler or word processing systems. 9
BITS Pilani, Pilani Campus Benchmarks Benchmark suites – Perfect Club: set of application codes – Livermore Loops: 24 loop kernels – Linpack: linear algebra package – SPEC: mix of code from industry organization
BITS Pilani, Pilani Campus SPEC (System Performance Evaluation Corporation) Sponsored by industry but independent and self-managed – trusted by code developers and machine vendors Clear guides for testing, see Regular updates (benchmarks are dropped and new ones added periodically according to relevance) Specialized benchmarks for particular classes of applications
BITS Pilani, Pilani Campus SPEC History First Round: SPEC CPU89 – 10 programs yielding a single number Second Round: SPEC CPU92 – SPEC CINT92 (6 integer programs) and SPEC CFP92 (14 floating point programs) – compiler flags can be set differently for different programs Third Round: SPEC CPU95 – new set of programs: SPEC CINT95 (8 integer programs) and SPEC CFP95 (10 floating point) – single flag setting for all programs Fourth Round: SPEC CPU2000 – new set of programs: SPEC CINT2000 (12 integer programs) and SPEC CFP2000 (14 floating point) – single flag setting for all programs – programs in C, C++, Fortran 77, and Fortran 90
BITS Pilani, Pilani Campus CINT2000 (Integer component of SPEC CPU2000) Program LanguageWhat It Is 164.gzip CCompression 175.vpr CFPGA Circuit Placement and Routing 176.gcc CC Programming Language Compiler 181.mcf CCombinatorial Optimization 186.crafty CGame Playing: Chess 197.parser CWord Processing 252.eon C++Computer Visualization 253.Perlbmk CPERL Programming Language 254.gap C Group Theory, Interpreter 255.Vortex CObject-oriented Database 256.bzip2 CCompression 300.twolf CPlace and Route Simulator
BITS Pilani, Pilani Campus CFP2000 (Floating point component of SPEC CPU2000) Program Language What It Is 168.wupwiseFortran 77Physics / Quantum Chromodynamics 171.swimFortran 77Shallow Water Modeling 172.mgrid Fortran 77 Multi-grid Solver: 3D Potential Field 173.applu Fortran 77 Parabolic / Elliptic Differential Equations 177.mesa C 3-D Graphics Library 178.Galgel Fortran 90 Computational Fluid Dynamics 179.art CImage Recognition / Neural Networks 183.equakeCSeismic Wave Propagation Simulation 187.facerecFortran 90 Image Processing: Face Recognition 188.ammp CComputational Chemistry 189.lucas Fortran 90 Number Theory / Primality Testing 191.fma3d Fortran 90 Finite-element Crash Simulation 200.sixtrackFortran 77 High Energy Physics Accelerator Design 301.apsi Fortran 77 Meteorology: Pollutant Distribution
BITS Pilani, Pilani Campus Amdahl's Law Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahlcomputer architectGene Amdahl This law is used to find the maximum expected improvement to an overall system when only part of the system is improved. 15
BITS Pilani, Pilani Campus Amdahl's Law Execution Time After Improvement = Execution Time Unaffected +( Execution Time Affected / Amount of Improvement ) 16 Time before Time after Improvement Improvement
BITS Pilani, Pilani Campus Amdahl’s Law Speed-up = Perf new / Perf old =Exec_time old / Exec_time new = Performance improvement from using faster mode is limited by the fraction the faster mode can be applied. 17 f (1 - f) T old (1 - f) T new f / P
BITS Pilani, Pilani Campus Example "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?" How about making it 5 times faster? There is no amount by which multiple can be enhanced to achieve 5 times increase in performance. Thus performance enhance possible with a given improvement is limited by the amount that the improved feature is used. Principle: Make the common case fast 18
BITS Pilani, Pilani Campus Remember Performance is specific to a particular program/s –Total execution time is a consistent summary of performance For a given architecture performance increases come from: –increases in clock rate (without adverse CPI affects) –improvements in processor organization that lower CPI –compiler enhancements that lower CPI and/or instruction count 19
BITS Pilani, Pilani Campus Processor Design Introduction 20
BITS Pilani, Pilani Campus Implementing MIPS Fetch/Execute cycle 21
BITS Pilani, Pilani Campus 22