1 Recap
2 Measuring Performance
– A computer user cares about response time (execution time).
– A computer center manager cares about throughput: the total amount of work done in a period of time.
– CPU time: a very good and fair measure of performance.
– CPU time can also be divided into user CPU time (spent in the program) and system CPU time (spent in the OS).
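The distinction between response time, user CPU time, and system CPU time can be observed directly. The following is a minimal sketch, assuming Python's standard `time.perf_counter()` and `os.times()`; the `measure` helper and the `busy_loop` workload are hypothetical names introduced only for illustration.

```python
import os
import time


def measure(workload):
    """Run workload() and return elapsed, user CPU, and system CPU seconds."""
    wall_start = time.perf_counter()   # wall-clock (response time) reference
    cpu_start = os.times()             # process CPU times before the run
    workload()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,               # response time
        "user": cpu_end.user - cpu_start.user,          # time in the program
        "system": cpu_end.system - cpu_start.system,    # time in the OS
    }


def busy_loop():
    # Hypothetical CPU-bound workload: sum of squares.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    t = measure(busy_loop)
    print(f"response time  : {t['elapsed']:.4f} s")
    print(f"user CPU time  : {t['user']:.4f} s")
    print(f"system CPU time: {t['system']:.4f} s")
```

For a CPU-bound workload like this one, user CPU time should account for most of the response time; a workload that waits on I/O would show a response time well above the sum of the two CPU times.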
3 Aspects of CPU Execution Time
CPU Time = Instruction Count x CPI x Clock Cycle
– Instruction Count (I) depends on: Program Used, Compiler, ISA
– CPI depends on: Program Used, Compiler, ISA, CPU Organization
– Clock Cycle (C) depends on: CPU Organization, Technology
4 Factors Affecting CPU Performance
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                                     Instruction Count (I)   CPI   Clock Cycle (C)
Program                              X                       X
Compiler                             X                       X
Instruction Set Architecture (ISA)   X                       X
Organization                                                 X     X
Technology                                                         X
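The unit chain in the CPU time equation can be sketched as a small function. This is an illustrative sketch only: the function name `cpu_time` and the sample values (2 billion instructions, CPI of 1.5, 2 GHz clock) are assumptions, not figures from the slides. It uses the clock rate, which is the reciprocal of the clock cycle time.

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """Seconds per program = instructions x cycles/instruction x seconds/cycle."""
    cycles = instructions * cpi        # total clock cycles for the program
    return cycles / clock_rate_hz      # seconds/cycle is 1 / clock rate


if __name__ == "__main__":
    # Hypothetical program and machine, chosen only to illustrate the formula.
    seconds = cpu_time(instructions=2e9, cpi=1.5, clock_rate_hz=2e9)
    print(f"CPU time = {seconds:.2f} s")
```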
5 Example: tradeoff between C and CPI
Assume stores can execute in 1 cycle by slowing the clock 15%. Should this be implemented?

Op         Frequency   Cycle Count
ALU ops    43%         1
Loads      21%         1
Stores     12%         2
Branches   24%         2
6 Simple Example
Old CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 2 + 0.24 x 2 = 1.36
New CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 1 + 0.24 x 2 = 1.24
Speedup = old time / new time
– = {I x old CPI x C} / {I x new CPI x 1.15 C}
– = 1.36 / (1.24 x 1.15) = 0.95
Answer: Don’t make the change
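The arithmetic of this example can be checked with a short script. It is a sketch under the example's own numbers (instruction mix, cycle counts, 15% slower clock); the names `OLD_MIX`, `average_cpi`, and `speedup` are introduced here for illustration.

```python
# Instruction mix from the example: op -> (frequency, cycles per instruction)
OLD_MIX = {
    "alu": (0.43, 1),
    "load": (0.21, 1),
    "store": (0.12, 2),
    "branch": (0.24, 2),
}


def average_cpi(mix):
    """Weighted average CPI: sum of frequency x cycle count over all op classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup(old_cpi, new_cpi, clock_cycle_ratio):
    """Old time / new time; the instruction count I cancels out."""
    return old_cpi / (new_cpi * clock_cycle_ratio)


# Proposed change: stores take 1 cycle, clock cycle becomes 1.15x longer.
NEW_MIX = dict(OLD_MIX, store=(0.12, 1))

if __name__ == "__main__":
    old_cpi = average_cpi(OLD_MIX)
    new_cpi = average_cpi(NEW_MIX)
    print(f"old CPI = {old_cpi:.2f}")
    print(f"new CPI = {new_cpi:.2f}")
    print(f"speedup = {speedup(old_cpi, new_cpi, 1.15):.2f}")
```

A speedup below 1 means the modified design is slower overall: the CPI gain on stores does not make up for the longer clock cycle.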
7 Some Caveats
Inter-dependence of I, CPI, and C: improvement in one may impact another
– Increasing pipeline depth tends to increase clock speed but may increase CPI
– A change in ISA to reduce instruction count may require a design with a slower clock => may not improve performance
– CPI depends on instruction mix => a smaller instruction count may not improve performance
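These caveats can be illustrated by comparing two designs with the CPU time equation. The sketch below uses entirely hypothetical numbers: a baseline design and an alternative whose ISA cuts the instruction count by 15% but raises CPI and lengthens the clock cycle. The `Design` class and both instances are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    instruction_count: float   # I
    cpi: float                 # average cycles per instruction
    clock_cycle_s: float       # C, in seconds

    def cpu_time(self) -> float:
        # CPU time = I x CPI x C
        return self.instruction_count * self.cpi * self.clock_cycle_s


# Hypothetical values, chosen only to show the interdependence of I, CPI, and C.
baseline = Design("baseline", instruction_count=1.00e9, cpi=1.4, clock_cycle_s=1.0e-9)
reduced = Design("reduced-I ISA", instruction_count=0.85e9, cpi=1.6, clock_cycle_s=1.1e-9)

if __name__ == "__main__":
    for d in (baseline, reduced):
        print(f"{d.name:14s} CPU time = {d.cpu_time():.3f} s")
```

With these assumed values the design with fewer instructions ends up with the longer CPU time, which is the situation the caveats warn about.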
8 Code Size & Performance
9 Benchmarks and Benchmarking
Lacking a universal task, pick some programs that represent common tasks, and use these representative programs to compare the performance of systems.
CAUTIONS:
– Comparisons are only as good as the benchmarks are at representing your real workload.
– Many parameters affect measured performance.
10 Example: We must use the same compiler
[Figure: Compiler “enhancements” and performance. 1998 Morgan Kaufmann Publishers]
11 Benchmark Suites
A suite is a collection of representative benchmarks from different application domains. The weakness of any one benchmark is likely to be compensated by another.
Standard Performance Evaluation Corporation (SPEC)
– Most popular benchmark suite
– Suite consists of kernels, small fragments, and large applications
– SPEC2006: CINT2006, CFP2006
– Benchmark suites for servers
  – SPECSFS: measures performance of file servers
  – SPECWeb: measures performance of Web servers
12 SPEC CPU2006 Programs
CINT2006 (Integer)

Benchmark        Language   Description
400.perlbench    C          Programming Language
401.bzip2        C          Compression
403.gcc          C          C Compiler
429.mcf          C          Combinatorial Optimization
445.gobmk        C          Artificial Intelligence: Go
456.hmmer        C          Search Gene Sequence
458.sjeng        C          Artificial Intelligence: chess
462.libquantum   C          Physics / Quantum Computing
464.h264ref      C          Video Compression
471.omnetpp      C++        Discrete Event Simulation
473.astar        C++        Path-finding Algorithms
483.xalancbmk    C++        XML Processing

Source:
13 SPEC CPU2006 Programs
CFP2006 (Floating Point)

Benchmark       Language     Description
410.bwaves      Fortran      Fluid Dynamics
416.gamess      Fortran      Quantum Chemistry
433.milc        C            Physics / Quantum Chromodynamics
434.zeusmp      Fortran      Physics / CFD
435.gromacs     C, Fortran   Biochemistry / Molecular Dynamics
436.cactusADM   C, Fortran   Physics / General
437.leslie3d    Fortran      Fluid Dynamics
444.namd        C++          Biology / Molecular Dynamics
447.dealII      C++          Finite Element Analysis
450.soplex      C++          Linear Programming, Optimization
453.povray      C++          Image Ray-tracing
454.calculix    C, Fortran   Structural Mechanics
459.GemsFDTD    Fortran      Computational Electromagnetics
465.tonto       Fortran      Quantum Chemistry
470.lbm         C            Fluid Dynamics
481.wrf         C, Fortran   Weather
482.sphinx3     C            Speech

Source:
14 Top 20 SPEC CPU2006 Results (As of August 2007)
[Tables: Top 20 SPECint2006 (columns: #, MHz, Processor, int peak, int base) and Top 20 SPECfp2006 (columns: MHz, Processor, fp peak, fp base). Processor families listed: Core 2 Duo, Core 2 Quad, Core 2 Extreme, Xeon, Opteron, POWER, Dual-Core Itanium.]
Source:
15 Performance Evaluation Using Benchmarks
“For better or worse, benchmarks shape a field”
Good products are created when we have:
– Good benchmarks
– Good ways to summarize performance
Since sales depend in large part on performance relative to the competition, there is heavy investment in improving products as reported by the performance summary.
If benchmarks are inadequate, designers must choose between improving the product for real programs and improving it to get more sales; sales almost always wins!
16 How to Summarize Performance