Recap
–Technology trends
–Cost/performance
Measuring and Reporting Performance
–What does it mean to say “computer X is faster than computer Y”?
–E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s. Which is true?
  1) A is 50% faster than B?
  2) A is 33% faster than B?
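Performance
–H&P’s definition: “X is n times faster than Y” means
$$n = \frac{\text{ExTime}_Y}{\text{ExTime}_X}$$
–Performance is the reciprocal of execution time:
$$\text{Performance}_X = \frac{1}{\text{ExTime}_X} \quad\Rightarrow\quad n = \frac{\text{Performance}_X}{\text{Performance}_Y}$$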
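Example
–E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s. Which is true?
  1) A is 50% faster than B?
  2) A is 33% faster than B?
–Answer: 1) A is 50% faster than B, since B’s time / A’s time = 15s / 10s = 1.5
  –The 33% figure is the reduction in execution time: A takes (15 − 10) / 15 ≈ 33% less time than B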
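The H&P definition and the two candidate answers can be checked with a few lines of arithmetic. This is a minimal Python sketch assuming times in seconds; the function names are hypothetical helpers, not from the slides.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that "X is n times faster than Y" (n = ExTime_Y / ExTime_X)."""
    return time_y / time_x


def percent_faster(time_x: float, time_y: float) -> float:
    """Return how many percent faster X is than Y, i.e. (n - 1) * 100."""
    return (times_faster(time_x, time_y) - 1.0) * 100.0


def percent_less_time(time_x: float, time_y: float) -> float:
    """Return the percentage reduction in execution time of X relative to Y."""
    return (time_y - time_x) / time_y * 100.0


time_a, time_b = 10.0, 15.0
print(times_faster(time_a, time_b))                 # 1.5
print(percent_faster(time_a, time_b))               # 50.0
print(round(percent_less_time(time_a, time_b), 1))  # 33.3
```

The 33% only appears when the time difference is divided by the slower machine's time, which is a statement about time saved rather than about speed.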
Performance
–Response time? (time to complete one task, i.e. execution time or latency)
–Throughput? (total work completed per unit time)
Measuring Performance
–Focus on execution time of real programs
–Measuring execution time?
  –Wall clock time (elapsed time)
  –CPU time (excludes time waiting for I/O and time spent running other processes)
    o User CPU time
    o System CPU time
–E.g.:
    iota:~$ time gcc -g tmpcnv.s -o tmpcnv
    real 0m3.352s
    user 0m0.367s
    sys 0m0.468s
  –Here user + sys = 0.835s, only about 25% of the 3.352s elapsed time
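The gap between wall clock time and CPU time can be reproduced from inside a program. This is a minimal Python sketch, assuming a workload that does some computation and then sleeps; the `time.sleep` call is a stand-in for waiting on I/O, and `measure` and `sample_workload` are hypothetical names. It uses the standard `time.perf_counter` (elapsed time) and `os.times` (user and system CPU time of the current process).

```python
import os
import time


def measure(workload):
    """Run workload() and return (wall, user, system) times in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    workload()
    cpu_end = os.times()
    wall = time.perf_counter() - wall_start
    user = cpu_end.user - cpu_start.user
    system = cpu_end.system - cpu_start.system
    return wall, user, system


def sample_workload():
    # Some computation, which is charged mostly as user CPU time ...
    total = sum(i * i for i in range(200_000))
    # ... then a pause that adds wall clock time but almost no CPU time,
    # standing in (as an assumption) for time spent waiting on I/O.
    time.sleep(0.2)
    return total


wall, user, system = measure(sample_workload)
print(f"real {wall:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

As with the `time gcc` output, `real` exceeds `user + sys` because the process spends part of its elapsed time not running on the CPU.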
Choosing Programs to Measure Performance
–Real programs
  –Compilers, text-processing, CAD tools, etc.
–Modified applications
  –Scripted or modified for portability
–Kernels
  –Key sections extracted from real programs (Livermore Loops, Linpack)
–Toy benchmarks
  –Short examples (e.g. Sieve of Eratosthenes)
–Synthetic benchmarks
  –Whetstone, Dhrystone
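To make the "toy benchmark" category concrete, here is a Sieve of Eratosthenes timed with the standard `time.perf_counter`. It is a minimal Python sketch; the limit of 1,000,000 and the function name `sieve` are illustrative choices, not taken from any benchmark suite.

```python
import time


def sieve(limit: int) -> int:
    """Count the primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Mark every multiple of p from p*p upwards as composite.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return sum(is_prime)


start = time.perf_counter()
count = sieve(1_000_000)
elapsed = time.perf_counter() - start
print(f"{count} primes up to 1,000,000 in {elapsed:.3f}s")
```

The program is easy to write and run on any machine, but it exercises one small loop and one array access pattern, so its timing says little about how compilers, CAD tools or other real programs would perform.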
Benchmarking
–H&P: car magazines are more scientific about reporting performance than many CS journals!
Benchmark Suites
–Collections of benchmarks
  –E.g. SPEC CPU2000 (INT and FP): 26 real FORTRAN/C/C++ programs (12 integer, 14 floating-point), modified for portability
  –Specific graphics benchmarks
Server Benchmarks
–SPEC also has server benchmarks
  –File server
  –Web server
–TPC: Transaction Processing Performance Council
  –Various transaction processing benchmarks
Embedded Benchmarks
–Much less well developed
  –Tend to use Dhrystone!
–EEMBC
  –Recent development
  –34 benchmarks (mainly kernels) in five application areas
Summarising Performance Measurements
–Complex area
  –Weighted arithmetic mean
  –Geometric mean
  –Normalised results
  –…
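The choice of summary statistic matters most once results are normalised to a reference machine. This Python sketch uses hypothetical execution times for two programs on three machines (invented for illustration, not measured data) and hypothetical helper names. It shows that the arithmetic mean of normalised times can rank machines differently depending on the reference, while the ratio of geometric means does not depend on the reference.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def weighted_arithmetic_mean(values, weights):
    """Weights are assumed to sum to 1 (e.g. each program's share of the workload)."""
    return sum(w * v for w, v in zip(weights, values))


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalise(times, reference_times):
    """Execution time of each program relative to the reference machine."""
    return [t / ref for t, ref in zip(times, reference_times)]


# Hypothetical execution times in seconds for programs P1 and P2.
machine_a = [1.0, 1000.0]
machine_b = [10.0, 100.0]
machine_c = [20.0, 20.0]
machines = {"A": machine_a, "B": machine_b, "C": machine_c}

print("weighted AM of B (equal weights):", weighted_arithmetic_mean(machine_b, [0.5, 0.5]))
for ref_name, ref in [("A", machine_a), ("B", machine_b)]:
    for name, times in machines.items():
        norm = normalise(times, ref)
        print(f"ref {ref_name}, machine {name}: "
              f"AM = {arithmetic_mean(norm):.3f}, GM = {geometric_mean(norm):.3f}")
```

With A as the reference, the arithmetic mean of normalised times ranks A ahead of C; with B as the reference it ranks C ahead of A. The geometric mean gives C about 0.63 of A's normalised time under either reference.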
1.6 Quantitative Principles
–Make the common case fast!
  –E.g. addition: focus on “normal” addition, not overflow situations
–Amdahl’s Law
  –Quantifies the improvement gained by focussing on one aspect of a design
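Amdahl’s Law
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
–Fraction_enhanced: fraction of the original execution time that can use the enhancement
–Speedup_enhanced: speedup of that part alone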
Example
–We are considering an enhancement that is 10 times faster than the original, but can only be used for 40% of the original execution time. What is the overall speedup?
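Plugging the example into Amdahl's Law gives 1 / (0.6 + 0.4/10) = 1 / 0.64 ≈ 1.56. A minimal Python sketch of the formula (the function name is a hypothetical helper) also shows the ceiling: even an infinitely fast enhancement cannot beat 1 / 0.6 ≈ 1.67 when only 40% of the time is affected.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only fraction_enhanced of the time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


overall = amdahl_speedup(0.4, 10.0)
print(f"Overall speedup: {overall:.2f}")                        # Overall speedup: 1.56
print(f"Upper bound: {amdahl_speedup(0.4, float('inf')):.2f}")  # Upper bound: 1.67
```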
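CPU Performance
–CPU time related to clock speed:
  –Period (e.g. 1ns)
  –Rate (e.g. 1GHz)
$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$
–Also interested in Cycles Per Instruction (CPI)
$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count (IC)}}$$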
Three Equal Factors
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
–Clock rate (technology)
–CPI (architecture)
–Instruction count (architecture and compiler)
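The three factors combine multiplicatively, and the CPI itself is usually an average over an instruction mix. This Python sketch assumes a hypothetical mix (50% ALU at 1 cycle, 30% loads/stores at 2 cycles, 20% branches at 3 cycles), 10^9 instructions and a 1 GHz clock; none of these numbers come from the slides, and the helper names are hypothetical.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC x CPI x clock cycle time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def average_cpi(mix):
    """Average CPI from (fraction_of_instructions, cycles_per_instruction) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


# Hypothetical instruction mix (fractions sum to 1).
mix = [(0.5, 1.0), (0.3, 2.0), (0.2, 3.0)]
cpi = average_cpi(mix)              # 0.5 + 0.6 + 0.6
time_s = cpu_time(1e9, cpi, 1e9)    # 10^9 instructions at 1 GHz
print(f"CPI = {cpi:.2f}, CPU time = {time_s:.2f}s")
```

Halving any one factor (fewer instructions, a lower CPI, or a faster clock) halves the CPU time, which is why the slide calls them equal.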
Measuring IC & CPI
–Many modern processors include hardware counters for instructions and clock cycles
–Simulations can give even more detail
  –Time-consuming, but can be very accurate
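Another Principle: Locality
–Locality of reference
  –“90/10 Rule”: a program spends about 90% of its execution time in only 10% of its code
  –Also applies to data
–Two aspects:
  –Temporal locality: recently accessed items are likely to be accessed again soon
  –Spatial locality: items near recently accessed ones are likely to be accessed soon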
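Spatial locality is what lets a cache that fetches whole blocks turn one miss into several hits. This Python sketch simulates a hypothetical direct-mapped cache (64 lines of 8 words each, not modelled on any real processor) and traverses a 64×64 row-major matrix in two orders. Row-by-row traversal touches neighbouring words, so 7 of every 8 accesses hit. Column-by-column traversal jumps 64 words each step, and in this configuration every access misses.

```python
def hit_rate(addresses, num_lines=64, words_per_block=8):
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    tags = [None] * num_lines             # tag held by each cache line (None = empty)
    hits = 0
    for addr in addresses:
        block = addr // words_per_block   # memory block containing this word
        index = block % num_lines         # the only line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag             # miss: fetch the block, evicting the old one
    return hits / len(addresses)


ROWS = COLS = 64
# Element (r, c) of a row-major ROWS x COLS matrix lives at word address r*COLS + c.
row_by_row = [r * COLS + c for r in range(ROWS) for c in range(COLS)]
col_by_col = [r * COLS + c for c in range(COLS) for r in range(ROWS)]

print(f"row-by-row hit rate:       {hit_rate(row_by_row):.3f}")  # 0.875
print(f"column-by-column hit rate: {hit_rate(col_by_col):.3f}")  # 0.000
```

Both loops touch exactly the same words; only the order differs, which isolates the effect of spatial locality.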
Taking Advantage of Parallelism
–Key principle for improving performance
–Examples:
  –System level: parallel processing, disk arrays, etc.
  –Processor level: pipelining
  –Digital design: e.g. set-associative caches (ways searched in parallel), carry-lookahead adders
1.7 Putting It All Together: Performance & Price/Performance
–Measure performance and performance/cost for three categories
  –Desktop (SPEC INT and FP)
  –TP servers (TPC-C)
  –Embedded processors (EEMBC)
Desktop
–Integer:
  –Performance/cost tracks performance
–FP:
  –Not as closely related
  –Pentium 4 much better than Pentium III
–AMD Athlon very good value for money
Servers
–Twelve systems
  –Six top performers
  –Six best price-performance
–Multiprocessors
  –From 3 to 280 P3s
–Cost:
  –From $131,000 to $15 million
Embedded Processors
–Difficult to assess
  –Benchmarks very new
  –Designs very application-specific
  –Power a major constraint
  –Cost difficult to quantify (are support chips required?)
Embedded Processors
–Range:
  –500MHz AMD K6 ($78) and IBM PowerPC ($94) used for network switches, etc.
  –167MHz NEC VR 5432 ($25) popular in colour laser printers
  –180MHz NEC VR 4122 ($33) popular in PDAs (low power)
1.8 Another View: Power Consumption and Efficiency
–Embedded processors from previous example: power ranged from 700mW to 9600mW
–Fig. 1.27: Performance/watt
  –NEC VR 4122 is by far the leader
1.9 Fallacies and Pitfalls
–Fallacy: Relative performance of two similar processors can be judged by clock rate or by a single benchmark
  –Factors such as pipeline structure and memory system have a major impact
  –E.g. Pentium III vs. Pentium 4 (Fig. 1.28)
1.7GHz P4 vs. 1.0GHz P3
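Why clock rate alone can mislead follows directly from CPU time = IC × CPI / clock rate. With the same instruction count, a 1.7GHz processor is faster than a 1.0GHz one only if its CPI is less than 1.7 times higher. This Python sketch computes that break-even ratio and then plugs in hypothetical CPIs (1.8 vs. 1.0, invented for illustration and not measured Pentium values); the helper name is hypothetical.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


clock_fast, clock_slow = 1.7e9, 1.0e9
# Same instruction count: the faster clock wins only if its CPI is below
# break_even times the slower processor's CPI.
break_even = clock_fast / clock_slow
print(f"Break-even CPI ratio: {break_even:.1f}")  # Break-even CPI ratio: 1.7

# Hypothetical CPIs (assumptions, not measurements): the faster-clocked design
# stalls more, e.g. because of a deeper pipeline.
ic = 1e9
t_fast = execution_time(ic, cpi=1.8, clock_rate_hz=clock_fast)
t_slow = execution_time(ic, cpi=1.0, clock_rate_hz=clock_slow)
print(f"1.7GHz: {t_fast:.3f}s, 1.0GHz: {t_slow:.3f}s")
```

Under these assumed CPIs, the 1.0GHz processor finishes first despite a 41% lower clock rate.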
Fallacies and Pitfalls
–Fallacy: Benchmarks remain valid indefinitely
  –Optimisations change
  –Perhaps deliberately!
  –Even real programs are affected by changes in technology
  –E.g. gcc: an increasing percentage of its run time is “system time”
  –SPEC has adapted considerably
Fallacies and Pitfalls
–Pitfall: Comparing hand-coded assembly and compiled high-level language performance
  –E.g. embedded processor benchmarks
  –Hand-coded is 5 to 87 times faster!