1
Computer Organization and Design Chapter 4
EGRE 426 Computer Organization and Design Chapter 4
2
Performance
Performance is important!
- Often determines the viability of the hardware/software system.
- Consider running Windows XP on a PC with the performance of the original IBM PC (4.8 MHz clock).
Determining performance can be difficult.
Response time (latency)
- How long does it take for my job to run?
- How long does it take to execute a job?
- How long must I wait for the database query?
Throughput
- How many jobs can the machine run at once?
- What is the average execution rate?
- How much work is getting done?
3
Determining performance can be difficult
Instruction execution times.
- When a salesman quotes a MIPS (millions of instructions per second) value, he is guaranteeing that the machine will not run faster than that value.
Benchmark programs are useful but can produce misleading results.
- Benchmarks may depend on small sections of repetitive code.
- There have been many instances of compilers being optimized to do well on popular benchmarks.
Real programs provide the best indication of performance.
- Should be chosen based on user needs.
- Scientific applications have different requirements than large database applications.
4
SPEC95 Benchmarks
- The System Performance Evaluation Cooperative (SPEC) group was formed in 1988 by representatives of many computer companies.
- Most popular and comprehensive set of CPU benchmarks.
- 8 integer and 10 floating-point programs (see Fig 2.6 page 72).
- Suite of tests aimed at providing insight into performance.
5
SPEC ‘95
6
Amdahl's Law
Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
Principle: Make the common case fast.
Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?" How about making it 5 times faster?
Total time = 100 sec, multiplication time = 80 sec, so the unaffected time = 100 - 80 = 20 sec.
Let Ta be the original execution time, Tb the execution time after improvement, and S = Ta/Tb the overall speedup.
Running 4 times faster:
Ta = 100 sec
S = Ta/Tb = 4
Tb = Ta/4 = 100/4 = 25 sec
Let n be the improvement in multiplication.
Tb = 20 + 80/n = 25
80/n = 5
n = 80/5 = 16
To increase overall performance by a factor of 4 we must increase multiplication performance by a factor of 16.
Running 5 times faster:
S = Ta/Tb = 5
Tb = Ta/5 = 100/5 = 20 sec
Tb = 20 + 80/n = 20
80/n = 0
To increase overall performance by a factor of 5 we would have to do multiplications in 0 time.
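The multiplication example can be checked with a short Python sketch. The function names `time_after_improvement` and `required_improvement` are illustrative choices, not part of the slides; the arithmetic follows the Amdahl's Law formula as stated on this slide.

```python
def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's Law: execution time after speeding up only the affected part."""
    return unaffected + affected / improvement


def required_improvement(total, affected, target_speedup):
    """Improvement factor needed on the affected part to reach target_speedup.

    Returns None when the target cannot be reached, i.e. when the unaffected
    time alone already uses up the whole target execution time.
    """
    unaffected = total - affected
    target_time = total / target_speedup
    budget = target_time - unaffected  # time left over for the affected part
    if budget <= 0:
        return None
    return affected / budget


# Slide example: 100 sec total, 80 sec of it spent in multiplication.
print(required_improvement(100, 80, 4))    # 16.0
print(required_improvement(100, 80, 5))    # None (would need zero-time multiplies)
print(time_after_improvement(20, 80, 16))  # 25.0
```

The `None` result for a 5x speedup reflects the limit in the slide: the 20 seconds of non-multiply work bounds the achievable speedup below 5 no matter how fast multiplication becomes.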
7
Execution Time
Elapsed time
- counts everything (disk and memory accesses, I/O, etc.)
- a useful number, but often not good for comparison purposes
CPU time
- doesn't count I/O or time spent running other programs
- can be broken up into system time and user time
Our focus: user CPU time
- time spent executing the lines of code that are "in" our program
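As a rough illustration of the distinction between elapsed time and CPU time, the following Python sketch measures both for a CPU-bound loop and for a sleep. It uses the standard-library calls `time.perf_counter()` and `os.times()`; the helper names `busy_work` and `measure` are made up for this example, and the resolution of the CPU-time counters depends on the operating system.

```python
import os
import time


def busy_work(n):
    """CPU-bound loop, so the process accumulates user CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(func, *args):
    """Run func(*args) and return (elapsed, user_cpu, system_cpu) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (wall_end - wall_start,
            cpu_end.user - cpu_start.user,
            cpu_end.system - cpu_start.system)


# CPU-bound work: most of the elapsed time should show up as user CPU time.
elapsed, user, system = measure(busy_work, 200_000)
print(f"busy : elapsed {elapsed:.4f} s, user {user:.4f} s, system {system:.4f} s")

# Sleeping: elapsed time grows, but almost no CPU time is charged to the process.
s_elapsed, s_user, s_system = measure(time.sleep, 0.1)
print(f"sleep: elapsed {s_elapsed:.4f} s, user {s_user:.4f} s, system {s_system:.4f} s")
```

The sleep case shows why elapsed time is often a poor basis for comparing processors: it includes waiting that has nothing to do with executing the program's own instructions.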
8
For a given instruction set architecture, increases in CPU performance can come from three sources
- Increase in clock rate or reduction in clock cycles per instruction
- Better compilers
- Improvements in processor architecture
9
Terms
Cycle time or clock cycle time.
- If clock frequency f = 400 MHz, then cycle time T = 1/f = 2.5 ns.
CPI – cycles per instruction or clocks per instruction.
- Different instructions may require different numbers of clock cycles to execute.
MIPS – millions of instructions per second.
- Varies depending on instruction stream.
- Peak MIPS – Best-case instruction stream.
- Native MIPS – Typical instruction stream.
- MIPS = (instruction count) / (execution time x 10^6) = average number of instructions executed in one microsecond.
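These definitions translate directly into a few lines of Python. The sketch below is only an illustration: the function names are invented here, and `mips_from_cpi` relies on the identity MIPS = clock rate / (CPI x 10^6), which follows from the MIPS definition when execution time = instruction count x CPI / clock rate. The sample numbers (400 MHz, 300 million instructions in 3 seconds, CPI of 4) are taken from the worked example later in the deck.

```python
def cycle_time_ns(clock_hz):
    """Clock cycle time T = 1/f, expressed in nanoseconds."""
    return 1e9 / clock_hz


def mips(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_hz, cpi):
    """Equivalent form: MIPS = clock rate / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


print(cycle_time_ns(400e6))     # 2.5
print(mips(300e6, 3.0))         # 100.0
print(mips_from_cpi(400e6, 4))  # 100.0
```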
10
An example
Assume we only need to consider CPU time.
Let clock rate = 400 MHz = 400 million cycles/sec.
Three types of instructions: A, B, and C.
Assume we run a program that executes 1000 million instructions.
11
An example continued
Inst type  CPI  Millions of instructions  Millions of cycles                Time in sec.                        MIPS
A          4    300                       300 inst x 4 cy/inst = 1200 cy    T = 1.2 x 10^9 x 2.5 ns = 3 sec     300/3 = 100 (400/4 = 100)
B          6    500                       500 x 6 = 3000                    3.0 x 2.5 = 7.5 sec                 500/7.5 = 67 (400/6 = 67)
C          8    200                       200 x 8 = 1600                    1.6 x 2.5 = 4 sec                   200/4 = 50 (400/8 = 50)
Total           1000                      5800                              14.5 sec                            1000 Mi/14.5 sec = 69 native MIPS
Avg CPI = 5800 Mcy/1000 Minst = 5.8
Each MIPS entry can be computed two ways. For type A: 300 million instructions in 3 seconds, or 400 million cycles per second where each instruction takes 4 cycles.
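The table can be reproduced with a small Python sketch. The `summarize` function and the layout of the `mix` dictionary are assumptions made for this illustration; the clock rate, CPI values, and instruction counts are the ones given in the example.

```python
CLOCK_HZ = 400e6  # 400 MHz clock from the example

# Instruction type -> (CPI, number of instructions executed)
mix = {"A": (4, 300e6), "B": (6, 500e6), "C": (8, 200e6)}


def summarize(mix, clock_hz):
    """Compute per-type and total cycles, time, and MIPS for an instruction mix."""
    rows = {}
    total_inst = 0.0
    total_cycles = 0.0
    for name, (cpi, count) in mix.items():
        cycles = cpi * count
        seconds = cycles / clock_hz
        rows[name] = {"cycles": cycles,
                      "seconds": seconds,
                      "mips": count / (seconds * 1e6)}
        total_inst += count
        total_cycles += cycles
    total_seconds = total_cycles / clock_hz
    totals = {"instructions": total_inst,
              "cycles": total_cycles,
              "seconds": total_seconds,
              "avg_cpi": total_cycles / total_inst,
              "native_mips": total_inst / (total_seconds * 1e6)}
    return rows, totals


rows, totals = summarize(mix, CLOCK_HZ)
for name, r in rows.items():
    print(f"{name}: {r['cycles'] / 1e6:.0f} Mcycles, "
          f"{r['seconds']:.1f} s, {r['mips']:.0f} MIPS")
print(f"Total: {totals['cycles'] / 1e6:.0f} Mcycles, {totals['seconds']:.1f} s, "
      f"avg CPI {totals['avg_cpi']:.1f}, {totals['native_mips']:.0f} native MIPS")
```

Changing the counts in `mix` shows how strongly the native MIPS figure depends on the instruction stream, even with the clock rate held fixed.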