Performance Evaluation of Architectures Vittorio Zaccaria
Performance Evaluation
From the client perspective: response time (or latency), the time to run the task.
From the server perspective: throughput (or bandwidth), the tasks executed per second.
Speedup
X is n% faster than Y if:
$\mathrm{Speedup}(x, y) = \frac{\mathrm{ExTime}(y)}{\mathrm{ExTime}(x)} = 1 + \frac{n}{100}$
Performance and Speedup
$\mathrm{Performance}(A) = \frac{1}{\mathrm{ExTime}(A)}$
$\mathrm{Speedup}(x, y) = \frac{\mathrm{Performance}(x)}{\mathrm{Performance}(y)}$
Exercise
A executes a task in 10 s. B executes the same task in 15 s. Which statement is true?
1) A is 50% faster than B
2) A is 33% faster than B
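The speedup definition can be checked numerically. The following is a minimal sketch; the helper names `speedup` and `percent_faster` are hypothetical and do not come from the slides.

```python
def speedup(ex_time_x: float, ex_time_y: float) -> float:
    """Speedup of X over Y: ExTime(y) / ExTime(x)."""
    return ex_time_y / ex_time_x


def percent_faster(ex_time_x: float, ex_time_y: float) -> float:
    """The n in 'X is n% faster than Y', from Speedup = 1 + n/100."""
    return (speedup(ex_time_x, ex_time_y) - 1.0) * 100.0


# Exercise data: A takes 10 s, B takes 15 s.
print(speedup(10.0, 15.0))         # 1.5
print(percent_faster(10.0, 15.0))  # 50.0
```

Under this definition A is 50% faster than B. The 33% figure is the relative reduction in execution time, (15 - 10) / 15, which is a different quantity.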
Exercise (15 min)
Linpack and Dhrystone benchmarks on several VAX models:
Model    Year    Linpack ExTime    Dhrystone ExTime
VAX-11/
VAX
VAX
Exercise: calculate
In the Linpack case:
Total speedup and average per-year speedup from VAX8600 to VAX780
The same for VAX8550 and VAX8600
In the Dhrystone case:
Total speedup and average per-year speedup from VAX8600 to VAX780
The same for VAX8550 and VAX8600
Exercise: speedup and average per-year speedup
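One way to approach this exercise is sketched below. Two things here are assumptions, not taken from the slides: the average per-year speedup is interpreted as a compound (geometric) rate, and the execution times are made-up placeholders, not the VAX measurements.

```python
def total_speedup(ex_time_old: float, ex_time_new: float) -> float:
    """Total speedup of the newer machine over the older one."""
    return ex_time_old / ex_time_new


def per_year_speedup(total: float, years: float) -> float:
    """Assumption: compound growth, so per_year ** years == total."""
    return total ** (1.0 / years)


# Hypothetical placeholder values, NOT the VAX benchmark data.
old_time_s = 100.0   # execution time on the older model
new_time_s = 25.0    # execution time on the newer model
years_apart = 2      # years between the two models

total = total_speedup(old_time_s, new_time_s)
print(total)                                 # 4.0
print(per_year_speedup(total, years_apart))  # 2.0
```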
Amdahl's Law
$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]$
$\mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}$
If $\mathrm{Speedup}_{enhanced}$ goes to infinity, $\mathrm{Speedup}_{overall}$ approaches $1 / (1 - \mathrm{Fraction}_{enhanced})$.
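Amdahl's law translates directly into a small function. The sketch below is illustrative: the name `amdahl_speedup` and the 40% enhanced fraction are hypothetical choices, used only to show how the overall speedup saturates as the enhancement grows.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the execution time is accelerated."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the execution time can be enhanced.
fraction = 0.4
for s in (2.0, 10.0, 100.0, 1e9):
    print(f"Speedup_enhanced = {s:g}: overall = {amdahl_speedup(fraction, s):.4f}")

# Upper bound reached as Speedup_enhanced goes to infinity.
print(f"limit = {1.0 / (1.0 - fraction):.4f}")
```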
Exercise on Amdahl's Law
Floating-point (FP) instructions are improved to run 2x faster, but only 10% of the executed instructions are FP.
$\mathrm{Speedup}_{overall} = ?$
Exercise on Amdahl's Law
Solution:
$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \mathrm{ExTime}_{old}$
$\mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053$
2nd Exercise on Amdahl's Law
Suppose the CPU speed is improved 5x (at 5x the CPU cost).
Suppose the CPU is used 50% of the time and the base CPU cost is 1/3 of the cost of the entire system.
Is it worth upgrading the CPU? Compare speedup and cost!
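2nd Exercise on Amdahl's Law
$\mathrm{Speedup} = \frac{1}{0.5 + 0.5/5} = 1.67$
$\mathrm{Cost\ increase} = \frac{2}{3} + \frac{1}{3} \times 5 = 2.33$
It is not worth upgrading the CPU: the system cost grows by 2.33x while performance improves by only 1.67x.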
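The comparison between speedup and cost can be reproduced in a few lines. This is a sketch under the exercise's stated numbers; the variable names and the performance-per-cost ratio used as the decision criterion are assumptions introduced here for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup from Amdahl's law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


cpu_time_fraction = 0.5      # CPU is used 50% of the time
cpu_speedup = 5.0            # CPU becomes 5x faster
cpu_cost_share = 1.0 / 3.0   # base CPU is 1/3 of the system cost
cpu_cost_factor = 5.0        # the faster CPU costs 5x as much

speedup = amdahl_speedup(cpu_time_fraction, cpu_speedup)
cost_increase = (1.0 - cpu_cost_share) + cpu_cost_share * cpu_cost_factor

# Assumed criterion: the upgrade pays off only if performance per unit cost improves.
perf_per_cost = speedup / cost_increase

print(f"speedup       = {speedup:.2f}")
print(f"cost increase = {cost_increase:.2f}")
print(f"perf per cost = {perf_per_cost:.2f} (below 1 means not worth it)")
```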
Performance Indexes
Response time: the latency to complete a task, including disk accesses, memory accesses, I/O activity, and time spent on other tasks running in parallel.
CPU time: does not include I/O wait time; it corresponds to the CPU user time plus the CPU system (OS) time.
CPU time
$\mathrm{CPUtime}(P) = \frac{\text{clock cycles needed to execute } P}{\text{clock frequency}}$
Average CPI
The average number of clock cycles per instruction (CPI) can be defined as:
$\mathrm{CPI}(P) = \frac{\text{clock cycles needed to execute } P}{\text{number of instructions}}$
$\mathrm{CPUtime} = T_{clock} \times \mathrm{CPI} \times N_{inst} = \frac{\mathrm{CPI} \times N_{inst}}{f}$
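The two formulas can be combined in a short sketch. The function names and the workload figures (cycle count, instruction count, clock frequency) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def average_cpi(clock_cycles: float, n_inst: float) -> float:
    """Average clock cycles per instruction for a program."""
    return clock_cycles / n_inst


def cpu_time(cpi: float, n_inst: float, clock_hz: float) -> float:
    """CPU time in seconds: (CPI * N_inst) / f."""
    return cpi * n_inst / clock_hz


# Hypothetical program: 2e9 clock cycles, 1e9 instructions, 500 MHz clock.
cycles = 2_000_000_000
n_inst = 1_000_000_000
clock_hz = 500e6

cpi = average_cpi(cycles, n_inst)
print(cpi)                              # 2.0
print(cpu_time(cpi, n_inst, clock_hz))  # 4.0
```

The result matches the direct definition, cycles / frequency = 2e9 / 500e6 = 4 s.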
Aspects of CPU performance
$\mathrm{CPU\ time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$
Aspects of CPU performance
The CPI can vary among instructions:
$\mathrm{CPI}_i$ is the number of clock cycles needed by instruction type $i$.
$\mathrm{IC}_i$ is the number of times that instruction type $i$ is executed.
$\mathrm{CPU\ time} = \mathrm{CycleTime} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times \mathrm{IC}_i$
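Overall CPI
The overall CPI can be expressed as (CPU clock cycles) / (instructions):
$\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times \frac{\mathrm{IC}_i}{\text{Instructions}}$
where $\mathrm{IC}_i / \text{Instructions}$ is the frequency of instruction type $i$.
Invest resources where time is spent!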
Exercise
A RISC processor shows the following statistics:
Base Machine (Reg / Reg)
Op        Freq    Cycles
ALU       50%     1
Load      20%     5
Store     10%     3
Branch    20%     2
Calculate the average CPI and the speedup w.r.t.:
The same machine with an improved D$ (Load cycles = 2)
The same machine with a branch CPI = 1
The same machine with 2 ALUs working in parallel.
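Solution
Average CPI: 0.5 x 1 + 0.2 x 5 + 0.1 x 3 + 0.2 x 2 = 2.2
Use Amdahl's law to compute the overall speedup. The enhanced fraction is the share of clock cycles spent in the improved instruction type, not its instruction frequency, so the speedup equals the ratio of the old CPI to the new CPI:
Cache improved: CPI = 1.6, speedup = 2.2 / 1.6 = 1.375
Branch improved: CPI = 2.0, speedup = 2.2 / 2.0 = 1.10
ALU improved (ALU CPI halved to 0.5): CPI = 1.95, speedup = 2.2 / 1.95 = 1.13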
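The exercise can be worked through programmatically. The sketch below computes the weighted CPI from the table and, since instruction count and clock are unchanged, takes the speedup as the ratio of old CPI to new CPI. This is equivalent to Amdahl's law with the enhanced fraction measured in clock cycles; measuring it by instruction frequency instead would give different figures. The helper names are hypothetical, and the 2-ALU case assumes an ideal halving of the ALU CPI to 0.5.

```python
# Instruction mix from the exercise: op -> (frequency, cycles).
base = {
    "ALU": (0.5, 1),
    "Load": (0.2, 5),
    "Store": (0.1, 3),
    "Branch": (0.2, 2),
}


def average_cpi(mix):
    """Weighted CPI: sum of frequency_i * CPI_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Copy of the mix with a new cycle count for one instruction type."""
    new_mix = dict(mix)
    new_mix[op] = (mix[op][0], cycles)
    return new_mix


base_cpi = average_cpi(base)
variants = {
    "improved D$ (Load = 2 cycles)": with_cycles(base, "Load", 2),
    "branch CPI = 1": with_cycles(base, "Branch", 1),
    "2 ALUs (assumed ALU CPI = 0.5)": with_cycles(base, "ALU", 0.5),
}

print(f"base CPI = {base_cpi:.2f}")
for name, mix in variants.items():
    cpi = average_cpi(mix)
    print(f"{name}: CPI = {cpi:.2f}, speedup = {base_cpi / cpi:.3f}")
```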
Exercise
Procedure calls in architecture A are very expensive. Suppose a new architecture B, similar to A, is introduced such that:
A has a clock 5% faster than B.
The fraction of loads/stores in A is 30%.
B executes 30% fewer loads/stores than A.
Loads/stores require 1 clock cycle.
Compare the CPU times of A and B.
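Solution
Number of instructions of B: $N_B = [1 - (0.3 \times 0.3)] \times N_A = 0.91 \times N_A$
Clock period of B: $T_B = 1.05 \times T_A$
$\mathrm{CPUtime}_A = 1 \times N_A \times T_A$
$\mathrm{CPUtime}_B = 1 \times 0.91 \times N_A \times 1.05 \times T_A = 0.9555 \times \mathrm{CPUtime}_A$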
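The comparison can be checked with normalized quantities. The sketch below assumes, as the solution does, a CPI of 1 for every instruction and models the 5% faster clock of A as a clock period of B that is 1.05 times that of A. The instruction count of B is kept unrounded (1 - 0.09 = 0.91); variable names are hypothetical.

```python
n_a = 1.0   # instruction count of A (normalized)
t_a = 1.0   # clock period of A (normalized)
cpi = 1.0   # assumption: every instruction takes 1 clock cycle

load_store_fraction_a = 0.3   # 30% of A's instructions are loads/stores
load_store_reduction_b = 0.3  # B executes 30% fewer loads/stores

# B removes 30% of the 30% that are loads/stores.
n_b = n_a * (1.0 - load_store_fraction_a * load_store_reduction_b)
t_b = t_a * 1.05

cpu_time_a = cpi * n_a * t_a
cpu_time_b = cpi * n_b * t_b

print(f"N_B / N_A             = {n_b / n_a:.2f}")
print(f"CPUtime_B / CPUtime_A = {cpu_time_b / cpu_time_a:.4f}")
```

B comes out faster despite its slower clock, because the reduction in instruction count outweighs the longer clock period.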
MIPS
MIPS = millions of instructions per second.
$\mathrm{MIPS} = \frac{\text{number of instructions}}{\text{execution time (in s)} \times 10^6} = \frac{\text{clock frequency}}{\mathrm{CPI} \times 10^6}$
MIPS (cont.)
Problems:
It depends heavily on the ISA, so it is difficult to compare different ISAs.
It depends on the program.
It can vary inversely with performance! A complex instruction set can have a lower MIPS rating than a simple instruction set and still execute programs in less time.
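The last point can be made concrete with a small numeric sketch. Both machines and all their figures (instruction counts, CPIs, clock frequency) are hypothetical, chosen so that the complex-ISA machine needs fewer but slower instructions for the same program.

```python
def cpu_time(n_inst: float, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds: (CPI * N_inst) / f."""
    return n_inst * cpi / clock_hz


def mips(n_inst: float, ex_time_s: float) -> float:
    """MIPS = number of instructions / (execution time * 10^6)."""
    return n_inst / (ex_time_s * 1e6)


CLOCK_HZ = 100e6  # hypothetical: both machines run at 100 MHz

# Hypothetical figures for the same program on two ISAs.
complex_n, complex_cpi = 10e6, 4.0   # few, slow instructions
simple_n, simple_cpi = 50e6, 1.25    # many, fast instructions

complex_time = cpu_time(complex_n, complex_cpi, CLOCK_HZ)
simple_time = cpu_time(simple_n, simple_cpi, CLOCK_HZ)
complex_mips = mips(complex_n, complex_time)
simple_mips = mips(simple_n, simple_time)

print(f"complex ISA: {complex_time:.3f} s, {complex_mips:.1f} MIPS")
print(f"simple ISA:  {simple_time:.3f} s, {simple_mips:.1f} MIPS")
```

In this made-up case the complex-ISA machine has the lower MIPS rating yet finishes the program sooner.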
Relative MIPS
Relative MIPS of an architecture A:
$\mathrm{MIPS}_{relative}(A) = \frac{T_{CPU,\,reference\_arch}}{T_{CPU,\,A}} \times \mathrm{MIPS}_{reference\_arch}$
In the 1980s the reference architecture was the VAX-11/780.
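A short sketch of the relative MIPS calculation follows. It reads the definition as the reference machine's CPU time divided by A's CPU time, scaled by the reference MIPS rating, so that a faster machine receives a higher rating; that reading is an assumption. The function name and the timings and rating used below are hypothetical placeholders, not measured values.

```python
def relative_mips(t_cpu_ref: float, t_cpu_a: float, mips_ref: float) -> float:
    """Relative MIPS of A: (T_CPU_ref / T_CPU_A) * MIPS_ref."""
    return (t_cpu_ref / t_cpu_a) * mips_ref


# Hypothetical placeholders: the reference machine is rated 1 MIPS and
# runs the program in 12 s; architecture A runs it in 3 s.
print(relative_mips(t_cpu_ref=12.0, t_cpu_a=3.0, mips_ref=1.0))  # 4.0
```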