Presentation transcript: 204521 Digital System Architecture, Lecture 2a: Computer Performance and Cost (Pradondet Nilagupta)

Slide 1: Lecture 2a: Computer Performance and Cost
Pradondet Nilagupta (based on lecture notes by Prof. Randy Katz and Prof. Jan M. Rabaey, UC Berkeley)

Slide 2: Review: What is Computer Architecture?
[Diagram: the computer architect sits between technology and applications, working with interfaces (ISA, API, I/O links and channels, registers, IR), machine organization, and measurement & evaluation.]

Slide 3: The Architecture Process
[Diagram: new concepts are created, their cost and performance are estimated, and the results are sorted into good, mediocre, and bad ideas.]

Slide 4: Performance Measurement and Evaluation
- There are many dimensions to computer performance:
  - CPU execution time (by instruction or sequence): floating point, integer, branch performance
  - Cache bandwidth
  - Main memory bandwidth
  - I/O performance: bandwidth, seeks, pixels or polygons per second
- The relative importance depends on the application
[Diagram: processor (P), cache ($), memory (M)]

Slide 5: Evaluation Tools
- Benchmarks, traces, and mixes
  - Macrobenchmarks and suites: application execution time
  - Microbenchmarks: measure one aspect of performance
  - Traces: replay recorded accesses (cache, branch, register)
- Simulation at many levels (ISA, cycle-accurate, RTL, gate, circuit): trade fidelity for simulation rate
- Area and delay estimation
- Analysis, e.g., queuing theory
[Example instruction mix: MOVE 39%, BR 20%, LOAD 20%, STORE 10%, ALU 11%. Example trace: LD 5EA3, ST 31FF, ..., LD 1EA2, ...]

Slide 6: Benchmarks
- Microbenchmarks measure one performance dimension: cache bandwidth, main memory bandwidth, procedure call overhead, FP performance
  - A weighted combination of microbenchmark results is a good predictor of application performance
  - They give insight into the causes of performance bottlenecks
- Macrobenchmarks measure application execution time
  - They measure overall performance, but on just one application
[Diagram: microbenchmarks map to individual performance dimensions; macrobenchmarks map to applications.]

Slide 7: Some Warnings about Benchmarks
- Benchmarks measure the whole system: application, compiler, operating system, architecture, implementation
- Popular benchmarks typically reflect yesterday's programs, while computers need to be designed for tomorrow's programs
- Benchmark timings are often very sensitive to alignment in the cache, the location of data on disk, and the values of the data
- Benchmarks can lead to inbreeding or positive feedback: if you make an operation fast (slow), it will be used more (less) often, so you make it faster (slower), and it gets used even more (less), and so on

Slide 8: Architectural Performance Laws and Rules of Thumb

Slide 9: Measurement and Evaluation
Architecture is an iterative process: search the space of possible designs, make selections, and evaluate the selections made. Good measurement tools are required to evaluate the selections accurately.

Slide 10: Measurement Tools
- Benchmarks, traces, mixes
- Cost, delay, area, and power estimation
- Simulation at many levels (ISA, RTL, gate, circuit)
- Queuing theory
- Rules of thumb
- Fundamental laws

Slide 11: Measuring and Reporting Performance
- What do we mean when we say one computer is faster than another? The program runs in less time.
- Response time or execution time: the time until users see the output
- Throughput: the total amount of work done in a given time

Slide 12: Performance: "Increasing" and "Decreasing"
We use the terms "improve performance" and "improve execution time" when we mean increase performance and decrease execution time:
- improve performance = increase performance
- improve execution time = decrease execution time

Slide 13: Measuring Performance: Definitions of Time
- Wall-clock time, response time, elapsed time: the latency to complete a task, including disk accesses, memory accesses, I/O activities, and operating system overhead

Slide 14: Does Anybody Really Know What Time It Is?
UNIX time command output: 90.7u 12.9s 2:39 65%
- User CPU time (time spent in the program): 90.7 s
- System CPU time (time spent in the OS): 12.9 s
- Elapsed time (response time): 2:39 = 159 s
- (90.7 + 12.9) / 159 x 100 = 65%: the percentage of elapsed time that is CPU time; the remaining 35% was spent waiting for I/O or running other programs

Slide 15: Example: the UNIX time command
90.7u 12.9s 2:39 65%
- User CPU time is 90.7 sec
- System CPU time is 12.9 sec
- Elapsed time is 2 min 39 sec (159 sec)
- Percentage of elapsed time that is CPU time: (90.7 + 12.9) / 159 = 65%
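
As a quick sanity check of the arithmetic above, a small Python sketch (the variable names are ours; the values are those reported by time on the slide):

```python
# Reproduce the UNIX `time` arithmetic from the slide.
user_cpu = 90.7        # user CPU time in seconds (90.7u)
system_cpu = 12.9      # system CPU time in seconds (12.9s)
elapsed = 2 * 60 + 39  # elapsed time 2:39 = 159 seconds

cpu_fraction = (user_cpu + system_cpu) / elapsed
print(f"CPU time is {cpu_fraction:.0%} of elapsed time")             # 65%
print(f"Waiting for I/O or other programs: {1 - cpu_fraction:.0%}")  # 35%
```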

Slide 16: Time
- CPU time: the time the CPU spends computing, not including time spent waiting for I/O or running other programs
- User CPU time: CPU time spent in the program
- System CPU time: CPU time spent in the operating system performing tasks requested by the program
- CPU time = User CPU time + System CPU time

Slide 17: Performance
- System performance: elapsed time on an unloaded system
- CPU performance: user CPU time on an unloaded system

Slide 18: Programs to Evaluate Processor Performance
- Real programs
- Kernels: time-critical excerpts of real programs
- Toy benchmarks: 10-100 lines of code (Sieve of Eratosthenes, puzzles, Quicksort)
- Synthetic benchmarks: attempt to match the average operation frequencies of real workloads (Whetstone, Dhrystone); similar to kernels, but not even pieces of real programs, whereas a kernel might be

Slide 19: Benchmark Suites
- A collection of benchmarks that tries to measure processor performance across a variety of applications
  - SPECint92, SPECfp92
  - See Fig. 1.9

Slide 20: Benchmarking Games
- Differing configurations used to run the same workload on two systems
- Compiler wired to optimize the workload
- Test specification written to be biased towards one machine
- Synchronized CPU/I/O-intensive job sequences used
- Workload arbitrarily picked
- Very small benchmarks used
- Benchmarks manually translated to optimize performance

Slide 21: Common Benchmarking Mistakes (I)
- Only average behavior represented in the test workload
- Skewness of device demands ignored
- Loading level controlled inappropriately
- Caching effects ignored
- Buffer sizes not appropriate
- Inaccuracies due to sampling ignored

Slide 22: Common Benchmarking Mistakes (II)
- Ignoring monitoring overhead
- Not validating measurements
- Not ensuring the same initial conditions
- Not measuring transient (cold start) performance
- Using device utilizations for performance comparisons
- Collecting too much data but doing too little analysis

Slide 23: SPEC: System Performance Evaluation Cooperative
- First round, 1989: 10 programs yielding a single number
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs)
  - Compiler flags unlimited. March 1993 flags for the DEC 4000 Model 610:
    spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy (b,a,c)”
    wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
    nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
- Third round, 1995: single flag setting for all programs; new set of programs; "benchmarks useful for 3 years"

Slide 24: SPEC: System Performance Evaluation Cooperative
- First round, 1989: 10 programs yielding a single number ("SPECmarks")
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs)
  - Compiler flags unlimited. March 1993 flags for the DEC 4000 Model 610:
    spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)= memcpy(b,a,c)”
    wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
    nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas

Slide 25: SPEC: System Performance Evaluation Cooperative
- Third round, 1995: new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs)
  - "Benchmarks useful for 3 years"
  - Single flag setting for all programs: SPECint_base95, SPECfp_base95

Slide 26: How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: (Σ Ti)/n or Σ(Wi x Ti)
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/Σ(1/Ri), or 1/Σ(Wi/Ri) with weights summing to 1
- Normalized execution time is handy for scaling performance (e.g., "X times faster than a SPARCstation 10"), but the arithmetic mean of normalized times depends on the choice of reference machine
- Use the geometric mean for such comparisons: (Π Ti)^(1/n)
  - Independent of the chosen reference machine
  - But not a good metric for total execution time
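
As an illustration only, a minimal Python sketch of these three summaries (the function names are ours; the weighted forms assume weights that sum to 1):

```python
import math

def arithmetic_mean(times, weights=None):
    """Tracks total execution time: sum(Ti)/n, or sum(Wi*Ti) with weights."""
    if weights is None:
        return sum(times) / len(times)
    return sum(w * t for w, t in zip(weights, times))

def harmonic_mean(rates, weights=None):
    """For rates (e.g., MFLOPS): n/sum(1/Ri), or 1/sum(Wi/Ri) with weights."""
    if weights is None:
        return len(rates) / sum(1.0 / r for r in rates)
    return 1.0 / sum(w / r for w, r in zip(weights, rates))

def geometric_mean(normalized_times):
    """For normalized execution times: (prod Ti)^(1/n); reference-independent."""
    return math.prod(normalized_times) ** (1.0 / len(normalized_times))
```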

Slide 27: SPEC First Round
- One program spent 99% of its time in a single line of code
- A new front-end compiler could improve its result dramatically

Slide 28: Impact of Means on SPECmark89 for the IBM 550 (without and with a special compiler option)

             Ratio to VAX         Time (sec)          Weighted Time
Program      Before   After       Before   After      Before   After
gcc            30       29          49       51         8.91     9.22
espresso       35       34          65       67         7.64     7.86
spice          47       47         510      510         5.69     5.69
doduc          46       49          41       38         5.81     5.45
nasa7          78      144         258      140         3.43     1.86
li             34       34         183      183         7.86     7.86
eqntott        40       40          28       28         6.68     6.68
matrix300      78      730          58        6         3.43     0.37
fpppp          90       87          34       35         2.97     3.07
tomcatv        33      138          20       19         2.01     1.94
Mean           54       72         124      108        54.42    49.99

Geometric mean ratio: 1.33; arithmetic mean ratio: 1.16; weighted arithmetic mean ratio: 1.09

Slide 29: The Bottom Line: Performance (and Cost)
- Time to run the task (ExTime): execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth

Plane               Speed       DC to Paris   Passengers   Throughput (pass/hr)
Boeing 747          610 mph     6.5 hours     470          72
BAC/Sud Concorde    1350 mph    3 hours       132          44
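
The throughput column is just passengers divided by trip time; a quick Python sketch with the figures from the table (the dictionary layout is ours):

```python
# Throughput (passengers per hour) = passengers / trip time in hours
planes = {"Boeing 747": (470, 6.5), "BAC/Sud Concorde": (132, 3.0)}
for name, (passengers, hours) in planes.items():
    print(f"{name}: {passengers / hours:.0f} passengers/hour")
# Boeing 747: 72 passengers/hour, Concorde: 44 passengers/hour
```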

Slide 30: The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means:
ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = n
- Performance is the reciprocal of execution time
- Compare the speed of the Concorde vs. the Boeing 747, and the throughput of the Boeing 747 vs. the Concorde

Slide 31: Performance Terminology
"X is n% faster than Y" means:
ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = 1 + n/100
n = 100 x (Performance(X) - Performance(Y)) / Performance(Y)
n = 100 x (ExTime(Y) - ExTime(X)) / ExTime(X)

Slide 32: Example
Y takes 15 seconds to complete a task; X takes 10 seconds. What percent faster is X?
n = 100 x (ExTime(Y) - ExTime(X)) / ExTime(X)
n = 100 x (15 - 10) / 10
n = 50%
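
A one-line Python sketch of the same calculation (the function name is ours, not from the slides):

```python
def percent_faster(time_y, time_x):
    """How many percent faster X is than Y, given their execution times."""
    return 100.0 * (time_y - time_x) / time_x

print(percent_faster(15, 10))  # 50.0, i.e. X is 50% faster than Y
```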

Slide 33: Speedup
Speedup due to enhancement E:
Speedup(E) = ExTime without E / ExTime with E = Performance with E / Performance without E
Suppose enhancement E accelerates a fraction Fraction_enhanced of the task by a factor Speedup_enhanced, and the remainder of the task is unaffected. What are ExTime(E) and Speedup(E)?

Slide 34: Amdahl's Law
- The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used

Slide 35: Amdahl's Law
ExTime_new = ExTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Slide 36: Example of Amdahl's Law
- Floating-point instructions are improved to run 2x faster, but only 10% of actual instructions are FP
ExTime_new = ?
Speedup_overall = ?

Slide 37: Example of Amdahl's Law
- Floating-point instructions are improved to run 2x faster, but only 10% of actual instructions are FP
ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old
Speedup_overall = 1 / 0.95 = 1.053
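
A small Python sketch of Amdahl's Law (the function name is ours) that reproduces the FP example above and also shows the limit the law imposes:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the task benefits from the enhancement."""
    new_time_ratio = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_ratio

# 10% of the instructions are FP and they run 2x faster:
print(amdahl_speedup(0.10, 2))    # ~1.053, as on the slide
# Even an infinitely fast FP unit is capped by the unaffected 90%:
print(amdahl_speedup(0.10, 1e9))  # ~1.111
```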

Slide 38: Corollary: Make the Common Case Fast
- All instructions require an instruction fetch; only a fraction require a data fetch/store
  - Optimize instruction access over data access
- Programs exhibit locality: 90% of the time is spent in 10% of the code (spatial locality and temporal locality)
- Access to small memories is faster
  - Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories: registers, cache, main memory, disk/tape

Slide 39: RISC Philosophy
- The simple case is usually the most frequent and the easiest to optimize!
- Do simple, fast things in hardware and be sure the rest can be handled correctly in software

Slide 40: Metrics of Performance
[Diagram: the system stack (application, programming language, compiler, ISA, datapath, control, function units, transistors/wires/pins) annotated with performance metrics at different levels: answers per month, operations per second, (millions of) instructions per second (MIPS), (millions of) floating-point operations per second (MFLOP/s), megabytes per second, and cycles per second (clock rate).]

Slide 41: % of Instructions Responsible for 80-90% of Instructions Executed
[Chart]

Slide 42: Aspects of CPU Performance
CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)
Which of the three factors (instruction count, CPI, clock rate) is affected by the program, the compiler, the instruction set, the organization, and the technology? (See the next slide.)

Slide 43: Aspects of CPU Performance
CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

               Inst Count   CPI   Clock Rate
Program            X
Compiler           X        (X)
Instr. Set         X         X
Organization                 X        X
Technology                            X
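
A sketch of the CPU performance equation in Python (the numbers in the example call are ours, purely illustrative):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical example: 10^9 instructions at CPI 1.5 on a 500 MHz clock
print(cpu_time(1e9, 1.5, 500e6))  # 3.0 seconds
```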

Slide 44: Marketing Metrics
- MIPS = Instruction Count / (Execution Time x 10^6) = Clock Rate / (CPI x 10^6)
  - What about machines with different instruction sets? Programs with different instruction mixes (the dynamic frequency of instructions)?
  - Uncorrelated with performance
- MFLOP/s = FP Operations / (Execution Time x 10^6)
  - Machine dependent
  - Often not where time is spent
  - Normalized operation counts: add, sub, compare, mult = 1; divide, sqrt = 4; exp, sin, ... = 8
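
Under the definitions above, these metrics can be sketched as follows (the function names are ours; the normalization weights are the ones listed on the slide):

```python
def mips(instruction_count, seconds):
    return instruction_count / (seconds * 1e6)

def native_mflops(fp_operations, seconds):
    return fp_operations / (seconds * 1e6)

# Normalized FLOP weights from the slide: add/sub/compare/mult = 1,
# divide/sqrt = 4, exp/sin/... = 8.
FLOP_WEIGHTS = {"add": 1, "sub": 1, "compare": 1, "mult": 1,
                "divide": 4, "sqrt": 4, "exp": 8, "sin": 8}

def normalized_mflops(op_counts, seconds):
    """op_counts maps an operation name to how many times it executed."""
    weighted_ops = sum(FLOP_WEIGHTS[op] * n for op, n in op_counts.items())
    return weighted_ops / (seconds * 1e6)
```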

Slide 45: Cycles Per Instruction
CPI = Clock cycles for a program / Instruction count   ("average cycles per instruction")
CPU time = CycleTime x Σ(i=1..n) CPI_i x I_i
Avg. CPI = Σ(i=1..n) CPI_i x F_i, where F_i = I_i / Instruction count   ("instruction frequency")
Invest resources where time is spent!

Slide 46: Example: Calculating CPI
Base machine (reg/reg), typical mix:

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total                    1.5

1.5 is the average CPI for this instruction mix.
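
The same table can be recomputed in a few lines of Python (the mix is taken from the slide; the variable names are ours):

```python
# (frequency, cycles) for the base register/register machine
mix = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

avg_cpi = sum(f * c for f, c in mix.values())
print(f"Average CPI = {avg_cpi}")   # 1.5
for op, (f, c) in mix.items():
    print(f"{op}: CPI contribution {f * c:.1f}, {f * c / avg_cpi:.0%} of time")
# ALU 33%, Load 27%, Store 13%, Branch 27%, matching the slide
```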

Slide 47: Organizational Trade-offs
[Diagram: the system stack (application, programming language, compiler, ISA, datapath, control, function units, transistors/wires/pins) determines the instruction mix, the CPI, and the cycle time.]

Slide 48: Trade-off Example
Base machine (reg/reg), typical mix:

Op       Freq   Cycles
ALU      50%    1
Load     20%    2
Store    10%    2
Branch   20%    2

Proposal: add register/memory ALU operations:
- One source operand in memory, one source operand in a register
- Cycle count of 2
- Branch cycle count increases to 3
What fraction of the loads must be eliminated for this to pay off?

Slide 49: Example Solution
Exec Time = Instr Cnt x CPI x Clock

Base machine:
Op        Freq   Cycles   CPI(i)
ALU       .50    1        .5
Load      .20    2        .4
Store     .10    2        .2
Branch    .20    2        .4
Reg/Mem
Total     1.00            1.5

Slide 50: Example Solution
Exec Time = Instr Cnt x CPI x Clock

           Old machine                New machine
Op         Freq   Cycles   CPI(i)     Freq      Cycles   CPI(i)
ALU        .50    1        .5         .5 - X    1        .5 - X
Load       .20    2        .4         .2 - X    2        .4 - 2X
Store      .10    2        .2         .1        2        .2
Branch     .20    2        .4         .2        3        .6
Reg/Mem                               X         2        2X
Total      1.00            1.5        1 - X              (1.7 - X)/(1 - X)

CPI_new must be normalized to the new instruction frequency: CPI_new = Cycles_new / Instructions_new.

Slide 51: Example Solution
Exec Time = Instr Cnt x CPI x Clock
(table as on slide 50)
Instr Cnt_old x CPI_old x Clock_old = Instr Cnt_new x CPI_new x Clock_new
1.00 x 1.5 = (1 - X) x (1.7 - X)/(1 - X)

Slide 52: Example Solution
Exec Time = Instr Cnt x CPI x Clock
(table as on slide 50)
Instr Cnt_old x CPI_old x Clock_old = Instr Cnt_new x CPI_new x Clock_new
1.00 x 1.5 = (1 - X) x (1.7 - X)/(1 - X)
1.5 = 1.7 - X
X = 0.2
ALL of the loads must be eliminated for this to be a win!
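
The break-even result can also be checked numerically; a small Python sketch (the function and variable names are ours) sweeps the fraction X of loads folded into reg/mem operations:

```python
def new_cpi(x):
    """CPI after replacing a fraction x of the original instruction count
    (an ALU op plus its paired load) with reg/mem ops; branches now cost 3 cycles."""
    cycles = (0.5 - x) * 1 + (0.2 - x) * 2 + 0.1 * 2 + 0.2 * 3 + x * 2
    instructions = 1.0 - x
    return cycles / instructions

old_time = 1.00 * 1.5   # instruction count x CPI (clock rate unchanged)
for x in (0.05, 0.10, 0.15, 0.20):
    new_time = (1.0 - x) * new_cpi(x)
    print(f"X = {x:.2f}: new exec time = {new_time:.2f} vs. old = {old_time:.2f}")
# Only at X = 0.20 (all loads eliminated) does the new time match the old 1.50.
```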

Slide 53: How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: (Σ Ti)/n or Σ(Wi x Ti)
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/Σ(1/Ri), or 1/Σ(Wi/Ri) with weights summing to 1
- Normalized execution time is handy for scaling performance
- But do not take the arithmetic mean of normalized execution times; use the geometric mean: (Π Ri)^(1/n)

Slide 54: Means
[Table comparing the arithmetic, geometric, and harmonic means on three properties: whether the mean is consistent independent of the reference machine, whether it can be weighted, and whether it represents total execution time.]

Slide 55: Which Machine is "Better"?

                   Computer A   Computer B   Computer C
Program P1 (sec)        1           10           20
Program P2 (sec)     1000          100           20
Total time (sec)     1001          110           40

Slide 56: Weighted Arithmetic Mean
Assume three weighting schemes for P1/P2:

W(P1)/W(P2)    Comp A    Comp B    Comp C
.50/.50         500.5     55.0      20
.909/.091        91.82    18.18     20
.999/.001         2.00    10.09     20
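
A short Python sketch reproducing this table (the execution times come from slide 55; the .909/.091 scheme is taken here as the exact fractions 10/11 and 1/11):

```python
times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}   # (P1, P2) in seconds
weightings = [(0.5, 0.5), (10/11, 1/11), (0.999, 0.001)]  # (W(P1), W(P2))

for w1, w2 in weightings:
    means = {m: w1 * p1 + w2 * p2 for m, (p1, p2) in times.items()}
    print({m: round(v, 2) for m, v in means.items()})
# {'A': 500.5, 'B': 55.0, 'C': 20.0}
# {'A': 91.82, 'B': 18.18, 'C': 20.0}
# {'A': 2.0, 'B': 10.09, 'C': 20.0}
```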

Slide 57: Performance Evaluation
- Since sales are a function of performance relative to the competition, companies invest heavily in improving the product as reported by the performance summary
- Good products created then have: good benchmarks, and good ways to summarize performance
- If the benchmarks or the summary are inadequate, a company must choose between improving its product for real programs and improving it to get more sales; sales almost always win!
- Execution time is the measure of performance

