Execution time
Execution Time (processor-related) = IC x CPI x T

In this formula:
IC = instruction count
CPI = average number of system clock periods to execute an instruction
T = clock period

Example
Consider two SRC programs having three types of instructions, given as follows.

Number of                     Program 1   Program 2
Data transfer instructions        2           1
Control instructions              2           5
ALSU instructions                 2           1

Compare both programs for the following parameters:
Instruction count
Speed of execution

Example contd..
Instruction count (IC)
IC for program 1 = 2+2+2 = 6
IC for program 2 = 1+5+1 = 7

For execution time we can use the following SRC specifications.

Instruction Type   CPI
Control             2
ALSU                3
Data Transfer       4

ET = IC x CPI x T
ET1 = (2x2)+(2x3)+(2x4) = 18
ET2 = (5x2)+(1x3)+(1x4) = 17
Note: Since both programs are executing on the same machine, the T factor can be ignored while calculating ET.
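The calculation above can be sketched in Python. This is only an illustration: the dictionary names and the execution_cycles helper are made up for this example, the per-type counts are the ones that appear in the ET1 and ET2 expressions, and the result is in units of the clock period T.

```python
# CPI per instruction type, from the SRC specification table in the example.
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}

def execution_cycles(counts, cpi=CPI):
    """Sum of (count x CPI) over all instruction types, in units of T."""
    return sum(counts[kind] * cpi[kind] for kind in counts)

# Instruction counts per type, as used in the ET1 and ET2 expressions.
program1 = {"data_transfer": 2, "control": 2, "alsu": 2}
program2 = {"data_transfer": 1, "control": 5, "alsu": 1}

for name, counts in (("Program 1", program1), ("Program 2", program2)):
    ic = sum(counts.values())
    print(name, "IC =", ic, "ET =", execution_cycles(counts), "T")
```

Program 2 has the larger instruction count but needs fewer clock cycles, because most of its instructions are of the cheapest type.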

Problem: Consider the following SRC code segments for implementing the operation a=b+5c. Find which one is more efficient in terms of instruction count and execution time.
Program 1: Multiplication by using repeated addition in a for loop

        .org 0
a:      .dw 1
b:      .dw 1
c:      .dw 1
        .org 80
        la r5,          ; load value of loop
        lar r6,mpy      ; load address of mpy
        lar r7,next     ; load address of next
        ld r2,b         ; load contents of b
        ld r3,c         ; load contents of c
        la r4,          ; load 0 in r4
mpy:    brzr r7,r       ; jump to next after 5 iterations
        add r4,r4,r     ; r4 contains r4+c
        addi r5,r5,     ; decrement index
        br r            ; loop again
next:   add r4,r4,r     ; r4 contains sum of b and 5c
        st r4,a         ; store at address a
        stop

Problem: Consider the following two SRC code segments for implementing the operation a=b+5c. Find which one is more efficient in terms of instruction count and execution time.
Program 2: Multiplication using sub-routine call

        .org 0
a:      .dw 1
b:      .dw 1
c:      .dw 1
        .org 80
        lar r1,mpy      ; load address of mpy in r1
        ld r2,b         ; load contents of b in r2
        la r3,          ; load index in r3
        ld r4,c         ; load contents of c in r4
        brl r5,r        ; r5 contains PC
        add r2,r2,r7    ; r2 contains sum b+5c
        st r2,a
        stop
mpy:    la r7,          ; r7 contains zero
        lar r8,again    ; r8 contains again address
again:  brzr r5,r       ; exit loop when index is 0
        add r7,r7,r4    ; r7 contains r7+c
        addi r3,r3,     ; decrement index
        br r8

Solution
The instructions in both programs can be divided into 3 types, and the respective count of each type is:

Number of                     Program 1   Program 2
Data transfer instructions        7           7
Control instructions              3           4
ALSU instructions                 3           3

IC for program 1 = 7+3+3 = 13
IC for program 2 = 7+4+3 = 14

Solution contd..
For execution time, consider the following SRC specifications.

Instruction Type   CPI
Control             2
ALSU                3
Data Transfer       4

ET = IC x CPI x T
ET1 = (7x4)+(3x2)+(3x3) = 43T
ET2 = (7x4)+(4x2)+(3x3) = 45T
Conclusion: Program 1 runs faster than program 2, as is evident from the execution times of both.
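The counting step in this solution can be sketched as a small Python script. The grouping of mnemonics into the three types (TYPE_OF) is an assumption inferred from the counts given in the solution, and the helper names are made up for this example. Like the solution, it counts each instruction once as written in the listing, not once per loop iteration.

```python
# Assumed grouping of SRC mnemonics into the three types used in the solution.
TYPE_OF = {
    "ld": "data_transfer", "la": "data_transfer",
    "lar": "data_transfer", "st": "data_transfer",
    "br": "control", "brzr": "control", "brl": "control", "stop": "control",
    "add": "alsu", "addi": "alsu",
}
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}

def count_types(mnemonics):
    """Count how many instructions of each type appear in a listing."""
    counts = {"data_transfer": 0, "control": 0, "alsu": 0}
    for mnemonic in mnemonics:
        counts[TYPE_OF[mnemonic]] += 1
    return counts

def execution_cycles(counts):
    """Sum of (count x CPI) over all types, in units of the clock period T."""
    return sum(counts[kind] * CPI[kind] for kind in counts)

# Mnemonics of the two listings, in program order.
program1 = ["la", "lar", "lar", "ld", "ld", "la", "brzr",
            "add", "addi", "br", "add", "st", "stop"]
program2 = ["lar", "ld", "la", "ld", "brl", "add", "st", "stop",
            "la", "lar", "brzr", "add", "addi", "br"]

for name, listing in (("Program 1", program1), ("Program 2", program2)):
    counts = count_types(listing)
    print(name, counts, "IC =", len(listing), "ET =", execution_cycles(counts), "T")
```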

MIPS
Millions of Instructions Per Second
MIPS = IC / (ET x 10^6)
The capability of instructions varies from machine to machine; e.g. RISC machines have simpler instructions, so the same job will require more instructions.
MIPS was popular when the VAX 11/780 was treated as a reference, in the late 70s and early 80s.

MIPS as a performance metric
MIPS is inversely proportional to execution time:
ET = IC / (MIPS x 10^6)

Example
Consider a machine having a 100 MHz clock and three instruction types with the following parameters.

Instruction Type   CPI
Control             2
ALSU                3
Data Transfer       4

Now suppose that two different compilers generate code for the same program. The instruction count for each is given as follows.

IC in millions     Code from compiler 1   Code from compiler 2
Control                     5                     10
ALSU                        1                      1
Data Transfer               1                      1


Compare the two codes according to MIPS and according to execution time.
Solution: First we find the CPI for both code sequences.
Since CPI = (total clock cycles over all instruction types) / IC
CPI1 = (5x2 + 1x3 + 1x4) / 7 = 2.43
CPI2 = (10x2 + 1x3 + 1x4) / 12 = 2.25
As MIPS = IC / (ET x 10^6) and ET = IC x CPI / clock rate,
MIPS = (IC x clock rate) / (IC x CPI x 10^6) = clock rate / (CPI x 10^6)
MIPS1 = 100 x 10^6 / (2.43 x 10^6) = 41.15
MIPS2 = 100 x 10^6 / (2.25 x 10^6) = 44.44
Hence the code generated by compiler 2 has the higher MIPS rating.

Solution contd..
Since ET = IC / (MIPS x 10^6)
ET1 = (7 x 10^6) / (41.15 x 10^6) = 0.17 seconds
ET2 = (12 x 10^6) / (44.44 x 10^6) = 0.27 seconds
Hence code sequence 1 is faster in terms of execution time, even though code sequence 2 has the higher MIPS rating.
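The whole comparison can be reproduced with a short Python sketch. The function and variable names are made up for this example; the clock rate, CPI values and instruction counts are the ones given in the example.

```python
CLOCK_RATE_HZ = 100e6  # 100 MHz clock from the example
CPI = {"control": 2, "alsu": 3, "data_transfer": 4}

def metrics(ic_millions):
    """Return (average CPI, MIPS rating, execution time in seconds)."""
    ic = sum(ic_millions.values()) * 1e6
    cycles = sum(ic_millions[kind] * CPI[kind] for kind in ic_millions) * 1e6
    cpi = cycles / ic
    mips = CLOCK_RATE_HZ / (cpi * 1e6)
    et = cycles / CLOCK_RATE_HZ  # same as IC x CPI x T with T = 1 / clock rate
    return cpi, mips, et

# Instruction counts in millions for the code from each compiler.
compiler1 = {"control": 5, "alsu": 1, "data_transfer": 1}
compiler2 = {"control": 10, "alsu": 1, "data_transfer": 1}

cpi1, mips1, et1 = metrics(compiler1)
cpi2, mips2, et2 = metrics(compiler2)
print(f"compiler 1: CPI={cpi1:.2f} MIPS={mips1:.2f} ET={et1:.2f} s")
print(f"compiler 2: CPI={cpi2:.2f} MIPS={mips2:.2f} ET={et2:.2f} s")
```

Without intermediate rounding, MIPS1 comes out near 41.18 rather than 41.15, because the worked solution rounds CPI1 to 2.43 before dividing. The ranking is unchanged: compiler 2 wins on MIPS while compiler 1 wins on execution time.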

MFLOPS
Millions of FLoating point Operations Per Second
Using FP operations makes more sense to some than using just any instructions.
Results vary from FP op to FP op.
MFLOPS is considered better than MIPS for two reasons:

1. FP ops are complex, and therefore provide a better picture of the capabilities of the hardware on which they are run.
2. Overheads (getting operands, storing results, etc.) are effectively lumped with the FP ops they support.

Dhrystones
Dhrystone is a general “integer performance” benchmark test originally developed by Reinhold Weicker in 1984.
Small program; less than 100 HLL statements
Compiles to about 1 to 1.5 KB of code
The name is a play on the word Whetstone.

Disadvantages of using Whetstones and Dhrystones
Both Whetstones and Dhrystones are now considered obsolete for the following reasons:
Small, fit in cache
Obsolete instruction mix
Prone to compiler tricks
Difficult to reproduce results
Uncontrolled source code

SPEC
The System Performance Evaluation Cooperative (SPEC) was founded in October 1988 by Apollo, Hewlett-Packard, MIPS Computer Systems and SUN Microsystems.
Latest version is SPEC CPU2000.

SPEC
The standard SPEC benchmark suite includes:
A compiler
A Boolean minimization program
A spreadsheet program
A number of other programs that stress arithmetic processing speed
It uses a simple metric, elapsed time, to measure the performance of competing machines.
Machine-independent code is used for fair comparison.

Advantages
It provides for ease of publication.
Each benchmark carries the same weight.
SPECratio is dimensionless.
It is not unduly influenced by long running programs.
It is relatively immune to performance variation on individual benchmarks.
It provides a consistent and fair metric.
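These properties follow from how SPEC results are usually combined. The sketch below assumes the common definition, which the slides do not spell out: a SPECratio is the reference machine's time divided by the measured time, and the overall figure is the geometric mean of the ratios. All timings here are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math

def spec_ratio(reference_time, measured_time):
    """Dimensionless ratio: seconds divided by seconds."""
    return reference_time / measured_time

def geometric_mean(values):
    """n-th root of the product of n values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference seconds, measured seconds) pairs for three benchmarks.
runs = [(1000.0, 250.0), (500.0, 250.0), (80.0, 80.0)]

ratios = [spec_ratio(ref, measured) for ref, measured in runs]
overall = geometric_mean(ratios)
print("ratios:", ratios, "overall:", round(overall, 3))
```

The long-running first benchmark contributes only through its ratio, not through its absolute run time, which is why every benchmark carries the same weight.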

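Programmer’s view of the SRC
[Figure: the CPU holds a register file of 32 registers (R0 to R31), each 32 bits wide (bits 31…0), together with the PC and the IR. Main memory holds 2^32 bytes at addresses 0 to 2^32-1, each 8 bits wide (bits 7…0).]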

SRC: Notation
R[3] means contents of register 3.
M[8] means contents of memory location 8.
A memory word at address 8 is defined as the 32 bits at addresses 8, 9, 10 and 11.

SRC: Notation (continued…)
Special notation for 32-bit memory words:
M[8]<31…0> := M[8]©M[9]©M[10]©M[11]
© is used to represent concatenation.
One memory “word” occupies four consecutive logical addresses a, a+1, a+2 and a+3:

Logical address   a          a+1        a+2        a+3
Byte              M[8]       M[9]       M[10]      M[11]
Bits of word      31…24      23…16      15…8       7…0
                  MS Byte                          LS Byte
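The concatenation rule can be illustrated with a few lines of Python. This is a sketch only: read_word and the byte values are made up for this example, and memory is modelled as a plain byte array.

```python
def read_word(memory, address):
    """Concatenate 4 consecutive bytes, most significant byte first."""
    word = 0
    for offset in range(4):
        word = (word << 8) | memory[address + offset]
    return word

# Hypothetical memory contents: M[8]..M[11] = 0x12, 0x34, 0x56, 0x78
memory = bytearray(16)
memory[8:12] = bytes([0x12, 0x34, 0x56, 0x78])

word = read_word(memory, 8)
print(hex(word))
```

M[8] ends up in the most significant byte of the word, which matches Python's built-in big-endian conversion int.from_bytes(memory[8:12], "big").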

SRC: instruction formats
Type A:  Op-code <31…27> | unused <26…0>
Type B:  Op-code <31…27> | ra <26…22> | c1 <21…0>
Type C:  Op-code <31…27> | ra <26…22> | rb <21…17> | c2 <16…0>
Type D:  Op-code <31…27> | ra <26…22> | rb <21…17> | rc <16…12> | c3 <11…0>
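The bit positions in these formats translate directly into shift-and-mask operations. The following Python sketch extracts every field from a 32-bit instruction word; the function names and the example word are assumptions made for illustration, and which fields are meaningful depends on the instruction type.

```python
def field(word, high, low):
    """Extract bits high..low (inclusive) of a 32-bit word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)

def decode(word):
    """Return all candidate fields; the op-code decides which ones apply."""
    return {
        "op": field(word, 31, 27),
        "ra": field(word, 26, 22),
        "rb": field(word, 21, 17),
        "rc": field(word, 16, 12),
        "c1": field(word, 21, 0),   # Type B constant
        "c2": field(word, 16, 0),   # Type C constant
        "c3": field(word, 11, 0),   # Type D constant
    }

# Hypothetical Type C word: op-code 1, ra = 3, rb = 5, c2 = 56
word = (1 << 27) | (3 << 22) | (5 << 17) | 56
fields = decode(word)
print(fields["op"], fields["ra"], fields["rb"], fields["c2"])
```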

Type A
Op-code <31…27>, unused <26…0>
Only two instructions:
nop (op-code = 0), useful in pipelining
stop (op-code = 31)
Both are 0-operand.

Type B
Op-code <31…27>, ra <26…22>, c1 <21…0>
Note: R8 is a register name and R[8] means the contents of register R8.
Three instructions; all three use the relative addressing mode:
ldr (op-code = 2): load register from memory using relative address
    ldr R3, 56        R[3] ← M[PC+56]
lar (op-code = 6): load register with relative address
    lar R3, 56        R[3] ← PC+56
str (op-code = 4): store register to memory using relative address
    str R8, 34        M[PC+34] ← R[8]
The effective address is computed at run-time by adding a constant to the PC, which makes the instructions relocatable.
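The three relative-addressing instructions can be modelled in Python as follows. This is a simplified sketch, not a full SRC simulator: memory is a dictionary of word values keyed by address, there is no 32-bit wrap-around, the PC is whatever value the state holds, and all names (new_state, str_rel and so on) are made up for this example.

```python
def new_state(pc=0):
    """Machine state: program counter, 32 registers, sparse memory."""
    return {"PC": pc, "R": [0] * 32, "M": {}}

def ldr(state, ra, c1):
    # R[ra] <- M[PC + c1]
    state["R"][ra] = state["M"].get(state["PC"] + c1, 0)

def lar(state, ra, c1):
    # R[ra] <- PC + c1
    state["R"][ra] = state["PC"] + c1

def str_rel(state, ra, c1):
    # M[PC + c1] <- R[ra]
    state["M"][state["PC"] + c1] = state["R"][ra]

state = new_state(pc=100)   # hypothetical PC value
state["M"][156] = 7         # hypothetical memory contents

ldr(state, 3, 56)           # ldr R3, 56
loaded = state["R"][3]
lar(state, 3, 56)           # lar R3, 56
address = state["R"][3]
state["R"][8] = 9
str_rel(state, 8, 34)       # str R8, 34
print(loaded, address, state["M"][134])
```

Because every address is formed from the PC, moving the code and its data together by the same amount leaves the constants 56 and 34 valid.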

Type C
Op-code <31…27>, ra <26…22>, rb <21…17>, c2 <16…0>
Three load/store instructions, plus three ALU instructions:
ld (op-code = 1): load register from memory
    ld R3, 56         R[3] ← M[56]         (rb field = 0)
    ld R3, 56(R5)     R[3] ← M[56+R[5]]    (rb field ≠ 0)
la (op-code = 5): load register with displacement address
    la R3, 56         R[3] ← 56
    la R3, 56(R5)     R[3] ← 56+R[5]
st (op-code = 3): store register to memory
    st R8, 34         M[34] ← R[8]
    st R8, 34(R6)     M[34+R[6]] ← R[8]
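The addressing rule shared by ld, la and st (use the constant alone when the rb field is 0, otherwise add the contents of rb) can be sketched in Python. The helper names are made up for this example, memory is a dictionary of word values, and sign extension of c2 is left out for simplicity.

```python
def effective_address(R, c2, rb=0):
    """rb field = 0 means absolute address c2; otherwise c2 + R[rb]."""
    return c2 if rb == 0 else c2 + R[rb]

def ld(R, M, ra, c2, rb=0):
    # R[ra] <- M[effective address]
    R[ra] = M.get(effective_address(R, c2, rb), 0)

def la(R, ra, c2, rb=0):
    # R[ra] <- effective address itself, no memory access
    R[ra] = effective_address(R, c2, rb)

def st(R, M, ra, c2, rb=0):
    # M[effective address] <- R[ra]
    M[effective_address(R, c2, rb)] = R[ra]

R = [0] * 32
M = {}
R[5] = 100                 # hypothetical register contents

la(R, 3, 56)               # la R3, 56
plain = R[3]
la(R, 3, 56, rb=5)         # la R3, 56(R5)
indexed = R[3]

R[8], R[6] = 42, 10
st(R, M, 8, 34, rb=6)      # st R8, 34(R6)
ld(R, M, 4, 44)            # ld R4, 44
print(plain, indexed, M[44], R[4])
```

One consequence of the rb = 0 rule is that R0 cannot be used as an index register in these instructions.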

Problem: Consider the following two SRC code segments for implementing multiplication. Find which one is more efficient in terms of instruction count and execution time.

Program 1: Multiplication by using repeated addition in a for loop

        la r5,          ; load value of loop
        lar r6,mpy      ; load address of mpy
        lar r7,next     ; load address of next
        ld r2,b         ; load contents of b in r2
        ld r3,c         ; load contents of c in r3
        la r4,          ; load 0 in r4
mpy:    brzr r7,r       ; jump to next after 5 iterations
        add r4,r4,r     ; r4 contains r4+c
        addi r5,r5,     ; decrement index
        br r            ; loop again
next:   add r4,r4,r     ; r4 contains sum of b
        st r4,a         ; store at address label a

Program 2: Multiplication using sub-routine call

        lar r1,mpy      ; load address of mpy in r1
        la r3,          ; load index in r3
        ld r4,c         ; load contents of c in r4
        brl r5,r        ; r5 contains PC
        add r2,r2,r7    ; r2 contains sum of b & 5c
        st r2,a
        lar r8,again    ; r8 contains again address
again:  brzr r5,r       ; exit loop when index is 0
        add r7,r7,r4    ; r7 contains r7+c
        addi r3,r3,     ; decrement index
        br r8

Solution
The instructions in both programs can be divided into 3 types, and the respective count of each type is:

Number of                     Program 1   Program 2
Data transfer instructions        7           6
Control instructions              2           3
ALSU instructions                 3           3

IC for program 1 = 7+2+3 = 12
IC for program 2 = 6+3+3 = 12

Solution contd..
For execution time, consider the following SRC specifications.

Instruction Type   CPI
Control             2
ALSU                3
Data Transfer       4

ET = IC x CPI x T
ET1 = (7x4)+(2x2)+(3x3) = 41
ET2 = (6x4)+(3x2)+(3x3) = 39
Conclusion: Although the instruction count for both programs is the same, program 2 runs faster than program 1 because it requires fewer clock cycles (39 versus 41).
