CS 314 Computer Organization Fall 2017 Chapter 3: Arithmetic for Computers Combinations to AV system, etc. 0808 (1988 in 113 IST) Call AV hot line at 8-777-0035 Haojin Zhu (http://tdt.sjtu.edu.cn/~hjzhu/ ) [Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK]
Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number of instruction formats opcode always the first 6 bits Smaller is faster limited instruction set limited number of registers in register file limited number of addressing modes Make the common case fast arithmetic operands from the register file (load-store machine) allow instructions to contain immediate operands Good design demands good compromises three instruction formats
Specifying Branch Destinations Use a register (like in lw and sw) added to the 16-bit offset which register? Instruction Address Register (the PC) its use is automatically implied by instruction PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction limits the branch distance to -215 to +215-1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway Note that two low order 0’s are concatenated to the 16 bit field giving an eighteen bit address 0111111111111111 00 = 2**15 –1 words (2**17 – 1 bytes) PC Add 32 offset 16 00 sign-extend from the low order 16 bits of the branch instruction branch dst address ? 4
Other Control Flow Instructions MIPS also has an unconditional branch instruction or jump instruction: j label #go to label Instruction Format (J Format): 0x02 26-bit address If-then-else code compilation PC 4 32 26 00 from the low order 26 bits of the jump instruction Why shift left by two bits?
Review: MIPS Addressing Modes Illustrated 1. Register addressing op rs rt rd funct Register word operand op rs rt offset 2. Base (displacement) addressing base register Memory word or byte operand 3. Immediate addressing op rs rt operand 4. PC-relative addressing op rs rt offset Program Counter (PC) Memory branch destination instruction 5. Pseudo-direct addressing op jump address Program Counter (PC) Memory jump destination instruction ||
Number Representations 32-bit signed numbers (2’s complement): 0000 0000 0000 0000 0000 0000 0000 0000two = 0ten 0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten ... 0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten 0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten 1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten 1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten ... 1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten 1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten MSB LSB maxint minint Converting <32-bit values into 32-bit values copy the most significant bit (the sign bit) into the “empty” bits 0010 -> 0000 0010 1010 -> 1111 1010 sign extend versus zero extend (lb vs. lbu)
MIPS Arithmetic Logic Unit (ALU) 32 m (operation) result A B ALU 4 zero ovf 1 Must support the Arithmetic/Logic operations of the ISA add, addi, addiu, addu sub, subu mult, multu, div, divu sqrt and, andi, nor, or, ori, xor, xori beq, bne, slt, slti, sltiu, sltu With special handling for sign extend – addi, addiu, slti, sltiu zero extend – andi, ori, xori overflow detection – add, addi, sub
Result indicating overflow Dealing with Overflow Overflow occurs when the result of an operation cannot be represented in 32-bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur Operation Operand A Operand B Result indicating overflow A + B ≥ 0 < 0 A - B MIPS signals overflow with an exception (aka interrupt) – an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception
Addition & Subtraction Just like in grade school (carry/borrow 1s) 0111 0111 0110 + 0110 - 0110 - 0101 Two's complement operations are easy do subtraction by negating and then adding 0111 0111 - 0110 + 1010 Overflow (result too large for finite computer word) e.g., adding two n-bit numbers does not yield an n-bit number 0111 + 0001 1101 0001 0001 for lecture 0001 1 0001 1000
Building a 1-bit Binary Adder carry_in A B carry_in carry_out S 1 A 1 bit Full Adder S B carry_out S = A xor B xor carry_in carry_out = A&B | A&carry_in | B&carry_in (majority function) How can we use it to build a 32-bit adder? How can we modify it easily to build an adder/subtractor?
Building 32-bit Adder 1-bit FA A0 B0 S0 c0=carry_in c1 Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . . 1-bit FA A1 B1 S1 c2 1-bit FA A2 B2 S2 c3 Ripple Carry Adder (RCA) advantage: simple logic, so small (low cost) disadvantage: slow and lots of glitching (so lots of energy consumption) c32=carry_out 1-bit FA A31 B31 S31 c31 . . .
A 32-bit Ripple Carry Adder/Subtractor add/sub 1-bit FA S0 c0=carry_in c1 S1 c2 S2 c3 c32=carry_out S31 c31 . . . A0 A1 A2 A31 Remember 2’s complement is just complement all the bits add a 1 in the least significant bit B0 control (0=add,1=sub) B0 if control = 0 !B0 if control = 1 For lecture A 0111 0111 B - 0110 + 1001 1 0001 1 0001
Overflow Detection Logic Carry into MSB ! = Carry out of MSB For a N-bit ALU: Overflow = CarryIn [N-1] XOR CarryOut [N-1] A0 B0 1-bit ALU Result0 CarryIn0 CarryOut0 A1 B1 Result1 CarryIn1 CarryOut1 A2 B2 Result2 CarryIn2 CarryOut2 A3 B3 Result3 CarryIn3 CarryOut3 X Y X XOR Y 1 1 1 1 Recall the XOR gate implements the not equal function: that is, its output is 1 only if the inputs have different values. Therefore all we need to do is connect the carry into the most significant bit and the carry out of the most significant bit to the XOR gate. Then the output of the XOR gate will give us the Overflow signal. +1 = 42 min. (Y:22) 1 1 why? Overflow
Multiply Binary multiplication is just a bunch of right shifts and adds n multiplicand multiplier partial product array can be formed in parallel and added in parallel for faster multiplication n double precision product 2n
More complicated than addition Multiplication More complicated than addition Can be accomplished via shifting and adding 0010 (multiplicand) x_1011 (multiplier) 0010 0010 (partial product 0000 array) 0010 00010110 (product) In every step multiplicand is shifted next bit of multiplier is examined (also a shifting step) if this bit is 1, shifted multiplicand is added to the product
Multiplication Algorithm 1 In every step multiplicand is shifted next bit of multiplier is examined (also a shifting step) if this bit is 1, shifted multiplicand is added to the product
Comments on Multiplicand Algorithm 1 Performance Three basic steps for each bit It requires 100 clock cycles to multiply two 32-bit numbers If each step took a clock cycle, How to improve it? Motivation (Performing the operations in parallel): Putting multiplier and the product together Shift them together
Refined Multiplicand Algorithm 2 add 32-bit ALU shift right product multiplier Control 32-bit ALU and multiplicand is untouched the sum keeps shifting right at every step, number of bits in product + multiplier = 64, hence, they share a single 64-bit register
Add and Right Shift Multiplier Hardware 0 1 1 0 = 6 multiplicand add 32-bit ALU shift right product For lecture multiplier Control 0 0 0 0 0 1 0 1 = 5 add 0 1 1 0 0 1 0 1 0 0 1 1 0 0 1 0 add 0 0 1 1 0 0 1 0 0 0 0 1 1 0 0 1 add 0 1 1 1 1 0 0 1 0 0 1 1 1 1 0 0 add 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 0 = 30
Exercise Using 4-bit numbers to save space, multiply 2ten*3ten, or 0010two * 0011two
dividend = quotient x divisor + remainder Division Division is just a bunch of quotient digit guesses and left shifts and subtracts dividend = quotient x divisor + remainder n n quotient dividend divisor partial remainder array remainder n
Division 1001ten Quotient Divisor 1000ten | 1001010ten Dividend -1000 10ten Remainder At every step, shift divisor right and compare it with current dividend if divisor is larger, shift 0 as the next bit of the quotient if divisor is smaller, subtract to get new dividend and shift 1 as the next bit of the quotient
First Version of Hardware for Division A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor must be added back
Divide Algorithm 3. Shift the Divisor register right1 bit. Start 1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register. Remainder >=0 Remainder < 0 Test Remainder 2b. Restore the original value by adding the Divisor reg to the Remainder reg and place the sum in the Remainder reg. Also shift the Quotient register to the left, setting the new LSB to 0 2a. Shift the Quotient register to the left Q: 0000 D: 0010 0000 R: 0000 0111 –D = 1110 0000 1: R = R–D Q: 0000 D: 0010 0000 R: 1110 0111 2b: +D, sl Q, 0 Q: 0000 D: 0010 0000 R: 0000 0111 3: Shr D Q: 0000 D: 0001 0000 R: 0000 0111 –D = 1111 0000 1: R = R–D Q: 0000 D: 0001 0000 R: 1111 0111 2b: +D, sl Q, 0 Q: 0000 D: 0001 0000 R: 0000 0111 3: Shr D Q: 0000 D: 0000 1000 R: 0000 0111 –D = 1111 1000 1: R = R–D Q: 0000 D: 0000 1000 R: 1111 1111 2b: +D, sl Q, 0 Q: 0000 D: 0000 1000 R: 0000 0111 3: Shr D Q: 0000 D: 0000 0100 R: 0000 0111 –D = 1111 1100 1: R = R–D Q: 0000 D: 0000 0100 R: 0000 0011 2a: sl Q, 1 Q: 0001 D: 0000 0100 R: 0000 0011 3: Shr D Q: 0000 D: 0000 0010 R: 0000 0011 –D = 1111 1110 1: R = R–D Q: 0000 D: 0000 0010 R: 0000 0001 2a: sl Q, 1 Q: 0011 D: 0000 0010 R: 0000 0001 3: Shr D Q: 0011 D: 0000 0001 R: 0000 0001 Recommend show 2’s comp of divisor, show lines for subtract divisor and restore remainder Supplement: 1st edition needs 33 iterations while 2nd and 3rd versions don’t. setting the new rightmost bit to 1. 3. Shift the Divisor register right1 bit. No: < 33 repetitions 33rd repetition? Yes: 33 repetitions Done
Divide Example Divide 7ten (0000 0111two) by 2ten (0010two) Iter Step Quot Divisor Remainder Initial values 1 2 3 4 5
Divide Example Divide 7ten (0000 0111two) by 2ten (0010two) Iter Step Quot Divisor Remainder Initial values 0000 0010 0000 0000 0111 1 Rem = Rem – Div Rem < 0 +Div, shift 0 into Q Shift Div right 0001 0000 1110 0111 2 Same steps as 1 0000 1000 1111 0111 3 0000 0100 4 Rem >= 0 shift 1 into Q 0001 0000 0010 0000 0011 5 Same steps as 4 0011 0000 0001
Efficient Division Remainder Quotient Divisor 64-bit ALU Control Shift Right Shift Left Write Control 32 bits 64 bits divisor subtract 32-bit ALU shift left dividend remainder quotient Control
Left Shift and Subtract Division Hardware 0 0 1 0 = 2 divisor subtract 32-bit ALU shift left dividend For lecture remainder quotient Control 0 0 0 0 0 1 1 0 = 6 0 0 0 0 1 1 0 0 sub 1 1 1 0 1 1 0 0 rem neg, so ‘ient bit = 0 0 0 0 0 1 1 0 0 restore remainder 0 0 0 1 1 0 0 0 sub 1 1 1 1 1 1 0 0 rem neg, so ‘ient bit = 0 0 0 0 1 1 0 0 0 restore remainder 0 0 1 1 0 0 0 0 sub 0 0 0 1 0 0 0 1 rem pos, so ‘ient bit = 1 0 0 1 0 0 0 1 0 sub 0 0 0 0 0 0 1 1 rem pos, so ‘ient bit = 1 = 3 with 0 remainder
Restoring Unsigned Integer Division s(0) = z for j = 1 to k if 2 s(j-1) - 2k d > 0 qk-j = 1 s(j) = 2 s(j-1) - 2k d else qk-j = 0 s(j) = 2 s(j-1) K=32, put divisor in the left 32 bit register the remainder shift left by 1 bit No need to restore the remainder in the case of R-D>0, Restore the remainder In the case of R-D<0,
Non-Restoring Unsigned Integer Division s(1) = 2 z - 2k d for j = 2 to k if s(j-1) 0 qk-(j-1) = 1 s(j) = 2 s(j-1) - 2k d else qk-(j-1) = 0 s(j) = 2 s(j-1) + 2k d end for if s(k) 0 q0 = 1 q0 = 0 Correction step If in the last step, remainder –divisor >0, Perform subtraction why? If in the last step, remainder –divisor <0, Perform addition
s(1) = 2 z - 2k d s(0) = z for j = 2 to k if s(j-1) 0 for j = 1 to k considering two consequent steps j-1 and j, in particular 2s(j-2) - 2k d <0 s(1) = 2 z - 2k d for j = 2 to k if s(j-1) 0 qk-(j-1) = 1 s(j) = 2 s(j-1) - 2k d else qk-(j-1) = 0 s(j) = 2 s(j-1) + 2k d end for if s(k) 0 q0 = 1 q0 = 0 Correction step s(0) = z for j = 1 to k if 2 s(j-1) - 2k d > 0 qk-j = 1 s(j) = 2 s(j-1) - 2k d else qk-j = 0 s(j) = 2 s(j-1) In the j-1 step, Restoring Algorithm computes qk-j = 0 s(j-1) = 2 s(j-2) equal In the subsequent j step, Restoring Algorithm computes 2 s(j-1) - 2k d == 2*2 s(j-2) - 2k d Why? Non-Restoring Algorithm s(j-1) = 2 s(j-2) - 2k d Restoring Unsigned Integer Division In the subsequent j step, non-Restoring Algorithm computes 2 s(j-1) + 2k d = 2*2 s(j-2) - 2*2k d +2k d = 2*2 s(j-2) - 2k d Non-Restoring Unsigned Integer Division 2x-y= 2(x-y)+y
Non-restoring algorithm set subtract_bit true 1: If subtract bit true: Subtract the Divisor register from the Remainder and place the result in the remainder register else Add the Divisor register to the Remainder and place the result in the remainder register 2:If Remainder >= 0 Shift the Quotient register to the left, setting rightmost bit to 1 Set subtract bit to false 3: Shift the Divisor register right 1 bit if < 33rd rep goto 1 Add Divisor register to remainder and place in Remainder register exit
Example: Perform n + 1 iterations for n bits Remainder 0000 1011 Divisor 00110000 ----------------------------------- Iteration 1: (subtract) Rem 1101 1011 Quotient 0 Divisor 0001 1000 Iteration 2: (add) Rem 11110011 Q00 Divisor 0000 1100 Iteration 3: Rem 11111111 Q000 Divisor 0000 0110 ----------------------------------- Iteration 4: (add) Rem 0000 0101 Q0001 Divisor 0000 0011 Iteration 5: (subtract) Rem 0000 0010 Q 00011 Divisor 0000 0001 Since reminder is positive, done. Q = 0011 and Rem = 0010
Exercise Calculate A divided by B using restoring and non-restoring division. A=26, B=5
MIPS Divide Instruction Divide (div and divu) generates the reminder in hi and the quotient in lo div $s0, $s1 # lo = $s0 / $s1 # hi = $s0 mod $s1 Instructions mfhi rd and mflo rd are provided to move the quotient and reminder to (user accessible) registers in the register file 0 16 17 0 0 0x1A Seems odd to me that the machine doesn’t support a double precision dividend in hi || lo but it looks like it doesn’t As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.
Lecture 1