CS 314 Computer Organization Fall 2017 Chapter 3: Arithmetic for Computers
Haojin Zhu
[Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK]

Review: MIPS (RISC) Design Principles
Simplicity favors regularity
  - fixed size instructions
  - small number of instruction formats
  - opcode always the first 6 bits
Smaller is faster
  - limited instruction set
  - limited number of registers in register file
  - limited number of addressing modes
Make the common case fast
  - arithmetic operands from the register file (load-store machine)
  - allow instructions to contain immediate operands
Good design demands good compromises
  - three instruction formats

Specifying Branch Destinations
Use a register (as in lw and sw) added to the 16-bit offset. Which register? The Instruction Address Register (the PC):
  - its use is automatically implied by the instruction
  - the PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction
This limits the branch distance to -2^15 to +2^15 - 1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway.
Note that two low-order 0s are concatenated to the 16-bit field, giving an 18-bit byte offset: a forward reach of 2^15 - 1 words (2^17 - 4 bytes).
[Figure: the low-order 16 bits of the branch instruction are sign-extended to 32 bits, 00 is appended, and the result is added to PC+4 to form the branch destination address.]
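The address calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sign_extend16` and `branch_target` are hypothetical, and the PC is modeled as a plain 32-bit integer.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Branch destination = (PC + 4) + (sign-extended offset with 00 appended).

    pc is the address of the branch instruction itself; the result is
    wrapped to 32 bits.
    """
    byte_offset = sign_extend16(imm16) << 2   # append two low-order 0s
    return (pc + 4 + byte_offset) & 0xFFFFFFFF


if __name__ == "__main__":
    # Offset of 3 words, counted from the instruction after the branch.
    print(hex(branch_target(0x00400000, 0x0003)))
    # Offset of -1 word lands back on the branch itself.
    print(hex(branch_target(0x00400010, 0xFFFF)))
```

Because the offset is counted in words from PC+4, an offset field of 0 simply falls through to the next instruction.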

Other Control Flow Instructions
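MIPS also has an unconditional branch instruction, or jump instruction:
    j label    # go to label
Instruction Format (J Format): op (0x02) | 26-bit address
Jumps are used, for example, when compiling if-then-else code.
[Figure: the low-order 26 bits of the jump instruction, with 00 appended, are concatenated with the upper 4 bits of the PC to form the 32-bit jump address.]
Why shift left by two bits? Instructions are word aligned, so the two low-order bits of every instruction address are always 00 and need not be stored in the instruction.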
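As a rough sketch of how the jump address is assembled, the Python below concatenates the upper 4 bits of the PC with the 26-bit field and two 0 bits. The name `jump_target` is hypothetical, and using PC+4 (rather than the PC of the jump itself) for the upper bits is an assumption based on the PC being updated during fetch.

```python
def jump_target(pc, addr26):
    """Pseudo-direct address: upper 4 bits of (PC + 4) || addr26 || 00."""
    upper4 = (pc + 4) & 0xF0000000            # keep only the top 4 bits
    return upper4 | ((addr26 & 0x03FFFFFF) << 2)


if __name__ == "__main__":
    print(hex(jump_target(0x00400000, 0x0100000)))
```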

Review: MIPS Addressing Modes Illustrated
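1. Register addressing (op | rs | rt | rd | funct): the operand is a register word.
2. Base (displacement) addressing (op | rs | rt | offset): the base register plus the offset addresses a memory word or byte operand.
3. Immediate addressing (op | rs | rt | operand): the operand is held in the instruction itself.
4. PC-relative addressing (op | rs | rt | offset): the Program Counter (PC) plus the offset addresses the branch destination instruction in memory.
5. Pseudo-direct addressing (op | jump address): the Program Counter (PC) || jump address addresses the jump destination instruction in memory.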

Number Representations
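32-bit signed numbers (2's complement), MSB on the left and LSB on the right:
    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
    0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten   (maxint)
    1000 0000 0000 0000 0000 0000 0000 0000two = - 2,147,483,648ten   (minint)
    1000 0000 0000 0000 0000 0000 0000 0001two = - 2,147,483,647ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1110two = - 2ten
    1111 1111 1111 1111 1111 1111 1111 1111two = - 1ten
Converting <32-bit values into 32-bit values:
  - copy the most significant bit (the sign bit) into the "empty" bits
  - sign extend versus zero extend (lb vs. lbu)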
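The difference between sign extension and zero extension can be made concrete with a short Python sketch. The helper names are hypothetical; values are treated as raw 32-bit patterns, and `bits` is the width of the narrow value (8 for a byte loaded by lb or lbu).

```python
def sign_extend(value, bits):
    """Copy the sign bit of a bits-wide value into the upper bits (32-bit result)."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):             # sign bit set: fill upper bits with 1s
        value |= ~mask & 0xFFFFFFFF
    return value


def zero_extend(value, bits):
    """Fill the upper bits with 0s (32-bit result)."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    byte = 0x80                                # 1000 0000two
    print(hex(sign_extend(byte, 8)))           # what lb would produce
    print(hex(zero_extend(byte, 8)))           # what lbu would produce
```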

MIPS Arithmetic Logic Unit (ALU)
[Figure: ALU with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result, and 1-bit zero and ovf outputs.]
Must support the Arithmetic/Logic operations of the ISA:
  - add, addi, addiu, addu
  - sub, subu
  - mult, multu, div, divu
  - sqrt
  - and, andi, nor, or, ori, xor, xori
  - beq, bne, slt, slti, sltiu, sltu
With special handling for:
  - sign extend: addi, addiu, slti, sltiu
  - zero extend: andi, ori, xori
  - overflow detection: add, addi, sub

Dealing with Overflow

Overflow occurs when the result of an operation cannot be represented in 32 bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit.
When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur.

    Operation   Operand A   Operand B   Result indicating overflow
    A + B       >= 0        >= 0        < 0
    A + B       < 0         < 0         >= 0
    A - B       >= 0        < 0         < 0
    A - B       < 0         >= 0        >= 0

MIPS signals overflow with an exception (aka interrupt): an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception.
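The sign rule described above can be checked in software. The following Python sketch is illustrative only: the function names are hypothetical, operands are raw 32-bit patterns, and the functions return a flag instead of raising an exception the way the hardware does.

```python
MASK32 = 0xFFFFFFFF


def sign_bit(x):
    """Return bit 31 of a 32-bit pattern."""
    return (x >> 31) & 1


def add_overflows(a, b):
    """Overflow on A + B: operands have the same sign, result has the other sign."""
    result = (a + b) & MASK32
    return sign_bit(a) == sign_bit(b) and sign_bit(result) != sign_bit(a)


def sub_overflows(a, b):
    """Overflow on A - B: operands have different signs, result sign differs from A."""
    result = (a - b) & MASK32
    return sign_bit(a) != sign_bit(b) and sign_bit(result) != sign_bit(a)


if __name__ == "__main__":
    print(add_overflows(0x7FFFFFFF, 0x00000001))   # maxint + 1
    print(sub_overflows(0x80000000, 0x00000001))   # minint - 1
    print(add_overflows(0x7FFFFFFF, 0xFFFFFFFF))   # maxint + (-1)
```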

Addition & Subtraction
Just like in grade school (carry/borrow 1s).
Two's complement operations are easy:
  - do subtraction by negating and then adding
Overflow (result too large for finite computer word):
  - e.g., adding two n-bit numbers does not yield an n-bit number

Building a 1-bit Binary Adder
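[Figure: 1-bit Full Adder with inputs A, B, and carry_in, and outputs S and carry_out.]

    A  B  carry_in | carry_out  S
    0  0  0        | 0          0
    0  0  1        | 0          1
    0  1  0        | 0          1
    0  1  1        | 1          0
    1  0  0        | 0          1
    1  0  1        | 1          0
    1  1  0        | 1          0
    1  1  1        | 1          1

S = A xor B xor carry_in
carry_out = A&B | A&carry_in | B&carry_in (majority function)
How can we use it to build a 32-bit adder?
How can we modify it easily to build an adder/subtractor?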
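The two equations above translate directly into code. A minimal Python sketch (the function name `full_adder` is chosen here for illustration) that also prints the truth table:

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (S, carry_out) for single-bit inputs."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)   # majority function
    return s, carry_out


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, cout = full_adder(a, b, c)
                print(a, b, c, "->", cout, s)
```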

Building 32-bit Adder

Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit, and so on up the chain.
[Figure: 32 chained 1-bit FAs. FA i takes Ai, Bi, and ci and produces Si and c(i+1); c0 = carry_in and c32 = carry_out.]
Ripple Carry Adder (RCA)
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow, and lots of glitching (so lots of energy consumption)
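A bit-serial Python sketch of the ripple-carry idea follows; it models only the logic, not the gate delays that make the real circuit slow. The function name and the `width` parameter are illustrative choices.

```python
def ripple_carry_add(a, b, carry_in=0, width=32):
    """Add two width-bit patterns one bit at a time, rippling the carry upward.

    Returns (sum, carry_out).
    """
    result = 0
    carry = carry_in                           # c0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        s_i = a_i ^ b_i ^ carry                # sum bit of full adder i
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)   # c(i+1)
        result |= s_i << i
    return result, carry


if __name__ == "__main__":
    print(ripple_carry_add(5, 10))
    print(ripple_carry_add(0xFFFFFFFF, 1))     # carry ripples through all 32 bits
```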

A 32-bit Ripple Carry Adder/Subtractor
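[Figure: the 32-bit ripple carry adder (inputs A0..A31, outputs S0..S31, c32 = carry_out) with an add/sub control line that drives c0 = carry_in and conditionally inverts each B input.]
Remember, 2's complement is just:
  - complement all the bits
  - add a 1 in the least significant bit
Each B input passes through the control (0 = add, 1 = sub): the adder sees B0 if control = 0 and !B0 if control = 1.
For example, 0111 - 0110 is computed as 0111 + 1001 + 1 = 1 0001: the result is 0001 with a carry out of 1.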
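The control trick can be sketched in Python as follows. This is a behavioural model under assumptions: the name `add_sub` is hypothetical, `control` must be 0 or 1, and Python's `+` stands in for the ripple-carry chain.

```python
def add_sub(a, b, control, width=32):
    """Adder/subtractor: control = 0 adds, control = 1 subtracts.

    Each B bit is XORed with control (inverting B when subtracting) and
    control is also fed in as the carry-in, which adds the final 1 of the
    two's complement. Returns (result, carry_out).
    """
    mask = (1 << width) - 1
    b_in = (b ^ (mask if control else 0)) & mask
    total = (a & mask) + b_in + control
    return total & mask, total >> width


if __name__ == "__main__":
    print(add_sub(0b0111, 0b0110, 1, width=4))   # 7 - 6
    print(add_sub(0b0111, 0b0110, 0, width=4))   # 7 + 6
```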

Overflow Detection Logic
Overflow occurs when the carry into the MSB != the carry out of the MSB.
For an N-bit ALU: Overflow = CarryIn[N-1] XOR CarryOut[N-1]
[Figure: four chained 1-bit ALUs with inputs A0..A3 and B0..B3, outputs Result0..Result3, and carries CarryIn0..CarryIn3 and CarryOut0..CarryOut3; CarryIn3 and CarryOut3 feed an XOR gate that produces Overflow.]

    X  Y  X XOR Y
    0  0  0
    0  1  1
    1  0  1
    1  1  0

Recall that the XOR gate implements the not-equal function: its output is 1 only if the inputs have different values. Therefore all we need to do is connect the carry into the most significant bit and the carry out of the most significant bit to the XOR gate. The output of the XOR gate then gives us the Overflow signal.
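To see the rule in action, the Python sketch below adds two 4-bit patterns bit by bit and reports Overflow as the XOR of the carry into and the carry out of the MSB. The function name and the 4-bit default width are assumptions made for illustration.

```python
def add_with_overflow(a, b, width=4):
    """Bit-serial add that returns (result, overflow).

    overflow = CarryIn[N-1] XOR CarryOut[N-1] for an N-bit adder.
    """
    result = 0
    carry = 0
    carry_in = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        carry_in = carry                       # carry into bit i
        result |= (a_i ^ b_i ^ carry_in) << i
        carry = (a_i & b_i) | (a_i & carry_in) | (b_i & carry_in)
    # After the loop, carry_in is the carry into the MSB and carry is the
    # carry out of the MSB.
    return result, carry_in ^ carry


if __name__ == "__main__":
    print(add_with_overflow(0b0111, 0b0011))   # 7 + 3 does not fit in 4 signed bits
    print(add_with_overflow(0b1100, 0b1011))   # -4 + -5 does not fit either
    print(add_with_overflow(0b0011, 0b0010))   # 3 + 2 fits
```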

Multiply

Binary multiplication is just a bunch of right shifts and adds.
[Figure: an n-bit multiplicand times an n-bit multiplier forms a partial product array, which sums to a 2n-bit double-precision product.]
The partial product array can be formed in parallel and added in parallel for faster multiplication.

Multiplication

More complicated than addition
  - can be accomplished via shifting and adding
[Figure: long-multiplication layout showing the multiplicand, the multiplier, the partial product array, and the product.]
In every step:
  - the multiplicand is shifted
  - the next bit of the multiplier is examined (also a shifting step)
  - if this bit is 1, the shifted multiplicand is added to the product

Multiplication Algorithm 1
In every step:
  - the multiplicand is shifted
  - the next bit of the multiplier is examined (also a shifting step)
  - if this bit is 1, the shifted multiplicand is added to the product
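The three steps above can be sketched in Python. This sketch assumes the usual first-version arrangement, in which the multiplicand sits in a 2n-bit register that shifts left while the multiplier shifts right; the function name `multiply_v1` is hypothetical and the operands are unsigned.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Shift-and-add multiplication of two unsigned n-bit numbers."""
    product = 0                    # 2n-bit Product register
    mcand = multiplicand           # 2n-bit Multiplicand register
    mplier = multiplier            # n-bit Multiplier register
    for _ in range(n):
        if mplier & 1:             # examine the next bit of the multiplier
            product += mcand       # add the shifted multiplicand to the product
        mcand <<= 1                # shift the multiplicand left
        mplier >>= 1               # shift the multiplier right
    return product & ((1 << (2 * n)) - 1)


if __name__ == "__main__":
    print(multiply_v1(0b0010, 0b0011, n=4))   # 2 * 3 with 4-bit operands
```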

Comments on Multiplication Algorithm 1

Performance
  - three basic steps for each bit
  - if each step took a clock cycle, it requires about 100 clock cycles (3 x 32 = 96) to multiply two 32-bit numbers
How to improve it?
  - motivation: perform the operations in parallel
  - put the multiplier and the product together
  - shift them together

Refined Multiplication Algorithm 2

[Figure: multiplicand register feeding a 32-bit ALU (add); a combined product/multiplier register that shifts right; Control.]
  - the ALU is 32 bits wide and the multiplicand is untouched
  - the sum keeps shifting right
  - at every step, the number of bits in product + multiplier = 64, hence they share a single 64-bit register
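A possible Python model of the shared-register scheme is shown below. It is a sketch, not the slide's hardware: the name `multiply_refined` is hypothetical, operands are unsigned, and Python's unbounded integers stand in for the extra carry bit that the adder can produce above the 2n-bit register.

```python
def multiply_refined(multiplicand, multiplier, n=32):
    """Add-and-right-shift multiply with product and multiplier in one register."""
    mask = (1 << n) - 1
    reg = multiplier & mask                    # upper half: product, lower half: multiplier
    for _ in range(n):
        if reg & 1:                            # low bit is the current multiplier bit
            reg += (multiplicand & mask) << n  # add multiplicand into the upper half
        reg >>= 1                              # shift product and multiplier right together
    return reg


if __name__ == "__main__":
    print(multiply_refined(6, 5, n=4))
```

With this arrangement each multiplier bit is consumed as it falls off the right end, so no separate multiplier register or 64-bit adder is needed.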

Add and Right Shift Multiplier Hardware
[Figure: multiplicand register feeding a 32-bit ALU (add); combined product/multiplier register that shifts right; Control.]
Worked example on the hardware: multiplicand = 6, multiplier = 5, product = 30.

Exercise

Using 4-bit numbers to save space, multiply 2ten * 3ten, or 0010two * 0011two.

Division

Division is just a bunch of quotient digit guesses and left shifts and subtracts.
dividend = quotient x divisor + remainder
[Figure: an n-bit divisor and a dividend produce an n-bit quotient, a partial remainder array, and an n-bit remainder.]

Division

                     1001ten   Quotient
Divisor 1000ten | 1001010ten   Dividend
                 -1000
                     1010
                    -1000
                       10ten   Remainder

At every step:
  - shift the divisor right and compare it with the current dividend
  - if the divisor is larger, shift 0 in as the next bit of the quotient
  - if the divisor is smaller (or equal), subtract to get the new dividend and shift 1 in as the next bit of the quotient

First Version of Hardware for Division
A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor must be added back

Divide Algorithm

Start
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder:
2a. Remainder >= 0: Shift the Quotient register to the left, setting the new rightmost bit to 1.
2b. Remainder < 0: Restore the original value by adding the Divisor register to the Remainder register and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new LSB to 0.
3. Shift the Divisor register right 1 bit.
33rd repetition? No (< 33 repetitions): go back to step 1. Yes (33 repetitions): Done.

Step trace for the 4-bit example:
  - iterations 1 to 3: step 1 (R = R - D), step 2b (+D, shift Q left, insert 0), step 3 (shift D right); Q stays 0000
  - iteration 4: step 1, step 2a (shift Q left, insert 1), step 3; Q = 0001
  - iteration 5: step 1, step 2a, step 3; Q = 0011

Supplement: the first version needs 33 iterations, while the second and third versions do not.
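The flowchart steps can be followed in a short Python sketch. It assumes the first-version register layout (the divisor starts in the upper half of a 2n-bit register and shifts right; the loop runs n + 1 times); the name `divide_v1` is hypothetical, the operands are unsigned, and the zero-divisor check is an addition for the sketch.

```python
def divide_v1(dividend, divisor, n=32):
    """Restoring division following steps 1, 2a/2b, and 3. Returns (quotient, remainder)."""
    if divisor == 0:
        raise ValueError("divisor must be non-zero")
    remainder = dividend                       # 2n-bit Remainder register
    div = divisor << n                         # Divisor starts in the upper half
    quotient = 0
    for _ in range(n + 1):                     # 33 repetitions when n = 32
        remainder -= div                       # 1. Remainder = Remainder - Divisor
        if remainder >= 0:
            quotient = (quotient << 1) | 1     # 2a. shift in a 1
        else:
            remainder += div                   # 2b. restore the old value
            quotient = quotient << 1           #     and shift in a 0
        div >>= 1                              # 3. shift Divisor right
    return quotient & ((1 << n) - 1), remainder


if __name__ == "__main__":
    print(divide_v1(7, 2, n=4))                # 4-bit example: 7 divided by 2
```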

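Divide Example

Divide 7ten (0000 0111two) by 2ten (0010two)

    Iter  Step                                   Quot   Divisor     Remainder
    0     Initial values                         0000   0010 0000   0000 0111
    1     1: Rem = Rem - Div                     0000   0010 0000   1110 0111
          2b: Rem < 0 => +Div, shift 0 into Q    0000   0010 0000   0000 0111
          3: Shift Div right                     0000   0001 0000   0000 0111
    2     Same steps as 1                        0000   0000 1000   0000 0111
    3     Same steps as 1                        0000   0000 0100   0000 0111
    4     1: Rem = Rem - Div                     0000   0000 0100   0000 0011
          2a: Rem >= 0 => shift 1 into Q         0001   0000 0100   0000 0011
          3: Shift Div right                     0001   0000 0010   0000 0011
    5     Same steps as 4                        0011   0000 0001   0000 0001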

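Efficient Division

[Figure: the first-version hardware (64-bit Divisor register that shifts right, 64-bit ALU, 64-bit Remainder register with write control, 32-bit Quotient register that shifts left, Control) is refined to a 32-bit divisor register, a 32-bit ALU (subtract), and a combined remainder/quotient register that initially holds the dividend and shifts left, plus Control.]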

Left Shift and Subtract Division Hardware
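[Figure: divisor register feeding a 32-bit ALU (subtract); combined remainder/quotient register, initially holding the dividend, that shifts left; Control.]
Worked example on the hardware: dividend = 6, divisor = 2.
  1. sub: remainder negative, so quotient bit = 0; restore remainder
  2. sub: remainder negative, so quotient bit = 0; restore remainder
  3. sub: remainder positive, so quotient bit = 1
  4. sub: remainder positive, so quotient bit = 1
Result: quotient = 3 with 0 remainder.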

Restoring Unsigned Integer Division
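s(0) = z
for j = 1 to k
    if 2 s(j-1) - 2^k d >= 0
        q_(k-j) = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_(k-j) = 0
        s(j) = 2 s(j-1)

Here z is the dividend, d the divisor, s(j) the partial remainder after step j, and q_(k-j) the quotient bit produced in step j.
With k = 32, put the divisor in the left (upper) 32 bits of the register; the remainder shifts left by 1 bit at each step.
  - In the case of R - D >= 0, there is no need to restore the remainder.
  - In the case of R - D < 0, restore the remainder.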
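The recurrence above maps almost line for line onto Python. In this sketch the function name is hypothetical, the comparison uses >= 0 so that an exact fit yields a quotient bit of 1, and the precondition z < 2^k d (the quotient fits in k bits) is an assumption made explicit in the code.

```python
def restoring_divide(z, d, k):
    """Restoring unsigned division by the s(j) recurrence.

    z: dividend, d: divisor, k: number of quotient bits.
    Returns (quotient, remainder).
    """
    if d <= 0 or not 0 <= z < (d << k):
        raise ValueError("need d > 0 and 0 <= z < 2^k * d")
    s = z                                      # s(0) = z
    q = 0
    for j in range(1, k + 1):
        trial = 2 * s - (d << k)               # 2 s(j-1) - 2^k d
        if trial >= 0:
            q |= 1 << (k - j)                  # q_(k-j) = 1
            s = trial
        else:
            s = 2 * s                          # q_(k-j) = 0, keep the restored value
    return q, s >> k                           # s(k) holds 2^k times the remainder


if __name__ == "__main__":
    print(restoring_divide(26, 5, 4))          # A = 26, B = 5
```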

Non-Restoring Unsigned Integer Division
s(1) = 2 z - 2^k d
for j = 2 to k
    if s(j-1) >= 0
        q_(k-(j-1)) = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_(k-(j-1)) = 0
        s(j) = 2 s(j-1) + 2^k d
end for
if s(k) >= 0
    q_0 = 1
else
    q_0 = 0
    Correction step: add 2^k d back to s(k)

If in the previous step remainder - divisor >= 0, perform a subtraction.
If in the previous step remainder - divisor < 0, perform an addition. Why?
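A Python sketch of the non-restoring recurrence is given below. The function name is hypothetical, the precondition z < 2^k d is an assumption, and the correction step is modeled as adding 2^k d back to a negative final partial remainder.

```python
def nonrestoring_divide(z, d, k):
    """Non-restoring unsigned division by the s(j) recurrence.

    Returns (quotient, remainder).
    """
    if d <= 0 or not 0 <= z < (d << k):
        raise ValueError("need d > 0 and 0 <= z < 2^k * d")
    dk = d << k                                # 2^k d
    s = 2 * z - dk                             # s(1)
    q = 0
    for j in range(2, k + 1):
        if s >= 0:
            q |= 1 << (k - (j - 1))            # q_(k-(j-1)) = 1, subtract next
            s = 2 * s - dk
        else:
            s = 2 * s + dk                     # q_(k-(j-1)) = 0, add next
    if s >= 0:
        q |= 1                                 # q_0 = 1
    else:
        s += dk                                # q_0 = 0, correction step
    return q, s >> k


if __name__ == "__main__":
    print(nonrestoring_divide(26, 5, 4))       # A = 26, B = 5
```

The partial remainder is allowed to go negative between steps, which is why only the final value may need correcting.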

Comparing restoring and non-restoring division

Consider two consecutive steps j-1 and j, in particular the case 2 s(j-2) - 2^k d < 0.

Restoring algorithm:
  - In step j-1 it computes q_(k-(j-1)) = 0 and s(j-1) = 2 s(j-2).
  - In the subsequent step j it computes 2 s(j-1) - 2^k d = 2*2 s(j-2) - 2^k d.

Non-restoring algorithm:
  - In step j-1 it keeps s(j-1) = 2 s(j-2) - 2^k d.
  - In the subsequent step j it computes 2 s(j-1) + 2^k d = 2*2 s(j-2) - 2*2^k d + 2^k d = 2*2 s(j-2) - 2^k d.

The two results are equal. Why? Because 2x - y = 2(x - y) + y.

Non-restoring algorithm
Set subtract bit to true.
1: If subtract bit is true: subtract the Divisor register from the Remainder and place the result in the Remainder register.
   Else: add the Divisor register to the Remainder and place the result in the Remainder register.
2: If Remainder >= 0: shift the Quotient register to the left, setting the rightmost bit to 1, and set subtract bit to true.
   Else: shift the Quotient register to the left, setting the rightmost bit to 0, and set subtract bit to false.
3: Shift the Divisor register right 1 bit. If < 33rd repetition, go to 1.
4: If the Remainder is negative, add the Divisor (the value used in the last repetition) to the Remainder and place the result in the Remainder register.
Exit.

Example: Perform n + 1 iterations for n bits
Each iteration shows the Remainder, Divisor, and Quotient registers:
  - Iteration 1 (subtract): Quotient = 0
  - Iteration 2 (add): Quotient = 00
  - Iteration 3 (add): Quotient = 000
  - Iteration 4 (add): Quotient = 0001
  - Iteration 5 (subtract): Quotient = 00011
Since the remainder is positive, done. Q = 0011 and Rem = 0010.

Exercise Calculate A divided by B using restoring and non-restoring division. A=26, B=5

MIPS Divide Instruction
Divide (div and divu) generates the remainder in hi and the quotient in lo:
    div $s0, $s1    # lo = $s0 / $s1
                    # hi = $s0 mod $s1
Instruction format: R format with funct = 0x1A.
Instructions mfhi rd and mflo rd are provided to move the remainder and the quotient to (user accessible) registers in the register file.
The machine does not appear to support a double-precision dividend in hi || lo.
As with multiply, divide ignores overflow, so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.
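The hi/lo convention and the software check for a zero divisor can be illustrated with a small Python model of the unsigned divide. This is a hypothetical sketch, not a description of the hardware: the function name `divu_model` is invented here, and the exception stands in for the check that software is expected to perform before issuing the instruction.

```python
def divu_model(rs, rt):
    """Model of an unsigned 32-bit divide: returns (hi, lo) = (remainder, quotient)."""
    rs &= 0xFFFFFFFF
    rt &= 0xFFFFFFFF
    if rt == 0:
        # Stand-in for the software check: the divide itself does not guard against this.
        raise ZeroDivisionError("check the divisor before dividing")
    hi = rs % rt                               # value read back with mfhi
    lo = rs // rt                              # value read back with mflo
    return hi, lo


if __name__ == "__main__":
    hi, lo = divu_model(26, 5)
    print("hi (remainder) =", hi, "lo (quotient) =", lo)
```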
