
1 Computer Organization & Architecture Lecture 03: MIPS Arithmetic Review
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, UCB]. Other handouts: HW#1, to hand out next time.

2 Review: MIPS Organization
[Datapath diagram: a register file of 32 registers ($zero–$ra), 32 bits each, with two 5-bit source-register read ports, a 5-bit destination-register write port, and 32-bit read/write data; a 32-bit ALU; a PC with an Add-4 incrementer and a branch-offset adder driving the Fetch (PC = PC+4) → Decode → Exec cycle; and a memory of 2^30 words (32 bits per word), byte-addressed, big-endian.]

3 Review: MIPS Addressing Modes
1. Register addressing (operand): op | rs | rt | rd | funct – the operand is a register word.
2. Base addressing (operand): op | rs | rt | offset – the operand is a memory word or byte at base register + offset.
3. Immediate addressing (operand): op | rs | rt | operand – the operand is in the instruction itself.
4. PC-relative addressing (instruction): op | rs | rt | offset – the branch destination instruction is at Program Counter (PC) + offset.
5. Pseudo-direct addressing (instruction): op | jump address – the jump destination instruction address is formed from the instruction's jump address field combined with the upper bits of the PC.

4 Because ease of use is the purpose, this ratio of function to conceptual complexity is the ultimate test of system design. Neither function alone nor simplicity alone defines a good design. The Mythical Man-Month, Brooks, pg. 43

5 MIPS Number Representations
32-bit signed numbers (2's complement):
0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
...
1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten
(MSB = sign bit, LSB = bit 0.) Converting <32-bit values into 32-bit values: copy the most significant bit (the sign bit) into the "empty" bits – sign extend – versus filling them with zeros – zero extend (lb vs. lbu).
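For illustration only (not part of the slide), a small Python sketch of the two conversions for an 8-bit byte, mirroring what lb and lbu do; the function names and the 8-bit width are assumptions made for the example.

def sign_extend8(byte):
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte   # copy the sign bit into the upper bits

def zero_extend8(byte):
    return byte & 0xFF                              # fill the upper bits with zeros

print(sign_extend8(0xF0))   # -16: lb treats 1111 0000 as a negative byte
print(zero_extend8(0xF0))   # 240: lbu treats the same bits as unsigned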

6 MIPS Arithmetic Logic Unit (ALU)
[ALU diagram: 32-bit operands A and B, a 4-bit operation select m, a 32-bit result, and 1-bit zero and ovf outputs.] The ALU must support the arithmetic/logic operations of the ISA: add, addi, addiu, addu; sub, subu, neg; mult, multu, div, divu, sqrt; and, andi, nor, or, ori, xor, xori; beq, bne, slt, slti, sltiu, sltu. With special handling for: sign extend – addi, addiu, andi, ori, xori, slti, sltiu; zero extend – lbu, addiu, sltiu; no overflow detected – addu, addiu, subu, multu, divu, sltiu, sltu.

7 Review: 2’s Complement Binary Representation
2's complement representation for four bits:
2'sc binary   decimal
1000   -8
1001   -7
1010   -6
1011   -5
1100   -4
1101   -3
1110   -2
1111   -1
0000    0
0001    1
0010    2
0011    3
0100    4
0101    5
0110    6
0111    7
To negate, complement all the bits and add a 1, e.g., 0101 (+5) → complement all the bits → 1010 → add a 1 → 1011 (−5). Signed numbers – assuming 2's complement representation – for four bits: an MSB of 1 indicates a negative number, the bit string of all 0's is zero, the largest positive number that can be represented is 7 (2^(n-1) − 1 for n bits), whereas the most negative number that can be represented is −8 (−2^(n-1) for n bits). Note: negate and invert are different!
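A quick Python sketch (not from the slide) of 4-bit negation versus inversion, to underline that negate and invert are different; the helper names are made up for the example.

def negate4(x):
    return (~x + 1) & 0xF        # complement all the bits, then add 1
def invert4(x):
    return ~x & 0xF              # one's complement only

print(format(negate4(0b0101), "04b"))   # 1011, i.e. -5 in 4-bit two's complement
print(format(invert4(0b0101), "04b"))   # 1010, which is NOT -5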

8 Review: A Full Adder How can we use it to build a 32-bit adder?
A 1-bit full adder takes A, B, and carry_in and produces S and carry_out. Tip: do you know how to write a Boolean equation from a truth table?
S = A ⊕ B ⊕ carry_in (odd parity function)
carry_out = A&B | A&carry_in | B&carry_in (majority function)
How can we use it to build a 32-bit adder? How can we modify it easily to build an adder/subtractor?
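As a sketch only (plain Python with bit values 0/1, not hardware), the two equations above implemented and checked exhaustively against integer addition:

def full_adder(a, b, carry_in):
    s = a ^ b ^ carry_in                                   # odd parity function
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # majority function
    return s, carry_out

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin             # the adder really adds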

9 Different Implementations
Not easy to decide the "best" way to build something: we don't want too many inputs to a single gate, we don't want to have to go through too many gates, and for our purposes ease of comprehension is important. Let's look at a 1-bit ALU for addition (cout = a b + a cin + b cin, sum = a xor b xor cin). How could we build a 1-bit ALU for add, and, and or?

10 A 32-bit Ripple Carry Adder/Subtractor
A 32-bit ripple carry adder/subtractor chains 32 one-bit FAs: c0 = carry_in enters the low-order FA (A0, B0 → S0), each carry ci ripples into the next stage, and c32 = carry_out leaves the high-order FA (A31, B31 → S31). Remember 2's complement is just: complement all the bits and add a 1 in the least significant bit. The add/sub control (0 = add, 1 = sub) selects B0 if control = 0 and !B0 if control = 1. (For class handout.)

11 A 32-bit Ripple Carry Adder/Subtractor
A 32-bit ripple carry adder/subtractor chains 32 one-bit FAs: c0 = carry_in enters the low-order FA (A0, B0 → S0), each carry ci ripples into the next stage, and c32 = carry_out leaves the high-order FA (A31, B31 → S31). Remember 2's complement is just: complement all the bits and add a 1 in the least significant bit. The add/sub control (0 = add, 1 = sub) is XORed with each B bit – B if control = 0, !B if control = 1 – since 0 XOR B = B and 1 XOR B = !B (the complement). With control = 1 the carry_in of 1 supplies the "+1", so A − B is computed as A + !B + 1, e.g., 0111 − 0110 = 0111 + 1001 + 1 = 1 0001 → 0001 (the carry out of the top bit is discarded). (For lecture.)
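A hedged Python sketch of the ripple-carry adder/subtractor just described, done bit by bit so the ripple is explicit; the function name and 32-bit width are assumptions for the example.

def ripple_add_sub(a, b, add_sub):
    result, carry = 0, add_sub                 # carry_in = 1 supplies the "+1" for subtract
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ add_sub          # the XOR gate: B if add, !B if subtract
        s = ai ^ bi ^ carry                    # 1-bit full adder sum
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry                       # carry is c32

print(ripple_add_sub(7, 6, 1))                 # (1, 1): 7 - 6 = 1, carry out of the top bit set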

12 Serial Adder: a single 1-bit full adder adds the two operands (e.g., 1101 and 0101) one bit per clock, the bits flowing in from the right (least significant bit first), with a D flip-flop (delay flip-flop) holding the carry from one bit position to the next.

13 Overflow Detection. Overflow: the result is too large to represent in 32 bits. Overflow occurs when adding two positives yields a negative (7 + 7), or adding two negatives gives a positive (−7) + (−7), or subtracting a negative from a positive gives a negative (7 − (−7)), or subtracting a positive from a negative gives a positive ((−7) − 7). On your own: prove you can detect overflow by Carry into MSB XOR Carry out of MSB, e.g., for 4-bit signed numbers: 7 + 3 and (−4) + (−5). (For class handout.)

14 Overflow Detection. Overflow: the result is too large to represent in 32 bits. Overflow occurs when adding two positives yields a negative (7 + 7), or adding two negatives gives a positive (−7) + (−7), or subtracting a negative from a positive gives a negative (7 − (−7)), or subtracting a positive from a negative gives a positive (−7 − 7). On your own: prove you can detect overflow by Carry into MSB xor Carry out of MSB, e.g., for 4-bit signed numbers:
0111 + 0011 = 1010, i.e., 7 + 3 = −6 (overflow)
1100 + 1011 = 0111, i.e., −4 + (−5) = 7 (overflow)
(Lecture notes:) Recall from earlier slides that the biggest positive number you can represent using 4 bits is 7 and the smallest negative number is −8, so any time your addition results in a number bigger than 7 or less than −8 you have an overflow. Keep in mind that whenever you add two numbers with different signs (a negative number to a positive number), overflow can NOT occur. Overflow occurs when you add two positive numbers together and the sum has a negative sign, or when you add two negative numbers together and the sum has a positive sign. If you spend some time, you can convince yourself that if the carry into the most significant bit is NOT the same as the carry coming out of the MSB, you have an overflow.
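A small Python sketch (assumed 4-bit operands, made-up function name) of the carry-into-MSB XOR carry-out-of-MSB test on the two examples above:

def add4_overflow(a, b):
    carry_into_msb = (((a & 0x7) + (b & 0x7)) >> 3) & 1    # carry produced by the low 3 bits
    carry_out_of_msb = (((a & 0xF) + (b & 0xF)) >> 4) & 1  # carry out of bit 3
    return carry_into_msb ^ carry_out_of_msb

print(add4_overflow(0b0111, 0b0011))   # 1:  7 + 3  overflows (result looks like -6)
print(add4_overflow(0b1100, 0b1011))   # 1: -4 + -5 overflows (result looks like +7)
print(add4_overflow(0b0111, 0b1011))   # 0:  7 + -5 cannot overflow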

15 Tailoring the ALU to the MIPS ISA
Need to support the logic operations (and, nor, or, xor): bit-wise operations with no carry involved; need a logic gate for each function and a mux to choose the output. Need to support the set-on-less-than instruction (slt): use subtraction to determine if (a − b) < 0 (which implies a < b); copy the sign bit into the low-order bit of the result and set the remaining result bits to 0, giving 0000…000S. Need to support the test for equality (bne, beq): again use subtraction, since (a − b) = 0 implies a = b, with additional logic to "nor" all result bits together. Immediates are sign extended outside the ALU with wiring (i.e., no logic needed).

16 Adding a NOR function. A 2-to-1 MUX (Out = src1 if Ctrl = 1, Out = src2 if Ctrl = 0) selects among the AND, OR, and full-adder (FA) outputs, and we can also choose to invert a. How do we get "a NOR b"? NOR = NOT + OR, and a NOR b = (NOT a) AND (NOT b), so inverting both inputs and selecting the AND output gives NOR. Recall the AND and OR truth tables.

17 Supporting slt (set on less than)
The 1-bit ALU already supports AND, OR, and addition, and can form both A + B and A − B. Can we figure out the idea? Intuitively, a < b exactly when a − b < 0, so we perform a − b and check whether the result is negative by looking at the MSB (the high, sign bit).
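For illustration, a tiny Python sketch of that idea (32-bit width assumed; it ignores the overflow corner cases the real MSB ALU must also handle):

def slt(a, b):
    diff = (a - b) & 0xFFFFFFFF      # 32-bit two's-complement difference
    return (diff >> 31) & 1          # the sign bit becomes the 0000...000S result

print(slt(3, 7), slt(7, 3))          # 1 0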

18 Supporting slt: the ordinary 1-bit ALU is used for all other bits; use the modified ALU (with the set output) for the most significant bit.

19 Expanding to 32 bits: Binvert = 1 means B is inverted (otherwise B passes through unchanged); Ainvert = 1 means A is inverted (otherwise A passes through unchanged).

20 Test for equality. Notice the control lines: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt (plus the NOR function added earlier). Note: zero is a 1 when the result is zero! 0 OR 0 OR 0 … OR 0 = 0, and NOT(0 OR 0 OR 0 … OR 0) = 1.

21 Problem: ripple carry adder is slow
Is a 32-bit ALU as fast as a 1-bit ALU? Is there more than one way to do addition? Two extremes: ripple carry and sum-of-products. Can you see the ripple? How could you get rid of it?
c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1; expanding, c2 = ?
c3 = b2c2 + a2c2 + a2b2; expanding, c3 = ?
c4 = b3c3 + a3c3 + a3b3; expanding, c4 = ?
Fully expanding each carry into a two-level sum-of-products is not feasible! Why? (The expressions – and the gates and gate inputs they need – grow far too large.)

22 Carry-lookahead adder
An approach in between our two extremes. Motivation: if we didn't know the value of carry-in, what could we do? When would we always generate a carry? gi = ai bi. When would we propagate the carry? pi = ai + bi. Did we get rid of the ripple?
c1 = g0 + p0c0
c2 = g1 + p1c1 = g1 + p1g0 + p1p0c0
c3 = g2 + p2c2 = g2 + p2g1 + p2p1g0 + p2p1p0c0
c4 = g3 + p3c3 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0
Feasible! Why? (Each carry is now a small two-level function of the g's, p's, and c0 alone – no ripple.)
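A small Python sketch of a 4-bit lookahead stage (bit 0 first in each list; the loop computes c_{i+1} = g_i + p_i·c_i, which in hardware would be expanded into the two-level forms above rather than evaluated sequentially):

def cla4(a_bits, b_bits, c0):
    g = [a & b for a, b in zip(a_bits, b_bits)]            # generate
    p = [a | b for a, b in zip(a_bits, b_bits)]            # propagate
    c = [c0]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))                     # c_{i+1} = g_i + p_i c_i
    s = [a_bits[i] ^ b_bits[i] ^ c[i] for i in range(4)]
    return s, c[4]

print(cla4([1, 1, 1, 0], [1, 1, 0, 0], 0))   # sum bits LSB-first [0, 1, 0, 1] = 1010, carry out 0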

23

24 P0 = p3p2p1p0    P1 = p7p6p5p4    P2 = p11p10p9p8    P3 = p15p14p13p12
G0 = g3 + p3g2 + p3p2g1 + p3p2p1g0
G1 = g7 + p7g6 + p7p6g5 + p7p6p5g4
G2 = g11 + p11g10 + p11p10g9 + p11p10p9g8
G3 = g15 + p15g14 + p15p14g13 + p15p14p13g12
C1 = G0 + P0c0
C2 = G1 + P1G0 + P1P0c0
C3 = G2 + P2G1 + P2P1G0 + P2P1P0c0
C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0

25 Same principle to build bigger adders
Can’t build a 16 bit adder this way... (too big) Could use ripple carry of 4-bit CLA adders Better: use the CLA principle again!

26 Hardware for Addition and Subtraction

27 Shift Operations. Also need operations to pack and unpack 8-bit characters into 32-bit words. Shifts move all the bits in a word left or right:
sll $t2, $s0, 8 #$t2 = $s0 << 8 bits
srl $t2, $s0, 8 #$t2 = $s0 >> 8 bits
Instruction format: op | rs | rt | rd | shamt | funct. Notice that a 5-bit shamt field is enough to shift a 32-bit value by up to 2^5 − 1 = 31 bit positions. Such shifts are logical because they fill with zeros. There are also sllv, srlv, and srav, which take the shift amount from a register.
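A quick Python sketch of the logical shifts (Python ints are unbounded, so we mask to 32 bits to mimic a MIPS register; the names mirror the instructions but are just example functions):

MASK32 = 0xFFFFFFFF

def sll(x, shamt):
    return (x << shamt) & MASK32     # vacated low bits fill with zeros, high bits fall off

def srl(x, shamt):
    return (x & MASK32) >> shamt     # vacated high bits fill with zeros

print(hex(sll(0x80000001, 8)))       # 0x100
print(hex(srl(0x80000001, 8)))       # 0x800000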

28 Shift Operations, con’t
An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value – divide by 2 – and a number shifted left should be 2 times its original value – multiply by 2), so sra uses the most significant bit (sign bit) as the bit shifted in (1 for negative, 0 for positive). Note that there is no need for a shift-left-arithmetic (sla) when using two's complement number representation.
sra $t2, $s0, 8 #$t2 = $s0 >> 8 bits
The shift operation is implemented by hardware separate from the ALU, using a barrel shifter (which would take lots of gates in discrete logic, but is pretty easy to implement in VLSI).
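And a matching sketch of the arithmetic right shift on a 32-bit two's-complement value (again an illustrative helper, not hardware):

def sra(x, shamt):
    x &= 0xFFFFFFFF
    if x & 0x80000000:               # negative: shift in copies of the sign bit
        return (x >> shamt) | ((0xFFFFFFFF << (32 - shamt)) & 0xFFFFFFFF)
    return x >> shamt                # positive: identical to srl

print(hex(sra(0xFFFFFF00, 4)))       # 0xfffffff0  (-256 >> 4 == -16)
print(hex(sra(0x00000F00, 4)))       # 0xf0        (3840 >> 4 == 240)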

29 Multiply. Binary multiplication is just a bunch of right shifts and adds: an n-bit multiplicand times an n-bit multiplier yields a double-precision, 2n-bit product. The array of partial products can be formed in parallel and added in parallel for faster multiplication. Can you try multiplying 7 by 3 in binary with 4-bit operands?
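For the 7 × 3 exercise, a minimal shift-and-add sketch in Python (4-bit operands assumed):

def multiply(multiplicand, multiplier, n=4):
    product = 0                          # double-precision (2n-bit) accumulator
    for i in range(n):
        if (multiplier >> i) & 1:        # multiplier bit is 1:
            product += multiplicand << i # add the multiplicand at this place value
    return product

print(format(multiply(0b0111, 0b0011), "b"))   # 10101: 7 x 3 = 21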

30 MIPS Multiply Instruction
Multiply produces a double precision product:
mult $s0, $s1 # hi||lo = $s0 * $s1
The low-order word of the product is left in processor register lo and the high-order word is left in register hi. Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file. Instruction format: op | rs | rt | rd | shamt | funct. Multiplies are done by fast, dedicated hardware and are much more complex (and slower) than adders; hardware dividers are even more complex and even slower, and ditto for hardware square root. multu does multiply unsigned. Both multiplies ignore overflow, so it's up to the software to check whether the product is too big to fit into 32 bits: there is no overflow if hi is 0 for multu, or if hi is the replicated sign bit of lo for mult.
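A hedged sketch of that software check in Python (the helper name and the way the 64-bit product is split are assumptions for the example):

def product_fits_in_32(hi, lo, signed):
    if not signed:                       # multu: no overflow iff hi is all zeros
        return hi == 0
    sign_bits = 0xFFFFFFFF if (lo >> 31) & 1 else 0
    return hi == sign_bits               # mult: hi must replicate the sign of lo

product = (-5 * 7) & 0xFFFFFFFFFFFFFFFF  # 64-bit two's-complement product
hi, lo = product >> 32, product & 0xFFFFFFFF
print(product_fits_in_32(hi, lo, signed=True))   # True: -35 fits in 32 bits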

31 Multiplication Example
Multiplicand 1011two (11 dec) × Multiplier 1101two (13 dec). Partial products, one per multiplier bit (if the multiplier bit is 1, copy the multiplicand at that place value, otherwise zero): 1011 (bit 0 = 1), 0000 (bit 1 = 0), 1011 shifted left two places (bit 2 = 1), 1011 shifted left three places (bit 3 = 1). Sum = 10001111two = Product (143 dec). Note: need a double-length result.

32 Unsigned Binary Multiplication

33 Execution of Example

34 Signed Binary Multiplication
Multiplicand × Multiplier: F marks the sign bit. When the number is negative, we insert 1s instead of 0s when extending the partial products (for 2's complement).

35 Multiplication: Implementation
Datapath Control

36 Multiplication: Implementation -Final Version
Multiplier starts in right half of product What goes here?

37 Flowchart for Unsigned Binary Multiplication

38 Multiplying Negative Numbers
The unsigned algorithm above does not work directly for negative operands! Solution 1: convert to positive if required, multiply as above, and if the signs were different, negate the answer. Solution 2: Booth's algorithm.

39 Booth’s Algorithm

40 Example of Booth’s Algorithm

41 Correctness of Booth Algorithm
Consider a run of k consecutive 1 bits in the multiplier X, occupying bits x(i-1) down to x(i-k) (so x(i) = 0 and x(i-k-1) = 0). Booth's algorithm adds when the scanned pair x(i)x(i-1) = 01 and subtracts when x(i)x(i-1) = 10, so across this run it subtracts Y·2^(i-k) at the 10 boundary and adds Y·2^i at the 01 boundary, for a net contribution of Y·(2^i − 2^(i-k)). The run itself contributes the sum over m = 0 .. k−1 of 2^(i-k+m) = 2^(i-k)·(2^k − 1) = 2^i − 2^(i-k) (substituting j = m + i − k: when m = 0, j = i − k; when m = k − 1, j = i − 1). Hence each add/subtract pair contributes exactly what its run of 1s contributes, and summing over all runs gives X × Y.
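A hedged Python sketch of radix-2 Booth recoding for an n-bit two's-complement multiplier x (given as an n-bit pattern); the scanned pair (x_i, x_{i-1}) selects +Y, −Y, or 0 at weight 2^i, exactly as in the argument above:

def booth_multiply(x, y, n=8):
    product, prev_bit = 0, 0             # prev_bit is the imaginary x_{-1} = 0
    for i in range(n):
        bit = (x >> i) & 1
        if (bit, prev_bit) == (0, 1):    # 01: end of a run of 1s -> add y * 2^i
            product += y << i
        elif (bit, prev_bit) == (1, 0):  # 10: start of a run of 1s -> subtract y * 2^i
            product -= y << i
        prev_bit = bit
    return product

print(booth_multiply(13, 11))            # 143
print(booth_multiply((-3) & 0xFF, 7))    # -21 (x = 1111 1101 in 8 bits)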

42 Array Multiplier For P=X x Y

43 Division. Division is just a bunch of quotient digit guesses and left shifts and subtracts: an n-bit dividend divided by an n-bit divisor produces an n-bit quotient and an n-bit remainder through an array of partial remainders.

44 MIPS Divide Instruction
Divide generates the remainder in hi and the quotient in lo:
div $s0, $s1 # lo = $s0 / $s1, hi = $s0 mod $s1
Instructions mfhi rd and mflo rd are provided to move the quotient and remainder to (user accessible) registers in the register file. Instruction format: op | rs | rt | rd | shamt | funct. It seems odd that the machine doesn't support a double precision dividend in hi||lo, but it looks like it doesn't. As with multiply, divide ignores overflow, so software must determine if the quotient is too large; software must also check the divisor to avoid division by 0.

45 Division of Unsigned Binary Integers
Long division with Divisor (V) = 1011 and Dividend (D) = 10010011, working left to right: 1001 < 1011 → quotient bit 0; 10010 − 1011 = 00111 → quotient bit 1; bring down the next bit: 001110 − 1011 = 0011 → 1; bring down: 00111 < 1011 → 0; bring down: 001111 − 1011 = 0100 → 1. Quotient (Q) = 1101, Remainder (R) = 100 (i.e., 147 ÷ 11 = 13 remainder 4). Resulting size: the quotient can need as many bits as the dividend, while the remainder needs no more bits than the divisor.

46 Division Restore

47 Division algorithm Let Q= qn-1 qn-2 …q1 q0 and Ri be the current remainder First , Ri is the dividend D . We shift the divisor V to the right one bit position to try to subtract. This is to test what should be qi If then new Remainder is Else ( cannot subtract)

48 V = 101two, R0 = D (the dividend). Dividing by 101two this way gives the quotient Q and the remainder R = 011two, i.e., remainder = 3.

49 Instead of shifting the divisor to the right, we shift the remainder to the left
A left shift of the remainder gives 2R0; the remainder at the end is 011.

50 Division algorithm (hardware view): the remainder register A and the quotient register Q shift left together, the top bit of Q (Q[7]) shifting into A[0] and the new quotient bit entering Q[0].
If restoring is needed for about ½ of the trial subtractions, then for a dividend of n bits (n trial subtractions) the number of restoring additions is about n/2, so the total number of additions/subtractions is about n + n/2 = 3n/2.

51 Division: another, smarter division algorithm – non-restoring division.
The current remainder is held as S.A (S is the sign bit). Each step computes S.A := S.A − M. If S = 0 (the sign bit is 0), set qi = 1, then shift left and subtract again to test the next round. If S = 1, set qi = 0; restoring would add M back, shift left, and subtract again, i.e., Ri+1 = 2(Ri + V) − V = 2Ri + V, whereas a non-negative remainder gives Ri+1 = 2Ri − V (Ri is the current remainder, Ri+1 the new remainder, V the divisor). Hence we combine both equations: never restore – just shift left, then add V if the remainder was negative or subtract V if it was non-negative – which reduces the total number of additions/subtractions.
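A hedged sketch of non-restoring division for n-bit unsigned operands; Python ints stand in for the fixed-width A register, so the sign test is exact, and a single final restore fixes up a negative remainder.

def nonrestoring_divide(dividend, divisor, n=8):
    a, q = 0, dividend
    for _ in range(n):
        a = (a << 1) | ((q >> (n - 1)) & 1)          # shift A,Q left; top bit of Q enters A
        q = (q << 1) & ((1 << n) - 1)
        a = a - divisor if a >= 0 else a + divisor   # subtract if S = 0, add if S = 1
        if a >= 0:
            q |= 1                                   # new quotient bit is 1, else stays 0
    if a < 0:
        a += divisor                                 # single final restore of the remainder
    return q, a

print(nonrestoring_divide(0b10010011, 0b1011))       # (13, 4): 147 / 11 = 13 remainder 4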

52 The original (restoring) version, for comparison.

53 Representing Big (and Small) Numbers
What if we want to encode the approximate age of the earth? 4,600,000,000, or 4.6 x 10^9. Or the weight in kg of one a.m.u. (atomic mass unit)? About 1.66 x 10^-27. There is no way we can encode either of the above in a 32-bit integer. Floating point representation: (-1)^sign x F x 2^E. Still have to fit everything in 32 bits (single precision): s (1 bit), E (exponent, 8 bits), F (fraction, 23 bits). Notice that in scientific notation the mantissa is represented in normalized form (no leading zeros). The base (2, not 10) is hardwired into the design of the FPALU. More bits in the fraction (F) or the exponent (E) is a trade-off between precision (accuracy of the number) and range (size of the number).

54 FP Ranges: for a 32-bit number with an 8-bit exponent (E) and a 23-bit mantissa.
Accuracy (the effect of changing the LSB of the mantissa): 2^-23 ≈ 1.2 x 10^-7, i.e., about 6 decimal places.

55 Expressible Numbers

56

57

58

59 Problems with Early FP
Representation differences:
- exponent base
- exponent and mantissa field sizes
- normalized form differences
- some examples shortly
Rounding modes
Exceptions:
- this is probably the key motivator of the standard
- underflow and overflow are common in FP calculations, and taking exceptions has lots of issues: OSs deal with them differently, and huge delays are incurred due to context switches or value checks (e.g., check for -1 before doing a sqrt operation)

60 IEEE 754 FP Standard Encoding
(Similar to DEC, very much like Intel.) Most (all?) computers these days conform to the IEEE 754 floating point standard: (-1)^sign x (1+F) x 2^(E-bias), with formats for both single and double precision. F is stored in normalized form where the MSB of the significand 1.xxx is 1 (so there is no need to store it!) – called the hidden bit. To simplify sorting FP numbers, E comes before F in the word and E is represented in excess (biased) notation (bias 127 for single precision).
Single E (8) | Single F (23) | Double E (11) | Double F (52) | Object represented
0 | 0 | 0 | 0 | true zero
0 | nonzero | 0 | nonzero | ± denormalized number
1-254 | anything | 1-2046 | anything | ± floating point number
255 | 0 | 2047 | 0 | ± infinity
255 | nonzero | 2047 | nonzero | not a number (NaN)
The book distinguishes between the representation with the hidden bit there – the significand (the 24-bit version) – and the one with the hidden bit removed – the fraction (the 23-bit version). Excess-7 or bias-7 is …
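For illustration, a short Python sketch using the standard struct module to pull apart the single-precision fields and classify a value according to the table above (the helper name is made up for the example):

import struct

def decode_single(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign, e, f = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if e == 0:
        kind = "true zero" if f == 0 else "denormalized"
    elif e == 255:
        kind = "infinity" if f == 0 else "NaN"
    else:
        kind = "floating point number"   # value = (-1)^sign * (1 + F) * 2^(E - 127)
    return sign, e, f, kind

print(decode_single(-0.3125))            # (1, 125, 2097152, 'floating point number')
print(decode_single(float("inf")))       # (0, 255, 0, 'infinity')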

61 Biased Representation (IEEE FP Standard)
(-1)^sign x (1+F) x 2^(E-bias). The 'bias' value 127 represents an exponent of 0; stored values 128 to 254 represent positive exponents and 1 to 126 represent negative exponents (remember E = 0 is reserved for the entire number being zero or denormalized, and E = 255 for infinity/NaN). The actual exponent is therefore E − bias.

62 A bad example without biasing: is (1/2) > 2 ???
1/2 = 0.1two = 1.0 x 2^-1 (normalized): S = 0, exponent = −1, significand = 1.0. The representation of 2 is 10two = 1.0 x 2^+1 (normalized): S = 0, exponent = +1, significand = 1.0. With a signed (two's complement) exponent field, the bit pattern of −1 looks larger than that of +1 when the words are compared as integers, so 1/2 would sort as greater than 2; the biased exponent avoids this.

63 Representation of Exponent
It is inappropriate to use two's complement for the exponent: ideally we want the most negative exponent to get the smallest bit pattern and the most positive the largest, so simple comparisons order the numbers correctly. The number range runs from negative exponents through 0 to positive exponents, so use a bias: the exponent 0 (for 2^0) is stored as 127ten, stored values below 127 are negative exponents, and stored values above 127 are positive exponents.

64 Example 1 Represent 0.3125ten = 5/16
5/16 = 1/4 + 1/16 = 0.0101two = 1.01two x 2^-2 (normalized). S = 0. Exponent: −2 = E − bias = E − 127, so E = 125ten = 0111 1101two. Significand (stored fraction) = 010 0000 0000 0000 0000 0000.
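A quick check of this example with Python's struct (illustrative only): 0.3125 should encode as S = 0, E = 125, fraction = 010 0000 … 000.

import struct

bits = struct.unpack(">I", struct.pack(">f", 0.3125))[0]
print(bits >> 31)                        # 0 (sign)
print((bits >> 23) & 0xFF)               # 125 (biased exponent, 0111 1101)
print(format(bits & 0x7FFFFF, "023b"))   # 01000000000000000000000 (stored fraction)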

65 Example 2: What does S = 0, E = 0111 1101 = 125ten, fraction = 1/4 represent?
Exponent = E − bias = 125 − 127 = −2. (-1)^S x (1 + fraction) x 2^(E-bias) = (1 + 1/4) x 2^-2 = (5/4) x (1/4) = 5/16 = 0.3125.

66 Example 3

67 Example 4

68

69 Distribution of Values
A 6-bit IEEE-like format: e = 3 exponent bits, f = 2 fraction bits, bias = 3. Notice how the distribution of representable values gets denser toward zero (the second plot zooms in near zero).

70 Interesting Number

71 Rounding modes

72

73 Round to Even

74 Rounding Binary number

75 Guard Bits and Rounding
Clearly extra fraction bits are required to support rounding modes:
- e.g., given a 23-bit representation of the mantissa in single precision,
- you round to 23 bits for the answer, but you'll need extra bits in order to know what to do.
Question: how many do you really need? Why?

76 Floating Point Addition
Addition (and subtraction): (F1 x 2^E1) + (F2 x 2^E2) = F3 x 2^E3
Step 0: Restore the hidden bit in F1 and in F2.
Step 1: Align fractions by right shifting F2 by E1 − E2 positions (assuming E1 ≥ E2), keeping track of (three of) the bits shifted out in a guard bit, a round bit, and a sticky bit.
Step 2: Add the resulting F2 to F1 to form F3.
Step 3: Normalize F3 (so it is in the form 1.XXXXX…): if F1 and F2 have the same sign, F3 lies in [1,4), so at most a 1-bit right shift of F3 and an increment of E3; if F1 and F2 have different signs, F3 may require many left shifts, each time decrementing E3.
Step 4: Round F3 and possibly normalize F3 again.
Step 5: Rehide the most significant bit of F3 before storing the result.
Note that the smaller significand is the one shifted (while increasing its exponent until it equals the larger exponent). Overflow/underflow can occur during addition and during normalization, and rounding may lead to overflow/underflow as well.
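A deliberately simplified Python sketch of steps 0-3 and 5 for two positive single-precision operands, using exact integer significands and dropping the shifted-out bits (so no guard/round/sticky handling, signs, or special cases); the function name and argument convention (biased exponents, 23-bit fraction fields) are assumptions for the example.

def fp_add_positive(e1, f1, e2, f2):
    sig1, sig2 = (1 << 23) | f1, (1 << 23) | f2    # step 0: restore the hidden bits
    if e1 < e2:                                    # keep the larger exponent in operand 1
        e1, sig1, e2, sig2 = e2, sig2, e1, sig1
    sig2 >>= (e1 - e2)                             # step 1: align (shifted-out bits dropped here)
    sig3, e3 = sig1 + sig2, e1                     # step 2: add significands
    if sig3 >= (1 << 24):                          # step 3: normalize (sum can reach [2,4))
        sig3, e3 = sig3 >> 1, e3 + 1
    return e3, sig3 & ((1 << 23) - 1)              # step 5: rehide the MSB

e3, f3 = fp_add_positive(126, 0, 128, 0x300000)    # 0.5 + 2.75
print(e3, hex(f3))                                 # 128 0x500000 -> 1.101two x 2^1 = 3.25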

77 MIPS Floating Point Instructions
MIPS has a separate floating point register file ($f0, $f1, …, $f31), whose registers are used in pairs for double precision values, with special instructions to load to and store from them:
lwc1 $f1,54($s2) #$f1 = Memory[$s2+54]
swc1 $f1,58($s4) #Memory[$s4+58] = $f1
And it supports IEEE 754 single precision operations
add.s $f2,$f4,$f6 #$f2 = $f4 + $f6
and double precision operations
add.d $f2,$f4,$f6 #$f2||$f3 = $f4||$f5 + $f6||$f7
and similarly sub.s, sub.d, mul.s, mul.d, div.s, div.d.

78 MIPS Floating Point Instructions, Con’t
And floating point single precision comparison operations
c.x.s $f2,$f4 #if($f2 < $f4) cond=1; else cond=0 (here x = lt; x may be eq, neq, lt, le, gt, ge)
and branch operations
bc1t 25 #if(cond==1) go to PC+4+(25×4)
bc1f 25 #if(cond==0) go to PC+4+(25×4)
And double precision comparison operations
c.x.d $f2,$f4 #if($f2||$f3 < $f4||$f5) cond=1; else cond=0

79 Floating point addition/subtraction algorithm

80

81 Addition Example 0.5 + 2.75 = 3.25 0.1two + 10.11two
0.5 = 0.1two = 1.0two x 2^-1 (normalized) and 2.75 = 10.11two = 1.011two x 2^1 (normalized). Align the smaller: 1.0 x 2^-1 = 0.010 x 2^1. Add: 0.010 x 2^1 + 1.011 x 2^1 = 1.101 x 2^1 (already normalised) = (1 + (1/2) + (1/8)) x 2 = 3.25. More examples:

82 Normalization cases

83 Alignment and Normalization issues

84 For Effective Addition

85 For Subtraction

86 Thus,

87 Round to Nearest (even). With L = LSB of the retained result, G = guard bit, R = round bit, S = sticky bit: rnd = G(R+S) + G·L·not(R+S) = G(L+R+S). That is, round up (rnd = 1) if G = 1 and (R or S) = 1, or on a tie (G = 1, R = S = 0) if L = 1, so ties go to the even result.
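A small illustrative Python version of that rule (the function name and the 4-bit retained value are assumptions for the example):

def round_to_nearest_even(retained, g, r, s):
    l = retained & 1                      # L = LSB of the retained result
    rnd = g & (l | r | s)                 # G(L + R + S)
    return retained + rnd

print(format(round_to_nearest_even(0b1010, 1, 0, 1), "04b"))   # 1011: G=1 and (R or S)=1 -> round up
print(format(round_to_nearest_even(0b1010, 1, 0, 0), "04b"))   # 1010: tie, LSB already even -> keep
print(format(round_to_nearest_even(0b1011, 1, 0, 0), "04b"))   # 1100: tie, LSB odd -> round to even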

88 Other Rounding Modes – and that's it for rounding!

89 Floating point addition/subtraction data path

90 Floating point multiplication algorithm
Reading:

91

92 Example

93 Reading: http://www.cs.umd

94 Next Lecture and Reminders
Addressing and understanding performance. Reading assignment – PH, Chapter 4. Tried SPIM yet? Exercise!!

