
CS35101 Computer Architecture Spring 2006 Week 8 P Durand (www.cs.kent.edu/~durand) [Adapted from MJI (www.cse.psu.edu/~mji)] [Adapted from Dave Patterson’s UCB CS152 slides]

Heads Up
- This week’s material
  - MIPS logic and multiply instructions
    - Reading assignment: PH 3.4
  - MIPS ALU design
    - Reading assignment: PH 3.5
- Reminders
- Next week’s material
  - Building a MIPS datapath
    - Reading assignment: PH 5.1-5.2

Review: MIPS Arithmetic Instructions
R-type: op | Rs | Rt | Rd | funct   (field boundaries at bits 31, 25, 20, 15, 5, 0)
I-type: op | Rs | Rt | Immed 16

Type   op   funct
ADD    00   100000
ADDU   00   100001
SUB    00   100010
SUBU   00   100011
AND    00   100100
OR     00   100101
XOR    00   100110
NOR    00   100111
       00   101000
       00   101001
SLT    00   101010
SLTU   00   101011
       00   101100

ALU operation codes (m): 0 add, 1 addu, 2 sub, 3 subu, 4 and, 5 or, 6 xor, 7 nor, a slt, b sltu
- expand immediates to 32 bits before ALU
- 10 operations so can encode in 4 bits
(figure: ALU with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result, and 1-bit zero and ovf outputs)

Review: A 32-bit Adder/Subtractor
- Built out of 32 full adders (FAs)
  (figure: 1-bit FAs chained from bit 0 to bit 31; cell i takes A_i, B_i and carry c_i and produces S_i and c_(i+1); c0 = carry_in, c32 = carry_out; an add/subt control line is shown)
- 1-bit FA: inputs A, B, carry_in; outputs S, carry_out
    S = A xor B xor carry_in
    carry_out = A&B | A&carry_in | B&carry_in   (majority function)
- Small but slow!
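The ripple-carry structure can be sketched in a few lines of Python. This is only an illustration, not the slide's hardware: the function names are made up here, and it assumes the add/subt line inverts each B bit and also drives c0, which is the usual way to get A + ~B + 1 = A - B.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder on 0/1 values: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # majority function
    return s, carry_out


def add_sub(a, b, subtract=0, width=32):
    """Ripple-carry add/subtract on width-bit words.

    Assumption: subtract=1 inverts every B bit and sets c0 = 1.
    Returns (result, carry_out).
    """
    carry = subtract  # c0 comes from the add/subt control
    result = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ subtract
        s_i, carry = full_adder(a_i, b_i, carry)  # carry ripples to the next cell
        result |= s_i << i
    return result, carry
```

The loop makes the "small but slow" point visible: bit i cannot finish until the carry from bit i-1 is known.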

Minimal Implementation of a Full Adder
  architecture concurrent_behavior of full_adder is
    signal t1, t2, t3, t4, t5: std_logic;
  begin
    t1 <= not A after 1 ns;
    t2 <= not cin after 1 ns;
    t4 <= not((A or cin) and B) after 2 ns;
    t3 <= not((t1 or t2) and (A or cin)) after 2 ns;
    t5 <= t3 nand B after 2 ns;
    S <= not((B or t3) and t5) after 2 ns;
    cout <= not((t1 or t2) and t4) after 2 ns;
  end concurrent_behavior;
- Can you create the equivalent schematic? Can you determine worst case delay (the worst case timing path through the circuit)?
- Gate library: inverters, 2-input nands, or-and-inverters
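One way to approach the two questions is to model the netlist in software. The Python sketch below is an illustration only: it assumes the cout assignment is meant to read not((t1 or t2) and t4), that all primary inputs arrive at time 0, and the names netlist, GATES and arrival are invented for this example.

```python
from itertools import product


def netlist(a, b, cin):
    """Evaluate the gate-level equations on 0/1 inputs; returns (S, cout)."""
    inv = lambda x: x ^ 1
    t1 = inv(a)
    t2 = inv(cin)
    t4 = inv((a | cin) & b)          # or-and-invert
    t3 = inv((t1 | t2) & (a | cin))
    t5 = inv(t3 & b)                 # 2-input nand
    s = inv((b | t3) & t5)
    cout = inv((t1 | t2) & t4)
    return s, cout


def matches_full_adder():
    """Compare the netlist with a + b + cin over all eight input rows."""
    for a, b, cin in product((0, 1), repeat=3):
        total = a + b + cin
        if netlist(a, b, cin) != (total & 1, total >> 1):
            return False
    return True


# Gate delay in ns and the signals each gate reads.
GATES = {
    "t1": (1, ["A"]),
    "t2": (1, ["cin"]),
    "t4": (2, ["A", "cin", "B"]),
    "t3": (2, ["t1", "t2", "A", "cin"]),
    "t5": (2, ["t3", "B"]),
    "S": (2, ["B", "t3", "t5"]),
    "cout": (2, ["t1", "t2", "t4"]),
}


def arrival(signal):
    """Latest time (ns) at which a signal settles; primary inputs settle at 0."""
    if signal not in GATES:
        return 0
    delay, inputs = GATES[signal]
    return delay + max(arrival(x) for x in inputs)
```

Under these assumptions arrival("S") evaluates to 7 ns (through t1 or t2, then t3, t5 and S) and arrival("cout") to 4 ns.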


Logic Operations
- Logic operations operate on individual bits of the operand.
    $t2 = 0…0 0000 1101 0000
    $t1 = 0…0 0011 1100 0000
    and $t0, $t1, $t2    $t0 = 0…0 0000 1100 0000
    or  $t0, $t1, $t2    $t0 = 0…0 0011 1101 0000
    xor $t0, $t1, $t2    $t0 = 0…0 0011 0001 0000
    nor $t0, $t1, $t2    $t0 = 1…1 1100 0010 1111
- How do we expand our FA design to handle the logic operations (and, or, xor, nor)?
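The four results can be checked with ordinary bitwise operators. This Python snippet is just a check of the slide's example values; the variable names mirror the registers, and the 32-bit mask is needed because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF

t2 = 0b0000_1101_0000
t1 = 0b0011_1100_0000

results = {
    "and": t1 & t2,
    "or": t1 | t2,
    "xor": t1 ^ t2,
    "nor": ~(t1 | t2) & MASK32,  # mask keeps the result to 32 bits
}

for name, value in results.items():
    print(f"{name:3} {value:032b}")
```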

A Simple ALU Cell
(figure: 1-bit FA with inputs A, B, carry_in and controls add/subt and op; outputs result and carry_out)

An Alternative ALU Cell
(figure: 1-bit FA with inputs A, B, carry_in and control lines s2, s1, s0; outputs result and carry_out)

The Alternative ALU Cell’s Control Codes
s2 s1 s0 c_in   result       function
0  0  0  0      A            transfer A
0  0  0  1      A + 1        increment A
0  0  1  0      A + B        add
0  0  1  1      A + B + 1    add with carry
0  1  0  0      A - B - 1    subt with borrow
0  1  0  1      A - B        subtract
0  1  1  0      A - 1        decrement A
0  1  1  1      A            transfer A
1  0  0  x      A or B       or
1  0  1  x      A xor B      xor
1  1  0  x      A and B      and
1  1  1  x      !A           complement A
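A word-level model of this table fits in one small Python function. It is a sketch under an assumption the slide does not spell out: in the arithmetic half (s2 = 0) the adder computes A + Y + c_in, where s1 s0 selects Y from 0, B, not B, or all ones. That choice reproduces every arithmetic row above. The function name alu is illustrative.

```python
def alu(a, b, s2, s1, s0, c_in=0, width=32):
    """Evaluate one row of the control-code table on width-bit words."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    select = (s1 << 1) | s0
    if s2 == 0:
        # Arithmetic: A + Y + c_in with Y = 0, B, ~B or all ones (assumed).
        y = (0, b, ~b & mask, mask)[select]
        return (a + y + c_in) & mask
    # Logic: c_in is a don't-care.
    return (a | b, a ^ b, a & b, ~a & mask)[select]
```

For example, alu(5, 3, 0, 1, 0, 1) takes the "subtract" row and returns 2.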

Tailoring the ALU to the MIPS ISA
- Need to support the set-on-less-than instruction (slt)
  - remember: slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a - b) < 0 implies a < b
- Need to support test for equality (beq)
  - use subtraction: (a - b) = 0 implies a = b
- Need to add the overflow detection hardware

Modifying the ALU Cell for slt
(figure: 1-bit FA cell with inputs A, B, carry_in and less, and controls add/subt and op; outputs result and carry_out)

Modifying the ALU for slt
(figure: 32 ALU cells, bit 0 through bit 31, each with inputs A_i, B_i and less and output result_i)
- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a positive (or zero) result

Modifying the ALU for slt
(figure: less inputs of the upper cells held at 0; a set output taken from the most significant cell)
- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a positive (or zero) result
  - tie the most significant sum bit (sign bit) to the low order less input
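The sign-bit trick can be modelled in Python as below. This is a sketch, with function names chosen for illustration. It also shows a limitation of the simple wiring: when the subtraction itself overflows, the raw sign bit disagrees with a true signed comparison.

```python
def to_signed(x, width=32):
    """Interpret a width-bit pattern as a two's complement integer."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def slt_sign_bit(a, b, width=32):
    """slt as wired on the slide: the sign bit of a - b becomes result bit 0."""
    mask = (1 << width) - 1
    diff = (a - b) & mask
    return diff >> (width - 1)


def slt_reference(a, b, width=32):
    """What slt should produce: 1 if a < b as signed numbers, else 0."""
    return int(to_signed(a, width) < to_signed(b, width))
```

With a = 0x7FFFFFFF and b = 0xFFFFFFFF (that is, -1), slt_sign_bit returns 1 while slt_reference returns 0, because a - b overflows.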


Modifying the ALU for Zero
(figure: the 32-cell ALU with add/subt and op controls, less inputs, the set output, and added logic that combines result 0 through result 31 into a zero output)
- First perform subtraction
- Insert additional logic to detect when all result bits are zero
- Note zero is a 1 when result is all zeros

Review: Overflow Detection
- Overflow: the result is too large to represent in the number of bits allocated
- Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive gives a negative
  - or, subtract a positive from a negative gives a positive
- On your own: Prove you can detect overflow by:
  - Carry into MSB xor Carry out of MSB
- Examples (4 bits):
    0111 (7) + 0011 (3) = 1010 (-6): carry into MSB = 1, carry out of MSB = 0
    1100 (-4) + 1011 (-5) = 0111 (7): carry into MSB = 0, carry out of MSB = 1
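Short of a proof, the carry rule can at least be checked exhaustively for small word sizes. The Python sketch below does that for every pair of operands; the function names are illustrative, and an exhaustive check at one width is evidence rather than a general proof.

```python
def add_with_carries(a, b, width):
    """Add two width-bit patterns.

    Returns (sum, carry into the MSB, carry out of the MSB).
    """
    mask = (1 << width) - 1
    low_mask = mask >> 1  # every bit below the MSB
    carry_in_msb = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = (a & mask) + (b & mask)
    return total & mask, carry_in_msb, total >> width


def to_signed(x, width):
    """Interpret a width-bit pattern as a two's complement integer."""
    return x - (1 << width) if x >> (width - 1) else x


def rule_holds(width=4):
    """True if (carry in xor carry out) flags exactly the overflowing sums."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    for a in range(1 << width):
        for b in range(1 << width):
            _, c_in, c_out = add_with_carries(a, b, width)
            true_sum = to_signed(a, width) + to_signed(b, width)
            overflow = not (lo <= true_sum <= hi)
            if overflow != bool(c_in ^ c_out):
                return False
    return True
```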

Modifying the ALU for Overflow
(figure: the 32-cell ALU with add/subt and op controls, less inputs, set and zero outputs, and an overflow output on the most significant cell)
- Modify the most significant cell to determine overflow output setting
- Disable overflow bit setting for unsigned arithmetic

But What about Performance?
- Critical path of n-bit ripple-carry adder is n*CP, where CP is the carry path delay of one 1-bit ALU
- Design trick: throw hardware at it (Carry Lookahead)
(figure: four 1-bit ALUs, bits 0 to 3, each with inputs A_i, B_i, CarryIn_i and outputs Result_i, CarryOut_i; each CarryOut feeds the next CarryIn)

Shift Operations
- Also need operations to pack and unpack 8-bit characters into 32-bit words
- Shifts move all the bits in a word left or right
    sll $t2, $s0, 8    # $t2 = $s0 << 8 bits
    srl $t2, $s0, 8    # $t2 = $s0 >> 8 bits
- Such shifts are logical because they fill with zeros
         op     rs    rt    rd    shamt funct
    sll: 000000 00000 10000 01010 01000 000000
    srl: 000000 00000 10000 01010 01000 000010
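Packing and unpacking characters with logical shifts might look like the following Python sketch. The helper names and the choice to place the first character in the high-order byte are assumptions for illustration; the slide does not fix a byte order.

```python
MASK32 = 0xFFFFFFFF


def sll(x, shamt):
    """Shift left logical on a 32-bit word: zeros enter from the right."""
    return (x << shamt) & MASK32


def srl(x, shamt):
    """Shift right logical on a 32-bit word: zeros enter from the left."""
    return (x & MASK32) >> shamt


def pack(chars):
    """Pack four 8-bit codes into one word, first code in the high byte."""
    word = 0
    for c in chars:
        word = sll(word, 8) | (c & 0xFF)
    return word


def unpack(word):
    """Recover the four 8-bit codes, high byte first."""
    return [srl(word, shift) & 0xFF for shift in (24, 16, 8, 0)]
```

For instance, pack(b"MIPS") gives 0x4D495053 and unpack reverses it.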

Shift Operations, con’t
- An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value; a number shifted left should be 2 times its original value)
  - so sra uses the most significant bit (sign bit) as the bit shifted in
  - note that there is no need for a sla when using two’s complement number representation
    sra $t2, $s0, 8    # $t2 = $s0 >> 8 bits
         op     rs    rt    rd    shamt funct
    sra: 000000 00000 10000 01010 01000 000011
- The shift operation is implemented by hardware (usually a barrel shifter) outside the ALU
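The difference between a logical and an arithmetic right shift comes down to which bit is shifted in. A minimal Python model of sra on 32-bit patterns, with an illustrative function name, is:

```python
def sra(x, shamt, width=32):
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    mask = (1 << width) - 1
    x &= mask
    sign = x >> (width - 1)
    shifted = x >> shamt
    if sign:
        # Set the top shamt bits to 1 to replicate the sign bit.
        shifted |= (mask << (width - shamt)) & mask
    return shifted
```

For example, 0xFFFFFF00 (that is, -256) shifted right by 8 gives 0xFFFFFFFF (-1), whereas a logical shift would give 0x00FFFFFF.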

Multiply
- Binary multiplication is just a bunch of right shifts and adds
  (figure: an n-bit multiplicand and an n-bit multiplier form a partial product array, which sums to the 2n-bit double precision product)
- The partial products can be formed in parallel and added in parallel for faster multiplication

Multiplication
- More complicated than addition
  - accomplished via shifting and addition
          0010    (multiplicand)
      x   1011    (multiplier)
          ----
          0010
         0010     (partial product
        0000       array)
       0010
      --------
      00010110    (product)
- Double precision product produced
- More time and more area to compute
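The worked example can be reproduced with a shift-and-add loop. This Python sketch handles unsigned operands only, and the function name is chosen for illustration.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Unsigned n-bit by n-bit multiply.

    Returns (product, partial_products), one partial product per multiplier bit,
    least significant bit first.
    """
    product = 0
    partials = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        # Each partial product is the multiplicand shifted to bit position i, or 0.
        partial = (multiplicand << i) if bit else 0
        partials.append(partial)
        product += partial
    return product, partials
```

Calling shift_add_multiply(0b0010, 0b1011) yields the product 0b00010110 (22), matching the array above.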

MIPS Multiply Instruction
- Multiply produces a double precision product
    mult $s0, $s1    # hi||lo = $s0 * $s1
    op rs rt rd shamt funct
  - Low-order word of the product is left in processor register lo and the high-order word is left in register hi
  - Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
- Multiplies are done by fast, dedicated hardware and are much more complex (and slower) than adders
- Hardware dividers are even more complex and even slower; ditto for hardware square root

MIPS Multiply Instruction
    mult $s0, $s1    # hi||lo = $s0 * $s1
    op     rs    rt    rd    shamt funct
    000000 10000 10001 00000 00000 011000
- Low-order word of the product is left in processor register lo and the high-order word is left in register hi
- Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
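How the 64-bit product splits across hi and lo can be modelled as follows. This Python sketch treats the operands as signed 32-bit values, as mult does; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs, rt):
    """Signed 32 x 32 multiply; returns the 64-bit product as (hi, lo) words."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return product >> 32, product & MASK32
```

A following mflo would read the second element and mfhi the first.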

Multiplication: Implementation
(figure: datapath and control)

Final Version
(figure annotations: "What goes here?"; the multiplier starts in the right half of the product)

Division
- Division is just a bunch of quotient digit guesses and left shifts and subtracts
  (figure: a dividend and an n-bit divisor, a partial remainder array, and an n-bit quotient and n-bit remainder)

MIPS Divide Instruction
- Divide generates the remainder in hi and the quotient in lo
    div $s0, $s1    # lo = $s0 / $s1
                    # hi = $s0 mod $s1
    op rs rt rd shamt funct
  - Instructions mfhi rd and mflo rd are provided to move the quotient and remainder to (user accessible) registers in the register file
- As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.

Division: Implementation

Division Implementation

Integer Division – Example 1

Improved Division Hardware
Note that the divisor register, the ALU, and the quotient register are 32 bits wide. The remainder register is still 64 bits. The quotient register is combined with the right half of the remainder register.
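The shared remainder/quotient register can be imitated in Python as below. This is a simplified sketch for unsigned operands, not the exact step sequence of the hardware: it shifts first and then subtracts, and it relies on Python's unbounded integers to hold the extra bit that the left half can briefly need. The function name is made up, and the zero check stands in for the test that software is expected to perform.

```python
def divide_unsigned(dividend, divisor, n=32):
    """Unsigned division with one 2n-bit register; returns (quotient, remainder).

    The dividend starts in the right half. Each step shifts the register left,
    then subtracts the divisor from the left half if it fits and records a
    quotient bit of 1 in the vacated low bit.
    """
    mask = (1 << n) - 1
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("software must check the divisor first")
    rem = dividend & mask
    for _ in range(n):
        rem <<= 1
        if (rem >> n) >= divisor:
            rem -= divisor << n  # subtract from the left half only
            rem |= 1             # quotient bit enters on the right
    return rem & mask, rem >> n
```

After n steps the right half holds the quotient and the left half the remainder, so divide_unsigned(7, 2, n=4) returns (3, 1).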

Integer Division – Example 3

Floating Point (a brief look)
- We need a way to represent
  - numbers with fractions, e.g., 3.1416
  - very small numbers, e.g., .000000001
  - very large numbers, e.g., 3.15576 × 10^9
- Representation: sign, exponent, significand: (-1)^sign × significand × 2^exponent
  - more bits for significand gives more accuracy
  - more bits for exponent increases range
- IEEE 754 floating point standard:
  - single precision: 8 bit exponent, 23 bit significand
  - double precision: 11 bit exponent, 52 bit significand

IEEE 754 floating-point standard
- Leading “1” bit of significand is implicit
- Exponent is “biased” to make sorting easier
  - all 0s is smallest exponent, all 1s is largest
  - bias of 127 for single precision and 1023 for double precision
  - summary: (-1)^sign × (1 + significand) × 2^(exponent - bias)
- Example:
  - decimal: -.75 = -(½ + ¼)
  - binary: -.11 = -1.1 × 2^-1
  - floating point: exponent = 126 = 01111110
  - IEEE single precision: 1 01111110 10000000000000000000000
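The -.75 example can be checked with Python's struct module, which exposes the single precision bit pattern of a value. The helper names are illustrative, and fields_to_value applies the summary formula above, so it is only meant for normalized numbers.

```python
import struct


def float_to_fields(x):
    """Return (bits, sign, exponent, fraction) of x as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return bits, sign, exponent, fraction


def fields_to_value(sign, exponent, fraction, bias=127):
    """Apply (-1)^sign x (1 + significand) x 2^(exponent - bias)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - bias)
```

float_to_fields(-0.75) gives sign 1, exponent 126 and a fraction whose only set bit is the leading one, in agreement with the slide.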

Floating Point Complexities
- Operations are somewhat more complicated (see text)
- In addition to overflow we can have “underflow”
- Accuracy can be a big problem
  - IEEE 754 keeps two extra bits, guard and round
  - four rounding modes
  - positive divided by zero yields “infinity”
  - zero divided by zero yields “not a number”
  - other complexities
- Implementing the standard can be tricky
- Not using the standard can be even worse
  - see text for description of 80x86 and Pentium bug!

Representing Big (and Small) Numbers
- What if we want to encode the approx. age of the earth?
    4,600,000,000 or 4.6 x 10^9
  or the weight in kg of one a.m.u. (atomic mass unit)
    0.0000000000000000000000000166 or 1.66 x 10^-27
  There is no way we can encode either of the above in a 32-bit integer.
- Floating point representation: (-1)^sign x F x 2^E
  - Still have to fit everything in 32 bits (single precision)
      s     | E (exponent) | F (fraction)
      1 bit | 8 bits       | 23 bits
  - The base (2, not 10) is hardwired in the design of the FPALU
  - More bits in the fraction (F) or the exponent (E) is a trade-off between precision (accuracy of the number) and range (size of the number)

IEEE 754 FP Standard Encoding
- Most (all?) computers these days conform to the IEEE 754 floating point standard: (-1)^sign x (1+F) x 2^(E-bias)
  - Formats for both single and double precision
  - F is stored in normalized form where the msb in the fraction is 1 (so there is no need to store it!), called the hidden bit
  - To simplify sorting FP numbers, E comes before F in the word and E is represented in excess (biased) notation

Single Precision       Double Precision       Object Represented
E (8)    F (23)        E (11)    F (52)
0        0             0         0            true zero (0)
0        nonzero       0         nonzero      ± denormalized number
1-254    anything      1-2046    anything     ± floating point number
255      0             2047      0            ± infinity
255      nonzero       2047      nonzero      not a number (NaN)
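The single precision half of the table translates directly into a classifier over the E and F fields. This Python sketch uses illustrative function names and returns labels that follow the table's last column.

```python
import struct


def classify_single(bits):
    """Classify a 32-bit IEEE 754 single precision pattern by its E and F fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


def bits_of(x):
    """Bit pattern of x stored as an IEEE 754 single."""
    return struct.unpack(">I", struct.pack(">f", x))[0]
```

The sign bit plays no part in the classification, so both 0x00000000 and 0x80000000 come back as "zero".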

Floating Point Addition
- Addition (and subtraction): (±F1 × 2^E1) + (±F2 × 2^E2) = ±F3 × 2^E3
  - Step 0: Restore the hidden bit in F1 and in F2
  - Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming E1 ≥ E2), keeping track of (three of) the bits shifted out in a round bit, a guard bit, and a sticky bit
  - Step 2: Add the resulting F2 to F1 to form F3
  - Step 3: Normalize F3 (so it is in the form 1.XXXXX …)
    - If F1 and F2 have the same sign, F3 ∈ [1,4), so at most a 1 bit right shift of F3 (incrementing E3) is needed
    - If F1 and F2 have different signs, F3 may require many left shifts, each time decrementing E3
  - Step 4: Round F3 and possibly normalize F3 again
  - Step 5: Rehide the most significant bit of F3 before storing the result

Floating Point Addition
(figure)

MIPS Floating Point Instructions
- MIPS has a separate Floating Point Register File ($f0, $f1, …, $f31) (whose registers are used in pairs for double precision values) with special instructions to load to and store from them
    lwc1 $f1,54($s2)     # $f1 = Memory[$s2+54]
    swc1 $f1,58($s4)     # Memory[$s4+58] = $f1
- And supports IEEE 754 single
    add.s $f2,$f4,$f6    # $f2 = $f4 + $f6
  and double precision operations
    add.d $f2,$f4,$f6    # $f2||$f3 = $f4||$f5 + $f6||$f7
  similarly for sub.s, sub.d, mul.s, mul.d, div.s, div.d

MIPS Floating Point Instructions, Con’t
- And floating point single precision comparison operations
    c.x.s $f2,$f4    # if ($f2 < $f4) cond=1; else cond=0   (shown for x = lt)
  where x may be eq, neq, lt, le, gt, ge
  and branch operations
    bc1t 25          # if (cond==1) go to PC+4+100   (offset of 25 words)
    bc1f 25          # if (cond==0) go to PC+4+100   (offset of 25 words)
- And double precision comparison operations
    c.x.d $f2,$f4    # if ($f2||$f3 < $f4||$f5) cond=1; else cond=0

Control Flow for Floating Point Multiplication

Review: MIPS ISA, so far

Instr               Op Code    Example              Meaning
Arithmetic (R & I format)
add                 0 and 32   add $s1, $s2, $s3    $s1 = $s2 + $s3
add unsigned        0 and 33   addu $s1, $s2, $s3   $s1 = $s2 + $s3
subtract            0 and 34   sub $s1, $s2, $s3    $s1 = $s2 - $s3
subt unsigned       0 and 35   subu $s1, $s2, $s3   $s1 = $s2 - $s3
add immediate       8          addi $s1, $s2, 6     $s1 = $s2 + 6
add imm. unsigned   9          addiu $s1, $s2, 6    $s1 = $s2 + 6
multiply            0 and 24   mult $s1, $s2        hi || lo = $s1 * $s2
multiply unsigned   0 and 25   multu $s1, $s2       hi || lo = $s1 * $s2
divide              0 and 26   div $s1, $s2         lo = $s1/$s2, rem. in hi
divide unsigned     0 and 27   divu $s1, $s2        lo = $s1/$s2, rem. in hi
Logical (R & I format)
and                 0 and 36   and $s1, $s2, $s3    $s1 = $s2 & $s3
or                  0 and 37   or $s1, $s2, $s3     $s1 = $s2 | $s3
xor                 0 and 38   xor $s1, $s2, $s3    $s1 = $s2 xor $s3
nor                 0 and 39   nor $s1, $s2, $s3    $s1 = !($s2 | $s3)
and immediate       12         andi $s1, $s2, 6     $s1 = $s2 & 6
or immediate        13         ori $s1, $s2, 6      $s1 = $s2 | 6
xor immediate       14         xori $s1, $s2, 6     $s1 = $s2 xor 6

Review: MIPS ISA, so far con’t

Instr                Op Code    Example             Meaning
Shift (R format)
sll                  0 and 0    sll $s1, $s2, 4     $s1 = $s2 << 4
srl                  0 and 2    srl $s1, $s2, 4     $s1 = $s2 >> 4
sra                  0 and 3    sra $s1, $s2, 4     $s1 = $s2 >> 4
Data Transfer (I format)
load word            35         lw $s1, 24($s2)     $s1 = Memory($s2+24)
store word           43         sw $s1, 24($s2)     Memory($s2+24) = $s1
load byte            32         lb $s1, 25($s2)     $s1 = Memory($s2+25)
load byte unsigned   36         lbu $s1, 25($s2)    $s1 = Memory($s2+25)
store byte           40         sb $s1, 25($s2)     Memory($s2+25) = $s1
load upper imm       15         lui $s1, 6          $s1 = 6 * 2^16
move from hi         0 and 16   mfhi $s1            $s1 = hi
move to hi           0 and 17   mthi $s1            hi = $s1
move from lo         0 and 18   mflo $s1            $s1 = lo
move to lo           0 and 19   mtlo $s1            lo = $s1

Review: MIPS ISA, so far con’t

Instr                            Op Code    Example              Meaning
Cond. Branch (I & R format)
br on equal                      4          beq $s1, $s2, L      if ($s1==$s2) go to L
br on not equal                  5          bne $s1, $s2, L      if ($s1 !=$s2) go to L
set on less than                 0 and 42   slt $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
set on less than unsigned        0 and 43   sltu $s1, $s2, $s3   if ($s2<$s3) $s1=1 else $s1=0
set on less than immediate       10         slti $s1, $s2, 6     if ($s2<6) $s1=1 else $s1=0
set on less than imm. unsigned   11         sltiu $s1, $s2, 6    if ($s2<6) $s1=1 else $s1=0
Uncond. Jump (J & R format)
jump                             2          j 2500               go to 10000
jump and link                    3          jal 2500             go to 10000; $ra=PC+4
jump register                    0 and 8    jr $s1               go to $s1
jump and link reg                0 and 9    jalr $s1, $s2        go to $s1, $s2=PC+4
