Computer arithmetic: Some things you should know about digital arithmetic. Principles, Architecture, Design
Multiplication
- Can be done in one clock cycle, but:
  - Very slow
  - Needs a lot of hardware
Multiplication
  [Figure: one cell of the multiplier array, with signals m, n, Si, Ci, So and Co]
Multiplication
  [Figure: 4 x 4 multiplier array; inputs a0 ... a3 and b0 ... b3, partial products such as a0b3, product bits p7 ... p0]
To avoid those costs:
- Multiplication is usually multiple-cycle, for example repeated add and shift
- In the MIPS: “4 - 12 cycles for mult” (Databook s 3.9)
- The multiply instruction is not implemented in our simulator
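The repeated add and shift scheme can be sketched in a few lines. This is only an illustration, not the MIPS hardware: the name `shift_add_multiply`, the unsigned operands and the default 32-bit width are assumptions made for the example, and each loop iteration stands in for one step of the multiplier.

```python
def shift_add_multiply(a, b, width=32):
    """Multiply two unsigned width-bit integers by repeated add and shift.

    Hypothetical helper for illustration; one loop iteration models one step.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for _ in range(width):
        if b & 1:        # lowest multiplier bit is set: add the multiplicand
            product += a
        a <<= 1          # shift the multiplicand left for the next bit position
        b >>= 1          # move on to the next multiplier bit
    return product       # up to 2 * width bits wide
```

The loop needs `width` add and shift steps, which is why a simple implementation is spread over several cycles instead of one long one.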
Division... is even worse....
- Multiple cycle
- Repeated shift - subtract - test
- Databook ... instruction uses 35 cycles
- The divide instruction is not implemented in our simulator
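A rough sketch of the repeated shift, test and subtract loop is given here. It assumes unsigned operands and a 32-bit default width, and the name `shift_subtract_divide` is made up for the example; it is not a description of the actual MIPS divider.

```python
def shift_subtract_divide(dividend, divisor, width=32):
    """Unsigned division by repeated shift, test and subtract.

    Hypothetical helper for illustration. Returns (quotient, remainder).
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Shift: bring the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Test: does the divisor fit into the partial remainder?
        if remainder >= divisor:
            remainder -= divisor   # Subtract, and record a 1 in the quotient.
            quotient |= 1 << i
    return quotient, remainder
```

One quotient bit is produced per iteration, so a 32-bit divide takes on the order of 32 steps.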
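Division
- D / 2^n can be done by shifting D right n bits
- When D is negative, a plain arithmetic shift rounds toward minus infinity, so for n = 1:
  - If D is even: D arithmetic shift right by 1
  - If D is odd: (D + 1) arithmetic shift right by 1
- For general n, add 2^n - 1 to a negative D before the arithmetic shift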
Example: -1 / 2
- -1 arith. shift right 1: 111 -> 111, result: -1 (wrong)
- (-1 + 1) arith. shift right 1: 000 -> 000, result: 0 (OK)
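The correction for negative numbers can be checked with a small sketch. It assumes the wanted result is truncated toward zero, and it uses the bias 2^n - 1, which for n = 1 is the same as adding 1 to an odd D. The name `div_pow2` is hypothetical. Python's `>>` on a negative integer behaves as an arithmetic shift.

```python
def div_pow2(d, n):
    """Divide d by 2**n, truncating toward zero, using only add and shift.

    Hypothetical helper for illustration.
    """
    if d < 0:
        d += (1 << n) - 1   # bias so that the shift rounds toward zero
    return d >> n           # arithmetic shift right by n


plain_shift = -1 >> 1        # the uncorrected shift from the example
corrected = div_pow2(-1, 1)  # the corrected version
```

Here `plain_shift` is -1 (the wrong answer) and `corrected` is 0.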
In the MIPS, multiply and divide use special hardware (not the ALU) and the special registers “HI” and “LO” (not in our simulator)
Floating point?
- Needs its own hardware!
- Co-processor, usually a separate chip
  [Figure: main (integer) CPU with co-processors CP0 (control) and CP1 (floating point)]
So the ALU does ADD, SUBTRACT and SIMPLE LOGIC. Simple logic is fast, but add / sub is slow because of the long critical path.
Add two numbers
  Carry from step n-1:  1100
                      ...010
                    + ...110
  Sum:                   000
- 3 input bits in each step
  [Figure: full adder with inputs A0, B0, Cin and outputs S0, Cout]
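The full adder
  A  B  Ci | S  Co
  0  0  0  | 0  0
  0  0  1  | 1  0
  0  1  0  | 1  0
  0  1  1  | 0  1
  1  0  0  | 1  0
  1  0  1  | 0  1
  1  1  0  | 0  1
  1  1  1  | 1  1
- Built as 2-level logic, or from gates:
  [Figure: gate-level full adder built from XOR (=1) and AND (&) gates; inputs A, B, Ci; outputs S, Co]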
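The truth table can be reproduced from the logic equations. In this sketch the sum is a three-input XOR and the carry out is written in 2-level (majority) form. The names `full_adder` and `truth_table` are chosen for the example.

```python
from itertools import product


def full_adder(a, b, ci):
    """One-bit full adder. Returns (s, co)."""
    s = a ^ b ^ ci                       # sum bit
    co = (a & b) | (a & ci) | (b & ci)   # carry out, 2-level AND/OR form
    return s, co


# Rows are (A, B, Ci, S, Co), in the same order as the table.
truth_table = [(a, b, ci) + full_adder(a, b, ci)
               for a, b, ci in product((0, 1), repeat=3)]
```

Each row satisfies `2 * Co + S == A + B + Ci`, which is a quick way to check the table.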
The carry chain
  [Figure: chain of full adders with inputs A31 B31 ... A1 B1 and outputs S31 ... S1; the carry runs from Cin to Cout]
Addition
  [Figure: 32-bit ripple adder; full adders with inputs A31 B31 ... A0 B0 and outputs S31, S30, ...]
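The chain of full adders in the figure can be modelled bit by bit. This is a sketch under assumptions: unsigned operands, a default width of 32, and the hypothetical name `ripple_carry_add`. The variable `carry` plays the role of the carry chain, so bit i cannot be computed before bit i-1.

```python
def ripple_carry_add(a, b, cin=0, width=32):
    """Add two width-bit unsigned integers one full adder at a time.

    Hypothetical helper for illustration. Returns (sum, carry_out).
    """
    result = 0
    carry = cin
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry                       # sum bit of stage i
        carry = (ai & bi) | ((ai ^ bi) & carry)   # carry into stage i + 1
        result |= s << i
    return result, carry
```

For example, `ripple_carry_add(0b010, 0b110, width=3)` gives a sum of 0 with carry out 1, since 2 + 6 = 8 does not fit in three bits.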
Subtraction
  A - B = A + Neg(B) = A + Not(B) + 1
- Neg(B) is the two’s complement of B, which is the one’s complement Not(B) plus 1
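A short sketch of subtraction done with an adder: B is inverted and the +1 is supplied as the carry in. The name `subtract` and the default 32-bit width are assumptions for the example, and the result is reduced modulo 2^width as in a fixed-width register.

```python
def subtract(a, b, width=32):
    """Compute (a - b) modulo 2**width as a + not(b) + 1.

    Hypothetical helper for illustration.
    """
    mask = (1 << width) - 1
    not_b = ~b & mask               # one's complement of b
    return (a + not_b + 1) & mask   # the + 1 acts as the adder's carry in
```

With this, `subtract(7, 5)` is 2 and `subtract(5, 7)` is 0xFFFFFFFE, the 32-bit two's complement pattern for -2.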
Add and subtract
  [Figure: adder / subtractor; XOR (=1) gates on the B inputs (B31, ...) in front of the adder, next to the A inputs (A31, ...)]
Timing analysis
- There are six gates per stage*
- There are 32 stages
- The critical path is 6 * 32 = 192 gate delays! (Ripple adder)
- We must break up that carry chain!
  * XORs count as two gate levels
Full adder again:
  S  = A xor B xor Ci
  Co = (A and B) or ((A xor B) and Ci)
- We define
  P = A xor B
  G = A and B
- And we get
  S  = P xor Ci
  Co = G or (P and Ci)
- P and G are computed quickly!
  [Figure: P from one XOR (=1) gate and G from one AND (&) gate, both on A and B]
The full adder ....
  Si = Pi xor C(i-1)
  Ci = Gi or (Pi and C(i-1))
- If we could be given all of the Ci at the same time, Si is just one more xor
The full adder
  C0 = G0 or (P0 and Cin)
  C1 = G1 or (P1 and C0)
  C1 = G1 or (P1 and (G0 or (P0 and Cin)))
  C1 = G1 or P1G0 or P1P0Cin
- In the k:th position:
  Ck = Gk or PkG(k-1) or .... or PkP(k-1)....P0Cin
- This needs a wide OR of wide ANDs
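The expanded carry formula can be tried out in software. In this sketch every Ck is computed only from the P and G signals and Cin, never from another carry, which is the point of the lookahead. The function names and the default width are assumptions, and the nested loops only model the wide AND and OR terms; in hardware those terms are evaluated in parallel.

```python
def lookahead_carries(a, b, cin=0, width=32):
    """Return [C0, ..., C(width-1)] from the expanded P/G formula.

    Ck = Gk or Pk*G(k-1) or ... or Pk*...*P1*G0 or Pk*...*P0*Cin
    Hypothetical helper for illustration.
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    carries = []
    for k in range(width):
        ck = g[k]
        prefix = 1
        for j in range(k, -1, -1):
            prefix &= p[j]                       # Pk * ... * Pj
            lower = g[j - 1] if j > 0 else cin   # G(j-1), or Cin at the bottom
            ck |= prefix & lower                 # one AND term of the wide OR
        carries.append(ck)
    return carries


def lookahead_add(a, b, cin=0, width=32):
    """Add using the lookahead carries. Returns (sum, carry_out)."""
    c = lookahead_carries(a, b, cin, width)
    carry_in = [cin] + c[:-1]                    # carry into each bit position
    total = 0
    for i in range(width):
        p_i = ((a >> i) ^ (b >> i)) & 1
        total |= (p_i ^ carry_in[i]) << i        # Si = Pi xor C(i-1)
    return total, c[-1]
```

For example, `lookahead_add(1234, 5678)` returns `(6912, 0)`, the same result as an ordinary addition.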
The carry lookahead adder
  [Figure: A and B (32 bits) enter a P / G generator; G, P and Cin feed the carry block (two level logic), which produces C; the final add (exor) produces S]
At the worst...
- An N-input AND (OR) has a delay of log2(N) times the delay of a 2-input gate
The combination of carry lookahead and ripple carry
  [Figure: lookahead blocks producing the group signals G0,3, P4,7, G4,7, P8,11, G8,11, P12,15, G12,15; carries C4, C8, C12 pass between the blocks]
The carry skip adder
  [Figure: AND (&) / OR (≥1) chain producing C4, C8, C12 from C0 and the group signals P0,3 ... P12,15 and G0,3 ... G12,15]
- If the full adder in step n generates a carry, it will be correct independent of carry in.
- A carry generated in step n is propagated through the and / or gates, not through the adders
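The group signals in the figure can be sketched as follows. The assumptions here are that Pi,j means "every position i..j propagates", that Gi,j means "positions i..j produce a carry out on their own", and that C4, C8, C12 are the carries into bits 4, 8 and 12. The function names, the 16-bit width and the 4-bit block size are chosen for the example.

```python
def group_pg(a, b, lo, hi):
    """Group propagate / generate for bit positions lo..hi (inclusive).

    Hypothetical helper for illustration. Returns (group_p, group_g).
    """
    group_p, group_g = 1, 0
    for i in range(lo, hi + 1):
        p = ((a >> i) ^ (b >> i)) & 1
        g = ((a >> i) & (b >> i)) & 1
        group_g = g | (p & group_g)   # generated here, or passed up from below
        group_p &= p                  # every position must propagate
    return group_p, group_g


def block_carries(a, b, cin=0, width=16, block=4):
    """Carry into each block boundary, e.g. {4: C4, 8: C8, 12: C12, 16: C16}."""
    carries = {}
    c = cin
    for lo in range(0, width, block):
        p, g = group_pg(a, b, lo, lo + block - 1)
        c = g | (p & c)               # one AND / OR step per block
        carries[lo + block] = c
    return carries
```

For `block_carries(0x00FF, 0x0001)` the lowest block generates a carry (C4 = 1), the next block only propagates it (C8 = 1), and the third block stops it (C12 = 0). The carry crosses each block in one AND / OR step instead of four full adder steps.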
The carry select adder
  [Figure: blocks of full adders on A and B; the upper blocks are duplicated, one copy with carry in 0 and one with carry in 1, and AND (&) / OR (≥1) gates select the sums S and the carry out]
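The carry select idea can be sketched like this: each block is added twice, once assuming carry in 0 and once assuming carry in 1, and the real carry only picks one of the two finished results. The names, the 16-bit width and the 4-bit block size are assumptions for the example, and `width` is assumed to be a multiple of `block`.

```python
def add_block(a, b, cin, width):
    """Add the low width bits of a and b. Returns (sum, carry_out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + cin
    return total & mask, total >> width


def carry_select_add(a, b, cin=0, width=16, block=4):
    """Carry select addition, block by block. Returns (sum, carry_out).

    Hypothetical helper for illustration.
    """
    result = 0
    carry = cin
    for lo in range(0, width, block):
        a_blk = a >> lo
        b_blk = b >> lo
        sum0, cout0 = add_block(a_blk, b_blk, 0, block)   # assumes carry in 0
        sum1, cout1 = add_block(a_blk, b_blk, 1, block)   # assumes carry in 1
        # The actual carry only selects between the two precomputed results.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << lo
    return result, carry
```

In hardware both versions of every block are computed at the same time, so only the selection has to wait for the carry. The price is the duplicated adders.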
Asymptotic time and space requirements
                   Time        Space
  Ripple carry     O(n)        O(n)
  Carry lookahead  O(log n)    O(n log n)
  Carry skip       O(sqrt n)   O(n)
  Carry select     O(sqrt n)   O(n)