Topic 3b Computer Arithmetic: ALU Design Introduction to Computer Systems Engineering (CPEG 323) 2019/1/2 cpeg323-05F\Topic3b
Design Process
Design: the components and how they are put together
Top-down decomposition of complex functions
Bottom-up composition of primitives
Problem: Design an ALU
Operations:
add, addu, sub, subu, addi, addiu: 2’s complement adder/subtractor with overflow detection
and, or, andi, ori: bitwise operations
Total number of operations = 10
Design: divide & conquer method
Break the problem into simpler parts
Work on the parts
Put the pieces together
Verify that the solution works as a whole
Example: separate the immediate instructions from the rest
Process immediates before the ALU, so the ALU inputs are uniform
6 non-immediate operations remain
Need 3 bits to specify the ALU mode
Design – First Steps
Complete the functional specification first
Inputs: 2 x 32-bit operands A, B; 3-bit operation code
Outputs: 32-bit result R, 1-bit carry, 1-bit overflow
Operations: add, addu, sub, subu, and, or
Complete the high-level block diagram next
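The functional specification above can be captured as a small behavioral reference model before any gates are drawn. The sketch below is only an illustration: the function name `alu`, the use of strings instead of a 3-bit operation code, and the choice to report overflow only for the signed `add` and `sub` operations are assumptions, not part of the slides.

```python
MASK = 0xFFFFFFFF  # 32-bit operands and result


def alu(a: int, b: int, op: str) -> tuple[int, int, int]:
    """Return (R, carry, overflow) for one of the six operations.

    Assumption: operations are named by strings here; the slides only
    say that a 3-bit operation code selects among them.
    """
    a &= MASK
    b &= MASK
    if op == "and":
        return a & b, 0, 0
    if op == "or":
        return a | b, 0, 0
    if op in ("sub", "subu"):
        b = ~b & MASK      # complement B ...
        carry_in = 1       # ... and add 1 through the carry input
    elif op in ("add", "addu"):
        carry_in = 0
    else:
        raise ValueError(f"unknown operation: {op}")
    total = a + b + carry_in
    result = total & MASK
    carry = total >> 32
    # Signed overflow: both adder inputs have the same sign bit,
    # and the result's sign bit differs from it.
    overflow = (((a ^ result) & (b ^ result)) >> 31) & 1
    if op in ("addu", "subu"):
        overflow = 0       # assumption: unsigned variants do not flag overflow
    return result, carry, overflow
```

For example, `alu(0x7FFFFFFF, 1, "add")` reports overflow, while the same operands with `"addu"` do not.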
Design – Reducing the problem to something simpler
For our ALU, reduce the 32-bit problem to simpler 1-bit slices
This changes a big combinational problem into a small combinational problem
Put the pieces together to solve the big problem
Designing with lower-level block diagrams
1-bit ALU block
Replicate 32 times for a 32-bit ALU
The 1-Bit ALU Block
Partition into separate, independent blocks: logic and arithmetic
Complete each block at this level or refine it further
Complete the logic block
Complete the function select
Decompose the arithmetic block into simpler parts
1-bit Add
Computing A + B
Sum = (a' * b' * Ci) + (a' * b * Ci') + (a * b' * Ci') + (a * b * Ci), where x' denotes NOT x
Co = (a * Ci) + (b * Ci) + (a * b)
This is called a full adder. A half adder assumes no Ci.
Can you draw the 1-bit adder according to the above logic?
# of gate delays for Sum = 3
# of gate delays for Carry = 2
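The full-adder equations can be checked exhaustively against ordinary integer addition. This is a minimal sketch; the function names `full_adder` and `half_adder` are illustrative, and the sum is written as the four product terms in which an odd number of inputs are 1.

```python
def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """Return (sum, carry_out) for single-bit inputs a, b and carry-in ci."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci  # complemented inputs
    # Sum is 1 when exactly one or all three inputs are 1.
    s = (na & nb & ci) | (na & b & nci) | (a & nb & nci) | (a & b & ci)
    # Carry out is 1 when at least two inputs are 1.
    co = (a & ci) | (b & ci) | (a & b)
    return s, co


def half_adder(a: int, b: int) -> tuple[int, int]:
    """A half adder is a full adder with no carry-in."""
    return full_adder(a, b, 0)


def check_all_inputs() -> bool:
    """Compare the gate equations with integer addition on all 8 inputs."""
    for a in (0, 1):
        for b in (0, 1):
            for ci in (0, 1):
                s, co = full_adder(a, b, ci)
                if 2 * co + s != a + b + ci:
                    return False
    return True
```

Calling `check_all_inputs()` walks the whole truth table, which is practical here because a 1-bit slice has only eight input combinations.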
1-bit Subtraction
Convert subtraction to addition: A - B = A + (NOT B) + 1
XOR complements the input B
Setting CarryIn adds 1 (at the least significant bit)
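The subtraction trick can be sketched as a ripple of 1-bit slices in which a single control bit both complements B through XOR and supplies the carry-in of the least significant bit. The names `add_sub` and `subtract`, and the default width, are assumptions made for this illustration.

```python
def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """One bit slice: returns (sum, carry_out)."""
    s = a ^ b ^ ci
    co = (a & ci) | (b & ci) | (a & b)
    return s, co


def add_sub(a: int, b: int, subtract: int, n: int = 32) -> int:
    """Compute a + b (subtract=0) or a - b (subtract=1) on n bits.

    The result is returned as an unsigned n-bit pattern, so a negative
    difference appears in 2's complement form.
    """
    carry = subtract  # CarryIn of the least significant bit adds the +1
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract  # XOR complements B when subtracting
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result
```

With 8-bit operands, `add_sub(5, 7, 1, 8)` yields the bit pattern `0xFE`, which is -2 in 2's complement.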
Completing the ALU
Overflow detection & opcode decoder
Overflow
Overflow can be detected by comparing the carry into the MSB with the carry out of the MSB
Overflow Detection Logic
Carry into MSB XOR Carry out of MSB
For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]
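The overflow rule can be tried out on a small ripple adder that records the carry entering the most significant bit. This is a sketch under assumed names (`add_with_overflow`, an 8-bit default width); it models 2's complement addition only.

```python
def add_with_overflow(a: int, b: int, n: int = 8) -> tuple[int, int, int]:
    """Add two n-bit patterns; return (result, carry_out, overflow)."""
    carry = 0
    carry_into_msb = 0
    result = 0
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry  # CarryIn[N - 1]
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    # After the loop, carry is CarryOut[N - 1].
    overflow = carry_into_msb ^ carry
    return result, carry, overflow
```

In 8 bits, 127 + 1 sets overflow without a carry out, while (-1) + (-1) produces a carry out but no overflow, which shows that the carry out alone is not a signed-overflow indicator.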
Evaluating Performance
Logic path has three gate delays: XOR + AND/OR + MUX
Add/sub: 1 gate delay for XOR, 3 gate delays for Sum and 2 for CarryOut
Each bit slice depends on Ci, the carry output of the previous slice
For an N-bit adder the worst-case delay is then 2 * N gate delays
This worst-case delay describes a ripple adder
Evaluating Performance – ALU Block
The ALU speed is limited by its slowest block
The logic block has 2 gate delays
The add/subtract block has 2*N + 1 gate delays, where N >> 1
The arithmetic block limits performance significantly
Consider ways to reduce gate delays in the adder
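To get a feel for the 2*N + 1 figure, the delay can be tabulated for a few widths. The sketch below simply evaluates the slide's formula (2 gate delays of carry logic per slice plus 1 for the XOR on B); the function name is an assumption, and real delays depend on the gate technology.

```python
def ripple_add_sub_delay(n_bits: int) -> int:
    """Worst-case gate delays of an n-bit ripple add/subtract block."""
    xor_delay = 1              # XOR that conditionally complements B
    carry_delay_per_slice = 2  # AND-OR carry logic in each bit slice
    return carry_delay_per_slice * n_bits + xor_delay


if __name__ == "__main__":
    for n in (4, 16, 32):
        print(n, "bits:", ripple_add_sub_delay(n), "gate delays")
```

For a 32-bit ALU this gives 65 gate delays, against 2 for the logic block.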
Speeding up the ripple carry adder
Eliminating the ripple
c1 = b0*c0 + a0*c0 + a0*b0
c2 = b1*c1 + a1*c1 + a1*b1
c3 = b2*c2 + a2*c2 + a2*b2
c4 = b3*c3 + a3*c3 + a3*b3
Carry Look Ahead
When both inputs are 0, no carry is produced
When one input is 0 and the other is 1, the carry input is propagated
When both inputs are 1, a carry is generated
Carry-lookahead adder
Generate: gi = ai * bi
Propagate: pi = ai + bi
Write the carry out as a function of the preceding g, p, and c0
c1 = g0 + p0*c0
c2 = g1 + p1*c1
c3 = g2 + p2*c2
c4 = g3 + p3*c3
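The generate/propagate form of the carries can be checked against the original carry equations. The following sketch uses assumed function names and a 4-bit default width, and compares the two formulations bit by bit.

```python
def gp_carries(a: int, b: int, c0: int = 0, n: int = 4) -> list[int]:
    """Carries c0..cn computed from c(i+1) = gi + pi*ci."""
    carries = [c0]
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi  # generate: both inputs are 1
        p = ai | bi  # propagate: at least one input is 1
        carries.append(g | (p & carries[i]))
    return carries


def ripple_carries(a: int, b: int, c0: int = 0, n: int = 4) -> list[int]:
    """Carries c0..cn computed from c(i+1) = bi*ci + ai*ci + ai*bi."""
    carries = [c0]
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        ci = carries[i]
        carries.append((bi & ci) | (ai & ci) | (ai & bi))
    return carries
```

Both functions return the same list for every pair of operands, since gi + pi*ci expands to ai*bi + ai*ci + bi*ci.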
Reducing the complexity
c1 = g0 + (p0 * c0)
c2 = g1 + (p1 * [g0 + p0 * c0]) = g1 + (p1 * g0) + (p1 * p0 * c0)
c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)
c4 = ?
Increased speed, but at what cost?
Can you illustrate how to build a 32-bit adder with carry lookahead?
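The pattern in the expanded equations can be written once for any carry index, which also makes the cost visible: the widest AND term and the final OR each grow by one input per bit. This is a sketch with assumed names (`flat_carry`, `widest_gate_fan_in`), checked against the step-by-step recurrence.

```python
def flat_carry(i: int, g: list[int], p: list[int], c0: int) -> int:
    """Carry ci as a single OR of AND terms over g, p and c0 only."""
    terms = []
    for j in range(i):
        # g[j] reaches position i only if p[j+1] .. p[i-1] all propagate it.
        term = g[j]
        for k in range(j + 1, i):
            term &= p[k]
        terms.append(term)
    # c0 reaches position i only if p[0] .. p[i-1] all propagate it.
    term = c0
    for k in range(i):
        term &= p[k]
    terms.append(term)
    result = 0
    for t in terms:
        result |= t
    return result


def recurrence_carry(i: int, g: list[int], p: list[int], c0: int) -> int:
    """Carry ci from c(k+1) = gk + pk*ck, evaluated step by step."""
    c = c0
    for k in range(i):
        c = g[k] | (p[k] & c)
    return c


def widest_gate_fan_in(i: int) -> int:
    """Inputs of the widest AND (and of the OR) in the flat form of ci."""
    return i + 1
```

Under this model the flat form of the carry out of a 32-bit adder would need gates with 33 inputs, which is the cost the slide asks about.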
Limitations
The number of gate inputs increases drastically
Technology permits only a certain maximum number of inputs (fan-in)
A gate with high fan-in must be realized as a chain of gates with low fan-in
From Prof. Michal G. Wahl
Use the same principle to build a 16-bit adder
Let us add a second level of abstraction
Use a 4-bit adder as the first-level abstraction
4-bit wide carry-lookahead
P0 = p3 * p2 * p1 * p0
P1 = p7 * p6 * p5 * p4
P2 = p11 * p10 * p9 * p8
P3 = p15 * p14 * p13 * p12
G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
G1 =
G2 =
G3 =
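A two-level 16-bit adder built from these group signals can be sketched as follows. Two things here are assumptions rather than content of the slides: the function names, and the second-level carry equation, which reuses the bit-level form c(i+1) = gi + pi*ci with the group signals G and P of each 4-bit block. The loops run sequentially in software, whereas the hardware evaluates the block carries in parallel.

```python
def bit_gp(a: int, b: int, n: int = 16) -> tuple[list[int], list[int]]:
    """Per-bit generate and propagate signals."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    return g, p


def group_gp(g: list[int], p: list[int]) -> tuple[int, int]:
    """Group generate G and propagate P of one 4-bit block."""
    big_p = p[3] & p[2] & p[1] & p[0]
    big_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
    return big_g, big_p


def add16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit add with two-level lookahead; returns (sum, carry_out)."""
    g, p = bit_gp(a, b)
    # Second level: carry into each 4-bit block from the group signals.
    block_carry = [c0]
    for k in range(4):
        big_g, big_p = group_gp(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4])
        block_carry.append(big_g | (big_p & block_carry[k]))
    # First level: carries and sum bits inside each block.
    result = 0
    for k in range(4):
        c = block_carry[k]
        for i in range(4 * k, 4 * k + 4):
            s = ((a >> i) ^ (b >> i) ^ c) & 1
            result |= s << i
            c = g[i] | (p[i] & c)
    return result, block_carry[4]
```

For instance, `add16(0xFFFF, 1)` returns a zero sum with a carry out of 1, because every block propagates the carry generated at bit 0.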