Computer Architecture EECS 361 Lecture 5: The Design Process & ALU Design


Just like last time, I would like to start today's lecture with a recap of our last lecture. Start X:40

Quick Review of Last Lecture

MIPS ISA Design Objectives and Implications
-- Support general OS and C-style language needs
-- Support general and embedded applications
-- Use dynamic workload characteristics from general-purpose program traces and SPECint to guide design decisions
-- Implement the processor core with a relatively small number of gates
-- Emphasize performance via a fast clock
Implications: traditional data types, common operations, typical addressing modes; RISC-style register-register / load-store architecture.

MIPS jump, branch, compare instructions
Instruction           Example           Meaning                          Comment
branch on equal       beq $1,$2,100     if ($1 == $2) go to PC+4+100     Equal test; PC-relative branch
branch on not equal   bne $1,$2,100     if ($1 != $2) go to PC+4+100     Not-equal test; PC-relative
set on less than      slt $1,$2,$3      if ($2 < $3) $1=1; else $1=0     Compare less than; 2's complement
set less than imm.    slti $1,$2,100    if ($2 < 100) $1=1; else $1=0    Compare < constant; 2's complement
set less than uns.    sltu $1,$2,$3     if ($2 < $3) $1=1; else $1=0     Compare less than; natural numbers
set l. t. imm. uns.   sltiu $1,$2,100   if ($2 < 100) $1=1; else $1=0    Compare < constant; natural numbers
jump                  j 10000           go to 10000                      Jump to target address
jump register         jr $31            go to $31                        For switch, procedure return
jump and link         jal 10000         $31 = PC + 4; go to 10000        For procedure call

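Example: MIPS Instruction Formats and Addressing Modes
All instructions are 32 bits wide.
[Figure: the four addressing modes. Field widths shown: 6, 5, 5, 5, 11.]
-- Register (direct): op rs rt rd; the operand is in a register
-- Immediate: op rs rt immed; the operand is the immed field
-- Base+index: op rs rt immed; the operand is in memory at register + immed
-- PC-relative: op rs rt immed; the operand is in memory at PC + immed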

MIPS Instruction Formats

MIPS Operation Overview
-- Arithmetic / logical: Add, AddU, AddI, AddIU, Sub, SubU; And, AndI, Or, OrI; SLT, SLTI, SLTU, SLTIU; SLL, SRL
-- Memory access: LW, LB, LBU; SW, SB

Branch & Pipelines
[Pipeline timing diagram: time runs left to right, and each instruction's ifetch overlaps the previous instruction's execute.]
      li   r3, #7               execute
      sub  r4, r4, 1    ifetch  execute
      bz   r4, LL               ifetch  execute                    (branch)
      addi r5, r3, 1                    ifetch  execute            (delay slot)
LL:   slt  r1, r3, r5                           ifetch  execute    (branch target)
By the end of the branch instruction, the CPU knows whether or not the branch will take place. However, it will have fetched the next instruction by then, regardless of whether or not the branch is taken. Why not execute it?

The Next Destination
Course roadmap: Arithmetic; Single/multicycle Datapaths; Pipelining (IFetch, Dcd, Exec, Mem, WB); Memory Systems; I/O
Next: begin ALU design using the MIPS ISA.
[Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU ("Moore's Law"): 60%/yr (2X/1.5 yr). DRAM: 9%/yr (2X/10 yrs). The gap grows 50% per year.]

Outline of Today's Lecture
-- An Overview of the Design Process
-- Illustration using ALU design
-- Refinements
Here is an outline of today's lecture. First we will talk about the design process. Then I will give you a short review of binary arithmetic so that I can show you how to design a simple 4-bit ALU. Finally, I will show you how to keep an on-line design notebook to keep track of your work. +1 = 4 min. (X:44)

Design Process
Design finishes as assembly
-- Design understood in terms of components and how they have been assembled
-- Top-down decomposition of complex functions (behaviors) into more primitive functions
-- Bottom-up composition of primitive building blocks into more complex assemblies
[Figure: design hierarchy with the blocks CPU, Datapath, Control, ALU, Regs, Shifter, and Nand Gate]
Design is a "creative process," not a simple method.
One of the fun parts of being a designer is that you get to be a little kid playing with Lego again, except that this time you design the building blocks as well as put them together. The two approaches you will use are top-down and bottom-up. You use the top-down approach to decompose a complex function into primitive functions. After the primitive functions are implemented, you integrate them back together, bottom-up, to implement the original complex function. For example, when you design a CPU, you use the top-down approach to break the CPU into these primitive blocks. Once you have these blocks implemented, you put them together to form the CPU. This case is pretty clean-cut. In many other design problems, you cannot just apply top-down and then bottom-up once. You need to repeat the process several times, using top-down and bottom-up together, because design is a creative process, NOT a simple method. +2 = 7 min. (X:47)

Design as Search
[Figure: search tree from Problem A through Strategy 1 and Strategy 2 to sub-problems (SubProb2, SubProb3) and building blocks (BB1, BB2, BB3, ..., BBn)]
Design involves educated guesses and verification
-- Given the goals, how should these be prioritized?
-- Given alternative design pieces, which should be selected?
-- Given the design space of components & assemblies, which part will yield the best solution?
Feasible (good) choices vs. optimal choices
One way to think about the design process is that it is a search for the proper solution through the design space (point to the diagram). How do you know where to find the proper solution? Usually you don't. What you need to do is make educated guesses and then verify whether your guesses are correct. If you are correct, you congratulate yourself. If you are wrong, you try again. You will have a set of design goals: some are given to you by your supervisors, and some you may set for yourself. In any case, given a set of goals, some of which may contradict each other, you must learn how to prioritize them. The thing to remember about design is that there are many ways to do the same thing. There is really no such thing as the absolute "right way" to do certain things. Ideally, we would always pick the best solution, but remember that your goal is the best solution for the ORIGINAL problem (Problem A). For the sub-problems down here, you do not need to make the optimal choice each time. Sometimes all you need is a reasonably good choice. Making the best choice at every level takes design time, and running out of design time can jeopardize a project in a fast-moving technology. If you have a choice that is good enough for a sub-problem, you should be happy with it and move on to other sub-problems that require your attention. Remember, even the world's fastest ALU will not do you any good unless you have an equally fast controller to control it. +3 = 11 min. (X:51)

Problem: Design a "fast" ALU for the MIPS ISA
Requirements?
-- Must support the arithmetic / logic operations
-- Tradeoffs of cost and speed based on frequency of occurrence and hardware budget

MIPS ALU requirements
-- Add, AddU, Sub, SubU, AddI, AddIU => 2's complement adder/subtractor with overflow detection
-- And, Or, AndI, OrI, Xor, XorI, Nor => logical AND, logical OR, XOR, NOR
-- SLTI, SLTIU (set less than) => 2's complement adder with inverter, check sign bit of result

MIPS arithmetic instruction format
R-type: op | Rs | Rt | Rd | funct     (field boundaries marked at bits 31, 25, 20, 15, 5)
I-type: op | Rs | Rt | Immed 16

Type    op  funct      Type   op  funct      Type     op  funct
ADDI    10  xx         ADD    00  40         (blank)  00  50
ADDIU   11  xx         ADDU   00  41         (blank)  00  51
SLTI    12  xx         SUB    00  42         SLT      00  52
SLTIU   13  xx         SUBU   00  43         SLTU     00  53
ANDI    14  xx         AND    00  44
ORI     15  xx         OR     00  45
XORI    16  xx         XOR    00  46
LUI     17  xx         NOR    00  47

Signed arithmetic generates overflow, no carry.

Design Trick: divide & conquer
Break the problem into simpler problems, solve them, and glue the solutions together.
Example: assume the immediates have been taken care of before the ALU. That leaves 10 operations (4 bits):
00 add    01 addU   02 sub    03 subU
04 and    05 or     06 xor    07 nor
12 slt    13 sltU

Refined Requirements
(1) Functional Specification
  inputs: 2 x 32-bit operands A, B; 4-bit mode m (sort of control)
  outputs: 32-bit result S, 1-bit carry c, 1-bit overflow ovf
  operations: add, addu, sub, subu, and, or, xor, nor, slt, sltU
(2) Block Diagram (CAD-tool symbol, VHDL entity)
  [Figure: ALU block with 32-bit inputs A and B, 4-bit input m, 32-bit output S, and 1-bit outputs c and ovf]
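
The functional specification above can be exercised with a small behavioral model. This is only a sketch under stated assumptions: the function name `alu` is hypothetical, the mode is passed as an operation name rather than the 4-bit code, the carry output for subtraction is taken to be the raw adder carry out, and only the signed `add` and `sub` report overflow.

```python
MASK32 = 0xFFFFFFFF


def alu(a, b, mode):
    """Behavioral model: return (S, c, ovf) for 32-bit operands a and b."""
    a &= MASK32
    b &= MASK32
    if mode == "and":
        return a & b, 0, 0
    if mode == "or":
        return a | b, 0, 0
    if mode == "xor":
        return a ^ b, 0, 0
    if mode == "nor":
        return ~(a | b) & MASK32, 0, 0
    if mode not in ("add", "addu", "sub", "subu", "slt", "sltu"):
        raise ValueError(f"unknown mode: {mode}")

    # Everything else goes through the adder; subtraction is A + !B + 1.
    subtract = mode not in ("add", "addu")
    operand = (~b & MASK32) if subtract else b
    total = a + operand + (1 if subtract else 0)
    s = total & MASK32
    c = total >> 32
    # Signed overflow: both adder inputs share a sign and the sum's sign differs.
    signed_ovf = (((a ^ s) & (operand ^ s)) >> 31) & 1

    if mode in ("add", "sub"):
        return s, c, signed_ovf
    if mode in ("addu", "subu"):
        return s, c, 0
    if mode == "slt":
        # Sign of the true difference = sign bit corrected by overflow.
        return (s >> 31) ^ signed_ovf, c, 0
    # sltu: A < B as natural numbers exactly when A + !B + 1 has no carry out.
    return c ^ 1, c, 0


print(alu(7, 3, "add"))            # small positive add
print(alu(0x7FFFFFFF, 1, "add"))   # signed overflow case
```

A model like this is useful as a reference when checking a gate-level design: the bit-slice hardware developed in the following slides should agree with it on every input.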

Behavioral Representation: VHDL
    entity ALU is
      generic (c_delay: time := 20 ns;
               S_delay: time := 20 ns);
      port (signal A, B: in  vlbit_vector (0 to 31);
            signal m:    in  vlbit_vector (0 to 3);
            signal S:    out vlbit_vector (0 to 31);
            signal c:    out vlbit;
            signal ovf:  out vlbit);
    end ALU;
    . . .
    S <= A + B;
c_delay is the carry delay; S_delay is the delay for the sum (S). The generics are declared as type time here because 20 ns is a time value, not an integer.
Some signals are bit vectors (A, B, S, m); some are single bits (c, ovf).

Design Decisions
[Figure: decision tree for implementing the ALU. Labels: ALU; bit slice; 7-to-2 C/L; 3-to-2 C/L; mux; CL0; PLD; Gates.]
-- Simple bit-slice: one big combinational problem, many little combinational problems, or partition into a 2-step problem
-- Bit slice with carry look-ahead
-- . . .

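Refined Diagram: bit-slice ALU
[Figure: a row of 1-bit ALU slices for bits 0 through 31. Each slice takes a(i), b(i), the shared 4-bit mode M, and cin, and produces s(i) and co. Inputs: 32-bit A and B. Outputs: 32-bit S and Ovflw.]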

7-to-2 Combinational Logic
Start turning the crank . . .
Function   Inputs: M0 M1 M2 M3 A B Cin   Outputs: S Cout   K-Map
add (row 0)        0  0  0  0  0 0 0              0 0
. . .
(row 127)
Just fill in all the combinations that you want.

A One-Bit ALU
This 1-bit ALU will perform AND, OR, and ADD.
[Figure: inputs A, B, and CarryIn feed an AND gate, an OR gate, and a 1-bit full adder; a mux selects Result; the adder produces CarryOut.]
We will use the bit-slice approach to design the ALU. That is, we will first build a one-bit ALU and then connect four of them together to form a four-bit ALU. The one-bit ALU consists of four major parts: (a) an AND gate, (b) an OR gate, (c) a one-bit adder, and (d) a mux to select the output based on the requested operation. The AND gate and the OR gate perform the bitwise AND and bitwise OR operations. The adder will perform both the add and the subtract operations. Let's take a look at the adder design. +1 = 26 min. (Y:06)

A One-bit Full Adder
This is also called a (3, 2) adder.
Half adder: no CarryIn.
[Figure: 1-bit full adder block with inputs A, B, CarryIn and outputs C (sum) and CarryOut]
Truth table:
A  B  CarryIn | Sum  CarryOut | Comments
0  0  0       | 0    0        | 0 + 0 + 0 = 00
0  0  1       | 1    0        | 0 + 0 + 1 = 01
0  1  0       | 1    0        | 0 + 1 + 0 = 01
0  1  1       | 0    1        | 0 + 1 + 1 = 10
1  0  0       | 1    0        | 1 + 0 + 0 = 01
1  0  1       | 0    1        | 1 + 0 + 1 = 10
1  1  0       | 0    1        | 1 + 1 + 0 = 10
1  1  1       | 1    1        | 1 + 1 + 1 = 11
The adder we need is called a (3, 2) adder because it has 3 inputs and 2 outputs. It is also called a full adder because it accepts a CarryIn in addition to producing a CarryOut. A half adder adds only A and B; it has no CarryIn. Here is the truth table for this 1-bit full adder. For example, let's look at the 4th row: if input A is 0, input B is 1, and CarryIn is 1, then we have 0 plus 1 plus 1, which equals 10, so the Sum bit is 0 and CarryOut is 1. You can derive the rest of the table at home tonight if you want. For now, let's derive the logic equation for CarryOut from the table. +2 = 28 min. (Y:08)
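
The truth table can be regenerated mechanically by adding the three input bits and splitting the 2-bit total. A minimal sketch, where `full_adder` is a name chosen here for illustration:

```python
from itertools import product


def full_adder(a, b, carry_in):
    """Return (Sum, CarryOut) for three 1-bit inputs."""
    total = a + b + carry_in          # ranges from 0 to 3
    return total & 1, total >> 1      # low bit is Sum, high bit is CarryOut


# Enumerate all 8 input combinations in the same order as the slide.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(f"{a} {b} {cin} | {s} {cout} | {a} + {b} + {cin} = {cout}{s}")
```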

Logic Equation for CarryOut
(Truth table as on the previous slide.)
CarryOut = (!A & B & CarryIn) | (A & !B & CarryIn) | (A & B & !CarryIn) | (A & B & CarryIn)
CarryOut = B & CarryIn | A & CarryIn | A & B
To derive the logic equation, we simply look at the CarryOut column and pick out the rows that contain a 1. They become the product terms of the equation. Once you have the product terms, you OR them together, because any one of these product terms is enough to make CarryOut go to 1. This long equation can be simplified by pairing the last product term (A & B & CarryIn) with each of the other three: (a) The last and the 1st product terms differ only in NOT A versus A, so A becomes a don't care and the pair reduces to B & CarryIn. (b) Similarly, the last and the 2nd terms differ only in NOT B versus B, so B becomes a don't care, leaving A & CarryIn. (c) Finally, the last and the 3rd terms differ only in NOT CarryIn versus CarryIn, so CarryIn becomes a don't care, leaving A & B. +2 = 30 min. (Y:10)
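
With only three inputs, the claim that the two CarryOut equations agree can be checked exhaustively. The sketch below (function names are illustrative) compares the four-term sum of products, the simplified three-term form, and the arithmetic definition of a carry:

```python
from itertools import product


def carry_sum_of_products(a, b, c):
    """CarryOut read straight off the truth table rows that contain a 1."""
    na, nb, nc = a ^ 1, b ^ 1, c ^ 1   # 1-bit complements
    return (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)


def carry_simplified(a, b, c):
    """CarryOut after the pairwise simplification."""
    return (b & c) | (a & c) | (a & b)


for a, b, c in product((0, 1), repeat=3):
    expected = (a + b + c) >> 1        # carry is the high bit of the total
    assert carry_sum_of_products(a, b, c) == carry_simplified(a, b, c) == expected

print("both CarryOut equations match on all 8 input rows")
```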

Logic Equation for Sum
(Truth table as on the previous slides.)
Sum = (!A & !B & CarryIn) | (!A & B & !CarryIn) | (A & !B & !CarryIn) | (A & B & CarryIn)
Similarly, we can construct the logic equation for Sum by looking at the rows that have Sum equal to 1. +1 = 31 min. (Y:11)

Logic Equation for Sum (continued)
Sum = (!A & !B & CarryIn) | (!A & B & !CarryIn) | (A & !B & !CarryIn) | (A & B & CarryIn)
Sum = A XOR B XOR CarryIn
Truth table for XOR:
X  Y | X XOR Y
0  0 | 0
0  1 | 1
1  0 | 1
1  1 | 0
The logic equation for Sum can be simplified if we use the binary operator XOR. The XOR gate implements the "not equal" function: the output of the XOR gate is 1 ONLY if the inputs are not the same. Using this XOR operator, we can simplify the Sum equation to A XOR B XOR CarryIn. +1 = 32 min. (Y:12)
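
The same exhaustive check works for Sum. This sketch (names are illustrative) confirms that the four-term sum of products equals A XOR B XOR CarryIn on every input row:

```python
from itertools import product


def sum_of_products(a, b, c):
    """Sum read off the truth table rows where Sum is 1."""
    na, nb, nc = a ^ 1, b ^ 1, c ^ 1
    return (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)


def sum_xor(a, b, c):
    """Sum written with the XOR operator."""
    return a ^ b ^ c


for a, b, c in product((0, 1), repeat=3):
    assert sum_of_products(a, b, c) == sum_xor(a, b, c) == (a + b + c) & 1

print("Sum = A XOR B XOR CarryIn holds on all 8 input rows")
```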

Logic Diagrams for CarryOut and Sum
CarryOut = B & CarryIn | A & CarryIn | A & B
Sum = A XOR B XOR CarryIn
[Figure: gate-level diagrams. CarryOut: three AND gates on the input pairs of A, B, CarryIn, feeding an OR gate. Sum: XOR of A, B, and CarryIn.]
With the logic equations we derived, we can implement the logic for Sum and CarryOut. +1 = 33 min. (Y:13)

Seven plus a MUX ?
Design trick 2: take pieces you know (or can imagine) and try to put them together
Design trick 3: solve part of the problem and extend
[Figure: 1-bit ALU. Inputs A, B, and CarryIn feed an AND gate, an OR gate, and a 1-bit full adder; S-select drives a mux that picks Result from the and, or, and add outputs; the adder produces CarryOut.]

A 4-bit ALU
[Figure, left: the 1-bit ALU (inputs A, B, CarryIn; 1-bit full adder; mux; outputs Result and CarryOut). Right: the 4-bit ALU built from four 1-bit ALUs. Slice i takes Ai, Bi, and CarryIni and produces Resulti and CarryOuti; CarryOut0 feeds CarryIn1, CarryOut1 feeds CarryIn2, and CarryOut2 feeds CarryIn3.]
Now that I have shown you how to build a 1-bit full adder, we have all the major components needed for this 1-bit ALU. To build a 4-bit ALU, we simply connect four 1-bit ALUs in series, feeding the CarryOut of one ALU to the CarryIn of the next. Even though I called this an ALU, I actually lied a little. There is something missing: this ALU can NOT perform the subtract operation. Let's see how we can fix this problem. 2 min = 35 min. (Y:15)
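
The series connection can be sketched in software by looping over bit positions and passing each slice's CarryOut to the next slice's CarryIn. The names `alu_1bit` and `alu_nbit` and the string-valued `op` selector are assumptions made for this illustration, not part of the slides:

```python
def alu_1bit(a, b, carry_in, op):
    """One slice: AND gate, OR gate, full adder, and a mux selecting the result."""
    and_out = a & b
    or_out = a | b
    add_out = a ^ b ^ carry_in
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)
    result = {"and": and_out, "or": or_out, "add": add_out}[op]  # the mux
    return result, carry_out


def alu_nbit(a, b, op, n=4, carry_in=0):
    """Chain n slices; CarryOut of slice i becomes CarryIn of slice i + 1."""
    result = 0
    carry = carry_in
    for i in range(n):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, op)
        result |= bit << i
    return result, carry


print(alu_nbit(0b0101, 0b0011, "add"))  # 5 + 3
print(alu_nbit(0b0101, 0b0011, "and"))
print(alu_nbit(0b0101, 0b0011, "or"))
```

Note that the adder in each slice runs regardless of the selected operation, as in the hardware; the mux only decides which output is visible.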

How About Subtraction?
Keep in mind the following:
-- (A - B) is the same as A + (-B)
-- 2's complement: take the inverse of every bit and add 1
-- The bit-wise inverse of B is !B: A + !B + 1 = A + (!B + 1) = A + (-B) = A - B
[Figure: 4-bit "ALU" with inputs A and CarryIn; a 2x1 mux controlled by Subtract (Sel) chooses between B and !B as the second operand; outputs Result, Zero, and CarryOut.]
Recall something you learned in grade school: A - B is the same as A plus (-B). Also recall from earlier slides that to calculate the 2's complement representation of negative B, we simply take the inverse of every bit and add 1. The bitwise inverse of B is easy to compute: just pass the bits through inverters. To do the add-1 operation, we simply set CarryIn to 1. So for the subtract operation, we select the output of the inverter and set CarryIn to 1. Then we are adding A to the negative of B and, voila, we have the A minus B operation. +2 = 37 min. (Y:17)
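
A short sketch of the invert-and-carry-in trick for 4-bit operands. The function name `add_sub_4bit` and the Boolean `subtract` flag are assumptions for illustration; the flag plays the role of both the mux select and the CarryIn:

```python
def add_sub_4bit(a, b, subtract):
    """Compute A + B or A - B on 4 bits using A + !B + 1 for subtraction."""
    mask = 0xF
    # 2x1 mux: pass B through for add, or its bitwise inverse for subtract.
    b_selected = (~b & mask) if subtract else (b & mask)
    # Setting CarryIn to 1 supplies the "+1" of the 2's complement.
    carry_in = 1 if subtract else 0
    total = (a & mask) + b_selected + carry_in
    return total & mask, total >> 4   # (Result, CarryOut)


print(add_sub_4bit(7, 3, subtract=True))   # 7 - 3
print(add_sub_4bit(3, 5, subtract=True))   # 3 - 5, result is the 4-bit pattern for -2
```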

Additional operations
A - B = A + (-B): form the two's complement by inverting and adding one.
[Figure: the 1-bit ALU (AND gate, OR gate, 1-bit full adder, mux driven by S-select, CarryIn, CarryOut) with an added invert control on the B input.]
Set-less-than? Left as an exercise.

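Revised Diagram
LSB and MSB need to do a little extra.
[Figure: bit-slice ALU as before (32-bit A and B, slices for bits 0 through 31 with co/cin chained, outputs s0 ... s31 forming 32-bit S, plus Ovflw), with a combinational logic (C/L) block that takes the 4-bit M and produces select, comp, and c-in. The connections at the LSB and MSB are marked "?".]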

Overflow
Decimal  Binary  |  Decimal  2's Complement
 0       0000    |   0       0000
 1       0001    |  -1       1111
 2       0010    |  -2       1110
 3       0011    |  -3       1101
 4       0100    |  -4       1100
 5       0101    |  -5       1011
 6       0110    |  -6       1010
 7       0111    |  -7       1001
                 |  -8       1000
Examples: 7 + 3 = 10 but ...          -4 - 5 = -9 but ...
    0111    7                             1100   -4
  + 0011    3                           + 1011   -5
    ----                                  ----
    1010   -6                             0111    7
So far so good, but life is not always perfect. Consider 7 plus 3: you should get 10. But if you perform the binary arithmetic on our 4-bit adder, you get 1010, which is negative 6. Similarly, if you add negative 4 and negative 5, you should get negative 9, but the binary arithmetic gives you 0111, which is 7. So what went wrong? The problem is overflow. The result is simply too big (positive 10) or too small (negative 9) to be represented in four bits. +2 = 39 min. (Y:19)

Overflow Detection
Overflow: the result is too large (or too small) to represent properly
-- Example: -8 <= 4-bit binary number <= 7
When adding operands with different signs, overflow cannot occur!
Overflow occurs when adding:
-- 2 positive numbers and the sum is negative
-- 2 negative numbers and the sum is positive
On your own: prove you can detect overflow by: Carry into MSB ≠ Carry out of MSB
[Examples repeated from the previous slide: 7 + 3 gives -6, and -4 - 5 gives 7.]
Recall from earlier slides that the biggest positive number you can represent using 4 bits is 7 and the smallest negative number is negative 8. So any time your addition results in a number bigger than 7 or less than negative 8, you have an overflow. Keep in mind that whenever you add two numbers that have different signs, that is, a negative number and a positive number, overflow can NOT occur. Overflow occurs when you add two positive numbers and the sum has a negative sign, or when you add two negative numbers and the sum has a positive sign. If you spend some time, you can convince yourself that if the carry into the most significant bit is NOT the same as the carry out of the MSB, you have an overflow. +2 = 41 min. (Y:21)

Overflow Detection Logic
Carry into MSB ≠ Carry out of MSB
For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]
X  Y | X XOR Y
0  0 | 0
0  1 | 1
1  0 | 1
1  1 | 0
[Figure: the 4-bit ALU built from four 1-bit ALUs, with an XOR gate on CarryIn3 and CarryOut3 producing Overflow.]
Recall that the XOR gate implements the not-equal function: its output is 1 only if the inputs have different values. Therefore all we need to do is connect the carry into the most significant bit and the carry out of the most significant bit to an XOR gate. The output of the XOR gate gives us the Overflow signal. +1 = 42 min. (Y:22)
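
For 4 bits, the rule can be verified on all 256 operand pairs instead of proved. The sketch below ripples the carry bit by bit, records the carry into and out of bit 3, and compares their XOR with the definition of overflow (true sum outside -8..7). The helper names are assumptions for this illustration:

```python
def add4_with_overflow(a, b):
    """4-bit ripple add; Overflow = CarryIn[3] XOR CarryOut[3]."""
    carry = 0
    carry_into_msb = 0
    result = 0
    for i in range(4):
        x, y = (a >> i) & 1, (b >> i) & 1
        carry_into_msb = carry            # after the loop: carry into bit 3
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (x & carry) | (y & carry)
    return result, carry_into_msb ^ carry


def to_signed4(x):
    """Interpret a 4-bit pattern as a 2's complement number."""
    return x - 16 if x & 0b1000 else x


for a in range(16):
    for b in range(16):
        _, overflow = add4_with_overflow(a, b)
        true_sum = to_signed4(a) + to_signed4(b)
        assert overflow == (0 if -8 <= true_sum <= 7 else 1)

print(add4_with_overflow(0b0111, 0b0011))  # 7 + 3
print(add4_with_overflow(0b1100, 0b1011))  # -4 + -5
```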

Zero Detection Logic
Zero detection logic is just one BIG NOR gate.
Any non-zero input to the NOR gate will cause its output to be zero.
[Figure: the 4-bit ALU built from four 1-bit ALUs; Result0 through Result3 feed a NOR gate whose output is Zero.]
Besides detecting overflow, our ALU also needs to indicate whether the result is zero. This is easy to do. All we need is a BIG NOR gate. If any of the result bits is not zero, the output of the NOR gate is low. The only time the output of the NOR gate is high is when all the result bits are zero. +1 = 43 min. (Y:23)
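
As a tiny sketch, the wide NOR gate is an OR across all result bits followed by an inversion. The function name `zero_flag` and the list-of-bits argument are assumptions for illustration:

```python
def zero_flag(result_bits):
    """Wide NOR: output 1 only when every result bit is 0."""
    any_set = 0
    for bit in result_bits:
        any_set |= bit        # OR of all result bits
    return any_set ^ 1        # invert to get NOR


print(zero_flag([0, 0, 0, 0]))
print(zero_flag([0, 1, 0, 0]))
```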

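More Revised Diagram
LSB and MSB need to do a little extra.
[Figure: bit-slice ALU with 32-bit inputs A and B, slices for bits 0 through 31 (inputs a0, b0 ... a31, b31; co/cin chained; outputs s0 ... s31 forming 32-bit S). A C/L block takes the 4-bit M and produces select, comp, and c-in. Ovflw is formed at the MSB as signed-arith AND (cin XOR co).]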

But What about Performance?
The critical path of an n-bit ripple-carry adder is n*CP (CP being the critical path through one 1-bit stage).
[Figure: the 4-bit ripple-carry ALU, with the carry chain from CarryIn0 through CarryOut3.]
Design trick: throw hardware at it

The Disadvantage of Ripple Carry
-- The adder we just built is called a "Ripple Carry Adder"
-- The carry bit may have to propagate from LSB to MSB
-- Worst-case delay for an N-bit adder: 2N gate delays
[Figure: the 4-bit ripple-carry ALU next to the two-level carry logic of one slice (inputs A, B, CarryIn; output CarryOut).]
The adder we just built is called a ripple carry adder because the carry may have to propagate from the least significant bit to the most significant bit. In other words, the combination of A0, B0, and CarryIn0 may cause CarryOut0 to become 1. As a result of CarryOut0 going to 1, CarryOut1 may become 1, and so on down the carry chain. Recall the carry logic: CarryIn to CarryOut has a 2-gate delay. So in the worst case, an N-bit ripple carry adder has a delay of 2N gates. For a 32-bit adder, this means the worst-case delay is 64 gates. This can be a problem. So after the break, I will show you some faster ways of designing an ALU. +2 = 45 min. (Y:25)

Carry Look Ahead (Design trick: peek)
A  B  C-out
0  0  0      "kill"
0  1  C-in   "propagate"
1  0  C-in   "propagate"
1  1  1      "generate"
G = A and B
P = A xor B
C1 = G0 + C0·P0
C2 = G1 + G0·P1 + C0·P0·P1
C3 = G2 + G1·P2 + G0·P1·P2 + C0·P0·P1·P2
C4 = . . .
[Figure: four adder slices, each with inputs A and B and outputs S, G, and P; the G and P signals and C0 feed the lookahead logic that produces C1 through C4.]
Names: suppose G0 is 1 => there is a carry no matter what else => it generates a carry. Suppose G0 = 0 and P0 = 1 => there is a carry if and only if C0 is 1 => it propagates a carry. Like dominoes.
What about more than 4 bits?

Plumbing as Carry Lookahead Analogy

The Idea Behind Carry Lookahead (Continued)
Using the two new terms:
-- Generate Carry at Bit i: gi = Ai & Bi
-- Propagate Carry via Bit i: pi = Ai or Bi
We can rewrite:
  Cin1 = g0 | (p0 & Cin0)
  Cin2 = g1 | (p1 & g0) | (p1 & p0 & Cin0)
  Cin3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & Cin0)
Carry going into bit 3 is 1 if:
-- We generate a carry at bit 2 (g2)
-- Or we generate a carry at bit 1 (g1) and bit 2 allows it to propagate (p2 & g1)
-- Or we generate a carry at bit 0 (g0) and bit 1 as well as bit 2 allow it to propagate (p2 & p1 & g0)
-- Or we have a carry input at bit 0 (Cin0) and bits 0, 1, and 2 all allow it to propagate (p2 & p1 & p0 & Cin0)
Using the carry generate and carry propagate terms, we can rewrite the carry lookahead equations like these. For example, the carry going into bit 3 (Cin3) is 1 if: (a) we generate a carry at bit 2, (b) or we generate a carry at bit 1 and bit 2 allows it to propagate, and so on. +1 = 56 min. (Y:36)
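
The lookahead equations can be checked against the ripple chain they replace. In this sketch (function names and the LSB-first bit tuples are assumptions for illustration), `lookahead_carries` evaluates Cin1 through Cin3 directly from g, p, and Cin0, while `ripple_carries` computes the same carries one stage at a time:

```python
from itertools import product


def lookahead_carries(a_bits, b_bits, cin0):
    """Cin1..Cin3 from generate/propagate terms; bits are listed LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # gi = Ai & Bi
    p = [a | b for a, b in zip(a_bits, b_bits)]   # pi = Ai or Bi
    cin1 = g[0] | (p[0] & cin0)
    cin2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin0)
    cin3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & cin0))
    return cin1, cin2, cin3


def ripple_carries(a_bits, b_bits, cin0):
    """The same carries, computed by rippling through three full adders."""
    carries = []
    carry = cin0
    for a, b in zip(a_bits, b_bits):
        carry = (b & carry) | (a & carry) | (a & b)
        carries.append(carry)
    return tuple(carries)


# Exhaustive check over three A bits, three B bits, and Cin0.
for bits in product((0, 1), repeat=7):
    a_bits, b_bits, cin0 = bits[0:3], bits[3:6], bits[6]
    assert lookahead_carries(a_bits, b_bits, cin0) == ripple_carries(a_bits, b_bits, cin0)

print("lookahead and ripple carries agree on all 128 cases")
```

The two functions return identical values; the difference in hardware is that every lookahead carry depends only on the operand bits and Cin0, so none of them waits on another carry.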

The Idea Behind Carry Lookahead
[Figure: two 1-bit ALUs in series; Cin0 enters bit 0, Cout0 = Cin1 enters bit 1, and Cout1 = Cin2.]
Recall: CarryOut = (B & CarryIn) | (A & CarryIn) | (A & B)
  Cin2 = Cout1 = (B1 & Cin1) | (A1 & Cin1) | (A1 & B1)
  Cin1 = Cout0 = (B0 & Cin0) | (A0 & Cin0) | (A0 & B0)
Substituting Cin1 into Cin2:
  Cin2 = (A1 & A0 & B0) | (A1 & A0 & Cin0) | (A1 & B0 & Cin0) | (B1 & A0 & B0) | (B1 & A0 & Cin0) | (B1 & B0 & Cin0) | (A1 & B1)
Now define two new terms:
-- Generate Carry at Bit i: gi = Ai & Bi
-- Propagate Carry via Bit i: pi = Ai or Bi
READ and LEARN the details.
Carry lookahead is another way to speed up an adder, and here is the theory behind it. Remember the logic equation for CarryOut; the carry coming out of bit 1 and into bit 2 (Cin2) looks like this. Notice that this carry depends on the carry coming out of bit 0 (Cin1). By substituting the equation for Cin1 into the equation for Cin2, we can rewrite Cin2 so that it depends only on Cin0, A0, B0, A1, and B1. The beauty of this equation is that it does NOT depend on the carry coming out of bit 0 (Cout0), so it does not have to wait for the carry to propagate through the lower bits. This equation can be simplified if we define two terms: carry generate and carry propagate. +2 = 55 min. (Y:35)
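
Substituting Cin1 into Cin2 pairs each of A1 and B1 with each of the three product terms of Cin1, giving six cross terms plus (A1 & B1). The sketch below (function names are illustrative) checks that this flattened form equals the nested two-stage form on all 32 input combinations:

```python
from itertools import product


def cin2_nested(a0, b0, a1, b1, cin0):
    """Two-stage form: compute Cin1 first, then Cin2 from it."""
    cin1 = (b0 & cin0) | (a0 & cin0) | (a0 & b0)
    return (b1 & cin1) | (a1 & cin1) | (a1 & b1)


def cin2_flattened(a0, b0, a1, b1, cin0):
    """Substituted form: depends only on A0, B0, A1, B1, and Cin0."""
    return ((a1 & a0 & b0) | (a1 & a0 & cin0) | (a1 & b0 & cin0)
            | (b1 & a0 & b0) | (b1 & a0 & cin0) | (b1 & b0 & cin0)
            | (a1 & b1))


for inputs in product((0, 1), repeat=5):
    assert cin2_nested(*inputs) == cin2_flattened(*inputs)

print("nested and flattened Cin2 agree on all 32 cases")
```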

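Cascaded Carry Look-ahead (16-bit): Abstraction
[Figure: four 4-bit adders, each producing a group G and P; a second-level lookahead unit takes C0 and the group G/P signals and produces the carries between the 4-bit adders.]
C1 = G0 + C0·P0
C2 = G1 + G0·P1 + C0·P0·P1
C3 = G2 + G1·P2 + G0·P1·P2 + C0·P0·P1·P2
C4 = . . .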

2nd level Carry, Propagate as Plumbing

A Partial Carry Lookahead Adder
-- It is very expensive to build a "full" carry lookahead adder: just imagine the length of the equation for Cin31
-- Common practice: connect several N-bit lookahead adders to form a big adder
-- Example: connect four 8-bit carry lookahead adders to form a 32-bit partial carry lookahead adder
[Figure: four 8-bit carry lookahead adders in series. The adder for bits 7:0 takes A[7:0], B[7:0], and C0 and produces Result[7:0] and C8; the adders for bits 15:8, 23:16, and 31:24 take C8, C16, and C24 respectively and produce Result[15:8], Result[23:16], and Result[31:24].]
As you can imagine from looking at the carry lookahead equations, it is very expensive to build a full 32-bit carry lookahead adder: the equation for Cin31 will be very long. A common practice is to build smaller N-bit carry lookahead adders and then connect them together to form a bigger adder. For example, here we connect four 8-bit carry lookahead adders to form a 32-bit adder. +1 = 57 min. (Y:37)

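Design Trick: Guess
-- Two n-bit adders in series: CP(2n) = 2*CP(n)
-- Carry-select adder: CP(2n) = CP(n) + CP(mux)
Use a multiplexor to save time: guess both ways and then select (assumes the mux is faster than the adder).
[Figure: carry-select adder. One n-bit adder handles the lower half; two n-bit adders handle the upper half, one with carry-in 0 and one with carry-in 1; a mux driven by the lower adder's Cout selects between them.]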

Carry Select
Consider building an 8-bit ALU
-- Simple: connect two 4-bit ALUs in series
[Figure: the lower 4-bit ALU takes A[3:0], B[3:0], and CarryIn and produces Result[3:0]; its carry out feeds the upper 4-bit ALU, which takes A[7:4] and B[7:4] and produces Result[7:4] and CarryOut.]
Let's consider building an 8-bit ALU. The easiest way is to connect two 4-bit ALUs in series. But the worst-case delay for the carry in this case will be 2 times 8, or 16 gate delays. This may not be acceptable, so here is a more clever way to do it. +1 = 51 min. (Y:31)

Carry Select (Continued)
Consider building an 8-bit ALU
-- Expensive but faster: use three 4-bit ALUs
[Figure: the lower 4-bit ALU takes A[3:0], B[3:0], and CarryIn and produces Result[3:0] and C4. Two upper 4-bit ALUs both take A[7:4] and B[7:4], one with carry-in 0 and one with carry-in 1, producing X[7:4] and Y[7:4] and the carries C0 and C1. Two 2-to-1 muxes with Sel = C4 choose Result[7:4] and CarryOut.]
What we can do is use two 4-bit ALUs in parallel for the addition of the upper 4 bits. For one ALU, we assume the CarryIn is 0. For the other ALU, we assume the CarryIn is 1. Then we select between the outputs of these two ALUs (the 4-bit result as well as the carry out) based on the carry out of the lower ALU. More specifically, if C4 is 0, we use the result of this (left-side) ALU, the one that assumed a CarryIn of 0. If C4 is 1, we use the result of this (right-side) ALU, the one that assumed a CarryIn of 1. +2 = 53 min. (Y:33)
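
A minimal sketch of the guess-both-ways idea for 8 bits. The helper `add4` stands in for a 4-bit ALU performing addition, and both function names are assumptions for illustration. The two upper-half additions do not depend on the lower half, which is what lets the hardware run them in parallel:

```python
def add4(a, b, carry_in):
    """Stand-in for a 4-bit ALU doing an add: returns (result, carry_out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def carry_select_add8(a, b, carry_in=0):
    """8-bit add from three 4-bit adders and a 2-to-1 selection."""
    low, c4 = add4(a, b, carry_in)              # lower half, real carry in
    x, c0 = add4(a >> 4, b >> 4, 0)             # upper half, guessing carry = 0
    y, c1 = add4(a >> 4, b >> 4, 1)             # upper half, guessing carry = 1
    high, carry_out = (y, c1) if c4 else (x, c0)  # the muxes, Sel = C4
    return (high << 4) | low, carry_out


# Exhaustive check against ordinary integer addition.
for a in range(256):
    for b in range(256):
        assert carry_select_add8(a, b) == ((a + b) & 0xFF, (a + b) >> 8)

print(carry_select_add8(200, 100))
```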

Additional MIPS ALU requirements
-- Mult, MultU, Div, DivU (next lecture) => need 32-bit multiply and divide, signed and unsigned
-- Sll, Srl, Sra (next lecture) => need left shift, right shift, and arithmetic right shift by 0 to 31 bits
-- Nor (left as an exercise to the reader) => logical NOR, or use 2 steps: (A OR B) XOR 1111....1111
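
The two-step NOR can be sketched directly: OR the operands, then XOR with an all-ones word to invert every bit. The 32-bit mask and the function name `nor_two_step` are assumptions for illustration:

```python
MASK32 = 0xFFFFFFFF   # the all-ones word 1111....1111 for 32 bits


def nor_two_step(a, b):
    """NOR built from OR followed by XOR with all ones."""
    step1 = (a | b) & MASK32      # step 1: A OR B
    return step1 ^ MASK32         # step 2: XOR with 1s flips every bit


print(hex(nor_two_step(0b1010, 0b0110)))
```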

Elements of the Design Process
-- Divide and Conquer (e.g., ALU): formulate a solution in terms of simpler components; design each of the components (subproblems)
-- Generate and Test (e.g., ALU): given a collection of building blocks, look for ways of putting them together that meet the requirements
-- Successive Refinement (e.g., carry lookahead): solve "most" of the problem (i.e., ignore some constraints or special cases), then examine and correct shortcomings
-- Formulate High-Level Alternatives (e.g., carry select): articulate many strategies to "keep in mind" while pursuing any one approach
-- Work on the Things You Know How to Do: the unknown will become "obvious" as you make progress
Here are some key elements of the design process. First is divide and conquer: (a) you formulate a solution in terms of simpler components, and (b) you concentrate on designing each component. Once you have the individual components built, you need to find a way to put them together to solve the original problem. Unless you are really good or really lucky, you probably won't have a perfect solution the first time, so you will need to apply successive refinement to your design. While you are pursuing any one approach, keep alternate strategies in mind in case what you are pursuing does not work out. One of the most important pieces of advice I can give you is to work first on the things you know how to do. As you make forward progress, a lot of the unknowns will become clear. If you sit around and wait until you know everything before you start, you will never get anything done. +2 = 15 min. (X:55)