Building Blocks for a CPU
  UCSD CSE 141, Larry Carter, Winter 2002 (2/1/02)
  CSE 141 - CPU components
Designing a Processor
  The Five Classic Components of a Computer
  Processor = Datapath + Control
  [Diagram: Processor (Control + Datapath), Memory, Input, Output]
  Notes: Before we go any further, let’s step back for a second and look at the big picture. All computers consist of five components: (1) input and (2) output devices, (3) the memory system, and the (4) control and (5) datapath of the processor. Today’s lecture covers datapath design. Friday’s lecture will show how to design the processor’s control unit.
Middle third of course
  We’ll implement a core MIPS processor three ways:
    Single-cycle implementation
    Multi-cycle implementation (reduces hardware)
    Pipelined implementation (improves throughput)
  But first, we’ll review the building blocks.
Two types of logic components
  Combinational logic
    Acyclic: there are no loops in the circuit
    Output depends only on the current input values (after enough time has elapsed for the circuit to stabilize)
    Example: inputs a, b, c; output a|~(b|c)
  State elements
    The output can depend on previous history
    Example: inputs a, b; output x
      if a=1, then x=0
      if a=0 & b=1, then x=1
      if a=b=0, x = stored value
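The two example circuits on this slide can be sketched in a few lines of Python. This is only an illustration: the names `combinational` and `StateElement` are made up here, and all signals are assumed to be single bits (0 or 1).

```python
def combinational(a: int, b: int, c: int) -> int:
    """Output a | ~(b | c) for 1-bit inputs; depends only on current inputs."""
    return a | (1 - (b | c))


class StateElement:
    """Hypothetical model of the slide's state element with output x."""

    def __init__(self, stored: int = 0):
        self.x = stored  # the remembered value

    def step(self, a: int, b: int) -> int:
        if a == 1:
            self.x = 0      # a=1 forces x to 0
        elif b == 1:
            self.x = 1      # a=0 and b=1 forces x to 1
        # a=b=0: x keeps its stored value
        return self.x
```

Calling `combinational` twice with the same inputs always gives the same answer, while `StateElement.step(0, 0)` returns whatever the earlier calls left behind.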
Some combinational logic blocks
  Simple gates: and, or, not, nor, nand, xor
  Multiplexor: control (c) chooses which input to pass through to the output
    Output is a if c=0, b if c=1
    Lines may be multi-bit busses
  Decoder: k-bit input selects which of 2^k outputs is set to “1”
    Example with 2-bit input a: four outputs, which are 1 if a=00, a=01, a=10, a=11 respectively
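A minimal behavioral sketch of the multiplexor and decoder described above. The function names `mux` and `decoder` are illustrative, and bus values are modeled as plain Python integers.

```python
def mux(a: int, b: int, c: int) -> int:
    """2-input multiplexor: pass a when c=0, b when c=1."""
    return b if c else a


def decoder(value: int, k: int) -> list[int]:
    """k-bit decoder: exactly one of the 2**k outputs is 1."""
    return [1 if i == value else 0 for i in range(2 ** k)]
```

For example, `decoder(2, 2)` sets only the output for input 10, and `mux(5, 9, 1)` passes the second input through.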
More combinational logic blocks
  Adder: inputs a and b, output a+b
    Here, lines represent multi-bit busses
  Arithmetic Logic Unit: control (c) chooses which operation will be used
    Inputs a and b, output a op b
    op can be +, -, shift, xor, etc.
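As a rough sketch of the ALU, the control input can be modeled as a selector over operations. The 32-bit width, the string encoding of the control, and the choice of a left shift are assumptions made for this example; the slide does not specify them.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit busses


def alu(a: int, b: int, c: str) -> int:
    """Return a op b, where control c picks the operation (hypothetical encoding)."""
    if c == "add":
        result = a + b
    elif c == "sub":
        result = a - b
    elif c == "sll":
        result = a << (b & 31)   # shift left by the low 5 bits of b
    elif c == "xor":
        result = a ^ b
    else:
        raise ValueError(f"unknown ALU control: {c}")
    return result & MASK32       # wrap the result to the bus width
```

Masking the result models a fixed-width bus: `alu(0, 1, "sub")` wraps around to all ones rather than producing a negative number.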
3 ways to make a combinational circuit
  Given a truth table of some function from N input bits to M output bits, you can implement it using:
  Random logic
    Build function up from simple gates
    May have long paths from input to output
  PLA (Programmable Logic Array)
    Implements function as sum-of-products
    PLA is 2 logic levels deep (3 if you count inverters)
  ROM (Read-Only Memory)
    Use memory holding 2^N M-bit values
    Each memory cell holds the output for one input combination
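The ROM approach amounts to storing the whole truth table and indexing it by the input bits. The sketch below assumes a made-up example function (3-input majority, so N=3 and M=1); `build_rom` and `majority` are illustrative names.

```python
def build_rom(f, n_inputs: int) -> list[int]:
    """Tabulate f for every one of the 2**n_inputs input combinations."""
    return [f(addr) for addr in range(2 ** n_inputs)]


def majority(addr: int) -> int:
    """Example function: 1 when at least two of the three input bits are 1."""
    bits = [(addr >> i) & 1 for i in range(3)]
    return 1 if sum(bits) >= 2 else 0


rom = build_rom(majority, 3)   # 8 cells, one per input combination
```

Evaluating the function is then a single lookup such as `rom[0b011]`, at the cost of always storing all 2^N entries.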
PLAs
  Inputs: in 1, in 2, in 3; outputs: out 1, out 2, out 3, out 4
  Each vertical wire is an “and” of selected inputs (or their negations)
    Example vertical wires: in1 & in3, ~in2 & ~in3
  Each output is an “or” of selected vertical wires
    Example output: (in1&in3) | (~in2&~in3)
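The and-plane/or-plane structure can be sketched as two small tables. The representation below (lists of `(input index, required value)` literals, and lists of product-term indices) is an assumption chosen for this example; it encodes only the three expressions that appear on the slide.

```python
def pla(inputs, and_plane, or_plane):
    """Evaluate a PLA: product terms first, then an OR of selected terms."""
    terms = [all(inputs[i] == v for i, v in literals) for literals in and_plane]
    return [int(any(terms[t] for t in selected)) for selected in or_plane]


# inputs are [in1, in2, in3]
AND_PLANE = [
    [(0, 1), (2, 1)],   # in1 & in3
    [(1, 0), (2, 0)],   # ~in2 & ~in3
]
OR_PLANE = [
    [0],       # in1 & in3
    [0, 1],    # (in1&in3) | (~in2&~in3)
    [1],       # ~in2 & ~in3
]
```

For instance, `pla([1, 0, 1], AND_PLANE, OR_PLANE)` activates only the first product term, so the first two outputs are 1 and the third is 0.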
Example: 3-bit adder
  Inputs: 000 001 010 011 100 101 110 111
  Outputs: Output, Carry
  This space intentionally left black
Which is best depends on concerns
  Speed:
    Random logic might be slow (signal can go through many levels)
    PLA can be the fastest (only 3 gates deep)
  Size:
    ROM is usually the largest (it always needs 2^N cells)
    PLA is often similar to random logic, but not always
    Consider parity (mod-2 sum) of N inputs:
      Random logic needs N-1 XOR gates (or 3N-3 NANDs)
      PLA needs 2^(N-1) product terms (one for each “1” output)
  Ease of implementation:
    ROM (esp. PROM = programmable ROM) is easy to change
    PLAs are convenient too
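The parity comparison can be checked by brute force for small N. This sketch simply counts the input combinations whose parity is 1, using the slide's rule of one product term per “1” output; `parity_costs` is an illustrative helper, not part of the course material.

```python
from itertools import product


def parity(bits) -> int:
    """Mod-2 sum of the input bits."""
    return sum(bits) % 2


def parity_costs(n: int) -> dict:
    """Rough size of each implementation of n-input parity."""
    ones = sum(1 for bits in product((0, 1), repeat=n) if parity(bits) == 1)
    return {
        "xor_gates": n - 1,          # random logic: a chain or tree of XORs
        "pla_product_terms": ones,   # one product term per "1" output
        "rom_cells": 2 ** n,         # one cell per input combination
    }
```

For N=4 this gives 3 XOR gates, 8 product terms, and 16 ROM cells, so the PLA and ROM grow exponentially while the random logic grows linearly.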
State Elements
  D latch: when the latch is “open”, output = data
  D flip-flop: output only changes at the clock edge
  [Diagrams: a D latch built from “and” gates with data and clk inputs; a D flip-flop built from two D latches, with data and clk inputs and one output]
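A small behavioral sketch of the two state elements. It assumes the latch is open when clk=1 and that the flip-flop is a master-slave pair in which the first latch sees the inverted clock, which makes the output change on the rising edge; the slide does not state the polarity, so treat both as assumptions.

```python
class DLatch:
    """Level-sensitive latch: output follows data while open (clk=1 assumed)."""

    def __init__(self):
        self.q = 0

    def update(self, data: int, clk: int) -> int:
        if clk == 1:
            self.q = data
        return self.q


class DFlipFlop:
    """Two latches in series; the master is open while clk=0 (assumed)."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, data: int, clk: int) -> int:
        m = self.master.update(data, 1 - clk)   # master sees inverted clock
        return self.slave.update(m, clk)        # slave opens when clk=1
```

With this arrangement, changing `data` while `clk` stays at 1 has no effect on the flip-flop output, whereas the plain latch would follow it.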
Storage Element: Register
  Register
    Like a D flip-flop, except:
      N-bit input and output (there are really N flip-flops)
      Write Enable input
    Write Enable:
      0: data in the register will not change
      1: Data Out becomes Data In (on the clock edge)
  [Diagram: N flip-flops with N-bit Data In, N-bit Data Out, Clk, and Write Enable]
  Notes: As far as storage elements are concerned, we will need an N-bit register that is similar to the D flip-flop I showed you in class. The significant difference is that the register has a Write Enable input. The content of the register will NOT be updated if Write Enable is 0; it is updated at the clock tick ONLY if Write Enable is 1.
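The write-enable behavior can be sketched as follows. `Register` and `clock_edge` are illustrative names; each call to `clock_edge` stands for one clock tick, and the initial contents of 0 are an assumption.

```python
class Register:
    """N-bit register: Data Out changes only on a clock edge with Write Enable = 1."""

    def __init__(self, n_bits: int):
        self.mask = (1 << n_bits) - 1
        self.data_out = 0   # assumed initial contents

    def clock_edge(self, data_in: int, write_enable: int) -> int:
        if write_enable:
            self.data_out = data_in & self.mask   # keep only N bits
        return self.data_out
```

A tick with Write Enable = 0 leaves `data_out` untouched no matter what is on Data In.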
Register File for MIPS
  We need 32 registers and 3 ports:
    Two 32-bit output busses (busA and busB)
    One 32-bit input bus (busW)
  Register selection:
    RA selects the register to put on busA
    RB selects the register to put on busB
    RW selects the register to be written via busW when Write Enable is 1
  What happens if RW = RA and Write Enable = 1?
  [Diagram: 32 32-bit registers with inputs RW, RA, RB (5 bits each), busW (32 bits), Write Enable, and Clk; outputs busA and busB (32 bits each)]
  Notes: We will also need a register file that consists of 32 32-bit registers with two output busses (busA and busB) and one input bus (busW). The register specifiers RA and RB select the registers to put on busA and busB, respectively. When Write Enable is 1, the register specifier RW selects the register to be written via busW. In our simplified version of the register file, the write operation occurs at the clock tick. Keep in mind that the clock input is a factor ONLY during the write operation. During a read, the register file behaves as a combinational logic block: if you put a valid value on RA, busA becomes valid after the register file’s access time. Similarly, if you put a valid value on RB, busB becomes valid after the register file’s access time. In both cases, the clock input is not a factor.
Implementing read ports
  [Diagram: Register 0, Register 1, ..., Register 31 feed two multiplexors. Read address 1 controls the first mux, which produces read data 1; read address 2 controls the second mux, which produces read data 2.]
Implementing the write port
  [Diagram: clk, a decoder driven by the write address, “and” gates feeding Register 0, Register 1, ..., Register 31, and write data]
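Putting the read and write ports together, a register file might be modeled roughly as below. This is a sketch under assumptions: the class and method names are invented, the decoder outputs are combined with a write-enable signal, registers start at 0, and reads are treated as purely combinational.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self, n_regs: int = 32, width: int = 32):
        self.regs = [0] * n_regs          # assumed initial contents
        self.mask = (1 << width) - 1

    def read(self, ra: int, rb: int):
        """Read ports: each is a mux over all registers; no clock involved."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, write_enable: int) -> None:
        """Write port: a decoder turns rw into one select line per register."""
        select = [int(i == rw) for i in range(len(self.regs))]
        for i, sel in enumerate(select):
            if sel & write_enable:        # only the selected register is clocked
                self.regs[i] = bus_w & self.mask
```

In this idealized model, when the read address equals the write address the read port shows the old value until `clock_edge` is called and the new value afterwards; a real implementation's answer depends on its timing.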
Storage Element: Memory
  Memory
    One input bus: Data In
    One output bus: Data Out
  Memory word is selected by Address:
    If Write Enable = 0, the memory location selected by Address is put on the Data Out bus
    If Write Enable = 1, the memory location selected by Address is overwritten by Data In
  Clock input (CLK)
    The CLK input is used ONLY during a write operation
    For a read, memory acts as combinational logic: Address valid => Data Out valid after “access time”
  [Diagram: memory block with Write Enable, Address, 32-bit Data In, 32-bit Data Out, and Clk]
  Notes: The last storage element you will need for the datapath is the idealized memory to store your data and instructions. This idealized memory block has just one input bus (Data In) and one output bus (Data Out). When Write Enable is 0, the address selects the memory word to put on the Data Out bus. When Write Enable is 1, the address selects the memory word to be written via the Data In bus at the next clock tick. Once again, the clock input is a factor ONLY during the write operation. During a read, memory behaves as a combinational logic block: if you put a valid value on the address lines, the output bus Data Out becomes valid after the access time of the memory.
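The idealized memory can be sketched in the same style. The names are illustrative, words are assumed to be 32 bits, and unwritten locations are assumed to read as 0, which the slide does not specify.

```python
class IdealMemory:
    """Idealized memory: combinational read, clocked write."""

    def __init__(self):
        self.words = {}   # address -> 32-bit word; sparse for convenience

    def read(self, address: int) -> int:
        """Data Out for the given Address; the clock plays no role."""
        return self.words.get(address, 0)   # assumption: unwritten words read as 0

    def clock_edge(self, address: int, data_in: int, write_enable: int) -> None:
        """On a clock tick, overwrite the addressed word if Write Enable = 1."""
        if write_enable:
            self.words[address] = data_in & 0xFFFFFFFF
```

Access time is not modeled here; `read` returns immediately, which corresponds to waiting until Data Out has become valid.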
Clocking Methodology
  All storage elements are clocked by the same clock edge
  Combinational logic between storage elements must settle to correct output values in the time indicated by the dark bar
  [Timing diagram: Clk waveform with Setup and Hold windows around each clock edge; signal values are “Don’t Care” outside those windows]
  Notes: Remember, we will be using a clocking methodology where all storage elements are clocked by the same clock edge. Consequently, our cycle time will be the sum of: (a) the Clock-to-Q time of the input registers, (b) the longest delay path through the combinational logic block, (c) the setup time of the output register, and (d) the clock skew. In order to avoid a hold time violation, you have to make sure this inequality is fulfilled.
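The cycle-time sum from the notes can be written as a one-line calculation. The function name and the delay figures in the usage note are invented for illustration; any consistent time unit works.

```python
def min_cycle_time(clk_to_q: int, path_delays: list[int], setup: int, skew: int) -> int:
    """Cycle time = Clock-to-Q + longest combinational path + setup time + clock skew."""
    return clk_to_q + max(path_delays) + setup + skew
```

With hypothetical delays in picoseconds, `min_cycle_time(50, [300, 420, 180], 40, 20)` is governed by the 420 ps path; the shorter paths do not affect the result.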
Computer building block of the day: CORE STORAGE
  Mercury delay lines (Univac I’s storage) were 100x cheaper than vacuum tubes.
  Replaced by CRT memory (similar, using light instead of sound).
  But memory was still expensive and unreliable.
  “Cores” (little donuts) of certain materials are interesting:
    If you pass enough current through, it magnetizes “0” or “1”
    If you pass less current through a magnetized core, it sends a pulse down a second wire but doesn’t change.
  Led to the invention of “core storage”: 2-D arrays of cores.
    “Read” by sending half-critical current through row, sensing column
    “Write” selected core with half-critical current through row & column.
  Cores used were on the order of .2 inch in diameter.
Core storage
  Used on the Whirlwind computer, developed at MIT in the early 50’s
    .14 inch in diameter
    2048 16-bit words of storage
  Cores improved steadily over the next 20 years
    .03” core with .019” hole
    4 wires passed through each (X, Y, inhibit, sense)
    Speeds around 1 microsecond
  And the inevitable patent problems
    MIT got 2 cents per core. IBM made a billion cores/year
    In 1964, IBM paid a one-time fee of $13M, the biggest patent payment to date.