COMP541 Datapaths I Montek Singh Mar 28, 2012
Topics Over next 2 classes: datapaths How ALUs are designed How data is stored in a register file Lab 9: Start building a datapath!
What is computer architecture?
Architecture (ISA) Jumping up a few levels of abstraction. Architecture: the programmer’s view of the computer Defined by instructions (operations) and operand locations Microarchitecture: how to implement an architecture in hardware
MIPS Machine Language Three instruction formats: R-Type: register operands I-Type: immediate operand J-Type: for jumps
R-Type instructions Register-type. 3 register operands: rs, rt: source registers; rd: destination register. Other fields: op: the operation code or opcode (0 for R-type instructions); funct: the function (together, op and funct tell the computer which operation to perform); shamt: the shift amount for shift instructions, otherwise it is 0.
R-Type Examples Note the order of registers in the assembly code: add rd, rs, rt
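The field packing behind an R-type instruction can be sketched in a few lines of Python. This is only an illustration: the function name `encode_r_type` is hypothetical, and the field widths (op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits) and the funct value 0x20 for add are taken from the standard MIPS32 encoding rather than from these slides.

```python
def encode_r_type(rs, rt, rd, shamt, funct, op=0):
    """Pack R-type fields into one 32-bit word.

    Assumed layout (standard MIPS32): op[31:26] rs[25:21] rt[20:16]
    rd[15:11] shamt[10:6] funct[5:0].
    """
    fields = (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# Assembly order is "add rd, rs, rt", but the machine word stores rs, rt, rd.
# add $s0, $s1, $s2  ->  rd = 16, rs = 17, rt = 18, funct = 0x20
word = encode_r_type(rs=17, rt=18, rd=16, shamt=0, funct=0x20)
print(f"0x{word:08X}")  # 0x02328020
```

Passing the fields by keyword makes the difference between assembly order and machine-code order explicit.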
I-Type instructions Immediate-type. 3 operands: rs, rt: register operands; imm: 16-bit two’s complement immediate. Other fields: op: the opcode.
I-Type Examples Note the differing order of registers in the assembly and machine codes: addi rt, rs, imm lw rt, imm(rs) sw rt, imm(rs)
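A small Python sketch of I-type packing, assuming the standard MIPS32 layout (op 6 bits, rs 5, rt 5, imm 16) and the standard opcodes 8 for addi and 35 for lw. The helper name `encode_i_type` is hypothetical. The negative immediate is stored in 16-bit two’s complement form.

```python
def encode_i_type(op, rs, rt, imm):
    """Pack I-type fields: op[31:26] rs[25:21] rt[20:16] imm[15:0] (assumed MIPS32 layout)."""
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError(f"imm={imm} does not fit in 16-bit two's complement")
    if not (0 <= op < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("op, rs, or rt out of range")
    # Masking with 0xFFFF yields the two's complement bit pattern of imm.
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ADDI, LW = 8, 35  # standard MIPS opcodes, not stated on the slide

# addi rt, rs, imm: addi $s0, $s1, 5  ->  rt = 16, rs = 17
print(f"0x{encode_i_type(ADDI, rs=17, rt=16, imm=5):08X}")    # 0x22300005
# addi $t0, $s3, -12  ->  rt = 8, rs = 19
print(f"0x{encode_i_type(ADDI, rs=19, rt=8, imm=-12):08X}")   # 0x2268FFF4
# lw rt, imm(rs): lw $t2, 32($0)  ->  rt = 10, rs = 0
print(f"0x{encode_i_type(LW, rs=0, rt=10, imm=32):08X}")      # 0x8C0A0020
```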
J-Type instructions Jump-type 26-bit address operand (addr) Used for jump instructions (j)
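As a rough sketch, a J-type word is just a 6-bit opcode followed by the 26-bit addr field. The function name `encode_j` is hypothetical. The opcode 2 for j, and the convention that addr holds bits 27:2 of the byte address of the target, come from the standard MIPS32 encoding and are not stated on the slide.

```python
def encode_j(target_byte_address, op=2):
    """Pack a J-type word: op[31:26] addr[25:0] (assumed MIPS32 layout)."""
    if target_byte_address % 4 != 0:
        raise ValueError("jump target must be word aligned")
    # Assumption: addr stores bits 27:2 of the target byte address.
    addr = (target_byte_address >> 2) & 0x03FFFFFF
    return (op << 26) | addr


print(f"0x{encode_j(0x00400020):08X}")  # 0x08100008
```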
Review: Instruction Formats
Microarchitecture Microarchitecture: how to implement an architecture in hardware This is sometimes just called implementation Processor: Datapath: functional blocks Control: control signals
Parts of CPUs Datapath: the registers and logic to perform operations on them. Control unit: generates signals to control the datapath.
Memory and I/O Memories are connected to the data/control in and out lines Example: register to memory ops Will discuss I/O arrangements later
Basic Datapath Basic components of the CPU datapath PC, Instruction Memory, Register File, ALU, Data Memory Copyright © 2007 Elsevier
First: A “lightweight” ALU Arithmetic Logic Unit = ALU
Lightweight ALU A lightweight ALU from textbook: 3-bit function select F2:0 (7 functions)
F2:0  Function
000   A & B
001   A | B
010   A + B
011   not used
100   A & ~B
101   A | ~B
110   A - B
111   SLT
Lightweight ALU: Internals (light-weight version)
F2:0  Function
000   A & B
001   A | B
010   A + B
011   not used
100   A & ~B
101   A | ~B
110   A - B
111   SLT
Set Less Than (SLT) Example Configure a 32-bit ALU for the set if less than (SLT) operation. Suppose A = 25 and B = 32. A is less than B, so we expect Y to be the 32-bit representation of 1 (0x00000001). For SLT, F2:0 = 111. F2 = 1 configures the adder unit as a subtracter, so S = 25 - 32 = -7. The two’s complement representation of -7 has a 1 in the most significant bit, so S31 = 1. With F1:0 = 11, the final multiplexer selects Y = S31 (the 1-bit MSB, zero extended) = 0x00000001.
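The SLT walk-through can be checked with a behavioral Python model of the lightweight ALU. This is a sketch, not the textbook circuit: the name `light_alu` is hypothetical, and the internal structure (F2 inverts B and supplies the carry-in, F1:0 selects the output) is inferred from the function table and the example above.

```python
MASK32 = 0xFFFFFFFF


def light_alu(a, b, f):
    """Behavioral model of the 3-bit-select ALU; f is F2:0 as an integer 0..7."""
    a &= MASK32
    b &= MASK32
    f2 = (f >> 2) & 1
    select = f & 0b11
    if f == 0b011:
        raise ValueError("F2:0 = 011 is not used")
    bb = (~b & MASK32) if f2 else b      # F2 = 1 inverts B
    s = (a + bb + f2) & MASK32           # adder; carry-in F2 completes A - B
    if select == 0b00:
        return a & bb
    if select == 0b01:
        return a | bb
    if select == 0b10:
        return s
    # select == 0b11 (SLT): zero-extended sign bit of A - B.
    # Like the slide's example, this ignores signed overflow.
    return s >> 31


print(hex(light_alu(25, 32, 0b110)))  # 0xfffffff9, i.e. -7 in two's complement
print(hex(light_alu(25, 32, 0b111)))  # 0x1, since 25 < 32
```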
Next: A “full-feature” ALU
Arithmetic Logic Unit (ALU) Full-feature ALU from COMP411: inputs A and B, 5-bit ALUFN (Sub, Bool, Shft, Math)
Sub  Bool  Shft  Math  OP
0    XX    0     1     A+B
1    XX    0     1     A-B
X    X0    1     1     0
X    X1    1     1     1
X    00    1     0     B<<A
X    10    1     0     B>>A
X    11    1     0     B>>>A
X    00    0     0     A & B
X    01    0     0     A | B
X    10    0     0     A ^ B
X    11    0     0     ~(A | B)
[Figure: ALU block diagram with Add/Sub, Bidirectional Barrel Shifter, and Boolean units; control signals Sub, Bool, Shft, Math; result R; flags V, C, N, Z.]
Shifting Logic Shifting is a common operation: applied to groups of bits, used for alignment, used for “short cut” arithmetic operations. X << 1 is often the same as 2*X; X >> 1 can be the same as X/2. For example: X = 20 (decimal) = 00010100 (binary). Left Shift: (X << 1) = 00101000 (binary) = 40 (decimal). Right Shift: (X >> 1) = 00001010 (binary) = 10 (decimal). Signed or “Arithmetic” Right Shift: (-X >>> 1) = (11101100 >>> 1) = 11110110 (binary) = -10 (decimal). [Figure: left-shift-by-1 circuit with inputs X7..X0, outputs R7..R0, a constant “0” input, and control SHL1.]
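The 8-bit shift examples can be reproduced in Python. Python integers are unbounded, so the sketch masks to a fixed width and handles the sign fill by hand. The helper names (`shl`, `shr_logical`, `shr_arith`, `to_signed`) are illustrative only.

```python
def shl(x, n, width=8):
    """Logical left shift; bits shifted past the top are dropped."""
    mask = (1 << width) - 1
    return (x << n) & mask


def shr_logical(x, n, width=8):
    """Logical right shift; zeros enter from the left."""
    mask = (1 << width) - 1
    return (x & mask) >> n


def shr_arith(x, n, width=8):
    """Arithmetic right shift; copies of the sign bit enter from the left."""
    mask = (1 << width) - 1
    x &= mask
    sign = x >> (width - 1)
    fill = ((mask << (width - n)) & mask) if sign else 0
    return (x >> n) | fill


def to_signed(x, width=8):
    """Interpret a width-bit pattern as a two's complement number."""
    return x - (1 << width) if x >> (width - 1) else x


x = 20
print(format(shl(x, 1), "08b"), shl(x, 1))                  # 00101000 40
print(format(shr_logical(x, 1), "08b"), shr_logical(x, 1))  # 00001010 10
neg = -x & 0xFF                                             # 11101100
print(format(shr_arith(neg, 1), "08b"), to_signed(shr_arith(neg, 1)))  # 11110110 -10
```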
Shifting Logic How do you shift by more than 1 position? Feed other bits into the multiplexer: e.g., for left-shift-by-2, the multiplexer for R(k) receives input from X(k-2). How do you allow the shift amount to be specified dynamically? You need a bigger multiplexer; the shift amount is applied as the select input. We will design this in class and lab.
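One way to picture the "bigger multiplexer" idea is a software model in which every output bit R(k) has its own wide mux, and the shift amount is the select input shared by all of them. This is a behavioral sketch only, with hypothetical names (`mux`, `shift_left_by_mux`); the design built in class and lab may be organized differently.

```python
def mux(inputs, select):
    """An N-input multiplexer: the select value picks one input."""
    return inputs[select]


def shift_left_by_mux(x, shamt, width=8):
    """Left shift built from one width-input mux per output bit."""
    if not 0 <= shamt < width:
        raise ValueError("shift amount must be in 0..width-1")
    x_bits = [(x >> k) & 1 for k in range(width)]  # x_bits[k] is X(k)
    r_bits = []
    for k in range(width):
        # Input s of the mux for R(k) is X(k-s); positions below bit 0 read constant 0.
        inputs = [x_bits[k - s] if k - s >= 0 else 0 for s in range(width)]
        r_bits.append(mux(inputs, shamt))
    return sum(bit << k for k, bit in enumerate(r_bits))


print(format(shift_left_by_mux(0b00010100, 2), "08b"))  # 01010000
```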
Boolean Operations It will also be useful to perform logical operations on groups of bits. Which ones? ANDing is useful for “masking” off groups of bits. ex. 10101110 & 00001111 = 00001110 (mask selects last 4 bits) ANDing is also useful for “clearing” groups of bits. ex. 10101110 & 00001111 = 00001110 (0’s clear first 4 bits) ORing is useful for “setting” groups of bits. ex. 10101110 | 00001111 = 10101111 (1’s set last 4 bits) XORing is useful for “complementing” groups of bits. ex. 10101110 ^ 00001111 = 10100001 (1’s invert last 4 bits) NORing is useful for.. uhm… ex. 10101110 # 00001111 = 01010000 (0’s invert, 1’s clear)
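The four examples above can be verified directly with Python's bitwise operators. The variable names are illustrative, and results are masked to 8 bits because Python integers have no fixed width.

```python
x = 0b10101110
mask = 0b00001111


def show(value):
    """Format the low 8 bits of value as a binary string."""
    return format(value & 0xFF, "08b")


results = {
    "and": show(x & mask),     # masking / clearing
    "or": show(x | mask),      # setting
    "xor": show(x ^ mask),     # complementing
    "nor": show(~(x | mask)),  # 0's invert, 1's clear
}
for name, bits in results.items():
    print(name, bits)
# and 00001110
# or 10101111
# xor 10100001
# nor 01010000
```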
Boolean Unit It is simple to build up a Boolean unit using primitive gates and a mux to select the function. Since there is no interconnection between bits, this unit can be simply replicated at each position. The cost is about 7 gates per bit: one for each primitive function, and approx 3 for the 4-input mux. [Figure: one bit slice with inputs Ai and Bi, output Qi, and a 4-input mux selected by Bool (00, 01, 10, 11).] This logic block is repeated for each bit (i.e. 32 times)
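The bit-slice structure can be modeled as one small function replicated across all bit positions. This is a sketch with hypothetical names (`boolean_slice`, `boolean_unit`). The mapping of Bool values to gates (00 AND, 01 OR, 10 XOR, 11 NOR) is an assumption based on the full-feature ALU function table.

```python
def boolean_slice(a_i, b_i, bool_sel):
    """One bit slice: four primitive gates feeding a 4-input mux."""
    and_gate = a_i & b_i
    or_gate = a_i | b_i
    xor_gate = a_i ^ b_i
    nor_gate = 1 - or_gate
    # Assumed mux ordering: 00 AND, 01 OR, 10 XOR, 11 NOR.
    return (and_gate, or_gate, xor_gate, nor_gate)[bool_sel]


def boolean_unit(a, b, bool_sel, width=32):
    """Replicate the slice at every bit position; slices do not interact."""
    result = 0
    for i in range(width):
        q_i = boolean_slice((a >> i) & 1, (b >> i) & 1, bool_sel)
        result |= q_i << i
    return result


print(format(boolean_unit(0b10101110, 0b00001111, 0b10, width=8), "08b"))  # 10100001
```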
An ALU at last! Full-feature ALU from COMP411: inputs A and B, result R, 5-bit ALUFN (Sub, Bool, Shft, Math)
Sub  Bool  Shft  Math  OP
0    XX    0     1     A+B
1    XX    0     1     A-B
X    X0    1     1     0
X    X1    1     1     1
X    00    1     0     B<<A
X    10    1     0     B>>A
X    11    1     0     B>>>A
X    00    0     0     A & B
X    01    0     0     A | B
X    10    0     0     A ^ B
X    11    0     0     ~(A | B)
[Figure: ALU block diagram with Add/Sub, Bidirectional Barrel Shifter, and Boolean units; control signals Sub, Bool, Shft, Math; result R; flags V, C, N, Z.]
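A behavioral Python sketch of the ALUFN table, useful for predicting what a hardware version should output. Several details are assumptions rather than facts from the slide: the function name `full_alu`, passing the four control fields as separate arguments, using only the low 5 bits of A as the shift amount, treating Bool = 11 in Boolean mode as NOR, and rejecting the shifter encoding Bool = 01 that the table does not list. The flags are not modeled.

```python
MASK32 = 0xFFFFFFFF


def full_alu(a, b, sub, bool_sel, shft, math):
    """Model of the 32-bit ALU; sub, shft, math are 0/1 and bool_sel is 0..3."""
    a &= MASK32
    b &= MASK32
    if math and not shft:                     # Add/Sub unit
        operand = (~b & MASK32) if sub else b
        return (a + operand + sub) & MASK32
    if math and shft:                         # table rows whose OP is the constant 0 or 1
        return bool_sel & 1
    if shft:                                  # shifter: B shifted by A
        amount = a & 31                       # assumption: low 5 bits of A
        if bool_sel == 0b00:
            return (b << amount) & MASK32
        if bool_sel == 0b10:
            return b >> amount
        if bool_sel == 0b11:
            signed_b = b - (1 << 32) if b >> 31 else b
            return (signed_b >> amount) & MASK32
        raise ValueError("Bool = 01 is not listed for the shifter")
    if bool_sel == 0b00:                      # Boolean unit
        return a & b
    if bool_sel == 0b01:
        return a | b
    if bool_sel == 0b10:
        return a ^ b
    return ~(a | b) & MASK32                  # assumption: Bool = 11 is NOR


print(hex(full_alu(25, 32, sub=1, bool_sel=0, shft=0, math=1)))             # 0xfffffff9
print(hex(full_alu(2, 0x80000000, sub=0, bool_sel=0b11, shft=1, math=0)))   # 0xe0000000
```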
Which one do we implement? We will use the full-feature one! slightly more challenging … I will help you! … but a lot more fun to use; supports a much more useful set of instructions for your final programming project
Processor Architecture Rather, “microarchitecture” or implementation
Microarchitectures Multiple implementations for a single architecture: Single-cycle: each instruction executes in a single cycle. Multicycle: each instruction is broken up into a series of shorter steps. Pipelined: each instruction is broken up into a series of steps, and multiple instructions execute at once. The choice directly impacts the performance obtained.
Processor Performance Program execution time: Execution Time = (# instructions) × (cycles/instruction) × (seconds/cycle). Definitions: Cycles/instruction = CPI; Seconds/cycle = clock period; 1/CPI = Instructions/cycle = IPC. Challenge is to satisfy constraints of: Cost, Power, Performance.
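The execution-time formula is easy to try out in Python. The function names and all the numbers in the example (instruction count, CPI, clock period) are made up purely for illustration.

```python
def execution_time(instruction_count, cpi, clock_period_s):
    """Execution time = (# instructions) x (cycles/instruction) x (seconds/cycle)."""
    return instruction_count * cpi * clock_period_s


def ipc(cpi):
    """Instructions per cycle is the reciprocal of CPI."""
    return 1.0 / cpi


# Hypothetical program: 1e9 instructions, CPI of 1.5, 2 ns clock period.
seconds = execution_time(1e9, cpi=1.5, clock_period_s=2e-9)
print(f"{seconds:.2f} s, IPC = {ipc(1.5):.3f}")
```

Halving any one of the three factors halves the execution time, which is why the choice of microarchitecture (through its effect on CPI and clock period) matters so much.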
MIPS Processor We will consider a subset of MIPS instructions (in book & lab): R-type instructions: and, or, add, sub, slt, … Memory instructions: lw, sw, … Branch instructions: beq, … Some immediate instructions too: addi, … Jumps as well: j, …
Next Next class: We’ll look at single-cycle MIPS, then the more complex versions. Lab Friday (March 30): Demo your graphics displays (Lab 8). Start on Lab 9 (will post on website by Fri): start building the datapath! ALU, Registers