CS35101- Computer Architecture Week 10: Single Cycle Implementation Paul Durand ( www.cs.kent.edu/~durand ) [Adapted from M Irwin (www.cse.psu.edu/~mji) ] [Adapted from COD, Patterson & Hennessy, © 2005, UCB]
Head’s Up This week’s material Next week’s material Building a MIPS datapath Single cycle datapath implementation & control Reading assignment – PH 5.1-5.4 Next week’s material Multiple cycle datapath implementation & control Reading assignment – PH 5.6-5.7
Review: Design Principles Simplicity favors regularity fixed size instructions – 32-bits only three instruction formats Good design demands good compromises three instruction formats Smaller is faster limited instruction set limited number of registers in register file limited number of addressing modes Make the common case fast arithmetic operands from the register file (load-store machine) allow instructions to contain immediate operands
The Processor: Datapath & Control We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j Generic implementation: use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC) decode the instruction (and read registers) execute the instruction All instructions (except j) use the ALU after reading the registers Why? memory-reference? arithmetic? control flow? Fetch PC = PC+4 Decode Exec memory reference use ALU to compute addresses arithmetic use the ALU to do the require arithmetic control use the ALU to compute branch conditions.
Abstract Implementation View Two types of functional units: elements that operate on data values (combinational) elements that contain state (sequential) Single cycle operation Split memory (Harvard) model - one memory for instructions and one for data Write Data Instruction Memory Address Read Data Register File Reg Addr Data Memory Read Data PC Address Instruction ALU Reg Addr Read Data Write Data Reg Addr Have the class tell which is which in the picture - combinational and sequential
Single Cycle Implementation – where we are headed
Clocking Methodologies Clocking methodology defines when signals can be read and when they can be written falling (negative) edge rising (positive) edge cycle time clock rate = 1/(cycle time) e.g., 10 nsec cycle time = 100 MHz clock rate 1 nsec cycle time = 1 GHz clock rate State element design choices level sensitive latch master-slave and edge-triggered flipflops
Review: State Elements Set-reset latch Level sensitive D latch latch is transparent when clock is high (copies input to output) R S Q(t+1) !Q(t+1) 1 Q(t) !Q(t) R Q !Q S clock D Q clock !Q D Q
Review: State Elements, con’t Race problem with latch based design … Consider the case when D-latch0 holds a 0 and D- latch1 holds a 1 and you want to transfer the contents of D-latch0 to D-latch1 and vice versa must have the clock high long enough for the transfer to take place must not leave the clock high so long that the transferred data is copied back into the original latch Two-sided clock constraint D Q D Q D-latch0 D-latch1 clock !Q clock !Q clock
Review: State Elements, con’t Solution is to use flipflops that change state (Q) only on clock edge (master-slave) master (first D-latch) copies the input when the clock is high (the slave (second D-latch) is locked in its memory state and the output does not change) slave copies the master when the clock goes low (the master is now locked in its memory state so changes at the input are not loaded into the master D-latch) One-sided clock constraint must have the clock cycle time long enough to accommodate the worst case delay path D clock Q !Q D-latch D clock Q Also have set-up and hold-times to deal with
Our Implementation An edge-triggered methodology Typical execution read contents of some state elements send values through some combinational logic write results to one or more state elements Assumes state elements are written on every clock cycle; if not, need explicit write control signal write occurs only when both the write control is asserted and the clock edge occurs State element 1 State element 2 Combinational logic clock one clock cycle
Fetching Instructions Fetching instructions involves reading the instruction from the Instruction Memory updating the PC to hold the address of the next instruction PC is updated every cycle, so it does not need an explicit write control signal Instruction Memory is read every cycle, so it doesn’t need an explicit read control signal Add 4 Instruction Memory Read Address PC Instruction
Decoding Instructions Decoding instructions involves sending the fetched instruction’s opcode and function field bits to the control unit Control Unit Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 Instruction reading two values from the Register File Register File addresses are contained in the instruction
Executing R Format Operations R format operations (add, sub, slt, and, or) perform the indicated (by op and funct) operation on values in rs and rt store the result back into the Register File (into location rd) Note that Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File R-type: 31 25 20 15 5 op rs rt rd funct shamt 10 RegWrite ALU control Read Addr 1 Read Data 1 Register File Read Addr 2 overflow Instruction zero ALU Write Addr Read Data 2 Write Data
Executing Load and Store Operations compute a memory address by adding the base register (in rs) to the 16-bit signed offset field in the instruction base register was read from the Register File during decode offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value store value, read from the Register File during decode, must be written to the Data Memory load value, read from the Data Memory, must be stored in the Register File I-Type: op rs rt address offset 31 25 20 15
Executing Load and Store Operations, con’t Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 ALU overflow zero ALU control RegWrite Data Memory Address Read Data Sign Extend MemWrite MemRead For class handout
Executing Load and Store Operations, con’t Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 ALU overflow zero ALU control RegWrite Data Memory Address Read Data Sign Extend MemWrite MemRead 16 32 For lecture
Executing Branch Operations Branch operations have to compare the operands read from the Register File during decode (rs and rt values) for equality (zero ALU output) compute the branch target address by adding the updated PC to the sign extended16-bit signed offset field in the instruction “base register” is the updated PC offset value in the low order 16 bits of the instruction must be sign extended to create a 32-bit signed value and then shifted left 2 bits to turn it into a word address I-Type: op rs rt address offset 31 25 20 15
Executing Branch Operations, con’t Add Branch target address Add 4 Shift left 2 ALU control PC zero (to branch control logic) Read Addr 1 Read Data 1 Register File Read Addr 2 Instruction ALU Write Addr For class handout Read Data 2 Write Data Sign Extend 16 32
Executing Branch Operations, con’t Add Branch target address Add 4 Shift left 2 ALU control PC zero (to branch control logic) Read Addr 1 Read Data 1 Register File Read Addr 2 Instruction ALU Write Addr For lecture Read Data 2 Write Data Sign Extend 16 32
Executing Jump Operations Jump operations have to replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits 31 25 J-Type: op jump target address Add 4 4 Jump address Instruction Memory Shift left 2 28 Read Address PC Instruction 26
Our Simple Control Structure We wait for everything to settle down ALU might not produce “right answer” right away we use write signals along with the clock edge to determine when to write (to the Register File and the Data Memory) Cycle time determined by length of the longest path We are ignoring some details like register setup and hold times
Creating a Single Datapath from the Parts Assemble the datapath segments discussed earlier, add control lines as needed, and design the control path Fetch, decode and execute each instructions in one clock cycle – single cycle design no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., why we have a separate Instruction Memory and Data Memory) to share datapath elements between two different instruction classes will need multiplexors at the input of the shared elements with control lines to do the selection Cycle time is determined by length of the longest path
Fetch, R, and Memory Access Portions Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Data 1 Data 2 ALU ovf zero ALU control RegWrite Data Read Data MemWrite MemRead Sign Extend 16 32
Multiplexor Insertion MemtoReg Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Data 1 Data 2 ALU ovf zero ALU control RegWrite Data Read Data MemWrite MemRead Sign Extend 16 32 ALUSrc
Clock Distribution clock cycle System Clock MemtoReg ovf zero Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Data 1 Data 2 ALU ovf zero ALU control RegWrite Data Read Data MemWrite MemRead Sign Extend 16 32 ALUSrc
Adding the Branch Portion Read Address Instruction Memory Add PC 4 Shift left 2 Add PCSrc Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 ALU ovf zero ALU control RegWrite Data Memory Address Read Data MemWrite MemRead Sign Extend 16 32 MemtoReg ALUSrc
Adding the Control Observations Selecting the operations to perform (ALU, Register File and Memory read/write) Controlling the flow of data (multiplexor inputs) Information comes from the 32 bits of the instruction R-type: 31 25 20 15 5 op rs rt rd funct shamt 10 Observations op field always in bits 31-26 addr of two registers to be read are always specified by the rs and rt fields (bits 25-21 and 20-16) addr. of register to be written is in one of two places – in rt (bits 20-16) for lw; in rd (bits 15-11) for R-type instructions base register for lw and sw always in rs (bits 25-21) offset for beq, lw, and sw always in bits 15-0 I-Type: op rs rt address offset 31 25 20 15
(Almost) Complete Single Cycle Datapath Add Add 1 4 Shift left 2 PCSrc RegDst 1 Register File Read Data 1 Data 2 RegWrite Sign Extend 16 32 ALUSrc MemWrite MemtoReg ovf Instr[25-21] Read Addr 1 Instruction Memory Address Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU Write Addr Instr[15 -11] Write Data Write Data 1 ALU control ALUOp Instr[5-0] Instr[15-0] MemRead
ALU Control ALU's operation based on instruction type and function code Why is the code for subtract 110 and not 011? ALU control input Function 000 and 001 or 010 add 110 subtract 111 set on less than
ALU Control, Con’t Controlling the ALU makes use of multiple levels of decoding main control unit generates the ALUOp bits ALU control unit generates ALU control inputs Instr op funct ALUOp desired action ALU control input lw xxxxxx 00 sw beq 01 add 100000 10 010 subt 100010 subtract 110 and 100100 000 or 100101 001 slt 101010 111 For class handout
ALU Control, Con’t Controlling the ALU makes use of multiple levels of decoding main control unit generates the ALUOp bits ALU control unit generates ALU control inputs Instr op funct ALUOp desired action ALU control input lw xxxxxx 00 sw beq 01 add 100000 10 010 subt 100010 subtract 110 and 100100 000 or 100101 001 slt 101010 111 add 010 subtract 110 For lecture
ALU Control Truth Table F5 F4 F3 F2 F1 F0 ALUOp1 ALUOp0 Op2 Op1 Op0 X 1 For class handout Can make use of more don’t cares since ALUOp does not use the encoding 11 since F5 and F4 are always 10 Logic comes from the K-maps …
ALU Control Truth Table F5 F4 F3 F2 F1 F0 ALUOp1 ALUOp0 Op2 Op1 Op0 X 1 X For lecture Can make use of more don’t cares since ALUOp does not use the encoding 11 since F5 and F4 are always 10 Logic comes from the K-maps …
ALU Control Combinational Logic From the truth table can design the ALU Control logic
(Almost) Complete Datapath with Control Unit Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU Note mux control inputs have been swapped (for three of the muxes) from the last picture to be consistent with the book. Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Main Control Unit Instr RegDst ALUSrc MemReg RegWr MemRd MemWr Branch ALUOp1 ALUOp2 R-type 000000 lw 100011 sw 101011 beq 000100 For class handout Completely determined by the instruction opcode field Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation
R-type Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For class handout – have a student come forward and mark the connections in the datapath that are active. And show the state of the control lines. Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
R-type Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For lecture Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Store Word Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For class handout – have a student come forward and mark the connections in the datapath that are active. And show the state of the control lines. Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Store Word Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For lecture Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Load Word Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For class handout – have a student come forward and mark the connections in the datapath that are active. And show the state of the control lines. Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Load Word Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For lecture Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Branch Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For class handout – have a student come forward and mark the connections in the datapath that are active. And show the state of the control lines. Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Branch Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU For lecture Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Main Control Unit Instr RegDst ALUSrc MemReg RegWr MemRd MemWr Branch ALUOp1 ALUOp0 R-type 000000 1 X lw 100011 sw 101011 beq 000100 For lecture Completely determined by the instruction opcode field Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation
Control Unit Logic From the truth table can design the Main Control logic Instr[31] Instr[30] Instr[29] Instr[28] Instr[27] Instr[26] R-type lw sw beq RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
Review: Handling Jump Operations Jump operation have to replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits 31 J-Type: op jump target address Add 4 4 Jump address Instruction Memory Shift left 2 28 Read Address PC Instruction 26
Adding the Jump Operation Instr[25-0] 1 Shift left 2 28 32 26 PC+4[31-28] Add Add 1 4 Shift left 2 PCSrc Jump ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC For class handout Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Adding the Jump Operation Instr[25-0] 1 Shift left 2 28 32 26 PC+4[31-28] Add Add 1 4 Shift left 2 PCSrc Jump ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC For lecture Good exam questions Add jalr rs,rd 0 rs 0 rd 0 9 jump to instr whose addr is in rs and save addr of next inst (PC+4) in rd Add the PowerPC addressing modes of update addressing and indexed addressing (will have to expand the RegFile to be three read port and two write port) Add andi, ori, addi - have to have both a signextend and a zeroextend and choose between the two, will have to augment the ALUop encoding (since can’t get the op information out of the funct bits as with R-type) Add mult rs, rt with the result being left in hi|lo - so also include the mfhi and mflo instructions (will have to add a multiplier, the hi and lo registers and then a couple of muxes and their control). Add barrel shifter Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Single Cycle Implementation Cycle Time Unfortunately, though simple, the single cycle approach is not used because it is inefficient Clock cycle must have the same length for every instruction What is the longest path (slowest instruction)?
Instruction Critical Paths Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires) except: Instruction and Data Memory (2ns) ALU and adders (2ns) Register File access (reads or writes) (1ns) Instr. I Mem Reg Rd ALU Op D Mem Reg Wr Total R-type load store beq jump
Instruction Critical Paths Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires) except: Instruction and Data Memory (2ns) ALU and adders (2ns) Register File access (reads or writes) (1ns) Instr. I Mem Reg Rd ALU Op D Mem Reg Wr Total R-type load store beq jump 2 1 6 2 1 8 Note that PC is updated during I Mem read 2 1 7 2 1 5 2
Where We are Headed Problems with single cycle datapath design uses clock cycle inefficiently and what if we had a more complicated instruction like floating point multiply? wasteful of area Another approach use a “smaller” cycle time have different instructions take different numbers of cycles a “multicycle” datapath: Address Read Data (Instr. or Data) Memory PC Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 ALU IR MDR A B ALUout