ELEN 350: Multi-Cycle Datapath
Adapted from the lecture notes of John Kubiatowicz (UCB) and Hank Walker (TAMU)
Abstract View of Our Single-Cycle Processor
° Looks like an FSM with the PC as state
[Figure: single-cycle datapath (Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access to Data Mem, Reg. Wrt) with a Control Unit that decodes op and fun into nPC_sel, RegDst, ALUSrc, ExtOp, ALUctr, MemRd, MemWr, and RegWr, and receives Equal]
What’s wrong with our CPI = 1 processor?
° All instructions take as much time as the slowest
° Long cycle time
° Real memory is not as nice as our idealized memory: it cannot always get the job done in one (short) cycle
[Figure: critical paths for Arithmetic & Logical, Load, Store, and Branch instructions through PC, Inst Memory, Reg File, mux, ALU or cmp, Data Mem, and register setup]
Reducing Cycle Time
° Cut the combinational dependency graph and insert a register / latch
° Do the same work in two fast cycles, rather than one slow one
° May be able to short-circuit the path and remove some components for some instructions!
[Figure: storage element, Combinational Logic, storage element; the logic is split into Combinational Logic (A) and Combinational Logic (B) with a storage element inserted between them]
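As a hypothetical illustration (the module and signal names are not from the slides), the sketch below computes y = (a + b) ^ c in two short cycles by inserting a storage element t between logic (A) and logic (B). It assumes the inputs are held stable for both cycles, which is what the multicycle datapath does by holding the instruction in IR.

```verilog
// Hypothetical example: a long combinational path cut in two by register t.
module split_path (
    input  wire        clk,
    input  wire [31:0] a, b, c,   // assumed stable for two cycles
    output reg  [31:0] y
);
    reg [31:0] t;                 // storage element inserted mid-path

    always @(posedge clk) begin
        t <= a + b;               // combinational logic (A)
        y <= t ^ c;               // combinational logic (B), sees last cycle's t
    end
endmodule
```

Each cycle now only has to cover one adder or one XOR, so the clock can be faster; the result simply appears one cycle later.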
Partitioning the Single-Cycle Datapath
° Add registers between the smallest steps
° Place enables on all registers
[Figure: single-cycle datapath partitioned into Instruction Fetch, Operand Fetch, Exec, Mem Access, and Result Store, with control signals nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr, and Equal]
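A minimal sketch of a register with an enable, parameterized by WIDTH (a name chosen here, not from the slides): it loads d on the rising clock edge only when en is 1 and otherwise holds its value. The IRen, REGSen, and ALUen inputs of the IM, REGS, and ALU modules later in these notes play this role.

```verilog
// Sketch: generic enabled register (hypothetical module name en_reg).
module en_reg #(parameter WIDTH = 32) (
    input  wire             clk,
    input  wire             en,   // load enable, driven by the controller
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk)
        if (en) q <= d;           // hold q when en = 0
endmodule
```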
Example Multicycle Datapath
° Critical path?
[Figure: multicycle datapath with register IR after Instruction Fetch, A, B, and E after Operand Fetch (Reg File, Ext), S after the ALU, and M after Data Mem access, followed by Result Store; control signals nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr, and Equal]
R-type (add, sub, ...)
° Instruction
inst    Logical Register Transfers
ADDU    R[rd] <– R[rs] + R[rt]; PC <– PC + 4
° Register Transfers
cycle   Register Transfers (ADDU)
1.      IR <– MEM[PC]
2.      A <– R[rs]; B <– R[rt]
3.      S <– A + B
4.      R[rd] <– S; PC <– PC + 4
[Figure: multicycle datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access, Data Mem, M, Reg File) with the ADDU steps laid out over time]
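The register transfers name fields of the instruction held in IR. A small sketch of how those fields could be wired out of IR, using the standard MIPS bit positions, together with a RegDst mux choosing rd (R-type) or rt (I-type, such as ORI and LW) as the write address. The module name ir_fields and the RegDst polarity are assumptions for illustration.

```verilog
// Sketch: MIPS field extraction from IR plus the RegDst destination mux.
module ir_fields (
    input  wire [31:0] IR,
    input  wire        RegDst,   // assumed: 1 = write rd (R-type), 0 = write rt
    output wire [5:0]  op,
    output wire [4:0]  rs, rt, rd,
    output wire [5:0]  funct,
    output wire [15:0] imm16,
    output wire [4:0]  RW        // register-file write address
);
    assign op    = IR[31:26];
    assign rs    = IR[25:21];
    assign rt    = IR[20:16];
    assign rd    = IR[15:11];
    assign funct = IR[5:0];
    assign imm16 = IR[15:0];
    assign RW    = RegDst ? rd : rt;
endmodule
```

For example, addu $3, $1, $2 encodes as 32'h00221821, giving rs = 1, rt = 2, rd = 3, and funct = 6'h21.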
Logical Immediate
° Instruction
inst    Logical Register Transfers
ORI     R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4
° Register Transfers
cycle   Register Transfers (ORI)
1.      IR <– MEM[PC]
2.      A <– R[rs]; B <– R[rt]
3.      S <– A or ZExt(Im16)
4.      R[rt] <– S; PC <– PC + 4
[Figure: the multicycle datapath with the ORI steps laid out over time]
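ORI zero-extends its immediate, while LW, SW, and BEQ sign-extend theirs; the ExtOp control signal selects between the two. A minimal sketch of that extender, assuming ExtOp = 1 means sign-extend (the polarity and the module name ext16 are assumptions, not from the slides):

```verilog
// Sketch: 16-to-32-bit immediate extender selected by ExtOp.
module ext16 (
    input  wire [15:0] Im16,
    input  wire        ExtOp,    // assumed: 1 = sign-extend, 0 = zero-extend
    output wire [31:0] Ext
);
    assign Ext = ExtOp ? {{16{Im16[15]}}, Im16}   // SExt(Im16): LW, SW, BEQ
                       : {16'b0, Im16};           // ZExt(Im16): ORI
endmodule
```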
Load
° Instruction
inst    Logical Register Transfers
LW      R[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4
° Register Transfers
cycle   Register Transfers (LW)
1.      IR <– MEM[PC]
2.      A <– R[rs]; B <– R[rt]
3.      S <– A + SExt(Im16)
4.      M <– MEM[S]
5.      R[rt] <– M; PC <– PC + 4
[Figure: the multicycle datapath with the LW steps laid out over time]
Store
° Instruction
inst    Logical Register Transfers
SW      MEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4
° Register Transfers
cycle   Register Transfers (SW)
1.      IR <– MEM[PC]
2.      A <– R[rs]; B <– R[rt]
3.      S <– A + SExt(Im16)
4.      MEM[S] <– B; PC <– PC + 4
[Figure: the multicycle datapath with the SW steps laid out over time]
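The slides give Verilog for IM, REGS, and ALU but not for the data memory. A sketch in the same spirit, under stated assumptions: the module name DM, the MemRd enable on the M register, and the 1024-word size are choices made here. A load captures MEM[S] in M (LW cycle 4); a store writes B into MEM[S] when MemWr is 1 (SW cycle 4).

```verilog
// Sketch (hypothetical module DM): data memory with output register M.
module DM (
    input  wire        clk,
    input  wire [31:0] S,        // address from the ALU output register
    input  wire [31:0] B,        // store data from register B
    input  wire        MemRd,    // assumed enable: M <= MEM[S]
    input  wire        MemWr,    // MEM[S] <= B
    output reg  [31:0] M
);
    reg [31:0] mem [0:1023];     // 1024 words, word-addressed by S[11:2]

    always @(posedge clk) begin
        if (MemRd) M <= mem[S[11:2]];
        if (MemWr) mem[S[11:2]] <= B;
    end
endmodule
```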
Branch
° Instruction
inst    Logical Register Transfers
BEQ     if R[rs] == R[rt] then PC <– PC + 4 + (SExt(Im16) || 00) else PC <– PC + 4
        (|| 00 appends two zero bits, turning the word offset into a byte offset)
° Register Transfers
cycle   Register Transfers (BEQ)
1.      IR <– MEM[PC]
2.      E <– (R[rs] == R[rt])
3.      if !E then PC <– PC + 4 else PC <– PC + 4 + (SExt(Im16) || 00)
[Figure: the multicycle datapath with the BEQ steps laid out over time]
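The cycle-3 transfer for BEQ can be sketched as a small next-PC block. The module name next_pc and its ports are hypothetical. Sign-extending Im16 to 30 bits and appending 2'b00 gives the 32-bit byte offset SExt(Im16) || 00.

```verilog
// Sketch: BEQ next-PC logic, target = PC + 4 + (SExt(Im16) || 00).
module next_pc (
    input  wire [31:0] PC,
    input  wire [15:0] Im16,
    input  wire        E,        // E = (R[rs] == R[rt]), latched in decode
    output wire [31:0] PC_next
);
    wire [31:0] pc4    = PC + 32'd4;
    wire [31:0] offset = {{14{Im16[15]}}, Im16, 2'b00};  // SExt(Im16) || 00
    assign PC_next = E ? pc4 + offset : pc4;
endmodule
```

For example, with PC = 32'h100 and E = 1, Im16 = 3 branches to 32'h110, and Im16 = 16'hFFFF (-1) branches back to 32'h100.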
Performance Evaluation
° What is the average CPI?
° The state diagram gives the CPI for each instruction type
° The workload gives the frequency of each type

Type          CPI_i for type   Frequency   CPI_i × freq_i
Arith/Logic   4                40%         1.6
Load          5                30%         1.5
Store         4                10%         0.4
Branch        3                20%         0.6
Average CPI: 4.1
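The average in the table is the frequency-weighted sum of the per-type CPIs. Written out with the table's numbers:

```latex
\text{CPI}_{\text{avg}} = \sum_i \text{CPI}_i \times \text{freq}_i
  = 4(0.40) + 5(0.30) + 4(0.10) + 3(0.20)
  = 1.6 + 1.5 + 0.4 + 0.6 = 4.1
```

Because time per instruction is CPI × cycle time, this multicycle design beats the CPI = 1 single-cycle design only if its cycle time is less than 1/4.1 of the single-cycle cycle time.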
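Verilog Implementation (IM)
module IM(IR, PC, clk, IRen);
  output [31:0] IR;
  input  [31:0] PC;
  input  clk, IRen;
  reg  [31:0] IR;
  reg  [31:0] mem[0:1023];      // 1024 words: word index is PC[11:2]
  wire [31:0] IR_next;

  // OK, but slow
  // always @(posedge clk)
  //   IR <= mem[PC[11:2]];

  assign IR_next = mem[PC[11:2]];
  always @(posedge clk)
    if (IRen) IR <= IR_next;    // IR loads only when IRen = 1 (fetch)
endmodule
[Figure: Inst. Mem addressed by PC, with its output latched in IR]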
Verilog Implementation (REGS)
module REGS(A, B, E, RA, RB, RW, W, RegWr, clk, REGSen);
  output [31:0] A, B;
  output E;                     // A == B
  input  [4:0]  RA, RB, RW;
  input  [31:0] W;
  input  RegWr, clk, REGSen;
  reg  [31:0] A, B;
  reg  E;
  reg  [31:0] regs[0:31];

  // R0 always reads as 0
  wire [31:0] A_next = (RA == 5'd0) ? 32'd0 : regs[RA];
  wire [31:0] B_next = (RB == 5'd0) ? 32'd0 : regs[RB];
  wire        E_next = (A_next == B_next);

  always @(posedge clk) begin
    if (REGSen == 1) begin      // decode: latch operands
      A <= A_next;
      B <= B_next;
      E <= E_next;
    end
    if (RegWr == 1'b1)          // write-back: independent of REGSen
      regs[RW] <= W;
  end
endmodule
[Figure: Reg File feeding registers A, B, and E]
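Verilog Implementation (ALU)
module ALU(S, A, B, ALUCtr, clk, ALUen);
  output [31:0] S;
  input  [31:0] A, B;
  input  [2:0]  ALUCtr;
  input  clk, ALUen;
  reg    [31:0] S, S_next;

  always @(A or B or ALUCtr) begin   // combinational: next result
    S_next = 32'bx;                  // default, so no latch is inferred
    if (ALUCtr == 3'h0) S_next = A + B;
    // ...
  end

  always @(posedge clk) begin        // S loads only when ALUen = 1
    if (ALUen == 1) S <= S_next;
  end
endmodule
[Figure: Exec unit taking A and B, with its output latched in S]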
Control
° State specifies control points for Register Transfer
° Transfer occurs upon entering state (rising edge)
[Figure: controller built from a Current State register, Next State Logic driven by the inputs, and Output Logic producing the output control signals]
State Machine for Multicycle MIPS
"start / instruction fetch":  IR <= MEM[PC]
"decode / operand fetch":     A <= R[rs]; B <= R[rt]; E <= R[rt] == R[rs]
Then one path per instruction through the Execute, Memory, and Write-back states:
R-type:  S <= A fun B;  then R[rd] <= S; PC <= PC + 4
ORi:     S <= A or ZX;  then R[rt] <= S; PC <= PC + 4
LW:      S <= A + SX;   then M <= MEM[S];  then R[rt] <= M; PC <= PC + 4
SW:      S <= A + SX;   then MEM[S] <= B; PC <= PC + 4
BEQ:     PC <= Next(PC, Equal)
State Machine that Generates Control Signals
"start, instruction fetch":  IR <= MEM[PC]; asserts IRen
"decode":                    A <= R[rs]; B <= R[rt]; asserts REGSen
Execute:     R-type S <= A fun B; ORi S <= A or ZX; LW, SW S <= A + SX; BEQ PC <= Next(PC, Equal); asserts ALUCtr, ALUen
Memory:      LW M <= MEM[S]; SW MEM[S] <= B; PC <= PC + 4
Write-back:  R-type R[rd] <= S; ORi R[rt] <= S; LW R[rt] <= M; PC <= PC + 4; asserts RegDst, RegWr, PCen
State Machine Implementation in Verilog 1
module CTRL(clk, rst, opcode, IRen, REGSen, ALUen, ALUCtr, REGDst, REGWr, PCen);
  input  clk, rst;
  input  [5:0] opcode;
  output IRen, REGSen, ALUen, REGDst, REGWr, PCen;
  output [2:0] ALUCtr;          // 3 bits, matching the ALU's ALUCtr input
  // outputs are driven by assign statements (part 3), so they are wires, not regs
  reg [3:0] state, next_state;
  parameter [3:0] START = 0, DECODE = 1, RTYPE_1 = 2, RTYPE_2 = 3;
  // other states omitted
State Machine in Verilog 2
always @(posedge clk or negedge rst) begin   // asynchronous reset
  if (!rst) state <= START;
  else      state <= next_state;
end

always @(opcode or state) begin
  case (state)
    START:   next_state = DECODE;
    DECODE:  if (opcode == 6'h00)      next_state = RTYPE_1;  // R-type
             else if (opcode == 6'h0d) next_state = ORI;      // ori
             else if (opcode == 6'h23) next_state = LW;       // lw
             else                      next_state = START;
    // other states omitted
    RTYPE_1: next_state = RTYPE_2;
    RTYPE_2: next_state = START;
    default: next_state = START;
  endcase
end
State Machine in Verilog 3
assign IRen   = (state == START)  ? 1 : 0;
assign REGSen = (state == DECODE) ? 1 : 0;
assign ALUen  = (state == RTYPE_1 || state == ORI ||
                 state == LW      || state == SW) ? 1 : 0;
Assigning States
[Figure: the multicycle MIPS state diagram ("start, instruction fetch"; "decode"; Execute, Memory, and Write-back states for R-type, ORi, LW, SW, and BEQ, with BEQ computing PC <= Next(PC)), with a state code assigned to each state]
(Mostly) Detailed Control Specification (missing 0)
[Table: per-state control specification. Columns: State, Op field, Eq, Next state, IR en, PC en/sel, Ops (A, B, E), Exec (Ex, Sr, ALU, S), Mem (R, W, M), Write-Back (M-R, Wr, Dst). Rows for states such as 0000, 0011, and 0100; the op field selects the next state for BEQ, R-type, ORI, LW, and SW. The ALU operation is fun for R-type, or for ORi, and add for LW and SW. Outputs for R, ORi, LW, SW, and BEQ are all the same in a Moore machine.]
Controller Design Alternative: Microprogramming
° The state machines defining the controller for an instruction set processor are highly structured
° Use this structure to construct a simple "microsequencer"
° Control reduces to programming this very simple device (microprogramming)
[Figure: a micro-PC and sequencer fetch a microinstruction whose fields provide sequencer control and datapath control]
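A minimal microsequencer sketch, assuming a hypothetical microinstruction format that is not from the slides: a 2-bit sequencing field (next, dispatch on opcode, return to fetch) plus seven datapath control bits {IRen, REGSen, ALUen, MemRd, MemWr, RegWr, PCen}. Only the fetch, decode, R-type, and LW microinstructions are filled in. Other instructions would add ROM entries and dispatch cases rather than new next-state logic, which is the point of the approach.

```verilog
// Sketch: microsequencer with a microcode ROM and a dispatch ROM.
module useq (
    input  wire       clk,
    input  wire       rst,          // active-low asynchronous reset, as in CTRL
    input  wire [5:0] opcode,
    output wire [6:0] ctrl          // {IRen, REGSen, ALUen, MemRd, MemWr, RegWr, PCen}
);
    // Sequencing field encodings (hypothetical)
    localparam [1:0] SEQ_NEXT = 2'd0, SEQ_DISPATCH = 2'd1, SEQ_FETCH = 2'd2;

    reg [3:0] upc;                  // micro-PC
    reg [8:0] uinst;                // current microinstruction {seq, ctrl}

    // Microcode ROM: one microinstruction per FSM state
    always @(*) begin
        case (upc)
            4'd0:    uinst = {SEQ_NEXT,     7'b1000000}; // IR <= MEM[PC]
            4'd1:    uinst = {SEQ_DISPATCH, 7'b0100000}; // A <= R[rs]; B <= R[rt]
            4'd2:    uinst = {SEQ_NEXT,     7'b0010000}; // R-type: S <= A fun B
            4'd3:    uinst = {SEQ_FETCH,    7'b0000011}; // R[rd] <= S; PC <= PC + 4
            4'd4:    uinst = {SEQ_NEXT,     7'b0010000}; // LW: S <= A + SX
            4'd5:    uinst = {SEQ_NEXT,     7'b0001000}; // M <= MEM[S]
            4'd6:    uinst = {SEQ_FETCH,    7'b0000011}; // R[rt] <= M; PC <= PC + 4
            default: uinst = {SEQ_FETCH,    7'b0000000}; // unused: return to fetch
        endcase
    end
    assign ctrl = uinst[6:0];

    // Dispatch ROM: opcode -> micro-address of the first execute step
    function [3:0] dispatch;
        input [5:0] op;
        case (op)
            6'h00:   dispatch = 4'd2;   // R-type
            6'h23:   dispatch = 4'd4;   // lw
            default: dispatch = 4'd0;   // not implemented here: refetch
        endcase
    endfunction

    always @(posedge clk or negedge rst) begin
        if (!rst)
            upc <= 4'd0;
        else case (uinst[8:7])
            SEQ_NEXT:     upc <= upc + 4'd1;
            SEQ_DISPATCH: upc <= dispatch(opcode);
            default:      upc <= 4'd0;   // SEQ_FETCH
        endcase
    end
endmodule
```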