1
ELEN 350 Multi-Cycle Datapath
Adapted from the lecture notes of John Kubiatowicz (UCB) and Hank Walker (TAMU)
2
Abstract View of Our Single-Cycle Processor
°Looks like an FSM with the PC as state
[Figure: single-cycle datapath. Blocks: Next PC, PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt. A Control Unit takes op and fun; signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, Equal.]
3
What's Wrong with Our CPI=1 Processor?
°All instructions take as much time as the slowest one
°Long cycle time
°Real memory is not as nice as our idealized memory
   - it cannot always get the job done in one (short) cycle
[Figure: timing bars comparing the paths of Arithmetic & Logical, Load, Store, and Branch instructions through PC, Inst Memory, Reg File, mux, ALU, Data Mem, cmp, and setup; the critical path is marked.]
4
Reducing Cycle Time
°Cut the combinational dependency graph and insert a register / latch
°Do the same work in two fast cycles, rather than one slow one
°May be able to short-circuit the path and remove some components for some instructions!
[Figure: one block of Combinational Logic between storage elements, split into Combinational Logic (A) and Combinational Logic (B) with a storage element between them.]
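The cut described above can be sketched in Verilog. This is a minimal illustration, not taken from the lecture: the module names, the 8-bit operands, and the choice of an add followed by a multiply are all assumptions made for the example. The first module does all the work between two storage elements; the second inserts a register at the cut so each clock period only has to cover one of the two pieces of logic.

```verilog
// Hypothetical example: names, widths, and the arithmetic are illustrative only.

// One slow cycle: the adder and the multiplier both sit between two registers.
module one_slow_cycle(input clk, input [7:0] a, b, c, output reg [16:0] y);
  always @(posedge clk)
    y <= (a + b) * c;          // Combinational Logic (A) and (B) back to back
endmodule

// Two fast cycles: a storage element is inserted at the cut.
module two_fast_cycles(input clk, input [7:0] a, b, c, output reg [16:0] y);
  reg [8:0] sum_q;             // register inserted between (A) and (B)
  reg [7:0] c_q;               // c must be carried across the cut as well
  always @(posedge clk) begin
    sum_q <= a + b;            // Combinational Logic (A)
    c_q   <= c;
    y     <= sum_q * c_q;      // Combinational Logic (B)
  end
endmodule
```

Both modules compute the same value; the second delivers it one clock edge later, in exchange for a shorter critical path per cycle.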
5
Partitioning the Single-Cycle Datapath
°Add registers between smallest steps
°Place enables on all registers
[Figure: partitioned datapath. Blocks: Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access (Data Mem), Result Store (Reg. File). Signals shown: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr, Equal.]
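A register with an enable, as called for above, might look like the following sketch. The module name `reg_en` and the `WIDTH` parameter are hypothetical and do not appear in the lecture; the point is only that the register holds its value on every clock edge where the enable is low, which lets the controller decide in which cycle each intermediate value is captured.

```verilog
// Hypothetical building block: name and parameter are assumptions.
module reg_en #(parameter WIDTH = 32) (
  input                  clk,
  input                  en,   // load enable, driven by the controller
  input      [WIDTH-1:0] d,
  output reg [WIDTH-1:0] q
);
  always @(posedge clk)
    if (en) q <= d;            // otherwise q keeps its previous value
endmodule
```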
6
Example Multicycle Datapath
°Critical path?
[Figure: multicycle datapath. Blocks: Next PC, PC, Instruction Fetch, IR, Operand Fetch (Reg File), Ext, ALU, Mem Access (Data Mem), Result Store (Reg. File). Added registers: IR, A, B, E, S, M. Signals shown: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr, Equal.]
7
R-type (add, sub, ...)
°Instruction: logical register transfers
   ADDU   R[rd] <– R[rs] + R[rt]; PC <– PC + 4
°Register transfers by cycle
   1. IR <– MEM[pc]
   2. A <– R[rs]; B <– R[rt]
   3. S <– A + B
   4. R[rd] <– S; PC <– PC + 4
[Figure: datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis marking the cycles used.]
8
Logical immed
°Instruction: logical register transfers
   ORI   R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4
°Register transfers by cycle
   1. IR <– MEM[pc]
   2. A <– R[rs]; B <– R[rt]
   3. S <– A or ZExt(Im16)
   4. R[rt] <– S; PC <– PC + 4
[Figure: datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis marking the cycles used.]
9
Load
°Instruction: logical register transfers
   LW   R[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4
°Register transfers by cycle
   1. IR <– MEM[pc]
   2. A <– R[rs]; B <– R[rt]
   3. S <– A + SExt(Im16)
   4. M <– MEM[S]
   5. R[rt] <– M; PC <– PC + 4
[Figure: datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis marking the cycles used.]
10
Store
°Instruction: logical register transfers
   SW   MEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4
°Register transfers by cycle
   1. IR <– MEM[pc]
   2. A <– R[rs]; B <– R[rt]
   3. S <– A + SExt(Im16)
   4. MEM[S] <– B; PC <– PC + 4
[Figure: datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis marking the cycles used.]
11
Branch
°Instruction: logical register transfers
   BEQ   if R[rs] == R[rt] then PC <– PC + 4 + (SExt(Im16) || 00) else PC <– PC + 4
°Register transfers by cycle
   1. IR <– MEM[pc]
   2. E <– (R[rs] == R[rt])
   3. if !E then PC <– PC + 4 else PC <– PC + 4 + (SExt(Im16) || 00)
[Figure: datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis marking the cycles used.]
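The PC update in the last step can be sketched as a small Verilog module. The lecture does not show a PC module, so everything here is an assumption: the module name `NEXT_PC`, the `PCsel` input (assumed to be 1 only when the controller is in the BEQ state and E is set), the reset value of 0, and the active-low asynchronous reset. `SExt(Im16) || 00` is read as the sign-extended immediate shifted left by two bits.

```verilog
// Hypothetical sketch of the PC register and next-PC logic.
module NEXT_PC(PC, imm16, PCsel, PCen, clk, rst);
  output [31:0] PC;
  input  [15:0] imm16;
  input         PCsel, PCen, clk, rst;
  reg    [31:0] PC;

  wire [31:0] pc_plus4 = PC + 32'd4;
  // SExt(Im16) || 00: sign-extend, then append two zero bits
  wire [31:0] offset   = {{14{imm16[15]}}, imm16, 2'b00};
  wire [31:0] target   = pc_plus4 + offset;

  always @(posedge clk or negedge rst)
    if (!rst)      PC <= 32'd0;                      // assumed reset address
    else if (PCen) PC <= PCsel ? target : pc_plus4;  // branch taken or fall through
endmodule
```

With `PCen` low the PC holds its value, which is what lets the fetch, decode, and execute cycles all see the same instruction address.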
12
Performance Evaluation
°What is the average CPI?
   - the state diagram gives the CPI for each instruction type
   - the workload gives the frequency of each type

Type          CPI_i for type   Frequency   CPI_i x freq_i
Arith/Logic   4                40%         1.6
Load          5                30%         1.5
Store         4                10%         0.4
Branch        3                20%         0.6
Average CPI:                               4.1
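As a worked check of the table above, the average CPI is the frequency-weighted sum of the per-type CPIs. The symbol f_i for the frequency of type i is notation chosen here, not taken from the slide.

```latex
\[
\mathrm{CPI}_{\text{avg}} = \sum_i \mathrm{CPI}_i \, f_i
  = 4(0.40) + 5(0.30) + 4(0.10) + 3(0.20)
  = 1.6 + 1.5 + 0.4 + 0.6
  = 4.1
\]
```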
13
Verilog Implementation (IM)
module IM(IR, PC, clk, IRen);
  output [31:0] IR;
  input  [31:0] PC;
  input         clk, IRen;
  reg    [31:0] IR;
  reg    [31:0] mem[0:1023];     // 1024 words of instruction memory
  wire   [31:0] IR_next;

  // OK, but slow
  // always @(posedge clk)
  //   IR = mem[PC[11:2]];

  // Word-addressed read: PC[11:2] is the 10-bit index into the 1024 words
  assign IR_next = mem[PC[11:2]];

  always @(posedge clk)
    if (IRen) IR <= IR_next;     // load IR only when enabled
endmodule
[Figure: PC -> Inst. Mem -> IR]
14
Verilog Implementation (REGS)
module REGS(A, B, E, RA, RB, RW, W, RegWr, clk, REGSen);
  output [31:0] A, B;
  output        E;               // A == B
  input  [4:0]  RA, RB, RW;
  input  [31:0] W;
  input         RegWr, clk, REGSen;
  reg    [31:0] A, B;
  reg           E;
  reg    [31:0] regs[0:31];
  wire   [31:0] A_next, B_next;
  wire          E_next;

  // Values read from the register file this cycle
  assign A_next = regs[RA];
  assign B_next = regs[RB];
  assign E_next = (A_next == B_next) ? 1 : 0;

  always @(posedge clk) begin
    if (REGSen == 1) begin
      A <= A_next;
      B <= B_next;
      E <= E_next;
      if (RegWr == 1'b1) regs[RW] <= W;
      regs[0] <= 0;              // register 0 always holds zero
    end
  end
endmodule
[Figure: Reg File with output registers A, B, E]
15
Verilog Implementation (ALU)
module ALU(S, A, B, ALUCtr, clk, ALUen);
  output [31:0] S;
  input  [31:0] A, B;
  input  [2:0]  ALUCtr;
  input         clk, ALUen;
  reg    [31:0] S, S_next;

  // Combinational ALU function
  always @(A or B or ALUCtr) begin
    if (ALUCtr == 3'h0) S_next = A + B;
    ...
  end

  // Result register with enable
  always @(posedge clk) begin
    if (ALUen == 1) S <= S_next;
  end
endmodule
[Figure: A, B -> Exec -> S]
16
Control
°State specifies the control points for a register transfer
°The transfer occurs upon entering the state (rising edge)
[Figure: Moore machine. Inputs and Current State feed Next State Logic; Current State feeds Output Logic, which produces the output control signals.]
17
State Machine for Multicycle MIPS
“start / instruction fetch”:  IR <= MEM[PC]
“decode / operand fetch”:     A <= R[rs]; B <= R[rt]; E <= R[rt]==R[rs]

          Execute               Memory                        Write-back
R-type:   S <= A fun B                                        R[rd] <= S; PC <= PC + 4
ORi:      S <= A or ZX                                        R[rt] <= S; PC <= PC + 4
LW:       S <= A + SX           M <= MEM[S]                   R[rt] <= M; PC <= PC + 4
SW:       S <= A + SX           MEM[S] <= B; PC <= PC + 4
BEQ:      PC <= Next(PC,Equal)
18
State Machine that Generates Control Signals
“start, instruction fetch” (IRen):  IR <= MEM[PC]
“decode” (REGSen):                  A <= R[rs]; B <= R[rt]

          Execute               Memory                        Write-back
          (ALUCtr, ALUen)                                     (RegDst, RegWr, PCen)
R-type:   S <= A fun B                                        R[rd] <= S; PC <= PC + 4
ORi:      S <= A or ZX                                        R[rt] <= S; PC <= PC + 4
LW:       S <= A + SX           M <= MEM[S]                   R[rt] <= M; PC <= PC + 4
SW:       S <= A + SX           MEM[S] <= B; PC <= PC + 4
BEQ:      PC <= Next(PC,Equal)
19
State Machine Implementation in Verilog 1
module CTRL(clk, rst, opcode, IRen, REGSen, ALUen, ALUCtr, REGDst, REGWr, PCen);
  input         clk, rst;
  input  [5:0]  opcode;
  output        IRen, REGSen, ALUen, REGDst, REGWr, PCen;
  output [2:0]  ALUCtr;          // 3 bits, matching the ALUCtr port of the ALU module
  reg    [3:0]  state, next_state;
  // The outputs stay wires: they are driven from state by continuous assignments
  parameter [3:0] START = 0, DECODE = 1, RTYPE_1 = 2, RTYPE_2 = 3; // other states omitted
20
State Machine in Verilog 2
  // State register with asynchronous, active-low reset
  always @ (posedge clk or negedge rst) begin
    if (!rst) state <= START;
    else      state <= next_state;
  end

  // Next-state logic
  always @ (opcode or state) begin
    next_state = START;            // default, so no latch is inferred
    case (state)
      START:   next_state = DECODE;
      DECODE:  if (opcode == 6'h00)      next_state = RTYPE_1;
               else if (opcode == 6'h0D) next_state = ORI;   // MIPS ori opcode
               else if (opcode == 6'h23) next_state = LW;    // MIPS lw opcode
               // other states omitted
      RTYPE_1: next_state = RTYPE_2;
      RTYPE_2: next_state = START;
    endcase
  end
21
State Machine in Verilog 3
  assign IRen   = (state == START) ? 1 : 0;
  assign REGSen = (state == DECODE) ? 1 : 0;
  assign ALUen  = (state == RTYPE_1 || state == ORI || state == LW || state == SW) ? 1 : 0;
22
Assigning States
0000  “start, instruction fetch”:  IR <= MEM[PC]
0001  “decode”:                    A <= R[rs]; B <= R[rt]

          Execute                 Memory                              Write-back
R-type:   0100  S <= A fun B                                          0101  R[rd] <= S; PC <= PC + 4
ORi:      0110  S <= A or ZX                                          0111  R[rt] <= S; PC <= PC + 4
LW:       1000  S <= A + SX       1001  M <= MEM[S]                   1010  R[rt] <= M; PC <= PC + 4
SW:       1011  S <= A + SX       1100  MEM[S] <= B; PC <= PC + 4
BEQ:      0011  PC <= Next(PC)
23
(Mostly) Detailed Control Specification (missing entries are 0)

                              IR  PC       Ops     Exec            Mem     Write-Back
State  Op field  Eq  Next     en  en  sel  A B E   Ex Sr ALU  S    R W M   M-R Wr Dst
0000   ??????    ?   0001     1
0001   BEQ       x   0011                  1 1 1
0001   R-type    x   0100                  1 1 1
0001   ORI       x   0110                  1 1 1
0001   LW        x   1000                  1 1 1
0001   SW        x   1011                  1 1 1
0011   xxxxxx    0   0000         1   0                                    x   0  x
0011   xxxxxx    1   0000         1   1                                    x   0  x
0100   xxxxxx    x   0101                          0  1  fun  1
0101   xxxxxx    x   0000         1   0                                    0   1  1
0110   xxxxxx    x   0111                          0  0  or   1
0111   xxxxxx    x   0000         1   0                                    0   1  0
1000   xxxxxx    x   1001                          1  0  add  1
1001   xxxxxx    x   1010                                          1 0 1
1010   xxxxxx    x   0000         1   0                                    1   1  0
1011   xxxxxx    x   1100                          1  0  add  1
1100   xxxxxx    x   0000         1   0                            0 1 0

Row groups: BEQ: 0011; R: 0100-0101; ORi: 0110-0111; LW: 1000-1010; SW: 1011-1100.
The decode rows (state 0001) are all the same in a Moore machine.
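The State, Op field, and Next columns of the specification above translate directly into next-state logic. The following is a sketch under stated assumptions: the module name `NEXT_STATE` and the state names are hypothetical labels for the 4-bit codes in the table, and the opcode constants use the standard MIPS32 encodings, which the table itself does not list. Unrecognized opcodes fall back to the fetch state, which is also an assumption.

```verilog
// Hypothetical next-state logic built from the State / Op field / Next columns.
module NEXT_STATE(state, opcode, next_state);
  input  [3:0] state;
  input  [5:0] opcode;
  output [3:0] next_state;
  reg    [3:0] next_state;

  // State codes from the table; the names are assumptions
  localparam [3:0] FETCH  = 4'b0000, DECODE = 4'b0001, BEQ_EX = 4'b0011,
                   R_EX   = 4'b0100, R_WB   = 4'b0101,
                   ORI_EX = 4'b0110, ORI_WB = 4'b0111,
                   LW_EX  = 4'b1000, LW_MEM = 4'b1001, LW_WB  = 4'b1010,
                   SW_EX  = 4'b1011, SW_MEM = 4'b1100;

  // Standard MIPS32 opcode field values (assumed target encoding)
  localparam [5:0] OP_RTYPE = 6'h00, OP_BEQ = 6'h04, OP_ORI = 6'h0D,
                   OP_LW    = 6'h23, OP_SW  = 6'h2B;

  always @(*) begin
    next_state = FETCH;                  // write-back states and BEQ return to fetch
    case (state)
      FETCH:  next_state = DECODE;
      DECODE: case (opcode)              // only the decode state looks at the opcode
                OP_BEQ:   next_state = BEQ_EX;
                OP_RTYPE: next_state = R_EX;
                OP_ORI:   next_state = ORI_EX;
                OP_LW:    next_state = LW_EX;
                OP_SW:    next_state = SW_EX;
                default:  next_state = FETCH;   // assumption: ignore unknown opcodes
              endcase
      R_EX:   next_state = R_WB;
      ORI_EX: next_state = ORI_WB;
      LW_EX:  next_state = LW_MEM;
      LW_MEM: next_state = LW_WB;
      SW_EX:  next_state = SW_MEM;
      default: next_state = FETCH;
    endcase
  end
endmodule
```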
24
Controller Design Alternative: Microprogramming
°The state machines defining the controller for an instruction set processor are highly structured
°Use this structure to construct a simple “microsequencer”
°Control reduces to programming this very simple device: microprogramming
[Figure: sequencer control and datapath control; a micro-PC sequencer selects the current microinstruction.]
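One way to picture such a microsequencer is the sketch below. It is not the lecture's design: the module name `USEQ`, the 4-bit micro-PC, the microinstruction layout (a 2-bit sequencing field followed by `CW` datapath control bits), the three sequencing codes, and the `dispatch_addr` input are all assumptions, and the ROM contents are left to be filled in (for example from an initial block or a testbench).

```verilog
// Hypothetical microsequencer: field layout and encodings are assumptions.
module USEQ(clk, rst, dispatch_addr, ctrl);
  parameter CW = 8;                      // number of datapath control bits (assumed)
  input           clk, rst;
  input  [3:0]    dispatch_addr;         // microcode entry point for the current opcode
  output [CW-1:0] ctrl;                  // datapath control signals

  localparam [1:0] SEQ_NEXT = 2'd0, SEQ_DISPATCH = 2'd1, SEQ_FETCH = 2'd2;

  reg  [3:0]    upc;                     // micro-PC
  reg  [CW+1:0] rom [0:15];              // microinstruction = {seq[1:0], ctrl[CW-1:0]}
  wire [CW+1:0] uinst = rom[upc];
  wire [1:0]    seq   = uinst[CW+1:CW];
  assign ctrl = uinst[CW-1:0];

  always @(posedge clk or negedge rst)
    if (!rst) upc <= 4'd0;               // microcode for instruction fetch at address 0
    else case (seq)
      SEQ_NEXT:     upc <= upc + 4'd1;   // fall through to the next microinstruction
      SEQ_DISPATCH: upc <= dispatch_addr;// jump to the routine for this opcode
      default:      upc <= 4'd0;         // SEQ_FETCH: start the next instruction
    endcase
endmodule
```

In this arrangement each row of the control specification becomes one ROM word, and the next-state logic shrinks to the three sequencing choices.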