1
COMP541 Datapaths II & Single-Cycle MIPS
Montek Singh Apr 2, 2012
2
Topics
- Complete the datapath
- Add control to it
- Create a full single-cycle MIPS!
Reading
- Chapter 7
- Review MIPS assembly language: Chapter 6 of course textbook, or Patterson & Hennessy (inside flap)
3
Top-Level CPU (MIPS)
[Figure: the MIPS processor connected to an instruction memory and a data memory. Signals: clk, reset, pc[31:2], instr, memwrite, dataadr, writedata, readdata]
4
Top-Level CPU: Verilog
module top(input clk, reset,
           output … );   // add signals here for debugging

  wire [31:0] pc, instr, readdata, writedata, dataadr;
  wire        memwrite;

  mips mips(clk, reset, pc, instr, memwrite, dataadr, writedata, readdata);  // processor
  imem imem(pc[31:2], instr);                                                // instr memory
  dmem dmem(clk, memwrite, dataadr, writedata, readdata);                    // data memory
endmodule
5
Top Level Schematic (ISE)
[Figure: ISE schematic showing the imem, MIPS, and dmem blocks]
6
One level down: Inside MIPS
module mips(input         clk, reset,
            output [31:0] pc,
            input  [31:0] instr,
            output        memwrite,
            output [31:0] aluout, writedata,
            input  [31:0] readdata);

  wire       memtoreg, branch, pcsrc, alusrc, regdst, regwrite, jump;
  wire [4:0] alucontrol;   // depends on your ALU
  wire [3:0] flags;        // flags = {Z, V, C, N}

  controller c(instr[31:26], instr[5:0], flags,
               memtoreg, memwrite, pcsrc,
               alusrc, regdst, regwrite, jump,
               alucontrol);
  // the datapath also needs alusrc, regdst, regwrite and jump from the controller
  datapath dp(clk, reset, memtoreg, pcsrc,
              alusrc, regdst, regwrite, jump,
              alucontrol, flags, pc, instr,
              aluout, writedata, readdata);
endmodule
7
A Note on Flags
- Book’s design only uses Z (zero)
  - simple version of MIPS
  - allows beq, bne, slt type of tests
- Our design uses { Z, V, C, N } flags
  - Z = zero, V = overflow, C = carry out, N = negative
  - allows richer variety of instructions (see next slide)
- Wherever you see “zero” in these slides, it should probably read “flags”
8
A Note on Flags
4 flags produced by ALU:
- Z (zero): result is = 0 (big NOR gate)
- N (negative): result is < 0 (S[N-1])
- C (carry): indicates that the most significant position produced a carry, e.g., “1 + (-1)” (carry from last FA)
- V (overflow): indicates the answer doesn’t fit
To compare A and B, perform A-B and use condition codes:
Signed comparison:
  LT   N⊕V
  LE   Z+(N⊕V)
  EQ   Z
  NE   ~Z
  GE   ~(N⊕V)
  GT   ~(Z+(N⊕V))
Unsigned comparison:
  LTU  C
  LEU  C+Z
  GEU  ~C
  GTU  ~(C+Z)
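The signed comparisons above can be sketched in Verilog. This is only an illustration under stated assumptions, not the course ALU: the module name cmp_flags, the WIDTH parameter, and all port names are hypothetical. The subtraction is done as a + ~b + 1, and C is taken as the raw carry out of the adder. Because the meaning of C after a subtraction (carry versus borrow) depends on the ALU's convention, the unsigned comparisons are left out.

```verilog
// Hypothetical helper (not from the slides): computes a - b, the four flags,
// and the signed comparison results derived from them.
module cmp_flags #(parameter WIDTH = 32)
                 (input  [WIDTH-1:0] a, b,
                  output [WIDTH-1:0] diff,
                  output [3:0]       flags,               // {Z, V, C, N}, same order as the slides
                  output             lt, le, eq, ne, ge, gt);

  wire cout;                                              // carry out of the last full adder
  assign {cout, diff} = {1'b0, a} + {1'b0, ~b} + 1'b1;    // a - b computed as a + ~b + 1

  wire z = (diff == {WIDTH{1'b0}});                       // zero: all result bits are 0
  wire n = diff[WIDTH-1];                                 // negative: sign bit of the result
  // Overflow of a - b: the operands have different signs and
  // the sign of the result differs from the sign of a.
  wire v = (a[WIDTH-1] ^ b[WIDTH-1]) & (a[WIDTH-1] ^ diff[WIDTH-1]);

  assign flags = {z, v, cout, n};

  assign lt = n ^ v;          // LT  N xor V
  assign le = z | (n ^ v);    // LE  Z + (N xor V)
  assign eq = z;              // EQ  Z
  assign ne = ~z;             // NE  ~Z
  assign ge = ~(n ^ v);       // GE  ~(N xor V)
  assign gt = ~(z | (n ^ v)); // GT  ~(Z + (N xor V))
endmodule
```

For example, with a = 32'h7FFFFFFF and b = 32'hFFFFFFFF (that is, -1), the difference wraps to 32'h80000000, so N = 1 and V = 1. N xor V is 0, which correctly reports that a is not less than b even though the raw result looks negative.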
9
Datapath
[Figure: datapath, showing the flags(3:0) signal]
10
MIPS State Elements
We’ll fill out the datapath and control logic for basic single-cycle MIPS:
- first the datapath
- then the control logic
11
Single-Cycle Datapath: lw
Let’s start by implementing lw instruction
12
Single-Cycle Datapath: lw
First consider executing lw. How does lw work?
STEP 1: Fetch instruction
13
Single-Cycle Datapath: lw
STEP 2: Read source operands from register file
14
Single-Cycle Datapath: lw
STEP 3: Sign-extend the immediate
15
Single-Cycle Datapath: lw
STEP 4: Compute the memory address (note the control signals)
16
Single-Cycle Datapath: lw
STEP 5: Read data from memory and write it back to register file
17
Single-Cycle Datapath: lw
STEP 6: Determine the address of the next instruction
18
Let’s be Clear: CPU is Single-Cycle!
Although the slides said “STEP” …
… all that stuff is executed in one cycle!!!
Let’s look at sw next …
… and then R-type instructions
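To make the single-cycle point concrete, the lw steps can be written as purely combinational logic that settles within one clock period. This is a rough sketch, not the course datapath: the module name lw_path and its ports are hypothetical, rd1 stands for the value the register file returns for rs, and the instruction memory, register file, and data memory are assumed to live outside the module.

```verilog
// Illustrative sketch of the lw path only; all names are hypothetical.
module lw_path(input  [31:0] pc,        // current program counter
               input  [31:0] instr,     // STEP 1: word fetched from instruction memory
               input  [31:0] rd1,       // STEP 2: register file value read for rs
               input  [31:0] readdata,  // word returned by data memory
               output [31:0] dataadr,   // address sent to data memory
               output [4:0]  writereg,  // register that receives the loaded word
               output [31:0] result,    // value written back to the register file
               output [31:0] pcnext);   // address of the next instruction

  // STEP 3: sign-extend the 16-bit immediate to 32 bits
  wire [31:0] signimm = {{16{instr[15]}}, instr[15:0]};

  assign dataadr  = rd1 + signimm;   // STEP 4: base register + offset
  assign writereg = instr[20:16];    // STEP 5: lw writes to rt
  assign result   = readdata;        // STEP 5: loaded word goes to the register file
  assign pcnext   = pc + 32'd4;      // STEP 6: next sequential instruction
endmodule
```

Nothing here is clocked: the only state changes (the PC update and the register file write) happen at the next clock edge, which is why all the "steps" fit in a single cycle.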
19
Single-Cycle Datapath: sw
- Write data in rt to memory
- Nothing is written back into the register file
20
Single-Cycle Datapath: R-type instr
R-Type instructions:
- Read from rs and rt
- Write ALUResult to register file
- Write to rd (instead of rt)
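Supporting both lw and R-type instructions comes down to two selections: which register is written, and which value is written. The sketch below shows one way to express them. The module name writeback_select is hypothetical; the signal names regdst, memtoreg, aluout, and readdata are borrowed from the Verilog elsewhere in these slides, but the polarity of the selects is an assumption.

```verilog
// Hypothetical sketch of the two write-back selections.
module writeback_select(input  [31:0] instr,
                        input         regdst,    // assumed: 1 selects rd (R-type), 0 selects rt (lw)
                        input         memtoreg,  // assumed: 1 selects memory data, 0 selects ALU result
                        input  [31:0] aluout, readdata,
                        output [4:0]  writereg,  // destination register number
                        output [31:0] result);   // value written to the register file

  assign writereg = regdst   ? instr[15:11] : instr[20:16];  // rd or rt
  assign result   = memtoreg ? readdata     : aluout;        // loaded word or ALUResult
endmodule
```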
21
Single-Cycle Datapath: beq
- Determine whether values in rs and rt are equal
- Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4)
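The branch target calculation can be sketched directly from the BTA formula. The module and port names below are hypothetical; pcsrc is assumed to be 1 when the branch is taken.

```verilog
// Hypothetical sketch: BTA = (sign-extended immediate << 2) + (PC + 4)
module branch_target(input  [31:0] pc,
                     input  [15:0] imm,       // instr[15:0]
                     input         pcsrc,     // assumed: 1 = take the branch
                     output [31:0] pcplus4,
                     output [31:0] pcbranch,  // the branch target address (BTA)
                     output [31:0] pcnext);

  wire [31:0] signimm   = {{16{imm[15]}}, imm};      // sign-extend to 32 bits
  wire [31:0] signimmsh = {signimm[29:0], 2'b00};    // shift left by 2 (word offset to byte offset)

  assign pcplus4  = pc + 32'd4;
  assign pcbranch = pcplus4 + signimmsh;
  assign pcnext   = pcsrc ? pcbranch : pcplus4;
endmodule
```

The shift by 2 reflects that the immediate counts instructions (words) while the PC counts bytes.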
22
Complete Single-Cycle Processor (w/control)
23
Note: Difference due to Flags
Our Control Unit will be slightly different … because of the extra flags
- All flags (Z, V, C, N) are inputs to the control unit
- Signals such as PCSrc are produced inside the control unit
24
Control Unit
Generally as shown below, but with some differences because our ALU is more sophisticated
[Figure: control unit, with flags[3:0] as an input and PCSrc as an output]
Note: ALUOp will be different for our full-feature ALU!
Note: ALUControl will be 5 bits for our full-feature ALU!
25
Review: Lightweight ALU from book
ALUControl2:0  Function
000            A & B
001            A | B
010            A + B
011            not used
100            A & ~B
101            A | ~B
110            A - B
111            SLT
27
Review: Our “full feature” ALU
Full-feature ALU from COMP411:
[Figure: ALU with inputs A and B and a 5-bit ALUFN control (fields Sub, Bool, Shft, Math). An Add/Sub unit, a bidirectional barrel shifter, and a Boolean unit produce the result R and the flags V, C, N, Z.]
Operations listed in the ALUFN table: A+B (Sub = 0), A-B (Sub = 1), B<<A, B>>A, B>>>A, A & B, A | B, A ^ B, ~(A | B)
28
Review: R-Type instructions
Register-type
3 register operands:
- rs, rt: source registers
- rd: destination register
Other fields:
- op: the operation code or opcode (0 for R-type instructions)
- funct: the function; together, op and funct tell the computer which operation to perform
- shamt: the shift amount for shift instructions, otherwise it is 0
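The R-type fields map onto fixed bit ranges of the 32-bit instruction word, which can be sketched as a set of simple slices. The module name rtype_fields is hypothetical; the op and funct ranges match the instr[31:26] and instr[5:0] connections used for the controller in these slides.

```verilog
// Hypothetical sketch: slicing an R-type instruction into its fields.
module rtype_fields(input  [31:0] instr,
                    output [5:0]  op,
                    output [4:0]  rs, rt, rd, shamt,
                    output [5:0]  funct);

  assign op    = instr[31:26];  // opcode (0 for R-type)
  assign rs    = instr[25:21];  // first source register
  assign rt    = instr[20:16];  // second source register
  assign rd    = instr[15:11];  // destination register
  assign shamt = instr[10:6];   // shift amount
  assign funct = instr[5:0];    // function code
endmodule
```

As a check, add $s0, $s1, $s2 encodes as 32'h02328020, which slices into op = 0, rs = 17, rt = 18, rd = 16, shamt = 0, and funct = 6'b100000.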
29
Controller (2 modules)
module controller(input  [5:0] op, funct,
                  input  [3:0] flags,
                  output       memtoreg, memwrite,
                  output       pcsrc, alusrc,
                  output       regdst, regwrite,
                  output       jump,
                  output [2:0] alucontrol);   // 5 bits for our ALU!!

  wire [1:0] aluop;    // This will be different for our ALU
  wire       branch;

  maindec md(op, memtoreg, memwrite, branch,
             alusrc, regdst, regwrite, jump, aluop);
  aludec  ad(funct, aluop, alucontrol);

  assign pcsrc = branch & flags[3];   // flags = {Z, V, C, N}
endmodule
30
Main Decoder
This entire coding may be different in our design
module maindec(input  [5:0] op,
               output       memtoreg, memwrite, branch, alusrc,
               output       regdst, regwrite, jump,
               output [1:0] aluop);   // different for our ALU

  reg [8:0] controls;

  assign {regwrite, regdst, alusrc, branch,
          memwrite, memtoreg, jump, aluop} = controls;

  always @(*)   // the case statement must sit inside a procedural block
    case(op)
      6'b000000: controls <= 9'b ; //Rtype
      6'b100011: controls <= 9'b ; //LW
      6'b101011: controls <= 9'b ; //SW
      6'b000100: controls <= 9'b ; //BEQ
      6'b001000: controls <= 9'b ; //ADDI
      6'b000010: controls <= 9'b ; //J
      default:   controls <= 9'bxxxxxxxxx; //???
    endcase
endmodule
Why do this?
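The control words are left blank in the decoder above. As a hedged illustration only, here is one way to fill them in, assuming the textbook's baseline single-cycle design (2-bit aluop with 00 = add, 01 = subtract, 10 = look at funct) and writing every don't-care as 0. The module name maindec_sketch is hypothetical, and the course design, with its extra flags and wider ALU control, may well use different values.

```verilog
// Sketch under the assumptions stated above; not the course's final decoder.
module maindec_sketch(input  [5:0] op,
                      output       memtoreg, memwrite, branch, alusrc,
                      output       regdst, regwrite, jump,
                      output [1:0] aluop);

  reg [8:0] controls;

  // Bit order, MSB first:
  // regwrite regdst alusrc branch memwrite memtoreg jump aluop[1:0]
  assign {regwrite, regdst, alusrc, branch,
          memwrite, memtoreg, jump, aluop} = controls;

  always @(*)
    case (op)
      6'b000000: controls = 9'b110000010; // R-type: write rd, ALU op chosen by funct
      6'b100011: controls = 9'b101001000; // lw:     write rt from memory, ALU adds
      6'b101011: controls = 9'b001010000; // sw:     write memory, ALU adds
      6'b000100: controls = 9'b000100001; // beq:    branch, ALU subtracts
      6'b001000: controls = 9'b101000000; // addi:   write rt from ALU, ALU adds
      6'b000010: controls = 9'b000000100; // j:      jump only
      default:   controls = 9'bxxxxxxxxx; // unsupported opcode
    endcase
endmodule
```

Packing all outputs into one controls vector keeps each instruction's settings on a single line, so the case statement reads like the main decoder truth table.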
31
ALU Decoder
This entire coding will be different in our design
module aludec(input      [5:0] funct,
              input      [1:0] aluop,
              output reg [2:0] alucontrol);   // 5 bits for our ALU!!

  always @(*)   // the case statement must sit inside a procedural block
    case(aluop)
      2'b00: alucontrol <= 3'b010;  // add
      2'b01: alucontrol <= 3'b110;  // sub
      default: case(funct)          // RTYPE
          6'b100000: alucontrol <= 3'b010; // ADD
          6'b100010: alucontrol <= 3'b110; // SUB
          6'b100100: alucontrol <= 3'b000; // AND
          6'b100101: alucontrol <= 3'b001; // OR
          6'b101010: alucontrol <= 3'b111; // SLT
          default:   alucontrol <= 3'bxxx; // ???
        endcase
    endcase
endmodule
32
Control Unit: ALU Decoder
This entire coding will be different in our design

ALUOp1:0  Meaning
00        Add
01        Subtract
10        Look at Funct
11        Not Used

ALUOp1:0  Funct         ALUControl2:0
00        X             010 (Add)
X1        X             110 (Subtract)
1X        100000 (add)  010 (Add)
1X        100010 (sub)  110 (Subtract)
1X        100100 (and)  000 (And)
1X        100101 (or)   001 (Or)
1X        101010 (slt)  111 (SLT)
33
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         …
lw           100011
sw           101011            X
beq          000100
34
Note on controller
The actual number and names of control signals may be somewhat different in our/your design compared to the one given in the book, because we are implementing more features/instructions.
SO BE VERY CAREFUL WHEN YOU DESIGN YOUR CPU!
35
Single-Cycle Datapath Example: or
36
Extended Functionality: addi
No change to datapath
37
Control Unit: addi
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         …
lw           100011
sw           101011            X
beq          000100
addi         001000
38
Adding Jumps: j
39
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         …
lw           100011
sw           101011            X
beq          000100
j            000010                                                        XX
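For reference, the jump target in standard MIPS is formed by taking the upper four bits of PC+4, appending the 26-bit address field of the instruction, and appending two zero bits. The sketch below assumes that standard encoding; the module name jump_target and its port names are hypothetical.

```verilog
// Hypothetical sketch of the MIPS j target address.
module jump_target(input  [31:0] pcplus4,  // PC + 4
                   input  [25:0] addr,     // instr[25:0], the jump address field
                   output [31:0] pcjump);

  // upper 4 bits of PC+4, then the 26-bit field, then 2 zero bits (word alignment)
  assign pcjump = {pcplus4[31:28], addr, 2'b00};
endmodule
```

A Jump control signal would then select pcjump instead of the branch or PC+4 value as the next PC.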
40
Review: Processor Performance
Program Execution Time
= (# instructions)(cycles/instruction)(seconds/cycle)
= # instructions × CPI × TC
41
Single-Cycle Performance
TC is limited by the critical path (lw)
42
Single-Cycle Performance
Single-cycle critical path:
Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup
In most implementations, limiting paths are: memory, ALU, register file.
Tc = tpcq_PC + 2tmem + tRFread + tALU + tRFsetup + tmux
43
Single-Cycle Performance Example
Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 25 + 200 + 20] ps
   = 925 ps
What’s the max clock frequency?
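The question can be answered by inverting the cycle time. Taking the 925 ps figure above as given, a worked version is:

```latex
f_{\max} = \frac{1}{T_c}
         = \frac{1}{925\ \text{ps}}
         = \frac{1}{925 \times 10^{-12}\ \text{s}}
         \approx 1.08 \times 10^{9}\ \text{Hz}
         = 1.08\ \text{GHz}
```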
44
Single-Cycle Performance Example
For a program with 100 billion instructions executing on a single-cycle MIPS processor,
Execution Time = # instructions × CPI × TC
               = (100 × 10^9)(1)(925 × 10^-12 s)
               = 92.5 seconds
45
Next Time
Next class:
- We’ll look at multi-cycle MIPS
- Adding functionality to our design
Next lab:
- Implement single-cycle CPU!