The CPU Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making) Datapath: portion of the processor which contains hardware necessary to perform operations required by the processor (the brawn) Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)
The CPU Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j Generic implementation use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC) decode the instruction (and read registers) execute the instruction All instructions (except j) use the ALU after reading the registers Fetch PC = PC+4 Decode Exec
A Small Set of Instructions MIPS instruction formats and naming of the various fields. We will refer to this diagram later Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor) Six I-format ALU instructions (lui, addi, slti, andi, ori, xori) Two I-format memory access instructions (lw, sw) Three I-format conditional branch instructions (bltz, beq, bne) Four unconditional jump instructions (j, jr, jal, syscall)
The MIPS Instruction Set op 15 8 10 12 13 14 35 43 2 1 4 5 3 fn 32 34 42 36 37 38 39 Instruction Usage Load upper immediate lui rt,imm Add add rd,rs,rt Subtract sub rd,rs,rt Set less than slt rd,rs,rt Add immediate addi rt,rs,imm Set less than immediate slti rd,rs,imm AND and rd,rs,rt OR or rd,rs,rt XOR xor rd,rs,rt NOR nor rd,rs,rt AND immediate andi rt,rs,imm OR immediate ori rt,rs,imm XOR immediate xori rt,rs,imm Load word lw rt,imm(rs) Store word sw rt,imm(rs) Jump j L Jump register jr rs Branch less than 0 bltz rs,L Branch equal beq rs,rt,L Branch not equal bne rs,rt,L Jump and link jal L System call syscall The MIPS Instruction Set Copy Arithmetic Logic Memory access Control transfer
Fetching Instructions Fetching instructions involves reading the instruction from the Instruction Memory updating the PC to hold the address of the next instruction PC is updated every cycle, so it does not need an explicit write control signal Instruction Memory is read every cycle, so it doesn’t need an explicit read control signal Add 4 Instruction Memory Read Address PC Instruction
Decoding Instructions Decoding instructions involves sending the fetched instruction’s opcode and function field bits to the control unit reading two values from the Register File Register File addresses are contained in the instruction Control Unit Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 Instruction
Executing R Format Operations R format operations (add,sub,slt,and,or) perform the (op and funct) operation on values in rs and rt store the result back into the Register File (into location rd) The Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File R-type: 31 25 20 15 5 op rs rt rd funct shamt 10 RegWrite ALU control Read Addr 1 Read Data 1 Register File Read Addr 2 overflow Instruction zero ALU Write Addr Read Data 2 Write Data
Executing Load and Store Operations Load and store operations involve compute memory address by adding the base register (read from the Register File during decode) to the 16-bit signed-extended offset field in the instruction store value (read from the Register File during decode) written to the Data Memory load value, read from the Data Memory, written to the Register File Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 ALU overflow zero ALU control RegWrite Data Memory Address Read Data Sign Extend MemWrite MemRead 16 32
Executing Branch Operations Add Branch target address Add 4 Shift left 2 ALU control PC zero (to branch control logic) Read Addr 1 Read Data 1 Register File Branch operations involves compare the operands read from the Register File during decode for equality (zero ALU output) compute the branch target address by adding the updated PC to the 16-bit signed-extended offset field in the instr Read Addr 2 Instruction ALU Write Addr Read Data 2 Write Data Sign Extend 16 32
Jump and Branch Instructions Unconditional jump and jump through register instructions j verify # go to mem loc named “verify” jr $ra # go to address that is in $ra; # $ra may hold a return address $ra is the symbolic name for reg. $31 (return address) (incremented) The jump instruction j of MiniMIPS is a J-type instruction which is shown along with how its effective target address is obtained. The jump register (jr) instruction is R-type, with its specified register often being $ra.
Executing Jump Operations Jump operation involves replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits Add 4 4 Jump address Instruction Memory Shift left 2 28 Read Address PC Instruction 26
Creating a Single Datapath from the Parts Assemble the datapath segments and add control lines and multiplexors as needed Single cycle design – fetch, decode and execute each instructions in one clock cycle no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders) multiplexors needed at the input of shared elements with control lines to do the selection write signals to control writing to the Register File and Data Memory Cycle time is determined by length of the longest path
Morgan Kaufmann Publishers 24 April, 2017 The Main Control Unit Control signals derived from instruction R-type rs rt rd shamt funct 31:26 5:0 25:21 20:16 15:11 10:6 Load/ Store 35 or 43 rs rt address 31:26 25:21 20:16 15:0 4 rs rt address 31:26 25:21 20:16 15:0 Branch opcode always read read, except for load write for R-type and load sign-extend and add Chapter 4 — The Processor
Single Cycle Datapath with Control Unit Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
R-type Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File zero Instr[20-16] Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data ALU 1 Write Addr Read Data 2 1 Write Data Write Data Instr[15 -11] 1 Instr[5-0] Sign Extend ALU control 16 32 Instr[5-0]
Deriving the Control Signals Control signals for the single-cycle MicroMIPS implementation. Control signal 1 2 3 RegWrite Don’t write Write RegDst rt rd $31 MemtoReg Data out ALU out IncrPC ALUSrc (rt ) imm AddSub Add Subtract LogicFn1, LogicFn0 AND OR XOR NOR FnClass1, FnClass0 lui Set less Arithmetic Logic MemRead Don’t read Read MemWrite BrType1, BrType0 No branch beq bne bltz PCSrc jta (rs) SysCallAddr Reg file ALUop Data cache Next addr
Load Word Instruction Data/Control Flow Add 1 Add 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Read Address Data Memory PC Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Store Word Instruction? Sign Extend ALU control 16 32 Instr[5-0]
Deriving the Control Signals Control signals for the single-cycle MicroMIPS implementation. Control signal 1 2 3 RegWrite Don’t write Write RegDst rt rd $31 MemtoReg Data out ALU out IncrPC ALUSrc (rt ) imm AddSub Add Subtract LogicFn1, LogicFn0 AND OR XOR NOR FnClass1, FnClass0 lui Set less Arithmetic Logic MemRead Don’t read Read MemWrite BrType1, BrType0 No branch beq bne bltz PCSrc jta (rs) SysCallAddr Reg file ALUop Data cache Next addr
Branch Instruction Data/Control Flow Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Instr[20-16] Register File zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Deriving the Control Signals Control signals for the single-cycle MicroMIPS implementation. Control signal 1 2 3 RegWrite Don’t write Write RegDst rt rd $31 MemtoReg Data out ALU out IncrPC ALUSrc (rt ) imm AddSub Add Subtract LogicFn1, LogicFn0 AND OR XOR NOR FnClass1, FnClass0 lui Set less Arithmetic Logic MemRead Don’t read Read MemWrite BrType1, BrType0 No branch beq bne bltz PCSrc jta (rs) SysCallAddr Reg file ALUop Data cache Next addr
Jump Operation? 1 ovf zero 1 1 1 Add Add 4 Shift left 2 PCSrc ALUOp Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Adding the Jump Operation Instr[25-0] 1 Shift left 2 28 32 26 PC+4[31-28] Add Add 1 4 Shift left 2 PCSrc Jump ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Can the Control Unit set all the control signals based solely on the opcode field of the instruction? Instr[25-0] 1 Shift left 2 28 32 26 PC+4[31-28] Add Add 1 4 Shift left 2 PCSrc Jump ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] Read Data 1 ALU Write Addr Read Data 2 1 Write Data Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]
Control Signal Settings
Control Signal Generation Auxiliary signals identifying instruction classes arithInst = addInst subInst sltInst addiInst sltiInst logicInst = andInst orInst xorInst norInst andiInst oriInst xoriInst immInst = luiInst addiInst sltiInst andiInst oriInst xoriInst Example logic expressions for control signals RegWrite = luiInst arithInst logicInst lwInst jalInst ALUSrc = immInst lwInst swInst AddSub = subInst sltInst sltiInst DataRead = lwInst PCSrc0 = jInst jalInst syscallInst Control addInst subInst jInst sltInst .
13.6 Performance of the Single-Cycle Design An example combinational-logic data path to compute z := (u + v)(w – x) / y Add/Sub latency 2 ns Multiply latency 6 ns Divide latency 15 ns Total latency 23 ns / + - y u v w x z Note that the divider gets its correct inputs after 9 ns, but this won’t cause a problem if we allow enough total time Beginning with inputs u, v, w, x, and y stored in registers, the entire computation can be completed in 25 ns, allowing 1 ns each for register readout and write
Performance Estimation for Single-Cycle MicroMIPS Instruction access 2 ns Register read 1 ns ALU operation 2 ns Data cache access 2 ns Register write 1 ns Total 8 ns Single-cycle clock = 125 MHz R-type 44% 6 ns Load 24% 8 ns Store 12% 7 ns Branch 18% 5 ns Jump 2% 3 ns Weighted mean 6.36 ns The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies.
How Good is Our Single-Cycle Design? Clock rate of 125 MHz not impressive How does this compare with current processors on the market? Instruction access 2 ns Register read 1 ns ALU operation 2 ns Data cache access 2 ns Register write 1 ns Total 8 ns Single-cycle clock = 125 MHz Not bad, where latency is concerned A 2.5 GHz processor with 20 or so pipeline stages has a latency of about 0.4 ns/cycle 20 cycles = 8 ns Throughput, however, is much better for the pipelined processor: Up to 20 times better with single issue Perhaps up to 100 times better with multiple issue