1
5.5 A Multicycle Implementation
A single memory unit is used for both instructions and data. There is a single ALU, rather than an ALU and two adders. One or more registers are added after every major functional unit.
2
Continue
Replacing the ALU and two adders of the single-cycle datapath with a single ALU means that this ALU must accommodate all the inputs that used to go to the three separate units.
3
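Continue
Control signals:
Write signals for the programmer-visible state units (PC, memory, register file) and for the IR
Read signal for the memory (MemRead)
ALU control: same as in the single-cycle datapath
Multiplexors: one control line for each two-input multiplexor, two control lines for each four-input multiplexor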
4
Continue
Three possible sources for the PC:
PC + 4
ALUOut: the branch target address for beq
The jump address for jump (j)
PC write control signals:
PCWrite: used for PC + 4 and for jump
PCWriteCond: used for beq
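The PC selection and write logic described on this slide can be sketched as a small function. This is a minimal illustration, not the actual hardware description: the function name `next_pc` and its parameter names are hypothetical, the PCSource encodings (00, 01, 10) are the ones used in the later execution steps, and the `zero` input is an assumption standing for the ALU's equality result on a beq.

```python
def next_pc(pc, alu_result, alu_out, jump_addr,
            pc_source, pc_write, pc_write_cond, zero):
    """Return the PC value after the clock edge (hypothetical helper)."""
    # PCSource selects among the three possible sources for the PC.
    sources = {
        0b00: alu_result,  # PC + 4, taken directly from the ALU
        0b01: alu_out,     # branch target held in ALUOut (beq)
        0b10: jump_addr,   # jump address (j)
    }
    # PCWrite writes unconditionally; PCWriteCond writes only when the
    # (assumed) Zero signal says the beq comparison succeeded.
    write_enable = pc_write or (pc_write_cond and zero)
    return sources[pc_source] if write_enable else pc
```

With PCWriteCond asserted but `zero` false, the function returns the old PC unchanged, which corresponds to a branch that is not taken.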
5
Continue
6
Breaking the Instruction Execution into Clock Cycles
Instruction fetch step
IR <= Memory[PC];
PC <= PC + 4;
Control signals for IR <= Memory[PC]: MemRead, IRWrite, IorD = 0
Control signals for PC <= PC + 4: ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00 (for add), PCSource = 00, PCWrite
The increment of the PC and the instruction memory access can occur in parallel. How?
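One way to approach the question above: the memory unit and the ALU are separate units that both read the old PC value during the cycle, and the state elements only take their new values at the clock edge. The sketch below models that idea under simplifying assumptions; `fetch_step`, the `state` dictionary, and the word-per-address `memory` dictionary are hypothetical names chosen for illustration.

```python
def fetch_step(state, memory):
    """Model one instruction fetch cycle (illustrative sketch only)."""
    old_pc = state["PC"]

    # Memory unit: MemRead, IorD = 0, IRWrite.
    ir_next = memory[old_pc]

    # ALU: ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00; PCSource = 00, PCWrite.
    pc_next = (old_pc + 4) & 0xFFFFFFFF

    # Clock edge: both registers are updated together, so neither
    # computation ever sees the other's new value.
    return {**state, "IR": ir_next, "PC": pc_next}


state = {"PC": 0x00400000, "IR": 0}
memory = {0x00400000: 0x8D280004}  # example word: lw $8, 4($9)
state = fetch_step(state, memory)
```

Because both computations depend only on `old_pc`, the order in which the two lines are evaluated does not matter, which is the software analogue of doing them in parallel.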
7
Breaking the Instruction Execution into Clock Cycles
Instruction decode and register fetch step
Perform only actions that are either applicable to all instructions or are not harmful:
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
8
Instruction decode and register fetch step
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
For A <= Reg[IR[25:21]] and B <= Reg[IR[20:16]]: since A and B are overwritten on every cycle, no control signals need to be set.
For ALUOut <= PC + (sign-extend(IR[15:0]) << 2), this requires: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (for add). The branch target address will be stored in ALUOut.
The register file access and the computation of the branch target occur in parallel.
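The decode step's register reads and branch target computation can be sketched as follows. This is an illustrative model only: `sign_extend16` and `decode_step` are hypothetical helper names, the register file is assumed to be a plain 32-entry list, and the PC passed in is assumed to be the already incremented value from the fetch step.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_step(pc, ir, regs):
    """Return (A, B, ALUOut) for the decode / register fetch step."""
    a = regs[(ir >> 21) & 0x1F]  # A <= Reg[IR[25:21]]
    b = regs[(ir >> 16) & 0x1F]  # B <= Reg[IR[20:16]]
    # ALUOut <= PC + (sign-extend(IR[15:0]) << 2), kept to 32 bits.
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out


regs = list(range(32))           # assumed contents: register n holds n
beq_word = 0x1109FFFF            # beq $8, $9, -1
a, b, alu_out = decode_step(0x00400004, beq_word, regs)
```

The branch target is computed even when the instruction turns out not to be a branch, which fits the slide's point that this step performs only actions that are harmless for every instruction.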
9
Breaking the Instruction Execution into Clock Cycles
Execution, memory address computation, or branch completion
Memory reference: ALUOut <= A + sign-extend(IR[15:0]);
Arithmetic-logical instruction: ALUOut <= A op B;
Branch: if (A == B) PC <= ALUOut;
Jump: PC <= { PC[31:28], IR[25:0], 2'b00 };
10
3. Execution, memory address computation, or branch completion
Memory reference: ALUOut <= A + sign-extend(IR[15:0]);
ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
Arithmetic-logical instruction: ALUOut <= A op B;
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
Branch: if (A == B) PC <= ALUOut;
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (for subtraction), PCSource = 01, PCWriteCond
Jump: PC <= { PC[31:28], IR[25:0], 2'b00 };
PCSource = 10, PCWrite
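The four cases of the execution step can be sketched in one function that returns the new PC and ALUOut values. This is a simplified model rather than the real datapath: `execute_step`, the `kind` strings, and the `op` callable standing in for the ALU operation of an R-type instruction are all hypothetical names introduced for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_step(kind, pc, ir, a, b, alu_out, op=None):
    """Return (new_pc, new_alu_out) for one instruction class."""
    if kind == "memref":   # ALUOut <= A + sign-extend(IR[15:0])
        return pc, (a + sign_extend16(ir)) & MASK32
    if kind == "rtype":    # ALUOut <= A op B
        return pc, op(a, b) & MASK32
    if kind == "branch":   # if (A == B) PC <= ALUOut
        return (alu_out if a == b else pc), alu_out
    if kind == "jump":     # PC <= {PC[31:28], IR[25:0], 2'b00}
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), alu_out
    raise ValueError(f"unknown instruction class: {kind}")
```

For example, `execute_step("branch", pc, ir, 3, 3, target)` returns `target` as the new PC, while unequal A and B leave the PC unchanged.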
11
Breaking the Instruction Execution into Clock Cycles
Memory access or R-type instruction completion step
Memory reference:
Load: MDR <= Memory[ALUOut]; MemRead, IorD = 1
Store: Memory[ALUOut] <= B; MemWrite, IorD = 1
Arithmetic-logical instruction (R-type):
Reg[IR[15:11]] <= ALUOut; RegDst = 1, RegWrite, MemtoReg = 0
Memory read completion step
Load: Reg[IR[20:16]] <= MDR; MemtoReg = 1, RegWrite, RegDst = 0
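The last two steps can be modelled in the same style. The sketch below is an assumption-laden illustration: `memory_or_rtype_step` and `load_completion_step` are hypothetical names, memory is a dictionary from address to word, and the register file is a 32-entry list.

```python
def memory_or_rtype_step(kind, ir, alu_out, b, memory, regs):
    """Step 4: memory access or R-type completion. Returns the MDR value, if any."""
    if kind == "load":     # MDR <= Memory[ALUOut]; MemRead, IorD = 1
        return memory[alu_out]
    if kind == "store":    # Memory[ALUOut] <= B; MemWrite, IorD = 1
        memory[alu_out] = b
        return None
    if kind == "rtype":    # Reg[IR[15:11]] <= ALUOut; RegDst = 1, MemtoReg = 0
        regs[(ir >> 11) & 0x1F] = alu_out
        return None
    raise ValueError(f"unknown instruction class: {kind}")


def load_completion_step(ir, mdr, regs):
    """Step 5: Reg[IR[20:16]] <= MDR; MemtoReg = 1, RegWrite, RegDst = 0."""
    regs[(ir >> 16) & 0x1F] = mdr


memory = {0x1004: 99}
regs = [0] * 32
lw_word = 0x8D280004       # lw $8, 4($9)
mdr = memory_or_rtype_step("load", lw_word, 0x1004, 0, memory, regs)
load_completion_step(lw_word, mdr, regs)
```

Only the load needs the separate completion step, which is why it is the one instruction class that takes five cycles.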
12
Breaking the Instruction Execution into Clock Cycles
13
Defining the Control
Two different techniques to design the control:
Finite state machine
Microprogramming
Example: CPI in a Multicycle CPU
Using the SPECINT2000 instruction mix, which is 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU instructions, what is the CPI, assuming that each state in the multicycle CPU requires 1 clock cycle?
Answer: The number of clock cycles for each instruction class is the following:
Loads: 5
Stores: 4
ALU instructions: 4
Branches: 3
Jumps: 3
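The cycle counts in the answer follow from the sequence of steps each instruction class passes through when every state takes one clock cycle. The sketch below writes those sequences out as a table; the state names are hypothetical labels for the steps described on the earlier slides, not names from an actual finite state machine specification.

```python
# Hypothetical state names; each state is assumed to take one clock cycle.
STATE_PATHS = {
    "load":   ["fetch", "decode", "mem_addr", "mem_read", "mem_read_completion"],
    "store":  ["fetch", "decode", "mem_addr", "mem_write"],
    "alu":    ["fetch", "decode", "execute", "rtype_completion"],
    "branch": ["fetch", "decode", "branch_completion"],
    "jump":   ["fetch", "decode", "jump_completion"],
}


def cycles_per_class(paths):
    """Count the states visited by each instruction class."""
    return {name: len(path) for name, path in paths.items()}


CYCLES = cycles_per_class(STATE_PATHS)
```

All classes share the first two states, so the differences in cycle count come entirely from what happens after decode.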
14
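Example Continue
The CPI is given by the following:
\[ \mathrm{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}} = \frac{\sum_i \text{Instruction count}_i \times \mathrm{CPI}_i}{\text{Instruction count}} = \sum_i \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \mathrm{CPI}_i \]
The ratio \( \text{Instruction count}_i / \text{Instruction count} \) is simply the instruction frequency for the instruction class i. We can therefore substitute to obtain:
\[ \mathrm{CPI} = 0.25 \times 5 + 0.10 \times 4 + 0.52 \times 4 + 0.11 \times 3 + 0.02 \times 3 = 4.12 \]
This CPI is better than the worst-case CPI of 5.0 when all instructions take the same number of clock cycles.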
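The weighted sum can be checked with a few lines of code. This is a minimal sketch using the instruction mix and cycle counts stated in the example; the dictionary names and the `weighted_cpi` helper are chosen for illustration.

```python
# Instruction frequencies from the SPECINT2000 mix quoted in the example.
MIX = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}

# Clock cycles per instruction class in the multicycle CPU.
CYCLES_PER_CLASS = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}


def weighted_cpi(mix, cycles):
    """Sum of (instruction frequency x cycles) over all instruction classes."""
    return sum(mix[name] * cycles[name] for name in mix)


cpi = weighted_cpi(MIX, CYCLES_PER_CLASS)
print(f"CPI = {cpi:.2f}")
```

Changing the frequencies in `MIX` shows how sensitive the result is to the share of loads, the only class that takes five cycles.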