5.5 A Multicycle Implementation


1 5.5 A Multicycle Implementation
A single memory unit is used for both instructions and data. There is a single ALU, rather than an ALU and two adders. One or more registers are added after every major functional unit.

2 Continue Replacing the ALU and two adders of the single-cycle datapath with a single ALU means that this ALU must accommodate all the inputs that used to go to the three separate units.

3 Continue Control signals:
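The programmer-visible state units (PC, memory, register file) and the IR each need a write control signal.
Memory also needs a read control signal.
ALU control: same as in the single-cycle datapath.
Multiplexors: one or two control lines each, depending on the number of inputs.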

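4 Continue Three possible sources for the PC:
PC + 4 (the ALU output during instruction fetch)
ALUOut: the branch target address for beq
The jump address for j
PC write control signals:
PCWrite: unconditional write, used for PC + 4 and jump
PCWriteCond: write only when the branch condition holds, used for beq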
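The PC source selection above can be sketched in Python. This is a minimal illustration, not the slide author's implementation: the function name `next_pc` and its argument names are assumptions, and `zero` stands for the ALU Zero output that gates PCWriteCond.

```python
def next_pc(pc, alu_result, alu_out, ir, pc_source, pc_write, pc_write_cond, zero):
    """Return the PC value after the clock edge (hypothetical helper)."""
    # Jump address: upper 4 bits of the PC, then the 26-bit field shifted left by 2.
    jump_target = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    # PCSource selects among the three candidate values.
    candidates = {0b00: alu_result, 0b01: alu_out, 0b10: jump_target}
    # PCWrite always writes; PCWriteCond writes only if the ALU reports Zero.
    write_enable = pc_write or (pc_write_cond and zero)
    return candidates[pc_source] if write_enable else pc


# Sequential fetch: PC + 4 arrives directly from the ALU output.
sequential = next_pc(0x00400000, 0x00400004, 0, 0, 0b00, True, False, False)
# beq not taken: PCWriteCond is asserted but Zero is false, so the PC is kept.
not_taken = next_pc(0x00400004, 0, 0x00400020, 0, 0b01, False, True, False)
```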

5 Continue

6 Breaking the Instruction Execution into Clock Cycles
Instruction fetch step
IR <= Memory[PC]; PC <= PC + 4;
IR <= Memory[PC]; requires MemRead, IRWrite, IorD = 0
PC <= PC + 4; requires ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00 (for add), PCSource = 00, PCWrite
The increment of the PC and the instruction memory access can occur in parallel. How?
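One way to see why the two fetch actions do not conflict is to model them as register transfers that both read the old PC and are committed together at the clock edge. The sketch below assumes a dictionary-based machine state and a word-addressed `memory` dictionary; these names are illustrative and do not come from the slides.

```python
def fetch(state, memory):
    """Instruction fetch step: both transfers read the OLD value of PC."""
    pc = state["PC"]
    # Memory path: MemRead, IorD = 0, IRWrite.
    new_ir = memory[pc]
    # ALU path: ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSource = 00, PCWrite.
    new_pc = (pc + 4) & 0xFFFFFFFF
    # Both registers are updated together, as at a single clock edge.
    return {**state, "IR": new_ir, "PC": new_pc}


memory = {0x00400000: 0x8D090008}           # assumed example word: lw $9, 8($8)
state = fetch({"PC": 0x00400000, "IR": 0}, memory)
```

Because the memory and the ALU are separate units, neither result depends on the other within the cycle.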

7 Breaking the Instruction Execution into Clock Cycles
Instruction decode and register fetch step
Perform only actions that are either applicable to all instructions or not harmful:
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

8 Instruction decode and register fetch step
A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]];
Since A and B are overwritten on every cycle, no control signals are needed for these two transfers.
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
This requires: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (for add).
The branch target address is stored in ALUOut.
The register file access and the computation of the branch target occur in parallel.
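The decode step can be sketched as follows. The helper names `sign_extend16` and `decode`, the dictionary-based state, and the example register values are assumptions made for illustration; the field positions follow the transfers listed above.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(state, regs):
    """Decode step: read both source registers and precompute the branch target."""
    ir = state["IR"]
    rs = (ir >> 21) & 0x1F                  # IR[25:21]
    rt = (ir >> 16) & 0x1F                  # IR[20:16]
    offset = sign_extend16(ir) << 2         # sign-extend(IR[15:0]) << 2
    target = (state["PC"] + offset) & 0xFFFFFFFF
    return {**state, "A": regs[rs], "B": regs[rt], "ALUOut": target}


regs = [0] * 32
regs[8], regs[9] = 5, 7
# beq $8, $9, -2 (encoded as 0x1109FFFE); the PC was already incremented in fetch.
state = decode({"PC": 0x00400008, "IR": 0x1109FFFE}, regs)
```

The target is computed for every instruction, whether or not it turns out to be a branch, which is harmless because ALUOut is simply overwritten later.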

9 Breaking the Instruction Execution into Clock Cycles
Execution, memory address computation, or branch completion
Memory reference: ALUOut <= A + sign-extend(IR[15:0]);
Arithmetic-logical instruction: ALUOut <= A op B;
Branch: if (A == B) PC <= ALUOut;
Jump: PC <= { PC[31:28], (IR[25:0], 2'b00) };

10 3. Execution, memory address computation, or branch completion
Memory reference: ALUOut <= A + sign-extend(IR[15:0]);
requires ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
Arithmetic-logical instruction: ALUOut <= A op B;
requires ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
Branch: if (A == B) PC <= ALUOut;
requires ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (for subtraction), PCSource = 01, PCWriteCond
Jump: PC <= { PC[31:28], (IR[25:0], 2'b00) };
requires PCSource = 10, PCWrite
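The four cases of the execution step can be sketched in one function that dispatches on the opcode. This is an illustrative model only: the function name `execute`, the dictionary-based state, and the small set of supported R-type funct codes (add, sub, and, or) are assumptions, and the standard MIPS opcodes for lw, sw, beq, and j are used.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute(state):
    """Third step: what happens depends on the instruction class."""
    ir, a, b = state["IR"], state["A"], state["B"]
    opcode = (ir >> 26) & 0x3F
    new = dict(state)
    if opcode in (0x23, 0x2B):              # lw, sw: compute the memory address
        new["ALUOut"] = (a + sign_extend16(ir)) & MASK32
    elif opcode == 0x00:                    # R-type: ALUOut <= A op B
        funct = ir & 0x3F
        results = {0x20: a + b, 0x22: a - b, 0x24: a & b, 0x25: a | b}
        new["ALUOut"] = results[funct] & MASK32
    elif opcode == 0x04:                    # beq: PCWriteCond with Zero
        if a == b:
            new["PC"] = state["ALUOut"]     # target computed during decode
    elif opcode == 0x02:                    # j: PCSource = 10, PCWrite
        new["PC"] = (state["PC"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    return new


base = {"PC": 0x00400008, "A": 5, "B": 5, "ALUOut": 0x00400000}
taken = execute({**base, "IR": 0x1109FFFE})     # beq $8, $9, -2 with A == B
```

Note that the branch reuses the target that the decode step left in ALUOut, which is why a branch finishes in three cycles.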

11 Breaking the Instruction Execution into Clock Cycles
Memory access or R-type instruction completion step
Memory reference:
Load: MDR <= Memory[ALUOut]; requires MemRead, IorD = 1
Store: Memory[ALUOut] <= B; requires MemWrite, IorD = 1
Arithmetic-logical instruction (R-type): Reg[IR[15:11]] <= ALUOut; requires RegDst = 1, RegWrite, MemtoReg = 0
Memory read completion step
Load: Reg[IR[20:16]] <= MDR; requires MemtoReg = 1, RegWrite, RegDst = 0
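The last two steps can be sketched as a pair of functions. The names `memory_or_rtype_step` and `load_completion`, the dictionary-based state, and the example addresses are assumptions for illustration; `regs` is a 32-entry list and `memory` a dictionary keyed by address.

```python
def memory_or_rtype_step(state, regs, memory):
    """Fourth step: memory access for lw/sw, or register write for R-type."""
    ir = state["IR"]
    opcode = (ir >> 26) & 0x3F
    new = dict(state)
    if opcode == 0x23:                      # lw: MemRead, IorD = 1
        new["MDR"] = memory[state["ALUOut"]]
    elif opcode == 0x2B:                    # sw: MemWrite, IorD = 1
        memory[state["ALUOut"]] = state["B"]
    elif opcode == 0x00:                    # R-type: RegDst = 1, MemtoReg = 0, RegWrite
        rd = (ir >> 11) & 0x1F              # IR[15:11]
        regs[rd] = state["ALUOut"]
    return new


def load_completion(state, regs):
    """Fifth step (loads only): RegDst = 0, MemtoReg = 1, RegWrite."""
    rt = (state["IR"] >> 16) & 0x1F         # IR[20:16]
    regs[rt] = state["MDR"]


regs = [0] * 32
memory = {0x10010008: 99}
# lw $9, 8($8), with the address already computed in the execution step.
state = {"IR": 0x8D090008, "ALUOut": 0x10010008, "B": 0, "MDR": 0}
state = memory_or_rtype_step(state, regs, memory)
load_completion(state, regs)
```

Only the load needs the fifth step, because the value read from memory must pass through the MDR before it can be written to the register file.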

12 Breaking the Instruction Execution into Clock Cycles

13 Defining the Control Two different techniques to design the control:
Finite state machine
Microprogramming
Example: CPI in a Multicycle CPU
Using the SPECINT2000 instruction mix, which is 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU instructions, what is the CPI, assuming that each state in the multicycle CPU requires 1 clock cycle?
Answer: The number of clock cycles for each instruction class is the following:
Loads: 5
Stores: 4
ALU instructions: 4
Branches: 3
Jumps: 3

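14 Example Continue The CPI is given by the following:
CPI = CPU clock cycles / Instruction count
    = Σ (Instruction count_i × CPI_i) / Instruction count
    = Σ (Instruction count_i / Instruction count) × CPI_i
The ratio Instruction count_i / Instruction count is simply the instruction frequency for the instruction class i. We can therefore substitute to obtain:
CPI = 0.25 × 5 + 0.10 × 4 + 0.52 × 4 + 0.11 × 3 + 0.02 × 3 = 4.12
This CPI is better than the worst-case CPI of 5.0, which would result if all instructions took the same number of clock cycles.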
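The weighted sum can be checked with a few lines of Python. The dictionary and function names are illustrative assumptions; the frequencies and cycle counts are the ones given in the example.

```python
# Instruction mix and per-class cycle counts from the example.
mix = {"load": 0.25, "store": 0.10, "alu": 0.52, "branch": 0.11, "jump": 0.02}
cycles = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}


def average_cpi(mix, cycles):
    """CPI as the frequency-weighted average of per-class cycle counts."""
    return sum(mix[name] * cycles[name] for name in mix)


cpi = average_cpi(mix, cycles)
print(f"CPI = {cpi:.2f}")   # prints "CPI = 4.12"
```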

