
COMP541 Multicycle MIPS Montek Singh Apr 8, 2015.


1 COMP541 Multicycle MIPS Montek Singh Apr 8, 2015

2 Topics
Challenges with the single-cycle MIPS implementation. Multicycle MIPS: state elements; adding registers between stages; how to control it; performance.

3 Review: Processor Performance
Program execution time: $\text{Execution Time} = (\#\text{ instructions})(\text{cycles/instruction})(\text{seconds/cycle}) = IC \times CPI \times T_c$. Definitions: IC = instruction count; cycles/instruction = CPI; seconds/cycle = clock period = $T_c$; 1/CPI = instructions/cycle = IPC. The challenge is to satisfy the constraints of cost, power, and performance.
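The execution-time formula is just a product of three factors. A minimal sketch, with hypothetical function names and illustrative numbers (1 million instructions, CPI of 2, 1 ns clock) that are not from the slides:

```python
def execution_time(ic: int, cpi: float, tc_seconds: float) -> float:
    """Execution Time = IC x CPI x Tc, in seconds."""
    return ic * cpi * tc_seconds


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of CPI."""
    return 1.0 / cpi


# Illustrative values only: 1e6 instructions, CPI = 2, Tc = 1 ns.
t = execution_time(1_000_000, 2.0, 1e-9)
print(f"{t * 1e3:.3f} ms, IPC = {ipc(2.0)}")  # 2.000 ms, IPC = 0.5
```

Lowering any one factor (fewer instructions, fewer cycles per instruction, or a shorter clock period) reduces execution time, which is why the single-cycle and multicycle designs below trade CPI against $T_c$.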

4 Single-Cycle Performance (textbook version)
$T_c$ is limited by the critical path (lw); lw is typically the longest instruction.

5 Single-Cycle Performance (textbook version)
Single-cycle critical path: $T_c = t_{pcq\_PC} + t_{mem} + \max(t_{RFread}, t_{sext} + t_{mux}) + t_{ALU} + t_{mem} + t_{mux} + t_{RFsetup}$. In most implementations, the limiting paths are the memory, the ALU, and the register file, so $T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{ALU} + t_{mux} + t_{RFsetup}$.

6 Single-Cycle Performance Example
$T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{ALU} + t_{mux} + t_{RFsetup} = [30 + 2(250) + 150 + 200 + 25 + 20]\ \text{ps} = 925\ \text{ps}$. What’s the maximum clock frequency?
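The critical-path sum and the resulting maximum frequency can be checked numerically. This sketch assumes the textbook delay values (30, 250, 150, 200, 25, 20 ps), which reproduce the 925 ps total; the dictionary keys and function name are hypothetical.

```python
# Assumed component delays in picoseconds (textbook values that sum to 925 ps).
DELAYS_PS = {
    "t_pcq_PC": 30,
    "t_mem": 250,
    "t_RFread": 150,
    "t_ALU": 200,
    "t_mux": 25,
    "t_RFsetup": 20,
}


def single_cycle_tc(d: dict) -> int:
    """Simplified single-cycle critical path (the lw path), in ps."""
    return (d["t_pcq_PC"] + 2 * d["t_mem"] + d["t_RFread"]
            + d["t_ALU"] + d["t_mux"] + d["t_RFsetup"])


tc = single_cycle_tc(DELAYS_PS)
f_max_ghz = 1e3 / tc  # 1 / (tc ps), expressed in GHz
print(f"Tc = {tc} ps, f_max = {f_max_ghz:.2f} GHz")  # Tc = 925 ps, f_max = 1.08 GHz
```

The two memory accesses alone account for 500 of the 925 ps, which motivates splitting them into separate cycles.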

7 Single-Cycle Performance Example
For a program with 100 billion instructions executing on a single-cycle MIPS processor: $\text{Execution Time} = \#\text{instructions} \times CPI \times T_c = (100 \times 10^{9})(1)(925 \times 10^{-12}\ \text{s}) = 92.5$ seconds.

8 Multicycle MIPS
Key idea: Break instruction execution into multiple clock cycles.

9 Multicycle MIPS Processor
Single-cycle microarchitecture: + simple; - cycle time limited by the longest instruction (lw); - two adders/ALUs and two memories. Multicycle microarchitecture: + higher clock speed; + simpler instructions run faster; + reuse expensive hardware on multiple cycles; - sequencing overhead. Same design steps: datapath and control.

10 Multicycle State Elements
Replace the Instruction and Data memories with a single unified memory. This is more realistic (buy one big RAM!). It was not possible in the single-cycle implementation, because both instruction and data accesses were needed within the same clock cycle. Now the same memory can be used twice if needed, since instruction fetch and data access happen in distinct clock cycles.

11 Multicycle Datapath: lw instr fetch
First consider executing lw. STEP 1: Fetch the instruction. Introduce an Instruction Register to buffer this instruction: a “non-architectural register”, not accessible to the programmer.

12 Multicycle Datapath: lw register read
Read register $rs. Insert another non-architectural register, A, which buffers the value of $rs read from the register file.

13 Multicycle Datapath: lw immediate
The immediate field is sign-extended. For consistency, we could insert another non-architectural register to buffer SignImm; this version skips it because SignImm is a simple combinational function of Instr, which is already held in the Instruction Register.
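Because SignImm depends only on the instruction bits, it can be recomputed every cycle from the Instruction Register. A sketch of that combinational function (the name `sign_imm` is hypothetical), using the standard MIPS encoding of `lw $t0, -4($sp)` as an example:

```python
def sign_imm(instr: int) -> int:
    """Sign-extend bits 15:0 of a 32-bit instruction (pure function, no state)."""
    imm = instr & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


instr = 0x8FA8FFFC                          # lw $t0, -4($sp): op=35, rs=29, rt=8, imm=0xFFFC
print(sign_imm(instr))                      # -4
print(hex(sign_imm(instr) & 0xFFFFFFFF))    # 0xfffffffc, the 32-bit SignImm pattern
```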

14 Multicycle Datapath: lw address
The ALU computes the memory address. Insert another non-architectural register, ALUOut, to buffer the ALU result.

15 Multicycle Datapath: lw memory read
The same memory is now read again, this time for the data access. Insert a multiplexer in front of the memory’s address input to choose either PC or ALUOut as the address (i.e., either instruction fetch or data access), controlled by a new control signal, IorD.

16 Multicycle Datapath: lw write register
Data from memory is written into the register file.
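The lw walkthrough above can be traced as a toy cycle-by-cycle model. This is a sketch under simplifying assumptions: memory is a Python dict from byte address to aligned 32-bit word, registers are a list of 32 ints, and the names `run_lw`, `ir`, `a`, `alu_out`, and `data` are hypothetical stand-ins for the Instruction Register, A, ALUOut, and the register that holds the loaded word.

```python
def sign_imm(instr: int) -> int:
    """Sign-extend bits 15:0 (combinational, so no buffering register)."""
    imm = instr & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_lw(mem: dict, regs: list, pc: int) -> int:
    """Step one lw through five cycles and return the updated PC."""
    # Cycle 1, fetch: IorD = 0 routes PC to the memory address;
    # the otherwise idle ALU computes PC + 4.
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # Cycle 2, decode: read $rs from the register file into A.
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = regs[rs]
    # Cycle 3, address: ALUOut = A + SignImm.
    alu_out = (a + sign_imm(ir)) & 0xFFFFFFFF
    # Cycle 4, memory read: IorD = 1 routes ALUOut to the same memory.
    data = mem[alu_out]
    # Cycle 5, write back: the loaded word goes into $rt ($0 stays zero).
    if rt != 0:
        regs[rt] = data
    return pc


regs = [0] * 32
regs[29] = 0x1000                 # $sp
mem = {0x0: 0x8FA8FFFC,           # lw $t0, -4($sp)
       0xFFC: 42}
new_pc = run_lw(mem, regs, 0x0)
print(new_pc, regs[8])            # 4 42
```

The single `mem` dict is read twice, once per access in different cycles, which is exactly what the unified memory plus the IorD multiplexer allows.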

17 Multicycle Datapath: increment PC
The PC is incremented by reusing the ALU to compute PC + 4. In the single-cycle design, we had to add a dedicated +4 adder; in the multicycle design, the same ALU is used twice, in distinct cycles. The main ALU now performs the increment when it is not otherwise busy, instead of a dedicated adder.

18 Multicycle Datapath: sw
Compared to lw: the address computation is identical; the data in $rt is written to memory, with MemWrite set to 1 during the appropriate clock cycle. $rt is buffered in the non-architectural register B.

19 Multicycle Datapath: R-type Instrs.
Read from $rs and $rt; multiplexers in front of the ALU choose $rs and $rt as operands. Write ALUResult to the register file, to $rd (instead of $rt); multiplexers in front of the register file’s write address and write data inputs make this selection.

20 Multicycle Datapath: beq
Two tasks: determine whether the values in $rs and $rt are equal, and calculate the branch target address, BTA = (sign-extended immediate << 2) + (PC + 4). The ALU is reused for both!
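Both beq tasks can be expressed as two separate ALU operations. A minimal sketch, assuming the PC already holds PC + 4 (it was incremented during fetch) and that the equality test is a subtraction checked for zero; `beq_next_pc` is a hypothetical name.

```python
def sign_imm(instr: int) -> int:
    """Sign-extend bits 15:0 of the instruction."""
    imm = instr & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_next_pc(instr: int, pc_plus4: int, rs_val: int, rt_val: int) -> int:
    """Return the PC after a beq, using the ALU once per task."""
    # First ALU use: BTA = (SignImm << 2) + (PC + 4), held in ALUOut.
    alu_out = ((sign_imm(instr) << 2) + pc_plus4) & 0xFFFFFFFF
    # Second ALU use: subtract; a zero result means $rs == $rt.
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    return alu_out if zero else pc_plus4


beq_fwd = (4 << 26) | 3        # beq with offset +3 (opcode 4; rs, rt fields omitted)
print(hex(beq_next_pc(beq_fwd, 0x104, 5, 5)))   # 0x110, branch taken
print(hex(beq_next_pc(beq_fwd, 0x104, 5, 6)))   # 0x104, falls through
```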

21 Complete Multicycle Processor
Caveat: this design has the same functional differences from our lab version as the single-cycle MIPS did.

22 Control Unit

23 Main Controller FSM: Fetch

24 Main Controller FSM: Fetch
Fetch the instruction, and also increment the PC (because the ALU is not otherwise in use). Note: signals are shown only when needed, and enables only when asserted.

25 Main Controller FSM: Decode
No signals are needed for decode. The register values are also fetched, though they may not be used.

26 Main Controller FSM: Address Calculation
Now change states depending on the instruction.

27 Main Controller FSM: Address Calculation
For lw or sw, we need to compute the address.

28 Main Controller FSM: lw
For lw, we now need to read from memory, then write to the register file.

29 Main Controller FSM: sw
sw just writes to memory, so it is one step shorter.

30 Main Controller FSM: R-Type
R-type instructions have two steps: compute the result in the ALU and write it to the register file.

31 Main Controller FSM: beq
beq needs to use the ALU twice, so it consumes two cycles: one to compute the address and another to check for equality. We can take advantage of the decode state, when the ALU is not otherwise used, to compute the BTA (no harm if the BTA is not used).
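The controller states described so far form a small next-state function. This sketch models only state transitions, not the control signals asserted in each state; the opcode strings and state names (MemAdr for address calculation, MemWriteback, ALUWriteback, and so on) are assumptions modeled on common textbook naming.

```python
def next_state(state: str, op: str) -> str:
    """Main controller next-state logic for lw, sw, R-type and beq."""
    if state == "Fetch":
        return "Decode"
    if state == "Decode":
        if op in ("lw", "sw"):
            return "MemAdr"          # address calculation
        if op == "rtype":
            return "Execute"
        if op == "beq":
            return "Branch"          # BTA was already computed in Decode
        raise ValueError(f"unsupported op {op!r}")
    if state == "MemAdr":
        return "MemRead" if op == "lw" else "MemWrite"
    if state == "MemRead":
        return "MemWriteback"
    if state == "Execute":
        return "ALUWriteback"
    # MemWriteback, MemWrite, ALUWriteback and Branch end the instruction.
    return "Fetch"


def cycles(op: str) -> int:
    """Count clock cycles from Fetch until the FSM returns to Fetch."""
    state, n = "Fetch", 0
    while True:
        state = next_state(state, op)
        n += 1
        if state == "Fetch":
            return n


for op in ("lw", "sw", "rtype", "beq"):
    print(op, cycles(op))    # lw 5, sw 4, rtype 4, beq 3
```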

32 Complete Multicycle Controller FSM

33 Main Controller FSM: addi
Similar to R-type: add, then write back.

34 Main Controller FSM: addi

35 Extended Functionality: j

36 Control FSM: j

37 Control FSM: j

38 Multicycle Performance
Instructions take different numbers of cycles: 3 cycles: beq, j; 4 cycles: R-type, sw, addi; 5 cycles: lw. CPI is the weighted average. SPECINT2000 benchmark: 25% loads, 10% stores, 11% branches, 2% jumps, 52% R-type. $\text{Average CPI} = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12$
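The weighted average can be checked directly from the mix. Since addi is not listed separately in the benchmark mix, this sketch assumes it is folded into R-type; both take 4 cycles, so that choice does not change the result.

```python
# SPECINT2000 instruction mix from the slide (addi assumed folded into R-type).
mix = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "rtype": 0.52}
cycles_per_instr = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}

assert abs(sum(mix.values()) - 1.0) < 1e-9   # fractions cover every instruction
avg_cpi = sum(frac * cycles_per_instr[op] for op, frac in mix.items())
print(f"Average CPI = {avg_cpi:.2f}")         # Average CPI = 4.12
```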

39 Multicycle Performance
Multicycle critical path: $T_c = t_{pcq} + t_{mux} + \max(t_{ALU} + t_{mux}, t_{mem}) + t_{setup}$

40 Multicycle Performance Example
$T_c = t_{pcq\_PC} + t_{mux} + \max(t_{ALU} + t_{mux}, t_{mem}) + t_{setup} = t_{pcq\_PC} + t_{mux} + t_{mem} + t_{setup} = [30 + 25 + 250 + 20]\ \text{ps} = 325\ \text{ps}$
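The max() term decides which path limits the multicycle clock. This sketch assumes the same textbook delays as the single-cycle example (30, 25, 200, 250, 20 ps); the dictionary and function names are hypothetical.

```python
# Assumed component delays in picoseconds (textbook values that give 325 ps).
DELAYS_PS = {"pcq_PC": 30, "mux": 25, "ALU": 200, "mem": 250, "setup": 20}


def multicycle_tc(d: dict) -> int:
    """Tc = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup, in ps."""
    return d["pcq_PC"] + d["mux"] + max(d["ALU"] + d["mux"], d["mem"]) + d["setup"]


tc = multicycle_tc(DELAYS_PS)
limiting = "memory" if DELAYS_PS["mem"] >= DELAYS_PS["ALU"] + DELAYS_PS["mux"] else "ALU + mux"
print(f"Tc = {tc} ps (limited by {limiting})")   # Tc = 325 ps (limited by memory)
```

With $t_{ALU} + t_{mux} = 225$ ps and $t_{mem} = 250$ ps, the memory access sets the cycle time, which is why the second line of the equation drops the ALU term.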

41 Multicycle Performance Example
For a program with 100 billion instructions executing on a multicycle MIPS processor: CPI = 4.12, $T_c$ = 325 ps. $\text{Execution Time} = (\#\text{instructions}) \times CPI \times T_c = (100 \times 10^{9})(4.12)(325 \times 10^{-12}\ \text{s}) = 133.9$ seconds. This is slower than the single-cycle processor (92.5 seconds). Why? Not all steps are the same length, and each step pays a sequencing overhead ($t_{pcq} + t_{setup} = 50$ ps).
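Both reasons can be separated with a quick what-if. This sketch recomputes the two execution times from the slides and then, as a purely hypothetical variant, removes the 50 ps per-step sequencing overhead while keeping everything else fixed; the function name is an assumption.

```python
def execution_time_s(ic: float, cpi: float, tc_ps: float) -> float:
    """Execution Time = IC x CPI x Tc, with Tc given in picoseconds."""
    return ic * cpi * tc_ps * 1e-12


IC = 100e9
single = execution_time_s(IC, 1.0, 925)    # single-cycle: CPI = 1
multi = execution_time_s(IC, 4.12, 325)    # multicycle: CPI = 4.12

# Hypothetical: drop the 50 ps (t_pcq + t_setup) overhead paid on every step.
multi_no_overhead = execution_time_s(IC, 4.12, 325 - 50)

print(f"single-cycle: {single:.1f} s")                            # 92.5 s
print(f"multicycle:   {multi:.1f} s")                             # 133.9 s
print(f"multicycle without overhead: {multi_no_overhead:.1f} s")  # 113.3 s
```

Even without the overhead, the multicycle design stays slower than 92.5 s, because every step is still sized for the slowest one (the memory access); that is the "not all steps the same length" effect.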

42 Review: Single-Cycle MIPS Processor

43 Review: Multicycle MIPS Processor

44 Next Time
Next topic: we’ll look at pipelined MIPS, improving throughput (and adding complexity!) by trying to use all of the hardware every cycle.

