1 COMP541 Multicycle MIPS Montek Singh Apr 4, 2012.


2 Topics
• Issue with single cycle
• Multicycle MIPS
  - State elements
  - Now add registers between stages
  - How to control
  - Performance

3 Multicycle MIPS Processor
• Single-cycle microarchitecture:
  + simple
  - cycle time limited by longest instruction (lw)
  - two adders/ALUs and two memories
• Multicycle microarchitecture:
  + higher clock speed
  + simpler instructions run faster
  + reuse expensive hardware on multiple cycles
  - sequencing overhead paid many times
• Same design steps: datapath & control

4 Multicycle State Elements
• Replace Instruction and Data memories with a single unified memory
  - More realistic

5 Multicycle Datapath: lw instruction fetch
• First consider executing lw
• STEP 1: Fetch instruction

6 Multicycle Datapath: lw register read

7 Multicycle Datapath: lw immediate

8 Multicycle Datapath: lw address

9 Multicycle Datapath: lw memory read

10 Multicycle Datapath: lw write register

11 Multicycle Datapath: increment PC
• Now using the main ALU when it's not busy (instead of a dedicated adder)
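The lw walkthrough above can be sketched in software. This Python model runs one lw in five cycles: fetch (with PC + 4 on the ALU), decode (register read and sign extension), address calculation, memory read, and register write. The local names for the intermediate registers (ir, a, alu_out, data) and the use of a dict as the unified memory are assumptions for illustration, not taken from the slides.

```python
# Sketch: executing one MIPS lw on a multicycle datapath, one step per cycle.
# Assumed (not from the slides): the names ir, a, alu_out, data for the
# intermediate registers, and a dict standing in for the unified memory.

MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(mem: dict, regs: list, pc: int) -> int:
    """Execute the lw at address pc and return the updated PC.

    mem maps word-aligned byte addresses to 32-bit words; the same memory
    holds both instructions and data.
    """
    # Cycle 1 (fetch): read the instruction into ir; the ALU also computes PC + 4.
    ir = mem[pc]
    pc = (pc + 4) & MASK32

    # Cycle 2 (decode): read base register rs; sign-extend the immediate.
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = regs[rs]
    offset = sign_extend16(ir)

    # Cycle 3 (address): the ALU adds base + offset; the result is held in alu_out.
    alu_out = (a + offset) & MASK32

    # Cycle 4 (memory read): the same memory is now read for data.
    data = mem[alu_out]

    # Cycle 5 (write register): write the loaded word into rt ($0 stays zero).
    if rt != 0:
        regs[rt] = data
    return pc


regs = [0] * 32
regs[16] = 0x100                               # $s0 = 0x100
mem = {0x0: 0x8E080004, 0x104: 0xDEADBEEF}     # lw $t0, 4($s0); data word
new_pc = run_lw(mem, regs, 0x0)
print(hex(new_pc), hex(regs[8]))               # 0x4 0xdeadbeef
```

Reusing one memory for both the instruction (cycle 1) and the data (cycle 4) is only possible because the steps happen in different cycles.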

12 Multicycle Datapath: sw
• Compared to lw:
  - address generated as for lw
  - write data in rt to memory
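A sketch of sw in the same style shows how little differs from lw: the address cycle is identical, and the final step writes rt's value into memory instead of reading memory into a register. The dict-based memory and the local names are assumptions for illustration.

```python
# Sketch: executing one MIPS sw in four cycles on the multicycle datapath.
# Assumed: dict-based unified memory and the local names used below.

MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_sw(mem: dict, regs: list, pc: int) -> int:
    """Execute the sw at address pc and return the updated PC."""
    # Cycle 1 (fetch): read the instruction; the ALU computes PC + 4.
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    # Cycle 2 (decode): read both rs (base) and rt (the value to store).
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    # Cycle 3 (address): computed exactly as for lw.
    alu_out = (a + sign_extend16(ir)) & MASK32
    # Cycle 4 (memory write): store rt's value; there is no register write-back.
    mem[alu_out] = b
    return pc


regs = [0] * 32
regs[16], regs[8] = 0x200, 0x1234      # $s0 = 0x200, $t0 = 0x1234
mem = {0x0: 0xAE080008}                # sw $t0, 8($s0)
run_sw(mem, regs, 0x0)
print(hex(mem[0x208]))                 # 0x1234
```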

13 Multicycle Datapath: R-type Instrs.
• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt)
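The R-type changes above (two source registers, destination rd instead of rt) can be sketched as follows. The funct codes are the standard MIPS values; the dispatch table, the helper names, and calling the destination mux "RegDst" (common textbook usage) are assumptions, and only five operations are modeled.

```python
# Sketch: the R-type path, with the destination taken from rd rather than rt.
# The funct codes are standard MIPS values; the table and helper names are
# illustrative assumptions.

MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


FUNCT_OPS = {
    0x20: lambda a, b: (a + b) & MASK32,                   # add
    0x22: lambda a, b: (a - b) & MASK32,                   # sub
    0x24: lambda a, b: a & b,                              # and
    0x25: lambda a, b: a | b,                              # or
    0x2A: lambda a, b: int(to_signed(a) < to_signed(b)),   # slt
}


def run_rtype(ir: int, regs: list) -> int:
    """Execute an R-type instruction word; return the register written."""
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    rd = (ir >> 11) & 0x1F      # RegDst selects rd for R-type (rt for lw/addi)
    funct = ir & 0x3F
    alu_result = FUNCT_OPS[funct](regs[rs], regs[rt])   # execute cycle
    if rd != 0:                                          # ALU write-back cycle
        regs[rd] = alu_result
    return rd


regs = [0] * 32
regs[8], regs[9] = 5, 7                # $t0 = 5, $t1 = 7
run_rtype(0x01095020, regs)            # add $t2, $t0, $t1
print(regs[10])                        # 12
```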

14 Multicycle Datapath: beq
• 2 tasks:
  - Determine whether the values in rs and rt are equal
  - Calculate the branch target address:
    BTA = (sign-extended immediate << 2) + (PC+4)
• ALU reused!
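Both beq tasks can be written out as two separate uses of one ALU. In this sketch the equality test is modeled as a subtraction followed by a zero check, which is one common way an ALU reports equality; that choice and the function names are assumptions.

```python
# Sketch: the two beq tasks, both done on the single shared ALU.
# Modeling equality as "subtract, then test for zero" is an assumption here;
# the function names are hypothetical.

MASK32 = 0xFFFFFFFF


def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4)."""
    imm = imm16 & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return ((imm << 2) + pc + 4) & MASK32


def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Return the PC after a beq at address pc."""
    bta = branch_target(pc, imm16)              # ALU use 1: compute the BTA
    zero = ((rs_val - rt_val) & MASK32) == 0    # ALU use 2: subtract, check zero
    return bta if zero else (pc + 4) & MASK32


print(hex(branch_target(0x100, 3)))          # 0x110
print(hex(branch_target(0x100, 0xFFFF)))     # 0x100 (offset of -1 word)
print(hex(beq_next_pc(0x100, 5, 6, 3)))      # 0x104 (not taken)
```

The shift by 2 converts a word offset into a byte offset, and the offset is relative to PC+4, so an immediate of -1 branches back to the beq itself.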

15 Complete Multicycle Processor

16 Control Unit

17 Main Controller FSM: Fetch

18 Fetch the instruction; also increment the PC (because the ALU is not otherwise in use)
Note: signals are shown only when needed, and enables only when asserted.

19 Main Controller FSM: Decode
• No signals needed for decode
• Register values are also fetched
  - Perhaps will not be used

20 Main Controller FSM: Address Calculation
• Now change states depending on the instruction

21 Main Controller FSM: Address Calculation
• For lw or sw, need to compute the address

22 Main Controller FSM: lw
• For lw, now need to read from memory
• Then write to register

23 Main Controller FSM: sw
• sw just writes to memory
• One step shorter than lw

24 Main Controller FSM: R-Type
• R-type instructions have two steps: compute the result in the ALU, then write it to the register file

25 Main Controller FSM: beq
• beq needs to use the ALU twice, so it consumes two cycles after fetch:
  - One to compute the branch target address
  - Another to decide on equality
• Can use the decode cycle, when the ALU is otherwise unused, to compute the BTA (no harm if the BTA is not used)

26 Complete Multicycle Controller FSM

27 Main Controller FSM: addi
• Similar to R-type:
  - Add
  - Write back

28 Main Controller FSM: addi

29 Extended Functionality: j

30 Control FSM: j
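The complete controller, including the addi and j extensions, boils down to a next-state function. This sketch encodes that function and then counts how many states each instruction visits before returning to fetch. The state names are hypothetical labels for the states on the slides; the opcodes are the standard MIPS values.

```python
# Sketch: next-state logic of the main controller FSM for lw, sw, R-type,
# beq, addi and j. State names are hypothetical labels; opcodes are the
# standard MIPS values.
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DECODE = auto()          # also computes the BTA for beq
    MEM_ADR = auto()         # address calculation for lw/sw
    MEM_READ = auto()
    MEM_WRITEBACK = auto()
    MEM_WRITE = auto()
    EXECUTE = auto()         # R-type ALU operation
    ALU_WRITEBACK = auto()
    BRANCH = auto()          # beq equality test
    ADDI_EXECUTE = auto()
    ADDI_WRITEBACK = auto()
    JUMP = auto()


OP_RTYPE, OP_J, OP_BEQ, OP_ADDI, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x08, 0x23, 0x2B

DECODE_NEXT = {
    OP_LW: State.MEM_ADR,
    OP_SW: State.MEM_ADR,
    OP_RTYPE: State.EXECUTE,
    OP_BEQ: State.BRANCH,
    OP_ADDI: State.ADDI_EXECUTE,
    OP_J: State.JUMP,
}


def next_state(state: State, opcode: int) -> State:
    """Return the next controller state given the current state and opcode."""
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return DECODE_NEXT[opcode]      # raises KeyError for unsupported opcodes
    if state is State.MEM_ADR:
        return State.MEM_READ if opcode == OP_LW else State.MEM_WRITE
    if state is State.MEM_READ:
        return State.MEM_WRITEBACK
    if state is State.EXECUTE:
        return State.ALU_WRITEBACK
    if state is State.ADDI_EXECUTE:
        return State.ADDI_WRITEBACK
    # Every remaining state is the last step of its instruction.
    return State.FETCH


def cycles_for(opcode: int) -> int:
    """Count the states visited from FETCH until the FSM returns to FETCH."""
    state, cycles = State.FETCH, 1
    while (state := next_state(state, opcode)) is not State.FETCH:
        cycles += 1
    return cycles


for name, op in [("lw", OP_LW), ("sw", OP_SW), ("R-type", OP_RTYPE),
                 ("addi", OP_ADDI), ("beq", OP_BEQ), ("j", OP_J)]:
    print(name, cycles_for(op))
```

The counts printed (5 for lw; 4 for sw, R-type and addi; 3 for beq and j) are the per-instruction cycle counts used in the performance analysis.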

31

32 Multicycle Performance
• Instructions take different numbers of cycles:
  - 3 cycles: beq, j
  - 4 cycles: R-type, sw, addi
  - 5 cycles: lw
• CPI is the weighted average
• SPECINT2000 benchmark:
  - 25% loads
  - 10% stores
  - 11% branches
  - 2% jumps
  - 52% R-type
• Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
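The weighted-average CPI is quick to check numerically. This sketch uses only the instruction mix and cycle counts from the slide; the benchmark mix has no separate addi class, and since addi takes 4 cycles like R-type, where it is counted would not change the result.

```python
# Sketch: CPI as a weighted average of per-class cycle counts.
# Mix and cycle counts are taken from the slide.

mix = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "R-type": 0.52}
cycles = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "R-type": 4}

assert abs(sum(mix.values()) - 1.0) < 1e-9   # fractions cover all instructions
cpi = sum(mix[k] * cycles[k] for k in mix)
print(f"CPI = {cpi:.2f}")                    # CPI = 4.12
```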

33 Multicycle Performance
Multicycle critical path: $T_c = t_{pcq} + t_{mux} + \max(t_{ALU} + t_{mux},\ t_{mem}) + t_{setup}$

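34 Multicycle Performance Example
$T_c = t_{pcq} + t_{mux} + \max(t_{ALU} + t_{mux},\ t_{mem}) + t_{setup}$
$\phantom{T_c} = t_{pcq} + t_{mux} + t_{mem} + t_{setup}$ (memory access is the longer of the two paths)
$\phantom{T_c} = [30 + 25 + 250 + 20]\ \text{ps} = 325\ \text{ps}$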
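The critical-path formula can be evaluated directly. The slide does not give the ALU delay, so the 200 ps used here is an assumed value; any ALU delay of at most 225 ps keeps memory on the critical path and yields the same cycle time.

```python
# Sketch: the multicycle critical-path formula with the slide's delays (ps).
# t_alu is NOT given on the slide; 200 ps is an assumed value.

t_pcq, t_mux, t_mem, t_setup = 30, 25, 250, 20
t_alu = 200  # assumption: any value <= 225 ps gives the same Tc

t_c = t_pcq + t_mux + max(t_alu + t_mux, t_mem) + t_setup
print(f"Tc = {t_c} ps")   # Tc = 325 ps
```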

35 Multicycle Performance Example
• For a program with 100 billion instructions executing on a multicycle MIPS processor:
  - CPI = 4.12
  - $T_c = 325\ \text{ps}$
• Execution time = (# instructions) × CPI × $T_c$ = $(100 \times 10^{9})(4.12)(325 \times 10^{-12})\ \text{s}$ = 133.9 seconds
• This is slower than the single-cycle processor (92.5 seconds). Why?
  - Not all steps are the same length
  - Sequencing overhead is paid on every step ($t_{pcq} + t_{setup} = 50\ \text{ps}$)
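The execution-time arithmetic, plus how much of it goes to per-step sequencing overhead, can be sketched with the slide's numbers alone. Splitting out the overhead term is an added illustration of the second "Why?" point, not a figure from the slide.

```python
# Sketch: execution time and the share spent on sequencing overhead,
# using only numbers from the slide.

n_instr = 100e9          # instructions
cpi = 4.12
t_c = 325e-12            # seconds per cycle
overhead = 50e-12        # t_pcq + t_setup, paid once per cycle

exec_time = n_instr * cpi * t_c
overhead_time = n_instr * cpi * overhead
print(f"execution time: {exec_time:.1f} s")            # 133.9 s
print(f"sequencing overhead: {overhead_time:.1f} s")   # 20.6 s
print(f"slowdown vs. single-cycle: {exec_time / 92.5:.2f}x")
```

About 20.6 of the 133.9 seconds go to register clock-to-Q and setup time alone, which the single-cycle design pays only once per instruction.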

36 Review: Single-Cycle MIPS Processor

37 Review: Multicycle MIPS Processor

38 Next Time
• Next class:
  - We'll look at pipelined MIPS
  - Improving throughput (and adding complexity!) by trying to use all hardware every cycle
• Next lab (Lab 10):
  - See website
  - A full mini MIPS processor

