COMP541 Multicycle MIPS Montek Singh Apr 4, 2012
Topics: Issue w/ single cycle. Multicycle MIPS: state elements; now add registers between stages; how to control; performance.
Multicycle MIPS Processor Single-cycle microarchitecture: + simple - cycle time limited by longest instruction ( lw ) - two adders/ALUs and two memories Multicycle microarchitecture: + higher clock speed + simpler instructions run faster + reuse expensive hardware on multiple cycles - sequencing overhead paid many times Same design steps: datapath & control
Multicycle State Elements Replace Instruction and Data memories with a single unified memory. More realistic.
Multicycle Datapath: lw instr fetch First consider executing lw STEP 1: Fetch instruction
Multicycle Datapath: lw register read
Multicycle Datapath: lw immediate
Multicycle Datapath: lw address
Multicycle Datapath: lw memory read
Multicycle Datapath: lw write register
Multicycle Datapath: increment PC Now using main ALU when it’s not busy (instead of dedicated adder)
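The lw steps above can be traced in software. This is a minimal sketch, not the datapath from the slides: the function name `run_lw`, the intermediate names `a`, `imm`, `alu_out`, and `data`, and the use of a dict for the unified memory are all assumptions made for illustration. Register read and immediate extension are grouped into one step so the total matches the five cycles quoted for lw.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(mem, regs, pc):
    """Walk one lw through five steps; returns (new_pc, list_of_steps).

    mem is a dict acting as the single unified memory (hypothetical model),
    regs is a 32-entry list standing in for the register file.
    """
    steps = []

    # Step 1: fetch the instruction; the main ALU is idle, so it computes PC + 4.
    instr = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    steps.append("fetch")

    # Step 2: read rs from the register file and sign-extend the immediate.
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    a = regs[rs]
    imm = sign_extend16(instr)
    steps.append("decode")

    # Step 3: the ALU adds base and offset to form the address.
    alu_out = (a + imm) & 0xFFFFFFFF
    steps.append("address")

    # Step 4: read the data word from the same unified memory.
    data = mem[alu_out]
    steps.append("memory read")

    # Step 5: write the loaded word to rt.
    regs[rt] = data
    steps.append("write register")

    return pc, steps


# Usage: lw $8, 8($9) encoded as 0x8D280008, with $9 = 0x100.
memory = {0x0: 0x8D280008, 0x108: 42}
registers = [0] * 32
registers[9] = 0x100
new_pc, steps = run_lw(memory, registers, 0x0)
```

Each intermediate value here corresponds to something that must be held between cycles, which is why the multicycle datapath adds registers between stages.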
Multicycle Datapath: sw Compared to lw: addr generated as for lw; write data in rt to memory.
Multicycle Datapath: R-type Instrs. Read from rs and rt Write ALUResult to register file Write to rd (instead of rt )
Multicycle Datapath: beq 2 tasks: (1) Determine whether values in rs and rt are equal. (2) Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4). ALU reused!
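The two beq tasks can be written out as a short worked example. This is only a sketch: the function names `branch_target` and `beq_next_pc` are hypothetical, and 32-bit wraparound is assumed.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    return ((sign_extend16(imm16) << 2) + (pc + 4)) & 0xFFFFFFFF


def beq_next_pc(pc, imm16, rs_value, rt_value):
    """Take the branch only when the rs and rt values are equal."""
    if rs_value == rt_value:
        return branch_target(pc, imm16)
    return (pc + 4) & 0xFFFFFFFF
```

For example, `branch_target(0x00400000, 3)` gives `0x00400010`, and an immediate of `0xFFFF` (that is, -1) at PC `0x00400010` branches back to `0x00400010` itself.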
Complete Multicycle Processor
Control Unit
Main Controller FSM: Fetch
Fetch instruction Also increment PC (because ALU not in use) Note: signals only shown when needed and enables only when asserted.
Main Controller FSM: Decode No signals needed for decode Register values also fetched Perhaps will not be used
Main Controller FSM: Address Calculation Now change states depending on instr
Main Controller FSM: Address Calculation For lw or sw, need to compute addr
Main Controller FSM: lw For lw now need to read from memory Then write to register
Main Controller FSM: sw sw just writes to memory One step shorter
Main Controller FSM: R-Type The R-type instructions have two steps: compute result in ALU, then write to reg.
Main Controller FSM: beq beq needs to use the ALU twice, so it consumes two cycles: one to compute the addr, another to decide on eq. Can take advantage of decode, when the ALU is not used, to compute the BTA (no harm if the BTA is not used).
Complete Multicycle Controller FSM
Main Controller FSM: addi Similar to r-type Add Write back
Main Controller FSM: addi
Extended Functionality: j
Control FSM: j
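The controller's state sequencing can be approximated with a small next-state function. This is a sketch under stated assumptions: the state names (`mem_addr`, `execute`, `branch`, and so on) are hypothetical labels, not the names used in the FSM diagrams, and control signal outputs are omitted entirely. Only the number of states visited per instruction is modeled.

```python
# Hypothetical state names; the slides' FSM diagrams are not reproduced here.
AFTER_DECODE = {
    "lw": "mem_addr",
    "sw": "mem_addr",
    "rtype": "execute",
    "addi": "addi_execute",
    "beq": "branch",
    "j": "jump",
}

# States that are always followed by exactly one more state.
ONE_MORE_STEP = {
    "mem_read": "mem_writeback",
    "execute": "alu_writeback",
    "addi_execute": "addi_writeback",
}


def next_state(state, op):
    """Return the state that follows `state` for instruction class `op`."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return AFTER_DECODE[op]
    if state == "mem_addr":
        # lw reads memory next; sw writes memory and finishes one step sooner.
        return "mem_read" if op == "lw" else "mem_write"
    # Any remaining state is a final step, so the FSM returns to fetch.
    return ONE_MORE_STEP.get(state, "fetch")


def cycles(op):
    """Count the states visited from fetch until the FSM is back at fetch."""
    state, count = "fetch", 0
    while True:
        state = next_state(state, op)
        count += 1
        if state == "fetch":
            return count
```

Under this model, `cycles("lw")` is 5, sw, R-type, and addi take 4, and beq and j take 3.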
Multicycle Performance Instructions take different numbers of cycles: 3 cycles: beq, j; 4 cycles: R-type, sw, addi; 5 cycles: lw. CPI is a weighted average. SPECINT2000 benchmark: 25% loads, 10% stores, 11% branches, 2% jumps, 52% R-type. Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
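The weighted average can be checked with a few lines of arithmetic. The dictionary and function names below are illustrative; the instruction mix and cycle counts are the ones listed on the slide.

```python
# Instruction mix (fraction of executed instructions) from the slide.
MIX = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "rtype": 0.52}

# Cycles per instruction class from the slide.
CYCLES = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "rtype": 4}


def average_cpi(mix, cycles):
    """Weighted average: sum of (fraction of instructions) x (cycles)."""
    return sum(fraction * cycles[op] for op, fraction in mix.items())


cpi = average_cpi(MIX, CYCLES)
print(f"Average CPI = {cpi:.2f}")
```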
Multicycle Performance Multicycle critical path: T_c = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup
Multicycle Performance Example T_c = t_pcq_PC + t_mux + max(t_ALU + t_mux, t_mem) + t_setup = t_pcq_PC + t_mux + t_mem + t_setup = [ ] ps = 325 ps
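The critical-path formula is easy to evaluate once element delays are known. The individual delays did not survive in the text above, so the values below are assumptions, chosen only because they reproduce the 325 ps total and make the memory path dominate the max term; they should not be read as the slide's own numbers.

```python
def cycle_time_ps(t_pcq, t_mux, t_alu, t_mem, t_setup):
    """T_c = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup, in ps."""
    return t_pcq + t_mux + max(t_alu + t_mux, t_mem) + t_setup


# ASSUMED delays in picoseconds (not recoverable from the slide text).
ASSUMED_DELAYS_PS = {
    "t_pcq": 30,
    "t_mux": 25,
    "t_alu": 200,
    "t_mem": 250,
    "t_setup": 20,
}

tc = cycle_time_ps(**ASSUMED_DELAYS_PS)
print(f"T_c = {tc} ps")
```

With these assumed delays the memory access (250 ps) is longer than the ALU plus mux path (225 ps), which is why the max term reduces to t_mem.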
Multicycle Performance Example For a program with 100 billion instructions executing on a multicycle MIPS processor: CPI = 4.12, T_c = 325 ps. Execution Time = (# instructions) × CPI × T_c = (100 × 10^9)(4.12)(325 × 10^-12) = 133.9 seconds. This is slower than the single-cycle processor (92.5 seconds). Why? Not all steps are the same length. Sequencing overhead for each step (t_pcq + t_setup = 50 ps).
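The execution-time product can be verified directly. This is a small sketch with illustrative names; the inputs are the figures quoted on the slide, and the single-cycle time of 92.5 seconds is taken as given.

```python
def execution_time_s(instructions, cpi, cycle_time_s):
    """Execution time = (# instructions) x CPI x T_c."""
    return instructions * cpi * cycle_time_s


multicycle_s = execution_time_s(100e9, 4.12, 325e-12)
single_cycle_s = 92.5  # quoted on the slide, not recomputed here

print(f"multicycle:   {multicycle_s:.1f} s")
print(f"single-cycle: {single_cycle_s:.1f} s")
print(f"slowdown:     {multicycle_s / single_cycle_s:.2f}x")
```

The comparison makes the slide's point concrete: the shorter clock period does not compensate for an average of more than four cycles per instruction.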
Review: Single-Cycle MIPS Processor
Review: Multicycle MIPS Processor
Next Time Next class: We’ll look at pipelined MIPS. Improving throughput (and adding complexity!) by trying to use all hardware every cycle. Next lab (Lab 10): See website. A full mini MIPS processor.