1
1 1999 ©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan/cs162
2
Single Cycle Datapath (From Ch 5)
[Figure: single-cycle datapath. Components: PC, Imem, Regs, Sign Extend, ALU, ALU control, Dmem, two adders (PC + 4 and branch target), shift left 2. Muxes selected by PCSrc, RegDst, ALUSrc, and MemToReg. Other control signals: RegWrite, ALUOp, MemRead, MemWrite. Instruction fields: 25:21, 20:16, 15:11, 15:0.]
3
Required Changes to Datapath
°Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
°The next PC value is computed in the 3rd stage, but the next instruction must be fetched in the next cycle. Move the PCSrc mux to the 1st stage: the PC is incremented unless there is a new branch address.
°The branch address is computed in the 3rd stage. With pipelining, the PC value has changed by then, so the PC value must be carried along with the instruction. Width of IF/ID register = (IR: 32) + (PC: 32) = 64 bits.
4
Changes to Datapath Contd.
°For a lw instruction, the write register address is needed at stage 5, but by then the IR is occupied by another instruction. So the destination field of the IR must be carried along as the instruction moves through the stages. See the connection in the figure.
Length of ID/EX register = (Reg1: 32) + (Reg2: 32) + (offset: 32) + (PC: 32) + (destination register: 5) = 133 bits
Assignment: What are the lengths of the EX/MEM and MEM/WB registers?
5
Pipelined Datapath (with Pipeline Regs) (6.2)
[Figure: pipelined datapath with Imem, Regs, ALU, and Dmem separated by pipeline registers. Stages: Fetch, Decode, Execute, Memory, Write Back. Register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits. The 5-bit write register number is carried through to Write Back.]
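The register widths quoted on these slides can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original lecture. The IF/ID and ID/EX field lists come from the slides; the EX/MEM and MEM/WB field lists are an assumption (the usual textbook datapath), chosen because they reproduce the 102-bit and 69-bit totals shown in the figure. Control bits are not counted.

```python
# Field widths (in bits) carried by each pipeline register.
# IF/ID and ID/EX follow the slides; the EX/MEM and MEM/WB field
# lists are an assumption that matches the totals in the figure.
PIPELINE_REGS = {
    "IF/ID": {"IR": 32, "PC": 32},
    "ID/EX": {"Reg1": 32, "Reg2": 32, "offset": 32, "PC": 32, "dest_reg": 5},
    "EX/MEM": {"branch_target": 32, "zero": 1, "alu_result": 32,
               "store_data": 32, "dest_reg": 5},
    "MEM/WB": {"mem_data": 32, "alu_result": 32, "dest_reg": 5},
}


def register_widths(regs=PIPELINE_REGS):
    """Return {register name: total datapath width in bits}."""
    return {name: sum(fields.values()) for name, fields in regs.items()}


if __name__ == "__main__":
    for name, width in register_widths().items():
        print(f"{name}: {width} bits")
```

The destination register number (5 bits) appears in every register after ID/EX, which is the "carry the IR destination field along" change described earlier.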
6
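Pipelined Control (6.3)
°Start with single-cycle controller
°Group control lines by pipeline stage needed
°Extend pipeline registers with control bits
[Figure: the Control unit decodes the instruction from IF/ID and writes three groups of control bits into ID/EX: EX (RegDst, ALUOp, ALUSrc), Mem (Branch, MemRead, MemWrite), and WB (MemToReg, RegWrite). EX/MEM carries the Mem and WB groups; MEM/WB carries the WB group only.]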
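The idea of grouping control lines by stage and letting each pipeline register carry only what is still needed can be sketched as a small lookup. This is an illustrative Python sketch only; the 3/3/2 grouping is an assumption read off the signal order on the slide, and the names `CONTROL_GROUPS`, `GROUPS_CARRIED`, and `control_lines_carried` are hypothetical.

```python
# Control lines grouped by the stage that consumes them.
# Assumption: the 3/3/2 split follows the signal order on the slide.
CONTROL_GROUPS = {
    "EX": ("RegDst", "ALUOp", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB": ("MemToReg", "RegWrite"),
}

# Groups that are still needed downstream of each pipeline register.
GROUPS_CARRIED = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def control_lines_carried(register):
    """List the control lines stored in the given pipeline register."""
    return [line
            for group in GROUPS_CARRIED[register]
            for line in CONTROL_GROUPS[group]]


if __name__ == "__main__":
    for reg in GROUPS_CARRIED:
        print(reg, control_lines_carried(reg))
```

Each stage uses its own group and passes the rest on, so the control portion of the pipeline registers shrinks from ID/EX to MEM/WB.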
7
Pipelined Processor: Datapath + Control
[Figure: pipelined datapath with the Control unit added. Control bits are grouped as EX, M, and WB and carried through the ID/EX, EX/MEM, and MEM/WB registers. Signals shown: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemToReg. Instruction fields shown: [20-16], [15-11], [15-0].]
°More work is needed to correctly handle pipeline hazards
8
Recap
°If all pipeline stages can be kept busy, the pipeline can retire (complete) up to one instruction per clock cycle (thereby achieving single-cycle throughput).
°The pipeline paradox (for MIPS): any instruction still takes 5 cycles to execute (even though one instruction can be retired per cycle).
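The paradox is the difference between latency and throughput. A rough Python sketch, assuming an ideal 5-stage pipeline with no stalls and, for comparison, a non-overlapped machine that spends 5 cycles on every instruction (both function names are hypothetical):

```python
def pipelined_cycles(n_instructions, depth=5):
    """Cycles to finish n instructions on an ideal pipeline (no stalls).

    The first instruction needs `depth` cycles to drain through the
    pipeline; each later instruction completes one cycle after the
    previous one.
    """
    if n_instructions == 0:
        return 0
    return depth + n_instructions - 1


def unpipelined_cycles(n_instructions, depth=5):
    """Cycles if instructions do not overlap (assumed comparison machine)."""
    return depth * n_instructions


if __name__ == "__main__":
    for n in (1, 10, 1000):
        p = pipelined_cycles(n)
        print(f"n={n}: pipelined={p} cycles, CPI={p / n:.3f}, "
              f"unpipelined={unpipelined_cycles(n)} cycles")
```

A single instruction still takes 5 cycles, but as n grows the cycles per instruction approach 1.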
9
Problems for Pipelining
°Hazards prevent the next instruction from executing during its designated clock cycle, limiting speedup:
  °Structural hazards: the hardware cannot support this combination of instructions (e.g., a single memory for instructions and data)
  °Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  °Control hazards: conditional branches and other instructions may stall the pipeline, delaying later instructions
10
Single Memory is a Structural Hazard
[Figure: pipeline diagram, instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) versus time (clock cycles); each instruction passes through M, Reg, ALU, M, Reg.]
°The same memory cannot be read twice in the same clock cycle
11
EX: MIPS multicycle datapath: Structural Hazard in Memory
[Figure: multicycle datapath with PC, a single Instruction or Data Memory, Instruction Register, Memory Data Register, Registers, A and B registers, ALU, and ALUOut.]
12
Structural Hazards limit performance
°Example: if there are 1.3 memory accesses per instruction (30% of instructions execute loads and stores) and only one memory access per cycle, then
  Average CPI ≥ 1.3
  Otherwise the datapath resource is more than 100% utilized
°Structural Hazard Solution: Add more Hardware
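The bound follows from counting memory accesses: every instruction needs one access for its fetch, and 30% need a second one for data. A worked version, assuming a single memory port that serves one access per cycle:

$$
\text{CPI} \;\ge\; \frac{\text{memory accesses per instruction}}{\text{memory accesses per cycle}} \;=\; \frac{1 + 0.3}{1} \;=\; 1.3
$$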
13
Speed Up Equation for Pipelining

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$$

With Ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$$
14
Example: Dual-port vs. Single-port
°Machine A: Dual ported memory
°Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
°Ideal CPI = 1 for both
°Loads are 40% of instructions executed

$$\text{SpeedUp}_A = \frac{\text{Pipeline Depth}}{1 + 0} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline Depth}$$

$$\text{SpeedUp}_B = \frac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \frac{\text{Pipeline Depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline Depth}$$

$$\frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline Depth}}{0.75 \times \text{Pipeline Depth}} = 1.33$$

°Machine A is 1.33 times faster
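The same comparison can be reproduced numerically. This is a small Python sketch of the speedup formula; the function name `pipeline_speedup`, the parameter `clock_ratio` (unpipelined clock cycle divided by pipelined clock cycle), and the pipeline depth of 5 are assumptions made for illustration. The depth cancels in the final ratio.

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup over the unpipelined machine.

    clock_ratio = unpipelined clock cycle / pipelined clock cycle.
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio


DEPTH = 5  # assumed; any positive depth gives the same A/B ratio

# Machine A: dual-ported memory, so no structural stalls.
speedup_a = pipeline_speedup(DEPTH, stall_cpi=0.0)

# Machine B: single-ported memory, so each load (40% of instructions)
# costs one stall cycle, but the clock is 1.05 times faster.
speedup_b = pipeline_speedup(DEPTH, stall_cpi=0.4 * 1, clock_ratio=1.05)

if __name__ == "__main__":
    print(f"SpeedUp A = {speedup_a:.2f}")
    print(f"SpeedUp B = {speedup_b:.2f}")
    print(f"A / B     = {speedup_a / speedup_b:.2f}")
```

The faster clock of Machine B does not make up for the stall cycles caused by the single memory port.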
15
Data Hazard on Register $1 (6.4)
add $1,$2,$3
sub $4,$1,$3
and $6,$1,$7
or $8,$1,$9
xor $10,$1,$11
16
Data Hazard Solution: “Forward” result from one stage to another
add $1,$2,$3
sub $4,$1,$3
and $6,$1,$7
or $8,$1,$9
xor $10,$1,$11
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB), instruction order versus time (clock cycles), with forwarding paths from add to the dependent instructions.]
°“or” is OK if the register file is implemented properly (written in the first half of the cycle, read in the second half)
17
Hazard Detection for Forwarding
°A hazard must be detected just before execution so that, when one occurs, the data can be forwarded to the input of the ALU.
°A hazard exists when a source register (Rs, Rt, or both) of the instruction in the EX stage equals the destination register (Rd) of an instruction further along the pipeline (in either the MEM or WB stage).
°Compare the Rs and Rt values in the ID/EX register with Rd in the EX/MEM and MEM/WB registers => the Rs, Rt, and Rd values must all be carried from the IF/ID register to the ID/EX register (only Rd was carried before).
°If they match, forward the data to the input of the ALU through the multiplexor. See Fig. 6.43, p. 488 of the text.
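The comparison described above can be sketched in Python as a function that picks the source for one ALU input. This is an illustrative model, not the circuit from the text: `StageDest` and `forward_select` are hypothetical names, and the RegWrite and register $0 checks are refinements assumed here that the slide does not mention. When both later stages match, the EX/MEM value is preferred because it is the more recent result.

```python
from dataclasses import dataclass


@dataclass
class StageDest:
    """Destination info held in a pipeline register (EX/MEM or MEM/WB)."""
    reg_write: bool  # True if that instruction writes a register
    rd: int          # destination register number


def forward_select(src, ex_mem, mem_wb):
    """Choose where an ALU operand for source register `src` comes from.

    Returns "EX/MEM", "MEM/WB", or "REG" (no forwarding needed).
    """
    # Most recent producer first: the instruction now in the MEM stage.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"
    # Otherwise the instruction now in the WB stage.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "REG"


if __name__ == "__main__":
    # sub $4,$1,$3 in EX while add $1,$2,$3 is in MEM:
    print(forward_select(1, StageDest(True, 1), StageDest(False, 0)))
    # and $6,$1,$7 in EX while sub is in MEM and add is in WB:
    print(forward_select(1, StageDest(True, 4), StageDest(True, 1)))
    print(forward_select(7, StageDest(True, 4), StageDest(True, 1)))
```

The function is called once for Rs and once for Rt, matching the two multiplexors at the ALU inputs.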
18
Forwarding: What about Loads?
lw $1,0($2)
sub $4,$1,$3
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB); the value loaded by lw is available only after its MEM stage, which is later than the EX stage of the sub that needs it.]
°Dependencies backward in time are hazards
°Can’t solve with forwarding alone
°Must stall instruction dependent on load
°“Load-Use” hazard
19
Data Hazard Even with Forwarding
lw $1,0($2)
sub $4,$1,$6
and $6,$1,$7
or $8,$1,$9
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) versus time (clock cycles); a bubble is inserted so that sub waits one cycle for the loaded value, and the following instructions are delayed with it.]
°Must stall pipeline 1 cycle (insert 1 bubble)
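The slides do not spell out how the hardware decides to insert the bubble. One common formulation, given here as an assumption rather than as the lecture's method, checks whether the instruction in EX is a load whose destination (its Rt field) matches a source register of the instruction in ID. A minimal Python sketch with hypothetical parameter names:

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True if the instruction in ID must wait one cycle.

    id_ex_mem_read: True if the instruction in EX is a load.
    id_ex_rt:       register the load will write (Rt field of lw).
    if_id_rs/rt:    source registers of the instruction in ID.
    """
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


if __name__ == "__main__":
    # lw $1,0($2) in EX, sub $4,$1,$6 in ID: sub reads $1, so stall.
    print(must_stall(True, 1, 1, 6))
    # One cycle later sub is in EX (not a load) and "and $6,$1,$7" is
    # in ID: no further stall, forwarding supplies $1.
    print(must_stall(False, 4, 1, 7))
```

Only the instruction immediately after the load pays the one-cycle penalty; later users of $1 are covered by forwarding.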
20
Compiler Schemes to Improve Load Delay
°The compiler detects the data dependency and inserts nop instructions until the data is available:
sub $2, $1, $3
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
°The compiler finds independent instructions to fill the delay slots
21
Software Scheduling to Avoid Load Hazards
Try producing fast code for
a = b + c;
d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd

Fast code:
LW Rb,b
LW Rc,c
LW Re,e
ADD Ra,Rb,Rc
LW Rf,f
SW a,Ra
SUB Rd,Re,Rf
SW d,Rd
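The benefit of the reordering can be checked by counting load-use stalls in each sequence. The Python sketch below is a simplified model, assuming full forwarding and a one-cycle load delay, so a stall occurs only when an instruction reads the register loaded by the instruction immediately before it. The tuple encoding `(op, dest, sources)` and the function name are hypothetical.

```python
def count_load_use_stalls(program):
    """Count one-cycle stalls caused by load-use dependences.

    program: list of (op, dest, sources) tuples. Assumes full
    forwarding, so only a load followed directly by a user of the
    loaded register stalls.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += 1
    return stalls


SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

if __name__ == "__main__":
    print("slow code stalls:", count_load_use_stalls(SLOW))
    print("fast code stalls:", count_load_use_stalls(FAST))
```

Under this model the slow code stalls twice (ADD after LW Rc, SUB after LW Rf), while the fast code separates every load from its first use and does not stall.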