1
CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control March 17, 2003 John Kubiatowicz (www.cs.berkeley.edu/~kubitron) lecture slides: http://inst.eecs.berkeley.edu/~cs152/
2
Recap: Sequential Laundry
°Sequential laundry takes 6 hours for 4 loads
°If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30, 40, and 20 minutes back to back]
3
Recap: Pipelining Lessons
°Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
°Pipeline rate is limited by the slowest pipeline stage
°Multiple tasks operate simultaneously using different resources
°Potential speedup = number of pipe stages
°Unbalanced lengths of pipe stages reduce speedup
°Time to “fill” the pipeline and time to “drain” it reduce speedup
°Stall for dependences
[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D in task order; stage times 30, 40, and 20 minutes]
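The laundry question from the recap can be worked out with a small scheduling sketch. It assumes the three stage times from the figure (30, 40, 20 minutes) and that a finished load can wait in a basket between stages; the function names are illustrative, not part of the lecture.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs all stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each stage handles one load at a time; a load enters a stage as soon
    as both the load and the stage are free."""
    stage_free_at = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load may enter its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            stage_free_at[s] = start + minutes
            ready = stage_free_at[s]
    return stage_free_at[-1]


STAGES = [30, 40, 20]  # wash, dry, fold
seq = sequential_minutes(STAGES, 4)
pipe = pipelined_minutes(STAGES, 4)
print(f"sequential: {seq} min, pipelined: {pipe} min, speedup: {seq / pipe:.2f}")
```

The 40-minute stage sets the rate once the pipeline is full, which is why the speedup stays well below the stage count of 3.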
4
Recap: Ideal Pipelining
[Figure: five instructions, each passing through IF, DCD, EX, MEM, WB, each starting one cycle after the previous one]
°Maximum Speedup ≤ Number of stages
°Speedup ≤ Time for unpipelined operation / Time for longest stage
°Example: 40 ns data path, 5 stages, longest stage is 10 ns: Speedup ≤ 40 ns / 10 ns = 4
°Assume instructions are completely independent!
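A short sketch of how the speedup approaches that bound as more independent instructions flow through the pipe. It uses the slide's numbers (40 ns unpipelined, 10 ns longest stage, 5 stages) and assumes the clock period equals the longest stage; the function name is made up for illustration.

```python
def pipelined_speedup(unpipelined_ns, stage_ns, stages, instructions):
    """Speedup for a run of independent instructions, including fill time."""
    unpipelined_total = instructions * unpipelined_ns
    # The first instruction takes `stages` cycles; each later one adds a cycle.
    pipelined_total = (stages + instructions - 1) * stage_ns
    return unpipelined_total / pipelined_total


bound = 40 / 10  # time for unpipelined operation / time for longest stage
for n in (1, 10, 1000):
    print(n, round(pipelined_speedup(40, 10, 5, n), 3))
```

With a single instruction the pipelined machine is actually slower (five 10 ns cycles instead of 40 ns), which is the "doesn't help latency" lesson in numbers.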
5
The Big Picture: Where are We Now?
°The Five Classic Components of a Computer: Control and Datapath (together the Processor), Memory, Input, Output
°Today’s Topics:
  -Recap last lecture/finish datapath
  -Pipelined Control/Do it yourself Pipelined Control
  -Administrivia
  -Hazards/Forwarding
  -Exceptions
  -Review MIPS R3000 pipeline
6
Recap: Can pipelining get us into trouble?
°Yes: Pipeline Hazards
  -structural hazards: attempt to use the same resource two different ways at the same time. E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
  -data hazards: attempt to use an item before it is ready. E.g., one sock of a pair is in the dryer and one in the washer; can’t fold until the sock from the washer gets through the dryer. An instruction depends on the result of a prior instruction still in the pipeline
  -control hazards: attempt to make a decision before the condition is evaluated. E.g., washing football uniforms and need to get the proper detergent level; need to see the result after the dryer before putting the next load in. Branch instructions
°Can always resolve hazards by waiting
  -pipeline control must detect the hazard
  -take action (or delay action) to resolve hazards
7
Single Memory is a Structural Hazard
[Figure: pipeline diagram, instruction order vs. time (clock cycles), for Load, Instr 1, Instr 2, Instr 3, Instr 4, each using Mem, Reg, ALU, Mem, Reg; right half highlighted means read, left half means write]
°Detection is easy in this case!
8
Structural Hazards limit performance
°Example: if there are 1.3 memory accesses per instruction and only one memory access per cycle, then
  -average CPI ≥ 1.3
  -otherwise the resource is more than 100% utilized
9
Control Hazard Solution #1: Stall
°Stall: wait until decision is clear
°Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow
°Move decision to end of decode: saves 1 cycle per branch
[Figure: pipeline diagram, instruction order vs. time (clock cycles), for Add, Beq, Load; the idle cycles after Beq are marked “Lost potential”]
10
Control Hazard Solution #2: Predict
°Predict: guess one direction, then back up if wrong
°Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right about 50% of time)
  -Need to “squash” and restart the following instruction if wrong
  -Produces CPI on branch of (1 * .5 + 2 * .5) = 1.5
  -Total CPI might then be: 1.5 * .2 + 1 * .8 = 1.1 (20% branch)
°More dynamic scheme: history of 1 branch (about 90%)
[Figure: pipeline diagram, instruction order vs. time (clock cycles), for Add, Beq, Load]
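The CPI arithmetic on this slide is easy to replay for other prediction accuracies. This is a minimal sketch using the slide's numbers (1-cycle misprediction penalty, 20% branches, all other instructions at CPI 1); the function names are illustrative.

```python
def branch_cpi(p_correct, penalty_cycles=1):
    """Average cycles per branch: 1 if predicted right, 1 + penalty if wrong."""
    return p_correct * 1 + (1 - p_correct) * (1 + penalty_cycles)


def total_cpi(cpi_branch, branch_fraction=0.2, cpi_other=1.0):
    """Weighted average over branch and non-branch instructions."""
    return cpi_branch * branch_fraction + cpi_other * (1 - branch_fraction)


static_guess = total_cpi(branch_cpi(0.5))   # slide's example: right 50% of the time
one_bit_history = total_cpi(branch_cpi(0.9))  # history of 1 branch, about 90%
always_stall = total_cpi(3.0)                # stall: 3 clock cycles per branch
print(static_guess, one_bit_history, always_stall)
```

Under these assumptions the stall scheme costs a CPI of 1.4, the 50% guess 1.1, and the 90% predictor about 1.02.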
11
Control Hazard Solution #3: Delayed Branch
°Delayed Branch: redefine branch behavior (the branch takes place after the next instruction)
°Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (about 50% of time)
°Less useful as more instructions are launched per clock cycle
[Figure: pipeline diagram, instruction order vs. time (clock cycles), for Add, Beq, Misc, Load]
12
Data Hazard on r1: Read after write hazard (RAW)
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
13
Data Hazard on r1: Read after write hazard (RAW)
°Dependencies backwards in time are hazards
[Figure: pipeline diagram, instruction order vs. time (clock cycles), stages IF, ID/RF, EX, MEM, WB, for add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11]
14
Data Hazard Solution: Forwarding
°“Forward” result from one stage to another
°“or” is OK if register read/write is defined properly
[Figure: pipeline diagram, instruction order vs. time (clock cycles), stages IF, ID/RF, EX, MEM, WB, for add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11]
15
Forwarding (or Bypassing): What about Loads?
°Dependencies backwards in time are hazards
°Can’t solve with forwarding: must delay/stall the instruction dependent on the load
[Figure: pipeline diagram over time (clock cycles), stages IF, ID/RF, EX, MEM, WB, for lw r1,0(r2) followed by sub r4,r1,r3]
16
Forwarding (or Bypassing): What about Loads?
°Dependencies backwards in time are hazards
°Can’t solve with forwarding: must delay/stall the instruction dependent on the load
[Figure: pipeline diagram over time (clock cycles), stages IF, ID/RF, EX, MEM, WB, for lw r1,0(r2) followed by sub r4,r1,r3, with a Stall marked]
17
Recap: Data Hazards
[Figure: pipeline timing sketches for a Structural Hazard (I-Fetch, DCD, MemOpFetch, OpFetch, Exec, Store against IFetch, DCD ...), a Control Hazard (I-Fetch, DCD, OpFetch, Jump against IFetch, DCD ...), and the three data hazards below, drawn with stage sequences such as IF DCD EX Mem WB and IF DCD OF Ex Mem]
°RAW (read after write) Data Hazard
°WAW (write after write) Data Hazard
°WAR (write after read) Data Hazard
18
Designing a Pipelined Processor
°Go back and examine your datapath and control diagram
°Associate resources with states
°Ensure that flows do not conflict, or figure out how to resolve conflicts
°Assert control in appropriate stage
19
Control and Datapath: Split state diag into 5 pieces
  IR <- Mem[PC]; PC <- PC+4;
  A <- R[rs]; B <- R[rt]
  S <- A + B;    R[rd] <- S;
  S <- A + SX;   M <- Mem[S];  R[rd] <- M;
  S <- A or ZX;  R[rt] <- S;
  S <- A + SX;   Mem[S] <- B
  If Cond PC <- PC+SX;
[Figure: datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, Equal, Exec, S, Mem Access, Data Mem, D, M, Reg. File]
20
Pipelined Processor (almost) for slides
°What happens if we start a new instruction every cycle?
[Figure: datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, Equal, Exec, S, Mem Access, Data Mem, D, M, Reg. File; the instruction register is copied down the pipe as IRex, IRmem, IRwb with a Valid bit, feeding Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl]
21
CS152 / Kubiatowicz Lec13.21 3/17/03©UCB Spring 2003 Pipelined Datapath (as in book); hard to read
22
Administrivia
°Midterm Results: Avg = 70, StdDev = 9.5, Max = 88, Low = 50
[Figure: Midterm 1 score distribution; score axis from 50 to 90 in steps of 5, count axis from 0 to 5]
°Lab 4: How are you doing?
  -Control: Behavioral Verilog. Probably using CASE statement
  -Data: Schematics on top of your components
  -High-level simulation!
23
Pipelining the Load Instruction
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
  1st lw  Ifetch   Reg/Dec  Exec     Mem      Wr
  2nd lw           Ifetch   Reg/Dec  Exec     Mem      Wr
  3rd lw                    Ifetch   Reg/Dec  Exec     Mem      Wr
°The five independent functional units in the pipeline datapath are:
  -Instruction Memory for the Ifetch stage
  -Register File’s Read ports (bus A and bus B) for the Reg/Dec stage
  -ALU for the Exec stage
  -Data Memory for the Mem stage
  -Register File’s Write port (bus W) for the Wr stage
24
The Four Stages of R-type
          Cycle 1  Cycle 2  Cycle 3  Cycle 4
  R-type  Ifetch   Reg/Dec  Exec     Wr
°Ifetch: Instruction Fetch
  -Fetch the instruction from the Instruction Memory
°Reg/Dec: Registers Fetch and Instruction Decode
°Exec:
  -ALU operates on the two register operands
  -Update PC
°Wr: Write the ALU output back to the register file
25
Pipelining the R-type and Load Instruction
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
  R-type  Ifetch   Reg/Dec  Exec     Wr
  R-type           Ifetch   Reg/Dec  Exec     Wr
  Load                      Ifetch   Reg/Dec  Exec     Mem      Wr
  R-type                             Ifetch   Reg/Dec  Exec     Wr
  R-type                                      Ifetch   Reg/Dec  Exec     Wr
Oops! We have a problem!
°We have a pipeline conflict or structural hazard:
  -Two instructions try to write to the register file at the same time!
  -Only one write port
26
Important Observation
°Each functional unit can only be used once per instruction
°Each functional unit must be used at the same stage for all instructions:
  -Load uses Register File’s Write Port during its 5th stage
  -R-type uses Register File’s Write Port during its 4th stage
          1        2        3        4        5
  Load    Ifetch   Reg/Dec  Exec     Mem      Wr
  R-type  Ifetch   Reg/Dec  Exec     Wr
°2 ways to solve this pipeline hazard.
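The write-port collision can be checked mechanically. The sketch below assumes one instruction issues per cycle starting at cycle 1 and that each instruction kind writes the register file in a fixed stage; the table and function names are illustrative.

```python
# Stage number in which each kind of instruction uses the register write port.
WRITE_STAGE = {"R-type": 4, "Load": 5}


def write_port_conflicts(program, write_stage=WRITE_STAGE):
    """Return {cycle: [issue cycles]} for cycles with more than one writer."""
    writers = {}
    for issue_cycle, kind in enumerate(program, start=1):
        write_cycle = issue_cycle + write_stage[kind] - 1
        writers.setdefault(write_cycle, []).append(issue_cycle)
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}


program = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_port_conflicts(program))

# Making R-type write in stage 5 as well removes the collision.
uniform = {"R-type": 5, "Load": 5}
print(write_port_conflicts(program, uniform))
```

In this sequence the Load issued in cycle 3 and the R-type issued in cycle 4 both reach the write port in cycle 7; once every instruction writes in stage 5, no two share a cycle.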
27
Solution 1: Insert “Bubble” into the Pipeline
°Insert a “bubble” into the pipeline to prevent 2 writes in the same cycle
  -The control logic can be complex.
  -Lose instruction fetch and issue opportunity.
°No instruction is started in Cycle 6!
[Figure: pipeline diagram over Cycles 1 to 9 for R-type, R-type, Load, R-type, R-type and one more instruction, with a Pipeline Bubble inserted so that the Wr stages never overlap]
28
Solution 2: Delay R-type’s Write by One Cycle
°Delay R-type’s register write by one cycle:
  -Now R-type instructions also use Reg File’s write port at Stage 5
  -Mem stage is a NOOP stage: nothing is being done.
          1        2        3        4        5
  R-type  Ifetch   Reg/Dec  Exec     Mem      Wr
[Figure: pipeline diagram over Cycles 1 to 9 for R-type, R-type, Load, R-type, R-type, every instruction now using all five stages]
29
Modified Control & Datapath
  IR <- Mem[PC]; PC <- PC+4;
  A <- R[rs]; B <- R[rt]
  S <- A + B;    M <- S;        R[rd] <- M;
  S <- A + SX;   M <- Mem[S];   R[rd] <- M;
  S <- A or ZX;  M <- S;        R[rt] <- M;
  S <- A + SX;   Mem[S] <- B
  If Cond PC <- PC+SX;
[Figure: datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, Equal, Exec, S, Mem Access, Data Mem, D, M, Reg. File]
30
The Four Stages of Store
          Cycle 1  Cycle 2  Cycle 3  Cycle 4
  Store   Ifetch   Reg/Dec  Exec     Mem      (Wr unused)
°Ifetch: Instruction Fetch
  -Fetch the instruction from the Instruction Memory
°Reg/Dec: Registers Fetch and Instruction Decode
°Exec: Calculate the memory address
°Mem: Write the data into the Data Memory
31
The Three Stages of Beq
          Cycle 1  Cycle 2  Cycle 3  Cycle 4
  Beq     Ifetch   Reg/Dec  Exec     (Mem and Wr unused)
°Ifetch: Instruction Fetch
  -Fetch the instruction from the Instruction Memory
°Reg/Dec: Registers Fetch and Instruction Decode
°Exec:
  -compares the two register operands,
  -selects the correct branch target address
  -latches it into the PC
32
Control Diagram
  IR <- Mem[PC]; PC <- PC+4;
  A <- R[rs]; B <- R[rt]
  S <- A + B;    R[rd] <- S;
  S <- A + SX;   M <- Mem[S];  R[rd] <- M;
  S <- A or ZX;  R[rt] <- S;
  S <- A + SX;   Mem[S] <- B
  If Cond PC <- PC+SX;
[Figure: datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, Equal, Exec, S, Mem Access, Data Mem, D, M (with M <- S), Reg. File]
33
Recall: Single cycle control!
[Figure: single-cycle datapath with PC, Next Address logic, Ideal Instruction Memory (Instruction Address), 32 32-bit Registers (Rw, Ra, Rb; 5-bit Rd, Rs, Rt; 32-bit A and B), ALU, and Ideal Data Memory (Data In, Data Address, Data Out), all clocked by Clk; Control sends Control Signals to the Datapath and receives Conditions]
34
Data Stationary Control
°The Main Control generates the control signals during Reg/Dec
  -Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  -Control signals for Mem (MemWr, Branch) are used 2 cycles later
  -Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in Reg/Dec produces ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, RegWr. The ID/Ex register carries all of them into Exec, the Ex/Mem register carries Branch, MemWr, MemtoReg, RegWr into Mem, and the Mem/Wr register carries MemtoReg, RegWr into Wr]
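Data-stationary control can be sketched as a control word that is decoded once and then travels with its instruction, shedding the signals each stage has consumed. The control values for lw and sw below are illustrative assumptions in the usual single-cycle MIPS style, not taken from the slide, and the register names id_ex, ex_mem, mem_wr are just labels for this sketch.

```python
EX_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WR_SIGNALS = ("MemtoReg", "RegWr")

# Hypothetical main-control table (assumed values, for illustration only).
CONTROL_TABLE = {
    "lw": dict(ExtOp=1, ALUSrc=1, ALUOp="add", RegDst=0,
               MemWr=0, Branch=0, MemtoReg=1, RegWr=1),
    "sw": dict(ExtOp=1, ALUSrc=1, ALUOp="add", RegDst=0,
               MemWr=1, Branch=0, MemtoReg=0, RegWr=0),
}


def keep(word, names):
    """Copy only the named signals; None models an empty pipeline slot."""
    return None if word is None else {n: word[n] for n in names}


def run(ops):
    """ops[k] is decoded in cycle k+1. Returns the signals each stage uses per cycle."""
    id_ex = ex_mem = mem_wr = None
    trace = {}
    for cycle, op in enumerate(list(ops) + [None] * 3, start=1):
        trace[cycle] = {
            "Exec": keep(id_ex, EX_SIGNALS),
            "Mem": keep(ex_mem, MEM_SIGNALS),
            "Wr": keep(mem_wr, WR_SIGNALS),
        }
        # Clock edge: every control word moves one pipeline register forward.
        mem_wr = keep(ex_mem, WR_SIGNALS)
        ex_mem = keep(id_ex, MEM_SIGNALS + WR_SIGNALS)
        id_ex = dict(CONTROL_TABLE[op]) if op is not None else None
    return trace


trace = run(["lw", "sw"])
for cycle, used in trace.items():
    print(cycle, used)
```

With lw decoded in cycle 1 and sw in cycle 2, the sw MemWr signal is asserted in cycle 4 (two cycles after its decode) and the lw RegWr signal in cycle 4 (three cycles after its decode).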
35
Datapath + Data Stationary Control
[Figure: pipelined datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, Data Mem, D, M, Reg. File) with Decode, Mem Ctrl, and WB Ctrl. Decode takes op, rs, rt, fun, im from the IR; the control fields ex, me, wb, rw, v travel down the pipe, dropping ex and then me as each stage is passed]
36
Let’s Try it Out
   10  lw   r1, r2(35)
   14  addI r2, r2, 3
   20  sub  r3, r4, r5
   24  beq  r6, r7, 100
   30  ori  r8, r9, 17
   34  add  r10, r11, r12
  100  and  r13, r14, 15
These addresses are octal.
37
Start: Fetch 10
[Figure: pipelined datapath with data-stationary control running the example program (addresses 10 to 100, octal). IF: PC = 10; the remaining stages are marked n]
38
Fetch 14, Decode 10
[Figure: pipelined datapath running the example program. IF: PC = 14; ID: lw r1, r2(35), with rs = 2; the remaining stages are marked n]
39
Fetch 20, Decode 14, Exec 10
[Figure: pipelined datapath running the example program. IF: PC = 20; ID: addI r2, r2, 3; EX: lw r1, with A = r2 and immediate 35; the remaining stages are marked n]
40
Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: pipelined datapath running the example program. IF: PC = 24; ID: sub r3, r4, r5; EX: addI r2, r2, 3, with A = r2 and immediate 3; M: lw r1, with S = r2+35; the remaining stage is marked n]
41
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: pipelined datapath running the example program. IF: PC = 30; ID: beq r6, r7, 100; EX: sub r3, with A = r4 and B = r5; M: addI r2, with S = r2+3; WB: lw r1, with M[r2+35]]
°Note Delayed Branch: always execute ori after beq
42
Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: pipelined datapath running the example program. IF: PC = 100; ID: ori r8, r9, 17; EX: beq, with A = r6, B = r7, target 100; M: sub r3, with S = r4-r5; WB: addI r2, with r2+3; r1 = M[r2+35] has been written]
43
Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20
[Figure: blank pipelined datapath for the example program, with PC = ___ and the ID, EX, M, WB contents left empty]
°Fill it in yourself!
44
Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24
[Figure: blank pipelined datapath for the example program, with PC = ___ and the EX, M, WB contents left empty]
°Fill it in yourself!
45
Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30
[Figure: blank pipelined datapath for the example program, with PC = ___ and the M, WB contents left empty]
°Fill it in yourself!
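The stage occupancy in the titles of this walk-through can be reproduced with a short sketch. It assumes the beq at octal 24 is taken (the slides fetch 100 next), that there is exactly one delay slot, and that nothing stalls; the dictionary and function names are illustrative.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")
BRANCH_TARGET = {0o24: 0o100}  # beq r6, r7, 100 (addresses are octal), assumed taken


def fetch_stream(start, count):
    """Sequence of fetched addresses with one branch delay slot."""
    stream, pc, pending = [], start, None
    for _ in range(count):
        stream.append(pc)
        next_pc = pc + 4
        if pending is not None:      # the delay slot was just fetched: now redirect
            next_pc, pending = pending, None
        if pc in BRANCH_TARGET:      # a branch: its target applies after the delay slot
            pending = BRANCH_TARGET[pc]
        pc = next_pc
    return stream


def occupancy(stream):
    """For each cycle, the address held in every pipeline stage."""
    rows = []
    for cycle in range(len(stream)):
        rows.append({stage: stream[cycle - depth]
                     for depth, stage in enumerate(STAGES) if cycle - depth >= 0})
    return rows


stream = fetch_stream(0o10, 9)
rows = occupancy(stream)
for row in rows:
    print(", ".join(f"{stage} {addr:o}" for stage, addr in row.items()))
```

The ori at 30 is fetched and executed even though the branch is taken, and the add at 34 never enters the pipe, which matches the delayed-branch note above.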
46
Pipelined Processor
°Separate control at each stage
°Stalls propagate backwards to freeze previous stages
°Bubbles in pipeline introduced by placing “Noops” into local stage, stall previous stages.
[Figure: pipelined datapath with IRex, IRmem, IRwb, Valid bits, and Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl; arrows mark the direction of Stalls and Bubbles]
47
Recap: Data Hazards
°Avoid some “by design”
  -eliminate WAR by always fetching operands early (DCD) in pipe
  -eliminate WAW by doing all WBs in order (last stage, static)
°Detect and resolve remaining ones
  -stall or forward (if possible)
[Figure: pipeline timing sketches of the RAW, WAW, and WAR data hazards, drawn with stage sequences such as IF DCD EX Mem WB and IF DCD OF Ex Mem]
48
Hazard Detection
°Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
°A RAW hazard exists on register r if r ∈ Rregs( i ) ∩ Wregs( j )
  -Keep a record of pending writes (for inst's in the pipe) and compare with operand regs of current instruction.
  -When instruction issues, reserve its result register.
  -When an operation completes, remove its write reservation.
°A WAW hazard exists on register r if r ∈ Wregs( i ) ∩ Wregs( j )
°A WAR hazard exists on register r if r ∈ Wregs( i ) ∩ Rregs( j )
[Figure: instruction movement through the pipe: New Inst, Inst I, Inst J. Window on execution: only pending instructions can cause exceptions]
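The three set conditions translate directly into set intersections. In this sketch each instruction is a hypothetical record with the registers it reads and writes; i is the instruction about to issue and j is a predecessor still in the pipe, as on the slide.

```python
def classify_hazards(i, j):
    """Registers involved in each hazard between issuing i and pending j."""
    return {
        "RAW": i["reads"] & j["writes"],   # i reads what j has yet to write
        "WAW": i["writes"] & j["writes"],  # both write the same register
        "WAR": i["writes"] & j["reads"],   # i overwrites what j has yet to read
    }


# j: add r1,r2,r3   i: sub r4,r1,r3   (the RAW example used earlier in the lecture)
add = {"reads": {"r2", "r3"}, "writes": {"r1"}}
sub = {"reads": {"r1", "r3"}, "writes": {"r4"}}
hazards = classify_hazards(i=sub, j=add)
print(hazards)
```

Only the RAW set is non-empty here, because sub reads r1 before add has written it.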
49
Record of Pending Writes In Pipeline Registers
°Current operand registers
°Pending writes
°hazard <= ((rs == rw_ex) & regW_ex) OR ((rs == rw_mem) & regW_me) OR ((rs == rw_wb) & regW_wb) OR
           ((rt == rw_ex) & regW_ex) OR ((rt == rw_mem) & regW_me) OR ((rt == rw_wb) & regW_wb)
[Figure: pipeline with npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs, IAU, PC; decode supplies op, rw, rs, rt, and each later pipeline register carries op, rw, n]
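The hazard equation reads more easily as a loop over the three pending-write records. This is a sketch that mirrors the slide's expression; representing each stage's record as a (rw, regW) pair is an assumption made for the example, and like the slide's equation it does not special-case register 0.

```python
def raw_hazard(rs, rt, ex, mem, wb):
    """True if rs or rt matches a pending write in the EX, MEM, or WB stage.

    ex, mem, wb are (rw, regW) pairs: destination register and write enable.
    """
    pending = (ex, mem, wb)
    return any(regw and src == rw
               for src in (rs, rt)
               for rw, regw in pending)


# sub r4,r1,r3 in decode while add r1,r2,r3 is in EX: rs = 1 matches rw_ex = 1.
print(raw_hazard(rs=1, rt=3, ex=(1, True), mem=(0, False), wb=(0, False)))
```

A matching register number with the write enable off (for example a store or branch occupying that stage) does not raise the hazard.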
50
Resolve RAW by “forwarding” (or bypassing)
°Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe
  -Increase muxes to add paths from pipeline registers
  -Data Forwarding = Data Bypassing
[Figure: the pending-writes pipeline (npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs, IAU, PC; op, rw, n in each pipeline register) with a Forward mux added in front of the operand latches]
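"Nearest valid write" amounts to a priority search from the youngest pending result to the oldest. The sketch below returns which source the forward mux should select for one operand; the stage labels and the tuple layout are assumptions for illustration.

```python
def forward_select(src, pending):
    """Pick the forwarding source for operand register `src`.

    `pending` lists (stage, rw, regW) from the nearest stage to the farthest.
    Falls back to the register file when no pending write matches.
    """
    for stage, rw, regw in pending:
        if regw and rw == src:
            return stage
    return "regfile"


# Two older instructions both write r1: the nearer one (EX) holds the newest value.
pending = [("ex", 1, True), ("mem", 7, True), ("wb", 1, True)]
print(forward_select(1, pending), forward_select(7, pending), forward_select(9, pending))
```

Searching nearest-first matters because the value from the farther stage is stale when two pending instructions write the same register.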
51
What about memory operations?
°If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
°What does delaying WB on arithmetic operations cost?
  -cycles?
  -hardware?
°What about data dependence on loads?
    R1 <- R4 + R5
    R2 <- Mem[ R2 + I ]
    R3 <- R2 + R1
  -“Delayed Loads”
°Can recognize this in decode stage and introduce bubble while stalling fetch stage (hint for lab 5!)
°Tricky situation:
    R1 <- Mem[ R2 + I ]
    Mem[R3+34] <- R1
  -Handle with bypass in memory stage!
[Figure: datapath slice with A, B, op Rd Ra Rb, Rd, R, D Mem, T, and Rd to reg file]
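The decode-stage check for a load-use dependence can be sketched as a pass that inserts a bubble after a load whenever the next instruction needs the loaded register in its Exec stage. The instruction records are hypothetical; store data is kept separate from the address operands because, as the slide notes, it can be bypassed in the memory stage.

```python
BUBBLE = {"op": "bubble", "dest": None, "ex_srcs": (), "store_data": None}


def insert_load_bubbles(program):
    """Insert one bubble between a load and an immediate Exec-stage consumer."""
    out, prev = [], None
    for instr in program:
        if prev is not None and prev["op"] == "lw" and prev["dest"] in instr["ex_srcs"]:
            out.append(dict(BUBBLE))
        out.append(instr)
        prev = instr
    return out


# R1 <- R4 + R5 ; R2 <- Mem[R2 + I] ; R3 <- R2 + R1
delayed_load = [
    {"op": "add", "dest": "R1", "ex_srcs": ("R4", "R5"), "store_data": None},
    {"op": "lw", "dest": "R2", "ex_srcs": ("R2",), "store_data": None},
    {"op": "add", "dest": "R3", "ex_srcs": ("R2", "R1"), "store_data": None},
]
# R1 <- Mem[R2 + I] ; Mem[R3 + 34] <- R1  (R1 is only needed in the Mem stage)
load_then_store = [
    {"op": "lw", "dest": "R1", "ex_srcs": ("R2",), "store_data": None},
    {"op": "sw", "dest": None, "ex_srcs": ("R3",), "store_data": "R1"},
]
print([i["op"] for i in insert_load_bubbles(delayed_load)])
print([i["op"] for i in insert_load_bubbles(load_then_store)])
```

The first sequence gets a bubble before the dependent add; the load-then-store pair does not, on the assumption that the memory-stage bypass supplies the store data.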
52
CS152 / Kubiatowicz Lec13.52 3/17/03©UCB Spring 2003 Compiler Avoiding Load Stalls:
53
What about Interrupts, Traps, Faults?
°External Interrupts:
  -Allow pipeline to drain, fill with NOPs
  -Load PC with interrupt address
°Faults (within instruction, restartable)
  -Force trap instruction into IF
  -disable writes till trap hits WB
  -must save multiple PCs or PC + state
°Recall: Precise Exceptions: state of the machine is preserved as if program executed up to the offending instruction
  -All previous instructions completed
  -Offending instruction and all following instructions act as if they have not even started
  -Same system code will work on different implementations
54
Exception/Interrupts: Implementation questions
5 instructions, executing in 5 different pipeline stages!
°Who caused the interrupt?
  Stage  Problem interrupts occurring
  IF     Page fault on instruction fetch; misaligned memory access; memory-protection violation
  ID     Undefined or illegal opcode
  EX     Arithmetic exception
  MEM    Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
°How do we stop the pipeline? How do we restart it?
°Do we interrupt immediately or wait?
°How do we sort all of this out to maintain preciseness?
55
Exception Handling
[Figure: pipeline (npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs, IAU, PC; op, rw, n) executing lw $2,20($5), with an Excp field carried alongside the instruction. Successive stages detect bad instruction address, detect bad instruction, detect overflow, detect bad data address, and finally allow the exception to take effect]
56
Another look at the exception problem
[Figure: program flow vs. time; four overlapped instructions (IFetch, Dcd, Exec, Mem, WB) with a Data TLB fault, a Bad Inst, an Inst TLB fault, and an Overflow raised in different stages]
°Use pipeline to sort this out!
  -Pass exception status along with instruction.
  -Keep track of PCs for every instruction in pipeline.
  -Don’t act on exception until it reaches WB stage
°Handle interrupts through “faulting noop” in IF stage
°When instruction reaches end of MEM stage:
  -Save PC -> EPC, Interrupt vector addr -> PC
  -Turn all instructions in earlier stages into noops!
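Why carrying the exception status down the pipe gives precise ordering can be seen in a few lines. The sketch assumes one instruction enters IF per cycle with no stalls, a fault is detected in the stage where it occurs, and it is acted on only when its instruction reaches the end of MEM; the two-fault example is illustrative.

```python
STAGE_INDEX = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3}


def first_detected(faults):
    """Instruction whose fault is detected earliest in time.

    faults maps program-order index -> stage in which that instruction faults.
    """
    return min(faults, key=lambda i: (i + STAGE_INDEX[faults[i]], i))


def first_handled(faults):
    """Instruction whose fault is acted on first (at the end of its MEM stage).

    Younger instructions are turned into noops at that point, so this is the
    only exception actually taken.
    """
    return min(faults, key=lambda i: i + STAGE_INDEX["MEM"])


# Instruction 0 takes a data TLB fault in MEM; instruction 1 is a bad instruction in ID.
faults = {0: "MEM", 1: "ID"}
print("detected first:", first_detected(faults), "handled first:", first_handled(faults))
```

Instruction 1's fault is seen a cycle before instruction 0's, yet instruction 0 is the one handled, so the machine state looks as if execution stopped in program order.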
57
Resolution: Freeze above & Bubble Below
°Flush accomplished by setting “invalid” bit in pipeline
[Figure: the pending-writes pipeline (npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs, IAU, PC; op, rw, n in each pipeline register), with the earlier stages marked freeze and a bubble injected into the later stage]
58
FYI: MIPS R3000 clocking discipline
°2-phase non-overlapping clocks
°Pipeline stage is two (level sensitive) latches
[Figure: phi1 and phi2 clock waveforms driving a pair of latches, compared with an edge-triggered register]
59
MIPS R3000 Instruction Pipeline
  Stage:  Inst Fetch     Decode / Reg. Read   ALU / E.A.             Memory    Write Reg
  Work:   TLB, I-Cache   RF                   Operation / E.A., TLB  D-Cache   WB
°Resource Usage: TLB, I-cache, RF, ALU, TLB, D-Cache, WB
°Write in phase 1, read in phase 2 => eliminates bypass from WB
60
Recall: Data Hazard on r1
[Figure: pipeline diagram, instruction order vs. time (clock cycles), stages IF, ID/RF, EX, MEM, WB, for add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11]
°With MIPS R3000 pipeline, no need to forward from WB stage
61
MIPS R3000 Multicycle Operations
°Ex: Multiply, Divide, Cache Miss
°Use control word of local stage to step through multicycle operation
°Stall all stages above multicycle operation in the pipeline
°Drain (bubble) stages below it
°Alternatively, launch multiply/divide to autonomous unit, only stall pipe if attempt to get result before ready
  -This means stall mflo/mfhi in decode stage if multiply/divide still executing
  -Extra credit in Lab 5 does this
[Figure: datapath slice with A, B, op Rd Ra Rb, mul Rd Ra Rb, R, T, Rd, and Rd to reg file]
62
Is CPI = 1 for our pipeline?
°Remember that CPI is an “average # cycles/inst”
°CPI here is 1, since the average throughput is 1 instruction every cycle.
°What if there are stalls or multi-cycle execution?
°Usually CPI > 1. How close can we get to 1?
[Figure: four overlapped instructions, each passing through IFetch, Dcd, Exec, Mem, WB, one starting per cycle]
63
Summary
°What makes it easy
  -all instructions are the same length
  -just a few instruction formats
  -memory operands appear only in loads and stores
°Hazards limit performance
  -Structural: need more HW resources
  -Data: need forwarding, compiler scheduling
  -Control: early evaluation & PC, delayed branch, prediction
°Data hazards must be handled carefully:
  -RAW data hazards handled by forwarding
  -WAW and WAR hazards don’t exist in 5-stage pipeline
°MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
°Exceptions in 5-stage pipeline recorded when they occur, but acted on only at WB (end of MEM) stage
  -Must flush all previous instructions