Published by Cruz Musgrove. Modified over 9 years ago.
1
CS152 Computer Architecture and Engineering
Lecture 12: Introduction to Pipelining: Datapath and Control
March 8th, 2004
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/
2
CS152 / Kubiatowicz Lec12.2 3/8/04 ©UCB Spring 2004
The Big Picture: Where are We Now?
°The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
°Today’s Topics:
  Recap last lecture/finish datapath
  Pipelined Control/Do it yourself Pipelined Control
  Administrivia
  Hazards/Forwarding
  Exceptions
  Review MIPS R3000 pipeline
3
Can pipelining get us into trouble?
°Yes: Pipeline Hazards
  structural hazards: attempt to use the same resource two different ways at the same time
    -E.g., combined washer/dryer would be a structural hazard, or folder busy doing something else (watching TV)
  data hazards: attempt to use item before it is ready
    -E.g., one sock of pair in dryer and one in washer; can’t fold until the sock gets from washer through dryer
    -instruction depends on result of prior instruction still in the pipeline
  control hazards: attempt to make a decision before condition is evaluated
    -E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
    -branch instructions
°Can always resolve hazards by waiting
  pipeline control must detect the hazard
  take action (or delay action) to resolve hazards
4
Recap: Data Hazards
[Figure: pipeline timing diagrams illustrating a Structural Hazard, a Control Hazard (jump), a RAW (read after write) Data Hazard, a WAW (write after write) Data Hazard, and a WAR (write after read) Data Hazard]
5
Recall: Single cycle control!
[Figure: single-cycle processor. Datapath: PC, Next Address logic, Ideal Instruction Memory, 32 32-bit Registers (Rw, Ra, Rb; Rs, Rt, Rd), ALU, Ideal Data Memory. Control receives the instruction and Conditions from the Datapath and returns Control Signals.]
6
Data Stationary Control
°The Main Control generates the control signals during Reg/Dec
  Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  Control signals for Mem (MemWr, Branch) are used 2 cycles later
  Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in Reg/Dec produces ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, RegWr. The ID/Ex Register carries all of them into Exec, the Ex/Mem Register carries Branch, MemWr, MemtoReg, RegWr into Mem, and the Mem/Wr Register carries MemtoReg, RegWr into Wr.]
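The idea that control signals travel with their instruction can be sketched in a few lines of Python. This is only an illustration: the `Control` record, the three-opcode decode table in `main_control`, and the register names `id_ex_reg`, `ex_mem_reg`, `mem_wr_reg` are assumptions made for the example, not the lecture's actual control table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Control:
    # Signal names follow the slide; default values describe a no-op.
    ext_op: int = 0      # used in Exec
    alu_src: int = 0     # used in Exec
    alu_op: str = "nop"  # used in Exec
    reg_dst: int = 0     # used in Exec
    branch: int = 0      # used in Mem
    mem_wr: int = 0      # used in Mem
    mem_to_reg: int = 0  # used in Wr
    reg_wr: int = 0      # used in Wr

NOP = Control()

def main_control(op):
    """Hypothetical decode table for three opcodes (values are illustrative)."""
    table = {
        "lw": Control(ext_op=1, alu_src=1, alu_op="add", mem_to_reg=1, reg_wr=1),
        "sw": Control(ext_op=1, alu_src=1, alu_op="add", mem_wr=1),
        "add": Control(alu_op="add", reg_dst=1, reg_wr=1),
    }
    return table.get(op, NOP)

def run(program, cycles):
    """Decode one instruction per cycle and log the signals each stage uses."""
    id_ex_reg = ex_mem_reg = mem_wr_reg = NOP  # control fields of pipeline registers
    log = []
    for cycle in range(cycles):
        # Signals consumed by Exec, Mem, and Wr during this cycle.
        log.append({"exec": id_ex_reg.alu_op,
                    "mem": ex_mem_reg.mem_wr,
                    "wr": mem_wr_reg.reg_wr})
        decoded = main_control(program[cycle]) if cycle < len(program) else NOP
        # Clock edge: every register takes the control word from upstream.
        mem_wr_reg, ex_mem_reg, id_ex_reg = ex_mem_reg, id_ex_reg, decoded
    return log
```

With `run(["lw", "sw", "add"], 6)`, the `lw` decoded in cycle 0 drives Exec in cycle 1, Mem in cycle 2, and Wr in cycle 3, which is the 1, 2, 3 cycle delay described on the slide.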
7
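Datapath + Data Stationary Control
[Figure: pipelined datapath (PC, Next PC, Inst. Mem, IR, Decode, Reg. File, Exec, Mem Access, Data Mem, Reg File write) with data-stationary control. Decode splits the instruction into op, rs, rt, fun, im and produces ex, me, wb control plus rw and a valid bit v; each later pipeline register carries only the fields still needed (me, wb, rw, v, then wb, rw, v).]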
8
Let’s Try it Out
10   lw   r1, r2(35)
14   addI r2, r2, 3
20   sub  r3, r4, r5
24   beq  r6, r7, 100
30   ori  r8, r9, 17
34   add  r10, r11, r12
100  and  r13, r14, 15
(these addresses are octal)
9
Start: Fetch 10
[Figure: pipelined datapath snapshot (program listing as on slide 8). IF: PC = 10. All later stages hold no-ops (n).]
10
Fetch 14, Decode 10
[Figure: pipelined datapath snapshot (program listing as on slide 8). IF: PC = 14. ID: lw r1, r2(35). Later stages hold no-ops (n).]
11
Fetch 20, Decode 14, Exec 10
[Figure: pipelined datapath snapshot (program listing as on slide 8). IF: PC = 20. ID: addI r2, r2, 3. EX: lw r1 (operand r2, immediate 35). Later stages hold no-ops (n).]
12
Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: pipelined datapath snapshot (program listing as on slide 8). IF: PC = 24. ID: sub r3, r4, r5. EX: addI r2, r2, 3 (operand r2, immediate 3). M: lw r1 (address r2+35). WB holds a no-op (n).]
13
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: pipelined datapath snapshot (program listing as on slide 8). IF: PC = 30. ID: beq r6, r7, 100. EX: sub r3 (operands r4, r5). M: addI r2 (result r2+3). WB: lw r1 (value M[r2+35]).]
Note Delayed Branch: always execute ori after beq
14
Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: pipelined datapath snapshot (program listing as on slide 8). IF: PC = 100. ID: ori r8, r9, 17. EX: beq (operands r6, r7, target 100). M: sub r3 (result r4-r5). WB: addI r2 (result r2+3). The register file now holds r1 = M[r2+35].]
15
Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20
[Figure: pipelined datapath snapshot (program listing as on slide 8) with PC and the contents of the ID, EX, M, and WB stages left blank.]
Fill it in yourself!
16
Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24
[Figure: pipelined datapath snapshot (program listing as on slide 8) with PC and the contents of every stage left blank.]
Fill it in yourself!
17
Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30
[Figure: pipelined datapath snapshot (program listing as on slide 8) with PC and the contents of every stage left blank.]
Fill it in yourself!
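The stage occupancy in the slide titles of this walkthrough can be reproduced with a small Python sketch. It assumes the `beq` at octal address 24 is taken and that exactly one delay-slot instruction is fetched before the target; the helper names `fetch_stream` and `occupancy` are made up for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def fetch_stream(start, branch_at, target, count):
    """Addresses fetched when the branch at `branch_at` is taken (one delay slot)."""
    pc, out = start, []
    while len(out) < count:
        out.append(pc)
        if pc == branch_at + 4:   # the delay slot was just fetched
            pc = target           # so the next fetch comes from the branch target
        else:
            pc += 4
    return out

def occupancy(stream, cycle):
    """Octal address held by each stage in a given cycle (None means empty)."""
    row = {}
    for depth, stage in enumerate(STAGES):
        idx = cycle - depth       # the instruction fetched `depth` cycles earlier
        row[stage] = format(stream[idx], "o") if 0 <= idx < len(stream) else None
    return row

stream = fetch_stream(0o10, 0o24, 0o100, 9)
for cycle in range(len(stream)):
    print(cycle, occupancy(stream, cycle))
```

Cycle 6 of this trace corresponds to "Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20", and cycle 8 to "Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30". The sketch tracks addresses only; register values and control signals are left for the fill-in exercise.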
18
Pipelined Processor
°Separate control at each stage
°Stalls propagate backwards to freeze previous stages
°Bubbles are introduced into the pipeline by placing “noops” into the local stage while stalling the previous stages
[Figure: pipelined datapath (PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M, D, Reg File, Equal) with a Valid bit and per-stage control: IRex with Dcd Ctrl, IRmem with Ex Ctrl, IRwb with Mem Ctrl, then WB Ctrl. Stalls flow backwards; bubbles flow forwards.]
19
Recap: Data Hazards
°Avoid some “by design”
  eliminate WAR by always fetching operands early (DCD) in pipe
  eliminate WAW by doing all WBs in order (last stage, static)
°Detect and resolve remaining ones
  stall or forward (if possible)
[Figure: pipeline timing diagrams for a RAW Data Hazard, a WAW Data Hazard, and a WAR Data Hazard]
20
Is CPI = 1 for our pipeline?
°Remember that CPI is an “Average # cycles/inst”
°CPI here is 1, since the average throughput is 1 instruction every cycle.
°What if there are stalls or multi-cycle execution?
°Usually CPI > 1. How close can we get to 1?
[Figure: four instructions, each passing through IFetch, Dcd, Exec, Mem, WB, staggered by one cycle]
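A short worked calculation makes the CPI question concrete. The sketch below assumes a simple cycle-count model (pipeline fill time plus one cycle per instruction plus stall cycles); the function names and the sample numbers are illustrative, not measurements from the lecture.

```python
def total_cycles(instructions, stages=5, stalls=0):
    """Cycles to run `instructions` through a `stages`-deep pipeline."""
    # (stages - 1) cycles to fill the pipe, then one completion per cycle,
    # plus one extra cycle for every stall (bubble) inserted.
    return (stages - 1) + instructions + stalls

def cpi(instructions, stages=5, stalls=0):
    """Average cycles per instruction under the model above."""
    return total_cycles(instructions, stages, stalls) / instructions

print(cpi(4))                   # the four-instruction picture: 8 cycles / 4 = 2.0
print(cpi(1_000_000))           # fill time is amortized away: very close to 1
print(cpi(1000, stalls=250))    # one stall per four instructions: 1.254
```

For long runs the fill time vanishes and CPI approaches 1 plus the average number of stall cycles per instruction, which is why CPI is usually above 1 in practice.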
21
Administrivia
°Midterm I: Wednesday! (3/10)
  310 Soda Hall, 5:30 – 8:30
  One sheet of notes (both sides)
  Afterwards, pizza at LaVals (I’ll buy!)
°Topics: Chapters 1-5, some knowledge about pipelining
  Should know material in Appendices A-C
  Could be questions about state machine design (Prereq quiz material is fair game!)
°Review Session Tuesday: 7:00 – 9:00
  Location TBA (405?)
°Homework 4: Not out yet
  I may move the deadline
°Lab 3 due Thursday! (3/11)
  Demonstrate to TAs in section
  Report due by midnight that day
22
Hazard Detection
°Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
°A RAW hazard exists on register ρ if ρ ∈ Rregs(i) ∩ Wregs(j)
  Keep a record of pending writes (for instructions in the pipe) and compare with operand regs of current instruction.
  When instruction issues, reserve its result register.
  When an operation completes, remove its write reservation.
°A WAW hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Wregs(j)
°A WAR hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Rregs(j)
[Figure: instruction movement through a window on execution (New Inst, Inst I, Inst J). Only pending instructions can cause hazards.]
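The three hazard conditions are set intersections, and the pending-write record is a small reservation table. The sketch below is one way to express that in Python; the dictionary encoding of an instruction (`reads`, `writes`) and the `PendingWrites` class are assumptions made for illustration.

```python
def hazards(i, j):
    """Registers involved in each hazard type.

    `i` is the instruction about to issue; `j` is an earlier instruction
    still in the pipeline. Each is a dict with 'reads' and 'writes' sets.
    """
    return {
        "RAW": i["reads"] & j["writes"],
        "WAW": i["writes"] & j["writes"],
        "WAR": i["writes"] & j["reads"],
    }

class PendingWrites:
    """Record of result registers reserved by instructions in the pipe.

    Simplification: assumes at most one pending writer per register.
    """
    def __init__(self):
        self.reserved = set()

    def can_issue(self, inst):
        # Compare pending writes with the operand registers (RAW check only).
        return not (inst["reads"] & self.reserved)

    def issue(self, inst):
        self.reserved |= inst["writes"]   # reserve the result register

    def complete(self, inst):
        self.reserved -= inst["writes"]   # remove the write reservation

add = {"reads": {"r2", "r3"}, "writes": {"r1"}}   # add r1, r2, r3
sub = {"reads": {"r1", "r3"}, "writes": {"r4"}}   # sub r4, r1, r3
print(hazards(sub, add))   # RAW on r1; no WAW or WAR
```

Issuing `add` reserves r1, so `can_issue(sub)` stays false until `complete(add)` removes the reservation.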
23
Data Hazard Solution: Forwarding
°“Forward” result from one stage to another
°“or” OK if define read/write properly
Instruction order:
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or r8,r1,r9
  xor r10,r1,r11
[Figure: time (clock cycles) against instruction order. Each instruction passes through IF, ID/RF, EX, MEM, WB (Im, Reg, ALU, Dm, Reg), staggered by one cycle.]
24
Record of Pending Writes In Pipeline Registers
°Current operand registers
°Pending writes
°hazard <= ((rs == rw_ex) & regW_ex) OR
           ((rs == rw_mem) & regW_mem) OR
           ((rs == rw_wb) & regW_wb) OR
           ((rt == rw_ex) & regW_ex) OR
           ((rt == rw_mem) & regW_mem) OR
           ((rt == rw_wb) & regW_wb)
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs). The decode stage holds op, rw, rs, rt; each later pipeline register holds op, rw, and a valid bit n.]
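The hazard equation translates directly into a boolean function. In this Python sketch each pending write is passed as a hypothetical `(rw, regW)` pair for the EX, MEM, and WB stages; like the slide's expression, it compares both rs and rt unconditionally and does not special-case register 0.

```python
def raw_hazard(rs, rt, ex, mem, wb):
    """True if rs or rt matches a pending write in EX, MEM, or WB.

    Each of `ex`, `mem`, `wb` is a (rw, regW) pair: the destination
    register number and whether that instruction writes a register.
    """
    pending = [ex, mem, wb]
    return any(regw and rw in (rs, rt) for rw, regw in pending)

# sub r4, r1, r3 in decode while add r1, r2, r3 is in EX
print(raw_hazard(1, 3, ex=(1, True), mem=(0, False), wb=(0, False)))
```

A real decoder would also mask the rt comparison for instructions that do not read rt; that refinement is left out here to stay close to the expression as written.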
25
Resolve RAW by “forwarding” (or bypassing)
°Detect nearest valid write to the operand register and forward into op latches, bypassing remainder of the pipe
  Increase muxes to add paths from pipeline registers
  Data Forwarding = Data Bypassing
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs) with a Forward mux in front of the operand latches, fed from the later pipeline registers. The decode stage holds op, rw, rs, rt; each later pipeline register holds op, rw, and a valid bit n.]
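Selecting the nearest valid write is a priority search over the later pipeline stages. The Python sketch below models the forward mux for one operand; the `(rw, regW, value)` tuples and the function name are assumptions for the example, and it presumes each listed value is already available (a load still in EX would not qualify).

```python
def forward(src_reg, regfile_value, stages):
    """Value to latch for operand `src_reg`.

    `stages` lists (rw, regW, value) for later pipeline stages,
    ordered nearest first (EX, then MEM, then WB).
    """
    for rw, regw, value in stages:
        if regw and rw == src_reg:
            return value          # nearest pending write wins
    return regfile_value          # no match: use the register file read

# Two pending writes to r1: the one in EX (nearest) must be chosen.
print(forward(1, 10, [(1, True, 99), (1, True, 50), (2, True, 7)]))
```

Ordering the list nearest-first matters: when several in-flight instructions write the same register, the most recent one in program order holds the value the consumer should see.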
26
Question: Critical Path???
°Bypass path is invariably trouble
°Options?
  Make logic really fast
  Move forwarding after muxes
    -Problem: screws up branches that require forwarding!
    -Use same tricks as “carry-skip” adder to fix this?
    -This option may just push delay around!
  Insert an extra cycle for branches that need forwarding?
    -Or: hit common case of forwarding from EX stage and stall for forward from memory?
[Figure: datapath fragment (Regs, A, B, im, Forward mux, Equal, PC Sel, alu, S, D mem, m, Regs) showing the bypass path feeding the branch comparison]
27
What about memory operations?
°If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
°What does delaying WB on arithmetic operations cost?
    -cycles?
    -hardware?
°What about data dependence on loads?
    R1 <- R4 + R5
    R2 <- Mem[ R2 + I ]
    R3 <- R2 + R1
  “Delayed Loads”
°Can recognize this in decode stage and introduce bubble while stalling fetch stage (hint for lab 4!)
°Tricky situation:
    R1 <- Mem[ R2 + I ]
    Mem[R3+34] <- R1
  Handle with bypass in memory stage!
[Figure: datapath fragment with operand latches A, B, op, Rd, Ra, Rb, result R, D Mem, T, and Rd passed along to the reg file]
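The decode-stage check for a load followed by a dependent instruction can be sketched as below. The instruction encoding is hypothetical: `ex_srcs` lists registers needed at the start of the execute stage and `mem_srcs` lists registers needed only in the memory stage (store data), which is what separates the "Delayed Loads" case from the "tricky situation".

```python
def needs_load_bubble(decode_inst, ex_inst):
    """True if decode must insert a bubble behind a load now in EX."""
    if ex_inst is None or ex_inst["op"] != "lw":
        return False
    # The loaded value is not available until after the memory stage,
    # so any source needed at the start of EX cannot be forwarded in time.
    return ex_inst["dest"] in decode_inst["ex_srcs"]

# R2 <- Mem[R2 + I] followed by R3 <- R2 + R1: bubble required.
load_r2 = {"op": "lw", "dest": "R2", "ex_srcs": {"R2"}}
add_r3 = {"op": "add", "dest": "R3", "ex_srcs": {"R2", "R1"}}
print(needs_load_bubble(add_r3, load_r2))    # True

# R1 <- Mem[R2 + I] followed by Mem[R3+34] <- R1: the store needs R1
# only in the memory stage, so a bypass there avoids the bubble.
load_r1 = {"op": "lw", "dest": "R1", "ex_srcs": {"R2"}}
store = {"op": "sw", "dest": None, "ex_srcs": {"R3"}, "mem_srcs": {"R1"}}
print(needs_load_bubble(store, load_r1))     # False
```

The second case works only if the memory stage actually has a bypass path for store data; without it, the check would have to treat `mem_srcs` like `ex_srcs`.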
28
Compiler Avoiding Load Stalls:
°Recall: MIPS I had no pipeline stalls
  “Microprocessor without Interlocked Pipeline Stages”
  Consequently, the “Unscheduled” code above would be wrong
[Chart: % loads stalling pipeline (axis 0% to 80%), scheduled vs. unscheduled]
  tex:   25% scheduled, 65% unscheduled
  spice: 14% scheduled, 42% unscheduled
  gcc:   31% scheduled, 54% unscheduled
29
What about Interrupts, Traps, Faults?
°External Interrupts:
  Allow pipeline to drain, fill with NOPs
  Load PC with interrupt address
°Faults (within instruction, restartable)
  Force trap instruction into IF
  disable writes till trap hits WB
  must save multiple PCs or PC + state
°Recall: Precise Exceptions
  State of the machine is preserved as if program executed up to the offending instruction
    -All previous instructions completed
    -Offending instruction and all following instructions act as if they have not even started
  Same system code will work on different implementations
30
Exception/Interrupts: Implementation questions
5 instructions, executing in 5 different pipeline stages!
°Who caused the interrupt?
  Stage   Problem interrupts occurring
  IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
  ID      Undefined or illegal opcode
  EX      Arithmetic exception
  MEM     Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
°How do we stop the pipeline? How do we restart it?
°Do we interrupt immediately or wait?
°How do we sort all of this out to maintain preciseness?
31
Exception Handling
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs) executing lw $2,20($5), annotated stage by stage: detect bad instruction address; detect bad instruction; detect overflow; detect bad data address; allow exception to take effect. An Excp field travels down the pipeline registers alongside op, rw, and the valid bit n.]
32
Another look at the exception problem
°Use pipeline to sort this out!
  Pass exception status along with instruction.
  Keep track of PCs for every instruction in pipeline.
  Don’t act on exception until it reaches WB stage
°Handle interrupts through “faulting noop” in IF stage
°When instruction reaches end of MEM stage:
  Save PC → EPC, Interrupt vector addr → PC
  Turn all instructions in earlier stages into noops!
[Figure: program flow against time. Four instructions, each passing through IFetch, Dcd, Exec, Mem, WB and staggered by one cycle, marked with Data TLB, Bad Inst, Inst TLB fault, and Overflow exceptions.]
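Deferring action until the end of MEM means the oldest faulting instruction wins, even when a younger instruction's fault is detected earlier in time. The Python sketch below illustrates that ordering; the one-instruction-per-cycle issue model, the `first_exception` helper, and the vector address passed in are assumptions for the example.

```python
STAGE = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}

def first_exception(program, vector):
    """Exception taken first when status travels with the instruction.

    `program` lists (pc, fault_stage or None) in program order, with
    instruction k fetched in cycle k. Returns None if nothing faults.
    """
    for k, (pc, fault_stage) in enumerate(program):
        if fault_stage is None:
            continue
        acted = k + STAGE["MEM"]  # acted on at the end of its MEM stage
        # Younger instructions already fetched by then become noops.
        squashed = [p for p, _ in program[k + 1 : acted + 1]]
        return {
            "epc": pc,                                # PC -> EPC
            "new_pc": vector,                         # vector address -> PC
            "detected_cycle": k + STAGE[fault_stage],
            "acted_cycle": acted,
            "squashed": squashed,
        }
    return None

# Faults as in the figure: Data TLB, Bad Inst, Inst TLB fault, Overflow.
program = [(0x00, "MEM"), (0x04, "ID"), (0x08, "IF"), (0x0C, "EX")]
print(first_exception(program, vector=0x80))
```

Here the bad instruction and the instruction TLB fault are both detected in cycle 2, one cycle before the data TLB fault, yet the data TLB fault is the one taken because its instruction is earliest in program order. That is what keeps the exception precise.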
33
Examples of stalls/bubbles
°Exceptions: Flush everything above (later)
  Prevent instructions following exception from committing state
  Put faulting flag on current instruction
  Freeze fetch until exception resolved
°Stalls: Introduce brief stalls into pipeline
  Decode stage recognizes that current instruction cannot proceed
  Freeze fetch stage
  Introduce “bubble” into EX stage (instead of forwarding stalled inst)
  Can stall until condition is resolved
  Examples:
    -mfhi, mflo: need to wait for multiply/divide unit to finish
    -“Break” instruction for Lab5: stall until release line received
    -Load delay slot handled this way as well
34
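Example: Delayed Load – Freeze above & Bubble Below
°Flush accomplished by setting “invalid” bit in pipeline
[Figure: pipelined datapath (IAU, PC, npc, I mem, Regs, A, B, im, alu, S, D mem, m, Regs). The fetch and decode stages are marked "freeze" and the register feeding the execute stage is marked "bubble". The decode stage holds op, rw, rs, rt; each later pipeline register holds op, rw, and a valid bit n.]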
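"Freeze above, bubble below" comes down to what each pipeline register latches on a clock edge. The Python sketch below models only the PC, the fetch/decode register, and the decode/execute register; the `clock` function, the `BUBBLE` marker, and the `fetch` callback are hypothetical names for this illustration.

```python
BUBBLE = None   # stands for an entry whose valid bit is cleared

def clock(pc, if_id, id_ex, stall, fetch):
    """State of (pc, if_id, id_ex) after one clock edge.

    `if_id` is the instruction being decoded, `id_ex` the one entering
    execute, and `fetch(pc)` returns the instruction at address `pc`.
    """
    if stall:
        # Freeze above: PC and IF/ID keep their values.
        # Bubble below: an invalid entry enters EX instead of the
        # stalled instruction.
        return pc, if_id, BUBBLE
    # Normal advance: decode moves to execute, fetch moves to decode.
    return pc + 4, fetch(pc), if_id

fetch = lambda pc: f"inst@{pc}"
state = (8, "add", "lw")
state = clock(*state, stall=True, fetch=fetch)    # (8, 'add', None)
state = clock(*state, stall=False, fetch=fetch)   # (12, 'inst@8', 'add')
print(state)
```

The instruction that was in `id_ex` before the stalled edge is not lost: in a full model it moves on to the next pipeline register, which this sketch leaves out.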
35
Summary
°Hazards limit performance
  Structural: need more HW resources
  Data: need forwarding, compiler scheduling
  Control: early evaluation & PC, delayed branch, prediction
°Data hazards must be handled carefully:
  RAW data hazards handled by forwarding
  WAW and WAR hazards don’t exist in 5-stage pipeline
  Some hazards handled by stalling
°MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
°Exceptions in 5-stage pipeline recorded when they occur, but acted on only at WB (end of MEM) stage
  Must flush all instructions in earlier pipeline stages
°Compiler optimizations may be used to avoid stalls
  Loop unrolling: multiple iterations of loop in software:
    -Amortizes loop overhead over several iterations
    -Gives more opportunity for scheduling around stalls
36
Summary: Where this class is going
°We’ll build a simple pipeline and look at these issues
  Lab 4: Pipelined Processor
  Lab 5: With caches
°We’ll talk about modern processors and what’s really hard:
  Branches (control hazards) are really hard!
  Exception handling
  Trying to improve performance with out-of-order execution, etc.
  Trying to get CPI < 1 (Superscalar execution)