Presentation is loading. Please wait.

Presentation is loading. Please wait.

Pipelined Implementation : Part II

Similar presentations


Presentation on theme: "Pipelined Implementation : Part II"— Presentation transcript:

1 Pipelined Implementation : Part II
Seoul National University Pipelined Implementation : Part II

2 Make the pipelined processor work!
Seoul National University Overview Make the pipelined processor work! Data Hazards An instruction having register R as source follows shortly after another instruction having register R as destination A common condition, don’t want to slow down pipeline Control Hazards Mispredicted conditional branch Our design predicts all branches as being taken Naïve pipeline executes two extra instructions Getting return address for ret instruction Naïve pipeline executes three extra instructions Making Sure It Really Works What if multiple special cases happen simultaneously?

3 Pipeline Stages Fetch Decode Execute Memory Write Back
Seoul National University Pipeline Stages Fetch Select current PC Read instruction Compute incremented PC Decode Read program registers Execute Operate ALU Memory Read or write data memory Write Back Update register file

4 Data Dependencies: 2 Nop’s
0x000: irmovq $10,% rdx 1 2 3 4 5 6 7 8 9 F D E M W 0x00a: $3,% rax 0x014: nop 0x015: 0x016: addq % ,% 0x018: halt 10 # demo-h2.ys R[ ] f valA = valB Cycle 6 Error

5 Data Dependencies: No Nop
0x000: irmovq $10,% rdx 1 2 3 4 5 6 7 8 F D E M W 0x00a: $3,% rax 0x014: addq % ,% 0x016: halt # demo-h0.ys valA f R[ ] = valB Cycle 4 Error M_ valE = 10 dstE e_ 0 + 3 = 3 E_

6 Stalling for Data Dependencies
1 2 3 4 5 6 7 8 9 10 11 # demo-h2.ys 0x000: irmovq $10,%rdx F D E M W 0x00a: irmovq $3,%rax F D E M W 0x014: nop F D E M W 0x015: nop F D E M W bubble F E M W 0x016: addq %rdx,%rax D D E M W 0x018: halt F F D E M W If instruction follows too closely after one that writes register, slow it down Hold instruction in decode Dynamically inject nop into execute stage

7 Stall Condition Source Registers Destination Registers Special Case
Seoul National University Stall Condition Source Registers srcA and srcB of the instruction in decode stage Destination Registers dstE and dstM fields Instructions in execute, memory, and write-back stages Special Case Don’t stall for register ID 15 (0xF) Indicates absence of register operand Don’t stall for failed conditional move

8 Detecting Stall Condition
Seoul National University Detecting Stall Condition 1 2 3 4 5 6 7 8 9 10 11 # demo-h2.ys 0x000: irmovq $10,%rdx F D E M W 0x00a: irmovq $3,%rax F D E M W 0x014: nop F D E M W 0x015: nop F D E M W bubble F E M W 0x016: addq %rdx,%rax D D E M W 0x018: halt F F D E M W Cycle 6 W D W_dstE = %rax W_valE = 3 srcA = %rdx srcB = %rax

9 Stalling X3 F D E M W F D E M W E M W E M W E M W F D D D D E M W F F
Seoul National University Stalling X3 1 2 3 4 5 6 7 8 9 10 11 # demo-h0.ys 0x000: irmovq $10,%rdx F D E M W 0x00a: irmovq $3,%rax F D E M W bubble E M W bubble E M W bubble E M W 0x014: addq %rdx,%rax F D D D D E M W 0x016: halt F F F F D E M W Cycle 6 W W_dstE = %rax Cycle 5 M M_dstE = %rax Cycle 4 E e_dstE = %rax D srcA = %rdx srcB = %rax D srcA = %rdx srcB = %rax D srcA = %rdx srcB = %rax

10 What Happens When Stalling?
Seoul National University What Happens When Stalling? 0x000: irmovq $10,%rdx 0x00a: irmovq $3,%rax 0x014: addq %rdx,%rax # demo-h0.ys 0x016: halt bubble 0x014: addq %rdx,%rax Cycle 7 0x016: halt bubble Cycle 8 0x014: addq %rdx,%rax 0x016: halt 0x000: irmovq $10,%rdx 0x00a: irmovq $3,%rax 0x014: addq %rdx,%rax Cycle 4 0x016: halt 0x00a: irmovq $3,%rax bubble 0x014: addq %rdx,%rax Cycle 6 0x016: halt 0x000: irmovq $10,%rdx 0x00a: irmovq $3,%rax bubble 0x014: addq %rdx,%rax Cycle 5 0x016: halt Write Back Memory Execute Decode Fetch Stalling instruction held back in decode stage Following instruction stays in fetch stage Bubbles injected into execute stage Like dynamically generated nop’s Move through later stages

11 Implementing Stalling
Seoul National University Implementing Stalling E M W F D CC rB srcA srcB icode valE valM dstE dstM Cnd valA ifun valC valB valP rA predPC d_srcB d_srcA e_Cnd D_icode E_icode M_icode E_dstM Pipe control logic D_bubble D_stall E_bubble F_stall M_bubble W_stall set_cc stat W_stat m_stat Pipeline Control Combinational logic detects stall condition Sets mode signals for how pipeline registers should be updated

12 Pipeline Register Modes
Seoul National University Pipeline Register Modes Rising clock _ Output = y y Output = x Input = y stall = 0 bubble x Normal Rising clock _ Output = x x Output = x Input = y stall = 1 bubble = 0 x Stall n o p Rising clock _ Output = nop Output = x Input = y stall = 0 bubble = 1 Bubble x x

13 Data Forwarding Naïve Pipeline Observation Trick
Seoul National University Data Forwarding Naïve Pipeline Register isn’t written until completion of write-back stage Source operands read from register file in decode stage Observation Value to be written to register generated much earlier (in execute or memory stage) Trick Pass value directly from execute or memory stage of the generating instruction to decode stage Needs to be available at the end of decode stage

14 Data Forwarding Example
Seoul National University Data Forwarding Example 0x000: irmovq $10,% rdx 1 2 3 4 5 6 7 8 9 F D E M W 0x00a: $3,% rax 0x014: nop 0x015: 0x016: addq % ,% 0x018: halt 10 # demo-h2.ys Cycle 6 R[ ] f valA = valB W_ valE dstE = 3 srcA srcB irmovq in write-back stage Destination value in W pipeline register Forward as valB for decode stage

15 Bypass Paths Decode Stage Forwarding Sources
Seoul National University Bypass Paths Decode Stage Forwarding logic selects valA and valB Normally from register file Forwarding: get valA or valB from later pipeline stages Forwarding Sources Execute: valE Memory: valE, valM Write back: valE, valM

16 Data Forwarding Example #2
Seoul National University Data Forwarding Example #2 0x000: irmovq $10,%rdx 1 2 3 4 5 6 7 8 F D E M W 0x00a: irmovq $3,%rax 0x014: addq %rdx,%rax 0x016: halt # demo-h0.ys Cycle 4 valA f M_valE = 10 valB f e_valE = 3 M_dstE = %rdx M_valE = 10 srcA = %rdx srcB = %rax E_dstE = %rax e_valE f = 3 Register %rdx Generated by ALU during previous cycle Forward from memory as valA Register %rax Value just generated by ALU Forward from execute as valB

17 Forwarding Priority Multiple Forwarding Choices
Seoul National University Forwarding Priority 0x000: irmovq $1, %rax 1 2 3 4 5 6 7 8 9 F D E M W 0x00a: irmovq $2, %rax 0x014: irmovq $3, %rax 0x01e: rrmovq %rax, %rdx 0x020: halt 10 # demo-priority.ys W R[ % rax ] f 3 1 D valA rdx = 10 valB ? Cycle 5 M 2 E Multiple Forwarding Choices Which one should have priority Match serial semantics Use matching value from earliest pipeline stage

18 Implementing Forwarding
Seoul National University Implementing Forwarding Add additional feedback paths from E, M, and W pipeline registers into decode stage Create logic blocks to select from multiple sources for valA and valB in decode stage

19 Implementing Forwarding
Seoul National University Implementing Forwarding ## What should be the A value? int new_E_valA = [ # Use incremented PC D_icode in { ICALL, IJXX } : D_valP; # Forward valE from execute d_srcA == e_dstE : e_valE; # Forward valM from memory d_srcA == M_dstM : m_valM; # Forward valE from memory d_srcA == M_dstE : M_valE; # Forward valM from write back d_srcA == W_dstM : W_valM; # Forward valE from write back d_srcA == W_dstE : W_valE; # Use value read from register file 1 : d_rvalA; ];

20 Limitation of Forwarding
Seoul National University Limitation of Forwarding Load-use dependency Value needed by end of decode stage in cycle 7 Value read from memory in memory stage of cycle 8

21 Avoiding Load/Use Hazard
Seoul National University Avoiding Load/Use Hazard

22 Detecting Load/Use Hazard
Seoul National University Detecting Load/Use Hazard Condition Trigger Load/Use Hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }

23 Control for Load/Use Hazard
Seoul National University Control for Load/Use Hazard 0x000: irmovq $128,% rdx 1 2 3 4 5 6 7 8 9 F D E M W 0x00a: $3,% rcx 0x014: rmmovq % , 0(% ) 0x01e: $10,% ebx 0x028: mrmovq 0(% ), rax # Load % # demo - luh . ys 0x032: addq , # Use % 0x034: halt 10 11 bubble 12 Stall instructions in fetch and decode stages Inject bubble into execute stage Condition F D E M W Load/Use Hazard stall bubble normal

24 Branch Misprediction Example
Seoul National University Branch Misprediction Example demo-j.ys 0x000: xorq %rax,%rax 0x002: jne t # Not taken 0x00b: irmovq $1, %rax # Fall through 0x015: nop 0x016: nop 0x017: nop 0x018: halt 0x019: t: irmovq $3, %rdx # Target 0x023: irmovq $4, %rcx # Should not execute 0x02d: irmovq $5, %rdx # Should not execute Should only execute first 8 instructions

25 Handling Misprediction
Seoul National University Handling Misprediction Predict branch as taken Fetch 2 instructions at target Cancel when mispredicted Detect branch not-taken in execute stage On following cycle, replace instructions in execute and decode by bubbles No side effects have occurred yet

26 Detecting Mispredicted Branch
Seoul National University Detecting Mispredicted Branch Condition Trigger Mispredicted Branch E_icode = IJXX & !e_Cnd

27 Control for Misprediction
Seoul National University Control for Misprediction Condition F D E M W Mispredicted Branch normal bubble

28 Return Example Previously executed three additional instructions
Seoul National University Return Example 0x000: irmovq Stack,%rsp # Intialize stack pointer 0x00a: call p # Procedure call 0x013: irmovq $5,%rsi # Return point 0x01d: halt 0x020: .pos 0x20 0x020: p: irmovq $-1,%rdi # procedure 0x02a: ret 0x02b: irmovq $1,%rax # Should not be executed 0x035: irmovq $2,%rcx # Should not be executed 0x03f: irmovq $3,%rdx # Should not be executed 0x049: irmovq $4,%rbx # Should not be executed 0x100: .pos 0x100 0x100: Stack: # Stack: Stack pointer Previously executed three additional instructions

29 Correct Return Example
Seoul National University Correct Return Example # demo - retb 0x026: ret F D E M W bubble F D E M W bubble F D E M W bubble F D E M W 0x013: irmovq $5,% rsi # Return F F D D E E M M W W As ret passes through pipeline, stall at fetch stage While in decode, execute, and memory stage Inject bubble into decode stage Release stall when reach write-back stage W valM = 0x0b 0x013 F F valC valC f f 5 5 rB rB f f % % rsi esi

30 IRET in { D_icode, E_icode, M_icode }
Seoul National University Detecting Return Condition Trigger Processing ret IRET in { D_icode, E_icode, M_icode }

31 Control for Return Condition F D E M W Processing ret stall bubble
Seoul National University Control for Return # demo - retb 0x026: ret F D E M W bubble F D E M W bubble F D E M W bubble F D E M W 0x014: irmovq $5,% rsi # Return F F D D E E M M W W Condition F D E M W Processing ret stall bubble normal

32 Special Control Cases Detection Action (on next cycle) Condition
Seoul National University Special Control Cases Detection Action (on next cycle) Condition Trigger Processing ret IRET in { D_icode, E_icode, M_icode } Load/Use Hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } Mispredicted Branch E_icode = IJXX & !e_Cnd Condition F D E M W Processing ret stall bubble normal Load/Use Hazard Mispredicted Branch

33 Implementing Pipeline Control
Seoul National University Implementing Pipeline Control E M W F D CC rB srcA srcB icode valE valM dstE dstM Cnd valA ifun valC valB valP rA predPC d_srcB d_srcA e_Cnd D_icode E_icode M_icode E_dstM Pipe control logic D_bubble D_stall E_bubble F_stall M_bubble W_stall set_cc stat W_stat m_stat Combinational logic generates pipeline control signals

34 Initial Version of Pipeline Control
Seoul National University Initial Version of Pipeline Control bool F_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool D_stall = E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }; bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Cnd) || bool E_bubble = # Load/use hazard

35 Control Combinations Combination A Combination B
Seoul National University Control Combinations Special cases that can arise during the same clock cycle Combination A Not-taken branch ret instruction at branch target Combination B Instruction that reads from memory to %esp Followed by ret instruction

36 Control Combination A Should handle as mispredicted branch
Seoul National University Control Combination A JXX E D M Mispredict ret 1 Combination A Condition F D E M W Processing ret stall bubble normal Mispredicted Branch Combination Should handle as mispredicted branch Stalls F pipeline register But PC selection logic will be using M_valA anyhow

37 Seoul National University
Control Combination B Load E Use D M Load/use ret 1 Combination B Condition F D E M W Processing ret stall bubble normal Load/Use Hazard Combination bubble + stall Would attempt to bubble and stall pipeline register D Signaled by processor as pipeline error

38 Handling Control Combination B
Seoul National University Handling Control Combination B Load E Use D M Load/use ret 1 Combination B Condition F D E M W Processing ret stall bubble normal Load/Use Hazard Combination Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle

39 Corrected Pipeline Control Logic
Seoul National University Corrected Pipeline Control Logic bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Cnd) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode } # but not for a load/use hazard && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }); Condition F D E M W Processing ret stall bubble normal Load/Use Hazard Combination Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle

40 Pipeline Summary Data Hazards Control Hazards Control Combinations
Seoul National University Pipeline Summary Data Hazards Most handled by forwarding No performance penalty Load/use hazard requires one cycle stall Control Hazards Cancel instructions when detect mispredicted branch Two clock cycles wasted Stall fetch stage while ret passes through pipeline Three clock cycles wasted Control Combinations Must analyze carefully First version had subtle bug Only arises with unusual instruction combination


Download ppt "Pipelined Implementation : Part II"

Similar presentations


Ads by Google