1 Lecture 18: Core Design
Today: basics of implementing a correct out-of-order (OoO) core: register renaming, commit, LSQ, issue queue


2 The Alpha 21264 Out-of-Order Implementation
Figure (pipeline diagram). Labels: Branch prediction and instr fetch; Instr Fetch Queue; Decode & Rename; Reorder Buffer (ROB); Issue Queue (IQ); ALU; Register File P1-P64.
Instructions as fetched:
  R1 ← R1+R2
  R2 ← R1+R3
  BEQZ R2
  R3 ← R1+R2
  R1 ← R3+R2
Reorder Buffer (ROB): Instr 1 through Instr 6
Instructions after renaming:
  P33 ← P1+P2
  P34 ← P33+P3
  BEQZ P34
  P35 ← P33+P34
  P36 ← P35+P34
Results are written to the regfile and tags are broadcast to the IQ.
Speculative Reg Map: R1 → P36, R2 → P34
Committed Reg Map: R1 → P1, R2 → P2

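3 Rename
Before renaming (logical registers):
  A  lr1 ← lr2 + lr3
  B  lr2 ← lr4 + lr5
  C  lr6 ← lr1 + lr3
  D  lr6 ← lr1 + lr2
  Dependences: RAR lr3, RAW lr1, WAR lr2, WAW lr6
  Schedule: A ; BC ; D
After renaming (physical registers):
  A  pr7 ← pr2 + pr3
  B  pr8 ← pr4 + pr5
  C  pr9 ← pr7 + pr3
  D  pr10 ← pr7 + pr8
  Dependences: RAR pr3, RAW pr7, WAR x, WAW x (x: removed by renaming)
  Schedule: AB ; CD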
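
The renaming step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: registers are plain integers, the initial map is lr_i to pr_i, free physical registers are handed out in order starting at pr7, and `rename` and `hazards` are hypothetical helper names. The hazard check is pairwise and does not stop at intervening writes, which is adequate for this four-instruction example. Besides the dependences listed on the slide, it also reports the RAW dependence from B to D (lr2 before renaming, pr8 after).

```python
def rename(program, num_logical=6, first_free=7):
    """Rename (dest, src1, src2) triples of logical register numbers."""
    table = {i: i for i in range(1, num_logical + 1)}  # assumed initial map
    next_free = first_free
    renamed = []
    for dest, src1, src2 in program:
        # Read the source mappings before updating the destination mapping.
        p1, p2 = table[src1], table[src2]
        table[dest] = next_free
        next_free += 1
        renamed.append((table[dest], p1, p2))
    return renamed


def hazards(program):
    """Pairwise register hazards between an earlier and a later instruction."""
    found = {"RAW": set(), "WAR": set(), "WAW": set()}
    for i, (dest_i, *srcs_i) in enumerate(program):
        for dest_j, *srcs_j in program[i + 1:]:
            if dest_i in srcs_j:
                found["RAW"].add(dest_i)
            if dest_j in srcs_i:
                found["WAR"].add(dest_j)
            if dest_i == dest_j:
                found["WAW"].add(dest_i)
    return found


# Instructions A-D from the slide, as (dest, src1, src2).
program = [(1, 2, 3), (2, 4, 5), (6, 1, 3), (6, 1, 2)]
renamed = rename(program)
before = hazards(program)
after = hazards(renamed)
```

After renaming, only true (RAW) dependences remain, which is what allows the shorter schedule AB ; CD.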

4 Commit Example
Assume a processor with 6 logical regs and 10 physical regs

  Instr  Original           Renamed             Map   Old   New
  A      lr1 ← lr2 + lr3    pr7 ← pr2 + pr3     lr1   pr1   pr7
  B      lr2 ← lr4 + lr5    pr8 ← pr4 + pr5     lr2   pr2   pr8
  C      lr6 ← lr1 + lr3    pr9 ← pr7 + pr3     lr6   pr6   pr9
  D      lr6 ← lr1 + lr2    pr10 ← pr7 + pr8    lr6   pr9   pr10
  E      lr3 ← lr6 + lr2    pr1 ← pr10 + pr8    lr3   pr3   pr1
  F      lr4 ← lr3 + lr4    pr2 ← pr1 + pr4     lr4   pr4   pr2
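
A minimal sketch of how the Old/New map entries above can arise. Assumptions not stated on the slide: the initial map is lr_i to pr_i, pr7 to pr10 start on the free list, and when the free list is empty the oldest in-flight instruction is taken to have finished, so it commits and releases its old physical register. `rename_with_commit` is a hypothetical name.

```python
from collections import deque


def rename_with_commit(program, num_logical=6, num_physical=10):
    """Rename instructions, committing the oldest one when no register is free."""
    spec_map = {f"lr{i}": f"pr{i}" for i in range(1, num_logical + 1)}
    free = deque(f"pr{i}" for i in range(num_logical + 1, num_physical + 1))
    rob = deque()  # (instr name, old physical reg to free at commit)
    log = []
    for name, dest, src1, src2 in program:
        if not free:
            # Stall until the oldest instruction commits (assumed done here);
            # its OLD mapping is no longer needed and returns to the free list.
            _, old_preg = rob.popleft()
            free.append(old_preg)
        p1, p2 = spec_map[src1], spec_map[src2]
        old, new = spec_map[dest], free.popleft()
        spec_map[dest] = new
        rob.append((name, old))
        log.append((name, dest, old, new, f"{new} <- {p1} + {p2}"))
    return log


program = [
    ("A", "lr1", "lr2", "lr3"),
    ("B", "lr2", "lr4", "lr5"),
    ("C", "lr6", "lr1", "lr3"),
    ("D", "lr6", "lr1", "lr2"),
    ("E", "lr3", "lr6", "lr2"),
    ("F", "lr4", "lr3", "lr4"),
]
log = rename_with_commit(program)
```

In this sketch E can only be renamed after A commits and frees pr1, and F only after B commits and frees pr2, which matches the register numbers in the example.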

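5 Out-of-Order Loads/Stores
Figure (memory operations in program order). Opcodes as extracted: Ld, Ld, St, Ld. Register and address operands: (R1, [R2]), (R3, [R4]), (R5, [R6]), (R7, [R8]), (R9, [R10]).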

6 Memory Dependence Checking
Figure (opcodes and addresses as extracted): Ld 0x abcdef, Ld, St, Ld, 0x abcdef, St 0x abcd00, Ld 0x abc000, Ld 0x abcd00
The issue queue checks for register dependences and executes instructions as soon as their registers are ready.
Loads and stores also access memory, so RAW, WAW, and WAR hazards must be checked for memory as well.
Hence, first check for register dependences to compute effective addresses; then check for memory dependences.

7 Memory Dependence Checking
Figure (opcodes and addresses as extracted): Ld 0x abcdef, Ld, St, Ld, 0x abcdef, St 0x abcd00, Ld 0x abc000, Ld 0x abcd00
Load and store addresses are maintained in program order in the Load/Store Queue (LSQ).
Loads can issue if they are guaranteed not to have true dependences on earlier stores.
Stores can issue only when we are ready to modify memory (we cannot recover if an earlier instr raises an exception).
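
The issue rules above can be sketched as two predicates over a program-ordered LSQ. This is a conservative illustration, not the Alpha 21264 logic: `LSQEntry`, `load_can_issue`, and `store_can_issue` are hypothetical names, a load simply waits on a matching earlier store (no store-to-load forwarding is modeled), addresses are compared exactly with no access sizes, and "ready to modify memory" is approximated as "every earlier LSQ entry is done".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LSQEntry:
    kind: str                 # "Ld" or "St"
    addr: Optional[int]       # None until the effective address is computed
    done: bool = False        # True once the access has completed


def load_can_issue(lsq: List[LSQEntry], idx: int) -> bool:
    """A load may issue only if no earlier pending store can alias it."""
    entry = lsq[idx]
    if entry.kind != "Ld" or entry.addr is None:
        return False
    for earlier in lsq[:idx]:
        if earlier.kind == "St" and not earlier.done:
            # Unknown address: a true dependence cannot be ruled out.
            # Matching address: a true (RAW) dependence exists.
            if earlier.addr is None or earlier.addr == entry.addr:
                return False
    return True


def store_can_issue(lsq: List[LSQEntry], idx: int) -> bool:
    """A store writes memory only when everything before it is done."""
    entry = lsq[idx]
    return (entry.kind == "St" and entry.addr is not None
            and all(e.done for e in lsq[:idx]))


# Hypothetical queue contents, oldest first.
lsq = [
    LSQEntry("Ld", 0x100),
    LSQEntry("St", None),
    LSQEntry("Ld", 0x200),
    LSQEntry("St", 0x300),
    LSQEntry("Ld", 0x300),
]
```

With these contents the first load can issue at once, while the load to 0x200 is blocked until the address of the earlier store is known.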

8 The Alpha 21264 Out-of-Order Implementation
Figure (pipeline diagram, now with loads and stores). Labels: Branch prediction and instr fetch; Instr Fetch Queue; Decode & Rename; Reorder Buffer (ROB); Issue Queue (IQ); ALU; Register File P1-P64; LSQ; ALU; D-Cache.
Instructions as fetched:
  R1 ← R1+R2
  R2 ← R1+R3
  BEQZ R2
  R3 ← R1+R2
  R1 ← R3+R2
  LD R4 ← 8[R3]
  ST R4 → 8[R1]
Reorder Buffer (ROB): Instr 1 through Instr 7
Instructions after renaming:
  P33 ← P1+P2
  P34 ← P33+P3
  BEQZ P34
  P35 ← P33+P34
  P36 ← P35+P34
  P37 ← 8[P35]
  P37 → 8[P36]
LSQ:
  P37 ← [P35 + 8]
  P37 → [P36 + 8]
Results are written to the regfile and tags are broadcast to the IQ.
Committed Reg Map: R1 → P1, R2 → P2
Speculative Reg Map: R1 → P36, R2 → P34

9 Speculative Issue
Instr I1 leaves the issue queue at the start of cycle 6; the instr then reads operands from the regfile, wires are traversed, the instruction executes, and the result is available at the end of cycle 8.
If operand availability is broadcast to the issue queue in cycle 9, the dependent instruction leaves in cycle 10.
This causes a 4-cycle gap between successive dependent instrs.
Hence, if we know that the instruction takes a cycle to execute, the operand is broadcast to the issue queue in cycle 6 and the dependent instr leaves the issue queue in cycle 7; the input operand is correctly bypassed at the FU.
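
The cycle arithmetic on this slide can be written out as a small helper. This is a sketch with assumed parameter names (`cycles_until_result`, `exec_latency`); it assumes the dependent instr leaves the queue one cycle after the broadcast, as in the slide's numbers.

```python
def dependent_issue_cycle(producer_issue, cycles_until_result=3,
                          exec_latency=1, speculative_wakeup=False):
    """Cycle in which a dependent instruction can leave the issue queue."""
    if speculative_wakeup:
        # Broadcast early, based on the known execution latency alone;
        # the operand reaches the dependent instr through the bypass.
        broadcast = producer_issue + exec_latency - 1
    else:
        # Wait until the result actually exists, then broadcast.
        result_ready = producer_issue + cycles_until_result - 1  # end of cycle
        broadcast = result_ready + 1
    return broadcast + 1


slow = dependent_issue_cycle(6)                           # cycle 10
fast = dependent_issue_cycle(6, speculative_wakeup=True)  # cycle 7
gap_slow, gap_fast = slow - 6, fast - 6                   # 4 cycles vs 1 cycle
```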

10 Load Hit Speculation
The previous optimization assumes that we know the exact latency of every operation.
This is true for all ops except loads (cache hit or miss?).
Assume a hit and schedule accordingly; on a cache miss, all speculatively issued instructions must be squashed.
An instruction therefore stays in the issue queue until the loads it depends on are known to have hit.
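
A minimal sketch of the bookkeeping this implies, assuming a hypothetical `IssueQueue` class in which each entry records the loads it (directly or transitively) depends on. An entry that issued on the assumption of a hit stays in the queue until that load resolves; on a miss it is squashed and must issue again later.

```python
class IssueQueue:
    def __init__(self):
        # name -> {"issued": bool, "waiting_on": set of unresolved load names}
        self.entries = {}

    def add(self, name, depends_on_loads=()):
        self.entries[name] = {"issued": False,
                              "waiting_on": set(depends_on_loads)}

    def issue_speculatively(self, name):
        # Issued assuming every load in waiting_on hits; the entry is kept.
        self.entries[name]["issued"] = True

    def load_resolved(self, load, hit):
        """Free or squash entries that depended on `load`; return squashed names."""
        squashed = []
        for name, entry in list(self.entries.items()):
            if load not in entry["waiting_on"]:
                continue
            if hit:
                entry["waiting_on"].discard(load)
                if entry["issued"] and not entry["waiting_on"]:
                    del self.entries[name]      # safe to leave the queue
            elif entry["issued"]:
                entry["issued"] = False         # squash: must issue again
                squashed.append(name)
        return squashed


q = IssueQueue()
q.add("add", depends_on_loads=["ld1"])
q.add("sub", depends_on_loads=["ld1"])
q.issue_speculatively("add")
squashed_on_miss = q.load_resolved("ld1", hit=False)
```

The cost of this scheme is issue queue occupancy: entries cannot be released at issue time, only once the hit is confirmed.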

11 Register Rename Logic
Figure (block diagram). Labels: Logical Source Regs; Logical Dest Regs; Logical Source Reg; Map Table; Dependence Check Logic; Mux; Free Pool; Physical Source Regs; Physical Dest Regs.

12 Map Table – RAM
Each entry holds a phys reg id (7 bits)
Num entries = Num logical regs
Shadow copies (shift register)

13 Map Table – CAM
Each entry holds a logical reg id (5 bits) and a 1-bit valid
Num entries = Num phys regs
Shadow copies
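
The two map table organizations can be contrasted with a small behavioral sketch. The sizes used below (32 logical and 128 physical registers) are an assumption inferred from the 5-bit and 7-bit field widths; the class names and the identity initial mapping are hypothetical, and shadow copies are not modeled.

```python
class RamMapTable:
    """One entry per logical reg; each entry stores a physical reg id."""

    def __init__(self, num_logical):
        self.table = list(range(num_logical))  # assumed initial mapping

    def lookup(self, lreg):
        return self.table[lreg]                # direct index

    def rename(self, lreg, preg):
        self.table[lreg] = preg


class CamMapTable:
    """One entry per physical reg; each entry stores a logical id and a valid bit."""

    def __init__(self, num_logical, num_physical):
        self.lreg = [p if p < num_logical else None for p in range(num_physical)]
        self.valid = [p < num_logical for p in range(num_physical)]

    def lookup(self, lreg):
        # Associative search: compare lreg against every valid entry.
        matches = [p for p, l in enumerate(self.lreg)
                   if self.valid[p] and l == lreg]
        assert len(matches) == 1, "exactly one valid mapping per logical reg"
        return matches[0]

    def rename(self, lreg, preg):
        self.valid[self.lookup(lreg)] = False  # retire the old mapping
        self.lreg[preg] = lreg
        self.valid[preg] = True


ram = RamMapTable(32)
cam = CamMapTable(32, 128)
for lreg, preg in [(1, 33), (2, 34), (1, 36)]:
    ram.rename(lreg, preg)
    cam.rename(lreg, preg)
```

Both tables return the same mappings; they differ in what is stored and searched, which is what drives their size and lookup cost.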

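14 Wakeup Logic
Figure (issue window entries). Each entry holds tagL, tagR, rdyL, and rdyR. The broadcast result tags tag1 ... tagIW are compared (==) against tagL and tagR, and the comparator outputs are ORed to set rdyL and rdyR.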
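
A behavioral sketch of the wakeup step, using the field names from the figure. `IQEntry` and `wakeup` are hypothetical names, and the entries below reuse the renamed instructions from the earlier Alpha 21264 slide with assumed initial ready bits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IQEntry:
    name: str
    tagL: int            # physical reg id of the left source operand
    tagR: int            # physical reg id of the right source operand
    rdyL: bool = False
    rdyR: bool = False


def wakeup(window: List[IQEntry], result_tags: List[int]) -> List[str]:
    """Compare every broadcast tag with both source tags of every entry."""
    for entry in window:
        # OR of the per-tag comparator outputs; a set ready bit stays set.
        entry.rdyL = entry.rdyL or any(t == entry.tagL for t in result_tags)
        entry.rdyR = entry.rdyR or any(t == entry.tagR for t in result_tags)
    return [e.name for e in window if e.rdyL and e.rdyR]


window = [
    IQEntry("P34 <- P33+P3", tagL=33, tagR=3, rdyR=True),   # P3 assumed ready
    IQEntry("P35 <- P33+P34", tagL=33, tagR=34),
]
ready_after_p33 = wakeup(window, [33])
ready_after_p34 = wakeup(window, [34])
```

Every entry compares against every broadcast tag each cycle, so the comparator count grows with both window size and issue width.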

15 Selection Logic
Figure (arbiter tree over the issue window). Labels: Issue window; req; grant; anyreq; enable; Arbiter cell.
For multiple FUs, sequential selectors are needed.
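
A flat behavioral sketch of selection, using the signal names from the figure. It assumes a fixed priority by position in the issue window (lowest index wins), which the slide does not specify, and it models the arbiter tree as a single function; `select_one` and `select_for_fus` are hypothetical names.

```python
def select_one(req, enable=True):
    """Grant at most one request; also report whether any request exists."""
    anyreq = any(req)
    grant = [False] * len(req)
    if enable and anyreq:
        grant[req.index(True)] = True   # lowest index wins (assumed priority)
    return grant, anyreq


def select_for_fus(req, num_fus):
    """Chain selectors: each one sees the requests not yet granted."""
    remaining = list(req)
    granted = []
    for _ in range(num_fus):
        grant, anyreq = select_one(remaining)
        if not anyreq:
            break
        winner = grant.index(True)
        granted.append(winner)
        remaining[winner] = False       # mask the winner for the next selector
    return granted


req = [False, True, False, True, True]
one_grant, anyreq = select_one(req)
two_fus = select_for_fus(req, 2)
four_fus = select_for_fus(req, 4)
```

Because each selector depends on the grants of the one before it, the delay of this chained arrangement grows with the number of FUs.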

16 Structure Complexities
Critical structures: register map tables, issue queue, LSQ, register file, register bypass.
Cycle time is heavily influenced by: window size (physical register size), issue width (#FUs).
Conflict between the desire to increase IPC and the desire to increase clock speed.

17 ILP Limits (Wall, 1993)


