CS6290 Speculation Recovery

Loose Ends
Up to now:
  – Techniques for handling register dependencies
      Register renaming for WAR, WAW
      Tomasulo’s algorithm for scheduling RAW
  – Branch prediction for control dependencies
Still need to address:
  – Memory dependencies
  – Mispredictions, Faults, Exceptions

Speculative Fetch/Non-Spec Exec
Fetch instructions, even fetch past a branch, but don’t exec right away
Pipeline: fetch, decode, issue, exec, write, commit
  1. DIV R1 = R2 / R3
  2. ADD R4 = R5 + R6
  3. BEQZ R1, foo
  4. SUB R5 = R4 – R3
  5. MUL R7 = R5 * R2
  6. MUL R8 = R7 * R3
[Timing diagram with columns I, E, W, C; per-instruction cycles not recoverable from the transcript]

Branch Prediction/Speculative Execution
When we hit a branch, guess if it’s T or NT (here: guess T)
Keep scheduling and executing instructions as if we knew the outcome was T
Sometime later, if we messed up…
  – Just throw it all out
  – And fetch the correct instructions
[Diagram: instruction stream containing a branch, with T and NT successor blocks labeled A, B, C, D and Q, R; layout not recoverable from the transcript]

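Again, with Speculative Execution
Assume fetched direction is correct
Pipeline: fetch, decode, issue, exec, write, commit
  1. DIV R1 = R2 / R3
  2. ADD R4 = R5 + R6
  3. BEQZ R1, foo
  4. SUB R5 = R4 – R3
  5. MUL R7 = R5 * R2
  6. MUL R8 = R7 * R3
[Timing diagram with columns I, E, W, C; per-instruction cycles not recoverable from the transcript]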

Branch Misprediction Recovery
ARF state corresponds to state prior to oldest non-committed instruction
As instructions are processed, the RAT corresponds to the register mapping after the most recently renamed instruction
On a branch misprediction, wrong-path instructions are flushed from the machine
The RAT is left with an invalid set of mappings corresponding to the wrong-path instruction state
[Diagram: instruction window with the branch (br), the ARF, and the RAT marked “?!?”]

Solution 1: Stall and Drain
Correct-path instructions arrive from fetch; can’t rename because RAT is wrong
Allow all instructions to execute and commit; ARF corresponds to last committed instruction
ARF now corresponds to the state right before the next instruction to be renamed (foo)
Reset RAT so that all mappings refer to the ARF
Resume renaming the new correct-path instructions from fetch
Pros: Very simple to implement
Cons: Performance loss due to stalls
[Diagram: instruction window with the branch (br), the ARF, the RAT marked “?!?”, and the stalled instruction foo]
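The drain-and-reset sequence can be sketched in a few lines of Python. This is a minimal illustration, not the slides' implementation: the dictionary-based ARF, the `(dest_reg, value)` ROB entries, and the name `stall_and_drain` are all assumptions made for the example, and the wrong-path entries are assumed to have been flushed already.

```python
def stall_and_drain(arf, rob):
    """Commit every remaining correct-path ROB entry, then rebuild the RAT.

    arf: dict mapping architectural register name -> committed value.
    rob: list of (dest_reg, value) pairs, oldest first, all finished
         executing (hypothetical layout chosen for this sketch).
    Returns a fresh RAT in which every register maps to the ARF.
    """
    for dest, value in rob:
        arf[dest] = value              # commit in program order
    rob.clear()                        # the window is now empty
    # After the drain, the ARF holds the only valid copy of each register.
    return {reg: ("ARF", reg) for reg in arf}


arf = {"R1": 10, "R2": 20, "R3": 30}
rob = [("R1", 11), ("R3", 33)]         # older instructions still in flight
rat = stall_and_drain(arf, rob)
print(arf)   # {'R1': 11, 'R2': 20, 'R3': 33}
print(rat["R1"])   # ('ARF', 'R1')
```

The cost noted on the slide shows up here as the loop: renaming cannot resume until every older instruction has committed.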

Solution 2: Checkpointing
At each branch, make a copy of the RAT (register mapping at the time of the branch)
On a misprediction:
  1. flush wrong-path instructions
  2. deallocate RAT checkpoints
  3. recover RAT from checkpoint
  4. resume renaming
[Diagram: instruction window with the branch (br) and foo, the ARF, the RAT, the checkpointed RAT copy, and the checkpoint free pool]
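A minimal Python sketch of RAT checkpointing follows. The class `RenameUnit`, its method names, and the integer physical-register numbering are hypothetical. Snapshotting the free list together with the RAT is a simplifying assumption of this sketch (commits that free registers between the branch and the misprediction are not modeled).

```python
class RenameUnit:
    def __init__(self, arch_regs, num_phys):
        # Initially architectural register i maps to physical register i.
        self.rat = {r: i for i, r in enumerate(arch_regs)}
        self.free = list(range(len(arch_regs), num_phys))
        self.checkpoints = []          # (branch_id, rat_copy, free_copy), oldest first

    def rename(self, dest):
        """Allocate a new physical register for dest and update the RAT."""
        phys = self.free.pop(0)
        self.rat[dest] = phys
        return phys

    def branch(self, branch_id):
        """Checkpoint the mapping as it stands at the branch."""
        self.checkpoints.append((branch_id, dict(self.rat), list(self.free)))

    def mispredict(self, branch_id):
        """Restore the branch's checkpoint; drop it and all younger ones."""
        idx = next(i for i, c in enumerate(self.checkpoints) if c[0] == branch_id)
        _, rat, free = self.checkpoints[idx]
        del self.checkpoints[idx:]     # deallocate checkpoints
        self.rat, self.free = rat, free


ru = RenameUnit(["R1", "R2", "R3"], num_phys=8)
ru.rename("R1")        # older instruction: R1 -> P3
ru.branch("br")        # checkpoint taken here
ru.rename("R2")        # wrong-path renames
ru.rename("R1")
ru.mispredict("br")
print(ru.rat)          # {'R1': 3, 'R2': 1, 'R3': 2}
```

Unlike stall-and-drain, recovery here is a copy rather than a wait, at the price of storing one RAT copy per in-flight branch.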

Speculative Execution is OK
ROB maintains program order
ROB temporarily stores results
  – If we screw something up, only the ROB knows, but no architected state is affected
Register rename recovery makes sure we resume with correct register mapping
If we mess up, we:
  1. Can undo the effects
  2. Know how to resume

Commit, Exceptions
What happens if a speculatively executed instruction faults?
Case 1: instructions A B C D E; A, B commit; a younger instruction raises “Divide by Zero!”
  – Outside world sees: A, B, fault!
  – Should have been: A, B, C, D, fault!
Case 2: instructions A B C D E; “Divide by Zero!” together with a branch mispredict (W, X)
  – Fault should never have happened!

Treat Fault Like a Result
Regular register results written to ROB until
  – Instruction is oldest for in-order state update
  – Instruction is known to be non-speculative
Do the same with faults!
  – At exec, make note of any faults, but don’t expose to outside world
  – If the instruction gets to commit, then expose the fault
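The idea of recording a fault in the ROB and exposing it only at commit can be sketched as follows. The `ROB` class, its dictionary entries, and the method names are hypothetical choices for this illustration; real hardware tracks far more state per entry.

```python
class ROB:
    def __init__(self):
        self.entries = []              # oldest first

    def allocate(self, name):
        entry = {"name": name, "done": False, "fault": None}
        self.entries.append(entry)
        return entry

    def complete(self, entry, fault=None):
        """Execution finished: note any fault, but do not expose it yet."""
        entry["done"] = True
        entry["fault"] = fault

    def flush_after(self, entry):
        """Branch mispredicted: drop everything younger than entry."""
        idx = next(i for i, e in enumerate(self.entries) if e is entry)
        del self.entries[idx + 1:]

    def commit(self):
        """Commit in order; return (committed names, fault exposed or None)."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["fault"] is not None:
                fault = head["fault"]
                self.entries.clear()   # flush the rest of the ROB
                return committed, fault
            committed.append(self.entries.pop(0)["name"])
        return committed, None


rob = ROB()
a, b, c, d, e = (rob.allocate(n) for n in "ABCDE")
rob.complete(e, fault="divide by zero")    # E executes early and faults
print(rob.commit())                        # ([], None): fault stays hidden
for x in (a, b, c, d):
    rob.complete(x)
print(rob.commit())                        # (['A', 'B', 'C', 'D'], 'divide by zero')
```

If a mispredicted branch older than the faulting instruction resolves first, `flush_after` removes the faulting entry and the fault is never seen.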

Example
Fault deferred until architecturally correct point. Other fault “never happened”…
  Inst  Instruction         Dest  Src1  Src2
  A     LOAD R1 = 0[R2]     P1    imm   R2
  B     ADD R3 = R1 + R4    P2    P1    R4
  C     SUB R1 = R5 – R6    P3    R5    R6
  D     DIV R4 = R7 / R1    P4    R7    P3
  E     LOAD R6 = 0[R7]     P5    imm   R7
[Animation over the ROB status columns E, C, F; annotations in order: Miss; Fault!; Divide by zero; Resolved; (commit); (3 commits); Now raise fault; Flush rest of ROB, start fetching fault handler]

Executing Memory Instructions
  Load R3 = 0[R6]
  Add R7 = R3 + R9
  Store R4 → 0[R7]
  Sub R1 = R1 – R2
  Load R8 = 0[R1]
[Timeline annotations, in order: Issue, Cache Miss!; Issue; Issue, Cache Hit!; Miss serviced…; Issue; But there was a later load…]
If R1 != R7, then Load R8 gets correct value from cache
If R1 == R7, then Load R8 should have gotten value from the Store, but it didn’t!

Out-of-Order Load Execution
So don’t let loads execute out-of-order!
[Dependence graph over instructions A to H, with out-of-order loads: ILP = 8/3 = 2.67]
[Same instructions with loads kept in order: ILP = 8/7 = 1.14]
Ok, maybe not a good idea.
No support for OOO load execution can reduce your ILP
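The two ILP figures are the instruction count divided by the length of the longest dependence chain. The Python sketch below reproduces the arithmetic on a made-up graph: the edges in `ooo` and the choice of B to F as loads are hypothetical (the slide's actual graph is not recoverable from the transcript) and were picked only so that the chain lengths come out to 3 and 7.

```python
from functools import lru_cache


def ilp(deps):
    """ILP = number of instructions / length of the longest dependence chain.

    deps maps each instruction to the list of instructions it must wait for.
    """
    @lru_cache(maxsize=None)
    def level(inst):
        # An instruction with no producers executes at level 1.
        return 1 + max((level(p) for p in deps[inst]), default=0)

    return len(deps) / max(level(i) for i in deps)


# Hypothetical graph: A computes an address, B..F are loads using it,
# G consumes B and H consumes F.
ooo = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"],
       "E": ["A"], "F": ["A"], "G": ["B"], "H": ["F"]}

# Same program, but each load must also wait for the previous load.
in_order = dict(ooo, C=["A", "B"], D=["A", "C"], E=["A", "D"], F=["A", "E"])

print(round(ilp(ooo), 2))        # 2.67
print(round(ilp(in_order), 2))   # 1.14
```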

What Else Could Happen?
[Five diagrams; labels and captions as they survive in the transcript]
  1. A, B, foo, bar, D$: No problem.
  2. A, B, foo: Some sort of data forwarding mechanism
  3. A, B, foo, D$: Uh oh. Value from cache is stale
  4. A, C, foo, B: Uh oh. Should have used B’s store value
  5. A, B, foo, bar: Uh oh. Luckily, this usually can’t even happen

Memory Disambiguation Problem
Why can’t this happen with non-memory insts?
  – Operand specifiers in non-memory insts are absolute
      “R1” refers to one specific location
  – Operand specifiers in memory insts are ambiguous
      “R1” refers to a memory location specified by the value of R1. As pointers change, so does this location.

Two Problems
Memory disambiguation
  – Are there any earlier unexecuted stores to the same address as myself? (I’m a load)
  – Binary question: answer is yes or no
Store-to-load forwarding problem
  – Which earlier store do I get my value from? (I’m a load)
  – Which later load(s) do I forward my value to? (I’m a store)
  – Non-binary question: answer is one or more instruction identifiers

Load Store Queue (LSQ)
[Table: LSQ entries ordered oldest to youngest, with columns L/S, PC, Seq, Addr, Value, shown next to the data cache. Most cell values are garbled in the transcript; legible fragments: S 0xF04C seq 41774; S 0xF85C seq 41779 (0x32900); L 0xF63C seq 41782 (0x33001)]
Disambiguation: loads cannot execute until all earlier store addresses computed
Forwarding: broadcast/search entire LSQ for match
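The two LSQ rules can be illustrated with a small Python sketch. The `Entry` fields, the function `try_load`, and the addresses in the usage example are hypothetical. The sketch assumes conservative disambiguation (a load waits for every earlier store address) and forwards from the youngest earlier store with a matching address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kind: str                      # "L" for load, "S" for store
    seq: int                       # program-order sequence number
    addr: Optional[int] = None     # None until the address is computed
    value: Optional[int] = None    # store data (None if not ready)


def try_load(lsq, load, cache):
    """Return the load's value, or None if the load must wait."""
    older_stores = [e for e in lsq if e.kind == "S" and e.seq < load.seq]
    # Disambiguation: every earlier store address must be known.
    if any(s.addr is None for s in older_stores):
        return None
    # Forwarding: search the queue for earlier stores to the same address.
    matches = [s for s in older_stores if s.addr == load.addr]
    if matches:
        # The youngest matching store holds the value the load must see
        # (None here means its data is not ready, so the load waits).
        return max(matches, key=lambda s: s.seq).value
    return cache.get(load.addr, 0)   # no conflict: read the data cache


cache = {0x3290: 7, 0x3300: 1}
lsq = [Entry("S", 1, 0x3290, 42), Entry("S", 2, None, 5),
       Entry("L", 3, 0x3290), Entry("L", 4, 0x3300)]
print(try_load(lsq, lsq[2], cache))   # None: store 2's address is unknown
lsq[1].addr = 0x3300                  # store 2 computes its address
print(try_load(lsq, lsq[2], cache))   # 42, forwarded from store 1
print(try_load(lsq, lsq[3], cache))   # 5, forwarded from store 2
```

Waiting on all earlier store addresses is the simplest safe policy; it trades some of the ILP discussed earlier for correctness without any recovery mechanism.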