CSE 502: Computer Architecture

Slides:



Advertisements
Similar presentations
CSE502: Computer Architecture Instruction Commit.
Advertisements

CS 6290 Instruction Level Parallelism. Instruction Level Parallelism (ILP) Basic idea: Execute several instructions in parallel We already do pipelining…
1 Lecture 11: Modern Superscalar Processor Models Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design.
CS6290 Speculation Recovery. Loose Ends Up to now: –Techniques for handling register dependencies Register renaming for WAR, WAW Tomasulo’s algorithm.
Lecture 7: Register Renaming. 2 A: R1 = R2 + R3 B: R4 = R1 * R R1 R2 R3 R4 Read-After-Write A A B B
Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.
Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)
CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
Computer Architecture 2011 – out-of-order execution (lec 7) 1 Computer Architecture Out-of-order execution By Dan Tsafrir, 11/4/2011 Presentation based.
1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )
Computer Architecture 2010 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
Lecture 13: Commit, Exceptions, Interrupts. Commit is typically the last stage of the pipeline Anything that an instruction does at this point is irrevocable.
Recall: Three I/O Methods Synchronous: Wait for I/O operation to complete. Asynchronous: Post I/O request and switch to other work. DMA (Direct Memory.
1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.
Operating Systems 1 K. Salah Module 1.2: Fundamental Concepts Interrupts System Calls.
Spring 2003CSE P5481 Precise Interrupts Precise interrupts preserve the model that instructions execute in program-generated order, one at a time If an.
Processes and Virtual Memory
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
OOO Pipelines - II Smruti R. Sarangi IIT Delhi 1.
1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
Translation Lookaside Buffer
CSL718 : Superscalar Processors
CS 6560: Operating Systems Design
Computer Architecture
Smruti R. Sarangi IIT Delhi
CSC 4250 Computer Architectures
Modeling Page Replacement Algorithms
CIS-550 Advanced Computer Architecture Lecture 10: Precise Exceptions
Swapping Segmented paging allows us to have non-contiguous allocations
Lecture: Out-of-order Processors
CS5100 Advanced Computer Architecture Hardware-Based Speculation
Smruti R. Sarangi Computer Science and Engineering, IIT Delhi
Module: Handling Exceptions
Sequential Execution Semantics
Lecture 6: Advanced Pipelines
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Smruti R. Sarangi IIT Delhi
ECE 2162 Reorder Buffer.
Smruti R. Sarangi Computer Science and Engineering, IIT Delhi
Lecture: Out-of-order Processors
Lecture 8: Dynamic ILP Topics: out-of-order processors
Adapted from the slides of Prof
15-740/ Computer Architecture Lecture 5: Precise Exceptions
Super-scalar.
Modeling Page Replacement Algorithms
Control unit extension for data hazards
Architectural Support for OS
CSE 451: Operating Systems Autumn 2005 Memory Management
Translation Buffers (TLB’s)
Adapted from the slides of Prof
Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
Translation Buffers (TLB’s)
Additional ILP Topics Prof. Eric Rotenberg
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
Control Hazards Branches (conditional, unconditional, call-return)
Architectural Support for OS
Translation Buffers (TLBs)
Lecture 9: Dynamic ILP Topics: out-of-order processors
COMP755 Advanced Operating Systems
Conceptual execution on a processor which exploits ILP
Review What are the advantages/disadvantages of pages versus segments?
CMSC 611: Advanced Computer Architecture
Spring 2019 Prof. Eric Rotenberg
Presentation transcript:

CSE 502: Computer Architecture Instruction Commit

The End of the Road (um… Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following sequential execution allowed E.g., wrong path instructions may not commit They do not exist in the sequential execution

Everything Should Appear In-Order ISA defines program execution in sequential order To the outside, CPU must appear to execute in order When is someone looking?

“Looking” at CPU State When OS swaps contexts OS saves the current program state (requires “looking”) Allows restoring the state later When program has a fault (e.g., page fault) OS steps in and “looks” at the “current” CPU state

Implementation in the CPU ARF keeps state corresponding to committed insns. Commit from ROB happens in order ARF always contains some RF state of sequential execution Whoever wants to “look” should look in ARF What about insns. that executed out of order?

Only the Sequential Part Matterns LSQ PRF RS fPC ROB Memory PC Memory RF PC RF Sequential View of the Processor State of the Superscalar Out-of-Order Processor What if there’s no ARF?

View of the Unified Register File If you need to “see” a register, you go through the aRAT first. sRAT PRF ARF aRAT

View of Branch Mispredictions Wrong-path instructions are flushed… architected state has never been touched ROB LSQ PRF RS fPC PC Memory ARF Mispredicted Branch Fetch correct path instructions Most of the remaining slides should be review for anyone who’s taken a class on basic out-of-order processor organizations. Which can update the architected state when they commit

Committing Instructions (1/2) “Retire” vs. “Commit” Sometimes people use this to mean the same thing Sometimes they mean different things Check the context! Insn. commits by making “effects” visible (A)RF, Memory/$, PC

Committing Instructions (2/2) When an insn. executes, it modifies processor state Update a register Update memory Update the PC To make “effects” visible, core copies values Value from Physical Reg to Architected Reg Value from LSQ to memory/cache Value from ROB to Architected PC

Reduces cost, lowers IPC due to constraints. Blocked Commit To commit N insns. per cycle, ROB needs N ports (in addition to ports for dispatch, issue, exec, and WB) Can’t reuse ROB entries until all in block have committed. Can’t commit across blocks. ROB for four commits Four read ports inst 1 inst 2 inst 3 inst 4 One wide read port ROB inst 1 inst 2 inst 3 inst 4 Reduces cost, lowers IPC due to constraints.

Faults CPU maintains appearance of sequential execution Divide-by-Zero, Overflow, Page-Fault All occur at a specific point in execution (precise) Divide may have executed before other instructions due to OoO scheduling! DBZ! Trap? (when?) DBZ! Trap (resume execution) CPU maintains appearance of sequential execution

Timing of DBZ Fault Need to hold on to your faults On a fault, flush the machine and switch to the kernel ROB Architected State RS Let earlier instructions commit Exec: DBZ The arch. state is the same as just before the divide executed in the sequential order Now, raise the DBZ fault and when you switch to the kernel, everything appears as it should Just make note of the fault, but don’t do anything (yet)

Speculative Faults Faults might not be faults… ROB Branch Mispredict (flush wrong-path) DBZ! The fault goes away Which is what we want, since in a sequential execution, the wrong-path divide would not have executed (and faulted) Buffer faults until commit to avoid speculative faults

Store TLB miss can stall the core Timing of TLB Miss Store must re-execute (or re-commit) Cannot leave the ROB TLB miss Trap … Walk page-table, may find a page fault (resume execution) Re-execute store Store TLB miss can stall the core

Load Faults are Similar Load issues, misses in TLB When load is oldest, switch to kernel for page-table walk …could be painful; there are lots of loads Modern processors use hardware page-table walkers OS loads a few registers with PT information (pointers) Simple logic fetches mapping info from memory Requires page-table format is specified by the ISA

Asynchronous Interrupts Some interrupts are not associated with insns. Timer interrupt I/O interrupt (disk, network, etc…) Low battery, UPS shutdown When the CPU “notices” doesn’t matter (too much) Key Pressed

Two Options for Handling Async Interrupts Handle immediately Use current architected state and flush the pipeline Deferred Stop fetching, let processor drain, then switch to handler What if CPU takes a fault in the mean time? Which came “first”, the async. interrupt or the fault?

Store Retirement (1/2) Stores forward to later loads (for same address) Normally, LSQ provides this facility D$ st ld D$ 17 At commit, store Updates cache D$ ld 17 st st 33 ld ld After store has left the LSQ, the D$ can provide the correct value

Store Retirement (2/2) Can’t free LSQ Store entry until write is done Enables forwarding until loads can get value from cache Have to re-check TLB when doing write TLB contents at Execute were speculative Store may stall commit for a long time If there’s a cache miss If there’s a TLB miss (with HW TLB walk) store All instructions may have successfully executed, but none can commit! Don’t we check DTLB during store-address computation anyway? Do we need to do it again here?

Writeback Buffer (1/2) Usually fast, but potential structural hazard Want to get stores out of the way quickly Even if store misses in cache, entering WB buffer counts as committing. Allows other insns. to commit. D$ ld store ld store WB buffer is part of the cache hierarchy. May need to provide values to later loads. WB Buffer Eventually, the cache update occurs, the WB buffer entry is emptied. Cache can now provide the correct value. Usually fast, but potential structural hazard

No one can “see” this store anymore! Writeback Buffer (2/2) Stores enter WB Buffer in program order Multiple stores can exist to same address Only the last store is “visible” No one can “see” this store anymore! Addr Value Load 42 next to write to cache 42 1234 oldest 13 -1 Store 42 8 90901 5678 42 youngest Load 42

Write Combining Buffer (1/2) Augment WBB to combine writes together Addr Value Only one writeback Now instead of two Load 42 42 5678 42 1234 Store 42 Load 42 If Stores to same address, combine the writes

Write Combining Buffer (2/2) Can combine stores to same cache line One cache write can serve multiple original store instructions $-Line Addr Cache Line Data 80 1234 5678 Store 84 Aggressiveness of write-combining may be limited by memory ordering model Benefit: reduces cache traffic, reduces pressure on store buffers Writeback/combining buffer can be implemented in/integrated with the MSHRs Only certain memory regions may be “write-combinable” (e.g., USWC in x86)

No WBB and no stall on Store commit Senior Store Queue Use STQ as WBB (not necessarily write combining) STQ Store DL1 L2 Store STQ tail STQ head Store STQ head Store STQ head Store Store Store While stores are completing, other accesses (loads, etc…) can continue getting the values from the “senior” STQ Store STQ tail Store Store New stores cannot allocate into Senior STQ entries until stores complete No WBB and no stall on Store commit

Retire Insn. retires by cleaning up all related state Besides updating architected state … needs to deallocate resources ROB/LSQ entries Physical register “colors” of various sorts RAT checkpoints Most are FIFO’s or Queues Alloc/dealloc is usually just inc/dec head/tail pointers Unified PRF requires a little more work Have to return “old” mapping to free list