Lecture 13: Commit, Exceptions, Interrupts. Commit is typically the last stage of the pipeline. Anything that an instruction does at this point is irrevocable.


1 Lecture 13: Commit, Exceptions, Interrupts

2 Commit is typically the last stage of the pipeline
Anything that an instruction does at this point is irrevocable
–therefore, only actions that correspond to a sequential execution can be allowed to pass this point
–e.g., wrong-path instructions may not commit because they do not exist in the sequential execution

3 The ISA defines program execution with respect to the sequential order of the instructions
To the outside world, the CPU must appear to execute in-order
What does it mean to “appear”?
–… when someone looks
–ok, so what does it mean to “look”?

4 Examples
–when the OS swaps a program out, it copies the current program state (requires “looking”) so the program can be resumed later
–when a program has a fault (e.g., page fault), again the OS usually steps in and it needs to “look” at the “current” CPU state

5 Divide-by-Zero
–only on a divide
Page fault
–only on a load or store?
–can occur anytime if instruction fetch is on an unmapped page
OS Task/Process Scheduling
–anytime

6 So what does it mean to have a proper state?
–one that corresponds to sequential execution
For a superscalar processor with degree N, the commit stage must be able to “retire” at least N instructions per cycle
–retirement includes updating processor state in the same fashion that an instruction would for sequential execution
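The in-order retirement rule above can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's design: `RobEntry`, `commit_cycle`, and the register names are hypothetical, and the ROB is modeled as a simple queue.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class RobEntry:
    name: str           # instruction label (illustrative only)
    dest: str           # architected destination register
    value: int          # result produced at execute
    done: bool = False  # set once execution has finished

def commit_cycle(rob, arf, width):
    """Retire up to `width` finished entries from the ROB head, in order."""
    retired = []
    while rob and len(retired) < width and rob[0].done:
        entry = rob.popleft()
        arf[entry.dest] = entry.value  # architected state changes only here
        retired.append(entry.name)
    return retired

rob = deque([
    RobEntry("A", "r1", 10, done=True),
    RobEntry("B", "r2", 20, done=True),
    RobEntry("C", "r3", 30, done=False),  # still executing
    RobEntry("D", "r4", 40, done=True),   # finished out of order
])
arf = {}
first = commit_cycle(rob, arf, width=4)   # D must wait behind C
print(first, arf)
```

Even with a commit width of 4, only A and B retire in this cycle: D has finished but sits behind the unfinished C, so the architected state always matches some point in the sequential order.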

7 [Figure: Scalar Commit Processor States vs. Superscalar Commit Processor States, for instructions A through H]
Each “state” in the superscalar machine always corresponds to one state of the scalar machine (but not necessarily the other way around), and the ordering of states is preserved

8 The architected register file (ARF) contains the state of the machine corresponding only to the committed instructions
–commit happens in-order, so the ARF states always correspond to some RF state of a sequential execution
–… this means anyone who wants to “look” will have to look in the ARF
–What about the other instructions that have executed out of order?

9 [Figure: Sequential View of the Processor (PC, RF, Memory) vs. State of the Superscalar Out-of-Order Processor (fPC, PC, RS, ROB, LSQ, PRF, RF, Memory)]
Just ignore everything else!

10 “Retire” vs. “Commit”
–sometimes people use these to mean the same thing
–sometimes they mean different things: check the context!
An instruction commits by making its “effects” known/visible to the architected state
–architected state: (A)RF, Memory/$, PC, other regs
–speculative state: everything else (ROB, RS, LSQ, etc.)

11 When an instruction executes, it modifies processor state
–update a register
–update memory
–update the PC (almost all instructions do this)
To “make appear”, the processor just copies values
–copy value from Physical Reg to Architected Reg
–copy value from LSQ to memory/cache
–copy value from ROB to Architected PC

12 To commit N instructions per cycle, the ROB needs to support N read ports
–(in addition to other ports such as for reading from the PRF, write ports for allocation, and write ports for updates/writebacks)
[Figure: ROB holding inst 1 to inst 4, read through four read ports for four commits]
[Figure: ROB holding inst 1 to inst 4, read through one wide read port]
With one wide read port, can’t reuse (reallocate) any of these ROB entries until all have committed
Helps to keep ROB port requirements under control, at cost of slight IPC loss due to occasional alloc stalls

13 If any N instructions can commit per cycle, may require heavy multi-porting of other structures
–Stores: need N extra DL1 write ports, N extra DTLB read ports
–Branches: need N branch predictor update ports; also need to deallocate N RAT checkpoints
–Others: if processor has memory dependence predictor, then may need N update ports for loads
Solution: limit maximum number of commits per cycle of special types of instructions (e.g., max one branch per cycle)
Don’t we need to check DTLB during store-address computation anyway? Do we need to do it again here?

14 ROB contains uops, but outside world only knows about macro-ops
[Figure: ROB contents for ADD EAX, EBX (uop 1: ADD), SUB EBX, ECX (uop 1: SUB), ADD EDX, [EAX] (uop 1: LD, uop 2: ADD), and POPA (uop 1 to uop 8); the commit pointer sits partway through the POPA flow]
If we take an interrupt right now, we’ll see a half-executed instruction!

15 Works fine when uop-flow length ≤ commit width
What to do with long flows?
–In all cases: can’t commit until all uops in a flow have completed
–Just commit N uops per cycle
–... but make commit uninterruptible
[Figure: ROB holding uop 1 to uop 8 of POPA; a timer interrupt arrives while the flow is committing]
Timer interrupt! Defer: can’t act on this yet...
Once the whole flow has committed: now do something about the interrupt.
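The deferral rule above (commit N uops per cycle, but act on an interrupt only between macro-ops) can be sketched as a small simulation. Everything here is an assumption made for illustration: the function name `run_commit`, the uop counts, and the simplification that every uop has already finished executing.

```python
def run_commit(flows, width, interrupt_cycle):
    """Commit uop flows `width` uops per cycle; take the interrupt only
    at a macro-op boundary.

    flows: list of (macro_op_name, uop_count), oldest first.
    Returns (cycle the interrupt is taken or None, macro-ops committed).
    """
    committed = []   # macro-ops that are architecturally complete
    pos = 0          # index of the macro-op at the commit point
    done_uops = 0    # uops of that macro-op already committed
    cycle = 0
    while pos < len(flows):
        cycle += 1
        budget = width
        while budget > 0 and pos < len(flows):
            # A pending interrupt is acted on only between macro-ops.
            if cycle >= interrupt_cycle and done_uops == 0:
                return cycle, committed
            name, total = flows[pos]
            take = min(budget, total - done_uops)
            done_uops += take
            budget -= take
            if done_uops == total:
                committed.append(name)
                pos += 1
                done_uops = 0
    return None, committed

# Hypothetical stream: an 8-uop POPA between two single-uop instructions.
result = run_commit([("ADD", 1), ("POPA", 8), ("SUB", 1)],
                    width=4, interrupt_cycle=2)
print(result)
```

In this run the interrupt arrives in cycle 2 while POPA is half committed, so it is deferred until cycle 3, when the last POPA uop commits; SUB is not committed before the handler runs.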

16 Ex. REP STOSB (memset with AL’s value, the low byte of EAX)
–The entire sequence is effectively one x86 instruction
–What if this REP’s for 1,000,000,000 iterations?
Does the CPU “lock up” for a billion or so cycles while we wait for the entire instruction to commit?
We also can’t wait until the entire instruction has completed to commit, because we can’t even fetch the entire instruction!
At the ISA level, REP iterations are interruptible...
–so treat each iteration as a separate “macro-op”

17 MOV EDI, ; the array we want to memset
SUB EAX, EAX ; zero
CLD ; clear direction flag (REP fwd)
MOV ECX, 4 ; do for 4 iterations
REP STOSB ; memset!
ADD EBX, EDX ; unrelated instruction

Resulting uop stream:
MOV EDI, xxx
SUB EAX, EAX
CLD
MOV ECX, 4
uCMP ECX, 0 ; check for zero iterations (could happen with MOV ECX, 0)
uJCZ
STA tmp, EDI ; STOS flow, 1st iteration
STD EAX, tmp
SUB ECX, 1 ; REP overhead for 1st iteration
ADD EDI, 1
uCMP ECX, 0
uJCZ ; interruptible point A
STA tmp, EDI ; STOS flow, 2nd iteration
STD EAX, tmp
SUB ECX, 1 ; REP overhead for 2nd iteration
ADD EDI, 1
uCMP ECX, 0
uJCZ ; interruptible point B
STA tmp, EDI ; STOS flow, 3rd iteration
STD EAX, tmp
SUB ECX, 1 ; REP overhead for 3rd iteration
ADD EDI, 1
uCMP ECX, 0
uJCZ ; interruptible point C
STA tmp, EDI ; STOS flow, 4th iteration
STD EAX, tmp
SUB ECX, 1 ; REP overhead for 4th iteration
ADD EDI, 1
uCMP ECX, 0
uJCZ ; interruptible point D
ADD EBX, EDX

All of these are interruptible points (commit can stop and effects be seen by outside world), since they all have well-defined ISA-level states:
A: ECX=3, EDI = ptr+1
B: ECX=2, EDI = ptr+2
C: ECX=1, EDI = ptr+3
D: ECX=0, EDI = ptr+4
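As a cross-check of the A to D states listed above, here is a small Python model of the per-iteration behavior of REP STOSB with the direction flag cleared. The function name `rep_stosb_states` and the starting pointer `0x1000` are made up for the example; memory is modeled as a dictionary.

```python
def rep_stosb_states(ecx, edi, al, memory):
    """Return the ISA-visible (ECX, EDI) pair after each iteration."""
    states = []
    while ecx != 0:        # uCMP ECX, 0 / uJCZ
        memory[edi] = al   # STA / STD: store AL at [EDI]
        ecx -= 1           # SUB ECX, 1
        edi += 1           # ADD EDI, 1 (forward direction)
        states.append((ecx, edi))
    return states

ptr = 0x1000               # assumed array address
memory = {}
states = rep_stosb_states(ecx=4, edi=ptr, al=0, memory=memory)
for label, (ecx, edi) in zip("ABCD", states):
    print(f"{label}: ECX={ecx}, EDI=ptr+{edi - ptr}")
```

Each tuple is a state the OS could legitimately observe: an interrupt taken at any of these points leaves ECX and EDI describing exactly the work that remains.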

18 Stores need to forward values to later loads from the same address
–Normally, LSQ provides this facility
[Figure: LSQ holding interleaved loads and stores (labels 17 and 33) in front of the D$]
At commit, store updates cache
After store has left the LSQ, the D$ can provide the correct value

19 Store shouldn’t vacate LSQ entry until actual writeback has completed
–this way it can continue to forward values until it is guaranteed that later loads can get the value from the cache
This can stall commit
–what if there’s a cache miss?
–what if there’s a TLB miss?
[Figure: a store blocking commit]
All instructions may have successfully executed, but none can commit!

20 Want to get stores out of the way
[Figure: a store moves into the WB Buffer next to the D$; later loads (ld) can be served by either]
Even if the store misses in cache, entering the WB buffer counts as committing. This allows other insts to commit as well.
The WB buffer is part of the cache hierarchy now. It may need to provide values to later loads.
Eventually, the cache update occurs and the WB buffer entry is emptied.
The cache can now provide the correct value.

21 The WBB is part of the “Architected” state
[Figure: ld and st paths into the D$ and WBB, with a cache coherence interface to other CPUs]
Loads must check both structures for a hit
–more routing to get request to both places
–yet another possible bypass source
–also want to normalize latencies for easier scheduling
As an “official” write, the store must also be visible to other processors
Processor stalls if there are no available writeback buffer entries

22 Stores enter WB Buffer in program order
If there are multiple stores to the same address, only the last one is “visible”
[Figure: WB buffer with Addr/Value entries ordered oldest to youngest; the oldest entry is next to write to cache. An older entry for address 42 holds 1234; Store 42 adds a younger entry holding 5678, which is what Load 42 sees]
No one can “see” the older store anymore!

23 Augment WBB to combine writes together
[Figure: WB buffer entry for address 42 holding 1234; Store 42 overwrites it with 5678, and Load 42 reads the combined entry]
If writing to same address, just combine the writes (similar to writeback cache)
Now only one writeback instead of two
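A minimal sketch of a combining writeback buffer, assuming a dictionary keyed by address and a cache modeled as another dictionary. The class `WritebackBuffer` and its methods are hypothetical names; a real design must also respect the memory ordering model, which this sketch ignores.

```python
class WritebackBuffer:
    def __init__(self):
        # addr -> value; Python dicts keep insertion order (oldest first)
        self.entries = {}

    def store(self, addr, value):
        # A second store to the same address overwrites the entry in place.
        self.entries[addr] = value

    def load(self, addr, cache):
        # Loads must check the buffer as well as the cache.
        if addr in self.entries:
            return self.entries[addr]
        return cache.get(addr)

    def drain_one(self, cache):
        # Write the oldest entry back to the cache and free it.
        addr = next(iter(self.entries))
        cache[addr] = self.entries.pop(addr)
        return addr

cache = {42: 0}
wbb = WritebackBuffer()
wbb.store(42, 1234)
wbb.store(42, 5678)          # combined with the older store
seen = wbb.load(42, cache)   # served from the buffer, not the stale cache

writebacks = 0
while wbb.entries:
    wbb.drain_one(cache)
    writebacks += 1
print(seen, writebacks, cache[42])
```

Two stores to address 42 cost a single cache write, and a load issued before the drain still observes the youngest value.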

24 Can combine stores to same cache line
[Figure: WB buffer entry with $-Line Addr 80 and Cache Line Data 1234; Store 84 merges 5678 into the same line]
One cache write can serve multiple original store instructions
Benefit: reduces cache traffic, reduces pressure on store buffers
Aggressiveness of write-combining may be limited by memory ordering model
Writeback/combining buffer can be implemented in/integrated with the MSHRs
Only certain memory regions may be “write-combinable” (e.g., USWC in x86)
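Combining at cache-line granularity can be sketched by grouping byte stores under their line base address. The 64-byte line size, the function name `combine_by_line`, and the sample addresses are assumptions for illustration only.

```python
LINE_BYTES = 64  # assumed cache-line size

def combine_by_line(stores):
    """Group (addr, byte) stores into one pending write per cache line."""
    lines = {}
    for addr, byte in stores:
        base = addr - addr % LINE_BYTES          # line-aligned address
        lines.setdefault(base, {})[addr - base] = byte
    return lines

pending = combine_by_line([(0x80, 0x12), (0x84, 0x56), (0x81, 0x34)])
print(len(pending), pending)
```

Three store instructions collapse into one pending line write, which is the traffic reduction the slide describes.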

25 Use STQ as WBB (not necessarily write combining)
[Figure: STQ with head and tail pointers in front of the DL1 and L2, shown before and after a store commits]
While stores are completing, other accesses (loads, snoops) can continue getting the values from the “senior” STQ
New stores cannot allocate into senior STQ entries until stores complete
Can continue allocating stores into the remaining entries
Benefit: don’t need a separate WBB, and we don’t need to stall commit on stores either

26 Many stores write the same value to memory that’s already there
[Figure: Store 42 to the D$]
This is a “silent store”
Ideally, this store shouldn’t have to happen
Silent stores are another way to exploit value locality; they are like the “dual” of value-predictable loads that always (frequently) load the same value

27 Store executes as a Load
[Figure: pipeline stages Fetch, Decode, Rename, Dispatch, Schedule, Execute, WB, Commit; on a D$ hit the loaded data is compared (=) against the store data to decide “Silent?”]
Check load value against store data for equality
If equal, then store is silent: suppress store at commit
Pro: stores can commit faster (reduces commit-time cache port pressure)
Con: more total accesses
–every store converted to load
–non-silent still has to store
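The silent-store check can be sketched as two steps, one at execute and one at commit. The function names are hypothetical, the cache is a plain dictionary, and the sketch assumes nothing else writes the location between execute and commit, which a real design would have to verify.

```python
def execute_store(cache, addr, data):
    """At execute, the store reads the cache like a load and compares."""
    return cache.get(addr) == data   # True means the store is silent

def commit_store(cache, addr, data, silent):
    """At commit, write the cache unless the store was found silent."""
    if silent:
        return False                 # write suppressed
    cache[addr] = data
    return True

cache = {0x40: 42}
silent = execute_store(cache, 0x40, 42)                 # same value
wrote_silent = commit_store(cache, 0x40, 42, silent)
loud = execute_store(cache, 0x40, 7)                    # different value
wrote_loud = commit_store(cache, 0x40, 7, loud)
print(silent, wrote_silent, loud, wrote_loud, cache)
```

The trade-off from the slide is visible here: both stores paid for a read at execute, but only the non-silent one needs a cache write port at commit.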

28 Besides updating architected state, commit needs to deallocate microarchitecture resources
–ROB/LSQ entries
–Physical register
–“colors” of various sorts if used
–RAT checkpoints
Most are FIFOs or queues, so alloc/dealloc is usually just inc/dec head/tail pointers
Unified PRF requires a little more work

29 … but not necessary
Necessary conditions for retirement:
1. Finished
–result produced as defined by ISA semantics
2. No WAR hazard
–in the architected register file
3. No unresolved branches
4. No unresolved exceptions
5. No memory ordering violations
6. No memory consistency problems

30 [Figure from Bell and Lipasti, “Deconstructing Commit,” in ISPASS’04: % of ROB, broken down by None, Not Finished, Earlier Branch, WAR, Combination, Earlier agen]
Many instructions could commit (all requirements satisfied) but don’t due to in-order constraint


32 [Figure: LSQ, PRF, RS, fPC, ROB, PC, Memory, ARF]
If you want to peek in, all you get to see are the (arch) register file, the PC and the cache state (incl. WB buffers)
But what if there’s no ARF?

33 [Figure: PRF, sRAT, aRAT, ARF]
If you need to “see” a register, you go through the aRAT first.
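One way to picture a machine with no separate ARF is a single physical register file indexed through two mapping tables. The sketch below is an assumption-laden toy: the class `RenamedRegs`, its free-list policy, and the register names are invented for illustration and are not taken from the lecture.

```python
class RenamedRegs:
    def __init__(self, arch_regs, num_phys):
        self.prf = [0] * num_phys
        # Speculative map, updated at rename.
        self.rat = {r: i for i, r in enumerate(arch_regs)}
        # Architected map, updated only at commit.
        self.arat = dict(self.rat)
        self.free = list(range(len(arch_regs), num_phys))

    def rename_dest(self, reg):
        preg = self.free.pop(0)      # allocate a new physical register
        self.rat[reg] = preg
        return preg

    def commit(self, reg, preg):
        self.free.append(self.arat[reg])  # old committed copy is now dead
        self.arat[reg] = preg

    def read_arch(self, reg):
        # "Looking" at a register goes through the aRAT first.
        return self.prf[self.arat[reg]]

regs = RenamedRegs(["eax", "ebx"], num_phys=4)
regs.prf[0] = 7                      # committed value of eax
p = regs.rename_dest("eax")          # speculative writer of eax
regs.prf[p] = 99
before = regs.read_arch("eax")       # still the committed value
regs.commit("eax", p)
after = regs.read_arch("eax")
print(before, after)
```

Until the writer commits, an observer going through the aRAT keeps seeing the old value, even though the new one already sits in the PRF.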

34 [Figure: ROB, LSQ, PRF, RS, fPC, PC, Memory, ARF, with a mispredicted branch in the ROB]
Mispredicted branch: wrong-path instructions are flushed… architected state has never been touched
Fetch correct-path instructions
–which can update the architected state when they commit

35 Div-by-Zero, Overflow, Page-Fault
All occur at a specific point in execution (precise)
CPU must maintain the appearance of sequential execution
[Figure: sequential execution: DBZ!, Trap, (resume execution)]
[Figure: out-of-order execution: DBZ! Trap? (when?)]
Divide may have executed before other instructions due to OOO scheduling!

36 Need to hold on to your faults
[Figure: RS and ROB in front of the Architected State; a divide in the ROB is marked “Exec: DBZ”]
Just make note of the fault, but don’t do anything (yet)
Let earlier instructions commit
Now, the arch. state is the same as just before the divide executed in the sequential order
Now, raise the DBZ fault and when you switch to the kernel, everything appears as it should
On a fault, flush the machine and switch to the kernel

37 Faults might not be faults…
[Figure: ROB with a mispredicted branch older than a divide marked DBZ!; the wrong path is flushed]
The fault goes away, too
Which is what we want since in a sequential execution, the wrong-path divide would not have executed (and faulted)
Buffering faults until commit filters out speculative faults
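Buffering faults in the ROB and raising them only at commit can be sketched as follows. `Entry`, `commit_until_fault`, and `flush_after` are hypothetical names, and the ROB is a simple queue in which every instruction has already executed.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    name: str
    done: bool = True
    fault: Optional[str] = None   # noted at execute, acted on at commit

def commit_until_fault(rob):
    """Commit in order; raise a buffered fault only when it reaches the head."""
    committed = []
    while rob and rob[0].done:
        head = rob[0]
        if head.fault:
            rob.clear()           # flush the faulting and younger instructions
            return committed, (head.name, head.fault)
        committed.append(rob.popleft().name)
    return committed, None

def flush_after(rob, branch_name):
    """Squash everything younger than a mispredicted branch."""
    keep = [e.name for e in rob].index(branch_name) + 1
    while len(rob) > keep:
        rob.pop()

# Case 1: a real fault. Older instructions commit first, then the trap.
rob1 = deque([Entry("add"), Entry("div", fault="DBZ"), Entry("sub")])
real = commit_until_fault(rob1)

# Case 2: the divide was on the wrong path, so its fault disappears.
rob2 = deque([Entry("br"), Entry("div", fault="DBZ")])
flush_after(rob2, "br")
speculative = commit_until_fault(rob2)
print(real, speculative)
```

In the first case the kernel sees state exactly as of just before the divide; in the second, the fault never reaches the head of the ROB and is never raised.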

38 Store TLB miss can stall the processor
Store must re-execute (or re-commit, really), but this means it cannot leave the ROB
[Figure: TLB miss, Trap, walk page-table (may find a page fault), (resume execution), re-execute store]

39 Load issues, misses in TLB
Must wait! When load is oldest, allow switch to kernel to do page-table walk
…could be painful; there are lots of loads
Some processors have hardware page-table walkers
–OS loads a few registers with PT information (pointers) and then simple logic fetches mapping info from memory
OS must support particular PT format to make use of HW page-table walker

40 Some interrupts are not associated with any particular instruction
–timer interrupt
–disk, network interrupts (I/O)
–low battery, UPS shutdown
When the CPU “notices” doesn’t matter (too much)
[Figure: “Key Pressed” shown three times]

41 Handle right away
–Just use the current architected state, and flush the pipeline
Deferred
–Stop fetching, let the processor drain, then switch to the interrupt handler
What if CPU takes a fault in the meantime? Which came “first”, the external interrupt or the fault?

