EECS 470 ILP and Exceptions Lecture 7 Coverage: Chapter 3.


1 EECS 470 ILP and Exceptions Lecture 7 Coverage: Chapter 3

2 Optimizing CPU Performance
Golden Rule: t_CPU = N_inst * CPI * t_CLK
Given this, what are our options?
–Reduce the number of instructions executed
–Reduce the cycles to execute an instruction
–Reduce the clock period
Our first focus: Reducing CPI
–Approach: Instruction Level Parallelism (ILP)
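A minimal sketch of the Golden Rule as arithmetic. The function name and the sample numbers (one million instructions, a 1 ns clock, CPI of 2.0 versus 1.0) are illustrative assumptions, not values from the lecture.

```python
def cpu_time(n_inst, cpi, t_clk):
    """Execution time in seconds: N_inst * CPI * t_CLK."""
    return n_inst * cpi * t_clk

# Hypothetical workload: same instruction count and clock, lower CPI.
base = cpu_time(1_000_000, 2.0, 1e-9)
better = cpu_time(1_000_000, 1.0, 1e-9)
speedup = base / better  # halving CPI halves execution time
print(base, better, speedup)
```

With the instruction count and clock period held fixed, the speedup equals the ratio of the two CPI values, which is why reducing CPI is a direct lever on performance.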

3 Why ILP?
[Figure: comparison labeled "Vs."]
Requirements
–Parallelism
–Large window
–Limited control deps
–Eliminate “false” deps
–Find run-time deps

4 How Much ILP is There?

5 How Large Must the “Window” Be?

6 ALU Operation GOOD, Branch BAD
Expected Number of Branches Between Mispredicts
E(X) ~ 1/(1-p), where p is the branch prediction accuracy
E.g., p = 95%, E(X) ~ 20 brs, 100-ish insts
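The estimate above can be checked with a short sketch. The figure of roughly 5 instructions per branch is an assumption inferred from the slide's "20 brs, 100-ish insts"; the function names are hypothetical.

```python
def branches_between_mispredicts(p):
    """Expected branches between mispredicts, E(X) ~ 1/(1-p), for accuracy p."""
    return 1.0 / (1.0 - p)

def insts_between_mispredicts(p, insts_per_branch=5):
    """Scale by an assumed average number of instructions per branch."""
    return branches_between_mispredicts(p) * insts_per_branch

for p in (0.90, 0.95, 0.99):
    print(p, branches_between_mispredicts(p), insts_between_mispredicts(p))
```

The useful window grows quickly as accuracy approaches 100%, since the run length depends on 1/(1-p) rather than on p itself.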

7 How Accurate are Branch Predictors?

8 Impact of Physical Storage Limitations
Each instruction “in flight” must have storage for its result
–Really worse than this because of misspeculation…

9 Registers GOOD, Memory BAD
Benefits of registers
–Well described deps
–Fast access
–Finite resource
Memory loses these benefits for flexibility
   *p = …
   *q = …
   … = *p ?
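The `*p` / `*q` question can be illustrated with a small sketch that models memory as a Python list and pointers as indices. The function name and the stored values 1 and 2 are hypothetical; the point is only that the value loaded from `*p` depends on whether `p` and `q` alias, which is not known until the addresses are computed at run time.

```python
def store_store_load(p, q):
    """Model: *p = 1; *q = 2; return *p. Addresses are list indices."""
    mem = [0] * 8
    mem[p] = 1     # *p = ...
    mem[q] = 2     # *q = ...
    return mem[p]  # ... = *p

print(store_store_load(3, 4))  # distinct addresses: the load sees the first store
print(store_store_load(3, 3))  # aliased addresses: the load sees the second store
```

With registers, the dependence is visible in the instruction encoding; with memory, hardware has to compare addresses to discover it.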

10 “Bottom Line” for an Ambitious Design

11 First Optimization: Out-of-Order Writeback

12 Playing by the Rules: In-order Writeback
DIV.D  IF  ID  D1  D2  D3  D4  D5  MEM WB
ADD        IF  ID  EX  MEM WB

13 Playing by the Rules: In-order Writeback
DIV.D  IF  ID  D1  D2  D3  D4  D5  MEM WB
ADD        IF  ID  EX  MEM WB
What’s wrong with this picture? Divide by Zero!

14 Playing by the Rules: In-order Writeback
DIV.D  IF  ID  D1  D2  D3  D4  D5  MEM WB
ADD        IF  ID  EX  MEM WB
What’s wrong with this picture? Divide by Zero!
DIV.D  IF  ID  D1  D2  D3  D4  D5  MEM WB
ADD        IF  ID  EX  MEM WB   (stall)
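A rough sketch of the writeback timing in these diagrams. It assumes one instruction is fetched per cycle, that DIV.D has five divide stages (D1 to D5), and that in-order writeback simply delays a younger instruction's WB until the cycle after the older one's; it does not model where in the pipeline the stall is inserted. All names are hypothetical.

```python
DIV_D = ["IF", "ID", "D1", "D2", "D3", "D4", "D5", "MEM", "WB"]
ADD = ["IF", "ID", "EX", "MEM", "WB"]

def writeback_cycles(program, in_order):
    """Return the WB cycle of each instruction (fetch starts at cycle 1)."""
    wb = []
    for i, stages in enumerate(program):
        cycle = (i + 1) + len(stages) - 1  # WB cycle with no stalls
        if in_order and wb:
            cycle = max(cycle, wb[-1] + 1)  # wait for the older WB
        wb.append(cycle)
    return wb

print(writeback_cycles([DIV_D, ADD], in_order=False))  # [9, 6]
print(writeback_cycles([DIV_D, ADD], in_order=True))   # [9, 10]
```

Without the stall, ADD updates its register in cycle 6, three cycles before DIV.D can report a divide-by-zero, so the architectural state already reflects an instruction that follows the faulting one.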

15 Another Way to Get in the Same Mess
Many systems use microcode
–Simplifies mapping of complex instructions to CPU resources
IA-32 add-with-carry
–ADC (EAX),EBX
   tmp = MEM[EAX]
   tmp = tmp + EBX + CF, update CF   (Side Effect!)
   MEM[EAX] = tmp                    (Potential Fault!)
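The hazard in the micro-op sequence can be sketched as follows. This is a toy model under stated assumptions: memory is a dict, the `writable` set stands in for page protection, `PageFault` is a hypothetical exception class, and only the store is allowed to fault.

```python
class PageFault(Exception):
    """Hypothetical stand-in for a memory fault on the store."""

def adc_mem(mem, regs, flags, writable):
    """Toy micro-op sequence for ADC (EAX),EBX on 32-bit values."""
    addr = regs["EAX"]
    tmp = mem[addr]                                  # tmp = MEM[EAX]
    total = tmp + regs["EBX"] + flags["CF"]
    flags["CF"] = 1 if total > 0xFFFFFFFF else 0     # side effect: CF updated
    tmp = total & 0xFFFFFFFF
    if addr not in writable:
        raise PageFault(addr)                        # fault after CF changed
    mem[addr] = tmp                                  # MEM[EAX] = tmp

mem = {0x10: 0xFFFFFFFF}
regs = {"EAX": 0x10, "EBX": 1}
flags = {"CF": 0}
try:
    adc_mem(mem, regs, flags, writable=set())
except PageFault:
    print("fault; CF is now", flags["CF"], "memory still", hex(mem[0x10]))
```

After the fault, memory is unchanged but CF has already been overwritten, so simply restarting the instruction would add a different carry-in than the original execution saw.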

16 Exceptions and Interrupts
Exception Type      Sync/Async   Maskable?   Restartable?
I/O request         Async        Yes
System call         Sync         No          Yes
Breakpoint          Sync         Yes
Overflow            Sync         Yes
Page fault          Sync         No          Yes
Misaligned access   Sync         No          Yes
Memory Protect      Sync         No          Yes
Machine Check       Async/Sync   No
Power failure       Async        No

17 Solution: Precise Interrupts
Implementation approaches
–Don’t
   E.g., Cray-1
–Force in-order WB
   E.g., ARM SA-1
–Force in-order checks
   E.g., Alpha 21064
–Buffer speculative results
   E.g., P4, Alpha 21264
   History buffer
   Future file/Reorder buffer
[Figure labels: Instructions Completely Finished; No Instruction Has Executed At All; PC; Precise State; Speculative State]

18 Precise Interrupts via the Reorder Buffer
@ Alloc
–Allocate result storage at Tail
@ Sched
–Get inputs (ROB T-to-H then ARF)
–Wait until all inputs ready
@ WB
–Write results/fault to ROB
–Indicate result is ready
@ CT
–Wait until inst @ Head is done
–If fault, initiate handler
–Else, write results to ARF
–Deallocate entry from ROB
Reorder Buffer (ROB)
–Circular queue of spec state
–May contain multiple definitions of same register
[Figure: pipeline stages IF, ID, Alloc, Sched, EX, MEM, CT, with the ROB (Head, Tail) and the ARF; ROB entry fields: PC, Dst regID, Dst value, Except?; labels: In-order, Any order]
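Because the ROB may hold several definitions of the same register, the "ROB T-to-H then ARF" lookup at Sched has to find the youngest one. A minimal sketch, assuming each entry is a dict with hypothetical keys `dst`, `value`, and `ready`, and that the list holds only entries older than the instruction doing the lookup, ordered Head to Tail:

```python
def read_operand(reg, rob_entries, arf):
    """Return (value, ready) for reg: youngest ROB definition, else the ARF."""
    for entry in reversed(rob_entries):   # scan Tail-to-Head
        if entry["dst"] == reg:
            return entry["value"], entry["ready"]
    return arf[reg], True                 # no in-flight writer: ARF is current

arf = {"r2": 6, "r3": 5}
rob = [
    {"dst": "r3", "value": 11, "ready": True},     # older definition of r3
    {"dst": "r3", "value": None, "ready": False},  # younger, still executing
]
print(read_operand("r3", rob, arf))  # youngest r3 is not ready, so the reader waits
print(read_operand("r2", rob, arf))  # no ROB writer, value comes from the ARF
```

Scanning from the Tail is what makes the younger definition win; scanning from the Head would return the stale value 11.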

19 Reorder Buffer Example
Code Sequence
   f1 = f2 / f3
   r3 = r2 + r3
   r4 = r3 – r2
Initial Conditions
   - reorder buffer empty
   - f2 = 3.0
   - f3 = 2.0
   - r2 = 6
   - r3 = 5
ROB over Time (H = Head, T = Tail)
   Snapshot 1: regID: f1 result: ? Except: ?
   Snapshot 2: regID: f1 result: ? Except: ? | regID: r3 result: ? Except: ?
   Snapshot 3: regID: f1 result: ? Except: ? | regID: r3 result: 11 Except: N | regID: r4 result: ? Except: ?
   Also shown in the figure: regID: r8 result: 2 Except: n

20 Reorder Buffer Example
Code Sequence
   f1 = f2 / f3
   r3 = r2 + r3
   r4 = r3 – r2
Initial Conditions
   - reorder buffer empty
   - f2 = 3.0
   - f3 = 2.0
   - r2 = 6
   - r3 = 5
ROB over Time (H = Head, T = Tail)
   Snapshot 1: regID: f1 result: ? Except: ? | regID: r3 result: 11 Except: n | regID: r4 result: 5 Except: n
   Snapshot 2: regID: f1 result: ? Except: y | regID: r3 result: 11 Except: n | regID: r4 result: 5 Except: n
   Snapshot 3: regID: f1 result: ? Except: y | regID: r3 result: 11 Except: n | regID: r4 result: 5 Except: n
   Also shown in the figure: regID: r8 result: 2 Except: n

21 Reorder Buffer Example
Code Sequence
   f1 = f2 / f3
   r3 = r2 + r3
   r4 = r3 – r2
Initial Conditions
   - reorder buffer empty
   - f2 = 3.0
   - f3 = 2.0
   - r2 = 6
   - r3 = 5
ROB over Time (H = Head, T = Tail)
   Snapshot 1: (empty)
   Snapshot 2: first inst of fault handler
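The Alloc, WB, and CT steps and the example sequence can be sketched with a small model. This is an illustrative toy, not the lecture's implementation: the `ROB` class, its method names, and the use of a `deque` in place of a fixed-size circular queue are assumptions, and the divide is marked as faulting simply because the slides show `Except: y` for `f1`.

```python
from collections import deque

class ROB:
    def __init__(self, arf):
        self.arf = arf
        self.q = deque()  # left end = Head (oldest), right end = Tail

    def alloc(self, dst):
        """@ Alloc: reserve result storage at the Tail."""
        entry = {"dst": dst, "value": None, "fault": False, "done": False}
        self.q.append(entry)
        return entry

    def writeback(self, entry, value=None, fault=False):
        """@ WB: record the result or fault and mark the entry done."""
        entry.update(value=value, fault=fault, done=True)

    def commit(self):
        """@ CT: retire done entries from the Head, in order.

        Returns (number retired, whether a fault was taken).
        """
        retired = 0
        while self.q and self.q[0]["done"]:
            head = self.q[0]
            if head["fault"]:
                self.q.clear()  # discard all speculative state
                return retired, True
            self.arf[head["dst"]] = head["value"]
            self.q.popleft()
            retired += 1
        return retired, False

arf = {"f2": 3.0, "f3": 2.0, "r2": 6, "r3": 5}
rob = ROB(arf)
div = rob.alloc("f1")                          # f1 = f2 / f3
add = rob.alloc("r3")                          # r3 = r2 + r3
sub = rob.alloc("r4")                          # r4 = r3 - r2
rob.writeback(add, arf["r2"] + arf["r3"])      # 11, finishes before the divide
rob.writeback(sub, add["value"] - arf["r2"])   # 5, uses the ROB copy of r3
early = rob.commit()                           # Head (divide) not done yet
rob.writeback(div, fault=True)                 # divide reports an exception
late = rob.commit()
print(early, late, arf)
```

Although `r3` and `r4` were computed early, neither reaches the ARF: the faulting divide is at the Head, so the handler starts with `r3` still equal to 5 and an empty ROB, matching the final snapshot.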

