Spring 2019 Prof. Eric Rotenberg


1 ECE 721: Trace Processors
Spring 2019, Prof. Eric Rotenberg

2 Trace Processor
[Figure: block diagram. Labels: Trace Predictor, Branch Predictor, I-cache, Trace Cache, Pre-rename, Global Rename, Processing Element (PE), GRF (copy) x4, LRF x4, Function Units]

3 Trace Processor
- Distribute the Issue Queue and Execution Lanes among multiple PEs
  - Simplifies select logic (only a PE's own instructions are considered, for issue to a small number of lanes)
- Exploit the value hierarchy in traces to simplify:
  - Register file
  - Register rename logic
  - Bypasses and wakeup ports

4 Value Hierarchy
- Local values: values produced and consumed within a trace
- Global values: values communicated among traces

5 Live-ins, Live-outs
- Live-in value: global value produced by a previous trace and consumed by this trace
- Live-out value: global value produced by this trace and (possibly) consumed by later traces
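A minimal sketch of how a trace's live-ins and live-outs could be identified, assuming each instruction is modeled as a (dest, sources) pair. The function name `live_ins_and_outs` and the three-instruction trace are illustrative assumptions, not taken from the lecture.

```python
def live_ins_and_outs(trace):
    """trace: list of (dest, [sources]) tuples in program order.

    Returns (live_ins, live_outs) as sets of logical register names.
    """
    written = set()    # logical registers already produced inside this trace
    live_ins = set()
    for dest, sources in trace:
        for src in sources:
            # A source not yet written in this trace comes from a previous trace.
            if src not in written:
                live_ins.add(src)
        written.add(dest)
    # The last value written to each register survives the trace, so every
    # written register carries a live-out (possibly consumed by later traces).
    live_outs = set(written)
    return live_ins, live_outs


# Hypothetical trace, for illustration only.
example_trace = [
    ("r3", ["r1", "r2"]),   # r3 = r1 + r2
    ("r4", ["r3", "r1"]),   # r4 = r3 + r1 (reads the local r3)
    ("r3", ["r4", "r7"]),   # r3 = r4 + r7 (kills the first r3)
]
ins, outs = live_ins_and_outs(example_trace)
print(sorted(ins), sorted(outs))
```

Here the first r3 value never leaves the trace, while r1, r2, and r7 must be supplied by earlier traces.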

6 Local Only
- Purely local value: a value produced in this trace, consumed in this trace, and then killed in this trace
[Figure: between trace begin and trace end, r5 is produced, consumed, and then killed (kill r5). The value is purely local: it is not consumed by later traces.]

7 Local and Live-out
- Local & live-out: a value produced in this trace, consumed in this trace, but not killed in this trace
[Figure: between trace begin and trace end, r5 is produced and consumed but not killed. It is a local value that may also be consumed by later traces.]

8 Hierarchical Register File
- Single global register file (GRF)
  - Holds all global values
- Multiple local register files (LRF)
  - One per PE
  - Holds local values of the trace allocated to the PE

9 Reducing Register File Complexity
- GRF is less complex than a monolithic register file
  - Purely local values are not held in the GRF
    - Reduces the number of registers in the GRF
  - LRFs off-load much of the read and write traffic
    - GRF can have fewer read and write ports

10 Pre-renaming Traces
- Pre-renaming
  - Check dependencies among instructions in a trace when it is first built
  - Pre-rename local values to registers in the LRF

11 Reducing Renaming Complexity
- The Rename Stage is now the Global Rename Stage
  - Only rename global values (live-ins & live-outs) to the GRF
- Two key simplifications to the Global Rename Stage
  1. No need to check for dependencies within the trace and bypass free list mappings from producers to consumers within the trace. These are local values, which were analyzed and pre-renamed to the LRF when the trace was constructed and filled into the Trace Cache.
  2. Global RMT has fewer read ports (live-ins only) and write ports (live-outs only)

12 Example: Pre-renaming the Trace
Original trace      Pre-renamed trace (stored in Trace Cache)
add r3, r1, r2      add {--,L0}, r1, r2
add r3, r1, r3      add {--,L1}, r1, L0
add r5, r3, r5      add {--,L2}, L1, r5
add r3, r5, #1      add {--,L3}, L2, #1
add r5, r6, r2      add {r5,L4}, r6, r2
add r3, r5, r3      add {r3,--}, L4, L3
Key for pre-renamed logical destination registers:
  {--,Ly}: local only
  {rx,--}: live-out only
  {rx,Ly}: live-out and local
Global live-in (GLI) registers: r1, r2, r5, r6
Global live-out (GLO) registers: r3, r5
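The pre-renaming in this example can be sketched in software as two passes over the trace. This is an illustrative model under stated assumptions, not the hardware algorithm from the lecture: instructions are (dest, sources) pairs, immediates start with `#`, and a local register is allocated to any value that is consumed in the trace or is not a live-out. The names `pre_rename` and `fmt` are hypothetical.

```python
def pre_rename(trace):
    """trace: list of (dest, [sources]) in program order.

    Returns a list of (global_dest, local_dest, renamed_sources):
    global_dest is the logical register if the value is a live-out, else None;
    local_dest is an LRF register name if one was allocated, else None.
    """
    # The last writer of each logical register produces the live-out value.
    last_writer = {dest: i for i, (dest, _) in enumerate(trace)}

    # Pass 1: mark which produced values are consumed inside the trace.
    producer = {}                      # logical reg -> index of latest producer
    consumed = [False] * len(trace)
    for i, (dest, sources) in enumerate(trace):
        for src in sources:
            if src in producer:
                consumed[producer[src]] = True
        producer[dest] = i

    # Pass 2: allocate local registers in order and rename local sources.
    local_of = {}                      # logical reg -> current LRF register
    next_local = 0
    renamed = []
    for i, (dest, sources) in enumerate(trace):
        # Sources produced earlier in the trace read the LRF;
        # live-ins and immediates are left unchanged.
        new_sources = [local_of.get(src, src) for src in sources]
        is_live_out = last_writer[dest] == i
        local_dest = None
        if consumed[i] or not is_live_out:   # assumption: see lead-in
            local_dest = f"L{next_local}"
            next_local += 1
            local_of[dest] = local_dest
        renamed.append((dest if is_live_out else None, local_dest, new_sources))
    return renamed


def fmt(entry):
    """Format one entry in the slide's {rx,Ly} notation (all ops shown as add)."""
    g, l, srcs = entry
    return f"add {{{g or '--'},{l or '--'}}}, " + ", ".join(srcs)


# The original trace from the example.
trace = [
    ("r3", ["r1", "r2"]),
    ("r3", ["r1", "r3"]),
    ("r5", ["r3", "r5"]),
    ("r3", ["r5", "#1"]),
    ("r5", ["r6", "r2"]),
    ("r3", ["r5", "r3"]),
]
for entry in pre_rename(trace):
    print(fmt(entry))
```

Running this on the original trace reproduces the pre-renamed column of the example, including the live-out-and-local destination {r5,L4} and the live-out-only destination {r3,--}.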

13 Example (cont.)
Global Rename Map Table, before and after renaming the trace (blank entries are not shown in the transcript):
Logical   Before   After   Role
r1        p29      p29     GLI(r1)
r2        p31      p31     GLI(r2)
r3                 p9      GLO(r3)
r4
r5        p17      p11     GLI(r5), GLO(r5)
r6        p24      p24     GLI(r6)
...
r31
Global Free List: supplies p9 for GLO(r3) and p11 for GLO(r5)

14 Bypass Complexity
- Bypass complexity
  - Forward a value from any execution lane to any other execution lane
  - With many lanes:
    - Long wires (must span all the lanes)
    - Wires are heavily loaded (tapped by each lane)
    - many:1 MUX within each lane
- Two conventional options available to a monolithic superscalar processor
  - Increase cycle time to allow for slow bypasses, or
  - Producers and consumers can’t execute in consecutive cycles
- Trace processor exploits the value hierarchy for a better compromise

15 A Good Compromise w.r.t. Bypasses
- Local bypasses
  - Fast: producer and consumer execute in consecutive cycles
- Global bypasses
  - Slow: several cycles to bypass a value from producer to consumer
- This is a good compromise
  - Fast clock
  - Some values, but not all values, are slow to bypass

16 Number of Global Bypasses
- Number of global bypasses should be determined empirically and depends on live-out traffic
- Number of global bypasses affects:
  - Number of GRF write ports
  - Number of additional wakeup ports in a PE’s issue queue
- Should be favorable compared to a monolithic superscalar processor
  - A PE’s issue queue has fewer wakeup ports than the issue queue of a monolithic superscalar
  - # wakeup ports = # execution lanes in PE + # global bypasses
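A small worked example of the wakeup-port formula. The configuration (4 PEs, 2 lanes per PE, 2 global bypasses) is hypothetical, and the monolithic count assumes one wakeup port per execution lane, which is an assumption made here for comparison rather than a figure from the lecture.

```python
def pe_wakeup_ports(lanes_per_pe, global_bypasses):
    # Slide formula: # wakeup ports = # execution lanes in PE + # global bypasses
    return lanes_per_pe + global_bypasses


def monolithic_wakeup_ports(total_lanes):
    # Assumption: a monolithic issue queue needs one wakeup port per lane.
    return total_lanes


# Hypothetical machine: 4 PEs x 2 lanes each, 2 global bypasses.
num_pes, lanes_per_pe, global_bypasses = 4, 2, 2
print(pe_wakeup_ports(lanes_per_pe, global_bypasses))    # 4
print(monolithic_wakeup_ports(num_pes * lanes_per_pe))   # 8
```

Under these assumed numbers each PE's issue queue needs half the wakeup ports of an 8-lane monolithic issue queue; the advantage shrinks as the number of global bypasses grows.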

17 Processing Element (PE)
[Figure: block diagram. Labels: Trace Predictor, Branch Predictor, I-cache, Trace Cache, Pre-rename, Global Rename, Processing Element (PE), Local Register File x4, Function Units, Global Register File]

18 Processing Element (PE)
[Figure: block diagram. Labels: Trace Predictor, Branch Predictor, I-cache, Trace Cache, Pre-rename, Global Rename, Processing Element (PE), GRF (copy) x4, LRF x4, Function Units]

19 Trace-Level Sequencing
- Trace prediction
- Trace fetch
- Trace rename
- Trace dispatch
- Trace completion
- Trace retirement

20 Trace Prediction
- Trace predictor
  - Predicts the next trace
  - Produces one trace id per cycle
- Trace id uniquely identifies a trace
  - Start PC
  - Bit vector indicating directions (taken/not-taken) of embedded branches
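One possible way to model a trace id in software is shown below, as a sketch only. The class name `TraceId`, its field names, and the example PC value are assumptions for illustration; the lecture specifies only that a trace id consists of a start PC and a bit vector of embedded branch directions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TraceId:
    start_pc: int
    # One entry per embedded branch, in program order: True = taken.
    branch_outcomes: Tuple[bool, ...]

    def branch_bits(self) -> str:
        """Render the direction bit vector as T/N characters."""
        return "".join("T" if taken else "N" for taken in self.branch_outcomes)


# Two traces that start at the same (hypothetical) PC but follow different paths.
a = TraceId(0x4000, (True, False))
b = TraceId(0x4000, (True, True))
print(a == b, a.branch_bits(), b.branch_bits())
```

Because the dataclass is frozen it is hashable, so trace ids can be used directly as keys when modeling a trace cache or predictor table.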

21 Trace Predictor
[Figure: a history of recent trace ids (Tidn ... Tid1 Tid0) feeds a Hash Function; the predicted trace id is sent to the T$]
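A minimal sketch of a history-based trace predictor in the spirit of the figure: a short history of trace ids is hashed to index a table holding the predicted next trace id. The hash function, table size, history length, and the encoding of a trace id as a (start_pc, branch_bits) integer pair are all assumptions made for illustration; the transcript does not give these details.

```python
from collections import deque


class TracePredictor:
    def __init__(self, history_len=3, table_size=1024):
        self.history = deque(maxlen=history_len)   # oldest ... newest trace id
        self.table = [None] * table_size           # predicted next trace ids

    def _index(self):
        # Assumed hash: fold each (start_pc, branch_bits) pair into an index.
        h = 0
        for start_pc, branch_bits in self.history:
            h = (h * 31 + (start_pc ^ branch_bits)) % len(self.table)
        return h

    def predict(self):
        """Return the predicted next trace id, or None if the entry is empty."""
        return self.table[self._index()]

    def update(self, actual_tid):
        # Train the entry selected by the current history,
        # then shift the actual trace id into the history.
        self.table[self._index()] = actual_tid
        self.history.append(actual_tid)


# Hypothetical repeating sequence of three traces.
A, B, C = (0x100, 0b1), (0x200, 0b0), (0x300, 0b11)
predictor = TracePredictor()
for tid in [A, B, C, A, B, C]:
    predictor.update(tid)
print(predictor.predict() == A)
```

After two passes over the repeating sequence, the history (A, B, C) selects an entry trained with A, so the predictor follows the loop. A real design would also need to handle aliasing between histories, which this sketch ignores.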

22 Trace Fetch
- Look up predicted trace id in T$
- T$ hit
  - Send pre-renamed trace to Trace Rename Stage
- T$ miss
  - Use predicted trace id to fetch basic blocks from I$
  - Trace build takes multiple cycles
  - After trace is built, pre-rename it
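The hit and miss paths above can be sketched as follows. This is a functional model only, assuming the trace cache is a dictionary keyed by trace id; `fetch_trace`, `build_from_icache`, and `pre_rename` are hypothetical names, and the two stub functions stand in for the multi-cycle trace build and the pre-rename logic.

```python
def fetch_trace(trace_id, trace_cache, build_from_icache, pre_rename):
    """Return (pre_renamed_trace, hit) for the predicted trace id."""
    trace = trace_cache.get(trace_id)
    if trace is not None:
        # T$ hit: the stored trace is already pre-renamed.
        return trace, True
    # T$ miss: build the trace from I$ basic blocks (multi-cycle in hardware),
    # pre-rename it, and fill it into the trace cache.
    raw = build_from_icache(trace_id)
    trace = pre_rename(raw)
    trace_cache[trace_id] = trace
    return trace, False


# Stubs for illustration only.
def build_from_icache(trace_id):
    return ["inst0", "inst1"]


def pre_rename(raw_trace):
    return [("pre-renamed", inst) for inst in raw_trace]


trace_cache = {}
tid = (0x4000, 0b10)   # hypothetical (start PC, branch bits)
first, hit1 = fetch_trace(tid, trace_cache, build_from_icache, pre_rename)
second, hit2 = fetch_trace(tid, trace_cache, build_from_icache, pre_rename)
print(hit1, hit2)
```

The first lookup misses and fills the cache; the second lookup hits and returns the same pre-renamed trace without rebuilding it.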

23 Trace Cache Miss
- Can’t send instructions directly from the instruction cache
- Must package instructions into a trace, pre-rename the trace, and send the trace down the pipeline as a single unit
- Why? No dependence checking logic in rename stage.

24 Step 1, Step 2
[Figure labels: Trace Predictor, Tid, Trace Cache (miss), I-cache, BTB]
- Trace Cache miss: stall flow of instructions into rename stage

25 Step 3, Step 4 (cont.)
[Figure labels: Trace Predictor, Next Tid, Trace Cache, pre-rename logic, to Trace Rename Stage]
- Trace Cache hit: resume

26 Trace Rename
- Steps
  - Rename globals to GRF
    - Live-ins: get mappings from Global Rename Map Table
    - Live-outs: pop free registers from Global Free List, update Global Rename Map Table
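A minimal sketch of the global rename step, assuming the Global Rename Map Table is a dictionary and the Global Free List is a Python list popped from the front. The function name `global_rename` is hypothetical. The register values are read from the earlier example slide (GLI r1, r2, r5, r6; GLO r3, r5); the free-list order p9, p11 is an assumption inferred from the resulting mappings.

```python
def global_rename(live_ins, live_outs, rename_map, free_list):
    """Rename a trace's global registers; mutates rename_map and free_list."""
    # Live-ins: read current mappings BEFORE any live-out updates, so a
    # register that is both live-in and live-out (r5 here) reads its old mapping.
    src_map = {reg: rename_map[reg] for reg in live_ins}
    # Live-outs: pop free physical registers and update the map table.
    dst_map = {}
    for reg in live_outs:
        dst_map[reg] = free_list.pop(0)
        rename_map[reg] = dst_map[reg]
    return src_map, dst_map


rename_map = {"r1": "p29", "r2": "p31", "r5": "p17", "r6": "p24"}
free_list = ["p9", "p11"]
src_map, dst_map = global_rename(
    ["r1", "r2", "r5", "r6"], ["r3", "r5"], rename_map, free_list
)
print(src_map["r5"], dst_map["r3"], dst_map["r5"])
```

Only four map-table reads and two writes are needed for the whole six-instruction trace, which is the port reduction the Global RMT relies on.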

27 Trace Dispatch
- Steps
  - Allocate entry at tail of Trace-Level Active List
    - Entry holds current mappings of all live-outs in the trace
  - Dispatch trace to a free PE

28 Trace Completion
- Steps
  - Wait for all instructions in the trace to complete
  - Set “complete” bit in corresponding entry in Trace-Level Active List
  - Free the PE for use by another trace

29 Trace Retirement
- Steps
  - Wait for entry at head of Trace-Level Active List to have its “complete” bit set
  - Commit/free mappings of live-outs
    - Free previous mappings of live-outs
      - Get previous mappings from Global Architectural Map Table
      - Add them back to Global Free List
    - Commit current mappings of live-outs
      - Get current mappings from Trace-Level Active List
      - Write them into the Global Arch. Map Table
  - Advance head pointer of Trace-Level Active List
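The dispatch, completion, and retirement steps can be sketched with a small model of the Trace-Level Active List. The class and method names are hypothetical, the list is modeled as a deque rather than a circular buffer with head and tail pointers, and the previous mapping p2 for r3 is an invented value for illustration.

```python
from collections import deque


class TraceActiveList:
    def __init__(self, arch_map, free_list):
        self.entries = deque()       # left end = head, right end = tail
        self.arch_map = arch_map     # Global Architectural Map Table
        self.free_list = free_list   # Global Free List

    def dispatch(self, live_out_mappings):
        """Allocate an entry at the tail holding the trace's live-out mappings."""
        entry = {"live_outs": dict(live_out_mappings), "complete": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry):
        """Called once all instructions in the trace have completed."""
        entry["complete"] = True

    def retire(self):
        """Retire the head trace if complete; return True if it retired."""
        if not self.entries or not self.entries[0]["complete"]:
            return False
        entry = self.entries.popleft()             # advance head pointer
        for reg, phys in entry["live_outs"].items():
            prev = self.arch_map.get(reg)
            if prev is not None:
                self.free_list.append(prev)        # free previous mapping
            self.arch_map[reg] = phys              # commit current mapping
        return True


arch_map = {"r3": "p2", "r5": "p17"}   # p2 is a hypothetical previous mapping
free_list = []
active_list = TraceActiveList(arch_map, free_list)
entry = active_list.dispatch({"r3": "p9", "r5": "p11"})
retired_early = active_list.retire()   # head is not complete yet
active_list.complete(entry)
retired = active_list.retire()
print(retired_early, retired, arch_map, free_list)
```

Note that retirement touches only the live-out mappings, one entry per trace, rather than one entry per instruction.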

30 Exceptions & Branch Mispredictions
- Can’t back up to middle of trace
  - Values before the exception/branch that were thought to be local may change to global
  - These values must make it into the global register file to be visible to later traces
- Must squash, re-construct, and re-execute entire trace

31 Handling Exceptions
- Steps
  - Post exception: set “exception” bit in Trace-Level Active List
  - Wait until trace reaches head of Trace-Level Active List
  - Squash all later traces
  - Squash all instructions in the trace, even those before the instruction that caused the exception

32 Handling Exceptions (cont.)
- Steps (cont.)
  - Restore Global Rename Map Table from Global Arch. Map Table
  - Construct a modified trace
    - Same trace except terminate it early, just before the instruction that caused the exception
    - Pre-rename this modified trace
  - Re-execute and commit the modified trace
  - Now ready to service the exception
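A sketch of why the trace must be rebuilt after an exception, reusing the six-instruction trace from the pre-renaming example. It assumes the exception is raised by the fourth instruction (index 3), which is a hypothetical choice; the function names and the mapping values p2, p9, p11, p17 used for the map tables are likewise illustrative.

```python
def live_out_writers(trace):
    """Map each logical register to the index of the instruction
    that produces its live-out value (the last writer in the trace)."""
    return {dest: i for i, (dest, _) in enumerate(trace)}


def handle_exception(trace, faulting_index, rename_map, arch_map):
    """Restore the rename map and return the modified (truncated) trace."""
    # Restore Global Rename Map Table from the Global Arch. Map Table.
    rename_map.clear()
    rename_map.update(arch_map)
    # Same trace, terminated just before the faulting instruction.
    return trace[:faulting_index]


trace = [
    ("r3", ["r1", "r2"]),
    ("r3", ["r1", "r3"]),
    ("r5", ["r3", "r5"]),
    ("r3", ["r5", "#1"]),   # assumed to raise the exception
    ("r5", ["r6", "r2"]),
    ("r3", ["r5", "r3"]),
]
rename_map = {"r3": "p9", "r5": "p11"}   # speculative mappings (illustrative)
arch_map = {"r3": "p2", "r5": "p17"}     # committed mappings (illustrative)

modified = handle_exception(trace, 3, rename_map, arch_map)
print(live_out_writers(trace))      # {'r3': 5, 'r5': 4}
print(live_out_writers(modified))   # {'r3': 1, 'r5': 2}
```

In the full trace the values produced by the second and third instructions are local only, but in the truncated trace they become the live-outs of r3 and r5. They therefore need GRF registers, which is why the modified trace has to be pre-renamed and re-executed.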

33 Branch Handling
- Checkpoint Global Rename Map Table between traces
- If all branches in a trace resolve without mispredictions:
  - Free corresponding shadow map

34 Branch Handling (cont.)
- If branch misprediction is detected within a trace:
  - Squash all later traces
  - Squash all instructions in the trace, even those before the mispredicted branch
  - Restore Global Rename Map Table from corresponding shadow map table
    - This restores mappings to what they were before renaming this trace

35 Branch Handling (cont.)
  - Construct a modified trace
    - Same start PC, follow different path after mispredicted branch
    - Fall back to conventional branch predictor, I$, etc.
    - Pre-rename this modified trace
  - Re-dispatch the trace
    - Execution may or may not reveal other mispredictions in the trace...
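The checkpoint and recovery steps for branch mispredictions can be sketched as below. This is an illustrative model: the function names are hypothetical, the shadow map is a plain dictionary copy, and the corrected trace id is represented only by its known prefix (same start PC, outcomes up to and including the corrected branch), since the rest of the path comes from the conventional branch predictor and I$.

```python
def checkpoint(rename_map):
    """Take a shadow copy of the Global Rename Map Table before renaming a trace."""
    return dict(rename_map)


def recover(rename_map, shadow_map):
    """Restore the mappings to what they were before the trace was renamed."""
    rename_map.clear()
    rename_map.update(shadow_map)


def corrected_trace_prefix(start_pc, outcomes, mispredicted_branch):
    """Same start PC; keep outcomes before the mispredicted branch and flip it.

    Outcomes after that branch are unknown here and are left to the
    conventional branch predictor while the trace is rebuilt.
    """
    fixed = list(outcomes[:mispredicted_branch])
    fixed.append(not outcomes[mispredicted_branch])
    return start_pc, tuple(fixed)


rename_map = {"r3": "p2", "r5": "p17"}      # illustrative mappings
shadow = checkpoint(rename_map)
rename_map.update({"r3": "p9", "r5": "p11"})  # rename the trace's live-outs

# Second embedded branch (index 1) turns out to be mispredicted.
recover(rename_map, shadow)
prefix = corrected_trace_prefix(0x4000, (True, False, True), 1)
print(rename_map, prefix)
```

Because a later branch in the rebuilt trace may also be mispredicted, this recovery may run more than once for the same start PC.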

