Michigan Technological University, Houghton MI

1 Cost Effective Memory Dependence Prediction Using Speculation Levels and Color Sets
Soner Önder, Michigan Technological University, Houghton MI

2 Outline
Background: Memory dependence prediction. Pairing based approach. Store sets.
Color sets: Notion of color sets. Color set implementation. Color set predictor. Instruction window modifications.
Experimental evaluation: Basic policy. Aggressive policy.

3 Memory Dependence Prediction
Assume ST-2, ST-p and LD-s all access the same memory location. If we issue LD-s at this point in time, we’ll get a memory order violation. If we know that load LD-s is dependent on store ST-p, we can issue the load at the right time.
Seq.         1     2     3     p     p+1     p+2     p+3
Instruction  ST-1  ST-2  ST-3  ST-p  ST-p+1  ST-p+2  LD-s
Ready: No Yes St-p

4 Dynamic Memory Disambiguation
Problem: In the presence of unresolved stores in the instruction window, which load(s) must be held? Ideal Solution: Wait only for the producer store. Simple Solutions: Wait for all - no speculation. Issue blindly - blind speculation.
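The three options on this slide can be sketched as a single issue check. This is only an illustration: the names `Policy` and `load_may_issue` are hypothetical, and the "ideal" case assumes oracle knowledge of the producer store, which real hardware does not have.

```python
from enum import Enum


class Policy(Enum):
    WAIT_FOR_ALL = "no speculation"
    BLIND = "blind speculation"
    IDEAL = "wait only for the producer store"


def load_may_issue(policy, unresolved_store_seqs, producer_seq=None):
    """Decide whether a load may issue.

    unresolved_store_seqs: sequence numbers of older stores still pending
                           in the instruction window.
    producer_seq: sequence number of the store the load truly depends on
                  (oracle knowledge, used only by the IDEAL policy).
    """
    if policy is Policy.WAIT_FOR_ALL:
        # Hold the load until no older store is unresolved.
        return len(unresolved_store_seqs) == 0
    if policy is Policy.BLIND:
        # Always issue; a memory order violation is repaired afterwards.
        return True
    # IDEAL: hold the load only while its producer store is unresolved.
    return producer_seq not in unresolved_store_seqs
```

Memory dependence prediction tries to approximate the ideal case without the oracle.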

5 Memory dependence prediction (Moshovos et al. 1997-1998)
Earlier work mainly concentrated on predicting precise dependencies among pairs of load/store instructions: To enable early issuing of loads through memory dependence prediction. To streamline communication so that values can be passed directly from producers to consumers instead of through memory. Emphasis has been given to identifying the precise store instruction a load may depend on.

6 Store-set Memory Dependence Predictor (Chrysos & Emer - 1998)
A store set is the set of all stores a load has been observed to be dependent on. Initially employ blind speculation for loads. Upon a memory order violation, create a store set for the offending load and store. Next time the same load is encountered, make the load wait until the store issues. A store set may contain multiple stores: chain the stores and make the load dependent upon the last store.

7 Store-set Implementation
(Figure labels: PC, SSID, LFST.)
Dependence information is digested to create SETS of colliding instructions. Each set tells exactly which stores a load should wait for. Sufficiently large tables yield performance of an ORACLE.
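A minimal sketch of the store-set mechanism described on the last two slides. It is a simplification under stated assumptions: unbounded dictionaries stand in for the finite hardware tables, set merging is reduced to reusing whichever set ID already exists, and the class and method names are hypothetical.

```python
class StoreSetPredictor:
    def __init__(self):
        self.ssit = {}      # PC -> store set ID (SSID)
        self.lfst = {}      # SSID -> tag of the last fetched, not yet issued store
        self.next_ssid = 0

    def on_violation(self, load_pc, store_pc):
        """Put the offending load and store into the same store set."""
        ssid = self.ssit.get(load_pc)
        if ssid is None:
            ssid = self.ssit.get(store_pc)
        if ssid is None:
            ssid = self.next_ssid
            self.next_ssid += 1
        self.ssit[load_pc] = ssid
        self.ssit[store_pc] = ssid

    def on_store_fetch(self, store_pc, store_tag):
        """Return the earlier store this store is chained behind, if any."""
        ssid = self.ssit.get(store_pc)
        if ssid is None:
            return None
        previous = self.lfst.get(ssid)
        self.lfst[ssid] = store_tag
        return previous

    def on_load_fetch(self, load_pc):
        """Return the tag of the store the load should wait for, or None."""
        ssid = self.ssit.get(load_pc)
        if ssid is None:
            return None  # no known set: the load speculates blindly
        return self.lfst.get(ssid)

    def on_store_issue(self, store_pc, store_tag):
        """Clear the LFST entry once the last fetched store has issued."""
        ssid = self.ssit.get(store_pc)
        if ssid is not None and self.lfst.get(ssid) == store_tag:
            del self.lfst[ssid]
```

A load with no entry speculates blindly. After a violation, it waits on the most recently fetched store of its set, and stores of the same set are chained in fetch order.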

8 Color Set predictor
Instead of predicting precise dependencies among pairs of loads/stores, or constructing sets of store and load instructions which collided in the past, we assign the processor and the load and store instructions various speculation levels (colors) and predict the speculation level (i.e., the color) at which a load or store can be issued without a collision. (Figure label: Predictor size.)

9 Color Set predictor Since we only try to predict the speculation level, we expect to have: smaller storage for the predictor, better performance at smaller hardware budgets, faster implementations, power savings and more collisions.

10 So, it is something like this
Processor: 00 01 10 11
Load: 00 01 10 11
The rules governing the color changes are the policies. We investigate two policies, a basic policy and an aggressive policy.

11 Load instruction selection
Eligible load instructions 00 01 10 11 Current processor color

15 Instruction window extensions
(Figure: window details. Labels: inhibit color, global color, a <= comparison feeding "Issue?", instructions entering window.)
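One possible reading of the comparison in this figure, sketched in Python. The rule below (a load is eligible when its color does not exceed the current processor color) is an assumption inferred from the `<=` label and from the policy rules on the later slides. The transcript does not spell it out, and the function name is hypothetical.

```python
def eligible_loads(window, processor_color):
    """Return the loads whose color allows them to issue.

    window: list of (load_name, load_color) pairs, with colors 0..3.
    ASSUMPTION: eligible means load_color <= processor_color.
    """
    return [name for name, color in window if color <= processor_color]


window = [("LD-a", 0b00), ("LD-b", 0b10), ("LD-c", 0b11)]
print(eligible_loads(window, 0b10))  # loads allowed at processor color 10
```

Under this reading, raising a load's color after a collision makes it wait for a more speculative processor state before it can issue ahead of stores.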

16 Collisions
(Figure: a load and a store with colors 01 and 10 on the 00 01 10 11 scale, shown against the current processor color.)

17 Color Set Predictor Basic Policy
1. The basic policy gradually becomes aggressive when port utilization is low.
2. Upon a collision, the load instruction is given a higher color and the store instruction is given a lower color.
3. The processor runs at the smaller of the current processor color and the color of the store instructions.
4. Rules 2 & 3 together run the processor at a lower speculation level than the level at which the prior collision occurred.
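The color transitions of the basic policy can be sketched as three small functions. The single-step, saturating increments and decrements are assumptions, since the slides say only "higher", "lower" and "gradually". All names here are hypothetical.

```python
MAX_COLOR = 0b11  # four colors: 00, 01, 10, 11


def on_collision(load_color, store_color):
    """Rule 2: raise the load's color and lower the store's color.
    ASSUMPTION: one step each, saturating at the ends of the range."""
    return min(load_color + 1, MAX_COLOR), max(store_color - 1, 0)


def processor_color(current, store_colors_in_window):
    """Rule 3: run at the smaller of the current color and the store colors."""
    return min([current, *store_colors_in_window])


def on_low_port_utilization(current):
    """Rule 1: become gradually more aggressive (assumed one step at a time)."""
    return min(current + 1, MAX_COLOR)
```

For example, a collision between a load of color 01 and a store of color 10 yields colors 10 and 01. The next time that store is in the window, a processor running at 11 drops to 01.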

18 Color Set Predictor Aggressive Policy
1. The aggressive policy switches to the maximum speculation level when port utilization is low.
2. Upon a collision, the load instruction is given a higher color and the store instruction is specifically marked.
3. The processor decrements the current processor color when a colliding store is detected.
4. As a result, the processor runs at the highest speculation level that won’t result in a collision, and at a different color than the color it had during the collision.
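The aggressive policy differs from the basic one in how the processor color moves. The sketch below assumes one-step, saturating changes and represents the store mark as a boolean flag. Both are illustrative choices, and the names are hypothetical.

```python
MAX_COLOR = 0b11  # four colors: 00, 01, 10, 11


def on_low_port_utilization(current):
    """Rule 1: jump straight to the maximum speculation level."""
    return MAX_COLOR


def on_collision(load_color):
    """Rule 2: raise the load's color and mark the store as colliding.
    Returns (new_load_color, store_is_marked)."""
    return min(load_color + 1, MAX_COLOR), True


def on_colliding_store_detected(current):
    """Rule 3: decrement the processor color (assumed to saturate at 00)."""
    return max(current - 1, 0)


color = on_low_port_utilization(0b01)       # processor jumps to 11
color = on_colliding_store_detected(color)  # marked store seen: back to 10
```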

19 Color Set Predictor
Accessed early in the pipeline using L/S PC. Updated upon collision/successful speculation.
(Figure: table indexed by L/S PC holding the L/S color, with 10 shown as an example entry.)
Basic Policy: 00 No speculation, 01 Level 1, 10 Level 2, 11 Level 3.
Aggressive Policy: 00 No speculation, 01 Level 1, 10 Level 2, 11 Level 3/Colliding store.
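A minimal sketch of such a PC-indexed table of 2-bit colors. Several details are assumptions: the direct-mapped indexing with 4-byte instructions, the default of 128 entries, the initial color, and the direction of the update on successful speculation (lowering the color). The class name is hypothetical.

```python
class ColorPredictor:
    MAX_COLOR = 0b11  # 2 bits per entry

    def __init__(self, entries=128, initial_color=0b00):
        self.colors = [initial_color] * entries

    def _index(self, pc):
        # ASSUMPTION: 4-byte instructions, direct-mapped, no tags.
        return (pc >> 2) % len(self.colors)

    def lookup(self, pc):
        """Read the predicted color early in the pipeline using the L/S PC."""
        return self.colors[self._index(pc)]

    def on_collision(self, pc):
        """Move the load to a higher color (saturating)."""
        i = self._index(pc)
        self.colors[i] = min(self.colors[i] + 1, self.MAX_COLOR)

    def on_successful_speculation(self, pc):
        """ASSUMPTION: successful speculation lowers the color (saturating)."""
        i = self._index(pc)
        self.colors[i] = max(self.colors[i] - 1, 0)

    def storage_bits(self):
        return 2 * len(self.colors)
```

With only two bits per entry and no tags, distinct instructions can alias to the same entry. This is one source of the extra collisions that the approach trades for a smaller predictor.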

20 Processor’s colorful perspective
Basic policy: When port utilization is low, the processor moves on to the next color. The processor assumes the lowest ranking store’s color.
(Figure: processor color states 00 01 10 11, with transitions labeled "Low port utilization" and "Colliding stores".)

21 Processor’s colorful perspective
Aggressive policy: When a colliding store enters the window, the processor decrements its color. When port utilization is low, the processor switches to red.
(Figure: processor color states 00 01 10 11, with transitions labeled "Low port utilization" and "Colliding stores".)

22 Load instruction color states
Both policies 00 01 10 11 Collision Successful speculation

23 Simulation Framework
Aggressive out-of-order superscalar processor:
8 instructions/cycle fetch/dispatch
16 instructions/cycle retire width
64 entry centralized reservation station
8 symmetric functional units
Multi-block gshare fetch unit
2 memory ports r/w
Perfect D-cache
Simulated using cycle-accurate simulators generated automatically from ADL descriptions using the FAST system.

24 Performance Spec Fp Arithmetic Mean

25 Performance Spec Fp Harmonic Mean

26 Performance Spec Int Arithmetic Mean

27 Performance Spec Int Harmonic Mean

28 Individual benchmarks 128-Fp

29 Individual benchmarks 4096-Fp

30 Individual benchmarks 128-Int

31 Individual benchmarks 4096-Int

32 So ...
Cost effective dependence prediction. Why does it work?
Design space: Number of colors/number of entries. Confidence mechanisms. Other policies.
Power consumption: Disable chunks of the predictor and use the basic policy; enable and become aggressive.

33 Have a colorful evening
Soner Önder Michigan Technological University Antalya, Turkey

