Handling Stores and Loads


1 Handling Stores and Loads
Three key principles:
- Commit stores to memory in program order.
- Dynamic memory disambiguation: determine which store a load depends on.
- Store-load forwarding: forward the store value to the dependent load.
All three are facilitated by the Store Queue (SQ).
Complication: a Load Queue (LQ) is also needed to detect mispredicted loads. A mispredicted load is a load that executed out of order (OOO) with respect to a prior store on which it depends.
ECE 721, Spring 2019 Prof. Eric Rotenberg

2 Examples: All prior store addresses known
Scenario 1: store B, store C, load A. Get data from D$.
Scenario 2: store A, store C, load A. Get data from 1st store.
Scenario 3: store A, store A, load A. Get data from 2nd store.

3 Examples: Unknown Prior Store Addresses
Scenario 4: store ?, store A, load A. Get data from 2nd store.
Scenario 5: store B, store ?, load A. Get data from D$ (speculative).
Scenario 6: store A, store ?, load A. Get data from 1st store (speculative).
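The scenarios can be reproduced with a small model of the load's SQ search. This is only a sketch under stated assumptions: the `Store` class and `search_sq` function are hypothetical names, addresses are plain strings, and an address of `None` stands in for "store ?". The load walks its prior stores from youngest to oldest, takes the closest match, and counts as speculative if it passed an unknown address on the way.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Store:
    addr: Optional[str]  # None models "store ?": address not yet computed


def search_sq(prior_stores: List[Store], load_addr: str) -> Tuple[str, bool]:
    """Return (data source, speculative?) for a load.

    prior_stores holds the stores older than the load, oldest first.
    """
    unknown_seen = False
    # Walk from the youngest prior store back to the oldest.
    for pos in range(len(prior_stores) - 1, -1, -1):
        addr = prior_stores[pos].addr
        if addr is None:
            unknown_seen = True  # an unresolved store lies between load and match
        elif addr == load_addr:
            return f"store {pos + 1}", unknown_seen
    return "D$", unknown_seen


print(search_sq([Store("B"), Store("C")], "A"))   # Scenario 1
print(search_sq([Store("A"), Store("C")], "A"))   # Scenario 2
print(search_sq([Store(None), Store("A")], "A"))  # Scenario 4
print(search_sq([Store("B"), Store(None)], "A"))  # Scenario 5
print(search_sq([Store("A"), Store(None)], "A"))  # Scenario 6
```

In Scenario 4 the unknown store is older than the matching store, so the walk stops at the match before reaching it and the result is not speculative.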

4 Load/Store Execution Lane
AGEN unit for computing load and store addresses.
Three structures:
- L1 D$ (and L1 D-TLB)
- Store Queue (SQ): contains all active stores in program order.
  - Stores are speculative until they reach the head of the Active List.
  - The SQ commits stores to the D$ non-speculatively and in order.
  - Loads search the SQ for store values on which they depend.
- Load Queue (LQ): contains all active loads in program order.
  - Loads may execute out of order with respect to prior stores.
  - An executed load gets the wrong value if it depends on an older store that hasn’t executed yet.
  - Stores search the LQ for mispredicted loads.

5 The store will get the following indices at dispatch time:
SQ_index = SQ_tail: the store’s entry in the SQ.
  When the store executes later, it uses SQ_index to place its address and value into the SQ. These are needed in turn for store-load forwarding and for committing stores.
LQ_index = LQ_tail: index of the first load after the store, in program order.
  When the store executes later, it searches the LQ for mispredicted loads: loads after the store, in program order, that depend on the store but executed before the store. Loads between LQ_index and LQ_tail are after the store in program order.

6 The load will get the following indices at dispatch time:
LQ_index = LQ_tail: the load’s entry in the LQ.
  When the load executes later, it uses LQ_index to place its address into the LQ. The address is needed in turn for detecting mispredicted loads.
SQ_index = SQ_tail - 1: index of the immediately preceding store, in program order.
  When the load executes later, it searches the SQ for a dependence on a prior store. It only considers stores between SQ_head and SQ_index: these are the stores before the load, in program order.
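The index assignment for stores and loads at dispatch can be sketched as follows. This is a minimal model, not the course's implementation: the `DispatchModel` class, its method names, and the choice of circular queues with modulo wraparound are assumptions for illustration. It also ignores full/empty detection, which a real design would need.

```python
class DispatchModel:
    """Hands out SQ_index and LQ_index values at dispatch (hypothetical model)."""

    def __init__(self, size: int = 8):
        self.size = size   # assumed: both queues are circular with this many entries
        self.sq_tail = 0
        self.lq_tail = 0

    def dispatch_store(self) -> dict:
        # Store: its own SQ entry, plus the LQ slot of the next load to be dispatched.
        idx = {"SQ_index": self.sq_tail, "LQ_index": self.lq_tail}
        self.sq_tail = (self.sq_tail + 1) % self.size
        return idx

    def dispatch_load(self) -> dict:
        # Load: its own LQ entry, plus the SQ slot of the immediately preceding store.
        idx = {"LQ_index": self.lq_tail,
               "SQ_index": (self.sq_tail - 1) % self.size}
        self.lq_tail = (self.lq_tail + 1) % self.size
        return idx


m = DispatchModel()
# Program order: store, load, load, store, load
for kind in ["store", "load", "load", "store", "load"]:
    idx = m.dispatch_store() if kind == "store" else m.dispatch_load()
    print(kind, idx)
```

In this run the second store receives LQ_index 2, the slot the final load then occupies, so that load is the first one the store would check when it executes.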

7 Loads and Stores Through the Pipeline
Pipeline stages: fetch, decode, rename, dispatch, schedule, register read, execute, writeback, retire.

Dispatch:
(1) Place load or store in Active List at tail.
(2) Place load or store in Issue Queue (IQ).
(3) Place load or store in Load Queue (LQ) or Store Queue (SQ), respectively, at tail.
    A load gets LQ tail (LQ_index: where it resides in LQ) and SQ tail minus 1 (SQ_index: index of immediately preceding store in SQ).
    A store gets SQ tail (SQ_index: where it resides in SQ) and LQ tail (LQ_index: index of to-be-dispatched, immediately succeeding load in LQ).

Execute:
(1) Calculate address (AGEN).
(2) Load: Use address to access D$ and search SQ for matching addresses (D$ and SQ accessed in parallel); based on result of SQ search, load gets value from SQ (closest matching store) or D$ (no matching store in SQ). Also record load’s address in LQ.
    Store: Use address to search LQ for matching addresses; if there is a future load that already executed, and its address matches, mark that load in the Active List as “mispredicted”. Also record store’s address and value in the SQ.

Retire:
Load: If marked as “mispredicted”, initiate recovery actions (e.g., use “Approach #1” or “Approach #2”); otherwise commit load the same way as other register-producing instructions. (Note: Re-executing a mispredicted load after recovery will succeed because all prior stores have committed.)
Store: Signal the store at the head of the SQ to write its value to the D$ at its address. (Note: The store at the head of the Active List is the same as the store at the head of the SQ.)
After load or store successfully commits, pop from LQ or SQ, respectively.
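The store's LQ search at execute can be sketched as a walk from the store's LQ_index up to the current LQ tail. This is an illustrative model only: `LoadEntry`, `store_execute`, and the circular-list representation are hypothetical, and the sketch assumes the LQ is not full, so that LQ_index equal to the tail means there are no younger loads. It marks every younger executed load with a matching address, which is conservative, since a load that was forwarded its value by a younger matching store is not truly mispredicted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadEntry:
    addr: Optional[int] = None   # recorded when the load executes
    executed: bool = False
    mispredicted: bool = False


def store_execute(lq: List[LoadEntry], lq_index: int, lq_tail: int,
                  store_addr: int) -> List[int]:
    """Mark younger loads that already executed with a matching address.

    Returns the LQ positions that were marked as mispredicted.
    """
    marked = []
    i = lq_index
    while i != lq_tail:                   # loads in [LQ_index, LQ_tail) follow the store
        ld = lq[i]
        if ld.executed and ld.addr == store_addr:
            ld.mispredicted = True        # the load ran too early and read a stale value
            marked.append(i)
        i = (i + 1) % len(lq)             # assumed circular queue
    return marked


lq = [LoadEntry(0x40, True), LoadEntry(), LoadEntry(0x80, True), LoadEntry()]
print(store_execute(lq, lq_index=0, lq_tail=3, store_addr=0x40))
```

Here only the load in slot 0 is marked: slot 1 has not executed yet, and slot 2 executed with a different address.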

8 Store execution datapath
[Figure: store execution datapath]

9 Store execution datapath (cont.)

10 Load execution datapath
[Figure: load execution datapath, including a forwarded store value]

11 Load execution datapath (cont.)

12 Speculative Load Handling: A Rich Design Space
A ready load is speculative if there are unknown store addresses between it and the closest matching store address (if any).
Four dimensions of speculative load handling:
- Memory dependence prediction
- Store-load synchronization strategy
- Load misprediction recovery strategy
- Impact of store execution (split stores vs. no split stores)

13 Memory Dependence Prediction
Static prediction policies:
- Always predict no dependence with prior unexecuted stores (this was our initial policy): always speculatively execute the load.
- Always predict a dependence with a prior unexecuted store: always stall the load.
Dynamic memory dependence prediction:
- A speculative load is either stalled or speculatively executed based on history.
- Examples:
  - Table of sticky bits indexed by load PC (a mispredicted load sets its sticky bit; periodically reset all sticky bits to retrain).
  - Store Sets (learn dependencies between store PCs and load PCs).
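The sticky-bit table is simple enough to sketch directly. The class below is a hypothetical model, not a description of any particular processor: the table size, the PC hash (dropping two low bits on the assumption of 4-byte instructions), and the reset interval counted in calls to `tick` are all illustrative choices.

```python
class StickyBitPredictor:
    """Per-load-PC sticky bits: set on a misprediction, cleared in bulk to retrain."""

    def __init__(self, num_entries: int = 1024, reset_interval: int = 100_000):
        self.bits = [False] * num_entries
        self.reset_interval = reset_interval  # assumed unit: calls to tick()
        self.ticks = 0

    def _index(self, load_pc: int) -> int:
        # Assumed hash: drop the low 2 bits, then wrap to the table size.
        return (load_pc >> 2) % len(self.bits)

    def should_stall(self, load_pc: int) -> bool:
        """True means: predict a dependence, so stall the speculative load."""
        return self.bits[self._index(load_pc)]

    def train_mispredict(self, load_pc: int) -> None:
        """Called when this load was found to be mispredicted."""
        self.bits[self._index(load_pc)] = True

    def tick(self) -> None:
        """Advance time; periodically clear every bit so loads can retrain."""
        self.ticks += 1
        if self.ticks >= self.reset_interval:
            self.bits = [False] * len(self.bits)
            self.ticks = 0


p = StickyBitPredictor(num_entries=16, reset_interval=3)
print(p.should_stall(0x400))   # untrained load executes speculatively
p.train_mispredict(0x400)
print(p.should_stall(0x400))   # now stalls until the next reset
```

Because the bits are sticky, a load that mispredicted once keeps stalling even if the dependence was rare, which is why the periodic reset matters.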

14 Store-Load Synchronization Strategy
How and when is a stalled load unstalled? Two example approaches:
- IQ-based synchronization
  - Augment the Issue Queue (IQ) to synchronize stores and their predicted-dependent loads.
  - Linking stores and loads in the IQ requires a memory dependence predictor like Store Sets to set up the linkages.
- LQ-based synchronization
  - A load issues unimpeded. If the SQ search is inconclusive (speculative load) and the prediction is “stall”, don’t complete the load (like a cache miss).
  - Periodically replay the load from the LQ until the SQ search is conclusive, or replay the load when it nears or reaches the head of the Active List, etc.

15 Load Misprediction Recovery Strategies
Squash:
- Wait for the mispredicted load to reach the head of the Active List.
- Squash the pipeline, roll back the RMT, etc.
- Restart fetching from the PC of the load.
Selective re-execution:
- When a store detects a mispredicted load, “replay” the load and its dependent instructions from either the IQ (must hold onto IQ entries until proven non-speculative) or a Replay Buffer.
- How exactly? Wait until the value prediction lectures for details on (1) identifying a load’s dependent instructions and (2) re-injecting load-dependent instructions back into the IQ.

16 Impact of Store Issue Policy
“Split Stores”:
- Crack a store instruction into its address-generation micro-op (agen) and its value-read micro-op (val).
- The store takes two IQ entries, for its two independent micro-ops.
- The store takes one SQ entry (agen and val recombine in the SQ).
- A stalled val does not stall a ready agen.
  - Permits stores to deposit their addresses in the SQ as soon as possible.
  - Loads have the best information possible when they search the SQ, which prevents unnecessary mispredictions.
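How agen and val recombine in a single SQ entry can be illustrated with a toy model. The `SQEntry` class and its method names are hypothetical, and the sketch shows only the bookkeeping: the two micro-ops write their halves of the entry independently, so the address becomes visible to searching loads before the value arrives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SQEntry:
    addr: Optional[int] = None    # written by the agen micro-op
    value: Optional[int] = None   # written by the val micro-op

    def agen(self, addr: int) -> None:
        self.addr = addr

    def val(self, value: int) -> None:
        self.value = value

    def addr_known(self) -> bool:
        """Loads searching the SQ can disambiguate against this store."""
        return self.addr is not None

    def ready_to_commit(self) -> bool:
        """Both halves have arrived, so the store can be written to the D$."""
        return self.addr is not None and self.value is not None


entry = SQEntry()
entry.agen(0x1000)                # agen issues while val is still stalled
print(entry.addr_known())         # address is already visible to loads
print(entry.ready_to_commit())    # value has not arrived yet
entry.val(42)
print(entry.ready_to_commit())
```

Without split stores, the address would appear in the SQ only once the value operand was also ready, leaving younger loads to search against an unknown address in the meantime.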

17 Interactions
Accurate memory dependence prediction:
- Suggests simple, low-performance recovery (easy and low-cost, doesn’t occupy IQ entries longer than needed).
- May render split stores unnecessary (split stores increase IQ pressure and issue-width pressure).
Split stores, selective re-execution:
- With these aggressive mechanisms, is explicit memory dependence prediction unnecessary? I.e., always assume no dependencies with unknown store addresses?

