Published by Morgan McCoy. Modified over 9 years ago.
1
CS 7810 Lecture 8
Memory Dependence Prediction using Store Sets
G.Z. Chrysos and J.S. Emer
Proceedings of ISCA-25, 1998
2
Lifetime of a Load
3
LSQ Basics

Ld/St   Address     Data   Completed
Store   Unknown     1000   --
Load    x40000000   --
Store   x50000000   --
Load    x50000000   --
Load    x30000000   --

An incomplete store (here, the first store, whose address is still unknown) stalls all later loads: no speculation.
The paper's baseline is overly conservative because loads also wait for earlier store values, not just their addresses.
Most of these stalls are unnecessary; they are artificial dependences.
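The no-speculation policy can be sketched as a simple issue check over the LSQ contents in the table. This is a minimal model, assuming each entry records only its kind, address (None while unknown) and store value (None while not ready); `LsqEntry` and `load_may_issue` are hypothetical names, and `wait_for_data=True` models the paper's more conservative baseline.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class LsqEntry:
    kind: str               # "load" or "store"
    address: Optional[int]  # None: address not yet computed ("Unknown")
    data: Optional[int]     # None: store value not yet available


def load_may_issue(lsq: list[LsqEntry], idx: int, wait_for_data: bool = True) -> bool:
    """No-speculation policy: the load at lsq[idx] may issue only if every
    earlier store has a known address and, when wait_for_data is True (the
    paper's baseline), a known value as well."""
    for entry in lsq[:idx]:
        if entry.kind != "store":
            continue
        if entry.address is None:
            return False
        if wait_for_data and entry.data is None:
            return False
    return True


# The LSQ contents from the table (oldest first).
lsq = [
    LsqEntry("store", None, 1000),
    LsqEntry("load", 0x40000000, None),
    LsqEntry("store", 0x50000000, None),
    LsqEntry("load", 0x50000000, None),
    LsqEntry("load", 0x30000000, None),
]

load_idxs = [i for i, e in enumerate(lsq) if e.kind == "load"]
print([load_may_issue(lsq, i) for i in load_idxs])  # [False, False, False]
```

If the first store's address later resolves (to, say, a hypothetical x60000000), the x40000000 load may go, but the x30000000 load still waits behind the x50000000 store's value even though the two never conflict: an artificial dependence.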
4
Aggressive Approach
Naive speculation: assume that loads do not conflict with earlier stores, so all loads and stores execute out of order.
When there is a conflict (a memory-order violation), the load behaves like a branch mispredict: all subsequent instructions are squashed and re-fetched.
This is expensive: a 30-cycle penalty, and recovery needs rename checkpoints for all instructions.
Re-execute only the dependent instructions? More complex, but better performance.
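A minimal sketch of naive speculation with violation detection, assuming exact address matches (same-size, aligned accesses) and a window tracked by program-order sequence numbers. `MemOp`, `store_executes` and `squash_from` are hypothetical names; recovery here simply discards the violating load and everything younger, as with a branch mispredict.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemOp:
    seq: int              # program order: smaller is older
    kind: str             # "load" or "store"
    address: int
    executed: bool = False


def store_executes(window: list[MemOp], store: MemOp) -> Optional[int]:
    """Loads never wait for earlier stores. When a store executes, any younger
    load to the same address that already executed read stale memory.
    Return the seq of the oldest such load (the squash point), or None."""
    store.executed = True
    violators = [op.seq for op in window
                 if op.kind == "load" and op.seq > store.seq
                 and op.executed and op.address == store.address]
    return min(violators) if violators else None


def squash_from(window: list[MemOp], seq: int) -> list[MemOp]:
    """Branch-mispredict-style recovery: discard the violating load and every
    younger instruction; they must be re-fetched."""
    return [op for op in window if op.seq < seq]


window = [
    MemOp(0, "store", 0x50000000),
    MemOp(1, "load", 0x40000000, executed=True),
    MemOp(2, "load", 0x50000000, executed=True),  # ran ahead of the store
    MemOp(3, "load", 0x30000000, executed=True),
]
squash_at = store_executes(window, window[0])
print(squash_at)                                          # 2
print([op.seq for op in squash_from(window, squash_at)])  # [0, 1]
```

Note that the x30000000 load is discarded too even though it never conflicted; that wasted work is where the penalty comes from.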
5
Ideal Model
In the ideal model, loads wait only for conflicting stores: there are no artificial dependences and no memory-order violations.
6
False Dependences and Violations
7
Store Sets Concept
For every load, keep track of all stores that it has conflicted with in the past (its store set).
A load does not issue until the in-flight members of its store set have finished; the dependences are introduced at dispatch time.
The implementation is easy if a load depends on only one store and a store is present in only one store set.
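The concept itself, before any hardware restrictions, can be sketched as an unbounded map from load PC to the set of store PCs it has conflicted with. This is an illustrative model, not the paper's hardware; `IdealStoreSets` and the PCs used are hypothetical. Unlike the easy case, here a load may depend on many stores and a store may appear in many sets.

```python
from __future__ import annotations
from collections import defaultdict


class IdealStoreSets:
    """Unbounded store sets: each load PC maps to the set of store PCs it has
    conflicted with in the past."""

    def __init__(self) -> None:
        self.sets: defaultdict[int, set[int]] = defaultdict(set)

    def record_violation(self, load_pc: int, store_pc: int) -> None:
        # Called when a memory-order violation between the two is detected.
        self.sets[load_pc].add(store_pc)

    def dependences(self, load_pc: int, earlier_unfinished_stores: list[int]) -> list[int]:
        """At dispatch: the load must wait for every earlier, unfinished store
        whose PC is in its store set."""
        members = self.sets.get(load_pc, set())
        return [pc for pc in earlier_unfinished_stores if pc in members]


ss = IdealStoreSets()
ss.record_violation(0x500, 0x400)  # hypothetical PCs
ss.record_violation(0x500, 0x410)  # same load later conflicts with a second store
print([hex(pc) for pc in ss.dependences(0x500, [0x400, 0x420])])  # ['0x400']
```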
8
Trivial Implementations
Figure: execution time normalized to an ideal store set implementation.
9
Ideal Store Set Predictor
An occasional memory-order violation can introduce many false dependences; hence, use saturating counters.
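One way saturating counters could filter out occasional violations is sketched below. The policy is a hypothetical illustration, not taken from the paper: a violation sets a 2-bit counter for the (load PC, store PC) pair to its maximum, each time the load waits on that store but the addresses turn out different the counter decays, and the dependence is enforced only while the counter is at least 2.

```python
from __future__ import annotations


class FilteredStoreSets:
    """Store-set entries guarded by 2-bit saturating counters (hypothetical
    policy): a one-off violation stops adding false dependences after a
    couple of unnecessary waits."""

    MAX = 3        # 2-bit counter saturates at 3
    THRESHOLD = 2  # enforce the dependence only while counter >= 2

    def __init__(self) -> None:
        self.counters: dict[tuple[int, int], int] = {}

    def on_violation(self, load_pc: int, store_pc: int) -> None:
        self.counters[(load_pc, store_pc)] = self.MAX

    def on_false_dependence(self, load_pc: int, store_pc: int) -> None:
        # The load waited, but the store wrote a different address.
        key = (load_pc, store_pc)
        if key in self.counters:
            self.counters[key] = max(self.counters[key] - 1, 0)

    def enforce(self, load_pc: int, store_pc: int) -> bool:
        return self.counters.get((load_pc, store_pc), 0) >= self.THRESHOLD


f = FilteredStoreSets()
f.on_violation(0x500, 0x400)      # hypothetical PCs
print(f.enforce(0x500, 0x400))    # True  (counter = 3)
f.on_false_dependence(0x500, 0x400)
print(f.enforce(0x500, 0x400))    # True  (counter = 2)
f.on_false_dependence(0x500, 0x400)
print(f.enforce(0x500, 0x400))    # False (counter = 1)
```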
10
Implementation Overview
Every load/store depends on the last store in its set.
This serializes the stores within a set and causes some false dependences.
11
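Store Set Implementation
Every load and store belongs to one color (store set); the Store Set ID Table (SSIT) maps an instruction to its color, and the Last Fetched Store Table (LFST) keeps track of the last writer for each color.
Mispredictions can pose problems.
Colors are merged as memory-order violations are discovered.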
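A minimal sketch of how the two tables introduce dependences, under stated assumptions: the SSIT is indexed by low PC bits of 4-byte instructions, a store carries its SSID from fetch to issue, and branch-mispredict cleanup is ignored. `StoreSetPredictor` and its methods are hypothetical names; the table sizes come from the Design Details slide.

```python
from __future__ import annotations
from typing import Optional

SSIT_ENTRIES = 4096  # sizes from the Design Details slide
LFST_ENTRIES = 128


class StoreSetPredictor:
    def __init__(self) -> None:
        self.ssit: list[Optional[int]] = [None] * SSIT_ENTRIES  # PC index -> SSID (color)
        self.lfst: list[Optional[int]] = [None] * LFST_ENTRIES  # SSID -> inum of last fetched store

    @staticmethod
    def index(pc: int) -> int:
        # Assumption: 4-byte instructions, indexed by the low PC bits.
        return (pc >> 2) % SSIT_ENTRIES

    def fetch_load(self, pc: int) -> Optional[int]:
        """Return the inum of the store this load must wait for, if any."""
        ssid = self.ssit[self.index(pc)]
        return None if ssid is None else self.lfst[ssid]

    def fetch_store(self, pc: int, inum: int) -> tuple[Optional[int], Optional[int]]:
        """Return (ssid, inum of the store this store must wait for). Stores in
        one set are serialized; this store becomes the set's last writer."""
        ssid = self.ssit[self.index(pc)]
        if ssid is None:
            return None, None
        dep = self.lfst[ssid]
        self.lfst[ssid] = inum
        return ssid, dep

    def issue_store(self, ssid: Optional[int], inum: int) -> None:
        """Once the store issues, later loads need not wait for it; clear the
        LFST entry unless a younger store in the set has replaced it."""
        if ssid is not None and self.lfst[ssid] == inum:
            self.lfst[ssid] = None


p = StoreSetPredictor()
for pc in (0x400, 0x500):        # hypothetical PCs already in store set 5
    p.ssit[p.index(pc)] = 5
ssid, dep = p.fetch_store(0x400, inum=10)
print(ssid, dep)                 # 5 None
print(p.fetch_load(0x500))       # 10 (the load waits for store #10)
p.issue_store(ssid, 10)
print(p.fetch_load(0x500))       # None
```

Because each color has a single LFST slot, a second store in the same set must wait for the first; this is the store serialization noted in the Implementation Overview.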
12
Store Set Merging
Store set merging improves performance by 12%.
Note that merging happens gradually: there is no need to instantly correct all entries in the table.
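The gradual merge can be sketched as an update rule applied to the SSIT only when a violation is detected: only the two instructions involved are (re)assigned, so other members of a losing set migrate later, if and when they violate. In this sketch the SSIT is a plain dict, new SSIDs come from a wrapping counter, and the winner of two different sets is the smaller SSID; all three choices are assumptions, and `merge_on_violation` is a hypothetical name.

```python
from __future__ import annotations

LFST_ENTRIES = 128  # number of store set IDs (LFST size from the slides)


def merge_on_violation(ssit: dict[int, int], load_pc: int, store_pc: int,
                       next_ssid: int) -> int:
    """Update a PC -> SSID map after a memory-order violation between the load
    at load_pc and the store at store_pc. Returns the next free SSID."""
    ld, st = ssit.get(load_pc), ssit.get(store_pc)
    if ld is None and st is None:
        # Neither has a set yet: allocate a new one for both.
        ssit[load_pc] = ssit[store_pc] = next_ssid
        return (next_ssid + 1) % LFST_ENTRIES
    if st is None:
        ssit[store_pc] = ld          # store joins the load's set
    elif ld is None:
        ssit[load_pc] = st           # load joins the store's set
    elif ld != st:
        # Different sets: pick a winner (assumption: the smaller SSID) and move
        # only these two instructions into it.
        winner = min(ld, st)
        ssit[load_pc] = ssit[store_pc] = winner
    return next_ssid


ssit: dict[int, int] = {}
n = merge_on_violation(ssit, 0x500, 0x400, 0)   # hypothetical PCs
n = merge_on_violation(ssit, 0x600, 0x410, n)
n = merge_on_violation(ssit, 0x500, 0x410, n)   # sets 0 and 1 meet
print({hex(pc): s for pc, s in sorted(ssit.items())})
# {'0x400': 0, '0x410': 0, '0x500': 0, '0x600': 1}
```

The load at 0x600 still sits in set 1 after the merge; it moves only if it later conflicts with a member of set 0.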
13
Design Details
Merging store sets.
To deal with occasional dependences and conflicts, clear the table every million cycles, or use saturating counters for each entry.
The SSIT needs 4K entries and the LFST needs 128 entries.
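Periodic clearing and the table sizes can be made concrete with a small sketch. It assumes each SSIT entry holds an SSID plus a valid bit and that clearing only resets the valid bits; `ClearingSSIT` is a hypothetical name, and the storage figure follows from those assumptions, not from the paper.

```python
from __future__ import annotations
import math
from typing import Optional

SSIT_ENTRIES = 4096         # from the slide
LFST_ENTRIES = 128          # from the slide
CLEAR_INTERVAL = 1_000_000  # clear the table every million cycles

SSID_BITS = int(math.log2(LFST_ENTRIES))  # 7 bits name one of 128 store sets


class ClearingSSIT:
    """SSIT with a valid bit per entry; all valid bits are cleared every
    CLEAR_INTERVAL cycles so stale or one-off dependences are forgotten."""

    def __init__(self) -> None:
        self.valid = [False] * SSIT_ENTRIES
        self.ssid = [0] * SSIT_ENTRIES

    def lookup(self, index: int) -> Optional[int]:
        return self.ssid[index] if self.valid[index] else None

    def assign(self, index: int, ssid: int) -> None:
        self.ssid[index] = ssid
        self.valid[index] = True

    def tick(self, cycle: int) -> None:
        if cycle % CLEAR_INTERVAL == 0:
            self.valid = [False] * SSIT_ENTRIES


# Storage estimate (assumption: SSID plus one valid bit per entry).
ssit_bits = SSIT_ENTRIES * (SSID_BITS + 1)
print(SSID_BITS, ssit_bits, ssit_bits // 8 // 1024)  # 7 32768 4  (i.e., 4 KB)
```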
14
Results
15
Related Work
Store barrier cache: identify stores that are likely to pose conflicts.
Keep track of all store-load conflict pairs and associatively check for dependences while dispatching instructions.
16
Next Week’s Paper
“Effective Hardware-Based Prefetching for High-Performance Microprocessors”, T.F. Chen and J.L. Baer, IEEE Transactions on Computers, May 1995