1
Revisiting Load Value Speculation: an Approach to Mainstream Processors
Lois Orosa and Rodolfo Azevedo, University of Campinas
2
Projects
Load Value Speculation
Frequent Value Locality
On-chip photonics (Jorge González, PhD candidate)
3
Introduction to Value Speculation (I)
Proposed in the 1990s
Improves ILP by breaking true data dependencies (RAW)
Speculation on all instructions
The prediction is written to the output register
Predictors are indexed by PC (at fetch time)
The original proposals were very complex for the time (many changes to the OoO engine)
Recently, Perais and Seznec revisited the topic [HPCA'13][ISCA'14][HPCA'15]
  Proposed simplifications to the implementation
  Proposed new predictors
4
Introduction to Value Speculation (II)
Confidence counters [Perais'13] (per instruction) to increase precision
  Speculate only when confidence is high
  Reduce mispredictions
  Decrease coverage
  Incremented when a prediction is correct, reset on a misprediction
Precision
  If the misprediction penalty is low, the system can tolerate low precision
  If the misprediction penalty is high, precision must be high (99% or more)
The prediction has to be available before dispatch time
  Available cycles: from fetch to dispatch
  The predictor delay is not critical
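The confidence mechanism on this slide can be illustrated with a minimal sketch. It assumes a last-value predictor as the underlying predictor and a saturating counter with a hypothetical threshold of 3; the class name, the threshold, and the `run` helper are illustrative choices, not taken from the talk or from [Perais'13].

```python
class ConfidentLastValuePredictor:
    """Last-value predictor gated by a per-instruction confidence counter."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # hypothetical: speculate once the counter reaches this
        self.table = {}              # pc -> [last_value, confidence]

    def predict(self, pc):
        """Return the predicted value, or None when confidence is too low."""
        entry = self.table.get(pc)
        if entry is None or entry[1] < self.threshold:
            return None
        return entry[0]

    def update(self, pc, actual):
        """Train with the real value: count up on a match, reset otherwise."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [actual, 0]
        elif entry[0] == actual:
            entry[1] = min(entry[1] + 1, self.threshold)  # saturating increment
        else:
            entry[0] = actual
            entry[1] = 0                                  # reset on misprediction


def run(trace, threshold=3):
    """Replay (pc, value) pairs; return (predictions used, predictions correct)."""
    predictor = ConfidentLastValuePredictor(threshold)
    used = correct = 0
    for pc, value in trace:
        guess = predictor.predict(pc)
        if guess is not None:
            used += 1
            correct += int(guess == value)
        predictor.update(pc, value)
    return used, correct
```

A higher threshold makes the predictor speculate less often (lower coverage) but only on values that have repeated many times (higher precision), which is the trade-off the slide describes.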
5
Introduction to Value Speculation (III)
Validation
  At execution time (OoO changes, small misprediction penalty)
  At commit time (no OoO changes, higher misprediction penalty)
Recovering from a misprediction
  Selective reissue: faster, more complex (validation at execution time)
  Pipeline squashing: slower, simpler
Two main problems
  Register port pressure: extra ports are needed (extra writes for predictions, extra reads for validations and predictor updates)
  Back-to-back predictions: predictors may depend on previous values
6
Contributions
Analysis of the potential of Value Speculation in a narrow processor for different types of instructions
Reducing complexity in narrow-width-issue processors by speculating only on load instructions
AV predictor: two-phase value predictor with address prediction
XLStride predictor: multilevel stride predictor
7
Baseline Processor & Benchmarks
Baseline: real narrow-width-issue processor
  ZSIM simulator: Westmere OoO x86-64, 4-issue, 2-level branch predictor
  128-entry ROB, 32-entry load queue, 32-entry store queue
  L1I and L1D: 32KB, 4-way, LRU, 4-cycle latency
  L2 cache: 256KB, 8-way, LRU, 12-cycle latency
  Pipeline squashing, validation at commit
Benchmarks
  Splash2, Parsec, SPEC2000, SPEC2006
8
Potential of Value Speculation (I)
Six categories of instructions
Loads are 25% of all dynamic micro-instructions
High-latency micro-instructions (more than 5 cycles) are not representative (included in "Other")
9
Potential of Value Speculation (III)
Oracle predictor (no mispredictions)
Value Speculation in each category of instruction
Loads have almost the same potential as speculating on all instructions
Loads (25%) have more potential gains than all the other instructions together (75%)
[Figures comparing the LOADS, ALL and NOTLOADS configurations]
10
Advantages of Speculating Only on Loads in a Narrow Processor
Value Speculation in narrow-issue processors
  Reduced back-to-back prediction: fewer in-flight instructions
  Approach to mainstream processors
  Reduced misprediction penalty (smaller pipeline)
Speculation on ¼ of the instructions (loads), with almost the same potential gains
  Reduced register port pressure
  Reduced back-to-back prediction
Still need confidence counters to increase precision
  "mcf" minimum precision: 76.7%
  "tonto" minimum precision: 99.6%
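One way to read a "minimum precision" is as a break-even point. The sketch below assumes a simple expected-value model that is not stated in the talk: each correct prediction saves `gain_cycles`, each misprediction costs `penalty_cycles`, and speculation pays off only when precision `p` satisfies `p * gain > (1 - p) * penalty`. The function name and both parameters are hypothetical, and the model is not claimed to reproduce the per-benchmark figures above.

```python
def break_even_precision(gain_cycles, penalty_cycles):
    """Smallest precision at which speculation stops losing cycles.

    Assumed model: p * gain_cycles > (1 - p) * penalty_cycles,
    which rearranges to p > penalty_cycles / (gain_cycles + penalty_cycles).
    """
    if gain_cycles <= 0 or penalty_cycles < 0:
        raise ValueError("gain must be positive and penalty non-negative")
    return penalty_cycles / (gain_cycles + penalty_cycles)
```

Under this model, a benchmark whose mispredictions are cheap relative to the cycles saved tolerates a low precision, while one with a large penalty-to-gain ratio needs a precision close to 100%.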
11
State of the Art Predictors
Last Value Predictor (LVP)
Stride predictor {1,2,3,4,5}; variants: 2D-Stride
FCM
VTAGE
DVTAGE, DFCM
12
XLStride Predictor
It detects strides between consecutive values, and also between alternating values
  Examples: {2,1,1,4,4,3,6,6,7,8}, {4,0,4,9,4,1,4}
It can have several levels
  X histories, each one containing stride information about the last X occurrences of the instruction
  It requires X^2 strides + last value
  16-bit strides
  X predictions: selection by confidence counters
We implemented a 2LStride predictor (good performance/cost trade-off)
  Example: 2LStride, 1-bit confidence counter
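The idea of looking for strides between alternating values as well as consecutive ones can be sketched as follows. This is a simplified illustration, not the authors' XLStride design: it assumes that level `d` extrapolates from values `d` occurrences apart, and it omits the stride tables and the confidence-based selection among the X candidates. The function name `stride_candidates` is hypothetical.

```python
def stride_candidates(history, levels=2):
    """Return one candidate prediction per distance d = 1..levels.

    Candidate d extrapolates from values that are d occurrences apart:
        next = history[-d] + (history[-d] - history[-2 * d])
    A candidate is None until 2 * d values have been observed.
    """
    candidates = []
    for d in range(1, levels + 1):
        if len(history) < 2 * d:
            candidates.append(None)      # not enough history at this distance
        else:
            base = history[-d]
            stride = base - history[-2 * d]
            candidates.append(base + stride)
    return candidates
```

For the slide's second example, after observing 4, 0, 4, 9 the distance-1 candidate is 14 while the distance-2 candidate is 4, which matches the next value in the sequence. A full predictor would then need a per-candidate confidence counter to decide which candidate, if any, to use.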
13
AV Predictor
Some benchmarks exhibit patterns in the addresses, not in the values
Address Table (AT): indexed by PC, result: predicted address
  Implemented with a state-of-the-art predictor
Value Table (VT): indexed by predicted address, result: predicted value
  Implemented with a last value predictor
  VT is also updated on stores
Detects patterns in the addresses: results are totally different from traditional predictors
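The two-phase lookup can be sketched as below. The talk only says the address table uses "a state-of-the-art predictor"; this sketch substitutes a plain stride address predictor as a stand-in, uses unbounded dictionaries instead of fixed-size tables, and leaves out confidence counters. The class and method names are hypothetical.

```python
class AVPredictor:
    """Two-phase sketch: PC -> predicted address -> predicted value."""

    def __init__(self):
        self.address_table = {}  # pc -> (last_address, stride); assumed stride predictor
        self.value_table = {}    # address -> last value seen at that address

    def predict(self, pc):
        """Phase 1 predicts the load address, phase 2 looks up its last value."""
        entry = self.address_table.get(pc)
        if entry is None:
            return None
        last_address, stride = entry
        predicted_address = last_address + stride
        return self.value_table.get(predicted_address)

    def on_load(self, pc, address, value):
        """Train both tables with the real address and value of a load."""
        entry = self.address_table.get(pc)
        stride = address - entry[0] if entry is not None else 0
        self.address_table[pc] = (address, stride)
        self.value_table[address] = value

    def on_store(self, address, value):
        """Stores also update the value table, as stated on the slide."""
        self.value_table[address] = value
```

With stores writing unrelated values 5, 42 and 17 to addresses 100, 108 and 116, a load that walks those addresses has no pattern in its values, yet once the sketch has seen the loads at 100 and 108 it predicts address 116 and returns 17 from the value table.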
14
Evaluation
Load speculation
7 state-of-the-art predictors
2LStride predictor
3 AV predictors
Several hybrid predictors
Uses half of the entries of the state-of-the-art predictors [Perais and Seznec, HPCA'13]
15
Results (I)
Individual results: [figure; label: best of the single predictors]
Hybrid results: always better than the best of the single predictors
16
Results (II)
Multicore experiments with 24 cores
  To check the influence of shared memory on precision
Precision on the value table => no changes in shared memory by remote processors
17
Conclusions
We simulate a real processor (Intel Westmere) to bring Value Prediction closer to general-purpose processors (narrow-issue processors)
Speculating on loads has a better cost/benefit than speculating on all instructions (in narrow processors)
We propose the XLStride predictor
  Detects more complex stride patterns
We propose the AV predictor
  Complementary to the traditional predictors: ideal for hybrid predictors
Speed-up of up to 33% (average 10%)
Shared memory in multicore processors barely affects the precision of the predictors
18
Lois Orosa lois.orosa@ic.unicamp.br
Thank You!!
19
Potential of Value Speculation (II)