Trace Substitution Hans Vandierendonck, Hans Logie, Koen De Bosschere Ghent University EuroPar 2003, Klagenfurt.

2 Instruction Fetch
Wide-issue superscalar processors need to fetch past multiple branches per cycle
– IPC = 8 implies fetching ~16 instructions/cycle and predicting ~3 branches/cycle
– Multi-ported instruction cache?
Trace cache:
– Packs fetch groups into a trace
– Trace tagged with start PC, path and next fetch PC
– Multiple branch predictor (MBP) predicts the branch directions
August 27, 2003, Euro-Par 2003
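
The trace tag described above can be sketched as a toy, fully associative trace cache keyed by start PC and branch path, so traces that share a start PC but follow different paths coexist (path associativity). This is an illustration under assumptions, not the authors' design: class and field names are hypothetical, capacity and replacement are ignored, and the predicted path is assumed to hold exactly one direction per conditional branch in the trace.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Trace:
    start_pc: int                 # PC of the first instruction in the trace
    path: Tuple[bool, ...]        # directions of the conditional branches (True = taken)
    instructions: Tuple[str, ...] # the packed fetch group
    next_pc: int                  # fetch address that follows the trace


class TraceCache:
    """Hypothetical toy trace cache: unbounded, keyed by (start PC, path)."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[int, Tuple[bool, ...]], Trace] = {}

    def fill(self, trace: Trace) -> None:
        self.entries[(trace.start_pc, trace.path)] = trace

    def lookup(self, pc: int, predicted_path: Tuple[bool, ...]) -> Optional[Trace]:
        # A hit requires both the fetch PC and the predicted directions to match.
        return self.entries.get((pc, predicted_path))
```

For example, after `tc.fill(Trace(0x1000, (True, False), ("i0", "i1", "i2"), 0x2040))`, a lookup at `0x1000` with path `(True, False)` hits, while path `(False, False)` misses even though a trace starts at that PC.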

3 The Trace Cache
[Figure: block diagram of a trace cache fetch unit with instruction cache, trace cache, MBP, MUX and fill unit. The trace cache hit/miss signal drives the MUX select; signals shown: fetch address, predicted path, instructions, next address.]
The fill unit stores only executed paths!
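
The selection and fill behaviour in the diagram can be sketched as below: on a trace cache hit the MUX forwards the trace, otherwise the instructions come from the instruction cache, and the fill unit only ever writes traces built from executed instructions. The dictionary-based structures and function names are hypothetical; the 32-byte block size is taken from the evaluation setup, and fixed 4-byte instructions are an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical toy structures (not from the slides):
#   trace cache: (start PC, path) -> instructions of the stored trace
#   instruction cache: block address -> instructions in that block
TraceCacheMap = Dict[Tuple[int, Tuple[bool, ...]], List[str]]
ICacheMap = Dict[int, List[str]]

BLOCK_BYTES = 32  # 32-byte I-cache blocks, as in the evaluation setup
INSN_BYTES = 4    # assumed fixed 4-byte instructions


def fetch(pc: int, predicted_path: Tuple[bool, ...],
          tc: TraceCacheMap, icache: ICacheMap) -> Tuple[str, List[str]]:
    """One fetch unit access: the MUX forwards the trace on a TC hit, else the I-cache."""
    trace = tc.get((pc, predicted_path))
    if trace is not None:
        return "trace cache", list(trace)
    block = pc - pc % BLOCK_BYTES
    offset = (pc % BLOCK_BYTES) // INSN_BYTES
    # Fall back to the rest of the I-cache block starting at the fetch PC.
    return "instruction cache", icache.get(block, [])[offset:]


def fill_unit(tc: TraceCacheMap, start_pc: int,
              executed_path: List[bool], executed_insns: List[str]) -> None:
    """Traces are built from executed instructions, so only paths that occurred are stored."""
    tc[(start_pc, tuple(executed_path))] = list(executed_insns)
```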

4 Overview
Observation
– Trace cache misses are (sometimes) branch mispredictions
Trace substitution
– How to make use of it
Evaluation
– Is it worth it?
Conclusion

5 Observation
The multiple branch predictor affects the trace cache:
– Non-perfect branch predictors reduce the trace cache hit rate
– FIPA (fetched instructions per fetch unit access) correlates better with the TC hit rate than with MBP accuracy
(Figure setup: TC: 16K traces, 4-way set-associative, path associativity; MGAg, MGshare: 12-bit history; repeat: 8 Kbit hybrid, accessed 3 times)
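
To make the MGAg configuration concrete, here is a minimal sketch of a multiple-branch GAg-style predictor: a k-bit global history indexes one table of 2-bit saturating counters, and several directions are predicted per access by shifting each prediction into a speculative copy of the history before indexing the next counter. This is an assumed, simplified model (sequential rather than parallel counter reads, non-speculative training); the class and method names are hypothetical. With `history_bits=0` the table degenerates to a single counter, the "no history" case.

```python
from typing import List


class MGAg:
    """Sketch of a multiple-branch predictor with one global history register
    and one table of 2-bit saturating counters."""

    def __init__(self, history_bits: int) -> None:
        self.mask = (1 << history_bits) - 1
        self.counters = [2] * (1 << history_bits)  # initialise to weakly taken
        self.history = 0

    def predict(self, n: int = 3) -> List[bool]:
        """Predict n consecutive branch directions in one access."""
        h = self.history
        path = []
        for _ in range(n):
            taken = self.counters[h] >= 2
            path.append(taken)
            # Shift the prediction into a speculative copy of the history.
            h = ((h << 1) | int(taken)) & self.mask
        return path

    def update(self, outcomes: List[bool]) -> None:
        """Train the counters with actual outcomes and shift them into the history."""
        for taken in outcomes:
            i = self.history
            c = self.counters[i]
            self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
            self.history = ((self.history << 1) | int(taken)) & self.mask
```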

6 TC Misses Are a Tell-Tale for MBP Misses
Trace cache misses coincide with branch mispredictions, e.g.:
– 16K-entry trace cache, 12-bit MGAg (high accuracy, low coverage):
  84.9% of TC misses are also MBP misses
  37.6% of MBP misses are also TC misses
– 256-entry trace cache, 12-bit MGAg (low accuracy, higher coverage):
  25.1% of TC misses are also MBP misses
  55.9% of MBP misses are also TC misses
This work: use TC misses to detect MBP misses and fix them
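
The two percentages per configuration are an accuracy and a coverage of "TC miss" as a detector of "MBP miss". Since the number of joint misses equals accuracy × TC misses and also coverage × MBP misses, their ratio gives TC misses per MBP miss, which explains the trade-off. The function names are hypothetical; the inputs are the slide's percentages.

```python
def accuracy_coverage(tc_misses: int, mbp_misses: int, both: int) -> tuple:
    """accuracy = fraction of TC misses that are MBP misses;
    coverage = fraction of MBP misses that are TC misses."""
    if tc_misses == 0 or mbp_misses == 0:
        raise ValueError("need at least one miss of each kind")
    return both / tc_misses, both / mbp_misses


def tc_to_mbp_miss_ratio(accuracy: float, coverage: float) -> float:
    """both = accuracy * tc_misses = coverage * mbp_misses,
    hence tc_misses / mbp_misses = coverage / accuracy."""
    return coverage / accuracy


print(round(tc_to_mbp_miss_ratio(0.849, 0.376), 2))  # 16K-entry TC  -> 0.44
print(round(tc_to_mbp_miss_ratio(0.251, 0.559), 2))  # 256-entry TC -> 2.23
```

With the 256-entry TC there are about 2.2 TC misses per MBP miss, so most TC misses are not mispredictions (low accuracy), yet more mispredictions show up as TC misses (higher coverage).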

7 Trace Substitution
Assumption: a TC miss implies an MBP miss
– Correlation between branches implies that some paths never occur
– The TC stores only those paths that do occur
If the predicted path is wrong …
– Fetch a different trace
– Override the MBP with the MRU trace starting at the fetch PC
  Detect the MRU trace from the LRU bits stored in the TC
  No trace substitution is applied if no such trace exists
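
The substitution rule can be sketched on a single trace cache set whose ways carry an LRU rank (0 = most recently used). On an exact (PC, path) match the access is a normal hit; otherwise the lowest-ranked way whose trace starts at the fetch PC is taken as the MRU trace and overrides the MBP's path; if no way starts at that PC, nothing is substituted. The `Way` layout and rank encoding are assumptions standing in for the hardware LRU bits.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Way:
    start_pc: int
    path: Tuple[bool, ...]
    next_pc: int
    lru_rank: int  # assumed encoding of the LRU bits: 0 = most recently used


def lookup_with_substitution(ways: List[Way], pc: int,
                             predicted_path: Tuple[bool, ...]) -> Tuple[Optional[Way], bool]:
    """Return (way, substituted)."""
    for w in ways:
        if w.start_pc == pc and w.path == predicted_path:
            return w, False                  # normal trace cache hit
    candidates = [w for w in ways if w.start_pc == pc]
    if not candidates:
        return None, False                   # no trace starts at pc: no substitution
    mru = min(candidates, key=lambda w: w.lru_rank)
    return mru, True                         # MBP overridden with mru.path


def touch(ways: List[Way], used: Way) -> None:
    """Make `used` the MRU way; ways that were more recent age by one."""
    old = used.lru_rank
    for w in ways:
        if w.lru_rank < old:
            w.lru_rank += 1
    used.lru_rank = 0
```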

8 Implementation
[Figure: the trace cache fetch unit of slide 3, extended so that the trace cache also outputs an MRU hit signal and the MRU trace as inputs to the MUX, alongside hit, the predicted trace and the instruction cache output.]
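
With the extra MRU hit and MRU inputs, the MUX select can be read as a priority choice, sketched below: an exact trace hit first, then the MRU substitute, then the instruction cache. In this sketch the next fetch address comes from whichever trace is chosen, since the substituted trace's path replaces the MBP's prediction. The names and the priority order are an illustrative assumption, not taken from the slides.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Trace(NamedTuple):
    instructions: Sequence[str]
    next_pc: int  # next fetch address stored with the trace


def select_fetch(hit: Optional[Trace], mru: Optional[Trace],
                 icache_insns: Sequence[str],
                 icache_next_pc: int) -> Tuple[str, List[str], int]:
    """Priority MUX: exact hit > MRU substitute > instruction cache.
    Returns (source, instructions, next fetch address)."""
    if hit is not None:
        return "trace", list(hit.instructions), hit.next_pc
    if mru is not None:
        # Substitution: follow the MRU trace's path instead of the MBP's prediction.
        return "substituted trace", list(mru.instructions), mru.next_pc
    return "instruction cache", list(icache_insns), icache_next_pc
```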

9 Evaluation Setup
Benchmarks
– SPECint95 (except compress and go), reference inputs
– First 500 million instructions of each program
– Compiled for the Alpha ISA with the Compaq C compiler, -O4
Fetch unit
– TC: 1 trace = 16 instructions, 3 conditional branches; a trace ends at a system call or an indirect jump
– TC: 4-way set-associative, path associativity
– MBP: MGAg, varying history length
– Instruction cache: 32 KB, 2-way, 32-byte blocks, LRU
Metric
– FIPA = fetched instructions per fetch unit access
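
The FIPA metric is a simple average, sketched below over a hypothetical per-access log of delivered instruction counts. The slides do not say how wrong-path instructions are counted, so this sketch just averages whatever counts it is given.

```python
from typing import Sequence


def fipa(instructions_per_access: Sequence[int]) -> float:
    """FIPA = fetched instructions / fetch unit accesses."""
    if not instructions_per_access:
        raise ValueError("no fetch unit accesses recorded")
    return sum(instructions_per_access) / len(instructions_per_access)


# Hypothetical log: four accesses delivering 16, 9, 12 and 3 instructions.
print(fipa([16, 9, 12, 3]))  # 10.0
```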

10 Evaluation (1)
Observations:
– The gap between MGAg and a perfect predictor grows with TC size
– Trace substitution fills 20–40% of that gap
– Substitution acts only on a TC miss, so the improvement drops as the TC grows
(TC: 4-way set-associative; MGAg: 12-bit history)
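
The "fraction of the gap filled" can be computed from three FIPA values. A minimal sketch; the numbers in the example are hypothetical and only illustrate the formula.

```python
def gap_filled(fipa_base: float, fipa_subst: float, fipa_perfect: float) -> float:
    """Fraction of the gap between the base predictor (e.g. MGAg) and a perfect
    predictor that is closed by adding trace substitution."""
    gap = fipa_perfect - fipa_base
    if gap <= 0:
        raise ValueError("the perfect predictor must outperform the base predictor")
    return (fipa_subst - fipa_base) / gap


# Hypothetical FIPA values: base 8.0, with substitution 8.5, perfect 10.0.
print(gap_filled(8.0, 8.5, 10.0))  # 0.25
```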

11 Evaluation (2)
Observations:
– Trace substitution compensates for a poor branch predictor
– No history + substitution ≈ 10-bit history without substitution
– The improvement drops with a more accurate predictor
(TC: 256 traces, 4-way)

12 Accuracy vs. Usage
Definitions:
– Usage = substitutions per fetch unit access
– Accuracy = fraction of substitutions that are correct
Note:
– Accuracy is limited because the correct-path trace is not always present in the TC!
(TC: 256 traces, 4-way)
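
Both definitions are ratios over event counts, sketched below. Counting a substitution as "correct" when the substituted trace's path matches the actual path is an assumption, and accuracy is reported as undefined (`None`) when no substitution occurred.

```python
from typing import Optional, Tuple


def usage_and_accuracy(fetch_accesses: int, substitutions: int,
                       correct_substitutions: int) -> Tuple[float, Optional[float]]:
    """usage = substitutions / fetch unit accesses;
    accuracy = correct substitutions / substitutions (None if there were none)."""
    if fetch_accesses == 0:
        raise ValueError("no fetch unit accesses recorded")
    usage = substitutions / fetch_accesses
    accuracy = correct_substitutions / substitutions if substitutions else None
    return usage, accuracy


# Hypothetical counts: 1000 accesses, 50 substitutions, 20 of them correct.
print(usage_and_accuracy(1000, 50, 20))  # (0.05, 0.4)
```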

13 Conclusion
Proposed trace substitution
– A TC miss flags an MBP miss
  Not always correct, and not all MBP misses are found
– Fetch the MRU trace instead: cheap implementation
Results
– Consistent performance improvement
  No history + substitution ≈ MGAg with 10-bit history
  In other cases: 0.2 more instructions per access, or the same performance with a 16 times smaller MBP
– Most effective when the MBP or the TC is small
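
To relate "a 16 times smaller MBP" to history length: assuming the predictor's storage is dominated by a single 2^k-entry table of 2-bit counters, as in MGAg, each history bit doubles the table, so a 16 times smaller predictor uses 4 fewer history bits. The helper names are hypothetical and the history register itself is ignored.

```python
def mgag_storage_bits(history_bits: int, counter_bits: int = 2) -> int:
    """Storage of one 2^k-entry table of saturating counters (history register ignored)."""
    return (1 << history_bits) * counter_bits


def history_bits_saved(size_ratio: int) -> int:
    """Number of history bits to drop for a table size_ratio times smaller."""
    if size_ratio < 1 or size_ratio & (size_ratio - 1):
        raise ValueError("size_ratio must be a power of two")
    return size_ratio.bit_length() - 1


print(mgag_storage_bits(12))                          # 8192
print(mgag_storage_bits(12) // mgag_storage_bits(8))  # 16
print(history_bits_saved(16))                         # 4
```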

