1
Trace Substitution
Hans Vandierendonck, Hans Logie, Koen De Bosschere
Ghent University
Euro-Par 2003, Klagenfurt
2
August 27, 2003, Euro-Par 2003
Instruction Fetch
Wide-issue superscalar processors need to fetch multiple branches per cycle
– IPC=8 implies fetching ~16 instructions/cycle and predicting ~3 branches/cycle
– Multi-ported instruction cache?
Trace cache:
– Packs fetch groups in a trace
– Trace tagged with PC, path, next fetch PC
– Multiple branch predictor (MBP) predicts branch directions
3
The Trace Cache
[Figure: fetch unit block diagram. Blocks: instruction cache, trace cache, MBP, MUX, fill unit. Labels: fetch address, select, hit, hit/miss, pred. trace, pred. insn, instructions. Legend: pred. path, fetch address, next address, instructions. Annotation: only executed paths!]
4
Overview
Observation
– Trace cache misses are (sometimes) branch mispredictions
Trace Substitution
– How to make use of it
Evaluation
– Is it worth it?
Conclusion
5
Observation
The multiple branch predictor affects the trace cache:
– Imperfect branch predictors reduce the trace cache hit rate
– FIPA (fetched instructions per fetch unit access) correlates better with TC hit rate than with MBP accuracy
[Figure configuration: TC: 16K traces, 4-way set-assoc., path associativity; MGAg, Mgshare: 12-bit history; repeat: 8Kbit hybrid, accessed 3x]
6
TC Misses Are a Tell-Tale for MBP Misses
Trace cache misses coincide with branch mispredictions, e.g.:
– 16K-entry trace cache, 12-bit MGAg: high accuracy, low coverage
    84.9% of TC misses are also MBP misses
    37.6% of MBP misses are also TC misses
– 256-entry trace cache, 12-bit MGAg: low accuracy, higher coverage
    25.1% of TC misses are also MBP misses
    55.9% of MBP misses are also TC misses
This work: use TC misses to detect MBP misses and fix them
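The two percentages per configuration are conditional frequencies over fetch unit accesses. A minimal sketch of how they could be computed from a simulation log, assuming each access is recorded as a hypothetical `(tc_miss, mbp_miss)` pair of flags (the function name and log format are illustrative and not taken from the talk):

```python
from typing import Iterable, Tuple


def miss_overlap(events: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (accuracy, coverage) for using TC misses to flag MBP misses.

    events: one (tc_miss, mbp_miss) pair per fetch unit access (assumed format).
    accuracy = fraction of TC misses that are also MBP misses
    coverage = fraction of MBP misses that are also TC misses
    """
    tc_misses = mbp_misses = both = 0
    for tc_miss, mbp_miss in events:
        tc_misses += tc_miss
        mbp_misses += mbp_miss
        both += tc_miss and mbp_miss
    accuracy = both / tc_misses if tc_misses else 0.0
    coverage = both / mbp_misses if mbp_misses else 0.0
    return accuracy, coverage


# Made-up log: 2 TC misses, 3 MBP misses, 1 access where both miss.
log = [(True, True), (True, False), (False, True), (False, True), (False, False)]
accuracy, coverage = miss_overlap(log)
```

With the made-up log above, half of the TC misses are also MBP misses and one third of the MBP misses are also TC misses.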
7
Trace Substitution
Assumption: TC miss implies MBP miss
– Correlation between branches implies that some paths never occur
– TC stores only those paths that do occur
If the predicted path is wrong …
– Fetch a different trace
– Override MBP with MRU trace starting at fetch PC
    Detect MRU trace from LRU bits stored in TC
    No trace substitution applied if it does not exist
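The lookup rule above can be sketched as a toy model. This is a sketch under stated assumptions, not the authors' hardware design: traces are grouped per start PC in an ordered map that stands in for the LRU bits, a predicted path hits only on an exact match, and the names `Trace`, `TraceCache`, `fill` and `fetch` are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Path = Tuple[bool, ...]  # taken/not-taken outcome per conditional branch


@dataclass
class Trace:
    start_pc: int
    path: Path
    next_pc: int


class TraceCache:
    """Toy trace cache: per start PC, traces ordered from LRU to MRU."""

    def __init__(self, ways: int = 4) -> None:
        self.ways = ways
        self.sets: Dict[int, OrderedDict] = {}

    def fill(self, trace: Trace) -> None:
        # The fill unit inserts an executed trace and marks it MRU.
        entries = self.sets.setdefault(trace.start_pc, OrderedDict())
        entries[trace.path] = trace
        entries.move_to_end(trace.path)
        if len(entries) > self.ways:
            entries.popitem(last=False)  # evict the LRU trace

    def fetch(self, pc: int, predicted_path: Path) -> Tuple[Optional[Trace], str]:
        entries = self.sets.get(pc)
        if not entries:
            # No trace starts at this PC: no substitution, use the instruction cache.
            return None, "miss"
        if predicted_path in entries:
            entries.move_to_end(predicted_path)  # regular hit, update LRU order
            return entries[predicted_path], "hit"
        # Predicted path is absent: assume the MBP is wrong and
        # override it with the MRU trace starting at this PC.
        mru_path = next(reversed(entries))
        return entries[mru_path], "substituted"
```

Whether a substituted trace should also update the LRU state is not stated on the slide; the sketch leaves the order untouched in that case.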
8
Implementation
[Figure: fetch unit block diagram extended for trace substitution. Blocks: instruction cache, trace cache, MBP, MUX, fill unit. Labels: fetch address, select, hit, MRU hit, MRU, hit/miss, pred. trace, pred. insn, instructions. Legend: pred. path, fetch address, next address, instructions.]
9
Evaluation Setup
Benchmarks
– SPECint95 (except compress, go), reference inputs
– 500 million instructions from start of program
– Compiled for Alpha ISA, Compaq C compiler, -O4
Fetch Unit
– TC: 1 trace = 16 instructions, 3 cond. branches; trace ends at a system call or indirect jump
– TC: 4-way set-assoc., path associativity
– MBP: MGAg, varying history length
– Instruction cache: 32K, 2-way, 32-byte blocks, LRU
Metric
– FIPA = fetched instructions per fetch unit access
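The trace limits and the FIPA metric can be illustrated with a short sketch. It assumes that 16 instructions and 3 conditional branches are upper bounds, that a trace closes right after the instruction that reaches a limit, and that instructions are given as hypothetical kind strings (`"alu"`, `"cond_branch"`, `"syscall"`, `"indirect_jump"`); none of these details are confirmed by the slides.

```python
from typing import Iterable, List

MAX_INSNS = 16          # instructions per trace
MAX_COND_BRANCHES = 3   # conditional branches per trace
TRACE_ENDERS = ("syscall", "indirect_jump")


def build_traces(kinds: Iterable[str]) -> List[List[str]]:
    """Split a dynamic instruction stream into traces (assumed fill rules)."""
    traces: List[List[str]] = []
    current: List[str] = []
    branches = 0
    for kind in kinds:
        current.append(kind)
        if kind == "cond_branch":
            branches += 1
        # Close the trace when a size limit is reached or a trace-ending
        # instruction was just added.
        if (len(current) == MAX_INSNS
                or branches == MAX_COND_BRANCHES
                or kind in TRACE_ENDERS):
            traces.append(current)
            current, branches = [], 0
    if current:
        traces.append(current)  # partial trace at the end of the stream
    return traces


def fipa(fetched_per_access: Iterable[int]) -> float:
    """FIPA = fetched instructions per fetch unit access."""
    sizes = list(fetched_per_access)
    return sum(sizes) / len(sizes) if sizes else 0.0


# 20 straight-line instructions form one full trace and one partial trace.
lengths = [len(t) for t in build_traces(["alu"] * 20)]
average = fipa(lengths)
```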
10
Evaluation (1)
Observations:
– Gap between MGAg and a perfect predictor increases with TC size
– 20-40% of the gap is filled by trace substitution
– Substitution applies only on a TC miss, so the performance increase drops with TC size
[Figure configuration: TC: 4-way set-associative; MGAg: 12-bit history]
11
Evaluation (2)
Observations:
– Compensates for a poor branch predictor
– No history + substitution ~ MGAg with 10-bit history
– Improvement drops with a more accurate predictor
[Figure configuration: TC: 256 traces, 4-way]
12
Accuracy vs. Usage
Definitions:
– Usage = substitutions per fetch unit access
– Accuracy = fraction of substitutions that are correct
Note
– Accuracy is limited because the correct-path trace is not always present!
[Figure configuration: TC: 256 traces, 4-way]
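The two definitions translate directly into ratios of event counts. A minimal sketch, assuming a simulator exposes three hypothetical counters (total fetch unit accesses, substitutions performed, substitutions that fetched the correct path):

```python
from typing import Tuple


def substitution_stats(accesses: int, substitutions: int, correct: int) -> Tuple[float, float]:
    """Return (usage, accuracy) from raw counters (assumed to be available).

    usage    = substitutions per fetch unit access
    accuracy = fraction of substitutions that fetched the correct path
    """
    usage = substitutions / accesses if accesses else 0.0
    accuracy = correct / substitutions if substitutions else 0.0
    return usage, accuracy


# Made-up counts: 1000 accesses, 50 substitutions, 20 of them correct.
usage, accuracy = substitution_stats(accesses=1000, substitutions=50, correct=20)
```

For the made-up counts, usage is 0.05 substitutions per access and accuracy is 0.4.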
13
Conclusion
Proposed trace substitution
– TC miss flags MBP miss
    Not always correct, not all MBP misses found
    Fetch MRU trace instead: cheap implementation
Results in
– Consistent performance improvement
    No history + substitution ~ MGAg with 10-bit history
    In other cases: 0.2 instructions/access gained, or the same performance with a 16 times smaller MBP
Most effective when MBP or TC is small