Exploring Value Prediction with the EVES predictor

Slides:



Advertisements
Similar presentations
André Seznec Caps Team IRISA/INRIA 1 Looking for limits in branch prediction with the GTL predictor André Seznec IRISA/INRIA/HIPEAC.
Advertisements

H-Pattern: A Hybrid Pattern Based Dynamic Branch Predictor with Performance Based Adaptation Samir Otiv Second Year Undergraduate Kaushik Garikipati Second.
1 A Hybrid Adaptive Feedback Based Prefetcher Santhosh Verma, David Koppelman and Lu Peng Louisiana State University.
Federation: Repurposing Scalar Cores for Out- of-Order Instruction Issue David Tarjan*, Michael Boyer, and Kevin Skadron* University of Virginia Department.
Exploring Correlation for Indirect Branch Prediction 1 Nikunj Bhansali, Chintan Panirwala, Huiyang Zhou Department of Electrical and Computer Engineering.
André Seznec Caps Team IRISA/INRIA 1 The O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
UPC Microarchitectural Techniques to Exploit Repetitive Computations and Values Carlos Molina Clemente LECTURA DE TESIS, (Barcelona,14 de Diciembre de.
Yue Hu David M. Koppelman Lu Peng A Penalty-Sensitive Branch Predictor Department of Electrical and Computer Engineering Louisiana State University.
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
TAGE-SC-L Branch Predictors
1 Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections )
EECC722 - Shaaban #1 Lec # 10 Fall Conventional & Block-based Trace Caches In high performance superscalar processors the instruction fetch.
Trace Caches J. Nelson Amaral. Difficulties to Instruction Fetching Where to fetch the next instruction from? – Use branch prediction Sometimes there.
EECS 470 Superscalar Architectures and the Pentium 4 Lecture 12.
Address-Value Delta (AVD) Prediction Onur Mutlu Hyesoon Kim Yale N. Patt.
Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
EECC722 - Shaaban #1 Lec # 9 Fall Conventional & Block-based Trace Caches In high performance superscalar processors the instruction fetch.
1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA.
Revisiting Load Value Speculation:
1 Practical Selective Replay for Reduced-Tag Schedulers Dan Ernst and Todd Austin Advanced Computer Architecture Lab The University of Michigan June 8.
1 A 64 Kbytes ITTAGE indirect branch predictor André Seznec INRIA/IRISA.
1 Two research studies related to branch prediction and instruction sequencing André Seznec INRIA/IRISA.
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
1 A New Case for the TAGE Predictor André Seznec INRIA/IRISA.
1 Revisiting the perceptron predictor André Seznec IRISA/ INRIA.
Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Compiler.
Adaptive GPU Cache Bypassing Yingying Tian *, Sooraj Puthoor†, Joseph L. Greathouse†, Bradford M. Beckmann†, Daniel A. Jiménez * Texas A&M University *,
Reduction of Register File Power Consumption Approach: Value Lifetime Characteristics - Pradnyesh Gudadhe.
The life of an instruction in EV6 pipeline Constantinos Kourouyiannis.
André Seznec Caps Team IRISA/INRIA 1 A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
1 The Inner Most Loop Iteration counter a new dimension in branch history André Seznec, Joshua San Miguel, Jorge Albericio.
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
Value Prediction Kyaw Kyaw, Min Pan Final Project.
CSL718 : Pipelined Processors
Lecture: Out-of-order Processors
Data Prefetching Smruti R. Sarangi.
Multiscalar Processors
CS 704 Advanced Computer Architecture
5.2 Eleven Advanced Optimizations of Cache Performance
Smruti R. Sarangi Computer Science and Engineering, IIT Delhi
FA-TAGE Frequency Aware TAgged GEometric History Length Branch Predictor Boyu Zhang, Christopher Bodden, Dillon Skeehan ECE/CS 752 Advanced Computer Architecture.
Practical Value Speculation for Future High-End Processors
Arthur Perais & André Seznec
CMSC 611: Advanced Computer Architecture
Looking for limits in branch prediction with the GTL predictor
Milad Hashemi, Onur Mutlu, Yale N. Patt
Address-Value Delta (AVD) Prediction
Lecture on High Performance Processor Architecture (CS05162)
Lecture 17: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.
Ka-Ming Keung Swamy D Ponpandi
Indian Institute of Technology Kanpur
Smruti R. Sarangi Computer Science and Engineering, IIT Delhi
Lecture: Out-of-order Processors
Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)
Advanced Computer Architecture
Lecture 19: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.
* From AMD 1996 Publication #18522 Revision E
Data Prefetching Smruti R. Sarangi.
TAGE-SC-L Again MTAGE-SC
Dynamic Hardware Prediction
Lois Orosa, Rodolfo Azevedo and Onur Mutlu
The O-GEHL branch predictor
Conceptual execution on a processor which exploits ILP
Ka-Ming Keung Swamy D Ponpandi
Spring 2019 Prof. Eric Rotenberg
Project Guidelines Prof. Eric Rotenberg.
Presentation transcript:

Exploring Value Prediction with the EVES predictor André Seznec IRISA/INRIA EVES 14/11/2018

Remove Data depencies with Value Prediction [Lipasti96][Mendelson97] ILP = 1 ILP = 2 I1 I1 I4 I2 I2 I5 Predict result of I3 in advance I3 I3 I6 I4 Validate prediction when I3 executes I5 I6 t EVES 14/11/2018

Why ? Value Prediction Large body of research 96-02 Quite efficient: Surprisingly high number of predictable results Sometimes up to 90 % !! Not implemented so far: Why ? EVES 14/11/2018

The conventional (expert) wisdom on VP Complexity/power at any pipeline stage: Untractable critical path for prediction Untractable repair on misprediction Huge extra complexity in the execution core Extra register file ports for prediction writes and validations Extra data paths for validations EVES 14/11/2018

The conventional (expert) wisdom on VP Complexity/power at any pipeline stage: Untractable critical path for prediction Untractable repair on misprediction Huge extra complexity in the execution core Extra register file ports for prediction writes and validations Extra data paths for validations 2013-2015 That’s not true !! EVES 14/11/2018

VTAGE predictor: high coverage, no critical path What we showed: VTAGE predictor: high coverage, no critical path Validation/repair can be delayed till commit only use high confidence predictions EOLE: VP enables to simplify the OoO core EVES 14/11/2018

Championship to promote VP ? I was skeptical: Framework: Independence from the microarchitecture? Prediction validation? Which metric? What can be learned? But I add to compete and did learn 😇 EVES 14/11/2018

The value predictors Two families of predictors: Context based predictors: -predict a value already been seen in a similar context PC: last value predictor PC+ value history: FCM predictor PC + branch history: VTAGE Computational predictors: -compute the value from previous occurrence The stride predictor + hybrid predictors D-FCM, D-VTAGE EVES 14/11/2018

EVES: Enhanced VTAGE Enhanced Stride Combines a context predictor and a computational predictor VTAGE based Stride based Eliminate null strides: Address distinct instructions EVES 14/11/2018

Conventional stride predictor High Prediction Retired Last Value High Low Low Speculative Window Prediction can be used only if it is high confidence If any low confidence prediction occurrence in the pipeline: low confidence EVES 14/11/2018

E-Stride predictor Confidence Inflight Val Stride Val + (Inflight+1) *Stride Speculative window Val/Stride table Update at commit PC Confidence Inflight = number of speculative occurrences of the instruction EVES 14/11/2018

VTAGE: global branch history pred use ghist[0,L(N)] VT1 VTN pred ctr pred ctr tag … =? hash pc ghist[0,L(1)] The rightmost matching component Predicts Sat? VT0 EVES 14/11/2018

What VTAGE is Really About Global history is non-speculative The result of the previous instance is not required: No speculative window. No prediction critical loop In our simulations (HPCA 2014), more efficient than FCM predictors EVES 14/11/2018

What is not that great with VTAGE Tagless entries on the base predictor: Inherited from ITTAGE, but VP is different: Prediction is not mandatory Wide entries: 64 bits + tag + confidence + replacement Most allocated entries never useful EVES 14/11/2018

E-VTAGE Tagged base predictor + associativity Do not store the 64-bit values, but a hash or a pointer Share the overall storage space among all logical tables EVES 14/11/2018

Values Increasing history length Hit or miss Hash/pointer tag confidence Values Increasing history length Validity EVES 14/11/2018

That’s all for the predictors layout !! But most of the benefits come from: Confidence management Insertion/replacement EVES 14/11/2018

All predictions are not equal !! Better predicting a load missing the LLC than a simple ALU ! need for less correct pred. to amortize a misp. Better predicting a large value than a null (or small) value EVES 14/11/2018

Update confidence on the E-stride predictor Extensive use of probabilistic confidence counters 5-bit High confidence if C>6 Decrease by 4 on mispredictions restart predicting immediately after most misp Increment probabilities by inst. types and latencies From 1 for a load LLC miss to 1 128 for an ALU op And less if the stride is small EVES 14/11/2018

Insert/Evict on the E-stride predictor Favor high reward instructions Evict useless entries Insert with probability 1 on a load LLC miss with probability 1 64 on a ALU op 48 entries capture the overall potential on the distributed traces EVES 14/11/2018

Update confidence on E-VTAGE Probabilistic confidence counters 3-bit High confidence if C==7 Decrease by 2 on misp. with C==7, reset otherwise Increment probabilities by inst. types, latencies and values From 1 for a load LLC miss to 1 64 for an ALU op Much less if the value is null or small EVES 14/11/2018

Insert/Evict on E-VTAGE Favor high reward instructions/value Evict useless entries Systematic allocation on a ”misprediction” (already medium confidence) On the 32KB predictor: Insert with probability 1 on a load LLC miss to probability 1 2048 on a ALU op with null result EVES 14/11/2018

All the probabilities were defined for the CVP framework Should be adapted for effective designs: Fetch/Issue width Instruction window size Instruction latency Memory hierarchy EVES 14/11/2018

Miscelleanous To avoid slowdowns: Safety nets to limit bursts of mispredictions EVES 14/11/2018

Performance of E-Stride 48-entry, 106 bits per entry ➡ 5,088 bits conventional stride: + 3.1 % 😖😖 + confidence and insertion prob.: + 3.7 % 😖 + E-Stride algorithm: +10.4 % 🤗😇 + smart confidence reset : +16.1% 🤗😇🤠😎 EVES 14/11/2018

Performance Analysis of E-Stride More than 10 % on 41 out of 135 traces Up to 1090 %, more than 100 % on 12 traces Very limited prediction coverage: 3.4 % (up to 56.7 %) Not that high accuracy: 99.3 % Predicting only loads: 15.8 % (1.3 % coverage) EVES 14/11/2018

Performance of E-VTAGE 8KB ➡ 1,504 entries, 384 64-bit values 7.0 % performance increase 14.5 % prediction coverage 32KB ➡ 6,272 entries, 1,536 64-bit values 11.2 % performance increase 23 % prediction coverage EVES 14/11/2018

Performance Analysis of 32KB E-VTAGE More than 10 % on 40 out of 135 traces (up to 149 %) Very high accuracy: 99.95 % ➡Less high reward predictions than E-Stride Predicting only loads: 10.4 %⬌11.2 % (17 % prediction coverage) Last Value Predictor: 6.7 % EVES 14/11/2018

Performance of EVES 8KB predictor 25.3% performance increase More than 20 % on 50 out of 135 traces Up to 1090 %, more than 100 % on 15 traces 32KB predictor 30.8% performance increase More than 20 % on 54 out of 135 traces Up to 1090 %, more than 100 % on 15 traces EVES 14/11/2018

Performance Analysis of EVES E-VTAGE and E-Stride do not predict the same instructions Performance benefits are multiplicative E-Stride is the most rewarding EVES 14/11/2018

Unlimited EVES predictor 37.3 % performance increase 49 % prediction coverage Could be augmented with other prediction schemes FCM, address prediction, D-FCM, D-VTAGE EVES 14/11/2018

Conclusion On a EOLE architecture with EVES, EVES combines 2 simple predictors: Limited access to speculative information No critical loop in prediction Differentiated confidence updating is key for performance High performance potential at limited storage budget On a EOLE architecture with EVES, 40 % of the instructions would not enter the OoO core EVES 14/11/2018