Indian Institute of Technology Kanpur

Slides:



Advertisements
Similar presentations
Dynamic History-Length Fitting: A third level of adaptivity for branch prediction Toni Juan Sanji Sanjeevan Juan J. Navarro Department of Computer Architecture.
Advertisements

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
Pipelining and Control Hazards Oct
André Seznec Caps Team IRISA/INRIA 1 The O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)
WCED: June 7, 2003 Matt Ramsay, Chris Feucht, & Mikko Lipasti University of Wisconsin-MadisonSlide 1 of 26 Exploring Efficient SMT Branch Predictor Design.
Perceptron-based Global Confidence Estimation for Value Prediction Master’s Thesis Michael Black June 26, 2003.
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
VLSI Project Neural Networks based Branch Prediction Alexander ZlotnikMarcel Apfelbaum Supervised by: Michael Behar, Spring 2005.
Branch Target Buffers BPB: Tag + Prediction
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
Computer Architecture Instruction Level Parallelism Dr. Esam Al-Qaralleh.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )
Branch Prediction Dimitris Karteris Rafael Pasvantidιs.
Dynamic Branch Prediction
1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )
CS 7810 Lecture 6 The Impact of Delay on the Design of Branch Predictors D.A. Jimenez, S.W. Keckler, C. Lin Proceedings of MICRO
Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps José A. Joao *‡ Onur Mutlu ‡* Hyesoon Kim § Rishi Agarwal.
1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA.
Revisiting Load Value Speculation:
A STUDY OF BRANCH PREDICTION STRATEGIES JAMES E.SMITH Presented By: Prasanth Kanakadandi.
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
1 A New Case for the TAGE Predictor André Seznec INRIA/IRISA.
1 Revisiting the perceptron predictor André Seznec IRISA/ INRIA.
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Compiler.
CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Pipelining Basics.
André Seznec Caps Team IRISA/INRIA 1 A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC.
Prophet/Critic Hybrid Branch Prediction B B B
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
Dynamic Branch Prediction
CSL718 : Pipelined Processors
Lecture: Out-of-order Processors
Data Prefetching Smruti R. Sarangi.
CS203 – Advanced Computer Architecture
Dynamic Branch Prediction
Dynamically Sizing the TAGE Branch Predictor
Exploring Value Prediction with the EVES predictor
Lecture 6: Advanced Pipelines
Welcome to the 1st Championship Value Prediction (CVP) Workshop
So far we have dealt with control hazards in instruction pipelines by:
Lecture: Static ILP, Branch Prediction
Address-Value Delta (AVD) Prediction
Lecture: Branch Prediction
Could you explain _____ in another way?
Dynamic Branch Prediction
So far we have dealt with control hazards in instruction pipelines by:
Lecture 10: Branch Prediction and Instruction Delivery
Data Prefetching Smruti R. Sarangi.
TAGE-SC-L Again MTAGE-SC
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
Adapted from the slides of Prof
Dynamic Hardware Prediction
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
Wackiness Algorithm A: Algorithm B:
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
rePLay: A Hardware Framework for Dynamic Optimization
So far we have dealt with control hazards in instruction pipelines by:
Lois Orosa, Rodolfo Azevedo and Onur Mutlu
The O-GEHL branch predictor
Multi-Lookahead Offset Prefetching
Project Guidelines Prof. Eric Rotenberg.
Lecture 7: Branch Prediction, Dynamic ILP
Presentation transcript:

Indian Institute of Technology Kanpur DFCM++: Augmenting DFCM with Early Update and Data Dependency-driven Value Estimation Nayan Deshmukh*, Snehil Verma*, Prakhar Agrawal*, Biswabandan Panda, Mainak Chaudhuri Indian Institute of Technology Kanpur CVP1, ISCA 2018

Introduction True data dependencies cause frequent stalls Value prediction breaks true data dependencies to extract more ILP What is Value prediction? Talk about the potential of value prediction

Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Outline Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Questions?

State-of-the-art VTAGE: Best performance among existing predictors Started with VTAGE as the base predictor Unable to improve performance significantly The number of correct prediction made by VTAGE were significantly less as compared to other predictors Do we need the principle of VTAGE

State-of-the-art Increasing correct predictions much harder than reducing incorrect predictions Set VTAGE here

State-of-the-art DFCM as base predictor: Capable of learning any given random pattern Based on the principle that if two PCs have the same context (same last N values) then there next value will be the same Lower performance due to the mispredictions

Outline Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Questions?

Use the predicted value DFCM a1, a2, a3, a4, a5 Strides: a2-a1, a3-a2,... History Table Prediction Table Base Value Hash of last N strides Stride Confidence . . ① ② PC . . Saturated ? ADD Use the predicted value Add a diagram to explain strides ③ Predicted Value ③

Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Outline Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Questions?

DFCM++ We make 4 improvements to DFCM: Early Update Value estimator PC blacklisting Dynamic context length

Early Update Update to the predictor happens at the commit stage when the output of the instruction is available Should we explain back-to-back instructions? Maybe using firstlevel table instead of SHT might be easier for people Add animation to show incorrect prediction

Early Update Values attained by PC0: 6 8 10 12 14 16 History Table Prediction Table continuous Base Value Hash of last N values Stride Confidence . . ① ② PC0 10 0000100 2 3 . . ADD ③ Predicted Value 12

Early Update Values attained by PC0: 6 8 10 12 14 16 History Table Prediction Table continuous Base Value Hash of last N values Stride Confidence . . Note that, the entries aren’t updated as the PC0 haven’t committed yet! ① ② PC0 10 0000100 2 3 . . ADD ③ Predicted Value 12

We often mispredict inflight instructions Early Update Values attained by PC0: 6 8 10 12 14 16 History Table Prediction Table continuous Base Value Hash of last N values Stride Confidence . . We often mispredict inflight instructions After the first prediction the History table is rendered inconsistent, leads to incorrect predictions ① ② PC0 10 0000100 2 3 . . ADD ③ Predicted Value 12

Early Update We add a predictive state for each of the History table entries Have animation to show adding of fields in table

Predicted Hash of last N strides Early Update Values attained by PC0: 6 8 10 12 14 16 continuous History Table Prediction Table Base Value Predicted Base Value Hash of last N strides Predicted Hash of last N strides Stride Confidence . . ① ② PC0 10 10 0000100 0000100 2 3 . . ADD ③ Predicted Value 12

Predicted Hash of last N strides Early Update Values attained by PC0: 6 8 10 12 14 16 continuous History Table Prediction Table Base Value Predicted Base Value Hash of last N strides Predicted Hash of last N strides Stride Confidence . . ① ② PC0 10 12 0000100 0000100 2 3 . . ADD ③ Predicted Value Luckily the hash function didn’t change! 14

Predicted Hash of last N strides Early Update Values attained by PC0: 6 8 10 12 14 16 continuous History Table Prediction Table Base Value Predicted Base Value Hash of last N strides Predicted Hash of last N strides Stride Confidence . . ① ② PC0 10 12 0000100 0000100 2 3 . . ADD ③ Predicted Value Luckily the hash function didn’t change! 14

Early Update At the commit stage we update the hash of last N strides and the base value If there are no inflight instruction for the given PC then we restore the predictive fields (index and base value) to the commit fields

Value Estimator Inspired by Early Execution in EOLE Include Early Execution as part predictor itself Highly inefficient implementation due to rigidity of CVP framework

Value Estimator No estimation PCIT N ① Y PC Estimate ③ RFB Y ⑤ Valid? src dst OP Available? Calculate ② ⑤ ③ RFB Y ⑤ 1 2 . . . ④ Valid? N ⑥

PC Blacklister Performance is very sensitive to incorrect predictions S.No Baseline IPC Correct pred Incorrect pred IPC 1. 1.21 26.5% 0.9% 1.13 2. 1.98 5.1% 0.3% 1.95 Performance is very sensitive to incorrect predictions We observed that some instructions showed a specific behavior which leads to incorrect predictions Unable to add arrow in linux Add to specify that green once are correct predictions, red are incorrect , patch in no prediction but shows increase of confidence Correct Predictions Incorrect Predictions Increasing confidence

PC Blacklister A very small fraction of PCs showed this behavior This could be avoided by increasing the confidence threshold But that reduces the overall number of correct prediction and reduces performance That why we blacklist the PCs that show this behavior

Dynamic Context Length Context length refers to the number of values used to create hash in the History table Trace Baseline IPC IPC (with context length) 64 32 16 fp_9 7.95 10.46 11.59 10.99 int_19 5.95 6.36 5.96 5.77 srv_34 1.36 1.53 1.68 1.98 Depending on the program behavior it gives better performance with different context length

Dynamic Context Length We maintain different context length in each of the history table entry And different prediction tables for each of the context length Use a simple meta predictor to decide between the context length Better do it via animation

DFCM++ Order 64 DFCM Order 32 DFCM Value Estimator Preference Counters ② ② Selector ① PC Mux ③ . Mux ④ PC Blacklisted ? Yes – Don’t use the predicted value No - Use the predicted value ⑤

DFCM++ provide a performance improvement of 28.1% Add a stats graph showing distribution of the improvements

Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Outline Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Questions?

Future work As is now Value Estimator is unrealistic for today’s processors We are exploring ways to better accommodate it in existing architecture Study the behavior of these improvements for the limited storage case

Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Outline Introduction State-of-the-art DFCM DFCM to DFCM++ Future work Questions?

Questions?

Thanks to Microsoft Research India and Research-I foundation for the financial support to attend and present the paper at the CVP-1. Thank You