Phase based adaptive Branch predictor: Seeing the forest for the trees

Presentation transcript:

Phase based adaptive Branch predictor: Seeing the forest for the trees. Karthik Jayaraman, Vivek Shrivastava, Brian Pellin, Martin Hock, Mikko H. Lipasti. University of Wisconsin-Madison.

Motivation: Understanding and exploiting dynamic program behavior is more powerful than static techniques. A program's control flow is a dynamic behavior: it executes in "phases" or "repeated patterns".

Motivation: Programs go through different phases of execution. A phase is a period of execution that exhibits relatively stable program characteristics. Phases are often repeated at different times in execution. During each phase the hardware is exercised differently, so hardware requirements may vary per phase.

Motivation: Microprocessors are designed to provide good average performance, which is inefficient for individual programs and for the various phases within the same program. The alternative is to configure microarchitecture features dynamically, using reconfigurable hardware to take advantage of phase information. Examples: reconfigurable caches, instruction window size, dynamic branch predictor.

Motivation: Dynamic reconfiguration algorithms detect the current phase of program execution and tune the reconfigurable hardware for that phase. Portions of configurable units can be turned on or off depending on the specific requirements of the phase.

Sample Phase Behavior : gcc

Outline: Phase Tracking; Phase Prediction; Phase Based Branch Prediction; Experiments and Results; Conclusions; Future Work

Phase Tracking: The goal is to identify program phases with different behavior. Based on "Phase Tracking and Prediction" [Sherwood, Sair, Calder]. Track groups of 10 million instructions. Collect information about the instructions and store it to build a phase footprint. After each 10 million instructions, compare the footprint with past footprints. If the footprint is close enough, it is considered a repetition of the phase.

Accumulator (diagram): The branch PC is hashed to select an accumulator entry, and the number of instructions since the last branch is added to that entry.

Accumulator (example): A branch occurs whose PC hashes to entry 2, with 20 instructions since the last branch, so entry 2 must be incremented by 20.

Accumulator (example): A new branch hashes to entry 3, with 80 instructions since the last branch, so entry 3 is incremented by 80.

Accumulator (example): The accumulator now holds 20 and 80. After a phase completes we need somewhere to store data about previous phases.

Past Footprint Table (at 100 instructions): The accumulator data (20, 80) is stored in the Past Footprint table.

Past Footprint Table (at 200 instructions): The accumulator holds 90, 0, 5, 5 and the stored footprint holds 0, 20, 80, 0. Take the Manhattan distance between the accumulator and the past footprints: 90 + 20 + 75 + 5 = 190. This is too far from the stored footprint, so the accumulator is stored as a new footprint.

Past Footprint Table (at 300 instructions): The accumulator holds 21 and 79 against a stored footprint of 20 and 80. The Manhattan distance between this phase and the first phase is 2. This phase is close enough to the first phase to be considered the same as phase one.

Past Footprint Table (diagram, at 30 million instructions): accumulator and Past Footprint table.
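The accumulator and footprint comparison walked through in these slides can be sketched in a few lines of Python. This is a minimal sketch, not the authors' hardware design: the modulo hash, the four-entry accumulator, the `PhaseTracker` name and the threshold value are all assumptions chosen to reproduce the toy numbers on the slides (entries 2 and 3 on the slides correspond to zero-based indices 1 and 2 here).

```python
from typing import List

NUM_BUCKETS = 4  # assumed toy size; matches the four entries in the slide example


def bucket_for(branch_pc: int) -> int:
    """Hypothetical hash: the slides do not say which hash function is used."""
    return branch_pc % NUM_BUCKETS


def manhattan(a: List[int], b: List[int]) -> int:
    """Sum of absolute element-wise differences between two footprints."""
    return sum(abs(x - y) for x, y in zip(a, b))


class PhaseTracker:
    def __init__(self, threshold: int):
        self.threshold = threshold              # assumed "close enough" cutoff
        self.accumulator = [0] * NUM_BUCKETS
        self.past_footprints: List[List[int]] = []

    def on_branch(self, branch_pc: int, insts_since_last_branch: int) -> None:
        # Add the instruction count to the entry selected by the hashed branch PC.
        self.accumulator[bucket_for(branch_pc)] += insts_since_last_branch

    def end_interval(self) -> int:
        """Close the interval and return the phase ID of its footprint."""
        footprint, self.accumulator = self.accumulator, [0] * NUM_BUCKETS
        for phase_id, past in enumerate(self.past_footprints):
            if manhattan(footprint, past) <= self.threshold:
                return phase_id                 # repetition of a known phase
        self.past_footprints.append(footprint)  # otherwise record a new phase
        return len(self.past_footprints) - 1
```

Feeding in the slide example (20 and 80, then 90, 5 and 5, then 21 and 79) yields phase IDs 0, 1 and 0 with a threshold of 10, since the distances involved are 190 and 2.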

Phase prediction: To adjust hardware, we need to know in advance what phase we will be in. Three strategies: last seen, Markov with RLE, perceptron.

Last seen: Predict next phase = last phase. Because last seen is so simple, another predictor would have to beat it significantly to justify the added cost.

RLE Markov: Adapted from Sherwood. Assumes that if we see phase X exactly Y times in a row, followed by phase Z, then the next time we see phase X exactly Y times in a row, it will again be followed by Z.
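A run-length-encoded Markov predictor of this kind could look roughly like the sketch below. The class name, the dictionary-based table and the fallback to the last seen phase when no entry matches are assumptions made for illustration; the slides do not describe the table organization.

```python
from typing import Dict, Optional, Tuple


class RLEMarkovPredictor:
    def __init__(self) -> None:
        # Maps (phase X, run length Y) to the phase Z that followed last time.
        self.table: Dict[Tuple[int, int], int] = {}
        self.current: Optional[int] = None
        self.run = 0

    def predict(self) -> Optional[int]:
        # Assumed fallback: with no matching entry, predict the last seen phase.
        return self.table.get((self.current, self.run), self.current)

    def update(self, phase: int) -> None:
        if phase == self.current:
            self.run += 1                       # the run of the same phase continues
            return
        if self.current is not None:
            # The run of `current` ended after `run` intervals and led to `phase`.
            self.table[(self.current, self.run)] = phase
        self.current = phase
        self.run = 1
```

For example, after observing phases 0, 0, 0, 1 and then 0, 0, 0 again, `predict()` returns 1, because phase 0 seen exactly three times in a row was previously followed by phase 1.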

Perceptron: Individual perceptrons work in binary (±1). Compute S as a function of the history. If S ≥ 0, predict "yes", else predict "no". Train by adjusting the weights for the different components of the history. But there are many phases, not just two, so perceptrons are combined for multivalue prediction.

Multivalue perceptron: We have perceptrons P1, P2, …, Pn. Perceptron Pi tries to predict phase i. Train Pi to compute Si only if in phase i. The perceptron with the maximum value above a certain threshold wins.
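One way to combine binary perceptrons into a multivalue phase predictor is sketched below. Many details here are assumptions, since the slides leave them open: the ±1 input encoding (whether each recent interval was in phase i), the training rule (the usual perceptron update applied to every Pi after each interval), the `train_margin` parameter, and the fallback to the last seen phase when no score clears the threshold.

```python
from typing import List


class MultiPhasePerceptron:
    def __init__(self, num_phases: int, history_len: int = 2,
                 threshold: int = 0, train_margin: int = 2):
        self.num_phases = num_phases
        self.threshold = threshold          # minimum winning score (assumed value)
        self.train_margin = train_margin    # keep training while |S| is this small
        # weights[i][0] is the bias of P_i; the rest pair with history slots.
        self.weights = [[0] * (history_len + 1) for _ in range(num_phases)]
        self.history: List[int] = [0] * history_len  # oldest first

    def _inputs(self, i: int) -> List[int]:
        # Assumed encoding: +1 if that past interval was in phase i, else -1.
        return [1] + [1 if p == i else -1 for p in self.history]

    def score(self, i: int) -> int:
        return sum(w * x for w, x in zip(self.weights[i], self._inputs(i)))

    def predict(self) -> int:
        scores = [self.score(i) for i in range(self.num_phases)]
        best = max(range(self.num_phases), key=lambda i: scores[i])
        if scores[best] > self.threshold:
            return best                     # maximum score above threshold wins
        return self.history[-1]             # assumed fallback: last seen phase

    def update(self, actual: int) -> None:
        for i in range(self.num_phases):
            s = self.score(i)
            target = 1 if actual == i else -1
            predicted = 1 if s >= 0 else -1
            if predicted != target or abs(s) <= self.train_margin:
                x = self._inputs(i)
                self.weights[i] = [w + target * xi
                                   for w, xi in zip(self.weights[i], x)]
        self.history = self.history[1:] + [actual]
```

Because every perceptron has to be trained after each interval, a design along these lines takes several intervals to settle after a change in behavior.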

Phase prediction results (gcc): Last phase: 96% accurate. RLE Markov: 94% accurate. Perceptron: much lower.

Phase prediction comments: The training cost of multiple perceptrons means that the perceptron predictor does not always adapt quickly. It was not worth improving, given the accuracy of last phase.

Phase Based Dynamic Branch Predictor: Previous research shows the usefulness of adapting branch predictors at run time: "Dynamic history-length fitting: a third level of adaptivity for branch prediction" [Juan, Sanjeevan, Navarro] and "Combining Branch Predictors" [McFarling]. A single branch predictor may not perform well within and across different executions: "A Study of Branch Prediction Strategies" [Smith]. Program behavior is almost uniform within a phase, so choose the best predictor for each phase.

Methodology: Select a small group of relevant predictors. At the beginning of each new phase, sample all the predictors and choose the best. Save the best for each phase and use it if a phase recurs.
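The sample-once, reuse-on-recurrence idea in this methodology can be sketched as a small lookup table. The names `PhasePredictorTable` and `sample_miss_rate` are hypothetical; `sample_miss_rate` stands in for whatever measurement is taken during the sampling period, and "best" is assumed here to mean the lowest sampled miss rate.

```python
from typing import Callable, Dict, List


class PhasePredictorTable:
    def __init__(self, candidates: List[str],
                 sample_miss_rate: Callable[[str], float]):
        self.candidates = candidates
        self.sample_miss_rate = sample_miss_rate  # hypothetical sampling hook
        self.best_for_phase: Dict[int, str] = {}

    def predictor_for(self, phase_id: int) -> str:
        """Return the saved predictor for a phase, sampling only on first sight."""
        if phase_id not in self.best_for_phase:
            # New phase: try every candidate and remember the best one.
            self.best_for_phase[phase_id] = min(self.candidates,
                                                key=self.sample_miss_rate)
        return self.best_for_phase[phase_id]
```

When a phase recurs, the lookup returns the saved choice and no predictor is sampled again.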

Multiple Branch Predictors: Set of predictors: 2level [1:1024:8] (baseline predictor), bimodal [1024], 2level [8:512:8], 2level [1:512:8]. Profiling period: 10 million instructions.

Multiple Branch Predictors: Simulator used: SimpleScalar v3.0d. Set of benchmarks: gcc, vpr, mcf. Selection criterion: least miss rate. If the miss rates of two predictors are within 1%, select the less expensive (simpler) one.
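The selection criterion can be written as a short function. This is a sketch under two assumptions that the slide does not settle: "within 1%" is read as an absolute difference of 0.01 in miss rate from the best predictor, and the cost ordering of the predictors is supplied by the caller (the ordering in the usage example is illustrative only).

```python
from typing import Dict, List


def select_predictor(miss_rates: Dict[str, float],
                     cost_order: List[str],
                     tolerance: float = 0.01) -> str:
    """Pick the cheapest predictor whose miss rate is within `tolerance` of the best.

    `cost_order` lists predictor names from least to most expensive.
    """
    best = min(miss_rates.values())
    for name in cost_order:
        if name in miss_rates and miss_rates[name] - best <= tolerance:
            return name
    raise ValueError("cost_order does not cover any measured predictor")


# Usage example with made-up miss rates and an assumed cost ordering.
cost_order = ["bimodal", "2level-1:512:8", "2level-1:1024:8", "2level-8:512:8"]
chosen = select_predictor({"bimodal": 0.058, "2level-1:1024:8": 0.052}, cost_order)
```

In the usage example the two miss rates differ by less than 0.01, so the simpler bimodal predictor is chosen even though its miss rate is slightly higher.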

Multiple Branch Predictors: Results, IPC (gcc)

Multiple Branch Predictors: Results, Branch Predictor Misses (gcc)

Multiple Branch Predictors: Results, Branch Predictor Misses (mcf)

Multiple Branch Predictors: IPC Comparison

Multiple Branch Predictors: Branch Prediction Misses Comparison

Summary: Significant reduction in branch mispredictions (29.88% - 44.35%) using phase-based branch predictors. Simple predictors beat the more complex predictor in many phases. Marginal gains in IPC using multiple branch predictors (2.24% - 4.70%).

Conclusions: Phase-based optimizations provide scope for improvement using reconfigurable hardware. Using a phase-specific branch predictor provides good improvements in mispredictions. It is also a good strategy for saving power because of the significant reduction in mispredictions.

Future Work: Investigate in detail the impact of design parameters: the phase detection threshold and the impact of hash functions. Investigate the perceptron model in detail. Investigate more benchmarks. Compare against more complex baseline predictors: the McFarling branch predictor and other hybrid predictors. Measure power characteristics directly: measure power using simulators like Wattch, including the power consumed while switching branch predictors.

Thank You

Questions?

Multiple Branch Predictors: Results, IPC (vpr)

Multiple Branch Predictors: Results, Branch Predictor Misses (vpr)

Phase prediction comments: Sherwood had lower accuracy for last phase (70%), perhaps due to oscillation. The training cost of multiple perceptrons means that the perceptron predictor does not always adapt quickly. It was not worth improving, given the accuracy of last phase.