Better Branch Prediction Through Prophet/Critic Hybrids A. Falcón, J. Stark, A. Ramirez, K. Lai, M. Valero Paper Presentation and Discussion.


Processor Design Objectives
- Higher performance
- Lower power
- Lower energy

Better Branch Prediction
- Increases performance through less time spent speculating on mispredicted paths
- Reduces power through lower processor frequency
- Reduces energy consumption through less work wasted on misspeculation

Branch Predictors = Taxi Drivers!
- Taxi = Processor
- Driver = Branch Predictor
- Passenger = Pipeline
- Intersections = Control Branches

Branch Predictors = Taxi Drivers!
- Wrong turns waste the passenger's time
- Therefore we need fewer mispredictions, i.e. a lower misprediction rate

Prophet/Critic Hybrid Predictors
- The driver is the Prophet
- A co-driver in the back seat of the taxi is the Critic
- The Critic waits until sure that they are lost (branch misprediction)
- The Critic points out the mistake, and they backtrack to the wrongly taken intersection (branch)

Related Techniques
- McFarling first proposed two component predictors and a selection mechanism
- Jiménez et al. proposed two predictors that differ in accuracy, size, and latency
- Grunwald et al. show that using a confidence estimator history register improves speculation control; it uses one future bit

Prophet/Critic Better!
- No need for a selection mechanism
- The Critic uses branch future bits
- The Prophet and Critic operate autonomously, predicting the same branch at different times, which greatly improves accuracy
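
The idea of future bits can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the names `critic_history`, `Critic`, and `final_prediction` are hypothetical, the critic here is an unfiltered dictionary of 2-bit counters, and the sketch assumes that the first future bit is the Prophet's prediction for the branch being critiqued while later future bits are its predictions for the branches that follow.

```python
def critic_history(past_outcomes, prophet_predictions, n_future):
    """Join resolved past outcomes with the prophet's newest predictions.

    past_outcomes: bools, oldest first (already resolved branches).
    prophet_predictions: bools, oldest first; element 0 is the prophet's
        prediction for the branch being critiqued (assumption of this sketch).
    n_future: number of future bits the critic waits for.
    """
    if len(prophet_predictions) < n_future:
        return None  # not enough future bits yet: the critic must wait
    return tuple(past_outcomes) + tuple(prophet_predictions[:n_future])


class Critic:
    """Hypothetical unfiltered critic: 2-bit counters keyed by (pc, history)."""

    def __init__(self):
        self.counters = {}

    def predict(self, pc, history, prophet_pred):
        ctr = self.counters.get((pc, history))
        if ctr is None:
            return prophet_pred  # untrained entry: side with the prophet
        return ctr >= 2

    def train(self, pc, history, taken):
        # New entries start weakly biased toward the observed outcome.
        ctr = self.counters.get((pc, history), 2 if taken else 1)
        ctr = min(3, ctr + 1) if taken else max(0, ctr - 1)
        self.counters[(pc, history)] = ctr


def final_prediction(critic, pc, past, prophet_preds, n_future):
    """Return (prediction, overridden); prophet_preds must be non-empty."""
    history = critic_history(past, prophet_preds, n_future)
    if history is None:
        return prophet_preds[0], False
    critic_pred = critic.predict(pc, history, prophet_preds[0])
    return critic_pred, critic_pred != prophet_preds[0]
```

When `overridden` is true, a real front end would also have to discard the Prophet's later predictions and redirect fetch, which this sketch leaves out.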

Prophet/Critic Hybrid Structure

Decoupled Front-End Architecture

Filtering the Critic
- Sounds a lot like a cache!
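
The cache analogy can be made concrete with a small Python sketch. It assumes, as an illustration rather than as the slides' exact design, that the filtered critic is a direct-mapped tagged table, that a tag miss means the Critic implicitly agrees with the Prophet, and that entries are allocated only when the Prophet mispredicts. The class name `FilteredCritic`, the table sizes, and the XOR hash are all hypothetical.

```python
class FilteredCritic:
    """Hypothetical tagged critic table, organised like a direct-mapped cache."""

    def __init__(self, index_bits=10, tag_bits=8):
        self.index_bits = index_bits
        self.index_mask = (1 << index_bits) - 1
        self.tag_mask = (1 << tag_bits) - 1
        # Each slot is None or a [tag, 2-bit counter] pair.
        self.entries = [None] * (1 << index_bits)

    def _locate(self, pc, history):
        key = pc ^ history  # gshare-style hash of address and history bits
        index = key & self.index_mask
        tag = (key >> self.index_bits) & self.tag_mask
        return index, tag

    def critique(self, pc, history, prophet_pred):
        index, tag = self._locate(pc, history)
        entry = self.entries[index]
        if entry is None or entry[0] != tag:
            return prophet_pred  # tag miss: implicitly agree with the prophet
        return entry[1] >= 2

    def update(self, pc, history, prophet_pred, taken):
        index, tag = self._locate(pc, history)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            entry[1] = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)
        elif prophet_pred != taken:
            # Allocate (possibly evicting) only when the prophet was wrong.
            self.entries[index] = [tag, 2 if taken else 1]
```

Here `history` is an integer holding the critic's history bits. Because correct Prophet predictions never allocate an entry, the table only spends space on branches the Prophet gets wrong.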

Theory Behind P/C Hybrids
- The Prophet is like a stream-mode compressor encoder
- The Critic uses past and future probabilities, approximating a Markov model

Simulation Tools
- An enhanced version of the Intel P4
- Gshare, 2Bc-gskew, and Perceptron for the Prophet
- Tagged Gshare for the Critic
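
For readers unfamiliar with gshare, a minimal Python sketch of the standard scheme follows: the branch address is XORed with a global history register to index a table of 2-bit saturating counters. The class name `Gshare`, the table size, and the initial counter value are illustrative assumptions, not the configuration simulated in the paper.

```python
class Gshare:
    """Minimal gshare predictor: (pc XOR global history) indexes 2-bit counters."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not-taken
        self.history = 0  # global history, newest outcome in the low bit

    def predict(self, pc):
        return self.counters[(pc ^ self.history) & self.mask] >= 2

    def update(self, pc, taken):
        index = (pc ^ self.history) & self.mask
        ctr = self.counters[index]
        self.counters[index] = min(3, ctr + 1) if taken else max(0, ctr - 1)
        # Shift the resolved outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Usage: an always-taken branch is learned after a short warm-up.
predictor = Gshare(index_bits=4)
mispredictions = 0
for _ in range(40):
    if not predictor.predict(0x8):
        mispredictions += 1
    predictor.update(0x8, True)
```

The warm-up cost comes from the history register: each new history value maps the same branch to a fresh counter until the history saturates.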

Simulation Results (108 Benchmarks)
- On average, 0-12 future bits → 35% fewer mispredictions
- Adding just the first future bit → 15% fewer mispredictions

More Results
- The Prophet/Critic hybrid was the same size as the Prophet alone, but with a 25%-31% lower misprediction rate

Processor Performance
- With 4 bits → 4.7% speedup
- With 12 bits → 8% speedup
- Intel P4 with 8.6% reduced energy

Our turn to critique
- Needs fast hardware to compute predictions/mispredictions and refill the FTQ before branches are consumed by the I-cache
- Large tag → small coverage, small tag → contention; therefore it is not universal
- How to select which branches to cover in the filtered critic?