Correct Alignment of a RAS after Call and Return Mispredictions Ghent University Veerle Desmet Yiannakis Sazeides Constantinos Kourouyiannis Koen De Bosschere.

Correct Alignment of a RAS after Call and Return Mispredictions
Veerle Desmet, Yiannakis Sazeides, Constantinos Kourouyiannis, Koen De Bosschere
Ghent University, University of Cyprus

Workshop on Duplicating, Deconstructing, and Debunking (WDDD 2005), June 4
Motivation: the Return-Address-Stack (RAS)
1. Correct alignment
2. Effect of deeper pipelines
by Veerle Desmet, Ghent University

Return-Address-Stack (RAS)
1. Function calls push their return address onto the RAS.
2. Returns are predicted by popping the return address at the top of stack (TOS).
Example program:
main: my_printf(); my_function(); my_printf(); return
my_printf: for (condition) { printf() } return
printf: /* print */ return
my_function: /* fun */ return
While printf runs inside the first call to my_printf, the RAS holds, from bottom to TOS: return from main, return from my_printf, return from printf.
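The push/pop discipline above can be traced on the example program with a small Python model. This is a hypothetical sketch: the RAS is an unbounded list with no speculation, the call-site labels such as "main+0" are invented for illustration, and my_printf's loop is assumed to run exactly twice.

```python
# Hypothetical model of the example program: each function lists the calls it makes.
# my_printf's loop is assumed to run twice, so it calls printf two times.
PROGRAM = {
    "main": ["my_printf", "my_function", "my_printf"],
    "my_printf": ["printf", "printf"],
    "printf": [],
    "my_function": [],
}


class ReturnAddressStack:
    """Unbounded RAS sketch: calls push, returns pop."""

    def __init__(self):
        self.entries = []

    def push(self, return_address):
        self.entries.append(return_address)

    def pop(self):
        # Predict the return target from the top of stack (TOS).
        return self.entries.pop() if self.entries else None


def run(func, ras, log):
    """Execute func and record (callee, predicted, correct) for every return."""
    for i, callee in enumerate(PROGRAM[func]):
        return_address = f"{func}+{i}"  # hypothetical label for the call site
        ras.push(return_address)        # 1. the call pushes its return address
        run(callee, ras, log)           # the callee body runs (and may call further)
        predicted = ras.pop()           # 2. the callee's return pops the TOS
        log.append((callee, predicted, predicted == return_address))


log = []
run("main", ReturnAddressStack(), log)
for callee, predicted, correct in log:
    print(f"return from {callee}: predicted {predicted}, correct={correct}")
```

Because every call is matched by exactly one return and nothing executes speculatively, all seven returns in this trace are predicted correctly.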

Mispredictions
The RAS is updated speculatively at fetch, so calls on a wrong path update it before the misprediction is detected. A checkpoint of the RAS state is taken at fetch, and recovery from the misprediction restores that checkpoint.
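A minimal sketch of speculative RAS updates with checkpoint-based recovery, assuming a circular RAS whose checkpoint holds only the TOS pointer (one common design, not necessarily the exact one evaluated in the talk). The class name and return-address labels are hypothetical.

```python
class CircularRAS:
    """Fixed-size circular RAS; `tos` indexes the current top entry."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.tos = 0

    def push(self, return_address):
        # Calls update the RAS at fetch, even when they are on a wrong path.
        self.tos = (self.tos + 1) % self.size
        self.slots[self.tos] = return_address

    def pop(self):
        # Returns predict the TOS entry and move the pointer down.
        return_address = self.slots[self.tos]
        self.tos = (self.tos - 1) % self.size
        return return_address

    def checkpoint(self):
        # Only the TOS pointer is saved; entry contents are not.
        return self.tos

    def recover(self, saved_tos):
        self.tos = saved_tos


ras = CircularRAS()
ras.push("return from main")
ras.push("return from my_printf")
saved = ras.checkpoint()             # a conditional branch is fetched and predicted
ras.push("return from wrong path")   # a wrong-path call updates the RAS speculatively
ras.recover(saved)                   # misprediction detected: restore the checkpoint
print(ras.pop())                     # return from my_printf
```

The wrong-path push lands above the restored TOS, so after recovery the correct path sees the same stack it would have seen without speculation.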

Bottlenecks for RAS performance
1. De-alignment: unbalanced number of calls and returns
2. Corruption: RAS content overwritten by wrong-path calls
3. Overflow: call depth exceeds the stack size
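Corruption and overflow can both be reproduced on a small circular RAS. In this hypothetical sketch the checkpoint restores only the TOS pointer, and the stack sizes and labels ("A", "B", "W", "r0"...) are chosen purely for illustration.

```python
class CircularRAS:
    """Fixed-size circular RAS with a TOS pointer (hypothetical sketch)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.tos = 0

    def push(self, return_address):
        self.tos = (self.tos + 1) % self.size
        self.slots[self.tos] = return_address

    def pop(self):
        return_address = self.slots[self.tos]
        self.tos = (self.tos - 1) % self.size
        return return_address


def corruption_example():
    ras = CircularRAS(size=8)
    ras.push("A")
    ras.push("B")
    saved_tos = ras.tos   # checkpoint of the TOS pointer only
    ras.pop()             # wrong-path return
    ras.push("W")         # wrong-path call overwrites the slot that held "B"
    ras.tos = saved_tos   # recovery restores the pointer, not the content
    return ras.pop()      # predicts "W" instead of "B"


def overflow_example(depth, size=4):
    ras = CircularRAS(size)
    expected = [f"r{i}" for i in range(depth)]
    for address in expected:  # call depth exceeds the stack size
        ras.push(address)     # the oldest entries are overwritten (wrap-around)
    predicted = [ras.pop() for _ in range(depth)]
    return predicted, list(reversed(expected))


print(corruption_example())  # W
predicted, actual = overflow_example(depth=5)
print(predicted)  # ['r4', 'r3', 'r2', 'r1', 'r4']
print(actual)     # ['r4', 'r3', 'r2', 'r1', 'r0']
```

With five nested calls on a four-entry stack, the first four returns are still correct, but the outermost return is predicted from an entry that wrapped around.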

Correct alignment after conditional branch mispredictions
1. A conditional branch is mispredicted; the TOS is checkpointed.
2. The wrong path executes, e.g., one misspeculated call that pushes "return from wrong path".
3. Recovery restores the checkpointed TOS.
4. The RAS is correctly aligned: the wrong-path entry lies above the restored TOS and is ignored by later correct-path returns.

... after call mispredictions
1. A call's target is mispredicted; the call also updates the RAS by pushing its return address.
2. The wrong path executes, e.g., one misspeculated call.
3. Recovery restores the checkpointed TOS.
4. The mispredicted call itself lies on the correct path (only its target was wrong), so its push must survive recovery: "return from mispr. call" must remain on top. If the checkpointed TOS predates the call's push, the RAS ends up one entry short and the call's return is predicted from the wrong entry.

... after return mispredictions
1. A return's target is mispredicted; the return also updates the RAS by popping.
2. The wrong path executes, e.g., with no misspeculated calls or returns.
3. Recovery restores the checkpointed TOS.
4. The mispredicted return itself lies on the correct path, so its pop must survive recovery. If the checkpointed TOS predates the pop, the consumed entry reappears on top and the next return is predicted from it instead of from the entry below.

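Alignment after recovery, by misprediction type
Conditional branch misprediction: correct alignment is the checkpointed TOS.
Call misprediction: correct alignment is the TOS before the call plus the call's own push (return from mispred. call on top); restoring only the pre-call TOS gives incorrect alignment.
Return misprediction: correct alignment is the TOS before the return with the mispredicted return address popped.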
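One way to obtain these three alignments, sketched under stated assumptions: the checkpoint is the TOS pointer taken before the mispredicted instruction's own RAS update, and correct recovery restores it and then re-applies that update (a push for a call, a pop for a return). The paper discusses possible implementations; this sketch is not claimed to be one of them, and all names are hypothetical.

```python
from enum import Enum


class Mispredicted(Enum):
    CONDITIONAL = "conditional branch"
    CALL = "call"
    RETURN = "return"


class CircularRAS:
    """Fixed-size circular RAS; `tos` indexes the current top entry."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.tos = 0

    def push(self, return_address):
        self.tos = (self.tos + 1) % self.size
        self.slots[self.tos] = return_address

    def pop(self):
        return_address = self.slots[self.tos]
        self.tos = (self.tos - 1) % self.size
        return return_address


def recover(ras, saved_tos, kind, call_return_address=None, correct_alignment=True):
    """Restore the TOS checkpointed before the mispredicted instruction's RAS update;
    with correct_alignment, re-apply that instruction's own update."""
    ras.tos = saved_tos
    if not correct_alignment:
        return                             # naive recovery: plain checkpoint restore
    if kind is Mispredicted.CALL:
        ras.push(call_return_address)      # the call is on the correct path: keep its push
    elif kind is Mispredicted.RETURN:
        ras.pop()                          # the return is on the correct path: keep its pop


def after_call_misprediction(correct_alignment):
    ras = CircularRAS()
    ras.push("return from main")
    saved = ras.tos                        # checkpoint taken before the call's push
    ras.push("return from mispr. call")    # the call updates the RAS at fetch
    ras.push("return from wrong path")     # a call on the wrong path
    recover(ras, saved, Mispredicted.CALL, "return from mispr. call", correct_alignment)
    return ras.pop()                       # prediction for the mispredicted call's return


def after_return_misprediction(correct_alignment):
    ras = CircularRAS()
    ras.push("return from main")
    ras.push("return from printf")
    saved = ras.tos                        # checkpoint taken before the return's pop
    ras.pop()                              # the mispredicted return pops at fetch
    recover(ras, saved, Mispredicted.RETURN, correct_alignment=correct_alignment)
    return ras.pop()                       # prediction for the next correct-path return


for aligned in (False, True):
    print(aligned, after_call_misprediction(aligned), after_return_misprediction(aligned))
```

With a plain checkpoint restore (False), the call case predicts "return from main" and the return case predicts "return from printf" again, so both follow-up returns mispredict; re-applying the mispredicted instruction's own push or pop (True) predicts them correctly.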

Correct alignment: results
[Figure: RAS misprediction rate (0-30%) and speedup (0.94-1.12) for Incorrect Alignment vs. Correct Alignment on compress95, gcc95, go95, ijpeg95, li95, m88ksim95, vortex95, mcf00, parser00, vortex00, mesa00, and the average.]
Speedup can be affected by up to 10%.

A lot of published work...
SimpleScalar, HydraScalar, Yeh, Intel patent, SimWattch, Simca, Jourdan, Eickemeyer, Hoyt, Hummel, McDonald, McMahan, Steely
Each is classified as INCORRECT (= lower performing), CORRECT, or UNCLEAR.

RAS content recovery
Besides the TOS pointer, this scheme also checkpoints and recovers the top-of-stack data [Skadron et al., MICRO 1998].
[Figure: speedup (0.90-1.20) per benchmark, compress95 to mesa00 plus the average, for Incorrect Alignment vs. Correct Alignment.]
2% speedup on average.
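A sketch of checkpointing the TOS pointer together with the content of the TOS entry, as the scheme cited above does. The class, the helper, and the labels are hypothetical. The third case shows a limit of this approach: when the wrong path pops two entries and pushes two, the entry below the TOS is still corrupted.

```python
class CircularRAS:
    """Fixed-size circular RAS; `tos` indexes the current top entry."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.tos = 0

    def push(self, return_address):
        self.tos = (self.tos + 1) % self.size
        self.slots[self.tos] = return_address

    def pop(self):
        return_address = self.slots[self.tos]
        self.tos = (self.tos - 1) % self.size
        return return_address

    def checkpoint(self):
        # Save the TOS pointer together with the content of the TOS entry.
        return self.tos, self.slots[self.tos]

    def recover(self, saved, with_content=True):
        tos, top_entry = saved
        self.tos = tos
        if with_content:
            self.slots[tos] = top_entry  # repair the TOS entry if a wrong-path call overwrote it


def wrong_path_then_recover(wrong_path_ops, with_content):
    """Push A, B, C; checkpoint; apply wrong-path ops; recover; return the next three predictions."""
    ras = CircularRAS()
    for address in ("A", "B", "C"):
        ras.push(address)
    saved = ras.checkpoint()
    for op, address in wrong_path_ops:
        if op == "call":
            ras.push(address)
        else:
            ras.pop()
    ras.recover(saved, with_content)
    return [ras.pop() for _ in range(3)]


one_deep = [("return", None), ("call", "W")]
two_deep = [("return", None), ("return", None), ("call", "W1"), ("call", "W2")]
print(wrong_path_then_recover(one_deep, with_content=False))  # ['W', 'B', 'A']
print(wrong_path_then_recover(one_deep, with_content=True))   # ['C', 'B', 'A']
print(wrong_path_then_recover(two_deep, with_content=True))   # ['C', 'W1', 'A']
```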

Deeper pipelines...
[Figure: IPC relative to full RAS recovery (0.88-1.00) versus number of pipeline stages, for 8-, 16-, 32-, and 64-entry RASes.]
On average, scaling is reasonable and independent of RAS size.

But... per benchmark
[Figure: relative IPC per benchmark for a 32-entry RAS with 5, 10, 15, 20, 25, and 30 pipeline stages; marked drops of -5% and -7%.]
Some benchmarks lose up to 7%.

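RAS corruptions per kilo-instruction
[Figure: RAS corruptions per kilo-instruction versus corruption distance from the TOS, 32-entry RAS.]
Backward corruption (distances TOS, 1, ...) is more destructive than forward corruption (distances 31, 30, ...), because it overwrites the TOS and the entries below it, which upcoming correct-path returns will pop.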
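To make the distance labels concrete, a sketch that computes the distance of each slot written on a wrong path, assuming distances are measured downward from the checkpointed TOS pointer in a 32-entry circular RAS (an assumed convention inferred from the axis labels; the function name is hypothetical).

```python
RAS_SIZE = 32  # the 32-entry RAS of the slide


def corruption_distances(checkpoint_tos, wrong_path_ops):
    """Distances, relative to the checkpointed TOS, of the slots written by wrong-path calls.

    Assumed convention: 0 is the TOS entry itself, 1, 2, ... are entries below it
    (backward corruption), and 31, 30, ... are slots above it (forward corruption).
    """
    distances = []
    ptr = checkpoint_tos
    for op in wrong_path_ops:
        if op == "call":
            ptr = (ptr + 1) % RAS_SIZE  # a call writes the slot above the current TOS
            distances.append((checkpoint_tos - ptr) % RAS_SIZE)
        elif op == "return":
            ptr = (ptr - 1) % RAS_SIZE  # a return only moves the pointer
    return distances


print(corruption_distances(5, ["call", "call"]))              # [31, 30]
print(corruption_distances(5, ["return", "call"]))            # [0]
print(corruption_distances(5, ["return", "return", "call"]))  # [1]
```

Wrong-path calls alone only write above the TOS (forward); a wrong-path return followed by a call is what reaches the TOS and the entries below it (backward).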

Individual benchmarks
[Figure: per-benchmark results for a 32-entry RAS and a 20-stage pipeline.]

TOS behavior
[Figure: TOS position over time for li95 and gcc95, including wrong-path RAS updates; 32-entry RAS, 20-stage pipeline.]
One benchmark shows mainly forward corruption, the other more backward corruption.

TOS behavior: overflow
[Figure: TOS position over time for li95 and gcc95; 32-entry RAS, 20-stage pipeline.]
One benchmark stays within the 32 entries (no overflow); the other overflows the RAS.

Summary
Correctly aligned RAS: return mispredictions decrease by 40%; speedup of up to 10%.
Deeper pipelines: still one of the best-performing RAS recovery techniques and satisfactory on average, but performance decreases by up to 7% for some programs, which may need more RAS content to be checkpointed.
In the paper: possible implementations, a call uncorruption optimization for free, and how to fix correct alignment in SimpleScalar.
