Correct Alignment of a RAS after Call and Return Mispredictions
Veerle Desmet (Ghent University), Yiannakis Sazeides (University of Cyprus), Constantinos Kourouyiannis (University of Cyprus), Koen De Bosschere (Ghent University)

Workshop on Duplicating, Deconstructing and Debunking (WDDD 2005), June 4
Presented by Veerle Desmet, Ghent University

Motivation
Return-Address-Stack (RAS)
1. Correct alignment
2. Effect of deeper pipelines

Return-Address-Stack (RAS)
main:        my_printf(); my_function(); my_printf(); return
my_printf:   for (condition) { printf() } return
printf:      /* print */ return
my_function: /* fun */ return
1. Function calls push their return address on the RAS
2. Returns are predicted by popping from the RAS
[Figure: RAS contents during the example, with TOS marker. Entries shown: return from main, return from my_printf, return from printf, return from my_function.]

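The push/pop behavior on this slide can be sketched as a small simulation. This is a minimal illustration, not the hardware design from the talk: the class name `ReturnAddressStack`, the circular-buffer layout, and the default size of 8 are assumptions made for the example, and return addresses are represented by the slide's labels instead of real addresses.

```python
class ReturnAddressStack:
    """Toy circular RAS (hypothetical model): calls push, returns pop."""

    def __init__(self, size=8):
        self.entries = [None] * size
        self.tos = 0  # index of the current top-of-stack entry

    def push(self, return_address):
        # A call advances TOS and stores its return address there.
        self.tos = (self.tos + 1) % len(self.entries)
        self.entries[self.tos] = return_address

    def pop(self):
        # A return is predicted with the entry at TOS, then TOS moves back.
        predicted = self.entries[self.tos]
        self.tos = (self.tos - 1) % len(self.entries)
        return predicted


ras = ReturnAddressStack()
predictions = []

ras.push("return from main")         # main is called
ras.push("return from my_printf")    # main calls my_printf
ras.push("return from printf")       # my_printf calls printf
predictions.append(ras.pop())        # printf returns
predictions.append(ras.pop())        # my_printf returns
ras.push("return from my_function")  # main calls my_function
predictions.append(ras.pop())        # my_function returns
```

After this sequence every return was predicted with the address pushed by its matching call, and the entry for main is back at the top of the stack.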
Mispredictions
Speculative RAS updates due to wrong-path calls
[Figure: fetch timeline marking checkpoint, wrong path and recovery; RAS holding return from printf (TOS), return from my_printf, return from main.]

Bottlenecks for RAS performance
De-alignment: unbalanced number of calls and returns
Corruption: RAS content overwritten by wrong-path calls
Overflow: call depth exceeds stack size
[Figure: RAS holding return from main, return from printf, return from my_printf, return from my_printf, with TOS marker.]

Correct Alignment…
1. Checkpoint TOS
2. Conditional branch misprediction
3. Wrong path, e.g. 1 misspeculated call
4. Recovery to checkpointed TOS
[Figure: RAS holding return from printf, return from main and return from my_function, plus a wrong-path entry (return from wrong path), with TOS marker.]

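The four steps on this slide can be sketched as follows. This is a minimal sketch under assumptions: the `RAS` class, its circular layout and the `checkpoint`/`recover` method names are hypothetical, and only the TOS pointer is checkpointed, as the slide describes.

```python
class RAS:
    """Toy circular RAS with pointer-only checkpointing (hypothetical model)."""

    def __init__(self, size=8):
        self.entries = [None] * size
        self.tos = 0

    def push(self, return_address):
        self.tos = (self.tos + 1) % len(self.entries)
        self.entries[self.tos] = return_address

    def pop(self):
        predicted = self.entries[self.tos]
        self.tos = (self.tos - 1) % len(self.entries)
        return predicted

    def checkpoint(self):
        return self.tos  # only the pointer is saved, not the stack contents

    def recover(self, saved_tos):
        self.tos = saved_tos


ras = RAS()
ras.push("return from main")
ras.push("return from printf")

saved_tos = ras.checkpoint()         # 1. checkpoint TOS at the conditional branch
ras.push("return from wrong path")   # 2-3. misprediction, wrong path makes one call
ras.recover(saved_tos)               # 4. recovery to the checkpointed TOS

next_prediction = ras.pop()          # first return on the correct path
```

In this scenario the wrong-path entry ends up above the restored TOS, so the next return is again predicted with "return from printf".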
… after Call Mispredictions
1. Checkpoint TOS
2. Call target misprediction + RAS update
3. Wrong path, e.g. 1 misspeculated call
4. Recovery to checkpointed TOS
[Figure: RAS holding return from printf, return from main and return from my_function, plus the entry pushed by the mispredicted call (return from mispr. call) and a wrong-path entry (return from wrong path), with TOS marker.]

… after Return Mispredictions
1. Checkpoint TOS
2. Return target misprediction + RAS update
3. Wrong path, e.g. no misspeculated calls or returns
4. Recovery to checkpointed TOS
[Figure: RAS holding return from printf, return from main and the entry used by the mispredicted return (return from mispr. return), with TOS marker.]

[Figure: RAS state after recovery for three cases: conditional branch misprediction, call misprediction and return misprediction. Labels on the figure: Correct Alignment, Incorrect Alignment, Correct Alignment. Entries shown: return from printf, return from mispred. call, mispred. return addr, with TOS markers.]

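The difference between the two alignments can be illustrated with a small simulation. This sketch rests on one assumption that the slides only imply: a mispredicted call or return has already performed its own push or pop, so "incorrect alignment" is modeled as restoring the TOS pointer saved before that update, and "correct alignment" as restoring the pointer saved after it. The class and function names are hypothetical.

```python
class RAS:
    """Toy circular RAS (hypothetical model)."""

    def __init__(self, size=8):
        self.entries = [None] * size
        self.tos = 0

    def push(self, return_address):
        self.tos = (self.tos + 1) % len(self.entries)
        self.entries[self.tos] = return_address

    def pop(self):
        predicted = self.entries[self.tos]
        self.tos = (self.tos - 1) % len(self.entries)
        return predicted


def after_call_misprediction(align_correctly):
    ras = RAS()
    ras.push("return from main")
    ras.push("return from printf")
    tos_before = ras.tos
    ras.push("return from mispr. call")  # the call pushes even if its target is wrong
    tos_after = ras.tos
    ras.push("return from wrong path")   # one misspeculated call on the wrong path
    ras.tos = tos_after if align_correctly else tos_before  # recovery
    return ras.pop()  # prediction for the return matching the mispredicted call


def after_return_misprediction(align_correctly):
    ras = RAS()
    ras.push("return from main")
    ras.push("mispred. return addr")
    tos_before = ras.tos
    ras.pop()                            # the return pops, its target turns out wrong
    tos_after = ras.tos
    ras.tos = tos_after if align_correctly else tos_before  # recovery
    return ras.pop()  # prediction for the next return


call_correct = after_call_misprediction(True)
call_incorrect = after_call_misprediction(False)
return_correct = after_return_misprediction(True)
return_incorrect = after_return_misprediction(False)
```

Under this model, restoring the pre-update pointer drops the mispredicted call's own return address (the next return is predicted with "return from printf"), and re-exposes the stale entry after a mispredicted return (the next return is predicted with "mispred. return addr" again).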
Correct Alignment
[Figure: RAS misprediction rate (0% to 30%) per benchmark, Incorrect Alignment vs. Correct Alignment. Benchmarks: compress95, gcc95, go95, ijpeg95, li95, m88ksim95, vortex95, mcf00, parser00, vortex00, mesa00, average.]
[Figure: speedup (0.94 to 1.12) for the same benchmarks.]
Speedup can be affected by up to 10%

A lot of published work…
Works shown: SimpleScalar; HydraScalar; Yeh, Intel patent; SimWattch; Simca; Jourdan; Eickemeyer, Hoyt, Hummel, McDonald, McMahan; Steely
Category labels: INCORRECT = LOWER PERFORMING, CORRECT, UNCLEAR

Bottlenecks for RAS performance
De-alignment: unbalanced number of calls and returns
Corruption: RAS content overwritten by wrong-path calls
Overflow: call depth exceeds stack size
[Figure: RAS holding return from main, return from printf, return from my_printf, return from my_printf, with TOS marker.]

RAS content recovery
Also checkpoints/recovers top of stack data
2% speedup on average [Skadron et al., MICRO 1998]
[Figure: speedup (0.90 to 1.20) per benchmark, Incorrect Alignment vs. Correct Alignment. Benchmarks: compress95, gcc95, go95, ijpeg95, li95, m88ksim95, vortex95, mcf00, parser00, vortex00, mesa00, average. A RAS diagram highlights the top of stack data.]

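Why saving the top-of-stack data helps can be shown with a wrong path that pops and then pushes, which overwrites the entry the checkpointed pointer refers to. This is a minimal sketch, assuming a toy circular RAS; the names `RAS` and `recover_after_wrong_path` are hypothetical and the scenario is chosen for illustration only.

```python
class RAS:
    """Toy circular RAS (hypothetical model)."""

    def __init__(self, size=8):
        self.entries = [None] * size
        self.tos = 0

    def push(self, return_address):
        self.tos = (self.tos + 1) % len(self.entries)
        self.entries[self.tos] = return_address

    def pop(self):
        predicted = self.entries[self.tos]
        self.tos = (self.tos - 1) % len(self.entries)
        return predicted


def recover_after_wrong_path(save_data):
    ras = RAS()
    ras.push("return from main")
    ras.push("return from printf")

    saved_tos = ras.tos                 # checkpoint the pointer...
    saved_entry = ras.entries[ras.tos]  # ...and the top-of-stack data

    # Wrong path: a return followed by a call overwrites the old TOS entry.
    ras.pop()
    ras.push("return from wrong path")

    ras.tos = saved_tos                 # pointer recovery
    if save_data:
        ras.entries[saved_tos] = saved_entry  # content recovery
    return ras.pop()                    # first return on the correct path


pointer_only = recover_after_wrong_path(save_data=False)
pointer_and_data = recover_after_wrong_path(save_data=True)
```

With pointer-only recovery the next return is predicted with the wrong-path address; restoring the saved entry as well repairs this particular corruption. Entries deeper in the stack would still need more checkpointed content.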
Motivation
Return-Address-Stack (RAS)
1. Correct alignment
2. Effect of deeper pipelines

Deeper Pipelines…
[Figure: IPC relative to full RAS recovery (axis starting at 0.88) versus number of pipeline stages, for 8-entry, 16-entry, 32-entry and 64-entry RAS.]
On average, reasonable scaling…
Independent of RAS size

But... per benchmark
[Figure: per-benchmark results for a 32-entry RAS at 5, 10, 15, 20, 25 and 30 pipeline stages; annotated values: -7% and -5%.]

RAS Corruptions
[Figure: corruptions per kilo instructions versus corruption distance (tos, 1, …, 31), 32-entry RAS.]
Backward corruption (tos, 1, …) is more destructive than forward corruption (31, 30, …)

Individual benchmarks
[Figure: per-benchmark results, 32-entry RAS, 20-stage pipeline.]

TOS behavior
[Figure: TOS position over time for li95 and gcc95, 32-entry RAS, 20-stage pipeline, with wrong-path RAS updates marked. Panel annotations: "Mainly forward corruption" and "More backward corruption".]

Bottlenecks for RAS performance
De-alignment: unbalanced number of calls and returns
Corruption: RAS content overwritten by wrong-path calls
Overflow: call depth exceeds stack size
[Figure: RAS holding return from main, return from printf, return from my_printf, return from my_printf, with TOS marker.]

TOS Behavior
[Figure: TOS position over time for li95 and gcc95, 32-entry RAS, 20-stage pipeline. Panel annotations: "no overflow" and "overflow".]

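Overflow, where the call depth exceeds the stack size, can be illustrated by counting how many returns a fixed-size circular RAS still predicts correctly after a chain of nested calls. This is a minimal sketch: the `RAS` class and the function `count_correct_returns` are hypothetical, call levels stand in for return addresses, and the sizes are far smaller than the 32-entry RAS on the slide.

```python
class RAS:
    """Toy circular RAS (hypothetical model): old entries are overwritten on overflow."""

    def __init__(self, size):
        self.entries = [None] * size
        self.tos = 0

    def push(self, return_address):
        self.tos = (self.tos + 1) % len(self.entries)
        self.entries[self.tos] = return_address

    def pop(self):
        predicted = self.entries[self.tos]
        self.tos = (self.tos - 1) % len(self.entries)
        return predicted


def count_correct_returns(ras_size, call_depth):
    ras = RAS(ras_size)
    # Nest call_depth calls; the call level stands in for the return address.
    for level in range(call_depth):
        ras.push(level)
    # Unwind: each return should be predicted with its matching call's address.
    correct = 0
    for level in reversed(range(call_depth)):
        if ras.pop() == level:
            correct += 1
    return correct


fits = count_correct_returns(ras_size=8, call_depth=6)
overflows = count_correct_returns(ras_size=4, call_depth=6)
```

With six nested calls, the 8-entry stack predicts all six returns, while the 4-entry stack predicts only the four innermost returns because the two outermost return addresses were overwritten.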
Summary
Correctly aligned RAS
  Return mispredictions decrease by 40%
  Speedup of up to 10%
Deeper pipelines
  One of the best-performing RAS recovery techniques
  Satisfactory on average
  Performance decrease of up to 7% for some programs
  May need to checkpoint more content
Paper
  Possible implementations
  Call uncorruption optimization for free
  How to fix RAS alignment in SimpleScalar
