Post-Pass Binary Adaptation for Software-Based Speculative Precomputation
Steve S. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen
SMT vs. Superscalar
Speculative Precomputation
Reduces cache misses
Spawns a thread to precompute the address of a load
Does not modify architectural state
So it need not be correct!
Basically another prefetcher!
Post-Pass Tool
Identify delinquent loads
  – A small number of loads => the majority of cache misses
Slicing
Scheduling
Trigger identification
SSP-enabled binary generation
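The "small number of loads => majority of cache misses" observation can be illustrated with a minimal sketch. It assumes a per-load miss profile is already available; the function name `find_delinquent_loads`, the 90% coverage threshold, and the profile numbers are all hypothetical and are not taken from the tool itself.

```python
def find_delinquent_loads(miss_counts, coverage=0.9):
    """Pick the loads with the most misses until `coverage` of all misses is covered.

    miss_counts: dict mapping a load identifier to its profiled cache-miss count.
    The 0.9 default is an illustrative assumption, not a value from the slides.
    """
    total = sum(miss_counts.values())
    if total == 0:
        return []
    selected = []
    covered = 0
    # Visit loads from most to fewest misses.
    ranked = sorted(miss_counts.items(), key=lambda kv: kv[1], reverse=True)
    for load, misses in ranked:
        if covered >= coverage * total:
            break
        selected.append(load)
        covered += misses
    return selected


# Hypothetical profile: two of five loads account for 95% of the misses.
profile = {"ld_a": 700, "ld_b": 250, "ld_c": 30, "ld_d": 15, "ld_e": 5}
delinquent = find_delinquent_loads(profile, coverage=0.9)
print(delinquent)
```

Only the selected loads would then be handed to the slicing step; the rest are left alone.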
Slicing
Region-based
  – Loop, loop body, or a procedure
  – Graph with regions as nodes
  – Edges connect parent and child regions
Speculative slicing
  – To reduce the slice length
  – Memory disambiguation
  – Data speculation
  – Removing unexecuted paths and unrealised calls
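As a rough illustration of slicing, the sketch below computes a backward slice for one load over a dependence graph and optionally drops instructions that a profile says never executed, which is one simple way to model "removing unexecuted paths". The graph encoding, the `backward_slice` name, and the instruction labels are assumptions made for this example; region handling, memory disambiguation, and data speculation are not modelled.

```python
def backward_slice(deps, target, executed=None):
    """Return the set of instructions that `target` transitively depends on.

    deps: dict mapping an instruction to the instructions it depends on.
    executed: optional set of instructions observed to execute in a profile;
              when given, instructions outside it are left out of the slice.
    """
    slice_set = set()
    worklist = [target]
    while worklist:
        instr = worklist.pop()
        if instr in slice_set:
            continue
        if executed is not None and instr not in executed:
            continue  # speculative pruning: skip never-executed instructions
        slice_set.add(instr)
        worklist.extend(deps.get(instr, []))
    return slice_set


# Hypothetical dependence graph around a delinquent load.
deps = {
    "load": ["addr"],
    "addr": ["base", "idx"],
    "idx": ["idx_cold", "i"],   # idx_cold sits on a path the profile never saw
    "store": ["load"],          # consumer of the load, not part of its slice
}
full_slice = backward_slice(deps, "load")
pruned_slice = backward_slice(
    deps, "load", executed={"load", "addr", "base", "idx", "i", "store"}
)
```

Because the precomputation thread never changes architectural state, a pruned slice that is occasionally wrong costs only a wasted prefetch.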
Scheduling
For Chain SP
  – Graph partitioning
    Forward dependencies
    Level sort
  – Schedule the resulting acyclic graph
  – Include synchronisation
  – Dependence reduction
    Loop rotation
  – Reduce loop-carried dependency
    Branch prediction
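The slides do not spell out the level sort, so the following is only one plausible reading: once partitioning has produced an acyclic dependence graph, nodes are grouped into levels so that every node appears after all of its predecessors. The `level_sort` name and the example graph are hypothetical.

```python
def level_sort(succs):
    """Group the nodes of an acyclic graph into dependence levels.

    succs: dict mapping a node to the nodes that depend on it.
    Returns a list of levels; each level holds nodes whose predecessors
    all sit in earlier levels. Raises ValueError if the graph has a cycle.
    """
    nodes = set(succs)
    for targets in succs.values():
        nodes.update(targets)
    indegree = {n: 0 for n in nodes}
    for targets in succs.values():
        for t in targets:
            indegree[t] += 1

    current = sorted(n for n in nodes if indegree[n] == 0)
    levels = []
    placed = 0
    while current:
        levels.append(current)
        placed += len(current)
        ready = []
        for n in current:
            for t in succs.get(n, []):
                indegree[t] -= 1
                if indegree[t] == 0:
                    ready.append(t)
        current = sorted(ready)
    if placed != len(nodes):
        raise ValueError("graph has a cycle; partition it first")
    return levels


# Hypothetical acyclic graph: b and c both depend on a, d depends on b and c.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
schedule_levels = level_sort(graph)
print(schedule_levels)
```

Nodes within one level have no dependences on each other, so a scheduler is free to order them as it likes.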
Trigger Identification
Why can’t you move the trigger far ahead?
  – Copying overhead vs. slack
SSP-enabled binary generation
Choose precomputation model
  – Chain or basic?
Results