Compiler Optimization-Space Exploration
Adrian Pop, IDA/PELAB
Authors: Spyridon Triantafyllis, Manish Vachharajani, Neil Vachharajani, David I. August

November 11,
Outline
  - Introduction
  - The Problem: Predictive Heuristics and A Priori Evaluation
  - Some Solutions: Iterative Compilation and A Posteriori Evaluation
  - Our Solution: Optimization-Space Exploration
  - Evaluation
  - Conclusion

Introduction
  - Processors become more complex
      - they incorporate additional computational resources
  - Consequence: compilers become more complex
      - they use aggressive optimizations
      - they have to use predictive heuristics to decide where and to what extent optimizations should be applied

The Problem: Predictive Heuristics
  - Predictive heuristics
      - try to determine a priori the benefit of a given optimization
      - are tuned to give the highest average performance
  - The result
      - significant performance gains are unrealized!

Some Solutions: Iterative Compilation
  - Iterative compilation
      - optimize the program in many ways
      - choose a posteriori the best code version
  - Pitfalls of current schemes
      - prohibitive compilation times!
      - limited to specific architectures (embedded systems)
      - limited to specific optimizations

Our Solution: Optimization-Space Exploration
  - OSE Compiler (Practical Iterative Compilation)
      - explores the space of optimization configurations through multiple compilations
      - uses the experience of the compiler writer to prune the number of configurations that should be explored
      - uses a performance estimator instead of executing the code to evaluate it
      - selects a custom configuration for each code segment
      - selects the next optimization configuration by examining the characteristics of the previous configurations

OSE over many configurations

OSE – Limiting the Search Space
  - Optimization space: derived from a set of optimization parameters
  - Optimization parameters
      - Optimization level
      - High Level Optimization (HLO) level
      - Micro-architecture type
      - Coalesce adjacent loads and stores
      - HLO phase order
      - Loop unroll limit
      - Update dependencies after unrolling
      - Perform software pipelining

OSE – Limiting the Search Space
  - Optimization parameters (continued)
      - Heuristic to disable software pipelining
      - Allow control speculation during software pipelining
      - Software pipeline outer loops
      - Enable if-conversion heuristic for software pipelining
      - Software pipeline loops with early exits
      - Enable if-conversion
      - Enable non-standard predication
      - Enable pre-scheduling
      - Scheduler ready criterion

OSE – Limiting the Search Space
  - Compiler construction-time pruning
      - limits the total number of configurations that will be considered at compile time
      - constructs a set S with at most N configurations
      - S is chosen by measuring the impact on a representative set of code segments C, as follows:
          S' = default configuration + configurations with non-default parameters
          a) run C compiled with S' on real hardware and retain in S' only the valuable configurations
          b) form S'' from the combinations of the configurations in S'
          repeat a) for S'' and retain only the best N configurations
          repeat b) until no new configurations can be generated or the speedup does not improve
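
The pruning loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `prune_configurations`, the callback `measure(config, segment)` (standing in for a run on real hardware that returns a speedup), and the use of mean speedup over C as the "value" of a configuration are all assumptions. The sketch also assumes that S' starts from configurations that differ from the default in exactly one parameter.

```python
from itertools import combinations

def prune_configurations(defaults, alternatives, segments, measure, n):
    """Hypothetical sketch of construction-time pruning.

    defaults:     dict mapping parameter name -> default value
    alternatives: dict mapping parameter name -> list of non-default values
    segments:     the representative code segments C
    measure:      assumed callback, measure(config, segment) -> speedup
    n:            maximum number of configurations to keep (N on the slide)
    """
    def freeze(cfg):
        # Hashable, order-independent form of a configuration.
        return tuple(sorted(cfg.items()))

    def score(key):
        # Assumption: a configuration's value is its mean speedup over C.
        cfg = dict(key)
        return sum(measure(cfg, seg) for seg in segments) / len(segments)

    # S': the default plus every configuration with one non-default parameter.
    candidates = {freeze(defaults)}
    for param, values in alternatives.items():
        for value in values:
            candidates.add(freeze({**defaults, param: value}))

    seen = set(candidates)
    # Step a): keep only the n most valuable configurations.
    best = sorted(candidates, key=score, reverse=True)[:n]
    best_score = score(best[0])

    while True:
        # Step b): combine pairs by merging their non-default settings
        # (on a conflict, the second configuration's setting wins).
        combined = set()
        for a, b in combinations(best, 2):
            merged = dict(a)
            merged.update({k: v for k, v in b if defaults[k] != v})
            key = freeze(merged)
            if key not in seen:
                combined.add(key)
        if not combined:
            break  # no new configurations can be generated
        seen |= combined
        new_best = sorted(set(best) | combined, key=score, reverse=True)[:n]
        new_score = score(new_best[0])
        if new_score <= best_score:
            break  # the speedup does not improve
        best, best_score = new_best, new_score

    return [dict(key) for key in best]
```

A real compiler would cache measurements instead of re-running `measure` on every ranking, and might judge a set by the best configuration per segment instead of by a mean.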

OSE – Limiting the Search Space
  - Characterizing configuration correlations
      - build an optimization configuration tree
      - critical configurations = configurations at the same level
  1. Construct O = the set of the m most important configurations in S for all code segments in C.
  2. Make every oi in O a successor of the root node.
  3. For each configuration oi in O:
  4.   Construct Ci = {cj : argmax over k=1…m of pj,k = i}.
  5. Repeat steps 3 and 4 to find the successors of oi, limiting the code segments to Ci and the configurations to S\O.
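
The tree construction can be sketched recursively. This is an illustrative reading of the five steps, under two assumptions that the slide does not state: pj,k is taken to be the measured performance of code segment cj under configuration ok (higher is better), and "most important" is taken to mean the highest total performance over the segments considered. The name `build_tree` and the `perf` dictionary are hypothetical.

```python
def build_tree(perf, segments, configs, m):
    """Hypothetical sketch of the configuration-tree construction.

    perf:     dict mapping (segment, config) -> performance, higher is better
    segments: code segments still under consideration (C, then Ci)
    configs:  configurations still available (S, then S \\ O)
    m:        number of critical configurations per level
    Returns a nested dict: config -> subtree of its successors.
    """
    if not segments or not configs:
        return {}

    # Step 1: O = the m most important configurations for these segments.
    # Assumption: importance = total performance over the segments.
    ranked = sorted(configs,
                    key=lambda c: sum(perf[(s, c)] for s in segments),
                    reverse=True)
    critical = ranked[:m]
    remaining = [c for c in configs if c not in critical]

    tree = {}
    for o in critical:  # Steps 2 and 3: each o becomes a child of this node.
        # Step 4: Ci = segments whose best critical configuration is o.
        own = [s for s in segments
               if max(critical, key=lambda c: perf[(s, c)]) == o]
        # Step 5: recurse on Ci with the configurations S \ O.
        tree[o] = build_tree(perf, own, remaining, m)
    return tree
```

Because each level removes the chosen critical configurations from the pool, the recursion ends once the configurations or the segments run out.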

OSE – Limiting the Search Space
  - Compile-time search
      - do a breadth-first search on the optimization configuration tree
      - choose the configuration that yields the best estimated performance
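
One possible reading of this search is a level-by-level descent: evaluate every configuration on the current level, keep the best one seen so far, and continue with the successors of the level's winner. The sketch below follows that reading and is an assumption, not a transcription of the authors' algorithm. The tree is a nested dictionary mapping each configuration to its successors, and `estimate` is a hypothetical callback that returns an estimated cycle count (lower is better).

```python
def search_tree(tree, estimate):
    """Hypothetical sketch of the compile-time search.

    tree:     nested dict, config -> subtree of successor configs
    estimate: assumed callback, estimate(config) -> estimated cycles
    Returns (best_config, best_cost), or (None, inf) for an empty tree.
    """
    best_config, best_cost = None, float("inf")
    level = tree
    while level:
        # Evaluate every configuration on the current level.
        costs = {cfg: estimate(cfg) for cfg in level}
        winner = min(costs, key=costs.get)
        if costs[winner] < best_cost:
            best_config, best_cost = winner, costs[winner]
        # Continue only with the successors of this level's winner.
        level = level[winner]
    return best_config, best_cost
```

With m configurations per level and a tree of depth d, this evaluates at most m * d configurations instead of the whole set S.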

OSE – Limiting the Search Space
  - Limit the application of OSE to hot code segments
      - hot code segments are identified through profiling or through hardware performance counters during a program run

Evaluation
  - OSE compiler algorithm
    1. Profile the code
    2. For each function:
    3.   Compile to the high-level IR
    4.   Optimize using HLO
    5. For each function:
    6.   If the function is hot:
    7.     Perform OSE on the second HLO and CG
    8.     Emit the function using the best configuration
    9.   If the function is not hot, use the standard configuration
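
The control flow of steps 2 to 9 can be sketched as a small driver. Every compiler stage is passed in as a callback, and all of the names (`ose_compile`, `is_hot`, `optimize_hlo`, `compile_with`, `estimate`) are hypothetical placeholders, not the API of a real compiler. For brevity the sketch tries every configuration in a flat list for a hot function. Profiling (step 1) is assumed to have happened already and to be available through `is_hot`.

```python
def ose_compile(functions, is_hot, optimize_hlo, compile_with, estimate,
                configs, default):
    """Hypothetical driver for the OSE compiler algorithm.

    functions:    iterable of function identifiers
    is_hot:       assumed callback, is_hot(function) -> bool (from a profile)
    optimize_hlo: assumed callback, function -> optimized high-level IR
    compile_with: assumed callback, (ir, config) -> generated code
    estimate:     assumed callback, code -> estimated cycles (lower is better)
    configs:      configurations to explore for hot functions
    default:      the standard configuration for cold functions
    Returns a dict: function -> (chosen config, generated code).
    """
    # Steps 2-4: bring every function to optimized high-level IR.
    ir = {f: optimize_hlo(f) for f in functions}

    emitted = {}
    for f in functions:  # Step 5
        if is_hot(f):    # Step 6
            # Step 7: compile under each configuration and estimate the result.
            versions = {c: compile_with(ir[f], c) for c in configs}
            best = min(versions, key=lambda c: estimate(versions[c]))
            # Step 8: emit the version with the best estimate.
            emitted[f] = (best, versions[best])
        else:
            # Step 9: cold functions get the standard configuration.
            emitted[f] = (default, compile_with(ir[f], default))
    return emitted
```

Only hot functions pay for multiple compilations, which is what keeps the extra compile time bounded.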

Compile-time Performance Estimation
  - Model based on:
      - Ideal cycle count, T
      - Data cache performance, Lambda, L
      - Instruction cache performance, I
      - Branch misprediction, B

Results

Conclusions