IBM Haifa Labs © 2005 IBM Corporation

Performance Tools Developed in IBM Haifa
Gad Haber

HRL Performance Tools
- FDPR-Pro
  - Feedback-based optimizer operating on binary executable files
  - Part of AIX 5L
  - Available on Linux on Power via alphaWorks
  - Under development for:
    - Mac OS X (to be available soon via alphaWorks)
    - z/OS
- CodeAnalyzer
  - Eclipse plugin tool for analyzing executable files
  - Under development
  - To be added as part of the Performance Workbench (PerfWB)
- BProber
  - Utility for instrumenting binary executable files
  - Under development
- ESTO
  - Utility for identifying the optimal set of optimization options
  - Under development

FDPR-Pro: Feedback Directed Program Restructuring

FDPR-Pro - Feedback Directed Program Restructuring
- Uses a global view of the entire program
- Operates on the executable file after linkage
- These properties enable FDPR-Pro to perform:
  - Global code reordering
  - Optimizations across procedure boundaries
  - Static data rearrangement
  - Constant area rearrangement
  - Data prefetching
- Examples of additional FDPR-Pro optimizations:
  - Use of branch tables
  - Use of TOC load instructions
  - More...
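
To make "global code reordering" concrete, here is a minimal sketch of one well-known approach: greedy chaining of basic blocks along their hottest edges, in the style of Pettis and Hansen. It is an illustration only and is not taken from FDPR-Pro; the function name `layout_blocks`, the toy control flow graph, and the counts are all assumptions.

```python
def layout_blocks(blocks, edges, counts, entry):
    """Return a block order in which hot edges become fallthroughs.

    blocks: block names in original order
    edges:  {(src, dst): execution count of that edge}
    counts: {block: execution count of that block}
    entry:  name of the entry block (always placed first)
    """
    chain_of = {b: [b] for b in blocks}  # every block starts as its own chain
    # Visit edges from hottest to coldest (ties keep their original order).
    for (src, dst), count in sorted(edges.items(), key=lambda kv: -kv[1]):
        if count == 0 or dst == entry:
            continue  # never place anything in front of the entry block
        a, b = chain_of[src], chain_of[dst]
        # Merge only when src ends one chain and dst starts a different one.
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Collect the distinct chains, preserving original block order.
    chains, seen = [], set()
    for b in blocks:
        chain = chain_of[b]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)
    # Entry chain first, then the remaining chains from hottest to coldest.
    first = chain_of[entry]
    rest = [c for c in chains if c is not first]
    rest.sort(key=lambda c: -max(counts.get(blk, 0) for blk in c))
    return [blk for chain in [first] + rest for blk in chain]


# Toy example: block B is a rarely executed error path.
blocks = ["A", "B", "C", "D"]
edges = {("A", "B"): 1, ("A", "C"): 99, ("B", "D"): 1, ("C", "D"): 99}
counts = {"A": 100, "B": 1, "C": 99, "D": 100}
print(layout_blocks(blocks, edges, counts, "A"))
```

In this toy input the hot path A, C, D ends up contiguous and the cold block B moves to the end, which is the effect a profile-driven reordering pass aims for.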

Method
- Phase 1: Code instrumentation
  - Basic block level
- Phase 2: Profile information gathering
  - Selection of the "right" input set (representative workload)
  - Accumulation over several input sets
- Phase 3: Global code and data optimizations
  - Complements the compiler
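
The profile-gathering phase can be pictured with a small sketch. It assumes a profile is simply a table of basic-block execution counts, that profiles from several input sets are summed, and that "hot" means a count within some fraction of the maximum. The function names, the traces, and the 10% cutoff are hypothetical and do not describe FDPR-Pro's actual profile format.

```python
from collections import Counter


def run_instrumented(trace):
    """Stand-in for an instrumented run: count executions per basic block."""
    return Counter(trace)


def accumulate(profiles):
    """Sum basic-block counters gathered over several input sets."""
    total = Counter()
    for profile in profiles:
        total.update(profile)
    return total


def hot_blocks(profile, fraction=0.1):
    """Blocks whose count is at least `fraction` of the hottest block's count."""
    if not profile:
        return set()
    cutoff = max(profile.values()) * fraction
    return {block for block, n in profile.items() if n >= cutoff}


# Two hypothetical input sets exercising the same program.
run1 = run_instrumented(["entry"] + ["loop"] * 50 + ["exit"])
run2 = run_instrumented(["entry"] + ["loop"] * 30 + ["error", "exit"])
total = accumulate([run1, run2])
print(total["loop"], sorted(hot_blocks(total)))
```

Accumulating over several runs keeps a block that is hot in only one workload from being misclassified as cold, which is why the choice of representative inputs matters.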

FDPR-Pro Optimization Options

-RC                      Reorder code
-bf                      Branch folding
-bp                      Branch prediction bit setting
-align                   Code alignment
-nop                     Eliminate nop instructions
-uce                     Unreachable code elimination
-hco_resched             Hot/cold instruction scheduling
-RD, -build_dcg          Static data reordering
-tocload, -reduce_toc    TOC load optimizations
-si, -ipht, -ihf, -isf   Aggressive function inlining options
-ptrgl_optimization      Optimize function calls via pointers
-dcbt_optimization       Inject data prefetching instructions
-link_reg_optimization   Eliminate stores/restores of link register
-volatile_regs           Eliminate stores/restores using available volatile regs
-killed_regs             Eliminate stores/restores of killed registers
-load_after_store        Separate frequent loads and stores to the same address
-loop_unroll             Loop unrolling
-stack_opt               Reduce stack frame size of hot functions
-dce                     Dead code elimination
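
As an illustration of one entry in the list, unreachable code elimination can be sketched as a reachability walk over a control flow graph. This is a generic textbook formulation under the assumption that all branch targets are known; it is not FDPR-Pro's implementation, and the graph and function names are made up for the example.

```python
def reachable_blocks(cfg, entry):
    """Return the set of blocks reachable from `entry`.

    cfg maps each block name to a list of its successor block names.
    """
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(cfg.get(block, ()))
    return seen


def eliminate_unreachable(cfg, entry):
    """Drop every block that no path from `entry` can reach."""
    live = reachable_blocks(cfg, entry)
    return {block: list(succs) for block, succs in cfg.items() if block in live}


# Hypothetical graph: "dead" has no predecessor on any path from "entry".
cfg = {
    "entry": ["a", "b"],
    "a": ["exit"],
    "b": ["exit"],
    "dead": ["exit"],
    "exit": [],
}
print(sorted(eliminate_unreachable(cfg, "entry")))
```

On a real binary, indirect branches and function pointers make the successor lists incomplete, so a tool has to be conservative about what it declares unreachable.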

CodeAnalyzer

CodeAnalyzer - Motivation
- Architectures are becoming more complex
- Detecting potential performance bottlenecks in a given program using hardware simulators alone is hard
- There is a need for performance tools that can statically analyze and visualize programs for a platform design, to be used by:
  - Hardware architects
  - Compiler writers
  - Application developers

CodeAnalyzer
- CodeAnalyzer is an Eclipse plugin that performs comprehensive static analysis on given executable files and DLLs
- Relies on the FDPR-Pro tool for the analysis phase
- CodeAnalyzer displays the analyzed information together with profiling data collected by:
  - tprof
  - FDPR-Pro
- The code is then colored according to:
  - Frequency counters, gathered by FDPR-Pro
  - Hardware event ticks, gathered by tprof
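
Coloring code by frequency counters amounts to mapping each count onto a small set of heat buckets. The sketch below shows one way this could be done; the thresholds, the color names, and the function `heat_color` are assumptions for illustration and do not reflect CodeAnalyzer's actual color scheme.

```python
def heat_color(count, max_count):
    """Map an execution count to a display color relative to the hottest count."""
    if count == 0 or max_count == 0:
        return "gray"    # never executed in the profile
    ratio = count / max_count
    if ratio >= 0.5:
        return "red"     # hot
    if ratio >= 0.05:
        return "orange"  # warm
    return "blue"        # cold


# Hypothetical per-block frequency counters.
counters = {"entry": 2, "loop": 80, "helper": 8, "error": 0}
hottest = max(counters.values())
colors = {block: heat_color(n, hottest) for block, n in counters.items()}
print(colors)
```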

CodeAnalyzer (continued)
- Provides several views of the input binary:
  - Assembly instructions
  - Basic blocks
  - Procedures
  - CSECT modules
  - Control flow graph
  - Hot loops
  - Call graph
  - Annotated source code
  - Dispatch group formation
  - Pipeline slots and functional units

CodeAnalyzer – Performance Comments
Performance comments displayed by CodeAnalyzer:
- Comments which do not require profiling
  - Pipeline stalls for the Power architecture
  - Unreachable code and unused data
- Profile-based comments
  - Loop-invariant instructions within hot loops
  - Hot function calls proceeded by overwriting non-volatile registers
  - Hot saves and restores of registers which could be relocated to cold spill areas
  - Hot instructions that could be scheduled to colder areas in the code
  - Removable hot branches
    - Hot direct unconditional branches
    - Hot direct conditional branches that are taken, which have a colder fallthru
  - Hot call sites that are appropriate candidates for function inlining
  - Hot call sites that are appropriate for function specialization
  - Hot loops that are appropriate for loop unrolling
  - Hot TOC load instructions that can be replaced by immediate add instructions
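
One of the profile-based comments, a hot conditional branch that is mostly taken and has a colder fallthru, can be expressed as a simple check over per-branch counters. The sketch below is a hypothetical illustration: the `CondBranch` record, the threshold, and the comment wording are assumptions, not CodeAnalyzer's data model or output.

```python
from dataclasses import dataclass


@dataclass
class CondBranch:
    address: int   # address of the branch instruction
    taken: int     # times the branch was taken
    fallthru: int  # times execution fell through


def branch_comments(branches, hot_threshold):
    """Flag hot conditional branches whose taken path is hotter than the fallthru."""
    comments = []
    for br in branches:
        executed = br.taken + br.fallthru
        if executed >= hot_threshold and br.taken > br.fallthru:
            comments.append(
                (br.address, "hot taken conditional branch with colder fallthru")
            )
    return comments


# Hypothetical profile: only the first branch is both hot and mostly taken.
branches = [
    CondBranch(0x100, taken=900, fallthru=100),
    CondBranch(0x140, taken=10, fallthru=990),
    CondBranch(0x180, taken=3, fallthru=1),
]
for address, text in branch_comments(branches, hot_threshold=500):
    print(hex(address), text)
```

A branch flagged this way is a candidate for inverting the condition and swapping the successor blocks, so that the frequent path becomes the fallthrough.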

Performance Workbench (PerfWB)

PerfWB
- CodeAnalyzer is part of the Performance Workbench (PerfWB) utility
- PerfWB is a collection of Eclipse plugins that provide performance monitoring, tuning, and analysis
- PerfWB consists of the following Eclipse plugins:
  - ProcMon: system-level monitoring tool for displaying system state and for monitoring running processes and threads
  - E-Tune: visualizer of feedback information produced by tprof
  - CodeAnalyzer: performance analyzer of executables and DLLs

ProcMon

E-Tune with CodeAnalyzer