1 Feedback-directed optimizations with estimated edge profiles from hardware event sampling
Open64 workshop, CGO 2008
April 6, 2008
Vinodha Ramasamy, Robert Hundt (Google Inc.)
Dehao Chen, Wenguang Chen (Tsinghua University)
2 Background
Traditional FDO model: Instrument – Run – Recompile
Usage Model
- Difficulties in generating representative training datasets
- High overhead of profile collection
- Requires dual-compilation: tightly coupled builds
Benefits
- Supports both value and edge profiling
- High performance potential
[Diagram: INSTRUMENTATION BUILD → INSTRUMENTED BINARY (run on TRAINING DATA) → PROFILE DATA → FDO BUILD → OPTIMIZED BINARY]
3 Overview
Our methodology
- Skip the instrumentation step
- Use INST_RETIRED event samples for feedback
- Source position information used to correlate samples to basic blocks
- Generate traditional edge profiles from basic block samples
- Feedback data stored in same data structures as instrumented FDO
  - Leverage feedback-directed optimizations, validation and propagation
[Diagram: Input Data → SAMPLE PROFILE → FDO BUILD → OPTIMIZED BINARY]
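The correlation step can be pictured as a simple aggregation: each sampled instruction address is mapped back to a source line, and samples are summed per line. The sketch below is illustrative only. The function name, the dictionary-based address-to-line map, and the choice to drop unmapped addresses are assumptions, not the tool's actual data structures. The addresses and counts are the ones shown for pbla.c:60 on the Algorithm slide.

```python
from collections import defaultdict


def samples_per_line(address_samples, address_to_line):
    """Sum per-address samples by source line (hypothetical helper).

    Returns (total samples per line, number of sampled instructions per line).
    Addresses with no source position are skipped; that is an assumption.
    """
    totals = defaultdict(int)
    instr_counts = defaultdict(int)
    for addr, n in address_samples.items():
        line = address_to_line.get(addr)
        if line is None:
            continue
        totals[line] += n
        instr_counts[line] += 1
    return dict(totals), dict(instr_counts)


# INST_RETIRED samples per instruction address.
address_samples = {0x804A8B7: 100, 0x804A8BA: 30, 0x804A8BD: 70, 0x804A8C0: 80}
# Assumed debug-info mapping: all four instructions belong to pbla.c:60.
address_to_line = {addr: ("pbla.c", 60) for addr in address_samples}

totals, instr_counts = samples_per_line(address_samples, address_to_line)
print(totals)        # {('pbla.c', 60): 280}
print(instr_counts)  # {('pbla.c', 60): 4}
```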
4 Algorithm
Basic block counts
- Scale samples per source line by # of instructions
  - Samples per source line stored in profile datafile
- Annotate IR statements in basic blocks with source line sample counts
- Scale basic block sample count
  BB.count = (∑ IR.count) / num_IR_stmts

Example, source line:
pbla.c:60  iplus = iplus->pred;  // 280 ÷ 4 = 70
  100 : 804a8b7: mov 0x10(%ebp),%eax
   30 : 804a8ba: mov 0x8(%eax),%eax
   70 : 804a8bd: mov %eax,0x10(%ebp)
   80 : 804a8c0: jmp 804a94b

Example, basic block:
IR1 = 70, IR2 = 10, IR3 = 70, IR4 = 0, IR5 = 0
∑ IR.count = 70 + 10 + 70 + 0 + 0 = 150
BB.count = 150 ÷ 5 = 30
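The two scaling steps on this slide reduce to two averages. The following minimal sketch reproduces the slide's numbers. The function names are made up for illustration, and real IR statements would carry more than a bare count.

```python
def line_count(instruction_samples):
    """Samples for one source line, scaled by its number of instructions."""
    return sum(instruction_samples) / len(instruction_samples)


def bb_count(ir_counts):
    """BB.count = (sum of IR.count) / num_IR_stmts."""
    return sum(ir_counts) / len(ir_counts)


# pbla.c:60 has four instructions with 100, 30, 70 and 80 samples.
line_60 = line_count([100, 30, 70, 80])
print(line_60)  # 70.0

# Five IR statements annotated with their source-line counts.
block = bb_count([70, 10, 70, 0, 0])
print(block)  # 30.0
```

Averaging rather than summing keeps a line or block that happens to expand into many instructions from looking hotter than it is.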
5 Edge frequency estimation
Edge counts from basic block counts
- Uses higher level program structure: branch, loop, etc.
- Recursive algorithm used to smooth sample counts
[Figure: loop CFG annotated with counts before and after smoothing. Labels as extracted: ENTRY: 0; BODY: 0; BODY: 7954; ENTRY: 500; 500; BR: 7954; NT: 30; T: 7922; JOIN: 420; EXIT: 0; BACK: 0 → BR: 7954; NT: 32; T: 7922; JOIN: 7954; EXIT: 500; BACK: 7454]
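The smoothed numbers in the figure are consistent with plain flow conservation on a branch and a loop. The sketch below is not the talk's recursive algorithm. It is a hand-written illustration for this one shape of CFG. The rule of trusting the branch count and the larger arm is an assumption chosen because it reproduces the figure's values, and both helper names are hypothetical.

```python
def smooth_branch(br, taken, not_taken):
    """Make taken + not_taken == br.

    Assumption: keep the branch count and the larger arm,
    then recompute the smaller arm from them.
    """
    if taken >= not_taken:
        taken = min(taken, br)
        return taken, br - taken
    not_taken = min(not_taken, br)
    return br - not_taken, not_taken


def loop_edges(entry, body):
    """Flow conservation at the loop header: body = entry + back."""
    back = max(body - entry, 0)
    exit_count = entry  # each entry into the loop leaves it once
    return back, exit_count


taken, not_taken = smooth_branch(br=7954, taken=7922, not_taken=30)
join = taken + not_taken  # both arms flow into the join block
back, exit_count = loop_edges(entry=500, body=7954)

print(taken, not_taken, join)  # 7922 32 7954
print(back, exit_count)        # 7454 500
```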
6 Challenges
- Inaccuracies inherent to sampling
- Source position information issues
  - Missing information due to optimization transformations
  - Disambiguating samples per source line, e.g. if (cond) {stmt1; stmt2;}
- Edge estimation heuristics
  - Evaluate algorithm proposed by Levin et al.
- Inlining
  - Annotate early inlined functions with scaled sample counts
7 Results
- SPEC2006 C benchmarks
- Intel Core-2 platform using 64-bit binaries
- -O2 FDO with instrumented runs
  - 4–5% gain over default -O2 runs
- -O2 FDO with sampled profiles
  - Profile collection using -O2 binaries
  - ~60% of FDO instrumented gain
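Read together, the two figures imply a rough range for the sampled-profile gain over default -O2. The range below is derived arithmetic from the slide's approximate numbers, not a result reported in the talk.

```python
instrumented_gain_pct = (4.0, 5.0)  # gain of instrumented FDO over -O2, from the slide
sampled_fraction = 0.60             # "~60% of FDO instrumented gain"

# Implied gain of sampled-profile FDO over default -O2, in percent.
implied = tuple(round(g * sampled_fraction, 1) for g in instrumented_gain_pct)
print(implied)  # (2.4, 3.0)
```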
8 Thank You!
Q&A
Google Confidential and Proprietary