Published by Derek Malone. Modified over 8 years ago.
1 ROGUE: Dynamic Optimization Framework Using Pin
Vijay Janapa Reddi
Ph.D. Candidate, Electrical and Computer Engineering, University of Colorado at Boulder
Intel Mentors: Robert S. Cohn & C.K. Luk
Internship at Intel MMDC
2 Motivation
Most optimizers are “black-box” style
  – Limited ability for customization
Provide a more open API for optimization
  – Profiling, trace building, optimization, cache management
  – Include all of the Pin API for instrumentation
  – Flexible, but hides the low-level details of the JIT
3 Potential Users
Research
Education
  – University of Colorado at Boulder
      Advanced Computer Architecture
      Code Generation and Optimization
  – Pin: A Binary Instrumentation Tool for Computer Architecture Research and Education (WCAE 2004)
4 Pin Model
[Figure: Pin model. Labels: Original Code (blocks A, B, C, D, E, F); Pin; Dispatcher; Code Cache holding instrumented code A’, B’, E’, D’.]
5 ROGUE Model
[Figure: ROGUE model. Labels: Original Code (blocks A, B, C, D, E, F); Code Cache holding the hot path A’, C’, F’, D’.]
6 How is ROGUE different from Pin?
Pin
  – Instrumentation only
  – Fixed method for building traces
  – Application only executes out of the code cache
ROGUE
  – Optimization and profiling (instrumentation or hardware)
  – User-defined trace building
  – Application executes a mix:
      Hot traces (code cache)
      Instrumented traces (code cache)
      Original program (program memory)
7 Dynamic Optimization Flow
Perform runtime analysis
  – Hardware performance monitoring unit
      Branch Target Buffer
  – Software profilers
      Basic blocks (BBLs)
      Edges
      Paths
Generate optimized code sequences
Patch original code to execute optimized code
Repeat the flow
8 ROGUE Model
[Figure: ROGUE model. Labels: Original Code (blocks A, B, C, D, E, F); Code Cache holding the hot path A’, C’, F’, D’.]
9 Code Layout
Profile information
  – Edge profiler
  – Path profiler
Code fetching mechanism
  – Fetch a range of instructions, basic blocks, etc.
Perform optimizations
10 Collecting Profile Information
[Figure: flow graph of blocks A–F]
Step 0: Instrument all edges (Pin instrumentation)
    INS_InsertCall(ins, IPOINT_TAKEN_BRANCH, (AFUNPTR) TakenBr,
                   IARG_PTR, taken_edg, …, IARG_END);
    INS_InsertCall(ins, IPOINT_AFTER, (AFUNPTR) Fallthrough,
                   IARG_PTR, fallthrough_edg, …, IARG_END);
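The slide does not show the bodies of the `TakenBr` and `Fallthrough` analysis routines. As a rough sketch of the bookkeeping such routines might perform, the standalone C++ below keeps one counter per (source, destination) edge. `EdgeProfiler`, `record`, and `count` are hypothetical names chosen for illustration; this is not Pin or ROGUE code.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical edge profiler: one execution counter per control-flow edge.
// Not part of Pin or ROGUE; all names are made up for this sketch.
class EdgeProfiler {
public:
    using Addr = std::uint64_t;

    // Called each time the edge src -> dst executes, which is roughly what
    // an analysis routine such as TakenBr or Fallthrough might do.
    void record(Addr src, Addr dst) {
        ++counts_[std::make_pair(src, dst)];
    }

    // Number of times the edge has executed so far (0 if never seen).
    std::uint64_t count(Addr src, Addr dst) const {
        auto it = counts_.find(std::make_pair(src, dst));
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::map<std::pair<Addr, Addr>, std::uint64_t> counts_;
};
```

Keying the map on the edge rather than on the block keeps taken and fall-through counts for the same branch separate, which is what an edge-based hot-path search needs.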
11 Code Fetching
[Figure: flow graph of blocks A–F. Labels: Hot edge; A; 0x80abcdef]
Step 1: Fetch the hot target basic block
    BBL bbl = BBL_Fetch(0x80abcdef);
Use any threshold metric
  – E.g.: execution count threshold = 100
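One way to read the threshold idea on this slide: count executions per branch target and trigger fetching once a target first reaches the threshold. The sketch below is a standalone C++ illustration under that assumption. `HotTargetMonitor` and `recordExecution` are hypothetical names, and the callback merely stands in for the point where a tool might call `BBL_Fetch`.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical hot-target monitor (not Pin or ROGUE API): counts executions
// per target address and invokes a callback exactly once, when a target
// first reaches the threshold.
class HotTargetMonitor {
public:
    using Addr = std::uint64_t;
    using Callback = std::function<void(Addr)>;

    HotTargetMonitor(std::uint64_t threshold, Callback onHot)
        : threshold_(threshold), onHot_(std::move(onHot)) {}

    // Record one execution of an edge whose destination is `target`.
    void recordExecution(Addr target) {
        std::uint64_t& n = counts_[target];
        ++n;
        if (n == threshold_) {
            onHot_(target);  // stand-in for the point where fetching would start
        }
    }

private:
    std::uint64_t threshold_;
    Callback onHot_;
    std::unordered_map<Addr, std::uint64_t> counts_;
};
```

Comparing with `==` rather than `>=` means the callback fires only on the execution that reaches the threshold, so a target is not re-fetched on every later execution.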
12 Trace Generation
[Figure: flow graph of blocks A–F; trace so far: A’]
Step 2: Create a trace to hold the fetched bbl
    TRACE trace = TRACE_Alloc(bbl);
13 Trace Generation
[Figure: flow graph of blocks A–F]
Step 3: Walk the flow graph
    for (EDG edg = BBL_EdgHead(bbl); EDG_Valid(edg); edg = EDG_Next(edg))
    {
        …
        if (maxedg_cnt < cnt)
        {
            maxedg_cnt = cnt;
            maxedg = edg;
        }
        …
    }
14 Trace Generation
[Figure: flow graph of blocks A–F; trace so far: A’, C’]
Step 4: Add the new hot edge target to the trace
    bbl = TRACE_AddEdg(trace, bbl, maxedg);
Also listed on the slide:
  – TRACE_AddInlineCallEdg
  – TRACE_AddInlineReturnEdg
  – TRACE_AddBranchEdg
15 Trace Generation
[Figure: flow graph of blocks A–F]
Step 5: Repeat Steps 3 and 4 until a trace termination condition is met
  – Probability
  – Loopback identification
  – Maximum number of instructions per trace
  – …
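The three termination criteria named above can be combined into a single check that runs before each new block is appended. The following standalone C++ is a minimal sketch of that idea. The types `TraceLimits` and `TraceState`, the function `shouldStop`, and the order in which the criteria are tested are all assumptions made for illustration, not ROGUE API.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical limits chosen by the tool writer.
struct TraceLimits {
    double minProbability;        // stop once path probability falls below this
    std::size_t maxInstructions;  // stop once the trace holds this many instructions
};

// Hypothetical running state of the trace under construction.
struct TraceState {
    double probability = 1.0;                         // cumulative path probability
    std::size_t instructionCount = 0;                 // instructions added so far
    std::unordered_set<std::uint64_t> blocksInTrace;  // start addresses already in the trace
};

enum class StopReason { None, LowProbability, Loopback, TooLong };

// Decide whether the trace should end before `nextBlock` is appended.
StopReason shouldStop(const TraceState& s, const TraceLimits& lim,
                      std::uint64_t nextBlock) {
    if (s.probability < lim.minProbability) return StopReason::LowProbability;
    if (s.blocksInTrace.count(nextBlock) != 0) return StopReason::Loopback;
    if (s.instructionCount >= lim.maxInstructions) return StopReason::TooLong;
    return StopReason::None;
}
```

Returning a reason instead of a plain boolean lets a tool treat a loopback differently from a size limit, for example by considering loop unrolling only in the loopback case.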
16 Trace Generation
[Figure: original code blocks A–F; code cache holding the trace A’, C’, F’, D’]
Step 6: Finalize trace generation
    TRACE_GenerateCode(trace);
1. Straighten control flow
  – Branch inversion, redundant branch elimination, handling of call/return inlining, and exit path fix-ups
2. Compile the trace
3. Enter the trace into the code cache
4. Patch references to this trace
  – Any other edges that refer to the same target address can all be patched to refer to the new optimized trace
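The patching step (item 4) can be pictured as a table of branch sites whose targets are rewritten. The standalone C++ below is a simplified model of that idea only: `BranchSite` and `patchReferences` are hypothetical names, and real patching rewrites branch instructions in program memory rather than fields in a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one branch in the original program.
struct BranchSite {
    std::uint64_t siteAddr;  // address of the branch instruction
    std::uint64_t target;    // address the branch currently jumps to
};

// Redirect every branch that targets `originalAddr` so that it jumps to the
// optimized trace at `traceAddr`. Returns the number of sites patched.
std::size_t patchReferences(std::vector<BranchSite>& sites,
                            std::uint64_t originalAddr,
                            std::uint64_t traceAddr) {
    std::size_t patched = 0;
    for (BranchSite& s : sites) {
        if (s.target == originalAddr) {
            s.target = traceAddr;
            ++patched;
        }
    }
    return patched;
}
```

Calling the function a second time with the same arguments patches nothing, because no site still points at the original address.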
17 Example Tool Summary
Runtime Optimization Guided Using Edges
Trace generation
  – Loop unrolling
  – Inline call and return paths
Optimizations in the future
  – Eliminate redundant branches after code layout
  – Constant propagation
  – Dead code elimination
  – Common sub-expression elimination
  – …
18 A simple trace generator using ROGUE
    VOID TraceGenerator(ADDRINT address)
    {
        // Fetch a basic block starting from an address.
        // Invoke when some threshold metric is reached.
        BBL bbl = BBL_Fetch(address);

        // Initialize a new trace with the fetched basic block.
        TRACE trace = TRACE_Alloc(bbl);

        // Use probability as a trace termination metric.
        double prob = 1.0;
        while (prob > 0.4)
        {
            EDG maxedg;
            UINT32 maxedg_cnt = 0, sumedg_cnt = 0;

            // Walk the flow graph to find the hot edge.
            for (EDG edg = BBL_EdgHead(bbl); EDG_Valid(edg); edg = EDG_Next(edg))
            {
                UINT32 edg_cnt = EdgProfilerCount(EDG_BblSrc(edg), EDG_BblDst(edg));
                if (maxedg_cnt < edg_cnt)
                {
                    maxedg = edg;
                    maxedg_cnt = edg_cnt;
                }
                sumedg_cnt += edg_cnt;
            }

            // Stop if no outgoing edge has been executed.
            if (sumedg_cnt == 0) break;

            // Add hot path instructions to the trace.
            bbl = TRACE_AddEdg(trace, bbl, maxedg);
            prob *= (double) maxedg_cnt / sumedg_cnt;
        }

        // Fix up control flow, compile, patch, and enter the trace into the cache.
        TRACE_GenerateCode(trace);
    }
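Because the ROGUE API is not available outside the framework, the same greedy hot-path selection can be tried on a plain data structure. The standalone C++ below is a sketch under stated assumptions: `Graph` and `buildTrace` are hypothetical names, edge counts are supplied directly instead of coming from a profiler, and the loopback check is an addition that the slide's generator does not include.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical profiled flow graph: block name -> outgoing (successor, count).
using Graph =
    std::map<std::string, std::vector<std::pair<std::string, std::uint32_t>>>;

// Greedy hot-path selection: repeatedly follow the most frequently taken
// outgoing edge. Stops when the cumulative path probability is no longer
// above minProb, when the current block has no executed successor, or when
// the hottest successor is already in the trace (loopback).
std::vector<std::string> buildTrace(const Graph& g, const std::string& start,
                                    double minProb) {
    std::vector<std::string> trace{start};
    std::string cur = start;
    double prob = 1.0;
    while (prob > minProb) {
        auto it = g.find(cur);
        if (it == g.end()) break;  // block has no recorded successors

        const std::string* best = nullptr;
        std::uint32_t bestCnt = 0, sum = 0;
        for (const auto& e : it->second) {
            if (e.second > bestCnt) { best = &e.first; bestCnt = e.second; }
            sum += e.second;
        }
        if (best == nullptr) break;  // no outgoing edge was ever executed

        if (std::find(trace.begin(), trace.end(), *best) != trace.end()) break;

        trace.push_back(*best);
        prob *= static_cast<double>(bestCnt) / sum;
        cur = *best;
    }
    return trace;
}
```

As with the slide's loop, the probability is updated after the block is appended, so the last block added may take the path probability below the cutoff.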
19 ROGUE Optimization Comparison: GCC 3.3.2, Opt. Level 3
[Figure: comparison chart]
20 ROGUE Optimization Comparison: Intel Compiler
[Figure: comparison chart]
21 The ROGUE Vision
[Figure: block diagram. Labels: Application; Code Cache; Observe execution behavior; Optimized Traces; Re-Optimizations. ROGUE modules: Monitor, HW. Perf. Unit, Phase Detector, Instrumentation, Trace Generator, Optimizer, Cache Manager.]
22 The ROGUE Vision (2)
Dynamic Optimizer Interface
  – Trace Generator
      Control trace generation (path, size, thresholds, …)
  – Monitor
      Register callbacks to trigger trace generation
  – Optimizer
      Provided with some standard optimizations
      Ability to write custom optimizations (add/delete instructions)
  – Cache Manager
      Placement strategies for generated traces in the code cache
      Patching of original code to use optimized code in the code cache
Dynamic Optimization Engine
  – Build a dynamic optimizer using the interface
23 ROGUE Current Status
[Figure: the block diagram from "The ROGUE Vision" with the added label Functional Modules. Labels: Application; Code Cache; Observe execution behavior; Optimized Traces; Re-Optimizations; Monitor; HW. Perf. Unit; Phase Detector; Instrumentation; Trace Generator; Optimizer; Cache Manager.]
24 ROGUE Summary
Dynamic optimization framework
  – Facilitates the construction of customizable dynamic optimizers via high-level abstraction
  – Tool for research and teaching
API (Application Programmer Interface)
  – New API to perform dynamic optimizations
  – Inherits the complete Pin 2.0 API