Published by Joel Hancock. Modified over 8 years ago.
1
Michael J. Voss and Rudolf Eigenmann, PPoPP '01 (presented by Kanad Sinha)
2
- Motivation
- General choices for adaptive optimization
- ADAPT
  - The Architecture
  - The Language
  - An example
- Results
3
- There’s only so much optimization that can be performed at compile time.
- Compilers have to generate code for generic system models, making compile-time assumptions that may be sensitive to input that is unknown until runtime.
- Convergence of technologies makes it difficult to generate a common binary that exploits individual system characteristics.
4
Possible solution? “Use of adaptive and dynamic optimization paradigms, where optimization is performed at runtime when complete system and input knowledge is available.”
5
Choose from statically generated code variants
  + Easy
  - May not result in the maximum possible optimization
  - Can result in code explosion
Parameterization
  + Single copy of source
  - May still not result in the maximum possible optimization
Dynamic compilation
  + Complete input and system knowledge: maximum optimization possible
  - Considerable runtime overhead
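The three choices above can be contrasted on one small kernel. This is only an illustrative sketch: the summation kernel, the function names, and the size threshold in `select_variant` are assumptions made for the example and do not come from the ADAPT paper.

```python
# Hypothetical kernel: summing a list, adapted in three different ways.

# 1. Statically generated code variants: several prebuilt versions,
#    one of which is selected at runtime.
def sum_plain(xs):
    total = 0
    for x in xs:
        total += x
    return total

def sum_unrolled2(xs):
    total = 0
    n = len(xs) - len(xs) % 2          # largest even prefix
    for i in range(0, n, 2):           # body unrolled by a factor of 2
        total += xs[i] + xs[i + 1]
    for x in xs[n:]:                   # leftover element, if any
        total += x
    return total

VARIANTS = {"plain": sum_plain, "unrolled2": sum_unrolled2}

def select_variant(n):
    # Assumed heuristic: only unroll when the loop is long enough.
    return VARIANTS["unrolled2" if n >= 8 else "plain"]

# 2. Parameterization: a single copy of the code whose behavior is
#    controlled by a runtime parameter (here, the block size).
def sum_blocked(xs, block):
    total = 0
    for start in range(0, len(xs), block):
        total += sum(xs[start:start + block])
    return total

# 3. Dynamic compilation: generate and compile code once the input
#    size is known, so the loop bound becomes a constant.
def compile_specialized_sum(n):
    body = " + ".join(f"xs[{i}]" for i in range(n)) or "0"
    src = f"def specialized(xs):\n    return {body}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["specialized"]
```

The trade-offs listed on the slide show up directly: the variant table grows with every combination that is prebuilt, the parameterized version stays a single copy but keeps its generic loop, and the generated version is fully specialized but pays for code generation and compilation at runtime.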
6
ADAPT: Automated De-coupled Adaptive Program Transformation
- Generic framework, which leverages existing tools
- Uses a domain-specific language, AL, by which adaptive techniques can be specified
…
7
- Supports dynamic compilation and parameterization
- Enables optimizations through “runtime sampling”
- Facilitates an iterative modification and search approach
8
Three functions of a dynamic/adaptive optimization system:
- Evaluate the effectiveness of a particular optimization for the current input and system information
- Apply the optimization if profitable
- Re-evaluate applied optimizations and tune according to current runtime conditions
10
Runtime system consists of:
- Modified version of the application
- Remote optimizer (RO)
  - has the source code
  - has a description of the target machine
  - has stand-alone tools and compilers
- Local optimizer (LO)
  - agent of the remote optimizer on the system
  - detects hot-spots
  - tracks multiple interval contexts (here, loop bounds)
  - runs in a separate thread
Optimization and execution are truly asynchronous
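One way to picture hot-spot detection with per-context tracking is a table of accumulated execution time keyed by interval and loop bounds. The sketch below is an assumption-laden illustration: the class name, the fixed time threshold, and the reporting rule are hypothetical and are not taken from ADAPT itself.

```python
from collections import defaultdict

class HotSpotDetector:
    """Hypothetical detector: flags an (interval, context) pair once its
    accumulated execution time crosses an assumed threshold."""

    def __init__(self, threshold_seconds):
        self.threshold = threshold_seconds
        self.elapsed = defaultdict(float)   # (interval, context) -> seconds
        self.reported = set()               # pairs already handed off

    def record(self, interval, loop_bounds, seconds):
        # The context is the tuple of loop bounds seen for this execution,
        # so the same loop with different bounds is tracked separately.
        key = (interval, tuple(loop_bounds))
        self.elapsed[key] += seconds
        if self.elapsed[key] >= self.threshold and key not in self.reported:
            self.reported.add(key)
            return key      # newly detected hot-spot
        return None         # not hot yet, or already reported
```

Keying on the loop bounds means a version tuned for one set of bounds is never judged against runs with different bounds.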
11
- LO invokes RO when a hot-spot is detected
- RO tunes the interval using available tools, according to user-specified heuristics
- RPC returns
- If new code is available, it is dynamically linked to the application as the new best or experimental version, depending on RO’s message
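The hand-off from local to remote optimizer can be sketched with a worker thread and a queue, so that reporting a hot-spot never blocks the application. Everything here is a stand-in: `remote_optimize` is a local function pretending to be the RPC, and the names and the returned "code" are assumptions for illustration only.

```python
import queue
import threading

def remote_optimize(interval, context):
    # Stand-in for the remote optimizer: returns which slot to fill and
    # a new version specialized for the loop bound in the context.
    bound = context[0]
    def specialized(xs):
        return sum(xs[:bound])
    return "experimental", specialized

class LocalOptimizer:
    """Hypothetical local optimizer that runs in its own thread."""

    def __init__(self):
        self.requests = queue.Queue()
        self.versions = {}   # (interval, context, slot) -> callable
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def report_hot_spot(self, interval, context):
        # Called from the application thread; returns immediately.
        self.requests.put((interval, context))

    def _run(self):
        while True:
            interval, context = self.requests.get()
            slot, code = remote_optimize(interval, context)  # "RPC returns"
            if code is not None:
                # Stand-in for dynamically linking the new version.
                self.versions[(interval, context, slot)] = code
            self.requests.task_done()
```

In use, the application would call `report_hot_spot` and keep executing its current version; the new version simply appears in `versions` once the worker has finished.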
13
Candidate code sections have 2 control flow paths:
- through the best known version
- through the experimental version
Each of these can be replaced dynamically
Flag indicates which version to execute
Monitor experimental versions of each context
- collected data is used as feedback
- if better, swap with the best known version
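The two control flow paths and the swap rule can be sketched as a small dispatcher. This is a simplified, hypothetical model: it compares single timing samples, ignores per-context tracking, and uses an injectable clock so the behavior is reproducible; none of these details are claimed to match ADAPT's implementation.

```python
import time

class AdaptiveSection:
    """Hypothetical candidate code section with two replaceable versions."""

    def __init__(self, baseline, clock=time.perf_counter):
        self.best = baseline
        self.experimental = None
        self.use_experimental = False    # flag selecting the path to execute
        self.best_time = float("inf")    # fastest time seen for the best version
        self.clock = clock

    def install_experimental(self, fn):
        # A new version arrives and is tried on the next execution.
        self.experimental = fn
        self.use_experimental = True

    def run(self, *args):
        fn = self.experimental if self.use_experimental else self.best
        start = self.clock()
        result = fn(*args)
        elapsed = self.clock() - start
        if self.use_experimental:
            # Feedback: promote the experimental version only if it was faster.
            if elapsed < self.best_time:
                self.best, self.best_time = self.experimental, elapsed
            self.experimental = None
            self.use_experimental = False
        else:
            self.best_time = min(self.best_time, elapsed)
        return result
```

A real system would average several monitored runs before swapping; one sample per version is used here only to keep the sketch short.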
14
Optimization process outside critical path/decoupled from execution
15
ADAPT Language (AL)*
Features:
- Uses an LL(1) grammar => simple parser
- Domain-specific language with C-style format
- Defines reserved words that at runtime contain useful input data and system information
* “A full description of ADAPT language is beyond the scope of this paper”, and by extension, this presentation.
17
[Figure: annotated AL example. Callouts: initialize some variables; constraints; interface to the tool to be used; this block defines the heuristic]
18
Statement | Description
constraint(compile-time constraint) | Supplies a compile-time constraint
apply_spec(condition, type, syntax[, params]) | A description of a tool or flag
collect(event list) execute; | Initiates the monitoring of an experimental code version
mark_as_best | Specifies that the code variant that would be generated under the current runtime conditions is a new best known version
end_phase | Denotes the end of an optimization phase
19
Test machines: 6-core Sun Ultra Enterprise 4000 (E4000); single-core Pentium II Linux workstation

Experiment | Result
Useless Copying: run a dynamically compiled version of the code without applying any optimization | Less than about 5% overhead; some cases show a speed-up!
Specialization: loop bounds replaced by constants holding their runtime values | Average improvement: E4000 13.6%, Pentium 2.2%
Flag Selection: experiment with various combinations of compiler flags | Average improvement: E4000 35%, Pentium 9.2%; identified some non-intuitive choices
Loop Unrolling: loops unrolled by factors that evenly divide the number of iterations of the innermost loop, up to a maximum factor of 10 | Average improvement: E4000 18%, Pentium 5%
Loop Tiling: loops deemed appropriate tiled for 1/2, 1/4, ..., 1/16 of the L2 cache size | Average improvement: E4000 13.5%, Pentium 9.8%
Parallelization: loops deemed appropriate by Polaris parallelized | Average improvement: E4000 51.8%
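The search spaces for the unrolling and tiling experiments are small enough to enumerate. The sketch below shows one plausible reading of the slide: the function names are hypothetical, a factor of 1 (no unrolling) is left out, and the tile fractions are assumed to be powers of two between 1/2 and 1/16.

```python
def unroll_factors(iterations, max_factor=10):
    """Unroll factors that evenly divide the iteration count, up to max_factor.
    Factor 1 (no unrolling) is omitted by assumption."""
    return [f for f in range(2, max_factor + 1) if iterations % f == 0]

def tile_sizes(l2_bytes, smallest_fraction=16):
    """Candidate tile sizes in bytes: 1/2, 1/4, ... of the L2 cache size,
    assuming the fractions are powers of two down to 1/smallest_fraction."""
    sizes = []
    divisor = 2
    while divisor <= smallest_fraction:
        sizes.append(l2_bytes // divisor)
        divisor *= 2
    return sizes
```

For an innermost loop of 100 iterations, `unroll_factors(100)` gives `[2, 4, 5, 10]`, so only four experimental versions would need to be generated and timed.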
20
- There’s an advantage in doing runtime optimization
- Can be applied to general-purpose programs as well
- For full-blown runtime optimization, the optimization process needs to move outside the critical path
21
if (questions("?!") == 1)
    delay();
THANK_YOU("Have a great weekend!");