Trace-based Just-in-Time Type Specialization for Dynamic Languages


1 Trace-based Just-in-Time Type Specialization for Dynamic Languages
Andreas Gal et al.
Presented by James Perretta and C Jones

2 Just-in-Time Compilation

3 Dynamic Languages: Trade-offs
Provide expressive high-level abstractions
Programmer doesn’t have to wait for code to compile
Type information unknown until runtime
Lots of runtime type checks
Hard to optimize
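As a concrete illustration (not from the slides), even a one-line JavaScript function can behave differently depending on the runtime types of its arguments, so a naive engine must re-check types on every call:

// The same '+' operator means different things for different types,
// so a generic implementation must re-check types at runtime.
function add(a, b) { return a + b; }

add(1, 2);                        // 3    (numeric addition)
add("1", 2);                      // "12" (string concatenation)
add(1, { valueOf: () => 10 });    // 11   (coercion via valueOf)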

4 Just-in-Time Compilation
Type information is present at runtime
Idea: Compile frequently-used code at runtime
Type info lets us optimize
Take advantage of profiling data
Generate native machine code
Trade runtime compilation overhead for overall faster code
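To make the idea concrete, here is a hypothetical sketch (function names invented for illustration) of the difference between a generic operation and the type-specialized code a JIT can emit after observing number arguments:

// Generic path: every call re-checks the argument types.
function addGeneric(a, b) {
  if (typeof a === "number" && typeof b === "number") return a + b;
  return String(a) + String(b);  // simplified: many more cases in reality
}

// Specialized path the JIT can emit after observing (number, number):
// one cheap guard, then what compiles down to a single native add.
function addSpecialized(a, b) {
  if (typeof a !== "number" || typeof b !== "number") {
    return addGeneric(a, b);  // guard failed: fall back to generic code
  }
  return a + b;
}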

5 Trace Specialization
Assumes most time is spent in a few tight loops
Observation: Most loops are “type stable”

function min(array) {
  let found_min = null;  // type should change at most once
  for (let item of array) {  // item most likely type-stable
    if (found_min === null || item < found_min) {
      found_min = item;
    }
  }
  return found_min;
}

6 Trace Specialization
High-level process:
Record frequent paths (traces)
Compile to native code
Perform classical optimizations
Can span multiple functions
Inlines small functions for free
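For example (an illustrative sketch, not from the paper), a trace recorded through this loop contains the executed instructions of square() inline, so the call boundary disappears from the compiled code:

function square(x) { return x * x; }

function sumOfSquares(array) {
  let total = 0;
  for (const x of array) {
    // A trace recorded here contains square()'s body directly;
    // no call remains, so inlining falls out of recording for free.
    total += square(x);
  }
  return total;
}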

7 Trace Recording

8 TraceMonkey Architecture

9 Baseline Bytecode Interpreter
function min(array) {
  let found_min = null;
  for (let item of array) {
    if (found_min === null || item < found_min) {
      found_min = item;
    }
  }
  return found_min;
}

min([1, 2, 3]);
min([4, 5, 6]);

        SetNull found_min
        GetIterator iter, array
loop:   IteratorNext item, iter, @done
        JumpStrictEq found_min, null, @assign
        JumpLt item, found_min, @assign
        Jump @loop
assign: Set found_min, item
        Jump @loop
done:   Return found_min

Interpreters based on “fat bytecodes” make for a more efficient interpreter, but leave less opportunity for optimization. This isn’t a real bytecode format, just for demonstration purposes.

10 Bytecode Traces

        SetNull found_min
        GetIterator iter, array
loop:   IteratorNext item, iter, @done
        JumpStrictEq found_min, null, @assign
        JumpLt item, found_min, @assign
        Jump @loop
assign: Set found_min, item
        Jump @loop
done:   Return found_min

The method described in the paper records traces in loops, starting when a hot threshold is reached at a loop header. (Their heuristic for a hot edge is n = 2!) In this case, we’d begin recording in the middle of the search, since the loop wouldn’t initially be hot.
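A minimal sketch of hot-loop detection in JavaScript, assuming a per-loop-header hit counter (the threshold of 2 is the paper’s heuristic; everything else is invented for illustration):

const HOT_THRESHOLD = 2;          // the paper's hot-edge heuristic
const hitCounts = new Map();      // loop-header pc -> times seen
const compiledTraces = new Map(); // loop-header pc -> compiled trace

// Called by the interpreter each time it crosses a loop header.
function onLoopHeader(pc, recordAndCompile) {
  if (compiledTraces.has(pc)) {
    return compiledTraces.get(pc);            // already compiled: run the trace
  }
  const hits = (hitCounts.get(pc) ?? 0) + 1;
  hitCounts.set(pc, hits);
  if (hits >= HOT_THRESHOLD) {
    compiledTraces.set(pc, recordAndCompile(pc)); // loop is hot: record now
    return compiledTraces.get(pc);
  }
  return null;                                // stay in the interpreter
}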

11 Record Traces

        SetNull found_min
        GetIterator iter, array
loop:   IteratorNext item, iter, @done
        JumpStrictEq found_min, null, @assign
        JumpLt item, found_min, @assign
        Jump @loop
assign: Set found_min, item
        Jump @loop
done:   Return found_min

loop: (found_min: int, array: array)
        Guard type(array) is array
        ArrayNext item, iter        ; exits trace when the iterator is done
        Guard type(found_min) is int
        Guard type(item) is int
        JumpLt item, found_min      ; side exit if taken

This recorded trace is specialized to the types that were present when it was recorded. Notice that we can specialize the IteratorNext instruction into an array-specific instruction that’s faster. Other similar high-impact optimizations include “object shapes” avoiding property lookups, etc. Here the recorded trace is shown as similar fat bytecode; the actual implementation in the paper uses a lower-level SSA bytecode for optimization.
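Conceptually, the compiled trace behaves like the JavaScript below (a sketch only; the real trace is native code, and sideExit is a hypothetical stand-in for bailing back to the interpreter):

// Stand-in for bailing out of the trace back to the interpreter.
function sideExit(state) { state.inTrace = false; return state; }

// Type-specialized trace for the (found_min: int, array: array) entry map.
function runTrace(state) {
  while (true) {
    if (!Array.isArray(state.array)) return sideExit(state);          // Guard type(array)
    if (state.i >= state.array.length) return sideExit(state);        // iterator done
    const item = state.array[state.i++];                              // ArrayNext
    if (typeof state.found_min !== "number") return sideExit(state);  // Guard type(found_min)
    if (typeof item !== "number") return sideExit(state);             // Guard type(item)
    if (item < state.found_min) return sideExit(state);               // cold branch: side exit
    // all guards held: loop back to the trace header
  }
}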

12 Record Trace Trees

        SetNull found_min
        GetIterator iter, array
loop:   IteratorNext item, iter, @done
        JumpStrictEq found_min, null, @assign
        JumpLt item, found_min, @assign
        Jump @loop
assign: Set found_min, item
        Jump @loop
done:   Return found_min

loop: (found_min: int, array: array)
        Guard type(array) is array
        ArrayNext item, iter        ; exits trace when the iterator is done
        Guard type(found_min) is int
        Guard type(item) is int
        JumpLt item, found_min, @loop2

loop2: (found_min: int, array: array)
        Set found_min, item

The trace contains no control-flow joins, but it can contain splits. Here, when the other side-exit branch of the if becomes hot, we record another trace that splits off the first one, constructing a trace tree.
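Extending the previous sketch: once the assign path becomes hot, the side exit is replaced by the attached branch trace, so both paths stay in compiled code (hypothetical code, not the paper’s implementation):

function sideExit(state) { state.inTrace = false; return state; }

function runTraceTree(state) {
  while (true) {                                                      // loop: trace anchor
    if (!Array.isArray(state.array)) return sideExit(state);
    if (state.i >= state.array.length) return sideExit(state);
    const item = state.array[state.i++];
    if (typeof state.found_min !== "number") return sideExit(state);
    if (typeof item !== "number") return sideExit(state);
    if (item < state.found_min) {
      state.found_min = item;   // loop2: the recorded branch trace
    }
    // both branches rejoin at the tree's anchor (the loop header)
  }
}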

13 Record Multiple Type Combinations

        SetNull found_min
        GetIterator iter, array
loop:   IteratorNext item, iter, @done
        JumpStrictEq found_min, null, @assign
        JumpLt item, found_min, @assign
        Jump @loop
assign: Set found_min, item
        Jump @loop
done:   Return found_min

loop: (found_min: null, array: array)
        Guard type(array) is array
        ArrayNext item, iter        ; exits trace when the iterator is done
        Guard found_min is null
        Set found_min, item

loop: (found_min: int, array: array)
        Guard type(array) is array
        ArrayNext item, iter
        Guard type(found_min) is int
        Guard type(item) is int
        JumpLt item, found_min, @loop2

loop2: (found_min: int, array: array)
        Set found_min, item

The first pass through the loop might get recorded if the function is called again: now that the loop is hot, we can record it with a different type combination, with found_min initially null. In this case, the effect is emergently similar to loop peeling.
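At run time, the loop header can then dispatch on the current types to whichever recorded trace matches. A hypothetical sketch of that type-map lookup (the traces map and key format are invented for illustration):

// Map from an entry type signature to a compiled trace for this loop.
const traces = new Map();  // e.g. "null,array" -> trace, "number,array" -> trace

function typeKey(state) {
  const t = state.found_min === null ? "null" : typeof state.found_min;
  return `${t},${Array.isArray(state.array) ? "array" : "other"}`;
}

function enterLoop(state, interpret) {
  const trace = traces.get(typeKey(state));
  if (trace) return trace(state);  // matching type map: run specialized code
  return interpret(state);         // no match: interpret (and maybe record)
}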

14 Trace Forests
Try to connect traces as we generate them
Lets groups of traces execute w/o exiting to the interpreter
Loops in (a) and (c) eventually stabilize
(b) doesn’t stabilize, but still stays within the trace group

15 Nested Trace Trees

16 Nested Loops: Naive Solution
Treat loop branches the same as other branches
Inner loops become hot first
When the inner loop exits, it tries to record a branch trace
The inner loop trace contains outer loop code!
Requires a copy of the outer loop for every side exit

17 Nested Loops: Better Solution
Record separate traces for outer and inner loops
Stop extending the inner loop trace when the outer loop is reached
The outer loop header starts its own trace
When the outer loop reaches the inner loop, try to call the inner loop trace
Builds a nested trace tree, as sketched below
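A sketch of the resulting shape in JavaScript (an invented example; think of each function as a compiled trace tree): the outer trace calls the inner tree as a unit instead of unrolling its iterations into its own trace:

// The inner loop (i2) gets its own trace tree (t2)...
function innerTree(state) {
  while (state.j < state.row.length) {
    state.sum += state.row[state.j];
    state.j++;
  }
  return state;  // returns to the caller at its exit guard
}

// ...and the outer loop's trace (t1) calls it like a subroutine.
function outerTrace(state) {
  while (state.i < state.matrix.length) {
    state.row = state.matrix[state.i];
    state.j = 0;
    innerTree(state);  // call into the nested tree
    state.i++;
  }
  return state;
}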

18 Nested Trace Tree
Inner tree (t2) captures inner loop (i2)
Outer loop trace (t1) calls inner tree (t2)
Inner tree returns to the outer tree at an exit guard

19 Multiple Nested Loops
Outer tree (t1) calls two nested trees (t2 and t4)
Nested trees have guards for side exits

20 Evaluation

21 SunSpider Benchmarks
Speedup vs. a baseline interpreter (SpiderMonkey)
Almost always faster than the baseline
Slower than other optimizing compilers on cases that don’t exercise loops as heavily, since time is wasted on recording

22 Performs best on programs that do bit manipulation
Does not trace recursion (see controlflow-recursive)
Does not trace eval (see date-format-xparb), certain functions implemented in C, or regex operations
Less effective for very small nested loops and heavy use of builtins

23 Complexities and Limitations
Needs to reconstruct call frames
Transfer locals back to interpreter activation records
Different traces have different activation-record layouts; use trace stitching to optimize

The runtime keeps track of what calls were made so that it can reconstruct the call stack on a side exit, since execution is leaving an inlined path for an out-of-line one. When returning to the interpreter, all of the live variables must be copied out of registers and back into place. Similarly, copy operations may be needed when directly calling into other traces whose register and stack allocations don’t match up; trace stitching can re-compile those transitions to reduce the copies.
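A sketch of the bookkeeping a side exit implies, with invented data structures (the paper’s implementation writes native registers and stack slots back into interpreter activation records):

// Hypothetical side-exit handler: rebuild the interpreter's view of
// the world from the trace's native state before resuming.
function takeSideExit(exitInfo, nativeRegs, interp) {
  // Recreate one activation record per call the trace had inlined,
  // so the interpreter sees a consistent call stack.
  for (const frame of exitInfo.inlinedFrames) {
    interp.frames.push({ fn: frame.fn, returnPc: frame.returnPc });
  }
  // Copy each live value out of its register/stack location into the
  // interpreter's variable slots, using this exit's own layout map.
  for (const [slot, reg] of exitInfo.layout) {
    interp.slots[slot] = nativeRegs[reg];
  }
  interp.pc = exitInfo.resumePc;  // interpretation continues here
}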

24 Complexities and Limitations
Abort on un-recordable operations (e.g. recursion, exceptions)
Blacklist aborted traces
Foreign function interface (FFI) calls
Trace specialization is currently only used in LuaJIT

The authors claim exceptions are not very common in JS programming. Blacklisting avoids wasting time re-recording things that we know won’t work. Traces don’t update interpreter state until they exit, so globals and the call stack might be out of date; interpreter state must be re-materialized to interact with outside code such as FFI calls. People used to think that trace specialization would win, but the complexities outweigh the performance benefits.
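A minimal sketch of blacklisting, assuming a per-loop-header abort counter (the threshold and names are invented for illustration):

const abortCounts = new Map();  // loop-header pc -> failed recording attempts
const blacklist = new Set();    // loop headers we will never try to record again
const MAX_ABORTS = 2;           // assumed threshold, for illustration

// Called when trace recording hits something un-recordable and aborts.
function onRecordingAbort(pc) {
  const n = (abortCounts.get(pc) ?? 0) + 1;
  abortCounts.set(pc, n);
  if (n >= MAX_ABORTS) blacklist.add(pc);  // give up: interpret this loop only
}

function shouldTryRecording(pc) {
  return !blacklist.has(pc);
}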

25 Thank You!

