Lecture 9 Dynamic Compilation


Lecture 9: Dynamic Compilation
- Motivation & Background
- Overview
- Compilation Policy
- Partial Method Compilation
- Partial Dead Code Elimination
- Escape Analysis
- Results
Based on "Partial Method Compilation Using Dynamic Profile Information", John Whaley, OOPSLA '01.

I. Goals of This Lecture
- Beyond static compilation
- Example of a complete system
- Use of data-flow techniques in a new context
- Experimental approach

Static/Dynamic
- Compiler: high-level → binary, static
- Interpreter: high-level, emulated, dynamic (example: Matlab)
- Dynamic compilation: high-level → binary, dynamic
  - machine independence, dynamic loading, cross-module optimization
  - specialize the program using runtime information (without a separate profiling run)

High-Level/Binary
Binary translator: binary → binary; mostly dynamic
- Runs code "as-is"
- Software migration (x86 → Alpha, Sun, Transmeta; 68000 → PowerPC → x86)
- Virtualization (make hardware virtualizable)
- Dynamic optimization (DynamoRIO)
- Security (execute only out of a code cache that is "protected")

Closed-world vs. Open-world
- Closed-world assumption (most static compilers): all code is available a priori for analysis and compilation.
- Open-world assumption (most dynamic compilers): not all code is available; arbitrary code can be loaded at run time.
The open-world assumption precludes many optimization opportunities. Solution: optimistically assume the best case, but provide a way out if the assumption is violated.
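One common form of that "way out" is to record the assumptions compiled code depends on and invalidate the code when newly loaded classes break them. A minimal sketch, with hypothetical names (not the paper's implementation):

    import java.util.*;

    // Compiled code registers the assumptions it was built on; when a newly
    // loaded class breaks an assumption, the affected methods are discarded
    // and execution falls back to the interpreter (or is recompiled).
    class AssumptionRegistry {
        // assumption key (e.g. "Shape has no subclasses") -> dependent compiled methods
        private final Map<String, Set<String>> dependents = new HashMap<>();

        void recordAssumption(String assumption, String compiledMethod) {
            dependents.computeIfAbsent(assumption, k -> new HashSet<>())
                      .add(compiledMethod);
        }

        // Called by the class loader when new code invalidates an assumption.
        Set<String> invalidate(String brokenAssumption) {
            Set<String> toDeoptimize = dependents.remove(brokenAssumption);
            return toDeoptimize == null ? Collections.emptySet() : toDeoptimize;
        }
    }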

II. Overview of Dynamic Compilation
- Interpretation/compilation policy decisions: choosing what to compile, and how
- Collecting runtime information: instrumentation, sampling
- Exploiting runtime information: e.g., frequently executed code paths

Speculative Inlining
Virtual call sites are deadly:
- they kill optimization opportunities,
- virtual dispatch is expensive on modern CPUs,
- and they are very common in object-oriented code.
Speculatively inline the most likely call target, chosen from the class hierarchy or from profile information. Many virtual call sites have only one target, so this technique is very effective in practice.
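As an illustration (the Shape and Circle classes are made up; a real JIT performs this on its intermediate representation, behind a guard so it can fall back if the speculation fails):

    // What a speculatively inlined virtual call site amounts to at source level.
    abstract class Shape { abstract double area(); }

    final class Circle extends Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        double area() { return Math.PI * radius * radius; }
    }

    class SpeculativeInlineDemo {
        static double areaOf(Shape shape) {
            if (shape.getClass() == Circle.class) {      // cheap class-identity guard
                Circle c = (Circle) shape;
                return Math.PI * c.radius * c.radius;    // inlined body of Circle.area()
            }
            return shape.area();                         // slow path: normal virtual dispatch
        }
    }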

III. Compilation Policy
∆Ttotal = Tcompile − (nexecutions × Timprovement)
If ∆Ttotal is negative, the decision to compile was effective. We can try to:
- reduce Tcompile (faster compile times),
- increase Timprovement (generate better code),
- focus on code with large nexecutions (compile the hot spots).
80/20 rule (Pareto principle): 20% of the work yields 80% of the benefit.
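For a concrete feel (numbers invented for illustration): if compiling a method costs 10 ms and the compiled code saves 5 µs per execution, ∆Ttotal only becomes negative after 2,000 executions. The break-even calculation is trivial:

    // Break-even point for the compilation policy formula above.
    // deltaT = tCompile - nExecutions * tImprovement  (all times in microseconds)
    class BreakEven {
        public static void main(String[] args) {
            double tCompileUs = 10_000;   // hypothetical: 10 ms to compile
            double tImprovementUs = 5;    // hypothetical: 5 us saved per execution
            double breakEven = tCompileUs / tImprovementUs;
            System.out.println("Compilation pays off after " + breakEven + " executions");
        }
    }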

Latency vs. Throughput
Tradeoff: startup speed vs. execution performance.

                          Startup speed    Execution performance
    Interpreter           Best             Poor
    'Quick' compiler      Fair             Fair
    Optimizing compiler   Poor             Best

Multi-Stage Dynamic Compilation System
- Stage 1: interpreted code
- Stage 2: compiled code, entered when the execution count reaches t1 (e.g. 2000)
- Stage 3: fully optimized code, entered when the execution count reaches t2 (e.g. 25000)
The execution count is the sum of method invocations and back edges executed.
Our dynamic compilation system uses three stages. In Stage 1, all methods are interpreted. When the execution count for a method exceeds a threshold t1, we transition to Stage 2, a compiled version. When the execution count hits another threshold t2, we transition to Stage 3, a fully optimized version of the method.
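A minimal sketch of this staging logic, using the thresholds from the slide (the field names and update hook are hypothetical):

    // Three-stage policy: interpret, compile at t1, fully optimize at t2.
    // The counter is incremented on method invocations and back edges.
    class StagedMethod {
        static final int T1 = 2_000;    // interpreted -> compiled
        static final int T2 = 25_000;   // compiled    -> fully optimized
        int executionCount = 0;
        int stage = 1;

        void onInvocationOrBackEdge() {
            executionCount++;
            if (stage == 1 && executionCount >= T1) {
                stage = 2;   // hand the method to the quick compiler
            } else if (stage == 2 && executionCount >= T2) {
                stage = 3;   // recompile with the optimizing compiler
            }
        }
    }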

Granularity of Compilation
- Compilation takes time proportional to the amount of code being compiled, and many optimizations are not even linear in code size.
- Methods can be large, especially after inlining, and cutting inlining too much hurts performance considerably.
- Even "hot" methods typically contain some code that is rarely or never executed.
All of the previous techniques compile a method at a time, which leaves little choice in the matter: even when we are frugal about inlining, methods can still become very large.

Example: SpecJVM db

    void read_db(String fn) {
        int n = 0, act = 0, b;
        byte buffer[] = null;
        try {
            FileInputStream sif = new FileInputStream(fn);
            buffer = new byte[n];
            while ((b = sif.read(buffer, act, n - act)) > 0) {   // hot loop
                act = act + b;
            }
            sif.close();
            if (act != n) {
                /* lots of error handling code; rare */
            }
        } catch (IOException ioe) {
            /* error handling code; rare */
        }
    }

Here is an example from the SpecJVM db benchmark. This method contains a hot loop, and that loop contains a method call. However...

Example: SpecJVM db (continued)
The same method also contains lots of rare code: the error handling is rarely or never executed, and the read method called in the inner loop itself contains hot and not-hot portions.

Optimize hot "regions", not methods
- Optimize only the most frequently executed segments within a method.
- Simple technique: any basic block executed during Stage 2 is considered hot.
- Beneficial secondary effect: optimization opportunities improve on the common paths.
The regions that are important to compile have little to do with method boundaries. At method granularity, the compiler wastes a lot of time compiling and optimizing large pieces of code that do not matter.
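A sketch of how the "executed during Stage 2" coverage could be collected (class and method names are hypothetical; the real system gathers this with its own instrumentation in the Stage 2 code):

    import java.util.BitSet;

    // Per-method coverage collected by the Stage 2 (quick-compiled) code:
    // each basic block that runs at least once sets its bit. When the method
    // is recompiled at Stage 3, blocks whose bit is still clear are "rare".
    class BlockCoverage {
        private final BitSet executed;

        BlockCoverage(int numBasicBlocks) { executed = new BitSet(numBasicBlocks); }

        // Instrumentation stub emitted at the entry of basic block `id`.
        void blockExecuted(int id) { executed.set(id); }

        boolean isRare(int id) { return !executed.get(id); }
    }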

Method-at-a-time Strategy
[Graph: % of basic blocks compiled vs. execution threshold, per benchmark]
Let's compare a method-at-a-time strategy with an optimal strategy. This graph shows the percentage of basic blocks compiled by a method-at-a-time strategy at different compilation threshold values. The vertical axis gives the percentage of basic blocks compiled; the horizontal axis gives the execution threshold, on a log scale. At threshold 1, all basic blocks in all methods that have ever been executed are compiled, so the value is 100%. At threshold 5000, the percentage ranges from about 3% to about 75%, depending on the application. (Check is an anomaly: it exists only to check the correctness of the JVM and is not indicative of real programs. Mtrt is a ray tracer with a tight innermost loop. Javac is representative of real code.)

Actual Basic Blocks Executed
[Graph: % of basic blocks actually executed vs. execution threshold, per benchmark]
Now consider how many basic blocks are actually executed at different threshold values. Even at threshold 1, on average only slightly more than half of the basic blocks are ever executed, and the percentage continues to drop as the threshold increases. This shows that a large amount of code is rare, even in hot methods. Because the horizontal axis is a log scale, the descent looks gentle; on a linear scale the drop-off would be much sharper.

Dynamic Code Transformations
- Compiling partial methods
- Partial dead code elimination
- Escape analysis

IV. Partial Method Compilation
Step 1: Based on profile data, determine the set of rare blocks, using the code coverage information from the first compiled version (Stage 2). Any basic block that is never executed in Stage 2 is considered rare.

Partial Method Compilation
Step 2: Before any code transformations or optimizations, perform live variable analysis and determine the set of live variables at each rare-block entry point (e.g., live: x, y, z).

Partial Method Compilation
Step 3: Redirect the control flow edges that targeted rare blocks so that they instead transfer to the interpreter, then remove the rare blocks as unreachable code.

Partial Method Compilation
Step 4: Perform compilation normally. All analyses and optimizations simply treat the interpreter transfer point as an unanalyzable method call. One of the most attractive aspects of this technique is that it requires no changes to the existing optimizations and analyses.
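Written at the source level, the result looks roughly like the sketch below. VM.transferToInterpreter is a hypothetical stand-in for the real transfer mechanism, shown only to make the "unanalyzable call" treatment concrete:

    class VM {
        // In a real system this call never returns to compiled code; execution
        // resumes in the interpreter at the matching bytecode, using the recorded map.
        static void transferToInterpreter(int transferPointId) {
            throw new UnsupportedOperationException("resume in the interpreter");
        }
    }

    class ReadDbCompiled {
        // The rare error-handling block from read_db is gone; its entry edge now
        // reaches the transfer point. Every analysis treats this call as an
        // ordinary, unanalyzable method call.
        static void checkLength(int act, int n) {
            if (act != n) {
                VM.transferToInterpreter(0);   // rare path: leave compiled code
            }
            // common path continues here
        }
    }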

Partial Method Compilation
Step 5: During code generation, record a map for each interpreter transfer point. The map specifies the location, in registers or memory, of each live variable (e.g., for live x, y, z: x at sp - 4, y in R1, z at sp - 8). These maps are typically less than 100 bytes each.
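A hedged sketch of what one such map might record for the x, y, z example from the slide (the Location encoding and class names are invented):

    import java.util.LinkedHashMap;
    import java.util.Map;

    class TransferPointMap {
        // Hypothetical location encoding: a register name, or a stack slot
        // relative to sp when register is null.
        static final class Location {
            final String register;
            final int stackOffset;
            Location(String register, int stackOffset) {
                this.register = register;
                this.stackOffset = stackOffset;
            }
        }

        // One entry per live variable at this interpreter transfer point.
        final Map<String, Location> liveVariables = new LinkedHashMap<>();

        static TransferPointMap exampleFromSlide() {
            TransferPointMap m = new TransferPointMap();
            m.liveVariables.put("x", new Location(null, -4));   // x: sp - 4
            m.liveVariables.put("y", new Location("R1", 0));    // y: R1
            m.liveVariables.put("z", new Location(null, -8));   // z: sp - 8
            return m;
        }
    }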

V. Partial Dead Code Elimination
Move computation that is live only on a rare path into the rare block, saving that computation in the common case. This is a slight modification to the existing dead code elimination algorithm: rare blocks are treated specially.

Partial Dead Code Example

    Before:
        x = 0;
        ...
        if (rare branch 1) {
            ...
            z = x + y;
        }
        if (rare branch 2) {
            a = x + z;
        }

    After:
        ...
        if (rare branch 1) {
            x = 0;
            ...
            z = x + y;
        }
        if (rare branch 2) {
            x = 0;
            a = x + z;
        }

Here is a really simple example: x is defined on the main path but used only on rare paths, so the algorithm copies the defining instruction into the rare blocks that use it, removing it from the common path.

VI. Escape Analysis
Escape analysis finds objects that do not escape a method or a thread.
- "Captured" by a method: the object can be allocated on the stack or in registers.
- "Captured" by a thread: synchronization operations on the object can be eliminated.
All Java objects are normally heap allocated, so this is a big win. The pointer and escape analysis was also modified to take rare-block information into account: treating entry to a rare path as an opaque method call is a conservative assumption, since such a call could perform arbitrary operations, far more than the original code. Typically this does not matter, because there are no merges back from the rare path.

Escape Analysis
- Stack-allocate objects that do not escape in the common blocks.
- Eliminate synchronization on objects that do not escape the common blocks.
If a branch to a rare block is taken:
- Copy stack-allocated objects to the heap and update pointers to them.
- Reapply the eliminated synchronizations.
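A small Java example (not from the paper or the benchmarks) of the two cases: the Point in distSquared never escapes the method, so it can be stack-allocated or scalar-replaced; the StringBuffer in label never escapes its thread, so its internal synchronization can be eliminated.

    class Point {
        double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    class EscapeDemo {
        // 'p' is captured by the method: it is never stored in a field,
        // passed out, or returned, so it need not be heap-allocated.
        static double distSquared(double x, double y) {
            Point p = new Point(x, y);
            return p.x * p.x + p.y * p.y;
        }

        // The StringBuffer is captured by the thread: no other thread can
        // ever see it, so its synchronized methods need no actual locking.
        static String label(int i) {
            StringBuffer sb = new StringBuffer();
            sb.append("item ").append(i);
            return sb.toString();
        }
    }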

VII. Run Time Improvement
[Graph: run time per benchmark; first bar: original (whole-method optimization), second bar: partial method compilation (PMC), third bar: PMC + optimizations, bottom bar: execution time if the code were compiled/optimized from the beginning]
This graph shows the run time improvement from the technique. Each benchmark has three bars: the original run time, normalized to 100%; the run time with partial method compilation; and the run time with partial method compilation plus the two optimizations described above. Partial method compilation by itself improves run times by an average of 5%; adding the two optimizations improves them further.