1 Java Bytecode Optimization Optimizing Java Bytecode for Embedded Systems Stefan Hepp.

Presentation transcript:

1 Java Bytecode Optimization Optimizing Java Bytecode for Embedded Systems Stefan Hepp

2 Overview ■ Toolchain JOP, JIT vs. ahead-of-time compilation ■ Existing open source tools ■ JOPtimizer framework and code representations ■ Inlining ■ Results

3 Toolchain Overview ■ Source code is compiled with javac to Java bytecode ■ Optimization is deferred to the JVM, which uses profiling information and a JIT compiler ■ Not feasible on embedded processors like JOP

4 Toolchain Overview ■ Ahead-of-time optimization needed ■ Optimization of bytecode for the target platform ■ Output is Java bytecode ■ Profiling vs. static WCET

5 Toolchain Overview ■ Advantages over JIT ▪ Run time of the optimizer is not critical ▪ No warm-up phase to gather profiling information and to do JIT compilation ■ Disadvantages ▪ Less accurate or no profiling information available at design time; class hierarchy may change dynamically ▪ Target platform must be known

6 Existing Tools ■ Soot framework looks promising, but not designed for embedded systems and very complex ■ Other open source tools usually only remove unused methods and obfuscate code

7 JOPtimizer ■ JOPtimizer: a new framework for optimizations ▪ Intermediate code representations ▪ Inlining which respects method size restrictions

8 Assumptions ■ Assumptions about embedded applications ▪ No dynamic class loading or class modifications at runtime ▪ Reflection is not used ▪ All class files are available at compile time (except native classes) ■ Allows more optimizations (but the assumptions can be disabled) ■ Exclude “library” code (like java.*) ▪ Define: library classes must not extend or reference application classes

9 Java Class Files ■ A class consists of: ▪ Constant pool (CP): indexed table of constants (numbers, strings, class names, method names, signatures, ...) ▪ Class name, superclass, interfaces (references into the CP) ▪ Fields, methods: name, signature, flags ▪ Method code as an attribute of methods ▪ Stack architecture with variable-length encoding ■ Parsing and compiling of class files is done by existing libraries (BCEL, ASM, ...)
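
To make the class file layout concrete, here is a minimal sketch that reads only the fixed-size header in front of the constant pool. The field order (magic, minor version, major version, constant pool count) follows the JVM class file format. The class name `ClassHeaderSketch` and its method are hypothetical, and a real tool would leave the parsing to BCEL or ASM as the slide says.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of JOPtimizer, BCEL or ASM.
final class ClassHeaderSketch {

    /** Returns {minor_version, major_version, constant_pool_count}. */
    static int[] readHeader(byte[] classBytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();              // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();    // u2 minor_version
            int major = in.readUnsignedShort();    // u2 major_version
            int cpCount = in.readUnsignedShort();  // u2 constant_pool_count
            return new int[] { minor, major, cpCount };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-made header bytes for illustration, not a complete class file.
        byte[] demo = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                        0, 0, 0, 50, 0, 16 };
        int[] h = readHeader(demo);
        System.out.println("major=" + h[1] + " cpCount=" + h[2]);
    }
}
```

Valid constant pool indices run from 1 to the count minus one, which is why instructions such as GETFIELD #4 refer to entries by index.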

10 The JVM Instruction Set ■ (Partially) typed stack instructions ■ 32-bit (int, float, reference, byte, short, ...) and 64-bit (long, double) variables ■ Exception handling, synchronization, subroutines ■ Stack and variable table entries are always 32 bits wide ■ No indirect jumps; stack size must be static

Java source:

    private Map m;
    private void test(int i) {
        int j[] = new int[2];
        float a = 2.0f;
        j[0] = i * (int) a;
        m.put(this, j);
    }

Bytecode:

    private test(I)V
        ICONST_2
        NEWARRAY T_INT
        ASTORE 2
        FCONST_2
        FSTORE 3
        ALOAD 2
        ICONST_0
        ILOAD 1
        FLOAD 3
        F2I
        IMUL
        IASTORE
        ALOAD 0
        GETFIELD #4
        ALOAD 0
        ALOAD 2
        INVOKEINTERFACE #7
        POP
        RETURN

11 Stackcode Representation ■ Internal representation (“stackcode”) ▪ Types and constant values are parameters of instructions, which reduces the number of different instructions (~40 stackcode instructions) ▪ Stack emulation determines operand types for all instructions (swap, dup, ...) ▪ Variables and types instead of 32-bit slots ▪ Constant values instead of references into the CP ▪ Basic blocks are also split at exception handler ranges ■ Still a stack architecture ■ Stackcode can be mapped directly to bytecode (allows analysis of code size and execution time)

12 Quadcode Representation ■ The stack creates implicit dependencies between instructions and blocks, which makes optimizations more complex ■ Quadruple form of code (“quadcode”) ▪ Create a local variable per stack slot; emulate the stack to determine the arguments of instructions ▪ Instructions take types and constants as parameters ▪ Instructions that manipulate the stack are either not needed (pop) or replaced with copy instructions (load, swap, dup, ...)

13 Quadcode Representation ■ The quadcode representation enables simpler implementation of optimizations, but the code cannot be mapped to bytecode directly ■ Stackcode and quadcode are similar to Soot's internal representations (Baf, Jimple, Shimple)

public int calc(int a, int b) {
    // quadcode                          // stackcode                 // bytecode
    copy.ref s0, l0                      // load.ref l0               // aload_0
    getfield.'Test.fField' s0, s0        // getfield 'Test.fField'    // getfield #3
    copy.float s1, 2.0f                  // push.float 2.0f           // fconst_2
    binop.float.div s0, s0, s1           // binop.float.div           // fdiv
    copy.float l3, s0                    // store.float l3            // fstore_3
    copy.int s0, l1                      // load.int l1               // iload_1
    return.int s0                        // return.int                // ireturn
}
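
The stack emulation behind the calc listing can be sketched as follows. This is an illustrative reconstruction, not JOPtimizer's code: the class `QuadcodeSketch`, the string-based instruction format, and the small set of supported instructions are assumptions made for the example. Stack slot i is named s<i>, as in the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: converts straight-line stackcode to quadcode
// by tracking the operand stack depth.
final class QuadcodeSketch {

    static List<String> toQuadcode(List<String> stackcode) {
        List<String> quads = new ArrayList<>();
        int depth = 0; // emulated stack depth = index of the next free slot
        for (String insn : stackcode) {
            String[] p = insn.split(" ");
            String op = p[0];
            if (op.startsWith("load.") || op.startsWith("push.")) {
                // A push becomes a copy into the new top-of-stack variable.
                String type = op.substring(op.indexOf('.') + 1);
                quads.add("copy." + type + " s" + depth + ", " + p[1]);
                depth++;
            } else if (op.startsWith("store.")) {
                // A store becomes a copy out of the old top-of-stack variable.
                depth--;
                quads.add("copy." + op.substring(6) + " " + p[1] + ", s" + depth);
            } else if (op.startsWith("binop.")) {
                // Pops two operands and pushes one result into the lower slot.
                depth -= 2;
                quads.add(op + " s" + depth + ", s" + depth + ", s" + (depth + 1));
                depth++;
            } else if (op.equals("getfield")) {
                // Pops the object reference, pushes the field value: same slot.
                int top = depth - 1;
                quads.add("getfield." + p[1] + " s" + top + ", s" + top);
            } else if (op.equals("pop")) {
                depth--; // no quadcode instruction is needed
            } else if (op.startsWith("return.")) {
                depth--;
                quads.add(op + " s" + depth);
            } else {
                throw new IllegalArgumentException("not handled in this sketch: " + insn);
            }
        }
        return quads;
    }

    public static void main(String[] args) {
        List<String> stackcode = Arrays.asList(
            "load.ref l0", "getfield 'Test.fField'", "push.float 2.0f",
            "binop.float.div", "store.float l3", "load.int l1", "return.int");
        toQuadcode(stackcode).forEach(System.out::println);
    }
}
```

The sketch handles a single basic block only. Across blocks, the stack depth at every block entry would also have to be known.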

14 Creation of Bytecode ■ Transformation back from quadcode to bytecode ▪ Create complete expressions from instructions (“decompile” the code), then compile the expression trees to JVM instructions as javac does (Soot does this with Grimp) ▪ Create the stack form of the quadruple instructions, then compile to bytecode (JOPtimizer does this; optional in Soot) ▪ Per quadcode instruction: load parameters onto the stack, execute the operation, and store the result back ■ Load/store elimination and local variable allocation for stackcode are needed before bytecode can be created ■ Soot's decompilation method gets slightly better results
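
A rough sketch of the per-instruction translation, assuming the same string format as the calc listing and handling only copy and binop quadruples. The class `StackFormSketch` and the peephole pass are illustrative assumptions. The pass is deliberately simple and only removes an adjacent store/load pair of a variable that is not loaded again later, which is safe for straight-line code. It is not the load/store elimination used by JOPtimizer or Soot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: naive quadcode-to-stackcode translation plus a
// minimal peephole cleanup for straight-line code.
final class StackFormSketch {

    /** Per quadruple: load the sources, run the operation, store the result. */
    static List<String> toStackcode(List<String> quads) {
        List<String> out = new ArrayList<>();
        for (String q : quads) {
            int sp = q.indexOf(' ');
            String op = q.substring(0, sp);
            if (!op.startsWith("copy.") && !op.startsWith("binop.")) {
                throw new IllegalArgumentException("not handled in this sketch: " + q);
            }
            String type = op.split("\\.")[1];
            String[] operands = q.substring(sp + 1).split(", ");
            for (int i = 1; i < operands.length; i++) {   // operands[0] is the destination
                boolean isVar = operands[i].matches("[sl]\\d+");
                out.add((isVar ? "load." : "push.") + type + " " + operands[i]);
            }
            if (op.startsWith("binop.")) {
                out.add(op);
            }
            out.add("store." + type + " " + operands[0]);
        }
        return out;
    }

    /** Removes "store.T x" directly followed by "load.T x" if x is never loaded again. */
    static List<String> eliminate(List<String> code) {
        List<String> out = new ArrayList<>(code);
        int i = 0;
        while (i + 1 < out.size()) {
            String a = out.get(i);
            String b = out.get(i + 1);
            if (a.startsWith("store.") && b.equals("load." + a.substring(6))
                    && !loadedLater(out, i + 2, a.substring(a.lastIndexOf(' ') + 1))) {
                out.remove(i + 1);
                out.remove(i);
                if (i > 0) i--;   // a new adjacent pair may have formed
            } else {
                i++;
            }
        }
        return out;
    }

    private static boolean loadedLater(List<String> code, int from, String var) {
        for (int i = from; i < code.size(); i++) {
            if (code.get(i).startsWith("load.") && code.get(i).endsWith(" " + var)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> quads = Arrays.asList("copy.float s0, l1", "copy.float s1, 2.0f",
            "binop.float.div s0, s0, s1", "copy.float l3, s0");
        eliminate(toStackcode(quads)).forEach(System.out::println);
    }
}
```

Even after the peephole pass, values still travel through s0 and s1 instead of staying on the stack, which illustrates why a proper load/store elimination is needed before the result is competitive with the original bytecode.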

15 Inlining ■ Invocations are expensive on JOP ■ Inline methods to eliminate invocation overhead ■ Inlining is not always possible ▪ Callee code restrictions ▪ Code size and variable table size restrictions of JOP ■ Inlining comes at a price ▪ Caller code size increases, which makes a caller cache miss more costly ▪ Overall program size increases if the callee is not removed (e.g. because it is called somewhere else)

16 Inlining Methods 1. Traverse the call graph bottom-up (leaves first) 2. Find and devirtualize invocations ▪ static, final, and private invocations are not virtual ▪ Check the class hierarchy for overriding methods 3. Check whether inlining is admissible 4. Estimate the gain 5. Replace the invocation with a copy of the callee ▪ Insert a null-pointer check for the callee class reference ▪ Map the callee's local variables above the caller's variables
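
Steps 1 and 2 above can be sketched as a post-order traversal of the call graph plus a class hierarchy check. The class `CallGraphSketch`, its method names, and the map-based graph encoding are assumptions for illustration. The hierarchy check is a plain "does any subclass redeclare the method" test, not the type analysis a production tool would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of bottom-up call graph traversal and devirtualization.
final class CallGraphSketch {

    /** Post-order DFS: callees are listed before their callers (cycles are cut). */
    static List<String> bottomUp(Map<String, List<String>> calls) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String method : calls.keySet()) {
            visit(method, calls, visited, order);
        }
        return order;
    }

    private static void visit(String method, Map<String, List<String>> calls,
                              Set<String> visited, List<String> order) {
        if (!visited.add(method)) {
            return; // already handled, or part of a recursive cycle
        }
        for (String callee : calls.getOrDefault(method, Collections.emptyList())) {
            visit(callee, calls, visited, order);
        }
        order.add(method);
    }

    /**
     * A call has a single possible target if the method is static, final or
     * private, or if no (transitive) subclass of the receiver type redeclares it.
     */
    static boolean canDevirtualize(String receiver, String method,
                                   boolean staticFinalOrPrivate,
                                   Map<String, List<String>> subclasses,
                                   Map<String, Set<String>> declared) {
        if (staticFinalOrPrivate) {
            return true;
        }
        for (String sub : subclasses.getOrDefault(receiver, Collections.emptyList())) {
            if (declared.getOrDefault(sub, Collections.emptySet()).contains(method)) {
                return false;
            }
            if (!canDevirtualize(sub, method, false, subclasses, declared)) {
                return false;
            }
        }
        return true;
    }
}
```

Processing leaves first means a callee has already received its own inlined code by the time its callers are considered, so the size checks in step 3 see the final callee size.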

17 Inlining Checks ■ Inlining is not possible if ▪ the new code size or variable table size of the caller exceeds platform limits ▪ the callee uses exception handlers or synchronized code: throwing an exception clears the stack, so the stack of the caller would need to be saved and restored if an exception is handled within the inlined method (not yet implemented in JOPtimizer) ▪ the method or class is excluded from inlining by configuration (caller or callee, e.g. a native class)
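
The checks on this slide amount to a simple predicate. The sketch below is an assumption-laden illustration: `InlineCheckSketch`, `MethodInfo`, and the limit parameters are hypothetical, the limits passed in are not JOP's actual values, and the size estimate ignores both the removed invoke instruction and the added null-pointer check.

```java
// Hypothetical sketch of the admissibility test for one call site.
final class InlineCheckSketch {

    static final class MethodInfo {
        final int codeSize;                // bytes of bytecode
        final int numLocals;               // entries in the variable table
        final boolean hasExceptionHandlers;
        final boolean isSynchronized;
        final boolean excluded;            // excluded by configuration

        MethodInfo(int codeSize, int numLocals, boolean hasExceptionHandlers,
                   boolean isSynchronized, boolean excluded) {
            this.codeSize = codeSize;
            this.numLocals = numLocals;
            this.hasExceptionHandlers = hasExceptionHandlers;
            this.isSynchronized = isSynchronized;
            this.excluded = excluded;
        }
    }

    static boolean canInline(MethodInfo caller, MethodInfo callee,
                             int maxCodeSize, int maxLocals) {
        if (caller.excluded || callee.excluded) {
            return false;
        }
        if (callee.hasExceptionHandlers || callee.isSynchronized) {
            return false;
        }
        // Rough estimate: the callee body is appended to the caller's code.
        int newCodeSize = caller.codeSize + callee.codeSize;
        // Callee locals are mapped above the caller's variables.
        int newLocals = caller.numLocals + callee.numLocals;
        return newCodeSize <= maxCodeSize && newLocals <= maxLocals;
    }
}
```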

18 Inlining Checks (cont.) ■ Check field and method references in the callee code ▪ Must be accessible from the caller ▪ Else make the field or method public if possible ▪ Always possible for fields, as they are not virtual in Java ▪ All overriding methods must be made public too ▪ If a private method is made public, all invocations have to be changed from invokespecial to invokevirtual (luckily only the methods of the callee class have to be searched) ▪ Naming conflicts or dynamic class loading can prevent these changes, thus preventing inlining

Example:

    class A
        public a()
            tmp = new C()
            invoke tmp.b()
    class B
        public b()
            if (v == null) invoke B.c()
        private c()
    class C extends B
        private c()

19 Inlining Checks (cont.) ■ Estimate the gain of inlining ▪ Depends on the cache state ▪ Possible degradation of performance if the inlined method is seldom invoked ▪ Calculate the gain based on estimates of invocation frequency and cache state ▪ Decrease the weight of callees with multiple call sites to limit the growth of application code size ▪ Select the method with the highest (positive) weight for inlining ■ Add inlined invocations to the inlining candidate list and repeat inlining (check with the new code size)
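
The slide does not give the weight formula, so the following is purely an illustrative assumption of how such a weight could be computed and used for selection. The class `InlineGainSketch`, the parameters, and the formula (saved invoke cycles minus estimated extra cache miss cycles, scaled down by the number of call sites) are all hypothetical.

```java
import java.util.Map;

// Hypothetical weighting scheme; not the formula used by JOPtimizer.
final class InlineGainSketch {

    /**
     * Estimated cycles saved by inlining one callee.
     * All inputs are estimates supplied by the caller of this method.
     */
    static double weight(double invocations, double invokeOverheadCycles,
                         double extraMissCycles, int callSites) {
        double gain = invocations * invokeOverheadCycles - extraMissCycles;
        // Callees with several call sites duplicate more code, so a positive
        // gain is scaled down; a negative gain is left untouched.
        return gain > 0 ? gain / callSites : gain;
    }

    /** Returns the candidate with the highest positive weight, or null if none. */
    static String selectBest(Map<String, Double> weights) {
        String best = null;
        double bestWeight = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            if (e.getValue() > bestWeight) {
                best = e.getKey();
                bestWeight = e.getValue();
            }
        }
        return best;
    }
}
```

A seldom-invoked callee ends up with a negative weight under this scheme and is never selected, which matches the warning about performance degradation on the slide.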

20 Benchmark Results ■ Inlining of stackcode ■ Inlining is limited by the maximum code size imposed by JOP's memory cache ■ Removal of unused code should be implemented

21 Inlining Improvements ■ Many improvements possible ▪ Type analysis/call graph thinning for better devirtualization ▪ Better cache state and invocation frequency estimation (WCET-driven?) ▪ Run optimizations to reduce code size prior to inlining ▪ Allow inlining of synchronized code/exception handlers ▪ Try to find invocations with the highest gain application-wide ...

22 Summary ■ Optimizing code at runtime is not feasible for (real-time) embedded systems ■ Existing open source tools are not designed for embedded systems ■ Inlining is implemented in JOPtimizer and takes the target platform into account (code restrictions, caching, ...); up to 14% speedup of the JBE benchmark ■ Load/store elimination and local variable allocation are needed before further optimizations can be implemented ■ Many improvements are still possible

23 Q&A Thanks for your attention! Questions?

24 Transformations