Compilation 2007
Code Generation
Michael I. Schwartzbach
BRICS, University of Aarhus

Code Generation Phases

- Computing resources, such as:
  - layout of data structures
  - offsets
  - register allocation
- Generating an internal representation of machine code for statements and expressions
- Optimizing the generated code (ignored for now)
- Emitting the code to files in assembler format
- Assembling the emitted code to binary format

Joos Code Generation

- Compute offsets and signatures
- Generate code for static initializers
- Generate code for statements and expressions
- Optimize the generated code (ignored for now)
- Compute locals and stack limits
- Emit Jasmin code
- Assemble Jasmin code to class files

Computing Offsets

- Each formal and local variable must have an offset in the stack frame
- The this object always has offset 0
- The naive solution: enumerate all formals and locals
- The better solution: reuse offsets for locals in disjoint scopes
- The clever solution: exploit liveness information
  - must still respect the runtime types of locals

Naive Offsets

    public void m(int p, int q, Object r) {
        int x = 42;
        int y;
        { int z; z = 87; }
        { boolean a;
          Object b;
          { boolean b; int z; b = true; boolean c; c = b && (x==87); }
          { int y; y = x; }
        }
    }

    max = 12

Better Offsets

    public void m(int p, int q, Object r) {
        int x = 42;
        int y;
        { int z; z = 87; }
        { boolean a;
          Object b;
          { boolean b; int z; b = true; boolean c; c = b && (x==87); }
          { int y; y = x; }
        }
    }

    max = 10

Clever Offsets

    public void m(int p, int q, Object r) {
        int x = 42;
        int y;
        { int z; z = 87; }
        { boolean a;
          Object b;
          { boolean b; int z; b = true; boolean c; c = b && (x==87); }
          { int y; y = x; }
        }
    }

    max = 6

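The naive and the scope-reusing strategies can be checked against the example method m with a small sketch. The Block class and the method names below are hypothetical, not part of the Joos compiler, and the sketch assumes that only the number of locals declared per block matters. The liveness-based strategy is not covered.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetSketch {
    /** Naive strategy: every local gets a fresh offset; returns the last offset used. */
    static int naiveMax(Block b, int lastUsed) {
        int last = lastUsed + b.declaredLocals;
        for (Block child : b.children) last = naiveMax(child, last);
        return last;
    }

    /** Better strategy: sibling blocks restart at the same offset; returns the maximal offset used. */
    static int scopedMax(Block b, int lastUsed) {
        int here = lastUsed + b.declaredLocals;
        int max = here;
        for (Block child : b.children) max = Math.max(max, scopedMax(child, here));
        return max;
    }

    /** Scope tree of the body of m: this has offset 0, the formals p, q, r have offsets 1..3. */
    static Block exampleBody() {
        return new Block(2,               // x, y
            new Block(1),                 // { z }
            new Block(2,                  // { a, b
                new Block(3),             //   { b, z, c }
                new Block(1)));           //   { y } }
    }

    public static void main(String[] args) {
        int formals = 3;                  // last offset taken by a formal
        System.out.println("naive max  = " + naiveMax(exampleBody(), formals));
        System.out.println("better max = " + scopedMax(exampleBody(), formals));
    }
}

/** Hypothetical scope tree: a block declares some locals and contains nested blocks. */
class Block {
    final int declaredLocals;             // number of locals declared directly in this block
    final List<Block> children = new ArrayList<>();

    Block(int declaredLocals, Block... nested) {
        this.declaredLocals = declaredLocals;
        for (Block b : nested) children.add(b);
    }
}
```

For the example this reproduces the two maxima stated on the slides, 12 and 10.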
Computing Signatures (1/2)

The function sig(σ) encodes a type:

    sig(void)    = V
    sig(byte)    = B
    sig(short)   = S
    sig(int)     = I
    sig(char)    = C
    sig(boolean) = Z
    sig(σ[])     = [ desc(σ)
    sig(C_1.C_2. ... .C_k) = C_1/C_2/.../C_k

    desc(void)    = V
    desc(byte)    = B
    desc(short)   = S
    desc(int)     = I
    desc(char)    = C
    desc(boolean) = Z
    desc(σ[])     = [ desc(σ)
    desc(C_1.C_2. ... .C_k) = L C_1/C_2/.../C_k ;

Computing Signatures (2/2)

This extends to fields, methods, and constructors:

- The field named x in class C:
    sig(C)/x
- The method σ m(σ_1 x_1, ..., σ_k x_k) in class C:
    sig(C)/m(desc(σ_1)...desc(σ_k))desc(σ)
- The constructor C(σ_1 x_1, ..., σ_k x_k) in class C:
    sig(C)/<init>(desc(σ_1)...desc(σ_k))V

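The sig and desc functions translate almost directly into code. The following is a minimal sketch, assuming that types are given as plain strings such as "int", "int[]", or "java.lang.String"; the class and method names are made up for illustration.

```java
class SignatureSketch {
    /** Single-letter code for a primitive type or void, or null for any other type. */
    static String primitive(String type) {
        switch (type) {
            case "void":    return "V";
            case "byte":    return "B";
            case "short":   return "S";
            case "int":     return "I";
            case "char":    return "C";
            case "boolean": return "Z";
            default:        return null;
        }
    }

    /** desc(σ): like sig(σ), but class names are wrapped in L...; */
    static String desc(String type) {
        if (type.endsWith("[]")) return "[" + desc(type.substring(0, type.length() - 2));
        String p = primitive(type);
        return p != null ? p : "L" + type.replace('.', '/') + ";";
    }

    /** sig(σ): class names C_1.C_2...C_k become C_1/C_2/.../C_k */
    static String sig(String type) {
        if (type.endsWith("[]")) return "[" + desc(type.substring(0, type.length() - 2));
        String p = primitive(type);
        return p != null ? p : type.replace('.', '/');
    }

    /** Signature of field x in class C: sig(C)/x */
    static String fieldSig(String cls, String field) {
        return sig(cls) + "/" + field;
    }

    /** Signature of a method; a constructor uses the name <init> and return type void. */
    static String methodSig(String cls, String name, String returnType, String... paramTypes) {
        StringBuilder sb = new StringBuilder(sig(cls)).append('/').append(name).append('(');
        for (String p : paramTypes) sb.append(desc(p));
        return sb.append(')').append(desc(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(methodSig("Foo", "print", "java.lang.String", "int"));
        System.out.println(methodSig("Foo", "<init>", "void", "int"));
    }
}
```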
Static Initializers (1/2)

- Initialization of static fields is performed when the class is loaded by the JVM
- All static fields are first given default values
- The code for initialization is written in a special method with the name <clinit>
- Fields that are static final and constant valued must then be initialized
- Finally, all other static fields are initialized

Static Initializers (2/2)

    public class A {
        public static int x = A.y+1;
        public static final int y = 42;
        public static void main(String[] args) {
            System.out.print(A.x);
        }
    }

    public class A {
        public static int x = A.y+1;
        public static int y = 42;
        public static void main(String[] args) {
            System.out.print(A.x);
        }
    }

    public class A {
        public static int x = A.y+1;
        public static final int y = A.fortytwo();
        public static int fortytwo() { return 42; }
        public static void main(String[] args) {
            System.out.print(A.x);
        }
    }

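The three variants of class A can be run side by side by renaming them. The sketch below uses the hypothetical names A1, A2, and A3 and nests them in one class so that the file is self-contained; the field initializers are otherwise unchanged.

```java
class StaticInitSketch {
    // y is static final with a constant value, so it is initialized before x.
    static class A1 {
        public static int x = A1.y + 1;
        public static final int y = 42;
    }

    // y is an ordinary static field, so it still holds its default value 0 when x is initialized.
    static class A2 {
        public static int x = A2.y + 1;
        public static int y = 42;
    }

    // y is final but not constant valued, so it is initialized in textual order, after x.
    static class A3 {
        public static int x = A3.y + 1;
        public static final int y = A3.fortytwo();
        public static int fortytwo() { return 42; }
    }

    public static void main(String[] args) {
        System.out.println(A1.x);   // prints 43
        System.out.println(A2.x);   // prints 1
        System.out.println(A3.x);   // prints 1
    }
}
```

Only the first variant sees the value 42 while x is initialized; in the other two, y is still at its default value.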
Generating Code

- Each statement and expression generates a sequence of bytecodes
- A code template shows how to generate bytecodes for a given language construct
- The template ignores the surrounding context
- This yields a simple, recursive strategy for the code generation

Code Template Invariants

- A statement and a void expression leave the stack height unchanged
- A non-void expression increases the stack height by one
- This is a local property of each template
- The generated code must be verifiable
- This is not a local property, since the verifier performs a global static analysis

Code Templates (1/12)

if (E) S
    E
    ifeq false
    S
    false:

if (E) S_1 else S_2
    E
    ifeq false
    S_1
    goto endif
    false:
    S_2
    endif:
    nop

while (E) S
    goto cond
    loop:
    S
    cond:
    E
    ifne loop

while (true) S
    loop:
    S
    goto loop

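The recursive strategy behind these templates can be sketched for the if and while templates. The miniature AST below is hypothetical: conditions and plain statements are opaque strings standing for their already generated bytecode. Fresh label names are an added assumption, needed because the fixed labels of the templates would clash when constructs are nested.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateSketch {
    // Hypothetical miniature AST for statements.
    interface Stm {}
    static final class Plain implements Stm {
        final String code;
        Plain(String code) { this.code = code; }
    }
    static final class If implements Stm {
        final String cond; final Stm body;
        If(String cond, Stm body) { this.cond = cond; this.body = body; }
    }
    static final class While implements Stm {
        final String cond; final Stm body;
        While(String cond, Stm body) { this.cond = cond; this.body = body; }
    }

    private final List<String> out = new ArrayList<>();
    private int counter = 0;

    /** Returns a label that has not been used before. */
    private String fresh(String base) { return base + counter++; }

    /** Expands the template of s, recursing into substatements without looking at the context. */
    void gen(Stm s) {
        if (s instanceof Plain) {
            out.add(((Plain) s).code);
        } else if (s instanceof If) {
            If i = (If) s;
            String isFalse = fresh("false");
            out.add(i.cond);
            out.add("ifeq " + isFalse);
            gen(i.body);
            out.add(isFalse + ":");
        } else if (s instanceof While) {
            While w = (While) s;
            String cond = fresh("cond");
            String loop = fresh("loop");
            out.add("goto " + cond);
            out.add(loop + ":");
            gen(w.body);
            out.add(cond + ":");
            out.add(w.cond);
            out.add("ifne " + loop);
        } else {
            throw new IllegalArgumentException("unknown statement");
        }
    }

    static List<String> generate(Stm s) {
        TemplateSketch g = new TemplateSketch();
        g.gen(s);
        return g.out;
    }

    public static void main(String[] args) {
        Stm program = new While("E1", new If("E2", new Plain("S")));
        for (String line : generate(program)) System.out.println(line);
    }
}
```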
Code Templates (2/12)

{ σ n; S }
    S

{ σ n = E; S }
    E
    σstore offset(n)
    S

E;    (type(E) = void)
    E

E;    (type(E) ≠ void)
    E
    pop

throw E;
    E
    athrow

return E;
    E
    σreturn

return;
    return

σstore is either istore or astore depending on σ

Code Templates (3/12)

new C(E_1, ..., E_k)^δ
    new sig(C)
    dup
    E_1
    ...
    E_k
    invokespecial sig(δ)

this(E_1, ..., E_k)^δ
    aload 0
    E_1
    ...
    E_k
    invokespecial sig(δ)

The annotation ^δ indicates that δ is the corresponding resolved declaration

Code Templates (4/12)

super(E_1, ..., E_k)^δ
    aload 0
    E_1
    ...
    E_k
    invokespecial sig(δ)
    aload 0
    I_1
    putfield sig(x_1) desc(σ_1)
    ...
    aload 0
    I_n
    putfield sig(x_n) desc(σ_n)

The current class contains the non-static field initializations:
    σ_1 x_1 = I_1;
    ...
    σ_n x_n = I_n;

Code Templates (5/12)

E.m(E_1, ..., E_k)^δ    (sig(δ) is a class)
    E
    E_1
    ...
    E_k
    invokevirtual sig(δ)

E.m(E_1, ..., E_k)^δ    (sig(δ) is an interface)
    E
    E_1
    ...
    E_k
    invokeinterface sig(δ)

C.m(E_1, ..., E_k)^δ
    E_1
    ...
    E_k
    invokestatic sig(δ)

Code Templates (6/12)

(char) E
    E
    i2c

E instanceof C
    E
    instanceof sig(C)

(C) E
    E
    checkcast sig(C)

Code Templates (7/12)

this
    aload 0

n    (type(n) = σ)
    σload offset(n)

E.f
    E
    getfield sig(f) desc(type(E.f))

C.f
    getstatic sig(f) desc(type(C.f))

E_1[E_2]    (type(E_1[E_2]) = σ)
    E_1
    E_2
    σaload

σaload is either iaload, baload, saload, caload, or aaload depending on σ

Code Templates (8/12)

n = E    (type(n) = σ)
    E
    dup
    σstore offset(n)

E_1.f = E_2
    E_1
    E_2
    dup_x1
    putfield sig(f) desc(type(E_1.f))

E_1[E_2] = E_3    (type(E_1[E_2]) = σ)
    E_1
    E_2
    E_3
    dup_x2
    σastore

C.f = E
    E
    dup
    putstatic sig(f) desc(type(C.f))

Code Templates (9/12)

new σ[E]
    E
    multianewarray desc(σ) 1

E.length
    E
    arraylength

E.clone()
    E
    invokevirtual sig(type(E))/clone()Ljava/lang/Object;

Code Templates (10/12)

42
    ldc_int 42

true
    ldc_int 1

null
    aconst_null

"abc"
    ldc_string "abc"

Code Templates (11/12)

E_1 + E_2    (type(E_1 + E_2) = int)
    E_1
    E_2
    iadd

E_1 + E_2    (type(E_1 + E_2) = String)
    E_1
    E_2
    invokevirtual S/concat(LS;)LS;

- E
    E
    ineg

Here S abbreviates java/lang/String

Code Templates (12/12)

E_1 || E_2
    E_1
    dup
    ifne firsttrue
    pop
    E_2
    firsttrue:

E_1 && E_2
    E_1
    dup
    ifeq firstfalse
    pop
    E_2
    firstfalse:

Stack and Locals Limits

- The generated code must explicitly state:
  - the maximal number of local and formal offsets
  - the maximal local stack height
- This is used to determine the size of the frame
- The locals limit is the maximal offset + (1 or 0)
- The stack limit is computed by a static analysis

Stack Limit Analysis

- Consider the control flow graph of the bytecodes:
  - succ(S_i) denotes the set of successor bytecodes
  - Δ(S_i) denotes the change in stack height by S_i
  - S_0 denotes the first bytecode
- For every bytecode S_i we define the following integer-valued properties:
  - B[[S_i]] denotes the stack height before S_i
  - A[[S_i]] denotes the stack height after S_i

Dataflow Constraints

    B[[S_0]] = 0
    A[[S_i]] = B[[S_i]] + Δ(S_i)
    ∀ x ∈ succ(S_i): A[[S_i]] = B[[x]]
    A[[S_i]] ≥ 0

- These constraints must have a solution
- The stack limit is the largest value of any A[[S_i]]

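One way to solve these constraints is to propagate stack heights along the control flow graph with a worklist. The following is a minimal sketch, not necessarily how the Joos compiler does it. It assumes that bytecodes are numbered from 0, that delta[i] holds Δ(S_i), and that succ[i] lists the indices of the successors of S_i; all names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class StackLimitSketch {
    /**
     * Returns the largest A[[S_i]] over all reachable bytecodes, or throws
     * if the constraints have no solution. Assumes at least one bytecode.
     */
    static int stackLimit(int[] delta, int[][] succ) {
        int[] before = new int[delta.length];
        Arrays.fill(before, -1);                  // -1 marks "not reached yet"
        Deque<Integer> work = new ArrayDeque<>();
        before[0] = 0;                            // B[[S_0]] = 0
        work.push(0);
        int limit = 0;
        while (!work.isEmpty()) {
            int i = work.pop();
            int after = before[i] + delta[i];     // A[[S_i]] = B[[S_i]] + Δ(S_i)
            if (after < 0) {
                throw new IllegalStateException("negative stack height after bytecode " + i);
            }
            limit = Math.max(limit, after);
            for (int x : succ[i]) {               // A[[S_i]] = B[[x]] for every successor x
                if (before[x] == -1) {
                    before[x] = after;
                    work.push(x);
                } else if (before[x] != after) {
                    throw new IllegalStateException("inconsistent stack height before bytecode " + x);
                }
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        // iload_1; iconst_0; if_icmpeq T; iconst_0; goto E; T: iconst_1; E: ifeq F; F: return
        int[] delta = { 1, 1, -2, 1, 0, 1, -1, 0 };
        int[][] succ = { {1}, {2}, {3, 5}, {4}, {6}, {6}, {7}, {} };
        System.out.println(stackLimit(delta, succ));
    }
}
```

Each bytecode is visited once, since a second arrival with a different height is reported as an error instead of being propagated.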
Jasmin Class Format (1/3)

The overall structure of a Jasmin file is:

    .source sourcefile
    .class modifiers name
    .super sig(superclass)
    .implements sig(interface)
    .field modifiers desc(type)
    constructors
    methods

Jasmin Class Format (2/3)

The structure of a constructor is:

    .method modifiers sig(constructor)
      .throws sig(exception)
      .limit stack stacklimit
      .limit locals localslimit
      bytecodes
    .end method

Jasmin Class Format (3/3)

The structure of a method is:

    .method modifiers sig(method)
      .throws sig(exception)
      .limit stack stacklimit
      .limit locals localslimit
      bytecodes
    .end method

A Tiny Class

    public class Foo {
        public int y = 42;

        public Foo(int z) {
            y = y+z;
        }

        public String print(int n) {
            if (n==0)
                return new Integer(y).toString();
            else
                return new Foo(y).print(n-1);
        }
    }

The Generated Code

    .source Foo.java
    .class public Foo
    .super java/lang/Object

    .field public "y" I

    .method public <init>(I)V
      .limit stack 3
      .limit locals 2
      aload_0
      invokespecial java/lang/Object/<init>()V
      aload_0
      bipush 42
      putfield Foo/y I
      aload_0
      aload_0
      getfield Foo/y I
      iload_1
      iadd
      dup_x1
      putfield Foo/y I
      pop
      return
    .end method

    .method public print(I)Ljava/lang/String;
      .limit stack 3
      .limit locals 2
      iload_1
      iconst_0
      if_icmpeq true2
      iconst_0
      goto end3
      true2:
      iconst_1
      end3:
      ifeq false0
      new java/lang/Integer
      dup
      aload_0
      getfield Foo/y I
      invokespecial java/lang/Integer/<init>(I)V
      invokevirtual java/lang/Integer/toString()Ljava/lang/String;
      areturn
      false0:
      new Foo
      dup
      aload_0
      getfield Foo/y I
      invokespecial Foo/<init>(I)V
      iload_1
      iconst_1
      isub
      invokevirtual Foo/print(I)Ljava/lang/String;
      areturn
    .end method

The Binary Class File

    cafe babe e 001e 0c00 0a a f6c 616e 672f 496e a f6c 616e 672f 4f62 6a c69 6e69 743e c d f6f e 740c c 0a00 1d f a53 6f c a a c f e67 0c00 0f f 6f2e 6a c6a f 6c61 6e67 2f e 673b 0a a 001d 000b c 6a f6c 616e 672f e67 3b f e ab a10 2ab a2a b b 605a b b a e a e1b 039f a bb00 1d59 2ab b700 0cb6 001b b0bb a b400 04b7 001a 1b04 64b b