Java Garbage Collection, Byte Code James Atlas August 7, 2008
James Atlas - CISC370
Review
  - Java Security
      - JVM
      - Cryptography

Schedule
  - Today
      - Java Garbage Collection
      - Java Bytecode
  - Tuesday
      - Review
  - Thursday
      - Final (5-7PM)

Java Garbage Collection

Garbage Collection
  - What is garbage and how can we deal with it?
  - Garbage collection schemes
      - Reference Counting
      - Mark and Sweep
      - Stop and Copy
  - A comparison

How Objects are Created in Java
  - An object is created in Java by invoking the new operator.
  - When new is invoked, the JVM does the following:
      - allocates memory;
      - assigns fields their default values;
      - runs the constructor;
      - returns a reference to the new object.

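The order of these steps can be observed from inside a constructor. The sketch below is only an illustration: the Student fields and the log field are hypothetical names, not part of the course material. It records the field values seen at the start of the constructor, which should still be the defaults.

```java
// Hypothetical class used only to show the order of initialization.
class Student {
    int credits;          // default value 0 before the constructor body runs
    String name;          // default value null before the constructor body runs
    final StringBuilder log = new StringBuilder();

    Student() {
        // At this point the fields above still hold their default values.
        log.append("credits=").append(credits).append(", name=").append(name);
        credits = 12;
        name = "john";
    }
}

class CreationDemo {
    public static void main(String[] args) {
        Student s = new Student();   // allocate, default-initialize, construct, return reference
        System.out.println(s.log);   // prints "credits=0, name=null"
        System.out.println(s.credits + " " + s.name);   // prints "12 john"
    }
}
```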
How Java Reclaims Objects' Memory
  - Java does not provide the programmer any means to destroy objects explicitly.
  - The advantages are:
      - No dangling reference problem in Java
      - Easier programming
      - No memory leaks from forgetting to free an object (objects that stay referenced by mistake are still never reclaimed)

What is Garbage?
  - Garbage: unreferenced objects
        Student john = new Student();
        Student jeff = new Student();
        john = jeff;
  - The object originally referenced by john becomes garbage, because no reference to it remains.
  - After the assignment, both john and jeff refer to the object originally referenced by jeff.

What is Garbage Collection?
  - What is garbage collection?
      - Finding garbage and reclaiming the memory allocated to it.
  - When is the garbage collection process invoked?
      - When the total memory allocated to a Java program exceeds some threshold.
  - Is a running program affected by garbage collection?
      - Yes, depending on the algorithm. Typically the program suspends during garbage collection.

Strategies for Handling Garbage
  - Modern societies produce an excessive amount of waste.
  - What is the solution?
      - Reduce
      - Reuse
      - Recycle
  - The same applies to Java!

Reduce Garbage
  - A Java program that does not create any objects does not create garbage.
  - Objects used until the end of the program do not become garbage.
  - Reducing the number of objects used will reduce the amount of garbage generated.

Reuse Garbage
  - Reuse objects instead of generating new ones.
        for (int i = 0; i < 1000000; ++i) {
            SomeClass obj = new SomeClass(i);
            System.out.println(obj);
        }
  - This program generates one million objects and prints them out.
        SomeClass obj = new SomeClass();
        for (int i = 0; i < 1000000; ++i) {
            obj.setInt(i);
            System.out.println(obj);
        }
  - Using only one object and implementing the setInt() method, we dramatically reduce the garbage generated.

Recycle Garbage
  - Don't leave unused objects for the garbage collector.
      - Put them instead in a container to be searched when an object is needed.
  - Advantage: reduces garbage generation.
  - Disadvantage: puts more overhead on the programmer.
  - Can anyone think of what design pattern this represents?

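This recycling idea is commonly called an object pool. The following is a minimal sketch under that assumption; BuilderPool, acquire, release, and created are hypothetical names chosen for illustration, and a real pool would also need to consider thread safety and size limits.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool of StringBuilder objects; the names are illustrative only.
class BuilderPool {
    private final Deque<StringBuilder> free = new ArrayDeque<>();
    int created = 0;   // how many objects were actually allocated

    StringBuilder acquire() {
        StringBuilder sb = free.poll();      // reuse a recycled object if one exists
        if (sb == null) {
            sb = new StringBuilder();
            created++;
        }
        return sb;
    }

    void release(StringBuilder sb) {
        sb.setLength(0);                     // reset state before the next use
        free.push(sb);
    }
}

class PoolDemo {
    public static void main(String[] args) {
        BuilderPool pool = new BuilderPool();
        for (int i = 0; i < 1000; i++) {
            StringBuilder sb = pool.acquire();
            sb.append("item ").append(i);
            pool.release(sb);
        }
        System.out.println("objects allocated: " + pool.created);   // prints "objects allocated: 1"
    }
}
```

The extra acquire/release calls are the programmer overhead the slide mentions: forgetting to release an object simply turns it back into ordinary garbage.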
Garbage Collection Strategies
  - Reference Counting
  - Mark and Sweep
  - Stop and Copy

Reference Counting Garbage Collection
  - Main idea: add a reference count field to every object.
  - This field is updated whenever the number of references to the object changes.
  - Example:
        Object p = new Integer(57);
        Object q = p;
    The Integer object 57 now has refCount = 2 (referenced by p and q).

Reference Counting (cont'd)
  - The update of the reference count on a reference assignment (i.e., p = q) can be implemented as follows:
        if (p != q) {
            if (p != null)
                --p.refCount;
            p = q;
            if (p != null)
                ++p.refCount;
        }
  - Example:
        Object p = new Integer(57);
        Object q = new Integer(99);
        p = q;
    Afterwards the object 57 has refCount = 0 and the object 99 has refCount = 2 (referenced by p and q).

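Java does not expose reference counts, so the update rule can only be tried out in a simulation. The sketch below models it by hand; Cell, Ref, and assign are hypothetical names introduced for the illustration, and assign applies the same steps as the pseudocode for p = q.

```java
// Hypothetical simulation: Cell and Ref model the bookkeeping by hand.
class Cell {
    final int value;
    int refCount = 0;
    Cell(int value) { this.value = value; }
}

class Ref {
    Cell target;

    // Mirrors the update rule for the assignment "this = q".
    void assign(Cell q) {
        if (target != q) {
            if (target != null) --target.refCount;
            target = q;
            if (target != null) ++target.refCount;
        }
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        Cell c57 = new Cell(57), c99 = new Cell(99);
        Ref p = new Ref(), q = new Ref();
        p.assign(c57);
        q.assign(c99);
        p.assign(q.target);            // p = q
        System.out.println(c57.refCount + " " + c99.refCount);  // prints "0 2"
    }
}
```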
Reference Counting (cont'd)
  - What about indirect references?
  - We can still use reference counting, provided we consider all references to an object, including references from other objects.
        Object p = new Association(new Integer(57), new Integer(99));

Reference Counting (cont'd)
  - When does reference counting fail?
  - Consider a circular list of three ListElements, each referring to the next, with head referring to the first. When head is assigned null, the first element's reference count becomes 1 and not zero, because the last element still refers to it.
  - Every element in the cycle keeps refCount = 1, so none of them is ever reclaimed.
  - Reference counting will fail whenever the data structure contains a cycle of references.

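The cycle problem can be reproduced with hand-maintained counts. This is a sketch only: ListElement here is a hypothetical stand-in for the class in the figure, and link and countsAfterDroppingHead are illustrative helper names.

```java
import java.util.Arrays;

// Hypothetical model of a circular list with hand-maintained reference counts.
class ListElement {
    int refCount = 0;
    ListElement next;
}

class CycleDemo {
    // Points from.next at to and updates the count, as a counting collector would.
    static void link(ListElement from, ListElement to) {
        from.next = to;
        to.refCount++;
    }

    // Builds a three-element cycle, drops the head reference, and returns the counts.
    static int[] countsAfterDroppingHead() {
        ListElement a = new ListElement(), b = new ListElement(), c = new ListElement();
        link(a, b);
        link(b, c);
        link(c, a);        // closes the cycle
        a.refCount++;      // the head variable refers to a
        a.refCount--;      // head = null
        return new int[] { a.refCount, b.refCount, c.refCount };
    }

    public static void main(String[] args) {
        // Every count stays at 1, so a pure reference-counting collector frees nothing.
        System.out.println(Arrays.toString(countsAfterDroppingHead()));  // prints "[1, 1, 1]"
    }
}
```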
Reference Counting (cont'd)
  - Advantages
      - Garbage is easily identified.
      - Garbage can be collected incrementally.
  - Disadvantages
      - Every object needs a reference count field.
      - Updating the reference count fields adds overhead.
      - It fails in the case of cyclic references.
      - It does not defragment the heap.

Mark-and-Sweep Garbage Collection
  - It was the first garbage collection algorithm able to reclaim garbage even for cyclic data structures.
  - The mark-and-sweep algorithm consists of two phases:
      - mark phase: mark every object reachable from the root variables
      - sweep phase: release every object that was not marked
        for each root variable r
            mark(r);
        sweep();

Mark and Sweep (cont'd)
        void sweep() {
            for each Object p in the heap {
                if (p.marked)
                    p.marked = false;
                else
                    heap.release(p);
            }
        }

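The slides give sweep() but not mark(). The sketch below fills that in as a depth-first traversal over a toy heap, which is one common way to write it and is an assumption here rather than the course's own version. Obj, ToyHeap, alloc, and collect are hypothetical names; a real collector works on raw memory, not on Java lists.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy heap; Obj, ToyHeap and their fields are illustrative names.
class Obj {
    final String name;
    boolean marked;
    final List<Obj> refs = new ArrayList<>();
    Obj(String name) { this.name = name; }
}

class ToyHeap {
    final List<Obj> objects = new ArrayList<>();
    final List<Obj> roots = new ArrayList<>();

    Obj alloc(String name) { Obj o = new Obj(name); objects.add(o); return o; }

    // Mark phase: depth-first traversal from one root; the marked flag stops cycles.
    void mark(Obj p) {
        if (p == null || p.marked) return;
        p.marked = true;
        for (Obj child : p.refs) mark(child);
    }

    // Sweep phase: release unmarked objects and clear the marks on survivors.
    void sweep() {
        for (Iterator<Obj> it = objects.iterator(); it.hasNext();) {
            Obj p = it.next();
            if (p.marked) p.marked = false; else it.remove();
        }
    }

    void collect() {
        for (Obj r : roots) mark(r);
        sweep();
    }
}

class MarkSweepDemo {
    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Obj a = heap.alloc("a"), b = heap.alloc("b"), c = heap.alloc("c"), d = heap.alloc("d");
        heap.roots.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);      // c and d form an unreachable cycle
        heap.collect();
        System.out.println(heap.objects.size());   // prints "2"
    }
}
```

Unlike reference counting, the unreachable cycle formed by c and d is reclaimed, because reachability is decided from the roots rather than from per-object counts.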
Mark and Sweep (cont'd)
  - Advantages
      - It correctly identifies and collects garbage even in the presence of reference cycles.
      - No overhead in manipulating references.
  - Disadvantages
      - The program suspends while garbage collecting.
      - It does not defragment the heap.

Stop-and-Copy Garbage Collection
  - This algorithm collects garbage and defragments the heap.
  - The heap is divided into two regions: active and inactive.
  - When the memory in the active region is exhausted, the program is suspended and:
      - Live objects are copied to the inactive region contiguously.
      - The active and inactive regions reverse their roles.
  - The algorithm:
        for each root variable r
            r = copy(r, inactiveHeap);
        swap(activeHeap, inactiveHeap);

Stop-and-Copy Garbage Collection (cont'd)
        Object copy(Object p, Heap destination) {
            if (p == null)
                return null;
            if (p.forward == null) {
                Object q = destination.newInstance(p.class);
                p.forward = q;
                for each field f in p {
                    if (f is primitive type)
                        q.f = p.f;
                    else
                        q.f = copy(p.f, destination);
                }
                q.forward = null;
            }
            return p.forward;
        }
  - Figure: head refers to a list A, B, C in the active region; the copies A', B', C' in the inactive region each have forward = null.

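The copy routine can be exercised on a toy model. This is a sketch under simplifying assumptions: each Node has exactly one primitive field and one reference field, the two heap regions are modeled as Java lists, and there is a single root. Node, CopyingHeap, alloc, and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: each Node has one primitive field and one reference field.
class Node {
    int value;
    Node next;
    Node forward;      // forwarding address, set once the node has been copied
    Node(int value) { this.value = value; }
}

class CopyingHeap {
    List<Node> active = new ArrayList<>();
    List<Node> inactive = new ArrayList<>();

    Node alloc(int value) { Node n = new Node(value); active.add(n); return n; }

    // Copies p and everything reachable from it into the inactive region.
    Node copy(Node p) {
        if (p == null) return null;
        if (p.forward == null) {
            Node q = new Node(p.value);   // primitive field copied directly
            inactive.add(q);              // placed next to the previous copy
            p.forward = q;                // set before recursing so cycles terminate
            q.next = copy(p.next);        // reference field copied recursively
        }
        return p.forward;
    }

    // Collects with a single root and returns the root's new location.
    Node collect(Node root) {
        Node newRoot = copy(root);
        List<Node> old = active;
        active = inactive;                // swap the roles of the two regions
        inactive = old;
        inactive.clear();                 // everything left behind is garbage
        return newRoot;
    }
}

class StopCopyDemo {
    public static void main(String[] args) {
        CopyingHeap heap = new CopyingHeap();
        Node a = heap.alloc(1), b = heap.alloc(2), c = heap.alloc(3);
        heap.alloc(4);                        // never referenced: garbage
        a.next = b; b.next = c; c.next = a;   // live cycle
        Node head = heap.collect(a);
        System.out.println(heap.active.size() + " " + head.next.next.next.value);  // prints "3 1"
    }
}
```

Setting the forwarding address before copying the fields is what lets the recursion stop when it comes back around a cycle.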
Stop-and-Copy Garbage Collection (cont'd)
  - Advantages
      - It works for cyclic data structures.
      - It defragments the heap.
  - Disadvantages
      - All live objects are copied every time the garbage collector is invoked; it does not work incrementally.
      - It requires twice as much memory as the program actually uses.

Java Garbage Collector History
  - mark-and-sweep
  - three memory spaces
      1. Permanent space: used for JVM class and method objects
      2. Old object space: used for objects that have been around a while
      3. New (young) object space: used for newly created objects
          - Also broken into Eden, Survivor1 and Survivor2
  - allowed different techniques for each space

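On a current JVM, the collectors and memory spaces in use can be listed through the standard java.lang.management API. This is a small sketch; the names it prints depend on the JVM vendor, version, and command-line flags, so they may not match the space names used above. GcInfo and poolNames are hypothetical names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class GcInfo {
    // Names of the memory pools the running JVM reports.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // One bean per collector; typically one for the young and one for the old generation.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName()
                    + ", collections so far: " + gc.getCollectionCount());
        }
        System.out.println("pools: " + poolNames());
    }
}
```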
Java Garbage Collector History (cont'd)
  - 1.3 techniques
      - Copy-compaction: used for new object space.
      - Mark-compact: used in old object space.
          - Similar to mark and sweep, mark-compact first marks all reachable objects; in the second phase, the surviving objects are compacted so that the free space is contiguous.
      - Incremental garbage collection (optional)
          - Incremental GC creates a new middle section in the heap, which is divided into multiple trains. Garbage is reclaimed from each train one at a time. This provides shorter, more frequent pauses for garbage collection, but it can decrease overall application performance.
  - still all "stop-the-world" techniques

Java Garbage Collector History (cont'd)
  - introduced parallel GC algorithms

    Collector              Young   Old   Stop the world   Multithreaded   Concurrent
    Copying                X       -     X                -               -
    *Parallel copying      X       -     X                X               -
    *Parallel scavenging   X       -     X                X               -
    Incremental            1       -     X                -               -
    Mark-compact           -       X     X                -               -
    *Concurrent            -       X     2                -               X

    Note 1: Subdivides the new generation to create an additional middle generation
    Note 2: Uses stop-the-world approach for two of its six phases

JVM Internals
  - The architecture
      - The JVM is an abstract concept.
      - Sun just specified the interface.
      - Implementation details depend on the specific product (Sun JDK, IBM JDK, Blackdown).
  - Java bytecode, the internal language
      - Independent of CPU type (bytecode)
      - Stack-oriented, object-oriented, type-safe

Runtime view on a JVM
  - Class loader
  - Runtime data storage
      - Method Area (Classes)
      - Heap (Objects)
      - Stack Frames
      - PC registers
      - Native method stacks
  - JVM runtime
  - Native methods

Runtime data
  - Frame:
      - Saves the runtime state of an executing thread, and therefore holds the information needed for method execution (e.g. the program counter).
      - All frames of a thread are managed on that thread's stack.

Runtime data
  - Method area
      - Runtime information of the class file
          - Type information
          - Constant Pool
          - Method information
          - Field information
          - Class static fields
          - Reference to the classloader of the class
          - Reference to reflection anchor (Class)

The Constant Pool
  - The "constant pool" is a heterogeneous array of data. Each entry in the constant pool can be one of the following:
      - string
      - class or interface name
      - reference to a field or method
      - numeric value
      - constant String value
  - No other part of the class file makes specific references to strings, classes, fields, or methods. All references to constants and to the names of methods and fields go through a lookup in the constant pool.

The Class File Structure
  - HEADER
  - CONSTANT POOL
  - ACCESS FLAGS (Final, Native, Private, Protected, ...)
  - INTERFACES
  - FIELDS
  - METHODS
  - ATTRIBUTES
  - You can use a class dumper like javap -c or DumpClass to analyze these inner details.

javap -c -verbose example
  - Our Chicken.java class

The Class File Format
  - Java class files are brought into the JVM via the classloader.
  - The class file is basically just a plain byte array, following the rules of the bytecode verifier.
  - All 16-bit and 32-bit quantities are formed by reading in two or four 8-bit bytes, respectively, and joining them together in big-endian format.

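The big-endian rule can be seen by reading the first fields of a class file header with DataInputStream, which always reads multi-byte values in big-endian order. This is a sketch: ClassHeader and readHeader are hypothetical names, and the eight sample bytes are hand-written rather than taken from a real class file. They encode the magic number 0xCAFEBABE, minor version 0, and major version 50 (the Java 6 class file version).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassHeader {
    // Reads the first three header fields; returns { magic, minor, major }.
    static int[] readHeader(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();             // four bytes joined big-endian
            int minor = in.readUnsignedShort();   // two bytes joined big-endian
            int major = in.readUnsignedShort();   // two bytes joined big-endian
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample bytes: magic, minor version 0, major version 50.
        byte[] sample = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50 };
        int[] h = readHeader(sample);
        System.out.printf("magic=%08X minor=%d major=%d%n", h[0], h[1], h[2]);
        // prints "magic=CAFEBABE minor=0 major=50"
    }
}
```

To inspect a real class, the same method could be given the bytes of a compiled .class file, for example from java.nio.file.Files.readAllBytes.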
Methods and Fields
  - The type of a field or method is indicated by a string called its signature (the JVM specification calls this string a descriptor).
  - Fields may have an additional attribute giving the field's initial value.
  - Methods have an additional Code attribute giving the Java bytecode for executing that method.

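The type strings can be generated from Java itself. The sketch below uses java.lang.invoke.MethodType, which was added in Java 7 and so postdates these slides; DescriptorDemo and descriptor are hypothetical names.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Builds the JVM type string for a method with the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int m(String, int[])
        System.out.println(descriptor(int.class, String.class, int[].class)); // prints "(Ljava/lang/String;[I)I"
        // void m()
        System.out.println(descriptor(void.class));                           // prints "()V"
    }
}
```

Parameter types appear in parentheses, followed by the return type; object types are written as L, the class name with slashes, and a semicolon, and each array dimension adds a leading [.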
The Code Attribute
  - Maximum stack space
  - Maximum number of local variables
  - The actual bytecode for executing the method
  - A table of exception handlers, each with:
      - start and end offset into the bytecodes,
      - an exception type, and
      - the offset of a handler for the exception

Bytecode Basics

The JVM types
  - JVM types and their prefixes
      - Byte: b
      - Short: s
      - Integer: i (Java booleans are mapped to JVM ints!)
      - Long: l
      - Character: c
      - Single float: f
      - Double float: d
      - References: a (to classes, interfaces, arrays)
  - These prefixes are used in opcodes (iadd, astore, ...)

The JVM Instruction Mnemonics
  - Shuffling (pop, swap, dup, ...)
  - Calculating (iadd, isub, imul, idiv, ineg, ...)
  - Conversion (d2i, i2b, d2f, i2l, ...)
  - Local storage operation (iload, istore, ...)
  - Array operation (arraylength, newarray, ...)
  - Object management (getfield/putfield, invokevirtual, new)
  - Push operation (aconst_null, iconst_m1, ...)
  - Control flow (nop, goto, jsr, ret, tableswitch, ...)
  - Threading (monitorenter, monitorexit, ...)

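A tiny method shows how the type prefixes and mnemonics fit together. The bytecode in the comment is what javac typically emits for this body; the exact listing can be checked by compiling the class and running javap -c on it. Adder is a hypothetical class name.

```java
class Adder {
    // Typical bytecode for this method body:
    //   iload_0   push the first int argument (local variable 0)
    //   iload_1   push the second int argument (local variable 1)
    //   iadd      pop both ints, push their sum
    //   ireturn   return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));   // prints "5"
    }
}
```

Every instruction carries the i prefix because all operands are ints; the same method on doubles would use dload, dadd, and dreturn instead.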
Bytecode
  - Each Java bytecode (JBC) instruction is a one-byte opcode followed by zero or more bytes of additional operand information.
  - Table lookup instructions (tableswitch, lookupswitch) have a flexible length.
  - The wide operation extension allows the base operations to use "large" operands.
  - No self-modifying code
  - No branching to arbitrary locations: only to the beginning of instructions, and only within the scope of the current method (enforced by the verifier!)

Bytecode (Reverse) Engineering

Bytecode Engineering tools
  - Obfuscators
      - Remove or manipulate all information that can be used for reverse engineering
  - Native compilers
      - "Real" compilation of Java bytecodes to native instructions (x86/SPARC)
  - Build your own bytecode
      - Programmatic generation
      - Manipulate classfiles with an API

Obfuscators: Techniques used
  - Identifier name mangling
      - The JVM does not need useful names for methods and fields.
      - They can be renamed to single-letter identifiers.
  - Constant pool name mangling
      - Constant pool entries are stored encrypted and decrypted at run time.
  - Control flow obfuscation
      - Insertion of phantom variables, stack scrambling
      - Insertion of ghost branch instructions that rely on the phantom variables' default values and therefore never execute

Obfuscators: Problems with Obfuscation
  - Constant value mangling adds processing overhead: every retrieval from the constant pool needs an extra call to a "deobfuscatename" method.
  - Dynamic class loading may break as classes get new names: reflection calls like Class.forName("Account") will fail because class "Account" is now known by its obfuscated name "b16"!
  - And: obfuscation breaks patterns that JIT engines can recognize for optimization.

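The reflection problem can be reproduced without an obfuscator by looking up a name that does not exist, which is what the original name becomes after renaming. This is a sketch; ReflectionLookup, exists, and the class name com.example.NoSuchClassForDemo are hypothetical stand-ins.

```java
class ReflectionLookup {
    // Returns true if a class with the given name can be loaded.
    static boolean exists(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;   // what a lookup by the pre-obfuscation name runs into
        }
    }

    public static void main(String[] args) {
        System.out.println(exists("java.lang.String"));               // prints "true"
        // Stand-in for a class whose name was changed by an obfuscator.
        System.out.println(exists("com.example.NoSuchClassForDemo"));
    }
}
```

The compiler cannot catch this, because the class name is just a string; the failure only shows up at run time.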
Obfuscators: Design your code for better protection
  - Try to use the lowest visibility (private if possible, otherwise package). Public and protected fields must keep their names after obfuscation and therefore reveal some of the structure of your intellectual property. Remove unnecessary "public" modifiers where possible.
  - Interaction with the package should go through a few explicit public accessor classes
      - to set/get internal fields and call methods
  - Use resource bundles instead of the classfiles to hold string constants
      - Reverse engineering is harder because of one more protectable indirection

Protecting the Source Code: Native Compilers
  - Convert Java bytecode to C
  - Generate an executable via a normal C build
      - Fast execution
      - Additional decompilation effort needed
      - Long turnaround times
      - Even for small Java programs, some commercial products produce monster-size executable files (67 MB source for Viva.java)
      - The transformed program may then be vulnerable to buffer overflows and off-by-one errors

Bytecode Reverse Engineering
  - Decompilation
      - Get source code from class files
  - Graphical analysis
      - Rebuild the logical control flow
  - Disassembly
      - Get symbolic bytecode from class files

JAD
  - Java Decompiler
      - Free for personal use
      - JADClipse plugin for Eclipse: allows you to browse .class files