JIT-Compiler-Assisted Distributed Java Virtual Machine
Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau
Department of Computer Science and Information Systems, The University of Hong Kong
Presented by Cho-Li Wang

TCHPC 2004, Taiwan, March 2004

Outline
- Distributed Java Virtual Machine
- Design tradeoffs
- Related work
- JESSICA2 features
- Experimental results
- Conclusion and future work
- A raytracing demo

Distributed Java Virtual Machine (DJVM)
- A distributed Java Virtual Machine (DJVM) consists of a group of extended JVMs running in a distributed environment to support true parallel execution of a multithreaded Java application.
- A DJVM provides all JVM services, in compliance with the Java language specification, as if running on a single machine: a Single System Image (SSI).
[Figure: Java threads of one program running on a DJVM (Single System Image) that spans several JVMs; components shown: heap, bytecode execution engine, class, thread.]

Example multithreaded program:

    import java.util.*;

    // Each worker sums the integers below n on its own thread.
    class worker extends Thread {
        private long n;

        public worker(long N) { n = N; }

        public void run() {
            long sum = 0;
            for (long i = 0; i < n; i++) sum += i;
            System.out.println("N=" + n + " Sum=" + sum);
        }
    }

    public class test {
        static final int N = 100;

        public static void main(String args[]) {
            worker[] w = new worker[N];
            Random r = new Random();
            for (int i = 0; i < N; i++) w[i] = new worker(r.nextLong());
            // Start all workers, then wait for every one to finish.
            for (int i = 0; i < N; i++) w[i].start();
            try {
                for (int i = 0; i < N; i++) w[i].join();
            } catch (Exception e) {}
        }
    }

Design Tradeoffs of a DJVM
- How to manage the threads?
  - Distributed thread scheduling
  - Initial placement vs. thread migration
- How to store the data?
  - Distributed heap (object store)
  - Java memory model (memory consistency)
  - Can an off-the-shelf DSM be used as the heap?
- How to process the bytecode?
  - Execution engine: interpretation, Just-in-Time (JIT) compilation, or static compilation
[Figure: DJVM components: thread scheduler, execution engine, heap.]

Related work

System                      Thread distribution    Execution           Heap
cJVM (IBM Haifa Research)   Remote creation        Interpreter         Embedded OO-based DSM (proxy), built-in object caching
JAVA/DSM (Rice University)  Manual distribution    Interpreter         Built on top of a page-based DSM
JESSICA (HKU)               Transparent migration  Interpreter         Built on top of a page-based DSM
Jackal, Hyperion            Remote creation        Static compilation  Linked to an object-based (OO-based) DSM

JESSICA2 (Java-Enabled Single-System-Image Computing Architecture)
[Figure: a multithreaded Java program running on one master JESSICA2 JVM and several worker JESSICA2 JVMs. Labels: thread migration, Global Object Space, JIT compiler mode, portable Java frame.]

JESSICA2 Main Features
- Transparent Java thread migration
  - Runtime capturing and restoring of thread execution context
  - No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
  - Enables dynamic load balancing on clusters
- JIT compiler-based execution engine (JITEE)
  - Operates in Just-In-Time (JIT) compilation mode, cluster-aware
- Global Object Space
  - A shared global heap spanning all cluster nodes
  - Provides location-transparent object access
  - Adaptive migrating home protocol for memory consistency, plus various optimizing schemes
- I/O redirection

JESSICA2 thread migration (in a JIT-enabled JVM)
[Figure: a thread migrating from a source node to a destination node in numbered steps, starting with (1) Alert. Source node: stack analysis, stack capturing. Destination node: frame parsing, restore execution. Components shown: thread scheduler, migration manager, load monitor, frames, method area, PC.]
- RTC: Raw Thread Context
- BTC: Bytecode-oriented Thread Context (thread id, frames, class names, method signature, PC, operand stack pointer, local variables, …)
- The RTC is transformed into the BTC directly inside the JIT compiler

Thread Stack Transformation
- Stack capturing: Raw Thread Context (RTC) to Bytecode-oriented Thread Context (BTC)
- Stack restoration: BTC back to RTC

Raw Thread Context (RTC):
    %esp:    0x00000000
    %esp+4:  0x082ca809
    %esp+8:  0x08225400
    %esp+12: 0x08266bc0
    ...
    %eax = 0x08623200
    %ebx = 0x08293100

Bytecode-oriented Thread Context (BTC):
    Frames{
      method CPI::run()V@111
      local=13; stack=0;
      var:
        arg0: CPI, 33, 0x8225400
        local1: [D; 33, 0x8266bc0@2
        local2: int, 2;
        ...
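As a rough illustration of what a bytecode-oriented thread context might hold, the following Java sketch models the fields the slides list for the BTC (thread id, frames, class name, method signature, PC, operand stack pointer, local variables). The class and field names are hypothetical and are not JESSICA2's actual data structures; the thread id is made up, and the sample frame copies the CPI::run()V@111 example shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one frame in a bytecode-oriented thread context (BTC).
final class BtcFrame {
    final String className;
    final String methodSignature;
    final int pc;                 // bytecode index, not a native address
    final int operandStackDepth;  // stands in for the operand stack pointer
    final List<String> locals = new ArrayList<>();

    BtcFrame(String className, String methodSignature, int pc, int operandStackDepth) {
        this.className = className;
        this.methodSignature = methodSignature;
        this.pc = pc;
        this.operandStackDepth = operandStackDepth;
    }

    // Adds a textual description of one local variable and returns this frame.
    BtcFrame local(String description) {
        locals.add(description);
        return this;
    }
}

// Hypothetical container: one thread id plus its frames.
final class BtcSketch {
    final long threadId;
    final List<BtcFrame> frames = new ArrayList<>();

    BtcSketch(long threadId) {
        this.threadId = threadId;
    }

    // Builds a context holding the single frame from the slide's example.
    static BtcSketch example() {
        BtcSketch ctx = new BtcSketch(1L); // thread id is an assumption
        ctx.frames.add(new BtcFrame("CPI", "run()V", 111, 0)
                .local("arg0: CPI")
                .local("local1: [D")
                .local("local2: int, 2"));
        return ctx;
    }

    // Renders the context in a form similar to the slide's BTC dump.
    String describe() {
        StringBuilder sb = new StringBuilder("thread " + threadId + "\n");
        for (BtcFrame f : frames) {
            sb.append("  method ").append(f.className).append("::")
              .append(f.methodSignature).append("@").append(f.pc)
              .append(" stack=").append(f.operandStackDepth)
              .append(" locals=").append(f.locals).append("\n");
        }
        return sb.toString();
    }
}
```

Because every field is expressed in bytecode-level terms, a context like this carries no native addresses or register contents, which is what would let a destination node rebuild the raw stack for its own compiled code.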

Details
[Figure: JIT compilation pipeline with migration support. Stages shown: bytecode verifier, bytecode translation, intermediate code, register allocation, code generation, native code, linking and constant resolution. Other labels: variables, thread stack, raw stack, Global Object Space.]
- Migration point selection: head of a basic block; INVOKESTATIC, INVOKESPECIAL, INVOKEVIRTUAL and INVOKEINTERFACE
- Construct control flow graph
- Numbered annotations in the figure:
  1. Migration checking
  2. Non-destructive register spilling
  3. Object checking
  4. Type spilling for variable type deducing
- Restore: Java frame detection (Java frame vs. C frame); register rebuild (mov var1->reg1, mov var2->reg2, ...)
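A minimal sketch of migration point selection, reading the slide as naming two kinds of candidate points: heads of basic blocks and the four invoke instructions. The instruction model is a deliberate simplification and an assumption of this example: instructions are addressed by list index rather than bytecode offset, branches carry a single target, and all class and method names are hypothetical rather than taken from the JESSICA2 JIT compiler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MigrationPointSketch {
    // Simplified instruction: an opcode name and a branch target index (-1 if none).
    static final class Insn {
        final String op;
        final int target;
        Insn(String op, int target) { this.op = op; this.target = target; }
    }

    static Insn insn(String op) { return new Insn(op, -1); }
    static Insn branch(String op, int target) { return new Insn(op, target); }

    static boolean isBranch(Insn in) { return in.target >= 0; }

    static boolean isInvoke(Insn in) {
        return in.op.equals("INVOKESTATIC") || in.op.equals("INVOKESPECIAL")
            || in.op.equals("INVOKEVIRTUAL") || in.op.equals("INVOKEINTERFACE");
    }

    // Returns the sorted indices of basic-block heads and invoke instructions.
    static Set<Integer> candidatePoints(List<Insn> code) {
        Set<Integer> points = new TreeSet<>();
        if (code.isEmpty()) return points;
        points.add(0); // the first instruction always starts a basic block
        for (int i = 0; i < code.size(); i++) {
            Insn in = code.get(i);
            if (isBranch(in)) {
                if (in.target < code.size()) points.add(in.target); // branch target starts a block
                if (i + 1 < code.size()) points.add(i + 1);         // so does the fall-through
            }
            if (isInvoke(in)) points.add(i);
        }
        return points;
    }

    // Roughly: for (i = 0; i < 10; i++) this.step();
    static List<Insn> exampleLoop() {
        return Arrays.asList(
            insn("ICONST_0"),          // 0
            insn("ISTORE_1"),          // 1
            insn("ILOAD_1"),           // 2  loop head
            insn("BIPUSH"),            // 3
            branch("IF_ICMPGE", 9),    // 4
            insn("ALOAD_0"),           // 5
            insn("INVOKEVIRTUAL"),     // 6
            insn("IINC"),              // 7
            branch("GOTO", 2),         // 8
            insn("RETURN"));           // 9
    }
}
```

For the example loop, `candidatePoints(exampleLoop())` yields the indices 0, 2, 5, 6 and 9: four basic-block heads plus the invoke at index 6. Placing a point at the loop head bounds how long a thread can run before it next checks for a migration request.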

Example of native code instrumentation (figure)

Optimization on migration points: pseudo-inlining
- Purpose: eliminate the cost of unnecessarily inserted migration points
- General idea: delete migration points (M-points) before a small method invocation
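The general idea can be sketched as a filter over invoke sites. The slide does not say what counts as a "small" method, so the size cutoff below is purely an assumption, as are the class and method names; the sketch shows only the shape of the optimization, not how JESSICA2 implements it.

```java
import java.util.ArrayList;
import java.util.List;

final class PseudoInliningSketch {
    // Assumed cutoff in bytes of callee bytecode; the slide gives no number.
    static final int SMALL_METHOD_BYTES = 16;

    // Returns the invoke sites that keep their migration point.
    // invokeSites[i] is a call site; calleeBytecodeSizes[i] is the size of its callee.
    static List<Integer> keptMigrationPoints(int[] invokeSites, int[] calleeBytecodeSizes) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < invokeSites.length; i++) {
            // A small callee returns quickly, so a check before the call adds cost
            // without meaningfully shortening the wait for the next migration point.
            if (calleeBytecodeSizes[i] > SMALL_METHOD_BYTES) {
                kept.add(invokeSites[i]);
            }
        }
        return kept;
    }
}
```

For example, with call sites at 6, 20 and 41 whose callees are 5, 120 and 16 bytes long, only the site at 20 keeps its migration point under this cutoff.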

Dynamic Register Patching
[Figure: the rebuilt thread stack, with the direction of stack growth marked. A bootstrap frame and a trampoline frame accompany the restored frames; each restored frame holds a return address and a saved %ebp.]
- frame 0: reg1 <- value1; reg2 <- value2; jmp restore_point0
- frame 1: reg1 <- value1; jmp restore_point1
- Compiled methods: Method0() contains restore_point0; Method1() contains restore_point1
- bootstrap() { trampoline(); closing handler(); }

Advantages of native code instrumentation
- Lightweight
  - Reuses the JIT compiler's internal data structures and control flow analysis functions
  - No need to include debugging information in Java class files
  - Instrumented native code is more efficient than instrumented bytecode
- Transparent
  - No source code modification
  - No new API introduced
  - No preprocessing

Global Object Space (GOS)
- Provides a global heap abstraction for the DJVM
- Home-based object coherence protocol, compliant with the JVM memory model
  - OO-based to reduce false sharing
- Non-blocking communication
  - Uses the threaded I/O interface inside the JVM for communication, to hide latency
- Adaptive object home migration mechanism
  - Takes advantage of JVM runtime information for optimization
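To make the idea of object home migration concrete, here is a minimal sketch of one possible policy. The slide does not describe how JESSICA2 decides when to move a home, so the rule used here (move the home to a remote node after a fixed number of consecutive writes from that node) and the threshold value are assumptions chosen only for illustration; all names are hypothetical.

```java
final class HomeMigrationSketch {
    // Assumed threshold; an adaptive protocol would tune this at run time.
    static final int THRESHOLD = 3;

    private int homeNode;
    private int lastRemoteWriter = -1;
    private int consecutiveRemoteWrites = 0;

    HomeMigrationSketch(int initialHome) {
        this.homeNode = initialHome;
    }

    int homeNode() {
        return homeNode;
    }

    // Records a write to the object by the given node and migrates the home if warranted.
    void recordWrite(int node) {
        if (node == homeNode) {
            // A write at the home breaks any remote write streak.
            lastRemoteWriter = -1;
            consecutiveRemoteWrites = 0;
            return;
        }
        if (node == lastRemoteWriter) {
            consecutiveRemoteWrites++;
        } else {
            lastRemoteWriter = node;
            consecutiveRemoteWrites = 1;
        }
        if (consecutiveRemoteWrites >= THRESHOLD) {
            // The remote node dominates the writes: make it the new home.
            homeNode = node;
            lastRemoteWriter = -1;
            consecutiveRemoteWrites = 0;
        }
    }
}
```

In a home-based protocol, remote writers must propagate their updates to the home copy, so moving the home to the dominant writer turns those remote updates into local ones.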

GOS runtime data structure
[Figure: layout of master and cache objects.]
- Master object: object header, cache pointer, object data
- Cache object: object header, cache pointer, cache data
- Cache header: master host id, master address, class, cache obj list
- Cache data (linked list of entries): thread id, status, cache data, next

Experimental environment
- HKU Gideon 300 Linux cluster: 300 P4 PCs (2 GHz, 512 MB RAM, 40 GB disk)
- Network: 312-port Foundry FastIron 1500 non-blocking switch (100 Mbit/s)

Migration overhead during normal execution (SPECJVM98 benchmark)

Benchmark   Time (seconds)                  Space (native code/bytecode)
            No migration  Migration         No migration  Migration
compress    11.31         11.39 (+0.71%)    6.89          7.58 (+10.01%)
jess        30.48         30.96 (+1.57%)    6.82          8.34 (+22.29%)
raytrace    24.47         24.68 (+0.86%)    7.47          8.49 (+13.65%)
db          35.49         36.69 (+3.38%)    7.01          7.63 (+8.84%)
javac       38.66         40.96 (+5.95%)    6.74          8.72 (+29.38%)
mpegaudio   28.07         29.28 (+4.31%)    7.97          8.53 (+7.03%)
mtrt        24.91         25.05 (+0.56%)    7.47          8.49 (+13.65%)
jack        37.78         37.90 (+0.32%)    6.95          8.38 (+20.58%)
Average                   (+2.21%)                        (+15.68%)
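The average time overhead can be checked from the per-benchmark timings. The sketch below assumes the average is the arithmetic mean of the eight per-benchmark percentages, which is an interpretation that the slide's figures are consistent with; the class and method names are hypothetical, and the two arrays copy the time columns of the table in benchmark order.

```java
final class OverheadSketch {
    // Execution times in seconds, in the order:
    // compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack.
    static final double[] NO_MIGRATION = {11.31, 30.48, 24.47, 35.49, 38.66, 28.07, 24.91, 37.78};
    static final double[] MIGRATION    = {11.39, 30.96, 24.68, 36.69, 40.96, 29.28, 25.05, 37.90};

    // Relative slowdown of the instrumented run, in percent.
    static double overheadPercent(double base, double instrumented) {
        return (instrumented - base) / base * 100.0;
    }

    // Arithmetic mean of the per-benchmark overheads, in percent.
    static double meanOverheadPercent(double[] base, double[] instrumented) {
        double sum = 0.0;
        for (int i = 0; i < base.length; i++) {
            sum += overheadPercent(base[i], instrumented[i]);
        }
        return sum / base.length;
    }

    public static void main(String[] args) {
        System.out.printf("mean time overhead: %.2f%%%n",
                meanOverheadPercent(NO_MIGRATION, MIGRATION));
    }
}
```

Computed this way, the mean comes out at roughly 2.2%, in line with the average reported for the time columns.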

Migration overhead analysis

Overall migration latency

Program (frame #)  LT (1)  CPI (1)  ASP (1)  N-Body (8)  SOR (2)
Latency (ms)       4.997   2.680    4.678    10.803      8.467

Migration time breakdown (LT program)

Frame #        1      2      4      6      8      10
Var #          4      15     37     59     81     103
Size (B)       201    417    849    1281   1713   2145
Capture (us)   202    266    410    495    605    730
Parse (us)     235    253    447    526    611    724
Create (us)    360    360    360    360    360    360
Compile (us)   478    575    847    1,169  1,451  1,720
Build (us)     7      11     14     16     21     28
Total (us)     1,282  1,465  2,078  2,566  3,048  3,562

GOS Optimizations (using 4 PCs)
- NO = no optimizations
- H = home migration
- HS = home migration + synchronized method shipping
- HSP = HS + object pushing

JESSICA2 vs JESSICA (CPI)

Application benchmark

Parallel Ray Tracing (using 64 nodes of the Gideon 300 cluster)
- Linux 2.4.18-3 kernel (Red Hat 7.3)
- 64 nodes: 108 seconds
- 1 node: 4402 seconds (about 1.2 hours)
- Speedup = 4402/108 ≈ 40.76
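The speedup figure follows from the two timings, and dividing it by the node count gives the parallel efficiency, a number the slide does not state. The sketch below uses the standard definitions (speedup as single-node time over parallel time, efficiency as speedup per node); the class and method names are hypothetical.

```java
final class SpeedupSketch {
    // Speedup: single-node time divided by parallel time.
    static double speedup(double singleNodeSeconds, double parallelSeconds) {
        return singleNodeSeconds / parallelSeconds;
    }

    // Parallel efficiency: speedup divided by the number of nodes used.
    static double efficiency(double singleNodeSeconds, double parallelSeconds, int nodes) {
        return speedup(singleNodeSeconds, parallelSeconds) / nodes;
    }

    public static void main(String[] args) {
        System.out.printf("speedup:    %.2f%n", speedup(4402, 108));
        System.out.printf("efficiency: %.2f%n", efficiency(4402, 108, 64));
    }
}
```

With the slide's timings the speedup is roughly 40.8 on 64 nodes, which corresponds to a parallel efficiency of roughly 0.64.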

Conclusions
- Transparent Java thread migration in the JIT compiler enables high-performance execution of multithreaded Java applications on clusters
- An embedded GOS layer can take advantage of JVM runtime information to reduce communication overhead

Future work
- Advanced thread migration mechanism without overhead during normal execution (finished)
- Incremental distributed GC
- Enhanced Single I/O Space to benefit more real-life applications
- Parallel I/O support

Thanks
JESSICA2 webpage: http://www.csis.hku.hk/~clwang/projects/JESSICA2.html
