Download presentation
Presentation is loading. Please wait.
Published byDarrell Brown Modified over 9 years ago
1
Research supported by IBM CAS, NSERC, CITO Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters Marc Berndl Benjamin Vitale Mathew Zaleski Angela Demke Brown
2
Context Threading Interpreter performance Why not just JIT? High performance JVMs still interpret People use interpreted languages that don’t yet have JITs They still want performance! 30-40% of execution time is due to branch misprediction Our technique eliminates 95% of branch mispredictions
3
Context Threading Overview üMotivation Background: The Context Problem Existing Solutions Our Approach Inlining Results
4
Context Threading load A Tale of Two Machines Loaded Program Virtual Program Return Address Wayness (Conditional) Execution Cycle Bytecode Bodies Pipeline Target Address (Indirect) Predictors Execution Cycle Virtual Machine Interpreter Real Machine CPU
5
Context Threading Interpreter Loaded Program Bytecode bodies Internal Representation fetch dispatch Load Parms execute Execution Cycle
6
Context Threading 0: iconst_0 1: istore_1 2: iload_1 3: iload_1 4: iadd 5: istore_1 6: iload_1 7: bipush 64 9: if_icmplt 2 12: return Running Java Example void foo(){ int i=1; do{ i+=i; } while(i<64); } Java Source Java Bytecode Javac compiler
7
Context Threading while(1){ opcode = *vpc++; switch(opcode){ } }; Switched Interpreter … iload_1 iadd istore_1 iload_1 bipush 64 if_icmplt 2 … Virtual Program Switched Body Implementation case iload_1:.. break; case iadd:.. break; -7 64 Internal Representation vPC Simple, portable and extremely slow
8
Context Threading while(1){ opcode = *vPC++; switch(opcode){ //and many more.. } }; Switched Interpreter case iload_1:.. break; case iadd:.. break; slow. burdened by switch and loop overhead
9
Context Threading 9 “Threading” Dispatch ‣ No switch overhead. Data driven indirect branch. execution of virtual program “threads” through bodies (as in needle & thread) iload_1:.. goto *vPC++; iadd:.. goto *vPC++; istore:.. goto *vPC++; 0: iconst_0 1: istore_1 2: iload_1 3: iload_1 4: iadd 5: istore_1 6: iload_1 7: bipush 64 9: if_icmplt 2 12: return
10
Context Threading 10 0: iconst_0 1: istore_1 2: iload_1 3: iload_1 4: iadd 5: istore_1 6: iload_1 7: bipush 64 9: if_icmplt 2 12: return Context Problem ‣ Data driven indirect branches hard to predict iload_1:.. goto *vPC++; iadd:.. goto *vPC++; istore:.. goto *vPC++; indirect branch predictor (micro-arch)
11
Context Threading 11 “Threading” Dispatch ‣ No switch overhead. Data driven indirect branch. execution of virtual program “threads” through bodies (as in needle & thread) iload_1:.. goto *vPC++; iadd:.. goto *vPC++; istore:.. goto *vPC++;
12
Context Threading Direct Threaded Interpreter -7 &&if_icmplt 64 &&bipush &&iload_1 &&istore_1 &&iadd &&iload_1 … iload_1 iadd istore_1 iload_1 bipush 64 if_icmplt 2 … DTT - Direct Threading Table Virtual Program vPC iload_1:.. goto *vPC++; iadd:.. goto *vPC++; Target of computed goto is data-driven C implementation of each body istore:.. goto *vPC++;
13
Context Threading iload: goto *vpc iadd: goto *vpc istore: goto *vpc... iload iadd istore... virtual program (DTT) indirect branch predictor (micro-arch) vpc direct threaded bytecode bodies (native code) pc of dispatch branch insufficient context for prediction Context Problem
14
Context Threading Context Problem &&iload_1 &&iadd &&istore_1 &&iload_1 &&bipush 64 &&if_icmplt -7 DTT - Direct Threading Table iload_1:.. goto *vPC++; Indirect Branch Predictors iload_1 iadd bipush vPC
15
Context Threading Existing Solutions Body GOTO *PC ???? Piumarta & Ricardi : Bodies Replicated Super InstructionReplicate iload_1 goto *pc 1 iload_1 goto *pc 2 1 1 2 2 Ertl & Gregg: Bodies and Dispatch Replicated Limited to relocatable virtual instructions
16
Context Threading Overview üMotivation üBackground: The Context Problem üExisting Solutions Our Approach Inlining Results
17
Context Threading Key Observation Virtual and native control flow similar Linear or straight-line code Conditional branches Calls and Returns Indirect branches Hardware has predictors for each type Direct uses indirect branch for everything! ‣ Solution: Leverage hardware predictors
18
Context Threading Key Observation Virtual and native control flow have same branch types Linear (not really a branch) Conditional Calls and Returns Indirect Hardware has predictors for each type ‣ Solution: Leverage hardware predictors
19
Context Threading Essence of our Solution iload_1:.. ret; iadd:.. ret;.. call iload_1 call istore_1 call iadd call iload_1 CTT - Context Threading Table (generated code) Bytecode bodies (ret terminated) Return Branch Predictor Stack … iload_1 iadd istore_1 iload_1 bipush 64 if_icmplt 2 … Package bodies as subroutines and call them
20
Context Threading Subroutine Threading iload_1: … ret; iadd: … ret; call bipush call if_icmplt call iload_1 call istore_1 call iadd call iload_1 CTT load time generated code Bytecode bodies (ret terminated) if_cmplt: … goto *vPC++; virtual branch instructions as before … iload_1 iadd istore_1 iload_1 bipush 64 if_icmplt 2 … 64 -7 DTT contains addresses in CTT vPC
21
Context Threading Virtual Branches target: … … … … call … call iload_1 call if_icmplt … Context Threading Table 5 DTT vPC if(icmplt) pc=target; goto *vPC++; Virtual Branch body Context problem remains for virtual branches
22
Context Threading The Context Threading Table A sequence of generated call instructions Good alignment of virtual and hardware control flow for straight-line code. ‣ Can virtual branches go into the CTT?
23
Context Threading Specialized Branch Inlining Conditional Branch Predictor now mobilized … … target: … call … call iload_1 if(icmplt) goto target: … Branch Inlined Into the CTT 5 DTT vPC target : … … Inlining conditional branches provides context
24
Context Threading Tiny Inlining Context Threading is a dispatch technique But, we inline branches Some non-branching bodies are very small Why not inline those? ► Inline all tiny linear bodies into the CTT
25
Context Threading What can go in the CTT? Calls to bodies Inlined bodies Mixed-Mode virtual machine? ‣ Performance?
26
Context Threading Overview üMotivation üBackground: The Context Problem üExisting Solutions üOur Approach üInlining Results
27
Context Threading Experimental Setup Two Virtual Machines on two hardware architectures. VM: Java/SableVM, OCaml interpreter Compare against direct threaded SableVM SableVM distro uses selective inlining Arch: P4, PPC Branch Misprediction Execution Time ► Is our technique effective and general?
28
Context Threading Mispredicted Taken Branches Normalized to Direct Threading 95% mispredictions eliminated on average SableVm/Java Pentium 4
29
Context Threading Execution time Normalized to Direct Threading 27% average reduction in execution time Pentium 4
30
Context Threading Execution Time (geomean) Normalized to Direct Threading Our technique is effective and general
31
Context Threading Conclusions Context Problem: branch mispredictions due to mismatch between native and virtual control flow Solution: Generate control flow code into the Context Threading Table Results Eliminate 95% of branch mispredictions Reduce execution time by 30-40% ‣ recent, post CGO 2005, work follows
32
Context Threading 32 What about Scripting Languages? Recently ported context threading to TCL. 10x cycles executed per bytecode dispatched. Much lower dispatch overhead. Speedup due to subroutine threading, approx. 5%. TCL conference 2005 Cycles per virtual instruction
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.