Published by Reynold McGee. Modified over 9 years ago.
1
1 Improving Instruction Locality with Just-In-Time Code Layout
J. Bradley Chen and Bradley D. D. Leupen
Division of Engineering and Applied Sciences, Harvard University
2
2 Goals
Improve instruction reference locality
  - big problem for commodity applications
Eliminate need for profile information
  - required by current compiler-based solutions
3
3 How?
Implement layout dynamically using Activation Order: a new heuristic for code layout.
Locate procedures in order of use.
4
4 Requirements
No special hardware support.
Minimal changes to the operating system.
Minimal system overhead.
5
5 Optimizing Procedure Layout
[Figure: Bad Layout vs. Better Layout]
6
6 Current Practice: Pettis and Hansen
Nodes are procedures.
Edges are caller/callee pairs.
Weights are call frequency.
[Figure: call graph over WinMain(), Initialize(), EventLoop(), GetEvent(), React(), HandleRareCase(), HandleInputError(), CheckForInputError(), HandleCommonCase(); edge weights 1, 1, 129394, 68754, 128404, 1, 10, 68753]
7
7 Pettis and Hansen Layout
Initial graph: EventLoop(), GetEvent(), React(), CheckForInputError(), HandleCommonCase(); edge weights 129394, 68754, 128404, 68753
  layout: []
After merging into Node-1: EventLoop(), React(), HandleCommonCase(); edge weights 129394, 68754, 68753
  layout: [GetEvent, CheckForInputError]
After merging into Node-2: React(), HandleCommonCase(); edge weights 68754, 68753
  layout: [EventLoop, GetEvent, CheckForInputError]
After merging into Node-3: HandleCommonCase(); edge weight 68753
  layout: [React, EventLoop, GetEvent, CheckForInputError]
After merging into Node-4:
  layout: [HandleCommonCase, React, EventLoop, GetEvent, CheckForInputError]
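The merging steps on this slide can be approximated by a greedy loop that repeatedly joins the two chains connected by the heaviest remaining edge. The following is a simplified sketch, not the full Pettis and Hansen algorithm: it always concatenates the two chains in a fixed order, whereas the original method also chooses which chain ends to join. The function name `greedy_layout` and the procedure names and weights in the usage note are hypothetical and are not taken from the slide's figure.

```python
def greedy_layout(edges):
    """Greedy chain merging over a weighted, undirected call graph.

    edges: list of (caller, callee, weight) tuples.
    Returns a list of procedure names in layout order.
    Simplification (assumption): merged chains are concatenated
    in a fixed order rather than oriented by end-point weights.
    """
    # Every procedure starts in its own single-element chain.
    chain_of = {}
    for a, b, _ in edges:
        chain_of.setdefault(a, (a,))
        chain_of.setdefault(b, (b,))

    while True:
        # Sum the weights of all edges that still cross two chains.
        weights = {}
        for a, b, w in edges:
            ca, cb = chain_of[a], chain_of[b]
            if ca != cb:
                key = (ca, cb) if ca < cb else (cb, ca)
                weights[key] = weights.get(key, 0) + w
        if not weights:
            break
        # Merge the pair of chains joined by the heaviest total weight.
        (first, second), _ = max(weights.items(),
                                 key=lambda kv: (kv[1], kv[0]))
        merged = first + second
        for proc in merged:
            chain_of[proc] = merged

    # Flatten the remaining chains, keeping first-seen order.
    layout = []
    for chain in dict.fromkeys(chain_of.values()):
        layout.extend(chain)
    return layout
```

For instance, `greedy_layout([("A", "B", 10), ("B", "C", 50), ("C", "D", 5)])` first merges B with C, then attaches A, then D. Note that the weights must come from a profiling run, which is the dependency JITCL aims to remove.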
8
8 A New Heuristic
Activation Order: Co-locate procedures that are activated sequentially.
Example:
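One way to picture the Activation Order heuristic is to walk a trace of procedure activations and give each procedure the next free offset the first time it appears. This is a minimal sketch under assumptions: the function name `activation_order_layout`, the trace format, and the procedure sizes are all hypothetical, and the real system decides placement at run time rather than from a recorded trace.

```python
def activation_order_layout(trace, sizes):
    """Assign text offsets to procedures in order of first activation.

    trace: iterable of procedure names in the order they are activated.
    sizes: dict mapping procedure name -> size in bytes (hypothetical).
    Returns a dict mapping procedure name -> start offset. Procedures
    that never appear in the trace are not placed at all.
    """
    layout = {}
    next_offset = 0
    for proc in trace:
        if proc not in layout:
            # First activation: place the procedure right after the
            # previously placed one.
            layout[proc] = next_offset
            next_offset += sizes[proc]
    return layout


# Hypothetical usage: "rare" is never activated, so it gets no offset.
sizes = {"main": 64, "init": 32, "loop": 128, "get": 48, "rare": 512}
trace = ["main", "init", "loop", "get", "loop", "get"]
layout = activation_order_layout(trace, sizes)
```

Because insertion order is preserved, `list(layout)` gives the resulting procedure order directly.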
9
9 Implementing JITCL
__start:
    perform initializations
    call thunk_main
thunk_main: ...
thunk_foo: ...
__InstructionMemory:
Thunk routines implement code layout on-the-fly.
10
10 Thunk routines
// Global variables:
//   ProcPointers[] - one element per procedure
//   INDEX_proc and LENGTH_proc for each procedure
thunk_main:
    if (InCodeSegment(ProcPointers[INDEX_main]))
        ProcPointers[INDEX_main] = CopyToTextSegment(ProcPointers[INDEX_main], LENGTH_main);
    PatchCallSite(ProcPointers[INDEX_main], ComputeCallSiteFromReturnAddress(RA));
    jmp ProcPointers[INDEX_main];
The thunk routines copy procedures into the text segment and update call sites at run-time.
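The thunk behavior described on this slide can be modeled without touching real machine code. The sketch below only tracks addresses and call sites; it is an illustrative model under assumptions, not the authors' implementation. The class name `JitLayout`, the `call` method, and the call-site labels are hypothetical.

```python
class JitLayout:
    """Toy model of thunk-driven code layout (addresses only)."""

    def __init__(self, lengths):
        self.lengths = dict(lengths)  # procedure name -> length in bytes
        self.address = {}             # procedure name -> text-segment address
        self.next_free = 0            # next free byte in the text segment
        self.patched = set()          # (call_site, callee) pairs already patched
        self.thunk_calls = 0          # how often a thunk had to run

    def call(self, call_site, callee):
        """Simulate a call from call_site to callee; return the target address."""
        if (call_site, callee) in self.patched:
            # The call site was patched earlier, so the call is direct.
            return self.address[callee]
        self.thunk_calls += 1
        if callee not in self.address:
            # First activation: "copy" the procedure into the text segment.
            self.address[callee] = self.next_free
            self.next_free += self.lengths[callee]
        # Patch the call site so later calls skip the thunk.
        self.patched.add((call_site, callee))
        return self.address[callee]


# Hypothetical usage: "bar" is never called and is never copied.
jit = JitLayout({"main": 64, "foo": 32, "bar": 128})
a_main = jit.call("start+0", "main")
a_foo = jit.call("main+8", "foo")
a_foo_again = jit.call("main+8", "foo")   # patched site, no thunk
a_foo_other = jit.call("main+20", "foo")  # new site: thunk runs, no copy
```

In this model each thunk runs at most once per call site and each procedure is copied at most once, and procedures that are never called take no space in the laid-out text segment.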
11
11 Simulation Methodology
Platform        UNIX/RISC       Win32/x86
Simulation      ATOM            Etch
Associativity   Direct-Mapped   2-Way
Cache Size      8K              8K
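To illustrate why layout matters for a cache of this size, the sketch below counts instruction-cache misses for a trace of procedure activations under a given layout. It is a rough model under stated assumptions, not the ATOM or Etch tooling: the 32-byte line size, the LRU replacement policy, and the rule that each activation touches every line of the procedure once are all assumptions, and `count_misses` is a hypothetical name.

```python
from collections import OrderedDict


def count_misses(trace, layout, sizes,
                 cache_bytes=8192, line_bytes=32, ways=1):
    """Count I-cache misses for a trace of procedure activations.

    layout: dict procedure -> start address; sizes: procedure -> bytes.
    ways=1 models a direct-mapped cache, ways=2 a 2-way cache with
    LRU replacement (line size and LRU are assumptions).
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for proc in trace:
        first = layout[proc] // line_bytes
        last = (layout[proc] + sizes[proc] - 1) // line_bytes
        for line in range(first, last + 1):
            cache_set = sets[line % num_sets]
            if line in cache_set:
                cache_set.move_to_end(line)        # mark as most recently used
            else:
                misses += 1
                cache_set[line] = True
                if len(cache_set) > ways:
                    cache_set.popitem(last=False)  # evict least recently used
    return misses


# Hypothetical example: two 64-byte procedures that call each other.
sizes = {"a": 64, "b": 64}
trace = ["a", "b"] * 10
bad = {"a": 0, "b": 8192}   # both map to the same cache sets
good = {"a": 0, "b": 64}    # adjacent, no conflict
bad_misses = count_misses(trace, bad, sizes)
good_misses = count_misses(trace, good, sizes)
bad_misses_2way = count_misses(trace, bad, sizes, ways=2)
```

In this toy case the conflicting layout misses on every line of every activation in the direct-mapped cache, while the adjacent layout (or a 2-way cache) leaves only the cold misses.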
12
12 Workloads
13
13 Results
The AO heuristic is effective.
The overhead of JITCL is negligible.
JITCL improves procedure layout without requiring profile information.
JITCL reduces program memory requirements.
14
14 Results: The AO Heuristic
[Figure: Improvement in I-Cache Miss Rate]
Conclusion: Effectiveness of the heuristic is comparable to P&H.
15
15 Overhead of JITCL
Copy overhead
  - instruction overhead
  - cache overhead
Cache consistency
Disk overhead: comparable to demand-loaded text; not evaluated.
16
16 Results: Overhead
[Figure: Overhead Instructions (%)]
Conclusion: JITCL overhead is less than 0.1% in all cases.
17
17 Results: Performance
[Figure: Saved Cycles per Instruction]
Conclusion: Overall performance is comparable to P&H.
18
18 JITCL for Win32 Applications
Windows applications are composed of multiple executable modules.
When transitions between modules are frequent, intra-module code layout is less effective.
With JITCL, inter-module code layout is possible and beneficial.
19
19 Win32 Cache Miss Rates
Conclusion: Careful layout did not help Win32 applications.
20
20 Text Segment Size
[Figure: Text size in megabytes]
Conclusion: JITCL typically reduces text size by 50%.
21
21 JITCL vs. PBO
JITCL provides an alternative to feedback-based procedure layout.
Many important optimizations still require profile information.
  - instruction scheduling
  - register allocation
  - other intra-procedural optimizations
Don’t expect profile-based optimization to go away!
22
22 Conclusions
Just-In-Time code layout achieves benefit comparable to profile-based code layout without the need for profiles.
The AO heuristic is effective.
The overhead of procedure copying is low.
Benefit in I-Cache is comparable to Pettis and Hansen layout.
JITCL can reduce working set size.
23
23 The Morph Project
For more information: http://www.eecs.harvard.edu/morph/