Buffered dynamic run-time profiling of arbitrary data for Virtual Machines which employ interpreter and Just-In-Time (JIT) compiler
Compiler workshop ’08

Nikola Grcevski, IBM Canada Lab

Agenda
– The motivation for and importance of profiling
– Design and implementation of the J9 VM interpreter profiler
– Performance results and start-up overhead

The static vs. dynamic compiler
– Static compilers can take their time to analyze the code and perform intraprocedural analysis.
– Dynamic Just-In-Time compilers don't have this luxury: compilation happens while the application is running.
– Can dynamic compilers ever produce optimized code of a quality comparable to that of static compilers?

Why profile?
– A whole category of speculative optimizations relies on some type of profiling information.
– Profiling opens up opportunities for new code and memory optimizations.
– It is critical for high-performance dynamic compiler systems.

What could we profile?
– Pretty much anything that we expect to provide repeatable information we can use to optimize.
– Profiling can happen at the Java level, or at the CPU level if the OS supports it.

What kinds of profilers does J9 have?
– JIT profiler
  – Instruments methods with various profiling hooks
  – Targeted only at methods that are very hot
  – Temporary, and slows down execution while active
– Interpreter profiler
  – The topic of this presentation

What kinds of data do we collect with the interpreter profiler?
– Branch direction
– Virtual/interface call targets
– Switch statement index
– instanceof and checkcast runtime types

Interpreter profiler design
– Buffered approach to data collection on the application threads
[Figure: application threads 1 to N, each executing a bytecode stream such as if, vcall, icall, switch, mul, add, div]
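
A minimal sketch of the per-thread buffering idea, written in Python only for brevity. Each application thread appends (bytecode PC, kind, value) records to a buffer that it alone owns, so the interpreter's hot path never takes a lock. `ProfilingBuffer`, `thread_buffer` and the record tuple are hypothetical illustrations, not J9 APIs; a real VM would more likely keep a raw byte buffer in its thread structure.

```python
import threading

class ProfilingBuffer:
    """Hypothetical per-thread buffer of (bytecode_pc, kind, value) records."""
    def __init__(self):
        self.records = []

    def append(self, pc, kind, value):
        # Only the owning thread writes here, so no lock is taken.
        self.records.append((pc, kind, value))

_local = threading.local()

def thread_buffer():
    # Lazily create one buffer per application thread.
    buf = getattr(_local, "buffer", None)
    if buf is None:
        buf = ProfilingBuffer()
        _local.buffer = buf
    return buf

def interpret(events, results):
    # Simulated interpreter loop: only profiled bytecodes (if, vcall, switch...)
    # produce records; arithmetic bytecodes are simply not listed in `events`.
    for pc, kind, value in events:
        thread_buffer().append(pc, kind, value)
    results[threading.current_thread().name] = list(thread_buffer().records)

results = {}
t1 = threading.Thread(target=interpret, name="T1",
                      args=([(0x10, "branch", 1), (0x14, "vcall", "A")], results))
t2 = threading.Thread(target=interpret, name="T2",
                      args=([(0x20, "switch", 3)], results))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()
```

Each thread ends up with only its own records, which is what lets the collection path stay lock-free.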

Interpreter profiler design
– A buffer-full event triggers processing of the data by the JIT
[Figure: application thread 1 fills its buffer with if, vcall, if, switch, if records; the buffer-full event passes the data to the JIT runtime]
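
A sketch of the buffer-full event, assuming a fixed-capacity buffer and a callback standing in for the JIT runtime. The class name, `capacity` and `on_full` are hypothetical. The slide does not say whether the handler runs on the application thread or is queued for a JIT thread; here it is simply a synchronous call.

```python
class ProfilingBuffer:
    """Hypothetical fixed-capacity buffer that hands its contents to the JIT when full."""
    def __init__(self, capacity, on_full):
        self.capacity = capacity
        self.on_full = on_full      # stands in for the JIT runtime's handler
        self.records = []

    def append(self, pc, kind, value):
        self.records.append((pc, kind, value))
        if len(self.records) == self.capacity:
            # Buffer-full event: give the JIT a snapshot, then reuse the buffer.
            self.on_full(list(self.records))
            self.records.clear()

processed = []   # batches received by the simulated JIT runtime
buf = ProfilingBuffer(capacity=3, on_full=processed.append)
for i, kind in enumerate(["if", "vcall", "if", "switch", "if"]):
    buf.append(0x100 + i, kind, None)
```

After five records with a capacity of three, one batch has been handed off and two records are still waiting in the buffer.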

Interpreter profiler design
– The JIT parses the application thread's profiling buffer and builds an internal profiling data structure
[Figure: the JIT runtime reads bytecode program counters and data from the profiling buffer and inserts them into the JIT profiling hashtable, using a hash function based on the bytecode PC]
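
A small chained hashtable keyed by bytecode PC illustrates the JIT-side structure. The hash function (`pc % num_buckets`), the bucket count and the entry layout are assumptions made for the example; the slide only says the hash is based on the bytecode PC.

```python
class PcHashTable:
    """Hypothetical chained hashtable keyed by bytecode PC."""
    def __init__(self, num_buckets=64):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, pc):
        # Assumed hash function: bytecode PC modulo the bucket count.
        return pc % len(self.buckets)

    def get_or_create(self, pc, factory):
        bucket = self.buckets[self._index(pc)]
        for key, entry in bucket:
            if key == pc:
                return entry
        entry = factory()
        bucket.append((pc, entry))   # colliding PCs share a bucket (chaining)
        return entry

    def lookup(self, pc):
        for key, entry in self.buckets[self._index(pc)]:
            if key == pc:
                return entry
        return None

def process_buffer(table, records):
    # JIT-side parse of one buffer: one entry per PC, here just counting hits.
    for pc, kind, value in records:
        entry = table.get_or_create(pc, lambda: {"kind": kind, "count": 0})
        entry["count"] += 1

table = PcHashTable(num_buckets=8)
process_buffer(table, [(0x10, "if", 1), (0x18, "if", 0), (0x10, "if", 1)])
```

With eight buckets, PCs 0x10 and 0x18 collide in bucket 0 and are kept apart by the chain.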

What’s in the data we collect?
– Bytecode program counter
– Variable-size data packet
  – 1 byte for branch direction
  – Word size for call targets and runtime types
  – 4 bytes for switch index
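
The record layout can be sketched with Python's `struct` module: a word-size PC (8 bytes, assuming a 64-bit VM) followed by a payload of 1, 8 or 4 bytes. The 1-byte kind tag is an assumption added so the parser is self-contained; the slide does not say how packet kinds are distinguished, and a VM could instead infer the kind from the bytecode at the recorded PC.

```python
import struct

# Hypothetical kind tags (not from the slides).
BRANCH, CALL_TARGET, SWITCH = 0, 1, 2
# Payload sizes from the slide: 1 byte, word size (8 bytes assumed), 4 bytes.
PAYLOAD_FORMAT = {BRANCH: "<B", CALL_TARGET: "<Q", SWITCH: "<I"}
HEADER = "<QB"   # 8-byte PC + 1-byte tag, no padding

def encode(pc, kind, value):
    return struct.pack(HEADER, pc, kind) + struct.pack(PAYLOAD_FORMAT[kind], value)

def decode(buf):
    records, offset = [], 0
    while offset < len(buf):
        pc, kind = struct.unpack_from(HEADER, buf, offset)
        offset += struct.calcsize(HEADER)
        fmt = PAYLOAD_FORMAT[kind]
        (value,) = struct.unpack_from(fmt, buf, offset)
        offset += struct.calcsize(fmt)   # variable-size step to the next record
        records.append((pc, kind, value))
    return records

buf = (encode(0x1000, BRANCH, 1)
       + encode(0x1004, CALL_TARGET, 0x7F00DEAD0000)
       + encode(0x1010, SWITCH, 5))
```

Variable-size packets keep the common case (branches) compact at the cost of a sequential parse.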

Processing the buffered branch information
– We create an object to hold the bytecode PC and the branch counts.
– We use 4 bytes to store the branch information.
[Diagram: pc; taken | not taken]
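
One way to fit "taken | not taken" into 4 bytes is two 16-bit counters packed into a 32-bit word. The 16/16 split and the overflow policy (halving both counts to preserve the ratio) are assumptions; the slide gives only the total size.

```python
MASK16 = 0xFFFF

def update_branch(counter, taken):
    """Packs taken (high 16 bits) | not taken (low 16 bits) into one 32-bit value."""
    hi, lo = counter >> 16, counter & MASK16
    if taken:
        hi += 1
    else:
        lo += 1
    if hi > MASK16 or lo > MASK16:
        # Assumed overflow policy: halve both counts so the ratio survives.
        hi, lo = hi >> 1, lo >> 1
    return (hi << 16) | lo

def taken_ratio(counter):
    hi, lo = counter >> 16, counter & MASK16
    total = hi + lo
    return hi / total if total else 0.5   # no data: treat as unbiased

c = 0
for outcome in [True, True, True, False]:
    c = update_branch(c, outcome)
```

The compiler mostly needs the ratio (for block ordering, for example), which is why rescaling on overflow is an acceptable simplification in this sketch.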

What does the JIT do with the call information?
– We keep up to 3 call targets with their counts, as well as a residue count.
– We use the same approach for checkcast and instanceof.
[Diagram: pc; Class A; Class B; Class C; count; residue]
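
A sketch of a call-site profile holding at most three (class, count) pairs plus a residue count. Classes are represented by name strings, and the policy of never evicting a tracked class is an assumption. The same structure would serve checkcast and instanceof sites by recording runtime types instead of receiver classes.

```python
class CallSiteProfile:
    """Up to MAX_TARGETS (class, count) pairs plus a residue count (hypothetical)."""
    MAX_TARGETS = 3

    def __init__(self):
        self.targets = {}   # receiver class -> count
        self.residue = 0    # calls whose receiver class is not tracked

    def record(self, cls):
        if cls in self.targets:
            self.targets[cls] += 1
        elif len(self.targets) < self.MAX_TARGETS:
            self.targets[cls] = 1
        else:
            # Table full: count the call but do not track the new class (assumed policy).
            self.residue += 1

    def total(self):
        return sum(self.targets.values()) + self.residue

p = CallSiteProfile()
for cls in ["A", "B", "A", "C", "D", "A", "E"]:
    p.record(cls)
```

The residue count keeps the total honest, so the compiler can tell a truly monomorphic site from one where untracked classes also appear.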

What does the JIT do with the switch information?
– We create a data structure to hold the bytecode PC and the counts for each switch index.
– The index data is 8 bytes wide, split into 4 records: the top 3 and the rest.
– Each record is split into 2 portions: a 1-byte count and a 1-byte switch index.
[Diagram: pc; record 1 | record 2 | record 3 | the rest, where each record is count | index]
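
The 8-byte switch profile can be sketched as three (count, index) records plus a "rest" record, each 2 bytes. Several details are assumptions: slots are filled first-come rather than re-ranked, counts saturate at 255, and the index byte of the "rest" record is left as zero.

```python
import struct

MAX_BYTE = 0xFF   # counts and indices each occupy a single byte

class SwitchProfile:
    """Hypothetical 8-byte switch profile: 3 (count, index) records + a 'rest' record."""
    def __init__(self):
        self.slots = []   # up to 3 [count, index] pairs
        self.rest = 0     # count for all other indices

    def record(self, index):
        if not 0 <= index <= MAX_BYTE:
            raise ValueError("switch index must fit in one byte")
        for slot in self.slots:
            if slot[1] == index:
                slot[0] = min(slot[0] + 1, MAX_BYTE)   # saturating (assumed)
                return
        if len(self.slots) < 3:
            self.slots.append([1, index])
        else:
            self.rest = min(self.rest + 1, MAX_BYTE)

    def pack(self):
        # Four 2-byte records laid out as count | index; empty slots are zero.
        fields = []
        for count, index in self.slots + [[0, 0]] * (3 - len(self.slots)):
            fields += [count, index]
        fields += [self.rest, 0]   # index byte unused for the 'rest' record
        return struct.pack("<8B", *fields)

sp = SwitchProfile()
for idx in [2, 2, 5, 7, 2, 9, 9]:
    sp.record(idx)
```

Here index 2 was hit three times, 5 and 7 once each, and the two hits on 9 fall into "the rest".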

Storing the profiling data
– Each data record is stored in a global hashtable, using the PC for the hash function.
– On subsequent encounters of the same PC with profiling data, the records are updated:
  – Branch and switch counts are incremented
  – Call targets and runtime types are added and their counts incremented
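
The update rules can be sketched with a Python dict standing in for the global hashtable. The record kinds and entry shapes are hypothetical, and the size caps described on the earlier slides are omitted to keep the example short.

```python
from collections import Counter

profile_table = {}   # hypothetical global table: bytecode PC -> profile entry

def update(pc, kind, value):
    if kind == "branch":
        # value: True for taken, False for not taken.
        entry = profile_table.setdefault(pc, [0, 0])   # [taken, not taken]
        entry[0 if value else 1] += 1
    elif kind in ("switch", "call", "type"):
        # switch: value is the index; call/type: value is the class.
        # An unseen value is added with count 1, a known one is incremented.
        profile_table.setdefault(pc, Counter())[value] += 1
    else:
        raise ValueError(f"unknown record kind: {kind}")

for rec in [(0x10, "branch", True), (0x10, "branch", False), (0x10, "branch", True),
            (0x20, "call", "ArrayList"), (0x20, "call", "LinkedList"),
            (0x20, "call", "ArrayList"), (0x30, "switch", 4)]:
    update(*rec)
```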

Using the profiling information
– The profiler database only knows about bytecode PCs.
– At every point where the compiler is interested in profiling information, it computes the bytecode PC from the method information and the bytecode index.
– The compiler has to make sense of the information in the hashtable.
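
A sketch of the compiler-side lookup, assuming the bytecode PC is the method's bytecode base address plus the bytecode index. `Method`, `bytecode_start` and `profile_at` are hypothetical names, and the table is a plain dict.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Method:
    """Hypothetical method descriptor; bytecode_start is where its bytecodes live."""
    name: str
    bytecode_start: int

def bytecode_pc(method, bytecode_index):
    # The profiler keys on absolute PCs, so the compiler rebuilds one from
    # the method's bytecode base address and the bytecode index.
    return method.bytecode_start + bytecode_index

def profile_at(table, method, bytecode_index):
    # None means the interpreter never recorded data for this bytecode.
    return table.get(bytecode_pc(method, bytecode_index))

m = Method("foo", bytecode_start=0x7000)
table = {0x7000 + 12: [40, 2]}   # e.g. a branch at index 12: taken 40, not taken 2
```

The entry itself carries no type information in this sketch, so interpreting `[40, 2]` as branch counts still depends on the compiler knowing which bytecode sits at that index, which is the "make sense of the information" step.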

Interpreter profiler design
– The JIT compiler consults the profiling hashtable at various stages of method compilation
[Figure: the compilation thread's inliner, code ordering and code generation stages query the JIT profiling hashtable]
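
As an example of how an inliner might consume call-target data, the sketch below picks a devirtualization target only when one class dominates the site. This is the general technique of guarded devirtualization, not necessarily what the J9 inliner does, and the 90% threshold is an arbitrary assumption.

```python
def choose_devirtualization_target(targets, residue, min_fraction=0.9):
    """Return the dominant receiver class if it covers at least min_fraction of calls.

    targets: dict of class name -> count for the profiled targets.
    residue: number of calls whose receiver class was not tracked.
    """
    total = sum(targets.values()) + residue
    if total == 0 or not targets:
        return None   # no usable profile: leave the call virtual
    cls, count = max(targets.items(), key=lambda kv: kv[1])
    return cls if count / total >= min_fraction else None
```

A compiler using such a choice would typically emit a class test guarding the inlined body, with the ordinary virtual call as the fallback path.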

Performance results
– Up to 30% improvement on various applications
  – EJB and other middleware applications benefit mostly from code ordering and from devirtualization for the purpose of inlining
  – Benchmarks typically benefit from other optimizations enabled by the ability to devirtualize virtual and interface calls
– With various tweaks we managed to drive the start-up overhead below 10%

How do we manage the profiling overhead?
– We turn the profiler off in -Xquickstart mode
– No locking on the hashtable
– We detect the startup phase of the application and skip records to ease the data-collection overhead
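
Skipping records during startup can be sketched as keeping one of every N records while a startup flag is set. `SamplingRecorder`, `skip_ratio` and the `in_startup` flag are hypothetical; the slide does not describe how the startup phase is detected or how many records are skipped.

```python
class SamplingRecorder:
    """Hypothetical recorder that keeps 1 of every `skip_ratio` records during startup."""
    def __init__(self, skip_ratio):
        self.skip_ratio = skip_ratio
        self.in_startup = True   # would be cleared by some startup-phase detector
        self._seen = 0
        self.kept = []

    def record(self, pc, value):
        self._seen += 1
        if self.in_startup and self._seen % self.skip_ratio != 0:
            return               # skipped: less buffer traffic and JIT processing
        self.kept.append((pc, value))

r = SamplingRecorder(skip_ratio=4)
for pc in range(8):
    r.record(pc, None)           # startup: only every 4th record is kept
r.in_startup = False
r.record(100, None)              # after startup every record is kept
```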

Turning the profiler ON and OFF
– The profiler is ON by default
– The sampler thread turns the profiler OFF or back ON
  – A number of consecutive ticks in JIT-generated code turns the profiler OFF
  – A number of consecutive ticks in the interpreter turns the profiler back ON
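
The sampler-driven switch is a small state machine over streaks of consecutive ticks. `ProfilerSwitch` and the threshold parameters are hypothetical, and the threshold values used below are arbitrary; the slide does not give the actual numbers.

```python
class ProfilerSwitch:
    """Hypothetical ON/OFF controller driven by sampler-thread ticks.

    off_after: consecutive ticks in JIT-generated code before profiling turns OFF.
    on_after:  consecutive ticks in the interpreter before it turns back ON.
    """
    def __init__(self, off_after, on_after):
        self.off_after, self.on_after = off_after, on_after
        self.enabled = True          # the profiler is ON by default
        self._jit_ticks = 0
        self._interp_ticks = 0

    def tick(self, in_jitted_code):
        if in_jitted_code:
            self._jit_ticks += 1
            self._interp_ticks = 0   # an interpreter streak is broken
            if self.enabled and self._jit_ticks >= self.off_after:
                self.enabled = False
        else:
            self._interp_ticks += 1
            self._jit_ticks = 0      # a JIT-code streak is broken
            if not self.enabled and self._interp_ticks >= self.on_after:
                self.enabled = True
        return self.enabled

sw = ProfilerSwitch(off_after=3, on_after=2)
states = [sw.tick(t) for t in [True, True, False, True, True, True, False, False]]
```

Requiring consecutive ticks, rather than a running total, keeps a single stray sample from flipping the profiler, and the interpreter streak lets profiling resume when the application enters a new phase that runs mostly interpreted code.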

Some of the problems we encountered
– Tuning for the optimal balance between startup overhead and throughput performance wasn't easy
– Detecting application phase changes wasn't easy
– Class unloading created lots of problems

Summary
– Profiling is critical for the performance of run-time systems
– A buffered approach to data collection can help build efficient profilers
– Tuning for the optimal balance of startup overhead and throughput performance is challenging