IBM Software Group, Compilation Technology © 2007 IBM Corporation Some Challenges Facing Effective Native Code Compilation in a Modern Just-In-Time Compiler.


IBM Software Group, Compilation Technology © 2007 IBM Corporation
Some Challenges Facing Effective Native Code Compilation in a Modern Just-In-Time Compiler
Mark Stoodley and the Compilation Control Team
Testarossa JIT Compiler Team, IBM Toronto Lab

Outline
• Identification Challenge
  – Finding the right methods to compile
• Effectiveness Challenge
  – What is the right way to compile those methods?
• Timing Challenge
  – When is the right time to compile those methods?
• Summary

Identification Challenge: Finding the right methods to compile
• What are the right methods?
  – Methods that will execute a lot in the future
  – Methods that benefit most from compilation
• Race with the program itself
  – Want to discover methods as early as possible
  – But minimize false positives
• Cannot afford much overhead

Finding Methods with Invocation Counts
• We detect “hot” interpreted methods via invocation counts
  – Theory: a method invoked a lot is a method the program executes a lot
• On the plus side:
  – Easy to implement, low overhead for the interpreter
• But:
  – Frequently invoked methods don’t necessarily consume lots of CPU and may not be good compilation choices
    e.g. getters and setters are invoked a lot but don’t consume much CPU
    e.g. a big matrix multiply is invoked less often but consumes lots of CPU
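The counting scheme above can be sketched in a few lines. This is a minimal, hypothetical illustration of invocation-count-based triggering, not Testarossa's actual mechanism: the class name, the map-based bookkeeping, and the threshold value are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class InvocationCounter {
    // Illustrative threshold; a real JIT tunes this per platform and opt level.
    private static final int COMPILE_THRESHOLD = 1000;
    private final Map<String, Integer> counts = new HashMap<>();

    /** Called by the interpreter on every method entry.
     *  Returns true exactly once, when the method crosses the
     *  threshold and should be queued for compilation. */
    public boolean recordInvocation(String methodName) {
        int c = counts.merge(methodName, 1, Integer::sum);
        return c == COMPILE_THRESHOLD;
    }
}
```

The slide's complaint is visible in the sketch: a getter crosses `COMPILE_THRESHOLD` quickly despite consuming almost no CPU, while a long-running matrix multiply may never reach it.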

Finding Methods with Sampling
• Periodically record the top method of active thread stacks
• In theory:
  – M of N ticks in one method means it is consuming M/N of the CPU
• In practice:
  – Depends on application characteristics
  – Hindered by sampling granularity
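A sampler of this kind can be sketched with the standard `Thread.getAllStackTraces()` API. This is an assumed, simplified illustration of the idea (a production JIT samples inside the VM, not via Java-level stack walks); the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StackSampler {
    private final Map<String, Integer> samples = new ConcurrentHashMap<>();

    /** Take one sample: record the top frame of each live thread. */
    public void sampleOnce() {
        for (Map.Entry<Thread, StackTraceElement[]> e
                 : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                samples.merge(top, 1, Integer::sum);
            }
        }
    }

    /** The slide's estimate: M samples out of N ticks => M/N of the CPU. */
    public double hotness(String method, int totalTicks) {
        return samples.getOrDefault(method, 0) / (double) totalTicks;
    }
}
```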

Thinking about Sampling
• Most operating systems give a sampling period of 10 ms or more
  – 10 ms (100 Hz) is the Nyquist rate for a signal with maximum frequency 50 Hz
    Of course, we don’t need perfect knowledge of the input “signal”
• Processor speeds are measured in GHz
  – Method invocation rates are still near or in the MHz band
• Sampling works best for applications where methods execute for a looooooong time, e.g. apps with hot spots
  – What about programs that don’t have hot spots?
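The granularity argument reduces to back-of-envelope arithmetic: the number of samples a method can expect is its CPU share times the run time times the sample rate. The helper below is an illustrative sketch (names and scenario numbers are invented), but it shows why a 100 Hz sampler finds hot spots easily and flat-profile methods almost never.

```java
public class SamplingMath {
    /** expected samples = cpuShare * runSeconds * sampleRateHz */
    static double expectedSamples(double cpuShare, double runSeconds,
                                  double sampleRateHz) {
        return cpuShare * runSeconds * sampleRateHz;
    }

    public static void main(String[] args) {
        // A hot spot using 50% of CPU for 10 s at 100 Hz: ~500 samples.
        System.out.println(expectedSamples(0.50, 10, 100));
        // A flat-profile method using 0.1% of CPU for 10 s: ~1 sample,
        // statistically indistinguishable from noise.
        System.out.println(expectedSamples(0.001, 10, 100));
    }
}
```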

Sampling Effectiveness Depends on Platform and Application Characteristics
• More stuff happens in 10 ms on faster machines than on slower machines
  – Raw machine speeds vary widely
  – Virtualized targets are entering the mainstream
  – Emulated targets seem really slow
• Matters more for applications without hot spots than for those with hot spots
  – Sampling will find hot spots
  – Sampling frequency is too coarse, even on slow machines, when there are no hot spots

The Identification Challenge
• Identify the methods burning CPU as quickly as possible, with low overhead
  – Machine speeds are leveling off, but there is still a wide range of frequencies, especially on virtualized/emulated platforms
  – More cores
    Beware synchronization
    Cache per thread is decreasing: cache footprint is critical
  – Application characteristics are evolving
    New layers of abstraction
    Easy to write lots of code automatically (visual interfaces)
    Increased use of generated classes

Identification Challenge: Steps We’ve Taken
• Sampling framework: relative hotness
  – Compiles are not triggered only by an absolute sample count
  – Instead: how hot is this method compared to all others?
• Sampling windows adjusted based on method size
  – More likely to catch samples in big methods
  – Small hot methods are harder to find
    Make it easier for them to reach the compilation trigger
• Large set of heuristics in this space
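The "relative hotness" idea can be sketched as a trigger on a method's share of recent samples rather than on an absolute count. Everything here — the class, the 5% share threshold, and the 20-sample warm-up guard — is a hypothetical illustration of the principle, not the Testarossa heuristic itself.

```java
import java.util.HashMap;
import java.util.Map;

public class RelativeHotness {
    private static final double HOT_SHARE = 0.05; // assumed: 5% of all samples
    private final Map<String, Integer> samples = new HashMap<>();
    private int totalSamples = 0;

    /** Record one stack sample; return true once the method looks
     *  hot relative to all others (after a short warm-up). */
    public boolean recordSample(String method) {
        samples.merge(method, 1, Integer::sum);
        totalSamples++;
        double share = samples.get(method) / (double) totalSamples;
        return totalSamples >= 20 && share >= HOT_SHARE;
    }
}
```

A share-based trigger adapts automatically to machine speed: a slow machine delivers fewer samples overall, but a method's fraction of them is unchanged.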

Effectiveness Challenge: What’s the right way to compile a method?
• Depends on many factors:
  – Application phase
  – Application requirements
  – Application characteristics
  – Availability of resources
  – System utilization

Example: Middleware Server + Application
• IBM WebSphere Application Server startup:
  – Loads roughly 15,000 classes (includes the DayTrader application)
  – Executes more than 38,000 methods
  – Takes 10 s – 120 s, depending on platform
• The application then runs for an extended period
• Some methods are active in both start-up and steady-state
  – Forces a trade-off: start-up vs. steady-state performance

Start-up executes lots of methods a few times
• Want to compile many, many methods cheaply
  – Native code performance for the highest number of methods
  – Cheap compilations mean better coverage
    Also, methods can appear hot at start-up that aren’t important later
  – The benefit of aggressive optimization is lower
    The class hierarchy is highly unstable
• Be careful about methods also active in steady-state
  – Cheap compilations also mean slower code
  – These methods will need fast performance in steady-state

Steady-state is very different from start-up
• Flat profile, thousands of active methods
• Want to compile many methods more aggressively
  – Best throughput performance
  – The class hierarchy has stabilized, so aggressive optimizations are more worthwhile
  – Application code complexity requires profiling and analysis
  – Large application scale limits the effectiveness of some optimizations
• Tough to find the methods that matter, due to the flat profile
  – Also tough to upgrade the cheap compilations from start-up that still matter

Classic Phase Identification Problem, Right?
• Distinguish “start-up” from “steady-state”
  – Apply a different compilation strategy in each phase
• Testarossa uses a class load phase heuristic
  – Loading lots of classes means start-up
    Compilations during the class load phase are done cheaply (cold)
    Compilations outside the class load phase are more aggressive (warm)
• Mostly works
  – But it isn’t easy

Class Load Phase Heuristic Complexities
• What does “lots” of classes mean?
  – Need to establish some threshold
  – IBM SE JVMs support 12 platforms
    Ranging from laptops to mainframes
    Processor / memory / disk speeds vary substantially from machine to machine and platform to platform
    Especially with the growth of virtualized and emulated targets
• Sensitivity
  – How long to wait before declaring the phase entered or exited?
  – How long does the decision last?
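The shape of such a heuristic can be sketched as a windowed class-load-rate detector. The window mechanics, the threshold of 50 loads per window, and the cold/warm labels below are all assumptions for illustration — choosing those numbers per platform is exactly the difficulty the slide describes.

```java
public class ClassLoadPhase {
    private static final int LOADS_PER_WINDOW_THRESHOLD = 50; // assumed value
    private int loadsThisWindow = 0;
    private boolean inStartupPhase = true; // assume start-up until proven otherwise

    /** Called whenever the VM loads a class. */
    public void onClassLoad() { loadsThisWindow++; }

    /** Called at the end of each timing window (e.g. every 100 ms):
     *  many recent class loads => still in start-up. */
    public void endWindow() {
        inStartupPhase = loadsThisWindow >= LOADS_PER_WINDOW_THRESHOLD;
        loadsThisWindow = 0;
    }

    /** Cheap ("cold") compiles during start-up, "warm" otherwise. */
    public String optLevelForNextCompile() {
        return inStartupPhase ? "cold" : "warm";
    }
}
```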

More Complexities
• Compiles hurt more on some platforms than others
  – Slower systems seem to pay a higher (relative) price
  – Easy to miss mistakes because they don’t hurt you everywhere
• Not all class loads are equal
  – Classes vary widely in size
  – Increased use of generated classes
    Tools that “precompile” to bytecode

Some (Annoying) Facts
Fact #1: Lots of people care about how fast the application can process transactions (steady-state throughput)
Fact #2: Lots of people care about how fast the server can start (startup time)
Personal observation: these two sets intersect less than I’d like
Fact #3: Everyone wants what they care about to get better
Really annoying fact of life: people complain a LOT if the thing they care about gets worse
Really annoying fact of life: customers rarely care if something works well for another platform but not for theirs

…And I’ve Simplified the Problem
• Other criteria matter too (not just start-up and throughput):
  – Throughput ramp-up time
  – Throughput variability from run to run
  – Maximum application pause
  – Application utilization
  – Power and energy consumption are becoming important
  – Memory for code and memory used by the JIT
• All these criteria are also sensitive to the target platform
• They matter to varying degrees, from not at all to very, very much
• Evolving heuristics is really hard

The Effectiveness Challenge
• Properly account for the relative importance of a growing set of criteria when generating native code, while adapting to the characteristics of increasingly complex applications running on a wide range of targets
• We’ve always had to deal with platform sensitivity
  – It increases the challenge

Effectiveness Challenge: Steps We’ve Taken
• Adaptive class load phase
  – Tries to adjust for different machine speeds
• Ahead-Of-Time (AOT) compilation
  – Store code in a persistent cache
  – Avoid compilation cost completely
  – Can be used to amortize compilation cost across JVMs
  – The trade-off is lower code quality in exchange for Java-conformant persistence

Timing Challenge: When is the Right Time to Compile a Method?
• A1: As early as possible (maximize benefit?)
• A2: After behaviour has “settled”
  – References resolved, class hierarchy stabilized, code paths executed
• A3: When the application is idle (minimize impact?)
• A4: Not “now”
  – e.g. real-time applications have utilization expectations
  – e.g. CPU consumption may cost money
• A5: RIGHT NOW!

The Timing Challenge
• Compile methods at the right time to maximize benefit and minimize impact on the application
  – The current approach relies on when we identify a method
  – “Benefit” comes back to effectiveness

Timing Challenge: Steps We’ve Taken
• Avoid aggressive class hierarchy optimizations during startup
• Real-time: lower the compilation thread priority so real-time tasks take precedence
• Real-time: avoid compiling while GC is active
• Dynamic Loop Transfer
  – Identify interpreted methods “stuck” in a loop
  – Generate a compiled body that can accept a transition from the interpreter on the loop back-edge
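The trigger side of Dynamic Loop Transfer can be sketched as a back-edge counter: the interpreter ticks a counter each time a loop branches backward, and a method that accumulates many back-edges without returning is "stuck" in a loop. This is an illustrative sketch only — the class name and threshold are invented, and the hard part (generating a body that accepts a mid-loop transition from the interpreter) is not shown.

```java
import java.util.HashMap;
import java.util.Map;

public class BackEdgeCounter {
    // Illustrative threshold for "stuck in a loop".
    private static final int BACK_EDGE_THRESHOLD = 10_000;
    private final Map<String, Integer> backEdges = new HashMap<>();

    /** Called by the interpreter on each loop back-edge.
     *  Returns true exactly once, when a loop-transfer
     *  compilation should be requested for the method. */
    public boolean onBackEdge(String method) {
        int n = backEdges.merge(method, 1, Integer::sum);
        return n == BACK_EDGE_THRESHOLD;
    }
}
```

Note the contrast with invocation counting: a long-running loop may involve only one invocation, so only a back-edge trigger can catch it before the method returns.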

Summary
• Three big challenges facing effective native code generation in modern JITs:
  – Identification Challenge: find the right methods to compile
  – Effectiveness Challenge: compile methods in the right way
  – Timing Challenge: compile methods at the right time
• Complex system: lots of overlap among these challenges
• Any functional JIT must deal with these challenges
  – But the degree of success varies!

Questions?
Mark Stoodley

Backup Slides

When do you find the method?
• After the program completes
  ✓ Good knowledge of what methods matter
  ✗ But no opportunity to improve program execution
• Before the program executes
  ✓ Most opportunity to improve execution time
  ✗ But no knowledge of what to focus on
    People tend to get stuck on “most opportunity”
• But how much improvement will native code bring?
  – The answer can depend on when you compile it