Dynamic Compilation: Adaptive Optimizations in Virtual Machines
Vijay Janapa Reddi, The University of Texas at Austin

Outline
v Background
  » Why software optimization matters
  » Myths, terminology, and historical context
  » How programs are executed
v Adaptive optimization
v Toward feedback-directed optimization

Quick Peek at Frequently Used PLs
[Chart: frequently used programming languages]

Developing Sophisticated Software
v Software development is difficult
[Chart: median lines of code]

Developing Sophisticated Software
v Software development is difficult
[Charts: median lines of code; lines per hour]

Developing Sophisticated Software
v Software development is difficult
v PL & SE innovations, such as
  » dynamic memory allocation, object-oriented programming, strong typing, components, frameworks, design patterns, aspects, etc.
v Resulting in modern languages with many benefits
  » Better abstractions
  » Reduced programmer effort
  » Better (static and dynamic) error detection
  » Significant reuse of libraries
v Have helped enable the creation of large, sophisticated applications

The Trade-Off with Sophisticated SW
v Implementing those features poses performance challenges
  » Dynamic memory allocation
    → Need pointer knowledge to avoid conservative dependences
  » Object-oriented programming
    → Need efficient virtual dispatch; must overcome small methods and extra indirection
  » Automatic memory management
    → Need efficient allocation and garbage collection algorithms
  » Runtime bindings
    → Need to deal with unknown information
  » ...
v These features require a rich runtime environment → a virtual machine

Type-Safe, OO, VM-Implemented Languages Are Mainstream
v Java is ubiquitous
  » e.g., hundreds of IBM products are written in Java
v "Dynamic" languages are widespread and run on a VM
  » e.g., Perl, Python, PHP, etc.
v These languages are not just for traditional applications
  » Virtual machine implementation, e.g., Jikes RVM
  » Operating systems, e.g., Singularity
  » Real-time and embedded systems, e.g., Metronome-enabled systems
  » Massively parallel systems, e.g., DARPA-supported efforts at IBM, Sun, and Cray
v Virtualization is everywhere
  » browsers, databases, OSes, binary translators, VMMs, in hardware, etc.

Have We Answered the Performance Challenges?
v So far, so good...
  » "Today's typical application on today's hardware runs as fast as the 1970s typical application on 1970s typical hardware" – Michael Hind, IBM
  » Features expand to consume available resources...
  » e.g., current IDEs perform compilation on every save
v Where has the performance come from?
  » Processor technology, clock rates (X%)
  » Architecture design (Y%)
  » Software implementation (Z%)
  » X + Y + Z = 100%

Future Trends in Software
v Software development is still difficult
  » PL/SE innovation will continue to occur
  » Trend toward more late binding, resulting in dynamic requirements
  » Will pose further performance challenges
v Real software is now built by piecing components together
  » Components themselves are becoming more complex and general-purpose
  » Software built with them is more complex
  » Performance is often terrible
    → A J2EE benchmark creates 10 business objects (w/ 6 fields) from a SOAP message [Mitchell et al., ECOOP'06]
      u > 10,000 calls
      u > 1,400 objects created
v Traditional compiler optimization wouldn't help much
v Optimization at a higher semantic level could be highly profitable

Future Trends in Hardware
v Processor speed advances are not as great as in the past (x << X?)
v Computer architects are providing multicore machines
  » Will require software to utilize these resources
  » Not clear whether architecture will contribute more than in the past (y vs. Y?)
v Thus, one of the following will happen
  » Overall performance will decline
  » The increase in software sophistication will slow
  » Software implementation will pick up the slack (z > Z)

Dynamic Compiler Performance Myths: Fact or Fiction?
v Because they execute at runtime, dynamic compilers must be blazingly fast
v Dynamic class loading is a fundamental roadblock to cross-method optimization
v Sophisticated profiling is too expensive to perform online
v A static compiler will always produce better code than a dynamic compiler
v Production VMs avoid complex optimizations, favoring stability over performance

How Are Programs Executed?
v Interpretation
  » Low startup overhead, but much slower than native code execution
    → A popular approach for high-level languages
      u Ex.: APL, SNOBOL, BCPL, Perl, Python, MATLAB
    → Useful for memory-challenged environments
v Classic just-in-time compilation
  » Compile each method to native code on first invocation
    → Ex.: DAISY, Pin, ..., ParcPlace Smalltalk-80, Self-91
    → Initial high (time & space) overhead for each compilation
    → Precludes use of sophisticated optimizations (e.g., SSA)
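
To make the two strategies concrete, here is a minimal sketch of the dispatch loop at the heart of a bytecode interpreter (illustrative Java; the opcodes and stack layout are invented for this example, not from any real VM). Every executed instruction pays the fetch/decode/dispatch overhead that compiled native code avoids:

```java
// Minimal bytecode-interpreter sketch. Opcodes and operand encoding are
// hypothetical; real VMs use far richer instruction sets and value stacks.
final class MiniInterpreter {
    static final byte PUSH = 0, ADD = 1, PRINT = 2, HALT = 3;

    static void run(byte[] code) {
        int[] stack = new int[256];
        int sp = 0;  // operand stack pointer
        int pc = 0;  // program counter
        while (true) {
            byte op = code[pc++];              // fetch
            switch (op) {                      // decode & dispatch: per-instruction overhead
                case PUSH:  stack[sp++] = code[pc++];              break;
                case ADD:   stack[sp - 2] += stack[sp - 1]; sp--;  break;
                case PRINT: System.out.println(stack[--sp]);       break;
                case HALT:  return;
                default:    throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        run(new byte[] { PUSH, 2, PUSH, 40, ADD, PRINT, HALT });  // prints 42
    }
}
```

A JIT removes this loop entirely by translating the bytecodes into straight-line native code once, which is why compiled execution is much faster per instruction but pays a large one-time cost.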

Recall What a JIT Does
v Code-generation component of a virtual machine
v Compiles bytecodes to in-memory binary machine code
  » Simpler front end and back end than a traditional compiler
    → Not responsible for source-language error reporting
    → Doesn't have to generate object files or relocatable code
v Compilation is interspersed with program execution
  » Compilation time and space consumption are very important
v Compiles the program incrementally; the unit of compilation is a method
  » The JIT may never see the entire program
  » Must modify traditional notions of interprocedural analysis (IPA)

JIT Design Requirements
v High performance (of the executing application)
  » Generate "reasonable" code at "reasonable" compile-time cost
  » Selective optimization enables multiple design points
v Deployed on production servers → RAS
  » Reliability, Availability, Serviceability
  » Facilities for logging and replaying compilation activity
v Tension between the high-performance and RAS requirements
  » Especially in the presence of (sampling-based) feedback-directed optimizations
  » So far, a bias toward performance at the expense of RAS, but that is changing as VM technology matures
    → Ogata et al., OOPSLA'06 discuss this issue

Interpretation vs. (Dynamic) Compilation
v Example: 500 methods
v Assume: the compiler gives a 4x speedup but has 20x overhead
v Short-running: the interpreter is best
v Long-running: compilation is best
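
A quick break-even calculation makes the trade-off explicit. Under one plausible reading of the numbers above (an assumption; the slide doesn't define the baseline), let one interpreted execution of a method cost time t, compilation cost 20t, and compiled code run in t/4. For a method executed n times:

```latex
\underbrace{n\,t}_{\text{interpreted}} \;\gtrless\; \underbrace{20\,t + n\,\tfrac{t}{4}}_{\text{compiled}}
\qquad\Longrightarrow\qquad
n_{\text{break-even}} \;=\; \frac{20}{1 - \tfrac{1}{4}} \;\approx\; 27
```

So under these assumptions a method must run roughly 27 times before compilation pays off; across 500 methods, most of which run only a few times, interpreting everything wins for short runs, while compiling the long-running subset wins otherwise.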

Interpretation vs. JITing Tradeoff

Selective Optimization
v Hypothesis: most execution time is spent in a small percentage of methods
  » The 90/10 (or 80/20) rule
v Idea: use two execution strategies
  » Unoptimized: interpreter or non-optimizing compiler
  » Optimized: full-fledged optimizing compiler
v Strategy (sketched in code below)
  » Use unoptimized execution initially for all methods
  » Profile the application to find the "hot" subset of methods
    → Optimize this subset
    → Often many times
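
Below is a minimal sketch of the counter-based version of this strategy (illustrative Java; the class names, threshold, and recompilation hook are hypothetical, not from any particular VM):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of counter-based selective optimization: every method starts in the
// cheap unoptimized tier; a per-method counter promotes it once it proves hot.
final class SelectiveOptimizer {
    private static final int HOT_THRESHOLD = 1000;  // illustrative tuning knob

    private final Map<String, Integer> invocationCounts = new ConcurrentHashMap<>();

    // Called from the prologue of every unoptimized method.
    void onInvocation(String methodName) {
        int count = invocationCounts.merge(methodName, 1, Integer::sum);
        if (count == HOT_THRESHOLD) {
            promote(methodName);
        }
    }

    private void promote(String methodName) {
        // In a real VM: queue the method for the optimizing compiler, then
        // patch dispatch so future invocations run the optimized code.
        System.out.println("recompiling hot method: " + methodName);
    }
}
```

Counting every invocation is itself overhead, which is why production VMs often decay counters, also count loop back-edges, or switch to sampling (discussed later).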

Selective Optimization Examples
v Adaptive Fortran: interpreter + 2 compilers
v Self'93: non-optimizing + optimizing compilers
v JVMs
  » Interpreter + compiler(s): Sun's HotSpot, IBM DK for Java, IBM's J9
  » Compile-only: Jikes RVM, Intel's Judo/ORP, BEA's JRockit
v CLR
  » Only one runtime compiler, i.e., a classic JIT
  » But also uses ahead-of-time (AOT) compilation (NGEN)

Jikes RVM [Fink et al., OOPSLA'02 tutorial]
v Compiles Java bytecodes to IA32 and PPC/32
v 3 levels of intermediate representation (IR)
  » Register-based; CFG of extended basic blocks
  » HIR: operators similar to Java bytecode
  » LIR: expands complex operators, exposes runtime system implementation details (object model, memory management)
  » MIR: target-specific, very close to the target instruction set
v Multiple optimization levels
  » Suite of classical optimizations and some Java-specific optimizations
  » The optimizer preserves and exploits Java static types all the way through LIR
  » Many optimizations are guided by profile-derived branch probabilities

Jikes RVM Opt Level 0 (JIT 0)
v On the fly (bytecode → IR)
  » Constant, type, and non-null propagation; constant folding; branch optimizations; field analysis; unreachable-code elimination
v BURS-based instruction selection
v Linear scan register allocation
v Inline trivial methods (methods smaller than a calling sequence); see the sketch below
v Local redundancy elimination (CSE, loads, exception checks)
v Local copy and constant propagation; constant folding
v Simple control flow optimizations
  » Static splitting, tail-recursion elimination, peephole branch optimizations
v Simple code reordering
v Scalar replacement of aggregates & short arrays
v One pass of global, flow-insensitive copy and constant propagation and dead-assignment elimination
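
To see what "inline trivial methods" buys, consider a getter: its body is smaller than the calling sequence needed to reach it, so substituting the body for the call is a pure win. A hand-written before/after illustration in Java source terms (hypothetical example; the JIT performs this on the IR, not on source):

```java
// Before inlining: every call to getX() pays call/return overhead
// even though the body is a single field load.
final class Point {
    private final int x;
    Point(int x) { this.x = x; }
    int getX() { return x; }  // trivial method: smaller than a calling sequence
}

final class Caller {
    static int doubled(Point p) {
        return 2 * p.getX();  // call site before inlining
        // After inlining, the compiler effectively replaces the call with the
        // field load itself (2 * p.x), removing the call and exposing the
        // load to further local optimization.
    }
}
```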

Jikes RVM Opt Level 1 (JIT 1)
v Much more aggressive inlining
  » Larger space thresholds, profile-directed
  » Speculative CHA (recover via preexistence and OSR); see the sketch below
v Runs multiple passes of many level 0 optimizations
v More sophisticated code-reordering algorithm [Pettis & Hansen]
v Aggressive inlining is currently the primary difference between level 0 and level 1
v Over time, many optimizations shifted from level 1 to level 0
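
Speculative class hierarchy analysis (CHA) lets level 1 inline a virtual call when only one target is currently loaded, recovering (via preexistence arguments or on-stack replacement) if class loading later breaks the assumption. The guarded form looks roughly like this hand-written sketch (illustrative Java; the class names are invented):

```java
// Guarded devirtualization: CHA says Circle is currently the only loaded
// subclass of Shape, so the call to area() can be inlined behind a cheap test.
abstract class Shape {
    abstract double area();
}

final class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

final class Renderer {
    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            if (s.getClass() == Circle.class) {  // guard: exact-class test
                Circle c = (Circle) s;
                sum += Math.PI * c.r * c.r;      // inlined body of Circle.area()
            } else {
                sum += s.area();                 // fallback: full virtual dispatch
            }
        }
        return sum;
    }
}
```

With unguarded speculation the test disappears as well, and the VM instead deoptimizes the compiled method via OSR when a conflicting class is loaded.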

Jikes RVM Opt Level 2 (JIT 2)
v Loop normalization, peeling & unrolling
v Scalar SSA
  » Constant & type propagation
  » Global value numbering
  » Global CSE
  » Redundant conditional branch elimination
v Heap Array SSA
  » Load/store elimination; see the example below
  » Global code placement (PRE/LICM)
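
As a flavor of what load/store elimination over heap/array SSA achieves, here is a hand-written before/after (hypothetical Java example; the compiler does this on the IR once it proves no intervening store can change the loaded values):

```java
// Redundant-load elimination, written out by hand for illustration.
final class Summer {
    private int[] data;

    Summer(int[] data) { this.data = data; }

    // Before: the field 'data' and the array length are re-loaded every iteration.
    int sumBefore() {
        int s = 0;
        for (int i = 0; i < data.length; i++) {
            s += data[i];
        }
        return s;
    }

    // After: each value is loaded once, since no store in the loop can
    // change the field or the array length.
    int sumAfter() {
        int[] d = data;    // single load of the field
        int n = d.length;  // single load of the array length
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += d[i];
        }
        return s;
    }
}
```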

Selective Optimization Effectiveness
v Jikes RVM [Arnold et al., TR Nov'04]

Selective Optimization Effectiveness
v Jikes RVM [Arnold et al., TR Nov'04]
v Why is selective better? Why is selective not a lot better?

Designing an Adaptive Optimization System
v What is the system architecture for implementing selective optimization?
v What is the profiling mechanism for gathering information?
v What is the policy for driving recompilation?
v How effective are existing systems?

Profiling Techniques
v Program instrumentation
  » e.g., basic block counters, value profiling
v Sampling [Whaley, JavaGrande'00; Arnold & Sweeney, TR'00; Arnold & Grove, CGO'05; Zhuang et al., PLDI'06]
  » e.g., sample the running method, or the call stack at a context switch (see the sketch below)
v Hybrid [Arnold & Ryder, PLDI'01]
  » combine sampling and instrumentation
v Runtime service monitors [Deutsch & Schiffman, POPL'84; Hölzle et al., ECOOP'91; Kawachiya et al., OOPSLA'02; Jones & Lins '96]
  » e.g., dispatch tables, synchronization services, GC
v Hardware performance monitors [Ammons et al., PLDI'97; Adl-Tabatabai et al., PLDI'04]
  » e.g., drive selective optimization, suggest locality improvements
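
As a minimal illustration of call-stack sampling, the sketch below periodically records the method on top of each thread's stack (plain Java, for illustration only; an in-VM sampler would hook timer interrupts or thread switches rather than use java.lang.Thread APIs):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: sample every thread's top stack frame at a fixed interval and
// count hits; methods with many samples are the "hot" recompilation candidates.
final class SamplingProfiler {
    private final Map<String, Long> hits = new ConcurrentHashMap<>();

    void start(long intervalMillis) {
        Thread sampler = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                    if (stack.length > 0) {
                        String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                        hits.merge(top, 1L, Long::sum);  // one sample for the running method
                    }
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return;  // stop sampling when interrupted
                }
            }
        }, "sampling-profiler");
        sampler.setDaemon(true);
        sampler.start();
    }

    Map<String, Long> snapshot() { return Map.copyOf(hits); }
}
```

The appeal of sampling is that its overhead is set by the sampling interval, not by how often the application executes methods.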

Feedback-Directed Optimization (FDO)
v Exploit information gathered at runtime to optimize execution
  » "Selective optimization": what to optimize
  » "FDO": how to optimize
    → Similar to offline profile-guided optimization
    → But requires only 1 run!
v Advantages of FDO [Smith'00]
  » Can exploit dynamic information that cannot be inferred statically
  » The system can change and revert decisions when conditions change
  » Runtime binding has advantages
v Performed in many systems
  » e.g., Jikes RVM reports a 10% improvement using FDO
    → Using basic block frequencies and call-edge profiles
v Many opportunities to use profile info during various compiler phases
  » Almost any heuristic-based decision can be informed by profile data
    → Inlining, code layout, multiversioning, register allocation, global code motion, exception-handling optimizations, loop unrolling, speculative stack allocation, software prefetching

Common FDO Techniques
v Compiler optimizations
  » Inlining
  » Code layout
  » Multiversioning
  » Potpourri
v Runtime system optimizations
  » Caching
  » Speculative meta-data representations
  » GC acceleration
  » Locality optimizations

FDO Potpourri
v Many opportunities to use profile info during various compiler phases
v Almost any heuristic-based decision can be informed by profile data
v Large range of examples (a multiversioning sketch follows this list)...
  » Loop unrolling
    → Unroll "hot" loops only
  » Register allocation
    → Spill on "cold" paths first
  » Global code motion
    → Move computation from hot to cold blocks
  » Exception-handling optimizations
    → Avoid expensive runtime handlers for frequent exceptional flow
  » Speculative stack allocation
    → Stack-allocate objects that escape only on cold paths
  » Software prefetching
    → Profile data guides placement of prefetch instructions
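
As one concrete FDO pattern, here is multiversioning written out by hand (illustrative Java; a compiler would generate the versions and the guard automatically from profile data showing that stride == 1 dominates):

```java
// Profile-guided multiversioning sketch: a cheap guard picks a version
// specialized for the common case observed in profiles (stride == 1).
// Assumes stride >= 1; the general version is kept for correctness.
final class Strided {
    static long sum(int[] a, int stride) {
        if (stride == 1) {
            return sumUnitStride(a);   // hot version: simple, unrolling-friendly
        }
        return sumGeneral(a, stride);  // cold version: fully general
    }

    private static long sumUnitStride(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s;
    }

    private static long sumGeneral(int[] a, int stride) {
        long s = 0;
        for (int i = 0; i < a.length; i += stride) s += a[i];
        return s;
    }
}
```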

How much performance gain is interesting?
v Quiz: an optimization needs to produce > X% performance improvement to be considered interesting. X = ?
  » a) 1%  b) 5%  c) 10%  d) 20%
  » Sometimes research papers with < 5-10% improvement are labeled failures
v Answer: it depends on the complexity of the solution
  » Value = performance gain / complexity
  » Every line of code requires maintenance and is a possible bug
    → 10 LOC yielding a 1.5% speedup
      u The product team may incorporate it in the VM by the end of the week
    → 25,000 LOC yielding a 1.5% speedup
      u Not worth the complexity

Summary
v Learned about the importance of adaptive optimization in VMs
  » The need for adaptive optimization
  » Selective optimization
  » FDO
v Next class
  » Starting next class, we will go into the details of profiling and adaptive optimizations