Overview
Motivations
Basic static and dynamic optimization methods
ADAPT
Dynamo

Motivations for Dynamic Optimization
Object-oriented languages result in delayed binding, which reduces the scope of optimizations
DLLs limit static compile-time optimizations
Heavyweight static compiler optimizations are impractical in Java JITs and dynamic binary translators

Motivations for Dynamic Optimization
Computer system vendors are entirely reliant on software vendors to enable the optimizations that take advantage of their hardware
Software is now commonly installed on a network file system server and run on machines of varying configurations

Some Traditional Dynamic Optimization Techniques
Compile-Time Multiversioning
–Multiple versions of a code section are generated at compile time
–The most appropriate variant is selected at runtime based on characteristics of the input data and/or machine environment
–No runtime information can be exploited during code generation
–Multiple variants can cause code explosion, so typically only a few versions are created
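A minimal C sketch of the idea (the function names and dispatch threshold are illustrative, not taken from any real system): two variants are compiled ahead of time, and a dispatcher picks one at runtime from a property of the input.

    #include <stddef.h>

    /* Variant 1: plain scalar loop. */
    static void sum_scalar(const double *a, double *out, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += a[i];
        *out = s;
    }

    /* Variant 2: two-way unrolled loop, better for long arrays. */
    static void sum_unrolled(const double *a, double *out, size_t n) {
        double s0 = 0.0, s1 = 0.0;
        size_t i = 0;
        for (; i + 1 < n; i += 2) { s0 += a[i]; s1 += a[i + 1]; }
        if (i < n) s0 += a[i];
        *out = s0 + s1;
    }

    /* Runtime dispatch on a characteristic of the input data; the
       threshold stands in for whatever model the compiler bakes in. */
    void sum(const double *a, double *out, size_t n) {
        if (n >= 64) sum_unrolled(a, out, n);
        else         sum_scalar(a, out, n);
    }

Note the limitation the slide points out: both bodies were fixed at compile time, so runtime information influences only the choice between versions, not the generated code itself.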

Some Traditional Dynamic Optimization Techniques
Dynamic Feedback
–Similar to compile-time multiversioning: multiple versions are generated at compile time, no runtime information can be exploited during code generation, and only a few versions are created to prevent code explosion
–Chooses a variant by sampling: measures the execution time of each variant and selects the fastest
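A minimal C sketch of the sampling step, assuming each precompiled variant sits behind a uniform function pointer (the one-measurement-per-variant harness is a simplification):

    #include <time.h>

    typedef void (*variant_fn)(void);

    /* Time one sample run of each variant and return the fastest. */
    static variant_fn pick_fastest(variant_fn variants[], int count) {
        variant_fn best = variants[0];
        double best_s = 1e30;
        for (int i = 0; i < count; i++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            variants[i]();                      /* sample run */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double s = (t1.tv_sec - t0.tv_sec)
                     + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
            if (s < best_s) { best_s = s; best = variants[i]; }
        }
        return best;
    }

The chosen variant can then run unmeasured for a stretch before sampling is repeated, keeping measurement overhead bounded.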

Some Traditional Dynamic Optimization Techniques
Dynamic Compilation
–Generates new code variants during program execution, taking advantage of runtime information
–More overhead than the other methods
–To reduce overhead, dynamic compilation is staged at compile time and applied only to code sections that may benefit from it
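A minimal sketch of runtime code generation in C, using the system compiler and dlopen (the file paths, the kernel, and the baked-in trip count are all illustrative; real dynamic compilers stage code templates at compile time rather than shelling out to cc):

    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef void (*kernel_fn)(double *);

    /* Specialize a loop on a value of n known only at runtime, compile
       it as a shared object, and load the result. */
    kernel_fn compile_specialized(int n) {
        FILE *f = fopen("/tmp/spec.c", "w");
        if (!f) return NULL;
        fprintf(f,
            "void kernel(double *a) {\n"
            "    for (int i = 0; i < %d; i++)\n" /* n becomes a constant */
            "        a[i] *= 2.0;\n"
            "}\n", n);
        fclose(f);
        if (system("cc -O2 -shared -fPIC -o /tmp/spec.so /tmp/spec.c") != 0)
            return NULL;
        void *handle = dlopen("/tmp/spec.so", RTLD_NOW);
        return handle ? (kernel_fn)dlsym(handle, "kernel") : NULL;
    }

The payoff is that the compiler now sees n as a constant and can fully unroll or vectorize the loop; the cost is the compile step itself, which is why it should be spent only on code sections that benefit.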

ADAPT (Automated De-Coupled Adaptive Program Transformation)
Michael J. Voss and Rudolf Eigenmann, Purdue University

Overview of ADAPT
ADAPT tries to combine the strengths of the other methods
Uses a source-to-source compiler to perform optimizations
A dynamic selection mechanism selects the best code variant to run and performs code generation (similar to a JIT)

Intervals
Optimization occurs at the granularity of intervals
–Single entry, single exit; typically loop nests
The source-to-source compiler replaces each interval with an if-else block that selects between a call to the Dynamic Selector and the default static version
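What the transformed source might look like, sketched in C (the helper names and interval ID are hypothetical, not ADAPT's actual runtime API):

    /* Hooks into the ADAPT runtime; names are illustrative. */
    extern int  adapt_has_variant(int interval_id);
    extern void adapt_dynamic_selector(int interval_id, double *a, int n);

    void compute(double *a, int n) {
        if (adapt_has_variant(42)) {
            /* run and time the current best experimental variant */
            adapt_dynamic_selector(42, a, n);
        } else {
            /* default statically compiled version of the interval */
            for (int i = 0; i < n; i++)
                a[i] = a[i] * a[i];
        }
    }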

Compiler Component
ADAPT can use off-the-shelf compilers
–Different optimization flags or different compilers produce different variants: loop distribution, tiling, unrolling, automatic parallelization

ADAPT Components
The Inspector monitors the runtime environment
–Timings of each interval and of each optimized variant of the interval
–Machine configuration
This information is used to maintain and prioritize the Optimization Queue

ADAPT Components
The Optimization Queue is a priority queue that orders interval descriptors by execution time
–Used to minimize overhead by skipping insignificant intervals
The Dynamic Selector chooses which variants to run
–Variants become "stale" after a period of time and are removed
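A minimal C sketch of such a queue; the descriptor fields are illustrative, not ADAPT's actual layout:

    #include <stdlib.h>

    /* What the Inspector might record per interval. */
    typedef struct {
        int    interval_id;
        double total_time;   /* accumulated execution time, seconds */
        long   invocations;
    } interval_desc;

    /* Descending by time: the most expensive interval is optimized first,
       and insignificant intervals sink to the back where they are skipped. */
    static int by_time_desc(const void *a, const void *b) {
        double ta = ((const interval_desc *)a)->total_time;
        double tb = ((const interval_desc *)b)->total_time;
        return (ta < tb) - (ta > tb);
    }

    void prioritize(interval_desc *queue, size_t count) {
        qsort(queue, count, sizeof(interval_desc), by_time_desc);
    }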

Sample Walkthrough
Source-to-source compilation
Start with the static version and run
The Inspector tracks which intervals are used frequently
Optimized variants of the frequently used intervals are generated, using the collected runtime information
When an interval takes a sufficiently long time, the Dynamic Selector is called and chooses the best variant
–If the variant takes too long, go back to the static version
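The fallback logic of that last step, sketched in C (the timing harness, the regression factor, and the staleness window are all hypothetical):

    #include <time.h>

    typedef void (*variant_fn)(void);

    typedef struct {
        variant_fn best;            /* best experimental variant so far */
        variant_fn static_version;  /* always-safe fallback */
        double     best_time;       /* seconds, from earlier measurements */
        time_t     chosen_at;       /* when `best` was selected */
    } interval_state;

    void run_interval(interval_state *iv) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        iv->best();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (t1.tv_sec - t0.tv_sec)
                 + (t1.tv_nsec - t0.tv_nsec) * 1e-9;

        /* if the variant takes too long, go back to the static version */
        if (s > 2.0 * iv->best_time)
            iv->best = iv->static_version;

        /* variants go stale: force re-selection after a fixed window */
        if (time(NULL) - iv->chosen_at > 60)
            iv->best = iv->static_version;
    }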

Dynamo: A Transparent Dynamic Optimization System
Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia, HP Labs

Overview of Dynamo
Takes in a native binary instruction stream
Optimizes the code dynamically, without requiring code annotations or offline binary rewriting
Transparent operation:
–Accepts and optimizes legacy code
–Runs as a hybrid of a user-level DLL and a virtual machine

How Dynamo Starts
Dynamo takes over and snapshots the registers and the environment stack
Dynamo then activates its "interpreter"
–Intercepts and scans the native code from the program, like a filter
[Diagram: the program's instruction stream flows through Dynamo's interpreter on its way to the CPU]

How Dynamo Works
[Flowchart: for each target address, check whether it is already in the fragment cache; if yes, execute the cached code fragment; if no, start a trace, record the potential code fragment until the trace ends, then optimize it and link it with the other code fragments in the cache]
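The control loop implied by the flowchart, reduced to a C sketch; every type and helper here is a hypothetical stand-in for Dynamo's real machinery:

    typedef unsigned long addr_t;

    extern int    in_cache(addr_t pc);             /* fragment cache lookup */
    extern addr_t execute_fragment(addr_t pc);     /* run cached native code,
                                                      return exit target */
    extern int    is_hot(addr_t pc);               /* counter over threshold? */
    extern addr_t interpret_and_record(addr_t pc); /* interpret while emitting a
                                                      trace, then optimize/link */
    extern addr_t interpret_until_branch(addr_t pc);

    void dynamo_main(addr_t pc) {
        for (;;) {
            if (in_cache(pc))
                pc = execute_fragment(pc);       /* fast path: stay in the cache */
            else if (is_hot(pc))
                pc = interpret_and_record(pc);   /* grow the fragment cache */
            else
                pc = interpret_until_branch(pc); /* slow path: interpretation */
        }
    }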

Code Fragments
The interpreter can create and optimize code fragments
Code fragments are code traces
–A trace starts when a certain piece of code has been executed many times
–The program is most likely to keep following the same path that was observed while tracing

Code Fragments
A code fragment consists of:
–A start, which is the instruction following a taken backward branch
–An end, which is a taken backward branch or a branch leading to another fragment
Code fragments are easy to optimize
–One entrance, multiple exits
–Requires only one iteration of a backward and a forward data-flow analysis
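The trace-start condition, sketched in C: count executions of backward-branch targets and begin recording once a target gets hot (the table size and threshold are illustrative):

    typedef unsigned long addr_t;

    #define TABLE_SIZE    4096
    #define HOT_THRESHOLD 50

    static unsigned counters[TABLE_SIZE];

    /* Called by the interpreter at every taken backward branch. */
    int should_start_trace(addr_t target) {
        unsigned idx = (unsigned)(target >> 2) % TABLE_SIZE;
        return ++counters[idx] >= HOT_THRESHOLD; /* collisions tolerated:
                                                    this is a heuristic */
    }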

Optimizations of Fragments
Remove branches that merely express fall-throughs; conditional branches are kept
[Diagram: a control-flow graph over blocks A, B, C, D is straightened into the linear trace A, C, D]
Optimizations include:
–Constant propagation
–Copy propagation
–Loop-invariant code motion
–Strength reduction
–Removal of redundant branches, loads, and assignments
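A C-level analogue of these cleanups (Dynamo performs them on native instructions, not source; this only illustrates the effect):

    /* Before: the straightened trace still carries redundancies. */
    int before(const int *p) {
        int x = 4;      /* constant */
        int y = x;      /* copy of a constant */
        int a = *p;
        int b = *p;     /* redundant load of the same location */
        return y + a + b;
    }

    /* After constant propagation, copy propagation, and redundant-load
       removal: */
    int after(const int *p) {
        int a = *p;
        return 4 + a + a;
    }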

Linking Cached Fragments
Conditional branches and exits may be linked to other fragments already in the cache, which speeds up Dynamo
If no matching fragment exists, another trace is started
[Diagram: one exit of the fragment A, C, D links to a second cached fragment B, D, E, F, G; the other exit leaves the cache and starts tracing again]
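Fragment linking, sketched in C: when the target of a fragment exit is itself in the cache, the exit branch is patched to jump straight to it, so control never returns to the interpreter on that path (helper names are hypothetical):

    typedef unsigned long addr_t;

    typedef struct {
        addr_t target_pc;   /* original program address this exit leads to */
        void  *patch_site;  /* location of the exit branch instruction */
    } exit_stub;

    extern void *fragment_code_for(addr_t pc);        /* NULL if not cached */
    extern void  patch_branch(void *site, void *dest);

    void try_link(exit_stub *e) {
        void *dest = fragment_code_for(e->target_pc);
        if (dest)
            patch_branch(e->patch_site, dest); /* fragment-to-fragment jump */
        /* otherwise this exit falls back to Dynamo, which may start
           another trace */
    }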

Cache Management
Adding a new fragment requires creating links to and from existing fragments
Deleting a fragment requires removing all of its links, which is slow
The cache may fill up with fragments
–Flush the whole cache when a lot of new code is being traced
–A burst of new traces means the program has entered a new section
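The flush heuristic, sketched in C: rather than evicting fragments one at a time (each deletion means slow unlinking), drop everything when fragments are being created at a high rate (the window and threshold values are illustrative):

    extern void fragment_cache_flush_all(void);  /* hypothetical bulk reset */

    static int window_branches;       /* interpreted branches this window */
    static int window_new_fragments;  /* fragments created this window */

    void note_interpreted_branch(void) {
        if (++window_branches == 1000) {      /* window boundary */
            if (window_new_fragments >= 32)   /* creation-rate spike: likely
                                                 a new program section */
                fragment_cache_flush_all();
            window_branches = window_new_fragments = 0;
        }
    }

    void note_fragment_created(void) { ++window_new_fragments; }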

Performance
Single PA-8000 processor
SPECint95 benchmarks, compiled at O2, O4, O2+P, and O4+P (P = profile-based optimization), run with and without Dynamo
–O2 + Dynamo ran as well as native O4
–O4+P ran equally well with or without Dynamo
Dynamo's overhead was 1.5% of the execution time on SPECint95

Conclusion
ADAPT and Dynamo take opposite approaches:
–Internal representation: single-entry/single-exit intervals (ADAPT) vs. single-entry/multiple-exit fragments (Dynamo)
–Dynamo works on binary code; ADAPT works source-to-source on high-level code
Both use standard compilers, need no special annotations, and exploit runtime information
ADAPT can additionally let programmers customize the selection of optimizations