Nikola Grcevski Testarossa JIT Compiler IBM Toronto Lab

Presentation transcript:

Effective method for Java Lock Reservation for JVMs that implement cooperative multithreading
Nikola Grcevski
Testarossa JIT Compiler
IBM Toronto Lab
IBM Toronto & Ottawa Labs 10/10/2019

Notes about the presentation
- The technology was developed in cooperation with the J9 JVM team in Ottawa
- The following presentation contains IBM patent-pending material

Presentation structure
- Background on Java locking
- Introduction to lock optimization techniques
- Our approach to lock reservation
- Results
- Summary

Background on Java locking
- Synchronization is built into the language
- Java classes found in libraries are designed to be thread safe
- Java applications tend to be multithreaded, so they need synchronization

How much synchronization do Java programs need?
- Studies have found that the majority of Java programs don't need a lot of synchronization
- Because they use library code, Java programs tend to pull in a lot of synchronization "automatically"
- Synchronization comes with a cost, even without any contention

Compiler solutions for reducing synchronization overhead for unnecessary locks
- Introduction of bi-modal locks in Java
- Merging lock regions together
- Lock reservation and ownership

Bi-modal Java locks
- Use an OS-level mutex only when handling real contention (also called a fat lock)
- Use a per-object field as a quick way of marking an object as locked by one thread only (also called a thin lock)
- This locking mechanism isn't free: it requires platform-specific coherence instructions
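To make the thin/fat split concrete, here is a minimal sketch in Java. It is not the J9 implementation: `BiModalLock`, its fields, and the two counters are hypothetical names, the lock is non-reentrant, and real JVMs inflate the object's lock word rather than keeping a separate heavyweight lock beside it. The `compareAndSet` call plays the role of the platform-specific coherence instruction that even the uncontended path has to pay for.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bi-modal lock, for illustration only (non-reentrant).
final class BiModalLock {
    // Stands in for the per-object lock field: null means unlocked.
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    // Stands in for the OS-level mutex, used only under real contention.
    private final ReentrantLock fatLock = new ReentrantLock();
    private final Condition released = fatLock.newCondition();
    private final AtomicInteger waiters = new AtomicInteger();
    final AtomicInteger thinEnters = new AtomicInteger();
    final AtomicInteger fatEnters = new AtomicInteger();

    void enter() {
        Thread me = Thread.currentThread();
        // Thin path: a single atomic compare-and-set on the lock field.
        if (owner.compareAndSet(null, me)) {
            thinEnters.incrementAndGet();
            return;
        }
        // Fat path: another thread holds the lock, so block until it is released.
        fatLock.lock();
        try {
            waiters.incrementAndGet();
            while (!owner.compareAndSet(null, me)) {
                released.awaitUninterruptibly();
            }
            waiters.decrementAndGet();
            fatEnters.incrementAndGet();
        } finally {
            fatLock.unlock();
        }
    }

    void exit() {
        // Only the owning thread may release the lock.
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException();
        }
        // Touch the heavyweight lock only if some thread is actually waiting.
        if (waiters.get() > 0) {
            fatLock.lock();
            try {
                released.signal();
            } finally {
                fatLock.unlock();
            }
        }
    }
}
```

With a single thread, every `enter()` takes the thin path and the heavyweight lock is never touched; the atomic operation on `owner` is the remaining cost that lock reservation tries to avoid.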

Lock coarsening approach
- Merge multiple locked regions that lock on the same object
- Reduces the number of monitor enter and monitor exit operations
- Limited to method scope
- Interfering monitor operations and calls break it
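As a source-level illustration of what coarsening does (the JIT performs it on compiled code, not on source), the hypothetical class below shows the same update written with two adjacent locked regions and with one merged region. The class and method names are assumptions made for this example.

```java
// Hypothetical example: the effect of lock coarsening, written out by hand.
final class CoarseningExample {
    private final Object lock = new Object();
    private int a;
    private int b;

    // Before: two locked regions on the same object,
    // so two monitor enter and two monitor exit operations.
    void updateSeparately(int x) {
        synchronized (lock) {
            a += x;
        }
        synchronized (lock) {
            b += x;
        }
    }

    // After: the regions are merged,
    // so one monitor enter and one monitor exit operation.
    void updateCoarsened(int x) {
        synchronized (lock) {
            a += x;
            b += x;
        }
    }

    int sum() {
        synchronized (lock) {
            return a + b;
        }
    }
}
```

Both methods leave the object in the same state. If a call or a monitor operation on a different object sat between the two regions, the merge would no longer be straightforward, which is the limitation the slide points out.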

Lock reservation
- The basic idea is to avoid unlocking an object
- The object becomes reserved for that thread
- Subsequent locks by the same thread are fast
- Locking the object from another thread requires canceling the reservation

Why is entering a reserved lock faster?
- The main overhead of entering and exiting an uncontended lock is the platform-specific coherence instructions required
- With reservation we can replace some of the coherence instructions with a check of whether the lock is reserved for the locking thread
- We also need state change instructions on enter and exit to distinguish "locked and reserved" from "reserved only"

Lock reservation in action

Thread 1 (T1)   Thread 2 (T2)   Object state
monenter                        Locked by T1
monexit                         Reserved for T1
monenter                        Locked by T1
monexit                         Reserved for T1
                monenter        Locked for T2

monenter: monitor enter operation to take the lock
monexit: monitor exit operation to release the lock
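The sequence on this slide can be replayed with a small single-threaded state model. This is a sketch, not the J9 lock word layout: `ReservationModel`, its states, and the rule that an object is not reserved again after a cancellation are all assumptions made for illustration, and contention and re-entrant locking are deliberately left out.

```java
// Hypothetical single-threaded model of the object states shown on the slide.
final class ReservationModel {
    enum State { FREE, LOCKED, RESERVED }

    private State state = State.FREE;
    private String owner;              // thread the object is locked by or reserved for
    private boolean reservable = true; // assumption: no re-reservation after a cancel
    int cancellations;

    void monenter(String thread) {
        if (state == State.LOCKED) {
            throw new IllegalStateException("contention and re-entrancy are outside this model");
        }
        if (state == State.RESERVED && !thread.equals(owner)) {
            // Another thread wants the lock: the reservation must be canceled first.
            cancellations++;
            reservable = false;
        }
        state = State.LOCKED;
        owner = thread;
    }

    void monexit(String thread) {
        if (state != State.LOCKED || !thread.equals(owner)) {
            throw new IllegalMonitorStateException();
        }
        if (reservable) {
            state = State.RESERVED; // do not really unlock: keep the object for this thread
        } else {
            state = State.FREE;
            owner = null;
        }
    }

    String describe() {
        switch (state) {
            case LOCKED:
                return "Locked by " + owner;
            case RESERVED:
                return "Reserved for " + owner;
            default:
                return "Free";
        }
    }
}
```

Calling `monenter("T1")`, `monexit("T1")` twice and then `monenter("T2")` walks through the same states as the slide, with one cancellation recorded at the point where T2 takes the lock.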

Great! So what is the problem?
- Lock reservation canceling is expensive
- It requires stopping the thread that holds the reservation
- What if the thread is stopped in the middle of monitor enter or monitor exit?
- The monitor state is non-trivial to deduce while monitor enter or monitor exit is running
- Therefore, lock reservation can be costly and can increase contention

Our approach to lock reservation
- The J9 JVM implements a cooperative threading model
- Threads can only stop at well-defined yield points
- Selective reservation based on the properties of the Java code
- Runtime detection of excessive reservation cancellation, with back-out

Cooperative vs. preemptive threading models
- Preemptive: Java threads can be stopped at any point in time
- Cooperative: Java threads stop at well-defined points (yield points)
- Yield points are inserted at method enter/exit
- Yield points are inserted in long-running loops
- Yield points are also placed in JVM runtime functions
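The practical consequence of yield points is that a stop request is honored only at the next yield point, not at the instant it is made. The single-threaded model below illustrates that timing; `YieldPointModel`, `YIELD_INTERVAL`, and `stopsAt` are hypothetical names, and how often a real JIT places yield points in a loop is not stated in the presentation.

```java
// Hypothetical model of when a cooperatively scheduled loop actually stops.
final class YieldPointModel {
    // Assumption: the loop checks for a pending stop request every 1000 iterations.
    static final int YIELD_INTERVAL = 1000;

    // Returns the iteration at which the loop stops when a stop is
    // requested at iteration requestAt.
    static int stopsAt(int requestAt, int totalIterations) {
        boolean stopRequested = false;
        for (int i = 0; i < totalIterations; i++) {
            if (i == requestAt) {
                stopRequested = true; // models another thread asking this one to stop
            }
            // Yield point: the only place inside the loop where the thread can stop.
            if (i % YIELD_INTERVAL == 0 && stopRequested) {
                return i;
            }
            // ... loop body runs here without interruption ...
        }
        // The loop finished first; the thread stops at the method-exit yield point.
        return totalIterations;
    }
}
```

For example, a stop requested at iteration 2500 takes effect at iteration 3000, and any code between two yield points runs to completion before the thread can be stopped.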

Cooperative threading simplifies lock reservation
- A thread cannot be stopped inside monitor enter or monitor exit code
- Cancellation is a lot less complicated and intrusive
- There will be locked regions without yield points (primitive locked regions)
- Entering and exiting those is faster (no state change instructions required)
- Example: synchronized (O) { return O.f; }
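To contrast a primitive locked region with one that contains yield points, consider the hypothetical class below. Which regions a real JIT classifies as primitive depends on inlining and other compile-time decisions, so the comments describe the source-level shape only.

```java
// Hypothetical class showing the two kinds of locked region.
final class Account {
    private long balance;
    private final StringBuilder history = new StringBuilder();

    // Primitive locked region: a field read with no calls and no loops,
    // so there is no yield point between monitor enter and monitor exit.
    long balance() {
        synchronized (this) {
            return balance;
        }
    }

    // Not primitive at source level: the region contains method calls, and
    // yield points are inserted at method enter/exit, so the thread can be
    // stopped while it holds the lock.
    void deposit(long amount) {
        synchronized (this) {
            balance += amount;
            history.append("deposit ").append(amount).append('\n');
        }
    }

    String history() {
        synchronized (this) {
            return history.toString();
        }
    }
}
```

Because a thread can never be observed inside `balance()`'s locked region, a canceling thread never needs to tell "locked and reserved" apart from "reserved only" for it, which is why the state change instructions can be dropped there.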

Selective reservation
- Lock reservation will matter only in hot methods
- Lock reservation will matter most if the locked region of code is short running
- Using compile-time analysis of the class code and recompilation, we can implement reservation selectively

Selection algorithm
- Count the number of synchronized methods in a class and compare it with the number of non-synchronized methods
- Compute the size of the synchronized code using a hotness estimate
- Derive the amount of synchronization overhead
- If the synchronization overhead is significant or moderate, tag the class as a candidate
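The steps above can be sketched as a simple scoring heuristic. The presentation does not give the actual formula or thresholds, so everything numeric here is an assumption: `FIXED_LOCK_COST`, the two thresholds, the scoring expression, and the `MethodInfo` fields are hypothetical and serve only to show the shape of such an analysis.

```java
import java.util.List;

// Hypothetical selection heuristic; the formula and constants are illustrative assumptions.
final class ReservationSelector {
    static final class MethodInfo {
        final boolean isSynchronized;
        final int size;       // code size estimate, e.g. bytecode count
        final double hotness; // 0.0 (cold) to 1.0 (very hot)

        MethodInfo(boolean isSynchronized, int size, double hotness) {
            this.isSynchronized = isSynchronized;
            this.size = size;
            this.hotness = hotness;
        }
    }

    enum Overhead { LOW, MODERATE, SIGNIFICANT }

    static final double FIXED_LOCK_COST = 20.0;       // assumed cost of enter + exit
    static final double MODERATE_THRESHOLD = 0.15;    // assumed
    static final double SIGNIFICANT_THRESHOLD = 0.40; // assumed

    static Overhead estimate(List<MethodInfo> methods) {
        int syncCount = 0;
        double hotnessSum = 0.0;
        double weightedSize = 0.0;
        for (MethodInfo m : methods) {
            if (m.isSynchronized) {
                syncCount++;
                hotnessSum += m.hotness;
                weightedSize += m.hotness * m.size;
            }
        }
        if (syncCount == 0 || hotnessSum == 0.0) {
            return Overhead.LOW;
        }
        // Step 1: share of synchronized methods in the class.
        double syncShare = (double) syncCount / methods.size();
        // Step 2: hotness-weighted size of the synchronized code.
        double hotSize = weightedSize / hotnessSum;
        // Step 3: short, hot regions make the fixed locking cost dominate.
        double score = syncShare * FIXED_LOCK_COST / (FIXED_LOCK_COST + hotSize);
        if (score >= SIGNIFICANT_THRESHOLD) {
            return Overhead.SIGNIFICANT;
        }
        return score >= MODERATE_THRESHOLD ? Overhead.MODERATE : Overhead.LOW;
    }

    // Step 4: tag the class as a candidate if the overhead is not low.
    static boolean isCandidate(List<MethodInfo> methods) {
        return estimate(methods) != Overhead.LOW;
    }
}
```

Under these assumed constants, a class made mostly of small, hot synchronized accessors is tagged as a candidate, while a class whose only synchronized method is large is left alone.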

Runtime detection of excessive reservation cancellation and back-out
- Using timer-based sampling and per-class cancellation counters, we can detect excessive cancellation
- We can undo reservation by code patching or recompilation
- The undo scope is very narrow because reservation is applied selectively
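A minimal sketch of the detection side follows, assuming a sampling routine that a timer would call periodically. `CancellationMonitor`, `MAX_CANCELS_PER_SAMPLE`, and the use of class names as keys are hypothetical. The actual back-out by code patching or recompilation is represented only by a set of class names, and a real runtime would need thread-safe counters.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-class cancellation tracking; not thread-safe, for illustration only.
final class CancellationMonitor {
    // Assumed limit on cancellations per class within one sampling interval.
    static final int MAX_CANCELS_PER_SAMPLE = 100;

    private final Map<String, Integer> cancelsThisInterval = new HashMap<>();
    private final Set<String> backedOut = new HashSet<>();

    // Called whenever a reservation on an instance of className is canceled.
    void recordCancellation(String className) {
        cancelsThisInterval.merge(className, 1, Integer::sum);
    }

    // Called by the sampling timer: flag classes that exceeded the limit,
    // then start a fresh interval.
    void sample() {
        for (Map.Entry<String, Integer> e : cancelsThisInterval.entrySet()) {
            if (e.getValue() > MAX_CANCELS_PER_SAMPLE) {
                // Stands in for undoing reservation by patching or recompiling.
                backedOut.add(e.getKey());
            }
        }
        cancelsThisInterval.clear();
    }

    boolean reservationEnabled(String className) {
        return !backedOut.contains(className);
    }
}
```

Keeping the counters per class keeps the undo narrow: only the classes that actually suffer from cancellation lose their reservation.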

Results on SPECjvm98 db
The data was taken on a 1-socket dual-core Intel Core2 Duo running at 2.16GHz, 2GB RAM, Windows XP Professional

Results on SPECjbb2005
The data was taken on a 2-socket dual-core Intel Woodcrest running at 2.6GHz, 16GB RAM, Windows 2003 64-bit Server

Summary
- Lock reservation can reduce unnecessary locking overhead
- Lock reservation should be applied with caution: it can increase contention
- Cooperative threading simplifies reservation