Adaptive Locks: Combining Transactions and Locks for Efficient Concurrency. Takayuki Usui et al.

Introduction. Computing is increasingly multiprocessor oriented. Explicit multithreading (monitor-style programming) is the most direct way to program a parallel system. The flip side: threads interfere with each other; conditions such as deadlocks and races are hard to detect; fine-grained critical sections are hard to get right, while coarse-grained critical sections reduce concurrency.

Alternatives: Transactional Memory. Advantages: a higher-level programming model with no need to know which locks to acquire, and no need for fine-grained delineation of critical sections. Disadvantages: livelocks and slower progress; high overhead.

Idea: try to combine the advantages of locks and transactional memory. How do the authors propose we do that? Adaptive locks.

What are adaptive locks? A synchronization mechanism combining locks and transactions. The programmer specifies critical sections, which are executed either under a mutex lock or atomically as transactions.

How? atomic (l1) { code } is equivalent to atomic { code } when executing in transactional mode, or to lock(l1); code; unlock(l1); when executing in mutex mode.
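To make the mutex-mode expansion concrete, here is a minimal runnable sketch of what "lock(l1); code; unlock(l1);" amounts to. The toy spinlock and all names below are illustrative stand-ins, not the paper's actual runtime:

```c
#include <stdatomic.h>

/* A toy spinlock standing in for the mutex-mode lock; the type and
   function names are illustrative, not from the paper. */
typedef struct { atomic_flag held; } toy_lock_t;

static toy_lock_t l1 = { ATOMIC_FLAG_INIT };
static int shared_counter = 0;

static void toy_lock(toy_lock_t *l)   { while (atomic_flag_test_and_set(&l->held)) ; }
static void toy_unlock(toy_lock_t *l) { atomic_flag_clear(&l->held); }

/* Mutex-mode expansion of "atomic (l1) { shared_counter++; }":
   lock(l1); code; unlock(l1); */
int increment_in_mutex_mode(void) {
    toy_lock(&l1);
    shared_counter++;          /* the critical-section body */
    toy_unlock(&l1);
    return shared_counter;
}
```

In transactional mode the same body would instead run under the STM's atomic-block machinery; only the surrounding entry/exit code changes, not the critical section itself.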

How do we decide whether a critical section should run as a transaction or under a mutex lock? Let us introduce some terminology: nominal contention, actual contention, transactional overhead.

Nominal Contention. Example: Thread 1 calls s.insert(10) while Thread 2 calls s.insert(20), with

void public synchronized insert(val) { s[size] = val; size++; }

Thread 1 acquires the lock; Thread 2 cannot acquire it and must wait. Each wait counts toward nominal contention (here, nominal contention = 1).

Actual Contention. Example: Thread 1 runs atomic { s.insert(10) } and Thread 2 runs atomic { s.insert(20) }. Thread 1 starts first and executes s[0] = 10; Thread 2 tries to execute simultaneously and aborts. Each abort counts toward actual contention (here, actual contention = 1).

Transactional Overhead. How much extra cost is incurred when the critical section executes in transactional mode versus mutex mode.

How are these terms helpful? The authors use these concepts to dynamically decide which mode the critical section should execute in. Wait... are locks and transactions interchangeable? No, they are not. But we will discuss how, under certain high-level correctness criteria, this can be handled.

Contributions of this paper: an efficient and effective implementation of adaptive locks, trading some accuracy for speed and reduced overhead; conditions under which transactions and mutex locks exhibit equivalent behavior; an evaluation of adaptive locks with micro- and macro-benchmarks.

Programming with adaptive locks. Adaptive locks introduce syntax for labeled atomic sections:

al_t lock1; atomic (lock1) { // critical section }

Some rules for using adaptive locks. The programmer bears the burden of ensuring that the program is still correct if every instance of atomic(lock1) is replaced by a mutex block (mutex mode), and likewise that the program still runs correctly if all critical sections execute as transactions (transactional mode).

More rules. All critical sections associated with the same lock must execute in the same mode. The mode of a nested adaptive lock must match that of the surrounding lock. Mode switching can be done either for correctness (e.g., I/O operations force mutex mode) or for performance.

Cost-benefit analysis. Remember the terms we introduced earlier: nominal contention, actual contention, and transactional overhead. The authors use them to build the decision-making logic.

And the winner is: a · o >= c. If this inequality holds (with a, o, and c being the actual contention, transactional overhead, and nominal contention introduced earlier), then mutex mode is preferable. All of these factors are computed separately for each lock, dynamically.
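The decision rule above can be sketched as a one-line predicate. This is a hedged illustration of the slide's inequality, not the paper's actual code; the function and enum names are mine:

```c
/* Sketch of the slide's decision rule: prefer mutex mode when
   a * o >= c, where a = actual contention, o = transactional overhead,
   c = nominal contention (as measured per lock at runtime). */
typedef enum { MODE_STM = 0, MODE_MUTEX = 1 } al_mode_t;

al_mode_t pick_mode(double a, double o, double c) {
    /* High abort rate times high per-transaction overhead outweighs
       the waiting that a mutex would cause: fall back to the mutex. */
    return (a * o >= c) ? MODE_MUTEX : MODE_STM;
}
```

For example, with heavy aborts (a = 0.5) and expensive transactions (o = 4.0) against modest lock waiting (c = 1.0), the rule picks mutex mode.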

Implementation and Optimizations. Adaptive locks are an extension of the C language. The compiler translates each critical section into two object-code versions: one mutex and one transactional. Adaptive lock acquisition replaces regular lock acquisition. The adaptive-lock state is packed into a single memory word.

What is contained in the state word: the number of threads executing in transactional mode (thrdsInStmMode); whether the lock is in mutex mode (mutexMode); whether the mutex lock is held (lockHeld); and whether we are currently in the process of changing modes (transition).
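Accessors like the ones used in the acquire routine could be implemented with shifts and masks over the packed word. The bit layout below is my assumption for illustration, not the paper's exact layout:

```c
#include <stdint.h>

/* Illustrative packing of the adaptive-lock state into one word:
   bit 0 = transition, bit 1 = lockHeld, bit 2 = mutexMode, and the
   remaining high bits hold thrdsInStmMode. Field positions are assumed. */
#define TRANSITION_BIT ((intptr_t)1 << 0)
#define LOCKHELD_BIT   ((intptr_t)1 << 1)
#define MUTEXMODE_BIT  ((intptr_t)1 << 2)
#define STM_SHIFT      3

intptr_t setTransition(intptr_t s, int v) { return v ? (s | TRANSITION_BIT) : (s & ~TRANSITION_BIT); }
int      transition(intptr_t s)           { return (s & TRANSITION_BIT) != 0; }
intptr_t setLockHeld(intptr_t s, int v)   { return v ? (s | LOCKHELD_BIT) : (s & ~LOCKHELD_BIT); }
int      lockHeld(intptr_t s)             { return (s & LOCKHELD_BIT) != 0; }
intptr_t setMutexMode(intptr_t s, int v)  { return v ? (s | MUTEXMODE_BIT) : (s & ~MUTEXMODE_BIT); }
int      mutexMode(intptr_t s)            { return (s & MUTEXMODE_BIT) != 0; }
intptr_t setThrdsInStmMode(intptr_t s, intptr_t n) { return (s & (((intptr_t)1 << STM_SHIFT) - 1)) | (n << STM_SHIFT); }
intptr_t thrdsInStmMode(intptr_t s)       { return s >> STM_SHIFT; }
```

Packing everything into one word is what lets the acquire routine update the whole state with a single compare-and-swap.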

Acquire is the main routine:

int acquire(al_t* lock) {
  int spins = 0;
  int useTransact = 0;
  INC(lock->thdsBlocked);
  while (1) {
    intptr_t prev, next;
    prev = lock->state;
    if (transition(prev) == 0) {
      if ((useTransact = transactMode(lock,spins))) {
        if (lockHeld(prev) == 0) {
          next = setMutexMode(prev,0);
          next = setThrdsInStmMode(next, thrdsInStmMode(next)+1);
          if (CAS(lock->state,prev,next) == prev) break;
        } else {
          next = setMutexMode(prev,0);
          next = setTransition(next,1);
          CAS(lock->state,prev,next);
        }
      } else {
        if (lockHeld(prev) == 0 && thrdsInStmMode(prev) == 0) {
          next = setMutexMode(prev,1);
          next = setLockHeld(next,1);
          if (CAS(lock->state,prev,next) == prev) break;
        } else if (mutexMode(prev) == 0) {
          next = setMutexMode(prev,1);
          next = setTransition(next,1);
          CAS(lock->state,prev,next);
        }
      }
    } else {
      if (mutexMode(prev) == 0) {
        if (lockHeld(prev) == 0) {
          useTransact = 1;
          next = setThrdsInStmMode(prev, thrdsInStmMode(prev)+1);
          next = setTransition(next,0);
          if (CAS(lock->state,prev,next) == prev) break;
        }
      } else {
        if (lockHeld(prev) == 0 && thrdsInStmMode(prev) == 0) {
          useTransact = 0;
          next = setLockHeld(prev,1);
          next = setTransition(next,0);
          if (CAS(lock->state,prev,next) == prev) break;
        }
      }
    }
    if (spin_thrld < ++spins) Yield();
  } /* end while(1) */
  DEC(lock->thdsBlocked);
  return useTransact;
}

Performance Optimizations. Threads need to update the variables that keep counts and compute the statistics for adaptive reasoning. Recall a (actual contention): instead of updating it atomically all the time, threads perform regular writes to it, and a shared update then folds these into the global value. This can give rise to write-write races, but the authors argue that sporadic inaccuracies in the statistics are insignificant. Note also that inaccurate statistics cannot cause wrong program execution; at worst, the other mode is chosen for executing the critical sections.
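The buffered-statistics idea can be sketched as follows. The flush threshold, variable names, and single global counter are my assumptions for illustration:

```c
/* Sketch: each thread accumulates its contention count in a
   thread-local buffer and only occasionally folds it into the shared
   global with a plain (racy) write. A lost update merely skews the
   heuristic; it can never make the program incorrect. */
#define FLUSH_EVERY 64

static long global_actual_contention = 0;        /* shared, updated racily */
static _Thread_local long local_contention = 0;  /* per-thread buffer */

void note_abort(void) {
    if (++local_contention >= FLUSH_EVERY) {
        global_actual_contention += local_contention;  /* plain write */
        local_contention = 0;
    }
}

long read_contention_estimate(void) { return global_actual_contention; }
```

This trades exactness for speed: most aborts touch only thread-local memory, and the contended cache line holding the global is written rarely.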

Performance Optimizations (contd.). The atomic increment and decrement of lock->thdsBlocked are also avoided: they are performed only if the thread actually spins. This differs from the earlier code that was shown.

Performance Optimizations (contd.). Original:

int acquire(al_t* lock) {
  int spins = 0;
  ...
  INC(lock->thdsBlocked);
  while (1) {
    ...
    // try to acquire,
    // break if successful
    if (spin_thrld < ++spins) Yield();
  }
  DEC(lock->thdsBlocked);
  ...
}

Optimized:

int acquire(al_t* lock) {
  int spins = 0;
  ...
  while (1) {
    ...
    // try to acquire,
    // break if successful
    if (spins == 0) INC(lock->thdsBlocked);
    if (spin_thrld < ++spins) Yield();
  }
  if (0 < spins) DEC(lock->thdsBlocked);
  ...
}

Performance Optimizations (contd.). o (transactional overhead) depends on shared-memory updates. To keep the estimate of o realistic but inexpensive, it is calculated only at regular intervals: the number of memory accesses in the transaction is noted and multiplied by a static estimate of how much each access costs.
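A minimal sketch of that estimate, assuming a fixed per-access cost; the constant and function name are illustrative, not the paper's:

```c
/* Estimate transactional overhead as (number of instrumented memory
   accesses) x (static per-access cost). The cost constant is an
   assumed placeholder value. */
#define PER_ACCESS_COST 12.0   /* assumed cost units per STM-instrumented access */

double estimate_transactional_overhead(long num_accesses) {
    return (double)num_accesses * PER_ACCESS_COST;
}
```

Because the access count is already available from the STM instrumentation, this keeps the estimate cheap: no timing measurements are needed on the fast path.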

Reality Check. But wait: is interchanging locks and transactions legal? Are they equivalent? Answer: no, they are not. More specifically, it depends on the type of STM system. TL2, the STM used by the authors, can behave differently when locks and transactions are used interchangeably.

No more boring bullets; we are not MBA students. Example scenario (from a figure): Thread 1 commits and removes the first item. Thread 2 commits but does not yet copy the value to memory. When Thread 2 eventually updates the value, r1 and r2 will have seen stale values by then.

So how can we fix this? A simple observation: there should be a lock for every shared memory location, and every access to those locations should be done with that lock held. This is the standard lockset well-formedness criterion for multithreaded programs.

Some results. Tested with micro- and macro-benchmarks: on red-black trees (where STM wins), splay trees (where mutex locks win), and fine-grained hash tables, adaptive locks were as good as the better of the two concurrency mechanisms. Also tested with STAMP (Stanford Transactional Applications for Multi-Processing).

Questions?