PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.


Increasing the speed of execution. Modern processors provide more than one core for program execution. To use the extra cores, programs must be parallelized, so that as much of the program's work as possible is done concurrently across the cores.

Amdahl's law. The speedup that can be achieved is Speedup = 1 / ((1 - P) + P/S), where P = the fraction of the program that can be parallelized and S = the number of execution units (cores).

Synchronization problems. Most of the individual pieces of the program have to collaborate, and therefore share data in memory. Write access to the shared data cannot happen in an uncontrolled fashion: if a state (e.g. a global variable) is represented by the contents of multiple memory locations, an uncoordinated update can leave it inconsistent.

Contd.. The traditional solution is synchronization, achieved with the help of mutual-exclusion (mutex) directives. A mutex is a technique to avoid simultaneous use of a common resource (e.g. a global variable) by concurrent critical sections, by protecting the memory that the common resource resides in.

Cont… If all read and write accesses to the protected state or resource are performed while holding the mutex lock, the program is guaranteed never to see an inconsistent state. Locking mutexes, however, opens a new set of problems.

Problems with locking mutexes. Using a single program-wide mutex hurts performance, as it shrinks the portion of the program that can run in parallel. Using multiple mutexes increases the fraction P in Amdahl's law, but it also increases the overhead of locking and unlocking the mutexes (which is especially wasteful when contention on the critical regions is light).

Deadlock. Another problem with multiple locks is potential deadlock, if overlapping mutexes are locked by multiple threads in different orders. Ex:

void f1() {
    lock(mutex1);
    lock(mutex2);
    /* ...code... */
    unlock(mutex1);
    unlock(mutex2);
}

void f2() {
    lock(mutex2);
    lock(mutex1);
    /* ...code... */
    unlock(mutex2);
    unlock(mutex1);
}

Livelock is a problem with some algorithms that detect and try to recover from a deadlock: the deadlock-detection algorithm may trigger repeatedly if more than one thread (chosen randomly) takes the recovery action.

The dilemma. So a programmer may be caught between two problems: increasing the part of the program that can be executed in parallel (P), and the resulting complexity of the program code, which increases the potential for problems.

The concept of transactional memory (TM) to solve the consistency problem. Programming with transactional memory allows sequences of concurrent operations to be combined into atomic transactions. A transaction is a piece of code that executes a series of reads and writes to shared memory. Ref: _transactional_memory.html

A piece of code with a locking mechanism:

void f1() {
    lock(l_timestamp1);
    lock(l_counter);
    *t = timestamp1;
    *r = counter++;
    unlock(l_timestamp1);
    unlock(l_counter);
}

void f2() {
    lock(l_counter);
    lock(l_timestamp1);
    lock(l_timestamp2);
    *r = counter++;
    if (*r & 1)
        *t = timestamp2;
    else
        *t = timestamp1;
    unlock(l_timestamp2);
    unlock(l_timestamp1);
    unlock(l_counter);
}

The same code with TM:

void f1() {
    tm_atomic {
        *t = timestamp1;
        *r = counter++;
    }
}

void f2() {
    tm_atomic {
        *r = counter++;
        if (*r & 1)
            *t = timestamp2;
        else
            *t = timestamp1;
    }
}

Transaction. The keyword tm_atomic indicates that all the instructions in that block are part of a transaction. The way a transaction is carried out is as follows:
1. Check whether the same memory location is part of another transaction.
2. If yes, abort the current transaction.
3. If no, record that the current transaction referenced the memory location, so that step 2 in other transactions can find it.
4. Depending on whether it is a read or a write access, either load the value of the memory location (if the variable has not yet been modified in this transaction) or load it from the transaction-local storage; writes place the new value into local storage for the variable. Writing these locally stored values back at the end of the block is called committing the transaction.
There are alternatives for when the checking and aborting happen: lazy abort, eager abort.

Cont… If the current transaction is aborted, reset all internal state, delay for some short period, then retry, executing the whole block again. On commit, store to all the memory locations modified in the transaction the new values that were placed in local storage, then reset the information about the memory locations being part of the transaction. Only if all the memory accesses inside the atomic block succeed will the transaction commit.

IMPLEMENTATION. Hardware TM: requires changes to the CPU, buses, etc. Software TM: the transactional semantics are added to the language's own; a must for general programming. Hybrid TM: a combination of the two.

Issues with TM. Recording transactions: recording the exact location of every memory access in a transaction may cause a lot of overhead. Instead, record the memory block: if the current transaction accesses a memory location in the same block as an earlier recorded access, it does not have to be recorded again.

Cont.. This introduces another problem: if variables in two independent transactions happen to fall into the same block (e.g. the same cache line, even though the actual memory addresses of the variables are different and far apart in the program), a conflict on one variable aborts the other transaction, so the two transactions are no longer independent. This leads to high abort rates. This is called false sharing, and it needs to be dealt with if block-granularity recording is used.

Cont.. Handling aborts: which is better, eager abort or lazy abort? Answer: no single way is sufficient; compilers will implement multiple strategies at the same time and flip between them for individual transactions when it seems advantageous. Semantics: the semantics of the atomic block (e.g. the tm_atomic block) have to be integrated into the rest of the language's semantics. Performance: plenty of compiler optimizations are needed, and this still requires research. In the uncontested case (e.g. a single-threaded program) the TM overhead is too high, so two versions of each function are to be generated: one with TM support, one without. It should be ensured that the version without TM support is used as often as possible to reduce the performance loss.

CONCLUSION. The TM concept solves problems associated with locking (by enabling lock-free data structures). But a lot of research is still to be carried out to take full advantage of TM.

References. U. Drepper, "Parallel Programming with Transactional Memory," Communications of the ACM, vol. 52, no. 2, February 2009. _transactional_memory.html