Transactional Memory Companion slides for

Slides:



Advertisements
Similar presentations
Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
Advertisements

Optimistic Methods for Concurrency Control By : H.T. Kung & John T. Robinson Presenters: Munawer Saeed.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
Relaxed Consistency Models. Outline Lazy Release Consistency TreadMarks DSM system.
Transactional Locking Nir Shavit Tel Aviv University (Joint work with Dave Dice and Ori Shalev)
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Rich Transactions on Reasonable Hardware J. Eliot B. Moss Univ. of Massachusetts,
Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
The Future of Concurrency Theory Renaissance or Reformation? (Met dank aan Maurice Herlihy) Frits Vaandrager.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, “lazy” implementation.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 8: Transactional Memory – TCC Topics: “lazy” implementation (TCC)
1 Lecture 24: Transactional Memory Topics: transactional memory implementations.
Supporting Nested Transactional Memory in LogTM Authors Michelle J Moravan Mark Hill Jayaram Bobba Ben Liblit Kevin Moore Michael Swift Luke Yen David.
1 Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of “lazy” TM.
Department of Computer Science Presenters Dennis Gove Matthew Marzilli The ATOMO ∑ Transactional Programming Language.
An Introduction to Software Transactional Memory
Art of Multiprocessor Programming 1 Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit TexPoint fonts used in EMF. Read the TexPoint.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
Transactional Memory The Art of Multiprocessor Programming Herlihy and Shavit.
CS5204 – Operating Systems Transactional Memory Part 2: Software-Based Approaches.
Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.
WG5: Applications & Performance Evaluation Pascal Felber
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
Lecture 9- Concurrency Control (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
4 November 2005 CS 838 Presentation 1 Nested Transactional Memory: Model and Preliminary Sketches J. Eliot B. Moss and Antony L. Hosking Presented by:
Concurrent Revisions: A deterministic concurrency model. Daan Leijen & Sebastian Burckhardt Microsoft Research (OOPSLA 2010, ESOP 2011)
Transactional Memory Companion slides for
Lecture 20: Consistency Models, TM
Maurice Herlihy and J. Eliot B. Moss,  ISCA '93
Jonathan Walpole Computer Science Portland State University
Multiprocessor Programming
Transactions and Reliability
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Transactional Memory : Hardware Proposals Overview
Part 2: Software-Based Approaches
PHyTM: Persistent Hybrid Transactional Memory
Transactional Memory TexPoint fonts used in EMF.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory By McKenney, Michael, Triplett and Walpole.
The University of Adelaide, School of Computer Science
Faster Data Structures in Transactional Memory using Three Paths
Concurrent Objects Companion slides for
Concurrency Control.
Multiprocessor Cache Coherency
The University of Adelaide, School of Computer Science
Example Cache Coherence Problem
Synchronization Lecture 23 – Fall 2017.
Lecture 11: Transactional Memory
Changing thread semantics
Lecture: Consistency Models, TM
Lecture 6: Transactions
Transactional Memory Companion slides for
Lecture 21: Transactional Memory
Threads and Memory Models Hal Perkins Autumn 2009
Chapter 15 : Concurrency Control
Lecture 22: Consistency Models, TM
Lecture: Consistency Models, TM
Hybrid Transactional Memory
Design and Implementation Issues for Atomicity
Software Transactional Memory Should Not be Obstruction-Free
The University of Adelaide, School of Computer Science
Lecture 23: Transactional Memory
Lecture 21: Transactional Memory
Lecture: Consistency Models, TM
Lecture: Transactional Memory
The University of Adelaide, School of Computer Science
I, for one, Welcome our new Multicore Overlords …
Parallel Data Structures
Presentation transcript:

Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Art of Multiprocessor Programming 1

Shared Data Structures Fine grained parallelism has huge performance benefit The reason we get only 2.9 speedup Coarse Grained Fine Grained 25% Shared 25% Shared The hard part about parallelizing applications is taking care of the fraction that is has to do with communication among threads, usually through shared objects… 75% Unshared 75% Unshared

Art of Multiprocessor Programming A FIFO Queue Head Tail a b c d So lets look at what we have learned about parallelizing such objects…the game is making things more fine grained… Dequeue() => a Enqueue(d) Art of Multiprocessor Programming

Coarse Grain Locks Object lock Contention and sequential bottleneck Simple Code, easy to prove correct b c d Tail Head a P: Dequeue() => a Q: Enqueue(d) Object lock Don’t undersell simplicity… Contention and sequential bottleneck Art of Multiprocessor Programming

Fine Grain Locks Finer Granularity, More Complex Code a b c d Head Tail a b c d P: Dequeue() => a Q: Enqueue(d) Verification nightmare: worry about deadlock, livelock… Art of Multiprocessor Programming

Fine Grain Locks Complex boundary cases: empty queue, last item Tail Head a Head Tail a b c d P: Dequeue() => a Q: Enqueue(b) Worry how to acquire multiple locks Art of Multiprocessor Programming

Moreover: Locking Relies on Conventions Relation between Lock bit and object bits Exists only in programmer’s mind Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul) /* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */ Art of Multiprocessor Programming

Lock-Free (JDK 1.5+) Even Finer Granularity, Even More Complex Code a Head Tail a b c d We can go even one step further…we have seen how to design lock free data structures…with finer granularity P: Dequeue() => a Q: Enqueue(d) Worry about starvation, subtle bugs, hardness to modify… Art of Multiprocessor Programming

Composing Objects More than twice the worry… Complex: Move data atomically between structures b c d Tail Head a a must be seen in Q1 or Q2, but never in both P: Dequeue(Q1) => a Enqueue(Q2,a) Invariant: c d a Tail Head b Liran: add invariant More than twice the worry… Art of Multiprocessor Programming

The Transactional Memory Manifesto Current practice inadequate to meet the multicore challenge Instead Replace locking with a transactional “atomic” API atomic { [...] } // Anything Art of Multiprocessor Programming

Promise of Transactional Memory “Great” Performance, “Simple” Code Head Tail a b c d P: Dequeue() => a Q: Enqueue(d) Don’t worry about deadlock, livelock, subtle bugs, etc… Art of Multiprocessor Programming

Promise of Transactional Memory Don’t worry which locks need to cover which variables when… b Tail Head a Head Tail a b c d P: Dequeue() => a Q: Enqueue(d) TM deals with boundary cases under the hood Art of Multiprocessor Programming

Composing Objects Provide Composability… Will be easy to modify multiple structures atomically b c d Tail Head a a must be seen in Q1 or Q2, but never in both P: Dequeue(Q1) => a Enqueue(Q2,a) Invariant: c d a Tail Head b Provide Composability… Art of Multiprocessor Programming

Art of Multiprocessor Programming Transactions Atomic Commit: all updates take effect Abort: effects rolled back on failure Usually retried Serializable Appear to happen in one-at-a-time order Art of Multiprocessor Programming

Art of Multiprocessor Programming Atomic Blocks atomic { x.remove(3); y.add(3); } atomic { y = null; } Art of Multiprocessor Programming

Art of Multiprocessor Programming Atomic Blocks atomic { x.remove(3); y.add(3); } atomic { y = null; } No data race Art of Multiprocessor Programming 16

Art of Multiprocessor Programming Designing a FIFO Queue Public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Write sequential Code Art of Multiprocessor Programming 17

Art of Multiprocessor Programming Designing a FIFO Queue Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Art of Multiprocessor Programming

Enclose in atomic block Designing a FIFO Queue Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Enclose in atomic block Art of Multiprocessor Programming

Art of Multiprocessor Programming Composition Public void Transfer(Queue<T> q1, q2) { atomic { T x = q1.deq(); q2.enq(x); } Trivial or what? Art of Multiprocessor Programming 20

Roll back transaction (discard updates) and restart on failure Public T LeftDeq() { atomic { [...] // Maybe some updates occurred if (this.left == null) abort; [...] } Liran: add the … before the check Roll back transaction (discard updates) and restart on failure Art of Multiprocessor Programming

OrElse Composition Run 1st method. If it retries … atomic { x = q1.deq(); } orElse { x = q2.deq(); } Run 1st method. If it retries … Run 2nd method. If it retries … Entire statement retries Art of Multiprocessor Programming

Art of Multiprocessor Programming Warning Not always this simple Conditional waits If waiting for update in atomic block, then it isn’t atomic Enhanced concurrency e.g., wait-free But often it is Art of Multiprocessor Programming

The Transactional Memory Research Agenda Design languages to support this model Implement the run-time to be fast enough Art of Multiprocessor Programming

Transactional Memory Implementation STM: Software transactional memory HTM: Hardware transactional memory HyTM: Hybrid transactional memory Try in hardware Default to software if unsuccessful Art of Multiprocessor Programming

Hardware versus Software Do we need hardware at all? Analogies: Virtual memory: yes! Garbage collection: no! Probably do need HW for performance Do we need software? Policy issues don’t make sense for hardware (some examples later…) Art of Multiprocessor Programming

Transactional Consistency Memory Transactions are collections of reads and writes executed atomically Transactions should maintain internal and external consistency Internal: the transaction itself should operate on a consistent state. External: with respect to the interleaving of other transactions. When we design STMs, we need to worry about both external and internal consistency Art of Multiprocessor Programming

If z ≠ u then external consistency violated Invariant x = 2y 4 8 4 x Transaction A: Write x Write y Application Memory 2 y Transaction B: Read x Read y Compute z = 1/(x-y) Compute u = 1/(x-y) If z ≠ u then external consistency violated

Art of Multiprocessor Programming Simple Lock-Based STM STMs come in different forms Lock-based Lock-free Here we will describe a simple lock-based STM Lets see how we design an STM to provide external consistency… Art of Multiprocessor Programming

Art of Multiprocessor Programming Synchronization Transaction keeps Read set: object locations Write set: object locations & values to be written Deferred update Changes installed at commit Lazy conflict detection Conflicts detected at commit Art of Multiprocessor Programming

STM: Transactional Locking Map V# Application Memory Array of Versioned Write-Locks V# V# Art of Multiprocessor Programming

Art of Multiprocessor Programming The Global Clock Have one shared global clock Incremented by (small subset of) writing transactions Read by all transactions Used to validate at commit that state worked on is consistent Art of Multiprocessor Programming

Global-Clock Transactions Mem Locks Each transaction starts by copying V clock to RV 12 32 56 69 Shared Version Clock 69 Liran: fix to match the book version 19 17 Private Read Version (RV) Art of Multiprocessor Programming

Reading an Object Mem Locks Check unlocked Put object location in RS Return current value 12 32 56 … Shared Version Clock 69 Liran: fix to match the book version 19 17 Private Read Version (RV) Art of Multiprocessor Programming

To Write an Object Mem Locks Put object location and new value in WS … 12 32 56 Liran: fix to match the book version 19 … Shared Version Clock 69 17 Private Read Version (RV) Art of Multiprocessor Programming

Transactions- Commit x y Mem Locks Acquire locks (write set only) For all read set check unlock validate V# ≤ RV WV = Fetch&Inc(V Clock) Update write set Set write set V#s to WV Release locks 12 x 101 32 56 Liran: fix to match the book version 19 100 69 101 101 y 101 17 Private Read Version (RV) Private Write Version (WV) Shared Version Clock Art of Multiprocessor Programming

Problem: Internal Inconsistency A Zombie is a currently active transaction that is destined to abort because it saw an inconsistent state If Zombies that see inconsistent states are allowed to have irreversible impact on execution state then errors can occur Eventual abort does not save us Art of Multiprocessor Programming

Art of Multiprocessor Programming Internal Consistency Invariant x ≠ 0 && x = 2y 4 8 4 x Transaction B: Read x = 4 2 y Transaction A: Write x (kills B) Write y Zombie transactiosn are ones that are doomed to fail (here trans will fail validatuon at the end) but go on Computing in a way that can cause problems Transaction B: (zombie) Read y = 4 Compute z = 1/(x-y) Application Memory DIV by 0 ERROR Art of Multiprocessor Programming

Solution: Revalidation Revalidate clock on each read Check that state worked on is always consistent Art of Multiprocessor Programming

Global-Clock Transactions Mem Locks Each transaction starts by copying V clock to RV (as before) 12 32 56 69 Shared Version Clock 69 19 17 Private Read Version (RV) Art of Multiprocessor Programming

Read-Only Transactions Mem Locks Check unlocked Read mem Check unlocked again Check V# < RV Return read value 12 32 56 No need for read set! … Shared Version Clock 69 Liran: the book does not include this version 19 17 Private Read Version (RV) Art of Multiprocessor Programming

Regular Transaction - Read Mem Locks Check unlocked Read mem Put object location in RS Check unlocked again Check V# < RV Return read value 12 32 56 … Shared Version Clock 69 19 17 Private Read Version (RV) Art of Multiprocessor Programming

To Write an Object (as before) Mem Locks Put object location and new value in WS 12 32 56 19 … Shared Version Clock 69 17 Private Read Version (RV) Art of Multiprocessor Programming

Transactions- Commit (as before) Mem Locks Acquire locks (write set only) For all read set check unlock validate V# ≤ RV WV = Fetch&Inc(V Clock) Update write set Set write set V#s to WV Release locks why? 12 x 101 32 56 Liran: add “why?” 19 100 69 101 101 y 101 17 Private Read Version (RV) Private Write Version (WV) Shared Version Clock Art of Multiprocessor Programming

Art of Multiprocessor Programming Some explanations to lock-based implementation of regular transactions: why do we need to revalidate reads? When two transactions have their read and write sets intersected, but both succeed to read before the write of the other transaction occurs, then there is no way to serialize them (see example below by Nir Shavit). Hence the need to revalidate the read set *after* locking the write set. Also, upon commit of transaction A, *after* A already took the locks on the write set, and after or while the read set revalidated, another transaction cannot succeed to read from A's write set before A writes it (because it is locked). Detailed example: Take two transactions T1 and T2. Lets say that there are 2 memory locations initialized to 0. Lets say that both transactions read both locations, and T1 writes 1 to location 1 if it saw all 0's and T2 writes 1 to location 2 if it saw all 0's. Now if they both do not revalidate the read locations this means that T1 does not revalidate location 2 after acquiring the lock and T2 does not revalidate location 1 after grabbing the lock. So if they both run, both read both locations, both see all 0's in a snapshot, then both grab locks on their respective write locations, revalidate their own write locations, and write the 1 value with a timestamp greater by 1. Since they only revalidated their write locations after locking, neither saw that the other thread changed the location they only read to a 1 with a larger timestamp. Now we have a memory with two 1's in it even though there is no such serializable execution. Seeing a snapshot before grabbing the locks in the commit is thus not sufficient and the algorithm must have the transactions each revalidate the read set locations after acquiring the locks. Art of Multiprocessor Programming

Hardware Transactional Memory Exploit Cache coherence Already almost does it Invalidation Consistency checking Speculative execution Branch prediction = optimistic synch! Original TM proposal by Herlihy and Moss 1993… Art of Multiprocessor Programming

HW Transactional Memory read active T caches Interconnect memory Art of Multiprocessor Programming

Art of Multiprocessor Programming Transactional Memory read active active T T caches Interconnect memory Art of Multiprocessor Programming

Art of Multiprocessor Programming Transactional Memory active committed active T T caches Interconnect memory Art of Multiprocessor Programming

Art of Multiprocessor Programming Transactional Memory write committed active T D caches Interconnect memory Art of Multiprocessor Programming

Art of Multiprocessor Programming Rewind Before commit Art of Multiprocessor Programming

Art of Multiprocessor Programming Transactional Memory write aborted active active T T D caches Interconnect memory Art of Multiprocessor Programming

Art of Multiprocessor Programming Transaction Commit At commit point If no cache conflicts, we win. Mark transactional entries Read-only: valid Modified: dirty (eventually written back) That’s all, folks! Except for a few details … Art of Multiprocessor Programming

Not all Skittles and Beer Limits to Transactional cache size Scheduling quantum Transaction cannot commit if it is Too big Too slow Actual limits platform-dependent Art of Multiprocessor Programming

Art of Multiprocessor Programming TM Design Issues Implementation choices Language design issues Semantic issues Art of Multiprocessor Programming

Art of Multiprocessor Programming Granularity Object Managed languages, Java, C#, … Easy to control interactions between transactional & non-trans threads Word C, C++, … Hard to control interactions between transactional & non-trans threads Art of Multiprocessor Programming

Direct/Deferred Update Modify private copies & install on commit Commit requires work Consistency easier Direct Modify in place, roll back on abort Makes commit efficient Consistency harder Art of Multiprocessor Programming

Art of Multiprocessor Programming Conflict Detection Lazy: detect on commit/abort Contentions resolved by a race May discards more computation Eager: detect before conflict arises “Contention Manager” module resolves E.g., discard the transaction that is closer to start May abort transaction that could have committed. Liran: move mixing to its own slide and merge “Conflict Detection” Art of Multiprocessor Programming

Contention Management & Scheduling How to resolve conflicts? Who moves forward and who rolls back? Art of Multiprocessor Programming

Contention Manager Strategies Exponential backoff Priority to Rank? Oldest? Most work? Non-waiting task? Judgment of Solomon Liran: add rank Art of Multiprocessor Programming

Art of Multiprocessor Programming I/O & System Calls? Some I/O revocable Provide transaction-safe libraries Undoable file system/DB calls Some not Opening cash drawer Firing missile Art of Multiprocessor Programming

Art of Multiprocessor Programming I/O & System Calls One solution: make transaction irrevocable If transaction tries I/O, switch to irrevocable mode. There can be only one … Requires serial execution No explicit aborts In irrevocable transactions Art of Multiprocessor Programming

Art of Multiprocessor Programming Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); Art of Multiprocessor Programming

Exceptions Throws OutOfMemoryException! int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); What is printed? Art of Multiprocessor Programming

Art of Multiprocessor Programming Unhandled Exceptions Aborts transaction Preserves invariants Safer Commits transaction Like locking semantics What if exception object refers to values modified in transaction? Art of Multiprocessor Programming

Art of Multiprocessor Programming Nested Transactions atomic void foo() { bar(); } atomic void bar() { … Art of Multiprocessor Programming

Art of Multiprocessor Programming Nested Transactions Needed for modularity Who knew that cosine() contained a transaction? Flat nesting If child aborts, so does parent First-class nesting If child aborts, partial rollback of child only Art of Multiprocessor Programming

Strong vs Weak Isolation How do transactional & non-transactional threads synchronize? Interaction with memory-model? Efficient algorithms? Art of Multiprocessor Programming