Transactional Memory: companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Presentation transcript:

Shared Data Structures
[Figure: coarse-grained vs. fine-grained locking across the cores (c) for a workload that is 75% unshared and 25% shared]
The reason we get only 2.9 speedup
Fine-grained parallelism has a huge performance benefit
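The 2.9 figure is consistent with Amdahl's law under two assumptions that the slide does not state outright: that the machine has the eight cores drawn in the figure, and that the 25% shared portion runs sequentially under the coarse-grained lock. With parallel fraction p = 0.75 and n = 8 cores:

```latex
S = \frac{1}{(1 - p) + \frac{p}{n}}
  = \frac{1}{0.25 + \frac{0.75}{8}}
  = \frac{1}{0.34375}
  \approx 2.9
```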

A FIFO Queue
[Figure: queue holding a, b, c, d with Head and Tail pointers; Enqueue(d), Dequeue() => a]

A Concurrent FIFO Queue
[Figure: the queue guarded by a single object lock; P: Dequeue() => a, Q: Enqueue(d)]
Simple code, easy to prove correct
Contention and sequential bottleneck

Fine Grain Locks
[Figure: queue with Head and Tail pointers; P: Dequeue() => a, Q: Enqueue(d)]
Finer granularity, more complex code
Verification nightmare: worry about deadlock, livelock…

Fine Grain Locks
[Figure: two queues with Head and Tail pointers, one long and one nearly empty; P: Dequeue() => a, Q: Enqueue(b)]
Worry how to acquire multiple locks
Complex boundary cases: empty queue, last item

Moreover: Locking Relies on Conventions
Relation between
–Lock bit and object bits
–Exists only in programmer's mind
/*
 * When a locked buffer is visible to the I/O layer
 * BH_Launder is set. This means before unlocking
 * we must clear BH_Launder,mb() on alpha and then
 * clear BH_Lock, so no reader can see BH_Launder set
 * on an unlocked buffer and then risk to deadlock.
 */
Actual comment from the Linux kernel (hat tip: Bradley Kuszmaul)

Lock-Free (JDK 1.5+)
[Figure: queue with Head and Tail pointers; P: Dequeue() => a, Q: Enqueue(d)]
Even finer granularity, even more complex code
Worry about starvation, subtle bugs, difficulty of modification…

Composing Objects
Complex: move data atomically between structures
More than twice the worry…
[Figure: two queues, Q1 and Q2, each with Head and Tail pointers; P: Dequeue(Q1,a), Enqueue(Q2,a)]
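To make "more than twice the worry" concrete, here is a minimal sketch of moving an item between two lock-based queues. All names (`LockedQueue`, `transfer`, the `id` rank) are hypothetical and not from the slides. The sketch assumes every queue has a distinct `id`, and it avoids deadlock only because every caller acquires the two locks in the same rank order, which is exactly the kind of convention that lives in the programmer's mind.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

class LockedQueueDemo {
    // A queue guarded by one lock (hypothetical illustration).
    static class LockedQueue<T> {
        final int id;                                  // assumed unique; fixes the lock order
        final ReentrantLock lock = new ReentrantLock();
        final ArrayDeque<T> items = new ArrayDeque<>();
        LockedQueue(int id) { this.id = id; }

        void enq(T x) { lock.lock(); try { items.addLast(x); } finally { lock.unlock(); } }
        T deq()       { lock.lock(); try { return items.pollFirst(); } finally { lock.unlock(); } }
        int size()    { lock.lock(); try { return items.size(); } finally { lock.unlock(); } }
    }

    // Moves one item so that lock-respecting threads never see it in neither or both queues.
    // Returns false if the source queue was empty.
    static <T> boolean transfer(LockedQueue<T> from, LockedQueue<T> to) {
        // Always lock the lower-ranked queue first, whichever direction the transfer goes.
        LockedQueue<T> first = from.id < to.id ? from : to;
        LockedQueue<T> second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                T x = from.items.pollFirst();
                if (x == null) return false;
                to.items.addLast(x);
                return true;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedQueue<String> q1 = new LockedQueue<>(1);
        LockedQueue<String> q2 = new LockedQueue<>(2);
        q1.enq("a");
        System.out.println(transfer(q1, q2) + " " + q1.size() + " " + q2.size());
    }
}
```

Note that `transfer` has to reach inside both queues and know their locking discipline; the queue's own `enq` and `deq` methods cannot simply be called one after the other.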

Transactional Memory
[Figure: queue with Head and Tail pointers; P: Dequeue() => a, Q: Enqueue(d)]
Don't worry about deadlock, livelock, subtle bugs, etc…
Great performance, simple code

Promise of Transactional Memory
[Figure: two queues with Head and Tail pointers, one long and one nearly empty; P: Dequeue() => a, Q: Enqueue(d)]
TM deals with boundary cases under the hood
Don't worry which locks need to cover which variables when…

Composing Objects
It will be easy to modify multiple structures atomically
Provides composability…
[Figure: two queues, Q1 and Q2, each with Head and Tail pointers; P: Dequeue(Q1,a), Enqueue(Q2,a)]

The Transactional Manifesto
Current practice inadequate
–to meet the multicore challenge
Research agenda
–Replace locking with a transactional API
–Design languages to support this model
–Implement the run-time to be fast enough

Transactions
Atomic
–Commit: takes effect
–Abort: effects rolled back (usually retried)
Serializable
–Appear to happen in one-at-a-time order

Atomic Blocks
atomic { x.remove(3); y.add(3); }
atomic { y = null; }
No data race

Designing a FIFO Queue
public void LeftEnq(item x) {
  Qnode q = new Qnode(x);
  q.left = this.left;
  this.left.right = q;
  this.left = q;
}
Write sequential code

Designing a FIFO Queue
public void LeftEnq(item x) {
  atomic {
    Qnode q = new Qnode(x);
    q.left = this.left;
    this.left.right = q;
    this.left = q;
  }
}
Enclose in atomic block

Warning
Not always this simple
–Conditional waits
–Enhanced concurrency
–Complex patterns
But often it is

Composition
public void Transfer(Queue q1, Queue q2) {
  atomic {
    T x = q1.deq();
    q2.enq(x);
  }
}
Trivial or what?

Roll Back
public T LeftDeq() {
  atomic {
    if (this.left == null)
      retry;
    …
  }
}
Roll back transaction and restart when something changes

OrElse Composition
atomic {
  x = q1.deq();
} orElse {
  x = q2.deq();
}
Run 1st method. If it retries …
Run 2nd method. If it retries …
Entire statement retries

Transactional Memory
Software transactional memory (STM)
Hardware transactional memory (HTM)
Hybrid transactional memory (HyTM: try in hardware and default to software if unsuccessful)

Hardware versus Software
Do we need hardware at all?
–Analogies:
  Virtual memory: yes!
  Garbage collection: no!
–Probably do need HW for performance
Do we need software?
–Policy issues don't make sense for hardware

Transactional Consistency
Memory transactions are collections of reads and writes executed atomically
Transactions should maintain internal and external consistency
–External: with respect to the interleavings of other transactions.
–Internal: the transaction itself should operate on a consistent state.

External Consistency
[Figure: application memory holding x and y]
Invariant: x = 2y
Transaction A: Write x, Write y
Transaction B: Read x, Read y, Compute z = 1/(x-y)

Simple Lock-Based STM
STMs come in different forms
–Lock-based
–Lock-free
Here we will describe a simple lock-based STM

Synchronization
Transaction keeps
–Read set: locations & values read
–Write set: locations & values to be written
Deferred update
–Changes installed at commit
Lazy conflict detection
–Conflicts detected at commit

STM: Transactional Locking
[Figure: application memory mapped to an array of versioned write-locks, each holding a version number (V#)]

Reading an Object
Check not locked
Put V#s & value in RS
[Figure: Mem, Locks, V#]

To Write an Object
Add V# and new value to WS
[Figure: Mem, Locks, V#]

To Commit
Acquire W locks
Check V#s unchanged
–In RS only
Install new values
Increment V#s
Release …
[Figure: Mem, Locks; locations X and Y, V# becoming V#+1]

Problem: Internal Inconsistency
A zombie is a currently active transaction that is destined to abort because it saw an inconsistent state
If zombies that see inconsistent states are allowed to have an irreversible impact on execution state, then errors can occur
Eventual abort does not save us

Internal Consistency
[Figure: application memory holding x and y]
Invariant: x ≠ 0 && x = 2y
Transaction B: Read x = 4
Transaction A: Write x (kills B), Write y
Transaction B (zombie): Read y = 4, Compute z = 1/(x-y)
DIV by 0 ERROR
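The read, write, and commit steps of the simple lock-based STM, and the zombie hazard it leaves open, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Cell`, `Tx`, and `AbortException` names are hypothetical, each memory word carries its own lock bit and version instead of mapping into a shared lock array, and the initial values x = 4, y = 2 are chosen to satisfy the invariant x = 2y.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

class SimpleStmDemo {
    // One memory word plus its versioned write-lock (hypothetical layout).
    static class Cell {
        volatile int value;
        volatile long version;
        final AtomicBoolean locked = new AtomicBoolean(false);
        Cell(int v) { value = v; }
    }

    static class AbortException extends RuntimeException {}

    static class Tx {
        final Map<Cell, Long> readSet = new HashMap<>();      // location -> V# seen
        final Map<Cell, Integer> writeSet = new HashMap<>();  // location -> value to install

        int read(Cell c) {
            Integer pending = writeSet.get(c);
            if (pending != null) return pending;              // read our own deferred write
            if (c.locked.get()) throw new AbortException();   // check not locked
            long v = c.version;
            int val = c.value;
            readSet.putIfAbsent(c, v);                        // remember the first V# seen
            return val;
        }

        void write(Cell c, int val) { writeSet.put(c, val); } // deferred update

        void commit() {
            List<Cell> held = new ArrayList<>();
            try {
                for (Cell c : writeSet.keySet()) {            // acquire write locks
                    if (!c.locked.compareAndSet(false, true)) throw new AbortException();
                    held.add(c);
                }
                for (Map.Entry<Cell, Long> e : readSet.entrySet()) {  // check V#s unchanged
                    Cell c = e.getKey();
                    boolean lockedByOther = c.locked.get() && !writeSet.containsKey(c);
                    if (lockedByOther || c.version != e.getValue()) throw new AbortException();
                }
                for (Map.Entry<Cell, Integer> e : writeSet.entrySet()) {
                    e.getKey().value = e.getValue();          // install new values
                    e.getKey().version++;                     // increment V# (we hold the lock)
                }
            } finally {
                for (Cell c : held) c.locked.set(false);      // release
            }
        }
    }

    public static void main(String[] args) {
        Cell x = new Cell(4), y = new Cell(2);                // invariant: x = 2y
        Tx b = new Tx();
        int bx = b.read(x);                                   // B reads the old x
        Tx a = new Tx();
        a.write(x, 8); a.write(y, 4); a.commit();             // A commits in between
        int by = b.read(y);                                   // B reads the new y
        System.out.println("zombie B sees x - y = " + (bx - by));
        try { b.commit(); }
        catch (AbortException e) { System.out.println("B aborts, but only at commit"); }
    }
}
```

Because conflicts are detected only at commit, B computes with x = 4 and y = 4 before it learns that it must abort; a division by x - y at that point would already have failed.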

Solution: The "Global Clock"
Have one shared global clock
Incremented by (small subset of) writing transactions
Read by all transactions
Used to validate that state worked on is always consistent

Read-Only Transactions
Copy V Clock to RV
Read lock, V#
Read mem
Check unlocked (1)
Recheck V# unchanged (2)
(1)+(2) ⇒ V# and mem content consistent
Check V# < RV
Reads from a snapshot of memory. No read set!
[Figure: Mem, Locks, Shared Version Clock (100), Private Read Version (RV)]

Regular Transactions: During the Transaction, Prior to Commit
Copy V Clock to RV
On read/write, check:
–Unlocked (acquire)
–V# < RV
Add to R/W set
[Figure: Mem, Locks, Shared Version Clock, Private Read Version (RV); value shown: 69]

Regular Transactions: Commit
Acquire locks (write set only)
WV = Fetch&Inc(V Clock)
For all of the read set, check unlocked and revalidate V# < RV
Update write set
Set write set V#s to WV
Release locks
[Figure: Mem, Locks, Shared Version Clock (100), locations x and y, Private Read Version (RV)]

Some explanations of the lock-based implementation of regular transactions: why do we need to revalidate reads?

When the read and write sets of two transactions intersect, but each transaction manages to read before the other's write occurs, there is no way to serialize them (see the example below by Nir Shavit). Hence the need to revalidate the read set *after* locking the write set. Also, upon commit of transaction A, *after* A has taken the locks on its write set, and after or while its read set is revalidated, another transaction cannot read from A's write set before A writes it (because it is locked).

Detailed example: Take two transactions T1 and T2, and two memory locations initialized to 0. Both transactions read both locations; T1 writes 1 to location 1 if it saw all 0's, and T2 writes 1 to location 2 if it saw all 0's. If neither revalidates its read locations, then T1 does not revalidate location 2 after acquiring its lock and T2 does not revalidate location 1 after acquiring its lock. So if they both run, both read both locations, both see all 0's in a snapshot, then both grab locks on their respective write locations, revalidate their own write locations, and write the value 1 with a timestamp greater by 1. Since each revalidated only its write location after locking, neither saw that the other thread changed the location it only read to a 1 with a larger timestamp. Now memory holds two 1's even though no serializable execution produces that state. Seeing a snapshot before grabbing the locks in the commit is thus not sufficient: each transaction must revalidate its read-set locations after acquiring the locks.
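The global-clock protocol and the two-location example above can be sketched as follows. This is a simplified illustration, not the authors' code: `Cell`, `Tx`, and `AbortException` are hypothetical names, and each word carries its own lock and version. One deliberate deviation is labeled in the code: the slides write the check as V# < RV with Fetch&Inc, whereas this sketch uses `incrementAndGet` and the check V# <= RV, an assumption chosen so that cells starting at version 0 are readable when the clock starts at 0.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

class GlobalClockStmDemo {
    static final AtomicLong clock = new AtomicLong(0);   // shared version clock

    static class Cell {
        volatile int value;
        volatile long version;                           // V#
        final AtomicBoolean locked = new AtomicBoolean(false);
        Cell(int v) { value = v; }
    }

    static class AbortException extends RuntimeException {}

    static class Tx {
        final long rv = clock.get();                     // private read version (RV)
        final Set<Cell> readSet = new HashSet<>();
        final Map<Cell, Integer> writeSet = new LinkedHashMap<>();

        int read(Cell c) {
            Integer pending = writeSet.get(c);
            if (pending != null) return pending;
            long before = c.version;
            int val = c.value;
            // Unlocked, V# unchanged around the read, and V# <= RV (assumed convention).
            if (c.locked.get() || c.version != before || before > rv) throw new AbortException();
            readSet.add(c);
            return val;
        }

        void write(Cell c, int val) { writeSet.put(c, val); }

        void commit() {
            List<Cell> held = new ArrayList<>();
            try {
                for (Cell c : writeSet.keySet()) {       // acquire locks (write set only)
                    if (!c.locked.compareAndSet(false, true)) throw new AbortException();
                    held.add(c);
                }
                long wv = clock.incrementAndGet();       // write version (WV)
                for (Cell c : readSet) {                 // revalidate the read set after locking
                    boolean lockedByOther = c.locked.get() && !writeSet.containsKey(c);
                    if (lockedByOther || c.version > rv) throw new AbortException();
                }
                for (Map.Entry<Cell, Integer> e : writeSet.entrySet()) {
                    e.getKey().value = e.getValue();     // update write set
                    e.getKey().version = wv;             // set V#s to WV
                }
            } finally {
                for (Cell c : held) c.locked.set(false); // release locks
            }
        }
    }

    public static void main(String[] args) {
        // Zombie scenario: B is stopped at its second read instead of at commit.
        Cell x = new Cell(4), y = new Cell(2);
        Tx b = new Tx();
        b.read(x);
        Tx a = new Tx();
        a.write(x, 8); a.write(y, 4); a.commit();
        try { b.read(y); System.out.println("B read y (unexpected)"); }
        catch (AbortException e) { System.out.println("B aborts before computing 1/(x-y)"); }

        // Two-location example: T2 fails revalidation, so memory never holds two 1's.
        Cell loc1 = new Cell(0), loc2 = new Cell(0);
        Tx t1 = new Tx(), t2 = new Tx();
        if (t1.read(loc1) + t1.read(loc2) == 0) t1.write(loc1, 1);
        if (t2.read(loc1) + t2.read(loc2) == 0) t2.write(loc2, 1);
        t1.commit();
        try { t2.commit(); }
        catch (AbortException e) { System.out.println("T2 aborts on read-set revalidation"); }
        System.out.println(loc1.value + " " + loc2.value);
    }
}
```

Removing the read-set loop from `commit` reproduces the non-serializable outcome described above: both commits succeed and both locations end up holding 1.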

Hardware Transactional Memory
Exploit cache coherence
Already almost does it
–Invalidation
–Consistency checking
Speculative execution
–Branch prediction = optimistic synch!

HW Transactional Memory
[Figure: caches and memory on an interconnect; a read by an active transaction marks a cache entry T]

Transactional Memory
[Figure: caches and memory; reads by active transactions leave two entries marked T]

Transactional Memory
[Figure: caches and memory; two entries marked T; one transaction active, one committed]

Transactional Memory
[Figure: caches and memory; a write marks an entry D next to an entry marked T; one transaction active, one committed]

Rewind
[Figure: caches and memory; entries marked T and D; a write causes one transaction to abort while the other stays active]

Transaction Commit
At commit point
–If no cache conflicts, we win.
Mark transactional entries
–Read-only: valid
–Modified: dirty (eventually written back)
That's all, folks!
–Except for a few details …

Not all Skittles and Beer
Limits to
–Transactional cache size
–Scheduling quantum
Transaction cannot commit if it is
–Too big
–Too slow
–Actual limits platform-dependent

TM Design Issues
Implementation choices
Language design issues
Semantic issues

Granularity
Object
–Managed languages: Java, C#, …
–Easy to control interactions between transactional & non-transactional threads
Word
–C, C++, …
–Hard to control interactions between transactional & non-transactional threads

Direct/Deferred Update
Deferred
–Modify private copies & install on commit
–Commit requires work
–Consistency easier
Direct
–Modify in place, roll back on abort
–Makes commit efficient
–Consistency harder
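The bookkeeping difference between the two update policies can be sketched with a redo buffer versus an undo log. This is a single-threaded illustration only: the class names are hypothetical, memory is modeled as an `int[]`, and conflict detection and locking are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class UpdatePolicyDemo {
    // Deferred update: writes go to a private buffer and are installed at commit.
    static class DeferredTx {
        final int[] mem;
        final Map<Integer, Integer> redo = new HashMap<>();   // address -> new value
        DeferredTx(int[] mem) { this.mem = mem; }

        int read(int addr) { return redo.getOrDefault(addr, mem[addr]); }
        void write(int addr, int v) { redo.put(addr, v); }
        void commit() { redo.forEach((addr, v) -> mem[addr] = v); redo.clear(); }  // commit does the work
        void abort() { redo.clear(); }                                            // abort is free
    }

    // Direct update: writes go to memory in place; old values are logged for rollback.
    static class DirectTx {
        final int[] mem;
        final Deque<int[]> undo = new ArrayDeque<>();         // stack of {address, old value}
        DirectTx(int[] mem) { this.mem = mem; }

        int read(int addr) { return mem[addr]; }
        void write(int addr, int v) { undo.push(new int[] { addr, mem[addr] }); mem[addr] = v; }
        void commit() { undo.clear(); }                       // commit is cheap
        void abort() {                                        // roll back newest-first
            while (!undo.isEmpty()) {
                int[] entry = undo.pop();
                mem[entry[0]] = entry[1];
            }
        }
    }

    public static void main(String[] args) {
        int[] mem = { 10, 20 };
        DirectTx direct = new DirectTx(mem);
        direct.write(0, 11);
        direct.write(0, 12);
        direct.abort();
        DeferredTx deferred = new DeferredTx(mem);
        deferred.write(1, 21);
        deferred.commit();
        System.out.println(mem[0] + " " + mem[1]);
    }
}
```

In the direct version other threads could observe the in-place writes before an abort, which is one way to read the slide's "consistency harder".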

Conflict Detection
Eager
–Detect before conflict arises
–"Contention manager" module resolves
Lazy
–Detect on commit/abort
Mixed
–Eager write/write, lazy read/write …

Conflict Detection
Eager detection may abort a transaction that could have committed.
Lazy detection discards more computation.

Contention Management & Scheduling
How to resolve conflicts?
Who moves forward and who rolls back?

Contention Manager Strategies
Exponential backoff
Priority to
–Oldest?
–Most work?
–Non-waiting?
None dominates
Judgment of Solomon
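Of the strategies listed, exponential backoff is the simplest to sketch: after each conflict the transaction waits a random time drawn from a window that doubles up to a cap. The `Backoff` class, its method names, and the nanosecond bounds below are hypothetical choices for illustration, not taken from the slides.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;

class BackoffDemo {
    static class Backoff {
        final long minDelayNanos;
        final long maxDelayNanos;
        long limit;                                   // current upper bound of the random window

        Backoff(long minDelayNanos, long maxDelayNanos) {
            this.minDelayNanos = minDelayNanos;
            this.maxDelayNanos = maxDelayNanos;
            this.limit = minDelayNanos;
        }

        // Picks a delay in [1, limit], then doubles the window up to the cap.
        long nextDelayNanos() {
            long delay = ThreadLocalRandom.current().nextLong(limit) + 1;
            limit = Math.min(maxDelayNanos, 2 * limit);
            return delay;
        }

        // Called by a transaction that lost a conflict, before it retries.
        void pause() { LockSupport.parkNanos(nextDelayNanos()); }

        // Called after a successful commit so the next conflict starts small again.
        void reset() { limit = minDelayNanos; }
    }

    public static void main(String[] args) {
        Backoff backoff = new Backoff(1_000, 8_000);
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("attempt " + attempt + ": window up to " + backoff.limit + " ns");
            backoff.pause();
        }
    }
}
```

The randomization matters as much as the doubling: two transactions that conflict and wait exactly the same time would simply collide again.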

I/O & System Calls?
Some I/O revocable
–Provide transaction-safe libraries
–Undoable file system/DB calls
Some not
–Opening cash drawer
–Firing missile

I/O & System Calls
One solution: make transaction irrevocable
–If transaction tries I/O, switch to irrevocable mode.
There can be only one …
–Requires serial execution
No explicit aborts
–In irrevocable transactions

Exceptions
int i = 0;
try {
  atomic {
    i++;
    node = new Node();
  }
} catch (Exception e) {
  print(i);
}
Throws OutOfMemoryException!
What is printed?

Unhandled Exceptions
Aborts transaction
–Preserves invariants
–Safer
Commits transaction
–Like locking semantics
–What if exception object refers to values modified in transaction?

Nested Transactions
atomic void foo() {
  bar();
}
atomic void bar() {
  …
}

Nested Transactions
Needed for modularity
–Who knew that cosine() contained a transaction?
Flat nesting
–If child aborts, so does parent
First-class nesting
–If child aborts, partial rollback of child only

Open Nested Transactions
Normally, child commit
–Visible only to parent
In open nested transactions
–Commit visible to all
–Escape mechanism
–Dangerous, but useful
What escape mechanisms are needed?

Strong vs Weak Isolation
How do transactional & non-transactional threads synchronize?
Interaction with memory model?
Efficient algorithms?