Download presentation
Presentation is loading. Please wait.
1
Transactional Memory Companion slides for
The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Art of Multiprocessor Programming 1
2
Shared Data Structures
Fine grained parallelism has huge performance benefit The reason we get only 2.9 speedup Coarse Grained Fine Grained c 25% Shared 25% Shared c c c The hard part about parallelizing applications is taking care of the fraction that is has to do with communication among threads, usually through shared objects… 75% Unshared 75% Unshared
3
A FIFO Queue a b c d Head Tail Dequeue() => a Enqueue(d)
So lets look at what we have learned about parallelizing such objects…the game is making things more fine grained… Dequeue() => a Enqueue(d)
4
A Concurrent FIFO Queue
Simple Code, easy to prove correct b c d Tail Head a P: Dequeue() => a Q: Enqueue(d) Object lock Don’t undersell simplicity… Contention and sequential bottleneck
5
Fine Grain Locks Finer Granularity, More Complex Code a b c d
Head Tail a b c d P: Dequeue() => a Q: Enqueue(d) Verification nightmare: worry about deadlock, livelock…
6
Fine Grain Locks Complex boundary cases: empty queue, last item
Tail Head a Head Tail a b c d P: Dequeue() => a Q: Enqueue(b) Worry how to acquire multiple locks
7
Moreover: Locking Relies on Conventions
Relation between Lock bit and object bits Exists only in programmer’s mind Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul) /* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */ Art of Multiprocessor Programming 7
8
Lock-Free (JDK 1.5+) Even Finer Granularity, Even More Complex Code a
Head Tail a b c d We can go even one step further…we have seen how to design lock free data structures…with finer granularity P: Dequeue() => a Q: Enqueue(d) Worry about starvation, subtle bugs, hardness to modify…
9
Composing Objects More than twice the worry…
Complex: Move data atomically between structures b c d Tail Head a P: Dequeue(Q1,a) c d a Tail Head b Enqueue(Q2,a) More than twice the worry…
10
Transactional Memory Great Performance, Simple Code
Head Tail a b c d P: Dequeue() => a Q: Enqueue(d) Don’t worry about deadlock, livelock, subtle bugs, etc…
11
Promise of Transactional Memory
Don’t worry which locks need to cover which variables when… b Tail Head a Head Tail a b c d P: Dequeue() => a Q: Enqueue(d) TM deals with boundary cases under the hood
12
Composing Objects Provide Composability…
Will be easy to modify multiple structures atomically b c d Tail Head a P: Dequeue(Q1,a) c d a Tail Head b Enqueue(Q2,a) Provide Composability…
13
The Transactional Manifesto
Current practice inadequate to meet the multicore challenge Research Agenda Replace locking with a transactional API Design languages to support this model Implement the run-time to be fast enough Art of Multiprocessor Programming 13
14
Art of Multiprocessor Programming
Transactions Atomic Commit: takes effect Abort: effects rolled back Usually retried Serizalizable Appear to happen in one-at-a-time order Art of Multiprocessor Programming 14
15
Art of Multiprocessor Programming
Atomic Blocks atomic { x.remove(3); y.add(3); } atomic { y = null; } Art of Multiprocessor Programming 15
16
Art of Multiprocessor Programming
Atomic Blocks atomic { x.remove(3); y.add(3); } atomic { y = null; } No data race Art of Multiprocessor Programming 16 16
17
Art of Multiprocessor Programming
Designing a FIFO Queue Public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Write sequential Code Art of Multiprocessor Programming 17 17
18
Art of Multiprocessor Programming
Designing a FIFO Queue Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Art of Multiprocessor Programming 18
19
Enclose in atomic block
Designing a FIFO Queue Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Enclose in atomic block Art of Multiprocessor Programming 19
20
Art of Multiprocessor Programming
Warning Not always this simple Conditional waits Enhanced concurrency Complex patterns But often it is Art of Multiprocessor Programming 20
21
Art of Multiprocessor Programming
Composition Public void Transfer(Queue<T> q1, q2) { atomic { T x = q1.deq(); q2.enq(x); } Trivial or what? Art of Multiprocessor Programming 21 21
22
Roll back transaction and restart when something changes
Public T LeftDeq() { atomic { if (this.left == null) retry; … } Roll back transaction and restart when something changes Art of Multiprocessor Programming 22
23
OrElse Composition Run 1st method. If it retries …
atomic { x = q1.deq(); } orElse { x = q2.deq(); } Run 1st method. If it retries … Run 2nd method. If it retries … Entire statement retries Art of Multiprocessor Programming 23
24
Art of Multiprocessor Programming
Transactional Memory Software transactional memory (STM) Hardware transactional memory (HTM) Hybrid transactional memory (HyTM, try in hardware and default to software if unsuccessful) Art of Multiprocessor Programming 24
25
Hardware versus Software
Do we need hardware at all? Analogies: Virtual memory: yes! Garbage collection: no! Probably do need HW for performance Do we need software? Policy issues don’t make sense for hardware Art of Multiprocessor Programming 25
26
Transactional Consistency
Memory Transactions are collections of reads and writes executed atomically Tranactions should maintain internal and external consistency External: with respect to the interleavings of other transactions. Internal: the transaction itself should operate on a consistent state. When we design STMs, we need to worry about both external and internal consistency
27
Art of Multiprocessor Programming
External Consistency Invariant x = 2y 8 4 4 x Transaction A: Write x Write y 2 y Transaction B: Read x Read y Compute z = 1/(x-y) Application Memory Art of Multiprocessor Programming
28
Art of Multiprocessor Programming
Simple Lock-Based STM STMs come in different forms Lock-based Lock-free Here we will describe a simple lock-based STM Lets see how we design an STM to provide external consistency… Art of Multiprocessor Programming
29
Art of Multiprocessor Programming
Synchronization Transaction keeps Read set: locations & values read Write set: locations & values to be written Deferred update Changes installed at commit Lazy conflict detection Conflicts detected at commit Art of Multiprocessor Programming
30
STM: Transactional Locking
Map V# Array of Versioned- Write-Locks Application Memory V# V# Art of Multiprocessor Programming 30
31
Art of Multiprocessor Programming
Reading an Object Mem Locks V# Check not locked Put V#s & value in RS V# V# V# V# Art of Multiprocessor Programming 31
32
Art of Multiprocessor Programming
To Write an Object Mem Locks V# Add V# and new value to WS V# V# V# V# Art of Multiprocessor Programming 32
33
Art of Multiprocessor Programming
To Commit Mem Locks Acquire W locks Check V#s unchanged In RS only Install new values Increment V#s Release … V# X V# V#+1 V# V# Y V#+1 V# Art of Multiprocessor Programming 33
34
Problem: Internal Inconsistency
A Zombie is a currently active transaction that is destined to abort because it saw an inconsistent state If Zombies that see inconsistent states are allowed to have irreversible impact on execution state then errors can occur Eventual abort does not save us
35
Art of Multiprocessor Programming
Internal Consistency Invariant x \neq 0 && x = 2y 4 8 4 x Transaction B: Read x = 4 2 y Transaction A: Write x (kills B) Write y Zombie transactiosn are ones that are doomed to fail (here trans will fail validatuon at the end) but go on Computing in a way that can cause problems Transaction B: (zombie) Read y = 4 Compute z = 1/(x-y) Application Memory DIV by 0 ERROR Art of Multiprocessor Programming
36
Solution: The “Global Clock”
Have one shared global clock Incremented by (small subset of) writing transactions Read by all transactions Used to validate that state worked on is always consistent Art of Multiprocessor Programming
37
Read-Only Transactions
Mem Locks Copy V Clock to RV Read lock,V# Read mem Check unlocked (1) Recheck V# unchanged (2) (1)+(2)v# and mem content consistent Check V# < RV 12 Reads from a snapshot of memory. No read set! 32 56 100 Shared Version Clock 100 19 17 Private Read Version (RV) Art of Multiprocessor Programming 37
38
Regular Transactions – during trans. prev. to commit
Mem Locks Copy V Clock to RV On read/write, check: Unlocked (acquire) V# < RV Add to R/W set 12 32 56 19 69 Shared Version Clock 69 17 Private Read Version (RV) Art of Multiprocessor Programming 38
39
Regular Transactions- Commit
Mem Locks Acquire locks (write set only) WV = Fetch&Inc(V Clock) For all read set check unlock and revalidate V# < RV Update write set Set write write set V#s to WV Release locks 12 x 100 32 56 19 101 100 69 y 100 17 Private Read Version (RV) Shared Version Clock Art of Multiprocessor Programming 39
40
Some explanations to lock-based implementation of
regular transactions: why do we need to revalidate reads? When two transactions have their read and write sets intersected, but both succeed to read before the write of the other transaction occurs, then there is no way to serialize them (see example below by Nir Shavit). Hence the need to revalidate that the read set *after* locking the write set. Also, upon commit of transaction A, *after* A already took the locks on the write set, and after or while the read set revalidated, another transaction cannot succeed to read from A's write set before A writes it (because it is locked). Detailed example: Take two transactions T1 and T2. Lets say that there are 2 memory locations initialized to 0. Lets say that both transactions read both locations, and T1 writes 1 to location 1 if it saw all 0's and T2 writes 1 to location 2 if it saw all 0's. Now if they both do not revalidate the read locations this means that T1 does not revalidate location 2 after acquiring the lock and T2 does not revalidate location 1 after grabbing the lock. So if they both run, both read both locations, both see all 0's in a snapshot, then both grab locks on their respective write locations, revalidate their own write locations, and write the 1 value with a timestamp greater by 1. Since they only revalidated their write locations after locking, neither saw that the other thread changed the location they only read to a 1 with a larger timestamp. Now we have a memory with two 1's in it even though there is no such serializable execution. Seeing a snapshot before grabbing the locks in the commit is thus not sufficient and the algorithm must have the transactions each revalidate the read set locations after acquiring the locks. Art of Multiprocessor Programming
41
Hardware Transactional Memory
Exploit Cache coherence Already almost does it Invalidation Consistency checking Speculative execution Branch prediction = optimistic synch! Original TM proposal by Herlihy and Moss 1993… Art of Multiprocessor Programming 41
42
HW Transactional Memory
read active T caches Interconnect memory Art of Multiprocessor Programming 42
43
Art of Multiprocessor Programming
Transactional Memory read active active T T caches memory Art of Multiprocessor Programming 43
44
Art of Multiprocessor Programming
Transactional Memory active committed active T T caches memory Art of Multiprocessor Programming 44
45
Art of Multiprocessor Programming
Transactional Memory write committed active T D caches memory Art of Multiprocessor Programming 45
46
Art of Multiprocessor Programming
Rewind write aborted active active T T D caches memory Art of Multiprocessor Programming 46
47
Art of Multiprocessor Programming
Transaction Commit At commit point If no cache conflicts, we win. Mark transactional entries Read-only: valid Modified: dirty (eventually written back) That’s all, folks! Except for a few details … Art of Multiprocessor Programming 47
48
Not all Skittles and Beer
Limits to Transactional cache size Scheduling quantum Transaction cannot commit if it is Too big Too slow Actual limits platform-dependent Art of Multiprocessor Programming 48
49
Art of Multiprocessor Programming
TM Design Issues Implementation choices Language design issues Semantic issues Art of Multiprocessor Programming
50
Art of Multiprocessor Programming
Granularity Object managed languages, Java, C#, … Easy to control interactions between transactional & non-trans threads Word C, C++, … Hard to control interactions between transactional & non-trans threads Art of Multiprocessor Programming
51
Direct/Deferred Update
modify private copies & install on commit Commit requires work Consistency easier Direct Modify in place, roll back on abort Makes commit efficient Consistency harder Art of Multiprocessor Programming
52
Art of Multiprocessor Programming
Conflict Detection Eager Detect before conflict arises “Contention manager” module resolves Lazy Detect on commit/abort Mixed Eager write/write, lazy read/write … Art of Multiprocessor Programming
53
Art of Multiprocessor Programming
Conflict Detection Eager detection may abort transaction that could have committed. Lazy detection discards more computation. Art of Multiprocessor Programming
54
Contention Management & Scheduling
How to resolve conflicts? Who moves forward and who rolls back? Art of Multiprocessor Programming
55
Contention Manager Strategies
Exponential backoff Priority to Oldest? Most work? Non-waiting? None Dominates Judgment of Solomon Art of Multiprocessor Programming
56
Art of Multiprocessor Programming
I/O & System Calls? Some I/O revocable Provide transaction-safe libraries Undoable file system/DB calls Some not Opening cash drawer Firing missile Art of Multiprocessor Programming
57
Art of Multiprocessor Programming
I/O & System Calls One solution: make transaction irrevocable If transaction tries I/O, switch to irrevocable mode. There can be only one … Requires serial execution No explicit aborts In irrevocable transactions Art of Multiprocessor Programming
58
Art of Multiprocessor Programming
Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); Art of Multiprocessor Programming
59
Exceptions Throws OutOfMemoryException! int i = 0; try { atomic { i++;
node = new Node(); } } catch (Exception e) { print(i); Art of Multiprocessor Programming
60
Exceptions Throws OutOfMemoryException! int i = 0; try { atomic { i++;
node = new Node(); } } catch (Exception e) { print(i); What is printed? Art of Multiprocessor Programming
61
Art of Multiprocessor Programming
Unhandled Exceptions Aborts transaction Preserves invariants Safer Commits transaction Like locking semantics What if exception object refers to values modified in transaction? Art of Multiprocessor Programming
62
Art of Multiprocessor Programming
Nested Transactions atomic void foo() { bar(); } atomic void bar() { … Art of Multiprocessor Programming
63
Art of Multiprocessor Programming
Nested Transactions Needed for modularity Who knew that cosine() contained a transaction? Flat nesting If child aborts, so does parent First-class nesting If child aborts, partial rollback of child only Art of Multiprocessor Programming
64
Open Nested Transactions
Normally, child commit Visible only to parent In open nested transactions Commit visible to all Escape mechanism Dangerous, but useful What escape mechanisms are needed? Art of Multiprocessor Programming
65
Strong vs Weak Isolation
How do transactional & non-transactional threads synchronize? Interaction with memory-model? Efficient algorithms? Art of Multiprocessor Programming
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.