Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multiprocessor Programming

Similar presentations


Presentation on theme: "Multiprocessor Programming"— Presentation transcript:

1 Multiprocessor Programming
Transactional Memory The Art of Multiprocessor Programming

2 Multicore Architetures
“Learn how the multi-core processor architecture plays a central role in Intel's platform approach. ….” “AMD is leading the industry to multi-core technology for the x86 based computing market …” “Sun's multicore strategy centers around multi-threaded software. ... “ Art of Multiprocessor Programming

3 Why do we care? Time no longer cures software bloat
When you double your path length You can’t just wait 6 months Your software must somehow exploit twice as much concurrency © 2006 Herlihy & Shavit

4 The Problem Cannot exploit cheap threads Today’s Software
Non-scalable methodologies Today’s Hardware Poor support for scalable synchronization © 2006 Herlihy & Shavit

5 Locking © 2006 Herlihy & Shavit

6 Coarse-Grained Locking
Easily made correct … But not scalable. © 2006 Herlihy & Shavit

7 Fine-Grained Locking Here comes trouble … © 2006 Herlihy & Shavit

8 Why Locking Doesn’t Scale
Not Robust Relies on conventions Hard to Use Conservative Deadlocks Lost wake-ups Not Composable © 2006 Herlihy & Shavit

9 No one else can make progress
Locks are not Robust If a thread holding a lock is delayed … No one else can make progress © 2006 Herlihy & Shavit

10 Why Locking Doesn’t Scale
Not Robust Relies on conventions Hard to Use Conservative Deadlocks Lost wake-ups Not Composable © 2006 Herlihy & Shavit

11 Locking Relies on Conventions
Relation between Lock bit and object bits Exists only in programmer’s mind Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul) /* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */ © 2006 Herlihy & Shavit

12 Why Locking Doesn’t Scale
Not Robust Relies on conventions Hard to Use Conservative Deadlocks Lost wake-ups Not Composable © 2006 Herlihy & Shavit

13 No interference if ends “far enough” apart
Sadistic Homework enq(x) enq(y) Double-ended queue No interference if ends “far enough” apart © 2006 Herlihy & Shavit

14 Interference OK if ends “close enough” together
Sadistic Homework enq(x) enq(y) Double-ended queue Interference OK if ends “close enough” together © 2006 Herlihy & Shavit

15 Make sure suspended dequeuers awake as needed
Sadistic Homework deq() deq() Double-ended queue Make sure suspended dequeuers awake as needed © 2006 Herlihy & Shavit

16 You Try It … One lock? Locks at each end? Waking blocked dequeuers?
Too Conservative Locks at each end? Deadlock, too complicated, etc Waking blocked dequeuers? Harder than it looks © 2006 Herlihy & Shavit

17 Actual Solution Clean solution would be a publishable result
[Michael & Scott, PODC 96] High performance fine-grained lock-based solutions are good for libraries… not general consumption by programmers © 2006 Herlihy & Shavit

18 Why Locking Doesn’t Scale
Not Robust Relies on conventions Hard to Use Conservative Deadlocks Lost wake-ups Not Composable © 2006 Herlihy & Shavit

19 Exposing lock internals breaks abstraction
Locks do not compose Hash Table lock T1 Must lock T1 before adding item add(T1, item) item Move from T1 to T2 lock T1 lock T2 delete(T1, item) add(T2, item) Must lock T2 before deleting from T1 item item Exposing lock internals breaks abstraction © 2006 Herlihy & Shavit

20 Monitor Wait and Signal
Empty zzz buffer Yes! If buffer is empty, wait for item to show up © 2006 Herlihy & Shavit

21 Wait and Signal do not Compose
empty Wait for either? empty zzz… © 2006 Herlihy & Shavit

22 The Transactional Manifesto
What we do now is inadequate to meet the multicore challenge Research Agenda Replace locking with a transactional API Design languages to support this model Implement the run-time to be fast enough © 2006 Herlihy & Shavit

23 Transactions Atomic Linearizable Commit: takes effect
Abort: effects rolled back Usually retried Linearizable Appear to happen in one-at-a-time order © 2006 Herlihy & Shavit

24 Sadistic Homework Revisited
Public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Write sequential Code (1) © 2006 Herlihy & Shavit

25 Sadistic Homework Revisited
Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } (1) © 2006 Herlihy & Shavit

26 Sadistic Homework Revisited
Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Enclose in atomic block (1) © 2006 Herlihy & Shavit

27 Warning Not always this simple But often it is Conditional waits
Enhanced concurrency Overlapping locks But often it is Works for sadistic homework © 2006 Herlihy & Shavit

28 Composition Trivial or what? Public void Transfer(Queue q1, q2) {
atomic { Item x = q1.deq(); q2.enq(x); } Trivial or what? (1) © 2006 Herlihy & Shavit

29 Wake-ups: lost and found
Public item LeftDeq() { atomic { if (this.left == null) retry; } Roll back transaction and restart when something changes (1) © 2006 Herlihy & Shavit

30 OrElse Composition Run 1st method. If it retries …
atomic { x = q1.deq(); } orElse { x = q2.deq(); } Run 1st method. If it retries … Run 2nd method. If it retries … Entire statement retries © 2006 Herlihy & Shavit

31 Transactional Memory [HerlihyMoss93]
© 2006 Herlihy & Shavit

32 Hardware Overview Exploit Cache coherence protocols
Already do almost what we need Invalidation Consistency checking Exploit Speculative execution Branch prediction = optimistic synch! Original TM proposal by Herlihy and Moss 1993… © 2006 Herlihy & Shavit

33 HW Transactional Memory
read active T caches Interconnect memory © 2006 Herlihy & Shavit

34 Transactional Memory read active active T T caches memory
© 2006 Herlihy & Shavit

35 Transactional Memory active active committed T T caches memory
© 2006 Herlihy & Shavit

36 Transactional Memory write active committed T D caches memory
© 2006 Herlihy & Shavit

37 Rewind write aborted active active T T D caches memory
© 2006 Herlihy & Shavit

38 Transaction Commit At commit point Mark transactional entries
If no cache conflicts, we win. Mark transactional entries Read-only: valid Modified: dirty (eventually written back) That’s all, folks! Except for a few details … © 2006 Herlihy & Shavit

39 Not all Skittles and Beer
Limits to Transactional cache size Scheduling quantum Transaction cannot commit if it is Too big Too slow Actual limits platform-dependent © 2006 Herlihy & Shavit

40 Some Approaches Trap to software if hardware fails
“Hybrid” approach Use locks if speculation fails Lock elision “Virtualize” transactional memory VTM, UTM, etc… © 2006 Herlihy & Shavit

41 Hardware versus Software
Do we need hardware at all? Like virtual memory, probably need HW for performance Do we need software? Policy issues don’t make sense for hardware © 2006 Herlihy & Shavit

42 Transactional Locking
Build STM based on locks Atomic{} simple as course grained locking and it composes Under the covers (transparent to user) we deploy fine grained locks © 2006 Herlihy & Shavit

43 Locking STM Design Choices
Map Array of Versioned- Write-Locks Application Memory V# PS = Lock per Stripe (separate array of locks) PO = Lock per Object (embedded in object) © 2006 Herlihy & Shavit

44 Lock-based STM Mem Locks V# 0 V#+1 0 X Y V# 0 V# 0 V# 0
To Read: load lock + location Location in write-set? (Bloom Filter) Check unlocked add to Read-Set To Write: add value to write set Acquire Locks Validate read/write v#’s unchanged Release each lock with v#+1 X V# V# V# V# V#+1 0 V# Y V# V# V# V# V#+1 0 V#+1 0 V# V# V# V# V# 0 V# © 2006 Herlihy & Shavit

45 Transactional Consistency
Invariant x = 2y 8 4 4 X Transaction A: Write x Write y 2 Y Transaction B: Read x Read y Compute z = 1/(x-y) = 1/2 Application Memory

46 Transactional Inconsistency
Invariant x = 2y 4 8 4 X Transaction B: Read x = 4 2 Y Transaction A: Write x Write y Zombie transactiosn are ones that are doomed to fail (here trans will fail validatuon at the end) but go on Computing in a way that can cause problems Transaction B: Read y = 4 {trans is zombie} Compute z = 1/(x-y) Application Memory DIV by 0 ERROR

47 Solution: The “Global Clock”
Have one shared global clock Incremented by (small subset of) writing transactions Read by all transactions Used to validate that state worked on is always consistent

48 Solution: “Version Clock”
Have one shared global version clock Incremented by (small subset of) writing transactions Read by all transactions Used to validate that state worked on is always consistent Later: how we learned not to worry about contention and love the clock © 2006 Herlihy & Shavit

49 Version Clock: Read-OnlyTrans
Mem Locks 100 VClock V# RV  VClock On Read: read lock, read mem, read lock: check unlocked, unchanged, and v# <= RV Commit. V# Reads form a snapshot of memory. No read set! V# 100 RV © 2006 Herlihy & Shavit

50 Version Clock 100 120 100 121 VClock Mem Locks 87 0 121 0 88 0 V# 0
V# RV  VClock On Read/Write: check unlocked and v# <= RV then add to Read/Write-Set Acquire Locks WV = F&I(VClock) Validate each v# <= RV Release locks with v#  WV X X Y Y V# Reads+Inc+Writes =Linearizable 100 RV Commit © 2006 Herlihy & Shavit

51 Performance Benchmarks
Mechanically Transformed Sequential Red-Black Tree using TL2 Compare to STMs and hand-crafted fine-grained Red-Black implementation On a 16–way Sun Fire™ running Solaris™ 10 © 2006 Herlihy & Shavit

52 Uncontended Large Red-Black Tree
Hand-crafted 5% Delete 5% Update 90% Lookup TL/PO TL2/P0 Ennals TL/PS TL2/PS FarserHarris Lock-free © 2006 Herlihy & Shavit

53 Uncontended Small RB-Tree
5% Delete 5% Update 90% Lookup TL/P0 TL2/P0 © 2006 Herlihy & Shavit

54 Contended Small RB-Tree
TL/P0 30% Delete 30% Update 40% Lookup TL2/P0 Ennals © 2006 Herlihy & Shavit

55 What Next? On multicores like Niagara Global Clock not a big issue.
But there is a lot of TM research work ahead Do you want to help…? © 2006 Herlihy & Shavit


Download ppt "Multiprocessor Programming"

Similar presentations


Ads by Google