Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit TexPoint fonts used in EMF. Read the TexPoint.

Slides:



Advertisements
Similar presentations
Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
Advertisements

Enabling Speculative Parallelization via Merge Semantics in STMs Kaushik Ravichandran Santosh Pande College.
Ch. 7 Process Synchronization (1/2) I Background F Producer - Consumer process :  Compiler, Assembler, Loader, · · · · · · F Bounded buffer.
Transactional Locking Nir Shavit Tel Aviv University (Joint work with Dave Dice and Ori Shalev)
Chapter 6: Process Synchronization
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Rich Transactions on Reasonable Hardware J. Eliot B. Moss Univ. of Massachusetts,
Pessimistic Software Lock-Elision Nir Shavit (Joint work with Yehuda Afek Alexander Matveev)
Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)
Transactional Memory: Architectural Support for Lock- Free Data Structures Herlihy & Moss Presented by Robert T. Bauer.
Transactional Memory Overview Olatunji Ruwase Fall 2007 Oct
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
The Future of Concurrency Theory Renaissance or Reformation? (Met dank aan Maurice Herlihy) Frits Vaandrager.
Nested Transactional Memory: Model and Preliminary Architecture Sketches J. Eliot B. Moss Antony L. Hosking.
Parallelism and Concurrency Koen Lindström Claessen Chalmers University Gothenburg, Sweden Ulf Norell.
Transactional Memory Yujia Jin. Lock and Problems Lock is commonly used with shared data Priority Inversion –Lower priority process hold a lock needed.
1 Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, “lazy” implementation.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 24: Transactional Memory Topics: transactional memory implementations.
CS510 Concurrent Systems Class 2 A Lock-Free Multiprocessor OS Kernel.
The Future of Distributed Computing Renaissance or Reformation? Maurice Herlihy Brown University.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
1 New Architectures Need New Languages A triumph of optimism over experience! Ian Watson 3 rd July 2009.
Department of Computer Science Presenters Dennis Gove Matthew Marzilli The ATOMO ∑ Transactional Programming Language.
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.
An Introduction to Software Transactional Memory
Art of Multiprocessor Programming 1 Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
CS510 Concurrent Systems Jonathan Walpole. A Lock-Free Multiprocessor OS Kernel.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
Transactional Memory The Art of Multiprocessor Programming Herlihy and Shavit.
A Qualitative Survey of Modern Software Transactional Memory Systems Virendra J. Marathe Michael L. Scott.
CS5204 – Operating Systems Transactional Memory Part 2: Software-Based Approaches.
Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
WG5: Applications & Performance Evaluation Pascal Felber
Kernel Locking Techniques by Robert Love presented by Scott Price.
December 1, 2006©2006 Craig Zilles1 Threads and Cache Coherence in Hardware  Previously, we introduced multi-cores. —Today we’ll look at issues related.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
How D can make concurrent programming a piece of cake Bartosz Milewski D Programming Language.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science Computer Systems Principles Synchronization Emery Berger and Mark Corner University.
CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution Tom Bergan Owen Anderson, Joe Devietti, Luis Ceze, Dan Grossman To appear.
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Lecture 27 Multiprocessor Scheduling. Last lecture: VMM Two old problems: CPU virtualization and memory virtualization I/O virtualization Today Issues.
Optimistic Design CDP 1. Guarded Methods Do something based on the fact that one or more objects have particular states Make a set of purchases assuming.
Parallel Data Structures. Story so far Wirth’s motto –Algorithm + Data structure = Program So far, we have studied –parallelism in regular and irregular.
4 November 2005 CS 838 Presentation 1 Nested Transactional Memory: Model and Preliminary Sketches J. Eliot B. Moss and Antony L. Hosking Presented by:
Commutativity and Coarse-Grained Transactions Maurice Herlihy Brown University Joint work with Eric Koskinen and Matthew Parkinson (POPL 10)
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Novel Paradigms of Parallel Programming Prof. Smruti R. Sarangi IIT Delhi.
Transactional Memory Companion slides for
Maurice Herlihy and J. Eliot B. Moss,  ISCA '93
Multiprocessor Programming
Software Perspectives on Transactional Memory
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Transactional Memory : Hardware Proposals Overview
Part 2: Software-Based Approaches
PHyTM: Persistent Hybrid Transactional Memory
Transactional Memory Companion slides for
Transactional Memory TexPoint fonts used in EMF.
The University of Adelaide, School of Computer Science
Changing thread semantics
Lecture 6: Transactions
Transactional Memory Companion slides for
Lecture 22: Consistency Models, TM
Does Hardware Transactional Memory Change Everything?
Hybrid Transactional Memory
Lecture 23: Transactional Memory
I, for one, Welcome our new Multicore Overlords …
Parallel Data Structures
Presentation transcript:

Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A AA A

2 Our Vision for the Future In this course, we covered …. Best practices … New and clever ideas … And common-sense observations. Art of Multiprocessor Programming

3 Our Vision for the Future In this course, we covered …. Best practices … New and clever ideas … And common-sense observations. Nevertheless … Concurrent programming is still too hard … Here we explore why this is …. And what we can do about it. Art of Multiprocessor Programming

4 Locking Art of Multiprocessor Programming

5 Coarse-Grained Locking Easily made correct … But not scalable. Art of Multiprocessor Programming

6 Fine-Grained Locking Can be tricky … Art of Multiprocessor Programming

7 Locks are not Robust If a thread holding a lock is delayed … No one else can make progress Art of Multiprocessor Programming

Locking Relies on Conventions Relation between –Lock bit and object bits –Exists only in programmer’s mind /* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */ Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul) Art of Multiprocessor Programming

9 Simple Problems are hard enq(x) enq(y) double-ended queue No interference if ends “far apart” Interference OK if queue is small Clean solution is publishable result: [Michael & Scott PODC 97] Clean solution is publishable result: [Michael & Scott PODC 97] Art of Multiprocessor Programming

10 Locks Not Composable Transfer item from one queue to another Must be atomic : No duplicate or missing items Must be atomic : No duplicate or missing items

Art of Multiprocessor Programming 11 Locks Not Composable Lock source Lock target Unlock source & target

Art of Multiprocessor Programming 12 Locks Not Composable Lock source Lock target Unlock source & target Methods cannot provide internal synchronization Objects must expose locking protocols to clients Clients must devise and follow protocols Abstraction broken!

13 Monitor Wait and Signal zzz Empty buffer Yes! Art of Multiprocessor Programming If buffer is empty, wait for item to show up If buffer is empty, wait for item to show up

14 Wait and Signal do not Compose empty zzz… Art of Multiprocessor Programming Wait for either?

Art of Multiprocessor Programming 15 The Transactional Manifesto Current practice inadequate –to meet the multicore challenge Research Agenda –Replace locking with a transactional API –Design languages or libraries –Implement efficient run-times

Art of Multiprocessor Programming 16 Transactions Block of code …. Atomic: appears to happen instantaneously Serializable: all appear to happen in one-at-a-time order Commit: takes effect (atomically) Abort: has no effect (typically restarted)

Art of Multiprocessor Programming 17 atomic { x.remove(3); y.add(3); } atomic { y = null; } Atomic Blocks

Art of Multiprocessor Programming 18 atomic { x.remove(3); y.add(3); } atomic { y = null; } Atomic Blocks No data race

Art of Multiprocessor Programming 19 public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } A Double-Ended Queue Write sequential Code

Art of Multiprocessor Programming 20 public void LeftEnq(item x) atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } A Double-Ended Queue

Art of Multiprocessor Programming 21 public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } A Double-Ended Queue Enclose in atomic block

Art of Multiprocessor Programming 22 Warning Not always this simple –Conditional waits –Enhanced concurrency –Complex patterns But often it is…

Art of Multiprocessor Programming 23 Composition?

Art of Multiprocessor Programming 24 Composition? public void Transfer(Queue q1, q2) { atomic { T x = q1.deq(); q2.enq(x); } Trivial or what?

Art of Multiprocessor Programming 25 public T LeftDeq() { atomic { if (this.left == null) retry; … } Conditional Waiting Roll back transaction and restart when something changes

Art of Multiprocessor Programming 26 Composable Conditional Waiting atomic { x = q1.deq(); } orElse { x = q2.deq(); } Run 1 st method. If it retries … Run 2 nd method. If it retries … Entire statement retries

Art of Multiprocessor Programming 27 Hardware Transactional Memory Exploit Cache coherence Already almost does it –Invalidation –Consistency checking Speculative execution –Branch prediction = optimistic synch!

Art of Multiprocessor Programming 28 HW Transactional Memory Interconnect caches memory read active T

Art of Multiprocessor Programming 29 Transactional Memory read active TT caches memory

Art of Multiprocessor Programming 30 Transactional Memory active TT committed caches memory

Art of Multiprocessor Programming 31 Transactional Memory write active committed TD caches memory

Art of Multiprocessor Programming 32 Rewind active TT write aborted D caches memory

Art of Multiprocessor Programming 33 Transaction Commit At commit point –If no cache conflicts, we win. Mark transactional entries –Read-only: valid –Modified: dirty (eventually written back) That’s all, folks! –Except for a few details …

Art of Multiprocessor Programming 34 Not all Skittles and Beer Limits to –Transactional cache size –Scheduling quantum Transaction cannot commit if it is –Too big –Too slow –Actual limits platform-dependent

HTM Strengths & Weaknesses Ideal for lock-free data structures

HTM Strengths & Weaknesses Ideal for lock-free data structures Practical proposals have limits on –Transaction size and length –Bounded HW resources –Guarantees vs best-effort

HTM Strengths & Weaknesses Ideal for lock-free data structures Practical proposals have limits on –Transaction size and length –Bounded HW resources –Guarantees vs best-effort On fail –Diagnostics essential –Retry in software?

Composition Locks don’t compose, transactions do. Composition necessary for Software Engineering. But practical HTM doesn’t really support composition! Why we need STM

Art of Multiprocessor Programming 39 Simple Lock-Based STM STMs come in different forms –Lock-based –Lock-free Here : a simple lock-based STM

Art of Multiprocessor Programming 40 Synchronization Transaction keeps –Read set: locations & values read –Write set: locations & values to be written Deferred update –Changes installed at commit Lazy conflict detection –Conflicts detected at commit

Art of Multiprocessor Programming 41 STM: Transactional Locking Map Application Memory V# Array of version #s & locks

Art of Multiprocessor Programming 42 Reading an Object Mem Locks V# Add version numbers & values to read set

Art of Multiprocessor Programming 43 To Write an Object Mem Locks V# Add version numbers & new values to write set

Art of Multiprocessor Programming 44 To Commit Mem Locks V# X Y V#+1 Acquire write locks Check version numbers unchanged Install new values Increment version numbers Unlock.

Problem: Internal Inconsistency A Zombie is an active transaction destined to abort. If Zombies see inconsistent states bad things can happen

Art of Multiprocessor Programming 46 Internal Consistency x y Invariant: x = 2y Transaction A: reads x = 4 Transaction B: writes 8 to x, 16 to y, aborts A ) Transaction B: writes 8 to x, 16 to y, aborts A ) Transaction A: (zombie) reads y = 4 computes 1/(x-y) Transaction A: (zombie) reads y = 4 computes 1/(x-y) Divide by zero FAIL!

Art of Multiprocessor Programming 47 Solution: The Global Clock Have one shared global clock Incremented by (small subset of) writing transactions Read by all transactions Used to validate that state worked on is always consistent

100 Art of Multiprocessor Programming 48 Read-Only Transactions Mem Locks Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock

100 Art of Multiprocessor Programming 49 Read-Only Transactions Mem Locks Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory

100 Art of Multiprocessor Programming 50 Read-Only Transactions Mem Locks Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory On Commit: check unlocked & version # unchanged On Commit: check unlocked & version # unchanged

Art of Multiprocessor Programming 51 Read-Only Transactions Mem Locks Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory On Commit: check unlocked & version # unchanged On Commit: check unlocked & version # unchanged Check that version #s less than local read clock 100

Art of Multiprocessor Programming 52 Read-Only Transactions Mem Locks Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory On Commit: check unlocked & version # unchanged On Commit: check unlocked & version # unchanged Check that version #s less than local read clock 100 We have taken a snapshot without keeping an explicit read set!

100 Art of Multiprocessor Programming 53 Regular Transactions Mem Locks Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock

100 Art of Multiprocessor Programming 54 Regular Transactions Mem Locks Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock On read/write, check: Unlocked & version # < RV Add to R/W set On read/write, check: Unlocked & version # < RV Add to R/W set

Art of Multiprocessor Programming 55 On Commit Mem Locks 100 Shared Version Clock Private Read Version (RV) Acquire write locks

Art of Multiprocessor Programming 56 On Commit Mem Locks 100 Shared Version Clock Private Read Version (RV) Acquire write locks Increment Version Clock

Art of Multiprocessor Programming 57 On Commit Mem Locks 100 Shared Version Clock Private Read Version (RV) Acquire write locks Increment Version Clock Check version numbers ≤ RV

Art of Multiprocessor Programming 58 On Commit Mem Locks 100 Shared Version Clock Private Read Version (RV) Acquire write locks Increment Version Clock Check version numbers ≤ RV Update memory x y

Art of Multiprocessor Programming 59 On Commit Mem Locks 100 Shared Version Clock Private Read Version (RV) Acquire write locks Increment Version Clock Check version numbers ≤ RV Update memory Update write version #s x y 100

Art of Multiprocessor Programming 60 TM Design Issues Implementation choices Language design issues Semantic issues

Art of Multiprocessor Programming 61 Granularity Object –managed languages, Java, C#, … –Easy to control interactions between transactional & non-trans threads Word –C, C++, … –Hard to control interactions between transactional & non-trans threads

Art of Multiprocessor Programming 62 Direct/Deferred Update Deferred –modify private copies & install on commit –Commit requires work –Consistency easier Direct –Modify in place, roll back on abort –Makes commit efficient –Consistency harder

Art of Multiprocessor Programming 63 Conflict Detection Eager –Detect before conflict arises –“Contention manager” module resolves Lazy –Detect on commit/abort Mixed –Eager write/write, lazy read/write …

Art of Multiprocessor Programming 64 Conflict Detection Eager detection may abort transactions that could have committed. Lazy detection discards more computation.

Art of Multiprocessor Programming 65 Contention Management & Scheduling How to resolve conflicts? Who moves forward and who rolls back? Lots of empirical work but formal work in infancy

Art of Multiprocessor Programming 66 Contention Manager Strategies Exponential backoff Priority to –Oldest? –Most work? –Non-waiting? None Dominates But needed anyway Judgment of Solomon

Art of Multiprocessor Programming 67 I/O & System Calls? Some I/O revocable –Provide transaction- safe libraries –Undoable file system/DB calls Some not –Opening cash drawer –Firing missile

Art of Multiprocessor Programming 68 I/O & System Calls One solution: make transaction irrevocable –If transaction tries I/O, switch to irrevocable mode. There can be only one … –Requires serial execution No explicit aborts –In irrevocable transactions

Art of Multiprocessor Programming 69 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); }

Art of Multiprocessor Programming 70 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException!

Art of Multiprocessor Programming 71 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException! What is printed?

Art of Multiprocessor Programming 72 Unhandled Exceptions Aborts transaction –Preserves invariants –Safer Commits transaction –Like locking semantics –What if exception object refers to values modified in transaction?

Art of Multiprocessor Programming 73 Nested Transactions atomic void foo() { bar(); } atomic void bar() { … } atomic void foo() { bar(); } atomic void bar() { … }

Art of Multiprocessor Programming 74 Nested Transactions Needed for modularity –Who knew that cosine() contained a transaction? Flat nesting –If child aborts, so does parent First-class nesting –If child aborts, partial rollback of child only

Remember 1993?

Citation Count

TM Today 93,300

2,210,000 Second Opinion

Hatin’ on TM STM is too inefficient

Hatin’ on TM Requires radical change in programming style

Hatin’ on TM Erlang-style shared nothing only true path to salvation

Hatin’ on TM There is nothing wrong with what we do today.

Gartner Hype Cycle Hat tip: Jeremy Kemp

Art of Multiprocessor Programming 84 Multicore forces us to rethink almost everything I, for one, Welcome our new Multicore Overlords …

Art of Multiprocessor Programming 85 Multicore forces us to rethink almost everything Standard approaches too complex I, for one, Welcome our new Multicore Overlords …

Art of Multiprocessor Programming 86 Multicore forces us to rethink almost everything Standard approaches won’t scale Transactions might make life simpler… I, for one, Welcome our new Multicore Overlords …

Art of Multiprocessor Programming 87 Multicore forces us to rethink almost everything Standard approaches won’t scale Transactions might … Multicore programming Plenty more to do… Maybe it will be you… I, for one, Welcome our new Multicore Overlords …

Thanks ! תודה Art of Multiprocessor Programming 88