The Complexity of Transactional Memory & What to Do About It Hagit Attiya Technion & EPFL.

The Challenge of Concurrent Programming: A multi-core revolution is underway. Exploit the power of concurrent computing by restructuring applications. Writing concurrent applications is harder than sequential programming.

Transactional Memory (TM): A way to deal with the difficulty of writing concurrent applications. In its simplest form, just wrap code in begin-transaction / end-transaction. TM synchronizes memory accesses so that each transaction seems to execute sequentially and in isolation.
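To make the begin / end wrapper concrete, here is a minimal sketch in Python. It is not how a real TM works internally: a single global lock (the hypothetical `_tm_lock`) stands in for the TM runtime, so transactions never run concurrently. The names `atomic`, `transfer`, and `run` are assumptions made for this illustration. It only shows the programming model, where the wrapped block appears to run sequentially and in isolation.

```python
import threading
from contextlib import contextmanager

# Assumption: one global lock stands in for the whole TM runtime.
_tm_lock = threading.Lock()

@contextmanager
def atomic():
    """Entering the block plays begin-transaction, leaving it plays end-transaction."""
    with _tm_lock:
        yield

def transfer(accounts, amount):
    # Both updates take effect together, as seen by other threads using atomic().
    with atomic():
        accounts["a"] -= amount
        accounts["b"] += amount

def run(n_threads=4, transfers_each=100):
    """Run concurrent transfers on a fresh pair of accounts and return them."""
    accounts = {"a": 1000, "b": 0}

    def worker():
        for _ in range(transfers_each):
            transfer(accounts, 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return accounts
```

A real TM would let non-conflicting transactions proceed in parallel instead of serializing everything behind one lock.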

A Brief History of TM: TM was originally suggested as a hardware platform [Herlihy and Moss 1993]. Software transactional memory (STM) was essentially optimized multi-word synchronization (static) [Shavit & Touitou 1995]. Popularization in the programming languages & architecture communities [Rajwar 2002]. First made dynamic only with a weaker liveness condition (obstruction-freedom) [Herlihy, Luchangco, Moir and Scherer 2003].

The Promise: TM will track memory accesses and will allow transactions to proceed concurrently if they do not conflict. Optimism vs. pessimism: begin-transaction / end-transaction vs. lock (entry) / unlock (exit).

2-3 Levels of Abstraction: Transactions, each a sequence of operations accessing data items, by a single thread. Operations –on data items, e.g., Read and Write –TryCommit / TryAbort. Data set = Read set ∪ Write set. Primitives on base objects (load, store, CAS).
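The three levels can be sketched as follows. This is an illustrative toy, not any published STM: `TVar`, `Transaction`, `atomically`, and the single `_commit_lock` are hypothetical names, and the lock stands in for the primitives (load, store, CAS) that a real implementation would apply to base objects. Reads record the version they observed, writes are buffered, and `try_commit` validates the read set before publishing the write set. The sketch is not opaque: a transaction that is doomed to abort may observe inconsistent values before it does.

```python
import threading

class TVar:
    """A data item, represented by two base objects: a value and a version."""
    def __init__(self, value):
        self.value = value
        self.version = 0

# Assumption: one lock makes multi-word loads and commits atomic in this toy.
_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.read_set = {}    # TVar -> version observed at first read
        self.write_set = {}   # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.write_set:          # read-your-own-writes
            return self.write_set[tvar]
        with _commit_lock:                  # load value and version together
            value, version = tvar.value, tvar.version
        self.read_set.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.write_set[tvar] = value        # deferred until commit

    def data_set(self):
        return set(self.read_set) | set(self.write_set)

    def try_commit(self):
        with _commit_lock:
            # Validate: abort if any item read has changed since it was read.
            for tvar, seen in self.read_set.items():
                if tvar.version != seen:
                    return False
            for tvar, value in self.write_set.items():
                tvar.value = value
                tvar.version += 1
            return True

def atomically(body):
    """Optimistic execution: rerun body with a fresh transaction until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.try_commit():
            return result
```

For example, `atomically(lambda tx: tx.write(counter, tx.read(counter) + 1))` increments a shared `TVar` and retries whenever a concurrent commit invalidates the read.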

More Modeling: Data representation for transactions and data items using base objects. Algorithms for operations, applying primitives to base objects –load, store, CAS, DCAS. Asynchronous processes invoke these procedures, leading to interleaved executions, in the standard sense.

Safety: Serializability: transactions appear to execute sequentially. Strict serializability: preserves the order of non-overlapping transactions [Papadimitriou 1979]. Opacity: even transactions that later abort are (strictly) serializable [Guerraoui, Kapalka POPL 2008] –Also support for operations other than read and write. Snapshot isolation. [Figure: relationship between serializability, strict serializability, opacity and snapshot isolation]
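The difference between serializability and strict serializability can be checked by brute force on tiny histories. The sketch below is an assumption-laden illustration: the history format (a dict per committed transaction with `reads`, `writes`, `start`, and `end`) and all function names are invented here, and trying every permutation is only practical for a handful of transactions.

```python
from itertools import permutations

def explains(order, txs, initial):
    """True if running txs serially in this order reproduces every observed read."""
    state = dict(initial)
    for name in order:
        tx = txs[name]
        if any(state[item] != value for item, value in tx["reads"].items()):
            return False
        state.update(tx["writes"])
    return True

def respects_real_time(order, txs):
    """True if a transaction that ended before another started is ordered first."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[a] < pos[b]
               for a in txs for b in txs
               if txs[a]["end"] < txs[b]["start"])

def serializable(txs, initial):
    return any(explains(order, txs, initial) for order in permutations(txs))

def strictly_serializable(txs, initial):
    return any(respects_real_time(order, txs) and explains(order, txs, initial)
               for order in permutations(txs))

# T2 starts after T1 has ended, yet still reads the old value of x.
stale_read = {
    "T1": {"reads": {}, "writes": {"x": 1}, "start": 0, "end": 1},
    "T2": {"reads": {"x": 0}, "writes": {}, "start": 2, "end": 3},
}

# Write skew: both read the same snapshot and write different items.
# Snapshot isolation admits this history; no serial order explains it.
write_skew = {
    "T1": {"reads": {"x": 0, "y": 0}, "writes": {"x": 1}, "start": 0, "end": 2},
    "T2": {"reads": {"x": 0, "y": 0}, "writes": {"y": 1}, "start": 1, "end": 3},
}
```

With `x` initially 0, `stale_read` is serializable (order T2, T1) but not strictly serializable, while `write_skew` is not serializable at all.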

The Many Faces of Progress: TM may abort transactions in case of conflicts. Without a progress requirement, this could admit trivial implementations, so several progress properties are considered. When locking is not allowed: wait-freedom, obstruction-freedom.

Progress for Lock-Based TM: Better performance with locks [Dice, Shalev, Shavit DISC 2006]. Weakly progressive: a transaction aborts only if it has conflicts [Guerraoui, Kapalka POPL 2009]. Strongly progressive: at least one of the transactions involved in the conflict commits. Minimally progressive: a transaction commits if it runs alone, with no pending transactions. Multi-version permissive: only an update transaction that conflicts with another update transaction aborts [Perelman, Fan, Keidar PODC 2010], so read-only transactions always commit.

[Figure: relationship between the progress properties: minimally progressive, weakly progressive, obstruction-free, multi-version permissive, strongly progressive, wait-free]

The Consensus Number of TM: Minimally progressive TMs solve consensus for at most two processes [Guerraoui, Kapalka SPAA 2008], so their consensus number is 2. Holds for obstruction-free and weakly progressive TMs. Key step: equivalence with a consensus object that fails in a very clean manner (propose returns decide(v) or fail) [A, Guerraoui, Hendler, Kuznetsov PODC 2006].

Invisible Reads: Optimize read-only transactions, which, in principle, need not modify the shared memory. Invisible reads: read operations do not store, so read-only transactions do not store at all. Semi-visible reads: read operations store some information, but not very detailed, e.g., [Dice, Matveev, Shavit Transact 2010]. Oblivious STM [A & Hillel DISC 2010].

Step Complexity Lower Bound [Guerraoui, Kapalka PPoPP 2008]: A read operation has Ω( | read set | ) step complexity in an STM that is –single version –with invisible reads –weakly progressive
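A common way in which single-version STMs with invisible reads pay this cost is incremental validation: since a reader leaves no trace in shared memory, each new read re-checks every item read so far. The sketch below only illustrates that cost pattern and is not a proof of the bound. `Item`, `InvisibleReadTx`, `Aborted`, and `read_costs` are hypothetical names, and a "step" here means one load of a base object.

```python
class Aborted(Exception):
    """Raised when validation finds that an earlier read is stale."""

class Item:
    def __init__(self, value):
        self.value = value
        self.version = 0

class InvisibleReadTx:
    def __init__(self):
        self.read_set = []   # (item, version observed); private to the reader
        self.steps = 0       # loads of base objects performed so far

    def read(self, item):
        # Re-validate every earlier read: one load per item in the read set.
        for earlier, seen in self.read_set:
            self.steps += 1
            if earlier.version != seen:
                raise Aborted()
        self.steps += 1      # the load of the requested item itself
        self.read_set.append((item, item.version))
        return item.value

def read_costs(k):
    """Steps taken by each of k consecutive reads of distinct items."""
    tx = InvisibleReadTx()
    costs = []
    for item in [Item(i) for i in range(k)]:
        before = tx.steps
        tx.read(item)
        costs.append(tx.steps - before)
    return costs
```

The i-th read costs i steps, so a transaction that reads k items performs k(k+1)/2 loads in total.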

Predicting TM Scalability: Unrelated transactions progress independently even if they are concurrent. Represent relations between transactions by a conflict graph: –Vertices represent transactions, –Edges connect transactions that share a data item. Example: T1{A,B,C}, T2{A,D}, T3{D,E}, T4{F,L}, T5{L}, T6{J}. Disjoint access transactions are not connected in the graph. Strictly disjoint access transactions are not adjacent. [Figure: conflict graph over T1 to T6]
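The conflict graph for the example data sets can be built directly. In this sketch the function names are assumptions, and all six transactions are treated as concurrent, which ignores the execution intervals that a full definition would take into account.

```python
from itertools import combinations

def conflict_graph(data_sets):
    """Adjacency sets: an edge joins two transactions that share a data item."""
    adj = {t: set() for t in data_sets}
    for a, b in combinations(data_sets, 2):
        if data_sets[a] & data_sets[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected(adj, src, dst):
    """Depth-first search for a path from src to dst."""
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def strictly_disjoint_access(adj, a, b):
    return b not in adj[a]             # not adjacent

def disjoint_access(adj, a, b):
    return not connected(adj, a, b)    # no path at all

data_sets = {
    "T1": {"A", "B", "C"}, "T2": {"A", "D"}, "T3": {"D", "E"},
    "T4": {"F", "L"}, "T5": {"L"}, "T6": {"J"},
}
graph = conflict_graph(data_sets)
```

Here T1 and T3 share no data item, so they are strictly disjoint access, but they are connected through T2 and therefore not disjoint access. T1 and T4 lie in different components and are disjoint access.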

Disjoint Access Parallelism: A TM is DAP if two transactions concurrently contend on the same base object only if they are not disjoint-access (roughly following [Israeli and Rappoport PODC 1995]). Contend: both access the same base object, and at least one access is a store. Similar definition for strict DAP.

Achieving Disjoint-Access Parallelism: There is no obstruction-free and strict DAP STM [Guerraoui, Kapalka 2008]. But there is an obstruction-free and DAP STM [Herlihy, Luchangco, Moir and Scherer 2003]. Not if read-only transactions are invisible and always succeed to commit [A, Hillel, Milani SPAA 2009].

Achieving DAP [A, Hillel, Milani SPAA 2009]: A read-only transaction has Ω( | read set | ) stores when the STM is –MV-permissive (read-only transactions commit) –DAP. Holds for strict serializability and opacity, and also for serializability and snapshot isolation (under a slightly stronger notion of DAP).

Privatization: Apply loads and stores to the underlying data (uninstrumented access). Avoids transactional overhead [Spear, Marathe, Dalessandro, Scott 2007] [Shpeisman, Menon, Adl-Tabatabai, Balensiefer, Grossman, Hudson, Moore, Saha 2007].

Cost of Privatization: Uninstrumented access cannot be achieved without prior privatization [Guerraoui, Henzinger, Kapalka, Singh SPAA 2010] [A, Hillel DISC 2010]. Must invoke a privatizing transaction or a privatizing barrier [Dice, Matveev, Shavit Transact 2010]. Unless parallelism is reduced or detailed information is kept, privatization cost is linear in the number of privatized items [A, Hillel DISC 2010].

And a few more results…

So, In Theory: TM cannot efficiently provide clean semantics: either weaken the consistency semantics or compromise the progress guarantees. Limited scalability & significant cost. TM is not an expressive programming idiom.

But In Practice, We Are Fine, No? Not really… Worst-case lower bounds are not for corner cases –likely to happen in practice –hard to program around them. Implementation-focused research seems to be hitting the same wall [Cascaval, Blundell, Michael, Cain, Wu, Chiras, Chatterjee 2008]. Design choices compromise either simplicity –Elastic STM [Felber, Gramoli, Guerraoui, DISC 2009] or scalability –Single-lock STMs [Olszewski, Cutler, Steffan] [Dalessandro, Spear, Scott].

A Post-TM Era: TM cannot make programs run correctly and efficiently without the programmer’s awareness. Stop hiding the realities of concurrency. Expose a cleaner model of a multi-core that does not hide tradeoffs. Provide additional methodologies and tools. Multitude of approaches –I will discuss two

Approach I: Optimizing Coarse-Grain Programming: For applications with a moderate amount of contention (say <32 threads), the overhead of managing the memory can outweigh the synchronization cost. Access the data mostly “in exclusion”. Combining: the thread winning the lock carries out many of the pending operations [Hendler, Incze, Shavit, Tzafrir SPAA 2010]. Without locking: optimize the memory utilization of Herlihy's universal construction [Chuong, Ellen, Ramachandran SPAA 2010].
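The combining idea can be sketched as follows. This is a simplified illustration inspired by the description above, not the published algorithm: `CombiningCounter` and its fields are hypothetical names, and the publication list is protected by a small second lock for clarity, whereas a real design would use per-thread publication records.

```python
import threading

class CombiningCounter:
    """Counter whose lock holder applies every published request in one pass."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()      # the coarse-grain lock on the data
        self._publish = threading.Lock()   # assumption: guards the pending list only
        self._pending = []                 # published requests not yet applied

    def add(self, amount):
        request = {"amount": amount, "result": None, "done": threading.Event()}
        with self._publish:
            self._pending.append(request)
        while not request["done"].is_set():
            if self._lock.acquire(blocking=False):
                try:
                    self._combine()        # we won the lock: serve everyone
                finally:
                    self._lock.release()
            else:
                # Someone else is combining; it may serve our request for us.
                request["done"].wait(timeout=0.001)
        return request["result"]

    def _combine(self):
        with self._publish:
            batch, self._pending = self._pending, []
        for request in batch:
            self.value += request["amount"]
            request["result"] = self.value
            request["done"].set()
```

Each call to `add` returns the counter value right after its own update, even when a different thread applied it. The data itself is touched by only one thread at a time.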

Approach II: Programming with Mini-Transactions: Extension of DCAS or kCAS (for small k’s) or a multi-location variant of LL/SC [PowerPC, DEC Alpha] –Short –Works on a small, static data set –Simple functionality –No I/O, out-of-core memory accesses, etc. May fail spuriously.
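A kCAS-style mini-transaction can be emulated in a few lines to show how code is written against it. This is a sketch under stated assumptions: `Cell`, `kcas`, `transfer`, and the lock `_hw` are hypothetical, and the lock merely emulates the atomicity that hardware would provide. The optional `spurious_failure` hook models the fact that a mini-transaction may fail for reasons other than a mismatch.

```python
import threading

class Cell:
    """One memory location."""
    def __init__(self, value):
        self.value = value

# Assumption: this lock emulates hardware atomicity; it is not part of the algorithm.
_hw = threading.Lock()

def kcas(cells, expected, new, spurious_failure=lambda: False):
    """Atomically set cells[i] to new[i] if every cells[i] equals expected[i]."""
    if spurious_failure():
        return False                      # failed without touching memory
    with _hw:
        if any(c.value != e for c, e in zip(cells, expected)):
            return False
        for c, n in zip(cells, new):
            c.value = n
        return True

def transfer(src, dst, amount):
    """Move amount between two cells with a 2-location mini-transaction."""
    while True:
        s, d = src.value, dst.value       # plain loads; kcas re-checks both
        if kcas([src, dst], [s, d], [s - amount, d + amount]):
            return
```

The data set is small and static (two cells), the body is short, and the caller simply retries on failure.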

Mini-Transactions: Lower bounds use large, dynamic data sets; long transactions; data accessed w/ arbitrary operations and unrestricted calculations. Mini-transactions use small, static data sets; short transactions; simple functionality, e.g., arithmetic, comparison, and memory access.

Mini-Transactions & HTM: Mini-transactions are almost provided by recent hardware TM proposals –AMD Advanced Synchronization Facility [2009] –Sun [Chaudhry, Cypher, Ekman, Karlsson, Landin, Yip, Zeffer, and Tremblay Micro 2009]. Best-effort: transactions can be aborted for reasons other than conflicts –TLB misses, interrupts, certain function-call sequences, division instructions

Algorithmic Challenges: Mini-transactions provide a significant handle on the difficult task of writing concurrent applications –DCAS is already a big help [A, Hillel, 2006, 2009] –Experience with hardware TM support [Dice, Lev, Marathe, Moir, Olszewski, Nussbaum SPAA 2010] [Carouge, Spear, DISC 2010]. Design algorithms accommodating the best-effort nature of mini-transactions. Avoid sure killers. Work around the small data sets –amorphous data parallelism [Pingali, Kulkarni, Nguyen, Burtscher, Mendez-Lojo, Prountzos, Sui, Zhong 2009]
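One familiar pattern for accommodating best-effort behaviour is to retry the mini-transaction a bounded number of times and then fall back to a lock-protected slow path. The sketch below is an assumption, not a method taken from the cited work: `mini_tx`, `fallback_word`, and `transfer` are hypothetical names, and the lock `_atomic` only emulates hardware atomicity. The fast path includes the fallback word in its compare set, so it cannot commit while a slow path is running. The single-location acquire of the fallback word is assumed never to fail spuriously.

```python
import threading

class Cell:
    def __init__(self, value):
        self.value = value

# Assumption: emulates hardware atomicity of a mini-transaction.
_atomic = threading.Lock()

def mini_tx(cells, expected, new, may_fail=lambda: False):
    """Emulated best-effort multi-location compare-and-swap."""
    if may_fail():                        # spurious abort (interrupt, TLB miss, ...)
        return False
    with _atomic:
        if any(c.value != e for c, e in zip(cells, expected)):
            return False
        for c, n in zip(cells, new):
            c.value = n
        return True

FREE, HELD = 0, 1
fallback_word = Cell(FREE)

def transfer(src, dst, amount, may_fail=lambda: False, max_attempts=3):
    """Return 'fast' or 'fallback' depending on the path that did the update."""
    for _ in range(max_attempts):
        s, d = src.value, dst.value
        # Fast path: succeeds only if no slow path holds the fallback word.
        if mini_tx([fallback_word, src, dst], [FREE, s, d],
                   [FREE, s - amount, d + amount], may_fail):
            return "fast"
    # Slow path: take the fallback word, then use plain loads and stores.
    while not mini_tx([fallback_word], [FREE], [HELD]):
        pass
    try:
        src.value -= amount
        dst.value += amount
        return "fallback"
    finally:
        fallback_word.value = FREE
```

Passing `may_fail=lambda: True` forces every fast-path attempt to abort, which exercises the fallback deterministically.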

Programming Support: Creating patterns for employing mini-transactions, hopefully encapsulated within programming language support. Cleanly combine with native (uninstrumented) access to the locations accessed by mini-transactions –Beware of privatization scenarios

Summary: Facilitate the design of efficient and correct concurrent applications in the post-TM era –Capitalize on lessons learned and wide interest in TM –Multitude of approaches. Specifically, develop a model, algorithms and programming patterns for best-effort mini-transactions.

Thank you!
