Lowering the Overhead of Software Transactional Memory
Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul Acharya, David Eisenstat, William N. Scherer III, Michael L. Scott

Presentation transcript:

Lowering the Overhead of Software Transactional Memory
Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul Acharya, David Eisenstat, William N. Scherer III, Michael L. Scott
Featuring: RSTM, a low-overhead STM library for C++
Presented by: Yosef Etigin

What is this paper about?
The design and implementation of RSTM, a fast STM library for multi-threaded C++ programs. RSTM's main features:
- Cache-optimized metadata organization.
- No memory allocation at runtime, except for cloning objects.
- A contention manager that tunes performance.
- Selectable strategies: eager or lazy acquire, visible or invisible readers.

Where does RSTM fit in?
RSTM requires only atomic load/store and CAS in hardware. It provides a C++ “smart pointer” API that can be used to safely access shared data within transactions. The layers, from top to bottom:
  User application
  RSTM library: beginTx { openRO, openRW } endTx
  Hardware: atomic load & store, CAS

Overview
- RSTM theory: transaction semantics, readers, writers
- RSTM design: descriptor, data object, shared object handle
- RSTM implementation: resolving the data object, open for read-only, acquire, open for read-write, commit, abort
- Performance results
- Conclusion

Transaction Semantics
Data is managed at object granularity. Objects are shadowed rather than changed in place. Inside a transaction, objects may be opened for read-only or for read-write access; objects opened for read-write are cloned, while those opened read-only are not. “Commit” tries to make the clone the current object; “Abort” tries to keep the original as the current object. Transactions may abort each other, but they consult the contention manager (CM) before doing so.
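The shadowing idea can be illustrated with a small single-threaded model. The names `Counter` and `ShadowedHandle` are hypothetical, and `std::shared_ptr` is used for memory management only for brevity; this is not RSTM's API, and it leaves out concurrency, conflict detection, and the contention manager.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload type; any copyable object behaves the same way.
struct Counter {
    int value;
};

// Sketch of object shadowing: writers never modify the current version in
// place. They edit a private clone that is installed on commit or dropped on abort.
class ShadowedHandle {
public:
    explicit ShadowedHandle(int initial)
        : current_(std::make_shared<Counter>(Counter{initial})) {}

    // Read-only open: no copy is made.
    std::shared_ptr<const Counter> openRO() const { return current_; }

    // Read-write open: clone the current version once, then reuse that clone.
    std::shared_ptr<Counter> openRW() {
        if (!clone_) clone_ = std::make_shared<Counter>(*current_);
        return clone_;
    }

    // Commit: the clone becomes the current version.
    void commit() {
        if (clone_) current_ = clone_;
        clone_.reset();
    }

    // Abort: discard the clone; the original was never touched.
    void abort() { clone_.reset(); }

private:
    std::shared_ptr<Counter> current_;
    std::shared_ptr<Counter> clone_;
};
```

Because the writer only ever touches its private clone, aborting amounts to dropping the clone.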

Readers
A thread that opens an object for reading may be either a “visible” or an “invisible” reader, where “visible” means visible to writers. A reader must always have a consistent view of the objects it has opened: “consistent” means that no writer has made a change that the reader sees in only some of its opened objects. Inconsistency can cause hardware exceptions and infinite loops, so:
- An invisible reader must, on every “open”, validate all previously opened objects (O(n^2) cost for n opened objects).
- A visible reader must be explicitly aborted by a writer that acquires an object it has opened.
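The O(n^2) cost follows from incremental validation: the k-th open re-checks the k-1 objects opened before it, so n opens perform 0 + 1 + ... + (n-1) = n(n-1)/2 checks. The sketch below counts those checks. For illustration only, it assumes each object carries a version number that changes whenever a writer commits a new version; RSTM itself detects such changes through the handle's header word. `VersionedObject` and `InvisibleReader` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object with a version counter standing in for "header changed".
struct VersionedObject {
    unsigned version = 0;
};

// Sketch of an invisible reader that validates incrementally on every open.
class InvisibleReader {
public:
    // Re-check every previously opened object, then record the new one.
    // Returns false if a previously read object has changed (must abort).
    bool open(const VersionedObject &obj) {
        for (const auto &entry : reads_) {
            ++checks_;
            if (entry.obj->version != entry.seen) return false;
        }
        reads_.push_back({&obj, obj.version});
        return true;
    }

    // Total number of per-object validation checks performed so far.
    std::size_t checks() const { return checks_; }

private:
    struct Entry {
        const VersionedObject *obj;
        unsigned seen;  // version observed at open time
    };
    std::vector<Entry> reads_;
    std::size_t checks_ = 0;
};
```

Opening four objects costs 0 + 1 + 2 + 3 = 6 checks, and a change to any earlier object is caught on the next open.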

Writers
Opening an object for writing involves “acquiring” it, that is, getting exclusive access to the object. Writers conflict with other writers and with visible readers; visible readers can coexist with each other. Acquiring can be done eagerly or lazily:
- Eager: acquire an object as soon as it is opened.
- Lazy: acquire it just before the transaction commits.
Eager acquire aborts doomed transactions immediately, but causes more conflicts. Lazy acquire lets readers run alongside a writer that is not yet committing, but it has the same consistency issue as invisible reads.
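The only difference between eager and lazy acquire is when ownership is taken, and therefore when a write-write conflict becomes visible. In this hypothetical single-threaded model, `owner == -1` means unowned, conflicts simply fail instead of going to a contention manager, and the data itself is not modeled. `SharedObject` and `Writer` are illustrative names, not RSTM types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical shared object: owner == -1 means nobody has acquired it.
struct SharedObject {
    int owner = -1;
};

class Writer {
public:
    Writer(int id, bool eager) : id_(id), eager_(eager) {}

    // Eager: try to acquire now, so a conflict is detected at open time.
    // Lazy: just remember the object; acquisition is deferred to commit.
    bool openRW(SharedObject &obj) {
        writes_.push_back(&obj);
        return eager_ ? acquire(obj) : true;
    }

    // Commit acquires anything not yet owned, then releases everything.
    bool commit() {
        bool ok = true;
        for (SharedObject *obj : writes_) {
            if (!acquire(*obj)) { ok = false; break; }
        }
        release();
        return ok;
    }

private:
    bool acquire(SharedObject &obj) {
        if (obj.owner != -1 && obj.owner != id_) return false;  // conflict
        obj.owner = id_;
        return true;
    }

    // Release only objects this writer actually owns.
    void release() {
        for (SharedObject *obj : writes_)
            if (obj->owner == id_) obj->owner = -1;
    }

    int id_;
    bool eager_;
    std::vector<SharedObject *> writes_;
};
```

Here a lazy writer opens the object first, yet the conflict only surfaces when it tries to commit after an eager writer has taken ownership.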

Contention Management
The contention manager (CM) is a thread-local object that is notified of transaction events and decides what to do on a conflict:
- whether to abort a transaction or spin-wait;
- which transaction to abort, if any.
For instance, the “Polka” CM prefers writers over readers.
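A common shape for a contention manager's decision is a priority comparison combined with exponential backoff. The class below is a hypothetical sketch of that idea: priority is the number of objects opened, and every failed attempt counts toward overtaking the enemy. It is not RSTM's Polka implementation and does not model Polka's preference for writers; it is single-threaded for clarity.

```cpp
#include <cassert>
#include <cstdint>

enum class Decision { AbortEnemy, Backoff };

// Hypothetical per-thread, priority-based contention manager (not Polka itself).
class PriorityCM {
public:
    void onOpen() { ++priority_; }                     // more work, higher priority
    void onCommit() { priority_ = 0; attempts_ = 0; }  // reset for the next transaction

    // Decide what to do when this transaction conflicts with `enemy`.
    Decision handleConflict(const PriorityCM &enemy) {
        // Each failed attempt counts toward overtaking the enemy's priority.
        if (priority_ + attempts_ >= enemy.priority_) {
            attempts_ = 0;
            return Decision::AbortEnemy;
        }
        ++attempts_;
        return Decision::Backoff;
    }

    // Exponential backoff: the wait doubles after each failed attempt (capped).
    std::uint64_t backoffNanos() const {
        unsigned shift = attempts_ < 20 ? attempts_ : 20;
        return std::uint64_t{1000} << shift;
    }

private:
    unsigned priority_ = 0;
    unsigned attempts_ = 0;
};
```

With priority 1 against an enemy of priority 3, the sketch backs off twice (waiting longer each time) and then aborts the enemy.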


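RSTM Design
[Figure: a shared object handle whose header points to the new data object (a clone) and whose visible-readers word identifies the reader descriptors of Threads 2 and 3; the new data object's owner field points to Thread 1's descriptor (the writer), and its next field points to the old data object.]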

Descriptor
Each thread has a static descriptor that is used for all of that thread's transactions; nested transactions are not supported. The descriptor holds:
- Status: ACTIVE, COMMITTED, or ABORTED.
- Lists of opened objects: visible reads, invisible reads, eager writes, lazy writes.
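Because a transaction commits or aborts with a single CAS on its descriptor's status, exactly one of the two outcomes can win. A minimal sketch with `std::atomic`; the class name and the `beginNext()` method, which resets the reused per-thread descriptor for its next transaction, are hypothetical, and the header cleanup RSTM performs before reuse is omitted.

```cpp
#include <atomic>
#include <cassert>

enum class Status { Active, Committed, Aborted };

// Sketch of a reusable per-thread descriptor's status word.
class DescriptorStatus {
public:
    bool tryCommit() { return transition(Status::Committed); }
    bool tryAbort() { return transition(Status::Aborted); }
    Status load() const { return status_.load(); }

    // Hypothetical: reset the static descriptor for the thread's next transaction.
    void beginNext() { status_.store(Status::Active); }

private:
    // Only an Active transaction can change state, and only once.
    bool transition(Status to) {
        Status expected = Status::Active;
        return status_.compare_exchange_strong(expected, to);
    }

    std::atomic<Status> status_{Status::Active};
};
```

If another transaction aborts this one first, the subsequent commit CAS fails, and vice versa.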

Data Object
In addition to its data fields, each shared object holds “owner” and “next” fields. Owner is the descriptor of the current writer, if any. Next points to the original object, if this object is a writer-made clone.

Shared Object Handle (1)
A handle encapsulates a reference to a shared object. Global variables are handles rather than pointers; direct pointers are obtained within a transaction via “open”. The handle holds:
- a “header” word that identifies the current version of the object;
- a “visible readers” word, a bitmap of the visible readers.

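Shared Object Handle (2)
The header is a single word that holds a pointer and a dirty bit. This takes advantage of address alignment: because data objects are aligned, the lowest bit of the pointer is always zero and can hold the flag. The pointer refers to some data object, “pObj”. The dirty bit tells whether pObj is a clean object or a writer-made clone. In the common case of non-conflicting access the bit is clear, so pObj is the current version and the owner's descriptor need not be consulted, which saves one dereference.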
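Packing the pointer and the dirty bit into one word can be sketched as follows. The helper names are hypothetical; the sketch relies on the pointed-to type being at least 2-byte aligned (checked by the `static_assert`) and on round-tripping pointers through `std::uintptr_t`.

```cpp
#include <cassert>
#include <cstdint>

struct Object {
    int payload;
};

// Bit 0 of a valid Object* is always 0, so it can carry the dirty flag.
static_assert(alignof(Object) >= 2, "need a free low bit in Object pointers");

// Build a header word from a pointer and a dirty flag.
inline std::uintptr_t makeHeader(Object *p, bool dirty) {
    return reinterpret_cast<std::uintptr_t>(p) | (dirty ? 1u : 0u);
}

// Recover the pointer by masking out the flag bit.
inline Object *headerPtr(std::uintptr_t h) {
    return reinterpret_cast<Object *>(h & ~static_cast<std::uintptr_t>(1));
}

// True if the header refers to a writer-made clone.
inline bool headerDirty(std::uintptr_t h) { return (h & 1u) != 0; }
```

Since both fields live in one word, a single CAS on the header can switch the current version and its clean/dirty state at once.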

Shared Object Handle (3)
“Visible readers” is a bitmap: bit i is set if thread i is a visible reader of the object. This allows getting all readers, or adding a reader, in a single atomic operation. It also limits the number of visible readers; all other readers must be invisible.
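The visible-readers bitmap can be sketched with a 64-bit `std::atomic`. Here `fetch_or` and `fetch_and` add and remove a reader in one atomic read-modify-write, in place of the CAS retry loops used in the pseudocode; the class name and the 64-reader limit are assumptions of this sketch. `readersToAbort` shows how a writer can take one snapshot of the bitmap and list the readers it must abort, skipping itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-object visible-readers bitmap (one bit per thread id).
class VisibleReaders {
public:
    static constexpr unsigned kMaxReaders = 64;

    // Returns false if the id does not fit; that thread must read invisibly.
    bool add(unsigned id) {
        if (id >= kMaxReaders) return false;
        bits_.fetch_or(std::uint64_t{1} << id);
        return true;
    }

    void remove(unsigned id) {
        if (id < kMaxReaders) bits_.fetch_and(~(std::uint64_t{1} << id));
    }

    // One atomic snapshot, then every set bit except the writer's own.
    std::vector<unsigned> readersToAbort(unsigned writerId) const {
        std::uint64_t snapshot = bits_.load();
        std::vector<unsigned> ids;
        for (unsigned i = 0; i < kMaxReaders; ++i)
            if (((snapshot >> i) & 1u) && i != writerId) ids.push_back(i);
        return ids;
    }

private:
    std::atomic<std::uint64_t> bits_{0};
};
```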


RSTM Implementation
This section provides pseudocode for the most important STM operations:
- open an object for read-only access
- open an object for read-write access
- commit
- abort
The pseudocode is written as methods of the Descriptor class, the object that implements RSTM's functionality.

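Resolving the Data Object

    // Returns the up-to-date data object associated with a handle.
    // If the object has an active owner, consult the contention manager
    // and return NULL so that the caller retries.
    Object *Descriptor::resolve(Handle *shared) {
        intptr_t snapshot = shared->header;
        Object *ptr = (Object *)(snapshot & ~(intptr_t)1);  // mask out the dirty bit (LSB)
        if (snapshot & 1) {                  // dirty: ptr is a writer-made clone
            switch (ptr->owner->m_status) {
              case ACTIVE:                   // the writer is still running: conflict
                m_cm.handleConflict(this, ptr->owner);
                return NULL;
              case COMMITTED:                // the clone is the current version
                return ptr;
              case ABORTED:                  // the clone is dead; use the original
                return ptr->next;
            }
        }
        return ptr;                          // clean: ptr is the current version
    }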
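A self-contained model of resolve() makes it possible to exercise its clean case and its three dirty cases directly. The names `TxDescriptor`, `Object`, `Handle`, and `resolveModel` are hypothetical; where the pseudocode calls the contention manager on an ACTIVE owner, this sketch simply returns nullptr so the caller can retry.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum class Status { Active, Committed, Aborted };

struct TxDescriptor {
    std::atomic<Status> status{Status::Active};
};

struct Object {
    int value;
    TxDescriptor *owner;  // writer that made this clone (meaningful when dirty)
    Object *next;         // version this clone was copied from
};

struct Handle {
    std::atomic<std::uintptr_t> header{0};  // Object* with dirty bit in bit 0
};

// Model of resolve(): the current version, or nullptr if the owner is active.
inline Object *resolveModel(const Handle &h) {
    std::uintptr_t snapshot = h.header.load();
    Object *ptr = reinterpret_cast<Object *>(snapshot & ~std::uintptr_t{1});
    if ((snapshot & 1u) == 0) return ptr;  // clean: no need to inspect the owner
    switch (ptr->owner->status.load()) {
        case Status::Active:    return nullptr;    // conflict: caller retries
        case Status::Committed: return ptr;        // the clone is the new version
        case Status::Aborted:   return ptr->next;  // fall back to the original
    }
    return nullptr;
}
```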

Open for Read-Only

    // Open an object for read-only access
    Object *Descriptor::openRO(Handle *shared) {
        intptr_t headerSnapshot = shared->header;
        // find the current data object, retrying while an active writer owns it
        Object *ptr;
        do { ptr = resolve(shared); } while (!ptr);
        if (m_isVisible) {
            m_visibleReads.add(shared);
            // install this transaction as a visible reader of the object
            // (re-read the bitmap on every retry so the CAS compares a single snapshot)
            unsigned long r;
            do { r = shared->readers; }
            while (!CAS(&shared->readers, r, r | (1UL << m_id)));
            // make sure no writer acquired the object before it could see the CAS above
            if (headerSnapshot != shared->header) abort();
        } else {
            m_invisibleReads.add(shared);
        }
        validate();   // check that previously opened objects are still current
        return ptr;
    }

Open for Read-Write

    // Open an object for read-write access
    Object *Descriptor::openRW(Handle *shared) {
        // find the current data object
        Object *ptr;
        do { ptr = resolve(shared); } while (!ptr);
        // make a private, writable clone
        Object *clone = ptr->clone();
        clone->owner = this;
        clone->next = ptr;
        // eager writers acquire now; lazy writers acquire at commit time
        if (m_isEager) {
            acquire(shared, clone);
            m_eagerWrites.add(shared, clone);
        } else {
            m_lazyWrites.add(shared, clone);
        }
        validate();
        return clone;
    }

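Acquire

    // Acquire the object: replace the header with a dirty reference to our clone
    void Descriptor::acquire(Handle *shared, Object *clone) {
        intptr_t expected = shared->header;
        // abort unless the current version is still the one we cloned
        // and the header has not changed since we read it
        if (resolve(shared) != clone->next ||
            !CAS(&shared->header, expected, (intptr_t)clone | 1))
            abort();
        // abort all other visible readers of this object
        unsigned long readers = shared->readers;
        for (int i = 0; i < (int)(sizeof(readers) * 8); ++i) {
            if ((readers & (1UL << i)) && i != m_id)
                allDescriptors[i]->abort();
        }
        // record this object for cleanup at commit or abort time
        m_acquiredObjects.add(shared, clone);
    }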

Commit

    // Commit a transaction
    void Descriptor::onCommit() {
        // acquire the objects that were opened lazily for read-write
        acquireLazyWrites();
        // then make sure every object we read is still current
        validate();
        // linearization point: if this CAS succeeds, our clones (if any)
        // become the current versions of their objects
        CAS(&m_status, ACTIVE, COMMITTED);
        if (m_status == COMMITTED) {
            // replace each dirty reference to our clone with a clean one
            for (auto &[shared, clone] : m_acquiredObjects)
                CAS(&shared->header, (intptr_t)clone | 1, (intptr_t)clone);
            // remove this thread from the readers bitmap of visibly read objects
            for (Handle *shared : m_visibleReads) {
                unsigned long r;
                do { r = shared->readers; }
                while (!CAS(&shared->readers, r, r & ~(1UL << m_id)));
            }
        } else {
            abort();
        }
    }

Abort

    // Called when an "Aborted" exception is caught
    void Descriptor::onAbort() {
        // after this CAS, our clones (if any) are discarded
        CAS(&m_status, ACTIVE, ABORTED);
        // clean up acquired objects: replace each dirty reference to our clone
        // with a clean reference to the original object
        for (auto &[shared, clone] : m_acquiredObjects)
            CAS(&shared->header, (intptr_t)clone | 1, (intptr_t)clone->next);
        // remove this thread from the readers bitmap of all visibly opened objects
        for (Handle *shared : m_visibleReads) {
            unsigned long r;
            do { r = shared->readers; }
            while (!CAS(&shared->readers, r, r & ~(1UL << m_id)));
        }
    }


Performance Results (1)
ASTM is compared against RSTM (previous work showed that ASTM outperforms DSTM and OSTM).
- Platform: 16-processor SunFire 6800 at 1.2 GHz.
- Several benchmarks, each run in different configurations: visible or invisible readers, eager or lazy writers.
- Each benchmark was run for 10 seconds with 1 to 28 threads.
- Contention manager: “Polka”.
- Metric: number of successful transactions.

Performance Results (2)
RSTM with invisible readers performs about twice as well as ASTM. Visible readers are expensive because every transaction reads the root node, and registering as a visible reader writes to that object's shared metadata, causing cache invalidations. The only difference between the C++ version of ASTM and RSTM is the metadata organization.

Performance Results (3)
In LinkedList, fine-grained locking (FGL) performs badly when the number of threads exceeds the number of CPUs, due to preemption. Also in LinkedList, ASTM outperforms RSTM, because each writer invalidates objects for many readers. HashTable allows high concurrency, so RSTM performs well there (about 3 times faster than ASTM).

Performance Results (4)
In RandomGraph and LFUCache, all STMs perform worse than coarse-grained locking (CGL), because these data structures do not allow much concurrency. Nevertheless, RSTM beats ASTM.

Conclusion
RSTM has a novel metadata organization that reduces overhead, due to:
- one level of indirection instead of the usual two;
- static instead of dynamic data structures.
RSTM provides a variety of conflict-detection policies, so it can be tuned for a given workload. Compared to ASTM, RSTM performs better thanks to its metadata organization.