Transactional Memory: Architectural Support for Lock-Free Data Structures, Herlihy & Moss. Presented by Robert T. Bauer.

Maurice Herlihy (DEC), J. Eliot B. Moss (UMass)
Maurice Herlihy and J. Eliot B. Moss,  ISCA '93

Problem
Software implementations of lock-free data structures (those that do not use locks) do not perform as well as lock-based implementations.
– Qualification: lock-based implementations can suffer from:
  – Priority inversion
  – Convoying
  – Deadlock
  – Contention and synchronization (memory barriers)
– In the absence of these, lock-based implementations can outperform lock-free approaches.

“Solution”
If software is the problem, perhaps the solution is hardware. In this case the solution tendered is transactional memory:
– Modify the cache-coherence protocol
– Provide new instructions
– Goal: make lock-free approaches as efficient and easy to use as conventional lock-based approaches

Results
Demonstrate that transactional memory can be more efficient than:
– Test-and-Test-and-Set (TTS)
– MCS (software queueing: instead of spinning, wait on a queue)
– LL/SC
– Hardware queueing (uses cache lines to maintain the “list”)
Important: the reported results were obtained from a simulator.

About the simulator
– 32 processors
– Regular cache: direct-mapped, 2048 eight-byte lines
– Transaction cache: 64 eight-byte lines
– Simulator based on Proteus, which does not capture the effects of instruction or data caches
– Simulation assumptions:
  – Cache access (regular or transaction) = 1 cycle
  – Single-cycle commit (is this realistic?)

Cache / Memory Bus Cycles
– Read (cache line access: shared)
– Read For Ownership (RFO), a private read (cache line access: exclusive)
– Write (cache line access: exclusive)
– T_Read
– T_RFO
– Busy: abort and retry
Note: an RFO is usually issued by a compiler. It reads a cache line and gains ownership of the line in anticipation of a subsequent write.

Transaction Operations: General
Transaction operations cache two entries:
– XCommit (discard on commit) [old value]
– XAbort (discard on abort) [new value]
Transaction commits:
– XCommit → Empty (contains no data)
– XAbort → Normal (contains committed data)
Transaction aborts:
– XCommit → Normal
– XAbort → Empty
Allocating a new entry:
– Search for an Empty entry
– Search for a Normal entry; if dirty, it needs to be “evicted”
– Search for an XCommit entry (error in paper: this can never be “dirty”, but might be invalid)
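The commit and abort tag transitions listed above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware mechanism: the `Tag` enum, the `retag` helper, and the `(tag, addr, value)` tuple layout are assumptions made for the example.

```python
from enum import Enum


class Tag(Enum):
    EMPTY = "Empty"
    NORMAL = "Normal"
    XCOMMIT = "XCommit"   # holds the old value, discarded on commit
    XABORT = "XAbort"     # holds the new value, discarded on abort


# Tag transitions as listed on the slide.
ON_COMMIT = {Tag.XCOMMIT: Tag.EMPTY, Tag.XABORT: Tag.NORMAL}
ON_ABORT = {Tag.XCOMMIT: Tag.NORMAL, Tag.XABORT: Tag.EMPTY}


def retag(entries, transitions):
    """Return new (tag, addr, value) entries; Empty entries carry no data."""
    result = []
    for tag, addr, value in entries:
        new_tag = transitions.get(tag, tag)  # Normal and Empty are unchanged
        if new_tag is Tag.EMPTY:
            result.append((new_tag, None, None))
        else:
            result.append((new_tag, addr, value))
    return result


# One transactional write to address 0x10: old value 5, tentative value 6.
entries = [(Tag.XCOMMIT, 0x10, 5), (Tag.XABORT, 0x10, 6)]
committed = retag(entries, ON_COMMIT)  # new value survives as Normal
aborted = retag(entries, ON_ABORT)     # old value survives as Normal
```

Keeping both the old and the new value in the cache is what lets either outcome be resolved by retagging alone, without copying data.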

Transaction Operations: LT
LT operation:
– If an XAbort entry exists in the transaction cache, return its value
– If a Normal entry exists in the transaction cache:
  – Change Normal to XAbort
  – Allocate a second entry with the same data, tagged XCommit
– Otherwise issue a T_Read cycle:
  – Create a transaction cache entry tagged XCommit
  – Create a transaction cache entry tagged XAbort
– If the read returns Busy (the cache line is being updated):
  – Drop all XAbort entries, set all XCommit → Normal, TStatus = False

Transaction Operations: ST
ST operation:
– Cache hit: the XAbort entry is updated
– Cache miss: set up two cache lines as before
  – XCommit
  – XAbort
– Use T_RFO and set the cache line state to reserved (“Exclusive”), so that T_Read and T_RFO from other processors will return “Busy”
– As before, if the read cycle (T_RFO) returns “Busy”, we abort the transaction
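The LT and ST steps above can be modelled with a small software sketch. It assumes a hypothetical `TxCache` class, a plain dictionary standing in for memory, and a `busy_addrs` set standing in for lines on which the bus would answer Busy. Empty entries and eviction are not modelled, and on Busy the sketch returns `None` from `lt` as an arbitrary value.

```python
class TxCache:
    """Toy model of a transaction cache; entries are [tag, addr, value]."""

    def __init__(self, memory, busy_addrs=()):
        self.memory = memory          # backing store: addr -> value
        self.busy = set(busy_addrs)   # lines for which the bus answers Busy
        self.entries = []
        self.tstatus = True           # False once the transaction has aborted

    def _find(self, tag, addr):
        for entry in self.entries:
            if entry[0] == tag and entry[1] == addr:
                return entry
        return None

    def _abort(self):
        # Drop all XAbort entries, set all XCommit -> Normal, TStatus = False.
        self.entries = [["Normal", a, v] for t, a, v in self.entries
                        if t != "XAbort"]
        self.tstatus = False

    def _tentative(self, addr):
        """Return the XAbort entry for addr, or None if the bus was Busy."""
        entry = self._find("XAbort", addr)
        if entry is not None:
            return entry
        normal = self._find("Normal", addr)
        if normal is not None:
            normal[0] = "XAbort"                       # Normal -> XAbort
            self.entries.append(["XCommit", addr, normal[2]])
            return normal
        # Miss: stands in for a T_Read (LT) or T_RFO (ST) bus cycle.
        if addr in self.busy:
            self._abort()
            return None
        value = self.memory[addr]
        self.entries.append(["XCommit", addr, value])  # old value
        entry = ["XAbort", addr, value]                # tentative value
        self.entries.append(entry)
        return entry

    def lt(self, addr):
        entry = self._tentative(addr)
        return None if entry is None else entry[2]

    def st(self, addr, value):
        entry = self._tentative(addr)
        if entry is not None:
            entry[2] = value   # only the XAbort copy is updated


cache = TxCache({0x10: 5, 0x20: 7}, busy_addrs={0x20})
first = cache.lt(0x10)        # miss: allocates XCommit and XAbort entries
cache.st(0x10, first + 1)     # hit: only the XAbort entry changes
snapshot = sorted(tuple(e) for e in cache.entries)
cache.lt(0x20)                # Busy: the transaction aborts
```

After the final `lt`, the cache holds only the old value of address 0x10 as a Normal entry and `tstatus` is `False`, which is what a later Validate would report.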

Transaction Operations: LTX
LTX operation:
– Like LT, but uses T_RFO (instead of T_Read) on a cache miss

Transaction Operations: Validate
Validate:
– Returns TStatus (False means the transaction has been aborted)

An Example (Counting)
[Code figure; annotations: read (exclusive access), write, commit]
In a multiprocessor environment without synchronization, it is possible for all writes to be lost except one. If each of M processors adds N to a counter (initially 0), the final value of the counter lies in the range N ≤ counter ≤ M*N.
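The two ends of that range can be reproduced by replaying fixed interleavings. The sketch below assumes each processor performs a single unsynchronized read followed by a single write of `value + N`. The `run` helper and the two schedules are hypothetical and exist only for illustration.

```python
def run(schedule, n, processors):
    """Replay ('read' | 'write', processor) steps on a shared counter."""
    counter = 0
    local = {p: 0 for p in range(processors)}  # each processor's private copy
    for op, p in schedule:
        if op == "read":
            local[p] = counter
        else:
            counter = local[p] + n
    return counter


M, N = 4, 10

# Each processor finishes its read and write before the next one starts.
serial = [(op, p) for p in range(M) for op in ("read", "write")]

# Every processor reads the initial value before anyone writes.
racy = [("read", p) for p in range(M)] + [("write", p) for p in range(M)]

best = run(serial, N, M)   # no update is lost: M * N
worst = run(racy, N, M)    # all updates but one are lost: N
```

A transaction that wraps the read and the write rules out the second schedule, because the conflicting transactions abort and retry instead of overwriting each other.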

Performance (Counting)
[Plot; annotations:]
– Trans. Mem
– Locking: read lock, write lock, read counter, write counter, write lock = 5 memory references
– LL/SC (single-word memory): no commit (cache write)

Another Example (Doubly Linked List)
[Code figure; annotations:]
– Read (exclusive); plan to write
– Commit succeeds only if no other processor has modified anything in the transaction set (read ∪ write)
– Commit fails if another processor/transaction modified anything in the transaction set
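The commit rule above can be illustrated with a software stand-in. Note that this sketch swaps in per-location version counters for conflict detection, whereas the hardware scheme in the talk detects conflicts through the cache-coherence protocol. The `Memory` and `Transaction` classes are hypothetical, and the example is single-threaded, so the commit itself is not made atomic.

```python
class Memory:
    def __init__(self, **cells):
        self.values = dict(cells)
        self.versions = {name: 0 for name in cells}  # bumped on every commit


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.seen = {}     # location -> version observed on first access
        self.writes = {}   # location -> tentative value

    def read(self, loc):
        self.seen.setdefault(loc, self.mem.versions[loc])
        return self.writes.get(loc, self.mem.values[loc])

    def write(self, loc, value):
        self.seen.setdefault(loc, self.mem.versions[loc])
        self.writes[loc] = value

    def commit(self):
        # Fail if anything in the read or write set changed since first access.
        for loc, version in self.seen.items():
            if self.mem.versions[loc] != version:
                return False
        for loc, value in self.writes.items():
            self.mem.values[loc] = value
            self.mem.versions[loc] += 1
        return True


mem = Memory(head=0, tail=0)
t1, t2 = Transaction(mem), Transaction(mem)
t1.write("tail", t1.read("tail") + 1)
t2.write("tail", t2.read("tail") + 1)
first_ok = t1.commit()    # nothing in t1's set changed: commit succeeds
second_ok = t2.commit()   # "tail" changed under t2: commit fails
```

A failed transaction would normally be retried from the start, re-reading the current values.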

Performance (Doubly Linked List)
[Plot; curves: Trans. Mem, MCS, LL/SC]

Observations
Many simplifications:
– Small data sets
– Single-cycle updates
– Sequentially consistent (S.C.) memory, so no barriers
– Write-back cache
More complex cache control logic:
– A cache can only snoop on a write, but in a transaction system write-first won't work, so ownership needs to be “propagated”.