Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.

Overview
- Shared memory access by multiple threads
- Concurrency bugs
- Transactional Memory (TM)
- Fixing concurrency bugs with TM
- Hardware support for TM (HTM)
- Hybrid transactional memories
- Hardware acceleration for STM
- Q & A

Shared Memory Accesses
- How to prevent unsafe access to shared data by multiple threads?
- Locks: allow only one thread to access at a time.
  – Too conservative: what about performance?
  – Programmer responsibility?
- Another idea?
  – Transactional Memory: let all threads access; make updates visible to others only if the access is correct.

Concurrency bugs
- Writing correct parallel programs is really hard!
- Possible synchronization bugs:
  – Deadlock: multiple locks not managed well
  – Atomicity violation: no lock used
  – Others (priority inversion etc.): not considered
- Possible solutions?
  – Lock hierarchy; adding more locks!
  – Use Transactional Memory: worry-free atomic execution

Transactional Memory
- Transactions used in database systems since the 1970s
  – All or nothing: Atomicity
  – No interference: Isolation
  – Correctness: Consistency
- Transactional Memory: make memory accesses transactional (atomic)
- Keywords: commit, abort, speculative access, checkpoint
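The keywords above can be illustrated with a minimal single-threaded sketch. It is not taken from the slides: the `Transaction` class and the `transfer` helper are hypothetical names, and a plain dict stands in for memory. Speculative writes are buffered privately, so the untouched dict acts as the checkpoint; commit publishes everything at once and abort discards it.

```python
class Transaction:
    """Toy transaction over a dict: buffer writes, publish them only on commit."""

    def __init__(self, memory):
        self.memory = memory   # shared state (hypothetical stand-in for memory)
        self.writes = {}       # speculative writes, invisible to other code

    def read(self, key):
        # A transaction sees its own speculative writes first.
        if key in self.writes:
            return self.writes[key]
        return self.memory[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # All buffered updates become visible together (all or nothing).
        self.memory.update(self.writes)
        self.writes = {}

    def abort(self):
        # Nothing was published, so dropping the buffer restores the checkpoint.
        self.writes = {}


def transfer(memory, src, dst, amount):
    """Move `amount` from src to dst, aborting if src would go negative."""
    tx = Transaction(memory)
    tx.write(src, tx.read(src) - amount)
    tx.write(dst, tx.read(dst) + amount)
    if tx.read(src) < 0:
        tx.abort()
        return False
    tx.commit()
    return True
```

An aborted `transfer` leaves both accounts exactly as they were, which is the all-or-nothing property from the slide.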

Fixing concurrency bugs with Transactional Memory
- Procedure followed:
  – Known bug database: deadlocks, atomicity violations (AV)
  – Try to apply a TM fix instead of a lock-based one
  – Come up with recipes of fixes
- Ingredients:
  1. Atomic regions
  2. Preemptible resources
  3. SW rollback
  4. Atomic/lock serialization

Bug Fix Recipes
- R1: Replace deadlock-prone locks
- R2: Wrap all
- R3: Asymmetric deadlock preemption
- R4: Wrap unprotected

Bug fix summary
- TM fixes are usually easier
- TM can't fix all bugs
- Locks are better in some cases
- R3 and R4 are more widely applicable

From Using to Implementing TM
- How is it implemented?
  – Hardware (HTM)
  – Software (STM)
[Figure: two diagrams showing Hardware, Memory, and Threads 1-3; the second diagram adds an STM layer]

Hardware Transactional Memories
- Save architectural state to a checkpoint
- Use caches to do versioning for memory
- Updates to the coherency protocol
- Conflict detection in hardware
- Commit transactions if there is no conflict
- Abort transactions on conflict (or a special condition)
- Retry aborted transactions
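The versioning, conflict detection, commit, abort, and retry steps listed above can be mimicked in software. The following is only a single-threaded simulation under stated assumptions: `Memory`, `Tx`, and `atomically` are hypothetical names, and per-location version counters stand in for the cache-coherence-based detection that real hardware uses.

```python
class Memory:
    """Shared values plus a version counter per location (an assumption of this sketch)."""

    def __init__(self, values):
        self.values = dict(values)
        self.versions = {key: 0 for key in values}


class Tx:
    def __init__(self, mem):
        self.mem = mem
        self.read_versions = {}   # read set: location -> version first observed
        self.writes = {}          # write set: buffered, not yet visible

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.mem.versions[key])
        return self.mem.values[key]

    def write(self, key, value):
        self.writes[key] = value

    def try_commit(self):
        # Conflict detection: abort if any location we read has since changed.
        # Real hardware does this check and the publish step atomically.
        for key, seen in self.read_versions.items():
            if self.mem.versions[key] != seen:
                return False
        for key, value in self.writes.items():
            self.mem.values[key] = value
            self.mem.versions[key] += 1
        return True


def atomically(mem, body, max_attempts=10):
    """Run body(tx) until it commits; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        tx = Tx(mem)          # a fresh transaction is the restored checkpoint
        body(tx)
        if tx.try_commit():
            return attempt
    raise RuntimeError("transaction aborted too many times")
```

Interleaving two `Tx` objects by hand shows the abort path: whichever commits second finds a newer version in its read set and must retry.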

BlueGene/Q: Hardware TM
- 16 cores with 4-way SMT and a 32 MB shared L2
- Multi-versioned L2 cache
- 128 speculative IDs for versioning
- L1 speculative writes invisible to other threads
- Short-running mode (L1 bypass)
- Long-running mode (TLB aliasing)
- Up to 10 speculative ways guaranteed in L2
- 20 MB speculative state (actually much smaller)

Execution of a transaction
- Special cases:
  – Irrevocable mode: for forward progress
  – JMV example: MMIO
  – Actions on commit failure
  – Handling problematic transactions: single rollback

HTM vs. STM

Hardware | Software
Fast (due to hardware operations) | Slow (due to software validation/commit)
Light code instrumentation | Heavy code instrumentation
HW buffers keep amount of metadata low | Lots of metadata
No need for middleware | Runtime library needed
Only short transactions allowed (why?) | Large transactions possible

How would you get the best of both?

Hybrid-TM
- Best-effort HTM (use STM for long Trx)
- Possible conflicts between HW, SW, and HW-SW Trx
  – What kind of conflicts do SW-Trx care about?
  – What kind of conflicts do HW-Trx care about?
- Some initial proposals:
  – HyTM: uses an ownership record per memory location (overhead?)
  – PhTM: HTM-only or (heavy) STM-only, low instrumentation
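The best-effort idea above amounts to a fallback path, sketched here as plain control flow. Everything in it is illustrative: `HW_WRITE_CAPACITY`, `HardwareAbort`, and the three functions are hypothetical, the capacity figure is not a real hardware number, and the sketch ignores the HW/SW conflict questions the slide raises.

```python
class HardwareAbort(Exception):
    """Raised by the simulated hardware path when it cannot finish."""


HW_WRITE_CAPACITY = 4  # hypothetical buffer size, not a real hardware figure


def run_in_hardware(memory, updates):
    # Simulated best-effort HTM: transactions that outgrow the buffer abort.
    if len(updates) > HW_WRITE_CAPACITY:
        raise HardwareAbort("write set exceeds hardware buffer")
    memory.update(updates)


def run_in_software(memory, updates):
    # Simulated STM: slower in practice, but with no size limit.
    memory.update(updates)


def run_hybrid(memory, updates):
    """Try the hardware path first; fall back to software on abort."""
    try:
        run_in_hardware(memory, updates)
        return "hardware"
    except HardwareAbort:
        run_in_software(memory, updates)
        return "software"
```

Short transactions take the fast path, and only the ones that exceed the simulated buffer pay the software cost.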

Hybrid NOrec
- Builds upon NOrec (no fine-grained shared metadata, only one global sequence lock)
- HW-Trx must wait for SW-Trx writeback
- HW-Trx must notify SW-Trx of updates
- HW-Trx must be aborted by conflicting HW-Trx and SW-Trx
- How to reduce conflicts?
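To make the "one global sequence lock" point concrete, here is a single-threaded sketch of the software NOrec scheme that Hybrid NOrec builds on. It covers only the software side and leaves out the hardware coordination listed above. The class and method names are hypothetical, and a real implementation would use an atomic compare-and-swap on the sequence lock and would wait while it is odd.

```python
class Conflict(Exception):
    """Raised when a read finds that the transaction's earlier reads are stale."""


class NOrecMemory:
    def __init__(self, values):
        self.values = dict(values)
        self.seq = 0   # global sequence lock: even = free, odd = writer committing


class SwTx:
    def __init__(self, mem):
        self.mem = mem
        self.snapshot = mem.seq   # sequence number this transaction is consistent with
        self.reads = {}           # read log: location -> value observed
        self.writes = {}          # buffered writes

    def validate(self):
        # Value-based validation: every logged read must still hold.
        if any(self.mem.values[k] != v for k, v in self.reads.items()):
            return False
        self.snapshot = self.mem.seq
        return True

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        if self.mem.seq != self.snapshot and not self.validate():
            raise Conflict(key)
        value = self.mem.values[key]
        self.reads[key] = value
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        if not self.writes:
            return True                      # read-only: nothing to publish
        if self.mem.seq != self.snapshot and not self.validate():
            return False                     # abort; the caller may retry
        self.mem.seq += 1                    # odd: writeback in progress
        self.mem.values.update(self.writes)
        self.mem.seq += 1                    # even again: writeback done
        return True
```

Because validation compares values rather than ownership records, a transaction survives an unrelated commit, but every writer still passes through the single `seq` counter.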

Instrumentation
- Subscription to SW commit notification
  – How about HW notification?
- Separation of subscribing and notifying
  – How about HW-Trx conflicts?
- Coordinate notification through HW-Trx
  – How about validation overhead?

How to Avoid the Narrow Waist?
- Update 1 variable atomically to access the whole memory
- Single counter, multiple threads/cores/processors
- Even worse in NOrec: seqlock used for validation/lock
- OK for random access
[Figure: Threads, STM, and Memory with a single counter (Cnt)]

How to Avoid the Narrow Waist?
- Seqlock (or c-lock) used for serial order
- Update 1 variable atomically to access the whole memory
- Single counter, multiple threads/cores/processors
- OK for random access
- Better if memory accesses follow patterns
[Figure: two diagrams of Threads and STM; one with a single counter (Cnt) over Memory, the other with Cnt1 and Cnt2 over Memory1 and Memory2]
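The difference between one counter and several can be sketched as follows. The slides do not say how memory is split into Memory1 and Memory2, so the modulo mapping in `partition_of` is purely an assumption, and `PartitionedClock` is a hypothetical name.

```python
class PartitionedClock:
    """One commit counter per memory partition instead of a single global one."""

    def __init__(self, partitions):
        self.counters = [0] * partitions

    def partition_of(self, address):
        # Hypothetical mapping; any fixed address-to-partition rule would do.
        return address % len(self.counters)

    def snapshot(self, addresses):
        """Record the counters of the partitions a transaction touches."""
        touched = {self.partition_of(a) for a in addresses}
        return {p: self.counters[p] for p in touched}

    def bump(self, addresses):
        """A committing writer advances only the partitions it wrote to."""
        for p in sorted({self.partition_of(a) for a in addresses}):
            self.counters[p] += 1

    def unchanged(self, snap):
        """True if no writer committed to the snapshotted partitions."""
        return all(self.counters[p] == seen for p, seen in snap.items())
```

With one partition, every commit disturbs every snapshot, which is the narrow waist. With two, a transaction whose accesses stay inside one partition is unaffected by commits to the other, matching the remark that partitioning pays off when accesses follow patterns.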

HTM vs. STM

Hardware | Software
Fast (due to hardware operations) | Slow (due to software validation/commit)
Light code instrumentation | Heavy code instrumentation
HW buffers keep amount of metadata low | Lots of metadata
No need for middleware | Runtime library needed
Expensive to implement/change | Many versions currently available
Different support from different vendors | Flexible middleware
Only short transactions allowed | Large transactions possible

How would you get the best of both? (Hint: current HW support is implemented on processors, at the core of the platform, which makes it hard to design.)

TMACC
- FARM: FPGA coherently connected to 2 CPUs
- Mainly used for conflict detection (why not use it for operations on memory?)
- Asynchronous communication with TMACC (is it possible? why is it good?)

TMACC Performance
[Figure: performance comparison; labels: On-chip, Off-chip, SW]

Thank you.

Hardware Transactional Memory
- Natively supports transactional operations
  – Versioning, conflict detection
- Uses the L1 cache to buffer the read/write set
- Conflict detection through the existing coherency protocol
- Commit by checking the state of cache lines in the read/write set

Hardware | Software
Fast (due to hardware operations) | Slow (due to software validation/commit)
Light code instrumentation | Heavy code instrumentation
HW buffers keep amount of metadata low | Lots of metadata (to keep consistent)
No need for middleware | Runtime library needed

E.g., Intel Haswell microarchitecture, AMD Advanced Synchronization Facility