Transactional Memory – Implementation Lecture 1 COS597C, Fall 2010 Princeton University Arun Raman 1.

Slides:



Advertisements
Similar presentations
Copyright 2008 Sun Microsystems, Inc Better Expressiveness for HTM using Split Hardware Transactions Yossi Lev Brown University & Sun Microsystems Laboratories.
Advertisements

Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Transaction Management Overview Chapter 16.
Principles of Transaction Management. Outline Transaction concepts & protocols Performance impact of concurrency control Performance tuning.
Exploiting Distributed Version Concurrency in a Transactional Memory Cluster Kaloian Manassiev, Madalin Mihailescu and Cristiana Amza University of Toronto,
Code Generation and Optimization for Transactional Memory Construct in an Unmanaged Language Programming Systems Lab Microprocessor Technology Labs Intel.
Transactional Locking Nir Shavit Tel Aviv University (Joint work with Dave Dice and Ori Shalev)
McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator Ali Adl-Tabatabai Ben Hertzberg Rick Hudson Bratin Saha.
Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)
Transactional Memory Overview Olatunji Ruwase Fall 2007 Oct
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
Data and Database Administration Chapter 12. Outline What is Concurrency Control? Background Serializability  Locking mechanisms.
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Formalisms and Verification for Transactional Memories Vasu Singh EPFL Switzerland.
1 Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, “lazy” implementation.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Transaction Management and Concurrency Control
1 Lecture 8: Transactional Memory – TCC Topics: “lazy” implementation (TCC)
1 Lecture 24: Transactional Memory Topics: transactional memory implementations.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
Department of Computer Science Presenters Dennis Gove Matthew Marzilli The ATOMO ∑ Transactional Programming Language.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.
An Introduction to Software Transactional Memory
©2009 HP Confidential1 A Proposal to Incorporate Software Transactional Memory (STM) Support in the Open64 Compiler Dhruva R. Chakrabarti HP Labs, USA.
Accelerating Precise Race Detection Using Commercially-Available Hardware Transactional Memory Support Serdar Tasiran Koc University, Istanbul, Turkey.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
A Qualitative Survey of Modern Software Transactional Memory Systems Virendra J. Marathe Michael L. Scott.
CS5204 – Operating Systems Transactional Memory Part 2: Software-Based Approaches.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Hybrid Static-Dynamic Analysis for Statically.
Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics Minjia Zhang, 1 Jipeng Huang, Man Cao, Michael D. Bond.
Transactional Memory Lecturer: Danny Hendler. 2 2 From the New York Times…
Transactions and Concurrency Control. Concurrent Accesses to an Object Multiple threads Atomic operations Thread communication Fairness.
CS510 Concurrent Systems Why the Grass May Not Be Greener on the Other Side: A Comparison of Locking and Transactional Memory.
Concurrency unlocked Programming
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
Transaction Management Overview. Transactions Concurrent execution of user programs is essential for good DBMS performance. – Because disk accesses are.
Transaction Management Transparencies. ©Pearson Education 2009 Chapter 14 - Objectives Function and importance of transactions. Properties of transactions.
CS492B Analysis of Concurrent Programs Transactional Memory Jaehyuk Huh Computer Science, KAIST Based on Lectures by Prof. Arun Raman, Princeton University.
4 November 2005 CS 838 Presentation 1 Nested Transactional Memory: Model and Preliminary Sketches J. Eliot B. Moss and Antony L. Hosking Presented by:
Adaptive Software Lock Elision
Mihai Burcea, J. Gregory Steffan, Cristiana Amza
James Larus and Christos Kozyrakis
Speculative Lock Elision
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Transactional Memory : Hardware Proposals Overview
Part 2: Software-Based Approaches
PHyTM: Persistent Hybrid Transactional Memory
Lecture 19: Transactional Memories III
Man Cao Minjia Zhang Aritra Sengupta Michael D. Bond
Enforcing Isolation and Ordering in STM Systems
A Qualitative Survey of Modern Software Transactional Memory Systems
Lecture: Consistency Models, TM
Lecture 6: Transactions
Chapter 10 Transaction Management and Concurrency Control
Lecture 21: Transactional Memory
Lecture 22: Consistency Models, TM
Hybrid Transactional Memory
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Atomic Commit and Concurrency Control
Design and Implementation Issues for Atomicity
Locking Protocols & Software Transactional Memory
Lecture 10: Coherence, Transactional Memory
Lecture 23: Transactional Memory
Lecture 21: Transactional Memory
Lecture: Transactional Memory
Controlled Interleaving for Transactions
Presentation transcript:

Transactional Memory – Implementation Lecture 1 COS597C, Fall 2010 Princeton University Arun Raman 1

Module Outline Lecture 1 (THIS LECTURE) Transactional Memory System Taxonomy Software Transactional Memory Hardware Accelerated STMs Lecture 2 (Thursday) Hardware based TMs Hybrid TMs When are transactions a good (bad) idea Assignment Compare lock-based and transaction-based implementations of a concurrent data structure 2

[0] TL2 and STAMP Benchmark Suite --- For ASSIGNMENT [1] Transactional Memory, Concepts, Implementations, and Opportunities - Christos Kozyrakis [2] Transactional Memory - Larus and Rajwar Chapter 4: Hardware Transactional Memory [3] McRT-STM: A High Performance Software Transactional Memory System for a Multi-core Runtime [4] Architectural Support for Software Transactional Memory (Hardware Accelerated STM) - Saha et al. [5] Signature-Accelerated STM (SigTM) [6] Case Study: Early Experience with a Commercial Hardware Transactional Memory Implementation [ASPLOS '09] dice.pdf?key1= &key2= &coll=GUIDE&dl=GUIDE&CFID= &CFTOKEN= [7] The Transactional Memory / Garbage Collection Analogy - Dan Grossman [8] Subtleties of Transactional Memory Atomicity Semantics - Blundell et al. Resources 3 Lot of the material in this lecture is sourced from here

Lecture Outline 1.Recap of Transactions 2.Transactional Memory System Taxonomy 3.Software Transactional Memory 4.Hardware Accelerated STM 4

Lecture Outline 1.Recap of Transactions 2.Transactional Memory System Taxonomy 3.Software Transactional Memory 4.Hardware Accelerated STMs 5

Parallel Programming 1.Find independent tasks in the algorithm 2.Map tasks to execution units (e.g. threads) 3.Define and implement synchronization among tasks 1.Avoid races and deadlocks, address memory model issues, … 4.Compose parallel tasks 5.Recover from errors 6.Ensure scalability 7.Manage locality 8.… 6

Parallel Programming 1.Find independent tasks in the algorithm 2.Map tasks to execution units (e.g. threads) 3.Define and implement synchronization among tasks 1.Avoid races and deadlocks, address memory model issues, … 4.Compose parallel tasks 5.Recover from errors 6.Ensure scalability 7.Manage locality 8.… Transactional Memory 7

Transactional Programming 8 void deposit(account, amount) { lock(account); int t = bank.get(account); t = t + amount; bank.put(account, t); unlock(account); } void deposit(account, amount) { atomic { int t = bank.get(account); t = t + amount; bank.put(account, t); } 1.Declarative Synchronization – What, not How 2.System implements Synchronization transparently

Transactional Memory Memory Transaction - An atomic and isolated sequence of memory accesses Transactional Memory – Provides transactions for threads running in a shared address space 9

Transactional Memory - Atomicity Atomicity – On transaction commit, all memory updates appear to take effect at once; on transaction abort, none of the memory updates appear to take effect 10 void deposit(account, amount) { atomic { int t = bank.get(account); t = t + amount; bank.put(account, t); } Thread 1Thread 2 RD A : 0 RD WR RD A : 0 WR A : 10 WR A : 5 COMMIT ABORT CONFLICT

Transactional Memory - Isolation 11 Isolation – No other code can observe updates before commit Programmer only needs to identify operation sequence that should appear to execute atomically to other, concurrent threads

Transactional Memory - Serializability 12 Serializability – Result of executing concurrent transactions on a data structure must be identical to a result in which these transactions executed serially.

Some advantages of TM 1.Ease of use (declarative) 2.Composability 3.Expected performance of fine-grained locking 13

Composability : Locks 14 void transfer(A, B, amount) { synchronized(A) { synchronized(B) { withdraw(A, amount); deposit(B, amount); } void transfer(B, A, amount) { synchronized(B) { synchronized(A) { withdraw(B, amount); deposit(A, amount); } 1.Fine grained locking  Can lead to deadlock 2.Need some global locking discipline now

Composability : Locks 15 void transfer(A, B, amount) { synchronized(bank) { withdraw(A, amount); deposit(B, amount); } void transfer(B, A, amount) { synchronized(bank) { withdraw(B, amount); deposit(A, amount); } 1.Fine grained locking  Can lead to deadlock 2.Coarse grained locking  No concurrency

Composability : Transactions 16 void transfer(A, B, amount) { atomic { withdraw(A, amount); deposit(B, amount); } void transfer(B, A, amount) { atomic { withdraw(B, amount); deposit(A, amount); } 1.Serialization for transfer(A,B,100) and transfer(B,A,100) 2.Concurrency for transfer(A,B,100) and transfer(C,D,100)

Some issues with TM 1.I/O and unrecoverable actions 2.Atomicity violations are still possible 3.Interaction with non-transactional code 17

Atomicity Violation 18 atomic { … ptr = A; … } atomic { … ptr = NULL; } Thread 2 Thread 1 atomic { B = ptr->field; }

Interaction with non-transactional code 19 lock_acquire(lock); obj.x = 1; if (obj.x != 1) fireMissiles(); lock_release(lock); obj.x = 2; Thread 2 Thread 1

Interaction with non-transactional code 20 atomic { obj.x = 1; if (obj.x != 1) fireMissiles(); } obj.x = 2; Thread 2 Thread 1

Interaction with non-transactional code 21 atomic { obj.x = 1; if (obj.x != 1) fireMissiles(); } obj.x = 2; Thread 2 Thread 1 Weak Isolation – Transactions are serializable only against other transactions Strong Isolation – Transactions are serializable against all memory accesses (Non-transactional LD/ST are 1- instruction TXs)

Nested Transactions 22 void transfer(A, B, amount) { atomic { withdraw(A, amount); deposit(B, amount); } void deposit(account, amount) { atomic { int t = bank.get(account); t = t + amount; bank.put(account, t); } Semantics of Nested Transactions Flattened Closed Nested Open Nested

Nested Transactions - Flattened 23 int x = 1; atomic { x = 2; atomic flatten { x = 3; abort; }

Nested Transactions - Closed 24 int x = 1; atomic { x = 2; atomic closed { x = 3; abort; }

Nested Transactions - Open 25 int x = 1; atomic { x = 2; atomic open { x = 3; } abort; }

Nested Transactions – Open – Use Case 26 int counter = 1; atomic { … atomic open { counter++; }

Nested Transactions ≠ Nested Parallelism! 27 Intelligent Speculation for Pipelined Multithreading – Neil Vachharajani, PhD Thesis, Princeton University

Transactional Programming - Summary 1.Transactions do not generate parallelism 2.Transactions target performance of fine-grained effort of coarse-grained locking 3.Various constructs studied previously (atomic, retry, orelse,…) 4. Different semantics (Weak/Strong Isolation, Nesting) 28

Lecture Outline 1.Recap of Transactions 2.Transactional Memory System Taxonomy 3.Software Transactional Memory 4.Hardware Accelerated STMs 29

TM Implementation Data Versioning Eager Versioning Lazy Versioning Conflict Detection and Resolution Pessimistic Concurrency Control Optimistic Concurrency Control 30 Conflict Detection Granularity Object Granularity Word Granularity Cache line Granularity

TM Implementation Data Versioning Eager Versioning Lazy Versioning Conflict Detection and Resolution Pessimistic Concurrency Control Optimistic Concurrency Control 31 Conflict Detection Granularity Object Granularity Word Granularity Cache line Granularity

Data Versioning 32 Eager Versioning (Direct Update)Lazy Versioning (Deferred Update)

TM Implementation Data Versioning Eager Versioning Lazy Versioning Conflict Detection and Resolution Pessimistic Concurrency Control Optimistic Concurrency Control 33 Conflict Detection Granularity Object Granularity Word Granularity Cache line Granularity

Conflict Detection and Resolution - Pessimistic 34 Time No ConflictConflict with StallConflict with Abort

Conflict Detection and Resolution - Optimistic 35 Time No ConflictConflict with AbortConflict with Commit

TM Implementation Data Versioning Eager Versioning Lazy Versioning Conflict Detection and Resolution Pessimistic Concurrency Control Optimistic Concurrency Control 36 Conflict Detection Granularity Object Granularity Word Granularity Cache line Granularity

Examples Hardware TM Stanford TCC: Lazy + Optimistic Intel VTM: Lazy + Pessimistic Wisconsin LogTM: Eager + Pessimistic UHTM SpHT 37 Software TM Sun TL2: Lazy + Optimistic (R/W) Intel STM: Eager + Optimistic (R)/Pessimistic (W) MS OSTM: Lazy + Optimistic (R)/Pessimistic (W) Draco STM STMLite DSTM Can find many more at

Lecture Outline 1.Recap of Transactions 2.Transactional Memory System Taxonomy 3.Software Transactional Memory (Intel McRT-STM) 4.Hardware Accelerated STMs 38

Software Transactional Memory (STM) 39 atomic { a.x = t1 a.y = t2 if (a.z == 0) { a.x = 0 a.z = t3 } tmTXBegin() tmWr(&a.x, t1) tmWr(&a.y, t2) if (tmRd(&a.z) != 0) { tmWr(&a.x, 0) tmWr(&a.z, t3) } tmTXCommit()

40 Intel McRT-STM Strong or Weak IsolationWeak Transaction GranularityWord or Object Lazy or Eager VersioningEager Concurrency ControlOptimistic read, Pessimistic Write Nested TransactionClosed

McRT-STM Runtime Data Structures 41 Transaction Descriptor (per thread) Used for conflict detection, commit, abort, … Includes read set, write set, undo log or write buffer Transaction Record (per datum) Pointer-sized record guarding shared datum Tracks transactional state of datum Shared: Read-only access by multiple readers Value is odd (low bit is 1) Exclusive: Write-only access by single owner Value is aligned pointer to owning transaction’s descriptor

42 atomic { t = foo.x; bar.x = t; t = foo.y; bar.y = t; } T1 atomic { t1 = bar.x; t2 = bar.y; } T2 T1 copies foo into bar T2 reads bar, but should not see intermediate values Class Foo { int x; int y; }; Foo bar, foo; McRT-STM: Example

43 stmStart(); t = stmRd(foo.x); stmWr(bar.x,t); t = stmRd(foo.y); stmWr(bar.y,t); stmCommit(); T1 stmStart(); t1 = stmRd(bar.x); t2 = stmRd(bar.y); stmCommit(); T2 T1 copies foo into bar T2 reads bar, but should not see intermediate values McRT-STM: Example

McRT-STM Operations 44 STM read (Optimistic) Direct read of memory location (eager versioning) Validate read data Check if unlocked and data version <= local timestamp If not, validate all data in read set for consistency validate() {for in transaction’s read set, if (*txnrec != ver) abort();} Insert in read set Return value STM write (Pessimistic) Validate data Check if unlocked and data version <= local timestamp Acquire lock Insert in write set Create undo log entry Write data in place (eager versioning)

45 stmStart(); t = stmRd(foo.x); stmWr(bar.x,t); t = stmRd(foo.y); stmWr(bar.y,t); stmCommit; T1 stmStart(); t1 = stmRd(bar.x); t2 = stmRd(bar.y); stmCommit(); T2 hdr x = 0 y = 0 5 hdr x = 9 y = 7 3 foo bar Reads Reads T1 x = 9 Writes Undo T2 waits y = 7 7 Abort T2 should read [0, 0] or should read [9,7] Commit McRT-STM: Example

Lecture Outline 1.Recap of Transactions 2.Transactional Memory System Taxonomy 3.Software Transactional Memory (Intel McRT-STM) 4.Hardware Accelerated STMs (HASTM) 46

47 Hardware Support?

Types of Hardware Support 48 Hardware-accelerated STM systems (HASTM, SigTM, …) Hardware-based TM systems (TCC, LTM, VTM, LogTM, …) Hybrid TM systems (Sun Rock)

49 Hardware Support? 1.1.8x – 5.6x slowdown over sequential 2.Most time goes in read barriers and validation

50 Hardware Support? Architectural Support for Software Transactional Memory – Saha et al.

Hardware-accelerated STM (HASTM) 51 ISA Extensions loadSetMark(addr) – Loads value at addr and sets mark bit associated with addr loadResetMark(addr) – Loads value at addr and clears mark bit associated with addr loadTestMark(addr) – Loads the value at addr and sets carry flag to value of mark bit resetMarkAll() – Clears all mark bits in cache and increments mark counter resetMarkCounter() – Resets mark counter readMarkCounter() – Reads mark counter value Hardware Support Per-thread mark bits at granularity of cache lines New register (mark counter) Used to build fast filters to speed up read barriers

52 HASTM – Cache line transitions

McRT-STM Operations 53 stmRd = stmRdBar + Rd stmRdBar(TxnRec* rec) { void* recval = *rec; void* txndesc = getTxnDesc(); if (recval == txndesc) return; if (isVersion(recVal) == false) recval = handleContention(rec); logRead(rec,recval); }

McRT-STM Operations 54 stmWr = stmWrBar + LogOldValue + Wr stmWrBar(TxnRec* rec) { void* recval = *rec; void* txndesc = getTxnDesc(); i f (recval == txndesc) return; if (isVersion(recVal) == false) recval = handleContention(rec); while (CAS(rec,recval,txndesc) == false) recval = handleContention(rec); logWrite(rec,recval); }

McRT-STM Operations 55 mov eax, [rec] /* load TxRec */ cmp eax, txndesc /* do I own exclusive */ jeq done test eax, #versionmask /* is a version no. */ jz contention mov ecx, [txndesc + rdsetlog] /*get log ptr*/ test ecx, #overflowmask jz overflow add ecx, 8 /*inc log ptr*/ mov [txndesc + rdsetlog], ecx mov [ecx – 8], rec /* logging */ mov [ecx – 4], eax /* logging */ done: … stmRd = stmRdBar + Rd Manage Contention Manage Overflow Fast Path Slow Path

HASTM Operations 56 loadTestMark eax, [rec] /* check 1st access */ jnae done test eax, #versionmask /* is a version no. */ jz contention mov ecx, [txndesc + rdsetlog] /*get log ptr*/ test ecx, #overflowmask jz overflow add ecx, 8 /*inc log ptr*/ mov [txndesc + rdsetlog], ecx mov [ecx – 8], rec /* logging */ mov [ecx – 4], eax /* logging */ done: … stmRd = stmRdBar + Rd Manage Contention Manage Overflow Fast Path Slow Path loadSetMark eax, [rec]

HASTM Operations 57 validate validate() { markCount = readMarkCounter(); resetMarkAll(); if (markCount == 0) /*no snoop or eviction*/ return; /* perform full read set validation */ for in transaction’s read set if (*txnrec != ver) abort(); }

HASTM Performance Improvement 58

HASTM Performance Improvement – Multicore 59 Binary Search TreeB-Tree

HASTM - Issues 60 Insufficient Cache Capacity Mark bits are only an acceleration mechanism Cache evictions, cause mark bits to be lost – revert to slow software validation Weak Isolation

Other Slides 61

62 Intel McRT-STM Lazy/Eager Eager

Other Transactional Programming Primitives 1.User-triggered abort: abort void transfer(A,B,amount) atomic{ try{ work(); } catch(error1) {fix_code();} catch(error2) {abort;} } 2.Conditional synchronization: retry Object blockingDequeue(q) atomic{ if(q.isEmpty()) retry; return dequeue(); } 3.Composing code sequences: orelse atomic{q1.blockingDequeue(); }orelse{q2.blockingDequeue(); } 63