CS492B Analysis of Concurrent Programs Transactional Memory Jaehyuk Huh Computer Science, KAIST Based on Lectures by Prof. Arun Raman, Princeton University.

Slides:



Advertisements
Similar presentations
Copyright 2008 Sun Microsystems, Inc Better Expressiveness for HTM using Split Hardware Transactions Yossi Lev Brown University & Sun Microsystems Laboratories.
Advertisements

Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
1 Lecture 18: Transactional Memories II Papers: LogTM: Log-Based Transactional Memory, HPCA’06, Wisconsin LogTM-SE: Decoupling Hardware Transactional Memory.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
Topic 6.3: Transactions and Concurrency Control Hari Uday.
Code Generation and Optimization for Transactional Memory Construct in an Unmanaged Language Programming Systems Lab Microprocessor Technology Labs Intel.
Transactional Memory – Implementation Lecture 1 COS597C, Fall 2010 Princeton University Arun Raman 1.
Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Lock vs. Lock-Free memory Fahad Alduraibi, Aws Ahmad, and Eman Elrifaei.
1 Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, “lazy” implementation.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 8: Transactional Memory – TCC Topics: “lazy” implementation (TCC)
1 Lecture 24: Transactional Memory Topics: transactional memory implementations.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
Supporting Nested Transactional Memory in LogTM Authors Michelle J Moravan Mark Hill Jayaram Bobba Ben Liblit Kevin Moore Michael Swift Luke Yen David.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
1 Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of “lazy” TM.
Unbounded Transactional Memory Paper by Ananian et al. of MIT CSAIL Presented by Daniel.
Department of Computer Science Presenters Dennis Gove Matthew Marzilli The ATOMO ∑ Transactional Programming Language.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
1 Lecture 10: TM Implementations Topics: wrap-up of eager implementation (LogTM), scalable lazy implementation.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.
Transaction Management Chapter 9. What is a Transaction? A logical unit of work on a database A logical unit of work on a database An entire program An.
An Introduction to Software Transactional Memory
©2009 HP Confidential1 A Proposal to Incorporate Software Transactional Memory (STM) Support in the Open64 Compiler Dhruva R. Chakrabarti HP Labs, USA.
Accelerating Precise Race Detection Using Commercially-Available Hardware Transactional Memory Support Serdar Tasiran Koc University, Istanbul, Turkey.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
Sutirtha Sanyal (Barcelona Supercomputing Center, Barcelona) Accelerating Hardware Transactional Memory (HTM) with Dynamic Filtering of Privatized Data.
A Qualitative Survey of Modern Software Transactional Memory Systems Virendra J. Marathe Michael L. Scott.
CS5204 – Operating Systems Transactional Memory Part 2: Software-Based Approaches.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics Minjia Zhang, 1 Jipeng Huang, Man Cao, Michael D. Bond.
Hybrid Transactional Memory Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen, Intel Labs University of Michigan Intel Labs.
Transactional Memory Lecturer: Danny Hendler. 2 2 From the New York Times…
CS510 Concurrent Systems Why the Grass May Not Be Greener on the Other Side: A Comparison of Locking and Transactional Memory.
7c.1 Silberschatz, Galvin and Gagne ©2003 Operating System Concepts with Java Module 7c: Atomicity Atomic Transactions Log-based Recovery Checkpoints Concurrent.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
© 2008 Multifacet ProjectUniversity of Wisconsin-Madison Pathological Interaction of Locks with Transactional Memory Haris Volos, Neelam Goyal, Michael.
Transaction Management Transparencies. ©Pearson Education 2009 Chapter 14 - Objectives Function and importance of transactions. Properties of transactions.
Transactional Memory Student Presentation: Stuart Montgomery CS5204 – Operating Systems 1.
Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian R. Murphy, Bratin Saha,
Novel Paradigms of Parallel Programming Prof. Smruti R. Sarangi IIT Delhi.
Adaptive Software Lock Elision
Lecture 20: Consistency Models, TM
James Larus and Christos Kozyrakis
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Transactional Memory : Hardware Proposals Overview
Part 2: Software-Based Approaches
PHyTM: Persistent Hybrid Transactional Memory
Advanced Operating Systems - Fall 2009 Lecture 8 – Wednesday February 4, 2009 Dan C. Marinescu Office: HEC 439 B. Office hours: M,
A Qualitative Survey of Modern Software Transactional Memory Systems
Changing thread semantics
Lecture: Consistency Models, TM
Lecture 6: Transactions
Chapter 10 Transaction Management and Concurrency Control
Lecture 21: Transactional Memory
Lecture 22: Consistency Models, TM
Hybrid Transactional Memory
Lecture 10: Coherence, Transactional Memory
Lecture 23: Transactional Memory
Lecture 21: Transactional Memory
Lecture: Transactional Memory
Transactional Memory Lecture 19:
Dynamic Performance Tuning of Word-Based Software Transactional Memory
Controlled Interleaving for Transactions
Presentation transcript:

CS492B Analysis of Concurrent Programs Transactional Memory Jaehyuk Huh Computer Science, KAIST Based on Lectures by Prof. Arun Raman, Princeton University

Parallel Programming 1.Find independent tasks in the algorithm 2.Map tasks to execution units (e.g. threads) 3.Define and implement synchronization among tasks 1.Avoid races and deadlocks, address memory model issues, … 4.Compose parallel tasks 5.Recover from errors 6.Ensure scalability 7.Manage locality 8.… 2

Parallel Programming 1.Find independent tasks in the algorithm 2.Map tasks to execution units (e.g. threads) 3.Define and implement synchronization among tasks 1.Avoid races and deadlocks, address memory model issues, … 4.Compose parallel tasks 5.Recover from errors 6.Ensure scalability 7.Manage locality 8.… Transactional Memory 3

Transactional Programming 4 void deposit(account, amount) { lock(account); int t = bank.get(account); t = t + amount; bank.put(account, t); unlock(account); } void deposit(account, amount) { atomic { int t = bank.get(account); t = t + amount; bank.put(account, t); } 1.Declarative Synchronization – What, not How 2.System implements Synchronization transparently

Transactional Memory Memory Transaction - An atomic and isolated sequence of memory accesses Transactional Memory – Provides transactions for threads r unning in a shared address space 5

Transactional Memory - Atomicity Atomicity – On transaction commit, all memory updates appear to take effect at once; on transaction abort, none o f the memory updates appear to take effect 6 void deposit(account, amount) { atomic { int t = bank.get(account); t = t + amount; bank.put(account, t); } Thread 1Thread 2 RD A : 0 RD WR RD A : 0 WR A : 10 WR A : 5 COMMIT ABORT CONFLICT

Transactional Memory - Isolation 7 Isolation – No other code can observe updates before commit Programmer only needs to identify operation sequence that should appear to execute atomically to other, concurrent threads

Transactional Memory - Serializability 8 Serializability – Result of executing concurrent transactions on a data structure must be identical to a result in which these transactions executed serially.

Some advantages of TM 1.Ease of use (declarative) 2.Composability 3.Expected performance of fine-grained locking 9

Composability : Locks 10 void transfer(A, B, amount) { synchronized(A) { synchronized(B) { withdraw(A, amount); deposit(B, amount); } void transfer(B, A, amount) { synchronized(B) { synchronized(A) { withdraw(B, amount); deposit(A, amount); } 1.Fine grained locking  Can lead to deadlock 2.Need some global locking discipline now

Composability : Locks 11 void transfer(A, B, amount) { synchronized(bank) { withdraw(A, amount); deposit(B, amount); } void transfer(B, A, amount) { synchronized(bank) { withdraw(B, amount); deposit(A, amount); } 1.Fine grained locking  Can lead to deadlock 2.Coarse grained locking  No concurrency

Composability : Transactions 12 void transfer(A, B, amount) { atomic { withdraw(A, amount); deposit(B, amount); } void transfer(B, A, amount) { atomic { withdraw(B, amount); deposit(A, amount); } 1.Serialization for transfer(A,B,100) and transfer(B,A,100) 2.Concurrency for transfer(A,B,100) and transfer(C,D,100)

Some issues with TM 1.I/O and unrecoverable actions 2.Atomicity violations are still possible 3.Interaction with non-transactional code 13

Atomicity Violation 14 atomic { … ptr = A; … } atomic { … ptr = NULL; } Thread 2 Thread 1 atomic { B = ptr->field; }

Interaction with non-transactional code 15 lock_acquire(lock); obj.x = 1; if (obj.x != 1) fireMissiles(); lock_release(lock); obj.x = 2; Thread 2 Thread 1

Interaction with non-transactional code 16 atomic { obj.x = 1; if (obj.x != 1) fireMissiles(); } obj.x = 2; Thread 2 Thread 1

Interaction with non-transactional code 17 atomic { obj.x = 1; if (obj.x != 1) fireMissiles(); } obj.x = 2; Thread 2 Thread 1 Weak Isolation – Transactions are serializable only against other transactions Strong Isolation – Transactions are serializable against all memory accesses (Non-transactional LD/ST are 1-instruc tion TXs)

Nested Transactions 18 void transfer(A, B, amount) { atomic { withdraw(A, amount); deposit(B, amount); } void deposit(account, amount) { atomic { int t = bank.get(account); t = t + amount; bank.put(account, t); } Semantics of Nested Transactions Flattened Closed Nested Open Nested

Nested Transactions - Flattened 19 int x = 1; atomic { x = 2; atomic flatten { x = 3; abort; }

Nested Transactions - Closed 20 int x = 1; atomic { x = 2; atomic closed { x = 3; abort; }

Nested Transactions - Open 21 int x = 1; atomic { x = 2; atomic open { x = 3; } abort; }

Nested Transactions – Open – Use Case 22 int counter = 1; atomic { … atomic open { counter++; }

Transactional Programming - Summary 1.Transactions do not generate parallelism 2.Transactions target performance of fine-grained effort of coarse-grained locking 3.Various constructs studied previously (atomic, retry, orelse,…) 4. Different semantics (Weak/Strong Isolation, Nesting) 23

TM Implementation Data Versioning Eager Versioning Lazy Versioning Conflict Detection and Resolution Pessimistic Concurrency Control Optimistic Concurrency Control 24 Conflict Detection Granularity Object Granularity Word Granularity Cache line Granularity

Data Versioning 25 Eager Versioning (Direct Update)Lazy Versioning (Deferred Update)

Conflict Detection and Resolution - Pessimistic 26 Time No ConflictConflict with StallConflict with Abort

Conflict Detection and Resolution - Optimistic 27 Time No ConflictConflict with AbortConflict with Commit

TM Implementation Data Versioning Eager Versioning Lazy Versioning Conflict Detection and Resolution Pessimistic Concurrency Control Optimistic Concurrency Control 28 Conflict Detection Granularity Object Granularity Word Granularity Cache line Granularity

Examples Hardware TM Stanford TCC: Lazy + Optimistic Intel VTM: Lazy + Pessimistic Wisconsin LogTM: Eager + Pessimistic UHTM SpHT 29 Software TM Sun TL2: Lazy + Optimistic (R/W) Intel STM: Eager + Optimistic (R)/Pessimistic (W) MS OSTM: Lazy + Optimistic (R)/Pessimistic (W) Draco STM STMLite DSTM Can find many more at

Software Transactional Memory (STM) 30 atomic { a.x = t1 a.y = t2 if (a.z == 0) { a.x = 0 a.z = t3 } tmTXBegin() tmWr(&a.x, t1) tmWr(&a.y, t2) if (tmRd(&a.z) != 0) { tmWr(&a.x, 0) tmWr(&a.z, t3) } tmTXCommit()

31 Intel McRT-STM Strong or Weak IsolationWeak Transaction GranularityWord or Object Lazy or Eager VersioningEager Concurrency ControlOptimistic read, Pessimistic Write Nested TransactionClosed

McRT-STM Runtime Data Structures 32 Transaction Descriptor (per thread) Used for conflict detection, commit, abort, … Includes read set, write set, undo log or write buffer Transaction Record (per datum) Pointer-sized record guarding shared datum Tracks transactional state of datum Shared: Read-only access by multiple readers Value is odd (low bit is 1) Exclusive: Write-only access by single owner Value is aligned pointer to owning transaction’s descriptor

33 atomic { t = foo.x; bar.x = t; t = foo.y; bar.y = t; } T1 atomic { t1 = bar.x; t2 = bar.y; } T2 T1 copies foo into bar T2 reads bar, but should not see intermediate values Class Foo { int x; int y; }; Foo bar, foo; McRT-STM: Example

34 stmStart(); t = stmRd(foo.x); stmWr(bar.x,t); t = stmRd(foo.y); stmWr(bar.y,t); stmCommit(); T1 stmStart(); t1 = stmRd(bar.x); t2 = stmRd(bar.y); stmCommit(); T2 T1 copies foo into bar T2 reads bar, but should not see intermediate values McRT-STM: Example

McRT-STM Operations 35 STM read (Optimistic) Direct read of memory location (eager versioning) Validate read data Check if unlocked and data version <= local timestamp If not, validate all data in read set for consistency validate() {for in transaction’s read set, if (*txnrec != ver) abort();} Insert in read set Return value STM write (Pessimistic) Validate data Check if unlocked and data version <= local timestamp Acquire lock Insert in write set Create undo log entry Write data in place (eager versioning)

36 stmStart(); t = stmRd(foo.x); stmWr(bar.x,t); t = stmRd(foo.y); stmWr(bar.y,t); stmCommit; T1 stmStart(); t1 = stmRd(bar.x); t2 = stmRd(bar.y); stmCommit(); T2 hdr x = 0 y = 0 5 hdr x = 9 y = 7 3 foo bar Reads Reads T1 x = 9 Writes Undo T2 waits y = 7 7 Abort T2 should read [0, 0] or should read [9,7] Commit McRT-STM: Example

Hardware Transactional Memory Transactional memory implementations require tracking read / write sets Need to know whether other cores have accessed data we are using Expensive in software – Have to maintain logs / version ID in memory – Every read / write turns into several instructions – These instructions are inherently concurrent with the actual accesses, but STM does them in series

Hardware Transactional Memory Idea: Track read / write sets in Hardware – Unlike Hardware Accelerated TM, handle commit / rollback in hardware as well Cache coherent hardware already manages much of this Basic idea: map storage to cache HTM is basically a smarter cache – Plus potentially some other storage buffers etc Can support many different TM paradigms – Eager, lazy – optimistic, pessimistic Default seems to be Lazy, pessimistic

HTM – The good Most hardware already exists Only small modification to cache needed Core Regular Accesses L1 $ Tag Data L1 $ Kumar et al. (Intel)

HTM – The good Most hardware already exists Only small modification to cache needed Core Regular Accesses Transactional $L1 $ Tag Data Tag Addl. Tag Old Data New Data Transactional Accesses L1 $ Kumar et al. (Intel)

HTM Example TagdataTrans?StateTagdataTrans?state atomic { read A write B =1 } atomic { read B Write A = 2 } Bus Messages:

HTM Example TagdataTrans?StateTagdataTrans?state B0YS atomic { read A write B =1 } atomic { read B Write A = 2 } Bus Messages: 2 read B

HTM Example TagdataTrans?StateTagdataTrans?state A0YS B0YS atomic { read A write B =1 } atomic { read B Write A = 2 } Bus Messages: 1 read A

HTM Example TagdataTrans?StateTagdataTrans?state A0YS B1YMB0YS atomic { read A write B =1 } atomic { read B Write A = 2 } Bus Messages: NONE

Conflict, visibility on commit TagdataTrans?StateTagdataTrans?state A0NS B1NMB0YS atomic { read A write B =1 } atomic { read B ABORT Write A = 2 } Bus Messages: 1 B modified

Conflict, notify on write TagdataTrans?StateTagdataTrans?state A0YS B1YMB0YS atomic { read A write B =1 ABORT ? } atomic { read B ABORT? Write A = 2 } Bus Messages:1 speculative write to B 2: 1 conflicts with me

HTM – The good Strong isolation

HTM – The good ISA Extensions Allows ISA extentions (new atomic operations) Double compare and swap Necessary for some non-blocking algorithms Similar performance to handtuned java.util.concurrent im plementation (Dice et al, ASPLOS ’09) int DCAS(int *addr1, int *addr2, int old1, int old2, int new1, int new2) atomic { if ((*addr1 == old1) && (*addr2 == old2)) { *addr1 = new1; *addr2 = new2; return(TRUE); } else return(FALSE); }

HTM – The good ISA Extensions Allows ISA extentions (new atomic operations) Atomic pointer swap Elem 1 Elem 2 Loc 1 Loc 2

HTM – The good ISA Extensions Allows ISA extentions (new atomic operations) Atomic pointer swap – 21-25% speedup on canneal benchmark (Dice et al, SPAA’10) Elem 1 Elem 2 Loc 1 Loc 2

HTM – The bad False Sharing TagdataTrans?StateTagdataTrans?state C/D0/0YS atomic { read A write D = 1 } atomic { read C Write B = 2 } Bus Messages: Read C/D

HTM – The bad False Sharing TagdataTrans?StateTagdataTrans?state C/D0/0YS A/B0/0YS atomic { read A write D = 1 } atomic { read C Write B = 2 } Bus Messages: Read A/B

HTM – The bad False sharing TagdataTrans?StateTagdataTrans?state C/D0/1YMC/D0/0YS A/B0/0YS atomic { read A write D = 1 } atomic { read C Write B = 2 } Bus Messages: Write C/D UH OH

HTM – The bad Context switching Cache is unaware of context switching, paging, etc OS switching typically aborts transactions

HTM – The bad Inflexible Poor support for advanced TM constructs Nested Transactions Open variables etc

HTM – The bad Limited Size TagdataTrans?StateTagdataTrans?state A0YM atomic { read A read B read C read D } Write C/ Bus Messages: Read A

HTM – The bad Limited Size TagdataTrans?StateTagdataTrans?state A0YM B0YM atomic { read A read B read C read D } Bus Messages: Read B

HTM – The bad Limited Size TagdataTrans?StateTagdataTrans?state A0YM B0YM C0YM atomic { read A read B read C read D } Bus Messages: Read C

HTM – The bad Limited Size TagdataTrans?StateTagdataTrans?state A0YM B0YM C0YM atomic { read A read B read C read D } Bus Messages: … UH OH

Kumar (Intel) Hardware vs. Software TM Hardware Approach Low overhead – Buffers transactional state i n Cache More concurrency – Cache-line granularity Bounded resource Software Approach High overhead – Uses Object copying to kee p transactional state Less Concurrency – Object granularity No resource limits Useful BUT Limited What if we could have both worlds simultaneously?