Transactional Memory: Architectural Support for Lock-Free Data Structures. Maurice Herlihy and J. Eliot B. Moss, ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture. Presented by: Ajithchandra Saya, Virginia Tech
Outline: WHY – the need; WHAT – the solution; Transactional memory – the idea; Architectural support; Cache coherency mechanisms; Cache line states; Bus cycles; Working; Simulation; Results; Positives; Extensions; Current work; Conclusions; My opinion
WHY – The need: Increasing need for concurrent programs and growing shared-data access. Conventional locking mechanisms suffer from priority inversion and lock convoying. Concurrent programs are hard to write correctly: data races, deadlocks.
Solution: LOCK-FREE SYNCHRONIZATION
Transactional Memory: multiprocessor architectural support intended to make lock-free synchronization easy and efficient; an extension of cache coherence protocols
IDEA TRANSACTIONS Finite sequence of machine instructions executed by a single process Satisfies two important properties - Serializability Atomicity Analogous to transactions in conventional database systems
IDEA Contd… Transactional primitives. For memory access – Load Transactional (LT), Load Transactional Exclusive (LTX), Store Transactional (ST); these accesses define a transaction's read set, write set, and data set. For manipulating transaction state – Commit, Abort, Validate
IDEA Contd… Example: a simple memory read and update – (1) LT, (2) VALIDATE, (3) ST, (4) COMMIT. If (2) or (4) fails, try again from step (1). A sketch of this retry loop follows.
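A minimal C sketch of this retry pattern. LT, ST, VALIDATE, and COMMIT are hypothetical intrinsics standing in for the paper's instructions; in the proposal they are processor instructions, not library calls, so the prototypes below are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical intrinsics standing in for the paper's transactional instructions. */
uint64_t LT(volatile uint64_t *addr);               /* Load Transactional             */
void     ST(volatile uint64_t *addr, uint64_t v);   /* Store Transactional            */
bool     VALIDATE(void);                            /* is the transaction still OK?   */
bool     COMMIT(void);                              /* try to make updates permanent  */

/* Read-modify-write of a shared word: (1) LT, (2) VALIDATE, (3) ST, (4) COMMIT;
 * if (2) or (4) fails, start over from (1). */
void atomic_increment(volatile uint64_t *shared)
{
    for (;;) {
        uint64_t v = LT(shared);       /* (1) add the word to the read set          */
        if (!VALIDATE())               /* (2) conflict already detected: retry       */
            continue;
        ST(shared, v + 1);             /* (3) tentative update, hidden until commit  */
        if (COMMIT())                  /* (4) succeeds only if no conflict occurred  */
            return;
        /* commit failed: another transaction interfered; retry (backoff omitted) */
    }
}
```

A real implementation would also back off between retries; the paper's evaluation uses adaptive exponential backoff for this.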
Architectural support: an extension to the cache coherence protocol – any protocol that can detect access conflicts can also detect transaction conflicts. Current cache coherence mechanisms – snoopy cache (bus based), directory based (network based). Two separate caches: regular cache – direct mapped; transactional cache – small and fully associative
Cache line states: the regular cache uses the usual coherence states (INVALID, VALID, RESERVED, DIRTY); each transactional cache entry additionally carries one of four transactional tags – EMPTY (no data), NORMAL (committed data), XCOMMIT (discard on commit), XABORT (discard on abort)
Bus cycles: READ (shared read), RFO (read for ownership), WRITE (write back), T_READ (transactional read), T_RFO (transactional read for ownership), BUSY (refuse the requested cycle)
Snoopy cache working: memory responds to read cycles only if no cache responds, and responds to all write cycles. Each cache snoops the bus address lines and ignores cycles it is not interested in. Regular cache – on READ or T_READ, returns the value if the line is VALID; if RESERVED or DIRTY, returns the value and resets the state to VALID. On RFO or T_RFO, returns the value and invalidates the line.
Snoopy cache working: transactional cache – if TSTATUS is false, or the cycle is non-transactional (READ, RFO), it behaves like the regular cache, except that it ignores entries whose transactional tag is not NORMAL. If TSTATUS is true: a T_READ to a VALID entry returns the value; all other transactional cycles get a BUSY response. (Sketched below.)
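The snoop-response rules on the two slides above can be summarized as a small sketch. The enum names, encodings, and function shapes below are illustrative (the paper specifies the protocol in prose, not code); only the response rules themselves come from it.

```c
#include <stdbool.h>

/* Bus cycles, line states, and transactional tags as named in the paper;
 * the C encoding is illustrative. */
typedef enum { READ, RFO, T_READ, T_RFO } bus_cycle_t;
typedef enum { INVALID, VALID, RESERVED, DIRTY } line_state_t;   /* coherence state   */
typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } txn_tag_t;       /* transactional tag */
typedef enum { NO_RESPONSE, SUPPLY_DATA, BUSY_REPLY } snoop_reply_t;

/* Regular cache: READ/T_READ return the value of a VALID line; a RESERVED or
 * DIRTY line returns the value and drops to VALID; RFO/T_RFO return the value
 * and invalidate the line. */
snoop_reply_t regular_snoop(bus_cycle_t cycle, line_state_t *st)
{
    if (*st == INVALID)
        return NO_RESPONSE;
    if (cycle == READ || cycle == T_READ) {
        if (*st == RESERVED || *st == DIRTY)
            *st = VALID;
        return SUPPLY_DATA;
    }
    *st = INVALID;                       /* RFO / T_RFO: requester takes ownership */
    return SUPPLY_DATA;
}

/* Transactional cache: with TSTATUS false, or for non-transactional cycles,
 * behave like the regular cache but ignore entries not tagged NORMAL;
 * otherwise a T_READ of a VALID entry supplies the value, everything else is BUSY. */
snoop_reply_t transactional_snoop(bus_cycle_t cycle, txn_tag_t tag,
                                  bool tstatus, line_state_t *st)
{
    if (!tstatus || cycle == READ || cycle == RFO) {
        if (tag != NORMAL)
            return NO_RESPONSE;
        return regular_snoop(cycle, st);
    }
    if (cycle == T_READ && *st == VALID)
        return SUPPLY_DATA;
    return BUSY_REPLY;
}
```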
Working: transactional operations modify XABORT entries only. Each accessed item has two cached copies – one tagged XABORT (the tentative value) and one tagged XCOMMIT (the old value). On COMMIT (success): XCOMMIT entries become EMPTY and XABORT entries become NORMAL. On abort (failure): XABORT entries become EMPTY and XCOMMIT entries become NORMAL. (See the sketch below.)
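A tiny sketch of those tag transitions, using the same illustrative enum; only the transition rules themselves are from the paper.

```c
typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } txn_tag_t;

/* On commit the old values (XCOMMIT) are discarded and the tentative values
 * (XABORT) become the committed ones; on abort it is the other way around. */
txn_tag_t tag_after_commit(txn_tag_t t)
{
    if (t == XCOMMIT) return EMPTY;     /* backup copy no longer needed */
    if (t == XABORT)  return NORMAL;    /* tentative copy is kept       */
    return t;
}

txn_tag_t tag_after_abort(txn_tag_t t)
{
    if (t == XABORT)  return EMPTY;     /* tentative copy is dropped    */
    if (t == XCOMMIT) return NORMAL;    /* old copy is restored         */
    return t;
}
```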
Working Contd… Processor flags, managed internally by the processor: TACTIVE – indicates whether a transaction is active; TSTATUS – while a transaction is active (TACTIVE = true), TSTATUS true means the transaction is ongoing and TSTATUS false means it has been aborted.
Working Contd… Taking the previous example – operations LT, VALIDATE, ST, COMMIT. LT behaviour depends on what is already in the transactional cache (see the sketch below): XABORT entry present – simply return its value; NORMAL entry present – retag it as XABORT and create a second copy tagged XCOMMIT; otherwise – issue a T_READ bus cycle and set up two cached copies tagged XABORT and XCOMMIT; BUSY response – drop all XABORT entries, change XCOMMIT entries to NORMAL, and abort the transaction.
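A sketch of that LT lookup logic. The helpers (find_entry, retag_entry, allocate_copy, bus_t_read, abort_transaction) are hypothetical stand-ins for the transactional cache's internal machinery, not interfaces defined in the paper.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } txn_tag_t;

/* Hypothetical helpers modelling the transactional cache internals. */
bool find_entry(uint64_t addr, txn_tag_t tag, uint64_t *value);
void retag_entry(uint64_t addr, txn_tag_t from, txn_tag_t to);
void allocate_copy(uint64_t addr, uint64_t value, txn_tag_t tag);
bool bus_t_read(uint64_t addr, uint64_t *value);   /* false means a BUSY reply */
void abort_transaction(void);                      /* drop XABORT entries, set
                                                      XCOMMIT to NORMAL, TSTATUS = false */

/* LT: probe for an XABORT copy, then a NORMAL copy, else go to the bus. */
uint64_t load_transactional(uint64_t addr)
{
    uint64_t v;
    if (find_entry(addr, XABORT, &v))       /* already in the data set: done      */
        return v;
    if (find_entry(addr, NORMAL, &v)) {     /* committed copy present:            */
        retag_entry(addr, NORMAL, XABORT);  /*   reuse it as the tentative value  */
        allocate_copy(addr, v, XCOMMIT);    /*   and keep a backup for abort      */
        return v;
    }
    if (bus_t_read(addr, &v)) {             /* miss: issue a T_READ bus cycle     */
        allocate_copy(addr, v, XABORT);
        allocate_copy(addr, v, XCOMMIT);
        return v;
    }
    abort_transaction();                    /* BUSY response: transaction aborted */
    return 0;                               /* returned value is irrelevant now   */
}
```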
Working Contd… VALIDATE – returns the TSTATUS flag; if true, the transaction continues; if false, it sets TSTATUS = true and TACTIVE = false (the aborted transaction is cleaned up). ABORT – discards the tentative cache entries and resets the flags. COMMIT – returns the TSTATUS flag, makes the tentative entries permanent on success, and resets the flags.
Working Contd… Allocating a new entry in the transactional cache: replace an EMPTY entry first, then a NORMAL entry, then an XCOMMIT entry. Replacing an XCOMMIT entry requires writing its current data back to memory. The XCOMMIT state exists to improve performance by avoiding that write-back in the common case. (See the sketch below.)
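A sketch of that replacement priority. The line_t structure and victim-selection loop are illustrative; the EMPTY, then NORMAL, then XCOMMIT ordering and the write-back requirement are what the slide describes.

```c
#include <stdbool.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } txn_tag_t;

typedef struct {
    txn_tag_t tag;
    bool      dirty;   /* a dirty XCOMMIT victim holds the only committed copy */
} line_t;

/* Pick a victim: EMPTY first, then NORMAL, then XCOMMIT; XABORT lines are
 * never evicted while their transaction is running. */
int choose_victim(const line_t *set, int nways, bool *needs_writeback)
{
    static const txn_tag_t order[] = { EMPTY, NORMAL, XCOMMIT };
    for (int p = 0; p < 3; p++)
        for (int i = 0; i < nways; i++)
            if (set[i].tag == order[p]) {
                /* dirty victims (in particular XCOMMIT backups) must be
                 * written back to memory before the slot is reused */
                *needs_writeback = set[i].dirty;
                return i;
            }
    return -1;         /* only XABORT lines remain: no slot can be freed */
}
```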
Simulations: comparison against two software techniques (test-and-test-and-set locks with exponential backoff, software queues) and two hardware techniques (LL/SC with exponential backoff, hardware queues). Setup: Proteus simulator, 1 to 32 processors, simple benchmarks. A sketch of the TTS baseline follows.
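For reference, a minimal self-contained C11 sketch of one software baseline, test-and-test-and-set locking with exponential backoff. The backoff constants and the use of sched_yield are arbitrary illustrative choices, not taken from the paper.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <sched.h>

/* Test-and-test-and-set spin lock with exponential backoff: spin on plain
 * reads until the lock looks free, then attempt the atomic exchange; on
 * failure, wait a randomized, doubling delay to reduce bus contention. */
typedef struct { atomic_int locked; } tts_lock_t;     /* initialize locked to 0 */

static void tts_acquire(tts_lock_t *l)
{
    unsigned limit = 1;
    for (;;) {
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;                                          /* local spin, no bus writes */
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;                                    /* acquired the lock */
        unsigned delay = (unsigned)rand() % limit;     /* exponential backoff */
        for (unsigned i = 0; i < delay; i++)
            sched_yield();
        if (limit < (1u << 16))
            limit <<= 1;
    }
}

static void tts_release(tts_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```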
Counting Benchmark
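In the paper's counting benchmark, n processes share a single counter and together perform a fixed number of increments. A sketch of one worker's loop, reusing the hypothetical transactional intrinsics from the earlier sketch; LTX is used because the loaded value will certainly be written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical intrinsics for the paper's instructions (see the earlier sketch). */
uint64_t LTX(volatile uint64_t *addr);             /* transactional load, write intent */
void     ST(volatile uint64_t *addr, uint64_t v);  /* transactional store              */
bool     COMMIT(void);                             /* attempt to commit                */

volatile uint64_t shared_counter;

/* Each worker performs its share of the total increments transactionally. */
void count_worker(uint64_t my_increments)
{
    uint64_t done = 0;
    while (done < my_increments) {
        uint64_t v = LTX(&shared_counter);   /* read with exclusive ownership */
        ST(&shared_counter, v + 1);
        if (COMMIT())
            done++;
        /* on a failed commit, simply retry (backoff omitted for brevity) */
    }
}
```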
Producer/Consumer benchmark
Doubly linked list Benchmark
Positives: uses the same cache coherence and control protocols; the additional hardware support is required only at the primary cache; commit and abort are operations internal to the cache – they do not require communicating with other processors or writing data back to memory.
Extensions Transactional cache size – software overflow handling Additional primitives for faster updates Smaller data sets and shorter durations Adaptive backoff techniques or hardware queues Memory consistency models
Current Work – HTM: hardware transactional memory is dependent on cache coherence policies and the platform's architecture; unbounded transactional memory; migrating HTM state during process migration.
Current Work – Software Transactional Memory (STM): strong vs. weak isolation; eager vs. lazy updates; eager vs. lazy contention detection; contention managers; visible vs. invisible reads; privatization
Conclusions: lock-free synchronization avoids the issues of locking techniques; it is as easy and efficient as conventional locking; it is competitive with, and often outperforms, existing lock-based techniques; it builds on existing cache-coherence techniques to achieve this performance.
Opinion: a novel technique to avoid synchronization issues; requires only small hardware modifications; the proposed extensions are important to make it practically usable.
Why did it take so long for transactional memory to catch the limelight? The authors coined the term in 1993; the paper has been cited 1726 times (Google Scholar citation count). [Citation graph] Source: Maurice Herlihy: Transactional Memory Today. ICDCIT 2010: 1-12
The Gartner Hype cycle Source: Maurice Herlihy: Transactional Memory Today. ICDCIT 2010: 1-12
References: Maurice Herlihy: Transactional Memory Today. ICDCIT 2010: 1-12. Publications by the author: http://www.informatik.uni-trier.de/~ley/pers/hd/h/Herlihy:Maurice.html. Transactional Memory online web page: http://www.cs.wisc.edu/trans-memory/
Questions ???
Thank you