Scalable, Reliable, Power-Efficient Communication for Hardware Transactional Memory
Seth Pugsley, Manu Awasthi, Niti Madan, Naveen Muralimanohar and Rajeev Balasubramonian
School of Computing

Introduction
Multi-cores have established themselves as the de facto architecture of current and upcoming processor generations. To fully exploit the processing power these cores supply, methods to exploit concurrency must be devised at both the hardware and the system-software levels, and these methods must be low-overhead and scalable. Hardware Transactional Memory (HTM) systems proposed for this purpose have proven effective but not scalable. Much of their overhead lies in the commit process, and it could be avoided by the novel commit algorithms we propose in this work. Specifically, we propose improvements to Stanford's Scalable-TCC model [2], one of the frontrunners among HTM systems, that reduce commit overheads and increase scalability.

Stanford's Scalable-TCC
Lazy versioning: changes are made locally; the "master copy" is updated only when a transaction commits successfully.
Lazy conflict detection: a transaction checks for conflicts with other transactions only after it has finished all its "work".
Quick aborts, since uncommitted changes exist only locally.
Commit is slow: it is the bottleneck.

TCC Commit Process
1. A centralized TID (Transaction ID) vendor grants an ID to each transaction, serializing the commit order.
2. Probe the write-set directories: check whether each write-set directory is done serving older committing transactions.
3. Send Skip messages to the directories not in the write set.
4. Send Mark messages to propagate write updates.
5. Probe the read-set directories for conflict detection.
6. If the read check passes, send a final Commit message to all directories in the commit set.
7. Make the changes permanent.
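To make the scaling issue concrete, here is a minimal Python sketch that counts the messages a single commit generates under the TCC commit process above. It is a simplified model, not Scalable-TCC's actual message accounting: it assumes one directory slice per core, one message per directory per step (real Mark traffic is per cache line), a two-message TID request/grant, no retries, and it treats the commit set as the union of the read-set and write-set directories. The names `Transaction` and `scalable_tcc_commit_messages` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    read_dirs: frozenset   # IDs of directories holding lines this transaction read
    write_dirs: frozenset  # IDs of directories holding lines this transaction wrote


def scalable_tcc_commit_messages(tx: Transaction, num_dirs: int) -> dict:
    """Count commit messages under the simplifying assumptions stated above."""
    commit_set = tx.read_dirs | tx.write_dirs   # assumption: commit set = union of read and write sets
    counts = {
        "tid": 2,                               # request + grant from the central TID vendor
        "probe_write": len(tx.write_dirs),      # is each write-set directory done with older commits?
        "skip": num_dirs - len(tx.write_dirs),  # one Skip to every directory NOT in the write set
        "mark": len(tx.write_dirs),             # propagate write updates (one per directory, simplified)
        "probe_read": len(tx.read_dirs),        # conflict detection at the read-set directories
        "commit": len(commit_set),              # final Commit to every directory in the commit set
    }
    counts["total"] = sum(counts.values())
    return counts


# A small transaction: reads directories 0-2, writes directory 1.
tx = Transaction(read_dirs=frozenset({0, 1, 2}), write_dirs=frozenset({1}))
for cores in (16, 64, 256):
    # Assumption: one directory slice per core, so num_dirs == cores.
    print(cores, scalable_tcc_commit_messages(tx, cores)["total"])
```

For this transaction the model gives a total of cores + 9 messages (25, 73 and 265 for 16, 64 and 256 cores). The transaction itself never changes; all of the growth comes from Skip messages, which is the dependence on core count listed under the Scalable-TCC issues.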
Optimizations to SEQ
SEQ-PRO (Parallel Readers Optimization): parallel reads to a directory do not cause conflicts, so multiple transactions that only read a directory may occupy it in parallel.
SEQ-TS (Timestamp Optimization): transactions carry timestamps, and an "older" transaction can steal occupied directories from a younger transaction.
Directories can be occupied in parallel, improving performance.

Results

References
[1] S. Pugsley, M. Awasthi, N. Madan, N. Muralimanohar and R. Balasubramonian, "Scalable, Reliable, Power-Efficient Communication for Hardware Transactional Memory," School of Computing Technical Report UUCS-08-001, Jan. 2008.
[2] Chafi et al., "A Scalable, Non-blocking Approach to Transactional Memory," HPCA 2007.
Proposed Commit Algorithm (Sequential Commit - SEQ)
Each directory has an "Occupied" bit.
A committing transaction "occupies" all of its commit directories sequentially.
A request to occupy an already occupied directory is either buffered or NACKed.
When all of its directories are occupied, the transaction sends its updates and commits, possibly aborting conflicting transactions.
Occupying directories in sequential order makes the algorithm deadlock-free.
The number of network messages is reduced.
The algorithm is scalable: no centralized agent is involved.

Issues in Scalable-TCC
The centralized TID vendor becomes a bottleneck as the number of cores grows.
A large number of on-chip network messages are exchanged, so bandwidth and power requirements are high.
The number of Skip messages is a function of the number of cores in the system.
Commit delays become a bottleneck if most transactions are relatively short.

Scalable-TCC Background
Hardware Transactional Memory (HTM) overview: a new paradigm to simplify parallel programming.
Instead of lock/unlock, the programmer uses transaction Begin and End.
Transactions can yield better performance and eliminate deadlocks.
The programmer can freely wrap code sections in transactions without worrying about the impact on performance and correctness.
The programmer specifies the code sections that should execute atomically; the hardware takes care of the rest (it provides the illusion of atomicity).
HTM systems are usually classified by their choice of data versioning and conflict detection mechanisms.

Commit delays are reduced by 7x compared to Scalable-TCC.
The number of network messages is reduced by up to 48x.
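The SEQ commit algorithm and its SEQ-PRO optimization, described earlier, can be sketched in Python as follows. This is a single-threaded illustration under stated assumptions, not the authors' implementation: the sequential order is assumed to be ascending directory ID, occupancy is tracked per directory as an owner plus a reader set (rather than a single bit) so that SEQ-PRO readers can share, NACKed requests are simply retried later, and SEQ-TS stealing is not modeled. `Directory`, `CommitAttempt`, `step`, `commit` and `run` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical directory state: one exclusive occupant, plus shared readers for SEQ-PRO."""
    owner: int | None = None
    readers: set[int] = field(default_factory=set)

    def try_occupy(self, tx: int, read_only: bool, pro: bool) -> bool:
        """Return True if `tx` now occupies this directory, False if the request is NACKed."""
        if self.owner not in (None, tx):
            return False              # already occupied by another committing transaction
        if pro and read_only:
            self.readers.add(tx)      # SEQ-PRO: parallel readers do not conflict
            return True
        if self.readers - {tx}:
            return False              # exclusive occupancy must wait for current readers
        self.owner = tx
        return True

    def release(self, tx: int) -> None:
        if self.owner == tx:
            self.owner = None
        self.readers.discard(tx)


@dataclass
class CommitAttempt:
    tx: int
    reads: frozenset[int]
    writes: frozenset[int]
    next_index: int = 0               # how many commit directories are already occupied

    def commit_dirs(self) -> list[int]:
        # Assumption: the sequential order is ascending directory ID.
        return sorted(self.reads | self.writes)


def step(attempt: CommitAttempt, dirs: dict[int, Directory], pro: bool) -> bool:
    """Occupy directories in order; return True once every commit directory is held."""
    order = attempt.commit_dirs()
    while attempt.next_index < len(order):
        d = order[attempt.next_index]
        if not dirs[d].try_occupy(attempt.tx, read_only=d not in attempt.writes, pro=pro):
            return False              # NACKed: keep what is held and retry this directory later
        attempt.next_index += 1
    return True                       # all occupied: the transaction may send updates and commit


def commit(attempt: CommitAttempt, dirs: dict[int, Directory]) -> None:
    for d in attempt.commit_dirs():
        dirs[d].release(attempt.tx)   # commit done: free every occupied directory


def run(pro: bool) -> tuple[bool, bool]:
    # Transactions 1 (A) and 2 (B) both read directory 0; A writes 1, B writes 2.
    dirs = {i: Directory() for i in range(3)}
    a = CommitAttempt(tx=1, reads=frozenset({0}), writes=frozenset({1}))
    b = CommitAttempt(tx=2, reads=frozenset({0}), writes=frozenset({2}))
    return step(a, dirs, pro), step(b, dirs, pro)


print(run(pro=False))  # (True, False): B is NACKed at shared directory 0
print(run(pro=True))   # (True, True): A and B both occupy directory 0 as readers
```

Without SEQ-PRO, B is NACKed at directory 0 and succeeds only on a retry after A's commit releases it; with SEQ-PRO, both read-only occupancies of directory 0 coexist. In this sketch a transaction waiting on directory d holds only directories ordered before d, so waits always point toward higher directory IDs and cannot form a cycle, which mirrors the poster's sequential-order argument for deadlock freedom.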