LogTM-SE: Decoupling Hardware Transactional Memory from Caches

Slides:



Advertisements
Similar presentations
Copyright 2008 Sun Microsystems, Inc Better Expressiveness for HTM using Split Hardware Transactions Yossi Lev Brown University & Sun Microsystems Laboratories.
Advertisements

Virtual Hierarchies to Support Server Consolidation Michael Marty and Mark Hill University of Wisconsin - Madison.
Coherence Ordering for Ring-based Chip Multiprocessors Mike Marty and Mark D. Hill University of Wisconsin-Madison.
1 Lecture 18: Transactional Memories II Papers: LogTM: Log-Based Transactional Memory, HPCA’06, Wisconsin LogTM-SE: Decoupling Hardware Transactional Memory.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
Monitoring Data Structures Using Hardware Transactional Memory Shakeel Butt 1, Vinod Ganapathy 1, Arati Baliga 2 and Mihai Christodorescu 3 1 Rutgers University,
© 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison Supporting Nested Transactional Memory in LogTM Michelle J. Moravan, Jayaram Bobba, Kevin E. Moore,
Transactional Memory Supporting Large Transactions Anvesh Komuravelli Abe Othman Kanat Tangwongsan Hardware-based.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Rich Transactions on Reasonable Hardware J. Eliot B. Moss Univ. of Massachusetts,
EECS 470 Virtual Memory Lecture 15. Why Use Virtual Memory? Decouples size of physical memory from programmer visible virtual memory Provides a convenient.
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
Variability in Architectural Simulations of Multi-threaded Workloads Alaa R. Alameldeen and David A. Wood University of Wisconsin-Madison
1 Lecture 24: Transactional Memory Topics: transactional memory implementations.
Supporting Nested Transactional Memory in LogTM Authors Michelle J Moravan Mark Hill Jayaram Bobba Ben Liblit Kevin Moore Michael Swift Luke Yen David.
Unbounded Transactional Memory Paper by Ananian et al. of MIT CSAIL Presented by Daniel.
LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood Presented by Colleen Lewis.
Interactions Between Compression and Prefetching in Chip Multiprocessors Alaa R. Alameldeen* David A. Wood Intel CorporationUniversity of Wisconsin-Madison.
Sutirtha Sanyal (Barcelona Supercomputing Center, Barcelona) Accelerating Hardware Transactional Memory (HTM) with Dynamic Filtering of Privatized Data.
EazyHTM: Eager-Lazy Hardware Transactional Memory Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris,
(C) 2003 Daniel SorinDuke Architecture Dynamic Verification of End-to-End Multiprocessor Invariants Daniel J. Sorin 1, Mark D. Hill 2, David A. Wood 2.
Rerun: Exploiting Episodes for Lightweight Memory Race Recording Derek R. Hower and Mark D. Hill Computer systems complex – more so with multicore What.
Hybrid Transactional Memory Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen, Intel Labs University of Michigan Intel Labs.
Virtual Hierarchies to Support Server Consolidation Mike Marty Mark Hill University of Wisconsin-Madison ISCA 2007.
Implementing Signatures for Transactional Memory Daniel Sanchez, Luke Yen, Mark Hill, Karu Sankaralingam University of Wisconsin-Madison.
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood Presented by: Eduardo Cuervo.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Demand Paging.
StealthTest: Low Overhead Online Software Testing Using Transactional Memory Jayaram Bobba, Weiwei Xiong*, Luke Yen †, Mark D. Hill, and David A. Wood.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
© 2008 Multifacet ProjectUniversity of Wisconsin-Madison Pathological Interaction of Locks with Transactional Memory Haris Volos, Neelam Goyal, Michael.
Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transnational Memory Qi Zhu CSE 340, Spring 2008 University of Connecticut Paper.
HARD: Hardware-Assisted lockset- based Race Detection P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07 Shimin Chen LBA Reading Group Presentation.
© 2006 Mulitfacet ProjectUniversity of Wisconsin-Madison LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark.
On Transactional Memory, Spinlocks and Database Transactions Khai Q. Tran Spyros Blanas Jeffrey F. Naughton (University of Wisconsin Madison)
Architectural Features of Transactional Memory Designs for an Operating System Chris Rossbach, Hany Ramadan, Don Porter Advanced Computer Architecture.
Translation Lookaside Buffer
Maurice Herlihy and J. Eliot B. Moss,  ISCA '93
Rerun: Exploiting Episodes for Lightweight Memory Race Recording
ECE232: Hardware Organization and Design
Speculative Lock Elision
18-447: Computer Architecture Lecture 23: Caches
ASR: Adaptive Selective Replication for CMP Caches
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Transactional Memory : Hardware Proposals Overview
LogSI-HTM: Log Based Snapshot Isolation in
Multilevel Memories (Improving performance using alittle “cash”)
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Hardware Transactional Memory
Log-Based Transactional Memory
Energy-Efficient Address Translation
What we need to be able to count to tune programs
Two Ideas of This Paper Using Permissions-only Cache to deduce the rate at which less-efficient overflow handling mechanisms are invoked. When the overflow.
Lecture 19: Transactional Memories III
Reducing Memory Reference Energy with Opportunistic Virtual Caching
Lecture 2: Snooping-Based Coherence
Log-based Transactional Memory
Arrvindh Shriraman, Michael F. Spear, Hemayet Hossain, Virendra J
Lecture 6: Transactions
Transactional Memory An Overview of Hardware Alternatives
Improving Multiple-CMP Systems with Token Coherence
Hybrid Transactional Memory
TokenTM: Token-Based Hardware Transactional Memory
Performance Pathologies in Hardware Transactional Memory
Performance Pathologies in Hardware Transactional Memory
Lecture 23: Transactional Memory
Virtual Memory Lecture notes from MKP and S. Yalamanchili.
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Border Control: Sandboxing Accelerators
Presentation transcript:

LogTM-SE: Decoupling Hardware Transactional Memory from Caches Luke Yen, Jayaram Bobba, Michael R. Marty, Kevin E. Moore, Haris Volos, Mark D. Hill, Michael M. Swift, and David A. Wood Multifacet Project (www.cs.wisc.edu/multifacet) Computer Sciences Dept., Univ. of Wisconsin-Madison © 2007 Multifacet Project University of Wisconsin-Madison

Executive Summary Hardware Transactional Memory (HTM) Fast HW handles old/new versions (e.g., write buffer) HW handles conflict detection (R/W bits & coherence) But Closely Coupled to L1 cache On critical paths & hard for SW to save/restore Our Approach: Decoupled, Simple HW, SW control LogTM Signature Edition (LogTM-SE) HW: LogTM’s Log + Signatures (from Illinois Bulk) SW: Unbounded nesting, thread switching, & paging 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Outline Motivation LogTM Signature Edition (LogTM-SE) Hardware How HTMs accelerate TM HTM Issues Our Approach Logs for version management Signatures for conflict detection LogTM Signature Edition (LogTM-SE) Hardware LogTM-SE in Operation Conclusions and future work 2/24/2019 LogTM-SE Wisconsin Multifacet Project

How HTMs Accelerate TM Version Management Conflict Detection Save old values (for abort) & new values (for commit) Write buffer, cache incoherence, overflow structures Conflict Detection Record read/write sets & detect overlaps Read/Write (R/W) bits on cache blocks & coherence 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Some HTM Issues R/W bits in precious L1 cache design Replicate R/W bits for SMT? Replicate again for (bounded) nesting? Save/restore R/W for thread switch? Modify for paging (virtual page moves)? 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Our Approach Decoupled: Decouple HTM state from L1 caches Simple HW: Keep HW state simple & SW accessible SW Control: Have SW manage rare, complex events (Apply classic “systems” principles to HTMs) 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Decoupling R / W bits from Cache BEFORE: AFTER: Registers R W R W Write Read SMT Thread Context R W Tag Data Data Caches 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Version Management Save old values (for abort) & new values (for commit) Use LogTM’s Log Before writing new values into memory, HW writes old values (and their virtual addresses) into log Allocated per-thread in virtual memory (like pthread’s stacks) Log exposed to SW & not tied to a processor Why? Decoupled, Simple HW, SW Control 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Version Management Processor Hardware Registers Register Checkpoint LogFrame TMcount LogPtr SMT Thread Context Tag Data Data Caches 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Conflict Detection Record read/write sets & detect overlaps Adapt Signatures from Bulk [Ceze et al., ISCA 2006] Over-approximate read/write sets in per-proc. Bloom Filters Check signatures on coherence events (unlike Bulk) Replicate for SMT Exposed to SW for nesting, thread switching, etc. Why? Decoupled, Simple HW, SW Control 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Signature Operation Program: xbegin LD A ST B LD C LD D ST C … External ST E External ST F B C A D FALSE POSITIVE: CONFLICT! Hash Function(s) NO CONFLICT ALIAS R 00000000 00100100 00100100 00100100 00000100 00100100 W 00100010 00000010 00100010 00100010 00000000 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Outline Motivation LogTM Signature Edition (LogTM-SE) Hardware Single-CMP system Processor hardware Experimental Results LogTM-SE in Operation Conclusions and future work 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Single-CMP System L1 $ Core1 L1$ Core2 L1$ Core14 L1$ Core15 L1$ … Interconnect L2 $ DRAM 2/24/2019 LogTM-SE Wisconsin Multifacet Project

LogTM-SE Processor Hardware LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 LogTM-SE Processor Hardware Segmented log, like LogTM Track R / W sets with R / W signatures Over-approximate R / W sets Tracks physical addresses Summary signature used for virtualization Conflict detection by coherence protocol Check signatures on every memory access for SMT Registers Register Checkpoint LogFrame TMcount Read LogPtr Write SummaryRead SummaryWrite SMT Thread Context Tag Data NO TM STATE Data Caches 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Experimental Methodology Infrastructure Virtutech Simics full-system simulation Wisconsin GEMS timing modules System 32 transactional threads (16 cores x 2 SMT threads/core) 32kB 4-way L1 I and D, 64-byte blocks, 1cycle latency 8MB 8-way unified L2, 34 cycle latency L2 directory for coherence, maintains full sharer bit vector Workloads Radiosity, Raytrace, Mp3d, Cholesky Berkeley DB 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Lock Results 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Perfect Signature Results Perfect signatures similar or better than Locks 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Realistic Signature Results Realistic Signatures similar to Perfect Signatures and Locks For our workloads, false positives are not a problem 2/24/2019 LogTM-SE Wisconsin Multifacet Project

What about scalability? Bigger system Bigger transactions False positives are a function of: Transaction size Transactional duty cycle Number of concurrent transactional threads Filtering due to on-chip directory protocol Signatures gracefully degrade to serialization 2/24/2019 LogTM-SE Wisconsin Multifacet Project

HTM Issues (Solutions) R/W bits in precious L1 cache design R/W signatures: out of L1 cache & software-accessible Replicate R/W bits for SMT? Easier to replicate signatures for SMT, but false positives Replicate again for (bounded) nesting? Save/restore R/W for thread switch? Modify for paging (virtual page moves)? 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Outline Motivation LogTM Signature Edition (LogTM-SE) Hardware LogTM-SE in Operation Unbounded Nested Transactions Thread Switching (Paging) Conclusions and future work 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Unbounded Nesting Support LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Unbounded Nesting Support Why? Composability: libraries Software Constructs: Retry, OrElse [Harris, PPoPP ‘05] What? Signatures for each nesting level How? One R / W signature set per SMT thread Save / Restore signatures using Transaction Log 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Nested Begin Program Processor State Transaction Log xbegin LD … ST … 00000000 01001000 01001000 R 01010010 W 00000000 01010010 Xact header Undo entry Undo entry TMCount 1 Undo entry Log Frame Xact header Log Ptr 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Nested Begin xbegin LD … ST … Program Processor State Transaction Log LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Nested Begin Program Processor State Transaction Log xbegin LD … ST … 01001000 R W 01010010 Xact header Undo entry Undo entry TMCount 2 Undo entry Log Frame Xact header 01001000 01010010 Log Ptr 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project 24

Partial Abort xbegin LD … ST … Program Processor State Transaction Log LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Partial Abort Program Processor State Transaction Log xbegin LD … ST … ABORT! 01001000 01001001 R W 01110110 01010010 Xact header Undo entry Undo entry TMCount 2 1 Undo entry Log Frame Xact header 01001000 01010010 Log Ptr Undo entry Undo entry 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project 25

Nested Commit xbegin LD … ST … xend Program Processor State LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Nested Commit Program Processor State Transaction Log xbegin LD … ST … xend 01001001 01001000 R W 01110110 01010010 Xact header Undo entry Undo entry TMCount 1 2 Undo entry Log Frame Xact header 01001000 01010010 Log Ptr Undo entry Undo entry 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project 26

Unbounded Nesting Support Summary LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Unbounded Nesting Support Summary Closed nesting: Begin: save signatures Abort: restore signatures Commit: No signature action Open nesting: Commit: restore signatures 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Thread Switching Support LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Thread Switching Support Why? Support long-running transactions What? Conflict Detection for descheduled transactions How? Summary Read / Write signatures: If thread t of process P is scheduled to use an active signature, the corresponding summary signature holds the union of the saved signatures from all descheduled threads from process P. Updated using TLB-shootdown-like mechanism 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Handling Thread Switching LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Handling Thread Switching Summary 00000000 W R Summary 00000000 W R Summary 00000000 W R Summary 00000000 W R OS T2 T3 T1 W 00000000 Summary R 00000000 W 01001000 W 0100000 W 00000000 W 0100000 R 01010010 R 01000010 R 00000000 R 01010010 P1 P2 P3 P4 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Handling Thread Switching LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Handling Thread Switching W 01001000 00000000 OS Summary R 01010010 00000000 Deschedule T2 T3 T1 Summary 00000000 W R Summary 00000000 W R Summary 00000000 W R W 00000000 Summary R 00000000 W 01001000 W 0100000 W 00000000 W 0100000 01001000 R 01010010 R 01000010 R 00000000 R 01010010 01010010 P1 P2 P3 P4 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Handling Thread Switching LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Handling Thread Switching W 01001000 Summary 01001000 01010010 W R Summary 01001000 01010010 W R OS Summary R 01010010 Deschedule T2 T3 T1 Summary 00000000 W R Summary 00000000 W R Summary 00000000 W R W 00000000 Summary R 00000000 W 01001000 W 0100000 W 00000000 W 0100000 R 01010010 R 01000010 R 00000000 R 01010010 P1 P2 P3 P4 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Handling Thread Switching LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Handling Thread Switching W 01001000 OS Summary R 01010010 T1 T2 T3 W Summary 01001000 01010010 W R Summary 01001000 01010010 W R W 00000000 00000000 Summary R Summary R 00000000 00000000 W 00000000 W 0100000 W 00000000 W 0100000 R 00000000 R 01000010 R 00000000 R 01010010 P1 P2 P3 P4 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Thread Switching Support Summary LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Thread Switching Support Summary Summary Read / Write signatures Summarizes descheduled threads with active transactions One OS structure per process Check summary signature on every memory access Updated on transaction deschedule Similar to TLB shootdown Coherence 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Paging Support Summary LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Paging Support Summary Problem: Changing page frames Need to maintain isolation on transactional blocks Solution: On Page-Out: Save Virtual -> Physical mapping On Page-In: If different page frame, update signatures with physical address of transactional blocks in new page frame. 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

HTM Issues (Solutions) R/W signatures: out of L1 & software-accessible Replicate signatures for SMT, but false positives Replicate again for (bounded) nesting? Signatures saved/restored on log Save/restore R/W for thread switch? Signatures saved & distributed in summary signatures Modify for paging (virtual page moves)? (Physical) signatures updated on page-in 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Future Work Explore scalability of signatures Modify OS for LogTM-SE Bigger system Bigger transactions Modify OS for LogTM-SE Optimize OS-support for Reduced Overheads 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Executive Summary Hardware Transactional Memory (HTM) Fast HW handles old/new versions (e.g., write buffer) HW handles conflict detection (R/W bits & coherence) But Closely Coupled to L1 cache On critical paths & hard for SW to save/restore LogTM Signature Edition (LogTM-SE) HW: LogTM’s Log + Signatures (from Illinois Bulk) SW: Unbounded nesting, thread switching, & paging Our Approach: Decoupled, Simple HW, SW control 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Google “Wisconsin GEMS” Detailed Processor Model Opal Simics Microbenchmarks Random Tester Deterministic Contended locks Trace flie Works w/ Simics (free to academics)  commercial OS/apps SPARC out-of-order, x86 in-order, CMPs, SMPs, & LogTM GPL release of GEMS used in four HPCA 2007 papers 2/24/2019 LogTM-SE Wisconsin Multifacet Project

GEMS News Version 1.4 available Friday: LogTM bugfixes MESI Single-CMP protocol Fixes for GCC 4.X Compilation See www.cs.wisc.edu/gems GEMS 2.0: A future major LogTM release will include: Single-CMP TM protocol Signature support Software abort handler 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Backup 2/24/2019 LogTM-SE Wisconsin Multifacet Project

HTM Virtualization Mechanisms Before Virtualization After Virtualization $Miss Commit Abort $Eviction Paging Thread Switch UTM - H HC VTM S SC SWV UnrestrictedTM A B AS XTM XTM-g ASC SCV PTM-Copy PTM-Select LogTM-SE Shaded = virtualization event - = handled in simple HW H = complex hardware S = handled in software A = abort transaction C = copy values W = walk cache V = validate read set B = block other transactions 2/24/2019 LogTM-SE Wisconsin Multifacet Project

LogTM Overview [HPCA ’06] LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 LogTM Overview [HPCA ’06] Version Management Keep old values for abort AND new values for commit LogTM: Eager: record old values “elsewhere”; update “in place” Other HTMs: Lazy: update “elsewhere”; keep old values “in place” Conflict Detection Find read-write, write-read or write-write conflicts among concurrent transactions Eager: detect conflict on every read/write Lazy: detect conflict at end (commit/abort)  Fast Commit  Allows Stalling 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project

Benchmark Details Benchmark Input Unit of Work Units Measured BerkeleyDB 1000 words 1 database read 128 Cholesky Tk14.O Factorization 1 Radiosity batch 1 task 64 Raytrace teapot Whole parallel phase Mp3d 128 molecules 1 step 1024 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Signature Implementations Results assume 2048-bit signatures, and operates on physical addresses 2/24/2019 LogTM-SE Wisconsin Multifacet Project

Paging Support Summary LogTM-SE: Log-based Transactional Memory: Signature Edition 10/25/06 Paging Support Summary At Page Out, Remember VP->PP At Page In, if (VP->PP’) for each thread t in process if t is in transaction: Update signatures of t 2/24/2019 LogTM-SE Wisconsin Multifacet Project © 2007 Mulitfacet Project