Safe and Efficient Supervised Memory Systems

1) Out-of-band metadata per data block
2) Monitor, control (supervise) data accesses
3) Run handlers on specific metadata states

Jayaram Bobba†, Marc Lupon‡, Mark D. Hill, and David A. Wood
Department of Computer Sciences, University of Wisconsin-Madison
†Intel Corporation ‡Universitat Politècnica de Catalunya
†‡ Work done while at University of Wisconsin-Madison
Why Supervised Memory Systems?
  HW more powerful, SW more complex: Productivity Wall
  Hardware Support to Improve Productivity
  Supervised (Memory) Systems:
    Empty/full-bits
    Hardware TM
    MemTracker, SafeMem, iWatcher
    Deterministic Shared Memory
    Hardware-assisted GC
    Information Flow Tracking
2/15/2011 Wisconsin Multifacet Project
Executive Summary
  Current/Future Supervised Systems
    Many supervised memory systems
    Assume SC, but few systems do SC
    Moving to TSO (x86 & SPARC) is non-trivial
  Supervised Memory for TSO
    TSOall: TSO for data & metadata (slow)
    TSOdata: TSO for data only (fast but tricky)
  Safe Supervision
    Metadata for X only controls data at X
    Fast & Simple
    Formal Foundation
Outline
  Introduction
  Move To TSO non-trivial
    Case Study: Deterministic Multiprocessor (DMP)
  Supervised Memory for TSO
  Safe Supervision
  Evaluation
Reordering can be incorrect: A TSO-compliant system
  [Figure: a processor with registers r1-r3 and a store buffer in front of memory blocks A, B, C, each with a data and a metadata column. Instruction stream: ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3. LD B completes while ST A is still in the store buffer; LD C takes its value from the buffered store to C.]
9/20/2018 Wisconsin Multifacet Project
Reordering can be incorrect: DMP-ShTab [Devietti et al., ASPLOS 09]
  [Figure: processors P1 and P2 run threads T1 and T2 through Private and Shared phases. Instructions shown: LD [X], r1; LD [B], r2; ST 1, [B]; LD [Y], r2; ST r2, [Y]; ST 2, [A]; LD [B], r3. Memory blocks A, B, X, Y carry ownership metadata (Owned@T1, Owned@T2, Shared@T1,T2) that changes as the threads run; STALL markers appear on both threads.]
Explore relaxed supervised systems: Is reordering safe? A Case Study: DMP-ShTab on TSO
  [Figure: the DMP-ShTab example with a store buffer added to the processor; ST 0x10, A sits in the store buffer. Instructions shown: LD [X], r1; LD [B], r2; ST 1, [B]; LD [Y], r2; ST r2, [Y]; ST 2, [A]; LD [B], r3.]
  Case 1: LD B does not pass ST A. r3 gets 0x01.
  Case 2: LD B passes ST A. r3 gets 0x00.
  Reordering can be incorrect.
Outline
  Introduction
  Move To TSO non-trivial
  Supervised Memory for TSO
    Define Supervised Memory
    TSOall: Simple but Slow
    TSOdata: Fast but tricky
  Safe Supervision
  Evaluation
Supervised Memory for TSO: Define Supervised Memory
  Supervised Memory
    Each memory location A has data (A.d) and metadata (A.m)
  New operations
    Supervised Load (sLD A)
    Supervised Store (sST A)
    Jump on reading special metadata
    (Optionally) Hardware exception
Supervised Memory for TSO: Supervised Operations
  sLD A =>
  Start:   atomic {
             curm  = Val[R A.m]                      // Read metadata
             nextm = NEXT(Load, curm)                // Check software-specified FSM
             if (nextm == EXCEPTION) then jump to Handler
             if (nextm != curm) then W A.m, nextm    // Update metadata
             R A.d                                   // Read data
           }
  Handler: …
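The sLD steps above can be sketched in software. This is a minimal illustration, not the hardware design: the FSM table is a hypothetical empty/full-bit policy, the names Location, s_ld, s_st and SupervisionException are invented for the sketch, and s_st mirrors the access sequence [Rd A.m, Wr A.d, Wr A.m] listed for sST on the TSOall slide. Atomicity is assumed rather than enforced.

```python
from dataclasses import dataclass

EXCEPTION = "EXCEPTION"

# Hypothetical software-specified FSM (empty/full-bit style):
# (operation, current metadata) -> next metadata.
# Any pair not listed maps to EXCEPTION.
FSM = {
    ("Load", "Full"): "Empty",
    ("Store", "Empty"): "Full",
}


def next_state(op, curm):
    """NEXT(op, curm) from the slide, backed by the table above."""
    return FSM.get((op, curm), EXCEPTION)


class SupervisionException(Exception):
    """Stands in for the jump to Handler."""


@dataclass
class Location:
    d: int = 0        # data (A.d)
    m: str = "Empty"  # metadata (A.m)


def s_ld(loc):
    """Supervised load; the body is assumed to execute atomically."""
    curm = loc.m                         # Rd A.m
    nextm = next_state("Load", curm)     # consult the FSM
    if nextm == EXCEPTION:
        raise SupervisionException(f"load in state {curm}")
    if nextm != curm:
        loc.m = nextm                    # Wr A.m
    return loc.d                         # Rd A.d


def s_st(loc, value):
    """Supervised store, assumed to follow [Rd A.m, Wr A.d, Wr A.m]."""
    curm = loc.m                         # Rd A.m
    nextm = next_state("Store", curm)
    if nextm == EXCEPTION:
        raise SupervisionException(f"store in state {curm}")
    loc.d = value                        # Wr A.d
    if nextm != curm:
        loc.m = nextm                    # Wr A.m
```

With this policy, a store to an Empty location fills it, a load from a Full location empties it, and a load from an Empty location raises SupervisionException in place of the handler jump.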
Supervised Memory for TSO: TSO Axioms [Hangal et al., ISCA 2004]

  Axiom           Description
  Order           Total order on all write accesses
  Atomicity       No intervening accesses for atomic operations
  Termination     All write accesses eventually complete
  Value           Reads return latest value from memory or store buffer
  Memory Barrier  No reordering across a barrier
  ReadAny         Accesses cannot pass outstanding reads
  WriteWrite      Write accesses cannot pass outstanding writes

  Reordering axioms (ReadAny, WriteWrite):
    Rd A -> Rd B   ordered
    Rd A -> Wr B   ordered
    Wr A -> Wr B   ordered
    Wr A -> Rd B   may reorder (allows store buffers)
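The Value and WriteWrite axioms can be pictured with a toy store buffer. The sketch below is only an illustration under simplifying assumptions (one processor, a FIFO buffer, no barriers or atomics); the class StoreBufferCPU and its method names are invented here and the values are arbitrary.

```python
from collections import deque


class StoreBufferCPU:
    """Toy single-processor model of a TSO store buffer."""

    def __init__(self, memory):
        self.memory = memory    # shared dict: address -> value
        self.buffer = deque()   # FIFO of (address, value) pending writes

    def store(self, addr, value):
        # Writes enter the buffer and become globally visible later.
        self.buffer.append((addr, value))

    def load(self, addr):
        # Value axiom: return the latest buffered write to addr, if any,
        # otherwise the value in memory.
        for a, v in reversed(self.buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain_one(self):
        # WriteWrite: writes leave the buffer in program (FIFO) order.
        if self.buffer:
            addr, value = self.buffer.popleft()
            self.memory[addr] = value


memory = {"A": 0x00, "B": 0x01}
cpu = StoreBufferCPU(memory)
cpu.store("A", 0x10)          # Wr A is buffered
b_seen = cpu.load("B")        # Rd B completes while Wr A is outstanding
a_in_memory = memory["A"]     # other processors still see the old A
a_forwarded = cpu.load("A")   # the issuing processor sees its own write
```

Here the load of B passes the outstanding store to A, which is the one reordering the axioms permit, while the issuing processor still observes its own store through forwarding.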
Supervised Memory for TSO: TSOall, A Consistency Model for Supervised Memory
  TSO axioms applied to all accesses, data and metadata
    + (Simple) Like TSO
    - (Slow) Prohibits optimizations
  Thread:
    sST A -> [Rd A.m, Wr A.d, Wr A.m]
    sLD B -> [Rd B.m, Rd B.d]
  => Store buffers ineffective
Supervised Memory for TSO: TSOdata, Fast Yet Simple

  Axiom           Description
  Order           Total order on all write accesses
  Atomicity       No intervening accesses for atomic operations
  Termination     All write accesses eventually complete
  Value           Reads return latest value from memory or store buffer
  Memory Barrier  No reordering across a barrier
  ReadAny         Data accesses cannot pass outstanding data reads
  WriteWrite      Data writes cannot pass outstanding data writes

  ReadAny and WriteWrite are the reordering axioms; in TSOdata they constrain data accesses only.
  Thread:
    sST A -> [Rd A.m, Wr A.d, Wr A.m]
    sLD B -> [Rd B.m, Rd B.d]
  => Store buffers can be used
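The difference between the two sets of reordering axioms can be written as a small predicate. This sketch covers ReadAny and WriteWrite only; Atomicity, Value, barriers and the other axioms are deliberately not modelled, and the names Access, must_stay_ordered and ordered_pairs are illustrative.

```python
from typing import NamedTuple


class Access(NamedTuple):
    kind: str  # "Rd" or "Wr"
    loc: str   # location name, e.g. "A"
    part: str  # "d" for data, "m" for metadata


def must_stay_ordered(model, earlier, later):
    """True if `later` may not pass `earlier` under the reordering axioms."""
    if model == "TSOdata" and (earlier.part != "d" or later.part != "d"):
        # TSOdata applies ReadAny/WriteWrite to data accesses only.
        return False
    if earlier.kind == "Rd":
        return True             # ReadAny
    return later.kind == "Wr"   # WriteWrite; Wr followed by Rd may reorder


# The thread from the slides: sST A followed by sLD B.
S_ST_A = [Access("Rd", "A", "m"), Access("Wr", "A", "d"), Access("Wr", "A", "m")]
S_LD_B = [Access("Rd", "B", "m"), Access("Rd", "B", "d")]


def ordered_pairs(model):
    """Cross-operation pairs that the reordering axioms keep in order."""
    return [(e, l) for e in S_ST_A for l in S_LD_B
            if must_stay_ordered(model, e, l)]
```

Under TSOall the metadata read Rd A.m orders both accesses of sLD B behind it; under TSOdata no cross-operation pair is constrained by these two axioms, which is where the extra freedom comes from.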
Outline
  Introduction
  Move To TSO non-trivial
  Supervised Memory for TSO
  Safe Supervision
  Evaluation
Safe Supervision: Motivation
  No reordering, easy to reason about (TSOall)
  vs.
  Reordering, performance (TSOdata)
Safe Supervision: Blast from the Past [Adve and Hill, ISCA 1990]
  No reordering, easy to reason about (SC) vs. reordering, performance (RC)
  Observation: Simple programs rely only on certain SC orders
    Ignore non-essential orders; execution still appears as SC
  Challenge: What is "simple"? Which orders are non-essential?
  Solution: Data-race-freedom
    For data-race-free programs, RC = SC
Safe Supervision: Motivation
  No reordering, easy to reason about (TSOall) vs. reordering, performance (TSOdata)
  Observation: Simple supervised programs rely only on certain orders
    Ignore non-essential orders; execution still appears as TSOall
  Challenge: What is "simple"? Which orders are non-essential?
  Solution: Safe Supervision
    For safely-supervised programs, TSOdata = TSOall
Safe Supervision
  A location’s metadata is only used to control access to that location’s data
  Most uses of supervision are safely supervised, e.g.:
    Heap Checker: Initialized/Uninitialized values
    Transactional Memory: Conflict Detection information
  DMP is NOT safely-supervised
  Initially, A.mdata = Empty, B.data = 0
    Thread 1: B.data = 1; A.mdata = Full
    Thread 2: while (A.mdata == Empty); Read B.data
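The two-thread example above uses A's metadata to guard B's data, so it depends on an order between a data write and a metadata write. The sketch below enumerates the outcomes under a deliberately simplified model: Thread 1's writes become visible in program order under TSOall, and in either order under TSOdata (whose reordering axioms constrain data accesses only); Thread 2 is assumed to execute in order. The function name and the model are assumptions made for illustration, not the formal definition.

```python
from itertools import permutations

INITIAL = {"A.mdata": "Empty", "B.data": 0}

# Thread 1 in program order: a data write, then a metadata write.
THREAD1_WRITES = [("B.data", 1), ("A.mdata", "Full")]


def values_thread2_can_read(model):
    """Values of B.data that Thread 2 may read after leaving its spin loop."""
    if model == "TSOall":
        # All writes, data and metadata, become visible in program order.
        visibility_orders = [tuple(THREAD1_WRITES)]
    else:
        # TSOdata: the metadata write is not ordered after the data write.
        visibility_orders = list(permutations(THREAD1_WRITES))

    seen = set()
    for order in visibility_orders:
        mem = dict(INITIAL)
        for loc, val in order:
            mem[loc] = val
            # Once A.mdata is Full, Thread 2 may exit the loop at this
            # point and read whatever B.data currently holds.
            if mem["A.mdata"] == "Full":
                seen.add(mem["B.data"])
    return seen
```

In this model values_thread2_can_read("TSOall") is {1}, while values_thread2_can_read("TSOdata") is {0, 1}: the program can tell the two models apart, which is what a safely-supervised program cannot do.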
Outline
  Introduction
  Move To TSO non-trivial
  Supervised Memory for TSO
  Safe Supervision
  Evaluation
    Is reordering useful?
Reordering is useful: Supervised Systems
  TokenTM [Bobba et al., ISCA 2008]
    Transactional Memory
    Metadata for tracking read/write-sets
  HARD [Zhou et al., HPCA 2007]
    Race Detection
    Metadata for tracking sharing state and locksets
  Both safely-supervised
Reordering is useful: Evaluation Setup
  Systems
    TokenTM on in-order
      TokenTMall on TSOall, TokenTMdata on TSOdata
    HARD on OOO superscalar
      HARDall on TSOall, HARDdata on TSOdata
  Simulation built on Multifacet GEMS
  Workloads
    TokenTM: STAMP
    HARD: Wisconsin Commercial Workload Suite
Reordering is useful: Results, TokenTM
  Speedups: 3% in Kmeans to 22% in Labyrinth
Reordering is useful: Results, HARD
  Speedups: 3% in JBB to 5% in Apache
In the paper…
  Formal models
    Formal Definition of Safe Supervision
    Proofs (in thesis): http://www.cs.wisc.edu/multifacet/theses/jayaram_bobba_phd.pdf
  OpenSPARC case study
    How to handle reordering issues?
    Metadata overhead
Explore relaxed supervised systems: Deterministic Shared Memory (DMP) [Devietti et al., ASPLOS 2009]
  “depending upon the consistency model of the underlying hardware, threads must perform a memory fence at the edge of a quantum”
    Insert a fence after the last operation in the quantum
    Insert a fence before the first shared operation in the quantum
  I3: Reordered metabit-reads
Explore relaxed supervised systems: Is reordering trivial? Empty/full-bits
  [Figure: a processor with registers r1-r3 and a store buffer holding ST 0x10, C, in front of memory blocks A, B, C with data and empty/full metadata (Full, Empty, None). Instruction stream: ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3. An FSM over states Empty and Full has LD and ST transitions and an Exception state.]
  I2: No load bypass
  I3: Late exceptions
TSOdata on OpenSPARC T2
  Goal: Explore low-level issues on a real design
  Late exceptions with deferred handlers
    Dump store buffer entries on exception
    Enhance store buffer to carry Virtual Address (VA)
    ~200 cycles to read out 4 entries
  Disable store buffer bypassing for supervised loads
  Low space overhead for adding metabits (~4%)
Explore relaxed memory systems: Existing proposals assume SC
  Existing proposals assume SC or don’t deal with multiprocessors
  [Table: columns Proposal, Base Architecture, Implementation. Proposals: WWT, Tapeworm, LogTM, OneTM, Informing Memory, SafeMem, MemTracker, DMP. Base architectures listed: MIPS; SPARC; MIPS, Alpha; x86. Implementation: SC.]
Non-TSOall Executions
TSOdata is Complex: Empty/full-bits
  [Figure: empty/full-bit FSM with states Empty and Full, transitions labelled sST and sLD, and an Exception state.]
  Initial state: A.d = 0, A.m = None; B.d = 0, B.m = Empty
  T0: dST 1, A; sLD B
  T1: sST B, 1; dLD A
  Can dLD A return 0?