Safe and Efficient Supervised Memory Systems


Jayaram Bobba†, Marc Lupon‡, Mark D. Hill, and David A. Wood
Department of Computer Sciences, University of Wisconsin-Madison
† Intel Corporation; ‡ Universitat Politècnica de Catalunya (work done while at University of Wisconsin-Madison)

A supervised memory system:
1) Keeps out-of-band metadata per data block
2) Monitors and controls (supervises) data accesses
3) Runs handlers on specific metadata states

Why Supervised Memory Systems?
Hardware is more powerful and software more complex, creating a productivity wall.
Hardware support to improve productivity, i.e., supervised (memory) systems:
- Empty/full bits
- Hardware TM
- MemTracker, SafeMem, iWatcher
- Deterministic shared memory
- Hardware-assisted GC
- Information flow tracking
2/15/2011 Wisconsin Multifacet Project

Executive Summary
- Current/future supervised systems: many supervised memory systems exist
  - They assume SC, but few systems implement SC
  - Moving to TSO (x86 & SPARC) is non-trivial
- Supervised memory for TSO
  - TSOall: TSO for data & metadata; simple but slow
  - TSOdata: TSO for data only; fast but tricky
- Safe supervision
  - Metadata for X only controls data at X
  - Fast & simple
  - Formal foundation

Outline
- Introduction
- Moving to TSO is non-trivial
- Case study: Deterministic Multiprocessor (DMP)
- Supervised memory for TSO
- Safe supervision
- Evaluation

Reordering can be incorrect: A TSO-compliant system
Program: ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3
Figure: a TSO system in which each processor has a store buffer and memory holds data and metadata per block (initially A = 0x00, B = 0x01, C = 0x11). The stores wait in the store buffer (as ST 0x01, A and ST 0x10, C). LD [B] reads r1 = 0x01 from memory before ST A drains, and LD [C] gets r3 = 0x10 forwarded from the buffered ST C.
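The store-buffer behavior in the figure can be mimicked with a toy single-core model. This is a sketch under simplifying assumptions, not the authors' simulator: the class and method names are hypothetical, the initial memory values come from the figure, and stores drain only when drain_all() is called. Loads forward from the youngest buffered store to the same address and otherwise read memory, which is how a later load passes an earlier store under TSO.

```python
from collections import deque


class TSOProcessor:
    """Toy model of one TSO core: stores enter a FIFO store buffer, and loads
    may bypass buffered stores to other addresses (the store->load relaxation)."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # FIFO of (address, value)
        self.regs = {}

    def store(self, addr, value):
        # Stores are buffered; they become globally visible only when drained.
        self.store_buffer.append((addr, value))

    def load(self, addr, reg):
        # Forward from the youngest buffered store to the same address, if any.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                self.regs[reg] = v
                return v
        # Otherwise read memory, i.e. the load passes older buffered stores.
        self.regs[reg] = self.memory[addr]
        return self.regs[reg]

    def drain_all(self):
        # Retire stores oldest-first, so store->store order is kept.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


memory = {"A": 0x00, "B": 0x01, "C": 0x11}
p = TSOProcessor(memory)
p.store("A", 0x01)
p.load("B", "r1")    # reads memory while ST A is still buffered
p.store("C", 0x10)
p.load("C", "r3")    # forwarded from the store buffer
print(hex(p.regs["r1"]), hex(p.regs["r3"]), hex(memory["A"]))  # 0x1 0x10 0x0
p.drain_all()
print(hex(memory["A"]), hex(memory["C"]))  # 0x1 0x10
```

Until drain_all() runs, memory still holds the old value of A, so another core reading A would see 0x00 even though this core's LD [B] has already completed.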

DMP-ShTab [Devietti et al., ASPLOS 09]
Figure: two threads (T1 on P1, T2 on P2) execute loads and stores to blocks A, B, X, and Y. Instructions shown: LD [X], r1; LD [B], r2; ST 1, [B]; LD [Y], r2; ST r2, [Y]; ST 2, [A]; LD [B], r3.
Each block's metadata records its sharing state: Private (Owned@T1 or Owned@T2) or Shared (Shared@T1,T2). Accesses the current state does not permit STALL until the sharing state can be updated.
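A minimal sketch of the DMP-ShTab sharing-table rules, as we understand them from Devietti et al.: in parallel mode a thread may access blocks it owns and read shared blocks, and every other access stalls; in the serial phase a write takes ownership and a read of another thread's block makes it shared. Quanta, the deterministic serial-phase order, and the data values from the figure are not modeled, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShTabEntry:
    state: str                   # "Owned" (private) or "Shared"
    owner: Optional[str] = None  # owning thread when state == "Owned"


def parallel_mode_ok(entry, thread, is_write):
    """Parallel mode: a thread may touch blocks it owns and read shared blocks;
    any other access must STALL until the serial phase."""
    if entry.state == "Owned":
        return entry.owner == thread
    return not is_write  # Shared: reads proceed, writes stall


def serial_mode_update(entry, thread, is_write):
    """Serial phase: the stalled access proceeds and updates sharing state.
    A write takes ownership; a read of another thread's block makes it shared."""
    if is_write:
        entry.state, entry.owner = "Owned", thread
    elif entry.state == "Owned" and entry.owner != thread:
        entry.state, entry.owner = "Shared", None


b = ShTabEntry("Owned", "T2")
print(parallel_mode_ok(b, "T1", is_write=True))   # False: T1 stalls
serial_mode_update(b, "T1", is_write=True)
print(b)                                          # ShTabEntry(state='Owned', owner='T1')
y = ShTabEntry("Shared")
print(parallel_mode_ok(y, "T2", is_write=False))  # True: reads of shared data proceed
```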

Is reordering safe? A case study: DMP-ShTab on TSO
On TSO, ST 2, [A] can still sit in the store buffer (as ST 0x10, A) when the later LD [B], r3 executes:
- Case 1: LD B does not pass ST A; r3 gets 0x01.
- Case 2: LD B passes ST A; r3 gets 0x00.
The outcome depends on store-buffer timing, so reordering can break DMP's deterministic execution.

Outline
- Introduction
- Moving to TSO is non-trivial
- Supervised memory for TSO
  - Define supervised memory
  - TSOall: simple but slow
  - TSOdata: fast but tricky
- Safe supervision
- Evaluation

Supervised Memory for TSO: Define Supervised Memory
- Each memory location A has data (A.d) and metadata (A.m)
- New operations: supervised load (sLD A) and supervised store (sST A)
- Jump to a handler on reading special metadata (optionally, a hardware exception)

Supervised Operations
sLD A =>
  Start: atomic {
    curm = Val[R A.m]              // read metadata
    nextm = NEXT(Load, curm)       // check software-specified FSM
    if nextm == EXCEPTION then jump to Handler
    if nextm != curm then W A.m, nextm   // update metadata
    R A.d                          // read data
  }
  Handler: …
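The sLD pseudocode, plus an analogous sST, can be sketched in Python as below. The empty/full FSM passed as NEXT is only an assumed example, a lock stands in for hardware atomicity, and the handler is a hypothetical callback that runs after leaving the atomic region, mirroring the separate Handler label.

```python
import threading

EXCEPTION = "EXCEPTION"

# Assumed example of a software-specified FSM (empty/full bits):
# (operation, current metadata) -> next metadata, or EXCEPTION.
EMPTY_FULL_FSM = {
    ("Load", "Full"): "Empty",
    ("Load", "Empty"): EXCEPTION,
    ("Store", "Empty"): "Full",
    ("Store", "Full"): EXCEPTION,
}


class SupervisedMemory:
    """Each location A has data (A.d) and metadata (A.m)."""

    def __init__(self, fsm):
        self.data = {}                # A.d
        self.meta = {}                # A.m
        self.fsm = fsm
        self.lock = threading.Lock()  # stands in for the atomic { } block

    def _next(self, op, curm):
        # States the FSM does not mention (e.g. "None") are left unchanged.
        return self.fsm.get((op, curm), curm)

    def sld(self, addr, handler):
        with self.lock:                       # Start: atomic {
            curm = self.meta[addr]            #   read metadata
            nextm = self._next("Load", curm)  #   check FSM
            if nextm != EXCEPTION:
                if nextm != curm:
                    self.meta[addr] = nextm   #   update metadata
                return self.data[addr]        #   read data }
        return handler("Load", addr, curm)    # Handler, outside the atomic block

    def sst(self, addr, value, handler):
        with self.lock:
            curm = self.meta[addr]
            nextm = self._next("Store", curm)
            if nextm != EXCEPTION:
                self.data[addr] = value       # Wr A.d
                if nextm != curm:
                    self.meta[addr] = nextm   # Wr A.m
                return None
        return handler("Store", addr, curm)


def report(op, addr, state):
    # Hypothetical handler: just describes the trapped access.
    return f"{op} on {addr} trapped in state {state}"


mem = SupervisedMemory(EMPTY_FULL_FSM)
mem.data["B"], mem.meta["B"] = 0, "Empty"
print(mem.sld("B", report))                 # Load on B trapped in state Empty
mem.sst("B", 1, report)                     # Empty -> Full, B.d = 1
print(mem.sld("B", report), mem.meta["B"])  # 1 Empty
```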


TSO Axioms [Hangal et al., ISCA 2004]
Axiom           Description
Order           Total order on all write accesses
Atomicity       No intervening accesses for atomic operations
Termination     All write accesses eventually complete
Value           Reads return the latest value from memory or the store buffer
Memory Barrier  No reordering across a barrier
ReadAny         Accesses cannot pass outstanding reads
WriteWrite      Write accesses cannot pass outstanding writes

Reordering axioms (ReadAny, WriteWrite): Rd A → Rd B, Rd A → Wr B, and Wr A → Wr B stay in order; only Wr A → Rd B may be reordered, which allows store buffers.
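The two reordering axioms reduce to a small predicate over pairs of accesses to different addresses. This sketch (hypothetical function name) ignores barriers, atomic operations, and same-address forwarding; it only shows that Wr A → Rd B is the single relaxed pair, which is what lets a store buffer hide store latency.

```python
from itertools import product


def tso_must_preserve(first: str, second: str) -> bool:
    """Must program order first -> second (different addresses) be kept?
    Accesses are 'Rd' or 'Wr'; barriers and forwarding are not modeled."""
    if first == "Rd":
        return True   # ReadAny: nothing passes an outstanding read
    if first == "Wr" and second == "Wr":
        return True   # WriteWrite: writes stay in order
    return False      # Wr -> Rd: a later read may pass (store buffer)


for a, b in product(("Rd", "Wr"), repeat=2):
    verdict = "preserved" if tso_must_preserve(a, b) else "may reorder"
    print(f"{a} A -> {b} B: {verdict}")
# Rd A -> Rd B: preserved
# Rd A -> Wr B: preserved
# Wr A -> Rd B: may reorder
# Wr A -> Wr B: preserved
```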

TSOall: A Consistency Model for Supervised Memory
TSO axioms applied to all accesses, both data and metadata.
+ Simple: behaves like TSO
- Slow: prohibits optimizations
Example thread: sST A; sLD B
  sST A -> [Rd A.m, Wr A.d, Wr A.m]
  sLD B -> [Rd B.m, Rd B.d]
sST A begins with a metadata read (Rd A.m) that sLD B may not pass (ReadAny), and its writes are atomic with that read, so store buffers are ineffective.

TSOdata: Fast Yet Simple
Axiom           Description
Order           Total order on all write accesses
Atomicity       No intervening accesses for atomic operations
Termination     All write accesses eventually complete
Value           Reads return the latest value from memory or the store buffer
Memory Barrier  No reordering across a barrier
ReadAny         Data accesses cannot pass outstanding data reads
WriteWrite      Data writes cannot pass outstanding data writes

Example thread: sST A; sLD B
  sST A -> [Rd A.m, Wr A.d, Wr A.m]
  sLD B -> [Rd B.m, Rd B.d]
The reordering axioms (ReadAny, WriteWrite) now constrain only data accesses, so store buffers can be used.
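To contrast TSOall and TSOdata, the sketch below decomposes the example thread sST A; sLD B into its data and metadata accesses and asks which cross-operation pairs the ReadAny/WriteWrite axioms keep in order. It is a simplified model with hypothetical names (Atomicity, Value, and barrier axioms are not modeled): under TSOall the metadata read Rd A.m pins both reads of sLD B behind it, while under TSOdata no pair is constrained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    kind: str  # "Rd" or "Wr"
    loc: str   # "A.m" (metadata) or "A.d" (data)

    @property
    def is_data(self) -> bool:
        return self.loc.endswith(".d")


def must_preserve(first, second, data_only):
    """ReadAny / WriteWrite. With data_only=True (TSOdata) they bind only when
    both accesses are data; with data_only=False (TSOall) they bind always."""
    if data_only and not (first.is_data and second.is_data):
        return False
    if first.kind == "Rd":
        return True
    return first.kind == "Wr" and second.kind == "Wr"


sST_A = [Access("Rd", "A.m"), Access("Wr", "A.d"), Access("Wr", "A.m")]
sLD_B = [Access("Rd", "B.m"), Access("Rd", "B.d")]

results = {}
for model, data_only in (("TSOall", False), ("TSOdata", True)):
    results[model] = [
        (f"{x.kind} {x.loc}", f"{y.kind} {y.loc}")
        for x in sST_A
        for y in sLD_B
        if must_preserve(x, y, data_only)
    ]
    print(model, results[model])
# TSOall [('Rd A.m', 'Rd B.m'), ('Rd A.m', 'Rd B.d')]
# TSOdata []
```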

Outline
- Introduction
- Moving to TSO is non-trivial
- Supervised memory for TSO
- Safe supervision
- Evaluation


Blast from the Past [Adve and Hill, ISCA 1990]
No reordering, easy to reason about (SC) vs. reordering, performance (RC)
- Observation: simple programs rely only on certain SC orders. Ignoring the non-essential orders still appears as SC.
- Challenge: what counts as "simple"? Which orders are non-essential?
- Solution: data-race-freedom. For data-race-free programs, RC = SC.

Safe Supervision Motivation
No reordering, easy to reason about (TSOall) vs. reordering, performance (TSOdata)
- Observation: simple supervised programs rely only on certain orders. Ignoring the non-essential orders still appears as TSOall.
- Challenge: what counts as "simple"? Which orders are non-essential?
- Solution: safe supervision. For safely-supervised programs, TSOdata = TSOall.

Safe Supervision
A location's metadata is only used to control access to that location's data.
Most uses of supervision are safely supervised, e.g.:
- Heap checker: initialized/uninitialized values
- Transactional memory: conflict-detection information
DMP is NOT safely supervised. Example:
  Initially, A.m = Empty, B.d = 0
  Thread 1: B.d = 1; A.m = Full
  Thread 2: while (A.m == Empty); read B.d
Here A's metadata controls access to B's data, violating safe supervision.
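The example can be checked by brute force. This sketch approximates each model by letting a thread's two accesses swap only when the applicable axiom allows it and then enumerating sequentially consistent interleavings; it is a simplification, not the paper's formal model, and all names are hypothetical. Under TSOall, Thread 2 can only read B.d = 1 once it sees A.m == Full, but under TSOdata the metadata accesses are unordered with respect to the data accesses, so B.d = 0 becomes possible.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


T1 = (("W", "B.d", 1), ("W", "A.m", "Full"))   # B.d = 1; A.m = Full
T2 = (("R", "A.m", None), ("R", "B.d", None))  # spin on A.m; read B.d


def allowed_orders(ops, data_only):
    """Orders of a two-access thread permitted by ReadAny/WriteWrite. Both
    threads here are W->W or R->R, which TSO keeps in order; under TSOdata
    those axioms bind only when both accesses are data (".d")."""
    first, second = ops
    both_data = first[1].endswith(".d") and second[1].endswith(".d")
    if data_only and not both_data:
        return [ops, (second, first)]
    return [ops]


def observed_b_values(data_only):
    """B.d values Thread 2 can read in executions where it sees A.m == Full."""
    seen = set()
    for o1 in allowed_orders(T1, data_only):
        for o2 in allowed_orders(T2, data_only):
            for trace in interleavings(o1, o2):
                mem = {"A.m": "Empty", "B.d": 0}
                reads = {}
                for kind, loc, val in trace:
                    if kind == "W":
                        mem[loc] = val
                    else:
                        reads[loc] = mem[loc]
                if reads["A.m"] == "Full":  # the spin loop would have exited
                    seen.add(reads["B.d"])
    return seen


print("TSOall :", sorted(observed_b_values(False)))  # [1]
print("TSOdata:", sorted(observed_b_values(True)))   # [0, 1]
```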

Outline
- Introduction
- Moving to TSO is non-trivial
- Supervised memory for TSO
- Safe supervision
- Evaluation: is reordering useful?

Reordering is useful: supervised systems
- TokenTM [Bobba et al., ISCA 2008]: transactional memory; metadata tracks read/write sets
- HARD [Zhou et al., HPCA 2007]: race detection; metadata tracks sharing state and locksets
Both are safely supervised.
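For a concrete feel of lockset metadata, the sketch below implements an Eraser-style lockset state machine per block. It is a swapped-in, simplified stand-in, not HARD's actual design (HARD's hardware states and lockset encoding are not modeled), and all names are hypothetical. Each update reads and writes only the accessed block's own metadata, which is the property that makes such a checker safely supervised.

```python
class LocksetMetadata:
    """Hypothetical per-block metadata for an Eraser-style lockset check."""

    def __init__(self):
        self.state = "Virgin"  # Virgin -> Exclusive -> Shared / SharedModified
        self.owner = None      # first thread to touch the block
        self.lockset = None    # candidate locks, set once the block is shared


def on_access(meta, thread, held_locks, is_write):
    """Update one block's metadata on an access; return True if a potential
    race is flagged. Only this block's metadata is read or written."""
    if meta.state == "Virgin":
        meta.state, meta.owner = "Exclusive", thread
        return False
    if meta.state == "Exclusive":
        if thread == meta.owner:
            return False
        # A second thread arrives: start refining from the locks it holds.
        meta.state = "SharedModified" if is_write else "Shared"
        meta.lockset = set(held_locks)
    else:
        meta.lockset &= set(held_locks)
        if is_write:
            meta.state = "SharedModified"
    # Flag only written-shared data with no common protecting lock.
    return meta.state == "SharedModified" and not meta.lockset


m = LocksetMetadata()
print(on_access(m, "T1", {"L"}, is_write=True))   # False: exclusive to T1
print(on_access(m, "T2", {"L"}, is_write=True))   # False: both hold L
print(on_access(m, "T1", set(), is_write=False))  # True: no common lock left
```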

Reordering is useful: evaluation setup
Systems:
- TokenTM on in-order cores: TokenTMall on TSOall, TokenTMdata on TSOdata
- HARD on out-of-order superscalar cores: HARDall on TSOall, HARDdata on TSOdata
Simulation built on Multifacet GEMS
Workloads:
- TokenTM: STAMP
- HARD: Wisconsin Commercial Workload Suite

Reordering is useful: results (TokenTM)
Speedups of TokenTMdata over TokenTMall range from 3% (Kmeans) to 22% (Labyrinth).

Reordering is useful: results (HARD)
Speedups of HARDdata over HARDall range from 3% (JBB) to 5% (Apache).

In the paper…
- Formal models
- Formal definition of safe supervision
- Proofs (in thesis): http://www.cs.wisc.edu/multifacet/theses/jayaram_bobba_phd.pdf
- OpenSPARC case study: how to handle reordering issues? Metadata overhead



Deterministic Shared Memory (DMP) [Devietti et al., ASPLOS 2009]
"depending upon the consistency model of the underlying hardware, threads must perform a memory fence at the edge of a quantum"
- Insert a fence after the last operation in the quantum
- Insert a fence before the first shared operation in the quantum
Issue I3: reordered metabit reads

Is reordering trivial? Empty/full bits
Figure: the store-buffer example (ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3), now with empty/full metadata per block. A store moves a block from Empty to Full and a load moves it from Full to Empty; a load of an Empty block or a store to a Full block raises an exception.
Issues identified:
- I2: no load bypass (a supervised load is not forwarded from the store buffer)
- I3: late exceptions

TSOdata on OpenSPARC T2
Goal: explore low-level issues on a real design
- Late exceptions with deferred handlers
- Dump store buffer entries on an exception
- Enhance the store buffer to carry the virtual address (VA)
- ~200 cycles to read out 4 entries
- Disable store buffer bypassing for supervised loads
- Low space overhead for adding metabits (~4%)

Existing proposals assume SC
Existing proposals assume SC or don't deal with multiprocessors.
Proposals surveyed: WWT, Tapeworm, LogTM, OneTM, Informing Memory, SafeMem, MemTracker, DMP
Base architectures include MIPS, SPARC, Alpha, and x86; the implementations assume SC.

Non-TSOall Executions

TSOdata is Complex
Empty/full bits (FSM states Empty, Full, and Exception; transitions on sLD and sST). dST and dLD are ordinary data accesses.
Initial state: A.d = 0, A.m = None; B.d = 0, B.m = Empty
T0: dST 1, A; sLD B
T1: sST B, 1; dLD A
Can dLD A return 0?

Safe Supervision
