Persistency for Synchronization-Free Regions

Presentation transcript:

Persistency for Synchronization-Free Regions. Vaibhav Gogte, Stephan Diestelhorst (ARM), William Wang (ARM), Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch. PLDI 2018, 06/20/2018.

Promise of persistent memory (PM): performance, density, and non-volatility, with a byte-addressable, load-store interface to storage. "Optane DC Persistent Memory will be offered in packages of up to 512GB per stick", "… expanding memory per CPU socket to as much as 3TB." (Source: www.extremetech.com)

Persistent memory system. [figure: Core-1 through Core-4, each with an L1 cache, share an LLC; DRAM and Persistent Memory (PM) sit below; a recovery process reads PM] Recovery can inspect the data structures in PM to restore the system to a consistent state.

Memory persistency models provide the guarantees required for recoverable software; they have been proposed both in academia [Condit '09][Pelley '14][Joshi '15][Kolli '17] and in industry [Intel '14][ARM '16]. They define the program state that recovery observes post failure. Key primitives: ensure failure atomicity for a group of persists, and govern ordering constraints on memory accesses to PM.

Contributions: a persistency model using synchronization-free regions that defines a precise state of the program post failure, employs existing synchronization primitives in C++, and provides its semantics as a part of the language implementation. We build a compiler pass that emits logging code for persistent accesses, propose two implementations (Coupled-SFR and Decoupled-SFR), and achieve 65% better performance than the state of the art.

Outline: design space for language-level persistency models; our proposal: persistency for SFRs; Coupled-SFR implementation; Decoupled-SFR implementation; evaluation.

Language-level persistency models [Chakrabarti '14][Kolli '17] enable writing portable, recoverable software by extending the language memory model with persistency semantics. The model provides two kinds of guarantees: failure-atomicity, i.e. which group of stores persists atomically (e.g. Atomic_begin(); A = 100; B = 200; Atomic_end();), and ordering, i.e. how programmers can constrain persist order, written St(A) <p St(B) (e.g. A = 100; L1.rel(); … L1.acq(); A = 200;).
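Below is a minimal C++ sketch of these two guarantee classes. The names atomic_begin()/atomic_end() are illustrative stand-ins for a failure-atomic region (defined as no-ops so the sketch compiles), and the variables are assumed to live in persistent memory; this is not the paper's actual API.

```cpp
#include <mutex>

// Illustrative stand-ins for a failure-atomic region; a real implementation
// would lower these to logging and persist-ordering code.
inline void atomic_begin() {}
inline void atomic_end()   {}

long A = 0, B = 0;       // assume A and B reside in persistent memory
std::mutex L1;

// Failure-atomicity: after a crash, recovery observes both stores or neither.
void atomic_group() {
    atomic_begin();
    A = 100;
    B = 200;
    atomic_end();
}

// Ordering: when thread2's acquire synchronizes with thread1's release, the
// persist of A = 100 is ordered before the persist of A = 200.
void thread1() { L1.lock(); A = 100; L1.unlock(); }
void thread2() { L1.lock(); A = 200; L1.unlock(); }
```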

Why failure-atomicity? Task: fill a new node and add it to a linked list, safely. [figure: the initial in-memory linked list; fillNewNode(), a fence, then updateTailPtr()]

Why failure-atomicity? Task: fill a new node and add it to a linked list, safely. [figure: fillNewNode() and updateTailPtr() performed as one atomic unit] Failure-atomicity makes persistent memory programming easier.
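As a concrete illustration of the task above, here is a hedged C++ sketch; fillNewNode, updateTailPtr, and the atomic-region markers are hypothetical names chosen to match the slide, and the nodes are assumed to be allocated in PM.

```cpp
// Hypothetical failure-atomic region markers (no-op stubs so this compiles);
// the point is only that both steps fall inside one atomic unit.
inline void atomic_begin() {}
inline void atomic_end()   {}

struct Node {
    int   value;
    Node* next;
};

Node* tail = nullptr;    // assume the list and new nodes live in PM

void fillNewNode(Node* n, int v) { n->value = v; n->next = nullptr; }
void updateTailPtr(Node* n)      { tail->next = n; tail = n; }

// Without atomicity, a crash between (or a persist reordering of) these two
// steps can leave the tail pointer referring to an unfilled node. Wrapping
// them in one failure-atomic unit means recovery sees the old list or the
// fully linked new node, never a partial update.
void append(Node* n, int v) {
    atomic_begin();
    fillNewNode(n, v);
    updateTailPtr(n);    // assumes a non-empty list (tail != nullptr)
    atomic_end();
}
```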

Semantics for failure-atomicity: assure that either all or none of the updates are visible post failure, guaranteed by hardware, library, or language implementations. The design space for these semantics ranges from individual persists to outer critical sections. Existing mechanisms provide unsatisfying semantics or suffer high performance overhead.

Granularity of failure-atomicity - I: individual persists [Condit '09][Pelley '14][Joshi '16][Kolli '17]. (Running example: L1.lock(); x -= 100; y += 100; L2.lock(); a -= 100; b += 100; L2.unlock(); L1.unlock();) These mechanisms ensure atomicity of individual persists only, so a non-sequentially-consistent state may be visible to recovery, which in turn needs additional custom logging.

Granularity of failure-atomicity - II: outer critical sections [Chakrabarti '14][Boehm '16] (same running example). These guarantee that recovery observes an SC state, making recovery code easier to build, but they require complex dependency tracking between logs (more than 2x performance cost) and do not generalize to certain synchronization constructs, e.g. condition variables.

Our proposal: failure-atomic SFRs. Synchronization-free regions (SFRs) are thread regions delimited by synchronization operations or system calls. In the running example (l1.acq(); x -= 100; y += 100; l2.acq(); a -= 100; b += 100; l2.rel(); l1.rel();), the updates to x and y between l1.acq() and l2.acq() form SFR1, and the updates to a and b between l2.acq() and l2.rel() form SFR2. Failure-atomic SFRs enable a precise post-failure state with low performance overhead.
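The same example rendered as C++ with std::mutex, with comments marking where each SFR begins and ends; the initial values are made up and all four variables are assumed to reside in PM.

```cpp
#include <mutex>

long x = 500, y = 0, a = 500, b = 0;   // assumed to reside in PM
std::mutex l1, l2;

void update() {
    l1.lock();     // sync op: SFR1 begins
    x -= 100;      // SFR1: these stores become persistent
    y += 100;      //       atomically, as one failure-atomic unit
    l2.lock();     // sync op: SFR1 ends, SFR2 begins
    a -= 100;      // SFR2: likewise atomic as a group
    b += 100;
    l2.unlock();   // sync op: SFR2 ends
    l1.unlock();   // sync op (the region in between is empty)
}
```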

Guarantees by failure-atomic SFRs. [figure: Thread 1 and Thread 2 each execute an SFR guarded by l1; one performs x += 100; y -= 100 and the other x -= 100; y += 100] Intra-thread guarantees: ensure failure-atomicity of updates within an SFR, and define precise points for the post-failure program state. Inter-thread guarantees: order SFRs using synchronizing acq and rel operations, and serialize ordered SFRs in PM. Two logging implementations: Coupled-SFR and Decoupled-SFR.

Undo-logging for SFRs. [figure: failure-atomic SFR1 (L1.acq(); x = 100; L1.rel();) instrumented with the undo-logging steps persistUndoLog (L), mutateData (M), persistData (P), and commitLog (C)] The ordering of these steps must be enforced for SFRs to be failure-atomic.
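A minimal sketch of the four steps for a single instrumented store, assuming hypothetical persist primitives pwb() and pfence() (stand-ins for cache-line write-back and ordering instructions such as CLWB/SFENCE on x86, here defined as no-ops so the sketch compiles); the log-entry layout is illustrative, not the paper's.

```cpp
#include <cstdint>

// Hypothetical persist primitives.
inline void pwb(const void*) { /* write back the cache line holding addr */ }
inline void pfence()         { /* order preceding write-backs */ }

struct UndoEntry {
    uint64_t* addr;    // location being overwritten
    uint64_t  old;     // prior value, used by recovery to roll back
    bool      valid;   // recovery honors this entry only if it persisted as true
};

// One store instrumented with the L, M, P, C steps from the slide.
void logged_store(UndoEntry* e, uint64_t* addr, uint64_t value) {
    // L: persistUndoLog - the entry must be durable before the mutation.
    e->addr = addr;
    e->old  = *addr;
    pwb(e);
    pfence();
    e->valid = true;   // marked valid only after addr/old are durable
    pwb(&e->valid);
    pfence();

    // M: mutateData - perform the store in place.
    *addr = value;

    // P: persistData - make the new value durable.
    pwb(addr);
    pfence();

    // C: commitLog - invalidate the entry so recovery will not roll back.
    e->valid = false;
    pwb(&e->valid);
    pfence();
}
```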

Impl. 1: Coupled-SFR. [figure: Thread 1 executes SFR1 (L1.acq(); x = 100; L1.rel();) and Thread 2 executes SFR2 (L1.acq(); x = 200; L1.rel();); each SFR completes its logging steps L, M, P, C before its closing release or the next acquire] + Persistent state lags execution by at most one SFR: simpler implementation, and the latest state is available at failure. - Updates must be flushed at the end of each SFR: high performance cost.
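A sketch of what Coupled-SFR implies at the end of a region, as I read the slide: all of the SFR's updates are persisted and its log committed before the closing synchronization operation completes. The bookkeeping structure and persist primitives below are illustrative only.

```cpp
#include <mutex>
#include <vector>

inline void pwb(const void*) { /* write back the cache line holding addr */ }
inline void pfence()         { /* order preceding write-backs */ }

struct SfrContext {
    std::vector<void*> dirty;   // addresses mutated in the current SFR
    // (the SFR's undo-log entries would be tracked here as well)
};

// Ending an SFR at a lock release under Coupled-SFR: persist (P) and commit
// (C) happen on the critical path, so persistent state lags execution by at
// most one SFR, at the cost of flushing on every region boundary.
void coupled_sfr_release(SfrContext& ctx, std::mutex& m) {
    for (void* p : ctx.dirty) pwb(p);   // P: persist all of the SFR's updates
    pfence();
    // C: commit (discard) the SFR's undo log here
    ctx.dirty.clear();
    m.unlock();                          // only now does the release take effect
}
```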

Impl. 2: Decoupled-SFR. Key idea: persist updates and commit logs in the background. [figure: Thread 1's SFR1 (L1.acq(); x = 100; L1.rel();) and Thread 2's SFR2 (L1.acq(); x = 200; L1.rel();) create their logs on the critical path, while the data persists (P) and log commits (C) complete in background threads] This requires recording the order of log creation.

Log ordering in Decoupled-SFR. [figure: initially x = 0; Thread 1 executes x = 100; L1.rel(), and its log header records Store X (xold = 0) and Rel L1 with Seq = 0 while the sequence table entry for L1 advances from 0 to 1; Thread 2 then executes L1.acq(); x = 200, and its log header records Acq L1 with Seq = 1 and Store X (xold = 100)] Sequence numbers record the inter-thread order of log creation, and background threads commit logs using the recorded sequence numbers. In the paper: racing synchronization operations, commit optimizations, etc.
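A hedged sketch of this sequence-number bookkeeping; the structures and names (PersistentLock, LogHeader, stamp_log) are illustrative, not the paper's.

```cpp
#include <atomic>
#include <cstdint>

// Each lock carries a counter (its entry in the sequence table).
struct PersistentLock {
    std::atomic<uint64_t> seq{0};
};

// Per-SFR log header: records which lock ordered the SFR and its position
// in that lock's acquire/release order.
struct LogHeader {
    uint64_t lock_id;
    uint64_t seq;
    // ... the SFR's undo entries would follow ...
};

// At a synchronizing acquire or release, stamp the SFR's log with the lock's
// current sequence number and advance the counter.
void stamp_log(LogHeader* hdr, uint64_t lock_id, PersistentLock& l) {
    hdr->lock_id = lock_id;
    hdr->seq     = l.seq.fetch_add(1, std::memory_order_acq_rel);
}

// Background threads later commit logs in (lock_id, seq) order, so an SFR is
// committed only after the SFRs it was ordered behind.
```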

Evaluation setup. We designed our logging approaches in LLVM v3.6.0: the pass instruments stores and synchronization operations to emit undo logs, creates log space for managing per-thread undo logs, and launches background threads to flush/commit logs in Decoupled-SFR. Workloads: write-intensive micro-benchmarks with 12 threads and 10M operations. Experiments ran on an Intel Xeon E5-2683 v3 (2 GHz, 12 physical cores, 2-way hyper-threading).

Performance evaluation [Chakrabarti '14]. [chart] Decoupled-SFR performs 65% better than the state-of-the-art ATLAS design.

Performance evaluation [Chakrabarti '14]. [chart] Coupled-SFR performs better than Decoupled-SFR when there are fewer stores per SFR.

Conclusion. Failure-atomic synchronization-free regions: persistent state moves from one synchronization operation to the next, extending clean SC semantics to post-failure recovery. Coupled-SFR: easy to reason about PM state after failure, but high performance cost. Decoupled-SFR: persistent state lags execution, and it performs 65% better than ATLAS.

Persistency for Synchronization-Free Regions. Vaibhav Gogte, Stephan Diestelhorst, William Wang, Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch. "If the surgery proves unnecessary, we'll revert your architectural state at no charge."* (*Source: Paul Kocher, ISCA 2018)