High-performance transactions for persistent memories

Slides:

Advertisements

Similar presentations

1 Lecture 20: Synchronization & Consistency Topics: synchronization, consistency models (Sections )

Advertisements

Crash Recovery John Ortiz. Lecture 22Crash Recovery2 Review: The ACID properties  Atomicity: All actions in the transaction happen, or none happens 

IDA / ADIT Lecture 10: Database recovery Jose M. Peña

Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.

Exploring Memory Consistency for Massively Threaded Throughput- Oriented Processors Blake Hechtman Daniel J. Sorin 0.

Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.

ECE 454 Computer Systems Programming Parallel Architectures and Performance Implications (II) Ding Yuan ECE Dept., University of Toronto

Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.

CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,

Loose-Ordering Consistency for Persistent Memory

Is SC + ILP = RC? Presented by Vamshi Kadaru Chris Gniady, Babak Falsafi, and T. N. VijayKumar - Purdue University Spring 2005: CS 7968 Parallel Computer.

Scalable Load and Store Processing in Latency Tolerant Processors Amit Gandhi 1,2 Haitham Akkary 1 Ravi Rajwar 1 Srikanth T. Srinivasan 1 Konrad Lai 1.

1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,

Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.

Steven Pelley, Peter M. Chen, Thomas F. Wenisch University of Michigan

A Scalable Approach to Thread-Level Speculation J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry Carnegie Mellon University.

By Sarita Adve & Kourosh Gharachorloo Review by Jim Larson Shared Memory Consistency Models: A Tutorial.

Kevin Lim*, Jichuan Chang +, Trevor Mudge*, Parthasarathy Ranganathan +, Steven K. Reinhardt* †, Thomas F. Wenisch* June 23, 2009 Disaggregated Memory.

Software-Based Cache Coherence with Hardware-Assisted Selective Self Invalidations Using Bloom Filters Authors ： Thomas J. Ashby, Pedro D´ıaz, Marcelo.

An Efficient Programmable 10 Gigabit Ethernet Network Interface Card Paul Willmann, Hyong-youb Kim, Scott Rixner, and Vijay S. Pai.

By- Jaideep Moses, Ravi Iyer , Ramesh Illikkal and

More on Locks: Case Studies

NVSleep: Using Non-Volatile Memory to Enable Fast Sleep/Wakeup of Idle Cores Xiang Pan and Radu Teodorescu Computer Architecture Research Lab

Microsoft Research Asia Ming Wu, Haoxiang Lin, Xuezheng Liu, Zhenyu Guo, Huayang Guo, Lidong Zhou, Zheng Zhang MIT Fan Long, Xi Wang, Zhilei Xu.

University of Michigan Electrical Engineering and Computer Science 1 Dynamic Acceleration of Multithreaded Program Critical Paths in Near-Threshold Systems.

ReSlice: Selective Re-execution of Long-retired Misspeculated Instructions Using Forward Slicing Smruti R. Sarangi, Wei Liu, Josep Torrellas, Yuanyuan.

Dilip N Simha, Maohua Lu, Tzi-cher chiueh Park Chanhyun ASPLOS’12 March 3-7, 2012 London, England, UK.

By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial.

Cache Coherence Protocols 1 Cache Coherence Protocols in Shared Memory Multiprocessors Mehmet Şenvar.

Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.

Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.

Storage Systems CSE 598d, Spring 2007 Rethink the Sync April 3, 2007 Mark Johnson.

CS533 Concepts of Operating Systems Jonathan Walpole.

CISC 879 : Advanced Parallel Programming Rahul Deore Dept. of Computer & Information Sciences University of Delaware Exploring Memory Consistency for Massively-Threaded.

An Evaluation of Memory Consistency Models for Shared- Memory Systems with ILP processors Vijay S. Pai, Parthsarthy Ranganathan, Sarita Adve and Tracy.

Conditional Memory Ordering Christoph von Praun, Harold W.Cain, Jong-Deok Choi, Kyung Dong Ryu Presented by: Renwei Yu Published in Proceedings of the.

Transactional Memory Coherence and Consistency Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu,

CS717 1 Hardware Fault Tolerance Through Simultaneous Multithreading (part 2) Jonathan Winter.

Providing High and Predictable Performance in Multicore Systems Through Shared Resource Management Lavanya Subramanian 1.

NVWAL: Exploiting NVRAM in Write-Ahead Logging

Free Transactions with Rio Vista Landon Cox April 15, 2016.

C++11 Atomic Types and Memory Model

Persistent Memory (PM)

Hathi: Durable Transactions for Memory using Flash

Delegated Persist Ordering

An Analysis of Persistent Memory Use with WHISPER

Hands-Off Persistence System (HOPS)

Failure-Atomic Slotted Paging for Persistent Memory

Free Transactions with Rio Vista

Algorithmic Improvements for Fast Concurrent Cuckoo Hashing

An Analysis of Persistent Memory Use with WHISPER

Speculative Lock Elision

PHyTM: Persistent Hybrid Transactional Memory

Transaction Management and Concurrency Control

Memory Consistency Models

Lecture 11: Consistency Models

Memory Consistency Models

Automatic Detection of Extended Data-Race-Free Regions

Gwangsun Kim Niladrish Chatterjee Arm, Inc. NVIDIA Mike O’Connor

NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah,

Persistency for Synchronization-Free Regions

Synchronization trade-offs in GPU implementations of Graph Algorithms

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance

Chapter 10 Transaction Management and Concurrency Control

Free Transactions with Rio Vista

Shared Memory Consistency Models: A Tutorial

Presented by Neha Agrawal

PMTest: A Fast and Flexible Testing Framework for Persistent Memory Programs Sihang Liu1, Yizhou Wei1, Jishen Zhao2, Aasheesh Kolli3,4, and Samira Khan1.

Problems with Locks Andrew Whitaker CSE451.

Janus Optimizing Memory and Storage Support for Non-Volatile Memory Systems Sihang Liu Korakit Seemakhupt, Gennady Pekhimenko, Aasheesh Kolli, and Samira.

Presentation transcript:

High-performance transactions for persistent memories Aasheesh Kolli Steven Pelley$ Ali Saidi* Peter M. Chen Thomas F. Wenisch U. of Michigan Snowflake Computing$ ARM* ASPLOS ‘16, Atlanta, GA 05/05/2016

Persistent memory system Core-1 Core-2 Core-3 Core-4 L1 $ LLC Recovery DRAM Persistent Memory (PM)

Motivation Design transactions w/ persistency models PM  in-memory recoverable data structures Recovery relies on ordering updates to PM Persistency models [Pelley ‘14] Interface to constrain order Academia/Industry [Condit ’09, Pelley ‘14, Joshi ’15, Intel ’15, ARM ’16] Hard to reason about  Transactions for users Design transactions w/ persistency models

Transaction performance Simple transaction design  Unnecessary ordering constraints Exacerbated by high PM latencies acqLocks relLocks prepareUndoLog (P) mutateData (M) commitTx (C) Constraints Performance Need to minimize unnecessary ordering constraints

Contributions Optimize transactions for ordering constraints Synchronous Commit Tx (SCT) Deferred Commit Tx (DCT) acqLocks relLocks prepareUndoLog (P) mutateData (M) commitTx (C) Up to 2.5x speedup! Deferred commit Challenge: ensure correct commit order

Outline Background Synchronous commit transactions Deferred commit transactions Compare persistency models Evaluation

Terminology Persist Persist memory order (PMO) Act of making a store durable in PM Persist memory order (PMO) Memory events ordered by persistency model Fewer constrains  higher performance Better re-ordering, coalescing, scheduling Considering only epoch-based persistency models

Persistency model Consistency /* Init: a = 0, b=0 a,b are persistent */ Core-1 Core-2 St a = 1 while (a==0){} St b = 1 Constraint: St a <p St b Writeback caching Mem Ctrl reordering Core-1 Core-2 L1 $ L1 $ Constraints Performance Constraints Performance LLC Persistency b PM a

Transactions Synchronous Commit Transaction (SCT) acqLocks relLocks prepareUndoLog (P) mutateData (M) commitTx (C) . . . P1 P2 PN Recoverable . . . M1 M2 MN . . . C1 C2 CN Synchronous Commit Transaction (SCT)

Epoch persistency [Condit ‘09, Pelley ‘14, Joshi ‘15] Barriers break thread execution into epochs Persists across epochs are ordered No ordering within epoch PMO acqLocks relLocks prepareUndoLog (P) mutateData (M) commitTx (C) … Barrier Epoch-1 P1 P2 Pn … Epoch-2 M1 M2 Mn … Epoch-3 C1 C2 Cn

Inter-thread ordering Conflicting accesses establish persist order PMO must match coherence order Else, recovery to inconsistent states /* Init: a = 0, b=0 a,b are persistent */ Core-1 Core-2 St a = 1 while (a==0){} BARRIER St b = 1 Constraint: St a <p St b Conflicting accesses

Conflicting transactions acqLocks1 relLocks1 P1 M1 C1 acqLocks2 relLocks2 P2 M2 C2 Conflicting accesses Thread - 1 Thread - 2 PMO P1 M1 C1 P2 M2 C2 Sufficient. Necessary?

Necessary PMO constraints acqLocks1 relLocks1 P1 M1 C1 acqLocks2 relLocks2 P2 M2 C2 Conflicting accesses Thread - 1 Thread - 2 P1 M1 C1 P2 M2 C2 Memory order (As observed by cores) Core-1 Core-2 L1 $ L1 $ LLC P1 M1 C1 P2 M2 C2 PMO (Req. for recovery) PMO has fewer constraints

Deferred Commit Transaction (DCT) Defer commit past lock release Breaks persist dependency chain Reduces barriers within transaction Must ensure correct commit order Non-trivial changes to locks, logging Details in the paper Increases executed instructions acqLocks relLocks prepareUndoLog (P) mutateData (M) commitTx (C)

Inter-thread DCT DCT incurs minimal inter-thread constraints acqLocks1 PMO P1 Conflicting accesses P1 M1 C1 M1 P2 M2 C2 relLocks1 Thread - 2 acqLocks2 C1 P2 DCT incurs minimal inter-thread constraints M2 relLocks2 Explicitly ordered In paper: DCT also reduces intra-thread constraints C2

Outline Background Synchronous commit transactions Deferred commit transactions Compare persistency models Evaluation

Eager sync v. Epoch persistency Eager sync (based on [Intel ‘15]) Epoch persistency [Pelley ‘14] Ordering + Completion Stronger guarantee Easier to reason Potentially fewer barriers Conservative stalling Completion not always req. Synchronous barrier Ordering only (mostly) Weaker guarantee Harder to reason Potentially more barriers Stall when no buffer space Ordering mostly sufficient Delegation barrier

Barriers in action Synchronous: Epoch-1 persists before Epoch-2 begins Delegation: Epoch-1 ordered to persist before Epoch-2 E1 E2 Finish Core $ Time E2 E1 > Finish Core $ Time

Outline Background Synchronous commit transactions Deferred commit transactions Compare persistency models Evaluation

Methodology For Epoch Persistency For Eager Sync Two benchmarks: Trace-based evaluation Exec time = Max(Inst. latency, Persist latency) Inst. latency = Exec time without persist ordering constrains Persist latency = Time to drain persists to PM Governed by #persists, #barriers, PM access latency For Eager Sync Barrier stalls implemented using rdtscp timer Two benchmarks: Update Location (TATP): Write-intensive, small transactions New Order (TPCC): Write-intensive, large transactions

DCT v. SCT – TATP Better Eager sync Epoch persistency 2.56x 1.48x Speedup Speedup Avg. persist epoch latency (us) Avg. persist epoch latency (us) *Normalized to SCT

DCT v. SCT - TPCC Better Eager sync Epoch persistency 1.52x 1.47x Speedup Speedup Avg. persist epoch latency (us) Avg. persist epoch latency (us) *Normalized to SCT

In the paper Precise definition of the persistency models More persistency models Strict persistency Strand persistency Detailed analysis of ordering constraints What are necessary ordering constraints? How are they enforced? Especially, commit order tracking for DCT

Conclusions Minimize ordering constrains for performance Optimize transactions for ordering constrains Deferred Commit Transactions – 2.5x speedup Trade-offs between persistency models

Questions?

Thank you!