Consistency Oblivious Programming
Hillel Avni, Tel Aviv University

Agenda: Transactional Memory and Locking; Consistency Oblivious Programming (COP); COP with STM; COP with HTM; Future Work.

Global Lock: easy to use; composable (concatenate critical sections); not scalable.
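
A minimal C sketch of that composition follows. It assumes POSIX threads and a hypothetical two-account example. The names balance, withdraw_body, deposit_body, deposit and transfer are illustrative and do not come from the slides. Each operation's critical section is written as a body that assumes the caller already holds the one global lock. Composing two operations is then literally concatenating their bodies under a single lock/unlock pair:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state: two balances guarded by ONE global lock. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static int balance[2] = {100, 0};

/* Bodies of the individual critical sections; the caller holds global_lock. */
static bool withdraw_body(int acct, int amount) {
    if (balance[acct] < amount) return false;
    balance[acct] -= amount;
    return true;
}
static void deposit_body(int acct, int amount) { balance[acct] += amount; }

/* A stand-alone operation is just its body wrapped in the global lock. */
static void deposit(int acct, int amount) {
    pthread_mutex_lock(&global_lock);
    deposit_body(acct, amount);
    pthread_mutex_unlock(&global_lock);
}

/* Composition: concatenate both bodies inside a single critical section,
   so no other thread can observe the amount "in flight". */
static bool transfer(int from, int to, int amount) {
    assert(amount >= 0);
    pthread_mutex_lock(&global_lock);
    bool ok = withdraw_body(from, amount);
    if (ok) deposit_body(to, amount);
    pthread_mutex_unlock(&global_lock);
    return ok;
}
```

For example, transfer(0, 1, 30) moves 30 from account 0 to account 1 atomically. The same code also shows the cost: every operation on every account serializes on global_lock.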

Fine-Grained Locking: hard to use; not composable; scalable. The lazy linked list is a good example.

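Lazy Traversal: add(c) on the list a, b, d, e walks the list without taking any locks and stops at b, the last node before c's position ("Aha!").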

Lock and Validate: add(c) locks b and d, then validates the link ("Yes, b still points to d").

Perform Updates and Release Locks: add(c) links the new node c between b and d, then releases the locks.
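
The three steps above (lazy traversal, lock and validate, update and release) can be sketched in C as a lazy-list add. The sketch makes several assumptions:
- integer keys instead of letters;
- head and tail sentinels holding INT_MIN and INT_MAX;
- one pthread mutex per node;
- C11 atomics for the fields that the lock-free traversal reads.

It is illustrative only. Nodes are never freed, and remove (which would set the marked flag) is omitted.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    int key;
    _Atomic(struct node *) next;   /* read without locks during traversal */
    atomic_bool marked;            /* logical-deletion mark (set by remove) */
    pthread_mutex_t lock;
} node;

static node *new_node(int key, node *next) {
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    atomic_init(&n->next, next);
    atomic_init(&n->marked, false);
    pthread_mutex_init(&n->lock, NULL);
    return n;
}

/* Empty list: head (INT_MIN) -> tail (INT_MAX). */
static node *list_new(void) { return new_node(INT_MIN, new_node(INT_MAX, NULL)); }

/* Both nodes are still in the list and still adjacent. */
static bool validate(node *pred, node *curr) {
    return !atomic_load(&pred->marked) && !atomic_load(&curr->marked) &&
           atomic_load(&pred->next) == curr;
}

static bool list_add(node *head, int key) {
    assert(key > INT_MIN && key < INT_MAX);     /* sentinel keys are reserved */
    for (;;) {
        /* 1. Lazy traversal: no locks taken. */
        node *pred = head;
        node *curr = atomic_load(&pred->next);
        while (curr->key < key) {
            pred = curr;
            curr = atomic_load(&curr->next);
        }
        /* 2. Lock pred and curr (always in list order), then validate. */
        pthread_mutex_lock(&pred->lock);
        pthread_mutex_lock(&curr->lock);
        bool done = false, added = false;
        if (validate(pred, curr)) {
            done = true;
            if (curr->key != key) {             /* 3. Link the new node in. */
                atomic_store(&pred->next, new_node(key, curr));
                added = true;
            }
        }
        pthread_mutex_unlock(&curr->lock);
        pthread_mutex_unlock(&pred->lock);
        if (done) return added;                 /* validation failed: retry */
    }
}
```

Because the traversal holds no locks, validation is what makes the update safe. It confirms that pred and curr are both unmarked and that pred still points to curr.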

Transactional Memory: easy to use; composable; scalable. How is it done?

Java (Deuce), where location is a shared field:
    boolean CAS(int expected, int newVal) {
      atomic {
        if (location != expected) return false;
        location = newVal;
      }
      return true;
    }

C/C++ (GCC 4.7):
    bool CAS(int *location, int expected, int new_val) {
      __transaction_atomic {
        if (*location != expected) return false;
        *location = new_val;
      }
      return true;
    }

Software Transactional Memory: different algorithms are used for consistency checking and rollback. The compiler recognizes shared accesses.

STM Problem – Overhead: the load function from GCC (libitm):
    template <typename V>
    static V load(const V* addr, ls_modifier mod)
    {
      if (unlikely(mod == RfW))
        {
          pre_write(addr, sizeof(V));
          return *addr;
        }
      if (unlikely(mod == RaW))
        return *addr;
      gtm_thread *tx = gtm_thr();
      gtm_rwlog_entry* log = pre_load(tx, addr, sizeof(V));
      V v = *addr;
      atomic_thread_fence(memory_order_acquire);
      post_load(tx, log);
      return v;
    }

load always calls pre_load:
    static gtm_rwlog_entry*
    pre_load(gtm_thread *tx, const void* addr, size_t len)
    {
      size_t log_start = tx->readlog.size();
      gtm_word snapshot = tx->shared_state.load(memory_order_relaxed);
      gtm_word locked_by_tx = ml_mg::set_locked(tx);
      size_t orec = ml_mg::get_orec(addr);
      size_t orec_end = ml_mg::get_orec_end(addr, len);
      do
        {
          gtm_word o = o_ml_mg.orecs[orec].load(memory_order_acquire);
          if (likely (!ml_mg::is_more_recent_or_locked(o, snapshot)))
            {
            success:
              gtm_rwlog_entry *e = tx->readlog.push();
              e->orec = o_ml_mg.orecs + orec;
              e->value = o;
            }
          else if (!ml_mg::is_locked(o))
            {
              snapshot = extend(tx);
              goto success;
            }
          else
            {
              if (o != locked_by_tx)
                tx->restart(RESTART_LOCKED_READ);
            }
          orec = o_ml_mg.get_next_orec(orec);
        }
      while (orec != orec_end);
      return &tx->readlog[log_start];
    }

and post_load:
    static void post_load(gtm_thread *tx, gtm_rwlog_entry* log)
    {
      for (gtm_rwlog_entry *end = tx->readlog.end(); log != end; log++)
        {
          gtm_word o = log->orec->load(memory_order_relaxed);
          if (log->value != o)
            tx->restart(RESTART_VALIDATE_READ);
        }
    }
Compare all of this to a single mov eax, [ebx] for a plain load on x86.

Hardware Transactional Memory: exploits native cache coherence for consistency checking and rollback.

HTM Problem – Resource Limits: the cache size limits the data footprint, and the scheduling quantum limits the duration. A transaction cannot commit if it is too big or too slow.

All TM Problem – False Conflicts: any address encountered during a transaction is monitored until the end of that transaction, so an address may abort a transaction long after it has stopped being relevant.
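
To make the false-conflict problem concrete, here is a small single-threaded simulation in C. It assumes conflict detection at node granularity (real HTM tracks cache lines) and a hypothetical read_set type with a fixed capacity. A transactional search of a sorted list keeps every node it passed in its read set. As a result, a write to an early node still counts as a conflict after the search has moved far beyond it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_READS 1024

/* Hypothetical read set: addresses this transaction has read so far. */
typedef struct {
    const void *addr[MAX_READS];
    size_t n;
} read_set;

static void track_read(read_set *rs, const void *p) {
    assert(rs->n < MAX_READS);
    rs->addr[rs->n++] = p;
}

/* TM conflict rule: a concurrent write to ANY address in the read set
   aborts the transaction, whether or not that address still matters. */
static bool write_conflicts(const read_set *rs, const void *written) {
    for (size_t i = 0; i < rs->n; i++)
        if (rs->addr[i] == written) return true;
    return false;
}

typedef struct lnode { int key; struct lnode *next; } lnode;

/* Transactional search: every node on the path enters the read set
   and stays there until the transaction ends. */
static const lnode *tx_search(const lnode *head, int key, read_set *rs) {
    const lnode *curr = head;
    track_read(rs, curr);
    while (curr->next != NULL && curr->next->key <= key) {
        curr = curr->next;
        track_read(rs, curr);
    }
    return curr;
}
```

Searching for 90 in a 100-node list leaves 91 nodes in the read set. A write to node 2 still conflicts, although the search result no longer depends on it.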

Agenda: Transactional Memory and Locking; Consistency Oblivious Programming (COP); COP with STM; COP with HTM; Future Work.

COP Operation: In non-transactional mode, execute the read-only prefix (ROP) of the operation and record its output. In transactional mode, verify that the output is correct, then perform the updates.

COP Example – RB Tree

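Add 26 – Tree Unbalanced (TM search): a concurrent add(26) leaves the tree temporarily unbalanced while a TM search is in progress.

Tree Balanced (TM search): after rebalancing, the TM search continues from 27, detects a conflict, and aborts.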

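Add 26 – Tree Unbalanced (COP search): the same add(26) runs while a COP search is in progress.

Tree Balanced (COP search): after rebalancing, the COP search continues from 27 and finds its target.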

COP RB-Tree Verify: To facilitate verification, all nodes in the RB-Tree are connected in a successor-predecessor doubly linked list, and each node has a live mark. The search returns either a node n holding k, or a leaf holding k's successor or predecessor.

COP RB-Tree Suffix: Resume the transaction and verify:
– k found and n is live: done.
– k not found: check (n.k > k > n.pred.k && !n.left) or (n.k < k < n.succ.k && !n.right).
If verification fails, abort the transaction. Otherwise, complete the updates (add / remove / rebalance) using n.
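
The verification step can be written as a plain predicate. The sketch below makes these assumptions:
- a hypothetical node layout with left/right children, pred/succ links for the successor-predecessor list, and a live flag;
- a missing pred or succ is treated as minus or plus infinity;
- the live mark is also checked in the not-found case, since a removed node's links may be stale. This is a cautious choice on our part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout: BST links plus the successor-predecessor
   list and the live mark used for verification. */
typedef struct rbnode {
    int key;
    struct rbnode *left, *right;   /* tree children */
    struct rbnode *pred, *succ;    /* in-order neighbours */
    bool live;                     /* cleared when the node is removed */
} rbnode;

/* Verify output n of the non-transactional search for k.
   Runs inside the resumed transaction; on false, the caller aborts. */
static bool cop_verify(const rbnode *n, int k) {
    assert(n != NULL);
    if (!n->live) return false;
    if (n->key == k) return true;                     /* k found, n live */
    if (k < n->key)                                   /* k would be n's left child */
        return n->left == NULL && (n->pred == NULL || n->pred->key < k);
    return n->right == NULL && (n->succ == NULL || k < n->succ->key);
}
```

For the tree 20 (left 10, right 30), a search for 15 that ended at 10 verifies, because 10 has no right child and its successor is 20. A stale answer of 20 fails, because 20 has a left child.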

COP Template for op:
    start-transaction
    any-code
    suspend-transaction
    output = op-rop()
    resume-transaction
    if (not(op-verify(output))) abort-transaction
    op-complete(output)
    any-code
    end-transaction
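
Here is a single-operation instance of the template, sketched in C. The point is the shape of the code rather than a particular TM, so the transaction is replaced by one pthread mutex. This is an assumption, not how the slides run it. Under that substitution:
- suspend/resume become "run the ROP before acquiring the lock";
- abort-transaction becomes "unlock and retry", which is safe only because nothing is written before op-verify.

The operation is a hypothetical insert into a sorted array. Elements are C11 atomics so that the unsynchronized ROP reads whole words, in the spirit of transactional regular registers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CAP 64

/* Hypothetical shared sorted set. */
static atomic_int elems[CAP];
static atomic_int count;
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for the TM */

/* op-rop: first index whose element is >= k; unsynchronized, may be stale. */
static int set_rop(int k) {
    int lo = 0, hi = atomic_load_explicit(&count, memory_order_relaxed);
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (atomic_load_explicit(&elems[mid], memory_order_relaxed) < k) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* op-verify: pos is still the correct position for k. */
static bool set_verify(int k, int pos) {
    int n = atomic_load(&count);
    return pos <= n &&
           (pos == 0 || atomic_load(&elems[pos - 1]) < k) &&
           (pos == n || atomic_load(&elems[pos]) >= k);
}

/* op-complete: insert k at pos unless it is already present. */
static bool set_complete(int k, int pos) {
    int n = atomic_load(&count);
    if (pos < n && atomic_load(&elems[pos]) == k) return false;
    assert(n < CAP);
    for (int i = n; i > pos; i--)
        atomic_store(&elems[i], atomic_load(&elems[i - 1]));
    atomic_store(&elems[pos], k);
    atomic_store(&count, n + 1);
    return true;
}

static bool cop_insert(int k) {
    for (;;) {
        int pos = set_rop(k);                    /* output = op-rop() */
        pthread_mutex_lock(&tx_lock);            /* resume-transaction */
        if (set_verify(k, pos)) {
            bool added = set_complete(k, pos);   /* op-complete(output) */
            pthread_mutex_unlock(&tx_lock);      /* end-transaction */
            return added;
        }
        pthread_mutex_unlock(&tx_lock);          /* abort-transaction, retry */
    }
}
```

Only op-verify and op-complete run inside the critical section. The binary search, which is the bulk of the work, stays outside it.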

COP Correctness: The underlying TM must provide transactional regular registers. The COP algorithm must demonstrate obliviousness, verifiability, and separation. We prove that if the TM yields transactional regular registers, and the COP algorithm demonstrates obliviousness, verifiability, and separation, then the COP operation is linearizable.

Agenda: Transactional Memory and Locking; Consistency Oblivious Programming (COP); COP with STM; COP with HTM; Future Work.

STM Algorithm: GCC's default STM algorithm is the one that proved most efficient and scalable in most scenarios: Write-Through (WT), Encounter-Time Locking (ETL), Multi-Lock (ML).

STM: WT – ETL – ML
1. RV ← shared version clock.
2. On read: check that the location is unlocked and v# <= RV, then add it to the read-set.
3. On write: check v# <= RV, lock, and add to the undo-set.
4. WV = F&I(VClock) (fetch-and-increment).
5. Validate that each v# in the read-set is <= RV.
6. Release the locks with v# ← WV.
(Figure: the shared version clock at 100, memory locations X and Y with their version numbers (V#) and locks, and the commit step.)
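
The six steps can be sketched in C as a tiny word-based STM. Everything here is hypothetical and heavily simplified: fixed-size logs, a hashed table of 256 ownership records, and no contention management. It follows the WT-ETL-ML steps listed above and is not GCC's libitm code. Reads and writes return false on conflict, after which the caller must call tx_abort. On abort, the sketch releases locks with a fresh clock value rather than the old version. That way, a reader that raced with the rolled-back writes fails its post-read check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NORECS 256   /* ownership records (orecs), hashed by address */
#define MAXLOG 64    /* per-transaction log capacity */

/* Orec word: version << 1 when free, (tx id << 1) | 1 when locked. */
static atomic_uintptr_t orecs[NORECS];
static atomic_uintptr_t vclock;                 /* shared version clock */

typedef struct { atomic_long *addr; long old; } undo_entry;

typedef struct {
    uintptr_t id, rv;
    atomic_uintptr_t *reads[MAXLOG]; size_t nr;  /* read-set */
    undo_entry undo[MAXLOG];         size_t nu;  /* undo-set */
    atomic_uintptr_t *locks[MAXLOG]; size_t nl;  /* orecs we hold */
} tx_t;

static atomic_uintptr_t *orec_of(atomic_long *a) { return &orecs[((uintptr_t)a >> 3) % NORECS]; }
static bool is_locked(uintptr_t o) { return o & 1; }
static uintptr_t version(uintptr_t o) { return o >> 1; }
static uintptr_t my_lock(const tx_t *tx) { return (tx->id << 1) | 1; }

/* 1. RV <- shared version clock. */
static void tx_begin(tx_t *tx, uintptr_t id) {
    tx->id = id;
    tx->nr = tx->nu = tx->nl = 0;
    tx->rv = atomic_load(&vclock);
}

/* 2. On read: unlocked and v# <= RV, then add to the read-set. */
static bool tx_read(tx_t *tx, atomic_long *a, long *out) {
    atomic_uintptr_t *orec = orec_of(a);
    uintptr_t o = atomic_load(orec);
    if (o == my_lock(tx)) { *out = atomic_load(a); return true; }  /* memory holds our write */
    if (is_locked(o) || version(o) > tx->rv) return false;
    *out = atomic_load(a);
    if (atomic_load(orec) != o) return false;                      /* changed while reading */
    assert(tx->nr < MAXLOG);
    tx->reads[tx->nr++] = orec;
    return true;
}

/* 3. On write: v# <= RV, lock now (encounter time), log old value, write through. */
static bool tx_write(tx_t *tx, atomic_long *a, long val) {
    atomic_uintptr_t *orec = orec_of(a);
    uintptr_t o = atomic_load(orec);
    if (o != my_lock(tx)) {
        if (is_locked(o) || version(o) > tx->rv) return false;
        if (!atomic_compare_exchange_strong(orec, &o, my_lock(tx))) return false;
        assert(tx->nl < MAXLOG);
        tx->locks[tx->nl++] = orec;
    }
    assert(tx->nu < MAXLOG);
    tx->undo[tx->nu++] = (undo_entry){ a, atomic_load(a) };
    atomic_store(a, val);                                          /* write-through */
    return true;
}

/* 4-6. WV = F&I(VClock), validate the read-set, release locks with v# <- WV.
   On false the locks are still held and the caller must call tx_abort. */
static bool tx_commit(tx_t *tx) {
    uintptr_t wv = atomic_fetch_add(&vclock, 1) + 1;
    for (size_t i = 0; i < tx->nr; i++) {
        uintptr_t o = atomic_load(tx->reads[i]);
        if (o != my_lock(tx) && (is_locked(o) || version(o) > tx->rv)) return false;
    }
    for (size_t i = 0; i < tx->nl; i++) atomic_store(tx->locks[i], wv << 1);
    tx->nl = 0;
    return true;
}

/* Abort: undo write-through stores newest first, then release the locks
   with a fresh version so a concurrent reader cannot miss the rollback. */
static void tx_abort(tx_t *tx) {
    while (tx->nu > 0) {
        undo_entry *u = &tx->undo[--tx->nu];
        atomic_store(u->addr, u->old);
    }
    uintptr_t nv = atomic_fetch_add(&vclock, 1) + 1;
    for (size_t i = 0; i < tx->nl; i++) atomic_store(tx->locks[i], nv << 1);
    tx->nl = 0;
}
```

Every tx_read and tx_write here does hashing, orec loads, and logging around one memory access. That is the same overhead the libitm load path shows.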

GCC Constructs:
– __transaction_atomic {}: marks the transaction.
– __transaction_cancel: explicit abort.
– __attribute__((transaction_safe)): instrument the code.
– __attribute__((transaction_pure)): do not instrument the code.
We will show that this last attribute can be used efficiently as __transaction_suspend with the WT – ETL – ML default STM algorithm in GCC.

pure = suspend: Transactional regular registers – all values up to one architecture word in size are written and read atomically. The rollback may use memcpy, but memcpy is optimized to write at maximal alignment. Next, we compare the suspended mode of the future Power architecture HTM to transaction_pure under the WT – ETL – ML STM algorithm.

Power tsuspend – tresume:
1. Until failure occurs, load instructions that access memory locations that were transactionally written by the same thread will return the transactionally written data.
2. In the event of transaction failure, failure recording is performed, but failure handling is deferred until transactional execution is resumed.
3. The initiation of a new transaction is prevented.
4. Store instructions that access memory locations that have been accessed transactionally (by a load or store) by the same thread will cause the transaction to fail.

Figure: RB-Tree, size 1M, 20% updates, 10 operations per transaction.

Figure: RB-Tree, size 1K, 8 threads, 20% updates.

Agenda: Transactional Memory and Locking; Consistency Oblivious Programming (COP); COP with STM; COP with HTM; Future Work.

Haswell HTM with COP: There is no suspend mode, so to compose COP operations we execute all ROPs before the transaction. This limits composition to at most one writing COP operation per transaction.
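
The limitation can be seen in a single-threaded C sketch. It assumes a hypothetical sorted linked list whose first node is a sentinel with a key smaller than every inserted key. Both ROPs run before the transaction begins. The transaction itself is elided, and an explicit unlink stands in for the hardware rollback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sorted list; the first node is a sentinel. */
typedef struct snode { int key; bool live; struct snode *next; } snode;

/* op-rop: last node with key < k. */
static snode *rop_find_pred(snode *head, int k) {
    snode *p = head;
    while (p->next != NULL && p->next->key < k) p = p->next;
    return p;
}

/* op-verify: pred is live and k still belongs immediately after it. */
static bool verify_pred(const snode *pred, int k) {
    return pred->live && pred->key < k &&
           (pred->next == NULL || pred->next->key >= k);
}

/* op-complete: link k after pred; returns the new node, or NULL if k exists. */
static snode *complete_insert(snode *pred, int k) {
    if (pred->next != NULL && pred->next->key == k) return NULL;
    snode *n = malloc(sizeof *n);
    assert(n != NULL);
    *n = (snode){ k, true, pred->next };
    pred->next = n;
    return n;
}

/* Two COP inserts in one transaction WITHOUT suspend: both ROPs run
   before the transaction, so the second ROP never sees the first insert.
   Returns false where the hardware transaction would abort. */
static bool composed_insert2(snode *head, int k1, int k2) {
    snode *p1 = rop_find_pred(head, k1);
    snode *p2 = rop_find_pred(head, k2);
    /* start-transaction */
    if (!verify_pred(p1, k1)) return false;
    snode *n1 = complete_insert(p1, k1);
    if (!verify_pred(p2, k2)) {
        if (n1 != NULL) { p1->next = n1->next; free(n1); }   /* rollback */
        return false;                                          /* abort */
    }
    complete_insert(p2, k2);
    /* end-transaction */
    return true;
}
```

On the list 0, 10, composed_insert2(&head, 5, 6) fails. Both ROPs return the sentinel, but once 5 is linked, 6 no longer belongs right after the sentinel. Verification therefore fails on every attempt, not only under contention. Inserts at unrelated positions, such as 5 and 20, happen to verify, but a composition cannot rely on that.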

Capacity and Cache Associativity: Packed Memory Array (PMA) search is done by divide and conquer. Assume a PMA of size 0x800000 that starts at address 0. A search for an item located in addresses 0x0…0x7FFF must go through the addresses 0x400000, 0x200000, 0x100000, 0x80000, 0x40000, 0x20000, 0x10000, 0x8000. As the L1 data cache size in Haswell is 0x8000, all these addresses have the same cache index (0), and the transaction will always abort.
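
The address arithmetic can be checked with a few lines of C. The sketch assumes Haswell's 32 KB (0x8000) L1 data cache is 8-way set associative with 64-byte lines. That gives 64 sets, and the set index is (address / 64) mod 64. It also assumes the search halves the range until a single cache line remains. Under those assumptions, every probe at a multiple of 4 KB lands in set 0:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L1D geometry: 32 KB (0x8000), 8-way, 64-byte lines => 64 sets. */
#define LINE_BYTES 64u
#define NUM_SETS   64u
#define WAYS       8u

static unsigned set_index(uint64_t addr) {
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Bisect [0, size) toward target until one cache line remains,
   counting the probed addresses that fall into the given cache set. */
static unsigned probes_in_set(uint64_t size, uint64_t target, unsigned set) {
    assert(target < size);
    uint64_t lo = 0, hi = size;
    unsigned hits = 0;
    while (hi - lo > LINE_BYTES) {
        uint64_t mid = lo + (hi - lo) / 2;     /* address probed at this step */
        if (set_index(mid) == set) hits++;
        if (target < mid) hi = mid; else lo = mid;
    }
    return hits;
}
```

For a target at 0x100, this counts 11 probes in set 0, from 0x400000 down to 0x1000. That is more than the 8 ways one set can hold, which is consistent with the capacity aborts the slide predicts.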

Figure: PMA.

Figure: RB-Tree capacity aborts.

Figure: RB-Tree conflict aborts.

Agenda: Transactional Memory and Locking; Consistency Oblivious Programming (COP); COP with STM; COP with HTM; Future Work.

Data Structures: We already have COP versions of the RB-Tree, linked list, PMA, cache-oblivious B-Tree, and Leaplist (a k-ary skip list tailored for range queries). Can we design more COP data structures?

Applications: Use COP in applications. Many applications use shared data structures, so it is interesting to see the impact of COP on their performance.

Infrastructure: Add statistics (transactional accesses, conflicts) to GCC. Add a real suspend mode to GCC and to hardware.

Theory: How can the transformation to COP be made automatic? Is COP applicable outside the data-structures area? Are there bounds on the number of transactional accesses? Bounds on the number of false conflicts?

Thank You