Transactional Memory: How to Perform Load Adaption in a Simple And Distributed Manner. Johannes Schneider, David Hasenfratz, Roger Wattenhofer.

Presentation transcript:

2 Without easy and efficient parallel programming methods… “computer science will become washing machine science.”

3 How to handle access to shared data?
- Locks, monitors, …
- Coarse-grained vs. fine-grained locking
  - Coarse-grained: easy, but slow programs
  - Fine-grained: demanding and time-consuming, but fast programs
- Problems
  - Difficult
  - Error prone
  - Composability
  - …
- Coarse-grained example: lock all data; modify/use data; unlock all data. Only 1 thread can execute.
- Fine-grained example
  - Thread 1: lock A; lock B; modify/use A,B; lock C; modify/use A,B,C; unlock A; modify/use B,C; unlock B,C
  - Thread 2: lock B; lock A; modify/use A,B; unlock A,B
  - Deadlock!
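
The deadlock on this slide can be reproduced in a few lines. The following is a minimal sketch, not code from the talk: the names `lock_a`, `lock_b`, `worker` and `run_opposite_order` are hypothetical, and a 0.2 s acquire timeout stands in for "blocked forever" so that the demonstration terminates. Two barriers force the unlucky interleaving in which each thread already holds its first lock.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_hold_first = threading.Barrier(2)    # both threads hold their first lock
both_tried_second = threading.Barrier(2)  # both threads tried the second lock
got_second = {}


def worker(name, first, second):
    with first:
        both_hold_first.wait()
        # Assumption: a timeout replaces an indefinite block so the demo ends.
        acquired = second.acquire(timeout=0.2)
        got_second[name] = acquired
        # Keep holding `first` until the other thread has tried as well.
        both_tried_second.wait()
        if acquired:
            second.release()


def run_opposite_order():
    # Thread 1 locks A then B, thread 2 locks B then A (as on the slide).
    t1 = threading.Thread(target=worker, args=("thread1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("thread2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return dict(got_second)
```

With plain blocking `acquire()` calls instead of the timeout, neither thread would ever proceed. Acquiring the locks in the same global order in both threads avoids the problem, but that discipline is exactly what becomes hard to maintain when code is composed.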

4 Transactional memory (TM) - a possible solution
- Simple for the programmer
- Composable
- Idea from database community
- Many TM systems (internally) still use locks
- But the TM system (not the programmer) takes care of
  - Performance
  - Correctness (no deadlocks...)
- Basic usage: Begin transaction; modify/use data; End transaction
- Composition
  - Method A.x(): Begin transaction; B.y(); …; End transaction
  - Method B.y(): Begin transaction; …; End transaction

5 Transactional memory systems
- If transactions modify
  - different data, everything is ok
  - the same data, conflicts arise that must be resolved
- Transactions might get delayed or aborted
- Job of a contention manager
- A transaction keeps track of all modified values
- It restores all values, if it is aborted
- A transaction successfully finishes with a commit
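
The bookkeeping described on this slide (track modified values, restore them on abort, finish with a commit) can be sketched with an undo log. This is an illustrative sketch only: the class `Transaction`, its methods, and the dictionary used as shared memory are assumptions, not the API of any particular TM system, and conflict detection is left out.

```python
class Transaction:
    """Hypothetical single-threaded model of a transaction with an undo log."""

    def __init__(self, memory):
        self.memory = memory  # shared state: variable name -> value
        self.undo_log = {}    # variable name -> value before the first write

    def write(self, name, value):
        # Remember the original value only once, before the first write.
        if name not in self.undo_log:
            self.undo_log[name] = self.memory[name]
        self.memory[name] = value

    def abort(self):
        # Restore every modified variable to its pre-transaction value.
        for name, old_value in self.undo_log.items():
            self.memory[name] = old_value
        self.undo_log.clear()

    def commit(self):
        # The changes stay in memory; the undo information is discarded.
        self.undo_log.clear()


def abort_example():
    memory = {"A": 1, "B": 1}
    tx = Transaction(memory)
    tx.write("B", 2)
    tx.write("B", 5)
    tx.abort()  # B is restored to 1
    return memory


def commit_example():
    memory = {"A": 1, "B": 1}
    tx = Transaction(memory)
    tx.write("A", 3)
    tx.commit()  # A stays 3
    return memory
```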

6 Conflicts – A contention manager decides
- Abort or delay a transaction, i.e. adapt load
- Distributed: each thread has its own manager (Manager 1, Manager 2)
- Example, initially A=1, B=1
  - Trans. 1 executes … A:=2 …; Trans. 2 executes B:=2 … A:=3 …; the accesses to A conflict
  - Either Trans. 1 is aborted (undo all changes, i.e. set A:=1) and restarted (after a while)
  - Or Trans. 2 is aborted (set B:=1) and restarted, OR it waits and retries
- Delay to adapt load!

7 Prior work
- Contention managers [PODC03, PODC05, ISAAC09, …]
  - System load was not (explicitly) considered
- Load adaption (based on contention)
  - Estimate contention intensity: CI [SPAA08]
    - If abort: CI = a·CI + (1-a), with parameter a ∈ [0,1]
    - If commit: CI = a·CI
    - If CI > parameter b, then resort to a central scheduler
  - Keep a transaction queue per core [PODC08]
    - A central dispatcher assigns transactions to a core, i.e. its queue
    - Each core iteratively executes transactions from its queue
    - If transaction A on core 1 is aborted due to B on core 2, then A is appended to the queue of core 2
- The central scheduler will become a bottleneck
- Figure: queues of Core 1 and Core 2 holding transactions A, B, C, D, before and after B aborts A
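
The contention-intensity update quoted from [SPAA08] is an exponential moving average of abort events. The sketch below only restates the two update rules and the threshold test from the slide; the function names and the value a = 0.5 used in the example are assumptions chosen for illustration.

```python
def update_ci(ci, aborted, a=0.5):
    """Return the new contention intensity after an abort or a commit.

    `a` is the weight parameter from the slide and must lie in [0, 1];
    the default 0.5 is an arbitrary illustrative choice.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if aborted:
        return a * ci + (1.0 - a)  # abort: CI moves towards 1
    return a * ci                  # commit: CI decays towards 0


def use_central_scheduler(ci, b):
    """Threshold test from the slide: fall back to the scheduler if CI > b."""
    return ci > b


def ci_trace(events, a=0.5):
    """Apply a list of events (True = abort, False = commit) starting at 0."""
    ci = 0.0
    trace = []
    for aborted in events:
        ci = update_ci(ci, aborted, a)
        trace.append(ci)
    return trace
```

For example, with a = 0.5 the event sequence abort, abort, commit gives CI values 0.5, 0.75 and 0.375, so a hypothetical threshold b = 0.7 would be exceeded only after the second abort.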

8 This paper
- Theoretical analysis
- Decentralized (simple) approaches to load adaption based on contention

9 Strategies
- Ignore: do not learn from conflicts
  - ImmediateRestart
- Stay real: remember faced conflicts
  - SerializeFacedConflicts
  - Do not schedule prior conflicting transactions concurrently
- Be cautious: assume additional conflicts
  - SerializeAll
  - All transactions in a subgraph are assumed to conflict
- Conflict graph example (transactions A, B, C, D): A conflicted with C, D conflicted with B
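
The difference between the three strategies can be made concrete as a test of whether two transactions may be scheduled concurrently. The sketch below is an interpretation, not the authors' implementation: it assumes that "subgraph" means a connected component of the conflict graph, and the function names are hypothetical.

```python
def build_graph(conflicts):
    """Build an undirected conflict graph from (u, v) conflict pairs."""
    graph = {}
    for u, v in conflicts:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def component(graph, start):
    """Return all transactions reachable from `start` via conflict edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def may_run_together(graph, x, y, strategy):
    if strategy == "ImmediateRestart":
        return True  # nothing is learned from past conflicts
    if strategy == "SerializeFacedConflicts":
        return y not in graph.get(x, set())  # only observed conflicts count
    if strategy == "SerializeAll":
        # Assumption: everything in the same connected component conflicts.
        return y not in component(graph, x)
    raise ValueError("unknown strategy: " + strategy)
```

If A conflicted with C and C conflicted with B, SerializeFacedConflicts still lets A and B run concurrently, whereas SerializeAll (under the connected-component reading) keeps them apart.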

10 Load Adaption Strategies
- AbortBackoff
  - If aborted, wait for a random time in [0, 2^#aborts]
  - Priority = number of aborts (#aborts)
- Who wins a conflict?
  - 2 strategies
    - Estimate the work done
    - Unrelated to work done
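
The waiting rule of AbortBackoff is a randomized exponential backoff. A minimal sketch follows, assuming the delay is drawn uniformly from the interval and measured in abstract time units; the slide does not state the distribution or the unit, and the function name is hypothetical.

```python
import random


def backoff_delay(num_aborts, rng=random):
    """Draw a waiting time from [0, 2**num_aborts].

    Assumption: uniform distribution over the interval, abstract time units.
    """
    if num_aborts < 0:
        raise ValueError("num_aborts must be non-negative")
    return rng.uniform(0.0, 2.0 ** num_aborts)
```

Because the interval doubles with every abort, a transaction that keeps losing conflicts retries less and less often, which lowers the load on the contended data.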

11 Theory Part - Model
- n transactions (and threads)
  - Start concurrently on n cores
- Transaction
  - Sequence of operations
  - An operation takes 1 time unit
  - Duration (number of operations) t_T is fixed
  - 2 types of operations
    - Write = modify (shared) resource and lock it until commit
    - Compute/abort/commit
- Ignore overhead of load adaption
  - Remembering transactions, scheduling, …
- Figure: cores 1, 2, …, n, each running one transaction (A, B, …, Z)

12 Moderate parallelism
- Shared counter
  - Conflicts directly after transaction start
- Linked list
  - Conflicts at arbitrary time
- Expected time span until all transactions committed
- Speed-up log n (at best)
- Table: columns Policy, Counter, List; rows ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll (cell entries not preserved in the transcript)
- Legend: t_T = transaction run time, n = #transactions

13 Substantial parallelism
- Worst case
  - Conflict graph is a d-ary tree of logarithmic height
- Exponential gap in worst case
  - SerializeAll and others
- Table: columns Policy, Time until transactions committed; rows ImmediateRestart, AbortBackoff, SerializeFacedConflicts, SerializeAll (cell entries not preserved in the transcript)
- Figure: conflict tree over transactions T1, T2, T3, T4, T5, …

14 Practical investigation
- Remembering conflicts causes too much overhead
  - Good for analysis but not for implementation
- Quickadapter
  - Serializes transactions
  - Each core has a “waiting” flag
  - If aborted, set flag and wait until flag unset
  - If commit, unset some flag
- AbortBackoff
- (Also considered some variants)
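
The flag protocol of Quickadapter can be modelled as a small state machine. The sketch below is a single-threaded model under stated assumptions: the slide says only that a commit unsets "some" flag, so picking a random waiting core is a guess, and the class and method names are hypothetical. A real implementation would use atomic flags and have the aborted thread spin or sleep on its own flag.

```python
import random


class QuickadapterFlags:
    """Hypothetical model of the per-core waiting flags."""

    def __init__(self, num_cores, rng=None):
        self.waiting = [False] * num_cores
        self.rng = rng or random.Random()

    def on_abort(self, core):
        # The aborted core sets its flag and must wait until it is unset.
        self.waiting[core] = True

    def may_run(self, core):
        return not self.waiting[core]

    def on_commit(self, core):
        # Assumption: unset the flag of one randomly chosen waiting core.
        blocked = [c for c, flag in enumerate(self.waiting) if flag]
        if not blocked:
            return None
        woken = self.rng.choice(blocked)
        self.waiting[woken] = False
        return woken
```

Each abort removes one core from the set of running cores and each commit readmits at most one, so the number of concurrently running transactions shrinks under contention and grows again when transactions succeed.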

15 Practical investigation
- Evaluation on a 16-core machine
- DSTM2 system
  - Visible readers
- Six benchmarks
  - Little parallelism: Shared counter, Sorted List (accessed objects not released), Listcounter
  - Considerable parallelism: Red Black Tree, LFUCache, RandomAccessArray
- Compare new load adaption policies to existing contention managers

16 Discussion
- Hard to keep maximum throughput, also in [SPAA08, PODC08]
  - Even without conflicts
- Improvement for 1 benchmark worsens another
- On average better than schemes without load adaption

17 Conclusion
- Simple and distributed load adaption strategies
- Theory
  - (For now) constants and parameters matter a lot
- Practice
  - Hard to keep load at peak for all usage patterns

18 Thanks for your attention! Questions?

19 Analysis: AbortBackoff for the counter
- Recall: if aborted, wait for a random time in $[0, 2^{\#\text{aborts}}]$
- Assume $\#\text{aborts} \sim \log(n t_T) + x$ (for some $x$)
- Define $a(x)$ := fraction of active nodes
- $a(0) = 1$ (after time $\sim 2^{\log(n t_T)} = n t_T$ a constant fraction is still active)
- Chance of a conflict for the interval $[0, 2^{\#\text{aborts}}] = [0, 2^{\log(n t_T) + x}]$: $\sim \dfrac{a(x)\, n t_T}{2^{\log(n t_T) + x}} = \dfrac{a(x)}{2^{x}}$
- $a(x+1) = \dfrac{a(x)}{2^{x}} = \dfrac{1}{2^{\sum_{i=0}^{x} i}} \sim \dfrac{1}{2^{x^2}}$
- $a(\sqrt{\log n}) = \dfrac{1}{2^{(\sqrt{\log n})^2}} = \dfrac{1}{n}$
- Total time: $\sum_{i=0}^{\log(n t_T) + \sqrt{\log n}} (\text{length of interval } i) = \sum_{i=0}^{\log(n t_T) + \sqrt{\log n}} 2^{i} \approx n t_T \, 2^{\sqrt{\log n} + 1}$
- Figure (transactions T1, T2, T3 in one interval): $a(x)\, n t_T = \dfrac{3}{n}\, n t_T = 3 t_T$