Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009.

Scheduling Memory Transactions Parallel computing day, Ben-Gurion University, October 20, 2009

Synchronization alternatives: Transactional Memory
- A (memory) transaction is a sequence of memory reads and writes, executed by a single thread, that either commits or aborts.
- If a transaction commits, all of its reads and writes appear to have executed atomically.
- If a transaction aborts, none of its operations take effect.
- A transaction's operations are not visible to other threads until it commits (if it does).
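To make the commit/abort semantics concrete, here is a minimal sketch of a transactional retry loop against a hypothetical word-based STM interface; the names tm_start, tm_load, tm_store and tm_commit are illustrative stand-ins, not the API of any particular STM mentioned in this talk.

```c
#include <stdbool.h>

/* Hypothetical word-based STM interface (illustrative names only),
 * assumed to be provided by some STM library. */
void tm_start(void);                    /* begin a transaction               */
long tm_load(long *addr);               /* transactional read                */
void tm_store(long *addr, long value);  /* transactional (deferred) write    */
bool tm_commit(void);                   /* try to commit; false means abort  */

static long account_a = 100, account_b = 0;

/* Atomically move 'amount' from account_a to account_b.  If the transaction
 * aborts, none of its writes take effect and the loop simply retries; until
 * tm_commit() succeeds, the writes are not visible to other threads. */
void transfer(long amount)
{
    for (;;) {
        tm_start();
        tm_store(&account_a, tm_load(&account_a) - amount);
        tm_store(&account_b, tm_load(&account_b) + amount);
        if (tm_commit())
            return;     /* both writes appear to execute atomically */
        /* aborted: discarded by the runtime, retry from the top    */
    }
}
```

Most STM libraries hide the retry loop behind a macro or language construct; it is spelled out here only to mirror the commit and abort cases listed on the slide.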

Transactional Memory Implementations
Hardware Transactional Memory
- Transactional Memory [Herlihy & Moss, '93]
- Transactional Memory Coherence and Consistency [Hammond et al., '04]
- Unbounded Transactional Memory [Ananian, Asanovic, Kuszmaul, Leiserson, Lie, '05]
- ...
Software Transactional Memory
- Software Transactional Memory [Shavit & Touitou, '97]
- DSTM [Herlihy, Luchangco, Moir, Scherer, '03]
- RSTM [Marathe et al., '06]
- WSTM [Harris & Fraser, '03], OSTM [Fraser, '04], ASTM [Marathe, Scherer, Scott, '05], SXM [Herlihy], ...

“Conventional” STM system high-level structure
[Diagram: OS-scheduler-controlled threads run transactions against the TM system; a Contention Detection component detects conflicts and asks the Contention Manager to arbitrate, and the outcome is either “proceed” or “abort/retry, wait”.]
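The arbitration step in the diagram can be pictured as a callback from conflict detection into the contention manager. The sketch below shows one possible shape of that hook with a Greedy-like "older transaction wins" policy; the struct fields, the retry threshold and the decision names are assumptions for illustration, not the interface of a specific STM.

```c
#include <stdint.h>

enum cm_decision { CM_ABORT_SELF, CM_ABORT_OTHER, CM_WAIT };

struct tx_desc {
    uint64_t start_time;   /* e.g., value of a global clock at tx begin */
    int      retries;      /* how many times this attempt has aborted   */
};

/* Called by conflict detection when 'self' and 'other' collide.
 * Toy policy: the older transaction wins; the younger one waits a
 * little before giving up and aborting itself. */
enum cm_decision contention_manager(const struct tx_desc *self,
                                    const struct tx_desc *other)
{
    if (self->start_time < other->start_time)
        return CM_ABORT_OTHER;       /* we are older: abort the other tx   */
    if (self->retries < 3)
        return CM_WAIT;              /* give the older transaction a chance */
    return CM_ABORT_SELF;            /* repeated losses: abort and retry    */
}
```

Published contention managers (Greedy, Karma, Polka, ...) differ mainly in how this decision is made; the scheduling-based approach in the rest of the talk changes what happens to the loser afterwards.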

Talk outline
- Preliminaries
- Memory Transactions Scheduling: Rationale
- CAR-STM
- Adaptive TM Schedulers
- TM-scheduling OS support

TM-ignorant schedulers are problematic!
TM-ignorant scheduling:
1) Does not permit serializing contention management and collision avoidance.
2) Makes it difficult to dynamically reduce the concurrency level.
3) Hurts TM performance stability/predictability.

Enter TM schedulers
- “Adaptive transaction scheduling for transactional memory systems” [Yoo & Lee, SPAA '08]
- “CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]
- “Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory” [Ansari et al., HiPEAC '09]
- “Preventing versus curing: avoiding conflicts in transactional memories” [Dragojevic, Guerraoui, Singh & Singh, PODC '09]
- “Transactional scheduling for read-dominated workloads” [Attiya & Milani, OPODIS '09]
- “On the impact of serializing contention management on STM performance” [Heber, Hendler & Suissa, OPODIS '09, to appear]
- “Scheduling support for transactional memory contention management” [Fedorova, Felber, Hendler, Lawall, Maldonado, Marlier, Muller & Suissa, PPoPP '10]

Our work
- “CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]
- “On the impact of serializing contention management on STM performance” [Heber, Hendler & Suissa, OPODIS '09]
- “Scheduling support for transactional memory contention management” [Fedorova, Felber, Hendler, Lawall, Maldonado, Marlier, Muller & Suissa, PPoPP '10]

CAR-STM (Collision Avoidance and Reduction for STM): Design Goals
- Limit parallelism to a single transaction per core (or hardware thread)
- Serialize conflicting transactions
- Contention avoidance

CAR-STM high-level architecture
[Diagram: one transaction queue per core (#1 ... #k), each served by a dedicated TQ thread; transaction threads hand their transactions (T-Info) to a Dispatcher, which consults a Collision Avoider; a Serializing Contention Manager operates across the queues.]

TQ-Entry Structure
[Diagram: the same architecture, zoomed in on a single queue entry. Each TQ entry holds a wrapper method, the transaction data, the T-Info, a reference to the transaction thread, and a lock with a condition variable.]

Transaction dispatching process
1) The transactional thread calls the Dispatcher with a T-Info pointer argument.
2) The Dispatcher calls the Collision Avoider.
3) The Collision Avoider calls an application-specific conflict-probability method.
4) The transaction is enqueued in the most-conflicting queue; the thread is put to sleep and the TQ thread is notified.

Transaction execution
1) The TQ thread executes the transaction (via the entry's wrapper method).
2) The TQ thread wakes up the transaction thread.
3) The TQ thread dequeues the entry.
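The following condensed C/pthreads sketch ties the last few slides together: the TQ-entry and per-core queue structures, a dispatcher that enqueues a transaction and puts its thread to sleep, and the TQ-thread loop that executes entries one at a time and wakes their owners. All names are invented for illustration, the collision avoider and the serializing contention manager are omitted, and the real CAR-STM implementation differs in its details.

```c
#include <pthread.h>
#include <stdbool.h>

struct tq_entry {                        /* one enqueued transaction          */
    void (*wrapper)(void *);             /* runs the transaction body         */
    void *tx_data;                       /* transaction data / T-Info         */
    pthread_mutex_t lock;                /* protects 'done'                   */
    pthread_cond_t  done_cv;             /* transaction thread sleeps here    */
    bool done;
    struct tq_entry *next;
};

struct tx_queue {                        /* one queue + TQ thread per core    */
    pthread_mutex_t lock;
    pthread_cond_t  nonempty_cv;         /* TQ thread sleeps here when empty  */
    struct tq_entry *head, *tail;
};

/* Dispatcher: the collision avoider would pick the queue whose transactions
 * are most likely to conflict with this one; here the queue is a parameter.
 * Enqueue, wake the TQ thread, then block until the transaction completes.
 * (The entry's mutex, condvar and 'done' flag are assumed initialized.) */
void dispatch(struct tx_queue *q, struct tq_entry *e)
{
    pthread_mutex_lock(&q->lock);
    e->next = NULL;
    if (q->tail) q->tail->next = e; else q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->nonempty_cv);    /* wake TQ thread if it sleeps   */
    pthread_mutex_unlock(&q->lock);

    pthread_mutex_lock(&e->lock);            /* put the transaction thread to sleep */
    while (!e->done)
        pthread_cond_wait(&e->done_cv, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* TQ thread: take entries one at a time, execute them, wake the owner.
 * Running a single entry at a time is what limits each core to one live
 * transaction. */
void *tq_thread(void *arg)
{
    struct tx_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL)              /* queue emptied: go to sleep    */
            pthread_cond_wait(&q->nonempty_cv, &q->lock);
        struct tq_entry *e = q->head;
        q->head = e->next;
        if (q->head == NULL) q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        e->wrapper(e->tx_data);              /* execute the transaction       */

        pthread_mutex_lock(&e->lock);        /* wake the transaction thread   */
        e->done = true;
        pthread_cond_signal(&e->done_cv);
        pthread_mutex_unlock(&e->lock);
    }
    return NULL;
}
```

Moving an entry from one queue to another (next slides) is how conflicting transactions end up serialized behind each other.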

Dispatcher / TQ-thread synchronization
1) When its TQ is emptied, the TQ thread goes to sleep.
2) When the Dispatcher adds a transaction to the queue, it wakes up the TQ thread.

Serializing Contention Managers
- When two transactions collide, fail the newer transaction and move it to the TQ of the older transaction.
- Fast elimination of livelock scenarios.
- Two SCMs implemented:
  - Basic (BSCM): move the failed transaction to the end of the other transaction's TQ.
  - Permanent (PSCM): make the failed transaction a subordinate transaction of the other transaction.

PSCM example: before
[Diagram: T_a is at the head of transaction queue #1 (core #1) and T_b at the head of transaction queue #k (core #k); T_c, T_d and T_e are also enqueued. Transactions a and b collide, and b is older.]

PSCM example: after
[Diagram: the losing transaction (T_a) and its subordinates (T_c) are made subordinates of the winning transaction (T_b), so they move to T_b's queue and will run right after it.]
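Here is a sketch of the two policies in isolation, assuming a pared-down entry type with a subordinate list (all field and function names invented); the surrounding queue/TQ-thread machinery from the CAR-STM sketch above is omitted.

```c
#include <stddef.h>

/* Pared-down queue entry: 'next' links entries inside a list, and
 * 'sub_head'/'sub_tail' hold the subordinates that must run right after
 * this transaction (used by PSCM only). */
struct scm_entry {
    struct scm_entry *next;
    struct scm_entry *sub_head, *sub_tail;
};

struct scm_queue { struct scm_entry *head, *tail; };

/* BSCM: the losing (newer) transaction is aborted and appended to the
 * *end* of the winner's per-core transaction queue. */
void bscm_resolve(struct scm_queue *winner_queue, struct scm_entry *loser)
{
    loser->next = NULL;
    if (winner_queue->tail) winner_queue->tail->next = loser;
    else                    winner_queue->head = loser;
    winner_queue->tail = loser;
}

/* PSCM: the loser -- together with any subordinates it already collected --
 * becomes a subordinate of the winner, so it will run immediately after the
 * winner commits rather than at the back of the queue. */
void pscm_resolve(struct scm_entry *winner, struct scm_entry *loser)
{
    /* append the loser itself to the winner's subordinate list */
    loser->next = NULL;
    if (winner->sub_tail) winner->sub_tail->next = loser;
    else                  winner->sub_head = loser;
    winner->sub_tail = loser;

    /* then splice in the loser's own subordinate chain, if any */
    if (loser->sub_head) {
        winner->sub_tail->next = loser->sub_head;
        winner->sub_tail       = loser->sub_tail;
        loser->sub_head = loser->sub_tail = NULL;
    }
}
```

The sketch assumes that, after a transaction commits, its TQ thread runs the committed transaction's subordinate chain before taking the next regular entry from its queue.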

Execution time: STMBench7 R/W dominated workloads

Throughput: STMBench7 R/W dominated workloads

CAR-STM Shortcomings
- May restrict parallelism too much:
  - at most a single transactional thread per core / hardware thread
  - transitive serialization
- High overhead
- Non-adaptive

Talk outline
- Preliminaries
- Memory Transactions Scheduling: Rationale
- CAR-STM
- Adaptive TM Scheduling
- TM-scheduling OS support

“On the Impact of Serializing Contention Management on STM Performance”
- CBench: a synthetic benchmark generating workloads with pre-determined transaction length and abort probability.
- A low-overhead serialization mechanism.
- A better understanding of adaptive serialization algorithms.
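The slides do not show CBench itself; the sketch below is only a guess at how a synthetic transaction with a fixed length and a tunable conflict probability could be structured (thread-private accesses plus an occasional write to one shared hot word), reusing the hypothetical tm_load/tm_store interface from the earlier sketch.

```c
#include <stdlib.h>

/* Hypothetical STM interface, as in the earlier transfer() sketch. */
long tm_load(long *addr);
void tm_store(long *addr, long value);

#define PRIVATE_WORDS 1024
static long hot_word;                                /* shared by all threads */
static __thread long private_words[PRIVATE_WORDS];   /* never conflicts       */
static __thread unsigned int rng_seed = 1;

/* Body of one synthetic transaction: 'len' accesses to thread-private data
 * (fixing the transaction length), and with probability 'p_conflict' one
 * write to the shared hot word (making the abort probability tunable).
 * Meant to be wrapped in the usual start/commit retry loop. */
void cbench_tx_body(int len, double p_conflict)
{
    for (int i = 0; i < len; i++) {
        long *w = &private_words[i % PRIVATE_WORDS];
        tm_store(w, tm_load(w) + 1);
    }
    if ((double)rand_r(&rng_seed) / RAND_MAX < p_conflict)
        tm_store(&hot_word, tm_load(&hot_word) + 1);
}
```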

A Low-Overhead Serialization Mechanism (LO-SER)
[Diagram: the transactional threads, each associated with its own condition variable.]

A Low-Overhead Serialization Mechanism (cont'd)
1) t identifies a collision.
2) t calls the contention manager, which returns ABORT_OTHER.
3) t changes the status of t' to ABORT (recording that t is the winner).
4) t' identifies that it was aborted.

A Low-Overhead Serialization Mechanism (cont'd)
5) t' rolls back its transaction and goes to sleep on the condition variable of t.
6) Eventually t commits and broadcasts on its condition variable...

A Low-Overhead Serialization Mechanism (cont'd)
[Diagram: t' has been woken by t's broadcast and can now retry its transaction.]

Requirements for the serialization mechanism
- Commit broadcasts only if the transaction won a collision since the last broadcast (or since the start of the transaction).
- No waiting cycles (deadlock freedom).
- Avoid race conditions.

LO-SER algorithm: data structures

LO-SER algorithm: pseudo-code

LO-SER algorithm: pseudo-code (cont'd)

LO-SER algorithm: pseudo-code (cont'd)
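The data structures and pseudo-code on the preceding slides appear as figures in the original deck. As a stand-in, here is a heavily simplified C/pthreads sketch of a LO-SER-style mechanism that follows the flow described above (the winner flags the loser, the loser sleeps on the winner's condition variable, the winner broadcasts at commit only if it won a collision). All names are invented, and memory-ordering details and several of the races that the real algorithm must handle are glossed over.

```c
#include <pthread.h>
#include <stdbool.h>

struct tx_thread {
    pthread_mutex_t lock;
    pthread_cond_t  commit_cv;        /* serialized losers sleep here           */
    unsigned long   commits;          /* incremented (under lock) on commit     */
    bool            won_since_bcast;  /* commit broadcasts only if this is set  */
    volatile bool   aborted;          /* set by a winner that aborted me        */
    struct tx_thread *winner;         /* whom to serialize behind               */
    unsigned long   winner_commits;   /* winner->commits observed at abort time */
};

/* Winner side: the contention manager returned ABORT_OTHER. */
void loser_abort(struct tx_thread *self, struct tx_thread *other)
{
    other->winner         = self;
    other->winner_commits = self->commits;  /* snapshot before flagging abort */
    other->aborted        = true;           /* other notices this, rolls back */
    self->won_since_bcast = true;
}

/* Loser side: after rolling back, sleep until the winner's next commit. */
void wait_for_winner(struct tx_thread *self)
{
    struct tx_thread *w = self->winner;
    pthread_mutex_lock(&w->lock);
    while (w->commits == self->winner_commits)
        pthread_cond_wait(&w->commit_cv, &w->lock);
    pthread_mutex_unlock(&w->lock);
    self->aborted = false;                   /* ready to retry the transaction */
}

/* Winner side, at commit: broadcast only if we won a collision since the
 * last broadcast (or since this transaction started). */
void commit_broadcast(struct tx_thread *self)
{
    if (!self->won_since_bcast)
        return;                              /* common case: no extra work */
    pthread_mutex_lock(&self->lock);
    self->commits++;
    self->won_since_bcast = false;
    pthread_cond_broadcast(&self->commit_cv);
    pthread_mutex_unlock(&self->lock);
}
```

In this sketch, deadlock freedom comes from the fact that a thread only ever sleeps on the condition variable of the thread that aborted it, and the broadcast-only-if-won rule keeps the common, uncontended commit path free of the extra lock and broadcast.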

Adaptive algorithms
- Collect (local or global) statistics on the contention level.
- Apply serialization only when contention is high; otherwise, apply a “conventional” contention-management algorithm.
- We find that stabilized adaptive algorithms perform better.
First adaptive TM scheduler: “Adaptive transaction scheduling for transactional memory systems” [Yoo & Lee, SPAA '08]
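As an illustration of the "serialize only under high contention" idea, here is a small per-thread policy with hysteresis (one way to read "stabilized"); the decay factor and the two thresholds are made-up numbers, and the adaptive algorithms evaluated in the paper differ in their details.

```c
#include <stdbool.h>

struct adapt_state {
    double abort_ratio;     /* exponentially weighted estimate in [0,1] */
    bool   serialize_mode;
};

#define DECAY      0.9      /* weight given to history                  */
#define HIGH_WATER 0.5      /* enter serializing mode above this        */
#define LOW_WATER  0.2      /* leave serializing mode below this        */

/* Call after every commit or abort of this thread's transactions. */
void record_outcome(struct adapt_state *s, bool aborted)
{
    s->abort_ratio = DECAY * s->abort_ratio
                   + (1.0 - DECAY) * (aborted ? 1.0 : 0.0);

    if (!s->serialize_mode && s->abort_ratio > HIGH_WATER)
        s->serialize_mode = true;    /* contention is high: start serializing  */
    else if (s->serialize_mode && s->abort_ratio < LOW_WATER)
        s->serialize_mode = false;   /* contention subsided: back to the
                                        conventional contention manager        */
}

/* At conflict time: serialize the loser (as in LO-SER) when in serializing
 * mode, otherwise fall back to a conventional policy. */
bool should_serialize(const struct adapt_state *s) { return s->serialize_mode; }
```

The two thresholds keep the policy from flip-flopping around a single cut-off point when the measured abort ratio hovers near it.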

CBench Evaluation
- CAR-STM incurs high overhead compared with the other algorithms.
- Always serializing is bad under medium contention.
- Always serializing is best under high contention.
- Always serializing incurs no overhead in the absence of contention.

CBench Evaluation
- Adaptive serialization fares well at all contention levels.

CBench Evaluation
- Conventional CM performance degrades under high contention.

CBench Evaluation (cont'd)
- CAR-STM has the best efficiency but the worst throughput.

RandomGraph Evaluation
- The stabilized algorithm improves throughput by up to 30%.
- Throughput and efficiency of the conventional algorithms are poor.

Talk outline
- Preliminaries
- Memory Transactions Scheduling: Rationale
- CAR-STM
- Adaptive TM Schedulers
- TM-scheduling OS support

“Scheduling Support for Transactional Memory Contention Management”
- Implement CM scheduling support in the kernel scheduler (Linux & OpenSolaris):
  - (strict) serialization
  - soft serialization
  - time-slice extension
- Different mechanisms for communication between the user-level STM library and the kernel scheduler

TM Library / Kernel Communication via a Shared Memory Segment (Ser-k)
- User code notifies the kernel of events such as transaction start, commit, and abort (in which case the thread yields).
- Kernel code handles moving threads between the ready and blocked queues.
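A rough sketch of what the user-library side of such a shared segment could look like: one slot per thread that the STM library updates at transaction start/commit/abort and that the kernel scheduler reads. The slot layout, field names, and registration step are assumptions made for illustration; the actual Ser-k interface in the Linux and OpenSolaris prototypes differs.

```c
#include <stdint.h>
#include <sched.h>

enum tx_state { TX_NONE = 0, TX_RUNNING, TX_ABORTED };

struct ksched_slot {                /* one cache-line-sized slot per thread  */
    volatile int32_t state;         /* written by the library, read by the
                                       kernel scheduler                      */
    volatile int32_t wait_for_tid;  /* on abort: thread we serialize behind  */
    char pad[56];
};

static __thread struct ksched_slot *my_slot;   /* points into the shared
                                                  segment; set up during a
                                                  registration step that is
                                                  not shown here             */

void ksched_tx_start(void)  { my_slot->state = TX_RUNNING; }

void ksched_tx_commit(void) { my_slot->state = TX_NONE; }

/* On abort, record the winner and yield; the kernel side is then expected
 * to keep this thread off the CPU (moving it between its blocked and ready
 * queues) until the winner commits. */
void ksched_tx_abort(int winner_tid)
{
    my_slot->wait_for_tid = winner_tid;
    my_slot->state        = TX_ABORTED;
    sched_yield();                  /* give the kernel a chance to act */
}
```

Keeping each notification down to a plain store into shared memory is what keeps the common case (start and commit without conflicts) essentially free.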

Soft Serialization
- Instead of blocking, reduce the loser thread's priority and yield.
- Efficient in scenarios where a losing transaction may take a different execution path when retrying (non-determinism).
- Priority should be restored upon commit or when the conflicting transactions terminate.
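The paper implements soft serialization inside the kernel scheduler; purely as an illustration of the idea, the sketch below approximates it at user level with standard calls (on Linux, setpriority(PRIO_PROCESS, 0, ...) affects the calling thread). The +5 nice delta is an arbitrary choice, and error handling is omitted.

```c
#include <sys/resource.h>
#include <sched.h>

static __thread int saved_nice;
static __thread int deprioritized;

/* Loser side: instead of sleeping, run at lower priority and yield, so the
 * retried transaction may take a different execution path. */
void soft_serialize_self(void)
{
    if (!deprioritized) {
        saved_nice = getpriority(PRIO_PROCESS, 0);
        setpriority(PRIO_PROCESS, 0, saved_nice + 5);   /* run less often */
        deprioritized = 1;
    }
    sched_yield();
}

/* Restore the original priority once the transaction commits (or once the
 * conflicting transactions have terminated). */
void soft_serialize_restore(void)
{
    if (deprioritized) {
        setpriority(PRIO_PROCESS, 0, saved_nice);
        deprioritized = 0;
    }
}
```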

Time-slice extension
- Preemption in the midst of a transaction widens the conflict “window of vulnerability”.
- Defer preemption of transactional threads:
  - avoid CPU monopolization by bounding the number of extensions and yielding after commit.
- May be combined with serialization / soft serialization.
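There is no portable user-level way to defer preemption, so this sketch only shows the shape of a hint a library could publish for the kernel (continuing the shared-segment idea from Ser-k); the fields, and the assumption that the kernel honors the hint, bounds the number of extensions, and reports back, are all hypothetical.

```c
#include <sched.h>

/* Hypothetical per-thread hint, assumed to live in memory the kernel can
 * read (as with the Ser-k segment above). */
struct preempt_hint {
    volatile int defer_preemption;  /* set while a transaction is running    */
    volatile int was_extended;      /* would be set by the kernel if it let
                                       the thread run past its normal slice  */
};

static __thread struct preempt_hint my_hint;

void hint_tx_start(void)
{
    my_hint.defer_preemption = 1;   /* shrink the "window of vulnerability" */
}

void hint_tx_end(void)              /* call at commit or abort */
{
    my_hint.defer_preemption = 0;
    if (my_hint.was_extended) {     /* we borrowed CPU time: give it back;
                                       the kernel is assumed to bound how
                                       many extensions it grants            */
        my_hint.was_extended = 0;
        sched_yield();
    }
}
```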

Evaluation (STMBench7, 16-core machine)
- Conventional CM deteriorates when threads > cores.
- Serializing by local spinning is efficient as long as threads ≤ cores.

Evaluation - STMBench7 throughput
- Serializing by sleeping on a condition variable is best when threads > cores, since the system-call overhead is negligible (long transactions).

Evaluation - STMBench7 aborts data

Evaluation (STAMP applications)

Conclusions
- Scheduling-based CM results in:
  - improved throughput under high contention
  - improved efficiency at all contention levels
- LO-SER-based serialization incurs no visible overhead.
- Lightweight kernel support can improve performance and efficiency.
- Dynamically selecting the best CM algorithm for the workload at hand is a challenging research direction.

Thank you. Any questions?