Scheduling Memory Transactions
Parallel computing day, Ben-Gurion University, October 20, 2009
Synchronization alternatives: Transactional Memory
A (memory) transaction is a sequence of memory reads and writes, executed by a single thread, that either commits or aborts.
If a transaction commits, all of its reads and writes appear to have executed atomically.
If a transaction aborts, none of its operations take effect.
A transaction's operations are not visible to other threads until it commits (if it does).
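A minimal sketch of what these semantics mean, assuming a deliberately naive write-buffering design (the Transaction type and its read/write/commit/abort methods are illustrative, not any particular TM's API): writes are buffered privately and either published together at commit or simply discarded on abort. A real STM would additionally make the commit step atomic with respect to concurrent transactions.

```cpp
// Illustrative only: a transaction buffers its writes and publishes them on
// commit; an abort discards the buffer, so none of its operations take effect.
#include <cstdio>
#include <unordered_map>

struct Transaction {
    std::unordered_map<int*, int> write_set;   // deferred (buffered) writes

    int read(int* addr) {                      // read-your-own-writes, else memory
        auto it = write_set.find(addr);
        return it != write_set.end() ? it->second : *addr;
    }
    void write(int* addr, int value) { write_set[addr] = value; }

    void commit() {                            // make all writes visible together
        for (auto& [addr, value] : write_set) *addr = value;
        write_set.clear();
    }
    void abort() { write_set.clear(); }        // none of the writes take effect
};

int main() {
    int x = 0, y = 0;

    Transaction t1;                            // this transaction commits
    t1.write(&x, 1);
    t1.write(&y, t1.read(&x) + 1);
    t1.commit();

    Transaction t2;                            // this transaction aborts
    t2.write(&x, 42);
    t2.abort();

    std::printf("x=%d y=%d\n", x, y);          // prints x=1 y=2: t2 left no trace
}
```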
Transactional Memory Implementations
Hardware Transactional Memory:
Transactional Memory [Herlihy & Moss, '93]
Transactional Memory Coherence and Consistency [Hammond et al., '04]
Unbounded Transactional Memory [Ananian, Asanovic, Kuszmaul, Leiserson, Lie, '05]
…
Software Transactional Memory:
Software Transactional Memory [Shavit & Touitou, '97]
DSTM [Herlihy, Luchangco, Moir, Scherer, '03]
RSTM [Marathe et al., '06]
WSTM [Harris & Fraser, '03], OSTM [Fraser, '04], ASTM [Marathe, Scherer, Scott, '05], SXM [Herlihy]
…
“Conventional” STM system high-level structure
[Diagram: OS-scheduler-controlled threads run transactions inside the TM system; contention detection feeds the contention manager, which arbitrates and directs each conflicting transaction to proceed, abort/retry, or wait.]
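The arbitration step above can be pictured as a small decision function. The sketch below is only illustrative: the Decision values, the TxDesc fields, and the timestamp-based "older wins" policy are assumptions, not the interface of any specific STM library.

```cpp
// Sketch of the contention-detection -> contention-manager flow described on
// this slide; types and policy are assumptions, not a specific STM's API.
#include <cstdint>
#include <cstdio>

enum class Decision { Proceed, AbortSelf, AbortOther, Wait };

struct TxDesc {
    uint64_t start_time;   // logical timestamp taken at transaction begin
    int      retries;      // how many times this transaction has aborted so far
};

// Called by contention detection when transaction `self` conflicts with `other`.
Decision arbitrate(const TxDesc& self, const TxDesc& other) {
    if (self.start_time < other.start_time) return Decision::AbortOther; // older wins
    if (self.retries > 3)                    return Decision::Wait;      // back off
    return Decision::AbortSelf;                                          // retry later
}

int main() {
    TxDesc a{/*start_time=*/10, /*retries=*/0};
    TxDesc b{/*start_time=*/20, /*retries=*/0};
    std::printf("a vs b -> %d\n", static_cast<int>(arbitrate(a, b))); // AbortOther
}
```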
Talk outline
Preliminaries
Memory Transactions Scheduling: Rationale
CAR-STM
Adaptive TM Schedulers
TM-scheduling OS support
TM-ignorant schedulers are problematic!
TM-ignorant scheduling:
1) Does not permit serializing contention management and collision avoidance.
2) Makes it difficult to dynamically reduce the concurrency level.
3) Hurts TM performance stability/predictability.
Enter TM schedulers
“Adaptive transaction scheduling for transactional memory systems” [Yoo & Lee, SPAA '08]
“CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]
“Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory” [Ansari et al., HiPEAC '09]
“Preventing versus curing: avoiding conflicts in transactional memories” [Dragojevic, Guerraoui, Singh & Singh, PODC '09]
“Transactional scheduling for read-dominated workloads” [Attiya & Milani, OPODIS '09]
“On the impact of Serializing Contention Management on STM performance” [Heber, Hendler & Suissa, OPODIS '09, to appear]
“Scheduling support for transactional memory contention management” [Fedorova, Felber, Hendler, Lawall, Maldonado, Marlier, Muller & Suissa, PPoPP '10]
Our work
“CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08]
“On the impact of Serializing Contention Management on STM performance” [Heber, Hendler & Suissa, OPODIS '09]
“Scheduling support for transactional memory contention management” [Fedorova, Felber, Hendler, Lawall, Maldonado, Marlier, Muller & Suissa, PPoPP '10]
CAR-STM (Collision Avoidance and Resolution for STM): Design Goals
Limit parallelism to a single transaction per core (or hardware thread)
Serialize conflicting transactions
Contention avoidance
CAR-STM high-level architecture
[Diagram: each core #1 … #k has its own transaction queue served by a dedicated TQ thread; transaction threads hand a T-Info record to the Dispatcher, which consults the Collision Avoider to pick a queue; a serializing contention manager resolves conflicts between queues.]
TQ-Entry structure
[Diagram: a TQ entry holds the transaction's T-Info: the wrapper method, the transaction data, a reference to the transaction thread, and a lock plus condition variable used to block and resume that thread.]
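A hedged sketch of how such an entry and its per-core queue might be laid out; the type and field names are guesses based on the diagram, not CAR-STM's actual code.

```cpp
// Illustrative data-layout sketch of a TQ entry and a per-core transaction queue.
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>

struct TInfo {
    std::function<void()>   wrapper;            // wrapper method that runs the transaction body
    void*                   tx_data = nullptr;  // the transaction's data (opaque here)
    std::mutex              lock;               // lock + condition variable used to block
    std::condition_variable cond;               // and later resume the transaction thread
    bool                    done = false;
};

struct TQEntry {
    std::shared_ptr<TInfo>             info;          // the transaction's T-Info
    std::deque<std::shared_ptr<TInfo>> subordinates;  // transactions serialized behind it (PSCM)
};

struct TransactionQueue {                  // one per core / hardware thread
    std::mutex              lock;
    std::condition_variable not_empty;     // its TQ thread sleeps here when the queue is empty
    std::deque<TQEntry>     entries;
};

int main() {
    TransactionQueue queues[4];            // e.g. one queue per core
    (void)queues;
}
```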
Transaction dispatching process
1) The transaction thread calls the Dispatcher with a T-Info pointer argument.
2) The Dispatcher calls the Collision Avoider.
3) The Collision Avoider calls an application-specific conflict-probability method.
4) The transaction is enqueued in the most-conflicting queue; the transaction thread is put to sleep and the TQ thread is notified.
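The sketch below walks through steps 2-4 for a hypothetical Dispatcher: the queue representation, the application-supplied conflict-probability callback, and the "pick the queue with the highest conflict probability" heuristic in main are all assumptions for illustration.

```cpp
// Illustrative dispatch path; not CAR-STM's code.
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <vector>

struct TInfo { int tx_type; /* plus transaction data, thread handle, ... */ };

struct TQ {                                   // one transaction queue per core
    std::mutex              lock;
    std::condition_variable not_empty;
    std::deque<TInfo*>      entries;
};

// Step 3: application-specific estimate of how likely `t` is to conflict with
// the transactions currently held in queue `q` (supplied by the application).
using ConflictProb = std::function<double(const TInfo& t, const TQ& q)>;

// Steps 2 and 4: the Dispatcher (called by the transaction thread) asks the
// Collision Avoider which queue the transaction is most likely to conflict
// with, enqueues it there and notifies that queue's TQ thread. The caller then
// blocks until the TQ thread has executed the transaction (not shown here).
void dispatch(TInfo* t, std::vector<TQ>& queues, const ConflictProb& prob) {
    std::size_t best = 0;
    double best_p = -1.0;
    for (std::size_t i = 0; i < queues.size(); ++i) {   // Collision Avoider
        double p = prob(*t, queues[i]);
        if (p > best_p) { best_p = p; best = i; }
    }
    {
        std::lock_guard<std::mutex> g(queues[best].lock);
        queues[best].entries.push_back(t);              // enqueue in that TQ
    }
    queues[best].not_empty.notify_one();                // wake the TQ thread
}

int main() {
    std::vector<TQ> queues(4);                          // e.g. one queue per core
    TInfo t{1};
    dispatch(&t, queues, [](const TInfo&, const TQ& q) {
        return static_cast<double>(q.entries.size());   // toy stand-in heuristic
    });
}
```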
Transaction execution
1) The TQ thread executes the transaction.
2) The TQ thread wakes up the transaction thread.
3) The TQ thread dequeues the entry.
Dispatcher / TQ-thread synchronization
1) When its TQ is emptied, the TQ thread goes to sleep.
2) When the Dispatcher adds a transaction to the TQ, it wakes up the TQ thread.
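The sleep/wake-up protocol between the Dispatcher and a TQ thread, together with execution steps 1-3 above, can be sketched with a mutex and condition variables. This is an illustrative reconstruction, not CAR-STM's code; the names and the shutdown flag are assumptions.

```cpp
// TQ-thread loop: sleep when the queue is empty, wake on enqueue, then
// execute the transaction, wake the transaction thread, and dequeue the entry.
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

struct TInfo {
    std::function<void()>   run;        // wrapper method executing the transaction
    std::mutex              lock;       // used to wake the blocked transaction thread
    std::condition_variable cond;
    bool                    done = false;
};

struct TQ {
    std::mutex              lock;
    std::condition_variable not_empty;
    std::deque<TInfo*>      entries;
    bool                    shutdown = false;
};

void tq_thread_loop(TQ& q) {
    for (;;) {
        TInfo* t;
        {
            std::unique_lock<std::mutex> g(q.lock);
            // When the queue is empty the TQ thread sleeps; the Dispatcher's
            // notify_one() after push_back() wakes it up.
            q.not_empty.wait(g, [&] { return q.shutdown || !q.entries.empty(); });
            if (q.shutdown && q.entries.empty()) return;
            t = q.entries.front();
        }
        t->run();                                   // 1) execute the transaction
        {
            std::lock_guard<std::mutex> g(t->lock); // 2) wake the transaction thread
            t->done = true;
        }
        t->cond.notify_one();
        {
            std::lock_guard<std::mutex> g(q.lock);  // 3) dequeue the entry
            q.entries.pop_front();
        }
    }
}

int main() {
    TQ q;
    std::thread tq(tq_thread_loop, std::ref(q));

    TInfo t;                                        // the "transaction thread" side
    t.run = [] { std::puts("transaction body ran on the TQ thread"); };
    {
        std::lock_guard<std::mutex> g(q.lock);
        q.entries.push_back(&t);                    // Dispatcher enqueues...
    }
    q.not_empty.notify_one();                       // ...and wakes the TQ thread

    std::unique_lock<std::mutex> g(t.lock);         // block until the transaction ran
    t.cond.wait(g, [&] { return t.done; });
    g.unlock();

    { std::lock_guard<std::mutex> s(q.lock); q.shutdown = true; }
    q.not_empty.notify_one();
    tq.join();
}
```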
Serializing Contention Managers (SCMs)
When two transactions collide, fail the newer transaction and move it to the TQ of the older transaction.
Fast elimination of livelock scenarios.
Two SCMs implemented:
o Basic (BSCM) – move the failed transaction to the end of the other transaction's TQ
o Permanent (PSCM) – make the failed transaction a subordinate transaction of the other transaction
PSCM (1)
[Diagram: before resolution, Ta is at the head of core #1's transaction queue and Tb is at the head of core #k's; Tc, Td, Te are also queued.]
Transactions a and b collide; b is older.
PSCM (2)
[Diagram: after resolution, Ta and its subordinates have been moved behind Tb on core #k.]
The losing transaction and its subordinates are made subordinates of the winning transaction.
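A compact sketch of the two SCM policies, under the assumption that each transaction record carries a timestamp and a list of subordinates; the representation and the "older wins" tie-breaking below are illustrative, not the paper's exact data structures.

```cpp
// BSCM vs. PSCM on a collision: the newer transaction loses and moves to the
// winner's TQ; BSCM appends it at the end, PSCM chains it (with its own
// subordinates) directly behind the winner.
#include <cstdint>
#include <deque>
#include <list>

struct Tx {
    uint64_t       start_time;      // older = smaller timestamp
    std::list<Tx*> subordinates;    // transactions serialized behind this one (PSCM)
};

struct TQ { std::deque<Tx*> entries; };

enum class Policy { BSCM, PSCM };

// Called when `a` (running on qa) and `b` (running on qb) collide.
void resolve_collision(Tx* a, TQ& qa, Tx* b, TQ& qb, Policy p) {
    Tx* winner = (a->start_time <= b->start_time) ? a : b;   // older wins
    Tx* loser  = (winner == a) ? b : a;
    TQ& loser_q  = (winner == a) ? qb : qa;
    TQ& winner_q = (winner == a) ? qa : qb;

    // Remove the loser from its own queue (it has been aborted).
    for (auto it = loser_q.entries.begin(); it != loser_q.entries.end(); ++it)
        if (*it == loser) { loser_q.entries.erase(it); break; }

    if (p == Policy::BSCM) {
        winner_q.entries.push_back(loser);            // re-run it later on that core
    } else {                                          // PSCM
        // The loser and all of its subordinates become subordinates of the
        // winner, so they run only after the winner commits.
        winner->subordinates.push_back(loser);
        winner->subordinates.splice(winner->subordinates.end(), loser->subordinates);
    }
}

int main() {
    Tx a{10, {}}, b{5, {}}, c{12, {}};
    a.subordinates.push_back(&c);                     // c already lost to a earlier
    TQ q1{{&a}}, qk{{&b}};
    resolve_collision(&a, q1, &b, qk, Policy::PSCM);  // b is older: a (and c) join b
}
```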
Execution time: STMBench7, R/W-dominated workloads
Throughput: STMBench7, R/W-dominated workloads
CAR-STM shortcomings
May restrict parallelism too much: at most a single transactional thread per core/hardware thread
Transitive serialization
High overhead
Non-adaptive
Talk outline
Preliminaries
Memory Transactions Scheduling: Rationale
CAR-STM
Adaptive TM Scheduling
TM-scheduling OS support
“On the impact of Serializing Contention Management on STM performance”
CBench – a synthetic benchmark generating workloads with pre-determined length and abort probability
A low-overhead serialization mechanism
Better understanding of adaptive serialization algorithms
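Not the actual CBench code, but a sketch of the idea: a synthetic workload whose per-transaction length and abort probability are fixed up front, so the contention level can be dialed precisely. The configuration fields and the simulated retry loop are assumptions.

```cpp
// CBench-style workload generator sketch: each "transaction" runs for a fixed
// time and aborts (and retries) with a pre-determined probability.
#include <chrono>
#include <cstdio>
#include <random>
#include <thread>

struct CBenchConfig {
    std::chrono::microseconds tx_length{100}; // pre-determined transaction length
    double abort_probability = 0.3;           // pre-determined per-attempt abort chance
    int    transactions      = 1000;
};

void run_thread(const CBenchConfig& cfg, unsigned seed, long& aborts) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution abort_dist(cfg.abort_probability);
    for (int i = 0; i < cfg.transactions; ++i) {
        for (;;) {                                          // retry until "commit"
            std::this_thread::sleep_for(cfg.tx_length);     // simulated transaction work
            if (!abort_dist(rng)) break;                    // commit
            ++aborts;                                       // abort and retry
        }
    }
}

int main() {
    CBenchConfig cfg;
    long aborts = 0;
    run_thread(cfg, 42, aborts);
    std::printf("committed %d transactions, %ld aborts\n", cfg.transactions, aborts);
}
```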
A Low-Overhead Serialization Mechanism (LO-SER)
[Diagram: transactional threads, each with an associated condition variable.]
A Low-Overhead Serialization Mechanism (cont'd)
1) t identifies a collision
2) t calls the contention manager: ABORT_OTHER
3) t changes the status of t' to ABORT (and records that t is the winner)
4) t' identifies that it was aborted
A Low-Overhead Serialization Mechanism (cont'd)
5) t' rolls back its transaction and goes to sleep on the condition variable of t
6) Eventually t commits and broadcasts on its condition variable…
A Low-Overhead Serialization Mechanism (cont'd)
[Diagram: t' is woken by t's broadcast and resumes.]
Requirements for the serialization mechanism
Commit broadcasts only if the transaction won a collision since its last broadcast (or since the start of the transaction)
No waiting cycles (deadlock freedom)
Avoid race conditions
LO-SER algorithm: data structures
LO-SER algorithm: pseudo-code
LO-SER algorithm: pseudo-code (cont'd)
LO-SER algorithm: pseudo-code (cont'd)
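The LO-SER data structures and pseudo-code were shown as figures on these slides and did not survive into this transcript. The sketch below is a reconstruction, under stated assumptions, of the mechanism described in steps 1-6 and the requirements slide: the loser sleeps on the winner's condition variable, and a committing transaction broadcasts only if it won a collision since its last broadcast. All names, the generation counter used to avoid a lost wake-up, and the two-thread demo are illustrative; consult the OPODIS '09 paper for the actual algorithm.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

enum class TxStatus { ACTIVE, ABORTED };

struct TxThread {
    std::atomic<TxStatus>  status{TxStatus::ACTIVE};
    std::atomic<TxThread*> winner{nullptr};        // who aborted us (step 3)
    unsigned long          winner_gen = 0;         // winner's commit_gen when we lost
    bool                   won_since_broadcast = false;  // touched only by its owner

    std::mutex              lock;                  // protects commit_gen / cond
    std::condition_variable cond;                  // losers sleep here (step 5)
    unsigned long           commit_gen = 0;        // bumped on every broadcast
};

// Steps 2-3: the contention manager decided that `self` wins against `other`.
void abort_other(TxThread& self, TxThread& other) {
    other.winner_gen = self.commit_gen;            // snapshot before publishing the abort
    other.winner.store(&self, std::memory_order_relaxed);
    other.status.store(TxStatus::ABORTED, std::memory_order_release);
    self.won_since_broadcast = true;               // remember to broadcast at commit
}

// Steps 4-5: the loser calls this after observing status == ABORTED (acquire load)
// and rolling back its transaction; it sleeps until the winner's next broadcast.
void serialize_after_winner(TxThread& self) {
    TxThread* w = self.winner.exchange(nullptr);
    if (w == nullptr) return;
    {
        std::unique_lock<std::mutex> g(w->lock);
        // Waiting on "commit_gen advanced past the snapshot" avoids a lost wake-up:
        // if the winner already committed and broadcast, we do not sleep at all.
        w->cond.wait(g, [&] { return w->commit_gen != self.winner_gen; });
    }
    self.status.store(TxStatus::ACTIVE, std::memory_order_relaxed);
}

// Step 6 + the requirements slide: broadcast only if this transaction won at
// least one collision since its last broadcast (or since it started).
void on_commit(TxThread& self) {
    if (!self.won_since_broadcast) return;
    {
        std::lock_guard<std::mutex> g(self.lock);
        ++self.commit_gen;
    }
    self.cond.notify_all();                        // wake every loser serialized behind us
    self.won_since_broadcast = false;
}

int main() {
    TxThread t, t2;
    abort_other(t, t2);                            // t wins a collision against t2
    std::thread loser([&] { serialize_after_winner(t2); });   // t2 rolls back and waits
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    on_commit(t);                                  // t commits, broadcasts; t2 resumes
    loser.join();
}
```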
Adaptive algorithms
Collect (local or global) statistics on the contention level.
Apply serialization only when contention is high; otherwise, apply a “conventional” contention-management algorithm.
We find that stabilized adaptive algorithms perform better.
First adaptive TM scheduler: “Adaptive transaction scheduling for transactional memory systems” [Yoo & Lee, SPAA '08]
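A hedged sketch of such an adaptive decision (not the exact algorithm of any of the papers above): keep abort/commit counters as the contention estimate and serialize only above a threshold. The sample-count guard, the 0.5 watermark, and the names are assumptions; a "stabilized" variant would additionally add hysteresis so the mode does not flip on every sample.

```cpp
// Adaptive mode selection based on an observed abort rate.
#include <atomic>
#include <cstdio>

struct ContentionStats {               // can be kept per-thread (local) or global
    std::atomic<unsigned long> commits{0};
    std::atomic<unsigned long> aborts{0};
};

enum class CmMode { Conventional, Serializing };

CmMode choose_mode(const ContentionStats& s, double high_watermark = 0.5) {
    unsigned long c = s.commits.load(std::memory_order_relaxed);
    unsigned long a = s.aborts.load(std::memory_order_relaxed);
    if (c + a < 100) return CmMode::Conventional;   // not enough samples yet
    double abort_rate = static_cast<double>(a) / static_cast<double>(c + a);
    // A stabilized variant would use a second, lower watermark before switching back.
    return abort_rate > high_watermark ? CmMode::Serializing : CmMode::Conventional;
}

int main() {
    ContentionStats s;
    s.commits = 300;
    s.aborts  = 700;                                // high contention
    std::printf("mode=%d\n", static_cast<int>(choose_mode(s)));  // Serializing
}
```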
CBench Evaluation
CAR-STM incurs high overhead compared with the other algorithms
Always serializing is bad under medium contention
Always serializing is best under high contention
Always serializing incurs no overhead in the absence of contention
CBench Evaluation
Adaptive serialization fares well for all contention levels
CBench Evaluation
Conventional CM performance degrades under high contention
CBench Evaluation (cont'd)
CAR-STM has the best efficiency but the worst throughput
RandomGraph Evaluation
The stabilized algorithm improves throughput by up to 30%
Throughput and efficiency of conventional algorithms are poor
Talk outline
Preliminaries
Memory Transactions Scheduling: Rationale
CAR-STM
Adaptive TM Schedulers
TM-scheduling OS support
“Scheduling Support for Transactional Memory Contention Management”
Implements CM scheduling support in the kernel scheduler (Linux & OpenSolaris):
o (Strict) serialization
o Soft serialization
o Time-slice extension
Different mechanisms for communication between the user-level STM library and the kernel scheduler
TM Library / Kernel Communication via a Shared Memory Segment (Ser-k)
User code notifies the kernel of events such as transaction start, commit, and abort (in which case the thread yields)
Kernel code handles moving threads between the ready and blocked queues
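A very rough user-level sketch of the notification side of this idea. The slot layout, event codes, and the plain global array standing in for the shared memory segment are all assumptions; the kernel-side handling (moving threads between queues) is not shown and is system-specific.

```cpp
// User-side event posting into a (stand-in for a) segment shared with the kernel.
#include <atomic>
#include <sched.h>   // sched_yield()

enum TxEvent : int { TX_NONE = 0, TX_START = 1, TX_COMMIT = 2, TX_ABORT = 3 };

struct ThreadSlot {                        // one slot per transactional thread
    std::atomic<int> event{TX_NONE};       // last event posted by the STM library
    std::atomic<int> conflict_tid{-1};     // on abort: the thread we lost to
};

constexpr int kMaxThreads = 64;
ThreadSlot g_segment[kMaxThreads];         // stand-in for the kernel-shared segment

void notify_start(int tid)  { g_segment[tid].event.store(TX_START); }
void notify_commit(int tid) { g_segment[tid].event.store(TX_COMMIT); }

void notify_abort(int tid, int winner_tid) {
    g_segment[tid].conflict_tid.store(winner_tid);
    g_segment[tid].event.store(TX_ABORT);
    sched_yield();   // per the slide, the aborted thread yields; the kernel side
                     // (not shown) moves it between the ready and blocked queues
}

int main() {
    notify_start(0);
    notify_commit(0);
    notify_start(1);
    notify_abort(1, /*winner_tid=*/0);
}
```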
Soft Serialization
Instead of blocking, reduce the loser thread's priority and yield
Efficient in scenarios where loser transactions may take a different execution path when retrying (non-determinism)
Priority should be restored upon commit or when the conflicting transactions terminate
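A user-space approximation of this idea on Linux, for illustration only (in the paper the mechanism lives inside the kernel scheduler): per-thread nice values via setpriority plus sched_yield. The +5 penalty and the tid plumbing are assumptions, and note the caveat in the comments about restoring priority from an unprivileged process.

```cpp
#include <sched.h>          // sched_yield()
#include <sys/resource.h>   // getpriority(), setpriority()
#include <sys/syscall.h>    // SYS_gettid
#include <unistd.h>         // syscall()

static pid_t my_tid() { return static_cast<pid_t>(syscall(SYS_gettid)); }

// Called by the losing thread instead of blocking: lower this thread's priority
// and yield, so the winner (and other ready threads) run first; the loser still
// retries and may take a different execution path this time.
int soft_serialize_self() {
    pid_t tid = my_tid();
    int old_nice = getpriority(PRIO_PROCESS, tid);   // on Linux: this thread's nice value
    setpriority(PRIO_PROCESS, tid, old_nice + 5);    // larger nice = lower priority
    sched_yield();
    return old_nice;
}

// Called on commit (or once the conflicting transactions have terminated).
// Caveat: lowering the nice value back may require CAP_SYS_NICE or a permissive
// RLIMIT_NICE; this limitation is one reason the paper does this in the kernel.
void restore_priority(int old_nice) {
    setpriority(PRIO_PROCESS, my_tid(), old_nice);
}

int main() {
    int saved = soft_serialize_self();
    // ... the losing transaction would be retried here ...
    restore_priority(saved);
}
```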
Time-slice extension
Preemption in the midst of a transaction increases the conflict “window of vulnerability”
Defer preemption of transactional threads
Avoid CPU monopolization by bounding the number of extensions and yielding after commit
May be combined with serialization / soft serialization
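A policy-only sketch of this rule; in the paper the logic sits inside the kernel scheduler, so the struct, the limit of two extensions, and the function names below are purely illustrative.

```cpp
// Bounded time-slice extension bookkeeping: defer preemption while in a
// transaction, but cap the number of extensions and yield after commit.
#include <sched.h>   // sched_yield()

struct TxSchedState {
    bool in_transaction  = false;
    int  extensions_used = 0;     // bounded to avoid CPU monopolization
    bool owes_yield      = false; // granted an extension -> yield after commit
};

constexpr int kMaxExtensions = 2;

// Asked (conceptually, by the scheduler) when the thread's time slice expires.
bool defer_preemption(TxSchedState& s) {
    if (s.in_transaction && s.extensions_used < kMaxExtensions) {
        ++s.extensions_used;
        s.owes_yield = true;      // pay the borrowed time back later
        return true;              // let the transaction close its vulnerability window
    }
    return false;                 // preempt as usual
}

void on_tx_commit(TxSchedState& s) {
    s.in_transaction  = false;
    s.extensions_used = 0;
    if (s.owes_yield) { s.owes_yield = false; sched_yield(); }
}

int main() {
    TxSchedState s;
    s.in_transaction = true;
    defer_preemption(s);          // first expiry: extension granted
    on_tx_commit(s);              // commit: give the CPU back
}
```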
Evaluation (STMBench7, 16-core machine)
Conventional CM deteriorates when threads > cores
Serializing by local spinning is efficient as long as threads ≤ cores
Evaluation: STMBench7 throughput
Serializing by sleeping on a condition variable is best when threads > cores, since the system-call overhead is negligible (long transactions)
Evaluation: STMBench7 abort data
Evaluation (STAMP applications)
Conclusions
Scheduling-based CM results in:
o Improved throughput under high contention
o Improved efficiency at all contention levels
LO-SER-based serialization incurs no visible overhead
Lightweight kernel support can improve performance and efficiency
Dynamically selecting the best CM algorithm for the workload at hand is a challenging research direction
Thank you. Any questions?