Download presentation
Presentation is loading. Please wait.
Published byAlban Mitchell Modified over 6 years ago
1
Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory
Irina Calciu Justin Gottschlich Tatiana Shpeisman Gilles Pokam Maurice Herlihy
2
Multicore Performance Scaling
2
3
Hardware Transactional Memory (HTM)
IBM’s Blue Gene/Q & System Z & Power Architecture Intel’s Haswell TSX: RTM & HLE Low overhead (cache based) 3
4
Haswell RTM if (_xbegin() == _XBEGIN_STARTED) Speculate Execution
_xend() else Abort Handler Speculate Execution, without any locks Read and Write Sets Abort on memory conflict 4
5
Haswell RTM ABORT COMMIT if (_xbegin() == _XBEGIN_STARTED)
Speculate Execution _xend() _xbegin() _xbegin() Add to Write Set Write X Read X Add to Read Set ABORT Add to Write Set Write Y Write Y Add to Write Set _xend() COMMIT _xend() Make the change to Y visible 5
6
Lock Elision <HLE_Aquire_Prefix> Lock(L)
Atomic region executed as a transaction or mutually exclusive on L <HLE_Release_Prefix> Release(L) Execute optimistically, without any locks Track Read and Write Sets Abort on memory conflict: rollback acquire lock 6
7
7 [Anand Tech]
8
Haswell RTM Needs software fallback Small & Medium Transactions
Best-effort Conflicts Overflow Unsupported Instructions Interrupts Needs software fallback 8
9
Overview Best-effort Hardware Transactional Memory Description
Lazy SGL Correctness Results Bloom Filter SGL 9
10
Transactional_Read(Lock) If Lock is taken ABORT
Single Global Lock HyTM (simple and common) Begin HW txn Read L Try_SPEC: Wait until Lock is free Transactional_Read(Lock) If Lock is taken ABORT Speculate critical section End speculation End HW txn Begin SW txn Acquire L On_ABORT: If try_lock(Lock) Critical section Release(Lock) Else Try_SPEC Does not abort! Release L End SW txn 10
11
Single Global Lock HyTM (simple and common)
Begin HW txn Read L Begin HW txn Read L Begin SW txn Acquire L Begin HW txn Read L X X X Begin HW txn Read L End HW txn (1) Time End HW txn (2) X Release L End SW txn End HW txn (3) End HW txn (4) Legend: X = ABORT 11
12
Thread 1 Thread 2 Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION
End_HW_TXN (L) End_HW_TXN (L) Acquire(L) CRITICAL SECTION (SW TXN) Time Release(L) Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION End_HW_TXN (L) Execution Time 1 12
13
Thread 1 Thread 2 Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION
End_HW_TXN (L) End_HW_TXN (L) Acquire(L) CRITICAL SECTION (SW TXN) Begin_HW_TXN (L) CRITICAL SECTION Time Release(L) End_HW_TXN (L) Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Execution Time 2 Execution Time 1 13
14
Speculate critical section Transactional_Read(Lock)
Lazy SGL Begin HW txn Try_SPEC: Speculate critical section Transactional_Read(Lock) If Lock is taken ABORT End speculation Read L End HW txn Begin SW txn Acquire L On_ABORT: If try_lock(Lock) Critical section Release(Lock) Else Try_SPEC Does not abort! Release L End SW txn 14 14
15
Lazy SGL Begin Begin HW txn HW txn Begin SW txn Acquire L Begin HW txn
Read L End HW txn (1) X Time Read L End HW txn (2) Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 15
16
Overview Best-effort Hardware Transactional Memory Description
Lazy SGL Correctness Results Bloom Filter SGL 16
17
Transactional Memory Correctness
SW Order T2 BEFORE T1 Transaction 2 HW Time COMMIT Order T2 AFTER T1 COMMIT 17
18
Thread 1 (SW) Thread 2 (HW) ABORT Case 1: HW begins SW begins HW ends
SW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b TXN_END Acquire Lock … X = a Release Lock X value: Time a b Correct: a Actual: a Correct: a Actual: b Check Lock ABORT 18
19
Thread 1 (SW) Thread 2 (HW) ABORT Case 2: SW begins HW begins HW ends
SW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a Release Lock TXN_BEGIN … X = b TXN_END X value: Time Correct: a Actual: a Correct: a Actual: b Check Lock ABORT 19
20
Thread 1 (SW) Thread 2 (HW) COMMIT Case 3: SW begins HW begins SW ends
HW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a Release Lock TXN_BEGIN … X = b TXN_END X value: Time b a Check Lock Correct: b Actual: b COMMIT 20
21
Thread 1 (SW) Thread 2 (HW) COMMIT Case 4: HW begins SW begins SW ends
HW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b TXN_END Acquire Lock … X = a Release Lock X value: Time Correct: b Actual: b Check Lock COMMIT 21
22
Hardware Sandboxing Thread 1 (SW) Thread 2 (HW) X = 5; Y = 6 TXN_BEGIN
Acquire Lock … ++X ++Y Release Lock TXN_BEGIN … Z = 1/(Y-X) TXN_END Z = 1/0 !!! Time 22
23
Indirect Jumps Thread 1 (SW) Thread 2 (HW) _xbegin X = 5; Y = 6
Acquire Lock … ++X ++Y Release Lock _xbegin … if (X == Y) *p = garbage p() if (lock) abort _xend _xend Indirect jump to garbage location Time 23
24
Overview Best-effort Hardware Transactional Memory Description
Lazy SGL Correctness Results Bloom Filter SGL 24
25
Intruder (medium txns) Better
25
26
Improved Lock Acquisition Rate
Vacation Low (medium txns) Intruder (medium txns) Better Kmeans High (small txns) Labyrinth (large txns) 26
27
No single thread overhead
Slowdown relative to sequential for 1 thread 27
28
Overview Best-effort Hardware Transactional Memory Description
Lazy SGL Correctness Results Bloom Filter SGL 28
29
Bloom Filters Efficient probabilistic data structure to compute fast set intersection Can admit false positives No false negatives Used in TM for Conflict Detection 29
30
Lazy SGL Begin Begin HW txn HW txn Begin SW txn Acquire L Begin HW txn
Read L End HW txn (1) X Time Read L End HW txn (2) Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 30
31
BF SGL Begin Begin HW txn HW txn Begin SW txn Acquire L Begin HW txn
Check BF End HW txn (1) Time Check BF End HW txn (2) Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 31
32
If BFs intersect: ABORT Else: COMMIT
Case 1: HW begins SW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b TXN_END Acquire Lock … X = a Release Lock X value: Time b a Correct: a Actual: a Correct: a Actual: b Check BF Check Lock ABORT If BFs intersect: ABORT Else: COMMIT 32
33
If BFs intersect: ABORT Else: COMMIT
Case 2: SW begins HW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a Release Lock TXN_BEGIN … X = b TXN_END X value: Time Correct: a Actual: a Correct: a Actual: b Check BF Check Lock ABORT If BFs intersect: ABORT Else: COMMIT 33
34
Conclusions HTMs are becoming more available
Best-effort – need software fallback Eager SGL simple and fast fallback, often preferred to more efficient solutions 34
35
Conclusions Lazy SGL as simple as Eager SGL more efficient
Bloom Filter SGL more accurate conflict detection Slower Can be implemented directly in hardware 35
37
References
38
Medium transactions 38
39
Small transactions 39
40
Large transactions 40
41
41
42
Software transaction ordered before hardware transaction -> CORRECT
Software Hardware (1) Read(x) Not a conflict (2) Write(x) Software transaction ordered before hardware transaction -> CORRECT (3) Hardware abort (4) (5) (6) (7) Table 1: Potential conflicts between software and hardware transactions 42
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.