Published by Eliseo Boxer. Modified over 10 years ago.
1
Read-Write Lock Allocation in Software Transactional Memory
Amir Ghanbari Bavarsad and Ehsan Atoofian
Lakehead University
2
Transactional Memory
[Figure: processors P1 … Pn, each with a cache ($), sharing a global clock]
Software transactional memory (STM) uses a global clock to validate transactional data
    Pros: reduces validation overhead
    Cons: contention
Alternative: Read Write Lock Allocation (RWLA)
    Pros: no central clock
    Cons: overhead if a transaction (TX) aborts
Speculative RWLA: changes the validation policy dynamically → speedup of up to 66%
3
Outline
    Background
    RWLA
    Speculative RWLA
    Conclusion
4
Counter in STM
T1:
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
5
Validation in STM
Transactional data are validated using:
    Global clock
        Shared variable
        Timestamp for transactions
    Lock
        Memory is mapped to a lock table
        Each entry of the table: version #
[Figure: global clock; memory words mapped to lock table entries, each holding a version #]
6
Updating Global Clock & Lock
    Increment the global clock
    Version # = global_clock
[Figure: global clock; the memory word counter mapped to a lock table entry holding its version #]
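The two steps on this slide can be sketched in C as follows. This is a simplified illustration, not TL2's actual commit code: the table size, the address hash in `lock_index`, and the function name `commit_write` are assumptions made for the example, and the per-entry locking a real STM performs around the update is left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 1024u   /* assumed size, a power of two */

static atomic_ulong global_clock;                 /* shared by all threads */
static atomic_ulong lock_table[LOCK_TABLE_SIZE];  /* one version # per entry */

/* Map a memory address to a lock-table entry. The hash is hypothetical:
 * drop the low three address bits, then mask to the table size. */
static size_t lock_index(const void *addr)
{
    return ((uintptr_t)addr >> 3) & (LOCK_TABLE_SIZE - 1);
}

/* Commit-time update for one written location: increment the global
 * clock, store the value, then stamp the entry with the new clock value
 * (Version # = global_clock). Returns the version that was written. */
static unsigned long commit_write(long *addr, long value)
{
    unsigned long wv = atomic_fetch_add(&global_clock, 1) + 1;
    *addr = value;
    atomic_store(&lock_table[lock_index(addr)], wv);
    return wv;
}
```

Every committing writer performs the `atomic_fetch_add` on the same shared variable, which is the source of the contention discussed later in the deck.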
7
Validation in STM
rv (read version) is set to global_clock
T1:
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
[Figure: metadata for TX 1 holds rv, copied from the global clock]
8
Successful Read Validation
rv >= version #
The most recent write to counter occurred before TM_BEGIN()
T1:
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
[Figure: metadata for TX 1 (rv) and the global clock]
9
Failed Read Validation
rv < version #
The most recent write to counter occurred after TM_BEGIN()
T1:
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
[Figure: metadata for TX 1 (rv) and the global clock]
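The read-validation rule from the last three slides reduces to a single comparison. A minimal sketch, assuming a hypothetical metadata struct `tx_meta_t` and helper names `tx_begin` and `tx_read_valid` that do not come from the slides:

```c
#include <stdbool.h>

/* Per-transaction metadata; only the read version is modeled here. */
typedef struct {
    unsigned long rv;   /* read version, sampled at TM_BEGIN */
} tx_meta_t;

/* TM_BEGIN: record the current value of the global clock as rv. */
static void tx_begin(tx_meta_t *tx, unsigned long global_clock)
{
    tx->rv = global_clock;
}

/* TM_READ validation: the read is valid when the version # of the
 * location is not newer than rv, i.e. the most recent write happened
 * before this transaction began. Otherwise the transaction aborts. */
static bool tx_read_valid(const tx_meta_t *tx, unsigned long version)
{
    return tx->rv >= version;
}
```

For example, a transaction that began at clock value 7 can read a location with version # 7 or lower, but fails validation on a location with version # 8.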
10
Overhead of Validation
This method, called GV4, results in many cache coherence misses if transactions commit frequently
[Figure: processors P1 … Pn, each with a cache ($), sharing the global clock]
11
Outline
    Background
    RWLA
    Speculative RWLA
    Conclusion
12
Read Write Lock Allocation (RWLA)
Lock
    Memory is mapped to a lock table
    Each entry of the table:
        Lock bit
        Read bits
[Figure: memory mapped to lock table entries; each entry holds a lock bit followed by read bits for P0, P1, …, Pn-1]
13
TM_READ
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
[Figure: lock table entry 000000…, all bits clear]
14
TM_READ
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
TM_READ() → Lock bit is free?
    Yes → set the read bit in the corresponding lock entry
[Figure: lock table entry 000000…, lock bit labeled, with a bit set to 1]
15
TM_READ
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
TM_READ() → Lock bit is free?
    Yes → set the read bit in the corresponding lock entry
    No → abort
[Figure: lock table entry 100000…, lock bit set]
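The TM_READ flowchart can be sketched with one atomic word per lock entry. The 32-bit layout (top bit as the lock bit, one read bit per thread in the remaining bits) and the names `rwla_entry_t`, `LOCK_BIT`, `READ_BIT`, and `rwla_read` are assumptions made for illustration; the slides only state that an entry holds a lock bit and read bits.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT      (1u << 31)     /* assumed layout: top bit is the lock bit */
#define READ_BIT(tid) (1u << (tid))  /* bits 0..30: one read bit per thread */

typedef _Atomic uint32_t rwla_entry_t;

/* TM_READ check: returns true if the read may proceed (this thread's read
 * bit is now set) and false if the transaction must abort. */
static bool rwla_read(rwla_entry_t *entry, unsigned tid)
{
    uint32_t old = atomic_load(entry);
    do {
        if (old & LOCK_BIT)
            return false;            /* lock bit is not free: abort */
        /* Lock bit is free: try to set our read bit. On failure the CAS
         * reloads old, and the lock bit is checked again. */
    } while (!atomic_compare_exchange_weak(entry, &old, old | READ_BIT(tid)));
    return true;
}
```

The compare-and-swap loop keeps the "check lock bit, then set read bit" sequence atomic, so a writer cannot slip in between the check and the update. Unlike the global-clock scheme, no variable shared by all transactions is touched.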
16
TM_WRITE
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
TM_WRITE → All read bits are clear?
    No → abort
[Figure: lock table entry 000100…, one read bit set]
17
TM_WRITE
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
TM_WRITE → All read bits are clear?
    No → abort
    Yes → acquire lock
        Failed → abort
[Figure: lock table entry 100000…, lock bit set]
18
TM_WRITE
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();
TM_WRITE → All read bits are clear?
    No → abort
    Yes → acquire lock
        Failed → abort
[Figure: lock table entry 00000…, with the lock bit shown as 1 and 0]
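The TM_WRITE flowchart can be sketched the same way. As before, the 32-bit entry layout and the names `rwla_entry_t`, `LOCK_BIT`, `READ_BIT`, and `rwla_write` are assumptions. The sketch also assumes that a transaction's own read bit does not count as a conflict, so that the counter example (read, then write the same location) can succeed; the slides do not state how this case is handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT      (1u << 31)     /* assumed layout: top bit is the lock bit */
#define READ_BIT(tid) (1u << (tid))  /* bits 0..30: one read bit per thread */

typedef _Atomic uint32_t rwla_entry_t;

/* TM_WRITE check: returns true if the lock was acquired and false if the
 * transaction must abort. */
static bool rwla_write(rwla_entry_t *entry, unsigned tid)
{
    uint32_t old = atomic_load(entry);

    /* All read bits clear? (Assumption: our own read bit is ignored.) */
    if (old & ~LOCK_BIT & ~READ_BIT(tid))
        return false;                /* another reader exists: abort */

    /* Acquire lock: fails if the lock bit is already set... */
    if (old & LOCK_BIT)
        return false;

    /* ...or if the entry changed since it was loaded, for example because
     * a new reader or another writer arrived in the meantime. */
    return atomic_compare_exchange_strong(entry, &old, old | LOCK_BIT);
}
```

A single compare-and-swap attempt matches the flowchart, where a failed acquisition leads directly to an abort rather than a retry.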
19
Experimental Framework
    Benchmarks: STAMP v0.9.7
        Run to completion
        Measured statistics over 10 runs
    TL2 as the STM framework
    Two Intel Xeon E5660, 6-way CMP
20
Performance of RWLA
[Figure: performance chart for RWLA; an arrow marks the "better" direction]
21
Speculative RWLA
    Conflicts occur frequently → select GV4
    Conflicts occur rarely → select RWLA
    How to predict conflicts?
22
Contention Predictor
Prediction:
    y ≥ 0 → predict commit
    y < 0 → predict abort
Update:
    If the outcomes of the current TX and TX i agree → increment w_i
    If they disagree → decrement w_i
x_i: global transaction history, bipolar value
w_i: weight vector
[Figure: perceptron with inputs 1, x_1, …, x_n, weights w_0, w_1, …, w_n, and output y]
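The predictor on this slide has the shape of a perceptron, so a sketch can assume the usual perceptron output y = w_0 + Σ w_i·x_i; the slide shows the inputs and weights but not the formula itself. The history length, the all-commit initial history, updating on every outcome, and all names below are assumptions. Mapping a predicted abort to GV4 and a predicted commit to RWLA is likewise an interpretation of the previous slide, not something the deck spells out.

```c
#include <stdbool.h>

#define HISTORY_LEN 8            /* assumed history length n */

typedef struct {
    int w[HISTORY_LEN + 1];      /* w[0] is the bias weight w_0 (input fixed at 1) */
    int x[HISTORY_LEN];          /* global TX history, bipolar: +1 commit, -1 abort */
} predictor_t;

typedef enum { POLICY_RWLA, POLICY_GV4 } policy_t;

static void predictor_init(predictor_t *p)
{
    for (int i = 0; i <= HISTORY_LEN; i++) p->w[i] = 0;
    for (int i = 0; i < HISTORY_LEN; i++)  p->x[i] = 1;  /* assume past commits */
}

/* y >= 0 predicts commit, y < 0 predicts abort. */
static bool predict_commit(const predictor_t *p)
{
    int y = p->w[0];
    for (int i = 0; i < HISTORY_LEN; i++)
        y += p->w[i + 1] * p->x[i];
    return y >= 0;
}

/* Train on the actual outcome, then shift it into the history. t * x[i]
 * is +1 when the current TX and TX i agree and -1 when they disagree. */
static void predictor_update(predictor_t *p, bool committed)
{
    int t = committed ? 1 : -1;
    p->w[0] += t;
    for (int i = 0; i < HISTORY_LEN; i++)
        p->w[i + 1] += t * p->x[i];
    for (int i = HISTORY_LEN - 1; i > 0; i--)
        p->x[i] = p->x[i - 1];
    p->x[0] = t;
}

/* Assumed policy choice: predicted abort means conflicts are likely. */
static policy_t choose_policy(const predictor_t *p)
{
    return predict_commit(p) ? POLICY_RWLA : POLICY_GV4;
}
```

A runtime would call `choose_policy` before a transaction starts and `predictor_update` once its outcome is known.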
23
Performance of Speculative RWLA
    The number of threads varies between 2 and 16
    On average, the performance change ranges from 21% in Bayes to 47% in Labyrinth
[Figure: performance chart for Speculative RWLA; an arrow marks the "better" direction]
24
Conclusion
    RWLA overcomes contention on the global clock
    Applications react differently to GV4 and RWLA
    Speculative RWLA changes the validation policy dynamically
    Speculative RWLA improves the performance of STMs by up to 66%
25
Thank You! Questions?