Published by Myles Patrick. Modified over 9 years ago.
1
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood
Presented by: Eduardo Cuervo
2
Previous TM systems abort fast, commit slow
◦ Old values “in place”
◦ New values somewhere else
Commit is the common case!
◦ Remember Amdahl’s Law
Conflicts usually resolved by hardware
◦ Fast but myopic
◦ Trap to SW if needed for careful resolution
3
                           Lazy version management   Eager version management
Lazy conflict detection    OCC DBMSs, TCC            none
Eager conflict detection   LTM, VTM                  CCC DBMSs, UTM, LogTM
4
Eager version management
◦ Puts new values in place for faster commits
◦ No data moves even on cache overflow
Eager conflict detection
◦ Detects offending ld/st immediately
◦ Fast conflict detection on evicted blocks
◦ Fast commit by lazy reset of directory state
Aborts handled by SW
◦ Aborts are much less common than commits
5
Per-thread log in cacheable virtual memory
◦ On a store, logs the address and previous contents of the block
Write (W) bit
◦ Tracks whether a block has been stored to and logged
Faster commits
◦ Clear W bits and reset log (pointer)
Slower aborts
◦ Must also write the old values back
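The logging scheme above can be sketched in a few lines of Python. This is a toy model, not the LogTM hardware: the class name `TxLog`, the 64-byte block size, and the use of a dictionary for memory are all assumptions made for illustration.

```python
BLOCK = 0x40  # assumed block size (64 bytes); not stated on the slide


class TxLog:
    """Toy model of eager version management with a per-thread undo log."""

    def __init__(self, memory):
        self.memory = memory   # block address -> block contents (new values live here)
        self.log = []          # (block address, previous contents) entries
        self.w_bits = set()    # blocks already stored to and logged

    def store(self, addr, value):
        block = addr - addr % BLOCK
        if block not in self.w_bits:
            # First store to this block: log the address and the old contents.
            self.log.append((block, self.memory[block]))
            self.w_bits.add(block)
        self.memory[block] = value  # the new value is written in place

    def commit(self):
        # Fast path: clear W bits and reset the log; no data moves.
        self.w_bits.clear()
        self.log.clear()

    def abort(self):
        # Slow path: walk the log backwards and write old values back.
        for block, old in reversed(self.log):
            self.memory[block] = old
        self.w_bits.clear()
        self.log.clear()
```

A commit leaves memory untouched because the new values are already in place, while an abort has to copy every logged block back, which is why aborts are the slower operation.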
6
[Figure: log example, state after transaction begin]
LogBase: 1000   LogPtr: 1000   TMcount: 1
Virtual Address   Data Block          R W
00                1 2 - - - - - - -   0 0
40                - - - - - - - 2 3   0 0
c0                3 4 - - - - - - -   0 0
Log (blocks 1000, 1040, 1080): empty
7
[Figure: log example, R bit set on block 00]
LogBase: 1000   LogPtr: 1000   TMcount: 1
Virtual Address   Data Block          R W
00                1 2 - - - - - - -   1 0
40                - - - - - - - 2 3   0 0
c0                3 4 - - - - - - -   0 0
Log (blocks 1000, 1040, 1080): empty
8
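[Figure: log example, block c0 written and logged]
LogBase: 1000   LogPtr: 1048   TMcount: 1
Virtual Address   Data Block          R W
00                1 2 - - - - - - -   1 0
40                - - - - - - - 2 3   0 0
c0                5 6 - - - - - - -   0 1
Log (blocks 1000, 1040, 1080):
◦ Address c0, old data 3 4 - - - - - - -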
9
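[Figure: log example, block 40 read, written and logged]
LogBase: 1000   LogPtr: 1090   TMcount: 1
Virtual Address   Data Block          R W
00                1 2 - - - - - - -   1 0
40                - - - - - - - 2 4   1 1
c0                5 6 - - - - - - -   0 1
Log (blocks 1000, 1040, 1080):
◦ Address c0, old data 3 4 - - - - - - -
◦ Address 40, old data - - - - - - - 2 3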
10
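[Figure: log example, commit: R/W bits cleared, log pointer reset, new values stay in place]
LogBase: 1000   LogPtr: 1000   TMcount: 0
Virtual Address   Data Block          R W
00                1 2 - - - - - - -   0 0
40                - - - - - - - 2 4   0 0
c0                5 6 - - - - - - -   0 0
Log (blocks 1000, 1040, 1080), stale contents:
◦ Address c0, old data 3 4 - - - - - - -
◦ Address 40, old data - - - - - - - 2 3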
11
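[Figure: log example, abort: old values written back from the log, R/W bits cleared, log pointer reset]
LogBase: 1000   LogPtr: 1000   TMcount: 0
Virtual Address   Data Block          R W
00                1 2 - - - - - - -   0 0
40                - - - - - - - 2 3   0 0
c0                3 4 - - - - - - -   0 0
Log (blocks 1000, 1040, 1080), stale contents:
◦ Address c0, old data 3 4 - - - - - - -
◦ Address 40, old data - - - - - - - 2 3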
12
Coherence requests sent to directory
Directory forwards them to other processor(s)
Processors detect conflicts
◦ Using local state
◦ Ack/Nack as response
◦ Requester resolves any conflict
Adds read (R) bit to each cache block
Extends MOESI protocol
◦ “Sticky” states
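A minimal sketch of the conflict check each processor performs on a forwarded request, assuming the usual rule that a read request conflicts with a remote write set and an exclusive request conflicts with a remote read or write set. The names `Cache`, `on_forward` and `directory_request` are hypothetical, and the directory here simply forwards to every other cache rather than tracking owners and sharers.

```python
class Cache:
    """Toy per-processor cache holding only transactional R/W bits."""

    def __init__(self, name):
        self.name = name
        self.r_bits = set()  # blocks read inside the current transaction
        self.w_bits = set()  # blocks written inside the current transaction

    def on_forward(self, block, exclusive):
        # A shared (read) request conflicts only with a transactional write;
        # an exclusive (write) request conflicts with a read or a write.
        conflict = block in self.w_bits or (exclusive and block in self.r_bits)
        return "NACK" if conflict else "ACK"


def directory_request(caches, requester, block, exclusive):
    """Forward the request to all other caches; True if nobody nacked."""
    replies = [c.on_forward(block, exclusive)
               for c in caches if c is not requester]
    return "NACK" not in replies
```

When the function returns False, the requester has received a Nack and is the one that must resolve the conflict, for example by stalling and retrying.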
13
Works even after cache overflow
◦ Forwards conflicting requests to “interested” processors
Adds a per-processor overflow bit
◦ The transactional block can be updated
◦ Requests will still be redirected to the processor
◦ Processor can Nack on conflict
14
Depends on MOESI state
M: Replace with transactional writeback
◦ Sets state as “Sticky@Processor”
◦ Requests are forwarded to the processor
S: Silently replaced
◦ Adds processor to sharer list
◦ Requests forwarded to all sharers
O: Write back to directory
◦ Add itself to sharer list, same as S if requested exclusively
E: Same as O
15
[Figure: directory example]
Directory: Idle [old]
P: TMcount: 1, Overflow: 0, cache state I (- -) [none]
16
[Figure: directory example]
Directory: M@P [old]
P: TMcount: 1, Overflow: 0, cache state M (R W) [new]
Messages shown: GETX, DATA, ACK
17
[Figure: directory example]
Directory: M@P [old]
P: TMcount: 1, Overflow: 0, cache state M (R W) [new]
Q: TMcount: 1, Overflow: 0, cache state I (- -) [ ]
Messages shown: GETS, Fwd_GETS, NACK, NACK
18
[Figure: directory example]
Directory: M@P [new]
P: TMcount: 1, Overflow: 1, cache state I (- -) [ ]
Messages shown: PUTX, NACK, WB_XACT
19
[Figure: directory example]
Directory: M@P [new]
P: TMcount: 1, Overflow: 1, cache state I (- -) [ ]
Q: TMcount: 1, Overflow: 0, cache state I (- -) [ ]
Messages shown: GETS, Fwd_GETS, NACK, NACK
20
[Figure: directory example]
Directory: E@Q [new]
P: TMcount: 0, Overflow: 0, cache state I (- -) [ ]
Q: TMcount: 1, Overflow: 0, cache state E (R -) [new]
Messages shown: GETS, Fwd_GETS, ACK, DATA, CLEAN
21
Lazy cleanup is better if overflow is rare
◦ Can be improved otherwise (e.g., using Bloom filters)
Ambiguities handled conservatively
◦ A refetched block may belong to the same or an earlier transaction
◦ Set R & W bits
◦ Log old values
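The sticky-state and lazy-cleanup behaviour from the directory example can be approximated as follows. This is a sketch under assumptions: a processor whose overflow bit is set conservatively nacks every forwarded request, commit never notifies the directory, and stale sticky entries are cleared only when a later request reaches the old owner. All class and method names are hypothetical.

```python
class Processor:
    def __init__(self, name):
        self.name = name
        self.in_tx = False     # stands in for TMcount > 0
        self.overflow = False  # per-processor overflow bit

    def evict_transactional(self, block, directory):
        # Transactional writeback: data leaves the cache, but the directory
        # keeps pointing at this processor ("Sticky@Processor").
        self.overflow = True
        directory.sticky[block] = self

    def commit(self):
        # Commit stays local and cheap: the directory is not told.
        self.in_tx = False
        self.overflow = False

    def on_forward(self, block):
        # Assumption: with the overflow bit set, conservatively nack.
        if self.in_tx and self.overflow:
            return "NACK"
        return "CLEAN"


class Directory:
    def __init__(self):
        self.sticky = {}  # block -> processor still "interested" in it

    def request(self, block):
        owner = self.sticky.get(block)
        if owner is None:
            return "DATA"
        if owner.on_forward(block) == "CLEAN":
            del self.sticky[block]  # lazy cleanup of the stale sticky state
            return "DATA"
        return "NACK"
```

The cost of cleanup is paid only by the first request that hits a stale sticky entry, which is cheap as long as overflows are rare.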
23
When two transactions conflict
◦ At least one must stall or abort
◦ Quick myopic decision by HW
◦ Slow and careful by SW
Hybrid approach:
◦ HW seeks fast solution, traps to software if problem persists
24
Distributed timestamps
Trap to conflict handler (SW) when
◦ Transaction could cause deadlock
◦ It is logically later than the transaction in conflict
Per-processor possible-cycle flag
◦ Conflict if Nack received from a logically earlier transaction with the possible-cycle flag set
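The trap condition can be sketched as below. The slide does not say when the possible-cycle flag is set; this sketch assumes it is set when a transaction nacks a request from a logically earlier transaction. `Tx`, `nack` and `on_nack` are hypothetical names, and a smaller timestamp means logically earlier.

```python
class Tx:
    """Toy transaction with a timestamp and a possible-cycle flag."""

    def __init__(self, timestamp):
        self.timestamp = timestamp    # smaller value = logically earlier
        self.possible_cycle = False

    def nack(self, requester):
        """This transaction refuses a request from `requester`."""
        # Assumption: nacking a logically earlier transaction means this
        # transaction might be part of a deadlock cycle.
        if requester.timestamp < self.timestamp:
            self.possible_cycle = True
        return requester.on_nack(self)

    def on_nack(self, nacker):
        """Decide what to do after being nacked by `nacker`."""
        if nacker.timestamp < self.timestamp and self.possible_cycle:
            return "trap"   # hand over to the software conflict handler
        return "stall"      # hardware fast path: wait and retry
```

With `old = Tx(1)` and `young = Tx(2)`, a Nack from `old` alone only stalls `young`; `young` traps only after it has itself nacked `old` and is then nacked by `old`, so the logically earlier transaction keeps making progress.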
25
Target system
◦ 32 SPARC processors at 1 GHz, running Solaris
◦ L1: 16 KB 4-way split, 1-cycle latency
◦ L2: 4 MB 4-way unified, 12-cycle latency
◦ Memory: 4 GB, 80-cycle latency
◦ Directory: full-bit vector sharer list, migratory sharing optimization, directory cache, 6-cycle latency
◦ Interconnection: hierarchical switch topology, 14-cycle link latency
Simulated using Simics
◦ LogTM interface added by “magic” instructions
26
Shared counter micro-benchmark
Compared to
◦ Exponential backoff
◦ MCS locks
LogTM outperforms them
LogTM does not abort transactions
27
Evaluated using a subset of SPLASH-2
Used two versions of raytrace (with/without false sharing)
False sharing has significant impact!
Performance gains range from moderate to large
28
LogTM must read a block before writing it to the log
◦ Benchmarks showed that data is usually read anyway
LogTM is more sensitive to false sharing than lock approaches
The log needs to be valid only on an abort
◦ A k-block log write buffer avoids most log writes, as shown in the benchmarks
29
TCC
◦ Lazy version management (slow commits)
◦ Lazy conflict detection (detect on commit)
LTM
◦ On overflow, stores new values in an uncacheable in-memory hash table
◦ LogTM allows both old and new versions to be cached
30
UTM
◦ Logs blocks targeted by both loads and stores
◦ More complete conflict detection
◦ Must walk log on certain coherence requests
VTM
◦ Per-address-space virtual mode for cache evictions, paging, context switches
◦ In virtualized mode, VTM uses microcode for conflict detection (LogTM uses a MOESI extension)
31
Presents a TM implementation designed to speed up the common case
Efficiently handles cache evictions
Requires simple architectural changes
◦ Registers, state, directory extension
Work towards hybrid conflict detection
No paging or context switch support
Very sensitive to false sharing