Published by Natasha Anthony. Modified over 6 years ago.
1
TokenTM: Token-Based Hardware Transactional Memory
Jayaram Bobba, Neelam Goyal, Mark D. Hill, Michael M. Swift, and David A. Wood
Multifacet Project, Dept. of Computer Sciences, University of Wisconsin-Madison
2
LogTM: Log-based Transactional Memory
Executive Summary
- Current hardware TMs
  - Assume most transactions are small and short-running
  - Penalize large/long transactions
  - Too restrictive for widespread TM use?
- Hypothesis: HTMs must support efficient large/long transactions as well
- Is such an HTM even possible? Yes! TokenTM:
  1. LogTM's log to buffer unbounded values
  2. Transactional tokens for unbounded conflict detection
     - Conflict state in memory metabits
     - Concurrent updates via metastate fission/fusion
2/17/2019, Wisconsin Multifacet Project, UW-Madison Architecture Seminar
3
© 2008 Multifacet Project University of Wisconsin-Madison
Existing HTM Systems
- Assumption: most transactions are small and short-running
  - Optimized for small transactions
  - Degrade with large, long-running transactions
- Non-localized overhead, e.g.,
  - LogTM-SE [Yen07]: false conflicts
  - OneTM [Blundel07]: serializes
- Complex, expensive operations, e.g.,
  - XTM [Chung06] and PTM [Chuang06] manipulate page tables
- Premature optimization?
4
Why Large Transactions?
- Programmers may want large (>> cache) and/or long (>> context switch) transactions
  - HLL transactions invoke unpredictable lower-level code
  - Replace critical sections containing syscalls or I/O
  - Avoid concurrency bugs [Lu08]
- But "most transactions are small and short-running"
  - Restrict TM to use by gurus (like OS spin locks)?
  - A self-fulfilling prophecy?
- Must support efficient large/long transactions as well
5
Toward a Large-Transaction TM
- Goal: efficiently detect conflicts between in-flight transactions using read/write sets that are
  - Unbounded
  - Globally accessible
  - Fast to operate on (e.g., add to read set, clear read set)
- Requirements
  - Small transactions: low overhead
  - Large transactions: localized overhead (an accessible, potentially unbounded read/write set)
  - Minimal changes to coherence/VM (no heavyweight eviction operations, negative acks, or additional page tables)
6
Existing Mechanisms
- Synergy between cache coherence and conflict detection; hence, overload cache coherence
  - (+) Excellent for bounded/small TM
  - (-) But 'virtualization' on overflows, and 'virtualized' state is tough to access
- Small transactions: low overhead
- Minimal changes to coherence/VM: ×
- Large transactions: localized overhead: ×
7
TokenTM: a Large-Transaction TM
- New conflict detection mechanism (this talk)
  - Transactional tokens in tagged memory
  - Token Coherence [Martin03] applied at a different level
- Version management
  - Save old/new values for an unbounded write set
  - LogTM [Moore06] undo log
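Version management reuses LogTM's undo log: a transaction writes new values in place and saves each old value in a per-thread log. Below is a minimal Python sketch of that idea. It assumes a dict stands in for memory. `UndoLogTxn` and its methods are hypothetical names, not TokenTM's interface.

```python
from typing import Dict, List, Tuple


class UndoLogTxn:
    """Eager version management with an undo log (illustrative model only)."""

    def __init__(self, memory: Dict[str, int]) -> None:
        self.memory = memory                       # shared memory: address -> value
        self.undo_log: List[Tuple[str, int]] = []  # (address, old value) in program order

    def store(self, addr: str, value: int) -> None:
        # Save the old value first, then update memory in place.
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value

    def commit(self) -> None:
        # New values are already in memory, so commit just discards the log.
        self.undo_log.clear()

    def abort(self) -> None:
        # Restore newest-first so repeated stores to one address end at the oldest value.
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old
        self.undo_log.clear()


mem = {"A": 0x00, "B": 0x00}
txn = UndoLogTxn(mem)
txn.store("B", 0x11)
txn.store("B", 0x22)
txn.abort()
print(mem)  # {'A': 0, 'B': 0}
```

In this design, commit is cheap and abort pays for the rollback.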
8
Outline
- Motivation
- Design
  - Token-based conflict detection
  - Metadata storage
- Implementation
- Results
9
Transactional Tokens
- Challenge: how to efficiently track read/write sets?
- Token Coherence [Martin03]: read/write sets for cache coherence
- Solution: transactional tokens
  - T tokens per memory block
  - A read needs at least one token; a write needs all T tokens (token conflict detection)
  - Token metadata <c0, c1, …, ci, …>, where 0 ≤ ci ≤ T is the count of tokens held by the thread with TID i
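The token rule for a single block can be sketched in a few lines. This is a minimal model that assumes T = 8 and stores the metadata <c0, c1, …> as a dict from TID to token count. `BlockMetadata` and its methods are hypothetical. A real TokenTM keeps this state in token logs and memory metabits, not in a single object.

```python
from dataclasses import dataclass, field
from typing import Dict

T = 8  # tokens per block; an arbitrary value chosen for illustration


@dataclass
class BlockMetadata:
    """Logical token metadata <c0, c1, ...> for one block, as TID -> token count."""

    counts: Dict[int, int] = field(default_factory=dict)

    def free_tokens(self) -> int:
        return T - sum(self.counts.values())

    def try_read(self, tid: int) -> bool:
        """A transactional read needs at least one token."""
        if self.counts.get(tid, 0) >= 1:
            return True                     # already holds one token (or all T)
        if self.free_tokens() >= 1:
            self.counts[tid] = 1
            return True
        return False                        # another thread holds all T tokens: conflict

    def try_write(self, tid: int) -> bool:
        """A transactional write needs all T tokens."""
        held = self.counts.get(tid, 0)
        if held + self.free_tokens() == T:  # no other thread holds any token
            self.counts[tid] = T
            return True
        return False                        # other readers or a writer exist: conflict

    def release(self, tid: int) -> None:
        """Return all of tid's tokens (e.g., at commit or abort)."""
        self.counts.pop(tid, None)


a = BlockMetadata()
print(a.try_read(0), a.try_read(1))  # X (TID 0) and Y (TID 1) both read A
print(a.try_write(1))                # Y's store fails: X still holds a token
a.release(0)
print(a.try_write(1))                # Y now takes all T tokens
```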
10
Tagged Memory
- Challenge: where to store unbounded, globally accessible token metadata?
- Virtual memory is unbounded and globally accessible
- Solution (similar to OneTM [Blundel07]): tag virtual memory
  - Piggyback on existing virtual memory and cache coherence mechanisms
11
TokenTM Logical Operation
[Animated figure: threads X and Y each run BEGIN_XACT to COMMIT_XACT with their own undo logs, over shared memory blocks A, B and C tagged with metadata <cx, cy, …>. X loads A (<0,0,…> to <1,0,…>) and stores B. X first logs B's old value 0x..00.. in its undo log, then acquires all T tokens (B: <T,0,…>), and B's data becomes 0x..11... Y loads A (<1,1,…>), but its Store A finds insufficient tokens, and the figure shows an ABORT.]
12
Storing Metadata
- The full metadata <cx, cy, …> is unbounded and difficult to access globally
- Software: each thread keeps an undo log and a token log holding its own counts (cX, cY)
- Hardware (tagged memory): each block keeps a metastate (Sum, TID), initially (0, -). The metastate is a concise, accessible, lossy summary of the metadata.
13
Hardware Metastate
- Metadata summary (sum, TID)
  - sum: total number of tokens acquired
  - TID: identifies the owner when sum = 1 or sum = T (optional)
- Concise: stored in a packed field (e.g., State[1:2], Attr[3:16])
- Fast: accessed as part of a normal memory operation

Some summaries:
<c0, c1, …, ci, …>    (sum, TID)
<0, 0, 0, 0>          (0, -)
<0, 0, 1, 0>          (1, 2)
<0, T, 0, 0>          (T, 1)
<0, 1, 1, 1>          (3, -)
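The summary can be computed from the full metadata as follows. This minimal sketch assumes T = 8 and 0-based TIDs, as in the table above, and `summarize` is a hypothetical name. It also shows why the summary is lossy: <0, 1, 1, 1> and <1, 1, 1, 0> both map to (3, -).

```python
from typing import List, Optional, Tuple

T = 8  # tokens per block (illustrative value)


def summarize(counts: List[int]) -> Tuple[int, Optional[int]]:
    """Compress <c0, c1, ...> into (sum, TID); a TID of None stands for '-'."""
    total = sum(counts)
    holders = [tid for tid, c in enumerate(counts) if c > 0]
    # Remember the owner only for a single reader (sum = 1) or a writer (sum = T).
    if total in (1, T) and len(holders) == 1:
        return total, holders[0]
    return total, None


for counts in ([0, 0, 0, 0], [0, 0, 1, 0], [0, T, 0, 0], [0, 1, 1, 1]):
    print(counts, summarize(counts))
```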
14
Token Logs
- Distributed structures for unbounded read/write sets
  - Per-thread, stored in program memory (e.g., the heap)
  - A list of <address, num_tokens> entries
- Accessible to hardware for fast operations
  - Add to read set -> append to the token log
- Example token log: A: 1, B: T
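Here is a minimal software model of a token log. It assumes string addresses and T = 8, and `TokenLog` and its methods are hypothetical. Adding to a read or write set is just an append. The sketch also assumes something the slide does not state: upgrading a read to a write appends one more entry for the remaining T - 1 tokens.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

T = 8  # tokens per block (illustrative value)


@dataclass
class TokenLog:
    """Per-thread list of <address, num_tokens> entries (software-only model)."""

    entries: List[Tuple[str, int]] = field(default_factory=list)

    def tokens_held(self, addr: str) -> int:
        return sum(n for a, n in self.entries if a == addr)

    def add_to_read_set(self, addr: str) -> None:
        if self.tokens_held(addr) == 0:
            self.entries.append((addr, 1))        # one token suffices to read

    def add_to_write_set(self, addr: str) -> None:
        missing = T - self.tokens_held(addr)
        if missing > 0:
            self.entries.append((addr, missing))  # top up to all T tokens


log = TokenLog()
log.add_to_read_set("A")
log.add_to_write_set("B")
print(log.entries)  # [('A', 1), ('B', 8)]
```

Because the entries live in ordinary program memory, the log grows with the transaction instead of being bounded by cache size.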
15
Double-entry Bookkeeping (Keeping Metadata Consistent)
[Animated figure: X and Y each load A. Each acquisition is recorded twice: as an entry A: 1 in the thread's software token log, and in A's hardware metastate, which goes from (0, -) to (1, X) to (2, -). The logical token state <cx, cy, …> for A goes from <0,0,…> to <1,0,…> to <1,1,…>; blocks B and C stay at (0, -).]
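Below is a minimal sketch of double-entry bookkeeping. It assumes string TIDs and addresses and T = 8; `Bookkeeper` and its methods are hypothetical. Each acquisition updates both books. Commit releases tokens by walking the thread's own log, and `consistent()` checks that the two books agree. The final print also shows the lossy summary: after X commits, A's metastate is (1, -), even though only Y holds a token.

```python
from typing import Dict, List, Optional, Tuple

T = 8  # tokens per block (illustrative value)
Meta = Tuple[int, Optional[str]]  # (sum, TID); None stands for '-'


class Bookkeeper:
    """Records each acquisition in a per-thread token log and in the block's metastate."""

    def __init__(self) -> None:
        self.metastate: Dict[str, Meta] = {}
        self.token_logs: Dict[str, List[Tuple[str, int]]] = {}

    def meta(self, addr: str) -> Meta:
        return self.metastate.get(addr, (0, None))

    def acquire(self, tid: str, addr: str, n: int) -> None:
        total, _ = self.meta(addr)
        owner = tid if total == 0 and n in (1, T) else None    # owner kept only for a sole holder
        self.metastate[addr] = (total + n, owner)              # hardware entry
        self.token_logs.setdefault(tid, []).append((addr, n))  # software entry

    def commit(self, tid: str) -> None:
        # Walk the thread's token log and return each token to the metastate.
        for addr, n in self.token_logs.pop(tid, []):
            total, _ = self.meta(addr)
            self.metastate[addr] = (total - n, None)

    def consistent(self) -> bool:
        """Each block's sum equals the tokens recorded for it across all logs."""
        logged: Dict[str, int] = {}
        for log in self.token_logs.values():
            for addr, n in log:
                logged[addr] = logged.get(addr, 0) + n
        blocks = set(logged) | set(self.metastate)
        return all(self.meta(a)[0] == logged.get(a, 0) for a in blocks)


bk = Bookkeeper()
bk.acquire("X", "A", 1)  # X: Load A
bk.acquire("Y", "A", 1)  # Y: Load A
print(bk.metastate["A"], bk.consistent())  # (2, None) True
bk.commit("X")
print(bk.metastate["A"], bk.consistent())  # (1, None) True
```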
16
Outline
- Motivation
- Design
- Implementation
  - Metastate fission/fusion
- Results
17
Implementing Hardware Metastate
[Animated figure (private caches, directory, main memory): X's Load A issues GETS A. A arrives in X's cache in Exclusive state, and setting its metastate to (1, X) makes the line Modified. Y's GETS A is forwarded to X's cache (Fwd_GETS A), which then holds A in Owned state, while Y receives a Shared copy carrying (1, X). Updating the metastate from Y's Shared copy would require an Upgrade A.]
Problem: shared copies cannot update the metastate.
Solution: metastate fission/fusion.
18
Metastate Fission
[Animated figure: Y's GETS A is forwarded to X's cache (Fwd_GETS A), and A's metastate (1, X) undergoes fission. X's Owned copy keeps (1, X), and the Shared copy sent to Y carries (0, -). Y's Load A then records A: 1 in Y's token log and updates Y's copy to (1, Y). The directory lists P1 and P2 as sharers.]
19
Metastate Fusion
- On a store, the metastate copies are fused back together
- Why does fission/fusion work?
  - A store sees the 'complete' metastate
  - A load sees the 'complete' metastate if a writer exists, and a 'partial' metastate otherwise
20
Hardware Cost
- Additional metabits in caches/memory
  - Recoded ECC to cull metabits
- Changes to coherence protocols
  - Additional payload on messages
  - Minimal changes to protocol logic
  - Requires non-silent eviction
21
Outline
- Motivation
- Design
- Implementation
- Results: do we meet the two performance goals?
  - Small transactions: low overhead
  - Large transactions: localized overhead
22
Evaluation Methodology
- Full-system simulation with Multifacet GEMS
- Base system
  - 32-core CMP with in-order, single-issue cores
  - Private 4-way 32 KB write-back split I&D L1 caches
  - Shared 8-way 8 MB write-back L2
  - On-chip L2, MESI coherence
  - Packet-switched interconnect in a tiled topology
23
TM Systems
- LogTM-SE [Yen07] variant
  - Parallel Bloom filters for conflict detection: four 2 Kbit H3 filters
  - (+) Compact, less hardware overhead
  - (-) False conflicts
- LogTM-SE_Perfect
  - (+) No false conflicts
  - (-) Unimplementable
- TokenTM
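The LogTM-SE variant keeps read/write sets in hashed signatures. Below is a minimal software model of a parallel Bloom filter with four 2 Kbit filters, each indexed by its own H3 hash. An H3 hash XORs together random words selected by the set bits of the address. The class name, random seed and 64-bit address width are assumptions of this sketch. It shows why such signatures are compact and never miss an inserted address, but can also report addresses that were never inserted (false conflicts).

```python
import random
from typing import List


class ParallelBloomFilter:
    """k Bloom filters of m_bits each; filter i is indexed by its own H3 hash."""

    def __init__(self, k: int = 4, m_bits: int = 2048, addr_bits: int = 64, seed: int = 0) -> None:
        assert m_bits & (m_bits - 1) == 0, "m_bits must be a power of two"
        rng = random.Random(seed)
        self.index_bits = m_bits.bit_length() - 1  # 2048 -> 11-bit index
        # H3: for each filter, one random index-wide word per address bit.
        self.q: List[List[int]] = [
            [rng.getrandbits(self.index_bits) for _ in range(addr_bits)] for _ in range(k)
        ]
        self.filters = [0] * k                     # each filter is an m_bits-wide bit vector

    def _h3(self, i: int, addr: int) -> int:
        h = 0
        for bit, word in enumerate(self.q[i]):
            if (addr >> bit) & 1:
                h ^= word                          # XOR the words selected by set address bits
        return h

    def insert(self, addr: int) -> None:
        for i in range(len(self.filters)):
            self.filters[i] |= 1 << self._h3(i, addr)

    def may_contain(self, addr: int) -> bool:
        # Always True for inserted addresses; sometimes True for others (a false conflict).
        return all((f >> self._h3(i, addr)) & 1 for i, f in enumerate(self.filters))

    def clear(self) -> None:
        self.filters = [0] * len(self.filters)


sig = ParallelBloomFilter()
read_set = [0x1000 + 64 * i for i in range(200)]
for a in read_set:
    sig.insert(a)
probes = range(0x100000, 0x100000 + 64 * 1000, 64)
false_hits = sum(sig.may_contain(a) for a in probes)
print(all(sig.may_contain(a) for a in read_set), false_hits)
```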
24
Results
- Small transactions: low overhead
  - Comparable on small transactions
- Large transactions: localized overhead
  - Minor degradation with large transactions
25
TokenTM Conflict Detection
- Large transactions: localized overhead
  - Accessible read/write set (potentially unbounded)
  - Fast read/write set operations, e.g., add to read set, clear read set
- Small transactions: low overhead
- Minimal changes to coherence/VM
  - No heavyweight eviction operations, negative acks, or additional page tables
26
In the paper…
- Fast token release
- TM 'virtualization' events: context switches, paging, etc.
- System V shared memory
- Long-running critical sections in server workloads
- Fission/fusion is useful for other TM systems, e.g., USTM [Baugh08]: set the fault-on-write UFO bit without exclusive permission
27
Executive Summary
- Current hardware TMs
  - Assume most transactions are small and short-running
  - Penalize large/long transactions
  - Too restrictive for TM use up and down the software stack?
- Hypothesis: HTMs must support efficient large/long transactions as well
- Is such an HTM even possible? Yes! TokenTM:
  1. LogTM's log to buffer unbounded values
  2. Transactional tokens for unbounded conflict detection
     - Conflict state in memory metabits
     - Concurrent updates via metastate fission/fusion
28
29
Common Token Ops

Action by thread X    Before (Sum, TID)    After
Acquire one token     (0, -)               (1, X)
Acquire T tokens      (0, -)               (T, X)
Release one token     (v, -)               (v-1, -)
Release T tokens      (T, X)               (0, -)
Conflicting load      (T, Y), Y ≠ X
Conflicting store     (v, -), v ≠ 0
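The rows of this table can be written as transitions on the (sum, TID) summary. This minimal sketch assumes T = 8. It also assumes the caller consults its token log and does not reacquire tokens it already holds. The function names and the `Conflict` exception are hypothetical. Like the table, the sketch treats a store to any block with a nonzero anonymous sum (v, -) as a conflict.

```python
from typing import Optional, Tuple

T = 8  # tokens per block (illustrative value)
Meta = Tuple[int, Optional[str]]  # (sum, TID); None stands for '-'


class Conflict(Exception):
    """Raised when the metastate shows insufficient tokens."""


def load(meta: Meta, tid: str) -> Meta:
    """Thread tid acquires one token for a transactional load."""
    total, owner = meta
    if owner == tid:
        return meta                # already holds one token, or all T
    if total == T:
        raise Conflict(f"load by {tid}: writer {owner} holds all T tokens")
    if total == 0:
        return (1, tid)            # (0, -) -> (1, X)
    return (total + 1, None)       # several readers: owner forgotten


def store(meta: Meta, tid: str) -> Meta:
    """Thread tid acquires all T tokens for a transactional store."""
    total, owner = meta
    if total == 0 or owner == tid:
        return (T, tid)            # (0, -) -> (T, X), or upgrade X's own token
    raise Conflict(f"store by {tid}: {total} token(s) held elsewhere")


def release(meta: Meta, n: int) -> Meta:
    """Return n tokens, e.g., while walking the token log at commit."""
    total, _ = meta
    return (total - n, None)       # (v, -) -> (v-1, -); (T, X) -> (0, -)


m = load((0, None), "X")
m = load(m, "Y")
print(m)
try:
    store(m, "Y")
except Conflict as e:
    print("conflict:", e)
```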
30
Workload Characteristics
Benchmark      Input              Unit of Work    Units Measured  Num Xacts  Avg Read-Set  Avg Write-Set  Max Read-Set  Max Write-Set
Barnes         512 bodies         parallel phase  1               2,553      6.1           4.2            42            39
Cholesky       tk14.O             factorization                   60,203     2.4           1.7            6             4
Radiosity      batch              1 task          1024            21,786     1.8           1.5            25            24
Raytrace       teapot                                             47,783     5.1           2.0            594
Delaunay       gen2.2-m30                                         16,384     51.4          38.8           507           345
Genome         g1024-s32-n65536                                   100,115    14.5          2.1            768           18
Vacation-Low   low contention                                     16,399     70.7          18.1           162           75
Vacation-High  high contention                                               99.1          18.6           331           80
31
TokenTM Overheads
32
Results
- Minor degradation with large transactions
- Comparable on small transactions
33
Fast Release (optional)
[Figure: the metastate values (0, -), (u, -), (1, X), (1, Y), (T, X) and (T, Y) encoded in per-block cache bits R, W, R', W', R+ and Attr. Thread X's token log (A: 1, B: T) is shown with per-thread registers Token LogPtr, TID and Fast-Release. At COMMIT_XACT, a Flash_Clear clears the R and W bits that X set on blocks A (data 0x..00..) and B (data 0x..01..).]
34
Is Fast Release necessary?
35
Token Operations Double-entry Bookkeeping
[Animated figure: threads X and Y with token logs and logical token state <cx, cy, cz>. Both threads load A: each adds A: 1 to its token log, and A's metastate goes to (1, X) and then (2, -). X's Store B adds B: T to its token log, sets B's logical state to <T,0,0> and its metastate to (T, X), and changes B's data (0x..00.. to 0x..11..). After commit, the metadata returns to <0,0,0> and the metastate to (0, -).]
36
Fission Rules (Copy2 is sent to the new reader)

Before    Copy1     Copy2     Is there a writer?
(u, -)    (u, -)    (0, -)    No writer
(1, X)    (1, X)    (0, -)    No writer
(T, X)    (T, X)    (T, X)    Writer
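The fission table reduces to a single function. This is a minimal sketch that assumes T = 8, and `fission` is a hypothetical name. When there is no writer, the counts of the two copies still sum to the block's total, so a later fusion can recover it. When a writer exists, both copies carry the complete metastate, so any reader detects the writer.

```python
from typing import Optional, Tuple

T = 8  # tokens per block (illustrative value)
Meta = Tuple[int, Optional[str]]  # (sum, TID); None stands for '-'


def fission(meta: Meta) -> Tuple[Meta, Meta]:
    """Split a block's metastate when a Shared copy goes to a new reader.

    Returns (Copy1 kept by the current holder, Copy2 sent to the new reader).
    """
    total, _ = meta
    if total == T:
        return meta, meta      # a writer exists: the reader must see it
    return meta, (0, None)     # no writer: the reader starts from an empty partial count


for before in [(3, None), (1, "X"), (T, "X")]:
    print(before, "->", fission(before))
```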
37
Fusion Rules

Copy1 \ Copy2   (v, -)                             (1, Y)                             (T, Y)
(u, -)          (u + v, -)                         (1, Y) if u = 0; (u + 1, -) else   (T, Y) if u = 0; error else
(1, X)          (1, X) if v = 0; (v + 1, -) else   (2, -)                             error
(T, X)          (T, X) if v = 0; error else        error                              (T, X) if X = Y

- Add the two counts
- Forget the token owner if the count > 1
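The fusion table can likewise be coded as one function. This minimal sketch assumes T = 8; `fuse` and `FusionError` are hypothetical names. The table gives (T, X) + (T, Y) only for X = Y, so the sketch assumes any other pairing of two writer copies is an error.

```python
from typing import Optional, Tuple

T = 8  # tokens per block (illustrative value)
Meta = Tuple[int, Optional[str]]  # (sum, TID); None stands for '-'


class FusionError(Exception):
    """Raised for combinations the fusion table marks as errors."""


def fuse(a: Meta, b: Meta) -> Meta:
    """Combine two metastate copies of the same block (e.g., on a store)."""
    (u, x), (v, y) = a, b
    if u == T or v == T:               # at least one copy records a writer
        if u == v == T and x == y:
            return a                   # (T, X) + (T, X) -> (T, X)
        if u == 0 or v == 0:
            return a if u == T else b  # the other copy is empty
        raise FusionError(f"cannot fuse {a} and {b}")
    total = u + v                      # add the two counts
    if total == 1:
        return a if u == 1 else b      # a single holder keeps its owner (or '-')
    return (total, None)               # forget the owner if count > 1


print(fuse((1, "X"), (1, "Y")))
print(fuse((0, None), (T, "Y")))
```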
38
Metastate Fusion
[Animated figure: Y's Store A issues Upgrade A. The directory invalidates X's copy (Inv A), and X's Ack A carries its metastate (1, X), which is fused with Y's (1, Y) into (2, -). Y now sees the complete metastate, finds insufficient tokens for a store, and detects the conflict.]
39
Modifying Hardware Metastate (Take 1)
[Animated figure: X's Load A issues GETS A, and A arrives in X's private cache in Exclusive state with metastate (1, X), while the directory and main memory still hold A with metastate (0, -).]
Problem: an extra main memory access on every metastate update.