TokenTM: Token-Based Hardware Transactional Memory

1 TokenTM: Token-Based Hardware Transactional Memory
Jayaram Bobba, Neelam Goyal, Mark D. Hill, Michael M. Swift, and David A. Wood
Multifacet Project, Dept. of Computer Sciences, University of Wisconsin-Madison

2 Executive Summary
- Current hardware TMs
  - Most transactions small & short running
  - Penalize large/long transactions
  - Too restrictive for widespread TM use?
- Hypothesis
  - Must support efficient large/long transactions as well
  - Is such an HTM even possible?
- Yes! TokenTM
  1. LogTM’s log to buffer unbounded values
  2. Transactional tokens for unbounded conflict detection
     - Conflict state in memory metabits
     - Concurrent updates via metastate fission/fusion
Wisconsin Multifacet Project, UW-Madison Architecture Seminar

3 Existing HTM Systems
- Assumption: most transactions small & short running
  - Optimized for small transactions
  - Degrade with large, long-running transactions
- Non-localized overhead, e.g.,
  - LogTM-SE [Yen07]: false conflicts
  - OneTM [Blundel07]: serializes
- Complex, expensive operations, e.g.,
  - XTM [Chung06] & PTM [Chuang06] manipulate page tables
- Premature optimization?
© 2008 Multifacet Project, University of Wisconsin-Madison

4 Why Large Transactions?
- Programmers may want large (>> cache) and/or long (>> ctx switch) transactions
  - HLL transactions invoke unpredictable lower-level code
  - Replace critical sections containing syscalls or I/O
  - Avoid concurrency bugs [Lu08]
- But “Most transactions small & short running”
  - Restrict TM to use by gurus (like OS spin locks)?
  - Self-fulfilling prophecy?
- Must support efficient large/long transactions as well

5 Toward a Large-Transaction TM
- Efficiently detect conflicts between in-flight transactions using read/write sets
  - Unbounded
  - Globally accessible
  - Fast read/write set ops, e.g., add to read set, clear read set
- Small transactions: low overhead
- Large transactions: localized overhead
  - Accessible read/write set (potentially unbounded)
- Minimal changes to coherence / VM: NO heavyweight eviction ops, negative acks, or additional page tables

6 Existing Mechanisms
- Synergy between cache coherence and conflict detection
  - Hence, overload cache coherence
  - (+) Excellent for bounded/small TM
  - (-) But ‘virtualization’ on overflows
  - (-) Tough to access ‘virtualized’ state
- Small transactions: low overhead
- Minimal changes to coherence / VM: ×
- Large transactions: localized overhead: ×

7 TokenTM: a Large-Transaction TM
- New conflict detection mechanism
  - Transactional tokens in tagged memory
  - Token Coherence [Martin03] at a different level
- Version management
  - Save old/new values for unbounded write set
  - LogTM [Moore06] undo log
- Callout on slide: “This Talk”

8 Outline
- Motivation
- Design
  - Token-Based Conflict Detection
  - Metadata Storage
- Implementation
- Results

9 Transactional Tokens
- Challenge: how to efficiently track read/write sets?
- Token Coherence [Martin03]: read/write sets for cache coherence
- Solution: transactional tokens
  - T tokens per memory block
  - At least one token to read, all T tokens to write (token conflict detection)
- Token metadata: <c0, c1, …, ci, …>, where 0 ≤ ci ≤ T is the count of tokens held by the thread with TID i
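The token rule above (one token to read, all T to write) can be illustrated with a small software model. This is only a sketch under assumptions: the value of `T`, the dictionary representation of the per-block counts, and the function names `try_read` and `try_write` are hypothetical and are not taken from the talk.

```python
T = 4  # tokens per memory block (hypothetical value for illustration)

def try_read(counts, tid):
    """Acquire one token for a load; return False on a token conflict."""
    if counts.get(tid, 0) >= 1:
        return True                      # block is already in this thread's read or write set
    if sum(counts.values()) >= T:
        return False                     # no token left, e.g. a writer holds all T
    counts[tid] = 1
    return True

def try_write(counts, tid):
    """Acquire all T tokens for a store; return False on a token conflict."""
    held_by_others = sum(c for t, c in counts.items() if t != tid)
    if held_by_others > 0:
        return False                     # insufficient tokens
    counts[tid] = T
    return True

# Two threads: both load A, X stores B, then Y tries to store A.
A, B = {}, {}
assert try_read(A, "X") and try_read(A, "Y")
assert try_write(B, "X")
assert not try_write(A, "Y")             # X still holds a token on A
```

Here `counts` plays the role of the metadata vector <c0, c1, …> for one block, keyed by TID.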

10 Tagged Memory
- Challenge: where to store unbounded, globally accessible token metadata?
- Virtual memory: unbounded and globally accessible
- Solution, similar to OneTM [Blundel07]: tag virtual memory
  - Piggyback on existing virtual memory and cache coherence mechanisms

11 TokenTM Logical Operation
[Animated diagram: threads X and Y, each with a PC and an undo log, run against shared memory blocks A, B, and C, each tagged with metadata <cx, cy, …>. Thread X: BEGIN_XACT, Load A, Store B, COMMIT_XACT. Thread Y: BEGIN_XACT, Load A, Store A, COMMIT_XACT, with an ABORT step shown. Metadata values shown during the animation: <0,0,…>, <1,1,…>, <1,0,…>, <T,0,…>. Data values shown: 0x..00.., 0x..10.., 0x..11... Undo-log entry shown: B: 0x..00... One step is annotated “Insufficient tokens”.]

12 Storing Metadata
- Software: per-thread token logs (alongside the undo logs) hold the counts cx, cy
  - Unbounded
  - Difficult to access globally
- Hardware: tagged memory holds a per-block metastate (Sum, TID), e.g., (0, -)
  - Concise
  - Accessible
  - Lossy summary of the metadata <cx, cy, …>
[Diagram: threads X and Y (BEGIN_XACT, Load A, Store B / Store A, COMMIT_XACT) above blocks A, B, C with data 0x..00.., 0x..10.. and metadata <0,0,…>.]

13 Hardware Metastate
- Metadata summary (sum, TID)
  - sum: total number of tokens acquired
  - TID: identifies owner when sum = 1 or sum = T (optional)
- Some summaries:
    <c0, c1, …, ci, …>    (sum, TID)
    <0, 0, 0, 0>          (0, -)
    <0, 0, 1, 0>          (1, 2)
    <0, T, 0, 0>          (T, 1)
    <0, 1, 1, 1>          (3, -)
- Concise -> stored in packed field (e.g., State[1:2], Attr[3:16])
- Fast -> accessed as part of normal memory operation
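The summarization from a full count vector to a (sum, TID) pair can be written out as a short function. This is a sketch only: the function name `summarize`, the list representation, and the value of `T` are assumptions, and `None` stands in for the "-" TID.

```python
T = 8  # tokens per block (hypothetical value)

def summarize(counts):
    """Collapse <c0, c1, ...> into (sum, TID); TID is kept only for a sole owner of 1 or T tokens."""
    total = sum(counts)
    holders = [tid for tid, c in enumerate(counts) if c > 0]
    if len(holders) == 1 and total in (1, T):
        return (total, holders[0])
    return (total, None)

print(summarize([0, 0, 0, 0]))  # (0, None)
print(summarize([0, 0, 1, 0]))  # (1, 2)
print(summarize([0, 1, 1, 1]))  # (3, None)

# The summary is lossy: different holders map to the same metastate.
assert summarize([1, 1, 0, 0]) == summarize([0, 0, 1, 1]) == (2, None)
```

The last assertion shows why the full per-thread counts still have to be kept somewhere else: once more than one thread holds tokens, the metastate no longer says who they are.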

14 Token Logs
- Distributed structures for unbounded read/write sets
  - Per-thread
  - Stored in program memory (e.g., heap)
  - List of <address, num_tokens>
- Accessible to hardware for fast ops
  - Add to read set -> append to token log
- Example token log: A: 1, B: T
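A per-thread token log of <address, num_tokens> entries might be modeled as below. This is a software sketch under assumptions: the class `TokenLog`, its method names, and the `token_sums` dictionary (standing in for the per-block token totals) are hypothetical, and the hardware described in the talk appends to the log itself.

```python
T = 8  # tokens per block (hypothetical value)

class TokenLog:
    """Per-thread list of (address, num_tokens) entries."""

    def __init__(self):
        self.entries = []

    def acquire(self, addr, num_tokens, token_sums):
        """Take tokens for a block and record them in the log."""
        token_sums[addr] = token_sums.get(addr, 0) + num_tokens
        self.entries.append((addr, num_tokens))

    def release_all(self, token_sums):
        """Walk the log at commit or abort, return every token, and clear the log."""
        for addr, num_tokens in self.entries:
            token_sums[addr] -= num_tokens
        self.entries.clear()

sums = {}
log_x = TokenLog()
log_x.acquire("A", 1, sums)   # Load A  -> add to read set
log_x.acquire("B", T, sums)   # Store B -> add to write set
print(log_x.entries)          # [('A', 1), ('B', 8)]
log_x.release_all(sums)
print(sums)                   # {'A': 0, 'B': 0}
```

Because the log is an append-only list, adding to the read set is a constant-time operation regardless of how large the transaction grows.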

15 Double-entry Bookkeeping (Keeping Metadata Consistent)
[Animated diagram: threads X (BEGIN_XACT, Load A, Store B, COMMIT_XACT) and Y (BEGIN_XACT, Load A, Store A, COMMIT_XACT) each record acquired tokens twice: in the thread's software token log (A: 1 for X, A: 1 for Y) and in the block's hardware metastate (Sum, TID). Logical token state <cx, cy, …> shown for A: <0,0,…>, <1,0,…>, <1,1,…>. Metastate values shown: (0, -), (1, X), (2, -); blocks B and C are shown at (0, -).]
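The double-entry idea is that the tokens recorded in all token logs for a block should add up to the sum held in that block's metastate. A minimal consistency check, assuming token logs are plain lists of (address, num_tokens) pairs and metastates are (sum, TID) tuples (both representations and the name `audit` are hypothetical):

```python
from collections import Counter

def audit(token_logs, metastate):
    """Return the blocks whose logged token total disagrees with the metastate sum."""
    logged = Counter()
    for entries in token_logs.values():
        for addr, num_tokens in entries:
            logged[addr] += num_tokens
    blocks = set(logged) | set(metastate)
    return sorted(a for a in blocks
                  if logged[a] != metastate.get(a, (0, None))[0])

# State after both threads have loaded A.
logs = {"X": [("A", 1)], "Y": [("A", 1)]}
meta = {"A": (2, None), "B": (0, None), "C": (0, None)}
assert audit(logs, meta) == []

# If Y's token were missing from the metastate, the books would not balance.
meta["A"] = (1, "X")
assert audit(logs, meta) == ["A"]
```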

16 Outline
- Motivation
- Design
- Implementation
  - Metastate Fission/Fusion
- Results

17 Implementing Hardware Metastate
[Animated diagram: each private cache line carries a metastate (Sum, TID) next to its tag, coherence state, and data; each directory/main-memory entry carries one next to its sharer list and data. The animation steps through Load A by thread X and then by thread Y. Coherence states shown for A: Exclusive, Modified, Owned, Shared. Messages shown: GETS A, Fwd_GETS A, DATA A, Upgrade A. Directory sharers shown: Not Present, P1, P1,P2. Metastate values shown: (0, -), (1, X).]
- Shared copies cannot update metastate
- Solution: fission / fusion

18 Metastate Fission
[Animated diagram: thread Y's Load A sends GETS A to the directory, which forwards it (Fwd_GETS A) to the cache that owns A with metastate (1, X). On fission, (1, X) splits into (1, X), kept with the Owned copy, and (0, -), sent with Data A to Y's Shared copy; Y's copy is then shown with Sum 1, TID Y. Token logs: A: 1 for X, A: 1 for Y. Directory sharers shown: P1, then P1,P2.]

19 Metastate Fusion
- On store, metastate copies are fused back
- Why does fission/fusion work?
  - Store sees ‘complete’ metastate
  - Load sees ‘complete’ metastate if a writer exists, ‘partial’ metastate otherwise

20 Hardware Cost
- Additional metabits in caches/memory
  - Recoded ECC to cull metabits
- Changes to coherence protocols
  - Additional payload on messages
  - Minimal changes to protocol logic
  - Requires non-silent eviction

21 Outline
- Motivation
- Design
- Implementation
- Results
  - Do we meet the two performance goals?
    - Small transactions: low overhead
    - Large transactions: localized overhead

22 Evaluation Methodology
- Full-system simulation
  - Multifacet GEMS
- Base system
  - 32-core CMP system, in-order, single-issue cores
  - Private 4-way 32 KB writeback split I&D L1 caches
  - Shared 8-way 8 MB writeback L2
  - On-chip L2, MESI coherence
  - Packet-switched interconnect in a tiled topology

23 TM Systems
- LogTM-SE [Yen07] variant
  - Parallel Bloom filters for conflict detection: 4 2-Kbit H3 filters
  - (+) Compact, less hardware overhead
  - (-) False conflicts
- LogTM-SE_Perfect
  - (+) No false conflicts
  - (-) Unimplementable
- TokenTM

24 Results
- Small transactions (low overhead): comparable on small transactions
- Large transactions (localized overhead): minor degradation with large transactions

25 TokenTM Conflict Detection
- Large transactions: localized overhead
  - Accessible read/write set (potentially unbounded)
- Small transactions: low overhead
  - Fast read/write set ops, e.g., add to read set, clear read set
- Minimal changes to coherence / VM: NO heavyweight eviction ops, negative acks, or additional page tables

26 In the paper…
- Fast token release
- TM ‘virtualization’ events
  - Context switches, paging, etc.
  - System V shared memory
- Long-running critical sections in server workloads
- Fission/fusion useful for other TM systems
  - USTM [Baugh08]: set Fault-on-Write UFO bit without exclusive permission

27 Executive Summary
- Current hardware TMs
  - Most transactions small & short running
  - Penalize large/long transactions
  - Too restrictive for TM use up/down the software stack?
- Hypothesis
  - Must support efficient large/long transactions as well
  - Is such an HTM even possible?
- Yes! TokenTM
  1. LogTM’s log to buffer unbounded values
  2. Transactional tokens for unbounded conflict detection
     - Conflict state in memory metabits
     - Concurrent updates via metastate fission/fusion

29 Common Token Ops
Actions by thread X:
  Action               Before (Sum, TID)    After (Sum, TID)
  Acquire one token    (0, -)               (1, X)
  Acquire T tokens                          (T, X)
  Release one token    (v, -)               (v-1, -)
  Release T tokens
  Conflicting load     (T, Y), Y≠X
  Conflicting store    (v, -), v≠0
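The token operations can be sketched as transitions on a (Sum, TID) pair. Only the transitions and conflict conditions listed on the slide come from the talk; everything else here is an assumption made so the example runs: the value of `T`, the function names, the (1, X) to (T, X) upgrade, releasing from (1, X) or (T, X) back to (0, -), and incrementing a nonzero sum on a further load (suggested by the (1, X) to (2, -) step in the bookkeeping example).

```python
T = 8  # tokens per block (hypothetical value)

class Conflict(Exception):
    """Raised when a load or store cannot get the tokens it needs."""

def acquire_one(state, tid):
    total, owner = state
    if total == T:
        if owner == tid:
            return state                 # thread already holds every token
        raise Conflict("conflicting load: writer exists")
    if total == 0:
        return (1, tid)                  # (0, -) -> (1, X)
    return (total + 1, None)             # assumed: more readers, owner forgotten

def acquire_all(state, tid):
    total, owner = state
    if total == 0 or state == (T, tid):
        return (T, tid)                  # (0, -) -> (T, X)
    if state == (1, tid):
        return (T, tid)                  # assumed upgrade of the sole reader
    raise Conflict("conflicting store: tokens held elsewhere")

def release_one(state):
    total, _ = state
    return (total - 1, None)             # (v, -) -> (v-1, -)

def release_all(state):
    return (0, None)                     # assumed: (T, X) -> (0, -)

assert acquire_one((0, None), "X") == (1, "X")
assert acquire_all((0, None), "X") == (T, "X")
assert release_one((3, None)) == (2, None)
```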

30 Workload Characteristics
  Benchmark      Input              Unit of Work    Units Measured  Num Xacts  Avg Read-Set  Avg Write-Set  Max Read-Set  Max Write-Set
  Barnes         512 bodies         parallel phase  1               2,553      6.1           4.2            42            39
  Cholesky       tk14.O             factorization                   60,203     2.4           1.7            6             4
  Radiosity      batch              1 task          1024            21,786     1.8           1.5            25            24
  Raytrace       teapot                                             47,783     5.1           2.0            594
  Delaunay       gen2.2-m30                                         16,384     51.4          38.8           507           345
  Genome         g1024-s32-n65536                                   100,115    14.5          2.1            768           18
  Vacation-Low   low contention                                     16,399     70.7          18.1           162           75
  Vacation-High  high contention                                               99.1          18.6           331           80

31 TokenTM Overheads
[Figure: TokenTM overheads chart]

32 Results
- Minor degradation with large transactions
- Comparable on small transactions

33 Fast Release (optional)
[Animated diagram: each cache line (Tag, Data) carries bits R, W, R’, W’, R+ and an Attr field. An encoding table relates the metastate values (0, -), (u, -), (1, X), (1, Y), (T, X), (T, Y) to these bits and to Attr values such as u-1, u, X, Y. Thread X (BEGIN_XACT, Load A, Store B, COMMIT_XACT) has token log A: 1, B: T and registers Token LogPtr, TID, and Fast-Release. Lines A (0x..00..) and B (0x..01..) are shown with bit 1 and Attr X set, then cleared by Flash_Clear.]

34 Is Fast Release necessary?

35 Token Operations Double-entry Bookkeeping
[Animated diagram: thread X (BEGIN_XACT, Load A, Store B, COMMIT_XACT) builds token log A: 1, B: T; thread Y (BEGIN_XACT, Load A, Store A, COMMIT_XACT) builds token log A: 1. Logical token state <cx, cy, cz> shown: <0,0,0>, <1,0,0>, <1,1,0>, <T,0,0>. Hardware metastate (Sum, TID) shown: (0, -), (1, X), (2, -), (T, X). Block data shown: A 0x..00.., B 0x..00.. and 0x..11.., C 0x..10...]

36 Fission Rules
- Assume Copy2 is sent to the new reader
- Table columns: Before; After (Copy1, Copy2)
- Entries shown: (u, -), (0, -), (1, X), (T, X)
- Annotations: “Is there a writer?”, “No writer”
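One reading of the fission rules, combined with the earlier statement that a load sees the complete metastate only if a writer exists, is sketched below. The exact table cells did not survive in this transcript, so the rule itself is an assumption: with a writer, both copies carry (T, X); without one, the copy sent to the new reader is (0, -). The value of `T` and the name `fission` are hypothetical as well.

```python
T = 8  # tokens per block (hypothetical value)

def fission(state):
    """Split a metastate into (copy1 kept by the holder, copy2 sent to the new reader)."""
    total, owner = state
    if total == T:
        # A writer exists: the reader must see it, so both copies are complete.
        return state, state
    # No writer: the reader gets an empty partial copy and the counts stay with copy1.
    return state, (0, None)

print(fission((1, "X")))   # ((1, 'X'), (0, None))
print(fission((3, None)))  # ((3, None), (0, None))
print(fission((T, "X")))   # ((8, 'X'), (8, 'X'))
```

Under this reading a new reader can always tell whether it conflicts with a writer, while reader counts may be spread across copies until a store fuses them.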

37 Fusion Rules
- Add the two counts
- Forget token owner if count > 1

  Copy1 \ Copy2   (v, -)               (1, Y)               (T, Y)
  (u, -)          (u + v, -)           (1, Y) if u = 0,     (T, Y) if u = 0,
                                       (u + 1, -) else      error else
  (1, X)          (1, X) if v = 0,     (2, -)               error
                  (v + 1, -) else
  (T, X)          (T, X) if v = 0,     error                (T, X) if X = Y
                  error else
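The fusion table can be expressed as a single function on two (Sum, TID) pairs. This is a sketch, not the hardware logic: `T`, `FusionError`, and `fuse` are hypothetical names, `None` stands in for "-", and treating every sum above T as an error is an assumption that covers the table's explicit error cells.

```python
T = 8  # tokens per block (hypothetical value)

class FusionError(Exception):
    """Raised for combinations the fusion table marks as errors."""

def fuse(copy1, copy2):
    (s1, t1), (s2, t2) = copy1, copy2
    if s1 == T and s2 == T:
        if t1 == t2:
            return copy1                 # (T, X) and (T, X): duplicated writer state
        raise FusionError("two different writers")
    if s1 == 0:
        return copy2                     # empty copy: keep the other, owner included
    if s2 == 0:
        return copy1
    total = s1 + s2                      # add the two counts
    if total > T:
        raise FusionError("more than T tokens")
    return (total, None)                 # forget token owner if count > 1

assert fuse((0, None), (1, "Y")) == (1, "Y")
assert fuse((1, "X"), (1, "Y")) == (2, None)
assert fuse((3, None), (2, None)) == (5, None)
assert fuse((T, "X"), (T, "X")) == (T, "X")
```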

38 Metastate Fusion
[Animated diagram: after both threads have loaded A (token logs A: 1 and A: 1), thread Y's Store A issues Upgrade A. The directory sends Inv A to the other cache, which goes from Owned to Invalid and returns Ack A with its metastate (1, X). Fusion of (1, X) with Y's (1, Y) gives (2, -) in Y's now-Modified copy. The store finds insufficient tokens, so a conflict is signaled. Directory sharers shown: P1,P2, then P2.]

39 Modifying Hardware Metastate (Take 1)
[Animated diagram: in this first design the private caches hold only tag, coherence state, and data, while the metastate (Sum, TID) is kept with the directory/main-memory entry. Thread X's Load A issues GETS A, receives DATA A in Exclusive state, and the entry for A changes from (0, -), Not Present to (1, X), sharer P1.]
- Extra main memory access on every metastate update

