Published by Δάμαλις Λιάπης. Modified over 6 years ago.
1
Hardware Support for Efficient Transactional and Supervised Memory Systems
Jayaram Bobba
Dissertation Defense, 1/14/2010
Dept. of Computer Sciences, University of Wisconsin–Madison
Overview: 1) Research Area 2) Challenges/Contributions 3) Big Picture
2
Research Area
Device Scaling -> Abundant Transistors -> Emergence of CMPs
CMPs are Hard to Program
Hardware Support to Improve Productivity
  Supervised Systems: Empty/full-bits, Transactional Memory, MemTracker, Deterministic Memory
9/18/2018 Wisconsin Multifacet Project
3
Challenges
Supervised Systems
  Sequential-consistency only
  Ad hoc hardware
  Lack of formalism
  Contribution 1: Supervised Memory (TSOdata, Safe Supervision)
Transactional Memory
  “Most transactions are small”: self-fulfilling, limited applicability
  Contribution 2: TokenTM
  Contribution 3: StealthTest
(Link TokenTM to StealthTest)
4
Big Picture
[Figure: system stack with layers Applications, Software, Tools, and Hardware, showing where StealthTest, Supervised Systems, TokenTM, TSOdata and Safe Supervision, and Supervised Memory fit]
5
Outline (slide count)
Motivation (4)
Supervised Memory (18)
TokenTM (4 / 19)
StealthTest (16)
Conclusion (6)
6
On Software Productivity
More Software, Better Hardware
More Productivity: Yannis’s “Law” (programmer productivity doubles every 6 years)
More Performance: Moore’s Law
Moore’s Law will continue. But Yannis’s Law?
7
What has changed?
“A Fundamental Turn towards Concurrency in Software” [Herb Sutter, 2005]
Moore’s Law -> Better Computers
  Sequential Computers (Past)
  Memory wall, Power wall, etc.
  Attack of the killer CMPs* (Current)
How to program?
  Expose parallelism to software
  Parallel programs are hard to write
* Adapted from “Attack of the killer micros” by Eugene Brooks
8
Who solves the productivity issue?
Why, of course, hardware architects!
Long live Moore’s Law: spend some transistors on productivity issues
Architectural Support for Enhancing Productivity
  for language features
  for bug avoidance
  for debugging
  for performance feedback
  and so on…
9
Seriously, Who should solve it?
HW Architects or SW Engineers? There was a ‘software crisis’ in the past too…
Why HW architects?
  More bang for the buck (Economic): Software/IT (1,152 billion) vs Hardware (138 billion) [Wen Mei Hwu, Micro-39 Keynote]
  SW cannot do it alone (Technical): decades of automatic parallelization efforts
  Virtual Memory, Tagged Memory for LISP-like languages
“We must now reconsider the balance of hardware and software and to provide more specialized function in hardware than we have previously, in order to drastically simplify the programming process” Edward A. Feustel, IEEE TOC, July 1973, in support of Tagged Memory
10
Outline
Motivation
Supervised Memory
  Background/Motivation
  Explore relaxed supervised systems
  Define Supervised Memory
  Propose formal models
TokenTM
StealthTest
Conclusion
11
Why Supervised Systems?
Synchronization
  Hardware TM systems
  Empty/Full-bits [Berry et al 2006]: graph processing algorithms on 4 processor MTA > 64K BG/L
Controlled non-determinism
  Deterministic/Interleaving Constrained Multiprocessing
Debugging
  Log-based architectures
Safety
  Heap checkers, Bounds checkers
Language Features
  Hardware-assisted Garbage Collection
12
What are Supervised Systems?
Out-of-band metadata per data block
Monitor and control (supervise) memory accesses to data
Execute handlers on specific metadata states
Pure software is possible, but inefficient
  Shadow memory, e.g., Valgrind: mean slowdown 22X [Nethercote et al., VEE2007]
(Synthesized definition)
13
State-of-the-Art
Expect Sequentially-Consistent (SC) hardware
  Most hardware is not
Ad hoc
  Whither primitives?
Informal treatment of memory consistency
  Ambiguous/Incorrect
14
Contributions
Existing systems expect Sequentially-Consistent (SC) hardware; most hardware is not -> Explore relaxed supervised systems
Ad hoc; whither primitives? -> Define Supervised Memory
Informal treatment of memory consistency; ambiguous/incorrect -> Propose formal memory models
16
TSO-lite: A TSO-compliant system
Explore relaxed supervised systems
[Figure: TSO-lite processor with a PC, registers r1 to r3, and a store buffer in front of memory. Program: ST 1,[A]; LD [B],r1; ST 2,[C]; LD [C],r3. Memory holds data and metadata per block (A: 0x00, B: 0x01, C: 0x11). The store buffer holds ST 0x01, A and ST 0x10, C; the loads return r1 = 0x01 and r3 = 0x10.]
17
Empty/Full-Bits on TSO-lite
Explore relaxed supervised systems
[Figure: the same program on TSO-lite with empty/full-bit metadata per block (metadata values shown: Full, None, Empty, None). State machine with Empty and Full states, LD and ST transitions, and an Exception on a disallowed LD/ST.]
I1: NO LOAD BYPASS
I2: LATE EXCEPTIONS
18
Deterministic Shared Memory (DMP) [Devietti et al., ASPLOS 2009]
Explore relaxed supervised systems
“depending upon the consistency model of the underlying hardware, threads must perform a memory fence at the edge of a quantum”
  Insert a fence after the last operation in the quantum
  Insert a fence before the first shared operation in the quantum
I3: Reordered metabit-reads
(Illustration)
20
What is Supervised Memory?
Define Supervised Memory
Each memory location A has data (A.d) and metadata (A.m)
New operations
  Supervised Load (sLD A)
  Supervised Store (sST A)
Jump on reading special metadata
(Optionally) Hardware exception
21
Supervised Operations
Define Supervised Memory
sLD A =>
Start:
  atomic {
    curm = Val[R A.m]                       // Read metadata
    nextm = NEXT(Load, curm)                // Check software-specified FSM
    If nextm == EXCEPTION then Jump to Handler
    R A.d                                   // Read data
    If (nextm != curm) then W A.m, nextm    // Update metadata
  }
Handler:
  …
22
Using Supervised Memory
Define Supervised Memory
Software assigns semantics to metadata
  Metastates stored as metadata, e.g., Initialized, Uninitialized
  Metastate transition function (NEXT)
Use supervised operations to monitor/control data operations
  E.g., catch read access to uninitialized data
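The supervised load and store described on the last two slides can be sketched in software. This is a minimal, single-threaded illustration, assuming a two-state Initialized/Uninitialized checker; the class name, the NEXT table layout, and the SupervisionFault exception are hypothetical stand-ins for the hardware mechanism and the jump to the handler, not the dissertation's implementation.

```python
EXCEPTION = "EXCEPTION"

# Hypothetical software-specified FSM for an uninitialized-read checker.
# Key: (operation, current metastate). Value: next metastate.
NEXT = {
    ("Load", "Uninitialized"): EXCEPTION,
    ("Load", "Initialized"): "Initialized",
    ("Store", "Uninitialized"): "Initialized",
    ("Store", "Initialized"): "Initialized",
}


class SupervisionFault(Exception):
    """Stands in for the jump to the software handler."""


class SupervisedMemory:
    def __init__(self):
        self.data = {}  # A.d for each location A
        self.meta = {}  # A.m for each location A

    def _check(self, op, addr):
        curm = self.meta.get(addr, "Uninitialized")  # read metadata
        nextm = NEXT[(op, curm)]                     # consult the FSM
        if nextm == EXCEPTION:
            raise SupervisionFault(f"{op} of {addr} in metastate {curm}")
        return curm, nextm

    def sld(self, addr):
        """Supervised load: check metadata, read data, update metadata."""
        curm, nextm = self._check("Load", addr)
        value = self.data.get(addr, 0)
        if nextm != curm:
            self.meta[addr] = nextm
        return value

    def sst(self, addr, value):
        """Supervised store: check metadata, write data, update metadata."""
        curm, nextm = self._check("Store", addr)
        self.data[addr] = value
        if nextm != curm:
            self.meta[addr] = nextm
```

Because the sketch runs in one thread, the atomic block of the slide is implicit; a real system has to make the metadata check and the data access atomic with respect to other processors.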
24
TSO Axioms [Hangal et al., ISCA 2004]
Propose formal models
25
TSO Axioms [Hangal et al., ISCA 2004]
Propose formal models
Axiom            Description
Order            Total order on all write accesses
Atomicity        No intervening accesses for atomic operations
Termination      All write accesses eventually complete
Value            Reads return latest value from memory or store buffer
Memory Barrier   No reordering across a barrier
ReadAny          Accesses cannot pass outstanding reads
WriteWrite       Write access cannot pass outstanding writes
ReadAny and WriteWrite are the reordering axioms.
Program-order pairs: Rd A, Rd B; Rd A, Wr B; Wr A, Wr B; Wr A, Rd B. Only the last pair (a read after a write) may be reordered, which allows store buffers.
26
TSOall: A Consistency Model for Supervised Memory
Propose formal models
TSO axioms applied to all accesses, data and metadata
  + (Simple) Like TSO
  - (Slow) Prohibits optimizations
Thread:
  sST A -> [Rd A.m, Wr A.d, Wr A.m]
  sLD B -> [Rd B.m, Rd B.d]
  => Store buffers ineffective
Tension: Ease of Reasoning vs Performance
27
Blast from the Past [Adve and Hill, ISCA1990]
Propose formal models
Ease of Reasoning (SC) vs Performance (RC)
Observation: Simple programs rely only on certain SC orders
  Ignore non-essential orders. Still appears as SC
Challenge: Simple? Non-essential orders?
Solution: Data-race-freedom
  For data-race-free programs, RC = SC
28
Safe Supervision Motivation
Propose formal models
Ease of Reasoning (TSOall) vs Performance (?)
Observation: Simple supervised programs rely only on certain TSOall orders
  Ignore non-essential orders. Still appears as TSOall
Challenge: Simple? Non-essential orders?
Solution: Safe Supervision
  For safely supervised programs, ? = TSOall
(Examples)
29
Safe Supervision
Metadata accesses to location A are not used to order operations to a different location B
Most uses of supervision are safely supervised, e.g.,
  Heap Checker: Initialized/Uninitialized values
  Transactional Memory: Conflict Detection information
Example where A.m orders accesses to B.d: initially, A.m = Empty, B.d = 0
  Thread 1: B.d = 1; A.m = Full
  Thread 2: While (A.m == Empty); Read B.d
(Definition)
30
TSOdata: Fast Yet Simple
Propose formal models
Axiom            Description
Order            Total order on all write accesses
Atomicity        No intervening accesses for atomic operations
Termination      All write accesses eventually complete
Value            Reads return latest value from memory or store buffer
Memory Barrier   No reordering across a barrier
ReadAny          Data accesses cannot pass outstanding data reads
WriteWrite       Data writes cannot pass outstanding data writes
ReadAny and WriteWrite are the reordering axioms.
Thread:
  sST A -> [Rd A.m, Wr A.d, Wr A.m]
  sLD B -> [Rd B.m, Rd B.d]
  => Store buffers can be used
For safely supervised programs, TSOdata = TSOall
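The difference between the two models for a thread that runs sST A and then sLD B can be checked mechanically. The sketch below encodes only the two reordering axioms (ReadAny and WriteWrite) as they read in the axiom tables; it ignores Order, Atomicity, Termination, Value, and Memory Barrier, and it assumes that same-location accesses are never reordered. The names Access, expand, may_pass, and load_can_bypass_store are illustrative, not taken from the dissertation.

```python
from typing import NamedTuple


class Access(NamedTuple):
    kind: str  # "Rd" or "Wr"
    loc: str   # memory location, e.g. "A"
    part: str  # "d" for data, "m" for metadata


def expand(op, loc):
    """Expand a supervised operation into its accesses, as on the slides."""
    if op == "sST":
        return [Access("Rd", loc, "m"), Access("Wr", loc, "d"), Access("Wr", loc, "m")]
    if op == "sLD":
        return [Access("Rd", loc, "m"), Access("Rd", loc, "d")]
    raise ValueError(op)


def may_pass(later, earlier, model):
    """May `later` be performed before the program-order-earlier `earlier`?"""
    if later.loc == earlier.loc:
        return False  # assumption: same-location order is always kept
    if model == "TSOdata" and "m" in (later.part, earlier.part):
        return True   # TSOdata's reordering axioms constrain data accesses only
    if earlier.kind == "Rd":
        return False  # ReadAny: nothing passes an outstanding read
    if later.kind == "Wr":
        return False  # WriteWrite: a write cannot pass an outstanding write
    return True       # a read may pass an earlier write (store buffer)


def load_can_bypass_store(model):
    """Can every access of sLD B pass every access of an earlier sST A?"""
    store, load = expand("sST", "A"), expand("sLD", "B")
    return all(may_pass(l, s, model) for l in load for s in store)
```

Under these assumptions, load_can_bypass_store("TSOall") is False because the store's metadata read blocks every later access, while load_can_bypass_store("TSOdata") is True, which matches the slides' "store buffers ineffective" versus "store buffers can be used".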
31
TSOdata on OpenSPARC T2
Goal: Explore low-level issues on a real design
Late Exceptions with deferred handlers
  Dump store buffer entries on exception
  Enhance store buffer to carry Virtual Address (VA)
  ~200 cycles to read out 4 entries
Disable store buffer bypassing for supervised loads
Low space overhead for adding metabits (~4%)
32
Supervised Memory Summary
Existing systems expect Sequentially-Consistent (SC) hardware; most hardware is not -> Explore relaxed memory systems
Ad hoc; whither primitives? -> Define Supervised Memory
Informal treatment of memory consistency; ambiguous/incorrect -> Propose formal memory models
33
Outline
Motivation
Supervised Memory
TokenTM [ISCA 2008]
StealthTest
Conclusion
(Longer Version)
34
TokenTM Summary
Current Hardware TMs
  Assume most transactions are small and short running
  Penalize large/long transactions
  Too restrictive for wide-spread TM use?
Hypothesis
  Must support efficient large/long transactions as well
Is such an HTM even possible? Yes! TokenTM
  1. LogTM’s log to buffer unbounded values
  2. Transactional Tokens for unbounded conflict detection
     Conflict state in memory metabits
35
Transactional Tokens
Challenge: How to efficiently track Read/Write sets?
Token Coherence [Martin03]: Read/Write sets for cache coherence
Solution: Transactional Tokens
  T tokens per memory block
  At least one token to read, all T tokens to write (token conflict detection)
Token Metadata
  <c0,c1,…,ci,…> where 0≤ci≤T is the count of tokens held by the thread with TID i
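The token rule on this slide (one token to read, all T tokens to write) can be illustrated with a small sketch. It is a software model under stated assumptions: T = 8 is an arbitrary choice, and the Block class and TokenConflict exception are hypothetical names; conflict resolution (who aborts or stalls) is left out.

```python
T = 8  # tokens per memory block; arbitrary value for this sketch


class TokenConflict(Exception):
    """Raised when a thread cannot obtain the tokens it needs."""


class Block:
    def __init__(self, num_threads):
        # Token metadata <c0, c1, ..., ci, ...>: tokens held per thread ID.
        self.counts = [0] * num_threads

    def free_tokens(self):
        return T - sum(self.counts)

    def acquire_read(self, tid):
        """A transactional read needs at least one token."""
        if self.counts[tid] >= 1:
            return
        if self.free_tokens() < 1:
            raise TokenConflict(f"thread {tid} cannot read")
        self.counts[tid] += 1

    def acquire_write(self, tid):
        """A transactional write needs all T tokens."""
        needed = T - self.counts[tid]
        if self.free_tokens() < needed:
            raise TokenConflict(f"thread {tid} cannot write")
        self.counts[tid] = T

    def release(self, tid):
        """Return all of a thread's tokens, e.g. at commit or abort."""
        self.counts[tid] = 0
```

With two readers the metadata is <1,1>; a write by either thread then fails until the other releases its token, after which the writer holds <T,0> and blocks further readers.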
36
Tokens and Supervised Memory
Challenge: Where to store unbounded, globally accessible token metadata?
Solution: Supervised Memory’s metadata
  Piggyback on existing Virtual Memory and Cache Coherence mechanisms
37
TokenTM: a Large-Transaction TM
New Conflict Detection Mechanism
  Transactional Tokens in Supervised Memory
  Token Coherence [Martin03] at a different level
Version Management
  Save old/new values for unbounded Write set
  LogTM [Moore06] undo log
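The version-management half can be sketched as an undo log in the LogTM style: new values are written in place and old values are saved so that an abort can restore them. This is a toy model over a Python dict; UndoLogTx and its method names are hypothetical, and the real log lives in per-thread virtual memory and is written by hardware.

```python
_ABSENT = object()  # marks an address that had no value before the store


class UndoLogTx:
    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # list of (address, old value), oldest first

    def store(self, addr, value):
        # Save the old value, then write the new value in place.
        self.undo_log.append((addr, self.memory.get(addr, _ABSENT)))
        self.memory[addr] = value

    def commit(self):
        # New values are already in place, so commit only discards the log.
        self.undo_log.clear()

    def abort(self):
        # Walk the log newest-first and restore every old value.
        while self.undo_log:
            addr, old = self.undo_log.pop()
            if old is _ABSENT:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
```

The log grows with the write set, which is why this scheme places no fixed bound on the number of values a transaction can buffer.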
38
Outline
Motivation
Supervised Memory
TokenTM
StealthTest [PACT 2009]
Conclusion
39
StealthTest Summary (1/2) The Problem: fork Overhead
Software testing is hard
  Multithreading makes it harder
Online software testing can help
  Run tests on deployed software
  E.g., Delta Execution for patch testing [Tucek et al., ASPLOS 2009]
Non-intrusive mechanisms: Low Overhead, Functionally Hidden, Good Scaling
  fork (existing)
40
StealthTest Summary (2/2) Solution: TM for testing
Leverage Transactional Memory for online testing
Non-Intrusive? transaction { test(); abort }
  Fast TM mechanisms
  StealthTest: Low Overhead, Functionally Hidden, Good Scaling
Demonstrate two uses
  Delta Execution
  In vivo Testing
41
Outline
Motivation
Supervised Memory
TokenTM
StealthTest
  Online Software Testing, e.g., Patch Validation
  StealthTest: TM for online testing
  Delta Execution using StealthTest
  In vivo Testing using StealthTest (Optionally)
Conclusion
42
Online Patch Validation
Bug fixes can introduce more bugs
  Patches must be validated
Online Validation [Nagaraja et al., OSDI 2004]
  Increased resource usage
  Lockstep execution
[Figure: Input, Production, Testing, Diff, Output]
43
Delta Execution [Tucek et al., ASPLOS 2009]
Online Patch Validation
Most patches are small
  Patched and un-patched executions are similar
Delta Execution
  Run together except when they differ
[Table: Prior Work vs Delta Execution, compared on Increased Resource Usage and Lockstep Execution]
44
Delta Execution using fork
[Figure: timeline of Delta Execution using fork, with a Production process and a forked Testing process. Labels: Merged execution, fork, Isolate Δ data, Patched execution, Unpatched execution, Compute Δ data, Install Δ data, Time]
45
Multi-threading and fork
‘Park’ all other threads
  Stop all threads to get a consistent memory snapshot
[Figure: the fork-based Delta Execution timeline for a multi-threaded program. Labels: Merged execution, fork, Isolate Δ data, Patched execution, Unpatched execution, Compute Δ data, Install Δ data, Production, Testing, Time]
46
fork
Poor Performance
  ~9.8ms for split/~106ms for merge [Tucek et al, ASPLOS 2009]
Poor Scalability
  Web-server response rate reduced by 43%
Want an alternate mechanism
48
Delta Execution using StealthTest
Delta Execution need            StealthTest (Transactional Memory)             fork
Isolate patched execution       transaction{…}                                 Execute on child process
Introspect patched execution    Version Management: tracks new/old values      Page diffing
Monitor delta data access       Conflict Detection: monitor accesses           mprotect
49
StealthTest Interface
Delta Execution need            Transactional Memory                           StealthTest interface
Isolate patched execution       transaction{…}                                 ST_begin_transaction, ST_abort_transaction
Introspect patched execution    Version Management: tracks new/old values      ST_get_old, ST_get_new
Monitor delta data access       Conflict Detection: monitor accesses           ST_protect_set, ST_protect_clear
50
Requirements from TM
Strong Atomicity [Martin et al., CAL 2006]
  Transactions isolated from non-transactions => test transactions isolated from application code
Flexible Conflict Resolution
  Can abort transactions if necessary => abort tests if they block the application
Communication from within transactions
  => Expose the result of a test
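The pattern transaction { test(); abort } and the three requirements above can be emulated in a few lines. The sketch below is only a model of the semantics: a deep copy of the state stands in for TM isolation and version management, and the return value stands in for communication out of the transaction. The function run_hidden_test is hypothetical and is not part of the StealthTest interface.

```python
import copy


def run_hidden_test(state, test):
    """Run `test` against `state` so that its side effects never become visible.

    The private copy models an aborted transaction: the test may mutate it
    freely, and the copy is discarded when the function returns. Only the
    outcome string escapes.
    """
    shadow = copy.deepcopy(state)  # stand-in for transactional isolation
    try:
        test(shadow)
        outcome = "pass"
    except Exception as exc:       # a failing test must not disturb the application
        outcome = f"fail: {exc}"
    return outcome                 # the one piece of information that escapes
```

A TM-based implementation avoids the up-front copy, which is where the low overhead claimed for StealthTest comes from; the copy here exists only to keep the example self-contained.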
52
Delta Execution using StealthTest
[Figure: two timelines side by side. fork: Production forks a Testing process; labels Merged execution, Compute and Isolate Δ data, Patched execution, Unpatched execution, Install Δ data. StealthTest: Production runs a transaction; labels Compute and Isolate Δ data, Patched execution, Introspect and rollback, Unpatched execution, Install Δ data, Merged execution. Calls shown: ST_begin_tran…, ST_protect_set, ST_get_new, ST_get_old, ST_abort_tran…]
53
Multi-threaded Delta Execution
[Figure: the same comparison for a multi-threaded program. fork: Production forks a Testing process; labels Merged execution, Compute and Isolate Δ data, Patched execution, Unpatched execution, Install Δ data. StealthTest: the Original process runs a transaction; labels Compute and Isolate Δ data, Patched execution, Introspect and rollback, Unpatched execution, Install Δ data, Merged execution. Calls shown: ST_begin_tran…, ST_protect_set, ST_get_new, ST_get_old, ST_abort_tran…]
54
Evaluation
(1) Effective? (2) Non-intrusive?
Workloads
  Collection of multi-threaded server apps
  Same as Tucek et al., ASPLOS 2009
Pin-based TM emulation
2-way SMP with 2.4GHz Pentium 4 CPUs and 2.5GB RAM
55
(1) Effective?
Program    Description    Patch                          Patch Verified? (fork, StealthTest)
Crafty     Chess App      Code refactoring               ✓
Raytrace   Raytracer      Result reporting fix
Tar        Archive Util   Incremental archiving fix
Apache1    Web Server     Buffer overflow fix
Apache2
DNSCache   DNS Cache      Behavior Change
MySQL5.0   DB Server      Extra permission checks        ✗
OpenSSL    Security Lib   Added bug in TLS handling
Squid      Web Cache
ATPhttpd
Blank cells: value not recoverable.
Legend: Works; Memory allocation; sockets
56
(2) Non-intrusive?
Columns: Program, Description, fork ForkOverhead(%), PatchDuration(%)
Crafty     Chess App      0.1   <0.1
Raytrace   Raytracer      0.2   0.5
Tar        Archive Util   41    7.3
Apache1    Web Server     2.8
Apache2                   12
DNSCache   DNS Cache      65
MySQL5.0   DB Server      4.7   5.0
OpenSSL    Security Lib
Squid      Web Cache      2.9
ATPhttpd                  0.8
Rows with a single value: the column is not recoverable.
57
Outline
Motivation
Supervised Memory
TokenTM
StealthTest
  Online Software Testing, e.g., Patch Validation
  StealthTest: TM for online testing
  Delta Execution using StealthTest
  In vivo Testing using StealthTest
Conclusion
58
StealthTest Summary
Software testing is hard
Online software testing can help
  Existing mechanisms are inadequate
StealthTest leverages TM for non-intrusive online testing
  Low Overhead, Functionally Hidden, Good Scaling
Demonstrate two uses
  Delta Execution
  In vivo Testing
59
Outline
Motivation
Supervised Memory
TokenTM
StealthTest
Conclusion
60
Contribution 1: Supervised Memory [Under Submission]
Supervised Systems: useful, renewed interest
Problem
  SC only, while most systems are not
  Ad hoc hardware, specific to a supervised system
  No formalism, which leads to ambiguity/incorrectness
Contributions
  Explore non-SC systems
  General model for supervision: Supervised Memory
  Formal specification
61
Contribution 2: TokenTM [Bobba et al., ISCA 2008]
Transactional Memory, a supervised system
Problem
  “Most transactions are small”: a self-fulfilling assumption
  Penalizes large/long transactions
  Too restrictive for wide-spread TM use?
Contributions
  TokenTM: first HTM to support efficient large/long transactions as well
  Follow-up: Purdue’s LiteTM [Jafri et al., HPCA 2010]
62
Contribution 3: StealthTest [Bobba et al., PACT 2009]
Using transactional memory for testing
Problem
  Existing fork-based mechanisms: high overhead, poor scalability
Contributions
  StealthTest, a low-overhead interface for online testing
  Two StealthTest-based testing frameworks
63
Other Research and Contributions
Performance Pathologies: Bobba et al., ISCA 2007; Bobba et al., IEEE Micro Top Picks, Jan 2008
LogTM-SE: Yen et al., HPCA 2007
Nested LogTM: Moravan et al., ASPLOS 2006
LogTM: Moore et al., HPCA 2006
GEMS: LogTM-SE implementation; development, release and support
64
Acknowledgments
Advisors: Mark Hill, David Wood
Mike Swift, Ben Liblit, Shan Lu, Karu Sankaralingam, Mikko Lipasti, Jeffrey Naughton
Co-authors: Kevin Moore, Luke Yen, Haris Volos, Michelle Moravan, Weiwei Xiong, Neelam Goyal
Colleagues: Alaa Alameldeen, Arkaprava Basu, Brad Beckmann, Polina Dudnik, Dan Gibson, Mike Marty, Somayeh Sardashti, Rathijit Sen, Cong Wang, Yasuko Watanabe, Min Xu
Matt Allen, Piramanayagam Arumuga Nainar, Siddharth Barman, Koushik Chakraborthy, Venkat Govindraju, Amit Kumar, Srinath Sridharan, Philip Wells
65
Key Contributions
[Figure: system stack with layers Applications, Software, Tools, and Hardware, showing Supervised Memory Systems, TSOdata and Safe Supervision, TokenTM, and StealthTest]
66
Backup
67
CPU Trends
From “The Free Lunch Is Over” by Herb Sutter
68
The ‘Re-Birth’ of Parallel Programming
Sequential Computers
Memory wall, Power wall, etc.
Attack of the killer CMPs*
  General-purpose parallel computers
How to program?
  Expose parallelism to software
“The Free Lunch is Over”, Herb Sutter, 2005
* Adapted from “Attack of the killer micros” by Eugene Brooks
69
Parallel Programming is Hard (currently)
Hard for programmers
  Correctness: Synchronization, Data races, Atomicity violations
  Performance: Communication, Scheduling, Load-Balancing, Critical Path
Hard for tools
  Compilers, Static Analysis: Intractable/Inefficient
70
Houston, We Have a Problem!
Who should solve this problem?
Yannis’s Law: Programmer Productivity doubles every 6 years
Proebsting's Law: Compiler Advances Double Computing Power Every 18 Years
71
Parallel Algorithms vs Moore’s Law
“Improvement resulting from … algorithmic speedup is comparable to that resulting from the hardware speedup due to Moore’s Law over the same length of time”
David E. Keyes, “A Science-Based Case for Large-Scale Simulation”, July 2003.
72
In the “Landscape of Parallel Computing Research…” [Asanovic et al., 2006]
73
Why not Tagged Memory? [Gehringer and Keedy, CAN 1985]
Type information in tags
  Arguments do not apply to dynamically-typed languages like Lisp
For other languages,
  Simpler but more specialized designs
  Compilers improved to make the proposals moot
74
Existing proposals assume SC
Explore relaxed memory systems
Existing proposals assume SC or don’t deal with multiprocessors
[Table: Proposal, Base Architecture, Implementation. Proposals: WWT, Tapeworm, LogTM, OneTM, Informing Memory, SafeMem, MemTracker, DMP. Base architectures listed: MIPS, SPARC, MIPS/Alpha, x86. Implementation column: SC.]
75
DMP Correctness
76
Non-TSOall Executions
77
TSOdata is Complex
Propose formal models
[Figure: empty/full-bits state machine with states Empty and Full, sLD and sST transitions, and an Exception state]
Initial State: A.d = 0, A.m = None; B.d = 0, B.m = Empty
T0: dST 1, A; sLD B
T1: sST B, 1; dLD A
Can dLD A return 0?
78
Safe Supervision
80
TokenTM Logical Operation
[Figure: threads X and Y each run BEGIN_XACT … COMMIT_XACT with an undo log. X: Load A, Store B. Y: Load A, Store A. Shared memory holds data and token metadata <cx, cy, …> for blocks A, B, C. Metadata values shown: <0,0,…>, <1,0,…>, <1,1,…>, <T,0,…>. Data values shown: 0x..00.., 0x..10.., 0x..11..; undo log entry B: 0x..00... Labels: Insufficient tokens, ABORT.]
81
Existing HTM Systems
Assumption: Most transactions are small and short running
  Optimized for small transactions
  Degrade with large, long running transactions
Non-localized overhead
  E.g., LogTM-SE [Yen07] false conflicts, OneTM [Blundel07] serializes
Complex, expensive operations
  E.g., XTM [Chung06] & PTM [Chuang06] manipulate page tables
Premature Optimization?
82
Why Large Transactions?
Programmers may want large (>> cache) and/or long (>> ctx switch) transactions
  HLL transactions invoke unpredictable lower-level code
  Replace critical sections containing syscalls or I/O
  Avoid concurrency bugs [Lu08]
But “Most transactions small & short running”
  Restrict TM to use by gurus (like OS spin locks)?
  Self-fulfilling prophecy?
Must Support Efficient Large/Long Transactions As Well
83
Toward a Large-Transaction TM
Efficiently detect conflicts between in-flight transactions using Read/Write Sets
  Unbounded
  Globally accessible
  Fast read/write set ops, e.g., add to read set, clear read set
Small Transactions: Low Overhead
Large Transactions: Localized Overhead
  Accessible read/write set (potentially unbounded)
Minimal Changes to Coherence / VM
  Heavyweight eviction ops
  Negative acks
  Additional page tables
84
Existing Mechanisms
Synergy between cache coherence and conflict detection
  Hence, overload cache coherence
  + Excellent for bounded/small TM
  But,
  - ‘Virtualization’ on overflows
  - Tough to access ‘virtualized’ state
Small Transactions: Low Overhead
Minimal Changes to Coherence / VM ×
Large Transactions: Localized Overhead ×
85
TokenTM: a Large-Transaction TM
New Conflict Detection Mechanism (This Talk)
  Transactional Tokens in Tagged Memory
  Token Coherence [Martin03] at a different level
Version Management
  Save old/new values for unbounded Write set
  LogTM [Moore06] undo log
87
Tagged Memory
Challenge: Where to store unbounded, globally accessible token metadata?
  Virtual Memory is unbounded and globally accessible
Solution, similar to OneTM [Blundel07]
  Tag Virtual Memory
  Piggyback on existing Virtual Memory and Cache Coherence mechanisms
89
Storing Metadata
[Figure: threads X and Y (BEGIN_XACT, Load A, Store B / Store A, COMMIT_XACT), each with an undo log and a token log (Cx, Cy). Software side: token metadata <cx, cy, …>, unbounded but difficult to access globally. Hardware side (tagged memory): per-block data plus a metastate (Sum, TID), e.g. (0, -), which is a concise, accessible, lossy summary.]
90
Hardware Metastate
Metadata summary (sum, TID)
  sum: total number of tokens acquired
  TID: identifies the owner when sum = 1 or sum = T (optional)
Concise -> stored in a packed field (e.g., State[1:2], Attr[3:16])
Fast -> accessed as part of normal memory operation
Some summaries:
  <c0, c1, …, ci, …>    (sum, TID)
  <0, 0, 0, 0>          (0, -)
  <0, 0, 1, 0>          (1, 2)
  <0, T, 0, 0>          (T, 2)
  <0, 1, 1, 1>          (3, -)
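The (sum, TID) summary can be computed from the full token vector as follows. This is a sketch of the mapping only, assuming thread IDs are the zero-based positions in the vector and that a missing TID is represented by None; the function name summarize is hypothetical.

```python
def summarize(counts, T):
    """Collapse token metadata <c0, c1, ...> into the metastate (sum, TID).

    The TID is kept only when a single thread holds either the one token
    (sum = 1) or all tokens (sum = T); otherwise it is None.
    """
    total = sum(counts)
    holders = [tid for tid, c in enumerate(counts) if c > 0]
    if total in (1, T) and len(holders) == 1:
        return (total, holders[0])
    return (total, None)
```

The summary is lossy: (3, None) says that three tokens are out but not which threads hold them, which is why the per-thread token logs are still needed.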
91
Token Logs
Distributed structures for unbounded Read/Write sets
  per-thread
  stored in program memory (e.g., heap)
  list of <address, num_tokens>
Accessible to hardware for fast ops
  Add to read set -> Append to token log
Example token log: A: 1, B: T
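A per-thread token log and its relationship to the shared per-block sums can be sketched as below. The model keeps both sides of the books (the private logs and the shared sums) and checks that they agree; it performs no conflict detection. The Thread class, end_transaction, and books_balance are illustrative names, and a Counter stands in for the per-block metastate sums.

```python
from collections import Counter


class Thread:
    def __init__(self, tid, sums):
        self.tid = tid
        self.sums = sums      # shared per-block token sums (metastate side)
        self.token_log = []   # private list of (address, num_tokens)

    def acquire(self, address, num_tokens):
        """Take tokens for a block and record them in this thread's log."""
        self.sums[address] += num_tokens
        self.token_log.append((address, num_tokens))

    def end_transaction(self):
        """Walk the log and return every token (commit or abort)."""
        while self.token_log:
            address, num_tokens = self.token_log.pop()
            self.sums[address] -= num_tokens


def books_balance(threads, sums):
    """Check that the shared sums equal the totals recorded in all logs."""
    logged = Counter()
    for thread in threads:
        for address, num_tokens in thread.token_log:
            logged[address] += num_tokens
    return all(sums[a] == logged[a] for a in set(sums) | set(logged))
```

Releasing tokens costs time proportional to the length of the log, so the overhead stays local to the transaction that acquired them.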
92
Double-entry Bookkeeping (Keeping Metadata Consistent)
[Figure: threads X and Y (BEGIN_XACT, Load A, Store B / Store A, COMMIT_XACT), each with a token log entry A: 1. Software side: logical token state <cx, cy, …> with values <0,0,…>, <1,0,…>, <1,1,…>. Hardware side: per-block metastate (Sum, TID) for blocks A, B, C with values (0, -), (1, X), (2, -).]
93
Implementing Hardware Metastate
[Figure: threads X and Y with token logs above two private caches (coherence state, tag, data, Sum, TID per line), a directory, and main memory holding block data plus Sum/TID. X's Load A caches block A with metastate (1, X). Coherence states shown: Modified, Exclusive, Owned, Shared, Not Present. Messages shown: GETS A, Fwd_GETS A, Upgrade A, Data A.]
Shared copies cannot update metastate
Solution: Fission / Fusion
94
Metastate Fission
[Figure: Y's Load A sends GETS A, which the directory forwards to X (Fwd_GETS A). X's metastate (1, X) undergoes fission into (1, X), kept in X's Owned copy, and (0, -), sent with Data A to Y's Shared copy. Y's copy then records (1, Y). Both token logs hold A: 1.]
95
Metastate Fusion
On store, metastate copies are fused back
Why does fission/fusion work?
  Store sees the ‘complete’ metastate
  Load sees the ‘complete’ metastate if a writer exists, and a ‘partial’ metastate otherwise
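Fission and fusion can be modeled as operations on (sum, TID) pairs. This sketch follows one reading of the slides: fission leaves the existing copy's metastate untouched and gives the new shared copy (0, None), and fusion adds the sums of all copies back together. The function names and the use of None for a missing TID are assumptions.

```python
def fission(metastate):
    """Split a cached metastate when a second shared copy is created.

    Returns (kept, new): the existing copy keeps its tokens and the
    new copy starts with none.
    """
    return metastate, (0, None)


def fuse(copies, T):
    """Combine the metastates of all cached copies into the complete one."""
    total = sum(s for s, _ in copies)
    owners = {tid for s, tid in copies if s > 0 and tid is not None}
    single_owner = total in (1, T) and len(owners) == 1
    return (total, owners.pop() if single_owner else None)
```

A load that only needs one free token can work from its partial copy, while a store gathers all copies first, so it always decides on the fused, complete metastate.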
96
Hardware Cost
Additional metabits in caches/memory
  Recoded ECC to cull metabits
Changes to coherence protocols
  Additional payload on messages
  Minimal changes to protocol logic
  Requires non-silent eviction
97
Evaluation Methodology
Full System Simulation
  Multifacet GEMS
Base System
  32-core CMP system, in-order, single-issue cores
  Private 4-way 32KB writeback split I&D L1 caches
  Shared 8-way 8 MB writeback L2
  On-chip L2, MESI coherence
  Packet-switched interconnect in a tiled topology
98
TM Systems
LogTM-SE [Yen07] variant
  Parallel Bloom filters for conflict detection: 4 2Kbit H3 filters
  + Compact, less hardware overhead
  - False conflicts
LogTM-SE_Perfect
  + No false conflicts
  - Unimplementable
TokenTM
99
Results
Small Transactions: Low Overhead
  Comparable on small transactions
Large Transactions: Localized Overhead
  Minor degradation with large transactions
101
In vivo Testing [Murphy et al. TR 2007, Chu et al. ICST 2008]
Run unit tests on deployed software
  + More testing
  + More realistic
Catch bugs early
102
In vivo Testing using StealthTest
ST_begin_transaction();
try {
    test();
    ST_begin_escape();
    fprintf(log, "…", success);
    ST_end_escape();
} catch/except() {
    fprintf(log, "…", fail);
}
ST_abort_transaction(NO_RETRY);
103
Evaluation
Workloads
  Bugbench: server workloads
  STAMP: Transactional Memory benchmarks
Implementation
  Intel STM: language-based TM
  TL2 STM: library-based TM
Quad-core workstation with RHEL5
104
(1) Effective?
Built on Intel STM. Run tests on Bugbench applications
Program   Description       Size (LOC)   Bug Type          Error Detected?
NCOM      file compress     1.9K         Stack Smash       Yes
POLY      file “unixier”    0.7K
GZIP                        8.2K         Buffer Overflow
MAN       documentation     4.7K
BC        calculator        17.0K
HTPD1     web server        224K         Atomicity
SQUD      proxy cache       93.5K                          Possible
CVS       version control   114.5K       Double Free
MSQL2     DBMS              514K
MSQL3                       1028K
Blank cells: value not recoverable.
Legend: Works; Unsupported Library Calls
105
(2) Non-intrusive?
Built on TL2 STM. Run tests on STAMP applications (1000 tests per min)
106
Atomicity Violation Bugs?
107
Degree-2 Transactions
Isolate only writes
Implementation
  Reads in escape action
  Early Release
  Add new type of transaction to TM
108
StealthTest Wish List
Hardware Support
System Calls within Transactions
Interaction between Locks and Transactions
110
Related Work
Binary Translation (SPROCKETS)
Code Emulation (STEM)
TLS (Oplinger&LAM, PathExpander)
111
In vivo Testing Motivation
Ordering Bug in MySQL
In vivo Test: Data Consistency Checks
Buggy Code (in mysys/thr_lock.c):
    void thr_lock_delete(THR_LOCK *lock)
    {
      …
      pthread_mutex_destroy(&lock->mutex);
      list_delete(thr_lock_thread_list, &lock->list);
    }