Hardware Support for Efficient Transactional and Supervised Memory Systems: Jayaram Bobba, Dissertation Defense, 1/14/2010

Presentation transcript:

Hardware Support for Efficient Transactional and Supervised Memory Systems. Jayaram Bobba, Dissertation Defense, 1/14/2010, Dept. of Computer Sciences, University of Wisconsin–Madison. Overview: 1) Research Area; 2) Challenges/Contributions; 3) Big Picture.

Research Area Device Scaling Abundant Transistors Emergence of CMPs Hard to Program Hardware Support to Improve Productivity Empty/full-bits Transactional Memory MemTracker Supervised Systems Deterministic Memory 9/18/2018 Wisconsin Multifacet Project

Challenges and Contributions. Supervised systems: sequential-consistency only; ad hoc hardware; lack of formalism. Contribution 1: Supervised Memory (TSOdata, Safe Supervision). Transactional memory: “Most transactions are small” (self-fulfilling); limited applicability. Contribution 2: TokenTM. Contribution 3: StealthTest. Link TokenTM to StealthTest.

Big Picture [diagram labels: Applications, Software, Tools, StealthTest, Supervised Systems, TokenTM, TSOdata and Safe Supervision, Supervised Memory, Hardware]

Outline: Motivation; Supervised Memory; TokenTM; StealthTest; Conclusion.

On Software Productivity. More Software; Better Hardware; More Productivity (Yannis’s “Law”: programmer productivity doubles every 6 years); More Performance (Moore’s Law). Moore’s Law will continue. But Yannis’s Law?

What has changed? “A Fundamental Turn towards Concurrency in Software” [Herb Sutter, 2005]. Moore’s Law -> better computers. Past: sequential computers. Then: memory wall, power wall, etc. Current: attack of the killer CMPs*. How to program? Expose parallelism to software, but parallel programs are hard to write. * Adapted from “Attack of the killer micros” by Eugene Brooks.

Who solves the productivity issue? Why, of course, hardware architects! Long live Moore’s Law: spend some transistors on productivity issues. Architectural support for enhancing productivity: for language features, for bug avoidance, for debugging, for performance feedback, and so on…

Seriously, who should solve it? HW architects or SW engineers? There was a ‘software crisis’ in the past too… Why HW architects? Economic: more bang for the buck; software/IT (1,152 billion) vs. hardware (138 billion) [Wen Mei Hwu, Micro-39 Keynote]. Technical: SW cannot do it alone; decades of automatic parallelization efforts; virtual memory; tagged memory for LISP-like languages. “We must now reconsider the balance of hardware and software and to provide more specialized function in hardware than we have previously, in order to drastically simplify the programming process” (Edward A. Feustel, IEEE TOC, July 1973, in support of tagged memory).

Outline: Motivation; Supervised Memory (Background/Motivation; Explore relaxed supervised systems; Define Supervised Memory; Propose formal models); TokenTM; StealthTest; Conclusion.

Why Supervised Systems? Synchronization: hardware TM systems; empty/full-bits [Berry et al 2006] (graph processing algorithms: 4-processor MTA > 64K BG/L). Controlled non-determinism: deterministic/interleaving-constrained multiprocessing. Debugging: log-based architectures. Safety: heap checkers, bounds checkers. Language features: hardware-assisted garbage collection.

What are Supervised Systems? (Synthesized definition.) Out-of-band metadata per data block; monitor and control (supervise) memory accesses to data; execute handlers on specific metadata states. Pure software is possible but inefficient: shadow memory, e.g., Valgrind, with a mean slowdown of 22X [Nethercote et al., VEE2007].

State-of-the-Art. Expect sequentially consistent (SC) hardware: most hardware is not. Ad hoc: whither primitives? Informal treatment of memory consistency: ambiguous/incorrect.

Contributions. Problem: expect sequentially consistent (SC) hardware, but most hardware is not. Contribution: explore relaxed supervised systems. Problem: ad hoc (whither primitives?). Contribution: define Supervised Memory. Problem: informal treatment of memory consistency (ambiguous/incorrect). Contribution: propose formal memory models.


Explore relaxed supervised systems. TSO-lite: A TSO-compliant system. [Diagram: a processor with a PC, registers r1 to r3, and a store buffer in front of memory. Program: ST 0x01, A; LD [B], r1; ST 0x10, C; LD [C], r3. Memory holds, per block, data (A = 0x00, B = 0x01, C = 0x11) and metadata. The stores to A and C wait in the store buffer; r1 receives 0x01 and r3 receives 0x10.]
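The store-buffer behavior in the TSO-lite diagram can be sketched in a few lines of Python. This is only an illustrative model: the class name `TsoLiteCore` and its methods are assumptions made for this example, metadata is left out, and nothing here is taken from the actual TSO-lite design beyond the program and memory values shown on the slide.

```python
from collections import deque


class TsoLiteCore:
    """Toy core (hypothetical name) with a FIFO store buffer in front of memory."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # (address, value) pairs, oldest first

    def store(self, addr, value):
        # Stores retire into the buffer; memory is updated only on drain.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Load bypass: the youngest buffered store to the same address wins.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory[addr]

    def drain_one(self):
        # Write the oldest buffered store to memory (FIFO keeps store order).
        if self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


# The program from the slide: ST 0x01, A; LD [B], r1; ST 0x10, C; LD [C], r3
memory = {"A": 0x00, "B": 0x01, "C": 0x11}
core = TsoLiteCore(memory)
core.store("A", 0x01)
r1 = core.load("B")      # served from memory
core.store("C", 0x10)
r3 = core.load("C")      # served from the store buffer (bypass)
memory_before_drain = dict(memory)
core.drain_one()
core.drain_one()
```

In this model the load of C returns the buffered value while memory still holds the old one, which is the window in which a metadata check performed only at memory could be skipped or delayed.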

Explore relaxed supervised systems. Empty/Full-Bits on TSO-lite. [Diagram: the TSO-lite example (ST 0x01, A; LD [B], r1; ST 0x10, C; LD [C], r3) with empty/full metadata. Metastates Empty and Full with LD and ST transitions and an exception edge. Memory data: A = 0x00, B = 0x01, C = 0x11. Metadata values shown: Full, None, Empty, None.] Issues: I1: no load bypass. I2: late exceptions.

Explore relaxed supervised systems. Deterministic Shared Memory (DMP) [Devietti et al., ASPLOS 2009]. “depending upon the consistency model of the underlying hardware, threads must perform a memory fence at the edge of a quantum”. Insert a fence after the last operation in the quantum. Insert a fence before the first shared operation in the quantum. I3: reordered metabit-reads.


Define Supervised Memory. What is Supervised Memory? Each memory location A has data (A.d) and metadata (A.m). New operations: supervised load (sLD A) and supervised store (sST A). Jump on reading special metadata; (optionally) hardware exception.

Define Supervised Memory. Supervised Operations.
sLD A =>
Start: atomic {
    curm = Val[R A.m]                      // Read metadata
    nextm = NEXT(Load, curm)               // Check software-specified FSM
    if nextm == EXCEPTION then jump to Handler
    R A.d                                  // Read data
    if (nextm != curm) then W A.m, nextm   // Update metadata
}
Handler: …

Define Supervised Memory. Using Supervised Memory. Software assigns semantics to metadata: metastates stored as metadata (e.g., Initialized, Uninitialized) and a metastate transition function (NEXT). Use supervised operations to monitor/control data operations, e.g., catch read access to uninitialized data.
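The sLD sequence and the initialized/uninitialized use case can be sketched together in Python. This is a single-threaded model, so the atomic block is implicit. The names `SupervisedMemory`, `SupervisionFault`, `heap_checker_next`, and `allocate` are assumptions for this example, and `sst` is written by analogy with the sLD steps (read metadata, write data, update metadata) rather than copied from the talk.

```python
EXCEPTION = "EXCEPTION"  # sentinel NEXT returns to request the handler


class SupervisionFault(Exception):
    """Stands in for the jump to the software handler (hypothetical name)."""


def heap_checker_next(op, curm):
    """Assumed NEXT function: loading uninitialized data faults,
    storing to it marks the location initialized."""
    if curm == "Uninitialized":
        return EXCEPTION if op == "Load" else "Initialized"
    return curm


class SupervisedMemory:
    def __init__(self, next_fn):
        self.next_fn = next_fn
        self.data = {}   # A.d for each location A
        self.meta = {}   # A.m for each location A

    def allocate(self, addr):
        self.data[addr] = 0
        self.meta[addr] = "Uninitialized"

    def sld(self, addr):
        curm = self.meta[addr]                 # read metadata
        nextm = self.next_fn("Load", curm)     # consult software FSM
        if nextm == EXCEPTION:
            raise SupervisionFault(addr)       # jump to handler
        value = self.data[addr]                # read data
        if nextm != curm:
            self.meta[addr] = nextm            # update metadata
        return value

    def sst(self, addr, value):
        curm = self.meta[addr]                 # read metadata
        nextm = self.next_fn("Store", curm)
        if nextm == EXCEPTION:
            raise SupervisionFault(addr)
        self.data[addr] = value                # write data
        if nextm != curm:
            self.meta[addr] = nextm            # update metadata


mem = SupervisedMemory(heap_checker_next)
mem.allocate("A")
try:
    mem.sld("A")
    faulted = False
except SupervisionFault:
    faulted = True          # read of uninitialized data is caught
mem.sst("A", 42)
value = mem.sld("A")        # now allowed
```

Swapping in a different `next_fn` (for example, one over Empty/Full metastates) changes the supervised behavior without touching the load and store sequences, which is the point of leaving the FSM to software.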



Propose formal models. TSO Axioms [Hangal et al., ISCA 2004].
Axiom: Description
Order: Total order on all write accesses
Atomicity: No intervening accesses for atomic operations
Termination: All write accesses eventually complete
Value: Reads return latest value from memory or store buffer
Memory Barrier: No reordering across a barrier
ReadAny (reordering axiom): Accesses cannot pass outstanding reads
WriteWrite (reordering axiom): Write access cannot pass outstanding writes
Program-order pairs shown: Rd A then Rd B; Rd A then Wr B; Wr A then Wr B; Wr A then Rd B. The last pair is the one the reordering axioms leave unconstrained, which allows store buffers.
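The effect of leaving the write-then-read pair unordered can be seen with the classic store-buffering litmus test. The following Python sketch exhaustively explores schedules of two threads, once with direct memory updates (an SC-like machine) and once with per-thread FIFO store buffers (a TSO-like machine). The function name `explore`, the tuple encoding of instructions, and the variable names are assumptions for this example; it is an operational toy, not the axiomatic model of Hangal et al.

```python
def explore(programs, initial_memory, use_store_buffer):
    """Return the set of final register tuples reachable by any schedule."""
    outcomes = set()

    def step(pcs, buffers, memory, regs):
        progressed = False
        for t, prog in enumerate(programs):
            # Option 1: thread t executes its next instruction.
            if pcs[t] < len(prog):
                progressed = True
                op, addr, arg = prog[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "st" and use_store_buffer:
                    new_buf = buffers[t] + ((addr, arg),)
                    step(new_pcs, buffers[:t] + (new_buf,) + buffers[t + 1:],
                         memory, regs)
                elif op == "st":
                    step(new_pcs, buffers, {**memory, addr: arg}, regs)
                else:  # "ld": own store buffer first, then memory
                    value = memory[addr]
                    for buffered_addr, buffered_value in buffers[t]:
                        if buffered_addr == addr:
                            value = buffered_value  # youngest match wins
                    step(new_pcs, buffers, memory, {**regs, arg: value})
            # Option 2: thread t's oldest buffered store drains to memory.
            if buffers[t]:
                progressed = True
                (addr, value), rest = buffers[t][0], buffers[t][1:]
                step(pcs, buffers[:t] + (rest,) + buffers[t + 1:],
                     {**memory, addr: value}, regs)
        if not progressed:  # all threads done and all buffers empty
            outcomes.add(tuple(regs[name] for name in sorted(regs)))

    n = len(programs)
    step((0,) * n, ((),) * n, dict(initial_memory), {})
    return outcomes


# Store-buffering litmus test: each thread writes one location, reads the other.
thread0 = [("st", "x", 1), ("ld", "y", "r0")]
thread1 = [("st", "y", 1), ("ld", "x", "r1")]
init = {"x": 0, "y": 0}

sc_outcomes = explore([thread0, thread1], init, use_store_buffer=False)
tso_outcomes = explore([thread0, thread1], init, use_store_buffer=True)
only_with_store_buffers = tso_outcomes - sc_outcomes  # (r0, r1) pairs
```

The only outcome that appears with store buffers and not without is both loads returning 0: each load passes the thread's own earlier store, which is the Wr A then Rd B relaxation.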

Propose formal models. TSOall: A Consistency Model for Supervised Memory. TSO axioms applied to all accesses, data and metadata. + (Simple) Like TSO. - (Slow) Prohibits optimizations. Example thread: sST A; sLD B expands to [Rd A.m, Wr A.d, Wr A.m] -> [Rd B.m, Rd B.d] => store buffers ineffective. Tension: ease of reasoning vs. performance.

Propose formal models. Blast from the Past [Adve and Hill, ISCA1990]. Ease of reasoning (SC) vs. performance (RC). Observation: simple programs rely only on certain SC orders; ignore non-essential orders and the execution still appears as SC. Challenge: what is simple? Which orders are non-essential? Solution: data-race-freedom. For data-race-free programs, RC = SC.

Propose formal models. Safe Supervision Motivation. Ease of reasoning (TSOall) vs. performance (?). Observation: simple supervised programs rely only on certain TSOall orders; ignore non-essential orders and the execution still appears as TSOall. Challenge: what is simple? Which orders are non-essential? Solution: Safe Supervision. For safely supervised programs, ? = TSOall.

Safe Supervision. Metadata accesses to location A are not used to order operations to a different location B. Most uses of supervision are safely supervised, e.g., heap checker (initialized/uninitialized values) and transactional memory (conflict detection information). Example program: initially, A.m = Empty, B.d = 0. Thread 1: B.d = 1; A.m = Full. Thread 2: while (A.m == Empty); read B.d.

Propose formal models. TSOdata: Fast Yet Simple.
Axiom: Description
Order: Total order on all write accesses
Atomicity: No intervening accesses for atomic operations
Termination: All write accesses eventually complete
Value: Reads return latest value from memory or store buffer
Memory Barrier: No reordering across a barrier
ReadAny (reordering axiom): Data accesses cannot pass outstanding data reads
WriteWrite (reordering axiom): Data writes cannot pass outstanding data writes
Example thread: sST A; sLD B expands to [Rd A.m, Wr A.d, Wr A.m] -> [Rd B.m, Rd B.d]; store buffers can be used. For safely supervised programs, TSOdata = TSOall.

TSOdata on OpenSPARC T2. Goal: explore low-level issues on a real design. Late exceptions with deferred handlers: dump store buffer entries on exception; enhance store buffer to carry virtual address (VA); ~200 cycles to read out 4 entries. Disable store buffer bypassing for supervised loads. Low space overhead for adding metabits (~4%).

Supervised Memory Summary. Problems with existing proposals: expect sequentially consistent (SC) hardware, which most hardware is not; ad hoc (whither primitives?); informal treatment of memory consistency (ambiguous/incorrect). Contributions: explore relaxed memory systems; define Supervised Memory; propose formal memory models.

Outline: Motivation; Supervised Memory; TokenTM [ISCA 2008]; StealthTest; Conclusion.

TokenTM Summary. Current hardware TMs assume most transactions are small and short running, and penalize large/long transactions. Too restrictive for widespread TM use? Hypothesis: must support efficient large/long transactions as well. Is such an HTM even possible? Yes: TokenTM. 1. LogTM’s log to buffer unbounded values. 2. Transactional tokens for unbounded conflict detection, with conflict state in memory metabits.

Transactional Tokens. Challenge: how to efficiently track read/write sets? Token Coherence [Martin03]: read/write sets for cache coherence. Solution: transactional tokens. T tokens per memory block; at least one token to read, all T tokens to write (token conflict detection). Token metadata: <c0, c1, …, ci, …>, where 0 ≤ ci ≤ T is the count of tokens held by the thread with TID i.
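The token rule (at least one token to read, all T to write) can be sketched as a small Python class. This is only an illustration of the counting rule under simplifying assumptions: the class `TokenBlock` and its methods are hypothetical names, a failed acquire simply reports a conflict, and the real TokenTM hardware, metadata encoding, and conflict resolution are not modeled.

```python
class TokenBlock:
    """Per-block token metadata: how many of the T tokens each thread holds."""

    def __init__(self, total_tokens):
        self.total = total_tokens
        self.held = {}  # tid -> token count (the <c0, c1, ...> vector)

    def free_tokens(self):
        return self.total - sum(self.held.values())

    def acquire_read(self, tid):
        """A transactional read needs at least one token."""
        if self.held.get(tid, 0) >= 1:
            return True
        if self.free_tokens() >= 1:
            self.held[tid] = 1
            return True
        return False  # conflict: some writer holds all T tokens

    def acquire_write(self, tid):
        """A transactional write needs all T tokens."""
        mine = self.held.get(tid, 0)
        if mine + self.free_tokens() == self.total:  # nobody else holds any
            self.held[tid] = self.total
            return True
        return False  # conflict: another thread holds at least one token

    def release(self, tid):
        """On commit or abort, the thread returns its tokens."""
        self.held.pop(tid, None)


block_a = TokenBlock(total_tokens=4)
x_reads = block_a.acquire_read("X")            # metadata <1, 0>
y_reads = block_a.acquire_read("Y")            # metadata <1, 1>
y_writes = block_a.acquire_write("Y")          # insufficient tokens
block_a.release("X")                           # X commits
y_writes_after = block_a.acquire_write("Y")    # metadata <0, T>
x_reads_after = block_a.acquire_read("X")      # conflicts with writer Y
```

Multiple readers coexist because each holds one token, while a writer excludes everyone else by holding all T, the same counting idea Token Coherence uses for cache permissions.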

Tokens and Supervised Memory. Challenge: where to store unbounded, globally accessible token metadata? Solution: Supervised Memory’s metadata. Piggyback on existing virtual memory and cache coherence mechanisms.

TokenTM: a Large-Transaction TM. New conflict detection mechanism: transactional tokens in Supervised Memory; Token Coherence [Martin03] at a different level. Version management: save old/new values for unbounded write set; LogTM [Moore06] undo log.

Outline: Motivation; Supervised Memory; TokenTM; StealthTest [PACT 2009]; Conclusion.

StealthTest Summary (1/2). The problem: fork overhead. Software testing is hard, and multithreading makes it harder. Online software testing can help: run tests on deployed software, e.g., Delta Execution for patch testing [Tucek et al., ASPLOS 2009]. This needs non-intrusive mechanisms: low overhead, functionally hidden, good scaling. The existing mechanism is fork.

StealthTest Summary (2/2). Solution: TM for testing. Leverage transactional memory for online testing: transaction { test(); abort }. Non-intrusive? Fast TM mechanisms: low overhead, functionally hidden, good scaling. Demonstrate two uses: Delta Execution and in vivo testing.
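The `transaction { test(); abort }` idiom can be illustrated with a toy undo log in Python. This sketch is software-only and single-threaded; `UndoLogMemory` and `run_hidden_test` are hypothetical names, and it does not model the StealthTest interface or hardware TM conflict detection. It shows only why a test wrapped this way is functionally hidden: every write it makes is rolled back, while its result is still reported.

```python
class UndoLogMemory:
    """Toy memory with an undo log for eager version management."""

    def __init__(self, initial):
        self.cells = dict(initial)
        self.undo_log = None  # None means no transaction is active

    def begin(self):
        self.undo_log = []

    def write(self, key, value):
        if self.undo_log is not None:
            # Log whether the key existed and its old value before overwriting.
            self.undo_log.append((key, key in self.cells, self.cells.get(key)))
        self.cells[key] = value

    def read(self, key):
        return self.cells[key]

    def abort(self):
        # Walk the log backwards, restoring old values.
        for key, existed, old_value in reversed(self.undo_log):
            if existed:
                self.cells[key] = old_value
            else:
                del self.cells[key]
        self.undo_log = None


def run_hidden_test(memory, test):
    """transaction { test(); abort }: run the test, keep only its verdict."""
    memory.begin()
    try:
        return test(memory)
    finally:
        memory.abort()


def withdraw_test(mem):
    # The test mutates live state freely; the abort hides all of it.
    mem.write("balance", mem.read("balance") - 30)
    mem.write("scratch", 1)
    return mem.read("balance") == 70


state = UndoLogMemory({"balance": 100})
passed = run_hidden_test(state, withdraw_test)
```

The return value is the one piece of information that escapes the aborted transaction, matching the requirement that a test be able to communicate its result from within a transaction.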

Outline: Motivation; Supervised Memory; TokenTM; StealthTest (Online Software Testing, e.g., Patch Validation; StealthTest: TM for online testing; Delta Execution using StealthTest; In vivo Testing using StealthTest (optionally)); Conclusion.

Online Patch Validation. Bug fixes can introduce more bugs, so patches must be validated. Online validation [Nagaraja et al., OSDI 2004]: increased resource usage; lockstep execution. [Diagram labels: Input, Production, Testing, Output, Diff.]

Delta Execution [Tucek et al., ASPLOS 2009]. Online patch validation: most patches are small, so patched and unpatched executions are similar. Delta Execution: run together except when they differ. Comparison of prior work and Delta Execution on: increased resource usage; lockstep execution.

Delta Execution using fork. [Timeline labels: Install Δ data; Patched execution (Testing); Production; fork; Compute Δ data; Isolate Δ data; Merged execution; Unpatched execution; Time.]

Multi-threading and fork. ‘Park’ all other threads: stop all threads to get a consistent memory snapshot. [Timeline labels: Install Δ data; Patched execution (Testing); Production; fork; Compute Δ data; Isolate Δ data; Unpatched execution; Merged execution; Time.]

fork. Poor performance: ~9.8ms for split / ~106ms for merge [Tucek et al., ASPLOS 2009]. Poor scalability: web-server response rate reduced by 43%. Want an alternate mechanism.


Delta Execution using StealthTest. Delta Execution needs to: isolate patched execution; introspect patched execution; monitor delta data access. With fork: execute on child process; page diffing; mprotect. With StealthTest (transactional memory, transaction{…}): version management tracks new/old values; conflict detection monitors accesses.

StealthTest Interface. Delta Execution needs to: isolate patched execution; introspect patched execution; monitor delta data access. StealthTest interface: ST_begin_transaction, ST_abort_transaction, ST_get_old, ST_get_new, ST_protect_set, ST_protect_clear. Underlying transactional memory (transaction{…}): version management tracks new/old values; conflict detection monitors accesses.

Requirements from TM. Strong atomicity [Martin et al., CAL 2006]: transactions isolated from non-transactions => test transactions isolated from application code. Flexible conflict resolution: can abort transactions if necessary => abort tests if they block the application. Communication from within transactions => expose result of a test.


Delta Execution using StealthTest. [Two timelines. fork version: Merged execution; fork; Compute and Isolate Δ data; Patched execution (Testing) and Unpatched execution (Production); Install Δ data; Merged execution. StealthTest version (Production): transaction; Compute and Isolate Δ data; Patched execution; Introspect and rollback; Unpatched execution; Install Δ data; Merged execution. Interface calls shown: ST_begin_tran…, ST_protect_set, ST_get_new, ST_get_old, ST_abort_tran….]

Multi-threaded Delta Execution. [Two timelines. fork version: Merged execution; fork; Compute and Isolate Δ data; Patched execution (Testing) and Unpatched execution (Production); Install Δ data; Merged execution. StealthTest version (Original): transaction; Compute and Isolate Δ data; Patched execution; Introspect and rollback; Unpatched execution; Install Δ data; Merged execution. Interface calls shown: ST_begin_tran…, ST_protect_set, ST_get_new, ST_get_old, ST_abort_tran….]

Evaluation. Questions: (1) Effective? (2) Non-intrusive? Workloads: collection of multi-threaded server apps, same as Tucek et al., ASPLOS 2009. Pin-based TM emulation. 2-way SMP with 2.4GHz Pentium 4 CPUs and 2.5GB RAM.

(1) Effective? Program Description Patch Patch Verified? fork StealthTest Crafty Chess App Code refactoring P Raytrace Raytracer Result reporting fix Tar Archive Util Incremental archiving fix Apache1 Web Server Buffer overflow fix Apache2 DNSCache DNS Cache Behavior Change MySQL5.0 DB Server Extra permission checks O OpenSSL Security Lib Added bug in TLS handling Squid Web Cache ATPhttpd Works Memory allocation sockets

(2) Non-intrusive? Program Description fork ForkOverhead(%) PatchDuration(%) Crafty Chess App 0.1 <0.1 Raytrace Raytracer 0.2 0.5 Tar Archive Util 41 7.3 Apache1 Web Server 2.8 Apache2 12 DNSCache DNS Cache 65 MySQL5.0 DB Server 4.7 5.0 OpenSSL Security Lib Squid Web Cache 2.9 ATPhttpd 0.8

Outline: Motivation; Supervised Memory; TokenTM; StealthTest (Online Software Testing, e.g., Patch Validation; StealthTest: TM for online testing; Delta Execution using StealthTest; In vivo Testing using StealthTest); Conclusion.

StealthTest Summary. Software testing is hard; online software testing can help; existing mechanisms are inadequate. StealthTest leverages TM for non-intrusive online testing: low overhead, functionally hidden, good scaling. Demonstrate two uses: Delta Execution and in vivo testing.

Outline: Motivation; Supervised Memory; TokenTM; StealthTest; Conclusion.

Contribution 1: Supervised Memory [Under Submission]. Supervised systems: useful, renewed interest. Problem: SC only, while most systems are not; ad hoc hardware, specific to a supervised system; no formalism, which leads to ambiguity/incorrectness. Contributions: explore non-SC systems; general model for supervision (Supervised Memory); formal specification.

Contribution 2: TokenTM [Bobba et al., ISCA 2008]. Transactional memory, a supervised system. Problem: “Most transactions are small” is a self-fulfilling assumption; current designs penalize large/long transactions. Too restrictive for widespread TM use? Contributions: TokenTM, the first HTM to support efficient large/long transactions as well. Follow-up: Purdue’s LiteTM [Jafri et al., HPCA 2010].

Contribution 3: StealthTest [Bobba et al., PACT 2009]. Using transactional memory for testing. Problem: existing fork-based mechanisms have high overhead and poor scalability. Contributions: StealthTest, a low-overhead interface for online testing; two StealthTest-based testing frameworks.

Other Research and Contributions. Performance Pathologies: Bobba et al., ISCA 2007; Bobba et al., IEEE Micro Top Picks, Jan 2008. LogTM-SE: Yen et al., HPCA 2007. Nested LogTM: Moravan et al., ASPLOS 2006. LogTM: Moore et al., HPCA 2006. GEMS: LogTM-SE implementation; development, release and support.

Acknowledgments. Advisors: Mark Hill, David Wood. Mike Swift, Ben Liblit, Shan Lu, Karu Sankaralingam, Mikko Lipasti, Jeffrey Naughton. Co-authors: Kevin Moore, Luke Yen, Haris Volos, Michelle Moravan, Weiwei Xiong, Neelam Goyal. Colleagues: Alaa Alameldeen, Arkaprava Basu, Brad Beckmann, Polina Dudnik, Dan Gibson, Mike Marty, Somayeh Sardashti, Rathijit Sen, Cong Wang, Yasuko Watanabe, Min Xu, Matt Allen, Piramanayagam Arumuga Nainar, Siddharth Barman, Koushik Chakraborthy, Venkat Govindraju, Amit Kumar, Srinath Sridharan, Philip Wells.

Key Contributions [diagram labels: Applications, Supervised Memory Systems, Hardware, Software, TSOdata and Safe Supervision, Tools, TokenTM, StealthTest]

Backup

CPU Trends. From “The Free Lunch Is Over” by Herb Sutter.

The ‘Re-Birth’ of Parallel Programming. Sequential computers; memory wall, power wall, etc.; attack of the killer CMPs*: general-purpose parallel computers. How to program? Expose parallelism to software. “The Free Lunch is Over” (Herb Sutter, 2005). * Adapted from “Attack of the killer micros” by Eugene Brooks.

Parallel Programming is Hard (currently). Hard for programmers. Correctness: synchronization, data races, atomicity violations. Performance: communication, scheduling, load-balancing, critical path. Hard for tools: compilers and static analysis are intractable/inefficient.

Houston, We Have a Problem! Who should solve this problem? Yannis’s Law: programmer productivity doubles every 6 years (http://ix.cs.uoregon.edu/~yannis/law.html). Proebsting's Law: compiler advances double computing power every 18 years (http://research.microsoft.com/en-us/um/people/toddpro/papers/law.htm).

Parallel Algorithms vs Moore’s Law. “Improvement resulting from … algorithmic speedup is comparable to that resulting from the hardware speedup due to Moore’s Law over the same length of time” (David E. Keyes, “A Science-Based Case for Large-Scale Simulation”, July 2003).

In the “Landscape of Parallel Computing Research…” [Asanovic et al., 2006]

Why not Tagged Memory? [Gehringer and Keedy, CAN 1985]. Type information in tags. Arguments do not apply to dynamically-typed languages like Lisp. For other languages: simpler but more specialized designs; compilers improved to make the proposals moot.

Explore relaxed memory systems. Existing proposals assume SC: they assume SC or don’t deal with multiprocessors. Table columns: Proposal, Base Architecture, Implementation. Entries shown: WWT MIPS SC Tapeworm LogTM SPARC OneTM Informing Memory MIPS, Alpha SafeMem x86 MemTracker DMP

DMP Correctness

Non-TSOall Executions

Propose formal models. TSOdata is Complex. Empty/full-bits (metastates Empty and Full, with sLD and sST transitions and an exception edge). Initial state: A.d = 0, A.m = None; B.d = 0, B.m = Empty. T0: dST 1, A; sLD B. T1: sST B, 1; dLD A. Can dLD A return 0?

Safe Supervision


TokenTM Logical Operation. [Animation: threads X and Y, each with a PC and an undo log, run transactions against shared memory blocks A, B, and C, each with data and token metadata <cx, cy, …>. Operations shown: BEGIN_XACT and Load A in both threads, Store B, Store A, ABORT, COMMIT_XACT. Metadata values shown: <0,0,…>, <1,0,…>, <1,1,…>, <T,0,…>; one access is marked “Insufficient tokens”. The undo log records B: 0x..00.. while B's data changes to 0x..11...]

Existing HTM Systems. Assumption: most transactions are small and short running. Optimized for small transactions; degrade with large, long-running transactions. Non-localized overhead: e.g., LogTM-SE [Yen07] has false conflicts; OneTM [Blundel07] serializes. Complex, expensive operations: e.g., XTM [Chung06] and PTM [Chuang06] manipulate page tables. Premature optimization?

Why Large Transactions?
Programmers may want large (>> cache) and/or long (>> ctx switch) transactions
  HLL transactions invoke unpredictable lower-level code
  Replace critical sections containing syscalls or I/O
  Avoid concurrency bugs [Lu08]
But “Most transactions small & short running”
  Restrict TM to use by gurus (like OS spin locks)?
  Self-fulfilling prophecy?
Must Support Efficient Large/Long Transactions As Well

Toward a Large-Transaction TM
Efficiently detect conflicts between in-flight transactions using Read/Write Sets
  Unbounded
  Globally accessible
  Fast read/write set ops, e.g., add to read set, clear read set
Small Transactions: Low Overhead
Large Transactions: Localized Overhead
  Accessible read/write set (potentially unbounded)
Minimal Changes to Coherence / VM
  Heavyweight eviction ops
  Negative acks
  Additional page tables

Existing Mechanisms
Synergy between cache coherence and conflict detection
Hence, overload cache coherence
  + Excellent for bounded/small TM
But,
  - ‘Virtualization’ on overflows
  - Tough to access ‘virtualized’ state
Small Transactions: Low Overhead ✓
Minimal Changes to Coherence / VM ×
Large Transactions: Localized Overhead ×

TokenTM: a Large-Transaction TM
New Conflict Detection Mechanism (This Talk)
  Transactional Tokens in Tagged Memory
  Token Coherence [Martin03] at different level
Version Management
  Save old/new values for unbounded Write set
  LogTM [Moore06] undo log

Transactional Tokens
Challenge: How to efficiently track Read/Write sets?
  Token Coherence [Martin03]: Read/Write sets for cache coherence
Solution: Transactional Tokens
  T tokens per memory block
  At least one token to read, all T tokens to write (token conflict detection)
Token Metadata: <c0, c1, …, ci, …>, where 0 ≤ ci ≤ T is the count of tokens held by the thread with TID i.
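
The token rule above (at least one token to read, all T tokens to write) can be sketched in a few lines of C. This is a minimal illustration only, not the TokenTM hardware: the names `token_meta`, `token_read`, `token_write`, and `token_release`, the fixed thread count, and the value chosen for T are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_THREADS 4   /* assumed: small fixed number of threads */
#define T_TOKENS    8   /* assumed value of T (at least NUM_THREADS) */

/* Logical token metadata <c0, c1, ...> for one memory block. */
typedef struct {
    int count[NUM_THREADS];
} token_meta;

/* Tokens currently held by every thread except tid. */
static int tokens_held_by_others(const token_meta *m, int tid) {
    int sum = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        if (i != tid)
            sum += m->count[i];
    return sum;
}

/* Transactional read: succeeds if tid already holds a token or a free one exists. */
bool token_read(token_meta *m, int tid) {
    assert(tid >= 0 && tid < NUM_THREADS);
    if (m->count[tid] > 0)
        return true;                       /* block already in tid's read or write set */
    if (T_TOKENS - tokens_held_by_others(m, tid) < 1)
        return false;                      /* a writer holds all T tokens: conflict */
    m->count[tid] = 1;
    return true;
}

/* Transactional write: succeeds only if tid can hold all T tokens. */
bool token_write(token_meta *m, int tid) {
    assert(tid >= 0 && tid < NUM_THREADS);
    if (tokens_held_by_others(m, tid) != 0)
        return false;                      /* insufficient tokens: conflict */
    m->count[tid] = T_TOKENS;
    return true;
}

/* Commit or abort: tid returns every token it holds on this block. */
void token_release(token_meta *m, int tid) {
    assert(tid >= 0 && tid < NUM_THREADS);
    m->count[tid] = 0;
}
```

With two readers holding one token each, a write by either of them fails until the other releases its token, which is the "insufficient tokens" case.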

Tagged Memory
Challenge: Where to store Unbounded, Globally Accessible Token Metadata?
  Virtual Memory is unbounded and globally accessible
Solution, similar to OneTM [Blundel07]
  Tag Virtual Memory
  Piggyback on existing Virtual Memory and Cache Coherence mechanisms

Storing Metadata
[Figure: threads X and Y each run a transaction with an undo log and a token log. In software, the full per-block metadata <cx, cy, …> is unbounded but difficult to access globally. In hardware, tagged memory keeps a per-block metastate (Sum, TID), e.g., (0, -) for <0,0,…>: a concise, accessible, lossy summary.]

Hardware Metastate
Metadata summary (sum, TID)
  sum: total number of tokens acquired
  TID: identifies the owner when sum = 1 or sum = T (optional)
Concise -> Stored in packed field (e.g., State[1:2], Attr[3:16])
Fast -> Accessed as part of normal memory operation
Some summaries:
  <c0, c1, …, ci, …>    (sum, TID)
  <0, 0, 0, 0>          (0, -)
  <0, 0, 1, 0>          (1, 2)
  <0, T, 0, 0>          (T, 1)
  <0, 1, 1, 1>          (3, -)
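
As a rough illustration of how a full token vector collapses into the (sum, TID) summary, the following C sketch computes the metastate from the per-thread counts. The names `metastate`, `summarize`, and `NO_TID`, the thread count, and the value of T are assumptions for the example; the rule that the TID is only kept when a single thread holds 1 or T tokens follows the slide.

```c
#include <assert.h>

#define NUM_THREADS 4     /* assumed: small fixed number of threads */
#define T_TOKENS    8     /* assumed value of T */
#define NO_TID      (-1)  /* stands for the "-" entries in the summary */

typedef struct {
    int sum;  /* total number of tokens acquired on the block */
    int tid;  /* owner, or NO_TID when the owner cannot be identified */
} metastate;

/* Collapse <c0, c1, ...> into the lossy summary (sum, TID). */
metastate summarize(const int count[NUM_THREADS]) {
    metastate ms = { 0, NO_TID };
    int holders = 0;
    int last_holder = NO_TID;

    for (int i = 0; i < NUM_THREADS; i++) {
        assert(count[i] >= 0 && count[i] <= T_TOKENS);
        if (count[i] > 0) {
            ms.sum += count[i];
            holders++;
            last_holder = i;
        }
    }
    /* The owner is recorded only for a single reader (sum = 1)
       or a single writer (sum = T). */
    if (holders == 1 && (ms.sum == 1 || ms.sum == T_TOKENS))
        ms.tid = last_holder;
    return ms;
}
```

The summary is lossy by design: for three readers it records only that three tokens are out, not who holds them, so the per-thread token logs are still needed to recover the holders.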

Token Logs
Distributed structures for unbounded Read/Write sets
  per-thread
  stored in program memory (e.g., heap)
  list of <address, num_tokens>
Accessible to hardware for fast ops
  Add to read set -> Append to token log
Example token log: A: 1, B: T
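
A per-thread token log can be pictured as a growable array of <address, num_tokens> pairs. The C sketch below is a software-only illustration under that assumption; the type and function names (`token_log`, `token_log_append`, `token_log_release_all`, `sum_tokens`) are hypothetical, and the real design lets hardware append entries directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One log entry: <address, num_tokens>. */
typedef struct {
    uintptr_t addr;
    int num_tokens;
} token_entry;

/* Per-thread, heap-allocated, unbounded token log. */
typedef struct {
    token_entry *entries;
    size_t len;
    size_t cap;
} token_log;

/* Add to read/write set: append an entry. Returns 0 on success, -1 if out of memory. */
int token_log_append(token_log *log, uintptr_t addr, int num_tokens) {
    assert(num_tokens > 0);
    if (log->len == log->cap) {
        size_t new_cap = log->cap ? log->cap * 2 : 16;
        token_entry *p = realloc(log->entries, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        log->entries = p;
        log->cap = new_cap;
    }
    log->entries[log->len].addr = addr;
    log->entries[log->len].num_tokens = num_tokens;
    log->len++;
    return 0;
}

/* Callback invoked for every logged entry when tokens are returned. */
typedef void (*release_fn)(uintptr_t addr, int num_tokens, void *ctx);

/* Commit or abort: walk the log newest-first, release each entry, then clear the log. */
void token_log_release_all(token_log *log, release_fn release, void *ctx) {
    for (size_t i = log->len; i-- > 0; )
        release(log->entries[i].addr, log->entries[i].num_tokens, ctx);
    log->len = 0;
}

/* Example callback: add up the released tokens in the int that ctx points to. */
void sum_tokens(uintptr_t addr, int num_tokens, void *ctx) {
    (void)addr;
    *(int *)ctx += num_tokens;
}

void token_log_free(token_log *log) {
    free(log->entries);
    log->entries = NULL;
    log->len = 0;
    log->cap = 0;
}
```

Because the log lives in ordinary program memory, clearing a read set costs work proportional to the transaction's own footprint, which matches the goal of localized overhead for large transactions.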

Double-entry Bookkeeping (Keeping Metadata Consistent)
[Animated figure: threads X and Y each run a transaction and both Load A; X also does Store B and Y does Store A. Each token acquisition appears twice: in the acquiring thread's software token log (A: 1 in both logs) and in the block's hardware metastate (Sum, TID). The logical token state <cx, cy, …> for A steps through <0,0,…>, <1,0,…>, and <1,1,…>, while the metastate values shown for blocks A, B, and C include (0, -), (1, X), and (2, -).]

Implementing Hardware Metastate
[Animated figure: the metastate (Sum, TID) is stored with the tag, data, and coherence state in private caches, and with the data in main memory. The directory entry for block A goes from Not Present to Exclusive @ P1 to Shared @ P1,P2 as thread X's Load A (GETS A) and then thread Y's Load A (GETS A, Fwd_GETS A) are serviced; X's copy carries metastate (1, X).]
Shared copies cannot update metastate
Solution: Fission / Fusion

Metastate Fission
[Animated figure: both token logs record A: 1. When thread Y's Load A (GETS A, forwarded as Fwd_GETS A) creates a shared copy of block A, the metastate (1, X) undergoes fission into (1, X) in X's copy and (0, -) in the new copy; Y's load then records its token in its own copy. The directory state for A goes from Exclusive @ P1 to Shared @ P1,P2.]

Metastate Fusion
On store, metastate copies fused back
Why does fission/fusion work?
  Store sees ‘complete’ metastate
  Load sees
    ‘complete’ metastate, if writer exists
    ‘partial’ metastate, otherwise
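
The fission and fusion steps can be sketched as operations on (sum, TID) pairs. The C code below is an assumption-laden illustration, not the protocol itself: the function names are hypothetical, and the choice to drop the TID whenever two copies both contribute tokens is a conservative assumption made here rather than something the slides spell out.

```c
#include <assert.h>

#define NO_TID (-1)  /* stands for the "-" entries */

typedef struct {
    int sum;
    int tid;
} metastate;

/* Fission: a newly created shared copy starts with an empty partial
   metastate (0, -); the existing copy keeps what it had. */
void metastate_fission(const metastate *existing, metastate *new_copy) {
    (void)existing;           /* unchanged by fission */
    new_copy->sum = 0;
    new_copy->tid = NO_TID;
}

/* A transactional load records its token only in the local (partial) copy. */
void metastate_add_reader(metastate *local, int tid) {
    assert(tid >= 0);
    local->sum += 1;
    local->tid = (local->sum == 1) ? tid : NO_TID;
}

/* Fusion: on a store, the partial copies are combined into one complete metastate. */
metastate metastate_fuse(metastate a, metastate b) {
    metastate out;
    out.sum = a.sum + b.sum;
    if (a.sum == 0)
        out.tid = b.tid;      /* only b contributed tokens (or neither did) */
    else if (b.sum == 0)
        out.tid = a.tid;      /* only a contributed tokens */
    else
        out.tid = NO_TID;     /* assumed: owner not identifiable from the summary */
    return out;
}
```

In this sketch a load may see only its local partial sum, which is safe when no writer exists, while a store always works from the fused, complete sum and so detects every outstanding reader token.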

Hardware Cost
Additional metabits in caches/memory
  Recoded ECC to cull metabits
Changes to coherence protocols
  Additional payload on messages
  Minimal changes to protocol logic
  Requires non-silent eviction

Evaluation Methodology
Full System Simulation
  Multifacet GEMS
Base System
  32-core CMP system, in-order, single-issue cores
  Private 4-way 32KB writeback split I&D L1 caches
  Shared 8-way 8 MB writeback L2
  On-chip directory @ L2, MESI coherence
  Packet-switched interconnect in a tiled topology

TM Systems
LogTM-SE [Yen07] variant
  Parallel Bloom Filters for conflict detection
  4 2Kbit H3 filters
  + Compact, less hardware overhead
  - False Conflicts
LogTM-SE_Perfect
  + No False Conflicts
  - Unimplementable
TokenTM
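
To make the source of false conflicts concrete, here is a small C sketch of a parallel Bloom filter signature that uses H3-style hashing (each hash is the XOR of fixed random rows selected by the set bits of the address). It is an illustration only: the bank size of 2 Kbit per filter, the 32-bit address width, the xorshift generator used to fill the H3 matrices, and all names are assumptions, not the LogTM-SE implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_BANKS 4      /* four filters probed in parallel */
#define BANK_BITS 2048   /* assumed: 2 Kbit per filter */
#define ADDR_BITS 32     /* assumed address width */

typedef struct {
    uint16_t q[NUM_BANKS][ADDR_BITS];          /* H3 matrices: one 11-bit row per address bit */
    uint64_t bits[NUM_BANKS][BANK_BITS / 64];  /* filter contents */
} signature;

/* Small xorshift generator, used only to fill the H3 matrices reproducibly. */
static uint32_t xorshift32(uint32_t *s) {
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

void sig_init(signature *sig, uint32_t seed) {
    assert(seed != 0);
    memset(sig, 0, sizeof *sig);
    for (int b = 0; b < NUM_BANKS; b++)
        for (int i = 0; i < ADDR_BITS; i++)
            sig->q[b][i] = (uint16_t)(xorshift32(&seed) & (BANK_BITS - 1));
}

/* H3 hash: XOR together the matrix rows selected by the set bits of addr. */
static unsigned h3(const uint16_t q[ADDR_BITS], uint32_t addr) {
    unsigned h = 0;
    for (int i = 0; i < ADDR_BITS; i++)
        if ((addr >> i) & 1u)
            h ^= q[i];
    return h;
}

/* Add an address to the read or write set. */
void sig_insert(signature *sig, uint32_t addr) {
    for (int b = 0; b < NUM_BANKS; b++) {
        unsigned idx = h3(sig->q[b], addr);
        sig->bits[b][idx / 64] |= (uint64_t)1 << (idx % 64);
    }
}

/* True if addr may be in the set; a hit on a never-inserted address is a false conflict. */
bool sig_may_contain(const signature *sig, uint32_t addr) {
    for (int b = 0; b < NUM_BANKS; b++) {
        unsigned idx = h3(sig->q[b], addr);
        if (((sig->bits[b][idx / 64] >> (idx % 64)) & 1u) == 0)
            return false;
    }
    return true;
}

/* Clear the set at commit or abort (the H3 matrices are kept). */
void sig_clear(signature *sig) {
    memset(sig->bits, 0, sizeof sig->bits);
}
```

An inserted address always tests positive, but as a transaction inserts more addresses the banks fill up and unrelated addresses start to test positive too, which is why signature-based detection degrades for large transactions.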

Results
Small Transactions: Low Overhead
  Comparable on small transactions
Large Transactions: Localized Overhead
  Minor degradation with large transactions

In vivo Testing [Murphy et al. TR 2007, Chu et al. ICST 2008]
Run unit tests on deployed software
  + More testing
  + More realistic
Catch bugs early

In vivo Testing using StealthTest
    ST_begin_transaction();            /* run the test inside a transaction */
    try {
        test();
        ST_begin_escape();             /* logging must survive the abort */
        fprintf(log, "…", success);
        ST_end_escape();
    } catch (...) {                    /* any exception counts as a failed test */
        fprintf(log, "…", fail);
    }
    ST_abort_transaction(NO_RETRY);    /* always discard the test's side effects */

Evaluation
Workloads
  Bugbench: Server Workloads
  STAMP: Transactional Memory benchmarks
Implementation
  Intel STM: Language-Based TM
  TL2 STM: Library-Based TM
  Quad-core workstation with RHEL5

(1) Effective?
Built on Intel STM. Run tests on Bugbench applications
Columns: Program, Description, Size (LOC), Bug Type, Error Detected?
  NCOM: file compress, 1.9K, Stack Smash, Yes
  POLY: file “unixier”, 0.7K
  GZIP: 8.2K, Buffer Overflow
  MAN: documentation, 4.7K
  BC: calculator, 17.0K
  HTPD1: web server, 224K, Atomicity
  SQUD: proxy cache, 93.5K, Possible
  CVS: version control, 114.5K, Double Free
  MSQL2: DBMS, 514K
  MSQL3: 1028K
Works
Unsupported Library Calls

(2) Non-intrusive?
Built on TL2 STM. Run tests on STAMP applications (1000 tests per min)

Atomicity Violation Bugs?

Degree-2 Transactions
Isolate only writes.
Implementation
  Reads in escape action
  Early Release
  Add new type of transaction to TM

StealthTest Wish List
Hardware Support
System Calls within Transactions
Interaction between Locks and Transactions

Related Work
Binary Translation (SPROCKETS)
Code Emulation (STEM)
TLS (Oplinger & Lam, PathExpander)

In vivo Testing Motivation
Ordering Bug in MySQL
In vivo Test: Data Consistency Checks
Buggy Code (in mysys/thr_lock.c):
    void thr_lock_delete(THR_LOCK *lock)
    {
        …
        pthread_mutex_destroy(&lock->mutex);
        list_delete(thr_lock_thread_list, &lock->list);
    }