Design and Implementation Issues for Atomicity

Declarative Programming Languages for Multicore Architectures Workshop, 15 January 2006.

Design and Implementation Issues for Atomicity Dan Grossman University of Washington Workshop on Declarative Programming Languages for Multicore Architectures.

Presentation transcript:

Design and Implementation Issues for Atomicity Dan Grossman University of Washington Workshop on Declarative Programming Languages for Multicore Architectures 15 January 2006

Atomicity Overview
- Atomicity: what, why, and why relevant
- Implementation approaches (hw & sw, me & others)
- 3 semi-controversial language-design claims
- 3 semi-controversial language-implementation claims
- Summary and discussion (experts are lurking)

Atomic
An easier-to-use and harder-to-implement primitive:
- withLock: lock acquire/release
    withLock: lock -> (unit -> α) -> α
    let dep acct amt =
      withLock acct.lk (fun () ->
        let tmp = acct.bal in
        acct.bal <- tmp + amt)
- atomic: (behave as if) no interleaved execution
    atomic: (unit -> α) -> α
    let dep acct amt =
      atomic (fun () ->
        let tmp = acct.bal in
        acct.bal <- tmp + amt)
No deadlock or unfair scheduling (e.g., disabling interrupts)

Why better
1. No whole-program locking protocols
   - As code evolves, use atomic with “any data”
   - Instead of “what locks to get” (races) and “in what order” (deadlock)
2. Bad code doesn’t break good atomic blocks:
    let bad1() = acct.bal <- 123
    let bad2() = atomic (fun () -> «diverge»)
    let good() =
      atomic (fun () ->
        let tmp = acct.bal in
        acct.bal <- tmp + amt)
With atomic, “the protocol” is now the runtime’s problem (cf. garbage collection for memory management)

Declarative control
For programmers who will see: threads & shared-memory & parallelism
- atomic directly declares what schedules are allowed (without sacrificing pre-emption and fairness)
- Moreover, implementations perform better with immutable data, encouraging a functional style

Implementing atomic
Two basic approaches:
1. Compute using “shadow memory” then commit
   - Fancy optimistic-concurrency protocols for parallel commits with progress (STMs) [Harris et al. OOPSLA03, PPoPP05, ...]
2. Lock data before access, log changes, rollback and back-off on contention
   - My research focus
   - Key performance issues: locking granularity, avoiding unneeded locking
   - Non-issue: any granularity is correct
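The first approach can be illustrated with a minimal, single-threaded OCaml sketch. It is not taken from the talk or from any STM: the names `shadow`, `read`, `write`, `Abort`, and the `option` result are assumptions made for illustration, and conflict detection between threads is left out entirely. Writes inside the block go to a shadow buffer and reach real memory only at commit.

```ocaml
(* Hypothetical sketch of "compute using shadow memory, then commit".
   Single-threaded: detecting conflicts with other threads is omitted. *)

(* Pending writes of the current atomic block, newest first. *)
let shadow : (int ref * int) list ref = ref []
let in_atomic = ref false

(* Read barrier: inside atomic, prefer the newest buffered value.
   List.assq_opt compares keys physically, i.e. by reference identity. *)
let read (r : int ref) : int =
  if !in_atomic then
    match List.assq_opt r !shadow with
    | Some v -> v
    | None -> !r
  else !r

(* Write barrier: inside atomic, buffer the write instead of updating r. *)
let write (r : int ref) (v : int) : unit =
  if !in_atomic then shadow := (r, v) :: !shadow
  else r := v

(* Assumed signal that the block must be abandoned (e.g. on a conflict). *)
exception Abort

let atomic (f : unit -> 'a) : 'a option =
  in_atomic := true;
  shadow := [];
  let finish () = in_atomic := false; shadow := [] in
  match f () with
  | result ->
    (* Commit: apply buffered writes oldest first so the newest wins. *)
    List.iter (fun (r, v) -> r := v) (List.rev !shadow);
    finish ();
    Some result
  | exception Abort ->
    (* Nothing to undo: real memory was never touched. *)
    finish ();
    None

(* The deposit example from the slides, on a bare int ref. *)
let bal = ref 100
let dep amt = atomic (fun () -> let tmp = read bal in write bal (tmp + amt))
```

Aborting is cheap here because real memory is untouched until commit; the price is a lookup on every read inside the block.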

An extreme case
One extreme: one lock for all data
- Acquire lock on context-switch-in
- Release lock only on context-switch-out (after rollback if necessary)
Per data-access overhead (columns: not in atomic / in atomic):
- Read: none
- Write: logging
Ideal on uniprocessors [ICFP05, Manson et al. RTSS05]
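The per-access costs of this extreme case can be sketched in OCaml as follows. This is an illustrative model, not the cited implementation: the names `log`, `write`, `rollback`, and the `Preempted` exception (standing in for a context switch in the middle of the block) are assumptions. Reads are plain dereferences, and a write pays for logging only while inside atomic.

```ocaml
(* Hypothetical sketch: update in place, keep an undo log, roll back
   and retry if the block is interrupted. *)

(* Undo log: closures that restore old values, newest first. *)
let log : (unit -> unit) list ref = ref []
let in_atomic = ref false

(* Reads need no barrier in this scheme: use plain !r. *)

(* Write barrier: log the old value only inside atomic, then update. *)
let write (r : 'a ref) (v : 'a) : unit =
  if !in_atomic then begin
    let old = !r in
    log := (fun () -> r := old) :: !log
  end;
  r := v

(* Undo every logged write, newest first. *)
let rollback () =
  List.iter (fun undo -> undo ()) !log;
  log := []

(* Assumed stand-in for being pre-empted in the middle of the block. *)
exception Preempted

let rec atomic (f : unit -> 'a) : 'a =
  in_atomic := true;
  log := [];
  match f () with
  | result ->
    in_atomic := false;
    log := [];
    result
  | exception Preempted ->
    rollback ();
    in_atomic := false;
    atomic f  (* retry from the start once rescheduled *)
```

In the sketch, a block that raises `Preempted` on its first attempt leaves no trace of that attempt: the logged writes are undone before the retry begins.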

In general
Naively, locking approach with parallelism looks bad (but note: no communication if already hold lock)
Columns: not in atomic / in atomic
- Read: lock / lock, maybe rollback
- Write: lock, maybe rollback, logging
Active research:
- Hardware: lock = cache-line ownership [Kozyrakis, Rajwar, Leiserson, …]
- Software (my work-in-progress for Java):
  - Static analysis to avoid locking
  - Dynamic lock coarsening/splitting

Atomicity Overview
- Atomicity: what, why, and why relevant
- Implementation approaches (hw & sw, me & others)
- 3 semi-controversial language-design claims
- 3 semi-controversial language-implementation claims
- Summary and discussion

Claim #1: “Strong” atomicity is worth the cost
- “Weak” says only atomics not interleaved with each other
- Says nothing about interleaving with non-atomic
- So (columns: not in atomic / in atomic):
  - Read: none / lock, maybe rollback
  - Write: lock, maybe rollback, logging
- But back to bad synchronization breaking good code!
Caveat: Weak=strong if all thread-shared data accessed within atomic (other ways to enforce this)

Claim #2: Adding atomic shouldn’t change “sequential meaning”
- That is, e and atomic (fun()-> e) should be equivalent in a single-threaded program
- But it means exceptions must commit, not rollback!
  - Can have “two kinds of exceptions”
Caveats:
- Tough case is “input after output”
- Not a goal in Haskell (already a separate monad for “transaction variables”)
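A small single-threaded OCaml sketch of the exception rule follows. The slide only says that exceptions must commit and that there can be “two kinds of exceptions”; the second kind shown here (`Abort_and_rollback`), along with `log` and `write`, is an assumed name and behavior chosen for illustration.

```ocaml
(* Hypothetical sketch: an ordinary exception commits the block's
   effects; a designated second kind of exception rolls them back. *)

(* Undo log: closures that restore old values, newest first. *)
let log : (unit -> unit) list ref = ref []

(* Write barrier: remember the old value, then update in place. *)
let write (r : 'a ref) (v : 'a) : unit =
  let old = !r in
  log := (fun () -> r := old) :: !log;
  r := v

(* Assumed "second kind" of exception that explicitly asks for rollback. *)
exception Abort_and_rollback

let atomic (f : unit -> 'a) : 'a =
  log := [];
  match f () with
  | result ->
    log := [];
    result
  | exception Abort_and_rollback ->
    (* Undo the writes, newest first, then propagate. *)
    List.iter (fun undo -> undo ()) !log;
    log := [];
    raise Abort_and_rollback
  | exception e ->
    (* Ordinary exception: keep the effects (commit) and propagate,
       as the same code would behave without atomic. *)
    log := [];
    raise e
```

With this rule, wrapping `e` in `atomic` leaves a single-threaded program's behavior unchanged even when `e` raises an ordinary exception, which is the equivalence the claim asks for.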

Claim #3: Nested transactions are worth the cost
- Allows parallelism within atomic
  - “Participating” threads see uncommitted effects
- Currently most prototypes (mine included) punt here, but I think many-many-core will drive its need
  - Else programmers will hack up buggy workarounds

Claim #4: Hardware implementations are too low-level and opaque
- Extreme case: ISA of “start_atomic” and “end_atomic”
- Rollback does not require RAM-level rollback!
  - Example: logging a garbage collection
  - Example: rolling back thunk evaluation
- All I want from hardware: fast conflict detection
Caveats:
- Situation improving fast (we’re talking!)
- Focus has been on chip design (orthogonal?)

Claim #5: Simple whole-program optimizations can give strong atomicity for close to the price of weak
Lots of data doesn’t need locking (2/3 of diagram well-known):
- Not used in atomic
- Thread local
- Immutable
Caveat: unproven; hopefully numbers in a few weeks
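The three exemptions can be read as a simple decision rule. The OCaml sketch below is only an illustration of that rule: the `facts` record and the idea that an analysis reports one such record per field are assumptions, not a description of the author's analysis.

```ocaml
(* Hypothetical per-field facts a whole-program analysis might report. *)
type facts = {
  used_in_atomic : bool;  (* accessed by some atomic block *)
  thread_local : bool;    (* never reachable from a second thread *)
  immutable : bool;       (* never written after initialization *)
}

(* A non-atomic access needs locking only if none of the three
   exemptions from the slide applies. *)
let needs_barrier (f : facts) : bool =
  f.used_in_atomic && not f.thread_local && not f.immutable
```

Under this rule only data that is shared, mutable, and touched by some atomic block pays the strong-atomicity cost outside atomic.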

Claim #6: Serialization and locking are key tools for implementing atomicity
- Particularly in low-contention situations
- STMs are great too
  - I predict best systems will be hybrids
- Just as great garbage collectors do some copying, some mark-sweep, and some reference-counting

Summary
1. Strong atomicity is worth the cost
2. Atomic shouldn’t change sequential meaning
3. Nested transactions are worth the cost
4. Hardware is too low-level and opaque
5. Program analysis for “strong for the price of weak”
6. Serialization and locks are key implementation tools
Lots omitted: alternative composition, wait/notify idioms, logging techniques, …
www.cs.washington.edu/homes/djg

Plug
Relevant workshop before PLDI 2006:
TRANSACT: First ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Computing
www.cs.purdue.edu/homes/jv/events/TRANSACT/