Design and Implementation Issues for Atomicity

Slides:



Advertisements
Similar presentations
Declarative Programming Languages for Multicore Architecures Workshop 15 January 2006.
Advertisements

Design and Implementation Issues for Atomicity Dan Grossman University of Washington Workshop on Declarative Programming Languages for Multicore Architectures.
Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
Principles of Transaction Management. Outline Transaction concepts & protocols Performance impact of concurrency control Performance tuning.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
Concurrency The need for speed. Why concurrency? Moore’s law: 1. The number of components on a chip doubles about every 18 months 2. The speed of computation.
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
TOWARDS A SOFTWARE TRANSACTIONAL MEMORY FOR GRAPHICS PROCESSORS Daniel Cederman, Philippas Tsigas and Muhammad Tayyab Chaudhry.
Submitted by: Omer & Ofer Kiselov Supevised by: Dmitri Perelman Networked Software Systems Lab Department of Electrical Engineering, Technion.
[ 1 ] Agenda Overview of transactional memory (now) Two talks on challenges of transactional memory Rebuttals/panel discussion.
Lock vs. Lock-Free memory Fahad Alduraibi, Aws Ahmad, and Eman Elrifaei.
EPFL - March 7th, 2008 Interfacing Software Transactional Memory Simplicity vs. Flexibility Vincent Gramoli.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
1 New Architectures Need New Languages A triumph of optimism over experience! Ian Watson 3 rd July 2009.
Concurrency and Software Transactional Memories Satnam Singh, Microsoft Faculty Summit 2005.
1 Testing Concurrent Programs Why Test?  Eliminate bugs?  Software Engineering vs Computer Science perspectives What properties are we testing for? 
15-740/ Oct. 17, 2012 Stefan Muller.  Problem: Software is buggy!  More specific problem: Want to make sure software doesn’t have bad property.
CS5204 – Operating Systems Transactional Memory Part 2: Software-Based Approaches.
Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
WG5: Applications & Performance Evaluation Pascal Felber
Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Hybrid Static-Dynamic Analysis for Statically.
Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics Minjia Zhang, 1 Jipeng Huang, Man Cao, Michael D. Bond.
Hybrid Transactional Memory Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen, Intel Labs University of Michigan Intel Labs.
Kernel Locking Techniques by Robert Love presented by Scott Price.
Transactional Coherence and Consistency Presenters: Muhammad Mohsin Butt. (g ) Coe-502 paper presentation 2.
CS510 Concurrent Systems Why the Grass May Not Be Greener on the Other Side: A Comparison of Locking and Transactional Memory.
CSC Multiprocessor Programming, Spring, 2012 Chapter 11 – Performance and Scalability Dr. Dale E. Parson, week 12.
CSE 598c – Virtual Machines Survey Proposal: Improving Performance for the JVM Sandra Rueda.
AtomCaml: First-class Atomicity via Rollback Michael F. Ringenburg and Dan Grossman University of Washington International Conference on Functional Programming.
On Transactional Memory, Spinlocks and Database Transactions Khai Q. Tran Spyros Blanas Jeffrey F. Naughton (University of Wisconsin Madison)
December 1, 2006©2006 Craig Zilles1 Threads & Atomic Operations in Hardware  Previously, we introduced multi-core parallelism & cache coherence —Today.
Adaptive Software Lock Elision
Software Coherence Management on Non-Coherent-Cache Multicores
Why Events Are A Bad Idea (for high-concurrency servers)
(Nested) Open Memory Transactions in Haskell
Software Perspectives on Transactional Memory
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Transactional Memory : Hardware Proposals Overview
Part 2: Software-Based Approaches
PHyTM: Persistent Hybrid Transactional Memory
Atomic Operations in Hardware
Atomic Operations in Hardware
Faster Data Structures in Transactional Memory using Three Paths
Challenges in Concurrent Computing
Lecture 5: GPU Compute Architecture
Threads and Memory Models Hal Perkins Autumn 2011
Man Cao Minjia Zhang Aritra Sengupta Michael D. Bond
Lecture 5: GPU Compute Architecture for the last time
Foundations of Programming Languages – Course Overview
Dan Grossman University of Washington 18 July 2006
Changing thread semantics
Foundations of Programming Languages – Course Overview
Chapter 10 Transaction Management and Concurrency Control
Threads and Memory Models Hal Perkins Autumn 2009
Shared Memory Programming
Lecture 22: Consistency Models, TM
Hybrid Transactional Memory
CSCI1600: Embedded and Real Time Software
Speculative execution and storage
Kernel Synchronization II
CSE 451: Operating Systems Autumn 2003 Lecture 7 Synchronization
CSE 451: Operating Systems Autumn 2005 Lecture 7 Synchronization
CSE 451: Operating Systems Winter 2003 Lecture 7 Synchronization
CS333 Intro to Operating Systems
CSCI1600: Embedded and Real Time Software
Programming with Shared Memory Specifying parallelism
CSE 332: Concurrency and Locks
Controlled Interleaving for Transactions
CSE 542: Operating Systems
Presentation transcript:

Design and Implementation Issues for Atomicity Dan Grossman University of Washington Workshop on Declarative Programming Languages for Multicore Architectures 15 January 2006

Atomicity Overview Atomicity: what, why, and why relevant Implementation approaches (hw & sw, me & others) 3 semi-controversial language-design claims 3 semi-controversial language-implementation claims Summary and discussion (experts are lurking)

No deadlock or unfair scheduling (e.g., disabling interrupts) Atomic An easier-to-use and harder-to-implement primitive: withLock: lock->(unit->α)->α let dep acct amt = withLock acct.lk (fun()-> let tmp=acct.bal in acct.bal <- tmp+amt) atomic: (unit->α)->α let dep acct amt = atomic (fun()-> let tmp=acct.bal in acct.bal <- tmp+amt) lock acquire/release (behave as if) no interleaved execution No deadlock or unfair scheduling (e.g., disabling interrupts)

Why better No whole-program locking protocols As code evolves, use atomic with “any data” Instead of “what locks to get” (races) and “in what order” (deadlock) Bad code doesn’t break good atomic blocks: With atomic, “the protocol” is now the runtime’s problem (c.f. garbage collection for memory management) let bad1() = acct.bal <- 123 let bad2() = atomic (fun()->«diverge») let good() = atomic (fun()-> let tmp=acct.bal in acct.bal <- tmp+amt)

threads & shared-memory & parallelism Declarative control For programmers who will see: threads & shared-memory & parallelism atomic directly declares what schedules are allowed (without sacrificing pre-emption and fairness) Moreover, implementations perform better with immutable data, encouraging a functional style

Implementing atomic Two basic approaches: Compute using “shadow memory” then commit Fancy optimistic-concurrency protocols for parallel commits with progress (STMs) [Harris et al. OOPSLA03, PPoPP05, ...] Lock data before access, log changes, rollback and back-off on contention My research focus Key performance issues: locking granularity, avoiding unneeded locking Non-issue: any granularity is correct

An extreme case One extreme: One lock for all data Acquire lock on context-switch-in Release lock only on context-switch-out (after rollback if necessary) Per data-access overhead: Not in atomic In atomic Read none Write logging Ideal on uniprocessors [ICFP05, Manson et al. RTSS05]

lock, maybe rollback, logging In general Naively, locking approach with parallelism looks bad (but note: no communication if already hold lock) Not in atomic In atomic Read lock lock, maybe rollback Write lock, maybe rollback, logging Active research: Hardware: lock = cache-line ownership [Kozyrakis, Rajwar, Leiserson, …] Software (my work-in-progress for Java): Static analysis to avoid locking Dynamic lock coarsening/splitting

Atomicity Overview Atomicity: what, why, and why relevant Implementation approaches (hw & sw, me & others) 3 semi-controversial language-design claims 3 semi-controversial language-implementation claims Summary and discussion

Claim #1 “Strong” atomicity is worth the cost “Weak” says only atomics not interleaved with each other Says nothing about interleaving with non-atomic So: Not in atomic In atomic Read none lock, maybe rollback Write lock, maybe rollback, logging But back to bad synchronization breaking good code! Caveat: Weak=strong if all thread-shared data accessed within atomic (other ways to enforce this)

Adding atomic shouldn’t change “sequential meaning” Claim #2 Adding atomic shouldn’t change “sequential meaning” That is, e and atomic (fun()-> e) should be equivalent in a single-threaded program But it means exceptions must commit, not rollback! Can have “two kinds of exceptions” Caveats: Tough case is “input after output” Not a goal in Haskell (already a separate monad for “transaction variables”)

Nested transactions are worth the cost Claim #3 Nested transactions are worth the cost Allows parallelism within atomic “Participating” threads see uncommitted effects Currently most prototypes (mine included) punt here, but I think many-many-core will drive its need Else programmers will hack up buggy workarounds

Hardware implementations are too low-level and opaque Claim #4 Hardware implementations are too low-level and opaque Extreme case: ISA of “start_atomic” and “end_atomic” Rollback does not require RAM-level rollback! Example: logging a garbage collection Example: rolling back thunk evaluation All I want from hardware: fast conflict detection Caveats: Situation improving fast (we’re talking!) Focus has been on chip design (orthogonal?)

Caveat: unproven; hopefully numbers in a few weeks Claim #5 Simple whole-program optimizations can give strong atomicity for close to the price of weak Lots of data doesn’t need locking: (2/3 of diagram well-known) Not used in atomic Thread local Immutable Caveat: unproven; hopefully numbers in a few weeks

Claim #6 Serialization and locking are key tools for implementing atomicity Particularly in low-contention situations STMs are great too I predict best systems will be hybrids Just as great garbage collectors do some copying, some mark-sweep, and some reference-counting

Summary Strong atomicity is worth the cost Atomic shouldn’t change sequential meaning Nested transactions are worth the cost Hardware is too low-level and opaque Program analysis for “strong for the price of weak” Serialization and locks are key implementation tools Lots omitted: Alternative composition, wait/notify idioms, logging techniques, … www.cs.washington.edu/homes/djg

Plug Relevant workshop before PLDI 2006: TRANSACT: First ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Computing www.cs.purdue.edu/homes/jv/events/TRANSACT/