Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University.

Presentation transcript:

Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University of Texas at Austin

Intro
  Processors now scaling via cores, not clock rate
    8 cores now, 64 tomorrow, 4096 next week?
  Parallel programming increasingly important
    Software must keep up with hardware advances
  But parallel programming is really hard
    Deadlock, priority inversion, data races, etc.
  STM is here
    We would like HTM

Difficult HTM Problems
  Enforcing atomicity and isolation requires conflict detection and rollback
  TM hardware only applies to memory and processor state
  I/O and system calls may have effects that are not isolated and cannot be rolled back
    mov $norad, %eax
    mov $canada, %ebx
    launchmissiles %eax, %ebx
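A minimal sketch of why this is a problem, assuming a toy software checkpoint in place of real TM hardware. The names `account`, `send_message` and `run_with_aborts` are hypothetical and only illustrate the slide's point: memory updates roll back on abort, but an effect on the outside world does not.

```c
#include <assert.h>

/* Toy state that a transaction is allowed to roll back. */
struct account { int balance; };

/* Models an external device: nothing below ever restores this counter. */
static int messages_sent = 0;

/* Stand-in for an I/O operation or system call inside a transaction. */
static void send_message(void) {
    messages_sent++;
}

/* Runs the same body `attempts` times and aborts every attempt except the
 * last one. The checkpoint copy stands in for what HTM does to memory. */
static void run_with_aborts(struct account *acct, int attempts) {
    assert(attempts > 0);
    for (int i = 0; i < attempts; i++) {
        struct account checkpoint = *acct;  /* begin: remember memory state */
        acct->balance -= 10;                /* transactional memory update */
        send_message();                     /* side effect outside the TM */
        if (i < attempts - 1)
            *acct = checkpoint;             /* abort: memory is restored */
        /* messages_sent still counts the aborted attempt */
    }
}
```

Calling `run_with_aborts` with three attempts leaves the balance reduced once, while `messages_sent` records all three attempts. That gap is what the rest of the talk addresses.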

Outline
  Handling kernel I/O with minimal hardware
  User-level system call rollback
  Conclusions

cxspinlocks
  Cooperative transactional spinlocks allow kernel to take advantage of limited TM hardware
  Optimistically execute with transactions
  Fall back on locking when hardware TM is not enough
    I/O, page table operations
    overflow?
  Correctness provided by isolation of lock variable
    Transactional threads read lock
    Non-transactional threads write lock

cxspinlock guarantees
  Multiple transactional threads in critical region
  Non-transactional thread excludes all others
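The two guarantees can be modelled as a small state machine. This is a single-threaded sketch under stated assumptions, not the kernel implementation: `struct cxlock`, `cx_enter_tx`, `cx_enter_exclusive` and `cx_leave` are hypothetical names, and the counter `tx_inside` stands in for "transactions that currently have the lock word in their read set", which real hardware tracks implicitly.

```c
#include <assert.h>
#include <stdbool.h>

enum cx_mode { CX_TX, CX_EXCLUSIVE };

struct cxlock {
    bool locked;     /* lock word, written only by non-transactional owners */
    int  tx_inside;  /* model of transactions holding the lock in a read set */
};

/* A transaction may enter only while the lock word reads "unlocked". */
static bool cx_enter_tx(struct cxlock *l) {
    if (l->locked)
        return false;
    l->tx_inside++;
    return true;
}

/* A non-transactional thread enters only when nobody else is inside.
 * Assumption: the contention policy makes it wait for running transactions. */
static bool cx_enter_exclusive(struct cxlock *l) {
    if (l->locked || l->tx_inside > 0)
        return false;
    l->locked = true;
    return true;
}

/* Leaves the critical region in the mode it was entered. */
static void cx_leave(struct cxlock *l, enum cx_mode mode) {
    if (mode == CX_TX) {
        assert(l->tx_inside > 0);
        l->tx_inside--;
    } else {
        assert(l->locked);
        l->locked = false;
    }
}
```

In this model any number of `cx_enter_tx` calls succeed while the lock is free, and a successful `cx_enter_exclusive` makes every other entry attempt fail until `cx_leave` is called.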

cxspinlocks in action
  Thread 1:
    cx_optimistic(lockA);
    modify_data();
    if(condition) {
      perform_IO();
    }
    cx_end(lock);
  Thread 2:
    cx_optimistic(lockA);
    modify_data();
    cx_end(lock);
  Animation frames (thread mode, lockA state, read/write sets):
    1. Neither thread marked TX; lockA unlocked; read and write sets empty
    2. Thread 1: TX, Thread 2: TX; lockA unlocked; lockA in both threads' sets
       Transactional threads read unlocked lock variable
    3. Thread 1: TX, Thread 2 no longer marked TX; lockA unlocked; lockA still in both threads' sets
    4. Neither thread marked TX; lockA unlocked; sets empty
    5. Thread 1: TX, Thread 2: TX; lockA unlocked; lockA in both threads' sets
       Hardware restarts transactions for I/O
    6. Thread 1 no longer marked TX, its sets empty; Thread 2: TX with lockA in its sets; lockA unlocked
       contention managed CAS
    7. Neither thread marked TX; lockA unlocked; sets empty
       contention managed CAS
    8. Thread 1: Non-TX; lockA locked; sets empty
    9. Thread 1: Non-TX, Thread 2: TX; lockA locked; sets empty
       conditionally add variable to read set
    10. Thread 1 no longer marked Non-TX; Thread 2: TX; lockA unlocked; sets empty
       conditionally add variable to read set
    11. Thread 2: TX with lockA in its sets; lockA unlocked
    12. Neither thread marked TX; lockA unlocked; sets empty

Implementing cxspinlocks
  Return codes: Correctness
    Hardware returns status code from xbegin to indicate when hardware has failed (I/O)
  xtest: Performance
    Conditionally add a memory cell (e.g. lock variable) to the read set based on its value
  xcas: Fairness
    Contention managed CAS
    Non-transactional threads can wait for transactional threads
  Simple hardware primitives support complicated behaviors without implementing them
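The return-code idea can be sketched in plain C. This is a software model under stated assumptions, not the real primitives: a struct copy stands in for the checkpoint that xbegin would take, a `false` return from the body stands in for the status code that reports an I/O abort, and `struct shared`, `critical_section` and `cx_run` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct shared {
    int data;     /* state changed by modify_data() on the slides */
    int io_done;  /* counts I/O operations that really happened */
};

/* Body of the critical section. Returns false when it reaches I/O while
 * running transactionally, which is where hardware would abort it. */
static bool critical_section(struct shared *s, bool condition, bool transactional) {
    s->data++;                    /* modify_data() */
    if (condition) {
        if (transactional)
            return false;         /* I/O is not allowed inside the transaction */
        s->io_done++;             /* perform_IO(), safe under the lock */
    }
    return true;
}

/* Tries the optimistic path first and falls back to exclusive execution.
 * Returns true if the section completed as a transaction. */
static bool cx_run(struct shared *s, bool condition) {
    struct shared checkpoint = *s;        /* models the xbegin checkpoint */
    if (critical_section(s, condition, true))
        return true;                      /* commit */
    *s = checkpoint;                      /* abort: drop speculative updates */
    /* The failure status says "I/O", so rerun non-transactionally. */
    bool ok = critical_section(s, condition, false);
    assert(ok);
    (void)ok;
    return false;
}
```

With `condition` false the section commits on the first attempt. With `condition` true the speculative update is discarded, the section runs once more in exclusive mode, and `data` is incremented exactly once per call.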

User-level system call rollback
  Open nesting requires user-level syscall rollback
  Many calls have clear inverses
    mmap, munmap
  Even simple calls have many side effects
    e.g. file write
  Even simple calls might be irreversible
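User-level rollback of this kind is usually built on a log of compensating actions, run in reverse order on abort. The following is a minimal sketch of such a log; `struct undo_log`, `undo_push`, `undo_abort`, `undo_commit` and the tracing helper `note_undo` are hypothetical names, not an API from the talk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNDO 16

typedef void (*undo_fn)(void *arg);

/* Fixed-size log of compensating actions for one open-nested transaction. */
struct undo_log {
    undo_fn fn[MAX_UNDO];
    void   *arg[MAX_UNDO];
    size_t  n;
};

/* Registers an inverse action, e.g. munmap for a preceding mmap.
 * Returns 0 on success and -1 when the log is full. */
static int undo_push(struct undo_log *log, undo_fn fn, void *arg) {
    if (log->n == MAX_UNDO)
        return -1;
    log->fn[log->n] = fn;
    log->arg[log->n] = arg;
    log->n++;
    return 0;
}

/* Abort: run the compensating actions newest first, then empty the log. */
static void undo_abort(struct undo_log *log) {
    while (log->n > 0) {
        log->n--;
        log->fn[log->n](log->arg[log->n]);
    }
}

/* Commit: the effects stay, so the compensations are simply dropped. */
static void undo_commit(struct undo_log *log) {
    log->n = 0;
}

/* Illustrative compensation that records the order in which it was run. */
static int    undo_order[MAX_UNDO];
static size_t undo_order_n = 0;

static void note_undo(void *arg) {
    undo_order[undo_order_n++] = *(int *)arg;
}
```

The log only helps when every logged call really has an inverse. It cannot restore state that the call destroyed as a side effect, which is the case the slides turn to next.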

Users can’t always roll back
  fd_A = open("file_A");
  unlink("file_A");
  void *map_A = mmap(fd=fd_A, size=4096);
  close(fd_A);
  xbegin;
  modify_data();
  fd_B = open("file_B");
  xbegin_open;
  void *map_B = mmap(fd=fd_B, start=map_A, size=4096);
  xend_open(abort_action=munmap);
  xrestart;
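The failure can be reproduced with ordinary POSIX calls. The sketch below makes several assumptions that are not on the slide: it uses anonymous mappings instead of `file_A` and `file_B`, it assumes the second mmap is placed with `MAP_FIXED`, and it probes the address range with `msync`, which reports `ENOMEM` for unmapped pages. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reports whether [addr, addr + len) is still mapped. */
static bool is_mapped(void *addr, size_t len) {
    return msync(addr, len, MS_ASYNC) == 0 || errno != ENOMEM;
}

/* Returns 1 if the original mapping survives the "rollback", 0 if it is
 * gone, and -1 if a system call failed. */
static int rollback_loses_mapping(void) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    /* Stands in for map_A, whose backing file is already unlinked. */
    char *map_a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map_a == MAP_FAILED)
        return -1;
    map_a[0] = 'A';

    /* Inside the transaction: map_B lands on the same address and
     * silently replaces map_A. */
    char *map_b = mmap(map_a, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (map_b == MAP_FAILED)
        return -1;

    /* Abort action: munmap undoes map_B but cannot bring map_A back. */
    if (munmap(map_b, len) != 0)
        return -1;

    return is_mapped(map_a, len) ? 1 : 0;
}
```

After the abort action runs, the address range is unmapped and the contents of the first mapping are lost. With an unlinked file and a closed descriptor, as on the slide, user code has no handle left from which to rebuild it.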

Kernel participation required
  Users can’t track all syscall side effects
    Must also track all non-tx syscalls
  Kernel must manage transactional syscalls
  Kernel enhances user-level programming model
    Seamless transactional system calls
    Strong isolation for system calls?

Related Work
  Other I/O solutions
    Hardware open nesting [Moravan 06]
    Unrestricted transactions [Blundell 07]
  Transactional Locks
    Speculative lock elision [Rajwar 01]
    Transparently Reconciling Transactions & Locks [Welc 06]
  Simple hardware models
    Hybrid TM [Damron 06, Shriraman 07]
    Hardware-accelerated STM [Saha 06]

Conclusions
  Kernel can use even minimal hardware TM designs
    May not get full programming model
    Simple primitives support complex behavior
  User-level programs can’t roll back system calls
    Kernel must participate